Thomas Hornschuh

Profile Information

  • Gender Male
  • Location Wiesloch, Germany
  • Interests RISC-V, FPGA
  1. Hi all,

     sorry for the long delay since my last post. I was distracted by a few other things, and in addition it took the German Telekom two weeks to get the upgrade of my internet connection to VDSL working. Now I finally have 50/10 Mbit instead of 3/0.5 Mbit, so it was worth the trouble.

     Attached to this post is a bitstream with the working Bonfire SoC for the Papilio Pro. It boots into a monitor program, which allows some basic operation of the board. The connection speed is 500000 baud by default. If this is a problem, I can also provide bitstreams with other default baud rates. It should print a message like this:

       Bonfire Boot Monitor 0.2d
       MIMPID: 0001000e
       MISA: 40001100
       UART Divisor: 11
       UART Revision 00000012
       Uptime 0 sec
       SPI Flash JEDEC ID: 001720c2

     The monitor supports the following commands:

     • D <address>: dump memory. It always dumps 64 32-bit words, starting per default with address 1000000. Without an address, the dump command automatically dumps the next 64 words.
     • X <load adr> <max size in hex>: download a file with the xmodem-crc protocol and load it to <load adr>. Default load address is 100000. When no size is specified, it loads the whole file if it fits into the DRAM. Normally it is sufficient to just enter x without arguments. It has been tested to work with minicom under Linux.
     • G <address>: jump to <address> (default is again hex 1000000 when omitted). It can be used to start a program downloaded with the X command.
     • E: print xmodem error status. Shows the status of the last xmodem download.
     • T: test DRAM. Runs a simple (destructive) pattern test of the DRAM. When running the bitstream for the first time, it is best to use this command to check that everything is fine.
     • B: change baud rate. The user is prompted for the new baud rate. Every value between 300 and 500000 is allowed and no further check is done, so it is possible to enter baud rates like 2423 :-)
     • I: re-display the boot message with some system info.
     • W: write boot image. Writes the image downloaded with the X command to the flash ROM. It writes a 4 KB header to flash offset 512 KB, and then the image data directly behind it. The command can only be executed directly after an X command, because it uses the size of the downloaded file to determine the size of the image. In addition, the X command sets the heap "sbrk" address to the first free address after the downloaded code; this address is also written to the flash header.
     • R: run boot image. Runs an image written with the W command.

     The second attachment is a compiled binary of my eLua implementation for RISC-V (source is available online). To run it, download it with the X command into RAM and start it with G (both commands work with their default parameters). To permanently add it to flash, do the following:

     1. Reset the Papilio Pro with the reset button (or reload the bitstream, in case you don't want to program the bitstream to flash)
     2. Download with the X command
     3. Write to flash with the W command

     From now on you can start eLua after boot with just the "R" command:

       >r
       Reading Header ...OK
       Boot Image found, length 339968 Bytes, Break Address: 00063b00
       ...OK
       Heap: 00062eb0 .. 007ef7ff
       eLua for Bonfire SoC 1.0a
       __virt_timer_period 1920000
       eLua v0.9_bonfire_RV32IM-7-g7996f83 Copyright (C) 2007-2013
       eLua#

     You can enter help to get a command help...

     Tip: from the eLua# prompt run

       lua /rom/life.lua

     for a demo of the Game of Life in Lua. It runs 50 iterations and then prints the runtime:

       ---------O---------------O------
       ----------O---------------------
       -----OO---------O-O-------------
       ----OO---------OO-OO------------
       ---OO--O-OO-OOO---O-O----O------
       --OO--OO-O---O------O---O-O-----
       ---O---OOO---O-----O----O-O-----
       ----OOO------O-O---------O------
       -----OOO----OO-OO---------------
       ------------O-O--------------OO-
       -------OO-------------------O--O
       ---------OO------------------OO-
       --------O--O--------------------
       ---------O-O-------------O------
       ----------O-------------O-O-----
       ------------------------O-O-----
       Life - generation 50, mem 22.8 kB
       Execution time 16.903 sec (16903.39) ms
       eLua#

     Enjoy, and please give me feedback if you like it.

     Regards
     Thomas

     monitor.bit elua_lua_bonfire_papilio_pro.bin
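To make the flash layout used by the W command easier to picture, here is a minimal sketch of the offset arithmetic. It only uses the numbers from the post (4 KB header at flash offset 512 KB, image data directly behind it, image length 339968 bytes from the boot log). The function name `boot_image_layout` is made up for illustration and is not part of the monitor.

```python
# Sketch of the flash layout described for the W command.
# Assumption: offsets are plain byte offsets into the SPI flash.
HEADER_OFFSET = 512 * 1024  # header is written at flash offset 512 KB
HEADER_SIZE = 4 * 1024      # header occupies 4 KB


def boot_image_layout(image_length):
    """Return (header_start, image_start, image_end) as flash byte offsets.

    image_end is exclusive, i.e. the first free byte after the image.
    """
    image_start = HEADER_OFFSET + HEADER_SIZE
    return HEADER_OFFSET, image_start, image_start + image_length


if __name__ == "__main__":
    # 339968 bytes is the eLua image length shown in the boot log.
    header, start, end = boot_image_layout(339968)
    print(f"header at 0x{header:06x}, image 0x{start:06x}..0x{end - 1:06x}")
```

So with the eLua image the flash is occupied from offset 512 KB up to a bit below 1 MB.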
  2. Good to hear that it may not be worth the effort. My idea to ease the implementation was to switch from Wishbone "incrementing burst" to e.g. "Wrap-8" mode and just start with the offset of the access triggering the miss. So if, for example, the initial miss is at offset 4, the burst will be 4-5-6-7-0-1-2-3. The line offset counter would wrap around automatically anyway. Nevertheless, the hit determination would need additional logic to determine validity for single words in the cache line.

     Indeed, the RISC-V ISA spec specifies exactly this approach as the simplest way of branch prediction. The code RISC-V gcc generates also seems to obey this rule. The RISC-V spec itself tries to be micro-architecture agnostic, but the code generator of a compiler of course cannot be. For example, the code generator assumes that the processor has a barrel shifter and that shifts are cheap: masking the upper bits of a word (e.g. converting int to char) is done with a shift left/shift right pair with the number of bits to shift.

     This was already discussed at some of the RISC-V workshops/presentations. The RISC-V inventors at UCB focus mainly on designing a Linux-capable 64-bit processor comparable with ARM Cortex-A series designs (without the "bloat" of course). In the community there are more designs which focus on microcontroller-class processors. One example is PicoRV.

     Thomas
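The wrapping burst order from the post can be written down in a few lines. This is only a sketch of the address sequence, not of the Wishbone signalling; the function name `wrap_burst` is invented for the example.

```python
def wrap_burst(start_offset, length=8):
    """Word offsets of a wrapping burst that begins at the missed word.

    The offset counter simply wraps around at the line length, so the
    word that caused the miss is fetched first.
    """
    return [(start_offset + i) % length for i in range(length)]


if __name__ == "__main__":
    # A miss at offset 4 of an 8-word line, as in the post.
    print("-".join(str(offset) for offset in wrap_burst(4)))
```

An incrementing burst would always start at offset 0, so the missed word at offset 4 would only arrive as the fifth transfer.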
  3. I think you mean lxp32_icache.vhd? Basically this is outdated. The original lxp32 design (which I use as the base for Bonfire) has no real cache; it is more of a 256-byte prefetch buffer. When used with large prefetch_size values it has a very negative impact on data access performance with single-port RAMs like external SDRAM: it blocks the bus until the prefetch is finished. I tried to solve this by monitoring dbus_cyc and aborting the prefetch. It didn't have a noticeable effect.

     Finally I decided to build a real direct mapped cache. It still contains the dbus_cyc signal, but it is not used. Actually I like this cache because it is clean, easy to understand and only consumes 20 slices + RAM. It also has a few drawbacks:

     • When the cache line to be accessed changes, there is a one clock penalty because of the tag RAM access
     • The tag RAM is only updated when the full cache line has been read, therefore the cache miss latency is always the time for reading the full cache line

     The second topic is something I would like to change at some time, but it has no high priority yet. I think adding a data cache and branch prediction will help more...

     The repo still needs some cleanup. There are unused files, and I also changed the name from wildfire to bonfire because I saw more potentially conflicting other users of the wildfire name compared to bonfire. But the old name is still partly in use.

     Thomas
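As a rough behavioural picture of a direct mapped cache with the second drawback (tag only becomes valid after the full line is read), here is a small model. It is a sketch under assumptions: the geometry (64 lines of 8 words) and all names are hypothetical and not taken from the Bonfire sources, and it counts fetched words instead of modelling clocks.

```python
class DirectMappedCache:
    """Toy model of a direct mapped cache addressed by word address."""

    def __init__(self, num_lines=64, words_per_line=8):
        # Geometry is an assumption for illustration only.
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.tags = [None] * num_lines  # None marks an invalid line

    def split(self, word_addr):
        """Split a word address into (tag, line index, word offset)."""
        offset = word_addr % self.words_per_line
        line_addr = word_addr // self.words_per_line
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        return tag, index, offset

    def access(self, word_addr):
        """Return the number of words fetched from memory (0 on a hit)."""
        tag, index, _ = self.split(word_addr)
        if self.tags[index] == tag:
            return 0
        # The tag is written only once the whole line has been read,
        # so every miss costs a full line fill.
        self.tags[index] = tag
        return self.words_per_line


if __name__ == "__main__":
    cache = DirectMappedCache()
    for addr in (4, 5, 4 + 64 * 8, 4):
        print(addr, cache.access(addr))
```

The third access maps to the same line index with a different tag, so it evicts the line and the fourth access misses again.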
  4. Hi,

     Not if you use the embedded multipliers. Those are slow (never managed to get a 32x32 to work above ~105 MHz or so). I hope that with the 4-stage multiplier the clock can be higher. Of course the mult instructions now take 4 clocks instead of 2. It also consumes fewer LUTs than the original design.

     Definitely :-) The data cache is the hard part compared to code.

     The boot monitor is in the bitfile, added with data2mem. Currently I have 32 KB for it; the final version should be smaller. The 2nd stage is then loaded from a fixed address in flash to DRAM or downloaded with XModem. The second stage should implement a file system (e.g. SPIFFS). Currently my boot monitor also has a flash write command, so initialisation of the flash is done by first downloading with xmodem and then writing to flash. It automatically writes the downloaded number of sectors to flash, and also a small header with information about the size.

     To simplify testing, the boot loader implements a small subset of Linux-type syscalls. It uses the same ABI as the RISC-V Spike simulator (proxy kernel), so I can execute programs compiled for Spike.
  5. Currently I'm using gcc. There is an LLVM port going on by Alex Bradbury from lowRISC. I recently spoke with Alex at an event in Munich. They have made great progress; LLVM is able to pass 90% of the gcc torture tests right now. They are also in the process of upstreaming both the gcc and the LLVM ports. The LLVM port will now also support RV32; the preliminary port on the website only supports RV64.

     My design currently qualifies for 100 MHz. I think I can quite easily reach about ~130 MHz. Currently the limitations are more in some not so optimal code in the SoC (e.g. the 32 KB BRAM for the bootloader is organized as 16*2K*32-bit blocks, which is not the best way to organize it, but it helps to run the same setup in simulation and on hardware quickly...).

     The whole system (with UART, SPI interface and DRAM controller) uses 60% of the LX9 slices, the CPU itself 743 slices. It can go down to less than 500 slices if the M extension (mul and div) and some of the privilege mode things (e.g. 64-bit cycle counters...) are removed. The RISC-V privilege mode is not very FPGA friendly, because the CSR registers are allocated in a sparse 12-bit address space, which consumes a lot of comparators and muxes to implement.

     Running completely in Block RAM I reach 0.67 DMIPS/MHz; in DRAM it reaches only 0.35 DMIPS/MHz. The main reason is that I don't have a data cache implemented yet, only an instruction cache.

     I will upload the bitstream and the binaries for eLua and dhrystone soon so you can easily test it :-)

     Thomas
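To put the DMIPS/MHz figures into absolute numbers at the 100 MHz the design currently qualifies for, a small calculation sketch follows. It assumes the usual Dhrystone convention that 1 DMIPS corresponds to 1757 Dhrystones per second (the VAX 11/780 reference score); the function names are made up for the example.

```python
# Usual Dhrystone convention: 1 DMIPS = 1757 Dhrystones per second.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips(dmips_per_mhz, clock_mhz):
    """Absolute DMIPS for a given per-MHz score and clock frequency."""
    return dmips_per_mhz * clock_mhz


def dhrystones_per_second(dmips_value):
    """Convert DMIPS back to raw Dhrystone iterations per second."""
    return dmips_value * VAX_DHRYSTONES_PER_SECOND


if __name__ == "__main__":
    # Scores from the post, evaluated at the 100 MHz clock.
    for memory, score in (("Block RAM", 0.67), ("DRAM", 0.35)):
        total = dmips(score, 100)
        rate = dhrystones_per_second(total)
        print(f"{memory}: {total:.0f} DMIPS, about {rate:.0f} Dhrystones/s")
```

So running from DRAM without a data cache costs roughly half of the throughput compared to Block RAM.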
  6. Hi all, over the last half year I have implemented a processor and surrounding SoC bringing the RISC-V ISA to the Papilio Pro. It implements the 32-bit integer subset (RV32IM). The project is hosted on GitHub. It still needs some additional documentation, cleanup and ready-to-run ISE projects to make it easily reproducible for others. But I am posting this link now to find out if anybody is interested in my work. I will soon also post a bitstream here so anybody with access to a Papilio Pro can play with it. I have also ported eLua to it. @Jack: If you like, I can also present the project in the GadgetFactory blog. Regards Thomas
  7. My concern is not so much the stability of the ESP8266. I would just like to get some information (preferably from Jack) about the power budget left for wings on the 3.3 V rail. According to the datasheet, the ESP draws about 180 mA when sending (in 802.11b mode; in newer modes it is even less). My measurements confirm that. Very short peaks above that are possible; they may influence stability but should not damage the switching regulators.
  8. Hi, I'm currently connecting an ESP8266 WiFi module to the Papilio Pro. I tried to find a specification of the max. current the 3.3 V rail can deliver. According to the data sheet of the LTC3419 converter, it can deliver 600 mA. My measurements show that the ESP8266 draws around 190 mA average current. I'm aware that the USB port itself is also limited, but this is not my question at the moment. I think the Spartan-6 draws most of its current from the 1.2 V rail, so the 3.3 V rail mainly has to power the other chips (SDRAM, Flash, FTDI), and I think the additional 200 mA is OK. Thomas
  9. Are there still SDR SDRAMs on the market? DDR will be difficult with a soft memory controller, and SRAM is small and expensive... I'm currently working on a project which implements the RISC-V ISA and runs a port of eLua to this architecture on a Papilio Pro. It is making great progress; I will post more information here in the forum in the next days. I'm already thinking about moving to another board (e.g. Pipistrello, Arty). But in some ways I like the PaPro a lot:

     • It is "publishing friendly" because I don't need a Xilinx CoreGen generated core to access the DRAM.
     • With ISE, a synthesis run for the LX9 usually takes only a few minutes including map, place and route, so it is possible to quickly check design changes in hardware.

     As long as a design fits in the LX9, it is really a convenient platform. And Xilinx has recently announced that they will continue supporting the Spartan 6 series because of its success. Thomas
  10. Hi, I have seen that the Pro is out of stock and also can't be preordered. Will it be produced again, or is the product retired? Regards Thomas
  11. Hi, I am just trying to find the Linux download. When pressing the Download button I get three files to select from. I tried the zip. It contains a Papilio loader that I can start under Linux, but it did not connect to the board. The download is not in sync with the how-to: there is no Linux installer script in it, and there is also no makefile in the papilio-prog directory. The FTDI kernel drivers are loading, and I can access the serial port of the board with a terminal program. Thomas
  12. Hi all, I have now published the project on GitHub:
  13. Hi all,

     finally I managed to finalize my work in a way that I can upload it. The bit files are my extended version of SOCZ80 (I named my variant "RETRO80").

     The "Serialboot" bitfile contains the original ROM Monitor from Will, which interacts with the serial console (the baud rate is fixed at 115200 bit/sec). The video RAM is initialized with a test pattern (by initializing the block RAM from VHDL code), so it is easy to check if VGA is working.

     The "consoleboot" bitfile contains a ROM Monitor which uses the VGA port and the PS/2 "A" port of the Arcade MegaWing as console. Unfortunately the keyboard layout is German at the moment...

     Having a boot image is much more useful. To get the boot image onto the system you can either use the method described in Will's readme.txt or, as a fast alternative, merge it at address 0x200000 into the bitfile and upload it. There are different tools for merging bitfiles; I used papilio-prog. The command is

       papilio-prog.exe -v -f retro80Serialboot20160604.bit -b ..\bscan_spi_xc6slx9.bit -a 200000:retro80_200.image

     Of course you must adjust the paths to the structure on your system. My example was run directly from the directory where papilio-prog resides after installation of the Papilio tools.

     When everything is loaded to the Papilio Pro's flash chip and you restart the board (best with power cycling), you should see the boot monitor prompt either on your terminal emulator or on the VGA display. Then you enter

       rread 200 200
       rboot 200

     Now CP/M is booted. The provided image will always boot CP/M on the serial console. On this console you can enter "MPMLDR"; this will fire up MP/M II running console 0 on PS/2 / VGA and console 1 on the serial port.

     The disk image contains a lot, e.g. Turbo Pascal and Wordstar. Please note that all these programs are patched to use the ADM3A screen sequences of the VGA console, so they will not work with the usual VT100 emulation of e.g. PuTTY or Tera Term. There is a which can be used with VT100.

     On user 6 there is an adapted version of the startrek game. You can start it with:

       user 6
       startrek

     If you also want the CP/M part to work on VGA/PS/2, you can patch the disk image:

     1. First reset the SoC with the reset button on the Arcade MegaWing (this only resets the CPU/board; it does not reload the bitfile and therefore does not harm the DRAM contents)
     2. Boot CP/M
     3. Enter sysload bootvga.bin a:
     4. Press "C" to continue (the sysload program is missing a message saying this ....)
     5. Reset the system again.

     rboot 200 should now boot to the VGA PS/2 console. To make the change permanent you need to enter rwrite 200 200 in the ROM Monitor.

     I will post a link to my Bitbucket repository soon.

     Regards
     Thomas

     retro80Serialboot20160604.bit retro80consoleboot20160620.bit retro80_200.image
  14. Hi Alan, I already downloaded your fork a while ago to look into it. SD card is one of the things on my list. I'm struggling a bit with the fact that there are not enough free I/O pins left when using the Arcade MegaWing. There is exactly one pin missing; I'm considering just cutting one of the joystick pins, as I'm personally not interested in games.

     CP/M 3.0 is much newer than MP/M II, and one advantage is indeed that it can do the blocking/deblocking, so there is no need to do it in the BIOS.

     BTW, I also took a closer look at the timing violations of SOCZ80; I think you mentioned them a while ago in this thread. With the timing analysis tools in the ISE "PlanAhead" tool it is not that hard. I'm afraid the reason is a 14-level deep logic path in the T80 CPU which cannot be changed without completely redesigning the core. On the LX9 the T80 core simply cannot run faster than ~85 MHz. That it works at 128 MHz simply shows that the Xilinx devices seem to have a lot of overclocking margin. 85 MHz, on the other hand, is close to the clock range where Hamster's DRAM controller becomes unreliable. A "clean" design of SOCZ80 would need to work with the DRAM clock separated from the CPU clock. I have already taken a look at the design of the cache; it should not be too difficult to separate the clock domains in the cache controller.

     Regards
     Thomas
  15. Hi Jack, thanks for your offer. I hope I will have time on the weekend to upload everything. Thomas