alvieboy Posted October 21, 2012
I just uploaded a small video demoing Linux running inside ZPUino (simulator). Things are going really smoothly. Now the hardware must be slightly adapted in order to achieve a "decent" performance.
ZPUino running Linux 3.4.0 (uClinux/Linux MMU-less) with uClibc 0.9.29 and busybox 1.20.2.
Alvie
hamster Posted October 21, 2012
Wow!
alvieboy Posted October 22, 2012
Perhaps you want to help me with this dual-ported data cache? It's the only thing missing in HW.
hamster Posted October 23, 2012
Hummm... I have just seen something else to distract me... http://boingboing.net/2012/10/11/game-of-life-with-floating-poi.html
Caches have always been a mystery to me... actually, content-addressable memory in general. If you want one that works, I'm not sure I'm the person for the job.
alvieboy Posted October 23, 2012
Actually the implementation is simpler than that. The cache will be direct-mapped, 1-way associative, so only one simple "tag memory" is needed.
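To illustrate why a direct-mapped cache needs only a single tag memory, here is a minimal Python sketch. It is not the ZPUino implementation: the line size and line count (`LINE_BYTES`, `NUM_LINES`) are assumed values chosen for illustration, and the class and function names are hypothetical. Each address maps to exactly one line, so a lookup is one tag read plus one comparison, with no associative search.

```python
# Illustrative model of a direct-mapped cache lookup.
# LINE_BYTES and NUM_LINES are assumptions, not ZPUino parameters.
LINE_BYTES = 32    # assumed bytes per cache line (power of two)
NUM_LINES = 256    # assumed number of lines (power of two)

OFFSET_BITS = LINE_BYTES.bit_length() - 1
INDEX_BITS = NUM_LINES.bit_length() - 1


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tracks only tags and valid bits: the single 'tag memory'."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, claim the line and return False."""
        tag, index, _ = split_address(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # The only candidate line is replaced; no victim selection needed.
            self.valid[index] = True
            self.tags[index] = tag
        return hit
```

Two addresses that share an index but differ in tag evict each other, which is the usual price paid for the simple lookup.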
ben Posted October 23, 2012
Maybe you can have a look at the Amber ARMv2 core on OpenCores. There's a cache in there. Not that I really looked into it, though; it might just be rubbish for you.
alvieboy Posted November 5, 2012
All the caches I found so far do not fit the design, unfortunately. The good thing is that I now have a perfect idea of how the cache will work, and how to adapt the whole system to use the caches - this is complex, because the CPU has four external interfaces (two to RAM, one to ROM, and one to IO), and all of them have concurrent accesses (R/W). Add the DMA engine to this, and you'll see how much stress is put on the RAM chip.
Plus, add three pipeline stages that can go "busy" individually, and then add cache "miss", IO delay and other things that cause the CPU to wait for data - it's a pain to keep everything synchronized. Add an extra delay to cache write (it takes 2 clock cycles, if successful), and note that almost all CPU operations cause writes - still we need to accomplish everything close to a 1-cycle delay for each instruction. Man, believe me, this is complex...
The ROM is also complex, because it does not shadow copies into RAM, so we have to do that in software (HW is also possible, but would kill timing).
I hope to have some more time this week and next to work on this.
Tb_ Posted November 5, 2012
Cache design is often as complex as the processor itself, and sometimes even more so.
On most CPUs, a cache miss simply freezes the core until the cache gets the data.
Doing so preserves all internal synchronization in the pipeline until the cache line is filled with the required data, without additional complexity.
My 2c
Thomas
alvieboy Posted November 25, 2012
Ok, dcache design is moving well, working on cache flush right now. So far, "hitting" a cache line gives you a 1-cycle delay read on both cache ports, and a 0-cycle delay for writes (only one port is writeable though).
Tb_: well, simply freezing the pipeline is not a good approach. Sometimes theory and practice do not entirely cooperate. Freezing the pipeline is only possible if the pipeline is no-delay, which is not the case - every pipeline stage can go busy individually. One example of this is when we have an instruction cache miss, and the instruction fetch unit is refilling the instruction cache. There's no logic halting this process because we had a write miss two stages later. Plus, some of these processes assume constant delays in some operations, and delaying those becomes problematic in some scenarios.
Alvie
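As a rough illustration of what a cache flush has to do, here is a small Python model of a write-back, direct-mapped cache. It assumes the dcache is write-back (as the later mailing-list summary in this thread states) and simplifies heavily: one word per line, a single port, no timing. The class name `WriteBackCache` and its methods are hypothetical and are not taken from the ZPUino sources.

```python
class WriteBackCache:
    """Toy write-back, direct-mapped cache with one word per line (assumed)."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory                 # backing store: list of words
        self.num_lines = num_lines
        self.valid = [False] * num_lines
        self.dirty = [False] * num_lines
        self.tags = [0] * num_lines
        self.data = [0] * num_lines

    def _write_back(self, index):
        """Copy a dirty line to memory and mark it clean."""
        if self.valid[index] and self.dirty[index]:
            addr = self.tags[index] * self.num_lines + index
            self.memory[addr] = self.data[index]
        self.dirty[index] = False

    def _lookup(self, addr):
        """Return the line index for addr, refilling the line on a miss."""
        index, tag = addr % self.num_lines, addr // self.num_lines
        if not (self.valid[index] and self.tags[index] == tag):
            self._write_back(index)          # evict the old contents first
            self.data[index] = self.memory[addr]
            self.tags[index] = tag
            self.valid[index] = True
        return index

    def read(self, addr):
        return self.data[self._lookup(addr)]

    def write(self, addr, value):
        index = self._lookup(addr)
        self.data[index] = value
        self.dirty[index] = True             # memory is now stale

    def flush(self):
        """Write every dirty line back so memory matches the cache."""
        for index in range(self.num_lines):
            self._write_back(index)
```

Until `flush()` (or an eviction) runs, anything that reads RAM without going through this cache sees the old value, which is why a flush has to be correct before other bus masters can rely on memory contents.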
alvieboy Posted December 16, 2012
Ok, for the 1st time Linux is actually booting on the Papilio Pro. There's no "init" there yet, which is the 1st application Linux runs, but this also means that the Linux kernel is working perfectly. Note that this is a multi-stage boot - the 1st "application" run is "ZbFLT", which is an Arduino sketch that loads FLAT binaries. The 2nd application (ZLinux loader) is also a sketch, which is converted to FLAT and loaded by the 1st one. The latter one then loads the Linux kernel from an SD card. The kernel starts and then fails to completely boot because there are no "userspace" applications yet on the SD card.
ZbFLT loader v1.0, © 2012 Alvaro Lopes
Loading 'loader.bflt'...
Loaded .text=0x007f2c00 .data=0x007fd150 .bss=0x007fdd84, starting...
ZLinux loader v1.0, © 2012 Alvaro Lopes
Loading linux: ............................................... OK.
Starting the kernel, sp 0x007ffff3, pc 0x00001008
Linux version 3.4.0-uc0 (alvieboy@della) (gcc version 3.4.2) #718 PREEMPT Sun Dec 16 16:00:50 WET 2012
bootconsole [earlyconsole0] enabled
ZPU: setting fast paths for interrupt and syscall
CPU: ZPUino Running at 96.000 MHz.
Physical memory: 00000000-00800000
Reserved memory:
 00000000-00000fff: Bootloader
 00001000-001265ba: Kernel code
 001265bb-001c3827: Kernel data
Node 0: start_pfn = 0x0, low = 0x800
Node 0: mem_map starts at 001c5000
Built 1 zonelists in Zone order, mobility grouping off. Total pages: 2032
Kernel command line: root=/dev/mmcblk0p1 rootwait console=ttySZ0 init=/bin/sh
PID hash table entries: 1024 (order: 0, 4096 bytes)
Dentry cache hash table entries: 1024 (order: 0, 4096 bytes)
Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
Memory: 6264k/6264k available (1109k kernel code, 1928k reserved, 78k data, 64k init)
SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:32
ZPU: setup timer OK
zpu_clockevent: irq 0, 96.000 MHz
Calibrating delay using timer specific routine.. 197.86 BogoMIPS (lpj=989322)
pid_max: default: 4096 minimum: 301
Mount-cache hash table entries: 512
bio: create slab <bio-0> at 0
Switching to clocksource zpuino_counter
msgmni has been set to 16
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler deadline registered (default)
gpiochip_add: registered GPIOs 0 to 127 on device: zpuino_gpio.1
ZPUino GPIO driver registered, 128 pins
Serial: ZPUINO UART driver
zpuino_uart.2: ttySZ0 at MMIO 0x8800000 (irq = 1) is a ZPUino UART
console [ttySZ0] enabled, bootconsole disabled
console [ttySZ0] enabled, bootconsole disabled
ZPUINO: UART at 0x8800000, irq 1
brd: module loaded
loop: module loaded
Registering ZPUino SPI driver
ZPUino: probing for SPI controller
zpuino_spi zpuino_spi.0: at 0x0B000000
ZPUino. SPI controller initialized 004a4000
mousedev: PS/2 mouse device common for all mice
mmc_spi spi0.0: SD/MMC host mmc0, no DMA, cd polling
Waiting for root device /dev/mmcblk0p1...
mmc0: new SDHC card on SPI
mmcblk0: mmc0:0000 SD4GB 3.67 GiB
 mmcblk0: p1
VFS: Mounted root (vfat filesystem) readonly on device 179:1.
Freeing init memory: 64K (1000 - 11000)
Failed to execute /bin/sh. Attempting defaults...
Kernel panic - not syncing: No init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.
Call trace:
 [<00014b6f>] panic+0x63/0x13c
 [<000110c1>] match_dev_by_uuid+0x0/0x2f
 [<00001996>] kernel_init+0x93/0x9c
 [<00017a97>] do_exit+0x0/0x251
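A couple of figures in the boot log above can be cross-checked with simple arithmetic; the Python sketch below does so. The kernel prints BogoMIPS from `lpj` (loops per jiffy) and the tick rate `HZ`. `HZ` does not appear in the log, so the value 100 used here is an assumption, chosen because it reproduces the printed figure. The function name `bogomips_string` is made up for this example.

```python
# Cross-check of numbers printed in the boot log.
# HZ = 100 is an assumption; the log does not state the tick rate.
LPJ = 989322
HZ = 100


def bogomips_string(lpj, hz):
    """Format BogoMIPS from lpj with the kernel's integer arithmetic."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"


# "Physical memory: 00000000-00800000" gives the RAM size.
PHYS_START, PHYS_END = 0x00000000, 0x00800000
total_kib = (PHYS_END - PHYS_START) // 1024

# "Memory: 6264k/6264k available (..., 1928k reserved, ...)"
AVAILABLE_KIB, RESERVED_KIB = 6264, 1928

if __name__ == "__main__":
    print(bogomips_string(LPJ, HZ))          # 197.86
    print(total_kib)                         # 8192
    print(AVAILABLE_KIB + RESERVED_KIB)      # 8192
```

The available and reserved figures add up to exactly the 8 MB of physical memory reported, so the memory accounting in the log is self-consistent.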
hamster Posted December 16, 2012
Superb! What a nice early Christmas present for you!
alex Posted December 16, 2012
Wow, well done Alvie, it looks like you're 99% there. This will be quite epic when you finally get a login prompt. Is this running on the early P pro with the SRAM or the current release version P pro with SDRAM?
Jack Gassett Posted December 16, 2012
Man! This is awesome!!! It's so, so close; it's going to be an amazing feeling once you see that login prompt after all of the hard work you've put into this!
Alex, this is on the Papilio Pro board with 64Mb SDRAM. The 1Mb SRAM on the Papilio Plus was not enough to do much with Linux.
I'm thinking that the MegaWing I made that was meant to be the second version of the Arcade MegaWing is going to be the perfect companion to this. I need to build up a couple more boards, one for Alvie and one for a manufacturing prototype. Then we can get the ball rolling on a nice MegaWing to go along with Linux!
alex Posted December 17, 2012
Have you thought about an HDMI MegaWing instead of VGA? Most people would have a junk monitor supporting VGA, but we can't be far from having junk monitors with HDMI inputs.
alvieboy Posted December 17, 2012
alex wrote: "Wow well done Alvie, it looks like you're 99% there. This will be quite epic when you finally get a login prompt Is this running on the early P pro with the SRAM or the current release version P pro with SDRAM?"
Hi alex,
It's running on the current version with SDRAM.
Hope to get busybox running today (it was already running in the simulator, but I need to check if the cache flush is working properly for userspace applications).
Alvie
Jack Gassett Posted December 17, 2012
alex wrote: "Have you thought about an HDMI MegaWing instead of VGA? Most people would have a junk monitor supporting VGA but we can't be far behind having junk monitors with HDMI inputs"
Yes, but it will take several months to make a prototype and test it. The existing design is tested and ready for manufacturing.
Jack.
alvieboy Posted December 23, 2012
Ok, guys, I'm not a big fan of copy/paste, but here's a copy of what I just posted to the ZPU mailing list:
http://mail.zylin.com/pipermail/zylin-zpu_zylin.com/2012-December/001913.html
Hi guys,
Since it's almost Christmas it's perhaps time to get you all updated about ZPUino, what has been done and accomplished so far, what is being done right now, and what the future holds.
The ZPUino project started back in 2010 and published its first alpha release in December the same year. The objective of the project was to implement an Arduino (wiring) compatible platform, but running with a ZPU core and devices similar to those present on Arduino AVR devices. The project developed in several phases and with several hardware versions for each phase. It started with a simple SoC using the traditional ZPU core, and with some basic devices like UART and SPI. A software bootloader/programmer was also implemented, using the standard serial port and a variant (very variant) of the HDLC protocol for communication with programmer devices - ZPUino was designed to bootstrap its "sketches" from an external SPI flash, and logic for programming those flash devices was split between the host programmer (which now is known to run on major operating systems, like Microsoft Windows, Linux and MacOS), and the device programmer.
Everything was set up to allow almost seamless migration of Arduino code into ZPUino code.
During this first phase the Arduino IDE/Wiring library was adapted to support ZPUino, and a new compiler mode was then implemented, since it did not support multi platform (as of now, it does, but I still keep the "make" approach I designed back then).
The second phase relied on hardware design. A new core was implemented (ZPUino Premium), which had a full 3-stage pipeline and was able to execute most basic instructions in one clock.
Some new core devices were also added, like Audio (sigma-delta), and complex PWM-able timers.
The main IO interface is wishbone compliant, so any wishbone compliant device should work with the design (I've tested a few, like OpenCores I2C, and it works like a charm). A few design variants were written, like memory mapped VGA, DMA VGA (such as the ZX Spectrum version), audio synthesis, and many more. But only internal RAM (BRAM) was supported.
There was a singular variant of this design, one which actually implemented a new instruction (which I called FMUL16), which could perform a 16.16 fixed point multiplication, and speed up some operations. This variant was used in the SoundPuddle project.
Let me now tell you about the SoundPuddle project.
Back in April this year (2012), I was contacted by John English from Colorado, US, asking if ZPUino could do real time signal analysis for a project he wanted to show at Apogaea 2012.
After some initial analysis I said it was feasible, and so we moved to implement the thing on ZPUino on a S3E500 board (Papilio One), from Gadget Factory. It was indeed feasible, and it was a huge success. It was improved and shown at the Burning Man festival the same year. Feedback was awesome.
For some low-level details on this one:
A 1024-point FFT was implemented in software, whose inputs came from an external ADC. The FFT code was entirely done in assembly code (a whopping 177 bytes!), using the FMUL16 instruction. This was fast enough for what the project needed (actually, it ended up being too fast, and we had to add some delays). The real constraint here was the amount of memory available on the device. The system ran with around 40KB. Tough, but possible.
Intro video for Kickstarter is here: http://kck.st/MAu7oQ
Almost at the same time, Jack Gassett (from Gadget Factory) started the Retrocade Synth project:
http://www.kickstarter.com/projects/13588168/retrocade-synth-one-chiptune-board-to-rule-them-al
This now uses the Extreme core, as described below.
Both projects were successfully funded, and are now shipping to their supporters.
Back to the design:
The core, due to its pipelined design, required fast memory since it needed to simultaneously read the instruction stream, read stack values and write back stack. And we were very limited on block RAM, so it was time to move to another design.
ZPUino Extreme was then born.
ZPUino Extreme took another approach - it used block RAM for the stack (which was fixed, 4KB or 8KB), and used external memory for the program area and data. In order to do so, we designed memory interfaces (SRAM, SDRAM and DDR-SDRAM), all working in wishbone pipelined mode, and added a simple, direct-mapped instruction cache. This allowed us to run larger codebases, and access more memory than usual. This is still the fastest core if you need large code/data, and can live with the limited, non-switchable stack. For most single-task applications, this is indeed the core you need.
But for complex designs this was still not enough. The fixed, limited stack prevented us from running more complex applications. At first a simple write-back-stack, read-new-stack approach was tried, but it was somewhat complex, and very slow.
So, ZCoreV3 was born.
Yes, I decided to change the name for the core. I was running out of acronyms - now, seriously, I thought a lot about the naming of ZPUino cores, and they wouldn't cope with further development improvements, so I went radical.
First of all, ZCoreV3 is not yet in production, although it's considered (by me) stable. Its stability will be proven during the next months, although I'm feeling confident. A few improvements are also being thought of, so it might take a while before a first stable version is available to you all.
So, what's so different about ZCoreV3?
Well, something simple, but something very complex: the stack is no longer fixed.
Although this might look like a simple thing, it's indeed the most complex thing I did in hardware!!!
ZCoreV3 shares the same pipeline and instruction cache as ZPUino Extreme, and adds a data cache, direct-mapped, one-way associative, dual-ported, write-back, which can in "hit" scenarios attain a 1-clock read delay, and 0-clock write delay. Only one of the ports is writeable, though. Conflicts (r/w) are handled by the cache itself, so the core does not need to address that. The core is also slightly different, featuring not only TOS caching, but also NOS caching (but TOS is always written back for stack push operations). Further improvements are to identify "hot" cache lines (those being accessed as stack) and perform write-through for some memory accesses (or eventually convert it to a two-way associative cache).
So, since the ZCoreV3 design is able to address a lot of memory, with not many restrictions on its use (if any), we can probably put it to some real work...
.... and it now runs Linux (MMU-less version)!
There are still some things needing implementation on the Linux side (and uClibc), and a few stability issues, but things now look very promising.
I'm uploading a small video of it running on the Gadget Factory Papilio Pro board (S6LX9), with 8MB SDRAM, and a real SD card. You can see it here:
http://youtu.be/WXhLxfztSZo
A few things still to address. Some stability issues need to be addressed (all those are software, eventually related to kernel stack switch), some functions (memcpy, memset, string functions) need some optimizations (i.e., assembler versions; memcpy already has one), the SPI controller is limited to 8-bit, which makes it very slow (as you can see from the video, it takes some time to exec. the first application), and some more, which I'll address.
First, make it run stable, then optimize.
I'm hoping to get this to run on S3ESK soon, at the same speed (96MHz), so you guys can also help (I know some of you have this board at home).
Plans for the future: oh, well, first, get Linux and other operating systems running stable, getting DMA to work properly with the dcache, some new VGA adaptors, what else....
Let's hope 2013 is a good year for ZPU and ZPUino.
A few thank-yous:
- To all ZPU and ZPUino users, we're doing this for you, thank you!
- My family, for their support (although they don't know what I'm doing!)
- Jack Gassett, and Gadget Factory, for their support with hardware and ideas! Thanks Jack!
- John English, the SoundPuddle Engineer, for the real-world use of ZPUino and a lot more!
- All those who helped with ZPUino, they are so many I won't risk forgetting anyone, so you're all included!
- All ZPU fans!
As always, any doubts, questions, opinions, and so on, are very very welcome!
And have a merry Christmas!
Alvie
PS: I'm not explaining something here - it's a challenge to your intellect and HDL knowledge. I'll just say "data cache", hopefully someone will question how it is possible. lol!
And merry Christmas to you all
Alvie
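The mailing-list post above mentions FMUL16, an instruction performing a 16.16 fixed-point multiplication. The Python sketch below shows what such a multiply computes in general. The post does not describe FMUL16's exact behaviour, so the details here are assumptions: signed 32-bit operands, truncation of the low 16 fraction bits, and wrap-around on overflow. The helper names (`to_fixed`, `to_float`, `wrap32`, `fmul16`) are hypothetical.

```python
# Generic 16.16 fixed-point multiply; not the actual FMUL16 definition.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS          # the value 1.0 in 16.16 format


def to_fixed(x):
    """Convert a float to 16.16 fixed point (rounded to nearest)."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a 16.16 fixed-point value back to a float."""
    return f / ONE


def wrap32(v):
    """Wrap an integer to a signed 32-bit value (assumed overflow rule)."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def fmul16(a, b):
    """Multiply two 16.16 values: full product, then drop 16 fraction bits."""
    return wrap32((a * b) >> FRAC_BITS)
```

For example, `to_float(fmul16(to_fixed(1.5), to_fixed(2.0)))` evaluates to `3.0`. Doing the shift in a single instruction avoids assembling a 64-bit product in software, which is the kind of saving an FFT inner loop benefits from.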
Jack Gassett Posted December 26, 2012
Alvie,
That was a great read; it's amazing to go over all that you have accomplished with the ZPUino. I'm very grateful, the RetroCade would not have been remotely possible without the ZPUino, and the prospect of Linux soon is very exciting! Let's make 2013 the year that we get ZPUino into the hands of many more users.
Jack.
engineR Posted November 12, 2018
Hello all,
I was very impressed seeing the Linux kernel booting on the ppro in your video. I downloaded the GitHub repo with Linux3.7-zpu, but I still have big problems getting it compiled, and I also couldn't find the source of zbflt-loader and zlinux-loader to get it running. Maybe someone has some useful tips or a short tutorial.
Thanks in advance