socz80: A Z80 retro microcomputer for the Papilio Pro

Will Sowerbutts


Hi everyone.


Last year I wrote a Z80-based retro microcomputer which runs in the Papilio Pro. It started out small but I added a few interesting features, in particular a memory management unit and a 16KB cache to hide the SDRAM latency. I've ported several operating systems to it. Both the hardware and software aspects of the project have been good fun with lots of new opportunities to learn.


I've just made my first public release; you can download it at and try it out. That page also describes the project in a bit more detail.


RAM disk images are included to boot CP/M-2.2, MP/M-II and UZI (a UNIX system). I've included Zork and the Hitchhiker's Guide game, which will play under all three operating systems: they are native CP/M applications, but MP/M-II implements the CP/M system calls and UZI includes a CP/M emulator.


The release also includes the full VHDL source code for the machine and the source code to all the software I've written, with the exception of the UZI port which I plan to release shortly after I extend it to support the N8VEM Mark IV SBC.


Please let me know if you get it to work!

This post has been promoted to an article

Excellent project! Thanks for sharing.


I wonder if I can use this to implement a ZX Spectrum clone. The Spectrum requires some odd timings because part of its DRAM is shared with the display, so accesses to that part exhibit different timings. Games rely on these timings, so it might be a bit more complex.


alvieboy -- there's more than 32KB of block RAM on the FPGA that remains unused. I had been saving this for video memory because the block RAM is dual-ported. You can use one port for the CPU driven by the 128MHz system clock, and the second port for the video circuit driven at whatever clock speed the video timing requires.


Thanks, Will!


I did a quick synthesis trial with a stripped-down version, no peripherals, on-chip RAM and a one-bit GPIO for LED, to get an idea of the minimum size.

The processor alone uses about 25 % of the LUTs of the LX9 (P. Pro). So this could make a nice development platform, with SDCC compiler, when 16 bits is enough.

I wonder how performance would compare with a ZPU. It can't be any slower than my usual "small" ZPU but that's only about half the size.


Device Utilization Summary:

Slice Logic Utilization:
  Number of Slice Registers:                   251 out of  11,440    2%
    Number used as Flip Flops:                 251
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                0
  Number of Slice LUTs:                      1,285 out of   5,720   22%
    Number used as logic:                    1,251 out of   5,720   21%
      Number using O6 output only:           1,116
      Number using O5 output only:              29
      Number using O5 and O6:                  106
      Number used as ROM:                        0
    Number used as Memory:                      32 out of   1,440    2%
      Number used as Dual Port RAM:             32
        Number using O6 output only:             8
        Number using O5 output only:             0
        Number using O5 and O6:                 24
      Number used as Single Port RAM:            0
      Number used as Shift Register:             0
    Number used exclusively as route-thrus:      2
      Number with same-slice register load:      0
      Number with same-slice carry load:         2
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                   418 out of   1,430   29%
  Number of MUXCYs used:                       112 out of   2,860    3%
  Number of LUT Flip Flop pairs used:        1,304
    Number with an unused Flip Flop:         1,056 out of   1,304   80%
    Number with an unused LUT:                  19 out of   1,304    1%
    Number of fully used LUT-FF pairs:         229 out of   1,304   17%
    Number of slice register sites lost
      to control set restrictions:               0 out of  11,440    0%
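
As a quick cross-check of the report above, the headline percentages can be recomputed from the raw counts. This is only a sketch: the helper name is made up for illustration, and the figures are copied straight from the summary.

```python
def utilisation_percent(used, available):
    """Return resource utilisation as a percentage of what the device offers."""
    return 100.0 * used / available

# Counts taken from the Device Utilization Summary above.
slice_luts = utilisation_percent(1285, 5720)
occupied_slices = utilisation_percent(418, 1430)
slice_registers = utilisation_percent(251, 11440)

print(f"Slice LUTs:      {slice_luts:.1f}%")
print(f"Occupied slices: {occupied_slices:.1f}%")
print(f"Slice registers: {slice_registers:.1f}%")
```

The LUT figure lands between 22% and 23%, and the occupied-slice figure is a little higher because slices are rarely packed completely.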



Hi offroad,


I expect the ZPU performance would be higher; the Z80 wastes several cycles each instruction performing refresh cycles for DRAM. I did wonder if the Z80 core could be modified to omit those cycles; I think it could, but this is a project for another day. Your ZPU data path is four times wider, which is also a big advantage. The flat 32-bit address space would certainly be a preferable model for the programmer!


I use SDCC to cross-compile my UZI kernel, and it works well. I have some tools that postprocess the linker output to build the final kernel executable, which can be loaded from CP/M; a little stub program then juggles things around in RAM until they occupy the required addresses. Be warned that marking a function with "__critical" is buggy and will smash up the stack, but it works fine for a critical section inside a function. SDCC assumes the compiled executable will live in ROM, so any initialised variable is allocated space in ROM (for the initial value) as well as space in RAM, and a chunk of code at startup copies from one to the other. That's not a problem for small programs, and I think it would probably be simple to patch the compiler or even do a bit more postprocessing of the linker output.




Although the ZPU Extreme Core is fully pipelined, compilers usually work in terms of "R->R" (register-to-register) operations. The ZPU has no registers, so these "virtual registers" usually reside somewhere in the stack, at an offset from the current stack pointer.


So, operations like "Rd <- Rs1 AND Rs2" are indeed translated to:


LoadSP X (Load Rs1)

LoadSP Y (Load Rs2)

And (Rs1 AND Rs2)

StoreSP Z (Store Rd)


This uses 4 cycles. Other operations can be faster, but that depends on how your compiler optimizes things. And I definitely don't want to mess with the GCC backend at this point.
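
To make the load, operate, store pattern concrete, here is a toy stack-machine model of "Rd <- Rs1 AND Rs2". It is a sketch only: the named slots X, Y and Z stand in for stack-pointer offsets (on a real ZPU the offsets are numeric and shift as the stack grows), and charging one cycle per instruction is an assumption chosen to match the 4-cycle figure above.

```python
def run(program, frame):
    """Run a tiny stack-machine program against a dict of stack-frame slots.

    Returns the number of instructions executed, counted as one cycle each
    (an assumption of this model, not a measured ZPU figure).
    """
    stack = []
    cycles = 0
    for op, *args in program:
        if op == "LoadSP":        # push the value of a frame slot
            stack.append(frame[args[0]])
        elif op == "StoreSP":     # pop the top of stack into a frame slot
            frame[args[0]] = stack.pop()
        elif op == "And":         # pop two operands, push their bitwise AND
            b = stack.pop()
            a = stack.pop()
            stack.append(a & b)
        else:
            raise ValueError(f"unknown instruction: {op}")
        cycles += 1
    return cycles

# Hypothetical frame: X and Y hold Rs1 and Rs2, Z receives Rd.
frame = {"X": 0b1100, "Y": 0b1010, "Z": 0}
program = [("LoadSP", "X"), ("LoadSP", "Y"), ("And",), ("StoreSP", "Z")]
cycles = run(program, frame)
print(cycles, bin(frame["Z"]))
```

A register machine could do the same work in a single instruction, which is where the stack-based approach loses ground.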


Thanks... maybe the Z80 is not the best choice for new developments. I like my "int"s to be 32 bit...

I might give the ZPUino a shot one day if I have to upgrade from the "small" ZPU.

As an alternative I also had a look at Lattice's MICO32, but that used up half the LX9.


The whole line of mda VST plugins has been open sourced. Getting those on a handheld board would be pretty cool. The Spartan 6 has more than enough horsepower, but maybe not when the hardware is running an emulated CPU that runs software that emulates synth hardware.


PS apologies for thread hijack... back to topic :)


  • 2 weeks later...

CP/M V3.0 Loader
Copyright © 1998, Caldera Inc.    

 BIOS3    SPR  EA00  0400
 BDOS3    SPR  C900  2100
 50K TPA

CP/M 3.x Unbanked BIOS (Will Sowerbutts, Alan Cox 2014-05-24)



Getting there. CP/M 3 supports banked drivers, time-of-day clocks, device redirection and, most importantly, larger disks with different block sizes, so it ought to be able to support an SD card.


I need to finish the wboot BIOS code to make it load the CCP (command line); until that loads it's a little bit useless 8)


  • 3 weeks later...

I'm currently fiddling with SocZ80 a bit more, having jumped in at the deep end and failed spectacularly on the first attempts. It's now running with the LogicStart board.


So far I've got the switches done (0 is reset, 1-7 are the GPIO), the LEDs now use the LogicStart LEDs, the joystick can be read and the audio output driven. No sigma-delta DAC nonsense, pure full-on retro 1-bit audio. The ADC is wired to the SPI instead of an SD card, but code still needs to be written to use it.


I also fixed the annoying habit SocZ80 had of returning a byte of ROM space for undefined I/O. Undefined banks now return 0xFF.
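
The behaviour I mean can be modelled in a few lines. This is a software sketch, not the VHDL: the class, method names and port numbers are all made up for illustration, and the only thing taken from the real change is that a read of anything unmapped returns 0xFF.

```python
UNDEFINED_READ = 0xFF  # value returned for reads that no device claims

class IoBus:
    """Toy model of an I/O read decoder with a fixed default value."""

    def __init__(self):
        self._readers = {}

    def map_port(self, port, reader):
        """Attach a zero-argument callable that supplies the byte for a port."""
        self._readers[port] = reader

    def read(self, port):
        reader = self._readers.get(port)
        if reader is None:
            return UNDEFINED_READ      # nothing mapped here: read as 0xFF
        return reader() & 0xFF         # mapped device: mask to one byte

bus = IoBus()
bus.map_port(0x10, lambda: 0x5A)       # hypothetical device at port 0x10
print(hex(bus.read(0x10)), hex(bus.read(0x99)))
```

Returning a fixed 0xFF makes unmapped reads easy to spot from software, which a stray byte of ROM contents never was.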


Next stop: the 7-segment displays.


VGA may be a bit harder 8) at least until I learn a lot more about clocks and about dual-ported RAM.






I have code from an earlier project that generates VGA at 1024x768. It uses a character generator and an 8x16 font to draw a 128x48 character text mode at this resolution. There are no frills; the CPU is responsible for shuffling characters about in memory to scroll the screen etc.
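
The geometry and the scrolling job left to the CPU can be sketched as follows. The layout is an assumption for illustration (one byte per character, stored row by row), and the function name is made up; the real code may organise the character memory differently.

```python
SCREEN_W, SCREEN_H = 1024, 768   # pixels
FONT_W, FONT_H = 8, 16           # pixels per character cell

COLS = SCREEN_W // FONT_W        # characters per row
ROWS = SCREEN_H // FONT_H        # rows of text

def scroll_up(chars, blank=0x20):
    """Scroll a flat, row-major character buffer up by one row, in place.

    Assumes one byte per character; the freed bottom row is filled with
    the blank character (a space by default).
    """
    chars[:-COLS] = chars[COLS:]            # move rows 1..ROWS-1 up one row
    chars[-COLS:] = bytes([blank]) * COLS   # clear the bottom row

screen = bytearray(COLS * ROWS)
screen[COLS] = ord("A")          # first character of the second row
scroll_up(screen)
print(COLS, ROWS, len(screen), chr(screen[0]))
```

Under these assumptions the whole text screen is 6144 bytes, so a full scroll is a copy of a little under 6KB.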


The display memory uses dual ported block SRAM with one port for the character generator to read out data while drawing the screen, and a second port for the CPU. Dual ported RAM is exactly what it sounds like; one set of memory cells that can be read/written concurrently through two independent ports. Each port is synchronous to an independent clock so this is one way to get information safely from one clock domain into another. If you search for "Xilinx UG383" you'll find a technical reference with plenty of detail about how to use the Spartan-6 on-chip block RAMs.


Should be easy to integrate with socz80 if you're interested; just instantiate the hardware blocks and map the additional SRAM into the CPU's address space (there's plenty of address space reserved above 16MB for this).


Let me know if you'd like a copy of the code.




7 segments now seem to work.


I know what dual-ported RAM is, but it took me a while to find out how to declare RAMB16BWERs for it.


I have, however, got VGA(ish) signals at 32MHz (should be 31.5) and a plain blue screen, so progress. Once I figured out that the syncs were inverted, that I wanted 32MHz not kHz, and that I needed to generate hsyncs during vsync, the monitor finally admitted I had a display.
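
For anyone following along, here is a rough model of the timing. The counter values are an assumption: they are the VESA 640x400 at 85Hz figures as far as I recall them, which is one mode that uses a 31.5MHz pixel clock, and the function names are made up. The point is that hsync depends only on the horizontal counter, so it keeps pulsing on the lines where vsync is active.

```python
# Assumed timing (VESA 640x400 @ 85Hz): active, sync start, sync end, total.
H_ACTIVE, H_SYNC_START, H_SYNC_END, H_TOTAL = 640, 672, 736, 832
V_ACTIVE, V_SYNC_START, V_SYNC_END, V_TOTAL = 400, 401, 404, 445

def refresh_hz(pixel_clock_hz):
    """Frame rate produced by a given pixel clock with the totals above."""
    return pixel_clock_hz / (H_TOTAL * V_TOTAL)

def hsync_active(x):
    """Hsync is a function of the horizontal counter only."""
    return H_SYNC_START <= x < H_SYNC_END

def vsync_active(y):
    """Vsync is a function of the vertical counter only."""
    return V_SYNC_START <= y < V_SYNC_END

def hsync_pulses_during_vsync():
    """Count the lines inside vsync that still carry an hsync pulse."""
    return sum(
        1
        for y in range(V_TOTAL)
        if vsync_active(y) and any(hsync_active(x) for x in range(H_TOTAL))
    )

print(f"{refresh_hz(31_500_000):.2f} Hz at 31.5MHz")
print(f"{refresh_hz(32_000_000):.2f} Hz at 32MHz")
print(hsync_pulses_during_vsync(), "hsync pulses during vsync")
```

With these numbers, running at 32MHz instead of 31.5MHz only raises the refresh rate by about 1.6%, which is probably why the monitor tolerates it.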


Now trying to plumb in some RAM. I've added 8K for testing and experimentation but 640x400 needs 32K for 1 bit mode.
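
The arithmetic behind that figure, as a small sketch. The helper names are made up, and packing the leftmost pixel into the most significant bit of each byte is an assumption rather than anything the hardware dictates.

```python
def framebuffer_bytes(width, height, bits_per_pixel=1):
    """Bytes needed for a packed framebuffer, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8

def pixel_location(x, y, width=640):
    """Return (byte offset, bit mask) of a pixel in a 1-bit packed buffer.

    Assumes rows are stored back to back and the leftmost pixel of each
    byte sits in bit 7.
    """
    index = y * width + x
    return index // 8, 0x80 >> (index % 8)

needed = framebuffer_bytes(640, 400)
lines_in_8k = (8 * 1024) // (640 // 8)   # whole 640-pixel lines that fit in 8K

print(needed, "bytes needed;", lines_in_8k, "lines fit in 8K")
print(pixel_location(639, 399))
```

So 640x400 at 1 bit per pixel needs 32000 bytes, which just fits in 32K (32768 bytes), while the 8K test RAM covers only the top quarter or so of the screen.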


Would be interesting to see how I should have done it though!




This is the relevant code from my earlier 6502 project. The "video.vhd" file ties it all together. It's even older than socz80 so the code is even worse! It should be easy enough to adapt to the bus signals that socz80 uses. It requires a 64MHz clock (vga_pixel_clk), which you can easily get the existing PLL in socz80 to generate on one of the unused outputs.


The 0th entry in the font ROM should (arguably) be an empty space, but for my testing I redefined it to a symbol with clearly indicated corners. The four corner characters of the character memory are initialised to 0, which makes it easy to check that the corners of the video signal appear correctly, i.e. that the video timing is correct.
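
As an illustration of the corner test, this sketch works out which cells of a 128x48 character memory are the corners and builds the initial contents. It assumes a row-major layout with one byte per cell and spaces everywhere else; the names are made up and the real initialisation lives in the VHDL.

```python
COLS, ROWS = 128, 48   # text mode dimensions from the 1024x768, 8x16 font setup

def cell_index(col, row):
    """Offset of a character cell in a flat, row-major character memory."""
    return row * COLS + col

def corner_indices():
    """Offsets of the top-left, top-right, bottom-left and bottom-right cells."""
    return [
        cell_index(0, 0),
        cell_index(COLS - 1, 0),
        cell_index(0, ROWS - 1),
        cell_index(COLS - 1, ROWS - 1),
    ]

def initial_character_memory(fill=0x20, marker=0):
    """Fill the screen with spaces and put character 0 in the four corners."""
    memory = bytearray([fill]) * (COLS * ROWS)
    for index in corner_indices():
        memory[index] = marker
    return memory

memory = initial_character_memory()
print(corner_indices())
```

If any of the four marker symbols is clipped or missing on the monitor, the blanking or sync timing on that edge is the first thing to check.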




