alvieboy

ZPUino next steps: Your help is needed!


Hi guys,

Yes, it's true: the ZPUino project really needs your help. Your help as users, because all we do is for the users' benefit, and sometimes we just don't have the free time to accomplish all we think users need, but also because, since we are not real users, we fail to see what real ones expect from the platform. Business as usual ;)

Anyway,

A lot of improvements for ZPUino are being polished as I write. The one that will have the most impact is an instruction cache. It is now stable and ready for mainstream use.

And you might ask: how come an instruction cache is so important and has that much impact?

Well, it's actually quite simple. ZPUino requires a constant feed of instructions in order to attain its maximum performance. As of now we use internal FPGA block RAMs, and things go smoothly because they are a 1-cycle read. BUT, if you want to use external memory (SRAM, SDRAM, DDR), the latency to fetch a single byte from it is so high that the ZPU ends up idling most of the time. So, in order to properly use external memories, the instruction cache was implemented, and it works with all three memory types stated above. With the instruction cache we can actually use external memory now, and this is a huge improvement. Note that external memory is used for *both* instructions and data, unlike Arduino, which imposes distinct limits on code and data. Just sum your code and data, and see if it fits (btw, the upcoming IDE will do that for you).
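To make the latency argument concrete, here is a rough Python model (not ZPUino code) of a fetch unit running a small loop, once straight from a slow memory and once through a tiny direct-mapped instruction cache. The 1-cycle hit cost matches the block RAM figure above; the 10-cycle external latency, the line size and the cache size are made-up numbers for illustration only, and the real ZPUino cache may be organized differently.

```python
# Toy model of instruction fetch cost; all parameters are illustrative assumptions.
HIT_CYCLES = 1        # block-RAM-like access, as described in the post
MISS_CYCLES = 10      # assumed external-memory latency per access (hypothetical)
LINE_BYTES = 32       # assumed cache line size (hypothetical)
NUM_LINES = 64        # assumed number of lines, i.e. a 2 KB cache (hypothetical)


def fetch_cycles_uncached(addresses):
    """Every fetch pays the full external-memory latency."""
    return len(addresses) * MISS_CYCLES


def fetch_cycles_cached(addresses):
    """Direct-mapped cache: a miss refills one whole line, a hit costs one cycle."""
    tags = [None] * NUM_LINES
    cycles = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        index = line % NUM_LINES
        if tags[index] == line:
            cycles += HIT_CYCLES
        else:
            # Assume the refill reads the line one byte at a time from slow memory.
            cycles += LINE_BYTES * MISS_CYCLES
            tags[index] = line
    return cycles


def loop_trace(start, length, iterations):
    """Byte addresses fetched by a loop of `length` one-byte opcodes."""
    return [start + i for _ in range(iterations) for i in range(length)]


trace = loop_trace(start=0x1000, length=64, iterations=1000)
uncached = fetch_cycles_uncached(trace)
cached = fetch_cycles_cached(trace)
print(f"uncached: {uncached} cycles, cached: {cached} cycles, "
      f"speed-up: {uncached / cached:.1f}x")
```

In this model the loop pays for two line refills on its first pass and then runs entirely from the cache, which is why tight loops benefit the most.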

The second feature, which is aimed at internal-memory devices, is a ZX Spectrum compatible graphical adaptor. It's so small that you can even use a Papilio One 250 to build applications and games.

And this second feature is indeed where we want your help.

We need help with documentation, and help with a new demo we are working on, all to be run on a Papilio One 250 (you can use another board if you feel like it).

So, if you like coding, and want to help with the code itself or with documentation, drop us a note at zpuino@alvie.com. We will provide you with everything you need to contribute. Right now, we have a game almost ready for testing, but we do want *you* to improve it before others can see and enjoy it.

So, if you know "C++" (Arduino-style) and wish to help, please contact us. You will of course be credited as a co-author, and may get other benefits too.

We are also planning a full-color, full-resolution graphical adaptor for Papilio Pro. But let's go step by step :)

Best,

Alvie


Hi Alvie,

Excellent to hear about the unified memory - I assume you are going from Harvard architecture to modified Harvard, rather than to a true von Neumann. One step closer to a general processor! Did you have to add any special instructions to purge the i-cache?

Have you tried porting any devtools to run natively? I can only imagine the joy of compiling your first program for your very own CPU on your own CPU, saying goodbye to cross-compiling (if only for a second). I've been keen on exploring writing a custom back-end to LCC (https://sites.google.com/site/lccretargetablecompiler/) but I can't even get it to compile cleanly on a modern Linux build. It might be easier than getting the whole GCC toolchain to work!

If you want a small, very basic interpreted BASIC to try for a laugh, have a look at http://hamsterworks.co.nz/mediawiki/index.php/Arduino_Basic. It fits in < 8k on an AVR. Now that you have lots of shiny new SDRAM you will even be able to have BASIC programs larger than 1.4k bytes! Who knows, maybe you could even add some custom keywords to speed up your dev/test cycle, avoiding the need to have a new build every time you want to do something a little different.

Does having a '3GL' language running on the ZPUino appeal to you? If so, what sort of language? I'd love to port one - maybe a PASCAL variant?


> Hi Alvie,
>
> Excellent to hear about the unified memory - I assume you are going from Harvard architecture to modified Harvard, rather than to a true von Neumann. One step closer to a general processor! Did you have to add any special instructions to purge the i-cache?

Indeed, you can call it a modified Harvard architecture. :)

I did not add any special instruction to flush the icache, but the cache controller is also mapped to the IO bus. There's a control bit there that performs the flush. Also, in order to start up the firmware, another bit selects between the internal 4KB firmware and RAM for the code path.

Basically we start by executing the firmware, which copies itself into the RAM, and then switches the code mapping and performs a cache flush.

> Have you tried porting any devtools to run natively? I can only imagine the joy of compiling your first program for your very own CPU on your own CPU, saying goodbye to cross-compiling (if only for a second). I've been keen on exploring writing a custom back-end to LCC (https://sites.google...etablecompiler/) but I can't even get it to compile cleanly on a modern Linux build. It might be easier than getting the whole GCC toolchain to work!

Not really, because that requires an operating system. I was looking at MMU-less Linux, but it's harder to port than I first thought.

> If you want a small, very basic interpreted BASIC to try for a laugh, have a look at http://hamsterworks....p/Arduino_Basic. It fits in < 8k on an AVR. Now that you have lots of shiny new SDRAM you will even be able to have BASIC programs larger than 1.4k bytes! Who knows, maybe you could even add some custom keywords to speed up your dev/test cycle, avoiding the need to have a new build every time you want to do something a little different.

I'll take a look at that :P

I'm also planning to have a quick shot at Python. Not sure if it will be feasible.

> Does having a '3GL' language running on the ZPUino appeal to you? If so, what sort of language? I'd love to port one - maybe a PASCAL variant?

Well, all languages that can be converted to C/C++ for compilation should work. Everything else might be a bit complex - ZPU is a stack machine, and writing compilers for it is very tough.

Alvie


> If you want a small, very basic interpreted BASIC to try for a laugh, have a look at http://hamsterworks....p/Arduino_Basic. It fits in < 8k on an AVR. Now that you have lots of shiny new SDRAM you will even be able to have BASIC programs larger than 1.4k bytes! Who knows, maybe you could even add some custom keywords to speed up your dev/test cycle, avoiding the need to have a new build every time you want to do something a little different.

Hi Hamster and Alvie,

Not wanting to derail the ZPUino focus of this thread, a few weeks ago I borrowed Mike's Arduino TinyBasic port to adapt it to the Arduino's TVOut and PS2Keyboard libraries. It was a bit of a squeeze, but the result was here:

I was connecting the RCA composite video cable with crocodile clips and pushing female jumper leads directly into the PS/2 keyboard plug's pins, so it was a bit rough.

Since then I've got the Papilio One and am trying to get started (so far working through the LogicStart tutorial and getting Pacman etc. working). I have Alvie's Jet Set Willy ZPUino implementation running, using the bitfile which came with it on GitHub - this bitfile seems to be ZPUino with a VGAZX video generator for P1-500, which isn't one of the ones offered on the ZPUino site. I had the idea to try to use this bitfile to recreate what I'd done with TinyBasic on Arduino, using the Papilio One with the Arcade MegaWing to connect a PS/2 keyboard and VGA, and the Spectrum's character set over VGAZX.

Unfortunately I had some trouble running my adapted TinyBasic sketch in ZPUino. Examples of things that wouldn't compile included font files defined as arrays of binary numbers, e.g. B00001. This looks like the sort of thing the compiler shouldn't have any trouble with, but it seemed to want to treat them as unknown tokens.

Also, is there an actual library for using VGAZX? The Jet Set Willy code didn't refer to a library but had some internal routines getting stuff done via memcpy. If a friendlier library exists, it would be nice to use.

Lastly - in the Jet Set Willy sketch there is a directory SmallFS containing the files JSW (64k, presumably a memory dump of the Spectrum when JSW is running, including ROM) and ZPUINO. It looks like the Spectrum's character set is grabbed from this 64k file. If I wanted to borrow just the text display bits (PRMESSAGE & putChar), could I cut this 64k file down to just the bit with the character set in it?

Thanks,

RorschachUK


> Hi Hamster and Alvie,
>
> Unfortunately I had some trouble running my adapted TinyBasic sketch in ZPUino. Examples of things that wouldn't compile included font files defined as arrays of binary numbers, e.g. B00001. This looks like the sort of thing the compiler shouldn't have any trouble with, but it seemed to want to treat them as unknown tokens.
>
> Also, is there an actual library for using VGAZX? The Jet Set Willy code didn't refer to a library but had some internal routines getting stuff done via memcpy. If a friendlier library exists, it would be nice to use.
>
> Lastly - in the Jet Set Willy sketch there is a directory SmallFS containing the files JSW (64k, presumably a memory dump of the Spectrum when JSW is running, including ROM) and ZPUINO. It looks like the Spectrum's character set is grabbed from this 64k file. If I wanted to borrow just the text display bits (PRMESSAGE & putChar), could I cut this 64k file down to just the bit with the character set in it?

Hi,

Well, actually the B01010101 notation is not understood by all compilers (the ZPU one does not recognize it). The alternative is to add a #define table with all 256 values.
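Nobody needs to type that table by hand. Stock Arduino cores provide these constants the same way, as a header of plain #define lines (binary.h), so one option is to generate an equivalent file. The Python sketch below is only an assumption about how one might do that and is not part of the ZPUino distribution: the function name, the include guard and the output file name are made up. It emits every zero-padded spelling from 1 to 8 digits (B0, B00001, B01010101, ...), so short forms like the B00001 used in the font files are covered too.

```python
def binary_defines(max_bits=8):
    """Return C preprocessor lines defining B0 ... B11111111."""
    lines = ["#ifndef BINARY_CONSTANTS_H", "#define BINARY_CONSTANTS_H", ""]
    for width in range(1, max_bits + 1):
        for value in range(2 ** width):
            # One name per bit string of this width, e.g. width 5 -> B00001
            name = "B" + format(value, f"0{width}b")
            lines.append(f"#define {name} {value}")
    lines += ["", "#endif"]
    return lines


header = binary_defines()
# Writing the file is left to the caller, for example:
# open("binary_constants.h", "w").write("\n".join(header) + "\n")
print(len(header), "lines generated")
```

The generated header would then be included at the top of the sketch, before the font tables that use the constants.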

Regarding the JSW, yes, I started a small library but unfortunately I have not had much time to complete it. The ROM you mention is indeed a dump of a running JSW, so we can use its resources. You can strip the "normal" character set from there and create a new ROM with only the chars.
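One possible way to do that stripping offline, sketched in Python. It assumes the JSW file is a flat 64 KB image of the Spectrum address space with the ROM at offset 0, which this thread suggests but does not confirm, and that the text routines use the standard ROM font: 96 characters of 8 bytes each, stored at 0x3D00 to 0x3FFF in the original 48K ROM. The function names and file names are placeholders.

```python
CHARSET_START = 0x3D00      # standard 48K ROM font location (assumed to apply here)
CHARSET_GLYPHS = 96         # character codes 32..127
GLYPH_BYTES = 8             # 8x8 pixels, one byte per row


def extract_charset(dump: bytes) -> bytes:
    """Slice the 768-byte font out of a 64 KB memory image."""
    if len(dump) != 0x10000:
        raise ValueError("expected a 65536-byte memory image")
    end = CHARSET_START + CHARSET_GLYPHS * GLYPH_BYTES
    return dump[CHARSET_START:end]


def glyph_rows(charset: bytes, ch: str) -> bytes:
    """Return the 8 row bytes for one printable character."""
    index = ord(ch) - 32
    if not 0 <= index < CHARSET_GLYPHS:
        raise ValueError("character outside the 32..127 range")
    return charset[index * GLYPH_BYTES:(index + 1) * GLYPH_BYTES]


# Typical use (placeholder file names):
# with open("JSW", "rb") as f:
#     font = extract_charset(f.read())
# with open("CHARS", "wb") as f:
#     f.write(font)
```

Any code that indexes the font by absolute Spectrum address would then need its base adjusted, since the glyph for character code c starts at offset (c - 32) * 8 in the new file.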

Alvie


Hi Alvie,

First, thank you for your most valuable work in bringing the ZPUino Extreme softcore to the average guy :)

I have been fascinated by FPGA implementations of CPUs since Chuck Moore brought out the Novix NC4000 in the late 1980s. However, nearly 25 years later, I am only just beginning to catch up with VHDL; they didn't teach that when I was at college.

A couple of questions:

1. Does the compiler allow inline assembly, so I can directly access the instruction set?

2. Have you any plans to extend the VGA capabilities to a greater resolution, say 640x480? What resolution is the Spectrum implementation? 256 x 192, I'm guessing?

The reason for my interest is that I want to follow some of Chuck Moore's work, running applications in Forth on a super-fast softcore. Inspired by the OKAD suite of VLSI chip design tools he wrote in the early 90s, I hope to experiment with a "pcb design workstation" running on a ZPUino.

If anyone is interested in the work of Chuck Moore (and the late Jeff Fox), the history of their developments is captured on the Ultratechnology site. This is fascinating stuff from 20 years ago:

http://www.ultratechnology.com/dindex.htm

Many thanks

Ken

London


Hi Ken,

Thanks for your post.

> 1. Does the compiler allow inline assembly, so I can directly access the instruction set?

Yes, it does, but it has some limitations because the ZPU does not have registers.

> 2. Have you any plans to extend the VGA capabilities to a greater resolution, say 640x480? What resolution is the Spectrum implementation? 256 x 192, I'm guessing?

Already did. We now have a generic VGA controller that can output up to 2048x2048, assuming you have enough memory and enough memory bandwidth.

Jack has some examples on how to use the new controller, I think.

Now, for Forth... this can be interesting because the ZPU is a stack processor, so it may be possible to take full advantage of Forth. Note however that, due to design/speed limitations, the ZPU Extreme core has a fixed stack, separate from the main memory. I have another core (ZCoreV2) which can share stack and memory, but it's bigger and does not run at >80MHz.

Alvie


Alvie

Thank you for your fast response.

A generic VGA controller up to 2048x2048 would be fantastic, especially for my pcb CAD ideas :) I will wait for more information concerning examples from Jack.

Regarding the ZPUino as a stack machine: a fixed stack should not be a limitation. Chuck Moore's F21 Forth MCU had fixed stacks only a few levels deep:

http://www.ultratechnology.com/f21data.pdf

Return stack: 17 levels
Data stack: 18 levels

Thanks for sharing

Ken


We are using a 4096-byte (4 KB) stack if I recall correctly, so that's enough for 1024 32-bit words.

It may be interesting to support Forth out of the box. It may be simple to port Gforth to work with ZPU (actually I do think someone already did so).

On the VGA matter: what board/boards do you have? Or are you still planning to buy one?


Alvie,

I have a Papilio Duo with a 2MB SRAM and a LogicStart shield for the VGA.

A video resolution of 1024 x 768 in RGB332 would be more than adequate for the type of applications I have in mind.

I take your point about the video clock from your other post, and how the SRAM can be the video bottleneck.
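As a sanity check on those numbers, here are a few lines of Python. The 60 Hz refresh rate is an assumption (the thread does not state one), and the bandwidth figure only covers scan-out reads averaged over a frame; it ignores blanking intervals and any CPU accesses competing for the same SRAM.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed for one frame, in bytes."""
    return width * height * bits_per_pixel // 8


def scanout_bytes_per_second(width, height, bits_per_pixel, refresh_hz):
    """Average read bandwidth needed just to refresh the display."""
    return framebuffer_bytes(width, height, bits_per_pixel) * refresh_hz


SRAM_BYTES = 2 * 1024 * 1024                 # the 2MB SRAM mentioned above
frame = framebuffer_bytes(1024, 768, 8)      # RGB332 is 8 bits per pixel
rate = scanout_bytes_per_second(1024, 768, 8, refresh_hz=60)  # 60 Hz assumed

print(f"frame: {frame} bytes ({frame / SRAM_BYTES:.1%} of SRAM)")
print(f"scan-out: {rate / 1e6:.1f} MB/s at 60 Hz")
print(f"left for code and data: {SRAM_BYTES - frame} bytes")
```

The same function shows that 2048 x 2048 at 8 bits per pixel needs 4 MiB for a single frame, which would not fit in a 2MB SRAM.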

Ken


Monsonite, it's good to see that there's still interest in Forth. I did some programming in Forth in the 80s (I'm co-inventor of the virtual stack method of compiling Forth to register machines). Forth on a single stack machine can be more frustrating than on register machines. I think there are a couple of FPGA Forth implementations out there, although they may be 16-bit versions. There are a couple of FPGA projects that interest me, but at the moment I have way too much on my plate to seriously consider them. But one of those would be a 32-bit Forth machine with two instruction formats, one strictly Forth operations and the other containing a 16-bit Forth operation in parallel with a 16-bit coprocessor operation. The thought is to create a model that would allow people to create coprocessors closely coupled to the CPU. In the late 80s I met a couple of guys from NASA who were running Forth on NASA's Massively Parallel Processor. A couple of years later I became aware of FPGAs and it struck me that the day would come when people could create their own MPPs. I think we're there. Anyway, if you're interested in creating a Forth machine or compiling Forth to a register machine like MicroBlaze, I would certainly be willing to help within the constraints of my current situation.


Pharseid1,

Thanks for your reply.

It would be good to have someone to correspond with about Forth ideas and FPGA implementations.

Chuck Moore has been one of my computing heroes for almost 30 years, since I first came across Forth.

I am trying to follow the work that he and Jeff Fox did in the 1990s, to see if any of it is more accessible now that we have faster FPGAs, without having to resort to custom VLSI as they did.

Chuck Moore is now involved with Green Arrays, who put a 144-core Forth processor on a single die, but getting all the processors to work in a co-ordinated manner seems to be the biggest problem.

http://www.greenarraychips.com/

Jeff Fox wrote an article about Aha, where he managed to minimise the compilation effort, allowing compilation speeds of millions of Forth words per second. This, he claimed, would allow recompilation of large applications between keystrokes, giving real-time updates of edited code. He had some pretty radical ideas, and I have yet to dig deeper to see if any of it is achievable and useful. He was hinting at multiple big Forth applications on a desktop (or tablet), invoked by tapping an icon and compiled just in time, without needing the "fat" of an operating system.

Meanwhile I am stretching my brain writing a simple CAD application, almost in the spirit of Chuck Moore's OKAD. The Papilio Duo and ZPUino certainly make this sort of stuff a whole lot easier.

Keep in touch

Ken

London


The MPP was a big SIMD machine. Because everything works on the same clock, coordination is pretty simple, but on the other hand, you're pretty much only looking at problems that have large amounts of data parallelism. That works for some things; for instance, GPUs work on problems with huge amounts of data parallelism. So it's not ideal for all problems, but an interesting asset to have, especially in the case of FPGAs, where you can design a custom machine for each problem.

The virtual stack method is interesting because it allows efficient compilation of Forth onto register machines. You keep a list of available registers and a virtual stack at compile time. So if you compiled something like "dataValue @ 4 +", you would first push dataValue onto the tagged virtual stack, tagged as a literal value (which I would hope is a variable address). For the "@" you would compile a machine language instruction that reads from that address into a register taken from the available register list, removing that register from the list. Then you would compile an instruction to add the immediate 4 to the register, leaving the register, with a register tag, on the virtual stack. Operations like "swap" happen at compile time if both operands are on the virtual stack, requiring no runtime activity. Compared to naïve compilation on the CPUs of that day (late 80s), this technique was typically 3 times faster for binary operations.
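The description above can be turned into a toy compiler fairly directly. The Python sketch below only illustrates the idea as described in this post and is not the original 1980s implementation: the pseudo-assembly mnemonics (load, add, mov), the four-register pool and the class name are all invented, and spilling, dup and control flow are left out.

```python
class VirtualStackCompiler:
    """Toy Forth-to-register-machine compiler using a compile-time virtual stack."""

    def __init__(self, registers=("r0", "r1", "r2", "r3")):
        self.free = list(registers)   # available register list
        self.vstack = []              # entries are ("lit", value) or ("reg", name)
        self.code = []                # emitted pseudo-assembly

    def _alloc(self):
        if not self.free:
            raise RuntimeError("out of registers (spilling is not modelled)")
        return self.free.pop(0)

    def _to_reg(self, item):
        """Make sure an operand lives in a register, emitting a move if needed."""
        kind, value = item
        if kind == "reg":
            return value
        reg = self._alloc()
        self.code.append(f"mov {reg}, #{value}")
        return reg

    def compile(self, source):
        for word in source.split():
            if word == "@":
                kind, value = self.vstack.pop()
                if kind == "lit":                 # known address: load directly
                    reg = self._alloc()
                    self.code.append(f"load {reg}, [{value}]")
                else:                             # address is already in a register
                    reg = value
                    self.code.append(f"load {reg}, [{reg}]")
                self.vstack.append(("reg", reg))
            elif word == "+":
                b = self.vstack.pop()
                a = self.vstack.pop()
                if a[0] == "lit" and b[0] == "reg":
                    a, b = b, a                   # addition commutes
                dest = self._to_reg(a)
                if b[0] == "lit":
                    self.code.append(f"add {dest}, #{b[1]}")
                else:
                    self.code.append(f"add {dest}, {b[1]}")
                    self.free.append(b[1])        # operand register is free again
                self.vstack.append(("reg", dest))
            elif word == "swap":
                # Pure compile-time bookkeeping: no instruction is emitted.
                self.vstack[-1], self.vstack[-2] = self.vstack[-2], self.vstack[-1]
            else:
                # Numbers and names are pushed as tagged literals.
                self.vstack.append(("lit", word))
        return self.code


compiler = VirtualStackCompiler()
for line in compiler.compile("dataValue @ 4 +"):
    print(line)
```

For the example phrase this emits two instructions, a load of dataValue into r0 and an add of the immediate 4, and leaves r0 tagged as a register on the virtual stack. A "swap" between two entries that are both on the virtual stack emits nothing at all.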

This method is attractive for an incremental compiler (traditional Forth) or a JIT compiler because it is simple and fast. So on an FPGA you have the choice of using an existing register machine and doing a little extra work to build an efficient compiler or designing a Forth machine that doesn't need this optimization.
