jamesbowman

J1 Forth CPU on Papilio Duo

20 posts in this topic

I have the J1 CPU running on the Papilio Duo.

It runs a standard 32-bit ANS Forth, communication is through the UART.

 

It is working quite well; in fact I used it to run my slides for a presentation last week (slides were on microSD, buttons and VGA output from the Computing Shield).

 

Is anyone interested in giving it a tryout? Let me know if so and I will put together a release.

 

Thanks!

J.

 

 


Hi James,

 

thanks for all the work you've put into this.

Speaking for myself, I'm managing quite well with the existing material. Anyway, if your design is available I'll have a look sooner or later.

 

What I did this morning is to put in an inferred memory for the RAMB16, add a clk and a constant to io_din and simulate in iVerilog.

With six hand-coded assembly instructions, it fetches IO from 0xdead and stores to 0xbeef. I consider that a success, and now I have to learn FORTH...
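For anyone wanting to repeat that kind of smoke test, here is a minimal sketch of an iverilog testbench with a free-running clock and a constant on io_din. To keep it runnable on its own, it drives a hypothetical stand-in module (cpu_stub, not the J1 itself) that alternates an IO read from 0xdead with an IO write to 0xbeef; the port names and the 0x1234 constant are assumptions made for illustration.

```verilog
`timescale 1ns/1ps

// Hypothetical stand-in for the CPU (not the J1 itself): alternates an IO
// read from 0xdead with an IO write of the value it read to 0xbeef.
module cpu_stub (
  input  wire        clk,
  input  wire [15:0] io_din,
  output reg         io_rd,
  output reg         io_wr,
  output reg  [15:0] io_addr,
  output reg  [15:0] io_dout);

  reg phase;
  initial begin
    phase = 1'b0; io_rd = 1'b0; io_wr = 1'b0;
    io_addr = 16'd0; io_dout = 16'd0;
  end

  always @(posedge clk) begin
    phase <= ~phase;
    if (!phase) begin   // read phase
      io_rd <= 1'b1; io_wr <= 1'b0; io_addr <= 16'hdead;
    end else begin      // write phase: store the value seen on io_din
      io_rd <= 1'b0; io_wr <= 1'b1; io_addr <= 16'hbeef; io_dout <= io_din;
    end
  end
endmodule

module tb;
  reg clk = 1'b0;
  always #5 clk = ~clk;             // free-running simulation clock

  wire [15:0] io_din = 16'h1234;    // constant driven onto the IO input bus
  wire        io_rd, io_wr;
  wire [15:0] io_addr, io_dout;

  cpu_stub dut(.clk(clk), .io_din(io_din), .io_rd(io_rd), .io_wr(io_wr),
               .io_addr(io_addr), .io_dout(io_dout));

  // Log every IO access so the read/write pair shows up on the console.
  always @(posedge clk) begin
    if (io_rd) $display("%0t: IO read  addr=%h", $time, io_addr);
    if (io_wr) $display("%0t: IO write addr=%h data=%h", $time, io_addr, io_dout);
  end

  initial begin
    $dumpfile("tb.vcd");   // waveform for a viewer such as GTKWave
    $dumpvars(0, tb);
    #200 $finish;
  end
endmodule
```

Saved as tb.v, this should build with "iverilog -o tb tb.v" and run with "vvp tb". Swapping the stub for the real core means replacing the cpu_stub instance and matching its actual port list.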

 

You write in the paper that memory read is the critical path. Do you know what clock speeds can be reached on the Spartan XC6?

It could be possible to put pipeline registers into the memory and ALU paths, then run several (e.g. 4) completely independent CPUs with independent stacks on successive clock cycles. This would also leave some cycles to respond to IO reads.


Hi @offroad,

 

Glad to hear it is alive... 

 

The fastest I can get the 32-bit J1b running on the DUO's XC6 is about 180 MHz.

 

And yes, four cores should fit -- just need to think of some suitable application!

 

J.


OK, I got it also synthesized and it looks OK on the logic analyzer.

The synthesis tool thinks 83 MHz is the limit (the regular 16-bit J1) and a bit of pipelining should bring that up to the 104 MHz that I'm using elsewhere.

180M sounds pretty darn fast.

 

BTW, I found this quote on reddit

>> Forth is one of the few environments which is totally comprehensible by one person.

and from what I've seen so far, this may actually be true.

 

What I meant by my pipelined four-core version is to share most of the logic, but use a 4-step ring shift register in place of every regular register (so that a newly written value becomes active not in the next cycle but only four cycles later).

The stacks would go into another BRAM and are multiplexed via two address lines. It shouldn't be much bigger than a single core.
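The ring-register idea can be sketched roughly as follows. This is only an illustration under my own assumptions, not J1 code: the module name ring_reg, its parameters and the shared incrementer in the testbench are all hypothetical. Each of the four contexts gets the shared logic on every fourth clock edge, and a value written now reappears at the output four edges later.

```verilog
`timescale 1ns/1ps

// N-deep ring register: a value written on one clock edge becomes
// visible at q again N edges later, so N contexts take turns.
module ring_reg #(parameter W = 16, parameter N = 4) (
  input  wire         clk,
  input  wire [W-1:0] d,   // next value for the context that is active now
  output wire [W-1:0] q);  // current value of the context that is active now

  reg [W-1:0] stage [0:N-1];
  integer i, k;

  // Distinct start values (0, 256, 512, 768) make the contexts easy to tell apart.
  initial for (i = 0; i < N; i = i + 1) stage[i] = i * 256;

  assign q = stage[N-1];

  always @(posedge clk) begin
    stage[0] <= d;
    for (k = 1; k < N; k = k + 1)
      stage[k] <= stage[k-1];
  end
endmodule

module tb;
  reg clk = 1'b0;
  always #5 clk = ~clk;

  wire [15:0] q;
  wire [15:0] d = q + 16'd1;   // one shared incrementer serves all four contexts

  ring_reg #(.W(16), .N(4)) counter(.clk(clk), .d(d), .q(q));

  always @(posedge clk) $display("t=%0t active context holds %0d", $time, q);

  initial begin
    repeat (8) @(posedge clk);  // two full turns around the ring
    #1;
    // The context that started at 768 has been incremented exactly twice.
    if (q === 16'd770) $display("PASS"); else $display("FAIL: q=%0d", q);
    $finish;
  end
endmodule
```

In a real design the program counter and every other state register would be replaced the same way, with pipeline registers in the shared logic taking the place of some ring stages.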

Anyway, just thinking aloud, might give it a shot on some rainy weekend.


>> I have the J1 CPU running on the Papilio Duo.
>> It runs a standard 32-bit ANS Forth, communication is through the UART.
>> It is working quite well; in fact I used it to run my slides for a presentation last week (slides were on microSD, buttons and VGA output from the Computing Shield).
>> Is anyone interested in giving it a tryout? Let me know if so and I will put together a release.
>> Thanks!
>> J.

I am interested, although I have no idea of Forth programming, but there are examples on http://www.excamera.com/sphinx/fpga-j1.html

:)

 

In what form were your slides?

 

 

Filip


Hi James,

I would like to give the J1 a spin. Thanks for your work on this and Gameduino.

Bob


Hi James,

 

Thanks for all the hard work you have put into this.

 

Would there be any possibility of a tutorial on video, showing the process for the typical DUO / computing shield user?

 

I spent some time over the weekend getting to know the J1b core in more detail, understanding the instruction set with a home-cooked simulator and an assembler.

 

My Forth skills are not quite up to snuff yet to allow me to make instant headway, but my interest is in the journey (into working with FPGA soft cores) and not necessarily the final destination.

 

I'd like to explore some of the ideas that Jeff Fox came up with back in the '90s while working with Chuck Moore - and the idea of a reasonably quick, self-contained Forth workstation (DUO + Computing Shield) running open source tools is of interest.

 

I have a friend with a scanning electron microscope (circa 1985), and the idea is to build an image capture, processing and display system. This will need 12-bit DACs to generate the scans, and a video capture front end, probably based on an 8-bit flash ADC. I need to get my Verilog skills up a bit before I tackle this project for real. The J1b seems ideal and quick enough, and some SEM imaging applications written in Forth might just be the motivation I need.

 

With 4 instantiations of the J1b core in a Spartan 6, it would be great for specialist image processing, such as optical flow / motion sensing / control for quadcopters etc.

 

Can you get the 800x600 60Hz VGA on a 2MB Duo?

 

 

Regards

 

 

Ken


James, Jack & All

 

Would anyone be interested in forming a sub-forum to discuss the J1 specifically on Papilio boards?

 

Hey, it could turn into the homebrew computer club  :)

 

 

Ken


I'd be happy to make a forum here if that is the best place for it. Just let me know.

 

Jack


Jack,

 

Would you be able to put a J1 topic under the soft processors section of the forum? That would probably be the best way forward.

 

Thanks in advance

 

 

Ken


Might be a good idea.

I see some real potential as a PicoBlaze (/PacoBlaze) upgrade - why mess with 8-bit assembly when you can mess with 16-bit Forth?

And a lot of potential for creative hacking because the code is so short.

 

For those who haven't, it may be the best excuse ever to learn Verilog :D


Offroad,

 

Agreed,

 

An excellent opportunity/excuse to learn Verilog and tinker with real hardware.  J1 is open source, and James Bowman is a real proponent of open source tool chains (wherever possible).

 

If you have not already come across Victor Yurkovsky's blog - this might be worth a look.

 

http://www.fpgarelated.com/showarticle/797.php

 

I'm as excited about the J1 project as Woz was when, back at Wescon '75, he bought his first 6502 for $25 :)

 

Real Forth chips for those that have been waiting 30 years.

 

 

Ken


Hi,

 

>> - this might be worth a look.

 

it's interesting, but IMO this stuff is for the real "pro" coders who save their company money by squeezing a design into the smallest possible FPGA.

The downside is that my design gets locked in to one specific vendor (which doesn't matter for a commercial system; they are far beyond the point where one can simply switch from Xilinx to Altera to Lattice by editing a top-level file).

 

For hobbyists and hackers, there is (IMHO) a more promising strategy: Write portable code. The magic word is "inference". For example, instead of instantiating an 18x18 multiplier macro, I simply write "a * b" and let the synthesis tool figure out how to do it. The hardware multiplier is vendor dependent, the "inferred" operation is portable.

Often (but not always) this works surprisingly well.
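As a small illustration of what I mean by inference, here is a sketch of a registered 18x18 multiplier written without any vendor macro. The module name mul18 and the little testbench are my own, made up for this example; whether a given tool maps the expression to a hardware multiplier or builds it from LUTs depends on the tool and the target.

```verilog
`timescale 1ns/1ps

// Portable 18x18 signed multiplier: no vendor primitive, only the * operator.
module mul18 (
  input  wire               clk,
  input  wire signed [17:0] a,
  input  wire signed [17:0] b,
  output reg  signed [35:0] p);

  always @(posedge clk)
    p <= a * b;   // synthesis may map this to a hardware multiplier or to LUTs
endmodule

module tb;
  reg clk = 1'b0;
  always #5 clk = ~clk;

  reg  signed [17:0] a, b;
  wire signed [35:0] p;

  mul18 dut(.clk(clk), .a(a), .b(b), .p(p));

  initial begin
    a = -3; b = 5;
    @(posedge clk); #1;          // registered result appears after one edge
    if (p === -36'sd15) $display("PASS"); else $display("FAIL: p=%0d", p);
    $finish;
  end
endmodule
```

The same source simulates in iverilog and should be acceptable to any vendor's synthesis tool, which is the whole point of writing it this way.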

 

Here is one example from the J1 code: [1] The original implementation uses Xilinx-proprietary memory macros (and there is a good reason to do so: they allow the memory contents in the bitstream to be changed without recompiling the whole project. My inferred memory can't do that, due to limitations of ISE).

Now I had to "port" the design from Xilinx to another platform, which is the open-source iverilog simulator (which knows nothing about Xilinx-proprietary BRAMs).

So I replaced it with an inferred memory [2] that I wrote myself - code that describes the functionality of the memory, without referring to specific features of the FPGA.

As intended, this works in iverilog. And the Xilinx compiler is smart enough [*] to figure out that I'm trying to implement memory, so the resulting FPGA hardware is more or less identical. It's not completely trivial - the memory has an output register with an initial value that effectively contains my reset vector as it's the first word passed to the CPU. But it works.

 

As Donald Knuth put it, "premature optimization is the root of all evil". If it saves me money - no more questions, go for it. But otherwise, I'll pick my battles carefully - this stuff is difficult enough even when taking all the shortcuts.

 

 

------------- [1] -------------------

  generate
    for (i = 0; i < (1 << `RAMS); i=i+1) begin : ram
      // RAMB16_S18_S18
      RAMB16_S2_S2
      ram(
        .DIA(0),
        // .DIPA(0),
        .DOA(insn[`w*i+`w1:`w*i]),
        .WEA(0),
        .ENA(1),
        .CLKA(sys_clk_i),
        .ADDRA({_pc}),

        .DIB(st1[`w*i+`w1:`w*i]),
        // .DIPB(2'b0),
        .WEB(_ramWE & (_st0[15:14] == 0)),
        .ENB(|_st0[15:14] == 0),
        .CLKB(sys_clk_i),
        .ADDRB(_st0[15:1]),
        .DOB(ramrd[`w*i+`w1:`w*i]));
    end
  endgenerate

 

------------------ [2] inferred memory -----------------------------

   reg [15:0] mem[0:16383];

   reg [15:0] memAout = {3'd0, resetVector[13:1]}; // UBRANCH resetVector   
   reg [15:0] memBout = 16'd0;   
   assign insn = memAout;
   assign ramrd = memBout;

   initial $readmemh("firmware/j1.mem", mem);

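   always @(posedge sys_clk_i) begin
      // port A: instruction fetch
      memAout <= mem[_pc];

      // port B: data access, enabled only when the top two address bits are zero (.ENB)
      if (|_st0[15:14] == 0) begin
         if (_ramWE)
            mem[_st0[15:1]] <= st1;        // write
         else
            memBout <= mem[_st0[15:1]];    // read
      end
   end // always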

 

[*] there are code templates in some ISE menu for design patterns that will be "inferred" correctly.


>> that have been waiting 30 years.

 

PS an exercise for the hardcore nerds is to extend the J1 compiler (and the CPU but that's trivial) to 18 bits.

Some people think it is a bad idea :) but hey the two extra bits are for free.


Ok, excellent everyone, I just made a new J1 Forth forum under Soft Processors. Don't feel that it has to be limited to the Papilio, if it adds value to the community then I'm all for it.

http://forum.gadgetfactory.net/index.php?/forum/121-j1-forth/

 

I made James Bowman and Monsonite the moderators of the new forum too.

 

Jack.


Hey Jack,

 

It's an honor and a privilege - but as they said in "Wayne's World"    - "We're not Worthy" - or more specifically "I'm not worthy!"

 

I'm sure the J1 forum will get pretty rowdy sometimes, arguing about what to do with those two free extra bits in the instruction set -  what to do with bit 4, and how to use just 4.5 slices to implement the whole ALU in LUTs

 

I hope that James will show us the way, the light and the truth about Forth, and that we will all become stack manipulating polymaths, like Chuck Moore.

 

But like I have said in the past, it's about the journey, not the destination. And Chuck said "The map is not the territory".

 

Hopefully the J1 will open Forth up to a whole new generation. I started with ZX ROM Forth in 1984, and bought a Jupiter Ace in the same year. In '88 I bought a Novix NC4016 dev board. 30 years on, my brain is still knotted by the intricacies of Forth. It's an alternative to Sudoku puzzles to stop you going senile.

 

Welcome aboard fellow Forth-wrights  - May the Forth be with you :)

 

 

Adios Amigos

 

 

Ken

 

London


Speaking of small CPUs in Verilog, I've long been wanting to put in a plug for this one:  Arlet Ottens's 6502 model (http://ladybug.xs4all.nl/arlet/fpga/6502/).  A historic architecture; more versatile and portable than PicoBlaze; and fairly compact.  On the Papilio Pro, it uses about 9% of the LUTs and gets a clock rate of up to ~90MHz.


Jaxartes,

 

Yes, it is really cool that there now exist reverse-engineered designs of long-gone microprocessors.

 

In their day, these were a miracle of digital design and state-of-the-art photolithography processes. Virtually every kid my age grew up with a home computer that was based on either the 6502 or the Z80 - and some went on to learn machine code and become game developers and engineers.

 

The fact that you can take a 40 year old CPU design and run it at nearly 100X its original clock rate just shows how the technology has come on. Couple that with all the re-engineered support chips like the SID synth and VGA graphics, and you can quickly have a fast machine with a retro feel.

 

I have seen that at least one person has reverse engineered the BBC Micro - I believe there is quite a following for re-engineering these Acorn designs, including its close relatives the Acorn Electron, Atom and Master 128.

 

http://www.mike-stirling.com/retro-fpga/bbc-micro-on-an-fpga/

 

It was out of the Acorn Computer design team that the ARM processor was developed in the mid-1980s, and it does owe some heritage to the 6502 in terms of memory access and lack of microcode.

