stm

Using on-board RAM in designs

47 posts in this topic

I've still made no progress with the DMA peripheral implementation, so I need to ask for help again.

 

As I'm not able to get the peripheral to work with the wb_master_np_to_slave_p.vhd wrapper around my "classic standard single read/write cycles" implementation, I'm now trying to implement the "pipelined read/write cycle" directly.

 

For that I have a question about the significance of the Wishbone STALL_I signal of the DMA master. Alvie wrote earlier that I need to take that signal into account. But if I understand the Wishbone specification correctly, this is only relevant if the master tries to make multiple read/write cycles in a row, and if the slave wants to signal that it can't take further requests. My current implementation will only do a single read or write. So it will start a read or write cycle and then wait for the high ACK_I signal from the slave.

 

Am I correct that in this scenario the STALL_I signal can be ignored, and that the master can simply wait for the ACK_I signal from the slave?
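For reference, here is a minimal sketch of a master that performs exactly one pipelined access. It assumes the Wishbone B4 pipelined rule that a request only counts as accepted in a clock cycle where STB is high and STALL is low, so even a single access has to keep STB asserted while STALL is high. The entity name, the start/done handshake and the 32-bit widths are made up for illustration and are not taken from the ZPUino sources; SEL and CTI are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-access pipelined Wishbone master (illustrative names).
entity wb_single_master is
  port (
    clk_i     : in  std_logic;
    rst_i     : in  std_logic;
    -- simple user-side handshake (assumption, not part of Wishbone)
    start_i   : in  std_logic;                      -- pulse to start one access
    we_i      : in  std_logic;                      -- '1' = write, '0' = read
    adr_i     : in  std_logic_vector(31 downto 0);
    wdat_i    : in  std_logic_vector(31 downto 0);
    rdat_o    : out std_logic_vector(31 downto 0);  -- valid when done_o = '1'
    done_o    : out std_logic;                      -- one-cycle pulse
    -- Wishbone master signals
    m_cyc_o   : out std_logic;
    m_stb_o   : out std_logic;
    m_we_o    : out std_logic;
    m_adr_o   : out std_logic_vector(31 downto 0);
    m_dat_o   : out std_logic_vector(31 downto 0);
    m_dat_i   : in  std_logic_vector(31 downto 0);
    m_ack_i   : in  std_logic;
    m_stall_i : in  std_logic
  );
end entity wb_single_master;

architecture rtl of wb_single_master is
  type state_t is (IDLE, REQUEST, WAIT_ACK);
  signal state : state_t := IDLE;
begin
  process (clk_i)
  begin
    if rising_edge(clk_i) then
      done_o <= '0';  -- default, overridden below when the access completes
      if rst_i = '1' then
        state   <= IDLE;
        m_cyc_o <= '0';
        m_stb_o <= '0';
      else
        case state is
          when IDLE =>
            if start_i = '1' then
              m_cyc_o <= '1';
              m_stb_o <= '1';
              m_we_o  <= we_i;
              m_adr_o <= adr_i;
              m_dat_o <= wdat_i;
              state   <= REQUEST;
            end if;

          when REQUEST =>
            -- STB is high in this state. While STALL is high the slave has
            -- not taken the request, so STB simply stays asserted.
            if m_stall_i = '0' then
              m_stb_o <= '0';            -- request accepted, issue no more
              if m_ack_i = '1' then      -- slave answered in the same cycle
                rdat_o  <= m_dat_i;
                m_cyc_o <= '0';
                done_o  <= '1';
                state   <= IDLE;
              else
                state   <= WAIT_ACK;
              end if;
            end if;

          when WAIT_ACK =>
            -- CYC stays high until the slave acknowledges
            if m_ack_i = '1' then
              rdat_o  <= m_dat_i;
              m_cyc_o <= '0';
              done_o  <= '1';
              state   <= IDLE;
            end if;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

If STB were dropped after one clock regardless of STALL, a busy slave would never see the request and ACK would never arrive, which would look like a hung cycle.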


It seems like I'm trying to go where no man has gone before in Papilio land...

 

Jack, in the thread "Hardware Verification for AVR8 Soft Core" you mention a tool to generate a ROM image for simulating an entire AVR8 soft processor in Xilinx ISE. Would that also be usable for simulating the whole ZPUino in ISE? It looks like this is the only option for me to find the problem with my DMA peripheral.

 

How would one set up the actual RAM in such a simulation?


Hey stm,

 

I think simulating the entire zpuino processor in ISIM would be a very tricky and time-consuming route. Alvie is the only one I know of who has simulated zpuino, and he doesn't use ISIM; he has modified the ghdl source code to do so. He also says it takes a long, long time to run a simulation before you can view the waveforms. That's about all I know on the topic. As for making it work in ISIM, I haven't even attempted it because it would be a huge undertaking...

 

I think what we really need here is to get a working DMA example up and running. I'll see if I can work with Alvie to make this happen soon.

 

Jack.


Hello Jack,

it would be very helpful if you could put up a working DMA example! The implementation of such a thing apparently is a little bit hard for a VHDL newbie like me...

 

Please let me know if I can help in any way, maybe with testing.

 

I have the prototype of my DMA peripheral on GitHub, maybe I'm doing something obviously stupid:

 

The classic implementation with the wb_master_np_to_slave_p.vhd wrapper is on the master branch:

 

https://github.com/smuehlst/c1p610/tree/master/memext

 

My last attempt to implement the pipeline cycle is on the "feature_pipeline" branch:

 

https://github.com/smuehlst/c1p610/tree/feature_pipeline/memext

 

Thanks

Stephan


I was thinking about the best way to do this all morning. I think what Alvie and I will work on, when he gets back from a business trip, is a new wishbone peripheral that exposes address lines and data lines from the schematic symbol. The new symbol will connect to the Wishbone bus and then provide you with address and data lines that you can then use to communicate to SDRAM/SRAM memory. So you don't need to mess with any Wishbone or DMA VHDL code. 

 

Do you think that would be good?

 

Jack.

That sounds easy, therefore good :-)

 

For the memory expansion aspect of my project this will be fine. You might recall that exposing a part of the Papilio RAM as a memory expansion for a 6502-based computer is the first step of my project. The second step will be to use the Papilio DUO also as a floppy disk emulator with storage on an SD card. This will also need a buffer in RAM. I guess that I will then have to manage a larger chunk of memory and use part of the memory as memory expansion and another part as disk buffer.

 

Stephan


Hello Jack,

 

I'm currently trying to revive my Papilio Duo project that I abandoned earlier this year.

 

I still have no luck with my own DMA implementation; I can't make it work. Therefore I wanted to ask whether you actually implemented a Wishbone peripheral for SDRAM access, as you outlined in your earlier post. Did anything happen in this area, or are you still planning to do this?

 

Stephan


I wrote a burst controller that you may use. It eases access to DMA.

 

https://github.com/alvieboy/ZPUino-HDL/blob/master/zpu/hdl/zpuino/lib/wishbone/wb_burstctrl.vhd

Assuming a burst width of 16 words, you should use it like this:

  signal bctrl_sob:   std_logic;
  signal bctrl_rnext: std_logic;
  signal bctrl_wnext: std_logic;
  signal bctrl_req:   std_logic;
  signal bctrl_eob:   std_logic;
---
  burstctl: entity work.wb_burstctrl
    port map (
      clk     => wb_clk_i,
      rst     => wb_rst_i,
      sob     => bctrl_sob,
      eob     => bctrl_eob,
      cti     => mi_wb_cti_o,
      stb     => mi_wb_stb_o,
      cyc     => mi_wb_cyc_o,
      stall   => mi_wb_stall_i,
      ack     => mi_wb_ack_i,
      req     => bctrl_req,
      rnext   => bctrl_rnext,
      wnext   => bctrl_wnext
    );

An explanation of the required signals:

 

  signal bctrl_sob:   std_logic; -- Start Of Burst. Input to burst controller. Set to one for one clock cycle.
  signal bctrl_rnext: std_logic; -- Read-Next. Output from burst controller.
  signal bctrl_wnext: std_logic; -- Write-Next. Output from burst controller.
  signal bctrl_req:   std_logic; -- Request in progress signal. Output.
  signal bctrl_eob:   std_logic; -- End-of-Burst signal. Output.
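Since the start-of-burst input has to be high for exactly one clock cycle, one way to drive it is a small edge-to-pulse stage like the sketch below. The entity name and the trigger_i input are hypothetical. It also assumes that bctrl_req stays high for as long as a burst is in progress, which is only a reading of the signal comment above and has not been checked against wb_burstctrl.vhd.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: turns a level-type trigger into a one-clock
-- start-of-burst pulse, but only while no burst is in progress.
entity sob_pulse is
  port (
    clk_i     : in  std_logic;
    rst_i     : in  std_logic;
    trigger_i : in  std_logic;  -- request from user logic (made-up name)
    req_i     : in  std_logic;  -- connect to bctrl_req (assumed busy flag)
    sob_o     : out std_logic   -- connect to bctrl_sob
  );
end entity sob_pulse;

architecture rtl of sob_pulse is
  signal trigger_q : std_logic := '0';  -- trigger delayed by one clock
begin
  process (clk_i)
  begin
    if rising_edge(clk_i) then
      if rst_i = '1' then
        trigger_q <= '0';
        sob_o     <= '0';
      else
        trigger_q <= trigger_i;
        -- rising edge of the trigger while the controller is idle
        if trigger_i = '1' and trigger_q = '0' and req_i = '0' then
          sob_o <= '1';
        else
          sob_o <= '0';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Note that a trigger edge arriving while req_i is high is dropped in this sketch; a real design would have to hold the request until the controller is idle again.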

 

See how VGA uses it: https://github.com/alvieboy/ZPUino-HDL/blob/master/zpu/hdl/zpuino/devices/video/vga_generic.vhd

Do you need read, write or both?


I do want to get this working for the Logic Analyzer core soon too... Hopefully I'll get time after Christmas to dig into this some more.

 

Jack.

Thanks Alvie, I will take a look. I only need read and write for a single byte at a time, but probably this can be tailored to my needs.

Stephan

Thank you, Jack! Please let me know when something is available. In the meantime I will try to make progress with Alvie's proposal.

Stephan


You need byte access... that's not very good. The memory controller is optimized for large read/write blocks, not single word access. Remember SDRAM has a big latency.

 

What's your read/write pattern? Is it sequential or random?

 

Perhaps using a small cache may help here.


Hello Alvie,

my project is a RAM expansion and floppy emulator for a 6502 board, and this means read/write random access at byte level.

Do you mean a cache in connection with your burst controller?

Stephan


Yes, I mean a cache in conjunction with the DMA engine, with IWF reads and write-combining.

 

I do happen to have one I developed for XThunderCore. It can be adapted to byte-wide (currently it's 32-bit wide), and it's rather fast (as fast as possible, at least). Still, it's a simple cache, direct mapped (two-way associative also possible, but expensive).
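As a rough illustration of what "direct mapped" means here, the sketch below splits a byte address into tag, line index and offset, and reports a hit when the stored tag for that line matches. This is not the XThunderCore cache. The entity name, the fill_i handshake and all sizes (16 address bits, 16-byte lines, 64 lines) are assumptions chosen for a 6502-sized address space, and the data storage itself is omitted.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical tag store and hit logic of a direct-mapped cache.
entity dm_cache_tags is
  generic (
    ADDR_BITS   : natural := 16;  -- byte address width (assumption)
    OFFSET_BITS : natural := 4;   -- 2**4 = 16 bytes per line (assumption)
    INDEX_BITS  : natural := 6    -- 2**6 = 64 lines (assumption)
  );
  port (
    clk_i  : in  std_logic;
    rst_i  : in  std_logic;
    addr_i : in  std_logic_vector(ADDR_BITS-1 downto 0);
    fill_i : in  std_logic;  -- pulse: the line holding addr_i was just loaded
    hit_o  : out std_logic   -- '1' if addr_i is currently cached
  );
end entity dm_cache_tags;

architecture rtl of dm_cache_tags is
  constant TAG_BITS : natural := ADDR_BITS - OFFSET_BITS - INDEX_BITS;
  constant LINES    : natural := 2**INDEX_BITS;

  type tag_array_t is array (0 to LINES-1)
    of std_logic_vector(TAG_BITS-1 downto 0);

  signal tags  : tag_array_t := (others => (others => '0'));
  signal valid : std_logic_vector(LINES-1 downto 0) := (others => '0');
  signal index : natural range 0 to LINES-1;
  signal tag   : std_logic_vector(TAG_BITS-1 downto 0);
begin
  -- address layout, MSB to LSB: | tag | index | offset |
  index <= to_integer(unsigned(
             addr_i(OFFSET_BITS+INDEX_BITS-1 downto OFFSET_BITS)));
  tag   <= addr_i(ADDR_BITS-1 downto OFFSET_BITS+INDEX_BITS);

  -- each address maps to exactly one line, so one compare decides the hit
  hit_o <= '1' when valid(index) = '1' and tags(index) = tag else '0';

  process (clk_i)
  begin
    if rising_edge(clk_i) then
      if rst_i = '1' then
        valid <= (others => '0');   -- invalidate the whole cache
      elsif fill_i = '1' then
        tags(index)  <= tag;        -- remember which block now lives here
        valid(index) <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

On a miss, the surrounding logic would fetch the whole line with one burst and then pulse fill_i, which is where a burst-oriented memory controller and byte-wide random access can meet.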

 

How fast is your design, in bytes per second?

I don't have any design yet, except my own previous tries to implement the DMA Wishbone peripheral where only the read cycle works (see earlier posts in this thread).

The 6502 has a clock frequency of 1 MHz, so the upper bound is 1 million bytes per second, but in practice it will probably be less than 500,000 bytes per second, as the 6502 cannot read or write on every clock cycle.


I can mock up a 6502 memory interface.

But I need to know whether you will be crossing clock domains. Memory runs at 96-133 MHz, so we need to do some domain crossing here.

Yes, this requires clock domain crossing. The 6502 runs at 1 MHz.
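With the memory side at 96 MHz (the low end of the range quoted above) there are roughly 96 fast clock cycles per 1 MHz 6502 cycle, so one common way to cross the domains is simply to sample the 6502 clock with the fast clock. The sketch below is a generic flip-flop synchronizer with edge detection. The entity and port names are hypothetical, and which edge the bus logic should act on depends on the 6502 board's timing, which is not specified in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical synchronizer: brings the slow, asynchronous 6502 clock into
-- the fast memory clock domain and flags its edges for one fast clock each.
entity phi2_sync is
  port (
    clk_i       : in  std_logic;  -- fast memory-side clock
    phi2_i      : in  std_logic;  -- 6502 clock, asynchronous to clk_i
    phi2_rise_o : out std_logic;  -- one clk_i cycle after a rising edge
    phi2_fall_o : out std_logic   -- one clk_i cycle after a falling edge
  );
end entity phi2_sync;

architecture rtl of phi2_sync is
  -- sync(0): first stage, may go metastable
  -- sync(1): second stage, treated as stable
  -- sync(2): previous value of sync(1), used for edge detection
  signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin
  process (clk_i)
  begin
    if rising_edge(clk_i) then
      sync <= sync(1 downto 0) & phi2_i;  -- shift the new sample in
    end if;
  end process;

  phi2_rise_o <= '1' when sync(2) = '0' and sync(1) = '1' else '0';
  phi2_fall_o <= '1' when sync(2) = '1' and sync(1) = '0' else '0';
end architecture rtl;
```

The edge flags arrive two to three fast clocks after the real edge, which is small compared with a 1 MHz bus cycle. Address and data lines sampled in response to a flag still need to be stable at that moment.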


Hi Stephan,

 

While I'm sure the proposed solution using a zpuino can be made to work, it seems a bit heavy-handed to me. If the only thing you need is to connect the DUO SRAM to the 6502 expansion bus and emulate the floppy with the sd-card, then a more straightforward approach would be to write a simple state machine to directly access the SRAM from the expansion bus, and a (not-so-simple) state machine to initiate and perform sector access to/from the sd-card.

 

An example of the latter can be found in the apple2fpga_papilioduo project (see http://forum.gadgetfactory.net/index.php?/topic/2473-apple2fpga-for-duoclassic-computing-shield/)

 

Just my 2c :)

 

Magnus


I think there's some merit in trying to use as high-level an approach as possible, since it allows using FAT-formatted cards (as an example), so the card doesn't have to be dedicated to one task.

I've been experimenting with a zpu (and z80/6505) floppy emulator for apple2, and the zpu one seems to require a lot less floorspace for the FAT access.

Having said that, the actual floppy-apple I/O is much easier to implement in hardware because the clocking is guaranteed to be accurate.


Yeah, if you need FAT disk access then you definitely need a processor. 

 

However, in most cases you only need low level access since the disk image is already formatted in some other file system (like Apple HFS) and the system code only needs low level disk access to read and write data on the disk.

 

Also, it might be difficult to hide the additional latency created by accessing a native file system stored as a file on a FAT formatted sd-card, and with 4GB sd-cards costing only a few bucks it seems reasonable to dedicate an sd-card to a single purpose vs. sharing the sd-card with other uses.

 

Here is a case in point:  The MiST folks took the Big Mess o' Wires Mac Plus code and added SCSI disk support. They used the on-board ARM processor they have to do the disk access, with the HFS file system stored as a file on a FAT formatted sd-card.  However, they had to run the 68K processor at a tuned speed during boot-time since the SCSI boot-code will time out on their system if the processor is running at normal speed due to the added SCSI disk access latency, and the floppy code won't work if the processor is running too slow.  When I ported the code to Pipistrello I replaced their low-level disk access mechanism (SPI I/O to the ARM processor) with RTL code for direct disk access using a state machine to access the sd-card, using the native 4-bit wide SD protocol.  This resulted in a system that closely mimics the original SCSI implementation (the Mac SCSI had 1.25 MB/s SCSI data rate and I got 2 MB/s sd data rate using the 8 MHz system clock) and the Pipistrello version can boot from the SCSI disk even if the 68K processor is running in turbo mode (2X speed).

 

Links:

 

MiST code with SCSI disk access via ARM processor: https://github.com/mist-devel/mist-board/tree/master/cores/plus_too

Pipistrello version with disk access via state machine: http://www.saanlima.com/download/pipistrello-v2.0/PlusToo_scsi_20151219.zip

 

Magnus


Magnus and Vlait, thanks for the suggestions. My idea was also that the easiest way would be a zpuino-based solution, with a FAT file system on the SD card. I will take a look at the Pipistrello solution.
