Using on-board RAM in designs


stm


Hi,

 

I wonder how the access to the on-board RAM can be implemented and organized.

 

So far I know that a ZPUino uses the on-board RAM to store programs and data. If I understand it correctly, the ZPUino implementation uses the sram_ctl8.vhd source to access the RAM chip via a Wishbone bus interface.

 

I want to use the Papilio DUO to simulate an expansion board for an old 6502-based computer (Ohio Scientific Challenger 1P). On the one hand I want to use a part of the Papilio DUO RAM chip as a memory expansion for the 6502 board. On the other hand a floppy controller with attached floppy disks shall be simulated with storage on an SD card.

 

So for the RAM expansion alone I only need to translate the 6502 bus to something that can communicate with the Papilio DUO's RAM chip. I either could create a Wishbone bus wrapper around the 6502 bus, and use the sram_ctl8.vhd as-is, or I could directly build a bridge from the 6502 bus to the RAM chip based on the internals of the sram_ctl8.vhd source.

 

But when the floppy controller comes into the picture it gets more complicated. I need to simulate some of the chips for the floppy controller (6850 ACIA and 6820 PIA) and map them into the address space of the 6502. For access to an SD card I will need to use either a ZPUino or the real ATmega32U4 processor. With a ZPUino I would need to divide the RAM between the memory expansion and the ZPUino.

 

So lots of questions arise:

 

How can I divide the RAM between the ZPUino and other parts of my design? Could that be managed with compile and link time options when building the sketch for the ZPUino, or would I need to build something in VHDL?

 

Is it even possible at all to influence the ZPUino's use of the RAM chip without modifying its implementation?

 

Would it be better to use the ATmega32U4 for implementing the access to the SD card?

 

I'm obviously at the very beginner level regarding designing and implementing such a project, so I would be very grateful for any tips and experience in this area.

 

Thanks

Stephan


Hello Stephan,

 

Alvie will probably have a better answer, but let me give it a try.

 

I would go for a ZPUino solution here so you can benefit from the SD card libraries. 

 

The best solution will probably be to make a Wishbone peripheral that uses DMA to access the memory for your memory expansion module. The VGA adapter that Alvie recently made is a perfect example of this. Both the ZPUino and the DMA peripheral are Wishbone masters. You just add a register to the memory expansion peripheral that tells it where in the memory space its base address is. Then in your sketch you use malloc to allocate, let's say, 32 KB of memory. You get back a pointer to that memory, and you set your DMA Wishbone peripheral's base address to it. This way the ZPUino and your DMA Wishbone peripheral can use the memory at the same time without stepping on each other. The only problem is that we have yet to make an easy-to-understand example of how to do this, but all the pieces of the puzzle are there and we can help you through it in the forum here.

 

The other solution that occurs to me is to use the ZPUino for both tasks. The ZPUino runs at 96 MHz and your 6502 probably runs at 1 MHz. You should have plenty of cycles to present the SD card interface and the memory expansion on GPIO pins... This solution would be a ZPUino_Vanilla circuit with just straight C++ code to implement the functionality...

 

Jack.


Hello Jack,

 


I would go for a ZPUino solution here so you can benefit from the SD card libraries.

Just for my understanding: I could also use the SD card libraries from the real ATmega32U4, right?

 

The best solution will probably be to make a Wishbone peripheral that uses DMA to access the memory for your memory expansion module. The VGA adapter that Alvie recently made is a perfect example of this. Both the ZPUino and the DMA peripheral are Wishbone masters. You just add a register to the memory expansion peripheral that tells it where in the memory space its base address is. Then in your sketch you use malloc to allocate, let's say, 32 KB of memory. You get back a pointer to that memory, and you set your DMA Wishbone peripheral's base address to it. This way the ZPUino and your DMA Wishbone peripheral can use the memory at the same time without stepping on each other. The only problem is that we have yet to make an easy-to-understand example of how to do this, but all the pieces of the puzzle are there and we can help you through it in the forum here.

Ah yes, getting the block for the memory expansion via malloc makes sense.

Where is the VGA adapter from Alvie? Is that part of the DesignLab libraries, or does it live somewhere else?

 

The other solution that occurs to me is to use the ZPUino for both tasks. The ZPUino runs at 96 MHz and your 6502 probably runs at 1 MHz. You should have plenty of cycles to present the SD card interface and the memory expansion on GPIO pins... This solution would be a ZPUino_Vanilla circuit with just straight C++ code to implement the functionality...

Because I want to learn some VHDL with this project, I think I'll keep the memory expansion part and the floppy controller part in VHDL for now...

As always great answers, thank you Jack!

Stephan


  • 2 weeks later...

Hello Jack,

 


 

I tried to make progress according to your suggestions, but I didn't get very far. I am having serious trouble understanding the whole process of setting up a Wishbone peripheral and accessing it from a ZPUino sketch.

 

This was my idea to get started:

 

  • Look at how the ZPUIno_GFX module sets up the VGA adapter.
  • Create a minimal Wishbone peripheral that only can read a value and write back a value to simulate the later needed interface for setting the base address of the memory block.
  • Connect the peripheral to the ZPUino in the same manner as the VGA adapter is in the example (connect to the left connector of Wishbone slot 14).

For the minimal Wishbone peripheral I took the \DesignLab-1.0.5\libraries\ZPUino_Wishbone_Peripherals\Wishbone_to_Registers_x10.vhd file as a blueprint.

 

You can see my attempt to implement this minimal functionality under this directory in the GitHub repository for my project.

 

What I don't understand currently:

 

  • How are the Wishbone peripherals initialized correctly?
  • Would my envisioned memory extension be a "device" that needs to be added to the registry of Wishbone devices?
  • How is the slot set correctly when initializing the Wishbone peripheral?

The current code is the result of my guesswork so far, so please excuse me if it's total nonsense :-) What I want to do in my test sketch is to write an integer value to the Wishbone peripheral and read it back, and I can't get this to work.

 

Best regards

Stephan


Stephan,

 

You will want to look at these two tutorials as they are meant to get you started with making your own Wishbone peripheral:

http://gadgetfactory.net/learn/2015/04/03/designlab-libraries-library-quickstart/

http://gadgetfactory.net/learn/2015/04/03/designlab-libraries-make-a-wishbone-library/

 

Please keep in mind that Alvie's code is much more advanced than the examples. I only pointed you to his examples to look at the DMA portion of the code...

 

Jack.


Hello Jack,

 


 

I had already watched these videos back and forth, but I hadn't paid enough attention to how the library accesses the Wishbone registers in the second video.

 

After switching from the REG() function used in the VGA code to the REGISTER macro used in the video my test sketch began to work.

 

Thanks, Jack. As a next step I will try to adapt the DMA code.

 

So what exactly is the difference between the two methods to access the Wishbone registers, via REG() and via REGISTER()?

 

Best regards

Stephan


Stephan,

 

The REG() version is a nice class (BaseDevice) that Alvie wrote that provides additional functionality such as auto detection of peripherals. I haven't had the time to fully understand the class yet, so have not personally used it or made any examples of how to use it yet...

 

Maybe I'll try to make an example using the BaseDevice class tomorrow.

 

Jack.


Jack,

 

I have now tried to proceed further with the implementation of my DMA peripheral.

 

I implemented the Wishbone read and write cycles, and they can be triggered from an Arduino sketch over the other Wishbone interface for testing. I wrote a VHDL testbench, and it looks to me like the bus cycles should work correctly.

 

I have some questions regarding the Wishbone implementation:

  • Is it correct that reads and writes are always in units of 4 bytes?
  • Why is there no SEL_O signal needed? Isn't that mandatory according to the Wishbone specification?

 

When I try my test sketch I can see that something happens, although not the right things... :( Bogus values are written and read, and reproducibly after three invocations of the loop() function there's a hang.

 

Do you have any tips or tricks for debugging the Wishbone peripheral?

 

Thanks

Stephan


 


 

Hi stm,

 

Regarding your questions:

Yes, it is correct that you cannot do unaligned accesses, so you need to perform a 32-bit read or write.

Regarding SEL, it's not mandatory, and we don't use it for DMA right now, although I can add it really quick.

 

Also note:

 

The DMA interface is Wishbone in Pipelined Mode, not Wishbone Classical. You need to take this into account.

 

Can I get a copy of your Wishbone Master device? Send it to me by email (alvieboy at alvie dot com), and I'll take a look and do some simulations.

 

Alvie


Hi Alvie,


Yes, it is correct that you cannot do unaligned accesses, so you need to perform a 32-bit read or write.

OK, thanks.

 

Regarding SEL, it's not mandatory, and we don't use it for DMA right now, although I can add it really quick.

Ok, then I misinterpreted the Wishbone specification. It's fine for me that we can live without this signal.

 

Also note:

 

The DMA interface is Wishbone in Pipelined Mode, not Wishbone Classical. You need to take this into account.

This is a fundamental problem then with my current implementation. I will look at pipelined mode and I will try to implement it.

 

Can I get a copy of your Wishbone Master device? Send it to me by email (alvieboy at alvie dot com), and I'll take a look and do some simulations.

I will send you a ZIP file via email. The whole circuit and the ZPUino test sketch are also available on GitHub.

Thanks for taking a look at the module. As it has no pipeline mode, it cannot work currently, but I'm grateful for any remarks.

 

Best regards

Stephan


Hello Alvie,

 

Just replied to you.

 

I remembered that I have a wrapper in case you want to use it for now. https://github.com/alvieboy/ZPUino-HDL/blob/work-0200/zpu/hdl/zpuino/wb_master_np_to_slave_p.vhd

 

It's a wrapper for a non-pipelined master to connect to a pipelined slave (like the DMA interface).

As a first step I used the wb_master_np_to_slave_p.vhd wrapper that you suggested. I also fixed some problems in my VHDL. The good news is that all hangs are gone now, and that the pipelined reads now work.

What does not work are the pipelined writes. I did a simulation, and what appears to be wrong to me is the WE_O signal that comes out of the wrapper. According to "Illustration 3-8: Pipelined SINGLE WRITE cycle" in the Wishbone specification this should be asserted with the first rising edge of the CLK_I signal and negated with the second rising edge of the CLK_I signal. But in my simulation it stays asserted until the third rising edge of the CLK_I signal.

When I look at the VHDL code in wb_master_np_to_slave_p.vhd, it's clear that this happens:

s_wb_we_o <= m_wb_we_i;

WE_O is simply forwarded from the corresponding signal of the classic standard single write cycle. In the classic standard single write cycle the WE_O signal is asserted from the first rising edge of the CLK_I signal until the third rising edge of the CLK_I signal (see "Illustration 3-7: Standard SINGLE WRITE cycle").

So I guess there should be some logic that negates the WE_O signal with the second rising edge of the CLK_I signal, or am I missing something?

Best regards

Stephan


According to "Illustration 3-8: Pipelined SINGLE WRITE cycle" in the Wishbone specification this should be asserted with the first rising edge of the CLK_I signal and negated with the second rising edge of the CLK_I signal.

 

 

The WE signal is only valid while CYC and STB are both asserted. It can stay "undefined" (either 0 or 1) as long as either of them is deasserted.

 

In 3-8, no stalling occurs. What is relevant is STB being de-asserted. WE can be asserted because it is ignored (no strobing occurs).



I see, so this was a red herring.

It's strange that the read cycle is working correctly while the write cycle isn't. May I send you the files of my implementation another time?

Thanks

Stephan


I have done more testing now, and it looks like I'm stuck. While the read cycle of my DMA peripheral now works as desired, the write cycle doesn't work; it seems to have no effect. In an ISim simulation of the DMA peripheral the Wishbone pipelined write cycle looks OK to me.

Are there any additional tips or tricks for debugging this at the Wishbone bus level when my DMA peripheral interacts with the ZPUino? I guess an ISim simulation of the ZPUino plus the DMA peripheral is a hopeless undertaking...

Stephan


Hello Alvie,
 

Can you show me a waveform?


If you mean the diagram of an ISim simulation, I have attached a picture taken while running the DMA peripheral in an ISim simulation. The s_* signals are the signals connected to the "wishbone_slot_video_out" and "wishbone_slot_video_in" buses in the full circuit with the ZPUino.

 

The DMA write cycle starts at 170 ns and the DMA read cycle starts at 220 ns.

 

Stephan

[Attached image: ISim simulation waveform of the DMA peripheral]


Indeed, the s_ signals look correct. But I don't think you are accessing RAM. I see a 1-cycle delay for reads.

So I assume you're not using the RAM model for simulation. And your "model" does not model stalling, so I cannot say what is going on. Stalling will definitely happen, and read delays will exist.

 

On a side note: I am able to do full ZPUino simulations indeed, but they can be tricky.

 

What do you have connected as "slave" for the DMA?

 

Edit: can you send me the model?


Indeed, the s_ signals look correct. But I don't think you are accessing RAM. I see a 1-cycle delay for reads.

So I assume you're not using the RAM model for simulation. And your "model" does not model stalling, so I cannot say what is going on. Stalling will definitely happen, and read delays will exist.

You are right, my simulation does not model stalling, and I'm not using a RAM model in the simulation. I was not aware that one is available, or do you mean the sram_ctl8.vhd module?

On a side note: I am able to do full ZPUino simulations indeed, but they can be tricky.

 

What do you have connected as "slave" for the DMA?

 

Edit: can you send me the model?

My VHDL testbench simply sets ACK_I after a cycle to acknowledge the read or write. I will try to model stalling in the simulation to see how it behaves.

As we found out earlier my initial implementation of the DMA peripheral was non-pipelined. It implemented the "classic standard single read/write cycles". After your hint I added the wb_master_np_to_slave_p.vhd wrapper around it, and that is what you are seeing in the simulation above. When I added the wrapper, the read cycle started to work.

I was under the impression that the wb_master_np_to_slave_p.vhd wrapper would deal with stalling.

I will send you my current implementation and the test bench, thanks for taking a look!

Stephan


Hello Alvie,
 

So I assume you're not using the RAM model for simulation. And your "model" does not model stalling, so I cannot say what is going on. Stalling will definitely happen, and read delays will exist.


I have now added a read and a write cycle with stalling to the ISim simulation (see attached image). The cycles that include stalling are a write cycle that starts at 220 ns and a read cycle that starts at 340 ns. Do these look correct?

Stephan

 

 

[Attached image: ISim simulation waveform including the stalled write and read cycles]


Hi Alvie,

Looks good. Let me delve into this during the weekend.

 

Alvie.

 

Sorry for being such a pain, but were you able to look into the problem?

 

What can I do myself to further troubleshoot the issue?

 

Thanks

Stephan
