Guest essele Posted October 7, 2011

Hi,

I know this probably isn't the best section to post a generic memory question, but it does seem relevant to the platform (given the C/RAM wing and the Spartan 6 projects).

I have a project that I'm working on where I need to run a multi-layer framebuffer for a 320x240 LCD screen. I'm using the Papilio One to get the LCD working, but I will have to look at different options when I come to implement it.

Basically I need to build up the video signal by reading (and processing) four different layers, each of them 320x240x8 (or at least the bits I'll need to access). Therefore, with a 60Hz refresh rate, by my calculations I'll need to read 18,432,000 bytes per second just to refresh.

The problem is that the nature of the processing I'm doing means that the memory won't be read sequentially (not even within the layers), so I need well-performing memory that can deal with random access.

So my current thinking is to use 8-bit wide 10ns SRAM. Latency isn't an issue as I'll need to heavily pipeline the processing anyway, so it's bandwidth that's my concern. With 10ns async I would expect to be able to get to 100MHz using the setup and read on consecutive clocks (assuming I understand this correctly); in any case 50MHz would actually be OK. That should give me plenty of time to handle updates etc. (with an appropriate multiplexer on the memory).

So that seems OK. However, ultimately I'd like to either use higher-resolution screens or add some anti-aliasing to the system. This will potentially need four times the read bandwidth, and I'd still need to be able to make updates.

So what are my options here? I'd still like to stay with a Spartan 3E, but could potentially use a bigger (208) footprint to get more I/Os.

1. I could use wider memory (16 or 32-bit), though my accesses are all 8-bit and for the most part random, so I'm confused as to how I could really make use of this.
2. I could use multiple memories in parallel, one for each layer. Simple, but it would take up loads of I/Os.
3. Is there any benefit to using DDR/DDR2? It looks like you can get better throughput, but non-consecutive reads and intermingled writes sound like a problem, not to mention that the DDR controllers seem to be pretty complex.

Any help appreciated.

Lee.
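The bandwidth figure in the opening post can be checked with a few lines of Python. This is only a rough sketch: it assumes one byte is read per pixel per layer per frame and that an 8-bit SRAM delivers one byte per clock, which ignores any cycles spent on writes. The function name and constants are illustrative, not taken from the thread.

```python
# Rough read-bandwidth estimate for a 4-layer, 8-bit, 320x240 framebuffer.
WIDTH, HEIGHT = 320, 240
LAYERS = 4
BYTES_PER_PIXEL = 1   # each layer is 8 bits deep
REFRESH_HZ = 60


def refresh_bandwidth(width, height, layers, bytes_per_pixel, refresh_hz):
    """Bytes per second that must be read just to refresh the screen."""
    return width * height * layers * bytes_per_pixel * refresh_hz


base = refresh_bandwidth(WIDTH, HEIGHT, LAYERS, BYTES_PER_PIXEL, REFRESH_HZ)
quadrupled = 4 * base  # higher resolution or anti-aliasing, per the post

# Assumption: an 8-bit SRAM clocked at 50MHz supplies one byte per clock.
SRAM_BYTES_PER_SECOND = 50_000_000

print(base)                                # 18432000
print(quadrupled)                          # 73728000
print(quadrupled > SRAM_BYTES_PER_SECOND)  # True
```

Under these assumptions a single 8-bit SRAM at 50MHz covers the base refresh with room for updates, but not the four-times case, which is why wider or parallel memories come up in the options list.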
alvieboy Posted October 7, 2011

Hi, let me see if I understood correctly.

> The problem is that the nature of the processing I'm doing means that the memory won't be read sequentially (not even within the layers), so I need well-performing memory that can deal with random access.

Are your memory accesses predictable in any way?

> So my current thinking is to use 8-bit wide 10ns SRAM. Latency isn't an issue as I'll need to heavily pipeline the processing anyway, so it's bandwidth that's my concern. With 10ns async I would expect to be able to get to 100MHz using the setup and read on consecutive clocks (assuming I understand this correctly); in any case 50MHz would actually be OK.

100MHz is tricky, almost impossible to meet. 50MHz is fine though, even with sequential reads. Expect about 3ns offset out (assuming you are driving from a FF) and 3ns offset in. If you include propagation delays, and if you note that this SRAM has only a 2ns hold time, things get a bit complex.

> 1. I could use wider memory (16 or 32-bit), though my accesses are all 8-bit and for the most part random, so I'm confused as to how I could really make use of this.

Use a 16-bit memory and use the UB/LB masks. For reads, just use an 8-bit mux. Note that I think LB/UB will not be connected on the Papilio Plus, so to write 8 bits you will need to read the 16-bit word, modify it and then write the whole value back.

And stay away from DDR, at least with Spartan3. It's a very complex matter. Spartan6 seems to make things easier, but I have not tried.

Alvie
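The read mux and the read-modify-write sequence Alvie describes can be modelled in software as a minimal sketch. It assumes the memory is a list of 16-bit words and that even byte addresses map to the lower byte lane; that lane assignment and the function names are illustrative choices, not something the thread specifies.

```python
def read_byte(mem, byte_addr):
    """Read one byte from a 16-bit-wide memory by selecting a byte lane."""
    word = mem[byte_addr >> 1]      # word address drops the lowest bit
    if byte_addr & 1:
        return (word >> 8) & 0xFF   # odd address: upper byte (assumed mapping)
    return word & 0xFF              # even address: lower byte (assumed mapping)


def write_byte_rmw(mem, byte_addr, value):
    """Write one byte without UB/LB strobes: read the word, patch it, write it back."""
    index = byte_addr >> 1
    word = mem[index]               # read cycle
    if byte_addr & 1:
        word = (word & 0x00FF) | ((value & 0xFF) << 8)
    else:
        word = (word & 0xFF00) | (value & 0xFF)
    mem[index] = word               # write cycle


mem = [0x1234]
print(hex(read_byte(mem, 0)))  # 0x34
print(hex(read_byte(mem, 1)))  # 0x12
write_byte_rmw(mem, 1, 0xAB)
print(hex(mem[0]))             # 0xab34
```

Note that the read-modify-write path costs two memory cycles per byte written, so it eats into the bandwidth left over after the display refresh.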
Guest essele Posted October 7, 2011

Hi Alvie,

Thanks for the response.

Are the reads predictable? Not really. Some of the layers will be rotated, so the memory accesses used during the video-out phase will depend on the angle, and this can change each frame.

I'm not sure I entirely follow your timing comments on 100MHz. I think what you're saying is that there is a 3ns delay to get in and out of the FPGA. So, assuming the address is set at clock 0, it won't actually get set until 3ns after, the data won't actually be available at the second clock, etc. If that's what you mean, then it makes sense.

I guess the answer to this is synchronous memory then? For example IDT71V3578S133PFG? It still looks a little more complex, but should get up to 100MHz? In this case I could use each half of the data bus for different layers (not improving performance, but I can't find any 8-bit versions of this). This device shows a 1.5ns "clock high to data change" time. Does this mean I'll have the same problem, i.e. the valid data won't be available for long enough for me to get it?

Anyway, I think my first solution will be async SRAM at around 50MHz. Once I've got that working I'll consider other options. The easiest one certainly looks like multiple SRAMs in parallel, but that does raise pin, space and cost issues.

Thanks,

Lee.
alvieboy Posted October 7, 2011

> I'm not sure I entirely follow your timing comments on 100MHz. I think what you're saying is that there is a 3ns delay to get in and out of the FPGA. So, assuming the address is set at clock 0, it won't actually get set until 3ns after, the data won't actually be available at the second clock, etc.

Let me try explaining this.

Let's say you have an internal FPGA clock named CLK, and you are using some FFs to drive the address of the SRAM (meaning the FF outputs are mapped to output pins).

When your clock rises, the FF output changes after time T1. The output on the pin will be available after T2 due to output buffer delays. The address will then reach the SRAM after Tpd (propagation delay).

SRAM data will be present at its output, and will take Tpd to reach the FPGA pin. Due to input buffer delays, this signal will be available after T3. It will then be sampled on the FPGA clock, either directly or using a latch.

So, let's assume this:

a) SRAM data output for address A is valid 10ns after the address changes to A;
b) SRAM data output for address A is still valid for 2ns after the address changes from A to B (hold time).

Let's say CLK rises at time 0. At this time, we have presented the address on the FF. This address will reach the SRAM at T1+T2+Tpd. The SRAM will react, and data will be available at time (T1+T2+Tpd) + 10ns. It will be available inside the FPGA at time (T1+T2+Tpd) + 10ns + (Tpd+T3).

This is a simple computation, right? The problem is that you want to change the address in the meantime, so as to fetch a new value. If you imagine a 100MHz clock (10ns period) you can see that you must issue the new address before the data for the previous address has been retrieved. Tricky, right? We have to pipeline these operations.

But this is not all: when you change the address, the output will only be stable for 2ns.

A timing diagram helps in understanding the problem:

[Timing diagram image not preserved.]

As you can see, not only do you have a tiny 2ns window, you also don't have a clock edge which can be used to sample the data.

I used delays of about 3ns to 3.5ns. These vary a lot from device to device, and from routing to routing. They are almost never the same.

Alvie
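The arithmetic in Alvie's explanation can be turned into a small model to see where the valid-data window falls relative to the clock edges. This is a sketch under stated assumptions: T1+T2 and T3 are taken as 3ns each (the rough figures from the thread), the trace delay Tpd is an assumed 0.5ns, and setup and hold requirements of the capturing FF are ignored. All names are illustrative.

```python
# All times in picoseconds. Values marked "assumed" are illustrative only.
T_OUT_PS = 3_000      # T1 + T2: FF clock-to-out plus output buffer (about 3ns)
T_IN_PS = 3_000       # T3: input buffer delay (about 3ns)
T_PD_PS = 500         # assumed PCB trace delay, each direction
T_ACCESS_PS = 10_000  # SRAM address-to-data-valid time
T_HOLD_PS = 2_000     # SRAM output hold time after the address changes


def data_valid_window(period_ps, n=0):
    """Return (start, end) of the interval in which the data for the address
    launched at clock edge n is valid inside the FPGA, assuming a new
    address is launched on every clock edge."""
    round_trip = T_OUT_PS + T_PD_PS + T_PD_PS + T_IN_PS
    start = n * period_ps + round_trip + T_ACCESS_PS
    # The next address leaves one period later; old data holds for T_HOLD_PS.
    end = (n + 1) * period_ps + round_trip + T_HOLD_PS
    return start, end


def clock_edges_in_window(period_ps, n=0):
    """List the rising clock edges (multiples of the period) inside the window."""
    start, end = data_valid_window(period_ps, n)
    first_edge = -(-start // period_ps) * period_ps  # round start up to an edge
    return list(range(first_edge, end + 1, period_ps))


print(data_valid_window(10_000))      # (17000, 19000)
print(clock_edges_in_window(10_000))  # []
print(clock_edges_in_window(20_000))  # [20000]
```

With these assumed numbers the 100MHz case has a 2ns window with no rising edge inside it, while the 50MHz case has an edge at 20ns that falls comfortably inside the window. Real delays differ per device and routing, so the exact positions will move.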
Jack Gassett Posted October 7, 2011

Alvie answered this much better than I could have.

One thing I would also recommend is to stay away from DDR with the Spartan 3E. I made an attempt years ago and was never able to get it to work correctly. I think it should be no problem to implement SDRAM, but you will need an SDRAM controller core to use it.

The Spartan 6 makes DDR easier if you are using one of the larger chips that include a DDR controller. You have to use fixed pins if I'm not mistaken, and anyway the LX4 and LX9 chips do not include the DDR controller. Now Artix is a different matter: with Artix they are making it very easy to implement DDR memory on every single size of chip, and you won't have fixed pin locations. I'm very much looking forward to the Artix chips.

If you want, I can send you a C/RAM Wing to experiment with.

Keep us posted,

Jack.
Guest essele Posted October 7, 2011

Wow, what a great explanation. Thanks.

So providing you can ensure that the data has reached the FPGA by the time the clock ticks then it's fine. So if we assume (for argument's sake) 3ns total delay each way, then we need to add 6ns to the 10ns cycle, giving us 16ns and a theoretical max (for these fake numbers) of 62MHz.

What about the synchronous option? Is that another alternative? (I assume that's clocked and hence you don't have the data-valid window problem?)

Jack, I'd love to experiment with the C/RAM wing. I'm in the UK, and happy to pay though! It would be great if I could get everything done with the Papilio!

Lee.
alvieboy Posted October 7, 2011

> So providing you can ensure that the data has reached the FPGA by the time the clock ticks then it's fine. So if we assume (for argument's sake) 3ns total delay each way, then we need to add 6ns to the 10ns cycle, giving us 16ns and a theoretical max (for these fake numbers) of 62MHz.

Truth is you can actually run this at 100MHz by increasing the input delay, so that the rising clock edge lands about halfway through that 2ns window. This is, however, tricky. It will also add latency, but if you use bursts you should not be affected. I'm running the SRAM at 50MHz now, and it works pretty well.

I forgot to cover writes: writing to these memories at 100MHz is painful. The WE signal is required to be '0' for at least 8ns, and it has to go back to '1' so that the memory write is actually performed. Again, a 2ns thing. I tried to find a way to do this. I might have a solution, but it's pretty awkward.

Why is this signal so complex to generate? Because it's a 10ns clock with an 80% duty cycle. The usual clocking options (clk90 [adds 2.5ns], clk180 [adds 5ns], clk270 [adds 7.5ns]) do not give the required time. The only option I see is to use a FF with asynchronous set/reset connected to CLK270, with added delay provided by a few LUTs. Tricky, tricky.

> What about the synchronous option? Is that another alternative? (I assume that's clocked and hence you don't have the data-valid window problem?)

Source-synchronous designs use another technique: a separate clock, which is not phase-aligned with the main clock. The phase is set either by feeding the output clock back into the FPGA, or by programming it manually. This is also tricky sometimes (that's why it's so difficult to get DDR working with the S3E).

DDR also poses another problem: [most of] those devices require at least a 75MHz clock, so you cannot clock them at lower speeds if you don't meet timing.

Here's a paper from Altera that shows the things involved: http://www.altera.com/literature/an/an433.pdf

And one from Xilinx: http://www.eng.utah.edu/~cs3710/xilinx-docs/XAPP768c.pdf

These things are sometimes so complex that a PCB trace length error of a few mm is enough for your design to not work properly. That's also why you sometimes see a few weird traces in DDR connections: they give all signals the same Tpd (by making all traces the same length, with the same impedance).

Alvie
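The WE pulse problem from the post above can be checked numerically. The sketch below is a simplification of my own: it assumes WE is driven low on the rising edge of the main clock at time 0 and released by an edge of one of the phase-shifted clocks, so the low time equals the phase offset. The names are illustrative and no real clocking primitive is modelled.

```python
PERIOD_PS = 10_000      # 100MHz clock
WE_LOW_MIN_PS = 8_000   # WE must stay low for at least 8ns (from the thread)


def phase_offset_ps(degrees, period_ps=PERIOD_PS):
    """Delay of a phase-shifted clock edge relative to the main clock edge."""
    return period_ps * degrees // 360


PHASES = {name: phase_offset_ps(deg)
          for name, deg in (("clk90", 90), ("clk180", 180), ("clk270", 270))}


def usable_phases(phases, low_min_ps=WE_LOW_MIN_PS, period_ps=PERIOD_PS):
    """Phases whose edge falls late enough to end an 8ns low pulse,
    yet before the next main clock edge."""
    return [name for name, off in phases.items() if low_min_ps <= off < period_ps]


def extra_delay_needed(offset_ps, low_min_ps=WE_LOW_MIN_PS):
    """Additional delay (for example from a few LUTs) to reach the minimum low time."""
    return max(0, low_min_ps - offset_ps)


print(PHASES)                                # {'clk90': 2500, 'clk180': 5000, 'clk270': 7500}
print(usable_phases(PHASES))                 # []
print(extra_delay_needed(PHASES["clk270"]))  # 500
print(WE_LOW_MIN_PS * 100 // PERIOD_PS)      # 80 (percent duty cycle)
```

Under this model none of the standard phases is late enough, and clk270 would need at least 0.5ns of added delay while still leaving WE high again before the next cycle, which is consistent with the LUT-delay workaround Alvie mentions.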
Jack Gassett Posted October 8, 2011

Lee,

PM me your shipping address and I'll send a C/RAM Wing.

Jack.
rpflaum Posted October 10, 2011

I would also like to use the C/RAM wing with the Papilio One + board. Let me know the details as to payment so that it can possibly ship with a Papilio One + board when available.
This topic is now archived and is closed to further replies.