A VGA 800x600 byte-addressed frame buffer


hamster


Hi, I've managed to get my memory controller / frame buffer working.

With an 80MHz clock it generates 800x600@60Hz with 8-bit colour, making the most of the 512KB SRAM.
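As a rough sanity check of those numbers, the sketch below assumes the standard VESA 800x600@60Hz timing with a 40MHz pixel clock (the 80MHz clock divided by two; the post does not state the divider) and a simple row-major layout with one byte per pixel. The porch and sync values and the `byte_address` helper are assumptions for illustration, not taken from the linked source.

```python
# Sketch only: assumed VESA 800x600@60Hz timing, not taken from the post's source.
SYS_CLK_HZ = 80_000_000
PIXEL_CLK_HZ = SYS_CLK_HZ // 2  # assumption: one pixel every two system clocks

# Horizontal timing in pixel clocks: visible, front porch, sync, back porch.
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 800, 40, 128, 88
# Vertical timing in lines: visible, front porch, sync, back porch.
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 600, 1, 4, 23

H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # 1056 clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # 628 lines per frame

SRAM_BYTES = 512 * 1024
FRAME_BYTES = H_VISIBLE * V_VISIBLE  # 8-bit colour: one byte per pixel


def refresh_hz() -> float:
    """Frame rate produced by the assumed timing."""
    return PIXEL_CLK_HZ / (H_TOTAL * V_TOTAL)


def byte_address(x: int, y: int) -> int:
    """Hypothetical row-major mapping of pixel (x, y) to a frame buffer byte address."""
    if not (0 <= x < H_VISIBLE and 0 <= y < V_VISIBLE):
        raise ValueError("pixel outside the visible area")
    return y * H_VISIBLE + x


if __name__ == "__main__":
    print(f"refresh rate: {refresh_hz():.2f} Hz")
    print(f"frame buffer: {FRAME_BYTES} of {SRAM_BYTES} bytes used")
    print(f"last pixel at byte address {byte_address(799, 599)}")
```

With these values the refresh rate comes out at about 60.3Hz, and one frame uses 480,000 of the 524,288 bytes, which is why 8-bit colour at this resolution just fits.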

Currently the interface supports only a relatively low 10MB/s of byte-wide writes (no reads). If you adapt it to accept only 16-bit writes, you will be able to get 40MB/s, as it will no longer need to perform a read for every write cycle.
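To see why byte-wide writes cost an extra read, here is a toy model of a 16-bit-wide SRAM in which a byte write has to read the whole word, merge the new byte in and write the word back, while a full 16-bit write goes straight to memory. The `WordSram` class, the choice that even byte addresses map to the low byte, and the access counter are all assumptions made for illustration; the actual controller may be organised differently.

```python
class WordSram:
    """Toy model of a 16-bit-wide SRAM that counts bus accesses (hypothetical)."""

    def __init__(self, words: int) -> None:
        self.mem = [0] * words
        self.accesses = 0

    def read_word(self, addr: int) -> int:
        self.accesses += 1
        return self.mem[addr]

    def write_word(self, addr: int, value: int) -> None:
        self.accesses += 1
        self.mem[addr] = value & 0xFFFF

    def write_byte(self, byte_addr: int, value: int) -> None:
        """Read-modify-write: fetch the word, replace one byte lane, write it back."""
        word_addr = byte_addr >> 1
        shift = 8 * (byte_addr & 1)  # assumption: even byte address = low byte
        old = self.read_word(word_addr)
        merged = (old & ~(0xFF << shift) & 0xFFFF) | ((value & 0xFF) << shift)
        self.write_word(word_addr, merged)


if __name__ == "__main__":
    sram = WordSram(4)
    sram.write_word(0, 0x1234)   # one bus access
    sram.write_byte(1, 0xAB)     # two bus accesses: read, then write
    print(hex(sram.mem[0]), sram.accesses)  # 0xab34 3
```

Every byte write costs two memory accesses instead of one, which is the read the first post says disappears once only 16-bit writes are accepted.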

I've closely integrated the video signal generation with the memory controller. It is relatively lightweight (49 slices), and no FIFOs are required to cross clock domains between the memory and the display.

Source is available at http://hamsterworks.co.nz/mediawiki/index.php/DSPfract#Memory_Controller

I've done some basic testing (well, a few test patterns look correct), but some small errors may still lurk.

You can take a look at my VGA implementation at a 50MHz pixel rate. It's also 8-bit mapped (3R-3G-2B).
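For reference, a 3R-3G-2B pixel keeps the top three bits of red and green and the top two bits of blue in one byte. The sketch below assumes red occupies the most significant bits (RRRGGGBB), a common convention that the post does not confirm, and widens the fields back to 8 bits by bit replication.

```python
def pack_rgb332(r: int, g: int, b: int) -> int:
    """Pack 8-bit-per-channel RGB into one RRRGGGBB byte (assumed bit order)."""
    return (r & 0xE0) | ((g & 0xE0) >> 3) | ((b & 0xFF) >> 6)


def unpack_rgb332(p: int) -> tuple[int, int, int]:
    """Expand an RRRGGGBB byte back to 8-bit channels by repeating the field bits."""
    r3 = (p >> 5) & 0x7
    g3 = (p >> 2) & 0x7
    b2 = p & 0x3

    def widen3(v: int) -> int:
        # 3-bit value abc becomes abcabcab, so 0 -> 0 and 7 -> 255
        return (v << 5) | (v << 2) | (v >> 1)

    return widen3(r3), widen3(g3), b2 * 0x55  # 2-bit blue: 0, 85, 170, 255


if __name__ == "__main__":
    p = pack_rgb332(255, 128, 0)
    print(hex(p), unpack_rgb332(p))  # 0xf0 (255, 146, 0)
```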

No problems at all with SRAM. Of course, blitting is always slow, but that can be improved. It does, however, use a FIFO so as not to stress the bandwidth too much.

I assume you did take a look at my code (I think I posted it here back then).

What's your SRAM clock?

Hi Alvieboy, yes, I did look at your code, and then made my reads take two cycles and they worked :-). Knew I was doing something dumb.

I was attempting to do reads in a single cycle of an 80MHz clock (12.5ns), which should be possible, and to perform the writes in two cycles, using a DDR output on the WE signal to give 18.75ns of setup time for the address and data before the rising edge of the 12.5ns WE pulse.
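The figures in that write scheme follow from simple edge arithmetic. This sketch assumes the address and data are driven at the start of the two-cycle write and that the DDR output moves WE on the falling clock edges; the post gives the resulting numbers but not the exact edge assignment.

```python
CLK_NS = 1000 / 80  # 12.5 ns period at 80 MHz
WRITE_CYCLES = 2


def write_timing(clk_ns: float = CLK_NS) -> dict:
    """Edge times for a two-cycle write with WE driven by a DDR output.

    Assumptions (for illustration): address and data change at t = 0,
    WE falls on the falling clock edge of the first cycle and rises on
    the falling clock edge of the second cycle.
    """
    we_fall = clk_ns / 2
    we_rise = clk_ns + clk_ns / 2
    end = WRITE_CYCLES * clk_ns
    return {
        "we_pulse": we_rise - we_fall,        # WE low time
        "setup_to_we_rise": we_rise,          # address/data stable before WE rises
        "hold_after_we_rise": end - we_rise,  # time left before the next access
    }


if __name__ == "__main__":
    for name, ns in write_timing().items():
        print(f"{name}: {ns} ns")
```

Under these assumptions the WE pulse is 12.5ns, the setup before the rising edge is 18.75ns, and 6.25ns of hold remains before the next cycle.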

I was attempting to fit the following into four clock cycles (50ns):

* READ for VGA (1 cycle)

* READ for byte write (1 cycle)

* WRITE (2 cycles)

All good in theory, but my mental model doesn't map to reality too well. Instead, I'm now doing:

* READ for VGA (2 cycles)

* READ for byte write (2 cycles)

* READ for VGA (2 cycles)

* WRITE (2 cycles)
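The two slot schedules translate directly into bandwidth. This sketch assumes each read returns one 16-bit SRAM word (two 8-bit pixels) and counts a byte write as one useful byte. The `WORD_ONLY` schedule is a hypothetical layout for the 16-bit-write variant from the first post, not something described in the thread.

```python
CLK_NS = 1000 / 80  # 12.5 ns at 80 MHz
WORD_BYTES = 2      # assumption: 16-bit SRAM, so each VGA read returns two pixels


def bandwidth_mb_s(schedule, write_bytes):
    """Return (VGA read MB/s, write MB/s) for a repeating slot schedule.

    schedule is a list of (operation, cycles) pairs; write_bytes is the
    number of useful bytes stored by each WRITE slot.
    """
    period_ns = sum(cycles for _, cycles in schedule) * CLK_NS
    vga_reads = sum(1 for op, _ in schedule if op == "READ_VGA")
    writes = sum(1 for op, _ in schedule if op == "WRITE")
    # bytes per ns * 1000 = bytes per microsecond = MB/s (decimal megabytes)
    return (vga_reads * WORD_BYTES * 1000 / period_ns,
            writes * write_bytes * 1000 / period_ns)


ATTEMPTED = [("READ_VGA", 1), ("READ_RMW", 1), ("WRITE", 2)]
CURRENT = [("READ_VGA", 2), ("READ_RMW", 2), ("READ_VGA", 2), ("WRITE", 2)]
WORD_ONLY = [("READ_VGA", 2), ("WRITE", 2)]  # hypothetical: no read-modify-write

if __name__ == "__main__":
    print("attempted:", bandwidth_mb_s(ATTEMPTED, 1))  # (40.0, 20.0)
    print("current:  ", bandwidth_mb_s(CURRENT, 1))    # (40.0, 10.0)
    print("word-only:", bandwidth_mb_s(WORD_ONLY, 2))  # (40.0, 40.0)
```

All three schedules deliver the same 40MB/s of display data (two pixels every 50ns); what changes is how much write bandwidth is left over, which reproduces the 10MB/s and 40MB/s figures from the first post.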

Like you, it does all I need for now.

What do you think the solution will be to get higher throughput? Just adding IODELAY blocks on the inputs for MEM_DATA?

Quote: "What do you think the solution will be to get higher throughput? Just adding IODELAY blocks on the inputs for MEM_DATA?"

Actually, adding delay to the outputs, using a LUT plus the output flip-flop (OFF). The idea is to get that nasty WE signal in the right place. But this is tricky, because you need to lock the LUTs in place, as place-and-route (P&R) might otherwise change the delay values.

OK, I might have found a way, which may be simpler but involves using an extra DCM. I have not implemented it, just simulated a simple design.

So, let's take a base clock of 100MHz (period 10ns). Our write pulse must be at least 8ns long, and the write only takes effect when WE rises (according to the datasheet).

Let's pick another DCM, and generate a 200MHz signal, shifted by 270deg.


CLK100:  __________----------__________----------__________
CLK200:  -----_____-----_____-----_____-----_____-----_____
SHIFTED: --_____-----_____-----_____-----_____-----_____---

If we use the SHIFTED clock to synchronously load a '0' into a flip-flop, and use CLK100 as an asynchronous set, we get this:


CLK100:  __________----------__________----------__________---
SHIFTED: --_____-----_____-----_____-----_____-----_____-----_
OUTPUT:  __________-----------------___-----------------___---
 

 

That short pulse is about 1.25ns, so the WE pulse will be 8.75ns.

In theory this ought to work.
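The waveform can be checked with an idealised, zero-delay model of that flip-flop. Times are integer picoseconds, and the model ignores DCM jitter, clock-to-output delay and pad delays, so it only confirms the 8.75ns / 1.25ns split of each 10ns period, not real-world margins. The function names are hypothetical.

```python
PERIOD_PS = 10_000                        # CLK100: 100 MHz, 10 ns
FAST_PERIOD_PS = 5_000                    # second DCM output: 200 MHz, 5 ns
PHASE_PS = FAST_PERIOD_PS * 270 // 360    # 270 degree shift = 3750 ps
STEP_PS = 250                             # simulation resolution


def clk100(t: int) -> bool:
    """CLK100 is high for the first half of each 10 ns period."""
    return (t % PERIOD_PS) < PERIOD_PS // 2


def shifted_rises(t: int) -> bool:
    """True at each rising edge of the phase-shifted 200 MHz clock."""
    return (t - PHASE_PS) % FAST_PERIOD_PS == 0


def simulate(periods: int = 3) -> list[int]:
    """Ideal flip-flop: async set by CLK100, sync load of '0' on SHIFTED."""
    q = 1
    trace = []
    for t in range(0, periods * PERIOD_PS, STEP_PS):
        if clk100(t):
            q = 1          # asynchronous set dominates while CLK100 is high
        elif shifted_rises(t):
            q = 0          # clock a '0' in on the shifted clock edge
        trace.append(q)
    return trace


def pulse_widths_ps(trace: list[int]) -> tuple[int, int]:
    """High and low time (ps) over the last full CLK100 period of the trace."""
    last = trace[-(PERIOD_PS // STEP_PS):]
    high = sum(last) * STEP_PS
    return high, PERIOD_PS - high


if __name__ == "__main__":
    print(pulse_widths_ps(simulate()))  # (8750, 1250)
```

The output is high from the CLK100 rising edge until the first SHIFTED edge after CLK100 falls (3.75ns + 5ns = 8.75ns), giving the long 8.75ns portion used for WE and the short 1.25ns gap.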

  • 2 weeks later...

I just noticed that if you don't have OE active, you can use a 6.5 ns pulse on WE.

This could be generated using an ODDR2 output if the rest of the controller is running at 75MHz or less (it actually worked for me at 80MHz, which gave an out-of-spec 6.25ns pulse).

If you are running at 100MHz, using a 270-degree phase-shifted clock attached to the async 'set' of the ODDR2 should give you a comfortably in-spec 7.5ns pulse.

Mike
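The spec check in that post is simple arithmetic: an ODDR2 driving WE from both clock edges gives a pulse of half the clock period, and a 270-degree shift of a 100MHz clock is 7.5ns. This sketch only reproduces those numbers; it does not model the async-set circuit itself, and the helper names are hypothetical.

```python
WE_MIN_NS = 6.5  # minimum WE pulse quoted in the post when OE is not active


def half_period_ns(clk_mhz: float) -> float:
    """Pulse width from a DDR output that toggles on both edges of the clock."""
    return 1000 / clk_mhz / 2


def phase_shift_ns(clk_mhz: float, degrees: float) -> float:
    """Time offset of a phase-shifted copy of the clock."""
    return 1000 / clk_mhz * degrees / 360


if __name__ == "__main__":
    for mhz in (75, 80):
        width = half_period_ns(mhz)
        status = "in spec" if width >= WE_MIN_NS else "out of spec"
        print(f"{mhz} MHz ODDR2 pulse: {width:.2f} ns ({status})")
    print(f"100 MHz, 270 degrees: {phase_shift_ns(100, 270):.2f} ns")
```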

We are indeed using the -10 device. I just checked on Digikey to see what the price difference between -10 and -8 is, and they don't even have the -8 in stock.

I'm in the process of putting together the manufacturing package and I will see if we can do a -8 instead of -10.

Jack.

My own view is that it is most probably not worth the extra dollars (unless it is just a few cents).

I've currently got memory running at 12.5ns reads and 25ns writes, and the 12.5ns writes are 'almost' working.

I'm almost sure that most people will take a conservative approach and run the SRAM at 50MHz, even if they use 100MHz internally.

If you are emulating old kit (e.g. arcade games) that runs at a conservative clock rate of 16MHz, you can fit three reads and a write into each 62.5ns time slot. Being able to fit in an extra cycle isn't going to be a deal-breaker, unless marketing requirements demand 200MB/s of bandwidth!
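The time-slot budget works out exactly with the 12.5ns reads and 25ns writes quoted earlier in the same post, as this small check shows (the helper names are hypothetical).

```python
SLOT_NS = 1000 / 16   # one 16 MHz cycle = 62.5 ns
READ_NS = 12.5        # read time quoted in the post
WRITE_NS = 25.0       # write time quoted in the post


def slot_usage_ns(reads: int, writes: int) -> float:
    """Total memory time needed for the given mix of accesses."""
    return reads * READ_NS + writes * WRITE_NS


def fits_in_slot(reads: int, writes: int) -> bool:
    return slot_usage_ns(reads, writes) <= SLOT_NS


if __name__ == "__main__":
    print(slot_usage_ns(3, 1), fits_in_slot(3, 1))  # 62.5 True
    print(slot_usage_ns(4, 1), fits_in_slot(4, 1))  # 75.0 False
```

Three reads and one write fill the 62.5ns slot with no margin to spare, so one more access would need the faster 12.5ns writes.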

If it is, "16-bit, 10ns SRAM" tells the truth: you may get 200MB/s out of it, but you will be running at the device's limits.

Anyway, if 200MB/s is an essential requirement, most people will be going to SDRAM, as a megabyte or two will be too small for most high-bandwidth applications (e.g. full HD video needs approx. 6MB per frame, and 150MB/s at 25 frames per second).
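Both sets of figures can be checked quickly. The sketch assumes 24-bit colour (3 bytes per pixel) for full HD, which the post doesn't state; under that assumption a frame is about 6.2 million bytes (5.9MiB) and 25 frames per second is roughly 150MB/s, while a 16-bit SRAM completing one access every 10ns peaks at 200MB/s. The helper names are hypothetical.

```python
def sram_peak_mb_s(bus_bits: int, cycle_ns: float) -> float:
    """Peak bandwidth of an SRAM doing one full-width access per cycle (decimal MB/s)."""
    return bus_bits / 8 * 1000 / cycle_ns


def video_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    frame = video_bytes(1920, 1080, 3)   # assumption: 24-bit colour
    per_second = frame * 25              # 25 frames per second
    print(sram_peak_mb_s(16, 10))        # 200.0
    print(f"frame: {frame} bytes ({frame / 2**20:.2f} MiB)")
    print(f"stream: {per_second / 1e6:.1f} MB/s")
```

Even a single frame is more than ten times the 512KB SRAM discussed above, which is the capacity argument for moving to SDRAM.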

I'm wondering if the info in table 94 of the 3E datasheet (page 133 of http://www.xilinx.com/support/documentation/data_sheets/ds312.pdf) could be of any use with timing the WE pulse.

If you use different drive strengths for the data and address lines and for the WE signal, you could offset the pulse by over 3ns.

Might try it tomorrow...

Quote: "Anyway, if 200MB/s is an essential requirement, most people will be going to SDRAM, as a megabyte or two will be too small for most high-bandwidth applications (e.g. full HD video needs approx. 6MB per frame, and 150MB/s at 25 frames per second)."

Agreed. The Papilio Plus uses SRAM because I want an option that is as easy to use as possible. But if we want more memory, SDRAM is going to be more cost-effective, at the price of a more complicated control scheme. Once the Papilio Plus board is being manufactured and available, I will start working on another board that provides more, and faster, memory.

If the timing is right, I will make an Artix (Series 7) board. The nice thing about the Artix chips is that they have DDR memory controller support in every single chip size. So we should be able to add TONS of very cheap, very fast memory to an Artix-based board.

I tried to add DDR memory to a Spartan 3E board once and I failed miserably. :) Doing it on a two layer board was just not a good idea without a dedicated memory controller. But the Artix has come a long, long way in making DDR memory easier to implement.

Jack.
