Matt Ownby

Writing to an SD card and logging via FTDI USB port

35 posts in this topic

Hi guys,

 

What I'm currently trying to do (aside from going through most of Hamster's tutorials, thanks these are GREAT!) is to read from a 15 MHz ADC (hardware already working) and stream the results to an SD card.  I don't even know if I'll be able to pull this off due to the performance demands, but it will still be very educational for me to go through the steps of learning how to control an SD card.

 

I realize that it is popular to control SD cards in SPI mode but since I really need every ounce of speed I can get, I want to do it in the 25 megabyte/second mode ("high speed, wide-bus SD mode").

 

Since the SD spec requires quite a bit of setup before one can actually start bulk writing, I am _positive_ that I will be making mistakes and I am going to need some feedback along the way.

 

So I think it would be _very_ helpful if I can get some logging over the FTDI USB port to show me what's going on.

 

I've gone through Hamster's tutorial on how to output RS232 and I actually got this working earlier today (woohoo) using PuTTY on a Windows PC.

 

But here is the issue I am currently grappling with:

 

- since the FPGA has different components running in parallel, it is conceivable that two different modules may want to output to the log simultaneously.  Is there some kind of locking mechanism to prevent this?  like some kind of mutual exclusion object or something?

 

- And what about the issue of the log wanting to output to the serial port while another module wants to add to its buffer?

 

- Speaking of buffers, I am thinking about using a small array of bytes with indices pointing to the start and end of the buffer (ie a revolving buffer).  Too complicated?


Hi Matt,

 

First let me answer some of your questions.

 

You would have each thing that outputs to the log do so only when the log is ready to receive data.

 

The serial port would best have a FIFO buffer that is emptied at one end into the serial port and filled at the other when things write to the log. This buffer should be long enough that printing does not hold anything up.
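As a rough illustration of the FIFO idea (and of the start/end index scheme Matt asked about), here is a minimal single-clock sketch. The entity name `log_fifo`, its port names, and the 16-byte depth are all assumptions made up for this example, not code from anyone in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16-entry byte FIFO; names and depth are illustrative only.
entity log_fifo is
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;                     -- synchronous, active high
    wr_en   : in  std_logic;                     -- a module wants to log a byte
    wr_data : in  std_logic_vector(7 downto 0);
    full    : out std_logic;                     -- writes are ignored while '1'
    rd_en   : in  std_logic;                     -- serial transmitter took a byte
    rd_data : out std_logic_vector(7 downto 0);  -- oldest byte in the FIFO
    empty   : out std_logic
  );
end log_fifo;

architecture rtl of log_fifo is
  type mem_t is array (0 to 15) of std_logic_vector(7 downto 0);
  signal mem     : mem_t;
  signal wr_ptr  : unsigned(3 downto 0) := (others => '0');  -- "end" index
  signal rd_ptr  : unsigned(3 downto 0) := (others => '0');  -- "start" index
  signal count   : unsigned(4 downto 0) := (others => '0');  -- bytes stored
  signal full_i  : std_logic;
  signal empty_i : std_logic;
begin
  full_i  <= '1' when count = 16 else '0';
  empty_i <= '1' when count = 0 else '0';
  full    <= full_i;
  empty   <= empty_i;
  rd_data <= mem(to_integer(rd_ptr));

  process (clk)
    variable do_wr, do_rd : boolean;
  begin
    if rising_edge(clk) then
      if reset = '1' then
        wr_ptr <= (others => '0');
        rd_ptr <= (others => '0');
        count  <= (others => '0');
      else
        do_wr := (wr_en = '1') and (full_i = '0');
        do_rd := (rd_en = '1') and (empty_i = '0');
        if do_wr then
          mem(to_integer(wr_ptr)) <= wr_data;
          wr_ptr <= wr_ptr + 1;            -- 4-bit pointer wraps at 16
        end if;
        if do_rd then
          rd_ptr <= rd_ptr + 1;
        end if;
        if do_wr and not do_rd then
          count <= count + 1;
        elsif do_rd and not do_wr then
          count <= count - 1;
        end if;
      end if;
    end if;
  end process;
end rtl;
```

Because both pointers and the counter are updated in one clocked process, a write and a read in the same cycle do not conflict, which covers the "log wants to output while another module adds to the buffer" case.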

 

You are talking about things, and seem to be thinking about them, in a very software-like manner. You must remember that what you write is hardware, not software.

 

 

Second,

 

Despite the fact that the 4-bit SD interface allows transfers of up to 25 MB/s, the cards will likely not write at this speed. In fact, class 2, 4 and 6 cards are guaranteed only to write at 2, 4 or 6 MB/s in a fragmented state (the slowest circumstance).

Class 10 cards guarantee the 10 MB/s write speed only in an unfragmented state.

 

So you are likely not to achieve the write speeds you want.


Hi!  Thanks for your response!

 

> You would have each thing that outputs to the log only do so on the condition that it is ready to receive it.

 

I realize this.  But what if two things see that the log is ready and both try to add new content to it at the same time?  Chaos would ensue.

 

I think I've figured out how to deal with this though.  I just won't let multiple sources write to the log; instead I will make the log pull from known sources.  That way only one thing will be writing to the FIFO buffer.
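One way to picture the "log pulls from known sources" idea is a small fixed-priority selector. This is only a sketch under assumptions: the entity `log_arbiter`, the two sources A and B, and the valid/ack handshake are invented for illustration and are not Matt's actual design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-source selector. Each source holds its valid and data
-- lines steady until it sees its ack high at a rising clock edge; the
-- byte is written into the log FIFO on that same edge.
entity log_arbiter is
  port (
    fifo_full : in  std_logic;
    a_valid   : in  std_logic;                    -- source A has a byte pending
    a_data    : in  std_logic_vector(7 downto 0);
    a_ack     : out std_logic;
    b_valid   : in  std_logic;                    -- source B has a byte pending
    b_data    : in  std_logic_vector(7 downto 0);
    b_ack     : out std_logic;
    fifo_wr   : out std_logic;
    fifo_data : out std_logic_vector(7 downto 0)
  );
end log_arbiter;

architecture rtl of log_arbiter is
  signal take_a, take_b : std_logic;
begin
  -- A wins whenever both sources are pending; B waits for a later cycle.
  take_a <= a_valid and not fifo_full;
  take_b <= b_valid and not a_valid and not fifo_full;

  a_ack     <= take_a;
  b_ack     <= take_b;
  fifo_wr   <= take_a or take_b;
  fifo_data <= a_data when take_a = '1' else b_data;
end rtl;
```

Since only this block ever drives the FIFO write port, two modules can never write in the same cycle; the loser simply keeps its request asserted until it is acknowledged.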

 

> The serial port would best have a FIFO buffer that is emptied on one end into the serial port and added to when things write to the log, this buffer should be long enough to try not to hold up anything when it prints.
>
> You are talking about things and seem to be thinking about them in a very software like manner, you must remember that what you write is hardware, not software.

 

Ah, please forgive me.  I talk about things from a software paradigm because that's the easiest way for me to describe them.  My hope is that if I describe them from a software POV that smart people like you guys will understand what I am trying to say anyway.

 

> Second, despite the fact that the 4 bit SD interface allows transfers up to 25MB/s the cards will likely not write at this speed, in fact class 2 4 and 6 cards are guaranteed only to write at 2 4 or 6 MB/s in a fragmented state. (the slowest circumstance)
>
> class 10 cards guarantee the 10MB/s write only in an unfragmented state.
>
> So you are likely not to achieve the write speeds you want.

 

Well, my first exercise will be trying to write (unfragmented) a bulk page of bytes to the card and see how fast I can do it.  I will be using a UHS class card and will not be using a file system.  Just trying to do a raw write.

 

Thanks for your help!


Hi Matt, half of your project is eerily similar to something I am working on, which OmniTechnoMancer knows of too. I'm trying to capture high speed data and needed a way to see what is going on.

I've just hacked up a module that will capture an 18-bit wide bus at 256 MHz, and then send the samples up the serial port as ASCII ones and zeros so I can see what is going on.

 

Since it might be close to what you are trying to do I've posted it on my Wiki at http://hamsterworks.co.nz/mediawiki/index.php/RS232_dumper


I'm using the LUT primitive as an inverter, and controlling placement. This gives a delay chain with approximately a 1ns delay between sampling points. The LUTs have to be set up as inverters as the 1s hold up longer than the 0s, so your 0s get shorter unless you invert the signal at every stage to even out the effect.

 

I'm trying to get a half nanosecond delay by using an adder with one input of zeros, the other of ones. Each step in the carry chain adds another 0.1 ns on top of the 1 ns. However, 0s propagate slightly quicker up the carry chain than 1s, so I have to tune it carefully.

I'm aiming to recover about 512 Mb/s, sent from another Papilio One using a DDR output at 256 MHz. I can't just use a DDR input as the clocks on the two FPGAs will be using different crystals, so will slowly go in and out of phase. I'm hoping to track this phase drift and then use it to sample at the optimal time.

 

Learning lots about manual placement and routing delays.  :) 

 

If this fails to work I'm going to use two DCMs to track phase much like http://www.date-conference.com/proceedings/PAPERS/2010/DATE10/PDFFILES/IP2_04.PDF, but switching between DCMs when it reaches the limit for phase tuning.


Hi Hamster,

 

I didn't realize that I never responded to this but I appreciate you posting that link.  At first I thought you were saying that you were sending 512 megabytes/second up the serial port and I thought "What the heck?"  Then I looked at your code and it appears you are just logging an occasional snapshot or .. something.  I am still a bit too new to see exactly what is going on.

 

But!  I wanted to give an update on my end...

 

I got the serial logger working.  I ended up just having a buffer size of 1 byte (hehe) so I can do some primitive logging that way.  It is good enough for my purposes.  I wrote a program on the PC end that spits out any byte it receives as ASCII hex.

 

I also have successfully started talking to an SD card (in SD mode, not SPI mode*) and hope to be writing bits to the card soon.  I'm halfway through the init phase (there are like 10 commands you need to send to actually get out of init, it's a bit of a mess but once you have code in place to send the first few commands the rest are pretty much the same).

 

* - I've been searching on google for example code showing how to talk to SD cards in SD mode but I have found exactly 0 examples!  Everyone seems to heavily favor SPI mode.  I don't get it.  I know SD mode is "evil" because it is encumbered by proprietary shackles, and SPI is an open standard, but SPI mode also performs at best 4 times slower than SD mode so I thought someone would eventually say "screw it, here is how to do SD mode".  I guess I was wrong!  Anyway, since I am building a one-off solution that only I (and maybe a few close friends) will be using, I have absolutely no problem using SD mode because I need the speed.

 

--Matt


Matt, excellent progress.

I'll be really interested in hosting the SD card code if you have nowhere better to put it!

That is what drove me to put my wiki up in the first place. Apart from OpenCores there are very few examples of how to do [mostly] trivial stuff.

 

However, with FPGAs almost nothing is trivial!


Great, I'm glad you're interested :)

 

I'm a little embarrassed at you seeing my code since this is basically my first attempt at VHDL and I mostly write software in C/C++/C# (with occasional assembly language for good measure!) but maybe you will be able to help give me some tips to improve it.

 

One question I have right now is... how the heck do you do ASCII notation?

For my logger, I am doing stuff like:

 

sd_log_byte <= X"49"; -- 'I'

 

meaning I want to send an "I" to the serial port to show I am initializing... but I have to keep looking up ASCII values and then putting in their hex equivalents.  It is a pain.  I've googled for "ascii notation" and am not getting very far (i.e. nowhere).


ASCII to hex/binary is really stupidly awkward. Here's a solution from Google:

character'pos(charIn) will give you the ASCII code for the character.

to_unsigned(character'pos(charIn), 8) will convert it to an 8-bit unsigned.

std_logic_vector(to_unsigned(character'pos(charIn), 8)) will give you what you need (assuming you have the correct libraries included; to_unsigned comes from ieee.numeric_std).

The last two steps could most probably be shortened, but I am no expert on conversion functions :-(
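To avoid typing that expression every time, the conversion can be wrapped in a small helper. This is just a sketch: the package name `ascii_util` and the function name `to_byte` are made up here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package; the names are illustrative only.
package ascii_util is
  -- Convert a character literal to its 8-bit ASCII code.
  function to_byte(c : character) return std_logic_vector;
end package;

package body ascii_util is
  function to_byte(c : character) return std_logic_vector is
  begin
    return std_logic_vector(to_unsigned(character'pos(c), 8));
  end function;
end package body;

-- Usage, after "use work.ascii_util.all;" in the logging entity:
--   sd_log_byte <= to_byte('I');   -- same value as X"49"
```

When the argument is a character literal the synthesizer can evaluate the call at compile time, so this should cost no extra logic compared with writing the hex value by hand.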


Yikes!  Oh well, thanks for finding that for me! :)

 

I've made more progress on talking to the SD card.  I've now completely finished the initialization sequence and am in transfer mode and ready to start multi-block writing.  That will be my next task when I work on this again.

 

I've also beefed up the logger.  It is really inefficient but still meets my timing constraints so I am pleased.  Here is what the logger spits out.  My comments follow each code after the "--".

-    -- Reset
C00  -- go idle
C08  -- notify SD card that we're using 3.3V and we can support newer commands
C55  -- get ready to send ACMD41
A41  -- initialize
C55  -- initialization is busy, so we keep looping until it's ready
A41
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C02  -- get SD card's CID
C03  -- get SD card's relative address
C07  -- select this SD card for transfer mode
C13  -- get current status
C55  -- get ready to send ACMD6
A06  -- switch into 4-bit (wide) data bus instead of 1-bit data bus
C06  -- switch from 25 MHz to 50 MHz (i.e. "high speed" mode)


I've finally worked out enough of the issues in my VHDL code that I've been able to successfully write 512 bytes to the SD card.

Here is how the correct CRC and response now look.

I've also fixed the way that I output to the CMD and DATA lines so that it is consistent with the way that the SD card does it (ie the correct way).
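Matt's own CRC code is not shown in this thread, so the following is only a sketch of how the 7-bit CRC on the CMD line can be computed. It assumes the usual 48-bit command frame (start bit, transmission bit, 6-bit command index, 32-bit argument, 7-bit CRC, end bit) and the generator polynomial x^7 + x^3 + 1; the package and function names are made up.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; names are illustrative only.
package sd_crc is
  -- CRC7 over the first 40 bits of a command frame, processed MSB first.
  function crc7(frame : std_logic_vector(39 downto 0)) return std_logic_vector;
end package;

package body sd_crc is
  function crc7(frame : std_logic_vector(39 downto 0)) return std_logic_vector is
    variable crc : std_logic_vector(6 downto 0) := (others => '0');
    variable fb  : std_logic;
  begin
    for i in 39 downto 0 loop
      fb  := frame(i) xor crc(6);        -- feedback bit
      crc := crc(5 downto 0) & '0';      -- shift left by one
      if fb = '1' then
        crc := crc xor "0001001";        -- taps for x^3 and x^0
      end if;
    end loop;
    return crc;
  end function;
end package body;

-- Check against the well-known CMD0 frame 40 00 00 00 00 95:
--   crc7(X"4000000000") = "1001010"
-- and "1001010" followed by the end bit '1' is X"95".
```

In hardware the same shift-and-xor step would normally run one bit per SD clock as the command is shifted out, rather than as a 40-iteration combinational function; the function form is mainly handy for simulation checks and constant commands.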

 

[Image: sd_timing_diagram_with_crc.png]


Ok, I'm to the point of this SD card writing business that I need to ask...

 

How does memory work on the FPGA?  Does it have any internal RAM?  Is it crazy to try to make a massively large register (i.e. 4096 bits) behave like memory?  I know from reading Hamster's guide that it has internal ROM but I am unclear if it also has RAM.

 

Basically, I think that I need maybe 1k of RAM (to be safe) that can be written to at 50 MHz.

 

The details:

 

I will be reading from the ADC at 15 MHz and we can assume 8-bits for now for simplicity.

 

The SD card accepts writes in 512 byte blocks and it has a short delay between each block while it writes to its internal storage.  Its clock runs at 50 MHz (in high speed mode) and can write 4 bits per clock, so its bus can essentially receive data at a rate of 25 megabytes/second minus the delay in writing each block (when writing multiple blocks, this delay can be reduced).

 

So assuming the SD card can write at an average of over 15 megabytes/second when doing a continuous stream (which I think is possible, as I was seeing rates above that when I plugged it into a USB adapter), I will need some kind of buffer to hold data from the ADC while the SD card is busy between 512 byte blocks.
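To put rough numbers on the buffering question, here is a back-of-the-envelope calculation. It assumes 8-bit samples at 15 MHz (so 15 bytes per microsecond coming in), the 25 MB/s bus rate quoted above with 1 MB taken as 10^6 bytes, and it ignores the start bit, CRC and end bit sent with each data block. The card's busy time per block is not known here and depends on the card.

```latex
\begin{align*}
t_{\text{bus}}  &= \frac{512\ \text{B}}{25\ \text{B}/\mu\text{s}} = 20.48\ \mu\text{s}
  && \text{(time to send one block over the bus)} \\
t_{\text{fill}} &= \frac{512\ \text{B}}{15\ \text{B}/\mu\text{s}} \approx 34.13\ \mu\text{s}
  && \text{(time for the ADC to produce one block)} \\
t_{\text{slack}} &= t_{\text{fill}} - t_{\text{bus}} \approx 13.65\ \mu\text{s}
  && \text{(average busy time the card may take per block)} \\
t_{\text{stall}} &= \frac{1024\ \text{B}}{15\ \text{B}/\mu\text{s}} \approx 68.27\ \mu\text{s}
  && \text{(longest single stall a 1 KB buffer can absorb)}
\end{align*}
```

So under these assumptions a 1 KB buffer rides out a single pause of roughly 68 microseconds, but the stream only keeps up in the long run if the card's average busy time per block stays under about 13.65 microseconds.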

 

For fun, I tried creating a 4096 bit std_logic_vector but I was informed that I had exceeded the maximum space for the FPGA (this made me sad).  So what is the standard way of getting RAM on these things?

 

I appreciate any help! :)

--Matt


Hi,

 

Yes, it does have internal RAM - True Dual port RAM at that.

 

You can use the  block RAM generator if you want large memory, or use a primitive.

 

 

Here is a 18-bit x 1024 RAM primitive with two ports, one for read, one for write, each using different clocks.

 

   RAMB16_S18_S18_inst : RAMB16_S18_S18
   generic map (
      INIT_A => X"00000",
      INIT_B => X"00000",
      SRVAL_A => X"00000",
      SRVAL_B => X"00000",
      WRITE_MODE_A => "WRITE_FIRST",
      WRITE_MODE_B => "WRITE_FIRST",
      SIM_COLLISION_CHECK => "ALL",
      -- use the INIT_?? to set initial contents.
      INIT_00 => X"00000000000000000000000000000000000000000000000000000000000000FF"
   )
   port map (
      -- Port A: write side, clocked by clk_data
      ENA   => '1',
      DOA   => open,
      DOPA  => open,
      ADDRA => write_address,
      CLKA  => clk_data,
      DIA   => write_data(15 downto 0),
      DIPA  => write_data(17 downto 16),
      SSRA  => '0',
      WEA   => write_enable,
      -- Port B: read side, clocked by clk_64
      DOB   => read_data(15 downto 0),
      DOPB  => read_data(17 downto 16),
      ADDRB => read_address,
      CLKB  => clk_64,
      DIB   => (others => '0'),
      DIPB  => (others => '0'),
      ENB   => '1',
      SSRB  => '0',
      WEB   => '0'
   );

 

 

ISE can also infer block RAM as long as you have a register in the mix: the internal block RAM takes one clock cycle to read. If you don't have this cycle of delay, it will try to implement the memory using all your lookup tables.

 

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity mem1 is
 port (
  CLK : in std_logic;
  WE : in std_logic;
  WADDR : in std_logic_vector(5 downto 0);
  RE: in std_logic;
  RADDR : in std_logic_vector(5 downto 0);
  DIN : in std_logic_vector(15 downto 0);
  DOUT : out std_logic_vector(15 downto 0)
 );
end mem1;

 

architecture behavioral of mem1 is

 type ram_type is array (63 downto 0) of std_logic_vector (15 downto 0);
 signal RAM : ram_type;
 signal read_addr : std_logic_vector(5 downto 0);
 attribute syn_ramstyle : string;
 attribute syn_ramstyle of RAM : signal is "block_ram";

begin
  
process (CLK)
begin
 if rising_edge(CLK) then
  if WE='1' then
   RAM(conv_integer(WADDR))<=DIN;
  end if;
 end if;
end process;

process (CLK)
begin
 if rising_edge(CLK) then
  if RE='1' then
   read_addr<=RADDR;
  end if;
 end if;
end process;

DOUT<=RAM(conv_integer(read_addr));
end behavioral;


Read more: http://www.fpgacentral.com/docs/fpga-tutorial/inferring-block-rams#ixzz2KuePNazh

 


Ok, I got the RAM working using this inference method.  Slick!  Thanks for the tip!  It would've taken me forever to find that on my own!

 

Now I've got a question about clocks.

 

The way the SD card works is during initialization its clock needs to run at 400 kHz but at some point it can be configured to run at 50 MHz (which is my goal).  So I am using a 100 MHz clock generated from the DCM to derive both of these target clocks from.

 

I've got a little clock divider thingy going on but being the newbie that I am I'm probably doing something wrong (plus I am getting a new warning now).

 

The new warning is:

 

"Phase  8  : 0 unrouted; WARNING:Route:455 - CLK Net:myclk100 may have excessive skew because 

      0 CLK pins and 2 NON_CLK pins failed to route using a CLK template."
 
And here is my clock divider:
 
divider: process(sd_clk100_in, bDivideClock)
  variable clock_counter : integer range 0 to 62;
begin
  if bDivideClock = '1' then
    if rising_edge(sd_clk100_in) then
      -- if we are dividing the clock
      -- 100,000,000 Hz / (400,000 Hz * 2 for double clock * 2 for falling_edge) = 62.5
      -- see section 4.2.1 for 400KHz frequency note
      if (clock_counter = 62) then
        sd_clk_2x <= NOT(sd_clk_2x);
        clock_counter := 0;
      else
        clock_counter := clock_counter + 1;
      end if;
    end if;
  -- else let it go full blast!
  else
    sd_clk_2x <= sd_clk100_in;
  end if;
end process;
 
(I divide the sd_clk_2x in another process down to just sd_clk due to me seeing someone else doing that with his SPI SD card code, so I figure it may come in handy)

 

The problem area is in bold.  I am pretty sure this is bad practice.  But I'm not sure what else to do.

 

I could have two clocks but I have a bunch of utility stuff inside one of my processes (which I can't figure out how to get out due to my inexperience) so if I had two clocks I'd have to duplicate the utility stuff which I definitely know would be the wrong approach.  I know that having huge processes is a Bad Thing but I have not been able to avoid it here even though I've looked for ways.

 

The good news is that despite this warning, I am reading and writing to the SD card (apparently at 50 MHz!) which is awesome!


The warning is because you are passing the clock through logic, and as the logic introduces propagation delays (of maybe 1 ns per LUT) it makes it hard for the tools to assess the validity of the timing.

So the rising edge of your derived clock will be skewed at least 1 ns after the 100 MHz clock, plus whatever routing delays are incurred getting it to where it is being used.

 

The "proper" way is to generate a one-clock-pulse-wide "clock enable" signal, and then use that to select when the slow logic will be active:

 

  if rising_edge(clk_100) and sd_card_clock_enable = '1' then

     ... stuff to happen at 400KHz or 50MHz

  end if;

 

That way the timing will always be referenced to the nice and clean 100MHz clock, and your static timing analysis will be accurate.
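A minimal sketch of how such an enable could be generated is shown here. The entity name, the `slow_mode` input and the divide values are assumptions for illustration: dividing 100 MHz by 250 gives 400 kHz and dividing by 2 gives 50 MHz.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical enable generator; names and divide values are illustrative.
entity sd_clock_enable_gen is
  port (
    clk_100   : in  std_logic;
    slow_mode : in  std_logic;   -- '1' while the card is being initialized
    enable    : out std_logic    -- high for exactly one clk_100 cycle per tick
  );
end sd_clock_enable_gen;

architecture rtl of sd_clock_enable_gen is
  signal counter : integer range 0 to 249 := 0;
begin
  process (clk_100)
    variable limit : integer range 0 to 249;
  begin
    if rising_edge(clk_100) then
      if slow_mode = '1' then
        limit := 249;            -- one tick every 250 cycles: 400 kHz
      else
        limit := 1;              -- one tick every 2 cycles: 50 MHz
      end if;

      if counter >= limit then   -- ">=" also recovers when the mode changes
        counter <= 0;
        enable  <= '1';
      else
        counter <= counter + 1;
        enable  <= '0';
      end if;
    end if;
  end process;
end rtl;
```

Everything stays in the clk_100 domain, so no clock ever passes through a LUT. The clock that goes out to the SD card pin would still need to be produced from a register (or an output DDR primitive) driven in this same domain.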


Ok, I've got a v1.0 release of working code now at 50 MHz!

Many thanks to Hamster for the help!

You can find the code (with readme) here: http://www.rulecity.com/browsetmp/sd_card_writer-17Feb2013.7z

Please feel free to suggest improvements to the stupid bone-headed newbie VHDL mistakes I make!

--------------------------------------------------------


VHDL SD card writer v1.0
By Matt Ownby (http://www.daphne-emu.com and http://www.laserdisc-replacement.com)
16 Feb 2013

This is VHDL code (for the Papilio One FPGA board) that shows how to write to an SD card in a high-speed mode.

Features:
- Uses 4-bit SD mode, not 1-bit SPI mode (faster)
- Uses 50 MHz "high speed" mode, not 25 MHz "normal speed" mode (also faster)
- Logs to Papilio USB serial port (230400 bps) so you can see what is going on

Requirements:
- An SDHC card (older probably won't work) that supports high-speed aka 50 MHz mode.

How to use:
- Get the pins of an SD card (or adapter) somehow connected to Papilio pins. Not discussed here as you are expected to have enough expertise to figure this out on your own :)

What you will see on serial port (230400 bps) if everything is working correctly (this is very terse since string handling is a pain in VHDL):

-
C00
C08
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C55
A41
C02
C03
C07
C55
A06
C06
C24
C13
S

Explanation of above data:
"-" gets sent on reset, it means the process is starting over
"C00" is the reset command sent to the SD card. It expects no response.
"C08" is the voltage-request command and is only supported on newer cards. The Papilio requests normal 3.3V operation. If I/O is not working, or if the SD card does not support this command, this is the last line you will see.
"C55/A41" is the init command. It usually needs to be repeated over and over again until initialization is finished which is normal.
"C02" is the "get SD card CID" command. It is required otherwise I wouldn't bother sending it.
"C03" is the "get SD card's relative address" command. It is also required.
"C07" puts the SD card into transfer mode.
"C55/A06" switches the SD card from a 1-bit bus into a 4-bit bus (4x speed increase).
"C06" switches the SD card from 25 MHz mode to 50 MHz mode (2x speed increase). If you see an "E" after this, it means that the CRC check failed (this happened to me occasionally during development).
"C24" starts writing a single 512 byte block at address 0 (beginning of card).
"C13" requests the card's status and is optional but it makes me feel better to see it working since at this point we are communicating at 50 MHz.
"S" means 'success' and means that the SD card responded that the write operation was completely successful. If you see an "E" here it means that the CRC check probably failed.

If you see anything with "X" in front of it, it means that an unexpected response was received and the unexpected response has been dumped out in ASCII hex format. This is for troubleshooting purposes.
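For anyone curious how a byte can be dumped as ASCII hex on the FPGA side, here is one possible helper. It is a sketch only and not taken from the released code; the package name `hex_util` and function name `hex_char` are invented.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package; names are illustrative only.
package hex_util is
  -- Map a 4-bit value to the ASCII code of its hex digit ('0'-'9', 'A'-'F').
  function hex_char(nibble : std_logic_vector(3 downto 0)) return std_logic_vector;
end package;

package body hex_util is
  function hex_char(nibble : std_logic_vector(3 downto 0)) return std_logic_vector is
    variable n : unsigned(7 downto 0);
  begin
    n := resize(unsigned(nibble), 8);
    if n < 10 then
      return std_logic_vector(n + 48);   -- 48 is the ASCII code of '0'
    else
      return std_logic_vector(n + 55);   -- 10 + 55 = 65, the ASCII code of 'A'
    end if;
  end function;
end package body;

-- Usage: send hex_char(resp(7 downto 4)), then hex_char(resp(3 downto 0)),
-- where resp is a hypothetical 8-bit signal holding one response byte.
```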




Blogspot is a Google-backed site. I think it redirects you to the nearest regional cache, so if you go to blogspot.com you get redirected to blogspot.co.nz

Having said that, I can't make any assumptions as to where Matt is based :) he could be anywhere in the world


This is really great, I just promoted this to the article showcase and we will post it to the blog too!

 

I wonder if we can convert this to a wishbone peripheral to be included with the Papilio SOC system at some point?

 

Jack


Jack, see this OpenCores project for a Wishbone SD/MMC example. Its description reads:

SD (Secure Digital) and MMC memory card controller with Wishbone slave interface. Handles all aspects of card initialization, 512 byte block read, and block write. Hides the complicated SD/MMC memory interface, and presents the user with a simple Fifo interface. Provides transfer speeds up to 24Mbps.

This is not to detract from what Matt is doing, of course. There is definitely value in Matt doing this from scratch as a learning exercise, and his implementation works in four-bit mode and is faster, I think. But he needs to add the Wishbone layer on top, which as I understand it could be as little as splitting the in and out data paths and providing the necessary strobe and control signals.


Hmm, right, I thought I remembered seeing one there in the past that was wishbone and full speed implementation. Will take a look at that one as well.

 

Jack.
