hamster

Medium speed serial comms.


While travelling back from Atlanta to LAX I came up with a scheme that can send data between FPGAs at about 53% bits-per-clock efficiency using a single signal wire (and ground, of course). I've got it working as a proof of concept, so with a 100 MHz clock it works out to be around 50 Mb/s over a single wire (or a pair of wires if using LVDS). It should be good for a bit faster, with tweaking.

 

It has some nice properties that make it quite useful as a board-to-board comms link.

 

So here's how it works.....

 

Sending data

 

The data is 8b/10b coded, ensuring that the signal is DC balanced. COMMA codewords are added to help with synchronisation.

 

Bits are encoded as either 100 or 110, sent out using a DDR register, making each bit 1.5 cycles long. This ensures that there is a rising edge every one and a half cycles, and the data is carried in the position of the falling edge.
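As a rough illustration of the encoding described above, here is a minimal Python sketch. It models the line as a list with one entry per half clock cycle (one DDR output slot); the function names and the list model are assumptions made for illustration, not the actual HDL.

```python
# Sketch only: each list entry is one half-cycle (DDR slot) of the sender's clock.
SYMBOLS = {0: (1, 0, 0), 1: (1, 1, 0)}  # mapping taken from the post

def encode_bits(bits):
    """Expand each (already 8b/10b-coded) bit into its three-slot symbol."""
    line = []
    for bit in bits:
        line.extend(SYMBOLS[bit])
    return line

def rising_edge_positions(line, idle_level=0):
    """Return the slot indices where the line goes from 0 to 1."""
    edges = []
    prev = idle_level
    for i, level in enumerate(line):
        if prev == 0 and level == 1:
            edges.append(i)
        prev = level
    return edges

line = encode_bits([0, 1, 1, 0])
print(line)
print(rising_edge_positions(line))
```

Every symbol starts high and ends low, so the rising edges land every three slots (1.5 clock cycles) regardless of the data, while the falling edge moves by one slot depending on the bit.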

 

 

Receiving data

 

A flip-flop is toggled on the rising edges, giving a clock at 33% of the sender's clock rate, with a 50% duty cycle. That is fed into a DCM, doubled back to 66%, and phase aligned with the rising edge.
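The toggle flip-flop step can be checked with a small Python sketch. It uses the same assumed model of one list entry per half clock cycle; the DCM's doubling and phase alignment are not modelled, and the names are hypothetical.

```python
# Sketch only: one list entry per half-cycle of the sender's clock.
LINE = [1, 0, 0,  1, 1, 0,  1, 1, 0,  1, 0, 0]  # symbols for bits 0, 1, 1, 0

def toggle_on_rising_edges(line):
    """Model a flip-flop that toggles on every 0 -> 1 transition of the line."""
    out = []
    state = 0
    prev = 0
    for level in line:
        if prev == 0 and level == 1:
            state ^= 1
        out.append(state)
        prev = level
    return out

recovered = toggle_on_rising_edges(LINE)
print(recovered)
```

The output is high for three slots and low for three slots, so its period is six half-cycles (three sender clock cycles), which is the 33% rate with a 50% duty cycle mentioned above.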

 

The DCM's 180 degree output is used as the clock for sampling the incoming signal. This gives either a '1' or '0' depending on the bit that was sent, as it samples in the middle of "100" or "110".
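A minimal sketch of the sampling idea, assuming the same one-entry-per-half-cycle model of the line. Sampling one slot after each rising edge stands in for the DCM's 180 degree output; the function name is hypothetical.

```python
# Sketch only: sample the middle slot of each three-slot symbol.
def decode_line(line):
    """Return the bit sampled one slot after every rising edge."""
    bits = []
    prev = 0
    for i, level in enumerate(line):
        if prev == 0 and level == 1 and i + 1 < len(line):
            bits.append(line[i + 1])  # middle of "100" is 0, middle of "110" is 1
        prev = level
    return bits

print(decode_line([1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0]))
```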

 

These are moved into a shift register, which is watched until an 8b/10b COMMA codeword is seen. This syncs the framing between the sender and receiver.
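The shift-register search might look something like the following Python sketch. It matches the two full 10-bit sync patterns (0011111010 and 1100000101) rather than a shorter comma sequence, which is a simplification; the function name and the string representation of code groups are assumptions.

```python
# Sketch only: find the first sync pattern and cut the stream into 10-bit groups.
COMMAS = ("0011111010", "1100000101")

def frame_symbols(bits):
    """Return the 10-bit code groups from the first comma onwards (comma included)."""
    shift = ""
    for i, bit in enumerate(bits):
        shift = (shift + str(bit))[-10:]  # 10-bit shift register
        if shift in COMMAS:
            aligned = bits[i - 9:]
            return ["".join(map(str, aligned[j:j + 10]))
                    for j in range(0, len(aligned) - 9, 10)]
    return []

stream = [1, 0, 1] + [int(c) for c in "0011111010"] + [int(c) for c in "1001110100"] + [1, 1, 0]
print(frame_symbols(stream))
```

The three leading bits and the three trailing bits are discarded, since they do not form a complete group on the recovered boundary.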

 

The 8b/10b data is then framed and can then be decoded back to the original data and pushed into a FIFO, with the write port clocked by the recovered clock.

 

From there the data can be consumed by the receiver.

 

The only downside I can see is that each link requires its own DCM to allow it to sync, allowing at most four links to be received (but that is enough for a 2D mesh of FPGA boards to be built).

 

 

What do you think? Is it worth progressing with?


OK, it's an interesting idea. Now, I am wondering if there is any way to get rid of the DCM/PLL. Are you planning a constant-frequency receiver? I wonder if you could use a sort of delay tap to generate the falling edge of the clock (and hence the sampling point).


Alvie, as you ponder the project, here is a simulation of the locking, and finding the sync pattern (0011111010 or 1100000101).

 

You could do it without a DCM, but you would need a really fast clock domain, counting and averaging the ticks between rising edges, then sampling at (average_ticks/2) after the rising edge. If a DDR register is used, then the fast domain would need to be about 4x faster than the raw bit rate (or about 5x faster than the user data bit rate).
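Here is a rough Python sketch of that DCM-less receiver, assuming an ideal line with no jitter and a whole number of fast-domain ticks per symbol. The helper names are hypothetical, and a plain mean stands in for whatever averaging the hardware would use.

```python
# Sketch only: fast-domain receiver that measures the rising-edge spacing.
def oversample(bits, ticks_per_symbol):
    """Model the line as seen by the fast domain: high for 1/3 or 2/3 of a symbol."""
    wave = []
    for bit in bits:
        for k in range(ticks_per_symbol):
            high = 3 * k < (2 if bit else 1) * ticks_per_symbol
            wave.append(1 if high else 0)
    return wave

def receive(wave):
    """Sample average_ticks/2 after each rising edge once one interval is known."""
    bits, intervals = [], []
    last_edge = None
    prev = 0
    for t, level in enumerate(wave):
        if prev == 0 and level == 1:
            if last_edge is not None:
                intervals.append(t - last_edge)
            last_edge = t
        prev = level
        if intervals:
            offset = (sum(intervals) // len(intervals)) // 2
            if t - last_edge == offset:
                bits.append(level)
    return bits

sent = [0, 1, 1, 0, 1, 0, 0, 1]
print(receive(oversample(sent, 4)))
```

The first symbol is lost in this sketch because no edge-to-edge interval has been measured yet, so the output starts from the second bit sent.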

 

So with an Fmax of 200MHz you could transfer about 40Mb/s per channel over multiple channels without using a DCM (limited by RX speed), or 133Mb/s per channel if you are using a DCM (limited by TX speed).

[Attached image: simulation of the receiver locking and finding the sync pattern]


I currently use

  100 => 0

  110 => 1

which (ignoring DDR) gives 1 bit per three clock ticks = 33%, and when used with 8b/10b gives about 0.267 bits per clock tick (1/3 x 8/10).

 

I've also been thinking about this coding scheme

 10000 => 00

 11000 => 01

 11100 => 10

 11110 => 11

This transfers two bits per five clock ticks = 40%, or 0.32 bits per clock tick after 8b/10b decoding. However, it destroys the DC-balanced nature of the 8b/10b. The decoding is also far more complex, because the DCM would need to multiply by 5 and then shift by 180 degrees, and sample on three out of five cycles, whereas the current case reduces to a simple scheme.
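To see why the two-bit symbols upset the DC balance, here is a small Python sketch comparing the fraction of time the line is high under both mappings. The test stream is simply a bit-balanced pattern chosen for illustration, not a real 8b/10b sequence, and the function names are hypothetical.

```python
from fractions import Fraction

# Sketch only: compare line duty cycle for the 1-bit and 2-bit symbol mappings.
BIT_SYMBOLS = {0: (1, 0, 0), 1: (1, 1, 0)}
PAIR_SYMBOLS = {
    (0, 0): (1, 0, 0, 0, 0),
    (0, 1): (1, 1, 0, 0, 0),
    (1, 0): (1, 1, 1, 0, 0),
    (1, 1): (1, 1, 1, 1, 0),
}

def encode_single(bits):
    return [level for bit in bits for level in BIT_SYMBOLS[bit]]

def encode_pairs(bits):
    pairs = zip(bits[0::2], bits[1::2])
    return [level for pair in pairs for level in PAIR_SYMBOLS[pair]]

def duty_cycle(line):
    """Fraction of slots in which the line is high."""
    return Fraction(sum(line), len(line))

balanced_bits = [0, 1] * 10  # equal numbers of 0s and 1s
print(duty_cycle(encode_single(balanced_bits)))
print(duty_cycle(encode_pairs(balanced_bits)))
```

With one bit per symbol, equal numbers of 0s and 1s always average out to a 50% duty cycle. With two bits per symbol the duty cycle depends on which pairs occur, and a bit-balanced stream says nothing about that.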

 

If I used 64b/66b frames then the 2-bit encoding would require 66/2*5 = 165 ticks to transfer 64 bits of user data = 0.388 bits per clock tick, giving roughly 45% higher end-to-end throughput than my current scheme (64/165 versus 4/15), but it would be very complex and heavy (especially the 64b/66b coding, at a guess).
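The efficiency figures for the schemes discussed in this thread can be recomputed with exact fractions. This is just arithmetic on the numbers given in the posts; the function name is hypothetical, and because the fractions are exact the results can differ in the last digit from figures rounded by hand.

```python
from fractions import Fraction

def user_bits_per_tick(bits_per_symbol, ticks_per_symbol, payload_bits, coded_bits):
    """Line-code efficiency multiplied by block-code efficiency."""
    raw = Fraction(bits_per_symbol) / Fraction(ticks_per_symbol)
    return raw * Fraction(payload_bits, coded_bits)

current = user_bits_per_tick(1, 3, 8, 10)                      # 100/110 symbols, 8b/10b
current_ddr = user_bits_per_tick(1, Fraction(3, 2), 8, 10)     # same, 1.5 cycles per bit
two_bit_8b10b = user_bits_per_tick(2, 5, 8, 10)                # 5-slot symbols, 8b/10b
two_bit_64b66b = user_bits_per_tick(2, 5, 64, 66)              # 5-slot symbols, 64b/66b
gain = two_bit_64b66b / current - 1

for name, value in [("current", current), ("current with DDR", current_ddr),
                    ("2-bit + 8b/10b", two_bit_8b10b),
                    ("2-bit + 64b/66b", two_bit_64b66b), ("gain", gain)]:
    print(f"{name}: {float(value):.3f}")
```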

 

A lot of the inefficiency is because the clock is explicitly present in the rising edge of the signal, allowing it to be used to drive the DCM. Even with optimal coding it could only reach 0.50 bits per tick. What I need is a good book on high speed clock recovery when using FPGAs...

 

Maybe there is a way that the idle sequence could be 0101010101010101..., which can then be used to assert/release the DCM's FREEZEDCM signal. As per the documentation, the signal "prevents tap adjustment drift in the event of a lost CLKIN input. The DCM is then configured into a free-run mode". A bit like the preamble on an Ethernet packet.


What error correction do you anticipate?

 

Transputer emulation time 8)

 

I've got the physical side started and tested, but at the moment I'm slowly chipping away at the 8b/10b layer when I'm not distracted with work, shiny new wing PCBs and TMDS. The 8b/10b can do the first layer of error detection, by rejecting invalid symbols on the wire.

 

On top of that it could either be a stream of bytes, or maybe have some sort of framing based around a 'Kx.y' symbol to start a frame, and a different 'Kx.y' symbol to end a frame, with an Ethernet-style CRC checksum within the packet. That should be pretty much bullet-proof.
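A minimal Python sketch of that framing idea, assuming symbols are modelled as (kind, value) tuples after 8b/10b decoding. The K_START and K_END markers are placeholders, since the actual 'Kx.y' symbols have not been chosen, and zlib.crc32 is used as the Ethernet-style CRC-32.

```python
import zlib

# Hypothetical control symbols; the real Kx.y choices are not decided in the post.
K_START = ("K", "start")
K_END = ("K", "end")

def build_frame(payload: bytes):
    """Wrap payload plus a 4-byte CRC-32 between start and end control symbols."""
    crc = zlib.crc32(payload).to_bytes(4, "little")
    return [K_START] + [("D", b) for b in payload + crc] + [K_END]

def parse_frame(symbols):
    """Return the payload, or None if the delimiters or the CRC do not check out."""
    if len(symbols) < 6 or symbols[0] != K_START or symbols[-1] != K_END:
        return None
    body = symbols[1:-1]
    if any(kind != "D" for kind, _ in body):
        return None
    data = bytes(value for _, value in body)
    payload, crc = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != crc:
        return None
    return payload

frame = build_frame(b"hello")
print(parse_frame(frame))

corrupted = list(frame)
corrupted[2] = ("D", corrupted[2][1] ^ 0x01)  # flip one bit in a data symbol
print(parse_frame(corrupted))
```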


I'd suggest HDLC framing (either bit or octet-stream). It's pretty much tested, and proven robust.
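For the octet-stream variant, the byte stuffing can be sketched in Python as below. This shows only the flag and escape handling used in HDLC-like octet-stuffed framing (flag 0x7E, escape 0x7D, XOR with 0x20); address, control and FCS fields are left out, and the function names are hypothetical.

```python
FLAG = 0x7E
ESCAPE = 0x7D
XOR_MASK = 0x20

def hdlc_encode(payload: bytes) -> bytes:
    """Delimit the payload with flags, escaping any flag or escape bytes inside it."""
    out = bytearray([FLAG])
    for byte in payload:
        if byte in (FLAG, ESCAPE):
            out += bytes([ESCAPE, byte ^ XOR_MASK])
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)

def hdlc_decode(frame: bytes) -> bytes:
    """Strip the flags and undo the escaping."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("frame is not delimited by flag bytes")
    out = bytearray()
    escaped = False
    for byte in frame[1:-1]:
        if escaped:
            out.append(byte ^ XOR_MASK)
            escaped = False
        elif byte == ESCAPE:
            escaped = True
        else:
            out.append(byte)
    return bytes(out)

encoded = hdlc_encode(bytes([0x01, 0x7E, 0x7D, 0x02]))
print(encoded.hex())
print(hdlc_decode(encoded).hex())
```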

 

You can also take a look at Aurora. "The Aurora 8B/10B protocol is an open standard and is available for implementation by anyone without restriction."

 

I'm sending you the Aurora spec by email; I have it right here.

 

Alvie

