Fast internal clock on DUO


Has anyone tried using a fast internal clock on the DUO? I tried the clk_32to960_pll.vhd entity (via its symbol), but as far as I can tell the resulting clock is not stable. I tested it in two ways: (1) serial communication, and (2) watching a blinking LED.

As far as I can tell, the Spartan 6 can theoretically run at up to 1 GHz internally. With a 32 MHz input clock, the PLL would need a multiplier of about 32 to reach 1 GHz.

So, I am asking whether anyone has had success using such a fast clock rate (i.e., faster than about 375 MHz).

Thanks!


I've never tried a very fast clock on an FPGA because a 3 ns or faster clock requires extremely simple combinational logic between registers.  You probably can't do much of anything in 1 ns.  Most of my designs are 33-50 MHz (30-20 ns clock period) which allows quite a few levels of logic and routing.

When you run the Xilinx tools for your tests, what clock period or frequency do you get for post-P&R timing analysis?


FYI, this whole FPGA concept is pretty new to me. I used one years ago, but had an EE standing next to me, helping me through the design/build process. This is the first time that I have taken on a real VHDL coding project.

I was able to use the PLL to get a clock speed of 248 MHz, which is pretty nice. I believe the clock can go faster with very simple logic, but I read in a flyer for the Spartan 6 that its BRAM is limited to 300+ MHz, and it appears that if I approach 300 (say, 288), things get hairy.

Build results show this, although I'm unsure what it all means. What does the max frequency represent?

Timing Summary:
---------------

Minimum period: 47.081ns (Maximum Frequency: 21.240MHz)
Minimum input arrival time before clock: 2.009ns
Maximum output required time after clock: 5.220ns
Maximum combinational path delay: No path found

I had this dream of using a 1 GHz internal clock, but I did not realize that the various parts (guts) of the FPGA have distinct speed limits (e.g., BRAM, multipliers, I/O, etc.).

Since things are generally working at the moment, I will try to stick with the speed that I have.

thanks!


Well, I had my wires crossed! What IS working right now is the serial communication (RX pin), on which I am sending data from a PC at 3.4 Mbps.

What apparently is NOT working is the BRAM: the data that come over are stored in BRAM and then used by the state machine, and the data do not appear to be correct.

So maybe 248 MHz internal clock is too fast for the BRAM? Strange, since the BRAM has a limit of 300+ MHz. But there are probably things involved that I know nothing about. :-)

>> Build results show this, although I'm unsure what it all means. What does the max frequency represent?
>>
>> Timing Summary:
>> ---------------
>>
>> Minimum period: 47.081ns (Maximum Frequency: 21.240MHz)
>> Minimum input arrival time before clock: 2.009ns
>> Maximum output required time after clock: 5.220ns
>> Maximum combinational path delay: No path found

This means that some part of your design has a delay path that's 47.081 ns from clock to clock and you shouldn't clock your design faster than 21.240 MHz.  This is assuming worst case temperature (electrons move slower at higher temperature), worst case supply voltage (electrons move slower at lower supply voltage), and worst case process (some chips are faster than others when manufactured).  At room temperature with good voltage regulation (which you get with Papilio DUO), you can go faster than this -- maybe as much as a factor of 2.

I don't know if the timing analysis considers the PLL multiplier -- I've never used Xilinx PLLs or DLLs, so someone will have to help me out here.  If the timing analysis doesn't consider the PLL, your design is nowhere near fast enough for a 248 MHz clock.  If timing analysis considers the PLL, it's telling you that the maximum reference clock is 21.240 MHz.  That means Papilio DUO's 32 MHz oscillator is 50% too fast, which may be OK at room temperature.  If the clock is too fast, the fast parts of your design will work but the slow parts will be unreliable.

Update: alvieboy says timing analysis considers the PLL multiplier, so in your case the maximum reference clock is 21.240 MHz, which means Papilio DUO's 32 MHz oscillator is too fast for reliable operation.
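The arithmetic behind the last few paragraphs, as a small Python sketch: the reported maximum frequency is the reciprocal of the minimum period, and comparing it with the 32 MHz oscillator gives the "about 50% too fast" figure. The last step assumes that this Timing Summary came from the 248 MHz build and that, as alvieboy says, it is referred to the PLL input, so the worst internal path is the reported period divided by the PLL ratio (248 / 32 = 7.75). The constant names are made up for illustration.

```python
def fmax_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency (MHz) implied by a minimum period (ns)."""
    return 1000.0 / min_period_ns


MIN_PERIOD_NS = 47.081   # "Minimum period" from the Timing Summary above
OSC_MHZ = 32.0           # Papilio DUO on-board oscillator
PLL_OUT_MHZ = 248.0      # internal clock generated by the PLL

max_ref = fmax_mhz(MIN_PERIOD_NS)       # about 21.24 MHz
overshoot = OSC_MHZ / max_ref - 1.0     # fraction by which 32 MHz exceeds it
ratio = PLL_OUT_MHZ / OSC_MHZ           # PLL output / reference clock

# Assumption: the report is referred to the PLL input, so the worst path
# in the 248 MHz domain is the reported period scaled down by the ratio.
internal_path_ns = MIN_PERIOD_NS / ratio
available_ns = 1000.0 / PLL_OUT_MHZ     # period of the 248 MHz clock

print(f"max reference clock: {max_ref:.3f} MHz")
print(f"32 MHz is {overshoot:.0%} above that")
print(f"worst internal path ~{internal_path_ns:.2f} ns, "
      f"but only {available_ns:.2f} ns available at 248 MHz")
```

Under those assumptions the slowest path needs roughly 6 ns while a 248 MHz clock only allows about 4 ns, which is consistent with the design misbehaving at that speed.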

It may be that your design is working as far as timing is concerned, but that you don't have the BRAM configured properly.  BRAM options are complex, and you have to get the pipelining correct.  For example, when reading from BRAM you set up the address in the current clock cycle, but the data isn't available until the next clock cycle.
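The one-cycle read latency is easy to get wrong in a state machine. Below is a minimal behavioural model in Python (not VHDL, and not a model of any particular Xilinx primitive): on each clock edge the RAM registers the address presented during that cycle, and the word only shows up on the output after the edge. The class name `SyncReadRam` is hypothetical, and writes are modelled as read-first (old data on the output during a write); Spartan-6 block RAM can also be configured write-first or no-change.

```python
class SyncReadRam:
    """Behavioural model of a block RAM with a registered (synchronous) read."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.dout = None  # output register; undefined before the first edge

    def clock(self, addr, we=False, din=None):
        """Apply one rising clock edge with the given address/write inputs."""
        read_value = self.mem[addr]  # read-first: old contents on a write
        if we:
            self.mem[addr] = din
        self.dout = read_value       # becomes visible only after this edge


ram = SyncReadRam([10, 20, 30, 40])
seen = []
for addr in [0, 1, 2, 3]:
    seen.append(ram.dout)  # what the logic samples during this cycle
    ram.clock(addr)        # the clock edge at the end of the cycle
print(seen)
```

Logic that sets an address and samples `dout` in the same cycle always sees the previous word (here `seen` lags the addresses by one), so the state machine needs a wait state or must use the data one cycle later.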

Individual blocks in an FPGA are very fast, like the 300 MHz BRAM and look-up tables which are around a nanosecond.  What's slow in an FPGA is the routing, especially when a design gets large and signals have lots of fanout.  My Flavia (Free Logic Array via...) Spartan-6 design is basically a bunch of multiplexers that connect to each other.  I get a 16 ns minimum clock cycle even though the logic is pretty shallow.  All the delay is in the routing, which has to go all over the chip.  Most designs are going to be a LOT better than that.


If a PLL or DCM is used, the tools propagate the timing through it, so the report shows the "input" clock speed, not the internal speed.

After P&R you will have more detailed reports about these, like this (example for Pipistrello with HDMI, which uses many many clocks):

Clock summary:

```
+---------------------+--------------+------+------+------------+-------------+
|        Clock Net    |   Resource   |Locked|Fanout|Net Skew(ns)|Max Delay(ns)|
+---------------------+--------------+------+------+------------+-------------+
|              sysclk |  BUFGMUX_X2Y2| No   | 1017 |  0.539     |  1.750      |
+---------------------+--------------+------+------+------------+-------------+
| slot9/clocking.clk2 |  BUFGMUX_X2Y4| No   |   45 |  0.039     |  1.272      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/c3 |              |      |      |            |             |
|        _mcb_drp_clk | BUFGMUX_X3Y13| No   |    6 |  0.020     |  1.264      |
+---------------------+--------------+------+------+------------+-------------+
| slot9/clocking.clk1 | BUFGMUX_X2Y12| No   |   20 |  0.237     |  1.473      |
+---------------------+--------------+------+------+------------+-------------+
|  slot9/mydvid/ioclk |         Local|      |    8 |  0.000     |  1.463      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/c3 |              |      |      |            |             |
|          _sysclk_2x |         Local|      |   30 |  0.571     |  1.543      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/me |              |      |      |            |             |
|mc3_wrapper_inst/mem |              |      |      |            |             |
|c3_mcb_raw_wrapper_i |              |      |      |            |             |
|     nst/ioi_drp_clk |         Local|      |   22 |  0.000     |  0.002      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/c3 |              |      |      |            |             |
|      _sysclk_2x_180 |         Local|      |   37 |  0.590     |  1.562      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/me |              |      |      |            |             |
|mc3_wrapper_inst/mem |              |      |      |            |             |
|c3_mcb_raw_wrapper_i |              |      |      |            |             |
|nst/idelay_dqs_ioi_m |              |      |      |            |             |
|                     |         Local|      |    1 |  0.000     |  0.002      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/me |              |      |      |            |             |
|mc3_wrapper_inst/mem |              |      |      |            |             |
|c3_mcb_raw_wrapper_i |              |      |      |            |             |
|nst/idelay_udqs_ioi_ |              |      |      |            |             |
|                   m |         Local|      |    1 |  0.000     |  0.002      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/me |              |      |      |            |             |
|mc3_wrapper_inst/mem |              |      |      |            |             |
|c3_mcb_raw_wrapper_i |              |      |      |            |             |
|nst/idelay_dqs_ioi_s |              |      |      |            |             |
|                     |         Local|      |    1 |  0.000     |  0.002      |
+---------------------+--------------+------+------+------------+-------------+
|memctrl_inst/ctrl/me |              |      |      |            |             |
|mc3_wrapper_inst/mem |              |      |      |            |             |
|c3_mcb_raw_wrapper_i |              |      |      |            |             |
|nst/idelay_udqs_ioi_ |              |      |      |            |             |
|                   s |         Local|      |    1 |  0.000     |  0.002      |
+---------------------+--------------+------+------+------------+-------------+
```

Clock constraints:

```
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|                               |   Period    |       Actual Period       |      Timing Errors        |      Paths Analyzed       |
|           Constraint          | Requirement |-------------+-------------|-------------+-------------|-------------+-------------|
|                               |             |   Direct    | Derivative  |   Direct    | Derivative  |   Direct    | Derivative  |
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|TS_clkin                       |     20.000ns|      8.000ns|     19.562ns|            0|            0|            0|       452226|
| TS_hdmi_pre_clock_in          |     44.444ns|     20.000ns|     31.982ns|            0|            0|            0|       271516|
|  TS_slot9_clocking_pllinst_c1 |     20.000ns|      4.574ns|          N/A|            0|            0|          131|            0|
|  TS_slot9_clocking_clk0       |     20.000ns|          N/A|          N/A|            0|            0|            0|            0|
|  TS_slot9_clocking_pllinst_c2 |     20.000ns|     14.392ns|          N/A|            0|            0|       271385|            0|
| TS_memctrl_inst_ctrl_memc3_inf|     20.000ns|      5.000ns|     19.562ns|            0|            0|            0|       180710|
| rastructure_inst_sys_clk_ibufg|             |             |             |             |             |             |             |
|  TS_memctrl_inst_ctrl_memc3_in|     10.000ns|      2.505ns|          N/A|            0|            0|          152|            0|
|  frastructure_inst_mcb_drp_clk|             |             |             |             |             |             |             |
|  _bufg_in                     |             |             |             |             |             |             |             |
|  TS_memctrl_inst_ctrl_memc3_in|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
|  frastructure_inst_clk_2x_180 |             |             |             |             |             |             |             |
|  TS_memctrl_inst_ctrl_memc3_in|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
|  frastructure_inst_clk_2x_0   |             |             |             |             |             |             |             |
|  TS_memctrl_inst_ctrl_memc3_in|     10.000ns|      9.781ns|          N/A|            0|            0|       180558|            0|
|  frastructure_inst_clk0_bufg_i|             |             |             |             |             |             |             |
|  n                            |             |             |             |             |             |             |             |
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
```

And slack report:

```
----------------------------------------------------------------------------------------------------------
  Constraint                                |    Check    | Worst Case |  Best Case | Timing |   Timing
                                            |             |    Slack   | Achievable | Errors |    Score
----------------------------------------------------------------------------------------------------------
  TS_SYS_TO_PIX = MAXDELAY FROM TIMEGRP "GR | SETUP       |     0.012ns|    14.988ns|       0|           0
  P_sysclk" TO TIMEGRP "GRP_pixclk" 15      | HOLD        |     0.034ns|            |       0|           0
      ns                                    |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_memctrl_inst_ctrl_memc3_infrastructure | SETUP       |     0.219ns|     9.781ns|       0|           0
  _inst_clk0_bufg_in = PERIOD TIMEGRP       | HOLD        |     0.263ns|            |       0|           0
     "memctrl_inst_ctrl_memc3_infrastructur |             |            |            |        |
  e_inst_clk0_bufg_in"         TS_memctrl_i |             |            |            |        |
  nst_ctrl_memc3_infrastructure_inst_sys_cl |             |            |            |        |
  k_ibufg / 2 HIGH         50% INPUT_JITTER |             |            |            |        |
   0.2 ns                                   |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_memctrl_inst_ctrl_memc3_infrastructure | MINPERIOD   |     1.001ns|     1.499ns|       0|           0
  _inst_clk_2x_180 = PERIOD TIMEGRP         |             |            |            |        |
   "memctrl_inst_ctrl_memc3_infrastructure_ |             |            |            |        |
  inst_clk_2x_180"         TS_memctrl_inst_ |             |            |            |        |
  ctrl_memc3_infrastructure_inst_sys_clk_ib |             |            |            |        |
  ufg / 8 PHASE         1.25 ns HIGH 50% IN |             |            |            |        |
  PUT_JITTER 0.2 ns                         |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_memctrl_inst_ctrl_memc3_infrastructure | MINPERIOD   |     1.001ns|     1.499ns|       0|           0
  _inst_clk_2x_0 = PERIOD TIMEGRP         " |             |            |            |        |
  memctrl_inst_ctrl_memc3_infrastructure_in |             |            |            |        |
  st_clk_2x_0"         TS_memctrl_inst_ctrl |             |            |            |        |
  _memc3_infrastructure_inst_sys_clk_ibufg  |             |            |            |        |
  / 8 HIGH         50% INPUT_JITTER 0.2 ns  |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_memctrl_inst_ctrl_memc3_infrastructure | MINLOWPULSE |    15.000ns|     5.000ns|       0|           0
  _inst_sys_clk_ibufg = PERIOD TIMEGRP      |             |            |            |        |
      "memctrl_inst_ctrl_memc3_infrastructu |             |            |            |        |
  re_inst_sys_clk_ibufg" TS_clkin         P |             |            |            |        |
  HASE 5 ns HIGH 50% INPUT_JITTER 0.2 ns    |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_slot9_clocking_pllinst_c2 = PERIOD TIM | SETUP       |     5.608ns|    14.392ns|       0|           0
  EGRP "slot9_clocking_pllinst_c2"          | HOLD        |     0.388ns|            |       0|           0
  TS_hdmi_pre_clock_in / 2.22222222 HIGH 50 |             |            |            |        |
  % INPUT_JITTER 0.2 ns                     |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_memctrl_inst_ctrl_memc3_infrastructure | SETUP       |     7.495ns|     2.505ns|       0|           0
  _inst_mcb_drp_clk_bufg_in = PERIOD        | HOLD        |     0.463ns|            |       0|           0
    TIMEGRP         "memctrl_inst_ctrl_memc |             |            |            |        |
  3_infrastructure_inst_mcb_drp_clk_bufg_in |             |            |            |        |
  "         TS_memctrl_inst_ctrl_memc3_infr |             |            |            |        |
  astructure_inst_sys_clk_ibufg / 2 HIGH    |             |            |            |        |
        50% INPUT_JITTER 0.2 ns             |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_SYS_TO_PIXw = MAXDELAY FROM TIMEGRP "G | SETUP       |     8.098ns|     6.902ns|       0|           0
  RP_pixclk" TO TIMEGRP "GRP_sysclk" 15     | HOLD        |     1.571ns|            |       0|           0
       ns                                   |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_hdmi_pre_clock_in = PERIOD TIMEGRP "hd | MINLOWPULSE |    24.444ns|    20.000ns|       0|           0
  mi_pre_clock_in" TS_clkin / 0.45 HIGH     |             |            |            |        |
       50% INPUT_JITTER 0.2 ns              |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_clkin = PERIOD TIMEGRP "clkin" 20 ns H | MINLOWPULSE |    12.000ns|     8.000ns|       0|           0
  IGH 50% INPUT_JITTER 0.2 ns               |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_slot9_clocking_pllinst_c1 = PERIOD TIM | SETUP       |    15.426ns|     4.574ns|       0|           0
  EGRP "slot9_clocking_pllinst_c1"          | HOLD        |     0.124ns|            |       0|           0
  TS_hdmi_pre_clock_in / 2.22222222 HIGH 50 |             |            |            |        |
  % INPUT_JITTER 0.2 ns                     |             |            |            |        |
----------------------------------------------------------------------------------------------------------
  TS_slot9_clocking_clk0 = PERIOD TIMEGRP " | N/A         |         N/A|         N/A|     N/A|         N/A
  slot9_clocking_clk0"         TS_hdmi_pre_ |             |            |            |        |
  clock_in / 2.22222222 HIGH 50% INPUT_JITT |             |            |            |        |
  ER 0.2 ns                                 |             |            |            |        |
----------------------------------------------------------------------------------------------------------
```

Looking at the input clock constraint:

`|TS_clkin                       |     20.000ns|      8.000ns|     19.562ns`

We see that the input clock has a constraint of 20 ns (50 MHz), and the best derivative case (i.e., for clocks derived from this base clock) is 19.562 ns (51.12 MHz).

This clock is then propagated to the other clocks, and each one has its own constraint and slack. Note that some of these are manual constraints (like the 15 ns propagation delay from sysclk to pixclk). The worst clock, which goes through the memory controller and generates all the other clocks, has a best case of 9.781 ns (for a period requirement of 10 ns). This is the internal clock for ZPUino and memory (100 MHz).
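As a cross-check between the tables, the worst-case slack of a PERIOD constraint is the period requirement minus the actual (achieved) period, and a derived constraint written as `TS_parent / N` has the parent's period divided by N. A short Python sketch using numbers copied from the Clock constraints table and the slack report; the helper names are only for illustration.

```python
def slack_ns(required_ns: float, actual_ns: float) -> float:
    """PERIOD slack: positive means the constraint is met with margin."""
    return required_ns - actual_ns


def derived_period_ns(parent_ns: float, divisor: float) -> float:
    """Period of a derived constraint written as 'TS_parent / divisor'."""
    return parent_ns / divisor


# (requirement, actual) periods in ns, copied from the Clock constraints table
PERIODS = {
    "TS_slot9_clocking_pllinst_c1": (20.000, 4.574),
    "TS_slot9_clocking_pllinst_c2": (20.000, 14.392),
    "TS_memctrl_inst_ctrl_memc3_infrastructure_inst_mcb_drp_clk_bufg_in": (10.000, 2.505),
    "TS_memctrl_inst_ctrl_memc3_infrastructure_inst_clk0_bufg_in": (10.000, 9.781),
}

for name, (required, actual) in PERIODS.items():
    print(f"{name}: slack {slack_ns(required, actual):.3f} ns, "
          f"fmax {1000.0 / actual:.1f} MHz")

# TS_hdmi_pre_clock_in = TS_clkin / 0.45; pllinst_c2 = TS_hdmi_pre_clock_in / 2.22222222
hdmi_pre = derived_period_ns(20.0, 0.45)
pll_c2 = derived_period_ns(hdmi_pre, 2.22222222)
print(f"hdmi_pre_clock_in {hdmi_pre:.3f} ns, pllinst_c2 {pll_c2:.3f} ns")
```

The computed slacks (15.426, 5.608, 7.495 and 0.219 ns) match the "Worst Case Slack" column of the slack report, and the derived periods reproduce the 44.444 ns and 20.000 ns requirements in the table.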

Yes, these timing reports are tricky.

Alvie


The TWR report is even clearer:

```
Derived Constraints for TS_clkin
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|                               |   Period    |       Actual Period       |      Timing Errors        |      Paths Analyzed       |
|           Constraint          | Requirement |-------------+-------------|-------------+-------------|-------------+-------------|
|                               |             |   Direct    | Derivative  |   Direct    | Derivative  |   Direct    | Derivative  |
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|TS_clkin                       |     20.000ns|      5.000ns|     19.942ns|            0|            0|            0|       262646|
| TS_memctrl_inst_ctrl_memc3_inf|     10.000ns|      2.930ns|          N/A|            0|            0|          168|            0|
| rastructure_inst_mcb_drp_clk_b|             |             |             |             |             |             |             |
| ufg_in                        |             |             |             |             |             |             |             |
| TS_memctrl_inst_ctrl_memc3_inf|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
| rastructure_inst_clk_2x_180   |             |             |             |             |             |             |             |
| TS_memctrl_inst_ctrl_memc3_inf|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
| rastructure_inst_clk_2x_0     |             |             |             |             |             |             |             |
| TS_memctrl_inst_ctrl_memc3_inf|     10.000ns|      9.971ns|      9.165ns|            0|            0|       250991|        11487|
| rastructure_inst_clk0_bufg_in |             |             |             |             |             |             |             |
|  TS_hclk_clkp_i               |      4.000ns|      3.666ns|          N/A|            0|            0|          372|            0|
|  TS_hclk_clkpix_i             |     20.000ns|     14.154ns|          N/A|            0|            0|        11115|            0|
|  TS_hclk_clkn_i               |      4.000ns|      1.730ns|          N/A|            0|            0|            0|            0|
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
```

Here you can see the actual period for each clock, as well as the best-case (fastest) period achievable.

Note that all clocks derive from the input clock (see the indentation).

What this tells us is:

- We wanted 50 MHz for the input clock, and it propagates as: 100 MHz for "...mcb_drp_clk...", 400 MHz for "clk_2x_180" and "clk_2x_0", 100 MHz for "...inst_clk0...", 250 MHz for clkp_i and clkn_i (HDMI bit clock), and 50 MHz for clkpix (note that these last three are updated dynamically by software).
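The frequencies in that list are just the period requirements from the TWR table converted to MHz (f = 1000 / period in ns). A small Python sketch of the conversion; the short dictionary keys are abbreviations of the constraint names in the table, chosen for readability.

```python
INPUT_PERIOD_NS = 20.0  # TS_clkin requirement (50 MHz)

# Period requirements (ns) from the "Derived Constraints for TS_clkin" table
DERIVED_PERIODS_NS = {
    "mcb_drp_clk": 10.0,
    "clk_2x_180": 2.5,
    "clk_2x_0": 2.5,
    "clk0": 10.0,
    "clkp_i": 4.0,
    "clkn_i": 4.0,
    "clkpix_i": 20.0,
}


def to_mhz(period_ns: float) -> float:
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


for name, period in DERIVED_PERIODS_NS.items():
    ratio = INPUT_PERIOD_NS / period  # frequency relative to the input clock
    print(f"{name:12s} {period:5.1f} ns = {to_mhz(period):5.0f} MHz (x{ratio:g})")
```

The ratio column shows how each derived clock relates to the 50 MHz input, e.g. the two 400 MHz memory clocks are 8x the input.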

Alvie


cawhitely: If you haven't already done so, add a clock frequency constraint to your UCF (User Constraints File).  This will tell the Xilinx tools to take timing more seriously when doing map/place & route and give a warning if it can't meet your constraint.  You may have to reduce your clock multiplier to get your design to work.
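For reference, a clock period constraint in a UCF is usually written as a TNM_NET on the clock net plus a TIMESPEC PERIOD, the same form that shows up in the reports above (e.g. `TS_clkin = PERIOD TIMEGRP "clkin" 20 ns HIGH 50%`). The Python helper below just prints those two lines for a given net and frequency; the helper, the net name `CLK` and the `_grp` group name are hypothetical, so substitute the clock pin name from your own UCF.

```python
def ucf_period_constraint(net: str, freq_mhz: float) -> str:
    """Return UCF lines that put a PERIOD constraint on a clock net."""
    period_ns = 1000.0 / freq_mhz
    group = f"{net}_grp"  # hypothetical timing-group name
    return (
        f'NET "{net}" TNM_NET = "{group}";\n'
        f'TIMESPEC "TS_{net}" = PERIOD "{group}" {period_ns:g} ns HIGH 50%;'
    )


print(ucf_period_constraint("CLK", 32.0))
```

When the clock goes through a PLL or DCM, constraining the 32 MHz input is normally enough, since the tools derive the constraints on the PLL outputs from it (as in alvieboy's reports).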


Thanks for all the specific info, guys. Great food for thought. Over the past couple of days I thought of a way to reduce the data throughput requirement, which may let me eliminate the BRAM completely and optimize the code more. Hopefully I can get a faster, more reliable design.


>> As far as I can tell, theoretically, the Spartan 6 can run at up to 1 GHz internally.

Hi,

are you using the regular Xilinx tools? I remember I once did a similar experiment, and ended up in the 300 MHz range for a simple divider.

If your synthesis tool lets you get away with 1 GHz, something is fundamentally wrong.

It's very important that you do proper timing analysis. Successfully testing a design on hardware is necessary but most definitely not sufficient for any real-world application.

You expect a digital circuit to fail "once in a million years", give or take some.

Beyond identifying a circuit that obviously fails, testing under nominal conditions (e.g. voltage, clock purity, temperature) is practically useless.

You can find the maximum allowed PLL frequency in the data sheet, which is 1 GHz for speed-grade 2 LX devices (page 59).

But that doesn't mean that arbitrary logic can be clocked at that rate, only specialized circuitry.
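To make the PLL arithmetic concrete: the PLL multiplies the reference up to a VCO frequency (input × M / D) and divides that down for each output (VCO / O). The sketch below assumes the 1000 MHz maximum quoted above for speed grade 2 and a 400 MHz VCO minimum; check both against the data sheet for your part. The M/O values shown for 248 MHz are a guess at one setting that produces it, not necessarily what clk_32to960_pll.vhd or the original poster's build uses.

```python
VCO_MIN_MHZ = 400.0   # assumed Spartan-6 PLL VCO minimum; check the data sheet
VCO_MAX_MHZ = 1000.0  # "1 GHz for speed-grade 2 LX devices"


def pll_output_mhz(f_in: float, mult: int, div_in: int, div_out: int) -> float:
    """PLL output frequency: VCO = f_in * mult / div_in, output = VCO / div_out."""
    vco = f_in * mult / div_in
    if not VCO_MIN_MHZ <= vco <= VCO_MAX_MHZ:
        raise ValueError(f"VCO {vco:.1f} MHz outside {VCO_MIN_MHZ}-{VCO_MAX_MHZ} MHz")
    return vco / div_out


print(pll_output_mhz(32.0, 30, 1, 1))  # 32 MHz x 30 = 960 MHz VCO, undivided
print(pll_output_mhz(32.0, 31, 1, 4))  # 992 MHz VCO divided by 4
try:
    pll_output_mhz(32.0, 32, 1, 1)     # 1024 MHz VCO: above the assumed limit
except ValueError as err:
    print(err)
```

Even when the PLL settings are legal, the logic clocked by the output still has to meet timing, which is the point of the replies above.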


Yes, no way you can clock anything at 1ns on these devices.

If you can get any real RTL design to work above 180 MHz, you'll be lucky.

Alvie


Best I could ever do was about 1.6 ns in a Virtex 5, but that was with some tricky primitives placement and hand-routing. Even then the design was constrained to a small area of the chip.

Glad you were able to do away with the BRAMs. I was about to warn that timing constraints don't normally propagate through them (at least they didn't the last time I did an FPGA design with one). There should be an area in the place and route report that indicates constraint coverage, and there should also be a listing of all unconstrained paths. Those are paths the tool did not analyze, and if they matter they should be constrained using internal net names. If the timing tool indicated a system clock of under 30-40 MHz, there's likely some code optimization you can do to get that speed up.

EDIT: Actually, I think the unconstrained paths are listed in the timing report (the .twr file).  It's been too long.