alvieboy

Members
  • Content count

    865
  • Days Won

    18

Everything posted by alvieboy

  1. alvieboy

    Papilio Pro without SDRAM

    Hmm. Something is definitely broken then. I don't have a board to test with right now, only in February. Let me see if I can do something on HDL simulation for now. But I suspect issue might be in Arduino code instead. You are using the "ZPUino 2.0 boards" version on the IDE, correct ? Alvie
  2. alvieboy

    Papilio Pro without SDRAM

    Rx: 0x7e 0x81 0x01 0x09 0x04 0x60 0x00 0x00 0x2f 0x80 0x05 0xb8 0xd8 0x00 0xb4 0x01 0x0f 0x00 0x00 0x00 0x00 0x00 0xe7 0x6b 0x7e

    This means:

        01 09               - Bootloader version
        04 60 00            - SPI offset: 0x046000. THIS IS WRONG FOR S6 devices. For S6 it needs to be higher. See below.
        0x00 0x2f 0x80      - Max sketch size: 12KB
        0x05 0xb8 0xd8 0x00 - Clock frequency (96000000)
        0xb4 0x01 0x0f 0x00 - Board ID. This is the S3E500 board, not an S6 board. See below.
        0x00 0x00 0x00 0x00 - Memory top. Not set yet.

    Here are the defines for PPRO (common/board_papilio_pro.h):

        /* LX9 bitfile is 0x5327C in size */
        #define SPIOFFSET 0x60000

    The correct board ID for your system should be: 0xB4040F00

    "0F" means 15 address bits for memory, hence 32KB. You need to update it in the Makefile.

    Alvie
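    The field breakdown above can be checked with a short host-side script. This is only a sketch, not part of the ZPUino tools: the function name `decode_reply` is made up for illustration, and it assumes that the 0x7e bytes are frame delimiters, that 0x81 is a reply code, and that the two bytes before the closing delimiter are a checksum (none of which is verified or unescaped here). Only the fields explained in the post are decoded, as big-endian values.

```python
# Reply bytes exactly as quoted in the post.
RX = bytes([0x7e, 0x81, 0x01, 0x09, 0x04, 0x60, 0x00, 0x00, 0x2f, 0x80,
            0x05, 0xb8, 0xd8, 0x00, 0xb4, 0x01, 0x0f, 0x00,
            0x00, 0x00, 0x00, 0x00, 0xe7, 0x6b, 0x7e])


def decode_reply(frame: bytes) -> dict:
    """Decode the fields described in the post (hypothetical helper)."""
    # Assumption: skip delimiter + reply code at the start, and
    # two checksum bytes + delimiter at the end.
    payload = frame[2:-3]

    def be(chunk: bytes) -> int:
        return int.from_bytes(chunk, "big")

    board_id = be(payload[12:16])
    # Third byte of the board ID is the number of memory address bits.
    address_bits = (board_id >> 8) & 0xFF
    return {
        "bootloader_version": (payload[0], payload[1]),
        "spi_offset": be(payload[2:5]),
        "max_sketch_size": be(payload[5:8]),
        "clock_hz": be(payload[8:12]),
        "board_id": board_id,
        "address_bits": address_bits,
        "memory_bytes": 1 << address_bits,
        "memory_top": be(payload[16:20]),
    }


if __name__ == "__main__":
    for name, value in decode_reply(RX).items():
        print(name, value)
```

    Running it on the quoted frame gives an SPI offset of 0x046000, which is below the 0x60000 that board_papilio_pro.h defines, and a board ID of 0xB4010F00 rather than 0xB4040F00.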
  3. alvieboy

    Papilio Pro without SDRAM

    Looks like you're overwriting the bootloader itself when performing the copy. How much memory do you have? Can you try printing the sketch size before copy_sketch? You can use "printhex" for that. Can you also send the output from the programmer in verbose mode? Example:

        zpuinoprogrammer -v -v -v -R -r -d /dev/ttyUSB0
  4. alvieboy

    Pin providing 2.5v instead of 3.3v

    Unless there's a load on the pin it should indeed be around 3.3V. Did you have anything connected to that pin ? Did you have any wing connected to it ? Alvie
  5. We can use conditionals easily on ZPUino, unlike Arduino. So it's an option indeed. Still regarding 1), and so that I don't forget about it, can you add an issue here? https://github.com/alvieboy/ZPUino/issues You can also add 2); I'll look for the best way to implement it without compromising performance too much.
  6. alvieboy

    Papilio Pro without SDRAM

    Not clear yet what might be happening. Are you able to recompile the bootloader so we can see what is going on ? Hmm another question: if you program a sketch and restart board (powercycle) does FPGA load ? The flash offsets for both devices are different. That might explain what is happening. Can you also try to see if upload-to-ram works ? (hold CTRL when hitting the upload button on the IDE) Alvie
  7. alvieboy

    Using FPGA - Am I right to do so?

    @johnbeethem: heh, just tell me if you want a wider bus (quad-SPI, 8-bit SPI). Just make it synchronous. Is GPMC synchronous? Asynchronous [parallel] buses are long dead. It's faster and more reliable to use SPI at, let's say, 80MHz with a quad bus (4 bits per clock, so 320Mbit/s raw, i.e. 40MByte/s) than using plain async stuff. I2C is so slow no one uses it unless a shared bus is required (because CAN is still encumbered by Bosch's patents).
  8. alvieboy

    Using FPGA - Am I right to do so?

    "Out of interest, how would you interface the FPGA to the beaglebone?" Preferrably SPI. "I want to simultaneously sample from a number of ADCs, I do not want to "mux" the adcs" Why not ? your sampling rate is quite small. I happen to work on a system that does sampling of more than 100 analogue signals each 12-bit, with an average sample rate of 400Hz each, with a single ADC. All done in software with a microcontroller much much slower than a beaglebone, but a real-time system nonetheless. Analog muxers are cheaper than ADCs. You can use the FPGA to control all this and present all ADC data on SPI bus (either master or slave) to main CPU in case you are worried. You can also do RMS computations on FPGA (as well as other goodies). Alvie
  9. Hi Jaxartes, Thanks for the reports. Regarding 1), yes, it's probably not well implemented. Let me fix that in the next few days. Regarding 2), I have mixed feelings: on one hand, it would be useful to validate all measures to ensure they stay on screen. On the other, these checks are expensive and will degrade performance significantly. Let me see if I can make those checks conditional at compile time, so one can test sketches with or without them. Alvie
  10. alvieboy

    Using on-board RAM in designs

    I can mock up a 6502 memory interface. But I need to know if you are to use cross-clock. Memory runs at 96-133Mhz, we need to do some domain crossing here.
  11. alvieboy

    Using on-board RAM in designs

    Yes, I mean a cache in conjunction with the DMA engine, with IWF reads and write-combining. I do happen to have one I developed for XThunderCore. It can be adapted to byte-wide (currently it's 32-bit wide), and it's rather fast (as fast as possible, at least). Still, it's a simple cache, direct mapped (two-way associative also possible, but expensive). How fast is your design, in Bytes per Second ?
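    To get a feel for why the access pattern matters with a simple direct-mapped cache in front of SDRAM, here is a small behavioural model. It is only a sketch and not the XThunderCore cache: the class name, the 64-line by 32-byte geometry and the 8MB address space are illustrative assumptions, and only hits and misses on byte reads are counted (no write-combining, no timing).

```python
import random


class DirectMappedCache:
    """Hit/miss model of a direct-mapped cache (illustrative geometry)."""

    def __init__(self, num_lines: int = 64, line_bytes: int = 32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # one tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        line_addr = addr // self.line_bytes   # which memory line holds this byte
        index = line_addr % self.num_lines    # the single slot it may occupy
        tag = line_addr // self.num_lines     # identifies which line is in the slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag                # miss: would trigger a burst refill
        self.misses += 1
        return False


def hit_rate(addresses) -> float:
    cache = DirectMappedCache()
    for addr in addresses:
        cache.access(addr)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    rng = random.Random(0)
    sequential = range(4096)
    scattered = [rng.randrange(8 * 1024 * 1024) for _ in range(4096)]
    print("sequential byte reads:", hit_rate(sequential))
    print("random byte reads:    ", hit_rate(scattered))
```

    With sequential byte reads only the first byte of each 32-byte line misses, so one burst refill serves 32 accesses. With addresses scattered over the whole memory almost every access misses and pays the full SDRAM latency.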
  12. alvieboy

    Using on-board RAM in designs

    You need byte access... that's not very good. The memory controller is optimized for large read/write blocks, not single-word access. Remember SDRAM has a big latency. What's your read/write pattern? Is it sequential or random? Perhaps using a small cache may help here.
  13. alvieboy

    Using on-board RAM in designs

    I wrote a burst controller that you may use. It eases access to DMA. https://github.com/alvieboy/ZPUino-HDL/blob/master/zpu/hdl/zpuino/lib/wishbone/wb_burstctrl.vhd

    Assuming a burst width of 16 words, you should use it like this:

        signal bctrl_sob:   std_logic;
        signal bctrl_rnext: std_logic;
        signal bctrl_wnext: std_logic;
        signal bctrl_req:   std_logic;
        signal bctrl_eob:   std_logic;
        ---
        burstctl: entity work.wb_burstctrl
          port map (
            clk   => wb_clk_i,
            rst   => wb_rst_i,
            sob   => bctrl_sob,
            eob   => bctrl_eob,
            cti   => mi_wb_cti_o,
            stb   => mi_wb_stb_o,
            cyc   => mi_wb_cyc_o,
            stall => mi_wb_stall_i,
            ack   => mi_wb_ack_i,
            req   => bctrl_req,
            rnext => bctrl_rnext,
            wnext => bctrl_wnext
          );

    An explanation of the required signals:

        signal bctrl_sob:   std_logic; -- Start Of Burst. Input to burst controller. Set to one for one clock cycle.
        signal bctrl_rnext: std_logic; -- Read-Next. Output from burst controller.
        signal bctrl_wnext: std_logic; -- Write-Next. Output from burst controller.
        signal bctrl_req:   std_logic; -- Request in progress signal. Output.
        signal bctrl_eob:   std_logic; -- End-of-Burst signal. Output.

    See how VGA uses it: https://github.com/alvieboy/ZPUino-HDL/blob/master/zpu/hdl/zpuino/devices/video/vga_generic.vhd

    Do you need read, write or both?
  14. alvieboy

    Using Config Flash from ZPUino

    Not sure exactly what you mean by reading and writing values. Indeed you have access to the whole of the SPI flash, if that is what you are asking. If you only need reading, you may find "smallfs" useful. It stores a bunch of files in flash, easily accessible using open/fopen and similar means. Just place all files inside a folder named "smallfs" in your sketch and use the SmallFS library or open/read and so on. For more advanced uses, you can: a) access the end of used flash from within ZPUino. That will tell you what you can or cannot use. Use SPI communication with the flash in order to read and erase/write what you want. Best, Alvie
  15. alvieboy

    Papilio One 500k - Speed Grade 4C

    Indeed. You should select the -4 speed grade, otherwise your timing analysis will be wrong. -5 devices are faster than -4. That means that if you are close to the timing limits on -5, the design will never run on a -4. If it were the opposite (a design timed for -4 running on a -5 device) you'd be fine, although the design could have been clocked faster had you chosen the correct speed grade. "C" refers to the temperature range, as in "Commercial" (as opposed to "I", which means "Industrial"). See page 7 of DS312 - Spartan-3E FPGA Family: Complete Data Sheet. Alvie
  16. "this is SDRAM which is a pain to use" Don't scare people, james1095. It's not a pain to use, it's actually a blessing. Internal RAM is too small, and provided you use a decent controller (ZPUino's is quite decent) you can do amazing things. Not as easy as predictable SRAM, but not as complex as driving an AXI bus....
  17. alvieboy

    Mouse Input?

    Ken, I've used the current opencores PS2 module successfully to interface with mice and keyboards. There's also a wishbone wrapper for it. https://github.com/alvieboy/ZPUino-HDL/tree/master/zpu/hdl/zpuino/contrib/ps2 Note however that the module is licensed under the GPL. Alvie
  18. alvieboy

    Platform for use with GHDL

    I use GHDL extensively, and by that I mean on quite large designs. I even simulated Linux on ZCoreV3 using GHDL (a few hours to simulate a few ms). The only problem is you cannot simulate some of the hard-IP cores, like MCB. For those you need to use ISim (the simulator that comes with ISE, built with the fuse command), but performance will be degraded if your design is considered "too big". Which most are, by the way. Believe it or not, I use Modelsim at work and I often export waveforms so I can use GTKWave for visualization and exploring. Modelsim (and the ISE wave viewer) are unfortunately a pain to work with. Alvie
  19. alvieboy

    Fast internal clock on DUO

    Yes, there is no way you can clock anything with a 1ns period (1GHz) on these devices. If you can get any real RTL design to work above 180MHz, you'll be lucky. Alvie
  20. alvieboy

    HDMI wing status

    For RGB panel feeding you don't need that much speed (they are quite slow). HDMI is feasible with S6 up to 720p; the bandwidth is an amazing 2.2Gbit/s, split across three lanes. Memory bandwidth is around 150MBytes/s for simple 16-bit output at 720p. For true-color (24-bit) you need more, often twice that (due to alignment you need 32 bits per pixel), so ~300MBytes/s.
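    The figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: it takes the standard 720p60 pixel clock of 74.25MHz (the post does not state the refresh rate), TMDS coding with 10 bits on the wire per 8-bit colour value, and a memory rate equal to pixel clock times bytes per pixel, i.e. the peak rate while pixels are being sent. The function names are made up for illustration.

```python
# Assumption: 1280x720 at 60 Hz, whose standard pixel clock is 74.25 MHz.
PIXEL_CLOCK_720P60_HZ = 74_250_000


def tmds_bitrate(pixel_clock_hz: int, lanes: int = 3, bits_per_symbol: int = 10) -> int:
    """Total bits per second on the wire, summed over all data lanes."""
    return pixel_clock_hz * bits_per_symbol * lanes


def framebuffer_bandwidth(pixel_clock_hz: int, bytes_per_pixel: int) -> int:
    """Bytes per second fetched from memory while pixels are being output."""
    return pixel_clock_hz * bytes_per_pixel


if __name__ == "__main__":
    print("TMDS total  :", tmds_bitrate(PIXEL_CLOCK_720P60_HZ) / 1e9, "Gbit/s")
    print("16-bit pixel:", framebuffer_bandwidth(PIXEL_CLOCK_720P60_HZ, 2) / 1e6, "MByte/s")
    print("32-bit pixel:", framebuffer_bandwidth(PIXEL_CLOCK_720P60_HZ, 4) / 1e6, "MByte/s")
```

    This gives 2.2275Gbit/s over three lanes, 148.5MByte/s for 16-bit pixels and 297MByte/s for 32-bit aligned pixels, which matches the rounded numbers in the post.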
  21. alvieboy

    HDMI wing status

    Yes, HDMI is working now, once Jack gets back we'll do some more testing and publish the design. Unfortunately it will be hard to do 720p due to memory bandwidth limitations (on Pro), unless it's indexed color or RGB332. 1080p is also not possible due to Spartan6 limitations. For DUO, I am sorry not even RGB332 will work at 720p, so you need to use a smaller resolution. Alvie
  22. alvieboy

    Fast internal clock on DUO

    The TWR report is even clearer:

        Derived Constraints for TS_clkin
        +-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
        |                               |   Period    |       Actual Period       |      Timing Errors        |      Paths Analyzed       |
        |           Constraint          | Requirement |-------------+-------------|-------------+-------------|-------------+-------------|
        |                               |             |   Direct    | Derivative  |   Direct    | Derivative  |   Direct    | Derivative  |
        +-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
        |TS_clkin                       |     20.000ns|      5.000ns|     19.942ns|            0|            0|            0|       262646|
        | TS_memctrl_inst_ctrl_memc3_inf|     10.000ns|      2.930ns|          N/A|            0|            0|          168|            0|
        | rastructure_inst_mcb_drp_clk_b|             |             |             |             |             |             |             |
        | ufg_in                        |             |             |             |             |             |             |             |
        | TS_memctrl_inst_ctrl_memc3_inf|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
        | rastructure_inst_clk_2x_180   |             |             |             |             |             |             |             |
        | TS_memctrl_inst_ctrl_memc3_inf|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
        | rastructure_inst_clk_2x_0     |             |             |             |             |             |             |             |
        | TS_memctrl_inst_ctrl_memc3_inf|     10.000ns|      9.971ns|      9.165ns|            0|            0|       250991|        11487|
        | rastructure_inst_clk0_bufg_in |             |             |             |             |             |             |             |
        | TS_hclk_clkp_i                |      4.000ns|      3.666ns|          N/A|            0|            0|          372|            0|
        | TS_hclk_clkpix_i              |     20.000ns|     14.154ns|          N/A|            0|            0|        11115|            0|
        | TS_hclk_clkn_i                |      4.000ns|      1.730ns|          N/A|            0|            0|            0|            0|
        +-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+

    Here you can see the actual period for each clock, as well as the best case (the fastest) clock achievable. Note all clocks derive from the input clock (see indentation).

    What this tells us is:

    - We wanted 50MHz for the input clock, and it propagates as: 100MHz for "...mcb_drp_clk...", 400MHz for "clk_2x_180" and "clk_2x_0", 100MHz for "...inst_clk0...", 250MHz for clkp_i and clkn_i (HDMI bitclock), and 50MHz for clkpix (note these last three will be dynamically updated by software).

    Alvie
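    The frequencies quoted above are simply the reciprocals of the "Period Requirement" column. A minimal sketch of that conversion follows; the dictionary keys are shortened labels chosen here for readability, not names from the report.

```python
def period_ns_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# Period requirements copied from the report; keys are shortened labels.
REQUIREMENTS_NS = {
    "TS_clkin": 20.0,
    "mcb_drp_clk": 10.0,
    "clk_2x_180": 2.5,
    "clk_2x_0": 2.5,
    "clk0": 10.0,
    "hclk_clkp_i": 4.0,
    "hclk_clkpix_i": 20.0,
    "hclk_clkn_i": 4.0,
}

if __name__ == "__main__":
    for name, period in REQUIREMENTS_NS.items():
        print(f"{name:14s} {period:7.3f} ns -> {period_ns_to_mhz(period):6.1f} MHz")
```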
  23. alvieboy

    Fast internal clock on DUO

    If a PLL or DCM is used, the tools will propagate those timings, so the report shows the "input" clock speed, not the internal speed. After P&R you will have more detailed reports about these, like this (example for Pipistrello with HDMI, which uses many, many clocks):

    Clock summary:

        +---------------------+--------------+------+------+------------+-------------+
        |        Clock Net    |   Resource   |Locked|Fanout|Net Skew(ns)|Max Delay(ns)|
        +---------------------+--------------+------+------+------------+-------------+
        |              sysclk |  BUFGMUX_X2Y2| No   | 1017 |  0.539     |  1.750      |
        +---------------------+--------------+------+------+------------+-------------+
        | slot9/clocking.clk2 |  BUFGMUX_X2Y4| No   |   45 |  0.039     |  1.272      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/c3 |              |      |      |            |             |
        |        _mcb_drp_clk | BUFGMUX_X3Y13| No   |    6 |  0.020     |  1.264      |
        +---------------------+--------------+------+------+------------+-------------+
        | slot9/clocking.clk1 | BUFGMUX_X2Y12| No   |   20 |  0.237     |  1.473      |
        +---------------------+--------------+------+------+------------+-------------+
        |  slot9/mydvid/ioclk |         Local|      |    8 |  0.000     |  1.463      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/c3 |              |      |      |            |             |
        |          _sysclk_2x |         Local|      |   30 |  0.571     |  1.543      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/me |              |      |      |            |             |
        |mc3_wrapper_inst/mem |              |      |      |            |             |
        |c3_mcb_raw_wrapper_i |              |      |      |            |             |
        |     nst/ioi_drp_clk |         Local|      |   22 |  0.000     |  0.002      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/c3 |              |      |      |            |             |
        |      _sysclk_2x_180 |         Local|      |   37 |  0.590     |  1.562      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/me |              |      |      |            |             |
        |mc3_wrapper_inst/mem |              |      |      |            |             |
        |c3_mcb_raw_wrapper_i |              |      |      |            |             |
        |nst/idelay_dqs_ioi_m |              |      |      |            |             |
        |                     |         Local|      |    1 |  0.000     |  0.002      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/me |              |      |      |            |             |
        |mc3_wrapper_inst/mem |              |      |      |            |             |
        |c3_mcb_raw_wrapper_i |              |      |      |            |             |
        |nst/idelay_udqs_ioi_ |              |      |      |            |             |
        |                   m |         Local|      |    1 |  0.000     |  0.002      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/me |              |      |      |            |             |
        |mc3_wrapper_inst/mem |              |      |      |            |             |
        |c3_mcb_raw_wrapper_i |              |      |      |            |             |
        |nst/idelay_dqs_ioi_s |              |      |      |            |             |
        |                     |         Local|      |    1 |  0.000     |  0.002      |
        +---------------------+--------------+------+------+------------+-------------+
        |memctrl_inst/ctrl/me |              |      |      |            |             |
        |mc3_wrapper_inst/mem |              |      |      |            |             |
        |c3_mcb_raw_wrapper_i |              |      |      |            |             |
        |nst/idelay_udqs_ioi_ |              |      |      |            |             |
        |                   s |         Local|      |    1 |  0.000     |  0.002      |
        +---------------------+--------------+------+------+------------+-------------+

    Clock constraints:

        +-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
        |                               |   Period    |       Actual Period       |      Timing Errors        |      Paths Analyzed       |
        |           Constraint          | Requirement |-------------+-------------|-------------+-------------|-------------+-------------|
        |                               |             |   Direct    | Derivative  |   Direct    | Derivative  |   Direct    | Derivative  |
        +-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
        |TS_clkin                       |     20.000ns|      8.000ns|     19.562ns|            0|            0|            0|       452226|
        | TS_hdmi_pre_clock_in          |     44.444ns|     20.000ns|     31.982ns|            0|            0|            0|       271516|
        |  TS_slot9_clocking_pllinst_c1 |     20.000ns|      4.574ns|          N/A|            0|            0|          131|            0|
        |  TS_slot9_clocking_clk0       |     20.000ns|          N/A|          N/A|            0|            0|            0|            0|
        |  TS_slot9_clocking_pllinst_c2 |     20.000ns|     14.392ns|          N/A|            0|            0|       271385|            0|
        | TS_memctrl_inst_ctrl_memc3_inf|     20.000ns|      5.000ns|     19.562ns|            0|            0|            0|       180710|
        | rastructure_inst_sys_clk_ibufg|             |             |             |             |             |             |             |
        |  TS_memctrl_inst_ctrl_memc3_in|     10.000ns|      2.505ns|          N/A|            0|            0|          152|            0|
        |  frastructure_inst_mcb_drp_clk|             |             |             |             |             |             |             |
        |  _bufg_in                     |             |             |             |             |             |             |             |
        |  TS_memctrl_inst_ctrl_memc3_in|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
        |  frastructure_inst_clk_2x_180 |             |             |             |             |             |             |             |
        |  TS_memctrl_inst_ctrl_memc3_in|      2.500ns|      1.499ns|          N/A|            0|            0|            0|            0|
        |  frastructure_inst_clk_2x_0   |             |             |             |             |             |             |             |
        |  TS_memctrl_inst_ctrl_memc3_in|     10.000ns|      9.781ns|          N/A|            0|            0|       180558|            0|
        |  frastructure_inst_clk0_bufg_i|             |             |             |             |             |             |             |
        |  n                            |             |             |             |             |             |             |             |
        +-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+

    And slack report:

        ---------------------------------------------------------------------------------------------------
        Constraint                                | Check       | Worst Case | Best Case  | Timing | Timing
                                                  |             | Slack      | Achievable | Errors | Score
        ---------------------------------------------------------------------------------------------------
        TS_SYS_TO_PIX = MAXDELAY FROM TIMEGRP "GR | SETUP       |    0.012ns |   14.988ns |      0 |      0
        P_sysclk" TO TIMEGRP "GRP_pixclk" 15      | HOLD        |    0.034ns |            |      0 |      0
        ns                                        |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_memctrl_inst_ctrl_memc3_infrastructure | SETUP       |    0.219ns |    9.781ns |      0 |      0
        _inst_clk0_bufg_in = PERIOD TIMEGRP       | HOLD        |    0.263ns |            |      0 |      0
        "memctrl_inst_ctrl_memc3_infrastructur    |             |            |            |        |
        e_inst_clk0_bufg_in" TS_memctrl_i         |             |            |            |        |
        nst_ctrl_memc3_infrastructure_inst_sys_cl |             |            |            |        |
        k_ibufg / 2 HIGH 50% INPUT_JITTER         |             |            |            |        |
        0.2 ns                                    |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_memctrl_inst_ctrl_memc3_infrastructure | MINPERIOD   |    1.001ns |    1.499ns |      0 |      0
        _inst_clk_2x_180 = PERIOD TIMEGRP         |             |            |            |        |
        "memctrl_inst_ctrl_memc3_infrastructure_  |             |            |            |        |
        inst_clk_2x_180" TS_memctrl_inst_         |             |            |            |        |
        ctrl_memc3_infrastructure_inst_sys_clk_ib |             |            |            |        |
        ufg / 8 PHASE 1.25 ns HIGH 50% IN         |             |            |            |        |
        PUT_JITTER 0.2 ns                         |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_memctrl_inst_ctrl_memc3_infrastructure | MINPERIOD   |    1.001ns |    1.499ns |      0 |      0
        _inst_clk_2x_0 = PERIOD TIMEGRP "         |             |            |            |        |
        memctrl_inst_ctrl_memc3_infrastructure_in |             |            |            |        |
        st_clk_2x_0" TS_memctrl_inst_ctrl         |             |            |            |        |
        _memc3_infrastructure_inst_sys_clk_ibufg  |             |            |            |        |
        / 8 HIGH 50% INPUT_JITTER 0.2 ns          |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_memctrl_inst_ctrl_memc3_infrastructure | MINLOWPULSE |   15.000ns |    5.000ns |      0 |      0
        _inst_sys_clk_ibufg = PERIOD TIMEGRP      |             |            |            |        |
        "memctrl_inst_ctrl_memc3_infrastructu     |             |            |            |        |
        re_inst_sys_clk_ibufg" TS_clkin P         |             |            |            |        |
        HASE 5 ns HIGH 50% INPUT_JITTER 0.2 ns    |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_slot9_clocking_pllinst_c2 = PERIOD TIM | SETUP       |    5.608ns |   14.392ns |      0 |      0
        EGRP "slot9_clocking_pllinst_c2"          | HOLD        |    0.388ns |            |      0 |      0
        TS_hdmi_pre_clock_in / 2.22222222 HIGH 50 |             |            |            |        |
        % INPUT_JITTER 0.2 ns                     |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_memctrl_inst_ctrl_memc3_infrastructure | SETUP       |    7.495ns |    2.505ns |      0 |      0
        _inst_mcb_drp_clk_bufg_in = PERIOD        | HOLD        |    0.463ns |            |      0 |      0
        TIMEGRP "memctrl_inst_ctrl_memc           |             |            |            |        |
        3_infrastructure_inst_mcb_drp_clk_bufg_in |             |            |            |        |
        " TS_memctrl_inst_ctrl_memc3_infr         |             |            |            |        |
        astructure_inst_sys_clk_ibufg / 2 HIGH    |             |            |            |        |
        50% INPUT_JITTER 0.2 ns                   |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_SYS_TO_PIXw = MAXDELAY FROM TIMEGRP "G | SETUP       |    8.098ns |    6.902ns |      0 |      0
        RP_pixclk" TO TIMEGRP "GRP_sysclk" 15     | HOLD        |    1.571ns |            |      0 |      0
        ns                                        |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_hdmi_pre_clock_in = PERIOD TIMEGRP "hd | MINLOWPULSE |   24.444ns |   20.000ns |      0 |      0
        mi_pre_clock_in" TS_clkin / 0.45 HIGH     |             |            |            |        |
        50% INPUT_JITTER 0.2 ns                   |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_clkin = PERIOD TIMEGRP "clkin" 20 ns H | MINLOWPULSE |   12.000ns |    8.000ns |      0 |      0
        IGH 50% INPUT_JITTER 0.2 ns               |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_slot9_clocking_pllinst_c1 = PERIOD TIM | SETUP       |   15.426ns |    4.574ns |      0 |      0
        EGRP "slot9_clocking_pllinst_c1"          | HOLD        |    0.124ns |            |      0 |      0
        TS_hdmi_pre_clock_in / 2.22222222 HIGH 50 |             |            |            |        |
        % INPUT_JITTER 0.2 ns                     |             |            |            |        |
        ---------------------------------------------------------------------------------------------------
        TS_slot9_clocking_clk0 = PERIOD TIMEGRP " | N/A         |        N/A |        N/A |    N/A |    N/A
        slot9_clocking_clk0" TS_hdmi_pre_         |             |            |            |        |
        clock_in / 2.22222222 HIGH 50% INPUT_JITT |             |            |            |        |
        ER 0.2 ns                                 |             |            |            |        |
        ---------------------------------------------------------------------------------------------------

    Looking at the input clock constraint:

        |TS_clkin                       |     20.000ns|      8.000ns|     19.562ns

    We see that the input clock has a constraint of 20ns (50MHz), and the best derivative case (i.e., for clocks that derive from this base clock) is 19.562ns (51.12MHz). This clock is then propagated to other clocks, and each one has its own constraint and slack. Note some of these constraints are manual constraints (like the 15ns propagation delay from sysclk to pixclk). For the worst clock, which goes through the memory controller and which generates all other clocks, the best case is "9.781ns" (for a period requirement of 10ns). This is the internal clock for ZPUino and memory (100MHz). Yes, these timing reports are tricky.

    Alvie
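    One way to cross-check the slack report against the constraints table: for the SETUP, MINPERIOD and MINLOWPULSE rows, the worst-case slack is the requirement minus the best case achievable, and the best case achievable is what limits the maximum frequency. The sketch below redoes that arithmetic for a few rows copied from the reports above; the helper names are made up for illustration.

```python
def slack_ns(requirement_ns: float, best_case_ns: float) -> float:
    """Margin left on a constraint; negative would mean a timing error."""
    return round(requirement_ns - best_case_ns, 3)


def fmax_mhz(best_case_ns: float) -> float:
    """Highest clock frequency the reported best-case period allows."""
    return round(1000.0 / best_case_ns, 2)


# (label, requirement in ns, best case achievable in ns), from the reports.
ROWS = [
    ("TS_SYS_TO_PIX (15 ns MAXDELAY)", 15.0, 14.988),
    ("clk0_bufg_in (ZPUino/memory)", 10.0, 9.781),
    ("slot9 pllinst_c2", 20.0, 14.392),
    ("mcb_drp_clk_bufg_in", 10.0, 2.505),
]

if __name__ == "__main__":
    for label, req, best in ROWS:
        print(f"{label:32s} slack {slack_ns(req, best):7.3f} ns")
    # Derivative figure for TS_clkin and the 100 MHz system clock.
    print("TS_clkin derivative limit:", fmax_mhz(19.562), "MHz")
    print("system clock limit       :", fmax_mhz(9.781), "MHz")
```

    The 100MHz system clock has only 0.219ns of slack, so it is the constraint that leaves the least room for clocking the design faster.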
  24. I think Xilinx just launched a new 16nm Zynq. http://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html
  25. Not in synchronous processes with signals. If you want to do that you can use variables. What you seem to be trying to model is a write-first memory, where on a write the output is set to the newly written value, not the previous value. Using a shared variable instead of a signal may help here. See https://github.com/alvieboy/xtc-base/blob/master/generic_dp_ram_r.vhd for an example, and its usage in a register bank: https://github.com/alvieboy/xtc-base/blob/master/regbank_2p.vhd Alvie
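    The difference between the two behaviours can be shown with a small behavioural model. This is a Python sketch of the concept only, not a translation of generic_dp_ram_r.vhd; the class and argument names are made up. In VHDL terms, a signal assigned in a clocked process is not updated until the process suspends, so a read in the same clock edge sees the old contents (read-first), while a variable is updated immediately, so the read sees the new data (write-first).

```python
class SyncRam:
    """Single-port synchronous RAM model with a registered output."""

    def __init__(self, depth: int, write_first: bool):
        self.mem = [0] * depth
        self.write_first = write_first
        self.dout = 0

    def clock(self, addr: int, we: bool = False, din: int = 0) -> int:
        """One rising edge: optional write plus a read of the same address."""
        old = self.mem[addr]              # contents before this edge
        if we:
            self.mem[addr] = din
        if we and self.write_first:
            self.dout = din               # output follows the data being written
        else:
            self.dout = old               # output shows the previous contents
        return self.dout


if __name__ == "__main__":
    wf = SyncRam(16, write_first=True)
    rf = SyncRam(16, write_first=False)
    print("write-first, write cycle:", hex(wf.clock(3, we=True, din=0xAB)))
    print("read-first,  write cycle:", hex(rf.clock(3, we=True, din=0xAB)))
    print("read-first,  next cycle :", hex(rf.clock(3)))
```

    In both modes the memory ends up holding the new value; the modes differ only in what appears on the output during the write cycle.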