Leaderboard


Popular Content

Showing content with the highest reputation since 07/08/2019 in all areas

  1. 1 point
    Loading your own programs into block RAM there are many ways to load data into a block RAM. These can be broken down into two general categories - pre-synthisis or post-presynthisis. Assuming that you are going to place a compiled AVR project into the program memory of the AVR8 processor you will most probably want to use both at differnt times, as pre-systhisis is best suited for debugging test programs for the FPGA "hardware", where as post-systhisis is really convienient for implmenting new software revisions. Pre-synthesis This involves adding the required data into the FPGA source tree, and then building a new FPGA programming bitstream. Pros: Data is present in BRAM allowing source level simulation Easy to visualise and understand Cons: Long turnaround time, as a full rebuild is required to change RAM contents before configuring devices Data must be converted into a FPGA tool friendly format Data must be known at designtime A full FPGA development toolset is needed to update the BRAM contents If you use the VDHL 'GENERATE' contruct to create multiple BRAM blocks they will all have the same contents, limiting you to around 1k of code. Only really works with BRAMs who's width matches that of the target architecture. Putting data into an 8kx8bit memory out of eight 8kx1bit bit-planes isn't feasible. A new FPGA build is required for each different BRAM contents (in this case programs) Post-synthesis This involves updating the contents of the FPGA bitstram to contain the correct data in the correct location. Pros: All BRAM instances can have different contents Only a limited FPGA toolset is required to merge the new BRAM contents into the bitstream (the data2mem utility). Fast turnaround, as no project rebuild is required Can usually be integrated into the a software development toolchain (e.g. "makefile"). Can handle more complex memory configurations such as 'bit slices'. you build the hardware once and then merge many different programs quickly Cons: Can not be used with source level simulation, but can be used with device level simulation Far more complex to understand and implement How to update BRAM contents pre-synthesis The recipie for this is: Convert your '.hex' file to VDHL 'INIT_xx' attributes. In my Papillo build this is XPM8kx16.vhd Replace the existing 'INIT_xx' in the PM_INST instance in the AVR8 processor Rebuild the project Configure the device with the resulting bit file Note that the 'INIT_xx' parameters are interpeated like really large integers (32 byte integers) in LSB format. So if your wanted to have the following bytes in memory 0000: [tt] 0000: 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF [/tt] Your init string will be: [tt] INIT_00 => X"00000000000000000000000000000000FFEEDDCCBBAA99887766554433221100", [/tt] This doesn't make much sense until put in the context that BRAM blocks can have different widths. I've written a small 'C' program (available in the Papillo playground) which takes an Intel '.hex' file and outputs the lines to cut and past into the VDHL source. Another option (not useful in the context of the AVR8 processor) is to create a '.coe' file, which is used by the Block RAM generator wizard to set the inital values. A small example is: [tt] memory_initialization_radix=16; memory_initialization_vector=00,11,22,33,44,55,66,77,88,99,AA,BB,CC,DD,EE,FF; [/tt] If you do use a '.coe' file you will need to rebuild the BRAM IP and then the entire project after changing the file to include the updated data in your project. It is very long winded! How to update BRAM contents post-synthisis The recipie for this is: Convert your HEX file to a ".mem" file. Use the Xilinx "data2mem" program to insert the data in the '.mem' file into the '.bit' file. Configure the device with the resulting bit file. The merging process uses a "_bd.bmm" file define the "address space" created by one or more BRAM blocks. The BMM file for my AVR8 looks like: [tt] ADDRESS_MAP avrmap PPC405 0 0x00000000:0x00003FFF (16 KBytes). ADDRESS_SPACE rom_code RAMB16 [0x00000000:0x00003FFF] BUS_BLOCK avr_processor/PM_Inst/RAM_Inst[0].RAM_Word [15:0] PLACED = X1Y7; END_BUS_BLOCK; BUS_BLOCK avr_processor/PM_Inst/RAM_Inst[1].RAM_Word [15:0] PLACED = X1Y0; END_BUS_BLOCK; ... BUS_BLOCK avr_processor/PM_Inst/RAM_Inst[7].RAM_Word [15:0] PLACED = X1Y6; END_BUS_BLOCK; END_ADDRESS_SPACE; END_ADDRESS_MAP; [/tt] What first perplexed me was how to get the vaules for the "PLACED = XxYy" clause, but I discoved that the FPGA toolset does this for you. You first create a template "whatever.bmm" (e.g. "progmem.bmm") without the PLACED clauses, and during the Place & Route this gets updated and saved as "whatever_bd.bmm". Simple really once you know how it works. I've chosen to implement my merge as a windows CMD script, which copies in the ".hex" file from the project, then srec_cat converts it, and finally data2mem merges it with the FPGA bitstream: [tt] copy "c:AVR\vgatest\default\vgatest.hex" . srec_cat vgatest.hex -Intel --byte-swap 2 -Data_Only -Line_Length 105 -o vgatest.mem -vmem 8 C:\Xilinx\12.4\ISE_DS\ISE\bin\nt\data2mem -bm progmem_bd.bmm -bd vgatest.mem -bt avr8.bit -o b vgatest.bit [/tt] Due to unexpected behaviour in data2mem the ".mem" file needs to have an even number of bytes per line. A line length of "105" is enough to have 32 bytes on each line and matches nicely with the data in the '.hex' file. Debugging when things go wrong As with all things, debugging is the hard bit. The data2mem utility allows you to dump the contents of a bitstream, and you can then view it in a text editor. If you are lucky to be running on UNIX you can then use the 'diff' utility to compare the bitstream contents before and after the data2mem. As the AVR8 has an interrupt table at the start of memory it's regular structure is a great help. If you also use supply "_bd.bmm" file the BRAMs will be named with their instance names: [tt] C:\Xilinx\12.4\ISE_DS\ISE\bin\nt\data2mem -bm progmem_bd.bmm -bt avr8.bit -d [/tt] Gives me: [tt] ... BRAM data, Column 01, Row 07. Design instance "avr_processor/PM_Inst/RAM_Inst[0].RAM_Word". 00000000: 94 0C 00 30 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 ...0...R...R...R...R...R...R...R 00000020: 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 94 0C 00 52 ...R...R...R...R...R...R...R...R ... [/tt] And there you have it! Hope it saves you a few days of banging you head against what feels like a brick wall.
  2. 1 point
    I just wanted to show off my latest FPGA project, this is not currently Papilio based but I'm contemplating doing a megawing for the Papilio One/Pro if there is enough interest, I don't want to manufacture it but I would open source the design. For the time being this is based on the little Altera Cyclone II boards that can be had for about $13, cheap enough to integrate into projects. I designed a daughterboard that plugs onto these and holds a triple DAC, op-amps with gain and offset adjustments for the X and Y deflection, a DC-DC converter to provide the +/-15V rails for the amps, an audio amplifier with volume control and I tossed in a I2C EEPROM with hopes of using it to emulate the EAROM that stores the high scores in Asteroids Deluxe. Initially I tried using delta-sigma DACs for the deflection as Spritesmods did with his Black Widow project but I was not happy with the result. Then I got the idea to use a VGA DAC since it's a high quality high speed 10 bit triple DAC and there was one on my DE2 dev board that I used for prototyping. It worked perfectly so I carried it over to this design. The monitor is based on a cheap 5" B&W CRT TV with the vertical winding of the deflection yoke rewound and a custom electronics, the initial deflection board was designed by Fred Kono a number of years ago but I'm working on a cleaner and more integrated solution. I've also tested this FPGA board with a G05 and a 19K6101 vector monitor in my fullsized cabinets and it drives them fine. As it stands, I've got Asteroids Deluxe working perfectly using code originally from fpgaarcade.com modified to eliminate the rasterizer and ported to my hardware. As of yesterday I also have the original Asteroids working except I have not modeled the analog circuits for many of the sounds which Deluxe replaced with a POKEY chip. If anyone is interested in helping out with this project there are a few items on the to-do list that I could use a hand with, in the process I'd be happy to post the code I have and provide all of the details for anyone who wants to replicate this to do so. Eventually my plan is to build a few miniature arcade cabinets replicating the original classic games in small desktop form. To do: - Implement missing sound effects in Asteroids, all of the control logic is there and working, still need the thump-thump, ship and saucer firing sound and saucer warble. - Implement EAROM emulation, I'm not entirely certain how feasible this is but I included an EEPROM because there was space. It's trivial to use a block RAM in place of the EAROM but that needs to be backed up to and restored from the serial EEPROM. - Get Lunar Lander working on the platform, I made an initial attempt and was not successful, I plan to give it another go. I've brought out the I2C bus on my board with thoughts of using a I2C ADC for the thrust input. - Implement Omega Race hardware, this is a much larger task since it's a completely different hardware platform. It uses the same 10 bit DACs as the Atari games and the hardware is of similar complexity. One last thing worth mentioning, these little CRT TVs are currently fairly easy to find but they are going away fast. They are essentially useless to most people since the death of analog TV but they are perfect for projects like this, a real CRT is the only way games like this look right at all, if you come across these things pick them up while you can because nobody is making CRTs anymore and they are disappearing fast.
  3. 1 point
    That's fine, thanks for giving it a go. I'll take a look over the weekend and upload a ready to go configuration.
  4. 1 point
    I found Free Range VHDL to be an excellent book, and it's free in digital form too. Anyway back to the Lunar Lander issue, I *think* I accounted for the differences you mention. Attached are snippets of the input 0 buffers for both Asteroids and Lunar Lander. Note that Asteroids uses a 74LS251 which is a selector/multiplexer with an inverted output read on D7. Lunar Lander on the other hand uses a 74LS367 tristate buffer which is non-inverting and read on D7, D6, D2, D1 and D0, both are active when SINP0 goes low. Now for the code, here's a snippet from Asteroids: -- self test, slam, diag step, fire, hyper control_ip0_l <= "11111"; control_ip0_l(4) <= SELF_TEST_SWITCH_L; control_ip0_l(3) <= '1'; -- slam control_ip0_l(2) <= '1'; -- diag step control_ip0_l(1) <= BUTTON(4); -- fire control_ip0_l(0) <= BUTTON(5); -- shield test_l <= SELF_TEST_SWITCH_L; p_input_sel : process(c_addr, dips_p6_l, control_ip0_l, control_ip1_l, clk_3k, halt) begin control_ip0_sel <= '0'; case c_addr(2 downto 0) is when "000" => control_ip0_sel <= '1'; when "001" => control_ip0_sel <= not clk_3k; when "010" => control_ip0_sel <= not halt; when "011" => control_ip0_sel <= not control_ip0_l(0); when "100" => control_ip0_sel <= not control_ip0_l(1); when "101" => control_ip0_sel <= not control_ip0_l(2); when "110" => control_ip0_sel <= not control_ip0_l(3); when "111" => control_ip0_sel <= not control_ip0_l(4); when others => null; end case; p_cpu_data_mux : process(c_addr, ram_dout, rom_dout, vg_dout, zpage_l, pmem_l, vmem_l, sinp0_l, control_ip0_sel, sinp1_l, control_ip1_sel, dpts_l, dips_ip_sel) begin c_din <= (others => '0'); if (sinp0_l = '0') then c_din <= control_ip0_sel & "1111111"; elsif (sinp1_l = '0') then c_din <= control_ip1_sel & "1111111"; elsif (dpts_l = '0') then c_din <= "111111" & dips_ip_sel; elsif (zpage_l = '0') then c_din <= ram_dout; elsif (pmem_l = '0') then c_din <= rom_dout; elsif (vmem_l = '0') then c_din <= vg_dout; end if; end process; Note that each of the signals in the first section is inverted, accounting for the inverted output of the selector/multiplexer. Now looking at a snippet (with some irrelevant bits removed for clarity) from Lunar Lander: p_input_registers : process begin wait until rising_edge(CLK_6); -- diag step, 3khz, slam, self test, halt control_ip0_l <= "11111"; control_ip0_l(4) <= '1'; -- diag step control_ip0_l(3) <= clk_3K; -- 3 khz control_ip0_l(2) <= SELF_TEST_SWITCH_L; control_ip0_l(1) <= '1'; -- slam control_ip0_l(0) <= halt; end process; p_cpu_data_mux : process(c_addr, ram_dout, rom_dout, vg_dout, zpage_l, pmem_l, vmem_l, sinp0_l, control_ip0_l, sinp1_l, control_ip1_sel, dpts_l, potin_l, potval, dips_ip_sel) begin c_din <= (others => '0'); if (sinp0_l = '0') then c_din <= control_ip0_l(4 downto 3) & "111" & control_ip0_l(1) & control_ip0_l(2) & control_ip0_l(0); elsif (sinp1_l = '0') then c_din <= control_ip1_sel & "1111111"; elsif (dpts_l = '0') then c_din <= "111111" & dips_ip_sel; elsif (potin_l = '0') then c_din <= potval; elsif (zpage_l = '0') then c_din <= ram_dout; elsif (pmem_l = '0') then c_din <= rom_dout; elsif (vmem_l = '0') then c_din <= vg_dout; end if; end process; Note that in this case there is no intermediate step so I'm not inverting those signals, and rather than the signals being read one at a time onto bit 7 of the data bus, they are all fed simutaneously to the CPU data in when sinP0_l is low.
  5. 1 point
    Well here's where I am with this so far, I'm posting these in case I get hit by a bus or something, one of my pet peeves is when people show off cool projects but then never release the code, so here's the code. Currently this is set up for an Altera FPGA but it was originally written for Xilinx and I do intend to port it back to the Papilio, doing so is relatively easy. These are set up to work with the program ROMs in an external parallel EEPROM because the little EP2C5T155C8 I'm using lacks sufficient block RAM to hold all the ROMs internally but the FPGA on the Papilio boards is large enough that this is not needed. Anyway here's the state of things: Asteroids Deluxe - Fully working, no high score save yet but that was never implemented by MikeJ who originally released this on fpgaarcade.com Asteroids - Works but several of the sounds are missing which really detracts from the game. If someone wants to work on modeling the missing sound circuits that would be cool. Lunar Lander - This is not working at all yet and I'm banging my head against the wall trying to determine why. The hardware is very, very similar to that of Asteroids, more ROM, less RAM, one of the input banks is done differently, it also has an analog input for the thrust control but that shouldn't be necessary for the attract mode to run. I could really use a bit of help getting this to run at all at which point I'll work on the details. Currently I've got the watchdog disabled because otherwise the reset keeps pulsing. Trying to figure this out has distracted me from finalizing the hardware revisions and polishing up the other two games. I printed out the schematics, highlighted all the changes I could find and then methodically implemented them in the code but it's possible I missed something somewhere. Asteroids Deluxe.zip Asteroids.zip Lunar Lander.zip
  6. 1 point
    First, on latches: The Xilinx Spartan-6 does support latches. The registers in a CLB can be edge-triggered flip-flops or level-sensitive latches. The are other FPGA families that only have edge-triggered flip-flops. The CLK placement error is more serious. This error usually means that there's a conflict between two or more clock signals trying to use the same clock buffers. Your conflict might be the latch clock, or it could be something else. Spartan-6 clocking is very complex (see the Xilinx "Spartan-6 FPGA Clocking Resources User Guide UG382). Take a look at all the signals in the Map/Placement report that Xilinx thinks are clocks, and how Xilinx has assigned them to pins and clock buffers. There may be some surprises there. I've found it useful to print out some of the clock diagrams in UG382 to check for conflicts. It might be useful to post your VHDL source. I mostly do Verilog myself, but others in this community might see from the VHDL why you're having problems. In FPGA designs, it's usually best to have a single clock. You then sample inputs using this clock, usually with an extra flip-flop stage to reduce metastability. Slower clocks are often simulated by the FF's clock enable to convert the master clock to a slower clock.