thefloe Posted March 27, 2013

Hi,

I started working on my Papilio Pro and managed to get everything building and downloading. I use a customized ZPUino HDL design derived from the RetroCade; I just removed some code and added my own interface. Now I want to talk to an external SPI device, but I ran into two problems and hope you can help me.

The SPI is initialized this way:

    USPICTL = BIT(SPICP2) | BIT(SPICPOL) | BIT(SPIEN) | BIT(SPISRE) | BIT(SPIBLOCK);

I confirmed with a scope that the sampling clock phase is correct and that data is transmitted.

First, I cannot set the transfer size. From the HDL code I see that the SPITS bits are no longer used; instead, the number of bits to transfer seems to be derived from how the USPIDATA register is accessed. When I try this:

    USPIDATA = 0xAA;
    USPIDATA = 0xAAAA;
    USPIDATA = 0xAAAAAAAA;

I see the same 8-bit pattern transferred each time.

So I thought, no problem, I will just send two 8-bit chunks to get my 16 bits, but then the second problem popped up. This is how I transfer data:

    void setDAC1(byte ch, uint16_t val)
    {
      digitalWrite(DAC1_CS, LOW);
      /* channel in bit 15, bits 13 and 12 set, 12-bit value in bits 11..0 */
      val = (uint16_t)((ch&0x01)<<15) | (1<<13) | (1<<12) | (val&0x0FFF);
      USPIDATA = val>>8;   /* high byte */
      USPIDATA = val;      /* low byte */
      digitalWrite(DAC1_CS, HIGH);
    }

On the scope I can see that the CS pin goes HIGH before the second 8-bit word has been completely transferred. That should not happen, because SPIBLOCK is set and the HDL is built with "zpuino_spiblocking=true". Any tips on that?

Furthermore, I noticed that the I/O speed is very low. I toggled one pin on and off with digitalWrite in a while loop, and the toggle frequency was only about 700 kHz, which seems very low to me.
thefloe Posted March 27, 2013

I solved the transfer-size problem by adding offsets to the register address:

    #define USPIDATA16 (*((&USPIDATA)+2))
    #define USPIDATA24 (*((&USPIDATA)+4))
    #define USPIDATA32 (*((&USPIDATA)+6))

But the blocking feature does not work for these either. I need to add a while loop that waits for the ready bit after the transmission:

    void setDAC1(byte ch, uint16_t val)
    {
      digitalWrite(DAC1_CS, LOW);
      val = (uint16_t)((ch&0x01)<<15) | (1<<13) | (1<<12) | (val&0x0FFF);
      USPIDATA16 = val;
      while ((USPICTL & (1<<SPIREADY)) == 0);   /* wait until the transfer is done */
      digitalWrite(DAC1_CS, HIGH);
    }
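The word that setDAC1 builds can be checked on a host PC before it goes near the SPI bus. The sketch below only repeats the bit layout from the posts above (channel in bit 15, bits 13 and 12 set, 12 data bits). The names `dac1_frame` and `dac1_frame_check` are made up for illustration, and what bits 13 and 12 mean depends on the DAC, which the thread does not name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the 16-bit word sent to the DAC, using
 * the same layout as setDAC1 in the posts above. */
uint16_t dac1_frame(uint8_t ch, uint16_t val)
{
    return (uint16_t)(((uint32_t)(ch & 0x01u) << 15)  /* channel select, bit 15 */
                      | (1u << 13)                    /* fixed bit taken from the post */
                      | (1u << 12)                    /* fixed bit taken from the post */
                      | (val & 0x0FFFu));             /* 12-bit value, bits 11..0 */
}

/* Quick host-side check of the layout. */
void dac1_frame_check(void)
{
    assert(dac1_frame(0, 0x000) == 0x3000);
    assert(dac1_frame(1, 0xFFF) == 0xBFFF);
}
```

Bits of `val` above bit 11 are masked off, so an out-of-range value cannot corrupt the channel or control bits.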
alvieboy Posted March 27, 2013

> On the scope I could see that the CS pin goes HIGH before the second 8-bit word is completely transferred, which should not happen because SPIBLOCK is set and the HDL is built using "zpuino_spiblocking=true". Any tips on that?
> Furthermore I noticed that the I/O speed is very low. I used a while loop and toggled one pin on and off using the digitalWrite function. The toggle frequency was only about 700 kHz, which seems very low to me.

Hi,

Add a read right after issuing the write. That will cause it to block (a write only blocks if a word is already being transmitted).

Regarding digitalWrite: that might be an issue, but it does not make much sense. You're using the Pro, though, and depending on the design it might be somewhat faster or slower because of the memory architecture. Can you send me your generated "bin" file (the one with the loop) and the .pde, so I can see what is happening in simulation? Perhaps it's a non-issue; some computation is needed to find where the pin is. I wrote some accel functions a while back, but I'm not sure they made it into mainline.

Do you need more performance for I/O toggling? If so, we can improve that a bit. Right now, manipulating GPIO requires RMW (read, modify, write), and that can take some time.

Best,
Alvie
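Alvie's suggestion could look roughly like this. It is a sketch under assumptions: the data register is passed in as a pointer so the code also compiles off-target, `spi_write_blocking` is a made-up name rather than a ZPUino library function, and the claim that the read stalls until the word has been shifted out is taken from the reply above, not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the ZPUino core.
 * data_reg: memory-mapped SPI data register (on target, for example
 * the address of USPIDATA16 from the earlier post). */
uint32_t spi_write_blocking(volatile uint32_t *data_reg, uint32_t word)
{
    assert(data_reg != NULL);
    *data_reg = word;   /* starts the transfer; does not wait for it to finish */
    return *data_reg;   /* dummy read: per the reply above, this blocks until done */
}
```

In setDAC1 the call would sit between the two digitalWrite calls and replace the polling loop on SPIREADY. On target the pointer type may need a cast to match the type that the REGISTER macro produces.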
thefloe Posted March 27, 2013

Hi,

thank you for your answer. I will send you the files tomorrow when I'm at work again.

Concerning the I/O toggle speed: it does not need to be faster in general. It's just that when you communicate with several devices over SPI and use digitalWrite to select the different chips, the time from CS low to the start of the SPI transfer, and then from the end of the transfer to CS high, is longer than the 16-bit transfer itself with SPICP2 set. I did not really understand how the digitalWrite function is implemented, but I got a good idea of it.

Is there a way to access the pin more directly? On the AVR, for example, I can access the register directly instead of using digitalWrite. I think I will try to access the pin directly by writing to GPIODATA.
alvieboy Posted March 27, 2013

You can, but the register still needs to be read, modified (most of the time that requires a shift) and written back, because you don't want to mess with the other GPIOs in the same register. And we don't have bit operators on the ZPU.

One of my ideas is to provide another I/O space so you can set any pin to 1 or 0 without having to perform shift operations. I might work on that for the next version; it would give a huge boost at a small HDL cost.

Another idea is to have the SPI controller handle the CS pin itself, but that can be more troublesome on 32-bit transfers.

Yet another idea on my mind is to support DMA-like transfers, where you might be able to control CS at the start and end of the DMA.

Alvie
thefloe Posted March 27, 2013

I would not need a computed shift, because I know which pin I want to change, so the mask would be fixed at compile time. For example:

    GPIODATA = GPIODATA | 0x0100;

or

    GPIODATA = GPIODATA & ~(0x0100);

If the SPI controller handles the chip select, then one would have problems accessing multiple devices on the same bus, or one would need to pass the chip select to use for each transfer.
alvieboy Posted March 27, 2013

You still need to read/write, so the idea is to have a special register like:

    REGISTER(GPIOSET, 24)=1; // set GPIO24 to 1
    REGISTER(GPIOSET, 2)=0;  // set GPIO2 to 0

That would allow you to change a single GPIO without needing to read the value.

Alvie
thefloe Posted March 27, 2013

I think I know what you mean. I don't remember where I saw it (maybe the MSP430), but there is a design that uses two registers for manipulating the pins: one sets a pin high wherever a one is written, and the other clears the position wherever a one is written. Like:

    GPIOSET = 0x01;

will set bit 0 and

    GPIOCLR = 0x01;

will clear it. Maybe I will find some time tomorrow to implement this.
alvieboy Posted March 27, 2013

Yes, at least the TMS320F does that. It can be done either individually (as I wrote) or in blocks of GPIO (as TI does). Should be a simple change, actually. A toggle function can also be useful.

Alvie
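The block-style set/clear/toggle registers discussed above can be modelled on a host PC in a few lines. This is only a behavioural sketch: the enum and function names are invented for illustration and do not correspond to existing ZPUino registers. It shows what the hardware would do with a written value, so the CPU needs a single write and no read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register selectors for one 32-bit GPIO output bank. */
typedef enum { REG_DATA, REG_SET, REG_CLR, REG_TGL } gpio_reg;

/* Returns the new output latch value after `value` is written to `reg`.
 * `out` is the current latch content, which only the hardware needs to know. */
uint32_t gpio_write(uint32_t out, gpio_reg reg, uint32_t value)
{
    switch (reg) {
    case REG_DATA: return value;         /* plain write replaces all 32 bits */
    case REG_SET:  return out | value;   /* ones set the matching pins */
    case REG_CLR:  return out & ~value;  /* ones clear the matching pins */
    case REG_TGL:  return out ^ value;   /* ones invert the matching pins */
    }
    assert(!"unknown register selector");
    return out;
}
```

Writing zeros to the set, clear or toggle register leaves the pins unchanged, which is what makes the scheme safe for pins that share a bank.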
thefloe Posted March 28, 2013

Hi,

so I just quickly edited zpuino_gpio.vhd and added direct writes for set, clear and toggle. Testing the code in a while loop gave me a 4.8 MHz toggle frequency using the toggle register. Here are my changes.

In the file zpuino_gpio.vhd I changed the last process (from line 191):

    process(wb_clk_i)
    begin
      if rising_edge(wb_clk_i) then
        if wb_rst_i='1' then
          gpio_tris_q <= (others => '1');
          ppspin_q <= (others => '0');
          gpio_q <= (others => DontCareValue);
          -- Default values for input/output mapper
          --for i in 0 to 127 loop
          --  input_mapper_q(i) <= 0;
          --  output_mapper_q(i) <= 0;
          --end loop;
        elsif wb_stb_i='1' and wb_cyc_i='1' and wb_we_i='1' then
          case wb_adr_i(10 downto 9) is
            when "00" =>
              case wb_adr_i(6 downto 4) is
                when "000" =>
                  case wb_adr_i(3 downto 2) is
                    when "00" => gpio_q(31 downto 0) <= wb_dat_i;
                    when "01" => gpio_q(63 downto 32) <= wb_dat_i;
                    when "10" => gpio_q(95 downto 64) <= wb_dat_i;
                    when "11" => gpio_q(127 downto 96) <= wb_dat_i;
                    when others =>
                  end case;
                when "001" =>
                  case wb_adr_i(3 downto 2) is
                    when "00" => gpio_tris_q(31 downto 0) <= wb_dat_i;
                    when "01" => gpio_tris_q(63 downto 32) <= wb_dat_i;
                    when "10" => gpio_tris_q(95 downto 64) <= wb_dat_i;
                    when "11" => gpio_tris_q(127 downto 96) <= wb_dat_i;
                    when others =>
                  end case;
                when "010" =>
                  if zpuino_pps_enabled then
                    case wb_adr_i(3 downto 2) is
                      when "00" => ppspin_q(31 downto 0) <= wb_dat_i;
                      when "01" => ppspin_q(63 downto 32) <= wb_dat_i;
                      when "10" => ppspin_q(95 downto 64) <= wb_dat_i;
                      when "11" => ppspin_q(127 downto 96) <= wb_dat_i;
                      when others =>
                    end case;
                  end if;
                when "100" => -- set bits
                  case wb_adr_i(3 downto 2) is
                    when "00" => gpio_q(31 downto 0) <= gpio_q(31 downto 0) or wb_dat_i;
                    when "01" => gpio_q(63 downto 32) <= gpio_q(63 downto 32) or wb_dat_i;
                    when "10" => gpio_q(95 downto 64) <= gpio_q(95 downto 64) or wb_dat_i;
                    when "11" => gpio_q(127 downto 96) <= gpio_q(127 downto 96) or wb_dat_i;
                    when others =>
                  end case;
                when "101" => -- clear bits
                  case wb_adr_i(3 downto 2) is
                    when "00" => gpio_q(31 downto 0) <= gpio_q(31 downto 0) and not wb_dat_i;
                    when "01" => gpio_q(63 downto 32) <= gpio_q(63 downto 32) and not wb_dat_i;
                    when "10" => gpio_q(95 downto 64) <= gpio_q(95 downto 64) and not wb_dat_i;
                    when "11" => gpio_q(127 downto 96) <= gpio_q(127 downto 96) and not wb_dat_i;
                    when others =>
                  end case;
                when "110" => -- toggle bits
                  case wb_adr_i(3 downto 2) is
                    when "00" => gpio_q(31 downto 0) <= gpio_q(31 downto 0) xor wb_dat_i;
                    when "01" => gpio_q(63 downto 32) <= gpio_q(63 downto 32) xor wb_dat_i;
                    when "10" => gpio_q(95 downto 64) <= gpio_q(95 downto 64) xor wb_dat_i;
                    when "11" => gpio_q(127 downto 96) <= gpio_q(127 downto 96) xor wb_dat_i;
                    when others =>
                  end case;
                when others =>
              end case;
            when "01" =>
              if zpuino_pps_enabled then
                input_mapper_q( conv_integer(wb_adr_i(8 downto 2)) ) <= conv_integer(wb_dat_i(6 downto 0));
              end if;
            when "10" =>
              if zpuino_pps_enabled then
                output_mapper_q( conv_integer(wb_adr_i(8 downto 2)) ) <= conv_integer(wb_dat_i(6 downto 0));
              end if;
            when others =>
          end case;
        end if;
      end if;
    end process;

Adding it that way still allows one to use digitalWrite and digitalRead.

In the Arduino register.h I added the following:

    #define GPIOSET(x) REGISTER(GPIOBASE,(16+(x)))
    #define GPIOCLR(x) REGISTER(GPIOBASE,(20+(x)))
    #define GPIOTGL(x) REGISTER(GPIOBASE,(24+(x)))
    #define PINSET(x) GPIOSET(((x)>>5))=(1u<<((x)&0x1F))
    #define PINCLR(x) GPIOCLR(((x)>>5))=(1u<<((x)&0x1F))
    #define PINTGL(x) GPIOTGL(((x)>>5))=(1u<<((x)&0x1F))

I could not get the Arduino IDE to generate a listing file for me, so I could not check how many instructions the shifting and calculation are broken down into. But since all numbers are constant, the compiler should do all the calculations and reduce it to one write.

Does the 4.8 MHz toggle frequency make sense here (ZPUino running at 96 MHz)? By a simple calculation, neglecting the jump for the loop, one write cycle would take 10 clock cycles. When accounting 4 cycles for the loop this would be 8 clock cycles. The test loop:

    while (1) { PINTGL(0); }

Hope someone can make use of this little modification.

Edit: I have now run a test that toggles the I/O pin several times in a row without a loop, and the toggle frequency is now 6.86 MHz. This gives 7 clock cycles per write.
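The arithmetic hidden in the PINSET/PINCLR/PINTGL macros can be written out as plain functions and checked on a host PC. This is a sketch under assumptions: the function and enum names are invented, the word offsets 16, 20 and 24 are the ones from the register.h lines in the post, and the limit of 128 pins is read off the `gpio_q(127 downto 0)` ranges in the VHDL.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets of the first set/clear/toggle register, as in the post. */
enum { GPIO_SET_BASE = 16, GPIO_CLR_BASE = 20, GPIO_TGL_BASE = 24 };

/* Index of the 32-bit bank that holds `pin` (0..3 for 128 pins). */
unsigned pin_bank(unsigned pin)
{
    assert(pin < 128u);
    return pin >> 5;
}

/* Bit mask of `pin` inside its bank. Unsigned, so pin 31 does not overflow. */
uint32_t pin_mask(unsigned pin)
{
    return (uint32_t)1 << (pin & 0x1Fu);
}

/* Word offset of the register to write for `pin`, given one of the bases. */
unsigned pin_reg_offset(unsigned base, unsigned pin)
{
    return base + pin_bank(pin);
}
```

For a constant pin number a compiler can fold all three functions into constants, which matches the expectation in the post that each macro reduces to a single write.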
alvieboy Posted March 28, 2013

Excellent. Can you do a pull request on GitHub for this?

Regarding the I/O writes: in order to allow the FPGA to meet timing, some paths (like IO<->CPU) have additional delay stages (flip-flops). An additional stage is also used for I/O outputs, so that all I/O signals are registered.

As you can see from the image, the signals are set one clock after the instruction is in the execution stage (when you see "decoded_store"), and we change to a state called "state_store". The signals only arrive at the I/O device one clock later, because they are buffered by the I/O controller. The acknowledge from the GPIO is also delayed by one cycle, as it is buffered by the I/O controller before arriving at the CPU. An extra stage is needed after a "store" in order to restore the stack cache.

I did not bother to speed up I/O; memory and stack are more important.
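The cycle counts quoted in the posts above follow from the measured frequencies, assuming the number read off the scope is the frequency of the square wave on the pin, so that one period contains two writes. The function name is made up for this sketch.

```c
#include <assert.h>

/* Clock cycles spent per pin change, given the CPU clock and the
 * square-wave frequency measured on the pin. One full period of the
 * square wave needs two changes (one rising, one falling edge). */
double cycles_per_pin_change(double f_clk_hz, double f_pin_hz)
{
    assert(f_clk_hz > 0.0 && f_pin_hz > 0.0);
    return f_clk_hz / (2.0 * f_pin_hz);
}
```

With a 96 MHz clock this gives 10 cycles per loop iteration at 4.8 MHz and about 7 cycles per write at 6.86 MHz, matching the figures in the thread. The same formula puts the original digitalWrite loop at roughly 69 cycles per call at 700 kHz.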
thefloe Posted March 28, 2013

I just sent you the pull request.

At this point I want to thank you for the effort you put into the project and congratulate you on your well-structured and readable code. I got the whole ZPUino working on the Papilio within hours (and here I have to thank Jack for this wonderful board). Most of the time was spent downloading the ISE WebPACK.
alvieboy Posted March 28, 2013

Thanks. Great that things went well for you. And keep on hacking it.

Alvie