Using SPI on the ZPUino


thefloe


Hi,

 

I started working on my Papilio Pro and managed to get everything building and downloading. I use a customized ZPUino HDL design derived from the RetroCade. I just removed some code and added my own interface.

 

Now I wanted to talk to an external SPI device but encountered two problems and I hope you can help me. The SPI is initialized this way:

USPICTL = BIT(SPICP2) | BIT(SPICPOL) | BIT(SPIEN)| BIT(SPISRE) | BIT(SPIBLOCK);

 

I confirmed with a scope that the sampling clock phase is correct and that data is transmitted. However, I cannot set the transfer size. From the HDL code I see that the SPITS bits are no longer used; instead, the number of bits to transfer seems to be inferred from how the USPIDATA register is accessed.

When trying this:

USPIDATA = 0xAA;
USPIDATA = 0xAAAA;
USPIDATA = 0xAAAAAAAA;

 

 

In all three cases I see the same 8-bit pattern being transferred. So I thought, no problem, I will just send two 8-bit chunks to get my 16 bits, but then another problem popped up. This is how I transfer the data:



void setDAC1(byte ch, uint16_t val){
  digitalWrite(DAC1_CS, LOW);
  // bit 15: channel, bits 13 and 12: set, bits 11..0: value
  val = (uint16_t)((ch&0x01)<<15) | (1<<13) | (1<<12) | (val&0x0FFF);
  USPIDATA = val>>8;  // high byte first
  USPIDATA = val;     // then low byte
  digitalWrite(DAC1_CS, HIGH);
}
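The 16-bit word that setDAC1 assembles can be checked on a PC before looking at the scope. The following is only a sketch using the bit layout from the function above (channel in bit 15, bits 13 and 12 set, 12-bit value in bits 11..0); the helper names dac_frame, frame_hi and frame_lo are made up for this illustration and are not part of the ZPUino libraries.

```c
#include <stdint.h>

/* Build the 16-bit command word the same way setDAC1 does:
   bit 15 = channel, bits 13 and 12 set, bits 11..0 = value. */
uint16_t dac_frame(uint8_t ch, uint16_t val)
{
    return (uint16_t)(((uint16_t)(ch & 0x01u) << 15)
                      | (1u << 13) | (1u << 12)
                      | (val & 0x0FFFu));
}

/* The two bytes that go out over SPI when the word is sent
   as two 8-bit transfers, high byte first. */
uint8_t frame_hi(uint16_t frame) { return (uint8_t)(frame >> 8); }
uint8_t frame_lo(uint16_t frame) { return (uint8_t)(frame & 0xFFu); }
```

For channel 1 and value 0x0ABC this yields the word 0xBABC, so the scope should show 0xBA followed by 0xBC.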

 

 

 

On the scope I could see that the CS pin goes HIGH before the second 8-bit word has been completely transferred, which should not happen because SPIBLOCK is set and the HDL is built with "zpuino_spiblocking=true".

 

Any tips on that?

 

Furthermore, I noticed that the IO speed is very low. I used a while loop and toggled one pin on and off using the digitalWrite function. The toggle speed was only about 700 kHz, which seems very low to me.


I was able to solve the transfer-size problem by adding offsets to the register address:

#define USPIDATA16 *((&USPIDATA)+2)
#define USPIDATA24 *((&USPIDATA)+4)
#define USPIDATA32 *((&USPIDATA)+6)
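What these defines do can be tried out on a PC with a mock register block. This sketch assumes that USPIDATA is a 32-bit register, so that +2, +4 and +6 are word offsets (8, 16 and 24 bytes). The array spi_regs, the choice of index 1 for USPIDATA and the helper data_byte_offset are stand-ins invented for the example, not taken from the ZPUino headers.

```c
#include <stdint.h>

/* Mock register block standing in for the SPI peripheral.
   ASSUMPTION: the data register sits at word index 1. */
static volatile uint32_t spi_regs[8];
#define USPIDATA   spi_regs[1]

/* Same pointer arithmetic as in the post, with outer parentheses added. */
#define USPIDATA16 (*((&USPIDATA) + 2))
#define USPIDATA24 (*((&USPIDATA) + 4))
#define USPIDATA32 (*((&USPIDATA) + 6))

/* Byte distance of a register from USPIDATA, to make the offsets visible. */
unsigned data_byte_offset(volatile uint32_t *reg)
{
    return (unsigned)((reg - &USPIDATA) * (long)sizeof(uint32_t));
}
```

In this model a write to USPIDATA16 lands two words (8 bytes) above USPIDATA, which is how the access address can select the transfer width without a separate size field.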

 

 

But the blocking feature does not work here either. I need to add a while loop that waits for the ready bit after the transmission:

void setDAC1(byte ch, uint16_t val){
  digitalWrite(DAC1_CS, LOW);
  val = (uint16_t)((ch&0x01)<<15) | (1<<13) | (1<<12) | (val&0x0FFF);
  USPIDATA16 = val;  // one 16-bit transfer
  while ((USPICTL & (1<<SPIREADY)) == 0);  // wait until the transfer has finished
  digitalWrite(DAC1_CS, HIGH);
}


 

Hi,

 

Add a read right after issuing the write. That will cause it to block (a write only blocks while a previous word is still being transmitted).
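To illustrate why the extra read helps, here is a small host-side model in C. It is not the real SPI core: the spi_model type and all function names are hypothetical, and the only behaviour modelled is what is described in this thread, namely that a write waits for a previous word but not for its own, while a read waits until the current word has been shifted out.

```c
#include <stdint.h>

/* Simplified model: busy_ticks is the number of SPI clock ticks the
   running transfer still needs. Time only advances while the CPU is
   blocked; the CPU's own instructions are treated as instantaneous. */
typedef struct {
    int busy_ticks;
} spi_model;

static void spi_wait_idle(spi_model *s)
{
    while (s->busy_ticks > 0)
        s->busy_ticks--;
}

/* A write blocks only for a transfer that is already running,
   then starts the new one and returns immediately. */
void spi_write(spi_model *s, int bits)
{
    spi_wait_idle(s);
    s->busy_ticks = bits;
}

/* A read blocks until the current word is completely out. */
uint32_t spi_read(spi_model *s)
{
    spi_wait_idle(s);
    return 0;  /* received data is not modelled */
}

/* Returns 1 if the bus is idle at the point where CS would be raised. */
int transfer_done_before_cs_high(int use_dummy_read)
{
    spi_model s = { 0 };
    spi_write(&s, 8);          /* high byte */
    spi_write(&s, 8);          /* low byte: waits for the first, then returns */
    if (use_dummy_read)
        (void)spi_read(&s);    /* the extra read suggested above */
    return s.busy_ticks == 0;
}
```

Without the dummy read the model still has 8 ticks pending when CS would go high, which matches what was seen on the scope. On the target the equivalent would presumably be a dummy read of USPIDATA after the last write in setDAC1, before raising DAC1_CS; the thread does not show the exact line, so treat that as an assumption.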

 

Regarding digitalWrite: that might be an issue, but the number does not make much sense to me. You're using the Pro, though, and depending on the design and its memory architecture it might be somewhat faster or slower.

 

Can you send me your generated "bin" file (the one with the loop) and the .pde so I can see what is happening in simulation? Perhaps it's a non-issue; some computations need to be done to find where the pin is. I wrote some accel functions a while back, but I'm not sure if they made it into mainline.

 

Do you need more performance for IO toggling? If so, we can improve that a bit. Right now, manipulating GPIO requires RMW (read, modify, write), and that can take some time.

 

Best,

Alvie


Hi,

 

Thank you for your answer.

 

I will send you the files tomorrow when I'm at work again.

 

Concerning the I/O toggle speed: it does not need to be faster in general. It's just that if you want to communicate with several devices over SPI and use digitalWrite to select the different chips, the time from CS low to the start of the SPI transfer, plus the time from the end of the transfer to CS high, is longer than the 16-bit transfer itself with SPICP2 set.

 

I did not fully understand how the digitalWrite function is implemented, but I get the general idea. Is there a way to access the pin more directly?

On the AVR, for example, I can access the register directly instead of using digitalWrite.

 

I think I will try to access the pin directly by writing to GPIODATA.


You can, but the register still needs to be read, modified (which most of the time requires a shift) and written back, because you don't want to mess with the other GPIOs in the same register. And we don't have bit operators on the ZPU.
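The read-modify-write sequence described here can be sketched as follows. This is not the actual digitalWrite implementation from the ZPUino core; gpio_words is a plain array standing in for the memory-mapped GPIO data registers so that the example runs on a PC, and pin_write_rmw is a hypothetical name.

```c
#include <stdint.h>

/* Stand-in for the four 32-bit GPIO data words (128 pins). */
static volatile uint32_t gpio_words[4];

/* Change one pin without disturbing the other 31 pins in the same word. */
void pin_write_rmw(unsigned pin, int level)
{
    unsigned word = (pin >> 5) & 3u;                  /* which 32-bit register */
    uint32_t mask = (uint32_t)1 << (pin & 0x1Fu);     /* shift computed at run time */

    uint32_t v = gpio_words[word];                    /* read   */
    v = level ? (v | mask) : (v & ~mask);             /* modify */
    gpio_words[word] = v;                             /* write  */
}
```

Each call costs a load, a shift, a logic operation and a store, which is where the time goes compared with a single store.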

 

One of my ideas is to provide another IO space so you can set any pin to 1 or 0 without having to perform shift operations. I might work on that for the next version; it would give a huge boost at a small HDL cost.

 

Another idea is to have the SPI controller handle the CS pin itself, but that can be more troublesome on 32-bit transfers. Another idea on my mind is to support DMA-like transfers, and there you might be able to control that at start/end of the DMA.

 

Alvie


I would not need a computed shift, as I know which pin I want to change, so the mask would be fixed at compile time. For example: GPIODATA = GPIODATA | 0x0100; or GPIODATA = GPIODATA & ~(0x0100);

 

If the SPI controller handles the chip select, then one would have problems accessing multiple devices on the same bus, or one would need to pass which chip select to use for each transfer.


I think I know what you mean. I don't remember where I saw it (maybe the MSP430), but there is a design that uses two registers for manipulating the pins: one that sets the pins where a one is written, and another that clears the positions where a one is written.

 

Like: GPIOSET = 0x01; will set bit 0 and GPIOCLR = 0x01; will clear it.
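The intended semantics of such a register pair can be written down as a small C model. The gpio_model struct and both function names are invented for illustration; on real hardware the OR and AND-NOT would happen inside the peripheral, so the CPU only issues a single write and never has to read the register first.

```c
#include <stdint.h>

/* Host-side model of one 32-bit GPIO output word. */
typedef struct {
    uint32_t out;
} gpio_model;

/* Write to the "set" register: 1 bits drive those pins high,
   0 bits leave the other pins untouched. */
void gpio_write_set(gpio_model *g, uint32_t mask)
{
    g->out |= mask;
}

/* Write to the "clear" register: 1 bits drive those pins low,
   0 bits leave the other pins untouched. */
void gpio_write_clr(gpio_model *g, uint32_t mask)
{
    g->out &= ~mask;
}
```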

 

Maybe I will find some time tomorrow to implement this.


Hi, so I just quickly edited zpuino_gpio.vhd and added direct writes for set, clear and toggle. Testing the code in a while loop gave me a 4.8 MHz toggle frequency using the toggle write.

 

Here are my changes:

 

In the file zpuino_gpio.vhd I changed the last process (from line 191):

process(wb_clk_i)
begin
  if rising_edge(wb_clk_i) then
    if wb_rst_i='1' then
      gpio_tris_q <= (others => '1');
      ppspin_q <= (others => '0');
      gpio_q <= (others => DontCareValue);
      -- Default values for input/output mapper
      --for i in 0 to 127 loop
      --  input_mapper_q(i) <= 0;
      --  output_mapper_q(i) <= 0;
      --end loop;
    elsif wb_stb_i='1' and wb_cyc_i='1' and wb_we_i='1' then
      case wb_adr_i(10 downto 9) is
        when "00" =>
          case wb_adr_i(6 downto 4) is
            when "000" =>
              case wb_adr_i(3 downto 2) is
                when "00" =>
                  gpio_q(31 downto 0) <= wb_dat_i;
                when "01" =>
                  gpio_q(63 downto 32) <= wb_dat_i;
                when "10" =>
                  gpio_q(95 downto 64) <= wb_dat_i;
                when "11" =>
                  gpio_q(127 downto 96) <= wb_dat_i;
                when others =>
              end case;
            when "001" =>
              case wb_adr_i(3 downto 2) is
                when "00" =>
                  gpio_tris_q(31 downto 0) <= wb_dat_i;
                when "01" =>
                  gpio_tris_q(63 downto 32) <= wb_dat_i;
                when "10" =>
                  gpio_tris_q(95 downto 64) <= wb_dat_i;
                when "11" =>
                  gpio_tris_q(127 downto 96) <= wb_dat_i;
                when others =>
              end case;
            when "010" =>
              if zpuino_pps_enabled then
                case wb_adr_i(3 downto 2) is
                  when "00" =>
                    ppspin_q(31 downto 0) <= wb_dat_i;
                  when "01" =>
                    ppspin_q(63 downto 32) <= wb_dat_i;
                  when "10" =>
                    ppspin_q(95 downto 64) <= wb_dat_i;
                  when "11" =>
                    ppspin_q(127 downto 96) <= wb_dat_i;
                  when others =>
                end case;
              end if;
            when "100" =>  -- set bits
              case wb_adr_i(3 downto 2) is
                when "00" =>
                  gpio_q(31 downto 0) <= gpio_q(31 downto 0) or wb_dat_i;
                when "01" =>
                  gpio_q(63 downto 32) <= gpio_q(63 downto 32) or wb_dat_i;
                when "10" =>
                  gpio_q(95 downto 64) <= gpio_q(95 downto 64) or wb_dat_i;
                when "11" =>
                  gpio_q(127 downto 96) <= gpio_q(127 downto 96) or wb_dat_i;
                when others =>
              end case;
            when "101" =>  -- clear bits
              case wb_adr_i(3 downto 2) is
                when "00" =>
                  gpio_q(31 downto 0) <= gpio_q(31 downto 0) and not wb_dat_i;
                when "01" =>
                  gpio_q(63 downto 32) <= gpio_q(63 downto 32) and not wb_dat_i;
                when "10" =>
                  gpio_q(95 downto 64) <= gpio_q(95 downto 64) and not wb_dat_i;
                when "11" =>
                  gpio_q(127 downto 96) <= gpio_q(127 downto 96) and not wb_dat_i;
                when others =>
              end case;
            when "110" =>  -- toggle bits
              case wb_adr_i(3 downto 2) is
                when "00" =>
                  gpio_q(31 downto 0) <= gpio_q(31 downto 0) xor wb_dat_i;
                when "01" =>
                  gpio_q(63 downto 32) <= gpio_q(63 downto 32) xor wb_dat_i;
                when "10" =>
                  gpio_q(95 downto 64) <= gpio_q(95 downto 64) xor wb_dat_i;
                when "11" =>
                  gpio_q(127 downto 96) <= gpio_q(127 downto 96) xor wb_dat_i;
                when others =>
              end case;
            when others =>
          end case;
        when "01" =>
          if zpuino_pps_enabled then
            input_mapper_q( conv_integer(wb_adr_i(8 downto 2)) ) <= conv_integer(wb_dat_i(6 downto 0));
          end if;
        when "10" =>
          if zpuino_pps_enabled then
            output_mapper_q( conv_integer(wb_adr_i(8 downto 2)) ) <= conv_integer(wb_dat_i(6 downto 0));
          end if;
        when others =>
      end case;
    end if;
  end if;
end process;

 

Adding it this way still allows digitalWrite and reading to work as before.

And in the arduino register.h I added the following:

#define GPIOSET(x) REGISTER(GPIOBASE,(16+x))
#define GPIOCLR(x) REGISTER(GPIOBASE,(20+x))
#define GPIOTGL(x) REGISTER(GPIOBASE,(24+x))
#define PINSET(x) GPIOSET((x>>5))=(1<<(x&0x1F))
#define PINCLR(x) GPIOCLR((x>>5))=(1<<(x&0x1F))
#define PINTGL(x) GPIOTGL((x>>5))=(1<<(x&0x1F))
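The arithmetic hidden in the PINSET/PINCLR/PINTGL macros can be spelled out in plain C and checked on a PC. The function and enum names below are hypothetical helpers; only the constants 16, 20 and 24, the shift by 5 and the mask 0x1F come from the macros above.

```c
#include <assert.h>
#include <stdint.h>

/* Register indices as passed to REGISTER(GPIOBASE, ...) in the macros. */
enum { GPIO_SET_BASE = 16, GPIO_CLR_BASE = 20, GPIO_TGL_BASE = 24 };

/* Which of the four 32-bit words a pin lives in (32 pins per word). */
unsigned pin_word(unsigned pin)
{
    assert(pin < 128);   /* the GPIO block in the HDL covers pins 0..127 */
    return pin >> 5;
}

/* Bit mask of the pin inside its word. */
uint32_t pin_mask(unsigned pin)
{
    return (uint32_t)1 << (pin & 0x1Fu);
}

/* Register index that a PINSET/PINCLR/PINTGL on this pin would write to. */
unsigned pin_reg_index(unsigned base, unsigned pin)
{
    return base + pin_word(pin);
}
```

For pin 40, for example, this gives word 1 and mask 0x100, so PINTGL(40) writes 0x100 to register index 25. With a constant pin number the compiler can fold all of this into a single store.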

 

I could not get the Arduino IDE to generate a listing file for me to check how many instructions the shifting and calculation are broken down into. But as all the numbers are constant, the preprocessor/compiler should do all the calculations and reduce it to a single write. Does the 4.8 MHz toggle frequency make sense here (ZPUino running at 96 MHz)? A simple calculation gives 10 clock cycles per loop iteration, i.e. per write including the loop jump. Allowing 4 cycles for the loop would leave 6 clock cycles for the write itself.

 

  while (1) {
    PINTGL(0);
  }

 

Hope someone can make use of this little modification.

 

Edit:

 

I have now run a test toggling the IO pin repeatedly in straight-line code without a loop, and the toggle frequency is 6.86 MHz. That gives 7 clock cycles per write.
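The cycle counts follow from the fact that one period of the square wave on the pin contains two pin writes. As a quick check, with cycles_per_write being a helper name chosen just for this sketch:

```c
/* Clock cycles spent per pin write, given the CPU clock and the
   square-wave frequency measured on the pin. One period of the
   square wave is two toggles, hence the factor of 2. */
double cycles_per_write(double f_cpu_hz, double f_pin_hz)
{
    return f_cpu_hz / (2.0 * f_pin_hz);
}
```

With 96 MHz and 4.8 MHz this gives 10 cycles per write (loop jump included), and with 6.86 MHz it gives about 7.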


Excellent. Can you do a pull request on GitHub for this?

 

Regarding the IO writes: in order to allow the FPGA to meet timing, some paths (like IO<->CPU) have additional delay stages (flip-flops).

gpio.png

 

An additional stage is also used for IO outputs, so that all IO signals are registered. As you can see from the image, the signals are set one clock after the instruction is in the execution stage (when you see "decoded_store"), at which point we change to a state called "state_store". The signals only arrive at the IO device one clock later, because they are buffered by the IO controller. The acknowledge from the GPIO is also delayed by one cycle, as it is buffered by the IO controller before arriving at the CPU. An extra stage is needed after a "store" in order to restore the stack cache.

 

I did not bother to speed up IO; memory and stack are more important.


I just sent you the pull request.

 

At this point I want to thank you for the effort you have put into the project and congratulate you on your well-structured and readable code. I got the whole ZPUino working on the Papilio within hours (and here I have to thank Jack for this wonderful board). Most of that time I spent downloading the ISE WebPACK.

