ROCCC 2.0: has anyone converted C++ code to work in hardware with this?


Guest josheeg

ROCCC 2.0: has anyone converted C++ code to work in hardware with this?

I would be interested in agreeing on a price for a tutorial on turning the C++ from ALGLIB's LDA code into something that can run on a Papilio.

  • 8 months later...

Sorry for resurrecting an old thread, but I just got my Papilio and jumped onto the forums, and I thought I would be the best person to answer this, seeing as I worked on the ROCCC 2.0 backend. ROCCC really wasn't built for general-purpose EDA, but rather for optimizing datapaths embedded in loop nests. I imagine most users of a small development board like the Papilio will get more use out of a soft processor than out of a core created through ROCCC. If you really want to embed a ROCCC core, the biggest hurdle in porting it to something like the Papilio will most likely be space constraints: ROCCC optimizes for throughput, not space, so fitting a big datapath into 250-500k gates will probably be difficult. That being said, I would love to get a ROCCC core working next to a soft processor; I just can't think of an example application!

Adrian,

Thank you for joining us. I just took a quick look at ROCCC and it looks interesting. I'm currently working on an audio-based Kickstarter project, and it occurred to me that ROCCC might be a good candidate for making a library of audio effects that run in hardware. Once the Kickstarter project is finished and more details are out there, we can talk about it more...

Jack.

Jack - Sounds interesting! I'd love to help out. Audio stream processing, such as an FFT, is actually a decent fit for what ROCCC is designed for. More importantly, I'd be very interested in some audio libraries: a friend and I are working on digital effect stomp boxes, but we are really running into the limits of (cheap) software, and the obvious next step is to move to hardware.

Adrian, pardon the question, but how does ROCCC differ from other high-level hardware description languages such as SystemC?

I understand that coding some HDL parts in high-level "languages" can reduce the overall TTM and allows some abstraction, but in the long run people just end up rewriting those parts to match the actual "hardware design".

As we all know, in digital circuits the slowest path (within a clock domain) defines the maximum frequency at which a system can run. High-level languages do not give you fine-grained control over those combinatorial circuits, so more often than not your design is slower simply because you have no real idea of how it is going to be implemented.

Most algorithms are so complex that they need some help to become pipelined, so as not to slow down other, faster components.

Alvie

Hi Alvie, let me address your two questions separately:

"Adrian, pardon the question, but how does ROCCC differ from other high-level hardware description languages such as SystemC?

I understand that coding some HDL parts in high-level "languages" can reduce the overall TTM and allows some abstraction, but in the long run people just end up rewriting those parts to match the actual "hardware design"."

ROCCC is not really an HDL like SystemC, but rather a compiler that takes C kernels down to RTL VHDL. You don't so much "describe" your hardware in ROCCC as write the equivalent C code that would implement the algorithm. To give you an extremely simple example, a 4-tap FIR filter acting on a stream of data would be coded in ROCCC like this:

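[tt]
typedef int ROCCC_int32;

void FIRSystem(ROCCC_int32* A, int length, ROCCC_int32* B)
{
  /* Each output is a weighted sum of four consecutive inputs,
     so A must hold at least length + 3 elements. */
  for(int i = 0; i < length; ++i)
  {
      B[i] = A[i+0] * 1 + A[i+1] * 3 + A[i+2] * 5 + A[i+3] * 7;
  }
}
[/tt]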

ROCCC then optimizes the datapath and access windows, and generates RTL VHDL that implements the same algorithm. Again, ROCCC is designed specifically for stream-based kernel calculations; you do not write your entire hardware algorithm in ROCCC.
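To make the idea of an access window a bit more concrete, here is a rough software sketch in plain C. It is not ROCCC input and not what ROCCC generates; the names `fir_direct`, `fir_windowed` and `TAPS` are made up for illustration. It only shows why consecutive iterations of the FIR loop can share data: each output needs four neighbouring inputs, three of which were already fetched for the previous output.

```c
#include <stdint.h>
#include <stddef.h>

#define TAPS 4

/* Direct form, same arithmetic as the FIR loop in the post:
   every output re-reads four elements of A. */
void fir_direct(const int32_t *A, size_t length, int32_t *B)
{
    for (size_t i = 0; i < length; ++i)
        B[i] = A[i] * 1 + A[i + 1] * 3 + A[i + 2] * 5 + A[i + 3] * 7;
}

/* Window form (illustrative assumption, not ROCCC's actual buffer):
   keep the last four samples in a small shift register so that each
   element of A is read from memory exactly once. */
void fir_windowed(const int32_t *A, size_t length, int32_t *B)
{
    int32_t w[TAPS] = {0};
    /* A must hold length + 3 samples, as in the direct form. */
    size_t total = length ? length + TAPS - 1 : 0;

    for (size_t n = 0; n < total; ++n) {
        /* Shift the window by one and load the next sample. */
        w[0] = w[1];
        w[1] = w[2];
        w[2] = w[3];
        w[3] = A[n];
        if (n >= TAPS - 1)  /* window is full: w[k] == A[i + k] */
            B[n - (TAPS - 1)] = w[0] * 1 + w[1] * 3 + w[2] * 5 + w[3] * 7;
    }
}
```

Both functions produce the same outputs for the same input; the second one simply trades repeated memory reads for four registers, which is the kind of reuse a streaming datapath can exploit.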

"As we all know, in digital circuits the slowest path (within a clock domain) defines the maximum frequency at which a system can run. High-level languages do not give you fine-grained control over those combinatorial circuits, so more often than not your design is slower simply because you have no real idea of how it is going to be implemented.

Most algorithms are so complex that they need some help to become pipelined, so as not to slow down other, faster components."

I've put a great deal of effort into making sure the code generated by ROCCC performs well in terms of frequency and throughput. Area is often sacrificed for these goals, but ROCCC targets high-performance computing, where this is generally acceptable. In terms of throughput, ROCCC generally performs very well: for example, I was able to get 1.76 GB/s implementing SHA1 from C in ROCCC, compared to a commercial SHA1 ASIC IP core that gets 2.185 GB/s, so roughly 80% of the commercial core's throughput. For an automated tool, that is pretty impressive, especially considering we are targeting an FPGA rather than an ASIC.

ROCCC implements automatic register placement along critical paths to aid timing. This is guided by user-supplied estimates of the delay of basic components, such as adds or multiplies, together with a desired target frequency. In this way, area/frequency trade-offs can be explored without needing to rewrite the source C code.
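As a toy illustration of how delay estimates and a target frequency can drive register placement, here is a small C sketch. It is a deliberately simplified greedy model over a linear chain of operators, not ROCCC's actual algorithm; the function name `place_registers` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model: a linear chain of operators, each with a
   user-estimated delay in nanoseconds. Walk the chain and place a
   register whenever adding the next operator would exceed the clock
   period implied by the target frequency.
   cut_after[i] is set to 1 if a register follows operator i.
   Returns the number of pipeline registers inserted, or -1 if a
   single operator is already slower than the target period. */
int place_registers(const double *delay_ns, size_t n_ops,
                    double target_mhz, int *cut_after)
{
    double period_ns = 1000.0 / target_mhz;
    double acc = 0.0;  /* combinational delay since the last register */
    int regs = 0;

    for (size_t i = 0; i < n_ops; ++i) {
        cut_after[i] = 0;
        if (delay_ns[i] > period_ns)
            return -1;
        if (acc + delay_ns[i] > period_ns) {
            cut_after[i - 1] = 1;  /* i > 0 here, since acc starts at 0 */
            ++regs;
            acc = 0.0;
        }
        acc += delay_ns[i];
    }
    return regs;
}
```

With a made-up chain of add (2 ns), multiply (6 ns), add, multiply, a 50 MHz target needs no registers, a 100 MHz target needs one, and a 200 MHz target is unreachable in this model because a single multiply exceeds the 5 ns period. Raising the target costs registers, which is the area/frequency trade-off described above.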

That is the main gist of ROCCC. I am not 100% sure that ROCCC is a good match for the sorts of things hobbyists are interested in, but I would love to see an example application come up. The most likely application would be an audio or video coprocessor.

  • 2 weeks later...

Most signal processing algorithms are developed in C, C++ or MATLAB. Someone then has to transform that code into RTL for a hardware implementation. This is time-consuming because you have:

1) to be bit-true with the reference model of the algorithm

2) to meet the timing constraints.

When one of these points is not met, the correction can break the other one. The development loop can usually take months.

Such a tool gives you point 1) for far less effort than doing it by hand. The tool's configuration can then offer some trade-offs between area/resources and speed. Sometimes that is still not enough to meet the requirements, in which case you can rework your C code to constrain the tool differently.
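To illustrate what "bit-true" means in point 1), here is a small C sketch of the kind of check involved. Everything in it is an assumption made for the example: the reference model reuses the FIR taps from earlier in the thread, the candidate replaces the constant multiplies with shifts and adds (one way hardware might implement them), and `first_mismatch` stands in for comparing against values dumped from an RTL simulation.

```c
#include <stdint.h>
#include <stddef.h>

/* Reference model: plain multiplies, as an algorithm designer would
   write it. Unsigned 32-bit arithmetic is used so that overflow wraps
   identically in both versions instead of being undefined. */
void fir_reference(const uint32_t *a, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * 1u + a[i + 1] * 3u + a[i + 2] * 5u + a[i + 3] * 7u;
}

/* Candidate: the same taps decomposed into shifts and adds
   (3 = 2 + 1, 5 = 4 + 1, 7 = 8 - 1). */
void fir_shift_add(const uint32_t *a, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t x1 = a[i + 1], x2 = a[i + 2], x3 = a[i + 3];
        out[i] = a[i] + ((x1 << 1) + x1) + ((x2 << 2) + x2) + ((x3 << 3) - x3);
    }
}

/* Returns the index of the first differing word, or -1 if the two
   result streams are identical bit for bit. */
long first_mismatch(const uint32_t *ref, const uint32_t *dut, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (ref[i] != dut[i])
            return (long)i;
    return -1;
}
```

A result of -1 over a large, varied set of input vectors is what you are after; any other value points at the first output word where the implementation has drifted from the reference.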

One important thing to understand is that even if the marketing says "C to HDL - easy - buy it", without a good understanding of what the HDL means (combinatorial logic, reg-to-reg, path delay) you can't easily understand what the tool does. I don't know ROCCC. Such tools usually gain little from supporting the full C and/or C++ specification, because most of the time the constructs make no sense as hardware. For example, the C standard defines variadic functions, which take a variable number of arguments; printf is one. What on earth would that mean in hardware?

That's why most of the time it is not possible to take an existing library and push it through an HLS (High-Level Synthesis) flow in a push-button way.
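A short C sketch of the variadic case may help; the function names `sum_ints` and `sum4` are invented for the example. In the first function the number of operands is only known when the call happens, whereas the second has a fixed set of inputs that maps naturally onto a fixed circuit.

```c
#include <stdarg.h>

/* The callee cannot know how many arguments follow `count` until the
   call happens: the loop bound and the argument layout are run-time
   facts, with no fixed set of input ports to build. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* A fixed-arity version has a known number of inputs, so it
   corresponds to a small adder tree with four input ports. */
int sum4(int a, int b, int c, int d)
{
    return (a + b) + (c + d);
}
```

Calling `sum_ints(3, 1, 2, 3)` and `sum_ints(5, 1, 2, 3, 4, 5)` is perfectly ordinary software, but there is no single piece of hardware with a well-defined port list that both calls describe.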

SystemC, like Verilog or VHDL, is a way to represent hardware, but it is far more often used for high-level modeling to speed up simulation. It is easy to see that on a computer you will get the result of [tt]void FIRSystem(ROCCC_int32* A, int length, ROCCC_int32* B) { [...] for ([...]) {[...]}}[/tt] faster than if you had to simulate the equivalent RTL netlist, and you then benefit from C code that can be used for both high-level simulation and high-level synthesis.

Hobbyists are interested in everything as soon as it is within reach of their wallet! Moreover, we never know who the hobbyists are; some of them may provide good feedback... we never know :)
