Synthesizing gateware with GCC

Formal Metadata

Title
Synthesizing gateware with GCC
Subtitle
Bambu: a free framework for the high-level synthesis of complex applications
Alternative Title
Electronic Design Automation - Bambu
Series Title
Number of Parts
150
Author
License
CC Attribution 2.0 Belgium:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Release Year
Language
Production Year
2015

Content Metadata

Subject Area
Genre
Transcript: English (automatically generated)
OK, so I can start. So discussion later. OK, fine.
I'm Fabrizio Ferrandi, from Politecnico di Milano. It's a university. This is open source software developed at our university together with other people. This talk is about FPGA design. The idea is that FPGAs can be very helpful for accelerating some specific applications, not for general-purpose acceleration. There are a lot of nice stories about accelerating critical key applications. Here we have a list of possible accelerations.
The acceleration ranges from 2x to sometimes 100x. The nice thing about FPGAs is that you get acceleration while still being able to control the power consumption. So power is not an issue even if you accelerate very heavily.
Sometimes, for example for Monte Carlo simulation, you get simulation that is 800 times faster and 45 times more efficient in terms of energy, and so on and so forth. So this is the first element of the talk. The second element is how to program these kinds of things. I mean, it's not easy. In the past there have been several efforts, from handmade design, which still works today, to more automated approaches. In the past and currently, we have investigated how to automatically translate, for example, a behavioral specification down to an RTL description. My limit in this talk will be something that can be synthesized by standard RTL tools, like Xilinx ISE, or Vivado, or Quartus, and so on and so forth. But even that is not easy. So in the past, several languages have been used as a way to express the behavior.
In the past there was some effort to synthesize behavioral VHDL down to RTL VHDL, for example. But that does not work, since you are actually playing the same game as the hardware designer: you provide another language, or another extension of the language, to do the same things a standard designer usually does. So recently, the kind of specification used to automate this process has moved to something totally different.
I mean, instead of using hardware description languages, recently we moved to software languages like C, C++, or Java, for example. Around MIT and some companies there is even some effort to do high-level synthesis starting from more specific languages, such as Bluespec. But anyway, restricting ourselves to descriptions based on software programming languages, in this talk we mainly consider C functions. The idea is to generate, for each function in your description, a controller and a datapath. So at the end of the process you have the controller and the elaboration unit described in RTL, and that description should be synthesizable Verilog or VHDL. As for technology, these tools usually target ASICs or FPGAs, but I think that recently the FPGA target seems to be a more viable solution to this problem. As usual, this seems easy, but actually designing hardware is not very easy.
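To illustrate the kind of input being discussed, here is a minimal sketch of a plain C function that a high-level synthesis tool could turn into a controller and a datapath. The function name `dot_product8` and the fixed length `N` are assumptions made for this example; they do not come from the talk.

```c
#include <stdint.h>

#define N 8  /* fixed trip count, chosen only for illustration */

/* Hypothetical kernel. In an HLS flow, the loop control would typically
   map onto the controller (a small state machine), while the
   multiply-accumulate would map onto the datapath. */
int32_t dot_product8(const int16_t a[N], const int16_t b[N])
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        /* Widen before multiplying so the product is computed in 32 bits. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

The fixed trip count and fixed-width integer types are deliberate: they give a synthesis tool known loop bounds and known operand widths to work with.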
I mean, you need maybe a PhD, or at least a very high level of skill, to do that kind of thing. Maybe you need that even for software. But anyway, it seems that exploiting software programming languages could be a viable solution for implementing some key kernels in hardware. And it should help increase the productivity of the designer. The other interesting thing is that the idea is to not need all the skills that a hardware designer usually needs. You just need to know C and, more or less, that's it. It's not true, but anyway, that is the aim. That is a very nice thing, but what happens in the end? The quality of these tools is getting better as time passes. Usually the result is worse than handmade RTL design. But I think the same is true in software: if you write assembly code, you can usually do better than any C or other high-level language compiler. That is one side of the problem. The other nice thing is that the result is usually better than software programming.
So if you have a microprocessor or microcontroller and you compare it with the results of high-level synthesis, usually high-level synthesis wins. That is another nice thing about HLS. The first point of this talk is related to GCC. Why does GCC come into this picture?
GCC is a compiler that starts from C; it has several front ends and supports several languages. When we tried to automate the translation from a high-level description to an RTL design, we discovered that there are a lot of things in common with compiler infrastructure. We started to study GCC in 2004, so this project is 10 years old, and we discovered that we could exploit more or less the same intermediate representation used by GCC. So we studied that representation, and we extract it at the middle level of GCC, where all the intermediate transformations are performed. On that intermediate representation we can exploit all the standard optimization techniques performed by a standard compiler like GCC, even some advanced ones. So we developed a plugin that exports this intermediate representation and serializes it to a file, and on top of it we built everything needed to optimize and generate the hardware. I mean function allocation and sharing, memory allocation, the analysis you need in hardware to understand the bit size of the wires, module allocation, register allocation, and so on; all of this generates a controller and a datapath. At the end we have a single command-line tool that is able to start from a C description and generate VHDL and/or Verilog. This is the list of features. ANSI C support is more or less complete. Obviously, recursive functions are not so easy to support. We support GCC more or less from 4.5 to 4.9.
We support a lot of distributions, from Ubuntu to Fedora. And there is a rich set of components already developed that perform more or less all the basic functionality you may have at the lower level internally in GCC, from addition and subtraction to floating-point support and whatever else you find there. All of these are available as open source; we have an XML description that can be easily extended.
There is support for verification, with automatic generation of test benches. We exploit two free software projects to perform the simulation, in particular Icarus Verilog and Verilator. We also support some commercial tools like ModelSim, or ISim and XSim from Xilinx. We have a large regression test suite, as any compiler should have; it is a large set taken from academia and even from GCC. For synthesis, the list of supported tools is almost complete. We are also currently considering some open source projects. In particular, there is another project that could be very nice for performing the synthesis more or less to the end. We miss the last step, I think, but we are not so far from being able to program an FPGA. Some case studies, three examples. Keccak is a crypto core. It was the winner of the SHA-3 competition, held some time ago. We took the C description and compared it with the handmade VHDL developed by the winners of the contest. So, I guess, the C was easier, and they then started to write the VHDL.
That C description was passed to Bambu, the tool we developed. And we are actually able to obtain better performance while losing some area. I mean, if you look at the lookup tables, it is not so comparable in terms of area. In terms of performance, it's not so bad.
The second example is a nice thing one of my students did: drawing rectangles and circles on a VGA screen, just using some C primitives, integrated with a standard core.
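The student's code is not shown in the talk, so the following is only a sketch of what such C drawing primitives could look like. The in-memory array `fb` stands in for the VGA core's frame memory, and the names `plot`, `draw_rect`, and `draw_circle`, as well as the 64x48 resolution, are all hypothetical. The circle uses the classic integer midpoint algorithm, which needs only additions and comparisons.

```c
#include <stdint.h>

#define FB_W 64  /* hypothetical frame width in pixels */
#define FB_H 48  /* hypothetical frame height in pixels */

/* Stand-in for the frame memory of a VGA core (assumption). */
uint8_t fb[FB_H][FB_W];

/* Write one pixel, silently clipping anything outside the frame. */
void plot(int x, int y, uint8_t color)
{
    if (x >= 0 && x < FB_W && y >= 0 && y < FB_H) {
        fb[y][x] = color;
    }
}

/* Draw the outline of a w-by-h rectangle with top-left corner (x0, y0). */
void draw_rect(int x0, int y0, int w, int h, uint8_t color)
{
    for (int x = x0; x < x0 + w; x++) {
        plot(x, y0, color);
        plot(x, y0 + h - 1, color);
    }
    for (int y = y0; y < y0 + h; y++) {
        plot(x0, y, color);
        plot(x0 + w - 1, y, color);
    }
}

/* Draw a circle outline with the integer midpoint algorithm. */
void draw_circle(int cx, int cy, int r, uint8_t color)
{
    int x = r, y = 0, err = 1 - r;
    while (x >= y) {
        /* Mirror the computed point into all eight octants. */
        plot(cx + x, cy + y, color); plot(cx - x, cy + y, color);
        plot(cx + x, cy - y, color); plot(cx - x, cy - y, color);
        plot(cx + y, cy + x, color); plot(cx - y, cy + x, color);
        plot(cx + y, cy - x, color); plot(cx - y, cy - x, color);
        y++;
        if (err < 0) {
            err += 2 * y + 1;
        } else {
            x--;
            err += 2 * (y - x) + 1;
        }
    }
}
```

Integer-only arithmetic keeps the primitives cheap to implement in hardware; in a real design the writes to `fb` would go to whatever memory interface the video core exposes.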
Finally, we have even tried to synthesize a larger design, to see what kind of C support we are able to offer. We started from an open source project developed at CERN with two other partners. This is a pretty large project, but we were able to synthesize it and fit it on a single Zynq board. That's it. We are keen on cooperation and integration, and we have these tools. Questions, comments, whatever.
It's very difficult to say so many words in 15 minutes.
No, pointers are not a problem. Sorry, so the question is: one of the problems in synthesizing a specification based on C, or whatever, is that C assumes there is a shared memory.
And so you have pointers, pointer arithmetic, and so on. We are actually able to deal with that, since we have memory allocation, and we use what GCC does to figure out the alignment size and so on and so forth. We have developed some units that support aligned and non-aligned access to the memory, more or less as processors do. So there is a bus where we put the address, the memory returns the data, and so on and so forth. That works, and works pretty efficiently.
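As a small sketch of the C constructs in question, the two functions below use pointer arithmetic over aligned words and a read at an arbitrary byte offset. The names `sum_words` and `load_u32_le` are assumptions for this example, and the little-endian byte order of the second function is a choice made here, not something stated in the talk.

```c
#include <stdint.h>

/* Aligned accesses: walk an array of 32-bit words with pointer arithmetic. */
uint32_t sum_words(const uint32_t *p, unsigned n)
{
    uint32_t sum = 0;
    const uint32_t *end = p + n;
    while (p != end) {
        sum += *p++;
    }
    return sum;
}

/* Possibly non-aligned access: assemble a 32-bit little-endian value from
   four bytes starting at any byte offset in the buffer. */
uint32_t load_u32_le(const uint8_t *buf, unsigned offset)
{
    const uint8_t *b = buf + offset;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Building the word byte by byte keeps the C well defined for any offset; whether a tool serves such a read with one bus transaction or several is an implementation matter.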
So the question is, and maybe I missed something, so maybe you could elaborate a little more, but the question is that there are some inconsistencies between the two languages.
So the idea is to reverse engineer the VHDL into C and then have something that can be retargeted to any kind of hardware, and so on and so forth. I mean, it's a matter of the designer's imagination. I did that kind of thing, for example, for floating-point cores. I sometimes start from the VHDL description of available floating-point units and then try to rewrite everything in C. Most of the time you have to deal with bit sizes, but that can be controlled in C through masking, stating that a value is not greater than a given bound, and so on and so forth.
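A minimal sketch of the masking idea: C has no 12-bit integer type, but masking operands and results tells a bit-width or value-range analysis that only the low bits matter. The names `add12` and `field_9_5` and the chosen widths are assumptions for illustration.

```c
#include <stdint.h>

#define MASK12 0xFFFu  /* keeps the low 12 bits */

/* 12-bit unsigned adder with wrap-around. The masks bound every value
   to 0..4095, so an analysis can infer that 12-bit wires are enough. */
uint16_t add12(uint16_t a, uint16_t b)
{
    uint32_t x = a & MASK12;
    uint32_t y = b & MASK12;
    return (uint16_t)((x + y) & MASK12);
}

/* Extract the 5-bit field in bits 9..5, as one would slice a VHDL vector. */
uint8_t field_9_5(uint16_t v)
{
    return (uint8_t)((v >> 5) & 0x1Fu);
}
```

For example, `add12(0xFFF, 1)` wraps to 0, just as a 12-bit hardware adder without a carry output would.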
The compiler is pretty clever about that, since, for example, GCC 4.9 starts to support value ranges, and so we just have to add the bit-value analysis and so on and so forth. There are, though, things that cannot be easily ported to C.
I'm thinking, maybe, of when you have concurrent things and stuff like that. But I mean, it's another option. If you are a hardware designer, as I was most of the time, sometimes it's better to start with a hardware design. But what I'm trying to do is to involve people who do not have all the skills a hardware designer usually has to build for an FPGA, just by exposing them to a tool and C.
I mean, that kind of thing is not optimal at the beginning, but it's a trade-off.
So the question is about recursive functions. There are solutions for recursive functions, but they are currently not implemented in the tool. The solutions concern the allocation of a memory: you have a stack, and during the synthesis you try to mimic what the processor does.
That means building the stack and the parameter allocation you usually have with recursion. It's also true that, at least until now, I have not seen anything in embedded or high-performance computing that could not be translated into an iterative form.
GCC, for example, has optimizations that automatically translate code from a recursive version to an iterative one, where that is possible. Factorial could be an example.
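To make the factorial example concrete, here is a sketch of the recursive form next to a hand-written iterative equivalent. The function names are assumptions, and the iterative version is written by hand for illustration; it is not the output of any GCC pass.

```c
#include <stdint.h>

/* Recursive form: each call needs its own copy of n, which in hardware
   would require a stack in memory. */
uint32_t factorial_recursive(uint32_t n)
{
    return (n <= 1) ? 1u : n * factorial_recursive(n - 1);
}

/* Iterative form: one accumulator and one counter, no stack.
   The result fits in 32 bits only for n <= 12. */
uint32_t factorial_iterative(uint32_t n)
{
    uint32_t result = 1;
    for (uint32_t i = 2; i <= n; i++) {
        result *= i;
    }
    return result;
}
```

The iterative version needs only two registers and a multiplier, which is why this kind of rewrite matters for synthesis.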