Synthesizing gateware with GCC
Formal Metadata
Number of Parts | 150
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/34353 (DOI)
Production Year | 2015 |
FOSDEM 2015, 136 / 150
Transcript: English (auto-generated)
00:13
OK, so I can start. Discussion later. OK, fine.
00:20
I'm Fabrizio Ferrandi, from Politecnico di Milano, a university. This is open source software developed at our university together with other people. This talk is about FPGA design, and the idea is that FPGAs can be very helpful to accelerate
00:44
some specific applications, not for general-purpose acceleration. There are a lot of nice stories about accelerating critical key applications. Here we have a list of possible accelerations.
01:01
These accelerations range from 2x to sometimes 100x. The nice thing about FPGAs is that you get acceleration, but you can also control the power consumption. So power is not an issue even if you accelerate very heavily.
01:24
Sometimes, for example for Monte Carlo simulation, you get a simulation that is 800 times faster and 45 times more energy efficient, and so on. So this is the first element of the talk. The second element of the talk is
01:41
related to how to program these kinds of devices. It's not easy. In the past there have been several efforts, from handmade design, which is still done today, to more automated approaches. In the past and currently, we
02:02
have investigated how to automatically translate, for example, a behavioral specification down to an RTL description. In this talk, my limit will be something that could be synthesized
02:22
by standard RTL tools, like Xilinx ISE, Vivado, Quartus II, and so on. But even that is not easy. In the past, several languages have been used as a way to express the behavior.
02:43
In the past, for example, there was some effort to synthesize behavioral VHDL down to RTL VHDL. But that does not work, since you are playing the same game as the hardware designer, providing another language or another extension
03:02
of the language to do the same things a standard designer usually does. So recently, the specifications used to automate this process have moved to something totally different.
03:21
Instead of using a hardware description language, we recently moved to software languages like C, C++, or Java. There is even some effort, for example around MIT and some companies, to do high-level synthesis starting
03:42
from, for example, a dedicated language such as Bluespec. But in any case, restricting ourselves to descriptions based on software programming languages, in this talk we mainly consider C functions. The idea is to generate, for each function
04:01
you have in your description, a controller and a datapath, in other words. So you have the controller and the elaboration unit, described in RTL at the end of the process, and that description should be synthesizable Verilog or VHDL. Usually these kinds of things
04:21
target ASIC or FPGA, but I think that recently the FPGA target seems to be the more viable solution to this problem. As usual, this kind of thing seems too easy, but actually designing hardware is not easy.
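The controller/datapath split described above can be illustrated with a minimal C sketch. It is not Bambu's output: it pairs a behavioral GCD function with a hand-written C model of a finite-state machine with datapath (FSMD) that an HLS flow might derive from it. The state names, the `gcd_fsmd` function, and its one-operation-per-cycle schedule are hypothetical.

```c
#include <stdint.h>

/* Behavioral input: the kind of C function an HLS tool starts from. */
uint32_t gcd_sw(uint32_t a, uint32_t b)
{
    while (a != b) {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
    return a;
}

/* Hypothetical FSMD model of the same function. The datapath is the pair of
 * registers ra/rb plus a comparator and a subtractor; the controller is the
 * state register deciding which datapath register is written each cycle.
 * Assumes a > 0 and b > 0 (otherwise the algorithm does not terminate). */
typedef enum { S_CMP, S_SUB_A, S_SUB_B, S_DONE } state_t;

uint32_t gcd_fsmd(uint32_t a, uint32_t b, unsigned *cycles)
{
    uint32_t ra = a, rb = b;   /* datapath registers */
    state_t state = S_CMP;     /* controller state register */
    unsigned n = 0;

    while (state != S_DONE) {  /* one loop iteration models one clock cycle */
        n++;
        switch (state) {
        case S_CMP:
            state = (ra == rb) ? S_DONE : (ra > rb) ? S_SUB_A : S_SUB_B;
            break;
        case S_SUB_A:
            ra = ra - rb;
            state = S_CMP;
            break;
        case S_SUB_B:
            rb = rb - ra;
            state = S_CMP;
            break;
        default:
            break;
        }
    }
    if (cycles)
        *cycles = n;
    return ra;
}
```

For example, `gcd_fsmd(12, 18, &c)` returns 6 with `c == 5` modeled cycles; a real tool could chain or parallelize operations to reduce that count.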
04:42
You may need a PhD, or at least very high skill, to do that kind of thing. Maybe you need that even for software, but anyway, it seems that exploiting software programming languages
05:01
could be a viable solution to implement some key kernels in hardware. And it should help in terms of increasing the productivity of the designer. The other interesting thing is that the idea is to not actually
05:22
need all the skills that a hardware designer usually needs. You just need to know C, more or less, and that's it. That's not completely true, but anyway, that is the aim. So that is a very nice thing, but what happens in the end? The quality of these kinds of tools
05:43
is getting better, and it keeps improving as time passes. Usually, it's worse than handmade RTL design. But I think that's also true in software: if you write assembly code,
06:01
you usually do better than any C or other high-level language for software programming. But that is one side of the problem. The other nice thing is that it's usually better than a software implementation.
06:22
So if you have a microprocessor or microcontroller and you compare its results with high-level synthesis, high-level synthesis usually wins. That is another nice thing about HLS. Now, this talk is related to GCC. Why does GCC come into this picture?
06:45
GCC is a compiler that starts from C; it has several front ends and supports several languages. When we tried to automate the translation from a high-level description to an RTL design, we discovered that
07:03
there are a lot of things in common with compiler infrastructures. We started to study GCC in 2004, so this project is 10 years old, and we discovered that we could exploit more or less the same intermediate representation used by GCC. So we studied
07:22
that representation and extracted it, because at that level of GCC, the intermediate one, the middle end, where all the intermediate transformations are performed, by working on that intermediate
07:40
representation we can exploit all the standard optimization techniques performed in a standard compiler like GCC, even some advanced ones. So we developed a plugin that exports this intermediate representation and serializes it to a file, and then we build up
08:04
over this intermediate representation everything needed to optimize and generate the hardware from it: function allocation and sharing, memory allocation, and the analysis that hardware needs to understand the bit size
08:22
of the wires, and so on; then module allocation and register allocation, and finally the generation of the controller and datapath. In the end, we have a single command-line tool that is able to start from a C description
08:43
and generate VHDL or Verilog. This is the list of features: ANSI C support is more or less complete. Obviously, recursive functions are not so easy to support. We support GCC more or less from 4.5 to 4.9.
09:05
We support many distributions, from Ubuntu to Fedora. And there is a rich set of already developed components that perform more or less all the basic functionality you have at the lower level internally in GCC,
09:23
from addition and subtraction to floating-point support, whatever you need in this kind of design. All these components are released as open source: we have an XML description that can be easily extended.
09:43
There is support for verification, with automatic generation of test benches. We exploit two free software projects to perform the simulation: in particular, Icarus Verilog and Verilator. We also support some commercial tools
10:00
like ModelSim, ISim, or XSim from Xilinx. We have a large regression test suite, as any compiler should have, with a large set of tests taken from the academic side and even from GCC. Support for synthesis: more or less, the list
10:23
is almost complete in terms of supported tools. We are also currently considering some open source projects; in particular, there is another project that could be very nice for performing the synthesis all the way to the end. We miss the last step, I think,
10:41
but we are not far from being able to program an FPGA. Some case studies, three examples. Keccak is a crypto core; it was the winner of the SHA-3 competition
11:00
held some time ago. We took the C description and compared it with the handmade VHDL developed by the winners of the contest. Usually, the C is much easier to write than the VHDL.
11:20
That C description was passed to Bambu, the tool we developed, and we are actually able to obtain better performance while losing some area. If you look at the lookup tables, it's not so comparable in terms of area, but in terms of performance it's not so bad.
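For readers unfamiliar with what a C description of this core looks like, here is a minimal sketch of one step of the Keccak-f[1600] permutation, the θ (theta) column-parity step. It is written from the public Keccak specification, not taken from the code used in the talk, and the function names are illustrative. Fixed loop bounds and constant rotation amounts like these are what tend to make such kernels friendly to HLS.

```c
#include <stdint.h>

/* Rotate a 64-bit lane left by n bits; valid for 0 < n < 64. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Theta step of Keccak-f[1600]. The state is 25 lanes of 64 bits,
 * indexed as A[x + 5*y] for x, y in 0..4. */
void keccak_theta(uint64_t A[25])
{
    uint64_t C[5], D[5];

    /* Parity of each column. */
    for (int x = 0; x < 5; x++)
        C[x] = A[x] ^ A[x + 5] ^ A[x + 10] ^ A[x + 15] ^ A[x + 20];

    /* Combine the parities of the two neighbouring columns. */
    for (int x = 0; x < 5; x++)
        D[x] = C[(x + 4) % 5] ^ rotl64(C[(x + 1) % 5], 1);

    /* XOR the mixing term into every lane of the column. */
    for (int y = 0; y < 5; y++)
        for (int x = 0; x < 5; x++)
            A[x + 5 * y] ^= D[x];
}
```

A full round also applies the ρ, π, χ and ι steps, which are omitted here.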
11:41
The second example is a nice thing one of my students did: drawing some rectangles and circles on a VGA screen, just using some C primitives, integrated with a standard core.
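As an illustration of the kind of C primitive involved, here is a minimal sketch of a filled-rectangle routine writing into a frame buffer. The frame-buffer size, the 8-bit pixel format, and the `fill_rect` name are assumptions for illustration only; the talk does not describe the student's actual code or how the VGA core reads the buffer.

```c
#include <stdint.h>

/* Hypothetical frame buffer: 160x120 pixels, one byte per pixel. */
#define FB_W 160
#define FB_H 120

static uint8_t framebuffer[FB_H][FB_W];

/* Fill the rectangle with top-left corner (x0, y0), width w and height h,
 * clipped to the frame buffer. Simple nested loops like these can map onto
 * a small controller driving a memory address counter. */
void fill_rect(int x0, int y0, int w, int h, uint8_t color)
{
    int x1 = x0 + w, y1 = y0 + h;
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 > FB_W) x1 = FB_W;
    if (y1 > FB_H) y1 = FB_H;

    for (int y = y0; y < y1; y++)
        for (int x = x0; x < x1; x++)
            framebuffer[y][x] = color;
}
```

A circle routine would follow the same pattern, for instance with a midpoint algorithm; it is omitted here.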
12:04
Finally, we also tried to synthesize a larger code base, to see how much C we are able to support. We started from an open source project developed at CERN with two other partners. This is a pretty large project,
12:21
but we were able to synthesize it and to fit it in a single Zynq board. That's it. We are keen on cooperation and integration with this kind of tool. Questions, comments, whatever.
12:50
It's very difficult to say so many words in 15 minutes.
13:03
No, pointers are not a problem. Sorry, so the question is: one of the problems with synthesizing programs or specifications based on C, or whatever, is that C assumes a shared memory.
13:23
And so you have pointers, pointer arithmetic, and so on. We are actually able to deal with that, since we have memory allocation, and we use what GCC does to figure out sizes and alignments, and so on. We have developed some units that are able to support
13:43
aligned and non-aligned memory accesses, more or less as processors do. So there is a bus where we put the address, the memory returns the data, and so on. That works, and works pretty efficiently.
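Supporting non-aligned accesses "as processors do" can be sketched in C: a 32-bit load from an arbitrary byte address is built from two aligned 32-bit word reads plus shifts. This is a generic little-endian technique shown under stated assumptions, not Bambu's memory controller; the word-array memory model and the `load32_unaligned` name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical memory model: byte-addressed memory exposed as an array of
 * aligned 32-bit words, little-endian (the byte at address 4*i + k holds
 * bits [8k+7:8k] of mem[i]). */
uint32_t load32_unaligned(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t word = byte_addr >> 2;          /* index of the first aligned word */
    uint32_t shift = (byte_addr & 3u) * 8u;  /* byte offset inside it, in bits */

    uint32_t lo = mem[word];
    if (shift == 0)
        return lo;                           /* aligned: a single bus access */

    uint32_t hi = mem[word + 1];             /* second access for the spill-over bytes */
    return (lo >> shift) | (hi << (32u - shift));
}
```

With `mem = {0x44332211, 0x88776655}`, a load from byte address 1 returns `0x55443322`. The aligned case issues only one access, which is why aligned data is cheaper.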
14:23
So the question is... maybe I missed something, maybe you could elaborate a little more, but the question is about inconsistencies between the two languages.
14:56
So the idea is to reverse engineer the VHDL into C
15:01
and then have something that can be retargeted to any kind of hardware, and so on. I mean, it's a matter of the designer's imagination. I did this kind of thing, for example, for floating-point cores: I sometimes start from an available VHDL description of floating-point units
15:23
and then try to rewrite everything in C. Most of the time you have to deal with bit sizes, but that can be controlled in C through masking, saying that a value is not greater than some bound, and so on.
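A minimal sketch of the masking idea: masking the inputs tells the compiler that values fit in fewer bits, and a value-range or bit-value analysis can then size the hardware operators accordingly. The `bits_needed` helper is hypothetical; it only mimics the arithmetic such an analysis performs on an unsigned range [0, max] and is not a GCC or Bambu API.

```c
#include <stdint.h>

/* Both operands are masked to 12 bits, so each lies in [0, 4095] and the
 * sum lies in [0, 8190]: a 13-bit adder suffices instead of a 32-bit one. */
uint32_t add12(uint32_t a, uint32_t b)
{
    a &= 0xFFFu;
    b &= 0xFFFu;
    return a + b;
}

/* Hypothetical helper: number of bits needed to represent every value in
 * the unsigned range [0, max] (at least 1 bit, at most 64). */
unsigned bits_needed(uint64_t max)
{
    unsigned bits = 1;
    while (bits < 64 && (max >> bits) != 0)
        bits++;
    return bits;
}
```

For example, `add12(0xFFFF, 0x1FFF)` returns 8190, and `bits_needed(8190)` is 13, the width of the adder output.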
15:40
The compiler is pretty clever at that: for example, starting with GCC 4.9 it provides value range information, so we just have to combine it with our bit-value analysis, and so on. Actually, there are things that cannot easily be ported to C.
16:03
I'm thinking of cases where you have concurrent things and stuff like that, but it's another option. If you are a hardware designer, as I was most of the time, sometimes it's better to start from a hardware design,
16:24
but what I'm trying to do is involve people who do not have all the skills a hardware designer usually has for building an FPGA design, just by exposing them to a tool and seeing what happens.
16:45
At the beginning that is not optimal, but it's a trade-off.
17:02
So the question is about recursive functions. For recursive functions, there are solutions. They are currently not implemented in the tool, and they concern memory allocation: you have a stack, and during synthesis you try to mimic what the processor does.
17:23
That means building the stack and the parameter allocation you usually have with recursion. It's also true that, at least so far, I have not seen any embedded or high-performance kernel that could not be translated into an iterative form.
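The stack-based approach described in this answer can be sketched in C: a non-tail-recursive function is rewritten with an explicit, fixed-size stack array, which is roughly what mimicking the processor's call stack amounts to in hardware. This is an illustration under assumptions (Fibonacci as the example, a hypothetical 64-entry stack and error convention), not a feature of the tool, which according to the answer does not implement it.

```c
#include <stdint.h>

/* Recursive reference version: each call needs its own frame. */
uint64_t fib_rec(unsigned n)
{
    return n < 2 ? n : fib_rec(n - 1) + fib_rec(n - 2);
}

/* Same function with an explicit stack of pending arguments. The stack size
 * is fixed (FIB_STACK, an assumed bound) because hardware memories are
 * bounded. Stack entries stay strictly decreasing, so at most n + 1 are live.
 * Runtime is still exponential: only the control structure changed. */
#define FIB_STACK 64

uint64_t fib_stack(unsigned n)
{
    unsigned stack[FIB_STACK];
    int sp = 0;
    uint64_t acc = 0;

    if (n >= FIB_STACK)
        return 0;                    /* hypothetical error convention: n too large */

    stack[sp++] = n;                 /* "call" fib(n) */
    while (sp > 0) {
        unsigned k = stack[--sp];    /* resume the most recent pending call */
        if (k < 2) {
            acc += k;                /* base case contributes its value */
        } else {
            stack[sp++] = k - 1;     /* push the two recursive "calls" */
            stack[sp++] = k - 2;
        }
    }
    return acc;
}
```

Both `fib_rec(10)` and `fib_stack(10)` return 55.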
17:43
GCC, for example, has optimizations that automatically translate recursive code into an iterative version when that is possible. Factorial could be an example.
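Factorial illustrates the transformation mentioned here. When optimizing, GCC's tail-recursion elimination can often turn a recursive call whose result is then multiplied into a loop with an accumulator. The iterative version below is written by hand to show the resulting shape; it is a sketch of the idea, not GCC's actual output.

```c
#include <stdint.h>

/* Recursive form: the multiplication happens after the recursive call. */
uint64_t fact_rec(unsigned n)
{
    return n <= 1 ? 1 : n * fact_rec(n - 1);
}

/* Iterative form with an accumulator: the shape a tail-recursion
 * elimination pass aims for, and one an HLS tool can schedule as a simple
 * loop. Results overflow uint64_t for n > 20. */
uint64_t fact_iter(unsigned n)
{
    uint64_t acc = 1;
    while (n > 1) {
        acc *= n;
        n--;
    }
    return acc;
}
```

Both functions return 120 for `n = 5`; only the iterative one maps directly onto a controller with a loop.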