SDCC Small Device C Compiler
Formal Metadata

Number of Parts: 150
License: CC Attribution 2.0 Belgium: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, as long as you credit the author/rights holder in the manner they specify.
Identifiers: 10.5446/34366 (DOI)
Production Year: 2015
Transcript: English (automatically generated)
00:07
OK, so welcome to the talk on the Small Device C Compiler. I'll start with a quick introduction to SDCC, then talk a bit about the things that are important for the users,
00:21
like standards compliance and the targets we support. Then I'll go on about optimizations. A particularly interesting optimization is our register allocator. Then a short overview of the project. OK, so what is the Small Device C Compiler?
00:40
Well, it's a standard C compiler supporting various C standards; the C99 and C11 standards are the latest. It's mostly meant as a freestanding implementation, but it can also be used as part of a hosted implementation. For a hosted implementation, it
01:01
would be a little bit more incomplete. There are also supporting tools included, such as an assembler, linker, and simulators for the target architectures. It works on plenty of host systems, such as GNU/Linux, Windows, Mac OS, and various other Unices. And it targets quite a few 8-bit architectures:
01:22
the MCS-51, the DS390 variant, the Z80 and various Z80 derivatives, the Freescale HC08 and S08, and the STMicroelectronics STM8. And there's also some support for the Microchip PIC targets.
01:41
It does have some unusual optimizations that make sense for these targets and for embedded software, where the programs are not very big, and that you don't find in general compilers like GCC or LLVM. OK, let's go on to standards compliance.
02:02
Well, the C standards are not as strict as some other language standards. They allow the implementation some freedom, and we can use that to get efficient code for those small targets. For example, the widths of integer and pointer types, and so on.
02:21
There's some freedom in there. Floats don't have to be IEEE compliant. And the standards have some features that are particularly useful for optimizations. I'll go on about that a little bit in the following slides on the standards, the things that we can use. So SDCC tries to generate useful code, efficient code,
02:41
and tries to be standards compliant. Sometimes those three goals conflict a bit, but I think we always found a reasonable compromise between them. OK, this is the oldest standard, the ISO C90 standard. It's nearly completely supported in SDCC now. The two big things that are missing are double.
03:03
I mean, if there's a double, we give out a warning and treat it like a float. And structures and unions can't be assigned, passed as parameters, or returned yet. Yes?
03:20
Yes? No. Basically, it's just that no one has bothered to implement it yet. I mean, double is something rarely used on those small systems, so it was not the top priority to implement. And the structures and unions, that's more likely for historical reasons; I mean, many early C compilers apparently didn't support it. If someone wants to implement it, of course, they're welcome.
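For now, you have to assign them via memcpy. A minimal sketch of that workaround, with a hypothetical struct:

    #include <string.h>

    struct point { int x; int y; };

    struct point a, b;

    void copy_point(void)
    {
        /* b = a; does not compile in SDCC yet, so copy the bytes instead. */
        memcpy(&b, &a, sizeof b);
    }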
03:42
OK, so there's the C99 standard. We're also mostly there, but there are bigger gaps. Intermingling of declarations and code is not there, so declarations have to be at the beginning of a block. We don't support long double; well, not surprising if we don't support double.
04:03
Compound literals are not yet supported, and variable-length arrays are not supported yet either. Notable features we do have are those new integer types with fixed widths, uint8_t, uint_least8_t, uint_fast8_t, and so on, which I think are quite useful for programming embedded systems.
04:21
Because you don't want to waste registers by using an int as a loop counter if a uint_fast8_t would do. Also, we have support for bool, which in some cases can be used to generate more efficient code than if people used an int as a flag.
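A minimal sketch of that style, with a hypothetical buffer (declarations at the start of the block, since SDCC doesn't support intermingling yet):

    #include <stdint.h>
    #include <stdbool.h>

    uint8_t buf[32];
    bool overflowed;  /* a bool flag can be cheaper than an int flag */

    uint8_t sum_buf(void)
    {
        uint_fast8_t i;  /* fits in one 8-bit register; an int would need two */
        uint8_t sum = 0;
        for (i = 0; i < sizeof buf; i++)
            sum += buf[i];
        return sum;
    }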
04:41
OK, now there's C11. There are still large gaps, in particular nearly everything on the previous slide, except for variable-length arrays, which are optional in C11. We don't have type-generic expressions; for those who don't know, that's something similar to C++ templates that was introduced in C. And,
05:01
unfortunately, we don't have the improved Unicode support yet. What we do support is _Noreturn and static assertions. Now, _Noreturn is particularly useful because it allows optimizations in terms of additional dead-code elimination or skipping some of the function cleanup code.
05:22
So if you have a function that does not return, use _Noreturn and you are likely to get more efficient code. I mean a function that does not return the normal way: if it returns via longjmp, you can also put _Noreturn there to tell the compiler it doesn't return the normal way. If you have a function that's not meant to return at all
05:42
because you have an infinite loop in there, you can also use _Noreturn. And there are static assertions, which are kind of useful because normal assertions are expensive: they go into the code of the program and are checked at runtime. You don't want that on a very restricted system where you have very little memory.
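A minimal sketch of both features, with a hypothetical error handler:

    /* Checked at compile time; costs no code and no RAM on the target.
       (SDCC's int is 16 bits on its 8-bit targets.) */
    _Static_assert(sizeof(int) == 2, "this code assumes 16-bit int");

    /* Never returns the normal way, so the compiler can drop cleanup code
       and dead code after calls to it. */
    _Noreturn void fatal(void)
    {
        for (;;)
            ;  /* halt in an infinite loop */
    }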
06:02
But the static assertions that C11 introduced are checked at compile time and can be quite useful, I think, on these architectures. OK, next, a bit on the targets we support. Now, our oldest target is the MCS-51. It's quite stable.
06:20
It doesn't have all the fancy features present in some of the other targets, but it works OK. The architecture is quite a hard target for C. On one hand, you have this tiny stack of, depending on the exact variant, between 80 and maybe 180 bytes, and it's hard to access.
06:42
And the address space is quite divided. There's a separate address space for I/O that I didn't even put on the slide. But basically, there's the normal memory: we have this code memory, which is kind of read-only, with 64 kilobytes, and we have these different types of data memory.
07:01
And they have to be accessed using different instructions, which means if you have a normal pointer, it's not just dereferencing the pointer: the pointer has to carry some tag to tell in which of these spaces it points, and that makes it quite expensive. But we have keywords for these named address spaces, so if you know where your pointer points, you can tell the compiler, and it will immediately generate the right instructions.
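A minimal sketch for the MCS-51 port, using SDCC's __data, __xdata, and __code keywords (variable names hypothetical):

    __data  unsigned char counter;      /* internal RAM: fast, direct access */
    __xdata unsigned char buffer[256];  /* external RAM: accessed via MOVX */
    __code  const unsigned char table[] = {1, 2, 3};  /* read-only code memory */

    unsigned char get(__xdata unsigned char *p)
    {
        /* The pointer is known to point into xdata, so the compiler can emit
           a plain MOVX access instead of dispatching on a tagged generic pointer. */
        return *p;
    }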
07:21
OK, now to the PIC targets. Well, the PIC targets are in development, so they are not as feature-complete and as bug-free as the others yet. And they have been that way for a long, long time.
07:41
The architectures are even harder to target in C. Nowadays, the PIC16s, the PICs with 16-bit-wide instructions, are nearly there. The PIC14s, the ones with 14-bit-wide instruction words, are still far behind the other targets. For small programs, it works. If you write a bigger program, you
08:01
will most likely run into compiler bugs if you're targeting the PIC14. OK, so let's go on to the Freescale HC08 and S08. One notable thing is that this target has this great new optimal register allocator that we'll talk about later.
08:20
And otherwise, it's quite feature-complete. And this is a bit of the history of the code size. The yellow line is the code size for the HC08 on a small benchmark that we use. The big drop in code size after the 3.1.0 release, that's the introduction of the new register allocator.
08:43
And this bluish line down here is the code size for the S08 target, which is a newer derivative of the HC08. Now, these two lines down here are commercial compilers. As you see, the scale does not start at zero. So yeah, the code size is still a little bit bigger
09:03
than the commercial compilers, but it's just about 30% or so. So we're kind of getting there. And compared to what it used to be in the 3.1 and earlier releases, we've come a long way in terms of what we achieved here.
09:21
OK, this is the Z80 and related ones. It also has the optimal register allocator. Now, the Z80 is the real classic CISC architecture with some weird complex instructions. These are currently only used for library functions, within the standard library itself. Or if you put a call to a library function,
09:43
the compiler can often optimize the call away and emit one of the complex instructions instead. For example, if you call memcpy with a constant third parameter for the number of bytes to copy, SDCC will just emit an LDIR instruction for the Z80.
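A minimal sketch of the two variants, with hypothetical buffers:

    #include <string.h>

    char src[16], dst[16];

    void copy_fast(void)
    {
        /* Constant size: on the Z80, SDCC can emit a single LDIR here. */
        memcpy(dst, src, sizeof dst);
    }

    void copy_slow(void)
    {
        unsigned char i;
        /* Hand-written loop: compiled as an explicit byte-by-byte copy. */
        for (i = 0; i < sizeof dst; i++)
            dst[i] = src[i];
    }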
10:02
If you decide to write your own memcpy as a copy loop instead, well, then we emit much less efficient code, because there will be a copy loop instead of the LDIR. OK, now what happened here historically in terms of code size? The red line is the code size with standard options, so not optimizing for anything in particular.
10:23
We see this drop and then it going up again. The drop is the introduction of the new register allocator. Going up again is just a change of some defaults in the compiler-runtime versus code-size trade-off. We also see the green line; that's the compilation time. We see it jump up at the introduction of the new register allocator.
10:41
And when we changed the defaults, it went down a bit again; there we introduced some optimizations. But an interesting thing is also this blue line, because that's the Z80 code size with strong optimization for code size. The green lines are the respective ones
11:00
for the Rabbit, which is a Z80 derivative; again, strong optimization for code size and default options. Now, this port is already doing quite well compared to all the others. I mean, for the Z80, there are quite a few C compilers out there. The IAR compiler, a non-free compiler,
11:21
still generates more compact code. Again, you see it doesn't start at zero here, so the difference is not that big. But all the other compilers, I mean all these other four straight lines, other compilers targeting the Z80 or Z80 derivatives, generate bigger code than SDCC does at the moment.
11:41
And as far as I know, that is the IAR compiler: it's non-free, and they're not handing out licenses anymore anyway. So SDCC is the best compiler you can get for these targets at the moment. Yeah, this yellow line is a non-free compiler
12:00
targeting the Rabbit, so you have to compare with this line instead of those up there. So better there as well. OK. The latest target added to SDCC is the STM8, an STMicroelectronics 8-bit architecture. This one is the first one written from the start with the new optimal register
12:21
allocator in mind. And it's the only one that has byte-wise register allocation; again, a bit more on that later. Both make the generated code rather efficient. Well, this is a bit of a smaller graph, because it was introduced just before the 3.4.0 release. The bluish line is, again, standard options.
12:42
The yellow line is strong optimization for code size. These two lines down here are two non-free compilers targeting the same architecture. And we can see that, even though this is just a very recently introduced backend, and there are still quite a few more optimizations we could do,
13:02
we're already within 30%; our code-size overhead is only about 30% compared to these non-free compilers now. So I guess if someone manages to work half a year on this thing full time, it will probably be better in terms of code size.
13:25
OK, now I've been mentioning the register allocator quite a lot. Before we jump into the register allocator, a quick slide on the flow of compilation. What happens if you feed some code into the Small Device C Compiler? Well, first there's the preprocessor, not mentioned on the slide.
13:42
Then the code is parsed into an abstract syntax tree. The abstract syntax tree is converted to intermediate code. On that, we do target-independent optimizations, then target-dependent optimizations, register allocation, and generate assembly code. And at the end, there's a peephole optimizer. Now SDCC can, for debugging or for finding out
14:03
what could make your code more efficient, dump a lot of intermediate information. We have compiler flags to give all that information out, and even to put it as comments into the resulting assembler file. For the peephole optimizer, you can also add your own custom peephole rules if you want.
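A sketch of what such a rule looks like, following the replace-by pattern used in SDCC's peephole definition files (the rule itself is illustrative, MCS-51-style assembly):

    replace {
        mov  %1,a
        mov  a,%1
    } by {
        mov  %1,a
        ; peephole: removed redundant reload of a
    }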
14:21
OK, now let's have a slightly more detailed look at the register allocator. OK, what's register allocation? Well, we have programs, and we use variables in them; in particular, we use local variables in our C code. And they have to be stored somewhere.
14:42
Well, the compiler has to decide where to store them. Basically, we have registers. We have memory. Memory, for most of our targets, means somewhere on the stack. For some other targets, by default, we do other things because stack access is inefficient. But in general, it's registers versus memory. Now registers are fast to access.
15:02
On those embedded targets, the speed difference between registers and memory is much less than it is on big architectures. But it still makes a big difference in code size: an instruction accessing a register is typically much shorter than one accessing memory. So where to put variables is very important for code quality.
15:21
Of course, when things are in registers, that is better. But we typically don't have enough registers to put everything in registers. So we have to decide what to put into registers and where to put it. Because unlike those RISC architectures that are the popular thing these days, I mean, even x86 is kind of moving towards RISC-style
15:45
things with these SSE extensions and so on, where the registers are all the same. So on our targets, typically, the situation is totally different. We have different types of registers. Some instructions are only available for some registers. Even if the same instructions are
16:02
available with different registers, we have register preferences: it might differ in code size or execution time, depending on which register you use. Take, for example, the STM8: it has a 16-bit register X and a 16-bit register Y. And most instructions, if they access the 16-bit register Y,
16:22
are one byte longer, even though they have the same execution time here. And on other architectures, it's sometimes the other way around. Then there's this so-called register aliasing thing. Basically, you have some 16-bit registers that could also be treated as two 8-bit registers. And you have to make sure that you don't
16:41
use those at the same time. There's coalescing, which basically means: if you have an assignment, and the right operand is not used after that assignment, you can just allocate both operands into the same register, and you don't have to emit any code for the assignment at all. Then there's rematerialization, which basically
17:01
means you have a variable that you could allocate into a register or into memory, but you could also recalculate the value of the variable as needed. And sometimes memory accesses are expensive. So if you can put it in a register, OK, it's worth it. But it might be cheaper to recalculate it than to store it in memory.
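A small illustration of the two ideas in C (what actually happens depends on the allocator):

    unsigned char f(unsigned char a)
    {
        unsigned char b = a;      /* coalescing: a is not used below, so a and b
                                     can share one register and no code at all
                                     is emitted for this assignment */
        unsigned char c = b + 4;  /* rematerialization: instead of spilling c to
                                     memory when registers run out, the compiler
                                     may recompute b + 4 where c is needed */
        return b ^ c;
    }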
17:22
So all these complicated things have to be considered by a register allocator. So fortunately, we do have optimal register allocation in polynomial time, which no other compiler has. This is based on graph structure theory, in particular tree decompositions,
17:43
which I don't have time to explain here. For practical purposes, this means that for the polynomial runtime, you need a bound on the treewidth of the control flow graph. And for the programming language C, that means: don't use too many labels for gotos per function.
18:01
If the number of labels for gotos per function in your code is low, then everything is fine, and we can do things optimally in polynomial time. It's flexible through the use of a cost function. Currently, the only cost function implemented is code size. That means the register allocator will find the register allocation that
18:22
results in the smallest possible code size after code generation. Of course, we don't know what the peephole optimizer will do later. But in terms of what comes out of the code generation stage before the peephole optimizer, we can currently be optimal in terms of code size.
18:41
And if we get around to implementing it, we could also use it for speed. If we have data from a profiler with execution probabilities for the individual instructions, we could also be optimal in terms of speed, or we could become optimal in terms of energy consumption or other such targets. Well, it's slow for architectures
19:02
with many registers. In the theoretical runtime bounds, the number of registers is in the exponent of the runtime of the register allocator. That makes it infeasible for those architectures with many registers. But our 8-bit targets often have few registers, and it's fine. There are also theoretical hardness results that basically say
19:23
that if we want to do it optimally, we can't get rid of the number of registers in the exponent. OK. Now, because it can still be quite slow, we have a compiler switch for a compilation-speed versus code-quality trade-off. Essentially, this --max-allocs-per-node parameter
19:42
tells the compiler how much memory it is allowed to use in register allocation and in other optimizations based on graph structure theory that I didn't mention; and this also affects the runtime. Set this thing to a high value, and you get more optimization. Set it to a really high value, and it's provably optimal. Set it to a small value, and the compiler will be fast, but the code will be bad.
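For example (the values are illustrative):

    # fast compilation, worse code
    sdcc -mstm8 --max-allocs-per-node 500 main.c

    # slow compilation, provably optimal register allocation
    sdcc -mstm8 --max-allocs-per-node 100000000 main.c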
20:02
OK. So much for the optimal register allocation. Now for the byte-wise register allocation and spilling, the things that only the STM8 port has at the moment. Traditionally, a compiler sees a variable and decides: OK, either it goes into a register
20:21
or it goes into memory. What we do here is: there's this variable, and for every single byte of that variable, we decide where it should go. Into memory, which in this case means on the stack, or into some byte of some register. So that's a really flexible approach.
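As an illustration (hypothetical variable; the actual placement is up to the allocator):

    /* A 16-bit counter on the 8-bit STM8: if the low byte is used in a hot
       loop and the high byte only rarely, byte-wise allocation can keep the
       low byte in a register while the high byte stays on the stack, instead
       of forcing the whole variable into one or the other. */
    unsigned int count;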
20:41
Again, being so flexible makes the compiler slower. But if your programs are small, and you don't have a lot of memory on the target, it's worth it. OK. Of course, we also have a bunch of other optimizations suitable for these small targets,
21:01
but I can't talk about everything now. So, the SDCC project. Well, it's a bit traditional. Yeah, we are at SourceForge. We have our bug tracker and feature-request tracker. We have mailing lists. We have a Subversion repository. We have a wiki. So not the new and fancy stuff, but so far it
21:23
works well for us. We also do regression testing to ensure that we have a good-quality compiler and we don't have too many bugs. Every night, a snapshot from the Subversion repository is compiled for various host architectures.
21:42
We compile about 7,000 regression tests and execute them on simulators. The tests come from different sources. Some of them have been written by the people implementing some feature, but most of them are regression tests for bugs that used to be there or that we took from GCC, which kind of helps
22:04
reassure users a bit: if your code seems to be working OK with GCC, then probably it will work OK with SDCC as well, within the bounds of the target architectures. Of course, if your target doesn't have more than a kilobyte of memory, you can't use all of it. We do this for all the target architectures.
22:22
We do it on various host systems and host architectures. OK, to conclude: the 8-bit architectures are not only alive, they use very little energy and they are cheap. For things like smart dust, they are a great choice. And now we also have a free C compiler for them.
22:43
It aims for standard compliance and appropriate optimizations. That's it. No time for questions, but you can ask me, maybe,
23:02
outside or something.