FreeBSD and LLVM support
Formal Metadata
Number of Parts | 490 | |
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/46996 (DOI) | |
Transcript: English (auto-generated)
00:06
Hello everyone. So, today we will go through LLVM and its place within the FreeBSD ecosystem. I'm David Carlier. I contribute to various open source projects, mostly ones related to the business world.
00:26
It can be video games, it can be enterprise-oriented software. More related to the topic of the day, I have been an LLVM committer since May 2018.
00:41
So, what is LLVM? LLVM is composed of a toolset and frontends. The frontends are able to generate what we call LLVM IR. IR stands for intermediate representation, a kind of high-level assembly, but much more architecture-independent.
01:07
So, if we take the Clang or Clang++ frontends as an example: from your source code, the lexer will generate tokens, which are sent to the parser to generate the AST, which is the abstract syntax tree,
01:24
and then to the code generator to produce the LLVM IR. So, with LLVM, we also have tools. You can build nice just-in-time compilers. You can extend LLVM itself.
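To make the frontend pipeline a little more concrete, here is a minimal sketch in C. The function name `add` and the file name `add.c` are assumptions chosen for illustration; the two commands in the comment are ordinary Clang invocations for dumping the AST and for emitting the textual IR.

```c
/* add.c: a tiny translation unit to push through the Clang pipeline.
 *
 * Dump the AST built by the parser:
 *   clang -Xclang -ast-dump -fsyntax-only add.c
 * Emit the textual LLVM IR produced by code generation:
 *   clang -S -emit-llvm add.c -o add.ll
 */
int add(int a, int b)
{
    return a + b;
}
```

Reading the resulting `add.ll` next to the source is a quick way to see the "high-level assembly" character of the IR.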
01:44
For instance, you can write what we call a module pass, which works from the perspective of a compilation unit, and then a function pass, a basic block pass, and so on.
02:00
There you can do some checking, you can add some instructions, remove some instructions, if you like. These are compile-time passes, right? In addition, you have the possibility to do some static code analysis.
02:22
We have what we call sanitizers, which we will go through later. All of these have been available in FreeBSD since FreeBSD 9 or so. At first, it was just an option in parallel to the old GCC 4.2.
02:41
It was first needed to replace the old GCC 4.2, which was the last GPLv2 version. That was quite a blocker because with it you can't do C++11 and so forth; you can only do C99. So, it was time to replace it for the system.
03:06
So, then it became a full part of the system as of FreeBSD 10. I mean, it's used to build the kernel, the userland, and most of the ports.
03:23
So, yes, the FreeBSD code base needed a lot of changes to meet Clang's criteria, a lot of changes to fit a more modern style, and so on.
03:43
And then, as time went on, more and more architectures were supported: AMD64, ARM. Until now, maybe only SPARC64 remains a bit behind the rest, but that's already nice progress.
04:05
So, yes, here we go. Sanitizers, what are these? They are not really for cleaning things up, unlike what the name says.
04:21
They are more for detecting some types of bugs at runtime. So, they complement the static code analysis part pretty well. The sanitizers provide many different runtime libraries to detect some types of bugs: memory errors, race conditions, several kinds of overflow.
04:49
For instance, we have memory sanitizer, which is mainly about uninitialized variables. Address sanitizer is more for double free.
05:04
And also overflows. It includes, in addition, the leak sanitizer, which is pretty effective, and it also has a much smaller performance drop compared to tools like Valgrind, for example, which can be 20 times slower.
05:24
Whereas with address sanitizer, it's sometimes five times slower. So, yeah, that's quite a difference. We have undefined behavior sanitizer. It's a kind of small Swiss Army knife sanitizer. There is no shadow memory mapping, unlike address sanitizer or memory sanitizer.
05:45
So, it was, for instance, possible to port it to OpenBSD for this reason. It's a kind of small sanitizer. It's mainly for things like integer overflow and misaligned pointers, and the performance drop is pretty small compared to the rest.
06:07
You can combine it with other sanitizers, whereas memory sanitizer and address sanitizer can't be used at the same time; they are mutually exclusive. We also have nice race condition detection called thread sanitizer.
06:28
So, all of them are supported by FreeBSD. In addition, we have components like libFuzzer to do some fuzzing and XRay instrumentation to do some performance benchmarking.
06:42
Right? So, for example, with this very basic code, address sanitizer is perfectly capable of catching the first error.
07:14
It is perfectly capable of catching the double free, and the use-after-free as well.
07:28
As you can see, it detects the first heap overflow and displays the line.
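The code shown on the slide is not part of the transcript, so the following is only a sketch of the kind of program being described. The function names and the 8-byte buffer size are assumptions. Each bug sits behind a parameter so that the safe path can be run normally, while the unsafe path is what address sanitizer would be expected to report when built with `-fsanitize=address`.

```c
/* Build with address sanitizer, for example:
 *   clang -g -fsanitize=address asan_demo.c driver.c
 * (asan_demo.c and driver.c are hypothetical file names.) */
#include <stdlib.h>
#include <string.h>

enum { BUF_SIZE = 8 };

/* Writes `len` bytes into a BUF_SIZE-byte heap buffer and returns the
 * first byte. Any len > BUF_SIZE writes past the allocation, which is
 * a heap buffer overflow. Call it with len >= 1. */
int fill_buffer(size_t len)
{
    char *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return -1;
    memset(buf, 'A', len);      /* overflows when len > BUF_SIZE */
    int first = buf[0];
    free(buf);
    return first;
}

/* Frees a buffer. With `misuse` set, it then reads the freed memory
 * (use-after-free) and frees it again (double free). A sanitized build
 * stops at the first of the two, so the second is never reached. */
int release_buffer(int misuse)
{
    char *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return -1;
    buf[0] = 'B';
    int value = buf[0];
    free(buf);
    if (misuse) {
        value = buf[0];         /* use-after-free */
        free(buf);              /* double free */
    }
    return value;
}
```

A small driver calling `fill_buffer(BUF_SIZE + 1)` or `release_buffer(1)` would trigger the reports; `fill_buffer(BUF_SIZE)` and `release_buffer(0)` stay within bounds.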
07:53
Memory sanitizer is likewise capable of catching this uninitialized variable, which can very easily go under the radar in production.
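As a rough illustration of that kind of bug (again not the speaker's actual slide code), here is a sketch in which a local variable is only initialized on one path. The names `classify` and `forget_init` and the threshold value are assumptions. Memory sanitizer is enabled with `-fsanitize=memory` and would be expected to complain when a branch depends on the uninitialized value.

```c
/* Build with memory sanitizer, for example:
 *   clang -g -fsanitize=memory msan_demo.c driver.c
 * (msan_demo.c and driver.c are hypothetical file names.) */

/* Returns 1 if `input` is above a threshold, 0 otherwise.
 * When `forget_init` is nonzero the threshold is never written, so the
 * comparison below reads an uninitialized variable. */
int classify(int input, int forget_init)
{
    int threshold;              /* deliberately left uninitialized */

    if (!forget_init)
        threshold = 10;

    if (input > threshold)      /* uninitialized read if forget_init != 0 */
        return 1;
    return 0;
}
```

Calling `classify(5, 1)` may well appear to work in an ordinary build, which is exactly why this class of bug goes unnoticed without instrumentation.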
08:05
It works in a production environment, but it's not correct code, obviously. Likewise, thread sanitizer is perfectly capable of catching this obvious race condition.
08:31
Like this. Again, it shows you where the problem lies.
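A typical shape for such an obvious race is several threads incrementing a shared counter without any lock. The sketch below assumes POSIX threads; `run_workers`, `bump` and `MAX_THREADS` are hypothetical names, not taken from the talk. Thread sanitizer is enabled with `-fsanitize=thread`.

```c
/* Build with thread sanitizer, for example:
 *   clang -g -fsanitize=thread -pthread tsan_demo.c driver.c
 * (tsan_demo.c and driver.c are hypothetical file names.) */
#include <pthread.h>

enum { MAX_THREADS = 8 };

static long counter;            /* shared and deliberately unprotected */

static void *bump(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++)
        counter++;              /* racy read-modify-write with >1 thread */
    return NULL;
}

/* Starts up to MAX_THREADS workers that each increment the shared
 * counter `iterations` times, waits for them, and returns the counter.
 * With one worker the result is exact; with several it is a data race. */
long run_workers(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, bump, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

With two or more workers the returned value can be lower than expected, and a sanitized build would be expected to point at the `counter++` line.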
08:48
Ah, sorry. So, here, memory sanitizer was able to catch the uninitialized variable very well.
09:12
Undefined behavior sanitizer is capable of catching those two obvious errors: the alignment issue and then the integer overflow shown below, as you can see.
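The two errors mentioned, a misaligned access and a signed integer overflow, can be sketched as follows. The function names are assumptions and this is not the code from the slide. Both checks are part of what `-fsanitize=undefined` enables.

```c
/* Build with undefined behavior sanitizer, for example:
 *   clang -g -fsanitize=undefined ubsan_demo.c driver.c
 * (ubsan_demo.c and driver.c are hypothetical file names.) */
#include <stddef.h>

/* Adds one to `value`. Passing INT_MAX overflows a signed int, which is
 * undefined behavior. */
int increment(int value)
{
    return value + 1;
}

/* Loads an int from `base + offset` through a cast pointer. If the
 * resulting address is not suitably aligned for int (for instance an
 * odd offset into an int-aligned buffer), the load is misaligned. */
int load_int(const unsigned char *base, size_t offset)
{
    const int *p = (const int *)(base + offset);
    return *p;
}
```

`increment(INT_MAX)` and `load_int(buffer, 1)` are the calls a sanitized build would be expected to flag; an aligned load at offset 0 of an `int` object is fine.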
10:10
So, here are the flags to pass to the frontends, as values of -fsanitize. Memory sanitizer is memory, address sanitizer is address, thread sanitizer is thread, and undefined behavior sanitizer is undefined.
10:47
So, we mentioned the libFuzzer component earlier. But what is fuzzing all about in the first place? It's a testing technique to catch certain types of bugs in software, mainly libraries, I might say, that rely on external inputs.
11:07
It can be just reading a config file, it can be listening on a socket, whatever you like. Let's take as an example an image parser, where you want to fuzz the picture format detection: how it detects PNG, JPEG, and so on.
11:29
So, libFuzzer will use inputs which we call corpuses in the fuzzing vocabulary. Those corpuses don't have to be full pictures; they can be just the first bytes of the picture format.
11:46
Then libFuzzer will take those corpuses and proceed to do some transformations which we call mutations. It will insert some random bytes at some random offsets, and possibly remove some other bytes.
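To give a feel for what a mutation is, here is a toy sketch of the two operations just described: inserting a byte at an offset and removing a byte. This is an illustration only and not libFuzzer's real mutation engine; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Inserts `value` at `offset` in a buffer holding `len` bytes with room
 * for `cap` bytes. Returns the new length, or `len` unchanged if the
 * byte does not fit or the offset is out of range. */
size_t insert_byte(unsigned char *buf, size_t len, size_t cap,
                   size_t offset, unsigned char value)
{
    if (len >= cap || offset > len)
        return len;
    memmove(buf + offset + 1, buf + offset, len - offset);
    buf[offset] = value;
    return len + 1;
}

/* Removes the byte at `offset`. Returns the new length, or `len`
 * unchanged if the offset is out of range. */
size_t erase_byte(unsigned char *buf, size_t len, size_t offset)
{
    if (offset >= len)
        return len;
    memmove(buf + offset, buf + offset + 1, len - offset - 1);
    return len - 1;
}

/* Applies one random mutation: either drops a random byte or inserts a
 * random byte at a random offset. Uses rand() for simplicity. */
size_t mutate(unsigned char *buf, size_t len, size_t cap)
{
    if (len > 0 && rand() % 2 == 0)
        return erase_byte(buf, len, (size_t)rand() % len);
    return insert_byte(buf, len, cap, (size_t)rand() % (len + 1),
                       (unsigned char)(rand() % 256));
}
```

Starting from the first bytes of a PNG header, repeated calls to `mutate` produce slightly damaged variants, which is the kind of input a format detector should survive.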
12:01
This is in order to trigger segmentation faults, this kind of error, whatever. Those mutations will then be stored so they can be reused once you fix your bugs. So, fuzzing is meant to be run long enough, I mean hours at least, if not
12:23
days, if not weeks, if necessary, in order to cover as much of the code as possible. So, as you can see, that complements the tests we all know pretty well. Fuzzing with libFuzzer is nice, but there are some caveats.
12:46
I mean, as I said, it fits better with libraries, because with monolithic applications, for instance if you want to fuzz nginx, that can become very difficult. It's software relying on events, and libFuzzer runs the code several times.
13:07
That can pretty much contradict the application workflow in this case. You might need to make a lot of changes in order to fit libFuzzer's needs.
13:20
So, here, to show how libFuzzer works: you have your fuzz binary, and you have one or several corpuses.
14:15
As an option, it also supports dictionaries. A dictionary is a sort of way to guide the fuzzing.
14:23
Sometimes you may want to avoid too much pointless randomness. Let's take as an example that you want to fuzz an HTTP server. You may not want to fuzz keywords like GET, PUT, DELETE, and so on, just maybe some part of the client request.
14:41
So, a dictionary is a good way to guide the fuzzing a little bit, so that it makes more sense. And then, with the corpus and possibly the dictionary, the inputs will undergo some mutations,
15:02
and those mutations will then be stored in the same place as the original inputs.
15:27
So, how does libFuzzer work in practice? You need to at least implement LLVMFuzzerTestOneInput, which takes the mutated data as an argument.
15:43
So, it's a C function, right? And then you do what you have to do with this data. For instance, there is an obvious overflow here. That's why I recommend combining the fuzzer with at least one sanitizer, like address sanitizer, for instance.
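The entry point just described has a fixed signature, `int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)`. Below is a minimal sketch of a fuzz target with a deliberate overflow; `parse_header` and the 16-byte buffer are hypothetical and stand in for whatever library code is under test.

```c
/* Build as a fuzz binary combined with address sanitizer, for example:
 *   clang -g -fsanitize=fuzzer,address fuzz_target.c -o fuzz_target
 * and run it against a corpus directory:
 *   ./fuzz_target corpus/
 * (fuzz_target.c and corpus/ are hypothetical names.) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical code under test: copies the input into a fixed-size
 * stack buffer without checking `size`, then looks at the first byte.
 * Any input longer than 16 bytes overflows `header`. */
static int parse_header(const uint8_t *data, size_t size)
{
    char header[16];

    if (size == 0)
        return 0;
    memcpy(header, data, size); /* overflow when size > sizeof header */
    return header[0] == 'P';
}

/* libFuzzer calls this once per mutated input. Returning 0 is the
 * conventional "input processed" result. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    parse_header(data, size);
    return 0;
}
```

No `main` is written by hand here: `-fsanitize=fuzzer` links in libFuzzer's own driver, which repeatedly calls the entry point. Without a sanitizer a small overflow like this one might not crash at all, which is the reason for the recommendation above.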
16:09
Once you compile your fuzz binary, you will see.
17:16
So, I type this, and then it comes with several options.
17:51
It shows how many runs it will do, the memory usage limit, whether you want to run some parallel jobs.
18:01
What you can set as well is the max length of inputs and the initial seed for randomness. There are plenty.
18:22
Again, on FreeBSD, the leak detection is not supported, but there are many others. So, then you have to create a meaningful corpus folder.
19:10
Ah, sorry. For instance, I asked it to run with this input folder,
19:29
and then it was able to catch the overflow. So, it created a crash file.
20:53
So, yes, normally it should create some mutated data named with a hash code, and then the transformed inputs.
21:57
There it is. So, now we have XRay instrumentation.
22:07
As I said earlier, it's for doing performance benchmarking. For example, you are doing a new version of your software for your company,
22:20
and then a customer of yours calls you to tell you that this new release has severe performance drops compared to the previous version. So, XRay instrumentation allows you, or at least will help you, to find out where the bottlenecks really lie.
22:41
So, when you compile your binary with XRay instrumentation, it will put an instrumentation hook at each function entry and each function exit of the instrumented functions, because you can choose which functions you want to instrument and which ones you don't.
23:04
For instance, the more functions you instrument, the slower the binary will get. So, you have to choose carefully which parts of the code you want to instrument.
23:27
In order to do this, you need to add those attributes. So, you want to always instrument those two. You can also say which ones you don't want, with this other attribute.
23:59
To discard the functions you want to avoid instrumenting.
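In Clang the two attributes are spelled `xray_always_instrument` and `xray_never_instrument`. The sketch below applies them to two Fibonacci variants, in the spirit of the example discussed in the talk; the function names and the `XRAY_ALWAYS`/`XRAY_NEVER` macros are assumptions, and the macros simply expand to nothing on compilers other than Clang.

```c
/* Build with XRay instrumentation, for example:
 *   clang -g -fxray-instrument xray_demo.c driver.c -o xray_demo
 * (xray_demo.c and driver.c are hypothetical file names.) */
#if defined(__clang__)
#define XRAY_ALWAYS __attribute__((xray_always_instrument))
#define XRAY_NEVER  __attribute__((xray_never_instrument))
#else
#define XRAY_ALWAYS
#define XRAY_NEVER
#endif

/* Naive recursive version: exponential time, the likely bottleneck. */
XRAY_ALWAYS unsigned long fib_recursive(unsigned n)
{
    return n < 2 ? n : fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* Iterative version: linear time. */
XRAY_ALWAYS unsigned long fib_iterative(unsigned n)
{
    unsigned long prev = 0, cur = 1;

    if (n == 0)
        return 0;
    for (unsigned i = 1; i < n; i++) {
        unsigned long next = prev + cur;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Helper we do not care to measure, so it is excluded explicitly. */
XRAY_NEVER int same_result(unsigned n)
{
    return fib_recursive(n) == fib_iterative(n);
}
```

Marking only the functions of interest keeps the overhead down, which matches the advice to choose carefully what gets instrumented.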
24:27
So, by default, once you run your updated binary, it will create a file, but you can change the file naming. By default, it's xray-log, the name of the application, and then a hash, right?
24:47
And then, with llvm-xray, you can find out where the time goes; you can do some accounting.
25:01
That means showing where your application spends most of its time. You can sort, because it will generate a kind of CSV-like presentation.
25:24
So, it says here that the first version of the Fibonacci function is the bottleneck in this case, right? So, that's nice, adding some attributes.
25:40
In practice, I might say you may not want to touch your code too much to that degree, at least when it's your corporate work. So, fortunately, there is another solution: via an external configuration file, as follows.
26:01
You can say, please always instrument those two, and you can say, for these two, never instrument them, please. And then you can pass this config file as follows.
26:39
Same thing, it generates the same binary.
26:58
So, to summarize, here is your binary compiled with XRay instrumentation.
27:06
So, I mentioned instrumentation hooks, but they are empty until you run the binary. Then they will be filled in with timers, at the beginning of each function and at the exit point of each function, in order to compute the delta.
27:33
So, with llvm-xray graph, you can generate the call graph, and then from this you can generate an SVG, for example.
27:48
So, XRay instrumentation works well in the multi-threaded case, and you have the possibility to aggregate data, because it can become very verbose, obviously.
28:01
So, you can tell it to aggregate the data in one place. So, yes, that will be all.
28:28
Unfortunately, my FreeBSD machine crashed yesterday, so I couldn't use it. Do you have any concerns or questions?