
Dangerous Optimizations and the Loss of Causality


Formal Metadata

Title
Dangerous Optimizations and the Loss of Causality
Part Number
8
Number of Parts
20
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Increasingly, compiler writers are taking advantage of undefined behaviors in the C and C++ programming languages to improve optimizations. Frequently, these optimizations interfere with the ability of developers to perform cause-effect analysis on their source code, that is, to analyze the dependence of downstream results on prior results. Consequently, these optimizations are eliminating causality in software and increasing the probability of software faults, defects, and vulnerabilities. This presentation describes some common optimizations, explains how they can lead to software vulnerabilities, and identifies applicable and practical mitigation strategies.
Transcript: English (auto-generated)
Hey, good morning, everyone. Oh, wow, that's loud, sorry. Maybe you guys can't hear me very well after last night, anyway; pretty fun show last night. Thanks, everyone, for coming out. Really great to see so many people up so early this morning. So our first talk of the day is Robert Seacord. He's talking about dangerous optimizations and the loss of causality.

Howdy. So, I guess we're not recording this this morning; the cameras seem to have disappeared. What's that? Yeah, I guess they were last seen wandering off with a couple of French Canadian girls. Okay.
Okay, so the premise of my talk is that, increasingly, compiler writers are taking advantage of undefined behaviors in the C and C++ languages to improve optimizations. These optimizations can interfere with the ability of programmers to really understand the behavior of their code — and not only human analysts: static analysis tools and so forth can also have trouble predicting the behavior of different compiler implementations. This inability of programmers and tools to understand the behavior of the code can lead to increased software faults and defects, and presumably some percentage of those are vulnerabilities.

So this work was originally published as a vulnerability note some years back. It was actually a response to some Plan 9 developers at Bell Labs who discovered that code that had been working had stopped working, so we investigated and wound up publishing this vulnerability note. I'm going to start by talking a bit about undefined behavior.
A little bit about me — I should have said this on the title slide. I'm a principal security consultant at NCC Group, and I've been participating on the C standards committee for a little over a decade now. So if you're wondering who those idiots are that define the C language and do other stupid things, come tell me what you like least about the language.

One of the things I like least about the language is the definition of undefined behavior, because — not that the concepts aren't valid — undefined behavior was defined, I think, overly broadly in the standard, so it means one of three very different things. First, it gives license to compiler writers not to catch certain program errors that are difficult to diagnose. The C standard is written by C compiler writers for C compiler writers, and they're very friendly to each other, so they sort of pat each other on the back and say: if it's hard, you don't have to do it. And that's how this comes about.
The second is to avoid defining obscure corner cases, which would favor one implementation strategy over another, and I have kind of a running example of this. If you take INT_MIN % -1, mathematically that should give you a result of zero, but if you run it on an Intel processor you'll get a fault. So that's considered undefined behavior, so that implementations that compile for Intel platforms don't have to write a special case to check for that condition. That gives you better performance, but it means the programmer is now responsible for guaranteeing that the code doesn't generate a fault.

The third case is to identify areas of possible language-conforming extension. These are just places where the C standard doesn't specify some behavior, so an implementation can provide some implementation-specific behavior. A good example of this is fopen: there are a few mode strings defined, and any character besides the ones listed is undefined behavior. If you provide another character you're unlikely to experience a crash of any kind; it's really so that different implementations can add different modes that can be used to open a file.

So, regardless of why a behavior is classified as undefined, an implementation can completely ignore that undefined behavior, with unpredictable results. The thing we say most often in the C standards committee is that the compiler could go off and play the game of life at that point in time, and that's perfectly within the bounds of the standard. Or it could behave in a documented manner characteristic of the environment.
For example, if you do INT_MIN % -1 on an Intel chip, characteristically that environment results in a fault. Or it can terminate translation or execution with a diagnostic.

So the basic design of an optimizer for a C compiler is the same as for any other procedural language: the fundamental principle is that you want to replace computations with more efficient methods that produce the same results. And then I have this very oversimplistic description where I say that optimizations that eliminate undefined behaviors are good, and optimizations that introduce vulnerabilities are bad. Pretty basic.

The C standard has something known as the as-if rule. The standard specifies results in terms of an abstract machine that's very detailed, but it doesn't specify the methods that implementations must follow. In the abstract machine everything is evaluated as specified by the semantics, but actual implementations can choose any method they want that produces the same results. This gives compilers the leeway to remove code that's deemed unused or unnecessary when building a program, and one of the peculiarities of this industry is that code that's added with security in mind is the code most frequently removed by the compiler. The reason for that is that most security vulnerabilities involve undefined behavior,
and while testing for undefined behavior, it's very easy to invoke undefined behavior — and once you invoke undefined behavior, your code can be optimized out.

There are three general strategies compiler writers follow to generate code. The first is hardware behavior, in which the compiler just generates the corresponding assembly code and you let the hardware do whatever the hardware does. For a long time that was really how C worked, so old programmers are very familiar with that type of implementation strategy. The second is super debug: here you basically try to trap as many undefined behaviors as you can, to assist in testing. But that type of policy really degrades performance, so it's not useful as a runtime protection mechanism. And the third is total license: here you treat any possible undefined behavior as a can't-happen condition, and that permits very aggressive optimizations. That's sort of what we're talking about.

If you look at a particular compiler, what you tend to find is that compiler writers don't really have strong belief systems. You'll find a loop — and loops are a good place to look for optimizations, because optimizer writers really focus on loops as a good place to make performance gains — so you can look at a loop that basically implements the hardware model, and next to it will be a loop that looks almost identical but implements the total license model. Compiler writers tend to be very pragmatic and sort of do whatever they want, whenever they want, without necessarily a real strong underlying philosophy.
Well, the philosophy is that you have these compilers, and they tend to grow up with different code bases, right? The compiler is developed to compile the code that people use that compiler to build, and so these code bases and compilers kind of grow up hand in hand — that's the pragmatism involved.

A very simple optimization, to kind of get warmed up, is constant folding. This is the process of simplifying constant expressions at compile time. Constant expressions can be simple literals like the integer 2, variables that are just not modified, or variables that are explicitly marked as constant. So here's my example again: I have INT_MIN % -1. If you compile that, you're almost definitely going to get a result of zero, which is mathematically what you would expect. If I look at this with Microsoft Visual Studio, with optimization turned off, what happens is it simply prints the value zero. There's no attempt to perform that calculation at runtime; this is basically a compile-time constant expression, so the compiler determines what the actual result of the expression is and inserts that instead of performing a runtime calculation.
So, unfortunately, if you're doing a simple test to evaluate whether your processor faults on that expression or not, you might be misled. You have to take a little bit of care when writing these sorts of tests to make sure you're actually testing the behavior you think you're testing. In this case I made the input non-deterministic, so the compiler was required to generate the actual division instruction. And of course the problem here is that remainder on Intel processors is implemented as a side effect of the division operation, and the division operation will fault for INT_MIN divided by -1, because that value is not representable in the resulting type. I normally tell people to ask questions as we go along, but this is not a question-friendly sort of environment.
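[Editor's note: a minimal sketch of the INT_MIN % -1 test being described. The exact test program isn't shown in the transcript; this is one way it might be written.]

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    /* Constant operands: the compiler may fold this to 0 at compile time,
     * so no division instruction is emitted and nothing faults. */
    printf("%d\n", INT_MIN % -1);

    /* Non-deterministic operands: the compiler has to emit a real division,
     * which faults on Intel hardware when given INT_MIN and -1. */
    if (argc > 2) {
        int a = atoi(argv[1]);
        int b = atoi(argv[2]);
        printf("%d\n", a % b);
    }
    return 0;
}
```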
Okay. So this is the fun one that we wound up reporting, that the Plan 9 developers had an issue with. C11 basically says that if you have an expression where you add or subtract an integer from a pointer, the result has the type of the pointer operand — that's ordinary pointer arithmetic — and expressions that use array notation basically translate into a pointer arithmetic expression where you add the pointer to the scaled integer and then dereference it. It's just a thin veneer over pointer arithmetic.

What C11 says is that you can form pointers to each element of the array, including the one-past-the-end element of the array, and that you can dereference all of these elements except the one past the end. This is a little interesting artifact of the C language, and it's there because C programmers used to write loops where they would increment a pointer and then test whether it was pointing to the element one past the end of the array. So when C was standardized in 1989, the committee decided to allow the creation of this pointer so they didn't break a ton of existing code. That's why it works like that. But forming a pointer beyond that element, or dereferencing the one-past-the-end pointer, is undefined behavior in the C language.

So a programmer might want to code a bounds check to avoid buffer overflows.
In this example we have ptr, which points to the start of an array; max, which points to the end of the array; and a length specified as a size_t. size_t is an unsigned type that's large enough to represent the size of the largest object that can be allocated on a system, and any time you store a size or a length it really should be in a size_t variable, and never in an int.

So then we test whether the pointer plus the length is greater than the maximum — that would indicate we're going to perform a write outside the bounds of the array. No matter which model you consider, there's a bug in this code, which is that for very large values of len, ptr + len will overflow, creating undefined behavior. Under the hardware model, programmers would expect the result to wrap around, which would produce a pointer that's actually lower in memory than ptr. So to fix the bug, an experienced programmer who has internalized this hardware-behavior model of undefined behavior might add a sub-expression to the check that tests whether ptr + len is less than ptr, which would indicate wraparound of the pointer arithmetic.
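[Editor's note: a minimal sketch of the check being discussed; the function name, parameters, and memcpy body are illustrative, assuming ptr points into a buffer and max points just past its end.]

```c
#include <stddef.h>
#include <string.h>

void copy_in(char *ptr, char *max, const char *src, size_t len) {
    /* The second test is the ordinary bounds check. The first test is the
     * "experienced programmer" fix: it tries to detect wraparound of
     * ptr + len, but ptr + len can only compare less than ptr if undefined
     * behavior (C11 6.5.6) has already occurred, so a total-license compiler
     * may delete it as dead code. */
    if (ptr + len < ptr || ptr + len > max) {
        return;  /* reject the write */
    }
    memcpy(ptr, src, len);
}
```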
And so there's a kind of term of art here where we say experienced programmers might make this mistake, and the reason we say that is that if you're trying to change the C standard, there are several arguments you can make. The first argument is: I work for Intel, and we just added a new feature to the processor, and we'd like C programmers to be able to use that new feature — evolving hardware is a good way to get the language to evolve as well. The second — maybe I'll start with the third: naive programmers tend to make this mistake a lot. Basically, the C standards committee does not care at all about naive programmers, so that argument doesn't go anywhere. The argument in the middle is: experienced programmers, such as yourselves, might make this kind of mistake — and then they'll give that some thought.

So anyway, compilers that follow the total license model can optimize out the first part of the check that we inserted with security in mind, and this is allowed because ptr plus an unsigned len can only compare less than ptr if undefined behavior has occurred — the undefined behavior described in section 6.5.6. The compiler can assume that undefined behavior never happens; consequently that test is considered dead code, and it can be removed. The actual optimization involved is simply algebraic simplification.
Optimizations can be performed on comparisons between P + v1 and P + v2, where P is the same pointer and v1 and v2 are variables of some integer type. The total license model permits that to be reduced to a comparison between v1 and v2. But if either of those values is such that the sum with P overflows, the comparison of v1 and v2 doesn't produce the same result as actually computing the sums and comparing them. The interesting generalization here is that, because of possible overflows, computer arithmetic doesn't always obey the algebraic identities of mathematics. The good news about the C language is that it is mathematically based — it's just not the same mathematics you learned in elementary school.

If we go back to our example and look at ptr + len < ptr, that's the same as ptr + len < ptr + 0. So now we have a comparison between P + v1 and P + v2, which can be simplified to a comparison between v1 and v2 — and of course len is an unsigned value, so it's impossible for len to be less than zero. So this is dead code and can be eliminated.
If you're aware of this problem, it's actually very easy to fix. If it's known that ptr is less than or equal to max — which we know in this example, because ptr points to the beginning of the array and max points to the end — you can just subtract ptr from each side, and now we have len > max - ptr. max - ptr is guaranteed not to wrap around, because we know that ptr is less than or equal to max. So this expression has to be evaluated by the compiler and cannot be optimized out.
Okay. So when we were researching this vulnerability, we posted to the GCC dev list, and thanks to the magic of the internet you can still Google all this stuff. I wound up getting called a lot of really bad names during the course of this investigation, and the one person who was not at all obnoxious — and was also helpful — was the guy who actually wrote the optimization, a guy called Ian Lance Taylor, who leads the optimization group at Google in Mountain View. One of the pieces of feedback I got from the list was: well, this optimization helps out a lot, right?
An example was: if you look at the expression buf + n < buf + 100 and compile it with this particular optimization, it will be optimized to a comparison of n < 100, eliminating the possibility of wraparound in both expressions. That's probably not a big deal — unless one expression wraps and not the other, in which case you'd get the inverse of the result you were expecting. But really, the original code example is still incorrect, because you don't want to rely on compiler optimizations to make your code correct. The right way to do this is to eliminate the undefined behavior by performing the same simplification by hand, and just reducing it to a comparison of n < 100.
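[Editor's note: a sketch of how such a comparison might arise and how it can be simplified by hand; the macro and function names here are hypothetical.]

```c
#include <stddef.h>

#define BUF_END(p, size) ((p) + (size))  /* hypothetical macro */

int fits_relying_on_optimizer(char *buf, size_t n) {
    /* Expands to buf + n < buf + 100; either side may overflow, which is
     * undefined behavior. */
    return buf + n < BUF_END(buf, 100);
}

int fits_simplified_by_hand(size_t n) {
    /* The same intent, with no pointer arithmetic and no possible overflow. */
    return n < 100;
}
```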
Many C programmers wouldn't write this expression by hand, but what tends to happen is that these expressions result from macro expansions, so it's not apparent in the source code that you're going to generate an expression of this form.

The behavior of this pointer overflow changed as of these releases of the GCC compiler, and all subsequent releases, and this was really something that happened during a maintenance release. So you get a new maintenance release of the compiler and suddenly your code stops working — and that's probably the biggest issue with undefined behavior. When the compiler encounters undefined behavior in your code, it's going to do its best to produce some code that's reasonable, but it doesn't really know what you were expecting to get, because the behavior is not defined. And the compiler is free to modify what happens to undefined behaviors on each release — they tend to focus on them for optimization work — so you can get the next release of the compiler and code that used to work will now start to fail.

This particular optimization is performed by default at -O2 and above, including optimizing for space, and is not performed at -O1 or -O0.
It can be enabled at -O1 with the -fstrict-overflow option, or disabled at -O2 with -fno-strict-overflow.

I'm going to finish a story I started a minute ago. The guy who wrote this optimization is Ian Lance Taylor, who leads the optimization group at Google. Google has a small number of applications that they run a lot, on these farms of computers, and Ian determined that this optimization would save them about half a percent in execution speed, and that their code didn't have any defects that would suffer as a result of the optimization. So by implementing this optimization and improving performance by half a percent, they could get rid of something like 180 machines off the machine floor and save two hundred and forty thousand dollars a year, or something like that. So there's a very neat argument for making this optimization — increased performance and reduced cost — but it's hard to make the corresponding argument about possible security consequences and their fiscal cost.

And "Wild West" is a good description of GCC, because it's kind of the Wild West of compilers, right? I mean, not that anyone can get out there and make a change, but a lot of people can make changes that benefit their code base and their organization
but might not be the best solution for everyone.

So when we identified this problem, everyone on the GCC list — besides calling me names — defended the optimization as allowed by the standard, and they're right: this is allowed under the total license policy. But what Ian suggested was that they add a flag that would diagnose the problem. So if your code used to work but was now going to be optimized based on this undefined behavior, then if you use the -Wstrict-overflow=n option, with n greater than or equal to 3, it will be diagnosed. And in a way that's a reasonable solution: allow the optimization, but have a diagnostic if there's code that's going to be negatively affected.
So, going through the GCC flags: -fstrict-overflow basically allows the compiler to assume strict signed-overflow rules — this total license behavior. Any signed arithmetic overflow is undefined; the compiler assumes it can't happen, and that permits aggressive optimization. For example, the compiler can assume that i + 10 > i is always true, and that assumption is valid provided i + 10 doesn't overflow. When -fstrict-overflow is in effect, any attempt to determine whether an operation might overflow has to be very carefully written so as not to actually involve the overflow. This is enabled at -O2, -O3, and -Os.
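[Editor's note: a sketch of what "carefully written" means here — testing whether an addition would overflow without ever performing the overflowing addition. The function name is illustrative.]

```c
#include <limits.h>

/* Under -fstrict-overflow a post-hoc test like `a + 10 > a` can be folded
 * away, so an overflow check has to be phrased so the overflow itself
 * never happens. */
int addition_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    }
    return a < INT_MIN - b;       /* a + b would fall below INT_MIN */
}
```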
-fwrapv instructs GCC to assume that signed arithmetic overflow of addition, subtraction, and multiplication wraps around. It's very similar to -fno-strict-overflow: the flag disables optimizations that assume integer overflow behavior is undefined, and it's required by the Java front end, which specifies that integer overflow has wraparound behavior.

-Wstrict-overflow=n is the option that lets you know when these optimizations are being made. It warns about cases where the compiler optimizes based on the assumption that signed integer overflow does not occur, and it only warns about cases where the compiler actually implements the optimization. So basically you need to compile with this flag and with -O2 — you need optimization turned on — for this flag to diagnose anything; if the optimization isn't being performed, you'll get no warning. Optimizations that assume signed overflow does not occur are safe as long as the values involved don't actually overflow.
So a lot of times these warnings can give false positives, and that's sort of a general attribute of C. You can take a long int and put it in a short, and the compiler says, sure, why not. One of the basic principles of C is "trust the programmer," so the compiler is thinking: well, that looks kind of risky, but I assume the programmer knows that the value they're storing in this long int is within the range of a short int, and so this assignment will succeed without loss of value. So these can be false positives, but the programmer has to make sure that they are.

-Wstrict-overflow=1 warns about cases which are questionable and easily avoidable — for example, x + 1 > x, which with strict overflow enabled will be simplified to 1. This level of -Wstrict-overflow is enabled by the -Wall flag,
but higher levels are not. And this is really the most unfortunate thing about all this: the optimization we looked at — the one that's going to eliminate your buffer overflow check — will not be diagnosed with just -Wall enabled, which is what a lot of developers use. So you need to know one of two things: how to identify the problem and code it correctly, or to set -Wstrict-overflow to 3 — and both of those are rather improbable unless you sat through this talk and I explained that you need to do one of them. So this is still a fairly common problem in a lot of code bases.

-Wstrict-overflow=2 warns about cases where comparisons are simplified to constants.
For example, abs(x) >= 0 generally looks like it should be true, but if you take the absolute value of INT_MIN in a two's complement representation, that will overflow back to INT_MIN, which is less than zero. So this code, which could be false, is just going to be optimized to return true.
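[Editor's note: a small sketch of the abs() case just described; the function name is illustrative.]

```c
#include <limits.h>
#include <stdlib.h>

int looks_non_negative(int x) {
    /* abs(INT_MIN) overflows in two's complement (INT_MAX + 1 is not
     * representable), so the compiler may fold this comparison to 1. */
    return abs(x) >= 0;
}
```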
-Wstrict-overflow=3 warns about cases where a comparison is simplified — for example, x + 1 > 1 simplified to x > 0 — and that's the lowest warning level that will detect the buffer overflow check problem we identified. -Wstrict-overflow=4 warns about other simplifications, for example x * 10 / 5 simplified to x * 2; in general I'd be hard-pressed to think of situations in which you didn't want that optimization performed. -Wstrict-overflow=5 also warns about cases where the compiler reduces the magnitude of a constant. For example, you might have a target on which greater-than and greater-than-or-equal comparisons cost exactly the same, but an increment operation is a little cheaper than an addition, and on those sorts of targets the compiler might change x + 2 > y to x + 1 >= y. This is only reported at the highest warning level because it can generate a large number of false positives.

So one of the things I've been involved with — I mentioned I joined the C standards committee about a decade ago,
and my plan was to change the C language to make it more secure. Of course, what mostly happened was that the C language changed me. One of the things we did succeed in doing was introducing Annex L, the analyzability annex, into the C11 standard. We wanted to call it a security annex, but you're not allowed to do that, because if you call anything "security" it sort of implies that the C language is insecure — which is true, but you're not allowed to imply it — so we had to call it the analyzability annex.

So we introduced some definitions. First of all, an out-of-bounds store is an attempt to modify a value outside the bounds of an object or, for an object declared volatile, even to access it outside its bounds. Then we defined two kinds of undefined behavior. There's bounded undefined behavior, which is undefined behavior that does not perform an out-of-bounds store, but might perform a trap or might produce an indeterminate value — and reading an indeterminate value in C is itself undefined behavior. And there's critical undefined behavior, which is undefined behavior that is not bounded. In the current version of C there are 200-plus explicitly defined undefined behaviors, and all of them are, by these definitions, critical undefined behaviors. What we did by creating these two definitions is restrict the list of critical undefined behaviors to just eight. Those eight can result in out-of-bounds writes, and the other 192-plus undefined behaviors are now bounded undefined behaviors that cannot result in writing outside the bounds of an object.

That's true if and only if Annex L is implemented by the compiler, and I don't yet know of any compilers that support Annex L — but it's presumably a good idea.
Okay, so: summary and recommendations. I'm running a little faster than usual because of the lack of questions.

This is kind of obvious, but you really want to avoid undefined behaviors in your code, even if your code appears to be working. I've had this discussion with a professor at some university: they had some code with undefined behavior, and I sent him an email saying, this code you're teaching your students has undefined behavior, and he said, no, I tested it, it works. And then I decided that this guy was someone else's problem, because you can't just do that all day long.
I think a good piece of advice is to find and eliminate dead code yourself instead of letting the compiler do it. When you write code, normally the intent is that the code does something; you don't write code where the intent is for it not to do anything. So code that isn't doing anything usually indicates there's a logic error. There are times when people write dead code on purpose, though, and a good example is the default clause in a switch. You'll have an enumeration of types, you'll have a case for each enumerator, and then you'll write a default clause. In well-written code that default clause is dead code, because it's never going to be executed, and the compiler will optimize it out. But it's there in case a maintainer adds another value to the enum and fails to add the corresponding case to the switch — the default clause might then detect that error at runtime. So it's basically a defensive programming strategy, and not bad code to have. The fact that that code gets reported as dead code is okay: it's dead code that was intentionally written, as opposed to dead code that was unintentionally written.
When you look at all these optimizations, some of them eliminate undefined behavior and some of them might introduce vulnerabilities. The somewhat surprising conclusion I have — because when you get to this part, a lot of security people make the wrong decision: they tend to be very dogmatic and say turn off anything that might ever cause a problem — is that, generally speaking, optimizations do more good than harm. They actually eliminate a lot of undefined behaviors from your code, and they also make your code run faster, which is cool. And with that additional performance you might be able to afford some more security checks, so gaining performance is always a positive. My recommendation is: go ahead and build with these optimizations, but use the -Wstrict-overflow flag to diagnose them and see whether the compiler is making assumptions that are inconsistent with your assumptions. If it is, what that means is that your assumptions were wrong, and you need to revisit your code. And my last piece of advice is: tell your compiler vendor to implement Annex L, and then use it.
So, in summary: the C standard is a contract between compiler writers and programmers. The contract is constantly amended, but really only compiler writers show up to the meetings, so it's sort of like getting divorced from your spouse and saying, no, you go ahead and take care of the agreement, I trust you not to screw me over — and that doesn't usually work. Unless more security-conscious groups participate in the process, the overall tendency is for the standard to get worse from a security perspective. We'd like to go to bed thinking things are getting better, but actually they're getting worse: compiler writers like to eliminate guarantees from the standard, and the more guarantees they eliminate, the more room they have for further optimization. So that's the overall trend — and the trend is not towards greater security and predictability of execution.

Okay, that's all I have. This is how you get in touch with me if you have additional questions. What do I do now — do I take questions for a bit? Yep. [Audience question.] I haven't heard about it; I haven't seen it.
So this gentleman was claiming that the Microsoft Visual Studio compiler adds telemetry information to the code it compiles, and if you want to turn that off you have to find a flag. Yeah — so I'm going to give you my strict-interpretation response to that, which is that I would say that would be allowed if there were undefined behavior in your code. But send me an email on that and I'll look into it; it's interesting. Any other questions?
[Audience question.] Yeah — so the question is about using the overflow warning levels versus separate flags. The levels are — I want to use the word "cumulative," I'm not sure that's the right word — so if you set it to level five, you get everything from all five levels. Right, it could be set to five. This is not entirely scientific, but basically the levels are meant to reflect a decreasing chance of a false positive: at level one it's likely that a warning is a true positive, and as you increase the levels you're going to get more false positives. So it's a trick to reduce the number of diagnostics the programmer sees, because programmers are prone to ignore warning messages, and if you give them too many warning messages they might simply turn the warnings off. My recommendation with the levels would be to start with level three and try to address those warnings; then you might try setting it to four and see what happens, and if it looks too intimidating you can back off to three again. But I would at least look at level three.
[Audience comment.] Yeah, that's true, but — there are a lot of things in life that are unfortunate. The parallel I'm thinking of is that I started playing guitar when I was five, and so I got a $70 guitar, which is impossible to play — but you're not going to give a five-year-old a Fender. So the amateurs kind of have the worst tools, in a way.
Okay, any other questions? There were some more — yeah, there. [Audience question.] Well, there will be some additional runtime overhead — I don't know if I mentioned it. That's the greatest concern: the runtime overhead. That list of critical undefined behaviors was very carefully negotiated over several years, with a lot of back and forth, and it sort of surprised me how much pain went into it given that no one has implemented the annex yet.
So I'll give you a quick example. Say you have a Boolean and an array with two elements, and you use the Boolean as an index into the array. The problem with that is that a Boolean is typically at least a byte, so there are a lot of values it can represent besides zero and one. A compiler writer might just take that value, without checking it, and index the array — and if the value is something other than zero or one, you have an out-of-bounds write, and that sort of behavior would be disallowed under Annex L. So now there are additional runtime checks and so forth that you'd need to do, to make sure that Boolean is actually true or false and doesn't hold some different value.
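[Editor's note: a sketch of the Boolean-as-index example from this answer; the function is illustrative.]

```c
#include <stdbool.h>

void tally(bool b, unsigned counts[2]) {
    /* A _Bool is at least a byte wide. If its representation somehow holds a
     * value other than 0 or 1 (e.g. it was written through a char pointer),
     * indexing without a check turns this increment into an out-of-bounds
     * store -- exactly what Annex L's bounded undefined behavior is meant to
     * rule out, at the cost of extra runtime checks. */
    counts[b]++;
}
```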
I don't have numbers, because it hasn't been implemented — we can't run the SPEC benchmarks and come up with a comparison. Okay, sir?
[Audience comment.] Yeah, no, I agree with that — a lot of obfuscation techniques insert code that you don't need, and optimizers remove code that you don't need, so it's sort of the inverse process.
[Audience question.] Yeah — well, if you're depending on that... In the C language, unsigned integer wraparound is well-defined behavior — it's required to wrap around — while signed integer overflow is undefined behavior, so it can be treated like we discussed. And something I glossed over in the talk a little bit, which is kind of odd, is that the -fstrict-overflow flag and a lot of those flags deal with signed integer overflow, but the actual undefined behavior in the buffer overflow check example was not integer overflow: it was the undefined behavior that results from adding an integer to a pointer and the result being outside the bounds of the array. It was sort of thrown into the same category as integer overflow because most programmers don't separate out those concepts, so it's a little bit confusing. Even Ian Lance Taylor at Google kind of blurred the correctness of this by folding that diagnostic into the integer overflow flag.

Okay, any questions down this side? I haven't been looking that way so much — I guess it's because you guys are still hung over. Let me look back there. Sir?
[Audience question.] I don't know of any current efforts to implement it; maybe someone's working on it and keeping it below the radar. Any other questions? I guess I should repeat the questions so people can hear: you wanted to know if anyone is implementing Annex L. Okay — no further questions, so I'm starting to feel like I'm done.

Yeah, okay. Thank you.