Dangerous Optimizations and the Loss of Causality


Formal Metadata

Dangerous Optimizations and the Loss of Causality
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Increasingly, compiler writers are taking advantage of undefined behaviors in the C and C++ programming languages to improve optimizations. Frequently, these optimizations are interfering with the ability of developers to perform cause-effect analysis on their source code, that is, analyzing the dependence of downstream results on prior results. Consequently, these optimizations are eliminating causality in software and are increasing the probability of software faults, defects, and vulnerabilities. This presentation describes some common optimizations, describes how these can lead to software vulnerabilities, and identifies applicable and practical mitigation strategies.
Good morning everyone. Sorry, you may not be able to hear me very well after last night — anyway, a pretty funny show last night, thanks everyone for coming out, really great to see so many people up so early this morning. Our first talk is Robert Seacord, talking about dangerous optimizations and the loss of causality.
I guess we're recording this on the cameras this morning — yeah, I guess so. Anyway. So, the
premise of my talk is that, increasingly, compiler writers are taking advantage of undefined behaviors in the C and C++ languages to improve optimizations. These optimizations can interfere with the ability of programmers to really understand the behavior of the code — and not only human analysts: static analysis tools and so forth can also have trouble predicting the behaviors of the different compiler implementations. This inability of programmers and tools to understand the behavior of the code can lead to increases in faults and defects, and presumably some percentage of those are vulnerabilities. So
this work was originally published as a vulnerability note some years back, and it was actually a response to some Plan 9 developers at Bell Labs who discovered that their code, which had been working, had stopped working. We investigated, and I wound up publishing this vulnerability note. So I'm
going to start by talking about undefined behaviors. But first, a little
bit about me: as the slide says, I'm a principal security consultant at NCC Group, and I've been participating on the C Standards Committee for a little over a decade now. So if you're wondering who those idiots are who define the C language and do all these stupid things — come tell me what you like least about the language. One of the things I like least is the definition of undefined behavior, because the standard defines it overly broadly: it means one of three very different things. First, it gives license to compiler writers not to catch certain program errors that are difficult to diagnose. The C standard is written by C compiler writers for C compiler writers, and they're very friendly to each other, so there's a pattern of saying: if it's hard, you don't have to do it. Second, it avoids defining obscure corner cases which would favor one implementation strategy over another. My running example of this: mathematically, INT_MIN % -1 should produce the result 0, but if you run it on an Intel processor you'll get a fault — so that's considered undefined behavior, and implementations that compile for Intel platforms don't have to emit a special case to check for that condition. That gives you better performance, but it means the programmer is now responsible for guaranteeing that the code doesn't generate a fault. The third case is to identify areas of possible conforming language extension: places where the C standard doesn't specify some behavior, so that an implementation can provide some implementation-specific behavior. A good example of this is fopen — there are a
few mode strings with defined characters, and any other character besides the ones listed is undefined behavior. If you provide such a character you're unlikely to experience a crash of any kind — it's really so that different implementations can add different modes that can be used to open a file. So, regardless of why a behavior is classified as undefined, an implementation can completely ignore the undefined behavior, with unpredictable results. The thing we say most often on the C standards committee is that the compiler could go off and play the Game of Life at that point, and that's perfectly within the bounds of the standard. Or it can behave in a documented manner characteristic of the environment — for example, if you do INT_MIN % -1 on an Intel chip, characteristically that environment results in a fault. Or it can terminate the translation or execution with a diagnostic.
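As a sketch of the mitigation the standard leaves to the programmer, a guarded remainder can rule out the one operand pair that traps on Intel hardware. The function name `safe_rem` and its error-reporting convention are mine, not from the talk:

```c
#include <limits.h>

/* Remainder that avoids the two undefined cases:
 * division by zero, and INT_MIN % -1 (which traps on x86
 * because the companion quotient INT_MIN / -1 is not
 * representable in int).
 * Returns 1 on success with the remainder in *result,
 * 0 when no result exists. */
int safe_rem(int a, int b, int *result)
{
    if (b == 0) {
        return 0;            /* division by zero: no result */
    }
    if (a == INT_MIN && b == -1) {
        *result = 0;         /* mathematically INT_MIN % -1 == 0 */
        return 1;
    }
    *result = a % b;         /* safe: neither undefined case applies */
    return 1;
}
```

With this guard the programmer, rather than the hardware, supplies the mathematically expected result for the corner case.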
So the basic design of an optimizer for a C compiler is the same as for any other procedural language: the fundamental principle is that you want to replace computations with more efficient methods that produce the same result. And then I have this very, very over-simplistic description which says optimizations that eliminate undefined behaviors are good, and optimizations that introduce faults are bad. Pretty basic.
So the C standard has something known as the "as-if" rule. The standard specifies results on an abstract machine, which is very detailed, but it doesn't specify the methods that implementations must follow. In the abstract machine everything is evaluated as specified by the semantics, but actual implementations can choose any method they want that produces the same results. So this
gives compilers the leeway to remove code that is deemed unused or unnecessary when building a program. One of the peculiarities of this industry is that code added with security in mind is the code most frequently removed by the compiler, and the reason for that is that most security checks involve undefined behavior: while testing for undefined behavior, it is very easy to invoke undefined behavior, and once you invoke undefined behavior the code can be optimized out. So there are three
general strategies compiler writers follow to generate code. The first is hardware behavior, in which the compiler just generates the corresponding assembly code and you let the hardware do whatever the hardware does — for a long time that was really how C worked, so old C programmers are very familiar with that type of implementation strategy. The second is super-debugging: here you basically try to trap as many undefined behaviors as you can, to assist in testing, but that type of policy really degrades performance, so it's not useful as a runtime protection mechanism. The third is total license: here you treat any possible undefined behavior as a "can't happen" condition, and that permits very aggressive optimizations — that's mostly what I'll talk about. If you look at a particular compiler, what you tend to find is that compiler writers don't really have strong belief systems. Loops are a good place to look for optimizations — optimizer writers really focus in on loops, because it's a good place to make performance gains — and you can look at a loop that basically implements a hardware model, and next to it will be a loop that looks almost identical but implements a total-license model. So compiler writers tend to be very pragmatic: they do whatever they want, whenever they want, without necessarily a strong underlying philosophy. And the philosophy, such as it is, is that these compilers tend to grow up with different codebases, right — a compiler is developed to compile the code of the people who use that compiler to build, so the codebases and the compilers grow hand in hand. That's the pragmatism involved.
So a very simple optimization is constant folding: the process of simplifying constant expressions at compile time. Constant expressions can be simple literals like the integer 2, variables that are just not modified, or variables that are explicitly marked constant. Here's my example again: I have INT_MIN % -1, and if you compile that you're almost definitely going to get the result 0, which is mathematically what you would expect. So if I look at
this with Microsoft Visual Studio, with optimization turned off, what happens is it simply prints the value 0 — there's no attempt to perform that calculation at runtime. This is
basically a compile-time constant expression: the compiler determines what the actual result of the expression is, and prints that instead of performing a runtime calculation. So you
know, unfortunately, if you're doing a simple test to evaluate whether your processor faults on that expression, you might be misled — you have to take a little care when writing this sort of test, to make sure you're actually testing the behavior you think you're testing. In this case I made the input non-deterministic, so the compiler was required to generate the actual division instruction. And of course the problem here is that remainder on Intel processors is implemented as a side effect of the division operation, and the division operation will fault for INT_MIN divided by -1, because that value is not representable in the resulting type. Normally I tell people to ask questions as we go, but this is not a question-friendly sort of environment. So this
is the one from the vulnerability note — the one the Plan 9 developers had an issue with. C11 basically says that if you have an expression where you add or subtract an integer from a pointer, the result has the type of the pointer operand. That's pointer arithmetic in general, and expressions where you use array notation basically translate into a pointer arithmetic expression — a pointer plus the integer scaled by the size of the pointed-to type, and then a dereference. It's just shorthand for pointer arithmetic. So what C
11 says is that you cannot form arbitrary pointers: you can form pointers to each element of the array, including the one-too-far element — one past the end of the array — and you can dereference all of these elements except the one-too-far element. That's a little interesting artifact of the C language, and it came about because C programmers used to write loops where they would increment the pointer and then test whether it pointed one element past the end of the array. So when C was standardized in 1989, the committee decided to allow the creation of this pointer, so they didn't break existing code. That's why it works like that — but forming a pointer beyond that point, or dereferencing the one-past-the-end pointer, is undefined behavior in the C language. So a programmer might
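The loop idiom the committee chose to keep legal might look like this sketch (identifiers are mine):

```c
#include <stddef.h>

/* Sum an array using the classic pointer-increment loop.
 * Forming a + n (one past the end) is legal; dereferencing
 * it would be undefined behavior, so the loop stops first. */
int sum(const int *a, size_t n)
{
    int total = 0;
    const int *end = a + n;          /* one-past-the-end pointer: allowed */
    for (const int *p = a; p != end; ++p) {
        total += *p;                 /* never dereferences `end` */
    }
    return total;
}
```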
want to code a bounds check to avoid buffer overflows. In this example we have ptr, which points to the start of an array, and max, which points to the end of the array, and we have a length, specified as a size_t. size_t is an unsigned type that's large enough to represent the largest object that can be allocated on the system, and any time you store a size or a length it really should be in a size_t variable — it should never be an int. So we test whether the pointer plus the length is greater than max, which would indicate we're going to perform a write outside the bounds of the array. Now, no matter which model you consider, there's a bug in this code: for very large values of len, ptr + len will overflow, creating undefined behavior. Under the hardware model, programmers would expect the result to wrap around, producing a pointer to memory which is actually lower in memory than ptr. So to fix
the bug, an experienced programmer who has internalized the hardware behavior — rather than the undefined behavior — might write a check that adds a subexpression testing whether ptr + len is less than ptr, which would indicate wraparound in the pointer arithmetic. And there's a kind of subtlety in saying "experienced programmers" might have this problem. If you're trying to change the C standard, there are several arguments you can make. The first argument is: "I work for Intel, we just added a new feature to the processor, and we'd like C programmers to be able to use the new feature" — evolving hardware is a good way to get the language to evolve as well. The worst argument is: "naive programmers tend to make this mistake a lot" — the C standards committee does not care at all about naive programmers, so that argument doesn't go anywhere at all. The argument in the middle is: "experienced programmers, such as yourselves, ahem, might make this kind of mistake" — and then they'll give it some thought. So, anyway: compilers that follow the total-license model can optimize out the first part of this check — the part we added with security in mind. This is allowed because, for ptr plus an unsigned len to compare less than ptr, the compiler is fully aware that undefined behavior has to occur — namely the undefined behavior
described in section 6.5.6 of the standard. So the compiler can assume
that undefined behavior never happens; consequently that test is considered dead code and can be removed.
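Sketched as a function (names are mine, not from the slides), the well-intentioned check looks like this; under total-license optimization the first clause can be folded away, since it can only be true after undefined pointer overflow:

```c
#include <stddef.h>

/* The flawed bounds check from the talk. `ptr + len < ptr`
 * is meant to catch wraparound, but pointer overflow is
 * undefined behavior, so a total-license compiler may
 * delete that clause entirely.
 * Returns 1 if a write of len bytes would be out of bounds. */
int bounds_check_flawed(const char *ptr, size_t len, const char *max)
{
    if (ptr + len < ptr ||   /* wraparound test: may be optimized out */
        ptr + len > max) {   /* ordinary bounds test */
        return 1;
    }
    return 0;
}
```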
So the actual optimization involved is simply algebraic simplification. The optimization can be performed for comparisons between P + V1 and P + V2, where P is the same pointer and V1 and V2 are variables of some integer type. The total-license model permits this to be reduced to a comparison between V1 and V2. But if either of those values is such that the sum with P overflows, the comparison of V1 and V2 does not necessarily produce the same result as actually computing the sums and comparing them. The interesting generalization is that, because of possible overflows, computer arithmetic doesn't always obey the algebraic identities of mathematics. The good news is that the C language is mathematically based — it's just not the same mathematics you learned in elementary school.
So if we go back to our example: ptr + len < ptr is the same as ptr + len < ptr + 0. Now we've got a comparison between P + V1 and P + V2, which can be simplified to a comparison between V1 and V2 — and because len is an unsigned value, it's impossible for len to be less than 0, so this is dead code and can be eliminated.
If you are aware of this problem, it's actually very easy to fix. If it's known that ptr is less than or equal to max — which we know in this example, because ptr points to the beginning of the array and max points to the end — you can just subtract ptr from each side, and now we have len > max - ptr. max - ptr is guaranteed not to wrap around, because we know ptr is less than or equal to max, so this expression has to be evaluated by the compiler and cannot be optimized out.
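The corrected check from the talk, sketched as a function (naming is mine):

```c
#include <stddef.h>

/* Corrected bounds check: subtracting ptr from both sides
 * of the comparison removes the pointer addition that could
 * overflow, so nothing here is undefined and nothing can be
 * folded away. Precondition: ptr <= max, both pointing into
 * the same array. */
int bounds_check_fixed(const char *ptr, size_t len, const char *max)
{
    return len > (size_t)(max - ptr);
}
```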
So when we were researching this vulnerability, we posted to the GCC development list, and — thanks to the magic of the Internet — you can still Google all this stuff; I wound up getting called a lot of really bad names over the course of this investigation. The one person who was not at all obnoxious, and was actually helpful, was the guy who wrote the optimization: Ian Lance Taylor, who leads the optimization group at Google in Mountain View. One of the pieces of feedback I got from the list was that this optimization helps out a lot, right? An example: if you look at the expression buf + n < buf + 100 and compile it with this particular optimization, it will be optimized to a comparison of n < 100, eliminating the possibility of wraparound in both expressions. And that's probably not a big deal — unless one expression wraps and not the other, in which case you get the inverse of the result you were expecting.
But really, that original code example is still incorrect, because you don't want to rely on compiler optimizations to make your code correct. The right way to do this is to eliminate the undefined behavior by performing the same optimization by hand — just reducing it to a comparison of n < 100. Now, many C
programmers wouldn't write this expression directly, but what tends to happen is that these can result from macro expansion — so it's not apparent in the source code that you're going to generate an expression of this form. So the
behavior of this pointer overflow changed as of these releases of the GCC compiler, and all subsequent releases — and this was really something that happened during a maintenance release, right? You get a new minor release of the compiler and suddenly your code stops working, and that's probably the biggest issue with undefined behavior. When the compiler encounters undefined behavior in your code, it's going to do its best to produce some code that's reasonable, but it doesn't really know what you were expecting to get, because the behavior is not defined. And because the compiler is free to modify what happens on each release — and optimizer writers tend to focus on exactly these opportunities for optimization work — you can get the next release of the compiler and code that used to work now starts to fail. So this particular optimization is
performed by default at -O2 and above, including -Os (optimizing for space), and not performed at -O1 or -O0. It can be enabled at -O1 with the -fstrict-overflow option, or disabled at -O2 with -fno-strict-overflow. Now let me finish the story I started a minute ago. The guy who wrote this optimization, Ian Lance Taylor, leads the optimization group at Google, and Google has a small number of applications that they run a lot, on these farms of computers. They determined that this optimization would save them about half a percent of execution speed, and that their code didn't have any defects that would suffer as a result of the optimization. So by implementing this optimization and improving performance by half a percent, they could get rid of maybe 180 machines off the machine floor and save something like 240 thousand dollars a year. So there's a very neat argument for making this optimization — increased performance, decreased cost — but it's much harder to make the argument about possible security consequences and the fiscal cost of those. It's also, you know — GCC is kind of the Wild West of compilers, right? Anyone can get out there and make a change, and whoever wins that lottery can make changes that benefit their code base and their organization, but might not be the best solution for everyone. So when we identified this problem, everyone on the GCC list, besides calling me names, defended this optimization as allowed by the standard — and they're right, it is allowed under the total-license policy. But what Ian suggested was that they add a flag that would diagnose the problem: if your code used to work, but now wasn't going to work because it was going to be optimized based on this undefined
behavior, then if you use the option -Wstrict-overflow=n, where n is greater than or equal to 3, it would now be diagnosed. In a way that's a reasonable solution: allow the optimization, but issue a diagnostic if code is going to be negatively affected. So, going through the
full set of GCC flags: -fstrict-overflow basically allows the compiler to assume strict signed-overflow rules — the total-license behavior. Any signed arithmetic overflow is undefined, and the compiler assumes it can't happen, which permits aggressive optimization. For example, the compiler can assume that i + 10 > i is always true; that assumption is valid provided i + 10 doesn't overflow. So when
-fstrict-overflow is in effect, any attempt to determine whether an operation might overflow has to be written very carefully, so as not to actually invoke the overflow. This is enabled at -O2 and -O3. -fwrapv tells GCC to assume that signed overflow of addition, subtraction, and multiplication wraps around using two's-complement representation. -fno-strict-overflow is the flag that disables optimizations that assume integer overflow is undefined; this option is required by the Java front end, which specifies that integer overflow has wraparound behavior. And -Wstrict-overflow= is the option that lets you know when these optimizations are being made: it warns about cases where the compiler optimizes based on the assumption that signed integer overflow does not occur. It only warns about cases where the compiler actually implements the optimization — so you basically need to compile with this flag and with -O2; you need optimization turned on for this flag to diagnose anything, and if the optimization isn't being performed you'll get no warning. Now, these warnings can give false positives, and this is sort of a general attribute of C.
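Coming back to the "has to be written very carefully" point: here is a sketch of an overflow test that survives -fstrict-overflow, next to one that doesn't (identifiers are mine; compiling with something like `gcc -O2 -Wstrict-overflow=3` is what would surface the diagnostic for the first form):

```c
#include <limits.h>

/* Post-hoc test: under -fstrict-overflow the compiler may
 * fold this to `return 1;`, since i + 10 can only fail to
 * exceed i after undefined signed overflow. */
int fits_after(int i)
{
    return i + 10 > i;
}

/* Careful test: checks against the limit before doing the
 * arithmetic, so no signed overflow is ever evaluated and
 * the test cannot be folded away. */
int can_add_10(int i)
{
    return i <= INT_MAX - 10;
}
```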
Right — I mean, you can assign a long int into a short int and the compiler says, sure, why not: one of the basic principles of C is "trust the programmer". The compiler thinks that assignment looks kind of risky, but assumes the programmer knows that the value stored in the long int is within the range of the short, so the assignment will succeed without loss of value. So these warnings can be false positives that the programmer has to sort out. At -Wstrict-overflow=1, it warns about cases which are questionable and easily avoidable — for example, x + 1 > x, which with -fstrict-overflow enabled will be simplified to 1. This level is enabled by -Wall, but the higher levels are not. And this is really the most unfortunate thing here: the optimization we looked at, which can eliminate your buffer-overflow check, will not be diagnosed with -Wall enabled — which is what a lot of developers use. So you need to know one of two things: how to identify the problem and code it correctly up front, or how to set -Wstrict-overflow=3 — and both of those are rather improbable unless you sat through this talk and I explained that you need to do one of those things. So this is still a fairly common problem in real code bases. -Wstrict-overflow=2 warns about cases where comparisons are simplified to a constant — for example, abs(x) >= 0. That generally looks like it should be true, but if you take the absolute value of INT_MIN in two's-complement representation, it overflows: the result is not representable, so the mathematical result would be less than the expression can express. So this code, which could be false, gets optimized to return true.
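The abs(x) >= 0 pitfall can be sidestepped the same way — check for INT_MIN before taking the magnitude. This is a sketch; `safe_abs` and its convention are my own names:

```c
#include <limits.h>
#include <stdlib.h>

/* abs(INT_MIN) is undefined in two's complement, because
 * -INT_MIN is not representable in int. Report that case
 * instead of evaluating it.
 * Returns 1 on success with the magnitude in *out, else 0. */
int safe_abs(int x, int *out)
{
    if (x == INT_MIN) {
        return 0;            /* magnitude not representable */
    }
    *out = abs(x);           /* safe for every other value */
    return 1;
}
```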
-Wstrict-overflow=3 warns about cases where a comparison is simplified; for example, x + 1 > 1 is simplified to x > 0. That's the lowest warning level that will detect the buffer overflow check problem we identified. -Wstrict-overflow=4 warns about other simplifications; for example, x * 10 / 5 is simplified to x * 2. In general, I'd be hard pressed to think of a situation in which you didn't want that optimization to be performed.
-Wstrict-overflow=5 also warns about cases where the compiler reduces the magnitude of a constant. For example, you might have a target on which greater-than and greater-than-or-equal comparisons cost exactly the same, but an increment operation is cheaper than an addition; on those sorts of targets the compiler might change x + 2 > y into x + 1 >= y. This is only reported at the highest warning level because it can generate a large number of false positives.
One of the things I've been involved with, as I mentioned, is the C standards committee; I joined about a decade ago, and my plan was to change the C language to make it more secure, but of course what mostly happened was that the language changed me. One thing we did succeed in doing was introducing the analyzability annex, Annex L, into the C11 standard. We wanted to call it the security annex, but you're not allowed to do that, because if you call anything "security" it sort of implies that the C language is insecure, and while that's true, you're not allowed to imply it. So we had to call it the analyzability annex.
We introduced some definitions. First, an out-of-bounds store is an attempt to modify a value outside the bounds of an object, or to access an object declared volatile. Then we defined two kinds of undefined behavior. Bounded undefined behavior is undefined behavior that does not perform an out-of-bounds store, but that might perform a trap or produce an indeterminate value; and if you read an indeterminate value, that's undefined behavior too. Critical undefined behavior is undefined behavior that is not bounded. In the current version of C there are 200-plus explicitly identified undefined behaviors, and by these definitions, without the annex, all of them are critical undefined behaviors.
What we did by creating these two definitions is restrict the list of critical undefined behaviors to just eight. Those eight can result in an out-of-bounds store; the other 190-plus undefined behaviors are all bounded undefined behaviors, which cannot result in writing outside the bounds of an object. And that guarantee holds if and only if Annex L is in effect.
That is, it holds only if Annex L is implemented by the compiler, and I don't yet know of any compiler that supports Annex L. But it's presumably a good idea.
OK, so, summary and recommendations. I'm running a little fast, unusually, because of the lack of questions.
This is kind of obvious, but you really want to avoid undefined behavior in your code, even if your code appears to be working. I had this discussion with a professor at a university: they had some code with undefined behavior in it, and I sent them email saying, "This code you're teaching has undefined behavior," and he said, "Oh no, I tested it, it works OK." At that point I decided this guy was someone else's problem, because you just can't do that all day long.

A good piece of advice is to find and eliminate dead code yourself rather than letting the compiler do it. When you write code, normally the intent is that the code does something; you don't write code with the intent that it do nothing. So code that does nothing usually indicates a logic error. Sometimes, though, people write dead code on purpose, and a good example is the default clause of a switch. You'll have an enumeration of types, a case for each enumerator, and then you write a default clause. In well-written code that default clause is dead code, because it's never going to be executed, and the compiler will optimize it out. But if a maintainer adds another value to the enum and fails to add the corresponding case, the default clause can then detect that error at runtime. It's basically a defensive programming strategy, not bad code: dead code that was intentionally written is fine, as opposed to dead code that was unintentionally written.

So when you look at all these optimizations, some of them eliminate undefined behavior and some of them might introduce vulnerabilities, and that leads me to a somewhat surprising conclusion.
When you get to this point, a lot of security people make the wrong decision: they tend to be very dogmatic and say, "Turn off anything that might ever cause a problem." But generally speaking, optimizations do more good than harm. They eliminate a lot of undefined behavior from your code, and they make your code run faster, which is cool; with the additional performance you might be able to afford some more security checks, so a gain in performance is always a positive. My recommendation is: go ahead and build with these optimizations, but use the strict overflow warning flag to diagnose them and see whether the compiler is making assumptions that are inconsistent with yours. If it is, that means your assumptions were wrong, and you need to revisit your code. My last piece of advice is: tell your compiler vendor to implement Annex L, and then use it.
So, in summary: the C standard is a contract between compiler writers and programmers. The contract is constantly amended, but really only compiler writers show up to the meetings, so it's sort of like getting divorced from your spouse and saying, "No, you go ahead and take care of the agreement; I trust you not to screw me over." That doesn't usually work. Unless more security-conscious groups participate in the process, the overall tendency is for the standard to get worse from a security perspective. We like to go to bed thinking things are getting better, but they're actually getting worse: compiler writers like to eliminate guarantees from the standard, and the more guarantees they eliminate, the more room they have for further optimization. That's the overall trend, and the trend is not towards greater security and predictability of execution. OK, that's all I have; this is how you get in touch with me if you have additional questions, and I'll take questions now.

[Audience question.] So, this gentleman was claiming that the Microsoft compiler adds telemetry information to the code it compiles, and if you want to turn that off you have to find a flag. What do I think of that? I'll give you my strict-interpretation response, which is: I would say that would be allowed if there were undefined behavior in your code. But I'll look into that; it's interesting. Any other questions?
[Audience question.] The question is about using the overflow warning levels versus separate flags. The levels are, I want to say, cumulative, if that's the right word: if you set level 5, you get everything from all five levels. It's not entirely scientific, but basically the levels are meant to represent a decreasing chance of true positives: at level 1 it's likely that a warning is a true positive, and as you increase the level you're going to get more false positives. It's a trick to reduce the number of diagnostics the programmer sees, because programmers are prone to ignoring warning messages, and if you give them too many they might simply turn the warnings off. My recommendation with the levels would be: start with level 3 and try to address those warnings; then you might try setting it to 4 and see what happens, and if it looks too intimidating you can back off to 3 again. But I would at least look at level 3.
[Audience comment.] Yeah, that's true, and it's unfortunate. The parallel I think of is: I started playing the guitar when I was five, so I got a 70-dollar guitar, which was nearly impossible to play. The amateurs kind of have the worst tools, in a way. OK, any other questions?

[Audience question about Annex L overhead.] Would there be additional runtime overhead? That is the greatest concern with Annex L, the runtime overhead. The list of critical undefined behaviors was very carefully negotiated over several years, with a lot of back and forth, and it sort of surprised me how much pain went into it given that no one has implemented the annex. Here's a quick example: say you have a Boolean and an array of two elements, and you use the Boolean as an index into the array. The problem is that a Boolean is typically at least a byte, so there are a lot of values it can represent besides 0 and 1. The compiler writer might just take that value, without checking, and index the array; if the value is something other than 0 or 1, you have an out-of-bounds access, and that sort of behavior would be disallowed under Annex L. So there are some additional runtime checks you'd need, to make sure the Boolean is actually true or false and doesn't have some other value. I don't know how much overhead, because it hasn't been implemented; we can't run the SPEC benchmarks and come up with a comparison.

[Audience comment.] Yeah, I agree with that. What about obfuscation techniques, where code you don't actually need is inserted and the optimizer removes it? That's sort of the inverse of the process we discussed.
[Audience question.] Well, yes, if you're depending on that. In the C language, unsigned integer wraparound is well-defined behavior: it's required to wrap around. Signed integer overflow is undefined behavior, so it can be treated the way we discussed. Something I glossed over in the talk, which is kind of odd: the strict overflow flags largely deal with the behavior of signed integer overflow, but the actual undefined behavior in the wrapped-pointer buffer overflow example was not integer overflow; it was the undefined behavior that results from adding a pointer and an integer and ending up outside the bounds of the array. It got thrown into the same category as integer overflow because most programmers don't separate out those concepts, so it's a little confusing. Even Ian Lance Taylor at Google kind of blurred the distinction by combining it into the integer overflow flag discussion. Any questions on this side? The reason I'm looking that way so much is because you guys are still hung over.

[Audience question.] I should repeat the question so people can hear: the question was whether anyone is implementing Annex L. I don't know of any current efforts to implement it; maybe someone is working on it and keeping it below the radar. If there are no further questions, I guess I'm done.