
Trends in Open Source Security


Formal Metadata

Title: Trends in Open Source Security
Number of Parts: 90
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
When dealing with free and open-source software, we have to work together with reporters, upstream developers and other distributions to protect end users from security threats. For distributions, it is a challenge to deal with a huge collection of software packages, both internally and externally developed, employing many different development procedures. This talk looks at best practices which emerged for vulnerability tracking. Tracking already reported vulnerabilities is only one aspect, however. We discuss tool-chain based hardening features (which can sometimes turn vulnerabilities exploitable for code execution into mere crashers), some remaining low-hanging fruits in this area, and more radical approaches for avoiding low-level vulnerabilities related to memory safety. Some of the APIs we provide are difficult to use, and we look at ways to detect API misuse statically, across an entire distribution.
Transcript: English (auto-generated)
I'm going to talk about three main areas: vulnerability tracking, how we can, as a large distribution of upstream software, track vulnerabilities in the upstream software we ship; then toolchain hardening; and, in the final part, distribution-wide static analysis. So vulnerability tracking these days, for all major Linux distributions, is CVE-based. CVE, Common Vulnerabilities and Exposures, is a project started by MITRE. And it aims to assign a unique identifier
to every publicly disclosed vulnerability. I've put up an example for such an identifier. Basically, it's just CVE dash the year dash increasing
number within that year. The CVE project doesn't create a taxonomy. It's more like bird ringing in the sense that ornithologists put on little rings with numbers on the legs of birds so they can track them.
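The identifier format just described (CVE, then the year, then an increasing number within that year) is simple enough to parse mechanically. A minimal sketch in C; `parse_cve` is a hypothetical helper name, not part of any real library:

```c
#include <stdio.h>

/* Hypothetical helper: parse "CVE-YYYY-NNNN" into its year and the
 * increasing sequence number within that year.  Returns 1 on success,
 * 0 on malformed input.  (Toy sketch; it does no further validation.) */
int parse_cve(const char *id, int *year, long *seq)
{
    char tail;
    /* The trailing %c must NOT match: anything after the number is garbage. */
    if (sscanf(id, "CVE-%4d-%ld%c", year, seq, &tail) != 2)
        return 0;
    return *year >= 1999;   /* CVE assignments started in 1999 */
}
```

Note that the sequence number is parsed as a plain decimal, so "0156" yields 156; identifiers are compared numerically, not textually.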
CVE is aimed at identifying vulnerabilities in concrete software products, in concrete versions, not about general vulnerability classes like buffer overflow exploits or similar. What happens these days is the CVE assignment
alerts distributions because anybody can download the CVE database from MITRE's website and you can see, OK, compared to yesterday, there have been these identifiers assigned. And then you can start processing these new vulnerabilities.
This works very well for public issues. And we actually have a mailing list run by Openwall, called oss-security, where Red Hat's Kurt Seifried watches and assigns CVE identifiers for requests that come in for assignments, so that vulnerabilities that haven't received a CVE identifier yet get one. There are also many identifiers assigned by vendors who have their own pools, so they can put a CVE identifier on an issue when they release a security advisory, without coordinating with anyone. These CVE identifiers often have less than clear public descriptions, so we don't really know what they are about. But for the identifiers that are assigned through the open process, Kurt is very diligent in making sure that the vulnerabilities are precisely described
so that we actually can use these identifiers later on. So how do we use these identifiers? We have to put the identifier in relation to the software we ship. The first aspect is the package.
We have to identify which package is affected. And for each branch we ship, we can note a minimum fixed version when we fix the package. This is an approach that is based
on version-based vulnerability tracking. First, I wanted to present it as the ultimate approach to vulnerability tracking, but it turns out that I have forgotten that it's really, really complicated in some corner cases to make this work.
So Debian is doing this for the unstable and stable distributions, and we have learned over the years that it is really complicated to cover all eventualities.
But it works really well to some extent. Obviously, the version-based vulnerability tracking is tied to version numbering scheme and branch models, so you cannot really use Debian version numbers on Ubuntu
or even on openSUSE or Fedora. That's not going to work. So here's an example with the same CVE identifier as before. That's a vulnerability in Ruby on Rails, which is related to YAML parsing. YAML is yet another markup language, and it has the odd aspect that it's a superset of JSON. The Rails code used the YAML parser to parse XML and JSON, I think, and the problem with that is that YAML is far more powerful. It's comparable to PHP's unserialize, Python's pickle, or Perl's Storable, in the sense that once you deserialize an untrusted object, you are almost guaranteed to have arbitrary code execution, because YAML deserialization instantiates arbitrary objects, and you don't want to do this with untrusted data. So here's an example from the Debian version tracking
database, and I'm just concentrating on the annotations for the unstable branch and for Squeeze, the stable release. The Squeeze release had received a security update, so there was a subsequent security update. So we have this version, and it's on the same branch, so we know that the issue is fixed by comparing it with the branch-specific fixed version. For the unfixed versions, we have an older version
from testing that was in testing or unstable before the bug was fixed. And here's an older version from the Squeeze release
branch, which predates the fixed version. And the interesting thing is that this data can be used to rate installed packages on the system. So you can use this data to see if you have vulnerabilities on your system.
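The rating step just described amounts to comparing the installed version of a package against the branch-specific fixed version. The toy below ignores almost all of the real dpkg version rules (epochs, tildes, letters, revisions); it only compares dot-separated numeric components, and both function names are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Very simplified version comparison: compares dot-separated numeric
 * components, as in "3.2.11" vs "3.2.9".  Real dpkg comparison also
 * handles epochs, letters, tildes, and Debian revisions; this toy only
 * illustrates how a branch-specific fixed version can be used to rate
 * an installed package.  Returns <0, 0, >0 like strcmp. */
int vercmp(const char *a, const char *b)
{
    while (*a || *b) {
        long na = strtol(a, (char **)&a, 10);   /* 0 if no digits left */
        long nb = strtol(b, (char **)&b, 10);
        if (na != nb)
            return na < nb ? -1 : 1;
        if (*a == '.') a++;
        if (*b == '.') b++;
    }
    return 0;
}

/* A package is vulnerable if its installed version predates the
 * branch-specific fixed version recorded in the tracker. */
int is_vulnerable(const char *installed, const char *fixed)
{
    return vercmp(installed, fixed) < 0;
}
```

This also shows why the comparison cannot be shared across distributions: the algorithm itself is tied to each distribution's version syntax.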
The other approach for vulnerability tracking, probably used by most other distributions, is to just file tracker bugs in your bug tracker. That's what Red Hat is doing with Fedora and the supported products. Basically, the security response team creates a tracker bug in bugzilla.redhat.com, which is aliased to the CVE. This tracker bug will be made public after disclosure. And there's also extensive metadata in the whiteboard field of that bug, which is also publicly accessible. And this tracker bug depends on product-specific bugs. Usually, you can only see those for Fedora, not the internal tracker bugs for the enterprise products. Those are usually not publicly accessible, because there's QA information in them, and that's usually not interesting. So what happens when you submit an update to Fedora and reference the tracker Bugzilla bug for Fedora is that the Fedora update system actually feeds the fixed version information back into Bugzilla. So you have that in a single place. All this is quite different from version-based tracking.
But the advantage is it's conceptually much simpler. And you can cover any case because you never actually have to compare version numbers. So even if your version numbering scheme is totally strange, explicit tracker bugs still cover it.
So here's the same vulnerability. And if you pull up the slides later from the web page, you can look at the references there.
The URL at the top is the CVE tracker bug. You could actually put the CVE name behind the equals sign. And it would still work because it's an alias. And the Fedora tracking bugs are public.
And there are internal bugs for the enterprise products, which are also affected by this vulnerability. And these enterprise products were eventually fixed through the security advisories that are mentioned there. And obviously, there's a lot of automation for this to work smoothly. So I believe that Ubuntu, Gentoo, openSUSE, et cetera, use similar schemes for vulnerability tracking. And the good thing is that most upstream projects provide critical information to power this sort of vulnerability tracking. They have the impact analysis and a concrete description of the vulnerability in the security advisory or in the bug tracker. And they often link to individual patches or commits in Git or Mercurial repositories that fix specific vulnerabilities. So as a Linux distribution, we need these individual patches to be able to prepare security updates, because we don't want to bundle months of unrelated development work with a security update, because too much could break. So if we don't have this information, we have to reverse engineer it from tarballs we download from your website. And this is time-consuming. You have already done this work, and we would like to spend the time on patch review or testing. And as a Linux distribution, we will always
distribute isolated security patches, and make it quite visible what we are fixing, because that's just how we prepare the updates. And some of the analysis is also on the public oss-security list. So by hiding this information, you don't gain much security. The important thing here is that we really want to encourage upstreams to make individual commits available in public version control repositories, so that security fixes can be prepared in a timely manner. So one thing that comes up at this point is how we can improve cross-distribution information sharing. One thing that's annoying, or just a fact of life, is that versioning schemes and package names differ greatly between distributions. Even though we all ship Firefox under some name, we ship GCC, we ship Bash, we ship Apache httpd, the names of the packages are all a bit different. And we also encode the upstream versions in different ways into our own versioning schemes. But I think, in an abstract sense, the CVE-to-packages mapping could be shared. There's a separate project for that, called Common Platform Enumeration. That's also from MITRE. This might act as a meet-in-the-middle point
where the distributions can see: OK, this CVE applies to this CPE, and our own packages for this CPE are such and such. And this should make the initial mapping step easier.
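As a rough illustration of the meet-in-the-middle idea, a distribution could keep its own CPE-to-package table and resolve CPE names coming from a CVE feed against it. The CPE strings and package names below are made up for illustration; this is a sketch of the lookup, not the actual CPE specification or any real distribution's data:

```c
#include <string.h>
#include <stddef.h>

/* Toy CPE-to-package table.  The CVE feed says "this CVE applies to
 * this CPE"; each distribution maintains its own mapping from CPE
 * names to its package names.  All entries here are illustrative. */
struct cpe_pkg { const char *cpe; const char *pkg; };

static const struct cpe_pkg distro_map[] = {
    { "cpe:/a:apache:http_server", "apache2" },  /* Debian-style name  */
    { "cpe:/a:gnu:gcc",            "gcc" },
    { NULL, NULL }
};

/* Resolve a CPE name from a CVE feed to this distribution's package
 * name; returns NULL when the CPE is unknown to this distribution. */
const char *cpe_to_package(const char *cpe)
{
    for (const struct cpe_pkg *m = distro_map; m->cpe; m++)
        if (strcmp(m->cpe, cpe) == 0)
            return m->pkg;
    return NULL;
}
```

The point is that only this small table is distribution-specific; the CVE-to-CPE half of the mapping could be shared by everyone.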
But apart from that, it's really difficult to automate the versioning part of this. So what does the fixed upstream version look like? And this information is really hard to automate,
unfortunately. But again, please publish your security patches in a publicly accessible version control repository as separate commits. This is something we, as Linux distributions, value tremendously, and we would like to encourage you to keep it that way. It's not that there is a trend away from publishing individual patches. But there are a few projects that have always operated by releasing complete tarballs, which can contain unrelated development and not just security fixes. And of course, there are some projects that move from publishing separate patches to a tarball-based distribution. And we hope that this doesn't turn into a trend or cause a problem for us in preparing security updates. So that's it for vulnerability tracking.
The next part is toolchain hardening. Basically, that's about changing GCC, binutils, the kernel, the C library, everything that lies below most programs running on GNU/Linux, to reduce the impact of security vulnerabilities. Most of these are probabilistic countermeasures against code execution. That means that there's always a non-zero probability that an attacker gets through, because there's
randomness involved. And if the attacker guesses correctly, it's still going to result in code execution. And the countermeasures basically fall into two different areas.
One aspect is we try to make it more difficult for the attacker to write at a specific memory location that affects the program's control flow. So it should not be easy to redirect execution,
because the memory location is either not writable at all, or it's difficult to locate in the program. And the other aspect is that we want to make it more difficult to write
a value that has a meaningful impact for the program being attacked, so that it doesn't result in a crash, but does something more interesting for the attacker. And it's important to keep in mind that these are not
complete solutions. What we do is we make the program crash if the countermeasure is effective. And code is not executed, but it's still a crash. And these bugs obviously still need fixing. They are just something that might be lower priority, but it's still something we actually
want to fix at some point. So these features mentioned on this page are mostly enabled by default if you just use GCC and the stock toolchain. The first point is address space layout randomization, which randomizes where shared libraries are positioned in the address space. The start address of the stack is randomized. The start address of the heap is randomized. This makes it more difficult to find spaces
to write to in a predictable fashion. And it also makes it more difficult to get a valid function pointer to code in a shared library, because the address has been randomized. Then there's another aspect about program start addresses.
I'm going to cover that in a minute. Then the next thing that you get basically for free is non-executable stack and heap. That means that when the attacker supplies code to your program and it's stored on the heap, it is not directly executable, because the heap hasn't
got the execute bit set in the page protection. On x86, that relies on hardware support that wasn't in the initial architecture, but all chips in common use have it, so it's basically universally available. Then, some time ago, it must have been several years by now, we fixed glibc so that double-free bugs, where you deallocate the same pointer twice,
do not result in a bug that's directly exploitable for code execution or heap manipulation. There are checks in glibc that detect such direct exploitation attempts.
Then there's the stack protector. That's about protecting the return address on the stack, so that a stack-based buffer overflow that overwrites the return address also overwrites a stack canary. The canary is then checked, before returning from the function, to see if it has the expected value; if it hasn't, we assume it has been overwritten by an attacker and abort the program. This kind of measure you have to enable explicitly, but most distributions have tweaked the GCC defaults so that it is always active when compiling programs.
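The canary mechanism can be illustrated by hand: place a guard value between a buffer and the data that must stay intact, and check it after a potentially overflowing copy. This is only a simulation of the idea; the real protection is inserted by the compiler (with -fstack-protector), guards the saved return address, uses a random canary, and aborts the program on failure rather than returning a flag:

```c
#include <string.h>
#include <stdint.h>

/* Hand-rolled illustration of the stack-canary idea.  A guard value
 * sits between a fixed-size buffer and the data that must not be
 * corrupted (on a real stack: the return address).  An unchecked copy
 * that overruns the buffer tramples the canary, which is detected
 * afterwards.  Returns 1 if the canary survived, 0 if it was smashed. */
int copy_with_canary(const char *input)
{
    struct {
        char buf[16];
        uint32_t canary;
    } frame;
    size_t n = strlen(input) + 1;
    if (n > sizeof(frame))          /* keep the simulation in bounds */
        n = sizeof(frame);
    frame.canary = 0xdeadbeefu;     /* random in the real scheme */
    memcpy(&frame, input, n);       /* simulated unchecked strcpy() */
    return frame.canary == 0xdeadbeefu;
}
```

A short input leaves the canary intact; anything longer than the 16-byte buffer clobbers it, which is exactly the condition the compiler-inserted check turns into an abort.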
Then there are additional compiler warnings for format string misuse. Some distributions turn them into errors, either during the build or in a separate step by examining the build logs and the warnings GCC has written. And there are also warnings about some cases
of integer overflow that trigger undefined behavior. What's new, having landed in GCC last fall, is operator new[] hardening for C++ programs. There's a really old bug, disclosed in 2002, which basically means that if you allocate a large array using operator new[], it could happen that the size computation wrapped around internally, and what you get back from the C++ runtime library is just a very small heap block. And then you start writing into the array, and you get a heap-based buffer overflow. This has been fixed in GCC 4.8, which is currently under development. It's always active; there's no way to turn it off. And the patch has already been backported to Fedora 18, so in Fedora 18, most critical packages have been rebuilt with a GCC that has the fix applied. The unfortunate aspect of this vulnerability is that it is in code generated by GCC, so you have to recompile everything that might use this C++ functionality, and before that, you still have latent exposure to this vulnerability. But it's going to be fixed, and GCC 4.8 is going to be released in April or May, I think. Or you could just pick up the backported patch from Fedora.
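The arithmetic behind this bug can be shown directly. Assuming a 32-bit size_t (simulated here with uint32_t), the size computation for `new T[count]` can wrap to a tiny value; the hardened operator new[] detects exactly this condition and fails the allocation instead of returning a small block:

```c
#include <stdint.h>

/* Simulate the 32-bit allocation-size computation for `new T[count]`:
 * count * sizeof(T), performed in 32 bits, wraps modulo 2^32. */
uint32_t alloc_size_32(uint32_t count, uint32_t elem_size)
{
    return count * elem_size;   /* unsigned wraparound is well defined */
}

/* A wrap happened if dividing the result back does not recover the
 * element count; this is the kind of check the hardening performs
 * before calling the underlying allocator. */
int size_wrapped(uint32_t count, uint32_t elem_size)
{
    return elem_size != 0
        && alloc_size_32(count, elem_size) / elem_size != count;
}
```

For example, requesting 0x40000001 eight-byte elements yields a computed size of just 8 bytes after the wrap, which is why the subsequent array writes become a heap-based buffer overflow.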
Another thing about toolchain hardening, this is completely optional, but many distributions enable it, is called FORTIFY_SOURCE. The very neat aspect is the first item, which is a cooperation between GCC and glibc. GCC provides a compiler built-in where you can put in a bare pointer, without any length information, and GCC uses context from the program source code it sees to infer the length of the array. And glibc contains wrappers that pass this length information, if it's available (it's not always available, of course), to wrapper functions which perform length checks. So what you get is stack-based buffer overflows which reliably result in crashes, because the length check kicks in and terminates the program before the buffer overflow actually occurs. And glibc disables %n in writable format strings in FORTIFY_SOURCE mode. There used to be a bug, but this has been fixed. And this second point disarms format string attacks that use %n for writing data in a controlled fashion.
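For readers unfamiliar with %n: it stores the number of characters produced so far into an int, which is exactly what makes it useful to attackers when they control the format string. With a read-only literal format string, as in this sketch, it is harmless, and that distinction is what the glibc check relies on:

```c
#include <stdio.h>

/* Legitimate %n use: after formatting, `count` holds the number of
 * characters produced before the %n directive.  The format string here
 * is a string literal, stored in read-only memory, so FORTIFY_SOURCE
 * permits it; it rejects %n only in writable format strings, which is
 * what format string attacks need in order to write attacker-chosen
 * values to attacker-chosen addresses. */
int chars_before_n(int value)
{
    char buf[64];
    int count = -1;
    snprintf(buf, sizeof(buf), "value=%d%n", value, &count);
    return count;
}
```

Calling `chars_before_n(42)` formats "value=42" and reports 8 characters; the attack variant would substitute a pointer into program data for `&count`.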
We don't enable all hardening functionality that is readily available in the toolchain right now, though. For instance, one thing that would help, and which is obviously on its way, is to get rid of 32-bit architectures, because 32 bits contain too little entropy to get full randomization and fully unpredictable heap addresses or shared library addresses. There's also another aspect: because size_t is 64 bits on 64-bit architectures, you often have integer overflows that result in buffer overflows which are only exploitable on 32-bit architectures, because on 64-bit architectures the multiplication is done in 64 bits and just doesn't overflow there. So getting rid of 32 bits is one step that's going to happen eventually and will help us somewhat in this area. Another thing that's actually pretty easy to do, but Fedora, for example, hasn't done, is to disable prelink. Prelink disables address space layout randomization because it picks a single fixed layout that is valid until you run the tool the next time on this system. So it's not entirely predictable. It probably still works quite well against network-based attacks, where the attacker doesn't see the addresses on the local system,
but it doesn't work against local attackers who can just check what are the addresses used on this system by prelink and use this information for further attacks.
What we should do is randomization of program start addresses. That is one point where we do not fully randomize executable pages in the process image. Basically, all processes have a start address,
which is four and a couple of zeros. That's a constant. And the fear is that the actual program itself contains enough machine code to provide targets for return-oriented programming.
So there is the potential of address space layout randomization bypass there. This hasn't been implemented because there is a concern for overhead. But Grant Murphy has recently done some analysis, and it looks like we should really do this and enable it for everything,
and not just critical network services like OpenSSH. Then another thing we can do, which is readily available, is BIND_NOW. That's a feature of the ld.so dynamic linker. It instructs the linker to perform all symbol resolution
at program start and then make the global offset table read-only. This means that the global offset table cannot be overwritten by attackers. Because it contains function pointers, among other things,
it is really important to get this protection. What we also can do, but which doesn't seem to be much on the radar, is compile packages with -fwrapv. This makes signed integer overflow in C deterministic, using two's complement arithmetic. But this only helps broken programs, which use incorrect ways of overflow checking in C or C++. So it's not really compelling to enable that,
because it interferes with loop optimization. But maybe we have to bite the bullet and enable that as well. The next point has also been in GCC for a while: that's stack checking. That's about stack-based buffer overflows or stack pointer manipulation resulting from unbounded alloca calls or variable-length array allocation on the stack. Basically, what happens is that a call to alloca with a large argument moves the stack pointer from within the allocated stack to completely outside the stack, where anything can happen. Most of the time, there is going to be a crash, but it's conceivable that this is exploitable, and I think it has been demonstrated recently that these things can result in arbitrary code execution. The state right now is that -fstack-check has considerable code size impact, because every call is instrumented, even if the stack frame is small and doesn't contain alloca calls. So we really need to work on that and get the code size impact down, and then we can start looking at the performance impact in more detail. There's a backup plan if the current approach, which writes probe values in a sequence on the stack and hopes to reach the guard page on a large alloca (which will kill the process), turns out to be too expensive. If this probing loop is too expensive for alloca, as a fallback plan we could probably use the stack boundary provided by split stacks, without actually enabling split stacks. We would only use the stack boundary that split stacks maintain to detect that we are too close to the top of the stack to fulfill the alloca allocation request. And I hope that we will be able to tackle this problem this year, because there's so much code out there that uses alloca, and so far attempts to discourage its use have not been well received by programmers, because they like it: it's an easy interface, you don't have to call free, unlike with malloc. But it obviously has this security risk.
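A common way to avoid alloca's risk, sketched below with a made-up helper, is to take the allocation to the heap, at the cost of an explicit free by the caller:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: duplicate a string in upper case.  With
 * alloca, a long (possibly attacker-controlled) input would silently
 * move the stack pointer outside the stack; malloc can fail cleanly
 * instead, at the price of an explicit free() by the caller. */
char *dup_upper(const char *s)
{
    size_t n = strlen(s);
    char *copy = malloc(n + 1);     /* heap, not alloca: any size is safe */
    if (!copy)
        return NULL;                /* graceful failure, not stack abuse */
    for (size_t i = 0; i < n; i++)
        copy[i] = (s[i] >= 'a' && s[i] <= 'z') ? s[i] - 32 : s[i];
    copy[n] = '\0';
    return copy;                    /* caller must free(), unlike alloca */
}
```

The convenience the talk mentions is exactly what is lost here: the caller has to remember the free(), which is why programmers keep reaching for alloca.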
So we have to fix it in the tool chain. One thing that I'd like to see fixed as well is subscript checking for operator[] in C++, which affects std::vector, std::string, and std::array. If you use this shorthand notation with GCC, it's currently basically only as safe as a C program, because there's no range checking, even though the runtime library knows how long the array is and could insert the size checks, as is done for Java, Python, and many other programming languages. The C++ standard actually gives us permission to do this check. We tried to make the change to the library and implemented it, but it seemed to have a performance impact on tight loops, so we had to put it on the back burner for now. I think research is needed on how we can address this. In the meantime, if you're a C++ programmer and want a subscript check, you have to use the at() member function, which is the same as the subscript operator except that it's required to perform the subscript check. So this is something where we still have to do some work, I think. One aspect that makes it difficult for us to talk about these hardening features is that we have to decide how much performance impact we are willing to take in order to improve security.
What I'm trying to show here is: we implement a countermeasure, and the damage just stays unchanged. Then we deploy an effective countermeasure, and the damage from the attacks we measure goes way down. Then a new attack comes along, and it goes up again. The problem with this picture is that we lack data on detected attacks at a large scale, partly because we aren't a target in the sense that others are. It would be different if Oracle had open-sourced the Java plug-in, obviously, but fortunately for us that's kind of out of scope. Then we would have this data available: what changes we make, and what changes we see in attack success. The lack of this data makes it really difficult to judge security improvements in the toolchain in an objective way. So here's a quick list of things we could do if we feel that we want to take it to a new level. There are some things we could do to C and C++ that would probably change the language quite a lot and require standards work.
So it's a very long-term project; I'm not sure if that's going to happen any time soon. Equally unrealistic is library consolidation. That's a thing that should happen, but it's unlikely to materialize. I think most Linux distributions ship four or five different implementations of SSH, and for a complex crypto protocol like that, duplication is not going to help bring down the number of security bugs. One thing that might help is FLOSS-specific secure coding guidelines. There have been a couple of books about secure coding, but they usually do not address Linux needs and are pretty much out of date by now, so we probably have to revive that. Eventually, all the experience we gain from fixing vulnerabilities and documenting risky library calls should probably lead to better APIs at some point. But that's also rather long term. And of course, we can go completely crazy and rewrite everything in a different programming language. Curiously, that's completely impractical right now
because most of the new system programming languages are just not deeply embeddable in the sense that we can replace code in a library and the calling program doesn't notice that it's no longer C, but Rust.
For example, the Network Security Services (NSS) library could eventually use Rust for implementing critical functionality. But right now, none of these languages, except Ada and maybe LuaJIT, qualify for this kind of deep embedding. And Ada is probably not to everyone's taste,
but there's a tremendous amount of tools available for Ada which help to build secure software. Perhaps we should look into that more. And I understand Ada has a huge following of programmers
here at FOSDEM, so maybe that's going to be interesting. So my final topic is distribution-wide defect analysis. This is about using tools to find vulnerabilities
across a whole distribution. In preparing this slide, we ran a proprietary static analyzer on all changed packages in Red Hat Enterprise Linux 6 which were released in 2012. Static analyzers report a huge number of potential issues, so we looked at the before-and-after state for each reported bug and found interesting actual matches for vulnerabilities and bugs we had fixed in errata during 2012. Only the first one is a real security bug; the other ones were regular bugs. And the first one was actually caught by the _FORTIFY_SOURCE buffer checking logic, because GCC has sort of its own static analyzer implemented.
So this is something we should gather more data on. And there's a Fedora project (here's the URL) which tries to gather all kinds of information about open-source static analysis tools, so we can use them in Fedora and actually tweak them so that they match closely the APIs we use and provide us with good information about our code. One thing that I find very interesting
is matching machines and humans and coupling their different skill sets. One thing you can do, and this is actually happening a lot right now, is to take a search engine, search for certain patterns, and then manually go through the hits and identify potential security issues related to the search string. For example, the Ruby on Rails vulnerability I mentioned earlier could be located using such a search string, and this points to code copies that have similar vulnerabilities. But code search engines are sort of difficult to set up. So Vincent Danen and later myself have written an indexer for ELF symbol databases.
To build an ELF symbol database, basically what we do is unpack RPM packages, pipe them through an ELF parser, and load the results into a PostgreSQL database.
That has really interesting applications. You can use the full power of PostgreSQL joins and anti-joins to find library calls that are highly likely to be vulnerable to specific attacks.
The first example here is from Expat, where you call XML_ParserCreate but do not restrict entity expansion using XML_SetEntityDeclHandler. This means that a small XML file you download from the net can blow up in your process into a multi-gigabyte XML structure. The second one is pretty complicated, but it can be implemented in SQL: you can model the dynamic linker using recursive queries in PostgreSQL, and then you can find insecure uses of getenv from PAM and NSS (name service switch) modules. This is something that could lead to privilege escalation, because the PAM or NSS module could run in a privileged process such as su or passwd. So I think for global analysis, we could do much more if we had better tools. We could use debugging information.
And something we could do is actually parse the compiled code and extract further information from the disassembly. It would also be nice to cover additional languages; for dynamic languages like Python or PHP, we probably need heuristics to gather good information about the functions being called by the programs. So what I hope to have eventually is a combination of the ELF symbol database with something that goes through the binaries and extracts function arguments, for example, and uses that to discover certain types of vulnerabilities, like an insecure umask in daemons, or accidentally disabled SSL certificate validation. The last point is about PATH_MAX bugs, because PATH_MAX actually isn't a hard limit on Linux. We could use the ELF symbol database as an index to locate interesting points and then use disassembly
to extract the function arguments and actually find all these calls in a reliable manner. So that's it for me. I think the two things you should take home are: we should try to fix alloca this year.
And static analysis and code search engines are exciting. And please share your version control repositories so that we can use them to prepare security updates. OK, that's it for me. Are there any questions?
Since there is not so much time, there is time for just two questions.
Anybody? Is there any question? One, two. OK then. Thank you.