Abusing XSLT for Practical Attacks

Formal Metadata

Title
Abusing XSLT for Practical Attacks
Title of Series
Number of Parts
109
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Over the years, XML has been a rich target for attackers due to flaws in its design as well as implementations. It is a tempting target because it is used by other programming languages to interconnect applications and is supported by web browsers. In this talk, I will demonstrate how to use XSLT to produce documents that are vulnerable to new exploits. XSLT can be leveraged to affect the integrity of arithmetic operations, lead to code logic failure, or cause random values to use the same initialization vector. Error disclosure has always provided valuable information, but thanks to XSLT, it is possible to partially read system files that could disclose service or system's passwords. Finally, XSLT can be used to compromise end-user confidentiality by abusing the same-origin policy concept present in web browsers. This presentation includes proof-of-concept attacks demonstrating XSLT’s potential to affect production systems, along with recommendations for safe development. Speaker Bio: Fernando Arnaboldi is a senior security researcher and consultant at IOActive, Inc. He has over 10 years of experience in the security research space (Deloitte, Core Security Technologies and IOActive) and holds a Bachelor's degree in Computer Science.
Transcript: English(auto-generated)
coming. My name is Fernando and I work as a security consultant for IOActive. I would like to show you today how XSLT implementations are vulnerable to multiple practical attacks. And for that matter, the very first question that I would want addressed if I were sitting over there is
why are we talking about XSLT. This is a programming language that is not so common; it was created when XML was created, as a way to parse XML data. So a couple of years ago I came across a real scenario: a co-worker found an XML vulnerability, an external entity expansion that
allowed him to retrieve some passwords, and I thought, this is pretty cool. I want to learn more about XML, schemas and XSLT. Those are all technologies related to XML. And as I was doing that, I noticed that there was nothing
written about how to exploit XSLT implementations. When you're reviewing a language, you may want to know that. So what we will be doing here is to analyze what are those weaknesses. So we're presenting five different issues in here. And how you can practically exploit them. So whether you are reviewing code, if you're a penetration tester or so, or
if you're developing technologies related to XSLT, this will come in handy. Or if you're just trying to abuse any implementation, hopefully this will help. One of the things is that even today none of the vulnerabilities have been
fixed so everything will work. And you may be able to affect the confidentiality and the integrity of multiple implementations. And that means that you may get even some profit in certain areas. So the good thing is that you're not exploiting flaws in a way that a malicious virus would
do it in here. You will see no assembly code. You will see just XSLT and how that can be used to get some fun things. So we will briefly talk today about how you can identify your target, how numbers will let you affect the
integrity. How random numbers may be predictable sometimes. I will show you how to bypass the same origin policy in a web browser using XSLT. And finally some information disclosure through errors. So basically the idea here is to tell you what XSLT does briefly and how XSLT can be
attacked. And finally if you don't know, hello. I was actually expecting this. I know. That's why I'm here. So
this is your first time at DEF CON. How's it going? A little bit anxious. You seem a little nervous. You want me to talk first of all? We have a medicine for that. It comes in
that bottle, right? Yes, it does. I gave him the right one, right? Okay. No. I was like, dude, why am I getting
the water? Moron. Welcome to DEF CON. Thank you very much. Yes. This will be interesting. You feel much better, don't you?
I feel like a man now. Thank you very much. I would expect
that. And as I was saying, you can identify your target like this. Thank you. So basically XSLT is a language that is being used to modify an XML. So what it does is receive as an input an XML document and it creates a text
document or an HTML document or a new XML document for that matter. So there are different versions when it comes to XSLT. There are three: V1, V2, V3. And a newer version doesn't necessarily mean an improvement. They
should be. But each version has more functionality than the previous one. And V1 is the most implemented version because it has been supported by web browsers and because processors for later versions still support it. So an XSLT processor supporting V2 will also support V1. So I tested
two types of software: server-side processors and client-side processors. Server-side processors are stand-alone tools that you can run on the command line, or they can be libraries that are hooked up to different languages: Python, Perl, Java, whatever. And when it comes to
client-side processors, basically I believe you have two types. You will have web browsers or eventually XML or XSLT editors. And I believe that is a very narrow set of people that are using those. So the main processors and libraries are mostly three. These are the most
important ones, developed by Gnome, Apache and Saxonica. libxslt is the most widely deployed one. It is used not only by server-side processors but also by client-side processors, web browsers. And you also have Xalan, developed by the Apache people, which comes in two
flavors, C++ and Java. And a similar thing goes for Saxon. And for the client-side processors, here we have the browsers. Everything that I tested was on the latest available versions of all the server-side libraries and the web browsers. So we have three ways to run a transformation. The first one involves an XSLT processor receiving an XML and an XSLT.
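As a rough illustration of that first way, the following Python sketch hands a stylesheet and an XML document to a command-line processor. It assumes `xsltproc` (the command-line tool that ships with libxslt) may or may not be installed; the sample documents, file names and helper names are hypothetical and are not taken from the talk or the white paper.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical input document, used only for this sketch.
XML_DOC = """<?xml version="1.0"?>
<greeting>hello</greeting>
"""

# Hypothetical XSLT 1.0 stylesheet that emits the text of <greeting>.
STYLESHEET = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="greeting"/>
  </xsl:template>
</xsl:stylesheet>
"""


def build_command(stylesheet_path, xml_path):
    """xsltproc takes the stylesheet first, then the XML document."""
    return ["xsltproc", str(stylesheet_path), str(xml_path)]


def transform():
    """Run the transformation; return None when xsltproc is not installed."""
    if shutil.which("xsltproc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        xsl = Path(tmp, "style.xsl")
        xml = Path(tmp, "doc.xml")
        xsl.write_text(STYLESHEET)
        xml.write_text(XML_DOC)
        done = subprocess.run(
            build_command(xsl, xml),
            capture_output=True, text=True, check=True,
        )
        return done.stdout


if __name__ == "__main__":
    print(transform())
```

The processor receives two separate files here; the other two ways differ only in how the stylesheet reaches the processor.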
This normally happens when you are calling a command-line processor. And eventually you will get a new document. You will do this if you need to parse an XML. So people will be using this if they need to parse something server side. Another possibility which is more common from a client side
perspective is when the XSLT processor is grabbing the XSLT document. So there is a small portion in the XML that says you will find here the XSLT document. Go get it for me. And create the new result document. And finally you can embed the XSLT document along with the XML and by
doing that you just supply one file to the processor to get the new result. So you might want to know, if you don't know already, who your target is and what kind of properties the target has. So by finding out which version and vendor they have, you may know what type
of vulnerabilities you could exploit in this target. Since clients may also support JavaScript, and that would be the case for a typical web browser, you may also retrieve some JavaScript information. All the code that you're seeing here, and will be seeing here, is in the white paper. You can copy and paste it and try it on your
target of choice to see what happens. So at the end of each section I will show you a brief summary. On the server side, here we have all of them: xsltproc, that's the standalone version of libxslt, and PHP, Python, Ruby are all related to libxslt in this example. Then you have
the client side, which would be the web browsers. You will see the first column for the version, the vendor over there, and whether it supports JavaScript or not; of course, basically all web browsers support JavaScript. There is one
final thing, and that's that libxslt is normally more widely deployed than the others. So you will notice that sometimes whatever is affecting server-side processors may also affect the client side. Let's start talking about the issues. So this is something present on the client
side and the server side. And it doesn't matter if we're talking about floating point numbers or integers; all numbers will introduce errors in here. So as I was testing this, it felt a little bit weird that sometimes a calculation was not working as I expected it to. Certain additions,
subtractions were not doing what I was expecting. So the very first thing that I did was to define a style sheet as a simple calculation. What I was trying to do is just to add a few numbers. So for that matter I had defined a style
sheet which has a specific output in here saying it is a text output. And in the middle you have this simple thing, 0.2 plus 0.1 minus 0.3. That should be 0, right? Pretty simple. It may not be that simple for processors. Only two said that
this was 0. That was the case for Opera and Chrome. The rest said, well, close to that. Why is this happening? This is weird. And the weird thing is that you will see this across all implementations. Okay. This is cool. But it
would be better if we did something with this. These are just numbers that were not properly rounded by the programming language. So this is the thing that is present in all programming languages. I realize that you will have this in JavaScript, Perl, Python, C, whatever. This is a
common thing. Floating point numbers will have certain decimals that are over there hanging around that you may take. So I created a simple JavaScript application simulating a bank. This is not a real bank. I wouldn't try this on my real bank. Hopefully my real bank will limit the number of transactions. It wouldn't allow very small decimals to be
transferred from one account to the other. So the very first thing that I tried was to see this application. I deposited a million dollars on the first account and the second account has a zero balance at the moment. This is where I will deposit my profit. So I noticed that if I remove a very small number from the million dollar account, it will
not get subtracted. But it will be added to the secondary account because that one holds a lower number than a million: it has a zero. And that decimal means more to a zero than it does to the million. So the first portion of the program will try to see how
big of a number it can take. It is a small number. And then it will do millions of transactions to move it to the secondary account. So you will see in here that we will be using V8, that's the Chrome JavaScript processor, in its standalone version. And we will try to see what's the
best profit to get in here: how much money can I steal from a million dollars without it being noticed. And I will start moving that money from account number zero to account number one. And hopefully that will give me a daily profit of around $1,300. It's moving, right? Yeah. So this
was good. But it would be better with a bigger number. So let's talk about integers. I opened an XML with ten numbers. This will be fairly easy to understand even if you're not a developer. You'll see that you have five numbers in exponential notation in here and the same five numbers written as just the number one with a bunch of
zeros. The thing is that programming languages do not handle numbers very well when you have more than 16 digits, because of the precision. So what I tried to do was to print in here the same number that I had in the XML document and then format it with the commas and the
numbers. In here you will see that Saxon is doing great. This is what you want to see. You will see number one followed by a bunch of zeros. This is pretty clear. This is awesome. You will have the same for the non-exponential notation. Internet Explorer and Firefox argued they weren't able to
show the exponential notation but that's okay. The problem comes when you're introducing errors, because there's nothing worse than believing that you have the right number when in fact you do not. In fact I was noticing today that how they're finishing is different depending on whether you're using exponential notation or not. So we'll try to use this
number in a couple of minutes. Say hello for Java. Almost there. And for C they just don't care about what's going on over there. So anything can happen here. So first I
thought okay this is something related to an error in the standard so I went to read the IEEE 754 standard. But the problem is not over there. All implementations have problems. It is what you do with floating point numbers and integers that matters. You should be saying okay a number
should be between these values and not allowing a value to be so big if you are not able to handle it. Either way this shouldn't be working like this. So I reported all the issues including this one and the floating point numbers to the vendors and the first thing that I heard was that I
should be reading Wikipedia to understand how floating point numbers work. That was interesting. But probably you wouldn't find the answer over there. Then I heard also that I should be reading the XSLT V2 specification, but this was affecting purely V1. That was nice as well but clearly was
not solving the problem that was in here. And the very same person also said that this is something that you will see in JavaScript as well. I mean that's fine. I know that you can find this in JavaScript. But I wouldn't like to have this in my programming language, or in any programming language, because these errors are everywhere. So we stole some
decimals before. Now we're trying to do a similar thing but with integers. The thing is that if you put a number 1 followed by 17 zeros and you subtract the number 1, the programming language will not notice that the 1 is missing. So I created in here a fake cryptocurrency which I named
FakeCoin, whose value is very small. Very, very small. So I bought a number 1 followed by 17 zeros of these coins. So I have a lot of coins with a total net value of $1,100,000. And I will try to transfer one coin at a time to a secondary
account which will be my profit account. And hopefully by the end of the day I will have a better profit than moving decimals. The profit would be better if I used more coins. I would be able to transfer more coins at the
same time. And here I'm just going for the minimum amount possible, just to show you. The minimum amount in here would give me a profit of $2,300. If you add a zero to the coins, you should add a zero to the daily profit as well. So that was nice. That was nice. So the very next thing that I did
was to see how random numbers work. I mean if you are developing, if you have ever developed, you need random numbers. This is something that you should normally see on server-side processors. And you should also know that of course not any random number generator should be used for
cryptographic purposes. I mean random numbers, you have to be careful with them. In XSLT, this is a function that comes from EXSLT, which is a set of extensions to XSLT. It is defined as a function that returns a value between zero and one, as any random function
should be. Supposedly any random number should be a number that doesn't have any pattern. I mean you shouldn't know what would be the number before calling this method. That would be fairly logical from a random point of view. So we
normally have two types of random functions. If you have ever developed, you may know that you have functions that are less secure like random.random in Python. You have more cryptographically secure mechanisms like random.SystemRandom in Python. You may want to use that if you're doing cryptographic things. Some of the software that I tested, the
server-side processors, you're able to see the code, you're able to see how that was developed. For libxslt, Xalan-C and Xalan-J, you will see that in all these scenarios they are using a pseudorandom number generator, which is fine. The problem may come with the
implementation. If people are using these random numbers for any cryptographic purpose, that may be a problem, because with a pseudorandom number generator you may know what's going on. These were the functions that we just saw in C and C++ and in
Java. And a good definition comes from the man page: these are just bad random number generators. You have to take that into consideration and shouldn't use them for cryptographic purposes. But there's one more thing when it comes to random numbers that you normally pay attention to, or you should, at least. What happens if there is no
initialization vector? This is something basic for any random number generator: you need to have some sort of value that changes when you're getting a random number, otherwise you may always get the very same value, and that's not very useful if you're expecting a random number. And that's because you may know in advance which numbers you
will be getting. So once you have a proper IV in place, you will have different values every time that you're calling the random functions. But let's see again how the functions that we saw before are working when it comes to the initialization vector. And in here there is only one that
doesn't have the IV. Again, libxslt. This is not something new to libxslt. They have known about this since 2006. But this is how it works. So if you try to create an XSLT that will produce a random value, or you see anyone who is trying
to produce a random value out of libxslt, you will see something like this. And you will see these kinds of results if you're executing that on the command line. I executed it twice in the same terminal and I got the same number twice. You can
see the 7.82: you always get that first number every time that you're executing the random function from libxslt. So the next thing that I did was to try to understand how this can be used in cipher modes when they are doing block ciphers. That's another way to cipher
things if you're using random. So I ran two executions the very first time to understand what these numbers look like. So first I printed the Python version, random.random, and you get two different numbers. Of course, these random number generators may not be the best, but they are not
predictable and they are not the same every time that I'm executing that function. But with libxslt we can recognize again the very same number that we saw on the previous slide, the 7.82 thing. That's the very same number. If you're calling Python again with the print random.random function, we will
see that we have again two different numbers. So far, four for Python and one for libxslt. If we're calling libxslt again, you will notice that in the second position we will always have this new value, the 0.13, and it will be repeated every second time that you're calling this. So without
having an external seed value such as time, you may know in advance which will be the sequence of numbers that will be generated by libxslt which is pretty cool because you may know in advance what is being encrypted if they are using this to encrypt something which will be fairly ridiculous. So
again, you may predict values when you're seeing random numbers. The same-origin policy is something that is present in client-side processors. This means web browsers. Basically
this says that if you're on a website, you shouldn't be reading information from other websites. But again, as always, that may not be the case for certain engines. So this is important. The origin is always defined by the scheme,
the host and the port of the URL. We will see an example of this. The HTTP at the very beginning, or HTTPS, will be the scheme. The host would be example.com and the port would be either port 80 or port 443 or something like that. So generally speaking when we're retrieving documents from
different origins, the web browsers will not share the information. I mean when we're taking the same origin over and over, we will be sending the same cookie over and over to the same website and that will be okay. Normally JavaScript is used to try to alter this but you don't necessarily need to use it to affect the same
origin policy. You should be expecting that when you're connecting to Google.com, your browser will be sending the very same cookie to this website because it has the very same origin. If you're connecting to Microsoft.com, you should be seeing a different cookie. This will be a very valid
scenario. You just connect to a website, you're on the main web page, you're trying to access a second web page that is being stored over there and that will be fine. That's okay. You're allowed to see that. In fact, you're even allowed to see other web pages on the very same domain. But if you're
changing the scheme, if you're changing the host name or if you're changing the port, you shouldn't be allowed to see any of the information that is present on that other website. I mean you're not sharing the private information between websites. That's what you would expect of it. So there's only one function that reads documents and
that's document. Okay. So you may try to use that to read another XML document. In fact, since we're speaking about websites, we could also see in here XHTML, which is a fairly common way to represent a web page by certain web
servers. Once we retrieve the XHTML document, we can see what's inside using either of these two functions, copy-of and value-of, which will show either an XML representation or a text representation. So the very first thing that you want to do if you want to abuse this, you need to find a
server that uses XHTML. Okay. Bing.com uses XHTML. I'm logging in here. What can you do with this? So on the upper right corner, you will see that my name, it is in a red box, and that is also reflected in the code. And since it is XHTML, this is some sort of XML. And my name is within an
element named ID underscore M. So you may be able to target your web browser to retrieve that value. So let's see how using the document, how using any of the other functions, we can retrieve that information. In here, we can see that the
document function is accessing the URL, www.bing.com. And then right in the middle, we are retrieving the information that we just grabbed from the document. And finally, because I'm lazy, with JavaScript, I will be extracting the ID
underscore M element, which has my name. So let's see a demo of this. First, I will open Safari, and I will show you that I'm using Bing.com as my home page. And then I
will open the document that is on the desktop that it is not sharing the same origin. Because one, Bing.com, it is being hosted on HTTPS, Bing.com. And the other one, it is file, it is a local file. So let's see what happens. You will notice
again my name in the upper right corner. And when I open that file, I'm reading the document that is being stored by Bing.com, and I'm able to retrieve my name from it using XSLT. Even though the file is not hosted on Bing.com, Safari doesn't care. It will show you that information. So
basically, Safari will allow you to read this. And Internet Explorer may show you a warning message, will retrieve the information, but you won't be sharing anything related to this. And the other browsers just didn't do anything.
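The origin comparison that the browsers are expected to enforce can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how any browser implements it: the helper names and the local file path are hypothetical, default ports are filled in only for http and https, and real browsers treat `file:` URLs with extra rules that are not modeled here.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only http and https get a default port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(url_a) == origin(url_b)


if __name__ == "__main__":
    # Another page on the same site: allowed.
    print(same_origin("https://www.bing.com/", "https://www.bing.com/account"))
    # A local file reading from an HTTPS site: different scheme and host.
    print(same_origin("file:///Users/me/Desktop/poc.xml", "https://www.bing.com/"))
```

In the demo, the local file and the Bing page differ in both scheme and host, so a read through the document function should have been refused.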
Another cool thing would be that you may use some of this stuff to try to scan internal networks in case you wanted to. There are multiple ways to try to scan internal networks when you're executing something locally. And this could be another way as well. So another vulnerability that I
found and I thought that would be very interesting to discuss would be an information disclosure on file reading through errors. This is something that is present in server side and client side processors. The focus here is
of course on server side processors because we wouldn't care what will happen on a web browser. So the cool thing about this is that it is not possible to read text files in XSLT V1. It is only possible to read XML documents, or as we saw, XHTML documents. And since it is
not possible to read plain text files, it doesn't matter what function you are trying to use in here because no function would be capable of doing this. So let's see what happens, even though the W3C says it is
not possible. So we saw before that there was one function to read XML documents and that's the document function. This function allows access to XML documents other than the main document. Okay. We have that. We can try to use that. There are also other functions used for
accessing XSLT documents and that would be the functions include and import. These just retrieve a stylesheet and try to combine it with other stylesheets. We don't care about what the manual says about
this because either way we are not trying to read a stylesheet in here. So I created a text file that contains three lines. Very simple. If you see the contents of my test file, you will see a line one, a line two, and a line three. Pretty simple. And if you read the
documentation, you will see that when you are reading a file (this comes from the XML documentation), there are a couple of possibilities. The XSLT processor finds an error, and that's what some of the
processors do. They say this is not allowed in prolog. Okay. That's okay. The other possibility would be to return an empty XML document. That's what Ruby does. Ruby will show you that there's nothing to see in here and this is something that is also expected as well. But again,
this doesn't solve the problem that we want to read something that was in the test file. XSLT comes again to help us with this. So when using document, xsltproc, PHP and Perl will show you the first line of our test file. Remember the
line one of the test file? That's not too much, but it's cool. Perhaps we can do something with that. We also try to use other functions to try to access these files later. But having this unexpected behavior in place may allow us to do something with it. So you may know in advance where
I'm going with this: which type of files may have an interesting first line that would be valuable for us? There are certain specific files that store the most valuable information of a computer on the very first line. So what if we were able to read, for example, a password file?
Where could we find those passwords? The most common answer for any Linux system would be /etc/passwd. The next one, if, God forbid, they are running this as root, could be /etc/shadow. The possibilities are in your imagination here. Depending on
what you are trying to read, you may be able to retrieve certain information that may be valuable for you or for someone else. You also have the Apache password and you may also have database passwords. There are a number of possibilities down there. The thing is that this is what you will be seeing when using, for example, one of the processors when trying to read /etc/passwd. You
will see an error and also something else, the password, which is cool. You can also now use XSLT to retrieve this information. Another example could be to try to use PHP to read the Apache password. And again, since
this is something that you store in the very first line, you may see a bunch of errors and right in the middle what you are planning to see. The password for John in this case. And as I was saying before, just in case they do not care about what they are doing, you could also have someone
using /etc/shadow, leaving that available if they are running this as root. And this will happen if you are using Ruby to try to retrieve that file. Again, expect all the errors, but also expect the password for root over there. So this is pretty neat because I believe this opened
the possibility for XSLT to be as interesting perhaps as external entity expansions as a way to retrieve some information. An attacker may be able to compromise an XSLT, because the application allows XSLT to be uploaded, or an XML that relies on
document, import or include, which may be altered to try to read files. So if you are able to control an XML and there is an XSLT processor in the back end parsing it, or you are able to control the XSLT, you may compromise the security of an application. As we also saw,
we don't always need to do that in place to have the confidentiality and the integrity affected because sometimes when using random function or integers, they may be doing that to our profit without doing anything on our side. So I would recommend as a very last thing that you
should check your code, or someone else's, in case it uses any of these things. So that's what I have for today. If anyone has any questions, I'm happy to answer them. Thank
you very much. And thank you to all the people who helped me with the presentation.
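
The first-line disclosure discussed in the talk can be checked for in a given parser with a small harness like the one below. This is a sketch under stated assumptions, not the speaker's tooling: `MARKER`, `make_test_file` and `error_leaks_first_line` are hypothetical names, and the parser exercised in the example is Python's standard library XML parser, which is not one of the processors named in the talk.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a sensitive first line (for example a password entry).
MARKER = "SECRET-FIRST-LINE"


def make_test_file():
    """Create a three-line plain-text file, like the test file described in the talk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"{MARKER}\nline two\nline three\n")
    return path


def error_leaks_first_line(parse, path):
    """Return True if parse(path) raises an error whose message contains the file's first line."""
    with open(path, encoding="utf-8") as handle:
        first_line = handle.readline().strip()
    try:
        parse(path)
    except Exception as exc:  # any parser error type is of interest here
        return bool(first_line) and first_line in str(exc)
    # No error raised, so nothing was disclosed through an error message.
    return False


if __name__ == "__main__":
    path = make_test_file()
    try:
        # Feed a non-XML file to an XML parser and inspect the resulting error.
        print("leaks first line:", error_leaks_first_line(ET.parse, path))
    finally:
        os.remove(path)
```

Passing a different callable as `parse`, for instance a wrapper around whichever XSLT processor an application uses, would apply the same check to that processor.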