Emoji Shellcoding
Formal Metadata
Number of Parts | 85
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/62206 (DOI)
DEF CON 30 (59 / 85)
Transcript: English (auto-generated)
00:00
Walking over to this talk, I was thinking to myself: I have, you know, a full-time job as a developer, and if there's one thing that developers love doing and are super competent at, it's string handling. It's not like there's an entire industry based on just how poorly we've mangled that up over the years. And if there's a second thing we really love doing and are
00:22
great at implementing correctly, it's Unicode. So I'm sure we're in for something exciting here with Unicode, or emoji shellcoding. So help me welcome Adrian and George.
00:41
Hello, DEF CON. How are you today? Great. Welcome on board. Before we depart, please make sure that your seat is up, your tray table is stowed, and your seat belt is securely fastened. If you have any electronic device, please switch it to DEF CON mode now.
01:05
Please be careful when opening your overhead bins, as emojis may fall on you. Thanks, and enjoy your talk. So let's introduce ourselves. This is Adrian. Hello. It's his third DEF CON talk,
01:22
and I'm George. It's my second DEF CON talk. We are security researchers at École Normale Supérieure in Paris, France. We are both French, as you may have guessed, and we both managed to fail the Turing test, but that's another issue. What are we going to see today? We're going to talk about some generic tooling and methods to write shellcodes under heavy constraints.
01:46
We'll dive into the dark art of shellcoding. We'll play with emojis to make computers do things they definitely shouldn't do. By the way, you'll also see the first emoji shellcode ever created. Brace yourselves. We're going to turn emojis into
02:01
merciless payloads. So let's begin. Who has ever written a shellcode? Just raise your hand. Come on, just raise your hand. Don't be shy. Feds won't arrest you. Well, about 20 percent. 10 or 20 percent. So for the rest of you, let's explain what a shellcode
02:25
is from the beginning. A shellcode is some code that you either found or managed to inject in your target. This code generally gives you some power. Usually it pops a shell; that's why we call it a shellcode. Now, it can do a lot more fun things than simply spawning a shell.
02:44
And generally, after injecting it in your target, you jump onto it using what we call a vulnerability. So you can have a buffer overflow, a use-after-free, and so on. You just go to the CVE list and you have all of them. And the typical scenario is that you send a carefully crafted
03:02
string to your target, and you profit: generally you have your shell. However, there are many issues with shellcodes. For example, if you have a buffer overflow using scanf, you need to input a C string, and in a C string you can't have a null character inside, because it breaks everything. Same if it is treated as input. Again with scanf, you can't have
03:26
white spaces, because they will just break your shellcode in half. You can have other constraints. For example, your name can only contain letters, so you can't have a name with digits. Or you can also have string escaping: if you put it in a form, you generally have string escaping, which
03:45
tends to break your shellcode. And of course, trying to pass input containing /bin/sh as your first name is not really credible. It does not look legit at all, so it gets detected. What can we do? Here we go into the domain of constrained shellcoding. So generally
04:05
many hackers want to pass their shellcodes off as something pretty stealthy. The idea is that you want to reduce the set of characters used in your shellcode. One of the usual constraints, which has been studied for a long time, is alphanumeric shellcoding: your shellcode must only use
04:21
letters and digits. Here we're going to look at x86, which was solved quite a long time ago, by Rix in 2001. And the idea is really simple: you take your letters and your digits, you disassemble them, and you look at what happens. So here on x86 it's pretty easy
04:41
because you have single-letter instructions for push and pop, which allow you to manipulate the stack. You can also increment and decrement registers with single-letter instructions. You also have control flow instructions like jump and compare. So this is pretty easy; they are all alphanumeric. And you also have the XOR operation, which does an exclusive OR
05:05
with many, many operands that are all alphanumeric. So shellcoding is pretty easy on x86. And if you want to switch to x64, it's pretty easy too: you just put a capital H in front of every instruction you want and it works, except for incrementing and decrementing registers.
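As a rough illustration of the "disassemble your letters and digits" idea described above, the sketch below maps alphanumeric ASCII bytes to the 32-bit x86 instruction that begins with that byte. The helper name `x86_meaning` is made up for this example, and only the one-byte opcodes mentioned in the talk (plus push-immediate) are covered; everything else falls into a catch-all.

```python
# Illustrative only: a partial map from alphanumeric bytes to 32-bit x86
# instructions, based on the standard one-byte opcode table.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
JCC = ["jo", "jno", "jb", "jae", "je", "jne", "jbe", "ja", "js", "jns", "jp"]

def x86_meaning(ch: str) -> str:
    """Return the instruction family that starts with the byte of `ch`."""
    b = ord(ch)
    if 0x30 <= b <= 0x35:            # '0'..'5'
        return "xor"
    if b in (0x38, 0x39):            # '8', '9'
        return "cmp"
    if 0x41 <= b <= 0x47:            # 'A'..'G' (0x40, inc eax, is '@')
        return "inc " + REGS[b - 0x40]
    if 0x48 <= b <= 0x4F:            # 'H'..'O'
        return "dec " + REGS[b - 0x48]
    if 0x50 <= b <= 0x57:            # 'P'..'W'
        return "push " + REGS[b - 0x50]
    if 0x58 <= b <= 0x5A:            # 'X'..'Z'
        return "pop " + REGS[b - 0x58]
    if b == 0x68:                    # 'h'
        return "push imm32"
    if b == 0x6A:                    # 'j'
        return "push imm8"
    if 0x70 <= b <= 0x7A:            # 'p'..'z': short conditional jumps
        return JCC[b - 0x70] + " rel8"
    return "other (prefix or less convenient instruction)"

for ch in "PXAHu0":
    print(repr(ch), hex(ord(ch)), x86_meaning(ch))
```

On x86-64 the bytes 0x40 to 0x4F are REX prefixes rather than inc/dec, which matches the remark above: 'H' (0x48) can be placed in front of an instruction, while the single-letter increment and decrement forms are no longer available.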
05:23
So x86 is pretty easy. Actually, it's even so easy that other people managed to go much further than that. This is not alphanumeric anymore, because here we add some spaces and punctuation, but people managed to write shellcodes entirely in English. So the idea is that you
05:40
take English words, punctuation signs and so on, and you make a shellcode that looks like English. So this will be our first demo. In this scenario, on an x86 computer, we will do a set-user-ID exploit. This happens on a lot of computers. So take for
06:04
instance changing your password. Any user can change their own password. For this you have to modify the password file, which on Linux is /etc/shadow, and you do not have the right permissions to do so. So someone has to change the password for you.
06:21
For this we have a specially crafted program, which is called a set-user-ID program. It's a program with a special permission that a user can execute and that performs actions at the administrator level, which is root on Linux. So here we can check for this program. We can see it's called main, it's in red, and we have the s bit in the permissions, which indicates that it is
06:41
a set-user-ID program. If this program has a vulnerability, then instead of just changing your password, you can send a shellcode to perform other actions. Here we want a root shell from this vulnerable program, which is purposely vulnerable. So I just take my shellcode, and you can see that it indeed looks like English. If you try to understand it, it actually does not
07:02
mean anything. Then I paste it, I press enter, and here I have my root shell that just spawns. I can check that I indeed have UID 0, which is root. Thank you, George. So we have seen that
07:26
doing alphanumeric shellcoding on x86 works fine. What about other architectures? Well, let's look at RISC architectures, aka reduced instruction set. There we don't have single
07:40
character instructions anymore, and we also have very few addressing modes, so there is no way to do memory-to-memory instructions. Moreover, we have very heavy constraints on the operand side of the instructions. So basically this means that the previous technique doesn't work anymore. So let me present three ways to get around that, namely compilation, emulation, and
08:04
unpacking. The first one is compilation, and the idea is that you write a compiler which, instead of targeting your usual architecture, targets the constrained architecture. It has been done in the past for a slightly different thing, which is called a single instruction set computer.
08:23
Typically, the movfuscator is a compiler which takes any arbitrary C code and compiles it to only the mov instruction on x86, because this instruction happens to be Turing complete. So it works quite fine for one-instruction set computers, much less when it comes to our
08:44
RISC setup, because then the constraints are on the operands and not on the opcode. So writing a compiler is much more difficult, and nobody really knows how to do that. So this approach is a bit dead for us. The second one is the emulation way, and this time the idea is to
09:05
spend quite some time writing a small interpreter for some language. This interpreter will pass the filter (typically it would be alphanumeric), and then, since you have an interpreter, you can encode any arbitrary payload in your target language and interpret it. This is typically
09:25
what Younan and others did in 2009. They did it for ARMv7 and wrote a Brainfuck interpreter. It works fine, but there's an issue with this approach, namely that the harmfulness relies on what you can do with the interpreter. If you want to use it in an actual exploit, it means
09:45
that you need to escape the interpreter, or that the interpreter has to be able to write outside of its sandbox, and in Younan's case that is not possible, so you're a bit stuck here too. The third one is unpacking. This time we want to encode our payload in a constrained
10:05
and compliant way. Typically, I want to find a way to encode my payload in alphanumeric; Base64 would be almost good for that. Then I identify the high-level constraints and try to write an alphanumeric unpacker from that, which will be able to unpack my payload, then jump on it
10:26
and execute it. So you spend some time writing this unpacker, which is alphanumeric. You have your payload, which is encoded in an alphanumeric way; you unpack it, jump on it, and then you can run arbitrary payloads. This is what George, me and some of our colleagues did
10:44
previously for ARMv8. So we're going to show a small demo of that. Thanks. Good. So in this setup
11:06
we have a small account manager which is very badly written and has an obvious buffer overflow vulnerability in it. It's asking for your username, and obviously, before getting to the buffer overflow, there is a check to make sure that the username looks like
11:24
a valid username. Alphanumeric works fine here, because a username which consists only of letters and digits is quite legit. So we are going to use our payload here. As you can see, it is made only of letters and digits, and we're going to use that as our username and take control
11:45
of this account manager. Here the idea is to dump the /etc/shadow file, which on a Linux machine is the file that contains the hashes of all user account passwords. It takes a bit of time because the terminal is a bit slow.
12:05
Good. Almost finished. And then we have a win. Our payload just dumped the /etc/shadow file, and we have the hashes of all passwords.
12:23
So in a sense, I think we can almost say that alphanumeric shellcoding is solved. It even works for the RISC-V architecture, and this is what some colleagues, George and I presented at DEF CON 27. So don't hesitate to jump back in time and see our talk again.
12:46
Back to emojis. A few months ago, Blue Diet and Gerine, who is a security researcher, asked George and me: well, you have done alphanumeric shellcoding, could you maybe do emoji shellcoding? I did think about it and found a very good reason why it was impossible.
13:02
So I just gave this reason and I was happy with it. Then at 2 a.m. the next morning I woke up and realized that my reason was totally wrong: I had made a mistake. So I tried to fix it, and after a few hours I realized that, well, I couldn't fix it. So I was left with only one possibility,
13:24
that is, to prove that emoji shellcoding is possible. And then I woke up at 8 a.m., I opened my mailbox, and what did I find? A horde of wild emojis that were just saying hello world to me in QEMU. So when you see that in your mailbox, the
13:43
first thing you say is: what the fuck is this? Then you look into it, then you ask for the source code, then you look more into the details and you find bugs. And with all the bugs I found, I said: how the fuck does it even work? There are bugs everywhere. And then I just went and saw Adrian,
14:02
and we sat together at a table and we said: let's clean this shit up and send it to DEF CON. And here we are. Good. So we want to execute emojis on some architecture, that is, execute emoji code. For that, we first need to look at emojis and try to define a bit what emojis are.
14:25
Well, let's say that 10 years ago I wanted to send a nice text to my girlfriend, so I type my nice "I love you". The probability that she would get this instead was unfortunately quite high, and, well, this is not exactly the same meaning. So we needed to do better, and fortunately we have
14:46
Unicode, which is now quite an old standard, to the rescue. We can take the Wikipedia definition, which is that Unicode is a standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Namely, we want everything to
15:03
display much the same, in a consistent way, for everyone on any system. So what do we have in Unicode? Well, obviously we have Latin letters such as this capital A, and on this part Unicode is actually compatible with ASCII. Then you can find many other scripts
15:22
such as Japanese kana, and more obscure stuff such as the whole set of alchemical symbols (yes, in Unicode). You also have the whole set of playing cards, spiders, and some odd characters such as cuneiform. So this is a single Unicode
15:43
code point, so this very wild character is a single Unicode character. And then there is more, such as the Holy Hand Grenade of Antioch or Kim Kardashian. Well, actually not the last two, at least not yet.
16:00
So we have a very small sample of Unicode characters, and now the question is: what's an emoji in this? Capital A: well, clearly not an emoji. Smiley: clearly an emoji. But what about the jack of spades? Well, you look into the Unicode standard, and it's not an emoji,
16:21
but let's replace the jack of spades with the black joker: then the Unicode standard says that it is an emoji. It's starting to get a bit confusing. Let's switch again, from the black joker to the white joker: then Unicode says this is not an emoji. Even more confusing. So we are going to settle for a very simple yet slightly subtle definition,
16:45
which is that if Unicode says it is a qualified emoji, then it is an emoji. In UTF-8, which is a way to represent Unicode code points in bytes, emojis are at least three bytes, such as this nice smiley face, and at most 45 bytes.
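To make the byte counts concrete, here is a small Python sketch that prints the UTF-8 bytes of a few emojis. The samples are chosen for illustration and are an assumption, not necessarily the characters shown on the slide; `utf8_hex` is a hypothetical helper name.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of `text` as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))

# Written as escapes so the code points are explicit.
samples = {
    "smiling face (U+263A)": "\u263A",
    "OK button (U+1F197)": "\U0001F197",
    "wastebasket (U+1F5D1 U+FE0F)": "\U0001F5D1\uFE0F",
}

for name, emoji in samples.items():
    encoded = emoji.encode("utf-8")
    print(f"{name}: {len(encoded)} bytes -> {utf8_hex(emoji)}")
```

Sequences that carry a variation selector (U+FE0F) or joiners are longer than a single code point, which is how emoji byte lengths grow well beyond three or four bytes.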
17:05
Moreover, they add new emojis every year, and we are currently at Unicode version 14. So for today we are going to consider doing shellcoding with UTF-8 emojis on Unicode version 14. Good. So we have emojis here, and then we need to run the emojis on something.
17:26
So we are going to use RISC-V. What's RISC-V? Well, RISC-V is the architecture of the future. Oh well, this is what we said three years ago at DEF CON 27. I think now we can even say that RISC-V is the architecture of the present. Since 2021 you have some real RISC-V products
17:46
that you can buy as a customer, so maybe you already have RISC-V CPUs at home. In more technical terms, let's see what we have in RISC-V. Well, this is a simple RISC architecture. They push a lot to have an open-source ISA, and they also push a lot
18:03
for open hardware. In RISC-V you have two- and four-byte instructions, and it is little-endian. Remember that for later. Well, so now I need to execute my emojis on RISC-V. Let's try to look back at the methods
18:21
we saw previously, the previous work, and see if they still work. For alphanumeric x86, you took the whole set of single letters, it gave you a set of instructions, you realized it was Turing complete, so then you could shellcode in that. For alphanumeric on RISC architectures such as
18:41
ARMv7, ARMv8 or RISC-V, you took the set of quadruplets of letters and digits, it gave you a set of instructions, again you figured out it was Turing complete, so you were good. What about emoji RISC-V? Well, let's use the first method. I take my list of all emojis and see how many of them can be
19:02
executable as RISC-V. Too bad: only 10 of them are, so that's not Turing complete at all. Let's try the second method with pairs of emojis: we still have very, very few pairs of emojis which are valid RISC-V code, and if I go on and take triplets or quadruplets, it's not getting better at all.
19:23
So we are stuck, and the previous methods will not work here. So I'm going to try to show you, with an example, how we were able to fix this issue. At the bottom of the screen you have this AUIPC instruction, which in RISC-V is
19:41
a way to load a PC-relative offset into the ra register. If you look at its hexadecimal representation, it is 97 F0 9F 97, and this is not a valid emoji: this UTF-8 is not a valid emoji. So we're going to split it in two like this: on the left,
20:06
97, and on the right the other part. So let's try to tackle the left part. I want an emoji which ends with 97. So I look into my big bag of emojis and find one: this is the OK emoji, which starts with F0 9F
20:23
86. Too bad: these first three bytes are not RISC-V code. This is three bytes, and as I told you, there are no three-byte instructions in RISC-V. But maybe we have another gadget which can do jumps, and then you can jump directly onto the 97 and execute our AUIPC.
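The "no three-byte instruction" remark follows from how RISC-V encodes instruction length in the low bits of the first byte in memory (the architecture is little-endian). A minimal sketch of that rule is shown below; the function names are made up for this example, and encodings longer than 32 bits are deliberately left out.

```python
def insn_length(first_byte: int) -> int:
    """Length in bytes of the RISC-V instruction starting with this byte."""
    if first_byte & 0b11 != 0b11:
        return 2   # compressed 16-bit instruction
    if first_byte & 0b11100 != 0b11100:
        return 4   # standard 32-bit instruction
    raise ValueError("48-bit and longer encodings are not handled in this sketch")

def ends_on_boundary(data: bytes) -> bool:
    """True if `data` splits exactly into whole 2- or 4-byte instructions.

    Only lengths are checked; whether each chunk is a valid or useful
    instruction is outside the scope of this sketch.
    """
    pos = 0
    while pos < len(data):
        pos += insn_length(data[pos])
    return pos == len(data)

print(ends_on_boundary(bytes.fromhex("F09F86")))    # three-byte prefix of the OK emoji
print(ends_on_boundary(bytes.fromhex("97F09F97")))  # the AUIPC bytes from the talk
```

The three-byte prefix can never line up with an instruction boundary, which is why the talk falls back to jumping over it instead of executing it.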
20:47
Let's look a bit more into our bag of emojis, and this time we find the bang emoji, which ends with 97 and starts with E2 9D. This time it's quite nice, because E2 9D is a valid RISC-V instruction: add
21:02
s11, s8. So at the expense of trashing the s11 register, I can start directly with this gadget, execute my add, then my AUIPC, which is my target. Good, we have two ways to do that, and on the right side it's going to be quite similar. This time I want an emoji which starts with F0 9F 97. I look again into my bag of emojis
21:26
and I find the calendar emoji. It ends with 93 EF B8 8F, which is a valid RISC-V instruction. So this time, at the expense of trashing the t6 register, I can finish my gadget.
21:43
Again, one last time, let's look into our bag of emojis, and we find the bin emoji. This time the last four bytes are not valid RISC-V, but 91 EF is a small forward jump. So again I can end my gadget by
22:01
jumping out of it. In the end this is quite good, because I can use any combination of OK-bin, OK-calendar, bang-bin or bang-calendar and execute things, and I have quite a trade-off between the gadget size and trashing registers or not.
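The bang-calendar combination can be checked by hand with a few lines of Python. This sketch assumes that the "bang" emoji is U+2757 and the "calendar" emoji is U+1F5D3 followed by U+FE0F, since their UTF-8 bytes match the description above; the decoder functions are illustrative helpers that handle only the three instruction formats needed here.

```python
# ABI register names x0..x31.
ABI = (["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
       + [f"a{i}" for i in range(8)]
       + [f"s{i}" for i in range(2, 12)]
       + [f"t{i}" for i in range(3, 7)])

def decode_c_add(hw: int) -> str:
    # Compressed C.ADD: funct4 = 1001, quadrant 10 (rd and rs2 non-zero).
    assert hw & 0b11 == 0b10 and hw >> 12 == 0b1001
    return f"c.add {ABI[(hw >> 7) & 31]}, {ABI[(hw >> 2) & 31]}"

def decode_auipc(word: int) -> str:
    assert word & 0x7F == 0x17  # AUIPC opcode
    return f"auipc {ABI[(word >> 7) & 31]}, {hex(word >> 12)}"

def decode_ori(word: int) -> str:
    assert word & 0x7F == 0x13 and (word >> 12) & 7 == 0b110  # OP-IMM, ORI
    imm = word >> 20
    if imm >= 0x800:            # sign-extend the 12-bit immediate
        imm -= 0x1000
    return f"ori {ABI[(word >> 7) & 31]}, {ABI[(word >> 15) & 31]}, {imm}"

# Assumed emojis: U+2757, then U+1F5D3 U+FE0F.
gadget = ("\u2757" + "\U0001F5D3\uFE0F").encode("utf-8")
print(gadget.hex(" "))

# RISC-V is little-endian, so each chunk is read as a little-endian integer.
print(decode_c_add(int.from_bytes(gadget[0:2], "little")))
print(decode_auipc(int.from_bytes(gadget[2:6], "little")))
print(decode_ori(int.from_bytes(gadget[6:10], "little")))
```

Under these assumptions the ten bytes split as 2 + 4 + 4, with the target AUIPC in the middle and the two surrounding instructions overwriting s11 and t6, as described in the talk.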
22:21
Good. So the question is: well, I could do it for one specific instruction, but how can I find a way to generate all possible gadgets and a whole list of emoji-compatible RISC-V instructions? Well, for this we have to take a little step back, so we'll be talking about code
22:41
reuse attacks. For the people who know it already, think about return-oriented programming or JIT spraying; here let's present it for those who do not know it. Return-oriented programming was originally published by Shacham and others in 2007, and the idea is really simple. You take a huge binary, typically the C standard library, and then you just scan it and
23:06
you find small reusable code snippets. Those little code snippets are called gadgets, and the idea is that once you have a whole collection of gadgets, you try to assemble them together: you chain them in order to obtain your shellcode. So you just rebuild the shellcode
23:25
in the form of what we call a ROP chain, then you send this ROP chain as input to the vulnerable program, and you still have your shell that pops. There is a variant of it called JIT spraying, which was published by Blazakis in 2010, and the idea is that instead of scanning your
23:44
binary to find gadgets, you just create them by controlling some code that is then compiled by a JIT, a just-in-time compiler. So here, as you can see on the right side of the screen, in the top part, the attacker can
24:03
control and write any code he wants, so this is typically JavaScript for instance. What the attacker does is assign to the variable y some immediates that are XORed together. If you send it to Firefox, it goes through the just-in-time compiler and is then compiled to the binary code that you see below. In this case it's pretty easy, because the attacker manages
24:25
to control four out of five bytes of the final program, so here you can generate gadgets as you want and then just do your return-oriented programming as before. So this is JIT spraying. Here we have to do the same with emojis: we can control emojis and we need to build gadgets
24:46
from them. So how do I create gadgets from an emoji stream? Here it's a little bit complex. Do you remember the infinite monkey theorem? It states that if you take a monkey, you give it a keyboard, and the monkey types on the keyboard for an infinite amount of time, then almost surely it
25:03
will type any given sequence. So here we do the same: we take our monkey and we give it an emoji keyboard. The only problem is that there is no algorithm to look for gadgets in an infinite stream; previous methods were only scanning binaries, not infinite streams. So we need to
25:23
scan the output of a monkey. Here we need to invent a new algorithm, so let me explain this algorithm. Here we have three lines. On the first line we have the emoji stream: this will be a stream of emojis, this is what our monkey will type. Then this emoji stream will have
25:42
a hexadecimal representation, which is in the middle, and we must synchronize it with some executable stream, which is RISC-V instructions; both should coincide on their hexadecimal representation. So let's start with an instruction. Let's take for instance add s7, s7, s8. So here
26:03
I have an excess in my instruction stream, so what I must do is find an emoji which starts with E2 9B. So let's take our bag of emojis and take for instance the chain emoji here. So here I go to the opposite side: I have an excess of emojis, so I must find some executable
26:24
instruction that fits the hexadecimal representation here. Let's look at what is inside: I find an or-immediate which fits. And here is something very interesting, because I managed to end my emoji stream and my instruction stream at the same point, so both are synchronized
26:42
here. Both are synchronized here, and what I can do is just stop here and consider that this is a gadget, because I can reuse the chain emoji independently, without taking care of what comes before or after. So this is a gadget. Let's go to a second case. If I take the basketball
27:03
player emoji, again I have an excess of emojis here, so I must find an instruction. This B9 EF is a branch instruction, a small branch forward. So this is another very interesting case, because now I can escape out of my gadget and I do not need to end my basketball player with
27:23
other prescribed instructions. So this is another gadget. We can see here the second gadget that we managed to generate, and we can reuse it as it is. Let's go to the third case. Let's take the snowman emoji. Again I look for an instruction that starts with 84, so let's take for instance store word, and then here I come back to the beginning, where I have an excess of instructions,
27:45
and I complete it with an emoji, the copyright emoji here. And again I can do the same as before, and here I have a branch which ends my gadget. So I do this for every possibility, and in the end I have a whole list of gadgets that are usable for shellcoding, and so on.
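The synchronization idea can be sketched as follows: walk the UTF-8 bytes of an emoji sequence instruction by instruction, and record the offsets where an instruction boundary coincides with an emoji boundary. This is only a length-based approximation under stated assumptions: it does not check that each chunk is a valid instruction, it ignores the "escape with a branch" case, and the emojis used (U+26D3 U+FE0F for the chain, U+2757 for the bang, U+1F5D3 U+FE0F for the calendar) are guesses that match the bytes quoted in the talk.

```python
from itertools import accumulate

def insn_length(first_byte):
    """RISC-V instruction length from its first byte, or None if longer than 32 bits."""
    if first_byte & 0b11 != 0b11:
        return 2
    if first_byte & 0b11100 != 0b11100:
        return 4
    return None

def sync_offsets(emojis):
    """Byte offsets where the emoji stream and the instruction stream both end."""
    chunks = [e.encode("utf-8") for e in emojis]
    data = b"".join(chunks)
    emoji_ends = set(accumulate(len(c) for c in chunks))
    pos, synced = 0, []
    while pos < len(data):
        n = insn_length(data[pos])
        if n is None:          # encoding not covered by this sketch
            break
        pos += n
        if pos in emoji_ends:  # both streams are aligned here
            synced.append(pos)
    return synced

CHAIN = "\u26D3\uFE0F"         # assumed "chain" emoji
BANG = "\u2757"                # assumed "bang" emoji
CALENDAR = "\U0001F5D3\uFE0F"  # assumed "calendar" emoji

for seq in ([CHAIN], [BANG], [BANG, CALENDAR]):
    print([s.encode("utf-8").hex(" ") for s in seq], sync_offsets(seq))
```

A non-empty result whose last offset equals the total byte length corresponds to the first case described above, where the emoji stream and the instruction stream stop at the same point; a real gadget finder would additionally decode each chunk and explore every emoji that can extend an unsynchronized prefix.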
28:07
Just a side note for people who know a little bit about grammars. The method we use is actually really generic. As you may have guessed, the algorithm follows derivations of a grammar of the form: a non-terminal produces a terminal followed by a non-terminal.
28:23
Here our terminals are emojis. For people who know about it, this exactly describes what we call a right-linear grammar, and there is a very interesting property, detailed by Pattell in 1971, which says that any regular expression, which is typically for computer scientists what we call a
28:42
regex, can be converted to a right-linear grammar. Of course, adapting the tool is left to the reader as an exercise. Well, let's get back to emojis now. So I managed to generate all gadgets that can be implemented using emojis; now we just have to chain them. Good. So let's try to write our emoji chain
29:04
using the output of our previous algorithm. We want to look at which instructions are indeed emoji-compatible, and, well, at the beginning it looks quite fine, because we have a bit more than 4000 instructions which I can find in at least one of the gadgets. And if you look at what kind
29:24
of instruction we have first we have some logic instructions like add and sub and more so that's quite good we have branches both conditional and unconditional both forward and backward that's very good because for control flow that's very useful we can address many registers so here you
29:42
can see we can address 14 registers but not the stack pointer don't worry even if we cannot touch the stack pointer we can still find ways around and in practice it's not that much of an issue however we have very few immediates so it is very difficult to load constants and actually in
30:01
these 4000 instructions we have quite a lot of floating point instructions which we don't want to use to give you more of an idea of what we have for some families of instructions such as the csr family we only have a single one of them so this is the one displayed here and in the end what
30:20
we see is that we have a tiny bit of everything so the set of emoji-compatible RISC-V instructions is very diverse and very difficult to work with because it's very difficult to combine one instruction with the other yet we do manage to do this and i'm going to show you the overall view of what our shellcode looks like so we use the unpacking method that we
30:44
showed you previously and basically we have a small initialization part then our unpacker which embeds the encoded payload so the unpacker will unpack this payload in memory then jump on it and it means that it's very easy to change the payload
31:04
because we can generate an unpacker easily from the payload let's focus on the unpacker so in our last talk we went very far into finding ways to do very efficient unpackers with many gadgets to do that for this time we're going to go the very opposite and
31:26
write a very simple unpacker so for that we will only use three gadgets the first one increments the a1 register the second one writes the value of a1 at the address pointed to by a3 and the
31:40
last one is incrementing the a3 register so with that we can write an unpacker i will show you how so for this we take an initial payload which is a very dummy payload of only three bytes 03 20 and 10 in hexadecimal and if we go back to initialization well what we do here is that we make sure that
32:01
a1 is 0 and then a3 is set to the first address of where we want the payload to be decoded so let's go back to it we have a1 equal to 0 let's output three times the increment a1 gadget so then a1 is three and we can store a1 at a3 and increment a3 from 3 to 20 it's 1d so we
32:26
output 1d times the increment a1 gadget a1 is now 20 we can write it we are good and from 20 to 10 well we have to output the a1 plus plus gadget f0 times since the byte wraps around so we can write it in memory so this
32:41
is the way that we generate our encoded payload if we execute this well then we start from a1 equals zero a1 plus plus three times a1 equals three we write three to memory then we add 1d to three it's 20 we write 20 to memory and add f0 so the low byte of a1 is 10 write it to memory it means that the
33:05
initial payload is equal to the decoded payload and we have found a way to encode any payload into our encoded payload which will decode to the exact same initial payload and since all of these gadgets a1 plus plus a3 plus plus and store a1 at a3 are emoji compatible it means that the
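The encoding walked through above can be sketched in a few lines of Python. This is only a model under stated assumptions: gadgets are represented by the hypothetical labels "a1++", "store" and "a3++" rather than by real emoji bytes, and the store gadget is assumed to write only the low byte of a1, which is what makes the f0 wrap-around step work.

```python
def encode(payload: bytes) -> list:
    """Turn a payload into a chain of the three gadgets (as labels)."""
    chain, current = [], 0  # a1 is assumed to start at 0
    for byte in payload:
        delta = (byte - current) % 256  # number of a1++ needed, modulo a byte
        chain += ["a1++"] * delta
        chain += ["store", "a3++"]
        current = byte
    return chain


def run(chain: list) -> bytes:
    """Simulate the chain: a1 counts up, a3 walks through memory."""
    a1, a3, memory = 0, 0, {}
    for gadget in chain:
        if gadget == "a1++":
            a1 += 1
        elif gadget == "store":
            memory[a3] = a1 & 0xFF  # assumption: byte-sized store
        elif gadget == "a3++":
            a3 += 1
    return bytes(memory[i] for i in range(a3))


payload = bytes([0x03, 0x20, 0x10])
chain = encode(payload)
print(chain.count("a1++"), run(chain) == payload)
```

For the three-byte example this uses 3 + 0x1d + 0xf0 increments, so the encoded payload grows quickly with the payload, which is the price of using such a simple unpacker.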
33:25
whole encoded payload is only made of emojis let's present you in a bit more detail how we do these a1 plus plus a3 plus plus gadgets okay time to switch to gdb over beamer so here let me present
33:41
you what we have so on the first line we have the emoji stream as before and we have its hexadecimal representation on the bottom so here let me spawn my binary code you have the two gadgets on the left part of the screen you have the registers on the right part of the screen and you have the memory where we want to store our payload on the bottom right part of the screen
34:05
so let's start with a nop-like operation for the first gadget so this will just trash the value of s3 if we go to the next one we store a1 which has been initialized to ab before to the address pointed to by a3 which is 8000 then we just take our branch which will jump to the next gadget
34:24
and we start with another nop-like instruction same we have a branch to the next two bytes which is also a nop-like instruction and in the end we add the value of c2 which is one to a3 which will increment a3 and we can continue as long as we want here i have a small gap
34:47
between the two gadgets and i can fill it with whatever i like so here i can fill it for instance with three england flag emojis right let's switch to the demo so here i will present you a demo on a
35:04
HiFive Unleashed board so this is a quad core RISC-V 64-bit board so now it has been discontinued so you can't buy it anymore unfortunately and it manages to run a linux so if it has a linux let's play with it so in this case what i made is a small network
35:24
interface that is vulnerable to a shellcode and here i can check that i have my shellcode which is all emojis so let's see so you can see my emoji shellcode here and what i will do is i will just send it through the network to my HiFive Unleashed board on its vulnerable interface
35:43
so let's send the shellcode to the HiFive Unleashed so i just cat it through netcat i press enter and here i managed to get my shell so let me check if the cpu is correct so let's see i can see that indeed i have a RISC-V cpu from SiFive and if i just check whoami i'm root so i
36:02
managed to get my root shell well i went a little bit quickly over one thing so i have my gap between my two emojis so what i can do instead of just putting england emojis is i can put whatever
36:23
i want i just need to find emojis whose size is compatible with the gap so this is just a variant of the subset sum problem so you can look in wikipedia how we solve it and in our case it's very easy because we have three and four bytes emojis and as we know every number above six can
36:42
be written as a sum of three and four so it's really easy to solve it if we want to have a little bit fun so we can instead for example try to look at how i can fill this gap with the minimal number of emojis and this is standard dynamic programming it took me exactly 4 minutes and 13 seconds to solve it and implement it so you can try it and it's pretty easy
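A minimal sketch of that dynamic programming idea, assuming only the two emoji sizes mentioned in the talk (three and four bytes in UTF-8); the function name and the choice to return a list of sizes are illustrative, not taken from the released tool.

```python
def min_emoji_fill(gap: int, sizes=(3, 4)):
    """Return the fewest emoji sizes summing to `gap` bytes, or None."""
    INF = float("inf")
    best = [0] + [INF] * gap      # best[n]: fewest emojis filling n bytes
    choice = [None] * (gap + 1)   # choice[n]: size of the last emoji used
    for n in range(1, gap + 1):
        for size in sizes:
            if size <= n and best[n - size] + 1 < best[n]:
                best[n] = best[n - size] + 1
                choice[n] = size
    if best[gap] == INF:
        return None               # e.g. gaps of 1, 2 or 5 bytes
    filling, n = [], gap
    while n > 0:
        filling.append(choice[n])
        n -= choice[n]
    return filling


for gap in (5, 6, 7, 12):
    print(gap, min_emoji_fill(gap))
```

Each size in the result can then be replaced by any emoji of that byte length, which is what leaves room for many different fillings of the same gap.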
37:03
well and now i can fill it with whatever i want and this gives me something very interesting which is called polymorphism and it is almost trivial to have polymorphism the purpose of polymorphism is to generate many variants of the same shellcode that all behave the same and this is very useful because now you can pass anti-viruses that generally use the notion of
37:25
signature so they compute the hash of your shellcode and check in the database if it's already known so now i can generate as many shellcodes as i want whose hashes are all different so for example i can put five emojis here i can also put these 10 emojis or more emojis
37:41
as many as i like when you do shell coding for real sometimes it happens that the pointer is a bit flawed and you don't know exactly where the vulnerable program will jump in your payload so to make up for this you start your payload with a
38:03
long list of nops so that even if it doesn't jump exactly where you want it will just follow the list of nops until it gets to the start of the meaningful part of your payload so this is what we call a nop sled and in unicode we have a sled emoji so obviously as hackers our first question was well
38:24
can we do our nop sled using the sled emoji well the answer is yes we can so we have a copyright emoji a few nops and as many sleds as you want so believe it or not but this is perfectly executable RISC-V code
38:50
and with that we can do another demo so this time this is on the RISC-V 32-bit so we have it this time i have the board with me so this is a nice
39:03
Espressif ESP32-C3 board if it is too small for you to see well just look at the slides you have a picture that is much bigger and the nice thing about this one is it costs nothing you can get it for less than ten dollars from your favorite distributor so it's really nice to hack with good so for this demo we have decided to use a very bad coin wallet so this one is
39:34
again vulnerable to an obvious buffer overflow and is going to ask for the passphrase to just access it so obviously we want to use our emoji shellcode to well get the private key
39:49
of the wallet and if we can get the private key we are rich because we get all the coins so let's get that it's a bit slow because i'll get that good so here you have the end
40:04
of our emoji shellcode we have used polymorphism here this is why it looks a bit nicer than the previous one and at the end you can see that it worked so we have dumped the private key of the wallet and so now we are rich unless you manage to get the private key and extract the money before the end
40:22
of the talk so it won't well so as a conclusion well we have shown you that we are able to do RISC-V
40:40
emoji shell coding on both 32 bits and 64 bits nice and that the method that we presented you even if it was a bit technical could be used for other kinds of filters other kinds of constraints and we have to talk a bit about what happened during the making of this project so it might not be easy to see but all of these slides are made with LaTeX so we had many issues
41:06
with LaTeX and typically we did crash TeXstudio we did crash LuaLaTeX and more most pdf readers cannot display our slides correctly acrobat reader is probably the worst and all other pdf readers have various glitches we even had issues with VLC like we found that the recordings of the demos would not
41:27
play on VLC so i had to uh convert them all from mkv to mp4 and we found a way to break firefox so like this is firefox displaying currently the
41:40
i will do it yes good and if you press f5 with a pdf in full screen then you lose scrolling in the tab like i cannot scroll anymore in this pdf and if i press f5 again is this good uh well it is still broken so i have permanently broken this tab so well i'm good
42:05
to switch to the new tab the backup tab to finish this talk but well uh other things such as terminal slowness and cups if you know about it this is uh what we use in linux to print things so i dared george to print the slides tell us what happened so i
42:23
tried to print the slides and i managed to break both my computer and the printer at the same time and since then i was not able to use my printer for any task at all i have to do a factory reset i think good we also have a gcc segfault which we need to investigate
42:42
and very big thanks to windows for the occasional bsod when profiling the slides bottom line emoji support is hard there are still many issues in libraries and software dealing with emojis if you want to find exploits then it's probably a good bet to look at them
43:02
so in the end obviously we do release all of what we did so you have the github repository don't hesitate to click on the link if you cannot click on the link on the big screen just remember the short link you have our email address if you want to contact us and if you have any questions don't hesitate to come to the bottom of the stage and we will happily answer them thank you