
A deep dive into the world of DOS viruses


Formal Metadata

Title
A deep dive into the world of DOS viruses
Subtitle
Explaining in detail just how those little COM files infected and played with us back in the day
Title of Series
Number of Parts
165
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
It is now 27 years since MS-DOS 5.0 was released. During its day there was the threat of viruses breaking your system or making it act in unpredictable ways. Due to its age and near total lack of consumer use it is safe to assume that all of the viruses for MS-DOS have been written. Using community archives and modern analysis methods we can uncover how they worked and reflect on how things have changed.
Transcript: English (auto-generated)
Okay, so this talk is called "A deep dive into the world of DOS viruses", and if you
happened to be at the 8C3, that is, 27 years ago, you would have seen a very young and awkward, even more awkward than I am at the moment, version of myself speaking on
basically the same subject. The stage, of course, was a lot smaller than this. This would have really intimidated me back then, but I was talking about a university project that we had run for about three years at that point, and our possibilities were very limited. Meanwhile, 27 years later, our
speaker, in between fighting battleships over the public BGP network and trying to encode data in dubstep music, was able to actually do all of the stuff that we were trying to do with a lot of effort, basically in, I guess, four hours
of CPU time or something like that. Please help me in welcoming Ben to our stage to talk about a bygone era.
Thank you. Hi, I'm Ben Cartwright-Cox, as the slide suggests. So, I have an admission to make. So this is a thing to be aware of. And, you know, things also to be aware of. Anyway, so, what is DOS, to get straight into it? You
know, this is an old IBM, another very old legacy system, but a thing to be aware of is that DOS covers a wide range of vendors. It was not just those old IBM PCs. Some of the DOSes were compatible with each
other, meaning that some of the DOSes shared malware with each other. To be honest, most people know DOS as these lovely old beige boxes. The same era gave us our beloved Model M keyboard, hated by some, loved by others for the sound. But, you know, most people's knowledge of DOS came
from computers with a user interface that looked like this. Pretty basic. There we go. Okay. So this is WordStar. Some of you may not know
that Game of Thrones was written on WordStar. George R.R. Martin is apparently not a big fan of modern word processing. He admitted he disliked how spell checking worked. And I guess it's also good from a security point of view: you can't get hacked if the machine literally has no internet access. Also, for a lot of people, especially some of the older crowd, this was
their first experience of programming. This was also the time of QBasic, which gave you a very basic language to program creatively in DOS. For some people, this was the
gateway drug into programming and perhaps the gateway drug into what they started as a career. For other people, though, the experience of DOS was not so great. For example, you know, let's just say you were doing some work in an infinite loop. And at some point, stuff like this happens. Unfortunately, I don't have sound for this one. But you can
just imagine in your head your PC speaker playing some small techno tune, but only one frequency at a time. This might get incredibly embarrassing if you're in an office environment, with the machine just slowly beeping away. You can't exit this. It has to
finish fully. And if you touch the keyboard, it reminds you not to touch the keyboard and continues playing its music. So, you know, this could be fun. But it wouldn't be fun if you're in an office environment. Ultimately, though, it's not malicious. And that trend continues. This is another good example of a DOS virus.
This is Ambulance. When you run it, an ambulance just drives past, and then your normal program continues running. I think this is amazing. It's an interesting era of viruses. The history of it was collected very well by a website called VX Heavens, which sort of still lives. But unfortunately,
at one point it was raided by the Ukrainian police for, and this is the fantastic wording they used, basically that someone told them they were distributing malware. Unfortunately, not malware that operates in this century. But I guess that's good enough for a raid. But luckily, for the archivists,
there are archivists of archivists. And so we have a saved capture of VX Heavens. This is actually an old snapshot. There are way more modern snapshots. But thankfully, the MS-DOS virus era doesn't move very quickly. The interesting thing here is that there are 66,000 items in this version, and it's 6.6 gigabytes of code. And these
viruses are super dense. There's not much to them. They are just blobs of machine code. They're not like your Electron app these days that ships an entire Chrome browser, and normally an out-of-date Chrome browser. This is just the basics, like how to draw an ambulance and some
infection routines. The normal distribution changes with it as well. For example, the normal life cycle of an MS-DOS virus is that you download, or for some other reason run, an infected program. To you, it looks like it does nothing, so it remains roughly undetected. Then you go and run
more files, and the DOS virus infects more files. And at some point, you're probably going to give one of those executables to some other computer or some other person, whether by copying a floppy disk of some software, maybe some expensive software so they didn't have to pay for it, or by uploading it to a BBS where it can be downloaded by many people.
So the distribution mechanism is a far cry from the EternalBlues of this era, where a strain of malware can spread across the world very brutally and very quickly. So most DOS viruses are pretty simple. They start and they ask: have my payload
conditions been met? If so, then they'll go and display the payload. And the payloads are definitely more, I don't know, nice. You know, you have stuff like this, which is pretty, and it uses VGA colors and all sorts of really nice stuff. You also get some very demoscene vibes from this. Another good
example is this like VGA like super trippy thing, which is really impressive because this is really small. This is less than one kilobyte of code. It's in fact way less than one kilobyte. It's pretty quick. You can watch the entire computer just dissolve away, which also might be quite
worrying if you weren't expecting that. Alternatively, if the payload conditions are not met, then you hook syscalls, or, if you want to be way more aggressive as a malware author, you scan for files on the system to infect proactively. And the way you infect DOS
programs is pretty simple. Imagine the target program as one giant tape of all its code, starting with its first three bytes. Most viruses work by replacing the first three bytes of the program with an x86 jump and appending their malware to the end of the executable. So the first thing that happens when you run the executable is that it jumps to the end of the file, effectively, runs the malware chunk,
and then it optionally will return control back to the original program. But there's also the second thing, hooking syscalls, right? MS-DOS is an operating system. It does have syscalls. Programs can reach out to MS-DOS to do things like file access and
so on. As you'd expect, you run a software interrupt to get there. Thankfully, though, MS-DOS also allows you to extend it by adding handlers yourself or even overwriting existing handlers, which is very convenient if you're trying to write drivers. But it's also
incredibly convenient if you're trying to write malware. Here are some examples of the syscalls, most of them relevant to DOS virus-making. It is a decent sample of the things that DOS will provide you. A lot of them are just very useful in general for producing functional executables that users want to use.
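The layout described above (a jump in the first three bytes, the virus body appended at the end) can be inspected from the analysis side. What follows is a minimal Python sketch, not code from the talk. It assumes the jump uses the three-byte `E9 rel16` near-jump encoding, and the helper name `entry_jump_target` and the toy image are invented for illustration.

```python
import struct

COM_LOAD_OFFSET = 0x100  # DOS loads a COM image at offset 0x100 of its segment


def entry_jump_target(image: bytes):
    """Return the file offset an initial near jump lands on, or None.

    Only the 3-byte form `E9 lo hi` (JMP rel16) is handled in this sketch.
    """
    if len(image) < 3 or image[0] != 0xE9:
        return None
    (rel16,) = struct.unpack_from("<h", image, 1)
    # The displacement is relative to the instruction after the jump,
    # which sits at load offset + 3.
    target_in_memory = (COM_LOAD_OFFSET + 3 + rel16) & 0xFFFF
    target = target_in_memory - COM_LOAD_OFFSET
    return target if 0 <= target < len(image) else None


# Toy image: a jump, 7 bytes of "host" code, then a marker byte standing in
# for an appended body (hypothetical data, not a real sample).
toy = b"\xE9" + struct.pack("<h", 7) + b"\x90" * 7 + b"\xCC"
print(entry_jump_target(toy))
```

A landing offset close to the end of the file is only a hint of an appended body, since plenty of clean COM programs also begin with a jump.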
This is what an average program looks like. This is almost the shortest hello world you can make, minus the actual hello world string. In fact, the hello world string might be the largest part of this binary. It's a pretty simple binary. We load a pointer to the message we just defined. We then
set the AH register to nine, or hex nine. That's the syscall for printing a string. And then we run a software interrupt, 21H, which is short for 21 hex. And we continue on. We then set AH again, to 4C, which is exit
with a return code, and the program exits. So this is roughly the loop that just happened. You have your program code that calls an interrupt, and that gets passed over to the interrupt handler. In the
process of doing this, the CPU has quickly looked at the first kilobyte of memory, the interrupt vector table, or IVT as it's abbreviated. And then it's effectively a router. If anyone has written a small piece of code to route HTTP requests or anything, it's basically like that, but in the 80s, with syscalls. So it basically just says compare this, compare that, jump here,
jump there. Then the call gets passed to the call handler, which goes and does the syscall, the thing that is required. Normally, it will leave some state behind in registers, or the results of the actions it has performed. And it returns control. So that's the program. So theoretically speaking, if we wanted to go and look at what a program actually does, we need
to set a breakpoint here, because this is the only place whose location we can be sure of. This is way before the era of ASLR, address space layout randomization, and way before the era of kernel address space randomization. In fact, MS-DOS has almost no memory protection whatsoever.
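As a rough sketch of the program walked through above, the five instructions can be hand-assembled in Python. This is an illustration under assumptions, not the slide's source: the message text and the helper names `build_hello_com` and `ivt_slot_address` are made up, and it relies on DS already pointing at the program's own segment, as is the case for a COM program. The second helper shows where the CPU looks up the handler for an interrupt number: each vector is four bytes, offset first, then segment.

```python
import struct

COM_LOAD_OFFSET = 0x100  # COM images start at offset 0x100 in their segment


def build_hello_com(message: bytes = b"Hello, world!") -> bytes:
    """Hand-assemble: load pointer, print string, exit."""
    code_size = 3 + 2 + 2 + 2 + 2  # mov dx / mov ah / int / mov ah / int
    msg_offset = COM_LOAD_OFFSET + code_size
    code = b"".join([
        b"\xBA" + struct.pack("<H", msg_offset),  # mov dx, msg
        b"\xB4\x09",                              # mov ah, 09h (print string)
        b"\xCD\x21",                              # int 21h
        b"\xB4\x4C",                              # mov ah, 4Ch (exit with return code)
        b"\xCD\x21",                              # int 21h
    ])
    return code + message + b"$"  # function 09h stops printing at '$'


def ivt_slot_address(interrupt_number: int) -> int:
    """Physical address of a real-mode interrupt vector (4 bytes per entry)."""
    return interrupt_number * 4


program = build_hello_com()
print(program.hex(" "))
print(hex(ivt_slot_address(0x21)))
```

Because every `int 21h` is dispatched through that one vector, a breakpoint on the handler it points to sees every DOS call a program makes.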
Once you run a program, you are basically handing full control of the system to that program, which means you can also happily boot things like Linux directly from a COM file, which is handy if you want to upgrade. So if we look at certain files, we can
go and see what they do. Here is one example. This is a goat file. A goat file is like a sacrificial goat. It is a file that is purely designed to be infected. So what you do is you bring a virus into memory in the system, and then you run a goat file in the vague hope that
the virus will infect it, and then you have a nice clean sample of just that virus and not another program wrapped up with it, which makes it way easier to test and reverse engineer. So we can see things happening here. For example, we can see it opening a file, moving where it's looking in the file, reading some data from the file, just two bytes, though, and closing the file. We see the same sort
of thing repeat itself, except at one point it reads a large amount of data, moves the file pointer, writes another large amount of data, does some more stuff, parses some file names, displays a string, which is almost definitely the goat file message, and pretty much exits after that. So there are a few
syscalls here that we would really like to know more about. First, the open-file calls: we'd really like to know what files were being opened. We would also want to know what data was being written to the file, rather than having to fish it out of the virtual machine later. And we'd also, just out of curiosity, really want to know what file names it was
asking MS-DOS to parse. Display string is also a nice test to know whether your code is working. So to do this, we're going to have to look a little bit deeper into how the MS-DOS runtime, and by proxy how x86 in 16-bit mode, or legacy mode, I guess, works. This is basically all the registers you have in 16-bit mode.
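The trace described above is a sequence of INT 21h calls, each identified by the value in AH. A small sketch of turning such a sequence into readable labels might look like the following. The function numbers are the standard DOS ones, while the `label_trace` helper and the sample trace are hypothetical and not taken from the talk's tooling.

```python
# INT 21h function numbers (the value in AH) for the calls mentioned above.
INT21_NAMES = {
    0x09: "display string",
    0x29: "parse filename",
    0x2A: "get date",
    0x2C: "get time",
    0x3D: "open file",
    0x3E: "close file",
    0x3F: "read from file",
    0x40: "write to file",
    0x42: "move file pointer",
    0x4C: "exit with return code",
}


def label_trace(ah_values):
    """Map a list of AH values to human-readable syscall names."""
    return [INT21_NAMES.get(ah, f"unknown (AH={ah:02X}h)") for ah in ah_values]


# Made-up trace in the spirit of the goat-file run: open, seek, read, close, exit.
for name in label_trace([0x3D, 0x42, 0x3F, 0x3E, 0x4C]):
    print(name)
```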
And some nice computations at the bottom to make it easier to read. So, as we mentioned, AH is the one that you use to specify which syscall you want. And you'll notice it's not there. AH is actually the upper half of AX. AH is an 8-bit register,
because sometimes people really just wanted only 8 bits. It's very obscure that we were saving that much space. So we're going to go through that. This is the definition of the print string syscall. In order to call the
syscall for printing a string, you set AH to 9 and then you need to set DS:DX to a pointer to a string that ends in a dollar sign. And that didn't make a lot of sense to me when I first read it. So to understand this, we need to learn a little bit more about how memory works on these old CPUs, or the CPUs that are probably in your laptops but
running in an older mode. This is effectively what it looks like. We have a 16-bit CPU, and 2 to the 16 is 64 kilobytes. And we have a 20-bit memory address space, and 2 to the 20 is 1 megabyte. So if you ever see an MS-DOS machine limited to 1 megabyte, or some old operating system saying the maximum memory it can have is
1 megabyte, it's because it's running in 16-bit mode, and the maximum address it can physically see is 20 bits. So the question is: how do we address anything above 64K if the CPU can fundamentally only see 16 bits? This is where segment registers come in. We have four segment
registers. Actually, we might have more, but these are the ones you need to care about. There's the code segment, the data segment, the stack segment and the extra segment, in case you need just another one. So, with that in mind, let's have a quick crash course on segment registers. Imagine you have a very long piece of memory.
And we can only see 16 bits' worth of it at a time. However, we can move that sliding window around in memory to change our view of where it is. So we can do this and put data around the system, and we can use the
final pointer to specify how far into the memory segment we should go. So DS:DX really just means a multiplier plus an offset. Where the data segment is 100, you just need to move 100 times 16 to get to the correct place in memory, and then DX is the
offset. This continues on. With a 16-bit CPU, we have a bunch of general-use, or general-purpose, registers. They're quite useful for ensuring you don't need to touch RAM too often. x86 actually has a fairly small number of
general-purpose registers. Some architectures have way more. I think more modern chips like GPUs have hundreds. Well, hundreds, maybe thousands. However, this doesn't really change over time in x86, because we have to keep backwards compatibility. So what actually ends up happening when we move up in bit width is that the same registers just get wider.
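The segment arithmetic described a moment ago (segment times 16, plus the offset) is small enough to write down. The following is a minimal sketch with made-up helper names. It assumes 8086-style wrap-around at 1 megabyte and models memory as a flat Python `bytearray`.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def read_dollar_string(memory, ds: int, dx: int, limit: int = 256) -> bytes:
    """Read the '$'-terminated string that INT 21h function 09h would print."""
    start = linear_address(ds, dx)
    end = memory.find(b"$", start, start + limit)
    if end == -1:  # no terminator nearby: return what we have
        return bytes(memory[start:start + limit])
    return bytes(memory[start:end])


memory = bytearray(1 << 20)       # 1 MiB of pretend RAM
addr = linear_address(100, 5)     # data segment 100, offset 5
memory[addr:addr + 6] = b"hello$"
print(addr, read_dollar_string(memory, ds=100, dx=5))
```

Many different segment and offset pairs name the same byte, which is why a tracer usually converts both to a single linear address before comparing locations.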
And we add some more for the programmers that want them. The exact same thing happened with 64-bit: the registers just got wider. So, thinking about it, we have a lot of malware now. What if we want to know everything that happens in this entire archive? We kind of want to trace all of these automatically,
but we might not know what we're looking for. So let's go through the checklist of what we need to do to trace all of this malware. We need a breakpoint on the syscall handler. When we hit that breakpoint, we need to save all the registers so we know which syscall was run and potentially what data is being given to the syscall. Ideally, we're going to
save 100 bytes from that data pointer, not necessarily because we need it, but because it's quite handy for a lot of syscalls. It's, for example, where you get the file path when a program is opening files. We should also probably record the screen for quick analysis rather than just staring at HTML tables.
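The checklist above maps onto a small record type. This sketch assumes the debugger hands over the registers as a plain dictionary and memory as a flat buffer. `SyscallRecord`, `snapshot` and the file name `GOAT.COM` are illustrative names, not the talk's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class SyscallRecord:
    function: int      # value of AH, which selects the INT 21h function
    registers: dict    # full register snapshot taken at the breakpoint
    data: bytes        # bytes found at DS:DX, useful for paths and strings


def snapshot(registers: dict, memory, grab: int = 100) -> SyscallRecord:
    """Capture what the checklist asks for when the handler breakpoint hits."""
    ah = (registers["ax"] >> 8) & 0xFF  # AH is the upper half of AX
    start = ((registers["ds"] << 4) + registers["dx"]) & 0xFFFFF
    return SyscallRecord(ah, dict(registers), bytes(memory[start:start + grab]))


memory = bytearray(1 << 20)
regs = {"ax": 0x3D02, "ds": 0x1000, "dx": 0x0200}  # AH=3Dh: open file
where = (regs["ds"] << 4) + regs["dx"]
memory[where:where + 9] = b"GOAT.COM\x00"           # zero-terminated path
record = snapshot(regs, memory)
print(hex(record.function), record.data.split(b"\x00")[0])
```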
So we can do that. We burn a lot of CPU time and probably cause some minor amount of environmental damage. And we get nothing. We just run a bunch of samples and most of them don't return anything. At best, they return a goat file string. They just do nothing. So
if we look deeper into the reason why, there's sort of a smoking gun here. We can see the syscalls that run in this file that does nothing, and the smoking gun is the date. It's asking the system for the date, and this flags up the first issue: a lot of MS-DOS viruses don't really have a lot to go on, because they have no internet connection and
there's not really any other state they can decide to activate on. The date syscall is pretty simple. Get date and get time just return all of their values in registers, some of them using the 8-bit halves to save space. So a
naive way of doing this is to run the sample and wait for the syscall for date or time. We then just fiddle the values, because we're using a debugger, so we can automatically change the state of the registers. And we can then observe whether any of the syscalls the program ran have changed, which is a pretty good indication that you've hit some behavior that is different.
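A sketch of that naive approach, assuming the debugger exposes registers as a dictionary that can be rewritten when the get-date call (INT 21h, AH=2Ah) returns. The register layout is the documented one for that call (CX is the year, DH the month, DL the day, AL the weekday), while `fake_get_date` and `behaviour_changed` are hypothetical helpers.

```python
from datetime import date


def fake_get_date(registers: dict, pretend: date) -> dict:
    """Return a copy of the registers as if DOS had reported `pretend`."""
    patched = dict(registers)
    patched["cx"] = pretend.year
    patched["dx"] = (pretend.month << 8) | pretend.day  # DH=month, DL=day
    dos_weekday = (pretend.weekday() + 1) % 7            # DOS counts Sunday as 0
    patched["ax"] = (patched.get("ax", 0) & 0xFF00) | dos_weekday
    return patched


def behaviour_changed(baseline_calls, new_calls) -> bool:
    """A different syscall sequence suggests the date hit a new code path."""
    return list(baseline_calls) != list(new_calls)


regs = {"ax": 0x2A00, "cx": 0, "dx": 0}
patched = fake_get_date(regs, date(1980, 1, 1))
print({name: hex(value) for name, value in patched.items()})
```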
And then, you know, we can say, hooray, we found a new test case. The downside is that running every one of these samples takes 15 seconds of wall time, which, when you're emulating MS-DOS, is 15 seconds of CPU time, because MS-DOS doesn't have a power saving mode, so when it's not doing anything, it just goes into a busy loop,
which makes it very hard to optimize. Or, we could take a cleverer look. So, when we think about it, we are in the interrupt handler. All we ever see is the inside of the interrupt handler, because we don't know where the program code is. The interrupt handler is the only
place that we know is consistent, because MS-DOS could potentially load the code for the malware or the program anywhere. But we want to know where the code is. It would be really handy to know what code is about to run. For this, we need to look at the stack. Just like DS:DX, the stack is located by a stack segment and a stack pointer.
Luckily, the first two values on it are the instruction pointer and the code segment of the caller. So we can use those to grab exactly the code that will be run afterwards. We just need to add a few things to our checklist: we grab four bytes from the stack pointer and, using those, we can calculate the destination that the syscall will return to. And if we look at some of them, we can look at an example here.
This is what one of the calls returns to. We see we're running a compare on DL against hex 0x1E, and if that comparison is equal, it will jump to one memory address, and if not, it will jump to another.
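The two steps above, reading the return address off the stack and inspecting the compare that follows, can be sketched together. This is an illustration under assumptions, not the talk's code: it presumes we are stopped at the very first instruction of the handler, so that IP and CS are the top two words on the stack, and it only recognises the `80 FA ib` encoding of `cmp dl, imm8` followed by a short `je` or `jne`. All helper names and the memory contents are made up.

```python
import struct


def linear_address(segment: int, offset: int) -> int:
    return ((segment << 4) + offset) & 0xFFFFF


def return_address(memory, ss: int, sp: int):
    """Read the caller's IP and CS (in that order) from the top of the stack."""
    ip, cs = struct.unpack_from("<HH", memory, linear_address(ss, sp))
    return cs, ip


def day_check(code: bytes):
    """For `cmp dl, imm8` + `je`/`jne rel8`, list the days that take the jump.

    Returns None when the bytes are some other instruction sequence.
    """
    if len(code) < 5 or code[0:2] != b"\x80\xFA" or code[3] not in (0x74, 0x75):
        return None
    wanted = code[2]
    jump_if_equal = code[3] == 0x74
    return [day for day in range(1, 32) if (day == wanted) == jump_if_equal]


memory = bytearray(1 << 20)
cs, ip = 0x1234, 0x0105
code_at = linear_address(cs, ip)
memory[code_at:code_at + 5] = b"\x80\xFA\x1E\x74\x05"        # cmp dl, 1Eh / je +5
stack_top = linear_address(0x2000, 0xFFF8)
struct.pack_into("<HHH", memory, stack_top, ip, cs, 0x0202)   # IP, CS, FLAGS

ret_cs, ret_ip = return_address(memory, ss=0x2000, sp=0xFFF8)
start = linear_address(ret_cs, ret_ip)
print(day_check(bytes(memory[start:start + 5])))
```

Enumerating the 31 possible day values this way costs nothing compared with rerunning the sample once per date.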
So if we look back at our definition of those syscalls, we can see that DL is the day. So with this, we can conclude that, since 0x1E is 30 and DL is the day, this malware
effectively is saying: if the day of the month is 30, we need to go down a different path. If we run this across the whole dataset, what we see is roughly this, as a poorly drawn bar chart. Out of the 17,500 samples we have, around 4,700 of them check
for the date and time. And these are the ones that are really tricky, because they're really hard to activate. They're also the most interesting, though, because those are the ones trying to hide. So, with that in mind, we have the code segment that we're about to run when we return. We can't really brute force it
inside a real or emulated machine, because it takes a lot of CPU time. But we can brute force it in a significantly more interesting way. We need to build something. We need to build the world's worst x86 simulator. Dubbed BenX86, it's 16-bit only. Any attempt to access memory
effectively ends the simulation. It's got a fake stack. If you try and push something onto the stack, it says, sure, fine. If you try and pop it, it's like, oh, actually, I never held any of that data anyway. Sorry, we're ending the simulation. It has 80 opcodes, most of them jumps, because its primary purpose is compares and jumps. The difference is that it logs every opcode and every address that it went through,
and it can be run with just a small x86 code segment and a register snapshot. This means that we can test all days from 1980 to 2005 in roughly 100 milliseconds, and most programs ended up having just three different code paths on average. So that leaves us with 17,000
virus samples and about 10,000 samples with date variations, once you explode the complexity. So I'm going to now use my remaining time to go through some of my favorites. This is an example of a virus that just doesn't do anything on the first of 1980.
However, if you would happen to be running this on New Year's Day, you would get this. No matter what you do, every program, you can't exit out of this. Your machine is hung. This might be great, right? You might be like, oh, cool, I don't need to do work anymore because my computer will literally not let me. This also might be terrible because
you might need to do some work on New Year's Day. Here's another example. This does nothing as well. Just another innocent .com file. Of course, reminding that these pieces of malware will be wrapped around something else. So, you know, almost anything could be infected in here. In this case, though, these binaries are nice and shaved down. However, instead,
we get this. Which I think is super interesting. And it's basically the author is aware that they're telling you. They're actually like self-disclosing. They're saying, the previous year, I've infected your computer. And
for some reason, they're being nice. They're just saying, yeah, actually, you have been infected. And as a I guess a pity, I'm just going to remove myself now. For some reason, it's also encouraging you to buy McAfee. This is back in the day when John McAfee himself actually wrote McAfee.
Interesting times. Definitely interesting times. Here is another example. This one I found particularly obscure. On the 8th of November in 1980, or any year
I think, actually, it turns all zeros on the system into tiny little glyphs that say hate. If anyone understands this, I'd really like to know. Like, I've been thinking about this a lot. Like, what does it mean? Is it an artistic statement? I wish I knew.
There could be a CCC variant that says mate. Another good one, in that it's the last thing I ever want to see any program tell me, is this one here, where you run it and it says error eating drive C. I never ever want
to see an error in any program that unexpectedly just says, sorry, I failed to remove your root file system. Don't know why. Could you, like, change your settings so I can remove it? Cheers. And finally, this is one of my absolute favorites, in that it's just brilliant, and in that it also stops you from running
the program you want to run. It exits prematurely. This is the virus version of the Navy Seal copy pasta. It says I am an assassin. I want to and I shall kill you. I also hate Aladdin. And I also will kill it. I will eliminate you with and we know where this is going.
It says fear the virus. It is more powerful than God. It only activates on one day, though, so it's fine. Thank you for your time. I know it's late. And I will happily take any questions or corrections if you know this topic better than me.
This totally brings tears to my eyes with nostalgia. So if there are any questions, we have microphones around the room. There's one, two, three, four and one in the back. We also have questions perhaps from the Internet.
If you want to ask a question, come up to the microphone and ask it. Just as a reminder, a question is one or two sentences with a question mark behind it and not a life story attached. So let's see what we have. I'm going to start with microphone number one
just because I can see it easiest. Let's go for it. Hi, Ben. Thanks for the talk. Really interesting. My question would be did you do any analysis on what ratio of the viruses was more artistic and which one actually did damage? So most of them surprisingly don't do
damage. I actually really struggled to find a date-varying sample that activated on a certain date and decided to delete every file. There are some very good ones, in that some of them are like virus scanning utilities that just don't do anything on certain dates, and then one day, while they're telling you all the files they're scanning, they're actually telling you all the files they're deleting.
So that's particularly cruel. But it's actually surprisingly hard to find a virus sample that was brutally malicious. There were some that would just infect binaries, but it's very hard to find one that I think was brutally malicious. Which is a far cry from the days that we live in right now, where we're taking
down hospitals with Windows bugs. As everybody's leaving the room, please do it quietly. I see a question at three on that side. Yes, since a lot of industrial control systems still run DOS, what's the threat from DOS malware that might be written
today? It's probably unlikely that an industrial control system that's running DOS would come into contact with DOS malware. The only way I can think of is if a vendor, or a factory or supplier or whatever, was basically downloading warez onto industrial
control boxes, I wouldn't be surprised, but it would be pretty irresponsible. But it would be quite surprising to find MS-DOS malware today on industrial controllers that was installed recently and not just a lingering infection from the last 20 years. Microphone 2.
Did you find any conditions that weren't date-based? Some of them try and circumvent the date recognition. Unfortunately, it's very hard to brute force those. Some of them install themselves as what's called a TSR, terminate and stay resident, which basically means that they will exit out, run in the background
and continuously ask the actual system timer what time it is. It's a bit of a more risky strategy because the system timer might not exist, which would be unfortunate for the virus. So definitely there are viruses that have way more complicated execution conditions. I observed one sample that only activated after, I believe it was something silly,
like 100 key presses, which is very hard to automatically test. Those sort of viruses require static analysis and statically analysing 17,000 samples is a time-consuming task. So we have a question from the internet. Do you have the source, or what is the
source of the malware that you analysed here, and is it published somewhere? You can still find dumps of VX Heavens, probably more modern dumps of VX Heavens, on popular torrent websites, but I'm sure there are also copies floating about on non-popular torrent websites.
Over to microphone one. Hi, Ben. I'm Job. Thank you for your talk. I was wondering, did you learn anything from your studies of these viruses that should be taught in modern day computer science classes? Like, more efficient
sorting algorithm, or some hidden gem that actually should be part of our approach to computing these days? My primary takeaway was x86 was a mistake. So I'm not seeing any more
questions. Oh, no, there is. One more question from the internet. Have you found malware samples that try to detect dummy binaries or whatever to avoid easy analysis? Oh, actually, that's a really good question. It is complicated.
Some viruses would maybe, let's be dangerous, let's try and go backwards on my home-written presentation software. da-da-da-da-da, come on. Too many slides. I have regrets. Here we are.
This slide. You know how I was saying the malware infection goes at the end? Some samples are really cool in that they don't change the size of the file. They just find areas of the file that are full of null bytes and just say, this is probably fine, I'm just going to plop myself here. Which may have unintended consequences. It may mean that if a program has,
like, a statically defined byte array of a certain size, and the program is relying on it being zeros when it accesses it for the first time, it may get very surprised to find there's some malware code in there. But generally speaking, as far as I'm aware, this deployment
procedure works pretty well. And it actually is very good at avoiding antivirus of the era, which would just be checking common system files and their sizes. And if the size of command.com increases, then that's clearly bad news. We have a question on microphone 1. Are there any viruses
that try to eliminate or manipulate virus scanners of the day? Oh yeah. So a lot of the samples will actively go and look for files of other antiviruses. But I am generally under the impression that it's kind of hard to find them. There weren't actually that many antivirus products back in the day.
I feel like it was a bit of a niche thing to be running. Microsoft did for a while ship their own antivirus with MS-DOS. So I guess, you know, what's new is old. So there were antiviruses out there. I don't think many of them were very effective.
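The null-byte trick and the size-only check mentioned a little earlier can be illustrated with a small sketch. This is an assumption-laden toy, not how any particular antivirus of the era worked: `null_runs` is a hypothetical helper, the "program" is a handful of placeholder bytes, and the overwrite uses inert filler. It only shows why comparing file sizes misses a change that a content hash would catch.

```python
import hashlib


def null_runs(data, min_len=16):
    """Return (offset, length) for each run of zero bytes of at least min_len."""
    runs, start = [], None
    for i, b in enumerate(data):
        if b == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # A run that reaches the end of the data still counts.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs


# Toy "program": some code bytes, a zeroed buffer, more code bytes.
clean = b"\x90" * 8 + b"\x00" * 32 + b"\xc3" * 4

# Simulate part of the zeroed buffer being overwritten with filler bytes.
offset, length = null_runs(clean)[0]
modified = clean[:offset] + b"\xaa" * 20 + clean[offset + 20:]

# A size-only check sees nothing; a content hash does.
size_check_passes = len(modified) == len(clean)
hash_check_passes = (
    hashlib.sha256(modified).digest() == hashlib.sha256(clean).digest()
)
print("size unchanged:", size_check_passes)
print("hash unchanged:", hash_check_passes)
```

The same scan also hints at the side effect described in the talk: once the buffer is partly overwritten, a program expecting it to be all zeros would no longer find it so.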
Any more questions? Oh, right. Another one from the internet. The internet is querying MS-DOS all the time. Go ahead. Did you do the diagrams by hand or do you have a tool? So many hours.
No, so there's a couple of good tools to do it. Asciiflow.org I think is a fantastic tool. I would highly recommend it. I think it's not maintained very well though. Microphone 1. Are you publishing the tools you wrote? I will be publishing the tools at some point. When they are less
ugly. I will definitely be publishing all of the automatic malware runs and the GIFs generated by them so that people can easily Google for the virus names and get actual real-time versions. The hardest thing that I found when looking at virus names was literally just finding any information about them, and
one of the things I really wish existed at the time of writing this talk was being able to just query a name and be like, oh, yeah, this virus looks like it does this. Since I saw Microphone 1 first, let's go with that. Did you find any viruses that had signatures in them? Not signatures as we know them today, but the name of the author, who
was very proud of what he wrote. Yeah, there are some notable examples. Quite a few of them will try and name themselves. So, DOS viruses obviously do have sample names, in the same way that we still give viruses names today. A lot of the time, they will just encode a string that they want the virus to be named somewhere in the file,
just a random string doing nothing. It's like, oh, okay, they clearly wanted to be called Tempest. So, that does happen. One of my favorite examples is the Brain malware, which literally encodes an address and phone number of the author. I believe in Pakistan. And there's a fantastic mini-documentary by F-Secure
where they go and visit the people who wrote it. It's a super interesting watch, and I would really recommend it. Indeed, yes. Microphone 2. Did you have any chance to look at any kind of viruses that did not modify the files themselves? For example, one of the largest virus
infections at the time was a virus called Nymella, which modified the master boot record. Yeah, master boot record, I did consider. It was more of a time problem that I had to deal with. Getting to the point where you could brute force time and date combinations and looking for master boot record changes was really hard.
I am super interested in reviewing effectively the rootkits of the era. But, yeah, that's definitely something I will look into in the future. And we have yet another question from the internet. Yeah, it's even from the same guy. Oh, damn.
Is the web somewhere? It probably will be. I wouldn't expect it to work well in any use case, though. It's effectively designed to not work correctly, right? What was the spec? It basically fails at every single. Anything awkward,
I just went, oh, that's fine. We're probably far enough down it anyway. Where are we? Be aware this is the feature list. So is that a follow-up question from the internet? No, it's a new one. It's a new one. Good. I don't know how serious it is, but would it be possible or a good
idea to use machine learning to create new DOS malware from the existing samples? It would not be a good idea, but I like how you think.
Actually, I saw somebody trying to use NLP to generate viruses. You could probably do Markov chains with x86, to be honest. Please don't do that. Please. Don't try this at home. I have seen things. Please don't do that. So, I think we've run out of questions.
Going once, going twice. Let's thank Ben for this marvellous retrospective talk.