Old Malware, New tools: Ghidra and Commodore 64
Formal Metadata
Number of Parts | 85
License | CC Attribution 3.0 Unported: You may use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/62190 (DOI)
Transcript: English (automatically generated)
00:00
Welcome to DEF CON. How many first-timers? Raise your hands. All right. Good amount. Good amount. So I guess everybody is here for the 10 a.m. talk, Old Malware, New Tools, on the Commodore 64. So we have Cesare here to our left. Welcome to him. He's a first-time speaker. So everybody
00:23
knows what that means. Yes, it does. Well, we're going to get started. Good morning. And let's have Cesare kick it off. Thank you very much. Good
00:40
morning, everyone. Thank you for being here with me. Okay, let's start immediately. It may seem strange to speak at DEF CON about something coming from the past, but I think it makes sense, especially for this year's theme. Let's speak about me for a second, but just for one slide. My name is Cesare Pizzi. In my main job,
01:03
I'm a reverse engineer, so I spend most of my time looking at other people's code. But sometimes I also try to write something for the community. For example, I like to write software, always about security, for security tools like, for example, Volatility,
01:23
OpenCanary, and I also have a couple of main projects running, which are entirely done by me. If you want to reach out, you can obviously reach me on Twitter or on GitHub: open an issue, just get in touch with me. I will be more than happy to speak with you about everything about
01:44
security. But okay, let's start with the why. It sounds pretty weird in 2022 to speak about Commodore 64 viruses. What made me decide to do this was this picture coming from the DEF CON 30 theme. I don't know, something clicked in my mind when I saw it. I decided that I had to
02:09
do something like trying to bring the knowledge that I have today back to something that I was using when I was a child, for example. But be assured that this is not a
02:23
nostalgic talk. It's not something about the good old days, because that's not my way of approaching things. I also want to report another sentence that is coming from the DEF CON 30 theme, and that's exactly what this talk is about: trying to move what was interesting in the past into the future, trying not to lose what we learned in the past, and trying to keep
02:51
it and gain experience from it. Also, doing this, I realized that there was a lot more. For example, looking at this kind of virus was really illuminating, and it was a very, very
03:04
interesting exercise to do. And remember that we are speaking about software a few hundred bytes long. Every virus written back then was more or less one kilobyte of software. The particular one that we are looking at today is 700 bytes, and we will see that a lot of things are done in 700 bytes. Okay, let's set some common
03:26
historical background about the Commodore 64, because maybe not all of you know what we are talking about. Okay, this is a picture of the Commodore 64. In Italy, where I come from, it was called the big biscuit because of the shape and the color of the chassis. I don't know if it's
03:42
the same in other countries as well, but that was the name. It was one of the first home computers, born in 1982, together with the Apple II and a lot of others. It was mainly used for gaming, actually, for several reasons. But viruses were a thing back then. And
04:02
so, they existed, but it was very different from the current reality. Okay, let's start by defining a virus before going ahead, so that we are all on the same page. A virus is a kind of program that, without the user's knowledge, tries to persist and replicate. I'm
04:21
sure that this is not a 100% fitting definition, but it's a good starting point for all of us for going ahead and understanding what we are speaking about today. Okay, today, if we think about a virus or malicious software, we think about something that is built to give a gain to the attacker. Mainly, we are speaking about financial gain, or gain of another type,
04:44
like information harvesting or something like that, which in any case leads to a financial gain in the end. But if we think about the situation back then, it was a completely different thing, because there was no financial motivation behind the virus, but it was
05:01
something done more as a show of technical knowledge, really: not doing something malicious to the user, but maybe pranking people, or just showing that, okay, I can do it, so I can show you that I can do it, but I don't want to harm you or whatever else. And
05:22
that's the reason why I tend to define this as closer to the demo scene back then. The demo scene was the way programmers showed their skills doing graphics, sound, and so on. And in some way, viruses were something like that, done for showing low-level technical
05:44
expertise, let's say. And you may think that the lack of financial gain behind this may lead to some naive code, with few functionalities, or something written very badly, but that's not the case. We will see together that these tiny programs are really little jewels of programming
06:05
and technical skill. Okay, let's start from the state of the art. So which viruses are known for the Commodore 64? This is the list, pretty much complete. It looks pretty funny, because we count seven entries here, which is more or less what we can
06:26
think happens in a minute of malware creation nowadays. So it's really nice to see that there were so few viruses back then. And there are great analyses already done,
06:41
for most of them, especially for BHP, which is the first one, considered to be the first virus created for the Commodore 64 and one of the first viruses ever created. But there is one which has not been analyzed yet: the Bula virus. It was in the list we saw
07:04
before; it was the fourth one. And the only thing that I found before starting this kind of analysis was this little sentence coming from wiki.com, saying that nothing was really known about this virus. There were some things happening when you ran it,
07:21
but it was not clear how it was replicating and so on. And so I decided to pick this one. In order to bring it back to life, I obviously used some tools to get it running again. These tools, which we are going to list right now, include something to do static
07:44
analysis: the tool I use for static analysis is Ghidra, a well-known tool, I'm sure that you know it, with a specific plugin used to load this Commodore 64 virus. I did some custom scripting for analyzing the virus itself, with Java and the plugins of Ghidra itself. I
08:04
also created a custom 010 Editor template in order to be able to analyze the disk images. The D64 images are the files representing a disk on the Commodore 64. Then I also did some dynamic analysis,
08:21
obviously, and I used the VICE emulator for that. VICE is an emulator which also has a very nice debugger in it. And then DirMaster 3.1.5, which is used to manage the D64 images again. A couple of screenshots about this. This is, for example, a screenshot from VICE, which is the emulator I use for dynamic analysis. At the bottom, you see
08:43
the Commodore 64 screen, and on the top, you see the debugger. The debugger is very well done. It allows you to step through the code, dump memory, and dump opcodes. And there is also this little window at the top right showing you the registers, and also this window showing computer and drive 8.
09:05
Remember this, because it will be very important for our analysis. It looks like we don't have registers only on the main CPU, and that's important to remember for our analysis here. Then I used Ghidra. This is a well-known tool. It supports the Commodore 64 architecture, which
09:25
is based on the 6502 CPU. It also decompiles the code, but in this case, to be honest, it's not useful to have the code decompiled, so I just looked at the raw assembly code. Then here we have the custom template I created for 010 Editor, which allows me to inspect the content of a disk of the
09:47
Commodore 64. I'm going to release this template; I already released it this morning, and I will give you some more details about it at the end of the presentation. Okay. Let's now start giving some details about our specimen, the virus itself. The virus, from what we know,
10:08
exists in two variants, which are identified by a version, 6.13 and 8.32, so we also have major releases for it. Both of them are available to download if you want to have a
10:22
look at them, to tweak or replicate the analysis, or do something by yourself. You can download them from the csdb.dk site. Okay. But before going on with the presentation, we need to set some common knowledge about Commodore 64 hardware, because it's very important. You will see
10:41
during the analysis that a lot of characteristics of the Commodore 64 are used in the code of the virus itself. Let's start from the main CPU. The main CPU of the Commodore 64 is the well-known 6510, which is based on the 6502, one of the first very cheap CPUs that
11:01
were produced in the early 80s, and probably the CPU that gave the start to the home computer movement. It's the CPU on which most of the home computers back then were based. So it was a very cheap CPU doing a lot of things. In the case of the Commodore 64,
11:27
we had a clock of one megahertz, 64K of RAM, and 38 kilobytes available for programs. The Commodore 64 had a special characteristic: it also had some additional chips for managing the video, and
11:44
that's mainly why it became very popular for gaming. But the main CPU is exactly the same as in all the other home computers of the time. What is interesting, and I learned it doing this research, is that it's still used. You can actually buy one from the site if you want. The
12:02
form factor is a bit different right now, as you can see, but the 6502 and other variants of the CPU still exist and are still used, according to the site. That's why I'm saying you are not wasting your time this morning watching this presentation: you are going to learn something very interesting, like the basics of 6502 assembly, that you can maybe use on your next
12:27
automotive project, I don't know, if you want to use the 6502 CPU. Okay, let's spend a couple of minutes talking about mass storage. You saw the picture of the Commodore 64 before, which was a keyboard with some ports on it. It doesn't have any kind of mass storage in it. So the
12:47
mass storage back then was done mainly in two ways. The cheap one was a tape player, which was awful, slow, and unreliable. It was really probably one of the worst pieces of hardware ever
13:01
created in human history. I remember it as mostly unusable. But it was a cheap solution, so that's what I bought first. Then, when you started to save some money, you could buy another piece of hardware, which was a bit better. It's not really great hardware as
13:21
well, but it's a lot better than the tape: a disk drive reading floppy disks, the five-inch floppy disks. So the black one, the floppy one, exactly. And why am I mentioning this hardware? Because it's a little-known fact that the 1541 disk drive had another
13:47
6502 CPU in it, running code, with some memory and some registers. This is why I told you to have a look at the register window before, in the emulator's debugger window, because
14:03
you actually have another CPU equal to the one installed on the main board. That means that, in some way, you had a multiprocessor system back in 1982, because the two CPUs can communicate with each other, transfer code between them, and execute code at the same time. So it was pretty surprising to understand that you actually had a
14:25
multiprocessor system, with the main CPU and the one running in the drive. You can also offload some work to the CPU of the drive itself. It was not so easy, because the connection between the two systems was a serial bus, so it was slow and so on. But it was
14:44
actually usable, and it was also used by legit software, let's say, like turbo loaders, which were programs to load things, and so on. And guess what? Yes, this can be abused by viruses as well. This is actually what the Bula virus does, and we will see how in a
15:02
while. But before going into the details, I promised you a crash course on 6502 assembly. We will go very briefly: it's a crash course of two slides, so don't be afraid. Let's have a look at how it actually works. We have a few registers in the
15:22
CPU, a lot fewer than what we expect from current architectures like Intel's, let's say, where we have a lot of them. We have basically a program counter, which is more or less the same as the instruction pointer (IP) of the Intel architecture and points to the current instruction of the processor. And then three general-purpose registers. One is A, which
15:44
stands for accumulator, and you can store values in it. And then X and Y, which are index registers. So these are the three general-purpose registers you can use in your programs, and that's all you have. You also have a stack pointer, pointing into a fixed memory
16:00
region going from 0100 to 01FF, and then a status register whose flags hold the results of compares, carries, and so on. A very basic structure, a very simple one, but you will see that you can do a lot of things with it. I'm also adding a note on this slide regarding the registers 00 and 01, which
16:21
are reported by the VICE emulator, because they are out of our scope today. They are registers for the Sound Interface Device, which was one of the chips I mentioned before, used for making sounds. But we don't need them in our analysis today; I'm just reporting them for completeness. And so, a very basic crash course on assembly.
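The fixed stack region described in the crash course can be sketched in a few lines of Python. This is a toy model for illustration, not emulator code: the class name `Stack6502` and the starting stack pointer value of 0xFF are assumptions, and only the 0100 to 01FF range comes from the talk.

```python
STACK_BASE = 0x0100  # the stack page runs from 0x0100 to 0x01FF


class Stack6502:
    """Toy model of the 6502 hardware stack (hypothetical helper class)."""

    def __init__(self) -> None:
        self.memory = bytearray(0x10000)  # 64 KB address space
        self.sp = 0xFF  # assumed starting value: top of the stack page

    def push(self, value: int) -> None:
        # Write at 0x0100 + SP, then move the 8-bit pointer down one byte.
        self.memory[STACK_BASE + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        # Move the 8-bit pointer up one byte, then read from 0x0100 + SP.
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[STACK_BASE + self.sp]


stack = Stack6502()
stack.push(0x12)   # lands at 0x01FF
stack.push(0x34)   # lands at 0x01FE
top = stack.pull() # reads back the most recent value
```

Because the stack pointer is a single byte added to a fixed base, the stack can never leave that 256-byte page, which is why the talk calls it a fixed memory region.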
16:43
So we can basically do some things with the registers: we can store values there and load values from the registers themselves. The instructions are pretty simple; it's a very, very simple instruction set on the 6502. You have LDA to load a value into the accumulator, so you can put a value there. You can load an immediate value by putting a hash (#) before it,
17:04
or you can reference a memory address. And the opposite way, you can also store the value that is in the accumulator into memory. So you have LDA and STA. In the same way, you have LDX and STX, and LDY and STY, for the same operations with the other two registers.
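The load and store instructions just described can be mimicked with a small Python model. This is only a sketch under simplifying assumptions: the `ToyCpu` class and its `load` and `store` methods are hypothetical names, status flags are ignored, and only the immediate and absolute-address forms are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCpu:
    """Hypothetical model of the A, X, Y registers and 64 KB of memory."""

    a: int = 0
    x: int = 0
    y: int = 0
    memory: bytearray = field(default_factory=lambda: bytearray(0x10000))

    def load(self, register: str, operand: int, immediate: bool = False) -> None:
        # LDA/LDX/LDY: take the operand itself (#value) or the byte at that address.
        value = operand if immediate else self.memory[operand]
        setattr(self, register, value & 0xFF)

    def store(self, register: str, address: int) -> None:
        # STA/STX/STY: copy the register into memory at the given address.
        self.memory[address] = getattr(self, register)


cpu = ToyCpu()
cpu.load("a", 0x2A, immediate=True)  # LDA #$2A
cpu.store("a", 0xC000)               # STA $C000
cpu.load("x", 0xC000)                # LDX $C000
```

The three calls correspond to the assembly shown in the comments: an immediate load into A, a store to memory, and a load of that same byte into X.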
17:24
It's like the MOV operation in Intel assembly, let's say. Then we have another instruction which is interesting for us, JSR, which is basically the same as CALL in the Intel architecture, usually closed by an RTS instruction, which is
17:40
returning to the calling function. And then obviously, we also have some branching instructions, so you can jump around based on the results of several operations and so on. So you have this set of instructions here, BPL, BMI, BVC, and so on. A couple more slides, then we can jump into the virus itself. So, KERNAL: that's a strange word. It comes from the
18:05
Commodore 64 era. What is the KERNAL? It sounds like it has been misspelled, and actually, it was. Probably it was meant to be kernel, with an E, not an A. This is the Commodore 64's ROM-resident operating system core. So you can call some routines doing basic
18:24
operations by calling a specific memory address. Compared to the modern, I put some quotes here, PC world, it's more or less something like the BIOS routines we have in current PCs. So we have this way to call basic, low-level KERNAL routines, and then we have another way of
18:46
calling higher-level routines, which are the BASIC ones, stored from A000 to BFFF. This memory area holds the code for the BASIC operations. So let's say, if I
19:01
want to do a print: the code of the print is stored in this area, and I can call it directly by calling the memory address instead of writing PRINT in my BASIC program. That's a way of doing things on the Commodore 64. But let's start now with our analysis. So, the Bula virus, which we saw before, is one of the few remaining viruses which had not been analyzed yet. Okay, we
19:28
obviously need to start it in some way. Remember, I said before, a virus never dies, so it's just a matter of giving it the right environment, and it starts again. And the right environment in our case is the emulator, obviously. And this is what happens when you start
19:44
the virus. Something strange, actually. You see this flashing screen here, and some gibberish printed on the screen, and then nothing happens. So it looks like nothing has really been done to the user. But what happened behind the scenes in this case? So let's
20:06
start with some code snippets. Okay, this is really tiny here, but don't worry, I will go into the details now. So what happens during this command, running the virus program, is that a serial bus was opened to the device. Remember that I
20:24
said that the two processors, the main CPU and the one in the disk drive, communicate over the serial bus. So this serial bus was opened, there was the execution of a command, then there was this flashing screen, and then the bus was closed. So let's see what really happened here. This is the set of instructions opening the serial bus. As you
20:44
can see, there are a JSR and a JMP instruction pointing to the KERNAL. Do you remember the addresses beginning with FF? These JMP and JSR point to two KERNAL calls, which are the calls actually used to open the serial bus to communicate with the CPU of the 1541. And
21:01
then there is the execution of a command. The execution of a command is simply sending a few characters, three characters in this case, M-W, which stands for memory write. By sending these ASCII characters, or PETSCII in this case,
21:22
because the Commodore 64 had a custom ASCII code, followed by some code, what happens is that some code from the main CPU is transferred to the memory of the disk drive. So what the virus is doing right now is just getting its code and transferring it to the
21:46
drive CPU. Then there is the flashing screen and the closing. Why the flashing screen? Back then, I/O operations were really slow, and sometimes you were really looking at the screen, trying to understand if something was happening or not. The flashing screen was a way, a
22:03
trick, used to let the user know that something was happening during the I/O operation. So that's probably why it was added in this case, just to let the user know that something was happening. And then, okay, the serial bus is just closed. So at this point, what has been
22:20
done is that the virus code has been transferred to the floppy drive, but nothing else. The code has not been executed yet. Why was this done? For two reasons, maybe: because in this way the virus doesn't reside in main memory anymore, and because it's more
22:43
persistent now: even if you power off or reset the Commodore 64, as long as you don't power off the drive, the virus still resides there and is not going away. So that's a neat way to gain persistence. And so, now the virus is in the disk drive's memory.
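The memory-write transfer described above can be illustrated by building the command bytes in Python. This is a sketch under stated assumptions, not the virus's actual code: the function name `build_memory_write_commands` is hypothetical, the layout after `M-W` (address low byte, address high byte, byte count, data) follows the 1541 command format as commonly documented, the chunk size of 32 is a conservative assumption, and the 700-byte payload of zeros is a stand-in. The target address 0x0500 is the one the talk gives for the code in the drive.

```python
def build_memory_write_commands(
    payload: bytes, start_address: int, chunk_size: int = 32
) -> list[bytes]:
    """Split a payload into M-W commands, one per chunk (illustrative helper)."""
    commands = []
    for offset in range(0, len(payload), chunk_size):
        chunk = payload[offset:offset + chunk_size]
        address = start_address + offset
        # "M-W" has the same byte values in PETSCII and ASCII for uppercase letters.
        header = b"M-W" + bytes([address & 0xFF, (address >> 8) & 0xFF, len(chunk)])
        commands.append(header + chunk)
    return commands


payload = bytes(700)  # placeholder for roughly 700 bytes of code
commands = build_memory_write_commands(payload, 0x0500)
```

Each element of `commands` is what would be sent over the serial bus on the drive's command channel; the address is little-endian, as is usual on the 6502.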
23:07
Okay, it's time to execute it. To execute it, it's just a matter of sending another command, like the M-W we saw before. The command is U3, another command coming from the ROM of the 1541, which just starts code at a very specific address, which is
23:25
0500 in the disk drive. Since the set of instructions we saw before was transferring code to that address, the U3 command just executes the code residing there. So what happens here, again in snippets, and we are going into the details, is that the serial bus is opened again, there
23:43
is a U3 command sent on the serial bus, and that means that the code starts. So the virus itself is starting in the disk drive's memory. There is a set of commands, from U3 to U9, executing code at different addresses from 0500 to FF01; it's just a matter of what you
24:02
want to do and execute in this case. And what happens meanwhile on the 1541? Obviously, there is some code landing at 0500, and there is just this tiny loop here, which is just waiting for something. It's waiting for an
24:23
attention signal to become low. That means that it's waiting for disk activity. So the virus has transferred its code to the drive, and it's just waiting for something to happen: the user accessing a file, saving a file, opening one, or whatever. Again, we need to set
24:44
some more knowledge about the 1541 disk structure and layout before going ahead, so that you will get the entire virus life cycle. Okay, disks were split into tracks, as they are today, more or less, so no difference in this. Floppy disks had 35 tracks, with a different
25:03
number of sectors for each track. Obviously, the internal tracks were smaller, so they had fewer sectors than the external ones, with less storage capacity. One interesting track is track 18. Track 18 is a special track on the 1541,
25:26
because it holds the information about the disk and which files are on the disk. So it's the track holding the directory. It's like the MFT and the bitmap in NTFS, more or less, holding all the information about the disk. And the special track 18, which is 12 in
25:46
hexadecimal, has a special structure as well. Sectors 1 to 18 of the directory track contain the directory entries, so all the files stored on the disk are listed in these sectors, and the
26:03
special sector, sector 0, the very first one, which is the BAM, the Block Availability Map. It holds the information about the free and used blocks on the disk itself, which is very important, because we will see that the virus uses this information. This is the
26:21
structure of sector 0, the BAM. Okay, you see the first two bytes, which hold some generic data that we are not interested in right now. What is interesting for us is byte 4, which holds the number of free sectors of track 1, and
26:40
then the subsequent three bytes, holding the bitmap of free sectors on the first track. Then this structure is repeated again and again for all the tracks of the disk. So at byte 8 you have the number of free sectors for track 2, and then the bitmap of track 2. The bitmap is just a set of ones and zeros, saying which blocks are used and which
27:03
aren't. So this is for sector 0 of track 18. Then you have sectors 1 to 18. Sectors 1 to 18 hold the file entries. There is this kind of structure repeated for
27:20
each file, with this specific structure here. You see that the first two bytes point to the next one. So when the first two bytes are zero, this means that this is the last file used by the disk. Then, very important for our analysis, there are the bytes starting from 02, so the third byte, and the fourth and fifth bytes, because they
27:42
hold the file type, which could be PRG, SEQ, REL, or USR, and the track and sector number of the first block of the file. Remember this, because we will see that it is used by the virus itself. Then there is some other information, like, obviously, the file name, the file
28:00
size, and so on. This structure is repeated for every single file in the list of files. I told you that there are different file types on the Commodore 64. These are the known ones: PRG, SEQ, REL, USR, and DEL. PRG is what is interesting for us. PRG is the
28:22
executable file type. It's like the executable on a modern PC: it holds code that can be loaded into memory, and the first two bytes of the file are the address where the program must be loaded. This is the type that actually holds executable code. Then we have the
28:45
other types of files, which are less interesting for our analysis, but let's have a quick look at them. We have sequential files, which are just byte streams, a way to access files without
29:02
random positioning. REL, which is a similar way to access files, but with the possibility to position the pointer within the file. Then USR, which is a user-defined one, and DEL, which is an undocumented one. The Commodore 64 also had a basic file
29:23
system integrity mechanism, so you had to close the file before removing the disk, and the file type was shown with a star before it if the file had not been closed properly. Now, okay, we have set this knowledge. We now know the information we need to understand what happens next, so let's see what happens when the
29:48
program starts to infect the program itself. We left the program waiting for this activity, remember. When this activity is recognized, okay, there is a start of a specific
30:01
function, which is the 600600600. This function, which is residing on the 1541, does a very specific thing. So it starts to find the first free block where on track 18, so on track 12, in
30:22
hexadecimal. So that means the virus is not trying to store itself in a file. It's trying to store itself in the directory structure, which is something that should not be done, because the directory structure just holds file information, not the file itself. So it starts counting from the very end of the track itself, and try to find free blocks there. Okay, this is
30:49
just the way of doing that, looping through the BAM. It loads the BAM, you see block zero here, and then starts looping and looking for free sectors at the end of the track. Why
31:03
does it start from the end of track 18? Because those blocks are more likely to be free. Commodore 64 disks were not known to hold hundreds of files. Most of them probably had, I don't know, 10 or 20 files. So usually the last three blocks of track 18 were not used at all, and it's very likely to find them free. So the virus just tries to check if
31:28
these three blocks are actually free, and if not, it moves on and looks for free ones going toward the beginning of the track. It does it through this loop here. When it detects
31:41
three free sectors (why three? Because the code of the virus itself occupies three sectors), it marks them as used in the bitmap. It does not write the code yet, it just marks the sectors as used. Okay, now the virus has just found a way to,
32:04
okay, it can store its code on the disk, but it's not sure yet that it can be re-executed. The virus wants to be re-executed in order to infect another system, another disk. So it starts another loop, looking for a PRG file. We said before that the PRG
32:24
files are the executable ones, so it starts a loop to look for this kind of file. You can see here that it loads the first sector and then compares, for example, you see that it's comparing at the very end of the, I don't know if you can point to it, yeah. There, it
32:46
compares the file type and looks for the PRG one. When it finds it, it starts the real infection phase. Until then, it doesn't. So what happens is: okay, I found a PRG file there, so I want to start the infection process. What does the infection process do? It just
33:05
gets the very first pointer of the PRG file and makes it point to track 18. Normally the pointer of the file points to another track, not track 18, because track 18 is not meant to hold code. And so it just
33:23
moves this pointer to track 18. In this way, it tries to be re-executed when the user loads the file. There is an additional check, so the virus
33:42
also checks whether the file is already infected. If it finds that the PRG file already points to track 18, which is basically impossible in a normal situation, it just skips the file and says, okay, it's probably already infected. So you can see that there is also error checking, and nothing is left to chance. So it's really something
34:09
that is well done in this case. So it's now time to infect the file. The virus gets the track number for the file itself. It replaces the track and sector
34:24
number, obviously. It replaces the values with its own track and sector. Track will be 18 for sure. Sector will be the sector where it placed its three blocks. It replaces them so that the PRG file points there. And then it also makes the very last block of the virus itself
34:41
point back to the PRG file so that the execution of the PRG is preserved and the user maybe does not realize that there was an infection. This did not work every time, but it did work. So let's recap what the infection process actually is. We had, as
35:03
we remember, the loop waiting for disk activity. Then, when there is disk activity: does track 18 have three free blocks? If yes, it starts the
35:21
infection process. If not, it exits. Then, if the three blocks are free and a PRG file is found, it starts the infection process. If a PRG file is not found, it does something like, let's say, plan B. It just renames the disk to Bula rules and
35:41
exits, because it's not trying to do something harmful or bad to the system, just renaming the disk with a new name. Okay, I promised you that we would go through the two versions of the virus. Basically, the two
36:00
versions are very similar. I don't know why there were two major releases between them, but the two versions are very similar. The only differences between the two are that the loading of version 8 is a bit stealthier, so you don't have the flashing screen, and that it also hooks the save command. Hooking on the Commodore 64 was pretty trivial to obtain,
36:25
because the addresses of the routines are stored at 0300 to 03FF. So it's just a matter of replacing an address there with a new one to redirect a command. In this case, version 8 replaces the save vector with the reset vector, so that if the user types save to save
36:46
something, the computer just gets reset and everything is cleaned up, but at this point the virus is still resident on the 1541 device. Okay, I published some tools, because this exercise was done
37:09
with several tools, but I also built my own tools for doing this analysis. I hope you can find them useful, maybe to do some other analysis of viruses and malware, or maybe
37:20
just other programs. This is the 010 Editor template, which has been built to read and analyze the disk structure. As you can see at the bottom here, you can analyze, for example, the structure of the BAM and of the directory, showing the bitmap of each block, and so on. So it helped me to analyze what the virus
37:44
was doing, and it's pretty neat to use for analyzing Commodore 64 disk images. Then I created a couple of Ghidra scripts in order to map the code. As I said, the Ghidra analyzer was also decompiling, but I
38:02
mostly used assembly in this case, and these scripts just map, for example, all the KERNAL and ROM calls for the Commodore 64 into the assembly listing, adding comments on what the code is doing. So applying this kind of script allows you to streamline your analysis of the code. And the
38:25
same has been done for the 1541 ROM calls, because the 1541 had its own additional ROM calls, the M-W command we saw, the U3 command we saw, and so on. I also published the two Ghidra databases I created. Everything is heavily
38:45
commented, as you can see here. These are the complete analyses of the two virus samples. You can open them with your Ghidra and have a look if you want. They will be available on GitHub as well, just for reference, if you want to have a look. And
39:03
let's jump to the conclusion: why I decided to do this talk today with you. Okay, we are at the end, and we saw that fully functional malicious code can be put in 700 bytes. We saw that this was a virus that was completely functional with a
39:22
lot of functions, with everything done in the proper way: replication, persistence, even some really nice things like moving itself onto different devices, and so on. The techniques used by the virus were not new, because they were used by legit software as well,
39:41
so they were not discovered by this virus for sure, but it was in any case interesting to complete this knowledge, because it was really helpful for me as a reverse engineer. And I'll also tell you why. Okay, as I said, the scripts, together with the fully commented Ghidra databases, are available on my GitHub and you
40:03
can download them and have a look if you want. And let's have a look at what we learned, or what I learned and I hope you learned as well, because that's the real core of the talk from my point of view. As I said, I'm a reverse engineer, and from a reverse engineer's point of view, what I learned, or let's say recalled, because sometimes it's just a
40:23
matter of recalling things, not really learning, since you tend to forget some things, with this analysis is that, for example, few assembly instructions do not mean little functionality. We saw 700 bytes of software doing a lot of things, like replicating, writing, and checking for errors, which is not really something that you can
40:45
take for granted, even in higher-level programs, so that's something that you have to consider. And that's really interesting for us as reverse engineers. So a small program is not necessarily doing few things. Then, the other important
41:03
thing that I'm sure I tend to forget, and I think it's a gray area in malware analysis nowadays, is that we must not forget external devices when we are looking at malicious software. In this case, we saw that we had two CPUs running on two different devices of the computer itself. Right now, think about how many chips you have in a
41:24
modern computer. I'm sure that you can tell me, okay, but these are not executing code, or you cannot access them directly. Yeah, true. But you have device drivers accessing these kinds of things. Think about it. Think about the fact that you
41:41
may misuse a device driver in order to, I don't know, execute code, write code, or store code somewhere where we don't expect to see it. We have a lot of additional things in our systems right now. And as far as I can tell, this is kind of a gray area. It's not really explored as regards malicious
42:05
software. So it's something we need to take into consideration. And then the third thing I learned, and would also like to pass on to you, is that assembly proficiency opens a lot of doors. I tend to see that assembly is a bit of, let's say, a
42:21
forgotten language, so it's not taught anymore and nobody cares about it. I'm not saying here that you have to write your next word processor in assembly, which is obviously not the case. And you probably don't need to write anything in assembly in most cases. But it's very important, from my point of view, that everybody working in security
42:41
knows assembly, and knows it properly, because this really opens a lot of doors. From my point of view, for example, in this kind of analysis, I didn't know the 6502 architecture. I learned it while doing the analysis we went through together this morning. And the fact that I had knowledge of Intel assembly or ARM assembly really streamlined for me
43:05
the analysis of this simple architecture. So that's something that we need to keep in mind. Also, when we think about the studies of new people coming into the security field and so on, we must not forget that assembly language is very important for our purposes. And okay, these are all the references for what I used and what I was
43:28
speaking about today. I want to leave this slide up for some seconds so that everybody is credited in the proper way. And okay, I think that I'm
43:40
at the end of my presentation. So I just want to say thank you to everyone here for staying with me for the entire presentation. And if there are any questions, maybe we have some minutes. I don't know, five minutes. I don't know if any of you has a question or something. Okay. Yeah. Okay.
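As a recap of the infection chain described in the talk, here is a minimal Python sketch. It is a model under stated assumptions, not the virus's real 6502 code: the disk is a plain dictionary of 256-byte blocks, and `new_disk`, `find_prg`, and `infect` are hypothetical helper names. The sketch takes the first three free sectors it meets on track 18 without requiring them to be adjacent, and it reserves them only after a PRG is found, which simplifies the order given in the talk. The layout it relies on (BAM in sector 0 of track 18, directory starting at sector 1, 32-byte directory entries, a track/sector link in the first two bytes of each block) follows the standard 1541 disk format.

```python
TRACK_DIR = 18        # directory track on a 1541 disk (0x12)
SECTORS_T18 = 19      # track 18 holds sectors 0..18
PRG_CLOSED = 0x82     # directory type byte of a properly closed PRG file


def is_free(bam, sector):
    """BAM entry for track t starts at offset 4*t; a set bit means 'free'."""
    return bool(bam[4 * TRACK_DIR + 1 + sector // 8] & (1 << (sector % 8)))


def mark_used(bam, sector):
    """Clear the sector's bit and decrement the track's free-sector count."""
    base = 4 * TRACK_DIR
    bam[base + 1 + sector // 8] &= ~(1 << (sector % 8)) & 0xFF
    bam[base] -= 1


def new_disk():
    """Tiny stand-in for a disk image: {(track, sector): 256-byte block}."""
    disk = {(TRACK_DIR, s): bytearray(256) for s in range(SECTORS_T18)}
    bam = disk[(TRACK_DIR, 0)]
    base = 4 * TRACK_DIR
    bam[base] = SECTORS_T18
    bam[base + 1:base + 4] = bytes([0xFF, 0xFF, 0x07])  # 19 sectors free
    mark_used(bam, 0)   # the BAM itself
    mark_used(bam, 1)   # first directory sector
    return disk


def find_prg(disk):
    """Walk the directory chain from 18/1; return (block, offset) or None."""
    track, sector = TRACK_DIR, 1
    while track != 0:
        block = disk[(track, sector)]
        for off in range(0, 256, 32):          # eight entries per sector
            if block[off + 2] == PRG_CLOSED:
                return block, off
        track, sector = block[0], block[1]     # follow the sector link
    return None


def infect(disk, payload_blocks=3):
    bam = disk[(TRACK_DIR, 0)]
    # Scan track 18 from its last sector toward the first one.
    free = [s for s in range(SECTORS_T18 - 1, -1, -1) if is_free(bam, s)]
    if len(free) < payload_blocks:
        return "no room"
    hit = find_prg(disk)
    if hit is None:
        return "no PRG (plan B: rename the disk)"
    block, off = hit
    if block[off + 3] == TRACK_DIR:            # start already on track 18
        return "already infected"
    chain = free[:payload_blocks]
    original = (block[off + 3], block[off + 4])
    for i, sector in enumerate(chain):
        mark_used(bam, sector)
        # Each payload block links to the next; the last one links back
        # to the original first block of the PRG file.
        nxt = (TRACK_DIR, chain[i + 1]) if i + 1 < len(chain) else original
        disk[(TRACK_DIR, sector)][0:2] = bytes(nxt)
    block[off + 3], block[off + 4] = TRACK_DIR, chain[0]
    return "infected"
```

To try it, create a disk with `new_disk()`, write a type byte of `0x82` plus a start track and sector into the first entry of block `(18, 1)`, and call `infect` twice: the second call should take the "already infected" branch because the entry now starts on track 18.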
44:10
Okay.
44:32
I'm sorry. I'm not sure if that is what you are asking, but the removal of the hook itself, the removal process, is just done by the reset itself.
44:45
So when you reset the machine, everything is brought back to the original state. So the save vector is just replaced with the correct address. So when you type save with the vector replaced, the Commodore 64 gets reset, but then everything is
45:03
brought back to the original state. Thank you. Oh, yep.
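The answer above can be illustrated with a small Python model of the save hook. This is a sketch under stated assumptions, not code from the virus: the addresses are taken from common Commodore 64 memory maps (SAVE vector at 0x0332, stock SAVE routine at 0xF5ED, cold-start routine at 0xFCE2) and should be checked against a reference before relying on them, and `write_vector`, `read_vector`, `reset`, and `do_save` are hypothetical helper names.

```python
ISAVE = 0x0332          # assumed address of the SAVE vector (low byte first)
DEFAULT_SAVE = 0xF5ED   # assumed address of the stock SAVE routine in ROM
RESET_ENTRY = 0xFCE2    # assumed address of the cold-start (reset) routine


def write_vector(ram, addr, target):
    """Store a 16-bit address in little-endian order, as the 6502 expects."""
    ram[addr] = target & 0xFF
    ram[addr + 1] = target >> 8


def read_vector(ram, addr):
    return ram[addr] | (ram[addr + 1] << 8)


def reset(ram):
    """Model of a reset: the RAM vector is rewritten with the ROM default."""
    write_vector(ram, ISAVE, DEFAULT_SAVE)


def do_save(ram):
    """Dispatch SAVE through the RAM vector, like an indirect jump."""
    if read_vector(ram, ISAVE) == RESET_ENTRY:
        reset(ram)               # the hooked vector lands in the reset code
        return "machine reset"
    return "file saved"
```

In this model, hooking is a single `write_vector(ram, ISAVE, RESET_ENTRY)`. The first `do_save` after that resets the machine, and because the reset restores the default vector, the next `do_save` behaves normally again. Nothing in this model touches the drive, which matches the point in the talk that the code on the 1541 stays resident.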
45:33
Yeah, I think so. Let's say that it's obviously not just a matter of this CPU, but okay, this is obviously a CPU that is still used,
45:43
so you can apply the same concepts that we saw today. Probably not all of these concepts, because some of the concepts we saw together were really specific to the Commodore 64 architecture, but all the rest, like the assembly code and so on, is exactly the same.
46:04
Yeah, yeah, yeah, yeah. That's exactly the same. Okay. Okay. Thank you so much. Thank you. Really.