Implementing the NTFS filesystem in Rust
Formal Metadata
Title: Implementing the NTFS filesystem in Rust
Number of Parts: 287
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/57066 (DOI)
Transcript: English (auto-generated)
00:14
Hello everybody, thanks for tuning into my talk today. It's an honor to be speaking at FOSDEM,
00:21
even though we all have to stay at home. My name is Colin Finck and in the next 60 minutes, I want to tell you a bit about the internals of Microsoft's proprietary NTFS file system. Despite being the most widespread desktop file system, NTFS is still uncommon in the open source world.
00:43
So what I did was implement the NTFS file system in Rust over several weekends of 2021. And many properties of the Rust programming language have actually made it possible to quickly release a stable NTFS crate.
01:01
And this is especially what I also want to talk about today. So a little bit about myself. I've worked on the ReactOS project for the past 15 years, and I started there as just another kernel and user mode code contributor,
01:21
but quickly took on various responsibilities over time, like build environments, infrastructure, and also some project management tasks. One way or another, I have probably touched all parts of the operating system by now after that time. It is important to say that, unlike most projects presented at FOSDEM,
01:42
ReactOS is a Windows-compatible operating system and not Unix-based, giving you some idea why I'm presenting a proprietary Microsoft file system today. I actually had my first contact with Rust in 2017 when I ported the HermitCore unikernel
02:01
that was developed at a university research institute from C to Rust for my master's thesis. That undertaking went very well in just a couple of months and, driven by this success, I fully embraced Rust and now use it for every new project where it makes sense.
02:22
And I encourage you to do the same. In my day job, I'm a software engineer at the German startup ENLYZE, where I reverse engineer proprietary systems in the industrial manufacturing space and write compatible connectors in Rust. But now, what's the actual motivation
02:42
behind choosing NTFS as my target? My primary idea is creating a Windows and likewise a ReactOS bootloader in Rust. To this day, ReactOS still lacks a working UEFI bootloader. Note that we cannot just use GRUB here. The Windows boot process is fundamentally different to Linux.
03:02
And if you're interested in the details, Alex Ebragen covered those in his talk at FOSDEM 2007. If you compare the Windows boot process to Linux, Windows bootloader would roughly bundle all stages of GRUB plus the initial RAM disk. Windows is also exclusively booted
03:20
from NTFS partitions these days, and ReactOS wants to get there as well with full NTFS support in the long term. So if you develop a new Windows bootloader now, it had better have NTFS support right from the start. I brought you a screenshot of ReactOS's current FreeLoader, and you can nicely see it loading the kernel
03:42
and drivers from the system partition before the kernel takes over control. There have also been some attempts at writing such a UEFI loader in C. However, we had a hard time reusing the existing FreeLoader code, which is fully tied to BIOS and 16-bit mode,
04:01
as well as integrating the EDK2 into our build system. Both things are actually no-brainers in the world of Rust: we only have one build system, and code reuse is actually a breeze. Eventually, long into the future, I could even think about going a step further
04:20
and creating a coreboot payload out of my Rust UEFI loader, but that's for later. I also want my crate to be universal enough to serve as a file system driver for the growing number of Rust operating systems. There's Redox, there's Tock, there's the HermitCore I worked on,
04:40
Oxide recently released Hubris, just to name a few of the contenders in the emerging space of Rust operating systems. It doesn't even need to be a Rust-only operating system. Linux now supports Rust inside the kernel. Windows and ReactOS are highly modularized. That means you can, for example,
05:03
let a C kernel run a Rust driver. This isn't rocket science at all. I have done similar things during my work on HermitCore. What motivates me even more is contributing towards the excellent Rust no_std ecosystem,
05:20
which can probably be considered one of Rust's killer features. To understand that, just ask yourself, why is there actually a file system implementation in the kernel? And a totally different one for the same file system in the bootloader. There might even be a third file system implementation for the user mode tools,
05:40
and I could think of a fourth one for the file system checker. I mean, this is such a waste of developer resources. Why can't we just reuse one implementation for all four scenarios? Well, with C, this is much harder, but with Rust's no_std, we have for the first time an ecosystem of reusable code
06:02
that is not tied to a runtime library or a particular operating system. And libraries can also simply be added to your project through a single line in the project file. The crates.io repository, which you see in the screenshot here, then does the rest.
06:21
At ReactOS, I have spent more time than anyone should in life to make different C code bases compatible. You know, there are different header guards to follow. Sometimes you have to follow the header inclusion order. And there are just so many pitfalls to get different code bases compatible
06:40
when they actually want to do the same things. But all of this is not fun. And ultimately, there's also just no value in header files that define the same things over and over again. This is also why I bet that no_std libraries in Rust can go further than most C libraries will ever do.
07:04
And finally, everyone has likely had their CVE stories lately, and file system implementations are no exception here. Just in 2021, there was a critical CVE in Microsoft's NTFS implementation. To be fair, this also applies to the open source NTFS-3G implementation,
07:23
but this isn't even restricted to proprietary file systems. In 2016, two Oracle developers ran a fuzzer on a variety of Linux file systems, and they could often lock up the system in just minutes. Needless to say, all affected code was written in C.
07:43
As such, another motivation for my crate is creating a file system driver in a language where the compiler rules out many of these problems at build time. Over the years, there have been many NTFS implementations
08:01
for multiple operating systems, but only a handful of them have evolved far enough to support full read and write operation. First of all, there is of course the reference driver in Windows, which introduced the file system with the Windows NT release in 1993 and is developed up to this day.
08:22
Remember, back then we still had DOS and the DOS-based Windows 9x line, and both of them lacked NTFS support. Microsoft had no interest in fixing that, so the Winternals company stepped in and released NTFSDOS Professional in 1996.
08:41
That product was developed until 2002. Just in that year, work on the captive NTFS project also started, which was the first driver to provide native read-write support for NTFS on Linux, and it was developed until 2006.
09:03
Note that both implementations are totally independent. The reason why I named them together anyway is that they have one thing in common. They don't actually implement NTFS. Back then, NTFS was still not reverse engineered enough, so people considered it easier
09:22
to just simulate an NT kernel environment and implement just enough APIs to run the reference NTFS Windows driver within that simulated environment on a different operating system. To achieve that task, notably, Captive NTFS
09:41
also uses some ReactOS code for that. In 2006, we finally got the first real read and write open source implementation of NTFS, and it was presented under the name NTFS-3G, third generation. This one uses a FUSE user space file system API on Linux,
10:01
and this one is also supported by other operating systems. To this date, NTFS-3G remains the most popular way of accessing NTFS on non-Microsoft operating systems, and it is also commercially backed by the Tuxera company. For example, they have also released
10:21
a user-friendly NTFS for Mac product. Finally, the Paragon company has also developed commercial NTFS drivers with write support for a long time, since at least 2002. I mention Paragon especially because they recently made some news.
10:42
They released one of their drivers as open source, and it got integrated into the upstream Linux kernel. Now let's get to some facts about NTFS, or the New Technology File System, in its full glory. Like it or not, NTFS must be considered
11:02
the most popular desktop file system by a huge margin, simply because Windows still accounts for nearly 75% of the global desktop OS market share. This has profound implications on the entire market. For example, if you go into a store to buy an external hard drive,
11:21
you can expect it to come pre-formatted with NTFS, simply because it's very likely that it's going to be used with Windows. The NTFS file system was initially released with the first Windows NT version in 1993, as I already said, and the current version is NTFS 3.1 from 2001.
11:43
This one was released with Windows XP and is still used on the latest Windows 11. But note that this version number only really applies to the internal file system structures. Data like the journaling log or used compression formats
12:02
may still be different between Windows versions, and this is why Microsoft also warns about subtle compatibility issues when opening a system partition of a newer Windows on an older Windows, for example, when that system partition
12:21
has been hibernated. NTFS has been using Unicode file names and 64-bit values ever since. And even though the concrete formats may look odd today, the file system can be considered future-proof, and I find that remarkable for 1990s technology.
12:42
Now, what is actually so odd about the NTFS Unicode strings? To understand that, we have to dig a bit into history. Remember that Windows NT was the first major operating system to natively support Unicode in 1993.
13:01
Back then, Unicode was still in its infancy. Today's widespread UTF-8 encoding didn't even exist until 1993. So Microsoft had to use a different encoding, and when discussing that, someone probably said, 64K characters ought to be enough for anybody,
13:21
and Windows went with the fixed-length UCS-2 encoding, which always uses two bytes per character. Granted, back then, this was not a bad decision for the time being. Unicode didn't define any code points beyond the UCS-2 range until 1996,
13:40
and even the defined code points weren't used until around 2000. But even more importantly, computers at that time were very CPU- and space-constrained. So one advantage of UCS-2 is that you can determine the string length through a simple division by two,
14:00
something that, for example, UTF-8 cannot offer. At the same time, it doesn't waste as much space as a fixed-length four-byte encoding does. Microsoft probably already had to explain why some binaries got twice as large as their ASCII counterparts, but just imagine them talking about binaries
14:21
which had four times the size. However, Windows also couldn't infinitely stay on the 64K characters offered by UCS2, so it partly moved to UTF-16 with Windows 2000. Windows 2000 therefore added an alternative
14:41
four-byte character encoding as part of UTF-16 to represent more code points. I said they partly moved because some things like the upcase table had a predefined format with two bytes per character, and the upcase table is what defines case insensitivity in NTFS.
15:03
This means NTFS today isn't really case-insensitive with respect to later added characters. A prominent example in the German language is the sharp S. If you have an NTFS drive, try it out. The capital sharp S only got added
15:22
to the German language in 2017, and what you can now do is create a file with a capital sharp S and a file with a lowercase sharp S, and NTFS will allow both of them to happily coexist even though it claims to be case insensitive.
15:44
Now, how did I implement the NTFS Unicode strings? String operations are actually very common in any file system implementation and should therefore have minimum overhead. For handling the NTFS Unicode strings, I therefore defined an NTFS string type
16:01
that just wraps a byte slice of raw Unicode bytes. You may ask, why not a u16 slice, given that we work on two-byte characters? Well, endianness comes into play here. NTFS always uses little-endian Unicode strings on disk,
16:20
so if we treated two bytes as a u16 on a big-endian system, it wouldn't work just by transmuting the bytes. If we did an endianness conversion up front, we would need to allocate additional storage, and this incurs massive overhead and is therefore no option
16:41
for handling strings with low overhead. But thanks to Rust's standard traits, working with my byte-slice-wrapping NTFS string is no problem at all. It implements Display, Eq, and Ord to print and compare NTFS strings with each other.
17:01
Additionally, it also implements PartialEq and PartialOrd to compare NTFS strings with Rust's UTF-8 string slices. We can compare them safely here without allocating extra memory and by using nothing more than the comparison operators, unlike in C. And if you actually want an allocated string
17:22
instead of working on a slice, two functions exist that return an owned UTF-8 string. To be honest, the NTFS string type is mostly taken over from a previous crate, nt-hive, which parses Windows registry hives. I'm open to any suggestions on how to best split that off into its own crate
17:42
and be useful to as many projects as possible. Thanks to Windows and also UEFI, the UTF-16 strings will still stay with us for a while. Also, NTFS timestamps are equally odd to newcomers.
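Before moving on to timestamps, the string type just described can be sketched roughly like this. This is a simplified illustration with hypothetical names, not the crate's actual code: a wrapper over raw little-endian UTF-16 bytes that compares against a UTF-8 &str without allocating.

```rust
// Hypothetical sketch of a byte-slice-wrapping UTF-16LE string type.
pub struct NtfsStr<'a>(pub &'a [u8]);

impl<'a> NtfsStr<'a> {
    /// Iterate the raw bytes as u16 code units, decoding little-endian pairs.
    fn code_units(&self) -> impl Iterator<Item = u16> + 'a {
        let data = self.0;
        data.chunks_exact(2)
            .map(|c| u16::from_le_bytes([c[0], c[1]]))
    }

    /// Decode into chars, substituting U+FFFD for invalid surrogates.
    pub fn chars(&self) -> impl Iterator<Item = char> + 'a {
        char::decode_utf16(self.code_units())
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
    }
}

// Comparison against a UTF-8 string slice, char by char, no allocation.
impl PartialEq<str> for NtfsStr<'_> {
    fn eq(&self, other: &str) -> bool {
        self.chars().eq(other.chars())
    }
}

fn main() {
    // "Rust" encoded as little-endian UTF-16 bytes.
    let bytes = [b'R', 0, b'u', 0, b's', 0, b't', 0];
    let s = NtfsStr(&bytes);
    assert!(s == *"Rust");
}
```

The key design point from the talk survives even in this toy version: equality works directly on the on-disk byte order, so no intermediate buffer is ever allocated.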
18:02
If you come from the Unix world, you are used to count seconds since 1970, but Windows and NTFS have their epoch date defined to the beginning of the year 1601. So why 1601? Well, thanks to our friend, Pope Gregory, we have to insert a leap day
18:21
into every year that is divisible by four, but not if it's divisible by 100, but again, if it's divisible by 400. This results in cycles that repeat exactly every 400 years. Calculating with respect to these cycles greatly simplifies calendar math. When Windows NT was developed in the 1990s,
18:42
the then current cycle ran from the year 1601 until the end of 2000. And that cycle was therefore chosen as the anchor for all timestamps. Now, even with second granularity, calculating from 1601 to the 1990s
19:01
already needed a 64-bit integer. So now that we've already chosen a 64-bit value, we have such a huge range of values we can express, and we could make the granularity even finer. This is why Windows and NTFS chose to calculate time in 100 nanosecond intervals
19:21
and still not overflow until the year 60056. Now you also understand why the year 2038 problem never existed for NTFS. In fact, the entire operating system does not use Unix timestamps, except where required for compatibility with C.
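The arithmetic just described can be checked in a few lines. The offset between the 1601 epoch and the Unix epoch is the well-known constant of 11,644,473,600 seconds; the function name here is illustrative.

```rust
// NTFS counts 100 ns intervals since 1601-01-01.
const INTERVALS_PER_SEC: u64 = 10_000_000; // 100 ns intervals per second
const EPOCH_DIFF_SECS: u64 = 11_644_473_600; // 1601-01-01 to 1970-01-01

/// Convert an NTFS timestamp to Unix seconds (valid for dates after 1970).
fn ntfs_to_unix_secs(filetime: u64) -> u64 {
    filetime / INTERVALS_PER_SEC - EPOCH_DIFF_SECS
}

fn main() {
    // The NTFS value for the Unix epoch itself maps back to zero.
    assert_eq!(ntfs_to_unix_secs(116_444_736_000_000_000), 0);

    // A u64 of 100 ns intervals covers roughly 58,000 years from 1601,
    // which is where the year ~60,000 figure in the talk comes from.
    let years = u64::MAX / INTERVALS_PER_SEC / 31_556_952; // avg Gregorian year
    assert!(years > 58_000);
}
```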
19:45
What did I do in Rust? Well, in Rust I've implemented an NTFS time type that just wraps the 64-bit unsigned integer value and a simple getter for returning that timestamp. But depending on your enabled features, you may never need it.
20:03
So if you have enabled the chrono feature, you get some converters from and to the popular chrono DateTime types. And if you're running with the standard library, you can also convert from the SystemTime type
20:20
of the standard library. I bet that the NTFS time type is also useful to other people, and again, I'm open to your suggestions for splitting it off into its own crate. Now, there are a few buzzworthy features of NTFS which are always used to distinguish it from other file systems.
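The time wrapper just described can be sketched as follows. Names loosely follow the description but are hypothetical, not the crate's exact API; the conversion to the standard library's SystemTime reuses the 1601-to-1970 offset constant.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Seconds between the NTFS epoch (1601) and the Unix epoch (1970).
const EPOCH_DIFF_SECS: u64 = 11_644_473_600;

/// Wrapper around the raw on-disk value: 100 ns intervals since 1601.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NtfsTime(pub u64);

impl NtfsTime {
    /// Simple getter for the raw 64-bit timestamp value.
    pub fn nt_timestamp(self) -> u64 {
        self.0
    }
}

impl From<NtfsTime> for SystemTime {
    fn from(t: NtfsTime) -> SystemTime {
        // Split into whole seconds and leftover 100 ns intervals.
        let secs = t.0 / 10_000_000;
        let nanos = (t.0 % 10_000_000) as u32 * 100;
        // Shift from the 1601 epoch to the 1970 epoch.
        UNIX_EPOCH - Duration::from_secs(EPOCH_DIFF_SECS) + Duration::new(secs, nanos)
    }
}

fn main() {
    // The NTFS timestamp of the Unix epoch converts to exactly UNIX_EPOCH.
    let t = NtfsTime(116_444_736_000_000_000);
    assert_eq!(SystemTime::from(t), UNIX_EPOCH);
}
```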
20:41
As you can imagine, I didn't implement all of them in the first version of my NTFS crate, but we are only covering alternate data streams, B-tree indexes, hardlinks, and sparse files so far. We'll get into the details of them later. So let's first have a detailed look
21:02
into the anatomy of an NTFS file. A file is the fundamental data structure of the NTFS file system. Even the master file table, which is actually the list of all files on disk, is itself organized as a file.
21:21
When we think about a file, we usually imagine just the name and some associated data, but forget about that. A file on the NTFS file system looks more like this. The file name and the file's data are just two of multiple possible attributes that can belong to a file.
21:40
Let's have a look into the standard information attribute first. This is where NTFS stores the C time, A time, and M time, along with another timestamp for when the record was last modified. This is also where you find the DOS file flags, such as archive, hidden, read-only, or system. DOS and Windows people probably know
22:01
what I'm talking about. Let's move on to the file name attribute. This is where NTFS stores the file record number of the parent directory, the actual file name, and the namespace that file name belongs to. So the namespace is an enum, and it's usually WIN32 for the case-insensitive
22:23
long file names that we all know from Windows. It may also be DOS for a legacy 8.3 name, or a combination that means WIN32 and DOS if a given name fulfills both criteria for DOS and WIN32 names.
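The namespace enum described here maps naturally onto a plain Rust enum; the discriminant values below follow the documented on-disk format, while the type name itself is illustrative.

```rust
/// The namespace byte stored in a file name attribute.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum NtfsFileNamespace {
    /// Case-sensitive, very few character constraints (practically unused).
    Posix = 0,
    /// Case-insensitive long names, the common case on Windows.
    Win32 = 1,
    /// Legacy 8.3 short names.
    Dos = 2,
    /// A single name that satisfies both WIN32 and DOS rules.
    Win32AndDos = 3,
}

fn main() {
    // A short all-uppercase name like "README.TXT" typically needs only
    // one file name attribute, stored with the combined namespace.
    let ns = NtfsFileNamespace::Win32AndDos;
    assert_eq!(ns as u8, 3);
}
```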
22:41
For historic reasons, NTFS also supports a case-sensitive POSIX namespace with fewer character set constraints, but this is practically not used anymore. There may even be more than one file name attribute if a file has a different long and short name, which is usually the case. And NTFS implements hardlinks
23:02
just by adding another file name attribute to a record. This is the entire magic behind hardlinks on NTFS. Now, let's get on to the data attribute, which obviously contains the actual raw data of the file.
23:21
NTFS, again, supports multiple attributes of this type. That is the entire magic behind the alternate data streams feature. Apart from NTFS, I only know of HFS on classic Mac OS having a similar feature. There, you could have a single file with a data fork and a resource fork.
23:42
Now, the entire file record always has a size of one kibibyte, and as a result, only small data of a few hundred bytes can fit into a file record. Now, when we append additional data to the file, we quickly reach the one-kibibyte boundary of the record.
24:03
At this point, NTFS needs to allocate some clusters anywhere on the disk and move the entire data attribute there. This is now called a non-resident attribute. What remains in the file record is a pointer to the cluster range where the data has been moved to,
24:22
a so-called data run. We can happily add more data now up to the boundary of the allocated cluster range. Let's edit the file again, adding even more data. NTFS now needs to allocate some more clusters to continue the data and references them via another data run.
24:42
This is how a file gets fragmented on disk. Now, when a file gets too fragmented, we don't even have enough space left in the file record to even manage the data run information. At that point, NTFS allocates a new file record and moves all attributes that can be made non-resident.
25:01
Now, there is enough space to store the second data run's pointer. To link the original file record to the new file record on the right, NTFS adds an attribute list attribute as the glue in between. There wasn't enough space in the original file record on the left to begin with,
25:21
so the attribute list is usually non-resident as well. The attribute list references all attributes of the original file record. In this example, it contains a back reference to the standard information attribute that is still part of the original record.
25:43
It links the original record on the left to the record on the right, which contains the new file name and data attributes. Now, let's add some data to our file one more time, and imagine that we even hit the one kibibyte boundary
26:02
of the other file record. An attribute list now allows us to continue the same data attribute in a third file record, and references that file record via a second entry for the same attribute.
26:21
Fortunately, this is as complex as it gets. There can never be more than one attribute list in a file, nor can they be nested. A single attribute list is enough for NTFS to handle even large and very fragmented files. Now, from the perspective of a user, you don't want to deal with these details all the time.
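The walkthrough above can be condensed into a tiny model. This is a sketch with hypothetical names, not the crate's actual API: an attribute value is either resident bytes inside the one-kibibyte file record, or a list of data runs pointing at cluster ranges, where every additional run means one more fragment on disk.

```rust
/// One extent of a non-resident attribute on disk.
pub struct DataRun {
    /// First logical cluster number of this extent.
    pub lcn: u64,
    /// Number of contiguous clusters in this extent.
    pub cluster_count: u64,
}

pub enum AttributeValue<'a> {
    /// Small values live directly inside the 1 KiB file record.
    Resident(&'a [u8]),
    /// Larger values are moved out and described by data runs.
    NonResident(Vec<DataRun>),
}

impl AttributeValue<'_> {
    /// Allocated length in bytes, given the volume's cluster size.
    pub fn len(&self, cluster_size: u64) -> u64 {
        match self {
            AttributeValue::Resident(data) => data.len() as u64,
            AttributeValue::NonResident(runs) => {
                runs.iter().map(|r| r.cluster_count * cluster_size).sum()
            }
        }
    }
}

fn main() {
    let small = AttributeValue::Resident(b"hello");
    assert_eq!(small.len(4096), 5);

    // Two fragments, five clusters total: a mildly fragmented file.
    let big = AttributeValue::NonResident(vec![
        DataRun { lcn: 100, cluster_count: 2 },
        DataRun { lcn: 900, cluster_count: 3 },
    ]);
    assert_eq!(big.len(4096), 5 * 4096);
}
```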
26:42
Most of the time, you only want to access an attribute, no matter if it's stored resident as part of the file record, or deep down in another record, glued together via an attribute list. So how can we leverage Rust to bring some order into this complexity?
27:02
The answer is traits. Rust comes with a lot of useful traits, and you probably know Iterator. For Iterator, you just have to implement a single method, next, to get an arbitrary next element. And when you have done so, you get a lot of additional methods for free.
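The "implement one method, get many for free" point can be demonstrated with a toy iterator, purely illustrative and unrelated to the crate itself: we only write next, yet linear search, aggregation, and zipping all work immediately.

```rust
/// Counts down from an initial value to 1.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;

    // The single required method.
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}

fn main() {
    // Linear search, aggregation, and zipping: all provided for free.
    assert_eq!(Countdown(5).find(|&n| n % 2 == 0), Some(4));
    assert_eq!(Countdown(5).sum::<u32>(), 15);
    let pairs: Vec<_> = Countdown(3).zip("abc".chars()).collect();
    assert_eq!(pairs, [(3, 'a'), (2, 'b'), (1, 'c')]);
}
```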
27:21
I'm talking about linear search, aggregation, and zipping here, just to name a few. You don't need to re-implement them on your own for every case. A natural first choice for my file system traits is using the Read and Seek traits from std::io. Remember that my NTFS library
27:41
shall not be tied to a particular type of disk API. These traits are just enough of an interface to let the library work on any object that can read raw file system bytes, no matter if they come from a real disk, an image file, a memory buffer, or anything else. At the same time,
28:00
the NTFS library can itself provide objects that implement Read and Seek to let the user read things like attribute values. In my crate, I have distinct Read and Seek implementations for resident attribute values and non-resident attribute values. And I also have a very special implementation
28:20
for attributes that are split into multiple entries of an attribute list. We have just seen all those three cases on the previous slide. Users of my crate don't have to deal with that, though, if they don't want to. They can just go via the NTFS attribute value enum that handles all three types
28:41
and also implements Read and Seek. But using the std::io traits directly poses a serious problem for our no_std crate, and that is the std::io error type. Both trait functions here return std::io error values. But unlike many std types
29:02
that are just re-exports from core, the std::io error type is not part of Rust's core library. And if we take a look at the type, it can't even become part of core due to the Custom variant that performs heap allocations. It could become part of alloc, that is, the no_std library for heap allocations,
29:23
but I'm not aware of any efforts in this direction. What is happening instead is that popular crates re-implement their own Read and Seek traits without the Custom error variant. One such example is the binread crate, which I'm using in my NTFS library.
29:43
Its Read and Seek traits work in no_std environments by not implementing the Custom variant, and they just map to their std equivalents when the standard library is present. I'm now using them in my crate until Rust provides a better solution out of the box.
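Such a no_std-friendly trait pair can be sketched like this. This is an illustration of the general technique, not binread's actual definitions: the error enum has only fixed variants and therefore never allocates, and with a std feature enabled a crate would additionally map the traits onto their std::io equivalents.

```rust
/// An error type with fixed variants only: no heap-allocating `Custom`
/// variant, so it is usable without the standard library.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    UnexpectedEof,
    // ...further fixed variants as needed.
}

/// A minimal no_std-compatible replacement for std::io::Read.
pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Error>;
}

/// A trivial in-memory reader to show the trait in use. A real crate
/// would also provide adapters to std::io::Read behind a `std` feature.
pub struct SliceReader<'a>(pub &'a [u8]);

impl Read for SliceReader<'_> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Error> {
        let n = buf.len().min(self.0.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        self.0 = &self.0[n..];
        Ok(n)
    }
}

fn main() {
    let mut r = SliceReader(b"NTFS");
    let mut buf = [0u8; 2];
    assert_eq!(r.read(&mut buf), Ok(2));
    assert_eq!(&buf, b"NT");
}
```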
30:01
But this is not the only pitfall. Both Read and Seek traits are supposed to move an internal cursor, so they take a mutable reference to self. That means if we only had an immutable reference, we wouldn't be able to read anything. So for a moment, let's imagine we had an NTFS struct
30:23
that stored a mutable reference to the raw file system reader. Rust's borrow checker will make sure that we never have more than one mutable reference to an object at the same time. Consequently, this design of the NTFS struct
30:40
would make it impossible to do things like reading two files on the same file system simultaneously, an unacceptable constraint for my crate. I had a look at how other projects deal with that, and I found an ext4 file system library that uses the ReadAt trait from the third-party positioned-io crate instead.
31:02
Granted, this could solve part of my problem because I could read from the file system using a stored immutable reference. Remember, multiple immutable references to the same object are no problem, but I would be back to the same problem as soon as my crate gets write support. Write support isn't possible
31:21
without actually mutating the underlying data, as you can imagine. Apart from that, I would lose all advantages of the std::io traits. These, or variants thereof, are already widespread within the Rust ecosystem. And some optimizations are also possible when you have an internal cursor
31:41
that knows about the last seek position. But the constraints don't stop at this point. If my crate shall work in a no_std environment, we also cannot use synchronization primitives to manage access to the file system reader. Synchronization primitives are always tied
32:00
to a specific operating system. Hence, moving the mutable reference into a mutex is no option. I also don't want to move borrow checking to runtime, something that Rust supports, but I would thereby lose some of the compile-time guarantees that Rust gives me. So this also rules out a RefCell container.
32:22
So what options do we actually have? I came up with the following rules for my crate to solve this problem. First rule, I only store immutable references in base structs. This is no solution yet for the file system reader
32:40
because that one continues to be mutable. But to give an example, I could have a base NTFS object that stores things like cluster size, sector size, serial number, static properties that don't change for an open file system. And of course, I want to be able to store the reference to that
33:01
and don't copy all those objects over and over again. So this first rule still allows me to store immutable references for situations like this. Second rule, a base struct never contains any mutable references. Instead, mutable references are only passed
33:23
whenever a function needs them. Provided that the functions are fine-grained enough, this finally allows me to have two simultaneous NTFS file structs and read from them in alternation while keeping Rust's borrow checker happy. The only downside is that the responsibility
33:41
of providing the correct file system reader is now shifted to the caller, but this should be acceptable. Finally, there are a few cases where the function signature is not under our control, for example, when we want to implement Rust's Iterator trait. A real-world example is this iterator
34:00
over the attributes of a file. In that situation, I implement an additional struct, NTFS attributes attached, that wraps the original NTFS attribute struct and a mutable reference to the file system reader. So here, we actually have a mutable reference inside the struct.
34:21
This struct can be created from NTFS attributes via an attach function and split up again into both parts via a detach function. Note, we obviously cannot trick Rust's borrow checker this way. As long as an attached object exists, we cannot use the file system reader for anything else
34:41
until it is detached again. It seems like this is the trade-off to be made if we want to implement Rust's standard traits. I should also highlight, I didn't come up with the attach idea myself, but borrowed it from the popular petgraph crate.
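The rules just described can be condensed into a short sketch. It uses std::io here for brevity (the real crate would use the no_std-capable trait variants), and all names are illustrative rather than the crate's actual API.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Rule 2: no mutable reference is stored in the struct; the reader is
// passed to each function that needs it.
struct NtfsFile {
    record_position: u64,
}

impl NtfsFile {
    fn read_first_byte<T: Read + Seek>(&self, fs: &mut T) -> std::io::Result<u8> {
        fs.seek(SeekFrom::Start(self.record_position))?;
        let mut buf = [0u8; 1];
        fs.read_exact(&mut buf)?;
        Ok(buf[0])
    }
}

// Rule 3: for fixed trait signatures like Iterator, temporarily attach
// the mutable reference in a wrapper struct and detach it when done.
struct Attached<'a, T> {
    fs: &'a mut T,
    next_pos: u64,
    end: u64,
}

impl<'a, T: Read + Seek> Iterator for Attached<'a, T> {
    type Item = std::io::Result<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.next_pos >= self.end {
            return None;
        }
        let pos = self.next_pos;
        self.next_pos += 1;
        if let Err(e) = self.fs.seek(SeekFrom::Start(pos)) {
            return Some(Err(e));
        }
        let mut b = [0u8; 1];
        match self.fs.read_exact(&mut b) {
            Ok(()) => Some(Ok(b[0])),
            Err(e) => Some(Err(e)),
        }
    }
}

fn main() {
    let mut reader = Cursor::new((0u8..16).collect::<Vec<u8>>());

    // Two file handles reading in alternation: each call borrows the
    // reader mutably only for its own duration, so this compiles.
    let a = NtfsFile { record_position: 3 };
    let b = NtfsFile { record_position: 7 };
    assert_eq!(a.read_first_byte(&mut reader).unwrap(), 3);
    assert_eq!(b.read_first_byte(&mut reader).unwrap(), 7);
    assert_eq!(a.read_first_byte(&mut reader).unwrap(), 3);

    // While attached, the iterator holds the only mutable borrow.
    let attached = Attached { fs: &mut reader, next_pos: 0, end: 4 };
    let bytes: Vec<u8> = attached.map(|r| r.unwrap()).collect();
    assert_eq!(bytes, vec![0, 1, 2, 3]);
}
```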
35:00
That third rule, however, is not an option for the one case I have where the returned item borrows from the iterator itself. As we just saw, implementing Rust's Iterator trait requires defining an item type, but that item type may not introduce additional type parameters. This means we cannot implement the Iterator trait at all.
35:23
And as a consequence, a helper struct with an attached mutable reference also has no compelling advantages. I bet this has to wait until generic associated types and the promised streaming iterator features land in Rust. And from what I've heard, this may happen really soon.
35:46
I'd like to dig into another important part of any NTFS implementation, namely record fix-ups. For that, let's have a look at the file record from the previous slides again. But this time we want to see the bare bytes.
36:02
For that, I brought you the on-disk representation of the 524-byte file here, but the exact size doesn't really matter. Just imagine it fits entirely into a file record. So all of it begins with the file record header that is exactly 48 bytes
36:21
and mainly contains some identifiers and offsets to everything else. Next, we have the standard information attribute. It's the first attribute here, and well, it contains no strings. So if you look at the ASCII representation, you can hardly identify it there.
36:41
Next comes the file name attribute. And here you already see from the ASCII representation that Kafka.txt is the chosen name here. This name perfectly fits both the Win32 long file name and DOS short file name rules. So only a single file name attribute is necessary here.
37:02
And finally, we have the resident data attribute, which has a value short enough to be completely embedded into the file record. But if you look closely, something is off in the data attribute. The word slightly in the data is missing the letters G and H.
37:22
Why does that happen? And why does it happen here? Turns out that this is related to the only memory region we haven't named yet, which is the update sequence. NTFS comes with a basic mechanism to detect bit rot in record structures. The assumption is that either an entire sector is okay
37:42
or it's bad. The feature works by incrementing the first two bytes of the update sequence on every change to the record structure. These bytes are also called the update sequence number. The incremented update sequence number is then copied to the last two bytes of each sector that is part of the record structure.
38:03
Previous bytes at those locations are moved into the update sequence. So let's look at our situation here. The distorted word "slightly" actually crosses a 512-byte sector boundary. The missing G and H letters are replaced by the update sequence number three,
38:21
which can also be found in the update sequence. And in that region, we also find our missing letters G and H again. Now, maybe it becomes clearer when we read a record, because there we need to reverse the entire procedure. So what we do is, we read the current update sequence number from the record
38:43
and we check that every sector of the record ends with that number. If that isn't the case for a sector, the entire record is declared as faulty. But in the happy case, the last two bytes of each sector are replaced
39:02
by the corresponding bytes from the update sequence. And we call that process record fix-up. Now, all of this is why we can't just go and read an NTFS attribute directly from disk. If the attribute is part of a record, we need to fix up that record first.
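The procedure just described can be sketched in a few lines. This assumes the common layout: `usa_offset` and `usa_count` come from the record header, the update sequence array begins with the two-byte update sequence number (USN) followed by one saved byte pair per sector, and sectors are 512 bytes. Treat those details as assumptions and verify them against real on-disk structures.

```rust
// Reverse the update-sequence protection on an in-memory record:
// check each sector's last two bytes against the USN, then restore
// the original bytes saved in the update sequence array.
fn apply_fixup(
    record: &mut [u8],
    usa_offset: usize,
    usa_count: usize,
) -> Result<(), &'static str> {
    const SECTOR_SIZE: usize = 512;
    let usn = [record[usa_offset], record[usa_offset + 1]];

    // One saved two-byte pair per sector follows the USN, so
    // `usa_count` is the number of sectors plus one.
    for i in 1..usa_count {
        let sector_end = i * SECTOR_SIZE;
        if sector_end > record.len() {
            return Err("record shorter than the update sequence implies");
        }
        // Every sector must end with the current USN; otherwise at least
        // one sector was not written completely and the record is bad.
        if record[sector_end - 2..sector_end] != usn {
            return Err("update sequence mismatch: record is corrupt");
        }
        // Happy case: restore the two original bytes that were moved
        // into the update sequence array.
        let saved = usa_offset + 2 * i;
        record[sector_end - 2] = record[saved];
        record[sector_end - 1] = record[saved + 1];
    }
    Ok(())
}

fn main() {
    const SECTOR_SIZE: usize = 512;
    let mut record = vec![0u8; 2 * SECTOR_SIZE]; // a 1 KiB file record
    let usa_offset = 48; // hypothetical placement right after the header

    record[usa_offset..usa_offset + 2].copy_from_slice(&[0x03, 0x00]); // USN = 3
    record[usa_offset + 2..usa_offset + 4].copy_from_slice(b"gh"); // saved tail 1
    record[usa_offset + 4..usa_offset + 6].copy_from_slice(b"ij"); // saved tail 2

    // On disk, each sector tail holds the USN instead of the real bytes.
    record[SECTOR_SIZE - 2..SECTOR_SIZE].copy_from_slice(&[0x03, 0x00]);
    record[2 * SECTOR_SIZE - 2..].copy_from_slice(&[0x03, 0x00]);

    apply_fixup(&mut record, usa_offset, 3).unwrap();
    assert_eq!(&record[SECTOR_SIZE - 2..SECTOR_SIZE], b"gh");
    assert_eq!(&record[2 * SECTOR_SIZE - 2..], b"ij");
    println!("fix-up applied, original bytes restored");
}
```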
39:21
If you miss that part, have fun debugging for days while your data is corrupted in two bytes. Well, we don't want to do the fix-up on the fly for every read, so it makes sense to keep the fixed-up record in memory. This is why my record structure uses a Vec.
39:42
Note that the record may not only be a one-kibibyte file record, but also a four-kibibyte index record. That amount of data makes using a stack-backed ArrayVec infeasible. So this is one of the few locations where my crate needs to perform heap allocations.
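A tiny sketch of that heap-backed record buffer. In a real no_std build this would sit behind `#![no_std]` with `extern crate alloc;` and `use alloc::vec::Vec;` — here plain std is used, where Vec is the same type. The constants reflect the record sizes mentioned above.

```rust
const FILE_RECORD_SIZE: usize = 1024; // 1 KiB file record
const INDEX_RECORD_SIZE: usize = 4096; // 4 KiB index record

// A record read from disk and fixed up once, then kept in memory.
struct Record {
    data: Vec<u8>,
}

impl Record {
    fn new(size: usize) -> Self {
        // One heap allocation per record; sizes differ too much for a
        // stack-backed buffer of the maximum size to be attractive.
        Record { data: vec![0u8; size] }
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}

fn main() {
    let file_record = Record::new(FILE_RECORD_SIZE);
    let index_record = Record::new(INDEX_RECORD_SIZE);
    assert_eq!(file_record.len(), 1024);
    assert_eq!(index_record.len(), 4096);
    println!(
        "allocated {} and {} byte records",
        file_record.len(),
        index_record.len()
    );
}
```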
40:02
And this is also where it needs alloc in a no_std environment. Don't get that wrong: a crate supporting no_std plus alloc is still far more useful than a crate that requires the standard library. For example, you can write a UEFI application with no_std plus alloc,
40:21
and the kernel side of the HermitCore project I worked on was also no_std plus alloc. But if you know a better practice than depending on alloc in such situations, I'm all ears. Finally, let's come to an NTFS concept which maps pretty nicely to Rust: B-tree indexes.
40:41
NTFS uses B-tree indexes to quickly look up arbitrary elements. These may be files, object IDs, reparse points, or security descriptors. The most popular index is a directory, and a directory in NTFS is simply represented as an index of files
41:01
sorted by the case insensitive file name. Now let's have a look at the structure of an NTFS B-tree. An NTFS B-tree has multiple entries per node and an end marker for each node. Every entry and even the end marker may have a reference to a single sub node.
41:21
If there is such a reference, the entries of the sub node come before the parent entry alphabetically. The diagram here shows how files with the names one to 10 could be organized in an NTFS B-tree. And yes, this tree is sorted.
41:41
So let's have a look at a single entry. A single entry here has multiple fields. First, the data offset, data length, and some padding, then index entry length, key length, and flags. All of these fields are mandatory and guaranteed to exist for any index entry.
42:01
But depending on the length fields and flags, an entry may additionally have a key, data, and a sub-node. These are not mandatory. So you first have to read the mandatory fields, interpret them, and then decide whether to read any further. So this is much more than just a plain old data structure that you can directly read from disk to memory.
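That two-step reading can be sketched as follows: parse the mandatory fields with explicit little-endian decoding, then use the length fields to walk the variable-length entries. The field offsets follow the description above and the commonly documented layout; treat them (and the end-marker flag value) as assumptions.

```rust
#[derive(Debug)]
struct IndexEntryHeader {
    data_offset: u16,
    data_length: u16,
    entry_length: u16,
    key_length: u16,
    flags: u16,
}

const LAST_ENTRY_FLAG: u16 = 0x02; // end-marker flag (assumed value)

fn parse_entry_header(buf: &[u8]) -> Option<IndexEntryHeader> {
    if buf.len() < 16 {
        return None;
    }
    let le16 = |i: usize| u16::from_le_bytes([buf[i], buf[i + 1]]);
    Some(IndexEntryHeader {
        data_offset: le16(0), // reused as a file reference in file indexes
        data_length: le16(2),
        entry_length: le16(8),
        key_length: le16(10),
        flags: le16(12),
    })
}

// Entries are variable-length, so reaching the fifth entry means walking
// entries one to four first — there is no O(1) indexing into a node.
fn entry_offsets(node: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut pos = 0;
    while pos + 16 <= node.len() {
        let header = parse_entry_header(&node[pos..]).expect("length checked");
        offsets.push(pos);
        if header.flags & LAST_ENTRY_FLAG != 0 || header.entry_length < 16 {
            break; // end marker or implausible length
        }
        pos += header.entry_length as usize;
    }
    offsets
}

fn main() {
    // Build a toy node: entry 1 (24 bytes, 8-byte key), then the end marker.
    let mut node = vec![0u8; 40];
    node[8..10].copy_from_slice(&24u16.to_le_bytes()); // entry 1 length
    node[10..12].copy_from_slice(&8u16.to_le_bytes()); // entry 1 key length
    node[32..34].copy_from_slice(&16u16.to_le_bytes()); // end-marker length
    node[36..38].copy_from_slice(&LAST_ENTRY_FLAG.to_le_bytes()); // flags

    assert_eq!(entry_offsets(&node), vec![0, 24]);
    println!("entries start at {:?}", entry_offsets(&node));
}
```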
42:22
And it is even less one when you take endianness into account. This also means that each index entry has a variable length. This however has consequences for lookups. If you want to access, for example, the fifth entry of a node, you first need to iterate through entries one to four.
42:42
This means textbook binary search algorithms that always pick the middle element have fewer benefits here. But it gets even more complex. File index entries have no data field, but they reuse the data offset, data length,
43:01
and padding fields to store an eight byte file reference. And if we look into the key of a file index entry, it's actually just a copy of the corresponding file name attribute that is indexed by this entry. We already have that file name structure elsewhere for interpreting such an attribute value.
43:21
So again, how can Rust bring some order into all of this? And the answer is, again, traits. To handle the different types of indexes in a generic fashion, I first introduced the trait NTFS index entry type. Every structure that implements this trait can optionally implement one of two marker traits:
43:43
NTFS index entry has data or NTFS index entry has file reference. This allows me to make a corresponding getter function only available to the index type that actually supports it. So you can't accidentally query a data field for a file index entry.
44:00
The compiler will catch this during build. In a perfect world, I could even declare has data and has file reference to be mutually exclusive, but Rust has no way to express that yet. So I'm left with taking care myself that I don't implement both of them for a single structure, but I think this is acceptable.
44:22
Now we've seen in the previous slide that key and data are not random bytes, but structured values that we should decipher. Therefore, we extend NTFS index entry type and NTFS index entry has data with type fields. Every index entry type now needs to specify
44:40
the structure type of its key. Same for any entry type that has data. Structures can be declared suitable as an index entry key or index entry data by implementing the corresponding marker trait. So to make this concrete, how does all of this look for a particular entry type? Let's take a look at the file index
45:01
from the previous example again. So previously I have already declared a struct NTFS file name to deal with file name attribute values. We also need it elsewhere. Now what I need to do for the structure is implementing the NTFS index entry key trait
45:20
to make my structure usable as an index key. With that preparation, I can now implement my file index as an NTFS file name index struct. I set its key type to NTFS file name, and I specify that a file index entry has a file reference field, but no data field.
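A condensed sketch of this trait design, with names modeled on the talk but not necessarily matching the crate's exact API:

```rust
use std::marker::PhantomData;

struct NtfsFileName; // stands in for the parsed file name attribute value

#[derive(Debug, PartialEq)]
struct FileReference(u64);

// Every index type declares the structure type of its key.
trait NtfsIndexEntryType {
    type Key;
}

// Marker traits: only implemented by index types that have the field.
trait NtfsIndexEntryHasData: NtfsIndexEntryType {}
trait NtfsIndexEntryHasFileReference: NtfsIndexEntryType {}

struct NtfsIndexEntry<E: NtfsIndexEntryType> {
    file_reference_raw: u64,
    entry_type: PhantomData<E>,
}

impl<E: NtfsIndexEntryType> NtfsIndexEntry<E> {
    // Available for every entry type, since every entry has a key.
    fn has_key(&self) -> bool {
        true
    }
}

// Only available when E carries the marker — querying a file reference
// on an index type without one is a compile-time error.
impl<E: NtfsIndexEntryHasFileReference> NtfsIndexEntry<E> {
    fn file_reference(&self) -> FileReference {
        FileReference(self.file_reference_raw)
    }
}

// The entire file-index-specific code:
struct NtfsFileNameIndex;
impl NtfsIndexEntryType for NtfsFileNameIndex {
    type Key = NtfsFileName;
}
impl NtfsIndexEntryHasFileReference for NtfsFileNameIndex {}

fn main() {
    let entry = NtfsIndexEntry::<NtfsFileNameIndex> {
        file_reference_raw: 42,
        entry_type: PhantomData,
    };
    assert!(entry.has_key());
    assert_eq!(entry.file_reference(), FileReference(42));
    // A data getter would not compile here: NtfsFileNameIndex
    // does not implement NtfsIndexEntryHasData.
}
```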
45:41
And that's the entire code that is specific to file indexes. The rest of the index code is fully generic and reused for other index types. To illustrate that, here's an excerpt of the NTFS index entry implementation. As you see, it has a type parameter E,
46:00
which must be an NTFS index entry type, such as NTFS file name index from the previous slide. Now the key method is implemented for all index entries, as every index entry must have a key. But it doesn't encode a specific type as its return value. Instead, it refers to the key type
46:20
of the passed NTFS index entry type. On the other hand, file reference is only implemented if E also implements the has file reference trait. And that's basically the entire magic. I just made NTFS indexes type-safe, where we previously had raw bytes with variable-length structures
46:41
and multipurpose fields. Now that many features have already been implemented, what is actually still missing from my NTFS crate? Well, first and foremost, caching for performance. My original idea was to leave that entirely up to the user of the crate. If the entire interface to the raw file system reader
47:03
consists of the Read and Seek traits, a user could, for example, just implement a cache between my crate and the actual disk controller. A similar real-world example for that idea is std::io::BufReader. However, this idea was deemed insufficient when I stumbled upon record fix-ups.
47:23
If only disk blocks are cached, I still need to repeat the fix-up process for every block that is retrieved from the cache. As the file system crate, I therefore need a way to explicitly tell the cache what processed data to store. And this is also what the Windows kernel's cache controller supports.
47:40
At the same time, the architecture of my new code shall not be tied to a specific kernel. And all of this together is again a point where I'm very open for suggestions. Only when that is done and the crate's architecture is validated, we can think about adding write support. I've already designed the crate as much as I could
48:02
with write support in mind, but the actual implementation is still missing. As mentioned earlier, my NTFS library also still misses some NTFS features. Compression actually shouldn't be that hard. There's already a Rust crate that implements the proprietary compression algorithms used by Microsoft in NTFS.
48:22
I actually expect reparse points and security descriptors to be equally simple. We mainly talk about further attribute and index types here. Journaling is a different beast, though, and that may actually need further reverse engineering and fundamental changes to the crate.
48:43
Finally, let's come to a little demonstration. I've brought you NTFS shell and NTFS shell is an example that comes with my crate and enables you to inspect the internal structures of an NTFS image. I originally wrote that tool to develop the library in user mode
49:01
before running the code on the bare metal in kernel mode. NTFS shell works on every platform that Rust supports, but on Windows, you can even inspect your running system partition, and that is pretty neat. I'm going to try that out here.
49:20
So NTFS shell comes with a few commands. Let's have a look and let's get some general file system information first. So we can get similar general information for every file and even the MFT is just a regular file.
49:41
Let me demonstrate that. So let's try the same for a real file. I guess everyone has a pagefile.sys in their root directory. Let's try it out with that.
50:04
And we also get a list of attributes of pagefile.sys. So we have standard information here, we have a file name, but this file also has a data attribute that is embedded into an attribute list.
50:22
Probably because the data consists of many data runs. Let's find out about that. And yes, indeed, that page file is split up into 16 fragments. We can actually do the same with folders.
50:42
Let's get some information about the top level Windows directory. So my tool works almost like a real shell. So let's jump into some subdirectories to explore them.
51:06
Now, this is the folder where the system registry resides. Windows usually protects all registry files from being accessed while the system is running. But NTFS shell can happily access the last version stored on disk.
51:28
We can also access a file, for example, by its record number. Let's try that out with the record number of the system file we just queried.
51:43
And as you see, you get the same information. Finally, we can even extract the file from the file system. And with that, I'd like to conclude my presentation.
52:00
Even after all these slides, we have still only scratched the surface of the code, but feel free to look up the remaining parts now that it is up on GitHub, crates.io, and docs.rs. I did my best to document the code, so I hope it answers any outstanding questions you might have. A big shout-out at this point
52:21
goes out to the contributors to the Linux NTFS documentation. This was by far the resource I consulted most during development. And as I'd like to say, implementations come and go, but good documentation is forever. Thank you very much for your attention. And I'm looking forward to your feedback.
52:55
Now we are back. We are back and we're gonna be live in...
53:01
Okay. I think we are live. Yes. Let's thank Colin for this wonderful presentation. I'd like to take this opportunity to thank Colin for presenting and choosing FOSDEM 2022
53:20
to present and talk about this work. And it's an honor for us. So before I open the floodgates, I would request the audience to please have a huge round of virtual applause for Colin. And yeah, if there are any questions, if you could please put in the chat box and I'd be happy to take it to Colin.
53:42
Well, thank you very much for the opportunity to present at FOSDEM. I really appreciate that. Before we see any questions, I think the overall chat was mostly positive. Everybody was super sort of happy
54:01
about what you've done, but one comment just sort of stands out: you just casually said you ported an NTFS file system in several weekends. You said it casually, but I think that is nothing short of impressive. So kudos. Thank you very much.
54:21
Well, as I said on my last slide, it's much thanks to the amazing documentation that is out there about NTFS these days. I mean, I'm standing on the shoulders of giants here who have done that amazing work on documenting NTFS. And I'm pretty sure that the public NTFS documentation
54:44
will outlast several more implementations of the file system. So I think this is the most important point. And also the reason why we didn't have that in the 90s where NTFS was less researched.
55:02
There is a question. Any thoughts on how to support async or concurrency so there can be parallel IO in flight? Maybe it's not too bad in read-only driver, but probably hard in a read-write driver. Well, if we talk about asynchronous operations,
55:24
I don't think I will implement anything like that in my crate because my Rust crate is meant to implement the basic atomic operations needed for NTFS.
55:40
So if it's about doing it asynchronously, like two operations in parallel, writing two files in parallel where both shall finish at the same time, it would be up to the caller to split up the file writing, for example, into chunks of a few kilobytes
56:02
and then write one file, write another file. So the scheduling part needs to be done by the caller. But I'm basically here with my crate to provide the underlying foundations to do the necessary NTFS operations.
56:21
And that works both synchronously and asynchronously. Sure, I hope that answers the question. There's one more. Could one use your shell to find out about a broken NTFS file system and perhaps even restore files?
56:40
Well, I spent some efforts on coming up with nice error messages and not crashing or panicking whenever I hit a broken file system. So you could, of course, try to explore a broken file system with NTFS shell and then check how far you come.
57:01
You may have some luck indeed where Windows, for example, tries to auto mount it and then fails when something is broken. My NTFS shell will only fail as soon as it actually hits the broken part of the file system.
57:20
And if there's, for example, a deleted file, but you somehow can reconstruct the file record number it once had, then you can also try to access that particular file record. I can't tell you how to get that file record number. That may need some trial and error actually, but it is definitely possible to access files
57:43
where you don't know the file name anymore just by accessing the file record number. Do you know why the root directory in your demo was in the Win32 and DOS namespace, whereas the Windows directory was in the POSIX namespace?
58:02
Well, I think this actually has historical reasons, probably from the time when Windows still had a POSIX subsystem. And the Windows directory, I mean the name actually fulfills all criteria. It fulfills the DOS, the Win32,
58:21
and also the POSIX criteria. So I could imagine it was for compatibility, but I'm not sure. This is one of the details that only Microsoft knows about, or maybe someone even there has to look it up in the code.