FreeBSD's Ext2 Implementation
Formal Metadata
Number of Parts | 24
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/15336 (DOI)
Production Year | 2014
Production Place | Ottawa, Canada
Transcript: English (auto-generated)
00:00
Hello and welcome to my talk about the ext2fs file system implementation in FreeBSD. This file system is not really that important for us in FreeBSD, but it has a special value for me because it was the means by which I became a committer in FreeBSD. Basically, the file system was not very well implemented.
00:25
It was basically rotting in the tree, and nobody wanted to use it because it was under the GPL, so it never saw much use in FreeBSD. We even started removing features from it, and meanwhile there was an implementation
00:44
in NetBSD, and people could use their ext2 file systems from Windows, so it was sort of interesting to see something happening to it. Well, I have been using FreeBSD for about 20 years. I did a huge number of ports, and the history I have in the PR database is quite impressive, but I never
01:07
was really invited to become a ports committer back then. Only when I started working intensely on this did I become a committer. After that I have been doing a lot of things in other parts of the system, some of them not very easy to do, but this
01:25
was the thing that got me started, and that is usually the story with many FreeBSD developers: you just need some part of the tree where you can hang out and start doing things, and then you will find a lot more nice things to do in the system.
01:44
So why is ext2fs still important? Something very weird in Linux: the one thing that identifies Linux, the one technology that they have developed pretty much from the start, is the ext2fs file system. It is still the fastest file system in Linux, and they even have this mentality, which is
02:05
probably not what you would expect for a normal Unix file system, that if all the file systems do the normal stuff, if they all do their homework, then the fastest file system is simply the best of them.
02:22
We will discuss that way of thinking here, but from the performance point of view I found a fairly recent benchmark of all their file systems, and it appeared that, in 2010 at least, the fastest file system in Linux was still ext2.
02:44
In Linux, ext2, ext3 and ext4 are actually different file systems. In FreeBSD we have done it so that we build upon extended 2 and then add features from extended 3 and extended 4. So what for us is the extended 2 file system is also extended 3 and extended 4, but for
03:05
them these are completely different file systems. Okay: extended 4, JFS (the journaling file system from IBM) and XFS all have similar performance. Extended 4 is a little bit faster.
03:22
One thing is that as soon as extended 3 added journaling, the performance went really low. They lost a lot of performance by adding the journal. There are some nice things about the way extended 3 uses the journal. They do some sort of delayed allocation in the journal, and XFS has learned from that
03:43
recently, because XFS was even slower than ext3fs. Btrfs at the time, in 2010, was actually the slowest of all the file system implementations in Linux. Another interesting thing about ext2fs: it is available, of course, in Linux.
04:05
There is support for Windows. All the main BSDs have it. There is a Mac OS X implementation. It is based on the FreeBSD version.
04:21
It broke recently because Apple has been removing some of the support required for external file systems, but it was based on the FreeBSD implementation when it was done. Haiku also has an ext2fs implementation. When ext2fs was created, there were also ports
04:46
for Hurd and Masix. Masix, I don't know, it simply disappeared from the net. It was a research operating system in France, but it appears not to be used anymore. And there is also an OS/2 driver.
05:02
So you can find ext2fs on basically every operating system that you know, and it has some Unix-like features that a normal FAT file system doesn't have. It has soft links, for example, it handles long names very well, and it is case-sensitive. So it is still very useful as a file system.
05:23
You can carry it on a memory stick that you want to use across many operating systems, and it has the Unix-like features that you want. Until not long ago, ext2fs was sort of recommended for SSDs: since it doesn't do journaling (ext3 is the one that does journaling), it doesn't wear out
05:47
the SSDs as much as an ext3 or ext4 file system using journaling would. It is considered lightweight, since they basically froze extended 2 and moved on
06:00
to extended 3 and then to extended 4. They tried to keep ext2 very simple, and that is actually a very interesting advantage, because you can experiment with adding new features to it without much added complexity and see if these things would work in other file systems. It also has some academic value.
06:23
It has become very useful for papers and for research for that same reason. Okay, I will give a little overview of how this came to be, at least in FreeBSD. In 1992, there was an initial implementation of the ext2 file system in Linux.
06:41
It is based on the concepts seen in UFS but, as we will see in a while, the idea was to replace the MINIX-like file system that the first versions of Linux were using. In 1995, so basically three years afterwards, there was an initial port of the GNU extended 2
07:02
file system to FreeBSD. In 1998, NetBSD rewrote ext2fs. They based it on their UFS, and they created a stripped-down version of UFS that basically reads and writes ext2fs.
07:22
It was interesting. So you see a nice hole here. The idea is that for 10 years no one really cared about ext2fs in FreeBSD, and when no one cares, features rot.
07:42
In 2009... actually, I don't know, I think it was in 2005, I proposed it as a project idea once: someone should do something about it and probably merge the NetBSD and the FreeBSD ports. I thought that, but no one did it.
08:02
In 2009, 2010 and 2012 we started getting Google Summer of Code projects related to ext2fs. One of the ideas was that it would be really nice to have a file system where students could experiment with toy projects to see what happens, because we didn't really
08:22
have one. I mean, you cannot have people play with UFS, which is the main file system used in FreeBSD, right? And importing a completely new file system is not something that people want to do, so it would be nice to have a place where people can experiment, and students are just fine with that.
08:45
Okay, I'll start with a little bit of the history in Linux. As I said, it was based on UFS. The idea is that, just like in UFS, the file system is defined by its superblock and its inode structures. We have those in headers.
09:02
You can find them in, well, ext2, ext3 and ext4 have a wiki site. You can check all their structures there. That is public information. Basically, they have control of that design, and if you follow the superblock and the inode structures, you should be able to reproduce what ext2fs does.
09:24
One thing that is rather interesting, and it is pretty much noted in the literature, is that they started implementing the idea behind UFS and at some point they said: well, this basically works without adding a lot of the stuff that UFS does.
09:46
In particular, the older UFS uses the geometry of the disk to try to synchronize how the information is written, and that was actually useful for performance many years ago.
10:01
But disks evolved, and that information is now mostly hidden or irrelevant, so Linux didn't go into that, of course. Even the modern UFS doesn't really carry that information anymore. They didn't go into that, and they also didn't implement fragments.
10:21
Let's see what happens. You start with a block size in your file system, and after some time you notice, or the UFS guys noticed pretty quickly, that if you make the block size bigger, performance improves, because you are able to write more information in less time,
10:41
so performance would improve. But growing the block size has the disadvantage that you waste a lot of space, right? If your data is smaller than the block size, you are still writing a complete block, and if you don't have any finer granularity there, that space will basically be lost.
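The trade-off described here can be put into numbers with a small sketch. This is only an illustration, not code from either file system: the function names are made up, the choice of eight fragments per block is an assumption, and the fragment handling is simplified to "the tail of the file may use fragments".

```python
def wasted_bytes(file_size: int, block_size: int) -> int:
    """Bytes allocated but unused when a file is stored in whole blocks only."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size - file_size


def wasted_with_fragments(file_size: int, block_size: int,
                          frags_per_block: int = 8) -> int:
    """Same calculation when the last partial block can be stored in fragments.

    frags_per_block=8 is an assumed value chosen for the example.
    """
    frag_size = block_size // frags_per_block
    full_blocks, tail = divmod(file_size, block_size)
    tail_frags = -(-tail // frag_size)  # fragments needed for the tail
    return full_blocks * block_size + tail_frags * frag_size - file_size


# A 1000-byte file: small blocks waste little, big blocks waste a lot,
# and fragments recover most of the loss for the big-block case.
small = wasted_bytes(1000, 4096)
big = wasted_bytes(1000, 16384)
big_frag = wasted_with_fragments(1000, 16384)
```

With whole blocks only, the loss grows with the block size, which is why keeping blocks small is the simple way to avoid managing fragments.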
11:03
That is exactly what happens in ext2fs. They have small block sizes. They don't grow the block sizes, because if they did they might lose more space. But just using simple blocks, without getting into managing fragments, is rather simpler
11:22
and it seems to work very well with their normal performance expectations. Extended 3 was released in 1999. The idea is basically that it adds journaling. It is not optional: if you have a journal, then you have extended 3.
11:40
If you don't have a journal, then it is extended 2, and that is basically everything that was added in extended 3. There is also a difference in conception between FreeBSD and Linux: FreeBSD tries, or tried, to run everything in sync mode.
12:03
Linux works under the philosophy that sooner or later things may go wrong, so it is just better to have a better fsck tool that will catch all the errors, and leave the disk to perform better. So they actually run in async mode, and they don't even have a sync mode that works.
12:26
Then after some time they discovered that async mode is not really that safe and that they needed something like journaling, so they added journaling. Extended 4 was released in 2008, and probably most of you remember how that process went.
12:42
It basically added extents to get better performance. Journaling was kept, but they made it optional. So if you want to know whether you are using extended 4 or not: you must have extents. Otherwise you are using extended 3 or extended 2.
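The rule of thumb above (journal means extended 3, extents mean extended 4) can be sketched against the superblock feature flags. This is a minimal sketch, not the FreeBSD code: the offsets and flag values are the ones I understand the published on-disk layout to use, the helper names are made up, and real ext4 file systems can carry other features without extents, so the classification is deliberately as coarse as the rule in the talk.

```python
import struct

SUPERBLOCK_SIZE = 1024
EXT2_MAGIC = 0xEF53
COMPAT_HAS_JOURNAL = 0x0004  # a journal is present: "extended 3"
INCOMPAT_EXTENTS = 0x0040    # extents are in use: "extended 4"


def classify(superblock: bytes) -> str:
    """Apply the journal/extents rule of thumb to a raw superblock."""
    (magic,) = struct.unpack_from("<H", superblock, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    # s_feature_compat at offset 92, s_feature_incompat at offset 96
    compat, incompat = struct.unpack_from("<II", superblock, 92)
    if incompat & INCOMPAT_EXTENTS:
        return "ext4"
    if compat & COMPAT_HAS_JOURNAL:
        return "ext3"
    return "ext2"


def fake_superblock(compat: int = 0, incompat: int = 0) -> bytes:
    """Build an otherwise empty superblock, for illustration only."""
    sb = bytearray(SUPERBLOCK_SIZE)
    struct.pack_into("<H", sb, 56, EXT2_MAGIC)
    struct.pack_into("<II", sb, 92, compat, incompat)
    return bytes(sb)
```

On a real disk the superblock would be the 1024 bytes starting at byte offset 1024 of the partition; the fake superblock above only exists so the sketch can be tried without a disk image.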
13:04
After this, well, the design has shown its limits. The idea among the extended 4 designers and the developers involved is that the future is Btrfs. And Btrfs is pretty much their answer to ZFS.
13:21
The idea is copy-on-write. Another thing about Btrfs is that you can convert your extended 2, extended 3 or extended 4 partitions to Btrfs transparently. That is another reason why we really needed to have a good extended 2, 3 and 4 implementation in FreeBSD.
13:46
Because if you want to de-penguinize your system, you basically have to read the information that is on those old partitions. You can probably manage by making a tar of the whole file system, but for a huge file system that is rather problematic.
14:01
It is really much better to be able to put your disk online, copy it to a ZFS partition and continue working. And, of course, the license for the Linux kernel is GPL version 2. So that is the license there. Okay.
14:21
Now, the BSD Lites port. That was in 1995; Godmar Back did this for Mach. There was a subset of the BSD code that was ported to Mach, called Lites. And they basically had two approaches to the port.
14:41
They were also interested in having an ext2fs, so they had two approaches. The first approach was to take the complete Linux code and port it to the Mach kernel. That was terribly difficult according to Godmar. They basically failed, because the interfaces are completely different: the VFS differences, a completely different
15:06
buffer cache, and also the directory handling made it infeasible to maintain that port. So their second approach was to take UFS. They actually use UFS in Mach.
15:21
And take, well, not really the minimal part: take all the block allocation code from Linux and glue it into UFS. And that worked. When you look at the code, you see some notes that they are mimicking what is done in Linux using the BSD functions.
15:43
But that strange mixture works. Some Linux specifics were not ported. And, of course, some of the UFS specifics were not really meant to work along with the Linux code, so they were removed or not implemented.
16:01
One is cluster writes. Cluster writes is a technology that was meant to give UFS some of the performance advantages of extents. It was developed by Sun, and it was left out. And there was an #ifdef FANCY_REALLOC.
16:22
It was also not implemented. It looked very difficult to implement on top of UFS. It was left mentioned in the notes, but it was not done. Of course, the license was GPL version 2, plus the BSD license for the part that came from UFS, because it was based on the BSD code.
16:42
Okay. For FreeBSD 2.2, John Dyson, who was a very well-known developer and did a huge revamp of the virtual memory system, got in contact with the Mach people, took that code and basically ported it to FreeBSD.
17:07
The port had a lot of problems. It was notably slower than the Linux version, but it worked, sort of. There were many bug reports against the first versions, but it worked.
17:25
NetBSD took that same code and carried the port for a while, and after some time it simply was not working very well. Okay. So we basically had a file system that worked everywhere.
17:42
Since the license was BSD plus GPL, it was not really very interesting for FreeBSD developers to work on it. There was no effort to keep in sync with upstream Linux. Eventually, when journaling was implemented in Linux, we decided to remove the mkfs and fsck utilities from our userland
18:06
code because, well, if these tools found a journal, they would probably just break the file system. So that was unacceptable. I think it stopped working. I don't really know why, but no one cared about fixing it, so it was simply removed.
18:27
And of course, as we will see a little more later, it was GPL code, so it had to be isolated from the rest of the kernel. It could not live alongside UFS. It was a KLD; it was not compiled into the kernel.
18:44
Okay, NetBSD. The idea was: if this is really so similar to UFS, we can just take UFS, copy it somewhere, replace all the headers and see how it goes from there. They did that and it worked. That was rather surprising, and I think it was done in about two weeks.
19:03
Afterwards they added some of the newer features, but they basically re-implemented everything. Okay, they did copy some parts from the FreeBSD code, because I was walking through the code and I saw: oh, look, this and this and this. One of the parts that is very different between UFS and Linux is the directory lookup code.
19:27
Okay. As I said, the NetBSD implementation is very clean. That is interesting, because if you take the FreeBSD implementation and look at it, it went through two or three operating systems before getting into FreeBSD.
19:43
So it was a mess. It had code in GNU style and code in FreeBSD style. The headers were from Linux. It was a complete mess from a style point of view. It was difficult to manage and difficult to understand where things came from.
20:00
And one thing that is very clear is that the implementation was slower than the FreeBSD port. Now, remember, this is ten years afterwards. The FreeBSD file system was basically very rotten.
20:20
There was a first Google Summer of Code project with the idea of making it GPL-free. If it were GPL-free, we could start using it a lot more, developers would be more friendly towards it, and we could load it by default. It was important to have it GPL-free. But even then, by the quality standards in FreeBSD, even if it were GPL-free, it also had to prove some performance.
20:46
Otherwise people would say: well, we are not really here doing this because of license issues. We are not GPL zealots. We are not BSD zealots. So we want to keep the GPL-licensed code because it is not really causing a problem for anyone.
21:07
So, okay, the mentor was lulf@FreeBSD. And the question was: where should you start? If you are going to do this, should you start from the NetBSD code, which is mostly clean and where you don't really have to worry about the licensing of each file?
21:26
Or would you start from the FreeBSD port, which is already working? It is not an easy decision. And should you start by looking at the algorithms or by looking at the headers?
21:41
Well, the headers are just a matter of renaming some variables. Maybe it is easier to start with the code. Who knows? Well, coders are lazy, so we started from what was already working. We basically broke it into parts: we tried to replace first the headers and then the code, and to do the proper adaptations from the NetBSD code,
22:05
which, being based on UFS, was also very similar to what we would expect in our regular UFS. And finally the port was done in that Google Summer of Code. But the result was that performance halved.
22:22
We basically lost a lot of performance. We didn't really know at that time why all this performance was lost. We did notice that we had removed some of the features that the Linux implementation had, so we blamed the pre-allocation code, because it was lost. So one of the notes for the future maintainer was:
22:46
okay, pre-allocation: we have to implement it back or do something, because the performance loss is unacceptable. There were many coding style issues. In FreeBSD we have style(9). If you are going to write code in FreeBSD, you have to take that into account.
23:02
And that is not something that university students learn nowadays. Well, that's life. One thing that was done that was pretty interesting, somewhat to try to compensate for the lost performance, is that the code was made MPSAFE.
23:20
A lot of file systems were about to be removed because they were not MPSAFE, and the key to making ext2fs MPSAFE was looking at the UFS code. Of course, I was not the student or the mentor in that project. I was around, but mostly I was pushing the student to keep the code in the same order as in UFS.
23:46
So when he had to look at the UFS code to make the code MPSAFE, we had all the cues from UFS about where to do the locking. Okay, so basically what we did was look into the history of UFS1.
24:04
UFS1 has evolved a lot, of course. So we had to freeze the moment in time when the soft updates code was implemented, and we started looking at how the structure was before that. We could basically copy and paste from one side to the other and check what improvements had been made in UFS.
24:24
There was an improvement in allocation. In UFS it actually came from OpenBSD. They came up with the dirpref changes, and these were rapidly picked up by FreeBSD's UFS implementation.
24:41
The Linux guys took note, and they called it the Orlov allocator. It was a pretty small change for the huge performance benefit that it brought. Basically, you can reorganize the directory structure, the space utilization structure,
25:03
according to the number of files in a given directory, and that improves performance a lot. It is mostly empirical. It was changed a couple of times in UFS itself. But it was important, and it was important because it was also in ext2fs.
25:20
So taking it from UFS was really interesting. Okay, this is a table of the results. Now, I said we lost a lot of performance. The truth is there are a lot of benchmarks out there, and in some benchmarks the performance would look a lot better than in others.
25:43
Okay, so the student was particularly interested in showing similar performance. And you may see that the violet part, which is the new implementation, is a little bit slower, only a little bit slower.
26:02
Okay, but this is with the SMP improvements. This is not the raw data; it includes some improvements. It was acceptable enough to be committed into FreeBSD, even though it was probably not very stable,
26:20
but the original code was not very stable either, and this was BSD licensed. So this made it into the tree. Right after that we were lucky enough to have another student. And one of the things that I have to mention about both Google Summer of Code students involved in extended 2 FreeBSD projects is that
26:41
they continued working after the Google Summer of Code was over. And that is just great, because they could have just left with their money and done other things in life like normal people. Okay, there were many issues left, but at least the code was clean from a license point of view.
27:02
Actually, there were some headers that still kept GPL information, and we spent a lot of time cleaning up that stuff. But basically it was clean. Licensing is always complicated. It is not really clear whether, for example, information from headers can really be licensed or not, because it is usually just numbers.
27:27
Numbers cannot be copyrighted. Nevertheless, you don't want any type of licensing problem in your code, so you just have to clean up as best you can. Okay, there were some interesting research papers related to ext2fs.
27:42
For some reason Linux became very popular, and people started writing papers about how to improve the performance of ext2fs. These developments were not similar to the ones that have historically been seen in UFS, so there was some sort of contrast that could be made.
28:03
We did bring in a lot of features from UFS that were also implemented similarly in Linux. They had sort of a high priority, because that is low-hanging fruit that you can bring in without much trouble,
28:21
since the code base is already very similar to UFS, to make it look even more alike and to bring more performance improvements. So we fixed the async mode. We added O_DIRECT. O_DIRECT actually comes from SGI's XFS. The idea is that most file systems will cheat on you and cache a lot of the information.
28:46
They will keep it right there in memory. For example, small files are easy to keep in memory. They can be temporary, and the time it takes to write them to disk
29:01
is probably a lot more than the time they are actually going to live. So you can cache them there and have them available, and no one will notice. Eventually you may not even have to write them. However, databases tend to prefer to avoid all that caching.
29:25
The idea is that some utilities may know better than the file system what has to be cached or not. So O_DIRECT is a mode that basically disables the caching. And that was really easy to port to ext2fs.
29:42
O_DIRECT in Linux, in ext2fs and in SGI's XFS does a lot more things, but just doing this basic thing was good to have. Okay, then Bruce Evans. Bruce Evans has basically been working on ext2fs.
30:02
He has been sort of a maintainer of the code. He had discovered quite a lot of things that were wrong in ext2fs by doing frequent comparisons with UFS. The thing is, many of the bugs that he found had been fixed in UFS by alternative means
30:24
but had not been worked on in ext2fs. And when the change from the GPL code to the BSD-licensed code was made, he found a bug. A big bug that basically meant a 50% reduction in performance.
30:44
The student said: oh, I'll check it. But John Baldwin, who is here, did it much faster. That became a huge performance improvement for us. So much that, if you add the async mode and this bug fix, after that we haven't really made it perform any better.
31:08
We have added a lot more features, but that was the highest point in performance. After many more cleanups and coding and stuff, I became a committer.
31:21
And that really helped bring in a lot more of that code after working with the students, because there was someone who could review it. Basically we had a small team that worked very well. There were reviewers of the code, there were people cleaning the code, and Zheng Liu was doing most of the basic coding.
31:45
The project was successful. We wanted basically two features. We wanted pre-allocation: as I said, from the previous Google Summer of Code we knew that we had to do pre-allocation at some point,
32:00
because it was the main feature that we had lost and we were losing performance. And we wanted to have some extended 4 support. Extended 4 was basically born in those days, and we needed to make it easier for FreeBSD users to bring over information from their Linux file systems.
32:25
The code was not really committed that year. But, okay, here are some small benchmarks. As I said, there are benchmarks for everything. They will basically show what you want them to show. Here is one of number of threads versus throughput.
32:44
The green line was basically the async mode. Okay, the student also implemented reservation windows. I'll speak a little bit about this right now.
33:02
It is somewhat like the pre-allocation code. But if you look, there is a green line and a blue line here, and they are basically the same. What we found is that the reservation windows didn't really work very well. The performance improvement on the green and blue lines is caused by the async mode,
33:24
which is traditionally considered insecure in FreeBSD and something that you shouldn't use. But if the Linux guys do that, it is sort of nice to have that mode for comparison purposes. That said, we are pretty much aware that our async mode is not as fast as the Linux async mode.
33:45
Okay, let me see. Now I'm going to talk about the Linux reservation code. The idea is that we lost pre-allocation, but extended 2 used a form of pre-allocation that extended 3 removed,
34:01
because it was very difficult to implement journaling with that old pre-allocation. What that pre-allocation did was mark eight blocks on the disk every time you opened a file for writing, and reserve them for that file only.
34:21
This developer, Mingming Cao, did basically the same as the pre-allocation in extended 2, but only in memory. The reason is that if you reserve those eight blocks on the disk, if you mark on the disk where you want to use that space, and you then try to journal, it will be a mess.
34:41
So it was done only in memory. There was a red-black tree, and each node held what was called a reservation window. That is the place where everything for that file is going to be written. Okay, I'll come back to this comment further down.
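To make the idea of in-memory reservation windows concrete, here is a minimal sketch. It is not the Linux or FreeBSD code: it uses a sorted list with bisect where the real design uses a red-black tree, the class and method names are hypothetical, and the fixed window of eight blocks is an assumption borrowed from the old extended 2 pre-allocation size mentioned above.

```python
import bisect

WINDOW_BLOCKS = 8  # assumed window size, mirroring the old 8-block pre-allocation


class ReservationMap:
    """In-memory block reservations, one window per inode; nothing touches disk."""

    def __init__(self, total_blocks: int):
        self.total_blocks = total_blocks
        self.starts = []     # sorted start blocks of all windows
        self.by_start = {}   # start block -> (end block, exclusive; inode)
        self.by_inode = {}   # inode -> start block of its window

    def reserve(self, inode: int, goal: int = 0) -> range:
        """Return the inode's window, creating it at or after `goal` if needed."""
        if inode in self.by_inode:
            start = self.by_inode[inode]
            return range(start, self.by_start[start][0])
        start = goal
        i = bisect.bisect_right(self.starts, start)
        # A window that begins at or before `start` may still cover it.
        if i > 0:
            start = max(start, self.by_start[self.starts[i - 1]][0])
        # Slide past following windows until the gap is big enough.
        while i < len(self.starts) and self.starts[i] < start + WINDOW_BLOCKS:
            start = self.by_start[self.starts[i]][0]
            i += 1
        if start + WINDOW_BLOCKS > self.total_blocks:
            raise RuntimeError("no room for a reservation window")
        bisect.insort(self.starts, start)
        self.by_start[start] = (start + WINDOW_BLOCKS, inode)
        self.by_inode[inode] = start
        return range(start, start + WINDOW_BLOCKS)

    def release(self, inode: int) -> None:
        """Drop the window, for example when the file is closed."""
        start = self.by_inode.pop(inode)
        self.starts.remove(start)
        del self.by_start[start]
```

Because the windows live only in this structure, a crash simply forgets them and there is nothing for a journal to record, which is the property that made the in-memory approach attractive for extended 3.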
35:01
So we did that. We implemented the windows. Actually, Zheng Liu did the implementation of the pre-allocation window. And, okay, this is the GPL code. This is the BSD-licensed code.
35:23
As you see, there is the maximum value, the minimum value and the mean value. This is the value after Bruce Evans' bug report was fixed. As I said, that was an improvement to the allocation.
35:41
It laid the blocks out more contiguously, and you see the mean value went up. But probably the most important thing is that this width, the spread between the minimum and the maximum, diminished. That means the file system is a lot more predictable. Compare the GPL file system, which basically uses the same code as Linux.
36:05
It had a huge variation. Our performance overall is lower, but that's probably our fault. We do the same allocation as the Linux code, but we have never got the same performance as Linux has.
36:21
So the lower value is our fault, but this width, this huge width is their fault. That's what I think, at least. Okay, as I said, this is much more predictable, but we also obviously lost performance. And then with the fix, performance went up again.
36:43
And this is the result of adding the reservation window. As you see, well, the overall performance is mostly unchanged. But the mean, the median performance, diminished a little bit. And that was, in short, amazing.
37:02
Because it meant either the student did something wrong in his port, and the reservation windows were simply not working as expected, or the original code itself was wrong. We basically thought either of those could be true, until I found this paper by Valerie Aurora, and there was also Theodore Ts'o,
37:26
and some other ext2 developers. The idea is they took an ext2 file system and tried to reduce the fsck time. And if you read what they were trying to do, you say, well, they are actually doing all this stuff simply because they don't have soft updates.
37:45
They are trying to do something like a cheaper version of soft updates. That's what they were trying to do. And in their adaptations, they removed the ext2 pre-allocation, because it made it pretty difficult to implement what they were doing, and they added the ext3 pre-allocation.
38:02
And they got to this comment: the results for the reservation-only versions of ext2 were even more puzzling, and they suspected their port of reservations was buggy or suboptimal. And that's exactly what we found when we did the same reservation code. We think the code is buggy. I personally think, it's just my opinion,
38:23
that the reservation code only worked as a caching system because of the journaling. But I think that the algorithm was not really very good. Okay, so what did we do?
38:41
Okay, this is simply another benchmark. It basically shows the improvements. Okay, this is the GPL file system. This is, no, this is the current file system, yeah.
39:02
Now, oh, this is interesting. This was the first BSD-licensed file system. You see its performance is the lowest of them. The red line is the GPL file system; its performance is on average better. This is the fixed file system, both the violet and the blue,
39:23
with and without reservation windows. And you see that it's mostly the same. Okay, we started considering something different from the reservation windows, because it was clear that they were not working.
39:40
And the thing is, the UFS code had reallocblks, that fancy reallocation that I talked about previously. The idea of reallocblks is not really performance. The idea is to reduce fragmentation. If you read the documentation for the Linux file systems, one of the ideas is, well, Unix doesn't really need to defragment.
40:05
And what we found, actually, in that documentation is that Linux does nothing about fragmentation. Back in 1994, Kirk McKusick developed this reallocblks. Basically, with this code you allocate the blocks normally,
40:23
like you would for any file on your disk. But right before writing them, this code reallocates them all so that they are contiguous. And this was actually the technology that made Bruce Evans' first research obsolete in UFS
40:44
because Bruce Evans noticed that the allocator was not doing contiguous allocation on the disk, but the reallocation would fix that. So, well, he noticed a bug in UFS.
41:00
He didn't really report it because the reallocation covered that bug, and he didn't have anything left to report. That bug was fixed some time ago, but it was not really all that relevant thanks to the reallocation code. Okay, and as I said, the idea was not really the performance, but it had that side effect that it did improve the performance,
41:22
especially when the file system is aging. So the ext2 code from Mach had, ifdef'd out, some code that at that time was simply garbage for us. Sheng Liu completed the implementation. One thing that he had to take into account is that
41:41
there are no fragments in Linux. So what he did was he just played with the bitmaps in ext2fs, and the code was much simpler and much smaller, and it worked. And we ran some tests, okay?
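As a rough illustration of the reallocation idea described above (allocate normally, then move the blocks to a contiguous run just before they are written), here is a small sketch over a plain block bitmap. It is not the UFS or ext2fs implementation; `find_free_run`, `reallocblks` and the list-of-booleans bitmap are hypothetical names and structures chosen for the example.

```python
def find_free_run(bitmap, length, goal=0):
    """Return the start of the first run of `length` free blocks, or None.

    `bitmap[i]` is True when block i is in use.
    """
    run_start, run_len = None, 0
    for blk in range(goal, len(bitmap)):
        if bitmap[blk]:
            run_start, run_len = None, 0
            continue
        if run_len == 0:
            run_start = blk
        run_len += 1
        if run_len == length:
            return run_start
    return None


def is_contiguous(blocks):
    """True when the block numbers form one ascending run."""
    return all(b == blocks[0] + i for i, b in enumerate(blocks))


def reallocblks(bitmap, blocks):
    """Move a file's not-yet-written blocks to a contiguous run if one exists.

    Returns the (possibly new) list of block numbers. Only the bitmap and
    the block list change, since the data has not reached the disk yet.
    """
    if not blocks or is_contiguous(blocks):
        return list(blocks)
    start = find_free_run(bitmap, len(blocks))
    if start is None:
        return list(blocks)          # no better placement, keep what we have
    for b in blocks:
        bitmap[b] = False            # give the scattered blocks back
    new_blocks = list(range(start, start + len(blocks)))
    for b in new_blocks:
        bitmap[b] = True             # claim the contiguous run
    return new_blocks
```

With no fragments to worry about, working directly on the bitmap like this is all the bookkeeping the sketch needs, which matches the remark that the ext2fs version came out simpler and smaller.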
42:01
So we have what was current with the bug fix, and that is basically the line in the middle, right? The realloc code had some parameters that could be polished, a little bit fine-tuned, and we did get it to perform a little bit better than the normal implementation.
42:24
But the huge advantage is aging. We can control the fragmentation to some level, and that's something that the Linux file system doesn't do. And the performance is basically the same as we had before implementing it, so it was really worth having.
42:41
It also makes the UFS and ext2fs code a lot more similar. Okay, that was really nice. Now, the status after that Google Summer of Code. The reallocation code was actually done after the Google Summer of Code, but it was one of the first things that we brought in.
43:00
Ext4 read-only support was done. It was basically adding some more headers and having the code recognize the extent structure on disk so it can read it. We didn't get a reviewer. That happens on FreeBSD. We really wanted to have a reviewer, and actually Bruce Evans said, Well, I cannot review it. I don't have the time to do it,
43:22
but it's new functionality. It doesn't look like a security risk, so basically you could bring it in. But we really wanted a review. I mean, bringing something into the tree and then starting to clean it up is not really nice. So basically it rotted on my disk for a while,
43:46
and then I did find a reviewer, and he made many cleanups, and it ended up broken. Well, I left it on my disk for a while. The other thing is, when we activated reallocblks,
44:02
we had to do something to test all these file system changes. You cannot just commit code that merely looks like it works. We had to run some benchmarks, but we also had to run some tests, so we ran fsx. It is in the FreeBSD tree, and you can use it to test.
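For readers who have not used fsx, the general technique is to apply random operations to a file and to an in-memory model at the same time, and to compare the two. The sketch below shows that idea only; it is not fsx itself, and `fsx_like` with its parameters is a hypothetical name invented for the example.

```python
import os
import random
import tempfile


def fsx_like(path, ops=200, max_size=4096, seed=0):
    """Run random writes, truncates and reads; return True if the file
    always matches an in-memory model of what it should contain."""
    rng = random.Random(seed)
    model = bytearray()
    with open(path, "w+b") as f:
        for _ in range(ops):
            op = rng.choice(("write", "truncate", "read"))
            if op == "write":
                off = rng.randrange(max_size)
                length = rng.randrange(1, 256)
                data = bytes(rng.randrange(256) for _ in range(length))
                data = data[: max_size - off]       # stay within max_size
                f.seek(off)
                f.write(data)
                if len(model) < off:                # writing past EOF leaves a hole
                    model.extend(b"\0" * (off - len(model)))
                model[off:off + len(data)] = data
            elif op == "truncate":
                size = rng.randrange(max_size)
                f.truncate(size)
                if size < len(model):
                    del model[size:]
                else:                               # growing fills with zeros
                    model.extend(b"\0" * (size - len(model)))
            else:
                f.seek(0)
                if f.read() != bytes(model):
                    return False
        f.seek(0)
        return f.read() == bytes(model)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(fsx_like(os.path.join(tmp, "fsx.dat")))
```

Pointing `path` at a file on the file system under test is what exercises the allocator; the fixed seed makes a failing sequence reproducible.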
44:23
It was actually developed at Apple, and we started finding bugs upon bugs upon bugs. So we brought this code in very slowly, as we fixed the bugs that we were detecting, and I honestly was doing most of my development on FreeBSD 9,
44:42
but FreeBSD 10 was current, and of course I cannot bring things directly to FreeBSD 9, so we tried to keep FreeBSD 9 and 10 in sync. Fortunately, ext2fs is not a very common file system, so if you break it... Well, we didn't break it very badly. We did break it, but thankfully not many people were using it.
45:05
The feedback, though, was very useful. We proposed another Google Summer of Code project for 2012. It was not accepted. The idea was to add something called the directory index, and journaling. As I said, it was not accepted.
45:22
And, well, we did implement a lot of stuff that we found very easy to get from UFS: SEEK_DATA, SEEK_HOLE. We added support for huge files, and some other stuff that, well, I will skip right now because we won't have time.
45:40
Something funny is, NetBSD... We tried to keep the implementation as close to NetBSD and OpenBSD and UFS as we could, and NetBSD did get a Google Summer of Code project in 2012 for a directory index. We were also talking with Haiku. Haiku had done the directory index too. We talked with them,
46:01
and we got them to re-license their MIT code under a BSD license. The difference is not huge, but it's rather nice to have everything under the same license. But these guys started all over again, and we gave them some feedback about where to find some stuff and how to do the hashing,
46:20
and the student did it, and basically it appears to have worked, but NetBSD didn't take it. I don't know why. We discussed with them, and they were happy that we took the code, but they haven't included it yet. We found that this directory index...
46:40
A directory index is something used in many sophisticated file systems. Basically, most people do that with B-trees, but, well, Linux made up this new technology, to call it something, and they are very proud of it. It was really nice to have it for one simple reason: we unavoidably had to use the Linux data structures.
47:07
They are on the disk, right? So, if we can use the same code to provide some advantage, it's useful to have. So we brought that one in. Sheng Liu did the port; we ported it, tested it, and it's basically working right now.
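The benefit of a hashed directory index can be sketched in a few lines: instead of scanning every directory block for a name, you hash the name, binary-search a small index of hash ranges, and scan only the block that range points to. The sketch below is an assumption-laden toy, not the on-disk format: `zlib.crc32` stands in for the real hash, and `HashedDirectory` and `entries_per_block` are made-up names.

```python
import bisect
import zlib


def name_hash(name):
    """Stand-in hash; the real directory index uses its own hash functions."""
    return zlib.crc32(name.encode("utf-8"))


class HashedDirectory:
    """Directory entries grouped into blocks ordered by the hash of the name."""

    def __init__(self, names, entries_per_block=4):
        ordered = sorted(names, key=lambda n: (name_hash(n), n))
        self.blocks = [ordered[i:i + entries_per_block]
                       for i in range(0, len(ordered), entries_per_block)]
        # The index holds the lowest hash stored in each block.
        self.index = [name_hash(block[0]) for block in self.blocks]

    def lookup(self, name):
        """Return the number of the block holding `name`, or None."""
        h = name_hash(name)
        i = bisect.bisect_right(self.index, h) - 1
        while i >= 0:
            if name in self.blocks[i]:
                return i
            if self.index[i] != h:
                break   # no equal hashes can have spilled into earlier blocks
            i -= 1      # a run of equal hashes may start in the previous block
        return None
```

A lookup touches the index plus, in the common case, one block, where a linear directory would read blocks until it hits the name. That difference only shows up with large directories, which is part of why it is hard to measure on a slow device.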
47:22
Sadly, we have not been able to test it properly, and the reason is that none of us is really using an ext2 file system for anything critical anymore. None of us uses it as a main partition.
47:40
We basically test it on a USB drive, and testing on USB, it seems, just saturates the disk at some point, so we cannot really see any advantage from using the index. Nevertheless, the code was very useful for getting ext4 support into the file system.
48:02
Well, the ext4 code, we cleaned it up afterwards. It basically works. We had to work around some issues with the newer features, because Linux expects some newer features
48:22
when mounting ext4; they are now formatting the file system with new options. Basically, the mount code wants to know that it can support everything the file system says it should support. So it is working, but we are only interested in keeping it read-only.
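The check being described follows the usual ext2-family rule: an unknown "incompatible" feature means do not mount at all, while an unknown "read-only compatible" feature means mount read-only at most. Here is a minimal sketch of that rule. The flag values are the ones I recall from the ext2/ext4 headers and should be checked against them; which features this imaginary driver supports is purely an assumption for the example.

```python
# Flag values as recalled from the ext2/ext4 on-disk headers (verify before use).
INCOMPAT_FILETYPE = 0x0002
INCOMPAT_EXTENTS = 0x0040
INCOMPAT_64BIT = 0x0080
RO_COMPAT_SPARSE_SUPER = 0x0001
RO_COMPAT_LARGE_FILE = 0x0002
RO_COMPAT_HUGE_FILE = 0x0008

# Hypothetical driver: the sets below are assumptions, not FreeBSD's real lists.
SUPPORTED_INCOMPAT = INCOMPAT_FILETYPE | INCOMPAT_EXTENTS
SUPPORTED_RO_COMPAT = RO_COMPAT_SPARSE_SUPER | RO_COMPAT_LARGE_FILE


def mount_mode(feature_incompat, feature_ro_compat, want_write):
    """Decide how a file system with these superblock flags may be mounted."""
    if feature_incompat & ~SUPPORTED_INCOMPAT:
        return "refuse"          # unknown incompatible feature: not safe to read
    if not want_write:
        return "read-only"
    if feature_ro_compat & ~SUPPORTED_RO_COMPAT:
        return "read-only"       # unknown ro_compat feature: writing is not safe
    return "read-write"
```

For instance, under these assumptions a volume with only the extents flag set could be mounted read-only, while one that also sets the 64-bit flag would be refused.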
48:43
Let me see. Let's hurry up a little bit, because it's time. Okay, some thoughts about the strategy. As I said, the thing with ext4 is we don't really control the design. We have to do basically what Linux does. We cannot add data structures to the file system.
49:03
There are places where we added some data structures to UFS, but ext4 added the same feature without adding the data structure, so the values have to be calculated in memory. That doesn't work very well, but it's what the Linux guys do,
49:21
and we have to do exactly the same. We considered maybe implementing soft updates, which might sound a little bit crazy, but soft updates actually write some stuff into the data structures, and we don't own the data structures, so we cannot touch that. There is a series of things that UFS does rather well.
49:45
Ext2 doesn't do them, ext3 doesn't, ext4 doesn't, and we cannot do them either, because they don't. Okay, basically we wanted to get ext2 in good shape. We were not into adding a lot of performance,
50:01
but we did get to a point right now where UFS1 and ext2 have the same performance. Okay, we always kept the data headers in sync with upstream ext4, and that actually makes it easier to implement everything, because we know everything the Linux guys know
50:22
about the data structures, so we can make better decisions about what code we have to implement. Okay, the development upstream. It went wild after ext4, and the reason is that ext4 is dead, right? The future development will be in Btrfs,
50:42
so people are pushing very hard to get all the features they use into ext4 before it dies, let's say, before it goes dark. And that is a mess for us, because there are many features that we simply are not interested in using in FreeBSD.
51:04
Okay, kudos to Sheng Liu. Sheng Liu started, well, he did that last FreeBSD Google Summer of Code. He's now one of the biggest contributors to the ext4 file system. He works for a Chinese company, Baidu, a search engine, and they use ext4 a lot,
51:22
and he has become an important contributor to the Linux ext4 file system. And the thing is, he has continued working and developing on FreeBSD all this time. He would absolutely love to keep working on FreeBSD, but, well, the reality is he has to work on Linux because it's what his company uses.
51:42
About ext4 read and write: as I said, I have no interest in that. The idea here is that people who have a Linux file system and want to try FreeBSD will probably want to move to ZFS, and we want to make it easier for them to move their ext4 partitions to ZFS.
52:03
We are not really interested in prolonging the life of ext4, for them or for us. I think we would welcome someone from the Linux guys. I mean, if someone wants to provide that support, we would not reject it.
52:21
Also, the NetBSD guys have started implementing journaling. We are not really interested in that, but if they complete that code, we will find a way to bring it to FreeBSD. We have kept the implementation similar enough that that can be done.
52:40
Okay, some stuff for the future. We don't have extended attributes. They are very different from the UFS ones, and it basically requires a complete implementation, and I'm not sure how to identify them in FreeBSD. Access control lists, that would be interesting.
53:02
They are based on extended attributes, but in theory, you could understand the ACLs without understanding the extended attributes. It could be done, but, okay, we're not working on that anymore. Also, the Linux ext4 file system
53:22
can support an almost unlimited number of directories in the file system. We haven't really worked on that, and it's a complex problem because on FreeBSD, nlink is signed. On Linux, it is unsigned, but they also found a way to break that limit.
53:43
We haven't really looked at that. It's a rather complex problem that we're not going to look at. All these features are actually used on Linux by Lustre. The Lustre people do clustered file systems, and they added those features and basically contributed them back into ext4.
54:04
Ext4 was mostly designed by the Lustre people. Endianness: we haven't really worked on that. The Linux file system is little-endian. Even on big-endian platforms, it is supposed to write everything as little-endian. We are basically ignoring endianness for now.
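To show what honoring the on-disk byte order means in practice, here is a small sketch that decodes a few superblock fields explicitly as little-endian, so the result is the same on any host. The field offsets (inode count at 0, block count at 4, magic at 56) and the magic value 0xEF53 are the standard ext2 ones; `parse_superblock` is a hypothetical helper for the illustration.

```python
import struct

EXT2_MAGIC = 0xEF53
MAGIC_OFFSET = 56   # offset of the 16-bit magic inside the superblock


def parse_superblock(raw):
    """Decode a few ext2 superblock fields, always as little-endian.

    `raw` is the superblock itself (on a real volume it starts at byte 1024).
    The "<" prefix tells struct to ignore the host byte order.
    """
    inodes, blocks = struct.unpack_from("<II", raw, 0)
    (magic,) = struct.unpack_from("<H", raw, MAGIC_OFFSET)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock: magic %#06x" % magic)
    return {"inodes": inodes, "blocks": blocks}
```

A driver that reads the fields in native order gets the same answer on little-endian machines, which is why ignoring endianness goes unnoticed there and breaks only on big-endian platforms.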
54:24
Benchmarks versus features and time testing. This is rather interesting, as I said. The general idea in Linux is that the benchmark rules. If you look at the Phoronix stuff, they do benchmarks, and they say, wow, ZFS is giving huge performance,
54:41
but it's doing something very strange because it's writing faster than my hard disk controller can write. So it must be wrong. I personally think it's wrong to just look at the performance of a file system. Because if ext2, which is supposed to be the fastest file system,
55:04
and it actually is the fastest file system, at least in Linux, doesn't take any care about checksumming, and your information gets corrupted, well, your benchmark will look very good, but your disk will not. Another thing to consider here is
55:24
how do you justify spending time on ext3, and even on UFS, when there is a hugely interesting file system like ZFS? Of course, UFS has its own uses on systems that don't have those huge memories, etc.
55:40
But one thing doesn't exclude the other. You can format a zvol on ZFS with ext2fs. That has some advantages, minimal, you might say: if your software somehow depends on the layout of the file system,
56:01
like if it's a specific database, or if it's a disk-specific utility that does undelete, for example, then it is useful to have ext2fs in a zvol. And you get, of course, all the advantages of using ZFS. Another idea is, as you see, the Mach guys,
56:23
what they did was take a part of a Linux file system and plug it into UFS, basically. That could be done for other file systems. As long as they keep some similarity with UFS, I think what was done with ext2fs could be done with another file system.
56:43
So basically, our ext2fs could be a template for other file systems. But that's something I haven't really looked at deeply. At least with XFS, that's not true at all. Okay, and the recommendations. Try to use ext2 when formatting.
57:03
Use tune2fs to add birth time, because we support that fine. And you can also use the dir_index feature, which most Linux distributions are now starting to use by default. Well, that was basically it. So there is some minimal time for questions or comments.
57:29
Questions? Okay, well, it took like four years to work on ext2fs and take it to the stage it is in. We didn't do it at a professional level. We're all just amateurs at this and learning our way around file systems.
57:45
But, well, you cannot expect everyone to pay for this type of thing. And it's really nice to have these things that are apparently not that important, also polished and working well on FreeBSD. Thank you.