
Loupe: Designing Application-driven Compatibility Layers in Custom Operating Systems


Formal Metadata

Title
Loupe: Designing Application-driven Compatibility Layers in Custom Operating Systems
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Providing support for mainstream applications is fundamental for a new or custom OS to have impact in the short and long term. This is generally achieved through the development of a compatibility layer, currently an ad-hoc and unoptimized process that involves a vast amount of unnecessary engineering effort. There is a need for efficient methods to measure precisely which OS features are really required by a given set of target applications, gathering results that can help drive the development of compatibility layers by pinpointing which features should be implemented first. In this talk I will present a streamlined methodology to optimize the development of the OS features required to build a compatibility layer supporting a set of target applications, focusing on the system calls of the Linux ABI. To avoid overestimating the engineering effort, we rely on dynamic analysis. The methodology revolves around a tool called Loupe that measures, for every system call invoked by an application processing an input workload (e.g. benchmark, test suite, etc.), which ones really need to be implemented and which ones can be faked/stubbed/partially implemented. Given a set of applications and input workloads, Loupe can compute for a given OS an optimized compatibility layer development plan, aiming to support as many applications as possible, as early as possible. We analyze Loupe's measurements over a wide (100+) set of applications, and demonstrate in particular that the effort needed to provide compatibility is significantly lower than that determined by previous works using static analysis: our study shows that as much as 40-60% of the system calls found in application code are not even needed to successfully run meaningful workloads and even full test suites.
Transcript: English (auto-generated)
For the final talk of this session we have Pierre. He's going to discuss Loupe, a tool that he, and we, have been using to measure compatibility for different OSes. Pierre, you have the floor. Thank you, and thanks everyone for attending my talk.
This is joint work with a bunch of colleagues and students, including Hugo, my PhD student. He's the key player behind this work; I'm just, you know, getting all the media attention, but he has built all this stuff, so all the credit goes to him. In this brief talk I want to speak a bit about application compatibility for custom operating systems. I guess most of you don't need to be convinced that we still need custom operating systems today. When I say custom, I mean both research operating systems and prototype operating systems from the industry. The thinking that Linux has solved everything is not true, in my opinion. We still need things like Unikraft if you want to go fast or specialize like crazy; we still need things like RustyHermit if we want security, or seL4. So we still need custom operating systems.
And the thing is, these operating systems are only as good as the applications they can run, so compatibility is key. Compatibility with existing applications is extremely important if you want to build a community: you want your users to go to your website, compile your custom OS, and then try some of their favorite applications, or some of the highly popular applications in a given domain, like Nginx or Redis for the cloud. It matters if you want to attract sponsors or investors. Or, if like me you are a scientist, you want to gather some early numbers for a publication; well, you need to do that on standard applications. So compatibility is important. Another argument: how many times did you hear the word POSIX spoken today? There were slides with POSIX written three or four times on a single slide. So compatibility is important, and it can be achieved in a few different ways, as we have seen with Simon. One important thing to note, though:
in my opinion, porting is not sustainable. Porting is what many of us do: we build a custom operating system, then we take Redis, and obviously it doesn't work as-is on our operating system, so we modify Redis a bit, we disable some features because we know they make our OS crash, and then we have a version of Redis customized for our operating system. This is not sustainable: you can't maintain a branch of Redis for every operating system out there; in the long term it just doesn't work. Porting also basically means that you, the OS developer, are asking the application developers to make an effort to be compatible with your operating system. This doesn't work: nobody is ready to make that kind of effort. Maybe if you give them a 10x performance speedup, but you know, that is unrealistic. So
what you want to do, once again in my opinion, as an OS developer, is to provide compatibility as transparently as possible. This means you emulate a popular operating system, for example Linux, or a popular abstraction like POSIX or the standard C library. You can then be compatible at different levels. The first level is source-level, or API-level, compatibility: you ask the user to compile the application code against the sources of your kernel, in the case of a unikernel. So you're still asking some effort from the users, and in many scenarios you don't have access to the sources: if you have proprietary or pre-compiled binaries, you can't have source-level compatibility. So it's not perfect. Binary compatibility is generally a more, let's say, pure version of compatibility, and there are two main ways to achieve it. You can do it at the level of the standard C library, like OSv:
you dynamically link your kernel plus a standard C library against your application, compiled as a position-independent executable or as a shared library. This is great, but if the application makes system calls directly to the kernel, without going through the C standard library, once again it doesn't work. As a matter of fact, I have counted more than 500 executables in the Debian repositories that contain the syscall instruction: they make system calls directly to the kernel, without going through the C library. Go, for example, makes most of its syscalls directly to the kernel and not through the C library. So what you want to do is be compatible at the level of the system calls: your kernel needs to emulate the syscall API that Linux provides. This is the most transparent
way of achieving compatibility. Now, this is scary, right? Linux has more than 350 system calls. Do we need to implement them all? Aren't we going to re-implement Linux by doing so? And some of them are extremely scary by themselves: you have hundreds of ioctls, and each of them probably requires its own implementation. The Linux API even goes beyond system calls: you have things like /proc and /dev that are used by many applications. The first thing a musl binary does when it runs is to look in, I believe, /proc or /sys to get the size of the terminal. So you need emulation for this part of the API too. And because this seems like a big engineering effort, it hinders the development of custom operating systems. So this is
inspired by the keynote by Timothy Roscoe at ATC and OSDI 2021: we looked at all the papers in the top-tier operating systems conferences over the past 10 years, more than 1,000 papers in total. How many were about proposing a new operating system, as opposed to things like security or machine learning? And among those, how many were just hacking Linux versus proposing an actual operating system implemented from scratch? The numbers are similar to what we saw earlier: very, very few papers propose a new operating system, because it's a significant engineering effort, and part of that effort is providing compatibility to run applications like
Apache or Redis to get a few numbers at the end of the paper. So this is a problem. Now, the particular problem I want to talk about is this: I'm sure several people in this room have attempted to build some form of compatibility layer for an operating system, and we are all kind of working on the same thing in parallel, with ad-hoc processes that may benefit from some optimization. I've listed here a few projects that have syscall-level binary compatibility layers, but actually there are many more.
And from what I understand, it is a very organic process. First of all, it is application-driven: people have a set of applications in mind that they want to support; if you are doing cloud, you want to support the usual suspects, Redis, Apache, whatever. And the process basically looks like this: you take an app, you try to run it on top of your operating system. Obviously it fails; you investigate; you realize you're missing the implementation for this system call; so you implement whatever operating system features are required to fix that particular issue.
Rinse and repeat until the app is working, and then you go to the next app. It's a very intuitive and organic process; when I built HermiTux, this is exactly what I was doing. Something that comes to mind is: can't we have some form of generic compatibility layer that we could plug in, a bit like newlib, something that would provide a generic interface? I believe it's not really possible, because most of the implementation needed to support the system calls is very specific to whatever operating system you are using, and it's not clear that a generic compatibility layer can be achieved. But can we still somehow optimize that process? Some have tried static analysis:
they take the binaries of the applications they want to support and look at which system calls are made by those binaries. This has been done: the best paper at EuroSys 2016 analyzed all the binaries from the Ubuntu (I believe it was 14.04) repositories, and concluded that every Ubuntu installation, including the smallest one, requires more than 200 system calls, plus hundreds of ioctl, fcntl and prctl command codes. So this doesn't help; it is still quite scary.
It still represents a gigantic engineering effort. But do we want full compatibility with an entire Ubuntu installation? In the end, especially in the early stages of development of an operating system, you just want to get a few applications up and running. And do you even want 100% compatibility? When I write a paper, I don't really care if everything is stable; I just want to get some numbers. So isn't there a better way? Maybe you are thinking: let's do dynamic analysis.
Let's run the applications we want to support, send them the inputs we want to support (say I'm running Nginx and submitting some HTTP requests), and then strace the system calls that are made. This gives us a subset of the system calls identified through static analysis, which has a tendency to overestimate. So, with strace, the engineering effort to support an application and a set of inputs looks a bit lower. But it's still not a panacea, because it doesn't take into account two things we do when we implement compatibility layers.
So this is my code, don't judge me. One thing I did for HermiTux: at some point there was an app calling mincore to check whether some page of memory was swapped out or not. There is actually no swap in most unikernels, so it really didn't matter to implement this: you return ENOSYS, "operation not supported". Stubbing a system call is just saying "yeah, we don't support it", and you cross your fingers that the application has some kind of fallback path to do something else if the syscall fails. It works in some cases. And then we can do something even nastier; don't judge me again:
You can fake the success of a system call: surprisingly, in some situations, returning a success code even if the system call doesn't have any implementation in your operating system is going to work. I'll tell you a bit more later about why this works sometimes. So stubbing and faking let you implement even fewer system calls than what you would trace with strace. In the end, if you want to support an app or a set of applications in your custom operating system, the picture of the system calls you actually need to implement looks like this. The outermost set is the entire Linux syscall API. Static binary analysis on the binaries of the applications you want to support will identify a subset of that: still pretty big, and an overestimate. Source analysis gets you more precise results, but it is pretty hard to achieve, and it still overestimates. strace will give you, once again, a subset; things start to look better. And among the calls traced by strace, you don't actually need to implement everything: you can stub and fake some of these syscalls.
Can we measure that? Yes, with Loupe. Loupe means magnifying glass in French. It's a tool built by Hugo, my student, and it's a kind of super-strace that measures the system calls required to support an application, and that can also tell you which ones you can stub and which ones you can fake. We used it to build a database of measurements for a relatively large set of applications. And with Loupe, if you give me a description of your operating system, basically the list of system calls you already support, and you give me the list of applications you would like to support, we run them through Loupe and derive a support plan, which will tell you: for this set of target applications, and given the set of system calls you already support,
what is the optimized order of system calls to implement, so as to support as many applications as soon as possible? I will give you an example of a support plan by the end of the presentation. From the user's point of view, Loupe needs two things to perform its measurement on a given application. You give it a Dockerfile describing how to build and run the application for which you want to measure the syscalls needed, and, optionally, an input workload: think about a web server, it's not going to call many system calls until you actually start sending requests to it. Loupe will instantiate the application, launch it on a standard Linux kernel, and analyze the syscalls that are made; with a few tricks it is able to know which ones can be faked or stubbed. The result is basically a CSV file: for each syscall made by the application, can it be faked, can it be stubbed, or does it require a full implementation? We store that in a database, which we populate with as many measurements as possible, and this database can,
given the list of syscalls already supported by your operating system, give you some form of optimized support plan for the applications you want to support. OK, so how does it work? When Loupe runs the application, it first does a quick pass of strace to record all the system calls made by the application. Then, for each system call identified, we use seccomp to hook into its execution, and rather than actually executing it through the Linux kernel, we emulate the fact that the syscall is stubbed: we just return ENOSYS without executing the syscall. We can also emulate the fact that the syscall is faked: we return zero. Then we check whether the application works or not following the stubbing or faking of this particular syscall, and we do that for each system call identified
with strace. How do we actually check for the success of the execution of the application? We identified two types of apps. Some we call run-to-completion: you start the app, it runs for a minute, and then it exits, printing some stuff on the standard output. With run-to-completion apps, we run the app instrumented with Loupe and check its exit code: if it's different from zero, we consider the run a failure (it could have been killed by a signal or things like that). Optionally, in addition, we can run a script after each run of the application to check its standard output: we can grep for error values, we can grep for success, like "50 requests per second have been achieved" being printed,
or check the files that may have been created by the application, and so on. The other type of application is client-server. With client-server apps, we run the app instrumented by Loupe and, in parallel, a workload: it could be wrk, httperf, redis-benchmark for Redis, and so on, and we check the success of both. We check that the app doesn't crash (servers are not supposed to exit), and we check the success of the workload: if redis-benchmark returns something different from zero, probably something went wrong. Then we are able to see: I'm currently trying to stub the read system call, does the application still succeed or not? So we built a database (let me check the time, OK) and we analyzed the results. These results are made on a relatively small subset of the database, of about 12 highly popular,
sorry, 15 highly popular cloud applications; so this is just a subset. On the Y-axis you have the number of system calls identified by static analysis, on the binary in purple and on the sources in yellow, and then by dynamic analysis. For each of these applications we ran both the standard benchmarks (redis-benchmark for Redis, wrk for Nginx, and so on) and the entire test suite. The key with the test suite is that if you measure what's going on during the entire test suite, you get a very good idea of all the possible system calls the application could make. Obviously, you need to assume that the test suite has good coverage, but that is the case with these very popular applications. And what we see is:
first of all, static analysis overestimates; this is not very surprising. The number of syscalls identified by static analysis is relatively high compared to what we get with dynamic analysis. Something interesting, too, is that the number of syscalls that can be stubbed or faked, the green bits on the dynamic analysis bars, is non-negligible. What this means is that if you want to support Redis running redis-benchmark, binary-level static analysis tells you that you should implement 100 system calls, but if you just want to run redis-benchmark to get some performance numbers for your paper, you actually need to implement just 20; so that's divided by five. And if you want to pass the entire test suite of Redis, you need to implement about 40, which is still about half of what
static analysis is telling you. So it's kind of a message of hope for building compatibility layers and for developing custom operating systems in general. So yes, static analysis overestimates a lot the engineering effort needed to support an app, and even naive dynamic analysis measures many more syscalls than are actually required, if you know that you can stub and fake syscalls. Another view of these results can be seen here. For each of the system calls (0 is read, 1 is write, 2 is open, I guess, and so on): among our dataset of about 15 apps, how many of these apps require an implementation of the system call in question? You have here the results for static analysis at the binary level and at the source level, for strace without counting which system calls you can stub or fake, and, finally, what is actually required. If you consider that you will not implement what you can stub or fake, this is what you actually need to implement, and as you can see, it's much, much less engineering effort than what static analysis is telling you.
Why does stubbing and faking work? Here you have a code snippet from Redis. If you stub getrlimit, the C library wrapper will return -1, and, as you can see, Redis will fall back on some kind of safe value: "I'm not able to determine the maximum number of files I can open, so I'm going to fall back on 1000." The reason faking works is that there are quite a bunch of system calls whose return value applications never check. This shows, for each system call and each app in our data set, the percentage of apps that actually check the return value of the system call. For some system calls the return value is almost never checked, and it kind of makes sense: when you see this, why check the return value of close? This is why faking works in many cases. And a question that we asked is: when you speak about
providing binary compatibility, and you don't do porting anymore, basically all the effort of supporting apps is on you, the operating system developer. This is how it should be, in my opinion, but how much effort does that mean in the long term? We had a look at versions of Redis, Nginx and Apache over the last 10 years, and at which calls actually need to be implemented, in purple, and we saw that this number does not change very much. So once you support an app and make it work, it means that
you need to keep up to date with the most recent versions of the app as they come out, but it doesn't necessarily mean a very big engineering effort either. And these are the support plans. We had a look at Unikraft and Fuchsia, which are operating systems that already have relatively good support for a good number of system calls, and at Kerla, another kernel, written in Rust; I wouldn't say it's immature, but it doesn't have support for a lot of system calls. And for our set of 15 apps that we had in the database,
we derive a support plan. For Unikraft, for example, in its current state it already supports most of the apps of our data set. If you want to support an additional app, what you need to do is implement system call number 290 and stub these, and then you get memcached; next, if you implement this syscall, you get h2o; and then you need to implement these two syscalls and stub that one, and you get MongoDB. Same thing for Fuchsia and Kerla; obviously it's a bit more interesting there, because this one doesn't support many applications out of the box.
And I believe I have time to do a quick demo; I'm going to do it real quick. I'm going to do a test with ls, which is the simplest test, because we don't have a lot of time. In the Dockerfile I just copy a test that I'm going to show you, and then I call the top-level script of Loupe with a few options that don't matter that much, and I say: the binary we are going to instrument is /bin/ls, and this is the parameter, so I'm going to do "ls /". We are going to check whether it works or not with every possible syscall that could be invoked by ls. And the test, which should be there, is what we run after each execution of ls to see if things have worked. This shell script takes the standard output of ls as a parameter, and to make things simple I'm just checking that ls actually outputs something: I'm doing "ls /", so something should be output; if nothing is output, there is a problem. And keep in mind that Loupe is also checking the return value of ls itself.
So I'm launching Loupe, like this, which should work. What happens under the hood is that we build a container from the Dockerfile we've seen; we start two containers in parallel, each one running a full set of tests, trying to stub and fake all the system calls, and we use this to check for differences between the replicas in case there is a problem. Most of the time there are no differences. So it takes a bit of time and then, OK, it's done. If we go to the database, we now have many more than 12 apps, and if we go to ls, the most interesting result is this CSV file, which contains, for each syscall (0 being read, 1 being write):
is it called by ls or not, can we fake it, can we stub it, and can we both fake and stub it? Or rather: does the application work when the syscall is faked, when it's stubbed, and when it's both faked and stubbed? As you can see, some syscalls, like 11 (I don't know which one it is), can be both stubbed and faked; same for 12 and 16. Some syscalls, like read here, are called but cannot be faked, which kind of makes sense: this wouldn't work if ls couldn't read.
And yeah, that's pretty much it. So, briefly, what we are currently working on: more fine-grained measurement. Some system calls have sub-features: a lot of programs will require at least MAP_ANONYMOUS for mmap, to allocate memory, but not really mmap of a file. So we are looking at checking which flags can be stubbed or faked, and things like that. And we are also looking at the virtual file system API. That's it: building compatibility layers is important for custom operating systems; it seems a bit scary, but actually it's not that much engineering effort.