
Formal Metadata

Title: CloudABI
Author: Ed Schouten
Part Number: 30
Number of Parts: 79
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Abstract

CloudABI is a new runtime environment that attempts to make it easier to use UNIX-like operating systems at the core of a cluster/cloud computing platform.
Transcript: English (auto-generated)
So, according to my phone, it's 4:30, meaning I'd better get started. Don't want to let the German Pünktlichkeit down, right?
So yeah, thank you all for showing up. It's a rather small audience, but, well, it doesn't matter; it turns out that you people have made a really good choice. Thanks for showing up and attending my talk. Today I'm going to talk about something I'm developing called CloudABI. So first of all, before I start, all of the work that I present in this talk is open source,
even though I'm developing this for my company. So I do provide professional support on this, but there's nothing that prevents you from using it. So that's sort of the end of all the commercial bingo that I want to share with you. On to the open source stuff. So before I start explaining what CloudABI is and go into sort of all the messy details,
let me first give a short introduction to who I am. So for the last seven years now, actually, I've been a FreeBSD developer, so about ten years ago I started contributing my first bits to the operating system. The first thing I wrote was Xbox support for the original Microsoft Xbox, which
I wrote together with a guy from the, like, same university I went to. Later on I started hacking on sort of larger projects and sort of the large actual chunk of kernel code that I wrote for the operating system was back in 2008 when I wrote a new TTY layer for the kernel that was SMP safe.
The reason why I started working on this project was because at the time FreeBSD was doing a lot of work to improve SMP scalability. And the problem with having a coarsely locked TTY layer back then was that every time a process would fork or terminate, it would actually pick up a global lock.
So forking and exiting really didn't scale linearly, which is what you would hope for. So after that I started working on more projects related to that, other kernel but also user space projects. A year later I started working on a console driver called VT, which eventually ended up
in the operating system and is, I think, in the upcoming version of FreeBSD, or this one, the default console driver. What's that? Okay, well, pretty awesome. Later on I started working on ClangBSD. So back in 2010, some people at LLVM started working on a new compiler front end for their
compiler infrastructure called Clang. And back then almost nobody was using it. Apple was sort of developing it internally and they recently open sourced it. So I thought this is really nice, having a BSD licensed compiler infrastructure and a BSD licensed operating system, that would be a really good idea. So back in 2010 I started working on this and eventually Clang became the default compiler
in FreeBSD for most of the interesting architectures. So after that I did some other work. In 2011 the new C specification came out close to the end of the year. So I immediately got my hands on the latest draft I could find that wasn't behind a paywall.
And I started implementing some of the new features in the language because I think that C11 is a real good step forward compared to C99. Support for atomics, threading support was finally part of the language, and at least basic support for Unicode. So all of those features I added those to FreeBSD,
and the latest versions of FreeBSD should do proper C11. So between 2012 and 2013 I didn't do a lot of open source contributing. I did move over to Munich and had a really lovely time there. I had a really nice job there. But unfortunately it didn't allow me to work on a lot of open source software.
Late 2014 I decided to quit that job and start my own company because in my opinion I had a really nice idea in my head that I wanted to work on called CloudABI. So I started my own company to build infrastructure for secure cluster and cloud computing. Those are actually really broad terms. So the software I present in this talk, it doesn't necessarily need to be used in cloud computing.
But I think that for cloud computing there are some really strong use cases. So the talk that I'm going to present right now is chopped up into a couple of separate parts. First I'm going to explain what I think is wrong with Unix.
People have different observations about what they think is wrong about Unix, but this is what I think is wrong with Unix. So I've been using Unix for a decade now, but in my opinion there are a couple of fundamental flaws with the operating system that have never been fixed. So first of all it doesn't encourage you to run software in such a way that it's secure.
And what I mean with that, I'll show in the next couple of slides. It also doesn't encourage you to write testable software. Over the last couple of years we see this huge increase in writing software in such a way that it's easier to test. Testability is a really important aspect of modern software.
Not only because it allows us to write software that is more robust, it also allows us to write software that is more reusable. And last, I think that systems administration hasn't really improved over the last decade. So when I started with Unix it was just maintaining a server
and hacking text files in /etc to get everything to work. The only difference we nowadays have is that we have some Go or Python tools around it that attempt to make our life easier, but in my opinion they don't do a really good job at that. And I'm going to give a couple of examples where CloudABI can be used to make systems administration easier, but those will be more towards the end of the talk.
So Unix security problem number one. In my opinion there are two problems with Unix security and this is the first problem. When we start a process on Unix it can do a lot more than it actually needs to do. So consider a simple web service. You're running a simple Apache or Nginx server that just serves a couple of web pages.
In theory this process would only need to do a handful of things. So first of all it needs to pick up HTTP requests that come in on a TCP socket. Second of all it needs to access some kind of data directory containing your documents that you want to serve over the web, so your HTML files, maybe your PHP files or what have you.
And then optionally you also need access to a couple of database backends. Maybe you also need to have access to a log file, but if you sort of add it all up it's just a really small number of things this web server needs access to. So if you look at sort of what happens in practice, is that if there's a security exploit in the web server
then an attacker can actually do a couple of things that you really don't want to happen. So first of all it can just create a tarball of all world-readable data under / and send that back over a TCP socket to some kind of server on the other side of the world. If there is some kind of file system that happens to be mounted,
say an NFS share or something that contains a lot of sensitive information of your company, then all of that data is suddenly exposed. And you could argue well then you should just set up your file system permissions correctly, but in my opinion defence needs to be in depth. It shouldn't be the case that you're solely relying on a couple of permission bits on a file system to make your entire company secure.
Even worse, an attacker can just register new cron jobs. They just invoke the crontab executable and then append a couple of lines to the crontab of the web server's user. So even if you're patching up the web server to no longer be vulnerable, it's the case that the attacker can still every night or so spawn a backdoor process that it installed at the time that the server was initially compromised.
And even worse, it can also just invoke a couple of setuid command line tools like the write command line tool and can just spam messages to arbitrary terminals in the system. Even if it doesn't have any access to the file systems, it can turn the system into a botnet node.
It can just open new TCP sockets, perform SYN flood attacks on random servers on the internet, you know, create spam emails, all that kind of stuff. So you just wanted it to do these couple of things and in practice you're allowing the web server to do all these random things that you don't want it to. So the second problem with security is running arbitrary third-party applications.
So in the previous slide I was talking about programs that you can trust, sort of. But now I'm going to talk about just random third-party applications that you don't trust. Executing those safely on top of UNIX is incredibly hard apparently because if you're just executing them directly, so you're SSHing into your server,
you're running ./random-process, then that could really mess up your system. If it's just running as your own user, for example, it can do a lot of nasty things. Even if it's running as user nobody, there's still a lot of evil things a process like that can do. So the last couple of years you see the increase in the use of jails and Docker and Solaris Zones,
namespace virtualization, and with those it's still quite unsafe actually. So every couple of times a year they discover that there's still a new hole that needs to be plugged. A proc file system instance inside of Docker is actually exposing quite a lot of information that it shouldn't expose.
So in my opinion, jails and Docker are not really that safe. And then what you can do as sort of a last resort is just run your process in a virtual machine, and that's also what you see quite a lot, so that people use Xen or KVM to just run a separate instance and run your processes in there.
But the problem is that it increases maintenance overhead but also reduces performance quite significantly. So the question I ask myself is why can't Unix just safely run third-party executables directly, dot slash whatever, and it should be safe. It should be the case that it can only access the things that you grant to the process.
It shouldn't be the case that it can just perform arbitrary tasks. So the other problem, which I mentioned previously, is reusability and testability. So programs on Unix are hard to test and reuse as a whole. And people often say no, it's not that hard, and they just give a couple of really simple examples where they show that it's in fact easy.
But if you sort of look at programs generically, it's a really tough problem. What I'm going to do in the next couple of slides is give a comparison about how we solve testing in a completely different area of computing systems, namely how we solve testing in Java. And if we then compare how we do testing in Java with how we do testing in Unix,
you actually see that Unix is really in the 1980s in that respect. So say I would write a simple Java program, namely a web server. What you would typically do is, of course this class is far from complete, it only contains a couple of members and a constructor function,
but you could write your web server like this. So inside of the class there's a socket member that receives all the incoming connections and some root directory in the file system where files should be fetched from. So what you could write inside of your constructor is: when we construct such a web server, create a TCP socket and bind it on port 80, and the root directory is /var/...
So most people here would agree that this class is not really testable and also not really reusable because, for example, it can only listen on port 80, you can't run two web servers at the same time because they can only,
they can't bind to the same port number twice, and it's also restricted to serving files from the single directory in the system. So what you would typically do if you're like a sane Java programmer, you would write something like this where you sort of extend the constructor to at least take a port number and a root directory path name and set those in the constructor.
And suddenly you can finally reuse your web server class. But most Java programmers out here know that this is still not the way you're supposed to write Java code, because what you would typically do is use something called dependency injection, where instead of letting the class construct the objects on behalf of you,
you construct the objects yourself and provide them to the class. So take a look at this class here, for example. Instead of creating a TCP socket itself, it takes an arbitrary socket passed in. And the advantage of this is that you can create your own mock socket class
and sort of inject requests into it and capture responses. So if you want to test this class, you can just simulate requests and responses without actually opening a single operating system level network connection. And the same holds for the file system access. Instead of using underlying system calls to access the file system directly,
you could use an interface which you can call directory, and it has a couple of member functions like get file contents taking a path name. And then suddenly you can just let this web server run on top of a virtual file system, so like an in-memory file system or on top of a network file system. This is how you're supposed to write Java code.
So the funny thing about Unix programs is that they're not written like the last example. They're really written like the first two examples that I showed. So it's either the case that parameters are hard-coded, so they make certain assumptions like, you know, "I must open this file name on disk." And if those are not hard-coded, it's typically the case that the path of the configuration file that they use is hard-coded.
And even if they are like truly parameterized, so you can just pass on all of the configuration on the command line or overwrite the place of the configuration file, it's still the case that these programs acquire the resources on behalf of you. You don't provide the network socket to the web server,
you provide it the port number it should listen on, which is similar to the first couple of examples. So this is a double standard in my opinion. We know what a badly written Java program is, but still for some reason we can't see that programs that sort of use the Unix mindset
are also badly written in a certain way. So here's an example of a web server that is testable. So this is a program that will probably compile in any flavor of Unix, and this is testable in my opinion. Instead of it constructing a network socket that only binds to a specific port, it always just uses file descriptor 0 to call accept on.
So standard in is a network socket that you can provide. And the advantage of this web server is it supports any address family. It supports IPv4, it supports IPv6, even Unix domain sockets. And it supports TCP but also SCTP. So just look at the number of applications out there on Unix
that had to be patched up to support IPv6, while most of them are written in such a fairly trivial way that they could just have the sockets injected. Also if you want to support concurrency, you don't need to write a single line of code to actually get concurrency because what you could just do is create the socket once and then spawn 10 web server processes that use the same file descriptor.
So it just comes for free essentially. And this web server is also testable because what you could do is you could just create a Unix socket and just programmatically inject requests into it and capture the responses. So there's no need to sort of guess a port number that might be free on the server and spawn a TCP socket on it
and hope that nothing else on the network accidentally connects to it or that you're accidentally running it on the same port as a production instance, all those kinds of flaws. That simply won't happen. You can just create a Unix socket and run this web server against it. So now that I've sort of explained what I think is wrong with Unix, namely that it's insecure and not testable,
let me show you the solution that I've come up with to deal with this. So I've developed a new Unix runtime environment called CloudABI. And this CloudABI, think of it as, you know, Linux is typically capable of running Linux processes, FreeBSD is capable of running FreeBSD processes.
This is like a CloudABI operating system running CloudABI processes, except that the CloudABI operating system does not exist, and I'll go into that in a bit more detail later on. But CloudABI is sort of a stripped-down flavor of Unix that is, in my opinion, better protected against exploits.
So the impact of a security exploit is a lot smaller. It allows you to finally write software that is reusable and testable and also has a couple of tricks that make it fun to use at a larger scale. So I'm not claiming that I came up with the entire idea myself.
There are some parts of a framework called Capsicum that I reuse, which is a capabilities framework for FreeBSD. So to really briefly explain how CloudABI works and what the intent behind it is, I'm going to explain what a simple process could do on CloudABI.
So the most simple process that you can imagine, it starts up and it starts in the most simple way. It can still allocate memory, it can create pipes, it can create socket pairs, it can create shared memory, it can spawn threads and sub-processes, it can get the time of day,
it can do all sorts of things that only have a local impact. So you can't just open a random TCP connection to a server that's somewhere on the other side of the world. You can't just open a random file on disk. You can't just delete everything that's in /etc. You can't just send a kill signal to a random process on the system.
It's really just sort of this local environment in which you can sort of compute stuff. It's also worth sort of mentioning briefly that some of the sort of Unix interfaces are not easily compatible with this interface. So for example, the process table. You see that Unix processes traditionally do need to access the global process table
and send random signals out to other processes. So a couple of small extensions have been added to sort of safely create handles to processes, to sub-processes so that you never need to inspect the entire global process table. It sort of remains local. So how can you actually let your process do something useful?
Because, I mean, just computing stuff and not interacting with the network or with the file system is pretty useless. So file descriptors are used to grant these additional rights. So if you want a process to access a file, you just start it up with a file descriptor to a file on disk and suddenly it can just read from that file, write to that file,
depending on how it was opened. Even more powerful, you can just give file descriptors to directories. So if you have a web server and you would just provide it a file descriptor to /var/whatever, then it can just access all of the files that are underneath. So it can't open .. or /whatever. It can only access files that are strictly underneath the directory that you pass in.
You can also provide it sockets and suddenly the system or the process becomes networked and it can just answer requests that come in. So what's really nice about sockets on Unix is that at least Unix sockets can be used to pass file descriptors along.
So what you could just do is give a process a file descriptor to another service that grants you more resources. So say you want to build a process that makes outgoing network connections. You can't just open those connections on your own. You have a separate process running alongside (so not a CloudABI process) that can open these sockets for you
and then send them back to your process through file descriptor passing. And this is really funny because then you can make user-space firewall processes. So as an extension to what POSIX normally offers, these file descriptors have a permission bit mask. So normally on Unix it's the case that file descriptors can only be open for reading,
for writing, or for both. In CloudABI every possible action that you can perform on a file descriptor is an individual right. So you can say this file descriptor is open for reading and for mmap, but you can also truncate it and you can, for example, call fallocate on it to allocate more space.
Anyway, it's really just an arbitrary bit mask where you can say I want to allow these actions and I don't want to allow those. And this is actually what's called capability-based security, where the set of actions that your process can perform is not determined by a set of access controls. It's determined by a set of capabilities that your process happens to have at one point in time.
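The file descriptor passing mentioned a moment ago uses the standard Unix-domain SCM_RIGHTS control message. A sketch of the two helpers such a firewall-style broker process and its client would need, again plain POSIX and nothing CloudABI-specific:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` across Unix-domain socket `sock` in an
 * SCM_RIGHTS control message; returns 0 on success, -1 on error. */
int send_fd(int sock, int fd) {
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1 on error. */
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor behaves like a dup of the sent one, which is exactly what lets a broker hand a fully opened socket to a confined process.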
And new capabilities can be acquired, for example, through file descriptor passing, but a process can also discard some of its capabilities by simply closing those file descriptors. So how would you model a secure web service on top of CloudABI?
So you can almost literally take the description that I gave earlier and for every sentence out there say: this needs to be a file descriptor. That's exactly how it works. Your process just has three file descriptors in this example, namely a socket for incoming HTTP requests, a read-only file descriptor of the directory containing the HTML documents,
and an append-only file descriptor of a log file. So you can already see that if there's a security exploit in this web server, not a lot of evil things can happen. The attacker can read more stuff from the file system and it can append garbage to the log file, but it can't just throw away the log file or add new files to the web server root directory.
So the nice thing about this model is that it's also flexible at runtime. As I mentioned, the process can gain new rights and also discard rights under the correct conditions. And it can apply the principle of defense in depth. So what you could, for example, do is
say you want to build the next version of YouTube, where people upload videos on your website and you serve them back to the user. You probably want to transcode these videos, because the user just gives you, I don't know, some kind of weird file format that they used on their smartphone, and now you need to convert it to a sane format
or even multiple formats that are supported by the devices that you want to support. So what you can do is after you've received the video from the user, you could just fork the web server process and spawn a new sort of tiny container in a certain way that only has access to two pipes, to two file descriptors,
namely one that's used to provide incoming video input and one that's where you write the transcoded output to. And what's really nice is that if there's then like a security vulnerability in the video transcoding library that you're using, say a buffer overflow, the attacker can only write more garbage output to the output video stream
but it can't actually get more insight into how your network is set up internally or interact with HTTP requests that come in from other users. So it's still annoying that the attacker can write garbage output, but still the impact of such a security vulnerability is really small when compared to what's currently going on in Unix.
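The fork-plus-two-pipes shape of that transcoder sandbox can be sketched in plain POSIX. A real CloudABI child would additionally be unable to open anything else; ordinary fork() only approximates that part:

```c
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Spawn `worker` in a child process wired up through two pipes:
 * the parent writes raw input into *to_child and reads results
 * from *from_child. The child's only interesting descriptors are
 * its ends of those two pipes.
 */
pid_t spawn_filter(void (*worker)(int in, int out),
                   int *to_child, int *from_child) {
    int in_pipe[2], out_pipe[2];
    if (pipe(in_pipe) == -1 || pipe(out_pipe) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child: keep only its two ends */
        close(in_pipe[1]);
        close(out_pipe[0]);
        worker(in_pipe[0], out_pipe[1]);
        _exit(0);
    }
    close(in_pipe[0]);              /* parent: keep the other two ends */
    close(out_pipe[1]);
    *to_child = in_pipe[1];
    *from_child = out_pipe[0];
    return pid;
}

/* Trivial example worker: copies its input stream to its output. */
void copy_worker(int in, int out) {
    char buf[512];
    ssize_t n;
    while ((n = read(in, buf, sizeof(buf))) > 0)
        write(out, buf, n);
}
```

In the video example, the worker would be the transcoding library instead of copy_worker, and a compromise of it can only corrupt the output stream.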
And here's sort of an example of something that's a bit more complex. Say you're interested in running a more traditional web server infrastructure, where you have /~username support, so you can go to my domain/~ed and it eventually serves the files that are in a subdirectory of my home directory.
What you can do with this model is that you just run a separate process, and only that process has access to /home, and you can send an RPC to it saying, hey, a web server request came in for ~ed/index.html, and that process then says,
okay, here I'm going to give you a file descriptor to ~ed, and now you can access all the files underneath. So what happens is that an exploit in such a web server would never yield any write access to the system, but in addition to that, it's also never possible to access any files outside of the web directory. So it really allows you to put sandboxes in sandboxes,
and this is a really beautiful feature. So, testability of CloudABI processes. In a model where all functionality towards the outside world is determined by file descriptors, it becomes really easy to test software,
because what you can just do is you can start up your executable with a different set of file descriptors. If you don't want a process to talk to the production database you can just provide it a file descriptor to a fake dummy testing database server that only returns data that is used for testing.
So it's incredibly easy to test CloudABI processes. In fact, in my opinion, it's even impossible to write software that's not testable. So I briefly mentioned a couple of slides ago that there is no such thing as a CloudABI operating system.
Think of it as a definition of what a CloudABI operating system should look like: which system calls it should support. That's exactly what CloudABI is. It's an ABI definition that specifies the list of all the system calls, all of the data types and all of the constants. So CloudABI, for example, defines that EINVAL corresponds with value 18.
It specifies that an offset in a file is 64 bits. All those kinds of things are encoded in the ABI. My idea is to add support for CloudABI to other operating systems out there. So what this means is that you can just compile an application once. You can build software on your nice MacBook or Linux workstation,
and you could, for example, run it on a server that runs, you know, FreeBSD, Minix, whatever happens to support CloudABI. So adding support for CloudABI to existing operating systems is not that hard, because I've already added support to a couple of operating systems out there.
For example FreeBSD, where adding support only required me to write 10,000 lines of code for the FreeBSD kernel. So it's really not a large investment; one person just needs to do this, and then you can run arbitrary CloudABI processes on that operating system. So there are a couple of other operating systems that I'm supporting right now.
Or working on supporting. Namely, NetBSD and Linux. Eventually I just want to support all of the BSDs out there. And it would be nice if macOS was also supported. But I sort of need to tackle them one by one of course. Right now I am focusing on only one hardware architecture.
Namely, x86-64. I don't think there's a need to support 32-bit binaries nowadays. It wouldn't make a lot of sense. I am actually interested in having ARM support eventually. You see that a lot of interest is nowadays going into ARM. You can already see that with all those Raspberry Pi boards.
Maybe I would support the 32-bit ARM boards, but I might actually skip those entirely and just go for 64-bit computing. So the nice thing about FreeBSD is that I managed to upstream CloudABI support into FreeBSD one and a half weeks ago.
So if you happen to have a FreeBSD system that runs the latest development snapshots, you can run two commands. The first command installs a complete CloudABI toolchain, which includes a compiler, a linker, a standard C library, and even a C++ library. And then you can use that to compile C and C++ programs.
It also includes a kernel module that you can load, and if you load this kernel module, then you can just execute CloudABI processes just like regular Unix processes. For other operating systems it's actually a bit more complicated, because I don't have any packages yet
and the operating system support hasn't been upstreamed. So if you're, for example, using Linux or NetBSD, there are a couple of steps that you need to take to make CloudABI work. First of all, you have to install Clang and Binutils manually. This is not that hard, fortunately. It's especially easy because all of the patches that I wrote for Clang and Binutils have been upstreamed in the meantime.
So you can really just take, for example, Clang 3.7, which is coming out one of these days, and it includes CloudABI support out of the box. No patches required whatsoever. The same holds for Binutils: the upcoming version also has everything upstreamed. After you have a properly working C/C++ toolchain,
you actually need a couple of core libraries; otherwise you wouldn't even be able to compile the simplest hello world application. So there's a C library called cloudlibc, which I wrote specifically for CloudABI. And think of it like this: it contains everything in POSIX, plus some of the small extensions provided by Capsicum,
the capability-based security model that I'm using, minus all of the garbage that you wouldn't want in an environment like this. So if you're just building some kind of black-box application that is really confined from the environment around it, there's really no need to provide access to the password file,
or provide functions like "could you kill this random arbitrary process". So a lot of these garbage APIs in POSIX, which in my opinion shouldn't be used in a correctly sandboxed application, are all gone. So it's a really lightweight C library. After you've installed this C library, you can install a couple of other libraries like libc++ for C++ support
and libunwind for exception support, which you also need if you do C++ programming. And once you have all of these installed, you can compile proper CloudABI executables. Once that's done, the only thing you need to do is patch up your existing operating system kernel to actually run these CloudABI executables. So that involves going to the GitHub page,
I'll provide the link at the end of this talk, and checking out the proper patch set. Then you should be all good to go. But eventually I'm looking to have at least packages for the toolchain upstreamed to most operating systems. So if you're into the packaging scene, if you happen to be really good at writing Debian packages for example,
please talk to me after this presentation, because it would be really awesome if we also had the toolchain upstreamed into other operating systems. That would really lower the barrier to using this. So in the next couple of slides I'm going to show you how you can run a CloudABI process.
And I'm going to demonstrate that even though the idea behind CloudABI is sort of perfect in theory, when I started working on this I noticed that there was still a missing piece of the puzzle, which I hope I've resolved. So what you see here is a simple version of the ls utility that you'd normally have on Unix,
but then specifically tailored for CloudABI. So this tool doesn't support any of the fancy command line flags, of course. But what it can do is simply give you a dump of all the files that are in a directory. It doesn't even try to sort them alphabetically or anything. It just dumps them in the way they're stored on disk.
So what happens is that when this program starts up, it calls these two functions. This is actually the most interesting piece of the program, where it first opens the directory, so it can iterate through it and extract the directory entries, but it also opens a file handle to your terminal, so it can actually write output to it. So this program uses a convention where
file descriptor 0 is the directory that it should traverse through, and file descriptor 1 corresponds with the terminal. So this simple ls utility, you can just compile it as follows: just install the cross compiler toolchain and then invoke cc -o ls ls.c, like you would normally do on Unix.
And then you can run the program by passing in /etc as standard in. This actually works; this gives you a directory listing. So even though it works, I noticed that it feels unnatural. It's in my opinion not the way to go.
So even though you can use your shell to pass in files to a program. Or you can pass in directories or pass in character devices on your system. The shell doesn't provide an easy and portable way of creating sockets. So if I would run this web server that I demonstrated during the introduction.
I wouldn't be able to start it up from the shell. Because I can't give it a socket. What's also really annoying is that the ordering of the file descriptors might actually be really important. So if your service becomes more complex and you need to start it up with half a dozen or even more file descriptors. Then you can easily invoke it in the wrong way. You need some kind of documentation that would explain.
file descriptor 0 corresponds with the log file, file descriptor 1 corresponds with the web server root. That simply doesn't scale, in my opinion, and would just cause a lot of headaches for systems administrators. Even worse, you can't actually deal with a variable number of file descriptors. Say you have a web server that can listen on multiple sockets,
could use multiple database backends, and multiple of those at the same time. What would the numbering scheme look like? You would somehow need to pass command line variables saying the first five file descriptors correspond with database backends, and then there are seven or so log files.
That simply wouldn't work. I can't see that working. What you also lose is sort of the transparency in Unix, where you can write a single configuration file in which you just explain how the entire service should work. You know, if you look at the Apache configuration file, there are a lot of configuration parameters that have nothing to do with which resources to access;
they only describe how the process should behave. But on the other hand you also list a lot of path names and network addresses that the process depends on. So I thought about it a bit, well, actually quite a lot, and I came up with the following solution: I wrote a utility called cloudabi-run.
And this utility is incredibly simple. I think it's only 200 or 300 lines of code right now. And that's mainly because it needs to do some file parsing in there. But how it works, you just invoke it with an executable. And on standard in you provide it a configuration.
And this process allows you to start an executable with an exact set of file descriptors. It makes sure that no file descriptors leak into the process. And it makes sure that none of them are missing. And what it does, it merges the concept of program configuration with providing access to external resources. So that means that you still have your traditional configuration file.
in which you have configuration parameters but also list the dependencies of the process. And how it does that: it replaces the traditional command line arguments by a YAML-like tree structure. So there is no more argv when your process starts up. It has something else, namely a tree structure of configuration parameters and resources that it can iterate through.
So say you write a very simple web server. This still has nothing to do with CloudABI. You would just write a simple web server that takes a configuration file; you could, for example, use YAML. And in this configuration file you would have a couple of configuration attributes, like the host name that is, for example, returned in all of the error messages and in the HTTP headers.
You would want to specify the number of concurrent connections that this web server should accept, so in this case 64. And you would say it needs to listen on this IP address and port number. Then finally it also needs access to a couple of files on disk. So cloudabi-run accepts a configuration that looks a bit like this, but is annotated in a special way.
What most people don't know is that YAML is actually a typed language. So there is a difference between the string 8 and the integer 8. And you can actually write it down in different ways to sort of remain type safe.
So cloudabi-run uses tags from a special YAML namespace, which you see here at the top. And it allows us to use these tags with exclamation marks, like !socket and !file, to add dependencies on resources that the program wants to use.
So this is almost the same as the previous configuration file, but you see that all of the attributes that refer to socket addresses, binding on a certain address, or path names on disk have been extended to use these !file and !socket tags.
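As a rough sketch of the slide (the attribute names and the exact tag namespace here are reconstructions, not necessarily what was shown), such an annotated configuration might look like:

```yaml
%TAG ! tag:nuxi.nl,2015:cloudabi/
---
hostname: www.example.com          # plain scalar: ordinary configuration
concurrent_connections: !!int 64   # typed YAML: the integer 64, not "64"
listen_address: !socket
  bind: 192.0.2.1:80               # cloudabi-run calls socket() and bind()
logfile: !file
  path: /var/log/webserver.log     # opened and passed in as a descriptor
root_directory: !file
  path: /var/www
```

The untagged attributes reach the program as ordinary values; the !socket and !file entries are turned into live file descriptors before the program starts.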
And what cloudabi-run does is scan through this file; it parses the YAML for you and tries to acquire these resources for you. So it calls socket and bind to obtain a socket that's bound to this IP address. It calls open to open these files, so the log file and the web directory.
And it replaces these by fd tags, file descriptor tags, as references to those file descriptors. So when it created a socket and bound it, it turned out to be file descriptor 17, and 42 and 28 for the log file and the root directory. And this is what's being passed on to the application. Well, not yet: there is still one pass in between, namely a sanitizing pass.
And what it does is close all of the other file descriptors that happened to be open at the time cloudabi-run was running, and also renumber the file descriptors to be sequential. And the reason for this is that it makes the execution of the program a bit more deterministic.
So every time you start up the process with the same configuration file, the numbers of the file descriptors also match up. Otherwise it would be a bit more annoying when debugging processes. So how does this look from a programmer's point of view? Because eventually you need to access this data from your program. So instead of using the traditional int main(int argc, char **argv)
function, you may optionally use an alternative entry point called program_main. And it only has a single argument, namely an argdata_t. And this is a handle to this tree structure; you can just iterate over it. So because the configuration in our previous example was actually a mapping (you know, it's always key-value, sort of a dictionary),
we see that this piece of code now invokes a function called argdata_iterate_map. And you pass in this handle to this node of the tree, namely the root, and you pass it a function that needs to be invoked for every element, so a callback function, and also some argument data that needs to be passed along.
And now you can just, I mean, I tried simplifying this code as much as possible; I removed all of the error handling. But this would be your configuration file parser. So you see that we first obtain the string value of the key. So in this case we're trying to extract hostname, concurrent connections, listen, logfile and root.
And we perform a string comparison on those. So if it's a host name then we can just extract a C string argument from this tree structure. So now we've obtained a host name. And we can call this function getfd to extract file descriptor numbers from the tree.
So it's really important to keep in mind that integers and file descriptors are two separate types, because cloudabi-run needs to know which numbers are file descriptors and which ones aren't, because it needs to know which of those file descriptors need to be passed on to the new process.
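Put together, the parsing code described above looks roughly like this. This is pseudocode in C syntax: the argdata function names and signatures follow the talk's description of cloudlibc, not a verified API, so treat them as illustrative.

```c
#include <argdata.h>   /* cloudlibc-only header */
#include <stdbool.h>
#include <string.h>

/* Callback invoked once per key/value pair of the configuration map. */
static bool handle_entry(const argdata_t *key, const argdata_t *value,
                         void *thunk) {
    const char *keystr;
    argdata_get_str_c(key, &keystr);         /* keys are strings */
    if (strcmp(keystr, "hostname") == 0) {
        const char *hostname;
        argdata_get_str_c(value, &hostname); /* plain string attribute */
    } else if (strcmp(keystr, "logfile") == 0) {
        int fd;
        argdata_get_fd(value, &fd);          /* descriptors: their own type */
    }
    return true;                             /* keep iterating */
}

/* CloudABI entry point: no argc/argv, just the configuration tree. */
void program_main(const argdata_t *ad) {
    argdata_iterate_map(ad, handle_entry, NULL);
}
```

The important point is visible in the types: an integer attribute and a file descriptor attribute are extracted through different accessors, so a program can never confuse configuration numbers with capabilities.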
So this is actually a really nifty tool I've discovered, because it allows you to configure a service securely without any additional effort. If you compare this to SELinux or AppArmor, where you have to write separate security policies in a separate configuration file, something like this is completely unneeded for CloudABI.
You still have a single configuration file in which you configure the program, and you start it up and it's secure. So if you change a path name in your configuration and start it up again, it should still work, unlike AppArmor. Also, it's impossible to invoke programs incorrectly, as in getting the ordering of the file descriptors wrong,
because programs don't depend on the ordering of the file descriptors. It's no longer the case that 0 is standard in, 1 is standard out and 2 is standard error. Programs start up and they just have a big bag of file descriptors that they use to run correctly. And what's also really cool is the YAML: it uses YAML 1.2.
And YAML is also a superset of JSON. So you can use any tool that generates JSON or YAML and just pass that data to the program directly. And that's really nice. So there's no more invoking programs through the shell and making sure you get all the escaping right. You can use high level libraries to actually construct the data you want to pass on to the program.
So from a security point of view this is awesome, in my opinion. Also, for software developers there's no longer a need to write a configuration file parser, because this all just comes for free. You just run cloudabi-run with the YAML file and your program receives it in a tree structure, already in pre-parsed form.
So it also means that programs no longer need to acquire any resources at startup. As soon as your program starts running, you can already do the stuff that actually matters: accept requests and just process them, instead of first writing tens of thousands of lines of code, in a large application, to just parse the configuration file and set up all the resources correctly.
So the final thing I want to discuss is: what are the use cases for CloudABI? A couple of the use cases that I present are either things that I made up myself or where I think CloudABI is a good tool, but it's also based on some feedback I got from companies that showed a real interest in using CloudABI for their purposes.
So even though CloudABI has cloud in the name, it doesn't necessarily mean that you can only use it for cloud and cluster computing. I've seen some interest from hardware appliance vendors, so for example companies developing storage solutions or firewalls, and they're actually
thinking about using CloudABI to harden the processes running on their systems. So in addition to making their software a lot more secure, it makes it a lot easier for them to run third-party software. So in FreeBSD there exists a technology called netmap, and netmap allows you to efficiently do firewalling in user space.
So it's a sort of a lockless queue in which network packets are exposed to processes and the processes can apply filtering to them or discard packets. And this would allow people to just write these third party filtering libraries and if
there's a security exploit in it, then the appliance as a whole is not compromised. So it makes it easier for network firewall vendors to allow modification or extension of their functionality through third-party plugins. Also, I've worked for a company that made email spam filtering appliances, and they used a binary blob component to do the spam filtering,
which is really bad, because if there's a security exploit in that spam filter, there's nothing you can do yourself to secure it. What if that virus scanner vendor supplied their virus scanner as a CloudABI executable that
would only for example take one pipe for the incoming email, one pipe for the outgoing email. That would make it a lot more secure so even if there are a couple of security exploits in the virus scanner it's still not that bad as it is right now.
So another example I thought of is having CloudABI as a service. Right now people use Amazon EC2 or Google Compute Engine or Google App Engine, but in my opinion these services don't tackle the problem as a whole. So Amazon EC2 makes it a lot easier to get your hands on computing resources, but it doesn't make life simpler for you,
because every Amazon EC2 instance you get is basically just a new computer for which you also need to do the systems administration. So we've tried to solve this by coming up with tools like Puppet to automatically administer all those systems. But the problem is, at the root, in my opinion, you shouldn't be doing any whole-system administration if you have a cloud computing platform.
It should be the case that you just have a program that you want to run, be it a computationally intensive program or a web service. You just give them the binary and let them run it. And CloudABI makes that easier to do without using any virtualization. So right now I think Amazon EC2 uses Xen and Google Compute uses, I think, KVM; I'm not sure about that.
These impose a lot of CPU overhead, but a technology like CloudABI could make it possible to run these systems directly on top of a Unix kernel without any CPU virtualization overhead. And what's also really nice: Google App Engine is a really nice cloud computing framework that I like.
What you do is you just write a whole pile of Python code that you just want to run in the cloud and you just throw it over the fence and Google just runs it for you. The only problem is that Google App Engine only supports a couple of scripted or interpreted programming languages because those are
the ones that they can do analysis on to make sure that it won't escape the confinement of the sandbox. But with something like CloudABI you could just run arbitrary processes. You could say: I'm compiling a special Ruby interpreter for CloudABI and just running it with a couple of Ruby files that I provide.
So finally, one of the use cases that I've been thinking of that's also really interesting about CloudABI is that you could use it as the basis of a cluster management suite. So what you could do is make this really tiny process that just runs on a whole pile of servers, and the only thing it does is accept RPCs, instructions, you know: what should I run?
Similar to systems like Kubernetes. But the nice thing about CloudABI is that because you have to provide all of the dependencies of a program explicitly, you have a really accurate, high-quality dependency graph of all the processes. And this allows you to add so much more smartness to the system than what we currently see.
So right now, if Kubernetes processes start up and one of the back ends of a service is down, it just sits there: it runs, but it fails to connect to its back end. With something like CloudABI, the cluster management system would already know that this is happening and can just say: I'm not scheduling this process until all of its dependencies are fulfilled. It could also make higher-quality scheduling decisions, like: I see that all of these database servers are running in one rack over here;
I might as well just run a couple of front-end processes right next to them, instead of running them on the other side of the data center, or maybe even worse, on a different continent. So this is a lot easier if you have a cluster management system purely built on a technology like CloudABI.
Also because all the dependencies are known, if you want to migrate a process from one server to the other, you exactly know which files on disk it's going to access and you know what to migrate over to the new server.
So this is CloudABI in a nutshell. I hope I clearly explained what the intent behind CloudABI is, how it works and what the use cases are. There is a page on GitHub, the cloudlibc repository, and it has a nice introduction and some links to some other interesting articles.
And of course the source code itself, which you can try and experiment with. Even if you're not interested in using CloudABI, in my opinion it's also a really high-quality C library. So if you're interested in knowing how a certain C library function is actually implemented, be sure to check it out. And there's also a whole pile of tests with it, so it's really good for getting some example code on how it works.
There's also an IRC channel on EFnet called #cloudabi; be sure to drop by and lurk to see what's going on. And finally my company, Nuxi: if you would be interested in commercial support on technology like this, or can think of a killer use case, then be sure to contact us. That'll be it. Are there any questions? Wow. Maybe tomorrow, yeah.
I'll be here tomorrow as well, so if you happen to stumble into me tomorrow, just chat with me a bit and let me know what you think about it. You had a question?
So, exactly. That's a really good remark.
So with CloudABI, a lot of people often present these use cases like: it can't do this. But it's of course really important to realize that CloudABI is not meant to cover the full 100%.
There are of course quite a lot of things that really need to be done in a traditional process, where you do have access to all of these global namespaces. So what you could for example do, and I've been thinking about this, is let every system run a master process that can provide access to arbitrary directories and arbitrary sockets.
You know, it can connect to everything and it can bind to everything; it's sort of the root process, doing what root could normally do. And then you would just have these stacking, adapter-like processes on top of that, that can do all sorts of interesting filtering and whitelisting.
And that would at least make systems a lot more secure. So even for a cloud computing platform, you could make it completely safe, you could be completely sure that processes running on top of your cloud computing network don't connect to your internal network. It sort of finally allows you to do user space firewalling.
Because what you see right now is that you have all these really complex firewall policies; the grammar and the features that a firewall in the kernel has keep growing, because people come up with new criteria that need to be filtered on. And something like this could finally allow you to do all of that in user space. So this is really not meant to cover the full 100%, at least right now.
I foresee that there will be sort of a hybrid model where certain parts still run as native processes, but all of the interesting things, you know, where you do a lot of parsing and where stuff can simply go wrong, just run in a CloudABI process.
So that's a really good question. I'm really starting at the bottom, of course: starting with the C library and building up from there.
So right now where I am is that C and C++ really work. As in, with libc++ a lot of tests from the standard test suite already pass. And I'm now slowly getting to the point where I'm trying to get more high-level libraries built on top of this, including interpreters: getting Lua to work on top of this, getting Python to work on this. I don't have anything complete yet, but I'm now experimenting with building them against it and extending the C library to, you know, add the non-standard functions that those libraries actually need.
So my idea would eventually be that something like CPython, the official Python implementation, would just work on this. Albeit slightly differently: the normal Python interpreter has standard directories that it looks into, whereas in this case you would, in your YAML file, explicitly give a list of directories, and that list is explicitly passed on to the Python interpreter. So yes, eventually you should be able to use the standard interpreters for Python or PHP, whatever you like. Starting them up is a bit unconventional, but that's the best we can do in this case.
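For a flavour of what that startup looks like: programs launched through cloudabi-run read their configuration from a YAML document instead of a command line, with special tags that grant file descriptors. The keys below (`console`, `module_path`) are hypothetical, since each program defines its own schema; only the tag directive and the `!fd`/`!file` tags follow cloudabi-run's conventions.

```yaml
%TAG ! tag:nuxi.nl,2015:cloudabi/
---
# Hypothetical configuration for a sandboxed Python-like interpreter.
console: !fd stdout          # grant the interpreter a console descriptor
module_path:                 # explicit list of directories to search,
  - !file                    # each granted as a preopened directory
    path: /usr/lib/python-stdlib
  - !file
    path: /srv/app/modules
```

The interpreter then resolves imports only against the granted directory descriptors; there is no global search path to fall back on.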
So without a modified kernel it's actually pretty hard, because you could potentially do it, but then you wouldn't have the security benefits.
The program could still, in assembly, call the original system call that provides access to an arbitrary path on disk. Linux security policies, for example, are really not powerful enough to emulate this. So that wouldn't be possible, but there's another interesting angle to your question.
What you could do is something the other way around. A lot of functions depend on global namespaces, like open(), which tries to open a global path on disk. You could add a wrapper inside of the C library so that there is one file descriptor for, say, the root directory of the system.
Something like open() would then just be translated to opening a file underneath that root directory. Like a "libhack" that you could use to more easily port existing applications. Working on this has crossed my mind on more than one occasion, and I almost started working on something like this, but my concern with that approach is that it basically brings you back to where you started.
If all of the software that you're running on top of this stack is based on this libhack, still assumes a global root directory, and just calls into a special RPC to always get a TCP socket to an arbitrary destination on the internet, then in the end you're not any better off than where you started.
You still end up with untestable, unsandboxed software. Yeah, that's still true.
Yeah, so there are still some advantages to having a libhack like that. My goal would be that eventually it would just be a separate overlay, so that if you install this libhack and add a special include path to your compiler, then when you include stdio.h you get a stdio.h that stacks on top of the one from cloudlibc.
Cloudlibc only provides the features that don't depend on any global namespaces, and everything that does depend on a global namespace would be listed in that small, tiny stdio.h that just adds the couple of missing features. And then it would be really clear: programmers can clearly decide, "I want the pure capability-based runtime environment,"
or, "I just quickly want to get the software running in a quick and dirty way." That's the model that I foresee, but I haven't started on that yet. I'm first trying to see how far we can go by only using this purely capability-based environment.
Any other questions? Well then, thank you all for your attention. I also liked the questions; they were really good and in-depth.