We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

A Component-based Environment for Android Apps

00:00

Formal Metadata

Title
A Component-based Environment for Android Apps
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
With 2.5 billions of active users Android is the most widely deployed mobile operating system in the world. Its vast complexity paired with a monolithic architecture regularly result in severe security issues like the infamous Stagefright bug. In this presentation we talk about an ongoing research project which aims at running Android applications on top of the component-based Genode OS framework and secure them using formally verified components. We discuss how Android applications interact, how well this matches the semantics of Genode and what it takes to support unmodified Android apps.
33
35
Thumbnail
23:38
52
Thumbnail
30:38
53
Thumbnail
16:18
65
71
Thumbnail
14:24
72
Thumbnail
18:02
75
Thumbnail
19:35
101
Thumbnail
12:59
106
123
Thumbnail
25:58
146
Thumbnail
47:36
157
Thumbnail
51:32
166
172
Thumbnail
22:49
182
Thumbnail
25:44
186
Thumbnail
40:18
190
195
225
Thumbnail
23:41
273
281
284
Thumbnail
09:08
285
289
Thumbnail
26:03
290
297
Thumbnail
19:29
328
Thumbnail
24:11
379
Thumbnail
20:10
385
Thumbnail
28:37
393
Thumbnail
09:10
430
438
Android (robot)Mobile appBuffer overflowHypermediaService (economics)Multiplication signLevel (video gaming)Similarity (geometry)VideoconferencingMathematical analysisCartesian coordinate systemLibrary (computing)Complex (psychology)Computer animation
FingerprintConvex hullPhysical systemSoftwarewiederverwendungHypermediaType theoryLevel (video gaming)Software frameworkSoftwareAdditionInteractive televisionWeb browserError messageMessage passingConnectivity (graph theory)Complex (psychology)Presentation of a groupComputer animationLecture/Conference
Physical systemSoftwarewiederverwendungConnectivity (graph theory)InformationDataflowStack (abstract data type)Web browserSoftwareComputer animation
System programmingPhysical systemData structureHierarchyRecursionProcess (computing)DataflowFormal grammarInformationCorrelation and dependenceFormal languageFormal verificationHill differential equationComplex (psychology)Connectivity (graph theory)Data managementInterface (computing)Bound statePhysical systemCodeDataflowSoftwareIntegerCategory of beingFunctional (mathematics)Proxy serverBuffer overflowInformation securityContent (media)InformationInheritance (object-oriented programming)HierarchyRun time (program lifecycle phase)Cartesian coordinate systemException handlingCondition numberInvariant (mathematics)PreconditionerComputer programmingAbstractionNetwork topologyDefault (computer science)Software frameworkFormal verificationDevice driverComputer architectureLimit (category theory)RootDesign by contractTelecommunicationFormal languageBuffer solutionComputer animation
Android (robot)Cartesian coordinate systemMobile appProjective planeInteractive televisionData storage deviceVirtual machineObject (grammar)NumberComputer animationLecture/Conference
Physical systemRun time (program lifecycle phase)InterprozesskommunikationCartesian coordinate systemRun time (program lifecycle phase)BuildingConnectivity (graph theory)Mechanism designComputer architectureKey (cryptography)Proxy serverData storage deviceMobile appPoint (geometry)MikrokernelComputer animation
BuildingComputer fileDisintegrationConvex hullCASE <Informatik>BuildingComputer fileCartesian coordinate systemAndroid (robot)Game controllerNormal (geometry)Connectivity (graph theory)Software testingPhysical systemTranslation (relic)Data structureSource codeLibrary (computing)Direction (geometry)Systems integratorRepository (publishing)Run time (program lifecycle phase)Category of beingComputer animation
Hill differential equationRun time (program lifecycle phase)Android (robot)Cartesian coordinate systemAndroid (robot)Run time (program lifecycle phase)Core dumpVirtualizationJava appletCodeGraph (mathematics)Computer animationEngineering drawingProgram flowchart
Run time (program lifecycle phase)Android (robot)Open setState of matterStatisticsElectric currentRun time (program lifecycle phase)CompilerJava appletCASE <Informatik>Point (geometry)CodeAndroid (robot)Software testingStructural loadCartesian coordinate systemLibrary (computing)Computer programmingDifferent (Kate Ryan album)State of matterDebuggerMechanism designImplementationLecture/Conference
Maxima and minimaLibrary (computing)Euclidean vectorHill differential equationBinary fileProxy serverAndroid (robot)Exception handlingSoftware frameworkConnectivity (graph theory)FluxParsingInterface (computing)CuboidSoftware testingKernel (computing)CodeComputing platformFlow separationRow (database)Buffer overflowComponent-based software engineeringElectric generatorCartesian coordinate systemCASE <Informatik>Semiconductor memoryMessage passingBlock (periodic table)Resource allocationComputer animation
Interactive televisionAndroid (robot)Table (information)Cartesian coordinate systemConnectivity (graph theory)Computer animation
Android (robot)InterprozesskommunikationService-oriented architectureEuclidean vectorCNNMessage passingConnectivity (graph theory)SpacetimeDialectDirectory serviceData structureDifferent (Kate Ryan album)Heat transferInterprozesskommunikationCartesian coordinate systemRun time (program lifecycle phase)Pointer (computer programming)MetadataComputer wormAddress spaceService-oriented architectureSemiconductor memoryClient (computing)Computer fileComputing platformLevel (video gaming)Keyboard shortcutMechanism designINTEGRALTelecommunicationBuffer solutionObject (grammar)Android (robot)Kernel (computing)Physical systemMultiplication signRemote procedure callGame controllerSuite (music)Local ringSoftware testingService (economics)EmulatorElectronic mailing listFilter <Stochastik>Extension (kinesiology)Domain nameCodeInternet service providerIntegrated development environmentSoftware bugFunctional (mathematics)Complex (psychology)Term (mathematics)Transport Layer SecurityRhombusShared memory
Cartesian coordinate systemException handlingMessage passingFormal verificationEndliche ModelltheorieRun time (program lifecycle phase)MikrokernelFilter <Stochastik>Moment (mathematics)Multiplication signMereologySoftware testingCommunications protocolBuffer overflowCircleAndroid (robot)Point (geometry)Crash (computing)Projective planeLecture/Conference
Point cloudFacebookOpen source
Transcript: English(auto-generated)
Yeah, I'll be talking about an ongoing research project, so I have to disappoint you that you're not going to see Android apps on Geno just now, but that's ongoing work. Firstly, the question, why are we doing that anyway?
So the one or the other of you might remember the infamous stage fright bug, which hit Android in 2015. That was an issue where you could exploit the media subsystem of Android by sending malicious videos or other media data. And that would then trigger an overflow in the application.
And as the media service runs with high privileges, you could eventually take over the Android device. And that affected millions of Android devices. And the problem is that the issue has not really been solved since. And so after 2015, and the analysis here goes only till end of 2017, but after that
a lot of similar issues popped out all the time. Here's a spike, of course, because many people looked into that. But in fact, in all your phones that you have in your pocket, this issue is not really solved. There are still issues in these vastly complex media libraries that an attacker could exploit.
Media frameworks are not going to be simple anytime. So we need to do something else instead of trying to implement them correctly. It may not work. So how can we avoid such stage fright type of errors?
This is something that the one or the other of you has seen already in the previous presentation of my colleagues. So we employ component-based architectures. So you have complex pieces of software like the network stack here or the browser. And then you could put a trusted component in between to control the interaction between
these two complex pieces of software. And naturally, this trusted component here in green, this needs to do the right thing. If this component also has errors that could be exploited by an attacker, then you gain nothing from establishing such an architecture.
So that component could, for example, sanitize all the data that's sent between the components or enforce additional policies like only encrypted messages should go from one component to another. We can see two problems now here, the information flow and the correctness of the component.
It makes no sense if the web browser still can communicate with the network stack directly, then we are not going to enforce our policy. So we somehow must make sure that the information flow between the components are as we want it. For this, we rely on the G node OS framework.
I'm not going to talk too much about it because if you just stay here after my talk, there are going to be two talks on G node, one by Norman Feske, who's one of the founders of G node Labs, and I'm pretty sure that he's going to tell you a lot of details on G node.
And after that, G node Labs is going to talk about their port to the AMV8 architecture, which is probably also very interesting. Just some words, G node has a hierarchical structure when it comes to the resource management in the system, which avoids a lot of complexity of the monolithic architectures. On the root of this tree, there is a microkernel, which is only there to control resources
and establish the communication between components, and then such a tree is created and a node in such a tree can only create a trial process, a child node, out of its own resources.
So you have a natural abstraction for where do the resources, if I do anything in the system, come from. And that system isolation is the default. So when a component is created, initially it knows nothing except of a handle to its
parent, and if it needs some resource like a network interface, it can just go to its parent and ask politely, do you have a network interface for me, and then the parent is the only one to decide what to do next, maybe reject the request, forward to another child, which happens to implement the network driver or even implement it itself.
So that's up to the parent, and it's completely transparent for a component on G node where exactly that resource comes from. And you can imagine that this fits very well with the architecture that we have in mind where we can put our trusted components in the middle, and the application doesn't even notice that someone is filtering all the content that goes between two components.
Now we have the other question. Cool. We now have the information flow between untrusted and trusted components. How do we ensure that our trusted proxy actually does the right thing? We do not want to have a buffer overflow or anything, or an integer overflow or something
in that component. We use the Spark programming language for that. We write trusted programs on G node and Spark. Very briefly, Spark is based on the Ada language. It's a compiled language, which you can compile with GCC and LLVM.
And the very nice thing is it has a customizable runtime, and we also created a very limited runtime for G node, which is suitable for these trusted components. And a fourth property of Spark is that it has contracts.
It supports context-based programming, and does have preconditions and postconditions and invariants. So in a function, you can clearly state what it expects before it can run, what conditions you have to establish before you are allowed to execute that function.
And it also has postconditions where you can say, this is what the function guarantees me after it has run. And invariants just can establish certain properties within your code, if you want. Spark comes with a verification tool set which statically analyzes the code. And one thing that you can prove is absence of runtime errors, the thing that I just mentioned.
You prove that no variable ever goes out of range, the bounds of a buffer are honored, and no overflows can happen, no exceptions are thrown. But you can also go further and prove functional correctness.
So you could, for example, add a contract that states the behavior of your function, like the integer is incremented by one, and then prove that this actually happens on all paths in your program. Spark is used for a lot of applications, a lot of safety, like avionics and automotive, but also for security.
How do we apply this approach to Android applications? So this is the project that I've been mentioning, the GAR project. We want to have unmodified Android apps on GNote. But also we want to have formally verified policy objects that can control the interaction
between these untrusted Android apps. So Android runs on a virtual machine which is very, very complex. We do not want to trust Android applications. But we will still use the vast number of Android apps that exist in the app stores.
An architecture could look like this, or it does look like this. So we have the microkernel and the GNote OS components, which I won't go into detail in this talk. We somehow need to integrate that into the GNote build system, so we can actually build the Android runtime and our components.
And if we have this runtime where the actual app runs, they somehow need to interact. There must be an IPC mechanism to make them interact. And of course, we somehow need a way to plug in our trusted proxies that I've been mentioning. And this scenario could, for example, be we have two applications that want to access
the contacts on your phone. This is the contacts application that actually stores the data. And what you would want to have is a trusted proxy that implements a special policy, something like an application can only read out the contacts it edits itself and where a special tag is present. Or we could even try to encrypt the contact data when an application stores it with an
application-specific key. So that's the scenarios that we have in mind. And we will now go through the points that we have solved and that still need to be solved in the future. First, build system integration. This is standby now. How does the Android build system look like? It's called, so the build system is called Song that's been introduced in Android 8.
And this is pretty nice. It has a JSON-like structure in so-called blueprint files, which are purely declarative. That was a really nice property that we made use of here. There are no conditionals, no control folds, as you may know from Makefiles.
It's only declarative. What they do is they handle the complex cases in a special Go component that reads these files. They are then translated to a low-level build system, which is just there for quickly building the sources.
What we have done, we also created a translation tool which can make makefile fragments out of these blueprint files. And they are now suited to be integrated into the G node build system, which happens to be make-based.
Now, in case you know how a G node application is created, it's exactly the same for native Android applications, which you can use from the Android source repositories directly. For a library, you just create a normal G node library file. For an application, you just create a normal G node application makefile.
And then you just have to include a special fragment to build Android applications. And that's pretty much it. Porting native Android applications, which the runtime is one of, is pretty easy now
on G node. And if you build a run script, which is a tool to actually execute applications in test settings, for example, then you can easily, for example, run the test cases that come with Android. And this would be one for a library, which is called the Butils.
And this is the same that would run in an Android test suite, but executed on G node. So the core thing that you need to run Android applications on anything is the Android runtime. This is a Java virtual machine, which executes compiled Java code.
This is not for you to actually recognize anything. But this is the dependency graph of the Android runtime. I believe this is the runtime library. This is the actual executable. So as you can see, there are quite a few dependencies that had to be ported to G node
to make this work. The main binary that's executed is Dalvik VM. That's the front end that's run on Android. It's small. It's just a stupid front end, which loads then the actual Android runtime called libart.
It's loaded as a dynamic library. And this is the one that has all the dependencies. So what's the current state with the Android runtime on G node? So Dalvik VM and libart have been ported to G node. At the moment, we have around about 1,000 test cases that already came with Android
successfully executing on G node. So this is mostly the dependencies that you have seen. Most of the around about 500 Android runtime test cases also successfully execute on G node. And at the moment, the runtime fully initializes.
And it comes to the point where it solves the actual Java program on the ARMv8 target of G node. And then it crashes. This is probably to some issues that we still have. One thing that we know is still a problem is that Android rise on the Linux-Futex implementation.
And we need to find a way to refactor the code that it uses a different locking mechanism. And there are probably some more issues that come through from differences between Genome Flip C and Linux.
Genome Flip C is FreeBSD-based. And so you naturally have some problems there. The runtime compiler. So Android compiles applications at runtime to optimize the code. You can also do that at compile time. The compiler is also basically ported but still needs to be bug fixed.
So the trusted proxies are the second thing that we need to take care of. Once we can run Android applications, we still need to see how we can secure them. One thing is our downsized Android runtime, which I've been mentioning.
This runs on Linux and also on other platforms. It has a stripped-down functionality, like there are no allocators and no exception handlers. The other thing that we are using is an answer to the question, how do you actually build trusted components on such a component-based platform in Spark?
We have built the nice component library, which is a framework for fully asynchronous event-driven components. It supports Genome, but also the Muen separation kernel and Linux for testing.
And we made sure to only include features that we can formally verify using the Spark toolset. And we left out everything else. So you can be sure if you write trusted component in that nice framework that you have a chance to verify it. And there are a couple of interfaces already implemented for logging, timer, block devices,
methods passing, and also shared memory we are currently working on. In case you didn't see the last talk by Johannes Klima, that's probably something that you should check on the recordings. And another ingredient is the record flux framework, which we are also building.
And this is the thing that aims to prevent the overflows when sending data between components. So it's a framework to formally specify messages that are sent between applications.
And what we can do is from this formal specification, we can generate Spark code that's then also verifiable more or less out of the box. And what we can also do is that we can verify the model itself. So if you had a contradiction in this message so that the parser
is not well defined and can have contradictory states, then this is something that we already proved on the model. There's also an extended talk on record flux that you could check the recordings for by my colleague Tobias Heyer.
So now what's up next? There's still a lot to do. And one big topic that we have on the table right now is the interaction between Android applications. So that's pretty much that box. We just were here. How can I run the applications? We covered this.
And now how can we bring that together? How can the components actually interact? Android IPC is pretty special. It's a pretty diamond beast in terms of the functionality that's provided by an IPC mechanism. The IPC on Android is called Binder. And on Linux, on the Android kernel,
it works like an application opens in a device node, def binder. Sometimes it's called difference for other communication domains. And through IO controls, it has to place a rather complicated data structure, which you can see here, which is a pointer data structure
as a reference in that IO control. And what it actually does then is a message passing. But it's unfortunately much more than simple message passing. And there are a notion of local and remote objects in Binder, which have a reference counting inside the kernel Binder mechanism.
There are special objects that you can place in your communication buffer. For example, file descriptors or pointers, which then are transparently rewritten to have the right pointer in the target address space. So you could, with Binder, send a pointer list to a communication partner.
And the kernel will transparently rewrite the pointers. And then the receiver can just use the pointer list. This is pretty much a nightmare to support on a different platform than exactly Android. And that's something that we thought about. Lastly, there's a name service application, which is special. So if you send to the magic component zero,
then you can query for names existing in the name and then get a handle to an application that you want to communicate with. So all in all, this is pretty much a mess. And we started designing an IPC mechanism
that would map these concepts of the Android Binder IPC to G node. And the idea right now is that we built a verified broker components or user level application. And all Android applications, or the runtimes actually, are client of that broker applications. And they share a memory region where they can put the payload together
with these pointer data structure and the metadata. And then they would use message passing to this broker component to initiate the transfer between different shared memory regions. As you can imagine, this is beneficial to have it in user space.
So we don't have the complexity that the Linux kernel has for implementing this binary communication. And it can also be used to easily introduce a component between the broker and the client applications. There's at least one problem here.
So with that concept, you at least need three copies to send the message between one Android application and the other. So one for copying the data into that shared memory region from wherever it's located in the Android application. Then the broker has to copy the data again to the target address space.
And lastly, it has to be copied to the place where the receiving application expects it. And that needs to be seen what the performance impact is of that. So we see that the rehosting the Android runtime
to the G node OS is feasible, apart from from bug fixing that still needs to be done. It's gonna run on G node very soon. The Android build system was pretty helpful there because it's purely declarative. So if you build a build system, make a declarative, don't add any code there.
The extensive test suite that Google already provides with the AOSP was extremely helpful to track down issues that we have without debugging a whole application all the time. And we also have created an environment for trustworthy component that we later can plug between complex Android applications.
So as I said, the next steps would be the user level IPC which we just started working on. And the next step once basic Android applications can run on G node, we need to provide them with the services they expect on an Android system. So somehow we need probably port some of the existing Android of services
but also provide emulation stubs that then just bind to a G node service that already exists. Integration of the UI subsystem is definitely a topic to actually see the applications. And then we wanna implement the trusted filters. And I imagine, for example, the calendar filter that I mentioned earlier.
So thank you. Other questions? So the question was, I said things get verified.
What do we actually verify? So for the baseline is absence of runtime error. So no exceptions of phone buffer overflows and so on. For the protocol filters I was mentioning,
we also prove that the high-level model of the messages is actually correctly paused. Apart from that, there are no trusted filters for that Android setup at the moment. So there's nothing to prove right now.
Dominik? So the question was, there's the Fuxia project by Google and they have the circle microkernel. How does our things relate to that project?
Yeah, Fuxia is component-based. It has a microkernel. The point here is I just don't know what's gonna happen with Fuxia. Is it ever gonna replace Android? Is it going to vanish tomorrow?
I don't know. So there are similar concepts, but I have no idea where Google is taking that to. We need to see. It's definitely gonna be interesting. Yes?
So the last part was? OK. The question was, to run Android applications on Geno, do we need to rebuild them or can we just use it unmodified? And the goal definitely is to use it unmodified and all the test applications we already
run in the JVM until it crashes are the same that you would run on your Linux JVM or on your Android JVM. So except from native code, which we need to see how to handle it, if it's an application without native code, it will run without modification.
And time's up. Thank you very much.