Peer to peer OS and flatpak updates


Formal Metadata

Title
Peer to peer OS and flatpak updates
Number of Parts
50
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Recently, work that we have been doing on Endless OS to allow peer-to-peer OS and Flatpak updates has been reaching maturity and nearing wider deployment. This talk will give an overview of how we support LAN and USB updates of OSTrees, how it fits upstream in OSTree and Flatpak, and what you'd need to do to enable peer-to-peer updates for your OSTree system.
Transcript: English (auto-generated)
Hello, everyone. So I'm going to talk about peer-to-peer OS and Flatpak updates, which is something that we've been working on at Endless over the last year and a bit. So I'll define some terminology first,
because these terms are a bit vague sometimes. Flatpak, you've probably all heard of. It's a Linux application sandboxing and distribution framework. For the purposes of this talk, we just need to know that it's basically something that uses OSTree, which provides things that users care about. We don't need to go into any more details about how
Flatpak works. The OS in this case is Endless OS, which is a Debian-based operating system. Again, you don't need to know any more details about it, apart from the fact that it's based on OSTree. OSTree is, very briefly, Git for operating system file trees. But I'll go into more details about what
that means on the next slide. And peer-to-peer updates, we care about updating from other computers on the local network and from a USB stick that's been pre-prepared by someone else with updates for you. So the overall goal is to be able to distribute system
updates and to distribute new versions and new and different applications to other people without them having to download them over the internet. So OSTree, which is kind of the core of how this is all done, the core of how Endless OS is implemented and the core of how Flatpak is implemented,
is kind of like Git. It's a content-addressed file system where you dump files in. They are hashed by their content and stored at a path based on that. So each of them has a checksum. Each object can be a file, it can
be a directory tree that contains a hierarchy of files, or it can be a commit object which contains some combination of files and directory trees. It's got refs, which are basically human-readable names which point to a commit, just like a Git branch does. And they can change over time, so you can point them
from an old commit to a new commit to a newer commit. It's got remotes, just like Git remotes. They are a little bit of configuration saying: here's some repository on the internet somewhere which you can download updated refs from. Each remote has a name which you choose locally. By convention, it's always the same,
but it doesn't have to be. And as I said before, Flatpak is based on OSTree. So in the Flatpak world, you've got apps. Each app is a ref in OSTree terminology. And when you deploy an app, that is a commit from OSTree with a load of files that contain whatever the app needs: its binary, its icon, whatever.
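To make the content-addressed storage idea concrete, here is a minimal Python sketch. This is not OSTree's actual on-disk format, just the general technique: each file is stored at a path derived from the hash of its content, so identical content is stored exactly once.

```python
import hashlib
import os
import tempfile

def store_object(repo: str, content: bytes) -> str:
    """Store content at a path derived from its SHA-256 checksum.

    Like Git and OSTree, shard objects into subdirectories named after the
    first two hex digits of the checksum.
    """
    checksum = hashlib.sha256(content).hexdigest()
    obj_dir = os.path.join(repo, "objects", checksum[:2])
    os.makedirs(obj_dir, exist_ok=True)
    with open(os.path.join(obj_dir, checksum[2:]), "wb") as f:
        f.write(content)
    return checksum

repo = tempfile.mkdtemp()
a = store_object(repo, b"hello")
b = store_object(repo, b"hello")  # identical content maps to the same object
print(a == b)  # True: the store de-duplicates by content
```

The same property is what lets OSTree share identical files between commits and between the OS and apps.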
So how do we add peer-to-peer support? What we have to begin with in OSTree are these refs. These refs are unique per repository. So just like a Git branch, you've got a master branch for each Git repository you care about. But each repository has a master branch.
So if you want to update your master branch, you need to know which repository you're talking about. So if you have a ref in OSTree and you want to update it, you need to know what ref that actually is. So if I have an app that I've produced locally called GEdit,
and there's also a GEdit produced by the GEdit developers and published separately, they're going to have probably the same ref name, but they probably refer to different content. So what you need is a global namespace, which disambiguates those and says that the GEdit over here that I've produced locally is mine,
and the GEdit over there is theirs. And those should be considered separate things. So we need a global namespace for refs. How do we do that? We added something called collection IDs. So these are basically a globally unique version of the remote name. So you can uniquely identify each repository and clones
of it mirrored around the world with a collection ID. So if GEdit were to be published by the GEdit developers in their repository, they would set a collection ID for that repository. If that were to be mirrored by GNOME, they would copy the same collection ID, so that all the refs for either are considered to be equal.
And then if I were to publish my own GEdit, I would choose a different collection ID because I'm publishing something which is not necessarily the same as what they're doing. So if you take a collection ID and a ref and consider those as a tuple, that becomes globally unique. And you can use those to look up and query for updates
for the refs you want wherever. Vivek, you've got a question. I can repeat it. Is there a convention or some enforcement for making people not collide with the names of their collection IDs? There's no enforcement, although if someone
were to choose a collection ID that had already been chosen and it started to collide with things, then errors would appear everywhere. There is a convention to use reverse domain names, same as most other things. But this is documented. Any other questions so far?
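The (collection ID, ref) tuple described above can be sketched in Python as a dictionary key; the collection IDs and commit checksums below are made up for illustration, and the ref follows the app/ID/arch/branch shape that Flatpak refs use.

```python
# A bare ref name is only unique within one repository. Pairing it with a
# collection ID (by convention a reverse-DNS name) gives a globally unique
# key. All IDs and checksums here are hypothetical.
ref = "app/org.gnome.gedit/x86_64/stable"

commits = {
    ("org.gnome.Apps", ref): "4e7d9c0f",    # GEdit as its developers publish it
    ("com.example.Mine", ref): "9a1c33b2",  # my locally built GEdit
}

# Same ref name, but the tuples are distinct keys pointing at different commits:
print(commits[("org.gnome.Apps", ref)] != commits[("com.example.Mine", ref)])  # True
```

Looking up updates by the tuple rather than by bare ref name is what lets a machine safely pull "the same" app from any mirror or peer that advertises that collection ID.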
Cool. So to summarize, collection IDs are like a name for a remote but configured globally rather than locally. Does this mean that you could prepare a malicious version of one app, put it on a USB stick, walk over to somebody else's Endless OS computer, and then essentially update the app with the malicious version?
You would have to have the GPG key from the original. So there is a measure of enforcement. Yeah, I'll come on to that in a second. But you're very perceptive. So yes, summary files are the other half of the problem. Each OSTree repository has a summary file,
and that contains, amongst other things, a map of refs to commit checksums. So it just gives a complete list of what the repository contains and the commits for each of those refs. And it contains some other metadata, like the repository description and, yeah, its name,
localized name. And the summary file is signed by the same key for the entire repository. So you know the summary is authentic for that repository so that you can't have a man in the middle interject a summary that contains incorrect data or malicious data as you download over HTTP,
which you can do for OSTree. So it's traditionally signed as one big blob of stuff. It lists the refs. If I am on a local area network and I've got some refs from this repository over here and some refs from this repository over here and maybe some from my operating system vendor
over here, and I've installed all of those and I want to expose my local OSTree repository onto the network, I've basically got things from three different summary files and I need to combine those into one. I can't do that if I've got one signature for the entire file because A, I can't reproduce any of the signatures from any of the upstream vendors
and B, I can't sign with three keys somehow for different bits of content. So the solution there is to, yeah, that's the problem. The solution is to drop the signatures. That reduces security, you say.
So we reintroduce security by implementing it a different way, splitting the ref mapping up and flipping around how the security is done. So instead of having a summary file which binds a ref, which is the name, to a commit checksum, you have a ref binding which is in the commit,
which binds the ref name to that commit. So it's kind of backwards rather than forwards. And you can do this because each commit in OSTree has some metadata saying its date and the author and maybe a commit message, just like Git commits do.
And this metadata is always signed by the person who built the repository. So if you put some extra metadata in there that says this commit should be on this ref, and maybe also this ref and this ref, and then you sign the whole lot, you can always check whether a commit that you've downloaded and looked at
is actually meant to be on the ref that you thought you downloaded it from. Which means that when you get rid of the signature on the summary file, although the summary file isn't trusted, you can then verify everything that you pull back from it. So now the groundwork is in place. Those were the two big problems standing
in the way of doing peer-to-peer updates, and we can now do them. So with USB updates, we essentially take an OSTree repository and put it in a well-known location on a USB stick. And with the LAN updates, we essentially just expose an OSTree repository on the local network with a web server.
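The ref-binding check described above can be sketched as follows. This is a simplified illustration, assuming the signature over the commit metadata has already been verified; the field names are made up here, not OSTree's actual metadata keys.

```python
def verify_ref_binding(commit_metadata: dict,
                       expected_collection: str,
                       expected_ref: str) -> bool:
    """Check that a commit claims to belong to the (collection, ref) we
    asked for. Assumes the signature over commit_metadata was already
    verified, so these fields are trusted even if the summary file that
    led us here was not."""
    if commit_metadata.get("collection") != expected_collection:
        return False
    return expected_ref in commit_metadata.get("ref-bindings", [])

# Hypothetical signed commit metadata:
meta = {"collection": "org.example.Os", "ref-bindings": ["os/amd64/stable"]}

print(verify_ref_binding(meta, "org.example.Os", "os/amd64/stable"))  # True
print(verify_ref_binding(meta, "org.example.Os", "os/amd64/beta"))    # False
```

A commit served by a peer for the wrong ref, or under the wrong collection ID, fails the check even though the peer's summary file carries no signature of its own.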
With the LAN stuff, how do you actually find the updates on your local network without going to every machine and saying: what refs do you have? Can I have all of them? Are they up to date? Because that would result in a lot of unnecessary traffic. So with a LAN of 30 machines, in a small business or a school or whatever,
each of them has 100 refs: a couple for your operating system, various apps that you've installed. Many of them will be at different versions. How do you actually find which refs you want, which are up to date, and which machine has the latest update that everyone else can pull from?
The solution is to take a Bloom filter of the refs on each OSTree repository and put it in a DNS-SD record with Avahi. And then when you're updating from that peer, you will check whether the ref you want is in that Bloom filter. If it isn't, you don't care about it anymore. If it potentially is,
because Bloom filters can return false positives, you download the summary file from that peer and you check to see if it does actually contain the ref you want or whether it was a false match. And if it does contain the ref you want, you then download that commit, check the GPG signatures are all correct and match the ref,
and then download the rest of the commits and update from them. The code for all of this has been done completely upstream in libostree and in Flatpak. And there are some components in our updater for Endless OS, the OS updater, which is also free software.
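The Bloom-filter advertisement and lookup flow described above can be sketched like this. The filter size, hash count, and hashing scheme here are illustrative only, not what OSTree actually encodes in its DNS-SD records.

```python
import hashlib

M, K = 256, 3  # filter size in bits, and hash functions per element (illustrative)

def positions(item: str):
    """Derive K bit positions for an item by salting a SHA-256 hash."""
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(digest[:4], "big") % M

def add(bits: int, item: str) -> int:
    """Set the item's bits in the filter (stored as a Python int bitmask)."""
    for p in positions(item):
        bits |= 1 << p
    return bits

def might_contain(bits: int, item: str) -> bool:
    """'No' answers are definitive; 'yes' answers may be false positives."""
    return all(bits & (1 << p) for p in positions(item))

# A peer advertises a small filter of its refs in a DNS-SD record:
advertised = 0
for ref in ["os/amd64/stable", "app/org.gnome.gedit/x86_64/stable"]:
    advertised = add(advertised, ref)

# An updating machine checks the filter before fetching the summary file:
print(might_contain(advertised, "os/amd64/stable"))  # True, so fetch the summary to confirm
```

Keeping the filter tiny is what lets it fit in a DNS-SD record, at the cost of occasionally downloading a summary file for a false match.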
And it is already supported in various upstream repositories, where collection IDs have been added to their configuration. Flathub, for example, has a collection ID set, so you can already use Flathub apps with peer-to-peer updates if the tooling you are using supports it.
I mean, that is still being shipped out to distributions and probably hasn't been enabled in many places yet, but the pieces are in place. The components that we have in the Endless OS updater, they are the bits that if you wanted to implement this for yourself, your own distribution or platform,
they are the bits you would have to replicate or adapt. They are not shipped by default by Flatpak. So we have got a web server and a DNS-SD record generator for the LAN sharing, which basically takes your local OSTree repository, exposes it over the network,
and also updates an Avahi list of DNS-SD records and generates the Bloom filter from the refs that you have, and various bits of plumbing to integrate that with systemd and do socket activation. This has been worked on by quite a few people at Endless.
We have got Matthew, Rob, Dan, Kreshmir, and me, and then lots of help and review and feedback from Colin and Alex of OSTree and Flatpak, and also a lot of reviews and merge testing done by the RH Atomic Bot.
That is it. We are hiring, so if anyone wants to talk to me about that, please do. We are looking for a desktop engineer and tooling engineers, but the code is all there. Has anybody got any questions? I feel that was like a lightning approach to it, so I can expand in detail on anything.
How does this DNS-SD record approach compare to distributed hash tables? To what, sorry? Distributed hash tables.
I think distributed hash tables took up a lot more space, and they were not as easy to implement, but yeah, I cannot remember the details now. We wanted the Bloom filters to take up little space.
There were various restrictions on what we knew could be supported by different routers as they forward DNS-SD packets. Some of the larger DNS-SD packets just get dropped, so we wanted the Bloom filters to be really small. Distributed hash tables, I think, come out a bit bigger,
but yeah, I do not know. It has been a while since I actually looked at that. Bloom filters certainly do what we want. They allow you to have more than enough refs in your local repository that you are advertising, sort of several thousand, if I remember correctly, before the probability of collisions
becomes too high to make it worthwhile. They do mean that you can massively cut down the number of requests you have to make, like unicast from you to the computer you think has a ref, by using the Bloom filters to cull the ones that definitely don't have the refs you want.
Does that answer it in a bit more detail? Yeah, thanks. Alex, back there. I was promised a color emoji in the slides. You were promised an emoji. You got two emojis.
Well, it was black. It is a color-ish. A color emoji would have detracted from the color scheme of the slides, I think, so sorry. What about WAN updates?
Oh, what, sorry? Why just restrict it to LAN updates and not do updates over the entire internet? I guess by using DNS-SD, you sort of restrict yourself technically, but is there a reason for not thinking about WAN updates? The original use case we had for supporting LAN updates
is because Endless OS is something that we want to run on computers which have restricted internet access, and particularly the use case we were caring about was schools, where the teacher's machine will have an internet connection, and then there will be loads of student machines that don't, and they can only connect to the teacher's machine.
I can't immediately see there being an advantage in doing updates over a wider network, because the cost of communication between all the peers would start to get very complicated, and also we didn't want to go into writing an entire distributed hash table file system.
That's not my thing. This does the job for the use case that we cared about. There's nothing stopping people from writing one in future, so the underlying APIs in OSTree
that allow my computer to say, I want this ref, and this ref, and this ref, find updates for them, they will in parallel look on the internet, on the local network, and on any USB sticks that are plugged in. So you can always write an extra provider which would, I don't know, look on BitTorrent or something,
or some other custom protocol. As long as it implements a couple of virtual functions, basically, it can do all of these things in parallel, and it will take whichever updates appear first and look likely, for some moderately well-defined metric
of what likely means. So, yeah, it's a possibility in future if you want. Any other questions? So, thank you, Philip, and give him a round of applause.
Thanks.