
dbus-broker


Formal Metadata

Title: dbus-broker
Subtitle: A Linux D-Bus Message Broker
Alternative Title: Status update
Number of Parts: 50
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract: dbus-broker is an implementation of the D-Bus specification, intended to be a drop-in replacement for the reference implementation on Linux. It is now scheduled to be the default system and user bus in the next Fedora release. This talk shows some of the lessons learned during this relatively young project, the challenges faced, what the current status is, and the plans for the future.
Transcript: English (auto-generated)
Hi, this is Tom, I am David, we are both working on a project called dbus-broker. Over the many years we have always been told D-Bus is fine, and today I want to tell you: well, D-Bus is fine. It works. Many of us use it, many of you probably use it, we use it at Red Hat for a lot of things, and a lot of our software has relied on it for many years now, for over a decade. So why do we keep talking about it, and why do we want to talk to you today about D-Bus?
To explain that I want to start with a short demo. Imagine you use the tool dconf. dconf is a very simple tool to set and query global configuration variables.
In this example we use our terminal to write the value 2017 into a variable, the All Systems Go year. And then we can use dconf to query that as well, and it will return us the value. Now fast forward one year. We of course want to update that variable to 2018, but this time we want to use the underlying bus call. For writes, dconf underneath uses a bus call to the dconf service. And here we have an example where we use a command line tool to do the same thing. We call a method called Change on dconf. Most of the upper part can be ignored, and as you see in the last line we set the variable, the ASG year, to 2018. And to verify that, we try to read it with the other tool. And we notice it wasn't updated. We don't know why; if you look into it, everything works. So we just try again. We do the same command again with a busctl call, and we try to read it again.
And suddenly the value appears. If you try to reproduce this on the command line, it's very easy to reproduce. So what happened here? For some reason the message did not reach the dconf service. This is of course unexpected, and of course this is not what dbus-daemon does all the time. There's some special case that made dbus-daemon drop that message. This is a very crafted example, but it's still unexpected. The keen reader might notice that we have one line that says expect-reply=false. That's a D-Bus feature that says: I sent a method call, but I'm not interested in the reply. Usually it just means the other side doesn't have to send a reply, so you save a few cycles and don't have to assemble a reply. But in this demo we were able to craft a situation where, for no particular reason, dbus-daemon decided to just spuriously drop the message. This is definitely against intuition, and it's an edge case that is hard to predict and hard to deal with, because there's really no good reason to drop this message. Edge cases like these have existed in dbus-daemon for a long time. People have dealt with them in the past. And sure, some of them are edge cases for some people, but other people might end up debugging some weird situation for two hours just because they hit that edge case. Right.
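The flag mentioned above is part of the D-Bus wire format: NO_REPLY_EXPECTED (value 0x1 among the message header flags) only tells the receiver that no method reply is wanted. A minimal sketch of that distinction, in plain Python rather than dbus-broker code:

```python
# Sketch of the D-Bus NO_REPLY_EXPECTED header flag (0x1 in the wire
# format). The flag only means "don't bother sending a method reply";
# it says nothing about whether the broker may drop the call itself.
# Plain Python illustration, not dbus-broker code.

NO_REPLY_EXPECTED = 0x1  # message header flag from the D-Bus specification

def must_send_reply(flags):
    # the receiver owes a reply unless the caller opted out
    return not (flags & NO_REPLY_EXPECTED)

assert must_send_reply(0x0)                    # normal call: reply expected
assert not must_send_reply(NO_REPLY_EXPECTED)  # caller opted out of the reply
print("ok")
```

The point of the demo is exactly that opting out of the reply should only suppress the reply, not silently void delivery of the call.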
So the thing is that this is not a problem inherent to the D-Bus specification. This is just some edge case that we found in dbus-daemon, the reference implementation of it. And for this reason and many others like it, we decided to investigate whether we could write a replacement for the reference implementation that doesn't have the same problems.
So we set out to find the right design principles that we could use in order to still implement the same specification, still be a drop-in replacement, but get around some of the issues we discovered here. So firstly, one thing we want to achieve is deterministic behavior.
What is really happening here, in this example, is a race condition. If you are in the very special situation that you have an activatable service that is not yet activated, and the client that has sent a message to it disconnects from the daemon before the target service has been started, then that message is dropped. So there's a timing issue going on there. If you had waited a bit longer before the command line tool finished, it would have worked. This is not something that is documented in the specification, obviously, and it's something that would take quite a lot of effort to figure out: why on earth was this message randomly dropped? This kind of behavior, obviously, we don't want: you make an action on the daemon, and then afterwards you do something else, unrelated, and it affects the result of what you did.
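The race can be sketched as a toy model (all names are invented; this is not dbus-broker code): a broker that forgets activation-queued messages when the sender disconnects reproduces the dropped write, while one that keeps them until the service starts behaves deterministically.

```python
# Toy model of the activation race described above. A "naive" broker
# discards messages queued for a not-yet-activated service once the
# sender disconnects; a deterministic broker keeps them until the
# service comes up. Invented for illustration, not dbus-broker code.

class Broker:
    def __init__(self, drop_on_disconnect):
        self.drop_on_disconnect = drop_on_disconnect
        self.activation_queue = []   # messages waiting for service start
        self.delivered = []

    def send(self, sender, payload):
        # target service is activatable but not yet running: queue it
        self.activation_queue.append((sender, payload))

    def disconnect(self, sender):
        if self.drop_on_disconnect:
            # naive behavior: silently forget the sender's queued messages
            self.activation_queue = [
                (s, p) for (s, p) in self.activation_queue if s != sender
            ]

    def service_started(self):
        self.delivered.extend(p for (_, p) in self.activation_queue)
        self.activation_queue.clear()

for drop, expected in ((True, []), (False, ["year=2018"])):
    broker = Broker(drop_on_disconnect=drop)
    broker.send("busctl", "year=2018")
    broker.disconnect("busctl")   # the tool exits before dconf is activated
    broker.service_started()
    assert broker.delivered == expected
print("ok")
```

Whether the write arrives should not depend on how quickly the sending tool exits; that is the determinism being asked for.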
Moreover, we really don't want to drop messages silently. By that I mean that if one peer sends a message to a different peer, and it doesn't, for whatever reason, go through, then either the target or the sender should be notified about the problem.
I'm not talking about writing some message to a log, but one of the two parties should know that something was dropped. Like, we're not on a network, we're not sending UDP packets to the network, where you know that, of course, packets can be dropped. We are on a local machine, there's really no excuse that packets just go missing and nobody's told about it.
Moreover, we have limited resources on our machines. So it could, of course, happen that we just don't have the resources to perform some action, like sending a message. Sometimes, messages must be dropped.
So the thing that we want to achieve, as I said earlier, is just not to do it silently. Say you end up sending a method call to somebody, and there is just not enough memory available to do that. You have to figure out what to do. If you send a method call, the daemon can simply reply to the sender that you're out of memory or out of your quota, and it can't be done. Straightforward. But how about if you receive a method call, you want to send a reply to it, and you don't have the resources to send that reply? Now you have to think: well, it's sort of different. What dbus-daemon does in that case is tell you that your reply failed to be transmitted. But that's not really what you want, because firstly, nobody is listening, waiting for the daemon to tell them about replies to replies. And secondly, it wasn't really your fault: somebody else triggered you to send the reply, and they were not able to receive it.
So what we are trying to do is: when an action happens on the broker, we figure out which user or which peer is responsible for the action, and we charge the responsible user, not just the one that performed the action. So if a reply is being sent, it's the destination that's responsible. If a method call is being sent, it's the sender that's responsible. Lastly, dbus-daemon or dbus-broker keeps state about all the peers connected to the bus. This can be matches they have installed because they are interested in broadcasts, outstanding message replies, and so on and so forth. What we want to make sure is that if two peers are talking to each other, the state of independent peers that have nothing to do with it should not affect things like the performance and so on.
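A toy sketch of this accounting rule, with invented peer names and quota numbers (real dbus-broker accounting is more elaborate, e.g. charges are released again when messages are dequeued): method calls are charged to the caller, replies to the peer that asked for them.

```python
# Toy model of responsibility-based accounting as described above:
# charge the peer that is *responsible* for a message, not necessarily
# its sender. Calls are charged to the caller; replies are charged to
# the destination (it asked for the reply). Names and the quota value
# are invented for illustration; not dbus-broker code.

QUOTA = 2  # max in-flight messages charged to one peer

def responsible(msg):
    return msg["dst"] if msg["kind"] == "reply" else msg["src"]

charges = {}

def try_queue(msg):
    peer = responsible(msg)
    if charges.get(peer, 0) >= QUOTA:
        return False              # over quota: refuse, never drop silently
    charges[peer] = charges.get(peer, 0) + 1
    return True

# "noise" floods systemd with pings and never reads out the replies
for _ in range(5):
    try_queue({"kind": "call",  "src": "noise",   "dst": "systemd"})
    try_queue({"kind": "reply", "src": "systemd", "dst": "noise"})

# noise is throttled for its own unread replies...
assert charges["noise"] == QUOTA
# ...but systemd can still answer an unrelated peer
assert try_queue({"kind": "reply", "src": "systemd", "dst": "busctl"})
print("ok")
```

Under sender-based accounting, the same flood would have charged systemd for the replies and muted it; charging the destination keeps the blame on the misbehaving client.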
So let's look at an example. Again, we are using a simple command line tool to do a method call. This time we are calling the Ping method on systemd. This is a standard method that most D-Bus clients implement.
And we're just pinging systemd at random; it doesn't make a difference which daemon we are talking to. So you send a method call with busctl and it returns: you send a ping and you get a pong back, so it replies. It's just a simple echo test. Now, we made a little client that demonstrates the problem.
We call our own client, called noise, and we pass it the name we want to send noise to, basically. The point here is that this could be running as a different user, unprivileged; nothing interesting going on here, really. We are doing some action on the bus, targeted at systemd, and now we try the same thing again. We try again to call Ping, and now it doesn't return. So somehow this client, which is unprivileged, running as a separate user, and should not be able to affect the interaction between our user and systemd, is able to stop systemd from replying.
So we just have to cancel that call. And it's not just this one method call: with this test client we are able to basically mute systemd, or any client on the bus. We are able to stop them from sending any messages at all. So that's bad, right?
We shouldn't have an unprivileged client that sends messages targeted at some other client and just makes it stop sending anything at all. So, to go back to what's actually happening here: noise is doing the same thing as before. It's sending Ping to systemd, but it's not reading out the replies.
So we send Ping to systemd, and systemd answers it, but we are not reading the replies out. The buffers inside the daemon are growing, and because dbus-daemon doesn't distinguish between method calls and method replies in its accounting, the peer that ends up being blamed for the growing buffer is the wrong one. It concludes that systemd is sending too many messages, so it blocks systemd from sending any further messages. In dbus-broker, we do it the other way around. We track who's responsible, and we tell that client that it is no longer allowed to send any messages, but everything else still works just fine. So unrelated peers are not affected. The dbus-broker project is our alternative to dbus-daemon, where we try to follow the principles that Tom explained, and some more principles, which we discussed in detail in our DevConf talk this year.
So if you're interested in why we picked these principles and how we follow them, you're more than welcome to talk to us or to look at that talk. dbus-broker today is ready to be used, and there is actually a Fedora change request to adopt it, so dbus-broker will become the default in Fedora 30 as it is scheduled right now.
And you can already use it: on the webpage listed at the end there are instructions on how to use it, as simple as installing the package and running one systemctl enable command. And it should work as a drop-in replacement for dbus-daemon with no observable differences. It should.
If there are bugs, you're always more than welcome to report them to us. As the last part of this talk, we want to show some of the benchmarks we did after we finished the implementation of dbus-broker. These are the most basic benchmarks we can do:
we send a unicast message from one client to a different one and measure how long it takes. The second is a pipelined call, where we send many method calls without waiting for them to complete. And the third is a round trip, where we send a message and wait for the reply, so it's two messages that are sent.
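The difference between the pipelined and round-trip shapes can be illustrated with a plain socketpair instead of a real bus (illustrative only; this measures raw socket I/O, not D-Bus, and all names are invented for this sketch):

```python
# Illustrative only: compares the "pipelined" and "round-trip" call
# shapes described above over a plain socketpair, not a real D-Bus
# connection.

import socket
import time

N = 1000
cli, srv = socket.socketpair()

def pipelined():
    # send all calls first, then collect all replies
    for _ in range(N):
        cli.sendall(b"ping")
    for _ in range(N):
        srv.recv(4)
        srv.sendall(b"pong")
    for _ in range(N):
        cli.recv(4)

def round_trips():
    # one call, wait for its reply, then send the next
    for _ in range(N):
        cli.sendall(b"ping")
        srv.recv(4)
        srv.sendall(b"pong")
        cli.recv(4)

def bench(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

t_pipe = bench(pipelined)
t_round = bench(round_trips)
print(f"pipelined:  {t_pipe:.4f}s")
print(f"round-trip: {t_round:.4f}s")
```

The round-trip shape pays the full back-and-forth latency per call, which is why it is benchmarked separately from pipelined delivery.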
In all the benchmarks that we did, we could observe a two to three times speedup compared to dbus-daemon. Other than that, the basic benchmarks don't show any algorithmic change, just a constant-factor speedup. In all three of these cases we are about three times faster than dbus-daemon, which surprised us and made us a bit happy, I assume. The next example was just connecting to the system bus. We have two benchmarks again: in the first we just connect to dbus-daemon or dbus-broker, and the second one includes sending a first message, because a lot of what people do with D-Bus is create a new command line application or something that just sends one call, or a bunch of calls, and then disconnects again. And again, we see a three times speedup compared to dbus-daemon, which we achieved with dbus-broker. But beyond raw performance, as we argued before, we want to make sure that everything on the bus scales properly, so that things that do not affect your operation in a semantic sense also don't affect it in practice, when the messages are actually flowing.
So, two more benchmarks we have to show. We created a very simple measurement of a single message, but varied what kind of background messaging or background state the daemon has. In one example, we took a lot of matches installed for objects on the system bus, increased the number of matches, and then looked at how long it takes to send a single broadcast that doesn't match any of them. We tried to use a common case; we didn't want to construct a fake example that wouldn't happen. Instead, we imagined the case where many objects exist in the system, people of course match for these objects, but an event fires for only one of them. As it turned out, dbus-daemon scales linearly with the number of matches you install on these different object paths, even though one broadcast possibly affects only one of these matches. In dbus-broker we made sure it always scales: in this case it looks constant, but it actually scales logarithmically. So you can install matches on the system bus that are unrelated to a specific broadcast, and it won't affect the behavior of dbus-broker, but on dbus-daemon it will, in quite a lot of cases (except for interface matches), affect the behavior linearly. In another example, instead of installing many matches in the background, we made a lot of outstanding method calls: many clients running in the background that just send method calls and wait for the reply. Then we sent a single method call and measured how long that one method call takes. Again, we see a linear behavior for dbus-daemon and a constant behavior for dbus-broker, and in this case it is really constant. It means that regardless of how many other outstanding method calls there are in the background, if you send one method call, it completes in constant time on dbus-broker. With one caveat: we only measured the CPU time for that one method call; the background method calls are not running in parallel, they are just outstanding, so we set them up beforehand and then measured only the one call. All of these benchmarks show that we follow our principles: things run independently of each other, we don't have global state, and we don't drop messages spuriously. We believe this is crucial to avoid nasty errors that you would otherwise have to debug for a long time. That's why we wrote the dbus-broker project, why we believe it's ready to be used today, and why it will be used in Fedora 30. We have already heard from several people who used it in their test environments,
and we are more than happy to hear about more reports of people deploying it.

Hi, I just wanted to ask: when you say you rewrote the daemon, does that mean it's a daemon rewrite using libdbus, or did you rewrite the whole thing? No, I don't think there's any shared code between dbus-broker and dbus-daemon. libdbus is part of the dbus repository, and a lot of the details of the daemon implementation are actually also part of the libdbus implementation. So for instance, accounting and so on is all done in libdbus for dbus-daemon, and if you want to change that, you actually would have to change libdbus. Furthermore, libdbus and dbus-daemon use a private API which is not accessible to external projects. So no, there's currently no shared code
between the two projects.

There's some upstream work going on in dbus-daemon for implementing container support, which can be used for Flatpak portals and other container use cases. What are your plans for that in dbus-broker? Our plan is to implement everything that's in the spec; that's our policy. We make sure that whenever there's something in the spec we will adhere to it, even if we believe it works in a way that would break our guarantees. For instance, right now there are things the spec allows that would break some of our principles, but we still follow them, because they're in the spec. We regularly comment on any issues on the bug tracker that we are aware of and try to make sure that the people working on the upstream implementation know what our comment on them is. For the container one, I think we commented on all of these. We are basically fine with the concept of it. There are some details: for instance, Tom commented that it should run as a separate bus name and not on the same bus name, and the other major one was probably about object paths, but otherwise we are okay with it. I think we stated all that in the public comments. Yeah, I think so, but you haven't started implementing it yet. No, no. So far the comment from Simon McVittie was that he's still not sure whether this is the final draft, so we did not hurry to implement it, but we are not opposed to implementing it.
I have a question. In dbus-daemon there is a problem with restarting: many clients are not able to survive the disconnection from the daemon. So I wonder if some form of transparent restart is tackled in dbus-broker? Currently we do not support that, and there is an open issue, opened some time ago in our issue tracker, where somebody explained the wish for this. In principle we think it would be nice for that to be possible, and we would be able to do it, but at the same time it would require a huge amount of work, so it's not high on our list of priorities. We wouldn't be opposed if somebody turned up and made it happen, but I don't see it happening any time soon. Thanks.

So there is a weakness in how D-Bus handles idle exit of auto-starting things? There is a race condition where a service tries to exit on idle, but someone has queued it a message it hasn't received yet. Did you ever look at fixing that? Yeah, you mean exit-on-idle, right? Idle exit, yeah. There is actually an RFC in the D-Bus bug tracker where we proposed something called connect-client. The idea is basically to get something similar to socket activation, but for D-Bus: to avoid name activation and instead take a socket that is pre-connected and has everything already set up, and do normal socket activation with services. Yeah, that would be our preferred proposal. It does change the semantics quite a bit and doesn't make it trivial for other services to adapt to it. I do remember that we talked about this quite a lot during the kdbus days. I don't exactly remember what our conclusions were, so right now we don't have any differences there from dbus-daemon either. So I think the conclusion is that, as far as we have figured out, with the current D-Bus specification it's not possible without extending it. We need some extension, and we have proposed one, but there's nothing that we could just magically fix.
Any other questions? Have you thought about taking more of a part in the specification process and its maintenance? Yeah, I mean, we would like to do that. We have participated a bit, and we would be... Yeah, you definitely participated quite a bit, but like taking more ownership, maybe. That would be, yeah, absolutely something we're interested in, or I'm interested in, at least. Don't want to put words in your mouth, by the way.
Okay, thanks.