Building Blocks of Decentralization
Formal Metadata
Number of Parts | 275
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/51933 (DOI)
Transcript: English(auto-generated)
00:03
The Internet has this founding myth that it's a very decentralized thing. But if you look at the actual usage today, much goes through a few centralized commercial
00:25
entities like Google or Amazon or Facebook. And there are many single points for authorities or other interested parties to throttle or censor access to content.
00:41
Our speaker, Will Scott, who is a recovering academic who has worked on distributed systems and security, among other things, thinks that the Internet should not have these central points of control and wants to tell us about the building blocks of decentralization that will allow us to build another Internet that is less centralized and more resilient to
01:08
central attacks. Have fun with the talk. Great. Thank you. Yeah. So I'm really happy to be talking today. And it's been great to see so many people at RC3, even if just through my screen.
01:28
In reflecting on this talk, I think there are a couple of messages. And the primary one for me has actually been really hopeful. When I think about what we have with decentralization and what's getting built, the trajectory actually
01:44
looks promising in a lot of ways. And I think the second message is that when we think about the technical building blocks for decentralized systems, we really are at the very beginning, that there is
02:02
a lot left to do. And there's a lot of work ahead of us. So I want to start with a story. And that story is a story of community and of building new systems. And the first step and the first question is, how do you build that community?
02:22
How do you find that community? And I think the answer that we have is things like this event. And this is one that maybe we don't have all the systems of discovery. And there's a whole talk here that maybe we don't have time for, which is, how do you find the people who share similar interests?
02:42
And the place where we can start to think about decentralized technologies is instead that you've got this community, and you want to sustain it. And we actually have a bunch of decentralized tools and a really rich ecosystem to do that. So we can find our community, and we can talk with federated systems like Mastodon.
03:03
We can have video chats with Jitsi. We can host our own files, store our own data. We can collaborate on software projects by running GitLab. And so when you think about decentralized systems to help a community be self-sustaining and independent, we have a set of fundamentals that do this well.
03:26
And what we're doing is not developing new things, but reducing the barrier to entry. And so the sorts of things that are happening today are things like: instead of having to find someone to run a server and run a complex
03:45
thing like GitLab, you have new systems like Radicle that can make that easy. So people just run a piece of software, and they don't need to worry about the complexity of deployment. So we're reducing the barrier of entry, but we already have a set of building blocks
04:00
that allow a community to do this. Great. So you've got your community around an idea, around an ideology. And you can build software, and you can start to make this idea grow. And so when you think about that, the next step is, how do you make services?
04:22
How do you make ways for that idea to get to more people? And once that stops being a thing that you can just self-host, that you would run on your own servers, we have building blocks now. We have underlying systems and design patterns for how do you take your service and allow
04:43
it to scale while maintaining independence. And these are somewhat less developed. I think we'll go through this in a little bit, but the user bases are maybe one or two orders of magnitude smaller than the previous ones. But you can have files go on a decentralized CDN like IPFS or a number of others.
05:04
You can use distributed database abstractions, be they GUN or Earthstar or Hyperswarm or others, that will allow messages to get passed in a decentralized way and allow for some amount of synchronization and collaboration so that a service can go out to millions
05:24
of users without needing to fall back to centralized technologies like Google or Microsoft or Facebook or et cetera. And so you can, I think, not only have things that support your community, but can then
05:41
reach other communities. And so we're sort of following the trajectory. This is less developed. And then we can go to the next step that's even less developed. And this is now not only touching a virtual service, but starting to interact with the real world. Because when we think about the successful technologies that are the Googles or the Facebooks
06:03
of the world, they're not purely digital. They go into our real lives. And this is a part that's still very much in development, which is cryptocurrencies are a whole thing. But the thing that's maybe interesting is they help us bridge into the real world
06:23
financial system. And so our services still maintain independence while being able to touch that aspect. And likewise, the promise of the DAO or of decentralized autonomous organizations is that you can maintain decentralization while interacting with real legislative, political,
06:42
et cetera systems. And so the thought is rather than having some core owner who has to have power, you can encode beforehand a contract of how this organization will work based on the principles and have that power remain decentralized in that there's not a single person that the
07:01
existing real world regulation systems see as the source of all of this accumulated power and go after and are able to then influence or manipulate apart from what the community wants. And so this takes the form of these evolving and developing systems like Ork that's how
07:21
you have something that exists in a decentralized way and have it be able to interact with existing real world systems. And the answer ends up involving both advances in cryptography and in thinking about how you find hooks that you can make these points of interaction.
07:46
So what I want to go through in the rest of this talk is start with a grounding on a set of things that I would consider decentralized systems to give us a sense of scale and help ground us.
08:00
And then I'm going to abstract one layer to talk about the underlying models and technological systems that are common to a lot of them. And then talk about two ways to think about the limits there. One is as these things grow, what are the emerging pain points that we see, that we
08:21
know of, in terms of the existing technologies are running into pain points? And then the other one is the more model properties. So these already are failing us in some ways.
08:41
And so how do we think about whether there are alternatives that do not have those same properties? Are these things that we can bolt on or do better with in the current systems? And how do we do that? So in terms of existing systems and where we are, BitTorrent remains probably one of the
09:01
largest decentralized systems, in a sense. There are 3 million, 4 million active users out there on any given day, which starts to reach meaningful percentages of the global population that are actively using BitTorrent. We can think about one other aspect of BitTorrent that's interesting, which is you've got a lot
09:27
of users, but then you've also got this metadata layer of finding new torrents. And that finding-new-torrents core, which needs to exist to allow you to discover the peers in your specific federated instance of BitTorrent, consists of
09:47
something on the order of 400 open torrent trackers that will maintain metadata, but also a Kademlia DHT, which is a distributed hash table that is made up by more normal
10:05
peers that store metadata. And there's roughly 4 million torrents that are contained in that distributed hash table of metadata about who is participating in which torrent.
10:21
Another federated system that's had a great year is Mastodon. As an exemplar of the ActivityPub protocol and the Fediverse more generally, it has on the order of 3 million users. The ActivityPub Fediverse more generally is something like 50 different projects.
10:41
There are on the order of 5,000 ActivityPub servers. This fall, I guess, there's been measurement from the Community Data Science Collective, an academic group, looking at activity on Mastodon specifically.
11:00
They find something like 70,000 tooting users who collectively toot roughly 30,000 times per day. And this measured activity versus total users is not an unexpected ratio. On pretty much any social network, you would expect that the majority of people there are
11:21
consuming or not posting public content. More broadly, when we think about federated things, there's a lot of even foundational internet technology like email that is federated still. There are on the order of 6 million email servers, so different IP addresses running
11:42
the SMTP protocol. There are a bunch of WordPress servers as well that federate through comments and pingbacks and so forth. Jitsi reported this year that they have on the order of 20 million users of their
12:04
primary service infrastructure. They didn't say how many independent installations there are because that's much harder to measure. And Matrix had a great year and has grown to 2.5, 3 million users and has at least 11,000 independent instances.
12:23
Moving again on this graph towards external impact, IPFS passed 2 million users this year. They have a DHT using the same Kademlia protocol as BitTorrent, but self-select a smaller
12:41
core of active nodes to participate in that DHT as a way to try and provide better latency and availability guarantees in it. And so their DHT is composed of somewhere in the 5,000, 10,000 DHT nodes. Of those 2 million users, somewhere around 20% or 100,000 are running a desktop instance
13:07
where they're staying on and are not being accessed through a web browser. And then participating in the DHT, one finds something on the order of 20 million different pieces of content that are being stored in IPFS. And in a given day, something like 8 million that are being retrieved through that DHT
13:24
or finding metadata about who has them. Secure Scuttlebutt is an offline-first decentralized network. It's got on the order of 10,000 users, and it uses public servers as a mechanism for storing
13:46
and forwarding when users are offline. And it has on the order of 100 of those. Previously, I mentioned Earthstar. That uses a similar partial replication and this network is really going further on
14:10
the decentralization and meant for small independent collectives than many of the others. And then finally, Bitcoin has only about a million active addresses or accounts.
14:30
And about 11,000 nodes that are running the full Bitcoin protocol. And this is not a dissimilar ratio to what you'll find on all the other projects of this type.
14:42
And so again, you've got this fairly large gap between users and servers or the elevated nodes that are doing more of the work and coordination. Part of that we can see in the Bitcoin case is that in order to validate and run the full
15:05
protocol, you need to have all the historic data, which is up to about 300 gigabytes in this case. So the cost of running this full node is non-trivial. It's not something you can run on a phone or even really on a local desktop necessarily.
15:24
So from these, we can then say, what are the commonalities that a bunch of systems share? And I want to start by, I included in that, you might notice a lot of federated systems.
15:42
And we can think of federation as this partial decentralization, perhaps, that we've gone from a central Facebook or Google where there's one entity to these federated systems, Matrix, Jitsi, et cetera, et cetera, where there's a lot of instances of a server,
16:02
but the server is distinct from a client. And there is this distinction to be drawn of, well, OK, there are things like BitTorrent, perhaps, although BitTorrent has, again, a tracker that's sort of like the server that's separate from the clients, although the tracker is getting phased out to the
16:21
DHT. And so the point is, and this is why I think of federation as a lot of the part is that what federation is doing in some sense is it's externalizing the heterogeneity of resources, which is you've got some nodes that are more powerful, that are going
16:40
to be always on, that have bandwidth. And in a federated system, we explicitly run a different piece of software on them that indicates that, that is, this is the server software. And in systems that we think of more traditionally as true peer-to-peer, within
17:01
the software, there end up being heuristics that try and do the same thing, that try and guess should this node take on more of the coordination work. Because one of the things that all of these networks want to do is make efficient use of resources, and those resources are not evenly distributed. Some nodes have more disk space, have more bandwidth, have better availability, and
17:24
your system is going to perform better if you're able to take advantage of that. And so we can, and I think, I don't know if I'd think of this as cheating, but you can think of these as two different problems, which is, there's one problem, which is how good of heuristics can I have to self-select nodes into being servers versus clients?
17:45
And then how do I make decentralized protocols? And those are somewhat different. And as a result, you can say, do I need to figure out how to have good heuristics? Or do I just start with federation, which seems to be getting us a couple orders of magnitude
18:02
more users and more success? And then I can make packaging and make auto-selection of client and server and opting into those positions be a thing that happens independently of that, or happens later as a different problem to solve.
18:20
The counter is that you think of message passing and how your communication model will happen purely in terms of a single node. And this is the natural thing that you end up with when you do that and you go on the full sort of, I'm a node and there are other nodes like me, but let's take the view of a single user or a single node, is you end up with something that looks
18:42
like gossip. This is what Secure Scuttlebutt uses in a sense, or is based on initially. And the basic concept here is I get a message and I want to then send it out to everyone else I'm connected to so that messages sort of disseminate through the mesh,
19:03
through the decentralized mesh. And we have a bunch of optimizations on that initial concept. So things like I will then send out an identifier of the message to see if the person I'm going to send it to has already gotten it. And then if they have, I'm not going to send it so that I don't waste the full bandwidth
19:23
of sending that message over every edge. One of the things that's true when you start there is there's no concept of the structure of the network itself. Because you're sort of considering a node in isolation, what you don't have is any
19:44
ability to then pull back from there to look at the full network and start to say useful things about that. And so one of the things that is tough to say is like, will a message get to its recipient? How quickly will it get to its recipient? Because these are all questions that aren't about the single node but are about the structure
20:03
and the connectivity of the network more broadly. And so it becomes hard to mesh this sort of communication with things like when should I form additional communications or additional connections? Who should I remember? And questions about how you maintain a strong network topology.
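To make the gossip idea above concrete, here is a minimal sketch in Python. It is not taken from Secure Scuttlebutt or any other real implementation; the `Node` class and the `connect`, `msg_id` and `gossip` helpers are hypothetical names introduced only for illustration. It shows the optimization mentioned above: a node first checks a short identifier with each neighbour and only sends the full message if the neighbour does not have it yet.

```python
import hashlib
from collections import deque


class Node:
    """Hypothetical gossip participant: knows only its direct peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.seen = {}  # message id -> payload

    def has(self, message_id):
        return message_id in self.seen


def connect(a, b):
    """Create an undirected link between two nodes."""
    a.peers.append(b)
    b.peers.append(a)


def msg_id(payload: bytes) -> str:
    """Short identifier for a message (assumption: SHA-256 of the payload)."""
    return hashlib.sha256(payload).hexdigest()


def gossip(origin, payload: bytes):
    """Flood payload from origin; return (id checks, full payload sends)."""
    mid = msg_id(payload)
    origin.seen[mid] = payload
    queue = deque([origin])
    id_checks = payload_sends = 0
    while queue:
        node = queue.popleft()
        for peer in node.peers:
            id_checks += 1          # cheap: only the identifier crosses the edge
            if peer.has(mid):
                continue            # peer already has it, skip the full send
            peer.seen[mid] = payload
            payload_sends += 1      # expensive: full message crosses the edge
            queue.append(peer)
    return id_checks, payload_sends
```

Note that nothing in this sketch looks at the shape of the network as a whole, which is exactly the limitation described above: the code can count messages after the fact, but a single node cannot predict whether or how quickly a message reaches a given recipient.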
20:23
And I think one of the things that we maybe need to be thinking about is how do you combine those two together so that you can start to have better guarantees and better understanding of the actual dynamic properties of these decentralized networks. Because we end up with very few properties that we can say useful things about in this
20:45
sort of system of message passing by itself in isolation. To give a sense of a somewhat more concrete building block that ends up being the other one that we can use, that's distributed hash tables. And so a distributed hash table, which many of you have probably seen at least in
21:04
passing, is that we've got a bunch of data and we've got a bunch of computers or participants in our network. And we're going to come up with some identifier, maybe a hash, of both participants and content. And we're going to put them in the same namespace.
21:22
So it's the same hash, right? And then the different nodes, the different computers, will store the content that is close to them. So I'm a person at position three. I'm going to store all of the content that hashes to four, five, six, until there's
21:40
some other node. So I get some section of the space of content based on where I hash to. And then when I want to find content, I find the node that I know about that is closest to that content. And I ask them. And if they know someone closer, they'll forward me on to that person. And so I end up finding the node that is responsible for that content.
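The description above can be sketched in a few lines of Python. This is a simplified, hypothetical ring-style table, not Kademlia and not the code of any real DHT: the names `position`, `clockwise`, `build`, `lookup`, `put` and `get`, the 16-bit identifier space, and the choice of which peers each node knows are all assumptions made for illustration. Nodes and content are hashed into the same namespace, each node stores the content that hashes between itself and the next node, and a lookup is forwarded to whichever known node is closest to the content.

```python
import hashlib

SPACE = 2 ** 16  # assumption: a tiny identifier space, for readability


def position(name: str) -> int:
    """Hash a node name or content key into the shared namespace."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:2], "big")


def clockwise(a: int, b: int) -> int:
    """Distance travelled going clockwise around the ring from a to b."""
    return (b - a) % SPACE


class Node:
    def __init__(self, pos):
        self.pos = pos
        self.known = []  # positions of the peers this node knows about
        self.data = {}


def build(names):
    """Create nodes; each knows peers 1, 2, 4, ... places further along."""
    positions = sorted({position(n) for n in names})
    nodes = {p: Node(p) for p in positions}
    for i, p in enumerate(positions):
        step = 1
        while step < len(positions):
            nodes[p].known.append(positions[(i + step) % len(positions)])
            step *= 2
    return nodes


def lookup(nodes, start, key_pos):
    """Forward towards the node responsible for key_pos; return (node, hops)."""
    node, hops = nodes[start], 0
    while True:
        best = node.pos
        for peer in node.known:
            if clockwise(peer, key_pos) < clockwise(best, key_pos):
                best = peer
        if best == node.pos:       # nobody known is closer: this node owns it
            return node, hops
        node, hops = nodes[best], hops + 1


def put(nodes, start, key, value):
    owner, _ = lookup(nodes, start, position(key))
    owner.data[key] = value


def get(nodes, start, key):
    owner, hops = lookup(nodes, start, position(key))
    return owner.data.get(key), hops
```

From the outside, `put` and `get` behave like a single key-value store, regardless of which node the request starts from. The hop count returned by `get` is what the latency discussion later in the talk is about.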
22:02
And this has an algorithm. There's a few different implementations. And DHTs are in wide use. One of the nice things is that it's an abstraction that from outside, you can think of it as a centralized database, that I can put stuff into it, and I can get stuff out of it, and I don't have to think about the dynamics within.
22:24
And it has this nice property that it sort of grows. As you have more users, you can expect the users to each bring data. But the data per user isn't necessarily growing. And so it's efficiently keeping data across all the different users.
22:44
The other building block that we have here is coming up with consensus. How do a set of decentralized nodes agree on something? This is what cryptocurrency really is relying on, is you have a bunch of nodes that want
23:03
to all agree on something. But that's a much more general problem. We've had this problem, and a set of systems for it, in centralized systems for a while. And it turns out that's more efficient. But when you think of what the problem for consensus is in those cases, they've
23:23
actually been able to get away with a weaker threat model. So Paxos, Raft, these are protocols for consensus that work within a threat model of fail stop. And what that means is nodes that are broken may not respond. They may delay. They may freeze. They may crash.
23:41
And despite some threshold of nodes failing or crashing or not responding, the system guarantees that the other nodes that are still working will agree on an outcome. But in a decentralized world, that ends up not being quite enough, that what you
24:00
want is a stronger notion of what a bad node looks like. And so that's what gets called BFT, Byzantine Fault Tolerance. And that is tolerant of Byzantine nodes, Byzantine nodes being nodes that can act maliciously. So they can send arbitrary messages. So it's not that they just fail to send a message.
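The talk does not state the thresholds, so the numbers below are brought in from the standard results rather than from the speaker: crash-tolerant protocols in the Paxos and Raft family need 2f + 1 nodes to survive f failed nodes, while classic Byzantine fault tolerant protocols such as PBFT need 3f + 1 nodes to survive f malicious ones. The function names are hypothetical; this is only a small arithmetic sketch of what the stronger threat model costs.

```python
def min_cluster_size(faults: int, byzantine: bool) -> int:
    """Smallest cluster that tolerates `faults` bad nodes."""
    return 3 * faults + 1 if byzantine else 2 * faults + 1


def max_faults(nodes: int, byzantine: bool) -> int:
    """Largest number of bad nodes a cluster of `nodes` can tolerate."""
    return (nodes - 1) // 3 if byzantine else (nodes - 1) // 2


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, "nodes:",
              max_faults(n, byzantine=False), "crash faults or",
              max_faults(n, byzantine=True), "Byzantine faults")
```

In other words, for the same number of machines, tolerating nodes that lie buys noticeably less fault tolerance than tolerating nodes that merely stop.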
24:21
It's that they send a message indicating a different thing than they should have. And so Bitcoin came along, did proof of work. In proof of work, you are using the scarce resource of computation, so how many times you can hash something, as a race that gets used to as long as the majority of the network
24:46
is all following this protocol. Even if someone is trying to do malicious things, they don't have enough power unless they are the majority to mess that up. And that's switched now, I think, because of environmental concerns and so forth, to
25:06
proof of stake, primarily. And proof of stake is, instead of using computation as that scarce resource, it's using how much of the existing resource you have as that.
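The "how many times you can hash something" race in proof of work can be illustrated with a toy example. This is a simplified sketch, not Bitcoin's actual block format or difficulty encoding: the `mine` and `verify` functions, the single SHA-256 pass, and the 8-byte nonce are assumptions chosen for brevity. The point it shows is the asymmetry: finding a valid nonce takes many hashes, checking one takes a single hash.

```python
import hashlib


def _digest_value(block: bytes, nonce: int) -> int:
    """Hash block plus nonce and interpret the digest as an integer."""
    digest = hashlib.sha256(block + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def verify(block: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap check: one hash, compared against the difficulty target."""
    return _digest_value(block, nonce) < (1 << (256 - difficulty_bits))


def mine(block: bytes, difficulty_bits: int) -> int:
    """Expensive search: try nonces until the hash falls below the target."""
    nonce = 0
    while not verify(block, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    nonce = mine(b"example block", difficulty_bits=16)
    print("found nonce", nonce, "valid:", verify(b"example block", nonce, 16))
```

Each extra difficulty bit doubles the expected number of hashes, which is why whoever controls the most hashing power wins the race most often.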
25:24
There's a whole theory in here about things like, if you already have a bunch of it, you don't want to lose it, and so you're more likely to not do something malicious. But in both of these cases, these are not systems that lead towards additional
25:41
decentralization. And one of the things that I think is at the heart here of a lot of the problems for decentralized systems is this question of Sybils and identity, which is within the system itself, we don't really have any firm notion of identity. So who is a user?
26:02
Is some entity or some collection of users acting on behalf of one user, or are they multiple users? We can't meaningfully enforce that. When you think about how you enforce it, even for centralized systems, like the Facebooks or Google, they go to governments, and they go to this, I'm going to identify you
26:20
as a citizen or a person with a driver's license. And if you don't have that external authority, that stops making sense as a way to appeal to authority. And so instead, a lot of effort goes into trying to think about incentives and trying to make the system not make it worthwhile to appear as multiple users when you are,
26:43
in fact, only one. But the flip side of that is that the incentive actually incentivizes centralization, because it ends up meaning that it is better to be a single large entity than many small entities. And so that means that if it's better to be the single large entity, that single large
27:05
entity becomes more and more powerful over time. So that was a set of building blocks. Let's go through limitations on these building blocks. So DHTs are something that we have.
27:25
And they do grow with users. However, so as you get more users, you can have more content, and the load and burden on each user in the DHT does not increase. However, there's a set of applications that you just can't really do with our current
27:41
understanding of DHTs. So if I want to do web search and put identifiers for all of the web, or for really any many millions of identifiers, suddenly that's a lot of load that I would be needing out of each user on the DHT. So we can do something like a few million torrents total based on that user base.
28:03
But are you going to be able to store hundreds of millions of items, all of the individual files, and make those individually addressable? That suddenly becomes really hard. And so this is a question of naming and thinking about, what is an identifier that I want to be able to search for or look up? What are the things that I'm putting as the entry points for lookups?
28:24
The other interesting limit here in terms of scale, the other way that these things don't fully scale, is that a lot of the applications that we want are interactive
28:40
in nature. So we want to be able to look up something and get a response in an interactive way that is compatible with web browsing or some sort of I'm there as a user. And DHTs take something on the order of log n lookups, asymptotically, to find content. So I find the node I'm connected to that's closest to it.
29:00
They tell me someone who's even closer. As the DHT grows, that goes from two or three people that I need to talk to to three or four or five. And if each of those is taking 100 milliseconds, that starts growing to a point where, even though it is log n, which is great, it's more than constant.
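A rough back-of-the-envelope version of that argument, as a sketch: the only figure taken from the talk is the 100 milliseconds per hop. The function names and the assumption that each hop narrows the search by a factor of 16 are illustrative choices, not properties of any particular DHT.

```python
def lookup_hops(num_nodes: int, fanout: int = 16) -> int:
    """Hops needed if every hop narrows the candidates by `fanout`."""
    hops, reach = 0, 1
    while reach < num_nodes:
        reach *= fanout
        hops += 1
    return hops


def lookup_latency_ms(num_nodes: int, per_hop_ms: int = 100,
                      fanout: int = 16) -> int:
    """Sequential lookup time: every hop waits for the previous one."""
    return lookup_hops(num_nodes, fanout) * per_hop_ms


if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        print(f"{n:>10} nodes: {lookup_hops(n)} hops, "
              f"{lookup_latency_ms(n)} ms")
```

Under these assumptions the hop count grows slowly, but each added hop is a full network round trip, whereas a centralized service answers in roughly one.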
29:20
And it stops being competitive with our centralized systems. The second limit is gossip. As a gossip network grows larger, we start to really find ourselves in a situation where we don't expect messages to necessarily end up going all the way across it.
29:42
And so we stop being able to say things about, I'm in a local neighborhood, but I can't necessarily find a friend through the network because it becomes way too expensive for all messages to go to the whole network. And so there is something about network structure and about how you think about things beyond
30:06
gossip that we need in order to scale those numbers. And the final one is an infrastructural one, which is that there continue to not be real world incentives that are pushing the infrastructure that we have to support decentralized
30:22
technologies. So it's still true, as it has been for decades, that a normal user's upload bandwidth is about half of their download bandwidth. This is true. This is a broadband fixed line one, but it's also true for mobile. But likewise, when you think about latency between end users, when I think about a centralized
30:46
system, the latency for me to get there keeps going down because they keep building edges closer to more of their users. And so on average, they actually end up getting closer to their users and lower latency. But when you look at, and this is a RIPE Atlas probe, 2015 or 2010 till now, the latency
31:05
doesn't actually go down much. It goes down, and this is a bunch of nodes trying to get into Europe. And the latency from Russia goes down a little bit, but India is staying about the same. And we don't have money that's going to be invested into the links that are getting
31:22
transited for traffic between two different end users because that's not something that there's money that is incentivizing. And so until we figure out a way to solve that, we're going to end up with a very static latency and bandwidth profile. OK, I'm, I think, running into question time, but I will finish with the
31:46
more meta properties rather than these scaling limits that we need to think about. The first is coming up with a much stronger model of metadata exposure and privacy. We have end-to-end encryption. We're able to protect the contents, but we don't typically protect either the size or
32:04
communication patterns of who's talking to whom. And without servers and intermediaries, I think basically all of these decentralized systems, less so federated, but even there, end user IP addresses and the identity of
32:22
where they are and what they're connecting as ends up being something that is used for the disintermediation. And as such, there's no real guarantee about how private you are or who could learn that you're participating or who you're talking with. And so coming up with models where we get some limits on the retention, so being able
32:41
to say things like at some point after you have stopped talking, the system will not continue to hold on to your IP address or who you've talked to, your communication patterns, is important and is not part of any of these models. And so I'll end with two things that I go back to when thinking about this.
33:01
The first is a paper on the impossibility of full decentralization in permissionless. And this is on blockchains, but permissionless consensus, which basically is the impossibility result that I brought up earlier when talking about how in order to disincentivize Sybils,
33:20
we move towards encouraging centralization. And I think this is a framing question, which is the incentive to decentralize is actually an incentive that happens in a larger picture.
33:40
It's not the system itself, but it's the entities around your system that are the incentive to not have centralization. It's the government, it's the regulations, and it's the other dynamics that are why you would want to not be centralized. And we don't have that in our model. And likewise, the tech policy perspective is saying, essentially, there are these very
34:08
powerful existing entities that will attempt to identify the points of power and regulate them. And that ends up being why we need this decentralization in some sense is to prevent
34:22
being co-opted by existing systems of power. That's where we find the motivation for decentralization. So I will end there. I guess we have maybe a couple of minutes for questions. You're muted for me.
34:43
Yes, thank you. We have a few questions from the audience. First is what do you think is the future for decentralized social media? Will there have to be a major event for decentralized social media to gain more traction? It's a good question. Social media is an interesting and problematic beast in its own way.
35:07
This is something that we haven't really traditionally had in the same way. And so one of the questions maybe to step back is what do you actually want? Having the part of social media that is you and your communities and your friends,
35:24
I think we already can do and will continue to see more localized things. Because they give us the same experience. Figuring out how you broadcast a celebrity or these sort of more large-scale zeitgeist ideas is where we are much less developed.
35:41
And so part of that less developed also means that there's more potential for further technological development to make that more possible. So I'm optimistic that we get there, but I think that's further off. Okay. And another question related to that, but probably more into direction of user experience.
36:01
Can we abstract the decentralization to ease the use for the average user? Yeah. I mean, and there's a couple of things there, right? Which is we are starting to have libraries and patterns that other developers can build better user experiences on. I think we're seeing also that there are easier-to-use sort of end-user compatible systems
36:24
that are views into decentralized networks. So be that things like Manyverse or Planetary, which are mobile apps that came out this year for Secure Scuttlebutt, or Radicle and these sort of desktop apps that provide Git or other types of decentralized network views
36:46
that don't require setup or work. We're getting better at user interfaces, and we're having building blocks that are not themselves particularly visible or require configuration. And a more technical question that just came in.
37:02
What do you think of the trade-offs made by single-hop DHTs and their huge local routing tables? Right. So this is an interesting question, and it's where a bunch of thought is being put right now. And so one of the things you could say is in the same way that a federated system
37:23
is externalizing this disjoint resource or resource heterogeneity between sort of highly available nodes that might make up your DHT and the sort of end-user more transient nodes that are querying it, you could have a DHT that tries to self-organize to take
37:46
advantage of even more heterogeneity. So you find nodes that are more powerful, and you have multiple layers or a hierarchical DHT where some DHT nodes forward queries into a smaller center that's more powerful.
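The designs discussed above can be compared with a rough back-of-the-envelope model. This is a sketch only: the formulas are order-of-magnitude assumptions (a flat log n DHT with a hypothetical bucket size, a single-hop DHT where every node knows every other node, and a two-tier layout where edge nodes forward into a small fully connected core), not the behaviour of any specific implementation.

```python
import math


def flat_dht(nodes: int, bucket_size: int = 20) -> dict:
    """Flat log-n DHT: small routing table, up to about log2(n) hops."""
    levels = math.ceil(math.log2(nodes))
    return {"routing_entries": bucket_size * levels, "hops": levels}


def single_hop_dht(nodes: int) -> dict:
    """Single-hop DHT: every node tracks every other node."""
    return {"routing_entries": nodes - 1, "hops": 1}


def two_tier_dht(nodes: int, core_nodes: int) -> dict:
    """Edge nodes forward to a core node, which forwards to the responsible core node."""
    return {
        "core_routing_entries": core_nodes - 1,  # each core node knows the whole core
        "hops": 2,                               # edge -> core -> responsible core node
        "core_share": core_nodes / nodes,        # fraction of nodes doing all the routing
    }


n = 1_000_000
print("flat:      ", flat_dht(n))
print("single-hop:", single_hop_dht(n))
print("two-tier:  ", two_tier_dht(n, core_nodes=1_000))
```

The model makes the costs visible side by side: fewer hops are bought either with very large routing tables on every node or by concentrating routing in a small share of the nodes.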
38:03
And you end up with something where you probably can get rid of some of these log n-style latency things to make them be able to scale more at the expense of something that starts to look like more centralization, that there end up being a smaller number of nodes in
38:24
the center that, if they do go down or if they decide to censor things, they end up with a lot of power in the system, which is starting to be scary. So there's a trade-off there. Next question. Given the popularity of Kademlia, have probabilistic models won the decentralized
38:44
architecture, or are they better suited to decentralization? That's a good question. I think Kademlia, I don't know if it won just because it's probabilistic. I think it has something to do with, it's fairly simple to conceive of.
39:02
We often say these distributed systems are really hard to reason about. And in some sense, that's like this, I've done one thing, and I'm out of ideas. One thing that I took away, at least, from that set of building blocks that I presented
39:23
was there were only three or four things that are underlying all of these things. We really have not looked through a lot of this design space. And so I am not at all unconvinced that trying something radically different, we can find other models of decentralization that are more performant and that have different properties,
39:42
and we just haven't explored enough. So I think that's a no. OK, the last question until now. Do you think decentralized writable storage like IPFS is threatened by toxic datasets like libgen? Well, OK, so there's an assumption that libgen is a toxic dataset.
40:03
But beyond that, I mean, I think there's a very reasonable story for any of these systems for how you sanitize them or make them compatible with external power structures. And so in an IPFS-like thing, users are opting into pinning or storing data as nodes.
40:26
So it's only when you've decided to pin data that you make it re-available to others. And so, as a node, if you get a DMCA complaint or you get a complaint that
40:41
something is illegal, you can blacklist it. You can say, I'm not going to pin or re-serve that data. And so there's a way for each individual user in the system, as they get complaints, to not make that data re-available and limit their liability. So as there are different regulatory environments, people will be able to comply
41:01
with their own regulatory environments. And so I think that is probably enough. The bigger question is, OK, does that cause the software to be seen in the same light as BitTorrent? And that's, I think, a more philosophical question and is as much like a perception
41:27
and finding good uses that you can counter things with as anything else. OK, thank you. Those were the questions. Thank you for the interesting talk and for being here answering the audience questions.
41:42
Thank you. This was great.