
Beernet: Building peer-to-peer systems with transactional replicated storage


Formal Metadata

Title
Beernet: Building peer-to-peer systems with transactional replicated storage
Title of Series
Number of Parts
97
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
We will very briefly introduce Beernet's architecture, describing the peer-to-peer network topology, the distributed hash table, and the transactional layer for replicated storage (called Trappist). We will also describe Beernet's API to create peers, exchange information between them, and store and retrieve data from them. We will finally describe some applications built on top of Beernet, such as a small wiki, a collaborative drawing tool, and a web-based recommendation system. Beernet is a library to build distributed systems as peer-to-peer networks. It provides replicated storage with distributed transactions, which are highly robust because they do not rely on a centralized point of control. Beernet can be used to develop synchronous and asynchronous collaborative applications. We have used it to build a decentralized wiki, a collaborative drawing application with gPhone clients, and a web-based recommendation system. Beernet stands for pbeer-to-pbeer network, where the words peer and beer are mixed to emphasise the fact that this is peer-to-peer built on top of a relaxed-ring topology (beers are a known means to achieve relaxation). The relaxed ring provides a distributed hash table (DHT) with no central point of control and without relying on transitive connectivity between peers.
Transcript: English(auto-generated)
OK, test, OK. So Beernet is actually a peer-to-peer system that allows you to build peer-to-peer applications on top of it. It is the result of a project called Selfman, and we are going to try to use it in a project called
Mancoosi, about which I will tell you a little bit later. So I'm going to tell you what the motivation of our peer-to-peer system is. The idea is to build scalable distributed applications with robust storage. That means that we don't want to work with a classical client-server architecture, because you have a central point of congestion, so it doesn't scale.
And it also has a single point of failure. This is what we don't want to have. So this is why we want to have something which is decentralized. But if you go decentralized, there are more complex things to do. So we need some self-management for the system. It means that we want to have self-organization of the participants of the network, and also self-healing. So if some nodes fail, you can replace them very easily,
or at least you don't have to do it manually; it has to be done by the system itself. So with self-management, we deal with the complexity of the decentralization. So what we want to do is actually peer-to-peer. And I will tell you later on why we call it beer-to-beer. So the best-known way of doing peer-to-peer
is actually file sharing, where you have an unstructured overlay network, which works on top of the internet and allows you to share your files. You just connect randomly to some nodes, and it works pretty well when the files do not change. So the music files just stay the same, or the movies stay the same; you don't update them. And the thing is that, if something is very popular, you search as much as possible and, with good luck, you find it; and if you don't find it, it doesn't matter. So for popular things like, I don't know, Britney Spears or Metallica, you can find it very easily. But if this guy in green is searching for something called Sturbach-Bakelak, which is this guy here in red,
the algorithm is not going to be able to find it, even if it is in the network. So we want to add some structure to the network, which is done in academia with one ring to rule them all. And the rule is actually to assign an identifier to every peer and to put them into a ring. The identifier is going to allow you to connect to your predecessor and your successor, following the order of the numbers, of course.
And this ring is also one ring to find them, because you can have some special connections to jump significant distances across your ring, so it's efficient for routing. So you can find every peer in your network in logarithmic time. It means that if you double the size of your network, you only need one extra step to find every node over there.
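The fingers that give you this logarithmic routing can be sketched very simply. This is a hypothetical Python illustration (Beernet itself is written in Oz), showing only the ideal finger placement: each peer keeps pointers at exponentially increasing distances, so a peer needs only log2 of the ring size of them.

```python
# Toy sketch of ring fingers (hypothetical, not Beernet's actual code).
M = 8  # identifiers live on a ring of size 2**M

def finger_targets(n, m=M):
    """Ideal finger targets for peer n: (n + 2**k) mod 2**m, k = 0..m-1."""
    return [(n + 2 ** k) % (2 ** m) for k in range(m)]

print(finger_targets(0))    # distances double: 1, 2, 4, ..., 128
print(finger_targets(200))  # later targets wrap around the ring modulo 2**M
```

With 256 identifiers every peer keeps only 8 fingers, which is where the one-extra-hop-per-doubling behaviour comes from.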
The fingers can be placed like this, dividing the remaining distance in half again and again, or you can place them in a different way; it is independent of the ring. But because it's decentralized, you don't have control over things. You have to deal with peers that are failing, those peers there, peers that are leaving because they want to leave, or peers that are joining, or they join when a peer is failing.
So all this kind of stuff is why you need some self-management. And also, as in the previous talk, there were some problems with communication between peers. If this guy can talk to that guy, it doesn't mean that he can talk to this guy. So you can have some problems with the connection between successor and predecessor. This is why we have the relaxed ring.
So we don't want to have a perfect ring. We have something which is more relaxed. So we will allow some peers that can talk only to the successor, and just know about the predecessor so they can share the responsibility of this space very easily. So this is the relaxed ring. In the relaxed ring, we built a distributed hash table as in the other rings. But this means that you can do storage
by putting something with a key and a value and retrieving something with the key. Or you can search for the peer responsible for a key with the lookup operation. And the responsibility of this hash table goes from the predecessor, excluding the predecessor, up to and including the peer itself. So this peer here, which has the identifier 0,
and this one, which is 7: it means that the responsibility of this guy is to store all the hash keys that go from 1 to 7. Very simple idea. It's not invented by me; it comes from Chord. And the fingers here will allow you to reach any point in the network, as I said, in a logarithmic number of steps, plus a b here, which is what we call a branch in the relaxed ring. Branches are these nodes that are here, which are experimentally fewer than 2 in a network of 10,000 nodes, so they don't really affect the routing. So there's no SQL language here. You just do put and get on the information with key-value pairs. This is not fault tolerant, because if the responsible
of the key dies, you don't get the value. So this is why we need some replication. But I'm going to come to that a little bit later. To build a network, what you do (this is implemented in the Oz language, on the Mozart system) is an import of the library. You just say, I want to create a new peer with some arguments.
With this peer, you join a network just using a URL. You build your own network, or you can join an existing network if it's allowed to do that. And then, to send messages to a peer, you can do it reliably with rsend, using one of the IDs, and you just send one message. You would usually do this for building your protocols and all that stuff, rather than just saying 'long live Boston', even though it's a good message. When you broadcast stuff, you can do it with a range, which means sending to all of them, to all but me, or from this node to that node. So it means you can do multicast. And as I said, for storing, you can put things with a key and a value. So here I'm putting a beer, and then you put whatever you want.
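The put/get storage and the responsibility rule just described (each peer stores the keys between its predecessor, exclusive, and itself, inclusive) can be sketched as follows. This is a hypothetical single-process Python model, not Beernet's real Oz API; all names here are invented for illustration.

```python
# Toy model of the DHT responsibility rule (hypothetical names).
def responsible(key, pred, peer, size=256):
    """True if `peer` stores `key`: key lies in (pred, peer] on the circle."""
    d = (key - pred) % size
    return 0 < d <= (peer - pred) % size

class ToyRing:
    """A handful of peers and their key-value stores, in one process."""
    def __init__(self, ids, size=256):
        self.ids = sorted(ids)
        self.size = size
        self.store = {p: {} for p in self.ids}

    def _owner(self, key):
        # each peer owns the keys between its predecessor (exclusive)
        # and itself (inclusive); ids[i - 1] wraps around for the first peer
        for i, p in enumerate(self.ids):
            if responsible(key % self.size, self.ids[i - 1], p, self.size):
                return p
        return self.ids[0]  # degenerate single-peer case

    def put(self, key, value):
        self.store[self._owner(key)][key] = value

    def get(self, key):
        return self.store[self._owner(key)].get(key)
```

For example, with peers 7, 100, and 200 on a ring of 256 identifiers, keys 1 through 7 land on peer 7, exactly the 0-to-7 example from the talk.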
So this beer with the key bink is called Adelardus. It's a dubbel beer from Limburg; it's very good. And then when you retrieve it, you just do get bink with a variable x, and x is going to get the value with this description. As I said, this is not fault tolerant, so we need transactions on the peer-to-peer network with isolation, which is our layer called Trappist. Trappist uses symmetric replication.
It means that you have here, for instance, two items, one with the color green, the other one with the color gray, which are symmetrically replicated on the ring, so there are four replicas of every item. This client here wants to modify some of the items. So how is it going to do it? With a transaction manager. The most common way to do transactions in a decentralized way is two-phase commit.
Two-phase commit relies on that guy not failing during the transaction. You cannot rely on single nodes in a peer-to-peer network, because you cannot rely on peers. So what you have to do is to also replicate the transaction manager, and to have a consensus, and to work only with the majority of the peers
that have the replica, not with all of them. Because if one fails during the transaction, it's going to be too bad. So you just work with the majority and with the replication of the transaction manager. So it is a little bit slower than two-phase commit, but it's more robust. So you gain a lot, actually. So when the transaction is done, you keep your new values replicated, and the transaction managers disappear.
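The majority idea is the heart of this: unlike classic two-phase commit, which needs every replica to answer, the replicated transaction managers can decide as soon as a strict majority of an item's replicas have accepted. A minimal hypothetical sketch of just that decision rule:

```python
# Sketch of majority-based commit (simplified; not Trappist's real protocol).
def majority_commit(votes):
    """votes: accept/refuse answers from the replicas of one item."""
    acks = sum(1 for v in votes if v)
    return acks > len(votes) // 2  # strict majority, e.g. 3 out of 4

print(majority_commit([True, True, True, False]))   # one replica failed: still commits
print(majority_commit([True, False, False, False])) # only 1 of 4 accepted: abort
```

This is why one replica failing mid-transaction does not block the commit, at the cost of one extra round compared with plain two-phase commit.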
We also have a notification on every update, so you can build something like publish/subscribe systems very easily with this layer. So I'm going to show you a little bit of the architecture of Beernet. At the bottom, you have the relaxed ring, which just organizes the peers, doing self-organization and self-healing. Then on top of this, you have the reliable message sending, to build the DHT on top of that. You have the replication, and you have this Trappist transaction layer. And we have developed several protocols: the Paxos consensus algorithm, and also eager Paxos, which means that you get the locks in a pessimistic way.
This is for doing synchronous collaborative applications. And you have your notification layer, which all works with the replication, which is symmetric. This architecture, of course, looks better with a background picture that tells you more or less how the architecture evolves. So which kinds of applications have we developed on Beernet?
Pepino is a peer-to-peer network inspector. This is an example of a simulation of a network, because we try to create problems between the communications of some nodes. So you get some branches, for instance, here. And then you can have the fingers in between. And also you can follow the successor pointers or the predecessor pointers. So you can actually debug your applications.
You can see the messages exchanged between the peers here with different colors. You see at which level they were sending messages, either for the organization of the ring or for the transactions, for instance. So we have another application called Sindaca, which is sharing idols and discussing about common addictions.
The idea is that you just have a community-driven recommendation system. This is done with the Paxos consensus algorithm. You simply provide some links, like saying, I like this video on YouTube. We don't put the video itself, because we can have some problems with copyright and all that stuff, so we just put the links. And people can rate things; instead of star scores it is beers: no beer for that one, two beers for that one.
And then every time that you vote on something or you provide a new recommendation, this is stored with replication on this data system. And then you can retrieve all the stuff, and this is all done with transactions. So there's a set of peers; we use 42 for the server, which is actually a farm of servers.
So this is a peer-to-peer network which is behind the web server. We have another application called DeTransDraw, which is decentralized transactional drawing, because everything is on transactions again. This one is using eager locking. It means that every time that you want to modify some of the items that are on the network, you're going to get the lock pessimistically, because you don't want anybody to try to modify the stuff
at the same time as you, because this is synchronous collaboration, not asynchronous like the previous one. So this guy is modifying two items; all the locks are acquired here. So here I'm mixing two applications, DeTransDraw and Pepino, to investigate what is going on. When you finish your modification, the locks are released and the update is sent to the other users. We managed to make a client for gPhones.
It's very slow, but it's working. So at least we did something good, just for the effect. So we have another experiment. The idea was not to develop our own application ourselves. We asked students to do it, so we could see whether it's a valid programming framework.
And there are some students that did a nice job. They had to do a small Wikipedia where the articles are stored with key-value pairs. Every paragraph is a key-value pair, and every article is a set of paragraphs. So you can have two people working on the same article but modifying different paragraphs. And then when you run your transactions,
all the paragraphs are going to be committed, or all are going to be refused, and then you abort, and then you have to solve your conflicts. Again, this is some code. You see this is for updating an article. You receive a list of paragraphs to update and a list of paragraphs to delete.
A transaction, what is it? It is a procedure, going from here to here, which does a certain number of read, write, or remove operations. In this case, I'm just doing two loops: one loop here to write all the paragraphs that I want to update, and one loop here to remove all the paragraphs that I have to delete. Once I've done everything, optimistically,
now I'm going to take all the locks for doing the Paxos, and I will say to the transaction manager, please commit my transaction. And this transaction manager is going to say, yes, everything is committed, or sorry, you have to redo it again. How are you going to know this? When you run your transaction, you say, execute my transaction, which is called trans, the name of the procedure here, with this client, and using this protocol.
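The update procedure just described (one loop writing the updated paragraphs, one loop removing the deleted ones, then asking the transaction manager to commit or abort) can be sketched like this. It's a hypothetical Python model of the shape of the code shown on the slide, not the students' actual Oz code; ToyTM and all the names are invented stand-ins.

```python
# Hypothetical sketch of the wiki-update transaction from the talk.
class ToyTM:
    """In-memory stand-in for the replicated transaction manager."""
    def __init__(self):
        self.store = {}

    def run(self, trans):
        class Txn:
            def __init__(self, store):
                self.store = store
            def write(self, key, value):
                self.store[key] = value
            def remove(self, key):
                self.store.pop(key, None)
        trans(Txn(self.store))
        return "commit"  # a real manager could also answer "abort"

def update_article(tm, to_update, to_delete):
    """to_update: {paragraph_key: text}; to_delete: [paragraph_key]."""
    def trans(txn):
        for key, text in to_update.items():  # first loop: write updates
            txn.write(key, text)
        for key in to_delete:                # second loop: remove deletions
            txn.remove(key)
    # run optimistically; the answer is either commit or abort
    return tm.run(trans)
```

The key point is that all writes and removes inside trans commit or abort together, which is what lets two people edit different paragraphs of the same article safely.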
So you can choose dynamically which protocol you want to use for your transaction, and the client is going to receive the message commit or abort. So there's not that much code for doing an update with transactions and replication and all this stuff. So to summarize: a beer-to-beer network, so it mixes the words peer and beer,
because it's peer-to-peer using beers, because beers are a known means for achieving relaxation. Why relaxation? Because we use a topology which is called a relaxed ring. So it's one relaxed ring to rule them all, that provides self-organization and self-healing, with a transactional DHT. The transactional DHT
is done with symmetric replication, as I have said many times already, with different protocols using optimistic or pessimistic locking. We would like to do something avoiding locks, because we know that locks are not so good for distribution. This is why we actually ended up having the relaxed ring here. But we need to investigate this in the transactional part. We are not sure how to do it,
so if you have some ideas, please contact us, and we will be happy to work with you. Well, I already showed you some of the applications we have developed: a small wiki done by the students to prove that it is a valid programming framework, distributed transactional drawing, Sindaca, Pepino. And for the future work,
what we would like to do to contribute to the project Mancoosi is to have something similar to DebTorrent: the idea that you can have the files of your package, for the distribution of your package, distributed in a peer-to-peer network, where the key-value pairs would be the package and a set of files. So when you upgrade the version of your package, you just upgrade some of the files,
so you can retrieve only the files that you need from the DHT. There is some other stuff that is more interesting about Mancoosi that you can see tomorrow in the distro devroom. They have several presentations which are really cool, and yeah, so that's it for Mancoosi. So to end my talk: I remember from the lightning talks, you need to have a last message to give to the people so they remember your talk, and my message to you is,
a beer a day keeps the doctor away. Thank you very much. Thank you.