
Beyond Swapping Bits


Formal Metadata

Title: Beyond Swapping Bits
Subtitle: Accelerating file-sharing in P2P networks and IPFS with Bitswap
Number of Parts: 637
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Bitswap is a block exchange protocol designed for P2P content-addressable networks. It leverages Merkle-linked graphs in order to parallelize retrieval and verify content integrity. Bitswap is used in the InterPlanetary File System architecture as the main content exchange protocol, as well as in the Filecoin network as part of the block synchronisation protocol. In this work, we present Bitswap's baseline design and then apply several new extensions with the goal of improving Bitswap's efficiency and efficacy and minimizing its bandwidth fingerprint. Most importantly, our extensions result in a substantial increase in the protocol's content discovery rate. This is achieved by using the wealth of information that the protocol acquires from the content routing subsystem to make smarter decisions on where to fetch content from. This talk focuses on the operation of Bitswap, the testbed we built to test improvements to Bitswap, and the ways in which we improved Bitswap, building the foundation for research lines on file-sharing in P2P networks.
Transcript: English(auto-generated)
Hello everyone, and thank you for having me here. My name is Alfonso de la Rocha and I'm a research engineer at Protocol Labs, and I'm here to talk about file sharing in peer-to-peer networks. Actually, I'm going to talk about all of the research we've been doing at Protocol Labs Research to try to improve file sharing and drive speed-ups in file sharing in peer-to-peer networks. It is well known that file sharing and file exchange in peer-to-peer networks is hard, because you have to worry about content discovery, content resolution, and content delivery.
There are a lot of nodes in the network that could potentially have the content, and doing all of this without any central point of coordination is even harder. There are many content routing systems that help in this quest of finding the node that stores the content we are looking for. For instance, BitTorrent has the BitTorrent trackers to discover nodes that store the content; in Web 2.0 we have the DNS as the system that helps us find the server with the resource we are looking for; and in peer-to-peer networks we usually organize content in a data structure such as a DHT in order to be able to find the nodes that store the content we are looking for.

The problem is that all of these content routing systems have their own trade-offs. BitTorrent trackers are a centrally governed system, and the same happens with the DNS: they are fast, but centrally governed. Then we have the DHT, which is the big content routing system in peer-to-peer networks. The problem is that when the network is large and the system starts to scale, the DHT is pretty slow.
To overcome all of these trade-offs in content routing systems, we came up with Bitswap. Bitswap is a message-oriented protocol that complements a providing or content routing system in the discovery and exchange of content in a distributed network. Bitswap is already deployed: it is used in IPFS as the exchange interface and in Filecoin as Filecoin's block synchronization protocol.

Bitswap has a modular architecture that is really simple. It exposes a simple interface with two operations, GET and PUT. The GET operation is the one responsible for telling Bitswap that you want to find content in the network and download it. The PUT command stores content in the network: we say, "hey Bitswap, this is the block, or the content, or the file that we want to store in the network."

The modules a Bitswap node is comprised of are the following. First, we have a connection manager that leverages a network interface to communicate and exchange messages with other nodes in the network. Then we have the ledger: whenever other Bitswap nodes send requests to our node, the ledger tracks all of the requests being made by those nodes. This way we know what others are asking for, and if we have that content in our block store we can send it back to them. And then we have the session manager, which is responsible for spawning new sessions and orchestrating all the messages that let us discover content using Bitswap and then transfer or download it from other nodes. The session manager leverages a content routing interface to communicate with the providing subsystem of whatever content routing system there may be in the network. In the example that we will follow throughout this presentation, which is IPFS, Bitswap complements the DHT as the content routing subsystem of the network, but Bitswap can work with other content routing systems, for instance a DNS or a network database. It can even work in isolation, without the aid of any other system; we'll see in a moment how Bitswap works and why I'm saying this.

But before we start with the operation of Bitswap, let's understand how Bitswap understands, finds, and manages content within the network. In Bitswap, and also in IPFS, content is chunked into blocks. Take this file as an example: it will be chunked into different blocks that are uniquely identified through a content identifier, or CID. The CID is, in the end, just a hash of the content of that block, and it is a way to uniquely identify the blocks of a file. Because of deduplication, a block can belong to more than one file, as long as the content is the same. These blocks are usually linked to one another in a DAG structure, and this DAG structure can represent a lot of things. It can represent a file, where the CID root stores links to the rest of the blocks that comprise the file. Or it can represent a full file system: for instance, if we have a directory, the root of the DAG structure will be the directory, with links to the files in the directory. Say we have three files, each comprised of two blocks; all of these items will have a CID, and we will have the CID root with links to the files, and the files with links to the blocks that comprise them. This is how Bitswap understands and interacts with content in the network.

Another thing worth noting before we start with the operation of Bitswap is the common request patterns used when finding content in a peer-to-peer network.
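The chunk-and-link model described above can be made concrete with a minimal sketch. This is an illustration, not the real IPFS chunker or CID format (real CIDs are multihash-based and self-describing): content is split into blocks, each block is addressed by a hash of its bytes, and a toy root block links to its children, so identical blocks deduplicate automatically.

```python
import hashlib

def cid(data: bytes) -> str:
    # Simplified stand-in for a real CID: hex SHA-256 of the block's bytes.
    return hashlib.sha256(data).hexdigest()

def chunk(data: bytes, size: int = 4) -> list[bytes]:
    # Split content into fixed-size blocks (real chunkers are smarter).
    return [data[i:i + size] for i in range(0, len(data), size)]

def build_dag(data: bytes) -> tuple[str, dict[str, bytes]]:
    # Store leaf blocks by CID, then create a root block linking to them.
    store: dict[str, bytes] = {}
    links = []
    for block in chunk(data):
        c = cid(block)
        store[c] = block              # identical blocks are stored only once
        links.append(c)
    root = "\n".join(links).encode()  # toy root format: newline-joined CIDs
    root_cid = cid(root)
    store[root_cid] = root
    return root_cid, store

root_cid, store = build_dag(b"aaaabbbbaaaa")
# "aaaa" appears twice in the file but is stored once: dedup by content address.
```

Note how the store holds three entries for a file of three chunks: two distinct leaves plus the root, because the repeated chunk shares one content address.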
Specifically in IPFS, we usually find a couple of common request patterns, and the importance of knowing them is that Bitswap behaves differently according to the request pattern used. A common request pattern appears when we are trying to fetch a dataset or a full file. Consider a dataset with a lot of files, where each of the files is comprised of a different number of blocks. We say "hey, I want to get this dataset" using the GET command that we've seen in the Bitswap interface. Bitswap will first gather the CID root of the DAG structure; once it gets this block, it will inspect it and check the links for the next level. It will fetch those blocks, and through their links it will learn about the blocks in the next level. In this way Bitswap traverses the DAG structure, gathering all of its blocks, and the DAG structure could be as deep as we want it to be. This is one of the common request patterns, but we have another one that is also really common.
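The level-by-level fetch just described is a breadth-first traversal, which can be sketched as follows. This is an illustrative model, not the actual go-bitswap code; the `get_links` callback stands in for "download the block and inspect its links".

```python
def fetch_dag(root_cid, get_links):
    # Breadth-first traversal: fetch the root, inspect its links,
    # then request every block of the next level, and so on.
    fetched = []
    level = [root_cid]
    while level:
        next_level = []
        for c in level:
            fetched.append(c)                 # "download" the block
            next_level.extend(get_links(c))   # links discovered from the block
        level = next_level
    return fetched

# Toy DAG: root -> f1, f2; f1 -> b1, b2; f2 -> b3
dag = {"root": ["f1", "f2"], "f1": ["b1", "b2"], "f2": ["b3"],
       "b1": [], "b2": [], "b3": []}
order = fetch_dag("root", lambda c: dag[c])
```

The traversal order shows the level-by-level behavior: the root first, then both files, then all of their blocks.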
That is the pattern we would use when we want to download the assets to render a website. Imagine that we have a directory that stores a website, and we want to render the page doc.html. In this case the first thing Bitswap will do is get the CID root, the root of the DAG structure, and then traverse the path one item at a time. Instead of going level by level, trying to gather all the blocks of each level as in the previous request pattern, it will take the CID root, look at its links, and follow the one it is interested in, which is page. Once it gets the block for page, it will read that block's links and go to the one for doc.html. And once it gets to doc.html, which is the content it actually wants, the file it wants to render, it will read all the links in the block and fetch all the blocks for the level of doc.html. These are two of the common request patterns used when fetching content in IPFS and with Bitswap.
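The path-walking pattern can be sketched in the same style. Here `get_links` is assumed to return named links (hypothetical, in the spirit of UnixFS directory entries), so at each block we follow only the link on the path instead of fetching the whole level.

```python
def resolve_path(root_cid, path, get_links):
    # Walk one branch of the DAG: at each block, follow only the
    # named link on the path instead of fetching the whole level.
    current = root_cid
    visited = [current]
    for name in path:
        current = get_links(current)[name]   # named link, e.g. a dir entry
        visited.append(current)
    return current, visited

# Toy site DAG: root dir -> "page" -> "doc.html"
dag = {"root": {"page": "pageblk"}, "pageblk": {"doc.html": "docblk"},
       "docblk": {}}
target, visited = resolve_path("root", ["page", "doc.html"], lambda c: dag[c])
```

Only three blocks are touched on the way to the target, however wide the directory tree is: that is the point of this pattern.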
Now let's see the flow that Bitswap follows. As I've said, Bitswap is the exchange interface in IPFS, so to understand its operation, let's see how it works when fetching a file in IPFS. Bitswap is a message-oriented protocol, and we will see six different messages: three requests, which are want-have, want-block, and cancel, and three responses, which are have, block, and don't-have. If we want to fetch a file from the IPFS network, IPFS first checks if that file is in its block store. Imagine that I want to download the doc.html we've seen before: the first thing IPFS does is check if the blocks for that file are in the block store. If this is not the case, IPFS triggers a GET operation in Bitswap.
This spawns a new session that starts looking for all of the blocks of that doc.html file. The first thing a Bitswap session does is broadcast a want message to all of the peers connected to the Bitswap node. This is a want-have message saying: "from all my connections, please let me know if any of you have the block for this CID." In this broadcast, what we are trying to find is the CID root of the content we are looking for. So if we are trying to fetch a full file, we would be looking for that specific block, the block CID1; and for doc.html, the same, we try to find the CID root that will give us information about the links that lead to doc.html. So we send all of our connections a request to check if any of them have the block for this CID, and in parallel Bitswap makes a request for that CID to any of the available providing subsystems in the network, which in the case of IPFS is the DHT. So in case none of my current connections have the content, I have a way of knowing who in the network is storing it, in this case the CID root block of the content I'm looking for. According to whether these nodes have the content or not, they will answer either with a have message, saying "hey, I have this content," or with a don't-have message, saying "I don't have this content." And through the DHT we also learn about nodes that I may not be connected to that have the content. When the Bitswap session receives these responses, it adds all of the peers that responded successfully to this request to the peers of the session. In the subsequent interactions for the discovery and exchange of content, instead of asking all the connected nodes, the session will only ask the ones that answered successfully to this request, in this case these three nodes. This is the view from the peer that is requesting content from the network.
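The broadcast-and-filter step that builds the session can be sketched like this. It is a simplified model: the `have` callback stands in for the want-have round trip, collapsing the asynchronous replies into a synchronous check.

```python
def discover(session_peers, connections, root_cid, have):
    # Broadcast a want-have for the root CID to every connection; peers
    # that answer HAVE join the session, peers that answer DONT_HAVE do not.
    for peer in connections:
        if have(peer, root_cid):      # peer's reply: HAVE / DONT_HAVE
            session_peers.add(peer)
    return session_peers

holders = {"B", "C"}                  # peers that actually hold the root block
session = discover(set(), ["B", "C", "D"], "cid1",
                   lambda p, c: p in holders)
```

After discovery, subsequent wants go only to the session members, not to every connection.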
But what is the view of a peer that receives this request? Imagine peer A sends a want for CID1 and a want for CID2 to peer B. According to the requests it receives from peer A, peer B updates A's want list in its ledger. We've seen that the ledger is the module used to keep track of what CIDs, what blocks, other nodes are looking for. So peer B will keep in its ledger information about the blocks that A is looking for. Peer B may not have the blocks and send a don't-have to peer A, saying "hey, I don't have it," but peer B still remembers the blocks being requested by peer A. If at any time it receives a block by chance through any other channel, and peer B sees in A's ledger that A is still looking for CID1, it immediately forwards the block to peer A. And once it sends this block to peer A, as peer A already has the content, the entry can be removed from the ledger. This is the view of the peers that receive requests from other nodes in the network.
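The ledger behavior just described for peer B can be sketched as a small class. This is a toy model of the bookkeeping, not the go-bitswap ledger: record what each peer wants, forward a block to every peer still waiting for it when it arrives, then clear those entries.

```python
class Ledger:
    def __init__(self):
        self.wants = {}              # peer -> set of wanted CIDs

    def record_want(self, peer, c):
        # Remember that `peer` asked us for block `c`.
        self.wants.setdefault(peer, set()).add(c)

    def on_block(self, c, send):
        # A block arrived (from any source): forward it to every peer
        # still waiting for it, then drop those ledger entries.
        for peer, wanted in self.wants.items():
            if c in wanted:
                send(peer, c)
                wanted.discard(c)

sent = []
ledger = Ledger()
ledger.record_want("A", "cid1")
ledger.on_block("cid1", lambda p, c: sent.append((p, c)))
```

Once the block has been forwarded, A's entry for CID1 is gone, so a second arrival of the same block would not be forwarded again.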
But what is the process from discovery to the actual download or exchange of a block? We've seen that peer A sends a want-have to all of its connections, peer B, peer C, and peer D, and they answer according to whether they have the block or not. In this case all of them have the file, so they answer with a have message and are added to the session. As the first response received by peer A comes from peer B, peer A says: okay, if peer B has the content, I will directly ask it for the exchange. So peer A sends a want-block, in which we are saying: "hey, please send me this block; I already know that you have it." And peer B answers with the block for the CID root of the content we are looking for. Peer C and peer D may answer the request afterwards; in that case peer A won't ask them for that block, but it keeps the knowledge that peer C and peer D potentially have the rest of the DAG, because they have the CID root. From there on, as we've seen, once we get the CID root we learn about more CIDs in the DAG structure, because we inspect the links and we know what to ask for. And as the three of them are inside the session, because they answered the broadcast message successfully, we can start asking for more blocks in the next level of the DAG. To peer B, for instance, we can send some want-blocks directly, and the rest as want-haves: since it has at least the CID root, it potentially has the rest of the levels of the DAG structure, so instead of sending want-haves, then want-blocks, and going back and forth, I can send some want-blocks directly. And in order to have a multi-path exchange, peer A sends non-overlapping want-block requests to all of the peers in the session. In this way, we are spraying the requests and trying to get the content in parallel.
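The non-overlapping "spraying" of want-blocks can be sketched as a simple round-robin split. This is an illustrative strategy under the assumption that each missing CID is assigned to exactly one session peer as a want-block, while the other peers get a want-have for it; the real session logic also weighs peer latency.

```python
def split_wants(cids, peers):
    # Assign each missing CID to exactly one session peer as a WANT-BLOCK
    # (round-robin) so downloads do not overlap and run in parallel;
    # every other peer receives a WANT-HAVE for that CID instead.
    plan = {p: {"want_block": [], "want_have": []} for p in peers}
    for i, c in enumerate(cids):
        owner = peers[i % len(peers)]
        for p in peers:
            key = "want_block" if p == owner else "want_have"
            plan[p][key].append(c)
    return plan

plan = split_wants(["c1", "c2", "c3"], ["B", "C", "D"])
```

Each peer ends up asked to transfer exactly one block, while still reporting whether it holds the other two.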
Another thing worth noting is that inside a Bitswap message we can put more than one want request, because the envelope can carry a list of requested CIDs. Take the exchange between peer A and peer B as an example: here we are sending a want-block for three of the CIDs and a want-have for the rest. According to whether it has each block or not, peer B answers with blocks to the want-blocks, with haves to the want-haves, and with don't-haves for the blocks it doesn't have, whether the request was a want-block or a want-have. This back and forth of want-have and want-block requests is repeated over and over again down the DAG structure.
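The receiver's side of one such multi-entry message can be sketched as a small dispatch over the wantlist. This is a simplified model of the reply rules just described, with string tags standing in for the real wire format.

```python
def respond(message, blockstore):
    # Build the reply to one Bitswap message: a block for each WANT-BLOCK
    # we can serve, a HAVE for each WANT-HAVE we can serve, and a
    # DONT-HAVE for anything we do not hold, whatever the request type.
    reply = []
    for c, kind in message:              # kind: "want-block" | "want-have"
        if c not in blockstore:
            reply.append((c, "dont-have"))
        elif kind == "want-block":
            reply.append((c, "block"))
        else:
            reply.append((c, "have"))
    return reply

reply = respond(
    [("c1", "want-block"), ("c2", "want-have"), ("c3", "want-block")],
    {"c1": b"...", "c2": b"..."},
)
```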
So once we have the CID root we can start getting all of the levels, until we have all of the blocks for the content we are looking for. But here we are only communicating with the nodes inside the session, the ones that successfully answered the broadcast for the CID root, plus the ones I may have found through the DHT request, the content routing query. What happens if at some point I keep receiving don't-haves for all the blocks I'm asking the peers of the session for? Imagine that after sending this request, peer B says it doesn't have any of these blocks. In this case, peer A removes peer B from its session, saying: "hey, I'm not going to ask this guy again, because he doesn't seem to have the rest of the blocks of the DAG I'm looking for." This can happen in practice: we may have peers that store only the top levels of the DAG structure, and not the whole DAG a peer may be looking for. So as peer B seems not to have the blocks I'm looking for anymore, it is removed from the session. And what happens if all the peers in a Bitswap session are removed? Well, we have to do another discovery, another broadcast stage.
In this stage we make a request to the providing subsystem, to try to populate the session with potential nodes storing the content, and we again broadcast to all of our connections, just to check whether any of them gathered those blocks while I was interacting with the nodes in my session, or whether I have new connections that already have the content. Another thing we have to bear in mind is where this broadcast starts. Imagine that we got down to a certain level of the DAG, and beyond this level none of the peers in the session have the rest of the blocks. As we already have the first two levels, instead of broadcasting for the CID root again, we start the broadcast from the nodes of the DAG where we ran out of peers in the session.
It's a way of populating the session again with candidates that potentially store the content, in order to restart the download of the rest of the blocks we are looking for. And finally, what happens when peer A gets a block? Peer A is communicating with a lot of nodes at the same time, so we may have a lot of nodes in our session. Consider peer A interacting with peer B as one of the peers in the session. If at some point peer A receives the block from another peer that is not peer B, peer A sends a cancel message to all of the nodes in the session, to notify them that it is no longer looking for that CID, and so that all the nodes in the session remove CID1 from A's ledger. From there on, even if peer B receives CID1, it won't forward it to peer A, because now it knows that peer A has found the block through other peers in the network.
So this is basically how Bitswap works. We did an extensive evaluation comparing Bitswap against, for instance, the DHT. We set up some tests with an IPFS network where, in order to find a block, you had to resort to a query in the DHT, and we compared it with Bitswap, where the seeder was within the connections of the requesting node. In this test we had 20 nodes: a lot of leechers, 19 of them, and just one seeder. In the DHT case, to find the block you had to search through the DHT; in the Bitswap case, as all the nodes were connected to one another, the seeder is connected directly to the leecher, and Bitswap does its back and forth of want-haves and want-blocks to discover and exchange the content. What we realized is that Bitswap is always faster than the DHT at finding content in the network, as long as some neighbor of the leecher has the content.
Then we did another test to see how Bitswap and the DHT behave as the number of nodes in the network increases. This is not a really meaningful result yet, because we are talking about a dozen nodes, and the real impact of using Bitswap compared to the DHT will be seen with many more nodes, but we see that the more nodes there are in the network, the slower the DHT lookup gets. Bitswap may have a bit more overhead, with all of these broadcasts and so on, to find the seeder that stores the content, but once it is found, the exchange of the block is straightforward.
Of course, there is something to bear in mind here: Bitswap is really fast as long as one of the connections of the Bitswap node stores the block. If this is not the case, the DHT will find a node storing the content with 100% probability, as long as the content is still stored in the network. This is not the case in Bitswap: if we don't use any content routing system and the content is not held by any connection of the node, Bitswap won't find it. But that's why we use Bitswap as a complement to the DHT. It may be the case that none of our current connections has the CID root of the content we are trying to find; then we resort to the DHT, or to any other content routing subsystem, to find the CID root. Once we find it, we add that peer to the session and establish a connection with it. From there on, we interact directly with this peer and there is no need to resort to the DHT: there is no further DHT lookup, and we leverage the connection we already established while searching for the CID root for the rest of the back and forth to find the rest of the blocks. That's why Bitswap is so interesting as a complement to other content routing subsystems such as a DHT.

But this is the baseline operation of Bitswap, and one thing we realized while doing all of these experiments is that Bitswap has issues; it's not perfect. After this realization is how the Beyond Bitswap research project started. We realized that Bitswap is currently a one-size-fits-all implementation, and it may not suit every use case and every kind of data. We may have applications that want to be really fast in the time to first block, and applications that need to exchange a lot of data; there is a great gamut of applications, and Bitswap doesn't have a way of being configured and fine-tuned for any of them.
Then we realized that the current search, or discovery, of content that Bitswap does is blind and deterministic: it doesn't care about what has happened before in the network. If we look at the broadcast stage of the Bitswap protocol, it sends a want-have to all of its connections regardless of what has happened before in other sessions, in other requests for content, or in other events in the network. It just broadcasts to everyone and tries to gather information about who has the content. We started realizing that maybe we can make this search smarter, leveraging the information that is out there in the network, in other protocols, and in Bitswap's own previous interactions with other nodes. We also realized that Bitswap requests are pretty dumb, in the sense that they are plain requests where we ask for a flat list of CIDs, a want list. Instead of this, as we have defined structures, we could think about more complex requests: instead of asking for the blocks one by one, and having to go back and forth to discover the links for the rest of the levels of the DAG, maybe we can perform queries that say "give me this full DAG structure," or "give me this branch of the DAG structure," a complex query where the list of blocks we are looking for is implicit, instead of having to find out the blocks by ourselves. And finally, of course, we could make Bitswap more efficient in its use of bandwidth. With these realizations, the Beyond Bitswap project started. This is ongoing work; in the project repo you will find all the information, and I highly recommend going there and checking it out, because there are a lot of ideas and prototypes that we are exploring, and we are inviting everyone to contribute. We also have the testbed where we are doing all the tests.
And we are inviting everyone to join our quest But to give you a glimpse of what we've done so far We have already prototyped three of the RFCs That have been discussed in that repo We have explored the use of compression At the network interface We'll see in a moment We have explored the use
Like the gathering of information About what's happening in the network To make it more efficient Search of content And we have added a new model That is the relay manager In order to increase the range of discovery Of BitStrap messages Let's start with compression So, I mean, we started thinking Okay, HTTP already uses compression
to download data from the web — and if HTTP does it, why aren't we using it to make a more efficient use of bandwidth? We tried three strategies. First, the same way that in HTTP you can compress the body, we said: what if we compress the blocks? What happened is that in the end we had some savings in bandwidth, but there was an overhead from having to compress, block by block, all the blocks included in our Bitswap messages. Then we said: what if we use full compression, so that every single Bitswap message is compressed? Again, we saw roughly the same behavior. But then we realized: what if we go down to the network interface? In the case of IPFS, the Bitswap implementation sits on top of libp2p, so we implemented stream compression at the protocol level, for Bitswap and for libp2p. And what we managed, with a smaller overhead than in the above schemes, was up to 70% in bandwidth savings for certain datasets. So this was a first win. I'm adding here some URLs: on the PL Research blog you will find all of our contributions — we've been documenting all of the work we've been doing on the Beyond Bitswap project.
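The intuition for why stream-level compression beats the per-block and per-message strategies can be sketched with stdlib `zlib` (used here purely as an illustration; the actual compressor explored in the prototype may differ). A single compressor shared across the whole stream exploits redundancy *between* blocks and pays the setup overhead only once:

```python
import zlib

# Some example "blocks" with redundancy across them, as file chunks often have.
blocks = [b"hello bitswap block payload " * 40 for _ in range(10)]

# Strategy 1: block-by-block compression.
# One compression context (and its overhead) per block; redundancy that
# spans blocks cannot be exploited.
per_block = sum(len(zlib.compress(b)) for b in blocks)

# Strategy 3: stream compression at the transport level.
# One compressor shared across the whole stream, so repeated content in
# later blocks is encoded as back-references to earlier blocks.
compressor = zlib.compressobj()
stream = b"".join(compressor.compress(b) for b in blocks) + compressor.flush()

raw = sum(len(b) for b in blocks)
print(f"raw: {raw} bytes")
print(f"per-block compressed: {per_block} bytes")
print(f"stream compressed: {len(stream)} bytes")
```

For data like this, the stream-compressed size comes out well below the sum of the individually compressed blocks, which mirrors why the protocol-level approach had the smallest overhead of the three strategies.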
So, once we had compression going, we said: okay, we saw that we can leverage information from previous interactions in the network to make better discovery of content with Bitswap. The next thing we implemented was want message inspection in Bitswap. The idea is this: if a node is requesting content, it may well be storing it in the future. So instead of broadcasting to everyone when we try to find content, if any of our connections has requested that CID before, let's go and ask for the block directly from the one that requested it. What we implemented is a want message inspection in which Bitswap nodes inspect the requests from other nodes and keep, for each CID, a list of the top ten peers that have recently requested it, so that instead of broadcasting to everyone, I can send a want-block directly to a node that requested that CID recently. In the architecture, this peer-block registry is similar to the ledger; but in the ledger, whenever a peer has found a block it sends a cancel and we remove that block, whereas in this case we are also tracking the requests from other nodes and we keep the registry updated live, so that we always know which peers requested each CID most recently.
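The peer-block registry described above could be sketched like this; the class and method names are hypothetical, not the identifiers used in the actual prototype:

```python
from collections import OrderedDict

MAX_PEERS_PER_CID = 10  # keep only the ten most recent requesters per CID

class PeerBlockRegistry:
    """Sketch of the peer-block registry: for every CID seen in other
    peers' want messages, remember who asked for it most recently."""

    def __init__(self):
        # cid -> OrderedDict of peer ids, most recent requester last
        self._registry = {}

    def on_want_message(self, peer, cid):
        peers = self._registry.setdefault(cid, OrderedDict())
        peers.pop(peer, None)          # refresh this peer's position
        peers[peer] = True
        while len(peers) > MAX_PEERS_PER_CID:
            peers.popitem(last=False)  # evict the oldest requester

    def best_candidate(self, cid):
        """Peer to send a direct want-block to, instead of broadcasting
        a want-have to every connection; None means fall back to broadcast."""
        peers = self._registry.get(cid)
        if not peers:
            return None
        return next(reversed(peers))   # most recent requester

registry = PeerBlockRegistry()
registry.on_want_message("peerB", "bafy123")
registry.on_want_message("peerC", "bafy123")
print(registry.best_candidate("bafy123"))  # peerC asked most recently
```

The key difference from the ledger is the live update on every inspected want message, rather than removing entries on cancel.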
So in the discovery phase, instead of sending a want-have to everyone, we just send a want-block to the top peer that has requested the file recently. We ran some experiments with around 30 nodes, with just one seeder and a lot of leechers trying to find content, and the leechers came in waves. Of course, in the baseline, the more nodes that had the content, the easier it was for Bitswap nodes to find it. But with the want inspection prototype, what we see is that even when a lot of nodes have the content and the time to fetch a block stabilizes, we reduce the time to request the block by one RTT. Instead of having to send a want-have to everyone and then a want-block to get the block, here we are directly sending a want-block to the peer that we know requested the content recently and potentially has it. What happens if that peer doesn't have the content? It doesn't matter: we lost one RTT, and we start over with the traditional want-have / want-block discovery that baseline Bitswap uses. Another interesting consequence of this prototype is that we significantly reduced the number of messages exchanged between nodes: if you have an entry in the peer-block registry for a CID, you send the want-block directly to that peer, so you reduce the number of want-haves you have to send and the amount of back and forth with other nodes
in the network. So, another big win for Bitswap. Next, we went one step further. The problem is that if none of our neighbors has the block we're looking for, we have to resort to the content routing system. But if we add a TTL to Bitswap messages so that they can jump, then even when it is a neighbor of our neighbor that has the content, I don't have to resort to the content routing subsystem to find it: I can use my neighbor as a relay to reach the content stored at my neighbor's neighbor. Here is how this would work. Say peer A sends a want message to peer B. In the baseline implementation of Bitswap, if B doesn't have the content it says "hey, I don't have it", and then A has to find some other way to get it. With the TTL, when A sends a want message to B, B starts a relay session and forwards the message to its own neighbors, C and D, until the TTL reaches zero, and C and D answer according to whether they have the content or not. So B ends up communicating with C and D as if both of them were A's neighbors, using B as the relay.
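A minimal sketch of this TTL-bounded forwarding, with a toy in-memory topology (the function name and the `network`/`store` maps are illustrative, not the actual implementation):

```python
# Relayed want messages with a TTL: a node that does not have the block
# forwards the want to its own neighbors with a decremented TTL, so
# discovery reaches peers several hops away without touching the DHT.

def handle_want(node, cid, ttl, network, store):
    """Return the set of nodes found to hold `cid`, searching up to `ttl`
    hops away from `node`. `network` maps a node to its neighbors and
    `store` maps a node to the CIDs it holds."""
    if cid in store.get(node, set()):
        return {node}
    if ttl == 0:
        return set()           # TTL exhausted: caller falls back to the DHT
    found = set()
    for neighbor in network.get(node, []):
        # act as a relay: forward the want with a decremented TTL
        found |= handle_want(neighbor, cid, ttl - 1, network, store)
    return found

# A <-> B <-> C: A and C are not directly connected.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
store = {"C": {"bafy123"}}

print(handle_want("A", "bafy123", ttl=0, network=network, store=store))  # set()
print(handle_want("A", "bafy123", ttl=2, network=network, store=store))  # {'C'}
```

With `ttl=0` this degenerates to baseline Bitswap (only direct neighbors can answer); with `ttl=2`, A discovers the block on C through B without an external lookup.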
With this, we increase the discovery range of Bitswap without having to resort to an external content routing system. And the results were pretty pleasant. Here we had 30 nodes: one single seeder, a lot of passive nodes — passive nodes just run the Bitswap protocol but do nothing else — and a lot of leechers trying to find the content. Seeders and leechers couldn't connect to each other directly, so they would have to either resort to a content routing system to find the content, or use this jumping Bitswap to reach the seeder. And what happened is that going to the DHT to find the seeder turns out to be slower than using the TTL, jumping through a passive node to find the seeder, and using the passive node as the relay between the leecher and the seeder. Seeing these results, we said: okay, so we're sending a lot of want messages, we're exchanging a lot of information between nodes, we have a lot of requests flowing through the network — what if we mix jumping Bitswap,
that is, the use of TTLs in Bitswap messages, with the peer-block registry and want inspection? As we are now gathering information from nodes a few hops away from us, we can leverage that information to make more direct searches. That is what we did, and it actually worked. Because of all this flow of want messages through passive nodes, and of relayed and forwarded want messages, we gather more information in the peer-block registry, and once we know where a block is, instead of sending all these want-haves we can send a want-block directly and get the block back. This meant a significant improvement in the time to fetch blocks. Of course, this always comes with a tradeoff, and the tradeoff is that we are using symmetric routing, so that
to get the block back, we use the same path the request travelled. Compare this with the DHT: if peer A goes to the DHT to find who stores the content, and C has it, A directly establishes a connection with C, and from there on the communication is directly between A and C. With our jumping Bitswap, we are using B as the relay, B may be connected to C and D, and C and D may be connected to each other. So there are a lot of messages flowing through the network, more than one relay may find the block, and there may be several copies of blocks flowing around. That is why we see this increase in the number of duplicate blocks in the network, compared to the case where we use the DHT just to discover the node that stores the content and then communicate with it directly. We are already thinking of ways to improve this. If, instead of using the relay session to perform the exchange of the content, we use asymmetric routing — using the TTL only to find and discover the node that stores the content, and then, the same way we do with the DHT, establishing a connection directly with that node — we would reduce the number of duplicate blocks. And what is the problem with duplicate blocks?
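The asymmetric-routing idea we are considering could be sketched like this — relays are used only for discovery, and the exchange itself happens over a direct connection, as with the DHT. All names (`discover`, `fetch`, the `network`/`store` maps) are hypothetical illustrations:

```python
def discover(node, cid, ttl, network, store, visited=None):
    """TTL-bounded relayed discovery: which reachable peers hold `cid`?
    `network` maps a node to its neighbors, `store` to the CIDs it holds."""
    visited = visited if visited is not None else set()
    if node in visited:
        return set()
    visited.add(node)
    if cid in store.get(node, set()):
        return {node}
    if ttl == 0:
        return set()
    found = set()
    for neighbor in network.get(node, []):
        found |= discover(neighbor, cid, ttl - 1, network, store, visited)
    return found

def fetch(node, cid, ttl, network, store):
    """Asymmetric routing: TTL-relayed wants for discovery only, then a
    direct connection to one provider, so a single copy of the block
    crosses the network instead of one copy per relay path."""
    providers = discover(node, cid, ttl, network, store)
    if not providers:
        return None  # fall back to the DHT / content routing system
    return sorted(providers)[0]

# A reaches two providers, C and D, through relay B — but fetches from one.
network = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
store = {"C": {"bafy123"}, "D": {"bafy123"}}
print(fetch("A", "bafy123", ttl=2, network=network, store=store))  # 'C'
```

Even though both C and D answer the relayed discovery, only one direct exchange happens, which is exactly how this scheme would cut down the duplicate blocks seen with symmetric routing.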
In the end, they are an inefficient use of bandwidth. And this is all we have tried so far to improve file sharing in P2P networks. But this is ongoing research, and I invite everyone to join us in this quest. There are a lot of RFCs with potential improvements, not only to file sharing in IPFS or Filecoin, but to file sharing in P2P networks overall. So have a look at them and join the discussion to give us feedback about what is happening out there. There are already research and development teams building prototypes for the RFCs and coming up with new RFCs that are being discussed in the repo. So in the end, if you like these topics, help us make file sharing in P2P networks blazing fast by going to this repo, joining the discussions, and proposing new ideas and prototypes. There you will also find the testbed and ways to replicate the results I've shown throughout the talk. And that's all for me — please reach out if you have any questions or any feedback.