The Evolution of MonoTorrent
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 97 | |
Author | ||
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/45752 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
FOSDEM 201023 / 97
1
2
4
5
6
8
14
15
16
17
23
29
38
41
42
44
46
47
48
50
53
54
62
63
64
65
66
71
74
75
77
78
79
80
82
84
85
94
00:00
Operator (mathematics)Latent heatHacker (term)BitVolume (thermodynamics)Multiplication signLibrary (computing)Process (computing)Peer-to-peerExterior algebraNormal (geometry)Different (Kate Ryan album)Extension (kinesiology)MiniDiscReading (process)Cartesian coordinate systemThread (computing)View (database)CodeAlgorithmSocket-SchnittstelleStudent's t-testFile Transfer ProtocolLine (geometry)Connectivity (graph theory)GoogoloutputInformationAreaServer (computing).NET FrameworkWeightComputer fileElectronic signatureConnected space2 (number)Lecture/Conference
04:07
Thread (computing)InternetworkingGraphical user interfaceComputer fileIdeal (ethics)DeadlockLecture/Conference
04:36
Thread (computing)CASE <Informatik>Shared memoryOrder (biology)Wechselseitiger AusschlussDifferent (Kate Ryan album)NumberSerial portIdeal (ethics)Hazard (2005 film)Cartesian coordinate systemParallel portDeadlockLecture/Conference
06:35
Peer-to-peerWeb 2.0Thread (computing)MiniDiscBlock (periodic table)State of matterNetwork socketServer (computing)Connected spaceSocket-SchnittstelleCASE <Informatik>Order (biology)InformationComputer architectureCore dumpSystem callBitBoolean algebraSelectivity (electronic)Scaling (geometry)outputReading (process)Operator (mathematics)Loop (music)Multiplication signLecture/Conference
09:52
CurveComputer architectureCASE <Informatik>Thread (computing)Process (computing)System callLoop (music)Peer-to-peerLogicCartesian coordinate systemProxy serverNetwork socketSocket-SchnittstelleLecture/Conference
10:48
Network socketQueue (abstract data type)Execution unitPort scannerLoop (music)Object (grammar)Proxy serverLoop (music)Cartesian coordinate systemNetwork socketProcess (computing)System callCodeConnected spaceView (database)Point (geometry)Lambda calculus.NET FrameworkDeadlockOrder (biology)Thread (computing)Socket-SchnittstelleLibrary (computing)WeightLecture/Conference
12:20
Thread (computing)CASE <Informatik>Connected spaceMereologyException handlingCore dumpDeadlockLocal area networkLogicLecture/Conference
12:45
Network socketQueue (abstract data type)OvalLoop (music)BefehlsprozessorThread (computing)Operator (mathematics)2 (number)Loop (music)Condition numberPoint (geometry)Queue (abstract data type)Network socketNeuroinformatikState of matterImplementationSoftware frameworkComputer architectureLecture/Conference
15:03
Multiplication signThread (computing)Computer fileSocial classCodeRandomizationSoftware bugMiniDiscControl flowPerfect groupSoftware testingBranch (computer science)Cartesian coordinate systemCommunications protocolPoint (geometry)Condition numberOrder (biology)Extension (kinesiology)Network socketRule of inferenceDeadlockSystem callWeb 2.0CausalityChainUnit testingLecture/Conference
19:15
Convex hullSocial classLecture/Conference
19:49
Interior (topology)Maxima and minimaMoment of inertiaMUDSupremumInclusion mapConvex hullQuantumExecution unitMenu (computing)Color managementDefault (computer science)MIDIImplementationSocial classSubject indexingLatent heatRange (statistics)Line (geometry)Disk read-and-write headCodePoint (geometry)Computer fontLecture/Conference
22:42
Execution unitSocial classCodeField (computer science)Operator (mathematics)Line (geometry)Normal (geometry)BitElectronic mailing listMetropolitan area networkLecture/Conference
24:03
EmailOpen sourceVideoconferencingMultiplication signCartesian coordinate systemTouchscreenInformationComputer fileRevision controlBitGame theoryInternetworking1 (number)Library (computing)Motion captureExtension (kinesiology)Server (computing)Lecture/Conference
25:46
Extension (kinesiology)Human migrationGame theoryNeuroinformatikMedical imagingDistribution (mathematics)Computer fileBitServer (computing)Multiplication signUniform resource locatorLecture/Conference
26:49
Execution unitWechselseitige InformationMenu (computing)WindowStreaming mediaCartesian coordinate systemAuthenticationService (economics)WordVideoconferencingBand matrixSubject indexingComputer fileBitLimit setBuffer solutionMobile appTouch typingLibrary (computing)Connected spaceFault-tolerant systemPoint (geometry)Client (computing)Computer programmingServer (computing)Data storage deviceHypermediaLecture/Conference
Transcript: English(auto-generated)
00:00
Ok, ready? Do you need an introduction? That's Alan. Hi, I'm Alan. My name is Alan. He's a young guy and he still authorized to go and see on C++ Let me do the intro. Let me do the intro. I think this deserves an intro.
00:20
Alan McLovron was this guy. We're doing Google Summer of Code. Isn't he carrying coffees or something? So Alan McLovron we had a Google Summer of Code and I really wanted to have a BitTorrent library in C sharp. The reason I wanted this is because I figured instead of doing HTTP fetches
00:43
and downloading files like that we really should be using BitTorrent. BitTorrent should be a fundamental component on every application that needs to transfer files instead of FTP or fetching file signatures. That's just stupid. But we do need a library for that.
01:00
So in the Summer of Code we said we need a student that over the summer writes a BitTorrent library. How hard can it be? And this year we received three applications, I think. There were three. But Alan was the first one Alan submitted an application and he immediately contacted me and I and said I've been thinking about this. I think this is how
01:22
we build data, etc. So we basically said, well, Alan seems to be the most excited guy for the BitTorrent library. So we got Alan on board and he did an amazing job over two months. Unbelievable work. But not only that, Alan kept working on this in his spare time. So next year we brought him to Boston to work on other stuff that was not BitTorrent.
01:43
He's an amazing hacker. And one year after when he graduated we said you need to work for us. So that is the amazing Alan McGovern. One of the best hackers in the world. Thank you. So the idea of this talk isn't really talk about how Monetorrent evolved.
02:01
It's basically there are a few problems that I encountered while implementing the BitTorrent specification and I just want to show people how I worked around them or how I came up with a solution. So basically, what is Monetorrent? You need to use the microphone. Do I have to turn it on? Hello?
02:21
I don't know. Okay. So, what is Monetorrent? It's just a .NET based BitTorrent library. There's just a bit of background info here. So it started out in the 2006 Google Summer of Code. As Miguel said,
02:40
it was three months, I think, ten weeks and of course I didn't have everything like there wasn't 38,000 lines of code written then, but these days it's about 38,000 lines of code. It supports pretty much everything you want from BitTorrent library like multi-threading, UDP trackers, HTTP trackers, all the
03:00
different extensions like DHT which is like an alternative to normal BitTorrent trackers. So basically you can find other peers without having a centralized server. So basically making BitTorrent decentralized. So everything is supported. So, as I was saying, the idea behind this talk is two areas. One, the threading. How
03:22
did I implement the threading? As you can imagine, there's 60 connections or 60 sockets. You have to do a lot of disk IO, reading and writing to disk. You have to contact trackers. It's all asynchronous. You can't afford to do blocking operations because it'll just kill the performance of the library. You won't be able to download 10 megabits a second
03:42
or 100 megabits a second. You can't do blocking APIs. And the other bit, I want to talk about the algorithm which allows you to choose who you download a piece of. As in, maybe piece one off this guy, piece two off this guy, or maybe you split it between different people to get a single piece faster. So
04:01
that's kind of interesting, just how it all works. So that's what I'm going to talk about. So the common thing I see a lot on the internet is people want to use threads. And they're like, oh, I have to download this file. I want to use a thread and then I want to update my UI from the thread
04:21
and then I want to do this from the thread. It's all wrong. That's the wrong way to use threads. Threads aren't a way of updating your GUI and doing this and doing this. It just won't work. You're going to end up with deadlocks. Things are going to break. So, what's the ideal number of threads in an application? Say you have a lot of work to do and you can
04:42
make a parallel. Anyone want to hazard a guess? One. But you can deadlock, which is better, doing the work slowly or not doing any work? So,
05:02
I'll come back to this later. The idea is with one thread you will always complete your work. It could be slow, but it will always be completed. Because otherwise if you have two threads you can deadlock. So, I'm going to come back to these questions now later. So, how many other threads should one thread know about?
05:23
Yeah, exactly zero. If your thread is waiting on another thread to finish you don't need threads. That's serial. Threads and serial don't work together. They should be parallel. So, yeah, just zero. How many locks or mutexes
05:40
should you need to access shared data? Like, how many layers of locks? ideally zero. Because there should be no shared data. Because otherwise you can't make it fully parallel if you're sharing data. But of course, we live in the real world, so we do actually need locks
06:02
because we will have shared data in some cases. So in that case, one. Because if you have more than one lock if you have two locks, different threads can acquire the locks in different orders and you'll end up with a deadlock. So that's what I was saying about the one thread. You don't want two locks or if you have five locks then it's going to break. There's no way you can guarantee that your application is
06:22
always going to take the locks in the correct order. Well, if you can guarantee that, then why are you using five locks? So really you want one lock. So just to remind you if you're trying to use threads to do too much stuff like when using lots of locks you're doing it wrong. It's the wrong way to use
06:42
threading. So for monotone, what we have in a typical case is maybe 60 connections each of those is transferring data then you have lots of very slow disk access. So if you're downloading data you're going to be writing 16 kilobytes at a time. So if you're downloading at a megabyte a second, that could be
07:02
30 or 40 write calls a second. You have web requests because you need to contact servers and you have the DHT requests which is the decentralized peer selection. So all of this is very slow. You can't afford to use one thread per socket or
07:21
one thread for disk.io or one thread for each web request because it's too many threads. You're going to end up with 200 threads, 300 threads. It doesn't scale. And you can't use blocking calls because you can't just go to socket.read and then wait for the data to arrive because what are your other sockets going to do? They're going to sit there with data
07:41
pending and you can't read it because you're waiting for the first socket to give you data. So blocking calls don't scale. So the original architecture that monotone used was basically locks and threads. What I said is very bad and this is why. When a peer
08:00
connection starts, it receives some data. Of course, we just received this block of information. What it then did is I took a lock. So on the peer, I took a lock. So then I could use this data. I could decode it into proper data and then I could update that peer state because sometimes
08:20
I just had to change a flag like make it a boolean true instead of false. So in that case, I don't need to access the actual core. So you think, okay, this is good. I'm using a very small bit of information. I just need to update the peer so I just lock on the peer. But sometimes you also need to update the core like the actual engine.
08:40
So you take a lock on the engine and then you update it. So that's good. You have two locks. It's very simple. Sometimes you just need to update the peer. Sometimes the peer and the engine. Then of course, you have 60 people doing that. So this makes sense. You're always guaranteed that the peer will be locked and then the engine. That's the order. So lock A then lock B.
09:02
But there's problems with that and this is one of the problems that I was fighting with for maybe a year is that essentially the engine will always you need a loop. The engine sometimes needs to just do an operation. So the engine will lock on itself and then the engine will realize, oh, there's a peer
09:22
and this person hasn't sent me data in two minutes. So I need to close that connection. I came up a bit wrong. So then the engine is now locking on the peer. But that's backwards. So what basically happens is the engine would deadlock. There's no way around
09:40
no easy way to fix this. I'd fix one case and it would break somewhere else when I implement a new feature. So basically, two locks was too hard. I can't do two locks. It just blew up. So that's what I'm saying. Two locks, you think it's easy but then there eventually comes to a case where you can't do it with two locks.
10:02
It's too hard. So I decided what I need is a new architecture. One without locks. I need a new API. So basically it uses one thread. Like a main loop in a GTK application or a WinForms application everything happens on the main loop.
10:20
So what happens is I have one thread which executes all the engine logic which does all the data decoding the updating of the peers and every call is asynchronous. So you're thinking, okay, one thread how are you going to handle 60 sockets? Well, each socket just uses the normal asynchronous APIs like begin receive, end receive.
10:43
But instead of instead of actually doing processing on the thread it always proxies back to the main loop. So this is essentially what the code looks like. I just call a standard socket.beginConnect then oops
11:01
forgot about that So if you're familiar with the socket APIs beginConnect is the regular it's a regular call. So once the connection attempt is finished .NET will invoke this callback which is here. And all I do is I queue that up on the main loop.
11:21
And then do the normal stuff here. So as you see I'm using a lambda anonymous delegates because it's a very easy way to proxy objects and code onto the main loop. So every asynchronous call essentially just does this. So rather than having to worry about, oh, I have to lock on this and then lock on this and then do my updates
11:42
everything goes to the main loop. So the slow processing is actually receiving the data or writing the data. So all of that is asynchronous. But from my applications point of view there's only one thread, this. So sure, monotaurant is using maybe 10 or 12 threads but all I really work with
12:02
is one. So because I only have one thread that I actually work with I can't deadlock and I can never take a lock in the wrong order. And all the slow work isn't done. No slow work happens here. So the performance is really high. Some people use the library
12:21
and they've reported that they can saturate a 100 megabit connection like on their LAN using monotaurant which is quite nice because you think, okay, one thread it must be really slow but it's just one thread for your core logic so everything is still fast. You can't deadlock because there's no locks, I don't use a lock except for small parts
12:41
and in that case there is only one lock because here when you're queuing it up this has to be protected by a lock but only the queuing and de-queuing from the main loop is protected. I don't need to protect anything else. This is a great architecture if your CPU bound
13:01
Well if your CPU bound yeah, it's a different story. So it just yeah, it just depends. Well yes and no. If your CPU bound, you can still use this you just go instead of instead of socket.beginconnect you go my big long operation that takes 10 seconds
13:21
.begin and your 10 second operation will be in the background thread from the thread pool and then once that 10 second operation is finished you go back to your main loop and you do a microsecond of work. So that's what I'm saying. Then you're explicitly used because then your computation is running in another thread
13:42
But then again you're back down to your operations can my operation be paralysed and if so well actually there's another talk about that kind of stuff later by Jeremy Lovell who's around there so he's going to talk about how to do that kind of work without using threads without explicitly starting I need four threads here to do this work
14:01
he'll show you how to do it without using threads explicitly you're not explicitly using threads yes and that's the point I think what he's saying to me I think what he's saying is you don't want to touch any shared state
14:23
on the thread so what he's doing is he's not touching any shared state because he's using threads there it's all in the background but he's not sharing any state and the only state that he needs to protect is the state of the queue and the question is I think that parallel frameworks
14:42
have a non-blocking queue implementation which would be even more fantastic but it is so it's just basically a way of using threads without worrying about locking or any race conditions because all the important stuff happens on one like all the important non-blocking
15:00
really short lived tasks happen on one thread so that's basically how Monotron works on threads so it's quite I like to think it's a lot simpler than before because before I fixed a bug with a deadlock and now I haven't had a deadlock in more than a year so it is working much better
15:21
but as he was saying, different applications do need different approaches for this kind of application I think it's perfect because the only slow stuff is disk access is socket access is web access and that can all be done handled with the normal calls I never have to explicitly use threads or worry about it
15:41
so the second half of the talk is about the piece picking basically in BitTorrent when you download a file it will be split up into small chunks maybe 32 kilobytes up to sorry, into pieces of 32 kilobytes to 2 megabytes
16:00
and then each piece is in chunks of 16 kilobytes which is specified in the protocol so if there's 32 kilobytes a 32 kilobyte piece has two chunks a 2 megabyte piece would have lots of chunks so the idea is that sometimes if you have really big pieces like 2 megabytes or 4 megabytes
16:21
you don't want to have to download that off one person because if you're downloading 2 megabytes off one person and they're really slow then you're going to wait forever for that piece to arrive but you don't want to there are also some other things sorry there we go
16:40
there are other conditions as well because you don't want everyone to start downloading from the very start of the file and download in order because then if someone goes offline you'll find that everyone has pieces 1 to 50 no one has pieces 51 to 100 so you have to be kind of you have to choose the rare pieces also you don't want everyone choosing the same rare pieces
17:01
because there's no point in everyone going oh, only one person has piece 7 so everyone downloads piece 7, that's bad so it has to be kind of random so you go okay, I have 10 pieces they're all kind of rare I'll download one of those and then of course sometimes you don't want to download files there could be 100 files in the torrent, you want one
17:21
so all these make it more complex to pick a piece because you have to follow all these rules and then there are some extensions to the protocol which say under certain conditions you're allowed to use this, otherwise you're not so I needed a way of solving this which was one, I could test because I do have a lot of n-unit tests
17:41
because without them I'd never get anywhere I'd fix a bug, I'd break something so what I originally started with was one class so I just have a class and it's called piece picker and it just picked a piece at random and then I made another method in this class which made it choose rare pieces at random
18:00
and then I made another method in this class I made it more complex and it's like, now I can ignore some pieces and then I had to add more code and as time went on, I couldn't test it it was impossible for me to write a test and verify that under every condition I will actually choose a piece randomly or under every condition I'll choose one of the rare pieces
18:21
it was too hard there was too much code, too many branches there was no way I could test everything so what I really wanted was one class which will select at random but this isn't working at all one class which will choose rare pieces
18:41
sorry about that one class which will be able to let me ignore pieces or sorry, ignore files and then other classes then for specific behaviors so I wanted one class to handle each kind of picking behavior and then I wanted to chain them into a pipeline
19:01
so, code time give me a second now to see if this will work again
19:22
ok, so as we were saying I originally started with just one class piece picker, which had everything I figured that couldn't work I need different subclasses so this is the standard class there are basically two important methods I'm sorry
19:41
this one here pick piece you switch and disappear you confirm it ah, ok
20:06
keep that ok, so the two important methods I'm going to talk about are pick piece and is interesting so pick piece as it says is just a way of that's the method that's called when I want to choose a piece from someone
20:22
and then is interesting is the method that tells me if a particular person has a piece I want so are they interesting to me do I want to go through all this complex calculation and choose a piece from them so the idea behind the class was when you're implementing a subclass for specific behaviors
20:42
you don't want to implement you see all these virtual methods you don't want to implement all of those you only want to implement one, maybe two so the way it works is when you create a piece picker it's a pipeline so you make a piece picker which wraps another one which wraps another one
21:00
which wraps another one and the default implementations of all these methods just sorry, here we go all it does is it returns I have the picker up here and I just call a method on that picker if it's not null
21:21
so it's really simple, by default I will always call the base implementation the class I'm wrapping so what it means is you can write code like this this is an entire subclass which does randomized picking that's all, just one method because every other method will automatically call
21:42
the one I've wrapped so basically there must be one class which implements everything but everything else just has to change one specific piece of behavior so for random picking all you do is is that font big enough?
22:01
or do I need to make the font bigger? is the font size okay? so for random picking all I do is I choose an index between the start and end of the interesting range and then try and pick a piece there if I can't pick a piece between the midpoint and end index I choose it between the start and midpoint
22:22
so it's very simple that's how you do random picking this is very easy to test it's just five lines of code you could test that in your head so this is the advantage of this new method now because I can test things really easily or if you want
22:40
to ignore something suppose you never want to download you never want to download a piece you already have so one of the things is I need to be able to ignore the pieces I already have so all I do is this bit field is the pieces
23:01
that the remote peer has that one is all the pieces I want to get and this is all the pieces I already have so it's just a bit field operation like not and so it's just one operation to remove all the pieces I already have from this list
23:21
so if I have if what I have is one, I have these pieces so it's just a normal bit field and the remote peer has this all I end up with is zero, one, one zero, zero zero I think
23:41
so I'll have two pieces left so this is really easy to test again it's two lines of code so before it was very complex because I had one class trying to do everything but now I have a much simpler method of doing everything and that is mostly my talk
24:03
so back to here and oh it's gone back to the start so basically does anyone have questions on anything I've spoken about questions?
24:23
do you still have the beautiful versus version running? I'm going to say yes because you'll forget about it by the time you leave I'm going to say yes because you'll have forgotten by the time you leave so you won't check I think it still works but I don't use it I haven't used it in a while
24:41
so sorry I use monsoon it's better there are a few there's a few open source ones there's monsoon there's another one parrot parrot bit torrent or something
25:02
there's a guy who's made it's a little application he was on the diar c channel a while ago basically it will take a video file suppose you make a game you record a screen capture from it and he basically has a little application which will generate a torrent strip information from it add it to the torrent and then distribute it
25:21
all automatically so you select a file you just hit upload to the internet and he uses monotaurant to generate everything and then put the file on the internet and publish it there is an extension for banshee because monotaurant this library has been exposed over dbus well there is an application
25:42
which exposes this over dbus so banshee can then download podcasts which are via torrent that extension isn't really maintained because they don't have the time for it but there is an extension out there which allows you to download podcast torrents there are a few commercial companies which have contacted me in the past which use monotaurant to distribute files
26:02
like updaters for games or one particular company it's basically VM image migration and they want to be able to have one VM image and then migrate that to 200 servers and they are using monotaurant for that because it allows them to have one centralized location and then every computer
26:22
can transfer the data between themselves so there is no single bottleneck it's not like 200 computers trying to get all the data off one server they have normal bit-taurant so it makes their deployment much much faster it's MIT x11 so as open as you want
26:50
it's funny you should say that because we have the sliding window picker which basically a guy got in touch with me and was like
27:00
I want to use bit-taurant for streaming every two months someone emails me and goes can I talk to you I want to stream video with bit-taurant and I want to use monotaurant for this so the idea behind the sliding window picker is that when you are streaming something you want a buffer of maybe a minute so you want to download these specific pieces very quickly
27:20
and then there is another bit from one minute to three minutes where you want them but it's not urgent and then finally anything else you can choose whenever you want so if you have all the high priority pieces and all the medium priority pieces everything from where you are in the video file onwards is low priority so you just choose it whenever
27:42
so all you really need to do with monotaurant is keep the high priority start index updated so as your video player plays the file you go okay I've played the first 10 kilobytes you increase the index by one you've played 100 kilobytes increase the index by 10 and monotaurant will then automatically
28:01
only select the pieces from that point onwards and will always keep a buffer of a minute so it is possible to stream via bit-taurant but it isn't ideal but it can be made better because the advantage of bit-taurant is that anyone can choose a piece off anyone else but if you are streaming maybe two or
28:20
three people will have the piece and then you will need it and then nobody needs it because it's in the past people just want new files so it's not ideal because you don't get the same kind of bandwidth sharing that you wouldn't in normal taurant but yeah it is possible you probably should talk about how this can be plugged into the server-like media pipeline
28:41
yeah so one minute one minute okay so yeah this is the fully managed library it's RC sharp so it's fully compatible it can be run inside the server-like sandbox so if you really want you can develop a little bit-taurant client which will publish the server-like can only access a limited set of ports
29:02
and in those limited set of ports there is an authentication I can't remember the words now but basically you have to respond with a certain bit of data before you are allowed to make a connection so you can't just connect to anything you have to connect to a proper service like a server-like service so you could write a quick application
29:22
using monotaurant to expose bit-taurant over the silverlight service and you can then just use this to stream video in your silverlight application so if you want video streaming in silverlight you could do this just as easily
29:44
Apple have banned bit-taurant from the app store it has banned bit-taurant but if if you implemented a downloading program that didn't explicitly say bit-taurant that would be ok but as soon as you use the word bit-taurant it will say no
30:00
because bit-taurant is illegal and people only use it to download illegal things it's completely different it only downloads Linux ISOs why is it called monotaurant? because long story it was originally called bit-sharp because no one was using
30:21
a name like that and I figured bit-taurant, c-sharp, bit-sharp, it's great but there was another library which had been inactive for two years and then the guy activated again and he made a commercial product out of his library and that was called bt-sharp so bit-sharp and bt-sharp it's too similar so it just was called monotaurant
30:41
what's your problem with monotaurant? thanks for listening