MirrorBrain - Free CDN for Free Software Projects
Formal Metadata
Number of Parts | 70
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/39532 (DOI)
FOSDEM 2009, 7 / 70
Transcript: English (auto-generated)
00:12
Hello, so I'm going to start with my talk about mirrors, about content delivery networks
00:22
and about delivering free software. My name is Peter and if you haven't heard of CDN before, CDN is short for content delivery network: networks that are specialised for pushing
00:42
stuff to users who want to get it. I don't want to bore you with technical details about all this today. I rather want to share my vision. I have a vision for the future,
01:04
how to improve on these things and this I want to share with you and I hope you find it interesting and maybe we can stimulate some things that we can do together later.
01:23
So what is it about? If you look at the world of free software projects, then there are many many projects that have stuff that they do for users and give to them and there are larger ones and smaller ones and all these share the same challenge. Some of these
01:52
projects provide content that is really huge like CD images or larger software packages like OpenOffice while others of these content providers provide smaller bits that are less
02:07
problematic but may also be highly popular and mirrored around the world like the Linux kernel for example and solutions to get this stuff to users exist in three forms.
02:27
The content delivery network could be either a commercial one, it could be that. The commercial content delivery networks are really specialised in that like Akamai or YouTube. They would
02:45
be very nice to use for open source software projects but they are really expensive. They do what they do well and you have to pay for it a lot. The second type of content
03:00
delivery networks is things that the academic world came up with during the last 10-20 years. There are some approaches that are highly interesting but they never reached the production state. Often they are too complex to really be implemented and often they have
03:28
for example features like bandwidth and latency measurements between users and you might need nodes around the world for that to do these measurements and client feedback
03:41
has to be provided to some servers and this is hard to implement because then it doesn't work with a normal web browser for example and you need infrastructure, real machines that are placed somewhere that do this work. So the academic approaches are sometimes
04:04
nice to read about but only a few of them are really in use, only one actually. And the third approach which is very popular and also very traditional is to use mirrors.
04:22
Mirrors are server machines contributed or provided mostly by universities, by ISPs or also by private persons, so you download stuff from a mirror. There are about let's
04:42
say 300 mirrors around the world which provide this service and many of them you know well because you come back to them again and again. So how do we deal with these
05:02
mirrors? How is this organized? Let's go to some examples. I have put some logos here of some approaches for this task. These are the commercial guys. Akamai is used
05:25
by Apple, Microsoft, Novell to provide their software updates. Limelight is actually providing YouTube for example. Then you all know SourceForge. SourceForge has a few mirrors but very large
05:40
ones. They have a web front-end approach for users to get to this stuff and another approach is the mirror manager of Fedora which has again a slightly different approach.
06:03
It is not working on file level, it's working on directory level. So it doesn't know exactly if a particular file is on a mirror but it has some kind of state database which
06:21
knows roughly what mirrors do have. There is Bouncer. Bouncer is used by Mozilla and by OpenOffice. There are actually two versions of Bouncer and one of them is able to distribute client requests on a geographic basis to mirrors in
06:46
that region, which Fedora also does. I didn't mention that. So that's probably something that you always want to do because connections to closer mirrors typically
07:03
work better. There is another Bouncer version that does not support this which is used by OpenOffice. There is the Debian style approach which basically just does a schematic
07:22
assignment of mirrors on a country and DNS round-robin basis. There is a Mandriva approach which Per knows much better than me. We have a microphone, probably better.
07:49
Firstly, I implemented using metalinks which we still do, but we had it on server side where it generated metalinks based on the coordinates in latitude and longitude and
08:04
calculated the distance to the nearest mirrors which would be done on server side but it would require a lot more. Then we switched to doing it based on the user's
08:21
time zone and coordinates there on the user side. So now it just generates metalinks locally and automatically picks mirrors which it fetches from the Mandriva mirror list which is updated every now and then. That's about it. That's a very advanced, a very nice system actually, especially the new metalink generation.
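The distance calculation described here can be sketched as follows. This is only an illustration, assuming a great-circle (haversine) distance on a spherical Earth; the mirror hostnames and coordinates are made up, and this is not the code Mandriva used.

```python
import math

# Hypothetical mirror list: (hostname, latitude, longitude) in degrees.
MIRRORS = [
    ("mirror-de.example.org", 52.52, 13.40),    # Berlin
    ("mirror-us.example.org", 37.77, -122.42),  # San Francisco
    ("mirror-au.example.org", -33.87, 151.21),  # Sydney
]


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km, assuming a sphere with Earth's mean radius."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def mirrors_by_distance(client_lat, client_lon, mirrors=MIRRORS):
    """Return the mirrors sorted from geographically nearest to farthest."""
    return sorted(
        mirrors,
        key=lambda m: great_circle_km(client_lat, client_lon, m[1], m[2]),
    )
```

Geographic distance is only a first guess. As the talk argues later, network topology often differs a lot from the map.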
08:49
Let's switch this off. The geographic coordinates, do you get them from GeoIP?
09:03
Okay, so the client provides info to the server and the server decides where to send it. It does provide the time zone to the server. So the client provides some info
09:43
to the server which allows it to select a mirror. This also means or implies that this approach requires a specialized client so you couldn't use it with wget or a normal web browser.
10:06
I mean, web browsers send something like a language header so if you go to the site then it can decide on that but sometimes that's wrong. Anyway, let's not talk too much
10:29
about metalinks and I will focus for a moment on the comparison of these approaches again.
10:41
MirrorBrain is what SUSE came up with two years ago and what has been developed further since then. It is an approach that does not require a specialized client but it can work with specialized clients to do a more advanced mirror selection.
11:01
And as I said I don't want to bore you with technical details but there are two other approaches that are a bit similar; these are those academic guys. Coral CDN is actually highly interesting and it's working to some extent but it has the disadvantage that it requires you to use different URLs.
11:29
You have to prefix some other hostname to get something from the network. So again it's not transparent to the plain HTTP and FTP protocols that many clients use.
11:43
CoDeeN is the only candidate that might maybe reach more popularity in the future. It is in some production use. I actually know a few mirrors that take part in it and use it for some specialized things, like a US-American mirror
12:05
delivering stuff to Singapore I think. He has been using that and I find it works well. So all these are a little bit different from each other and may require a client that is specialized or not.
12:21
That's one of the differences. If you look at this picture, this is what I see when I look at the mirror framework landscape.
12:40
There are lots of different frameworks and they are separate and apart from those I just showed you, the few, there are many, many others like Apache Software Foundation has quite a simple redirector where you can choose mirrors manually.
13:02
Many other software content providers do have some very small solution. So the need is there and everyone tries to solve it in a simple way. And in a few minutes I will talk a bit more about why it isn't so simple to do it in a simple way.
13:30
It's quite a challenge actually to assign clients to a good mirror and also to provide the client a way to fall back to another mirror and so on.
13:45
These things are called cairns. I don't know if you know the word. If you go hiking in English-speaking countries then you often see them. They are actually useful. They can show the way.
14:02
You can mark a path and see where you go. But this is not leading anywhere. So what we rather should have is something like this.
14:21
This is the roof of a church and it's a building that has been built by collaboration and cooperation. So how do we get from here to there?
14:50
I think we really should introduce more collaboration on these things. What you often see happening is that communities are separate like these.
15:04
Like the boys and the girls, they don't talk to each other. They are afraid of each other and you can also see this in open source communities. You care about your vicinity but you don't really know what the others do.
15:20
There is a lot to learn from each other. One side has something that the other doesn't have and vice versa. I already mentioned that dealing with mirrors is not as easy as it might look at first. I will describe some reasons why this is not easy.
15:49
I'm going to show you on a little example. The example is picking mushrooms and then deciding which ones you want to eat.
16:02
Because if you ever did that then you know, you can look into books, internet. Each mushroom might be growing together with others that are not the same type. You have to look at each one carefully and so on. It's quite a complicated business. Let's try to explain how the selection of mirrors can be done on this background.
16:29
The first question would be, does the mirror have the file in question? We need to scan the mirrors, which can be done from a central location. The mirrors don't have to have some software on them because they already provide the content
16:45
via HTTP, FTP, rsync, so we can look at them. Another question would be, is the mirror close? The mirrors in question, are they close to the client so they could provide good service?
17:02
Can the mirror be trusted? No, mirrors can never be trusted because they could always be hacked or broken. Or there could be a broken firewall in between. It could deliver garbage or actually manipulated stuff.
17:21
It is very useful if you can sign your content and actually provide the signatures or verification hashes together with the content. Or for some files it may make sense to just do it yourself and just send the file yourself. If it's not a large file, then that's fine.
17:41
For example, all the signature files in your file tree you can just deliver yourself. They may not even be larger than an HTTP redirect, which is also 1500 bytes. There may be private mirrors: mirrors marked as private that are only meant to be used by a limited group of network clients.
18:11
Mirrors can have very different performances. There are big ones, better ones. And you have to prioritize on them and try to achieve a load balance between them.
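One simple way to spread load over mirrors of different capacity is a weighted random choice. The sketch below assumes each mirror has a numeric priority that is proportional to the share of requests it should receive; the names and numbers are hypothetical.

```python
import random


def pick_weighted(mirrors, rng=random):
    """Pick one mirror name at random, proportionally to its priority.

    mirrors: list of (name, priority) tuples with priority > 0.
    rng: the random module or a random.Random instance (for reproducibility).
    """
    names = [name for name, _ in mirrors]
    weights = [priority for _, priority in mirrors]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical example: the big mirror should get about three times the traffic.
EXAMPLE = [("big", 300), ("small", 100)]
```

With these priorities, about three out of four requests go to the big mirror over time, while the small one still gets used.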
18:28
For the larger files, you may actually want to verify if the server is actually able to deliver that correctly.
18:41
About 20% of mirrors can't do that. Either on FTP or HTTP they are broken in this regard. Many mirrors are useful, but if they have to provide DVD images and extremely large content
19:00
then they just go down on their knees. So it may be useful to exclude them from the delivery of big files. And then you have to monitor the mirrors to see if they are actually available, because mirrors have to be rebooted, they die for various reasons, and you have to monitor them quite closely and no longer send clients there if there are problems.
19:28
Clients may actually send along with their request some preference and you might want to respect that because sometimes the client just knows better what's good for him and what works for him.
19:44
So it would be good to have provision for that. And finally, you can't just go ahead and choose one mirror because this mirror might just not be available or it might not have the file, so you more or less have a sorted list of best, better, not so good mirrors
20:06
and you can give fallbacks to the clients. So these were some things that make mirror selection a not so easy task
20:26
and it's not something that you implement in a day. And actually most of those problems you are not even aware of them if you start. I certainly wasn't when I started.
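The questions listed above (does the mirror have the file, is it online, is it private, can it handle large files, is it close, how should it be prioritized) can be put together in a small ranking function. This is a sketch under my own assumptions, not MirrorBrain's implementation; the `Mirror` fields, the three-tier closeness rule and the example data are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mirror:
    name: str
    country: str                          # e.g. "DE"
    region: str                           # e.g. "EU"
    online: bool = True                   # result of the last monitoring probe
    allowed_networks: tuple = ()          # CIDR strings; empty means public
    max_file_size: Optional[int] = None   # None means no known size limit
    files: set = field(default_factory=set)  # result of the last scan
    priority: int = 100


def rank_mirrors(mirrors, path, size, client_ip, client_country, client_region):
    """Return usable mirrors, best first, so the tail can serve as fallbacks."""
    ip = ipaddress.ip_address(client_ip)
    usable = []
    for m in mirrors:
        if not m.online or path not in m.files:
            continue  # down, or does not have the file
        if m.max_file_size is not None and size > m.max_file_size:
            continue  # known to fail on files this large
        if m.allowed_networks and not any(
            ip in ipaddress.ip_network(net) for net in m.allowed_networks
        ):
            continue  # private mirror, client is outside its networks
        usable.append(m)

    def closeness(m):
        if m.country == client_country:
            return 0
        if m.region == client_region:
            return 1
        return 2

    # Same country first, then same region, then the rest; higher priority wins.
    return sorted(usable, key=lambda m: (closeness(m), -m.priority))


EXAMPLE = [
    Mirror("de1", "DE", "EU", files={"a.iso"}),
    Mirror("de2", "DE", "EU", files={"a.iso"}, priority=200),
    Mirror("fr", "FR", "EU", files={"a.iso"}),
    Mirror("us", "US", "NA", files={"a.iso"}),
    Mirror("down", "DE", "EU", online=False, files={"a.iso"}),
    Mirror("small", "DE", "EU", max_file_size=10_000_000, files={"a.iso"}),
    Mirror("priv", "DE", "EU", allowed_networks=("10.0.0.0/8",), files={"a.iso"}),
    Mirror("nofile", "DE", "EU"),
]
```

The server can redirect a plain HTTP client to the first entry and hand the whole list to a client that understands fallbacks.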
20:41
You rather learn about these problems during your deployment and development and you start to collect experience in the different use cases. So this all, I believe, is solved in the MirrorBrain infrastructure
21:05
and I also believe that it's solved in a way that would be very useful for other content providers to use. And I lost track.
21:25
Okay. After talking about the server side for so long,
21:41
it might be useful to talk about the client side, the other end, for a few slides. You all know classic HTTP and FTP clients and web browsers but you may also have heard of Metalink clients. Metalink clients are specialized download clients that
22:05
combine FTP, HTTP and also BitTorrent into a powerful download client that can work intelligently and fail over, and if it encounters errors and problems, connection problems
22:26
or broken content, it can verify this and it can actually continue downloading from elsewhere. These clients also can download in parallel so they can try to max out on your internet connection and get content faster.
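The failover behaviour described here can be sketched in a few lines. This is a simplified illustration, not how any particular Metalink client works: it tries mirrors one after another, checks a SHA-256 hash, and moves on when a mirror is unreachable or delivers broken content. The `fetch` callable, the URLs and `fake_fetch` are hypothetical stand-ins so the example runs without a network.

```python
import hashlib


def download_with_fallback(urls, expected_sha256, fetch):
    """Try each URL in order; return (url, data) for the first verified copy.

    fetch: callable taking a URL and returning bytes, raising OSError on
    connection problems.
    """
    for url in urls:
        try:
            data = fetch(url)
        except OSError:
            continue  # mirror unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return url, data
        # hash mismatch: broken or manipulated content, try the next mirror
    raise RuntimeError("no mirror delivered a verifiable copy")


# Stand-in for real network access, for demonstration only.
GOOD = b"release tarball bytes"


def fake_fetch(url):
    if "down" in url:
        raise ConnectionError("mirror unreachable")
    if "corrupt" in url:
        return b"garbage"
    return GOOD


URLS = [
    "http://down.example.org/file",
    "http://corrupt.example.org/file",
    "http://good.example.org/file",
]
```

In real use, `fetch` could wrap something like `urllib.request.urlopen(url).read()`, whose connection errors are also subclasses of `OSError`.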
22:43
And these clients are, let's call them intelligent, and the Metalink client needs information to do this job and this information is provided to it by what's called a Metalink, and a so-called Metalink is just a mirror list.
23:07
A mirror list formatted in XML so it's machine readable, and it can also include hashes and signatures for the files
23:21
so the client has all that it needs to successfully download the file from somewhere. So what really happens is a knowledge transfer from the server who knows the mirrors to the client who wants to use them.
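To make the idea of a machine-readable mirror list concrete, here is a sketch that builds a small Metalink-style XML document. The element and attribute names follow my understanding of the Metalink 3.0 layout and should be checked against the specification; a real Metalink document also declares an XML namespace and carries more metadata, which is left out here. The file name, hash and URLs are made up.

```python
import xml.etree.ElementTree as ET


def build_metalink_like(filename, sha1_hex, mirrors):
    """Build a simplified, Metalink-style mirror list as an XML string.

    mirrors: list of (url, country_code, preference) tuples, where a higher
    preference means the client should try that mirror earlier.
    """
    root = ET.Element("metalink")
    files = ET.SubElement(root, "files")
    file_el = ET.SubElement(files, "file", {"name": filename})

    # Hash so the client can verify what it downloaded.
    verification = ET.SubElement(file_el, "verification")
    hash_el = ET.SubElement(verification, "hash", {"type": "sha1"})
    hash_el.text = sha1_hex

    # One <url> element per mirror.
    resources = ET.SubElement(file_el, "resources")
    for url, country, preference in mirrors:
        scheme = url.split(":", 1)[0]  # "http" or "ftp"
        url_el = ET.SubElement(
            resources,
            "url",
            {"type": scheme, "location": country, "preference": str(preference)},
        )
        url_el.text = url

    return ET.tostring(root, encoding="unicode")


EXAMPLE_XML = build_metalink_like(
    "example-1.0.iso",
    "da39a3ee5e6b4b0d3255bfef95601890afd80709",  # placeholder hash value
    [
        ("http://mirror-de.example.org/example-1.0.iso", "de", 100),
        ("ftp://mirror-us.example.org/example-1.0.iso", "us", 50),
    ],
)
```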
23:43
And this works pretty well and I have a nice quote on that from Anthony, the guy who invented Metalink, which I was delighted to read
24:01
and actually this combination is really a powerful combination because this is what really makes things work. You can have the best server and mirror database and mirror scanning and mirror selection and everything in the world but as long as the client is just a stupid FTP client it will just follow the redirect that you suggest to it
24:26
and then it will either work or it will not work. And whenever you want to have something like try again or try another mirror then you have to have something on top of HTTP and FTP
24:42
and this is what Metalink does. So back to the larger picture, if you look at the world map then there are quite some countries and regions that are far apart
25:04
and there are also different parts of the world with a lot of internet connection and less internet connection and looking at this map I can give you some more reasons to believe me
25:20
that it's not easy to select mirrors because you want to assign a mirror that is close to the client but often this doesn't work by just measuring the distance because the network topology looks extremely different from this and I will show you a few examples.
25:43
First example is New Zealand. New Zealand is quite a simple case. It is located at the edge, right there at the end, and it's simple to see that they have a proper connection over there
26:05
but to the rest of the world it's much worse and they also have some connectivity to the west coast of the US. I have heard this from someone but I actually don't know.
26:21
This is one of the problems. So while this New Zealand case is pretty obvious there are also still some things that you need to find out. So anyway it's a good rule of thumb to just send clients from New Zealand to an Australian mirror. But if there is no Australian mirror then you already have to decide which one is next.
26:45
And the chances are that these are not good because they don't have much interconnectivity. Often interconnectivity to the internet centers of the world is much better than from here to there
27:04
because especially in Africa often people are connected with satellite links where they want to go and not to their neighbors. Another interesting case is Russia.
27:20
Russia is an extremely large country and from a lot of feedback I got I learned a few things. I learned that China to Russia doesn't work. This continent called Asia would be the normal unit of geographical thinking
27:49
when you use a certain library, GeoIP, to locate a client and look up where it is.
28:03
And in this case Asia as a unit doesn't work well because there is not good connectivity between those large parts. And other special things about Russia are that for example Ukraine can't get to these mirrors that are here,
28:25
maybe for political reasons, I'm not sure. Russian users have very bad connectivity to other Asian mirrors which admittedly are quite far away. Russian mirrors have good connectivity to German mirrors
28:40
so you really have to have some special cases and handle certain countries specially, so never assign to there from Ukraine but always assign to Germany or something like that. It makes it rather interesting. Any simple scheme will not work for all the countries for very long.
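Special cases like these can be kept in a small override table that is consulted before the generic "same country first" rule. The sketch below is illustrative only: the table entries paraphrase the anecdotes from this talk (New Zealand to Australia, Ukraine to Germany, Russia to Russia and then Germany), and the data layout and function names are my own assumptions.

```python
# Hypothetical override table: client country -> mirror countries to prefer,
# in order. Countries without an entry prefer mirrors in their own country.
COUNTRY_OVERRIDES = {
    "NZ": ["NZ", "AU"],
    "UA": ["DE"],
    "RU": ["RU", "DE"],
}


def preferred_mirror_countries(client_country):
    """Ordered list of mirror countries to try first for this client country."""
    return COUNTRY_OVERRIDES.get(client_country, [client_country])


def order_by_country_preference(mirrors, client_country):
    """Sort (name, country) mirrors so that preferred countries come first.

    Mirrors outside the preference list keep their original relative order,
    because sorted() is stable.
    """
    prefs = preferred_mirror_countries(client_country)

    def rank(mirror):
        _, country = mirror
        return prefs.index(country) if country in prefs else len(prefs)

    return sorted(mirrors, key=rank)


EXAMPLE = [("ru1", "RU"), ("cn1", "CN"), ("de1", "DE"), ("au1", "AU")]
```

Keeping the exceptions in data rather than in code makes it easier to add new cases as feedback from mirror admins and users comes in.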
29:05
As soon as you start to learn about all these particular cases then it becomes quite complex. Another interesting case is South Africa where basically South Africa at the tip of Africa has like five mirrors there and the rest of Africa has none.
29:26
So it's really quite concentrated here. And the neighboring countries actually have decent connectivity to South Africa from what I've heard.
29:43
But Mozambique for example doesn't get good connectivity to South African mirrors. So Mozambique is better assigned to German mirrors again because the satellite, the international link, goes there. So this is another interesting exception or regional particularity.
30:10
If you think about internet connectivity then I think for most people in this room it more or less looks like this. Like a high speed motorway.
30:23
But often we are not aware that for other people it may look completely different. So there is a large part of the world, where more people live than in the well educated, rich
30:43
and well connected countries, that doesn't have connectivity as good as this. So a child in Africa may not have the opportunity to learn, to be educated.
31:04
But education and information are key for being healthy and living a healthy life and finding a job and so on. And people in less well connected areas also need to download stuff.
31:20
For simple reasons they need to download software like OpenOffice. OpenOffice plays quite a role in this because it's a free office productivity suite and it's really needed by people around the world and that's also why it's so popular. And people have big problems downloading OpenOffice
31:44
because it's about a 100 megabyte download or a bit more and that can be very hard to download. I have two quotes here from... I quoted them from memory but this is what some people said
32:02
who are affected by the situation. So I believe that mirror selection that helps them would be very worthwhile. If you look at the percentage of people in the world who have internet access
32:22
that's 22 percent. So four fifths of the world doesn't have internet access at all. And of the people who have internet connectivity at all I have some numbers here about the percentage of those who have a broadband connection
32:42
like DSL or something fast. So in Germany it's about 24 percent, Korea it's 8 percent, the Slovak Republic is 6 percent only which is amazing because it's in the middle of Europe and in the poorer countries it ranges between 1 and 3 percent.
33:02
So there's a lot of users who have only bad connectivity, no broadband. So we can help them a lot. So the question, back to the question,
33:21
how do we get from the unorganized, chaotic, separate solutions to the big solution? This is practically how I see the mirrors organized.
33:47
It's just that they are not well organized, not as clear as here, but it's a very loose organization. The thing is that the mirrors, those guys,
34:01
they are the same for Fedora, for Ubuntu, for OpenOffice, for SUSE and so on. You always meet the same guys again. So any of these machines mirrors a lot of projects. What the user sees, or what you think about, is this mirror or that mirror, but actually this machine mirrors several projects.
34:22
This one actually the same and this is very similar. It's just another file tree, different layouts, different sync times and different set of projects that are mirrored. But in principle you could see these elements
34:40
and try to think about how to get this into a structure. So instead of setting up MirrorBrain for every project, which would involve quite some overhead because each of them would have to know the mirrors and keep a database and so on,
35:06
you could also have one big database which knows about the mirrors, knows about the servers, the contact persons, about the content providers and keep this in line together.
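The idea of one shared database can be illustrated with a tiny relational schema in which a mirror is stored once and linked to every project it carries. This is a sketch of the concept only, using SQLite; it is not MirrorBrain's actual schema, and all table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mirror (
    id            INTEGER PRIMARY KEY,
    hostname      TEXT UNIQUE NOT NULL,
    country       TEXT,
    contact_email TEXT
);
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- One row per (mirror, project): where that project's tree lives on the mirror.
CREATE TABLE mirror_project (
    mirror_id  INTEGER NOT NULL REFERENCES mirror(id),
    project_id INTEGER NOT NULL REFERENCES project(id),
    base_path  TEXT NOT NULL,
    PRIMARY KEY (mirror_id, project_id)
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_mirror_for(db, hostname, country, project, base_path):
    """Record that a mirror carries a project, storing the mirror only once."""
    db.execute(
        "INSERT OR IGNORE INTO mirror (hostname, country) VALUES (?, ?)",
        (hostname, country),
    )
    db.execute("INSERT OR IGNORE INTO project (name) VALUES (?)", (project,))
    db.execute(
        """INSERT INTO mirror_project (mirror_id, project_id, base_path)
           SELECT m.id, p.id, ? FROM mirror m, project p
           WHERE m.hostname = ? AND p.name = ?""",
        (base_path, hostname, project),
    )
```

With this layout, contact details and availability of a machine are maintained in one place, while each project keeps its own file tree location per mirror.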
35:21
And this is actually not just a dream, it's actually not far from being implemented. I'm working on a next-generation database schema that makes provision for that. So this becomes possible. It's not a big step, it's just making things easier
35:41
so you don't have to store a mirror many times. So this nearly exists and a common database could actually lead to another thing, a common file tree because those mirrors could actually have the same file layout
36:02
which would make it even easier to find your way around on them and to have a database that reflects their file tree. But at this point it already might become difficult again because nobody knows if this is ever possible to implement
36:23
because it would involve changes on every mirror and I can say from experience that of these 200 mirrors that I work with for SUSE, for maybe 50% I have contact persons and for
36:45
the other half I don't even have a contact person, I can't find out about one, I don't reach anyone even if I try by phone, and it's very hard to get hold of people that are so far away and they don't publish the email address or it's completely outdated.
37:03
And it would be an illusion to say, okay, let's get all these together and change everything because it would never happen. And these mirrors are also very different, very different operating systems.
37:24
Syncing of mirrors is also an interesting topic for collaboration and for improvement because everybody has some scripts that are better or worse and it would be very interesting to have something working which we can share.
37:44
So altogether this could form some free content delivery network that would actually deserve the name of a content delivery network and wouldn't just be what we have now which is some isolated solutions and very different mirrors.
38:08
And as you might understand now, the business of selecting mirrors is not so easy and most other solutions won't get that right
38:23
because it's a lot of work. So here's my call for collaboration. I want mirror owners and content providers and users and researchers
38:42
to join a mailing list and to talk about these things because it's very important to talk about this and it's always enlightening if you talk to... I talked to the Fedora guy, to Per from Mandriva and so on
39:02
and it's always enlightening because everybody, like these specialized guys also have lots of good thoughts about this and mirror admins from around the world have their picture about their region and what happens there and what's the political situation,
39:20
why is there no connectivity between North and South Korea and so on. And we need to get this knowledge together and the interesting thing is that a mailing list like this doesn't even exist. There is no common forum for these things. I think there have been news groups like 20 years ago
39:43
where these people were gathered but there's no place to meet them unless you join the Fedora mailing list or the SUSE mirror mailing list or the OpenOffice mirror mailing list and then you always meet the same community but there's no shared forum where also the content providers are together.
40:06
But there's a lot of potential to get them together. So this basically is what I wanted to share with you.
40:26
Thank you for listening and I hope you have some input.