Software distribution: new points of failure
Formal Metadata
Number of Parts | 490
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/47443 (DOI)
Transcript: English (auto-generated)
00:05
Welcome to the FOSDEM 2020 Distributions devroom. Our next talk is "Software distribution: new points of failure in a censored world", from Alexander Patrakov. Thank you. Let me start by talking a bit about myself. I am a freelancer. I work from home.
00:25
Previously, I worked for a certain company as a software architect, and I am giving this talk as a software architect. So the audience of this talk is people responsible for any kind of code ecosystem. It is no longer a secret that programming language modules, operating system packages, and all
00:46
other sorts of code are now distributed over the Internet, not on CD-ROM. And people create new ecosystems of this kind, not every day, but close to it.
01:02
And the purpose of this talk is to give some guidance on avoiding mistakes that would result in such an ecosystem eventually being usable only in the US and Europe. We also need to clear up some legal stuff. The technical opinions expressed in this talk are my own. The political opinions, maybe, maybe not.
01:24
And I don't represent any of the projects mentioned in this presentation. So let's start with a good interview question for a new developer. What happens if you try to clone a Git repository, install an operating system package, or install something from NPM? What happens at the network level?
01:50
The next question: what could go wrong? And finally, what actually went wrong in recorded history? Let me first give the answer to the network part.
02:04
So first, the client sends a DNS request to the ISP's DNS server. The DNS server does the name resolution magic by sending more packets to the authoritative name servers. Then, once the client gets the reply, it initiates the TCP connection, then higher-level
02:25
protocols go on the wire, such as TLS or HTTP, and finally the package is installed. You see lots of moving parts, so obviously lots of places where things could go wrong. But let's first discuss why this is important. I'll bring an example from China.
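The network steps listed above (DNS lookup, TCP connection, TLS handshake, HTTP request) can be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not what any particular package manager does; the function names `fetch_head` and `build_request` are made up for this sketch.

```python
import socket
import ssl


def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 HEAD request as bytes."""
    return (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def fetch_head(host, path="/", port=443, timeout=10.0):
    """Walk through the four steps and return the HTTP status line."""
    # Step 1: DNS resolution through the system resolver.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    family, socktype, proto, _canon, sockaddr = infos[0]

    # Step 2: TCP connection to the first address that was returned.
    with socket.socket(family, socktype, proto) as raw:
        raw.settimeout(timeout)
        raw.connect(sockaddr)

        # Step 3: TLS handshake, including certificate verification.
        context = ssl.create_default_context()
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # Step 4: the HTTP request travels inside the TLS channel.
            tls.sendall(build_request(host, path))
            return tls.recv(4096).split(b"\r\n", 1)[0].decode("latin-1")
```

Each numbered step is a separate place where the download can fail, and this sketch gives up at the first failure because it only ever tries `infos[0]`.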
02:44
Five years ago, downloading Xcode, which is an Apple tool, was too slow there, or in some places even completely broken. And this forced developers in China to get Xcode from unofficial sources.
03:02
And one of those unofficial sources replaced the original Xcode package with a modified one that injected malware into software built with that modified version of Xcode. So that's how a simple availability problem evolved into a bigger security issue.
03:25
Well, as I said, there are many moving parts, and there are many failure modes in the network: broken cables, overloaded networks, misconfigurations. The list is, of course, incomplete.
03:41
But as we all know, the Internet usually works, because resilience and redundancy are built into its infrastructure. And even more importantly, there are humans responsible for fixing whatever is broken. But now there is a new kind of network failure.
04:01
Governments do not want their citizens to be able to see certain information. So they pass laws that say this kind of information should not be accessible to citizens, and that access to websites containing that information, for example information about drug abuse, should be blocked.
04:31
So they create centralized lists of sites to be blocked. They distribute these lists to the Internet service providers. The Internet service providers block those sites.
04:44
The problem is that governments want to restrict such information at all costs. In Russia, this has been happening since 2012. Let's see which sites you will not be able to access or, if you travel into the past, would not have been able to access.
05:04
So you see, it's not only sites that contain information on drug abuse; it's also sites that distribute software. There are blogs, there are standards documents, there are bug trackers.
05:24
There are no laws that prohibit citizens from seeing such information. Nevertheless, such sites are blocked. Well, on some ISPs, some of the sites are actually accessible. That's because different ISPs use different blocking technologies.
05:44
So the sites on the previous slide are not the targets of the censorship. They are victims of what I will call technical overblocking. Let me explain this phenomenon. Why is this blocked? Because it is not technically possible to pass this through without also passing that through.
06:04
And the government explicitly tells ISPs to block that: if you don't block that, you will get your ISP license revoked. So this, unfortunately, is also blocked. The problem arises if this is part of your infrastructure.
06:22
So how does this happen? ISPs typically block by IP address because, in our age when everything is encrypted, when with TLS 1.3 there is even encrypted SNI, they do not actually have much choice.
06:41
So, shared IP addresses: how do they come about? Mass hosting for static files. Some cheap content delivery networks. DDoS protection services. There are many more examples where a shared IP address is given to a customer. Finally, there is the Telegram war, but, well, that's a subject for another talk, so I will not go into it.
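The overblocking effect of IP-based blocking on shared addresses can be illustrated with a small simulation. The host names and addresses below are hypothetical (203.0.113.0/24 is a range reserved for documentation), and `collateral_damage` is a name invented for this sketch.

```python
from collections import defaultdict


def collateral_damage(hosting, blocked_sites):
    """Return sites that become unreachable only because they share an IP.

    hosting: mapping of site name -> IP address it is served from.
    blocked_sites: the sites the censor actually wants to block.
    """
    sites_by_ip = defaultdict(set)
    for site, ip in hosting.items():
        sites_by_ip[ip].add(site)

    # The ISP blocks by address, so the whole address becomes unreachable.
    blocked_ips = {hosting[site] for site in blocked_sites if site in hosting}

    victims = set()
    for ip in blocked_ips:
        victims |= sites_by_ip[ip] - set(blocked_sites)
    return victims


# Hypothetical hosting table.
hosting = {
    "banned.example": "203.0.113.10",
    "packages.example": "203.0.113.10",  # same shared CDN address
    "blog.example": "203.0.113.20",
}
print(sorted(collateral_damage(hosting, {"banned.example"})))
```

Here the package site is never named on any blocklist, yet it disappears together with its neighbour on the shared address.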
07:06
This is not specific to Russia and China. I can bring examples from Iran, from Egypt, and because of how politicians work, this can only get worse. So how do we deal with this breakage?
07:21
Often the advice given is to use a VPN, Tor, or some other circumvention technology. However, I would not say that it is a politically acceptable answer, because there are people who simply cannot be convinced to use any sort of circumvention technology,
07:41
maybe because of propaganda that only bad guys use such tools, maybe because it is actually illegal in some places. I would also say that it is not a technically good answer. If your servers are blocked, then you have a point of failure in your infrastructure,
08:02
and in some cases it is actually easy to fix. So for technical domains, mirrors help. Mirrors are used by many Linux distributions. They were not designed for dealing with censorship.
08:21
They were created to distribute the load, to move the load away from the central server, and to make sure that users download packages from a mirror which is near them, which is usually faster. So they provide the needed redundancy. And what does it look like?
08:44
So in the installer of, for example, Debian, there are screens where you can choose the country where your mirror resides, then you are presented with a list of mirrors in that country. There is also an option to enter the address of your own mirror, which can be unofficial.
09:04
In Fedora, they went even further. They do auto-detection of the fastest mirror by default, which creates a really great user experience. So what could go wrong in this setup with mirrors? So remember the slides where I listed the moving parts?
09:22
They are still there. Anything can still go wrong with any of those parts, but it only affects the selected mirror. And this mirror is not the target of the censor. Actually, there is one official Debian mirror blocked in Russia right now: the Spanish mirror.
09:47
So, why that one? I don't know; I can look it up. Still, it is not a problem, because there are more than 300 other mirrors. Debian is still installable in Russia, so there is no single point of failure in the whole ecosystem.
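Why one blocked mirror out of many does not break installation can be sketched as a selection loop that skips unreachable mirrors and keeps the fastest one that answers. This is an illustration only, not how the Debian or Fedora installers are implemented; `pick_mirror` and the `probe` callback are names assumed for this sketch.

```python
import time


def pick_mirror(mirrors, probe, clock=time.monotonic):
    """Return the fastest mirror that answers, or None if all fail.

    probe(url) is a caller-supplied function that raises OSError when
    the mirror is unreachable (blocked, down, or overloaded).
    """
    best_url, best_time = None, None
    for url in mirrors:
        start = clock()
        try:
            probe(url)
        except OSError:
            continue  # a blocked mirror costs one candidate, not the install
        elapsed = clock() - start
        if best_time is None or elapsed < best_time:
            best_url, best_time = url, elapsed
    return best_url
```

With hundreds of candidates in the list, the loop returns None only if every single mirror is unreachable.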
10:07
That's good. That's, I would say, a perfect situation. But recently, another solution to the original task of making sure that the load is spread among multiple servers
10:26
and that the user downloads from a nearby server became popular: content delivery networks. A CDN is a network of mirrors run by someone else. So I will describe how this differs from the classical setup with mirrors.
10:41
I will use the NPM public registry as an example. So let me first describe the apparent CDN benefits. There is a single domain name behind the whole mirror network. So there is no need for the user to select the mirror manually, which is a great boost in usability. Also there is no need to design the security of your system with untrusted mirror operators in mind
11:06
because all the mirror servers are operated by a single legal entity. They can even share the same SSL certificate, which is also great from the operational viewpoint. So let's see how it works.
11:21
So if I try to install an NPM package, the NPM client resolves registry.npmjs.org, which is the default registry. Then it downloads the package metadata over HTTPS. Then it downloads the package and installs it. Done.
11:40
Let's see what it looks like on the network. So registry.npmjs.org had, last time I checked, 12 A records, which are for IPv4 addresses, and 12 AAAA records, which are for IPv6. The IP addresses belong to Cloudflare, which is a major CDN provider.
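One way to look at the addresses behind a host name from Python is `socket.getaddrinfo`, which returns IPv4 results (from A records) and IPv6 results (from AAAA records) together. The helper names `split_by_family` and `resolve` are assumptions of this sketch, and the number of addresses returned for any real host may differ from what is described here.

```python
import socket


def split_by_family(addrinfo):
    """Group getaddrinfo() results into IPv4 and IPv6 address lists."""
    v4, v6 = [], []
    for family, _type, _proto, _canon, sockaddr in addrinfo:
        address = sockaddr[0]
        if family == socket.AF_INET and address not in v4:
            v4.append(address)
        elif family == socket.AF_INET6 and address not in v6:
            v6.append(address)
    return v4, v6


def resolve(host, port=443):
    """Resolve a host name through the system resolver."""
    return split_by_family(
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    )
```

Calling `resolve("registry.npmjs.org")` returns two lists; every extra entry in them is one more address a client can fall back to.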
12:03
Cloudflare uses anycast, so each of those 12 IP addresses is actually hosted on multiple, geographically distributed servers. And normal Internet routing, that is, the BGP mechanism, ensures that the user really gets to the nearest server
12:24
and downloads the package from there. So how does this survive censorship? NPM is not blocked in Russia, so I had to simulate it by misconfiguring my router to return TCP reset packets for half of those addresses.
12:47
End result: it was possible to install packages. It was slow because of the retries, because of the delay between the retries, but nothing broke. That's great, especially for a system that was not designed with the use case of circumventing censorship in mind.
13:08
So why is it slow? Because, as I said, it was designed for a different use case: an overloaded server or an overloaded network, where adding a delay between retries does help.
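The behaviour described above, trying one address after another with a pause in between, can be sketched as follows. This is an illustration of the general technique and makes no claim about how the NPM client implements it; `connect_with_failover` and its parameters are names assumed here.

```python
import time


def connect_with_failover(addresses, connect, delay=1.0, sleep=time.sleep):
    """Try each address in turn, pausing between attempts.

    connect(address) is supplied by the caller and raises OSError on
    failure (for example ConnectionResetError after a TCP reset).
    Returns (address, connection) for the first address that works.
    """
    last_error = None
    for index, address in enumerate(addresses):
        if index > 0:
            # The pause helps with overloaded servers, but it is also
            # why installs get slow when many addresses are blocked.
            sleep(delay)
        try:
            return address, connect(address)
        except OSError as error:
            last_error = error
    raise ConnectionError("all addresses failed") from last_error
```

A client aware of blocking could skip the pause after an immediate reset, since waiting does not make a blocked address reachable; that trade-off is a design choice, not something the sketch decides.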
13:22
Also, it helps that I blocked the servers with a simple TCP reset. Not all censorware does that. There are also cases when it helpfully tries to present a page which says this site is blocked, and presents it using an invalid SSL certificate.
13:44
So if I try to do that with an invalid SSL certificate, then of course NPM will fail to download and install packages. This is, in theory, fixable by changing NPM code. I'm not asking the NPM maintainers to do that, because it is, well, a different use case.
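As a hypothetical illustration of what handling the invalid-certificate case could mean, a client might classify a certificate failure on one address as "this address is unusable" and move on to the next, while still never accepting the bad certificate. The function below is an assumption of this sketch, not existing NPM behaviour.

```python
import ssl


def should_try_next_address(error):
    """Decide whether a failed attempt should fall through to the next address."""
    if isinstance(error, ssl.SSLCertVerificationError):
        # A block page served with an invalid certificate: reject this
        # address, but another address of the same service may be fine.
        return True
    if isinstance(error, (ConnectionError, TimeoutError)):
        # Resets, refusals and timeouts are the classic failover cases.
        return True
    return False
```

Certificate verification stays enabled throughout; the only change is that a verification failure ends the attempt for one address instead of ending the whole download.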
14:02
Still, this example demonstrates that client-side failover, as implemented in NPM, does a great job of circumventing censorship. But let's also highlight one more important difference between a traditional mirror setup and a CDN.
14:26
Let's go to China. Actually, an inaccessible registry is a common problem in China. If you go to ping.pe, you can ping the registry.npmjs.org server from many places, including China.
14:48
And you will see that in many cases there are many lost packets. TCP is not designed to deal with that, and so the download fails.
15:01
So how do Chinese users use NPM? The answer is that they don't use the official registry. There are alternative NPM registries in China. They claim to mirror the official one; the two registries are on the slide. However, they are not exact mirrors. In particular, they strip all the integrity-check JSON elements that are in the registry API.
15:35
So packages installed from there cannot be trusted. Still, Chinese users use them, so it's an incident waiting to happen.
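To show what is lost when the integrity elements are stripped, here is a minimal sketch of checking a downloaded tarball against an integrity string of the form `sha512-<base64 digest>`, the format used in the `dist.integrity` field of NPM package metadata. The function names are invented for this example, and the check only means something if the integrity value comes from a source you trust.

```python
import base64
import hashlib


def verify_integrity(data, integrity):
    """Check bytes against an integrity string such as 'sha512-<base64>'."""
    algorithm, _, expected_b64 = integrity.partition("-")
    if algorithm not in {"sha256", "sha384", "sha512"} or not expected_b64:
        return False
    digest = hashlib.new(algorithm, data).digest()
    return base64.b64encode(digest).decode("ascii") == expected_b64


def check_package(metadata, tarball):
    """Refuse a tarball whose metadata carries no usable integrity value."""
    integrity = metadata.get("dist", {}).get("integrity")
    if not integrity:
        # A registry that strips the field leaves nothing to verify against.
        raise ValueError("no integrity value in package metadata")
    if not verify_integrity(tarball, integrity):
        raise ValueError("tarball does not match integrity value")
    return True
```

With the field removed, `check_package` can only refuse; a client that silently skips the check instead will install whatever bytes the mirror sends.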
15:43
I hope that somebody from Taobao or from cnpmjs is listening to this talk over the Internet. Could you please fix it? Thank you. OK, so we have looked at NPM. There is another service that uses a content delivery network, and this is Flatpak.
16:06
I will use that to demonstrate that not all content delivery networks are equal, and you should really evaluate the setup. So, Flatpaks are usually downloaded from Flathub.
16:22
Flathub uses Fastly as a CDN, and Fastly operates its CDN using a CNAME. So dl.flathub.org is a CNAME for some shared DNS name in the fastly.net namespace.
16:44
And that long name resolves to one IPv4 and one IPv6 address. Those addresses are different for different clients; that's how they do the geographical spreading. So they are relying on DNS, not on any custom routing.
17:04
So for the original purpose of spreading the load, that's a valid solution. But for the case when some of the infrastructure can fall victim to a censor, there are simply too many single points of failure here.
17:22
So there is no possibility for clients to fail over. If the government by accident blocks dl.flathub.org, or that long name in the fastly.net namespace, or that single IPv4 address that is returned to my client, then I can no longer download packages from Flathub.
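The difference between the two CDN setups can be made concrete by listing, for each step a client goes through, how many alternatives it has. The values below are placeholders, not the real CNAME target or addresses (192.0.2.0/24 and 2001:db8::/32 are documentation ranges), and `single_points_of_failure` is a name made up for this sketch.

```python
def single_points_of_failure(chain):
    """List the links of a resolution chain that have no alternative.

    chain maps a description of each link to the list of values a
    client can choose from at that step.
    """
    return [link for link, options in chain.items() if len(set(options)) < 2]


# Placeholder data modelled on the two setups described in the talk.
flathub_like = {
    "download host name": ["dl.flathub.org"],
    "CDN host name (CNAME target)": ["shared-name.cdn.example"],
    "IPv4 address": ["192.0.2.7"],
    "IPv6 address": ["2001:db8::7"],
}
npm_like = {
    "registry host name": ["registry.npmjs.org"],
    "IPv4 address": [f"192.0.2.{n}" for n in range(1, 13)],
    "IPv6 address": [f"2001:db8::{n:x}" for n in range(1, 13)],
}

print(single_points_of_failure(flathub_like))
print(single_points_of_failure(npm_like))
```

For the first setup every link is reported; for the second only the host name is, because the client has twelve addresses per family to choose from.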
17:44
So, don't do that. It is too easy to block such a setup by accident. And this also applies to failures not caused by the government, so think about those too. So the takeaway from my talk would be:
18:03
if you want to implement countermeasures against accidental blocking in your software ecosystem, then please add proper redundancy. Please implement client-side failover, because it is only the client who sees the ultimate truth,
18:22
whether the server works or not. Then it would be great if you allowed unofficial mirrors in your ecosystem, because, well, that's what happens with NPM anyway: cnpmjs.org is an unofficial mirror, even though NPM does not want mirrors.
18:42
And because you have to allow unofficial mirrors, you have to design the security model with them in mind. So that's all from me. Are there any questions? Yes, please?
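The takeaways (redundancy, client-side failover, and a security model that tolerates unofficial mirrors) can be combined in one small sketch: try mirrors in order and accept only content whose hash matches a value obtained from a trusted source. The names `fetch_verified` and `fetch` are assumptions of this example, and SHA-256 stands in for whatever verification scheme an ecosystem actually uses.

```python
import hashlib


def fetch_verified(mirrors, expected_sha256, fetch):
    """Download from the first mirror whose content matches a trusted hash.

    fetch(url) is caller-supplied: it returns bytes or raises OSError.
    expected_sha256 must come from a trusted source (for example signed
    metadata), never from the mirror itself.
    """
    for url in mirrors:
        try:
            data = fetch(url)
        except OSError:
            continue  # blocked or broken mirror: try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return url, data
        # Wrong content: treat this mirror as untrusted and move on.
    raise RuntimeError("no mirror returned the expected content")
```

Because the content is verified independently of where it came from, the mirror list can include unofficial mirrors without extending trust to their operators.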
19:04
As a service provider, how can I check whether I am blocked elsewhere? There is no way to do that. You have to rely on reports from users. How can they talk to me?
19:21
They can still email you because, for example, the servers distributing packages and the email servers are usually not the same. Other questions? Has anybody checked this Chinese NPM? Is there any difference between the NPM packages served by the Chinese NPM and by the main NPM?
19:47
Has anybody checked that? I haven't checked that. Chinese users use it. So I think that it's a good idea to test that,
20:01
but because of the quite complex API, where each package has its own API endpoint, it would be quite a difficult task. Well, my own viewpoint: when I worked for a Chinese company, I told them explicitly not to do that, installed Tor on their server,
20:21
and told them to use torsocks npm install something. Other questions? No questions? So we finished five minutes early.