How Facebook uses Python to build (and operate) datacenters at scale
Formal Metadata

Title: How Facebook uses Python to build (and operate) datacenters at scale
Series: EuroPython 2017, talk 50 of 160
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifier: 10.5446/33774 (DOI)
Language: English
Transcript: English (auto-generated)
00:04
Thank you very much for your time. I'm going to turn it over to Nikolas Jepanov, who's going to tell us about how we at Facebook use Python to help us build data centers at scale. Some of you may remember me from previous years from
00:20
such venues as the River Bar. I'm a production engineer at Facebook. Production engineering as a role is kind of a mix between software and systems engineering. I'm not going to talk about this in a lot of detail now, but if you're interested, you can swing by our booth down at
00:43
the marketplace, and we can tell you more about it. So, this talk also has an alternative title, which I'm sure is a theme you've already heard a lot about at this conference, and it's automating the things that you
01:03
didn't think you could automate. So, Facebook is growing really fast. We have more and more users doing more and more things on the site. This means that some things we could get away with doing by hand as a company, we can't do anymore,
01:20
because we're not going to keep up with the requirements of our scale. Now, I'm not saying that this applies to every company out there, but an interesting thought experiment is this: if you consider that computers are getting cheaper, it's actually getting more and more expensive every year for you to
01:43
not use computers to do things, because humans are not getting cheaper, in theory. So, yeah, this may not apply to everybody here, but at Facebook, we really needed to stop and think about how we
02:01
can stop doing things that really are not going to scale, and this is one story of such a thing. So: fabric. This is how Facebook designs data center networking. This is the so-called spine-leaf topology.
02:22
I think it's a design that somebody came up with in the 1950s, so at the bleeding edge of technology, as always, and what this allows us to do is add compute capacity without having to mess with the network, among other things.
02:40
I'm not a networking expert, but we do have some networking people here at the conference, so if somebody is interested in this particular thing, you can, again, swing by our booth, and I'm sure some of the networking folks will try to help you. An interesting data point is that there are networking people at the Python conference, so,
03:02
yeah, that's kind of part of the whole theme. So, at Facebook we actually have many of these fabrics, because we have a number
03:21
of regions that have a number of buildings in them, so this adds up quickly at Facebook. So, a question for the audience, along with a slightly nausea-inducing animation courtesy of PowerPoint: how much fiber cabling connects all of these fabrics?
03:45
Like, maybe venture a guess? So, let me just say, for the astronomy nerds in the audience, this is not an actual representation of planetary motion, just to, you know, get it out there.
04:04
So, it turns out there's quite a lot of fiber cabling in our data centers. Some of my colleagues from the data center teams calculated that it's about a million kilometers of fiber cabling that we have to deal with,
04:23
so in this make-believe universe, that's enough to go to the moon and back, and then some. Yeah, so, we can also think about this in terms of fiber ports
04:41
that we have to deal with in our data centers, and as we are adding more capacity, we have less time to do the things we used to be able to do without automation, and we can't do that anymore, because there's just more stuff happening.
05:03
Today we are at approximately between 10 and 20 million ports, but that number is going to grow rapidly in the next couple of years.
05:25
And when it comes to operating all of this, we also don't want to have people deal with this kind of stuff. We need to do better.
05:41
And then, one cheesy animation later, we built robots with funny tattoos to help us do this, and one of the tattoos is a Python logo. So, it's on the slide.
06:04
Yeah, so, this is a nice California sunset photo of, I think, Building 16 on the main Facebook campus in Menlo Park, California. The address is 1 Hacker Way,
06:21
and this gets at what we at Facebook believe is a good way to solve problems: to hack. So, we hold these hacks, which are basically hackathons. They can be a couple of days long. Nowadays, people mostly go home and sleep.
06:42
I don't think that was always the case, but nowadays it is. And we take this very seriously. Lots of engineers take a couple of days to think about exactly the kind of things that I'm trying to talk about here, like what things we can do better,
07:01
and then maybe sit in a room for a day or two and try to come up with a quick prototype. And many, many internal projects and even external-facing products at Facebook started out as crazy ideas that people just sat together in a room and hacked on for a day or two.
07:26
Yeah, this is an aerial shot of the campus. Building 16 is at the top right corner, I think, and you can see the San Francisco Bay at the back. Like, realistically, you cannot just hack your way
07:42
through some problems. You can start out like that, and that's a good way to build a prototype and test your ideas, but as you want to rely on it and make better use of it, you have to put it into proper production.
08:03
And there's one language that's actually quite good and helpful when you need to come up with a fast prototype, but that also works well in production. You've probably heard about this language. At Facebook, we're very much thumbs-up on Python.
08:22
Here are some numbers from our one single repo, which I pulled in June. These are rough numbers, but we actually have thousands of binaries running in production. A decent number of them are Python 3. We have millions of lines of code.
08:43
40%, if you look at the line of code count, is actually Python 3, so we are very invested in Python 3, and it's the second most used language for back-end services after C++. Most people probably know
09:02
that the front-end of Facebook, the product, is written in a slightly improved or a significantly improved version of PHP, so we don't count this here, but everything in the back-end is mostly C++ and Python
09:21
with some other interesting bits here and there. So how does it all work together? You have all of these servers running Python, C++, some people like Haskell, which is apparently cool, or used to be. At Facebook, we use Thrift, Apache Thrift.
09:44
It's basically an RPC framework, and what it does is you can define types and you can define the interface of your service, and then you have a compiler that compiles this and generates language-specific code, so you can have services written in different languages
10:04
that can talk to each other seamlessly, and all of your definitions live in a file like this. This is obviously a very contrived example, not actual code, but our system to help us deal with a lot of fiber
10:26
would maybe have an interface like this. You would define a fiber type and maybe define a location type, and then you would have a fiber server, and then you would be able to ask what kind of fiber cabling is happening at a certain location.
10:45
And then once you run the Thrift compiler, you get generated code which you can then import, and it's fairly easy. As you can see, there are only about six imports, and then you have to actually write
11:00
the business logic of your service. We still haven't cracked that problem, I mean, how not to have to write it. And then you instantiate your handler, there's a processor, and there are two factories, because apparently one is not enough. Hand-written, the whole thing looks roughly like the sketch below.
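For reference, here is a minimal sketch of such a hand-written Thrift server. It is only an illustration: FiberService, FiberServiceHandler and getFibersAtLocation are hypothetical stand-ins for the Thrift-generated module and your handler, while the surrounding calls are the standard Apache Thrift Python boilerplate.

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer

    from fiber import FiberService  # hypothetical thrift-generated module


    class FiberServiceHandler:
        """The business logic you still have to write yourself."""

        def getFibersAtLocation(self, location):  # hypothetical service method
            return []


    if __name__ == "__main__":
        handler = FiberServiceHandler()
        processor = FiberService.Processor(handler)
        transport = TSocket.TServerSocket(port=9090)
        # Two factories, because apparently one is not enough.
        tfactory = TTransport.TBufferedTransportFactory()
        pfactory = TBinaryProtocol.TBinaryProtocolFactory()
        server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)
        server.serve()  # no signal handling, no monitoring, basically nothing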
11:23
Generally, this is not very engineer-friendly, and there are a number of problems with it. There's no signal handling. There's no monitoring. There's basically nothing. So we came up with a solution to this and open-sourced it. It's called sparts.
11:41
It's basically a services framework for Python. And at Facebook, we use Thrift for everything, so it supports Thrift, but it also supports HTTP, and it does a lot of cool things. You have periodic tasks, background tasks. You have logging, command-line arguments,
12:01
all sorts of nice things like that. So even if you're not interested in using it as such, it's very nice, readable code, so you can take a look at it. It's a good example of how you would write a service framework in Python, which I'm sure is becoming more and more common nowadays.
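To give a flavour of what that looks like, here is a minimal sparts service, reconstructed from memory of the project's published examples; treat the exact class and method names (VService, PeriodicTask, initFromCLI) as assumptions rather than a definitive API reference.

    from sparts.vservice import VService
    from sparts.tasks.periodic import PeriodicTask


    class SaySomethingTask(PeriodicTask):
        """Runs every few seconds in the background, with logging for free."""
        INTERVAL = 5.0

        def execute(self, *args, **kwargs):
            self.logger.info("hello from a periodic task")


    class FiberService(VService):
        TASKS = [SaySomethingTask]


    if __name__ == "__main__":
        # Parses command-line arguments, sets up logging and runs the tasks.
        FiberService.initFromCLI()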
12:22
And then, if we use sparts, this whole boilerplate becomes much better. You still have to write your code; we can't help you with that yet. But then it's just a couple of things.
12:40
And this thing will do way more than the previous example did, because it will expose metrics, so you can monitor your service and go on five weeks of vacation like Hynek did. And there are all sorts of other things that you get for free. So, for those of you who are paying attention,
13:02
I did mention binaries before, and most people think of Python as an interpreted language, which does not have binaries. It has source files that are then interpreted by the CPython interpreter. At Facebook, we actually don't use virtualenv and pip
13:21
or anything like that. We actually do build binaries, and we distribute them to all of our machines also using Python, which I will tell you more about in a moment. So we have one build system to rule them all at Facebook.
13:42
It's called Buck, and it's also open source. So if somebody's interested in this, you can go to buckbuild.com and read more about Buck. And basically, we use Buck to build Python binaries.
14:01
This is not new technology; there are open-source solutions for this. But what we do is put everything into a zip archive and then prepend a shell script that knows how to extract and execute the code, so all of our code, the binary dependencies and the pure-Python dependencies, ends up in one big blob that you then send to all of your servers.
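The same idea can be sketched with nothing but the standard library's zipapp module. This is just a conceptual illustration, not Facebook's actual Buck-based tooling, and the paths are hypothetical: it bundles a directory of pure-Python code into a single archive with an interpreter line prepended, so the result is one self-contained file you can copy around and execute.

    import zipapp

    # "myapp" is a hypothetical directory containing a __main__.py entry point.
    zipapp.create_archive(
        "myapp",
        target="myapp.pyz",
        interpreter="/usr/bin/env python3",  # prepended so the archive is directly executable
    )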
14:22
Buck can help you with a lot of other things. It caches your build artifacts, so you don't have to keep rebuilding things that haven't changed, and it gives you a very nice, reproducible way to define your builds.
14:42
And if you take a look at a build file, it's actually also Python code. That was very useful for me when coming up with the stats from a couple of slides back, because you can use things like the ast module to parse it and figure out interesting things about your build.
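As a hypothetical illustration of that trick: because a BUCK build file is valid Python syntax, you can parse it with the standard ast module and, for example, count how many python_binary rules it defines (the file path here is made up).

    import ast

    with open("BUCK") as f:
        tree = ast.parse(f.read())

    # Every rule in a BUCK file is just a top-level function call, so we can
    # walk the tree and pick out the python_binary calls.
    python_binaries = [
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "python_binary"
    ]
    print(f"python_binary rules: {len(python_binaries)}")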
15:00
So in this case, we define our Thrift library, which tells Buck to generate the Thrift code, and then we have a Python binary that depends on this library and runs server.py, which then solves all of our problems, in theory, of course.
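A build file along those lines might look roughly like this. It is only a sketch: the rule and parameter names are assumptions, and thrift_library in particular is a Facebook-internal macro rather than a stock open-source Buck rule.

    # BUCK
    thrift_library(
        name = "fiber-thrift",
        srcs = ["fiber.thrift"],
        languages = ["py"],
    )

    python_binary(
        name = "fiber-server",
        main_module = "server",    # runs server.py
        deps = [":fiber-thrift"],  # the generated Thrift code
    )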
15:24
So yeah, like I said, now we have binaries. At Facebook, we use BitTorrent, also powered by Python and asyncio, to take those binaries and distribute them to a huge fleet of servers
15:41
where they need to be. So Python is heavily used, and abused, and this leads us to the
16:00
TL;DR of the talk. If you take anything away from this, it's, well, actually two things: something about how we use Python at Facebook, but also that the robots are coming. And I was really practicing this, but I couldn't come up with a good enough impression
16:21
of an Austrian accent saying "and they need your clothes", so imagine I said it in an Austrian accent. But yeah, like I said, as things scale, you need to automate more and more things. I think this has become a very big theme in a lot of
16:40
talks I heard today and on the previous days at this conference. So it's something to think about. For me, it's very interesting to think about how a lot of things that you didn't even think you could be automating are starting,
17:02
piece by piece, to get eaten by computers and done by computers more and more. And at Facebook, we use hacks; it's a very fun, interesting way to come up with ideas for what things can go away and be done by robots.
17:22
And with that, I ran a little bit short on time, but we're hiring, and what's most relevant for the folks here, I think, is the London and Dublin offices for production engineering. So if you're interested in coming to
17:42
build robots with us, and maybe make cheesy slide decks, come to the booth and talk to us. That's it. Thank you. As for questions,
18:02
I would prefer if people grab me at the booth. I'm going to be at the booth the whole day tomorrow, and I think it's a little bit awkward to talk like this, so come grab me afterwards at the booth and we can chat. Thanks a lot.