How Facebook uses Python to build (and operate) datacenters at scale

Video in TIB AV-Portal: How Facebook uses Python to build (and operate) datacenters at scale

Formal Metadata

How Facebook uses Python to build (and operate) datacenters at scale
Title of Series: EuroPython 2017
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Content Metadata

How Facebook uses Python to build (and operate) datacenters at scale [EuroPython 2017 - Talk - 2017-07-11 - Anfiteatro 1] [Rimini, Italy] With 4 datacenters online and more coming fast, building and operating datacenter buildings becomes a problem we need to solve at scale. At Facebook, several teams of Production Engineers write the software that helps us do this efficiently, and we use Python... a lot. In this talk, I will go into some detail about just some of the problems we try to solve to make sure our datacenters come online on time, so that you can connect with all your friends on Facebook, and to keep those datacenters humming as efficiently as possible. We'll go into some detail about the awesome Python infrastructure (some of it open source) that we use to build this software, and some of our engineering practices. This is a talk for you if you were wondering how to track each and every strand of fiber cabling within a datacenter, or how to make sure we find out that the cooling system isn't really doing its thing before actual servers catch fire from serving you live videos.
Thanks, everybody, hello. My name is [inaudible], and I'm going to be telling you about how we at Facebook use Python to help us build data centers at scale. Some of you may remember me from previous years, from such venues as [inaudible]. I'm a production engineer at Facebook; production engineering as a role is kind of like a mix of software and systems engineering. I'm not going to talk about that in a lot of detail now, but if you're interested you can swing by our booth down at the marketplace and we can tell you more about it. This talk is also about automation, which I'm sure is a theme
you've heard a lot about at this conference: it's all about automating the things that you didn't think you could automate. Facebook is growing really fast, and we have more and more users doing more and more things on the site. This means that some things we could get away with doing by hand as a company, we can't do any more, because we wouldn't keep up with the requirements of our scale. Now, I'm not saying this applies to every company out there, but it could be an interesting thought experiment:
if you consider that computers are getting cheaper, it's actually getting more and more expensive every year for you to not use computers to do things, because humans are not getting any cheaper. So this may not apply to everybody here, but at Facebook we really needed to stop and think about how to stop doing things that were not going to scale, and this is one story of such a thing. So, the fabric. This is how Facebook designs data center networking; it's the so-called spine-and-leaf topology, and I think it's a design somebody came up with in the nineteen-fifties, so it's the bleeding edge of technology, as always. What it allows us to do is add compute capacity without having to mess with the network. I'm not a networking expert, but we do have some networking people here at the conference, so if somebody's interested in this particular thing you can come down by our booth, and I'm sure some of the networking folks will try to help you. An interesting data point: there are networking people at a Python conference; I think that's kind of part of the whole theme. At Facebook we have many of these fabrics, because we have a number of regions, and a number of buildings in them, so this adds up quickly. So, a question for the audience,
with a little bit of a nausea-inducing animation, thanks to PowerPoint: how much fiber cabling connects all of this? Care to venture a guess? Let me just say, for the astronomy nerds in the audience, that this is not an accurate representation of planetary motion; just to get that out there. It turns out that there is quite a lot of fiber cabling in our data centers. Some of my colleagues from the data center teams calculated that it's about two million kilometers of fiber cabling that we have to deal with; that's enough, in this make-believe universe, to go to the moon and back, and around. We can also think about this in terms of the fiber ports that we have to deal with in our data centers. As we keep adding more capacity, we have less time to do the things we used to be able to do without automation, and we can't do them that way anymore, because there's just more stuff happening. Today we're at approximately between 10 and 20 million ports, and that number is going to grow rapidly in the next couple of years.
This is really not something we want to deal with by hand. And when it comes to operating this
stuff, we also can't just keep throwing people at it; we need to do better. So, one cheesy animation later: we built robots, with funny tattoos, to help us do this. I wanted it to be a Python logo, but that's what's on the slide.
So, this is a nice California sunset photo of, I think, Building 16 on the main Facebook campus in Menlo Park, California; the address is 1 Hacker Way. And this is kind of what we at Facebook believe is a good way to solve problems: to hack. We hold these hackathons; they can be a couple of days long, and nowadays mostly people go home and sleep; I don't think that was always the case, but now it is. We take this very seriously. Lots of engineers take a couple of days to think about exactly the kind of things I'm trying to talk about here, like what we can do better, and then sit in a room for a day or two and try to come up with a quick prototype. Many, many internal projects, and even external-facing products of Facebook, started out as crazy ideas that people just sat together in a room and hacked on for a day or two.
And this is an aerial shot of the campus; Building 16 is at the top right corner, I think, and you can see the San Francisco Bay at the back. But realistically, you cannot just hack your way through some problems. You can start out like that, and it's a good way to build a prototype and test your ideas, but if you want to make better use of it, you have to put it into proper production.
And there's one language that's actually quite good and helpful when you need to come up with a fast prototype, but that also works well in
production. You've probably heard about this language; at Facebook we're very much thumbs-up on it.
From the one single repo that we have, I pulled out these rough numbers in June. We have thousands of binaries running in production, and a decent number of them are Python 3. We have millions of lines of code, and about 40 per cent, if you look at the lines-of-code counters, is actually Python. So we're very interested in Python 3, and it's the second most used language for back-end services, after C++. As most people probably know, the front end of Facebook, the product, is written in a slightly, or significantly, improved version of PHP, so we don't count that here; but everything in the back end is mostly C++ and Python, with some other interesting bits. So how does that all
work together? We have all of these servers running Python, C++, some Haskell, which people liked, or used to. At Facebook we use Thrift, Apache Thrift. It's basically an RPC framework: you define your types, you define the interface of your service, and then a compiler compiles this and generates language-specific code, so that services written in different languages can talk to each other seamlessly. All of your definitions live in a file like this.
This is obviously a very contrived example, not an actual one, but a system to help us deal with a lot of fibers would maybe have an interface like this: you would define a Fiber type, maybe a Location type, and then you would have a FiberServer, and you would be able to ask what kind of fiber cabling is present at a certain location. Then, like I mentioned, the Thrift compiler runs and generates code, which you
can then import. That part is fairly easy; as you can see, it's only about six imports. But then you have to actually write the business logic of your service yourself; that's a problem we still haven't cracked, I mean, how not to do it.
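A contrived interface along those lines might be sketched in Thrift IDL like this (the Fiber, Location, and FiberServer names are illustrative inventions for this example, not Facebook's actual schema):

```thrift
// fiber.thrift -- a contrived interface definition, not a real one.
struct Location {
  1: string building,
  2: string row,
  3: string rack,
}

struct Fiber {
  1: string id,
  2: Location a_end,
  3: Location b_end,
}

service FiberServer {
  // Ask what fiber cabling terminates at a given location.
  list<Fiber> getFibersAt(1: Location where),
}
```

Running the Thrift compiler over a file like this produces the language-specific client and server stubs that you then import.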
And then you instantiate your handler; there's a processor, there's a server, there are two factories, because one is never enough. Generally, this is not very engineer-friendly, and there are a number of problems with it: there's no signal handling, there's no monitoring, there's basically nothing.
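Hand-rolled, that boilerplate looks roughly like this with the classic Apache Thrift Python runtime; this is a sketch, and the `FiberServer` generated module and `FiberHandler` business-logic class are hypothetical names carried over from the contrived example above:

```python
# A sketch of the hand-rolled Thrift server boilerplate, assuming Apache
# Thrift's Python runtime and code generated from the contrived fiber.thrift.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

from fiber.FiberServer import Processor  # generated code (hypothetical path)
from myservice import FiberHandler       # your actual business logic

handler = FiberHandler()
processor = Processor(handler)
transport = TSocket.TServerSocket(port=9090)
tfactory = TTransport.TBufferedTransportFactory()    # one factory...
pfactory = TBinaryProtocol.TBinaryProtocolFactory()  # ...is never enough

server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)
# No signal handling, no monitoring, no metrics -- it just serves forever.
server.serve()
```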
So we came up with a solution to this and open-sourced it; it's called sparts, and it's basically a services framework for Python. At Facebook we use Thrift for everything, so of course it supports Thrift, but it also supports HTTP and does a lot of cool things: you get periodic tasks, background tasks, logging, command-line arguments, all sorts of nice things like that. Even if you're not interested in using it as such, it's very nice, readable code, so you can take a look at it if you want a good example of how you would write a services framework, which I'm sure is becoming more and more common nowadays.
And if you use sparts, this whole boilerplate becomes much better. You still have to write your own code; we can't help you with that yet. But then it's just a couple of lines, and this thing does way more than the previous example, because it exposes metrics, so you can monitor your service and see, say, how many clients connected, and there are all sorts of other things you get for free.
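For comparison, a minimal sparts-style service is only a few lines. This sketch follows the patterns in the open-source sparts repository; the task and service names are made up for this example:

```python
# A minimal sparts service sketch: one periodic task plus the service class.
from sparts.vservice import VService
from sparts.tasks.periodic import PeriodicTask

class FiberAuditTask(PeriodicTask):
    INTERVAL = 60.0  # seconds between runs

    def execute(self):
        # Your business logic still goes here -- sparts can't write that for you.
        self.logger.info("auditing fiber ports...")

class FiberService(VService):
    TASKS = [FiberAuditTask]

if __name__ == '__main__':
    # Command-line parsing, logging, signal handling, and metrics come for free.
    FiberService.initFromCLI()
```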
Now, for those of you who were paying attention: I did mention binaries before, and most people think of Python as an interpreted language, which doesn't have binaries; you have source files that are then interpreted by the CPython interpreter. At Facebook we actually don't use virtualenv and pip or anything like that; we really do build binaries, and we distribute them to all of our machines, also using Python, which I will tell you more about in a moment. So we
have one build system to rule them all at Facebook, called Buck, and it's also open source. If somebody is interested in this area, you can go to buckbuild.com, where you can read more about Buck.
Basically, we use Buck to build our Python binaries. This is not new technology; there are open-source solutions for it. What we do is put everything into a zip archive and prepend it with a shell script that knows how to extract and execute the code, so all of our code, all of the binary dependencies, and all of the pure Python dependencies end up in one big blob that you can then distribute. Buck can also help you with a lot of other things; it caches your artifacts, so you don't have to keep rebuilding things that don't change, and it has a
very nice, reproducible way to define your builds. If you take a look at this, it's actually also Python code, which was very useful for me in coming up with the stats from a couple of slides back, because you can use things like the ast module to parse it and figure out interesting things about your build. In this case, we define our Thrift library, which tells Buck to build the generated code, and then we have a python_binary that depends on this library and runs the server, which then solves all of our problems, of course.
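A build definition in that style might look something like the following; the rule arguments here are illustrative rather than exact, so check the rule reference at buckbuild.com for the real signatures:

```python
# BUCK -- build definitions are Python-parseable, which is why tools like
# the ast module can analyze them. Names and arguments are illustrative.
thrift_library(
    name = "fiber-thrift",
    srcs = ["fiber.thrift"],
    languages = ["py"],
)

python_binary(
    name = "fiber-server",
    main_module = "fiber.server",
    deps = [":fiber-thrift"],
)
```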
And so, like I said, now we have binaries. At Facebook we use a distribution system, which is also powered by Python and asyncio, to take those binaries and distribute them throughout a huge fleet of servers, wherever they need to be. So yeah: Python, heavily used.
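The "zip archive with an executable header" idea from a couple of slides back can be sketched with nothing but the standard library's zipapp module; the real Facebook format is considerably more elaborate, but the principle is the same:

```python
# Build a tiny self-contained executable archive with zipapp: a zip of
# Python code, prefixed with an interpreter line so it can run directly.
import pathlib
import subprocess
import sys
import tempfile
import zipapp

srcdir = pathlib.Path(tempfile.mkdtemp())
(srcdir / "__main__.py").write_text("print('hello from inside the zip')\n")

target = srcdir.with_name(srcdir.name + ".pyz")
# The interpreter line plays the role of the prepended shell script.
zipapp.create_archive(srcdir, target, interpreter="/usr/bin/env python3")

# Run it with the current interpreter to stay portable.
result = subprocess.run(
    [sys.executable, str(target)], capture_output=True, text=True
)
print(result.stdout.strip())
```

The resulting single file contains all the pure-Python code and can be shipped around a fleet as one blob.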
And this leads us to the end of what I wanted to talk about. If you take anything away from this, it's really two things: some things about how we use Python at Facebook, but also that the robots are coming. I was really practicing this, but I couldn't come up with a good enough impression of an Austrian accent saying "I need your clothes," so imagine I said that in an Austrian accent.

Like I said, at this scale you need to automate more and more things. I think this has become a very big theme in a lot of the talks I've heard today and in the previous days at this conference. For me it's very interesting to think about how a lot of things that you didn't even think could be automated are starting to get eaten by computers and done by computers, more and more. At Facebook we use hackathons; they're a very fun, interesting way to come up with ideas for what things can go away and be done by robots.

And with that, I'm running a little bit short on time, but we are hiring, and what's interesting for you folks, I think, is the London and Dublin offices for production engineering. So if you're interested in coming to build robots with us, and maybe making cheesy slides, come to the booth and talk to us. That's it, thank you.

[Q&A] If you have questions: I'm going to be at the booth until tomorrow, and I think it would be a bit awkward to talk like this, so come with me to the booth afterwards and we can chat.