NixOS at Tumblr

Video in TIB AV-Portal: NixOS at Tumblr

Formal Metadata

Title: NixOS at Tumblr
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract: Infrastructure automation testing is hard, but NixOS makes it a breeze. Using NixOS' testing framework, Tumblr developed a comprehensive integration test suite for Jetpants, Tumblr's database automation toolkit designed to manage billions of rows and hundreds of database machines. In this talk we will explore the challenges of testing Jetpants, mimicking complex replication and sharding topologies, and future applications of NixOS at Tumblr.
So, I present to you Graham. [Applause]

Thank you. Hi, you all know me as Graham. I work for Tumblr; as my day job I work on the site reliability engineering team, specifically on online databases for Tumblr, which mostly means MySQL and memcached, and we're going to talk about how we use Nix to make that reliable. I think a lot of this becomes interesting when we look at the scale of Tumblr and the complexities of MySQL. This is my tumblr: somehow I managed to grab beta.tumblr.com; I would have expected it to be taken. We have over 150 billion posts and 370 million blogs on the network, and we receive 30 million posts a day. That poses remarkable scaling challenges and earns us a place in the top 25 sites in the United States according to Alexa. We do this with hundreds of database systems, billions and billions of rows, just tons of data.
And we do it with a tool called Jetpants. Jetpants is a MySQL automation toolkit: it helps us manipulate MySQL replication topologies and sharding schemes, which I'll get into in a minute, and it's really designed for our specific use cases in MySQL. A very simple deployment
involves a single server that handles all of the reads and all of the writes. This has some benefits, but the major downsides are that you can only get so big a server, so you can only hold so much data and serve so many reads, and if this one server goes down, everything goes down.
More complicated but more resilient is splitting the reads from the writes. Writes still go to the master, and it replicates by sending write instructions down to each replica. This way you can scale to as many replicas as you need to serve as many reads as you need. But again, if you receive too many writes you can't scale the master other than by buying a bigger server, and this can obviously only store as much data as can fit on a single system. If the master fails you can still serve reads; any path that doesn't require writing data can stay online. That is a big advantage if you architect your system correctly.
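The read/write split described above can be sketched as a tiny router. This is a minimal illustration, not Jetpants code, and the connection names are made up.

```python
import random

class ReadWriteRouter:
    """Send writes to the single master; spread reads over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def route(self, is_write):
        if is_write or not self.replicas:
            return self.master               # every write hits the master
        return random.choice(self.replicas)  # reads scale with replica count

router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.route(is_write=True))   # db-master
print(router.route(is_write=False))  # one of the replicas
```

Note how the write path has exactly one destination: that is why adding replicas scales reads but never writes.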
At Tumblr we use what's called range-based sharding in order to split the writes among many different servers. We do it based on the primary ID, so based on the record that is being stored. This has a lot of benefits, and as a sharding scheme it's pretty good: if a single cluster, a single shard, goes away, all the other shards are still online, still happy, still serving traffic, still accepting reads and writes, and if a single master goes down, for example, it has a much smaller impact. Additionally, there are other sharding schemes that involve, say, modulus or round-robin writing; those can be difficult to scale out after you set how many shards you have, and it can be very difficult to add more. With range-based sharding you simply add more at the end. For example, if we have three database machines, each accepting primary IDs one through ten, eleven through twenty, and twenty-one through thirty, and a new post gets created with ID 14, it goes into the eleven-through-twenty bucket; the one-through-ten and twenty-one-through-thirty shards aren't touched at all. And if we get more than 30 posts we can easily create a 31-through-40, or a 31-through-infinity, shard.
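The range lookup in that example is just an ordered scan of shard boundaries. Here is a minimal sketch with made-up shard names; Jetpants' actual range logic is more involved.

```python
# Shard ranges from the example: (first_id, last_id, name); a last_id of
# None means "through infinity". The shard names are hypothetical.
SHARDS = [
    (1, 10, "shard-1"),
    (11, 20, "shard-2"),
    (21, 30, "shard-3"),
    (31, None, "shard-4"),  # open-ended tail shard, appended later
]

def shard_for(post_id):
    """Return the name of the shard whose ID range contains post_id."""
    for first, last, name in SHARDS:
        if post_id >= first and (last is None or post_id <= last):
            return name
    raise ValueError(f"no shard covers id {post_id}")

print(shard_for(14))   # shard-2: the eleven-through-twenty bucket
print(shard_for(999))  # shard-4: anything past 31 lands in the tail
```

Scaling out is appending a new tuple; no existing rows move, which is the advantage of range-based sharding over modulus schemes.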
At Tumblr, each of these shards is actually its own replication topology: each one has a master and multiple replicas, so that we have data resiliency and replication to survive failures. Those replication topologies have issues, and in MySQL especially it's a bit complicated to promote a replica to become a master. If the master fails, you have to take something which already has the data, tell it it's a master, and then convert everything else to read from it. That can be a bit difficult to do, and if you mess up the parameters when you're setting up MySQL you can accidentally ruin your data set. Jetpants handles this for us.
In this hypothetical cluster we have a master receiving writes and two replicas receiving reads. If the master dies, we no longer accept writes, but we can still serve reads. We then take one of the replicas and set it as the master, and take the other replica and configure it to replicate from that new master. We then destroy the old infrastructure and clone from one system to the next to create a new replica, and now we're back to where we started: a healthy system with enough duplicates of the information.
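The failover sequence above can be modelled as simple bookkeeping. This toy sketch only tracks which node plays which role; real Jetpants reconfigures actual MySQL replication.

```python
class Cluster:
    """Toy model of master failover: promote a replica, repoint the
    rest, and clone a fresh replica to restore redundancy."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def fail_master(self):
        # The dead master is destroyed; the first surviving replica
        # is promoted, and the others now replicate from it.
        self.master = self.replicas.pop(0)
        # Clone from an existing system to get back to full strength.
        self.replicas.append(f"{self.master}-clone")

cluster = Cluster("db-1", ["db-2", "db-3"])
cluster.fail_master()
print(cluster.master)    # db-2
print(cluster.replicas)  # ['db-3', 'db-2-clone']
```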
Now, the difficult thing about this system is that it's very challenging to test, and for a long time testing was entirely manual. We would have actual database servers in our data center that we would clone and replicate and kill and test, and it took a long, long time to do this. These servers take quite a long time to configure because of how unique they are. I have solved this with Nix. In order to do that, I needed to set up Collins, which is our asset tracker, in order to track what hardware we have; we need real systems which we can SSH into; we have to have a functioning network so they can talk to each other and so they can replicate; and they need to actually run MySQL, with as many of these things as real as possible, in order to replicate production, so that we don't accidentally do something in production that breaks something. In early 2017 there was a hack week at work, and I thought, well, this would be a cool thing to try, but it'll certainly never work. By the end of
the week it worked. This was me, and shortly after, my co-workers and my CTO. This hack week project earned me the CTO Choice Award, which was pretty fun. The testing framework was completely automatic: you set it up, it runs, it shuts down, and it tells you whether it worked or not. It was expected to take six months to a year to implement because of the complicated interactions and requirements; it was expected to be dog slow; it was expected to use four VMs; it was expected to use Puppet. Each of those things is very slow. As I noted, it ended up taking a week to build the first prototype, and it takes six minutes to run a complete test. I've
ended up making a DSL around the testing system. Here you can specify a number of spare DBs, which is just spare hardware that's ready to be used, and a number of replica DBs. It starts out by creating a replication topology, and here you can set the number of replicas to have, and then you can specify a list of test phases to run; each of those is just a bash wrapper. Spare DBs: if you ask for six, it'll have six that are not associated with any cluster to start. Replica DBs: if you set it to three, you will have three replicas on the one master. And I used the monolith because, just like in the movie, this triggered a remarkable evolution in how we developed Jetpants. Suddenly we were less afraid to go change that scary SSH code that could let you do nasty things by mistake, or put newlines in your SQL, which is a pretty common thing to do. It's been a really remarkable help. I'd like to walk through an example
test; I think that would be interesting. This is a shard master promotion, so this is the case we talked about before where the master has died. There will be two replicas in the cluster, so when this test starts up, before it runs any code, this is what it guarantees will be there: a single master and two read replicas. In the test phase I have a wrapper that takes arbitrary Ruby code and runs it inside the Jetpants environment. This accepts a pool, which is the entire shard, or the shard range, in this case posts one through infinity. It selects the master and shuts it down; this is to simulate a failed master. At this point, after that test phase runs, the master is dead: it can no longer accept writes, but reads are still okay. I then run a Jetpants promotion, which is just a standard phase, just a bit of bash to run. It calls jetpants promotion and passes it the master to demote; the master always starts at 2.10 and the replicas are 2.11, 2.12, and so on for the spares. So this will take the old master and replace it with a replica. We are now at the stage where we have the new master, and the old master is still down. And then we can assert certain things about the state at the end: at this point we can ensure that the master is 2.11, as specified previously, and, since a default behavior of Jetpants is to configure the old master to become a replica of the new cluster, ensure that the old master is now a replica; that involves some automatic logic to start MySQL. These run now
on every pull request to Jetpants. There are about a half dozen tests, so it does take about 30 minutes, but that's a lot faster than waiting for even a single manual test, given how slow the manual tests are to run. Yeah, it's been really remarkable, and honestly, using Nix and Jenkins has taken away almost all of the pain of Jenkins: I think it took two, maybe three commits to get our Nix build working inside Jenkins, which is astonishing, really nice. Additionally, we
are using Nix to develop an internal database. The database integrates with Jetpants; it's based on Go, and it has quite a few development tools. Go loves dev-time tools for code generation and whatnot, and it's a quite involved build process. There are three developers on the team building this; all three have embraced Nix and have quite liked the process. It took setting up a dev environment from however long it used to take down to just the 10 minutes to download the dependencies. As I said, no time wasted on Jenkins: if we make a change to the build process, we know if it works before we push it to Jenkins, and using nix-shell obviously lets us make sure we're following the same steps the build does.
The tests do integrate with Jenkins, so every time this internal database builds, we run the full suite of Jetpants tests. The modularity that the module system affords us has made that incredibly easy, not to mention how incredible the NixOS test process is; that's a remarkably innovative idea, I believe. Using Nix for this deploy
process has given us wonderful traceability: it's very trivial to know exactly what went into the build process and how we ended up with this broken build in production, and it has made it much simpler to diagnose why something is not working. Unfortunately, we are not using Nix to actually deploy this database. It's Go, so it only depends on the linker, the interpreter; we do have to patchelf the binary to refer to the standard interpreter path, and then we deploy the artifact with Puppet. In the future, something that
we've been working on is deploying memcached with a netboot. We serve three million cache hits per second, and that's cache hits, not including misses. We have many nodes; I didn't even bother to count, because it would take too long to find all the use cases. And just to explain what memcached is real quick: it's a very, very simple database, a key-value store. You put data, you get data, and you can increment and decrement. There is no persistence, so if it reboots, you've lost all your data; if you stop and start memcached, you've lost all your data. It's an incredible tool for making Tumblr fast. Right now we deploy it onto nodes with disks, and we deploy it with Puppet, and sometimes those disks fail, and that is very annoying, because when your service is memcached it doesn't matter if the disks have failed: we don't ever touch the disks. The only thing that touches the disk is Puppet, and so when the disk fails we just have to shut the node down anyway. So some time ago I set up PXE boot support so that our provisioning infrastructure can boot NixOS, and for another hack day I started on a netbooted memcached box. I have that working; the remaining process is integrating it with our monitoring tools. It's quite likely we'll do a production pilot in early 2018; we'll see, we're hoping to get that scheduled. We just had a recent incident which would make it quite nice to be netbooting these instances instead of running off the disk. Any questions?
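The memcached behavior described above (set, get, increment, decrement, and no persistence across restarts) can be modelled in a few lines. This is an illustration of the semantics only, not the real daemon, which also has expiry, eviction, and a network protocol.

```python
class MiniCache:
    """In-memory key-value store with memcached-like operations."""

    def __init__(self):
        self._data = {}  # lives only in RAM: a restart loses everything

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def incr(self, key, delta=1):
        self._data[key] = int(self._data.get(key, 0)) + delta
        return self._data[key]

    def decr(self, key, delta=1):
        return self.incr(key, -delta)

cache = MiniCache()
cache.set("likes:42", 10)
print(cache.incr("likes:42"))  # 11
print(cache.decr("likes:42"))  # 10
cache = MiniCache()            # a "reboot": all data is gone
print(cache.get("likes:42"))   # None
```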
NixOS already has built-in support for PXE; in fact, the Packet provisioning infrastructure uses that tooling extensively, and that is open source. The specific tooling that Tumblr uses is probably not going to be too generic outside of what Nixpkgs already supports. I can't stress enough how wonderful Nixpkgs is in this regard: when you import NixOS there's a system, and then there's a netboot, a PXE, attribute by default, and you can just use that and be done.

[Question] Sorry if you mentioned this already: do your Jetpants tests today use the QEMU testing infrastructure? Yep, let me just go back: the makeTest that's in Nixpkgs, yeah, it's the exact same QEMU testing that we use.

[Question] As you know, we often have these random failures with these tests; how do you deal with that? Our solution is not very elegant: we just run one test at a time, and we only run a few simultaneous builds on the hosts at any given time. We're not fully utilizing the capacity of the hosts, so we don't typically run into those problems. Yeah, I was hoping to find a nice solution as part of that, but I have not. [Applause]