
5 Years of Rails Scaling to 80k RPS

Video in TIB AV-Portal: 5 Years of Rails Scaling to 80k RPS

Formal Metadata

5 Years of Rails Scaling to 80k RPS
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Shopify has taken Rails through some of the world's largest sales: the Super Bowl, celebrity launches, and Black Friday. In this talk, we will go through the evolution of the Shopify infrastructure: re-architecting and caching in 2012, sharding in 2013, reducing the blast radius of every point of failure in 2014, and on to 2016, where we accomplished running our 325,000+ stores out of multiple datacenters. It'll be a whirlwind tour of the lessons learned while scaling one of the world's largest Rails deployments for half a decade.
My name is Simon, and I work on the infrastructure team at Shopify. Today I'm going to talk about the past five years of scaling Shopify. I've only been around for four of those five years, so the first year took a little bit of digging, but I want to walk through the things we've learned, and I hope people in this audience can place themselves somewhere on this timeline and take something away from the lessons we've learned over those five years.

This talk is inspired by a talk that Jeff Dean from Google — a genius — gave about how they scaled Google for the first couple of years. What was so interesting to me was that you could see exactly why they made the decisions they made at each point in time. I've always been fascinated by that: what made, say, Facebook decide that now was the time to write a VM for PHP to make it fast? So this talk is in that spirit. It's an overview of the decisions we made at Shopify, and less about very tactical details; the idea is to give you a mental model for how we evolved our platform. There's tons of documentation out there on everything I'm going to talk about today — other talks by co-workers, blog posts, READMEs, and things like that — so I'm not going to get into the weeds; I'm going to provide the overview.

At this point you're probably tired of hearing about Shopify, so I won't say too much about it: Shopify is something that allows merchants to sell things to other people. What's relevant for the rest of this talk is that hundreds of thousands of people depend on Shopify for their livelihood, and through this platform we run almost 100K RPS during the largest sales. Our steady state is around 20 to 40K requests per second, and we run this on tens of thousands of workers across two data centers. About 30 billion dollars have made it through this platform, which means that downtime is costly. Keep these numbers in the back of your head: they're where we are today, and roughly speaking these metrics double every year. So if you go back five years, just cut them in half five times. First, I want to go through a little bit of
vocabulary for Shopify, because I'm going to use these terms loosely throughout the talk. Shopify has at least four sections. One of them is the storefront: this is where people browse collections, browse products, and add things to their cart. This is the majority of traffic — somewhere between 80 and 90% of our traffic is people browsing storefronts. Then we have checkout. This is where it gets a little more complicated: we can't cache as heavily as we do on the storefront, because this is where we have writes — decrementing inventory and capturing payments. The admin is more complex still: you have people applying actions to hundreds of thousands of orders at the same time, concurrently; you have people on recurring billing who need to be billed — things that are much more complex than both checkout and storefront in terms of consistency. And the API lets you change most of the things you can change in the admin; the only real difference is that computers hit the API, and computers can hit the API really fast. Recently I saw an app for people who wanted their order numbers to start at a million — so this app would create a million orders and delete all of them to get that offset. People do crazy things with this API, and it's our second-largest source of traffic.

I want to talk a little bit about a philosophy that has shaped this platform over the past five years. Flash sales are really what built and shaped the platform we have. When Kanye wants to drop his new album, Shopify — the team that I am on — handles that sale. The fork in the road we hit five years ago was that we started seeing customers who could drive more traffic with a single sale than the entire platform was otherwise serving. If we were serving a thousand requests per second across all the stores, one of these stores could take us to five thousand — and this happens in a matter of seconds, because the sale starts at 2 PM and that's when everyone comes in. So there was a fork in the road: do we become a company that supports these sales, or do we kick these customers off the platform, rate-limit them heavily, and say "this is not the platform for you"? That's a reasonable path to take — 99.9-something percent of stores don't have this pattern and can't drive that much traffic. But we decided to go the other route. We wanted to be a company that could support these sales, and we formed a team to solve this problem of customers who can drive an enormous amount of traffic in a very short amount of time. In hindsight that was a fantastic decision — it happened exactly five years ago, which is why the timeline of this talk is five years. It's a powerful decision because it has served as a canary in a coal mine: the flash sales we see today, and the amount of traffic they drive — say 80K RPS — is what the steady state is going to look like next year. So when we prepare for these sales, we know what next year is going to look like, and we know we're going to last through next year, because we're already working on its problems. They keep us one to two years ahead. Now we get into
the meat of this talk: I'll walk through the past five years of major infrastructure projects. These are not the only projects we've done — there have been other apps and many other efforts — but these are the most important to scaling our Rails application.

2012 was the year we sat down and decided we were going the flash-sale route: Shopify would go on to become the best place in the world to have a flash sale. A team was formed whose sole job was to make sure Shopify would stay up and be responsive under those circumstances. The first thing you do when you start optimizing an application is identify the low-hanging fruit — and in many cases the low-hanging fruit is very application-dependent. The lowest-hanging fruit on the infrastructure side has already been harvested for you: your load balancers, Rails (which is really good at this), your operating system — they take care of all the generic optimization and tuning. At some point that work is handed off to you, and you have to understand your problem domain well enough to know where to look. For us, the first wins were things like backgrounding checkouts. That sounds crazy — what do you mean, they weren't backgrounded? But Shopify was started around 2004-2005, and back then backgrounding jobs in Ruby and Rails was not really a common thing; and we hadn't done it since, because it had become such a large source of technical debt. So in 2012 a team sat down and paid off a massive amount of technical debt to move the checkout process into background jobs, so that payments were captured asynchronously in jobs rather than in a request that took a long time. That, of course, was a massive source of speed — you're no longer occupying all these workers for the duration of a slow request.

Another of these domain-specific problems was inventory. You might think inventory is just decrementing one number — but if you try to decrement the same number from thousands of requests at the same time, MySQL is not happy: you run into lock contention. So we had to solve that as well. And these are just two of many problems we solved. In general, what we did was print out the debug logs of every single query on the storefront, the checkout, and all the other hot paths, and tape them to a wall. I couldn't find the original picture, but I found one from a talk a co-worker gave three years ago where you can see the wall with the debug logs taped up. The team at the time would go and cross queries off, write their name on a query, and figure out how to reduce them as much as possible. But you need a feedback loop for this. We couldn't wait until the next sale to see whether the optimizations we'd done actually made a difference; we needed something better than crashing at every single flash sale. We needed a tighter feedback loop — just like when you run your tests locally and know pretty much right away whether something works, we wanted the same for performance. So we wrote a load-testing tool.
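The backgrounding idea can be sketched in a few lines. This is a toy illustration, not Shopify's actual code: the in-process queue below stands in for whatever real job backend (Resque, delayed_job, Sidekiq, etc.) a Rails app would use.

```ruby
# Toy in-process job queue standing in for a real job backend.
class JobQueue
  def initialize
    @jobs = []
  end

  def enqueue(&job)
    @jobs << job
  end

  # In production a separate worker process drains the queue.
  def work_off
    @jobs.shift.call until @jobs.empty?
  end
end

CAPTURED_PAYMENTS = []

# Backgrounded checkout: the request only enqueues the slow payment
# capture and returns immediately, instead of holding a web worker
# for the whole external call.
def create_checkout(queue, order_id)
  queue.enqueue { CAPTURED_PAYMENTS << order_id }
  :accepted
end
```

The web worker is freed as soon as `enqueue` returns; the client then polls for the checkout's completion instead of holding the request open.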
What this load-testing tool does is simulate a user performing a checkout: it goes to the storefront, browses around a little, finds some products, adds them to the cart, and performs the full checkout procedure. Then we spin up thousands of these in parallel to test whether a performance change actually made any difference. This is now so deeply embedded in Shopify's culture that whenever someone makes a performance change, people ask: what does the load test say? This was really important for us, because something like siege, which just hits the storefront with a bunch of identical requests, is simply not a realistic benchmark or a realistic scenario.

Another thing we did at this time was write a library called Identity Cache. We had a problem: we had one MySQL database at this point, hosting tens of thousands of stores, and when you have that, you get pretty protective of that single database. We were doing a lot of reads on it, and the sales especially were driving a massive amount of traffic to it all at once, so we needed a way of reducing the load on the database. The most common way of doing this is to start sending queries to read slaves — databases that feed off the one you write to — and read from those. We tried to do that at the time; it has a lot of nice properties over the other method. But back then there weren't any really good libraries for it in the Rails world. We tried to fork some and figure something out, but we ran into data-corruption issues, and we ran into mismanagement of the read slaves, which was really problematic because we didn't have any DBAs. And mind you, this was a team of Rails developers who had to turn into infrastructure developers and learn all of this on the job. We had gotten ahead of ourselves — we just didn't know enough about MySQL and these things to go down that path. So we figured out something else. Deep inside Shopify, Tobi had written a commit many years earlier introducing this idea of Identity Cache: managing your cache out-of-band in memcached. The idea is that when I query for a product, I look in the cache first. If it's there, I grab it and never touch the database. If it's not there, I fetch it from the database and put it in the cache so it will be there for the next request — and every time we write, we just expire the entry. Managing the cache this way has drawbacks, because the cache is never going to be one hundred percent in sync with what's in the database. So when we read from that managed cache, we never write those records back to the database — it's too dangerous. That's also why the API is opt-in: you have to call `fetch` instead of `find`, because we only want it on the hot paths, and it returns read-only records, so you cannot change them and corrupt the data. That's the big downside of both read slaves and Identity Cache: you have to deal with what happens when the cache is expired or stale. This is what we decided to do at the time. I don't know if it's what we would do today — we've gotten much better at handling read slaves, and they have other advantages, such as being able to serve much more complicated queries — but it's what we did, and if you're having severe scaling issues already, Identity Cache is a very simple thing to adopt.

So after 2012, and what would otherwise have been probably our worst Black Friday and Cyber Monday ever — the team had been working night and day to make this happen, and there's a famous picture of our CTO passed out on the ground after the exhausting work of scaling Shopify at the time — we were not in a good place. But Identity Cache, load testing, and all of this optimization saved us.
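The fetch-then-fill pattern behind Identity Cache can be sketched like this — a minimal illustration with in-memory hashes standing in for memcached and MySQL, not the real gem's implementation:

```ruby
require "json"

CACHE = {} # stands in for memcached
DB    = { 1 => { "id" => 1, "title" => "Snowboard" } } # stands in for MySQL

# Read-through fetch: hit the cache first, fall back to the database
# and populate the cache on a miss.
def fetch_product(id)
  if (cached = CACHE[id])
    JSON.parse(cached).freeze # cache hit: never touches the DB, read-only
  else
    record = DB[id]
    CACHE[id] = JSON.generate(record) # fill for the next request
    record
  end
end

# Writes go to the database and simply expire the cache entry,
# rather than trying to keep it up to date in place.
def update_product(id, attrs)
  DB[id] = DB[id].merge(attrs)
  CACHE.delete(id)
end
```

The `freeze` is the important design choice: a cache hit can never be saved back, so stale cache data can never corrupt the database.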
Once the team had decompressed after this massive sprint to survive the sales and survive Black Friday and Cyber Monday, the question was: how do we never get into this situation again? We had spent a lot of time optimizing checkout and storefront, but that's not sustainable. If you keep optimizing something for long enough, it becomes inflexible — fast code is often hard to change. If you optimize storefront and checkout and have a team that only knows how to do that, a developer is going to come in, add a feature, and add a query as a result — and that should be okay. People should be allowed to write queries without understanding everything about the infrastructure. Often the slower thing is the more flexible thing. Think of a completely normalized schema: it's much easier to change than a denormalized one, and that's the entire point of a relational database — when you make it fast, the trade-off is often that it becomes less flexible. Or think of an algorithm, say bubble sort, which is O(n²). You can make that really fast: you can write the fastest bubble sort in the world, a C extension for Ruby with inline assembly, the best bubble sort ever — but my terrible implementation of quicksort, which is O(n log n), is still going to beat it. At some point you have to stop optimizing and change the algorithm — change the architecture. So that's what we did in 2013: we sharded. We knew we needed that flexibility back, and sharding seemed like a good way to get it. We also had the problem that, fundamentally, Shopify is an application with a lot of writes. During the sales there are going to be a lot of writes to the database, and you can't cache writes — so we had to find a way to scale those, and sharding was it. The basic idea is that a shop is fundamentally isolated from other shops: shop A should not have to care about shop B. So we did per-shop sharding, where one shop's data lives entirely on one shard, and another shop might be on another shard, or on the same shard together with the first one. This was the API, basically — this is all of the sharding API we expose internally: within a block, it selects the correct database where that shop's products live, and within that block you cannot reach out to another shard; that's illegal. In a controller it might look something like this: most developers don't have to care about it at all — it's all done by a filter that finds the shard for the shop, wraps the entire request in the connection that shard is on, and any product query then goes to the correct shard. It's really simple, and it means that the majority of the time developers don't have to care about sharding or even know of its existence; it just works like this, and jobs work the same way. But there are drawbacks — there are kinds of things you can no longer do. I talked about how with optimization you might lose flexibility; with architecture, you lose flexibility on a much grander scale. Fundamentally, shops should be isolated from each other — but in the few cases where you want them not to be, there's nothing you can do. That's the drawback of architecture. For example, you might want to do a join across shops, gather some data, or run an ad-hoc query about app installations across shops. That might not seem like something you'd need — but the partners interface, for all of our partners who build applications, actually needs exactly that: it needs to get all the shops and installations for a partner, which was written as a join across all the shops, and that had to be changed. The same went for our internal dashboards that did things across shops — find all the shops with a certain app — you just couldn't do that anymore, so we had to find alternatives. If you can get around it, don't shard.
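As a sketch of what such a per-shop sharding API can look like — illustrative names, with a hash standing in for each database; this is not Shopify's internals:

```ruby
# Selects which shard (database connection) all queries inside a block use.
class Shard
  SHARDS = { 0 => {}, 1 => {} } # each hash stands in for one database

  def self.with(id)
    prev = Thread.current[:shard]
    Thread.current[:shard] = id
    yield
  ensure
    Thread.current[:shard] = prev
  end

  def self.current
    SHARDS.fetch(Thread.current[:shard]) # raises if used outside a `with` block
  end
end

# Which shard each shop's data lives on; in reality this is a lookup table.
SHOP_SHARDS = { "snowdevil" => 0, "surfdevil" => 1 }

# An around filter would wrap each request in this, so application code
# never has to think about shards.
def with_shop(shop, &block)
  Shard.with(SHOP_SHARDS.fetch(shop), &block)
end
```

Everything inside the block reads and writes only that shop's shard; touching another shard from within the block has no API, which is exactly the point.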
Fundamentally, Shopify is an application with a lot of writes — but that might not be true of your application. Sharding is really hard; it took us a year to do and to figure out. We ended up doing it at the application level, but there are many different levels you can do it at. If your database is magical, you don't have to do any of this; some databases are really good at handling this, and you can make the trade-off at the database level instead of the application level. But there are really nice things about a relational database: transactions, a schema, the fact that most developers are simply familiar with them — those are massive benefits — and they're reliable. They've been around for 30 years, so they'll probably be around for another 30 at least. We sharded at the application level because we didn't have the experience to write a proxy, and the databases we looked at at the time were just not mature enough. I recently looked up some of the databases we were considering back then, and most of them have gone out of business — so we were lucky we didn't buy into proprietary technology, and instead solved it at the level we felt most comfortable with at the time. Today we have a different team, and we might have solved it at the proxy level or somewhere else, but it was the right decision at the time.

In 2014 we started investing in resiliency, and you might ask: what is resiliency doing in a talk about performance and scaling? Well, as a function of scale you're going to have more failures, and this got us to a threshold in 2014 where we had enough components that failures were happening quite rapidly, and they had a disproportionate impact on the platform. When one of our shards was experiencing a problem, requests to shops on other shards were either much slower or failing altogether. It didn't make sense that when a single Redis server blew up, all of Shopify went down. This reminds me
of a concept from chemistry, where your reaction time is proportional to the amount of surface area you expose. If you have two glasses of water and you put a teaspoon of loose sugar in one and a sugar cube in the other, the glass with the loose sugar dissolves quicker, because the surface area is larger. The same goes for technology: when you have more and more servers and more components, there are more things that can react, potentially fail, and make it all fall apart. This means that if your components are tightly knitted together in a web where one failing component drags a bunch of others down with it, and you've never thought about this, adding a component will probably decrease availability — and this compounds exponentially. As you add more components, your overall availability goes down: if you have ten components with four nines each, you have a lot more downtime than you'd expect when they're coupled such that each one is a single point of failure. At this point we hadn't had the luxury of finding out where our single points of failure were — we assumed things would be okay — but I bet you that if you haven't actually verified it, you have single points of failure all over your application, where one failure takes everything down with it. Do you know what happens if your memcached cluster goes down? We didn't, and we were quite surprised to find out. It means you're really only as good as your weakest single point of failure. Multiply the probabilities of all those single points of failure together and you get the final probability of your app being available — and very quickly, what looks like hours of downtime per component becomes days or even weeks of downtime globally, amortized over an entire year, if you're not paying attention. Adding a component will probably decrease your overall availability.
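The arithmetic is worth doing once. Assuming ten components that are each a single point of failure, each at four nines:

```ruby
# Ten tightly-coupled components, each 99.99% available, each a single
# point of failure: the platform is only up when all ten are up at once.
COMPONENT_AVAILABILITY = 0.9999
OVERALL = COMPONENT_AVAILABILITY**10 # ~0.9990: three nines, not four

# Expected downtime per year, in hours.
SINGLE_DOWNTIME_H  = (1 - COMPONENT_AVAILABILITY) * 365 * 24 # ~0.9 hours
OVERALL_DOWNTIME_H = (1 - OVERALL) * 365 * 24                # ~8.8 hours
```

One component alone costs you roughly 53 minutes of downtime a year; ten of them chained together cost almost nine hours.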
Outages look something like this: your response time increases. This is a real graph from an incident at the time, in 2014. Something became slow, and as you can see, the response time is almost exactly 20 seconds — something was slow and hitting a timeout of 20 seconds. If all of the workers in your application are spending 20 seconds waiting for something that's never going to return, because it's going to time out, there's no time left to serve the requests that might actually work. So shard 1 is slow, and requests for shard 0 or shard 2 lag behind in the queue, because the requests to shard 1 never complete.
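A back-of-the-envelope model of why this starves the whole platform (the numbers here are illustrative):

```ruby
WORKERS = 100

# Requests per second a pool of workers can serve at a given response time.
def capacity_rps(workers, response_time_s)
  workers * (1.0 / response_time_s)
end

HEALTHY_RPS = capacity_rps(WORKERS, 0.1)  # 100 ms responses
STUCK_RPS   = capacity_rps(WORKERS, 20.0) # every worker waiting out a 20 s timeout
```

The same pool drops from 1000 requests per second to 5 — so even requests to perfectly healthy shards queue up behind workers stuck on the sick one.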
The mantra you have to adopt when this starts becoming a problem for you is: a single component failure cannot compromise the availability or performance of your entire system. Your job is to build a reliable system from unreliable components. A really useful mental model for thinking about this is the resiliency matrix. On the left-hand side you have all the components in your infrastructure; at the top you have the sections of the application — admin, checkout, storefront, the ones I showed before. Every cell tells you what happens to that section if that component is unavailable or slow: if Redis goes down, is storefront up? Is checkout? Is admin? The slide is not what it actually looked like — when we drew this out in reality it was a lot worse, and we were shocked to find out how red it was: how much of Shopify went down when what we thought were tangential data stores like memcached and Redis dragged everything down with them. The other thing that shocked us when we wrote this out was how hard it is to figure out the values of all those cells. How do you do that — go into production and start taking things down? What do you do in development? So we wrote a tool to help, called Toxiproxy. What it does is emulate network failures, at the network level, for the duration of a block, by sitting between you and the component. This means you can write a test for every single cell in the matrix, so that when you flip a cell from red to green — from bad to good — you know no one will ever reintroduce that failure. These tests might look something like: when some message queue is down, I hit this section and I assert the response is still successful. At this point at Shopify we have very good coverage of our resiliency matrix with unit tests, all backed by Toxiproxy, and this is really, really simple to do.
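A resiliency-matrix cell written as a test might be shaped like this. The real setup uses the Toxiproxy client to sever the connection at the network level; here a fake cache client is toggled in-process purely to show the shape of the assertion:

```ruby
# Stand-in for a memcached client whose network link can be cut.
class FlakyCache
  attr_accessor :down

  def get(_key)
    raise IOError, "memcached unreachable" if down
    nil # treat every lookup as a cache miss
  end
end

CACHE_CLIENT = FlakyCache.new

# The cell under test: storefront must treat the cache as optional.
def render_storefront
  begin
    CACHE_CLIENT.get("products:featured")
  rescue IOError
    nil # a cache failure degrades gracefully instead of propagating
  end
  { status: 200 }
end

# Emulate the component being unavailable for the duration of a block.
def while_down(client)
  client.down = true
  yield
ensure
  client.down = false
end
```

Flipping a cell green once, with a test like this, means nobody can silently reintroduce the failure later.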
Another tool we wrote is called Semian. Exactly how all of its pieces work, and how they work together, is fairly involved, so I'm not going to go into it here — there's a readme that goes into vivid detail about how Semian works. Semian is a library that helps your application become more resilient, and for how it does that, I encourage you to check out the readme. This tool was also invaluable for us in being able to write resilient applications.
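To give a flavor of one of the things a library like Semian does, here is a minimal circuit breaker — heavily simplified (no timeouts, no half-open state), with illustrative thresholds, and not Semian's actual implementation:

```ruby
class CircuitOpenError < StandardError; end

# After `threshold` consecutive failures the circuit opens: callers fail
# fast instead of tying up workers waiting on a sick resource.
class Breaker
  def initialize(threshold: 3)
    @threshold = threshold
    @failures = 0
  end

  def acquire
    raise CircuitOpenError, "failing fast" if @failures >= @threshold

    begin
      result = yield
      @failures = 0 # a success closes the circuit again
      result
    rescue StandardError
      @failures += 1
      raise
    end
  end
end
```

Failing fast is what protects the rest of the platform: a worker that raises immediately is free to serve a healthy shard, instead of spending 20 seconds discovering the sick one is still sick.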
The mental model we mapped out for how to work on resiliency was a pyramid. We had a lot of debt at the bottom, because for ten years we hadn't paid any attention to this: the web I talked about before, where components drag everything down with them, was everywhere, and the resiliency matrix was completely red when we started. Nowadays it's in pretty good shape. So we started climbing the pyramid — figuring things out, writing these tools, incorporating them — and when we got to the very top, someone asked a question: what happens if the data center floods? That's when we started
working on multi-DC. In 2015 we needed a way such that if the data center caught fire, we could fail over to the other data center — resiliency, sharding, and optimization had simply been more important for us until then. Multi-DC was largely an infrastructure effort of going from one to n: it required a massive amount of changes to our cookbooks, procuring all the inventory and all the servers, and then spinning up a second data center. At this point, if we want to fail Shopify over to the other data center, we just run a script and it's done — all of Shopify moves. The strategy it uses is actually quite simple, and one that most Rails apps could use pretty much as-is if the traffic is set up correctly. Shopify runs in a data center in Virginia and one in Chicago. If you go to a Shopify shop, you go to the data center closest to you: if you're in Toronto, you go to Chicago; if you're in New York, you might go to Virginia. When you hit that data center, the load balancers inside our network know which of the two data centers is active — say Chicago is the primary — and route all the traffic there. So when we do a failover, we tell the load balancers in all the data centers what the new primary is: if the primary was Chicago and we're moving to Ashburn, we tell the load balancers in both data centers to route all traffic to Ashburn, Virginia. When the traffic gets there, just after we've moved over, any write will fail — the databases at that point are read-only, because we're not going to write in both locations at once; the risk of data corruption is too high. But that means most things actually still work: if you're browsing around a Shopify storefront looking at products, which is the majority of traffic, you won't notice anything; even in the admin you might just be looking at your products and orders, and that's all reads. Meanwhile, we're failing over all of the databases — checking that they're caught up in the new data center, then making them writable. So the shards recover over a couple of minutes; it could be anywhere from 10 to 60 seconds per database, and then Shopify works again. We then move the jobs: when we move the traffic, we stop the jobs in the source data center, move them all over to the new data center, and everything just ticks along. But then: how do we use both of these data centers? We had one data center essentially doing nothing — very, very expensive hardware sitting there doing absolutely nothing. How could we get to a state where we're running traffic out of multiple data centers at the same time, utilizing both of them? The architecture at first looked something like this.
It was shared: shared Redis instances, shared memcached, between all of the shards. When we say a shard we're referring to a MySQL shard, but alongside the shards we had shared Redis, shared caches, and other shared things. What if, instead of running one big Shopify like this that we move around, we ran many small Shopifys, independent from each other, each having everything it needs to run? We call this a pod. A pod holds everything that a Shopify needs to run: the workers, the Redis, the memcached, the MySQL — whatever else there might be, it's all there for that little Shopify by itself. If you have these many small Shopifys and they're completely independent, they can be in multiple data centers at the same time: some active in data center one and some active in data center two — pod 1 might be active in data center two, and pod 2 active in data center one. So that's good — but how do you get traffic to the right place? For Shopify, every single shop has a domain: it might be one we provide, or their own domain. A request comes in to one of the data centers — whichever you're closest to, Chicago or Virginia, depending on where in the world you are — and it hits a little script that's very aptly named Sorting Hat. What Sorting Hat does is look at the request and work out which shop — which pod, which mini-Shopify — the request belongs to. If the request is for a shop on pod 2, it goes to the data center on the left; if it's for another pod, it goes to the right. Sorting Hat just sits there sorting requests and sending them where they need to go; it doesn't care where you land — if you arrive at the wrong data center, it routes you to the other one. OK, so now we have an idea of what a multi-DC strategy can look like — but how do we know it's safe? It turns out there just need to be two rules that hold. Rule number one is that any request must be annotated with the shop, or the pod, that it's going to. Requests to the storefront are on the shop's domain, so they're implicitly annotated with the shop through the domain: from the domain we know which pod, which mini-Shopify, the request belongs to. The second rule is that any request can only touch one pod — otherwise a request would have to go across data centers, and potentially that means one request might have to reach Asia, Europe, and maybe also North America, all in the same request, and that's just not reliable. Again, fundamentally, shops — and requests to a shop — should be independent, so we should be able to honor these two rules. You might think that sounds reasonable — Shopify should just be an application with a bunch of controller actions scoped to a shop — but there were hundreds, if not thousands, of code paths that violated these rules.
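Sorting Hat can be sketched as a small lookup (the tables and names here are illustrative):

```ruby
# Which pod each shop domain lives on, and which data center each pod is
# currently active in. In reality these come from a routing table.
SHOP_PODS  = { "snowdevil.myshopify.com" => 1, "surfdevil.myshopify.com" => 2 }
POD_ACTIVE = { 1 => "chicago", 2 => "ashburn" }

def sorting_hat(host)
  pod = SHOP_PODS.fetch(host) # rule 1: every request maps to a shop/pod
  POD_ACTIVE.fetch(pod)       # rule 2: the request then touches exactly that pod
end
```

Failing one pod over is then just flipping its entry in the active table — the other pods never notice.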
They might look something like this: going over every single shop and counting something, or, when an app is uninstalled, checking whether any other stores still have it installed; queries that reach across multiple stores. When you have hundreds of endpoints violating something you're trying to do, and a hundred developers doing all kinds of other things and introducing new endpoints every single day, that's going to be a losing battle. You can't just send an email, because someone joins tomorrow, never reads that email, and violates the rules. Rafael talked a little bit about this yesterday; he called it whitelisting. We call it shitlist-driven development. The idea is that if you want to honor rules 1 and 2, your job is to build something that gives you a shitlist: a list of all the things that currently violate the rules. If code that is not on the shitlist violates a rule, you raise an error.
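A check in that spirit might look roughly like this. It is a hypothetical sketch with made-up names, not Shopify's actual code: known offenders on the list keep working until they are fixed, while any new cross-pod caller is rejected with an actionable message.

```ruby
# Hypothetical sketch of a shitlist check. The constant, the caller
# names, and the error wording are all invented for illustration.
CROSS_POD_SHITLIST = %w[
  Legacy::UninstallAppJob
  Legacy::GlobalShopCounter
].freeze

# Raise unless the caller either touched a single pod (rule 2) or is a
# known, grandfathered offender on the shitlist.
def assert_single_pod!(caller_name, pods_touched)
  return if pods_touched.uniq.size <= 1
  return if CROSS_POD_SHITLIST.include?(caller_name)

  raise "#{caller_name} touched pods #{pods_touched.inspect}, but a " \
        "request may only touch one pod. Scope the work to a single " \
        "shop, or come to us and we'll help you find an alternative."
end
```

The important property is that the list only ever shrinks: new violations cannot be added silently, and each entry that gets fixed is deleted from the shitlist.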
The error tells people what to do instead. You can't tell people not to do something unless you provide an alternative, even if the alternative is that they come to you and you help them solve the problem. This means you stop the bleeding, and going forward you can rely on rules 1 and 2. Once we could rely on rules 1 and 2, our multi-DC strategy worked. Today, building on top of five years of work, we're running 80,000 requests per second out of multiple data centers, and this is how we got there. Thank you.

Do you have any global data that doesn't fit into a shop?

Yes, we have a dreaded master database, and that database holds data that does not belong to a single shop. There is, for example, the shop model itself: we need something that stores the shops globally, because otherwise the load balancer can't know about them globally. Other examples are apps, which are inherently global and installed by many shops; billing data, because it might span multiple shops; and partner data. There's actually a lot of this data. I'm not going to go through it all, but I did spend six months of my life on it. So the problem is that we have a master database that's indispensable; how did we make it work across multiple data centers? Essentially, we have read slaves in every single data center that feed off the master database, which lives in one of the data centers. If you do a write, you do a cross-DC write. This sounds super scary, but we limited pretty much every path that has a high SLO from writing to it. Billing has a lower SLO because its writes cross data centers. The thing is that billing, partners, and the other sections of this master database are fundamentally different applications, and we're slowly extracting them out of Shopify, because Shopify should be the shop application. And if you extracted them out of Shopify, you would still be doing a cross-DC write, because you don't know where that service runs; so it's not really making the situation worse. It's OK that some of these things have lower SLOs than checkout, storefront, and admin, which have the highest SLOs. That's how we deal with that.

How do you deal with a disproportionate amount of traffic to a single shop?

The diagram earlier showed the workers as isolated per pod. That's actually a lie: the workers are shared, which means a single pod can grab up to 60 to 70 percent of all of Shopify's capacity. What's actually isolated in the pod are the data stores. We want the workers to be fungible, so they move between pods on the fly, and the load balancer sends requests to whichever workers are appropriate. This means the maximum capacity of a single store is somewhere between 60 and 70 percent of an entire data center. It's not 100 percent, because a single store causing an outage is not something we're interested in. That's how we work around that.

How do you deal with large amounts of data, say someone importing 100,000 customers or 100,000 orders?

This is where the multi-tenancy architecture shines. These databases are massive: half a terabyte of memory, many, many tens of cores. So if one customer has tons of orders, that's just fine. And if a customer grows so large that it needs to be moved, that's what the pod rebalancing project is about: moving stores to somewhere with more space for them. So basically we deal with it by having massive data stores that can handle this. The import itself just runs in a background job. Some of these jobs are quite slow for the big customers, and we may need to do more parallelization work, but most of the time it's not a big deal: if you have millions of orders and it takes a week to import them, you have plenty of other work to do during that time. So this has not been a high priority. How much time do I have? Done? OK, thank you.
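The shared-worker answer above implies a cap: one pod may consume most, but never all, of the fleet. A toy model of that cap might look like this; the class, the 70 percent figure as a default, and the check-out interface are assumptions for illustration, not Shopify's load balancer.

```ruby
# Toy sketch of capping the share of a shared worker fleet that any
# single pod may consume, so one hot shop can take most, but never all,
# of the capacity.
class PodCapacityLimiter
  def initialize(total_workers:, max_share: 0.7)
    @total_workers = total_workers
    @max_share     = max_share
    @in_use        = Hash.new(0) # pod id => workers currently serving it
  end

  # Reserve a worker for the pod if it is under its cap; returns false
  # (shed the request) once the pod holds its maximum share.
  def checkout(pod_id)
    cap = (@total_workers * @max_share).floor
    return false if @in_use[pod_id] >= cap
    @in_use[pod_id] += 1
    true
  end

  def checkin(pod_id)
    @in_use[pod_id] -= 1 if @in_use[pod_id] > 0
  end
end

limiter = PodCapacityLimiter.new(total_workers: 10, max_share: 0.7)
7.times { limiter.checkout(1) } # pod 1 reaches its 70 percent cap
limiter.checkout(1) # => false: the 8th worker is refused
limiter.checkout(2) # => true: other pods are still served
```

Refusing the last 30 percent is the design choice the talk describes: a single store may degrade, but it cannot take the whole data center down with it.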