Managing FreeBSD at scale

Video in TIB AV-Portal: Managing FreeBSD at scale


Formal Metadata

Title: Managing FreeBSD at scale
Subtitle: Reclaiming Control of Large Infrastructure Deployment with Puppet
Title of Series
Author: Jude, Allan
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Publisher: Berkeley System Distribution (BSD), Andrea Ross
Release Date

Content Metadata

Subject Area
Abstract
Detailed discussion of ScaleEngine's production implementation of Puppet on FreeBSD to manage many heterogeneous servers across the globe. With 70+ servers at 26 data centres in 10 countries, deployed in a number of different roles (Web Hosting Cluster, HTTP Accelerator, HTTP CDN, Live Video, On-Demand Video, GSLB DNS), our needs cover a large swath of the capabilities of any management system.

It is common for sysadmins to jump straight to cloud providers if immediate scale is required. This unnecessarily reduces autonomy and choice, ceding control over many important components to large corporate providers, such as Amazon or Rackspace. While "the cloud" remains an option, sysadmins should strive to maintain full openness on their systems, avoid vendor lock-in, and regain control of infrastructure deployment.

This talk presents a "full control" look at managing multiple simultaneous FreeBSD deployments around the globe, independently sourced, yet centrally managed. Unlike many common deployments, most of our nodes are physical rather than virtual, and many are on rented machines where we have little control over the selection of hardware and components. The talk also covers a number of tools and tricks that were used and obstacles that were overcome, and shares insights and lessons learned in the process of deploying Puppet. It also covers our system for deploying templated jails around the world as part of our CDN and managing them with our Global Server Load Balancer (as discussed at EuroBSDCon 2012).

Highlights:
* What is Puppet?
* Deploying puppetmaster for scale (using nginx, not passing large files through Ruby)
* Managing config files
* Managing packages (with portupgrade)
* Advanced configuration files with templates
* Creating and using custom facts (FreeBSD-specific facts)
* Deploying jails with Puppet (with ezjail)
* Lessons learned
  - Delivering large files requires some form of offloading
  - Templates are where the power is
  - Puppet is not like scripted deployment; manifests are different (and better)
* Where to go from here:
  - FreeBSD patches for Facter; some of our custom facts should be standard
  - Stored configs or PuppetDB (needs porting), letting hosts know about each other
  - Using Puppet to automatically configure Nagios
So, my name is Allan Jude, and I'm going to talk to you about how I use Puppet to manage a large number of FreeBSD servers. [unclear]
I've been a FreeBSD sysadmin for about 11 years now, and my biggest accomplishment is building a CDN. I didn't set out to build a CDN, but we eventually needed a kind of CDN, so we built it, and today it is more popular than the product it was built to support. I also host a podcast each week that talks about systems and networking. Before that I was a professor at a college, although in Canada colleges are different [unclear]. I learned a lot of Puppet, mostly since last year: Edward Tan published an article in BSD Magazine about how to use Puppet on FreeBSD, and I thought, I should be doing that. I saw his talk at BSDCan, and basically this talk picks up where he left off and goes on from there. [unclear]
So, a couple of things I'll cover: what scaling actually is and why it's important, why I don't trust the cloud, a little bit about what Puppet actually is, how we scaled it, some of the advanced things, the things we learned, and the things we're going to do next. So, basically,
scalability is the ability of your system, network, or process to handle more work than it does now.
But you can't predict how that's going to scale ahead of time. Scaling is always done after the fact, because in order to decide how to scale you need to know which part to scale, and you can't predict that, because users never do what you think they will do; they do something else. So, designing
for eventual scale: if you build for scale up front, it often turns out the pressure point isn't where you thought it was, so you have built all this scale into something that doesn't need it and none into the part that does. If you try to predict where the bottleneck is going to be, most of the time you're wrong. So you design the system with the expectation that it will need to scale, and you assume that you don't know exactly where the problem is going to be. An example is when we started naming our systems: we came up with a geographic naming system, even though at the time we only had one data center, so the data center is part of the name. That has worked very well for us now that we cover 29 different data centers: by looking at the name of a system you can tell where it is, which is something the cloud doesn't always give you. My personal definition of the cloud is when you don't know which state your servers are in. And so, planning, as a
postcard version of the rule, says: please allow 30 days for results to come back. So there are a few different
ways you can scale a system. You can optimize it, which means getting more out of it without spending anything. You can scale up, which means buying a faster system. You can scale out, which means buying more systems to share the work. Or you can use the cloud, which is basically a different way of scaling out. When you optimize a system, what you're doing is making each task on the system take less work or less time, so that there's more time left over to do more work. If you take each step in the process and shave something off it, the time you have left over can be used to do more each second. But if you try to optimize too early, you don't know what to tune, and you just waste time on things that don't matter. There was a talk about benchmarking that made the point that there's not much you have to tune anymore: most of the FreeBSD defaults scale automatically very well. Ten years ago it was very common to be compiling a custom kernel that had a bunch of tuning in it; now that's much less common. Almost everything is a module or a sysctl [unclear], and that's all you really have to change. Scaling up is basically attempting to make the one system do more work. You can get a better processor, or more processors in the same box, or more RAM, or more spindles if you don't have enough disk I/O. But eventually you are limited by the hardware you can buy, and you only have so much money [unclear]. In order to scale up, you have to know which component is the problem: what is stopping you from getting the performance you want, or getting enough performance. So you have to figure that out. Scaling up might work for a minor adjustment: just buy a new processor and run faster. Or you can
scale out, which is mostly what we did: add more nodes. This lets you handle a lot more load; you just buy more servers. It does depend on whether your app can actually scale out: some apps only use one processor on one server [unclear], so this doesn't always help, but mostly it works. It comes at the cost of increased management complexity, because now you have to be able to manage more servers, and that's where the pain comes in. When it was 10 servers, or 20, that was fine, but then we hit the point where it was, we need 10 servers in 6 different data centers in 6 different countries this week. At that point it became cheaper to learn Puppet than to do it all manually. Scaling out also requires some kind of load balancing, which we didn't have already, so we built that, and we did a talk on it at EuroBSDCon. And it requires more hardware, which you have to buy or rent [unclear], or you can use the
cloud, which is basically a kind of horizontal scaling. You can go to Amazon and rent 20 servers, and if you only need them for a couple of hours that may be what you want, but if you're going longer term it turns out to be much more expensive, and there are a number of other costs you don't think about. Basically you're automatically adding virtual machines to your cluster. You still have most of the management overhead you'd get from additional physical machines, but you also have to deal with the fact that the cloud has some API you have to use in order to interact with the machines, so it's more complexity at the same time. [unclear] And you will keep paying for it forever, larger and larger, because you don't
get anything. So if the business
isn't built to scale, it's probably not going to do well. But if you try to predict how you're going to scale too far ahead of time, like a start-up that builds for 400 thousand users before they have one user, then you've spent a bunch of money, probably in the wrong place, and you basically have misguided capacity to support some predicted growth. So rather than scaling on predictions, you need to rely on measurements and monitoring: you need to know what the problem is instead of guessing what the problem might be, and then you can react to actual growth. So, some
of the things that I don't like about clouds: first, the opacity. You can't tell what's happening in the cloud. You have no visibility at all into what they're actually doing, where your data is actually stored, or, when there's a problem, how you find out. You never know there's a problem until people are complaining on Twitter. That was yesterday: around this time one of the Amazon services was throwing 503 errors at an incredibly high rate, and the Amazon status page didn't say there was a problem [unclear]. If I was using Amazon, I wouldn't know that there was a problem until stuff started breaking, and I would have to investigate it myself with no visibility into what's inside Amazon's part. [unclear: a remark about spending 14 hours on an airplane coming to this conference] You also end up with vendor lock-in. If you build in the Amazon ecosystem, there are other clouds, but with most of them you can't just pick up your stuff and move. It's very difficult, because the APIs are different and the abstractions for storage are different: there's S3 on Amazon, but Rackspace Cloud Files is entirely different; there's EBS for block storage, and the others have something else. So if you want to move, there's all this extra cost of redeveloping everything, and you end up being stuck. And there's the price: basically, for us it was cheaper to do it ourselves. The marketing people talk about cloud economics [unclear], but if you leave an Amazon EC2 instance on for the whole month, it actually costs more money than renting a server [unclear]. Consider too that with Amazon you pay for every bit of bandwidth, whereas when you rent a server they give you some included, so that makes a difference. And then you have to consider how much the loss of visibility is worth: you can't tell what's happening with your stuff. Somebody was doing some performance testing on Amazon with like 200+ instances, and they found that certain instances were really slow. The
problem was that there was something wrong with the physical machine underneath, or something like that, and they couldn't know. It showed up as noise in their performance tests. They could throw the machine away and ask for a different one, which would land on a different physical machine and not have the problem, but they can't predict, when they spin up a machine, whether it's going to be stuck on hardware that is failing or has some other problem. It is a shared system: when you're on an Amazon cloud or whatever, there are other people on the same physical machine using some of the same resources. If they hog them all, then you have performance problems, and you can't tell what other people are doing. And there's the risk if you use some cloud other than Amazon: how many of them went out of business last year? So you're running a business, all happy, and then [unclear]. So,
[unclear: a remark about the image on the slide; the slides will be available later.] The graph is still small, but basically this is the throughput they got from EBS, and rather than being mostly steady it is all over the place. On some tests, on some instances, it is consistent, very straight, while this one is just all over the place, because what other people are doing is affecting what you get. [Audience question, unclear.] This is not my graph [unclear]. Basically they were doing performance testing on Amazon, in this case of EBS, the Elastic Block Store. They spun up 200 instances, ran standard performance tests with reads and writes, recorded the throughput, and graphed the throughput against time. Rather than getting some consistent level, there are wild fluctuations, and some instances would just stop. [Answering a question about the legend:] this one is a small instance using one Elastic Block Store volume, this is a small using four in RAID 0, this is a large with a single EBS volume, and this is a large with four EBS volumes [unclear]. The website has a much higher resolution version of the graph and an explanation of the testing. So,
among a number of other reasons why we don't use the cloud: it basically reduces our autonomy. We are tied to some vendor and have to rely on them for more than if we ran the hardware ourselves. Also, other than Amazon, where you can kind of run FreeBSD, most other clouds don't offer FreeBSD, and that's a deal breaker. Also, virtualization usually has a performance impact [unclear]. In particular, virtualization is usually really bad at networking: the throughput you can get on a virtual network isn't great. There was another talk about that yesterday. Given that we're doing video streaming and we need to push a gigabit or more out of each machine, that's a problem, especially when it's not predictable. We can't say this machine can do this many megabits, because with virtualization how much throughput you have depends on how busy the machine is doing other people's work. If we can't predict how much capacity we have, we can't load balance based on it. And high-bandwidth networking costs more. If you get one of the first-generation m1 instances from Amazon, you get a dual-core processor and seven and a half gigs of RAM. I think that's the current price [unclear], but it works out to 162 dollars for every 30 days [unclear], and that's an old, like 2007-era, Opteron processor that you're getting through virtualization. With the second-generation m3 instances, which are the ones that you can run FreeBSD on without the Windows tax, you get 15 gigs of RAM and a quad-core processor, and that's 360 dollars for 30 days. You can get slightly cheaper reserved instances, but you basically have to pay up front for a year, which is a lack of flexibility. If we rent hardware from one of our various providers, on average [unclear] a quad-core with 16 gigabytes of RAM is 189 dollars a month, and that includes terabytes of traffic, which we wouldn't get from Amazon. Or from somewhere like the sponsors of the conference here, we can get a dual-processor
quad-core Xeon with 256 gigs of RAM for 389 dollars, as opposed to 360 dollars for 15 gigs of RAM, and the performance stays predictable [unclear]. So why does that matter?
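The price comparison above can be made concrete with a little arithmetic. The following is only a sketch using the three price points quoted in the talk (360 dollars for 15 GB, 189 dollars for 16 GB, 389 dollars for 256 GB); the labels are descriptive placeholders rather than product names, and it ignores CPU, bandwidth, and the 30-day versus calendar-month difference.

```python
# Price points quoted in the talk (USD per 30 days / per month); treated as approximate.
offers = {
    "cloud instance (quad-core, 15 GB)": {"price": 360, "ram_gb": 15},
    "rented server (quad-core, 16 GB)": {"price": 189, "ram_gb": 16},
    "rented server (dual quad-core, 256 GB)": {"price": 389, "ram_gb": 256},
}


def cost_per_gb(offer):
    """Monthly price divided by the amount of RAM, in dollars per GB."""
    return offer["price"] / offer["ram_gb"]


for name, offer in offers.items():
    print(f"{name}: ${cost_per_gb(offer):.2f} per GB of RAM")
```

On these figures the cloud instance costs 24 dollars per GB of RAM, against roughly 12 dollars and 1.50 dollars for the two rented machines.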
Basically, at ScaleEngine we do HTTP delivery and video streaming. We have about 80 [unclear] physical machines in 28 different data centers in different countries. There's an insert in your [unclear] that has a list of all the locations. If you add it all up, we can push about 50 gigabits a second from the machines we have, mostly because we never try to push too much out of any one machine, because with video, contention is a killer. Compare HTTP: when you have more people and more demand than you have bandwidth, and people are downloading over HTTP, the download slows down and they don't really notice. But with video, if you add one more user to a server and that causes contention, all of a sudden all one thousand users trying to watch video are buffering, and they [unclear]. So if we have any contention at all, it hurts not just the new customer but everybody who is streaming from that box. So we purposely never load a box beyond 65 percent of its capacity. Part of the reason for that is that our DNS is slow to react: it takes about 5 minutes before the system stops sending users to a machine, because of the DNS TTLs, so we have to cut off early to make sure we don't overload it. All the boxes are running FreeBSD 9 [unclear]. A lot of them have UFS, but we're transitioning to having all the new ones with root on ZFS. Part of the reason for that is our Varnish and nginx delivery [unclear]: the nginx cache is basically a bunch of hashed directories with about 12 million files, and ZFS does a much better job of that. Also, because of the 12 million files, in the case of an unexpected reboot these machines would fsck for an hour or more, whereas with ZFS we don't have that. We now manage all the servers with Puppet, and we also make extensive use of jails [unclear]. Over the last year, according to our billing database, we served 80 billion HTTP requests totalling over 500 terabytes of traffic, and [unclear] 2 petabytes of video. So why
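The 65 percent ceiling described above can be sketched as a simple eligibility filter that a load balancer might apply before handing out a node. This is an illustration only, not ScaleEngine's load balancer: the node names, traffic figures, and the `eligible_nodes` helper are all hypothetical, and only the 0.65 threshold comes from the talk.

```python
MAX_UTILIZATION = 0.65  # ceiling quoted in the talk; everything else here is assumed


def eligible_nodes(nodes, max_utilization=MAX_UTILIZATION):
    """Return the names of nodes that may still be handed new viewers.

    `nodes` maps a node name to (current_mbps, capacity_mbps).
    """
    return [
        name
        for name, (used_mbps, capacity_mbps) in nodes.items()
        if used_mbps / capacity_mbps < max_utilization
    ]


# Hypothetical host names and traffic figures.
nodes = {
    "node-a": (400, 1000),
    "node-b": (650, 1000),
    "node-c": (900, 1000),
}
print(eligible_nodes(nodes))
```

Stopping well short of 100 percent leaves headroom for the users who keep arriving during the roughly five minutes it takes for cached DNS answers to expire.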
use Puppet? Basically it allows us to quickly scale out and still own our own infrastructure. We can rent another server somewhere and usually get it set up in a couple of hours: we run Puppet on it, it pulls down our entire infrastructure and sets it up, and the box is in production. To do that we had to deploy the puppetmaster at scale. If you just install Puppet from the ports tree, it won't scale, so I'm going to walk through what we had to do to make it work. We also use custom facts: basically we extract some specific information from FreeBSD, and partly from our own infrastructure, to make decisions about how to configure a machine. Puppet has a lot of built-in facts, but they are mostly Linux-specific [unclear]. Instead of just using Puppet to deploy config files, we use templates, and basically customize the config files for each machine based on the facts that we extract. We also use Puppet to manage packages. Currently we use portupgrade's portinstall to install from ports, building everything we need on each individual server. It takes about half an hour, which is not really a big deal for us, but we are switching to packages [unclear]. We also use Puppet to manage our ezjails. Basically we have a little recipe, and we can install a jail and deploy it, because we just stamp out these video servers. Most machines have between two and [unclear] jails; basically each role is packed up into a jail, so it depends on how many roles the machine has, based on how much RAM it has and so on. Our original product, which was scaling PHP [unclear] across FreeBSD servers, used jails much more [unclear] a couple of jails
for [unclear]. So, what is Puppet? Basically it's configuration management. You describe what the server should look like, and Puppet analyzes the machine, finds the deltas, and does what has to be done so the machine looks like the manifest. So basically you make a simple declarative manifest that describes the machine. Compared to our old approach, which was to write a couple of scripts and run them on each machine, it will retry when one of the steps fails, or it will only do the next step if the previous step actually worked, and things like that. So compared to scripting it's much better to have a manifest, because it will also notice down the road when all of a sudden one of those conditions isn't true anymore. We use it to manage packages: all of our web servers need to have the latest version of nginx installed. And users: I need my administrative user on every machine so I can log in, and those all need to have my SSH public keys so I can log in. And we deploy other files and things like that. One of the other things is that it's bundled with a program called Facter, which basically extracts facts about the machine and stores them [unclear]. It also uses SSL certificates to verify the identity of the client before serving config files, so not just anyone can walk up and download our configs; you can't if you don't have an SSL certificate signed by my certificate authority. As I mentioned at the
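The "describe the machine, find the deltas, fix them" idea can be illustrated with a toy model. This is a sketch of the declarative principle only and is not how Puppet is implemented; the resource names and states below are invented for the example.

```python
def plan_changes(desired, actual):
    """Compare desired and actual state and list the resources that differ.

    Each entry is (resource, current_state, wanted_state).
    """
    actions = []
    for resource, wanted in sorted(desired.items()):
        current = actual.get(resource)
        if current != wanted:
            actions.append((resource, current, wanted))
    return actions


def apply_changes(actual, actions):
    """Return a new state with every planned change applied."""
    new_state = dict(actual)
    for resource, _current, wanted in actions:
        new_state[resource] = wanted
    return new_state


# Hypothetical resources: what the "manifest" asks for versus what the machine has.
desired = {
    "package:nginx": "installed",
    "service:nginx": "running",
    "user:admin": "present",
}
actual = {"package:nginx": "installed", "user:admin": "absent"}

actions = plan_changes(desired, actual)
converged = apply_changes(actual, actions)
print(actions)
print(plan_changes(desired, converged))  # a second run finds nothing left to do
```

The second call is the point of the comparison with scripts: running the same description again changes nothing unless the machine has drifted away from it.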
beginning, this talk doesn't cover the very basics of setting up Puppet. Edward Tan did a talk on that at BSDCan, and there's video of that, and he also put an article in BSD Magazine early last year. In addition, we use two macros from the Puppet wiki. shell_config is basically a macro for adding, you know, nginx_enable="YES" to rc.conf, and for setting variables in the various shell-style files that make up FreeBSD, so we use it for values in loader.conf, sysctl.conf, make.conf [unclear]. The other one does basically the same thing but uses the rc.conf.d system, where you basically have a separate file for each service you have enabled, so that all of a service's variables are in one file [unclear], and it makes it easier to manage. So, the
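What such a macro has to do can be sketched in a few lines: make sure a shell-style file contains a given KEY="value" assignment, changing nothing if it is already there. The function below is an illustrative stand-in written for this transcript, not the wiki macro itself, and the variable names in the example are only samples.

```python
import re


def set_shell_var(text, key, value):
    """Return `text` with KEY="value" set, replacing any existing assignment of KEY.

    If KEY is not assigned anywhere, the assignment is appended as a new line.
    """
    line = f'{key}="{value}"'
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A callable replacement avoids backslash processing in `line`.
        return pattern.sub(lambda _match: line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


# Sample contents of an rc.conf-style file.
rc_conf = 'hostname="web1"\nnginx_enable="NO"\n'
updated = set_shell_var(rc_conf, "nginx_enable", "YES")
print(updated)
```

Applying the same call to `updated` returns it unchanged, which is the property that lets a configuration run be repeated safely.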
The first thing you have to do to scale Puppet is replace the web server. The puppet master is basically a Ruby script, and it uses WEBrick, a little Ruby library that provides a single-threaded, fairly useless web server. It's great for testing and making sure Puppet is working, but before you start having machines pull things from it you need to replace it. One of the options for that is Mongrel. There's a little checkbox option for it in the port, which doesn't actually do anything useful anymore. Mongrel is basically the FastCGI approach to Ruby: you have a bunch of workers, each one listening, and you just load balance across those. Each one is only single-threaded because it's running Ruby, but you can basically have a bunch of separate workers, and you have a front-end web server proxy back to them and they execute and return the result. After Puppet 2.7 they basically decided to deprecate that: the bits inside Puppet that made Mongrel work are gone since they switched from Rails to Rack, so basically it's not the best option anymore. Also, the pool of workers that would run the code was a fixed size: I would spawn five Mongrel instances and then load balance across them from the web server. The other option, the newer one, is Passenger. It's now basically a module for nginx, so when you compile nginx from ports, in the options list you'll see Passenger. It basically provides dynamic workers, and it's built into nginx. Where Mongrel is the FastCGI approach, this is more the module approach: the web server forks a bunch of workers and talks to them directly rather than over a pipe or TCP. And since the workers are dynamic, you can say spawn up to twelve, but they don't all need to be running all the time. So on the original setup, which was Puppet 2.6, we used Mongrel, but when we upgraded to 3.1 I switched to Passenger because it was not clear how to make Mongrel work. But the
biggest challenge was the way we deploy the jails. Basically we have these jail archives, like 700 megabytes, that have everything in them for the jail, and we have Puppet deploy that to each server, so they pull it from the puppet master, right. Because the jail archive has our license for the video server and a bunch of other stuff in it, we don't want it to just be lying on an HTTP server. To deliver files with Puppet you're basically running them through Ruby: Ruby is reading the file and writing it out to the socket, and that's slow and not good, especially when you have Mongrel where you only have five workers. That means if you've got five new machines pulling, they're tying up all the workers, and now the sixth machine can't check in and get its manifest. So to avoid that problem, basically we wrote some nginx config that says: when a client asks Puppet for a file, because we have nginx as the web server in front and it calls up to Passenger behind it, we can say for certain files, just deliver them directly from nginx, from the disk. But we need authentication. Because Puppet is using SSL certificates, and each client has one signed by our certificate authority, we can have nginx verify that by pointing it at our certificate authority. That way nginx delivers our jail files only to authorized clients, without doing it through Ruby. So this is basically the
nginx config to do this, and you can download the slides after. This is what you have in a stock setup, and then the next slide shows the specific things we wrote. So in the production environment, when you ask for the file_content directory, we just point straight to where those files live on our puppet master. And then we have a second one here for modules: obviously Puppet has modules, and the files for a module live in a different place, so we have a little regular expression here that accepts the module name and then points at that module's directory. There's a small rewrite, because the URL doesn't really match the file system, so a little rewrite rule fixes that up. This way we can serve the files from our files directory, and the files that are in modules, directly without going through Ruby. That way we're not blocking a limited number of Ruby instances, and we're not using Ruby to send files when nginx can just use sendfile in the kernel.
This introduced one specific problem in the nginx config: we have ssl_verify_client on, meaning nginx will not talk to you unless you have a signed certificate. So the problem is when you create a new
Puppet agent on a server, it connects to the puppet master and says, I would like to submit a certificate signing request, and nginx says, I won't talk to you, you don't have a signed certificate. There were two options. We could configure ssl_verify_client to optional and basically say, let them connect anyway, and then in each location we'd have to have checks: for these things you have to be verified, for these things you don't. Normally Puppet takes care of all that, but we've worked around Puppet in a number of places, so we'd have to do our own access control. So rather than that, we created a second server block on the normal port plus one, and there we have ssl_verify_client set to optional, but we don't have our workarounds, so everything goes through Ruby. So on the first run of a new Puppet agent, as part of getting its first certificate, we set the option that says the certificate authority is on a different port. It will use that port to get its certificate, but it will still pull files over the regular port. Part of the reason for this is that in Puppet you can have local puppet masters set up, but you can only have one certificate authority, because it's the authority. So in addition to this being useful for the access control workaround, you can have every puppet master proxy this port back to your single certificate authority, right. So we have a puppet master in North America and one in Europe, but the one in North America is set to be the certificate authority. If you connect to the one in Europe on this port, it proxies you through to the certificate authority, the one in North America, but for the regular port it's the local one. With this we can use our geo-aware DNS, and our clients just go automatically to the right one. This way, when we deploy servers in Europe, we don't have to point them at a different host.
And so, config templates instead of just static files. Instead of just pushing out, you know, a sysctl.conf to each machine that has a bunch of tunings in it, we can analyze each machine and decide how to write that file. Basically we can incorporate facts, information we pull out of the machine, and in our node definition in Puppet, where we say this machine needs these roles and so on, we can also define some of our own variables. So this is the node definition
for one of our machines. I prefix all of my own variables. So we have a machine in Chicago: we give it its fully qualified domain name, then we specify the location, which we use to group machines, and we have this identity that we use to identify the server. And then we tell Puppet which IP addresses on this machine are for what. For Varnish we have an array, so that we can handle more than 25 thousand concurrent connections, and then there are addresses for nginx and things like that. For load balancing we also specify at what traffic level this machine should stop accepting new traffic, and at what traffic level it should be ready to take more. And then we include our various roles: an edge server has Varnish, an nginx server has nginx. For some things we needed to create custom facts, because either the stock fact didn't understand FreeBSD or there was no fact for what we were trying to do. So we extract things like how much memory is installed, right, so it asks sysctl how much RAM the machine has, or how many CPUs this machine has.
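The custom facts described above come down to running `sysctl` and turning its output into a number. Here is a minimal sketch in plain Ruby, not the facts from the talk: the helper names are hypothetical, and the use of the `hw.physmem` and `hw.ncpu` sysctls is an assumption. In a real custom fact this logic would sit inside a `Facter.add` block with `setcode`.

```ruby
# Hypothetical helpers: parse raw `sysctl -n` output into fact-like values.
# On a FreeBSD host the raw strings would come from commands such as
# `sysctl -n hw.physmem` and `sysctl -n hw.ncpu` (an assumption, not taken
# from the talk's slides).

# Convert the byte count printed by sysctl into whole megabytes.
def physmem_mb(sysctl_output)
  Integer(sysctl_output.strip) / (1024 * 1024)
end

# Convert the CPU count printed by sysctl into an integer.
def cpu_count(sysctl_output)
  Integer(sysctl_output.strip)
end

# Sample strings stand in for real command output so the sketch runs anywhere.
facts = {
  'memory_mb' => physmem_mb("8589934592\n"),
  'cpu_count' => cpu_count("8\n")
}
puts facts
```

Keeping the parsing separate from the command execution makes it easy to check the arithmetic on a machine that is not running FreeBSD.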
We use ERB, which is like a Ruby markup language, and we take that information and use it. So we have a little loop: for that list, where we're passing an array of different IP addresses that Varnish should listen on, we loop through them and add ":80" to the end of each and join them in a comma-separated list, without putting a comma after the last one. That was interesting for me because I had never written anything in Ruby before. So this is actually the Varnish command line arguments: it points to a config file, and then it says for storage we want malloc, with an area of 25 percent of whatever RAM the machine has, right. So it looks at the fact: this machine has 24 GB, so use a quarter of that; this machine only has 8, so use a quarter of that, instead of having to define the size. And then we use that identity to identify the server. And then we also do things like create the number of thread pools based on how many CPUs the machine has, so if it has so many cores we create that many pools, and then for how many threads go in each pool we take our maximum thread count divided by the number of CPUs. This dynamically sizes how many threads Varnish uses on each machine. It's important to use these parameters because we're renting from a bunch of different companies and no two machines are the same. Even when renting from the same company, when we come back six months later for another machine, it's a different spec. So this basically allows us to adapt to the fact that all of our servers are different.
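The template logic described above can be sketched with Ruby's standard `erb` library. This is an illustration under stated assumptions, not the template from the slides: the variable names, the sample addresses, the thread budget of 4000, and the simple `key=value` output format are all made up, and the listen port is assumed to be 80.

```ruby
require 'erb'

# Hypothetical inputs: in Puppet these would come from the node definition
# and from facts. All names and values here are made up for illustration.
listen_ips  = ['192.0.2.10', '192.0.2.11']
memory_mb   = 8192   # total RAM, as a custom fact might report it
cpu_count   = 8      # number of CPUs, from a fact
max_threads = 4000   # assumed overall thread budget
port        = 80     # assumed listen port

# The single-quoted heredoc keeps Ruby from interpolating "#{...}" early,
# so the expressions are evaluated by ERB at render time.
template = ERB.new(<<~'TEMPLATE')
  listen=<%= listen_ips.map { |ip| "#{ip}:#{port}" }.join(',') %>
  storage=malloc,<%= memory_mb / 4 %>M
  thread_pools=<%= cpu_count %>
  thread_pool_max=<%= max_threads / cpu_count %>
TEMPLATE

rendered = template.result(binding)
puts rendered
```

Using `map` followed by `join` avoids the trailing-comma problem mentioned in the talk, since `join` only places the separator between elements.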
Our solution for installing packages is to use portupgrade and portinstall, but we also deploy pkgtools.conf, which basically has all of the options that you would normally define in the dialog, right. So we say nginx with Passenger, nginx with caching, nginx with SSL, and all the options for all the packages we use are set in that file. Originally I was putting the, you know, saved options files under /var/db/ports, but that doesn't work if your ports tree is newer, right: if the port is newer than the options file then it asks, and you end up with the dialog. In Puppet the source files can be named differently on the master, so we actually have a few different pkgtools.conf files based on different roles, for example one for edge servers, and Puppet names it pkgtools.conf on the machine. And then we create a class for
each package. In this case it's DenyHosts, which is a daemon that monitors your auth log and blocks you with TCP wrappers from connecting to SSH if you're being a bad person. So it really just says: install the package, and we also deploy denyhosts.conf, which comes from our repository. But this file depends on the package, so it won't install this .conf file until the port is already there. And then we use our macro and we basically enable DenyHosts in rc.conf so it'll start up. And then we define the service and we say ensure that DenyHosts is running. So every time the agent checks in, which by default is every 30 minutes, it double-checks: is DenyHosts installed, is it running, is it enabled? And we also subscribe it, like he mentioned in his talk: when you create a subscription, if we ever change that config in Puppet, it will refresh the file, and it knows the service depends on it, so it restarts the service so that it picks up the new config. I have a package for ezjail. This is also pretty simple, but it also takes care of the initial setup, right. So we have the regular stuff, ensuring that ezjail is installed, but then we also have the exec type, to just run a command, the same as with a shell. We run ezjail-admin install and then whichever version of the OS we're using on that particular host. And then we tell Puppet what directory that creates. So before it ever runs this command it checks: does that directory exist? If it doesn't, it runs the ezjail-admin install command, and when it checks later that directory does exist, so it doesn't run it again. So it basically makes shell scripts deterministic. And we say that it requires the ezjail package, so this won't be run until ezjail is installed, or if ezjail has been uninstalled for some reason. In Puppet nothing happens in any specific order unless you create these interdependencies to create an order. And then we enable ezjail.
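The "tell Puppet what directory that creates" guard described above can be sketched in plain Ruby. This shows the idea only, not Puppet's implementation: the helper name `run_unless_created` is hypothetical, and a temporary directory stands in for whatever the real install command would create.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring an exec with a "creates" guard:
# run the block only when the target path does not exist yet.
def run_unless_created(path)
  return :skipped if File.exist?(path)
  yield
  :ran
end

Dir.mktmpdir do |root|
  # Stand-in for the directory the real install command would create.
  target = File.join(root, 'installed')

  # First pass: the directory is missing, so the "command" runs.
  first = run_unless_created(target) { Dir.mkdir(target) }
  # Second pass: the directory exists, so nothing is run.
  second = run_unless_created(target) { Dir.mkdir(target) }

  puts "first pass: #{first}, second pass: #{second}"
end
```

Because the check is on the result rather than on whether the command was run before, the same definition can be applied on every agent run without repeating the work.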
And this is where all the magic happens: the class for creating jails. In a node configuration you just say, I want an ezjail with this setup. It takes the name and the IP address of the jail, optionally the jail archive, which is a feature of ezjail so that you can stamp out a bunch of the same jail, and you can specify an alternate root directory if you don't want /usr/jails. This class is actually bigger than this, but I've cut some of the stuff out: there's an if case for whether or not there's an archive, so we don't include the -a flag if there isn't one. Then it creates the jail, and the jail root, the jail directory, is what this creates, so this way it can tell whether that jail is installed or not. And we say that creating a jail depends on having ezjail installed, having ezjail set up, right, which is its own basejail, and having a copy of the archive, so it has to download that ezjail archive from our repository before it can try to create the jail. And it notifies the ezjail service every time this is changed, so that basically it'll start the jail once it's finished. And this is the file definition that tells Puppet to download our archive, which is where the nginx override comes in. And then we create a service definition for each jail, right, to say make sure it's running. The rc script starts the jails that are enabled, but the service command can't start these jails individually, so we had to teach it how to do that, right. So I defined it to call ezjail-admin for start, stop and restart. And there's a little trick for finding out if a jail is running: basically it uses the ezjail-admin console command to run /usr/bin/true, so we get a 1 or a 0. It will return an error if the jail isn't running, or return true if it is. And in the node configuration we just say, I need an ezjail called this, with this IP, that uses this archive, and it'll spin it up. So how we basically
get our servers: we have our own rack for one set of servers, but it doesn't make sense to have our own rack in 28 data centres in countries where we've never been. So we basically rent servers from a bunch of different places. One of the advantages of this is that we have transit from a large collection of providers as well, so we're not dependent on one place or on transit from one place. It also means that we can add servers quickly: one of our providers in Europe has servers sitting in the rack, and a robot gives you access within 5 minutes, and you can install FreeBSD and have a server running in no time. It also gives us a predictable monthly cost, and as we add more servers we can sometimes get a volume discount. And it allows us to choose our disk and memory size and adapt to where the demand is. At one point we needed more RAM, so we cancelled some servers that didn't have a lot of RAM and rented ones that had a lot, and then our needs changed: now we need disk instead of RAM. So for the servers that had plenty of capacity on RAM but didn't have enough disk, we could get rid of those and rent different ones that have a lot of disk. This gives us the flexibility to adapt our inventory based on demand, whereas if we had bought the physical machines it would be a lot harder to do that. And we can provision a server in under 10 minutes at some providers. But the
big downside of the cloud is that you basically lose control over geography: a lot of the time you can pick a general region, but you don't have much choice. You don't have any choice of transit provider, and you don't know anything about the physical hardware or what's happening outside your VM. I was told that depending on what type of node you buy at Amazon, you get a different version of the hypervisor: a Linux node is like Xen 4.1, an old Windows node is 3.3, and the newer ones are much newer than that. So it's hard to predict what you're going to get with virtualization, but when you rent hardware you know exactly what you're getting. But most importantly, it allows us to use FreeBSD. So some things that we
still need to do to improve our setup. Switching to pkgng is the first: set up a package builder and build all of our packages with that. That would take 20-plus minutes off our deployment time and give us a little more control over what's installed where, and it would definitely help with upgrading the packages on the servers. Our approach now is just to remove all packages and let Puppet reinstall them fresh; luckily it only takes 25 or so minutes. We're also looking at ways to automate deployment of root on ZFS. Unlike the traditional shop we can't really PXE boot, because we have one server at this provider in Germany, so there's nothing for it to PXE boot from. So we're looking at different ways we can plunk down FreeBSD with ZFS. Not all of our providers give us IPMI or another way to manage the hardware, so a custom ISO or an image would only be great if that is supported by the provider. One of my ideas was basically to create a small image with a very small ZFS pool, dd it onto the drive, and then use gpart resize to fill in the rest of the drive, and we'd just do that for each drive. That image would have the OS we want already installed and everything. The advantage of that approach might also be that if the provider only offers a standard OS install, we could just overwrite it in place instead of having to try to get IPMI access. I would also like to look at auto-tuning our nginx cache sizes. Right now we say it's only allowed to use this much disk space for cache files; I'd possibly like to change it so that's automatic, based on how much disk space there is on the machine. ZFS makes this easier and harder at the same time. The nature of the pooled environment makes it difficult: basically, df lies about how big the disk is, because it shows the size as being the used size plus the free space, and so it changes constantly. That also makes for bad facts in Facter, because it changes over time. So we might be looking at a specific ZFS dataset that has a quota, and that
would solve that. And we'd also like to auto-tune loader.conf to limit the ARC size based on how much RAM the server has, because otherwise we'd end up with memory pressure: we have this video server that needs a lot of RAM, and Varnish is using a quarter of the RAM in the box, so we need to limit the ARC size to make sure it's not putting pressure on Varnish. And one of the Puppet-specific things
we'd like to look at is some FreeBSD patches for Facter. A lot of the facts, like the ones for memory, are provided for almost every OS, including OpenBSD, and the way to determine how much RAM is installed in a machine on OpenBSD and on FreeBSD is actually exactly the same. So by adding one case statement in the Puppet repository, all those facts would work on FreeBSD as well. There's a bunch of little things like that where I suppose I just need to make some GitHub pull requests, and then we'd have better FreeBSD support, but somebody needs to do it. Stored configs are a big thing we're looking at. Basically this means the puppet master has a copy of the facts from each of the agents. They had an ActiveRecord way of doing that with a database, but that's deprecated in 3, and basically the only way to do it now is PuppetDB, which is a big Java thing that didn't have a working port last time I looked. We'd like to use that because, in addition to better inventory management, it would allow us to do exported resources. So every time we add a new server or service, it would actually create a corresponding monitoring check in Nagios, or whatever we replace it with as the monitor. This allows you to configure one server based on the facts of another, and you can't do that without storing the facts. It also allows you to manage SSH known hosts: every one of our boxes could know the public key fingerprint for every other one, so that when you SSH around you know that it's the right machine. I would also like to look at using that tool, I don't know how to pronounce it, that others use for managing config files. So instead of using those shell macros to check if this line exists and then set it, and other things that modify files in place, this is designed specifically for rewriting config files. Any questions? I hadn't looked at anything like that, and I didn't know what worked on FreeBSD.
The only real problem was delivering the 700 megabyte file, so we worked around that.
That's the best example I could think of for what you can do with stored configs. I haven't used it yet, so I don't have an example.
Some of the stuff on the non-edge servers is still done by hand, but yes, our mail server is a jail. There's only two mail servers, but yes, we manage them with Puppet; every host has Puppet. It's just that because we don't have an ezjail archive of the mail server, we can't just stamp them out.
Not really. We don't have much sensitive data on the machines: mostly these are edge servers that just cache JS files and videos. But we should have a better defined process for destroying that data before we return the machine. With IPMI it's easy to do that; for the ones that don't have proper remote access it's harder.
All of our Puppet config is under version control, and we have branches for QA and production and we merge between them. We have a couple of jails that we use to test the deployment before moving forward.
Also, jails can have Puppet installed inside them as well. Puppet actually treats a jail as a separate node, so that jail has its own public IP address, has DenyHosts installed as well, and so on.
We have FreeBSD machines, but Puppet has better support for OpenBSD than FreeBSD because somebody wrote and handed in the patches. A lot of things are the same between OpenBSD and FreeBSD, so adding a second case above or below the OpenBSD one would make it work on FreeBSD. So there's a lot of really low-hanging fruit for patches that I really should be submitting.
There are macros on the wiki that we use for a lot of that.
A lot of the time we rent machines where we don't have IPMI access, but for the ones where we do, yes.
That's how we treat the packages: just remove everything and let Puppet put everything on fresh, so you don't have version conflicts.
That's what we're trying to come up with: a ZFS image we can plunk down to make it faster.
We just use present rather than latest, and then periodically just remove all the packages and reinstall.
Puppet has classes to write out Nagios files, but it requires storing the config of every server on the master so that you have the variables you need, and that requires a database. The old way, where they used a database through Ruby's ActiveRecord class to just talk to any MySQL server, is deprecated, and the only way to do it now is with PuppetDB, which is a Java thing and isn't ported to FreeBSD. So we're not auto-configuring Nagios yet. It's something we have to figure out how to make work. Any other questions?

