The cluster refit


Formal Metadata

The cluster refit
Title of Series
Nielsen, Simon L. B.
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Berkeley System Distribution (BSD), Andrea Ross
Release Date

Content Metadata

Subject Area
A way to do project infrastructure, and a way not to; or the cluster before and now. The FreeBSD project is rather old and as such has had the infrastructure for running the project, such as CVS, mail, and web servers, for a long time. The basic setup had been the same for more or less 10 years, with the result that it was very complicated, had many inter-dependencies and, of course, no documentation on how it was set up. Security-wise, the old setup was out of date with current practices. In 2012 we had to move from one datacenter to another, and in the process it was decided to redo the setup more or less from scratch, with the goals of making the setup simpler, more robust, segregated, secure and basically something that didn't cause the administration team to lose sleep. The presentation will, for historic reference, present the old setup as an example of how not to have a cluster set up in 2012, and show how we decided to set up the new one to meet our goals. The new design partitions the network and hosts heavily and uses FreeBSD for everything where possible, including routers running FreeBSD 10-CURRENT. The impact of the November 2012 compromise on the infrastructure will also be discussed, along with what was learned from it.
So I'm going to talk a bit about the FreeBSD cluster, which is running all the basic infrastructure for the FreeBSD project, and about how we've been working over the last year or so on cleaning up a whole lot of old mess. I'm part of the FreeBSD administration team, and what I'm talking about is something I do in my spare time [unclear]. So, first,
I'll do some introductions: general background that's required for anything else to make sense, and what the administration team actually does, which is more than one would expect once you start to list everything you can remember. Then I'll go on to talk about the old setup, with the horror stories of how you should not run a setup of this size in 2012. Then comes the main part, about the new setup and what we did to fix the old stuff. I'll conclude with a part about the November 2012 security incident, where a number of our systems were compromised.
So first: historically we've always called the FreeBSD developer systems "clusters", but it's not really a cluster in the normal sense of the word. It's not running one coherent task and it's not a big parallel cluster; it's a bunch of systems with different roles [unclear]. When we say "the FreeBSD.org cluster", we generally mean the systems that we're running at a particular data center to support the FreeBSD project. I'm going to be focusing on our primary cluster at Yahoo!, which is where we run all the primary services like mail and the version control systems. We also have secondary sites, with clusters at ISC and at NYI, New York Internet, in New Jersey, and we're just getting started with having some systems at [unclear]. Finally, as a rather different entity, we have a cluster hosted by Sentex in Canada, the netperf cluster. It's a very different beast in that it's used for testing, development and playing around. It was originally for the network performance project, but it has also been used later as a place where we can safely give developers access to hardware, so they don't have to have it running at home [unclear].

The work that I'm presenting today, both the past and the new cluster, was done by many different people. Most of the new cluster work I'm talking about [unclear]; I acted as the person who complained and said "we should not do it this way, we should try to do it this way", and I also did some of it myself.

So, the administration of the cluster. It's not really structured in terms of exactly who is doing what. There are a number of teams which are more or less, and mostly less, strictly defined in what they're doing, and between the different teams there is some overlap in members. The main team is called clusteradm; this is mostly what I refer to as the FreeBSD administration team. These are basically the people who have root on
the primary systems and try to keep the various things running. There are not that many members: about seven are currently regularly active, and we just got two new, enthusiastic members who are helping us old farts [unclear]. I'll mention later why that's important and has been a problem in the past.

Then we have the accounts team, which deals with things like a new committer getting an account, changing account information and so on. It may seem very small, with two members. The primary member of the accounts team is also the main postmaster, and it's hard for anyone else on the accounts team to do anything, because by the time anyone else is available, the request has already been handled. Then we have postmaster, a clearly defined team which deals with mail: spam filtering, making sure that the day-to-day part of the mail systems is actually running, and dealing with Mailman acting up.

As mentioned, there is overlap between the different teams. Apart from the nine members of clusteradm, there are a number of people with access [unclear]. Since almost all members do this in their spare time, how much time they have varies, so we require some number of people to be able to have a reasonable response rate. That hasn't been the case in the past, when there were basically no people around [unclear]. Most, but not all, of the members of the administration team are committers.

As a side note, we have the classic catch-22 of getting new members into the administration team: we need people the team members trust not to mess everything up, but it's hard for new people to show that they would not mess everything up without having access in the first place. So that's one of the major problems. [unclear] The fact is that some of
you [unclear] it would be hard to break everything permanently, but it would not be hard to make FreeBSD project mail and whatnot unavailable for a couple of weeks, because we would have to redo the entire infrastructure from scratch. [unclear]
[unclear response to an audience question] So I'm going to try
and speed up, because I have a fair number of slides, and I'll also try to leave a little bit of time for questions. If you have a question about something specific, please just ask after the presentation.

So, a brief overview of the history. The FreeBSD project cluster started out its life at [unclear] and lived there for a number of years. It then moved to Yahoo! for space, and over the last year or so we have been moving to [unclear]. The first move was done in 2006, and the important part is that in 2006 everything was physically moved. That was done initially by Peter over a period of time, taking all of the systems and putting them into cabinets which could be moved. Then there was a flag day where every system was unplugged, put onto a truck and moved to the data center, and we were very happy that the truck didn't crash. But while everything was physically moved, there was no fundamental change in how it all worked. A number of machines were replaced over the years [unclear].

The next move, which has been happening heavily over the last year or so, has been from that site to [unclear], because the old one was shutting down [unclear]. A problem with the old site was that we had no FreeBSD people at the site, which means that for on-site work we either had to ask the Yahoo! site operations people, or people had to spend some amount of time traveling across the Bay Area to get there. The new site is better in the sense that when things break seriously, Peter [unclear] can walk downstairs and fix things. The move from the old site to the new one took place over a long period of time, because even before the security incident we had
decided that we did not just want to move. We wanted to have basically a clean slate, and to try to do a lot of the things we knew we had done wrong in a much better way. Some systems were physically moved, but almost all of them, even where they were physically moved, were set up again from scratch. And in
this picture [unclear] this is the old data center setup [unclear], which I just have here to show what it looked like. Anybody who has done physical system administration at a reasonable scale will know this kind of [unclear]. [unintelligible] This is some
introduction that most people here might know already. The primary way the FreeBSD project does authentication is using SSH keys; this becomes important later. The secondary authentication, which is mostly used whenever people have lost their SSH key, is PGP. The real fun starts when people have lost both their SSH key and their PGP key; then it's a matter of finding somebody else who can vouch for their identity.

For the most part, normal user accounts have no passwords. That has to do with the fact that we didn't want people to use passwords to log in [unclear]. So the only things we have to identify and authenticate people are the SSH keys and, for a lot of people, PGP keys. We use Kerberos, especially in the past, for being able to store passwords [unclear]. More recently we have started using it in a more traditional style, actually using Kerberos tickets and so on, but it's still only used by a few people, mainly.

One thing to note here is that, unlike many traditional companies, we have all the users on the outside, and every developer has their own personal setup, which means it's very hard to rely on having very much on the client side. For instance, using Kerberos to authenticate people logging in with SSH would cause a lot of problems for a lot of people, and would also mean exposing [unclear]. And then on the side we have a number of standalone systems, like the wiki and the forum system, which are totally independent of [unclear]. So what does the
administration team do? Basically, we try to make sure that the FreeBSD project can function. There are several things that are required at a minimum: the project cannot function without mail, and it doesn't work so well without the version control system. But we also have a whole bunch of other things which, while in theory the project could work without them, like the wiki, the [unclear] system and the problem reports, have grown into an integral part of the entire workflow for a lot of people. So the project has all these services that are very important to make the project function. A lot of things can still happen without them, but it takes a lot more work. I don't have any numbers [unclear], but from my understanding it was not good. So,
I have tried to list all the services. This is not to go into detail about the services, but to try to give people a feeling for the number of moving parts required to keep the FreeBSD project running. I wrote down the services I could think of off the top of my head, then asked what else there is, and it ended up as a number of slides, because I had simply forgotten a fair amount of services. As a normal developer or user you don't really think about them, but you notice when they're not there.

So, mail. Having an email system, inbound and outbound, sounds fairly simple, but at least some nine years ago it took a fair amount of tender loving care to keep it running, because it was a fairly huge amount of traffic to have on a single server. Today hardware has grown faster than the volume of email, so that is less of a problem. We have spam filtering which, whilst not perfect, people notice when it's not there. We have all the FreeBSD.org mail forwarding for each developer [unclear]. We have the mailing lists, which of course are an essential part of the FreeBSD project and which are run by Mailman [unclear]. It works most of the time, but not always. For instance, [unclear] it turned out that when you look at a Mailman list's web interface, it loads the configuration files for all the lists [unclear], which gets in the way of mail processing. [unclear]

Then the website. We have the primary web server, and most of it is just static HTML, which makes it a lot simpler and faster to run. We have a web build which takes all the articles and
[unclear] Over the years a lot of redirects of the "let's map this random URL to something else" kind have accumulated. It's not non-trivial, but it is what creates the illusion of being a single website [unclear].

We have the old CGI scripts, which for the most part are written in Perl [unclear]. They have not really been touched for a long time, and they depend on a whole bunch of infrastructure to work. For instance, when you go to the website and do a ports search, that's a fancy CGI calling back into the Makefiles for the ports, which initially required a checkout [unclear] for it to work. At one point I tried to rewrite them from scratch, and I got the first one about half done; then I sort of lost interest [unclear].

We have the wiki, which for the most part takes care of itself [unclear]. Then the spammers found it and started creating huge numbers of user accounts and adding pages. The wiki now has access control, so you cannot create pages unless you are an approved user; in the past there was no validation at all [unclear]. The first time I looked around there were about 50,000 entries in the user directory, which were just spammers trying to create accounts [unclear]. What I was even less happy about was when the number of spam users grew to around 100,000, at which point we were starting to get timeouts in the wiki. That is why I recently had to resort to killing all accounts which we could not clearly identify as being human [unclear].

We have the version control web access, for CVS and Subversion, which is used both by end users who want to browse a
bit, but which is also a useful tool for developers. While you can get all this information using the command-line utilities, sometimes, when you're just browsing around, those are less useful.

We have portaudit and the VuXML builds. portaudit is a system for checking, on a particular system, whether you have ports installed with known vulnerabilities [unclear]. The portaudit and VuXML data are built by what are historically independent systems, where all of it depends on a bunch of scripting and things that are interdependent to make it work. It works when it works, and it breaks by accident when people forget what it depends on: it was depending on CVS, and CVS was no longer being updated [unclear]. It took a month before anyone noticed that portaudit was not being updated.
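The failure described in this segment, a published vulnerability data feed that silently stopped updating for about a month, is the kind of problem a simple freshness check can flag. The following is only a minimal sketch under stated assumptions, not the tooling the administration team used: the function name `is_stale`, the example path and the one-day threshold are all hypothetical.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if `path` is missing or was last modified more than
    `max_age_seconds` before `now` (defaults to the current time)."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated as stale.
        return True
    return (now - mtime) > max_age_seconds


if __name__ == "__main__":
    # Hypothetical path and threshold, chosen only for illustration.
    feed = "/tmp/example-vulnerability-feed.db"
    if is_stale(feed, max_age_seconds=24 * 60 * 60):
        print("WARNING: %s has not been updated in the last day" % feed)
```

Run from cron or a monitoring system, a check like this turns "nobody noticed for a month" into an alert after the first missed update. It only checks the modification time, so it would not catch a feed that is rewritten regularly with unchanged or wrong content.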
Version control systems: we have Subversion as the primary system for everything now, while we still have to run CVS for exporting to end users [unclear], though that is slowly winding down. We also run a Perforce server. I'm not sure exactly how much use it's getting, but it's still being used by some people for their development. It was originally set up for [unclear] project because CVS just doesn't do branches very well, so it has been sitting there on the side. We have a lot of people who really like it and some people who perhaps even hate using Perforce, but it's there, and it's not hurting anybody; those who hate it don't have to use it [unclear].

We have the Summer of Code Subversion repository sitting on the side, because we don't want all the experiments happening in Summer of Code in the main repository [unclear], and it allows us to have a separate authentication mechanism.

We have our much-beloved GNATS, which from an administration point of view mostly takes care of itself for day-to-day things. I won't go into much detail, except that [unclear]. We are effectively running GNATS 3 [unclear]
[unclear] At least there is finally somebody working on changing it; it's not my project, so I'm not making promises for them [unclear].

We run the mirroring infrastructure, the primary part being ftp-master, which all the mirrors sync their data from. ftp-master, while seemingly a simple service where people upload packages and releases and so on, has actually been one of the pain points. We have a lot of mirrors [unclear], and because of the file system we used we ran into some quirks [unclear]. So while simple, it has caused some amount of work. Also, the old server generally had terrible I/O performance, giving us a couple of megabytes per second from the disk subsystem, so basically whenever anybody downloaded something, the limiting factor was not the network but actually the disks [unclear].

Then there is the CVSup master system, which of course is very old and has a lot of magic, because while it is presented to the world as one big tree, it comes from multiple sources, and there are a bunch of scripts that pull things together [unclear].

Then we have portsnap [unclear], where we are just running the master service, which is pretty simple [unclear]. A lot of that is run by the security team [unclear]. An interesting side note here is that until about three months ago this relied on an 800 MHz machine [unclear]. It actually worked very well, but [unclear] it was giving warnings about disk failure, so it was definitely
a rapidly some public nearest and this and that data that users accessing run also person so the 1 of the things I really like and instead of subversion compared to is that it's extremely lightweight as long as you have more friends you can have a lot of users if we could find all useless ignoring latency from that provirus just prefer not to forward and the problem to now runs a single share through this don't all where random
authoritative DNS which wild among the public in a similar and that's run out by AIC businesses that we have that's also motivating himself but published recently and as last nite we turned on unisex so that'll be interesting to see if the course we run the show also a free fall which all the nobles will know about and run the developer restraint system which if you develop was people that was you know about the service where you can go and build stuff that might be to slope they're probably passed on that unlucky you want we can run pretty 10 9 and 8 at this point uh so also reducible to and was the whole directories where white people useful yeah whatever on an intervention on then and we have a bunch of internal support tools which have been growing lately while we're trying to organize small we run a separate parts to the instance fall ticketing just yeah that's as much meant for that kind of thing we run monitoring it through an idea system those the most primary cases like if this visual free-fall that apart from some of the some of the this is just our repository learning and we will often be put on before naggers except I have seen in the very long what is on side yeah as basis and fall 6 each month after security incidents we run running on instance because the moment for those students having things are off it just started screaming and after we had moved things all the controversy that we're with an univariate finds fancy enterprise emerges at alright mom this is based on and is that right now and yes that this is the most cruel ill like databases no change history tracking the simple but actually does the job extremely well especially since most of the administrative I very far from the so must be having a place for connected the whole thing the filter is set up and to but information and that it has proven extremely basically required but not perfect but it works run all this
switches not really we do this and layer to model we have a number the lands we don't do routing and so on with we do robbers and most more recently included BGP therefore the outside that is not so want run BGP but necessity because otherwise we would be depending on the out group who that UPS fault the changes to final renowned finals and we experiments this point run that began between the sites making it a bit simpler to surprise what about having so crystal everywhere and guidance for democracy were other than their parts which you can really run activities 6 online local councils so where they can only be having to suppose that I might consider there yeah we have this is a piece of which used to be finally we run her prose mentioned and now run the collapse so was fall directly from that used to be have a bit more about and the wrong cause the units results of aggressive units so as for Cyprus and they also do it in a set of education which that means that if you break it in 2nd you know the main now you will probably not using for this moment
we run into pieces runs serial console and since most admins don't act it receives service and I was right part of an iteration deepening for 5 years for the 1st time so any of the previous 2 books of x 0 consul and remote power is extremely useful and means we can do the job it was very other than the usual some biases support the serpent along some we run back up some of this the spirit gold and a bag of some the elements as that well this is we have targets of the really essential parts and it's 1 of the areas where we now what moment all focusing again of being able to back up everything so that if the system by and for much for fast but restoring and distances securities vacuoles before running on on a critical so and soon on also about so that we have a history of what has happened so themselves they're not yet currently is running jail which is the sum of the fun you experience it will be beginning of dedicated machine which is even more locked down the ground 1 once we get some hot with this space all and now that is that we have in mind issue of the the this limits amount of power you have endurance of filling all the rat you structure in brightness which gets from what think this this will get 1 and will probably even get multiple because the ones in order to see a separate sites that it's not have a single point of freedom at the reference point we tend not updated make pretty basic auditing of this 1 had system 2 month ago has so far there is 100 gigabyte a lot the we have really infrastructure with another thing that so the required for being able to do it that remote of the of the of the I don't think so
I mean, maybe; I just turned it on. 100 gigabytes over a 2-month period is not that much audit data. And then we
have a number of so not really wrong but also to the in various ways posted this as in the space in the past been rather ad-hoc that we need solver provided some jails and so that users who run this suffered from that solves all of being there but we're not really sure what we're trying to formalize this a bit more the the biggest 1 called upon both cluster which we're now getting into a situation where we're trying to make the call the call that to so help make sure that the systems are running at the head of previously the Hessian account system and after that it's up to the port manager people to actually use of which they hope that there is that we can reduce the number of people with vectors to so we advocate for the we have huge data which has been it like has been taking care of the lack of support from the plot that and we have the pulse months through this up at some of the freebies don't they serve as a running rather separate from everything else that there was a host of here that we basically provide power and we have both that's about we don't think there is a foundations which in most concerned about the number of clusters that people have been helping you run those that called and even more separate the system on the side where appropriate close to ports building and sort of it running on the side unlike usually had the count already yes but it will have at some point and 1 of the small things from periodic musical team collections of which I don't have the money and then there's grow to prevent which was ligand field had been trying to take care of this find out when which will have full a long time which currently being evaluated what's was and
so 1 of the important is that we don't cross we do not motivated periods do not run of yourself of the weaponry and so on and so by historic messages you program by most of the DNS for that then fully country-specific predictor lots and add them to the box which is running separability it has been estimated by the yellow and the forms to which have been running entirely on the side so wouldn't trust that but is now now that's proven to actually be useful uh it's slowly being pulled into the project as set the overlapping have things happen maintenance 1 of the problem people driving it is also has a cost and then more recently had that port scouts system is slowly being moved into the predictable and this is mentioned In the riddle part of saying OK we're trying to run important services we prefer that services can show that word before they're putting into the fall because once we have something running it's very hard to get rid of because people start going on this is the only 1 of the most of the of of all side of the so currently the something right here on this the is something word you you so the question what do you want to do about the 3rd party rural especially for where it's based I should your bet particular event also security public and really bad a date but I'm right I think we're going to be moving more to our model where since we have a number of generous sites hosting the provision at the site to various degrees donate a Rackspace and power law and uh even tho to us and that will be trying to take the some of those things and and move that we now have a possibility but this is not something we can deal with doing overnight because happened about things that in that it only on a 1 or when there is a particular problem to make less sense today when we I always used the primarily even from Europe because the the speed the latest knowledge is and the the world use your
URIs and they all all the neural maybe it's a sign of the master node that is that all the Moreau cycles dominate piece of with the magic of hearing that that media compromised travesti update all ports that mirror does not mean that they can compromise and use previously it was was in you is know was what we have we have a bite mark which we're trying to get us operation you're running out where we're running out of it just provide host so I'll
try to speed up a bit; I need to skip some of it to
get done in decent time. So the old cluster had one big flat network for everything. The web servers were sitting on the same network where we were running NFS, and we had package building on it, and network infrastructure like DNS. Everything was on the same network, and a lot of the security depended on systems not being compromised and on traditional Unix permissions to prevent local users from compromising the systems. Of course it wasn't that you could do it totally easily, but this was suboptimal. This was set up before I really started doing things, and this was how people did things a long time ago. But how we did things long ago was still how we did things last year, and that was what I really wanted to get away from. One of the main problems we had at the old site was that we did not manage the switches, so we could not have multiple VLANs in a simple fashion; it required getting Yahoo to configure it. We used NIS as the basis for accounts, but we had no passwords in that at all, and the lack of security of that method meant we could not easily distribute it to other sites, which of course meant that we had hacked-up systems on top to be used for authentication. Firewalls were running on the Yahoo switches and routers, which meant that any change had to go through people employed by Yahoo and then took one to two weeks. And we got IPv6 connectivity through a tunnel from ISC, terminated at a single host, so that was a single point of failure. So we
don't like have pictures of not reviewable we don't have pictures actually from as few 1 and all this is how the rack looks just before people started moving you know all of them from a 5 2 2 with what all FIL mass in only so the more you have right the people involved in when and where is 1 and to show a
progression from the old system while cabling so not perfect is it certainly much improved from so we have to make sense of the word sense of some of the things you want to say this but when you you go on this 1 this 1 left right right what was so we had
last single point of failure and also the so is a skis 1 dumpster rental fun when the number of but crashing for period of time and so on and off as far as since you basically block in where that depended on that which were most systems except for the real critical systems which was for the 1st time when realize 0 I have consul I have remember to copy common problem to console the binary losses companies and excellent candidates with respect to the optimum dependent on and affects and we have a funny cross-talk between all the progress of with a gap benefits these these crossed mount have for many years been read only bought it still meant love Lofoten whenever 1 of the simplest we had a single concept which can result yet the prominence of the which calls everybody's be extremely fearful touching that particular so because that would kill both units and I'll come to lecture at the same time 1 make it seems very loosely security domain in that if a compromise 1 it would be trivial to compromise everything so this was the Battle of
so the new cluster at yahoo Sunnyvale and that is why is this not anything from young with and that's what happens when you have mean many things and so
some of the goals of sort of the moving to 2 reactants and all that is and we wanted to have flexible network wanted files and elsewhere too so that we could separate thing from the land we could change filed and world strict and so on some of this was uh we did fermentation European Internet sites for how to set this up so you know Peter been recently doing all of your Internet and base infrastructure is active during a lot of what we the yahoo sites mirroring of what we have thought of that you're and we want to run as much through as possible uh at at much as possible but not every not has as an ultimate requirement for its that Europeans and we have some donated all controls before running through but we had some other problems with them but then that it was just not we we wait a long time and so in the end they're not running for forget they are more standard commercial orbitals of yes In this much fun when people I post like this so what is the pulse all all the radius of the lecture and this is called the but remotely so therefore while just confused and otherwise having to resort to very often work around to get the money yet patient modern about half of this was I we all have enough enough to going off into the weeds while they're having to do a little but then again we want to be able to separate things multiple in separate the domains of trust security so that different be Lance Falk what's possible and we you a that 1 part of this is security so that if we have a compromise of something we run a long run a number of things like the we keep which is running dynamic contents and that's how this and of this so we want to prevent an expectation of this to spreading and we also want to like the more segregation and that makes system administration simple it's really tough to single jails much more sure what you're touching that when you modifying the BU not suddenly going to be killing something else over different place it does mean of course that we 
suddenly have a lot more systems to operate and so on but if but monitoring as little as diverse as possible in the past we had NFS-mounted and all that even better the web so was had to invest not to get a safety and so on that's now gone and so we have an effect on the places where makes sense since it's very useful for developers to be shared among directory between fruitful fuel so 1 the references to for the practice and so that's pretty much it for of and then we're using it basically is a I demonstrated using people places but a lot more I yes this is gone I was not say that this has worked and so on but that was actually not 1 of our major pain points both time but it was so usually based so we have well that now almost everywhere yeah most of the hardware was donated the by Yahoo and scanned from various places yahoo Sean has some were for following quite popular remember across the UK and that was the that you really and it would have still have travesti a foundation to be able to study when shown quiet sector running in in the the to the
So, this is unfortunately a bit hard to see due to the lighting, but this is a less than a month old picture of how it looks now, and you can see this looks really good.
And the cabling is also in a rather good state, thanks to Peter and John, who spent time making sure that the cabling doesn't end up a mess.
the so yeah I'm running out of time yeah so the Yahoo network is we have BGP uplink mentioned because that's how we can get confidence in that we have native before the 6 run all 5 volts which then at least kind altered during all around some interesting failure modes of the and and adding it to the final selection of 1 of the 1st place and that running current because 1 is the new copulas recall this is that pretty a little BGP part this pretty standard PF redundant set up with half pizza and we have the last we have fewer than I prefer boats people I can't solve a compromise and what was same compared to having between the ratio brief beta
BTB coming inseparably and that of lovingly named Robert and rather rest and the final yes think it's very easy to type in a number and then this leads to the universe that we have won the predominant each so and then using it bonding to get to be a bit lengths in the Bible 1 another across the actually crossing from back to the finals and is that's what want and then the
logical that is that this is not entirely accurate picture of the the set up of trying to get the outset of that we have infrastructure the land with stuff like the it name because of age that was all there authoritative over what I would call the pressure the lemma running some of the really critical services like the version control system states of the while it also meant by the limit of harmony you could think about and title of the rapid development we lab 1 with that of its axis and and so that the external services where we put all of the things we Jobbik monohybrid is in state and then we have the pages believing lands 1 of the things that for both security and for the administration is very nice that it's there there's less risk of anything going wrong in the past that have handled it can set up we have multiple TTP so was sitting there and knowing which can serve as they should reply to because some were run by a the Boltzmann details of Mumbai and then we have yeah so we will most likely be includes people that society laptop on yes php . net the projects primary so the amount of center of the and set of firewall policies that lock them very tightly into what they can do yes I like because also the only the accident run to run along the run PHP anywhere on the promises building unit
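A note for readers of the transcript: the VLAN layout described here amounts to a default-deny policy between network segments. Below is a minimal Python sketch of that idea under stated assumptions. It is not the project's firewall configuration (the talk says the real filtering is done with PF on the routers); the VLAN names, the ports, and the `is_allowed` helper are all hypothetical and chosen only to illustrate the separation of trust domains.

```python
from dataclasses import dataclass

# Hypothetical VLAN names, loosely following the segments mentioned in the talk.
VLANS = {"infrastructure", "critical", "developer", "external", "packages"}


@dataclass(frozen=True)
class Rule:
    """One explicitly permitted flow between two VLANs."""
    src: str
    dst: str
    port: int


# Illustrative allow-list (assumption): anything not listed here is denied.
ALLOW = {
    Rule("external", "infrastructure", 53),   # name lookups
    Rule("developer", "critical", 22),        # developers reach the repository over SSH
    Rule("external", "critical", 443),        # web front end reaches the repository read-only
}


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Return True if traffic from VLAN src to VLAN dst on port is permitted."""
    if src not in VLANS or dst not in VLANS:
        raise ValueError(f"unknown VLAN: {src!r} -> {dst!r}")
    if src == dst:
        # Traffic inside one VLAN never crosses the firewall in this model.
        return True
    return Rule(src, dst, port) in ALLOW
```

With a table like this, a compromised service in the hypothetical `external` segment cannot open SSH connections into `critical`, which is the kind of containment the talk describes as a goal.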
is and so the jails set up a we use EC jails in Iraq it makes managing all of deals a lot simpler but this is basically just all the primary the standard used parts that all binary Part I share and not that we don't know we don't amount the jail which makes it simpler to operate and very easy to create new jails and the it was run both unsupervised and your fast uh most people historical works it doesn't actually make any different day-to-day other than for those you press the basic ones we now have naturally set up so if we must them up about the it's just because we use the channel interface come parameter which means that we only have to configure actually wants I was critically reviewed that code was when then then and was very sceptical about it because people just add and delete interfaces and so always been sort of word would be the wrong thing but that have and it makes managing everything much easier in 1 place to the right half the number of times had jails where y is a year later so has my with stuff about you not time and a half years of want we try to chest up 3 jails so that we only have 1 result of conflict then recently that it's the different game we still use and mail for sending out mail we tried to switch to the main 1 1 and actually is running in a couple of jails there's just some quirks will represent saying that all males that coming from it you should look like come from repeated or much man high hopes the rest of the set all your Europe your and the end of this the you want to and we have many of the 6 only jails since all sharing wrote briefly went to follow which set up uh where we spent things into many jails 1 we don't have that many activities for accurate and actionable because something be just talk to also running there still the a number of jails where we ended up having to have right for what was because they had to talk something people whereas these now uh and then we use an office read-only amounts lot for sharing data between 
jails it turned out a bit too much had the centered here that you could have 1 guiltless suppression where pulled knowledge serving that works fine 99 . 9 9 9 the time the problem is the last 1 where can you certainly don't have followed in between the 2 and then the subversion nearest serving of have a committee because we did not close until we try to be combined to give a positive very confused why he was getting 1 result notable getting another so now we actually haven't got gotten this fixed yet I think it's a single common that's bad and wonder about we will
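A note for readers of the transcript: the jail setup described above relies on giving each jail its own addresses and letting the jail `interface` parameter attach them, so that addresses are configured in one place. The sketch below generates a jail.conf(5)-style stanza in Python to make that concrete. It is only an illustration: the talk says the cluster uses ezjail, whose own configuration format is not shown here, and the jail name, path, interface name, and addresses are hypothetical (the addresses come from documentation ranges). The output should be checked against jail.conf(5) before any real use.

```python
def render_jail(name, hostname, path, interface, ip4=None, ip6=None):
    """Render one jail.conf(5)-style stanza as a string.

    ip4 and ip6 are optional single addresses; an IPv6-only jail simply
    leaves ip4 unset.
    """
    if ip4 is None and ip6 is None:
        raise ValueError("a jail needs at least one address")
    lines = [
        f"{name} {{",
        f'    host.hostname = "{hostname}";',
        f'    path = "{path}";',
        # The interface parameter asks jail(8) to add the addresses below
        # to this interface when the jail starts.
        f'    interface = "{interface}";',
    ]
    if ip4 is not None:
        lines.append(f'    ip4.addr = "{ip4}";')
    if ip6 is not None:
        lines.append(f'    ip6.addr = "{ip6}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical IPv6-only jail on a hypothetical VLAN interface.
    print(render_jail("www", "www.example.org", "/usr/jails/www",
                      "vlan30", ip6="2001:db8::30"))
```

Generating the stanzas from one table of jails is one way to get the "configure it once" property mentioned in the talk; whether the project does it this way is not stated.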
so for all the web serving we now run everything behind bondage that the ones the not is thus some caching but for most things about that like primary the amount the prime advantage is that we have 1 place we can do all the magic of redirecting particularly well someone else and we can use 1 IP address for all the different stuff that's resort because of reduced at all and so on bonds doesn't support should be as so currently running for me property it's best property it just until they had to quantify then buys disappeared everything to the actual back and jails with them which or static serving 1 of the the 1 both support web and so on which allows us to seperate each of the space in the parts we understand confident about it into 1 due to this also means that a number of these things have funny it consists of the where depends on number of modules so it's nice when you walk during operates at your it's much simpler to see what your thing and then do not to misinterpret something you've been told that well May lands also behind the same things were just redirected to a different political yeah and currently running a phoneme we're running at patches of both on most of the back and here mostly because that's the 1 I and we knew the the best but it's yet at some point I think will be so that a changing everything running the same thing all work yes and then when considering all authors we don't mind friends and all those who have 1 careful with built and then that's also don't mounted in the such yeah and I ask you
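A note for readers of the transcript: the web front end described above gives one place to handle redirects and to map many site names on one IP address to separate back-end jails. The following Python sketch shows that dispatch logic only. It is not Varnish configuration and not the project's setup; the host names, back-end names, and the `route` helper are hypothetical.

```python
# Hypothetical mapping from requested site name to the jail serving it.
BACKENDS = {
    "www.example.org": "static-jail",
    "wiki.example.org": "wiki-jail",
    "lists.example.org": "mail-web-jail",
}

# Hypothetical redirects handled at the front end, before any back end is hit.
REDIRECTS = {
    "old.example.org": "https://www.example.org/",
}

DEFAULT_BACKEND = "static-jail"


def normalize_host(header: str) -> str:
    """Lower-case a Host header and drop any port and trailing dot.

    Only plain host names are handled; bracketed IPv6 literals are not.
    """
    return header.strip().lower().split(":", 1)[0].rstrip(".")


def route(header: str):
    """Return ("redirect", url) or ("backend", jail_name) for a Host header."""
    host = normalize_host(header)
    if host in REDIRECTS:
        return ("redirect", REDIRECTS[host])
    return ("backend", BACKENDS.get(host, DEFAULT_BACKEND))
```

Keeping the mapping in a single table at the front end is what lets each back-end jail run only the modules its own site needs, which is the operational benefit the talk points to.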
what appears but had symphonic works like the setting and whole bunch of unity that it when have a slightly higher loss than it started free what this would that if you look at the previous at the end of an ice progresses you specific acts they do not make mental back off if you are this book system with the compiled fine because that was the only way we could get that get to work on the other hand that doesn't mean that if you have a hard problem units so will enable the hammer so we did not want to export exposed yet and it has no replication and the life having multiple sites and some of security issue with not circuit and so now we're using a lack of has a flexible schema which of course not because the effect on what we're doing it also means that where flexible and what we should be doing but I think that's a reasonable trade-off presently in the course of month now there was also 1 of the reasons why it took so long we will be talking about about the years but we knew that getting it can go wrong would be painful so so yeah it's possible that has built in replication with very nice has problems both the cell so that you cannot you can have the individual that clients actually having a very good idea that they're talking to about and very widely supported both in of all the the you not and in boxing and stuff like that you know we're not using all of this change in of I just have a short
double my own electric current which 21 and you don't to since there from this part
in the more interesting part of this where we have a custom scheme that means that we can for that administration hosts use different shell so we do not have to install it's age a kind all bad everywhere and but is also right we also have the resources public in lab and then export up and the and ability to have different L and the stage different roles which means that for all administrative they have 134 during the normal committees of node heals gives access to administrative current that 1 is the way we defined 1 the differentiate between what you how to do so remember this group that is object classes in you know that which is slightly painful because whenever we have to add a new type of system we have to go and help in detail but most of the but when this summer 1 of the things this is all because that's what you see on this was also language so so it started out
with using them as the model which is the obvious choice because it's fairly widely used a simple 1 you have just put them there you have of that working in this at but it increases latency significantly whenever you it doesn't really matter for from many years sort simple basically operations but there are some horrible corner cases like post that update since running different processes all the time you have a significant slowdown to all the profit to the uh that's and has a total of Internet and the working at the club working model audience all you will end up in the next minute time out and told us most of in line and it doesn't support this is a piece of a state does have a pack so that you can call cues from that but that would meant pattern all associated with which we really from and of course depend on having a whole bunch of people but reported stolen also so the 1st replacement of this was the 1st simple system that generated password single-trials files locally but most PostScript that requires still and that modules everywhere and posted opted itself uses use piece the so the placement was reinventing a mobile in the new system of update of the state so as we
wanted has system which was independent of the other running and wanted to prefer to independent based system tools for each of the all of us also since the number of our jails is also running authentication system so that people like and this is the use of a rather flexible and designed this system requires some useful to this present yeah so
reinventing the wheel we have different role for himself the ones that men with me but at the cost of people not logged in those developers will be developed was a lot of talking and so on the numerous don't combined saying that this so allows people the webmaster group and that one's going to log in and this allows for last port managers to log it builds and merges master possible groups which means that you can have local right which is very useful in some cases for not having to define a overwrites roles for everything if you just have a single extract out we also used to do that today all this data absence of K 5 locked the which used to be the case uh going through actually means that now it's actually possible to remove administrators again having removed and where the fuck to just under little find OK because the that to integration time albeit was strongly inspired by what's introduce topic model and I think strongly as using some of the same code and we may have just available using it be that means that even for the sites they can just cool about it is the proprietor of such signatures right there is no need for anything else it does mean that this 10 minute black for any update but normally really yeah
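A note for readers of the transcript: the account system described above decides who may log in to a host from the roles assigned to that host and the groups a user belongs to, with local overrides for one-off cases. Here is a minimal Python sketch of that check. The data model, group names, host names, and the `may_login` helper are hypothetical; the real system, as far as the talk says, exports this information from LDAP into flat files that hosts fetch periodically.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A host with the groups allowed to log in and per-host extra users."""
    name: str
    allowed_groups: set
    local_allow: set = field(default_factory=set)  # local override list


# Hypothetical user-to-group memberships, as they might be exported centrally.
USER_GROUPS = {
    "alice": {"developers"},
    "bob": {"developers", "admins"},
    "carol": {"portmgr"},
}


def may_login(user: str, host: Host) -> bool:
    """Return True if user may log in to host under this model."""
    if user in host.local_allow:
        # A local override grants access without defining a new role.
        return True
    return bool(USER_GROUPS.get(user, set()) & host.allowed_groups)


# Hypothetical hosts.
shell_host = Host("shell", {"developers"})
build_host = Host("builder", {"portmgr", "admins"}, local_allow={"alice"})
```

For example, `may_login("carol", shell_host)` is false because `carol` is only in the hypothetical `portmgr` group, while `may_login("alice", build_host)` is true through the local override. Because hosts in such a scheme work from a periodically fetched copy, removing a user takes effect only after the next fetch, which matches the update lag the talk mentions.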
so why don't you engine and all although tools that the main reason was really there was at the time when some had to decide as 1 person was really bring about doing anything like this and the cost of the and so were somewhat nervous about yourself into a corner where we have uh all this running and then only 1 person who really knew how and have the even before the securities that we knew that were taken as some time to get this running and probably more than we have that to the and we really want to get to the not to have flooded organize the so I think it is extremely likely that a a year from now problem unless we will be running radicals you've engines or something like that sims that's not really really for us to reinvent the wheel will so people have already solved and now we have to have more people with experience of and since we have the the state system we running now we have sold 1 of the major problem with so we have this sort of this for supporting everything what was going to build up those
from for the rest of the week that of the yeah
That's it for that. I don't know how many of you have heard about the security incident; I'm going to go through it, and this is an abbreviated version
of what I said at the summit. So this was, yeah, in November
2012. A key was compromised, which gave the attacker access to both the normal cluster systems and the package building infrastructure. Due to the lack of a security model in the package building systems, which had naturally evolved, they did not actually have to exploit anything to get more privileged access: there was something running as root that was writable by non-root users. It was found rather quickly after it happened, but it took a long time to recover. The validation that nothing had happened didn't take that long, but the rebuilding and getting everything running again was, yeah.
As said, it was a key, and it was a key that was valid across the cluster, on most hosts. So we were lucky in that
one developer monitors all his logs very closely, so he wondered: why is somebody trying to log in as me on my home server from the FreeBSD cluster? That's not supposed to happen; I'm not doing this. So that's how it was noticed. Luckily, the attacker could have done a lot worse than they did. They got access to the repositories, which is especially serious for CVS, which is basically impossible to validate. It was validated that the tip of all the primary branches hadn't been modified. We don't think anything was actually modified, but it's very hard to be 100% sure, so the recommendation was to use the Subversion repository and move away from the other mirrors. Yeah, we were very
quick in tracking down the basics of what had happened, and very quick in working out which systems and users were affected. An important part: there was no indication that any malicious code had been distributed to any users. So core wanted to give a clear statement the first time the project said anything about what had happened. And then we ran into this small problem while reviewing the announcement: we say we can't verify CVS, use Subversion, and then someone points out that we don't actually have any documentation telling users how to use Subversion. So that was a mad scramble to write some documentation. It is a trade-off, but it was decided that it is better to have the announcement delayed a bit, so that you can actually have a detailed description of what happened and of how to switch to using Subversion for your systems and so on, rather than having something out half an hour after. There were also a lot of organizational things that meant there was confusion about who was doing what, and communication failed internally in the FreeBSD project between the different teams, especially cluster admin, security and core. Some people were talking and some people were not. At times it can be useful to have people with multiple hats, and other times it means that there are some communication issues between teams. So
the important parts going forward: we are working on trying to resolve the organizational issues, who is responsible for what. As mentioned at the start, one of the big problems we have had, one of the reasons that a lot of things worked suboptimally, was that people had to work around the administration team, which was just not responsive. And there is no denying that when people make basic requests at issues all do something so simple that it takes we saw on 1, then people try to find other solutions, and that causes the core and yes. One of the initial things was to go through all of the SSH keys inside the cluster which could possibly be used to SSH to other systems; those pieces of historic artifact were removed and killed. There was an enormous effort from the ports people in rebuilding the ports build system to have a security model and not allow the kind produces great. And then there is the general rebuilding of the infrastructure of the project itself. And we are soon looking at adding two-factor authentication. This is one of the cases where I'm worried that if we just go and do something very simple, we will cause more problems than we fix, because we can run into people trying to bypass it. So we want to do it right, but we want to add as many barriers as we can for the protection of the project. Yeah, that was, I have to say, time put in. "Like, I tried it yesterday and it took 55 minutes." Any other questions? We are just over the top of the hour. If you which I like, that's more flexible and supported more widely. It allows you to do stuff like really it which is the boxes and so on, and tie into authentication systems so you don't have two different ones. I do not like all the complexity of the system about like that there's well that's so well which is interesting and working using BD BD is not aligned the of that. So I would say, if you could do with only the very simple thing of NIS, and it didn't have the security problems, I would prefer something as simple as NIS, but it just doesn't do what we want. don't the division of the the equipment 
that that I think most of it is estimated that there is always the chance that being about Humboldt and up to date but I mean it that that the slope the slope on documentation is this the plan for a lot what we want to have but when you graduate the to can you have you have a reasonable chance of getting using the graph what was going in the this was in does the government of the just males and and this is only in the 1 of the please note that have all the money it's a heck of a lot of and and the constant schema parts of pretty basic so 1 must not multiple slaves that's not really anything in practice especially I was with and you and there was a lot of swearing and there was growing up in you have no to work with the it well I have a of lot of using using them all the but all of them should has come to hold of the Panel on what was going on which means that all of the things you don't know what what is the whole set of variables all 1 of the reasons was going somewhere on configuration with also this means that we have in the course of the last 1 you will not be on the set of he has the aim of the world which you will work but the other thing that stands for the real we're not of you know communicating users of any yeah I don't like to use the word you know and that's about the meaning life the that might actually decrease with books on but it's not clear yet how we to be doing some of the most I think the 2nd type of feature

