From microservices to monoliths
Formal Metadata
Number of Parts | 31
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/45267 (DOI)
Transcript: English (auto-generated)
00:12
Okay, so I hope everyone's having a great BSDCan 2017. I see them. This is my first time here. The goal for the talk is to actually have questions at the end, so I'm going to motor through this as fast as I can, and
00:27
short interjections are fine: just stick your hand up or shout something out and I'll roll with it. But save your longer stuff for the end. Yeah, so I'm a New Zealander, which means my English is possibly hard for you to understand, so stick your hand up if that's the problem.
00:43
Or just listen to the video. The slides are up on the conference website: go to the schedule, find the talk, click on the download link, so you can read it there in case I have problems. We're missing, like, the corner of the screen, but hey. This picture in the background is from New Zealand.
01:01
There's a few of those scattered around. Some time I hope you have a chance to come and see my great country. So, I'm Dave Cottlehuber. This is a clickbait troll link, "From microservices to monoliths", and I suppose it's fundamentally about distributed systems and how bad they are in practice. If you can avoid them you should, and if you're not ready for it, you'll end up where we were. So
01:27
I work for a little company called iwantmyname, and we'd really like it, if you're a Twitter person, that you tweet: "Hi iwantmyname, I'm at the conference, Dave is wonderful." The last bit is optional but the first bit is not. The more tweets, the happier the business people are, and the more trips and conferences we get to go to.
01:45
Okay, so we're a domain... oh, sorry. Yeah, about this talk: why we migrated and to what; a little bit about migrating and building from our own source, because one of the nice things about the BSD world is you can do that really easily yourself; a bit on the architecture; a bit about how we moved the back end.
02:03
Interlude one: how I had to roll back moving the back end and try it again. The bat belt: sort of the basics of how we keep our system secure and up to date. A little bit about cluster and jail internals, at least how we use them. And interlude two: how I broke some stuff, rolled back and had to fix it
02:21
again. And there's some stuff about metrics and monitoring, which for me as an ops person really is my bread and butter. The holy trinity: backups, monitoring, log files. And then probably patching, yeah. And then finally interlude three, another problem where I had to go back and fix some stuff. And then some time to talk at the end.
02:43
So we're a handcrafted, artisanal domain reseller. We have a shelf of typefaces in the office and we carefully pick out every domain name you request on the internet. It's polished, it's delivered, it's shipped. They're beautiful. You should get domains from us. Probably our main point of differentiation is that we do these things for real people, and to an extent
03:04
this community are not really real people. We are way past dealing with DNS problems; we make the DNS problems for other people. That's our bread and butter. This is for families, for grandmas, for photographers, anybody who goes: I want an email and a website, what is this domain thing, and why do I need it?
03:25
And we make a real effort to help people. We're an ethical business, which is quite unusual these days, unfortunately, and we like a simple interface. As I said before, I'm a New Zealander. We're a globally distributed team. I think there are 14 of us: three in Canada, there's some in Stockholm, all over the place. So we live distributed systems and we're a distributed company as well.
03:46
Our stack as of two years ago was a Perl Catalyst front end, which was leading-edge tech a decade ago, and it's showing a bit of its age: when we forked multiple workers to handle eight concurrent requests, we used eight gigabytes of RAM, and that's kind of, yeah, that's got to go.
04:05
And we're very much an Erlang back end and have been for a while: RabbitMQ as a messaging service, written in Erlang; Apache CouchDB, a NoSQL distributed database, written in Erlang; and our custom search app is written in Erlang again. So I'll keep mentioning that, and you should go home and start learning it immediately.
04:23
We use this session store called Kyoto Tycoon, which is a lot less known. It's written in C, it's very fast, it provides a binary API and a pretty standard REST-ish API, and the main reason why we use it is because it runs on multiple nodes, so we can have this
04:40
clustered arrangement where it doesn't matter which node you talk to, you should get more or less the same result. Very much how people use Redis today. Two years ago we had 20 Debian unstable VMs, and I use the word "unstable" here with a capital U, because that is the distribution we'd chosen to ship our software on, and I would say most of the time that was probably okay,
05:01
but at that time it was OpenSSL, OpenSSL, OpenSSL, OpenSSL. And I didn't write it up in the slides because I thought it was so obvious: part of our new FreeBSD stack is LibreSSL all the way, and I have to say I wish I'd had that a year ago.
05:21
It really introduced a lot of variability to our stack. So the problem we were facing was continual patches because of OpenSSL, and you couldn't really build a production system that was the same as the test environment, and that fundamentally meant that if you're tracking down any complicated problem, you're basically screwed. There's no way to do it. In particular, one called Net::HTTP.
05:41
So the particular version of Perl, I think it was 5.24 RC2, changed Net::HTTP from a blocking call to a non-blocking call and at the same time introduced an error where, if the remaining packet on the socket was exactly 124 bytes long, it deadlocked.
06:07
And I won't tell you how long it took me to sort that out, but it wasn't really a fun night. And that was really the moment where I went back and ranted on our HipChat channel: you can't make me do it, I'm not doing this stuff again, we have to move away. And some other people said yes, which was a relief.
06:21
Yep, so the bigger picture: 20 VMs all doing different things. A request comes in to the front end of the website, we make a couple of calls to our back-end servers and to our session cache, we collect some stuff from external APIs, and there's a lot of latency. Individually the latency isn't large, but cumulatively it was really long.
06:41
It was painfully long, both for our users and for us. The cloud provider we were using had a lot of what I loosely called VM-induced network instability, or perhaps less politely, crap. And that was really bad, and it was affecting us almost every week. We would have some form of outage that was triggered by losing SSH sessions,
07:04
losing connectivity between our servers, and it required manual intervention to restart stuff and potential downtime for customers. The downside of this is, again, because it's a distributed system, when you have one part of it that stops, it builds up a queue of requests, and the obvious thing we do is just put a little for or
07:21
while loop around that: while the network is down, hammer it repeatedly until it comes back up. And that strategy doesn't work. In fact, it doesn't work so well that it exhausts all the TCP ephemeral ports on the server that's doing the requesting. It also hits the other server when it comes back up, before the database is ready. In fact, the outage then spreads from being a relatively short,
07:43
maybe a few seconds, micro-outage from a network point of view, into turning everything off, because they're all trying to reconnect so quickly. And on top of that, because we were using SSH tunnels, you couldn't SSH in to fix the problem either. So you had to log into a console, and that really wasn't fun.
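Editor's sketch: the tight retry loop described above is the classic way to turn a short network blip into a reconnect storm. A common mitigation, not something the talk says the team implemented, is capped exponential backoff with jitter. The Python below is a minimal illustration under that assumption; `backoff_delays`, `call_with_backoff` and all parameter values are hypothetical names and numbers chosen for the example.

```python
import random
import time


def backoff_delays(base=0.5, cap=30.0, attempts=8, rng=random.random):
    """Yield one delay per attempt: random in [0, min(cap, base * 2**n))."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": spread clients out so they do not reconnect in lockstep.
        yield rng() * ceiling


def call_with_backoff(connect, attempts=8, base=0.5, cap=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, pausing longer after each failure.

    connect is any zero-argument callable that raises OSError while the
    backend is unreachable (hypothetical stand-in for a real client call).
    """
    last_error = None
    for attempt, delay in enumerate(backoff_delays(base, cap, attempts)):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)  # wait instead of hammering the recovering server
    raise ConnectionError("backend still unreachable after retries") from last_error
```

Because each client waits a different, growing interval, fewer sockets are opened per second during an outage, which is what keeps ephemeral ports from running out and gives the database time to come back before it is hit again.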
08:00
And finally, as a result of our choice of Debian unstable, we really had an upgrade hell problem. We couldn't go back once we started upgrading, and we couldn't go forward. So we'd have an OpenSSL patch, we'd decide to deploy it, we'd find that there were some problems in our underlying
08:20
Perl libraries, and we'd have to troubleshoot that in production, which is the wrong way to do it; I think we all know that. And so, the "this is fine" dog: that was pretty much the way it was for a year. "This is fine", we're working on it. And then finally we said, let's move. So, onwards: choose wisely. Now, I wouldn't be at a BSDCan conference if I told you we'd just gone to Debian stable and left it at that, so obviously we picked something else.
08:47
And we really wanted the holy grail. We wanted everything. We wanted reliable and custom packages, so that's poudriere and building our own base system. Easy rollback for DBs and upgrades, and with ZFS and boot environments that really is cheating. It was just too easy.
09:04
We switched from Puppet to Ansible. Fundamentally, I've got nothing against either tool personally, but we're not a big company and we don't have full-time sysadmins. So Ansible is one of those things: if you can read a shell script and you can look up some documentation,
09:21
you can probably hack in the three or four small changes you need yourself without needing to go to the expert and beg for assistance, and that's a big advantage. We also wanted robust, transparent failover for apps and back ends, and we didn't have that in our old cluster environment. Effectively the way we dealt with that is to use CARP with two FreeBSD physical nodes sitting in the same data center and
09:45
let CARP do its magic, and everything's fine after that. We use spiped tunnels for resilience, and this was probably the first bit of BSD code that actually got introduced into our Debian environment: ripping out the autossh tunnels and replacing them
10:01
with spiped from Colin Percival. It was much faster, it was much more stable, but there was a surprise, which I'll cover later. And of course it was a surprise in production. So there you go. But at least it meant that when we were having these micro network outages, it would recover by itself most of the time.
10:23
And also, if there was a problem and the tunnel hadn't come up, at least our SSH connections were still free and we could work as normal sysadmins do when we hop onto the box. So the big picture is three physical machines: a sysadmin and backup box with poudriere for building ports,
10:40
things like logging and monitoring located on that, and that was the first box we converted; we moved all of our logging across to that. Then the A and B cluster nodes, and a few scattered VMs during migration, because that just makes life easier: move them off Debian onto FreeBSD, and then when everything's sorted, move them onto the cluster.
11:03
So first up, poudriere. How many people are familiar with poudriere? Lots of people. And probably with git as well. You don't have to smile, you just have to be familiar with it; it's not the same thing. We could probably have done this with Subversion, but to be honest I use git a lot more, and basically what we have is, on the right here,
11:22
you can see we've got a custom fork of the FreeBSD ports tree and then a bunch of patches, which are things that we're interested in and that we keep up to date. Here a dependency for some software that we use was missing. Here we waited three months for a patch to land; it finally did, so I could remove it from our ports tree.
11:41
We grabbed the LibreSSL patch as soon as it landed, and one for HAProxy, because HAProxy needs a patch to support LibreSSL, and we just keep these patches on our own tree. Periodically, when it suits us, not when upstream decides, we pull in all the recent changes, bump our ports and packages up to the latest version,
12:02
and there'll be two or three of ours that float up to the top, and then we rebuild everything, spin up an extra node, test stuff, make sure it works, not in production, and then when we're ready we can switch over. That was just heaven. I think probably I was the only person who cared about that, but it was great.
12:21
And paradoxically, probably the most important thing is the one we actually haven't really sorted out yet, which is guaranteed consistency across dev and prod. We don't really have development VMs fully sorted out yet. Migration was the key thing, and the next step is getting dev VMs. But knowing you've got consistent packages and versions of everything is the first step to troubleshooting things reliably.
12:42
So, architecture. You're not supposed to be able to read all the little dots and numbers here; that's okay, it's just the general picture. The rabbit in the middle is RabbitMQ. That's connected by tunnels everywhere: between the nodes, between our monitoring and logging box here and the RabbitMQ server up here, a bunch of
13:03
things doing DNS things, small boxes doing DNS; they all converse over RabbitMQ. And then right at the top we've got the two cluster nodes sitting in the same data center, with CARP and LACP, IP failover, and HAProxy in front of them, and I'll talk a little bit more about that in a moment.
13:21
But that's the general picture you've got to keep in mind: two big boxes, 40 cores, and enough memory to fit the entire database and all of its indexes in RAM. And that really makes a big difference for speed alone. So it's kind of cool.
13:41
So fundamentally, from a tech perspective, I like boring things. I like things that I've used for many years, that I know and I trust, and I like older versions of them. I like other people to discover the bugs in production and then complain about them on the mailing lists. And the only piece that we introduced that wasn't really directly in that category was HAProxy.
14:00
It's been around for a long time, and the way we use it is that we have a number of services that have a CARP IP attached to them so they can float between our two cluster nodes. In some cases there are two CARP IPs so that we can load-balance traffic between the two front ends, and
14:21
the services, CouchDB, Kyoto Tycoon and RabbitMQ: the jails actually connect to an internal HAProxy IP, and then HAProxy says: if the service you want is running locally and it's available, then I'll send you there, and if not, I'll stick you through the tunnel.
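Editor's sketch: the routing rule just described (prefer the local instance, otherwise go through the tunnel to the other box) can be illustrated in a few lines of Python. This is not the speaker's HAProxy configuration, which the talk does not show; in HAProxy itself a similar effect is usually achieved by marking the remote endpoint as a `backup` server. The function names and the tunnel port below are hypothetical; 5984 is CouchDB's default port.

```python
import socket


def is_accepting(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds quickly."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(local, tunnel, probe=is_accepting):
    """Prefer the local service; fall back to the tunnel endpoint.

    local and tunnel are (host, port) tuples. probe is injectable so the
    decision logic can be exercised without opening real sockets.
    """
    if probe(*local):
        return local
    if probe(*tunnel):
        return tunnel
    raise ConnectionError("neither local service nor tunnel is reachable")


if __name__ == "__main__":
    # Hypothetical addresses: local CouchDB, and a local tunnel listener
    # that forwards to CouchDB on the other cluster node.
    local_couch = ("127.0.0.1", 5984)
    tunnelled_couch = ("127.0.0.1", 15984)
    try:
        print(pick_backend(local_couch, tunnelled_couch))
    except ConnectionError as exc:
        print(exc)
```

The point of putting this decision in a proxy rather than in each application is that clients keep one fixed address while a node is patched or rebooted.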
14:42
That is, the spiped tunnel to the other box. And that works brilliantly, so brilliantly that I can patch stuff and reboot boxes, as far as I know, without my colleagues knowing. And that was really the goal of it. That's proved really good. And the other surprising benefit that I'd not considered is that a number of our older libraries and software were using HTTP 1.0
15:03
to connect to the server, and then they would disconnect, make another request, and disconnect. With HAProxy being a proxy, we have one leg speaking 1.0 to the crappy app and then the other end connecting to the back-end server with maybe three persistent connections. The first time I migrated and was looking at this, I was panicking.
15:23
I was going: I'm supposed to have a hundred connections to the back end and I only see three. What have I missed? And then I was thinking, how does that even function like this? How is that possible? But it turns out that alone is a big, big speed win. HAProxy is very good at handling many concurrent connections, and then we have one nice, fast,
15:40
persistent, open TCP connection, which may or may not be through a tunnel. So it was great. We're probably due for an interlude about now. Oh, we are, yes. So like any good sysadmin I did load testing, and I doubled the number of connections and put spiped in place, and it all looked great. So we started the DB migration, and
16:01
as I'm one of the authors of CouchDB, I was pretty confident about that. Pride comes before a fall. So we migrated the data across and started to switch servers over, and the way we did that was to switch an spiped tunnel from the old database across to the new one. The databases are replicating, so we're not losing data in this process, and little by little we migrated a few safe servers across,
16:27
then slightly more complicated ones, then the app servers, and then right at the end the last one, which is what I think of as the money application; it's the one where people search for domain names. Migrated that across and it wouldn't work. And of course that's after about six hours of shuffling things around. So I look at my checklist and go: okay,
16:47
back we go. Put all that back and then go do some research: what have I missed? Yeah, so I rolled back and reviewed logs; I think on the next slide I actually have the picture. I applied a whole lot of tweaks to the FreeBSD network stack, because as a sysadmin I obviously blame the network first, before looking at my own stuff.
17:05
Before looking at my own stuff And it was looking good. So the next day I Migrated at all and me and my wife went off to watch Rogue One And the cinema incidentally is underground which I'd not considered at the time and there's no cell phone reception there
17:23
Yeah, so these refusals of new connections continued to occur while I was watching the film, thinking everything was bliss. Listen queue overflow. And what's kind of weird is that there's some large number of hits; you should be able to look it up, I think with fstat or one of the other tools, and then see what process is failing to handle connections.
17:45
But I couldn't find that. And so I did the usual thing: collected all the information I could, all the stuff that I thought might be useful, and sent it along to ask the spiped mailing list: am I using it correctly? There are about a hundred and
18:01
Colin's very kind reply was: yeah, you just need to change the config option. Ironically, that's not actually available in the port by default in the rc.d file. But there was an easy enough fix, and then this problem went away. I guess the upshot of that was, for me, the value in the FreeBSD community is: if you can do a good bug
18:22
report, then there is someone who will probably be able to help you. And perhaps if you can't do a good bug report, you might be a little more stuck. So it's worth spending the time to collect the information. So the sacred beast of sysadmins is the hairy yak, they who must be shaved.
18:43
As we were going through this migration, there were a lot of small things that cropped up. The interludes are really the big things that broke, the really painful things that took time and frustration. But there were a lot of little pieces, and one of my learnings in hindsight was to triple the amount of time I assumed it would take, roughly, because of the yaks that lined up to be shaved repeatedly: some small
19:03
yaks, some big yaks. And as an ex-manager, I don't think there's any way you can predict that. You just get used to your staff and go: okay, this person, when they say that's going to take two weeks, means four weeks, and this person means six weeks. Yeah, and maybe I'm a six-month type of guy.
19:21
Yeah, so, the sysadmin bat belt. Some of the stuff: I put a star by the logging; I'm not really going to cover that. I think we all know about logging, and it's really just a matter of picking a tool you like, and we didn't do anything exciting with that. We did use Splunk, and that's expensive, so we stopped using that and use Graylog instead, which as far as I can tell is
19:41
better for our use case, and the price is right. Vault is an interesting tool. It's a secrets management tool. The easy thing to do as a software developer is, as you're developing, you need some passwords, so you stick them in your git repo; it works fine on your machine, and you don't think about how you're going to deploy that, so you
20:01
just ship different passwords with that into production. If you're slightly clever, you'll put them in a separate file or something like that, but you end up in the state where these passwords, which are the things you really don't want exposed, get stuck in the system. You don't have libraries that manage this at runtime, and you end up managing it at deploy time or install time.
20:20
And so HashiCorp Vault is effectively a key-value store where you can put secrets in and you can pull them out again. It's reasonably new, maybe two or three years old, but it's really, really nice to use, and it solves the problem of: where am I going to put these secrets? There are a couple of nice things that it does; I'll cover them on the next page. And the next
20:42
three or four tools down the bottom: collectd. How many people have used or know of collectd? Very few. OK, so I'll spend a little bit more time on that. Riemann, probably even fewer. Awesome. So my goal here actually is to try and wean you off Nagios. OK, that's last century's tech.
21:02
It's time to go. And I'm pretty sure most people have used Graphite or Grafana for storing data and sticking it on disk. And I'm not even going to mention Jenkins. So, I'm not a cryptographer, but Shamir's secret sharing algorithm is this idea where
21:20
you can take a secret and, if you imagine it as a ring of data, you can snip out segments of the ring, and if the segments overlap then you can give each person a segment of the ring, and then you only need a certain number of them to get the secret back. Yeah, and I don't know about the internals of Vault, but that's the general idea.
21:43
We take this database that has all the secrets in it and we have a single master ring, with the keys then distributed amongst various members of the team. When we restart Vault, or when we start it for the first time, we initialize this ring structure. It's kept in memory, and you have to have n members out of your ring
22:06
to actually get access to the secrets. So choose that number wisely. If you take five members to be part of the ring and you need all five of them, and then one of them dies in a car accident, then no secrets. So there's a trade-off with having, say, three out of five:
22:22
Is everyone going to be available? Are they in the right time zones? Are they in a place where they can use a secure computer? You need to think about that sort of thing. For me, from the perspective of a threat model, ours is very simple. We're expecting you to be at home on your normal work computer, not in a cafe, and you should be able to reach the people you're dealing with on some form of communication channel where you can verify,
22:43
not over IRC, who the other people you're speaking to are. So phone or video: we know each other's voices. That way you can ask them to log in from their workstation and do their initialization with their key, then the next person does it, and after n people have done that the vault is available and up and running.
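The n-of-m unseal idea described above is commonly implemented as Shamir's secret sharing. The following is a minimal Python sketch of that technique, not Vault's actual code: the prime, the function names, and the 3-of-5 split are assumptions chosen for illustration.

```python
import secrets

# Assumption: a Mersenne prime comfortably larger than the secrets we split.
PRIME = 2**127 - 1


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With a 3-of-5 split, losing any two key holders still leaves enough shares to unseal, which is the trade-off the talk describes when choosing n.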
23:01
We use GitHub auth for when people are doing their normal day-to-day work. They don't have this magical ring part, but they have a delegated one, and it allows them to read and write a subset of the keys in Vault. In future, Ansible itself will only have the right to read keys from the vault.
23:21
It will be a simpler delegated privilege, and that will be great. There are a number of models for Vault; you can deploy it in a complicated HA infrastructure. Again, having spent a lot of time with distributed systems, we just have one copy. It's currently hosted in S3, and we will probably move that out to just a
23:43
file backend and then back that up with Tarsnap, or maybe Tarsnap and Git. The reality is, if we have downtime, like we lose our main admin box and need to restart from scratch, then for the path of restoring from backup and deploying these things on a file system, as opposed to getting set up with S3 and enabling all of that again,
24:05
a file system is much, much simpler, and I don't think it exposes us any more from the threat model. So the only other thing to add with Vault is some future plans. We would like to have app-level tokens, so not just tokens for a database
24:21
but actually for an application itself, where it will get a transient key to RabbitMQ, or a transient SSH key if it needs to transfer something over SSH, and this would be valid for quite a limited lifetime. We haven't done that yet, but HashiCorp Vault does provide that, so in the future we'll definitely give that a go. This is a bit of a view of what it looks like; numbers changed to protect the innocent.
24:46
Vault is an HTTPS-based service. You need TLS, minimum 1.2, to secure it. It works fine with Let's Encrypt, except you have to remember that every three months you roll the keys, so every three months you've got to stop and start your vault. Which is obvious in hindsight, maybe not so obvious up front.
25:04
Using a token that you get from GitHub you do an auth, and now you have a token, this long thing down here. And you can probably see it there: token duration. It's valid for a given period of time. This is from the initial setup, so that number's really long. It's a lot less now. So this gives you a nice trade-off. You can decide:
25:22
if you're maybe using some automated process, in the same way you deal with a Kerberos ticket, you can say I want this to be valid for maybe a week, and then it needs to be renewed weekly. For your admins, if people are traveling, you can restrict that right down and say maybe an hour or two, or a day. It depends how keen people are on re-authing versus
25:43
preserving the security of the company. So here we're just writing a secret to Vault: we do not know who the Scarlet Pimpernel is. There are two key things here. "secret black arrow", that is the key. This is a key-value store. It is not some other complicated data-structure store where you can
26:03
use something like a lens to grab and update a piece inside it. You have to update the whole thing at once. This is very important, because if you get that wrong you will lose secrets. So in the second line here we update: now we know who the Scarlet Pimpernel is. He's a comfortable food. And then when we read it back
26:21
we notice that we see comfortable food and we don't see "we do not know". It's the same key, but the value has actually been overwritten. We get the lease duration. You can have the output in YAML, you can have it in JSON, or you can read out just a single value.
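The overwrite behaviour described above can be sketched with a toy in-memory store. This is a Python illustration of whole-value replacement, not Vault's API; the class, the `update_field` helper, and the path `secret/example` are hypothetical.

```python
class KVStore:
    """Toy key-value store: a write replaces the whole value at a path."""

    def __init__(self):
        self._data = {}

    def write(self, path, **fields):
        # Replaces everything stored at `path`; nothing is merged.
        self._data[path] = dict(fields)

    def read(self, path, field=None):
        value = self._data[path]
        # Return a copy of the whole value, or just one field of it.
        return dict(value) if field is None else value[field]


def update_field(store, path, **changes):
    """Read-modify-write so that fields not being changed survive the update."""
    merged = store.read(path)
    merged.update(changes)
    store.write(path, **merged)
```

Writing only the changed field with `write` silently drops every other field under that key, which is how secrets get lost; reading the full value first and writing it back whole avoids that.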
26:41
You can use that directly in a shell script or anywhere else you need it. It's great. I've not really seen anything like it, and I'd be really interested to hear from other people if they have something that does that. So, Ansible. This isn't an Ansible talk; we could go on for hours about these sorts of things. We've got about 50 roles, and most of them are custom, because
27:03
the way we do systems in FreeBSD is: I've got one specific thing, an rc.conf.d/<service name> conf file that the settings go in, then I've got a daemon that I need to start and stop, and possibly some related files. It's pretty contained like that. So most of these are custom, and they're not very exciting: three or four lines of Ansible plus some templates.
27:25
The only big oops was that the native jail support in Ansible only works when you're running Ansible on your jail host, talking to the jails, and it did not occur to me that this would be the case. So I'd done all my tests in dev locally. It's all working brilliantly. I said, OK, right,
27:42
let's redo one of these cluster nodes and test it remotely, and it didn't work. I was going, crap, I've spent literally weeks assuming this would work. Thank God there's a fork of a fork with some patches that makes this work. That was a bit of a stressful moment, because my Python was not good enough to rewrite that stuff from scratch.
28:02
And huge thanks to XMJ on IRC, both for his fractal cells stuff, which I looked at, and his Ansible plugin, and his moral support when he discovered that I didn't know this either. Yeah, so I'm not sure if you can read this at the back;
28:21
probably not. Ansible, for better or worse, is YAML. In fact, I spent most of the last year writing YAML. I'm actually, yeah, a YAML programmer. The main thing of interest on the left here is our jail definition. We have a file where config is a top-level object,
28:40
ISL instances, and then under this thing, IWM base. And the config is pretty simple. It's more or less what you'd see in jail.conf: an IPv4 address for the jail to be bound to, a bunch of packages that we want to have installed, and down the bottom here in blue some magical properties, like allow raw sockets. I can't remember if we actually needed this in the end, but you can add
29:02
those properties there. This Ansible role is not applied to the jail; it's applied to the base system. So it's in a separate file. And then over on the other side we have the app config, which actually gets applied to the jail. So Ansible has a connection plugin that pushes this data through.
29:22
It's a Catalyst app. It's got a primary port, 5000. It's got a path to stuff. Some things have a debug mode and test mode and all that sort of stuff. This config setting here can vary depending on whether you're on a development box or a production one. So the way we say that I want Catalyst to spit out lots of debug information is
29:41
just to update these things in Ansible, and depending on whether it's production or development, different settings get applied. That's pretty good. Down here we've got magical tokens, and there's a plugin we use, a Vault lookup: this is the key, and this is the sub-value we want within that key, which gets pulled out and returned back to Ansible.
30:06
When Ansible runs, it pulls all of these things out of Vault. It uses the auth token that I have, which should still be valid. This means if someone takes our Ansible scripts and steals them, without that auth token they can't get the secrets
30:22
out of our production environments. And then it goes and pushes these into the right places. Unfortunately many of the apps we use use file-based tokens, so if you were to hack into the server then you would still be able to get those passwords. But you'd at least have to get, in most cases, root privileges first to do so. It's a good compromise.
30:42
It's not ideal. I would like a way to inject a secret into a process in such a way that it was only visible to that process. Right, so now we're actually inside the cluster. There are two nodes here.
31:02
I'm not a graphic designer. Yep. So one node along here, one node at the top. Here's our logical domain from the network side of things: CARP failover, round-robin DNS, with a lagg interface, and switch support configured as well upstream on the switch.
31:20
To be honest, once we set that up, I've not had to touch it since. It just works perfectly. This sort of structure here, I'll point to the bottom one; it's the same on top as on the bottom. HAProxy listens on both of these nodes and, as I said earlier, jails request access through HAProxy for a particular service, and HAProxy will direct it either to the RabbitMQ, CouchDB,
31:43
or Kyoto Tycoon database on the local node or, if that's not available, it flicks it through the tunnel to the top node. That works very, very well. There were a couple of problems related to RabbitMQ, which I'll touch on later, but otherwise this just works
32:01
out of the box, and it's very, very reliable. You could do this with nginx, but you have to buy a commercial nginx license for that, and for the scale we are at, we're not Google, we're not web scale. We measure our hits in, you know, tens per second, not tens of millions per second. We really don't need that. But what I did want is the ability to
32:24
take some of our services off to do maintenance on them without having to interrupt customer requests or take parts of the app offline. HAProxy allows us to turn a particular server, so this server here with this connection here, into maintenance mode. It will then drain the connections to it, and it will send new connections off through to the next available node.
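The local-first routing and maintenance-mode draining described above can be modelled in a few lines. This is a Python sketch of the behaviour only, not HAProxy configuration or code; the `Backend` and `Balancer` classes and the node names are hypothetical.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.maintenance = False
        self.active = 0  # connections currently open to this backend


class Balancer:
    def __init__(self, backends):
        # Preference order: local node first, then the node across the tunnel.
        self.backends = list(backends)

    def connect(self):
        """Send a new connection to the first backend not in maintenance."""
        for backend in self.backends:
            if not backend.maintenance:
                backend.active += 1
                return backend
        raise RuntimeError("no backend available")

    def close(self, backend):
        backend.active -= 1

    def drained(self, backend):
        """A backend is safe to work on once it is in maintenance and idle."""
        return backend.maintenance and backend.active == 0
```

Existing connections are left alone when a backend enters maintenance; only new connections are redirected, so the backend empties out as clients finish.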
32:46
And that's a really, really nice trick. There are three or four modes for HAProxy for dealing with that: round-robin and so forth. There's an excellent talk, I think by the CEO of Fastly, where he talks about the actual impact of these models in practice.
33:01
I should find a link and add that to the presentation, because it's well worth a read for understanding whether round-robin or least-loaded is your best choice. For us, to be honest, it probably doesn't really matter. Interlude the second. So we
33:20
did the DB migration. We've got our first apps behind HAProxy, and I'm kind of relieved, because it was a big step to take. But we're also in this halfway house, with some stuff over here in the unreliable cloud, and we've got these databases in happy FreeBSD land, running. And then one morning everything wedged completely. Absolutely everything, at exactly the same time. Very, very odd.
33:46
Different cloud providers, some in VMs, some not, and in all cases HAProxy had stalled. From a daemon service perspective it was up and running; if you went to query the stats, that was available. But it wasn't accepting new connections at all, and everything was down, because the databases are behind it.
34:05
They were also down from the point of view of the applications. Luckily it was 9:30 in the morning, I was already sitting at my computer, and five minutes later it was all back up and running. But I was a little bit concerned, so I whipped out DTrace to check one of the processes.
34:22
I got a panic on the box, and I was thinking, great, this is a really good story to take back to my colleagues about FreeBSD stability. And I couldn't get a core dump out of it either, so that's a bit frustrating. I summarized the situation, sent it off to the HAProxy list, and three other people replied and said: could you tell me exactly when that happened, because we saw the same thing too.
34:43
And within literally a couple of minutes, globally, there were a number of people seeing the same thing. The only common thread was FreeBSD 11, not 10.x. It doesn't matter what version of HAProxy you're running, doesn't matter if you're running LibreSSL, whatever. This was a bit new to me. I've not really struck this sort of thing before, and I had visions of little gremlins in the internet, sort of
35:04
script kiddies hosing everything with some mysterious packet of death. There are probably kernel developers here, and people who have a lot more experience in that sort of thing. Does anyone have an idea what that might be? Bonus prize if you can guess it. Quiet. Thank you for being honest.
35:22
Yeah, anyone else have any ideas? It's a good idea, but no. We're all in different time zones; well, we're all in UTC. No, no, it's another good guess. No. No, that's the thing we should get rid of as a matter of principle.
35:50
Yeah, keep rolling your hand, that's exactly right. So bonus points to the man in the corner there, well done. So after I came up with my weird conspiracy theories, Willy Tarreau
36:01
replied and said: if I'm right, this will be about 49.7 days, which sounds like some sort of millisecond 32-bit timer rolling over. Never heard of that before in HAProxy, but it could be possible. And long story short, we waited 49.7 days with the patch, which I think was a
36:22
relief to us, and problem solved. The answer was that, I'm assuming, in 10.x the port is still built with GCC, and with 11 it's built with Clang, or at least we were building with Clang, and we need to use the -fwrapv flag to prevent the rollover problem.
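The 49.7-day figure falls straight out of a 32-bit millisecond counter. The sketch below works through that arithmetic in Python and shows why a plain comparison of tick values misbehaves across the wrap while a signed-difference comparison does not. It illustrates the general technique under that assumption; it is not HAProxy's code, and the function names are made up for the example.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
WRAP = 2**32  # a 32-bit millisecond counter wraps after this many ticks


def days_until_wrap():
    return WRAP / MS_PER_DAY


def tick_after(now_ms, delta_ms):
    """Counter value `delta_ms` later, wrapping like an unsigned 32-bit int."""
    return (now_ms + delta_ms) % WRAP


def to_signed32(value):
    """Reinterpret a value as a two's-complement signed 32-bit integer."""
    value %= WRAP
    return value - WRAP if value >= 2**31 else value


def naive_has_expired(now_ms, deadline_ms):
    # Wrong near the wrap: a deadline just past the wrap looks far in the past.
    return now_ms >= deadline_ms


def has_expired(now_ms, deadline_ms):
    # Wrap-safe: only the signed 32-bit difference between the ticks matters.
    return to_signed32(now_ms - deadline_ms) >= 0
```

The wrap-safe version relies on subtraction wrapping around in two's complement, which is the behaviour a C program has to be able to count on for signed arithmetic of this kind.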
36:43
Yeah, weird. That went away, and just last week I had the final confirmation: we all reported back that after another 49.7 days no one had seen it. So I was kind of waiting a bit, thinking, I really hope that doesn't come back. And it's another case for writing good bug reports.
37:02
Thankfully we haven't seen that again, because it's really difficult when everything in the infrastructure goes down all at once. So this is the bit I enjoyed the most, to be honest. I like monitoring. I like sitting at my desk. I would like to have three large gummy bears, you know, big bear statues that glow blue when everything's fine, green when the builds are going,
37:26
orange when something's wedged but production isn't down, and red when I should really be getting out of bed. That's my dream of perfection. And of course there are many yaks along this way to shave, so you really need a pro ops yak herder Jedi. I'm not sure about the Jedi, I added that yesterday to the presentation, but I'm definitely an ops yak herder.
37:45
So the first part of this is collectd. In every networking environment that I have used, there's always been at some point TCP ephemeral port exhaustion caused by badly written
38:02
distributed systems. "Badly written" is perhaps a little unfair, but the guts of it is you've got to have exponential backoff if you're running an application that is crossing the network boundary between boxes or servers or networks, and in my experience very, very few people do this. So when there is an outage, their queue of transactions piles up, the network comes back up,
38:23
and they immediately start hammering the box saying let me in, let me in. If you have 20 or 30 servers doing this, then that piles up very quickly. The way I've dealt with this in production and in the new architecture is to put effectively three limits in place. Ask people politely to rewrite the code; that may or may not happen, depending on constraints. Use PF on every single box to
38:47
effectively throttle the connections at the server end. Restrict the number of connections with spiped, now that I know that you can actually do that, so that from one particular server back to a central database server you can get a maximum of, say, 400 concurrent connections through, and that just sort of slows everyone down. And then at the back end,
39:06
the databases are quite capable of handling 10,000 concurrent connections, so that really isn't a problem. But without using PF and spiped to nip the flow in a couple of places, you end up in this overflow situation.
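The client-side half of this, exponential backoff, can be sketched as follows. This is a minimal Python example of capped exponential backoff with jitter; the base delay, cap, attempt count, and function names are assumptions for illustration, not values from the talk.

```python
import random
import time


def backoff_delays(base=0.5, cap=60.0, attempts=8, rng=random.random):
    """Yield jittered delays: uniform in [0, min(cap, base * 2**n))."""
    for n in range(attempts):
        yield rng() * min(cap, base * 2 ** n)


def call_with_backoff(operation, attempts=8, sleep=time.sleep, **kwargs):
    """Retry `operation` on ConnectionError, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = list(backoff_delays(attempts=attempts - 1, **kwargs))
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure
            sleep(delays[attempt])
```

The jitter matters as much as the doubling: without it, 20 or 30 clients that failed at the same moment all retry at the same moment too.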
39:20
So to keep track of that we've got sort of basic stuff in collectd. collectd is a daemon written in C. It's pretty low effort to configure, and out of the box you get all those nice things: CPUs, interfaces, disk space, memory. You can have per-process stats. Unfortunately I put the label in slightly the wrong place here, but you can actually see a process name or a process match in the middle here.
39:43
And this allows you to match on, let's say, TTY as a string, and, you know, in your FreeBSD system you're expecting to see a bunch of those, maybe, what is it, eight or twelve, and you can actually monitor that and see if these processes go down. Because we're aggregating this over multiple systems, you can actually say I need to always have at least
40:05
two of these processes running somewhere in my environment, I don't care where. And then you move away from this per-box monitoring problem of saying: how do I decide what happens if I reboot this box?
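The cluster-wide check described above, at least two of a process running somewhere, reduces to summing per-host counts. The following Python sketch shows that idea with hypothetical host and process names and a made-up report format; it is not collectd or Riemann configuration.

```python
def running_total(reports, process):
    """Sum per-host process counts; `reports` maps host -> {process: count}."""
    return sum(counts.get(process, 0) for counts in reports.values())


def safe_to_reboot(reports, host, process, minimum=2):
    """True if the rest of the cluster still meets `minimum` without `host`."""
    on_host = reports.get(host, {}).get(process, 0)
    return running_total(reports, process) - on_host >= minimum
```

Because the threshold is on the aggregate, a single box going away is only an alert when it takes the environment below the minimum.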
40:23
Is it OK to reboot this box? You have this information from other servers. We also have custom stuff in there: RabbitMQ queue length. That's over an HTTP API to RabbitMQ, so it's pretty easy to throw in any sort of general query to a database that gives you back XML or JSON or just some number. We have basic thresholds in here, so I don't do anything fancy. I don't do anything
40:44
computationally expensive, and collectd sends its data back out over TCP through an spiped tunnel to our monitoring box. Now this is the sort of stuff we get out of collectd in Riemann, and this is all in real time, so it's not historic, and
41:06
with a bit of luck the computing gods will eventually let me collect data while I'm talking here, and we can see little things moving along. HTTP response time: this is actually from three or four nodes that provide DNS. They're continually querying our front-end web servers and saying, how quickly do you respond? It's just for the first primary website
41:25
GETs, but it gives you a really quick view if something has changed from the network side of things. Are we seeing really slow response times from a particular area of the world? Maybe. And this stuff here is our TCP connections. This is the piece that really enabled me to narrow down really quickly
41:42
why my spiped tunnels weren't working as expected, and what I used to establish load testing beforehand. And that's just collectd. I only graph two states, ESTABLISHED and TIME_WAIT, and that seems to be enough most of the time to know what's going on. Riemann is a tool written in Clojure, and I'm not a great Clojure person,
42:03
so I've tried to keep this as simple as possible. This is a snippet from our config file. The way to think of Riemann is a bit like you're standing at the top of a mountain with a huge stack of events, and there are three or four rivers down below you, and you're sorting these events. Sometimes you want to throw the event into three of the rivers;
42:20
sometimes it just goes into one bucket. So all of the collectd stuff goes into one bucket, all of our custom-written app stuff goes into another bucket, and then within the stream you can decide: OK, for this host and this service, I want to, for example, count that there's at least one of these processes running on it. Or I might want to say,
42:43
this particular one here, you can probably see in the middle, this really long line: curl JSON RabbitMQ gauge total messages. It should be reasonably obvious: it's the output of the JSON API for RabbitMQ, the number of messages currently not delivered in RabbitMQ. So that's my big backlog. If something goes down in the system, that number will
43:02
go up very quickly, and for that I'm going to say I want a time to live of two minutes, I'm going to rename it to something a bit more comprehensible, "rabbitmq backlog", and I'm going to tag it as notify. This large clause here, in this Lispy-flavored language, sees if the state is expired,
43:21
which means I'm no longer receiving messages at all from that service, like if you turn it off. Then it changes the state to critical and re-injects it, puts it back at the top of the mountain, so it flows down the path to critical: page someone, get them out of bed. And the same thing if it's OK, we re-inject it. That seems odd. Why would we re-inject an OK message?
43:41
And the answer is: because if you page someone to get them out of bed and they get to the computer and find out the system fixed itself, they'll be really annoyed. This is the piece that sees all the systems are back up and running, and the integration with PagerDuty turns the alarm off and they go back to sleep. And then finally we send all this stuff off to Graphite so we can have pretty pictures for historic viewing.
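The flow described above (time to live, rename, tag, turn expiry into critical, re-inject, page and resolve) can be modelled in plain Python. This is a sketch of the logic only, not Riemann's Clojure configuration; the classes, the event fields, and the source service name are hypothetical.

```python
# Assumption: the raw metric name as the collector might report it.
SOURCE_SERVICE = "curl_json-rabbitmq/gauge-total_messages"


class Pager:
    """Stand-in for a paging service: trigger on critical, resolve on ok."""

    def __init__(self):
        self.open_incidents = set()
        self.log = []

    def handle(self, event):
        key = (event["host"], event["service"])
        if event["state"] == "critical" and key not in self.open_incidents:
            self.open_incidents.add(key)
            self.log.append(("trigger", key))
        elif event["state"] == "ok" and key in self.open_incidents:
            self.open_incidents.discard(key)
            self.log.append(("resolve", key))


class Stream:
    def __init__(self, pager, ttl=120):
        self.pager = pager
        self.ttl = ttl          # two minutes, as in the talk
        self.last_seen = {}     # host -> most recent renamed event

    def inject(self, event):
        if event["service"] == SOURCE_SERVICE:
            # Rename to something readable, tag it, and give it a TTL.
            event = dict(event, service="rabbitmq backlog",
                         tags=["notify"], ttl=self.ttl)
            self.last_seen[event["host"]] = event
        if "notify" in event.get("tags", []):
            if event["state"] == "expired":
                # Back to the top of the mountain as a critical event.
                self.inject(dict(event, state="critical"))
                return
            self.pager.handle(event)

    def expire(self, now):
        """Emit an expired event for every host silent for longer than its TTL."""
        for host, event in list(self.last_seen.items()):
            if now - event["time"] > event["ttl"]:
                del self.last_seen[host]
                self.inject(dict(event, state="expired", time=now))
```

Passing the ok events through to the pager as well is what lets an incident close itself when the service recovers before anyone has reached a keyboard.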
44:01
I just stuck this into ZFS. One of the problems with Graphite is you've always got to decide how to aggregate your data over time. I don't; I tell ZFS: compress it, sort it out. Seven years of data comes to something like 60 or 70 gigs. It's a bit slow to query sometimes, but that's not really a major problem.
44:22
So these are the real-time graphs; that's the sort of thing you get out in Graphite. And we'll have a look at this. This is this event here, the RabbitMQ monitoring. In the OK state everything's green, and then when it fails it goes up into orange. So most of the time I have a window just sitting there like that, and I have a look periodically to see if it's going orange,
44:43
if something is getting paged. How are we going on time here? OK. We'll just go to the interlude, and then we can have a look at some real-time stuff. So, interlude the third. The databases have moved, the message queues have moved, the primary application has moved.
45:00
Let's move the back-end job queue processes, which are a bunch of Perl daemons. A bit like the Star Wars thing. I decided to go on holiday in two weeks, starting on Saturday, because it lines up with the school holidays, and we're all going as a family and I don't want to take my computer. Long story short: move the job queues, major backlog problems, with the monitoring
45:21
we've just seen, but the site is still usable. The stars are there because usable depends on exactly what you mean by usable. Some connections to RabbitMQ are slightly flaky; they just fall over and we don't really know why. So I tweak all the things. This is things like sysctls that we've set. Untweak them, reboot stuff.
45:40
Doesn't work. I check the tunnels. I double-check the tunnels. I turn PF, the firewall, off everywhere. It's not the firewall; turn it back on. And slow panic sets in, to me, not to the servers. This is, I think, Wednesday or Thursday; I think it was Wednesday. At almost midnight on Friday evening I suddenly work out what it is, after reading some tcpdump and grep output
46:05
very closely. My personal learning from this is: go for tcpdump much, much, much earlier. Every time I use tcpdump, I've always said the same thing: I wish I'd done that half a day ago, or an hour ago, or two hours ago. My bad. So the setup we had for HAProxy is tuned for HTTP keep-alive, and it's also designed to make sure that our
46:27
third-party API clients don't misbehave. So if they don't send any data, I cut them off. Unfortunately RabbitMQ is a reverse-proxied TCP connection, not an HTTP one.
46:42
RabbitMQ is a message queue, and a connection could sit there for several days with absolutely no traffic. HAProxy has TCP keep-alive at the bottom, so the operating systems are aware there's a connection, but at the application level there's no traffic passing, and HAProxy itself doesn't see these keep-alive messages being passed at each end of the connection.
47:03
So it was closing the connections if there was no data, and that was what was breaking things. Yes, queues with no message volume would have their TCP sessions dropped, and the Perl daemon that restarted these workers when they finished their work or when they died
47:21
wasn't quite fast enough to keep up with the jobs as they came into production. So we'd see a backlog creep up, and then it would catch up and tail down again. The fix was to enable... so RabbitMQ uses a messaging protocol called AMQP, I think I've got that right, and
47:42
heartbeats should be enabled by default in all clients, but are obviously not. In some of the libraries we could enable them, and in the Perl one we couldn't. So the solution was simply to change the HAProxy idle timeout to three days. Problem solved, and I go on holiday. But I did take the laptop, just in case.
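The interaction between an idle timeout and application-level heartbeats can be checked with a small model. This Python sketch assumes the proxy only counts application data as activity, as described above; the 300-second timeout and 60-second heartbeat interval are made-up figures, and only the three-day timeout comes from the talk.

```python
def connection_survives(traffic_times, idle_timeout, until):
    """True if no gap between application-level frames exceeds idle_timeout.

    `traffic_times` are seconds since the connection opened at t = 0;
    `until` is how long we need the connection to stay open.
    """
    last = 0
    for t in sorted(traffic_times):
        if t - last > idle_timeout:
            return False  # the proxy would have closed the connection here
        last = t
    return until - last <= idle_timeout


def heartbeat_times(interval, until):
    """Times at which a client sending heartbeats every `interval` s transmits."""
    times = []
    t = interval
    while t <= until:
        times.append(t)
        t += interval
    return times
```

Either fix works in the model: heartbeats shorter than the timeout keep a quiet queue alive, and so does a timeout longer than the longest quiet period.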
48:02
Yeah, but it was touch and go. I left at, I think it was, like 6 in the morning the next day, and I had to wait till 1 a.m. so I could call my colleague in New Zealand and tell them: I know what the problem is, and you're going to like it, we fixed it. So, prize for anyone who can tell me where that is.
48:26
Not bad, not bad. You're right, it is New Zealand. I'll tell you: it's Stewart Island. It's at the bottom end of New Zealand. 20,000 kiwis live here. That is kiwi birds, not Kiwis like me. Beautiful place.
48:40
No computers. Very good if you want to get away from cell phone reception. Lovely. So, some thoughts and wants. FreeBSD is interstellar ninja hipster tech, but for someone like me who's not great on networking, I got a bit lost, and I think in hindsight that was something that really slowed me down. I don't know what an epair is, or a tun or a tap. I thought I knew what a cloned interface was, but it turns out I don't.
49:05
How to make a typical jail network seems obvious in hindsight, but it really isn't at the beginning. You've got to decide how you get your traffic in and out of the jail, and how permissive this jail should be in relation to network traffic. Should it be able to access other jails? Do you want it to talk to services on your host? Do you use PF to control it? Do you use HAProxy?
49:24
Some guidelines would be nice. I noticed that no one can agree whether VIMAGE is safe or not. We don't use it, but I'd love to have PF directly inside the jail, so I'd like someone to answer that for me. And if jail.conf, pf.conf and OpenBGPD would have a baby and dress it in ZFS dataset properties,
49:43
I would be really happy. I think this is the learning I've seen from Docker. I'll mention it once, because otherwise I get post-traumatic stress disorder. Docker provided people with an API for packaging stuff up, which included the networking and some file system bits, with a very, very primitive,
50:02
non-idempotent way of doing builds, and it turns out it was enough for people. We have all these bits in FreeBSD that are, generally speaking, far, far better, but they're not packaged up. Once you know them, you cobble them together yourself, you build your own Lego and go. It's awesome. But there's a piece missing there, and I think that's the piece that is the on-ramp for
50:22
future BSD users and developers, who are missing that. libxo is awesome. I want it everywhere, all the time. In fact, I want it streaming, so that I can connect it up to RabbitMQ and things like this and then make decisions across my infrastructure rather than in a single place. It may well be this is already possible and I don't know about it.
50:41
I'd like to have the at symbol in GECOS fields, because then I can put email addresses for daemons and services in there. There's a patch for it. I couldn't find the POSIX standard that says whether that is or isn't allowed. Maybe it is. systemd is not something I enjoyed particularly, because it changed over time with every Linux release, but it is a very simple way of describing what a job should and shouldn't do, and
51:06
when you're writing your own apps, when you're writing a new daemon, like a new port for FreeBSD, you typically cobble together bits of someone else's shell script, and over time you learn to do this a lot better. But what I learned from systemd is that having a single flat config file that covers 90% of people's needs
51:21
means that people don't screw daemons up. I think I've submitted at least three patches for other people's daemons, and several patches for my own daemons as I learned more about how to do that properly. DTrace and tcpdump are awesome. I would love to have TCP support for base syslogd. In practice
51:40
I rip it out and put rsyslog in. But the problem you have with a jail is, if you want to get syslog data out of the jail, you do that over UDP, and then you need a PF rule to make sure that people on the outside can't spam your UDP logs. That kind of thing would be nice. And I know Allan Jude told me it wasn't possible,
52:01
but I really would like something like ZFS send for freebsd-update. So instead of downloading 3,072 patches, it would just go: here's one I prepared earlier, done. I started off doing this thanks list. The list got really long. Your name probably isn't there. When we come to these conferences, we get an opportunity to say thank you. And so, thank you.
52:27
These are the people who helped. There are some right at the top: Ed Maste was one of the first, who fixed the sizing on my laptop. It used to be that the screen was this size, in the middle of a big black boundary, and that wasn't really very conducive to doing work with FreeBSD. Once it was full screen, that's really what started this migration path three-odd years ago.
52:44
So these little bits, for newcomers who are stumbling around literally in the dark, make a big difference, and those are all the people that helped. In particular I want to mention Matt Macy for his drm-next work. That's enabled my colleagues to use FreeBSD. It's what made this presentation possible.
53:02
And I think people underestimate how important it is to be able to dogfood the operating system you're using. So big thanks to Matt. Anyway, any questions? I think I've probably gotten to the end of my time now, haven't I?
53:24
I'll do a shortened version of the story. A colleague, when I was working at the University of New Zealand 25 years ago, said: hey, I have this FreeBSD thing, it's awesome. I ignored him. Then I picked up OpenBSD 2.8 from a colleague when I was in France, because I wanted to learn more about Unix, and I got started there, with great man pages, and
53:46
then I picked up FreeBSD later because I was using Erlang. It has DTrace support. DTrace didn't run on the computer I was using, so I went to find a computer that would run DTrace. Yeah, so that's how it all started: DTrace.
54:07
The short answer is I don't bother with it. It turns out it's not really necessary. That's pretty funny. In practice, I have monitoring, so monitoring will tell me something's down. I don't think my colleagues would necessarily know what the problem is, and there are a couple of solutions for this around doing, like, a little loop
54:23
in a devd check. Yeah, that would be a thing I should fix. Yeah. Yeah, I've seen at least two articles on the internet about people doing this, so I don't imagine it's particularly difficult. Yeah. Good point. None really; it's a bit of an embarrassment.
54:50
We just said, yeah, let's do it. Maybe, I don't know where Lee is, maybe he can speak to that. But one of my colleagues is the founder of the company. He's a longtime FreeBSD user, and now he's also back on TrueOS on his laptop as a result of this.
55:04
So it's kind of cheating. Yeah, we were preaching to the converted. I think we were all very frustrated with the status quo from a stability point of view and a customer point of view, and we wanted to change some stuff, and it was more a matter of framing it up, saying: here are the problems we have, and here is a way of dealing with them.
55:24
Anyone else? Great. If you're interested in more stuff, I know that sometimes it's hard to go, what are those words in the screenshot? The slides are on the site. My contact details are there, and in the next couple of weeks I'm going to blog in long form all the little bits that I glossed over, like how does collectd work,
55:43
what do we do with Riemann, and that stuff, so you have configs to look at if people want to play with it. Thank you.