Tales from the North

Formal Metadata

Title: Tales from the North
Subtitle: System Administration of a Geographically Disperse Network
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Over five years ago I joined a Northern Internet Service Provider (ISP) in Yellowknife, NWT, Canada, providing high-speed internet, email, website hosting, and other specialized network services to approximately 50 communities located within the Northwest Territories and Nunavut. Our satellite headend was located in Ottawa. Originally, each site's systems (known as earth stations) ran FreeBSD 5.2.1, but we could not run the same version of FreeBSD on newer hardware, so we started using FreeBSD 6.2. Different sites running different versions of the FreeBSD operating system created many headaches when debugging software issues for a given service and while migrating services within a community from one server to another. Also, due to a lack of foresight, a central repository was not implemented, so once the newer version of FreeBSD was in use the software stack versions differed between sites. To solve the software stack issue we implemented FreeBSD 5.2.1 instances within the FreeBSD jail framework; this predated jailutils and the Warden application from PC-BSD. Custom scripts were created to launch and terminate the various jail instances. However, this still caused issues because the system tools used for diagnostics on the host system were inconsistent across the various sites. It was my job to develop a strategy to update the base operating systems on all of our systems located across the entire network. To do this I relied heavily on my past experience of administering and updating diskless Gentoo Linux instances at MUN and on reading the FreeBSD Handbook. I took this task one step further and created a custom FreeBSD ISO based on FreeBSD RELEASE, coupled with SYSLINUX's PXELINUX framework. I was able to install the custom OS from a known stable state.
I would like to present this material to the BSDCan attendees in order to share my experience with others and to show that you can accomplish the same task without spending a large amount of money on proprietary software.
So, I'm doing [unclear] work at Memorial University of Newfoundland, with Computer Science, present and past, and with the Department of Math and Stats. Believe it or not, Math and Stats is very demanding on our services; Computer Science [unclear] just run the code, just hit go.

So today I'll be talking about Tales from the North: System Administration of a Geographically Dispersed Network. Well over five years ago I joined a northern Internet service provider in Yellowknife. There were 50 sites: 25 in the Northwest Territories and 25 in Nunavut. They were [unclear] sites, and it was an interesting time. The satellite headend was, and still is, located in Ottawa.

There's a map. Here we see the purple side is Nunavut, and the Northwest Territories is down here [unclear] Yellowknife [unclear].

So, the typical configuration: satellite equipment, the black box; then local area network equipment, which consists of routers, switches and wireless network equipment, another black box that I won't talk about here because I'm a systems admin by trade. For the servers, we had two or more physical systems located there, and those are the white box, general systems.

So let's focus on the remote sites. Every site ran multiple instances of FreeBSD 5.2.1, along with various software stacks on them. As the equipment aged, we purchased new equipment, or repurposed equipment when it failed, and deployed it to that community. We found with the new equipment that 5.2.1 was just not supporting the hardware that we purchased [unclear]. So the software group and a couple of the network admins got together [unclear] and said: right, let's make use of FreeBSD jails and put the services that we offer for clients in jails, and then it doesn't matter what you have on the server side. That allowed us to [unclear]. So, the jails: there were two per community. The first was mail and [unclear]; the second was [unclear] and the network accounting, which was listening to see how much traffic was being used. [unclear]
[unclear] And then we had our local community service provider, or CSP. They were our eyes and ears at the site when we couldn't have a person there. When we would send a tech, that tech would be looking at the infrastructure that was in place and making adjustments or what have you. The CSPs were not really knowledgeable with BSD, but they could replace a hard drive if required and fix a cable.

So, the troubleshooting model. Not all platforms were the same. If we encountered an issue [unclear], not all systems had the appropriate software stacks in order to do debugging. As we moved forward we started asking: what can we do? This is where I came in and said, let's PXE-boot everything [unclear] distributed computing stuff.

So this was a FreeBSD server to FreeBSD server approach, where each system was configured with basically DHCP, TFTP and NFS. [unclear] From there I could take that same tarball and do a clean install of the other system. [unclear] Our first thought was, well, let's just go with one slice and the tree hierarchy, what have you, but then how do you roll out a new OS there? To resolve that we went with a three-slice approach: the first two would be for the OS and the third would be for the local data, and we would move our jails into that space. Then we would use boot0cfg with the appropriate flags to instruct the system to boot off the alternate slice.

So here is a sample DHCP configuration. We found when we were doing OS upgrades that the built-in TFTP server just couldn't handle it, we were running into issues, so I went with tftp-hpa from ports [unclear] block size. [unclear] It would be bound to that system's address and secured through hosts.allow, along with ACLs that were on the router, so we had a kind of two-tier security mechanism. [unclear] Then there's the traditional setup with next-server and filename [unclear]. I should point out that by changing the filename you can use the different [unclear] binaries that come with different distributions. The distributions we were running were 32-bit, and we were looking at, down the road, running 64-bit, so it gives you that flexibility there.

We'd have rolling upgrades at each site. If the A side was live, we'd upgrade the B side while it was dark, then light it up, and try to keep our service outage for customers to absolutely none. We would send out e-mail notifications of the times various services would be disrupted and the reason for that disruption, so that customers were never left out there.
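The DHCP setup described here, with one `next-server` and one `filename` per network-booted machine, can be sketched as a small generator for ISC dhcpd `host` blocks. This is a minimal illustration, not the configuration shown in the talk: the host name, MAC address, IP addresses and boot file names below are all hypothetical.

```python
def dhcpd_host_stanza(name, mac, ip, next_server, filename):
    """Return an ISC dhcpd 'host' block that network-boots one machine.

    next_server is the TFTP server the client should fetch from, and
    filename is the boot file it requests; swapping filename is how a
    client can be pointed at a different boot binary.
    """
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {next_server};\n"
        f'  filename "{filename}";\n'
        f"}}\n"
    )


if __name__ == "__main__":
    # Hypothetical values: server B at a site boots from server A.
    print(dhcpd_host_stanza(
        name="site-b",
        mac="00:11:22:33:44:55",
        ip="10.0.0.12",
        next_server="10.0.0.11",
        filename="pxelinux.0",
    ))
```

Keeping the boot file as a parameter mirrors the point made in the talk: moving a client from one boot binary to another only requires changing the `filename` line.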
And with the NFS shares, there's your typical exports definition, and then your corresponding [unclear] to bring it together. Any questions so far? Am I running too fast? [unclear]

So I'd like to talk to you a bit about debugging. We were relying on the local CSP [unclear] the communication shack [unclear]. One thing that I came up with was to use SYSLINUX with the MEMDISK option. You can configure your already existing DHCP and TFTP to run memtest; this was one example. Another thing was that some of the systems that we purchased came with diagnostics [unclear]. I moved that to the community and adjusted the script accordingly. Then if a system went down, but the other one was lit up with running services, I had the ISO mounted such that I could do a client PXE boot, as long as it was not a power supply or motherboard issue [unclear], and then I could run the appropriate tests through MEMDISK. That helped cut down on the cost of flying a tech to the community and the cost of having to ship a white box replacement. So you could say, all it needs is a hard drive, send that in, and then [unclear] the time frame in which we were going to have that site serviced [unclear]. That was again to ensure that our clients had service. We always stressed service [unclear], because in the North the power is susceptible: you'd have rolling outages, generator failures, outages they would not tell us about, and we were kind of left scrambling, trying to figure out why things had gone down.

So then I can explain here. For the SYSLINUX project, you download the [unclear], copy these binaries and set up the default menu; they have examples on the wiki site. You can use it for LAN boots, another distribution [unclear], a hardware test, stuff like that. So this allows you to prepare for any event: you have a tool infrastructure to call upon to help diagnose [unclear].

Now, that was all local in that community. What do you do if everything went off one master? Then we would take the secondary server and make that a client. [unclear] So either could be the master if required. We had one incident where we lost the hard drive subsystem on a server, and I thought, this is a great idea: light up that server and [unclear] see if we can run a few of the jails off it. So while the primary server was left running with the two [unclear] handling customer information, I was running a test on the second server and seeing that yes, I can launch another jail and do these other jobs if required. I could have the primary server [unclear] alive, I could have it as a data store, NFS-share off that to the client machine, and then NFS-mount in the jail or what have you, to share information. So it's like the different layers of an onion, if you want to look at it that way. You can do wonderful things with this. [unclear] Seeing Kris's talk yesterday about PC-BSD and the thin client installations, I would have loved to have those applications back in 2007.

Along with configuring SYSLINUX on the server side, we made a change in the [unclear] images directory, where [unclear] we had to change the string [unclear] for the bootstrapping. [unclear] What this opens up is booting through a web server, stuff like that: you can have a distribution on a remote web server, accessed through [unclear], and that's fairly interesting work there. And then of course, once you change [unclear], do a sanity check, make sure things work, restart the service, boot the client machine, and then perform whatever tests you want to do, such as memtest.
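The diagnostic menu described above (a default PXELINUX menu offering memtest plus a vendor diagnostic image loaded through MEMDISK) can be sketched as a small config generator. This is an assumption-laden illustration rather than the speaker's actual menu: the labels, titles and image paths are hypothetical, and it assumes the `menu.c32`, `memdisk` and memtest binaries have already been copied into the TFTP root.

```python
def pxelinux_menu(entries, timeout_tenths=100):
    """Render a PXELINUX configuration file as a string.

    entries is a list of dicts with 'label', 'title', 'kernel' and an
    optional 'append' key. TIMEOUT is expressed in tenths of a second.
    """
    lines = ["DEFAULT menu.c32", "PROMPT 0", f"TIMEOUT {timeout_tenths}", ""]
    for entry in entries:
        lines.append(f"LABEL {entry['label']}")
        lines.append(f"  MENU LABEL {entry['title']}")
        lines.append(f"  KERNEL {entry['kernel']}")
        if entry.get("append"):
            lines.append(f"  APPEND {entry['append']}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical entries: file names and paths are placeholders.
DIAGNOSTIC_ENTRIES = [
    {"label": "memtest", "title": "Memory test", "kernel": "memtest"},
    {
        "label": "diag",
        "title": "Vendor diagnostics (via MEMDISK)",
        "kernel": "memdisk",
        "append": "initrd=images/diag.img",
    },
]

if __name__ == "__main__":
    print(pxelinux_menu(DIAGNOSTIC_ENTRIES))
```

The generated text would typically be written to `pxelinux.cfg/default` under the TFTP root; after changing it, the sanity check and test boot described in the talk still apply.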
So now, we had 50 sites. This was only done in Nunavut, with the goal of also doing it in the Northwest Territories when the time was right to start upgrading the infrastructure there. We had our head satellite node in Ottawa and we had a Yellowknife office, so we had a presence there and we had some servers. This framework for the jail implementation worked so well that we wanted to start converting our services in these two locations to run the same way. So we also had a cookie-cutter approach for handling software stack upgrades in the jails. Once we had made these jails for the various services, we would roll them across the different servers using the same technique. [unclear] So we could do one FreeBSD 6.2 [unclear] 7.1 [unclear], I'm not sure.

So this technique lent itself to many applications, fortunately. I'm going to finish now with that. I talked about the tarball, pushing it out and using that to do a client install. I took that same tarball framework and created a custom FreeBSD ISO, so that when the technicians went on site, where we didn't have [unclear] consoles in those remotes, they could just put the CD in, configure the server's network, and I could do the rest. I did this flying blind [unclear] Northwestern communities, with success, but I kind of had my fingers crossed, because you never knew which way it was going to go.

So, some interesting notes. I talked about the boot0cfg option: with multiple slices you can have alternate OSes that you're using. If the software group wanted to try 64-bit FreeBSD, that option was available, or if something crops up, we go to the secondary image and reboot the server with the known-good configuration we always had. For deployment we used rsync for sending the data out to the communities. [unclear] I was using some flags such that I could change the algorithm that SSH was using to something a bit lighter, so as not to congest the network while the upgrade was going out, so that the community wasn't suffering [unclear] period.

Creating your own custom FreeBSD tarball gave you flexibility. Of note, my work was based on FreeBSD release, so 6.0 release, and then I would NFS-mount /usr/ports [unclear] the packages that I wanted, customized [unclear] some optimizations [unclear]. That allowed flexibility going forward: all sites had the same binaries, the same software stack. If an issue was going to be seen, it was going to be across the board, or it's an edge case, and we had a method to develop and provide information back [unclear]: this is an issue, can it be resolved in-house? OK, it's in the kernel or [unclear] related trouble, so we can take it to FreeBSD through the reporting mechanisms and offer as much data as possible to that group to help diagnose the issue.

And then security, that's kind of a dirty word people talk about. There were two levels, two types of authentication [unclear]. That's the beauty of using OpenBSD: [unclear] you go base, you're always going to have the newest binaries [unclear]. Sometimes we didn't have that luxury [unclear] with FreeBSD; with how the systems were built we didn't have access to the same ports and stuff like that. Because the tarball was customized, we could archive it, so we could go back in time and revisit, backporting security patches. I put in place a build and patch server that would, on a weekly basis, read the [unclear] going out and the security holes, work out what the appropriate patches were, and if it was a quick fix, have it in the field during off hours. [unclear] I was in Yellowknife for years, after which I moved to Newfoundland and worked from there for the final three years. Newfoundland was three and a half hours ahead of everyone else, so when everybody in the West was asleep, I was up working, [unclear] Ottawa came online [unclear].

So, future work, where this is going. With the knowledge and experience that came from this position, I took that framework [unclear] to the Department of Mathematics and Statistics, where I'm currently working, and deployed it with [unclear] my manager. [unclear] because we were a Linux shop: we were running Gentoo, [unclear] like Linux From Scratch. I ran FreeBSD 9.1 in 64-bit as my master server [unclear]. We set up [unclear] NFS shares on a ZFS file system. [unclear] you can set up iSCSI targets for some machines, and I also did shares [unclear] a compute node set up so that I could do testing. [unclear] mounted up [unclear] as a client [unclear] single [unclear] clusters [unclear]; I'll be fixing that next month. On the main server I had set up Time Machine, because I had clients within the department using Mac OS, rather than using SMB shares [unclear], to back up the file systems appropriately, so that if something went wrong we have a way to back that up. So that's almost finished beta and going into production. [unclear] a lot of flexibility. We were refining one of our compute nodes in math: the disk I/O was just being congested, the compilations of binaries were causing a massive load increase of over 10, and the machine was just stagnant. Once I brought in the iSCSI target, the disk I/O was pushed over the network stack, and it was amazing, the speed increase for the researchers running the code. So they're happy, and we don't have to purchase new hardware for a dedicated file system for them. Then there's the use of snapshots: I used [unclear] to give them a point-in-time reference to go back to [unclear].
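Two pieces of the rolling-upgrade workflow from the talk, pushing data out with rsync over SSH using a lighter cipher and then pointing the boot manager at the alternate OS slice, can be sketched as command builders. This is a sketch under stated assumptions: the three-slice layout (slices 1 and 2 for the OS, slice 3 for data) follows the talk, but the disk name, paths, host name and cipher choice are hypothetical, and the functions only build argument lists without running anything.

```python
OS_SLICES = (1, 2)  # slice 3 is assumed to hold local data and jails


def alternate_slice(active):
    """Return the OS slice that is not currently booted."""
    if active not in OS_SLICES:
        raise ValueError(f"active slice must be one of {OS_SLICES}")
    return 2 if active == 1 else 1


def boot0cfg_command(active, disk):
    """Build a boot0cfg call that makes the alternate slice the default."""
    return ["boot0cfg", "-s", str(alternate_slice(active)), disk]


def rsync_command(src, dest, cipher):
    """Build an rsync-over-SSH call that selects a specific SSH cipher."""
    return ["rsync", "-az", "-e", f"ssh -c {cipher}", src, dest]


if __name__ == "__main__":
    # Hypothetical site: currently booted from slice 1 on disk ad0.
    print(rsync_command("/build/custom.tbz", "site-b:/data/", "aes128-ctr"))
    print(boot0cfg_command(active=1, disk="ad0"))
```

Because the previous slice is left untouched, falling back to the known-good image is the same `boot0cfg` call with the slices swapped.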
Questions? [unclear] At the time [unclear] sysinstall [unclear]. On the 9.1 system I used [unclear] install, and like I said, the lecture yesterday from Kris Moore, this is great: now I've seen an example where I think I can embrace and extend this framework that I developed [unclear].

One thing, if I go back to our headend and talk about having the servers there and then doing rolling updates: we did make use of GEOM, we had gmirror mirroring, we had backups, with a 12-hour turnaround on the backup [unclear]. This is when FreeNAS was coming out, and I really liked the iSCSI approach. So my vision was to take one of the primary backup servers, turn that into almost like a SAN, [unclear] the cluster of servers [unclear] thin clients, and then they just mount that iSCSI target and launch the jails, so people just see that the services are operational. [unclear] That was beyond my scope [unclear].

[unclear] The satellite and wireless carrier were all black box hardware; the server side, that's where I was interested. Take the tarball-created CD: it was like a Swiss Army knife going out in the field. You could do the FreeBSD update, go back to a known instance, use custom entries [unclear] security applications you used on systems. I also did enable reports to be sent back to the network group, and we would look at [unclear] whether the systems were still functioning. With Nagios, we had 52 instances of Nagios shipping the data [unclear]. So we also had a kind of [unclear] going to make sure that the sites were up, plus they were monitoring the monitor, and we tried to make it almost [unclear].

Anyone else? [unclear] Originally the servers were just generic white boxes; they were running Pentium 4 [unclear] 1.2 gigahertz [unclear] memory footprint [unclear]. On the infrastructure side, we were trying to remove the onus on talking with the CSP with regards to troubleshooting and installation of equipment, because in the North they're on their own time: it's hunting season, it's this, it's that. When you call and chat with somebody, mostly you use nice language, and you try to convey the seriousness of the issue, that they should go to the communication shack for us and tweak a button, jiggle a cable, remove the snow, or [unclear] the vents so that [unclear] cool down the shack. At some sites we did have heating; in the dead of winter, at minus 40, the comm shack would run too hot [unclear], I can't get into that unfortunately, but we had to open up the door [unclear], with that chill coming in, of course, and with that you get drastic changes in temperature and moisture. [unclear] So we worked with what [unclear] in the North. [unclear] 52 if you count Yellowknife [unclear]. [unclear] we had no open [unclear] for console access, and we were replacing our power distribution units, I forget the brand name, so that we could remotely [unclear]. Does anybody have any [unclear] tales?

[unclear] Shipping was a big [unclear]: flights would be canceled due to the Arctic winter setting in. You guys have probably watched the Discovery Channel [unclear], pretty nasty stuff. So we had delays in shipping out equipment. As for personnel gaining access to remote co-locations, initially there was an outstanding disagreement between two companies, and we had to have paperwork signed that we presented to them to say we will be coming on site to service our equipment [unclear]. In an emergency situation where a community was offline, it meant getting the local community service provider in there as soon as possible. We had to work with that person to expedite the issue, tracking down someone in that community working for the other company. So it was always a logistical nightmare. Shipping equipment in safely is another thing: [unclear] your hardware as much as possible, because it's fragile, and they drop it on the runway [unclear] before it's on the plane. [unclear] There was a story where our tech was in a community [unclear] and he took a [unclear] with the local CSP [unclear] to visit another one [unclear]. These are pretty harsh conditions. Thankfully, in the job I had, I got to sit on the front porch and watch the young guys [unclear].

Any other questions regarding logistics? [unclear] Sensors and stuff like that? Yeah, we had them. It was minus 45 [unclear]; in the summertime, when it was 10 degrees, the comm shack would have been [unclear] winterized, and the computers were soaring over 30 degrees. [unclear] If they were running they were fine, but if they had to start from a cold start, we ran into trouble.

[unclear] Kris Moore [unclear] his lecture [unclear]. We had a small group, at first five people [unclear]: the network and satellite group, and then the little army of us running the framework [unclear], who learned how to [unclear] equipment, configure ports, [unclear] what to check for, and then worked with the software group on rolling out new services and stuff like that. One thing I really glossed over: in Ottawa we had [unclear] service instances located there where required, so that they could have access to the [unclear] points south. At the time [unclear] a hundred-meg connection, copper [unclear], so our clients benefited from having the sites close [unclear].

One of the talks yesterday was about using [unclear] to try to minimize the footprint of the jail, and then creating other jails as clones off that jail, and then having provisioned instances out there. We were doing that, and this was before Warden came on the field, so we had to write custom start and stop scripts for jail administration. That was a good kerfuffle. [unclear] We had a framework in which we could operate safely, such that we didn't interrupt the end users' experience with the system, because back at that time [unclear] people really relied on the Internet [unclear] for content being delivered [unclear], purchasing, and selling their unique crafts.

[unclear] I had to come up to speed on FreeBSD. I had experience with HP-UX, IBM AIX, stuff like that. It meant getting to know the lingo and definitions: the network guys had their definitions of terms, the satellite guys had their own. I also had an accent, so it was, you know, slow down, enunciate; I might mumble sometimes [unclear]. [unclear] We had lag and we had packet loss due to older infrastructure that has now been replaced with the newer equipment that's been deployed. [unclear]

[unclear] We had a taxi driver who would access the Internet between jobs [unclear]. It was just brilliant to see that [unclear] Internet access [unclear].

[unclear] Wireless? [unclear] In some instances we had a wireless connection between two sites using DragonWave technology, [unclear] and the quality of the signal depended on what the environment was like: rain, snow, congestion, the build-up of snow on the dishes. It got better as different satellite components changed [unclear].

[unclear] For off-site [unclear], we had one server in the building and then another [unclear] which was about a three-minute walk away, physically separated [unclear]. [unclear] So everything was local [unclear], and we implemented rsync to migrate jails back and forth with as little dependency as possible [unclear]. You can only do so much. The sites were using UFS file systems. A ZFS implementation, snapshotting and then sending and receiving, would have been great, but ZFS started [unclear] 7, and it was so buggy, and the feature set wasn't all there.

[unclear] I took the stock FreeBSD release ISO, extracted it locally [unclear] jail setup, [unclear] mounted up [unclear] file systems, then chrooted in and applied upgrades to update the binaries of the system and all the packages that I wanted. Once that was done, I went in and hacked out components [unclear] and created new [unclear] checksums [unclear]. There's a URL for MonkeyBrains that I came across for handling this part of the framework [unclear]. I had a script, but I left the company [unclear], so I'd have to rebuild it [unclear], because I have the framework. [unclear] Base it on the FreeBSD release [unclear] test the latest [unclear].

OK, that's a wrap. Thank you very much.
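The talk mentions regenerating checksums after customizing the extracted release, and keeping every site on the same binaries. A minimal sketch of that idea follows, assuming SHA-256 digests and a simple path-to-digest mapping; both are choices made here for illustration, not necessarily the format the speaker used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX path) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """List files in the new manifest that are added or differ from old."""
    return sorted(name for name in new if old.get(name) != new[name])
```

Comparing the manifest of a stock tree with that of the customized tree gives a short list of what was altered, which is also a quick way to confirm that two sites really carry the same binaries.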