
Devops for GIS


Formal Metadata

Title
Devops for GIS
Alternative Title
Devops for GIS in the Cloud
Title of Series
Number of Parts
208
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Transcript: English (auto-generated)
Good morning, guys. So this talk is mainly about the philosophy of DevOps. It's meant to elicit lots of questions, which Seth will answer because he's better at it than me, but I think we're all kind of wondering what exactly is
DevOps, right? Does that mean you just hire a DevOps engineer and now you're DevOps-ing? No one really knows exactly what DevOps is. It's different at every organization I've ever worked at, but at its heart it's a culture and a philosophy that your organization and your developers embrace, and that becomes part
of how you deliver software. So yeah, am I DevOps-ing, and how would I even know? Probably ask Confucius, because it's a philosophical debate, but the best thing
you can do is ask yourself questions. Are developers and the people who are deploying our software, or the people in charge of our infrastructure, operating in two different silos, where developers are tossing things over a fence and being like, alright, go deploy this? You're not DevOps-ing if they are. You should ask yourself, how are we architecting new software projects? Is it
just the developers sitting together saying, here's how we're gonna build this piece of software, without anyone from the infrastructure or operations team starting that process too? And it's also how you deliver software. Are you
constantly pushing to the cloud? Is this a manual process? Does someone log in to a Windows machine with IIS and upload your code by FTP? Is that how you deploy? So all of these things are kind of at the center of how DevOps works in
different organizations. So number one is architecture. You should be thinking about how your application is going to scale and how your infrastructure is going to scale from day zero. Like, from the initial thought, how is this going
to work when millions of people, or millions of requests, are potentially hitting it? And that should always involve operators and developers, because just tossing it over the fence, like I said, is not really a great way to deliver software. You're gonna get it up in the cloud and then your
operations team is going to say, we can't scale it because of X, Y, and Z. And it's especially important to the geospatial community because of some of the unique challenges that I've come across, and that I'm sure other operators in this room have seen, and that's scaling. Geospatial data presents some very unique
challenges. So I know there's plenty of GeoServer people here. We all know that GeoServer needs to use the file system, and I'm sure all of you have wondered, can I run GeoServer in a cluster? And you've probably tried and found it was
challenging, because we have stateful systems and we can't ignore that. We could look at the twelve-factor app, you know, the bleeding-edge stuff from Silicon Valley that says don't use the file system. Treat your servers like cattle. You should be able to just murder them and bring another one up and never
care again. But it's not like that for geospatial data. We have persistent data. The other thing is we have relational data, and we have a lot of it, and that is difficult to scale with PostGIS and things like that. We eventually are going to hit a number of users or an amount of data where queries are
going to take a really long time or time out altogether. And the second thing is we use GPUs a lot. We don't want to serve our web applications on our GPU instances. We want to decouple APIs from GPU work, and we need to architect our software to use these different types of resources. And then, so, delivering
software: you should be doing continuous automated delivery, using GitHub hooks for Jenkins or Travis or CircleCI. It doesn't really matter which tool you use, but this lets developers just push code to GitHub or
whatever Git repo you're using, and it just kind of goes up into the cloud, because your operators are building jobs. So the pipelines for that code are codified: the repo itself describes how that application gets built, tested,
and then deployed. And it's also good because it makes our developers way more efficient. Developers are just pushing code; they don't have to worry about how it gets delivered to the cloud. That's the ops side. But when we work together, developers and operators, we have this clean system for pushing
stuff up directly to the cloud. Okay, finally, our infrastructure. You know, no pointy-clicky in the AWS console, because that's what happens when you do: somebody's like, oh, what's this, and then your whole infrastructure is gone,
and because you didn't make it codified, you can't get it back again. So yeah, you're not allowed to have access to the AWS console. You can't just create an S3 bucket. It's not just an S3 bucket, all right? There's a principle here, and that should be infrastructure as code, and it should be
repeatable. A good example is CloudFormation: if I have my entire infrastructure in a CloudFormation template and someone screws up, I can go repeat that exact same infrastructure by just creating a new stack. Infrastructure
can also be testable, just like our software code. I could bring up an instance, provision it, and then make sure that the packages are installed at the right versions and the users have the right permissions, and then once those tests pass I can push that up into production. And probably a lot of people are married to certain tools. I am, and this is definitely the diplomat in me
saying specific tools don't matter. They matter to me, but you don't have to use the ones I like. They don't matter; it's mainly the philosophy you have about delivering software. Yeah, so I meant this to be
fairly quick so we can talk about DevOps and how you're doing it in your organizations. I probably created a few questions. Well, I can talk about a few
specific things. So let's talk about scaling, which is something that's come up quite a bit in a lot of the projects I've worked on. The file system
and GeoTIFFs: these are massive. I'm not exactly sure how, as a community, we will be able to overcome some of the difficulties we have with using the file system. We looked at S3, which is not really a great solution when you need
fast I/O. We've looked at Elastic File System, so if you guys are on AWS you can use that system, which is basically an NFS drive that you mount. It's got
pretty decent performance, but you still have the overhead of the network for reading and writing. So scaling things like GeoServer, that's gonna be difficult for some time, I think, until we have different ways of taking that data
and putting it in more structured storage systems like databases or NoSQL databases, like Redshift. There are some data formats like that, but we don't really see a lot of them. Let's see. So yeah, we
wanted to talk about infrastructure. How many people here are actual DevOps engineers, or write CloudFormation? Okay, there's a couple of people here. So, okay,
so how many times have you experienced where someone just hands you an application, says, alright, go deploy this, and you weren't involved in the architecting process? That's fairly frequent to me. The places that I've seen have the most success in delivering geospatial
applications to the cloud have been the places where there are no real clear lines between development and operations, and that's
usually a formula for success: where your operators and your developers are in the same room, working on the same GitHub repos, with no fences between the two. So yeah, it's something that you can take back to your organizations, and ask yourself: are we just writing software, or
are we writing software and infrastructure? Because you should be doing both.
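Editor's sketch: the talk names CloudFormation templates as an example of infrastructure as code that is repeatable and testable. The snippet below is a minimal illustration of that idea, not the speaker's actual setup. It builds a one-resource template as a plain Python dict and renders it to JSON; the logical ID `DataBucket`, the description text, and the helper names `build_template` and `render` are all hypothetical.

```python
import json


def build_template(bucket_logical_id="DataBucket"):
    """Return a minimal CloudFormation template as a plain dict.

    The logical ID and description are illustrative assumptions.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack: one versioned S3 bucket",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            },
        },
        "Outputs": {
            # Ref on an S3 bucket resolves to the bucket name.
            "BucketName": {"Value": {"Ref": bucket_logical_id}},
        },
    }


def render(template):
    """Serialize deterministically so the file diffs cleanly in Git."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_template()))
```

Because the template is just data kept in the repository, the same stack can be recreated from it, and simple checks (for example, that every bucket has versioning enabled) can run in the delivery pipeline before anything is deployed.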
Sure. So you asked what I would recommend to developers so they can make
it easier for operations types to deploy. If you're familiar with the twelve-factor app, which I think came out of the Heroku community, one of the factors is: all of your credentials and everything, use the environment for
access to credentials. Don't hard-code things, don't use properties files; all of that stuff should be sourced from the environment, right, so like the shell. You should be using, you know, get property or get system properties. That makes it easier for operators to inject those credentials dynamically in
the environment, instead of having to sed or echo things into some file somewhere on the disk. We want to mutate state the least amount we can, and if we're just changing files on the fly, that's mutation, right? We
don't want to mutate things; that leads to the chance for something to go wrong. The other thing I would say is, if you need to scale, you need to think about: if I have a hundred instances of my application running, can they all talk
to each other, or does the user need to hit the same server every single time? If they do, then that's a problem, right? You should have a way for state to be persisted outside of the application. So that's one of the things I recommend. Yeah? There are something like five talks here on serverless
architectures and a handful of talks on Docker. As an architect and a developer, what do you think about things like that and the use of Docker? That's a good question. So he's
asking about serverless architectures, Lambda, and also things like Docker. So Lambda is really cool. I'm sure some of you have gotten to play with it, but
we can tend to get a little too excited with new toys like that. I think that they will do a lot of good for event pipelines, where you have a constant stream of data that just needs one function applied, so like
normalize data, or serve one very, very specific purpose. But I can see people building entire applications in serverless. I think it's a great thing, but that'll be difficult; there'll be a lot of spaghetti code. Versioning Lambda
functions is a fairly new thing. There's a talk later today that you should definitely listen to about serverless architectures from some of my colleagues. But I would say, use them wisely and sparingly; they can be very powerful when you do. Yeah?
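Editor's sketch: the "one function applied to a stream of data" case described above might look roughly like this as a Python Lambda function. The `handler(event, context)` signature is the standard one for Python Lambda functions, but the event shape (a top-level `records` list of dicts) and the normalization rules are assumptions made up for illustration.

```python
def normalize_record(record):
    """Lower-case keys, strip whitespace from strings, drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue  # skip fields that carry no data
        cleaned[key.strip().lower()] = value
    return cleaned


def handler(event, context):
    """Lambda entry point: apply one small, stateless function per record.

    The "records" key is a hypothetical event layout, not an AWS-defined one.
    """
    records = event.get("records", [])
    return {"records": [normalize_record(r) for r in records]}
```

The function holds no state between invocations, which is what makes this kind of single-purpose step a reasonable fit for an event pipeline.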
Docker's great. Any sort of container that you can just pick up and move anywhere is nice. I haven't used it very much. I'm a fan of the AMI, which is my Docker image. I like to just make Amazon Machine Images and bake all
my configuration into them, because I usually work in the AWS platform. If you were looking to deploy something that could go anywhere, Docker's a great solution for it. So you said you can eventually kind of hit a wall in Postgres and PostGIS when you have concurrent users or, you know, a lot of
processes. How do you recommend scaling Postgres? So pagination, I think, is one of the big things, and also, I can't remember the type of tables... partitions.
Partition your data. Automating the partitioning of data can be difficult. If you're doing a lot of writes, that could be a difficult thing to do, especially in a live production database, because the act of partitioning has some overhead, so you may affect users. But I guess
ultimately you would have to look at the time of day where you have the least amount of traffic and then just do it. Sorry, it's low, but yeah, I think partitioning is probably the best. Sure. I like to use CloudFormation for AWS for provisioning
my infrastructure. I use Chef Solo and Packer to build AMIs. So, like, if
I want to deploy Elasticsearch, I'll go install Elasticsearch on a new instance, snapshot it, and now that's my Elasticsearch. I can deploy a hundred of them with that AMI. So yeah, I use CloudFormation, Packer,
Chef Solo. And he's a Jenkins guy, he's obsessed with Jenkins, he's really good at Jenkins. Jenkins can deploy Jenkins, seriously. There's a button in Jenkins; it's pretty dangerous. So, applications in Docker that can't share session state?
Oh, that... So what do you do about things that cannot share state? So in
Amazon, depending on how that machine is being hit, like if we're
talking about a user, like a web browser, hitting that machine, then you can use sticky cookies on Elastic Load Balancers, and that will route the traffic from the user back to the machine that initiated the
session. I see someone's here. No, I guess I'm not advocating that everyone
can deploy everything in the cloud. Some software is just so antiquated that it's just not going to work in that world; it's meant to run on a machine in a data center. So I guess I'm not sure I answered the question, sorry. So, I'll be glad to share some very bad mistakes. One of them,
you saw maybe a hint of it: giving users access to the AWS console. I am of the belief, and I was sort of shown the way to this, that no one should have access to the AWS console in your production environment. Like, no human
being can access it. Only one Jenkins box has the right to assume the role to deploy into it, and then it un-assumes that role, and that way literally no one can go in there and touch anything and break stuff, because
people will break stuff. Like, I've seen things break because someone was like, hey, can I get in the production AWS console real quick, I want to make an S3 bucket, and they tried to apply an IAM role to it that didn't...
well, it didn't have what it should have had, so they just changed the IAM role, and then everything that had that role applied to it downstream just changed. Everyone's like, what happened? Oh, somebody did something. And so I would advise using CloudTrail. And the other thing that I've gotten bit by before is
giving people access keys, and then they accidentally commit them to GitHub, where evil Bitcoin miners will scrape them and immediately spin up ten 10x GPU extra-larges so they can mine Bitcoin for themselves. That's happened
twice now. So make sure your developers are extremely locked down in terms of the resources they can create in AWS, and also do not commit them to
GitHub, a public GitHub repo. It's a very bad thing. Yes, those are some of the worst mistakes I've made, which is sort of not letting the philosophy win and making edge cases, being like, well, we can automatically deploy everything,
but this guy says he needs access to do this one thing. You're trying to be nice and make it easy, when really he should have just told you what he needed done; you could have done it for him in five minutes, and it wouldn't have been a...
Getting... having the opportunity to work with people who are a lot smarter than me has led me to learn a lot of things that have helped me not make
some of the mistakes other people have made, which I would say is a win. Um, yeah, so getting to basically work with other people and learn from them has been probably the thing that's helped me the most. One of those things being my AMI philosophy: bake all your configuration into the machine
images, or like Docker containers, same thing, and reduce the amount of mutation that you do on a machine between the time you... like, so, let's say we have to scale. Good example: we've got a lot of users hitting our stuff, so we need to scale. How long does it take you to scale? 30 minutes to bring
a machine on, pull the package down, compile it, test it, run it? It's too late. You needed to scale, like, 30 minutes ago. If you're an online retailer, that's your store. Right now your store is crowded with people, and we're living in a
cloud where we can move the walls, but it takes us 30 minutes to do it. Everyone left. So the first time you got on the front page of Reddit, your startup was gonna make it, but you couldn't scale, so that's it. Your site... remember you. So I think that, yeah.
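Editor's sketch: one of the mistakes described in this answer, access keys accidentally committed to a public GitHub repo, is the kind of thing a small check before commit can catch. The snippet below is an illustration, not something the speaker describes using. It assumes the commonly cited shape of AWS access key IDs (the prefix `AKIA` or `ASIA` followed by 16 upper-case letters or digits), which is a heuristic rather than an official validator, and the function name `find_access_key_ids` is hypothetical.

```python
import re

# Heuristic pattern for AWS access key IDs (an assumption, not an AWS API):
# "AKIA" (long-term) or "ASIA" (temporary) plus 16 upper-case letters/digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, match) pairs for anything shaped like a key ID."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    for line_no, key_id in find_access_key_ids(sample):
        print(f"line {line_no}: possible access key ID {key_id}")
```

A check like this only narrows the window for mistakes; it does not replace locking down what each key is allowed to create in the first place.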