Devops for GIS
Formal Metadata
Number of Parts | 208
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/40967 (DOI)
FOSS4G Boston 2017 (143 / 208)
Transcript: English (auto-generated)
00:00
Good morning, guys. So this talk is mainly about sort of the philosophy of DevOps. It's meant to elicit lots of questions, which Seth will answer because he's better at it than me. But I think we're all kind of wondering: what exactly is
00:21
DevOps, right? Does that mean you just hire a DevOps engineer and now you're DevOps-ing? No one really knows exactly what DevOps is. It's different at every organization I've ever worked at, but at its heart it's a culture or philosophy that your organization and your developers embrace, and it becomes part
00:45
of how you deliver software. So yeah, am I DevOps-ing, and how would I even know? Probably ask Confucius, because it's a philosophical debate, but the best thing
01:02
you can do is ask yourself questions. Are developers and the people who are deploying our software, or the people in charge of our infrastructure, operating in two different silos, where developers are tossing things over a fence and being like, "All right, go deploy this"? If so, you're not DevOps-ing. You should ask yourself: how are we architecting new software projects? Is it
01:25
just the developers sitting together saying, "Here's how we're going to build this piece of software," without any of the infrastructure or operations team being part of that process too? And it's also how you deliver software. Are you
01:43
constantly pushing to the cloud? Is this a manual process? Does someone log in to a Windows machine with IIS and upload your code by FTP? Is that how you deploy? So all of these things are kind of at the center of how DevOps works in
02:00
different organizations. So number one is architecture. You should be thinking about how your application is going to scale, and how your infrastructure is going to scale, from day zero. Like, from the initial thought: how is this going
02:21
to work when millions of people are potentially using it, or millions of requests are hitting it? And that should always involve operators and developers, because just tossing it over the fence, like I said, is not really a great way to deliver software. You're going to get it up in the cloud and then your
02:42
operations team is going to say, "We can't scale it because of X, Y, and Z." And it's especially important to the geospatial community because of some of the unique challenges that I've come across, and that I'm sure other operators in this room have seen, and that's scaling. Geospatial data presents some very unique
03:06
challenges. So I know there are plenty of GeoServer people here. We all know that GeoServer needs to use the file system, and I'm sure all of you have wondered, "Can I run GeoServer in a cluster?" And you've probably tried and found it was
03:20
challenging, because we have stateful systems and we can't ignore that. We could look at the twelve-factor app, you know, the bleeding-edge stuff from Silicon Valley that says don't use the file system. Treat your servers like cattle. You should be able to just murder them and bring another one up and never
03:41
care again. But it's not like that for geospatial data. We have persistent data. The other thing is we have relational data, and we have a lot of it, and that is difficult to scale with PostGIS and things like that. We are eventually going to hit a number of users or an amount of data where queries are
04:04
going to take a really long time or time out altogether. And the second thing is we use GPUs a lot. We don't want to serve our web applications on our GPU instances. We want to decouple APIs from GPU work, and we need to architect our software to use these different types of resources. And then, delivering
04:27
software: you should be doing continuous automated delivery using, like, GitHub hooks for Jenkins, or Travis, or CircleCI. It doesn't really matter which tool you use, but this lets developers just push code to GitHub or
04:45
whatever Git repo you're using, and it just kind of goes up into the cloud, because your operators are building jobs. So the pipelines for that code are codified in the repo itself, which describes how that application gets built, tested,
05:03
and then deployed. And it's also good because it makes our developers way more efficient. Developers are just pushing code; they don't have to worry about how it gets delivered to the cloud. That's the ops side. But when we work together, developers and operators, we have this clean system for pushing
05:23
stuff up directly to the cloud. Okay, finally, our infrastructure. You know, no pointy-clicky in the AWS console, because that's what happens when you do: somebody's like, "Oh, what's this?" and then your whole infrastructure is gone,
05:44
and because you didn't codify it, you can't get it back again. So yeah, you're not allowed to have access to the AWS console. You can't just create an S3 bucket. It's not just an S3 bucket, all right? There's a principle here, and that should be infrastructure as code, and it should be
06:04
repeatable. A good example is CloudFormation: if I have my entire infrastructure in a CloudFormation template and someone screws up, I can go repeat that exact same infrastructure by just creating a new stack. Infrastructure
06:21
can also be testable, just like our software code. I could bring up an instance, provision it, and then make sure that the packages are installed at the right versions and the users have the right permissions, and then once those tests pass I can push that up into production. And probably a lot of people are married to certain tools. I am, and this is definitely the diplomat in me
06:45
saying specific tools don't matter. They matter to me, but you don't have to use the ones I like. They don't matter; it's mainly the philosophy you have about delivering software. Yeah, so I meant this to be
07:00
fairly quick so we can talk about DevOps and how you're doing it in your organizations. I've probably created a few questions. Well, I can talk about a few
07:24
specific things. So let's talk about scaling, which is something that's come up quite a bit in a lot of the projects I've worked on. The file system
07:41
and GeoTIFFs: these are massive. I'm not exactly sure how, as a community, we will be able to overcome some of the difficulties we have with using the file system. We looked at S3, which is not really a great solution when you need
08:08
fast I/O. We've looked at Elastic File System, so if you guys are on AWS you can use this system, which is basically an NFS drive that you mount. It's got
08:24
pretty decent performance, but you still have the overhead of the network for reading and writing. So scaling things like GeoServer is going to be difficult for some time, I think, until we have different ways of taking that data
08:41
and putting it in more structured storage systems, like databases, NoSQL databases, or something like Redshift. And there are some data formats like that, but we don't really see a lot of them. Let's see. So yeah, we
09:08
wanted to talk about infrastructure. How many people here are actual DevOps engineers, or write CloudFormation? Okay, there's a couple of people here. So, okay,
09:22
so how many times have you experienced someone just handing you an application and saying, "All right, go deploy this," and you weren't involved in the architecting process? That's fairly frequent for me. The places that I've seen have the most success in delivering geospatial
09:44
applications to the cloud have been the places where there are no real clear lines between development and operations, and that's
10:01
usually a formula for success: where your operators and your developers are in the same room, working on the same GitHub repos, with no fences between the two. So yeah, it's something that you can take back to your organizations and ask yourself: are we just writing software, or
10:22
are we writing software and infrastructure? Because you should be doing both.
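A minimal sketch of the earlier point that infrastructure can be tested like software code: after an instance is provisioned, compare what is actually on it with what was expected, and only promote it if nothing fails. Everything named here (`check_provisioned_host`, the package names, versions, users and groups) is hypothetical and chosen for illustration. The talk does not name a testing tool, and a real check would collect the installed packages and group memberships from the instance itself.

```python
def check_provisioned_host(installed, users, expected_packages, expected_groups):
    """Return a list of failures; an empty list means the instance passes.

    installed:          {package name: installed version} gathered from the host
    users:              {user name: set of groups the user belongs to}
    expected_packages:  {package name: required version}
    expected_groups:    {user name: group the user must belong to}
    """
    failures = []
    for name, wanted in expected_packages.items():
        found = installed.get(name)
        if found is None:
            failures.append(f"package {name} is not installed")
        elif found != wanted:
            failures.append(f"package {name} is {found}, expected {wanted}")
    for user, group in expected_groups.items():
        if group not in users.get(user, set()):
            failures.append(f"user {user} is missing group {group}")
    return failures


if __name__ == "__main__":
    # Illustrative values only; not taken from the talk.
    problems = check_provisioned_host(
        installed={"postgresql": "9.6", "geoserver": "2.10"},
        users={"deploy": {"www-data"}},
        expected_packages={"postgresql": "9.6", "geoserver": "2.11"},
        expected_groups={"deploy": "www-data"},
    )
    for problem in problems:
        print(problem)
```

A pipeline job could run a check like this against a freshly provisioned instance and refuse to push to production while the returned list is non-empty.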
10:55
Sure. So you asked what I would recommend to developers so they can make
11:02
it easier for operations types to deploy. If you're familiar with the twelve-factor app, which I think came out of the Heroku community, one of the factors covers all of your credentials and everything: use the environment for
11:21
access to credentials. Don't hard-code things, don't use properties files; all of that stuff should be sourced from the environment, right? So, like, the shell. You should be using, you know, get property or get system properties. That makes it easier for operators to inject those credentials dynamically into
11:44
the environment, instead of having to sed or echo things into some file somewhere on the disk. So we want to mutate state the least amount we can, and if we're just changing files on the fly, that's mutation, right? We
12:02
don't want to mutate things; that leads to the chance for something to go wrong. The other thing I would say is, if you need to scale, you need to think about this: if I have a hundred instances of my application running, can they all talk
12:22
to each other, or does the user need to hit the same server every single time? If they do, then that's a problem, right? You should have a way for state to be persisted outside of the application. So that's one of the things I recommend. Yeah? There are something like five talks here on serverless
12:49
architectures and a handful of talks on Docker. As an architect and a developer, what do you think about things like the use of Docker? That's a good question. So he's
13:08
asking about serverless architectures, Lambda, and also things like Docker. So Lambda is really cool. I'm sure some of you have gotten to play with it, but
13:20
we can tend to get a little too excited about new toys like that. I think that they will do a lot of good for event pipelines, where you have a constant stream of data that just needs one function applied, so, like,
13:40
normalizing data, or serving one very, very specific purpose. But I can see people building entire applications in serverless. I think it's a great thing, but it'll be difficult. There'll be a lot of spaghetti code. Versioning Lambda
14:02
functions is a fairly new thing. There's a talk later today that you should definitely listen to about serverless architectures from some of my colleagues. But I would say, use them wisely and sparingly; they can be very powerful when you do. Yeah.
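To make the "one function applied to a stream of data" idea concrete, here is a small sketch of a single-purpose handler that normalizes records, with its one setting read from the environment as recommended in the previous answer. The `(event, context)` signature is the shape AWS Lambda's Python runtime calls; the `records` event layout, the `COORD_PRECISION` variable and the helper names are assumptions made up for this example, not something from the talk.

```python
import os


def load_precision(environ=os.environ, default=6):
    """Read the rounding precision from the environment (hypothetical variable name)."""
    return int(environ.get("COORD_PRECISION", default))


def normalize_record(record, precision):
    """Do one narrow job: tidy the keys and round the coordinates."""
    clean = {key.strip().lower(): value for key, value in record.items()}
    for axis in ("lon", "lat"):
        if axis in clean:
            clean[axis] = round(float(clean[axis]), precision)
    return clean


def handler(event, context=None):
    """Entry point: apply the same normalization to every record in the event."""
    precision = load_precision()
    return [normalize_record(record, precision) for record in event.get("records", [])]


if __name__ == "__main__":
    # An operator injects the setting; nothing is hard-coded or written to disk.
    os.environ["COORD_PRECISION"] = "2"
    print(handler({"records": [{" Lon ": "-71.05977", "LAT": 42.358431, "Name": "Boston"}]}))
```

Because the function keeps no state between calls and takes its configuration from the environment, any number of copies can run side by side, which is the property the speaker asks developers to design for.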
14:24
Docker's great. Any sort of container that you can just pick up and move anywhere is nice. I haven't used it very much. I'm a fan of the AMI, which is my Docker image. I like to just make Amazon Machine Images and bake all
14:41
my configuration into them, because I usually work on the AWS platform. If you were looking to deploy something that could go anywhere, Docker's a great solution for it. So you said you can eventually kind of hit a wall in Postgres/PostGIS when you have concurrent users or, you know, a lot of
15:01
processes. How do you recommend scaling Postgres? So pagination, I think, is one of the big things, and also, I can't remember the type of tables... partitions.
15:21
Partition your data. Automating the partitioning of data can be difficult. If you're doing a lot of writes, that could be a difficult thing to do, especially in a live production database, because the act of partitioning has some overhead, so you may affect users. But I guess
15:43
ultimately you would have to look at the time of day when you have the least traffic and then just do it. Sorry, it's slow, but yeah, I think partitioning is probably the best. Sure. I like to use CloudFormation on AWS for provisioning
16:12
my infrastructure. I use Chef Solo and Packer to build AMIs. So, like, if
16:21
I want to deploy Elasticsearch, I'll go install Elasticsearch on a new instance, snapshot it, and now that's my Elasticsearch. I can deploy a hundred of them with that AMI. So yeah, I use CloudFormation, Packer,
16:41
Chef Solo. And he's Jenkins. He's an obsessed Jenkins guy; he's really good at Jenkins. Jenkins can deploy Jenkins. Seriously, there's a button in Jenkins. It's pretty dangerous. So, applications in Docker that can't share session state,
17:24
oh, that can... So what do you do about things that cannot share? So in
17:58
Amazon, depending on how that machine is being hit, like, if we're
18:03
talking about a user, like a web browser, hitting that machine, then you can use sticky cookies on Elastic Load Balancers, and that will route the traffic from the user back to the machine that initiated the
18:21
session. See, someone's here... No, I guess I'm not advocating that everyone
18:43
can deploy everything in the cloud. Some software is just so antiquated that it's just not going to work in that world. It's meant to run on a machine in a data center. So I guess I'm not sure I answered the question, sorry. So I'll be glad to share some very bad mistakes. One of them,
19:20
you saw maybe a hint of it: giving users access to the AWS console. I am of the belief, and I was sort of shown the way to this, that no one should have access to the AWS console in your production environment. Like, no human
19:45
being can access it. Only one Jenkins box has the right to assume the role to deploy into it, and nothing else can assume that role. That way literally no one can go in there and touch anything and break stuff, because
20:01
people will break stuff. Like, I've had things break because someone was like, "Hey, can I get in the production AWS console real quick? I want to make an S3 bucket," and they tried to apply an IAM role to it that didn't...
20:21
well, it didn't have what it should have had, so they just changed the IAM role, and then everything downstream that had that role applied to it just changed. Everyone's like, "What happened?" You know, somebody did something. And so I would advise using CloudTrail. And the other thing that I've gotten bit by before is
20:45
giving people access keys, and then they accidentally commit them to GitHub, where evil Bitcoin miners will scrape them and immediately spin up ten 10x GPU extra-larges so they can mine Bitcoin for themselves. That's happened
21:05
twice now. So make sure your developers are extremely locked down in terms of the resources they can create in AWS, and also do not commit keys to
21:24
GitHub, a public GitHub repo. It's a very bad thing. Yes, those are some of the worst mistakes I've made, which is sort of not following the philosophy and making edge cases, being like, "Well, we automatically deploy everything,
21:42
but this guy says he needs access to do this one thing." You're trying to be nice and make it easy, when really he should have just told you what he needed done. You could have done it in five minutes, and it wouldn't have been a
22:12
Having the opportunity to work with people who are a lot smarter than me has led me to learn a lot of things that have helped me not make
22:21
some of the mistakes other people have made, which I would say is a win. Yeah, so getting to basically work with other people and learn from them has probably been the thing that's helped me the most. One of those things being my AMI philosophy: bake all your configuration into the machine
22:41
images, or, like, Docker containers, same thing, and reduce the amount of mutation that you do on a machine between the time you, like... So let's say we have to scale. Good example: we've got a lot of users hitting our stuff, and someone says we need to scale. How long does it take you to scale? Thirty minutes to bring a
23:02
machine up, pull the package down, compile it, test it, run it? That's too late. You needed to scale, like, 30 minutes ago. If you're an online retailer, that's your store. Right now your store is crowded with people, and we're living in a
23:20
cloud where we can move the walls, but it takes us 30 minutes to do it. Everyone left. So the first time you got on the front page of Reddit, your startup was going to make it, but you couldn't scale. So that's it for your site; no one will remember you. So I think that, yeah.