Cloud Native GIS

Formal Metadata

Title
Cloud Native GIS
Subtitle
The value of development velocity in GIS and why you should consider a shift to cloud-native
Title of Series
Number of Parts
52
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Alistair's talk was the second talk in the "Cloud / Development" session at FOSS4G SotM Oceania 2019, organised by OSGeo Oceania and held at The National Library in Wellington, New Zealand from November 12-15 2019. FOSS4G SotM Oceania is the coming together of Oceania's geospatial open source and open data community - with four days of workshops, presentations, a community sprint and social events.
Transcript: English(auto-generated)
Hello everyone. Today we're going to talk a little bit about deploying GIS applications on cloud-native stacks, in this case Kubernetes. To begin with, who am I? I was introduced as a sysadmin; I've kind of moved into this DevOps role.
I have no formal GIS training. I sort of got asked to do this project and had no idea what a coordinate system was. I had my first experience of changing from one coordinate system to another about three weeks ago, so that's about my level of knowledge.
So, just a small bit of history about how we normally build our technological stacks. We tend to have these quote-unquote pet VMs, where a lot of manual process is put into building them. People end up becoming owners of them, so when something goes wrong with a particular server,
someone else will say "get the XYZ person" and it's their problem, even though realistically we should have everybody know how to work with most of our systems. We are slowly moving towards DevOps and cloud native. The
problem there is the same as with any large company. We're approximately 350 employees; there are a lot of legacy systems to consider, and a lot of people who are very hesitant about change, because people don't like what they don't know.
We're also normally deploying things with more traditional tools, so Puppet, Ansible, system configuration, and we tend to have the traditional ops in one team and developers in another. Operations say things like, you know, developers want to throw something over the wall that they shouldn't, or
developers complain that operations are being a little too strict on what they're trying to do. So, just to compare some more traditional approaches that you will see: there's the all-in-one virtual machine.
I think someone spoke about those last night in the lightning talks. The problem with those is that you tend to just install things on there. There's not usually a lot of configuration management. They're difficult to scale up to multiple machines, because you might be running four or five applications on a single machine.
The alternative to that is the handcrafted pet VMs I just mentioned. When we create one of those, there's still a lot of manual configuration to install a piece of configuration software such as Ansible or Puppet. We still have to think about how many resources we're going to allocate, and that's a very static value.
It's very difficult to change. Like I said, we treat them like pets: they have owners. People tend to be very scared to work on a system that someone else built. While they don't have the scaling issue, it's still a time-consuming task to stand up a new VM, set it up,
set up your configuration management and so forth. With these approaches, nothing is wrong. For most companies, these tend to work; most
organizations. As I mentioned, they lack flexibility and... God, something's happened to the font. Sorry, about 20 minutes before this presentation I had to convert this from a web-based
presentation to a PDF file. Yeah, I clearly didn't read the email properly. Anyway. The configuration tends to be tightly coupled to that project, meaning that any
gains from trying to have a generalized approach don't always work out, and we also tend to have a lot of deviation between environments. So the machine built on a developer's laptop may be an all-in-one Vagrant, you know, Linux virtual machine box,
whereas the production stack is probably seven or eight different boxes, each with their own role in a singular application. And obviously, who's heard the phrase "it worked on my machine"?
We've been thinking about getting a swear jar for that one. So, enter cloud native. These aren't everything that defines cloud native; these are just a handful that have been picked from the Cloud Native Computing Foundation's list.
If you're not aware of who the Cloud Native Computing Foundation are, they're a subset of the Linux Foundation that has been brought together to handle projects like Kubernetes and Prometheus, which is a Kubernetes
native monitoring solution. But anyway: they're containerized, so as in Docker, which I imagine by now most people are aware of. Orchestration means you don't worry about where your application lives. You let the software, in this case Kubernetes,
handle where your application lives in your infrastructure. You just worry about the fact that the application is in there. And I put brackets around "micro" because it's more service oriented. You don't look at an application as
the whole thing; rather, the application is comprised of services. And flexible, in that you can dynamically scale both your infrastructure and your services. So
if you need more compute power, it's a really simple one-liner: I can add three servers of compute power. It's highly available, so if one machine goes down the whole thing doesn't die, and things like disaster recovery.
It makes better use of resources because we pool the resources and, like I said, the dynamic orchestration will allocate your containers or your applications wherever there are resources available. And portability, in that
your configuration can be quite easily moved between cloud providers, meaning you aren't necessarily locked into a certain vendor. If you want to have your application on one cloud and your disaster recovery on another one, it's entirely possible.
So, looking at Kubernetes: it's one example of how to do cloud native. It's right now probably the most popular. There are a few others, such as Docker Swarm, but this one seems to be the one that's taken over.
The configuration is all simple YAML syntax. I don't know how many people have opened up something like Puppet, looked at the horrifically difficult, at times, syntax and gone, "I don't want to do that. Give me back, you know, my manually built bash scripts."
Reverting a bad configuration is quite simple because of this, and I've already explained that. So basically, how Kubernetes works is we have two kinds of node, and this paradigm is pretty common amongst cloud-native technologies.
We have master, manager, leader and so on nodes; each of these boxes represents just a virtual machine. And then we have minions, which basically carry out whatever the master nodes say to do. So the master nodes schedule containers and create and maintain your resources. They ensure that everything is healthy.
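As a rough illustration of the scheduling role described here, the sketch below places containers on whichever worker currently has the most free CPU. It is not Kubernetes' actual scheduling algorithm; the `Node` class, the `schedule` function, the resource units and the node and container names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with some free CPU and memory (hypothetical units)."""
    name: str
    cpu_free: float
    mem_free: int
    containers: list = field(default_factory=list)


def schedule(nodes, name, cpu, mem):
    """Place a container on the fitting node with the most free CPU.

    Returns the chosen node's name, or None if no node has room.
    """
    candidates = [n for n in nodes if n.cpu_free >= cpu and n.mem_free >= mem]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.cpu_free)
    # Reserve the resources so later placements see the reduced capacity.
    best.cpu_free -= cpu
    best.mem_free -= mem
    best.containers.append(name)
    return best.name


workers = [
    Node("worker-1", cpu_free=2.0, mem_free=4096),
    Node("worker-2", cpu_free=4.0, mem_free=8192),
]
placements = [
    schedule(workers, f"geoserver-{i}", cpu=1.5, mem=2048) for i in range(4)
]
print(placements)
```

With these made-up numbers the fourth container cannot be placed anywhere, which is the point at which you would add more worker nodes to the pool.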
Your minion, worker or follower nodes literally just run your application. You don't have to worry about them doing anything else; they are just about bringing in compute resource. So how does this help us? Less overheads. I think in the last talk it was a lot about how cloud
helps us with that. There are overheads in cost: wasted compute time is very expensive. If I have to stand up eight servers when I really only need three, that costs a lot of money. Time: like I said, Puppet is significantly more
fiddly than containers and YAML syntax. And portability: as I mentioned, when we initially built this we built it on a local Kubernetes cluster. We dropped it into the cloud and it worked seamlessly. And the other overhead is that
you don't have to fight your IT team, because they set up your cluster and you deploy your application to it. What else have we got? Oh yeah, and it gives you more time to make maps, which I imagine for this crowd will be quite important.
In our experience, there is quite a steep learning curve initially. You have to learn a lot of jargon, a lot of verbs, a lot of nouns. I didn't know the difference between a DaemonSet and a Deployment when we started, and I still don't fully.
There's a lot of nuance to a lot of these things, and it is well documented, but there's also quite a steep learning curve initially. It becomes very easy to make changes once something new is learnt, though. So once you've got a handle on the basics, if you turn around and decide "I don't like the way we did that",
it's not a particularly difficult or time-consuming job to go through and adjust it to the new thing you've learned, as opposed to the past, where state actually mattered on your virtual machines and it was quite difficult to go in and revert your state or modify it and so forth.
And yeah, some applications aren't architected for the cloud. There's nothing wrong with that; it just makes things a little bit more difficult to configure, which we'll go through a bit later on. Well, here, apparently.
So the first thing we decided to do, which half the internet will tell you you shouldn't, is running a database in Kubernetes. Now, there's a lot of fear mongering around this, because people are scared: scared that they'll lose their data, scared they'll lose their database.
On the other hand, we tried a couple of things. We settled on a project called KubeDB, by a company called AppsCode, and they make it very simple to set up a master-replica Postgres database. They
basically take care of all the heavy lifting for you. It's an open source project: you can inspect the source code, you can see what's actually being run in your cluster. Unfortunately, when we first started using KubeDB, it used a Linux distribution called Alpine, which is very small but didn't have support for a lot of GIS
type tools; in particular, I think it was GDAL we had a lot of trouble with. We tried a lot of things, compiling it manually, and didn't really get anywhere, so we ended up going with a Debian base image instead. If it works, it works, right?
We also had a few challenges around safely exposing the database for external use. We eventually settled on using an external proxy virtual machine. It's not ideal, but it works. It's not quote-unquote cloud native,
but sometimes security is a little more important than ease of use. The next challenge we faced was trying to create a scalable GeoServer cluster.
The GeoServer configuration is entirely stored on disk as files, which poses an interesting challenge, because getting files into your system that aren't in the form of, say, a database can be a somewhat challenging thing to do. And also maintaining those things: you have to actually spin up block storage or
object storage, connect those things and maintain them. This posed another challenge, in that if you want to scale GeoServer out, every instance of GeoServer you're running has to have access to the same copy of the configuration.
So, in conclusion, every instance needed to have the same configuration and data, and we had to know when a file on disk had changed. And the PDF thing has broken my presentation again. Our simple solution was that we wrote a small microservice that
piggybacks off the GeoServer notification module. It goes out, queries the Kubernetes API and says, "how many endpoints do you have that are GeoServers?", and then reloads all of those in turn. And as I mentioned, we ended up using a shared file system. We initially tried using an
object storage based solution, but found that for certain files and certain parts of the configuration it was just a little too slow. And this is the last large challenge we faced.
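The reload microservice mentioned a moment ago could look roughly like the sketch below. The talk does not show its code, so this is only an illustration under stated assumptions: the input dictionary follows the shape of a Kubernetes `Endpoints` object (`subsets`, `addresses`, `ports`), the `/geoserver/rest/reload` path assumes GeoServer's REST reload endpoint at its default context path, and `endpoint_addresses`, `reload_all` and the injected `post` callable are hypothetical names. Authentication and error handling are left out.

```python
def endpoint_addresses(endpoints):
    """Extract "ip:port" strings from a Kubernetes Endpoints object (as a dict)."""
    found = []
    for subset in endpoints.get("subsets") or []:
        ports = [p["port"] for p in subset.get("ports") or []]
        for address in subset.get("addresses") or []:
            for port in ports:
                found.append(f"{address['ip']}:{port}")
    return found


def reload_all(endpoints, post):
    """Ask each GeoServer instance in turn to reload its configuration.

    `post` is any callable that takes a URL and returns an HTTP status code.
    It is injected so the sketch does not depend on a particular HTTP library.
    """
    results = {}
    for host in endpoint_addresses(endpoints):
        # Assumed URL layout; adjust to however GeoServer is actually deployed.
        results[host] = post(f"http://{host}/geoserver/rest/reload")
    return results


# Example Endpoints payload with two GeoServer pods (made-up addresses).
sample_endpoints = {
    "kind": "Endpoints",
    "subsets": [
        {
            "addresses": [{"ip": "10.0.0.4"}, {"ip": "10.0.0.5"}],
            "ports": [{"port": 8080}],
        }
    ],
}

called_urls = []


def fake_post(url):
    """Stand-in for a real HTTP POST; records the URL and reports success."""
    called_urls.append(url)
    return 200


reload_results = reload_all(sample_endpoints, fake_post)
print(reload_results)
```

In a real service the `post` callable would wrap an authenticated HTTP client, and the function would be triggered by whatever event the notification module emits when the configuration changes.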
By default GeoServer ships with a built-in GeoWebCache. For our needs that really didn't suit. We wanted to have the tile cache be a separate service from the WMS server and from
a Postgres database, so we ended up using the standalone version, pulling the whole GeoWebCache component out of GeoServer and redirecting all of the tile requests, or every WMS request, through GeoWebCache initially,
which sounds a little insane, but we solved this by adding a pretty simple proxy in front of GeoWebCache and GeoServer that just says: does GeoWebCache actually have that tile? If not, go to GeoServer, get it, return it,
which is fairly straightforward. In this we've learned that these free and open source GIS projects are very mature pieces of software. They're very reliable. Some already have integrations, and long-term that's something we'd like to change: we would like to
have these applications less reliant on just S3 and more cloud native. You know, containers and cloud-native software take care of a lot of the boring things, the day-to-day management that you don't need to worry about. We haven't had any major incidents that I'm aware of where the whole thing has fallen over,
short of problems with the cloud provider. And the other thing is that this is a huge ecosystem. Docker and Kubernetes are a very, very small part of a very large ecosystem, and some of these projects move very slowly at times because people lose interest, or
you know, we've had a little bit of trouble with that. If you're interested in knowing any more, or want to contact me about this, that's my email address there. Questions?
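The cache-first proxy described in the talk (ask GeoWebCache for the tile, fall back to GeoServer if it is not there) comes down to a few lines of routing logic. The sketch below shows only that decision, under the assumption that both backends can be represented as simple callables; `fetch_tile`, `cache_lookup`, `geoserver_render` and the request paths are hypothetical, and a real proxy would speak HTTP to both services.

```python
def fetch_tile(request_path, cache_get, render):
    """Return (tile_bytes, source), trying the tile cache before the renderer.

    `cache_get` returns the cached tile or None on a miss.
    `render` always produces a tile, standing in for a WMS request.
    """
    tile = cache_get(request_path)
    if tile is not None:
        return tile, "cache"
    return render(request_path), "geoserver"


# Stand-ins for GeoWebCache and GeoServer (made-up data).
cached_tiles = {"/wms?layer=roads&tile=0/0/0": b"cached-png"}


def cache_lookup(path):
    """Pretend tile cache: returns None when the tile is not stored."""
    return cached_tiles.get(path)


def geoserver_render(path):
    """Pretend WMS server: always renders a fresh tile."""
    return b"rendered-png"


hit = fetch_tile("/wms?layer=roads&tile=0/0/0", cache_lookup, geoserver_render)
miss = fetch_tile("/wms?layer=roads&tile=5/3/2", cache_lookup, geoserver_render)
print(hit, miss)
```

Keeping the decision in a separate proxy is what lets the tile cache and the WMS server run and scale as independent services.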
That's a really good presentation. GeoWebCache is able to have a shared cache directory. Yes. Did you try that? Yeah, so we're caching directly to object storage.
At this stage it was more about if we wanted more tile caches available for, say, rendering a bunch of tiles in advance. It's also more about making things as atomic as possible. One of the big principles of containerization is that each application should try to do one thing, and we just wanted GeoServer to be our WMS server.
We wanted a tile cache; they work together. Fair enough. Yeah, cool. Questions from the audience? Dave. So I'm terrified of running a database in Kubernetes. Could you go into a little more detail around the decision to do that rather than, say, RDS? So,
first of all, it terrified me to begin with. I had many a conversation with one of my colleagues about whether we should really be doing this. In particular, there are quite a few quite good projects out there. KubeDB is one; there's another one for Postgres from, I think the company name is, Crunchy Data.
These providers do a lot of things around that, so automated object storage backups, and you can also automatically rebuild from a snapshot. For our use case it definitely works. I can see there being use cases where
potentially having some other type of database service would be much better, but I think as long as you're always backed by physical storage, solid backups and a solid backup plan, you shouldn't be scared. It's not really that far different from just running a Postgres container with block storage behind it.
Or did you not have access to a DB as a service? That is also true. We're using Catalyst Cloud and we don't yet have database as a service. So you couldn't even if you wanted to? Yeah.
Is Catalyst going to be building a database as a service? I believe so. Stay tuned on when; I don't want to make any promises. A Kubernetes, Docker-backed one, right? Who knows. Any other questions? Really just a comment: a lot of what you said
Yeah, I understand. Yeah, I wanted to make this easy; I didn't want to get too bogged down in nitty-gritty details, because that scares people. Yeah.
Cool. So, Chris. Thanks. Do you care to comment on, I don't mean cost financially, but the cost of effectively making things cloud native? I mean, how hard is it? There is work involved in that, right, and there is a business decision about whether to do that or not,
because it's a lot of work, right? Personally, that's not a business decision I make. I mean, I'm just the developer; we do have a team lead, who isn't me. There is an overhead, especially considering we're very, very new to the space. We've done bits and pieces of GIS before, but nothing like this.
We're also very new to the cloud-native space. Our cloud is just getting started with the Kubernetes-as-a-service offering, which is what we're using for this. Yeah, there's overhead in building, like I said, those extra microservices, and in having to test
multiple competing solutions, things like that. Splitting GeoServer, the one with the notifications, was, you'd count that in hours, like a couple of hours of dev time. The GeoWebCache proxying
part, that was probably the better part of a week of our time. We're quite a small team; there's only three of us. So yeah, that took quite a bit of our time just to get right, make it reliable, stable, that sort of thing. So yes, there is a time component. It does get shifted around.
Cool. All right, we've got to call it there. Thank you very much again, Alistair.