Deploying and operating GeoServer: a DevOps perspective
Formal Metadata
Number of Parts | 237
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/57193 (DOI)
FOSS4G Argentina 2021 (63 / 237)
Transcript: English (auto-generated)
00:08
Hello, everybody. The next presentation is about GeoServer, and we have Alessandro Parma to talk about it.
00:25
He is a senior DevOps engineer at GeoSolutions. He designs and implements geospatial systems based on GeoServer and other great open source projects.
00:42
Hello, Alessandro. Hi, Diego. Can you hear me? Yes. Right. It's your... Yeah, thanks. Thanks for the nice introduction. My name is Alessandro Parma. I work at GeoSolutions as a DevOps engineer,
01:03
as Diego said. Today, we're going to talk about GeoServer, specifically about how to deploy and operate GeoServer from a DevOps perspective. It's going to be cloud-oriented,
01:23
so we're going to talk a bit about cloud deployments and how to migrate your GeoServer cluster to the cloud. Two words about GeoSolutions, the company that I work for:
01:41
we're based in Italy and the US, and we have clients worldwide. We comprise more than 30 collaborators and 25 engineers. Some of the products we work with are listed here: GeoServer,
02:01
MapStore, GeoNode, GeoNetwork. We offer enterprise support services, deployment solutions, so we can help out with your deployments, customized solutions, so of course you can reach out for help with development,
02:21
and professional training as well on all of the products listed above. Our affiliations: we strongly support open source, as you can tell by the list of products we work with.
02:42
We collaborate and participate in many working groups, including OSGeo, OGC, and USGIF. Okay, let's jump into the presentation.
03:06
Here's the agenda, so what we're going to talk about in detail. Cloud computing: we're going to give a brief intro, just the terminology, so that you can understand what we're talking about in case you don't know it yet,
03:21
why it may be relevant to you, so what the pros and cons of moving to the cloud are, why you should consider using the cloud if you're not using it yet, and the migration process. So if you're interested in it, we're gonna talk about migrating to the cloud in general,
03:42
as well as specifically for your GeoServer. So you may be thinking about migrating your GeoServer cluster from on-premises to the cloud; we're gonna talk about that. We're gonna check what the common pitfalls are, as well as give you some tips gained from our experience.
04:05
We're gonna talk as well about container orchestrators, and specifically Kubernetes, which is quite relevant, I think. Nowadays, it's gaining more and more popularity, and you could benefit from deploying your GeoServer
04:22
in a Kubernetes orchestrator, and we'll see why. And then two words about monitoring and logging. It's a bit different in the cloud compared to traditional deployments, so we're gonna talk about that, as well as how to gain insights from your GeoServer cluster.
04:46
So, cloud computing. What is cloud computing? It basically means computing services over the internet. So servers, storage, databases, network, software,
05:02
whatever, every kind of service or resource that is offered to you by a provider over the internet. That's opposed to hosting your own thing with your own hardware that you bought locally.
05:20
And there are several pros and cons that we can talk about in terms of cloud computing. One of the big pros of cloud computing that is often mentioned is elasticity. So the ability to adapt to workload changes
05:43
by provisioning and deprovisioning resources on demand. So when the load increases, you'll be provisioning more resources. When the load goes down, you'll be decommissioning resources. A typical example of this would be AWS EC2,
06:02
if you're familiar with AWS or similar, with an auto-scaling group. So you could shrink and enlarge your pool of EC2 instances based on load, for instance. Another typical example would be elastic storage.
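As an illustration of the auto-scaling idea just described, here is a minimal sketch of how a pool size could follow the load. It is not how any specific provider implements scaling: the proportional rule, the function name `desired_instances`, and the default limits are all assumptions made for this example.

```python
import math


def desired_instances(current: int, load_per_instance: float, target_load: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return a pool size that brings the per-instance load close to the target.

    Hypothetical proportional rule: total load divided by the load we are
    comfortable putting on a single instance, clamped to the pool limits.
    """
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be > 0")
    total_load = current * load_per_instance
    wanted = math.ceil(total_load / target_load)
    # Never go below the minimum or above the maximum pool size.
    return max(min_instances, min(max_instances, wanted))
```

With a target of 60 requests per second per instance, two instances each handling 90 would grow to three, while four instances each handling 15 would shrink to one.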
06:21
So you can get some storage from a cloud provider and it will adapt based on the amount of disk space you need. Another pro of cloud computing is scalability.
06:41
So some of the services provided by cloud providers can change and allocate more resources depending on the needs of your application. So let's say your virtual machine:
07:04
you can change the type of your virtual machine based on the number of cores or the amount of RAM that you need, or scale a database based on the load that your own application is demanding from the database.
07:22
Another pro of cloud computing: reduced time to market. So especially relevant for business and management. If you think about IaaS, so infrastructure as a service, for instance, you can ask for virtual machines, databases, whatever,
07:44
from the cloud provider and they're immediately available to you. You don't have to put down the money, get your own infrastructure, get your own engineers and so on. So significantly shorter time to market.
08:05
Security and privacy could maybe be considered a con of using cloud services. When you're using cloud services, you're basically moving your local application and resources to the cloud,
08:21
where they may be running alongside other services by other providers. So you need to be aware of that. Your applications may be running on a server that is used by other people, and there are some security and privacy concerns following from that.
08:42
Another pro would be lower costs. So you can significantly reduce the costs of hosting your applications when you move to the cloud, especially if you leverage the elasticity of cloud services. So if you pay attention to scaling
09:01
and booking resources depending on the load of the system, then it can lead to a significant reduction in cost. If you just book a ton of resources from the cloud provider without adapting them to the load over time, it can increase the cost of hosting your services,
09:24
because they're usually not cheap. There are a few different deployment models in the cloud. You have the public cloud, which is the most widely used cloud deployment model.
09:41
It's cost-effective. It's what I was saying before. So the cloud provider is making the infrastructure available to you and other people, other companies, to run their software. They're all running on the same infrastructure. Then we have the private cloud.
10:00
A private cloud is basically a dedicated cloud infrastructure for your organization or for yourself. It can be quite costly and it's usually reserved for people working for governments or schools or agencies or other environments
10:21
where you need absolutely secure environments and you're dealing with sensitive data. So in the private cloud, you're isolated from other services. Nobody else is using the infrastructure. So it can be considered safer in terms of security and privacy.
10:43
Then we have the hybrid cloud. Hybrid cloud is a mix of the two, a combination of a public and a private environment. So that's typically implemented by restricting, at the networking level, the access
11:00
and the communication between the services. So your application is running next to other applications, but they're not allowed to talk to your applications. So this kind of combines the benefits of the two. Right, let's talk about moving GeoServer to the cloud.
11:23
So we talked about the cloud, a brief introduction. Now we're gonna talk about how you could migrate your local cluster to the cloud. There are a few methodologies. The first one is rehost, also known as lift and shift.
11:42
It's an IaaS approach, so an infrastructure as a service approach. You're booking the resources from the provider and you're redeploying the application stack yourself, basically without changing anything. So you're booking your virtual machines
12:00
and then deploying the applications yourself, just like you would do in your local on-premises environment. In this scenario, you're not really leveraging all the services offered by the provider, or managed services like databases or other kinds of storage services, for instance.
12:22
You're just taking your local environment and uploading it to the cloud. It's a relatively easy thing to do, so it's quite a common pattern as well. The other approach would be refactor, or lift, tinker and shift. In this scenario, you tweak your architecture a bit
12:45
and adapt it to the cloud, to the environment where it's deployed. This would be a PaaS approach, so a platform as a service approach. You're still booking the resources from the provider, but then you're also using some of the services
13:03
offered by the provider. An example would be a managed database. So you're not using the standard self-provisioned database, you're using one of the provider's. Finally, revise, or build and replace. That's a more expensive approach
13:21
in terms of resources and time. You would basically rewrite your application to leverage all of the services available. This requires, of course, quite a lot of forward planning and knowledge to be implemented. As we were saying, rehost means
13:42
taking your local resources and moving them to the cloud. How would you do that in practical terms? You need to choose the right kind of virtual machines. For GeoServer, that would be compute-optimized virtual machines, since GeoServer likes fast CPU cores.
14:02
As a rule of thumb, you can get four-core virtual machines with four gigabytes of RAM per instance, and redeploy the cluster. So you would migrate the instances, configure your application, upload your data, and you're done.
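The rule of thumb above (four cores and four gigabytes of RAM per GeoServer instance) can be turned into a rough sizing sketch. The function `size_cluster`, the `ClusterSize` type, and the idea of starting from a total core count are assumptions for illustration only; real sizing should come from load testing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSize:
    instances: int
    total_cores: int
    total_ram_gb: int


def size_cluster(required_cores: int, cores_per_instance: int = 4,
                 ram_gb_per_instance: int = 4) -> ClusterSize:
    """Estimate how many instances are needed to cover a total core count.

    Defaults follow the talk's rule of thumb: 4 cores and 4 GB per instance.
    """
    if required_cores < 1:
        raise ValueError("required_cores must be at least 1")
    # Round up so the cluster never has fewer cores than requested.
    instances = math.ceil(required_cores / cores_per_instance)
    return ClusterSize(instances,
                       instances * cores_per_instance,
                       instances * ram_gb_per_instance)
```

For example, an on-premises deployment using about ten cores would map to three such instances under these assumptions.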
14:21
That would be the rehost approach. Refactor: use some of the services of the provider. As we said, a managed database has quite a lot of advantages, in my opinion. You can use backup and restore features, you can use snapshots, auto-upgrades, and so on.
14:41
So there are quite a lot of nice things that you don't need to worry about. And you can leverage some of the storage options provided to you. You need to be careful about the kind of storage that you use for each component of your application.
15:01
For GeoServer, that means you need to choose the right storage for the data directory, cache, data files, and so on. By the way, COGs (Cloud Optimized GeoTIFFs) are also supported by GeoServer. In that case, you would be leveraging some object storage
15:20
offered by the provider. And you can think about storing cache tiles in the object storage too. Here's a small diagram of a GeoServer cluster deployed in EKS, so Kubernetes as a service offered by AWS.
15:44
It's a typical layout of how you could do it and the kind of storage that you can use for each one of the components we were talking about before. A file share, which is basically an NFS: you could use it for spatial data,
16:02
so to share spatial data between the instances, and you could think about using it for cache tiles, for instance. It has a couple of advantages. Of course, it's a shared file system, so all of the instances distributed across the nodes can use it, and it scales pretty well.
16:24
Block storage would be your regular local storage. It's not shared, so you shouldn't be using it for data, for instance, if you want it to be available to all the instances. The benefit is low latency.
16:41
So it's a good fit for temporarily storing files like audit files and log files. Maybe cache tiles. In that case, you would have a non-shared cache between the instances of your GeoServer cluster, so there are some implications about that. So unless you really need very, very low latency,
17:02
I wouldn't use it for cache tiles. Finally, blob storage. Blob storage services like AWS S3 bring a ton of scalability. They scale pretty much indefinitely. It's cheap, so it's good for storing lots of data.
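The storage guidance in this part of the talk (file share, block storage, blob storage) can be summarized as a small lookup. This is only a sketch of the speaker's recommendations: the component names, the `Storage` enum, and the `storage_options` helper are hypothetical and not part of GeoServer.

```python
from enum import Enum


class Storage(Enum):
    FILE_SHARE = "file share (NFS-like)"
    BLOCK = "block storage"
    BLOB = "blob/object storage"


# Whether each storage type is shared between GeoServer instances.
SHARED = {Storage.FILE_SHARE: True, Storage.BLOCK: False, Storage.BLOB: True}

# Recommendations as stated in the talk. Block storage for cache tiles is
# left out on purpose: the speaker only suggests it when very low latency
# is required, since the cache would not be shared.
RECOMMENDED = {
    "spatial_data": [Storage.FILE_SHARE],
    "cache_tiles": [Storage.FILE_SHARE, Storage.BLOB],
    "cogs": [Storage.BLOB],
    "audit_files": [Storage.BLOCK],
    "log_files": [Storage.BLOCK],
}


def storage_options(component: str, must_be_shared: bool = False) -> list:
    """Return the suggested storage types for a component of the deployment."""
    options = RECOMMENDED[component]
    if must_be_shared:
        options = [s for s in options if SHARED[s]]
    return options
```

Asking for `storage_options("cache_tiles")` returns the file share and blob storage, while requiring shared storage for log files returns nothing, which reflects the advice to ship logs elsewhere rather than share the disk.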
17:24
And it's shared again. So you could think about using it for cache tiles, for instance, or COGs. So a quick recap, a small checklist basically
17:43
about things to consider for a cloud migration. Go for compute-optimized instances for GeoServer. Choose a migration strategy, either lift and shift or refactor.
18:01
Consider using services offered by the provider, like managed databases, and pick the right storage for the purpose and the needs of your project. Here's another topic that is linked to the previous one,
18:23
to Kubernetes: you can also use Helm to deploy your GeoServer cluster. There are some details here. It's basically a package manager for Kubernetes. So it easily packages all of your software
18:40
into a set of files that you can deploy in your Kubernetes cluster, allowing you to kind of template them and adapt them to your environment. So it's pretty nice too. I advise you to check it out. There are some resources available on the internet, like Docker images. You can find the links in here,
19:02
and the Helm chart is coming as well, so we're working on it. Keep an eye on our blog and you will find an update over there. There's also a free webinar, if you're interested in running GeoServer on Kubernetes specifically, that I hosted about a month ago;
19:23
it's available for you to take a look at. Two words about logging and monitoring, real quick. So it can be tricky in a cloud environment to keep an eye on everything. The environment is pretty much dynamic and distributed.
19:43
So you have instances starting and stopping. You have instances of your application distributed across nodes. So it can be hard to identify and debug problems in this environment. We have some tips for you.
20:02
So you should consider aggregating and centralizing all the logs in a single location that is easy to navigate and filter. So you don't have to go around all your nodes to check out the logs of the application.
20:21
Keep in mind that nodes can spawn and also go away. A node can die, so if you keep your logs on a specific node and you don't ship them over to a central location, you run the risk of losing them. So set up shippers to collect and send out the logs to the central service.
20:41
Collect metrics, of course, very important. These are performance indicators like response time, throughput, uptime, error rate, and so on. And auditing. Auditing is a nice feature offered by a GeoServer extension called Monitor.
21:01
It basically tracks requests made to GeoServer and exports the information about these requests into audit logs. Audit logs that you can collect and ingest and create pretty dashboards from. Here you can see an audit event example,
21:23
the kind of information you have in it: performance, layer, errors, and so on. And here's an example of a small dashboard with performance-related information that you can look at. Things like response time, slow layers,
21:43
cache hits or misses, and so on. A few more examples: response time, IP of the requesters, and so on. And finally, alerting. So remember to set up alerts for your services.
22:05
So you're checking whether the services are up or down. You're checking for errors, error rate, out-of-memory errors, and so on. Remember to use different channels depending on the severity of the problem.
22:23
So if it's a problem that needs immediate attention, then you should consider paging someone. If it's a less severe problem, maybe you can just send out an email and avoid waking someone in the middle of the night.
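The idea of choosing an alert channel by severity can be sketched as follows. The severity levels, the channel names, and the "log only" fallback for informational events are assumptions added for this example; the talk itself only distinguishes paging from email.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def route_alert(severity: Severity) -> str:
    """Pick a notification channel based on how severe the problem is."""
    if severity >= Severity.CRITICAL:
        # Needs immediate attention, e.g. the service is down.
        return "page"
    if severity >= Severity.WARNING:
        # Can wait until working hours.
        return "email"
    # Assumption: purely informational events are only recorded.
    return "log"
```

A service-down event would be raised as `Severity.CRITICAL` and page the on-call engineer, while a slowly rising error rate might be a `Severity.WARNING` that only sends an email.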
22:40
That would be nice. And then you can think about automating the fix to the problem. So you can put in place some scripts and tools to try to fix the problem for you without waking up anyone. Examples of these would be watchdogs and health checks. So you're kind of probing the service
23:01
and checking the response just to make sure it's healthy, and restarting it if needed. Here you can find some useful links about what I've been talking about: the GeoSolutions website, the webinars,
23:24
Cloud Optimized GeoTIFFs and Helm. And that's all I had, Diego. So I think if there's any question for me. A moment, yes.
23:43
I have some questions. A moment, please. The first one is: where can the dashboards with data from audit logs be viewed?
24:02
So it's a web application. Specifically, the one that I was showing you is part of a stack of applications by Elastic, the ELK stack. It's a web-based application called Kibana. So you can access it from the internet easily.
24:23
And you'll find all the dashboards and visualizations. The ones that I showed you are just a subset, of course. There are a lot of things that you can do and information that you can view, depending on what you're interested in. So it could be business-related information,
24:42
some analytics for managers or people that want to know how the service is being used, or it could be metrics and logs for operations, and so on. Okay. Another one is how to use log data to tune the application.
25:05
Do you have any use case to exemplify? Yes, that's a nice question. If we're talking about audit files, for instance, the kind of information you have in them
25:23
can be very relevant if you're looking for improvements in terms of performance. So you can extract information from the audit files about caching, for instance. So you can realize that you're not leveraging the cache as much as you think and try to change things.
25:45
So take a look at your client application, in case it's not configured properly to use the cache, for instance. Or you can find some outliers. So you can be very fine-grained with these tools
26:02
and find slow requests. And from them, try to reverse engineer and try to understand why they've been that slow, and then fix the layer. Maybe it's a configuration issue. Maybe it's a styling issue. So yeah, yeah, it can definitely be used.
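The two uses of audit data mentioned in this answer, checking how much the cache is used and finding slow outliers, can be sketched in a few lines. The record layout below (`layer`, `response_time_ms`, `cache_hit`) is a hypothetical simplification, not the actual format written by the GeoServer Monitor extension, and the sample values are made up for illustration.

```python
from typing import Any, Iterable, List, Mapping

# Hypothetical, already-parsed audit records; field names are assumptions.
SAMPLE_EVENTS = [
    {"layer": "roads", "response_time_ms": 120, "cache_hit": True},
    {"layer": "parcels", "response_time_ms": 4800, "cache_hit": False},
    {"layer": "roads", "response_time_ms": 95, "cache_hit": True},
    {"layer": "landuse", "response_time_ms": 2100, "cache_hit": False},
]


def cache_hit_ratio(events: Iterable[Mapping[str, Any]]) -> float:
    """Fraction of requests served from the cache (0.0 when there are none)."""
    events = list(events)
    if not events:
        return 0.0
    hits = sum(1 for e in events if e["cache_hit"])
    return hits / len(events)


def slow_requests(events: Iterable[Mapping[str, Any]],
                  threshold_ms: float) -> List[Mapping[str, Any]]:
    """Requests slower than the threshold, slowest first."""
    slow = [e for e in events if e["response_time_ms"] > threshold_ms]
    return sorted(slow, key=lambda e: e["response_time_ms"], reverse=True)
```

A low hit ratio would point at the client configuration, and the layers at the top of the slow list are the ones whose configuration or styling is worth reviewing first. In practice a tool like Kibana would run this kind of aggregation over the ingested logs.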
26:22
I would use Kibana for that. Okay, I have another one. What storage is recommended for GeoWebCache? Can you make use of S3 as a blob store for the cache?
26:40
Yeah, yeah, yeah. That's another good question. There's no definitive answer. So it really depends on your use case. S3 can be used by means of a plugin available for GeoServer. So the GWC S3 plugin allows you to store cache tiles in S3.
27:04
And that would give you very good scalability. So you would be able to support a ton of requests per second thanks to the scalability of the S3 service itself. There are other options, of course.
27:22
A file share like an NFS or similar can be used to share the cache tiles between all your instances. It would maybe have a lower latency in that case. So in terms of pure performance it would be a bit faster. But it can be a bottleneck
27:41
if you have a very, very high load on your system. So if you're serving many, many requests. Is it possible to use the memory cache of App Engine
28:00
in front of GeoServer? That's my question. Not that I'm aware of. I may be missing something, but I'm not aware of any way to use it directly from GeoServer.
28:21
Okay. Another one is: GeoServer on Kubernetes is a great competitor to the classical ArcGIS Server. Have you also helped clients to move from AGS to GeoServer
28:40
and take the revise/rebuild approach? Yes, yes, that's something we have done a few times with different clients already. And we keep posting regularly on our blog about useful information on how to ease the transition.
29:03
So if you head over to the blog, you will find some relevant blog posts about the topic. And yeah, that's something we can do. We can help out with revising and migrating your cluster to the cloud.
29:22
Okay. One more. When should I consider distributed GeoServer? Is there a threshold of data usage where it is optimal? Should I consider just sharding the data source
29:43
and keeping a single GeoServer node? Yeah, another very good question. So if you're running a single instance of GeoServer, no matter how good your data store is and how scalable it is, at some point you will hit a bottleneck,
30:01
either at the machine level, the operating system level, or in the GeoServer code itself. So that's why it's useful for production systems and systems that have high load to be able to scale out. And you would need to set up more GeoServer instances
30:22
to overcome these kinds of limitations, especially distributed across multiple nodes. That way they would not compete for resources on the same node, for instance, CPU. They would not try to get all the CPU and steal it from each other.
30:42
Okay, thanks Alessandro, there are no more questions. Would you like to say anything more? Thank you, I don't know.
31:05
We have the contact info in the slides that I've shared before. If you have any more questions, you can reach out to us using that contact information or head over to our website.
31:22
We keep publishing interesting things on our blog. Okay, you have one more question. Yeah, is it advisable to use a single GWC in front of a scalable cluster of GeoServer nodes?
31:47
Okay, so good question. It wouldn't be highly available. So if you set up a cluster of GeoServer nodes so that they are highly available in case one of them dies,
32:00
and you set up a single GWC instance in front of GeoServer, then you're creating a single point of failure again. So if that node goes down, then your service goes down, and that's not good. We typically recommend using the GeoWebCache integrated into GeoServer,
32:23
without setting up a dedicated node in front of them. Okay, that's the last one. I would like to thank all the presenters and the audience.
32:42
And it's the end of the day for us. And see you all tomorrow. Okay, bye-bye. Thank you all. Thank you. Bye-bye. Ciao, ciao.