
GeoHealthCheck - QoS Monitor for Geospatial Web Services


Formal Metadata

Title
GeoHealthCheck - QoS Monitor for Geospatial Web Services
Title of Series
Number of Parts
351
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
Keeping (OGC) Geospatial Web Services up-and-running is best accommodated by continuous monitoring: not only downtime needs to be guarded against, but also whether the services are functioning correctly and do not suffer from performance and/or other Quality of Service (QoS) issues. GeoHealthCheck (GHC) is an Open Source Python application for monitoring uptime and availability of OGC Web Services. In this talk we will explain GHC basics, how it works, and how you can use and even extend GHC (plugins). There is an abundance of standard (HTTP) monitoring tools that may guard general status and uptime of web services. But OGC web services often have their own error ("Exception") reporting that is not caught by generic HTTP uptime checkers. For example, an OGC Web Mapping Service (WMS) may provide an Exception as a valid XML response or as an error message written "in-image", or an error may render a blank image. A generic uptime checker may assume the service is functioning, because those requests still return an HTTP status "200". Other OGC services may have specific QoS issues that are not directly obvious. A successful and valid "OWS GetCapabilities" response does not guarantee that individual services are functioning correctly. For example, an OGC Web Feature Service (WFS) based on a dynamic database may return zero Features in a GetFeature response because of issues in the underlying database. Even standard HTTP checkers supporting "keywords" may not detect all failure cases in OGC web services. Many OGC services will have multiple "layers" or feature types: how to check them all? What is needed is a form of semantic checking and reporting specific to OGC services! GeoHealthCheck (GHC) is an Open Source (MIT) web-based framework through which OGC-based web services can be monitored. GHC is written in Python (with Flask) under the umbrella of the GeoPython GitHub Organization. It is currently an OSGeo Community Project.
GHC consists of two parts: (1) a web-UI app (using Flask) through which OGC service endpoint URLs and their checks can be managed and monitoring results can be visualised, and (2) a monitoring engine that executes scheduled "health-checks" on the OGC service endpoints. Both parts share a common database (via SQLAlchemy, usually SQLite or PostgreSQL). The database also stores all historic results, allowing for various forms of reporting. GHC is extensible: at the time of writing a plugin system is being developed for "Probes" in order to support an expanding number of cases for OGC-specific requests and checks. Work is in progress to provide a GHC API for various integrations. Links: - Website: geohealthcheck.org - Sources: github.com/geopython/GeoHealthCheck - Demo: geohealthcheck.osgeo.org
Keywords
Transcript: English(auto-generated)
Okay, yes, thank you. So this talk will be about, yeah, GeoHealthCheck, a Quality of Service monitor for geospatial web services. That's quite a mouthful. And I will be doing the first part and then Tom will take over.
So yeah, we have an agenda, but I will just start from here. Let's say we have what we call OGC OWS monitoring challenges, and I should make it even broader: when you have any kind of geospatial web service, you have monitoring challenges,
believe me. That's actually how I joined the project. So at some point your customer may call and say, I see pink tiles, vedo piastrelle rosa,
something like that. Si? Or ixe rosa kajal. Probably many of you have seen this. Let's see if we can get this down. This is what we expect. Let's say it's an OpenLayers application with tiles, but this is actually what we
see. Has anyone seen this? Yeah. Never? Oh, good. Then you have very high-quality web services. It happens a lot. But the point is here, it's not that OpenLayers is just, or the server is bringing in pink
tiles, it's a reaction because your JavaScript web application has received a well-formed exception report and it doesn't know what to do with it. Or it could be even an image error message.
And my point here is, or you could have an empty database, an empty table. Let's say a table is filled every night and it fails, and you get a beautiful white image. Probably you do some monitoring, like HTTP monitoring. But even if you get the exception report, you could get, let's say, a 200 answer as
it's called, which means, well, service is working. It goes further. In OWS, there is something called GetCapabilities, and it's the metadata of the service from the endpoint.
But I've seen many implementations where the capabilities document is even a static file. So there's no guarantee that specific requests will work, even if you get a valid capabilities file. And if you're familiar with OWS services like GetMap, GetFeature, you have a 200
layer WMS, how would you know that layer 173 is failing? And I'm just presenting some challenges here. It goes further if you have time-based services like Sensor Web Enablement, like SOS or the SensorThings API, you have a viewer, but you also have, let's say,
at some point you see gaps in the data. So something may be failing in the whole pipeline, which also means you need some kind of history capture, not just service up or down, but maybe something happened.
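The gap problem described here can be made concrete with a small sketch. It is not GeoHealthCheck code; the function name `find_gaps` and the `tolerance` parameter are assumptions chosen for illustration. Given the timestamps of observations returned by a time-based service and the interval at which they are expected, it reports where the series has holes.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where two consecutive observations
    are further apart than tolerance * expected_interval.

    Hypothetical helper for illustration, not part of GeoHealthCheck.
    """
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


if __name__ == "__main__":
    start = datetime(2022, 8, 24)
    # Hourly observations with hours 3 and 4 missing.
    observations = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
    for gap_start, gap_end in find_gaps(observations, timedelta(hours=1)):
        print("gap between", gap_start, "and", gap_end)
```

A check like this only makes sense when results are kept over time, which is the history capture the talk asks for.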
So there's public uptime services, like UptimeRobot, Pingdom. There's generic HTTP checking. Maybe you can add some keywords. But if you have OWS and also the modern OGC API services or any service, even Esri services,
you need deeper inspection. And also many of these uptime services are public. You may have your services running on the intranet, so they cannot even be accessed by a monitor from outside. So the value proposition is we need OWS-aware web monitoring services.
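To illustrate the difference between a generic uptime check and an OWS-aware one, here is a minimal sketch that assumes a GetMap request for which an image was expected. The function name and the result labels are hypothetical. It looks past the HTTP status and flags a 200 response whose content type is not an image, which is what an XML exception report looks like on the wire.

```python
def classify_getmap_response(status, content_type, expected_prefix="image/"):
    """Classify a GetMap response from its status code and Content-Type.

    Hypothetical helper for illustration, not part of GeoHealthCheck.
    """
    if status >= 400:
        return "http-error"
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if not media_type.startswith(expected_prefix):
        # HTTP says OK, but the body is not the image that was requested.
        return "unexpected-content"
    return "ok"


if __name__ == "__main__":
    print(classify_getmap_response(200, "image/png"))
    print(classify_getmap_response(200, "application/vnd.ogc.se_xml"))
```

Errors written in-image and blank images still pass this test, because they arrive as valid images. Catching those needs inspection of the image content itself.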
We need quality of service checking, more formally called, and history capture. And, of course, we have an answer to that. So Tom has started the GeoHealthCheck project, I think, already around 2014, and I give
a quick tour of the UI. Later we explain how it all works. So, yeah, Tom started on his way to FOSS4G Portland, and I joined the project later, and it's a GeoPython project, and there's a page with several of these Python projects.
I give a short tour. We start, there's a dashboard. It's a web application, and, yeah, we can improve somewhat on the graphics, but the idea is you have a dashboard in there, you see all your, yeah, resources.
That's a terminology we use, basically, resources are OWS endpoints. And you can see for each of the endpoints the percentage it's up or, well, succeeding
because it's not just up or down. There's several tests and ones that are failing at the moment. And what we see here are CSW services, but let's see if we can scroll down.
It's like a game. Here we see WMS, and we also have, let's say, history captured. So you can see over time when a service has failed, and you can even inspect how it has failed. So how do you operate this?
You can have a user registration enabled, or you can disable that. But basically, to add resources, you need to log in. And then basically the first thing is you choose from which kind of service you want to inspect.
And there's now, this is even an old slide, but there's now like 20 services that we support because each of these services is supported through a plug-in. So it's an extensive plug-in system. So you say, well, I want to add a resource, as we call it, and a resource has an endpoint.
In this case, it's WMS. You don't have to enter the capabilities file, just the link to the endpoint. And you can give it tags, but that's optional. Let's go down. And when you enter the endpoint, you immediately get into the editor.
And there you add what we call probes. A probe is basically a series of, in the end, it's a series of requests that you fire on the endpoints, and later on, you will check the results. And the first probe, every probe, it's also an extensible system.
There's here, let's say, a GetCapabilities probe. And depending on the type of endpoint, so this is WMS, there's further probes available. We'll see them shortly. And you can edit the probe.
And here, there's not so much to edit. You can enter the version, maybe. And a probe also has one or more checks. And each check is basically inspection of the response. So for GetCapabilities in this case, there's a check, is it valid XML?
If it's not valid XML, it's already flagged as failed. Does the response contain an OWS exception? Or it should at least contain a title. And if all that's good, then this probe has succeeded.
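The three checks just described (valid XML, no exception report, a title present) can be sketched as follows. This is an illustration under stated assumptions, not GeoHealthCheck's actual plugin code: the function names are hypothetical, and the exception test relies only on the root element names `ServiceExceptionReport` (WMS) and `ExceptionReport` (OWS Common).

```python
import xml.etree.ElementTree as ET

# Root element names used by OGC exception documents.
EXCEPTION_ROOTS = {"ServiceExceptionReport", "ExceptionReport"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check_capabilities(body):
    """Run the checks in order; return a list of (check_name, passed)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML: the later checks cannot run.
        return [("valid_xml", False)]
    has_title = any(
        local_name(el.tag) == "Title" and (el.text or "").strip()
        for el in root.iter()
    )
    return [
        ("valid_xml", True),
        ("no_exception", local_name(root.tag) not in EXCEPTION_ROOTS),
        ("has_title", has_title),
    ]


def probe_succeeded(results):
    """A probe succeeds only if every one of its checks passed."""
    return all(passed for _, passed in results)
```

Returning one result per check, rather than a single boolean, keeps enough detail to show later why a probe failed.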
But there's also probes for inspecting all the layers. So for instance, you can have a probe for a single layer, and there's some intelligence here that it requests all the layers, and then you can choose which layer and set some parameters. But you can even, I don't know if it's in the next slide.
Oh yeah, you can have here all 70 layers, for instance. So that's, let's see, go. So if something fails, what happens?
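Checking every layer amounts to reading the layer names from the capabilities document and building one GetMap request per name. The sketch below assumes WMS 1.3.0 and an endpoint URL without an existing query string; the helper names, the default bounding box and the image size are illustrative assumptions, not GeoHealthCheck defaults.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

WMS_NS = {"wms": "http://www.opengis.net/wms"}  # WMS 1.3.0 namespace


def named_layers(capabilities_xml):
    """Return the names of all requestable layers, in document order."""
    root = ET.fromstring(capabilities_xml)
    names = root.findall(".//wms:Layer/wms:Name", WMS_NS)
    return [el.text for el in names if el.text]


def getmap_url(endpoint, layer, bbox=(-90, -180, 90, 180),
               crs="EPSG:4326", size=(256, 256)):
    """Build a WMS 1.3.0 GetMap URL for one layer.

    The default bbox is in latitude/longitude order, as WMS 1.3.0
    requires for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)
```

Looping `getmap_url` over `named_layers(...)` gives one request per layer, so a failure in a single layer shows up even when GetCapabilities is fine.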
Usually you get, you can program it, you get an email. And from there you can inspect what has gone wrong. So you get the email that something has failed, and then you can go with the link to the specific resource. And you can see this has failed over time.
This is an ISRIC endpoint, I see, and there you can find out that finally it's some ECW raster file, which is maybe not readable or missing. And if it's fixed, you get the message back again
that the service is running again. And that's in a nutshell how it operates. And you'll probably be very curious how this works under the hood, and that's what Tom will explain. Great. Thanks, Just.
Let's figure this out here. So how is this all put together? Basically when we developed GeoHealthCheck and as we developed GeoHealthCheck, we have a number of different parts. So one is the web application or the dashboard, that is the user interface, and that is used, we're using Flask
as well as Bootstrap for some of the UI things. Then there's a health check runner. Those are the, that's the actual machinery that's doing these health checks in the background. The user interface is showing the results of all the health checks that happen over time, sort of offline to inspect the quality of service
that your services are performing at. And we also have a database. It's interesting to mention that in the default GeoHealthCheck setup, we do give you a database. I think it's PostgreSQL, at least in the Docker setup, but we do support different types of databases. So you may already have a database in your environment
and you simply want to reuse that database for your GeoHealthCheck requirements. So that is possible, just the same. So yeah, just to visualize that, we have the runner, which is basically storing results into the database, and then the web application is basically displaying them.
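The split between a runner that writes results and a web application that reads them can be sketched with the standard library's `sqlite3` module. The table and column names here are hypothetical and much simpler than GeoHealthCheck's real schema, which is managed through SQLAlchemy.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the shared database and make sure the results table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run ("
        "resource TEXT NOT NULL, "
        "checked_at TEXT NOT NULL, "
        "success INTEGER NOT NULL)"
    )
    return conn


def record_run(conn, resource, checked_at, success):
    """Runner side: store the outcome of one health check."""
    conn.execute(
        "INSERT INTO run VALUES (?, ?, ?)",
        (resource, checked_at, int(success)),
    )
    conn.commit()


def reliability(conn, resource):
    """Dashboard side: percentage of successful runs, or None if no runs."""
    row = conn.execute(
        "SELECT AVG(success) * 100.0 FROM run WHERE resource = ?",
        (resource,),
    ).fetchone()
    return row[0]
```

Because every run is kept as a row, the same table serves both the percentage shown on the dashboard and the history view of when a service failed.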
So pretty simple architecture. Simple's good. And as I mentioned, we have a Flask web application. You can configure your checks and configure all of your services in the user application. We also have some machinery behind the scenes for you to sort of pre-populate services
that you want to monitor if you have sort of an automated workflow or you can just use the default user interface. There's an admin user that comes with the application and then you can manage users for different services and so on. The runner, again, it's driven by a scheduler
and it can run with different frequencies for different services and the machinery provides all these reports so you can build out these graphs that you showed in the tour and the overview. We also have web hooks and notifications. So as Just mentioned, this is extensible and our default sort of notification system
is through email, but you can also set it up to maybe post information to other services. So it's all very extensible and we do support some defaults out of the box, which is emails, for example. The basic model is that we have a resource and a resource is registered to GeoHealthCheck
and it goes under, it gets put through a number of probes and the probes are the things that do the actual testing. So the deeper testing that Just was talking about that are required when you interrogate OGC web services or OGC API services increasingly. And I should mention that we are supporting
both the first generation and increasingly the OGC API specifications for doing health checking. And each probe in and of itself has a check, which analyzes the response content and gets into the details with regards to how to figure out whether something is actually working.
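The resource, probe and check relationship can be pictured as three small data classes. This is only a sketch of the shape of the model: the class and field names are assumptions, a probe here receives the response text instead of issuing the request itself, and GeoHealthCheck's actual entities carry more information.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Check:
    """One inspection of a response; `test` returns True on success."""
    name: str
    test: Callable[[str], bool]


@dataclass
class Probe:
    """A request against a resource plus the checks applied to its response."""
    name: str
    checks: List[Check] = field(default_factory=list)

    def run(self, response_text: str) -> Dict[str, bool]:
        return {check.name: check.test(response_text) for check in self.checks}


@dataclass
class Resource:
    """A registered endpoint, tested by one or more probes."""
    url: str
    resource_type: str
    probes: List[Probe] = field(default_factory=list)


if __name__ == "__main__":
    probe = Probe("capabilities", [
        Check("not_empty", lambda text: bool(text.strip())),
        Check("contains_title", lambda text: "<Title>" in text),
    ])
    resource = Resource("https://example.org/wms", "OGC:WMS", [probe])
    print(resource.probes[0].run("<Service><Title>Demo</Title></Service>"))
```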
This is getting a little easier with the next generation of OGC API services, which are doing more native HTTP responses instead of getting a 200 response and having to go inside and do all the digging for what the response actually means. So this should get a little bit easier over time. And we have a plugin mechanism.
So you can develop your own plugins, you can hook them up to the UI and so on. So for example, in one project that I'm working on with WMO, which we're presenting tomorrow morning, called WIS2 in a Box, we're building out an MQTT plugin for Pub/Sub health checking.
And the database has multiple entities. I won't get into details here because we don't have too much time left. That's a basic overview of our data model. I'll turn it over to Just in the interest of time to close us out. Okay. Okay, the installation. Well, it's a standard Python installation,
but what we recommend is using Docker. We have ready-made Docker images on Docker Hub. And I think we even have Docker Compose files. And there's some settings that you can do and you can also pass these as Docker variables. Like we said, you can extend GeoHealthCheck
with your own plugins and I don't have the time to go into much detail, but it's fairly simple. Basically, you can have two types. There's a template type where you have pre-registered requests with only parameters
that you need to fill in or there's free form. You can do anything. That's what you probably need with MQTT. There was even some code here, but in the interest of time, this is the simplest probe that you could write. Just HTTP requests. This is all the lines you would need.
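The code on the slide is not part of this transcript. As a stand-in, here is a minimal sketch of what a probe that does nothing but an HTTP request could look like, together with a status check that treats the 400 and 500 ranges as failures. It uses only the standard library and does not use GeoHealthCheck's plugin base classes; both function names are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def http_get_status(url, timeout=10):
    """Fetch the URL and return the HTTP status code.

    urlopen raises HTTPError for 4xx/5xx answers, so the code is read
    from the exception in that case. Connection failures (URLError,
    timeouts) are deliberately left to propagate to the caller.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code


def status_no_error(status):
    """Check: the status must not be in the 400 or 500 range."""
    return not (400 <= status < 600)
```

A usage line such as `status_no_error(http_get_status("https://example.org/wms"))` would tie the two together; the URL is a placeholder.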
And also for the check, let's say you want to check if the status is in the 400 or 500 range. So it's quite easy to write probes. And for the sake of time, the roadmap. And one of the, I think, premier things we're planning ahead, there's already a specification on our wiki,
is to have a REST API architecture. That means you won't have to use the UI, because many, many companies have an automated system. So they want to, let's say, automatically populate the GeoHealthCheck database.
And as you may notice, the UI is also ready for renewal. So any Vue.js programmers here, do join us. And integration with various other tools. And yeah, this is also an invitation to join our project.
Oh yeah, shameless plug. If you don't want to install GeoHealthCheck yourself, there's also a hosted version provided for a small subscription fee. So you don't have the hassle of upgrading and maintaining an instance yourself. We have some links. This entire presentation is online.
You can also find it through my website or, easiest, geohealthcheck.org. And on behalf of Tom and me, and even Hannes Reuter is here, we thank you.