
Croatian SGA Geoportal - users management and usage analysis


Formal Metadata

Title
Croatian SGA Geoportal - users management and usage analysis
Alternative Title
Users management, authorization and usage analysis on Croatian SGA Geoportal
Title of Series
Number of Parts
295
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
After its initial deployment, the Croatian SGA Geoportal had a great impact on the usage of the GIS viewer and OGC web services in the country. With a few thousand registered users, it was time to analyze who is using it and how much. The administrators also needed reports on usage and a way to manage user and group authorization roles. We achieved this with a combination of proven solutions such as GeoServer, GeoFence, and custom code.
Transcript: English (auto-generated)
Hi, everyone. It's three o'clock, so we should start with the presentation. My name is Dragan Padvesanets, and I come from a company called IGEA in Croatia. Today I will talk about the Croatian State Geodetic Administration GeoPortal,
especially about its administration component, which deals with user management, and about usage analysis over the years that we have been running this GeoPortal.
So, what is this GeoPortal? It provides anonymous and registered access to State Geodetic Administration datasets through a web viewer and web services. It's an important part of the Croatian spatial data infrastructure,
and it's a solution based on proven free and open source software such as Postgres, GeoServer, GeoWebCache, OpenLayers, Linux, HAProxy, and so on. It's a horizontally and vertically scalable system which has been upgraded a few times
according to customer needs and the rise of usage in the country. It has had a few major upgrades of backend and frontend software, hardware, and of course networking. There are two main components. One is the web viewer, which you can visit anytime.
It has standard viewer functionality based on open source software such as OpenLayers, Cesium, Angular, and so on: basic functionality like panning and zooming, and advanced functionality like drawing, measuring, printing, and various other things.
There is also login and registration. It also has a 3D viewer based on Cesium, so you can see some pretty stunning imagery there. And there is fast search through Elasticsearch, where you can search for anything.
The other main, and now most important, component is the services. Over the years, more and more institutions started to use the WMS, WMTS, and WFS services,
because they are fast and reliable, and the GeoPortal provides the foundation for almost any GIS application in the country. It provides imagery from various epochs, topographic maps, basemaps, administrative units, cadastral data, and so on. It is used by many other institutions, applications, and portals.
There is almost no portal or application that doesn't use some of the datasets from the GeoPortal, so we are practically essential for all the others. Usage has risen linearly through the years.
The web viewer became maybe a less important component, and the services became more and more important. Here are some facts and figures about the GeoPortal. It has been in production for seven years, with almost zero unplanned downtime. It has around 10,000 daily visitors through the web viewer.
It answers around 35 million requests daily. It has 4,000 registered users, and it delivers around 15 terabytes of data per month to various users. At peak times, around noon, there can be more than 4,000 parallel requests to the system.
After two or three years, and after the rise in usage, some questions arose about the GeoPortal. Who is using the viewer and the services, and how?
Okay, you know you deliver a lot of data, but who is using it, how, and why? Can we guarantee a different quality of service for different types of users? Because it is not the same whether you deliver to anonymous users or to service level agreement users. What are the most interesting areas for users?
What are they looking for? And how do we manage user access to different types of datasets? So, together with the State Geodetic Administration, we decided to go with three user types. There are anonymous users, who get services according to the available resources.
There are watermarks on some of the datasets, but no registration is necessary and you can use it with any client. Users can also register. If they are registered, they get some additional datasets, no watermarks, and higher priority resources than anonymous users.
And there are service level agreement users, who must have guaranteed uptime and performance, and they get the highest priority resources. This is mostly done by the load balancer, which has a different number of backend GeoServers for different types of users.
So, based on the user type, we decide which backend will serve that kind of user. Now, something about authentication and registration. Users register on the GeoPortal frontend,
and after they register, they get inserted into the GeoServer database. There is a database plugin for GeoServer that reads users from the database. Users get an authentication key and some default roles. The authentication key is valid for eight hours after login,
so they can work all day without any intervention. After eight hours, the key expires. The key is stored in an in-memory database, and they must log in again to reactivate the key.
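As a minimal sketch of how such a key store could work, assuming a plain in-memory map with an eight-hour TTL (the actual GeoPortal implementation uses an in-memory database and is not public, so the class and method names here are hypothetical):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory key store with an eight-hour TTL, sketched for illustration only. */
public class AuthKeyStore {

    private static final Duration TTL = Duration.ofHours(8);

    /** key -> expiry time; in production this could live in Redis or another in-memory database. */
    private final Map<String, Instant> keys = new ConcurrentHashMap<>();

    /** Issue a new key at login time; a real store would also associate the key with a user. */
    public String issueKey() {
        String key = UUID.randomUUID().toString();
        keys.put(key, Instant.now().plus(TTL));
        return key;
    }

    /** True if the key exists and has not expired; expired keys are evicted lazily. */
    public boolean isValid(String key) {
        Instant expiry = keys.get(key);
        if (expiry == null) {
            return false;
        }
        if (Instant.now().isAfter(expiry)) {
            keys.remove(key);
            return false;
        }
        return true;
    }
}
```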
So, this is the workflow of a request. When a request comes in, it first goes through a piece of software written in Java. We first check whether the request contains a key. If it doesn't, then it is an anonymous user,
so we just forward it to the anonymous backend. If there is a key, we check whether it is valid. If it is valid, we look up the user's type in the database, so we know whether it is a registered user or a service level agreement user, and based on that we decide which backend will serve this user.
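A rough sketch of that routing decision might look like the following, reusing the AuthKeyStore sketched above. The class names, the UserDirectory lookup, and the fallback of invalid keys to the anonymous pool are assumptions made for the example, not the actual GeoPortal code:

```java
/** Hypothetical backend pools, mirroring the three user types described in the talk. */
enum BackendPool { ANONYMOUS, REGISTERED, SLA }

/** Hypothetical lookup of the user type behind a key, e.g. from the PostgreSQL user database. */
interface UserDirectory {
    boolean isSlaUser(String authKey);
}

/** Illustrative routing logic: pick a backend pool based on the authentication key, if any. */
public class RequestRouter {

    private final AuthKeyStore keyStore;        // sketched above
    private final UserDirectory userDirectory;

    public RequestRouter(AuthKeyStore keyStore, UserDirectory userDirectory) {
        this.keyStore = keyStore;
        this.userDirectory = userDirectory;
    }

    public BackendPool route(String authKey) {
        // No key in the request: treat it as an anonymous user.
        if (authKey == null || authKey.isBlank()) {
            return BackendPool.ANONYMOUS;
        }
        // Assumption: an invalid or expired key falls back to the anonymous pool.
        if (!keyStore.isValid(authKey)) {
            return BackendPool.ANONYMOUS;
        }
        // Valid key: choose between the registered and SLA backends based on the user type.
        return userDirectory.isSlaUser(authKey) ? BackendPool.SLA : BackendPool.REGISTERED;
    }
}
```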
That is why we chose this kind of authentication: while we were considering what to use,
we wanted the simplest possible solution that would work in any client. You just append the authentication key to the end of the GetCapabilities URL, and it works in any client: in AutoCAD, in ArcGIS, in QGIS, in any web client.
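For illustration, such a request could look like the snippet below. The host, the service path, and the key parameter name are made up for this example (GeoServer's authkey community module uses an authkey parameter by default, but the portal's actual parameter may differ):

```java
public class GetCapabilitiesUrlExample {
    public static void main(String[] args) {
        // Hypothetical example: host, path, and parameter name are illustrative only.
        String baseUrl = "https://geoportal.example.org/geoserver/ows";
        String authKey = "0fd0fb1a-example-key";    // key issued at login, valid for eight hours
        String getCapabilities = baseUrl
                + "?service=WMS&version=1.3.0&request=GetCapabilities"
                + "&authkey=" + authKey;            // the same URL works in QGIS, ArcGIS, AutoCAD, etc.
        System.out.println(getCapabilities);
    }
}
```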
Nothing else is necessary; you don't need to enter a username and password, it is just your authentication key. Now, something about user administration. There is a backend administration application for the administrator of the GeoPortal. Here they can see some statistics about total users, registered users,
how many concurrent users are on the system right now, how many registered users are active, and so on. The administrator can delete users manually, add new users, and reset passwords
if for some reason a user cannot reset their own. They can also assign users to different roles. (Here the usernames are blurred because of GDPR.)
Here the administrator can create some predefined groups and add or remove users from a certain group. This means it is quite easy to manage access to some of the datasets. In fact, this is a user-friendly, integrated GeoFence administration
via the GeoFence REST API, where the administrator can add, remove, or edit GeoFence rules. And on the slide before, they can add a user to that group. So, for example, if you want to give WFS access to some of your datasets,
you create a group for this and then just click and add the users you want to that group. Then, when someone requests GetCapabilities, based on their authentication
they get the corresponding GetCapabilities XML. For example, if you have access to some dataset, you will see it in your GetCapabilities; if you don't, you will get a blank GetCapabilities.
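As an illustration of what driving GeoFence over its REST API can look like, the sketch below posts an ALLOW rule for a role. The endpoint path, the rule fields, the credentials, and the role and workspace names are assumptions made for this example; check the GeoFence REST documentation for the exact schema of your version:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

/** Illustrative only: create an ALLOW rule granting a role WFS access via the GeoFence REST API. */
public class GeofenceRuleExample {
    public static void main(String[] args) throws Exception {
        // Endpoint path and XML fields are indicative; consult the GeoFence REST docs for your version.
        String rulesEndpoint = "https://geoportal.example.org/geoserver/rest/geofence/rules";
        String ruleXml = """
                <Rule>
                  <priority>10</priority>
                  <roleName>WFS_CADASTRE_USERS</roleName>
                  <workspace>cadastre</workspace>
                  <service>WFS</service>
                  <access>ALLOW</access>
                </Rule>
                """;

        String credentials = Base64.getEncoder().encodeToString("admin:password".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(rulesEndpoint))
                .header("Content-Type", "application/xml")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(ruleXml))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("GeoFence responded with HTTP " + response.statusCode());
    }
}
```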
Now, something about usage analysis. Every day, the HAProxy load balancer creates about 20 gigabytes of request logs, which are delivered nightly to a separate machine dedicated to request analysis. Every request gets analyzed, so what we analyze is the requested layer,
the bounding box from the request, the type of service (WMS, WMTS, WFS, and so on), and whether an end user is there; if not, it's an anonymous request. The analytics summary, which is computed during the night, is then stored in a database.
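As a rough illustration of that extraction step (not the actual parser used by the GeoPortal), the interesting fields of an OGC request query string could be pulled out like this; the AUTHKEY parameter name and the overall shape are assumptions:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative extraction of service type, layer, bounding box, and key from an OGC query string. */
public class RequestLogParser {

    public record ParsedRequest(String service, String layers, double[] bbox, String authKey) {}

    public static ParsedRequest parse(String queryString) {
        Map<String, String> params = new HashMap<>();
        for (String pair : queryString.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                // OGC parameter names are case-insensitive, so normalize the key.
                params.put(kv[0].toUpperCase(Locale.ROOT),
                           URLDecoder.decode(kv[1], StandardCharsets.UTF_8));
            }
        }
        double[] bbox = null;
        if (params.containsKey("BBOX")) {
            String[] parts = params.get("BBOX").split(",");
            bbox = new double[4];
            for (int i = 0; i < 4; i++) {
                bbox[i] = Double.parseDouble(parts[i]);
            }
        }
        // "AUTHKEY" is a placeholder name; a null key means the request counts as anonymous.
        return new ParsedRequest(params.get("SERVICE"),
                                 params.get("LAYERS"),
                                 bbox,
                                 params.get("AUTHKEY"));
    }
}
```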
The next day, we can see some statistics about it. In the administration application, the administrator can see summary statistics for all users: how many requests there were and how much traffic there was the previous day.
They can also pick a time range and get summary statistics for that date range. There are also detailed statistics per workspace, per type of service, and per layer, and we also inject some headers from the web viewer,
so we know that a request came from the web viewer; this lets us distinguish the requests that came from the web viewer from those that came from the services. There is also layer analysis:
summarized statistics for all layers, so we can see how many requests there were for every layer, plus detailed statistics per type of service and per user for each layer. There is also service analysis, with summarized statistics for all types of services,
and detailed statistics per user, so we know the top users for each service and who is using it the most. There is also traffic analysis, which gives us summarized traffic per day or per month,
so we know how many megabytes or gigabytes we delivered, again with a distinction between the viewer and the services. At the end, there is workspace analysis, where we see summarized statistics for all workspaces,
with a distinction between anonymous and registered users, and detailed statistics per user, so we know the top users for each workspace. There is also a generated heat map, so we know which areas users look at.
For every request, since we know its bounding box, we can compute a centroid and insert it into the database, and from there it's easy to create a heat map, so you know which areas are the most interesting.
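A minimal sketch of that idea, assuming a bounding box like the one parsed above and a PostGIS table for the centroids (the connection details, table name, and SRID are made up for the example):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

/** Illustrative only: store the centroid of a request's bounding box as a PostGIS point. */
public class HeatmapLoader {

    public static void storeCentroid(Connection conn, double[] bbox) throws Exception {
        // Centroid of the requested bounding box: minX, minY, maxX, maxY.
        double cx = (bbox[0] + bbox[2]) / 2.0;
        double cy = (bbox[1] + bbox[3]) / 2.0;

        // Hypothetical table; SRID 3765 (HTRS96 / Croatia TM) is only an example choice.
        String sql = "INSERT INTO request_centroids (geom) "
                   + "VALUES (ST_SetSRID(ST_MakePoint(?, ?), 3765))";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, cx);
            ps.setDouble(2, cy);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/geoportal_stats", "stats", "secret")) {
            storeCentroid(conn, new double[] {450000, 5000000, 460000, 5010000});
        }
    }
}
```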
On this screenshot, we see an example where a highway ends: there is probably big interest in buying land there, or maybe construction works are underway, so people are making plans and using the services in this area.
Now we are exploring future improvements to this analysis and usage monitoring, and we are very satisfied with some results we got from the Elasticsearch, Logstash, and Kibana stack.
We will try to connect the HAProxy load balancer directly to the ELK stack, so we don't have to transfer and process the logs every night, because if we do this, we will have real-time monitoring,
and real-time visualizations showing how many requests there are right now, and so on. We will also be able to run instant analysis queries. And that's it for my presentation.
Thanks for your attention, and I'm open to questions. Hi, thank you.
First of all, I've got a few questions. How do you define users? I mean, not technically, but who are your users? Who are the users? It can be anybody: for example, land service companies, institutions,
environmental people, telecommunications, land agencies, real estate agencies, so a lot of users. Everyone who is doing something spatial.
Okay, great. How do they publish to you? Do they publish from ArcMap or from QGIS, or both? How do they publish, and where do they store their services? Say I'm a land authority,
and I want to publish my services and show up on your site. How do they do that? It's not meant to be a site for publishing data from others, but just to serve the data.
So you're saving the links, you're storing the links, and you don't store the data, or do you? Yes, we do. You store the data? Yes, we store all the data from the geodetic administration: all base maps, topographic maps. But you don't have the services? I don't understand. I want to publish, and I want to show up on your site.
How do I do it? I mean, if I'm allowed to. No, I don't think so. The system is not meant to publish data from other institutions. It's just to deliver data from the geodetic administration
to anyone who is interested in their data. Maybe we can talk about it later. Can I ask a question? Yes, of course. How do you deal with large amounts of data in the WMS service?
Yeah, this was necessary because, of course, we had problems; it is not easy to serve 4,000 HTTPS requests in a second. So currently we always have some bottleneck, and we always will, since usage keeps rising.
First it was the disks, so they were upgraded to SSD storage, and so on. Currently, HTTPS can be a problem, because it's really hard to deal with such a number of requests. But now we will move to hardware load balancers,
and maybe double or triple them and see. This is the bottleneck now; it's not critical yet, but we can see that it's going to become critical. So upgrades, upgrades all the time. Have you thought about having the data already pre-generated,
so there would be no live requests to GeoServer or to Postgres? The data is pre-generated, so most of the data is cached. But anyway, you must serve the request: if you get a request, you must give an answer.
This is good for users, because they don't have to take care of the data; they don't have the data on their own machines. Everybody knows the services are there, they are working, they are working fast, and they are reliable. But we must, of course, provide the services, so upgrades will be necessary all the time.
There is also a planned upgrade of the networking stack, because we are getting near 200 megabits at critical times, so we will have to upgrade the uplink to at least 500 megabits. So, just upgrades.
Very impressive. And your client is actually the Croatian government, right? Yes, we are doing maintenance of their system. They are our client, but they are providing services for everyone who is interested in their data. Yes, that's the answer.
Thank you very much. You're welcome. Any more questions? You mentioned Elasticsearch a couple of times. I'd like to know what kinds of searches you offer to your users.
In other words, what data do you index in Elasticsearch? Anything: administrative units, streets, house numbers, addresses, cadastral parcels; everything is indexed. Everything is indexed, and we refresh the data from the land agency every night.
We get the new data, the differences, and then we re-index the data. So every day you have fresh data, and everything is indexed and searchable; everything that is in the database.
And that's almost everything, from administrative units and so on. Any more questions?
May I? Yes, yes, of course. How do you control the quality of that data?
That's not part of our job, because we are just publishing what the geodetic administration gives to us. So if users see errors, they report them directly to the geodetic administration, not to us.
We are just in charge of providing the service, not of collecting the data. Thank you. No problem. Any last questions?
Okay, thank you very much. Thank you.