
Open Geospatial Data And Services Publication On The Cloud: The INGEOCLOUDS Open Source Approach


Formal Metadata

Title
Open Geospatial Data And Services Publication On The Cloud: The INGEOCLOUDS Open Source Approach
Title of Series
Number of Parts
95
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language
Production Place
Nottingham

Content Metadata

Subject Area
Genre
Abstract
The cloud can be used as an infrastructure, as a platform or as a (desktop) software replacement, according to the three different paradigms that it supports (IaaS, PaaS and SaaS). At the same time, more and more applications are using the cloud as their backend, since it promises (unlimited) scalability and elasticity in terms of storage and computing power. In the open source geospatial world a lot of effort has been invested in developing excellent software that can be used to store, manage, visualize and publish geospatial data and services on the web. But when it comes to the cloud, those offerings are not always readily available, since the software we all build does not scale in a way that can take advantage of the cloud. In that respect we worked towards providing scalability and elasticity capabilities for the storage, querying and visualization of geospatial data, based on existing open source solutions like MapServer, PostGIS, Apache and so on. We also worked on the lower part of the software stack so that we can build an elastic file system for storing geospatial data. So we are in the process of offering a fully open source solution that can take advantage of the cloud and its properties. Moreover, we have coupled this solution with support for publishing anyone's geospatial data as Linked Open Data so that it can be readily combined with other data on the web. In that respect we are using an open source SPARQL endpoint (Virtuoso) that allows us to store geospatially enabled information, provided that a suitable conceptual model, described in RDF, is supplied. Thus we allow for seamless integration of published data into the semantic web and we provide the necessary services for integrating this kind of offering into other applications in the future. Additionally, we identified an emerging need to allow end users to publish their own data and dynamically create their own customized services on the cloud. Thus we exploit the cloud's "unlimited" storage capabilities to allow end users to publish their own data (as long as it is cost effective, too), combine it with existing data, create their own customized WMS/WFS services and publish them on the web. This has great added value for the users, since they can actually publish their own maps. Finally, we demonstrate the capabilities of our technical solution by building and offering a set of advanced geophysical services through the platform. These services include a service for creating shakemaps (maps that visualize the effects caused by an earthquake on the environment), predicting landslides (providing maps assessing the possibility of landslides) and handling pollution information in groundwater. In conclusion, we offer an open source software stack that is based on existing open source software and extends it as needed in order to take the greatest possible advantage of the properties of the cloud. We have tried to keep the software agnostic to the specific cloud and its capabilities. The work is carried out within the INGEOCLOUDS FP7 project, co-funded by the EU, with the participation of companies (AKKA Technologies, France), research centers (CNR, Italy and FORTH, Greece) and data providers such as geological surveys (GEUS, Denmark; GEO-ZS, Slovenia; BRGM, France and EKBAA, Greece) and earthquake research institutes (EPPO, Greece).
Transcript: English (auto-generated)
By the European Commission; it's not an R&D project, it's a pilot project called INGEOCLOUDS. And as said by Chris, in INGEOCLOUDS one of the main issues is the use of open source software. Because as said by Chris, these tools are now mature.
And these are the most used tools in many organizations. And this is why we are speaking about this open source integration approach in this project. So I will structure my presentation on
these several aspects. I will recall the context and the challenges we face in this project. I will say a few words about cloud computing, even if I suppose everybody knows about it now; but maybe I will recall some basic principles and show how we are dealing with this technology in this project. I will show you some achievements that I hope will interest you, giving you some flavour of what we are doing and the services we are going to provide. And I will speak about our plans and how you could also be involved in further use and further development of the platform. So, about the context: what is it about? The basis of the project is that many organizations, and public institutions in particular, are faced with the need of creating and sharing spatial environmental data. And this is what we are dealing with in this project. We are involving several partners from the geological world as a first step.
And they are facing the problem of publishing their data in several forms with maps, of course, but also providing download services, providing search services, et cetera. So this is what it's about. And the commission gives some funding for this project
because they are interested in assessing cloud computing technology for doing this. And we have seen with Chris that basically it's a very interesting technology. And we are going to see how we put that into practice in the next slides.
So what was the statement? The basic statement is, I think, very well known to everybody. We have a big rise in data quantity, but also in quality. Now the data is produced by good institutions that are professional in producing this data. The problem they have is giving access to this data and also sharing this data with their peers. So there is a gap here in the pace at which these several aspects are rising. So the Commission, as you may be aware, started in 2007 a new set of recommendations, grouped into the INSPIRE directive, in order to improve interoperability of data and also of services, trying to push and force public institutions, in Europe at least, to adopt common rules: technical rules, modelling rules. It's more recommendations than, I would say, strict standards. But anyway, the goal is to have these public institutions adopt these rules, with a certain calendar going up to 2017. So there are legal obligations for these public institutions in every country to follow these rules. And this is also what it's about. So what are the problems when you go to several public organizations and institutions? Of course, they are all dealing with web mapping
and producing maps with several approaches. Some are very advanced and use online tools. Some are just doing that with IT infrastructures. Some are not at all mature for doing that.
So each data provider basically uses a particular application, its own application. This means that when you have to serve a bigger area of customers, or at least of users of your data, you need to build a very efficient and reliable IT infrastructure, and to discuss money, of course. Concerning INSPIRE, the issue is that the legal obligations are coming step by step until 2017, as said. And the problem with the situation they have is that they usually dedicate some effort to producing INSPIRE-compliant solutions, while trying not to touch too much of what they already have running. They don't want to take the risk of completely redefining their existing systems. So they are confronted with a dilemma here. And in this project, we try to produce solutions for removing this dilemma. So, a quick ID card, just to know what we aim at with this
project. So we build a cloud infrastructure for public agencies, especially in the environmental field. We try and publish geodata through advanced services.
So we hope they are assessed as advanced. One of the big points in the project is also to assess the move of existing data sets and services from several institutions to the new infrastructure. So we have a big panel of institutions.
Some of them are very mature in terms of IT and web mapping; some are still just working with Excel files. So we have a good spectrum of users here for the infrastructure. We want them to fulfil their INSPIRE obligations and, of course, perhaps to help with the interoperability of the data. So basically, the consortium involves geological surveys. As I said, this is the main topic we are dealing with at the moment, about geoscience, with groundwater management as the main use case, and the data sets are published under these thematics. And we have three ICT organizations with several skills. Where we are: we started in 2012 and we are finishing next year in July, and we are about to open a pilot that will be usable by all of you. Just a few words about cloud computing. As said by Chris, I think everybody has heard about that and read a lot of things about it. I will just recall the basic principles that we used to sell the idea and to discuss with our partners, to convince them to join the project at the beginning, because it was not obvious for them that cloud computing could be of interest. So basically, these are the well-known figures of the
promises of cloud computing in terms of capacity adaptation. Of course, you don't have to buy your servers in successive steps; you rely on the flexibility and elasticity of the cloud to meet your needs. So basically, cloud computing is based on the merging, the convergence, of several topics: virtualization, grid computing and service-oriented architecture. So you build these kinds of virtual stacks. We will see how it adapts, how it is in our case.
Of course, the promise of cloud computing is cost reduction. This is the main selling point when coming to public organizations as well. So we also have to prove in this project that the total cost of ownership is going to be reduced, just because they will be able to pay only for what they are using. It's not only about money, of course. What we promise to public institutions is, of course, large computing power, theoretically without limits except the money you have, of course. You will get up-to-date operating systems and technologies, which is also a very interesting item. And of course, ubiquitous access and a promised quality of service. So basically, what does it bring in our project? We see that when we put several servers online, through Amazon Web Services of course, the system load in terms of request handling tends to flatten a bit, depending on the number of servers you deploy. Of course, these are the basics of cloud computing. We will see how this has been
instantiated in our project. The main achievements in our project are not only about cloud computing, even if I think it's a fundamental point for our users: having services for data management, elastic services as we call them, an elastic database server and file server. Data publication modules are very important to comply with the INSPIRE legal obligations, and we produced some modules that allow us to do that following OGC standards, of course. We also introduce an interesting point in terms of fostering interoperability and cross-usage of the data between the different institutions, which is based on the linked open data approach. We provide an API that allows data providers to manage their own private environment, to manage their own users, and to manage authorization and authentication on their data and data sets. And we have sample applications deployed in the cloud that integrate smoothly into our architecture. So in terms of the stack that we deployed in the cloud, it's not only about porting.
We see that it's not only about porting an application onto a server which just runs in the cloud instead of running in your own IT infrastructure. You have to think about it more globally, by putting, let's say, cloud features into your application, into the different layers of your IT application: in the operating system (this is basically provided by a cloud infrastructure provider, a cloud service provider like Amazon, of course), but you also have to put that into your spatial data storage, into the map server (we are using MapServer in the project), and also into the web server. So we really have to build a scalable geospatial framework. This stack is composed of very well-known tools, of course, so I'll let you discover that.
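As a rough sketch of what elasticity at the map-server layer can mean on Amazon, the snippet below grows a hypothetical Auto Scaling group holding the MapServer instances. The group name, region and scaling step are assumptions for illustration; the project's actual scaling mechanism is not described here.

    # Hedged sketch: scale out a hypothetical Auto Scaling group that holds
    # the MapServer/web-server instances. Group name, region and capacity
    # values are placeholders, not the project's real configuration.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="eu-west-1")
    GROUP = "ingeoclouds-mapserver"  # hypothetical Auto Scaling group name

    # Inspect the current size of the group.
    groups = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[GROUP])
    current = groups["AutoScalingGroups"][0]["DesiredCapacity"]
    print(f"current desired capacity: {current}")

    # Grow the map-server tier, e.g. before an expected load peak.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=GROUP,
        DesiredCapacity=current + 2,
        HonorCooldown=True,
    )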
What we also have is a portal based on SITools2, which is developed by the French space agency, and in which my company is also very active. As I said, we are dealing with linked open data through the use of Virtuoso. We provide a full AAA mechanism, authorization, authentication and accounting, which is very important (it is not yet fully incorporated). Of course, you have to be able to get a clear view on the usage of your services and of your resources. We also have a generalized REST architecture and a secure API. So this is basically the schema, but I won't go into the details. We try to be as agnostic as possible with respect to the cloud computing platform, even if it is very difficult indeed; and we must admit that Amazon is ten years ahead of the competition in providing some very interesting services. But nevertheless, on top of that we built full data management features, data publication and management, and a portal. OK, we're going to have some focus on that. What is important are the keywords for the usage of our platform by data providers: scalability, specificity, on demand.
And to have measures of how their services and their data are used. If we go to the map application: as I said, I'm not a GIS guy, so I won't spend a lot of time on that. But basically, we build a kind of elastic geospatial server based on our elastic file system and database. Through that we try to answer this dilemma by, let's say, putting some new rules into the game for the usage of the system. The INSPIRE implementation is driven by the users' requirements, and not by following, step by step, the recommendations and technical guides of the Commission. We want to combine map publication and the INSPIRE legal requirements. So when you build maps, you always have the possibility to also publish WMS and WFS services, and to put metadata into the catalogue for that. OK, it seems interesting, of course.
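To make this concrete, a published view service can be consumed like any standard WMS. The following Python sketch requests a map image over HTTP; the endpoint URL and layer name are placeholders, not the project's actual services.

    # Minimal WMS 1.3.0 GetMap request against a hypothetical endpoint.
    import requests

    WMS_URL = "https://example.org/ingeoclouds/wms"   # placeholder endpoint

    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": "groundwater_levels",               # hypothetical layer
        "CRS": "EPSG:4326",
        "BBOX": "41.0,19.0,42.0,20.0",                # lat/lon axis order in WMS 1.3.0
        "WIDTH": "512",
        "HEIGHT": "512",
        "FORMAT": "image/png",
    }

    response = requests.get(WMS_URL, params=params, timeout=30)
    response.raise_for_status()
    with open("map.png", "wb") as f:
        f.write(response.content)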
And we also want to simplify the IT challenges. As I said, we have partners in the project that were still working with Excel sheets, and of course, for them IT is a foreign word. So for web mapping, the project changes things a bit. Most of our partners had a homemade IT infrastructure for dissemination and for publishing INSPIRE services; now we go to the cloud. Concerning the specific web mapping applications that were used by every partner, we now give a common framework for producing maps and the metadata that goes together with them. We don't have dedicated INSPIRE services, because they are now built into the system and very easily generated by it. So the different steps in publishing maps are covered, with the possibility of creating layers and tagging the data sets, defining services (view and download services, which are the main services required by the INSPIRE recommendations), and also publishing a catalogue for accessing and retrieving the services. So this is the kind of map that is produced through that.
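The download side works the same way: a WFS GetFeature request returns the underlying features. Again, the endpoint, feature type and output format below are illustrative assumptions.

    # Sketch of a WFS 2.0.0 GetFeature (download service) request.
    import requests

    WFS_URL = "https://example.org/ingeoclouds/wfs"   # placeholder endpoint

    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": "landslide_inventory",           # hypothetical feature type
        "COUNT": "100",
        "OUTPUTFORMAT": "application/json",           # only if the server offers GeoJSON
    }

    r = requests.get(WFS_URL, params=params, timeout=60)
    r.raise_for_status()
    print(f"Fetched {len(r.json()['features'])} features")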
As I said, as an IT guy I will speak about some implementation issues we faced and the open source tools we used to solve them. One main basis is the file system, of course, which is said to be elastic. We based our solution on GlusterFS, which offers very interesting performance and APIs; we are quite happy with that. In terms of throughput and response to different file system constraints, it answers quite well, depending on the number of servers we deploy. We also built a framework based on PostgreSQL for building an elastic database server.
So, relying on the elastic file system, streaming replication, as available in the latest versions of PostgreSQL, is used, and the archive logs are stored in the elastic FS. We have a master for dealing with every update operation, and several slaves that can be elastically deployed or reused depending on the load on our databases. On top of that, we have pgpool for load balancing, and this pgpool is also set up as a reliable installation, with a watchdog system that is able to check whether a pgpool server is still alive and, if necessary, switch to another one.
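From the application side, talking to this elastic database simply means connecting to the pgpool front end; pgpool then routes writes to the master and can balance reads across the slaves. A minimal sketch, with placeholder host, credentials and table names:

    # Connect through pgpool (default port 9999); routing is transparent.
    import psycopg2

    conn = psycopg2.connect(
        host="pgpool.example.org",   # hypothetical pgpool front end
        port=9999,
        dbname="geodata",
        user="app_user",
        password="secret",
    )

    with conn, conn.cursor() as cur:
        # Read-only query: pgpool may load-balance this to a slave.
        cur.execute("SELECT count(*) FROM boreholes;")
        print(cur.fetchone()[0])

        # Write: pgpool sends this to the streaming-replication master.
        cur.execute("INSERT INTO boreholes (name) VALUES (%s);", ("test site",))

    conn.close()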
OK, for manipulating that and administering the system, we have developed a complete, comprehensive API, with some examples here about database creation. We have administration services for creating the new data providers that will be using the system, where we can define and administer the users and the different databases that they have, and also export and import services and monitoring, which is very important, as said: if you advertise a cloud for your users, you also have to be able to make them pay if they are using your cloud resources, so they pay for what they are using.
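Purely as an illustration of the style of such an administration API, a call could look like the sketch below; the base URL, paths, payload fields and authentication header are all assumptions, not the project's documented interface.

    # Hypothetical administration calls: create a database for a provider
    # and read back usage figures for accounting.
    import requests

    API = "https://example.org/ingeoclouds/api"       # placeholder base URL
    HEADERS = {"Authorization": "Bearer <api-key>"}   # placeholder auth scheme

    resp = requests.post(
        f"{API}/providers/brgm/databases",            # hypothetical endpoint
        json={"name": "groundwater", "template": "postgis"},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()

    usage = requests.get(f"{API}/providers/brgm/usage", headers=HEADERS, timeout=30)
    print(usage.json())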
Quite an important part of the system is also the approach we have to reusing data at scale, through the use of linked open data. It is based on the web of data, which aims to solve the
problem of data silos that everybody knows. It's really about information integration and being able to use data from different domains and being able to interconnect this data and to interpret this data
differently by taking different perspectives on the data. So this is a very important aspect in our project. We speak about scientific data in our project. We want to be able to integrate and describe this data in a uniform way.
So we produced a geoscientific observation model, G-Zone, which has an RDFS implementation in the triple store, based on Virtuoso, and we provide a linked data access and manipulation API for this data.
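In practice, that access can go through the standard SPARQL protocol, so any HTTP client can query the triple store. The endpoint URL, class and property names in this sketch are placeholders; only the protocol itself (a GET with a query parameter) is standard.

    # Query a hypothetical Virtuoso SPARQL endpoint over HTTP.
    import requests

    ENDPOINT = "https://example.org/sparql"           # placeholder endpoint

    query = """
    SELECT ?station ?value WHERE {
      ?obs a <http://example.org/model/Observation> ;        # hypothetical class
           <http://example.org/model/station>    ?station ;  # hypothetical property
           <http://example.org/model/waterLevel> ?value .    # hypothetical property
    } LIMIT 10
    """

    r = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    r.raise_for_status()
    for row in r.json()["results"]["bindings"]:
        print(row["station"]["value"], row["value"]["value"])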
Virtuoso has been chosen because its open source version has a very good set of features. Unfortunately, there is not yet GeoSPARQL support, but apparently they are working on that. And you can easily build clusters of Virtuoso, and also use pre-built instances in the cloud, in the Amazon cloud. Basically, you take your different input files, and you have mapping mechanisms from your relational database to the RDF triple store using the R2RML standard: mapping files where you simply describe your mapping. You do the same for any kind of file. So this is just to show how the model is structured: each concept in the semantic model is also
documented and well described for other institutions that would like to use the same model to integrate their data. And of course, this model is also mapped to the data models that INSPIRE requires in the different fields, for example in groundwater management. So what we have now is a pilot one running on the web; pilot two is about to open at the end of this month. We nevertheless still need to work on the INSPIRE compliance, which is not fully achieved. The objectives of the project are not yet fully achieved, but we expect to get there by July next year. One of our main topics is to be able to assess, at a more fine-grained level, the performance of our system and to manage the overall cost of the system. We also have a big activity in assessing how the data providers perceive the use of the system with their data sets. And we are building with them a kind of business plan, where we assess the different possibilities for providing free services and also paid services, depending on the volumes of data, the volumes of usage, et cetera. To illustrate the kind of challenges we have in terms of optimization, these are the requirements from INSPIRE concerning WMS GetMap services. With the small Amazon instances, we do not meet the requirements, which go up to supporting 20 requests per second: we reached only six. If we go to a large Amazon instance, we reach 50. So this is the kind of tuning we still have to do.
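The kind of measurement behind those figures can be reproduced with a simple concurrent client: fire GetMap requests in parallel and count how many complete per second against the 20 requests-per-second target. The endpoint and layer below are placeholders.

    # Rough throughput check: concurrent WMS GetMap requests.
    import time
    from concurrent.futures import ThreadPoolExecutor
    import requests

    WMS_URL = "https://example.org/ingeoclouds/wms"   # placeholder endpoint
    PARAMS = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": "groundwater_levels", "CRS": "EPSG:4326",
        "BBOX": "41.0,19.0,42.0,20.0", "WIDTH": "256", "HEIGHT": "256",
        "FORMAT": "image/png",
    }

    def one_request(_):
        return requests.get(WMS_URL, params=PARAMS, timeout=30).status_code

    n = 200
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=20) as pool:
        codes = list(pool.map(one_request, range(n)))
    elapsed = time.perf_counter() - start
    print(f"{codes.count(200)}/{n} OK, {n / elapsed:.1f} requests/second")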
We still have a lot of work going on on the accounting, because we want to preserve the data providers' ownership of their data sets and of the access to the services, and to be able to give them the possibility to control that. So we are also offering some open trials for data providers, inviting them to use the platform and possibly to push data into the cloud, to develop linked data features, of course, which are very promising, and also to integrate some WPS services. So pilot two is being cooked and will come out at the end of the month. We have planned a workshop in November in Brussels where we will have some hands-on with the system and some training; if you come by, you will be welcome. So don't hesitate to contact us if you are interested in knowing more about that. Thank you. Questions?
Sorry, it was crystal clear. It's more of a historical fact: one of our partners is the BRGM, the French geological survey, which is using MapServer extensively. They were very much interested in deploying their web mapping application, which is called Carmen and which is known, I think, only in France, but which is a quite powerful self-service application based on MapServer. And it appeared that most of our partners were also using MapServer; that's why we started building our system on that. But you're welcome. Thanks very much. I think it's the right time for half an hour's break and to get some service again.