Open Geospatial Data And Services Publication On The Cloud: The INGEOCLOUDS Open Source Approach
Formal Metadata
Number of Parts | 95
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/15565 (DOI)
Production Place | Nottingham
FOSS4G Nottingham 2013, 65 / 95
Transcript: English (auto-generated)
00:00
By the European Commission, it's not an R&D project. It's a pilot project, which is called INGEOCLOUDS. And as Chris said, in INGEOCLOUDS one of the main issues has to be the use of open source software, because these tools are now mature.
00:24
And these are the most used tools in many organizations. And this is why we are speaking about this open source integration approach in this project. So I will structure my presentation on
00:43
these several aspects. I will recall the context and the challenges we face in this project. I will say a few words about cloud computing, even if I suppose everybody knows about it now. But maybe I will recall some basic principles and show how
01:02
we are dealing with this technology in this project. I will show you some achievements that I hope will interest you, giving you some flavor of what we are doing and the services we are going to provide. And I will speak about our plans and how you could be
01:23
also involved in further use and further development of the platform. So about the context, what is it about? The basis of the project is that many organizations and
01:42
public institutions in particular are faced with the need to create and share spatial environmental data. And this is what we are dealing with in this project. We are involving several partners from the geological world as a first step.
02:02
And they are facing the problem of publishing their data in several forms with maps, of course, but also providing download services, providing search services, et cetera. So this is what it's about. And the commission gives some funding for this project
02:24
because they are interested in assessing the cloud computing technology in doing this. And we have seen with Chris that basically it's a very interesting technology. And we are going to see how we put that into practice in the next slides.
02:42
So what was the statement? The basic statement is, I think, very well known to everybody. We have a big rise in data quantity, but also in quality. Now the data is produced by good institutions that are
03:01
professional in producing this data. The problem that they have is giving access to this data and also sharing this data with peers in their work. So there is a gap here in the pace of
03:21
growth of these several aspects. So the Commission, you may be aware of that, started in 2007 a new set of recommendations grouped under the Inspire label in order to improve
03:44
interoperability of data and also of services, trying to push and force public institutions in Europe, at least, to adopt common rules, technical rules, modeling rules.
04:01
It's more recommendation than, I would say, basic standards. But anyway, the goal is to have these public institutions adopt these rules with a certain calendar going up to 2017. So there are legal obligations for these public
04:23
institutions in every country to follow these rules. And this is also what it's about. So what are the problems when you go to several public organizations and institutions? Of course, they all are dealing with web mapping
04:43
and producing maps with several approaches. Some are very advanced and use online tools. Some are just doing that with IT infrastructures. Some are not at all mature for doing that.
05:03
So each data provider basically uses a particular application, its own application. It means that when you have to serve a bigger range of customers, or at least of users of your data, you need to build a very efficient and reliable IT infrastructure
05:26
and this costs money, of course. Concerning Inspire, the issue is that the legal obligations are coming step by step until 2017, as said.
05:41
And the problem and the situation they have is that they usually dedicate some effort to producing Inspire-compliant solutions, while trying not to touch too much what they already have running. They don't want to take risks by completely redefining their existing systems.
06:03
So they are confronted with a dilemma here. And in this project, we try to produce solutions for removing this dilemma. So a quick ID card just to know what we aim for with this
06:25
project. So we build a cloud infrastructure for public agencies, especially in the environmental field. We try and publish geodata through advanced services.
06:40
So we hope they are assessed as advanced. One of the big points in the project is also to assess the move of existing data sets and services from several institutions to the new infrastructure. So we have a big panel of institutions.
07:03
Some of them are very mature in terms of IT and web mapping. Some are still just working with Excel files. So we have a good spectrum of users here for the infrastructure. We want them to fulfill their Inspire obligations, and of
07:25
course, perhaps to help in the interoperability of the data. So basically, the consortium involves geological surveys. As I said, this is the main topic we are dealing with at the moment, with geoscience and groundwater management as our main use cases.
07:43
And the data sets are published under these themes. And we have three ICT organizations with several skills. Where we are: we started in 2012, and we are finishing next year in July. We are about to open Pilot 2,
08:03
which will be usable by all of you. Just a few words about cloud computing. As Chris said, I think everybody has heard about that and read a lot of things about that. I will just recall the basic principles that we use to
08:21
sell, to discuss. We use that to discuss with our partners, just to convince them to join the project at the beginning, because it was not obvious to them that cloud computing can be of interest. So basically, these are the well-known figures of the
08:43
promises of cloud computing in terms of capacity adaptation. Of course, you don't have to buy your servers in successive steps. You rely on the flexibility and elasticity of the cloud for
09:01
achieving your needs. So basically, cloud computing is based on the merging, the convergence of several themes: virtualization, pilot computing, and service-oriented architecture. So you build this kind of virtual stack. We will see how it adapts, how it is in our case.
09:26
Of course, the promises of cloud computing are cost reductions. This is the main selling point when coming to public organization as well. So we have to prove in this project also that the
09:43
total cost of ownership is going to be reduced just because they will be able to pay only for what they are using. It's not only about money, of course. What we promise to public institutions is, of course, a
10:04
large computing power, theoretically without limits, except for the money you have, of course. You will get up-to-date operating systems and technologies. This is also a very interesting item. And of course, ubiquitous access and supposed
10:22
quality of service. So basically, what does it bring in our project? We see that when we put online through Amazon Web Services, of course, several servers, the system load in
10:41
terms of requests tends to flatten a bit, depending on the number of servers you deploy. Of course, these are the basics of cloud computing. We will see how this has been
11:01
instantiated in our project. The main achievements in our projects are not only about cloud computing, even if I think it's a fundamental point for our users to have services for data management, elastic services, what we call elastic
11:22
services, database server, file server. Data publication modules are very important to comply with Inspire legal obligations. And we produced some modules that allow doing that
11:40
following OGC standards, of course. We also introduce an interesting point in terms of fostering interoperability and cross-usage of the data between the different institutions, which is based on the linked open data approach. We provide an API that allows data providers to manage
12:06
their own private environment, to manage their own users, and to manage authorization and authentication on their data and data sets. And we have sample applications deployed in the
12:24
cloud that integrate smoothly in our architecture. So in terms of the stack that we deployed in the cloud, it's not only about porting.
12:41
We see that it's not only porting an application on a server which is just running in the cloud instead of running in your own IT infrastructure. You have to think about that more globally by putting, let's say, cloud features in your own application, in the
13:02
different layers of your IT application in the operating system. So this is basically provided by a cloud infrastructure provider, a cloud service provider like Amazon, of course, but you have to put that in your spatial data storage, in the map server, we are using MapServer in
13:21
the project, and also on the web server. So we really have to build a scalable geospatial framework. This stack is composed of very well-known tools, of course, so I'll let you discover that.
13:42
What we have also is a portal which is based on SITools2, which is developed by the French space agency, and in which my company is also very active. As I said, we are dealing with linked open data
14:01
stuff through the use of Virtuoso. We provide a full AAA mechanism, it's not yet fully incorporated: authorization, authentication, and accounting, which is very important. Of course, you have to be able to build and to get a
14:27
clear view on the usage of your service, of your resources, a generalized REST architecture, and a secure API. So this is basically the schema, but I won't go into the detail. We try to be as much as possible agnostic from the
14:43
cloud computing platform, even if it's very difficult indeed. And we must admit that Amazon is 10 years ahead of the competition in providing some very interesting services. But nevertheless, we built, on top of that, we have
15:02
full data management features, data publication, management, portal. OK, we're going to have some focus about that. What is important is the keywords for usage of our platform by data providers, scalability, specificity, on demand.
15:22
And to have measures about how their services or their data are used. If we go to the map application, as I said, I'm not a GIS guy, so I won't spend a lot of time on that. But basically, we build a kind of elastic geospatial server
15:44
based on our elastic file system and database. So we tried through that to answer this dilemma by, let's say, putting some new rules in the game for the
16:02
usage of the system. The Inspire implementation is driven by the user's requirements and not by following, step by step, the recommendation, the technical guides of the commission. We want to combine map publication and Inspire legal
16:21
requirements. So when you build maps, you always have access to publishing also WMS, WFS services, and also putting metadata in the catalog for that. OK, it seems interesting, of course.
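Editor's illustration: to make the "view service" side of this concrete, here is a minimal Python sketch of how a client could build an OGC WMS 1.3.0 GetMap request against a published layer. The endpoint URL and the layer name are hypothetical placeholders, not the project's actual services; only the request parameters come from the WMS standard.

```python
from urllib.parse import urlencode


def build_getmap_url(endpoint, layer, bbox, width=800, height=600, crs="EPSG:4326"):
    """Build a WMS 1.3.0 GetMap URL for a single layer.

    `endpoint` and `layer` are placeholders chosen by the caller.
    In WMS 1.3.0 the BBOX axis order follows the CRS definition,
    so for EPSG:4326 it is (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks for the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, for illustration only.
    print(build_getmap_url("http://example.org/wms", "geology", (41.0, -5.0, 51.5, 10.0)))
```

The same pattern applies to a WFS download service, with a different set of standard parameters.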
16:42
And we want also to simplify the IT challenges. As I said, we have partners in the project that were still working with Excel sheets. And of course, for them, IT is a foreign word. So the project for the web mapping, we change a bit that.
17:04
For most of our partners, we had a homemade IT infrastructure for dissemination and also for publishing Inspire services. Now we go to the cloud. Concerning the specific web mapping applications that
17:21
were used by every partner, now we give a common framework for producing maps and the metadata that goes together with that. We don't have dedicated Inspire services because they are now built into the system and very easily
17:44
generated by the system. So the different steps in publishing the maps are covered with possibilities of creating layers and tagging the data sets, defining services, view and download
18:03
services, which are the main services required by the Inspire recommendations, and also publishing a catalog for accessing the service, retrieving the services. So this is the kind of maps that are produced through that.
18:22
As I said, as an IT guy, I will speak about some implementation issues we faced and the use of open source tools for solving that. One main base is the file system, of course, which is said to be elastic.
18:41
And we based our solution on GlusterFS, which offers very interesting performance and APIs. We are quite happy with that. In terms of throughput and response to different file system constraints, it answers quite well, depending on the
19:05
number of servers that we deploy. We also built up a framework based on PostGIS for building an elastic database server.
19:21
So relying on the elastic file system, which is where the streaming replication is used from the latest versions of Postgres, archive logs are stored in the elastic FS. We have a master for dealing with every update operation,
19:41
and several slaves that can be elastically deployed or reused, depending on the load on our databases. On top of that, we have pgpool for load balancing. And this pgpool is also set up as a reliable installation
20:07
with a watchdog system that is able to assess whether a pgpool server is still alive and, if necessary, switch to another one.
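Editor's illustration: the routing idea described here (updates go to the single master, reads are spread over the replicas, and a watchdog falls back to another balancer when one stops answering) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the project's code and not how the load balancer is implemented internally; the node names and the `alive` / `is_alive` health inputs are hypothetical.

```python
from itertools import cycle

# Statements that modify data and therefore must go to the primary (master).
WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")


class ReadWriteRouter:
    """Send writes to the primary and round-robin reads over healthy replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = cycle(self.replicas) if self.replicas else None

    def route(self, sql, alive):
        """Return the node that should run `sql`; `alive` is the set of healthy replicas."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in WRITE_VERBS or self._next is None:
            return self.primary
        # Try each replica at most once, skipping the ones reported as down.
        for _ in range(len(self.replicas)):
            node = next(self._next)
            if node in alive:
                return node
        # No healthy replica left: fall back to the primary.
        return self.primary


def pick_balancer(balancers, is_alive):
    """Watchdog-style check: return the first balancer that answers, else None."""
    for balancer in balancers:
        if is_alive(balancer):
            return balancer
    return None


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica1", "replica2"])
    print(router.route("UPDATE wells SET depth = 10", {"replica1", "replica2"}))
    print(router.route("SELECT * FROM wells", {"replica1", "replica2"}))
```

Classifying statements by their first keyword is a deliberate simplification; a real balancer also has to handle transactions and functions with side effects.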
20:20
OK, for manipulating that and administrating the system, we have developed a complete, comprehensive API, with some examples here about database creation. We have administration services for creating new data
20:41
providers that would be using the system, where we can define and administrate the users and the different databases that they have. And also export and import services, and monitoring, which is very important, as said, because if you advertise a
21:01
cloud for your users, you have also to be able to make them pay if they are using your resources cloud-wise. So they pay what they are using. So quite an important part in the system is also the
21:22
approach we have in thinking about reusing the data at a scale through the use of the linked open data approaches. So it's based on the web of data that wants to solve the
21:42
problem of data silos that everybody knows. It's really about information integration and being able to use data from different domains and being able to interconnect this data and to interpret this data
22:01
differently by taking different perspectives on the data. So this is a very important aspect in our project. We speak about scientific data in our project. We want to be able to integrate and describe this data in a uniform way.
22:24
So we produced a geoscientific observation model, G-Zone, which has an RDFS implementation in the triple store, which is based on Virtuoso, and we provide linked data access and manipulation API on this data.
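Editor's illustration: describing relational data "in a uniform way" in a triple store amounts to turning each table row into subject, predicate, object statements. The Python sketch below shows that idea in its simplest form, emitting N-Triples lines with plain string literals. The namespace, table, and column names are hypothetical and do not reflect the project's actual model or mapping rules.

```python
BASE = "http://example.org/geo/"  # hypothetical namespace, for illustration only


def row_to_ntriples(table, key, row):
    """Turn one relational row (a dict) into a list of N-Triples lines.

    The subject IRI is derived from the table name and the key column;
    every other non-null column becomes one triple with a plain literal.
    """
    subject = f"<{BASE}{table}/{row[key]}>"
    lines = []
    for column, value in row.items():
        if column == key or value is None:
            continue
        predicate = f"<{BASE}{table}#{column}>"
        # Escape backslashes and double quotes inside the literal.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{text}" .')
    return lines


if __name__ == "__main__":
    borehole = {"id": 7, "name": "Well A", "depth_m": 42.5, "comment": None}
    for line in row_to_ntriples("borehole", "id", borehole):
        print(line)
```

A declarative mapping language does the same job from a description of the tables rather than from hand-written code, which is what makes the mapping reusable across institutions.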
22:42
Virtuoso has been chosen because of its open source version that has a very good set of features. Unfortunately, there is not yet GeoSPARQL support, but apparently they are working on that. And you can easily build clusters of Virtuoso and also
23:05
use pre-built instances in the cloud, in the Amazon cloud. Basically, you take your different input files, and you have mapping mechanisms from your relational database to the RDF triple store using the R2RML standard, where it's
23:26
XML files where you just describe your mapping. You do the same for any kind of file. So this is just to show how the model is structured. So each concept on the semantic model is also
23:41
documented and well described for other institutions that would like to use the same model to integrate the data. And of course, this model is also mapped to the data models that Inspire requires in the different fields, for
24:00
example, in groundwater management. So what we have now is a pilot one running on the web. Pilot two is about to open at the end of this month. We need, nevertheless, to work yet on the Inspire compliance, which is not fully achieved. The objectives of the project are not yet fully achieved,
24:22
but we expect to do that by July next year. One of our main topics is to be able to assess more finely, at a more fine-grained level, the
24:41
performance of our system and manage the cost of the system, the overall cost of the system. We also have a big activity in assessing how the data providers perceive the use of the system with their data sets. And we are building with them a kind of business plan where
25:07
we assess the different possibilities for providing free services and also fee-based services, depending on the volumes of data, volumes of usage, et
25:21
cetera. To illustrate the kind of challenges we have in terms of optimization, these are the requirements from Inspire concerning the WMS services for GetMap. With the small Amazon instances, we are not
25:40
answering the requirements, which require support for up to 20 requests per second. We reached only six. If we go to a large Amazon instance, we reach 50. So this is the kind of tuning we still have to do.
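Editor's illustration: the figures quoted here (20 requests per second required, 6 measured on a small instance, 50 on a large one) lend themselves to a back-of-the-envelope sizing. The Python sketch below assumes throughput adds up linearly across identical instances behind a load balancer, which is an optimistic assumption and not something measured in the talk.

```python
import math


def instances_needed(target_rps, per_instance_rps):
    """Smallest number of identical instances whose combined throughput
    reaches `target_rps`, assuming perfectly linear scaling."""
    if per_instance_rps <= 0:
        raise ValueError("per-instance throughput must be positive")
    return math.ceil(target_rps / per_instance_rps)


if __name__ == "__main__":
    # Figures quoted in the talk: 20 req/s required, 6 on small, 50 on large.
    print("small instances needed:", instances_needed(20, 6))
    print("large instances needed:", instances_needed(20, 50))
```

Under that assumption, four small instances or a single large one would meet the threshold; which option is cheaper depends on instance pricing, which the talk does not give.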
26:01
We still have very big work going on on the accounting, because we want to preserve the data providers' ownership of the data sets and of the access to the service, and to be able to give them the possibility to control that.
26:21
So we are also giving the possibility to do some open trials for data providers by inviting them to use the platform and eventually to push data in the cloud to develop linked data features, of course, which are very
26:40
promising. And also to integrate, we are integrating some WPS services. So Pilot 2 is being cooked and will come out at the end of the month. We have planned a workshop in November in Brussels. If you come by, you will be welcome, where we will have
27:01
some hands-on on the system and some training. So don't hesitate to contact us if you are interested in knowing more about that. Thank you. Questions?
27:22
Sorry, it was crystal clear. It's a more historical fact, because one of our partners is the BRGM, the French Geological Survey, which is
27:41
using maps extensively. And they were very much interested in deploying the web mapping application, which is called Carmen, which is known, I think, only in France. But it's a quite powerful self-service application, which is based on MapServer.
28:00
And it appeared also that most of our partners were also using MapServer. That's why we started building our system on that. But you're welcome. Thanks very much. I think it's the right time for half an hour and get
28:22
some service again.