Scalability of GeoNetwork: Current Status and Future Directions
Formal Metadata
Title |
Scalability of GeoNetwork: Current Status and Future Directions
|
Title of Series | |
Part Number |
73
|
Number of Parts |
193
|
Author |
|
License |
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. |
Identifiers |
|
Publisher |
|
Release Date |
2016
|
Language |
English
|
Production Place |
Bonn
|
Content Metadata
Subject Area | |
Abstract |
In recent times, phenomena such as the Internet of Things or the popularity of social networks, among others, have been responsible for an increased availability of sensor data and user-generated content. To be able to ingest, store and analyze these massive volumes of information is a standing challenge that is no longer ignored. The data about this data is, generally speaking, less of a problem, if we think for instance that trillions of sensor records may share the same metadata record; for this reason catalogs have been less exposed to the challenges that took the database community by storm. Nevertheless, a large variety of datasets can also pose some performance challenges to traditional catalogs, and demand increased scalability. In this talk we will look at strategies for scaling GeoNetwork through load balancing, at its current limitations, and we will discuss potential improvements by adopting distributed search server technologies such as SOLR or ElasticSearch. On the database side, we will review the current database support, which is limited to ORM, and discuss the possibility of extending it to support NoSQL databases, which could be horizontally scaled, unleashing a new generation of metadata storage.
|
Keywords | GeoCat titellus |
00:08
Let's give them a round of applause. Our second
00:11
speaker will now talk about the scalability issues related to GeoNetwork. It's Joana Simoes; I hope I pronounce it right. Good afternoon. First of all, thank you very much for being here, and thanks for having voted for us; it's really nice to be here with this group. Extending a little bit what María said, I'm going to focus on one specific aspect of GeoNetwork. First of all I'd like a small poll: how many people here use, or are interested in using, GeoNetwork? Can you please raise your hands? OK. And how many people work with big data, so with datasets, let's say, larger than 32 terabytes? OK, we have a couple of people. So in this
01:41
talk, as I said, we are going to speak about the scalability of GeoNetwork. We'll basically go through some of the limitations and, more than anything, discuss some proposals and some scenarios for GeoNetwork. First, to set the context a little: big data may not be a reality for everyone, but there are in fact some use cases, because the number and variety of data sources has increased a lot with sensor data and also user-generated content, and a great deal of this data actually has some sort of location attached. So we're looking at large spatial datasets, and this is going to be more and more common in the near future. And
02:36
so what happens when we have really, really large datasets? We can always increase the CPU, the RAM, the number of CPUs, the hard drive, but at a certain point we can no longer increase them, and that's the limit of vertical scalability. That is the point when we need to start thinking about horizontal scalability, and mostly this is what we're going to focus on.
03:07
So, as you probably all know what GeoNetwork is: it's a catalog for geospatial information. Most of you were already here for the previous talk, so I don't have to explain a lot about this. Just a few important details: GeoNetwork accesses data which can be distributed remotely, but it stores some metadata locally in a database. So, some data, because metadata is really data; it actually stores data locally. And, as María also mentioned, it uses a search index, which is based on Lucene. There
04:02
is some representation of this... OK.
04:28
There seem to be a couple of
04:33
missing slides here, but it's
04:36
just one or two slides, so maybe I'll just speak. Yeah, I'll just continue. So, thank you.
04:48
What I wanted to say
04:51
with these slides is that there is this index, which is based on Lucene, and this index is stored locally on each instance, so that is one of the limitations. The second thing is the database that I described: it is in fact a relational database, because the library that is used actually only supports relational databases, such as Oracle or PostgreSQL. So there
05:31
were some proposals for scaling GeoNetwork. I mentioned big data as an argument for scaling GeoNetwork, and maybe this is not relevant for most people, but there are other arguments that are probably more relevant, such as high availability: if we have a cluster of nodes, then when one node dies another one can take its place, so there is always a failover scenario, which is quite useful. There is also the scenario of load balancing, so we can actually distribute the workload. These arguments are probably relevant for most people, and there were already proposals that looked at scaling GeoNetwork, based on this
06:28
argument. And this is
06:32
quite challenging still. The
06:42
original presentation... no,
06:47
the original. So, well, to go through it again... well, maybe I'll just read
07:00
from it. OK.
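The high-availability and load-balancing arguments made a moment earlier can be illustrated with a minimal Python sketch. It assumes a simple round-robin policy; the `Node` and `LoadBalancer` classes are hypothetical and are not part of GeoNetwork or of any proposal mentioned in the talk.

```python
from itertools import cycle


class Node:
    """Hypothetical catalog node; `alive` simulates whether it responds."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class LoadBalancer:
    """Round-robin dispatcher that skips nodes that fail to respond."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._ring = cycle(self.nodes)

    def dispatch(self, request):
        # Try each node at most once per request, continuing around the ring.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            try:
                return node.handle(request)
            except ConnectionError:
                continue  # failover: move on to the next node
        raise RuntimeError("no node available")


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
balancer = LoadBalancer(nodes)

# Load balancing: three requests are spread over three nodes.
first = [balancer.dispatch(f"search-{i}") for i in range(3)]

# Failover: node-b dies, and its share of requests goes to the others.
nodes[1].alive = False
after_failure = [balancer.dispatch(f"search-{i}") for i in range(3, 6)]
```

In a real deployment this role would normally be played by a dedicated proxy or load balancer in front of the catalog instances rather than by application code.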
07:21
So, in the end, the scenarios that were proposed were about clustering, and the limitations are basically the ones that I mentioned. There are also the file uploads, which are stored locally in the file system, and there are a few other things. I'm going to explain these proposals, how they look at these restrictions, and what they suggest for overcoming them. The first model comes from a proposal, which is the one of sharing features between different instances. You can see in this image that we have the different nodes, and each node has its own Lucene index, but this index is actually synchronized between instances using
08:19
a message broker, using the Java Message Service. So when there is a change in one index, the other instances can know about it. And the
08:35
other element, the data directory, is also shared; we can do this by using a shared file system, a network file system. And the database is also shared.
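As a rough illustration of the first model described above, the following Python sketch keeps one local index per node and propagates changes through a broker. The `Broker` class is an in-memory stand-in for a real message broker, and `CatalogNode` is a hypothetical name; neither reflects GeoNetwork's actual classes or the Java Message Service API.

```python
class Broker:
    """In-memory stand-in for a message broker topic (publish/subscribe)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, sender, message):
        # Deliver the message to every subscriber, including the sender.
        for callback in self.subscribers:
            callback(sender, message)


class CatalogNode:
    """Hypothetical catalog instance with its own local search index."""

    def __init__(self, name, broker):
        self.name = name
        self.index = {}  # local index: record id -> title
        self.broker = broker
        broker.subscribe(self.on_message)

    def update_record(self, record_id, title):
        # Apply the change locally, then announce it to the other nodes.
        self.index[record_id] = title
        self.broker.publish(self.name, (record_id, title))

    def on_message(self, sender, message):
        if sender == self.name:
            return  # ignore our own announcements
        record_id, title = message
        self.index[record_id] = title


broker = Broker()
a = CatalogNode("node-a", broker)
b = CatalogNode("node-b", broker)
c = CatalogNode("node-c", broker)

a.update_record("uuid-1", "Land cover 2016")
b.update_record("uuid-2", "Rivers")
```

After these two updates every node holds the same two index entries, even though each change was made on a single node.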
08:48
So this idea needs some synchronization between the nodes, and there are a few other aspects that have to be taken care of as well: the site UUID needs to not be tied to any instance, that is, it should not be assigned to one specific node, and the same goes for the sessions. So
09:14
the second scenario is one in which we no longer have one index per instance; in this case we have a search server, and we can use Solr, which is based on Lucene. Solr has a lot of nice features, but the one we are interested in for the scope of this presentation is index sharding: we can actually split the index and distribute it across different instances, which can be hosted on different physical machines. The other elements, as you can see, are still shared, namely the data directory and the database, and we still need the message broker to synchronize some information between nodes.
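The idea of index sharding mentioned above can be sketched in a few lines of Python. This is a simplified illustration that assumes hash-based routing on the record identifier; it is not Solr's actual routing or query logic, and the `ShardedIndex` class is hypothetical.

```python
import hashlib


class ShardedIndex:
    """Toy index split into several shards, one dictionary per shard."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, record_id):
        # A stable hash keeps a given record on the same shard across runs.
        digest = hashlib.md5(record_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def add(self, record_id, text):
        self.shards[self._shard_for(record_id)][record_id] = text

    def search(self, term):
        # Query every shard and merge the partial results.
        hits = []
        for shard in self.shards:
            hits.extend(
                rid for rid, text in shard.items() if term.lower() in text.lower()
            )
        return sorted(hits)


index = ShardedIndex(shard_count=3)
index.add("uuid-1", "Surface water bodies")
index.add("uuid-2", "Land cover 2016")
index.add("uuid-3", "Groundwater wells")
index.add("uuid-4", "Road network")

results = index.search("water")
```

Each shard could live on a different physical machine; a query is then sent to all shards and the partial results are merged, which is the part a search server takes care of.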
10:13
This is just for you to see: this is an example of a Solr architecture with distributed nodes, implemented using Docker. Interestingly enough, the replacement of Lucene by the Solr server is something that is being implemented in one branch of GeoNetwork [unclear], and there are plans to merge these developments into GeoNetwork [version unclear], so this could actually become a reality. So this is the second
11:07
scenario, and the third scenario is actually distributing everything. As you can see, the file system is no longer a shared file system but a distributed file
11:19
system, such as, for instance, the Hadoop file system. In this case we have multiple nodes: in the case of a Hadoop cluster we have the NameNode, which can also have a backup in recent versions, and the DataNodes. We can then store very large files, because they can actually span across machines, and this should be transparent for GeoNetwork. Then the
11:55
message server, the message
11:56
broker itself, could be clustered: if we take for instance RabbitMQ, we could also cluster the message broker. So we have the search server also clustered, and we have the
12:12
database distributed as well. So the question is: can we distribute the current databases? As I said before, they are relational, and it's a little bit tricky to scale, to distribute, relational databases. There are efforts to do this, but they were not really designed with that in mind. There is another brand of databases that is more suitable for horizontal distribution, and these are the NoSQL databases. There are many kinds of NoSQL; one example is the document-oriented databases, such as, for instance, MongoDB. If we use such databases, we can distribute them and cluster them very easily. As I said before, GeoNetwork is using the Spring Data framework, currently through the JPA library, which is based on JDBC, and there are other Spring Data libraries that support NoSQL databases such as MongoDB or even Cassandra. However, all these changes that I mentioned
13:50
are actually quite difficult to
13:53
implement if we have a monolithic structure. If we think of an architecture that is more based on services, a service-oriented architecture, it will be much easier to
14:10
scale the parts of GeoNetwork that we want to scale. We would like to reduce the complexity of implementing this scenario, and this is where we can talk about microservices. I've shown quite a lot of Docker slides during this presentation, and it's because I think it could be a very helpful tool to help us implement this scenario. As of the latest version, which was released less than two months ago, Docker natively integrates clustering into its engine, so I think it would be very suitable if we want to think about this kind of microservices architecture scenario. So, just to finish, some final thoughts. More than anything, I would like to promote some discussion around this topic. The proposals that I showed you before, the first proposals, were actually based on GeoNetwork 2, and, as many of you know, there were quite some radical changes from GeoNetwork 2 to 3, so it would probably be a good idea to review them in the light of the current code base. The merging of the Solr branch that I mentioned before is something very positive that could contribute to the second scenario that I described, where we have index sharding. But we are still quite far from the third scenario. More than anything, this is an expression of interest, and this is the moment where I can suggest some activities that could help us implement it. One thing would be to look at extending the database support to other types of databases; another would be to do this refactoring to use microservices, or at least to look at the feasibility of this refactoring. And of course I think Docker could be very useful to test this kind of scenario. I want to finish with a reflection, a call really, for everyone: if people are
interested in contributing, either with code or with funding, to developments towards this scenario, I think this could be viable for the future, and it could make GeoNetwork a very robust catalog for the next century. So thank you very much. Thank you for this talk. We have plenty of time for questions, and I think we just merge the two talks, so if you have questions for María you can ask them as well. So, questions? [unclear] On one of the last slides you were talking about MongoDB: was that not just for the indexing, but for the data as well? So, maybe I was not clear before, but GeoNetwork stores some
18:22
data as well: some metadata, but it's data, and it is stored in a database. So I was suggesting to use MongoDB for this metadata, and as for the data itself, it can already be remote. Do you want to explain this further? I don't know... can you ask this again? I mean, she's asking about the data. Well, I think that the metadata itself is very structured, so maybe a NoSQL database is not so good [unclear]. So actually it was for storing the metadata. Another question? I have a
19:32
small question for Joana: there was a slide concerning the Hadoop file system, and to me it looked a bit like Hadoop 1, because it doesn't have the YARN controller and the other things there are in Hadoop 2. Have you looked into which one to use, the first or the
19:48
second? And is there a reason to use only the first one? That was just an example I used for thinking about it, but we are actually not using either of them [unclear].
20:01
Could there be any reason that
20:02
you prefer to use Hadoop 1 instead of Hadoop 2? No, I haven't checked that. It
20:11
was just a slide. We are really far from that at this point; we are more at an analysis stage, thinking more about having a shared file system or not, about having NoSQL or something else, than really looking at the exact technology to use. So yes, the example was the Hadoop file system and MongoDB, but instead of Mongo it could be Cassandra or another technology. Any other questions?
21:05
[unclear] Thank you for your presentation. You talked about the architecture of GeoNetwork [unclear]. What about [unclear]? In my
21:27
opinion one should consider the relation, the dynamic relation, between metadata and data. Do you have something to say about this connection? You mean also storing the data,
21:48
or do you mean how to relate data and metadata?
21:53
totaling of pulling but maybe on tool that did you make that when you have digital OK um
22:03
For WMS and WFS services, if you configure it right, you get almost all their metadata generated automatically, just by scraping the capabilities document or something like that, right? [unclear] So
22:24
[unclear] I have been working with GeoNetwork for a long time, and I am looking for a direct link between my data and my metadata; this is my question. [unclear] Yes, this is the main problem with that. It
22:58
is that you can have so many different formats of data that you can't find a common solution for all of them. You have to check solutions like WFS or WMS services, which expose properties [unclear]. On the database side, we can do some kind of harvesting, but then in that database you will need somewhere to place this data so that it is linked to the metadata. So in the end, is it useful to have the metadata somehow in the database so you can scrape it, or is it better to keep it only in the metadata? I think it depends on the use case. [unclear] I think we have to be aware that GeoNetwork manages a lot of use cases. People use GeoNetwork within their local organization, and this kind of feature is really helpful, so for those use cases it is really useful. The other use case is a national SDI, which harvests a lot of local GeoNetworks, where the status of a document can never be changed, because it represents something that was submitted by a local government; in a national portal the document itself has a kind of legal status. So these are two conflicting interests which we try to solve in one product, which is a kind of challenge. The use case is very interesting to look at, but the thing is that there are so many use cases that it's difficult to find a common solution, right? So in the end we found a solution with the services, WMS and WFS, which is one of the most straightforward. I think there are some databases, I'm not sure which, that have this kind of metadata there. So if you have a concrete use case, share it, meet with the
GeoNetwork community, work with them, and check what we can do. OK, thank you for your
25:28
questions. There is one last, very quick question. Just a comment about the scalability of the underlying database: [unclear] concepts and examples could be helpful for you to manage that, so you don't need to switch for that reason to a different database.
25:53
In Postgres there is bidirectional replication. Is
25:58
there [unclear] that
26:00
can manage... Yeah, as I mentioned, there actually
26:10
are some projects that scale, that cluster, relational databases, but what I was trying to emphasize is that the NoSQL databases are somehow by design more... Yes, yes. So they address this from the very beginning; it's something that is built into the technology. [unclear]
26:51
Thank you for this little discussion.
