Scalability of GeoNetwork: Current Status and Future Directions

Video in TIB AV-Portal: Scalability of GeoNetwork: Current Status and Future Directions

Formal Metadata

Scalability of GeoNetwork: Current Status and Future Directions
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
In recent times, phenomena such as the Internet of Things or the popularity of social networks, among others, have been responsible for an increased availability of sensor data and user-generated content. Being able to ingest, store and analyze these massive volumes of information is a standing challenge that is no longer ignored. The data about this data is, generally speaking, less of a problem if we consider, for instance, that trillions of sensor records may share the same metadata record; for this reason, catalogs have been less exposed to the challenges that took the database community by storm. Nevertheless, a large variety of datasets can also pose performance challenges to traditional catalogs and demand increased scalability. In this talk we will look at strategies for scaling GeoNetwork through load balancing and at its current limitations, and we will discuss potential improvements from adopting distributed search server technologies such as Solr or Elasticsearch. On the database side, we will review the current database support, which is limited to ORM, and discuss the possibility of extending it to support NoSQL databases, which could be scaled horizontally, unleashing a new generation of metadata storage.
Keywords GeoCat titellus
Our second speaker will now talk about the scalability issues related to GeoNetwork.
Good afternoon. First of all, thank you very much for being here, and thanks for having voted for us; it's really nice to be here with this group. Extending a little bit on what Maria said, I'm going to focus on one specific aspect of GeoNetwork. First I'd like a small poll: how many people here use or have used GeoNetwork? Can you please raise your hands? OK. And how many people work with big data, let's say datasets larger than 32 terabytes? OK, we have a couple of people.
In this talk, as I said, we are going to speak about the scalability of GeoNetwork. We will basically go through some of the limitations and, more than anything, discuss some proposals and scenarios for GeoNetwork. First, to set the context a little: big data may not be a reality for everyone, but there are in fact some use cases, because the number and variety of data sources has increased a lot with sensor data and also user-generated content, and a great deal of this data actually has some sort of location attached. So we are looking at large spatial datasets, and this is going to be more and more common in the near future.
So what happens when we have really, really large datasets? We can always increase the CPU, the RAM, the number of CPUs or the hard drive, but at a certain point we can no longer increase them, and that is the limit of vertical scalability. At that point we need to start thinking about horizontal scalability, and mostly this is what we are going to focus on in this talk.
As you probably all know, GeoNetwork is a catalog for geospatial information. Most of you are users, so I don't have to explain a lot about this, just a few important details. GeoNetwork accesses data which can be distributed remotely, but it stores some metadata locally in a database. And because metadata is really data, it actually stores data locally. As Maria also mentioned, it uses a search index, which is based on Lucene.
There seems to be a problem with a couple of slides here, but it's just one or two slides, so maybe I'll just continue. Thank you.
What I wanted to say with this slide is that there is this index, which is based on Lucene and is stored locally on each instance, and that is one of the limitations. The second thing is that the database I described is in fact a relational database, because the JPA library that is used only supports relational databases such as Oracle or PostgreSQL.
There were some proposals for scaling GeoNetwork. I mentioned big data as an argument for scaling GeoNetwork, and maybe this is not relevant for most people, but there are other arguments that are probably more relevant, such as high availability: if we have a cluster of nodes, then when one node dies another one can take its place, so there is always a failover scenario, which is quite useful. There is also the scenario of load balancing, so we can actually distribute the workload. These arguments are probably relevant for most people, and there were already proposals that looked at scaling GeoNetwork based on them.
This is quite challenging; this is still not the original presentation. OK, maybe I'll just go on from here.
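The failover and load-balancing arguments above can be illustrated with a minimal Python sketch. It is not GeoNetwork code and does not describe any particular proxy product: the class `RoundRobinBalancer`, its methods and the node names are hypothetical, and the health information is assumed to be supplied from outside (for example by a health check).

```python
from itertools import count


class RoundRobinBalancer:
    """Hand out catalog nodes in turn, skipping nodes marked as down (hypothetical sketch)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._counter = count()  # ever-increasing position in the rotation

    def mark_down(self, node):
        # Called when a node stops answering (assumed to come from a health check).
        self.healthy.discard(node)

    def mark_up(self, node):
        # Called when a known node is reachable again.
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        # Advance through the rotation until a healthy node is found.
        for _ in range(len(self.nodes)):
            node = self.nodes[next(self._counter) % len(self.nodes)]
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

With three nodes, requests rotate over all of them; after `mark_down("node-b")` the same calls keep succeeding and are served by the two remaining nodes, which is the failover behaviour described in the talk.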
So these are the GeoNetwork clustering scenarios. The limitations are basically the ones that I mentioned; there are also the file uploads, which are stored locally in the file system, and a few other things. I'm going to explain these proposals, how they look at these restrictions and what they suggest for overcoming them.
The first model comes from a proposal for sharing resources between different instances. You can see in this image that we have different nodes, and each node has its own Lucene index, but this index is synchronized between instances using a message broker, the Java Message Service. So when there is a change in one index, the other instances can know about it.
The other element, the data directory, is also shared; we can do this by using a shared file system, a network file system. And the database is also shared.
So this idea needs some synchronization between the nodes, and there are a few other aspects that have to be handled as well: for example, the site UUID should not be tied to any single instance, while other things should be assigned to one specific node.
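The first scenario, with one local index per node kept in step through a message broker, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Broker` is an in-memory stand-in for a publish/subscribe topic such as one provided by a JMS broker, `CatalogNode` and its methods are hypothetical names, and a plain dictionary stands in for the Lucene index.

```python
class Broker:
    """In-memory stand-in for a publish/subscribe topic (hypothetical, not a JMS client)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, sender, message):
        # Deliver the message to every subscriber, including the sender.
        for callback in self._subscribers:
            callback(sender, message)


class CatalogNode:
    """One catalog instance with its own local index (a dict stands in for the index)."""

    def __init__(self, name, broker):
        self.name = name
        self.index = {}  # record id -> title
        self.broker = broker
        broker.subscribe(self._on_message)

    def save_record(self, record_id, title):
        # Update the local index first, then tell the other nodes about the change.
        self.index[record_id] = title
        self.broker.publish(self.name, ("index", record_id, title))

    def delete_record(self, record_id):
        self.index.pop(record_id, None)
        self.broker.publish(self.name, ("delete", record_id, None))

    def _on_message(self, sender, message):
        if sender == self.name:
            return  # ignore our own notifications
        action, record_id, title = message
        if action == "index":
            self.index[record_id] = title
        elif action == "delete":
            self.index.pop(record_id, None)
```

If three nodes share one `Broker`, a `save_record` call on any of them leaves all three local indexes with the same entry. A real broker delivers messages asynchronously, so the indexes would only be eventually consistent; the sketch ignores that.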
The second scenario is one where we no longer have one index per instance; in this case we have a search server. We can use Solr, which is based on Lucene. Solr has a lot of nice features, but the one we are interested in for the scope of this presentation is index sharding: we can split the index and distribute it across different instances, which can be placed on different physical machines. The other elements, as you can see, stay the same: the data directory and the database, and we still need the message broker to synchronize some information between nodes.
Just for you to see, this is an example of a Solr architecture with distributed nodes, implemented using Docker. Interestingly enough, the replacement of Lucene by the Solr server is something that is being implemented in one branch of GeoNetwork, and there are plans to merge these developments into GeoNetwork, so this could actually become a reality. So that was the second scenario.
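Index sharding, as described for the second scenario, can be illustrated with a small Python sketch. It is a toy under stated assumptions and not how Solr routes documents internally: `ShardedIndex` is a hypothetical class, each shard is a plain dictionary, and records are assigned to shards with a CRC32 hash of the record identifier.

```python
import zlib


class ShardedIndex:
    """Split a search index over several shards and query all of them (hypothetical sketch)."""

    def __init__(self, shard_count):
        # In a real deployment each shard would live on a different machine.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, record_id):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        return zlib.crc32(record_id.encode("utf-8")) % len(self.shards)

    def add(self, record_id, text):
        self.shards[self._shard_for(record_id)][record_id] = text

    def search(self, term):
        # Fan the query out to every shard and merge the partial results.
        term = term.lower()
        hits = []
        for shard in self.shards:
            hits.extend(rid for rid, text in shard.items() if term in text.lower())
        return sorted(hits)
```

Each record is stored in exactly one shard, so the index can grow by adding shards, while a search still sees the whole collection because the query is sent to all shards and the results are merged.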
The third scenario is actually distributing everything. As you can see, the file system is no longer a shared file system but a distributed file system, such as, for instance, the Hadoop file system. In this case we have multiple nodes: in the case of a Hadoop cluster we have the NameNode, which can also have a backup in recent versions, and the DataNodes, and we can store very large files because they can be split across machines. This should be transparent for GeoNetwork.
Then the message broker itself could be clustered: if we take, for instance, RabbitMQ, we could also cluster the message broker. We have the search server clustered as well, and we have the database distributed as well.
So the question is: can we distribute the current databases? As I said before, they are relational, and it is a little bit tricky to scale, to distribute, relational databases. Although there are efforts to do this, they were not really designed with that in mind. There is another kind of database that is more suitable for horizontal distribution, and these are the NoSQL databases. There are many kinds of NoSQL; one example is the document-oriented databases, such as, for instance, MongoDB. If we use such databases, we can distribute them and cluster them very easily. As I said before, GeoNetwork is using the Spring Data framework, currently with the JPA library, which is based on JDBC, and there are other Spring Data libraries that support NoSQL databases such as MongoDB or even Cassandra.
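The idea of putting metadata storage behind one interface, so that a relational backend could be exchanged for a document-oriented one, can be sketched in Python. This is only an analogy to what a repository abstraction offers and not GeoNetwork or Spring Data code: `MetadataStore`, `RelationalStore`, `DocumentStore` and `round_trip` are hypothetical names, SQLite stands in for a relational database, and a dictionary of JSON strings stands in for a document database.

```python
import json
import sqlite3
from typing import Optional, Protocol


class MetadataStore(Protocol):
    """Storage interface the rest of the catalog would depend on (hypothetical)."""

    def save(self, record_id: str, record: dict) -> None: ...
    def load(self, record_id: str) -> Optional[dict]: ...


class RelationalStore:
    """Relational backend: fixed columns in an in-memory SQLite table."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE metadata (id TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
        )

    def save(self, record_id, record):
        self._db.execute(
            "INSERT OR REPLACE INTO metadata (id, title, abstract) VALUES (?, ?, ?)",
            (record_id, record.get("title"), record.get("abstract")),
        )

    def load(self, record_id):
        row = self._db.execute(
            "SELECT title, abstract FROM metadata WHERE id = ?", (record_id,)
        ).fetchone()
        return None if row is None else {"title": row[0], "abstract": row[1]}


class DocumentStore:
    """Document-style backend: each record is kept whole as a JSON document."""

    def __init__(self):
        self._docs = {}

    def save(self, record_id, record):
        self._docs[record_id] = json.dumps(record)

    def load(self, record_id):
        raw = self._docs.get(record_id)
        return None if raw is None else json.loads(raw)


def round_trip(store: MetadataStore, record_id: str, record: dict) -> Optional[dict]:
    # Calling code only uses the interface, so either backend can be passed in.
    store.save(record_id, record)
    return store.load(record_id)
```

The function `round_trip` behaves the same with either store, which is the property that would let a catalog change its database without touching the calling code. The sketch says nothing about clustering itself; that would be a feature of the database product chosen.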
All of these changes that I mentioned are actually quite difficult to implement if we have a monolithic structure. If we think of an architecture that is based more on services, a service-oriented architecture, it will be much easier to scale the parts of GeoNetwork that we want to scale. So we would like to reduce the complexity of implementing this scenario, and this is where we can talk about microservices. I have shown quite a lot of slides with Docker during this presentation, and that is because I think it could be a very helpful tool to help us implement this scenario. In the latest version, which was released less than two months ago, Docker integrated clustering natively into its engine, so I think it would be very suitable if we want to think about this kind of microservices architecture.
Just to finish, some final thoughts. More than anything I would like to promote some discussion around this topic. The first proposals that I showed you were actually based on GeoNetwork 2, and there were quite some radical changes from GeoNetwork 2 to 3, so it would probably be a good idea to review them in the light of the current code base. The merging of the Solr branch that I mentioned before is something very positive that could contribute to the second scenario that I described, where we have index sharding. From the third scenario we are still quite far; more than anything this is an expression of interest, and this is the moment where I can suggest some activities that could help us implement it. One thing would be to look at extending the database support to other types of databases. Another would be to do this refactoring to use microservices, or at least to look at the feasibility of this refactoring. And of course I think Docker could be very useful to test this kind of scenario. I want to finish with a reflection, or really a call to everyone: if people are interested in contributing, with code or with funding, towards this scenario, I think it could be viable in the future, and it could make GeoNetwork a very robust catalog for the next century. Thank you very much.
Thank you for this talk. We have plenty of time for questions, and I think we will just merge the two talks heard so far, so if you have questions for Maria you can ask them as well. Questions?
[...] On one of the last slides you were talking about MongoDB: was that just for the indexing, or for the [...] data as well?
Maybe I was not clear before, but GeoNetwork stores some data as well. It is metadata, but it is data, and it is stored in a relational database. So I was suggesting using MongoDB for this metadata. As for the data itself, it can already be remote.
Do you want to explain this further? I mean, she is asking about the data. I think the metadata itself is very structured, so maybe a NoSQL database is something that can be used [...]. So actually it was about storing the metadata.
Another question, a more [...] question: there was a slide concerning the Hadoop file system, and it looked a bit like Hadoop 1, because it doesn't have the YARN controller. Have you looked into which one to use, the first or the second?
The only reason for showing the first one is that it was just an example I used to think about this; we are actually not using either of them.
Is there any reason that you would prefer to use Hadoop 1 instead of Hadoop 2?
No, I haven't checked that. It was just a slide. We are really far from that; at this point we are more at the state of analyzing, thinking about whether to have a shared file system or something distributed, than really looking at the exact technology to use. So yes, the example was the Hadoop file system and MongoDB, but instead of MongoDB it could be Cassandra or another technology.
Any other questions?
[...] Thank you for your talk. You talked about the architecture of GeoNetwork, but what about [...]? In my opinion one has to consider the dynamic relations between metadata and data. Do you have something to say about this connection?
You mean also storing the data, or you mean how to relate data and metadata?
[...]
For WMS and WFS services, if you configure it right, you get almost all the metadata generated automatically, just by scraping the capabilities document or something like that, right?
I have been working with GeoNetwork for a long time, and I am looking for a link between the metadata and the data directly. This is my question.
Yes. The main problem with that is that you can have so many different formats of data that you cannot find a common solution for all of them. You have to check solutions like WFS [...]. On the database, if you have one, you can do some kind of harvesting, but then you will need somewhere to place this data so that it is linked to the metadata. So in the end, is it useful to have the metadata somehow in the database so that you can scrape it, or is it better to keep it only in the metadata? I think it depends on the use case.
[...] We manage a lot of use cases. People use GeoNetwork within their local organization, and this kind of feature is really helpful there, so for that use case it is really useful. The other use case is a national SDI which harvests a lot of local GeoNetworks, where the status of a document can never be changed, because it represents something that was submitted by a local government, so in a national portal the document itself has a kind of legal status. These are two conflicting interests which we try to solve in one product, which is a kind of challenge. [...]
But the thing is that there are so many use cases that it is difficult to find a common solution, right? So in the end we find a solution with the services [...], which is one of the most straightforward. [...] If you have a concrete use case, [...] we can check what we can do.
OK, thank you for your questions.
One last, very quick question.
Just a comment about the scalability of the underlying database: there are concepts and examples which could be helpful for you to manage that, so you don't need to switch to a different database for that reason. [...] There is bidirectional replication [...] that can manage this.
Yes, as I mentioned, there are actually some projects that scale, that cluster, relational databases. What I was trying to emphasize is that NoSQL databases are somehow more suited to this by design; they address it from the very beginning, so it is something that is built into the technology.
[...] And it is true that usually the bottleneck is not in the database but somewhere else, maybe in complex queries or whatever it is.
Thank you for this little discussion.