Leveraging Big Geo Data through Metadata

Video in TIB AV-Portal: Leveraging Big Geo Data through Metadata

Formal Metadata

Leveraging Big Geo Data through Metadata
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
The increase in the scale of traditional data sources, along with an explosion in the availability of sensor data, has produced massive volumes of data, a great deal of which is geolocated. This is partly due to the wide adoption of cheaper positioning technologies, and to the exponential growth of Volunteered Geographic Information (VGI) movements, which rely on crowdsourcing approaches. Big Data has generated a lot of interest among industry, the developer community and the general public, and it has been at the core of many recent technology innovations (e.g. NoSQL, MapReduce). These new approaches have already started to involve the geo community, with projects such as the ESRI Spatial Framework for Hadoop or GeoTrellis, to mention just a few. However, the focus has been mostly on storing data (at the infrastructure level) and using data (at the analysis level), leaving aside challenges such as discoverability, integration or security. In this talk, we will address some of these outstanding challenges through the use of metadata and the semantic web, and show how a decentralized and standardized catalog can help to unlock the five V's of Big Data: Volume, Velocity, Variety, Veracity and, most importantly, Value.
Keywords GeoCat Universitat Pompeu Fabra
OK, so thanks again for being here. Now I'm going to, more than anything, share some thoughts with you about big data and how metadata can help us to handle this data. As was already said in the previous presentation, big data is a reality in some use cases, and a great deal of this data is actually geolocated. So the first question is: what is big data?
There is a known definition that relies on four properties: volume, velocity, variety and veracity. These are the four components that you need to address in this kind of dataset. More recently we are also hearing about value, which complements all of these.

So what really is value? Value is related to the capacity of extracting meaningful insights from the data: drawing conclusions from the data and starting to act upon these conclusions. Ultimately this is the final purpose of big data. It is mostly referred to in the sense of business value, so related to an increase in profits and so on, because this is where most of the use cases of big data are. But it could be anything else, really: it could be, for instance, predicting an outbreak of a disease, or it could even be improving mobility in a city. This is all value that is extracted from the data.

In this talk we would like to introduce you to a sixth V, which is visibility. This is really related to the discoverability of the data. Initiatives such as Open Data or the Semantic Web draw attention to the visibility of datasets and the availability of data: data should be available, it should be related, it should be linked.

On this slide I will talk a little bit about the data that is not visible, which is actually most of the content on the World Wide Web. At the tip of the iceberg we have the data that is indexed by the search engines. Below that we have what is called the deep web: data repositories, medical records, all sorts of information, even people's web pages. Below that we have the dark web, which is also part of the hidden web. In the case of the dark web, for some reason the people that put data there are very interested in not making this data visible, so you need special tools to access it, and there is no intention of bringing this data to the surface web. But in some cases, in the deep web, we have data that is hidden unintentionally. For instance, a municipality may release some data, but if this data is not in a catalog, it is not harvested at all and it is not indexed, so it is not discoverable, even though this was not the original intention. In this talk we want to talk a little bit about strategies for bringing this data, which lies unintentionally in the hidden layer, to the surface web. For this we want to use metadata; we want to draw attention to the importance of metadata in this task.

We are talking a lot about metadata in this talk, so what really is metadata? Metadata is really just data about data. For the scope of this talk we are going to consider metadata to be information about the data, so anything describing documents, datasets and so on. It could be, for instance, in the case of a telecom company, the data that describes the phone calls, or, in the case of geographic information, properties of the dataset and so on.

So can metadata actually help us to address each one of the six Vs that I mentioned before? This is what we are going to look at in a little bit more detail.

The first two Vs, volume and velocity, could really be addressed together, because when we have data with a low latency, most of the time it results in very large datasets. If you watched the previous talk, you have seen that one way of addressing this is through distribution, so through horizontal scalability. A catalog can be very important here, because if we use the catalog as an entry point for accessing information, this allows us to have distributed datasets which can be stored remotely. Then we can also think about distributing the metadata itself, and this was also discussed in the previous talk: a process called index sharding, where we actually split the index. For doing this we also need metadata. And when we think about distributing the index and the queries, we can actually use this information to be more efficient in the use of resources, for instance to reduce power consumption, keeping in mind what latency we are looking for in the queries and what resources are available. So now, a bit about variety.
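The index sharding idea mentioned here can be illustrated with a minimal sketch, assuming simple hash-based routing of metadata records. The talk does not say how the index is actually split, so the names `shard_for` and `ShardedIndex`, and the `title` field, are hypothetical and used only for illustration.

```python
import hashlib


def shard_for(record_id: str, num_shards: int) -> int:
    """Route a record to a shard using a stable hash of its identifier."""
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedIndex:
    """Toy metadata index split over several in-memory shards (hypothetical)."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        # Each shard is a plain dict here; in practice it would be a remote node.
        self.shards = [dict() for _ in range(num_shards)]

    def add(self, record_id: str, metadata: dict) -> None:
        self.shards[shard_for(record_id, self.num_shards)][record_id] = metadata

    def get(self, record_id: str):
        # A lookup by identifier only needs to touch one shard.
        return self.shards[shard_for(record_id, self.num_shards)].get(record_id)

    def search(self, keyword: str) -> list:
        # A keyword query is sent to every shard and the hits are merged.
        needle = keyword.lower()
        hits = []
        for shard in self.shards:
            for record_id, metadata in shard.items():
                if needle in metadata.get("title", "").lower():
                    hits.append(record_id)
        return sorted(hits)


index = ShardedIndex(num_shards=4)
index.add("ds-1", {"title": "Rivers of Europe"})
index.add("ds-2", {"title": "European roads"})
index.add("ds-3", {"title": "River basins"})
print(index.search("river"))
```

Lookups by identifier touch a single shard, while keyword searches fan out to all of them; that trade-off between latency and resource use is the kind of decision the speaker suggests metadata about the index and the queries could inform.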
So [inaudible] joined the company a couple of months ago, or maybe already a bit more, and she is coming more, as you noticed in our talks, from the IT background, whereas my background is geography and INSPIRE. So you can imagine that we have had very vivid conversations in the company in recent months, and we tried to capture some of that in this presentation. [inaudible] So I hope we again presented some interesting use cases. What I wanted to do here was to address the variety aspect of big data, and mostly the open data out there on the web, or actually most of the SDI services that we have, which, if you look at the previous slide, are somewhere in the middle of the deep web, because they are not visible to citizens. So when I look at variety, I have to think about the five stars of open data.

The first star is: just get the data out, make it available, put it somewhere; if it's the Internet, put it on the Internet [inaudible]. The next one is structured data: make sure that the data is in a readable, structured format. This is one that metadata can help with. For example, we have the ISO 19110 standard, which is the feature catalogue standard, and which in the new ISO 19115 is far better embedded than it previously was [inaudible]. This helps to structure data; even machines could do the structuring of data if they have that feature catalogue metadata. This is an obvious one.

Also important is unique identification. INSPIRE got that, and OGC also. If I look at individual data records, it could be true that there are OGC specifications for this, but what we see in practice is that people have put simple files on the web with no unique identification for records. Those are just identifiers within the dataset; they are not URIs that a search engine result can uniquely point to. INSPIRE tried to address this with the INSPIRE ID [inaudible]. I hope that at some point we will arrive there, and maybe discussions like this will help in that, but what we see is that the implementation of the INSPIRE ID in quite some countries is not optimal yet.

Another aspect is that the data is linked to other data. For sure, you first need to have URIs and ontologies, and ISO 19110 metadata can help here also, linked to the schemas, the INSPIRE schemas for example. We have recently done this in the Geo4Web testbed project. I'll talk about that this afternoon, but also tomorrow: we have a proxy layer on top of the WFS, which is a very interesting approach for us as the SDI community. So in the end we would have this big data cloud with a metadata catalog in the middle, as a kind of bootstrap of all those nice WFS services out there.

Now I will speak about veracity, the other V of big data. It is quite obvious how metadata can help us to address veracity: you can actually make a statement about data quality using metadata, and you can even use the aggregated metadata from many samples to try to judge the quality of one individual sample. So this is quite obvious.

Regarding value, I mentioned before that value mostly comes from the result of the analysis. If we use some kind of model, whether it is machine learning or a statistical model, we need to know quite a lot about the data. For instance, we need to know whether we have categorical data, text or floating points, how many samples we have, and what the temporal scope and the spatial scope of the dataset are. These are all metadata properties that are not only important but essential for the analysis.

Now we go on to the sixth V, visibility: how can we improve, how can we promote the visibility of data? One strategy is to use standards such as [inaudible] or ISO, so that we can actually have some meaning attached to the records. The other would be to be more friendly to the crawlers of search engines, so tell them a little bit more about the data so they can rank the web page better. There is going to be a talk about this, about spatial data and search engines, so if you're interested please go to this talk.

Now, thinking about the future of metadata, we looked at some other ways of creating metadata, and maybe they are not so new. What we see quite often is the traditional scenario where metadata is a top-down process: it is made by professional people, authorized people in organizations, and it is quite a time-consuming and absolutely boring process of typing all this information about the data. So we need to move away from this, especially if we have huge quantities of data.

One way of doing this is volunteered geographic information, or crowdsourcing, of data and metadata. This example is from COBWEB, which is a European project that was focused on citizen science for collecting meaningful information about the environment. In this project they implemented a framework with a portal where the volunteers were able to use their mobile devices to collect, to author, the data and metadata. So this is the opposite of having someone sitting in an office, a person authorized by an organization, putting in all the data. In this case we have lots of people that voluntarily decide to join the project because they are interested, and they are authoring the metadata.

Another example is using natural language processing and data mining for creating metadata. This example is also from another European project, where there is a data mining framework that takes unstructured information, say scientific publications that are in text format, or even DBpedia, and through this framework we are able to extract structure from this information: properties of the publications like the citation network, the topics and so on. So we are ultimately creating metadata directly from the data, and this is also very powerful, because with this you can target very large datasets.
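As a rough illustration of creating metadata directly from unstructured text, here is a minimal sketch that derives a few descriptive properties from a plain text document. It is not the data mining framework referred to in the talk, which is not described in detail; the function `extract_metadata`, the stopword list and the output fields are assumptions made for this example, and a real pipeline would use proper natural language processing.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this",
             "are", "was", "since", "about"}


def extract_metadata(text: str, top_n: int = 3) -> dict:
    """Derive simple descriptive metadata from unstructured text."""
    words = re.findall(r"[a-z]+", text.lower())
    # Keep only content words: not a stopword and longer than two letters.
    content = [w for w in words if w not in STOPWORDS and len(w) > 2]
    keywords = [word for word, _ in Counter(content).most_common(top_n)]
    # Four-digit years give a crude hint of the temporal scope.
    years = sorted(set(re.findall(r"\b(?:19|20)\d{2}\b", text)))
    return {
        "word_count": len(words),
        "keywords": keywords,
        "years_mentioned": years,
    }


sample = ("Metadata describes data. Metadata catalogs index metadata "
          "records since 2005 and 2015.")
print(extract_metadata(sample))
```

Because everything here is computed from the text itself, metadata of this kind can be regenerated at any time, which is what makes the approach attractive for very large collections.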
[inaudible] to say a little bit about privacy, which is something that is getting a lot of attention in relation to metadata.

Yes. [inaudible] A presentation like this one should have a slide on the idea that "metadata kills". It is one of those things that people frequently [inaudible]. And there is also the difference in what metadata is. Within the SDI sphere, when we talk about metadata it is about datasets and about features, but in the IT sphere metadata is also who is calling whom and how many seconds it took. All of that is metadata, data about a phone call, and in this case that is the metadata they are talking about. But still, even in our sphere, where we have datasets and features being described by metadata, the boundaries can be vague, and we have to understand the user's privacy aspects. There is some really good work that you should have a look at, that every IT professional should: the privacy by design best practices. In any software project that you do, at a very early stage, check the privacy aspects when you design the system. It's about: OK, don't store what you don't need, and if you store something, how long are you going to store it? Just put it down somewhere on paper and then agree with your customer about that, because it may come back to you at a later time.
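The two privacy by design rules mentioned here, "don't store what you don't need" and "decide how long you store it", could be written down as an explicit retention policy. The sketch below is only one possible way of doing that; the field names, retention periods and the functions `store` and `purge` are hypothetical and do not come from the talk.

```python
from datetime import datetime, timedelta

# Hypothetical policy agreed with the customer: only these fields are stored,
# each for a limited time. Anything not listed is never stored.
RETENTION_POLICY = {
    "dataset_id": timedelta(days=365),
    "query_text": timedelta(days=30),
}


def store(event: dict, now: datetime) -> dict:
    """Keep only the fields named in the policy (data minimisation)."""
    kept = {key: value for key, value in event.items() if key in RETENTION_POLICY}
    return {"stored_at": now, "fields": kept}


def purge(records: list, now: datetime) -> list:
    """Drop fields whose retention period has passed, and empty records."""
    remaining = []
    for record in records:
        age = now - record["stored_at"]
        fields = {key: value for key, value in record["fields"].items()
                  if age <= RETENTION_POLICY[key]}
        if fields:
            remaining.append({"stored_at": record["stored_at"], "fields": fields})
    return remaining


start = datetime(2020, 1, 1)
log = [store({"dataset_id": "d1", "query_text": "rivers", "user_ip": "10.0.0.1"}, start)]
print(purge(log, start + timedelta(days=60)))
```

Keeping the policy in one small table makes it easy to show to a customer and to check later whether the system still behaves as agreed.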
So we have a bit of a wrap-up here. The focus with metadata is especially on assessment and discovery, but we hope to have shown you that it could be used in more cases: in five-star open data processing, in big data, and in accessibility. [inaudible]

This is a more principled kind of slide. What we see currently is that a lot of governments take these initiatives to improve catalogs and to improve community engagement, but I actually doubt whether governments should be too big a driver of these developments, because governments themselves [inaudible] a role in that whole sphere [inaudible]. These developments should also be inclusive towards those who are critical of governments. And then you come to this point: who should fund all of this? Should it be the governments, should it be the bigger companies, or do we need new or existing foundations that will push this work forward? That is one of my big questions. Maybe we can share some thoughts about it over a beer this afternoon. Thank you.
Thank you for this talk. Questions?

Thank you. To understand more clearly [inaudible]: if you lose the metadata [inaudible], can you restore the metadata from the data? [inaudible]

If I understand you, I think you are talking about metadata that is not extractable from the data. Yes, so in this talk we wanted to use a broad definition. There is metadata that you can create from the data, and in this case you can always generate it again. But if you are introducing, for instance, INSPIRE information, then if you lose it, I mean, you cannot restore it again from the data. So there are these different facets. I mean, if you create or author metadata yourself and this metadata is lost, you cannot restore it automatically.

[inaudible] Maybe we can discuss this [inaudible]. OK.

So, a disclaimer: I don't know much about GeoNetwork [inaudible]. During the part where you described unstructured data, like plain text files etc., when you talked about automating the whole process, I was wondering about any implementations in GeoNetwork to automate the process. For example, you have [inaudible] full text mining, so to speak, and a lot of documents [inaudible], so that I can reproduce it and actually update the metadata [inaudible]. Is this kind of thing integrated into GeoNetwork, or would I have to look elsewhere?

[inaudible] a similar question in the previous talk, which I think is very, very valid. The work that we were referring to on this slide is from a university in [inaudible] Portugal, which is quite remote from GeoNetwork right now, but indeed it could be a very nice addition to GeoNetwork. So it's a separate project.

[inaudible] OK, one last question.

Hi, [inaudible] you widely use GeoNetwork as a catalog for metadata [inaudible]. Another product that is used is CKAN. I would like to know whether you considered this product and why you chose [inaudible].

Yes, so CKAN is indeed a similar project, and what I really like is that it is in the [inaudible] sphere, whereas we are in the geo sphere. Two years ago at FOSS4G we had a birds-of-a-feather session on catalogs as well, and [inaudible] was there, and we discussed interoperability issues between catalogs. So there are contacts, and maybe there should be a bit more. And the same goes within the Horizon 2020 programme, which is the programme of the European Union: some use GeoNetwork, some use CKAN, but there's not much interaction between the projects yet. [inaudible] is doing some work on integration between the two. Personally, I see there's space enough for all of us. CKAN looks a bit more towards open data; we are a bit stronger on the geo side, so WFS integration and SOS integration are better, whereas they offer more integration with open data standards like DCAT. Is that an answer to your question?

[inaudible] Then let's close this session.

Thank you again [inaudible].