Leveraging Big Geo Data through Metadata

Formal Metadata

Leveraging Big Geo Data through Metadata
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The increase in the scale of traditional data sources, along with an explosion in the availability of sensor data, has generated massive volumes of data, a great deal of which is actually geolocated. This is partly due to the wide adoption of cheaper positioning technologies, and to the exponential growth of Volunteered Geographic Information (VGI) movements, which rely on crowdsourcing approaches. Big Data has generated a lot of interest amongst industry, the developer community and the public in general, and it has been at the core of many recent technology innovations (e.g. NoSQL, MapReduce); these new approaches have already started to involve the geo community, with projects such as the ESRI Spatial Framework for Hadoop or GeoTrellis, to mention just a few. However, the focus has been mostly on storing data (at the infrastructure level) and using data (at the analysis level), leaving aside challenges such as discoverability, integration or security. In this talk, we will address some of these outstanding challenges through the use of metadata and the semantic web, and show how the use of a decentralized and standardized catalog can help to unlock the five V's of Big Data: Volume, Velocity, Variety, Veracity, and most importantly, Value.
Keywords: GeoCat, Universitat Pompeu Fabra
OK, so thanks again for
being here. Now I'm going to, more than anything, share some thoughts with you about
big data and how metadata can help us to handle it. As we already saw in the previous presentation, big data is a reality in some use cases, and a great deal of this data is actually geolocated. So the first question is: what is big data?
There is a known definition that relies on four components, or properties, which are volume, velocity, variety and veracity. These are the four components that you need to address in this kind of dataset. And more recently we are hearing
about a fifth V, value, which complements all of these. So what really is value? It is related to the capacity of extracting meaningful insights from the data: drawing conclusions from the data and starting to act upon those conclusions. Ultimately this is the final purpose of big data. It is mostly referred to in the sense of business value, related to an increase in profits and so on, because this is where most of the use cases of big data are. But it could be anything, really: for instance predicting an outbreak of a disease, or even improving mobility in a city. This is all value that is extracted from the data.
In this talk we would like to introduce you to a sixth V, which is visibility. This is related to the discoverability of the data, really. Initiatives such as Open Data or the Semantic Web draw attention to the visibility of datasets and the availability of data: it should be available, it should be related, it should be linked. In the next slide I will
talk a little bit about the data that is not visible, which is actually most of the content on the World Wide Web. At the tip of the iceberg we have the databases that are indexed by the search engines. Below that we have what is called the deep web: data repositories, medical records, all sorts of information, even personal web pages. And below that we have the dark web, which is also part of the deep web. In the case of the dark web, for some reason the people who put data there are very interested in not making this data visible, so you need special tools to access it, and there is no intention of bringing this data to the surface web. But in some cases in the deep web we have data that is not hidden intentionally. For instance, a municipality may be releasing some data, but the data is not in a catalog, so it is not harvested at all and it is not indexed. This data is not discoverable, but that was not the original intention. So in this talk we want to talk a little bit about strategies for bringing this data, which lies unintentionally in the hidden layer, to the surface web. For this we want to use metadata; we want to draw attention to the importance of metadata in
this task. We are talking a lot about metadata in this talk, so what really is metadata? Metadata is really just data: data about data. For the scope of this talk we are going to consider metadata as information about the data, so anything describing documents, datasets and so on. In the case of a telecom company, for instance, it could be the data that describes the phone calls; in the case of geographic information, it could be properties of the dataset and so on. So can metadata
actually help to address each one of the six V's that I mentioned before? This is what we are going to look at in a little more detail. So, the
first two V's, volume and velocity, can really be addressed together, because when we have data with a low latency, most of the time it results in very large datasets. If you watched the previous talk, you have seen that one way of addressing this is through distribution, so through horizontal scalability. A catalog can be very important here, because if we use a catalog as an entry point for accessing information, this allows us to have distributed datasets which can be stored remotely. Then we can also think about distributing the metadata itself, which was also covered in the previous talk: a process called index sharding, where we actually split the index. For doing this we also need metadata. And when we think about distributing the index and the queries, we can actually use this information to make more efficient use of the resources, for instance to reduce power consumption, keeping in mind what latency we are looking for in the queries and what resources are available. So now, a little bit about variety.
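As a minimal sketch of the index sharding idea mentioned in this segment, assume a catalog whose metadata records carry an identifier and a bounding box. Records are assigned to a shard by hashing the identifier, a lookup by identifier touches one shard, and a bounding-box query is fanned out to every shard and merged. The names `MetadataRecord`, `ShardedIndex` and `shard_for` are hypothetical and are not taken from the talk or from any particular catalog product.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified metadata record."""
    identifier: str
    title: str
    bbox: tuple  # (min_x, min_y, max_x, max_y)


def shard_for(identifier: str, shard_count: int) -> int:
    """Map an identifier to a shard number with a stable hash."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def intersects(a: tuple, b: tuple) -> bool:
    """True if two (min_x, min_y, max_x, max_y) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class ShardedIndex:
    """In-memory stand-in for an index split across several nodes."""

    def __init__(self, shard_count: int = 4):
        self.shards = [dict() for _ in range(shard_count)]

    def add(self, record: MetadataRecord) -> None:
        shard = self.shards[shard_for(record.identifier, len(self.shards))]
        shard[record.identifier] = record

    def get(self, identifier: str):
        # A lookup by identifier only has to visit one shard.
        shard = self.shards[shard_for(identifier, len(self.shards))]
        return shard.get(identifier)

    def search_bbox(self, bbox: tuple) -> list:
        # A spatial query is sent to every shard and the hits are merged.
        hits = []
        for shard in self.shards:
            hits.extend(r for r in shard.values() if intersects(r.bbox, bbox))
        return sorted(hits, key=lambda r: r.identifier)
```

Hashing on the identifier keeps the shards balanced but forces spatial queries to visit all of them; sharding on spatial extent instead would be the opposite trade-off.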
So, Joana joined the company a couple of
months ago, or maybe already a bit more, and she is coming more from the IT background, whereas my background is geography and INSPIRE. So you can imagine that we have had very vivid conversations in the company in recent months, and we tried to capture some of that in this presentation. And then it was suddenly placed in this big room, so I hope we can present some interesting use cases. What I wanted to do here was to address the variety aspect of big data, and mostly the open data out there on the web, or actually most of the SDI services that we have, which, if you look at the previous slide, are somewhere in the middle of the deep web, because they are not visible to citizens. So when I look at variety, I have to think about the five stars of open data. I will go through them quite fast. The
first star is: just get the data out, make it available, put it somewhere. If it is for the intranet, put it on the intranet; if it is for the internet, put it on the internet. The second is structured data: make sure that the data is in a readable, structured format. This is one that metadata can help with. For example, we have the ISO 19110 standard, which is the feature catalogue standard, and which in the new ISO 19115 is far better embedded than it was previously. This helps to structure data: even machines could do the structuring of data if they have that feature catalogue metadata. The next one is an obvious one. And for the one after that, what is
important: unique identification. Here I look at INSPIRE and then OGC, and at individual data records. It could be true that the OGC has specifications for this, but what we see in practice is that people have put simple files on the web with no unique identification for records, or with identifiers that are only valid within the dataset; those are not URIs that a search engine result can uniquely go to. INSPIRE tried to address this with the INSPIRE ID, and I hope that at some point we will arrive there, and maybe discussions like this will help with that. But what we see is that the implementation of the INSPIRE ID in quite some countries is not optimal yet. However, the idea is there. And an
interesting aspect is linking to other data. For that, for sure, you first need to have URIs and ontologies, and ISO 19110 metadata can help here too, as can the schemas, the INSPIRE schemas for example. We have recently done this in the Geo4Web testbed project, which I will talk about this afternoon and also tomorrow, where we have a proxy layer on top of the WFS, which is a very interesting approach for us as the SDI community. So in the end we would have this big data cloud with a metadata catalog in the middle, as a kind of bootstrap for all those nice WFS services out there. This is where I hand over.
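The five-star walk-through above can be sketched as a small check over a dataset description. This is only an illustration under stated assumptions: the star criteria follow the commonly cited five-star open data scheme (available online under an open licence, structured, open format, URIs, linked to other data), and the dictionary keys used here are hypothetical flags, not fields from ISO 19115, ISO 19110 or INSPIRE.

```python
# Hypothetical boolean flags, one per star, in the order the stars are earned.
FIVE_STAR_CRITERIA = [
    "online_with_open_license",  # 1 star: available on the web
    "structured",                # 2 stars: machine-readable structure
    "open_format",               # 3 stars: non-proprietary format
    "uses_uris",                 # 4 stars: records identified by URIs
    "linked_to_other_data",      # 5 stars: links to other datasets
]


def open_data_stars(description: dict) -> int:
    """Count stars cumulatively: a star only counts if all earlier ones hold."""
    stars = 0
    for criterion in FIVE_STAR_CRITERIA:
        if not description.get(criterion, False):
            break
        stars += 1
    return stars


# Example: a shapefile-style download with local identifiers only.
simple_file = {
    "online_with_open_license": True,
    "structured": True,
    "open_format": True,
    "uses_uris": False,
    "linked_to_other_data": False,
}
print(open_data_stars(simple_file))
```

The cumulative rule reflects the point made in the talk that linking to other data first requires URIs.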
Now to speak about veracity, the other V of the data. It is quite obvious how metadata can help us to address veracity: you can actually make a statement about data quality using metadata, and you can even use the aggregated metadata from many samples to try and judge the quality of one individual sample.
Regarding value, as I mentioned before, value is mostly what comes from the result of the analysis. If we use some kind of model, whether it is machine learning or a statistical model, we need to know quite a lot about the data. For instance, we need to know if we have categorical data, if we have text or floating points, how many samples there are, and what the time scope and the spatial scope of the datasets are. These are all metadata properties that are not only important but essential for the analysis.
And now we go on to the sixth V, visibility. How can we improve, how can we promote the visibility of data? One strategy is to use standards, such as the ISO ones, so that we can attach some shared meaning to the metadata. The other would be to be more friendly to the crawlers of search engines, so to tell them a little bit more about the data so that they can better rank the web page. Paul will tell you more about this in the talk about spatial data in search engines, so if you are interested, please go to that talk.
And now we would like to take a look at the future of metadata, and at some other ways of creating metadata. Maybe they are not so new, but we still quite often see the traditional scenario where metadata creation is a bottom-up, sorry, top-down process: it is done by professional people or authorized people in organizations, and it is a quite time-consuming and
absolutely boring process of typing in all this information about the data. We need to move away from this, especially if we have huge quantities of data. One way of
doing this is volunteered geographic information, or crowdsourcing, of data and metadata. This example is from COBWEB, which is a European project that was focused on citizen science for collecting meaningful information about the environment. In this project they implemented a framework with a portal where the volunteers were able to use their mobile devices to collect and author the data and metadata. So this is the opposite of having someone sitting in an office, a person authorized by an organization, putting in all the data. In this case we have lots of people who volunteer and decide to join the project because they are interested, and they are authoring the metadata. And
another, different example is using text and data mining for creating metadata. This example is also from a European project, and there is a data mining framework where we have unstructured information, such as scientific publications that are in text format, or even DBpedia, and through this framework we are able to extract structure from this information: properties about the publications, like the citation network, the topics and so on. So we are creating metadata automatically, directly from the data. This is also very powerful, because with this you can target very large datasets.
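To illustrate what creating metadata directly from unstructured text can look like, here is a deliberately small sketch. It is not the data mining framework referred to in the talk, which is not described in enough detail to reproduce; it only assumes that plain text is available and derives two simple metadata properties from it, a word count and the most frequent non-trivial terms. The function name, the stop-word list and the output keys are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def extract_metadata(text: str, top_n: int = 3) -> dict:
    """Derive simple descriptive metadata from unstructured text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Sort by frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        "word_count": len(tokens),
        "keywords": [term for term, _ in ranked[:top_n]],
    }


sample = (
    "Metadata catalogs describe datasets. "
    "Catalogs harvest metadata; metadata helps discovery."
)
print(extract_metadata(sample, top_n=2))
```

Because everything here is computed from the text itself, this kind of metadata can be regenerated at any time, unlike metadata that a person authored by hand.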
Paul is going to say a little bit about privacy, which is something that is getting a lot of attention in relation to metadata. Yes.
So, a presentation like this one should have
a slide like this, and 'metadata kills' is one of those things that people frequently throw at us. It also shows the difference in what metadata is. Within the SDI sphere, when we talk about metadata it is about datasets and about features, but in the IT sphere metadata is also who is calling whom and how many seconds the call takes: all of that is metadata, data about a phone call, and in this case that is the metadata they are talking about. But still, even in our sphere, where we have datasets and features being described by metadata, the boundaries can fade, and we have to understand the user's privacy aspects. There is some really good work here, and you should have a look at it, every IT professional should: the privacy by design best practices. In any software project that you do, at a very early stage, consider privacy when you design the system. It is about: don't store what you don't need, and if you store something, how long are you going to store it? Just put it down somewhere on paper and then agree with your customer about that, because it may come back to you after some time.
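The two rules of thumb above, do not store what you do not need and decide up front how long you keep what you do store, can be written down as a small retention policy. This is a sketch under stated assumptions, not an implementation of any particular privacy by design guideline: the field names, the retention periods and the helper functions are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical policy agreed with the customer: field name -> how long to keep it.
# None means the field may be kept indefinitely.
RETENTION_POLICY = {
    "dataset_id": None,
    "contact_email": timedelta(days=365),
    "search_log": timedelta(days=30),
}


def minimise(record: dict, policy: dict = RETENTION_POLICY) -> dict:
    """Drop every field that the policy does not explicitly allow."""
    return {key: value for key, value in record.items() if key in policy}


def purge_expired(record: dict, stored_at: datetime, now: datetime,
                  policy: dict = RETENTION_POLICY) -> dict:
    """Keep only allowed fields whose retention period has not run out."""
    age = now - stored_at
    kept = {}
    for key, value in record.items():
        if key not in policy:
            continue
        limit = policy[key]
        if limit is None or age <= limit:
            kept[key] = value
    return kept


incoming = {
    "dataset_id": "roads-2015",
    "contact_email": "someone@example.org",
    "search_log": ["roads", "rivers"],
    "ip_address": "192.0.2.1",  # not in the policy, so never stored
}
stored = minimise(incoming)
```

Keeping the policy as explicit data rather than scattering it through the code makes it something that can literally be put on paper and agreed with the customer.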
So, to wrap up a bit: the focus with metadata is usually on assessment and discovery, but we hope to have shown you a little bit that it could be used in more cases, in the five-star open data process and in big data and accessibility. And this is such a
principled kind of slide. What we see currently is that a lot of governments take these initiatives to improve catalogs and to improve community engagement, but I actually doubt whether governments should be too big a driver of these developments, because governments themselves have a role in that whole sphere. And, to look at it from the other side, these developments should also be inclusive to those who are critical of governments. And then you come to this point: who should fund all of this? Should it be the government, should it be the bigger companies, or do we need new or existing foundations that will push this work forward? This is one of my big questions. Maybe we can share some thoughts about it over a beer this afternoon. Thank you. Thank you, Paul,
for this talk Christians with and thank you to understand more clearly in your genome metadata have questions quality you will see and metadata and
sorry and in the local you will have very In Europe process on can I use word called you'll will lose those did In any case
that is normal so the
from your body if you go that that's so there's all of the database it's not meant the very core of segment of that the it's information how do you store data is the encoder what you will you can we do the same thing without
assuming it's question for you it's a question in the added that's the question pulled from our it is something that I wish show that a simple you called rules metadata in the sunlight from data what
If I understand you correctly, I think you are talking about
the metadata that is not extractable from the data, yes. In this talk we wanted to use a broad definition. There is the metadata that you can create from the data, and in this case you can always generate it again. But if you are introducing, for instance, INSPIRE information, then if you delete it you cannot restore it again from the data. So there are these two different facets, I think. You cannot restore
it, I mean, if you
introduce metadata, if you create, if you author metadata yourself and this metadata is lost, you cannot regenerate it automatically.
was a texture I mean you will feel that data and so could get metadata you find the hole in the dataset homemade and move the data for the full year course has been sort of that 1 maybe we can discuss this
in the majority of its own interests OK and again so a disclaimer
I don't know much about this area, I just wanted to know something. During the part where you described unstructured data, like plain text files etc., when you talked about automating the whole
process, I was wondering about
any implementations of such a framework to automate the process. For example, say I have some workflow for text mining, so to speak, and I have a lot of documents, and I want to have it automated so that I can reproduce it and actually update the metadata for new documents. Is this kind of thing integrated into GeoNetwork, or would I have to look elsewhere? There was a similar question
in the previous talk, which I think is very valid. The work that we were referring to on this slide is from a university in Portugal, which is quite remote from GeoNetwork right now, but indeed it could be a very nice addition to GeoNetwork. So it is a separate project. OK, one last question. Hi,
I understand that you widely use GeoNetwork as a catalog for metadata. Another product that is used for this is CKAN, and I would like to know if you ever considered this product.
CKAN, yes. So CKAN is indeed a similar project, and what I
really like is that it is in the Python sphere, whereas we are in the Java sphere. Two years ago at FOSS4G we had a birds of a feather session on catalogs as well; pycsw was there too, and we discussed interoperability issues between catalogs. So yes, there are contacts, and maybe there should be a bit more. The same goes within the Horizon 2020 program, which is the research program of the European Union: some projects use GeoNetwork, some use CKAN, but there is not much interaction between the projects yet. I know that some people are doing work on integration between the two. Personally, I see there is space enough for all of us. CKAN looks a bit more at open data, and we are a bit stronger on the geospatial side: WFS integration and SOS integration are better, whereas they offer more integration with open data standards like DCAT. Is that an answer to your question? OK, let's close this session.
Thank you again to Joana and Paul for the talk.