Tired of data tables full of data that I have no idea what it means? Use Sensor Things API
Formal Metadata
Number of Parts | 44
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/62080 (DOI)
Production Year | 2023
Production Place | Wageningen
Transcript: English(auto-generated)
00:00
I'm looking for an inspiring title for this presentation. I thought about complaining about something, no? And this time, we are going to complain
00:23
about data tables full of data that we have no idea what it means. And then we are going to argue that the OGC SensorThings API is a better approach to that. I was already introduced: I work at CREAF, a research center in Barcelona.
00:44
So in order not to make anyone angry, I'm just going to throw rocks at my own administration. But I have to say this is just an example.
01:02
And I'm pretty sure that you will identify the same problems everywhere. So welcome to the Catalan Open Data Portal. They also call it the transparency portal. And it's a place where our Catalan government publishes
01:22
many things related to many domains. And this includes geospatial data, also sensor data. So that's ideal to do this exercise. And they like to be standard. So they selected the standard format, the table.
01:41
So let's do a search and see what happens. The portal works in Catalan. So I have to type something that makes sense there, like meteorological data. This is what it says. And I have found something. It seems that there is a weather data record
02:01
at all the stations of the network of automatic meteorological stations from the Catalan Weather Service. They say this is a database that contains measurements with a frequency lower than daily, usually semi-hourly. And it is provided, of course, as tabular data.
02:25
So this is what I get, a table. And I have different ways of getting this table, as a CSV or as an Excel file, but still a table. So the names of the fields are not very informative.
02:44
Of course, I'm trying here to simulate what a member of the public with no idea about these data sets will think. I have known these data sets for quite a while. So I probably can guess what is behind this.
03:00
But a general member of the public will find it difficult to actually guess, for instance, where the station is. You see here, code CACC. OK, nice. What is the variable name that is behind this measure? 32, 33, 1? No idea.
03:21
We have two dates. So the typical problem, we have two dates. We have no idea which one is the one that is important, probably the one where the measurement took place. OK, we have a set of values. But since they belong to different variables, they are all intermixed.
03:41
It's kind of complicated. 940, probably it's a pressure. But who knows? And I have no idea what this column is, even if I have read the documentation. So don't worry. There is a place where each column is going to be defined.
04:04
Is there a language barrier there? Probably, yeah, for you. But I can assure you that even as a Catalan native speaker, you are not going to extract much more information. I mean, "codi variable": it's obviously the weather variable identifier.
04:21
But what else? I mean, tell me something else. Nothing more than that. I mean, this is what I get. So this cannot be happening to me, not again. I mean, I again have no idea what all these numbers mean. So maybe there is a problem in the definition of a table.
04:43
So I went back to a very fundamental thing. I took a dictionary, and I looked for the term table. And it says that it's a list of entries that are identified by unique keys. And they contain related values.
05:03
They can be considered as an array of records in a relational database. It says that it has rows and columns. And in the intersection, you have the values. So what is missing? What is missing is the separation
05:20
and the annotation of concepts. And this is what I'm trying to convince you of today: that the OGC SensorThings API is able to do it a little bit better than this table. Let's try and take a temperature of 25 degrees in Barcelona yesterday, captured by some kind
05:42
of a sensor in a station. How can we separate concepts? Of course, we are talking about the temperature. So we need the definition of a temperature. Here you have one that says it's something related to hot and cold. Probably that's right. The temperature was 25 Celsius in Barcelona.
06:04
It tries to represent Barcelona, but it's actually an observatory uphill in this place. Here you have a nice picture, not from yesterday precisely. It says yesterday, so it has a date here.
06:20
And of course, it was captured by a modern digital sensor, like maybe this one. And it is part of a complete weather station with other instruments. So we have an observed property. This is the concept we are observing.
06:41
The SensorThings API calls it observed property. We have an observation that has a date and a number. We have a data stream with all these numbers in Celsius, actually. We have two locations. And this is because in the sensor arena, you have this separation between where the sensor is
07:04
and what is really measured. Sometimes it's exactly the same because the sensor is actually in that place. Sometimes it's very different because you have a satellite and you are doing remote sensing. So in this case, I already mentioned
07:20
that the feature of interest is Barcelona, while the location of the sensor is a little bit more precise. The sensor is this stuff. And the whole weather station is called a thing because this is about the Internet of Things. This standard is for the Internet of Things. So something needed to be called a thing. And in this case, what we could call the platform,
07:41
for instance, is called the thing in the SensorThings API. So, UML: what I explained to you is what you are seeing now as a UML diagram. There are a few more properties here and there in the different boxes. But the essence is what I already explained to you.
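The separation of concepts described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the standard's full data model: the class and field names are simplified, and the vocabulary URL, the date and the station names are placeholders that do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class ObservedProperty:
    name: str
    definition: str  # link to a vocabulary entry (placeholder URL below)


@dataclass
class Sensor:
    name: str


@dataclass
class Thing:
    name: str
    location: str  # where the platform physically is


@dataclass
class Datastream:
    thing: Thing
    sensor: Sensor
    observed_property: ObservedProperty
    unit_symbol: str


@dataclass
class FeatureOfInterest:
    name: str  # what the observation is meant to represent


@dataclass
class Observation:
    datastream: Datastream
    feature_of_interest: FeatureOfInterest
    phenomenon_time: str  # when the phenomenon was measured
    result: float


def describe(obs: Observation) -> str:
    """Spell out the what, where, when and how of one observation."""
    ds = obs.datastream
    return (
        f"{ds.observed_property.name} = {obs.result} {ds.unit_symbol} "
        f"for {obs.feature_of_interest.name} at {obs.phenomenon_time}, "
        f"measured by {ds.sensor.name} on {ds.thing.name} ({ds.thing.location})"
    )


# All values below are illustrative placeholders.
station = Thing(name="Weather station", location="observatory uphill from the city")
thermometer = Sensor(name="Digital thermometer")
air_temperature = ObservedProperty(
    name="Air temperature",
    definition="https://example.org/vocab/air-temperature",
)
stream = Datastream(station, thermometer, air_temperature, unit_symbol="°C")
observation = Observation(
    datastream=stream,
    feature_of_interest=FeatureOfInterest(name="Barcelona"),
    phenomenon_time="2023-01-01T12:00:00Z",
    result=25.0,
)
print(describe(observation))
```

Each box of the diagram becomes one small record, so the number 25 never travels without its unit, its concept, its place and its time.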
08:04
So actually, even if it looks a little bit scary, it's really, really clear and simple and nice. So we have each and every one of these concerns separated and associated with an actual concept.
08:22
So we have concepts that derive from the SensorThings API itself. So phenomenon time, for instance, to make it different from the acquisition time. The phenomenon time is actually when the observation was done. The sensor itself, the thing: the thing is in a location, while the observation represents a feature of interest.
08:42
All these things are concepts that the actual standard provides to you and allows you to separate and to make clear, nice, and independent. There are other concepts that are not in the SensorThings API because the standard cannot foresee each and every possible measurement. So two things that are really important
09:02
here are the units of measure, in this case, Celsius. But don't get tricked by the units of measure. The unit of measure is not enough to actually understand what is happening. And the example of the temperature is particularly interesting because there are so many. If you look at this vocabulary that I mentioned,
09:23
there are so many. And one possible one that I could mention is this dew point temperature that is related with the relative humidity. But it's not the temperature, it's another thing. It's another kind of temperature. They cannot be immediately used together if you don't know what
09:43
you are doing with that, and both are in Celsius, of course. So this is, to me, really, really important: to have these vocabularies absolutely integrated into the actual data set to reduce this ambiguity. This is how an OGC SensorThings API response in JSON encoding,
10:05
the only encoding provided these days, looks. In essence, it returns an array of values, and each value is actually an observation.
10:21
And here we have all we need. We have the time. We know it's the phenomenon time, so we know exactly. It's not the time the guy went there and downloaded the data. It's the actual measurement time. We have the measure, but immediately we know it belongs to a data stream that has units of measure.
10:41
This is not a temperature in this case; as we see, it is RH, which is actually linked to a concept. And the concept is relative humidity, and you can follow this link. We could try it, and you can follow this link, and you will get a definition of what relative humidity means.
11:01
And of course, it's presented in percent, and this database (well, it's actually a document, but anyway), this database of units of measure actually says to you, this is a minus 2, no? This is a thing that is a percentage.
11:23
And you can have other observations, but even if they are all together, you immediately recognize this is a different thing because that's definitely a temperature, and this is in Celsius, so I understand what it means if I'm not an American guy that prefers Fahrenheit.
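To make the shape of such a response concrete, here is a small Python sketch that reads a response of this kind. The JSON below is hand-written for illustration and was not copied from the slide: the values and the vocabulary links are placeholders, and the layout assumes the request expanded each observation's Datastream and ObservedProperty.

```python
import json

# Illustrative response: an array called "value", one entry per observation.
SAMPLE_RESPONSE = """
{
  "value": [
    {
      "phenomenonTime": "2023-01-01T12:00:00Z",
      "result": 63.0,
      "Datastream": {
        "unitOfMeasurement": {
          "name": "percent",
          "symbol": "%",
          "definition": "https://example.org/unit/percent"
        },
        "ObservedProperty": {
          "name": "Relative humidity",
          "definition": "https://example.org/vocab/relative-humidity"
        }
      }
    },
    {
      "phenomenonTime": "2023-01-01T12:00:00Z",
      "result": 25.0,
      "Datastream": {
        "unitOfMeasurement": {
          "name": "degree Celsius",
          "symbol": "°C",
          "definition": "https://example.org/unit/degree-celsius"
        },
        "ObservedProperty": {
          "name": "Air temperature",
          "definition": "https://example.org/vocab/air-temperature"
        }
      }
    }
  ]
}
"""


def summarize(response_text: str) -> list[tuple[str, float, str, str, str]]:
    """Return (time, result, unit symbol, property name, property link) rows."""
    rows = []
    for obs in json.loads(response_text)["value"]:
        stream = obs["Datastream"]
        prop = stream["ObservedProperty"]
        rows.append((
            obs["phenomenonTime"],
            obs["result"],
            stream["unitOfMeasurement"]["symbol"],
            prop["name"],
            prop["definition"],
        ))
    return rows


for row in summarize(SAMPLE_RESPONSE):
    print(row)
```

Unlike a bare table row, every number here arrives together with its unit and a link to the definition of the concept it measures.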
11:40
Anyway, so I understand. I'm happy. I know the what. I know the where. I know the how. I know the when. I know the who. I know almost everything. I have all the knowledge I need. I can start playing with the data immediately. Did I say the who? If you were paying attention, actually
12:02
the who was never mentioned because this is the SensorThings API. So it is really interested in the thing. It assumes the thing is automatic. It assumes the thing is connected to the internet, so there is no need for anyone in the process, in the loop.
12:21
But sometimes you can try and extend this because there are circumstances where it's necessary. And one of the things that we found out is when we try to apply this very same standard to situations where these weather stations or air quality stations, as we are doing in this other project
12:42
that is called CitiObs, are actually managed and owned by the citizens. The citizen is important. The citizen deserves recognition. The citizen wants recognition sometimes. He or she wants privacy too, but he or she also wants recognition.
13:03
So we need to add some things to the model. But the model is extensible, so we can actually add these blue boxes there that characterize the who, including the concerns of the GDPR that are behind the business logic of the API.
13:26
It includes the concept of campaigns, which is very common in citizen science. It includes licenses because this time, this is not a homogeneous data set. It's done by different people. Some of them are very open. Some of them could have some kind of concerns
13:41
or want to have a CC BY because they want recognition. So licenses can vary. And there is this concept of collaboration, grouping observations, and also relating observations together that can be in the actual SensorThings API instance
14:02
or outside. But this is not all that the SensorThings API provides to you. It provides an API because that's why it's called SensorThings API, of course. And this API is not one of these fancy new OGC OpenAPI things.
14:21
It's more like the other fancy OData-inspired way of doing APIs. So you can play with the URLs and start discovering things one by one. So you can get all the stations.
14:40
And this is a real example that I was playing with. This is an air quality data set that is constantly updated in the Netherlands. And well, this is the way. The first one is the way you get all your stations, all your features of interest there.
15:01
Then you can ask for one particular feature of interest using these parentheses with the number. Or you can ask for the observations in this feature of interest. So you will get all these weather station variables one after the other. And you can start playing with the OData thing.
15:21
So you can select individual attributes. You can expand different attributes in a way that you get exactly the nice response you require in that particular moment. And this is where you get this one. So the URL that I'm presenting here
15:41
looks a little bit long and complicated, but it's the actual way that you get this compact view of n observations. Here, I simplified it for you to two observations only. There is also spatial filtering.
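The URL patterns just described can be put together with a few string helpers. This is a sketch under assumptions: the base URL is a placeholder rather than the Dutch air quality service from the slides, and the helper names are invented for this example. The path segments and the `$select`, `$expand` and `$top` options are the ones the SensorThings API takes from OData.

```python
# Placeholder endpoint; replace it with the root of a real service.
BASE = "https://example.org/sta/v1.1"


def collection(name: str) -> str:
    """All entities of one kind, e.g. every feature of interest."""
    return f"{BASE}/{name}"


def entity(name: str, entity_id: int) -> str:
    """One entity, addressed with its identifier in parentheses."""
    return f"{BASE}/{name}({entity_id})"


def related(name: str, entity_id: int, relation: str) -> str:
    """Entities linked to one entity, e.g. its observations."""
    return f"{entity(name, entity_id)}/{relation}"


def with_options(url: str, **options: object) -> str:
    """Append OData query options such as $select, $expand or $top."""
    if not options:
        return url
    query = "&".join(f"${key}={value}" for key, value in options.items())
    return f"{url}?{query}"


print(collection("FeaturesOfInterest"))
print(entity("FeaturesOfInterest", 1))
print(related("FeaturesOfInterest", 1, "Observations"))
print(with_options(
    related("FeaturesOfInterest", 1, "Observations"),
    select="phenomenonTime,result",
    expand="Datastream($select=unitOfMeasurement;$expand=ObservedProperty)",
    top=2,
))
```

The strings are left unencoded so that they stay readable; an HTTP client would normally percent-encode the query part before sending it.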
16:01
We are in the Open Geospatial Consortium. So of course, there should be. And here below, you can see the way you do it. It looks a little bit scary. But the answer is actually not. It's saying that I want what is within a geographic polygon
16:23
that is actually a square that has five corners. One is repeated. This is a polygon. And you get the features that are in that rectangle. So you can populate your map. And here, you have two examples of how to do that.
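The five-corner polygon can be generated from a bounding box. The sketch below assumes a placeholder base URL and made-up coordinates, and the helper names are invented; the `st_within` function and the `geography'...'` literal are the spatial filter syntax of the SensorThings API.

```python
from urllib.parse import quote

# Placeholder endpoint; replace it with the root of a real service.
BASE = "https://example.org/sta/v1.1"


def bbox_polygon(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """WKT polygon for a rectangle: four corners plus the first one repeated."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # closing the ring repeats the first corner
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def within_filter(wkt: str) -> str:
    """Filter expression selecting features of interest inside the polygon."""
    return f"st_within(feature, geography'{wkt}')"


def features_in_bbox_url(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Full request URL, with the filter expression percent-encoded."""
    expression = within_filter(bbox_polygon(min_lon, min_lat, max_lon, max_lat))
    return f"{BASE}/FeaturesOfInterest?$filter={quote(expression)}"


# Made-up rectangle, longitude first and latitude second.
print(bbox_polygon(4.0, 52.0, 5.0, 53.0))
print(features_in_bbox_url(4.0, 52.0, 5.0, 53.0))
```

The features returned by such a request are the ones a map client would draw inside the current view.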
16:42
There is a library that is called STAM that can be combined with Leaflet or OpenLayers to directly access SensorThings API data. Or you can also use our MiraMon Map Browser, where we are extending the support to the SensorThings API.
17:02
We did this experiment with biodiversity data, also mapped into the SensorThings API. In the biodiversity observations, there is not much of a sensor there. It's actually an expert looking at this bird in particular and providing a name. But more or less, you can also map it
17:22
to the SensorThings API with no problem. So the fact that we are able to actually associate the necessary concepts to these rows and columns creates a common understanding and immediately creates the capacity to collaborate.
17:41
So it's lowering the risk of misusing the data by just guessing what the numbers are. It allows us to visualize different concepts together and see their correlations. Because if we are using exactly the same standard, we will be able to, with the same tool, actually load three, four, five different services
18:02
and see everything together. It allows us to aggregate data that express the same concept. So if we know the concept is the same, then we can be sure that both data sets are compatible. We can aggregate them into a bigger one that could have a larger extent, whatever,
18:20
so we extend our knowledge. Then, using those as inputs into the models: the models have actual requirements on how the data should look. If we know the concepts, we can be sure that the inputs are compatible with the requirements of the model. And we are not creating rubbish, but we
18:42
are creating knowledge. And in summary, it allows for multidisciplinary research and innovation. So that's definitely something that is important. And actually, it gives us this way
19:00
because vocabularies are out there. And there have been some discussions in the data spaces also this morning. I attended another workshop on that. The vocabularies are out there. But the problem is not that. The problem is how we can actually associate the vocabularies to the actual data. That's the key aspect.
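Once the vocabulary links travel with the data, the compatibility check mentioned for aggregation becomes mechanical. The following Python sketch compares two data streams by the definition of their observed property and of their unit. The dictionaries mimic the relevant part of a Datastream entity, and all names and links are placeholders invented for this example.

```python
def same_concept(stream_a: dict, stream_b: dict) -> bool:
    """True when both streams observe the same property in the same unit."""
    same_property = (
        stream_a["ObservedProperty"]["definition"]
        == stream_b["ObservedProperty"]["definition"]
    )
    same_unit = (
        stream_a["unitOfMeasurement"]["definition"]
        == stream_b["unitOfMeasurement"]["definition"]
    )
    return same_property and same_unit


# Placeholder vocabulary links, not taken from a real vocabulary.
CELSIUS = {"symbol": "°C", "definition": "https://example.org/unit/degree-celsius"}
AIR_TEMPERATURE = {"definition": "https://example.org/vocab/air-temperature"}
DEW_POINT = {"definition": "https://example.org/vocab/dew-point-temperature"}

station_a_air = {"unitOfMeasurement": CELSIUS, "ObservedProperty": AIR_TEMPERATURE}
station_b_air = {"unitOfMeasurement": CELSIUS, "ObservedProperty": AIR_TEMPERATURE}
station_b_dew = {"unitOfMeasurement": CELSIUS, "ObservedProperty": DEW_POINT}

print(same_concept(station_a_air, station_b_air))  # both are air temperature
print(same_concept(station_a_air, station_b_dew))  # same unit, different concept
```

The second comparison is the dew point case from earlier in the talk: both streams are in Celsius, yet they should not be merged.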
19:20
So for the moment, where we are is: the Open Geospatial Consortium has had the SensorThings API as a standard for quite a while. There was a version 1. There was a version 1.1 that corrects small things, actually. They look almost identical, except for a few details.
19:44
We would like to see it used more and more for in situ data, in the data spaces, in these data cubes, and in other kinds of infrastructures that we are trying to create. And in particular, also, of course,
20:02
in the Open-Earth-Monitor community. So that's why we are now trying to convince you in this workshop, too. And this is because we want to replace this ambiguous CSV table distribution that everybody
20:22
seems to use and even love, but that has its problems. And about SensorThings API Plus (STAplus), this is the extension for citizen science and beyond. It was introduced as a best practice in the OGC last year,
20:40
but we have been working to have it on the standards track. We hope that it can be added as a new extension for the SensorThings API in the third quarter of this year. So thanks.
21:00
I have a question, if I may start. Oh, OK. I was looking at this. I have problems pronouncing it. QUDT. QUDT. Yeah. So I see that's kind of a not-for-profit organization.
21:21
But how do you contribute to new variables? How do you contribute to it? What if the variables are not available? Yeah, that's definitely a good question. I mean, the immediate answer that I can give you is you can use whatever vocabulary you want. So actually, I was looking for, in this particular vocabulary,
21:47
I was trying to see if there is a difference between indoor temperature, so the temperature in my home, and air temperature outside. And I was not able to find it.
22:02
So that was my first problem with this vocabulary. So you are not forced to use this one. You can create your own. You can relate those vocabularies using the mechanisms in linked data to do that. How to contribute to this particular organization?
22:20
I haven't done that in the past. And I do not have a direct answer. OK, thank you. OK.
22:40
About aligning this with OpenAPI: it is not possible to describe completely in OpenAPI what OData allows you to do. So having a version of this that uses OpenAPI instead of OData,
23:01
it would be kind of a regression in the sense that you would give up some functionality there. That's the recursivity that allows you to travel from one path to the next and add things dynamically. However, what is definitely possible is to provide a subset of what is
23:23
foreseen in the OData protocol as an OpenAPI document. So you can partially describe your API in OpenAPI. And I believe we need to try and do something like that soon because this is a recurrent question in the OGC, which seems to favor OpenAPI,
23:44
at least for the rest of the components, I mean, features. So the question remains, and something needs to be done. So efforts are possible, and we will work in that direction.
24:04
About STAC, OK, about STAC I have to think a little bit more. STAC is this catalog of assets normally used for remote sensing. Well, I suppose that some, I mean,
24:25
the data stream is actually representing more or less the different small data sets that you can extract from here. So making the data streams available
24:41
as different STAC records and cataloging them there, this is another thing that is perfectly possible. I'm not so familiar with how STAC is doing things for in situ data, but it's also true that the SensorThings API can be used for remote sensing.
25:01
This is another discussion we can have. So thank you for the question. I don't have a concrete answer, but let's work in that direction. Yes, that's, do I have a direct example?
25:21
I don't think I do, at least off the top of my head. Every time that you need to face a new theme, you actually need to analyze which are the variables you are collecting, which are the units of measure, which are the actual results, whether you have time associated, et cetera, et cetera,
25:42
and you have to map it into this data model. So that is the trick. And this is what we started to do in Cos4Cloud. This is the citizen science Horizon 2020 project that just concluded two months ago or something like that,
26:01
for biodiversity. So we had these observatories of biodiversity. So there were people identifying animals and who is the sensor, how it's done, what is the actual atomic observation that you have to catalog, which are the relevant observed
26:21
properties. If the same guy is taking a picture, those are two different observations. One is the picture, the other one is the actual identification of the animal. Probably those are two observations that are independent but need to be related together. So this is the kind of decision you have to make. And my recommendation is precisely that.
26:43
Look at the data set you have and try to separate these concepts. And that was actually the mission of my animation here, to try and show you precisely how those concepts are separated and mapped into the SensorThings API
27:01
instead of showing you the UML directly. You would be dead immediately. So that's why I was trying to explain the process of how this needs to be done. And I hope that is useful for you.