Harmonizing the conceptualization of observation parameters
Formal metadata
Title |
Series title | ||
Number of parts | 23 | |
Author | ||
License | CC Attribution 3.0 Unported: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify. | |
Identifiers | 10.5446/40191 (DOI) | |
Publisher | ||
Publication year | ||
Language |
Content metadata
Subject area | ||
Genre | ||
Abstract |
Transcript: English (auto-generated)
00:00
So, I am working on harmonizing the conceptualization of observation parameters. I would like to introduce myself a little bit. I'm a semantic analyst and data manager at the Environment Agency Austria, and I co-developed SERONTO, so this is a socio-ecological research and observation
00:26
ontology, from 2009 until 2011. This ontology has not been used much because we had no money left to continue the research
00:42
on that, but we continued to work on vocabularies and we developed EnvThes, which was based on the SERONTO findings, and I'm the coordinator of EnvThes, and EnvThes is a vocabulary used
01:03
by a research infrastructure called Long-Term Ecosystem Research, LTER. There is the international one, ILTER, and LTER-Europe, and I'm coordinating it for Europe. And, yeah, there's also the worldwide LTER, so the ILTER is the international one, the worldwide one.
01:31
Okay, in the U.S. they are using another vocabulary which is compatible with EnvThes,
01:45
so we helped each other, and in the coming months we are going to extend each other's vocabulary, so this will, yeah, be very compatible for researchers
02:01
working with different datasets. Now I am involved in ENVRIplus. ENVRIplus is a project dealing with a lot of different environmental research infrastructures, and I'm involved in the data theme, where I also developed a reference model with other researchers for ENVRIplus, and I'm also coordinating
02:27
the provenance task there. And ENVRIplus will terminate in April next year, but it will be followed up by ENVRI-FAIR, so another four years of project work building on the
02:46
research done so far. I'm also involved in the RDA, the Research Data Alliance, which is an international community aiming to enable open sharing and reuse of data, and there I'm
03:07
also involved in the VSSIG, that's the Vocabulary and Semantic Services Interest Group led by Simon Cox and others, and this VSSIG interest group is
03:23
organized in different task groups, and one of these task groups started at the beginning of this year, and I was interested in bringing together different people with the same objective,
03:45
which is the harmonization of measurement parameters, and I lead this together with Michael Diepenbroek, who works at PANGAEA.
04:03
So, there is a rather large group of stakeholders involved. We have different research infrastructure representatives: from long-term ecosystem research, that's me, Alessandro Oggioni, and Philip Trembath.
04:21
Then we have the ILTER, from the ILTER network, Kristin Vanderbilt, and I'm invited in October to speak also in California at their, sorry, I made an error. It is not ILTER, it is US LTER.
04:40
It's the U.S. long-term ecosystem research network there, and I'm invited to talk to them and, yeah, to harmonize these different approaches also with them. Then ICOS is involved, AnaEE, LifeWatch, AquaDiva.
05:03
These are all different research infrastructures dealing with different domains in their environmental research, and lots of people who are partly also involved in others, and partly not, like those from AquaDiva.
05:25
And we are also working together with data centers like PANGAEA, which is led by Michael Diepenbroek, with BODC and SeaDataNet, Alexandra Kokkinaki, probably you know her,
05:45
and GFBio, Naouel Karam, so different data centers. And we are also working together with different representatives of terminologies and ontologies. So, BioPortal, John Graybeal, and ENVO.
06:03
So, Pier Luigi Buttigieg is also interested in joining this effort, and Chris Mungall, and also ePIC with Ulrich Schwardmann. And also Markus Stocker.
06:20
I'm not sure if he has joined us. Do you see if he's there? I don't think he's joined, Barbara, but Simon Cox has. He's joining us. Oh, perfect. And just to point out to other people, John Graybeal has presented in this forum before, and what we see here is quite a small community, I think, because I and certainly
06:40
Simon and you, Barbara, know quite a few of the people that are listed on this slide here. So, this community of ours, the vocabulary we're working on, is not enormous. At least, environmental vocabulary is not enormous. No, no, but for that, that we started, we had to work together. I think it's a good starting point, yeah?
07:00
So, and we are completely interested to enlarge this group, and also to invite you to join this effort or to see how we can learn from each other. So, but for me, it was really interesting to see that the interest grows on it, because
07:21
it seems that there are lots of approaches, but not everything is solved yet. So, that's what we found out so far. So, what we intend to do in this task group is to develop best practices and generally
07:41
accepted models for scientific observation and measurement parameters, including also, if possible, measurement methods and devices, by using agreed core terminologies. So, this is because we want to annotate research data with these vocabularies,
08:01
and, in the end, to be able to improve interoperability for data discovery and data integration. So, the challenge is, we want to analyze ecological phenomena across geographic,
08:21
temporal, and biological scales, and this requires a large variety of existing datasets, which might derive from different data centers. So, we are dealing with observational data, which is often represented in tabular form,
08:43
and this data differs in the number of attributes, the relationships implied between the attributes, and the coding conventions used for representing information within datasets. So, that's not as trivial as it might seem in the beginning, because
09:10
although there are already a lot of different schemas trying to solve this issue, these schemas are very different in scale and are incompatible with each other.
09:28
And they capture the data semantics at different levels of complexity. So, they describe the complexity
09:41
in different ways, and sometimes they stay at a very rough level. They provide semantics for specific domains, and for each domain there are different approaches. They indicate the admitted value of attributes, sometimes very vaguely,
10:06
and they may or may not account for the specification of units. So, most schemas capture data semantics insufficiently by conflating associated attributes, and thus are not suitable for describing complex parameters correctly and unambiguously. And I will show it only on a few
10:25
parameters here, or on a few examples. So, this is what we always have to deal with: observational data. They come along as tabular information with columns
10:45
indicating different attributes. It's not always clear what an abbreviation means, and researchers very often have to interpret and compare and try to find out what was really meant.
11:06
Sorry, I'm at home and I have a dog, and sometimes he barks, and I'm sorry about that. So, it is not always easy to understand what is meant by the different columns,
11:26
but to come to the point, what we mean by conceptualization is that there's a property, a monitored property, which could be described in a sentence like monthly
11:42
mean dissolved lead in parts per billion in water taken from the river Thames by sampling. And so, using the O&M nomenclature, we would say that we could differentiate between a feature
12:05
and an observable property and the process. And in this case, the feature, the object in nature which has a location, so x and y coordinates,
12:24
would be the river Thames. The observable property would be the monthly mean dissolved lead in water, and the process would be sampling, and this could then be specified further. But
12:42
the issue which makes it more difficult is to try to decompose the middle element here, the observable property, into atomic elements. So, for example, we could ask what is meant
13:03
by monthly mean here, what is dissolved, lead, concentration, the ppb, water. So, how can we call them, and what are these elements? And we can see that this is dealt with very differently
13:24
in the different approaches. For example, SERONTO core, which I was developing with my LTER team, does not go into detail when we talk about this atomization. So, what we have here is that
13:52
we have in the middle the value set, which is the observation, more or less, and which has an investigation object we are observing, and the parameter method is a
14:09
complex element which has a parameter and has a method. But how this parameter itself is composed is not clear here. It leaves it open. And if you look at OBOE, for example,
14:33
which is used by AquaDiva and by AnaEE, we can find out that the model identifies entities
14:41
or objects being observed and observation of entities and their corresponding measurements. And for each measurement, there is the value of a characteristic of the entity according to a measurement standard or a protocol and the context assumed by each measurement and observation.
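The structure just described can be sketched as a small data model. This is only an illustration of the wording in the talk, not the OBOE ontology itself: the class and field names are assumptions, and the instance reuses the Thames example mentioned earlier, with the value left empty because the talk gives no numbers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    characteristic: str             # what is measured about the entity
    standard: str                   # unit or other measurement standard
    protocol: Optional[str] = None  # how the value was obtained
    value: Optional[float] = None   # left empty: no numbers are given in the talk


@dataclass
class Observation:
    entity: str  # the thing being observed
    measurements: List[Measurement] = field(default_factory=list)
    # Other observations that provide the context for this one.
    context: List["Observation"] = field(default_factory=list)


# Hypothetical instance: the water observation is made in the context of the river.
thames = Observation(entity="river Thames")
water = Observation(
    entity="water",
    measurements=[
        Measurement(
            characteristic="monthly mean dissolved lead concentration",
            standard="parts per billion",
            protocol="sampling",
        )
    ],
    context=[thames],
)
```

Note that the whole phrase "monthly mean dissolved lead concentration" sits in a single characteristic slot here; nothing in this structure says how it should be broken down further.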
15:03
So, but it is not really clear what this characteristic should be, what it is about, yeah, this characteristic. And I make this a bit more clear
15:21
with an example. I'm trying to describe this example, tree diameter at breast height, and you could say that the observation has an entity, in this case tree, and the measurement on that tree would be using a characteristic, which is the diameter at breast height,
15:49
and this measurement has a value and uses a standard. But we could also say, and nobody would hinder us using OBOE that way, that the characteristic in this case is diameter,
16:04
and the protocol would be measured at breast height. So it's up to the researcher then to decide how to describe it within OBOE. And we can see that in the different
16:21
research infrastructures we have in our group, AnaEE and AquaDiva, they use it differently. So if we then try to describe more complex parameters, like concentration of nitrate in soil water, we find that OBOE is limited because the characteristic in that case
16:49
is the nitrate concentration. So you then have to bring two concepts or two
17:02
elements together in one concept. But you could also say that it might be important to differentiate between concentration and nitrate. So it's just to make this clear, and also soil water, which is the entity, could be split into soil and water, yeah, because
17:23
you are observing the nitrate concentration in the water of the matrix soil, and this is not depicted here. And if you look at the Observations and Measurements model of Simon Cox, then you see that
17:43
we talk here about observed properties, about the phenomenon, but we are not going into detail what this phenomenon could be. But there are extensions of this. And I will go
18:02
soon explaining this, but I just wanted to compare O&M and OBOE, and this is more or less rather simple to compare. There could be some... I think it's rather clear how they could be compared, and I tried it here.
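The ambiguity described above for tree diameter at breast height can be made concrete with a small sketch. Both dictionaries are hypothetical encodings of the same parameter in an OBOE-like structure; the slot names and helper functions are assumptions for illustration only.

```python
# Two OBOE-style descriptions of the same parameter, as plain dictionaries.
encoding_a = {
    "entity": "tree",
    "characteristic": "diameter at breast height",
    "protocol": None,
}
encoding_b = {
    "entity": "tree",
    "characteristic": "diameter",
    "protocol": "measured at breast height",
}


def same_characteristic(a, b):
    """Naive match: same entity and literally the same characteristic."""
    return a["entity"] == b["entity"] and a["characteristic"] == b["characteristic"]


def differing_slots(a, b):
    """Names of the slots in which the two encodings disagree."""
    return sorted(key for key in a if a[key] != b[key])


print(same_characteristic(encoding_a, encoding_b))
print(differing_slots(encoding_a, encoding_b))
```

A naive comparison treats the two records as different parameters even though they describe the same measurement, which is why a shared convention or a mapping between such encodings is needed.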
18:26
But still, when it comes to properties, they both don't have a really good solution. When I came across Leadbetter and Vodden, they use an extension of O&M, and say that the
18:46
observable property, the phenomenon types, could be better described if they are split into
19:01
different other concepts, like object of interest and property, but also matrix and statistical measures and so on. So they go for the approach of atomizing this observable property into different other elements. And
19:31
in EnvThes, I try to follow this scheme and say, okay,
19:44
if we are looking at the concentration of sulfate in soil water, we can atomize this description into different elements and say concentration would be the property, sulfate would be the object of interest,
20:05
per unit volume the unit, and soil water would be the matrix, and lysimeter the device, for example. And in EnvThes, we also have the compound concept used by the scientists,
20:24
which would then be the parameter. The parameter is then a compound description of what could then be split into different elements, atomic concepts. So you would find both
20:42
levels of description. And for simplicity reasons, we would not have concentration of sulfate in soil water, but concentration of sulfate, and I'm not sure if this is
21:01
the best way to do it, because it's always a compromise between what is used by the scientists and what would be sensible to use for consistency reasons. So EnvThes is not finished, it is in continuous evolution, and it should also be like that. And probably,
21:29
we will come up with different solutions so that we could have concentration of sulfate as a parent, and then differentiate where this concentration of sulfate could be
21:43
given other parameter specifications. But this is not really decided; the status is that we have concentration of sulfate as a parameter.
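The atomization described for EnvThes can be sketched as a record that keeps both levels: the compound parameter used by scientists and the atomic concepts it is split into. The class and field names below are assumptions for illustration, not the actual EnvThes schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AtomizedParameter:
    label: str               # compound concept as used by scientists
    measured_property: str   # e.g. concentration
    object_of_interest: str  # e.g. sulfate
    unit: str                # e.g. per unit volume
    matrix: str              # e.g. soil water
    device: str              # e.g. lysimeter

    def atoms(self):
        """Return only the atomic concepts, without the compound label."""
        parts = asdict(self)
        parts.pop("label")
        return parts


sulfate = AtomizedParameter(
    label="concentration of sulfate",
    measured_property="concentration",
    object_of_interest="sulfate",
    unit="per unit volume",
    matrix="soil water",
    device="lysimeter",
)
print(sulfate.label)
print(sulfate.atoms())
```

Keeping the matrix as a separate slot is what allows the compound label to stay short ("concentration of sulfate") while the soil water context remains available for search and comparison.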
22:00
And what could also, so if you are interested in knowing more about EnvThes, we can also talk about that afterwards, because I am more of an expert there, so I can explain more later on EnvThes and on the other models presented here. Then in PANGAEA, they also try
22:23
to describe the different parameters depending on the complexity. And there are some approaches proposed by Robert Huber, but it is not fixed yet, because PANGAEA,
22:47
as Michael Diepenbroek says, really looks for solutions which are then accepted not only for PANGAEA, for Germany, but they want to use data from the whole of Europe and other
23:08
parts of the world and then try to really find a common approach or a mappable approach. So, they are still open to change this according to the outcome of this group.
23:26
So, this is all in discussion. And BODC, so this is also a data center, in the UK, which is linked to SeaDataNet, and they also seem to follow this
23:50
approach of atomizing complex properties. And they also have a composer for the description
24:08
of parameters. And to compose this description, they use vocabularies for each of these elements. So, for example, for the measurement property, they have the vocabulary S06, the terms of which
24:24
are, for example, temperature, uptake rate, abundance, concentration, and the entity or object could be a biological entity or chemical entity or physical quantity. And for each of them, they have specific vocabularies they use, or they have the environmental matrix compartment,
24:51
or the measurement matrix relationship, which is helping to compose the sentence of the observation.
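The idea of composing a parameter description from one controlled vocabulary per element can be sketched as follows. The vocabularies here are tiny hypothetical stand-ins, and the sentence template is an assumption; this is not the actual BODC composer or its vocabulary content.

```python
# Tiny hypothetical vocabularies, one per element of a parameter description.
PROPERTIES = {"temperature", "uptake rate", "abundance", "concentration"}
ENTITIES = {"sulfate", "nitrate", "lead"}
RELATIONSHIPS = {"in", "per unit volume of"}
MATRICES = {"soil water", "water"}


def compose_label(prop, entity, relationship, matrix):
    """Build a parameter label, accepting only terms from the vocabularies."""
    checks = [
        (prop, PROPERTIES, "property"),
        (entity, ENTITIES, "entity"),
        (relationship, RELATIONSHIPS, "relationship"),
        (matrix, MATRICES, "matrix"),
    ]
    for term, vocabulary, name in checks:
        if term not in vocabulary:
            raise ValueError(f"{term!r} is not in the {name} vocabulary")
    return f"{prop} of {entity} {relationship} {matrix}"


print(compose_label("concentration", "sulfate", "in", "soil water"))
```

Because every slot is filled from its own vocabulary, two data centers that compose labels this way can compare parameters slot by slot instead of comparing free-text strings.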
25:00
So, it seems, though I'm not sure, and I'm sure that we will find some inconsistencies there too with the approach I presented before, that it goes in the same direction. So, what we did so far: we had six meetings where each of us presented
25:25
our approach. And we found out that we need to go into much more detail and really
25:43
and compare. So, describe complex use cases with our own approaches, then analyze the real differences between the different approaches, and then find out where
26:02
the gap is. Is there a possibility to map it so that it is unambiguous? Can we then find a common approach? What can we do? And this is not done just by presenting each other's approach. This is done by making a real working group out of it.
26:26
So, we decided to prepare a case statement to become a working group. And this will then start this autumn. So, I cannot present any news here in the sense
26:43
that we have already invented a new approach, but there is a big intention to go in this direction. So, this is not the roadmap because it has not been agreed so far, but it's some
27:01
ideas. So, we want to agree on core terms so that if I say property, everybody in the group understands what I mean by property. That's not so sure because everybody has a different connotation for this term. So, we have to find our own language. Whatever then at the end
27:21
stands there for a specific concept. For me, it doesn't matter which name we use then. We have to agree on terminologies, on already existing terminologies for core elements,
27:40
for example, for chemical elements. Then choose good use cases with different complexity degrees. Then try to describe these specific use cases with each other's
28:03
approaches we have in our communities. And then compare them and find out the gaps, differences and whatever. Come to a conclusion where we overlap.
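The comparison step can be sketched on the nitrate example from earlier in the talk. The two descriptions and their slot names are hypothetical, and comparing by literal term values is a deliberately simple assumption; a real comparison would need agreed core terms or a mapping.

```python
def atomic_terms(description):
    """The set of terms an approach expresses as separate values."""
    return set(description.values())


def compare(first, second):
    """Report where two descriptions overlap and where each stands alone."""
    a, b = atomic_terms(first), atomic_terms(second)
    return {"shared": a & b, "only_first": a - b, "only_second": b - a}


# "Concentration of nitrate in soil water" described in two hypothetical ways.
oboe_style = {
    "entity": "soil water",
    "characteristic": "nitrate concentration",
}
atomized = {
    "matrix": "soil water",
    "object_of_interest": "nitrate",
    "property": "concentration",
}

result = compare(oboe_style, atomized)
print(result)
```

Here the compound term "nitrate concentration" appears only in the first description, while "nitrate" and "concentration" appear only in the second, which is the kind of gap a mapping scheme would have to bridge.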
28:21
And out of that, develop a common model or mapping scheme between all of these different approaches. And then produce guidelines, write publications and so on. But for doing this, we need also some money, a project behind, and we are working hard on getting some funding for
28:47
this work. But I'm optimistic. I'm sure we will find a way to work on this because we really need to go on with this. So, for sure we are looking to involve other important
29:09
stakeholders. I will also try to bring together the people after the long summer period
29:22
again in autumn. And I invite also you to join us. That's what I wanted to tell you. So, if you have any questions, please just go ahead. Thank you.