
ESIP EnviroSensing Cluster Pt. 2 - Cluster Projects & Highlights - Nov 18


Formal Metadata

Title
ESIP EnviroSensing Cluster Pt. 2 - Cluster Projects & Highlights - Nov 18
Series Title
Number of Parts
19
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt, copy, distribute and transmit the work or content in changed or unchanged form for any legal purpose, as long as the work is attributed to the author/rights holder in the manner specified by the author or licensor.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Topic: Cart before the horse: system QA and data QC practices for sensor networks. The ESIP EnviroSensing Cluster is a workgroup within the Earth Science Information Partners Federation. The cluster was formed in 2012 as a community of mostly NSF-LTER network and data managers. 5. QA for Mountain Observatories. Speaker: Dr. Scotty Strachan, NevCAN
Transcript: English (automatically generated)
You can see mine and I will make this active. Here we go. Okay, so I'm gonna talk a little bit about quality assurance at NevCAN, the Nevada Climate-ecohydrological Assessment Network. I'm gonna just hopefully get through it pretty quick, talk a little bit about the network
and how we are keeping long-term observations running, hopefully in a uniform manner. So a quick overview of the network. We have 12 core sites that involve 10 meter towers with a number of sensors. We use Campbell Scientific data logger equipment and we operate on a three second scan interval
for most of the data loggers, so we're integrating over a very high time frequency there, and we run one minute, 10 minute and 60 minute tables that are compatible with different regional meteorological data standards that we share data out to.
In some cases, we have over 50 variables on a site depending on what we're trying to measure, and we're connected in real time: we basically ping the data loggers every five minutes and bring the data back. Should the network go down, we have over a year of local storage for all of the tables onboard the data loggers.
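To make the table idea concrete, here is a minimal sketch of how high-frequency scans could be rolled up into 1-, 10-, and 60-minute tables offline. The file name and column names are hypothetical; the loggers do this aggregation onboard, so this illustrates the concept rather than NevCAN's actual pipeline.

```python
# Sketch: aggregate 3-second scans into the 1-, 10-, and 60-minute tables
# described in the talk. File and column names are illustrative assumptions.
import pandas as pd

scans = pd.read_csv("site_scans.csv", parse_dates=["timestamp"],
                    index_col="timestamp")

tables = {}
for rule in ("1min", "10min", "60min"):
    # Interval statistics; real logger tables also carry totals, samples,
    # etc., depending on the variable.
    tables[rule] = scans.resample(rule).agg(
        {"air_temp": ["mean", "max", "min"], "wind_speed": "mean"}
    )

tables["10min"].to_csv("site_table_10min.csv")
```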
We also have pan-tilt-zoom cameras that, every hour in the daylight, rotate around and take a picture of 20 different scenes, including the cardinal directions, instruments and items of interest on the sites. We are also connected to the national phenology camera network that the Harvard Forest, Harvard University, University of New Hampshire group
has been maintaining; I think we have four of those cameras out on the network right now, and every 30 minutes we upload both near-IR and visible spectrum images to that network. We are measuring things of interest besides weather, like tree sap flow and diameter,
snow water equivalent and these sorts of things. Since 2011, we have accumulated over 3 billion sensor observations in the database and over 4 million camera images, not counting the phenology cameras. So we're starting to get to the point where we have quite a bit of data going on, and the question that we had to answer as we built this
was really: how do we construct a useful, quality long-term network? We operate up here in the Intermountain West of North America, in this region called the Great Basin, which is kind of a high cold desert. Down in the south,
where it transitions into the Mojave Desert near Las Vegas, it becomes a hot desert. It still gets cold in the winter, but it gets very hot in the summertime, if you've ever been to Vegas. Our sites cover quite a range of elevation, from below 1000 meters in the south up to 3300 meters near tree line
up in the north transect. So we're really looking at mountain gradients and changes across vegetation zones along those gradients. Under these conditions, building and maintaining a network is difficult and in theory would require a lot of man hours to do.
So how do you reduce the expense of maintaining this kind of a network? There's this issue of sustainability, and at least in the United States, getting the National Science Foundation and the federal government into a sustained funding mode
for these kinds of long-term networks has proven difficult, even within things like the LTER program. I would say every single LTER data manager would argue that they're significantly underfunded for their mission. So when planning and deploying these kinds of sensor systems, uh oh, looks like Microsoft PowerPoint's
trying to stop on me. I'm gonna fire this back up. Sorry folks. Maybe I have too many applications open. But all right, let's see what happens. Okay, hopefully we can see this.
We'll go through to here. All right, please don't stop, PowerPoint. So when we're planning these kinds of system deployments, whether you've gotten a grant to do this, you're pursuing a grant to do this, or you wanna expand your network, before you jump off the dock and land in the water, there are a few items
that you wanna spend a lot of time on up front: engineering the systems, making sure that you have the right people on board who can handle the variety of systems and the long-term maintenance issues, and then thinking really hard about sustainability. So when I talk about sensor networks to people, I really get to three key issues. First, what's your science question?
What are you trying to acquire the data for? Make sure that you have the right sensors for the job as well as the siting considerations and where you're locating them. Think about what your level of acceptable reliability is. Are you okay with it being down for six months, for 12 months, for two minutes? And design the support systems to that. So the support systems, things like the structure,
the protection from wildlife or from electrical disturbances; you need to be concerned with things like redundancy. Power supply, of course, is one of the big issues. And then there's this idea of connectivity and telemetry, which is really the force multiplier when it comes to both accelerating the science
that you're doing, getting the data back sooner and getting to analysis sooner and evaluating issues with the data sooner, as well as checking into problems via that connectivity and deciding what you need to bring to the field the next time you go out. So having real-time connectivity to sensor networks
really reduces the turnaround time to deal with problems, and it increases your overall data quality. Then data management: how do you keep it running? What are the tools that you're gonna implement in your data workflow, both for acquisition as well as sort of near real-time visualization,
processing and archiving? What's your access plan? How often will you be able to get out to these sites depending on where they're located? And then of course, what are the quality control and quality assurance tools that you use to both check the data and to continually enter metadata so that if there are issues, you can go back in, or if you're sharing the data,
people can evaluate how that observation came to be. So a few of the planning and acquisition items that I like to go over: I encourage people to review kind of the state of the art, so get involved in a community of people that have been doing this. For instance, the LTER group that forms the cluster that we're in,
a lot of those folks have been doing this for a long time, so they're familiar with what the state of the art is. You just saw people like Paul present on pushing the boundaries of new technology and how to make this stuff more accessible, more interchangeable, easier to deploy, more uniform. Choosing an unbiased deployment, of course,
is very important. So if you wanna measure kind of air temperature associated with the free atmosphere, put it up on a mountain ridge, don't put it down in a mountain valley, things like that that sometimes we take for granted. And then of course, scale to what you can sustain. Make sure that if it's a seven hour drive away from your home institution, that you know how often you're gonna be able to go out there,
what seasons you can access these sites in, and then what you're willing to pay over time, in terms of both, you know, field time for a technician and gasoline for the car, or whatever it is to get them out there, as well as replacement parts. So the most common failure points that we've seen,
of course, have to do with the power supply and then structural failure, animal or human disturbance or other environmental disturbances. And then I group misapplication in with the failure points because quite often we're installing sensor networks to measure something specific for kind of a specific science question or application.
You know, so whether that's storm water flows in Michigan, like what Matt Bartos does, or whether that's air temperature associated with, let's say, fire weather out where I live. If someone wants to leverage some of those sensor data to do something else, for instance, maybe someone wants to look at air temperature
across agricultural fields in the valley below this tower, well, the temperatures up at my tower aren't necessarily gonna tell them very much about sort of the ecological window of temperatures going on down in that valley. And so misapplication of sensor network data is a common and real thing across the sciences
and we need to be aware of that. And one of the solutions, of course, is really good metadata and making sure that when we go to use data that aren't from our particular observation network, that we talk to the people who made those observations and find out kind of what their purpose was and what the assumptions were. When we talk about planning support systems,
think about building the structure for the worst case scenario. So is that three meters of snow? Is it one meter of snow? These sorts of things. How are you gonna power the system? Are you going to have, you know, things like solar power or small wind generators? Do you have line power available? What are the weak points in those systems and then how do you adjust for that?
What kind of communication can you use? Are we gonna do burst satellite transmission? Are we going to do a graduate student with a CompactFlash card? Or are we going to run our own wireless network? Are we going to connect to cellular communications? There are all these different options, and it's always best to have two to three options on the table if you wanna run a long-term network.
Tracking system health is very important. So do you track all of the battery voltages, perhaps the battery temperatures, temperatures inside the enclosures, relative humidity inside of the enclosures? Being able to see those parameters on a day-to-day basis really lets you evaluate whether or not your support systems are actually doing what they're supposed to do.
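As a minimal sketch of what such a day-to-day check might look like, the snippet below flags support-system readings that fall outside plausible operating ranges. The field names and thresholds are illustrative assumptions, not NevCAN's actual limits.

```python
# Illustrative operating limits for support-system telemetry; the field
# names and bounds below are assumptions, not NevCAN's real thresholds.
HEALTH_LIMITS = {
    "battery_voltage": (11.8, 15.0),  # volts
    "enclosure_temp": (-40.0, 60.0),  # degrees C
    "enclosure_rh": (0.0, 60.0),      # percent; high RH hints at a failed seal or desiccant
}

def check_health(reading: dict) -> list:
    """Return warnings for one day's support-system readings."""
    warnings = []
    for field, (low, high) in HEALTH_LIMITS.items():
        value = reading.get(field)
        if value is None:
            warnings.append(f"{field}: no data (telemetry gap?)")
        elif not low <= value <= high:
            warnings.append(f"{field}={value} outside [{low}, {high}]")
    return warnings

# Example: a discharged battery and a damp enclosure both get flagged.
print(check_health({"battery_voltage": 11.2, "enclosure_temp": 21.0,
                    "enclosure_rh": 72.0}))
```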
Watching that behavior also lets you sometimes predict whether or not you're going to have a failure. Then of course, where do the data reside? That's your support system for the data themselves. Is that a central database back in a lab? Is that a computer under a desk? Is it something in the cloud that maybe has a bit more resiliency to it? All of these sorts of questions
need to be addressed when planning. One example, something practical for the field people in the audience: how do you build power redundancy for one of these remote systems? Well, in this case, I have two parallel redundant PV systems, but you could substitute wind energy, or you could substitute DC conversion from a hard line
into one of these systems as well, and you get the same result. Basically, two DC systems that have their own batteries, their own charge controllers and their own power sources are combined passively. In this case, if you're an electrical diagram geek, you see that we have diodes here. There are these passive combiners that you can purchase that use diodes to combine the voltages,
so you don't have moving parts. The side of the system that has the highest voltage at the time is where the current will flow. It's pretty straightforward. All of that can go into your sort of master low-voltage disconnect and out to the production equipment. If you have a system like this, you've just built in some redundancy, because you can lose a good portion, or any one of these components,
along one of the parallel systems and you still have system functionality. When you can only get out to a site for a month or two out of the year, if you're at high elevation, for example, or it's just expensive to travel there, spending just a little bit more money up front and having more reliable data is kind of worth it for me.
That's one example of how you can combine these, and it's saved my bacon several times having this kind of design. Low power protection is very important. Batteries in the field die if you discharge them and then get them very cold, so you want to avoid discharging them in the first place.
One way to do that is to use low voltage disconnects. That's basically a type of inline switch that detects whether your battery voltage has dropped to a threshold; in some cases you can set the threshold, and in some products it's preset. It will shut off the current to your production equipment
once you get to that threshold. You can use several of these to tier different items that have different priorities on your system. Cameras and sap flow sensors and radios aren't as important as, let's say, the meteorological data and the data logger associated with that, and so I can tier my low voltage disconnects down so that the very last thing to turn off,
if I've lost my charging source, is the data logger itself. That's another example of maintaining kind of a high level of quality on your network.
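Here is a minimal sketch of that tiering logic, expressed in software purely for illustration; real low-voltage disconnects are inline hardware devices, and the cutoff voltages below are assumptions, not NevCAN's settings.

```python
# Illustrative disconnect setpoints: higher cutoffs shed first, so the
# data logger stays powered the longest. Voltages are assumptions.
LOAD_TIERS = [
    ("cameras", 12.0),           # shed first
    ("sap_flow_sensors", 11.8),
    ("radios", 11.6),
    ("data_logger", 10.5),       # the very last thing to turn off
]

def loads_still_powered(battery_voltage: float) -> list:
    """Return the loads that remain on at a given battery voltage."""
    return [name for name, cutoff in LOAD_TIERS if battery_voltage > cutoff]

# As the battery discharges, lower-priority loads drop off first and the
# data logger keeps recording the longest.
for volts in (12.5, 11.9, 11.7, 11.0):
    print(volts, loads_still_powered(volts))
```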
Okay, networking all the things for better science is important. I'm going to accelerate here, because we are so far behind and we only have a few minutes left, and go through these next slides quickly. I apologize for having to rush, but sometimes that's how it goes. So when we network all the things, we really do network all of the things, and that's everything from the power systems to the experimental sensors to the production data loggers themselves and everything in between. Okay, researcher data management.
So this is something that's very important, because researchers don't usually think about data management. My paycheck as a scientist doesn't necessarily depend on having the best data management, as long as I can publish and maybe not perish. That's changing now, as we all know. And so researcher data management
is something that we're all playing catch-up on, and hopefully everybody on this call and in your organization is very concerned about it. So a typical workflow that we've developed inside of the cluster, for instance, looks like this. You have sensors at a field site, you have communications layers, you have downloaded files, and you have various checks before you bring that
into a level zero, or raw, database. Then you go through various quality control processes, both automated and manual, that have to be managed. And then you have a production, say level one, database, where you've got a delayed release that has quality control flags associated with it. Those are data that you would make available out to maybe the general public, with a caveat
and that sort of thing. And then of course you have kind of a quality control database that you also wanna be maintaining at the same time, so that people can access the checks that have been made and what sort of flags have been thrown and what those mean.
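As a minimal sketch of one automated check in that level-zero to level-one step, the snippet below attaches a flag to each tested variable based on a plausible-range test. The flag codes, variable names and bounds are illustrative assumptions, not the cluster's actual conventions.

```python
# Illustrative plausible-range test for one variable; flag codes, names
# and bounds are assumptions, not the cluster's real conventions.
RANGE_TESTS = {"air_temp": (-50.0, 55.0)}  # degrees C

def qc_record(raw: dict) -> dict:
    """Copy a raw (level-0) record and attach a QC flag per tested variable."""
    checked = dict(raw)
    for var, (low, high) in RANGE_TESTS.items():
        value = raw.get(var)
        if value is None:
            checked[f"{var}_flag"] = "M"   # missing
        elif low <= value <= high:
            checked[f"{var}_flag"] = "P"   # passed the automated check
        else:
            checked[f"{var}_flag"] = "R"   # failed range test; route to manual review
    return checked

# Example: an implausible air temperature gets flagged rather than dropped.
print(qc_record({"timestamp": "2018-11-18T00:10:00", "air_temp": 87.3}))
```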
There's also the quality assurance management interface: how you collect things like installation information associated with the instruments and systems on the site. This is one application that one of our students inside the cluster is developing: a mobile application that connects directly to my particular systems database, where I can enter information, take photos, and add documents
for all of the deployments on my site, and those are tied directly to the data streams in the database. So I'm pretty much gonna wrap it up here. The true cost of sensor deployments actually has a lot to do with things like lost data, having to constantly repair things, losing momentum,
and losing science discovery and all that sort of thing when you have failures. So you try to minimize those costs, and try to invest more up front in not just your sensor system, but that workflow, the expertise, and kind of the software systems that you wanna use for tracking the data and the quality at the same time.
If you invest up front, then you're likely to have quality data that costs the same or less over the long term. Researchers have a lot of concerns on their minds, and usually they just wanna buy the equipment, spend the money, and have it be sort of plug and play. They get their first publication; maybe after that they don't care about it, or maybe they do and they just didn't realize that there are these management issues.
So these are all things that we try to make the researchers in our lives more aware of as they design their particular systems. So we'll go to the questions, and I think we'll have the rest of the conversation at this point. Meng, I'll turn it back over to you. And this is just our last cluster poster,
from I think the winter 2017-2018 meeting, which kind of gives an overview of what's been going on in the cluster. So thank you very much, and we appreciate the time and attention.