
ESIP Information Quality Cluster: Vision, Objectives, Accomplishments and Status - March 2019

Formal Metadata

Title
ESIP Information Quality Cluster: Vision, Objectives, Accomplishments and Status - March 2019
Title of Series
Number of Parts
19
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The ESIP Information Quality Cluster (IQC) has been active since 2014 covering a broad spectrum of topics related to information quality in Earth sciences. The cluster’s vision is to “become internationally recognized as an authoritative and responsive information resource for guiding the implementation of data quality standards and best practices of the science data systems, datasets, and data/metadata dissemination services.” The IQC considers four aspects of quality – Data Quality, Product Quality, Stewardship Quality and Service Quality.
Transcript: English(auto-generated)
Thanks, and thanks everyone for joining today. This is Yaxing Wei. I'm from the Oak Ridge National Laboratory Distributed Active Archive Center, the ORNL DAAC. It's one of NASA's 12 distributed data centers. And the ORNL DAAC is located in one of the Department
of Energy facilities. It's the Oak Ridge National Laboratory, located in Tennessee, United States. And today, I'm giving this presentation about the ESIP Information Quality Cluster, its vision,
objectives, accomplishments, and status. And first, I want to thank all my colleagues here, Rama Ramapriyan, Ge Peng, and David Moroni. Rama is from NASA Goddard Space Flight Center.
And he is the current lead chair of the ESIP Information Quality Cluster. Ge Peng is from North Carolina State University and NOAA's National Centers for Environmental Information, NCEI. And David Moroni is from the Jet Propulsion Laboratory, JPL.
Both of them are co-chairs of the ESIP Information Quality Cluster. Actually, today, I'm giving this presentation for Rama, who unfortunately cannot join us today. Ge Peng is also with us today online.
So both of us will try to address whatever questions you may have. So next, let me navigate to the next slide. I think I have some difficulty here with navigation.
OK, so bear with me for a moment. OK, great. Can you still see my screen? Yes. OK, great. Thanks. So this is today's outline of the presentation.
It contains two parts for me. The first part is the activities, outcomes, and status of the ESIP Information Quality Cluster. And the second part of my presentation is about the activities and some
of the outcomes that come from NASA's ESDSWG Data Quality Working Group, which is very closely related to the ESIP Information Quality Cluster.
So next, I think I have some trouble navigating to the next slide. OK, so first, about the ESIP Information Quality Cluster.
The ESIP IQC, the Information Quality Cluster, was originally set up in 2011. Its vision was to become internationally recognized as an authoritative and responsive information resource
for guiding the implementation of data quality standards and best practices of the science data systems, datasets, and data/metadata dissemination services. It's closely connected to the Data Stewardship Committee.
And it has an open membership policy, as with all other collaboration areas inside ESIP. And here is the link to the ESIP Information Quality Cluster website, where you can find a lot of information and resources
about the ESIP IQC. So the objectives of the ESIP IQC are to share experiences and actively evaluate best practices and standards for data quality
from the Earth science community; to improve the collection, description, discovery, and usability of information about data quality in Earth science data products; and to consistently provide guidance to data managers
and stewards on the implementation of data quality best practices and standards, as well as for enhancing and improving data maturity. It also supports data producers with information
about standards and best practices for conveying data quality, and provides mentoring as needed. And it supports data providers, distributors, and intermediaries in establishing, improving, and evolving mechanisms to assist users in discovering, understanding,
and applying data quality information properly. So from the perspective of the ESIP IQC, the definition of information quality contains four main aspects, as listed on this slide.
The first aspect is the scientific quality: for example, the accuracy, precision, uncertainty, validity, and suitability for use, or fitness for purpose, in various applications.
The second aspect is product quality: how well the scientific quality is assessed and documented, and also the completeness of metadata and documentation, provenance and context information, et cetera. The third aspect of information quality
is stewardship quality: how well data are being managed, preserved, and cared for by an archive or data repository. And the fourth aspect is service quality: how easy it is for users to find, get,
understand, trust, and use data, and whether the archive has people who understand the data available to help data users. At the bottom of this slide, you'll find a link pointing to a white paper that was developed by Rama, Peng, David,
and also Chung-Lin, titled Ensuring and Improving Information Quality for Earth Science Data and Products. That white paper talks about information quality from the perspective of the ESIP IQC
in much more detail. So feel free to take a look at the white paper to get more detailed information. So there have been a lot of activities going on within the ESIP IQC.
For example, there have been a lot of collaborations with other ESIP clusters, and the ESIP IQC invites guests to give presentations at its monthly telecons, as well as at the biannual ESIP meetings.
The ESIP IQC actively evaluates best practices and standards for data quality from the Earth science community, for example by developing and analyzing use cases, such as the use cases transitioned from the NASA Data Quality Working Group, which I will talk about a little bit later.
It also improves the collection, description, discovery, and usability of information about data quality in Earth science data products, for example by assessing the recommendations from other groups, including the NASA Data Quality Working Group. It consistently provides guidance to data managers
and stewards on the implementation of data quality best practices and standards, as well as for enhancing and improving data maturity, for example the maturity matrices developed by NOAA and adopted by other organizations. Peng will talk about this maturity matrix
shortly after. It also supports data producers, distributors, and intermediaries by providing information about standards and best practices for conveying data quality, and mentoring as needed.
And it provides information to help establish, improve, and evolve mechanisms to assist users in different areas, like discovering, understanding, and applying data quality information properly. It also prepares publications and maintains the website.
It's a wiki with many useful resources, and some links to those resources will be provided at the end of this presentation. So next, most recently, one major activity that the ESIP IQC has been working on
is a white paper on Earth science data uncertainty. It's still under development. The focus of this white paper is more on discovery, which means to actively search for, collect,
and organize existing information and practices about the uncertainty of Earth science data. It's not to make its own recommendations about Earth science data uncertainty, at least for now. That's the scope of this white paper. And this white paper is going to document
various perspectives about uncertainty, for example the mathematical, programmatic, user, and observational perspectives, and it also tries to identify commonalities and differences between those perspectives.
So this slide provides links to a lot of ESIP IQC materials, which are publicly accessible and available to the community, like the wiki page of the ESIP Federation, agency policies and guidelines
on information quality, and some presentations, like those that were given at the ESIP IQC's monthly telecons. You can find those presentations by following the link here. There are also some relevant standards,
relevant web pages, and the IQC's wiki pages. A lot of useful information. So feel free to take a look at those links to find some useful materials. ESIP itself is a big community.
It has a lot of collaborations and a lot of players around the entire world, not just within the United States, but also a lot of international players. And the ESIP IQC itself also has a lot of connections
and collaborations with organizations and parties from all around the world. So that's a very brief introduction to the ESIP IQC, the activities that happen within the IQC,
some background information, and also some useful resources. In the remainder of this presentation, I'm going to talk about another activity that's very closely related to the ESIP IQC.
It's one of the NASA Earth Science Data System Working Groups, the Data Quality Working Group, DQWG. And I listed the NOAA-developed maturity matrices here as well, and Peng is going to talk about that after my presentation.
So let's go to the second part of this presentation. This is about NASA's ESDSWG Data Quality Working Group. This working group is one of NASA's Earth Science Data System Working Groups. ESDSWG is a long name.
You can think of ESDSWG as something like ESIP, but with a focus on NASA's Earth science community. ESIP has a broader scope; ESDSWG is mainly focused on the NASA Earth science community.
And the Data Quality Working Group was formed at the annual meeting of the ESDSWG back in 2014, as the result of interest expressed by NASA's Earth Science Data and Information System (ESDIS) project
and also by investigators of the Making Earth System Data Records for Use in Research Environments (MEaSUREs) program. So I'm actually currently the lead chair
for NASA's Data Quality Working Group. And David Moroni from JPL was the chair of the Data Quality Working Group previously. And Rama has been the co-chair of the Data Quality Working Group
since it was created back in 2014. So as you see, there's quite some overlap between the people within the ESIP IQC, the Information Quality Cluster, and NASA's Data Quality Working Group.
And the mission statement of NASA's Data Quality Working Group is to evaluate and make recommendations to the ESDIS project and the headquarters, and that means NASA headquarters, Earth Science Data Systems (ESDS) program for improvements in capturing,
representing, and enabling the use of data quality information, describing the accuracy, precision, uncertainty, and applicability, or fitness for use stewardship in the NASA Earth Science domain. And below this is the URL
where you can visit the ESDSWG website to get some more background information about this platform. So the Data Quality Working Group was formed back in 2014, and this year is actually its fifth year,
and it's also going to be its last year. So we have been trying to wrap up all the activities within the working group, consolidate the outcomes from all previous years, and share them with the broader community.
And this diagram shows the major Data Quality Working Group activities and outcomes between 2014 and 2018.
This is a very complicated diagram, but I'm going to try to describe it following the timeline here. So back in 2014, the DQWG started by collecting 16 use cases relevant to the
NASA Earth Science Data and Information System. These are use cases about data quality. And this is the link to the template that we used to collect data quality use cases. You may find it useful in some situations.
And then we formed a few sub-working groups within the Data Quality Working Group to analyze those 16 use cases, and we came up with about 100 data quality recommendations.
And then we prioritized them, and we came up with 12 so-called prioritized recommendations. And that's the main activity that happened in 2014. And then we went to the second year of DQWG in 2015.
The first thing we did was let the working group members vote on those 12 prioritized recommendations, and we selected four recommendations that were considered low-hanging fruit. Then we discussed those four low-hanging-fruit recommendations and came up with six
very high-level implementation strategies, and also identified some concrete implementation solutions that can help to address data quality challenges. There was also an assessment conducted, and the results went into an assessment report.
And then we went into the third year. The very first thing we did in the third year, 2016, was consolidate all the concrete implementation solutions that were identified into something called the Solutions Master List.
And this is the URL from which you can access the Solutions Master List. This list aims to provide concrete implementation solutions that can help users address data quality issues.
Currently we have 26 solutions in the list. And then there is a question that came up. So we have those solutions, and what if someone wants to reuse the solution in their organization or in their project?
And how easy would it be? Or how difficult would it be? So there is really a need to evaluate the reuse readiness, or ease of use, of each of those solutions. So later we discussed and came up with
a reuse readiness framework that can potentially be used to evaluate the reuse readiness of those solutions. And then on the other side, during the third year, based on the assessment report that came out in the second year, we tried to engage with the NASA Earth science community,
including AIST, the NASA Advanced Information Systems Technology community. We also evaluated those use cases together with the ESIP IQC at a summer ESIP meeting.
And another important thing that happened in the third year was the creation of the data call template. If you take a look at the diagram, in the middle of the diagram on the right side, you'll see the data call template. This template was created for the purpose of allowing any user,
whether a data user, a data producer, or a data provider, to evaluate the quality of NASA-archived datasets from two perspectives, first, the science quality,
and second, the product quality, with a consistent template or approach. So that's the purpose of the template. And I want to mention that this URL points to the data call template itself.
And then we went into the fourth year. The very first thing we did was an effort to do a data call pilot study. And this study was an effort to really leverage the data call template that was created earlier
to assess selected NASA-archived datasets. So eventually we had volunteers from six NASA DAACs assess the quality of 14 NASA DAAC-archived datasets.
Some other activities in the fourth year included the reuse readiness assessment, using the reuse readiness framework that was created in the third year. So we actually used that framework to evaluate the reuse readiness
of selected solutions in the Solutions Master List. I said selected solutions because this study only assessed the software solutions. The list contains different categories of solutions,
and software is one of them. I will talk about the Solutions Master List in more detail in the next slide. So within the fourth year, another major activity that we started was to really wrap up some of the activities, consolidate the outcomes
from all previous years, and try to share them with the broader community. That was done by preparing and publishing technical note documents through the NASA ESDIS Standards Office.
And the four years of work culminated in the delivery of four documents to the ESDIS Standards Office. The first document here is the data management plan template for DAACs. The second one
is the data producer's data management plan, the data quality selection guideline. The third one is the data quality recommendations for data producers and distributors. This one is really the most comprehensive one; it contains all the recommendations,
all the use cases, the Solutions Master List, and also the methodology that we used to conduct all the activities within the NASA Data Quality Working Group. This third one is really the one that contains all the information. And the fourth one,
the high-priority data quality recommendations for data producers and distributors, is really a subset of the third document, but with a very focused scope, targeted at NASA ESDIS to take some actions by following those high-priority
data quality recommendations. We have also been working on preparing another two documents. One summarizes the outcomes and lessons learned from the reuse readiness assessment effort.
And the other one is the report on the data call pilot study. One thing I want to mention is that the ESIP IQC later added another four use cases to the use case collection.
So right now we have a total of 20 use cases relevant to data quality or science data. And then I want to briefly mention the operational Solutions Master List.
This list was created with the intention of identifying operational solutions. Currently we have 26 solutions relevant to the implementation strategies identified by the Data Quality Working Group. The list is being actively maintained and managed by the NASA ESDIS office.
And this is the URL to the solutions master list. It's publicly accessible to the community. And this list may be updated when there are new solutions identified that can help to address data quality issues.
So like I mentioned earlier, the solutions can be software, documents, or standards and practices. And the solutions that we have collected so far cover the following implementation categories. The first one is data quality information.
The second one is facilitating data center and provider/PI communication: how to help the scientists and the data center talk with each other and communicate more efficiently. The third one is metadata creation.
The fourth one is standards compliance checking and reporting: how do you evaluate whether the metadata inside the files, or the separate metadata records, follow certain standards? And another one is guidance and instructions
and also user services and a knowledge base. So that's something about the Solutions Master List. And then I think the last slide I have is about the status of the six publications in total that DQWG prepared
and that are going to be published through the NASA ESDIS Standards Office. Those six documents are grouped into two categories. The first category is mainly the data management plan templates. It contains two documents,
one template for DAACs and the other template for data producers. You may notice that the second document has a slightly different title from what's shown on that diagram. That's because we later renamed that document and also merged it into another, bigger document,
which right now is called the data management plan template for data producers. Those two documents have already been published through the NASA ESDIS Standards Office. Following the URL here, you can access those two documents. The second category contains four documents.
The comprehensive recommendations for data producers and distributors and the high-priority recommendations will be published through the NASA ESDIS Standards Office very soon. They have already been finalized,
so they will be available through the landing page. The second URL is not working right now, but it will be working in the near future. And the last two documents, the reuse readiness assessment of data quality software products
and the data call template and lessons learned from the pilot study, will go through the NASA ESDIS review process and will be published to the public at a later time. So that's all the information I have.
Thank you very much. Feel free to let me know if you have any questions. And also feel free to contact any one of us here, Rama, Peng, David, or me, if you have any questions about the ESIP IQC or the NASA
Data Quality Working Group. Thank you very much.