Quality Management at Research Data Repositories. Results from a survey and Framework of Quality Assurance for Data Publications at Research Data Repositories
Formal metadata
Title |
Series title |
Number of parts | 4
Author | 0000-0002-0167-0466 (ORCID), 0000-0002-9754-3807 (ORCID)
License | CC Attribution 3.0 Germany: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unmodified or modified form, provided that you credit the author/rights holder in the manner specified by them.
Identifiers | 10.5446/59609 (DOI)
Publisher |
Publication year |
Language |
Content metadata
Subject area |
Genre |
Abstract |
Keywords |
Transcript (automatically generated)
00:03
Hello everybody, welcome to the presentation of our research on quality management at research data repositories. Our research was conducted within the project re3data COREF, a project based on the international registry re3data.org.
00:24
It currently indexes more than 2,900 research data repositories. The project re3data COREF is funded by the German Research Foundation. The project's main objective is to increase the value of re3data for all stakeholders
00:41
by further establishing the service as a tool for finding and referencing research data repositories. The landscape of research data repositories is well studied. In contrast, we know much less about research data quality assurance at repositories and about how repositories contribute to data quality.
01:01
Moreover, their activities around ensuring data quality remain largely invisible, even though data quality management contributes to the trustworthiness of repositories. There are some questions that we wanted to address in the project: What does quality management include for repositories?
01:21
What is the status quo of quality management at repositories? What can be learned from each other? And which information on quality management at the repository level can be useful for whom? To address the lack of information about research data quality assurance, in re3data COREF
01:41
we plan to make quality assurance measures more visible within re3data. For this, we conducted studies of data quality and data quality assurance practices, and we also plan to revise the re3data metadata schema with respect to quality management.
02:02
The research is based on a mixed-methods approach consisting of a literature analysis, a data journal guideline analysis, and an analysis of CoreTrustSeal (CTS) certification documents, in which I studied statements on quality assurance measures. Based on the results of all analyses,
02:21
we conceptualized the survey we conducted last year. In addition to the survey, I also developed a framework for quality assurance of data publications at research data repositories in my PhD project. This framework is one approach to better understand the diverse measures
02:42
that can be included in the concept of quality assurance. It consists of six categories: quality definition, quality development, quality control, quality improvement, quality evaluation, and quality documentation. I will come back to the categories on the next slides.
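As a reading aid, here is a minimal sketch, not part of the talk itself, that lists the six framework categories together with example measures mentioned on the later slides; the Python representation is purely illustrative.

```python
# Minimal sketch (not part of the talk): the six categories of the quality
# assurance framework, each paired with example measures mentioned on the
# later slides. The grouping is a reading aid, not a formal schema.

QA_FRAMEWORK = {
    "quality definition":    ["collection policies", "deposit guidelines", "quality checklists"],
    "quality development":   ["direct support from repository staff", "data format recommendations"],
    "quality control":       ["technical routines", "formal assessment", "data review", "continuous quality monitoring"],
    "quality improvement":   ["standard data creation processes", "metadata enrichment", "data use specifications"],
    "quality evaluation":    ["quality ratings", "user surveys", "quality reports by users"],
    "quality documentation": ["quality reports", "quality flags in the metadata"],
}

for category, examples in QA_FRAMEWORK.items():
    print(f"{category}: {', '.join(examples)}")
```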
03:05
With the survey, we covered several topics, such as repository characteristics, formal assessment of data, data creation, data review, quality criteria, data evaluation by repository users, and indicators of data quality.
03:26
For the sampling, we used a mailing list of 1,897 re3data repository contacts. The survey was online in May and June last year, and we received 332 valid responses.
03:43
I would also like to point out that most of the survey questions allowed multiple answers, and this is shown by the number of answers on each slide, next to the number of respondents for each question. In the following, I will present an excerpt of the survey results, and I have structured the results based on the categories of the quality assurance framework.
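As a quick check of the figures just mentioned, the following short calculation, added here for illustration, derives the response rate from the numbers given in the talk (1,897 contacts, 332 valid responses).

```python
# Illustrative check of the survey response rate (figures taken from the talk).

def share(part: int, total: int) -> float:
    """Percentage of `part` relative to `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)

contacts = 1897          # re3data repository contacts on the mailing list
valid_responses = 332    # valid survey responses

print(share(valid_responses, contacts))  # 17.5 -> roughly a 17.5% response rate
```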
04:08
The category quality definition covers the formats a repository uses to define the desired characteristics of data or data collections according to its profile; examples are collection policies, deposit guidelines,
04:21
or even specific quality checklists. We were interested in details of collection policies and the criteria repositories applied to build their collections. And what we can see in this bar chart here is that for most repositories, suitability to the scope of the repository plays an important role,
04:44
and this is followed by a formal assessment before data deposit, the description of data in a publicly accessible document, and also data corresponding to peer-reviewed publications.
05:00
The category quality development covers preventive measures to establish and ensure data quality. In the survey, we created one question addressing both categories, quality definition and quality development, and asked what types of support repositories offer.
05:21
The majority of repositories offer direct support from repository staff, but a large number also provide data deposit guidelines and use data format recommendations. In addition, 87 repositories, that is more than one quarter of the survey participants, offer specific data quality checklists.
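To relate such counts to the 332 respondents, a short calculation; the counts are taken from this and a later slide, and the snippet itself is only added for convenience.

```python
# Shares relative to the 332 valid responses (counts reported in the talk).
respondents = 332
print(round(100 * 87 / respondents, 1))   # 26.2 -> quality checklists: a bit more than one quarter
print(round(100 * 122 / respondents, 1))  # 36.7 -> both formal assessment and data review (later slide)
```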
05:45
The category quality control covers measures to determine and control data quality, such as technical routines, formal assessment, and data review, or continuous quality monitoring. On this slide you can see two pie charts.
06:03
The left one shows the number of respondents applying formal criteria before data publication, and the right one shows how many repositories conduct data review, that is, the assessment of scientific criteria beyond formal criteria. 62% of the respondents apply formal criteria,
06:23
and data review is conducted by roughly 50%, either for all data sets or for certain data sets. For the analysis, we also looked at the overlap of both approaches; in absolute numbers, 122 respondents do both,
06:42
that is, formal assessment as well as data review, which means that roughly 36% of the participating repositories do both. The category quality improvement covers measures taken to improve quality, such as standard data creation processes, metadata enrichment,
07:04
as well as data use specifications. Here I picked a question that we thought would be particularly interesting for the community: how many repositories actually reject data?
07:23
Of the repositories that would consider rejecting a data deposit, some provided an estimate of the rate of rejected data sets in the last two years. On average, the respondents reported a rejection rate of 8.2%; six repositories reach or even surpass a rejection rate of 50%,
07:44
and one respondent stated that even 70% of data sets were rejected. 31 respondents reported rejection rates of 0%. The category quality evaluation covers procedures that involve repository users in data evaluation,
08:03
for instance by offering quality ratings, user surveys, or quality reports. In the survey, we used a question specifically asking about those measures, and what we can see here is that the majority of repositories do not involve users in quality evaluation,
08:22
or they accept only non-public textual feedback. Nevertheless, we also see that 35 repositories, for instance, conduct user surveys. The category quality documentation covers the documentation of quality assurance measures performed and of the results of quality assessment,
08:42
for instance quality reports or flags in the metadata. In a corresponding survey question, we focused on information on the quality of published data that can be accessed by the public and asked what types of quality indicators are made public to data users.
09:01
With 232 responses, the majority of repositories indicated links to corresponding publications. Besides that, links to versions of data and usage statistics were mentioned quite often, and metadata information on quality assurance was mentioned by 82 repositories.
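Purely as an illustration of what such publicly visible quality indicators could look like for a single data set, here is a hypothetical record using the indicator types named above; all field names and values are invented for this sketch and do not represent the re3data metadata schema or any specific repository.

```python
# Hypothetical record of publicly visible quality indicators for one data set.
# All field names and values are invented for illustration only.
dataset_quality_indicators = {
    "related_publications": ["doi:10.xxxx/placeholder"],  # link to a corresponding publication (placeholder)
    "data_versions": ["1.0", "1.1"],
    "usage_statistics": {"downloads": 0, "views": 0},      # placeholder values
    "quality_assurance": "Formal checks and data review performed before publication.",
}

for field, value in dataset_quality_indicators.items():
    print(f"{field}: {value}")
```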
09:22
With this I am at the last slide of this presentation, and I would like to briefly summarize our results. We conclude that repositories play a significant role in the quality assurance of data publications. We also found that the approaches are diverse and that repositories perform a variety of measures and practices,
09:43
but they do not necessarily perform all categories of measures that are described in the quality assurance framework. We also have to add that quality assurance in general is not equally relevant to all repositories; this depends, for instance, on the type of service.
10:03
From our previous research, from the pretest, and especially from the free-text responses and comments in the survey, we can conclude that quality assurance needs to be better acknowledged and thus needs to be much more visible. With this I am at the end of my presentation. Thank you very much for your attention.