
BEGIN: metadata for meaningful data metrics


Formal Metadata

Title
BEGIN: metadata for meaningful data metrics
Series Title
Number of Parts
5
Author
License
CC Attribution 3.0 Unported:
You may use, modify, and reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
Make Data Count (MDC) is a scholarly change initiative, made up of researchers and open infrastructure experts, building and advocating for evidence-based open data metrics. Throughout MDC's tenure, various areas key to the development of research data assessment metrics have been identified. Please join a Spring seminar and discussion series centered around priority work areas, adjacent initiatives to learn from, and steps that can be taken immediately to drive diverse research communities towards assessment and reward for open data. The third and last webinar in our series "BEGIN: metadata for meaningful metrics" will look at next steps to develop responsible and fair data metrics that can reflect the use and impact of research datasets and help elevate them to first-class scholarly outputs. We'll focus on necessary metadata to construct metrics that take into account characteristics and contexts of open data across disciplines.

Speakers include:
00:00 Introduction by Matt Buys (Executive Director of DataCite)
01:49 BEGIN: metadata for meaningful data metrics – Stefanie Haustein (University of Ottawa)
06:53 Meaningful Data Metrics for Whom? – Christine Borgman (UCLA)
14:13 Data Metrics and the Reward System of Science – Rodrigo Costas (Leiden University)
22:41 Developing responsible and fair data metrics – Nicolas Robinson-Garcia (Granada University)
29:56 Meaningful Indicators for Research Data – Isabella Peters (ZBW)
38:09 Q&A session
Transcript: English (automatically generated)
Okay, I think we should probably start getting going. I'll kick things off. For those of you that don't know me, some of you might have heard this already if you joined right at the start, but I'm Matt Buys, I'm the Executive Director at DataCite. This is our third webinar in a series of three, so it's been a great series of webinars
and really focused around the Make Data Count Initiative and bringing things together. I think really ending it off with an exciting group of speakers and moving the conversation
forwards. I think I should also add, maybe at the start, just to mention that for DataCite, this is really important as a global community that really focuses around connecting research, identifying knowledge, and bringing the disparate pieces of the research lifecycle together. Specifically here, we're talking about the recognition of data and the reuse of data
across the scholarly record and across the community throughout the research lifecycle. It's really exciting to be continuing the conversation. There's lots of work that we're
still doing and that is underway, and hopefully today we'll have a great set of discussions following the speakers and look forward to engaging with you all. With that, I'll hand over to Stefanie, who will briefly introduce each of the speakers and be leading the session.
I'll go off camera in a moment and look forward to hearing from the speakers. Over to you, Stefanie. Thank you very much, Matt, for starting us off here and thank you everyone for making the time to join us today, particularly our speakers. I just want to kick us off, and you might
have heard this already if you have joined the series before, I just want to tell you a little bit about the Make Data Count initiative and the kind of context we're having this conversation in. Today, our third and last of the spring webinar series is entitled BEGIN: Metadata for Meaningful Data Metrics. Just a bit of housekeeping. It's really great that
you're introducing yourselves in the chat. If you have questions for the panelists, please use the Q&A function, because then we can really pull out the questions. That's what we'll be monitoring. You can also use
the comment function if you want to, but be mindful that it might be distracting to speakers. We're also really encouraging you to follow up discussions on Twitter, and you can use the handles Make Data Count and DataCite. DataCite will also be tweeting the session.
What is Make Data Count? What is this initiative? We're a scholarly change movement committed to ensuring that the way data is used and cited is open, transparent, and responsible. Who is behind this? We are a collective of organizations such as DataCite,
Crossref, and the California Digital Library, and individuals such as me and researchers who are dedicated to the development of open data metrics. How do we want to do this? Well, we're a lot about building open infrastructure and community-based standards. We advocate a lot about the value and the importance of the role that data plays in the research life cycle and
acknowledging its reuse. We also, and this is particularly me and Isabella Peters, who we're going to hear later on, are working on a research project where we want to contextualize and actually provide an evidence base, from a bibliometric but also a qualitative point of view, to
build, in the end, meaningful metrics. We're based on values, so we're doing all of this in an open and transparent way, and we want to build metrics that are responsible. We certainly don't want to repeat mistakes from the past that we've seen in terms of the impact factor or the h-index. The focus here is really on developing metrics, but we're really an
initiative, and we've already established some standards, so Scholix and the COUNTER Code of Practice for Research Data, and we're really advocating the use of these standards in developing metrics. We're really on a journey, and this journey has been going on for a few
years, and we still have a lot of work ahead. Right now, we're at the stage that we have identified some best practices in terms of data citation and data metrics, and we want to work on adopting data citations, and we're also working on the contextualization part in terms
of research, and that's really what we're here today discussing about what kind of metadata do we need, what do we need to take into account when we want to build metrics, which is kind of the end of our journey, where we think that data metrics could really help incentivize researchers to share data, so to really value research data as an output
similarly to journal articles, and not having to write a paper about the data, but the data being a standalone and valuable output. So the Meaningful Data Counts project is a research-based sub-project funded by the Alfred P. Sloan Foundation that is led by me as the PI
and Isabella Peters as the co-PI. We have funding from the Alfred P. Sloan Foundation to generate evidence on data sharing, data reuse, and data citation, and we're doing that with a mixed-method approach based on bibliometric analyses, but also surveys and more qualitative interviews that we're starting to set up right now. The survey data is being analyzed at the
moment. We're pretty excited to share that soon, so that's kind of just the context of what we've been doing and the context of why we thought having a webinar series on these kind of topics involving the community and discussing these points would be really important. So I'm really, really thrilled to have four great speakers and panelists here today that are
going to provide their multiple perspectives on metadata for meaningful data metrics. So that's my quick introduction here, and now I'm really excited to pass it over to our first panelist, to Christine Borgman, who is a distinguished research professor in information
studies at UCLA. So over to you, Christine. Thank you. And thank you for the invitation, which is actually very well-timed for the kind of work that I've been doing. And I want to first set this up with a very old
problem in scholarly metrics and use a very new case involving this beautiful plane, SOFIA, the Stratospheric Observatory for Infrared Astronomy. I've been working with astronomers for a number of years now. So here's the old continuing problem,
is to think about how metrics influence behavior, including scholarly behavior. Goodhart's law is the way it's commonly phrased, but you'll see it with various other names. The general idea is that any metric is going to be gamed. And the more it's gamed, the less
good a measure becomes. That's always been an issue. It goes back very, very far in bibliometrics and citation metrics. But now so much is at stake when publish or perish becomes impact or perish. I mean, not only are people getting
hired and promoted based on these kinds of scholarly metrics, whether data metrics or publication metrics, they're getting big cash prizes in some parts of the world for publishing in journals above a certain threshold. So the incentives to game these are absolutely huge. So that takes us to my case study that I want to talk about here.
SOFIA, which has been flying for eight years now, was just in the last few weeks publicly canceled by NASA and by its German partners. Now, this never happens. This is
really unusual. Once something, once a mission is up and flying, it generally gets to complete its entire mission. But this is being canceled eight years into a 20-year lifespan. Now, you can never know the full picture of why this is happening, of course, but the public
arguments about why it's being canceled have to do with scholarly metrics. And because of that, some of my partners in astronomy who have put very substantial parts of their scientific lives into this observatory came to me to discuss how they might respond and what's going
on here. And this slide is a result of that conversation with several astronomers. Now, notice on the left, the claims from the National Academies, from NASA, and a study commissioned by Nature all come down to saying that SOFIA fails on a metric of the number of
dollars it costs per paper that is produced. Now, the ones that Nature compared it to were ground-based telescopes and Hubble, which is a space-based telescope. Ground-based and
space-based do very different kinds of science, different kinds of metrics apply, and SOFIA is a bit of a poor cousin because it flies on a 747. Comparing it to Hubble is also problematic because it is, again, a very different kind of mission. Hubble has been in
the sky since 1990. It has more than 30 years of data, and as a space telescope, it's up there 24-7 collecting data. Whereas SOFIA is limited in the amount of flights it can take; it effectively can get about 25 hours a week, or about 600 hours per year, of on-the-sky time. So
Hubble gets about 100 times as many hours of actual observing time per year as SOFIA does. So if you compare SOFIA to Herschel, which is another infrared observatory, SOFIA looks very good. And in the first third of the mission, it's getting very good growth in all
the usual metrics and has undisputed major scientific breakthroughs already. But about half the papers from Hubble and from Chandra and other big old observatories that have good data archives behind them are coming from the archives. So if you compare directly a new observatory like
SOFIA to an old one like Hubble or Chandra that have had time to build up this archive and use that archive itself for paper production, you get a very different kind of comparison.
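As a back-of-the-envelope illustration of Christine's point, here is a hedged Python sketch. All paper counts and dollar figures below are invented placeholders; only the observing-hour ratio (SOFIA about 600 hours per year, Hubble roughly 100 times that) comes from the talk. It shows how a cost-per-paper ranking can look very different once you normalize by available observing time:

```python
# Illustrative only: paper counts and costs are made-up placeholders,
# not real budgets. Only the observing-hour ratio follows the talk.

observatories = {
    # name: (papers_per_year, annual_cost_in_million_usd, observing_hours_per_year)
    "SOFIA":  (150, 85.0, 600),     # ~600 h/yr on-sky, per the talk
    "Hubble": (900, 98.0, 60_000),  # roughly 100x SOFIA's observing hours
}

for name, (papers, cost, hours) in observatories.items():
    cost_per_paper = cost / papers            # the metric used against SOFIA
    papers_per_khr = papers / (hours / 1000)  # papers per 1000 observing hours
    print(f"{name:>6}: ${cost_per_paper:.2f}M per paper, "
          f"{papers_per_khr:.1f} papers per 1000 h on-sky")
```

With these placeholder numbers, SOFIA looks worse on dollars per paper but far more productive per observing hour, which is exactly why the choice of denominator matters.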
So you've got eight years of data in the archive, and already we're getting more and more papers out of the SOFIA archive as it builds. But there's a real critical-mass question here that we should be thinking about as well. So that takes us to what I really want to
pull all these pieces together in talking about data metrics and metadata for data metrics: we need to recognize that all metrics are going to be gamed by the people subject to them. And that includes not only the authors and the publishers, but the funding agencies.
There's many uses to which these are going to be put and people choose ones that are advantageous to them. So any metric we end up with is going to advantage some people and it's going to disadvantage other people. So the criteria for meaningful data metrics and metadata for
meaningful data metrics needs to include questions of who benefits by them and who's disadvantaged by them. We need to think about the opportunities to game the metric, and something that we deal with very much in the privacy world as well is who will watch the watchers. Whatever metrics we
come up with, we need to see how they're used and we need to observe them. Things like Retraction Watch are very useful, but we don't have those all the way across this field. So let me stop there and thank you. I want to put up some of the sources just to make
sure they get in the record because we like bibliometrics and we like sources and good metadata as well. So I hope that's enough to get us off and running of thinking about what some of the criteria should be for better metrics for scientific data and scientific scholarly communication. Thank you. Thank you very much, Christine, for this wonderful intro and
great case study showing us really a concrete example of what can happen with bad metrics. Also, just as a note, the slides will be shared later on Zenodo, so then people can also look into the references as well. Thank you very much. So we'll move on to our next speaker
now with his introductory statement. Rodrigo Costas, who is a senior researcher at CWTS in Leiden University. So hello, everyone. Hello, Steffi. Nice to see you all. I will then share quickly my screen and please double check that you can see it.
Do you see everything? OK, well, thanks. Thanks again. Thank you for this opportunity to have this panel, this conversation with speakers of the level of Christine, of Nicolas, of Isabella.
So this is really, really a privilege. And from my side, I would like to take the scientometric perspective. And to do that, I would like to somehow reflect on work we did, well, quite some time ago with Paul Wouters and other colleagues
here at CWTS. It was at the time the knowledge exchange report that we did to think on how we could develop data metrics from a quantitative and analytical point of view. And at that time, we did a quite thorough reflection on what kind of metrics we could produce, we could elaborate for
data, for data sharing, for data sharing activities and data outputs. And I mean, trying to, of course, summarize a lot, in a way, we came up with a conclusion that in essence, we can produce very similar metrics as we do for scientific publications.
We can have metrics related with the outputs, for example, the number of data sets that have been produced. We can also produce metrics related to the impact of those of those data sets, including citations, but not only citations, also downloads,
shares in social media, and other types of metrics, more in the realm of the metrics that today we're also applying to scientific publications. And not even just there; I mean, we can also think of metrics about collaboration, about how scientists collaborate with each other to produce data, and what are the different thematic perspectives that they apply in their data
creation from topics, disciplines, and subject categories and subject disciplines that they somehow relate to their data production. And of course, also metrics related with the trends, I mean, how data production is growing or evolving over time. And at that time, we realize,
I guess, is one of the key elements of the discussion today is that in order to be able to produce all these metrics, we really need to have traceable and comprehensive metadata. So essentially, we need proper information on who are those creators of
the databases, or the data sets, what are their affiliations, and what are the different properties of those data outputs they are producing. And in essence, the lack of this metadata, the lack of comprehensive sources on data sharing, kind of led us to the
idea that there was a sort of data sharing vicious circle in this process. So essentially, you have researchers who produce data, but somehow they don't feel that this is going to be rewarded, that this is going to be valuable, so they don't feel it's worth investing the time in making this data available and accessible to others. Then what happens is that
there are not that many outputs, I mean, data sharing outputs, that can be measured, that can be used to produce metrics and be analyzed, which makes it difficult to measure it and make it visible, which in itself keeps not encouraging
researchers to share their data. So in essence, we enter into this vicious circle. So then, at the time, and now as well, we keep thinking: how can we somehow change this situation? How can we create a roadmap to develop these more meaningful data metrics?
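The metric families Rodrigo enumerates earlier (output counts, impact via citations and usage, collaboration, trends over time) could be sketched over toy dataset records like this; all records and numbers below are invented purely for illustration:

```python
from collections import Counter

# Invented toy records standing in for dataset metadata from an index
# like DataCite; not real DOIs or real counts.
datasets = [
    {"doi": "10.1234/a", "year": 2020, "creators": ["A", "B"], "citations": 3, "downloads": 120},
    {"doi": "10.1234/b", "year": 2021, "creators": ["A"],      "citations": 0, "downloads": 45},
    {"doi": "10.1234/c", "year": 2021, "creators": ["B", "C"], "citations": 5, "downloads": 300},
]

n_outputs = len(datasets)                                 # output metric: datasets produced
citations = sum(d["citations"] for d in datasets)         # impact metric: citations
downloads = sum(d["downloads"] for d in datasets)         # impact metric: usage
collab = sum(len(d["creators"]) > 1 for d in datasets) / n_outputs  # collaboration share
trend = Counter(d["year"] for d in datasets)              # trend: outputs per year

print(n_outputs, citations, downloads, f"{collab:.0%}", dict(trend))
```

Each line maps to one of the families named in the talk; the point is that none of them can be computed unless the underlying metadata (creators, dates, identifiers) is actually there.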
And of course, the first step is the important one, is that we have to place, we have to, in a way, present data sharing and the production of data as first-class research products, I would say at the same level as we do with scientific articles. We then also need to increase the attribution and traceability of these data sets using PIDs or DOIs,
including data on affiliations of the creators, publication dates, types of contributions regarding the production of data, and so on. And for example, I always wonder this, why,
I mean, we always claim our affiliations in our scientific articles, but sometimes you don't see this in the production of data. It's not clear; I mean, researchers don't always include their affiliations in their production of data. Of course, all this also needs to be indexed in some of the major data infrastructures. I think DataCite is
probably the largest and the best one, but there are others where scientists could also try to report their data production. And again, doing the comparison with articles, so we do care a lot about where our articles are being indexed, but it seems that we don't care that much when
it comes to where is our data being indexed. Then there is another important step that is we really need to pay attention to the quality of the metadata. So the possibility of analyzing, of making visible these data sharing activities strongly relies on the quality of the metadata that describes those data sets. So we need to think in a standardization of
metadata fields like the data creators, their affiliations, the publishers, the dates of the production of those data sets and so on. And of course, the completeness that we have as much as complete data regarding the producers and those data sets. And finally, there
are also important conceptual challenges. For example, when it comes to counting and measuring data sharing and data production, we also need to think about how we're going to operationalize the concept of data. I mean, a data set that is, for example, a collection of photographs is not the same as data that has been simulated by a computer. So we need to
keep all these questions in mind. So then having this roadmap in mind, then we could think in a sort of, oops, did I stop my presentation? Oh yeah, let's go back.
So we can also think in how we can somehow turn this vicious circle into a more virtuous circle. So then, instead of having researchers that are not encouraged, that are not oriented towards producing their data, we try to approach researchers from a point of view
of recognition and reward of their data sharing practices. So we tried, I mean, I would say at least we try not to penalize them for producing the data and possibly we try to reward them for those activities. So that would essentially create more traceable information, more critical
mass of data sets that are being published, that are being recorded, that are traceable, which can also help to somehow change the culture of scientists so that there is more activity regarding training students, new PhDs in the use and the sharing of the data, which also in essence will also help us to measure and make these activities more visible.
So essentially by making those activities visible, we'll measure them and we help to transform the whole process. And well, I hope this is enough ideas to move to the discussion. Thank you.
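Rodrigo's points about metadata quality and completeness might be sketched as a simple check: what fraction of the fields he names (identifier, creators, affiliations, publisher, date) is actually filled in? The field names below are illustrative, loosely inspired by DataCite-style metadata, not a real schema:

```python
# Hypothetical field list; real schemas (e.g. the DataCite Metadata
# Schema) define required and recommended properties more precisely.
REQUIRED = ["identifier", "creators", "affiliations", "publisher", "publication_year"]

def completeness(record: dict) -> float:
    """Share of required metadata fields that are present and non-empty."""
    return sum(bool(record.get(f)) for f in REQUIRED) / len(REQUIRED)

# Invented example record: the creators listed no affiliation,
# which is exactly the gap Rodrigo describes.
record = {
    "identifier": "10.1234/example",   # hypothetical DOI
    "creators": ["Doe, Jane"],
    "affiliations": [],                # missing
    "publisher": "Example Repository",
    "publication_year": 2022,
}

print(f"completeness: {completeness(record):.0%}")  # 4 of 5 fields filled
```

A score like this is only a crude proxy, but it makes the "traceable and comprehensive metadata" requirement measurable at all.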
Thank you very much, Rodrigo, lots of ideas. I think we're going to have a lot of questions and things to discuss. So now we're going to move on to our third panelist, Nicolas Robinson-Garcia, who is a Ramón y Cajal Fellow at the Information and Communication Studies Department at the University of Granada.
Oh, thank you very much. I hope you can see my slides. Yes. Well, thank you very much for having me here and letting me join this panel and share some ideas. I would like to share my thoughts especially in relation to how we're going to use these data metrics
and how they can be incorporated both technically, but also in terms of research assessment. And when I see these data sources, as someone who also comes from the field of scientometrics, I always wonder if these new metrics may become just
one more thing that researchers see they have to do, and can become an increasing burden, like more tasks that we are putting on top of them. So I think that is something we should really think about when thinking about data metrics and how they will be used or gamed
as Christine mentioned before. Another idea I would like to share with you is the cost of sharing data versus publishing papers. Sharing data is also costly: it's not only producing and collecting it, it's documenting it, making sure that it's reusable, that the analysis we're
publishing in our paper can be 100% reproducible. And even if sharing is recognized at the same level as a paper, researchers may actually prefer not to look into it, because they may think it's more cost-effective to continue their
career publishing papers rather than sharing their data sets or citing them. Also citations: I think we have to think about the conceptual meaning of citing a data set. One reason to cite a data set would be that we are actually using
and reusing it, and that means it will be much more costly to get citations for data, because it means it is actually being used by more people. So the cost for someone of using a data set is different from just citing a paper that has been published
elsewhere. And in this sense, one answer could be that maybe we should think about this in terms of the diversity of scholars: maybe we shouldn't expect every researcher to share their data, or to be the one who gains the credit
for it, but actually think in terms of a diversity of profiles of researchers who do different things. Maybe there is a minority, or a majority, of researchers who are actually doing these things and who are not currently recognized in the research assessment system. Maybe this is where data metrics could actually help to visualise this kind of work.
One of the things that has already been mentioned by my colleagues is the idea of metadata. I think that's really, really, really important not only to be able to analyse the data but also to be able to integrate it with other data sources. These are just some screenshots from a study we did where we were trying to look at individuals sharing data using ORCID.
And well, here you have some examples of where we could get the data and how we did it. So we could get data that was connected through DataCite to ORCID, through Figshare, and then from
specific repositories. And here the idea of identifiers is essential, to be able to connect these different databases and to be able to get metadata that may be missing in one database from other sources. So maybe all the information about the authors might come from ORCID, while we
get metadata about the data set from DataCite itself or from the repository. Of course, here again we have the issue of completeness and of coverage: we can look into these sources, but we are missing a large majority of people, and probably the records are also not completely filled in.
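Nicolas's point about identifiers can be made concrete. A DataCite DOI record lists each creator's ORCID iD (when present) under `nameIdentifiers`, which is what allows author metadata to be pulled from ORCID while dataset metadata comes from the repository. Below is a minimal sketch in Python; the field names follow the DataCite REST API JSON schema, but the sample record itself is invented for illustration:

```python
# Sketch: extracting ORCID iDs from a DataCite-style DOI record.
# Field names follow the DataCite REST API schema; the sample record
# below is illustrative, not a real dataset.

def orcids_from_record(record):
    """Return the ORCID iDs found among a record's creators."""
    ids = []
    for creator in record["attributes"].get("creators", []):
        for ni in creator.get("nameIdentifiers", []):
            if ni.get("nameIdentifierScheme") == "ORCID":
                ids.append(ni["nameIdentifier"])
    return ids

sample = {
    "attributes": {
        "creators": [
            {
                "name": "Doe, Jane",
                "nameIdentifiers": [
                    {
                        "nameIdentifierScheme": "ORCID",
                        "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
                    }
                ],
            },
            # A creator without any identifier, the common case in practice.
            {"name": "Roe, Richard", "nameIdentifiers": []},
        ]
    }
}

print(orcids_from_record(sample))
```

In practice one would fetch such records from the DataCite REST API (`https://api.datacite.org/dois/{doi}`) and, as the panel notes, has to handle the majority of records where these identifiers are simply missing.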
And in this sense, another thing we work on at my end is this idea of contributions and profiles of researchers. And we do find that those who are performing the experiments, who we expect are the ones doing the field work, collecting data, processing
data and so on, and who would actually be the ones sharing data, are relatively young, from what we see in some of the studies we've been conducting. This may be because this is something that is age-related, but it may also be because these people are not
being represented by current evaluation systems, and data metrics may actually give them a way to find a different career path within academia, which is certainly needed. And that would be one of the hypotheses: that data metrics could actually be a solution
to find different paths for different types of research. We always tend to approach scientific metrics, and research evaluation in general, as a kind of one-path, one-size-fits-all type of design, where we have to assess everyone
in the same way. This means that whenever we find a new metric, something that may shed some light on different sides of what we do, as in the case of data metrics, we just put it on top of the researchers, instead of saying, well, maybe we may actually find researchers playing different roles and following different career paths.
And then, well, I just wanted to end with a bit of self-promotion: this is actually a project we're working on, on the diversity of profiles. Another platform we're very interested in looking at is the Open Science Framework, where they're actually showing all of the products that can be produced within the framework of a project, and
that could also be something to integrate within this ecosystem of open infrastructure that could serve us in looking at research outputs beyond publications. And that would be all from my end. Thank you very much. Thank you very much, Nicolas, for this really interesting
perspective on roles and contributorship, and on providing credit to those who are producing data. So last but certainly not least, I'm happy to introduce our fourth panelist, Isabella Peters. She's professor of web science at the ZBW Leibniz Information Centre for Economics
and Kiel University, a colleague and longtime friend; we actually studied together a long, long time ago. So thank you very much, Isabella, for your contribution. Yeah, thank you very much for the invitation. It's also a great pleasure for me
to be part of this really nice panel, and it's always good to meet friends in these webinars and occasions. Yeah, and now my part is to report a little bit about our quest for meaningful indicators for research data, as Stefanie introduced in the beginning.
And what I really like is that our quest has a great motto, because it already explains the core of our research project: to make data count, we need meaningful data counts. We have heard about these meaningful data counts already for quite a bit, but I would
also like to show you why we think it is necessary to have this kind of context information, meaningful context information, to produce good data metrics in the end. Let's see whether this is working well for me today.
Ah, so here we are. And I think I can be quick on this slide, because we have heard about the incentive system already; I think Stefanie introduced that in her first slides, but Christine has also talked about it. And we know that metrics and indicators are powerful
tools to change behavior, and that they are really great incentives, and whether this is for the good or for the bad really depends on their design and, along with it, on their validity. And we know that as soon as countable units are out there, people or managers
or whoever will take them and start to count them, and they will start to develop indicators or metrics, because it's just too easy. And we hope that with our research the scientific community will not make the same mistakes as in the past, and that they will learn from our evidence that metrics are not everything, and that sometimes we
just need time to work out what would be good metrics. And what we've seen is that right now we still need more research on disciplinary data reuse and also on disciplinary data
citation behavior. And also, and I think this is something Rodrigo has shown before, we need a critical mass of context information and metadata to be able to build the indicators and to make meaningful comparisons. And what I want to say is that, well, in the end we have
to investigate the territory first before we can come up with metrics, because we need to know what is available and what is common practice already, and after that we can build new metrics. So when looking at DataCite, for example, which is one of the largest providers
of DOIs for research data, we have found that only six percent of the data sets which can be found in DataCite have disciplinary information with them, so that is really not much. And also, the overwhelming majority of research data has not been cited at all: 99 percent has
not been cited at all. And I just pose the question to the audience: is that enough information for us to start the development of indicators? I doubt that, but that is maybe something for the discussion. And we have also started investigating where research data
comes from, that is, which disciplines the research data belongs to according to DataCite. And what we can see here is that, for the little data where we do know the discipline, the majority stems from the natural sciences, followed by the medical and health sciences
and then the social sciences. But as you can see in the bottom line, those research data sets that are cited often do not contain any discipline information, so we do not have a lot of information here at all, and even for those that are cited, the
information is still sparse. So again, I want to challenge you by asking: what does this tell us for the construction of indicators in the end? Will it make sense to compare the natural sciences with the agricultural sciences, for example, which are the blue on the right? And I
know that for some of you these might very much be rhetorical questions, but in the end we are already confronted with people who develop data metrics and research data metrics, and well, that is the territory we face right now. And we as bibliometricians have to think about
whether the indicators make sense in the end. And what is also still an open issue, because we are talking about disciplines here, is whether disciplinary classification systems that were made for scholarly articles or scholarly journals are really transferable to
data sets. We don't know that yet. Right now we try to classify data by using journal classification systems, for example, but whether this is a good idea we don't know. So, and there should be an impulse for you here as well, I challenge you:
I would say that as of today we can be quite confident to say that the world is not ready for data metrics yet, at least not for metrics that should be used universally across different disciplines. Also, and I think that is something which Rodrigo
touched on as well, the term data or data set is quite ambiguous. And as you can see in this example (I hope you can see it, because I can only see my colleagues here on the right-hand side; okay, thank you, Rodrigo), both of
these data sets are labeled data set, but they are quite different in terms of the number of authors, for example, the license under which they are published, the file size, the data collection methods, and so on. So they are quite different, but they are all called a data set, and in the end, again, we need to discuss what is a countable unit, as you said, Rodrigo. So that
is the same discussion we had when it came to publications, because there we also had to decide whether a tweet is a publication, or a whole book, or a journal article, or whatever. So we have the same problems here. And also, in terms of impact, we do not know
yet if and how those characteristics of data sets also affect reuse and citation of the data. And again, from citation studies we know that citation counts are affected by at least 28 factors that all somehow influence later citation counts, and we have to find out whether
this is true for data sets as well. So yeah, to summarize: we definitely need more information on data citation practices in order to come up with meaningful metrics, and if we want
to avoid comparisons of apples and oranges, we also need more metadata and context information in this regard. So I guess that's what I wanted to say. Thank you very much, and I hand over to Stefanie. Thank you very much, Isabella, for this great context and summary of
what we've been up to in the Meaningful Data Counts project, and thank you to all panelists for providing their perspectives. We're now moving over to the Q&A session, and I want to start us off with a question that we prepared; if we have time left we're also going to get to the Q&A box, so please add your questions there, and we're going to follow up later if we don't
have time for them. So I'm proposing that we keep the same speaking order: Christine, Rodrigo, Nicolas and then Isabella. And our first question would be: what are the next steps on the journey towards meaningful data metrics, and what are really the biggest obstacles?
So Christine, if you could provide your perspective on that question. I think that one is pretty straightforward, and I've added it to the Q&A: people reuse data without citing them, and we've published heavily on exactly that point.
There are a number of problems in there, but the fundamental issue is that data reuse is much, much heavier than any of our indicators show, and so incentivizing people to do the citations is a threshold before we know what to count. I mean, just like anything
to do with the COVID numbers, we believe these are grossly underestimated compared to what the true numbers are. So there are some incentive issues there, and then there are the gaming issues that several people have put in here as well: people are concerned about themselves being
gamed as far as the uses and reuses of their data, and they're afraid of being scooped; there's a free-rider problem; the incentive issues in here are absolutely huge. Thank you very much. And maybe just a plug for the first of our three webinars, where we really discussed that in depth, in terms of the signal we're getting in a formal citation
versus the overall, you know, acknowledging of reusing data, for example. Rodrigo next, please. Yeah, from my side I think there is an urgency in making these activities visible, for which metadata infrastructures that index and make the production of
data visible are fundamental. But they really must take a very serious position here: as Isabella has very nicely shown, it cannot just be that we have the publications or the data sets with a DOI; that needs to come with complete metadata, so
authors are properly identified, their institutions are properly identified, and the content of those data sets is also properly identified. That will create visibility for these activities. And my expectation, more or less what I try to do with my reversing of the vicious circle, would be that that visibility and the measuring of those
activities will create an environment where incentives and rewards can be created to promote this activity. So for me that would be the very first step. Thank you, Rodrigo, and Nicolas next. Yes, well, I agree, and I think that even before going to citations,
the issue is just being able to identify data sharing and public data sets, and to find these activities. There is also the issue Isabella mentioned of the types of data sets, or whatever we call them, what they are, because the cost of reuse and the diversity there is huge, even within and between fields. I mean, the cost of sharing data and the
difficulties, even ethical difficulties, will be very different depending on the field, and even the use and reuse of the data will change. I think that in that part we still have a lot of work to do, even before looking at impact.
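The identification problem Nicolas and Rodrigo describe, and the coverage figures Isabella quotes (six percent with disciplinary information, 99 percent never cited), can be sketched as a toy completeness tally over DataCite-style records. The field names loosely mirror DataCite attributes; the records themselves are invented for illustration:

```python
# Toy sketch: completeness and citation coverage over a batch of
# DataCite-style dataset records. Field names loosely mirror DataCite
# attributes (creators, subjects, citationCount); the records are invented.

def coverage_report(records):
    """Share of records with author ORCIDs, subject terms, and no citations."""
    n = len(records)
    with_orcid = sum(
        1 for r in records
        if any(c.get("orcid") for c in r.get("creators", []))
    )
    with_subjects = sum(1 for r in records if r.get("subjects"))
    uncited = sum(1 for r in records if r.get("citationCount", 0) == 0)
    return {
        "author ORCIDs": with_orcid / n,
        "subject terms": with_subjects / n,
        "never cited": uncited / n,
    }

records = [
    {"creators": [{"name": "Doe, Jane", "orcid": "0000-0002-1825-0097"}],
     "subjects": ["Natural sciences"], "citationCount": 3},
    {"creators": [{"name": "Roe, Richard"}], "subjects": [], "citationCount": 0},
    {"creators": [], "citationCount": 0},
    {"creators": [{"name": "Poe, Edna"}],
     "subjects": ["Medical and health sciences"], "citationCount": 0},
]

for field, share in coverage_report(records).items():
    print(f"{field}: {share:.0%}")
```

On a real DataCite dump the uncited share Isabella reports would dominate; the point of a tally like this is only to show how thin the metadata layer currently is before any indicator is built on top of it.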
And Isabella, can you provide your perspective on what we should do next and what are maybe the biggest obstacles? Well, there's not much left to say, I'm afraid, but this is good. So I agree with everyone, and I also think that the practice of reusing and sharing
and citing data, making this visible, and having that as a fully acknowledged way of doing scholarly work, this is I think the major step. And I know that this requires organizational change, and probably this is also the hardest part, but I think we need that.
Yeah, and I really like how everybody emphasized that the incentives are lacking, right? We're not supporting researchers and not valuing data sharing enough, and then it becomes, as Nicolas also said, a burden, an
additional thing to do because now policy is requiring it, but it's not really helping in the larger academic reward system and in the career, and in that sense it's not a priority. And that's where, at least from the Make Data Count perspective, we're hoping that good and responsible data metrics can help with this kind of incentivization. But, especially given
the vicious circle and the loop that Rodrigo showed, we're a bit stuck, in the sense that we want to develop good metrics to incentivize, but the metadata is not there to build them; but we have to start somewhere, right? And yeah, really good points. Thank
you very much for answering that question. I do have another one before we go to the questions from the participants in the Q&A. So my second question would be: from your experience with bibliometric indicators, and we have decades of experience with those, which are mostly based on journal articles, what are some lessons learned and advice for developing metrics for
research data? So maybe looking ahead, when we're there, when the metadata is complete and we know a bit more about research data citation and the patterns and motivations behind citing or not citing data, what are the lessons learned
when we're actually thinking about building an indicator? Christine. One would be to make it as simple and discrete as possible. I mean, let's see, Rodrigo has mentioned one that
I've talked about much in other venues: the difficulty of defining what the unit is of a data set that we're going to cite, and that's a classic problem for us. We pretty much agree on what a journal article is, and yet the citation of that is extremely uneven and dirty.
I keep pointing people to Zotero, and I think we're now up to ten thousand or more different citation styles for journal articles. So we can't agree on how to cite a journal article, and we're a long way from agreeing on how to cite a data set, because we don't agree on what the
data set is in the first place. So anything we come up with has to recognize that people are going to cite it inconsistently, and we have a real risk of dirty data to clean up, and that takes us into a lot of other things. I've got a couple of new pieces in that area, and I'm going to put
them into the chat to pick up as well. Thank you very much, Christine. Rodrigo. So I would start by saying that most of the developments that reflect on and question citation indicators, impact factors, h-indexes, and I'm thinking of things like the DORA manifesto,
I think all of them apply to any new metric that we develop for data sets. I mean, I cannot imagine anything that would be radically different, so all of those principles, most of those reflections, do apply. And having said this, I also take a much broader
perspective when it comes to data metrics. To me it matters a little bit more than citations; for example, something that would be very interesting to me is to study collaboration patterns in the production of data, something that today can be challenging because the data about the authors is not properly standardized, or is not even complete. Well, we already
discussed thematic-based indicators: how can we say whether an area is growing in the production of data or not, when we don't have those pieces of information? So in a way I would say,
inspired by what we know from our scientometric research and scientometric development, there is a lot we can apply to data metrics, and all the lessons learned for articles basically apply to them. And probably once we are in that somehow improved or better situation, where we have infrastructure to properly capture data metrics,
then probably specific problems will come, and then it will be time for us to also rethink the indicators we're proposing, and what the challenges could be, as Christine was pointing out: when the indicators stop being useful because somehow the purpose is lost. It will be
that time, but for the time being, I think what we have learned from our scientometric work fully applies to data metrics. Thank you. And Nicolas next. Yes, I think there are also some assumptions, which I think are dangerous, that we make with the data
if we just translate, let's say, the conceptual part, which is always the weak point of scientometric theory, to data metrics: the idea that citations mean something specific when applied to data sets, or that producing data sets is something,
let's say, homogeneous in terms of effort, and so on. And I think we need many more qualitative studies there, to think about how data is produced, and why, and how people recognize the skills that are necessary to share data, and so on.
Before that, I mean, for instance, and maybe a bit off the record here: in Spain we see that, since we have to write a data management plan to apply for funding, many, many senior researchers basically don't know how to do this. They don't have the skills; they don't know how to make their data open, or what the priorities or requirements
for these kinds of tasks are. So there is a lot of literacy work there, and also a lot of work in understanding how they do things, what their current practices are. Thank you, Nicolas. And Isabella. Yeah, well, again, I think I agree on all points, so this is really a nice spot here, last in the
row, but maybe to add something different, a different perspective. I think we should ask ourselves what we want to do with the metrics when we have them in the end, because, again, what do we need them for? I know there should be incentives, and they should help us do something,
but we should be clearer about what that is in the end, because then we can also build meaningful metrics, or really have good validity in this regard. Because I don't like the idea of just building or constructing or designing
metrics or indicators just because data, or something, is available (or, in our case with data metrics, is not available right now, the metadata). But again, I think we could be more specific about what metadata we need, for example, once we know
for what reasons we want to have an indicator or a metric. So I think we should start thinking from the end, in a way. Thank you very much to all panelists for answering these
great questions. We have so many great questions in the Q&A as well, but I worry that we actually won't have time to go through them all. Please don't worry, though: we're going to save them and follow up. We're sharing the recording here, and we're also going to share the chat
and the Q&A afterwards. So thank you very much for the excellent discussion and the contributions by the panelists, but also to everyone participating today. I'm just going to hand it over now to Daniela, who's going to follow up with a little bit more of the follow-up that we have planned. Thanks, Stefanie, and thanks to all of you who spoke today. I think your
perspectives are invaluable, and I hope that folks are able to follow along with all the work that you all are doing, and especially the projects that you noted. So if you have any other links, put them in, and make sure they go to everyone and not just hosts and panelists; I know a lot of things are coming just to us. We will be saving the chat, which we have posted, and perhaps Paul can put
the Zenodo link in here; I'm putting a link in the chat now with the links. We do have a page on the Make Data Count website that links to the recordings from the past
two webinars. We'll be posting the recording of this webinar on YouTube; it will then be linked forever on the Make Data Count site, and you can also find it through DataCite in their channel, so the metadata of all this will be captured there as well. So we encourage everyone to go back and watch the recordings of the first two webinars; I think they all kind of culminate into
one, and it's great to watch all three and see how they come together. This isn't the end of our outreach in general through Make Data Count, but in virtual time we thought this would be a good point to capture the three big things that we want to highlight: understanding how to surface data citation,
understanding how to get a better classification system for data, and what metadata we need for meaningful data metrics. These are the priority areas we're all working through right now, so please get in touch and follow along as we continue to work through all these efforts. Again, we'll leave the screen on for a second so that if you want to grab
the link, here it is again, and we will follow up with all other information. So thank you to everyone, it was a pleasure to see you all here. Yes, thank you all, thank you to the panelists.