Metadata for Meaningful Data Metrics

Formal metadata
Number of parts: 5
License: CC Attribution 3.0 Unported: You may use, modify, and reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
Identifiers: 10.5446/69862 (DOI)
Transcript: English (automatically generated)
00:01
Okay, I think we should probably start getting going. I'll kick things off. For those of you that don't know me, and some of you might have heard this already if you joined right at the start, I'm Matt Bass, the Executive Director at DataCite. This is our third webinar in a series of three, so it's been a great series of webinars
00:28
really focused around the Make Data Count Initiative and bringing things together. We're really ending it off with an exciting group of speakers and moving the conversation
00:40
forwards. I should also add, at the start, that for DataCite this is really important: we are a global community that focuses on connecting research and identifying knowledge, bringing the disparate pieces of the research lifecycle together. Specifically here, we're talking about the recognition of data and the reuse of data
01:06
across the scholarly record and across the community throughout the research lifecycle. It's really exciting to be continuing the conversation. There's lots of work that we're
01:22
still doing that is underway, and hopefully today we'll have a great set of discussions following the speakers. We look forward to engaging with you all. With that, I'll hand over to Stephanie, who will briefly introduce each of the speakers and be leading the session.
01:42
I'll go off camera in a moment and look forward to hearing the speakers. Over to you, Stephanie. Thank you very much, Matt, for starting us off here, and thank you everyone for making the time to join us today, particularly our speakers. I just want to kick us off, and you might
02:01
have heard this already if you have joined the series before, by telling you a little bit about the Make Data Count Initiative and the kind of context we're having this conversation in. Today, our third and last webinar of the spring series, is entitled "Metadata for Meaningful Data Metrics." Just a bit of housekeeping. It's really great that
02:25
you're introducing yourself in the chat. If you have questions for the panelists, please use the Q&A function, because then we can really pull out the questions; that's what we'll be keeping an eye on. You can also use
02:42
the comment function if you want to, but be mindful that it might be distracting to speakers. We're also really encouraging you to follow up discussions on Twitter, and you can use the handles MakeDataCount and DataCite. DataCite will also be tweeting the session.
03:02
What is Make Data Count? What is this initiative? We're a scholarly change movement committed to ensuring that the way data is used and cited is open, transparent, and responsible. Who is behind this? We are a collective of organizations such as DataCite,
03:20
Crossref, and the California Digital Library, and individuals such as me and researchers who are dedicated to the development of open data metrics. How do we want to do this? Well, we're very much about building open infrastructure and community-based standards. We advocate for the value and the importance of the role that data plays in the research life cycle and
03:44
acknowledging its reuse. We also, and this is particularly me and Isabella Peters, whom we're going to hear later on, are working on a research project where we want to contextualize and actually provide an evidence base, from a bibliometric but also a qualitative point of view, to
04:01
build, in the end, meaningful metrics. We're based on values, so we're doing all of this in an open and transparent way, and we want to build metrics that are responsible. We certainly don't want to repeat mistakes from the past that we've seen in terms of the impact factor or the h-index. The focus here is really on developing metrics, but we're really an
04:27
initiative, and we've already established some standards, namely Scholix and the COUNTER Code of Practice for Research Data, and we're really advocating the use of these standards in developing metrics. We're really on a journey, and this journey has been going on for a few
04:46
years, and we still have a lot of work ahead. Right now, we're at the stage that we have identified some best practices in terms of data citation and data metrics, and we want to work on adopting data citations, and we're also working on the contextualization part in terms
05:02
of research. That's really what we're discussing here today: what kind of metadata do we need, what do we need to take into account when we want to build metrics? That is kind of the end of our journey, where we think that data metrics could really help incentivize researchers to share data, and to really value research data as an output
05:23
similarly to journal articles, not having to write a paper about the data, but the data being a standalone and valuable output. So the Meaningful Data Counts project is a research project funded by the Alfred P. Sloan Foundation that is led by me as the PI
05:43
and Isabella Peters as the co-PI. We have funding from the Alfred P. Sloan Foundation to generate evidence on data sharing, data reuse, and data citation, and we're doing that with a mixed-method approach based on bibliometric analyses, but also surveys and more qualitative interviews that we're starting to set up right now. The survey data is being analyzed at the
06:04
moment. We're pretty excited to share that soon, so that's kind of just the context of what we've been doing and the context of why we thought having a webinar series on these kind of topics involving the community and discussing these points would be really important. So I'm really, really thrilled to have four great speakers and panelists here today that are
06:25
going to provide their multiple perspectives on metadata for meaningful data metrics. So that's my quick introduction here, and now I'm really excited to pass it over to our first panelist, to Christine Borgman, who is a distinguished research professor in information
06:44
studies at UCLA. So over to you, Christine. Thank you. And thank you for the invitation, which is actually very well-timed for the kind of work that I've been doing. And I want to first set this up with a very old
07:05
problem in scholarly metrics and use a very new case involving this beautiful plane, SOFIA, the Stratospheric Observatory for Infrared Astronomy. I've been working with astronomers for a number of years now. So here's the old continuing problem,
07:27
which is to think about how metrics influence behavior, including scholarly behavior. Goodhart's law is the way it's commonly phrased, but you'll see it under various other names. The general idea is that any metric is going to be gamed. And the more it's gamed, the less
07:46
good a measure it becomes. That's always been an issue. It goes back very, very far in bibliometrics and citation metrics. But now so much is at stake when publish or perish becomes impact or perish. I mean, not only are people getting
08:02
hired and promoted based on these kinds of scholarly metrics, whether data metrics or publication metrics, they're getting big cash prizes in some parts of the world for publishing in journals above a certain threshold. So the incentives to game these are absolutely huge. So that takes us to my case study that I want to talk about here.
08:27
SOFIA, which has been flying for eight years now, was just in the last few weeks publicly canceled by NASA and by its German partners. Now, this never happens. This is
08:42
really unusual. Once a mission is up and flying, it generally gets to complete its entire mission. But this one is being canceled eight years into a 20-year lifespan. Now, you can never know the full picture of why this is happening, of course, but the public
09:02
arguments about why it's being canceled have to do with scholarly metrics. And because of that, some of my partners in astronomy who have put very substantial parts of their scientific lives into this observatory came to me to discuss how they might respond and what's going
09:24
on here. And this slide is a result of that conversation with several astronomers. Now, notice on the left, the claims from the National Academies, from NASA, and from a study commissioned by Nature all come down to saying that SOFIA fails on a metric of the number of
09:45
dollars it costs per paper produced. Now, the ones that Nature compared it to were ground-based telescopes and Hubble, which is a space-based telescope. Ground-based and
10:01
space-based do very different kinds of science, different kinds of metrics apply, and SOFIA is a bit of a poor cousin because it flies on a 747. Comparing it to Hubble is also problematic because it is, again, a very different kind of mission. Hubble has been in
10:20
the sky since 1990. It has more than 30 years of data, and as a space telescope, it's up there 24/7 collecting data. Whereas SOFIA is limited in the amount of flights it can take; it effectively gets about 25 hours a week, or about 600 hours per year, of on-the-sky time. So
10:44
Hubble gets about 100 times as many hours of actual observing time per year as SOFIA does. So if you compare SOFIA to Herschel, which is another infrared observatory, SOFIA looks very good. And in the first third of the mission, it's getting very good growth in all
11:03
the usual metrics and has undisputed major scientific breakthroughs already. But about half the papers from Hubble and from Chandra and other big old observatories that have good data archives behind them are coming from the archives. So if you compare directly a new observatory like
11:26
SOFIA to an old one like Hubble or Chandra that have had time to build up this archive and use that archive itself for paper production, you get a very different kind of comparison.
11:43
So you've got eight years of data in the archive, and already we're getting more and more papers out of the SOFIA archive as it builds. But there's a real critical mass question here that we should be thinking about as well. So that takes us to what I really want to
12:03
say in pulling all these pieces together about data metrics and metadata for data metrics: we need to recognize that all metrics are going to be gamed by the people subject to them. And that includes not only the authors and the publishers, but also the funding agencies.
12:25
There's many uses to which these are going to be put and people choose ones that are advantageous to them. So any metric we end up with is going to advantage some people and it's going to disadvantage other people. So the criteria for meaningful data metrics and metadata for
12:45
meaningful data metrics need to include questions of who benefits by them and who's disadvantaged by them. We need to think about the opportunities to game the metric, and, something that we deal with very much in the privacy world as well, who will watch the watchers. Whatever metrics we
13:06
come up with, we need to see how they're used and we need to observe them. Things like Retraction Watch are very useful, but we don't have those all the way across this field. So let me stop there and thank you. I want to put up some of the sources just to make
13:22
sure they get in the record, because we like bibliometrics and we like sources and good metadata as well. So I hope that's enough to get us off and running thinking about what some of the criteria should be for better metrics for scientific data and scholarly communication. Thank you. Thank you very much, Christine, for this wonderful intro and
13:45
great case study showing us really a concrete example of what can happen with bad metrics. Also, just as a note, the slides will be shared later on Zenodo, so then people can also look into the references as well. Thank you very much. So we'll move on to our next speaker
14:02
now with his introductory statement: Rodrigo Costas, who is a senior researcher at CWTS at Leiden University. So hello, everyone. Hello, Steffi. Nice to see you all. I will then quickly share my screen; please double check that you can see it.
14:26
Do you see everything? OK, well, thanks. Thanks again. Thank you for this opportunity to have this panel, this conversation with speakers of the level of Christine, of Nicolas, of Isabella.
14:44
So this is really, really a privilege. And from my side, I would like to take, let's say, the very scientometric perspective. And to do that, I would like to reflect on work we did quite some time ago with Paul Wouters and other colleagues
15:03
here at CWTS. It was at the time the Knowledge Exchange report, which we did to think about how we could develop data metrics from a quantitative and analytical point of view. And at that time, we did a quite thorough reflection on what kind of metrics we could produce, could elaborate for
15:27
data, for data sharing activities and data outputs. And, trying of course to summarize a lot, we came to the conclusion that, in essence, we can produce very similar metrics as we do for scientific publications.
15:45
We can have metrics related to the outputs, for example, the number of data sets that have been produced. We can also produce metrics related to the impact of those data sets, including citations, but not only citations, also downloads,
16:01
shares in social media, and other types of metrics, more in the realm of the metrics that today we're also applying to scientific publications. And not just there; we can also think of metrics about collaboration, about how scientists collaborate with each other to produce data, and about the different thematic perspectives that they apply in their data
16:26
creation, the topics, disciplines, and subject categories that somehow relate to their data production. And of course, also metrics related to trends, how data production is growing or evolving over time. And at that time, we realized,
16:45
and I guess this is one of the key elements of the discussion today, that in order to be able to produce all these metrics, we really need to have traceable and comprehensive metadata. So essentially, we need proper information on who the creators of
17:00
the databases, or the data sets, are, what their affiliations are, and what the different properties of the data outputs they are producing are. And in essence, the lack of this metadata, the lack of comprehensive sources on data sharing, led us to the
17:21
idea that there was a sort of data sharing vicious circle in this process. So essentially, you have researchers who produce data, but somehow they don't feel that this is going to be rewarded, that this is going to be valued, so they don't feel it's worth investing the time in making this data available and accessible to others. Then what happens is that
17:43
there are not that many data sharing outputs that can be measured, that can be used to produce metrics and be analyzed, which makes it difficult to measure data sharing and make it visible, and that in itself keeps discouraging
18:02
researchers from sharing their data. So in essence, we enter into this vicious circle. So then, at the time and still now, we keep thinking: how can we somehow change this situation? How can we create a roadmap to develop these more meaningful data metrics?
18:22
And of course, the first step is the important one: we have to, in a way, present data sharing and the production of data as first-class research products, at the same level as scientific articles. We then also need to increase the attribution and traceability of these data sets using PIDs or DOIs,
18:47
including data on affiliations of the creators, publication dates, types of contributions regarding the production of data, and so on. And for example, I always wonder this, why,
19:01
I mean, we always claim our affiliations in our scientific articles, but sometimes you don't see this in the production of data; researchers don't always include their affiliations in their data production. Of course, all this also needs to be indexed in some of the major data infrastructures. I think DataCite is
19:22
probably the largest and the best one, but there are others where scientists could also report their data production. And again, making the comparison with articles: we care a lot about where our articles are being indexed, but it seems that we don't care that much when
19:40
it comes to where our data is being indexed. Then there is another important step: we really need to pay attention to the quality of the metadata. The possibility of analyzing and making visible these data sharing activities strongly relies on the quality of the metadata that describes those data sets. So we need to think about the standardization of
20:04
metadata fields like the data creators, their affiliations, the publishers, the dates of production of those data sets, and so on. And of course, completeness: that we have data as complete as possible regarding the producers and those data sets. And finally, there
20:21
are also important conceptual challenges. For example, when it comes to counting and measuring data sharing and data production, we also need to think about how we're going to operationalize the concept of data. I mean, a data set that is, for example, a collection of photographs is not the same as, for example, data that has been simulated by a computer. So we need to
20:43
keep all these questions in mind. So then having this roadmap in mind, then we could think in a sort of, oops, did I stop my presentation? Oh yeah, let's go back.
21:03
So we can also think about how we can turn this vicious circle into a more virtuous circle. So then, instead of having researchers who are not encouraged, who are not oriented towards producing their data, we try to approach researchers from a point of view
21:23
of recognition and reward of their data sharing practices. So we try, I would say, at least not to penalize them for producing the data, and possibly we try to reward them for those activities. That would essentially create more traceable information, more critical
21:43
mass of data sets that are being published, recorded, and traceable, which can also help to change the culture of science, so that there is more activity regarding training students and new PhDs in the use and sharing of data, which in essence will also help us to measure and make these activities more visible.
22:07
So essentially by making those activities visible, we'll measure them and we help to transform the whole process. And well, I hope this is enough ideas to move to the discussion. Thank you.
22:22
Thank you very much, Rodrigo, lots of ideas. And I think we're going to have a lot of questions and things to discuss. So now we're going to move on to our third panelist, Nicolas Robinson-Garcia, who is a Ramón y Cajal Fellow at the Information and Communication Studies Department at the University of Granada.
22:41
Oh, thank you very much. I hope you can see my slides. Yes. Well, thank you very much for having me here and letting me join this panel and share some ideas. I would like to share my thoughts, basically in relation to how we're going to use these data metrics
23:01
and how they can be incorporated, both technically but also in terms of research assessment, and how someone who also comes from this field sees things. But I always wonder whether these new metrics may become just
23:25
one more thing that researchers see they have to do, and can become an increasing burden, like more tasks that we are putting on top of them. So I think that is something we should really think about when considering data metrics and how they will be used or gamed,
23:42
as Christine mentioned before. Another idea I would like to share with you is the cost of sharing data versus publishing papers. Sharing data is also costly: it's not only producing and collecting it, it's documenting it, it's making sure that it's reusable, that the analysis we're
24:06
publishing in our paper is 100% reproducible. And even if this is something that is recognized at the same level as a paper, it may actually be something that researchers prefer not to look into, because they may think it's more cost-effective to continue their
24:24
career publishing papers and not sharing their data sets or citing them. Also citations: I think we have to think about the conceptual meaning of citing a data set. One reason to cite a data set would be because we are actually using it and
24:44
reusing it, and that means it will be much more costly to get citations for data, because it means the data is actually being used by more people, rather than merely cited. So the cost to someone of using a data set is different from just citing a paper that has been published
25:02
elsewhere. And in this sense, before we go to this, one answer could be that maybe we should think about this in terms of the diversity of scholars; maybe we shouldn't expect every researcher to share their data or to be the person who is actually gaining the credit
25:20
for this, but actually think in terms of diverse profiles of researchers who do different things. Maybe there is a minority, or a majority, of researchers who are actually doing these things and who are not currently recognized in the research assessment system. Maybe this is where data metrics could actually help to visualise this kind of work.
25:44
One of the things that has already been mentioned by my colleagues is the idea of metadata. I think that's really important, not only to be able to analyse the data but also to be able to integrate it with other data sources. These are just some screenshots from a study we did where we were trying to look at individuals sharing data, using ORCID.
26:11
And here you have some examples of where we could get the data and how we did it. We could get data that was connected through DataCite to ORCID, through Figshare, and then from
26:22
specific repositories, and here the idea of identifiers is essential to be able to connect these different databases and to get metadata that may be missing in one data bank from other sources. So maybe all the information about the authors might come from ORCID, while we
26:42
get metadata about the data set itself from the repository. Of course, here again we have the issue of completeness and of coverage. We can look into these sources, but of course we are missing a large majority of people, and probably the records are also not complete.
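As a minimal sketch of this kind of identifier-based linking, and assuming the JSON shape returned by the DataCite REST API (the DOI, names and record below are hypothetical examples, not data from the study), the creator metadata attached to a DOI record can be walked for ORCID iDs like this:

```python
# Sketch: pulling author ORCID iDs out of a DataCite DOI record.
# The record below is a hypothetical example shaped like the JSON
# returned by the DataCite REST API (GET https://api.datacite.org/dois/{doi});
# in practice you would fetch it over HTTP rather than hard-code it.

record = {
    "data": {
        "id": "10.1234/example-dataset",  # hypothetical DOI
        "attributes": {
            "creators": [
                {
                    "name": "Doe, Jane",
                    "nameIdentifiers": [
                        {
                            "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
                            "nameIdentifierScheme": "ORCID",
                        }
                    ],
                },
                # Creators without a name identifier are common in practice,
                # which is exactly the completeness problem discussed above.
                {"name": "Roe, Richard", "nameIdentifiers": []},
            ]
        },
    }
}


def creator_orcids(record):
    """Return the ORCID iDs attached to the creators of a DataCite DOI record."""
    orcids = []
    for creator in record["data"]["attributes"]["creators"]:
        for nid in creator.get("nameIdentifiers", []):
            if nid.get("nameIdentifierScheme") == "ORCID":
                orcids.append(nid["nameIdentifier"])
    return orcids


print(creator_orcids(record))
```

The extracted iDs could then be resolved against the ORCID public API to fill in author information (affiliations, other works) that is missing from the repository record, which is the kind of cross-database linking the identifiers make possible.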
27:05
And in this sense, another thing we work on at my end is this idea of contributions and profiles of researchers, and we actually find that those who are performing the experiments, which would be those who we expect are doing the field work, collecting data, processing
27:23
data and so on, those who would actually be the ones sharing data, are relatively young, from what we see in some of the studies we've been conducting. This may be because it is something age-related, but it may also be because these people are not
27:42
being represented by current evaluation systems, and data metrics may actually give them a way to find a different career path within academia, which is certainly needed. And that would be one of the hypotheses: that data metrics could actually be a solution
28:03
for finding different paths for different types of research. We always tend to approach scientific metrics, and research evaluation in general, as a kind of one-path, one-size-fits-all type of design, where we have to assess everyone
28:23
in the same way. This means that whenever we find a new metric, something that may shed some light on different sides of the sphere of what we do, as in the case of data metrics, we just put it on top of the researchers, instead of saying, well, maybe we may actually find researchers playing different roles and following different career paths.
28:45
And then I just wanted to end with a bit of self-promotion: this is actually a project we're working on with a diversity of projects. Another platform we're very interested in looking at is the Open Science Framework, where they are actually showing all of the products that can be produced within the framework of a project, and
29:05
that could also be something to integrate within this ecosystem of open infrastructure, something that could help us look at research outputs beyond publications. And that would be all from my end, thank you very much. Thank you very much, Nicolas, for this really interesting
29:25
perspective on the roles of contributorship and on providing credit to those who are producing data. So, last but certainly not least, I'm happy to introduce our fourth panelist, Isabella Peters. She's professor of web science at the ZBW Leibniz Information Centre for Economics
29:43
and the University of Kiel, a colleague and long-time friend; we actually studied together a long, long time ago. So thank you very much, Isabella, for your contribution. Yeah, thank you very much for the invitation, and it's also a great pleasure for me
30:02
to be part of this really nice panel, and it's always good to meet friends in these webinars and occasions. Yeah, and now my part is to report a little bit about our quest for meaningful indicators for research data, as Stephanie introduced in the beginning,
30:21
and what I really like is that our quest has a great motto, because it really explains the core of our research project already, and that is: to make data count, we need meaningful data counts. We have heard about these meaningful data counts for quite a bit already, but I would
30:41
also like to show you why we think it is necessary to have this kind of context information, meaningful context information, to produce good data metrics in the end. Let's see whether this is working for me today.
31:00
Ah, so here we are, and I think I can be quick on this slide, because we have heard about the incentive system already. I think Stephanie introduced that in her first slides, but Christine has also talked about it, and we know that metrics and indicators are powerful
31:20
in changing behavior and are really great incentives. Whether this is for the good or for the bad really depends on the design and, along with it, on their validity, and we know that as soon as countable units are out there, people, or managers,
31:41
or whoever, will take them and start to count them, and they will start to develop indicators or metrics, because it's just too easy. We hope that, with our research, the scientific community will not make the same mistakes as in the past, and that they will learn from our evidence that metrics are not all there is, and that sometimes we
32:06
just need time to work out what would be good metrics. What we've seen is that right now we still need more research on disciplinary data reuse, and also on disciplinary data
32:22
citation behavior, and also, and I think this is something that Rodrigo has shown before, we need a critical mass of context information and metadata to be able to build the indicators and to make meaningful comparisons. What I want to say is that, in the end, we have
32:46
to investigate the territory first before we can come up with metrics, because we need to know what is available and what is common practice already, and after that we can build new metrics. So when looking at DataCite, for example, which is one of the largest providers
33:05
of DOIs for research data, we have found that only six percent of the data sets that can be found in DataCite have disciplinary information with them, so that is really not much. And also the overwhelming majority of research data has not been cited at all: 99 percent has
33:25
not been cited at all. And I just pose the question to the audience: is that enough information for us to start the development of indicators? I doubt that, but that is maybe something for the discussion. We have also started investigating where research data
33:44
comes from, that is, which disciplines the research data belongs to according to DataCite. What we can see here is that, for the little data where we know the discipline, the majority stems from the natural sciences, followed by the medical and health sciences
34:00
and then the social sciences. But, as you can see just in the bottom line, those research data sets that are cited often do not contain any discipline information, so we do not have a lot of information here at all, and even for those that are cited the
34:24
information is still sparse. So again I want to challenge you by asking: what does this tell us for the construction of indicators? In the end, will it make sense to compare the natural sciences with the agricultural sciences, for example, which are the blue on the right? And I
34:42
know that for some of you these might be very much rhetorical questions, but in the end we are already confronted with people who develop data metrics and research data metrics, and that is the territory we face right now. We as bibliometricians have to think about
35:01
whether the indicators make sense in the end. And what is also still an open issue, because we are talking about disciplines here, is whether disciplinary classification systems that are made for scholarly articles or scholarly journals are really transferable to
35:21
data sets; we don't know that yet. Right now we try to classify data by using journal classification systems, for example, but whether this is a good idea we don't know. So, and that should be an impulse for you as well, I challenge you
35:41
here: I would say that, as of today, we can be quite confident in saying that the world is not ready for data metrics yet, and at least it is not ready for those metrics that should be used universally across different disciplines. Also, and I think that is something which Rodrigo
36:05
touched on as well, the term data or data set is quite ambiguous, as you can see in this example. I hope you can see it, because I can only see my colleagues here on the right-hand side; I hope you can see the differences. Okay, thank you, Rodrigo. So both of
36:22
these data sets are labeled "data set", but they are quite different in terms of the number of authors, for example, the license under which they are published, the file size, the data collection methods and so on. They are quite different, but they are all called a data set, and in the end, again, we need to discuss what a countable unit is, as you said, Rodrigo. So that
36:45
is the same discussion we had when it came to publications, because we also had to form some kind of idea of whether a tweet is a publication, or the whole book is a publication, or the journal article, or whatever. So we have the same problems here. And also, in terms of impact, we do not know
37:04
yet if and how those characteristics of data sets also affect reuse and citation of the data. And again, from citation studies we know that citation counts are affected by at least 28 factors that all somehow influence later citation counts, and we have to find out whether
37:27
this is true for data sets as well. So, to summarize here: we definitely need more information on data citation practices in order to come up with meaningful metrics, and if we want
37:42
to avoid comparisons of apples and oranges, we also need more metadata and context information in this regard. So I guess that's what I wanted to say; thank you very much, and I hand over to Stephanie. Thank you very much, Isabella, for this great context and summary of
38:02
what we've been up to in the Meaningful Data Counts project, and thank you to all panelists for providing their perspectives. We're now moving over to the Q&A session, and I want to start us off with a question that we prepared; if we have time left, we're also going to get to the questions in the Q&A box, so please add your questions, and we're also going to follow up later if we don't
38:25
have time for them. I'm proposing that we keep the same speaking order, so Christine, Rodrigo, Nicolas and then Isabella. Our first question would be: what are the next steps on the journey towards meaningful data metrics, and what are really the biggest obstacles?
38:43
So Christine, if you could provide your perspective on that question. I think that one is pretty straightforward, and I've added it to the Q&A: people reuse data without citing them, and we've published heavily on exactly that point.
39:02
There are a number of problems in there, but the fundamental issue is that data reuse is much, much heavier than any of our indicators show, and so incentivizing people to do the citations is a threshold before we know what to count. I mean, just like anything
39:23
to do with the COVID numbers, we believe these are grossly underestimated relative to the true numbers. So there are some incentive issues there, and then there are the gaming issues that several people have raised here as well: people are concerned about themselves being
39:41
gamed as far as the uses and reuses of their data go, and they're afraid of being scooped; there's a free-rider problem; the incentive issues are absolutely huge in here. Thank you very much, and just maybe a plug for the first of our three webinars, where we really discussed that in depth too, in terms of the signal we're getting in a formal citation
40:03
versus the overall acknowledging of reusing data, for example. Rodrigo, next please. Yeah, from my side, I think there is an urgency in making these activities visible, for which metadata infrastructures that index and make the production of
40:27
data visible are fundamental, but they really must take a very serious position here. So it's not just creating, as Isabella has very nicely shown; I mean, it cannot just be that we have the publications or the data sets with a DOI; that needs to come with complete metadata, so
40:44
authors are properly identified, their institutions are properly identified, and the content of those data sets is also properly identified. That will create visibility for these activities, and my expectation, more or less what I try to do with my reversing of the vicious circle, would be that that visibility and the machine-readability of those
41:04
activities will create an environment where incentives and rewards can be created to promote this activity. So for me that would be the very first step. Thank you, Rodrigo, and Nicolas next. Yes, well, I agree, and I think that, even before going to citations, it
41:22
is just being able to identify the data sharing and the public data sets, and trying to find these activities. There's also the issue Isabella mentioned of the types of data sets, or whatever we call them; what are they? Because the diversity of the cost of use is huge, even within and between fields. I mean, the cost of sharing data and the
41:47
difficulties, even the ethical difficulties, will be very different depending on the field, and even the reuse of it or not will change. I think that in that part we still have a lot of work to do, even before looking at impact.
42:02
And Isabella, can you provide your perspective on what we should do next and what are maybe the biggest obstacles? Well, there's not much left to say, I'm afraid, but this is good. So I agree with everyone, and I also think that the practice of reusing and sharing
42:22
and citing data, and making this visible, and having that as a fully acknowledged way of doing scholarly work, this is, I think, the major step. And I know that this requires organizational change, and probably this is also the hardest part, but I think we need that.
42:49
Yeah, and I really like how everybody emphasized that the incentives are lacking, right? Like, we're not supporting researchers and not valuing data sharing enough, and then it becomes, as Nicolas also said, a burden, an
43:04
additional thing to do, because no policy is requiring it and it's not really helping in the larger academic reward system and in one's career, and in that sense it's not a priority. And that's where, at least from the Make Data Count perspective, we're hoping that good and responsible data metrics can help with this kind of incentivization. But especially with
43:24
the vicious circle and the loop that Rodrigo kind of showed, we're a bit stuck, in the sense that we want to develop good metrics to incentivize, but the metadata is not there to build them; well, we have to start somewhere, right? And yeah, really good points, thank
43:41
you very much for answering that question. I do have another one before we go to the questions from the participants in the Q&A. My second question would be: from your experience with bibliometric indicators, and we have decades of experience with those, which are mostly based on journal articles, what are some lessons learned and advice for developing metrics for
44:05
research data? So, looking ahead to when we're there, when the metadata is complete and we know a bit more about research data citation, the patterns behind it, and the motivations to cite or not cite data, what are kind of the lessons learned
44:24
for when we're actually thinking about making or building an indicator? Christine. One would be to make it as simple and discrete as possible. I mean, let's see, Rodrigo has mentioned one that
44:42
I've talked about much in other venues: the difficulty of defining what the unit is of a data set that we're going to cite, and that's a classic problem for us. We pretty much agree on what a journal article is, and yet the citation of that is extremely uneven and dirty.
45:02
I keep pointing people to Zotero, and I think we're now up to ten thousand or more different citation styles for journal articles. So we can't agree on how to cite a journal article, and we're a long way from agreeing on how to cite a data set, because we don't agree on what the
45:21
data set is in the first place. So anything we come up with has to recognize that people are going to cite it inconsistently, and we have a real risk of dirty data to clean up, and that takes us into a lot of other things. I've got a couple of new pieces in that area, and I'm going to put
45:40
them into the chat to pick up as well. Thank you very much, Christine. Rodrigo. So I would start by saying that most of the developments reflecting on and questioning citation indicators, impact factors and h-indexes, I'm thinking of things like the DORA declaration,
46:05
I think all of them apply to any new metric that we develop for data sets; I cannot imagine anything that would be radically different, so all of those principles and most of those reflections do apply. And having said this, I also take a much broader
46:23
perspective when it comes to data metrics, so to me it matters a little bit more than citations. For example, for me something that would be very interesting is to study collaboration patterns in the production of data, something that today can be challenging because the data about the authors is not properly standardized, or is not even complete. Well, we already
46:46
discussed the thematic-based indicators: how can we say whether an area is growing in the production of data or not, because we don't have those pieces of information. So, in a way, I would say,
47:02
inspired by what we know from our scientometric research and our scientometric development, I think there is a lot we can apply to data metrics, and all the lessons learned for articles basically apply to them. And probably once we are in that somehow improved or better situation, where we have the infrastructure to properly capture data metrics,
47:26
then probably specific problems will come, and then it will be time for us to also rethink the indicators we're proposing, and what the challenges could be, as Christine was pointing out: when the indicators stop being useful because somehow the purpose is lost. It will be
47:44
that time, but for the time being, I think what we have learned from our scientometric work fully applies to data metrics. Thank you, and Nicolas next. Yes, I think that there are also some assumptions, which I think are dangerous, that we make with data
48:02
if we just translate, let's say, the conceptual part, which is always the weak point of scientometric theory, to data metrics: the idea that citations mean something specific when applied to data sets, or that producing data sets is something,
48:23
let's say, homogeneous in terms of effort, and so on. I think we need much more qualitative study there, to think about how data is produced and why, and how people recognize the skills that are necessary for sharing data and so on,
48:41
before that. I mean, for instance, maybe a bit off the record: here in Spain we see that, since we have to write a data management plan to apply for funding, many senior researchers basically don't know how to do this. They don't have the skills, they don't know how to make their data open, or what the priorities or the requirements are that
49:02
they need for this kind of task. So there is a lot of literacy work there, and also a lot of understanding needed of how they do things and of the current practices they have. Thank you, Nicolas, and Isabella. Yeah, well, again, I think I agree on all points, so this is really a nice spot here, last in the
49:25
row. But maybe to add something different, or a different perspective: I think we should ask ourselves what we want to do with the metrics when we have them in the end, because, again, what do we need them for? I know they should be incentives and they should help us do something,
49:46
but we should be clearer about what this is in the end, because then we can also build meaningful metrics, or really have good validity in this regard. Because I don't like the idea of just building or constructing or designing
50:05
metrics or indicators just because data, or something, is available, or, in our case here with data metrics, not available right now, namely the metadata. But again, I think we could be more specific about what metadata, for example, we need once we know
50:25
for what reasons we want to have an indicator or a metric. So I think we should start thinking from the end, in a way. Thank you very much to all panelists for answering these
50:42
great questions. We have so many great questions in the Q&A as well, but I worry that we won't actually have time to go through them all. Please don't worry, we're going to save them and follow up. We're sharing the recording here, but we're also going to share the chat
51:01
and the Q&A afterwards. So thank you very much for the excellent discussion and the contributions by the panelists, but also to everyone participating today. I'm just going to hand over now to Daniela, who's going to follow up with a little bit more of the follow-up that we have planned. Thanks, Stephanie, and thanks to all of you who spoke today. I think your
51:23
perspectives are invaluable, and I hope that folks are able to follow along with all the work that you all are doing, and especially the projects that you noted. So if you have any other links, put them in, and make sure they go to everyone and not just hosts and panelists; I know a lot of things are coming just to us. We will be saving the chat we have posted, and perhaps Paul can put in
51:47
the link here for the Zenodo step, but I'm putting a link in the chat now with the links. We do have a page on the Make Data Count website that links to the recordings from the past
52:00
two webinars. We'll be posting the recording of this webinar on YouTube; it will then be linked forever on the Make Data Count site, and you can also find it through DataCite on their channel, so the metadata of all this will be captured there as well. We encourage everyone to go back and watch the recordings of the first two webinars. I think they all kind of culminate in
52:22
one, so it's great to watch all three and see how they come together. This isn't the end of us doing outreach in general through Make Data Count, but in virtual time we thought this would be good to kind of capture the big three things that we want to highlight, and those were: understanding how to surface data citation,
52:43
understanding how to get a better classification system for data, and what metadata we need for meaningful data metrics. So these are kind of the priority areas we're all working through right now, and please get in touch and follow along as we continue to work through all these efforts. Again, we'll leave the screen on for a second so that if you want to grab
53:05
the link, here it is again, and we will follow up with all the other information. So thank you to everyone, it was a pleasure to see you all here. Yes, thank you all, thank you to the panelists.