What other metadata could be made open about a research publication?

Formal Metadata

Title
What other metadata could be made open about a research publication?
Title of Series
Number of Parts
36
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
36
Transcript: English (auto-generated)
All right, as mentioned before, a bit of a change to the schedule. I'm going to throw to my co-host, Thomas, who's going to do a talk on how open you can make metadata. So a bit of an intro. I'm sure most of you know him already, but Thomas is a data specialist at La Trobe University with a background in evolutionary biochemistry.
His professional role focuses on open data and open research outputs across the university, ranging from data analysis and visualization to sharing and dissemination. He also helps to run the WikiJournal User Group, a small publishing group of open access, Wikipedia-integrated academic journals, and is an experienced editor of Wikipedia and Wikidata.
So, Thomas, I'll hand over to you.
Thank you very much, and I shall stick my screen share up, so apologies for the infinite recursion until I change away from the StreamYard view. So one of the things that I've been interested in with the way that
Wikidata works is that we actually have the opportunity to include a lot more metadata than we would normally see in a site such as Crossref or PubMed. In addition to all of that information, we have this opportunity to do a lot more annotation, some of it manually,
but hopefully a lot of it in an automated fashion. And so I'm going to use this article here as a brief example of some of that. So this is an article that was published in the WikiJournal of Medicine, and, for those who haven't come across the WikiJournals before,
these are academic journals that are hosted on the same software that Wikipedia and other wikis are, so that's the MediaWiki software. And it actually affords a couple of benefits in terms of interacting with Wikidata, so they're actually quite a nice case example. So the first thing to note is that this is all formatted up like you would expect in many other
academic journals: you've got authors, an abstract, the text down at the bottom, and then on the right-hand side the various bits of metadata like its PDF and DOI and so on and so forth. Additionally, because the WikiJournals focus on open publishing and transparency, the
editors are also indicated, and in fact their roles as editors, and the peer reviewers are indicated too. And if we go through to the peer review page, because the peer review comments are also open, it gives a little bit more information about those. But the interesting thing here is that
all of this information on the right-hand side, and these authors up at the top, isn't stored on this page. For those who are used to working with MediaWiki and looking at source code, you might expect that all of these are included as
parameters of a template actually present on this page, but they're not. They're actually all stored in Wikidata and called from Wikidata to populate this page. And that Wikidata item is here. So again, this is the same title, and you can see on the
right-hand side that it links to the WikiJournal article as its main Wikimedia Foundation wiki page. And it has some of the standard information that you might expect, so it indicates that it's an instance of a scholarly article and indicates its title.
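As a rough illustration of the statements just described, here is a minimal sketch that reads "instance of" and "title" from an item, assuming the item is available in the JSON layout that Wikidata's `wbgetentities` API returns. P31 (instance of), P1476 (title) and Q13442814 (scholarly article) are real Wikidata identifiers; the sample item, its ID and its title are made-up placeholders, and `claim_values` is a hypothetical helper rather than part of any library.

```python
from typing import Any

INSTANCE_OF = "P31"              # Wikidata property: instance of
TITLE = "P1476"                  # Wikidata property: title
SCHOLARLY_ARTICLE = "Q13442814"  # Wikidata item: scholarly article


def claim_values(entity: dict[str, Any], pid: str) -> list[Any]:
    """Return the main values of every statement for one property."""
    values = []
    for statement in entity.get("claims", {}).get(pid, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "unknown value" and "no value" statements
        datavalue = snak["datavalue"]
        kind, value = datavalue["type"], datavalue["value"]
        if kind == "wikibase-entityid":
            values.append(value["id"])    # link to another item
        elif kind == "monolingualtext":
            values.append(value["text"])  # text plus a language code
        else:
            values.append(value)          # plain strings and other types
    return values


# Placeholder item for illustration only; not the real article item.
SAMPLE_ITEM = {
    "id": "Q0",
    "claims": {
        "P31": [{"mainsnak": {"snaktype": "value", "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q13442814"}}}}],
        "P1476": [{"mainsnak": {"snaktype": "value", "datavalue": {
            "type": "monolingualtext",
            "value": {"text": "Example article title", "language": "en"}}}}],
    },
}

if __name__ == "__main__":
    print(SCHOLARLY_ARTICLE in claim_values(SAMPLE_ITEM, INSTANCE_OF))
    print(claim_values(SAMPLE_ITEM, TITLE))
```

A page template that draws on Wikidata would do something similar on the wiki side; this sketch only shows the shape of the data being read.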
The keywords are pretty standard as well, as you might expect for other databases. So in this case, it's an article about burns. It's about total body surface area (TBSA), which is a metric of burn severity. And then two different ways of estimating TBSA,
BurnCase 3D and the Lund-Browder chart. OK, so far so normal. We also have the authors listed, but one thing that we're starting to see that is unusual for Wikidata's way of storing author
information is, firstly, we already have an email address string here to indicate the corresponding author. There has been a general discussion on Wikidata about whether to include in individuals' Wikidata items their email address if that email address is open. So for example,
for this particular author, if we follow through to the Wikidata item about them, they are a researcher. It does not include their email address here. It does include their ORCID iD. And from their ORCID record, you can go to their publications, and on some of those publications they are a corresponding author. And that corresponding authorship will have
associated with it an email address. But typically, it's not considered standard to include someone's email address actually in the item about them. However, I think it is reasonable to include an email address for a corresponding author
on the Wikidata item about that output. Additionally, we have affiliation strings listed. I think actually the better way of doing this, instead of an affiliation string, is to use affiliation and specifically point to the item for Hanil General Hospital.
The reason that it's currently formatted as a string is so that it can be pulled into this article here and format correctly in the information about those authors.
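The choice between a linked affiliation and a plain affiliation string can be sketched as a simple fallback. This is only an illustration under stated assumptions: the field names `affiliation_item` and `affiliation_string`, the item ID `Q_EXAMPLE` and the helper `display_affiliation` are all hypothetical, and real Wikidata statements are more deeply nested than these flat dictionaries.

```python
from typing import Optional


def display_affiliation(author: dict, labels: dict[str, str]) -> Optional[str]:
    """Prefer the label of a linked affiliation item; else use the raw string."""
    item_id = author.get("affiliation_item")
    if item_id and item_id in labels:
        return labels[item_id]                # structured link to an item
    return author.get("affiliation_string")  # free-text fallback, may be None


# Hypothetical item ID mapped to the hospital named in the talk.
LABELS = {"Q_EXAMPLE": "Hanil General Hospital"}

linked_author = {"affiliation_item": "Q_EXAMPLE"}
string_author = {"affiliation_string": "Hanil General Hospital"}

if __name__ == "__main__":
    print(display_affiliation(linked_author, LABELS))
    print(display_affiliation(string_author, LABELS))
```

Both authors display the same text, but only the linked form lets a reader follow through to the organisation's own item.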
However, eventually all of this is going to be a bit more automated. So the affiliation string is going to be replaced with an affiliation, or indeed the information may be able to be pulled straight from that author's item as to who their employer is and in what country that employer is based. But now we get into the more unusual information. So I think one
of the nice examples here is the reviewed by field, where we can indicate the information about the peer reviewers. Now again, as I've mentioned, WikiJournals are unusual in that they
have both the peer review comments and many of the peer reviewer identities open. So about 75% of peer reviewers agree to have their identities open, and we can see an example here for Herbert Haller. But we are also able to do something useful even for anonymous peer reviewers.
So in this particular case we have this anonymous peer reviewer, but we are able to at least include what their field of study is and also what their academic degrees are. So, particularly for medical articles, is this being reviewed by an MD or a PhD? In this particular case it was someone who had expertise in both. But you might imagine that there's an
interdisciplinary article and you would like to be able to indicate which of those disciplines are covered by the peer reviewers. This is quite a monodisciplinary article, but let's say there was an article about burns, pharmacology and genomics. You would want to
indicate, even if you had two anonymous peer reviewers, whether those peer reviewers' expertise covered all of those topics, or whether there was a big gap because none of them actually had any expertise in genomics, and therefore you wouldn't expect them to be able to pick up significant errors in the genomics aspect of that paper. So this information is typically not open
for most academic journals, which I think is a great pity, and I also think it's a pity that typically peer reviewer comments aren't just secret but permanently secret. So it's not as though they're under embargo and will eventually be released;
most peer reviewer comments are secret forever, and in fact many journals no longer even have records of all of the peer reviewer comments and author responses for their older articles. So I was speaking to one journal where they'd moved offices about
a decade ago, and all of the peer reviewer comments that were older than about 30 years, they just lost all of those paper documents and couldn't find them again. So we've lost all of that information that would actually be quite useful, and one of my hopes is that eventually journals that do decide to go a closed and secret peer review route at least have those
on file somewhere so that they can be put under a long-term embargo, because I know that secrets about the nuclear warship activities of the UK and America have been released
after embargo periods, yet we don't know the peer reviewer comments on some of the very significant publications that are now old enough that that information really should be entering the public domain. Additionally, you'll notice here that this peer reviewer has the qualifier of a declared conflict of interest. One of the things that we haven't yet
worked out the best way of structuring is how to link to the specific statement of that declared conflict of interest. I think in this particular scenario it's going to be best to add an additional qualifier down here, just with a URL pointing to the statement of that declared conflict of interest, but I'd be interested in other people's opinions of how to best structure
the specific conflicts of interest that reviewers or authors bring up. And additionally I want to quickly note the handling editors here, so
editorial staff for a journal vary, so some of them will be handling editors, and we've indicated those handling editors here. Additionally, for articles that have detailed statistical methodology, it's becoming more common to ensure that at least one experienced statistician has
a look at that publication to make sure that all of the stats are in order. In this particular case, again, we've indicated the editor who had that role, but you could also imagine putting in editor roles for typesetters and copy editors to recognise the work that has gone into
producing that final, nicely formatted PDF that a journal can put out, as well as the work that goes into checking the language content. The only role that I've seen consistently recognised is, for journals that translate into multiple languages, the role of translator being well
documented, but even that is not universal. And the last thing that I wanted to quickly get to on this page is some of the significant events. OK, submission is a pretty standard one, but I think that it's useful to also include in
Wikidata items more open information about ethics approval, because I think that that's pretty important to know: firstly, whether ethics approval has been given for a particular piece of work, and also who conferred it. And in fact, eventually I suspect that we're going to want
to not just link to the organisation that conferred ethics approval, but conceivably, if that organisation has multiple different ethics boards (and universities will often have this: they'll have low-risk ethics boards and high-risk ethics boards, or different ethics boards for medical, non-medical science and humanities research), they'll separate out those
ethics boards, so being able to indicate which of those ethics boards gave approval is, I think, also going to start being quite important for works in general. And the last thing I wanted to point out here is that we have a methodologies section here, yes: "describes a project
that uses". So again we are able to link to both methodologies and also, conceivably, instrumentation. In this particular case there were no specific instruments that were used, but unsurprisingly this article about burns comparing different measures
uses those measures. So in particular we're able to say that it's using this particular piece of software, BurnCase 3D, and in fact we can state how they got that piece of software. So in this case they didn't purchase that piece of software; it was actually donated
by one of the developers of that software. And the same thing goes for the methodologies, so you can see some of the methodologies listed here. This would also allow you to search the literature, if you're planning on using a particular methodology, finding instances in the
literature of other works that use that methodology, or, if a particular methodology has been found to have a significant flaw, being able to search the literature for that flaw. But also in terms of instrumentation, you might imagine a physics publication that used a specific particle accelerator. So the closest particle accelerator to me is at
Monash University, and we might find that actually there'd been a miscalibration of that particle accelerator between March and July of 2019, and so we could search for all of the publications that came out of work that used that particle accelerator between those two
dates. And so you can see how this is actually quite extensible, to enrich the metadata about these publications in a way that is extremely unusual for journal articles. And we can actually see an additional extension of the use of that information over here, so this is one of
the volumes of the WikiJournal of Science, and again all of these items here have all of this information pulled from Wikidata. So the way that it's pulling these abstracts is that, because
Wikidata knows the URL of this particular item, it's able to go to that URL and pull out the first abstract paragraph and then display this abstract paragraph here, and then all of the rest of this information is directly hosted within the Wikidata item for it.
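The abstract-pulling step can be pictured with a small sketch. It assumes, purely for illustration, that the abstract is the first `<p>` element in the fetched page; the talk does not say how the real WikiJournal templates locate it, and the sample HTML below is made up. `FirstParagraph` and `first_paragraph` are hypothetical names built on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text of the first <p> element and ignore everything else."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False  # True while we are within the first <p>
        self._done = False    # True once the first <p> has closed
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    @property
    def text(self) -> str:
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def first_paragraph(html: str) -> str:
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    return parser.text


# Made-up page fragment standing in for the article at the stored URL.
SAMPLE_HTML = (
    "<h2>Abstract</h2>"
    "<p>First abstract paragraph with <i>inline</i> markup.</p>"
    "<p>Second paragraph.</p>"
)

if __name__ == "__main__":
    print(first_paragraph(SAMPLE_HTML))
```

Fetching the page from the URL stored on the item is left out so that the sketch runs without network access.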
And this is of course the example for published items, but the same applies even to articles in process. So here is our open publication processing board, and you can see again all of these preprint items that are in process have all of this information drawn directly from Wikidata. So there's the
Wikidata item and the link to the preprint itself, but then its submission date, who the handling editors are, and whether it has any peer reviewers listed yet. That one, interestingly, is incorrect, but these other ones correctly list the peer reviewers
who have submitted their peer review comments. And then for accepted articles you can see whether they have had a DOI assigned yet, for those that are intended to be integrated back into Wikipedia whether that's occurred yet, and whether a PDF has been created. And then obviously
articles that are finished in that process have their row removed. So I'm going to quickly talk about a project that is looking at some of these ways to produce more detailed information around publications, which is this
STARDIT concept, so this is Standardised Data on Initiatives. I'm going to be talking about it specifically from the point of view of academic research and research publications, although bear in mind that it's more generally extensible and that it's designed to be extensible to
initiatives in general, so this might be items that are never published and are citizen science works, or even more broadly than that. But from a research publication or research outputs point of view, here is a mapping of the existing STARDIT report system, which I'll
show you an example of in a moment (it's a rather bland PDF), with the best mapping between those categories and various Wikidata items, and also whether they are compulsory within a STARDIT form or whether they're optional, so that eventually for an article like this, so this is a
research article about involving people affected by a genomic condition in the research about that genomic condition, and as one of its supplementary items, in addition to a GRIPP report (some people may have seen GRIPP reports before), a STARDIT report has also been included,
and it looks like this. So this is a rather uninspiring alpha version, and it's just a plain PDF, but even just skimming this PDF you'll be able to see that actually a lot of this is very structured data and should in fact be included in a machine-readable format rather
than in a PDF format. And so step one was recapitulating all of this information on a wiki, and in this particular case we're using WikiSpore, which is a project for small and
experimental wikis to test out ideas before perhaps spinning off their own full project. So in this case here's the STARDIT item, and this is just essentially a copy-paste of all of that information from the PDF, but in a way that is going to start allowing us to replace these
items eventually with information from the Wikidata item. And so in this particular case we've got the Wikidata item for this preprint here, so this preprint here has a DOI and this is the Wikidata item for that preprint, and again we're starting to introduce way more
information about those authors. So for example the author roles: in this particular case this author is not only a representative author doing the research themselves, but they're also specifically there in their capacity of being affected by the topic of the research,
and they're involved in the design, data analysis and member checking. There's additional information around other authors, and indeed authors who don't have an item yet and are just included as author name strings. But also this work had contributors; specifically it had a group of 25
contributors who gave feedback on it, and those contributors were affected by the topic of this research. And again we've got information about the ethics approval, but also significant events around this, so information about, for example, when this report was actually written and who was
the author of the report, and additionally what material was produced. And in particular I think that it's quite interesting to talk about the data that is produced, because again we're very interested in being able to openly catalogue that data as it comes out. And so in this particular
case there's data that's produced, some of which is sensitive data and some of which is non-sensitive data. So the data that is sensitive is kept confidential, and we can talk
about also what type of access restriction it has. So in this particular case it's mediated access, which means that you can get access to it but you need permission from the owners, and in this particular case it is owned by the research participants and the researchers, and it's stored on servers at La Trobe University, whereas the non-sensitive data is unrestricted
access under this particular copyright license. It was published in the open access journal, but specifically we can show the download link for that data and what file format that data is in. I'm not going to go into any more detail on other examples, but I'm hoping you can see
how much more information you're able to associate with items. Some of this is only possible if the researchers themselves start including that information, and so I'm hopeful that the idea of including these STARDIT reports as part of publications will become more common,
and to that end this STARDIT project is working on a simple front end that'll allow people who don't have any Wikidata experience to be able to deposit that data directly into
Wikidata when they submit their articles. So I shall leave it there, and hopefully this has sparked a few ideas. I've also put some open questions in the Etherpad about how best to store this data, and what to do with things like free text around that, and different ways of
structuring it, so I'll be interested in any questions that people have, and I shall stop my screen share at this point.
Thanks. We've got a question on the Etherpad; I'll just show
it now. So: are reviewers ever shy about showing ignorance in their questioning if they know their comments are open?
There have actually been a couple of studies on that. There's not any evidence that reviewers are shy about the comments that they make, but there is some
evidence that in some fields it becomes harder to attract a reviewer to give comments at all. So some journals have reported that if they ask reviewers to make their reports and identities open, then they're more likely to just decline to review,
although that's not the case across all fields, and the reported effects are relatively small. I think that the greater effect that people have seen has actually been a greater quality of review if reviewers know that their review comments are going to be made open, and in particular if their
identities are going to be associated with that. The reviews tend to be longer and also tend to have things like fewer spelling mistakes, suggesting that perhaps the reviewers have gone back and proofread their own comments, which I think is a good
sign for peer review quality.
Oh, Alex, I think you're muted, unfortunately. Sorry.