Snippet - Data, Journals and Academic Publishers: Wiley Publishers - Publishers' Responses to Research Data

Formal Metadata

Title
Snippet - Data, Journals and Academic Publishers: Wiley Publishers - Publishers' Responses to Research Data
Alternative Title
Data Journals: Wiley Publishers - Publishers' Responses to Research Data
Number of Parts
18
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
In this webinar recording, Dr. Fiona Murphy (PREPARDE Project, UK) talks about data publishing from a publisher's perspective: recent initiatives, current examples and models, and future directions.
Transcript: English (auto-generated)
Welcome everyone to another ANDS webinar event. It's a pleasure to have you all here online from near and far as part of the greater ANDS webinar series, which to date has included topics such as data management, data licensing and data citation, to name but a few. My name is Alexander Hayes, and I have with me here on this sunny Canberra day Gerry Ryder, a research data analyst at ANDS
who's flown all the way from fair Adelaide, South Australia, to join us for this important event and, of course, a myriad of meetings that she's doing. Welcome, Gerry. For your interest, everyone, and to acknowledge the significance of this webinar topic, it's important to note
that we've got attendees registered for this webinar from the University of Canterbury, New Zealand, the University of Tasmania, the Australian Antarctic Division, the University of Edinburgh, Geoscience Australia, La Trobe University, the University of Canberra, Deakin University,
the University of Melbourne, Wiley Publishers, the University of Western Sydney, Griffith University, the University of Queensland, the Research Data Storage Infrastructure (RDSI) and Monash University, and that's just to name a few. For many of these organisations, data publishing is obviously of great
interest and already an integral part of their research activities. So we've got two very distinguished guests joining us today, whom it's a privilege to have on board given that the topic at hand is data journals. Jane Smith is the SHERPA Service Development Officer at the Centre for Research Communications, University of Nottingham. In this role, Jane's involved
in a number of projects around open access information, including RoMEO and JoRD, and those of you who have been involved in institutional publications and repositories will be familiar with at least some of these acronyms. Jane's here today to talk about the JoRD project, the Journal Research Data Policy
Bank, which has a particular focus on journal publishers' data sharing policies. We also have with us Dr Fiona Murphy, who is the publisher for Earth and Environmental Sciences journals at Wiley, working with a number of titles, societies and other publishing partners. Fiona is also increasingly
involved with emerging initiatives that promote good management practices for research data, including use, reuse, citation and linking from primary publications. Among other activities, this has led to her being a core part of the PREPARDE project on peer review and publication of data sets, and to
membership of the STM Association Research Data Group and the World Data System Data Publication Working Group. Now for a very brief background on ANDS' activities. During late 2012, ANDS staff undertook a desktop
survey to identify data journals across a range of disciplines in order to define what a data journal is; to review data journal policies, in particular looking for requirements for DOIs, data deposit and data citation; and to assess the status of the data journals surveyed, taking into account the year established, peer review processes and whether they're indexed,
in fact, by Thomson Reuters' Web of Science. So we're pleased today to be able to bring together these leading international initiatives and these guest speakers in a webinar that will surely shed some light on the policies devised by academic publishers to promote linkage between data journals,
journal articles and underlying research data. So I'd like to welcome Fiona Murphy, who has been involved in a sister project to JoRD, but also has some experience from her role at Wiley. So, Fiona, I think we should be able to see your screen now.
Yeah, can you? Yes, beautiful. Thank you. Yep, it says showing screen here. Okay, right. I'll take it away then. Thank you. I'm going to say good morning, everybody. And obviously,
good afternoon to most of the people here. And thank you very much for inviting me to speak to you today. I just wanted to say I'm going to talk a bit about some of the things that I've been doing around publishing research data and also sort of touch on some other things that are going on
as well. I wanted to start, though, with just a couple of what-and-why slides, just to make sure that we're all starting from the same point. So here you go. So what do we mean by publishing data? I think it's analogous to, but not precisely the same as, publishing primary research. I'm just trying to get your square a bit
smaller because it's really massive and I can't see my slides very well. There we go. It's not precisely the same as publishing primary research, insofar as primary research output is
generally a finished product, whereas the data underlying it is often raw or in various states of partial processing. So data should also be permanently or long-term
archived in a reliable repository. I've put "reliable" in quote marks because that can be a problematic concept all on its own. It should be allocated a persistent identifier. I would say a DOI, but I think there are also problems around the nature of different kinds of data sets, which can mean that a URI or even a web link is
the only thing that's possible in certain cases. And there should be a critical level of metadata to allow discoverability, to enable people to find a particular data set and to know what it is that they're looking at. And then the why. Why would we publish data?
Well, there's a very good reason: to provide academic credit to scientists, particularly the kinds of scientists who traditionally haven't been able to accrue publications and the kind of status and career path that go along with that. And also, the
publication path is one which is known within research communities and can be incorporated into current research and grant proposal workflows. Hopefully, it ensures that the data set is uploaded to, again, a trusted repository, and you can have some reliance on archiving and curation practices. And
again, I think that's something that's emerging as a need for more general and better-understood best practice, standards and accreditation rules. I've got peer review processes. This is
another thing that I think, I'm sorry, I couldn't switch my Skype off, if that's annoying people. I hope it's not too bad. Peer review is another part of the data publication process, which is, again, analogous to the primary research piece. But
it's also not exactly the same. It's something that people have a great deal of issue with because of the size of potential data sets and the time and skill that might be required to actually manage a peer review, along with the fact that it's quite well known that reviewers are already under a great deal
of strain and time pressure. So that's a potential pain point in the process. Publishing data, again, if we're saying that it would then become more discoverable, more permanently available, then hopefully it would also be more visible to people who aren't necessarily in the know
immediately, so that they can find and reuse it. And transparency: it should also support the movement towards accountability to the public and to the funding agencies,
given that a lot of money does get spent on research, and you want to see what it is you've got. And the other really good reason why you'd want to publish research data is that it's the way the wind is generally blowing. Many of you are probably aware that the White House Office of Science
and Technology Policy had a big meeting last week about public access to research, and they spent half the time talking about data as opposed to just the regular or standard research output. The Science as an Open Enterprise
report came out at the end of 2011, I believe, and it's a very interesting report. It makes the case that science and all kinds of research should be opened up to
the people that paid for it and to anyone that wants to use it, and that people should be able to find their way around in it. In fact, I've heard the report's main author speak, and he makes a very clear case that librarians are key to this new paradigm. And Horizon 2020, another one I just
picked, is the EU's programme, that's the European Union's programme for research and innovation. They've got a budget, it was 80 billion euros, I think it might have been cut a little bit, but they're absolutely placing a top priority on opening up research and allowing, and also
facilitating, the capacity to build data sets and knowledge, to find interoperability and synergies, and to drive new insights and business models, growth and jobs. Generally, they see this as being really key to
Europe's long-term viability and prosperity. So what do publishers do? Well, one of the things that we can do, and I guess it's our sort of knee-jerk reaction to most things, is start a new journal. So Geoscience Data Journal is a partnership between Wiley and the
Royal Meteorological Society, and it's also been supported by NERC, which is the Natural Environment Research Council in the UK. In particular, the British Atmospheric Data Centre has been very helpful, giving us the time and space to work through how we might set
this up and make it work. As you can see, we publish short data papers which are cross-linked to, and which cite, data sets that have been deposited in an approved data centre and assigned a DOI or another permanent identifier. I've also put a little description here of what we believe a data article is and why it's a good thing to
do. As you see, it covers the when, the how and the why of the data collection, and what the data product is, and it's a way of pulling together all the parts of the project, the output, that would enable reproducibility and also reusability. I've also put up a splash page here of
what an article looks like, and I wanted just to draw your attention to the fact that there we've got the DOI of the article itself, but we've also put the DOI of the data set up there on the front page. We thought it was
really important to have it sitting up there amongst the front matter, really prominent, so that people can see how it relates to the article overall. I did want to mention as well that we also put the DOI in the reference list because we want to support the general citation of data sets, and we're also mindful of the
Thomson Reuters Data Citation Index, which is being pulled together at the moment, and we want to make sure that we support whatever workflows they eventually come out with. I've got here on the left a
picture of the workflow as we envisaged it at the beginning. As you can see, it's pretty complicated. There are a lot of processes, and there are a lot of parts where the research has almost been batted between, say, the publisher and the editorial process of the primary
research paper, but also the repository and the data set itself. We felt that this was a potential barrier to people really picking up and running with this sort of publication, so we started isolating, trying to name, the
issues we felt were the key ones. So there was a workflow and cross-linking issue: the journal and the repository need to be able to speak to each other. We need to know something about the repository in order to be able to work with it, including whether it's going to be
here next year and what happens to a data set that goes into that repository. I felt that it was intrinsic to calling something a journal that there should be peer review. Again, as I mentioned earlier, peer review of data sets is a big ask, and people aren't really clear what it is
they're supposed to do. And just generally, I think Jane touched on it as well, researchers are being slightly pushed towards behaving in this sort of way, but they're also having to operate in the real world, where
if you've painstakingly compiled a data set, you don't want to end up not receiving any credit yourself, and it's important to be able to engage with people and answer questions, address concerns, and adapt as required. We felt that this
also boiled down to the need for a better understanding of how this sort of publication, a journal and a repository, would interact. So we started working on the PREPARDE project; that's where that came in. Again,
like JoRD, it's JISC-funded, under their Managing Research Data strand. I've put up here the key partners and the contact details of the project leads, and as you can see, we're coming towards the end of our cycle. We've got some outputs, and we've got some final ones coming, including
what I'd like to point out. One of our work packages, one of the areas we've been investigating, is repository accreditation, because clearly, if you've got a data paper and a data set, there needs to be a strong, durable link. There are a lot of questions here. We
have to know on an individual basis whether a repository is trustworthy, but we need to have some way of sealing that, of publicizing that, and of generally ensuring we don't have to keep duplicating that work every
time we either start a new journal or another publisher wants to work with that repository. So we were looking at how to start pulling that insight and information together. As you can see, we've put up a list of the characteristics that we've been looking at around repository accreditation and how you assess
whether a repository is good to work with. We're in the process at the moment of finalising some recommendations, and I'll give people a note of how
to interact with them in a moment. This is the second key area of our study, the peer review of data. As you can see, we had a workshop a couple of months ago, and we came up with three recommendations. The first is the need to connect data review with data management
planning, basically to link the data management plan, which hopefully happened much earlier in the process, with the review, which then happens at the end. So you can connect what the project was supposed to achieve, what data was supposed to come out of it, and how it was supposed to be collected, used or assessed. We wanted to show
that there are two kinds of review, a scientific review and a technical review, and that both need to be reflected in the curation and information that's held about the dataset. We also wanted to connect the
processes of the data review with the article review. As you can see, we've got a formal document for comment, which is up at that URL there, and there's also a mailing list, the data publication email address, which is very easy to
join and to comment on. I'd be very happy if you were to do that. If you join the list and then go back through some of the previous posts, you can also find the material around repository accreditation. Most
recently we were looking at cross-linking and workflows. I've only just had the workshop on that one, so I've just put up some preliminary findings from that. The loudest voice, actually, as I was touching on before, was for some central registry or broker, and that's partly, I think, also related
to the accreditation issue, but it's also to do with the fact that at the moment all the links are bilateral, and any information that's sent between them is largely manual, and it's just not going to be workable to try and build
that up. Imagine a world where a lot of datasets are being cited and you want to capture information about who's cited what, and whether an article or a data
paper has pulled together multiple datasets and then cited them. They could be sitting in different data centres, and we just can't keep track of that manually. I think something along the lines of the CrossRef brokerage, or maybe
something around Thomson Reuters and ISI, might be a possibility, but it feels that in order for people to be incentivised to publish datasets, we need to be able to collect the citations, and that then needs to be done in a manageable way. As we said, data citation, I think, is
also emerging as a currency that's understood amongst research communities. In fact, data citation, like publication of data, is analogous to, but not exactly the same as, its primary research counterpart. If you can imagine, if you've
got a long-term observation dataset in the atmospheric sciences, the dataset could be cited, and then the same dataset could be cited a year later, and wouldn't that be a different dataset? So there's a certain amount of something being fixed and yet not fixed, which you
certainly don't get with primary research articles, but the concept of citing that dataset is something that I think many of us are familiar with. I also wanted, in the interest of fairness, to mention that, obviously, Wiley isn't the only scientific and technical
publisher, and I'm also aware of, and in fact applaud, the fact that many publishers are exploring this area. I think it's a sign of its growing importance, and of people realising that this is going to be critical
for underpinning scholarly communication going forward. So I've just put a few journals and publications down here to illustrate that, and I'm aware this is by no means exhaustive. We actually do have quite a good list on the PREPARDE website. One of the things that we did was to pull together a
list of data journals, which again people are welcome to go and have a look at. Earth System Science Data is an EGU publication. It's open access. It's been going for about four years, and it's fairly similar to Geoscience
Data Journal. It has an open peer review system, and at one point it was also publishing supplementary information, which we decided we didn't want to do. I think they've now tightened up their criteria. Scientific Data from Nature was announced very recently, and I think
that was a real signal of the importance that this topic is starting to assume. Scientific Data is going to be publishing what they've termed data descriptors, which I think are pretty much data papers, data articles,
but it hasn't yet formally launched. So I think that's just a space to watch and get more information about as we go forward. GigaScience, from BioMed Central, is very much life and medical sciences. It's a big data project, and GigaScience also undertakes to hold the
data set as part of the publication. F1000Research, from Faculty of 1000, is another quite new entrant into the field. Again, they have open peer review, which is also post-publication,
which is another one maybe to have a look at. It's also in the life sciences and biomedical sciences, but it's also building up partnerships with some of the data centres, such as Figshare, which will take your
uninteresting data sets and allocate a DOI. They're generally going to build the canon of scientific and technical knowledge. I also thought it might be useful to mention a couple of things that you can go and do after this session. Have a look at our site, which has
got quite a lot of information about the work packages and the outputs that we're producing, and also the blog and the mailing list as well. You're very welcome to join or interact with
that. More widely, there's a GIS mailing list which relates back to the website I mentioned earlier. That's quite interesting. That's very international. It's got a lot of librarians, data centre managers, publishers, interested
researchers who are all trying to engage around this, and it's picking up, I think, on a lot of the things that are going on that other organisations are engaged with. Research Data Alliance, again, is another quite new initiative, which I know ANDS is very involved
with, which I think is at a point where it would be quite interesting, quite useful to at least engage with, join some of the mailing lists, because there are a lot of working groups which are just getting off the ground at the moment. Some of them are around things
like data publication, data citation, capacity building, and so forth. At the very least, you can keep an eye on what's going on by being aware of them. The World Data System is another international organisation which is encouraging membership. Again, I'm aware that the
Australian Antarctic Data Centre is certainly a member, as is the Australian Bureau of Meteorology. Generally, it has a mission to support best practice in the stewardship and curation of research data, and you're invited to support
the mission. You can become an associate member, which doesn't involve paying any money, but which does involve being called to the table to actually engage with and support the policies as they're emerging and being worked through, and it's with an idea of joining things up,
supporting interoperability, and not reinventing the wheel as well. I think there is a potential issue in that there are so many things going on at the moment that people could well be working in isolation and reinventing the wheel in several different places at once. I think the Research Data
Alliance and the World Data System are very much looking to see what's already out there, what's good practice, where the low-hanging fruit is, and to actually build from there and support the things that are already happening. In the future, and this is a bit of a blue-sky moment, hopefully I'll
know a lot more about the future tomorrow, because there's actually a meeting in Oxford; I've put the programme there. We're hoping to have at least some of the sessions broadcast or recorded, and hopefully there'll be a Twitter feed as well, and we'll try to make sure there are some
outputs that come out of that. More generally, I think there's a sense that we, the stakeholders in scholarly communications, are in a shifting landscape. I think it's really important that we speak to each other, that we're adaptable, and I think that there's just so much to do that there
should be space for all of us within that. And I think that journals and scholarly communications are going to start really changing in the not too distant future, and I think there'll be more enriched content, there'll be more tools for query.
I think things like copyright and ownership are going to adapt; they're going to change. I'm not going to say they'll be less important, but I think they're going to be important in a different way. And that's it.