
Snippet - Data, Journals and Academic Publishers: JoRD Project


Formal Metadata

Title
Snippet - Data, Journals and Academic Publishers: JoRD Project
Alternative Title
Data Journals: JoRD Project
Series Title
Number of Parts
18
Author
License
CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and copy, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
In this webinar recording Jane Smith from the JoRD Project (UK) talks about recent initiatives, current examples and future directions of data publishing.
Transcript: English (auto-generated)
Welcome everyone to another ANDS webinar event. It's a pleasure to have you all here online from near and far as part of the greater ANDS webinar series, which to date has included topics such as data management, data licensing and data citation, to name but a few. My name is Alexander Hayes and I have with me here on this sunny Canberra day Jerry Ryder, a research data analyst at ANDS
who's flown all the way from fair Adelaide in South Australia to join us for this important event and, of course, the myriad of meetings that she's doing. Welcome, Jerry. For your interest, everyone, and to acknowledge the significance of this webinar topic, it's important to note
that we've got attendees registered for this webinar from the University of Canterbury, New Zealand, University of Tasmania, the Australian Antarctic Division, the University of Edinburgh, Share Sciences Australia, La Trobe University, University of Canberra Australia, Deakin University,
University of Melbourne Australia, Wiley Publishers, University of Western Sydney, Griffith University, University of Queensland, Research Data Storage Infrastructure (RDSI), Monash University, and that's just to name a few. These are obviously organisations for whom data publishing is of great
interest and already an integral part of their research activities. So we've got two very distinguished guests joining us today, whom it is a privilege to have on board given that the topic at hand is data journals. Jane Smith is the Sherpa Service Development Officer at the Centre for Research Communications, University of Nottingham. In this role, Jane's involved
in a number of projects around open access information, including RoMEO and JoRD, and those of you who have been involved in institutional publications and repositories will be familiar with at least some of these acronyms. Jane's here today to talk about the JoRD project, the Journal Research Data Policy
Bank, which has a particular focus on journal publishers' data sharing policies. We also have with us Dr Fiona Murphy, who is the publisher for Earth and Environmental Sciences journals at Wiley, working with a number of titles, societies and other publishing partners. Fiona is also increasingly
involved with emerging initiatives that promote good management practices for research data, including use, reuse, citation and linking from primary publications. Among other activities, this has led to her being a core part of the PREPARDE project on peer review and publication of data sets and to
membership of the STM Association Research Data Group and the World Data System's Data Publication Working Group. Now for a very brief background on ANDS activities. During late 2012, ANDS staff undertook a desktop
survey to identify data journals across a range of disciplines in order to define what a data journal is; to review data journal policies, in particular looking for requirements for DOIs, data deposit and data citation; and to assess the status of the data journals surveyed, taking into account years established, peer review processes and whether they're indexed,
in fact, by Thomson Reuters' Web of Science. So we're pleased today to be able to bring together these leading international initiatives and these guest speakers in a webinar that will surely shed some light on the policies devised by academic publishers to promote linkage between data journals,
journal articles and underlying research data. I'd now like to introduce to you Jane Smith from the University of Nottingham. I hope everyone can see the presentation appearing. Here I am. I've been working on the Journal Research Data Policy Bank, or JoRD for
simplification. Before I talk about what happened in the project and its findings, I just want to give a bit of background. I'm sure you're all quite familiar with it if you're tuning into the ANDS webinars, but just bear with me. Data has become an increasingly valuable resource in its own right.
People want access to the data behind journal articles, not just the data in the article itself; they want to access the data set. Research councils now want publicly funded research data to be made more
available and shared across the communities, not least as an indication that they're spending their money appropriately. With changes in research practice and technology, it's now possible to make use of these data sets, to combine different data sets from different researchers and to extract
additional information across the board. As I'm sure you're aware, in, I think it was, 2011, ANDS held an international workshop, and one conclusion that came out of it was that it would be a good idea to collect journals' policies on research data: what the journals and the publishers want the
authors to do with that data. So JISC, which funded JoRD through the Managing Research Data Programme, incorporated this idea into it and asked for people to do a feasibility study: is this actually sensible to do?
Other aspects of the programme include setting up research data management programmes and strands in various institutions, so there's a bit more of an infrastructure being developed. And if institutions are asking their researchers to deposit data, they're going to start wanting to know what the journals will let them do.
So in some ways, we've been calling it, somewhat cheekily, the RoMEO of data, to help people understand. So, JoRD was a six-month feasibility study. It ran from July to December last year, it was commissioned by JISC, and it was run by the Centre for Research
Communications, the Research Information Network, our colleague Paul Sturges at Loughborough University, which is just down the road from Nottingham, and Mark Ware Consulting. Together, we scoped and shaped a potential service that could provide a ready source of information covering the journal policy landscape for research data.
So we did this in three stages. Sorry, I need to have my notes in the right order. Our aim was to identify the scope and format of a service to collate and summarise journal data policies, but also to investigate and recommend business models, with the aim that the service would be
financially self-sustaining. So, those key stages I mentioned. First, we wanted to investigate the current state of journal policies on research data. Are there any out there? How good are they? What do they cover? That sort of thing.
We also wanted to consult with stakeholders. I'm not just talking about researchers, but also research managers, funders, publishers, and people who support researchers, like librarians and repository staff. And as mentioned, we wanted to look at the business models
and what service options were available. So, the review: we wanted to look at what had been done already in the literature. Had anyone done something similar? Did they have any recommendations on how to do the studies? Things like that. In summary, the literature review found there wasn't a great deal of literature in this area,
particularly on journal policies for research data. There might be material about research data, but not necessarily about journals having policies. However, there were some key studies, and these found that a large percentage of journals lacked policies on data sharing. Those studies are the likes of McCain in 1995 and, perhaps more famously, Piwowar and Chapman in 2008.
I don't have the full references here, but I can get them to people if you wish. These studies also indicated that there were no standard procedures for how a journal should create a data sharing policy or what those policies should advise.
There was also a large degree of inconsistency. Some policies were very vague; some were very clear-cut about what was wanted. There was also little guidance available to authors. However, some subject areas, like biomedical science, were leading the way in this area.
And also, perhaps because of the lack of guidance, researchers' data sharing habits were quite inconsistent. So with this knowledge, we started looking at what policies the journals actually have.
We decided to look at the highest and lowest impact factor journals and to pick 100 of each from the two subject areas covered by the Thomson Reuters citation indexes, science and social sciences. As you've noticed, we only actually looked at 371.
This is because there was some duplication across these two lists. Of those 371 titles we investigated, 162, which is 44%, actually had policies. In fact, there were 230 policies, which I'll explain a bit later, but it does make sense.
There was quite good subject coverage: 36 subject areas were covered across these two lists. We did consider whether or not to include journals we knew had policies, but decided in the end to leave these out because that could introduce a bit of bias and we didn't actually know where they sat on an impact factor scale.
So this is a graph of who had policies. As you can see, the majority of the journals we looked at had no policy. We have some listed as unknown, and that's really where we were unable to find a journal website.
So we couldn't find out whether they had a policy, and we decided not to contact the journal editors directly because of the timescale of the project. However, some journals, about 15%, had multiple policies, and this would be where there might be a policy on data sharing, a policy on data preservation,
and a policy on the formats of the data, so the data policy was spread across multiple policies of the journal. We used Piwowar and Chapman's definition of strong and weak policies, which in summary is that a strong policy is one where data deposit is a condition of publication.
For example, if you don't deposit it, you can't publish, whereas a weak policy would merely suggest or recommend deposit. On this basis, of the journal policies we found, nearly three quarters were weak, with only a quarter being strong.
Perhaps again not too surprisingly, the high impact journals were more likely to have a strong policy, and the lower impact journals were more likely to just recommend or suggest that authors share data. However, again, as indicated in our literature review, approaches varied between subject disciplines,
with some more established than others. We did in fact notice that, in addition to the biomedical sciences, some of the chemical structure journals had more established practices. In addition to finding out whether they had a policy, we also wanted to know what was in that policy.
So we looked at data types, basically what type of data they wanted the author to deposit. Most of the time we found it was datasets, multimedia, other data: fairly general terms, quite important in the case of datasets, but general, whereas very few were asking for specific types of data,
but those that did asked for things like program code, or protein and crystal structures, to be deposited somewhere. We looked at where they were asking authors to deposit. The greatest percentage of the policies we looked at requested that materials be put on a website, fairly general again, or just on the journal's own website.
However, when we did some stakeholder consultation, it was revealed that a lot of the publishers were actually quite keen on well-managed subject repositories, but few were actually specifying them in their journal policies. We then looked at when they were asking for deposit. This is again quite inconsistent across the policies.
23% of the policies we looked at asked for the data to be made available to peer reviewers, but not necessarily to readers after that point. 51% mentioned actually depositing alongside the article,
and with some of these percentages, a policy might be ticking several of the boxes: it might ask for data to be available to reviewers and also to be deposited later and made available, so it's more interesting. At least one journal did allow the inclusion of an institutional website URL as an end note to its articles, as long as there was a statement that said the data hadn't been peer reviewed
and may be updated. It did allow for that tying in of the background data to the article. Regarding sanctions, very few, only 22 of the policies we found, made any indication that if you didn't deposit the data, you might not be published.
So we then looked at consulting stakeholders, and these were really across the board: scholarly publishers, research funders, research administrators, repository staff, library staff, and the researchers themselves. We wanted to look at how they currently share data. Do they agree with the idea?
Do they have any concerns about sharing data? Would they use a service that listed journal data policies? And for those that are around, would they be interested in assisting with its upkeep? So we conducted 23 in-depth interviews, mainly with publishers and library and support staff.
We also had a focus group of researchers and a workshop with publishers. Then we did an online survey that was again directed at researchers. Across this, we found a complex situation, with different stakeholder groups making assumptions about each other's views and actions. However, the majority did support making data open,
and listed quite a few benefits of doing so: for example, preserving data for the future, promoting knowledge, reducing fraudulent claims, and enabling the data to be scrutinised by the community. However, there were some concerns, barriers and caveats. The researchers were concerned about who would own the copyright to the data.
Would the data be available in a form that made it valuable to share? A spreadsheet of numbers might not make any sense to another researcher. Does it need another layer or some sort of basic analysis before it can be shared? And some researchers, particularly early career researchers, were concerned that making the data available before they submitted their PhD
could mean their PhD was worthless. So, to give three of the main groups and some of their comments. Researchers: they thought a journal policy bank would be quite valuable,
because it would allow them to assess whether a particular journal policy fits their form of data, their data sharing ethos or the requirements of their funders. And it could be a point of reference for accessing other researchers' data. The librarians and the repository staff: those with a background in librarianship really did not have much knowledge about curating data,
but they had similar experience with curating journal and monograph collections and thought this knowledge could be transferred. However, in spite of this potential, there wasn't much happening. In the UK, since that stakeholder consultation, the same JISC programme has
launched several research data management programmes at the various institutions taking part. So that picture may be changing. However, the librarians did think that a policy bank would be quite valuable. It would enable them to support and develop research data management at their institution and would help them gain information and provide publication guidance to the researchers
that were interested. Now, publishers: obviously we wanted to look at what they thought. They thought that the audience for JoRD was a little bit unclear. Was it researchers? Was it the publishers? Was it the librarians? However, they thought that an accessible list of information on data policies could
be useful for the funders, repository staff and authors themselves, but especially for researchers, to ensure compliance with funders' and institutional demands. So, some summaries of the stakeholder consultation. All of the stakeholders recognised the importance of linking between journal content and underlying
data, particularly where data is stored in subject-based repositories. There was consensus about the importance of making data freely available. However, there was a less unified approach to actually doing so in practice.
So, some of the common features that came out of the stakeholder consultation of what should be in a JoRD service. There's quite a wide-ranging set of specifications and requirements, and if you listed them all together, it would be quite hard to satisfy everyone fully. However, these are the five common features that came out. We wanted clear, automated and simple instructions on the service.
Clear documentation on the service's aims, its policies and procedures. For the journal policies, we wanted to know what the conditions of the policy were: whether the data could be reused, how to access it and any restrictions on it. We wanted guidelines on recommended file and data types and metadata,
on policy wording, that is, how to write the policies, and we wanted to know where the data could be archived. Almost 80% of the respondents to our online survey, which targeted researchers, answered that they would use such a centralised service that records the data sharing policies of
academic journals. So there certainly was interest in the service. But the big question is, can it be self-sustaining? So, based on the stakeholder consultation, my colleagues developed three potential basic services, which were then market tested by asking the stakeholders
which they would be most interested in. The first one suggested was a very basic service. It would have a minimal web interface and, excuse the acronym, an API, an application programming interface, which would allow machine-to-machine interaction with the database. But it wouldn't be much more than that. The second was an enhanced service,
which would be the same as the basic, but with additional data integration. So it would link to compliance with funder policies, possibly institutional policies, and it might list recommended repositories for deposit. Lastly, there was an advisory service, the same as the enhanced,
but on top of this, it would offer more advisory guidance: best practice for writing policies, policy frameworks, or policy language suggestions. In general, the stakeholders preferred either of the first two. However, when it came to speaking to budget holders,
although they were quite positive about the idea, those on the research management side were less keen on providing the funding. They didn't think they could persuade their organisation that there was sufficient benefit to justify the funding. Conversely, the publishers were quite keen on funding, but they wanted a lot more in the
service, which would possibly make it impractical to start off. However, based on these three service options and the stakeholder consultations, a full business case was submitted to JISC as part of the feasibility study.
A quick summary of the findings of the project. Regarding data sharing, it was felt that this was quite an interesting subject. It was certainly a growing area. There are publishers developing data-only journals, and, as a rough guide from the previous studies of McCain in 1995 and Piwowar and Chapman in 2008,
bearing in mind the different population sizes, there did appear to be an increased number of policies each time people looked. So it's certainly an increasing area. However, when it comes to actually sharing information,
there's a lot slower uptake. Researchers were perhaps more likely to share with those close to them, but not necessarily with colleagues on the other side of the country or the other side of the world. Again, the reasons are similar to the hypothetical PhDs I mentioned before: the concern that other people would trump them in publications.
As for the policies that did exist, there's possibly a slight increase in this area, but they're still generally poor, not very clear, and missing in some subject areas. There's general support for a JoRD service,
but the requirements differ between the research and publishing communities. So although there are some common features, there could be some issues about how to go about it. However, data is an increasing area, so a JoRD service could benefit the future in this
area, and it would be better to build now, while the number of policies is smaller, than to continue the discussion. So JoRD recommended to JISC a two-phase study, or two-phase procedure, to go ahead. Phase one would be grant funded,
and it would build a simple service focusing on getting a good data set of the policies, with simple technology. That would then be used to build engagement with stakeholders, build awareness and establish a need for the service, and there could still be that machine-to-machine interface,
with third parties creating applications on top of it, and also to further develop a self-sustaining model. Phase two would implement the self-sustaining model, and there might be a need for some additional funding before breaking even, but there could also be opportunities for grant funding for research and development activities.
So, some final thoughts on what we found for the JoRD service. The user base for a JoRD policy bank would probably mainly be people in services that work within and
support the research community. A lesser number of users would be publishers and funding bodies, though their representatives acknowledged some use for a collation of journal data policies. Such a service could provide easy access to journal data policies, provide clarity on when, where and what to deposit, provide guidance on file and metadata
formats, and help librarians and support staff to enable researchers. As I mentioned, there's currently a small number of policies available; we're talking in the hundreds, if we take into account previous studies. So building a JoRD-type service would be much
simpler and more likely to be built sturdily if done now. It will enable the introduction of good practice now, before the policy numbers increase dramatically and no one has an idea what to do with them. So at the moment we're awaiting a decision from JISC on how to take the JoRD concept further as they consider our feasibility study. So my recommendation to you is get
involved in research data. If your institution has a research data management plan, get involved. If it hasn't, encourage the powers that be. It's a good idea. So there are a few references there in short. As I said, I can provide them in full if required. So, any questions? That was a really interesting presentation.
And having done a very small desktop survey of journal data policies, I applaud the rigour of the work that was done by JoRD and recognise it was no small task.