
Characteristics of a Well-Developed Grey Literature Repository: The Case of the International Nuclear Information System

Formal Metadata

Title
Characteristics of a Well-Developed Grey Literature Repository: The Case of the International Nuclear Information System
Title of Series
Number of Parts
30
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Producer

Content Metadata

Subject Area
Genre
Abstract
The number of national, institutional, and subject repositories of grey literature has increased dramatically over recent years. The Directory of Open Access Repositories (OpenDOAR) currently lists 5848 repositories, a 75% increase over the last five years. Most of these repositories hold grey literature of one type or another. The degree of development of these repositories is mixed, some are of questionable quality while others are exemplars, so it is useful to define what constitutes a well-developed repository. A well-developed repository can be seen as one that meets the needs of end users, as well as the interests of authors and sponsoring organizations. Characteristics, such as timeliness, openness, user-friendliness, accuracy, and completeness, are proposed as those which meet user and institutional needs and define the degree of development for a given repository. Timeliness refers to the speed at which materials in the scope of the repository are made available to the public. Openness is the degree to which material is accessible as well as shareable. User-friendliness is a subjective quality but is defined by the ease of use of the repository’s user interface. Accuracy can be measured in many ways – the verisimilitude of metadata, the suitability of indexes and search results, the percentage of dead links to external resources such as full text, and other measures. Finally, completeness describes how well a repository encompasses its scope. The International Nuclear Information System (INIS) has been in operation since 1970 as a repository for grey and traditional literature in all areas of nuclear science and technology. It existed before the wide adoption of information management principles, and invented methods and workflows to fulfil its mission. Therefore, there are gaps between the ideal repository, embodied in the outlined characteristics, and the repository as it currently stands. 
These gaps are identified and solutions, as well as a plan for implementation, are proposed.
Keywords
Computer animation
Transcript: English (auto-generated)
Hello, I'm Brian Bales, the coordinator of the International Nuclear Information System at the IAEA, and I would like to talk about the characteristics of a well-developed grey literature repository, and specifically about the case of the International Nuclear Information System.
So there's been a great deal of change in the information landscape and in scientific publishing, especially in the area of openness. If we look at OpenDOAR, the Directory of Open Access Repositories, 15 years ago there were only 78 listed; now there are over 6,000. And there are so many sources of freely available
open information in science, including Crossref, the Directory of Open Access Journals, arXiv, which has preprints, CORE, and one of my favorites, PubMed Central. And so we are paying to free and open materials that would otherwise sit behind a paywall.
And in this changing information landscape, I've been thinking about the International Nuclear Information System. It has its roots in the very founding of the International Atomic Energy Agency, where the third purpose of the agency was to foster the exchange of scientific and
technical information. Now, if we fast forward to this year, INIS is authorized to provide member states and other users with relevant, reliable, and up-to-date information in the area of nuclear science and technology.
So INIS was founded by 23 countries and two international organizations, and has since grown to 132 countries and 17 international organizations. It continues to operate in the same way: the basic idea is that member states and organizations will send us their nuclear
science and technology literature, including grey literature, and grey literature makes up a lot of what makes INIS special and important in the world. So INIS originally existed in print and microfiche form.
In fact, the first computer, as we can see here, was bought by the agency for the purpose of INIS. Here we can see microfiche being created. Microfiche is where the references were held. The so-called Atomindex file was printed, and you would search in the Atomindex book
and then find the appropriate microfiche. INIS quickly, though, became a computer-searchable database and was at the forefront of computer science. And if you think about all of the equipment and all of the time that was put into the
production of microfiche, the purchase of a computer, the great investment in information management, we are the inheritors of that today and the custodians bringing it into the future.
So INIS continued to develop and eventually created a web-enabled interface, INIS Repository Search, and this is, to me, quite user-friendly. It allows you a simple search or an advanced search. In fact, here I've done a search on grey literature, and we see 8,000 references to grey literature
in our repository already. The first one is a reference to Kiyoshi Ikeda. Then we have a full text from Dobrica Savić. And we also have a DOI reference to a paper by Joachim Schöpfel.
So the growth in users in recent years has been quite remarkable. In fact, when INIS became an open and Google-searchable repository, it experienced a great jump in the number of users. In the 10 years from 2011 to 2021, it experienced an 11,241 percent
increase. So it's been a remarkable journey in the number of users and the usefulness of INIS. I would say it's gone from being a specialized repository that a few experts knew about
to a general repository for the general public around the world. And so by most measures, INIS has been a great success. It has over 4.5 million records. Over 2 million of these lead to full text.
About 617,000 full texts are hosted by us locally. But then there are some ways in which it could improve. And I think that by looking at other repositories, we can model ways in which it could improve. For example, INIS harvests sporadically.
In other words, we go out and find material in our scope. But we do this when we have time or when it's requested by the repository owner. Manual operations that have been developed over the 50 years of INIS have slowed the ingest of materials.
In fact, lag time (I put an asterisk there to explain it) means that a piece of literature comes out, and then how many days, months, or years is it until that piece of literature appears
in INIS? Well, the lag time is sometimes years. The corpus that's created in our scope, I estimate at approximately 250,000 pieces of literature per year. But the corpus that is ingested is approximately 125,000 per year, so about half.
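The two measures just described, lag time and the share of in-scope literature that is actually ingested, can be sketched in a few lines of Python. This is only an illustration: the function names and the example dates are hypothetical, and the yearly figures are the speaker's own estimates from the talk.

```python
from datetime import date


def lag_days(published: date, ingested: date) -> int:
    """Days between a document's publication and its appearance in the repository."""
    return (ingested - published).days


def coverage_ratio(ingested_per_year: int, in_scope_per_year: int) -> float:
    """Share of the in-scope literature that is ingested each year."""
    return ingested_per_year / in_scope_per_year


# Estimates quoted in the talk (not measured values).
IN_SCOPE_PER_YEAR = 250_000
INGESTED_PER_YEAR = 125_000

ratio = coverage_ratio(INGESTED_PER_YEAR, IN_SCOPE_PER_YEAR)
print(f"coverage: {ratio:.0%}")

# Hypothetical record: published in March 2019, ingested two years later.
example_lag = lag_days(date(2019, 3, 1), date(2021, 3, 1))
print(f"lag: {example_lag} days")
```

Tracking both numbers per record type (for example grey versus non-grey literature) would show where the backlog actually sits.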
And that includes both non-grey and grey literature, conventional and non-conventional. So as we think about redesigning INIS or improving INIS in some ways, honoring what's gone on in the past and the success that it's had in the past, but also seeing
what's out there, incorporating what we can, and preparing for the future, I thought to look at a couple of repositories. And I've looked at many more than this. But here are just a couple of examples that kind of embody what I'm talking about.
If we look at the Astrophysics Data System, which is obviously in the scope of astrophysics, it has kind of a similar simple interface, but it also has examples on how to do more advanced searches.
In fact, if we look at some of the facts of the Astrophysics Data System, it has over 13.3 million references, although it's only existed for about 20 years, whereas INIS has existed for 50 years. It harvests automatically with a daily frequency. Records always lead to a full text, and it's less concerned with accuracy, and it
invites user correction, so it has somewhat outsourced the QA to users. If users see a problem with a record, they can click a button and suggest an improvement to it. It has an API for automatic harvesting of itself, and it's done a special project
where it's gone back and comprehensively harvested from the historic coverage of core journals, and as I showed you, it has an advanced and very specific search available.
Another example is InspireHEP from CERN. It's very similar, isn't it? I mean, it has a simple search at the top, and then gives you examples on how to do advanced searches below through kind of a full text, a free text search.
The workflow of InspireHEP is quite interesting, and it's perhaps something that INIS could emulate. On the left side, we see the automated workflow, where periodically, daily, as I said before, a crawler goes to arXiv, to Proceedings of Science, to other
publishers, and extracts records. They are then sent to a literature workflow where keywords are extracted, where it is determined whether a record is in scope or not, and where references are extracted. These are then sent to a curator who accepts or declines the submission.
Now, I've heard from people that in certain cases and for certain publications, it's completely automated. So, if they know that a journal is going to have in-scope records where the keywords are well-developed, then they will go automatically into the
repository. A second workflow is that literature is submitted by authors or other people into the author workflow, and these are either accepted or declined depending on whether they're of significance, whether they're in scope, and whether the
metadata is well-developed, and then they are accepted into the repository. So, these are a couple of ideas that perhaps we could bring into INIS. So, InspireHEP is only one of several repositories that are run by CERN. You have the CERN Document Server. Zenodo came from CERN.
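The automated workflow described above (extract keywords, check scope, then either accept automatically for trusted sources or hand over to a curator) could be sketched roughly as follows. This is not InspireHEP's actual code: the `Record` class, the scope terms, the trusted-source list, and the `curator` callback are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    title: str
    source: str
    keywords: list[str] = field(default_factory=list)


# Hypothetical configuration: terms that mark a record as in scope,
# and sources whose records are trusted enough to skip the curator.
SCOPE_TERMS = {"collider", "neutrino"}
TRUSTED_SOURCES = {"trusted-journal"}


def extract_keywords(record: Record) -> Record:
    """Very naive keyword extraction: scope terms found in the title."""
    words = {w.strip(".,").lower() for w in record.title.split()}
    record.keywords = sorted(words & SCOPE_TERMS)
    return record


def in_scope(record: Record) -> bool:
    """A record is in scope if at least one scope keyword was found."""
    return bool(record.keywords)


def process(record: Record, curator: Callable[[Record], bool]) -> str:
    """Run one harvested record through the sketch of the workflow."""
    record = extract_keywords(record)
    if not in_scope(record):
        return "declined"
    if record.source in TRUSTED_SOURCES:
        return "accepted (automatic)"
    return "accepted" if curator(record) else "declined"


# The curator is modelled as a function; here it accepts everything.
print(process(Record("A new collider design", "other-publisher"), lambda r: True))
```

Modelling the curator as a callback keeps the human decision as the default path, with the fully automatic path as the exception for known sources.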
Anyway, right now, it has over 1.5 million references. As I said, it harvests automatically with daily frequency, and like the other one, they are less concerned with accuracy. They invite user correction and even user submission. It has an API for open extraction of its own content, and it has
an advanced and very specific search available. So, having looked at some repositories that I admire, I also thought to look at some standards, and there are definitely some standards for repositories.
There's FAIRsFAIR, OpenAIRE, Plan S, CoreTrustSeal, and others. But one thing in common is that all of the standards encourage openness. And let's look at each one of these in a little bit of detail. So, starting with FAIRsFAIR, this says that science, open
science, should be findable, accessible, interoperable, and reusable. And it has detailed recommendations on how to achieve these aims. Also, it has a collaboration with CoreTrustSeal that combines these two into a capability maturity model.
OpenAIRE has a detailed standard that defines the recommended data fields that are used in automated exchange between repositories. So, that's an interesting one as well.
With Plan S, several science funders have gotten together and have defined how the science that they fund should be open access, and that includes open access repositories. It recommends things like permanent IDs for deposited
publications and authors, and the use of JATS XML, which is a standard for data exchange, as well as an open API for exchange between repositories.
The CoreTrustSeal Trustworthy Data Repositories Requirements give you 16 areas that a repository should consider and define to be able to receive that certification, such as preservation, security, data reuse, licensing, et cetera. So, having looked at the successful repositories, as well as
the standards that are encouraged out there, these five characteristics seem to be those that are shared by the successful repositories and that are also compliant with the standards. And if we look at these, they are timeliness, openness,
preservation, user-friendliness, and comprehensiveness. And just by chance, this spells topic. So, if we cover a topic well, then we will have these characteristics well in hand. So, let's look at these individually.
So, the definition of timeliness is being done at a favorable or useful time. And a publication in science is most useful the closer it is to the idea having been formulated or the study having been done or the report
having been written. Each day that goes by, each month or year that goes by, the research becomes less and less valuable. So, if we look at the repositories I mentioned, each of these harvests on a daily basis, and as soon as
something is published on the repositories that it's monitoring, it will appear in their repository. So, I would set a goal for INIS that our journal articles and the grey literature that is sent to us be input within one week of publication or
one week of being turned in to us. Now, in grey literature, we're meeting this. But in non-grey literature, we're not. Often, we're years behind. So, this is something that we need to work on. Openness. Openness is a characteristic of most of the successful
repositories I've looked at. And one question is how does openness benefit a repository? Why should a repository be open? Well, if you think about the mission of sponsoring organizations such as the IAEA, does the IAEA want knowledge, information, science to be
the exclusive purview of wealthy and well-developed countries? Or does it want to level the playing field, basically, and for science to be available for everyone in the world? Not just for those from wealthy countries, but for those from less developed countries.
That should be the goal of most organizations. Most public organizations have this goal. And so, openness encourages this. Preservation. If we think about the mission of a repository, preservation ensures that ingested materials will
continue to be accessible and with their integrity intact. And it's all about the appropriate level of care when we're considering preservation. And I'm coming from an archival background, and there are definitely best practices in preservation and a preservation maturity model which says that
you should do periodic checksums, you should have redundant storage in multiple locations, and protections against malicious or accidental deletion. And I could share that standard with you if you would like. But that's something that we in INIS need to adopt.
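The periodic checksum practice mentioned above is often called a fixity check: store a checksum for each file at ingest, then recompute it on a schedule and flag any mismatch. A minimal sketch using Python's standard `hashlib` follows; the manifest layout (file name mapped to SHA-256 hex digest) and the function names are assumptions made for illustration, not a description of how INIS or any particular repository stores its checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the names of files that are missing or whose checksum changed.

    `manifest` maps a file name (relative to `root`) to the SHA-256 hex
    digest recorded when the file was ingested.
    """
    failures = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(name)
    return failures
```

Run against each redundant storage location in turn, a check like this also tells you which copy to restore from when one of them fails.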
Additionally, user-friendliness. Now, with user-friendliness, it should be obvious that we want a simple and understandable design that brings users back. Overcomplication was the old web. Cluttered web pages were the old web. Nowadays, people are expecting a Google-like interface, to be able to simply type in a search
term, hit enter, and get results. People are also expecting an advanced search for those who are advanced users, scientific users, alongside that simple interface that I talked about for the great majority of users. Comprehensiveness means that the site covers its
scope as completely as possible. We probably aren't going to get everything, but we should get as close to comprehensiveness as we can, and comprehensiveness should be the goal. And this means that our site would be a one-stop shop for everything that users need in nuclear
science and technology. And this means not only the most recent records, which is a great goal, as I said when I talked about timeliness, but we should also go back in time when we can. The most recent records should be the priority, but those going back into the past aren't worthless. They're valuable as well.
So INIS could adopt the best practices of the best repositories in the world. It could take from the standards that have been given. It could improve in timeliness, openness, preservation, user-friendliness, and comprehensiveness. And perhaps more attributes could be found.
If you have any to suggest, please let me know. And furthermore, a maturity model in these areas could be developed. That could be the subject of a future paper. And thank you very much for your attention.