Grey Literature and Persistent Identifiers: GreyNet’s Use Case

Formal Metadata

Title
Grey Literature and Persistent Identifiers: GreyNet’s Use Case
Title of Series
Number of Parts
17
Author
License
CC Attribution - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Computer animation
Transcript: English(auto-generated)
Hello, my name is Dominic Farace. I'm from GreyNet International. And the title of this presentation is Grey Literature and Persistent Identifiers: GreyNet's Use Case. It's my privilege to present on behalf of the other co-authors,
Stefania Biagioni and Carlo Carlesi from ISTI-CNR in Pisa, and Chris Baars from DANS Data Archive in the Netherlands. I'd like to begin with a brief overview. GreyNet's PID project is a follow-up to the AccessGrey project carried out in 2019,
in which an online survey was held among stakeholders in the field of grey literature. Recipients in GreyNet's community of practice were asked their opinions about persistent identifiers, in particular the DOI.
The focus of this PID project has expanded to include the DOI for research outputs, along with the ORCID for authors and researchers, and the ROR ID for research organizations. This project seeks to go beyond a straightforward compilation and linking of these PIDs
by building a PID graph and contributing to other PID graphs built by service providers such as DataCite and OpenAIRE.
In this use case, the PID graph seeks to demonstrate how persistent identifiers can further research in the field of grey literature, and how they can contribute to making research entities conform to the FAIR data principles. PIDs and the PID graph are also seen to serve in the digital transformation of grey literature,
and as such will contribute to education and training in this field of information. It is intended that this project will not only serve as a use case for GreyNet,
but will also provide a model for other communities of practice in grey literature. In 2019, an online survey was carried out among GreyNet's community of practice
in order to gain their opinions on the uses and applications of persistent identifiers. Results from this online survey within the AccessGrey project clearly indicated a positive opinion about persistent identifiers for grey literature.
The results for 8 of the 10 survey questions were significantly positive regarding persistent identifiers. The results indicated that persistent identifiers increase access to grey literature,
increase the citation of grey literature, allow for the preservation of grey literature, and are vital in linking and cross-linking data. However, there was less agreement on the results for 2 of the 10 questions. The results did not indicate that the minting of DOIs for research outputs
would be a sufficient incentive for their acquisition in a repository, namely one that is reliant on self-archiving. And the results proved uncertain as to whether a DOI is a quality indicator
that increases the value of grey literature. Now let's look at some key characteristics of persistent identifiers and how they apply to GreyNet's PID project.
A persistent identifier is a permanent reference and unique label to an object that is independent of the storage location. The unique label ensures that the object can always be found, even if the name of the object or the repository changes.
As a result, an object can always be found unambiguously on the basis of its PID. This is important for the long-term storage and archiving of objects in a rapidly changing world. In short, PIDs provide the address of an object such as a landing page in a repository.
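The idea that a PID stays fixed while the storage location changes can be pictured with a toy resolver. This is only a minimal sketch: the DOI, the URLs, and the function names `resolve` and `move_object` are made-up placeholders, not GreyNet records or any real resolver's API.

```python
# Hypothetical resolver table: PID -> current landing page.
# Both the DOI and the URLs are invented placeholders for illustration.
resolver = {
    "https://doi.org/10.0000/example.1": "https://old-repository.example.org/papers/1",
}


def resolve(pid: str) -> str:
    """Return the location currently registered for a PID."""
    return resolver[pid]


def move_object(pid: str, new_location: str) -> None:
    """Register a new location; the PID itself is never changed."""
    resolver[pid] = new_location


pid = "https://doi.org/10.0000/example.1"
before = resolve(pid)
move_object(pid, "https://new-repository.example.org/items/1")
after = resolve(pid)
```

Anyone who cited the PID before the move still reaches the object afterwards, because only the registered location was updated.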
PIDs can be used to link objects and in so doing connect other associated metadata in a record. PIDs unambiguously identify objects even if they move to other systems and services,
and they are computer-readable, demonstrating their interconnectedness with other research communities. The value of a PID is that it not only provides a link to a digital object,
be it a person, publication or organization, but also allows the metadata associated with the digital object to become connected. When that metadata itself is expressed as a PID,
this further allows for the creation of a PID graph that models the FAIR data principles: findable, accessible, interoperable and reusable. These four principles will be discussed further in this presentation
following the introduction of GreyNet's network service in which they are applied. In order to obtain optimal use of persistent identifiers, a sustained data infrastructure must be in place within a community of practice,
one that facilitates a coherent data workflow. It is only then that a PID graph can be constructed and implemented. In this section, we look at the various components in GreyNet's data infrastructure and then discuss how they are implemented within its workflow.
It is important to mention here that GreyNet's workflow, as it applies to this project, includes retrospective input. GreyNet's PID project relies upon a data infrastructure that is in place and integrated.
The four fundaments of this project are a registered open access repository and the three digital persistent identifiers it incorporates, namely the DOI assigned to conference papers,
the ORCID assigned to the authors and researchers, and the ROR ID assigned to the research organizations. Let's have a brief look at each of the four fundaments in this project, beginning with the GreyGuide repository.
GreyNet International collaborated with ISTI-CNR to construct the GreyGuide web access repository, which would come to house its collections of accepted conference proposals, published conference papers, and author-researcher biographical records.
To this end, open source software was identified and incorporated, metadata templates were created to fit the three document types, and in 2017, these collections became fully accessible online.
The GreyGuide repository has since been registered in OpenDOAR, the Directory of Open Access Repositories. The remaining three fundaments are the digital persistent identifiers incorporated in the project.
In 2018, GreyNet became a DOI minting service within DataCite and began assigning DOIs to its collections of conference papers in the GreyGuide repository. Since it is this collection upon which the PID project is based,
the ORCIDs and the ROR IDs had to be included in the DOI records in order to later construct the PID graph and be part of the PID graphs of DataCite Commons and OpenAIRE.
Later in that year, ORCIDs were included in biographical records in the GreyGuide, and an active campaign among GreyNet's authors and researchers was initiated, encouraging them to register for an ORCID iD if they did not already have one.
In order to facilitate this, a link to the ORCID registry was provided, as well as a link to GreyNet's bio collection displaying the ORCID logo alongside those records in which the PID had already been assigned.
In 2020, the ROR ID for research organizations was added as a metadata field to bio records in the GreyGuide. By way of a search in the ROR registry,
ROR IDs of organizations could be accessed online and included in the records of those authors and researchers whose conference papers are archived in the GLB collection, as well as their corresponding DOI records in DataCite.
GreyNet has since applied for a ROR ID and was assigned one in the spring of this year. A ROR ID, unlike an ORCID, is not assigned separately, but rather in interval batch releases,
that is, when new records have been approved. Perhaps we might now consider that when the ROR ID of an organization is linked with a research output assigned a DOI,
and when the author-researcher has an ORCID, these three PIDs taken together can be seen as a quality indicator for a conference paper or other grey literature document type.
Let's look at the shared data workflow implemented in the PID project. Our project team was formed, bringing together the human resources and expertise needed, namely the system management and development of the GreyGuide repository,
the communication network management of the GreyNet community, and the acquired knowledge and expertise of the PID graph. From early January through the first week of March of this year, GreyNet undertook three tasks integral to the PID project.
First, to complete minting DOIs for its collection of conference papers in the GL series, including those published this year. The collection now totals 443 conference papers with DOIs. Our other service providers,
namely DANS EASY for GreyNet's published datasets and the TIB AV-Portal for its conference video presentations, also assign DOIs in DataCite. However, these were not included in the population of this project.
The second task, which ran parallel with the first, was the retrospective search and retrieval of ORCIDs and ROR IDs that were added to both the DOI metadata records and their respective bio records in the GreyGuide repository.
The retrospective task also included the input of biographical records on behalf of authors and researchers whose names appear on the conference papers, but who had not yet submitted a bio record.
This was accomplished in part by retrieving biographical notes from previous conferences in the GL series, preserved in GreyNet's in-house archive, and partly via Google searches. A third ongoing task with regard to the data workflow
dealt with records that needed some modification in order to benefit the PID project, such as when an existing ORCID in a record carries 16 digits
but is not preceded by the lead text https://orcid.org/. If it doesn't have the lead text, it is not an actionable PID. Or, when an ORCID is retrieved only to find the message "No public information available."
This makes it difficult, if not impossible, to confirm the identity of the author or researcher. And also, when the author or researcher's organization is absent or unclear in a record,
it becomes difficult or impossible to assign a ROR ID using the ROR registry. While these and other such problems were few in number, the time required to correct them was disproportionate.
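The first of these corrections, adding the missing lead text to a bare 16-character ORCID, lends itself to a small script. The following is a minimal sketch and not the tooling the project team used: the function names are hypothetical, and the check-digit step follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. The sample value is the fictitious iD used in ORCID's own documentation.

```python
import re

ORCID_PREFIX = "https://orcid.org/"


def orcid_check_digit(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def to_actionable_orcid(raw: str) -> str:
    """Turn a bare or partially formatted ORCID into its full https form."""
    # Drop any existing lead text, then hyphens and whitespace.
    compact = re.sub(r"^(https?://)?(www\.)?orcid\.org/", "", raw.strip(),
                     flags=re.IGNORECASE)
    compact = re.sub(r"[\s-]", "", compact).upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        raise ValueError(f"not a 16-character ORCID: {raw!r}")
    if orcid_check_digit(compact[:15]) != compact[15]:
        raise ValueError(f"check digit mismatch: {raw!r}")
    grouped = "-".join(compact[i:i + 4] for i in range(0, 16, 4))
    return ORCID_PREFIX + grouped


# Sample iD from ORCID's documentation, entered without the lead text.
fixed = to_actionable_orcid("0000-0002-1825-0097")
```

A script like this only repairs the format. It cannot help with the other two problems mentioned, a record with no public information or an unclear organization, which still need manual checking.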
Nevertheless, when a system and a service rely on self-archiving, and when a persistent identifier, such as an ORCID, can only be acquired by the author or researcher him- or herself,
then these tasks must be factored into the workflow. Now that the complete collection of conference papers in the GL series has an assigned DOI in DataCite, one that incorporates their corresponding ORCIDs
and ROR IDs, the number of actionable persistent identifiers for our project is accounted for, which then allows for the construction of the PID graph. The total actionable PIDs in our project number 769,
443 of which are DOIs assigned to GreyNet's conference papers, 146 ORCIDs, indicating that 61% of GreyNet's bio records include an ORCID, and 180 ROR IDs,
that is, 75% of the records in GreyNet's bio collection include a ROR ID. It may be well to note that these amounts were recorded in mid-March of this year.
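The figures just quoted can be checked with a few lines of arithmetic. Note that the size of the bio collection is not stated in the talk; the value of 240 records below is an assumption inferred from the two percentages, used here only to show that the numbers are mutually consistent.

```python
# Counts stated in the talk (recorded mid-March).
dois, orcids, ror_ids = 443, 146, 180
total_pids = dois + orcids + ror_ids

# ASSUMPTION: the number of bio records is not given in the talk.
# 180 / 0.75 = 240 and 146 / 0.61 is about 239, so 240 is used here.
assumed_bio_records = 240

orcid_share = round(100 * orcids / assumed_bio_records)   # percent with an ORCID
ror_share = round(100 * ror_ids / assumed_bio_records)    # percent with a ROR ID

print(total_pids, orcid_share, ror_share)
```

Under that assumption the three counts sum to the stated total, and both shares round to the percentages given.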
When looking at the implementation of the PID graph, an article published in early January of this year attracted GreyNet's attention and drew it to the benefits of connecting the various types of persistent identifiers
in producing a PID graph. For our project, this includes, as we have already mentioned, the DOI, the ORCID, and the ROR IDs. It is expected that this PID infrastructure
would further demonstrate the value of persistent identifiers, as well as open the potential for more research, in our case, research in the field of grey literature. To construct the PID graph, two elements are required.
First, backend services that collect PID connections in a standardized way, focusing on two PIDs that are connected. This is essentially building the elements of the graph. Second, query interfaces that combine these connections
with PID metadata. A technology that is highly suitable for this is GraphQL, which is an open-source data query and manipulation language for APIs,
and a runtime for fulfilling queries with existing data. This widely adopted query language provides a standardized interface that can be federated, making it easier to build client applications
for the PID graph. Applications built on top of the PID graph allow users to explore the rich connections between PIDs and to address specific use cases. The PID graph demonstrates that
we can gain more from PIDs when we look at their connections, indicating that the whole is more than the sum of its parts. Let's now have a look at four PID graphs
created from GreyNet's trio of persistent identifiers. Each graph comprises multiple resources, referred to as nodes, that are connected by lines, referred to as edges. In this first diagram, the PID graph appears in a horizontal format
from right to left and depicts from a DOI perspective three of GreyNet's publications in blue, connected with their authors in brown and their respective organizations in sand color. Notice that the author depicted in pink
does not yet have an ORCID. In this second diagram, the PID graph appears in a cluster format and depicts from a DOI perspective the same three publications as in the first diagram. However, now they are connected
with the inclusion of the author's names and their respective organizations, that is, more metadata. In general, when depicting a PID graph, the cluster format takes preference over the horizontal format.
Now, in this third diagram, the PID graph depicts from an ORCID perspective an author and his respective organization linked to seven of his publications. One of the publications is further shown linked to three co-authors, of whom only one organization appears as shown.
In this fourth diagram, from a ROR ID perspective, a research organization is encircled and linked to a cluster of publications that is further encircled
and linked to a number of their authors. Three of the authors are further shown linked to their respective organizations. In the PID graph, persistent identifiers are themselves the basic entities that are linked together.
Whatever they refer to is left implicit. The approach, however, requires that the PID metadata are sufficiently rich to represent the relationships of interest and that the PIDs are of high enough quality.
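The notion of PIDs as nodes joined by pairwise connections, viewed from a DOI or an ORCID perspective, can be sketched in a few lines. This is an illustrative toy and not how DataCite Commons or OpenAIRE build their graphs: the identifiers are shortened placeholders rather than real PIDs, and `build_graph` and `neighbours_within` are hypothetical helper names.

```python
from collections import defaultdict

# Each edge is a pair of connected PIDs (placeholder values, not real records).
edges = [
    ("doi:10.0000/paper-1", "orcid:author-a"),
    ("doi:10.0000/paper-1", "orcid:author-b"),
    ("doi:10.0000/paper-2", "orcid:author-a"),
    ("orcid:author-a", "ror:org-x"),
    ("orcid:author-b", "ror:org-y"),
]


def build_graph(pairs):
    """Store every PID-to-PID connection in both directions."""
    graph = defaultdict(set)
    for left, right in pairs:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def neighbours_within(graph, start, max_hops):
    """Breadth-first walk: map each reachable PID to its hop distance."""
    seen = {start: 0}
    frontier = [start]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for other in graph[node]:
                if other not in seen:
                    seen[other] = hop
                    next_frontier.append(other)
        frontier = next_frontier
    return seen


graph = build_graph(edges)
doi_view = neighbours_within(graph, "doi:10.0000/paper-1", 2)    # DOI perspective
orcid_view = neighbours_within(graph, "orcid:author-a", 1)       # ORCID perspective
```

Starting from a different node gives a different perspective on the same set of edges, which is all that distinguishes the four diagrams from one another.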
The advantage of this is that it becomes much easier to create graphs and to implement and scale rather than working with concepts and knowledge mining. When we now look at the PIDs
and the PID graph in relation to the FAIR data principles, we see that PIDs themselves allow for the guarantee of interconnected services, from minting to linking and on to access and preservation. When these services are situated
in the workflow of a mature community of practice, they create a FAIR research environment. PIDs contribute to making research entities conform to the FAIR data principles. By way of the PID graph, connections between different entities
within the research landscape allow researchers to access new information. PIDs also play a role in the reusability of data by enabling rich metadata and their provenance to be associated with a digital object.
PIDs provide the possibility to link entities long-term and enable the exchange of information identifying persons and organizations across different services. The overall PID infrastructure is made up of
PID service providers, repositories, curation systems, aggregators, indexes, metadata, and people. PIDs connect all of these elements, not only technically via metadata and integrations, but also socially via communities that have formed
over decades or longer. GreyNet International, now in its 29th year, can be considered a mature research community socially. By including PIDs for objects, projects, persons,
and organizations in the metadata, the technical maturity of GreyNet's infrastructure can now likewise be demonstrated. As a result of this, sustainable connections can be made. Objects, projects, persons,
and organizations become computer-readable and understandable by other services like DataCite and OpenAIRE. A PID graph can be created, and GreyNet's information can also become part of other PID graphs.
Other services like the OpenAIRE PID graph and DataCite Commons can be used to query for purposes of analytics and analysis, and this is a demonstration of the FAIR principles for grey literature.
To include and expand on the FAIR principles, PIDs and metadata ensure that the entities they refer to are usable and citable, pointing directly to an object such as a specific item
or a specific version of a dataset, hence increasing the usability of that object for researchers. It also helps them to formally cite research outputs such as data and resources,
which in turn facilitates reuse and helps increase recognition. It is accessible in that PIDs enable reliable measurement and prediction of impact, facilitating a more strategic approach to investment, driving maximum benefit,
and ensuring that valuable resources are sustained. And now, some conclusions drawn from our PID project. Research in the field of grey literature will likely increase due to the incorporation and use of persistent identifiers.
PIDs, like other rich metadata, can be counted and cross-tabulated, enabling researchers to examine relationships in and among diverse types of data. As such, PIDs are actionable and can be used for new
research. Furthermore, PIDs and the PID graph can be seen not only to serve research in grey literature but also to extend to new services in areas of education and training.
PIDs and the PID graph are shown to have real value in defining GreyNet's position as a mature research organization by sustaining and leveraging its resources, by adhering to the FAIR data principles, and by signaling
increased trust in grey literature beyond our own community of practice. While the minting of a DOI was not sufficient as a selling point in the earlier AccessGrey project for attracting content to a repository,
the DOI now linked to the ORCID and the ROR IDs, as illustrated by the PID graph, may prove more effective. Also, while the AccessGrey project laid the foundation and direction
for this PID project, it is our understanding that implementation of the PID graph will go even further to provide a new strategy and approach to research in the field of grey literature. I truly
appreciate the attention you have given to this presentation, and I invite your comments either via this online media platform or via info@greynet.org. Thank you again.