
Grey Literature in Open Repositories: New Insights and New Issues

Formal Metadata

Title
Grey Literature in Open Repositories: New Insights and New Issues
Title of Series
Number of Parts
17
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Computer animation
Meeting/Interview
Transcript: English(auto-generated)
Hello, my name is Joachim Schöpfel. I'm from France. I'm a researcher in information science at the University of Lille and a consultant in the field of scientific information.
I will now present research we conducted this year on grey literature in the field of open science, especially the place of grey literature in open repositories. We conducted this research with colleagues from Lille, from our research laboratory,
and from Rennes, from the University of Rennes. So, what's the meaning of grey literature? Researchers often don't speak about grey literature.
They just use it. When they do speak about grey literature, especially when they are conducting systematic reviews, they think about unpublished papers, papers which are not in databases,
which are not peer reviewed, and so on. On the other hand, and this is the more usual, traditional meaning of grey literature, when librarians and information professionals speak about grey literature, they do it in the context of acquisition policy and collection building.
And what they have in mind are documents that are difficult to find, difficult to get, which are published and disseminated
outside of the usual dissemination channels, often in small numbers, and which are most often outside of holdings, archives, libraries, and so on, but sometimes also inside, but just difficult to identify.
Repositories are the context of our study. Within a few years, repositories have become a central part of open science.
The international directory, OpenDOAR, counts nearly 6,000 different repositories all around the world. The European Open Science Monitor estimates that their content
represents about 25% of the published research, with large differences between countries and disciplines.
Our colleague Daniela Luzi used a nice expression 10 years ago: she said that open repositories, especially institutional repositories, could be a home for grey literature.
And this is not completely wrong. If we have a look at the content of open repositories, many repositories contain theses and dissertations,
or conference papers, workshop papers, reports, working papers, and other grey items. Together with our colleagues from Lille and from other universities
and research organizations in France and from other countries, especially in the context of GreyNet International, we conducted a couple of studies during the last 10 years,
especially in France. We found that nearly all open repositories contain some kind of grey literature; globally, on average, about one third of all deposits
is grey literature. The share of grey literature is increasing over the years. Its accessibility, its degree of openness, is higher than that of commercial academic publications like articles and books,
yet the accessibility of grey literature varies between different repositories and between different document types. Usually, theses, reports and working papers are more open, more available than conference papers. So last year, at last year's GreyNet International conference,
we presented empirical evidence on the place of grey documents in the HAL repository, which is the national central repository in France.
This year, we conducted a follow-up study, and I will present our results from this year
and make a comparison with the results from last year. The study assessed all the deposits of the 10 most important research universities in France,
which represent about 12% of the French universities, but nearly half of the French research performance, and about one third of the HAL deposits.
We assessed all deposits, all document types, and metadata. The sample consisted of more than 1,200 research laboratories
in all domains and disciplines, and of more than one million deposits (metadata and documents). So, the overall share of grey literature in this sample is 33%.
The white literature (articles, books, chapters, book reviews, and so on) makes up
two-thirds. And there are some data files, even if HAL is not a data repository, especially image files, video files, audio files, some code, some maps.
So one-third is grey literature, two-thirds white literature. Having a look at the different types of grey literature, most of it, more than two-thirds of the grey literature in this sample,
is conference papers: 70% of the grey deposits are conference papers.
Then come PhD theses, 11%; preprints and working papers, 8%; reports, all kinds of reports, 5%; posters, 4%; and other less important types like, for instance,
bachelor's and master's dissertations and courseware, 2%. A HAL deposit can be a document, full text,
together with its metadata, but it can also be just the metadata without the document. Maybe the document then is embargoed, but most often it's simply not there. There is just the metadata.
This means that only part of the deposits are freely, openly available in terms of access to the document itself. So comparing grey and white literature,
we see that grey literature on HAL is more accessible than white: 38% of the grey literature can be accessed on the level of the document, and only 28% of the books and journal articles are openly available on HAL.
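The two measures used so far, the share of grey literature among all deposits and the share of deposits that come with a document file, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the document type labels, the grey/white grouping, and the toy records are hypothetical and are not HAL's actual type codes or the data of the study.

```python
from collections import defaultdict

# Hypothetical grouping of document types (illustrative labels only,
# not HAL's real type vocabulary).
GREY_TYPES = {"conference_paper", "thesis", "preprint", "report", "poster"}
WHITE_TYPES = {"article", "book", "chapter", "book_review"}


def category(doc_type):
    """Map a document type to 'grey', 'white' or 'data'."""
    if doc_type in GREY_TYPES:
        return "grey"
    if doc_type in WHITE_TYPES:
        return "white"
    return "data"


def summarize(deposits):
    """Return {category: (share of all deposits, share with a document file)}.

    `deposits` is an iterable of (doc_type, has_file) pairs, where
    has_file is False for metadata-only records.
    """
    totals = defaultdict(int)
    with_file = defaultdict(int)
    for doc_type, has_file in deposits:
        cat = category(doc_type)
        totals[cat] += 1
        if has_file:
            with_file[cat] += 1
    n = sum(totals.values())
    return {cat: (totals[cat] / n, with_file[cat] / totals[cat]) for cat in totals}


# Made-up toy records, only to show the calculation.
sample = [
    ("article", False), ("article", True), ("book", False), ("chapter", False),
    ("conference_paper", False), ("thesis", True),
]
result = summarize(sample)
```

For each category, the first number is the share of all deposits and the second is the degree of openness, that is, the proportion of records with an accessible document file.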
Data, nearly all data, is openly available because when people want to deposit an image file or a map,
especially an image or video file, they must (this is mandatory) also deposit the document file, the image file, and not only create the metadata.
The same obligation exists for PhD theses. If you want to have a PhD thesis on HAL, you must deposit the document file,
not only create a metadata record. And the result is that nearly 100% of the PhD theses on HAL are openly available, the same level as data.
There's another category of grey document types of which more than half, 50% to 70%, is available. These are preprints, working papers, reports and other types,
unlike conference papers and posters, which have a low level of accessibility, only 20 to 30%. So you can see there are really three different levels:
PhD theses on the one hand, conference papers and posters on the other hand, and in the middle preprints, reports, working papers, master's dissertations and so on.
So we measured the degree of openness and the share of grey literature in four large scientific domains: biomedicine; science and technology; social science and humanities; and law, economics, business and so on. You can see the differences between these large domains on the left side.
On the one hand, there are differences regarding the overall number, which is the size of a bubble, but also regarding the degree of openness
and the share of grey literature below. More open access can be found in life and medical science and in science and technology, which are on the upper level of the figure, and more grey literature in science and technology and in
social science and humanities, which are on the right side. So you can see the special situation of science and technology,
with a high level of open access, but also a high level of grey literature, higher than the others, and above all the high number of documents in HAL.
Regarding the level of disciplines, we assessed the same variables for 10 scientific disciplines. More open access,
at the top: mathematics, computer science, life and medical science. More grey literature, on the right side: computer science, mathematics, civil engineering. So you can see that what we saw just before,
the specific situation of science and technology, is in fact largely conditioned by mathematics, computer science, a little bit by civil engineering and a little bit less by physics.
So let me make some comments. You will find the details in the paper, but here are some comments regarding the importance of grey literature.
The only grey document type that is really important is conference papers, especially in life and medical science, mathematics and computer science. On the other hand, other grey resources like reports, preprints and working papers
are much less important. Of course, they may contain unique and difficult content, but regarding their significance in the different disciplines and domains, the most important are conference papers.
Regarding openness, what can be said? The degree of openness of conference papers is higher in mathematics (close to 40%), humanities and computer science, much higher than, for instance, in physics or social science, where only 10 to 15% of the documents are available.
For reports, mathematics, computer science and civil engineering are the disciplines with the highest degree of openness: more than 80% of the reports in these disciplines are freely available.
For working papers and preprints, humanities, mathematics and computer science are the disciplines with the highest degree of openness: more than 70% of working papers and preprints in these disciplines are freely available.
We had a look at the existence of persistent identifiers. You know, the most important of them on the level of documents is the DOI, the Digital Object Identifier.
So we had a look and we assessed whether the documents had been deposited with a DOI or not. So overall, for all the deposits, the 1 million deposits we assessed,
41% have a DOI. But most of them, 9 out of 10, more than 90%, are white,
mainly articles, of course, journal articles, and less than 10% are grey. Here, these are mainly conference papers, posters and preprints, but, as you can see on the left side, with a very low percentage of DOIs: conference papers, about 15%;
posters and preprints, less than 5%. This is very, very low. We made another assessment on licensing.
So when you deposit a document on HAL, you can choose to publish it on HAL under an open license. If not, it will be available under the usual IP and copyright laws.
So in fact, only 11% of the deposited documents are published with an open license. The most important license is a liberal license, CC BY,
with 5.5% of the documents. Not very much. Next comes the Creative Commons CC BY-NC-ND license (non-commercial, no derivatives), which is more restrictive, with 3% of the documents.
1% of all the deposited document files are published in the public domain. We compared the different document types
and we were surprised to see that, in fact, grey items are less often published under an open license than white. Open licenses are most often applied to the sharing of research data.
PhD theses are very seldom published under an open license, less often than, for instance, posters, reports and other miscellaneous documents. What has changed since last year?
HAL, generally, became less grey. A little bit, but it became less grey. The number of grey items increased a little bit, but less than the deposits overall, and especially less than journal articles, chapters, books and so on.
This means that, as a result, the share of grey decreased from 35% in 2020 to 33% this year, 2021. This holds in all domains and disciplines; there are no differences.
This is consistent. I wonder if you can consider this as a kind of ceiling or capping of the share of grey literature at a level of about one third, 33%. So we'll see.
If we do a follow-up next year, we'll see if it continues to decrease or if it remains stable at this level. I think it will remain stable because we have a representative sample of scientific output.
But we'll see. Perhaps journal articles will continue to increase at a quicker pace
than grey literature. We'll see. On the other hand, HAL became more open, slightly, but more open. The share of open access, especially among grey literature,
slightly increased from 36.6% last year to 37.6% this year. The differences between the types of grey literature remain unchanged: more open theses, working papers and so on, less open conference papers. Grey items remain more accessible than articles and books,
except for conference papers and posters. And then the question is, just like last year: are they really grey, all grey, these conference papers? We all know that some of the conference papers
are published as journal articles, but they are then indexed in HAL as conference papers, even if they're disseminated in journals. Others are published by commercial publishers,
so not really grey. And one reason may also be that some conference papers are simply not available as a paper, but only as a presentation. And for the moment, HAL does not consider conference presentations
such as PowerPoint presentations or PDFs as a full document. We also assessed,
based on all this information, these results, we tried to assess the FAIRness of the HAL deposits. You know the FAIR principles: findability, accessibility,
interoperability and reusability. I don't need to explain these principles. The general observation regarding HAL is that less than half of the deposits have a DOI,
which is not really sufficient for findability. It can be improved. Yet grey documents are even weaker than articles because there are very few grey documents with a DOI.
Regarding accessibility, grey documents are globally better than articles and books on HAL. All deposits are retrievable on HAL with the standard OAI protocol,
but only one third of the documents are accessible. I already explained this. This is better for grey documents, especially for PhD theses and for reports, working papers and so on.
We didn't assess the interoperability on the level of metadata. Regarding reusability, overall, it can be said that HAL's compliance with this principle is rather weak.
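The three indicators discussed here (a DOI for findability, an available document file for accessibility, an open license for reusability) can be sketched as simple proportions. This is only an illustrative Python sketch under stated assumptions: the field names `doi`, `has_file` and `license` and the toy records are hypothetical and do not reflect HAL's metadata schema or the study's data.

```python
def fair_indicators(records):
    """Return the share of records with a DOI, a document file and a license.

    `records` is a list of dicts with the hypothetical keys
    'doi', 'has_file' and 'license'; missing or empty values count as absent.
    """
    n = len(records)
    if n == 0:
        return {}
    return {
        "findable_doi": sum(1 for r in records if r.get("doi")) / n,
        "accessible_file": sum(1 for r in records if r.get("has_file")) / n,
        "reusable_license": sum(1 for r in records if r.get("license")) / n,
    }


# Made-up toy records, only to show the calculation.
records = [
    {"doi": "10.1234/example-1", "has_file": False, "license": None},
    {"doi": None, "has_file": True, "license": "CC BY"},
    {"doi": "10.1234/example-2", "has_file": False, "license": None},
    {"doi": None, "has_file": False, "license": None},
]
indicators = fair_indicators(records)
```

Running the same function separately on grey and on white records would give the kind of comparison made in the talk.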
Only 15 items, for instance, are released with a license. This is the same result for grey documents.
Their compliance with this principle, with the reusability principle, is rather low. I'll finish with some concluding remarks
on challenges and issues based on our research. First, we have seen that there are many,
many documents without access to the document files. They are just records, but no document files.
These unavailable documents can be considered as a new form of grey literature, especially those documents which are not on other platforms. Of course, there are documents which are not on HAL,
but on other platforms, especially articles. Normally, they are on journal platforms. If they have not disappeared, they are still there. But where are the reports? Where are the dissertations? Where are the working papers?
When they are not on HAL, some may be elsewhere, perhaps in academic social networks. But most of them, I'm sure, we won't be able to find. This is new grey literature. Also, there are many documents on HAL without a PID,
so less identifiable, less findable. Files without a DOI can be considered, interpreted, as a new form of grey: less findable.
Before, it was a question of less accessible. Now, it's less findable. This is a traditional problem of grey literature. Hard to get, hard to identify. Where are they?
The third issue is about licensing. Many documents that are freely, openly accessible on HAL are in fact what has been described as gratis open access.
Gratis means you can get them for free, but usability is limited. There are restrictions, especially because of copyright and intellectual property laws. They are not libre. Libre open access means with a minimum of restrictions.
So, when you can't use the documents as you want, this can be considered as a new form of grey literature. Grey literature you can read,
but you can't, for instance, use it for text mining or new analytical tools. Regarding FAIRization, what should be, can be,
must be improved above all is the findability and reusability of documents, especially of the traditional grey document types. There's another issue when speaking about repositories:
long-term preservation. In our study, however, it's not an issue because HAL guarantees long-term conservation. So, it's not an issue for HAL, but we must keep in mind that for other repositories, especially for local institutional repositories,
this may be an important issue. Where will all these documents be the day the local repository disappears? So, thank you for your attention.
I'd also like to express my thanks for the funding received from the French URFIST network, and especially many thanks to our colleague,
Bernard Jacquemin from Lille, for the extraction of all the HAL data. Thank you again. If you have any comments, any questions about what we did,
please contact us. Here is my contact address. You will find us. And we'll stay in contact. Goodbye. Thank you.