A Retrospective on: The Challenges of Introducing Grey Literature into a Scholarly Publishing Platform


Formal Metadata

Title
A Retrospective on: The Challenges of Introducing Grey Literature into a Scholarly Publishing Platform
Number of Parts
30
License
CC Attribution - NoDerivatives 3.0 Germany:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
In 2019, GeoScienceWorld was actively planning to bring a large content and data repository that includes a significant proportion of highly valued grey literature into our existing collection of 50+ peer-reviewed journals and over 2,300 books in the geosciences. Due to various external circumstances, including the impacts of the COVID-19 pandemic and the absence of community-accepted standards for grey literature publishing, this project has stalled. GeoScienceWorld continues to investigate opportunities to bring original datasets, as well as other collections of grey literature, predominantly in the form of partner societies' conference proceedings and related conference materials, into our traditional research platform. We are also in the early stages of planning a new research tool that will be truly content-agnostic in bringing research and valuable insights to our primary end-user stakeholders, researchers, whether in academia or industry. As an organization, GeoScienceWorld is further implementing an Agile mindset and development philosophy in order to bring increasingly useful, and timely, resources to our stakeholder groups. A key ceremony of all truly Agile development processes is the Retrospective. In this paper, I review the initial aims of the project to incorporate a large grey dataset into our traditional scholarly literature platform and provide reflections on how both GeoScienceWorld and the wider grey literature community can move forward to bring such valuable datasets to audiences that both want, and need, such content to advance their own research. For each element of the initial project, I ask the following Agile Retrospective questions: 1. What did we do well? 2. What could we have done better? 3. What have we learned? 4. What are we still puzzled by? As a result of applying these questions to the initial project, I will present recommendations that both inform GeoScienceWorld's future integration and presentation of grey literature and offer a clearer path toward greater grey literature acceptance within traditional scholarly platforms such as ours.
Transcript: English (auto-generated)
Good morning. My name is Alistair Rees. I'm the product manager of platform and implementations at GeoScienceWorld, and this morning I'm going to present a retrospective on the challenges of incorporating grey literature into a scholarly publishing platform. Before we get going, a little bit of background on GeoScienceWorld.
We are a nonprofit collaborative that was established in 2004, and our mission is to aggregate and disseminate information to advance Earth science and to strengthen the sustainability of our society partners. On our research platform, we currently host 51 journals, as well as 2,200 ebooks, the GeoRef database, which consists of 4.5 million GeoRef records, and the GeoRef Thesaurus, which we use to power several functionalities within our websites, such as our related-content widgets. As I said, part of our mission is to strengthen the sustainability of our society partners, and in order to do that, we return money to the societies through our royalties program as a result of their journal and book publishing programs. Since 2004, we've returned more than $50 million to those societies.
Moving on then, just a quick reminder: the project that we're talking about is the same project that I presented on in Prague in 2019, where we had the opportunity to acquire a large data set that includes about 30% grey literature, mostly meeting abstracts, but also other content types such as posters, video content, and maps. I've broken down the challenges that we discussed in that presentation into three areas: preparing the content, the impact on search functionality, and the need for new purchase models to support the business of hosting this content.
And at GSW, we are currently going through a process of really trying to bring our corporate culture into alignment with the full values of the Agile Manifesto; we're implementing an agile mindset in our work.
And one of the key agile ceremonies is a retrospective, an opportunity to look back at a project and ask four key questions. What went well? What could have been better? What have we learned? And what questions remain? So I want to apply those four questions to the three elements of the original project that I presented on in
Prague, and offer some conclusions at the end as to potential ways to move forward as a community around grey literature. So in terms of content preparation, one thing that went well was that we really did understand the scope of the content that was coming our way, and is still potentially coming our way. We knew that there would be over 100 new journals and a large swathe of books, and, as I said, that about 30% of that content would fall under the broad category of grey literature. Within that project we knew exactly how much content there was, and that understanding, together with working with our partners to really define and examine the content, gave us a clear picture of how much was grey literature versus peer reviewed, and also of what grey literature would be coming into the project.
Something we could have understood better around the content preparation was that we really were not aware that there were no standards around XML and DTDs for grey literature. We just assumed that, because we have JATS for journals and BITS for books, there would be some kind of shared, unified DTD for grey literature.
So we could have been better prepared for the fact that this didn't really exist. We also had a general assumption, because we were working on the paradigm of a common display approach for presenting journals and books, that something similar existed in some form within the community, that there would be a common, unified approach to display for grey literature. It's not that grey literature isn't being displayed in online environments; it's that there aren't any shared standards or common understanding of how this literature should be displayed online. So we learned that each and every content type brings its own unique challenges. Even though we knew that meeting abstracts would be different from posters, and different from maps, we hadn't really appreciated, until we went into this content preparation process, just how many unique challenges each particular type of content would bring us. But as we started realising that all of these different, unique content types were there, we started to understand that there is a lot of overlap in terms of metadata elements. An author is still an author, regardless of whether they are writing a report or a meeting abstract.
So there is overlap in the metadata, and we can use that overlap to start forming the bonds, the chains, of something approaching a unified XML DTD and a unified online experience; a rough sketch of that idea follows.
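As a concrete illustration of that metadata overlap, here is a minimal sketch, written for this retrospective rather than drawn from any existing GSW system or community standard; all of the field names, content types, and example values are hypothetical. The point is simply that a stable shared core plus genre-specific extensions is one plausible shape for the kind of unified schema the open questions below ask about.

```python
from dataclasses import dataclass, field

# A hypothetical shared metadata core: the fields that every content type has
# in common, whatever its genre. Names are illustrative, not a proposed standard.
@dataclass
class GreyRecord:
    content_type: str                 # e.g. "meeting-abstract", "poster", "map"
    title: str
    authors: list[str]                # an author is an author, regardless of genre
    pub_date: str                     # ISO 8601 date string
    identifiers: dict = field(default_factory=dict)  # e.g. {"doi": "..."}
    extras: dict = field(default_factory=dict)       # genre-specific fields

# Genre-specific details live in `extras`, so the shared core stays stable.
abstract = GreyRecord(
    content_type="meeting-abstract",
    title="Example abstract title",
    authors=["A. Author"],
    pub_date="2019-10-22",
    identifiers={"doi": "10.0000/example"},
    extras={"conference": "Example Meeting 2019", "session": "T12"},
)

map_sheet = GreyRecord(
    content_type="map",
    title="Example geological map sheet",
    authors=["B. Author"],
    pub_date="2018-01-01",
    extras={"scale": "1:50000", "region": "Example County"},
)

# The shared core is what a unified DTD or schema would standardise; the
# genre-specific extras are where each content type's unique challenges live.
for record in (abstract, map_sheet):
    print(record.content_type, "|", record.title, "|", ", ".join(record.authors))
```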
Which does lead us on to the open questions that we still have; as the project has stalled, these are the questions we will need to answer when it gets going again. The main questions are really: is there a need for a grey literature equivalent to JATS or BITS? Is there a need for a centrally defined XML schema that serves as an industry standard for grey literature? And if we answer that question with yes, there is a need, then from a GeoScienceWorld perspective, is the creation of such a shared XML schema something that we as an organisation want to, can, or should take the lead in defining?
Especially as we have a large amount of experience working on the technical side of publishing, around XML workflows and tools for loading content and making sure that content is discoverable through various online discovery services: do we bring to the table a valuable technical perspective that could be of service to the grey literature community? Moving on from content preparation to the implications that our project had for search. One thing that really went well, as we investigated the implications of grey literature on our
search environment, is that we understood that end users want to direct their own search experience. They don't want to be forced into particular directions, or down rabbit holes, within the search environment because an expert or someone else has decided that it's better for them to go that way. They really want to be able to search the content and find answers to their questions without the influence of, almost, a Wizard of Oz behind the green curtain pulling the strings. But at the same time, we understood very clearly that visual cues are important in such an environment,
especially around content status with regard to grey literature versus peer reviewed, so that users can very quickly identify the types of content they are being presented with in a search result and keep moving through the experience, rather than having to open the content to ask: is this something that I can use? It's right there as a visual cue in the search result. That then led us to something we could have done a better job of answering: exactly what information within search results and search facets do users really want to see? Users have limited time; the average session time on our website is between 90 and 100 seconds, so users just want to get in, find the content, and move on with their day.
And so we need to better understand what information is really important to the user in the search results and in the search facets. One of the things we learned as an outcome of this is that we really need to do a better job of always keeping in mind the web best practice of not impeding discovery.
It's not our job as an online content aggregator and an online publisher to tell end users what it is they need to discover. We just need to present them with the most relevant search results for what they're looking for and let them build their own search experience.
I'm probably showing my age a little bit here. When I was a kid, I really enjoyed make-your-own-adventure books, where you would read to the bottom of the page and it would say: if you want to go into the woods, go to page 10; if you want to go to the lake, go to page 110 or 112, or whatever it would be.
And so that's how we need to think about search for the end user. The end user is on their own journey of discovery, and we need to facilitate that journey rather than define it for them. We have also learned that the vast majority of users feel overwhelmed by the amount of metadata that we're presenting. And when I say we, I mean academic publishing as an industry: the amount of metadata that we present to the end user in the search results is overwhelming our users. When you look at a lot of websites in scholarly publishing in particular, we have very textually dense search results with maybe 10 or 15 individual metadata points.
Are they really all important? So that leads us on to our open questions here. What metadata really is the most important to display to the end user in that limited space?
Can we make better use of the space to show the metadata that is really valuable to the end user? You could use the phrase: we need to get meta about metadata, and find the key parts of that metadata to display to the user. Thinking about search also raises this question: when we have visual cues in the search results, such as an open access icon or a free tag that tells the user this piece of content is open access or free, do we then need a facet over on the left-hand side that allows the user to see only open access content, or is it enough to see that the content is open access?
One of the constant challenges of working on the technological side of academic publishing is finding the balance between the search experience, page load time, and giving the user enough information to make a decision about a piece of content. If they can make that decision without being overwhelmed by facets and filters, that's a benefit to the end user; a small sketch of that trade-off follows.
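To make that badge-versus-facet trade-off a little more concrete, here is a small, purely illustrative sketch; it is not GSW's actual search implementation, and the field names, badge labels, and example records are all hypothetical. The idea is that a single status field per dimension (peer-review status, access status) can drive both an inline visual cue and, only if user research shows it is needed, a facet.

```python
from collections import Counter

# Hypothetical, trimmed-down search results: only the metadata a user needs in
# order to decide whether to click, plus one status field per dimension.
results = [
    {"title": "Basin stratigraphy report", "year": 2018,
     "review_status": "grey", "access": "free"},
    {"title": "Sandstone diagenesis study", "year": 2021,
     "review_status": "peer-reviewed", "access": "subscription"},
    {"title": "Regional gravity map", "year": 2015,
     "review_status": "grey", "access": "open-access"},
]

# Inline visual cues derived from the status fields, instead of extra metadata rows.
BADGES = {"open-access": "[OA]", "free": "[FREE]", "subscription": ""}

def render_result(r: dict) -> str:
    """Render one compact result line with inline badges."""
    badge = BADGES.get(r["access"], "")
    return f'{r["title"]} ({r["year"]}) [{r["review_status"]}] {badge}'.rstrip()

def facet_counts(rs: list[dict], field_name: str) -> Counter:
    """The same status field can also drive a facet, if user research shows one is needed."""
    return Counter(r[field_name] for r in rs)

for r in results:
    print(render_result(r))
print(facet_counts(results, "access"))
```

Whether the facet is worth its screen space and page-load cost then becomes an empirical question about those 90-to-100-second sessions, rather than a default.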
Moving on to our final section, talking about new business models that would support the presence of grey literature within the platform. Early on, we very quickly identified our primary market audience for this kind of content: it was corporations, it was consultants. We very quickly understood that there was limited interest, at least from the academic institutions that we work with, in the body of grey literature that would be part of this larger project.
And so we really identified that we would be targeting the corporate market for this content, and we worked ahead of the project to implement tokenized product offerings, which are available exclusively to our corporate customers, because that is a way of purchasing content that they're very familiar with; they're used to that way of working. There are other organizations doing similar things to us who also have tokenized product offerings, and we worked together with them to make sure that this was something that would work well for a corporate market looking for this kind of content. However, we could have done a better job of identifying secondary markets; we tended to fall into this dichotomy between our corporate customers and our academic customers.
Recently, I was in Charleston at a conference down there, and there was a presentation given by a lady who was talking about how open access drives their business, drives discoverability, drives usage. And when they worked with the IP registry to really understand the kind of people
that were using their open access content whilst not having an account within their system, they discovered that there were organizations such as police departments, government agencies, non-governmental organizations who were using their open access content.
So they found value in the content that was being presented, and that created an opportunity to then serve products to that community that were behind the paywall, because the publisher had taken the time to identify those secondary markets. I think that's something that we as a publishing company could do far better, because we do tend to think only in that corporate-versus-academic way, and so we're missing secondary markets. It would also, for this particular content set, probably be good to see if we can identify some kind of unique selling point that would attract the academic market as well as the corporate market, and that's not something that we've really done very well so far. As for things that we've learned, probably the most challenging learning that we've come across as a result of this project is that, as we talk
with our corporate customers, we hear time and time again that they are prepared to pay for access to publicly available content if it's in a central location. Within the geoscience environment, for example, the majority of states have a geological survey and a state geologist, and these organizations are creating reports, maps, and other valuable content that then sits on their own servers and in their own repositories and is not discoverable to the wider geoscience community. Corporate customers in particular, and probably other secondary-market customers too, are willing to pay to have all of that content located in a central, discoverable environment such as GeoScienceWorld. That was actually quite a surprise to us, and it's something that we're working through the ramifications of right now.
We've also discovered, at least within the geoscience research community (and this may be true of other scientific communities, but I want to be clear that this is just from a geoscience perspective right now), that researchers seem to be less interested in whether content is grey or peer reviewed when they're doing their research. They're really just looking for answers: does this piece of content have the information and the knowledge that I need to move my research forward? Whether it's grey literature or peer reviewed is a secondary, possibly a tertiary, concern. And again, that was something of a shock to us, or a surprise. We'd always had this understanding of peer-reviewed content being held up as the gold standard, and really it's a much more complex, muddied picture than that.
Which leads us to our open questions: what kinds of business models really do suit grey literature within a traditional scholarly publishing platform? Are we talking about a subscription model? Do we think about freemium models, where a certain amount is free and you pay a small fee if you want access to everything; so, for example, you could read the HTML version of a piece of content, but you'd have to pay if you wanted to download the XML or the PDF or the data supplements? Is there a business model where the content is free, and should be free, but users pay for access to a tool to interact with that content, a bit like Spotify, where you pay to not have ads and to use other features because you purchase access to the app on a monthly subscription? And the big question that we've come back to more than once (and again, this may be a geoscience perspective) is: is the data behind the article more valuable than the actual words and pictures? Because it's the data that users want to interact with and manipulate, to repurpose it, to check that the conclusions are valid, and to see where the authors are coming from. So does that data have more value than the PDF of an article?
That's a question we ask ourselves pretty often, and we get different answers depending on which of our various stakeholder groups we're talking to. So, moving on then to our conclusions. It is clear to us at GeoScienceWorld that grey literature is a valuable resource for our research community. It's very important; it has its own intrinsic value to our researchers. Where we're at right now, we're prepared for this project to pick up again. As I said, it has stalled because of COVID and all of the uncertainty of the last couple of years, but we're ready for it to move forward and to bring this content set into our research platform. And as a result of going through this retrospective process, we really feel as though we have a firm basis, a firm footing, for moving forward when this project does pick up again.
We have all of those things that we've done well. We know the things that we need to improve upon. And so we can start working towards improving those. We've got questions that we need to ask. We know what we've learned. And so that allows us to have that firm basis. And it also reminds us that even though this project has stalled, we have learned things.
You know, we have improved our platform as a result of this project. We have answered questions about our business as a result of this project, even though it's in a stalled situation.
So those learnings and questions have resulted in improvements elsewhere. And a lot of that comes back to the agile transformation that I mentioned we're in the process of here at GSW, where we're really trying to implement, or rather align (probably the better word), an agile mindset with our corporate culture, so that we live the full values of the Agile Manifesto.
I encourage you, if you don't know the Agile Manifesto, go and take a read, take a look at the four values in particular and think about how those values can impact the organisational culture. One of the key parts for me in particular is that by having an agile mindset, it leads to continuous improvement. We're always looking to improve.
We're never happy just with the status quo; we always want to be better. One of the other conclusions I've come to, as a result of doing this retrospective and also the original presentation in Prague about the project, is that when it comes to how we present grey literature in an online environment, leadership is so important. Within the grey literature community, we perhaps bring a different perspective, coming from the more technical, platform-based publishing world. Defining more closely how we present grey literature content to the research community, so that they benefit from it, is really important, but it takes leadership.
Does that mean we want to be the leaders? That's an important question from our retrospective. But we do feel it is important that, within the grey literature community, there is a willingness to step up and define XML standards and presentation standards that could become industry standards for the grey literature community as a whole. And with that, I'd like to thank you for your time. Thank you for listening. If you have any questions, feel free to ask me. Thank you very much.