Ensuring Public Access to Research Data: Perspectives from Three Academic Research Libraries


Formal Metadata

Title: Ensuring Public Access to Research Data: Perspectives from Three Academic Research Libraries
Number of Parts: 17
License: CC Attribution - ShareAlike 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Transcript (English, auto-generated)
Hi, I'm Michael Levine-Clark. I'm the Dean of Libraries at the University of Denver, and I'm here with Rick Anderson, who's the university librarian at Brigham Young University, and Judy Russell, the Dean of University Libraries at the University of Florida. We're here to talk about public access to research data and about the role of libraries, and of the institutions we represent, in ensuring that that data is discoverable, reusable, and preserved for future use. We of course wish that we could be with you in Amsterdam, both because we want to be in Amsterdam but also because we would love this to be a conversation where we could talk about some of the challenges that we're facing at our libraries and some of the challenges that you might be facing at your own institutions, and within your own entities if you're a publisher or somebody else. Data is, I think, in some ways the epitome of grey literature: it is hard to find, it is not controlled in many ways, and yet it's fundamentally important for the ongoing research that all of our institutions do. It's an important part of the research ecosystem, and it's often forgotten. At a CHORUS forum in January of 2021 about research data, Varsha Khodiyar from Springer Nature really summarized this well, stating that for researchers to cite papers is just second nature, and no one thinks twice about it. But we don't think about data in the same way at all. We think about data very differently. Papers are cited because that's the way recognition is given, both to the publications that you are relying on for your work and for tenure and promotion and all of the recognition within the system. Citations are what tend to be counted, and yet gathering a data set or building a data set is fundamental to research, and the publications wouldn't be possible without those data sets. And yet we spend relatively little time or effort as institutions, whether we're libraries or somewhere else in the institution, thinking about how to manage those data sets and how to present the data to the world.
This is an example of the researcher's view of the ecosystem that is influencing them. It comes from an Association of American Universities and Association of Public and Land-grant Universities guide to accelerating public access to research data, and it's worth taking a little bit of time to talk it through, because it is a good overview of all of the challenges that we face as institutions in thinking about managing research data. At the center of everything is the researcher. The researcher is the one who gets the grant funding, the researcher is the one who conducts the research and builds the data set, and they're the one who is then faced with the need to manage that data set in some way. Surrounding the researcher, really their closest set of interactions, is with their college and their department. The college and the department set the norms for recognition and rewards; that's where the appointment, promotion and tenure document lives, that's where the tenure committee meets, and that's where the norms are set for what to expect in terms of tenure and promotion. And certainly within the department the disciplinary culture is in effect, and the norms in a particular discipline around sharing of data and openness to open access also have an impact on what the researcher does and how comfortable the researcher is with sharing of information. And so it's very important to think about those interactions.
The institution, and I'll come back to it in a minute, lives between the college and the department and a whole range of external factors that influence the decisions of the researcher. As sponsors and agencies, the funders are increasingly requiring some sort of deposit of data. They're requiring that data be preserved, that data be made accessible and reproducible, and that it even be linked to the publications, and so the institution needs to be thinking about that, as does the researcher. Disciplines and societies, just like the department, are setting the norms for the discipline. There are a range of different services and tools and repositories out there; some of those are within a society, some are commercially available, and some are perhaps at an institution from a co-author. All of that matters, and the institution needs to be paying attention to those alternate repositories and those other tools and services that might be used by a researcher. Publishers, of course, matter tremendously here. The researcher chooses to publish with a particular publisher for multiple different reasons, and publishers are quite aware of the need for the deposit of data, and many publishers are starting to think about how they can play a role in hosting data sets. The problem, of course, is that it's not a one-to-one relationship: the data set might be associated with three or four articles from three or four different publishers, and there might be multiple data sets associated with a single article, so the publisher has a role, but it's only a role among many.
And so all of those are factors that come into play when the institution starts to think about its role in managing data. The institution is really the place where the requirements for managing data come to the fore. The institution may or may not be paying attention to this; some institutions, I think, are paying closer attention to the need to manage data than others. And some institutions are dividing this up in different ways, so "the institution" doesn't mean the same thing in each of our cases, and it probably doesn't mean the same thing to many different librarians if you ask them this question. The library might be part of this, the Office of Research and Sponsored Programs might be part of this, institutional research could be playing a role, campus IT could be playing a role, and all of this could be happening within a college rather than at the institutional level as a whole. But the activities have to happen: somebody needs to be paying attention to make sure that the researcher is in compliance with funder mandates, that they are doing what the funder expected with that data. There are requirements around metadata and linking. Somebody needs to be describing the data and the data set, and somebody needs to be ensuring that the data set is linked to the funder and linked to the relevant publication or publications. That seems to me to be a library role, but it's not necessarily a library role everywhere.
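To make that linking idea a bit more concrete, here is a minimal sketch of what a dataset record with funder and publication links might look like, expressed as a Python dictionary whose field names loosely follow the DataCite metadata schema. All of the names, DOIs and the award number are invented placeholders, not examples from any of our institutions.

```python
# A minimal, hypothetical dataset record showing how a data set can be
# described and linked to both its funder and its related publication(s).
# Field names loosely follow the DataCite metadata schema; every identifier
# below is an invented placeholder.
dataset_record = {
    "titles": [{"title": "Survey responses on data-sharing practices (2020)"}],
    "creators": [{
        "name": "Doe, Jane",
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0000-0000-0000",
            "nameIdentifierScheme": "ORCID",
        }],
    }],
    "publisher": "Example University Institutional Repository",
    "publicationYear": 2021,
    "types": {"resourceTypeGeneral": "Dataset"},
    # Link to the grant that produced the data.
    "fundingReferences": [{
        "funderName": "Example Funding Agency",
        "awardNumber": "ABC-1234567",
    }],
    # Link to the article(s) that the data set underlies.
    "relatedIdentifiers": [{
        "relatedIdentifier": "10.1234/example-article-doi",
        "relatedIdentifierType": "DOI",
        "relationType": "IsSupplementTo",
    }],
}

if __name__ == "__main__":
    import json
    print(json.dumps(dataset_record, indent=2))
```

Whoever does the describing, capturing the funder and the related publications as structured fields like these is what makes the later compliance checking and reporting possible.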
There's a requirement to report out to the funder. There's the need to make sure that the data are preserved and accessible. And then underlying all of this, though certainly not part of the requirements from a funder perspective, is the need for reputation management. Institutions are engaging in a lot of activity around management of publications, and doing so to help raise the reputation and the ranking of the institution. My university has invested a lot of time and effort in faculty profiles and in publicizing the research of the institution. And to the extent that sharing data and making data sets available for others to do research from can help increase the reputation of the university, it certainly crosses into that reputation management role as well.
So that's the ecosystem that we're talking about, and now I want to just touch on our institutions for a minute. We come from three different research institutions in the United States, and these are institutions with different profiles, but all of us are of high or very high research activity within the US. The Carnegie Classification provides a classification system for higher education, and every university and college in the US is part of that system. You may have heard the term R1, which means a doctoral university with very high research activity; the University of Florida, where Judy works, is an R1, a very high research activity university. Rick and I are at R2s, doctoral universities with high research activity, which is just one step down from the R1's very high research activity. We've given you a bit of data here about our institutions to try to give you a sense that not every research institution is the same, and not even every research institution in the same Carnegie class is the same.
We grant very different numbers of doctoral degrees. BYU, Rick's institution, granted 212 doctoral degrees in the last year for which we pulled the data; ours at the University of Denver is 389; and the University of Florida granted almost 1,900 doctoral degrees in that year. So we're dealing with a very different level of research as represented at the doctoral level. We have different sizes of faculty as well, from 5,200 full-time faculty at the University of Florida down to 762 at the University of Denver. And the number of librarians is very different as well. Rick, at a smaller institution, has pretty close to the same number of librarians and professional staff as Judy at a much larger university, and the University of Denver is much smaller, so the ratio of faculty to librarians and professional staff is a useful way, we think, of trying to understand the effort and the scope of work that needs to be covered by the staff. It's 47 to 1 at the University of Florida, 25 to 1 at the University of Denver, and 20 to 1 at Brigham Young. The University of Denver and Brigham Young each spend about $40 million a year on research, and the University of Florida is at $942 million, so a very, very different scale of research, and that comes out in the number of articles published. There are 8,400 articles published annually at the University of Florida and only 669 at the University of Denver. This is a year of data from Web of Science, so it's not the full count of articles, but it's the count of articles that gets represented in Web of Science. So we're very different, and yet all of us have requirements about the deposit of data, because all of us are federally funded; all of us are getting funding from agencies that require that data sets be deposited.
And the last thing I want to touch on is that though we are all comprehensive research universities, we have different strengths. Brigham Young University has particular strengths in liberal arts and sciences and has an emphasis on undergraduate research that our institutions don't share. The University of Denver is particularly strong in the social and behavioral sciences, and particularly in mental health, and the University of Florida has strengths in STEM and a particular strength in agriculture. So I'm going to turn it over to Judy to talk a little bit about the University of Florida. Thank you, Michael.
We thought it would also be helpful for you to understand a little bit more about us if we shared the top 10 publishers that our authors utilize for their articles. This is a choice by the authors, but as you can see from the data, the authors at the University of Florida very heavily favor Elsevier journals. Then there's a relatively comparable level of use of Wiley and Springer journals, and then it begins to drop off significantly. Obviously there are many, many more publishers in which our articles are published, given the sheer number of articles there are, but this, I think, helps to reinforce the idea of what a STEM university we are. Michael mentioned that we're also an agricultural university as a land grant, so you see within our top 10 the American Society for Horticultural Science, for example, which, as you look at the other institutions, is not going to show up; not that they may not have some articles there, but it's not going to show up as a top publisher. We looked at a ten-year span of publication and pulled that data for all three universities, and then tried to see how that data had changed over time, feeling that would normalize things a little bit. And it does, I think, reflect a lot about the disciplines, but also about the scale of publishing and how it's being managed by our authors. And I think that Michael is then going to show you the comparable data for the University of Denver. Great. Thanks, Judy.
We also have Elsevier at number one. As I mentioned, our particular strengths are in the social sciences, and yet Elsevier is still our top publisher. Taylor and Francis is number two; you may not remember from the slide before, but Taylor and Francis was down around three or four percent at the University of Florida, and yet it accounts for about 11% of our publishing. Another one I want to draw out is the American Psychological Association, the APA, which is about 3.6% of our publishing and is our sixth highest publisher. I mentioned that we have particular strengths in mental health and psychology, and so that's evident in our publishing. We are still a comprehensive university and still have strength across the board, and so we have IEEE and IOP and ACS in our top 10 as well; we're not just a social sciences institution. But you can see that some of this pattern is skewed a little bit toward the social sciences because of those strengths. Our overall publishing over this ten-year period is much smaller than the University of Florida's, just 4,800 articles.
Now I'm going to turn it over to Rick to talk about BYU. Yeah, thanks, you guys. This shows the distribution of articles between 2011 and 2020 by publisher from BYU, and again, you won't be too shocked to see that Elsevier and Wiley and Springer are our top three, and that Taylor and Francis follows close behind. As a comprehensive university, we of course teach across all subject domains. We are not as heavily focused on the STEM disciplines as Florida, and we're not quite as heavily focused on the social sciences as Denver, but we have very strong programs in all of those areas, as well as across the humanities and fine arts. Knowing that about us, you won't see anything particularly surprising on this chart, except perhaps the presence of BYU itself in the top 10 of our publishers. This might be particularly surprising unless you know about BYU Studies, a very active quarterly journal that is also cross-disciplinary. Because Brigham Young University is affiliated with and sponsored by the Church of Jesus Christ of Latter-day Saints, it has a particular interest in scholarship that is related to the history, culture and doctrine of the church. And we have a lot of very active researchers on our campus who are not only active in their particular disciplines, whether they are sociologists or chemists or anthropologists, but who in many cases conduct research in their subject areas through the lens of Latter-day Saint history, culture and doctrine. So we might have sociological studies of, say, Mormon mothers in a certain time period and in certain circumstances. A study like that could find its way into a sociology journal, or, because it's related to Latter-day Saint culture, it could also find its way into BYU Studies. And so, because BYU Studies, which is a rigorous and peer-reviewed journal but one with a particular disciplinary or cross-disciplinary focus, publishes so much, we have quite a bit of publication originating at BYU that is published in BYU Studies.
So the next slide we've got is a comparison of the output of our three institutions by publisher, and I do think it's worth noting that, as we all know, journal publishing is a prestige marketplace. Authors select the journals in which to publish based largely on their expectation that publishing in a given journal will enhance their reputation, help them achieve or keep tenure, and get their work out in front of their peers in their disciplines. According to STM, there are roughly 45,000 scholarly journals being published worldwide right now. Elsevier controls only about 2,500 of those, and yet you can see from this chart the degree to which Elsevier controls the prestige economy across multiple disciplines. It's clearly the most favored venue for the authors at all three of our institutions, followed closely by the two other science publishers that control comparably large segments of the prestige economy of scholarly publishing, Wiley and Springer. We thought this was a particularly interesting chart because it shows that, despite the significant differences between our three institutions, we nevertheless see somewhat similar publication patterns among our faculty. So what we see on this slide, with this rubric, is an outline of some of the important questions and factors that have to be addressed when you're considering or implementing a campus data plan.
One of those factors is the question of leadership: who, if anyone, on the campus has overall responsibility for making data findable and accessible, and for ensuring that it is reliably stored and curated? That of course leads inevitably to the next question, which is who is going to pay to ensure that the data are reliably curated and made accessible to researchers. One of the important factors, perhaps the most important factor, that drives faculty behavior with regard to data deposit is the question of recognition; this is the universal currency of faculty life. Now, of course, faculty behavior with regard to data deposit is also constrained by the requirements of their funder, especially if they hope to get funded again in the future by a funder they've gotten money from in the past. So the question for those of us who are charged with data management is: are there ways that factors like recognition and policy can be used to incentivize faculty not just to store their data, but also to share it?
Policy itself is of course a very important parameter here. Different funders and different institutions require different kinds of data to be retained for different lengths of time. At each institution, we've got to ask ourselves what kinds of data we have control of that have to be shared per policy, what kinds may be shared, and, importantly, what kinds absolutely must not be shared, for reasons related to privacy or other constraining factors. These are obviously critical questions, and they involve high stakes, including the potential for significant legal liability. And frankly, this is one of the reasons that data policy formation at universities is often slow: precisely because the stakes are high and sometimes they involve different kinds of liability. Another of the constraining factors around data management is the question of infrastructure. At every university, we've got to ask ourselves: do we have the repositories, the researcher services, and the compliance review mechanisms that we need in order to form and implement both a coherent policy and a policy that keeps us on the right side of the angels? One of the other limiting factors on our ability to manage data is the need to harmonize different agency policies at an institution like Judy's, which works with many different funders. And then there are the crosswinds of, and I think this is a matter of growing tension, the tension between very important issues of security and the growing desire for scientific openness on the part of many in the scholarly communication ecosystem.
That tension is significant and really shouldn't be underestimated. So those are some of the factors that constrain and shape our efforts. If we go on to the next slide: like any other significant programmatic undertaking, making research data accessible offers real opportunities for us as institutions, but also entails real challenges, and we've talked about some of those already. They also include the fact that proprietary data can constitute a significant economic asset, both to individual researchers and to their host institutions. This means that there is a strong natural incentive to store and curate it carefully and to make it available, at least locally. On the other hand, since proprietary data can constitute a significant economic asset, in some cases there may be strong disincentives to share it outside the local institution. These are some of the issues that we wrestle with when we try to figure out how to manage data locally. Sharing data has the potential to bring new and positive attention to an institution and its researchers, and one of the most important ways it can do this is through citation in articles and reports, which of course will happen more to the degree that you make your data findable and accessible. Of course, sharing data can also bring unwanted attention to your institution if it turns out, for example, that some of the data were falsified. So sharing data is, like almost everything else in this area, a two-edged sword. Storage, of course, is one of the biggest and most obvious challenges of data management for us. As we all know, researchers in some disciplines, such as machine learning or remote sensing or bioinformatics, can produce astonishingly large data sets that pose very significant challenges, not only for management and accessibility, but sometimes just for brute storage and preservation. And then, of course, like all tools, research data have the potential to be misused, and because research data sets represent a potentially powerful tool, they have the potential for truly destructive misuse. If they're not managed and safeguarded carefully, sensitive data can be accessed by genuinely nefarious actors and can be used for any number of destructive purposes.
So taking on the responsibility to manage and curate institutional research data is a serious undertaking. Michael and Judy and I all work in libraries, and at many institutions the library would seem like the obvious place for a research data management program. Libraries have historically been all about gathering, organizing, curating, and providing access to published information, so why not have them do the same for research data? But obviously not all of the functions that are needed around data management are a clear fit for traditional library structures and skill sets. We in libraries have always been strong in the areas of gathering or harvesting content, taking care of it, and providing access to it. But the content that we're used to managing in these ways has historically been primarily published content in a very limited number of formats. Data sets represent a genuinely new frontier for us, even when what we're being asked to do with them seems superficially similar to what we've done historically with books and journals and databases. Less obvious than these roles are such very important functions as monitoring campus compliance with funders' or governments' data requirements, watching out for misuse of locally hosted data, and providing analysis to third parties, either within or even outside the institution. These constitute real growth opportunities for the library mission, but taking advantage of them will require significant redirection of resources and, for our staff and faculty, the negotiation of a pretty steep learning curve. And I'm not sure that we always appreciate the degree to which hopping onto this train really would require fundamentally rethinking the way we allocate staff and space and money. Let's go on to the next slide. These are some related publications that have to do with data management and data standards.
Many of you, I think, will already be familiar with what's called the FAIR standard. It's a set of criteria designed to help data curators ensure that their data are Findable, Accessible, Interoperable and Reusable, hence FAIR. The RDA working group on what's called the FAIR Data Maturity Model has delivered a set of indicators, priorities and guidelines that are designed to help researchers with self-assessment and funder evaluation, and you can see a link to that rubric here. The Research on Research Institute has partnered with another group called FAIRsFAIR to undertake what they're calling the FAIRware initiative, by which they're working with funders across the EU and the UK and the US and Canada to develop new open-source tools to allow assessment of the FAIRness of research outputs, to implement global standards for FAIR certification of repositories, and to build a pool of existing knowledge such as WorkflowHub. FAIRsFAIR is also developing another tool, F-UJI, and I'm not sure of the exact right way to pronounce it; I hope it's "Fuji". This is a tool that is designed to automate FAIR data assessment. And there is FAIR-Aware, an online questionnaire to assess researchers' and data managers' knowledge of FAIR requirements.
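To make the idea of automated FAIR assessment a little more concrete, here is a toy sketch of the kind of check such tools perform over a dataset's metadata. It is not the F-UJI tool or the RDA maturity model, just a simplified illustration with hypothetical field names and placeholder identifiers.

```python
# A toy FAIR-style self-check over a simple metadata record. This is only a
# simplified illustration of the idea behind automated FAIR assessment, not
# the F-UJI tool or the RDA FAIR Data Maturity Model.
def fair_self_check(record: dict) -> dict:
    findable = bool(record.get("identifier"))          # persistent identifier, e.g. a DOI
    accessible = bool(record.get("landing_page"))      # resolvable access URL
    interoperable = record.get("metadata_standard") in {  # community metadata standard
        "DataCite", "Dublin Core", "DDI",
    }
    reusable = bool(record.get("license"))             # explicit reuse license
    return {
        "Findable": findable,
        "Accessible": accessible,
        "Interoperable": interoperable,
        "Reusable": reusable,
    }

if __name__ == "__main__":
    example = {
        "identifier": "10.1234/example-dataset-doi",   # invented placeholder
        "landing_page": "https://repository.example.edu/datasets/42",
        "metadata_standard": "DataCite",
        "license": "CC-BY-4.0",
    }
    for principle, passed in fair_self_check(example).items():
        print(f"{principle}: {'pass' if passed else 'needs attention'}")
```

Real assessment tools apply far richer indicator sets, but the basic pattern of checking a record against each FAIR principle is the same.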
And with that, I think I'm handing it off to Michael. Or is it Judy? To Judy. That's okay. So we've touched on a few tools and initiatives that are ongoing, and there are many of them. We've put a few of them here on the slide just as a way of indicating how much is going on in this arena. Most of us are using a few of these things, or examining some of them to see how to use them. They do different things for us and help us in different ways, and so it's interesting to see how the population in this area is growing and the different things that can be done. We recently participated in a seminar on the Coleridge Initiative; they're doing a lot of AI work on data, searching the text of research articles to find data sets that may not be cited in the traditional way. If they can find and identify those things through automated systems, they can increase the use of those data sets and see who's using them and what types of research they're doing with them. That's a really interesting initiative. There are other things here. I think most of you are probably familiar with Scopus and Web of Science; they're great tools for mining information about what our authors are publishing, but they're also increasingly helping us identify trends and other kinds of things. Michael and I are both at institutions that are using the CHORUS dashboard, and we're going to show you slides from our dashboards here in a moment. They'll give you an idea of what kind of information that brings to us. But I think we're all in relatively early stages in our data management processes, and there are a lot of people out there who are trying to help us through these initiatives and through the development of tools. And then I think some of us are working on tools on our own campuses, which may solve local problems but don't necessarily contribute to the ecosystem and to the broader analysis and use of data sets. Michael, if you can advance the slide. So I mentioned that we are a subscriber to the CHORUS dashboard.
And I thought it would be interesting to take a minute here and look at the kind of information that comes to us. This for us is a really important tool, if nothing else simply because of the scale that you saw. We have so many faculty, our librarians are so vastly outnumbered, and there's so much publishing and so much research going on that any tool we have that can help us track things more effectively is going to be really valuable to us. I'd like to call attention to a couple of things in the left column there, today's indicators. If you look at it, 28.1% of the publications from UF authors that are available right now in the CHORUS data set are verified open access on the publisher site, so we've got a fairly high rate of open access. This isn't necessarily corresponding authors; this is any author whose name is associated with an open access publication. But look down a little further and you'll see that 3.1% of these articles are associated with data sets. That's the little tiny line at the very bottom of the screen. If nothing else, when you look at these numbers, it's a real indication of how much work is still to be done, because many of those articles have data sets, but those data sets are not yet deposited and identified in a way that meets the FAIR standards, in a way that makes them easy to identify. And this dashboard helps us communicate that issue and address that problem on our campus. Also, you'll see that 31% of our authors are represented with ORCID IDs. We know many more of our authors have ORCID IDs, but they don't always use them when they're publishing, which makes it much more difficult for us to differentiate the articles.
We're hoping that more and more publishers will require ORCID IDs; some of them are already doing that. It's a great way for us to organize and be sure that we're accurately identifying the authors from our institution. One of our biggest challenges has been identifying our authors. A number of them will submit articles using a personal email address, like a Gmail address, and so they aren't quickly identified. With thousands of articles, we're not going to open each article to try to determine what department the author is in. If we have their ORCID ID, or if we have their UFL email address, we can very easily parse this data and route it back to them, to their department, which is really important with respect to compliance. And you can look here, too, at the information on the agency portals. One of the things CHORUS does, which in the long run is going to be very helpful to us, is to check the portals that are required for deposit and confirm whether the articles are there. Now, the fact that you're seeing only 19.5% of them in those portals reflects a number of things. One of them, if you look at the graph itself, is that there's a big uptick between 2018 and 2019, and again before 2020, as CHORUS expanded its data coverage and added more and more publishers. So when you look at the scope of this, it's only really in the last year and a half or so that it's a much more comprehensive set of publications. But also, the agency portal requirements are for federally funded data, and we're producing a lot of data that's funded by other types of funders, including state funding, private foundations, and other kinds of things. So in some ways that's not as alarming a number as it might otherwise look, but it is an indicator for us. I think this is a good example of where a tool can help us with our analysis and with the way that we support our university with information about our authors and, in particular, about our compliance. I think I'm going to leave it there and let Michael talk about his use of the CHORUS data. Thanks, Judy. So, as I've already mentioned, we're at a much smaller scale
than the University of Florida. We only have 1,383 publications in our CHORUS dashboard, and because of that we were able to do a project, I think probably in 2019, where we looked at each and every publication in the database, contacted the authors, and did structured interviews with them, where we tried to understand their understanding of the requirements from the funders and their decisions about what they had done in terms of self-archiving of the material. It was really useful, and you can see that our numbers were pretty good back then; they've gone down a little bit. We had a pretty good understanding for a while of what our faculty were doing. It was a useful exercise, though not something we can do in the long term, and certainly not possible as a regular practice, even at an institution as small as ours. But doing it once was really helpful for us. Unfortunately, we didn't ask any questions at that time about data sets, and you can see that our data set number is even lower than the University of Florida's, at 2.7%. It would have been useful, I think, for us to try to ask those questions of our authors, and maybe we'll go in and do that as a next step. Clearly, even among those who've managed to publish an open access article on a publisher site, they're not necessarily following through on compliance with the data sets. So I'm going to go to the next slide, and I'm going to ask this question, and I think Judy will
start us off: how do our libraries work with the Office of Research and other campus partners on data initiatives? This was an area where we had really been looking forward to having a conversation with all of you, to share what we're doing but also to better understand what you're doing, and unfortunately this format isn't going to allow us to do that. But we do hope some of you will contact us and find ways to communicate with one another about this question. The University of Florida's main partners are the Office of Research, the CIO's office through the research computing function, and all of our contacts with the associate deans for research in our 16 colleges. Obviously there are individual contacts with many researchers, but in terms of those corporate identities within the university, those are the big three. And the Smathers Libraries have some pretty significant roles in that process. We are the official site for guidance on funder requirements on behalf of the Office of Research, and have been from the very earliest days. So we maintain information for all of the funders with which our university has grants and contracts, but also, to the extent we can, for others, because there's an anticipation that at some point there might be a contract with those funders, or for purposes of comparison. We do have a significant program of liaison librarians and subject specialists, and they're all trained to assist authors with their deposit and compliance, but also to know how to refer them back to the expert when that's needed. Often what we find is that a researcher will be funded over and over again from a single place, from NIH or NSF or whatever it might be, and then when they get a grant or a contract from another funder, they fumble a little bit because they just don't automatically know what to do, and that's been a major role that we have had. We're also a founder of and an active participant in the UF Research Computing Advisory Committee. It's a 30-member group of which three members are from the library: our Associate Dean for Research and Health Science, our Data Management Librarian, and our Senior Director for Strategic Initiatives. They have an active role in doing some of the things we've already talked about: setting the policies, prioritizing the allocation of resources, those types of decisions. And so it's been really important that we've had people at the table from the beginning, that we helped establish the group, and that we have maintained a close relationship with it. I also mentioned above that we have a Data Management Librarian. He assists researchers with the development of achievable and reasonable data management plans, and sometimes with the implementation of those plans. Before that, what we often found was that the data management plans were sketchy, which might be a nice way to say it, and often unachievable. They were sometimes unrealistic about what really could be done, and when the time came to actually comply with what had been put in the grant proposal, it was very difficult to fulfill. So we think we have very much increased the accuracy and viability of the data management plans, and the result has been that our Data Management Librarian has actually been brought in as an active participant in a number of grants by the academic faculty, including serving as the PI for one major grant that has significant data implications. He also leads the university-wide data management and curation working group, so that addresses, again, some of those other things we've talked about: we have a group that is specifically functioning to try to identify how we handle data management campus-wide and to look at the issues around curation. We have a lot of work still to do. We haven't solved these problems, but we're at least addressing them and talking about them. And as with the funder requirements, our liaison librarians and subject specialists are trained to offer basic assistance to researchers related to data management, but also to know how to refer them to the Data Management Librarian or other experts when that's needed. And again, when you saw that ratio of how vastly outnumbered we are by those faculty, it's really helpful to have a bit more of an army out there that has enough knowledge to be helpful, often resolving the question by pointing people to specific resources, but also knowing when it really needs an expert and then getting those people to our experts.
Our libraries also maintain the institutional repository, which we call IR@UF, with the little at sign, for the deposit of papers, posters, presentations, and small to moderate-sized data sets. The large to very large data sets go to what we call HiPerGator, our large computing infrastructure, which is managed by the CIO. And we participate in enhancing the metadata and the tagging and linking of those things. I've already mentioned that we have this license for CHORUS, and that does help us identify the published articles and track compliance with the deposit mandates when the articles have a compliance mandate. The scale, as you saw in the table at the beginning and in the chart, is such that we have to have automated solutions; it's just not possible to do it any other way. We do have plans to utilize the data from CHORUS to generate notifications to the associate deans for research in each college and to the authors when the deadline for the deposit is approaching and CHORUS has not yet confirmed that the deposit has been made. The fact that we can use CHORUS to determine that a lot of them are compliant already means we don't waste time chasing those down and can really focus on the ones that are, if you will, at risk, or that might need a little help to get over the finish line. That, I think, is very important to us, and so it is something that we're actively planning for. We haven't implemented it yet, but it's part of our intention of what we would do with the CHORUS data. But that comes back to that same question of how we identify that author, how we know what college they're in, and how we know which associate dean needs a report.
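As a rough sketch of what that planned early-warning routing could look like, the following assumes a hypothetical CSV export with made-up column names (author email, article DOI, college, deposit deadline, deposit-confirmed flag); it is not an actual CHORUS export format, just an illustration of the workflow described above.

```python
# A rough sketch of the planned early-warning routine: flag publications whose
# deposit deadline is approaching and whose deposit is not yet confirmed, then
# group them by college for the associate dean's report. The CSV layout and
# column names here are hypothetical, not an actual CHORUS export format.
import csv
from collections import defaultdict
from datetime import date, datetime, timedelta

WARNING_WINDOW = timedelta(days=30)

def flag_at_risk(csv_path: str) -> dict:
    at_risk = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["deposit_confirmed"].strip().lower() == "yes":
                continue  # already compliant, no need to chase
            deadline = datetime.strptime(row["deposit_deadline"], "%Y-%m-%d").date()
            if deadline - date.today() <= WARNING_WINDOW:
                at_risk[row["college"]].append({
                    "author_email": row["author_email"],
                    "article_doi": row["article_doi"],
                    "deadline": deadline,
                })
    return at_risk

if __name__ == "__main__":
    for college, items in flag_at_risk("chorus_export.csv").items():
        print(f"Report for associate dean, {college}: {len(items)} item(s) at risk")
        for item in items:
            print(f"  notify {item['author_email']} about {item['article_doi']} "
                  f"(deadline {item['deadline']})")
```

The whole routine hinges on being able to attribute each article to an author and a college reliably, which is exactly why author identifiers matter so much here.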
And that's where things like ORCID IDs and insistence on using the official university email address would make a big difference in the efficacy of our work and the ease of automating it. And I think, my colleagues, we're each going to talk a little bit about this. Michael, I think you were going to go next and talk about how you are working with the Office of Research and other campus partners at Denver. Thanks, Judy. I will say that we do not have nearly as elaborate a process, nor as formal a process. We are kind of muddling along when it comes to data. We do have a close working relationship with the Office of Research and with IT on managing faculty publications, but we haven't managed to figure out how to do this with data. Probably a decade ago, we pulled together a group from the libraries, from Institutional Research, from the Office of Research, and from campus IT to talk through how we might put together a process and some policies around data management. And we never got the policies in place. We got a process in place, and that process, I was sort of embarrassed to discover the other day, still lives on the Office of Research website, even though everybody involved in that conversation has since left the university and we don't actually have that process in place. So with a different series of events, I might have been telling you we have this great process and these wonderful policies. But instead, we are kind of back where we were a decade ago, knowing that we need to do something and not actually having anything formal in place. That said, we do work informally with these partners, and our institutional repository is the home, just like at the University of Florida, for small and medium-sized data sets. These get deposited both because of mandates and, in many cases, because of faculty interest in sharing their data; some of it is not about a mandate at all. We share what we can in an institutional repository that is not necessarily designed for data sets, so it's not the ideal place for some of these, but we do what we can. And then on an ad hoc basis, for particular requests and particular projects, we do get to help with metadata and with other things around data. And finally, the campus's instance of the DMPTool is run through the library, and so we do help with data management plans and with guidance on that part of the process. And I guess that is the one place where the library really does have a defined role. So I'll turn it over to Rick to hear what BYU is doing.
My answer is going to be the briefest and the least impressive of the three, which doesn't reflect badly on BYU; it's partly because I'm relatively new in my position and I'm still learning about the various relationships that we have on campus. I can report that we have a very close and constructive relationship, more so than I've seen at other institutions where I've worked, between the library and the Office of the Vice President of Research. And we, like Denver and Florida, host in our institutional repository data sets that are submitted to us by faculty. We do not have a lot of data in our repository. As of this morning we're hosting 84 data sets, or, I thought it was 84, let me double-check: sorry, 34 data sets in our collection, which collectively represent just under nine gigabytes of data. So we are not hosting any big data sets, and what concerns me a little bit is that I don't believe for a second that there aren't any big data sets being produced on our campus. And I worry a little bit that, in particular, some of those big data sets may not be being managed in a way that's compliant with funder requirements or policy. So this is an area for further exploration for our library. Right now we have a very good scholarly communication and copyright librarian who manages our institutional repository and does an excellent job, but we just don't have a mandate on campus to be going out and seeking data sets. I know that our Office of Research greatly appreciates the fact that we host data sets in our repository.
But the fact that we have so few of them, and that they are so small, suggests to me that there are opportunities for us that we need to take advantage of. And I think it falls to me to be the first one to answer this next question, and again my answer is going to be fairly brief and fairly unimpressive. We are not doing much in our library to actively track publication or deposit of either OA articles or research data. We have a history of actively soliciting deposit of articles from faculty authors; in fact, we solicit their CVs, and then we have staff who will go through the CVs, identify articles that are eligible for deposit in our IR, and then go out and put them there. But as is the case at other institutions where I've worked, and I know this is true of many of my colleagues' experience, it can be awfully hard to get faculty interested in doing even the minimal amount of work necessary to just send us their CV and get their articles into the IR. Many faculty don't feel like it really solves a problem for them. So while we are making efforts to track publication and deposit, we are not doing so in any kind of aggressive or certainly policy-based manner. There's no campus policy at Brigham Young requiring, or even necessarily encouraging, faculty to deposit their papers. So that's where we are at BYU.
We're, I think, in a similar state: we don't have any policies around deposit of things in the institutional repository. We do have an active role in the library in working with the Office of Institutional Research to ensure that all of our faculty publications are represented in a central database. In our case it's Activity Insight from Digital Measures, but every institution has a database of that sort. And we have an employee, actually a couple of employees, whose job it is to make sure that the publications in there are accurately represented. And to the extent that they can identify an open access version of an article while they're doing that work, or identify that it is possible to deposit an open version of that article, they work with faculty to get those into the repository. So it's sort of a stealth process, but it's absolutely not a mandate or a formal process of any sort. We do work with CHORUS to identify open access articles in that fairly limited data set. But again, we don't do anything in this area with research data: all of our work in terms of tracking publication and deposit is about the publications, not about the data sets themselves. So, Judy.
As I noted, we're active with the CIO's staff and with researchers in the development of data management plans and the management of research data. We can track publications by UF authors, especially the open access articles, through CHORUS, through Web of Science, and through reports from our transformative and read-and-publish agreements. We have a transformative agreement with Elsevier, which provides a discount, not a waiver, of APCs, and we have been actively doing research with those authors to better understand why they choose open access, when and why they don't, and the effect of the APCs on them. We've actually been sharing that research data with Elsevier, and it has helped us have a better understanding of the barriers; price is a heavy barrier, but it's certainly not the only one in those choices. At UF, we don't have a strong culture of deposit, and the libraries do not actively collect open access articles or author manuscripts, except for the number of years that we had an open access publishing fund, when we did maintain those articles in the institutional repository. But again, the IR is there; it accepts deposits, providing stable links that allow authors to cite their articles and data, and we do actively promote it, while not actively pursuing the authors to try to get them to deposit their articles or their data.
So, looking at what research tools and resources we're using to manage this, which would, if we were managing it fully, be a massive flow of information at every one of our campuses: we've already talked about the fact that we're using CHORUS; we use Web of Science; we have Academic Analytics to track publications by our authors; and we're negotiating licenses for Scopus and SciVal to expand our tools. I think we're a little embarrassed that, for a research institution of our size, we haven't had those tools, but we're in the process of getting them. I mentioned before that we have plans to develop a system to route information from CHORUS to the associate deans for research and to the authors, especially as an early warning system when the deposit deadline is nearing. But we're also looking at and observing a number of the other tools that we've referenced, and some that weren't even on our slide, to see what more we could or should be doing. And there is some internal development, particularly the collaborations between the CIO and the Office of Research to help them with that management, but it's not directly a responsibility of the library. And so I think that passes it back to Rick.
Yeah, and here again the answer is that we are not really doing much to manage this massive flow of information, so we're not really using any particular tools for that, particularly as regards data set management. We subscribe to Web of Science and we subscribe to Scopus, but these are used primarily by our patrons as research tools, rather than by library staff for the purpose of tracking and monitoring research output on our campus.
And so, I mentioned that we've been working on this project to clean up metadata in Activity Insight about our faculty publications, and we use Web of Science, we use Esploro, we use the Digital Measures tools, and we use the CHORUS dashboard. We're pulling a lot of information in about publications, and yet, sounding sort of like a broken record, we aren't doing anything in this regard about data. We are really not paying attention in the way that we probably should be to research data.
So I'm going to ask our last set of questions now, which we'll attempt to answer together: how are libraries addressing the challenges of understanding multiple funders' requirements, identifying the research associated with the institution and supported by those funders, and then communicating that work to the funders and researchers? And I'll say again, I think we're actually doing a pretty good job, or a relatively good job, as an institution in terms of published material. We're tracking funder requirements internally, and I don't know how useful this is, but we have a LibGuide that's kept up to date about funder requirements that people can reference. We are using the project that we've been doing to identify faculty publications to try to simultaneously understand whether those publications are associated with funders, and of course the CHORUS dashboard is helping us in that regard. But we're not doing this, and really need to be doing this, with data. We're not tracking data sets in any meaningful way. The work that we've done so far as a library is, basically, if somebody has come to us, we take it on; but if nobody comes to us, we're completely unaware of the data set's existence and of any need to help. So, Judy. Thank you. At the Smathers Libraries, we're documenting the requirements of the funders, and by maintaining that documentation, we've been able to develop an understanding of the common and the uncommon requirements and to assist our authors with compliance, especially when they receive a grant from a new or infrequently used funder. That's, I think, where the greatest vulnerability for them occurs. As noted, we're using a variety of tools to identify the research publications and, to some extent, the data from UF authors. The Office of Research does maintain a grant database, which we use to cross-check a lot of our information, and we do have an ORCID membership, which allows us to actively encourage the use of ORCID IDs, starting with the graduate students when they first arrive. An ORCID number is required to deposit a thesis or dissertation, so we're definitely training the next generation, and that's a very important role for the future. We do not directly report to the funders, but we can run reports for the researchers, their departments and labs, or their colleges, and because of the way we make a lot of these resources available to them, and the training we provide, they have access to do their own searches as well. As Michael said, what we're really lacking is the same type of attention to data that has been given to publications, and I think we're just at the tip of the iceberg on what's
really involved with the data. And the answer from my institution is very much like the answer from the University of Denver. We have an excellent and unusually large cohort of subject librarians here who are very actively engaged with the professorial faculty on campus, and who work with them on a variety of issues and tasks related to securing funding and finding publication venues, and then we encourage them to deposit their work. And of course, to the degree that we have the opportunity to talk with them about data management, we do. But apart from that more general subject outreach from the library, we don't have any kind of dedicated outreach program around funder policy compliance or data management, or data set management as such. So, as I said in response to the previous question, there are some real opportunities for us here if we're willing to redirect some resources and take advantage of them. I guess my closing statement will simply be that preparing for this presentation along with Michael and Judy has been a real learning experience for me. I always knew, sort of vaguely, that we had some opportunities, as I just said, some opportunities for service growth in the library around the area of data management; I understand that much more acutely now than I did, and I think that's both exciting and potentially daunting for us. I do believe that research data management represents a truly significant opportunity for libraries to become mission critical at their institutions in ways that we have not been in the past, and I think that's particularly important as we see what are in some cases formerly mission critical functions becoming, frankly, less and less central to the value proposition that we offer. And I think that's my concluding statement. Okay, I'll go next. The Smathers Libraries are serving a population of over 53,000
students. A third of them are in graduate and professional degree programs, but many of the undergraduates are engaged in research, so in terms of the population of people doing research, the students add significantly to the numbers that we cited for our faculty. The libraries actually facilitate the publication of the open access Journal of Undergraduate Research here at UF, and until recently the editor was a member of the library faculty; she recently took a job working in the Office of Research. As noted in the table at the beginning, the libraries are also serving over 5,200 faculty who are publishing approximately 8,400 articles a year. As a result, our 111 librarians and professional staff, not all of whom are public service staff (many of these are behind-the-scenes people), are vastly outnumbered and stretched very thin. So we constantly search for, or seek to develop, automated tools to enable us to meet the demand for our services as efficiently as possible. We try to anticipate and prepare for the needs of our researchers and authors, and the new challenge, of course, is data management, preservation, and access. Funder requirements for deposit of journal articles are relatively easy by comparison. We're watching the evolution of the requirements for data deposit with some trepidation, recognizing that the scale is significantly larger, the data fields and formats are variable and often complex, and the policy issues around the data are even more complex. As in that quote from Springer Nature that we used at the beginning of the program, for researchers to cite papers is second nature; no one thinks twice about it. We need to get them to think about data in the same way. I think that's sort of our new mantra; it is a part of the mission that we're beginning to undertake, joining with many others on the campus who are increasingly aware of the need for this. I think librarians are well equipped to be effective partners in the implementation effort, but it is, and will continue to be, a challenge. And while some funding is included in grants, it's largely a very expensive, unfunded mandate. As librarians, we relish that challenge, and we will do our part to ensure that the research data is FAIR: findable, accessible, interoperable, and reusable.
And with that, I'll pass it to Michael. All right. Thanks, Judy, and thanks, Rick. Like Rick, I think I learned a lot in putting together this presentation. I knew we were behind where we should be in managing data. I've actually been advocating for a position, and for a greater role for the library and a greater connection with the Office of Research and with IT in this regard. It does feel like something that we need to be taking on as a profession. Academic libraries have always been the place where the campus's research needs, from a published perspective, have been met. We've taken on primary sources in recent years in new ways, and as the material that is needed for research changes, libraries need to keep up with that change. And it feels like now is the time, funder mandates aside, that we should be getting involved in research data management. As Judy said, it's a necessity for our campuses and a critical role that the institution needs to take on. If the library isn't involved in this process, it feels like a lost opportunity, especially considering that, as Rick pointed out, much of the work of managing research data is work that libraries have always done: it's preservation, it's access, it's retrieval, it's description. We should be part of those things, certainly with campus partners, but it's something that we absolutely have to be doing. So with that, I just want to apologize again that we can't be having this as a conversation with you. We really had hoped that we would not be talking at you, but rather talking with you and learning from you, in a conversation somewhere in Amsterdam.
We do hope that you will reach out to us, and we're leaving our contact information here. Please do reach out with questions or comments, and we'd be happy to get back to you. And we hope that we can have a follow-up conversation next year at the Grey Literature Conference about
research data.