Is Open Science FAIR for Stakeholders?

Formal Metadata

Title
Is Open Science FAIR for Stakeholders?
Title of Series
Number of Parts
18
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Transcript: English (auto-generated)
So I'll start off with a definition of open science, then FAIR, and then stakeholders. And here's a little bit of background information on the University of Florida, established in 1853. It's a Carnegie Classification R1 Doctoral University. We have over 54,000 students and over 4,000 faculty.
It's located in Gainesville, Florida, USA, and it's the leader in the State University System of Florida. So here's the table of contents. Once again, I'm just going to introduce one definition of open science, FAIR, and then stakeholders, and then lead into a case study
of the National Institutes of Health, National Institute of Environmental Health Sciences, P42 Data Management and Analysis Core. So basically, we had a PI from the Center for Environmental and Human Toxicology go for this particular program solicitation,
and the new program solicitation required a Data Management and Analysis Core, which had not been required before. Then I'll also talk about the proposal for the development of a Data Management and Analysis Core, and then developing sociotechnical data management collaborations with stakeholders. So here's a definition of open science from FOSTER.
And just focusing on that definition, quite a few researchers at UF are dealing with the challenge of electronic lab notebooks and which would be the best one. Actually, LabArchives has been courting the University of Florida for over a year. We haven't really decided to go with LabArchives.
We're also interested in SciNote. So right now, we're trying to demo various electronic lab notebooks that can satisfy the needs of a large portion of the high-producing researchers at UF. As far as the FAIR principles, pretty much everyone is familiar with the 15 principles. These are just the first element
of each of the four groups: findable, accessible, interoperable, and reusable. Initially, prior to this presentation, I had developed a survey, which included the open science definition and also the FAIR principles. I shared it with the lead PI of the grant-funded project I'm going to talk about.
And he said, what does that mean? So he looked at open science. Even though he read the definition, he asked, what are some examples? And then with the FAIR principles, I listed all 15 principles. He said, that makes no sense to me or any of the researchers. But I'll go into the details.
And basically, they have not been introduced to these FAIR principles. So there is a data literacy gap. The next two slides will talk about a previous study. Stakeholders: this is key. And this should also include students and staff. Particularly at UF, there are three case studies that led
to a proposal that was accepted yesterday, where you had graduate students who were not able to access HiPerGator, which is the research high-performance computing system at UF, because it requires an investment and is mainly for researchers. And then we also had a doctoral student who came to my office who needed to store three terabytes of data
which exceeded the capacity of the library. And then we had another student in Marston Science Library. So I'd like to acknowledge Valrie Minson, who's the chair of Marston Science Library, who shared with me that there was one student in the lab who was doing analysis and used up a whole bank of computers, so other students came in
and didn't have access to a computer. So it disrupted accessibility. All three of those cases were students who needed access to high-performance computing but did not have the resources, so they were making use of the library's resources. So a proposal was developed. It was approved by the Dean of the Libraries,
and also by the Vice President and CIO of Information Technology. And that was approved yesterday, to provide resources so that students are able to access research computing. So this is key. All of this underscores open science. So look, once again, at open science: how are researchers going to collaborate and contribute,
where is the research data, the raw data and derived data, going to be stored, and in what type of repository: a discipline-specific repository, an institutional repository, or a general repository? So all of these, even the methods that are shared, reproducibility, what type of templates,
what type of scientific workflows, all of these challenges are embedded within our open science definition. That also includes the whole data curation lifecycle model, and embedded within that is also the Open Archival Information System (OAIS). So with open science, I can see most researchers asking, what does that mean? But to us in this field,
we actually know what's embedded, and that it rests on models developed over a period of time. So this is from BRAA. This was a study done early this year, and it's in Nature Index. And so 15% of the scientists who were surveyed are familiar with FAIR.
So as you can see, green is "I'm familiar with the FAIR principles," yellow is "I have previously heard of them," and gray is "never." And this actually coincides with what I saw from researchers at UF, particularly when I introduced the FAIR principles, and also open science, as part of a survey.
And as you can see, across the disciplines, quite a few scientists aren't even aware of the FAIR principles. So that comes down to data literacy: how do we bridge that gap? You need data management librarians and collaborators to help bridge that gap, as far as information literacy and spreading the word about what the FAIR principles are and how they could benefit our researchers.
And this is another graph from the same study, and it's basically about compliance with the FAIR principles. So only one third of those that are familiar, and then it shows green for very much, yellow for somewhat, and then neutral. So once again, this study just underscores that even though the FAIR principles have been around for about three or four years in the mainstream,
there's still a lot of information literacy work to do as far as getting to the researchers to explain what they actually are, and the metadata. In a few of my presentations, scientists have stopped me in the middle of the presentation to ask, what is metadata? So basically I'm just starting the presentations off
with brief definitions to kind of frame it, not an absolute definition but a frame to give them context, and also mapping it to their own language. So rather than saying "metadata," say "information describing your genomics data." So just working with the scientists to better harmonize some of the language that is familiar within this profession
but not outside it, in other disciplines. So I spent a little bit of time on this. This was sent to me by the vice president of research. So this was the new core. Any institution going for a Superfund site grant has to have this new core. Currently there are 21 funded Superfund sites under NIH.
So when this came in, it went to the vice president of research, who then sent it to the library. I looked at this and my first thought was, this is huge. Even the PI, looking at this, had no idea where to start.
So what I did was contact two PIs at currently funded Superfund sites. One PI was at Northeastern, which has a very solid data management and modeling core. He was willing to speak with us, including our PI, via Zoom, which was very helpful. And then another PI, for a Superfund site in New Mexico,
also agreed to speak with us to help us better understand how they were managing their data. I sent emails out to other PIs. They responded, it wasn't a requirement in the past, so we really hadn't thought about it. And then also, prior to this, I sent an email.
Well, I contacted NIH and sent the draft survey asking for input, and the response was, we do not sanction or support such a survey; we'd have to go through OMB, and we do not have any plans to do so. So basically I was contacting NIH to give me some pointers to help
so I could assess some of these currently funded Superfund sites. So maybe my approach was wrong. I'll put it like that. The abstract, which is in the program, is solid, but my approach, the way I went about it, needs to be worked on. So the point here is that the Superfund site includes four cores,
where basically BMR stands for Environmental Core Research and ESE is Environmental Science and Engineering projects. And so with this project, there was an administrative core, a research translation core, a data management and analysis core, and then also outreach.
So as you can see from this new core requirement, basically you have to support data management and integration of data access across the center, irrespective of data size. So it could be 30 gigabytes or even 20 terabytes. So how do you plan to do that? Then also the DMAC, the Data Management and Analysis Core, does not have a set budget. So a lot of researchers now are having to include
within their proposal budget the funding for data management. Most of those I have interacted with are doing this after the fact. I was added on to a grant last year and then was taken off this year. They said they were going to handle it internally. And so there's a challenge of the need to set aside a budget upfront
as far as how to store the data and to support open science. The diagram to the left was developed by the lead PI, Dr. Priscilla Volpe, from the Center for Environmental and Human Toxicology.
We're going to focus on this Data Management and Analysis Core. The diagram to the right was developed by the library together with Dr. William Barbezov, who's a Director of Informatics in the Interdisciplinary Center for Biotechnology Research. So this middle area is modeled after the USGS,
United States Geological Survey, Science Data Lifecycle Model. And then, once again, this is restricted data. So this is specific to UF. We have a secure research environment for researchers who have restricted data. They can use that, but they have to invest in it.
The cloud storage is HiPerGator at the University of Florida. And then it also explains the various data repositories and storage, the curation process, and redundancy and backup. This is modeled on Northeastern's PROTECT center, and they have a data management and modeling core, which is very solid and actually predates the 2018 data management requirement.
So it actually is an exemplar. Out of the 21 currently funded Superfund sites, it is, in my opinion, an exemplar. And then the University of New Mexico: the PI, Dr. Christopher Volpe, asked the PI at the University of New Mexico,
well, how are you supporting the Superfund data management at your institution? And he responded that they're heavily leveraged. So even though they're funded by NIH, they're heavily leveraged, in that they have to get funding from other sources to help support the management of data funded by this one particular agency.
So there are a lot of challenges as far as software development and engineering resources to support the data management. This was introduced last year. And actually, this diagram serves
as a sociotechnical framework. This has been around for decades. There's a lot of literature on sociotechnical systems thinking, but I'd like to introduce it for data management, and it can also apply to data science. The three that are circled are what we're currently facing
in dealing with data management right now at the university level at the University of Florida. So of course you have your stakeholders. We have the vice president of research, the director of research computing, the director of clinical and translational science, information technology, and also the dean of the libraries,
all in agreement. And then there's the regulatory framework, as far as the new requirement from NIH and other funding agencies, and also the financial considerations. And so this is important here as far as who's going to support data management at UF. Initially the Office of Research is going to provide some initial funding, but then we have to come up with a sustainability model
over the long term between Research Computing, Clinical and Translational Science, and the George A. Smathers Libraries, for how we plan to support data management across UF. So the top four that are in bold are the key organizers of this proposal, which is right now in the Office of Research.
These bottom two are just the clients, the first clients, from this NIH proposal, which was not funded. So even though it was not funded, it led to the proposal for data management at the university scale, where the vice president of research sent out an invitation
specifically asking people to come. He set up a meeting and invited us to come up with a proposal. So this is where we are now. The first meeting was with the VP and assistant vice president of research, initiating a data management planning proposal and discussing a plan. That was December 12th, 2018.
And that NIH proposal was submitted December 18th. So this actually predates that proposal by about a week. The proposal was developed September 10th and sent to the VP after being vetted by key stakeholders. And currently we have a meeting set for October 31st.
This will be the second meeting with the vice president and assistant vice president of research to discuss next steps. So our proposal includes our key stakeholders, but also coming up with a strategy for how to support various funded projects at UF, from multimillion-dollar projects that have enough funding
to pay for data management through clinical and translational science, information technology, which is very expensive, to the smaller projects that might not have a big budget for data management. So we're coming up with a proposal that satisfies both the high-end research projects and the low end as well,
and that's under development now. I say that because all of this is necessary in order to support the underlying infrastructure for open science, including what type of collaboration tools you plan to use. You have some researchers who are definitely using open source, such as Jupyter notebooks,
then you have some who are using LabArchives. One issue that came up: we were trying to push for a university-wide license for GitHub, the version control software, since quite a few researchers are using GitHub. Apparently we do not have a university-wide license. At the time when I submitted the proposal, it was 30,000.
That was before it was taken over by Microsoft. Now the enterprise edition of GitHub is 50,000. So it's increasing. The University of Minnesota has a GitHub license. And so we really couldn't push forward, because at the time some departments said it's not a priority. So one of the challenges is working with some of these stakeholders to identify,
the library is sometimes an initiator and a leader on some of these challenges, but we're not a big enough player to move it forward. So it's a constant development, but right now I would say I'm very positive about the libraries being considered a partner with the VP in the Office of Research, Research Computing,
and the Clinical and Translational Science Institute. So I'd like to acknowledge, once again, all of those who contributed background to this presentation and to my being here. And here are the references. And that's all I have.