
Research data storage and sharing: a UQ journey #8 - 13 June 2018


Formal Metadata

Title
Research data storage and sharing: a UQ journey #8 - 13 June 2018
Alternative Title
Research Data Storage and sharing made easy with the UQ Research Data Manager
Number of Parts
20
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The University of Queensland has developed the UQ Research Data Manager (UQRDM) to provide researchers and higher degree by research students with a means to practise good stewardship of research data. The UQRDM also facilitates collaboration among researchers, both within UQ and with other Australian and overseas institutions. External partners can access their project's research data easily, using their own institutional credentials. In addition to research data storage, the system provides automatically generated and downloadable data management plans that researchers can use in grant, ethics and industry funding applications. The system will also close the loop between working data and managed data collections.
Transcript: English(auto-generated)
Welcome, everybody, to this ANDS, Nectar and RDS webinar. My name is Susanna Sabine and I will be your host for today. In a moment, once they turn their cameras on, I will be joined by Sandrine and Andrew from the University of Queensland. Oh, there they are. Hi guys,
welcome. They will be talking to us today about research data storage and sharing made easy with the University of Queensland Research Data Manager. Talking today are Sandrine, who's the project lead on this amazing research data management initiative, and Dr Andrew Janke, who's
the technical lead for the project. So I'll hand it over to you, Sandrine. Thank you. So yes, we're both going to do this today. The first thing is to give you an introduction to UQ for those who aren't familiar with it. UQ is a university of about 2,700 academics. We typically have about 5,000 HDRs at any point in time, so
PhD and Masters students. This project was not just about building a new system; it was about changing how UQ thinks about research data. It's about how we manage research data long term. It's how we meet our obligations to publishers and to funders. It's all the questions
around how we store research data for 40 years, for 25, for 100 years. These are questions that are often hard to think about, and while we're not going to say we solved everything with this project, at least we could build a system that looked towards some of those problems. When we started, the project was an
initiative out of the Office of the Deputy Vice-Chancellor (Research), and it started off as a project around human data, because human data was hard to store. Researchers are under a lot of obligations around human data and how it is managed long term, but with the system we developed we noticed that
perhaps we could try to roll it out to the whole of UQ for all types of research data. When we talked to academics before we started and piloted the system, what you see there are a lot of the problems they were having, and these were pretty consistent across the whole university. Multiple people were entering multiple things into multiple systems,
and that was seen as a waste of time: a loss of research time. In a lot of these systems there was no reward for researchers entering all that metadata and information. Collaboration is always hard, especially within the restrictions that are placed upon
you by some contracts and funding arrangements within the university. We had students leaving with data on laptops whom you couldn't contact at the end of their PhDs: all the things that happen at most universities. So it was about building a system to help researchers with their problems, but in doing so also building a system that happened to meet the
university's obligations around research data and having good collections of data attached to publications. So here we are, essentially playing Santa Claus, because we're being asked for the moon. We might not be able to deliver it. We might, but we've got to try. The way we
approached it was, instead of asking for funding to solve all the problems in the world including the moon, we decided to stage this as a series of projects, which have now run over the past two and a half years within UQ, in manageable chunks. The first part
was around building a metadata system, which was like a DMP but a whole lot more minimal. It was about deciding which pieces of information were absolutely business-critical, and nothing more, that we needed to know
about researchers' projects. Part of that was looking at existing university systems and recognizing that these systems were inherently researcher-focused: the existing library catalog systems and the publication systems were centered around the researcher as an individual
rather than around projects, so we had to build a new system around projects alongside the existing library systems. The second project was about how we store research data and make it available in all the places researchers expect. This meant talking to researchers and getting them to
tell us how they currently work with their data. The third project is where it starts to get a bit interesting: it's about how we attach working data storage to outputs, that is, how we identify a group of data within a working project that led to a publication.
And number four is about closing the loop: how we roll this out to the whole of the university. It's not a system we thought we could just launch with an email and a bit of fanfare, saying here's a new system, you should use it. We didn't think that would work well, so we had a whole
strategy around that, which we'll go into today. In a bit more detail, there were some specific things we were going for in the first project. We had the idea that it had to be a "no black holes" approach. To expand on what I mean by black holes: a
researcher might enter ethics information into multiple systems within the university. They would enter their ethics number alongside their grants, they would enter an ethics number for the HREC, and they would make sure they have ethics approval recorded locally within their centres, so they were entering information into multiple systems. Even for simple things,
having to enter the same information again is what turns off our users, so we had to remove that type of system. We also had to know what was essential. For us, what was essential was the type of project, meaning what sort of data they had; the collaborators they wanted to
include on that project, national and international; and the unit that led the project. Even the unit that leads it is not a simple thing, because people get into the quandary of "I'm administered by this unit in my university, but I have a joint position with this other unit." So the message we always give is: it's not about you, it's about
the project. Then the story becomes a lot simpler: typically everybody can identify the research unit that leads a project, rather than themselves. And we don't ask them how they would like to store their data, how long they need to back it up for, or the types
of security. Instead, we make those decisions, because the Australian Privacy Act and the ARC rules are a constantly changing landscape; we ask about the type of data, and from that we make a decision. On the second project, in a bit more depth: the feedback we got was that a lot of people were using
Dropbox, a lot of people were using Google Drive, a lot of people were using AARNet CloudStor, and some people used the university file-sharing systems, which at the time at UQ were faculty-based: every faculty had its own share, and they couldn't share between faculties. That meant the system we developed had to be as good as all those systems, because if it
wasn't, there wasn't going to be good uptake. So the system we built is what we tend to call a single view of storage. We set up a single drive letter across the whole university. For UQ this is new; for a lot of universities it isn't, so we understand this might seem pretty simple, but for a researcher at UQ the fact that someone can see the same
drive as someone in another school was new. It was also important to have a cloud-based system, and we synchronize the view between those two systems, so something you put into the R drive, as we call it, will appear in the cloud system, and something other researchers put into the cloud system for a project will appear in the R drive
as well. For some researchers, we also mount the same storage on HPC facilities, because that's where they need the data and they don't want to be copying it. We also have a synchronization client, much the same as OneDrive, Dropbox, CloudStor or any of these systems, which means that a researcher can choose to synchronize what they want from their projects to
their own devices: their laptops, their iPads, their tablets, whatever. This is an interesting use case, in that researchers do things like going out and recording interviews with people in the field on their iPhone. We can now support that: they no longer have to plug their iPhone into their laptop; everything should just work. That's what
we're aiming for. The second part is that it has to work on any platform within UQ. This is important because in existing systems there were second-class citizens: the Mac users and the Linux users in the university. We took the view from the start of this project that this system has to
work equally well across all the major platforms our users were using. It had to be easy within Australia, which meant there's one option, the Australian Access Federation. For international users, the other way to approach this is that you have to make individual usernames for your
international collaborators, and that's not great, because the experience for international and national users should be as seamless as for those within UQ. When you're building these systems, you have to think about building them not just for your own users; from the start, we always thought about building this system for the external user as well,
which meant they had to use their existing university credentials; otherwise we've already lost the game. This means we need to rely upon something called the eduPersonPrincipalName, the EPPN. This wasn't supported in the AAF when we first started; it now is, and it's now a required attribute,
which meant we had to do a lot of legwork talking to the universities that didn't support that attribute yet, putting it on their radar and explaining why we thought it was important. It's about getting a long-term persistent ID for external researchers, and we can't rely upon email for that, because email addresses are often recycled within a university: there are only so many Hien Nguyens you can have at a university before an email address is reused, which would give a new Hien Nguyen access to a data set they shouldn't have access to. So we had to use persistent identifiers. For example, we now capture ORCID wherever we can, and that may be the long-term view,
but at the moment the adoption of ORCID within Australia is not huge, so we can't rely upon it yet. Industry partners are hard. We had a couple of thoughts at the start of the project about how we'd handle them. We of course support out-of-the-box sharing via email, so you can share a one-time link
to a subfolder or an entire folder with an industry partner; they get an auto-generated password, so it's much the same workflow as Dropbox. We also knew that we wanted to get a persistent ID for industry partners where possible, to solve the 40-year problem. Our thoughts
around this went to LinkedIn, and we get what I would call very bimodal feedback on this: some people like it, some people hate it. The message we go out with is that we've made LinkedIn available if you choose to use it. We certainly haven't gone out with the message that you must integrate your industry partners with LinkedIn. The
feedback we get from some of our industry partners within UQ is, again, that some like it and some don't, so it always has to be about providing options. The third deliverable was about DMPs. Our researchers often needed them for grants and various other things, and for a lot of people these things are
confusing. Funders often ask for them, ethics often asks for them, and what often happens is that a researcher enters a bit of throwaway text canvassed from friends, based on what someone else put in an application, which doesn't really mean much. They're not invested in the process, which we'd like them to be when it comes to data management planning, even if it's very soft touch.
So from the start we ask some questions. These are optional, and if the researcher fills out these tick boxes, from that information we can start to build a picture of how they're managing their metadata
within the project. They can fill in as much of this information as they need, want or see fit; we encourage them to fill out as much as possible, but none of it is mandatory. We also don't present these questions at the start; we only present them when they go back into the RDM system, so it's not overload. When they first fill out a record, they see only a very small amount of information requested; when they go back to update
their record, they're encouraged to add this extra information. These are the sorts of questions we ask. They have been structured largely upon existing DMP sections, but we try to make things as drop-down and as easily fillable as possible. This means that when they go back to their
records, we provide easily downloadable data management plans, because we now have all the information we know about the project, and we build a data management plan as detailed as the information they provide. So if the feedback from their funder is that they need information about
intellectual property, then we ask them to fill out that section, tick and cross the boxes, and regenerate the PDF, and a data management plan is generated for them that is compliant with the various funding requirements and the paper requirements, as much as we can possibly make it. Then they just copy and paste that into the grant application. It was also about how we need
to provide an easy way for them to do things that are currently hard for them, and one of the things that's hard for researchers is backup, if they have to manage it themselves. We're also starting to think about long-term curation of data at UQ. It's not easy for most researchers: the question I lead with is, how long do you have to keep your
data for? I often get blank looks for that. I'll start the presentation by asking who is ARC funded, and then who knows how long they have to keep the data. The answer is five; often we get seven, because that's the science answer. So it's about doing the education, but also
making them aware that we're not requiring them to know this information. If they give us the minimal amount of information about their project, the grant and the type of grant, then from the application ID the RDM system can make the decision about how long to keep that data for them, and
it also means we can provide them a system where we can guarantee that working data will exist 25 years from now, because the data is tied to the project. There's an identified group of CIs, by their persistent ID wherever possible, and if they've left, we go to the unit that led that project, which means there's a role that
someone has identified as looking after that project long term. So maybe we've solved all the world's problems, maybe we haven't. We can now tell you what the experience has been so far, based upon the system we've developed and the rollout process to date.
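Andrew's two points, that the system rather than the researcher decides how long data is kept, and that a generated plan is only as detailed as the optional answers supplied, can be illustrated with a short sketch. It is a hypothetical model, not the UQRDM implementation: `ProjectRecord`, `retention_years` and `generate_dmp` are illustrative names, only the ARC figure of five years comes from the talk, and the 25-year fallback is a placeholder borrowed loosely from the working-data guarantee mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical retention table. Only the ARC value (5 years) comes from the talk;
# the fallback below is a placeholder, not UQ policy.
RETENTION_YEARS: Dict[str, int] = {"ARC": 5}
DEFAULT_RETENTION_YEARS = 25


@dataclass
class ProjectRecord:
    """Minimal project record: the few fields a researcher must supply."""
    title: str
    data_type: str
    lead_unit: str
    funder: Optional[str] = None
    # Optional DMP answers keyed by section name; blank answers are skipped.
    dmp_answers: Dict[str, str] = field(default_factory=dict)


def retention_years(record: ProjectRecord) -> int:
    """Decide retention from the funder so the researcher doesn't have to."""
    return RETENTION_YEARS.get(record.funder or "", DEFAULT_RETENTION_YEARS)


def generate_dmp(record: ProjectRecord) -> str:
    """Build a plan that is only as detailed as the answers provided."""
    lines = [
        f"Data management plan: {record.title}",
        f"Lead unit: {record.lead_unit}",
        f"Data type: {record.data_type}",
        f"Minimum retention: {retention_years(record)} years",
    ]
    for section, answer in record.dmp_answers.items():
        if answer.strip():  # omit sections the researcher left empty
            lines.append(f"{section}: {answer.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    record = ProjectRecord(
        title="Field interviews",
        data_type="human",
        lead_unit="School of Example",
        funder="ARC",
        dmp_answers={"Intellectual property": "Held by UQ", "Sharing": ""},
    )
    print(generate_dmp(record))
```

Answering the intellectual property question later and calling `generate_dmp` again adds that section, which mirrors the fill-in-and-regenerate workflow Andrew describes.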
Andrew has explained that the UQRDM was built as a series of projects, so I'm going to concentrate on IDMP4, which is closing the loop: the implementation of the system at the University of Queensland. We are tasked with implementing the system across the whole university,
so that means talking to researchers in the life sciences but also to researchers in the humanities. Their needs, their perceptions and their vocabulary are different, so we need to adapt. We started the launch in January, we're now halfway through, and so far it's looking good.
It has been a swift uptake: we know that more than a thousand projects have been created on the system, and we also know that they have been set up as collaborative projects, with two or more collaborators for each project. This is really good news for us,
because it demonstrates that people are not simply dumping their own data in the system; they're actually using the platform to drive their collaborations. So who are our users? Well, we now have more than 1,200 unique IDs,
and we know that the majority are staff members. We also have a lot of students, in particular HDR (higher degree by research) students, and also about 300 external collaborators, mostly from other universities.
Andrew has briefly mentioned this, but it was not just about saying, okay, you've got a platform, go and use it. We spent a lot of time doing pre-work before the launch. It was about making sure that senior management was informed and was also taking a big part in the project and in engagement,
so the DVCR, the Deputy Vice-Chancellor (Research), is sponsoring the project, and there have been some really clear communication messages to our associate deans (research) to make sure that they were aware of the system and were supporting it. We have also done some work with ITS, working with different sections of ITS,
not only on the communication that we'd like them to relay on our behalf but also with the help desk. It's about us understanding what the support desk needed and making sure that we put in place the resources they would require
when they need to be at the front end, so that they can answer users' enquiries. The other category of people we worked with before the launch were the librarians. At UQ we've got liaison librarians: each school at the university has dedicated librarians who look after researchers and help answer their questions about research,
so we needed to know who they are, train them, and involve them in the implementation that was coming to their school. It's really a partnership with all the stakeholders. In terms of the rollout strategy, we approached it a little bit like a road show.
We spend time with each unit, whether it's a school, a faculty, a faculty centre or an institute at UQ, and we plan with them a dedicated and purposeful rollout. So what does it involve? Well, it involves quite a long engagement period.
We spend time communicating with the executive of the unit, for example the head of school and the school manager. What we do is explain why we're coming, what the platform is about, and how we would like to tackle the presentation to the whole unit. We then give the presentation to the whole unit, including students, typically in what we call week zero.
Following that, there are two weeks of dedicated support offered to the school, where we come back, educate people, spend some time with students, whatever they need us to do in order to drive uptake. Along the way we communicate with, for example, the head of school, so they know what the uptake looks like.
We open a survey, collect feedback, and reflect on the presentation and on the rollout. So that's pretty much our engagement with units, but it also involves a lot of people outside of the school, for example research administrators, who are based in faculties or schools.
They're the ones with the most local knowledge, and they're usually the first point of call for researchers. We also talk to the research partnership managers, who are the people in charge of contracts with industry partners at UQ, because research partnership managers are also the first point of contact for collaborative agreements
to share data, or to check that the contract already in place with industry partners allows the researcher to use the UQ Research Data Manager. We talk to the research integrity advisors, who are academics who can answer questions about integrity and best practice at UQ,
and it's important that we make them aware that we're talking to their units, because they will also be a point of contact when people ask questions about the UQ RDM, for example how it fits with research integrity and the code of conduct. And, last point, we're talking with the ITS relationship managers.
They are people based in ITS who are the link between schools and ITS, so they're the ones answering questions about storage when people approach them, for example. So this exercise is really about raising awareness of the platform amongst this group.
And our message is always the same with all these people. It's not just an IT platform. It's about making sure that it doesn't become a big dump of unorganized data; that's one of the messages we keep on stressing. It is for projects and not for people,
so it's not about the last 10 years of my data, but really the data of the project that I'm associated with at this point in time. It needs to be project-specific, because it's described by the metadata and organized by project. And to support this approach, we have the collaborative tool that is available to them,
so people understand better when you say: well, actually, define your project by the group of collaborators that you need to give access to the research data. As part of the rollout, we're also trying to establish a data custodian role. Going back to our metadata form,
Andrew mentioned before that we ask which school or department leads the project. That gives us a chain of command, because in each school we're setting up a data custodian, a role usually held by the head of school.
That helps us to have that chain of command, to make sure that the data is never orphaned. For example, when the library receives a request to reuse a data set and the investigators are no longer at UQ, the data custodian will be contacted to seek their recommendation. Or, for example, when people leave projects and are no longer UQ staff on the record,
it's about transferring the lead of the project to somebody else; the data custodian can either act or delegate in that case. It's also about research integrity: when concerns are raised about the integrity of a research project
and the UQ integrity office needs to investigate and potentially gain access to the research data, they can contact the data custodian. So it's really a little step towards the F and the A of FAIR data, findable and accessible, at least within UQ.
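The chain of command Sandrine describes amounts to a fallback rule: contact an investigator who is still at UQ, otherwise the data custodian of the lead unit, and go straight to the custodian for integrity matters. The sketch below is a minimal illustration under stated assumptions: the `CUSTODIANS` table, `Investigator` and `resolve_contact` are hypothetical names, and investigators are keyed by persistent identifiers (an EPPN or ORCID iD) rather than email addresses, following Andrew's earlier point about recycled email addresses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Investigator:
    persistent_id: str  # EPPN or ORCID iD, never a recyclable email address
    name: str
    active_at_uq: bool


# Hypothetical lead unit -> data custodian mapping (usually the head of school).
CUSTODIANS: Dict[str, str] = {
    "School of Example": "Head of School, School of Example",
}


def resolve_contact(
    lead_unit: str,
    investigators: List[Investigator],
    integrity_case: bool = False,
) -> str:
    """Return who should respond to a request about a project's data."""
    if not integrity_case:
        # Prefer an investigator who is still at UQ, in the listed order.
        for inv in investigators:
            if inv.active_at_uq:
                return inv.name
    # Integrity investigations, or no active investigators: fall back to the
    # custodian of the lead unit so the data is never orphaned.
    custodian = CUSTODIANS.get(lead_unit)
    if custodian is None:
        raise LookupError(f"No data custodian registered for {lead_unit!r}")
    return custodian
```

Because the lead unit is captured in the metadata form when the project is created, the lookup always has a starting point, even after every investigator has left.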
That way we know that UQ has pretty good oversight of the research data, if it needs to be found again. As part of the implementation, IDMP4, we're also instilling the message that the UQ RDM is here to stay, so we're working with other departments at the university, such as the Graduate School,
the unit that looks after our PhD students. It's about making them aware of the platform, of course, and encouraging them to tell students that it's a good tool to use. It's about talking to the office of ethics, so that if they've got questions or concerns about an ethics application, they can point the researchers to the UQ RDM platform.
It's about the office of research integrity, who need to answer questions about research integrity; again, if researchers ask them questions, they can point them to our platform. And last, we're also liaising with the Research Computing Centre,
who look after HPC facilities at UQ, to make sure that they know they can direct researchers to us as well. The other aspect we're working on at the moment is linking, so going back to IDMP3, which is about migrating data to managed collections.
so far the UQ RDM has been used for working data, and it is intended to be used that way. What we need to do is make sure that the green arrow you've got on the slide here actually goes towards outputs. So it's about a researcher selecting a subset of the data in their project
and pushing it towards datasets for which they can obtain a DOI, and linking them to our institutional repository, which is called eSpace. So again, that's another little step towards FAIR data. The other aspect of that project is archiving.
It's making sure that at the end of a project, researchers can select everything and push it to the archive. This is about compliance: making sure that UQ helps researchers to comply with legislative requirements, for example if the data needs to be kept for 10 to 25 years.
Still on linking with persistent identifiers, we have made the link with ORCID, so people can actually see the ORCID iDs of the collaborators involved in the project: persistent identifiers, to make sure that the data doesn't go astray.
ORCID is one platform, and we're also linking with RAiD, the Research Activity Identifier, which is assigned per project, whereas ORCID is per person. So what's coming for the project is further enhancement.
The UQ RDM will be used to provision digital research notebooks. By digital research notebook we mean an electronic lab notebook: the electronic version of the lab notebook that goes into the laboratory, where researchers put all their notes, signatures, etc.
So when researchers create a project record on the UQ RDM, they will just have to tick a box, and the digital research notebook will be made available to them if they choose. This is about making sure that they keep all versions of all documents, which is particularly important for compliance, for example if later down the track they want to patent,
or if they've got very strong requirements in terms of keeping all versions of all documents. The other piece that we are working towards is giving researchers feedback on their usage, and a bit more education around the UQ RDM: for example, raising awareness of how long the data needs to be kept for.
So we'd like to feed that back to them, with an interface that actually shows: well, your data will be curated for this number of years. The other piece of information we'd like to feed back to them
is the cost of storage. Not that they will have to pay for the storage, but it's about making people, industry partners and granting bodies aware of how much UQ is investing in the creation of data, and raising awareness of the in-kind contribution to any kind of project.
So it has been an interesting journey, and while the project is still going strong, we know that we've got some challenges ahead of us. The first one I will mention is data management planning. Andrew has mentioned that we've got a section on the record
that has a series of questions about DMPs. We're not quite sure how this section will be used. It's not compulsory; it's totally up to the researchers, so if they want to use it, they can. But will they? That will be quite an interesting question to revisit later down the track.
Another challenge we've got is around extra-sensitive data. We've got a good platform, but when you mention the cloud, people can be a little bit concerned, industry partners in particular. So we have more convincing to do, making sure that we adapt our system to the higher requirements, if you like.
We need to take further steps towards reproducibility, so there's more education, because it's not just about providing storage. It's about educating people on what they can put in there and what they should be putting in there. And also, when we get to archiving,
making sure that the package that is produced is actually useful for reproducibility. So we've been talking about the BagIt format, and we've been talking to the DataCrate project, to know where we're going. It will also be about keeping the flame alive. The uptake has been swift, as I was mentioning,
but we're very well aware that the system cannot be static. We've got a competitive edge at the moment at UQ because we've got that one view, a single view of research data at UQ. But it's about making sure that the system is enhanced, to keep on being perceived as one of the best platforms for UQ researchers.
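For readers unfamiliar with BagIt, mentioned above as the archive packaging format: a bag is a directory holding a `bagit.txt` declaration, the payload under `data/`, and a manifest of checksums (RFC 8493). The Python sketch below is a minimal, standard-library illustration of that layout, not UQ RDM code; `sha256_of`, `make_bag` and `verify_bag` are hypothetical names. It omits `bag-info.txt`, tag manifests, filename escaping and the check for payload files missing from the manifest. Re-running `verify_bag` on a schedule is one way to do the routine year-on-year checks Andrew mentions in the Q&A, though checksums only detect corrupted or missing files, not obsolete file formats.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large research files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_bag(source: Path, bag: Path) -> None:
    """Copy `source` into `bag/data/`, then write the BagIt declaration and payload manifest."""
    payload = bag / "data"
    shutil.copytree(source, payload)  # creates `bag/` too; fails if `bag/data` already exists
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    entries = [
        f"{sha256_of(f)}  {f.relative_to(bag).as_posix()}\n"
        for f in sorted(p for p in payload.rglob("*") if p.is_file())
    ]
    (bag / "manifest-sha256.txt").write_text("".join(entries), encoding="utf-8")

def verify_bag(bag: Path) -> list[str]:
    """Recompute every manifest checksum; return the payload paths that are missing or changed."""
    problems = []
    for line in (bag / "manifest-sha256.txt").read_text(encoding="utf-8").splitlines():
        expected, rel = line.split(maxsplit=1)
        target = bag / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

In use, `make_bag(Path("project"), Path("archive/project-bag"))` packages a project folder, and a later `verify_bag(...)` that returns an empty list means every payload file still matches its recorded checksum.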
We've also been talking to scientists and instrument managers who would like the same kind of system, but for instruments: not just working data but raw data. It will come in time; at this point in time we don't have a single solution to offer them.
So it's still in the making. While we've got all these challenges, we also know that the last five months have demonstrated that the platform is successful, and have taught us a couple of lessons. We know that we're on the right track with the engagement process: it's very time-consuming, but it's also worthwhile.
People are engaging, and engaging successfully, with the platform, and I don't think that would have been possible if we had just released it by email. We also know that we need to keep on building a unique value proposition: why is the UQ RDM better than the commercial products at UQ?
And keep on message. Because of the number of stakeholders, this is quite a challenge as well: making sure that we are all saying the same thing. And lastly, we know it's a success because it has involved the input of a lot of people:
a mix of academics, admin people, library people, recent PhD students, technical staff and also champions. Researchers have been a big part of it; without their feedback and their engagement with the platform, I don't think the project would have been that successful.
The last thing from me is about trying to build a UQ RDM community. Obviously the UQ RDM is for UQ researchers, but we have made the code available for evaluation by other universities:
I think four, within Australia and New Zealand, have had access to the code. And it's about sharing the lessons that we have encountered so far. So that's it for the presentation, and we'll be quite happy to answer your questions. Fantastic, thanks Sandrine and thanks Andrew. I've got a couple of questions that have come through.
The first one was a really early one: what is the uptake of eduGAIN internationally? Do you know? We thought it was very consistent; turns out, no.
It's actually pretty good. We don't have hard numbers on our success rate with international partners; I would estimate it at around 50-50, if not better. What's interesting is that different countries have made different decisions about how they use eduGAIN. For example, the Dutch federation, SURFnet I think it's called,
has decided that if you join a service to eduGAIN, a university in the Dutch consortium has to ask to have that service included before Dutch researchers will be able to use it, whereas in every other country every service is available. In the UK there are some sensitivities around releasing the ePPN (eduPersonPrincipalName),
around identifying information, and yet the ePPN is a required attribute of the eduGAIN federation. Given there are 2,700-plus institutions within eduGAIN, the approach we've had to take is to ask our researchers to let us know where it fails.
Where it has failed, we've pretty much always succeeded by contacting the technical contact, which is available on the web through the eduGAIN federation. We contact them and make the case: this is important to us at UQ, and it's important to this group of people within your university. That part is critical. You can't approach the other university and say you should do this for UQ.
They don't care. Please? Yeah, even "please" doesn't work. It's about who is asking: if they have "Prof" in front of their name, it probably works better. Saying, here are more people in your university who want to work with UQ using this system: we haven't failed once with that approach. Sometimes it takes time, but we haven't failed.
Fantastic. That's a very good way of going about it for all sorts of systems, not just eduGAIN. Okay, the next question we have is: what is UQ planning to do with data after the retention period? This is an unsolved question. We don't have any plan for deleting anything. We're currently in talks with ITS,
trying to figure out a solution for data storage, and we're talking long term. It's certainly not a technical problem; there are existing cloud technologies which make it very cheap to store data long term. It's more about the policy around it. So yes, the archiving workflows which are coming allow us to keep data on ice forever at the end of a seven-year, five-year, whatever project.
What's not in place is the university policy around who makes the decision to delete it after 25 years. That's the hard question. It's around what the process is. Do you email the CIs? If they don't respond, what do you do? Do you email the head of school? Who makes the call? We're trying to embed this information when we make the archive,
to say who has the power to delete it long term, which is why we're taking time to implement the archive. I've just got a question on that one: if you link the data in the archive to a DOI, you can see whether it's been used. I'm sure that would be part of the decision-making process there as well.
Absolutely. That's always a hard one. Okay, another question: I'm interested to know if your code is able to be evaluated by government-funded science organizations. Why not? I'd say email Andrew or Sandrine and ask; that's the answer to your question there.
The advice is that for non-commercial use, that's fine. Okay, the next question is: Andrew mentioned asking what types of data researchers have; can he please elaborate on what he meant by this?
That's very simple to say. Is the data human? Is the human data identified? Do you have data which requires access to HPC? It's really a very small list of questions. If you think about it in terms of the business requirements, it typically comes down to things like the Australian Privacy Act: identified human data has to be within Australia.
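The intake questions above amount to a small decision table. As a hedged sketch, assuming hypothetical `DataProfile` and `StoragePolicy` types and a `policy_for` function (none of these are UQ RDM names), the answers could map to storage constraints as follows. The onshore rule comes from the answer above; turning off sync and share-by-link for extra-sensitive data comes from the later answer about Nextcloud; treating identified human data as "no link sharing" is an assumption standing in for "only identified researchers may access it".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProfile:
    """Answers to the intake questions on a project record (hypothetical shape)."""
    human: bool
    identified: bool  # only meaningful when human is True
    needs_hpc: bool
    extra_sensitive: bool = False

@dataclass(frozen=True)
class StoragePolicy:
    onshore_only: bool   # identified human data must stay in Australia
    hpc_mount: bool      # expose the project's storage to HPC
    sync_clients: bool   # allow desktop sync clients
    share_by_link: bool  # allow link-based external sharing

def policy_for(profile: DataProfile) -> StoragePolicy:
    identified_human = profile.human and profile.identified
    return StoragePolicy(
        onshore_only=identified_human,
        hpc_mount=profile.needs_hpc,
        sync_clients=not profile.extra_sensitive,
        # Assumption: anonymous links are disabled wherever access must be limited
        # to identified researchers.
        share_by_link=not (profile.extra_sensitive or identified_human),
    )

print(policy_for(DataProfile(human=True, identified=True, needs_hpc=False)))
# StoragePolicy(onshore_only=True, hpc_mount=False, sync_clients=True, share_by_link=False)
```

Freezing both dataclasses keeps a project's derived policy immutable, so changing the answers means recomputing the policy rather than editing flags piecemeal.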
It's a very small subset. If you'd like details, we're prepared to share, but I've effectively just listed the only questions we need to make a decision. Okay, I think linked with that might be: which institutions in Australia are currently accessing the UQ RDM code?
I will need to go back to my list. I think we've got the University of... no, I don't want to say anything wrong. Email and ask, I think, is the answer to that one. Okay:
does the system have any reporting function? In time. At this point in time it's ad hoc reporting to the heads of units: we tell them how many records they've got in the system, and which institutions their researchers are collaborating with.
So it's very generic. It's not very detailed, because the policy for sharing that metadata hasn't been set up yet, so we are cautious with them. And I would say: what is the amount of data in the system?
It is immense, but it's about keeping researchers' trust, and this is critical. Our project control group is 50% academics, represented from every faculty and every level, A, B, C, D and E, across the whole university. The last thing we want to do is say: here's this magical reporting tool;
if you use it, the university will know exactly what you're doing. That can go badly. I don't know whether it's understandable or not, but researchers are cautious about being followed minutely, and we have to be very cognizant of that at all points in time. The feedback we give them is that there is no policy yet,
and until there is a policy which everyone will have access to, the reporting is minimal and de-identified. Okay, somebody has just piped up: UOW, I believe, is one of the institutions which does have access.
Yes, actually. Then someone says: excuse my ignorance, but what is the 25-year retention period in relation to? For clinical trials, the typical retention period is 25 years. For drug development work, for example, it's infinite.
So 25 years was really an example of long term. That's alright. The next question is: will the sync clients work for other universities, and are they open as well? Absolutely: it's based upon the Nextcloud platform, which is free and open source.
The sync client is how the international and national collaborators and the industry partners get access to the data, if they wish, or they use the web front end. Yes. Okay, well, the next one is: when you say the system code is available for evaluation, is it open source? I think you may have just answered that one.
It's not a released open source product yet, because UQ is keen to protect its interests in it. It will be released under an open source license, but it's going through trademarking of various things now, and the university does not want to release it under an open source license until those things are in place. That's understandable.
Somebody's asking: why Nextcloud and not ownCloud? That was the decision of ITS, and there are some technical reasons around it, predominantly the way external storage is mounted into a person's storage. If you're keen on details, contact me and I can explain why.
I understand the community split between ownCloud and Nextcloud, which means we don't maintain compatibility with ownCloud. There are only a few technical reasons; it's not philosophical. Okay, and then: do you think there will be one system that will service multiple universities and PFRAs?
That would be lovely. That's the Santa Claus, isn't it? That's certainly a Santa Claus. It's Mars, not the moon. Yeah, it's such a complex landscape here, and universities all have different requirements and understandings.
I wish the answer was yes. Practically, I think if we could get even one or two universities onto this sort of system and share the technologies, that would be fantastic. Okay, next question: is the metadata from the UQ RDM published into RDA (Research Data Australia) and/or into your eSpace or other repositories for discovery?
It will be, yes: the datasets will be pushed through eSpace, and after that it's the traditional institutional repository. So if the question is around the metadata of the project, no, that's always private; but the archived datasets that come out of it, yes, are pushed into eSpace, which is pushed on to RDA.
There were a few discussions around whether the metadata of all projects at UQ should become available after a certain period of time. Numbers like seven and ten years were thrown around; we haven't done it, is the answer. Okay, that's the last of our long list of questions, so if anybody has a
quick question that they want to put in, just put it in there now. I just wanted to ask: you said that it was voluntary to get into it at the moment. What has been your uptake? Is it sort of exponential, or just a linear, gradual uptake?
It's been very swift: it's 22 weeks now since we launched the platform, and we've got a thousand records and 1,300 unique users, so no, it has been very quick.
On average it grows by about three and a half terabytes of data a week going into the system across the university, by about 50 users a week organically, without us doing anything, and by about 10 to 15 projects a week with no effort on our part. Every time we do
a rollout, we get a bump of about 30 to 40 projects in the system, and it really depends upon local policy. When we say it's not mandatory: it's a school decision at UQ how they implement some policies, and some schools have made it mandatory for all HDR students to have a record on the RDM system by their M1 candidature milestone, which is one year into
their candidature. But again, it's the school's decision; we don't push it. Okay, somebody says well done. So we're providing a lot of support in terms of wording for policies, making sure that it's, you know, for example, in the HR induction list or
HR exit list or this kind of thing. So it's about raising awareness, but ultimately, at this point in time, the schools decide, and our approach has always been that there's no point mandating it, because if it's not good enough, researchers are already non-compliant.
We're not auditing; we know Dropbox is used, we know other things are used. There's no point mandating a system if you can't meet the demand; it doesn't work. It's a slow implementation: if we had decided that we needed to mandate it straight away, perhaps
we might have been able to get the uptake, but it would have caused more growing pains and broken the trust of researchers. Okay, we've got a whole bunch more questions that have come in. So: are you advising extra-sensitive data users, e.g. health linkage data, clinical trials,
etc., to use the system? Yeah, all right, there are two parts to the answer. Okay, I'll answer the technical side. We've built the system to handle it, which means for most contracts we see, we can meet the needs of the contract: we can technically
guarantee that the data will only be seen by these four researchers in this IP address range, and we'll keep every version of the data and who changed it. Unfortunately there's also the other part of the question, which is for Sandrine. We take a case-by-case approach, where we actually ask researchers
to check in the contract what conditions are imposed on them for storage and for sharing. So we cannot take a cookie-cutter approach and say it's going to be good for everyone, go for it; we have to be cautious. I think this is a statement here; it
says that, regarding the 25 years, there is a retention period for clinical trials with newborns: seven years is the standard retention period for trial data, but it starts when the youngest trial participant turns 18. So that's for those who are interested. Okay. And unfortunately
these things change often, so we ask at the start how long you need to keep it for, because in five years' time that information may be out of date. That's true. So is feeding that back to people something that you're intending to build into the system? We've thought about feeding it back, but we're also cautious about people
gaming the system: if we give them feedback, when will they tick the boxes and say we'll keep this for 25 years? Researchers are a wily bunch; they'll figure out which buttons to press to meet their needs, even if they don't understand. Well, they are the sort of minds we want in research, aren't they, I think.
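The paediatric trial rule quoted above is easy to misapply, so here is a small Python sketch of the date arithmetic. It assumes, beyond what the comment says, that the seven-year clock starts at the later of trial completion and the youngest participant's 18th birthday; `add_years` and `retain_until` are hypothetical names, and this illustrates the arithmetic rather than any regulation. Both periods are parameters because, as noted, these rules change.

```python
from datetime import date

def add_years(when: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February when needed."""
    try:
        return when.replace(year=when.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return when.replace(year=when.year + years, day=28)

def retain_until(trial_end: date, youngest_birth: date,
                 retention_years: int = 7, age_of_majority: int = 18) -> date:
    """Earliest date the trial data could be considered for disposal under the quoted rule.

    Assumption: the clock starts at the later of trial completion and the
    youngest participant reaching the age of majority.
    """
    clock_start = max(trial_end, add_years(youngest_birth, age_of_majority))
    return add_years(clock_start, retention_years)

print(retain_until(date(2018, 6, 13), date(2018, 1, 1)))  # 2043-01-01
print(retain_until(date(2018, 6, 13), date(1990, 5, 1)))  # 2025-06-13
```

For a newborn cohort the 18th birthday dominates, so the data outlives a naive "trial end plus seven years" calculation by almost two decades.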
A comment coming in here says: very interesting talk, thanks; first time I've come across someone who is thinking project-centric, not researcher-centric. So that's a very different basis for your system, and that possibly will, you know, be a very big foundation which will make it
different from what else is out there. The next question I had is: how many person-years did it take to build the system? Development effort, or all the people? It just says to build the system. Okay, so for the build, the development effort was one and a half FTE for two years.
In terms of complete effort, it's probably an average of three and a half FTE for two and a half years in total. A lot of in-kind, a lot of support, a lot of talking.
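The question asked for person-years, while the answer gives FTE over a duration; multiplying the quoted figures converts one to the other:

```latex
\begin{aligned}
\text{development effort} &= 1.5\ \text{FTE} \times 2\ \text{years} = 3\ \text{person-years},\\
\text{complete effort} &\approx 3.5\ \text{FTE} \times 2.5\ \text{years} = 8.75\ \text{person-years}.
\end{aligned}
```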
Okay, and sorry, the next one: with respect to extra-sensitive data and keeping access to authorised users only, does your Nextcloud sync manage sharing of data? If it's extra-sensitive data, we turn off the synchronization and the share-by-link functionality within Nextcloud, and this is one of the reasons we chose Nextcloud. They didn't support that at the time. It means we can manage how users are
allowed to share data within the web interface based upon the type of data, on a project basis, for each external share. And yes, we only allow identified researchers to access the data in an identified human data collection. Okay, now the next one is: is there a demo or recorded
demo to see how a researcher would interact? I presume that means interact with your system. We've got a user guide that is available to anyone, and we've got a couple of videos there
to demonstrate the system, so yes, the answer is yes. And are there checks for file obsolescence built into the archival stage? Andrew's going, yeah, thought about that one. Okay, I would like them to be.
The answer is that I am very keen that the archiving workflows are not just "dump it and put it on ice". There are some existing systems out there, probably spearheaded a lot by Peter Sefton at UTS, where there are long-term data archival and curation services that work
in the same way as an archivist works in a library. You don't just put something on the shelf and run away for 25 years. It's about managing it and making sure that documentation and things are up to date. The way we're storing the long-term archives is in BagIt format, possibly DataCrate. Because it's in a managed format, it means we could run
routine checks year on year to make sure file formats are up to date, and convert on the fly if possible. It is certainly something we're thinking about. There is no international consensus that I can see on how to do this yet. We will make a decision on that shortly, but we will
certainly set it up so that we can. Fantastic. There's just a comment with regard to your Nextcloud sync management: basically, it's super cool that you're able
to turn off the syncing there. So that's the last question that we have. Oh no, another one just got snuck in: can the user guide be made available outside UQ? Sandrine, it seems that it already is, is that correct? Yeah, absolutely. So how would someone find it, then?
Just contact me. If you forward things around, we can send a bunch of links to YouTube videos. If you send them to me, everybody who's on the webinar will get an email when the recording is available; I can put them in that same email if you'd like, and then we
can put them at the bottom of the YouTube description as well, so that's easy enough to do. Okay, oh goodness me, they keep coming in. Who did you get to do the voice-over for the... They've obviously gone out and checked them out already. Fantastic. So I think that's probably time to wrap
up now. Thank you so much, Sandrine and Andrew, for your time today. The number of questions coming in shows that people are really engaged with this topic and really interested in it, and we greatly appreciate your time. So thank you, everybody, for
coming today, and thank you again for presenting. We'll see you next time.