PiDs Short bites #4 - RAID, a PiD for projects - 2 Nov 2017

Video in TIB AV-Portal: PiDs Short bites #4 - RAID, a PiD for projects - 2 Nov 2017

Formal Metadata

Title
PiDs Short bites #4 - RAID, a PiD for projects - 2 Nov 2017
Alternative Title
PiDs Short bites #4: A PID for projects - RAiD
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
2017
Language
English

Content Metadata

Abstract
This webinar will cover the RAiD PID and issues surrounding Research Data Management planning and processes. Speakers: Dr Andrew Janke, Software Architect, DLCF Project, RDS; Siobhann McCafferty, Project Manager, DLCF Project, AAF.
Hi, and welcome to today's webinar on the topic of RAiDs, research activity identifiers. So let's get started. My name's Natasha Simons, from the Australian National Data Service, or ANDS, and I'm your host for the webinar today. My colleague Susannah Sabayon is behind the scenes co-hosting the webinar with me. This webinar will introduce you to RAiD, a new addition to the world of persistent identifiers, or PIDs. RAiD stands for research activity identifier: it is an ID for research activities which places the project at the center of the data management lifecycle. Our two speakers for today are Dr Andrew Janke and Siobhann McCafferty from the Data Life Cycle Framework project, or DLCF, and RAiD is the first of the DLCF enabling technologies to be developed.
This webinar is the fourth in a series examining persistent identifiers and their use in research. The first webinar looked at citing grey literature using DOIs, the second introduced the International Geo Sample Number (IGSN) for physical samples, and the third explored Scholix, the global initiative that is linking data and publications. You can find the recordings of these webinars on the ANDS YouTube channel.
I'd also like to acknowledge the Commonwealth Government for their support of ANDS, Research Data Services (RDS) and Nectar under the NCRIS program. And now to introduce our speakers for today: Dr Andrew Janke, who is a software architect for the DLCF project and Research Data Services, based at the University of Queensland, and Siobhann McCafferty, who is the project manager for the DLCF project and works for the Australian Access Federation in Brisbane. So I'll now hand over to Andrew and Siobhann.
I'm going to start with a few horror stories, because that's always fun. When we talked to universities at the start, we asked what problems they were having, and people expressed multiple things. A lot of them were around lost data, or people not being able to access data, and what it came down to in the end is that universities had no great way of identifying what a project was. There was a good notion of what a project was within an institution, but the best research is done collaboratively, so people wanted a way around this problem. What came out of that is that we knew we needed a project ID. We looked around at the services available internationally and couldn't find much; we talked to other groups, and their advice to us was: have a go. So we did, and we came up with the notion of RAiD, which is a research activity ID. We actually started with project IDs, but the semantics associated with "project" were a problem, so we backed off to research activity IDs.
One of the things we also knew from the start is that researchers do a lot of things (some of them are on the screen there), and unfortunately doing research is often only a small component. We also knew from researchers' feedback that they had a lot of what they call black holes: they were already reporting into a lot of systems, for grants, finance and publications, and the last thing we wanted to do was develop yet another system around RAiDs that researchers had to report information into without getting any benefit. We know there are a number of systems around already recording what projects are about, typically around outputs only. We took this view of research: what we tend to publish is the nice, shiny vision, while what happens in the basement, which is probably half the story of the working day, is what actually goes on underneath. Often what we really need to know is what goes on under the project, so we knew we had to keep all these things in mind.
I'm going to get on my soapbox now and say that from the start of this project we were very keen to say that research data is associated with projects, not people. This is because we're in a situation where institutions are responsible for the outputs of research beyond the time when the researchers have left, so we knew we had to structure the RAiD identifier around that idea.
If I look at my very simple diagram, we all tend to draw the research lifecycle as if people are going to fill out a data management plan at the start of their research, then go and do something, put some data somewhere, and produce outputs. We know this doesn't necessarily work. If you look at existing tools out there that we could have used for a project ID, like DMPTool, figshare and DMPonline, we recognised that these tools were really working towards the publication and output stage. They're trying to change where they actually interact with researchers now, but for the moment I would put them at the output end. The problem is that research probably looks more like this, and it meant that we weren't really covering, or looking for, IDs and issues around research data that happen at the start of the process. So I've drawn a little black cloud over here of what we're probably not capturing well.
We knew we had to change the language a little bit when we go out and talk to people about research activity IDs: we're not talking about a project repository, or a repository software tool like figshare or one of these; we're talking about research lifecycle management, and about attaching identifiers throughout this process. So it's about attaching RAiD identifiers at the start. A researcher has an idea, or wants to get some storage at their institution, and they use a DMR system, a data management record system, to provision some storage for their project, or the project they're associated with. The idea is that if we can get the right identifiers attached at all parts of the lifecycle, we can't necessarily join everything up yet, but at least we can find it. So we're taking on the F and the A of FAIR. I think the I and the R will come in time, but if we can just tackle findable and accessible, I think we'll have done very well.
Another thing we knew we had to do is change some of the language around what people say a project is. The existing view in a lot of systems, like ORCID, is that a project is a noun: a thing describing something that has happened. In RAiD and the DLCF, a project is a verb: a continually evolving process which the RAiD stays associated with, along with the various touch points where researchers interact with things that help them get their research done. So we knew we had to build a system that was essentially transparent to researchers, where things would just happen through the infrastructure in the background. So I'll hand over to Siobhann.
So, following from identifying a lot of these issues, the Data Life Cycle Framework started to come up with what we called enabling technologies. A little bit of background on the DLCF: it is a national strategy to connect research infrastructure and so increase science capacity. We've got five stakeholders who will be easily recognisable, such as RDS, Nectar and ANDS, and all of these groups do different things, so those capacities influenced what the DLCF decided to focus on. Our enabling technologies are cloud-based connectors and identifiers for research data tools, storage and outputs. The first of those was identified as a project ID, which became RAiD; the second is a group ID and group management service; and then there is a metadata store and APIs. Today we're talking mainly about RAiD.
We needed these persistent identifiers because we needed a really simple tool: something that connected things up without being heavy software or very taxing for researchers or for the institutions themselves, and that could be quite responsive and semi-automated. So we wanted a PID, and the benefits are standardisation, a metadata mechanism, the creation of a persistent chain of products, and support for the FAIR principles; it is a lightweight, flexible tool, and it facilitates rich data linkage. A RAiD is simply a Handle, as minted via ANDS, and attached to it is the RAiD metadata manifest. The manifest contains other persistent IDs related to the project, so DOIs, ORCIDs, other RAiDs, potentially group IDs, tools and services, and any other rich metadata that the service provider, the point at which the RAiD is created, wants to include.
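As described here, a RAiD holds no research data itself: it is a Handle plus a manifest of pointers to other identifiers. A minimal Python sketch of that shape follows, assuming a simple in-memory model; the class names, field names, the `12345/example-raid` Handle and the sample ORCID and DOI are hypothetical placeholders, not the actual RAiD schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Pointer:
    """One related persistent identifier (hypothetical shape, not the RAiD schema)."""
    scheme: str  # e.g. "doi", "orcid", "raid", "group"
    value: str   # the identifier string itself; the data it names lives elsewhere


@dataclass
class RaidManifest:
    """A RAiD modelled as a Handle plus a bag of pointers (illustrative only)."""
    handle: str         # placeholder Handle, e.g. "12345/example-raid"
    service_point: str  # the service provider that minted the RAiD
    pointers: List[Pointer] = field(default_factory=list)

    def attach(self, scheme: str, value: str) -> bool:
        """Attach a related identifier; return False if it was already present."""
        pointer = Pointer(scheme.lower(), value)
        if pointer in self.pointers:
            return False
        self.pointers.append(pointer)
        return True

    def by_scheme(self, scheme: str) -> List[str]:
        """List identifiers of one kind, in the order they were attached."""
        return [p.value for p in self.pointers if p.scheme == scheme.lower()]


# Usage with placeholder identifiers.
raid = RaidManifest(handle="12345/example-raid", service_point="example-dmp-tool")
raid.attach("orcid", "0000-0002-1825-0097")
raid.attach("doi", "10.5555/12345678")
raid.attach("DOI", "10.5555/12345678")  # same DOI again, so it is ignored
print(raid.by_scheme("doi"))  # ['10.5555/12345678']
```

Storing only pointers mirrors the point that RAiD holds metadata rather than data; de-duplicating on the (scheme, value) pair is simply a design choice of this sketch.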
So I'll show you what a RAiD looks like on a conceptual level. You can see on the left here we've got the RAiD number. It's really simple: the service point and the date minted are attached to it. Then we have these potential spaces: space for the group ID, which can include ORCIDs or email addresses; DOIs; GRIDs or ISNIs; and tools, services and outputs. If we abstract it out further, you can see what could go in there and what people can do with them. And here's an example: the RAiD, being a simple Handle, actually has a lot of information attached to it. It doesn't hold any of that information itself; we don't hold any data. It's just metadata, a bag of pointers that gives you a timeline of what happens during the project's process. What we hope is that, through simple things like persistent IDs, we can link up all of this pre-existing infrastructure, semi-automate things, make research easier, and make tracking research easier for institutions. That gives you improved visibility of research activity components and people, and of the tools, processes, storage and outputs involved in each project. It gives you an audit trail for research processes, and it improves the visibility of collaborations across institutions and infrastructure. You get metrics for use and connections, you can automate storage provisioning, you can automate allocations of storage and compute, and you can automate access and authorisation. So that's me.
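The "bag of pointers that gives you a timeline" can be illustrated by ordering attached pointers by date and counting them by kind as a crude usage metric. This Python sketch assumes each pointer records the date it was attached; the `TimedPointer` type, the kind labels and all identifiers are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TimedPointer:
    attached_on: date  # when the service point attached this pointer (assumed field)
    kind: str          # hypothetical labels: "group", "storage", "output", ...
    target: str        # identifier pointed to; no research data is stored here


def timeline(pointers: List[TimedPointer]) -> List[Tuple[str, str, str]]:
    """Return (ISO date, kind, target) tuples in chronological order."""
    ordered = sorted(pointers, key=lambda p: (p.attached_on, p.kind, p.target))
    return [(p.attached_on.isoformat(), p.kind, p.target) for p in ordered]


def connection_counts(pointers: List[TimedPointer]) -> Dict[str, int]:
    """Count pointers per kind as a simple metric of use and connections."""
    return dict(Counter(p.kind for p in pointers))


pointers = [
    TimedPointer(date(2017, 9, 1), "output", "10.5555/12345678"),
    TimedPointer(date(2017, 4, 3), "group", "0000-0002-1825-0097"),
    TimedPointer(date(2017, 5, 10), "storage", "storage-allocation-42"),
]
for row in timeline(pointers):
    print(row)
```

The same list of pointers doubles as a rough audit trail: nothing is overwritten, and each attachment keeps its date.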
Thanks very much. So I'll just ask you: you've both been talking a lot about RAiDs at conferences and events for a little while now. What sort of feedback have you had from people on RAiDs, and what sort of questions do you get asked?
Yes. From a technical point of view, people are interested in how long it takes to integrate with the RAiD system, because it obviously changes ID policy at their institution and needs institutional take-up. As for the actual technical implementation, because we've designed RAiD to be machine-to-machine rather than a manual service, the feedback we get is that the integration time is very short; it's not a hard thing to do. What is always hard is getting the policy and the agreement in place that this is a good thing.
With respect to the feedback we've had from institutions on whether it's a really good idea, I think the answer is a resounding yes, in that they know they need something to identify projects across institutions. But again, implementation is slow, with all the inertia of large organisations. We've actually had, even though it's not our own country, better uptake and feedback in some ways from New Zealand than from Australia. This is probably because of their different approach to things: as many may know, in Aotearoa the ORCID rollout was driven by the government department rather than by the institutions. The department, MBIE, said "this is a good idea, we will help all our institutions go forward with this", and we're looking at whether we can do the same thing around RAiD. We know that the New Zealand education department is interested in developing a project ID across all their institutions, so they see this as perhaps a good way to do that.
So we get many and varied kinds of feedback. We even get some researchers, I wouldn't say the clever ones but the converted ones, who say "how do we integrate?", and for the moment we really can't help them much, because we're not aiming the service at researchers right now. It's something that should happen in the background without them having to put any effort into it, and it should just make their life easier; that was the goal. But it means that, in order to provide a level of trust around the RAiD providers and the RAiDs, we must have agreements in place with the service providers who are linking to RAiD. That doesn't exclude researchers, but it gives them a different level of access from what they might have expected.
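Because RAiD is designed for machine-to-machine use by service providers that have an agreement in place, it is the provider's system, not the researcher, that would call the API with a key. The Python sketch below only builds such a request with the standard library and does not send it. The base URL, the `/raids` path, the JSON field names and the Bearer-token header are all hypothetical placeholders, not the documented RAiD API; check the published API documentation for the real details.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the address from the published RAiD API docs.
API_BASE = "https://raid-api.example.org"


def build_mint_request(api_key: str, service_point: str, title: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking the service to mint a RAiD.

    The payload field names and the auth header format are assumptions.
    """
    payload = {"servicePoint": service_point, "projectTitle": title}
    return urllib.request.Request(
        url=f"{API_BASE}/raids",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key issued under a provider agreement
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_mint_request("trial-key-123", "example-dmp-tool", "Soil microbiome survey")
print(req.get_method(), req.full_url)
# A provider would then send it, e.g. with urllib.request.urlopen(req),
# and record the returned identifier against its own project record.
```

Keeping the call inside the provider's workflow is what lets the identifier be minted in the background, without the researcher doing anything extra.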
A follow-up question, which I think you've probably answered: are RAiDs only for projects at Australian institutions?
No. At the moment we're starting the ISO process for RAiD, so probably a national standard first and then an international standard. It was begun with the intent that it would be international, because researchers work internationally; they're not just working in Australia. So yes, it's international.
OK, thank you. The next question: will access to the RAiD API be publicly available? That's the first part of the question.
Yes. I'm typing the address into the chat right now so you can go and have a look at the API documentation, and we can also send that link around with the recording for people who are interested in looking into it a bit more. There's a "pretty" version of the API documentation which allows you to interact with it and see what you're doing. To get beyond the initial part you will need a key: contact us and we will provide you with a trial key, which doesn't require anything beyond being in the trial system.
OK, so at the moment people don't need to register to have a look; they can access the API as it is? Yes, and if they want a key they can contact us. OK, so the next part of that question is: are you aware of how many systems have integrated with RAiD so far, and do you have any exemplars?
We're very aware. RAiD was launched in April, and a small number of services have integrated with it so far. Probably the best example is the University of Queensland's new RDM system, which Andrew is involved with: UQ's data management planning tool has been updated and has integrated RAiD, and storage provisioning comes out of that, so it's a pretty good example. Another is a very lightweight version of ReDBox, the research data portal, which has been made for us with integrated RAiD capability and which users can log in to with their credentials. That's a minimal DMR system that allows you to put in the minimal information, such as a project title, maybe an institution and a few people, and it will generate a RAiD for your project around the DMR.
OK, and if somebody now wanted to mint a RAiD, or start minting RAiDs, how would they go about it? Is it something that their institution or research group should do for them?
We talked about service providers, or service points, and that can mean any platform with integrated RAiD capability. If I, as a researcher, thought it was useful to have a RAiD, I would go and talk to my data management planning people, or whoever runs my systems, and have them integrate RAiD into their data management processes and set up an arrangement with us. It's not something you do yourself; it should be through your institution, because we need the really rich metadata that comes out of "I'm a researcher from here, I'm using this piece of machinery, these are the people involved in my project". And it's not intended to say "this is my project"; it's intended to say "this is a project I worked on".
OK, great. And it's a Handle, so when you click on the Handle, how does it resolve, and what does it resolve to? I guess there is minimal content required by Handle.
Yes, the Handles will resolve. At the moment they resolve to a page on the RAiD website, which essentially says whether this RAiD is valid or not. Exposing more than that is a decision for the service providers or the institutions. We modelled the underlying data structure of the information about a project on the RIF-CS activity schema. This means that in time, if an institution chooses to make information about a project public, that Handle will resolve to the record within Research Data Australia (RDA), and we will update it and migrate the information into the record in the public section of Research Data Australia.
Is there anything further you'd like to add? We've come to the end of the questions.
No, but we're keen to hear from anyone who would like to have a go at integrating the RAiD API into their institution's or research group's workflows. The more people that use it, the more useful it becomes, so at the moment we're looking at improving uptake in Australia and the rest of the world. If you're interested, drop us a line; it's at no cost to you, and sometimes we even give people money. We want people to use it: it's federally backed, it's not going anywhere, and it's a really useful little thing.
OK, and you've given the website address? Yes.
OK, fantastic. Well, thank you everyone for attending today's webinar, and just a reminder that this is one of a series of webinars on persistent identifiers. Look out in the ANDS online news or on our website for an announcement of the next webinar in the series, but that will not be until next year now. Thank you again to Andrew and Siobhann for making the time to give their webinar today. Thanks, all.
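The Q&A above explains that a RAiD is a Handle that currently resolves to a page on the RAiD website, and may later resolve to a public Research Data Australia record. Handles can be resolved programmatically through the Handle.Net proxy's JSON REST interface at `https://hdl.handle.net/api/handles/<handle>`. The Python sketch below builds that URL and extracts the target URL from a response record; the `12345/example-raid` Handle and the sample record are placeholders, and the network fetch is left to the caller.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

PROXY_API = "https://hdl.handle.net/api/handles/"


def handle_api_url(handle: str) -> str:
    """URL of the proxy's JSON record for a handle (prefix/suffix kept as a path)."""
    return PROXY_API + urllib.parse.quote(handle, safe="/")


def target_url(record: dict) -> Optional[str]:
    """Return the lowest-index URL value from a proxy JSON record, or None."""
    if record.get("responseCode") != 1:  # 1 means the handle was found
        return None
    urls = [v for v in record.get("values", []) if v.get("type") == "URL"]
    if not urls:
        return None
    return min(urls, key=lambda v: v["index"])["data"]["value"]


def fetch_record(handle: str) -> dict:
    """Fetch a record over the network (not called in the example below)."""
    with urllib.request.urlopen(handle_api_url(handle)) as resp:
        return json.load(resp)


# Placeholder record in the shape returned by the proxy's JSON interface.
sample = {
    "responseCode": 1,
    "handle": "12345/example-raid",
    "values": [
        {"index": 100, "type": "HS_ADMIN", "data": {"format": "admin", "value": {}}},
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://raid.example.org/12345/example-raid"}},
    ],
}
print(handle_api_url("12345/example-raid"))
print(target_url(sample))
```

Because the resolution target is just the Handle's URL value, moving it from the RAiD website to a public record later changes where the Handle points without changing the identifier itself.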