Storage for Research Data #3 - 21 July 2016

Video in TIB AV-Portal: Storage for Research Data #3 - 21 July 2016

Formal Metadata

Title: Storage for Research Data #3 - 21 July 2016
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract

Providing secure, trusted and reliable access to data storage during the course of research is critically important for minimizing the risks of unauthorized access and accidental loss, and for protecting personal and sensitive data. At the same time, research is becoming more international, with research teams and collaborators spread across many institutions and geographic locations around the world. Researchers and collaborators clearly need to access working data remotely, both in the field and from different countries. In this third webinar of the Research Data Information Integration Webinar Series, a panel of speakers will discuss the data storage in use (or in development) at their institutions to support the management of data in the course of research.
All right, good afternoon everyone, let's start. Welcome to the webinar series on research data information integration. This is the third in our series, and today we're going to be looking at storage for research data. My name is Paul Wong and I'm your host today; my colleagues Susanna and [inaudible] are co-hosting today's webinar virtually with me. Research is certainly changing rapidly. One change is that there's greater emphasis on the accessibility and reuse of data, and on better management of research data, leading to better research in the long run. Now, a unifying theme of this webinar series is the idea of the research data management lifecycle. As I'll show you in the next slide, we have had two previous webinars: one on research data management planning and one on ethics clearance for research data. So today we will be talking about storage locations for research data. Through this unifying theme of the research data management lifecycle, we want to get a better understanding of how research data information is integrated throughout the lifecycle. That means we need to look at the connectivity of the different enterprise systems that support the management of research data.
Our speakers today are Christopher, [role unclear], from Deakin University; Vicki, a senior librarian from the University of Newcastle; and [name unclear], manager of research data services at James Cook University. So I'm going to pass control to Christopher.
I was just going to run over what we're doing here at Deakin in this space of integrating storage with description and discovery, as I've loosely described it. So I'll start the presentation; it's going to advance for me.
So we've got a fairly loosely coupled ecosystem to handle this at Deakin, which is great because it's a flexible design, but, as I've said there, it can cause a lot of confusion in practice. I've had a lot of problems getting researchers engaged with the fabric because it is quite confusing, and you'll see what I mean with the diagram I'll present in later slides. So disambiguating some of that, and clarifying what the tools are about and how they can actually assist rather than inhibit publishing data and using the storage, is really what we're focusing on at the moment.
In the describe space, we implemented ReDBox-Mint under the Seeding the Commons and other ANDS-funded initiatives, and we call that Research Data Footprints: describe the footprint of your research. Then we've got the discovery layer. That repository isn't what we present to the world at large; we feed it into our research repository, which is called DRO, Deakin Research Online, and that is what Research Data Australia harvests for the individual records. The actual data that may be shared in an open way is made visible through a very simple, very basic portal called the Deakin Data Portal, which is basically just an Apache server on top of the data itself. I'll show a demo of all these things later; I want to expand on those screenshots.
When we were implementing this metadata repository, we also implemented a research data storage system that allows researchers to provision storage themselves. We didn't have any strict requirements on that, so anybody can create a bucket to store data, but it is aligned with the data portal: when researchers are ready to, they can publish the data itself, and it will link those things together, and that's what allows it to be exposed through the data portal.
So how does this all fit together? This is the diagram I was talking about just before. We've got various components, and I'm sure most of you would be familiar with some of these systems. Basically, the management system is the source of truth for project and party data about researchers, and that feeds the metadata repository I was talking about. With the storage system, you can create storage and choose to link it to a project or not. We're quite flexible with that because we understand that the actual process of writing a grant can generate a bit of data before a successful outcome, so we didn't want to dissuade people from using the central storage we've got on offer. It was also a carrot to stop people buying external hard drives and storing data locally on their machines, so having that resilient storage in our data centres was a pretty key point for that service. The rest of it is probably pretty familiar to most of you: we mint DOIs against every dataset that's created and support that through this fabric down the bottom. So it is a bit of a quagmire and does cause a bit of confusion, and the presentation layer, which is our focus at the moment, is limited.
It's just a bucket of data, and we're just presenting it as a list, so the benefit to the researcher is limited. That's what our focus is on now: how can we make people more aware of the storage that is available and how it is intended to be used, and how can we better display some of the data that people are generating? At the moment I'm seeing a lot of people creating storage containers or collections and just backing up their whole hard drive to them, with really no description or delineation of what they're storing. It has really shown me that there's pretty poor practice out there in terms of how people structure what they're doing, and that's where our library staff are helping out a lot, in one-to-one or small-group discussions about how better to describe and manage data in the broader context.
What I was also going to say is that we've got a portal at Deakin called DeakinSync, and we're looking to provide some context there to what researchers are doing around storage. One of the ideas is that if a researcher has a successful grant outcome, we present them with the option of creating storage, if we know they haven't already linked any to that project. Because we've got all that metadata, we can leverage quite a lot, so with that portal we can provide a lot of value and direct researchers there: you may want to be creating some records, because we can see the project has been running and it's near the end of its lifecycle; or, at the earliest stages, you can create storage for the data you're planning to generate with that project.
The other options in the presentation layer we're looking at are discipline-specific or quite aggregated systems that allow you to display data for various disciplines, and we're only just starting to look at how we can integrate these into this platform and ecosystem. Those are things like Omeka, for all the different disciplines that may want to create collections, maybe manage them themselves, and use that as their presentation layer rather than just a bucket with an Apache index on top; figshare and MyTARDIS, MyTARDIS being around image data and figshare being quite general, and we're looking at figshare for institutions and how that could potentially play a part; or Mediaflux. We're still really investigating all the different options. So that's the real ecosystem, and I didn't want to go into too much on that; I really want to show you how it all functions. This is the ReDBox system we have, and most people would have seen it in the past.
It allows you to create the data descriptions, as we're all well aware. What I want to show you is the process we go through for each of these and how the DOIs are linked into the actual data portal side of things. So the process is: they create a metadata record, and then, when they're ready to publish the data, they click publish in the store, which I'll show you in a minute, and then the links for that come into here.
And once it's published, it publishes this data portal link; you may be able to see the URL on the screen down the bottom, which keeps those two things in check. Then when you go to view that actual data collection, you can see it on this data portal. Which one was that? The interview data for some Papua New Guinean audio interviews. So we replicate the metadata from that Footprints record and show the contents here, so you can download it if you want to, but it's very, very basic. There's no packaging of it, which would be really ideal, and there's no thumbnail sort of view, so in that first example you're really just downloading 800 megabytes before you can understand what it's all about. Exposing the metadata of that MPEG file, in this case, is not really done at this point, and that's where I want to get some improvements to present it better.
The data store is this system here. It's just a web application that hooks into the corporate storage we have available. What we've done is provide four collection types, and we allow our researchers to create activities; they can link those to a project, and then they can create these buckets to store things. They can create a traditional network-attached file share, which is these little yellow icons. They can create any number of those (there's a nominal limit of 10, but they can create more), and they're unlimited: they can put as much data in there as they like. That uses, what technology are we using now, Isilon storage, so there are snapshots taken three times a day and one snapshot at the end of the day for three months, which means they've got complete ability to restore files and manage their data very flexibly. There's another one called a publishable file share: when they're ready to publish data, they can create one of those. It's no different in terms of the technology, but it allows you to hook into the actual Footprints record, and then that little data portal link happens. The other one there, with the little star, is an icon for a product, Syncplicity. We're providing a Dropbox-like service because we need it: a lot of researchers work with external parties, and they've had a lot of issues sharing data externally, so they can use this now to do that. It uses our own on-premise storage with a sync-and-share platform on top, so it gives them unlimited storage, although unlimited in the sense that you need the storage on your local computer; that's just the reality of how it functions. But it has been taken up quite rapidly, because people really want that capability without having to pay for a Dropbox account to use that storage. The other collection type, which I don't have in this demonstration activity, is a wiki space: we've got a Confluence wiki instance which they can use for collaborative work internally. So the store, the research data store, has really gone from storage as in storing data to a store as in where you buy things, and that's going to expand: we'll be providing a whole lot of other services through this research data store, so blog engines, Omeka instances and a whole lot of different things will be provided through this one portal for researchers, and it will all be tied together under this activity or this project.
A particular example I was going to show you is the Pacific sea stone [unclear] data: Marquis has got some sequence data that he's produced, and he wanted to make it open, so he's gone ahead and published it. He created the repository record through our Footprints system, and then he wanted to share the data with the world. Originally he was working with the library and they stored the objects within the repository, which wasn't great, so now they're provided through the data portal, and you can download the gigabytes, or megabytes in this case, of files. One thing I'm advising researchers is to be really descriptive about what that is. I'm sure people in his discipline understand what all those different file formats are, but there isn't an overview, a sort of readme file, that could describe it better, so we're working with them on that. That's presented through the hookup via that link there, and I think it's also available here, so you can be taken straight to that record. All the DOIs are mapped through to our repository, so Footprints really is just a collection gateway that links those things together and allows the record to be curated as accurately as possible. So really, that's all I wanted to cover today.
Can Chris talk more about the publishing function?
Absolutely.
So really, it's what we call "published", but it really does formalise a link between the two systems. They create a publishable file share, which is just a network-attached storage location. Everyone should be familiar with network-attached storage, a network drive, and so they would just have a folder like this to store things. Let's say this "workshops" one, for example, was something they'd stored: they would structure their data within that space, and it's completely offline; it's not exposed to anyone other than themselves. Then, when they're ready to publish the data (I'll just show you something; this is UAT, so it's a mirror), they can click a Publish button, quite simply. Say this particular folder, which is fictitious because it's UAT: when they're ready to publish, they literally do that, and it will then look at all their Footprints records and provide a list of the ones that haven't been published. They can just choose one. In this example I've already published against this other one, but this one here I could potentially publish, and then I can provide global access, so that anyone could get access, or I could restrict it to AAF members. In that way you can limit it to anyone who's a member of the AAF, so it's a sort of semi-open collection. Then, within a few minutes, that collection would be exposed through the data portal I showed you before: you would see it appear here, or, if I logged in and it was restricted, there would be more exposed once I'm logged into the system. Anyone in Australia can log in through the AAF, as you can see. So that's how that's working. All right.
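The publish step demonstrated above comes down to a small piece of logic: the Publish button offers only the researcher's Footprints records that have not yet been published, links the chosen record to the publishable file share, and records whether access is global or restricted to AAF members. The Python below is a minimal sketch under exactly those assumptions; the class names, fields and the `can_view` rule are hypothetical illustrations, not Deakin's actual data store code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Access(Enum):
    """Access levels offered at publish time (as described in the talk)."""
    GLOBAL = "global"            # anyone can browse and download via the portal
    AAF_MEMBERS = "aaf_members"  # only users who log in through the AAF


@dataclass
class FootprintsRecord:
    """Hypothetical stand-in for a Research Data Footprints metadata record."""
    record_id: str
    title: str
    published: bool = False


@dataclass
class PublishableShare:
    """Hypothetical stand-in for a publishable network file share."""
    name: str
    record_id: Optional[str] = None
    access: Optional[Access] = None


def publish_candidates(records: List[FootprintsRecord]) -> List[FootprintsRecord]:
    """Records offered in the publish dialog: those not yet published."""
    return [r for r in records if not r.published]


def publish(share: PublishableShare, record: FootprintsRecord, access: Access) -> None:
    """Link the share to the chosen record and set its access level."""
    if record.published:
        raise ValueError(f"record {record.record_id} is already published")
    if share.record_id is not None:
        raise ValueError(f"share {share.name} is already linked to a record")
    share.record_id = record.record_id
    share.access = access
    record.published = True


def can_view(share: PublishableShare, logged_in_via_aaf: bool) -> bool:
    """Whether the (hypothetical) portal would show this share to a visitor."""
    if share.record_id is None:
        return False  # unpublished shares stay private to their owners
    return share.access is Access.GLOBAL or logged_in_via_aaf


# Usage with hypothetical records: one is already published elsewhere.
records = [
    FootprintsRecord("fp-1", "Audio interviews", published=True),
    FootprintsRecord("fp-2", "Workshop recordings"),
]
share = PublishableShare("workshops")
choice = publish_candidates(records)[0]  # only fp-2 is offered
publish(share, choice, Access.AAF_MEMBERS)
print(share.record_id, can_view(share, logged_in_via_aaf=False))  # fp-2 False
```

Keeping the access level on the share rather than on the metadata record is only one possible design; it mirrors the talk's description of choosing access at the moment the share is published.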
questions, a whole bunch of questions that are coming in as we do that. So: what's the maximum storage space a researcher can request? Is there a maximum? Did you say yes? What is the maximum? No, it's unlimited. Our IT group manages the growth and the capital acquisition that has to happen, and they deal with that as it goes, so yes, it's completely unlimited. This next question probably ties into that: what's the cost of the implementation you have at Deakin, particularly the data storage costs? There's no explicit cost. It's covered under our central capital expenditure on storage, so it's just factored into all the storage that the university buys, and there hasn't been an explicit cost for this particular service. At the moment we're at about, what are we up to, a hundred terabytes, with another 60 at another site, so nearly 200 terabytes is what we're looking at. So it's not overly large; we don't have any astrophysicists with a petabyte in their back pocket, so it's probably relatively small compared with most institutions, but it's covered under that. They provision storage periodically throughout the year and are always negotiating a new price for it, so I don't have to worry about that, which is a luxurious position to be in.
Probably tying into that are a couple of questions which sort of meld together. One is about use of the storage by external, non-Deakin users: most collaborations are national or international, so is it possible for external, non-Deakin users to use it? Another, very similar question asks: is this service going to be available to researchers at other universities, and are there storage size limitations for them? The first bit is covered under the sync-and-share service: a Deakin identity can provision storage and share it with colleagues they're working with at other institutions. But there are limits to that, because if you're synchronizing to your own computer, you need the hard drive space on your own computer. With the traditional network-attached storage, any Deakin identity can access it, because they can create a VPN connection, but external people cannot. The way that's traditionally been handled at Deakin is that we often register those collaborators as visitors to the University, and then they get access to the storage. It's a little bit cumbersome, but most people know how to work around that and follow that process. As for the last question: no, non-Deakin people wouldn't be able to create the storage space in the first place; it really has to be instigated from Deakin's side. OK, another one: are researchers able to mint DOIs through this publishing method? Yes. In the Footprints system, that's where the DOIs are minted, and that's done by the library.
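The arrangement described here, where DOIs are minted by the library as part of curation rather than by researchers, amounts to a workflow rule: researchers submit, and only the curation step can mint. The sketch below is a minimal illustration under stated assumptions; the `MetadataRecord` type, the `library` role name, the title/description quality check and the `mint_doi` hook are all hypothetical stand-ins, not the Research Data Footprints or ReDBox API, and no real DOI service is called.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    PUBLISHED = "published"


@dataclass
class MetadataRecord:
    """Hypothetical stand-in for a data-set description record."""
    title: str
    description: str
    status: Status = Status.DRAFT
    doi: Optional[str] = None
    history: List[str] = field(default_factory=list)


def submit(record: MetadataRecord) -> None:
    """Researcher hands the description to the library for curation."""
    if record.status is not Status.DRAFT:
        raise ValueError("only draft records can be submitted")
    record.status = Status.SUBMITTED
    record.history.append("submitted by researcher")


def curate_and_publish(record: MetadataRecord, role: str,
                       mint_doi: Callable[[MetadataRecord], str]) -> None:
    """Library quality check; the DOI is minted as part of approval."""
    if role != "library":
        raise PermissionError("only the library curates records and mints DOIs")
    if record.status is not Status.SUBMITTED:
        raise ValueError("record must be submitted before curation")
    # Assumed quality check: a real one would cover far more than this.
    if not record.title.strip() or not record.description.strip():
        raise ValueError("quality check failed: title and description required")
    record.doi = mint_doi(record)  # hypothetical hook for a DOI service
    record.status = Status.PUBLISHED
    record.history.append("checked, DOI minted and published by library")
```

Passing `mint_doi` in as a callable keeps the sketch testable; in practice it would wrap whichever DOI service the library uses, and the researcher never calls it directly.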
So the library, when they're performing quality checks on the description, is also performing the step of minting it. It's done implicitly, in that the metadata record's workflow is curated by the library and they're the ones actually doing that; effectively it's a business-to-business transaction that happens on every one of those records. So yes: researchers themselves don't, but the library does it on their behalf. OK, and then probably the last one, so we can keep to time: is all the data stored on Deakin infrastructure, or on external infrastructure, e.g. Azure or a local research provider? Yes, it is all on Deakin infrastructure. Across our four main campuses we've got two data centres, and the data is stored within those data centres and replicated between the two. We haven't engaged with the RDSI-provisioned storage; it's all purely on-premise, which our researchers like because, particularly if it's sensitive data, they can check a lot of boxes in terms of the compliance they need. Thank you, and thank you to the audience for wonderful questions. Our next speaker is Vicki, so I'm just going to pass control to Vicki. Thank you very much
for the opportunity to share what we're doing here at Newcastle. I'm going to talk about going from data to discovery in terms of our research data storage and
the connections that we have. In telling the Newcastle story, I'll tell you a little bit about the systems and tools that we have, and then talk about the three workflows that make up those systems and tools, and the connections and integrations between them. So to tell the story, I'm just
going to introduce you to the systems and tools in this space at Newcastle. For research data storage, we have ownCloud. For data archiving and publishing, we're using a software app that was created to run on ownCloud, called Cr8it. For data management and registry, that is, the metadata curation workflows, we're using ReDBox and the Mint, similar to what Chris was just describing. And for publishing and discovery, we're doing that via our institutional repository, which is NOVA here at Newcastle. So I'm just going to
talk a little bit about the workflows and how they all connect, and after I've done that I've got two short videos that show it in action, so as well as me telling you about it, they'll actually show it too. Unlike Chris, I wasn't keen to do a lot of live demonstration, because it would probably go wrong, so I'm using videos. These are the three workflows, and I'll talk about the connections between them. First, research data storage: that is our ownCloud, which is Enterprise version 7 at the moment, and it's a petabyte. On that we have this app, Cr8it. Cr8it was developed in 2013, born from work that Peter Sefton was doing at the University of Western Sydney, or Western Sydney University, I should say. At the time it was a collaboration between the University of Newcastle and Western Sydney, with Intersect doing the development, and in those early days the University of Sydney as well. Cr8it addressed a problem we had identified: we wanted a connection from the research data storage into our data management and publishing workflow, ReDBox and the Mint. It was developed with Intersect back in 2013, and there have been a few development cycles along the way, a few sprints of agile development, to get it to where it is now; there's also some future development coming, which I'll tell you about at the end. So that's the research data storage workflow. In the data management and publishing one we have, similar to Chris, ReDBox and the Mint. ReDBox is a metadata store with a descriptive curation workflow, and it's hooked up to the Mint, our name authority service, which provides party records for our staff members and researchers, and also information about our grants. That's then connected to NOVA, which is for discovery.
Let me just run through it quickly. In the research data storage workflow, the first one there, researchers or other users log into our ownCloud and create a crate. A crate is a package of data: they add files to it, and those files are the working files they have in ownCloud. They then have the opportunity in Cr8it to add metadata as well, and from there they can review the metadata and publish the crate. When they publish the crate, a couple of things happen. One is that it comes to the library, into the next workflow, data management and publishing, and the researcher gets an email with a lot of that metadata in it. Then, in the data management and publishing workflow, the one in the middle there, the library works on the metadata that came across in the alert. Once that alert arrives in the system, the library augments the metadata: we add metadata, and we often have more conversation with the researcher to work out permissions and, probably more often, descriptions. When we're happy with it, we publish the record, and it goes across to NOVA for discovery and up to Research Data Australia. So this is a highly
sophisticated diagram; it's just a way of very simply demonstrating what's happening. We've got ownCloud, where the researchers have their storage and their working files. I should say that ownCloud is just one of the storage options at Newcastle, but if you want the connections to publishing, ownCloud is where we have the ability to do that. From the Cr8it tool, two things happen when a researcher publishes, or submits, a data crate by pressing the button, which I'll show you shortly. First, a metadata alert goes across to our ReDBox system, like the start of a record: it carries the information the researcher collected while working in Cr8it. Second, the data crate itself is packaged as a zip file, using the BagIt specification that came out of California to bag the data set, and that crate goes into our storage layer. The metadata alert is ingested into ReDBox, where more work happens on it, and from ReDBox we send MARC, DC or RIF-CS across to NOVA, built from that metadata. Travelling with it all the way through that process is the URL of the data crate in the storage layer, so the institutional repository has a proxy interface into the storage layer and acts as the gatekeeper for access to the data: if it's publicly available, it's only publicly available through that route. So, just quickly, this is a very short video
demonstrating what I've just told you in terms of ownCloud and Cr8it. So there's a researcher, or a
little team; they see all their files in ownCloud. They're able to toggle across to the app, which is called Cr8it. By default they have a default crate for their data, but
they can create a new one. So I'm going through the process: you click, type "This is my study on rainfall" (that's my typo), click to create, and now I have a crate. It's just told me at
the top there, in yellow, that I've got a new crate. Now I'm toggling back to my files, and now I can add
them by right-clicking; Cr8it lets you add them to the crate. So I'm just adding in my data dictionary, my population information on frogs and my environmental information, and I've got some images. Basically you pick and choose what the researcher packages up into that data crate, and it tells you as you go that it's adding things to the crate. So we'll go
back to Cr8it and we'll see the files
have gone into our crate. Now, over on the right-hand side, the researcher or user has some ability to add metadata around those files, which goes with the crate and across to the ReDBox record when they publish. There are a few things here: crate information such as the title and the creators, which we're just adding now. That's hooked up to our Mint system, so it's doing a type-ahead lookup and bringing the details back. It's also hooked up to the Mint for searching grants, so we can select the grant and pull all that information back in, and so forth. There's some work going on around what metadata should actually be here. There's
a feature to check the crate to make
sure that all the items are valid and
still there, and you can take items out if you want. When you hit the button to submit, you get to review all the metadata you've entered; at this point you can go back and change it, or you can hit the submit
button. It's a big button, to make it easy for the researcher. It sends the crate information to the library, and you can also have an email sent to other people
that you work with, to say that that's what's happened. So that's how Cr8it runs for the researcher. They also get a copy of the data crate for themselves, so they're aware of it.
So the submit button has done two things: it has sent that crate, with the data set, to storage for archival purposes, and then it's actually
sent it across to the library. This is our ReDBox instance, which is not public; it's only available to library users. I'll just start the process to show you what happens here. When you're logged into the system, the very first thing is the alert: an alert about the study on green frogs. It has arrived with the source "ownCloud - Cr8it", so it's telling me, the library, that it's come from storage. So we start the process of
looking at that record: we go into it and start working on it, and, as Chris showed before, there are various things the library works on, so we can add lots of information there, through a conversation with the researcher as well. That's basically how it works. This is
just demonstrating that the information from the crate comes over, populated as the researcher entered it, into various sections in this system. When
we're finished, we hit the button to publish the record. This is where we do it, and the record is published across to our institutional repository; it shows that it's actually being published. So finally,
after the publish button, it arrives in NOVA. Behind the scenes it's also sending RIF-CS over, and it's harvested from there: we harvest it up to Research Data Australia. So that's the process, the three workflows, and how they connect: from the research storage in ownCloud, through ReDBox and the Mint in the library, to discovery on the other end, and we facilitate the whole way through with that connection back into research data storage. Lastly, I mentioned that there have been a number of iterations of development of the Cr8it tool, and there's currently funding for development and enhancements to it, which will be trialled with CloudStor+. There's a group of us working on that: obviously AARNet, Intersect doing the development, and also Western Sydney University and here at Newcastle. It's a great consortium, because we've been working on this for a while. So that's the end of my presentation; thank you very much. Fantastic, thank you. If anybody has any questions for Vicki, you can... Oh, no, there's one already.
It says: once the project is complete and all the crates are packaged up and published in archives, how do you ensure researchers go back and delete all the remaining redundant data? And when you're adding files straight to a crate, Vicki, does it track where they are? It can. If the researcher moves them around, does that then become disconnected from the crate referencing where those files are? So I presume the advice would be to structure the location pretty much where you're going to have it, set it, and not change it too much. Yeah. All right, so thank you very much
for listening to me this morning. Today I'm going to focus, just like everybody else, on accessing and exposing research data at JCU. In terms of storage, we have
quite a few different options that we make available to different researchers. We have HPC: all researchers can apply for an account on the HPC, and depending on what they want to do, they can use it just for storage or also for compute. JCU is very fortunate to be an original RDSI node, which gives us two petabytes of disk storage here. Access to the RDSI storage is available through an application process, and we tend to encourage people who want access to larger disk storage to apply for an RDSI allocation. The other storage we have is a system called Research Data, which is really a ReDBox instance. It's publicly exposed, and it's designed for completed data sets. There's a self-submission workflow that users can go through, and they can attach files with a total size of up to 50 megabytes, so this is typically things like Excel spreadsheets and zip files. I'll just move on to my next slide. The other thing I'd say is that researchers can also store files on a system where they need to be kept private, and we can expose them in different ways depending on which system the users use. For access, the standard HPC access applies: SSH, SCP, SFTP. Some of this can be challenging for some users, so we try to use other systems to make access to their storage easier, and this has been very helpful to us. With RDSI storage, we can mount it on the HPC for processing or compute access. We have quite a large number of users here at JCU using Aspera Shares. For those of you who don't know, this is web-based access to our RDSI storage, and it can be for tens of terabytes of data if you wish. This has been very helpful to some users: if they're at a location where connectivity is poor, Aspera Shares has been able to give them good
throughput in terms of uploading and accessing their data. There is also sync functionality using Aspera, but, as Chris pointed out earlier, it's dependent on your having the local storage available, especially if you deal with many terabytes of data. Mediaflux gives us lots of options. We're focusing on the portals functionality in Mediaflux, and we're currently working with Arcitecta on improving it: it's a way we can quickly create a mini portal to expose research data, with access restrictions on it. We can also create virtual machines to expose research data via different websites, depending on the project or the requirements of the user. And, as I mentioned earlier, the other option is Research Data, where you can attach up to 50 MB. So this is where we tie it all together. Mostly at JCU, the system for exposing data is Research Data, which is our ReDBox instance. It's publicly available, and there's a feed where, once a week, ANDS harvests the records for Research Data Australia. There's another system called the JCU Research Portfolio, and records from Research Data are displayed under a tab in the Research Portfolio. This is to provide information about JCU researchers, but also to show what sort of research data is available from those researchers, and the information in the Research Portfolio is built using the JCU research management system. I'd just like to give you a quick demo, if I can switch to mine, to try to show you how it all
ties in. Here's our public-facing ReDBox instance. I've pre-searched for a record that I know has got some
links to data. We're just reliant on the
researchers adding URLs to explain and expose where the data may be, and in this
example here there's a public link to where the actual publication has been made, and the data is stored with that publication. There's also a link here to data that's actually sitting on our HPC, so the user can download the zip files, and if there's something similar for data on our RDSI storage, we can expose that data using a similar method. So let me just show you the JCU site.
This is the Research Portfolio. If you go to the address, it redirects to here, and you can search for a researcher. Let's just use Jeremy VanDerWal, who has lots of records, and, as I
said, if they have any data in our ReDBox system, Research Data, this tab will be generated, and you can see the records from in here. So what
we can do is then click on the record. It's live, so let's have a look. Here we go. Sorry, this is just a
listing of the information you would see in ReDBox, and if you wanted
to, you can go off to the actual ReDBox record. Actually, this is the actual data: here's just a directory listing of the data that you can download. We've all seen this before, but
here are just the records that Jeremy has in Research Data Australia. I'll just
pick, say, some of this bird information and click on the data provider.
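The records shown here reach Research Data Australia through the weekly ANDS harvest Jay described. The transcript does not name the protocol, so the sketch below assumes the Research Data feed is an OAI-PMH endpoint offering RIF-CS under the metadata prefix `rif`; the endpoint URL and prefix are assumptions, and `get` is injectable so the paging logic can be exercised without network access. It shows an incremental harvest: ask only for records changed since the previous run, then follow `resumptionToken` until the provider stops returning one.

```python
import datetime as dt
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, since: Optional[dt.date] = None,
                     token: Optional[str] = None, prefix: str = "rif") -> str:
    """Build an OAI-PMH ListRecords request URL.

    Per the protocol, a resumption request carries only the verb and token.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
        if since:
            params["from"] = since.isoformat()  # day granularity, YYYY-MM-DD
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the next token, if any."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(f"{OAI}identifier") for h in root.iter(f"{OAI}header")]
    tok = root.find(f".//{OAI}resumptionToken")
    token = tok.text if tok is not None and tok.text else None
    return ids, token


def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url: str, since: dt.date,
            get: Callable[[str], str] = fetch) -> Iterator[str]:
    """Yield identifiers of records changed since the last weekly run."""
    token = None
    while True:
        ids, token = parse_page(get(list_records_url(base_url, since, token)))
        yield from ids
        if not token:  # empty or absent token marks the final page
            break
```

Passing the date of the previous weekly run as `since` keeps each harvest small; an OAI-PMH `noRecordsMatch` error page simply yields no identifiers and no token, so the loop ends cleanly.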
Same links, similar topics. I think
I'll leave it there. I'd like to open it up for questions, please. Thank you, Jay. While we're waiting for ones to come through for Jay, I'll go back to one that was for Vicki: how much training and support do you offer your staff in terms of Cr8it? What we did was kick it off a little while ago; actually it was last year. We ran a workshop introducing ownCloud and trialling Cr8it, and researchers came along for the purpose of giving feedback, so we did a lot through that session. We have online help and a guide in context. At the moment we've got between 400 and 500 users, and as part of that process they have to go through an orientation session. We've been delivering that to them in person, and we're trying to transfer it online, just to give them some information to orientate them before they get access, so they know what to do from the time they start. Okay. One back now for Jay: do you link AAF credentials with LDAP for HPC users? No, we don't. Our HPC is restricted to JCU researchers, or people who are enrolled at the university or work here. I guess that question is probably asked around data access, so if I could talk about Aspera Shares a little bit more: we can provide a wider range of access to data that's exposed through that system. Our Aspera Shares is hooked into an LDAP that's managed by QCIF. For those of you who don't know, QCIF manages QRIScloud, and JCU works closely with QCIF. They have a portal: anyone who is a member of the AAF can log on to QRIScloud and set up their QRIScloud credentials, and provided they're then given access to an allocation, they can log on to our Shares machine, or the one based in Brisbane if the storage is there, and access it that way. With Shares we can give people access from outside the University, but
also, I suppose, a mechanism through which I can provide access to people overseas as well. Okay, and another one for Jay: do you also have SSH/SCP-type access to Aspera Shares, or is it web only? Good question. Aspera Shares is web only. On the infrastructure underneath, it is possible to get SSH/SCP access to that storage; we usually do it by mounting the storage on, let's say, the HPC, and they access it that way. We haven't actually exposed the two servers that manage the infrastructure sitting behind Shares; it is possible to expose them that way, but it has not been done. If I can add to that from our perspective: we have an interactive box that's attached to our storage, and that's how people get SCP/SSH access to download, and they can run tools like screen, so if they've got a sequence file they want to download that takes a while, they can just set it going and come back later. That's been taken up quite well. Now this one's back for Vicki. Vicki, it says: is the crate owned by an individual or a project? From the interface, I would guess an individual. It belongs to an individual. Jay, you mentioned Mediaflux: have you been able to get this operational? That's right. Okay. We've spent a lot of time working on Mediaflux, and as I mentioned, our main focus has been on the portals functionality. We've found so far that the presentability of those portals isn't very good with
regards to being able to customise the CSS, but we've been working closely with them, and I think development is about to start very soon that will give us full control of the CSS inside those portals used to expose the data. I do have a couple of data sets in Mediaflux, but I guess I'd say watch this space; that's all I can say. I think it has great potential, but you need some developer resources to spend more time on it, and that will probably be me. Okay, here's a question for all the speakers: what sort of processes or services do you envisage developing on top of the storage service? There was a question around data deletion; would data curation be a process or service to build on top? So perhaps, Vicki, if you wanted to start that? Could you repeat the question for me, please? Sure. It says, for all speakers: what sort of processes or services do you envisage developing on top of the storage service? There was a question around data deletion; would data curation be a process or service to build on top? I'm not really going to come in on that with anything concrete, but in my mind, what I would like to see, building on what we already have here and looking into the future (not with a crystal ball, but less than five years out and more than two), is a lot of the practices around data storage and publication absolutely streamlined, with less involvement by individual people, and a lot more automated. So that's the sort of service I think we should put more time and effort into: processes that actually automate things and take us out of the way of the researchers, so they're more in control. That's really what I would like, though it's probably not answering the question that was asked. Thanks, Vicki. How about you, Jay? Interesting question. I guess we're probably
not there yet here at JCU. For instance, in ReDBox we're capturing the time frames for which data would be retained, but we're not acting on any of that at the moment. To maybe answer the data curation side of things: particularly for records submitted via ReDBox, our librarian reviews the records, but we also have a look inside the data, at spreadsheets and things, to see whether the columns are neatly labelled and people can understand the data for external use. So yeah, I think we're not quite there yet either with regard to those sorts of issues. Okay. Chris? Ours is sort of similar to that. Most of the energy has been invested, as I was saying earlier, in making people aware that the service is there and advising on how it suits their discipline and how they could use it. I think it would be a luxurious position to be in to focus on the latter, which the question sort of talks to. In terms of curation as well, really the direction I've been providing is: you need to be working with best practice in your discipline, and if you're unaware of that, we can work with you to come up with something appropriate. Preservation is a deep topic; you could do a PhD on it. At the moment we're really telling people to stick with common denominators. If you're going for something bespoke, you need to think about the environments in which you would potentially need to access that data in five years' time: if you're choosing a vendor for your data analysis or capture that may go bust, or the technology may change three or four iterations, you may not actually be able to use that data in the future, so that's something you need to consider. We haven't really got a lot of time. Yeah, we're just about out of time here, so I'm going to ask a very, very small question, which goes to all of them: can students, especially PhD students, access these
services to store their research data? Yes: Jay's going yes, Vicki's going yes, and Chris is going yes. Fantastic, a wonderful way to finish. All back to you. Thank you; there were wonderful questions, and thank you to all the speakers who provided insight and important experience. It has been very thought-provoking, certainly.
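Two steps in Vicki's Cr8it walkthrough are concrete enough to sketch: the "check crate" feature, which confirms every item is still where the crate expects it (the concern behind the question about files being moved), and the submit step, which bags the data set using the BagIt specification and stores it as a zip file. The sketch below assumes a crate is simply a list of file paths with unique names; `check_crate` and `package_crate` are hypothetical names, not Cr8it's code. It writes the minimal BagIt layout from RFC 8493: a `bagit.txt` declaration, the payload under `data/`, and a SHA-256 payload manifest.

```python
import hashlib
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable, List


def check_crate(items: Iterable[Path]) -> List[Path]:
    """Return the crate items that are no longer at their recorded path."""
    return [p for p in items if not p.is_file()]


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files aren't read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_crate(items: List[Path], zip_path: Path) -> Path:
    """Copy crate items into a BagIt layout and serialize the bag as a zip."""
    missing = check_crate(items)
    if missing:
        raise FileNotFoundError(f"crate items moved or deleted: {missing}")
    names = [p.name for p in items]
    if len(set(names)) != len(names):
        # Assumption: this sketch flattens the payload, so names must be unique.
        raise ValueError("crate items must have unique file names")
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "crate"
        payload = bag / "data"
        payload.mkdir(parents=True)
        manifest = []
        for src in items:
            dest = payload / src.name
            shutil.copy2(src, dest)
            manifest.append(f"{sha256_of(dest)}  data/{src.name}\n")
        # Required bag declaration and payload manifest (RFC 8493).
        (bag / "bagit.txt").write_bytes(
            b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
        (bag / "manifest-sha256.txt").write_bytes(
            "".join(manifest).encode("utf-8"))
        # Zip with a single top-level bag directory ("crate/...").
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in sorted(bag.rglob("*")):
                if f.is_file():
                    zf.write(f, f.relative_to(bag.parent).as_posix())
    return zip_path
```

Running the check before packaging means a crate whose files were moved fails loudly instead of archiving an incomplete bag, and the checksum manifest lets the archive verify the zip's contents later.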