Storage for Research Data #3 - 21 July 2016
Formal Metadata
Title: Storage for Research Data #3 - 21 July 2016
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date: 2016
Language: English
Content Metadata
Abstract: Providing secure, trusted and reliable access to data storage during the course of research is critically important for minimizing the risks of unauthorized access and accidental loss, and for protecting personal and sensitive data. At the same time, research is becoming more international, with research teams and collaborators spread across many institutions and geographic locations around the world. Researchers and collaborators clearly need to access working data remotely, in the field and from different countries. In this third webinar of the Research Data Information Integration Webinar Series, a panel of speakers will discuss the data storage in use (or in development) at their institutions to support the management of data in the course of research.

00:00
All right, good afternoon everyone, let's start. Welcome to the webinar series on Research Data Information Integration. This is the third in our series, and today we're going to be looking at storage for research data. My name is Paul Wong and I'm your host today; my colleagues Susanna and [name unclear] are co-hosting today's webinar virtually with me. Research is certainly changing rapidly. One change is that there's a great emphasis on the accessibility and reuse of data, and on better management of research data, leading to better research in the long run. Now, a unifying theme of this webinar series is the idea of the research data management lifecycle. As I show you on the next slide, we have had two
01:04
previous webinars, one on research data management planning and one on ethics clearance for research data, so today we will be talking about storage locations for research data. Through this unifying theme of the research data management lifecycle, we want to get a better understanding of how research data information is integrated throughout the lifecycle. That means we need to look at the connectivity of the different enterprise systems that support the management of research data. Our
01:43
speakers today are Christopher [surname and title unclear] from Deakin University; Vicki [surname unclear], senior librarian, from the University of Newcastle; and [name unclear], manager of Research Data Services at James Cook University. So I'm going to pass the control to Christopher.
02:06
I was just going to run over what we're doing here at Deakin in this space of integrating storage with description and discovery. As I've loosely described it, I'll start the presentation; it's going
02:20
to advance for me. So we've got a fairly loosely coupled ecosystem to handle this at Deakin, which is great because it's a flexible design, but as I've said there, it can cause a lot of confusion in practice, and I've had a lot of problems getting researchers engaged with the fabric because it is quite confusing. You'll see what I mean with the diagram I'll present in later slides. So we're trying to disambiguate some of that and clarify what the tools are about and how they can actually assist, rather than inhibit, publishing data and using the storage; that's really what we're focusing on at the moment. In the describe space, we implemented ReDBox-Mint under Seeding the Commons and other ANDS-funded initiatives, and we call that Research Data Footprints: describe the footprint of your research. We've got the discovery layer, so that
03:17
repository isn't what we present to the world at large; we feed that into our research repository, which is called DRO, Deakin Research Online, and that's what Research Data Australia harvests for the individual records. The actual data that may be shared in an open way is made visible through a very simple, very basic portal called the Deakin Data Portal, which is basically just an Apache server on top of the data itself. I'll show a demo of all these things later, to expand on those screenshots. When we were implementing this metadata repository, we
03:58
also implemented a research data storage system, which allows researchers to provision storage themselves. We didn't have any strict requirements on that, so anybody can create a bucket to store data, but it is aligned with the data portal: when a researcher is ready, they can publish the data itself, and it will link those things together, and that's what allows it to be exposed through the data portal. So how does this all fit
04:25
together? This is the diagram I was talking about just before. We've got various components, and I'm sure most of you would be familiar with some of these systems in play, but basically the management system is the source of truth for project and party data around researchers, and that feeds the repository I was talking about. In the storage system, you can create storage and choose to link it to a project or not. We're quite flexible with that because we understand that the actual process of writing a grant can generate a bit of data before a successful outcome, so we didn't want to dissuade people from using the central storage that we've got on offer. It was also a carrot to stop people buying external hard drives and storing data locally on their machines, so having that resilient storage in our data centres was a pretty key point for that service. The rest of it is pretty familiar to most of you: we do DOIs against every dataset that's created and expose that through this fabric down the bottom. So it is a bit of a quagmire and does cause a bit of confusion, but the presentation layer, which is our focus at the moment, is limited, and then
05:40
it's just a bucket of data that we're presenting as a list, so the benefit to the researcher is limited. That's what our focus is on now: how can we better make people aware of the storage that is available and how it is intended to be used, and how can we better display some of the data that people are generating? At the moment I'm getting a lot of people creating storage containers or collections and just backing up their whole hard drive to them, with really no description or delineation of what they're storing. So it has really shown me that there's pretty poor practice out there in terms of how people structure what they're doing, and that's where our library staff are helping out a lot, in one-to-one or small-group discussions around how better to describe and manage data in the broader context. What I was also going to say is that we've got a portal at Deakin called Deakin Sync, and we're looking to provide some context there to what researchers are doing around storage. One of the ideas is that if a researcher has a successful grant outcome and hasn't linked storage to that project already, we present them with the option of creating storage, because we've got all that metadata and can actually leverage quite a lot of it. So with that portal we can provide a lot of value and direct all the researchers there to say: you may want to be creating some records, because we can see the project's been running and it's near the end of its lifecycle; or, at the earliest stages, you can create storage to put the data in that you're planning to generate with that project. The other options in the
07:23
presentation layer that we're looking at are discipline-specific or quite aggregated systems that allow you to display data for various different disciplines. We're only just starting to look at how we can integrate these things into this platform, this ecosystem. Those things are like Omeka, for all the different disciplines that may want to create collections, manage them themselves and use that as their presentation layer rather than just a bucket with an Apache index on top of it; figshare and MyTARDIS, MyTARDIS being around image data and figshare being quite general, and we're looking at figshare for institutions and how that could potentially play a part; or Mediaflux. We're still really investigating all the different options. So that's the ecosystem, and I didn't want to go into
08:10
too much detail on that; I really want to show you how it all functions. This is
08:15
the ReDBox system we have, and most people would have seen that in the past.
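The linkage Chris describes, where storage may optionally be linked to a project and Deakin Sync could prompt a researcher who has a funded project but no storage yet, can be sketched as a small data model. This is a minimal, hypothetical sketch: the class names, fields and the `grant_successful` flag are assumptions for illustration, not Deakin's actual schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model for illustration only; not Deakin's real schema.


@dataclass
class Project:
    project_id: str
    title: str
    grant_successful: bool  # assumed to be fed from the research management system


@dataclass
class StorageContainer:
    name: str
    kind: str                         # e.g. "network_file_share" (hypothetical label)
    project_id: Optional[str] = None  # linking storage to a project is optional


def projects_needing_storage(
    projects: List[Project], containers: List[StorageContainer]
) -> List[Project]:
    """Return funded projects that do not yet have any storage container linked to them."""
    linked = {c.project_id for c in containers if c.project_id is not None}
    return [p for p in projects if p.grant_successful and p.project_id not in linked]


if __name__ == "__main__":
    projects = [
        Project("P1", "Audio interviews", grant_successful=True),
        Project("P2", "Sequence data", grant_successful=True),
        Project("P3", "Grant in preparation", grant_successful=False),
    ]
    containers = [
        StorageContainer("interviews", "network_file_share", project_id="P1"),
        StorageContainer("grant-drafts", "network_file_share"),  # not linked to any project
    ]
    for p in projects_needing_storage(projects, containers):
        print(f"Suggest creating storage for {p.project_id}: {p.title}")
```

Keeping `project_id` optional mirrors the point made in the talk: storage created while a grant is still being written does not have to be tied to a project, and the prompt only targets funded projects with nothing linked.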
08:19
It allows you to create the data
08:24
descriptions, as we are all well aware. What I want to show you is the process we go through for each of these, and how the DOIs are linked into the
08:38
actual data portal side of things. So
08:40
the process is: they create a
08:43
metadata record, and then when they're ready to publish the data they click
08:47
Publish in the store, which I'll show you in a minute, and then the links for that come into here
08:54
and it's published: it publishes this data portal link. You may be able to see the URL on the screen down the bottom, which keeps those two things in check. Then when you go to view that actual
09:06
data collection, you can see it on this data portal; this one is the interview data for some Papua New Guinean audio interviews. So we replicate
09:15
the metadata from that Footprints record
09:19
and actually show the contents here, to
09:21
be able to download them if you want to, but it's very, very basic. There's no packaging of that, which would be really ideal, and there's no thumbnail sort of view of
09:31
that. So really, in that first example there, you're just downloading 800 megabytes before you can actually understand what it's all about. Exporting the metadata of that MPEG file, in this case, is not really done at this point, and that's where I want to get some improvements, to present that better. The data store is this system
09:48
here: it's just a web application that
09:50
hooks into the corporate storage we have available, and what we've done is provide four collection types. We allow our researchers to create those
10:01
activities, and they can link those to a
10:04
project, and then they can create these buckets to store things. They can create a traditional network-attached file share, which is these little yellow icons, and they can create any number of those; there's a nominal limit of 10, but they can create any number, and they aren't limited in size: they can put as much data in there as they like. In terms of technology, that uses our Isilon storage, which means snapshots are taken three times a day, with one snapshot at the end of the day kept for three months, so they've got complete ability to restore files and manage their data very flexibly. There is another one called a publishable file share: when they're ready to publish data they can create one of those. It's no different in terms of the technology, but it allows you to hook into the actual Footprints record, and then that little data portal link happens. The other one there, with the little star, is the icon for the sync-and-share service. We're providing a Dropbox-like service because we need it: a lot of researchers are working with external parties and they've got a lot of issues sharing data externally, so they can now use this to do that. It uses our own on-premise storage with a sync-and-share platform on top, so it gives them unlimited storage, although unlimited in the sense that you need the storage on your local computer; that's really a function of how it works. But it has been taken up quite rapidly, because people really want that capability without having to pay for a Dropbox account and use that storage. The other collection type, which I don't have in this demonstration activity here, is a wiki space: we've got a Confluence wiki instance which they can use for collaborative work internally. And so the store, or the research data
11:47
store, has really gone from storage, as in storing data, to actually a store, as in where you get things. That's going to expand: we'll be providing a whole lot of other services through this research data store, so blog engines, Omeka instances and a whole lot of different things will be provided through this one portal for researchers, and it will all be tied together under this activity, or this project. So a particular
12:10
example I was going to show you is the Pacific sea stone; Marquis has got
12:16
some sequence data that he's produced, and he wanted to make it open, so he's gone ahead and published that. He's
12:24
created the Fedora
12:28
repository record through our Footprints system, and then he wanted to share that
12:33
to the world. Originally he was working with the library and they stored the objects within the repository, which wasn't great, so now they're provided through the
12:45
data portal, and so you can download the
12:49
gigabytes, or megabytes in this case, of files. One thing I'm advising researchers is to really be descriptive about what that is. I'm sure people in his discipline understand what all those different file formats are, but it doesn't really have an overview, a sort of README file that could describe it better, so we're working with them on that. And that's presented with that hookup
13:11
through that link there, and also
13:14
I think it's available here, so you can actually be taken straight to that record. The DOI is mapped through to
13:21
our repository, so Footprints really is just a collection gateway that links those things together and allows the record to be curated as accurately as possible. So really, that's all I was
13:34
wanting to cover off today. Can Chris talk more about the publishing function? Absolutely.
13:41
So really, it's what we call
13:44
"publish", but it really does
13:46
function as a link between the two systems. They create a publishable file
13:51
share, which is just a network-attached storage location; everyone should be fully familiar with
13:57
network-attached storage, like a network drive, and so they would just have a folder like this to store things. Let's just say this "workshops" one, for example, was something they'd stored; they would structure their data within that space, which is completely offline: it's not exposed to anyone other than themselves. Then when they're ready to publish
14:14
the data... I'll just show you something. This is UAT, but when they're ready to publish the data they can then click a Publish
14:24
button, quite simply, against this particular
14:29
folder, which is fictitious because it's UAT. When they're ready to publish, they
14:34
literally do that, and it will then look at all their Footprints records and provide a list of the ones that haven't
14:40
been published against. They can just choose one: in this example I've already published against this other one, but I could potentially publish against this one here. Then I can provide global access, so anyone could get access, or I could restrict it to AAF members, so you could limit it down to anyone who's a member of the AAF; that makes it a sort of semi-open collection. Then within a few minutes that collection would be exposed through the data portal I showed you before, so you
15:09
would see it appear here, or, if it was restricted, more would be exposed once I'm logged into the system. So anyone in Australia
15:22
can log into this through the AAF, as you can see. So that's how that's working. All right.
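The publishing step demonstrated above can be sketched as follows, assuming a simplified model in which each Footprints record has at most one publishable file share linked to it and access is either public or restricted to AAF-authenticated users. All names here (`FootprintsRecord`, `publish`, `visible_on_portal`) are hypothetical, the one-share-per-record rule is an assumption, and the sketch ignores the few-minute delay before a collection appears on the portal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

# Hypothetical sketch of the publish workflow; not the actual Deakin data store code.


class Access(Enum):
    PUBLIC = "public"  # anyone can view and download
    AAF_ONLY = "aaf"   # only users who log in through the Australian Access Federation


@dataclass
class FootprintsRecord:
    record_id: str
    title: str
    published_share: Optional[str] = None  # path of the publishable file share, once linked
    access: Optional[Access] = None


def unpublished_records(records: List[FootprintsRecord]) -> List[FootprintsRecord]:
    """Records offered in the Publish dialog: those with no data published against them yet."""
    return [r for r in records if r.published_share is None]


def publish(share_path: str, record: FootprintsRecord, access: Access) -> FootprintsRecord:
    """Link a publishable file share to a metadata record with the chosen access level."""
    if record.published_share is not None:
        raise ValueError(f"{record.record_id} already has data published against it")
    record.published_share = share_path
    record.access = access
    return record


def visible_on_portal(records: List[FootprintsRecord], aaf_login: bool) -> List[FootprintsRecord]:
    """Collections the portal lists for an anonymous or an AAF-authenticated visitor."""
    return [
        r for r in records
        if r.published_share is not None and (r.access is Access.PUBLIC or aaf_login)
    ]


if __name__ == "__main__":
    records = [FootprintsRecord("R1", "PNG audio interviews"), FootprintsRecord("R2", "Workshops")]
    publish("/publishable/workshops", records[1], Access.AAF_ONLY)
    print([r.record_id for r in unpublished_records(records)])                   # ['R1']
    print([r.record_id for r in visible_on_portal(records, aaf_login=False)])    # []
    print([r.record_id for r in visible_on_portal(records, aaf_login=True)])     # ['R2']
```

Treating access as a property of the published link, rather than of the storage folder, reflects how the demo presents it: the folder stays private until the researcher publishes it and picks an access level.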
15:31
Questions: a whole bunch of questions have been coming in as we do that. The first is: what's the maximum storage space a researcher can request? Is there a maximum? No, it's unlimited. Our IT group manages the growth and the capital acquisition that has to happen, and they deal with that as it goes, so yes, it's completely unlimited. The next question probably ties into that: what's the cost of the implementation you have at Deakin, particularly the data storage costs? There's no explicit cost; it's covered under our central capital expenditure on storage, so it's just factored into all the storage that the university buys, and there hasn't been an explicit cost for this particular service. At the moment we're up to about a hundred terabytes, with another 60 at another site, so nearly 200 terabytes is what we're looking at. So it's not overly large; we don't have any astrophysicists with a petabyte in their back pocket, so it's probably relatively small compared to most institutions, but it's covered under that. They provision it under systematic procurement throughout the year, and they're always negotiating a new price for that storage, so I don't have to worry about that, which is a luxurious position to be in. Tying into that, a couple of questions sort of meld together. One is: can users external to Deakin use this storage, given that most collaborations are national or international? Another, very similar question asks whether this service is going to be available to researchers at other universities, and whether there are storage size limitations for them. The first part is covered by the sync-and-share service: a Deakin identity can provision it and share it with colleagues at other institutions, but there are limits to that, because if you're synchronising to your own computer, you need the hard drive storage on your own computer. For the traditional network-attached storage, any Deakin identity can access it, because they can create a VPN connection, but external people cannot. The way that's traditionally been handled at Deakin is that we make those collaborators visitors to the University, and then they get access to the storage. It's a little bit cumbersome, but most people know how to work around that and follow that process. And to the last question: no, non-Deakin people wouldn't be able to create the storage space in the first place; it really has to be instigated from Deakin's side. OK, another one: are researchers able to do DOIs by this publishing method? Yes: in the Footprints system, that's where the DOIs
18:40
are minted, and that's done by the library.
18:43
So the library, when they're performing quality checks on the description, are
18:47
performing a step of minting it. So it's done implicitly, in that the workflow of a metadata record is curated by the library and they're the ones actually doing it; effectively it's a business-to-business transaction that happens on every one of those records. So researchers themselves don't, but the library does it on their behalf. OK, and then probably the last one, so we can keep to time: is all the data stored in Deakin infrastructure, or in external infrastructure, e.g. Azure or a local research provider? Yes, it is all in our own infrastructure: across our four main campuses we've got two data centres, and the data is stored within those data centres and replicated across the two. We haven't engaged with RDSI-provisioned storage; it's all purely on-premise, which our researchers like, because, particularly if it's sensitive data, they can check a lot of boxes in terms of the compliance they need. Thank you for all those wonderful questions. Our next speaker is Vicki, so I'm just going to pass the control to Vicki. Thank you very much.
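The DOI arrangement described in that answer, where researchers never mint DOIs themselves and the library does so as part of its quality check on each record, can be pictured as a small state machine. This is a minimal sketch under stated assumptions: the states, the function names and the injected `mint_doi` callable are hypothetical stand-ins, not the actual ReDBox workflow or the DOI service Deakin uses.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional

# Hypothetical curation workflow; the minting service is injected as a plain callable.


class State(Enum):
    DRAFT = auto()      # researcher is still writing the description
    SUBMITTED = auto()  # description handed to the library for quality checks
    PUBLISHED = auto()  # library approved it and a DOI has been minted


@dataclass
class DatasetRecord:
    record_id: str
    state: State = State.DRAFT
    doi: Optional[str] = None


def submit_for_review(record: DatasetRecord) -> None:
    """Researcher hands a draft description to the library."""
    if record.state is not State.DRAFT:
        raise ValueError(f"{record.record_id} is not a draft")
    record.state = State.SUBMITTED


def library_approve(record: DatasetRecord, mint_doi: Callable[[str], str]) -> str:
    """Library QA step: approval and DOI minting happen together, so the
    researcher never calls the minting service directly."""
    if record.state is not State.SUBMITTED:
        raise ValueError(f"{record.record_id} has not been submitted for review")
    record.doi = mint_doi(record.record_id)  # stand-in for the real minting service
    record.state = State.PUBLISHED
    return record.doi


def fake_minter(record_id: str) -> str:
    """Placeholder minter for local testing; the identifier is not a registered DOI."""
    return f"10.5072/example-{record_id}"


if __name__ == "__main__":
    record = DatasetRecord("R42")
    submit_for_review(record)
    print(library_approve(record, fake_minter))  # 10.5072/example-R42
```

Injecting `mint_doi` keeps the workflow testable without contacting a real registration service, and it makes explicit that minting is a side effect of library approval rather than a separate step a researcher can trigger.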
20:01
opportunity to share what we're doing here at Newcastle. So I'm just going to talk about "from data to discovery" in terms of our research data storage and
20:13
the connections that we have. So, in telling the Newcastle story, I'll tell you a little bit about the systems and tools that we have, and then talk about the three workflows that make up those systems and tools, and the connections and integrations between them. So to tell the story, I'm just
20:32
going to introduce you to the systems and the tools in this space at Newcastle. For research data storage, we have ownCloud. For data archiving and publishing, we're using a software app that was created to run on ownCloud, called Cr8it. For data management and registry, the metadata curation workflows, we're using ReDBox and Mint, similar to what Chris was just describing. And for publishing and discovery, we're doing that via our institutional repository, which is NOVA here at Newcastle. So I'm just going to
21:13
talk a little bit about the workflow and how they all connect, and just describe that to you. After I've done that, I've got two short videos that show it in action, so rather than me just telling you about it, they'll actually show it too. Unlike Chris, I wasn't keen to do a lot of live demonstration, because that would probably go wrong, so I'm using videos. So these are the three workflows, and I'm just talking about the connections between them. First, research data storage: that is our ownCloud, which is the Enterprise version 7 we run at the moment, and it's a petabyte. On that we have this app, which is Cr8it. Cr8it was developed in 2013; our work on Cr8it was born from work that Peter Sefton was doing at the University of Western Sydney, or Western Sydney University, I should say. At the time it was a collaboration between the University of Newcastle, Western Sydney, and Intersect doing the development, and in those early days the University of Sydney as well. Cr8it was about a problem that we had identified: we wanted a connection from the research data storage into our data management and publishing workflow, ReDBox and Mint. Since it was developed, way back in 2013, there have been a few development cycles along the way, a few sprints of agile development, to get it to where it is now, and there's also some future development coming, which I'll tell you about at the end. So, in that first workflow, research data storage is what's sitting there. In the data management and publishing one we have, similar to Chris, ReDBox and Mint. ReDBox is a metadata store with a descriptive curation workflow, and it's hooked up to Mint, which is our name authority service for our party records (our staff members, our researchers) and also for our... sorry, that's what I was
looking for: information about our grants. And then that's connected to NOVA, which is for discovery. So I'll just run through it quickly. In the research data storage workflow, that first one there, what researchers, or users of it, do is log into the ownCloud we have and create a crate. A crate is a data package: they add files to that crate, and the files are the working files that they have in ownCloud. So they add them to the crate, and then they have the opportunity in Cr8it to also add metadata. From there they can review the metadata, and then they can publish the crate. When they publish the crate, a couple of things happen. One of those is that it comes to the library, into the next workflow, data management and publishing, and the researcher gets an email that has a lot of that metadata in it. Then, in the data management and publishing workflow, the one sitting in the middle there, that's where the library works on the metadata alert that's come across. So that alert arrives in that system, and the library works on augmenting the metadata: we add metadata, and we probably have more conversation with the researcher to work out permissions, and probably more on descriptions. When we're happy with it, we publish a record, and that goes across to NOVA for discovery, and up to Research Data Australia. So this is a highly
25:18
sophisticated schematic diagram; it's just a way of very simply demonstrating what's happening. So we've got ownCloud: the researchers are in there with their storage, their working files. I should say that ownCloud is just one of the storage options we have at Newcastle, but if you want to have the connections to publishing, ownCloud is where we have the ability to do that. So from the Cr8it tool, when a researcher uses Cr8it and they publish, or submit, a data crate (they press the button, which I'll show you shortly), two things happen. First, our metadata alert goes across to the ReDBox system, and it's like part of a record: it's an alert carrying the information that's been collected by the researcher working in Cr8it. The second thing that happens is the data crate itself: it's a zip file, and it uses the BagIt specification that came out of the California Digital Library to package that data set, and that crate goes into our storage layer. So the metadata alert goes across and is ingested into ReDBox, more work happens there in ReDBox, and then from ReDBox we send MARC, DC or RIF-CS across to NOVA. Embedded in that, travelling with it all the way through the process from that metadata alert, is the URL to the data crate in the storage layer. Our institutional repository has a proxy interface into the storage layer, so it's able to be the gatekeeper for access to the data: if it's publicly available, it's only publicly available through that route. So just quickly, this is a very short
27:28
two-minute video, quickly demonstrating what I've just told you in terms of ownCloud and Cr8it. So there's a researcher, or a
27:36
little team: they see all their files in ownCloud. They're able to toggle across, and they'll see the app that's called Cr8it. By default they have a default crate for their data, but
27:49
they can create a new one. So I'm going through the process of creating one: you click, this is my study on rainfall (there's my typo), and I click to create, and now I have a crate. This just told me at
28:14
the top there, in yellow, that I've got a new crate. Now I'm toggling back to my files there, and now I can add,
28:22
by right-clicking: Cr8it will let you add to the crate. So I'm just adding in my data dictionary, my population information on frogs, my environmental information, and I've got some images. Basically you pick and choose what it is the researcher is packaging up to go into that data crate, and it tells you as you go that it's adding things to the crate. So we'll go
28:54
back to Cr8it, and we'll see the files
28:58
have gone into our crate. Now, over on the right-hand side, the researcher or user has some ability to add metadata around those files that will go with that crate, and that goes across as part of the record when it's published. So there's a few things we ask for: information like the title and the creators. We're just adding them now; that's hooked up to our Mint system, so it's actually doing a type-ahead lookup against Mint and bringing the details back. It's hooked up to Mint again for searching for grants, so we can select the grant and pull all that information back in, and so forth. There's some work going on around what metadata should actually be here. There's
29:44
a feature to check the crate to make
29:46
sure that all the items are valid and
29:47
still there. When you want to, you hit the button to submit, and you get to review all the metadata that you've entered. At this point you can go back and change it, or you can hit the submit
29:58
button (it's a big button, so it's easy for researchers). That sends all the crate information to the library, and you can also send an email to other people
30:10
that you work with, to say that that's what's happened. So that's how Cr8it is running on the research side. Researchers can also get a copy of the data crate for themselves, and also, as you'll
30:26
see if you're watching, the submit button has done two things: it has sent that crate, with that data set, to storage for archival purposes, and then it's actually
30:39
sent it across to the library. So this is our ReDBox instance, which is not public; it's only visible to library users. I'll just start the process to show you what happens here. When you're logged into the system, the very first thing you see is the alert, for the study on green frogs. So that's arrived from the source, ownCloud via Cr8it: it's telling me, the library, that it's come from storage and that it's arrived. So we start the process of
31:11
looking at that record: we go into it and start working on it, and, as Chris
31:17
showed before, there's various things the library works on, so we can add lots of information there through a conversation with the researcher as well. So that's basically how it works. This is
31:28
just demonstrating that the information from the crate comes over and is populated, as supplied by the researcher, into various sections in this system. When
31:39
we're finished, we get the button to publish the record. So this is where we do it: the record is published across to our institutional repository, and it just shows that it's actually being published. So finally,
31:52
after the publish button, it arrives in NOVA; behind the scenes it's also sending RIF-CS over, and it's harvested from there: we harvest it up to Research Data Australia. So, as I've been saying, that's the process: those are the three workflows and how they connect, from the research storage in ownCloud, through ReDBox and Mint in the library, to discovery on the other end via NOVA, with its connection back into research data storage. So lastly, I'd mention that, as I've said, there's been a number of iterations of development of the Cr8it tool, and there's currently funding for development and enhancements to it, which will be trialled with CloudStor Plus. So there's a group of us working on that: AARNet, Intersect doing the development, Western Sydney University, and us here at Newcastle, which is great, because we've been working on this for a while. So that's the end of my presentation. Thank you very much. Fantastic, thank you. If anybody has any questions, can you... oh no, that's
33:13
what they're already doing. It says: once the project is complete and all the data sets are packaged up and published in archives, how do you ensure researchers go back and delete all the remaining redundant files? When you're sucking the files straight into a crate, Vicki, does it track where they are? If a researcher moves them around, does that then become disconnected from the crate that references where those files are? So I presume the advice would be to structure the location pretty much where you're going to have it, set it, and not change it too much. Yeah. All right, so thank you very much
34:30
for listening to me this morning. Today I'm going to focus, just like everybody else, on accessing and exposing research data at JCU. In terms of storage, we have
34:43
quite a few different options that we make available to different researchers. We have HPC: all researchers can apply for an account on the HPC, and depending upon what they want to do, they can use it just for storage or also for compute purposes. JCU is very fortunate to be an RDSI node, so this gives us two petabytes of disk storage here, and access to the RDSI storage is available through an application process. We tend to encourage people who want access to larger disk storage to apply for an RDSI allocation. The other storage we have is a system called Research Data, which is really a ReDBox; it's publicly exposed, and it's designed for completed data sets. There's a self-submission workflow that users can go through, and they can attach files with a total size of up to 50 megabytes, so this is typically things like Excel spreadsheets and zip files. I'll just move on to my next slide. The other thing I'd say is that every researcher can also store files on a system for files that need to be kept private, and we expose them in different ways as well, depending upon which system the users use. For access, again, HPC standard access applies: SSH, SCP, SFTP. Some of this can be challenging for some users, so we try to use other systems to make access to the storage easier, and this has been very helpful to us. With RDSI storage, we can mount that on the HPC for processing or compute access. We have quite a large number of users here at JCU using Aspera Shares. For those of you who don't know, this is web-based access to our RDSI storage, and this can be for tens of terabytes of data if you wish. This has been very helpful to some users in that, if they're at a location where connectivity is poor, Aspera Shares has been able to give them good
throughput in terms of uploading their data and accessing it. There is also functionality to provide sync using Aspera, but, as Chris pointed out earlier, it's dependent upon you having the local storage available, especially if you deal with many terabytes of data. Mediaflux gives us lots of options. We're focusing on the portals functionality for Mediaflux, and we're currently working with Arcitecta on improving this, so it's a way that we can quickly create a mini portal to expose research data and to have our access restrictions on that. We can also create virtual machines to expose research data via different websites, depending upon the projects or the requirements of the user. And, as I mentioned earlier, the other option is Research Data: you can attach up to 50 MB and expose it. So this is where we tie it all together. Mostly at JCU, the system for exposing data is Research Data, which is our ReDBox instance, so it's publicly available, and there's a feed that happens once a week where ANDS harvests the records for Research Data Australia. There's another system called the JCU Research Portfolio, and records from Research Data are displayed under a tab in the Research Portfolio. This is to provide information about JCU researchers, but also to see what sort of research data is available from those researchers, and the information in the Research Portfolio is built using the JCU research management system. Maybe I'd just like to give a bit of a quick demo, if I can switch to mine, just to try and show you how it all
39:29
ties in. Here's our publicly facing ReDBox instance. I've pre-searched for a record that I know has got some
39:38
links to data. So we're just reliant on the
39:42
researchers adding URLs to expose where the data may be, and in this
39:49
example here, there's a public link to where the actual publication has been made, and the data is stored with that publication. Also, here there's a link to the research data itself, which is actually sitting on our HPC, so the user can then download the zip files. Again, for data on our RDSI storage, we can expose that data using a similar method. So let me show you just the JCU domain.
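The Research Data records described here carry data in one of two ways: files attached directly to the record (up to 50 megabytes in total) or URLs the researcher adds, pointing to where the data actually lives, such as the HPC or RDSI storage. The sketch below is a hypothetical helper for making that choice; the `plan_record_data` name and its return shape are invented for illustration, and it assumes "50 megabytes" means 50 × 1024 × 1024 bytes, which the talk does not specify.

```python
from typing import Dict, Optional

# Assumption: "50 megabytes" is read as 50 MiB; the talk doesn't say which definition applies.
ATTACHMENT_LIMIT_BYTES = 50 * 1024 * 1024


def plan_record_data(file_sizes: Dict[str, int],
                     external_url: Optional[str] = None) -> Dict[str, object]:
    """Decide whether a record's files can be attached or must be linked.

    file_sizes maps file names to sizes in bytes. If the total fits within the
    attachment limit, everything is attached; otherwise the record needs a URL
    pointing at where the data is stored (e.g. on HPC or other storage).
    """
    total = sum(file_sizes.values())
    if total <= ATTACHMENT_LIMIT_BYTES:
        return {"mode": "attach", "files": sorted(file_sizes), "total_bytes": total}
    if not external_url:
        raise ValueError(
            f"{total} bytes exceeds the {ATTACHMENT_LIMIT_BYTES}-byte attachment "
            "limit; add a URL to where the data is stored instead"
        )
    return {"mode": "link", "url": external_url, "total_bytes": total}


if __name__ == "__main__":
    small = {"survey.xlsx": 2_000_000, "images.zip": 30_000_000}
    print(plan_record_data(small))
    large = {"sequences.tar": 5_000_000_000}
    print(plan_record_data(large, "https://hpc.example.edu/data/sequences.tar"))
```

Failing loudly when an oversized submission has no URL reflects the reliance described in the talk: the record can only expose large data if the researcher says where it lives.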
40:18
So this is the Research Portfolio. If you go to the address, it redirects to here, and you can search for a researcher; I'll just use Jeremy VanDerWal,
40:29
who has lots of records. And, as I
40:32
said, if they have any data in our ReDBox system, Research Data, this tab will be generated, and you can see the records from in here. So what
40:43
we can do is then click on the record
40:45
I'm live, so let's have a look. Here we go. Sorry, this is
40:52
just a listing of the information you would see in ReDBox, and if you wanted
40:56
to, you can go off to the actual ReDBox. Actually, this is the actual data: here's just a directory listing of the data that you can download, as we've all seen before. But
41:06
you know, here are just the records that Jeremy has in Research Data Australia. I'll just
41:13
pick, say, some of this bird information,
41:16
click on data provider
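Records like these reach Research Data Australia through the once-a-week feed mentioned earlier in the talk. The talk does not describe the mechanism; assuming it is an OAI-PMH endpoint on the ReDBox instance (a common pattern for this kind of harvest, but an assumption here), a weekly incremental harvest amounts to a `ListRecords` request bounded by a `from` date, followed by requests carrying each `resumptionToken` until an empty one marks the end of the list. The endpoint URL, the `rif` metadata prefix and the helper names below are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def weekly_harvest_url(base_url: str, today: date, metadata_prefix: str = "rif") -> str:
    """First request of a weekly incremental harvest (records changed in the last 7 days)."""
    since = today - timedelta(days=7)
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # assumed prefix for RIF-CS records
        "from": since.isoformat(),          # day granularity: YYYY-MM-DD
    }
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url: str, token: str) -> str:
    """Follow-up request: resumptionToken is exclusive, so only the verb accompanies it."""
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one response page and the next token, if any."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


if __name__ == "__main__":
    base = "https://redbox.example.edu/oai"  # hypothetical endpoint
    print(weekly_harvest_url(base, date(2016, 7, 21)))
    # prints https://redbox.example.edu/oai?verb=ListRecords&metadataPrefix=rif&from=2016-07-14
```

Harvesting by `from` date rather than re-fetching everything keeps each weekly run small: only records created or changed since the previous run come back, provided the job really does run every seven days.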
41:18
So, same links, similar topics. I think
41:21
I'll leave it there. I'd like to open it up for questions, so please send them through. But while we're waiting for ones to come through for Jay, I'll go back to one of the ones that was for Vicki, which was: how much training and support do you offer your staff in terms of Cr8it? So what we did: we kicked that off a little while ago, actually it was last year. We ran a workshop introducing ownCloud and trialling Cr8it, with researchers coming in for the purpose of giving feedback, so we did a lot through that session. We have online help and a guide, in context. At the moment we've got between 400 and 500 users, and as part of that process they have to go through an orientation session; there's a session that they go to, and we're trying to transfer that online, to get some information to them to orientate them, so that when they get access to it they know what to do by the time they start. Yeah. OK, one back now for Jay: do you link AAF credentials with LDAP for HPC? No, we don't. Our HPC is restricted to JCU researchers or people who are enrolled at the university or work here. I guess that question is probably asked around data access, so I could talk about Aspera Shares a little bit more: we can provide a wider range of access to data that's exposed via that system. Our Aspera Shares is hooked into an LDAP that's managed by QCIF. For those of you who don't know, QCIF manages QRIScloud, and JCU works closely with QCIF. They have a portal: anyone who is a member of the AAF can log on to QRIScloud with their QRIScloud credentials, and provided they're then given access to an allocation, they can log on to our Shares machine, or the one based in Brisbane if their storage is there, and access it that way. Via Shares we can give people access from outside the University, but
also, I suppose, it gives me a mechanism to provide access for people overseas as well. OK, and another one for Jay: do you also have SSH/SCP-type access to the Aspera Shares, or is it web only? Good question. Aspera Shares is web only. With the infrastructure underneath, it is possible to get SSH or SCP access to that storage; we usually do it by mounting it on, let's say, the HPC, and accessing it that way. We haven't actually exposed the two servers that sit behind Shares and manage its infrastructure; it is possible to expose them that way, but it has not been done. I can add to that from our perspective: we have an interactive box that's attached to our storage, and that's how people can get SCP/SSH access to download, and they can run technical tools like screen or whatever. So if they've got a sequence file they want to download and it takes a while, they can just set it going and come back later. That's been taken up quite well. Now this one's back for Vicki. Vicki, it asks: is the crate owned by an individual or a project? From the interface, I would guess an individual. It belongs to an individual. Jay, you mentioned Mediaflux: have you been able to get this operational? That's right. OK. We've spent a lot of time working on Mediaflux, and I mentioned that our main focus has been on the portals functionality. We've found so far that the presentability of those portals isn't very good with
regards to being able to customize CSS, but we've been working closely with them, and I think development is about to start very soon that will allow us to have full control of CSS inside those portals to expose the data. I do have a couple of data sets in Mediaflux, but I guess I'd say watch this space; that's all I can say. I think it has great potential, but you need some developer resources to spend more time working on it, and that will probably be me. OK, here's a question for all the speakers. It says: what sort of processes or services do you envisage developing on top of the storage service? There was a question around data deletion; would data curation be a process or service to build on top? So perhaps, Vicki, if you wanted to start on that. Could you just repeat the question for me, please? Sure. It says, for all speakers: what sort of processes or services do you envisage developing on top of the storage service? There was a question around data deletion; would data curation be a process or service to build on top? So I'm not really going to come in on that with anything concrete. What I would like to see, in terms of what we already have here, looking into the future (not with a crystal ball, but I'd say less than five years and more than two), is a lot of the practices around data storage and publication absolutely streamlined, with less involvement by individuals and a lot more automation. So hopefully that's the sort of service we should put more time and effort into: processes that actually automate and take ourselves out of the way of the researchers, so they're more in control. That's really what I would like, but it's probably not really answering the question that was asked. How about you, Jay? Interesting question. I guess we're probably
not there yet here at JCU. For instance, in ReDBox we're capturing the time frames for which data would be retained, but we're not acting on any of that at the moment. As for the data curation side of things: particularly when we find records submitted via ReDBox, our librarian is reviewing the records, but then we also have a look inside the spreadsheets and things, to see if the columns are neatly labelled and that people can understand the data that's in there for external use. So yeah, I think we're not quite there yet either with regards to those sorts of issues. OK. Well, we're sort of similar to that. Most of the energy has been invested, as I was saying earlier, in making people aware that the service is there, how we would advise for their discipline, and that they could use it. I think it'd be a luxurious position to be in, to focus on the latter that the question talks to. In terms of curation as well, really the direction I've been providing is to say: you need to be working with best practice in your discipline, and if you're unaware of that, then we can work with you to come up with something. Apart from that, I'm sure preservation is a deep area; you could do a PhD on it. At the moment we're really telling people: look, stick with common denominators. Don't go too bespoke; if you're going bespoke, you need to think about the environments that you would potentially need to access that in, in five years' time. So if you're choosing a vendor for your data analysis or capture, that vendor may go bust, or the technology may change three or four iterations, and you may not actually be able to use it in the future. So that's something you need to consider. We haven't really got a lot of time. Yeah, we're just about out of time here, so I'm going to ask a very, very small question, which is for all three of them: can students, especially PhD students, access these
services to store their research data? Yes: Jay's going yes, Vicki's going yes, and Chris is going yes. Fantastic, a wonderful way to finish. All back to you. Thank you for those wonderful questions, and thank you to all the speakers who provided their insight and important experience, which
51:21
has been very thought-provoking, certainly.
