FAIR Accessible #2 - A for Accessible I - 06-09-17

Video in TIB AV-Portal: FAIR Accessible #2 - A for Accessible I - 06-09-17

Formal Metadata

Title
FAIR Accessible #2 - A for Accessible I - 06-09-17
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
2017
Language
English

Content Metadata

Subject Area
Abstract
The FAIR data principles were drafted by the FORCE11 group in 2015. The principles have since received worldwide recognition as a useful framework for thinking about sharing data in a way that will enable maximum use and reuse. This webinar series is a great opportunity to explore each of the 4 FAIR principles in depth - practical case studies from a range of disciplines and organisations from around Australia, and resources to support the uptake of FAIR principles.
So welcome, everybody, to the second in this series of webinars about the FAIR data principles. Today we're up to A for Accessible. Last week we talked about the first one, Findable; today it's Accessible; next week we'll talk about Interoperable, and the week after that about Reusable.

First of all, let me introduce myself. My name's Keith Russell, I'm from the Australian National Data Service, and I'm your host for today. A big thank you to Susanna Sabine in the background, who is organising and co-hosting this webinar with me.

Just as a bit of background: the Australian National Data Service works with research organisations around the country to establish trusted partnerships and reliable services, and to enhance capability in the research sector, in order to add value to research data. We are working together with two other NCRIS-funded projects, RDS and Nectar, to create an aligned set of joint investments to deliver transformation in the research sector.

So we have three speakers for today. I'll do a quick kick-off and give a very brief introduction to what the FAIR data principles say about accessible, and then I'm really excited, and very grateful, to have two other speakers today. David Fitzgerald is in this webinar; he doesn't have a webcam, which is why you can't see him at present. David is the Data Manager at the Australian Longitudinal Study on Women's Health, and he's going to talk about how the study makes the data it provides accessible. I was especially interested in this perspective from the angle of sensitive data, and making sensitive data accessible. The other speaker for today is Jingbo Wang from NCI. I've asked Jingbo to talk a bit about how NCI makes its data accessible using services over the data, so that it can be interrogated and used by humans.

So first of all, I'd like to give a brief introduction to the A in the FAIR data principles. The A stands for Accessible. The way FORCE11 described the principles is that data and metadata are both retrievable by their identifier using a standardised communications protocol. The identifier here is the one we talked about last week: it can be a DOI, a Handle, a PURL, something that's persistent. By using the DOI, Handle or PURL you should be able to get access to the data or the metadata.

The protocol to get there should be open, free and universally implementable. The thing to think about is that it's a protocol which is standardised and can be used by anybody. It's not something that's bespoke, home-built or badly documented. The classic example is HTTP, a very normal way of accessing materials and data over the internet. It should not require some specialised, expensive software.

Another point they make in the data principles is that the protocol should allow for an authentication and authorisation procedure where necessary. There is a common misunderstanding here: when people read "accessible" they think, "Oh, that means I have to make my data open." If you actually read the FAIR data principles, that's not what they're saying. What they're saying is that accessible does not have to mean open or free, but you are expected to give the exact conditions under which the data are accessible. So even heavily protected and private data can be made FAIR. If you implement the FAIR data principles properly, a human being can see that the data is maybe not openly available, and what steps they need to take to get access to it. And because the data principles also talk about machine access to data, if a machine goes hunting around looking for the data, the machine should be able to recognise that the data is not open and what steps need to be taken to actually get to the data. I'll talk about that a little further. If the user, either the human or the machine, has been granted access to the data, then it should be accessible through some sort of standard authentication and authorisation procedure.

The last point they make under the FAIR data principles about being accessible is that, in the case in which data is no longer available, at least the metadata should be accessible. This is of course not ideal, but in some cases it is necessary to take the data down. That could be if consent for use was only for a limited period of time, or maybe there's been a legal takedown notice, or something along those lines that really makes it impossible to keep the data available. In that case it is valuable to still keep up a metadata record describing the data and explaining that the data is no longer available.

Now, just to reinforce that accessible does not always have to be open: there are clear cases in which data cannot be made openly available. The obvious example is where data refers to human beings and specific characteristics of those human beings, like information about their health, their income, religion, attitudes, political persuasion, all that sort of stuff. That's not the sort of information you can make publicly available. Another example, and it's probably worth remembering, is threatened species: the location of threatened species can be data which you don't want to make openly available, because that could mean that the last few of those species are hunted down or collected. A famous example is the Wollemi pine: the location of that specific species needs to be protected.

Finally, another example where data cannot always be made openly available is where there are commercial interests in the data. Maybe the metadata can be shared, but the data itself is commercially sensitive, and in that case it would not be appropriate for it to be made openly available.

When considering making data accessible, we do argue for making it as accessible as possible and as openly available as possible. One possible angle is just to provide the metadata as a starting point: if the rest cannot be made available, at least the metadata. Slightly more useful, perhaps, is making the data available through mediated access. In that case it's valuable to be clear about how the user can actually get access, and that can be by providing an email address, name or telephone number. If, for example, the user has to go through an ethics procedure to get access to the data, then clearly describe that ethics procedure and what sort of information is required to apply.

So I was talking about mediated access and about providing information on who to contact if you want to get access to the data. One thing to keep in mind there: if you list a person within the organisation, have a think about whether that person is ever going to leave. If that's a researcher and they're going to another organisation, have a fallback, some sort of mechanism, or maybe a more general email address, so that when that data custodian leaves somebody else can at least answer the question and grant access to the data.

Another possible angle in making data accessible is creating a de-identified version of the data and making that public, as long as it's properly de-identified. That can be useful for certain data users, to at least have a better view of what's in the data set, and for some purposes a de-identified version can be enough.

Finally, a good point to keep in mind: if you do want to make the data accessible, plan for this in your consent forms, because coming back afterwards and trying to get consent is not easy.

Another angle worth keeping in mind, and that's something I've invited Jingbo to talk about more, is that making data accessible can be through various routes and various protocols. In some cases it doesn't make sense to have a large data set available through download. It can make much more sense to have services over the data which allow users to interrogate parts of the data and pull in the parts that are much more specific and answer their requests. That can be for a human being, but especially for a machine it can be really useful. One thing to keep in mind there is that you need some sort of community-agreed standards around that, but Jingbo is going to talk much more about that.

So that was all from a much more theoretical perspective. I'm very grateful that I have two speakers today to talk about accessible in practice, and how they have actually tackled making data accessible. The first speaker for today is David Fitzgerald, Data Manager at the Australian Longitudinal Study on Women's Health. I'm very grateful that David is available to talk about what ALSWH has done to make quite sensitive data still accessible for others to reuse. David is on the line, so I would like to hand over to David, and he can talk about how the Australian Longitudinal Study on Women's Health has made its data accessible.
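
As an illustration of the points made in this introduction, here is a minimal Python sketch. It was not shown in the webinar and is not part of the FAIR principles themselves. It assumes a hypothetical in-memory catalogue keyed by persistent identifier; the `Record` fields, the `resolve` function, the identifiers and the example.org addresses are all invented for illustration. It covers the three cases discussed: open data, mediated access with stated conditions, and withdrawn data whose metadata stays available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Metadata for one data set (all field names are hypothetical)."""
    identifier: str                  # persistent identifier, e.g. a DOI
    title: str
    access: str                      # "open", "mediated" or "withdrawn"
    conditions: str = ""             # steps needed to obtain access
    contact: str = ""                # a role address rather than one person
    data_url: Optional[str] = None   # location of the data, if still held


# Made-up catalogue entries; none of these identifiers or URLs are real.
CATALOGUE = {
    "doi:10.0000/open-example": Record(
        "doi:10.0000/open-example", "Rainfall grids", "open",
        data_url="https://example.org/rainfall.nc"),
    "doi:10.0000/mediated-example": Record(
        "doi:10.0000/mediated-example", "Health survey", "mediated",
        conditions="Submit an expression of interest; ethics approval required.",
        contact="data-access@example.org",
        data_url="https://example.org/secure/survey.zip"),
    "doi:10.0000/withdrawn-example": Record(
        "doi:10.0000/withdrawn-example", "Interview audio", "withdrawn"),
}


def resolve(identifier: str, authorised: bool = False) -> dict:
    """Return what a human or a machine learns from an identifier."""
    record = CATALOGUE[identifier]
    # Metadata is always returned, even when the data is restricted or gone.
    response = {"identifier": record.identifier, "title": record.title,
                "access": record.access, "data_url": None}
    if record.access == "open" or (record.access == "mediated" and authorised):
        response["data_url"] = record.data_url
    elif record.access == "mediated":
        # Not open: state the exact conditions and who to contact.
        response["how_to_apply"] = record.conditions
        response["contact"] = record.contact
    elif record.access == "withdrawn":
        response["note"] = "Data no longer available; metadata retained."
    return response
```

Calling `resolve("doi:10.0000/mediated-example")` returns the metadata together with the conditions and contact but no data location; passing `authorised=True` stands in for a successful authentication and authorisation step.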
Thank you, Keith. OK, so I'm David Fitzgerald, the Data Manager for ALSWH, as I pronounce it: that's the Australian Longitudinal Study on Women's Health. I'll be talking about the accessibility issues for this study. I'm going to first of all give some background to our study, and then talk about the accessibility issues and try to relate them to the FAIR data principles, which I've just listed here. These are the ones which Keith showed earlier, so I won't go through them in detail, but I'll try to relate them to our study.

OK, so what is the study? It's a collaborative project of the two universities of Newcastle and Queensland, and in fact the two universities are related to how the sensitive data is kept, which I'll talk about briefly. It's one of Australia's longest-running longitudinal epidemiological studies: it has been going since 1996 and is ongoing, and we hope to go further into the future. It's funded by the Australian Government. We started off with over 40,000 women, and a few years ago we got a new cohort of 17,000 women. I'll show you the four cohorts we work with. Here they are. The four cohorts are age-based, and we define them by years of birth. You can see the oldest one, born 1921 to 1926, and there are three other ones of various ages. As you can imagine, each cohort has its own health issues, and that's what we are interested in, and indeed what the Australian Government is interested in.

So what are we collecting, and what is our methodology? Health issues, in particular mental, physical, reproductive and social health, and there's more. Also life transitions: women at different ages obviously go through different life transitions, life events and things which relate to health, as well as employment, health service use and more.

I'll just mention a bit about data linkage. I don't want to stress this because it's a big area with lots of issues, but we have linked our survey data with some administrative data sets. The list there is the MBS, the PBS, cancer registries and admitted patient hospital data. The linked data can be quite sensitive, and we treat it quite differently in how we make the data accessible.

The data is used extensively. In particular, more than 680 peer-reviewed papers have been published using our data. We also report back to the government frequently, and national health policies have been informed by reports and use of our data.

OK, so I'll go on to the aspects of accessibility and see how they relate to our data. The first one there is about being retrievable by an identifier using a standard communications protocol. All the data sets from our surveys which are analysed and used have an identifier, the same identifier. I should stress that the data is de-identified, but with a consistent new identifier, and that's across all surveys. So anyone using our survey data (I'll just put in the caveat: as long as it's not part of the linked data) has one and only one identifier to use.

We say the data has been de-identified because there are no personal names on the data, no addresses, no postcodes and no dates of birth, although the year and month of birth are given, obviously to do things like age analysis. Those are the main ones, but any other data which is deemed to be identifiable is stripped off. The identifier is what we call the ID alias. It's not the administrative ID which a respondent would see, or which somebody working in an office in Newcastle communicating with our respondents would see. They would not know what the analysable identifier is; they would have a different administrative ID.

Just on this point, any small cell sizes which we think are identifiable are grouped into larger groups. For example, country of birth we group into broad continental geographical areas, to avoid particular countries of birth coming up. And anyone using the data has to agree, along with a number of other conditions, that they must not identify respondents. Although we go to lengths to make that very difficult, it's conceivable that something could come up, but they promise and sign that they will not identify respondents if they ever had that possibility.

OK, so I was also asked to look at legal and ethical issues. We do have a legal contract with the Australian Government Department of Health, and this is ongoing. We didn't get a 20-year one; we are regularly updated on short-term contracts. Also, the ethics committees from the two universities have approved our usage, and in fact they do so every time we do a new survey. Because the study is longitudinal, every year we're going back to at least one of the cohorts to survey them, and each new survey which is not identical to previous surveys is subject to ethics committee oversight and approval. So we do have extensive legal and ethical oversight here.

Now I want to talk about how an investigator, or a re-user, would get access to our survey data. This is all explained on the website, but they must first complete an expression of interest form. In particular they'd say who they are, why they are a serious researcher, and what they want to find out from the data. That would be reviewed by our publications, substudies and analyses committee, that's the PSA committee. Then, if their expression of interest is approved, they sign confidentiality documents and statements before receiving the de-identified data. They also must then report back on their progress: we expect some immediate work on the data, and for them to continue with it.
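
To make concrete what "de-identified data" can mean here, the following is a rough Python sketch of the steps David describes: replacing the administrative ID with a consistent ID alias, dropping names, addresses and postcodes, keeping only the year and month of birth, and grouping country of birth into broad regions. This is not ALSWH's actual procedure or code. The field names, the `REGION` lookup and both functions are assumptions made up for illustration.

```python
import secrets

# Hypothetical lookup from country of birth to a broad geographical region.
REGION = {"New Zealand": "Oceania", "Italy": "Europe", "Vietnam": "Asia"}

# Fields that are dropped or transformed before release (assumed names).
REMOVED_FIELDS = {"admin_id", "name", "address", "postcode",
                  "date_of_birth", "country_of_birth"}


def make_alias_table(admin_ids):
    """Give every administrative ID one random, unique ID alias.

    The table itself would be held separately from the analysable data.
    """
    aliases, used = {}, set()
    for admin_id in admin_ids:
        alias = secrets.token_hex(4)
        while alias in used:          # regenerate on the rare collision
            alias = secrets.token_hex(4)
        used.add(alias)
        aliases[admin_id] = alias
    return aliases


def deidentify(row, aliases):
    """Return a copy of one survey row with identifying detail removed."""
    out = {k: v for k, v in row.items() if k not in REMOVED_FIELDS}
    out["id_alias"] = aliases[row["admin_id"]]           # same alias in every survey
    out["birth_year_month"] = row["date_of_birth"][:7]   # 'YYYY-MM' from an ISO date
    out["region_of_birth"] = REGION.get(row["country_of_birth"], "Other")
    return out
```

Because the alias table is built once and reused, the same respondent carries the same `id_alias` across all surveys, which is what allows longitudinal analysis without exposing the administrative ID.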
If their expression of interest is successful, the data are sent to them, and this is an area of work which I'm directly involved in. Before sending, the data are encrypted: we use the 7-Zip (7z) software, which compresses it as well. We use the AARNet CloudStor system to send data to the approved researchers and re-users, and an email is sent to them as well, with passwords but also to establish contact with the data management here for future correspondence. I've just put a note here about the linked data: we never send this out. Anyone using it has to come to our offices, or there is the Sax Institute's SURE facility, which can also hold it. But we don't send the linked data out, and we have agreed not to send it anyway.

So, public metadata. This refers back to the protocol being open. We have a website which lists the procedure that I just went through, but which also has a lot of metadata on it. That includes a data dictionary, which lists all the variables in the many data sets we have. We have a data dictionary supplement, which is a description of the frequently used variables with some detail, and a data map that shows how the variables are used across the different surveys and cohorts. When I say different surveys: the study is longitudinal, and we have up to eight surveys for some of our cohorts, so each one is deemed a different survey and has slight differences from other surveys. We have a list of all the variables used, for easy access. We also have data books, which list essentially the frequency summaries of the variables; the questionnaires that the respondents filled in; technical reports which we produce, which go into detail in many reports; and a frequently asked questions page.

So that is making metadata accessible. Although our data is not completely open, we do want to make it accessible, and we archive both the metadata and the data. We do that annually with the Australian Data Archive. Although they are not releasing it yet, the plan is for them in the future to take over release of our data, perhaps when we're not doing it ourselves, and that will be a way to keep our data useful and used in the long term.

So that's what I've got to say. I'd just like to acknowledge especially the women in our study who fill in the surveys, and of course the Government Department of Health for funding us, and the universities of Queensland and Newcastle for doing the job. So thank you, that's what I have to say.

Thank you, David. Thanks, that was a really interesting presentation. It was interesting to hear how you've made data accessible in practice, and what it means to make sensitive data accessible to researchers. Thanks for that perspective, and thanks for that view on how quite sensitive data can still be made accessible through various routes. I think it's really interesting to hear that you have both the route of de-identified data, through appropriate routes, and also linked data, so a much richer version, but then through either SURE or coming to ALSWH, the Australian Longitudinal... I've got to work on that one. Thanks.

OK, I would like to now move on to Jingbo. I've asked Jingbo to talk also about making data accessible, from a very different perspective. Jingbo works at NCI, and NCI does all sorts of things around making data findable, accessible, interoperable and reusable. Today I've asked Jingbo to focus on the accessible side of things, but I do want to note that NCI also does a whole bunch of other things in this space.

Thank you, Keith. I think I will just turn off my camera, because I can't see my presentation. Right, so my name is Jingbo Wang. I work at the National Computational Infrastructure, which is a supercomputer centre located on the Australian National University campus. Today I'm going to address different flavours of data accessibility practice at NCI. Before I go into that, I just want to make a comment that the FAIR principles are quite useful for governing our data management practice, and we use them a lot in every single aspect of our data management.
This is a quick overview of the data sets we have. As you can see, I've listed here the main data types that we store at NCI. Our national collections cover climate models, satellite images, bathymetry, elevation, hydrology and geophysics. Those data are quite geospatially focused, but we also have other data: social science data, genomic sequencing data and astronomy data.

So we aim to provide users with data as a service, as many digital repositories do. In our data management we catalogue data so that people can query the metadata database to find what we have here. We also publish data through various data services; that's the focus I'm going to talk about in the next few slides. We offer data quality assurance, data quality control and benchmarking use cases. We provide data through virtual laboratories. We also provide help with data visualisation. If I wanted to single out something that makes us different from other digital repositories, it is that we're co-located with an HPC (high-performance computing) facility. Given the nature of our large volume of data (we host more than 10 petabytes of research data), we really want to make good use of the high-performance computing here to advance scientific research.

So these are the six dot points that I want to address today about data access. I've put words in red to show the difference between the points. Initially I will talk about how we control data access, and then I'm going to present one example of how we use persistent identifiers to manage data access. Then I will talk about the two main data services that we offer at NCI for our users: one is THREDDS; the other is GSKY, which is a fancier, scalable, distributed data server. Finally I'm going to cover very quickly data versioning and the quality of the data.

So the first point is about how we control data access. Most of our data come from our stakeholders, such as Geoscience Australia, the Bureau of Meteorology, CSIRO and universities, and much of the data has been funded by the Australian Government, so it naturally falls under a CC BY 4.0 licence. Some owners also impose that the data should be non-commercial, non-derivative or share-alike types of CC BY. We also have international partners, such as in Europe and the US, and they impose even stricter terms and conditions if we want to access the data. So this is the legal perspective: how we control data access through licences.

On the file system, we actually hard-code the data access control using ACLs. This is how we separate different groups of people accessing the same data. Basically, for each collection we have two access groups. The first group has read and write permission: those are the data managers, who are able to generate, write and modify data. The second group is a read-only group: people in the read-only group can access the data on the file system, but they can't modify anything. This way we protect the integrity of the data; we only give access to authorised people who can really manage the data.

There is also a social aspect of data access. For a research project we often see an embargo period, where maybe after two years of the project the data can be made available. Some researchers also say, "I want to share my data after my journal article about the data set is published." Another example is from the Bureau of Meteorology: we have data where there is a six-month time delay between the data being developed and verified and it becoming operationally available on our THREDDS server.
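
The two-group arrangement and the embargo period described above can be modelled in a few lines of Python. This is only a conceptual sketch: it does not configure real file system ACLs and is not NCI's implementation. The `Collection` class, its field names and the rule that an embargo blocks the read-only group but not the data managers are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Collection:
    """One data collection with its two access groups (hypothetical model)."""
    name: str
    licence: str                                       # e.g. "CC BY 4.0"
    managers: Set[str] = field(default_factory=set)    # read-write group
    readers: Set[str] = field(default_factory=set)     # read-only group
    embargo_until: Optional[date] = None               # None means no embargo


def may_write(collection: Collection, user: str) -> bool:
    """Only data managers may generate or modify data."""
    return user in collection.managers


def may_read(collection: Collection, user: str, today: date) -> bool:
    """Managers can always read; readers only once any embargo has ended."""
    if user in collection.managers:
        return True
    embargo_over = (collection.embargo_until is None
                    or today >= collection.embargo_until)
    return embargo_over and user in collection.readers
```

Keeping write permission in a single small group is what protects the integrity of the data, while the embargo date captures the "social" delay before wider release.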
second point I wanted to raise is our practice in implementing persistent identifiers. Often we experience some frustration that when we give people a URL to access the data, it is only valid for a certain period of time, or only valid while somebody can maintain it; afterwards we can't really guarantee it. Also consider the original URLs: if you look at the left-hand side of the slide, those are the metadata catalogue URL and the service endpoint URL. Let's look at the second one, the service endpoint. From this URL naming convention you can tell that the later part includes the project code, file path and file name. If anything in this path changes, for example the protocol changes, or you rename the file, or we shuffle the files around, this link will be broken. So the original URL that we provided here is not a very stable one. We adopted the product that CSIRO developed some time ago, which uses a persistent identifier service as a broker. So now most of the time we give the external user the right-hand side. In the naming convention, as you can see, we have four main categories after pid.nci.org.au: we have dataset, we have services, we have documentation and we have vocabularies. The only thing that is unique is the file identifier, or UUID. Basically, as long as the identifier stays the same, the URL on the right-hand side remains consistent. If anything changes in the original URL on the left-hand side, all we need to do is update the mapping inside the PID service broker, without interrupting the URL that we give to the external user. We have the technical implementation published in the Data Science Journal, so you are welcome to have a look. Now I'm going to
talk about the main data services that I was asked to address, from NCI's perspective. I divide our data services into two main groups. One is the OGC services; I'm going to talk more about what OGC is in a second. The other type of data service is more project-specific. For example, we are one of the largest nodes in the Southern Hemisphere of the Earth System Grid Federation, which is the aggregation of climate models from global research institutes, so the way we provide data services there is that we copy the main model data to serve Australian users. Another fancy data service, which I'm going to show you a bit more of, is called GSKY: it's a scalable data server that interacts directly with our file system. So what is
OGC? OGC is the Open Geospatial Consortium. It is an international non-profit organization that makes quality open standards for the global geospatial community. We find OGC standards quite useful because we have a lot of geospatial feature data, and OGC has all sorts of standards for different types of mapping, feature, coverage and processing services for us to use. Because they are so common and free for people to use, if we make data available through OGC standards, a lot of people can naturally access our data. So
that's the motivation. So what are the OGC services? An OGC service is actually an API in the middle, between the data store and the user, so the user can request whatever is available through the OGC services. Let's say I want a map of an anomaly across the whole Australian continent, and NCI hosts this data. But we only host the data; we don't host images. What the OGC web service does is extract the image and return it to the user, and the user can take the URL, which contains an image of the data, and put it on their own web portal. For example, you can get the URL and copy and paste it onto NationalMap to show the grids. So NCI
has two main production data services. One is THREDDS, and you can often find the THREDDS link in our data catalogue. This is the interface of GeoNetwork; the link circled in red is the NCI THREDDS server, so you can click it to open it. A second interface
is the data catalogue. They contain more or less the same information but serve different purposes: GeoNetwork is mainly for machine-accessible data harvesting, while the data catalogue is for human readers. So THREDDS, in short, in very
simple terms, is a data service which allows you to browse and access the data. I've listed here the six main types of service that THREDDS offers. The very first two, OPeNDAP and the NetCDF Subset Service, are for subsetting the data. We have a lot of very large data, but in practice when scientists access the data they don't necessarily have to access all of it; they might just need a very small piece from this big pool. So what THREDDS offers is that you can refine your query and get only the part of the data that you want, which really saves a lot of traffic on the internet. The other two are the standard OGC services, Web Map Service and Web Coverage Service, which are very popular for people to access maps and coverages directly from our data. And of course THREDDS offers a very quick data viewer: if you don't know what the data is, you can have a quick look at it on the web without downloading it. THREDDS also offers direct download if you really want to download the data. Another
fancy, scalable, distributed data server that I was talking about is called GSKY. GSKY is an in-house product developed at NCI. What it does is this: we have a lot of data on the file system, millions and millions of files. If we want people to query this data, it is going to be very hard to create millions of metadata records, one for every single file. So what we've done is use a crawler to crawl the file system, get the header of each file and assemble the headers into a metadata database. The database then becomes the query window for handling people's requests, say, "give me some images within a polygon at some time". The metadata database includes the essential geospatial information, and GSKY returns to the user what they requested. We recently published the technical details of the GSKY implementation; you're more than welcome to have a look at everything. But I think
you're getting close, and I just wanted to let you know there is only about a minute or two left, so if you could wrap up? Yes. So the last two points. On data versioning: again, because of the scale of the data, we can't really store every single step of the data. So what we do is store the raw data and the final version, and keep the URI of the metadata for the steps in between. That way the provenance information is kept and storage is also saved. The last
point, on data quality: some users have said we can't just assume that we can access data and that the data is flawless. So by publishing data alongside a quality report, we want to provide data access with a certain level of assurance. We also have a publication on this that is going to be out very soon. Thank you for your attention; those are our experiences so far with data access. Thanks. Thanks,
Jingbo, that was really a very quick overview of all the
work you've been doing there around services and around making data accessible, not only for humans but also for machines. So first of all I would like to
thank David and Jingbo again for providing an insight into what it means in practice to make data accessible, from different perspectives. Those were very interesting presentations. In case you are interested in learning more about making your data accessible and the things you can think about there, this slide provides some resources. The Med.data.edu.au project has a number of materials around sensitive data. There's a link here to the Australian Data Archive and its access conditions. On the ANDS website we have some materials on sensitive data. Another piece of work we're doing together with the community is looking at data services, which is the work that Jingbo also talked about, and making sure that the services over the data are discoverable. There is an interest group working in this space, so if you are interested in learning more and engaging more around that, please follow the link; there's more information there about the data services interest group. Last year we also ran the 23 (research data) Things, and two of the Things are relevant to the topics discussed today, so have a look at Thing 10 and Thing 19 if you want to learn more and also want to get your hands dirty and try out a little of what it means to make data accessible. The link at the bottom is the general link about the FAIR data principles on the ANDS website. So this week we've talked about Accessible; next week we're going to be talking about Interoperable. Thank you all for your attention. Finally, I would like to
acknowledge and thank, first of all, our speakers for today, but I would also like to thank NCRIS, the National Collaborative Research Infrastructure Strategy program, for funding ANDS and making this all possible. Thank you all for your time, and I look forward to seeing you next