FAIR in Astronomy research - webinar 22 Oct 2018

Video in TIB AV-Portal: FAIR in Astronomy research - webinar 22 Oct 2018

Formal Metadata

Title
FAIR in Astronomy research - webinar 22 Oct 2018
Title of Series
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
In this webinar the ARDC partnered with the ADACS project to explore the FAIR data principles in the context of astronomy research, and the ASVO and IVOA as community exemplars of the implementation of the FAIR data principles.

Keith Russell (ARDC): Looking at FAIR. In this talk Keith will provide an overview of the FAIR principles and how they were used in astronomy before they became official. He will conclude the talk by discussing what other disciplines can learn from this approach.

Dr Katrina Sealey (Macquarie University): Australian Astronomy and the FAIR Data Principles: what does this look like? The broad concepts of findable, accessible, interoperable and reusable data have long been a part of the astronomical community's data principles. Now that the worldwide data communities are working towards formalised FAIR principles, we will discuss how the astronomical community is adopting the FAIR principles into our ASVO data management.

Dr Luke Davies (ICRAR, UWA): Hosting the next generation of galaxy evolution surveys at AAO Data Central. The final talk of the webinar will look at a case study. Luke will discuss data products from the Galaxy And Mass Assembly survey and the Deep Extragalactic VIsible Legacy Survey (DEVILS, devilsurvey.org) and how they are generally used. Furthermore, he will outline some of the challenges faced in terms of hosting and serving these data to the community, and how AAO Data Central helped overcome these challenges.
Okay, good afternoon everybody, and welcome. Thank you for joining this webinar this afternoon. This is a partnership webinar between the ARDC and ADACS programs, and today we will be looking at the FAIR data principles in astronomy research. We have three speakers today, so I will introduce our first speaker, who is Keith Russell from the ARDC. He's our partnership programs manager, and he will be talking about the FAIR data principles. So, Keith. So, you should now see a rather, perhaps,
slightly corny slide about FAIR in front of the stars. Is that up and running? Okay. So today I'll kick off, very briefly, talking about the FAIR data principles. I work for the Australian Research Data Commons, an organisation that is not specialised in astronomy, so the perspective I'll bring to this is just a general introduction to the FAIR data principles. Later, Katrina and Luke will talk about how the FAIR data principles have been implemented in astronomy, and I'm really intrigued by what they have to say, so I'll try to keep it short. I'll try and advance to the next slide: the FAIR data principles. So,
the principles were drafted in a workshop at the Lorentz Center in Leiden, in the Netherlands, back in 2015, and those principles rapidly received more and more interest and attention from all around the world. First of all they were described in a Nature article and put up on the FORCE11 website, and slowly but surely organisations started to pick them up and recognise them, and say: well, these are actually a really useful way of thinking about how you can maximise the reuse of research data. I think there are a few elements there which are important to keep in mind, that have been very helpful in making them so successful. One is that the principles are technology agnostic: they will work across all sorts of different technologies. Another aspect is that they are discipline independent: they're not drafted from the perspective of one specific discipline, but can be applied across all sorts of different research disciplines. Another element in their success has been the fact that they address both the data itself and the metadata on top of the data, which enables making the data more reusable. And a final, very important point: they look at reuse not only from the perspective of a human wanting to reuse the data, but also from the perspective of machines wanting to pick up and bring together huge amounts of data, analysing that and enabling further research across a wide array of data assets. So, from that perspective: why
would you bother making your data FAIR? I think there are a number of things to keep in mind. First of all, making the data FAIR enables the valuable reuse of research data outputs, and it will enable research to be more reproducible and verifiable. It also makes it possible to bring together and start building up a rich set of data assets: a set of data assets that you yourself have control over, but that you can also share with others, and that can form the basis for collaboration with research partners, nationally, internationally and so on. A very important point, which comes out of the point I made on the last slide about bringing together data sets, is the machine-readable aspect: in the current day, with the huge emphasis on data-intensive research, making data FAIR, and especially interoperable, enables novel and innovative research. A final point: in the current day there is also an emphasis on impact, on making sure that data has impact and that research outcomes can be translated, used and picked up by business and industry, but also by policymakers and the general public. I think FAIR is a very important aspect of making sure that data can enable that. Since these principles were drafted they have been picked up in a lot of different policies all around the world. For example, publishers have in the past already had data availability policies, and those policies are becoming stronger and stronger. Recently that has resulted in COPDESS, the Coalition for Publishing Data in the Earth and Space Sciences, setting up a Statement of Commitment in which they ask publishers, and also data infrastructure organisations, to commit to trying to make more data FAIR. That has received a lot of interest and has now been signed by a number of publishers, quite significant publishers too; that
includes Elsevier, Wiley, Springer and Nature, so not the smallest publishers. Here in Australia funders have also picked this up; there's been a bit of interest in it, and funders are starting to talk about FAIR and how it could be incorporated in what they are doing. The Universities Australia DVCR committee set up a working group to look at what FAIR could mean, and that resulted in the FAIR Access Policy Statement, which is now available online and has received quite a lot of attention internationally. Funders are also looking at requesting data sharing statements alongside grants, and the European Commission has set up an expert group on FAIR data, asking: what does it mean if we want to make more data FAIR, how would that happen, and what actions would need to be taken in that space? The interim report was recently released for comment. So, what are the FAIR data principles? The four letters stand for findable, accessible, interoperable and reusable, and there is actually a level of detail behind those four letters, which I'll very briefly try to unpack here. Findable, in the context of the FAIR data principles, means that researchers should describe their data well and make sure it has a persistent, globally unique identifier. The thinking behind that is that data does not get lost: if you put your data somewhere, then even if the data moves, a researcher or somebody who wants to reuse the data can still find it. Data should also be findable through discipline-specific search routes as well as generic ones. The A stands for accessible, and accessible does not necessarily mean open; that depends a little on the context. One way of describing it is to say data should be open where possible but closed where required, and in some disciplines that can be important, especially when you're thinking about sensitive data: culturally sensitive data, privacy
issues, commercially sensitive data, et cetera. The data should be made accessible through appropriate routes, and one way to do that is to deposit it in a repository which provides those routes to access the data. One thing to keep in mind there is that data sometimes needs different services over it: for a small data set you might make it downloadable, but if you're talking about a very large data set it can actually make more sense to provide the data as a service, so others can approach the data and pick up the specific bits they need for their specific purpose, rather than having to download huge data sets. And finally, if you are talking about closed data, at least provide information about how a user of the data can get access to it, and some background information so they know what they're actually looking at. Interoperable: this comes back to that point about data-intensive research and how FAIR data can play a role in enabling it. If you are going to make your data available, make it available using a standard file format that other people can understand and use, and for the content of the data, try to think about using a community-agreed vocabulary. Hopefully there is one already in your research community; if there's not, it actually makes sense to come together with others in your space to agree on a vocabulary, so that people know what the terms are that you're using and there is agreement around that. That goes for the data and also for the metadata on top of your data. Finally, include links to relevant information about the different people involved in creating the data, the different projects involved, the publications involved; that sort of information is extremely valuable, and especially if you tie it together in an interoperable fashion, it makes it possible to find that information.
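The community-agreed vocabulary point can be sketched concretely. In astronomy, one such vocabulary is the IVOA's Unified Content Descriptors (UCDs), which label what a table column means independently of what it is called locally. In the sketch below the column names are hypothetical survey columns; the UCD strings are standard UCD1+ words.

```python
# Sketch only: hypothetical column names tagged with standard IVOA UCD1+ words.
# The UCD says what a column *means*, independently of its local name.
column_ucds = {
    "object_id": "meta.id;meta.main",     # primary identifier of the row
    "ra_deg": "pos.eq.ra;meta.main",      # right ascension (equatorial)
    "dec_deg": "pos.eq.dec;meta.main",    # declination (equatorial)
    "mag_v": "phot.mag;em.opt.V",         # V-band magnitude
    "z_spec": "src.redshift",             # spectroscopic redshift
}

def position_columns(ucds):
    """Find the celestial-coordinate columns, whatever they happen to be called."""
    return [name for name, ucd in ucds.items() if ucd.startswith("pos.eq.")]

print(position_columns(column_ucds))  # prints ['ra_deg', 'dec_deg']
```

Because the vocabulary is shared, a tool that understands UCDs can locate the coordinate columns in any catalogue tagged this way, without per-survey configuration.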
The last letter in the series is reusable. I always say: don't think that F, A and I are not about reuse; R is just the extra bit on top of F, A and I to make it reusable. So to make data reusable it has to be findable, accessible and interoperable, but there are a number of further aspects you need to think about. One aspect is to include not only some information so that people can find the data, but also some richer information describing the actual output; usually this is more discipline-specific metadata around the data. Include information on how the data was created, some provenance information; that's extremely useful for reusers of the data to understand what the settings were, which instrument the data was collected with, the different analysis tools that were used to actually create the data, the settings on those tools, and so on. Finally, a very important point: assign a machine-readable licence to the data, and we recommend a Creative Commons licence. Machine-readable, so that machines can also harvest the data, find it, pick it up and use it; and make sure it has a licence, because data that doesn't have a licence is really difficult for a reuser, as there's no clarity on what you can make of the data.
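The reusability ingredients just listed (richer description, provenance, and a machine-readable licence) can be sketched as a single metadata record. This is illustrative only: the identifiers below are made up, and the field names loosely follow DataCite and schema.org conventions rather than any particular repository's schema.

```python
import json

# Illustrative sketch of a machine-readable dataset record; all identifiers
# below are hypothetical and the field names are not any repository's schema.
record = {
    "identifier": {"scheme": "DOI", "value": "10.9999/example.dr1"},
    "title": "Example Survey, Data Release 1",
    "creators": [{"name": "Example, A.", "affiliation": "Example University"}],
    # A machine-readable licence: a well-known name plus a resolvable URI.
    "license": {
        "name": "CC-BY-4.0",
        "uri": "https://creativecommons.org/licenses/by/4.0/",
    },
    # Provenance: instrument, pipeline and settings used to create the data.
    "provenance": {
        "instrument": "Example 1.3m telescope",
        "pipeline": {"name": "examplepipe", "version": "2.1"},
        "observingDates": "2017-01/2018-06",
    },
    "relatedIdentifiers": [
        {"relation": "IsDescribedBy", "value": "10.9999/example.paper"}
    ],
}

# Serialising to JSON is what makes the record harvestable by machines.
print(json.dumps(record, indent=2))
```

The point of the licence URI and the DOI-style identifiers is precisely the machine-readability Keith describes: a harvester can resolve them without a human interpreting free text.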
So, wrapping up: if you want to learn more about the FAIR data principles, and find materials to support you, the ARDC, the Australian Research Data Commons, has over the past years collected all sorts of materials in that space. We have a whole series of training resources and materials broken down by the FAIR data principles, so have a look at those; if you visit the URL there you'll be able to get access to these materials. What we've also developed is a FAIR self-assessment tool, which allows you to hold your own dataset up against the principles and have a bit of a check: how FAIR is my data, and what sort of actions could I take to make it more FAIR? It's an interpretation of the FAIR data principles, and it's sometimes a useful way of looking at how you can make your data more FAIR. So, thank you all. This was
just a very brief introduction. I hope it was useful, and now I'm really intrigued to hear what FAIR means in astronomy, so I'd like to hand over. Thank you, Keith. Next we have Dr Katrina Sealey from Macquarie University and the AAO, who will be talking to us today about Australian astronomy and the FAIR data principles. Thank you, Katrina. Thank you so much. So I'm going
to talk about the All-Sky Virtual Observatory and whether we are FAIR, where FAIR in this context is exactly what Keith was just talking about. This is going to be a story of collaboration for a shared vision, where astronomical data is findable, accessible, interoperable and reusable. Within the Australian community this is something we've been thinking about for a very long time, and we'll get to that in a few more slides. Just starting off: yes, I am the Head of Research Data and Software at Macquarie, but I am also the ASVO national coordinator. I'm going to take a slight interlude at the moment and talk about Astronomy Australia Limited, and a lot of what I'm going to say ties back directly to Keith's presentation, particularly around funding bodies and governance around the FAIR data principles. Astronomy Australia Limited is a not-for-profit joint venture organisation that has 16 astronomical member groups, and these are groups that are very keen and very active in doing astronomical research. It has a board that consists of astronomers as well as some independent community members, and the board has two advisory committees: a scientific advisory committee and a project oversight committee. So Astronomy Australia Limited, as an organisation,
works to ensure that the funding that comes into Australian astronomy meets the priorities and needs of the astronomical community; there's an astronomy decadal plan, and we also follow the NCRIS roadmap. So part of the governance around Astronomy Australia Limited is to make sure that the funding goes to the right priority areas, and one of those areas is eResearch. Under that umbrella Astronomy Australia Limited has various groups working beneath it, and part of that is an ASVO coordination role. So let's have a little bit of a chat about the All-Sky Virtual Observatory. This is the Australian part of a much larger international virtual observatory community. Going back to the early 90s, the astronomical
community started talking about standardised formats for their data, and protocols to work under, so that by 2002 the International Virtual Observatory Alliance, the IVOA (I'll mention that a few times), was born. The Australian community has always been part of this international virtual observatory alliance community. There are no membership fees; it's just an understanding, in a collaborative environment, that all the astronomers
want to work under, to ensure that our data does meet those goals of being findable, accessible, interoperable and reusable, remembering that we were trying to do this long before FAIR became FAIR. We were originally the Aus-VO, and probably four or five years ago there was a reboot, around the time that AAL became more involved in the eResearch part of the Australian community, and we really rebranded and rebooted as the ASVO. There are five Australian nodes underneath this banner, and we're all there because we have a shared vision and we want to be part of the All-Sky Virtual Observatory. I'm going to have a quick look over each of the five nodes, so that you can see that they're all run and organised by five very different organisations for very different purposes. So firstly you will
have a look at the Murchison Widefield Array. This is an international consortium led by Curtin University. I love this radio telescope, and I love the little spiders, so that's why you get two of those images there, and you can see how that ties in to their logo. The MWA is one of the four SKA precursor telescopes, and they currently have 28 petabytes of publicly available data. What I find amazing is that each observation can be up to hundreds of gigabytes in size, so part of the MWA node's goal is to take that data and make it into smaller, more usably sized chunks for the astronomers. The next node that we're going to have a
look at is completely different: this is the Theoretical Astrophysical Observatory, the TAO node, led by Swinburne University. TAO holds datasets from simulations of cosmological and galaxy formation data, but you're also able to go there and generate your own virtual universes. You can see the box there: you can basically set up what you want, put in the different sorts of filters that your telescope might have, and then produce the data. So theoreticians and observational astronomers alike can use the theoretical data to see if it matches the real-world data. We then have
an optical node: this is the SkyMapper node. It's again a consortium, this one led by the Australian National University. It's a 1.3-metre telescope at Siding Spring Observatory, and what SkyMapper does is continually map the sky. It has different filters, and it maps the sky continually through all those filters, which means you have a huge time base of different observations. So astronomers can go back and have a look at different times, to see if perhaps a star has exploded, or to see if there are very fast-moving objects that have changed position in the sky. It's a very active virtual observatory, or database of objects. Optical data tends not to have data sets as large as the radio telescopes; the total survey size so far is one petabyte of data, but it's still very impressive that every second a hundred megabits of data are being produced. For the fourth node we have the CASDA
node. This is a collaboration between CSIRO Astronomy and Space Science, CSIRO Information Management and Technology, and the Pawsey supercomputing groups. This is a data archive for the Australian SKA Pathfinder; when it is in full operation it will have five petabytes of data a year, so it's just in its testing, preliminary stages. You can see 36 of the antennas and some of the images that we get from the CASDA node. Now I want to talk about a slightly different
node. The last four nodes all held data from one particular telescope; Data Central started off as the AAT node of the All-Sky Virtual Observatory, AAT standing for Anglo-Australian Telescope, and you can see a picture inside the dome there. It started off by taking some of the survey data taken from this telescope of national significance, but now we have over ten data sets within Data Central, of optical and other wavelengths. More importantly, we have 44 years' worth of legacy data taken from this telescope. Very soon you'll be able to just download your pipeline-reduced data, and there's other data coming from other telescopes at Siding Spring Observatory that will do a live data feed in. So Data Central is a repository of primarily optical data sets, but there will be others there as well. Now I'm just going to go forward; the next few slides are nowhere near as pretty and graphic as these ones. I asked each of the five nodes to tell me how they found themselves to be findable, accessible, interoperable and reusable. This is not yet looking at the 15 FAIR points, we'll do that as well, but let's just have a little bit of a walk through each of the nodes. Now in
these tables a Y is a yes and a P is in progress, so usually there's some degree of yes there and we're working towards it. For each of the nodes, in terms of being findable: the nodes, or all the observations, have a unique ID, we are all working towards the International Virtual Observatory Alliance standards, and we follow the TAP protocol. Now, I've pulled that one out separately; it's actually one of the IVOA protocols, the Table Access Protocol. It's particularly important because it's a way that you can query, using SQL, or the astronomy version of that, ADQL, any of the datasets that are registered with the IVOA. And that's not just our ASVO, that's all the international datasets as well, so it's already a way that you can go out and be findable, accessible and interoperable, and reuse data.
registered or have their do is their digital object identifier z' MW a is only just getting their first data releases together so they're in the process of minting their do eyes and data central has recently gone through a transition from the government sector into the university sector so during
that stage there was a URL change in a some other sort of details that needed to be worked out so we're in the process now of registering with ivo a and working on the do ice so it was covering
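As a side note for readers who want to see what the TAP mechanism mentioned above looks like in practice, here is a minimal sketch of how a synchronous TAP query might be composed. The endpoint, table and column names are invented purely for illustration — real services publish their own — while the REQUEST/LANG/QUERY parameter names follow the TAP convention:

```python
from urllib.parse import urlencode

def build_tap_query_url(base_url, adql):
    """Compose a synchronous TAP query URL from an ADQL string."""
    params = {
        "REQUEST": "doQuery",   # standard TAP parameter
        "LANG": "ADQL",         # query language
        "FORMAT": "votable",    # ask for a VOTable in response
        "QUERY": adql,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)

# Hypothetical endpoint and table, for illustration only: a cone of
# radius 0.5 degrees around (ra, dec) = (180, -45).
adql = ("SELECT TOP 10 obj_id, ra, dec FROM survey.catalogue "
        "WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        "CIRCLE('ICRS', 180.0, -45.0, 0.5))")
print(build_tap_query_url("https://example.org/tap", adql))
```

Submitting the resulting URL over HTTP would return a VOTable; client libraries such as pyvo wrap essentially this exchange.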
If we have a look at accessible now — again, as Keith said before, whether your data is open or perhaps closed depends on what you require. Astronomers have open data; there might often be a period of time where the data is proprietary, typically 18 months, but generally astronomers will share their data openly. So all of the nodes are open; SkyMapper initially has Australian-only data releases before going global, but the astronomical data is there. Again on being accessible: we use the same standards, we're making our data more accessible by putting in UIs — user interfaces — and our data can all use standard data viewers and standard tools. If we have a look at interoperable: again, standard data, metadata and survey IDs, so yes, we think we're interoperable. And on being reusable: as I've said, we believe we have an open policy — so please take our data — with stable data releases, assigned IDs and metadata. So that was how we
viewed the four terms. I then asked each of the nodes to go to the ARDC (the ANDS, Nectar and RDS) site, look at the FAIR tool, and actually use it to find out where they sat against each of the FAIR principles or guidelines, as per Keith's talk. I've listed below what each of the fifteen principles are. That doesn't look very easy to read, so I've popped this over the top: the Y's are yes — they may not be as far along as we would like to take them, but we have them in place — and these other marks are areas that we know we need to improve or that are in progress.

One of the first areas to look at is specifying the data identifiers. Keith spoke about this, and it is something that we do; however the TAP protocol, being quite an early-developed IVOA protocol, does not pass through the DOI — so if you do a TAP query you won't get the DOI back. Another example: Data Central has a user interface which is very configurable, so the user can choose which parameters get sent back and displayed on the screen, which means you can turn off the data identifier if you wish. So this is clearly an area we need to look at and improve.

If we then go across the A's, we can see that we cover off accessible quite well. If we have a look at interoperable, there are a couple of places to improve: CASDA and TAO both acknowledge that they could improve their metadata to include more of the vocabulary that we use — there is clearly a lot in the metadata, but there can always be more. And if we have a look at I3, this is about having qualified references to other metadata — making sure that you link off to the definitions for your vocabulary and so forth — and we all agreed that we could improve in that area as well.

If we then go across to reusable, we can see that we probably need to improve in terms of licences. Two of the nodes have a Creative Commons licence in place, but — while as astronomers we've always felt that our data is open — in the internet world that is not enough: you need to have a licence that shows that your data is open. So we're working on putting the Creative Commons licence in place on the other nodes. The last area we really need to work on and tweak is improving some of the provenance information in the metadata. Taking some more examples: we might have recent surveys within Data Central where the data are data products — the raw data has come from the telescope, it's then been calibrated, and then it's had a whole lot of manipulation done to it — and all of that information needs to be within the metadata, or be machine readable, or otherwise provided. That will be the case for a lot of the data, but if we have a survey from ten to fifteen years ago, it won't necessarily have all that information fed into the metadata. We might need to go back and link the papers that describe the data reduction or the production of the data products, or pop it into the README file. Some of the nodes have videos as well that explain how the data is taken, how you can reduce the data, and where you can get the software to reproduce it. It's all about making sure that we have transparency and reproducibility across all the data, and that's something that as a community we know we need and are strongly working towards. So just to pop across to the last
slide that we have here. We've talked about each of the five nodes, and we're all working towards being FAIR, or fairer, but we'd also like to see ourselves as one and be ASVO-FAIR. We've got a whole lot of different activities going on at the moment as a community. We're working on a shared authorisation and authentication mechanism, because the five nodes all belong to different organisations — for example CSIRO needs its own authorisation mechanism to log into its databases, and Data Central has its own — so we're putting a thin layer across the top, so that you can log into Data Central and then, if you want to query across to some data at CASDA or MWA, you can be authenticated. We're putting that in place at the moment.

We're also working — and we've done some pilot testing — so that we can work with the distributed data sets that we've got, and now I'm talking about through the user interface, not through an SQL or ADQL query. We might be able to do an optical-to-optical query, find that there's data, bring it back, and open it up in a tool that allows you to overlay and visualise. Within the next 12 months we're looking to do some pilot testing with optical-to-radio data as well, then spreading that out across all the nodes that we have, and then across the international Virtual Observatory community as well — so not just with our Australian data, but also looking at incorporating other large archives. We've also got a pilot test going on with ESO, the European Southern Observatory — so, international data archives.

As a community, as I said, it's a collaborative arrangement — we're all there because we want to be working together. We have monthly ASVO technical meetings, we have twice-yearly workshops, we're working on a coordinated vision that we're all pursuing together, unifying our websites so there's one entry portal for everyone, and of course as a collective we engage with the International Virtual Observatory Alliance community as well — and hopefully we'll be in a place to contribute some of the tools we're building back into the community. So just to finish up: are we FAIR? I would say yes, we are. But could we be fairer? Definitely. Or could I even say we're working towards being the fairest of them all. Thank you.

Thank you, Katrina,
very interesting. Next up we have Luke Davies from UWA and ICRAR. He will be talking about hosting the next generation of galaxy evolution surveys at AAO Data Central. So thank you, Luke,
over to you. — Hi, okay, so I assume you can all hear me and see the screen now. This is a little bit of a change of pace from the other things that have been presented here: I'm going to talk about this from an astronomer's point of view, as someone who actually goes and collects some of this data. I'm currently working on two large surveys which are being hosted by the Data Central node of the ASVO that Katrina mentioned just now. One of these is the DEVILS survey and one of them is GAMA — GAMA is a survey that finished a while ago, and we are in the process of bringing all the data over into Data Central, while DEVILS is a program that is ongoing at the moment, so we have some extra challenges that we're working through in terms of managing and running a survey with the help of Data Central. I'm not going to be talking too much about FAIR data policies, because I don't know a huge amount about them, but I'm going to talk about some of the data that we take, how we'd like to use that data, and how Data Central is helping us do that. So
firstly, just as a bit of quick background on the surveys: GAMA is the Galaxy And Mass Assembly survey. It's a survey designed to measure the evolution of energy and structure in the relatively local universe by measuring lots of galaxies — basically we measure the properties of about 300,000 galaxies in lots of different ways. Don't worry too much about all of these numbers; all you really need to know is that we cover quite a large area of the sky, and we try to pull together a lot of disparate data types from lots of different surveys and put them into one large overarching database, in which we then link up different properties of the galaxies.

DEVILS is the new ongoing survey, and you can think of it as being very similar to GAMA in terms of the goals it's hoping to achieve, but it's hoping to do this in terms of evolution over roughly the last eight billion years in the universe. So GAMA looks at the very local universe, DEVILS looks at the more distant universe, and we try to link those two things together. One of the quite important things about these surveys is that we try to use the same data analysis techniques and the same data types over an extensive redshift range in the universe and piece everything together — so it's important that we are consistent across all of these surveys.

Just a bit of general background to frame how we might want to use some of this data. If you look at the research field of galaxy evolution, and you want to understand how galaxies have changed as the universe has evolved, there are tons of different things that you'd like to measure about a galaxy: how many stars it has, how much dust it has, whether it has a central supermassive black hole, its dynamics, its star formation, where it lives in the universe. In order to form this complete picture of galaxy evolution we'd like to measure all of these things, but they come from very different observations. If you want to measure the stellar mass you look in the optical and near-infrared, going through to things like the gas mass that you measure in the radio, and you need spectroscopy to measure where the galaxy lives in the universe. So actually measuring all of these different things requires you to use different telescopes and different instruments. Within GAMA — using the AAT that Katrina mentioned just now — and within DEVILS, using the same instrument, we measure the spectroscopy, but then we also compile together the data from all of these other facilities at all of these other wavelengths to try to build this complete picture of galaxy evolution. Now, these different facilities and telescopes have completely different data types that are stored in very different ways, with information and catalogues organised in different ways, so it's very tricky to combine all of this. But if you want to do interesting science you have to draw lines across this diagram and link up the different data types that we have — only by being able to combine all of this in some massive database can we do the really interesting, cutting-edge science that we'd like to do. So what are the kind of data products
that we measure within GAMA and DEVILS? Firstly, one of the primary things we measure are imaging datasets. We basically have a series of very large FITS images coming from different facilities — on the right, this image is a brief pictorial representation of some of the data that we have. Each one of these images represents an FITS file of about 80 gigabytes, showing you different wavelengths of light as you go from the UV into the far infrared. Now, these images sometimes have very different FITS headers: they can have different pixel scales, different rotations and sizes, and different world coordinate systems applied to them as well.

We then also have a huge number of data tables. This is just a list of some of the tables that we currently have in GAMA — each one of these links on the left is a different data table, but each one can contain sub-tables as well; some of them have up to four or five tables that sit nested within them. We also have non-unique matching between these tables: some are matched on a base catalogue ID, but then one object can have multiple entries for the same ID — which I can explain in detail if people are interested. We then also have descriptive files that go with each of these tables, a minimum of two per table: one that describes all of the columns, the UCDs and so on, and one that is a more basic general description of everything that happened to generate the table. Then we have spectroscopic FITS data files as well — essentially 1D FITS files which describe the spectroscopic data. These have very different headers to the FITS imaging data, and they also come from many different facilities with different resolutions and different headers: we have data that we compiled from the AAT, but also from the Sloan Digital Sky Survey and things like zCOSMOS, which was run on the VLT, so we have to combine those in some consistent manner. And finally we also have lots of PNG images which display data products such as SED fits, like you have in the bottom right here, but also ones that describe group diagnostics, stellar-mass fitting and things like that. So there's quite a lot of different data types and data products that we produce for both GAMA and for DEVILS.

I'm going to return to this slide a few times through this talk, to cover some of the problems that we face in hosting and serving this data to the community so we can maximise the scientific output of these surveys. On the left here I've written out a few different things that we might want from our data products.
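The non-unique matching just mentioned — a base catalogue ID that appears in several rows of a sub-table — amounts to a one-to-many join. A minimal sketch of handling it, with column names (`cat_id`, `ra`, `flux`) invented for illustration rather than taken from the actual GAMA schema:

```python
from collections import defaultdict

def one_to_many_join(base_rows, sub_rows, key="cat_id"):
    """Join a base catalogue to a sub-table where the same ID may
    appear in multiple sub-table rows (one-to-many matching)."""
    by_id = defaultdict(list)
    for row in sub_rows:
        by_id[row[key]].append(row)          # group sub-table rows by ID
    # Map each base ID to (base row, list of matched sub-table rows).
    return {row[key]: (row, by_id.get(row[key], [])) for row in base_rows}

base = [{"cat_id": 1, "ra": 180.1}, {"cat_id": 2, "ra": 180.2}]
subs = [{"cat_id": 1, "flux": 3.2}, {"cat_id": 1, "flux": 4.1}]
joined = one_to_many_join(base, subs)
print(len(joined[1][1]))  # object 1 has two matched sub-table rows -> 2
```

Keeping the matches as a list per ID, rather than flattening them, preserves the fact that one catalogue object can legitimately have several measurements.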
This is purely focused on trying to extract the maximum science that we can. We really want things like easy access for team members, which is fine, but we also want things like being able to cross-match data products across these surveys and with other surveys, and being able to manage documentation and new data products really easily. I'm going to compare this to most surveys pre-GAMA, to what GAMA had before the move to Data Central, and to what we now have with Data Central — the last column is the stuff that we have for DEVILS as well, which we're building as we go along.

In terms of most surveys: most surveys currently in astronomy focus on access for the team. They set up their surveys, their samples and their data so that their team members can access it, but they don't go much further than that. Some do — SDSS is a good example where they've done this well — but this is something that in GAMA we've tried to take further than previous surveys in terms of hosting and serving the data. Most surveys don't have an easy, intuitive interface; some do have restricted public access built into their databases, but a lot don't; and pretty much everything else on this list most surveys pre-GAMA don't have. There's no way to easily combine the data across different facilities and match up with other surveys, no way to provide long-term stable data access, and so on. So what we did with GAMA was
we actually built our own bespoke data access portal. This was designed by Joe Liske, who's at Hamburg University now, and it basically allows you to access all of the GAMA data — but in a very isolated sense. There's no way to combine this data with any other facilities or any other surveys; it just allows you to pull out all of the things that we've measured from the GAMA database. Now, this is great if you're trying to do a project just within GAMA, and it's also great if you're one of the team members who knows exactly what all of this means, but it's very difficult for an independent user who's just turned up to find the right things to use — there's a lot of documentation to read and it's a pretty steep learning curve. We also have a big issue here in that the documentation is provided as a text file and then ingested by a single person, and it's then completely uneditable by the person who created it without going through quite a long, drawn-out process. And finally — and I think this is probably most important — we have a single point of contact for this entire database. If this person is very busy and can't do things on a short timescale, then things don't get updated on a short timescale, and this has become a problem for us. Mainly because Joe is incredibly good at running these things, but he set it up when he was a postdoc and didn't have much other work to do, and now he's a professor at a university who doesn't have much time for this — and nobody else understands how it works and can update it. That's actually caused a bit of a sticking point within GAMA in terms of getting our data out there.

We also have a cutout service, which
allows you to pull out individual images of sources from the large imaging datasets that we have. This was essentially designed by myself and Liz Mannering, who now works for Data Central, so it can be thought of as a precursor to some of the things that Data Central is doing — a base level from which Data Central's cutout service started. But once again it's very isolated and doesn't link up with any other surveys. It also has a single team login for everyone, so we have no idea who's using it and how they're using it, which is awkward, and it's very hard to go back and replicate anything you've done afterwards. So it's useful as a base level — I would say better than what most survey teams produce by themselves — but it's way below where we should be in terms of
making this data available with FAIR data access and usage. So, just to return to that slide and say what we had before Data Central: we definitely had access for our team members; we're getting better with an easy and intuitive interface; we do have restricted public access for parts of the database — you can actually start building up some quite large samples and looking at individual sources — and you can cross-match data between the GAMA data products, but not outside of them; and we can use this cutout tool to extract individual images from these large FITS images. But below that we're struggling, because we don't have data products that are easy to access and manage, and we don't have stable long-term access. We had a problem where our data server went down — the data are stored in one place at the moment — and no one could access the data for a long time, which was a big problem. And for the bottom one — when I say latency due to problems with people, I mean just Joe being busy and not being able to do things, which causes us issues.

So, having had that for a number of years, we're now moving all of the data and all of our data access in GAMA over to Data Central, and Data Central gives us a lot of functionality to solve some of these issues. Firstly, as I mentioned before, the cutout service that we had for GAMA has now been moved over to Data Central. It works in a much better way: it's quicker, you can run much larger queries, you can access the data more easily, and it's much more standardised than the one we previously had. We already have some of the DEVILS data in this cutout service as well, which we don't have anywhere else. You also have a query form where you can query all of our catalogues, cross-match them all together, and find all the documentation for those catalogues — which works in a much easier way than the GAMA database has worked previously. And then there's also a cone search, which allows you to pick a patch of the sky and find all of the GAMA objects that sit within that particular region of sky — which we don't currently have in GAMA at all. So these tools are basically replicating what the GAMA database has done for a few years, but doing it in a much better way and adding more functionality on top. The other thing is that you have user histories and logins and things like that, which we don't currently have, so it basically provides you with the ability to reproduce what you've done, and to check how people are using the database as well.
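The cone search just described boils down to an angular-separation cut around a sky position. A self-contained sketch, using the haversine formula and an invented toy catalogue:

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions,
    computed with the haversine formula (numerically stable for
    small separations)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def cone_search(objects, ra0, dec0, radius_deg):
    """Return the objects whose (ra, dec) lie within radius_deg of
    the search centre (ra0, dec0)."""
    return [o for o in objects
            if angular_sep_deg(o["ra"], o["dec"], ra0, dec0) <= radius_deg]

cat = [{"name": "A", "ra": 180.0, "dec": -45.0},
       {"name": "B", "ra": 182.0, "dec": -45.0}]
print([o["name"] for o in cone_search(cat, 180.0, -45.0, 0.5)])  # ['A']
```

A production service would of course index the catalogue spatially rather than scanning it linearly, but the selection criterion is the same.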
And then finally, because Data Central also has all of these other surveys linked into it, you can easily cross-match between the surveys to do interesting science cases — so we're no longer limited to just the GAMA data products when we want to identify interesting samples for doing our science and pull out interesting data products.

There's also Document Central within Data Central, which allows us to update all of the documentation for all of our data products. Whereas before we would have uploaded a text file and it would have stayed static forever, the person who actually created a data product can now log in and change the documentation for that data product, which is super useful. It also has version control, so we can go back through and find out what people have done. And most importantly, we don't have a single point of contact anymore, because anyone with permission can log in and change the information themselves, which is great. So, returning
finally to this table: in terms of the things that we want on the left-hand side for serving our data and hosting it in a useful way, I would say Data Central covers all of these in helping us to serve and support this data, which is great — so we're very happy with how this interaction with Data Central is going.

One other thing I wanted to point out in terms of how Data Central is helping us: they're also helping with the DEVILS observing that we're doing. We have a kind of unusual mode of observing with DEVILS that hasn't really been used before in astronomy, something called a nightly-feedback observing mode. Traditionally in astronomy people go and take their data, then they go away and reduce the data, look at it, analyse it, get all of their results, and then publish a paper — and that process can take anywhere from weeks to months to years. Within DEVILS we operate in a mode where we take the data on one night, then we reduce all of the data, analyse it, measure everything we can from the spectra, update all of our catalogues, and then re-observe the next night. So we try to condense a process that would traditionally take probably a year into about 24 hours, and that causes us quite a few problems which Data Central is helping us with. In this process we observe the galaxies at the AAT; we then reduce the data automatically and combine it with all of the previous data that we have in our archive; we measure redshifts for all of those galaxies; we work out which galaxies have secure redshifts, and we check off everything that has a secure redshift; we keep all of the sources that don't have a redshift yet; we rerun the software to decide which things to observe next; we prepare for our observing; and then we observe the next night.

Now, the complicated thing is that this part of the diagram happens at Siding Spring Observatory at the AAT, and this part happens here where I am in Perth at ICRAR — mainly because we need to run all of this software on the large machines that we have here — and, like I said, we need the whole process to happen in about 24 hours, which causes some complications. To describe that in a bit more detail: when we have an observer at the telescope, they take the data that we record on the night; this gets passed to our server machines here in Perth; we run our pipeline to analyse and reduce all of the data; we generate the files that we need to do the observing the next night; and these get passed back to the telescope so the observer can use them. The problem with doing this on such a short timescale is that there is a firewall between the AAT and the outside world, which makes it very difficult. There are ways to get round it that are slightly complicated, and I didn't know how to do it in an easy way, but we needed to operate in this mode — and kindly the people from Data Central said, yes, we have a solution that gets you round this and adds a load of other functionality as well. So what we did was stick Data Central basically in the middle of this process, and link both the telescope and our database here to the Data Central functionality. What happens now is that we upload all of the data from the telescope through Data Central's ownCloud, which you can see on the bottom right here; this is then automatically synced to the database here, which runs the pipeline and makes the fibre configurations; these automatically get synced back to Data Central and then up to the telescope again — and this whole process takes about 12 hours. So they've basically put themselves in the middle and allowed us to easily transfer this data from the telescope.

This also gives us a load of extra functionality in terms of using the data. We can access it via mobile wherever we are, and we produce a lot of diagnostic plots that tell us how the observing went, so I can now log in on my phone and see that all of the data was reduced correctly and everything looks fine, from wherever I am in the world. It also gives us resilience for when things break locally. We had a problem where, on Christmas Day, our server machine here went down, and I was running around in the middle of the night trying to fix everything — and then I realised that I could just download all the data that I needed from Data Central, run the pipeline, and upload the results back up to Data Central, and that solved a massive problem we would otherwise have had in terms of breaking the observing over Christmas. So the functionality provided by Data Central in terms of actually running the survey basically saved us from losing a night's observing over Christmas, which is really useful.
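The "retire what's secure, keep the rest" step of the nightly loop described above can be sketched as a simple filter over the target list. The field names, quality scale and prioritisation rule here are illustrative assumptions, not the actual DEVILS schema or strategy:

```python
def select_next_night(targets, secure_quality=3):
    """Split the target list after a night's reduction: sources whose
    redshift quality meets the threshold are retired from the queue,
    and everything else is kept for re-observation.  Field names and
    the quality scale are hypothetical, for illustration only."""
    secured = [t for t in targets if t["z_quality"] >= secure_quality]
    remaining = [t for t in targets if t["z_quality"] < secure_quality]
    # As one plausible prioritisation, re-observe the faintest
    # unfinished sources first so they accumulate exposure time.
    remaining.sort(key=lambda t: -t["mag"])
    return secured, remaining

queue = [{"id": 1, "z_quality": 4, "mag": 19.1},
         {"id": 2, "z_quality": 1, "mag": 21.3},
         {"id": 3, "z_quality": 2, "mag": 20.2}]
done, redo = select_next_night(queue)
print([t["id"] for t in redo])  # [2, 3]
```

In the real survey this decision feeds directly into the fibre configurations that are synced back up to the telescope for the next night.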
In summary: GAMA and DEVILS are two surveys aimed at studying the evolution of galaxies and structure over roughly the last seven billion years. To do this we compile massively large data sets spanning loads of different data types, with loads of different intricacies and quirks we have to worry about. To maximise the scientific return of both of these surveys we need to be able to combine the data in loads of interesting ways: we want to link up different datasets from different samples, we want to be able to update documentation and help run observations, and, importantly, we want to be able to provide this data to the public in an easily usable and understandable way. Data Central is helping us with all of this functionality, and actually adding things on top which are better, which we haven't even asked for, which is great. And finally, most importantly, Data Central doing this frees up more time for people like me to do science. I've spent a lot of my time over the last few years writing things for data access and providing data to the public and to teams, and now we have a dedicated team of people doing this for me, which means I can go off and do more of my own work — which is always great. And that's about it, so I'll leave it there. — Thank you, Luke. There's a
question for Luke. I was intrigued when you were talking about combining data from different instruments and different wavelengths and bringing that together. I'm assuming astronomy has a standard way of describing a part of the sky, so that across all sorts of different data sets you can say, I actually want that little bit. Is there a standard around that? — Currently it's what people like Katrina and her team are working on, and it's very difficult at the moment. The way most surveys have worked in the past is that they've worked completely in isolation and then tried to post-engineer a way of combining their data, and they usually leave it to the individual people doing individual science projects to do it themselves. What this does is bring in a massive problem: anyone who does anything slightly wrong gets the wrong answer, and even teams doing the right thing, but doing it slightly differently, get different answers as well. So it's incredibly tricky. Within GAMA we've actually tried to do this ourselves a bit, and I think we're one of the first surveys to put a pretty huge effort into standardising data that's in the same area of sky so that you can link it together in a nice way. But I think the problem has been that we're astronomers who don't know much about doing this type of thing, and we're trying to do it ourselves and make sensible choices — and adhering to things like the FAIR policies is probably not high on the agenda of an astronomer trying to get some science papers out. So now people like the ASVO and Katrina are trying to do this for us, making sure that we do it in a standardised way, and it really takes these things out of our way so that they get done properly and it allows us to do science much more easily. One of the benefits that came from GAMA was that you could just trust what had been done in a useful, sensible way and then do science really quickly — but we've reached a point where we can't do this anymore within GAMA: there aren't enough people working on it, and there's too much data and too many problems for combining things. So having people who can take over, who are experts in doing this for us, is great.

— Just to add to that: we're working with any new survey teams to make sure that they fit in and use the same methodologies, the same IDs, the same approaches, so they can all talk to each other. But there are other ways you can search the sky — we have a standard coordinate system, so if you know you want to look in a certain part of the sky, you can cut out pieces across the different surveys. So there are different ways that you can build up or look for what you're after in the sky, but yes, going back to Luke's point, we are trying to get standardised IDs or names in certain areas. Some objects have standard catalogue numbers or names that you can search on, but there are so many different ways you can look for things in the sky.

— It's kind of interesting as well, because GAMA and DEVILS, as I mentioned, are two very different use cases for Data Central in how the whole process works. GAMA was done in the past, before we had all of this in mind, and we did things how we thought was best for us within our survey — which means that Data Central now has to go back and post-engineer all the things that they want within GAMA. Whereas with DEVILS we're observing right now, as we're taking the data, so I'm working with the team there to structure the data and provide all of the information they need as the data is coming in. So we're kind of working in this
transition where we focusing on making the data more usable right from a ground level with the new surveys that we're doing absolutely so we're very thankful to teams like Luke's so that we can actually take it it's much easier for us to work with everyone going forward then trying to to retrofit but we will do the best we can and is there a is there an international standards bodies that I've year either IV I am in that that's sort of the organization that would sort of come up with community agreed standards and approaches so yes there is but again it depends on the individual surveys as to how they talk about the sky but we can always find things in the sky from a standard coordinate system and so forth so this but there are ways to make it easier I think one of the toughest things in in astronomy is actually convincing people that you've done it correctly I think that's the hardest thing because people will always they've they're in this mindset where they do something themselves and it's always been a case where you just download all the data and you just do it all yourself and then you trust yourself to have done it right whether whether you've done it right or not you kind of believe yourself and the problem is there's a lot of skepticism for these big archives and IAS vo tools that people haven't done things correctly and you have to kind of build that reputation so it's quite tricky to convince the world that you've done the right thing so coming it with some standardized way that you can show gets a bit published works and it's the right thing to do that's kind of the the tricky bit but you're right ultimately though we want to have a whole sort of provenance within the data so people can go back to the very beginning if they want and go oh yeah they got it right so we want to have all of that in there you make your
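The coordinate-based, cross-survey searching described above can be sketched in a few lines. This is a minimal illustration, not any survey's actual pipeline: the catalogue entries, IDs, and match radius below are hypothetical, and a brute-force pairwise loop stands in for the spatial indexing (e.g. HEALPix) a real archive service would use.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, in degrees.

    Uses the haversine formula, which is numerically stable for the
    small separations typical of cross-matching. Inputs in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sd = math.sin((dec2 - dec1) / 2) ** 2
    sr = math.sin((ra2 - ra1) / 2) ** 2
    h = sd + math.cos(dec1) * math.cos(dec2) * sr
    return math.degrees(2 * math.asin(math.sqrt(h)))

def cross_match(cat_a, cat_b, radius_deg=1.0 / 3600):
    """Match sources in cat_a to sources in cat_b within radius_deg.

    Catalogues are lists of (id, ra, dec) tuples; returns (id_a, id_b)
    pairs. Default radius is 1 arcsecond.
    """
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        for id_b, ra_b, dec_b in cat_b:
            if angular_sep_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                matches.append((id_a, id_b))
    return matches

# Hypothetical sources from two surveys, about 0.5 arcsec apart on the sky
gama_like = [("GAMA-1", 180.0000, -1.2000)]
devils_like = [("DEVILS-9", 180.0001, -1.2001)]
print(cross_match(gama_like, devils_like))  # → [('GAMA-1', 'DEVILS-9')]
```

This same-coordinates approach is what lets a user "cut out pieces across the different surveys" even when the surveys use different internal IDs.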
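Among the IVOA's community-agreed standards are ADQL (the Astronomical Data Query Language) and the TAP protocol for querying archives uniformly. As a hedged sketch of what such a query looks like, the snippet below builds an ADQL cone search; `CONTAINS`, `POINT`, and `CIRCLE` are standard ADQL geometry functions, but the table and column names (`survey.sources`, `ra`, `dec`) are placeholders, not any specific service's schema.

```python
def cone_search_adql(table, ra_deg, dec_deg, radius_deg):
    """Build an ADQL cone-search query string.

    Selects all rows whose (ra, dec) position falls inside a circle
    on the sky centred at (ra_deg, dec_deg) with the given radius,
    all in degrees, in the ICRS coordinate frame.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )

# Hypothetical query: everything within 0.05 deg of RA=180, Dec=-1.2
query = cone_search_adql("survey.sources", 180.0, -1.2, 0.05)
print(query)
```

Because the query language is standardised rather than per-survey, the same string can in principle be sent to any TAP-compliant archive, which is the interoperability the FAIR principles ask for.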
Do you make your code publicly available as well, so that alongside the documentation and the data you can see the record? Yep, yes, everything's available. We're actually moving towards that, but I'd say we're not there at the moment in terms of making everything publicly available. Within GAMA there are a lot of things which are hidden, where people take our word that we've done it correctly, and I think we've built a reputation as a survey team such that people believe it. But I think we are moving into that realm where we try to make everything public as well, so people can go back and redo things themselves if they want to.

Okay, well, I think that's time now. I'd like to thank all of our speakers again, thank you very much, and that concludes our webinar for this afternoon.