Costs and Benefits of Data Provision

Video in TIB AV-Portal: Costs and Benefits of Data Provision

Formal Metadata

Costs and Benefits of Data Provision
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
In 2011, ANDS commissioned a study to examine the costs and benefits of public sector organisations making their Public Sector Information (PSI) data freely available. The study was undertaken by Prof John Houghton, a prominent economist and researcher in the 'open access' field. It involved a number of public sector agencies, each suggesting different benefit:cost ratios. Overall, the study demonstrated that the benefits of making PSI data freely available far outweigh any costs, by some considerable margins. In late 2014, ANDS commissioned a follow-up report to estimate the total public spend on research, the value of the data created and analysed during the research process, and evaluate the benefits of curating and openly sharing public research data. This webinar explores the findings of these two reports.
Good afternoon and welcome, everyone, to this webinar, Costs and Benefits of Data Provision. In the room with me, looking at the slide, is Professor John Houghton to my left, and Adrian Burton, who is the Director of Services, and, probably just off camera, Susannah Sabine. We also need to acknowledge Nicholas Gruen, who is the co-author, I should say, but who's not here today. So I will hand over to Adrian to describe very briefly what ANDS does and why we're interested in this. So, Dr Burton, costs and benefits of data provision: why would ANDS be interested in such a thing?

Well, ANDS, the Australian National Data Service, for any newcomers to the webinar today, is an infrastructure program within Australia whose overall goal is for there to be more valuable research data for researchers. So really that is our core mission: we want research data, the outputs of research and the inputs of research that are data products, to be more valuable for researchers in Australia, and of course not just for researchers but for education, industry, public policy and general citizens in Australia to have access to that data. Our focus, of course, is to make better research, so we're really framing our questions about the value of data within its value for better research. To go back a little bit: perhaps because of the information revolution and other sharing practices, it has become more possible to share data, reuse data and have it as a valuable output of research. Going back a bit, the general attitude towards data was that it's a waste product or a by-product of the research industry, like carbon dioxide or other pollutants that just spill onto the floor, and once you've done the project it's just one of the things you throw away; it's a waste by-product, if you like. But now, with the sharing systems and the fact that a lot of the data is digital and there's an absolutely amazing global network of
information sharing, the data itself is now being recognized as a valuable product. So what is the difference between a by-product and a major product? It's where you see the value. In one sense, perhaps what we're doing now is reconfiguring our research systems to take into account this product of research and say, well, what is its value, and how can we make it more valuable, again for better research and for the broader society? So that's why we are interested, because that's actually the whole idea behind the Australian National Data Service: everything we do around research data management, infrastructure, policy, citation of data, everything is so that the output of research and the input to research is more valuable. It was in that kind of context that we've worked previously with John, and we thought, well, okay, what about that question: how do you measure the potential for valuing data?

Okay, so there are actually two reports, and we'll deal with the first one first up. The second report is written by John Houghton and Nicholas Gruen, and I did not mention that Nicholas Gruen is also the author of the Gov 2.0 Taskforce report, and he is from Lateral Economics. So John, I know you just can't wait to get stuck into this, if you want to take us through that first report.

Yes, sure. I think it was around 2011 that we actually did the study, focusing on government information, public sector information (PSI), and we did case studies at the Australian Bureau of Statistics and Geoscience Australia, and on hydrological data with both the National Water Commission and the Bureau of Meteorology. I think it's unique; I've never seen another study that actually measures the costs and the impacts before and after. It was a unique opportunity because the ABS in particular had just adopted open access and then, a year later, CC licensing. Most studies try to
estimate the future benefit of open access, but this was actually a study in which we had the before and after, so it was a really good opportunity. Having said that, I do have to say that measuring these sorts of things isn't easy, and there were limitations to the data that we had available to do the study, so in a way it was really only the Bureau of Statistics case that worked well; the other two were much more limited. We used three elements. We looked at the activity costs and cost savings of the agency, focusing on the ABS since it's the one that worked best, and we looked at what sort of new activities were being done and what activities weren't being done any longer as they adopted open access. For example, the ABS used to have quite a lot of shop fronts where you could go in and buy reports and so on, and they greatly reduced the number of those, so these kinds of savings were available for the agencies. We focused on the agencies and we didn't do a survey of users, so there was an assumption in that: we assumed that the user activity costs were kind of the mirror image of the agency activity costs. For example, one of the activities that the ABS and the other agencies were doing was answering the phone and answering queries about licensing conditions, what users could and couldn't do with the data. Now obviously there are two people on a phone call, so the assumption was that the amount it cost the ABS to run that service was also being spent by the user on the other end of the phone. We made that assumption as a simplifying way to do the study. In the case of the ABS, we basically found that they ended up losing revenue, of course, from not selling data, but they also made some savings that we could quantify and they had quantified. There were also savings that we didn't quantify, in things like this: if you stop doing the shop front, then everybody in the organization can
concentrate on the real issue of gathering and producing statistics. Without the distracting activities that they had to manage, they could focus on core business a bit more, so there were savings there. On the other side, we also tried to look at the wider impacts. We used both a welfare approach and a return on investment in data approach, and both of those require some measure of the impact of open access, basically the amount of extra use, and that was where the study actually turned out to be much more difficult than I would have envisaged. What we tried to do was use website download statistics from each of the agencies, from before and after, but of course there's a whole range of reasons why downloads change, go up basically, over time. There was a general trend in the mid-to-late 2000s for people to download more things than they did the year before; I mean, it's still the case. And obviously if you have more available online there are going to be more downloads, so you have to look at both the extension and the intensification of use, and this proved to be actually extremely difficult. We did manage to tease out both the extension and the intensification of use a little bit, enough to make some estimates, and I think basically the bottom line for the ABS was that the overall costs for both the agency and the users were about 5 million a year, circa 2006-2007, and the total benefits, the savings plus the increased returns, were about 25 million a year, so the benefits were about five times the costs in that case.

Should we bring that graphic on from the video? Yes. John, before you go
into this gorgeous equation, could you explain that word "welfare", because it has other meanings, doesn't it?

Yes, it does, but I think it probably means the same in economic terms, broadly speaking. It's an approach in which basically what we were trying to do was estimate the change in consumer surplus. Consumer surplus is the difference between what people would have been willing to pay and what they did pay, so if you don't have to pay as much as you were willing to, then you're getting a benefit, a consumer surplus, and producer and consumer surplus add up to social welfare. So obviously if you stop selling something and give it away, then there are a lot of people who were paying for the data who don't have to, so that is a consumer surplus straight away. There's also the issue that more is going to be used, so there'll be more users who weren't willing to pay the old price but are willing to pay zero. There's a sort of standard way of calculating the increase in consumer surplus, and that is the welfare part of the calculation.

And that's the one that ends up being five to one, doesn't it, John, in the case of the ABS?

Yes. In the case of Geoscience Australia it was much higher, I think around 13 to one, for the particular type of data that they were using. But I think it's a really important point to note that that doesn't reflect on the agency: the agency that gets 13 times the benefit isn't better as an agency than the one that gets five times the benefit. It's due to the completely different sorts of data and types of uses of data. Geospatial data is highly valued; most people use GPS, I used GPS to get here as it happens, so everybody knows that it's valuable data. So it's important to realize that the cost-benefit relates to the data and not the performance of the agency.

Who uses this kind
of data, John? The first of the two studies was basically public sector; who uses that?

Well, everybody really, in various ways. In terms of national statistics, clearly anybody who does anything to do with the economy or policy uses national statistics; anybody who is in government, in a non-government lobbying organization or in industry is a big user of various sorts of statistics. In the case of water data we looked at a Victorian example, and one of the big users was school education. That's common in a number of other studies that I've done with research data and with public sector information: students react a lot more and are a lot more engaged if the data is actually real. It's a depth measurement from a river that they know exists, not a textbook example that's purely hypothetical. It really helps students engage when they know it's real data, much more than they do with abstract hypotheticals.

So in this case the benefit-cost ratio, I should say, varies from five to twenty; I think a previous study by ACIL Tasman put geospatial data at over 20. So we know that the data is being used by government, it's being used as an input to research as well, and we'll talk later on about business and industry. But am I right, John, that you and Nicholas Gruen extrapolated this in another study, from just the ABS to the whole of government?

The first study that we're talking about here was case studies. There was a wonderful Dilbert cartoon that I saw once, in which Dilbert went in to his boss with the pointed hair and said, "My PowerPoint has everything: I've got data for the real people and case studies for the idiots." I just bring that up in passing. Case studies are really interesting; they speak to what's going on. A major point of doing the first
report was to try to come up with a methodology that departments and agencies could use themselves to make the case for open access and to help them see the benefits to them of open access. But any case study is just that; you can't multiply case studies and get the macro picture. So in a study that Nick and I did for the Omidyar Network a year or so ago, we did a macro sort of estimate of the value of public sector information and research data combined, and we got an answer of around 17 billion dollars a year in total for the country.

So the first study basically showed that the case for making public sector data freely available is overwhelmingly positive, and we have a huge number here representing the annual value of that data. I will mention that our statistics showed that the first study had, I think, over four thousand downloads around the world in one year. I think you'll see that in state and territory governments and the Commonwealth government, all the government departments have engaged to some extent in providing access to their data sets, freely available under licence; we'll talk a bit about licensing towards the end. So John, can we move you on now to the second report, the recent one, which
was, I suppose, what most of the people signed up for, to find out about the
open research data, the Houghton-Gruen report. Would you like to take us through that one?

It's a very different approach; it's not a case study approach, it's a macro approach, and there are two main elements to it. I'm hopefully going to state the really obvious. The first question was to try to measure the value of data in public research at the moment, and, to state the obvious, that's something that exists, so we could measure it. The second part of the study was to try to estimate the potential upside value of curating and sharing data from publicly funded research, and that doesn't exist yet, so we had to estimate it. So the report very much takes two completely different sorts of approaches: what we were trying to do was to measure value, and to estimate a potential upside value of curation. The focus of the study was basically government-funded research, and there are two ways you can look at that. One is funding by sector of funder, Commonwealth government spending on research, which is about 9 billion a year. But you can also look at it at a policy level: who's going to make policy and who's going to follow the policy, in which case it probably makes more sense to look at sector of execution, so government as a research organization, the Commonwealth, CSIRO and so on, and higher education, for which you can make public policy. We looked at research at that level, which is about 13 billion a year. That gave us a kind of range estimate of the sorts of things we were dealing with. We took two approaches to measuring the value of data in public research. Now, I'm carefully saying those words, because what we can measure is people's use and the activity of using data, but of course research is a global activity: not all the data used by Australian researchers is Australian, quite a lot of it isn't. So we're not talking about the value of data produced in public research; we're talking about the value of
the activity of using data in public research. I'm not sure that I'm making that clear, but it is an important distinction. The first thing we did was a really simple approach, probably the most basic sort of costing approach you can use in economics, which is use value: it simply measures the time and other costs involved in creating, manipulating and analyzing data. The second approach we used was to try to look at a return on investment on the amount of money spent on the activity, at average returns to R&D. For both of those, for the whole report in fact, we in a sense didn't do any original research; we based it on a series of studies that I've been doing and am doing in the UK with Neil Beagrie of Charles Beagrie Limited. We've been doing studies of research data centers: we did a study first of the Economic and Social Data Service, one of the Archaeology Data Service, one of the British Atmospheric Data Centre, and we're currently doing a survey for a study of the European Bioinformatics Institute. So what we're doing is to take the activities that we know about from the users of data centers in the UK and say, if that was happening in Australia, what would that be worth? Basically we found that in the UK studies the survey respondents, of which there were many thousands by the way, reported that they, and in their opinion others in their field, spent between about 35 and 60 percent of their research time creating, analyzing and manipulating data: thirty to thirty-five percent if you're an archaeologist, sixty percent if you're an atmospheric physicist, which is not surprising. So broadly, just as a simple mean, we took it that about forty-five percent of the research time of typical researchers across disciplines is spent on data. That's worth anywhere between two and six billion dollars a year in Australian public research. That was the first thing we did, the use value. The second thing was to look at
returns to R&D. You need a lagged sort of model to do that, because the returns accrue over 20 years and you express the value in net present value from one year's expenditure. But anyway, and coincidentally, because it's a totally different method, the answer was the same: it was between two and six billion a year in net present value. So that was our estimate of the value of data. It's a very broad range because there are two ways in which you can do it, and Nick and I, I think, spent two very long fruitless evenings arguing to and fro about which we should do, so we ended up doing both because we couldn't decide; that explains why there's a range. We also felt that, to be honest, if we gave a pinpoint number, we don't know how accurate it is really, so a broad range, I think, is more honest. But I would say that I'd expect the answer to actually be closer to the top end of the range, closer to the six billion than the two billion, but that's an opinion rather than a calculation.
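The return-on-investment approach described above, in which the returns from one year's expenditure accrue over about 20 years after a lag and are expressed as a net present value, can be sketched as follows. This is a minimal illustration under stated assumptions, not the model used in the report: the function name, the rate of return, the lag and the discount rate are all hypothetical values chosen for the example.

```python
def npv_of_lagged_returns(expenditure, annual_return_rate, lag_years,
                          return_years, discount_rate):
    """Net present value of the returns to a single year's expenditure.

    Returns of `expenditure * annual_return_rate` are assumed to arrive once
    a year for `return_years` years, starting the year after `lag_years`
    have passed, and each is discounted back to the year of expenditure.
    """
    annual_return = expenditure * annual_return_rate
    npv = 0.0
    first_year = lag_years + 1
    last_year = lag_years + return_years
    for year in range(first_year, last_year + 1):
        npv += annual_return / (1.0 + discount_rate) ** year
    return npv


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only (not figures from the report).
    value = npv_of_lagged_returns(
        expenditure=1.0,         # one unit of spending on data activity
        annual_return_rate=0.2,  # assumed average annual return to R&D
        lag_years=3,             # assumed delay before returns begin
        return_years=20,         # returns accrue over 20 years
        discount_rate=0.05,      # assumed discount rate
    )
    print(f"NPV per unit of expenditure: {value:.2f}")
```

A longer lag or a higher discount rate lowers the net present value of the same stream of returns, which is one reason estimates of this kind are better quoted as a range than as a single number.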
What I've been talking about was the value of data, which is the left-hand column; I said two to six billion. That reflects what we were using: the research activity times from the UK, in the use value and the return on investment calculations. Now moving over to the other side, with the repositories heading: they are completely separate, as I say; one exists and we measured it, the other doesn't exist so we estimated it. Moving over to that side, we tried to estimate the potential upside value of repositories. Basically the calculation is saying: if all of the publicly funded researchers in Australia realized the same benefits as the regular users of the UK data centers, then this would be the potential upside value. Now of course it could be a lot more than that if we in Australia did better than the UK at managing and making data available; it could be quite a lot more, there's no way to know. It's just a way of getting a ballpark estimate. In all of the surveys that we've done of the users of UK research data centers, who are not necessarily in the UK, there are two big things that they all report. The number one impact of using data centers is the efficiency impact: they save a lot of time creating data and so forth. The second one is that there's obviously additional use, by reuse and by people who couldn't either create the data themselves or obtain it anywhere else, so that is pure additional use, and we can calculate the value of that additional use. So the elements of this calculation were the time saved, and of course when a researcher saves time they don't think, okay, I'll finish at 3:30 and go home; they do more research. So the time saving is just the first step; the second step is, well, if you use that time to do more research then that extra research is also going to have a return over time. So there are two elements there, and the other
one was the average return to the pure additional use. We estimated the sum of those impacts to be 1.8 to 5.5 billion a year; that's the right-hand column. We in essence have no idea how much of that is still available or already being done, but as a kind of scenario estimate we simply said that maybe ten to twenty percent of the data we are producing is currently being curated and made openly available, which is probably generous, and so eighty to ninety percent is unrealized. So the unrealized upside, as we say, 1.4 to 4.9 billion a year, is what's available to us. There are a few issues, well, a lot of issues obviously, around those things. We wanted to look at the value of national collections, and there are obviously some specialist data centers around in Australia. We didn't do any kind of estimate, but I think, and it's just an opinion, that given a pretty unique sort of climate, fauna, flora and so forth, national data in Australia is probably worth more than national data about some things is worth in a European country; but that's just an opinion and we didn't do an estimate of that. The other thing we did, which isn't on this chart, is that we tried to think about what the cost would be. We're saying the upside is maybe up to five billion; what's the cost? Of course, we're trying to estimate something we haven't done yet, so it's really hard to cost, and it depends how you do it. Clearly if you curate data very thoroughly, very well, it can cost you almost anything, it could be very expensive; if you do it badly it doesn't cost you very much, but then you probably won't realize the five billion potential benefits. We went around in a number of circles with it, but probably the best estimate, just as a ballpark again, was again to do with the UK research data centers. Historically at least, and it's changing a bit now, the subject data centers were funded by the relevant research council, and so the
Economic and Social Research Council funded the Economic and Social Data Service, and so on. There was a government report in the UK that said that across the disciplines there was a remarkably consistent expenditure on those subject data repositories, which was about 1.4 to 1.5 percent of total funding. So if we go back to our original public funding figure of nine to thirteen billion, 1.4 to 1.5 percent suggests that 150 to 200 million a year would be the cost of quality curation. So clearly 200 million as a cost and five billion as a potential benefit is pretty much no-brainer territory. And, well, that's just really
representing the current estimate and the available upside of 1.4 to 4.9 billion. The straight line is a straight line because it looks funny to just have two points, but even then you wouldn't say that was a trend between two data points; not sure why we drew the line.

Okay, so that just sort of summarizes what we think we're doing now and where we think we could get. But in that figure there is more than just that, isn't there, because in the top right-hand corner, and I'm not sure everyone can read it, it refers to all data and to infrastructure. John, talk a little bit about data infrastructure versus repositories, because there is a bit of a difference and we want to make sure people understand. You've mentioned that the cost of running these centers is fairly small in relation to the possible return. So let's just get a discussion going: on the bottom left is an individual at a desk, and at the top right we have full data infrastructure. What does that actually mean, what's the message there?

Clearly the infrastructure is both hard and soft. Certainly from the surveys we've done elsewhere, the guidelines, the standards and all these sorts of things are highly valued and highly important. It's certainly not just about IT and networks; obviously that's essential, but the soft infrastructure is vital, and that comes across very much, all the time, from the studies we've done: people use the guidelines and go to a data center because of those kinds of facilities. I'm actually a pretty regular user of the Economic and Social Data Service in the UK; they have some wonderful, simple methodological guidelines freely available on their website, which have nothing to do with the data per se. So that's a really important aspect of the infrastructure, if that's what you had in mind.

It is; there are two sides of it. We're talking infrastructure and policy, so what does the
policy mean in this context?

Well, in this second report we made some policy recommendations; "observations" is probably the better word. One point was the one I just made, both hard and soft infrastructure, and the soft infrastructure is very important. A starting point from a policy point of view is mandates, from government, from funding agencies, from institutions. One thing that's come up in other studies: I recently did a study for the Canadian research councils, which was sort of a backgrounder to their recently announced Tri-Agency Open Access Policy, and one of the things that came up in that really strongly is the importance of harmonization of policies. Research is a global activity; you don't just have one funder, one institution, you're collaborating across institutions, across countries, across funders, and if they all have different policies, mandates, different sorts of things that you have to do to comply, it's a nightmare. If the cost of compliance is high then people just don't bother to do it unless they really have to. So I think it's really important when you think about mandates to think about harmonizing what we're expecting and asking, worldwide as much as possible.

That's something to bear in mind, and it's all part of this going from the bottom left to the top right, which includes coverage of all data that should be shared and reused, and part of that is the policy.

Yeah. And you've hit on another thing there. In terms of policy, obviously there are constraints on openness; not everything can be open: there are privacy concerns, security, commercial-in-confidence. So it's not everything, and it's vital to sort of sort that out
and make it really clear up front. One trend, and I'm careful not to criticize anybody, but, well, okay, I'll criticize my own university, Victoria University, and I think it's something we've got to think about in terms of policy, so forgive me if I'm drifting. When we want to do a survey, we put in a human research ethics clearance application, and we specify what data we're going to collect, who we are going to ask, and what we're going to do with it. Basically, we ask permission of the subjects to use the data for the purpose that we define. Now, that's fine if the data is mine and I'm going to use it, but if I'm going to make it open, I've got no idea what people are going to use it for in three years' time. I can't possibly seek permission for that use. So I think most institutions have got to rethink the research ethics process and that permissions process. It can't be about permissions. It's got to be about protecting the subjects, whoever they are, from foreseeable harm, and about confidentiality, privacy, those sorts of things. It cannot be about permission, otherwise it's never going to work. I think that's something we need to rethink policy-wise.

And that's another example of the harmonization of these different policies, the ethics framework and the funding framework and other funders, and not making it hard for the researchers by having competing requirements on them: at the one end to make things available, and at the other end to destroy them as soon as the project is finished.

Yes, which is still the case with our ethics protocol: you've got to keep it for five years, and how are you going to destroy it is the question. How are you going to make it openly available further?

So these are things that are included when we say 'through policy and infrastructure': there's a more coherent policy framework, in the broadest possible sense, of ethics and commercialization and research
funding policy to bring us up to a broader coverage.

Yes. And since Nicholas Gruen isn't here, I'll sort of speak on Nick's behalf. One of the things that he's very keen on policy-wise, and he runs a firm so it's hardly surprising, is that there's a trade-off in all of these things in policy between a kind of top-down approach and a bottom-up approach. Quite often there's a tendency to have too much top-down, and that presupposes that the person designing the top-down policy knows what's going to happen in three years' time and can predict how best to do that, which is often not the case. Whereas if you leave
people to work out how to do it for themselves, you can get more innovative solutions. So I think that the mandates and the policies in a sense need to be about setting a vision, setting an aim. They need to be about guidelines. They don't need to be about instructions. Perhaps as a concrete example I might use my own university again, having criticized it. Victoria University has an open access policy for publications, and I sort of facetiously call it the 'fill the repository' policy. It's not actually an open access policy. It defines the expectation of open access, but it says everything must be on the repository, the institutional repository. Well, that's an instruction. The open access policy is a guideline and an expectation. People, from the bottom up, might prefer to do it, and they do prefer to do it, differently: use SSRN or another subject repository, not an institutional one, as a principle. So I think in all of these things, when we're doing mandates, we should stop at the guidelines and the expectation, and leave the implementation, how we actually achieve it, to more innovation from the bottom up. A repository policy like that rather presupposes that green open access is going to be the solution, but I'm not sure that that's the case, and I don't think it is the case. So I think it's kind of foreclosing innovation that we could have, and may actually be quite negative. And Nick's very big on that point.

Could I ask John, before we sort of formally ask the audience for questions: what's your current read on the policy for publicly funded data, in research data? What's your read? Are the guidelines clear enough? Are they out there? You haven't mentioned licensing yet, but we might get to that, because that's one of the things you need to do in order to allow
people to use that. Put it in a repository and it's not licensed, so yes, raising that question of IP.

Look, I'm not sure. I know you two clearly know much more about it than I do in terms of those issues. My sense is it's got a long way to go in terms of the guidelines and doing it.

We have an interesting question here. This is the question: many researchers claim that there is no secondary use value for their research. Are they right about their data?

Well, if people are using data from a data centre that they could not have created and could not have obtained anywhere else, then they must be wrong. Someone is using it for something that they didn't foresee or collect it for. So yes, I think they are wrong. I think our ability as researchers to imagine what we can do with data is good, but it's not good enough to imagine what everybody else would think of doing with it. If you look at some of even the mashups that you get, what people do with it is something you would never foresee, in terms of apps and those sorts of things, and I think it's true of research data too.

While we're waiting for the next question, could I clarify, or I might even ask Adrian to clarify: in order to get to the top star, to realize that five, five-and-a-half billion benefit annually, it's not just enough to put your data in a repository. What else is required? What does that really mean, reading that axis, 'increasing value of data as data access increases through policy and infrastructure'?

Well, something that will, even licensing. The first thing to note is that we're not at zero here. The bottom star is not at zero, zero, so there is activity going on. We have infrastructure and some policy that is pushing us at the moment. For example, the whole NCRIS program: obviously not all of it is to do with managing and curating data. Some of that investment is to do with generating it, but as
we all know, the way you generate it does make a big difference to reuse later. So if we take, for example, the NCRIS investment, a proportion of that is going into creating things like IMOS etcetera, and that does have a management and curation and access and discovery role for data in Australia. So that's why, particularly in those instances, we're way off being at zero. So I think part of the message here is that where we have data infrastructure, that should be continued and maintained. ANDS has been working across all the NCRIS facilities, but also with the research organizations. The organizations have a role in this, and there's been some great uptake with pretty much every research organization in Australia.

Where we need to go further is making these things easy. Some of these things are the right thing to do but not always easy. The tools, the support services and the promotion that make managing, curating and publishing data as easy as it can be are part of that infrastructure, and that includes the kind of information that's required for reuse. A big question behind this, and I think behind Roz's question, is: well, it may not be totally useful for secondary use if we don't have the calibrations or the methodology, or it hasn't been collected using the community standard, etc. So those kinds of things that you mentioned are part of this making data available. And when we say that data in some kind of data service is valuable, well, why is it valuable? Because it's so much easier to use. It's there, it's been managed, it's been documented and it's accessible. All those qualities are the ones that we're trying to bring in more as business as usual for data.

Part of this again is the incentive question, so being able to track these reuses, or users, is an important
part of the frame, if you like, that we sit around this, that says that yes, there's an impact from the reuse of this data, and that it's somehow measurable and is a good reflection back on the original researchers, on the research organization and on the data archive. All three of them need to have some way of getting a pat on the back, to say that yeah, we've had a part in this. So that's where we're getting to these things about data citation and metrics for reuse. They're an important part of this mix.

Again, licensing, as you talked about. So yeah, there's the agreement, licensing that for reuse. So particularly in the Australian context,
if you're silent, and assuming there is some intellectual property in the data that you've created, then silence means no consent to reuse in the Australian context. So I think everyone needs to be aware that part of this whole new world is that we're assuming that the data is not a waste product, that it actually is something that needs to be reused. Then you need to say that somehow. At least those terms and conditions need to be made explicit, that yes, this can be reused in the most open way possible. So that's the idea of putting the license there. There's a very simple Creative Commons BY framework, which is a good place to start. Why wouldn't you default to that, and ask, are there any reasons not to use that as a starting point? That means that there's clarity from your point of view: if there are mistakes in the data, the indemnity is already baked in. And there's clarity from the user's point of view: yes, I can freely create new innovative products in research and industry and education that are not somehow clouded with uncertainty as to whether they can be.

We have some questions coming in, but I'd like to mention that we actually have a fully operational licensing system in this country called AusGOAL, so A-U-S-G-O-A-L. Check it out. It has a license checker. So this is the point that's being made: if you don't license the data, if you leave it unlicensed, effectively in law no one can use it. All rights reserved. So let's go straight to the questions below.

I really like the idea of recycling. It puts me in mind of waste product and recycling. It's like being an organ donor. I think so, say, 'I leave my data to science'. It's kind of a nice idea. You can get four toes. Yes, well, and it's all part of the return on industries: a manufacturing industry can make itself more productive and more efficient by reusing some of what were previously called waste products, and same thing
here.

We've got some questions, so let's read them out. What we're talking about here: what is the optimum investment so the data is efficiently reusable?

Well, I don't know. I mean, it's one of those things that we don't know. We don't actually know what's the optimum investment in research, even though we have tables of performance that say that we invest a lower proportion of GDP than Finland does. It's actually a very controversial point, what is too much, and maybe there is a point at which it's too much. I've only ever seen one study that tried to do that calculation. So what's the optimum investment? I don't think we've got any idea, but I think it's probably more than we're currently doing.

Almost, then, that question can be looked at another way, going back to an earlier point you made: the investment in data infrastructure divided by the value of that data is a small number.

Next question: what quality initiatives are planned to reward researchers for good data management and help others identify quality data available for use?

It's certainly one thing, and I'm not answering the question, but it's one thing that comes out very strongly from the UK data centre studies: in the qualitative questions and answers that we're getting, knowing and understanding the quality and the processing of the data is what the users value pretty much above everything else. So those two questions perhaps relate to each other, the optimum investment and the quality.

So, a quality initiative: I'm not aware of anything formal in that area. What I'm imagining here is, you know, 'this is five-star data' because it's accessible, easily reusable, well documented, openly licensed. So there might be some kind of a standard around the quality of good data
and, as you say, if your data had those five stars, then that would be a reward for that good data management and would allow people to recognize that. Now, at the very most formal level, in Australia the ABS does have a very rigorous framework for data quality, particularly if that data is going to be used in a public policy decision by COAG or something like that. Then there is quite a formal documentation of data quality that is available, cited on our website as well as on the ABS website. That's probably at one end of a spectrum of very formal quality and quality assurance. It's probably something we could do at the more informal research collaboration level.

I'm just wondering now, do either of you want to comment on what might be the answer to that question if it were the case that research funders counted data as research outputs rather than by-products? My understanding is that currently Australian funders do not do that.

Yes, and that's where, to be a bit facetious, doing some kind of a mandate may increase the quantity of data, not necessarily good quality data. So, as you pointed out, with guidelines and objectives at that funding level, it might be good to have a five-star system that says these are the kinds of things that are included in good quality data. So much is built on the process that went into creating the data.

We've got two more. 'Working on the new AusGOAL license chooser now, and it will include questions about sensitive data and research data generally.' We can take that as a comment. That's Baden from AusGOAL.

Our final question for the day: what investment needs to be made in the education aspect of open data in the research sector? Is there any specific amount currently invested in education? Should there be a policy on the investment in the
educational aspect of promoting open data and its usefulness?

Just from experience, from what we hear in various surveys and interviews, I think getting PhD students in particular into the habit of looking for data at established data centres is actually a really important thing, because the process of research is evolving. How we do research is quite different now to where it was, I'd say, a long time ago when I did my PhD. It was a very different period. So I think we do need to think about how we encourage students, and PhD students in particular, to go down that sort of route, and build it into the expectations of supervision.

And as you mentioned, part of the infrastructure under our definition here included hard and soft, and the education is a really critical aspect of that. Yeah.
Well, John and Nicholas, thank you for that, thank you as always. Any follow-up questions or commentary, please contact John directly at the Victoria Centre for Strategic Economic, no, the Victoria Institute of Strategic Economic Studies, or me, Greg Lachlan, here at ANDS. Just lurking in the final slide: we are part of this NCRIS initiative in Australia, which is all about this kind of thing.

Show where we can find the ones on the website. Yes, you can get to all of these reports from the quick links on our website. So you've got the original one that John talked about, the Costs and Benefits of Data Provision, which is there, and some words from Hobart, and then in the quick links again the latest one, the open data report, is available there. So it's very easy to find from any page on the ANDS site. And below the screen there is that figure again. So thank you John, Adrian, Nicholas, Susanna and everyone who logged in today.