Update on Data Publishing with Dataverse


Castro, Eleni
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Nancy, France

Lecture/Conference Meeting/Interview
So, I work at the Institute for Quantitative Social Science, where we have a group called the data science team. The team is composed of researchers, software engineers, quality assurance engineers, and folks such as myself who are part of the information science discipline, and we all work together to build tools, applications, and frameworks that allow researchers to share their data.
The topic today is publishing data, so this is probably a good time to give a brief introduction to what Dataverse actually is. My group develops Dataverse. It is an open source web application that allows people to publish, cite, and preserve research data. It is on GitHub if you would like to install it. We have an installation at Harvard University: it is a research data repository that is not only for Harvard itself, it is open for anybody, anywhere, to deposit content. We are data agnostic: originally it was only for the social sciences, but now we have expanded to other disciplines. Just as a kind of overview, these are some of the numbers we currently have [unclear].
As I mentioned, Harvard is not the only place that has Dataverse installed. We have several institutions around the world that have installed it. Some of them include the Odum Institute, and we have folks at Heidelberg University in Germany who installed it recently. On the other side, in Asia, we also have Fudan University in China, who actually developed a Mandarin version of the interface. Until then it was only available in English, and they helped us develop that. We are open to expanding internationalization to Spanish, French, and other languages as well, if someone is available to help us.
We really have a variety of users: researchers can deposit their data, as can projects and departments, and, as I will talk about more, journals can also use Dataverse to publish data.
So we offer three different levels of service for journals. The lowest hanging fruit, obviously, is that journals can just mention Dataverse as a recommended repository, and authors would go on and deposit into their own dataverse. Luckily, the Harvard Dataverse is open for anyone: they can come in and set up their own place to deposit their research data, and then they would just send an email with the data citation to the journal, and once the article is published the citation would be included in the paper.
Second, if the journal wants to have more control over which datasets are grouped together under the journal's publications, they can create their own journal dataverse. That way, when the authors deposit data to that particular dataverse, the editor has a chance to review the dataset before approving it for publication.
Now, the third is the option that allows more technical integration. We have been working with a group at Simon Fraser University, the Public Knowledge Project, to create a plugin on the journal side that can automatically deposit the research data into Dataverse. What would happen is that the author would submit the article and the data at the same time in the journal management system, and once the reviewer has approved the data, it will be automatically sent via an API into Dataverse. I will show you a few screenshots of what this looks like.
The Dataverse integration project that I alluded to previously is funded by the Sloan Foundation, and it is reaching the end of the project. We have been working with their particular product, Open Journal Systems (OJS), which allows journals to use this open source platform to manage their journal articles. What this integration also helps with is that we automatically insert data citations into the journal article, and the journal article automatically gets a publication citation in the Dataverse dataset, so they are bidirectionally linked together.
Now, one of the other things we would like to work on, which journal editors have asked us to expand on, is to have data visualizations, or visualizations of the tables used, in the actual journal article, so you do not have to leave the article to actually see the data. That is something we would like to work on in the future.
Here is a screenshot of what the plugin looks like. When a journal manager is setting up the plugin in OJS, they have the option to set up data policies, and we have adapted the Dryad Joint Data Archiving Policy as part of that. We have also allowed journal managers to put up author guidelines, including the need to add a data citation within the references. And, as you can see on the screenshot, we have also added options for reviewers as to how they handle contributed data. If you would like to see more of these boilerplates, follow the link at the bottom of the screen.
Now, when the author is going to submit the data using the plugin, this happens during manuscript submission as well. This is one of the steps: once they have submitted the manuscript, they are given options. If they already have the data in a repository, they can choose the option which allows them to enter the data citation. If it is not deposited anywhere, then they can choose to publish it in Dataverse. With these two options we want to allow flexibility, and that goes along with the Joint Declaration of Data Citation Principles, number 3, evidence, which is that anything you use to put together your research article should be shown and referenced within the publication.
Now this is the last option: the reviewer can also review the data and the article at the same time. You can see it in the box, which you probably cannot read from the screen, but they can review the data and the article in the same place. They do not have to go to the research data repository and then also go into the journal management system. Once the reviewer or the editor has approved the article for publication, the data is automatically sent via API into Dataverse.
There are two options for publishing datasets. The journal manager has a choice to allow the dataset to be published upon article approval or, if they want to wait, to allow the dataset to be published when the journal issue is actually released. So you can allow people to see the data ahead of time, say if it takes a long time for your journal articles to be published, so that it is out there for readers to use.
When you look at the article page in OJS, you will see that the data citation is on the paper's page, right below the link to the full text and the abstract, and this brings the data closer to the published article.
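The deposit and publish timing described above can be sketched in a few lines of Python. This is only an illustration of the workflow as the talk describes it, not the plugin's actual code: the names `PublishTiming`, `Event`, `Submission`, and `handle_event` are hypothetical, and the real plugin talks to Dataverse through an API rather than flipping flags in memory.

```python
from dataclasses import dataclass
from enum import Enum


class PublishTiming(Enum):
    """Journal-level choice of when a deposited dataset becomes public (hypothetical names)."""
    ON_ARTICLE_APPROVAL = "on_article_approval"
    ON_ISSUE_RELEASE = "on_issue_release"


class Event(Enum):
    """Editorial events that the journal system could report."""
    ARTICLE_APPROVED = "article_approved"
    ISSUE_RELEASED = "issue_released"


@dataclass
class Submission:
    title: str
    deposited: bool = False  # data has been sent to the repository
    published: bool = False  # dataset is publicly visible


def handle_event(sub: Submission, event: Event, timing: PublishTiming) -> list:
    """Return the actions triggered by an editorial event, updating the submission."""
    actions = []
    # Approval always triggers the automatic deposit.
    if event is Event.ARTICLE_APPROVED and not sub.deposited:
        sub.deposited = True
        actions.append("deposit")
    # Publication depends on the policy the journal manager chose.
    should_publish = (
        event is Event.ARTICLE_APPROVED
        and timing is PublishTiming.ON_ARTICLE_APPROVAL
    ) or (
        event is Event.ISSUE_RELEASED
        and timing is PublishTiming.ON_ISSUE_RELEASE
    )
    if should_publish and sub.deposited and not sub.published:
        sub.published = True
        actions.append("publish")
    return actions
```

With `ON_ISSUE_RELEASE`, approval only deposits the data and the later issue release publishes it; with `ON_ARTICLE_APPROVAL`, both happen at approval, which is the "see the data ahead of time" case from the talk.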
So this is just a diagram from my boss, Mercè Crosas, that shows the life cycle of integrated publishing. What it includes is not just the article but the code and, what my group is working on, the data behind it. That resonates with Data Citation Principle number 1, which is about raising the importance of data.
Now I would like to give just a brief overview of the data publishing features we are going to have in 4.0, which is coming up. I will not go into depth, because we are making many big changes in Dataverse, which will include a user interface change and other features, but for the purposes of time I will just stick to data publishing.
Previously, when you published a dataset in Dataverse, you could only go up by major versions. You would go to version 1, and then if you made any changes it would automatically go to version 2, even if you had just fixed a typo in the description, which, for the sake of figuring out what actually changed, is not very helpful. So we are going to be allowing more granular versioning, so that you have the choice to make a minor version change. Say you just made small metadata changes and did not upload or change files: the version can go up by a point, for example to 1.1.
Another thing that we offer is a Universal Numerical Fingerprint, which is for showing what has changed in a tabular file that you added. Micah Altman and Gary King, whom I work with at the Institute, were behind this particular file fixity check, and you can read more about it by following the link at the bottom of the page.
So the versioning, the way it is going to look, is that you have the option, which I mentioned earlier, to pick between a major and a minor release. What you can then see, when you do make a change, is the details of the change. At first you see this box right here, where you can see the high-level changes between the versions. When you click on "show details", it will actually show you, from version 1 to version 2, what has changed in either your files or your metadata.
So we are giving people more ability to see the changes. Previously you had to manually enter what you actually changed, which does not make sense, because the system can see what has changed. So we are making it more automated, so that folks can see the changes for themselves.
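The versioning rule from the talk (file changes force a major version, metadata-only changes may be minor) and the automatic change summary can be sketched as follows. This is a minimal illustration under stated assumptions, not Dataverse's implementation: `Version`, `next_version`, and `diff_metadata` are hypothetical names, and metadata is modelled as a flat dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    major: int
    minor: int = 0

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def next_version(current: Version, files_changed: bool,
                 author_wants_major: bool = False) -> Version:
    """Pick the next version number.

    Uploading or changing a file always yields a major version; for
    metadata-only edits the author may choose a minor bump.
    """
    if files_changed or author_wants_major:
        return Version(current.major + 1, 0)
    return Version(current.major, current.minor + 1)


def diff_metadata(old: dict, new: dict) -> dict:
    """Map each changed field to its (old, new) pair, so nobody has to
    describe the change by hand."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes
```

For example, fixing a typo in the description of version 1.0 without touching any files would give `next_version(Version(1), files_changed=False)`, which prints as `1.1`, while `diff_metadata` reports only the `description` field as changed.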
Now, hopefully this is not something that happens very often, but we are trying to find ways to make sure that when you publish something it is irrevocable: you cannot just go back and delete it. But we want to allow researchers to have control over how they can deaccession the data while still having a landing page available, for the sake of persistence. Once published, the data is available and searchable, and people can cite it, so we have decided that you can no longer delete it once it is published. You can only deaccession it, and you need to provide a reason.
To give researchers more flexibility, in 4.0 we are trying to see if we can deaccession just a particular version. Say they had version 1 and everything was fine, but in version 2 they added a file that was inadvertently not something they should have published, which had identifiable data. They can deaccession just that particular version, or they can deaccession the entire dataset.
Here is an example. This area here shows you which version they would like to deaccession, so you can pick the version. In the screenshot up here (sorry, it is really hard to read on the screen, but I will make the slides available later) there are also reasons that you can provide. There is a drop-down list of reasons that you can pick from. Some of them say that this is identifiable data, the research article has been retracted, the dataset has been transferred to another repository, the IRB requests that you remove it, and so on. Then you can provide additional information as to why you are deaccessioning, and if the data has moved to another repository, you can supply the URL.
Once you deaccession, this is a screenshot of what the landing page would look like. As I mentioned before, it is part of Data Citation Principle number 6, which is persistence: there is still somewhere that someone can see that the dataset was cited, but the files are no longer available.
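The deaccessioning behaviour described above (a mandatory reason, per-version or whole-dataset scope, and a landing page that survives without files) can be modelled with a small sketch. All class and field names here are hypothetical and chosen for illustration; this does not reflect Dataverse's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetVersion:
    number: str
    files: List[str]
    deaccession_reason: Optional[str] = None
    forwarding_url: Optional[str] = None  # where the data moved, if anywhere

    @property
    def deaccessioned(self) -> bool:
        return self.deaccession_reason is not None


@dataclass
class Dataset:
    citation: str
    versions: Dict[str, DatasetVersion] = field(default_factory=dict)

    def deaccession(self, reason: str, number: Optional[str] = None,
                    forwarding_url: Optional[str] = None) -> None:
        """Deaccession one version, or every version when number is None."""
        if not reason.strip():
            raise ValueError("a reason is required to deaccession")
        if number is None:
            targets = list(self.versions.values())
        else:
            targets = [self.versions[number]]
        for version in targets:
            version.deaccession_reason = reason
            version.forwarding_url = forwarding_url

    def landing_page(self, number: str) -> dict:
        """The landing page always resolves; only the files disappear."""
        version = self.versions[number]
        page = {"citation": self.citation, "version": version.number}
        if version.deaccessioned:
            page.update(status="deaccessioned",
                        reason=version.deaccession_reason,
                        forwarding_url=version.forwarding_url,
                        files=[])
        else:
            page.update(status="published", files=list(version.files))
        return page
```

Note that nothing is ever removed from `versions`: keeping the record and blanking only the file list is what lets an existing citation continue to resolve to a page that explains what happened.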
Coming after 4.0: we are trying to find ways to publish privacy-sensitive data. This is a project, funded by the National Science Foundation, that we are working on with the Harvard School of Engineering and Applied Sciences, the Berkman Center at Harvard, and the Data Privacy Lab. This project allows researchers to securely and safely deposit their data in an encrypted version of Dataverse, and also, by going through a series of questions, figure out what the data is, so that they can automatically create, through differential privacy, a minimal-risk version that is openly available for anyone to see and for people to reuse. This is especially relevant within the fields of biomedicine and the health sciences, where there is a lot of interest in this particular project. So that will be coming up soon.
Another project that we are working on, and I am really excited to attend the talks later today and tomorrow, is more integration with ORCID. Right now, for 4.0, we are only able to allow authors or researchers to enter their ORCID manually. We would like to be able to allow them to create ORCID accounts from our side and also automatically connect all their Dataverse datasets to their ORCID record.
Thank you very much, and feel free to get in touch with me or look at our Data Science website. I am happy to answer questions.
Thank you very much for your presentation. I was just wondering if there is a business model behind Dataverse.
We have had that question before. For now, at Harvard University, the work that is done to develop the software, the application, is also funded by the University, and so anything that is either work done at Harvard or deposited into the Harvard Dataverse is guaranteed for persistence. Previously, the Institute for Quantitative Social Science managed all the infrastructure, the servers, everything. Then, luckily, the University itself deemed that the work we were doing was important enough that it should be supported at the University level, so Harvard University IT services now maintains all our servers. I think that is a good sign of the trust that we are developing a product that is of worth to the researchers.
[A question from the audience, largely unintelligible, about the conditions for joining and whether the data is held in the US.]
I should stress this more. You could set up your own installation of Dataverse and have all the data preserved on your own servers. You have an amazing infrastructure, so you could support that. However, if you want to work with us and have everything deposited at Harvard, we make no claims on any of the data. We just provide the infrastructure, so researchers maintain control of all their data. I know it seems like, because it is housed at our place, we would control it and tell you what to do with it, but if you want to export everything, we would also help you take care of that. One example is a university in the Netherlands that was moving to its own Dataverse repository. They had all their data housed at Harvard, but now they wanted to move it, so we helped them migrate all the datasets over to their servers.
[A question from the audience, largely unintelligible.]
So with Dataverse we have been working with the Seamless Astronomy group at the Harvard Center for Astrophysics. In that particular example, we have worked with them to support the metadata standards of the Virtual Observatory community. Not only do we have aggregate-level metadata for astronomy datasets, but we have also gone one step beyond with FITS files: we have been able to extract header metadata, such as which observatory, what facility, and what area of the sky was used for that particular file. So we are able to automatically extract that metadata and save the researcher from having to enter it manually, since it is already in the data itself.
Thank you for the presentation. I am interested in the differences between major and minor versions, and in whether you have different preservation actions for them [unclear].
No, no. The only thing that the major and minor change does is that in the data citation it indicates whether it would be version 1.1 or version 2. What we have tried to do is that if anyone has uploaded a file or changed a file, that will automatically change to a major version. Aside from that, it is only to give researchers or curators the granular ability to explain what the changes are. We are still learning. This is a very difficult thing: we were in meetings for a week straight, having a hard time figuring out what constitutes, what should be, a version change.
[A follow-up from the audience, partly unintelligible: this is a big problem for them as well. In terms of preservation, would you create a new preservation package for each iteration of the metadata, that kind of thing?]
[unclear] The same goes for deaccessioning: this applies only once the dataset is published. What we recommend, and we are just trying to follow best practices, is that we have as many warnings as possible: "Are you sure you want to do this?" So we are trying, but it is not perfect. Thank you.
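Returning to the astronomy answer above, the kind of FITS header extraction it describes can be sketched without any external library, because a FITS header is a sequence of fixed-width 80-character ASCII "cards" terminated by an `END` card. This is a simplified illustration, not the code Dataverse uses: the helper names are hypothetical, the sample values are made up, and only simple string, logical, and numeric cards are handled. `TELESCOP` and `INSTRUME` are keywords reserved by the FITS standard; `RA` and `DEC` are common conventions rather than mandatory keywords.

```python
CARD = 80  # every FITS header card is exactly 80 ASCII characters


def make_card(keyword, value, comment=""):
    """Build one header card (used here only to create sample data)."""
    if isinstance(value, bool):
        field = ("T" if value else "F").rjust(20)
    elif isinstance(value, str):
        field = ("'" + value.replace("'", "''").ljust(8) + "'").ljust(20)
    else:
        field = repr(value).rjust(20)
    card = f"{keyword:<8}= {field}"
    if comment:
        card += f" / {comment}"
    return card[:CARD].ljust(CARD)


def parse_value(field):
    """Decode the value part of a card: quoted string, T/F, or number."""
    field = field.strip()
    if field.startswith("'"):
        chars, i = [], 1
        while i < len(field):
            if field[i] == "'":
                if field[i + 1:i + 2] == "'":  # '' is an escaped quote
                    chars.append("'")
                    i += 2
                    continue
                break  # closing quote
            chars.append(field[i])
            i += 1
        return "".join(chars).rstrip()
    value = field.split("/", 1)[0].strip()  # drop the trailing comment
    if value in ("T", "F"):
        return value == "T"
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value


def parse_header(raw):
    """Read keyword/value pairs from raw header bytes up to the END card."""
    header = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, or blank card
        header[keyword] = parse_value(card[10:])
    return header


# Made-up sample header, for illustration only.
sample = "".join([
    make_card("SIMPLE", True, "conforms to FITS standard"),
    make_card("TELESCOP", "Example Observatory", "telescope name"),
    make_card("INSTRUME", "Example Camera"),
    make_card("RA", 150.1, "degrees"),
    make_card("DEC", 2.2, "degrees"),
    "END".ljust(CARD),
]).encode("ascii")

metadata = parse_header(sample)
```

The point of the design is the one made in the talk: the observatory, instrument, and sky position are already in the file, so a repository can read them at upload time instead of asking the researcher to type them in.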


