Eefke Smit, STM Association at the DataCite summer meeting 2012

Video in TIB AV-Portal: Eefke Smit, STM Association at the DataCite summer meeting 2012

Formal Metadata

Data and Publications: and how they belong together
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Thank you very much, Jan. Indeed, I think, on behalf of the publishing community, that they are all very positive about what DataCite is doing, and it's always a pleasure to be here. Entertaining you around lunchtime is a bit of a challenge, and nobody can be even half as entertaining as my fellow speaker there, but I wanted to give you some perspective, from a publisher's view, on how data and publications belong together. There has been such a big explosion in the research community: life for researchers has become quite different, but life for publishers has also become quite different. I want to start with a few examples that Nature was so kind to let me borrow from them. For example, here is a paper that is roughly 60 years old. It is a very famous paper, often quoted, because it was the paper that revealed the structure of DNA. Sixty years ago, such a groundbreaking paper amounted to one page, one figure, no data. Then we
move on to roughly ten years ago, when the human genome was completely unravelled and Nature dedicated a special issue to it, which counted 62 pages, 49 figures and 27 tables. You see one page here that lists all the people who were contributors; the first pages were already taken up by listing all those people. You could also see that they were meeting the boundaries of what was possible on paper, because the issue had pull-outs of everything and could hardly capture that important step in science. We move on another ten years, and that
was when the knowledge of the human genome was celebrated after ten years. They put it on the iPad, and not only because it is very cool to do that; there were also extra reasons,
and the reasons were that more than a thousand genomes were described, the raw data was enormous, and it simply would never have fitted on paper anymore. Also, if you then zoom in on the electronic version of the paper, it still very much looks like a traditional paper, but there are so many extras to it: links that let you jump to related information and figures, figure previews, collapsible sections. The paper has really become very different with so much digital information available. There are other examples
too. For example, a lot of work has been done in Manchester on the so-called Utopia Documents, and they are good examples where you can jump within the paper to interactive media. Within the PDF, if you find a figure, you can actually click through to the data underneath and also find all kinds of cool tools to present those data differently or to turn them around. So that is another example of how data have
become integrated into articles. The Biochemical Journal from Portland Press was the first one to launch Utopia Documents, but I've already seen announcements of more journals and more publishers who are adopting it. Elsevier, the big publisher, has also included all kinds of apps in its digital publications. One example here is the gene and protein viewers that you can use from within the article: while you're reading the article, you can open up the protein viewer, and it actually takes you to the data back in the Protein Data Bank or in GenBank. So the data doesn't have to be with the article; it can be anywhere in the world, in an official data archive, but the viewer in the article will help you to immediately pull up that data, to play with it, to see it in the viewer, etc. So all of this
looks really cool, so you'd say there is a lot of opportunity; what is the problem? Well, actually there is a bit of a problem, and that is the sheer volume of all that data. This is a graph that you've probably seen before. It comes from the Biochemical Journal and indicates in red how, over the years since 1965, the number of journals has grown in percentage (sorry, wrong: the number of publications has grown), and in the other colours you see how different types of data sets have grown. You see that it depends on what kind of data you talk about: some of them started years earlier, some of them started later, but all of them show a steep, steep rise after the turn of the century. That is so much data.

So what does that mean for journals? Some publishers are really struggling with what they call the data problem, and I'm taking two examples, perhaps a bit anecdotal, but I think they're very illustrative. Many journals in the past ten years have started to include data in journal supplements. That was because anything that could not be published in the article itself could be made available online as a supplement, and that became a very popular way. But, for example, the Journal of Neuroscience announced two years ago that they would stop accepting supplements because they were drowning in them; they simply had too many. You see the volume here, expressed in megabytes: the volume of what they published is the red line, and you can see how there was an explosion in the amount of stuff that authors submitted to them in supplements. At a certain point they said: stop, we are not going to do it anymore, simply because there was too much of a burden on the peer review system. They said they couldn't guarantee quality, they couldn't really look into it, so no more, with a few small exceptions like multimedia material and things like that. The journal Cell had a similar problem, but they chose another way: they put limitations on the kind of supplementary material that authors can submit, because
they were feeling that they were
turning more and more into a dumping ground of data, where authors would send whatever they had at the end of their projects to the publisher and expect it to be on the publisher's website forever. I'll come back to that, but I'll jump to a little conclusion that I have: I think the reason why authors send it to publishers is that they have too few places elsewhere. If there were many, many more good data archives, they would send it to them. I'll come to that, but I just wanted to give you a preview of where I want to go. OK, so the general message here is that publishers cannot guarantee the proper handling and curation of data, but of course they want to serve their authors, and that is where the friction occurs.

STM was involved, together with a few more organizations in the audience here, in the EU project PARSE.Insight, and as part of that project a survey was done in 2009 in which we asked researchers where they currently store their data. There were over a thousand respondents, and it gave the following picture: most data stays on the computer at work, but also on portable carriers and, for example, a large percentage on computers at home, whereas digital archives, for example, get very little. I find that a bit worrying, especially if you look at the responses to the follow-up question, which is: where would you be
willing to submit your research data? Suddenly the digital archives, which got such a low percentage on the previous slide, rate very high. Publishers rate lower than that, which puts into perspective what they get nowadays. So, coming back to that little conclusion that I already gave, I think there is a lot of demand in the research community to have much better and many more data archives.

Another EU project in which STM is involved, and on which I want to give you a preview of some of the outcomes, is the project ODE, Opportunities for Data Exchange. CERN and other partners are also in it, and you'll hear presentations from them later. STM was in a work package together with the British Library, LIBER and the Deutsche Nationalbibliothek, in which we were zooming in on how data and publications are being handled nowadays, with the key objective to see what kind of impact and incentives we could think of if there was a combined effort to integrate data sets and publications in more useful ways.

What we came to first of all was a definition: if we talk about data, what kind of data are we talking about? We put them in what we call the data publication pyramid. Its lowest layer is really the raw data as they are generated and produced in a research project. The next layer is formed by the structured, or just described, data collections, like you find in many data archives. Further up are the processed data and data representations, where you make the step from the data to, for example, graphs that show a condensed form of the data. And on top are the data as you find them in publications.

I have structured the image like this because it gives a nice way to show all the different ways in which data and publications are nowadays connected. Starting at the top of the pyramid: of course publications have always had data. It's not new that data and publications should do something together; it is very hard to
find a publication that does not have any data. But usually in a publication the data is very processed and heavily aggregated into perhaps one figure or one little table with the most noteworthy processed outcomes of the data. That is of course very important, but it is the most traditional way in which data has always been presented.

The second way is what you see happening nowadays in supplementary information to journal articles: that is all the other stuff that did not fit into the article but that authors still want to present to readers one way or the other. So the second one is journal article supplements.

If we go to the third one here, that is an increasing practice nowadays, and one that I think has a lot of promise for the future: data that is held in data centers and repositories and is referred to from the articles. Because really there is no need to have the data and the publication in one place; they can really be in different places. And especially from my own publisher's background, I think that data centers are much more professional and have much more expertise in treating data well than publishers do with supplements.

The fourth way was described already earlier this morning: data publications or data articles. That is really a new thing, although it is spreading quite fast now. They also make use of data in repositories, but the articles are really about describing the data, its quality, the way it can be applied by others, et cetera. It is interesting to see: a few years ago I was given a list of all the examples that were the very first journals doing this, but since then I also get all kinds of examples sent to me of journals that were actually doing this already much longer and had data articles as a separate article type within the journal. But it's interesting to see that there is so much more enthusiasm for it. And
the fifth form is of course the most
desolate form, and one that is a bit of a pity: that is all the data in drawers that remains on disks in the institute. That is the big percentage of researchers who say that they keep their data on their own computer or on a hard disk or whatever. So if this sort of structures the way, you know, if this is a typification
of the way things are being done now, then of course the next obvious question is: what is the situation, and how bad is it? Well, the pyramid that is likely to show reality is this: that horrible category is probably far too large. If you look at different studies, then the current estimate is that two-thirds to 70% of data is not shared or is never made available. Data archives are springing up here and there and are taking a lot of new initiatives, but many disciplines are without one, and that takes us back to my message from before: I think a lot of researchers would like to see more, and more trusted, data archives. Then we come to the area of supplements: a lot of data ends up in
journal supplements now, which is not always the best way to make them reusable and accessible. The top of the pyramid is fairly stable: that is the data that always got into publications, and that will remain so.

If this is the pyramid that is likely to show reality, then of course we could also draw the ideal pyramid, and I think it would look like this. You will always have raw data that is perhaps not yet shared or not ready to be shared, so you'll always have that layer underneath, but hopefully it would not be as big as in the previous pyramid. My hope is that data archives will get a lot bigger share and that the supplements will shrink. I think that when we look back at supplements to journals 10 or 15 years from now, we will say: ah, that was a temporary thing, because people put there whatever didn't fit on the paper and we weren't sophisticated enough yet to really integrate data with the articles. But of course, as the examples at the start of my talk show, there's a lot happening here, so I would hope that this layer grows and gets much more sophisticated ways, like Utopia Documents, like viewers to data that are elsewhere in data archives. So this would be my ideal pyramid.

And of course we asked ourselves how publishers can help to make things better. Well, a few obvious things, partly what we heard in the other two talks we had before: persistent identifiers, bi-directional linking, partnering with data archives, et cetera, and better integration of underlying data in the articles.

I think I can also use this platform here to announce that STM and DataCite are actually issuing a statement together right after my talk. It builds on the Brussels Declaration that we had in 2007, where some 50 to 60 publishers said that raw research data should be made freely available as much as we can. The statement that we announce today is, first of all, that we encourage authors of research papers to deposit
their researcher-validated data in trustworthy data archives; second, that we should have bi-directional linking in place between data sets and publications; third, that there should be visibility of these links for people who start at the publication; and likewise for people who
start at the archive: they should have links visible to show related articles. A fifth element of the statement is that we want to work on best practice recommendations for the citation of data sets, and the work that we are doing today, tomorrow and the day after with CODATA is exactly on this. I think this is now very important. Personally I find the way of data citation a bit messy right now; there are a lot of differences
between different disciplines. Also, some data archives give very, very good instructions on how they want authors to cite the data in their repository, but others don't give any instructions. I very much liked the paper that the Digital Curation Centre issued last year about best practice guidelines for data citation, because it did give some guidance on how to do it. And I think, from the perspective of publishers, publishers are eager to do it right, so let's agree with all the communities on how we actually want to do it, especially with the data archives: how do you want it done? And last but not least, DataCite and STM invite other organizations to also join the statement. And that was my talk.