Eefke Smit, STM Association at the DataCite summer meeting 2012

Transcript
Thank you very much, and indeed, I think on behalf of the publishing community, they are all very positive about what DataCite is doing, and it's always a pleasure to be here. I have to entertain you until lunch, and that's a bit of a challenge, and nobody can be even half as entertaining as [unclear], but I wanted to give you some perspective from a publisher's view of how data and publications belong together. There is such a big data explosion in the research community; life for researchers has become quite different, but life for publishers has also become quite different. I want to start with a few examples that Nature was so kind to let me borrow from them. For example, a paper roughly 60 years old: it was a very famous paper, often quoted, because it was the paper that revealed the structure of DNA. Sixty years ago, such a groundbreaking paper amounted to one page, one figure, no data. Then we
move on to roughly ten years ago, and that was when the human genome was completely unravelled. Nature dedicated a special issue to that, which counted 62 pages, 49 figures and 27 tables, and you see one page that lists all the people who were contributors; the first pages were already listing all those people. You could also see that they were meeting the boundaries of what was possible on paper, because the issue had pull-outs and everything and could hardly capture that important step in science. We move on another ten years, and that
was when the knowledge on the human genome was celebrated after ten years, and they put it on a tablet app, not only because it was very cool, and it is very cool to do that, but there were also extra reasons,
and the reasons were that more than a thousand genomes were described, the raw data was enormous [unclear], and it simply would never have fit on paper anymore. Also, if you then zoom in on the electronic version of the paper, it still very much looks like a traditional paper, but there are so many extras to it: links to jump to related information and figures, figure previews, collapsible sections. The paper has really become very different with so much digital information available. There are other examples
too. For example, a lot of work has been done in Manchester on the so-called Utopia Documents, and they are good examples where you can jump within the paper to interactive media, and within the PDF, if you find a figure, you can actually click through to the data underneath and also find all kinds of cool tools to present those data differently, to rotate them, to turn them around. So that is another example of how data have
become integrated into articles. The Biochemical Journal from Portland Press was the first one to launch Utopia Documents, but I've already seen announcements of more journals and more publishers who are adopting it. Elsevier, a big publisher, has also included all kinds of apps in their digital publications, and one example here is the genome and protein viewers that you can use from within the article. So while you're reading the article, you can open up the protein viewer, and it actually takes you to the data back in the Protein Data Bank or in GenBank. So the data doesn't have to be with the article; it can be anywhere in the world in an official data archive, but the viewer in the article will help you to immediately pull up that data to play with, to see it in the viewer, etc. So all of this
looks really cool, so you'd say there is a lot of opportunity and hardly a problem. Well, actually there is a bit of a problem, and that is the sheer volume of all that data. This is a graph that you've probably seen before; it comes from the Biochemical Journal and indicates in red how, over the years since 1965, the number of journals has grown in percentage, sorry, wrong, the number of publications has grown, and you see in the other colours how different types of data sets have grown. You see that it depends on what kind of data you talk about: some of them started earlier, some of them started later, but all of them, after the turn of the century, give a steep, steep rise. That is so much. So what does that mean for journals? Some publishers are really struggling with what they call the data problem, and I'm taking two examples, perhaps a bit anecdotal, but I think they're very illustrative. Many journals in the past ten years have started to include data in journal supplements, and that was because anything that could not be published in the article itself could be made available online, so that became a very popular way. But, for example, the Journal of Neuroscience two years ago announced that they would stop accepting supplements because they were drowning; they simply had too many. You see the volume here, expressed in megabytes: the volume of what it published is the red line, and you can see how there was an explosion in the amount of stuff that authors submitted to them in supplements. At a certain point they said: stop, we won't do it anymore, simply because there was too much of a burden on the peer review system. They said they couldn't guarantee quality, they couldn't really look into it, so no more, with a few small exceptions like multimedia stuff, things like that. The journal Cell had a similar problem, but they chose another way: they put limitations on the kind of supplementary material that authors submit, because
they were feeling that they were
turning more and more into a dumping ground of data, where authors would send whatever they had at the end of their projects to the publisher and expect it to be on their website forever. I'll come back to that, but I'll jump to a little conclusion that I have: I think the reason why authors send it to publishers is because they have too few places to send it elsewhere. If there were many, many more good data archives, they would send it to them. But I'll come to that; I just wanted to give you a preview of what I want to say. OK, so the general message here is that publishers cannot guarantee the proper handling and curation of data, but of course they want to serve the authors, and that is where the friction occurs. STM was involved in a project, together with a few more organizations in this audience, the EU project PARSE.Insight, and as part of that project there was a survey in 2009 where we asked researchers where they currently stored their data. There were more than a thousand respondents, and it gave the following picture: most stays on the computer at work, but also on portable carriers, and, for example, also a large percentage on computers at home, whereas, for example, digital archives get very little. I find that a bit worrying, especially if you look at the responses to the follow-up question, and that is: where would you be
willing to submit your research data? Suddenly the digital archives, which got such a low percentage in the previous slide, rate very high. Publishers rate lower than that, in the perspective of what they get nowadays. So, coming back to that little conclusion that I already gave, I think there's a lot of demand in the research community to have much better and many more data archives. Another EU project in which STM is involved, and on which I want to give you a preview of some of the outcomes, is the project ODE, Opportunities for Data Exchange. CERN and [unclear] are also in it, and you'll hear presentations later from them. STM was in a work package together with the British Library and the Deutsche Nationalbibliothek, in which we were zooming in on how data and publications are being handled nowadays, with the key objective to see what kind of impact and incentives we could think of if there were a combined way to better integrate data sets and publications in more useful ways. What we came up with was, first of all, a definition: if we talk about data, what kind of data are we talking about? We put them in what we call the data publication pyramid, which has as its lowest layer really the raw data as generated and produced in a research project. The next layer is the [unclear] or just described data collections, like you find in many data archives. Further up is really the processed data and data representations, so you make the step from the data to, for example, graphs that show a condensed form of the data. And on top is the data as you find it in publications. I like this pyramid structure because it gives a nice way to show all the different ways in which data and publications are nowadays connected. To start at the top of the pyramid: of course publications have always had data. You know, it's not new that data and publications should do something together. It is very hard to
find a publication that does not have any data. But usually in a publication it is very condensed, very processed and very aggregated, into perhaps one figure or one little table with the most noteworthy processed outcomes of the data. That is of course very important, but this is the most traditional way in which data has always been presented. The second layer is what you see nowadays happening in supplementary information to journal articles: that is all the other stuff that did not fit into the article but that authors still want to present to readers one way or the other. So the second one is journal article supplements. If we go to the third layer here, that is an increasing practice nowadays, and one that I think has a lot of promise for the future: data that is held in data centers and repositories and is referred to from the articles. Because really there is no need to have the data and the publication in one place; the publication and the data can really be in different places. Especially from my own publisher's background, I think that data centers are much more professional and have much more expertise in treating data well than publishers do with supplements. The fourth way was described already early this morning by [unclear]: data publications or data articles. That is really a new thing, although it is spreading quite fast now. They also make use of data in repositories, but the articles are really about describing the data, the quality, the way it can be applied by others, etc. It is interesting to see: a few years ago [unclear] made a list of all the examples that were the very first journals, but since then I also get all kinds of examples sent to me of journals who were actually doing this already much longer and had data articles as a separate article type within a journal. But it's interesting to see that there's so much more enthusiasm for it, and
fifth, there is of course the most
desolate form, and one that is [unclear], and that is all the data that just remains on disks in the institute. That is the big percentage of researchers who say that they keep their data on their own computer or on a hard disk or whatever. So if this sort of structures the way, you know, if this is a typification
of the way things are being done now, then of course the next obvious question is: what is the situation, and how bad is it? Well, the pyramid likely to show reality is this. That horrible category is probably far too large: if you look at different studies, the current estimate is that two-thirds to 70% of data is not shared or is never made available or whatever. Data archives are springing up here and there and taking a lot of new initiatives, but many disciplines are without, and that goes back to my message before: I think a lot of researchers would like to see more and more trustworthy data archives. Then we come to the area with supplements: a lot of data ends up in
journal supplements now, which is not always the best way to make it reusable and accessible. The top of the pyramid is fairly stable: that is the data that always got into publications, and that will remain so. So if this is the pyramid likely to show reality, [unclear] we could also draw the ideal pyramid, and I think it would look like this. You always have raw data that is perhaps not yet shared or not ready to be shared, so you'll always have that layer underneath, but hopefully it would not be as big as in the previous pyramid. My hope is that data archives will get a lot bigger share and that the supplements will shrink. I think that with supplements to journals, when we look back ten or fifteen years from now, we will say that it was a temporary thing, because people thought, you know, it didn't fit on paper, and we were not sophisticated enough yet to really integrate data with the articles. But of course, as the examples at the start of my talk show, there's a lot happening here, so I would hope that this layer actually grows a little and gets much more sophisticated ways, like Utopia Documents, like viewers to data that sit elsewhere in data archives. So this would be my ideal pyramid. And of course then we as publishers have to ask ourselves how publishers can help to make things better. Well, a few obvious things, partly what we heard in the other two talks we had before: persistent identifiers, bi-directional linking, partnering with data archives and data centers, and better integration of underlying data in the articles. I think I can also use this platform here to announce that STM and DataCite are actually issuing a statement together right after my talk. It builds on the Brussels Declaration that we had in 2007, where some 50 to 60 publishers said that raw research data should be made freely available as much as we can. The statement that we announce today is, first of all, that we encourage authors of research papers to deposit
their researcher-validated data in trustworthy data archives; second, that we should have bi-directional linking in place between data sets and the publications; third, that there should be visibility of these links for people who start at the publication, and the same for people who
start at the archive: they should have links visible to show related articles. A fifth element of the statement is that we want to work on best practice recommendations for the citation of data sets, and the work that we are doing today, tomorrow and the day after, also with CODATA, is exactly on this. I think this is now very important. Frankly, I find the way of data citation a bit messy right now; there are a lot of differences
between different disciplines. Also, some data archives have very, very good instructions on how they want authors to cite the data in their repository, but others don't have any instructions. I very much like the paper that the Digital Curation Centre issued last year about best practice guidelines for data citation, because it did give some how-tos. From the perspective of publishers, publishers are eager to do it right, so let's agree with all the communities how we actually want to do it, especially with data archives: how do you want it? And last but not least, STM and DataCite invite other organizations to also endorse the statement [unclear]. And that was my talk. I think that perhaps you have [unclear].

Metadata

Formal Metadata

Title Eefke Smit, STM Association at the DataCite summer meeting 2012
Subtitle Data and Publications: and how they belong together
Title of Series DataCite summer meeting 2012
Part Number 3
Number of Parts 10
Author Smit, Eefke
License CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/10501
Publisher DataCite
Release Date 2012
Language English
Producer DataCite

Content Metadata

Subject Area Information technology