Wagging the Long Tail of Research Data

Automated media analysis

Speech transcript
Thank you very much, and thanks to the organisers for putting on such a great conference and giving us an excuse to come and see the beautiful city of Nancy. The light show last night was very beautiful, and it made me think of where I come from, because we have a similar kind of show there as well. I also want to thank the organisers for the conference because it's really impressive to see how much uptake there has been of the DOI and DataCite. I think it's an idea whose time has come, and it's great to see how much momentum there is now.
I'm going to talk today about the long tail of research data, which is a part of the research environment that is often overlooked in these conversations. I'll first introduce you a little bit to my organisation and why we have an interest in research data. Then I'll talk about some of the characteristics of the long tail, based on information I've been gathering in my role as a chair of the RDA Long Tail of Research Data Interest Group, and give a brief overview of some of the work we have started through that interest group and perhaps some of the work we might be able to do in the future.
So, what is COAR? COAR was launched in 2009, and we really are a global organisation of open access repository initiatives. Our major interest is in bringing together the major open access repository networks that are growing and evolving across the world, ensuring that those networks are interoperable, and ensuring that discussions of the open access environment also include the repository perspective, not just the gold open access perspective. So one of our major issues is interoperability. We are working very hard with members and with global networks such as SHARE, OpenAIRE, and La Referencia in Latin America to ensure interoperability across the repository networks.
To date, most of our focus has been on repositories that collect publications and articles, but the services our members provide are really starting to expand to include collecting research data as well. So there is a big interest in the repository community in understanding more about how to manage research data and what the institutional role is in terms of collecting research data in the future.
As we know, big data is all the rage. We hear all kinds of reports, especially over the last five years, about how big data will transform science. But actually, the majority of datasets created through research don't fall into that area. They are not of that size, and in terms of numbers of datasets, most fall into what we call the long tail of research data. As you can see on the left-hand side of this slide, this is big science, the fourth paradigm: large instruments creating data, using a lot of processing power to be able to analyse it. At the other end are all the smaller, very diverse datasets that fall into the long tail.
So I thought I would give you a somewhat deeper description of what the long tail really is. I've been gathering evidence from articles that report on surveys people are doing to try to describe the kind of content researchers are creating, and here is a kind of general overview. These are not hard-and-fast characteristics; the description is a very grey area. But in the long tail we tend to talk about very heterogeneous, diverse datasets that tend to be smaller. They do not all use the same standards, they are not integrated with each other, and they are curated as standalone datasets by individuals rather than as a collection. Because they are so different, they are generally collected by institutional or general-purpose repositories, or often not deposited anywhere.
So what is the size of the long tail? A survey in 2011 by Science, which was a survey of researchers across disciplines, found that 48 per cent of researchers were creating datasets that were smaller than 1 gigabyte. So that is quite small. It is very heterogeneous as well, of course. Another survey looked at some of the data being deposited into a data repository related to [unclear] papers, and found 40 different file extensions in almost 1,800 files collected in the Dryad repository. You have heard of Dryad, which is a general-purpose repository. An analysis of the kind of data they hold found Excel files, images, video, all kinds of different formats, with many common ones to manage, and the average size of their data packages is actually about 15 megabytes. And the European Commission has said that diversity is likely to remain a dominant feature of research data: diversity of formats, types, vocabularies, and computational requirements, and also of the people who manage the data.
So, a little bit more information about what happens with this data. In the Science study from 2011, 50 per cent of researchers were keeping their data in the lab, only 7 per cent were putting it in community repositories, and 38 per cent were storing it on university servers. I don't think I necessarily need to dwell on this.
So those are the kinds of characteristics we use to describe the long tail, and they come with some challenges in managing the data. I don't think these challenges are any different from managing data in disciplinary repositories, but they might be more acute because of the diversity of the data within the repository. Some of the issues with the long tail are: how do you determine the quality and value of datasets when they are coming from all kinds of different disciplines; which standards to use for metadata, which differ significantly as well; how you ensure that data collected in an institutional or more general repository is discoverable, because it is not really the go-to repository for any specific discipline; what the incentives are for researchers, that is, why they should deposit their data in an institutional repository; and what the business case is for the organisation in terms of managing a data repository, which is not necessarily the same as the case for a disciplinary repository.
So that's why we launched the Long Tail of Research Data Interest Group: because I think there are these somewhat acute challenges. There are also a number of other institutions getting involved in managing research data that traditionally haven't been involved, so there is a lot of learning to be done. I think we are actually one of the most popular interest groups in RDA, which shows you that there is a lot of interest in this area. Our objectives are really to better understand what the characteristics of this long tail are, and to address the challenges I mentioned earlier: how to manage the data appropriately, how to provide incentives to researchers, and how to create the business case. We also want to share practices and develop best practices in the repositories, and we really want to work towards ensuring interoperability, not just across those institutional or general repositories, but interoperability with domain repositories as well.
A few things we've done so far. We were only launched a year ago, so we haven't done a lot, and I've been told I should be doing more. We did do a survey of what kind of discovery metadata is being assigned across some of these repositories, and following on from that we had a discussion about what kind of practices we can implement in repositories to improve the discoverability of data. There are a number of other things the interest group is also interested in doing in the future, which you'll see there. At the next meeting, in Amsterdam, there will be a discussion about which specific areas and which issues to tackle, and how we could get some more pragmatic results.
So I'll just give you a brief summary of what we found in our survey of discovery metadata. Discoverability really is an issue in terms of finding datasets from the perspective of this kind of long-tail repository: not so much the domain ones, but the more general-purpose ones. So we thought we would do a survey to find out what the current practices are for metadata, and whether repository managers thought they were sufficient in terms of discovery.
We had 6 responses, but about half of the respondents bailed out halfway through the survey, so this is obviously not representative, but it gives you an idea of which way the wind is blowing. Most of the responses were from Europe. We asked which descriptive metadata standards are being used. Some repositories were using only one metadata schema, others were just accepting whatever metadata schema was used by the depositing researcher, and some were using a number of them. So there is a lot of diversity there across the repositories. But my feeling is that the trend is towards the DataCite metadata schema, which might become a default metadata schema for many of the repositories that are collecting diverse and multidisciplinary data.
Then we asked: do you think this metadata is sufficient to be able to discover datasets? About 80 per cent said they think so, but they qualified that by saying: yes, if someone is looking and searching in a specific repository, but if someone is searching from outside the repository, then probably not. In the interest of time, and since the slides will be available, I won't go into the rest of the summary of what we found. I think there was a presentation yesterday from one of the domain repositories about where people come from to access the data, and they come from search engines. So we need to improve the discoverability of the data.
As I said, this was followed by a discussion about some strategies for improving discoverability. Assigning DOIs is one key strategy; I'd say about 50 per cent of the repositories that responded to the survey were assigning DOIs. Linking data to publications is another way to bring people in. Another is building landing pages, which we talked about a lot yesterday, and making sure that they are comprehensive and describe the dataset well. One of the things we talked about is attaching data management plans to data: if the trend is that more and more researchers are creating data management plans, why not use that as a discovery mechanism? Others are enabling machine readability, and registering datasets in data registries as well as registering the repository in repository registries.
So that's basically where we are at with the interest group, and I invite all of you, if you are interested, to join us. I think most of the presentations at the conference so far have actually been focused on the long tail of research data, because most presentations were talking not about big data but about smaller datasets. So I do think there is a trend towards institutions recognising the value of this data and setting up repositories to collect it.
One of the things I want to talk about is that institutions really do offer a sustainable environment and a preservation environment. They are not based on funding cycles the way many research projects are, so they offer a sustainable, long-term solution for taking care of data. Institutions already have a lot of expertise among librarians. A lot of our speakers yesterday were from libraries, and you can see that expertise already being used in terms of data management. We are also very good at collaborating, which is important for managing research data. I think we need to continually think about working closely with the creators of the data, because they are the ones who really understand and know it and how it should be described. So we need to make sure that we are talking to the data creators, the researchers. And we have a lot to learn from the disciplinary community, which has already been managing data for a long time. So let's make sure that we learn from the best practices they have already developed and don't reinvent the wheel, and also make sure that we are working together [unclear].
I guess I just wanted to wrap up by saying that we should heed the lessons we learned from the academic publishing world when it comes to data citation. We want to be wary about equating citation with measures of the quality of data, because I think small science may not be cited as often but could still be high-quality and very valuable science.
[Audience question, unintelligible]
One of the challenges of the long tail is that those repositories tend to come in at the end of the research lifecycle. The researcher has finished their research project and is just looking for somewhere to put the research data, and is not necessarily interested in spending a lot of time at the end preparing the kind of detailed, comprehensive descriptions that are needed. That is the challenge, and that's why we talked about data management plans [unclear].
[Audience question, partly unintelligible, about publication repositories beginning to include research data]
What I meant was that the services provided are expanding. I didn't say that it's the same repositories or the same technologies that are being used, because I don't think the open access publication repositories are the ideal place for research data; I don't think they're very good at managing research data. What I meant to say is that institutions that are already involved in collecting research publications are looking at expanding their services, in terms of collecting content created at the institution, to include collecting research data. So I don't think it's appropriate in most cases to collect research data using the technologies that are right now used to manage publications, but I think there is a relationship between the two services. My perspective is the institutional perspective, looking at where they fit, not just with collecting publications but with the research data created at the institution [unclear].

Metadata

Formal metadata

Title Wagging the Long Tail of Research Data
Series title DataCite summer meeting 2014
Number of parts 24
Author Shearer, Kathleen
License CC Attribution 3.0 Germany:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
DOI 10.5446/15289
Publisher DataCite
Release year 2014
Language English
Production year 2014

Content metadata

Subject area Computer science
