Michael Wilson, STFC at the DataCite summer meeting 2012

Speech transcript
I am from the Science and Technology Facilities Council, which is a Research Council: it funds research, but what we mostly do is run big science facilities. Big facilities cost money and produce data, and with that comes an obligation to get the best use out of the data. I will describe what we do, what we do with the data, and the issues that raises. The facilities span the sciences. In particle physics we pay for and manage the UK subscription to CERN, a facility of around a billion euros. We do the same for the European Space Agency, which runs missions of around a billion euros each; we also build the cameras in many of the satellites watching you, which operate at about minus 270 degrees so as to be as sensitive as possible, and we get a lot of data from those. We do similar things in astronomy, with the European Southern Observatory and other big facilities, and the next ones are coming along: the European Extremely Large Telescope, whose funding was agreed this week, and the Square Kilometre Array, which will cover Australia and Southern Africa up to the equator, with another array spread across Chile, each of them a billion-euro-scale facility. At the smaller end we manage UK subscriptions to international sources and run our own national facilities: ISIS, our neutron source, and Diamond, the national X-ray light source. These national facilities tend to come in at about half a billion euros. We can persuade a minister to write a cheque of about that size, so we can either get one national facility or a contribution to an international one, which requires bigger cheques; and to keep him writing them we have to show he is getting value for his money. I am going to focus on this last lot, the smaller-scale science, and talk a little about what these facilities do with data. A neutron source or an X-ray source is, in effect, a big microscope. You are watching me by light coming in through the windows, bouncing off me and going into your eyes; your eyes are the detectors. These facilities have particle accelerators that whirl protons or electrons around and produce X-rays or neutrons. Like the light coming in the window, somebody brings along a little crystal, the beam bounces off the crystal into a detector, and the detector produces data. In your case the data goes into your brain; in our case the data goes from the detectors to computers and into data archives.
Over 20 years we have accumulated about 3 million files, and each file can contain hundreds or thousands of records; at the Diamond Light Source alone we have about 100 million files. In the last 3 years we have also taken in about 11 petabytes, which may sound a lot but is a fraction of what the facilities actually produce. We also have computers that people can use to analyse that data: we host one of the UK's most powerful computers and the UK's most powerful graphics-processor-based computer, which is used to analyse this sort of data; we run a large commodity computing cluster of about 7,000 processing cores, used to analyse the data; and we have a new high-throughput super data cluster, moving data at around a terabyte a second, which is used to analyse earth observation data and to build models of climate change. Altogether we currently have a tape robot system with around a hundred petabytes of storage capacity, and we are doubling our holdings of data every year. So a lot of big money has gone in, it is producing a lot of data, and we have to maximise the value of that data. This is a list of what we think we need to do to get the most out of it. Researchers access the data as it comes off the machines; they want it the next day, but they also want it back 10 years later, when they have forgotten where they put their own copy or what they labelled it as, or when they find some new result they want to combine it with; so we have to preserve it, keep it, and list it. We also need other researchers to be able to validate results: to check the analysis and the conclusions they need the data, because they cannot afford to redo experiments on these big pieces of equipment, and you cannot rerun events in history that were observed, so they need to get hold of the original data. When papers are published we want, exactly as was said in the previous talk, the DOI to reference precisely the data that was used. And we want to promote meta-studies: this equipment is expensive, but it still has a finite sensitivity, and people can combine multiple datasets and often find effects that are not visible in any single dataset.
To do that they have to be able to reference each dataset, again through the DOIs. Another point: the big machines and the big computers are also there to build models. Climate change is the obvious example, but so are questions like how galaxies form and how the universe formed. Those models need their parameters set and they need datasets to test against, and the modellers need access to different views of the data from the people who ran the original experiments: what the equipment was, what the settings were, a whole lot of extra information to help them understand what the data means. Data also gets used for science we do not yet know about. As was mentioned this morning, there is tide data: a continuous series going back some 300 years, collected six times a day. It started off being collected so that people could get ships up rivers without running aground; it is now really important in climate change studies, a use nobody thought about when it was first collected. Again, when people come to data with completely new uses they do not know much about how it was originally collected or what was done to it, so they need a lot more information about the data. A lot of what we do is fundamental science, but a lot of it is also about how we get the next drug to market, or the new material to make faster computers and silicon chips beyond their current limits to growth. Those sorts of activities have an enormous financial payoff, so there are patents, and there are issues of rights management. At the moment the US courts are demanding electronically stored information in patent cases: in a patent case in November last year the judge demanded that all relevant electronically stored information going back 18 years be disclosed, and in a financial case some six years earlier a similar judgement was made; the company said it had disposed of the data and received an enormous fine. So it is not just a matter of having the data and saying "here it is"; if we cannot produce it, the cost is enormous, and we have to keep it on a twenty-year scale. There is also evidence-based policy-making: a lot of the policy coming out of climate change is based on climate models, which are based on datasets spanning decades or centuries, so we have to preserve the data for a long time, we have to be able to identify it, people have to be able to find it, and they have to be able to trace back through the literature where that data has been referred to, even if it was referred to in the Proceedings of the Royal Society in 1797. As I go down this list of uses, the users are less and less involved with the original data collection, the payoff lies further and further in the future from when the data was collected, and the probability of any single dataset producing that payoff gets lower and lower. So you are making an investment with a low probability of a potentially high impact. We have to try to understand that space of data and reuse benefits; we try, but we do not understand the economics well enough yet.
What we do know is that DOIs sit in there, because they are an important way of maintaining references so that data can be reused. There are various models of data reuse and various lifecycles. What is essential in this cycle is that we have scientific publications, we have various sorts of data being preserved in archives, people have to be able to discover that data and have it accessible to them, and we want to promote its reuse. In particular we have a metadata catalogue which gives access to the data from these facilities and also to the proposals people submit. When people want time on a facility they write a proposal, and we keep it; when they are scheduled we tell them, yes, you can have that time on that kind of facility for what you asked for; when they turn up they run the experiment and the data is put into the archive. Note that the data is calibrated: the settings coming off the machine are applied, the data is calibrated and cleansed, and it goes into the archive. The researchers can download it, analyse it and do what they like with it; eventually they have some results which go into a publication; that publication then uses a DOI to cite the original source data, and the publication is linked back through that DOI, so we can follow the chain. No single human being has touched the metadata at any point so far, which is a big contrast with what happens in the social sciences. In order to get the data to the scientists who want to validate the results, we do all of this automatically; as you go further down the list I showed, more and more metadata has to be added by humans, and that is what we need to understand. There are lots of computers involved in this process, which of course includes access to the DOI service; the DataCite system is an integral part of the chain, together with automated metadata collection. The scheduling of a proposal tells us who is doing what and who the funder is; the instrument tells us what the data is and what the settings were; the publication tells us what analysis was done and what the results were; and the DOI is the address that links all of these together.
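To make that chain a little more concrete, here is a minimal sketch in Python of how an automated pipeline of this kind could assemble the linked records and attach a DOI to an investigation when beam time is allocated. The record fields, identifiers and the mint_doi call are hypothetical illustrations under the 10.5072 test prefix, not the actual STFC catalogue schema or the DataCite API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Proposal:
    proposal_id: str            # who is doing what, and who funds it
    investigators: List[str]
    funder: str
    summary: str

@dataclass
class Dataset:
    dataset_id: str
    instrument: str             # which instrument produced it
    settings: Dict[str, float]  # machine settings captured automatically
    files: List[str]

@dataclass
class Investigation:
    investigation_id: str
    proposal: Proposal
    datasets: List[Dataset] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)  # DOIs of papers citing the data
    doi: Optional[str] = None

def mint_doi(inv: Investigation, prefix: str = "10.5072") -> str:
    """Hypothetical stand-in for a call to a DOI registration service
    such as DataCite; it only builds an identifier string locally."""
    return f"{prefix}/facility.{inv.investigation_id}"

def schedule(inv: Investigation) -> Investigation:
    # The DOI is attached when beam time is allocated, so the
    # investigation can be cited even before any data exists.
    inv.doi = mint_doi(inv)
    return inv

# proposal -> scheduled investigation -> dataset from the instrument -> citing paper
prop = Proposal("P-2012-042", ["M. Wilson"], "STFC", "protein crystal structure")
inv = schedule(Investigation("INV-7781", prop))
inv.datasets.append(Dataset("DS-1", "instrument-07", {"wavelength_nm": 0.098}, ["run0001.nxs"]))
inv.publications.append("10.5072/example.paper")  # hypothetical paper DOI citing inv.doi
print(inv.doi)
```

The point of the sketch is only that every link in the chain can be filled in by machines: the proposal supplies the people and funder, the instrument supplies the settings, and the publication is joined on afterwards through the DOI.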
Except that a fair percentage of the people who use the facilities do not actually do what they said in the proposal. They said they were going to look at a piece of gold alloy and they turn up with a bit of glass. Or they bring along a crystal, put it in the big microscope and get structures out; it may have taken them years just to grow that crystal, two microns in size; they may be looking at a human protein as a single large crystal, and they want the final structure. When that is done particularly well somebody gets a prize, and that is one reason we want DOIs: people want to know where the data came from when somebody wins a prize. In other cases people did something other than what they proposed, so when the paper is published about something different, which data do we point the DOI at? As I said, the system produces raw data and calibrated data, and the calibration is automatic; that is all we hold. The researchers then do some analysis, which means they run computer programs, on which computers we do not know, producing various bits of derived data that we do not have, because by then they have gone back to their own laboratories. We encourage them to give it back, but we have the same problem everybody else has: we are waiting for the researcher to give us something, and they do not necessarily see the immediate benefit. We also have the related problem of software: which piece of software did the derived data come out of? Do we have to retain and preserve that piece of software? Do we want to put a DOI on a particular version of a piece of software, so that we can say this is where the data came out of the experiment, this is where it was calibrated, and this is the particular piece of software the user ran on it, identifying the software as well as the data so that somebody could reproduce the result? If we do wish to preserve software, it can be done: we have software from the 1960s that used to run on IBM 360 mainframes, and we now have simulators of the IBM 360 that run on today's laptop PCs, so we can still run the forty-year-old software. That sort of replication is possible, but it must not be forgotten. Then, how do we reference the DOIs, and which data should actually be cited? At a facility we allocate beam time against a proposal, anywhere between 5 minutes and 2 weeks depending on the type of experiment. When we allocate that beam time we can already say: this is where your data is going to be, so the investigation can be cited even before any of the data exists. At what level should the DOI be assigned? We assign the time, we say you have got a day or four days, and we assign the DOI on the basis of that investigation, because that is the unit that is managed. In that time they do an experiment, the experiment produces one or more datasets, and each dataset can be hundreds of files. So we have a metadata structure which relates the facility and the proposal, through the investigation in the middle, to multiple datasets and multiple data files, and on top of that sit access control issues and related material such as proposals and publications. There are other legal issues too. We know who the investigators are because we gave them the time, but people get married and change their names, and they change universities, and we do not manage those individual identifiers, so individual and data persistent identifiers in this structure can break. At the moment we are allocating DOIs at the experiment level, because that is what is being managed. Once we have restructured the namespace of our DOIs we will have the provision to cite individual files or even individual records, for when it is the prize-winning case and it is that specific dataset people care about; we expect to have that by about the end of the year. When the minister asks whether he has wasted his money, we need to be able to show that an eighteen-year-old physics student in Kazakhstan can find the particular piece of data that the billion-euro facility produced, so we have to identify it reliably, and we have a system that will support that. However, when do we publish? About 1 per cent of our beam time is commercial, and we do not publish that at all: the commercial users just collect the data, take it away, and we wipe it off the machines.
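As a rough illustration of the namespace restructuring mentioned above, the sketch below shows one hypothetical way an investigation-level DOI could be extended with suffixes for datasets, files or individual records. The suffix pattern and identifiers are assumptions made for illustration, not the scheme STFC actually adopted.

```python
# Hypothetical suffix scheme: extend an investigation-level DOI so that
# datasets, files, or individual records can be identified once the
# namespace allows it.  Purely illustrative.

INVESTIGATION_DOI = "10.5072/facility.INV-7781"   # assigned when beam time is allocated

def dataset_id(doi: str, dataset: str) -> str:
    return f"{doi}/{dataset}"

def file_id(doi: str, dataset: str, filename: str) -> str:
    return f"{doi}/{dataset}/{filename}"

def record_id(doi: str, dataset: str, filename: str, record: int) -> str:
    return f"{doi}/{dataset}/{filename}#record={record}"

# The prize-winning result can then point at exactly the dataset, file,
# or record concerned, while ordinary citations keep using the coarser
# investigation-level DOI.
print(dataset_id(INVESTIGATION_DOI, "DS-1"))
print(record_id(INVESTIGATION_DOI, "DS-1", "run0001.nxs", 42))
```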
We have different sorts of facilities doing different types of science, and different types of science have different policies about how long the scientist should have exclusive access to their data. Normally there is an embargo period during which the scientist has the data for their own use; we have set that at around 3 years, which is a PhD period, because universities often have PhD regulations under which releasing the data early might put in doubt whether the PhD is original research. What is really important to everybody is that we record who accesses the data, so that later on we can tell that somebody has accessed it. People talk a lot about the value of data, but the access records are a real problem in themselves. There was an example from 2004, quite well known in astronomy: a Spanish research team announced the discovery of a new planet. This was greeted with all the response the press gives to somebody announcing a new planet. It then turned out that a group in California had also found it, and had previously announced at a workshop that they were going to announce a discovery. What had happened was that the Spanish group had got hold of the record of which data the Californians had been looking at; having been told the Californians were thinking of announcing a new planet, they looked at the same data, saw the planet, and announced it first. Simply being able to see the record of who had accessed which data allowed them to jump in and take the first announcement of a major discovery. So scientists are rightly wary of who gets to know which data they have accessed, and yet letting people know who has accessed data is not necessarily bad either; sometimes it is the metadata itself that is sensitive. If you know that a group of cancer researchers at a particular university, funded by a particular drug company, is looking at a particular chemical, that is enough for a rival drug company to work out which types of cancer are being targeted and to suspect there may be a potential new cancer drug in that area. Just giving away the title can give away commercially valuable information, so we have issues about how long we embargo the metadata and how long we embargo access to the logs, as well as the data itself.
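Here is a minimal sketch of the access-recording and embargo idea discussed above, assuming a simple in-memory log. The field names, the fixed three-year embargo and the example identifiers are illustrative assumptions, not the facility's actual policy engine.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

EMBARGO = timedelta(days=3 * 365)   # illustrative three-year embargo

@dataclass
class AccessEvent:
    dataset_doi: str
    user: str
    when: date

access_log: List[AccessEvent] = []

def record_access(dataset_doi: str, user: str, when: date) -> None:
    """Every access is logged, so later on we can tell who looked at what."""
    access_log.append(AccessEvent(dataset_doi, user, when))

def visible_log(dataset_doi: str, today: date) -> List[AccessEvent]:
    """Only expose access records once the embargo on the log has expired,
    so the log itself cannot be used to scoop somebody's discovery."""
    return [e for e in access_log
            if e.dataset_doi == dataset_doi and today - e.when > EMBARGO]

record_access("10.5072/facility.INV-7781", "observer@example.org", date(2012, 6, 1))
print(visible_log("10.5072/facility.INV-7781", date(2012, 7, 1)))   # [] while embargoed
```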
We also want to avoid data misuse in other ways. We hold a lot of satellite earth observation data, and the problem is that earth observation data of certain events, such as the aircraft flown into New York in 2001, is wanted for reasons we would rather not serve, so we have to decide how to handle such cases. We also hold catalogued climate change data behind the major reports, which we want to make available to the public so that they understand the issue, but we can only make it available in a way that does not let the conspiracy theorists out there jump on one part of it and make something up. Equally, we do not want people suddenly claiming that we are making black holes that will destroy the universe; that must not stop us publishing, because if we stopped publishing for that reason science could not be checked, and science has to be verifiable. But there are real political risks here. Lastly, to come back to the point made earlier: a lot of money is being invested in these experiments and in the data collection, we need to justify it and get the best use out of the data, and DOIs are a vital part of that process, as may be the software; we also need to link in the publications, the whole cycle. And we need to determine the return on the investment, and how to calculate it, because different people want it measured in different ways. The European Space Agency has a financial model based on space missions as long-term infrastructure; some Research Councils treat the data as infrastructure; different groups believe in different ways of doing these evaluations, and there are costs involved, as was discussed a little earlier with the evaluation of the UK data archives. Several efforts are under way, with people looking at what the costs and the returns are, using lots of different approaches; we do not quite have the toolbox yet. Another is a European project called ENSURE, which is looking at return on investment for preservation in the commercial area as well. Above all, we can do many things, but you have to ask why you are preserving the data and what your objectives are in evaluating the activity. As I said, the list I put up contains different objectives for different groups with different lifecycles, and those are the three areas we are interested in weighing against each other.

Formal metadata

Title Michael Wilson, STFC at the DataCite summer meeting 2012
Subtitle Meeting a scientific facility provider's duty to maximise the value of data
Series title DataCite summer meeting 2012
Part 6
Number of parts 10
Author Wilson, Michael
License CC Attribution 3.0 Germany:
You may use, modify and copy the work or its content for any legal purpose, and distribute and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
DOI 10.5446/6566
Publisher DataCite
Publication year 2012
Language English
Producer DataCite

Content metadata

Subject area Computer Science
