
Scott Edmonds, Giga Science, BGI Shenzhen at DataCite summer meeting 2012


Speech transcript
And thanks for the invitation to this great meeting. It's great to see so many things happening now in the area of data publishing and data citation, and it's great to be part of this data publication session [unclear]. Thank you. I thought I'd give a data producer's perspective: I'm representing BGI in China and the GigaScience journal we are putting together, so I'll give [unclear] useful examples of some of these cultural issues. Usually I give these kinds of talks to genomics audiences, so I thought it would be useful to give a little background about the data-sharing issues in genomics to put things in perspective. I'll talk a little about my institution, because outside of genomics people probably haven't heard of it, and then I'll give examples of the datasets we've been playing around with over the last year. I'll end with some opinionated feedback, so feel free to take it with a pinch of salt. Everything kind of boils down to the Human Genome Project in the 1990s. It took over a decade and cost 3 billion dollars, and from that you got one reference genome, but it really helped the field of genomics explode.
And you know, from this 3-billion-dollar price tag there has been a precipitous drop in the cost of sequencing [unclear]. People always [unclear] genomics [unclear], but it is very useful to get some perspective on what the issues are. This is the cost compared with improvements in computing power, and similarly in computing storage, and particularly since the middle of the last decade the amount of sequence data has risen rapidly [unclear]. It's very interesting at the moment: there has been talk of next-generation sequencing in the last year or two and how it will democratize sequencing, making it much, much cheaper and much, much quicker, sequencing genomes in days or weeks [unclear]. And in the last month [unclear] got really excited by a surge in new technologies: Oxford Nanopore is promising a sequencer the size of a USB drive that plugs into a laptop computer, sequencing genomes with no reagents, and promising amazing things. People haven't actually seen them yet, but it is potentially a very disruptive technology, so all sorts of cool things are happening at the moment in genomics.

So, tying in with the history of genomics, BGI came out of the Human Genome Project, which started in 1990. BGI was founded in 1999 and sequenced 1% of the human genome on behalf of China, and from that it has grown into the largest genomics organization in the world, with about 4,000 employees. It's a not-for-profit research institute, but it also [unclear], and all of the profits go back into funding research. It was formerly the Beijing Genomics Institute and situated in Beijing, but it has now moved to Shenzhen, a town in southern China on the border with Hong Kong. In the last few months I've moved to the Hong Kong office, but I move back and forth.
If you look at global sequencing capacity, BGI is a big red dot in China; at one point they probably had the equivalent sequencing capacity of all of America.
I think it's probably about a fifth or a sixth of the world's capacity, with [unclear] of the most expensive next-generation sequencers. They churn out data at potentially terabytes [unclear], and if everything's going full speed ahead they can sequence hundreds if not thousands of human genomes [unclear]. To cope with these amounts of data we need lots of supercomputing power, and we have petabytes of storage to deal with it; this is their [unclear] in Hong Kong, and also in Shenzhen. This is the kind of crisis that is interesting in high-energy physics talks, because biology is getting to the scale of the Large Hadron Collider, really producing petabytes and ending up quite caught up with them, and it's this rapid growth [unclear]. At the moment BGI has a number of projects that use this capacity: [unclear] projects, the Genome 10K project for 10,000 vertebrate species, the 5,000 insect species project, cancer genome consortium projects, microbial ecosystems and the like. So these are the kinds of things we sequence, and it's topical being here in Copenhagen, because last month BGI launched its first European headquarters and sequencing center in Copenhagen. They have moved 10 of the HiSeqs there, making it one of the biggest sequencing centers in Europe, to do things like sequencing [unclear] of the Danish population [unclear].

So genomics is often held up as a success story in data sharing, going back to the beginning of the Human Genome Project. The public project was built on a collaborative nematode community that was very used to cooperating and working together, and they put a lot into human genetics. There was a race against a private effort [unclear], and the public consortium managed to keep the data free for the public. The culture of sharing data in genomics has been helped by the fact that, luckily, there has been a limited number of platforms, and by the development of standards, which others will talk about later in the session. And for a very long time there have been very well established repositories in the International Nucleotide Sequence Database Collaboration, three major ones around the world: the EMBL Bank, GenBank and DDBJ. The centers got together in the late eighties to share the community's data, and they have been well funded for a long period of time; this has obviously helped a lot. Right from the very beginning the community came together and made very well established data policies for all publicly funded research, under which data was released very rapidly to the community. People don't always quite keep up with these deadlines, but it has been a kind of success: you release your data to the community, and the community makes a gentleman's agreement not to publish the first genome-wide paper [unclear] until you do. Later the Toronto agreement tried to widen this to other areas of biological research, with mixed success. So this is the kind of culture genomics comes from. But there are technical challenges to overcome with big data.
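A worked sketch, not taken from the talk, of why exponential growth in sequence data is hard to keep up with. Under pure exponential growth, N(t) = N0 · 2^(t/Td), the doubling time estimated from two measurements is Td = Δt · ln 2 / ln(N1/N0). If data doubles every T_data while storage capacity doubles every T_storage, data outgrows storage by a factor of 2^(t/T_data - t/T_storage) after time t. The function names and all numbers below are hypothetical and chosen only for illustration.

```python
import math


def doubling_time(n0: float, n1: float, elapsed: float) -> float:
    """Estimate the doubling time of a quantity that grew from n0 to n1 over `elapsed` time units.

    Assumes pure exponential growth, N(t) = n0 * 2 ** (t / T_d).
    """
    if n0 <= 0 or n1 <= n0 or elapsed <= 0:
        raise ValueError("need 0 < n0 < n1 and elapsed > 0")
    return elapsed * math.log(2) / math.log(n1 / n0)


def shortfall_factor(t: float, t_data: float, t_storage: float) -> float:
    """How many times more data has grown than storage capacity after time t,
    given the doubling time of each (all in the same time units)."""
    return 2 ** (t / t_data - t / t_storage)


# Hypothetical figures only: data doubling every 0.5 years, storage every 1.5 years.
print(shortfall_factor(3.0, 0.5, 1.5))  # 2 ** (6 - 2) = 16.0
```

The point of the second function is that the gap itself grows exponentially whenever the data doubles faster than the storage does, which is why buying more disks alone does not close it.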
The major long green line here is on an exponential scale, and since 2005 it has been incredibly difficult to keep up with the growth of next-generation data. This success has obviously brought major challenges to genomics, the data volume being one of the biggest: [unclear] fund all of this? Last year NCBI were debating whether to keep the Short Read Archive data at all; they got additional funding, but long-term it's very difficult. [unclear] backlogs, finding [unclear] to process all of this data, data transfer issues. And it's been great at this meeting seeing all of the people concentrating on the cultural issues. [unclear] compliance with standards is probably one of the major steps, and with interoperability, it takes people time and effort to produce all of this metadata. Security is becoming a bigger and bigger challenge, and with the democratization of sequencing there's very rapid growth in the user base; a major challenge is dealing with the growing number of people. I think it was covered in a few of the talks today: there need to be incentives and credit for putting this additional effort in. This has been discussed in editorials and at a lot of these meetings [unclear] data citations.

We've been launching this GigaScience journal and database, for data and data analysis, in conjunction with BioMed Central. As publishers, everything will be open access, and we've got [unclear] back in Hong Kong and people working at home as well. The first issue of the journal will be out early next month; it's just in the proofing stage now. Associated with this we have a data processing platform, and datasets in the journal will be hosted in and linked to our database. We have been using this database for data publishing at BGI independently of the journal. As people have been talking about over the last [unclear], the ultimate goal is to see things as a kind of executable package of data and workflows with a description; this is where we would like to go. In the first issue, coming this year, we will have some examples getting close to that. Going to the examples of the data released in our database: there are several things that need to be done, and the data citation culture has a lot of [unclear] to overcome. We tried to see how many of these steps we can work through and deal with now.

As you know, DataCite is a great [unclear]: so many members now, a growing number of DOIs, and people have been using DOIs for databases for quite a long time. There are nice examples going back 6 or 7 years of [unclear] citing; for example, the PDB has been using DOIs for a long time, and even then there are some examples of people citing the PDB. And obviously there's a growing user base. In our database we have about 35 genomes [unclear] and slightly different types of data; one of the [unclear] datasets is 14 terabytes. Very interestingly, a lot of the things we release are pre-publication, before the papers have come out, to see the consequences of this.

The one that people have been mentioning was very fast: E. coli. At BGI, with superfast next-generation sequencers, we sequenced the strain of E. coli that killed about 50 people in Germany, and before even uploading to NCBI, to release this data as quickly as possible, we released it on the internet. It caught fire on Twitter. We gave it a public domain waiver to maximize its use, but we thought it would be really useful to give people a mechanism to cite it if they so desired, so we issued it with a DOI. It was really interesting: people immediately started using this data, swapping it between themselves, and within 24 hours people had produced assemblies and annotations. [unclear] the way this research was done, with people posting their results on blogs, and eventually it ended up in a GitHub repository: about 20 groups around the world, within the rest of the week, putting up really useful analysis. It was a lot of fun releasing subsequent datasets on Twitter and seeing all of this fantastic work, and it got a lot of attention for BGI. But for a research institution, it didn't stop publication in journals. This is the New England Journal of Medicine, a 200-year-old journal that is usually very conservative about things such as pre-publication and announcement of results, but they had no problems with the data being out there before publication, and they even highlighted it in the article about open-source genomic analysis. So that was nice: the journal had no problems with it.

This was exactly a year ago, and it's been nice to follow the use of this data. Within 5 days of the data being released, diagnostic primers were released; subsequently there have been papers on antimicrobial agents developed using the data that was released, and comparison papers, because everybody else [unclear] released their data openly, so there were platform comparisons. One thing [unclear] a paper in Nature [unclear], where a different group was also working on E. coli: because everybody else had released their data under a Creative Commons license, it allowed them [unclear] free use of the data. They bypassed the lawyers and released their data in exactly the same way, saving a couple of days in a crisis like this, and that was a really nice consequence.

So it's great to see, you know, there's been a lot of talk here from publishers about data journals; you can see journals getting very interested in this [unclear]. We've been talking to a number of journals about their policies on pre-publication release of data and have systematically talked to a lot of the publishers, and I think only [unclear] has had problems with that; most of the other journals have been fine. We've already had presentations today [unclear] data journals launching, and there's more interest in this. We've had a few problems getting DOIs into the references of papers, but no problems with journals linking DOIs: Nature journals and Science do that with some of our [unclear]. Our first success in getting people to cite the data was with the sorghum data.
We submitted six data types in this paper: we submitted them to NCBI databases and at the same time issued a DOI. [unclear] complemented the process quite well: people could immediately access the data via the DOI, and the NCBI databases worked really well, but, for example, the SNP database: nine months on, I think it's still not public, because it takes such a long time to build new database builds and the like. So the DOI complemented the data release well. We worked very closely with the editors and production people at Genome Biology, and in the references they cited the DOI following the DCC and DataCite guidelines for exactly how we should cite the data. BMC now uses this as the example in their instructions for authors about how data should be cited. After that we got a paper in Nature Biotechnology, a very interesting paper about RNA editing, which is quite a controversial area in molecular biology.
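DataCite's citation guidance builds a dataset citation from a handful of metadata fields. A minimal sketch, assuming the commonly recommended pattern "Creator (PublicationYear): Title. Version. Publisher. Identifier" with the version optional, could look like the following. `DatasetRecord` and `format_citation` are hypothetical names for illustration, and rendering the DOI as a resolver URL is a choice, not a requirement. The example record reuses the metadata of this recording.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DatasetRecord:
    """Hypothetical container for the metadata fields a citation needs."""
    creators: Sequence[str]  # "Family, Given" order, as in DataCite creator names
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Render: Creator (PublicationYear): Title. [Version.] Publisher. Identifier"""
    parts = [f"{'; '.join(rec.creators)} ({rec.year}): {rec.title}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    # Render the DOI as a resolvable link (an assumption; "doi:" prefixes are also seen).
    parts.append(f"https://doi.org/{rec.doi}")
    return " ".join(parts)


talk = DatasetRecord(
    creators=["Edmonds, Scott"],
    year=2012,
    title="Scott Edmonds, Giga Science, BGI Shenzhen at DataCite summer meeting 2012",
    publisher="DataCite",
    doi="10.5446/6565",
)
print(format_citation(talk))
```

Keeping the metadata in a structured record and generating the string from it means every journal reference is formatted the same way, which is what makes the DOI easy for both readers and indexers to pick out.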
And it's useful having all of the data supporting it. We made it available [unclear], and it's in the references of that paper. It's nice seeing journals supporting this and not having any problems, and subsequently, today, more and more journals are starting to cite DOIs correctly, and even [unclear] have started taking dataset DOIs [unclear]. So this is one of the cultural hurdles that seems to have been bypassed now, and with these last couple of steps it seems people are ready for these things.
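Whether journals cite dataset DOIs correctly matters because any downstream tracking depends on finding those DOIs in reference lists. A minimal sketch of that step, assuming reference strings are available as plain text: it uses a regular expression modeled on the pattern Crossref suggests for modern DOIs, strips trailing punctuation, lower-cases the result (DOI names are case-insensitive), and counts citing references per DOI under a given prefix. The function names and sample references are made up for illustration; the pattern will miss some legacy DOIs, and stripping a trailing ")" can truncate the rare DOI that ends in one.

```python
import re
from collections import Counter
from typing import Iterable, List

# Modeled on the pattern Crossref suggests for modern DOIs; not exhaustive.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text: str) -> List[str]:
    """Return the DOIs found in a free-text reference, lower-cased for comparison."""
    # Sentence punctuation after a DOI is swallowed by the character class, so trim it.
    return [m.rstrip(".,;:)").lower() for m in DOI_RE.findall(text)]


def count_citations(references: Iterable[str], prefix: str) -> Counter:
    """Count how many references cite each DOI registered under `prefix` (e.g. "10.5446")."""
    counts: Counter = Counter()
    wanted = prefix.lower().rstrip("/") + "/"
    for ref in references:
        # A reference that mentions the same DOI twice still counts once.
        for doi in set(extract_dois(ref)):
            if doi.startswith(wanted):
                counts[doi] += 1
    return counts


refs = [
    "Edmonds S (2012): talk recording. DataCite. doi:10.5446/6565.",
    "See https://doi.org/10.5446/6565 and 10.1000/XYZ123 for details",
    "A reference with no identifier",
]
print(count_citations(refs, "10.5446"))  # Counter({'10.5446/6565': 2})
```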
The next steps seem to be more of a challenge. From our perspective, the DataCite metadata functions are very useful in making data discoverable [unclear]. But when we were talking to Google Scholar about DataCite DOIs and asked them if they could include them, they said they don't take datasets; it's for scholarly [unclear]. Even [unclear] set up to take these DOIs [unclear] actually wanted to remove them, which is a bit of a shame. It's just a kind of policy that should be easy to overcome, and this is something that I think needs addressing. We're talking to some citation indices, and it sounds like it's almost there with [unclear] other citation indices, and Microsoft [unclear] and Total Impact are on their way. This is the next thing that would be really useful to address. Tracking would basically make the metrics much, much easier: the altmetrics people [unclear] try to track the use of DataCite DOIs, and it's very difficult if not impossible, other than sticking things into Google [unclear], to track downstream use. [unclear] last month [unclear] saying that if you release data in this way you make it citable, but we really need [unclear]: everybody's making promises, but nobody's actually indexing them at the moment, so it is a bit of a shame. I think these final two steps are, from my view, the next two things that need to be addressed now to overcome these last couple of cultural hurdles. So I think this is sort of where we are in 2012.

One minor setback from playing with these DataCite DOIs over the last year: when we tried to export the citations from the DataCite metadata search into Mendeley, it still didn't work for some of the formats. I don't know if that's Mendeley's fault or the way it was set up, but that was one minor problem we had. Every time we've had issues with versioning and granularity and things like that, contacting the DataCite helpers has been very useful. It would be nice, as the user base grows, to have much clearer guidelines about versioning and granularity. [unclear] citing in paper-shaped objects; the nanopublication people talk about giving identifiers to individual facts and assertions in the literature [unclear]. I think things will eventually have to settle somewhere in between, where people cite data, and it will be interesting to hear people's views on this and see how it evolves. So if people are interested in all of this, we do have a [unclear] in the BMC data standards and sharing series and a research note on [unclear], and the GigaScience journal will be launching probably on 12 or 13 July with all of the articles.
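The export problem described above comes down to handing citation metadata to a reference manager in a format it can import. As a hedged sketch, a dataset record could be serialized as a BibTeX `@misc` entry. The function name, the citation-key convention and the field selection are assumptions; `doi` and `url` are not fields of the classic BibTeX styles, and how a particular reference manager maps them will vary. The title gets an extra pair of braces so BibTeX styles keep its capitalization, and no escaping of special characters such as `&` or `%` is attempted.

```python
from typing import Sequence


def to_bibtex(creators: Sequence[str], year: int, title: str,
              publisher: str, doi: str) -> str:
    """Serialize dataset metadata as a BibTeX @misc entry (hypothetical helper)."""
    # Citation key: first creator's family name (text before the comma) plus the year.
    family = creators[0].split(",")[0].strip().lower().replace(" ", "")
    key = f"{family}{year}"
    fields = {
        "author": " and ".join(creators),  # BibTeX separates authors with "and"
        "title": "{" + title + "}",        # extra braces preserve capitalization
        "publisher": publisher,
        "year": str(year),
        "doi": doi,
        "url": f"https://doi.org/{doi}",
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


print(to_bibtex(["Edmonds, Scott"], 2012,
                "Scott Edmonds, Giga Science, BGI Shenzhen at DataCite summer meeting 2012",
                "DataCite", "10.5446/6565"))
```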
And we have a nice example, relevant to the discussion about supplementary files. We have a research paper where we're [unclear] all of the data and tools with the genomics pipeline [unclear], and we will be issuing a dataset DOI linking to it, for an approximately 84 [unclear] supplemental file. Working with Susanna, we're making it [unclear]-compliant, and with our data platform we can integrate about 80 per cent of all of the methods into this workflow system. We haven't managed to get it to be a completely executable paper yet, but that will come later in the year, so watch out for all of this. [unclear] We're also hosting a session on reproducible research at the end of the year [unclear], if people are interested as well. So with that I'd just like to thank [unclear] BGI, which supports this endeavor, and we have collaborators at a couple of the Hong Kong universities. It's been great working with DataCite and [unclear]. Happy to take any questions.

Metadata

Formal metadata

Title: Scott Edmonds, Giga Science, BGI Shenzhen at DataCite summer meeting 2012
Subtitle: Adventures in Data Citation: deadly E. coli outbreaks, Sorghum and RNA-editomes provide examples for the future
Series title: DataCite summer meeting 2012
Part: 7
Number of parts: 10
Author: Edmonds, Scott
DOI: 10.5446/6565
Publisher: DataCite
Year of publication: 2012
Language: English
Producer: DataCite

Content metadata

Subject area: Computer science
