Invited Talk on "The de.NBI network – a Bioinformatics Infrastructure in Germany for Handling Big Data in Life Sciences."
Formal Metadata
Number of Parts | 12
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/38892 (DOI)
Transcript: English (auto-generated)
00:00
de.NBI is the German Network for Bioinformatics Infrastructure, for handling big data in the life sciences. Just a word about myself: at the moment I hold a senior research professorship at Bielefeld University.
00:25
It is in the Center for Biotechnology, and this center has two platforms. One platform is omics technologies and the other platform is bioinformatics. This, of course, is the connection to your conference here.
00:41
What I should mention is what infrastructure means. Infrastructure means that all the tools and workflows which are available in bioinformatics are collected and made available for users. That is the most important point. If you have users from experimental fields, they do not know these tools very well.
01:04
So we also have education, and we have training courses where these tools are introduced and you can learn what is going on in such a field. Of course it is also necessary to have a compute infrastructure to help users compute their projects.
01:25
We have also started to introduce and establish such a compute infrastructure. I will give you all the general information you should know about de.NBI.
01:41
What I can already mention here is the following: it is very clear that de.NBI has a good connection to this conference. Data integration in the life sciences is something we also have in mind.
02:02
The de.NBI network is a dynamic and distributed bioinformatics infrastructure. I will tell you how this infrastructure was developed. What distributed means is very clear: we have the best groups distributed over Germany.
02:22
They are members of this network and they collaborate on the tasks of de.NBI. You see here how the eight service centers are located in Germany. We can go through this map here.
02:44
First we have BioData in Bremen, for databases. We have plant bioinformatics. In Heidelberg we have human bioinformatics. Tübingen is well known for integrative bioinformatics.
03:00
RBC is focused on RNA bioinformatics. Heidelberg, as I already mentioned, also has data management here, in addition to human bioinformatics. BioInfra.Prot in Bochum, that is proteomics. And last but not least, in Bielefeld we have microbial bioinformatics.
03:23
So you see that these eight service centers cover a very broad spectrum in this field very nicely. Okay, that is the landscape, and it is very clear that it is a distributed infrastructure. And what does dynamic mean? Dynamic means it is not a closed shop.
03:41
So if groups come up with interesting new tools and training possibilities, they can join the de.NBI network. There are possibilities in this direction. What is the outline of my talk? I will give information on how such a network, how de.NBI, was established,
04:02
how it is organized and how it is governed. That is the first part. The second part, and this is one of the most important tasks of de.NBI, is service and training. The third part is that we are looking for collaboration on the European level. And while we have established such a bioinformatics infrastructure in Germany,
04:23
there is also a European development, and this European development is coordinated by ELIXIR. So I will give you some information on ELIXIR and how we could be integrated, or are integrated, into ELIXIR. As I already mentioned, the compute infrastructure is an important part.
04:43
And I will give you some information on the de.NBI Cloud. That is a compute infrastructure which is available for de.NBI. Last but not least, and we are just starting this, we also look to industry and would like to have an industrial branch of de.NBI.
05:00
That is under development. Then I will give you some information on the information material about the de.NBI network. And finally, I will have a summary of achievements and visions. And I have enough time to show you all these details.
05:20
Okay. I start with a slide where first the mission statement of de.NBI is repeated; as I already told you, service and training were the main tasks when we started this work. And as de.NBI coordinator, I also have the task of looking for European collaborations.
05:45
Now, that was the mission statement of de.NBI when we started. And I would like to show you the timeline of the de.NBI project. The timeline is here in the lower part. And what is of importance: it really starts,
06:01
everything was started in the year 2012. In 2012 we had two papers, and I will explain what the goal of these papers was. In 2013, finally, the BMBF announced a project.
06:27
It is called the de.NBI project. And this project was set up so that there was an opportunity to apply for BMBF funding to come into de.NBI. And by an international consortium of reviewers,
06:45
22 projects were selected. And the whole year 2014 was used for the so-called design phase. The design phase meant that these 22 projects, which were organized in the eight service centers I already explained,
07:04
discussed how a network should be organized and how it should be governed. This was important. At the end of 2014, this reviewer panel again looked at the proposal, and afterwards they allowed us to start the de.NBI project in March 2015.
07:28
And you see that this whole project was funded for five years. There was establishment phase one and establishment phase two. The reason for the two phases was a review at the end of establishment phase one.
07:41
There was a mid-term review, and the reviewers again looked at the success of the de.NBI network, and afterwards we got the possibility to continue in establishment phase two. Of importance, of course, as I already showed you:
08:02
service and tools, training and education were the first tasks. But in addition, during establishment phase one, we got the opportunity to integrate into ELIXIR. We extended the scientific topics of our projects
08:23
and we integrated so-called partner projects to close scientific gaps, and finally we also started the de.NBI Cloud. Now, in establishment phase two, we are continuing, of course, with the original tasks, but we really have to develop the ELIXIR cooperation
08:43
and we have to develop the de.NBI Cloud further. I will give you some information on this. And finally, the whole project will end in March 2020, and the purpose of the whole project was, of course, to establish a bioinformatics infrastructure.
09:01
And this is not a research project where you get results and that's it. If you have an infrastructure, you need support to continue. That's the most important point. So we have had five years to establish everything, and then of course we need further support to continue. And there is a problem in Germany, since in Germany there is no scheme
09:22
for how such distributed networks can be further financed. So here is a big question mark, and as coordinator I am asked to continue in this direction.
09:40
And it's very, very difficult, but in the meantime we at least have a plan, and this plan is supported by the BMBF: we will have an extension phase for a further two years, and then there are plans to perhaps integrate into the Leibniz organization. That is something we have in mind. I hope this will be successful. We do not know.
10:01
Okay, so much for what you should know. Now I would like to go through the individual points of my talk, and as I already announced, in the year 2012 we had two papers. One paper was by the German Bioeconomy Council,
10:24
and at that time I was a member of this council, and this was the reason why I was involved in writing this paper. The question was how to establish a bioinformatics infrastructure. This was not clear at that time. Of course you can imagine
10:43
that you have a new institute for infrastructure, and this is what is written in this paper. Or you can do it in the direction of a network, combining the best groups here in Germany in such a network, and this was also written in this paper.
11:01
This paper was presented to the BMBF. The BMBF was convinced and, as usual, the BMBF set up its own paper, and this was Bioinformatic 2012. And there the whole de.NBI project was already written down.
11:22
This was the start, and we went in the direction of a network. Now, this was the starting point. The next question is, of course, how such a network is organized.
11:41
How will it function? As I already told you, we have eight service centers, and these eight service centers comprise 22 projects. These were selected by a review team, so it is a good point to have an international selection. And now it was necessary to create something like a central coordination unit.
12:05
And in this central coordination unit there are delegates, always one delegate per service center. So we have eight service centers, and in addition the de.NBI coordinator was also invited to be in the central coordination unit, and he also has a vote.
12:25
And that is the most important point, and it is written down: the central coordination unit is the main decision-making body of the de.NBI initiative, and it is responsible for the effective operation of the research infrastructure.
12:43
So everything that is decided, everything that is going on in the de.NBI network, is really decided here in the central coordination unit. The central coordination unit meets every three months, and you can imagine that many questions should be discussed on a broader scale.
13:01
And for this reason the CCU installed so-called special interest groups. In the special interest groups, the so-called SIGs, there are the experts who discuss and focus on very specific questions; these are prepared there, and afterwards possible decisions
13:23
are delivered to the CCU, where the decisions are taken. So that is very important, and I will show you what the topics of these SIGs are. Now, you see here that all of this only works if you have an administration office,
13:41
that is very clear, which works in collaboration with the de.NBI coordinator, and in this administration office we have six positions at the moment. The next point, very, very important: there is an international scientific advisory board, and we were really lucky that we have six persons,
14:04
scientists, bioinformaticians from all over Europe, and they tell us in which direction we should go, what the problems are, what the nice developments are, and so on. In two weeks we will have another SAB meeting in Berlin, and we have such a meeting every year.
14:21
Now, so much for the organization. And if you are interested in who is a member of the CCU at the moment: more or less what you see here is that the heads of the service centers are then, of course, members of the CCU,
14:43
and you can go through this picture here; these are nine seats. We have Rolf Backofen from Freiburg, perhaps you know these people a little bit, who stands for RNA bioinformatics, and Peer Bork, very well known, EMBL Heidelberg,
15:02
who stands for human bioinformatics. Martin Eisenacher from Bochum stands for proteomics. Frank Oliver Glöckner from Bremen is responsible for the databases. Then we have Jens Stoye here; he is from Bielefeld and stands for microbial bioinformatics.
15:22
The next one is Uwe Scholz from Gatersleben, that is the person for plant bioinformatics. Then Wolfgang Müller from HITS in Heidelberg, who is focused on data management. And last but not least, someone you know very well,
15:41
that is Oliver Kohlbacher from Tübingen; he is responsible for integrative bioinformatics. So you see there is really a broad spectrum of people sitting here in the CCU; they meet, as I mentioned, every three months, and they decide what the most important points for the de.NBI network are and where to go.
16:03
Well, on the right side you have seven special interest groups. The first one is on communication and outreach. It is very clear that the de.NBI network should be well known in the scene, and that means that we really need communication,
16:20
and communication of course means you need a web page, you need information material, and you should have good outreach; that is handled by SIG 1. SIG 2 is service and service monitoring: it is not only service, it is also necessary to have monitoring to see what is important
16:43
among the services which are carried out. SIG 3 is training and education, a big point in de.NBI; I will give you some more information on it. SIG 4 is infrastructure and data management;
17:01
in the meantime it is mostly data management, since infrastructure means compute infrastructure, and for that we have SIG 6, a newly established special interest group on the de.NBI Cloud. SIG 5 is de.NBI development; as I already mentioned, it is really important to look at where de.NBI is going in the future,
17:22
and this is handled by SIG 5: in SIG 5 possibilities are developed, and afterwards they are discussed in the CCU. And last but not least, SIG 7, also newly established, is ELIXIR cooperation, and this is in the meantime
17:43
really a big point which has to be continued. Okay, here we have a very busy slide, and what you see here in detail are the topics in the different service centers. I do not want to go through all these different subtitles.
18:06
I have already mentioned what is done in these service centers, but there is one more thing: there is a classification of these service centers. What you see here is that there are three service centers
18:20
that are organism oriented. We have service centers for human bioinformatics, plant bioinformatics, and microbial bioinformatics, and these three service centers are located in Heidelberg, Gatersleben and Bielefeld. And I already showed you the heads of these service centers:
18:41
for human it was Peer Bork, for plant it was Uwe Scholz, and for microbes it was Jens Stoye. So three centers cover the different organisms. There are two centers which are cell component oriented. It is very clear that, at the moment, RNA is very important.
19:02
You know, of course, that a lot of activity is going on in the human sector, for human cells: you look at which RNAs are expressed under which conditions. So there is a whole RNA story, and that is RBC in Freiburg,
19:21
and proteomics, that is BioInfra.Prot in Bochum. Then there are three centers left, and they are methods oriented. The databases: I already mentioned that in Bremen we have three or four important databases; I will come to this. The data management node, that is de.NBI-SysBio in Heidelberg,
19:43
and workflows or integrative bioinformatics, that is CIBI in Tübingen. Well, if you go to the material which is distributed, then you can see what is of importance,
20:01
what you would like to know, and you will get more information on it. Okay, this brings me to the service and training tasks of de.NBI. Here I have to tell you what is behind the services which are available.
20:20
Of course, we have services and tools. We have single tools; mostly we have workflows. What is also necessary before you can develop workflows: you need software libraries. Another possibility is web services.
20:40
And we have the databases which I already announced, and we have cloud computing. And last but not least, there is also consulting available. So an experimental scientist who does not know very much about bioinformatics can go to de.NBI and will get all the information which is necessary.
21:03
On the right side here you see the tools cloud or service cloud of de.NBI; I will give you some more information on it. So what are the approaches for help and support offered by the de.NBI network? As already mentioned, we have consulting
21:22
and we have support. Consulting means that before an experimental scientist starts an experiment, it makes a lot of sense to talk to a consultant, to know what to go for and what the data should look like
21:40
that are afterwards used by de.NBI services. So that is the most important point, the consulting step. Afterwards the experimental scientist goes for the production of big data, for instance by sequencing. And if he has the data, and the data are in the right format,
22:01
then they can be used by de.NBI. And here you have, of course, the de.NBI services, the tools which are necessary to analyze this data, and of course also the compute infrastructure, the de.NBI Cloud, to do it in one step. If an experimental scientist does not know very much about this point,
22:22
he can get support from de.NBI people, so that everything goes in the right direction to analyze the data. Afterwards the experimental scientist will get, of course, the analysis report,
22:41
sometimes of course also a visualization of the results, and then it is his duty to interpret the data and to write the publications, and if everything went the right way, perhaps he will have a good publication, Science or Nature, whatever it is. Okay, that is what is going on.
23:02
And if you now ask which services are available, this is just an example; I hope I can find everything I would like to show you. For instance, we have OpenMS here. These are workflows. We have EDGAR here in the middle,
23:22
and we have OTP here in the right corner. I will go into these three examples in more detail. But what is also of importance: you have tools, you have workflows, and you need workflow engines.
23:41
Workflow engines use these different tools and create something like a workflow. As workflow engines we have Galaxy, and Galaxy is the most important one, but we also have KNIME, very well known in the industrial landscape.
24:07
And last but not least, I would also like to go to SILVA. SILVA is here, and BRENDA is here. These are two databases. PANGAEA is also a database which is incorporated into de.NBI.
24:24
Good. You see, it would be possible now to talk about all of these different tools, mostly workflows, but I would like to continue and just show you one example, and that is OTP. OTP is a comprehensive framework
24:41
for NGS project organization and processing. OTP stands for One Touch Pipeline. That is exactly what a user would like to have: if everything is done the right way and the data are in a state in which they can be used by such a workflow, then it's very nice. And you will find here
25:01
what OTP is important for: it is the alignment of reads to a reference genome and variant calling. It obviously comes from medical, from human bioinformatics, and it is very well known in the medical field,
25:20
and of course, after sequencing, you have reads; you can map the reads onto a reference genome and look at which bases show a variation at very specific positions, and then you can go on and check whether these variants are of importance or not.
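The idea of mapping reads and then looking for positions that differ from the reference can be illustrated with a toy example. This is a minimal sketch, not how OTP works internally: it assumes the reads are already aligned without gaps, and the function name `call_variants`, the thresholds `min_depth` and `min_fraction`, and the example sequences are all hypothetical. Real pipelines rely on dedicated aligners and variant callers.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2, min_fraction=0.8):
    """Naive variant calling on gap-free alignments (illustration only).

    reference:     reference sequence as a string
    aligned_reads: list of (start_position, read_sequence) tuples
    min_depth:     hypothetical minimum number of reads covering a position
    min_fraction:  hypothetical minimum share of reads supporting the variant
    """
    # Count the bases observed at every reference position (a simple pileup).
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything about this position
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants


# Made-up data: three reads that all carry an A where the reference has a T.
reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (2, "GAACGT"), (1, "CGAACG")]
print(call_variants(reference, reads))
```

Each reported tuple holds the position, the reference base, the observed base and the read depth. Whether such a variant matters clinically is a separate interpretation step.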
25:41
For instance, whether a special treatment is necessary. Okay, it is offered by the Heidelberg Center for Human Bioinformatics, and I can tell you that it's a tool which is installed at the DKFZ. So we have all the big possibilities
26:02
here to work in such a direction. OpenMS, the OpenMS workflow, is a library for LC-MS data management and analysis. Mass spectrometry is a big tool for proteomics,
26:21
well known. If you do gel-free proteomics, it is used, and it is very important to have such a workflow working. OpenMS is located at the University of Tübingen, at the Center for Integrative Bioinformatics,
26:40
CIBI. I would also like to mention EDGAR. EDGAR is a software platform for comparative genomics, very well known in Germany, Europe and worldwide. So a lot of people from all over the world are working
27:00
with this software platform. And what is behind it? The following: you have so many sequenced genomes now that you can compare the genomes with each other. If organisms are closely related, their genomes should be similar, and you can look at which genes are shared and what is different.
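One common way to make such gene-by-gene comparisons quantitative is the BLAST score ratio: the bit score of a gene against its best hit in another genome, divided by the bit score of the gene against itself. The sketch below only illustrates that ratio; the function names, the fixed threshold of 0.3 and the scores are hypothetical and are not taken from the EDGAR software.

```python
def blast_score_ratio(hit_score, self_score):
    """Normalize a BLAST bit score by the score of the query against itself."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def shared_genes(self_scores, hit_scores, threshold=0.3):
    """Return genes of genome A whose best hit in genome B passes the threshold.

    self_scores: gene -> bit score of the gene aligned against itself
    hit_scores:  gene -> bit score of the best hit in the other genome
                 (genes without any hit are simply missing)
    threshold:   hypothetical cutoff, chosen here only for illustration
    """
    shared = {}
    for gene, self_score in self_scores.items():
        ratio = blast_score_ratio(hit_scores.get(gene, 0.0), self_score)
        if ratio >= threshold:
            shared[gene] = round(ratio, 2)
    return shared


# Made-up scores: geneA is well conserved, geneB only weakly, geneC has no hit.
self_scores = {"geneA": 500.0, "geneB": 320.0, "geneC": 410.0}
hit_scores = {"geneA": 480.0, "geneB": 64.0}
shared = shared_genes(self_scores, hit_scores)
print(shared)
```

Because the ratio is normalized by the self-hit, long and short genes can be compared with the same cutoff, which is the main reason for using a ratio instead of the raw score.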
27:21
That is really a great opportunity to get information. EDGAR is an abbreviation for Efficient Database framework for comparative Genome Analyses using BLAST score Ratios. Good. Yeah, this is
27:40
a very short presentation here on the tools and the workflows which are available. The second task of de.NBI is the training task. And the training is presented here. de.NBI offers tailor-made training courses, webinars and online training courses
28:00
for omics tools and workflows which enable researchers to transform the raw data into actual results. Very clear. Users should learn what is behind the different tools. And at the very beginning, this was in the year 2015,
28:20
we started with 17 training courses in these eight training service centers and we reached more than 300 participants. Last year, 2017, we already offered nearly 70 training courses to the scientific community and we reached around 1,500 participants.
28:44
It's very clear that you cannot keep growing like this, since for several reasons we are really at a saturation level. And if you look at 2018, we will have nearly the same, and judging by what has been announced in the meantime, perhaps we will have a few more
29:00
training courses, and we will again reach about the same number of people in these training courses. What is of importance: since we entered Elixir, these training courses are also offered on a European level, so in the future we can aim for more participants
29:21
from Europe and in this direction. On the right side, there are some recent training courses which were carried out. For instance, from Bielefeld, the Polyomics Data Integration and Analysis Workshop
29:41
or training course. In Bochum, a differential analysis of quantitative proteomics data using R. Bio-image analysis, this was carried out in Heidelberg. RNA sequencing data, I already mentioned this, a big direction for the future,
30:00
in combination with Galaxy. So the Galaxy workflow engine uses all these workflows, and via Galaxy they are afterwards computed on the de.NBI Cloud. And just to look to the future, in Gatersleben there will be a spring school, a computational biology starter.
30:24
So computational topics will be handled on a very, very basic level. These are the training courses. We have training and education. And if you go to education, of course, this means looking at summer schools.
30:44
And we have one or two summer schools per year. We started in 2015 with the microbial bioinformatics summer school in Giessen. Then, for instance, in 2017, we had cloud computing, also in Giessen. Then we have computational
31:00
genomics and RNA biology, this was carried out in Berlin. We have computational metabolomics in Wittenberg. We have Riding the Data Life Cycle, that is data management, which was carried out near here, in Braunschweig. And these summer schools will be continued. There will be a summer school next year on data science.
31:24
Good. I come to the next section, and this is the collaboration with Elixir, as I already told you. It was a big task for the coordinator to look for European collaboration. And when I started this work,
31:40
I was told by the BMBF: we do not want to collaborate with Elixir. But there are no other possibilities available in Europe. If you would like to do it on a European level, you have to join Elixir. And this was a long discussion with the BMBF and finally, the BMBF decided to join Elixir.
32:02
What is Elixir? Elixir is an intergovernmental organization that brings together life science resources from across Europe. Intergovernmental means that different European countries are members of Elixir. Elixir is funded by the annual fees
32:20
coming from the different European countries. And I just can tell you that these fees are very, very high. And I think this was the reason why Germany did not join at the very beginning of Elixir. But in the meantime, the BMBF did. The resources of Elixir
32:40
are very clearly the same as what we have here at de.NBI: databases, software tools, training materials, cloud storage, and supercomputers. Exactly what we have started here is done on the European level. And I just would like to go in this direction. Altogether, we have in the meantime
33:02
22 members here who collaborate within Elixir. By the way, the Elixir headquarters is located in the UK. There are still some states or countries which are missing. For instance, Austria is not a member of Elixir,
33:22
but there is no question that they should join. The goal of Elixir is to coordinate national resources so they can form a single infrastructure. So it is very clear we have an infrastructure here, the de.NBI infrastructure, and on the Elixir level it is of course offered to other members in Europe.
33:44
Some characteristics of the Elixir organization. In the meantime, we have 22 members, as already mentioned, and one observer. Over 200 institutes came together in Elixir. We have 700 experts, if you count them all over Europe.
34:02
And we have 18 core data resources. That is a very important point. Core data resources means the most important data resources, in Europe and worldwide. I can just tell you, and I will show it in a minute, two of these core data resources are from Germany.
34:25
Altogether we have, of course, a huge number of services offered by Elixir through the national nodes. We have implementation studies, so this means Elixir identifies some important scientific questions
34:41
which are afterwards handled by different people from Europe. And also in the industrial field, there is a large number of companies which are interested in it. There is an Elixir mission.
35:02
It is written here, too long to read out, but it is very important. This makes it possible for them to gain greater insights into how living organisms work. This is really the point aimed at the experimental scientists.
35:22
Elixir was selected as one of the world's leading infrastructures. That is important. This was the result of the G7 summit in Germany in the year 2015. It is worldwide. It is really identified as one of the most prominent infrastructures
35:42
which should be connected. That is of importance. Elixir not only works here in Europe, it also looks to Australia. Just recently we have had a delegation from Australia looking at what is going on in Europe and what should be done in Australia. They talk to Canada, they talk to the US, so that is a very interesting development
36:02
which is going on in this direction. I already told you that in August 2016, the BMBF finally signed the Elixir Consortium Agreement. From this time on, Germany is part,
36:20
is a paying member of Elixir. That is important. In Germany, the national node Elixir Germany is run by a so-called German Elixir team. There is a head of node and a deputy, that is myself and Andreas Tauch. Then we have a technical coordinator, a training coordinator,
36:41
and we have members of the Elixir board. The Elixir board is the highest body, deciding on everything that is going on in Elixir. In some way, it is something like the CCU we have here in de.NBI. From Germany, the Elixir board member is somebody from the BMBF. It is Johannes Moore and we have to sign this,
37:01
Rolf-Bachow from Alex Guzman who is a delegate from the BMBF. Yes. What is of importance? What should I mention? In addition, the next step which was carried out is the German node application. We described to Elixir what Germany is offering.
37:23
This has been carried out and we are still in the process of formulating a collaboration agreement. For the last year, or even longer, we have been trying to establish such a collaboration agreement. The problem is the following.
37:41
We in Germany are a distributed node, and a distributed node was not known in other European countries. So we had to create something completely new, and in the meantime this is more or less on paper, and I hope that we will sign the collaboration agreement
38:00
at the beginning of next year. But Germany is already involved in many Elixir activities like implementation studies. As I already told you, these are important topics or questions which are handled Europe-wide. Elixir core data resources, I will come to this, and Elixir use cases. Use cases means communities
38:21
which are of interest. Of course, there are some European countries which are interested in very specific use cases, and for Germany it is very clear: we cover the whole of bioinformatics, so we are more or less in all of these different use cases. The collaboration with Elixir,
38:40
I already mentioned that Elixir has, in the meantime, 18 so-called Elixir core data resources, and this is a hard procedure. You have to apply for it and then it is internationally reviewed,
39:01
and if the reviews are positive, you will get this label, European core data resource, which is written here. And two of the German, of the de.NBI data resources, namely SILVA and BRENDA, got this label. It's very, very important.
39:21
As soon as this label was available, the BMBF paid more money for SILVA and BRENDA, which was necessary. It was always a big problem for the future. And SILVA, perhaps it is known to you, is from Frank Oliver Glöckner in Bremen, and it is a quality-controlled database of aligned ribosomal RNA sequences.
39:41
So, if you look for taxonomy, that is the way you go, you go for SILVA. And BRENDA is the most widely used enzyme information system, providing comprehensive information on all aspects of enzyme functions and properties. BRENDA is located close to here,
40:00
in Braunschweig. Schomburg is also involved in this type of work here. He also has some posters, or at least he has one of the other posters. This was a very big success. SILVA and BRENDA, two of the German databases, were selected and received this distinction.
40:24
Good. The next important point is the establishment of the de.NBI Cloud. From the beginning on, the de.NBI network didn't have enough compute power in the different service centers. And we discussed a lot what to do about this gap.
40:42
And one possibility would have been to bring compute power to the individual service centers. But this was not what was decided. It was decided in the CCU to go for a cloud solution. And we went for a de.NBI Cloud solution, so the CCU decided not to equip each individual service center
41:03
with compute power, but to have a federated one. And this was finally agreed and the BMBF could be convinced. And in 2017, the de.NBI Cloud was started. It is a federated cloud. I already used this word. It is federated since it is located at five universities;
41:23
the universities are Bielefeld, Giessen, Freiburg, Heidelberg and Tübingen. Only universities where cloud knowledge was already available could start or could work in the de.NBI Cloud. So these five universities are the universities within de.NBI which know what cloud computing really means.
41:44
And in 2016, 2017 and 2018 we got financial support from the BMBF to develop the cloud. We have got another grant this year and in the meantime,
42:01
we have 20 million to develop the de.NBI Cloud. And this was rather successful. What about the characteristics of the de.NBI Cloud? We have one governance, very, very important. We have a central portal. So everybody who would like to have a project on the de.NBI Cloud has to go through this central portal.
42:24
And the applications are interchangeable. So if one of the projects can be run on one cloud location, it also can run on the other ones. And of course, we have a shared development of DARS for common and cloud software.
42:41
So that is something that is going on. This is for people who know what it means. We have 16,000 cores, the biggest in Europe in the bioinformatics field. We have 170 terabytes of RAM and 38 petabytes of storage capacity.
43:00
So that is something, and I can tell you that in the meantime we are really in a position to compute large projects. If you are interested in using the de.NBI Cloud, it's open for everybody in the bioinformatics field.
43:20
It's free of charge, as is very clearly presented here. So first, you have to apply for cloud resources by proposing a project and describing the required resources. So it is, of course, necessary to review the project.
43:41
And the resources you are asking for. So you have an application. The application is reviewed by a scientific committee. So the scientific committee can say yes or no. Not everybody can work. It has to be in the right direction. And after approval of the application, the project is created in the de.NBI Cloud portal.
44:05
If you do not know how to use the de.NBI Cloud, you will get help. Yes, that is the offer by de.NBI. And then project resources are allocated at one of the cloud sites. So you cannot say I would like to go to this cloud location.
44:21
So the best one will be selected for you. So this works in the meantime very efficiently. And I just would like to give you some projects without telling you who is applying or who is working,
44:41
who is behind such projects. For instance, a health care project where distributed health data are collected and used to create value for citizen health care and scientific research. That is very clear. That is the beginning. That is the start in bioinformatics. It is necessary. A second project is very clear.
45:02
A project for the assembly of large plant genomes. You can just think of Gatersleben. For plant genomes, what you get from sequencing in the meantime are short reads and long reads. Short reads means Illumina. The long reads are the new technology, PacBio or Nanopore sequencing.
45:23
And this needs new bioinformatics tools, and of course a lot of data go into it. And such projects can be run on the de.NBI Cloud. And of course, if you have such plant genomes established,
45:45
you can use them and analyze them and find the genes in them. Project three is very clearly in the future direction: a project for the exploration of soil metagenomes.
46:01
In the meantime, there are a lot of metagenomes, soil metagenomes, around. And here, no new metagenomes will be created, just the already existing metagenomes are used. And if you go to an individual metagenome, you will find a lot of open reading frames for which you do not know what they are for.
46:23
And the background here is the following: you ask whether there are similar open reading frames available in other soil metagenomes. And if you can combine them, perhaps you get an idea what they are for. So that is what is done here, and that is exactly the FAIR principle for data.
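The question of whether similar open reading frames occur in other metagenomes can be sketched with a very simple similarity measure. The example below compares protein sequences by the overlap of their k-mers (Jaccard similarity); this is a stand-in technique chosen for brevity, not the method used in the project described, and the function names, the cutoff `min_similarity` and all sequences are made up for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_similar_orfs(metagenomes, k=3, min_similarity=0.5):
    """Link ORFs from different metagenomes whose k-mer sets overlap strongly.

    metagenomes: metagenome name -> {orf_id -> protein sequence}
    min_similarity: hypothetical cutoff, chosen only for illustration
    """
    items = [
        (name, orf_id, kmers(seq, k))
        for name, orfs in metagenomes.items()
        for orf_id, seq in orfs.items()
    ]
    links = []
    for i, (name1, orf1, kmers1) in enumerate(items):
        for name2, orf2, kmers2 in items[i + 1:]:
            if name1 == name2:
                continue  # only compare ORFs across different metagenomes
            similarity = jaccard(kmers1, kmers2)
            if similarity >= min_similarity:
                links.append((f"{name1}:{orf1}", f"{name2}:{orf2}", round(similarity, 2)))
    return links


# Made-up ORFs of unknown function from two hypothetical soil metagenomes.
metagenomes = {
    "soil_A": {"orf1": "MKTLLVAGG", "orf2": "MPPQRSTWY"},
    "soil_B": {"orf7": "MKTLLVAGA", "orf9": "GHHHHHHKK"},
}
links = link_similar_orfs(metagenomes)
print(links)
```

With hundreds of metagenomes the all-against-all comparison grows quadratically, which is one reason such an analysis needs substantial compute resources; in practice dedicated sequence search and clustering tools would replace the naive double loop.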
46:43
Reuse of data which are available. And this can be done, and it's very clear, if you have hundreds of such metagenomes, this is a huge data set which should be analyzed, and this can be done on the de.NBI Cloud. I hope that in this direction,
47:01
I have given you some interesting information. So, more or less, I will come to the end, very close, very clear. In the cooperation with industry, we are just at the beginning. We have created something like an industrial forum. Industrial companies will come together and discuss with us.
47:21
That is the industrial branch of de.NBI. Being a member of such an industrial forum is free of charge, and we will organize workshops and similar things. Perhaps you are interested in which information material is available on the de.NBI network. I can just point to the de.NBI handbook here.
47:41
It is a special issue of the Journal of Biotechnology. In the handbook, you will find everything that you would like to know on the service centers, on the projects. And you also have brief information on the tools offered by the service centers. So there is a handbook, a very nice description of the de.NBI network, and the special issue contains 27 review articles
48:03
describing scientific backgrounds of tools. So if you would like to see what is behind the tools, what is the scientific background, go to the Journal of Biotechnology, which was published in November 2017. If you are looking to get information
48:21
on what is going on in the de.NBI network, we have the de.NBI quarterly newsletter, which means that every three months you have a new edition, and edition three of this year is just under preparation and will be out at the end of this week. More information material on the de.NBI network are our flyers.
48:44
We have a flyer on the de.NBI network. We have a de.NBI fact sheet or de.NBI fact flyer, and we have a de.NBI Cloud flyer. And I will not go into the details, since you will find all these flyers in your material. It is here on the table.
49:01
It is also distributed in your conference material. Okay. If you are asking for links, everything is available, of course, electronically, and we have a webpage. We have a Twitter channel, and if you would like to get in contact with de.NBI,
49:21
please contact de.NBI, so you have the connection to the administration office, and your questions will be handled. So this brings me to achievements and visions. I will make it very short. Of course, we have established service and training in the first establishment phase.
49:43
We have brought the national node into Elixir and started the de.NBI Cloud in the first establishment phase, and we are continuing now in the second phase. And the vision of the network will be that this dynamic and distributed infrastructure
50:01
should become one of the most powerful national bioinformatics infrastructures in Europe. That should be very easy, since Germany is the largest country and the economic situation in Germany is very good. That is also the reason why we pay a lot of fees to Elixir. In the future, besides academics,
50:21
we also hope that industrial researchers will make use of the services and training offered by de.NBI. It is of interest also for the industry. You have to be careful with industry, since you cannot support them too much if they are just going to use it for economic reasons.
50:40
The big challenge for the future will be to look for a continuation of the de.NBI project as a sustainable infrastructure. I pointed this out at the very beginning. That is something that we have to clarify in the next two or three years. And this really brings me to the end of my talk.
51:01
I show you the de.NBI crew at a plenary meeting, in a picture taken in November 2016 in Berlin. I can tell you that altogether we have, in the meantime, 225 people working for de.NBI.
51:22
70 positions are granted by the BMBF project. 70 more positions are just from universities and research institutes. So the mobilization effect of the de.NBI BMBF project is enormous.
51:43
Some of these people are only working half-time for de.NBI, because we have 222. I thank you for your attention and I will finish my presentation. Thank you very much.