Interactive Visualization for large-scale multi-factorial Research Designs.
Formal Metadata
Number of Parts | 12
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/38606 (DOI)
Transcript: English(auto-generated)
00:00
So as introduced, I will talk about interactive visualization for large-scale multi-factorial research designs. I'm Andreas from the Applied Bioinformatics group in Tübingen. Before I talk about the graphs, I will explain a bit why I'm working
00:22
with research designs and what the problems are. As part of my PhD, I'm also part of the Quantitative Biology Center in Tübingen. We are a central facility that analyzes data, but also plans these experiments
00:41
because the idea is to bring together researchers and the different labs in Tübingen that generate the data, and planning these things together has certain advantages. So this is what normally happens: users come to us, and we plan the experiments together.
01:08
We provide them with a research portal where they can enter their metadata and their research designs. They then get barcodes for their samples
01:21
and these barcodes are later used in the labs where the data is generated. Then the data comes to us. The raw data is connected to the metadata that is already stored, the analysis is run, and users can then explore this data on the portal.
01:41
Why is this needed? We have already heard about several problems related to the reproducibility crisis today. As you can see from this Nature poll, many researchers say that they had problems reproducing other people's results,
02:01
and in some cases also their own. What solutions have been proposed to resolve these problems? One is a better understanding of statistics. That also means designing a more robust research design, and not generating data first and then going
02:21
to the bioinformatician and saying, show us some results for this. Another is enforcing standards, because if everyone collects metadata in a different way, that can lead to problems. Of course you can map between ontologies to solve some of these problems,
02:42
but some people don't even use ontologies. So what are we doing to solve these problems? We've been working on web-based creation of these experiments. That includes metadata collection, but the main aspect is the experimental design itself.
03:01
That is, what question do we want the design to answer when we analyze the data? Then there is visualization of the study design, which is what I'm talking about today. And we also do automated quality control and analysis. In order to visualize these study designs, we first need an experiment model,
03:22
which I will present now. Imagine you have a project about mice. You want to find out something about these mice concerning a treatment A and a treatment B that you give them. This information about the mice and the treatment they get is collected as a biological experiment
03:43
in our data management system. Then you have a next step where you extract the liver. For example, you want to find out what happens in the liver if you give a treatment to these two animals. And then, of course, you can perform
04:01
different analyses on proteins or on DNA that you find in that liver. That would be a third step. After that, you measure this on high-throughput analysis platforms. For example, you sequence the DNA on a flow cell, as shown here,
04:21
and out comes the raw data. So you have a hierarchical model: you start with the mice, extract some kind of tissue, prepare DNA samples, and then measure them. The second aspect that is rather important for this design in our case
04:42
is factorial experimental design. A factorial design means that you have different experimental factors. One is the treatment, as we've seen; another could be the genotype of the mice, for example wild type and knockout.
05:01
A full factorial design always looks at all combinations of factor levels. With k levels for each of n factors, that gives k to the power of n groups; in this case two to the power of two, so four different study groups. Of course, you can have replicates. For example, you could have five mice for each group, so you would have 20 mice in total.
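The group arithmetic above can be checked with a short Python sketch. It assumes, as in the example, that every factor has the same number of levels k, so the count is k^n; with unequal level counts the number of groups is the product of the level counts, which `itertools.product` handles either way. The factor names and levels are taken from the talk; the variable names are only illustrative.

```python
from itertools import product

# Factors and levels from the mouse example (names chosen for illustration)
factors = {
    "treatment": ["A", "B"],
    "genotype": ["wild type", "knockout"],
}
replicates_per_group = 5

# Full factorial design: each combination of factor levels is one study group
groups = [dict(zip(factors, levels)) for levels in product(*factors.values())]

# Replicate every group; each (group, replicate) pair is one experimental unit, here one mouse
units = [(group, r) for group in groups for r in range(1, replicates_per_group + 1)]

print(len(groups))  # 4, i.e. 2 levels ** 2 factors
print(len(units))   # 20, i.e. 4 groups * 5 replicates
```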
05:24
So we have a portlet where people can enter all this information and create full factorial experimental designs. It's called the experiment design wizard, but that is not part of what I'm talking about today.
05:40
What is important is the hierarchical graph that comes out of this. Similar to the example that I showed, you see four mice here. These are the four different study groups that we talked about. Factor F1 is the treatment, A or B, and factor F2 is the genotype, wild type or knockout.
06:02
From each of these, you extract a liver sample, and from those, you extract proteins in this case. Now, what is the problem with that? As with the other examples that we have seen for ontologies or pathway graphs, these graphs can get quite large.
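To make the hierarchy concrete, here is a small Python sketch. It assumes a hypothetical `Sample` record in which each sample points to the sample it was derived from; this is not the data model of the portal or of openBIS. It builds the mouse, liver and protein graph for the 2 × 2 design with five replicates per group, which shows how quickly even a small study grows, and walks one protein back to its mouse.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional

@dataclass
class Sample:
    id: str
    level: str                                            # e.g. "mouse", "liver", "protein"
    factors: Dict[str, str] = field(default_factory=dict)
    parent: Optional[str] = None                          # id of the sample this one was derived from

def build_design(replicates: int) -> List[Sample]:
    """Hypothetical helper: mouse -> liver -> protein extract for a 2x2 full factorial design."""
    samples: List[Sample] = []
    for treatment, genotype in product(["A", "B"], ["WT", "KO"]):
        for r in range(replicates):
            mouse = Sample(f"mouse-{treatment}-{genotype}-{r}", "mouse",
                           {"treatment": treatment, "genotype": genotype})
            liver = Sample(f"liver-{treatment}-{genotype}-{r}", "liver", parent=mouse.id)
            protein = Sample(f"protein-{treatment}-{genotype}-{r}", "protein", parent=liver.id)
            samples += [mouse, liver, protein]
    return samples

def lineage(by_id: Dict[str, Sample], sample_id: str) -> List[str]:
    """Walk from a sample up to the organism it was ultimately derived from."""
    chain: List[str] = []
    current: Optional[str] = sample_id
    while current is not None:
        chain.append(current)
        current = by_id[current].parent
    return chain

samples = build_design(replicates=5)
by_id = {s.id: s for s in samples}
edges = [(s.parent, s.id) for s in samples if s.parent is not None]

print(len(samples), len(edges))  # 60 40
print(lineage(by_id, "protein-A-KO-0"))
# ['protein-A-KO-0', 'liver-A-KO-0', 'mouse-A-KO-0']
```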
06:24
If you want researchers, or anyone really, to make sense of them, you need other approaches. For example, this is a project containing just 19 mice, and these are the connected sample graphs.
06:42
If you don't see much, that's part of the problem. In general, we want good statistical power and therefore many replicates, so sample sizes are large, and that is hard to visualize. So how do other people handle this?
07:03
There are many spreadsheet-based formats, for example ISA-Tab. They collect metadata and experimental factors based on ontologies, which is of course a good thing. And they want to make it easy for people to edit them using Microsoft Excel or some other spreadsheet program.
07:21
That's why they use spreadsheets. All the data is in there, and you can analyze it using scripts, but you don't really get visual representations. Where there are visual representations, they show which parts of the study belong together, but not the experimental factors or the samples that were taken.
07:44
In the case of ISA-Tab, the information is split into different sections, and these can be analyzed, for example the ontologies that are part of them. Another thing that the same authors did is actually similar to what I will talk about.
08:02
They list the samples in each study group. This is done on Biography, which is a website where you can look at these ISA-Tab studies. But they don't really visualize the samples; they just list them.
08:22
So there is no sample visualization that solves this problem yet. This is basically how ISA-Tab is built. You have three different parts. The investigation contains more general information about the experiment: which people are part of it,
08:40
which organizations, and which publications it led to, which you can data-mine, for example. The study describes the study design and the different factors. And the assays are the different measurements, describing how you get from the study subjects to your measured data.
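The three-part structure just described can be pictured as nested records. The Python sketch below is a much simplified, hypothetical in-memory view: real ISA-Tab stores these parts as tab-delimited investigation, study and assay files with many more fields, and the class and field names here are ours, not those of any ISA library.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Assay:
    measurement: str  # what is measured, e.g. "transcription profiling"
    technology: str   # how it is measured, e.g. "nucleotide sequencing"

@dataclass
class Study:
    title: str
    factors: List[str]                                 # study factors, e.g. ["treatment", "genotype"]
    assays: List[Assay] = field(default_factory=list)

@dataclass
class Investigation:
    title: str
    contacts: List[str] = field(default_factory=list)      # people and their organizations
    publications: List[str] = field(default_factory=list)  # resulting publications
    studies: List[Study] = field(default_factory=list)

def measurements(inv: Investigation) -> List[str]:
    """Collect the measurement types of all assays in all studies of an investigation."""
    return [a.measurement for s in inv.studies for a in s.assays]

inv = Investigation(
    title="Mouse treatment study (made-up example)",
    studies=[Study("Liver response", ["treatment", "genotype"], [
        Assay("transcription profiling", "nucleotide sequencing"),
        Assay("protein expression profiling", "mass spectrometry"),
    ])],
)
print(measurements(inv))  # ['transcription profiling', 'protein expression profiling']
```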
09:03
So our approach is to use graph aggregation. Graph aggregation means that instead of displaying the full graph, you merge different nodes of the graph. So if the nodes share some kind of property
09:21
in your graph, instead of displaying all of them, the similar ones are just denoted by a number in a corner, for example. This works well for our use case, because in experimental designs we want many replicates, and replicates are by design meant to be similar to the other nodes:
09:43
they have the same factor levels, for example. How does this work mathematically? Imagine we have n different factors in this experiment, and they produce a huge graph that we want to make smaller.
10:01
For this, we need a Boolean function that tells us which of these nodes are similar, that is, which nodes we can merge. And since we want to look at this from the factor level, we also get as many aggregated graphs out of this as we have experimental factors.
10:20
The aggregation graph for factor i is H_i, which of course has a node set and an edge set. What we do, and I will show an example in a minute, is look at the nodes that are already in the aggregated graph:
10:41
we take a single node from the existing graph and check whether one of the nodes already in the aggregated graph is similar to it. If one of them is, we don't add it to the aggregation graph, but increment the count of the existing node. If we don't find one, we add it to the new graph, because apparently there's no representation
11:01
of this node in there yet. In the end, we look at the edges of the old graph: if both nodes connected by an edge are in the old graph and in the new graph, then we connect them in the new graph. This is a bit much, but I have an example.
11:23
So this is the graph that we talked about at the beginning. The first step is to look at which nodes are already in the aggregated graph H_1, which is for factor one, the treatment in this case.
11:44
At first, we don't have any nodes, of course. We have nothing to compare to, so the similarity function stays zero, and we add the first node, the mouse with treatment A in this case. We continue with the second node,
12:02
and we see that this node also has treatment A, so it is actually a replicate of node one. In this case the function should be one; that's a mistake on the slide. So we just increment the count, and this node now represents the first two nodes.
12:25
Now we go to the third node. Its treatment is B, so we add this node, because the node that is already in there is different: its treatment was A.
12:41
We continue with the fourth node. It is the same as the third; here the slide is right, so the function is one, and we increment the count again. Then we go to the next level. We find that there's no node yet for a liver sample from the mice with treatment A, so we add that.
13:04
For the next liver sample, we find that a matching node is already there, and we continue like this on the protein level. What's important here is that the parent sample, that is, the entity that the sample is derived from,
13:23
is also taken into account; otherwise we couldn't separate these liver samples into two different groups. In the last step, we are still missing the edges. So we look at the edges
13:40
in the original graph and at the nodes that they connect, and we add an edge wherever both nodes are also in the new graph H_1. You see, these edges are copied, and the others are ignored. So much for the mathematical representation.
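The aggregation procedure walked through above can be sketched in Python. This is our illustration, not the portal's Java code. The Boolean similarity function treats two samples as similar when they sit on the same hierarchy level, have the same value for the chosen factor, and were derived from the same aggregated parent node (the parent condition emphasized in the walkthrough); samples must be listed parents-first. One deliberate deviation: instead of copying only edges whose two endpoints both survive in H_i, each original edge is mapped onto the representatives of its endpoints. In the example this gives the same edges, and it avoids dropping an edge when the first sample of a group was derived from a merged node. Passing `factor=None` aggregates each level completely, like selecting no factor in the demo.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Sample:
    id: str
    level: str                                            # e.g. "mouse", "liver", "protein"
    factors: Dict[str, str] = field(default_factory=dict)
    parent: Optional[str] = None                          # id of the sample it was derived from

def similar(a: Sample, b: Sample, factor: Optional[str], rep_of: Dict[str, str]) -> bool:
    """Boolean similarity: same level, same value of `factor`, same aggregated parent."""
    return (a.level == b.level
            and a.factors.get(factor) == b.factors.get(factor)
            and rep_of.get(a.parent) == rep_of.get(b.parent))

def aggregate(samples: List[Sample], factor: Optional[str]):
    """Build the aggregated graph H_i for one factor; samples must be ordered parents-first."""
    reps: List[Sample] = []      # nodes of H_i
    counts: Dict[str, int] = {}  # representative id -> number of samples it stands for
    rep_of: Dict[str, str] = {}  # original sample id -> representative id
    for s in samples:
        match = next((r for r in reps if similar(s, r, factor, rep_of)), None)
        if match is None:        # no similar node yet: s becomes a new node of H_i
            reps.append(s)
            counts[s.id] = 1
            rep_of[s.id] = s.id
        else:                    # similar node found: increment its count instead
            counts[match.id] += 1
            rep_of[s.id] = match.id
    # Map every original edge onto the representatives of its endpoints (duplicates collapse)
    edges: Set[Tuple[str, str]] = {(rep_of[s.parent], rep_of[s.id])
                                   for s in samples if s.parent is not None}
    return counts, edges

# The walkthrough example: four mice (treatment A, A, B, B), one liver and one protein each
mice = [Sample(f"m{i}", "mouse", {"treatment": t, "genotype": g})
        for i, (t, g) in enumerate([("A", "WT"), ("A", "KO"), ("B", "WT"), ("B", "KO")], start=1)]
livers = [Sample(f"l{i}", "liver", parent=f"m{i}") for i in range(1, 5)]
proteins = [Sample(f"p{i}", "protein", parent=f"l{i}") for i in range(1, 5)]
samples = mice + livers + proteins

counts, edges = aggregate(samples, "treatment")
print(counts)         # {'m1': 2, 'm3': 2, 'l1': 2, 'l3': 2, 'p1': 2, 'p3': 2}
print(sorted(edges))  # [('l1', 'p1'), ('l3', 'p3'), ('m1', 'l1'), ('m3', 'l3')]
```

A linear scan over the existing nodes mirrors the pairwise Boolean function from the talk; for large graphs, the same similarity criterion could instead be encoded as a dictionary key to avoid the quadratic comparison.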
14:02
So what does this look like in our portal? This is basically the same example; we went for a smaller design here. I'll show a better example soon.
14:20
Our implementation is based on D3.js, and we added some more information, of course. For example, users can see a legend explaining what the different nodes mean. The status of data collection is shown by these arcs,
14:40
and there's some interactivity, which I will also show in a minute. We implemented this using Java and different JavaScript packages. Vaadin is a Java framework that allows you to create Java portlets, which is what most of our portal is based on.
15:05
When a user clicks on a project, information about the samples, or entities, is fetched from our data management system, openBIS. In Java, we compute the node summaries
15:20
using the algorithm I just showed you. On the JavaScript side, node positions are computed using the dagre package, and D3 is used to draw the graph in the browser. The interactivity is done using remote procedure calls: when a person clicks on one of these nodes,
15:42
a table with the samples that the node represents is shown. How does this look in action? Sorry about that.
16:14
So this is the... I can't start it in this mode, I'm sorry.
16:26
I hope it can still be seen. This is the old graph, which, as you see, is too large to really make out what this project is about. But if you now select the new sample graph, the aggregation graph,
16:40
we can do an extreme version of that: we can select no factor at all, which means that each level is fully aggregated. So here you see this is a mouse project. Kidney is taken from the mice, and some kind of cell lysate is extracted. If you click on a node,
17:00
you see which nodes have data attached and which don't. If you choose treatment, you see that the mice themselves were treated, which means the graph is split at the top level. We'll have another example later where this isn't the case.
17:21
And you see the different factors. This is a second example, a peptide mass spec experiment.
17:41
You see that you can have more than three levels. In this case, proteins are extracted and then digested, so there is a fourth level of peptides below that, and the peptides are measured. Now, if you choose the factor compartments,
18:00
the graph actually splits at the cell line level, because this is a project on, I think, algae, and they wanted to see how different parts of the membrane react.
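In the portal, node summaries are computed in Java, dagre computes node positions, D3 draws the graph in the browser, and a remote procedure call shows the samples behind a clicked node. The Python sketch below only illustrates the data that could pass between these parts: the JSON shape (`nodes`/`links` with `count` and `withData`), the labels and all function names are assumptions for illustration, not the portal's actual interface. Inverting the sample-to-representative mapping from the aggregation step yields both the count shown on each node and the sample table listed when a node is clicked.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def members_by_node(rep_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the sample -> representative mapping: which samples does each node stand for?"""
    members: Dict[str, List[str]] = defaultdict(list)
    for sample_id, rep in rep_of.items():
        members[rep].append(sample_id)
    return dict(members)

def to_payload(rep_of: Dict[str, str], edges: Set[Tuple[str, str]],
               labels: Dict[str, str], with_data: Set[str]) -> str:
    """Serialize the aggregated graph as JSON nodes and links for a browser-side renderer."""
    nodes = [
        {
            "id": rep,
            "label": labels.get(rep, rep),
            "count": len(ids),                                  # number shown on the node
            "withData": sum(1 for i in ids if i in with_data),  # could drive a data-status arc
        }
        for rep, ids in members_by_node(rep_of).items()
    ]
    links = [{"source": s, "target": t} for s, t in sorted(edges)]
    return json.dumps({"nodes": nodes, "links": links})

# Hypothetical aggregation result for the treatment factor (mouse and liver levels)
rep_of = {"m1": "m1", "m2": "m1", "m3": "m3", "m4": "m3",
          "l1": "l1", "l2": "l1", "l3": "l3", "l4": "l3"}
edges = {("m1", "l1"), ("m3", "l3")}
labels = {"m1": "mouse, treatment A", "m3": "mouse, treatment B",
          "l1": "liver", "l3": "liver"}
with_data = {"l1", "l2", "l3"}  # samples that already have measured data attached

payload = json.loads(to_payload(rep_of, edges, labels, with_data))
print(members_by_node(rep_of)["l3"])  # samples a click on node l3 would list: ['l3', 'l4']
```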
18:23
Okay, so people can use this in our portal, and our portal can be installed at other core facilities. But if you just want to use this graph, that's a bit difficult, and you might not want to use our data model,
18:41
because that would be yet another standard, and that doesn't really solve the problem if everyone uses something different. So we went back, looked at the ISA-Tab format and checked what we can actually do with it.
19:03
As you probably noticed, they collect exactly the same information that we do, so it is very useful to display this format as well. For example, they have study factors, materials, and of course the subjects.
19:22
So we also implemented a standalone version, and of course you can also import ISA-Tab into our portal. The standalone graph viewer is built with JavaFX D3, a Java package that lets you easily include D3 in JavaFX.
19:46
We use the ISA tools software to parse the experimental design. Then you can look at the results either in our portal or in the standalone viewer.
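The viewer relies on the ISA tools software for parsing. As a stand-in that needs only the Python standard library, the sketch below reads an ISA-Tab study table directly, relying on the format's tab-delimited layout and its `Sample Name` and `Factor Value[...]` column headers, and groups sample names by their combination of factor values, which is the grouping the aggregated graph displays. It ignores everything else in the format (ontology term columns, units, assay files), so treat it as an illustration, not a parser; the study table is made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

def study_groups(study_tsv: str) -> Tuple[List[str], Dict[Tuple[str, ...], List[str]]]:
    """Group 'Sample Name' values by their combination of 'Factor Value[...]' columns."""
    reader = csv.DictReader(io.StringIO(study_tsv), delimiter="\t")
    factor_cols = [c for c in reader.fieldnames or [] if c.startswith("Factor Value[")]
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in reader:
        key = tuple(row[c] for c in factor_cols)  # one study group per factor-value combination
        groups[key].append(row["Sample Name"])
    return factor_cols, dict(groups)

# Tiny made-up study table in the ISA-Tab column style
rows = [
    ["Source Name", "Characteristics[organism]", "Sample Name", "Factor Value[treatment]"],
    ["mouse1", "Mus musculus", "liver1", "A"],
    ["mouse2", "Mus musculus", "liver2", "A"],
    ["mouse3", "Mus musculus", "liver3", "B"],
]
study = "\n".join("\t".join(r) for r in rows)

cols, groups = study_groups(study)
print(cols)    # ['Factor Value[treatment]']
print(groups)  # {('A',): ['liver1', 'liver2'], ('B',): ['liver3']}
```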
20:03
The advantage of the portal is that, as you've seen, you have information about the data sets that are already attached; you don't get that from a file that you upload. Okay, and this is how that looks.
20:22
Oh, it does actually work. You can import different experimental design formats, in this case ISA-Tab. On the left side, you see the different studies, because each ISA investigation can have multiple studies. The summary is displayed when you select one,
20:41
and then you can of course do the same things. You can get an overview by choosing none, which gives you the full aggregation. You see that there are about 300 different entities in this graph and that three different things were measured: RNA, small molecules, and proteins. You can click on them to see the information
21:02
that is found in the ISA-Tab files about these samples. If you have a complex design like this, with six different levels, you can zoom out to fit it on the screen, or zoom in to get more information about specific nodes. And again, you can click on a node, here in this part
21:21
with the unspecified extract, where carbon was the limiting element, and see what the different samples are called. And if you double-click, you can save this experimental design.
21:43
Now, if you select the second study, you get a picture of the second study. In this case, there was a different exposure time for some cell growth medium, and you see the graph is split at the second level,
22:01
because the cell line was treated and not the species. So to summarize: experimental design is important, but it leads to a few new problems, like large studies that you can't really get an overview of
22:23
from the existing formats alone. So we went for graph aggregation to summarize the important aspects of a study. You could aggregate by any other information that you have in your metadata or in your formats.
22:40
For example, you can simply show the different species that are part of the study. And it's always good to build on existing standards, to provide a tool that's useful not only to your lab or your group, but to many people. I especially want to thank Luis De La Garza
23:01
and Sven Nahnsen from the Quantitative Biology Center in Tübingen, who helped me with this project, and also my supervisor, Oliver Kohlbacher, from the Applied Bioinformatics group. And that's a picture of the people from the Quantitative Biology Center. Thank you. Thank you.
23:22
Okay, questions.