Computational models of stem cell decisions
Formal Metadata

Number of Parts: 17
License: CC Attribution 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/40518 (DOI)
Leibniz MMS Days 2019, 14 / 17
Transcript: English (auto-generated)
00:01
Hi, everybody. It's great to be here. Thanks for the invitation. I already enjoyed the mixed crowd of people here during lunch where we talked about brain research and also the magnetic field of the sun, et cetera. So I really enjoy this mix of topics and people.
00:21
It's great to be here. I'm from Munich, so this is where I work: the Helmholtz Center in Munich, a research center for environmental health. When I give a talk somewhere, I sometimes start by saying that this is to the very north of Munich, so it takes me quite a while to get there. And then I sometimes say it takes me as long as getting to the place where I'm speaking, though
00:41
this is certainly not true here today, because it took me really quite a while to come here. Coming here from Munich is quite a journey. Still, you should come to Munich, of course, because it's worthwhile to have a look. So this is the Helmholtz Center. It consists of roughly 30 different research entities. It's quite diverse, from groundwater ecology
01:02
to stem cell research to computational biology, which is what we do. We have roughly 2,000 employees and 700 scientists, most of them PhDs. The overarching theme there is the individual, shaped by its genes, its lifestyle, and the environment. That's a theme
01:22
people can agree on, I guess, over there. And we are part of the larger Helmholtz Association, with 21 centers all across Germany; I guess you might know that. So it's great to be here and to have a look at what the Leibniz Association is doing. I'm really keen on discussing more with you guys. So that's where I'm from. In the Institute of Computational Biology, we are roughly 50 employees at the moment, and growing.
01:44
We have 10 different research groups, and my research group is concerned with the dynamics of single cells. That's what I'm going to tell you about. By the way, later this year we will start something called HAICU, the Helmholtz Artificial Intelligence
02:04
Coordination Unit, where we try to spread AI methods across the whole Helmholtz community. And we also started a grad school for data science, to recruit more people doing the kind of research we do. Now my talk starts. What we do, as all of you do, is abstraction.
02:24
Basically, we abstract complex systems to things we can measure and understand. And of course, that's what we call a model. Now in biology, and I just continue with what the two speakers before me introduced, there are a couple of different models in biology. For example, there might be this kind of network model which we already saw.
02:40
This is a gene network. Every dot is a gene, and they are connected. Sorry, it's a disease network: every dot is a disease, and two diseases are connected if a gene is associated with both diseases. So this kind of model is obviously qualitative, right? There's no dynamics. It's static, and it's large-scale because it covers all the diseases we know.
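As an aside, the construction just described is easy to sketch in code. This is my own toy illustration with made-up gene–disease pairs, not the network on the slide:

```python
from itertools import combinations

# Toy gene-disease associations (made up for illustration). We project
# the bipartite gene-to-disease mapping into a disease-disease network:
# two diseases are linked if some gene is associated with both.
gene_to_diseases = {
    "TP53": {"lung cancer", "breast cancer"},
    "APOE": {"Alzheimer disease", "hyperlipidemia"},
    "BRCA1": {"breast cancer", "ovarian cancer"},
}

edges = set()
for diseases in gene_to_diseases.values():
    for a, b in combinations(sorted(diseases), 2):
        edges.add((a, b))   # one edge per disease pair sharing a gene
```

Real disease networks are built the same way, just from genome-wide association databases instead of three hand-written entries.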
03:01
We can have other models like that. And actually, if you talk to biologists, they call this a model, right? Just drawing two proteins interacting, that's a model for biologists. And you see two proteins here, PU1 and Gata1. And the model basically condenses all the experiments you did before, right?
03:20
It's the last figure in a paper. And you also say, okay, these two proteins, they can bind to each other at two particular protein sites and then they can do something, actually, they can inhibit the transcription of another gene. All right, so this model is also qualitative, obviously. It's also static, but it's small scale. It just considers two transcription factors.
03:43
Okay, just as a very short detour, I guess you know this, but when I talk about transcription factors, as the speakers before me also did: these are genes, which are transcribed to messages, and those are translated to proteins. Genes are made of DNA, messages are made of mRNA, right?
04:01
And the proteins are made of amino acids, I guess you know that, right? Some of these proteins, called transcription factors, can bind to the genome and regulate the transcription of other genes. Okay, let's move on. So as a final model example, this is something that we computational biologists would call a model, I guess, right? So we have a rather complex system with 10 players,
04:22
maybe here you see, so the gray line is the outer rim of a cell and you see a receptor and things can bind to this receptor and the receptor can signal downstream into the cell and then things happen there and genes are activated or deactivated. And this kind of idea we cast into models,
04:41
for example, an ordinary differential equation, where you say the change of the concentration of a receptor on the cell membrane has a couple of elements that determine it, right? And then you can fit this kind of ODE to measurements, and that's basically what we call a computational model.
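A minimal sketch of what such an ODE looks like, as my own hypothetical receptor balance (illustrative rates, not the model on the slide): the free receptor concentration changes through production, ligand binding, and degradation.

```python
from scipy.integrate import solve_ivp

# Hypothetical receptor balance (rates are assumptions for illustration):
# dR/dt = production - binding * ligand * R - degradation * R
PROD, BIND, DEG, LIGAND = 1.0, 0.2, 0.1, 2.0

def receptor(t, y):
    r = y[0]
    return [PROD - BIND * LIGAND * r - DEG * r]

sol = solve_ivp(receptor, (0.0, 100.0), [0.0], rtol=1e-8)
steady = sol.y[0, -1]  # analytic steady state: PROD / (BIND*LIGAND + DEG) = 2.0
```

Fitting such a model then means adjusting the rate constants until the simulated trajectory matches measured time courses.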
05:01
You can also predict things, and this kind of model is quantitative, right? You can write down numbers, it's dynamic, and it's at the mesoscale because you can cover up to 10 players with it. All right, so that's some overview over these different models. If you talk to biologists,
05:21
they also call other things models, as I told you, right, little drawings, but they also call whole species models, right? So for example, E. coli is a model species for biologists. It's a very simple bacterium, it has no nucleus, right? So you can do all kinds of genetic manipulations with it, and it's a model because it's much simpler
05:41
than more complex mammalian cells, right? Same with yeast. This has a nucleus, but you can still genetically modify it as much as you want and do experiments with it. So it's a model system for more complicated animals, like the zebrafish, Danio rerio, which is also a model organism. It's great because it's translucent, so you can watch it grow during embryonic development.
06:01
And then of course the mouse: at our place in Munich, and for drug research in general, this is the most widely used model organism, right, because it's a mammal, but you can still do almost any kind of experiment with it. Okay, so that's basically the frame, right, where our biological models sit in.
06:24
So I'm going to talk about biological models now in a more computational sense, and I'm going to talk about single cells. This number is amazing, right: we have 30 trillion human cells; every one of us consists of this number of cells as we sit here.
06:40
And not just this number is amazing, but it's also interesting to see that most of the cells we consist of are actually blood cells. Of these, 25 trillion are mostly red blood cells: the tiny guys without a nucleus running through our blood, right, transporting oxygen. So most of our cells are these guys. That's an amazing number, but even more amazing, I think, is that every day
07:01
each of us produces 500 billion of these cells, right. So we are not a static system, but we are kind of a dynamic equilibrium, right, reproducing ourselves every second. Now, how is that done? Well, the process is called hematopoiesis, it's the production of blood. And after 100 years of research,
07:22
that's basically the dogma people believe in for hematopoiesis. The idea is you have a hematopoietic stem cell, and people roughly estimate that we have between 200,000 and one million of these in our bodies. They sit in our bone marrow and they don't do much, but they sometimes divide and then give rise to a more differentiated cell,
07:42
for example, a short-term hematopoietic stem cell. And then going down here towards the functional cells, the ones that really do things, the potential of these cells decreases, right. So if you are a megakaryocyte-erythrocyte progenitor, you won't be able to become a B cell anymore, but you are very close to becoming a red blood cell,
08:02
which is actually doing stuff in the body, right. So you lose potency and you go towards a functional cell, going from the stem cell level over the progenitor cell level to the functional cell level. All right, let's zoom in a bit. That's one thing we looked into a bit and let's zoom in a bit even more.
08:23
So you have here a common myeloid progenitor, myeloid because these guys still sit in our bone marrow, and this cell type is believed to produce two more mature progenitor types: the megakaryocyte-erythrocyte progenitor (MEP) on the left and the granulocyte-monocyte progenitor (GMP) on the right.
08:41
Okay, so what do we know about this system? Well, if you knock out this one transcription factor I mentioned earlier, PU1, in mice, of course, because there you can do the experiments, right. If you knock out PU1 in the genome of the mice, you won't get any GMPs. If you knock out Gata1, you won't get any MEPs. So from these experiments, people concluded
09:01
that these two transcription factors, PU1 and Gata1, are probably very important for this differentiation decision, right. And then if you think about how is such a CMP cell able to make this decision, either to become a MEP or to become a GMP, then these two transcription factors are probably important. And now, that's again what I showed you earlier, right.
09:23
From some other experiments, people figured out that PU1 and Gata1 can bind to each other and then inhibit the expression of the respective other gene. That idea, together with the functional importance of these two proteins, basically converged on the idea
09:41
that PU1 and Gata1 form a toggle switch, okay. And that's what I'm showing up there. Gata1 inhibits PU1, PU1 inhibits Gata1, and each might additionally activate itself. This has been termed a toggle switch, right. And you can think of it like this: they inhibit each other, so if one gets a bit of an advantage,
10:02
it inhibits the other one and shuts it down and then you go to the one lineage. Or the other one has a bit of an advantage and then it shuts down the other one and you go to the other lineage, right. And this toggle switch you can nicely model. So you can write down an ODE where you say the change of the one protein, X1,
10:20
is determined by a self-activation term. And then you have a mutual inhibition term, right. So you have the other guy down here. The larger that is, the less increase of X1 you get. And you also have a decay term because these proteins are degraded over time. And if you do that in a perfectly symmetric manner for the two proteins and then you simulate it
10:40
or you look at the solution of this ODE system, you find that if you tune the parameters of the system appropriately, you have three stable states: one in the middle where both PU1 and Gata1 are expressed, and two other states where either the one or the other is expressed. And that's nice because it fits our biological intuition, right,
11:02
or the idea we have of the system. You can start with a state in the middle, from which both states can be reached, but then you tune one of the parameters and you go either in the one or the other direction. Right, so this model is a nice way to describe how we think these progenitors differentiate
11:23
and become more functional in such a differentiation decision. All right. Now, roughly 10 years ago, Arjun Raj and co-workers showed that mRNA is actually transcribed in bursts.
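The toggle-switch ODE just described, with a self-activation term, a mutual-inhibition term, and a decay term per factor, can be sketched as follows. This is a generic textbook-style parametrization of my own, not the speaker's exact equations; note that with these particular parameters only the two outer lineage states are stable, while the central co-expression state would need further tuning, as mentioned in the talk.

```python
from scipy.integrate import solve_ivp

# Symmetric toggle switch for two factors x1, x2 (think PU1 and Gata1):
# dx_i/dt = self-activation + inhibition by the other factor - decay.
# Parameters are illustrative assumptions, not fitted values.
A, B, K, N, DECAY = 1.0, 1.0, 1.0, 4, 1.0

def toggle(t, x):
    x1, x2 = x
    dx1 = A * x1**N / (K**N + x1**N) + B * K**N / (K**N + x2**N) - DECAY * x1
    dx2 = A * x2**N / (K**N + x2**N) + B * K**N / (K**N + x1**N) - DECAY * x2
    return [dx1, dx2]

def steady_state(x0):
    # integrate long enough that the trajectory settles into an attractor
    return solve_ivp(toggle, (0.0, 200.0), x0, rtol=1e-8).y[:, -1]

# A small initial advantage for x1 tips the switch into the x1-high lineage:
x1_hi, x2_lo = steady_state([1.2, 0.8])
```

Starting from the mirrored condition `[0.8, 1.2]` lands in the opposite lineage state, which is exactly the "one gets a bit of an advantage and shuts the other down" picture.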
11:43
That means that the DNA which codes for the gene is closed most of the time, but sometimes it opens, and then many polymerases run over this gene stretch and produce many mRNAs at once, right? So we get a lot of mRNAs in a sort of burst,
12:01
and then the gene shuts down again and it's closed. To describe that, an ODE is not really appropriate, right? For that you rather need a stochastic description of the system. And so, in a master's thesis at that point, we tried to cast this toggle switch into a stochastic formulation, and then you have to write down the equations
12:20
for the stochastic system. Since it's complicated to solve, the only thing you can do is simulate it, and that's what we did. You can simulate a typical trajectory, and you find that if you start in the middle, in an undecided state, such a simulation gives you the one or the other state, and actually the attractor structure is a bit more complicated
12:41
as compared to the ODE case. But you get the same thing, right? Starting from an undecided state, you go either in the one or the other direction. All right. We also did another model where we considered more players, but with 11 players you can't model it stochastically, so we used a Boolean framework there.
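The bursty transcription just mentioned is commonly captured with the two-state ("telegraph") gene model, which can be simulated exactly with Gillespie's algorithm. A minimal single-gene sketch with illustrative rates of my own choosing, simpler than the full two-gene stochastic toggle from the thesis:

```python
import numpy as np

# Telegraph model: the gene toggles between closed and open; only the
# open gene makes mRNA, so transcripts appear in bursts and then decay.
K_ON, K_OFF, K_TX, K_DEG = 0.05, 0.5, 20.0, 1.0  # illustrative rates

def gillespie(t_end, rng):
    t, gene_on, mrna = 0.0, 0, 0
    counts = []
    while t < t_end:
        rates = np.array([
            K_ON * (1 - gene_on),   # chromatin opens
            K_OFF * gene_on,        # chromatin closes
            K_TX * gene_on,         # transcription while open
            K_DEG * mrna,           # mRNA degradation
        ])
        total = rates.sum()
        t += rng.exponential(1.0 / total)        # time to next reaction
        event = rng.choice(4, p=rates / total)   # which reaction fires
        if event == 0:
            gene_on = 1
        elif event == 1:
            gene_on = 0
        elif event == 2:
            mrna += 1
        else:
            mrna -= 1
        counts.append(mrna)
    return counts

counts = gillespie(2000.0, np.random.default_rng(0))
```

With these rates the gene is off most of the time, so the mRNA count repeatedly spikes to tens of copies during a burst and then decays back toward zero, which an ODE for the mean would completely smooth over.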
13:02
And it's not just us: many people, in particular physicists, like this idea of the toggle switch as a model for differentiation decisions, and many simulated it because it's such a nice system, right? There's biological relevance to it, and you can still simulate and understand what's happening. But over the years, as more and more of these models appeared,
13:23
it became clear that while we can model it, we haven't actually proven that this mutual inhibition of PU1 and Gata1 is really decisive for this differentiation decision in blood cells. This question troubled our collaborator at that time,
13:41
Tim Schroeder, who is now at ETH, and he had a nice setup to actually look into this. That's time-lapse microscopy: taking microscopy images every couple of minutes. And he also had a mouse, again a model system, right? A mouse where at the PU1 locus in the genome you had a yellow fluorescent protein, and at the Gata1 locus,
14:01
you had a red fluorescent protein knocked into the genome. So whenever one of these genes is transcribed, the protein carries a tag which lights up in yellow or red, so you can watch the expression over time, right? So what Tim then did is breeding mice with these constructs,
14:21
sorting out the hematopoietic stem cells from the mice, putting them onto the microscope and doing time-lapse movies, right? Because the idea was: let's watch these cells differentiate, and then we'll figure out if PU1 and Gata1 are really doing it. Okay, so this is how the data, the bright-field images, look. We take images every five minutes, and you see how the cells run around.
14:43
These are the stem cells, and at some point they grow and differentiate. And we don't just have these bright-field images, as I told you, right, we also have information about the yellow fluorescent protein. This is tagged to PU1, so whenever we see yellow shining up, it's a marker for PU1 being expressed.
15:00
And then we also have Gata1 as a marker, so we know how much Gata1 is in there; it's a bit hard to see, I guess. We also have CD16/32, another marker. Whenever we see this coming up in a cell, we know that this cell has adopted the GM lineage; you can't turn back from there, right? And then we also see this larger cell, probably a megakaryocyte, and
15:21
you see that here in that cell, PU1 is negative, but Gata1 is positive, and CD16/32 is also negative. So that's basically the data set we have: bright-field images every ten minutes, and every half an hour these fluorescence images. And now, in order to answer the question whether PU1 and Gata1 are decisive, our idea was to quantify protein levels in
15:44
single cells, right: we want to know how much PU1 and how much Gata1 is in one particular cell, and we want to watch that over time, so we need to track the cells, follow single cells over time. To come up with solutions for that, you have to do what is now called bioimage informatics, right?
16:03
You have this data set, which is a movie, just a stack of images. And to distill numbers out of that, you have to apply a couple of tools. And if the right tools are not around, you have to develop them. We actually thought it might be easy to get the numbers out, but it turned out to be a difficult part of it.
16:21
It took us a couple of years and a couple of PhD students to come up with a solution. I'll cut that part short; if you're interested, let's talk afterwards. So we came up with a solution for segmenting the single cells in the movie, so identifying the single cells in there. Then Tim wrote an efficient software tool that allows
16:43
you to track the cells in the movie. And that's actually not done automatically by the computer, but manually: a student sits there and follows the single cells over time. Nowadays there's a better solution for that. For single-cell quantification, once you have tracked the cells, we,
17:00
again, do a segmentation of the cell, which gives you this mask here. Then we just sum up all the intensities in that mask. We also wrote a tool that allows you to efficiently go into the data set, figure out where the segmentation fails, and adapt it. And then finally, in all these movies we have a problem: in all the images, more photons are generated in the middle of
17:20
the image per photon you put in, as compared to the edges of the image, right? So it's uneven illumination. To correct for this shading effect, we came up with a tool that estimates the background of these images, using the assumption that the background should be the same in all of them. If we do that, we can make the images flat and
17:40
compare cells all over the image. Putting all that together, we were in the end able to track the single cells, as you see, right? Once a cell divides, you track the sisters of that cell, and then again their sisters. And at some point CD16/32 comes up in the fluorescence channel of these images, and the person tracking annotates that and says, okay,
18:03
now from this time point on we know that the red cell is a GM. All right, so what we got after applying all these tools to our data is these kinds of trees, right, now with time on the x-axis. And you see that there's a mother cell dividing into sisters, then sisters again, and
18:20
so on and so on. At some point you might lose a cell because it runs out of the image. Some cells might just die. And at some point you're not able to track any further because it's so dense that you can't figure out from one time point to the next which cell is which. Okay, having that, we also have the information about how much PU1 and Gata1 is in the cells, right?
18:41
For example, I put only one cell here. In that cell, Gata1 is rising and PU1 is rising. Then at some point the cell divides, and you get half the amount of PU1 in each sister cell. Then it rises again, and so on. We did that for roughly 80 trees in different experiments. And with these trees, we again wanted to answer this question, right?
19:02
Are PU1 and Gata1 decisive for the lineage choice? If you look at these trees, even many of them and for a long time, you realize that just looking doesn't help you answer this question, because you don't really know where to look, right? The data structure is complicated because of the trees. You have a lot of noise in the measurements, but
19:21
maybe also in the transcription process itself, as I told you, right? And then you also have the cell cycle: the protein amounts halve each time a cell divides into sisters. So just by looking at the trees, we're really not able to answer this question, I guess.
19:40
So our idea was to figure out from these trees where the decision for a particular differentiation step occurs. Once we know where it happens, we can look into that time point, compare it with the mutual-inhibition model, and figure out if that model is actually true.
20:00
Okay, so the second part of this talk is called retrospective inference of choice, because from the trees we want to figure out where the decision happens. Now, how can that be done? Let me use a little analogy. This is my group in July 2015. That guy next to me is Michi Strasser, who implemented the stuff I'm going to show you.
20:23
Michi went to Seattle for a postdoc. In the parlance of stem cell decisions, you could say he made a state change, right, from the ICB, where he worked before, to the ISB, where he works now in Seattle. So if you saw him at the ISB in 2017, right,
20:43
you would not be able to figure out when he made the decision to go there, right? It could have been two years ago or four years ago; we don't know. But if Michi would divide, as the cells in our movies do, and the decision happened before his division, then you would expect his copies to show up in Seattle
21:00
at correlated time points, right? There might be a bit of back and forth in the middle, but they would show up at similar time points. If the decision happens after the division, then you have uncorrelated time points of him popping up in Seattle, right? So the idea is to use the tree structure to figure out where the decision for a particular event happens. And in our case, the idea comes from looking at the trees,
21:22
because if you look at this marker, CD16/32, which is the marker for one particular lineage, you see that it comes up in a kind of correlated fashion, right? It's not all over the place. It mostly happens here in the fourth and fifth generation, and that's a pattern we see in all the trees. So, the idea is to model this differentiation process.
21:43
And we basically have two steps in here. The first we call a point process, right? We just say that at one time point, the cell decides to differentiate. This probability to differentiate might be constant, or we allow it to be a function of time. And then, once the decision has been made, we say
22:02
we start a stochastic gene expression process, and at some point we hit the detection limit, where the marker can be detected in the movies. And that's basically the expression of the marker as we see it in the experiments. So let's visualize it with a tree. This cell here decides to differentiate, and
22:20
then this process starts, a stochastic gene expression process. At some point you hit the detection limit, and you say, okay, here the CD16/32 marker turns on, and in the other cell a bit later. And this happens in the sisters as well, and again it takes a while. So you have a differentiation process and a delay process, and these two things are parametrized with a number of parameters in our model.
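The core of the argument, that a decision before division leaves a correlated trace in the sisters while a decision after division does not, can be illustrated with a toy simulation. This is my own stripped-down sketch with exponential waiting times, not the authors' full tree model:

```python
import random

def sister_marker_times(decision_before_division, rng):
    """Marker-detection times of two sister cells in a toy model."""
    if decision_before_division:
        d = rng.expovariate(1.0)            # one shared, inherited decision
        return d + rng.expovariate(3.0), d + rng.expovariate(3.0)
    # otherwise each sister decides independently after the division
    return (rng.expovariate(1.0) + rng.expovariate(3.0),
            rng.expovariate(1.0) + rng.expovariate(3.0))

def correlation(pairs):
    # plain Pearson correlation, written out for self-containedness
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n
    vx = sum((a - mx) ** 2 for a, _ in pairs) / n
    vy = sum((b - my) ** 2 for _, b in pairs) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)
early = correlation([sister_marker_times(True, rng) for _ in range(5000)])
late = correlation([sister_marker_times(False, rng) for _ in range(5000)])
```

With a shared decision, the marker-on times of the two sisters are strongly correlated; with independent decisions after division, the correlation is near zero, which is exactly the signal the tree-based inference exploits.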
22:42
Now, we have this model, and we have parameters in there. Let's call them theta and eta, and we want to figure out what these parameters are. So, let's first test if we can figure them out in a simulated test case, right? What we do is, we simulate trees which look a lot like the data we have.
23:03
And since we simulated them, we know, of course, when the differentiation happens, how long the delay is, and when the marker comes up. Then we forget about this delay and just see the marker as we do in the experiments. And now we wonder, from a number of these simulated trees, can we infer back these parameters of the model?
23:21
And we tried that, and for that we need the likelihood: the likelihood of observing a particular set of trees, given these two parameters. And the way we set it up, it's a sum, because for an observed tree there are many different possibilities for how it can come about, right? You can have an early decision and a long delay, or
23:42
you can have a late decision with a short delay. And the lengths of these things are determined by the parameters of the model. Each of these hidden trees we then decompose into a couple of subtrees, which we can easily model. And we use a graphical model representation for these subtrees to figure out what's the most likely parameter for
24:00
a given length of delay, for example, and the correlation of two sisters. Now, we put all of that into our likelihood function and fit the data with it. First, we simulated this to make sure that the algorithm works, and we find that from simulated data, we can infer back the right parameters. Then we apply it to the real data, and actually, that's the data we used here.
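The marginalisation inside that likelihood, summing over all hidden decision times that could explain an observed marker onset, can be illustrated in a toy one-cell, discrete-time setting. The geometric distributions and parameter values here are my own simplification, not the graphical model from the talk:

```python
def observation_likelihood(t_marker, p_diff=0.2, delay_rate=0.25):
    """Probability of first seeing the marker at discrete time t_marker,
    summing over every hidden decision time t_d <= t_marker:
    a geometric decision time followed by a geometric delay."""
    total = 0.0
    for t_d in range(t_marker + 1):
        p_decide = (1 - p_diff) ** t_d * p_diff                  # decision at t_d
        p_delay = (1 - delay_rate) ** (t_marker - t_d) * delay_rate
        total += p_decide * p_delay
    return total
```

An early decision with a long delay and a late decision with a short delay both contribute to the same observation, which is exactly the ambiguity the sum resolves.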
24:21
So you see, these are all the trees that make the GM lineage, so one of the two lineages. And you see that the data set is quite heterogeneous, and that comes just from the fact that the biologists who tracked it sometimes track the whole tree, because they want to know what's happening there, and sometimes only track one single cell, right? But the nice thing is that we can put it all into our likelihood
24:42
estimation, because even a single tracked cell will give you some information, right, about how long the processes are. So we put that in. Let's zoom in here on a couple of those trees we observed. Then we figure out, for each of the trees, what the most likely parameters are, and from that, the most likely time point of differentiation.
25:01
And this I show you now here: the step from black to gray, that's the time point where we believe the decision has been made. And what you realize is that our algorithm comes up with relatively long delays, right? From the time point of differentiation to the marker onset, it takes five to six generations in all these trees.
25:22
Now, what we can do is look at what's happening there in our real data, right? So we go back a long time, pick one of these branches, and look at the PU.1 dynamics, so one of these two transcription factors, and check what's going on in there. And that's how such a time course looks, right? You see that the PU.1 intensity rises, is halved at division, rises, is halved, and
25:44
so on and so on, and at some point the marker comes up. And this point over there, this is our predicted differentiation: this is when our algorithm believes the differentiation decision has been made, based on the correlation structure in the trees. Right, we divide the intensity by the area to get a bit of a smoother curve here.
26:01
And then, for each single cell, we fit these concentrations with a straight line, which is roughly an estimate for the production of PU.1 in that single cell. Then we can plot the slope of the linear fit, which is basically PU.1 production, for the cell where we believe the decision has been made; we call this generation zero.
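The single-cell fit is just an ordinary least-squares line whose slope serves as the production estimate. A minimal version, with made-up time-course values:

```python
def fit_slope(times, values):
    """Closed-form least-squares slope of values against times."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

times = [0, 1, 2, 3, 4]                 # frames within one cell cycle
conc = [1.0, 1.5, 2.1, 2.4, 3.0]        # hypothetical intensity/area values
production = fit_slope(times, conc)     # ~0.49 per frame
```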
26:22
And we compare that to one generation before and one generation after. We also compare it to the generation where CD16/32 is definitely there. And what you can see is that PU.1 production does not change at this time point where we think the decision has been made, right? And actually, if you look at the model, you would expect that, you know,
26:41
when this toggle switch tilts and PU.1 is winning over, then at that time point PU.1 production shoots up, right? So from looking at it we thought, well, that's not what you would expect from the model, but of course that's not enough to reject the model, right? However, we thought that what we can do to really reject the model is just
27:01
taking this toggle switch model, implementing it, and simulating it, right? With exactly the same number of trees we had in the experiment, and then checking what we get from it. So we simulate trees here with the parameters we inferred. And then, from all these trees, we look again at the time points where we think the decision happens. Actually, in that model we don't know
27:23
where the decision happens, because in this toggle switch it's a delicate balance between the two factors. At some point it tilts, right? But where exactly in the tilting process the decision happens, you don't know. So for those trees, too, we infer where we believe the decision happens. And then we look again at how PU.1 is developing there.
27:43
And for these simulations, you see that at the time point where our algorithm thinks the decision happens, there you see indeed a significant increase of PU.1 production, from a level around zero to a higher state. So in the model, when the balance tilts, PU.1 is highly produced. In the data, we don't see that.
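For intuition, a minimal deterministic version of such a mutual-inhibition toggle switch, where each factor represses the other, can be integrated in a few lines. The equations and parameters are a generic textbook sketch, not the model or parameters actually used in the talk:

```python
def toggle_switch(x0, y0, a=4.0, n=2, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = a/(1+y^n) - x, dy/dt = a/(1+x^n) - y.
    With these parameters the system is bistable: whichever factor
    starts ahead wins and ends up highly produced."""
    x, y = x0, y0
    for _ in range(steps):
        dx = a / (1.0 + y ** n) - x
        dy = a / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y
```

Starting with a slight PU.1-like advantage, e.g. `toggle_switch(1.0, 0.1)`, the first factor tilts the balance and settles at a high level, which is the sharp production increase that the simulated trees show and the measured data do not.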
28:01
And this leads us to the conclusion that the PU.1 dynamics we see in the data are inconsistent with the mutual-inhibition model of lineage choice. So now you could say, well, this is quite some effort for rejecting a model, but, as you know, that's all we can do, or that's the best we can do: we can reject models and say, no, this does not fit the data we have.
28:24
And basically that's what we did here. Okay, now the third part of this story. Are there questions to that, actually? Interrupt me whenever I should make myself clearer.
28:41
So this was the reconstruction of the decision time point. Now that we know that the decision happens before the marker comes up, the question is: can we predict the fate of a cell before this marker is actually visible? Because we know that the decision should happen four, five, six generations before, maybe we can predict what's happening.
29:02
And the idea is, well, what do we have? We have the morphology of the cells, and we have the speed of the cells, because we watch them in bright field at three-minute intervals, right? So again, we have to identify the cells in the bright-field channel, and then let's apply some machine learning, as the speaker before me did, because that might help us to predict what the cell will become.
29:24
I guess our problem is a bit easier than your clinical problem with treatments and final results. So I hope there are no glitches in there. Single-cell segmentation: that project was done by Felix and Flo. And I showed you before, I guess, single-cell segmentation in bright field.
29:40
We adapted it to that, and it works, so we can identify the single cells over time. And once we have that, we can cut them out, and we get these little patches of single cells. And since we have many of these cells, and we have a high time resolution, we have roughly two million of these patches. And now you can think, two million, that's a high number; let's train a machine learning model with it, that should work.
30:02
So, supervised machine learning. Before I come to that: you see now, you can order them along the track points, right? So you can follow the single cells in bright field. And you see that they are a little bit smallish at the beginning and that they grow a bit, and if you look at the cell size, you see that the cells grow, divide, grow, divide, grow, divide.
30:21
Again and again, and at some point they turn on the CD16/32 marker, and you see that this marker comes up. So from that time point on, we know that this cell is a GM, and we call these cells annotated. All the cells before, we know that they turn on the marker at some point, but they haven't yet, so we call them latent. Okay, supervised machine learning: well, we use a convolutional neural network for
30:44
that, and since this hasn't been introduced before, maybe I do that briefly. I mean, we all agree, I guess, that there are mathematical or mechanistic models and statistical models, and the CNNs are sitting on the statistical side, right? So, a very short introduction to supervised machine learning,
31:01
I hope I don't bore you. Supervised machine learning, at least for images, works in this way: you have different classes, you have a training set and a test set, you take an image, extract features from that image, and use those features for the classification, right? For example, you have this image of a flower here; you extract the petals,
31:21
and then you calculate petal width and height, for example, and look at different flowers; you see that they differ in these features. So if you look at the petal width of these different classes of flowers, you see that you can use that feature to discriminate these different classes. The classification performance, however,
31:41
is then highly dependent on the feature extraction. You have to be able to extract the features in the image, and you have to know which features to extract in order to make this classification work. Now, like seven years ago, the idea of convolutional neural networks disrupted the field of computer vision, actually by the idea of
32:02
making this feature extraction obsolete, or not making it obsolete, but integrating it into your learning process, right? These convolutional networks are based on the idea that the network itself learns the optimal features that you need in order to classify your problem, okay? So it's no longer you who segments the objects and calculates features, but
32:23
it's the network itself. And how is that done? These convolutional neural networks have a couple of layers: convolution layers, pooling layers, and so on. And the convolution layers are basically simple, right? You just take an image and multiply a part of that image with a filter. Here is my filter: 1 0 1, 0 1 0, 1 0 1. With that filter,
32:43
you move over the image, and you multiply everything with the image content, and for example here you get a four, right? Then you do it again; you move over the whole image with the filter, and you get a new image that is convolved with the filter you used. And of course, there are many filters: you can have an identity filter or
33:03
an edge-detection filter, or a sharpening filter, I mean, as in our iPhones, I guess, right? You can use those and run over the image. And you also have these non-linear rectifying units here, which basically help you to learn. And then you have this pooling step, where you say, I now look at the image
33:23
from a more coarse-grained perspective: in that part of the image, I only take the maximum, and also in the other parts, so I shrink my image size down. And if you now put all that together, you can say that the first part of these CNNs is the feature extraction. In here, the network learns which filters it has to use in order to
33:43
extract features from the image, and the second part of the network is the classification, right? So you take the extracted features, and at the end you classify the image. Right, so we use exactly that structure for our problem, where we have patches, and we have the annotation whether that patch will become either the one class or
34:03
the other, right, GMP or MEP. We put that in, and again you have this feature extraction part here with the convolution and max-pooling layers; here the network learns the optimal filters it needs to extract features that allow you to classify your patches accordingly.
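The convolution and max-pooling steps described above are short enough to write out by hand. This is a generic plain-Python sketch, not the network from the talk; the 3x3 filter is the 1 0 1 / 0 1 0 / 1 0 1 example from the slide, and the 5x5 binary image is made up:

```python
def convolve2d(image, kernel):
    """Valid-mode convolution: slide the kernel over the image and sum
    the element-wise products at every position."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def max_pool(image, size=2):
    """Non-overlapping pooling: keep only the maximum of each
    size-by-size block, shrinking the image."""
    return [[max(image[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(image[0]) - size + 1, size)]
            for i in range(0, len(image) - size + 1, size)]

kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]              # the filter from the slide
image = [[1, 1, 1, 0, 0],         # a made-up 5x5 binary image
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
feature_map = convolve2d(image, kernel)   # 3x3 map; top-left entry is 4
pooled = max_pool(feature_map)            # coarse-grained to the block maximum
```

In a real CNN the kernel entries are not fixed like this; they are the weights the network learns during training.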
34:20
So you take a training data set, and for each of these patches you get a patch lineage score, which tells you how much the network believes that this patch belongs to the one or the other class. Now we also add the displacement, that's just how much the cell moved in the time between the last image and the current one, so
34:42
the speed of the cell, basically, and all these patch features we now use. And since we have tracking and we have images every five minutes, you see that we have a lot of these patches for one cell, and we can just average over consecutive patches to get a cell-cycle average score. And while the single patch scores are very noisy, this averaged score for
35:04
one cell is relatively robust. So this allows us to put one score on a cell; in that case here, this cell gets a score of 0.7, and that means that our network believes that this cell is a GM cell and not a MegE cell.
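Turning many noisy patch scores into one robust per-cell score is just an average followed by a threshold. The score values and the 0.5 cutoff below are illustrative, not the talk's numbers:

```python
# Hypothetical per-patch lineage scores from the CNN for one cell cycle.
patch_scores = [0.9, 0.4, 0.8, 0.7, 0.6, 0.8]

# Cell-cycle average: one score per cell instead of one per patch.
cell_score = sum(patch_scores) / len(patch_scores)   # 0.7
lineage = "GM" if cell_score > 0.5 else "MegE"
```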
35:22
All right, that is all one approach. The other approach is, you can try to put that into a recurrent neural network and use the temporal information. We tried that as well, but it didn't give us much of an improvement. Now, let's see what we get from it. This is all our data, these are all the experiments. These are the same trees I showed you before in the one color, but
35:42
now also with the other color, so not just the one branch but also the other one in there. And we used two experiments for training and tested on the third experiment. Actually, these experiments are quite diverse. I mean, they have been taken in different years by different PhD students in the lab, so there are variations in the lighting and in how the cells are treated.
36:03
And to make sure that we don't overtrain our model here, we do it in a round-robin fashion: we do it three times, and we get these true-positive versus false-positive rates. We calculate the area under the curve as a measure for the goodness of a prediction. So in the end,
36:21
we train on the annotated cells, those that have the marker already, and then we test on the annotated cells in the third experiment. And you see that we get AUCs for these annotated cells mostly above 0.8, and that's quite okay. But the more interesting thing is that if we now take a cell that hasn't turned on the marker yet, these latent cells, and do the classification, we find that up to three generations before the marker comes up, we get similar AUCs.
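The area under the ROC curve used as the goodness measure here can be computed directly from labels and scores via its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative. A generic sketch, not the talk's evaluation code:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```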
36:44
So we are able to predict the cell fate up to three generations before this marker is visible. And since these blood cells divide roughly every 12 hours, we can go 36 hours before the conventional marker has come up to predict what the cells will be.
37:01
Now, what can we do with it? Well, one idea would be not to do an experiment where you put the cells in culture, watch them differentiate, track them, and then come up with the trees, but to stop the experiment in between, when they haven't turned the marker on yet, and then do maybe a single-cell RNA-Seq experiment, where you profile the expression of each of these single cells.
37:22
And then use our prediction to separate the cells into two classes, and figure out which factors actually are differentially regulated at this early time point in the experiment, right? And that might tell you which factors drive the decision to go to the one or the other lineage, so which other factors are involved in this differentiation process.
37:41
Right, so Fabian Theis, who was initially invited here at the meeting, actually my boss, would give a talk about single-cell RNA-Seq, because that's what he's doing all the time. You can imagine that if you do an experiment where you measure the mRNA expression in a bulk population of cells,
38:00
millions of cells, right, you get 20,000 data points, basically, which is the expression of each of the genes we have. If you do it now in 10,000 single cells, you get a matrix of 10,000 by 20,000. And these are the matrices we are faced with at the moment, right, to make sense of. It's a high-dimensional problem; it's about a couple of gigabytes in the raw data.
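A quick back-of-the-envelope check of that matrix size, assuming the values are stored as dense 32-bit floats (my assumption, not stated in the talk):

```python
# 10,000 cells x 20,000 genes, 4 bytes per value.
cells, genes, bytes_per_value = 10_000, 20_000, 4
size_gb = cells * genes * bytes_per_value / 1e9   # 0.8 GB for the dense matrix
```

The raw sequencing data behind such a matrix is considerably larger, which fits the "couple of gigabytes" mentioned.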
38:24
And Fabian and other people are developing methods to make sense of these large data sets: clustering the cells, finding trajectories in development processes. That's basically what he is doing. Right, so the last part is the clinical application of what I just showed you.
38:43
And the idea here came when working together with clinicians in Munich, who actually get blood samples of patients suffering from acute myeloid leukemia. Diagnosing leukemia is still based on morphological assessment by people who are trained to do so. So if you don't feel well, your doctor will draw some blood, and
39:03
then this is smeared out on a glass slide, and cytologists look at the glass slide under the microscope and count cells in there. So if we're able to recognize single cells, it might be a nice application for our algorithm to do that in a clinical setting, right? These are the blood smears that are assessed by cytologists.
39:23
And we took 100 of those blood smears from AML patients, and then smears from individuals who do not suffer from AML, and we digitized these samples in a scanner, basically. These here are the red blood cells, and we are more interested in the leukocytes.
39:41
So the cytologists go through these leukocytes, count them, and classify them into roughly 18 different cell types. And that's exactly what we did: to get a training set, we asked the cytologists, please mark 100 of these cells in each patient and classify them into the 18 different classes. And then the idea was to again train a CNN with it and
40:00
see how well it performs there. So after scanning everything and after the cytologists did the annotation, which is not that easy because they are busy with clinical routine, of course, we ended up with 18,000 of these annotated leukocytes. So actually, sorry, this is a scheme of these 18 different classes.
40:24
So these 18 different classes that we asked the cytologists to put the cells into, you can arrange in this kind of hierarchy. Here are the good cells, so to say: the mature cells here, the immature cells here. Some of the immature cells are allowed to show up in your blood, but
40:41
these blasts here shouldn't show up in your blood. If you recognize blast cells in your blood, then that's a clear sign of acute myeloid leukemia. And now the task was to identify these blasts among the other cells, right? That's what we ask our classifier to do. And if we train the classifier with these cells and then figure out how well we
41:00
can do that, you see that our sensitivity and specificity are pretty high: 99.2 area under the curve. But that's just a number; it doesn't tell you much. In order to get a better feeling for what that means, we asked a second cytologist to do exactly the same thing our computer did: we showed the cytologist the same images of the test set that the computer got, to do the same classification into the 18
41:22
different classes, and you see that the human cytologist performed roughly as well as our approach, which is, I guess, a good indication that this can be used in further applications in the clinic. One last thing you should do when applying these CNNs: you should check whether you learn real information or just crap.
41:44
And one of the ways to do that is using saliency maps, which tell you which of the pixels in the image are actually important for the final classification. We did that here, and you see that for the monocytes, the computer considers those pixels inside the cell, where the nucleus is, as important, similar to
42:01
what the cytologists think is important there. Same for the other classes. Okay, I showed you that image computing can distill numbers out of these images, right? I showed you that the PU.1 dynamics are inconsistent with the toggle switch model, and that the morphodynamics of single cells are predictive of lineage choice.
42:22
Well, I skipped that. I want to thank the people involved, basically the ones who did the work, the collaborators, my group at the moment, and the funding. Thank you very much for your attention. Thank you very much for this presentation, so there's time, one minute, for questions.