Detecting and Solving Mammalian Phenotypes in Real Time


Formal Metadata

Title
Detecting and Solving Mammalian Phenotypes in Real Time
Number of Parts
340
License
CC Attribution - NonCommercial - NoDerivatives 4.0 International:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Even when empowered by massively parallel sequencing technologies, genome-wide association studies usually fail to identify most causes of human phenotypic variation. Genetic mapping is the critical element lacking in such studies. However, mapping is readily undertaken in mice, in which mutations are induced in the germline on a known genetic background. Once a slow process, mapping and data assessment are now fully automated by high-speed statistical computation. When a phenotype is detected within a pedigree, the mutational cause is known simultaneously. Mapping can unambiguously identify genes with non-redundant function in immunity, behavior, metabolism, development, and most other phenomena of interest. The number of genes damaged and phenotypically tested in the homozygous mutant state can be monitored to measure saturation. Within a relatively short time, we have been able to survey 36% of all genes for their effects on immunity. In an array of immunological screens, we have sought to detect genes necessary for the development and function of both innate and adaptive immune responses. To date, 721 mutations in 483 genes have scored in these screens, and 151 mutations in 108 genes have been validated by targeting. Approximately 40% of the genes discovered were deemed novel. By screening to more modest depth (~8% saturation) we have sought to identify the key pathways defective in IgE-mediated allergic disease. Here 19 mutations in 18 genes have been found to either increase or decrease IgE responses, and all but one of these genes are associated with class switch recombination or with B cell receptor signaling. Many of the mutations causing allergy in mice operate in a semi-dominant fashion, suggesting heterozygous lesions in the same pathways may be common causes of allergy in humans. Other screens have revealed new causes of obesity, type 2 diabetes, lipodystrophy, cancer susceptibility, disordered bone formation, hypertension, and renal disease. 
Even the receptor for a chemical agent causing an innate fear response was discovered through this automated approach. Future emphasis will be placed on the suppression of disease phenotypes. This will identify modifier loci, which will stand as candidates for drug development.
Transcript: English (auto-generated)
Well, thank you very much. It was quite some years ago now that I became fascinated by a certain strain of mouse called
C3H/HeJ, which couldn't respond to lipopolysaccharide. This was the result of a spontaneous mutation that became fixed in this substrain of mice in the early 1960s, and I felt it would be important to find what mutation
had affected these mice; in that way we might understand the innate immune sensing mechanism for at least one molecule and, by implication, maybe others. And indeed, finding the mutated molecule in the C3H/HeJ mouse did reveal a new family of innate immune receptors in mammals.
But it wasn't easy. It took about five years to go from phenotype to the causative mutation. We went through the usual steps: first genetic mapping, as depicted up here; then, when we had confined the mutation to a small region of the chromosome, a physical
mapping phase in which we cloned all of the DNA across that interval; then we had to search for candidate genes and finally examine every one of them until we found a mutation. The interval we worked on was about 5.8 megabases. We thought at the time it was smaller, but it turned out to be simply enormous.
Nonetheless, it was a thrill to finally find that mutation, and in the year 2000, hoping to solve other immunological questions, we began to be proactive and to go out and create new phenotypes in mice using a germline mutagen, N-ethyl-N-nitrosourea (ENU).
This mutagen can be injected into male mice, where it affects their spermatogonia. The mutations are then bred from the G1 founder, which is heterozygous for them, to make daughters, which are then backcrossed to the founder, and in the G3 generation
there can be homozygosity for mutations that affect immunity or any other biological process. Here I simply show you a number of visible phenotypes, all of which we eventually solved. Over the next 11 years, while I was at the Scripps Research Institute, we solved
about 100 phenotypes with old-fashioned positional cloning methods that were gradually empowered by better and better sequencing technology, but the process was still very slow. The rote procedure we followed on finding a phenotype was first to generate a homozygous stock,
then to outcross the mice to another strain carrying molecular markers that would let us map and confine the phenotype to a certain part of the chromosome; then you'd backcross to the mutant stock and genotype and phenotype the F2 animals to establish a critical region. Finally, you'd sequence one homozygote to identify, hopefully, a single mutation within
the critical region, which one could almost always demonstrate was causative. But this could still take years, and only a few phenotypes could be tackled simultaneously, even in a relatively large lab.
The situation has changed over the last four years, at least in our laboratory, and now when a phenotype is detected, we usually know its mutational cause unambiguously at the very same time. Some of you may not know how this is done, so I'll tell you quite briefly. As ever, we mutagenize the mice and make G1 animals just as before,
but these days we whole-exome sequence every G1 mouse up front, and that way we find all of the induced mutations that can be transmitted to the G2 and G3 generations. In the old days, we didn't really know how much damage we were doing to the genome.
Nowadays, having done many whole-exome sequencing jobs, we know that the mean number of coding/splicing changes is 60, although there's a fair amount of variance. If a G1 mouse carries 30 or more such mutations, we deem that mouse
worthy of breeding and of examining its pedigree for phenotypes, and at the same time we order a set of primers that let us genotype all of its descendants to learn their zygosity at every mutation site we've induced.
These days, we make quite large pedigrees, with 12 to 15 G2 daughters and about 50 G3 mice, and all of these mice are genotyped by Ion Torrent sequencing on a simple benchtop device, so that we cover every mutation site in every mouse in the pedigree.
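The breeding arithmetic behind this design can be made explicit. The sketch below is a minimal illustration, assuming Mendelian segregation, no selection against mutant genotypes, G2 daughters born to the heterozygous G1 male and a wild-type female, and G3 pups split evenly among the G2 dams (a hypothetical simplification; real litters vary). Under those assumptions a G3 mouse is homozygous for any given G1 mutation with probability 1/2 x 1/4 = 1/8. The function names are hypothetical and are not part of the lab's software.

```python
from fractions import Fraction
from math import comb


def g3_genotype_probs():
    """Genotype probabilities for one G3 mouse at a site where the G1 sire is
    heterozygous, assuming Mendelian segregation and no selection."""
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    # The G2 dam (G1 het x wild-type) is heterozygous with probability 1/2.
    # Het dam x het G1 sire: 1/4 VAR, 1/2 HET, 1/4 REF.
    # REF dam x het G1 sire: 1/2 HET, 1/2 REF.
    return {
        "VAR": half * quarter,
        "HET": half * half + half * half,
        "REF": half * quarter + half * half,
    }


def prob_at_least_k_homozygotes(n_daughters, pups_per_daughter, k=2):
    """Exact P(at least k homozygous G3 mice) for one mutation in one pedigree.

    The number of heterozygous G2 dams is Binomial(n_daughters, 1/2); only
    their pups can be homozygous, each independently with probability 1/4.
    """
    total = Fraction(0)
    for het_dams in range(n_daughters + 1):
        p_dams = Fraction(comb(n_daughters, het_dams), 2 ** n_daughters)
        n = het_dams * pups_per_daughter  # G3 mice born to heterozygous dams
        # P(fewer than k homozygotes among those n pups)
        p_fewer = sum(
            comb(n, i) * Fraction(1, 4) ** i * Fraction(3, 4) ** (n - i)
            for i in range(min(k, n + 1))
        )
        total += p_dams * (1 - p_fewer)
    return total


if __name__ == "__main__":
    print(g3_genotype_probs()["VAR"])  # 1/8
    # Hypothetical pedigree: 12 G2 daughters x 4 G3 pups each = 48 G3 mice.
    print(float(prob_at_least_k_homozygotes(12, 4)))
```

With 48 G3 mice the expected number of homozygotes per mutation is 48 x 1/8 = 6, but pups of the same dam share her genotype, so the count is overdispersed; that is why the sketch sums over dam genotypes rather than using a single binomial.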
The data can be displayed this way, with the mice arrayed along the top axis and all of the different mutation sites along the vertical axis, covering the entire pedigree. It can be even more extensive than shown.
Once that is done, the mice are released into a screening pipeline whole pedigrees at a time, and these days every mouse is examined, on average, in 86 different phenotypic tests. These tend to center on immune phenomena, but some of them are extra-immune,
involving, for example, neurobehavioral function, blood pressure, heart rate, bone density, and the like. As soon as a phenotype is registered, because the computer already has the genotypic information, a calculation is immediately triggered to identify
the causative mutation among the 60 or so that exist in the pedigree. As of yesterday, 144,608 coding/splicing mutations had been fed into that pipeline, and all of those mutations were examined, most of them
many times, in the homozygous state. The mutations were parceled out among 85,200 G3 mice from 2,645 pedigrees, and a total of more than 12 million tests of the null hypothesis
that a given mutation has nothing to do with a given phenotype were performed. Usually that hypothesis is not rejected, but occasionally it is, and that tells you there's a strong putative causal relationship between the mutation and the phenotype. I'll show you an example of
that. So far this has happened to our satisfaction 1,415 times, and we have named these alleles, which fall into more than a thousand genes. Most of them cause immunologic phenotypes, and about 40% of the immunologic phenotypes that we observe are new.
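The talk does not say which statistical test the pipeline uses, so the sketch below swaps in a one-sided Fisher exact test under a recessive model for a binary (affected/unaffected) phenotype, with a Bonferroni threshold of alpha divided by the number of mutation sites tested in the pedigree (about 60 here). The genotype labels "REF", "HET", "VAR" and the function names are hypothetical; the point is only to show what one test of the null hypothesis "this mutation has nothing to do with this phenotype" looks like when repeated over every site in a pedigree.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), for the 2x2 table
    [[a, b], [c, d]]: rows are homozygous / not homozygous, columns are
    affected / unaffected. X is hypergeometric under the null hypothesis."""
    hom = a + b
    affected = a + c
    n = a + b + c + d
    tail = sum(
        comb(hom, x) * comb(n - hom, affected - x)
        for x in range(a, min(hom, affected) + 1)
    )
    return tail / comb(n, affected)


def scan_pedigree(genotypes, affected, alpha=0.05):
    """Test every mutation site in a pedigree under a recessive model.

    genotypes: {site: {mouse: "REF" | "HET" | "VAR"}}
    affected:  {mouse: bool}
    Returns (site, p) pairs below the Bonferroni threshold alpha / n_sites,
    most significant first.
    """
    threshold = alpha / len(genotypes)
    hits = []
    for site, calls in genotypes.items():
        a = b = c = d = 0
        for mouse, call in calls.items():
            if mouse not in affected:
                continue  # genotyped but not phenotyped
            hom, aff = call == "VAR", affected[mouse]
            if hom and aff:
                a += 1
            elif hom:
                b += 1
            elif aff:
                c += 1
            else:
                d += 1
        p = fisher_one_sided(a, b, c, d)
        if p < threshold:
            hits.append((site, p))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Invented pedigree: 40 G3 mice; m0-m5 are affected (e.g. light-colored).
    mice = [f"m{i}" for i in range(40)]
    affected = {m: i < 6 for i, m in enumerate(mice)}
    genotypes = {
        # Affected mice are exactly the homozygotes at this site.
        "site_causal": {m: "VAR" if i < 6 else "HET" if i < 26 else "REF"
                        for i, m in enumerate(mice)},
        # Homozygotes here are all unaffected.
        "site_neutral": {m: "VAR" if 20 <= i < 26 else "REF"
                         for i, m in enumerate(mice)},
    }
    print(scan_pedigree(genotypes, affected))  # only site_causal passes
```

A linkage plot like the one described next would show each site's -log10(p) against its genomic position, with the Bonferroni threshold drawn as a horizontal line.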
About 1% of all the mutations we've produced are detected in one or more of those 86 screens, and on average each mutation that scores registers in over seven phenotypic screens, because very often a mutation will disrupt multiple processes in the mouse.
We've also developed a way to infer saturation very accurately as we go along. This was a big question for us, and eventually we solved it: we can say with confidence that we've severely damaged or destroyed 35.8% of all genes in the genome while keeping the mutations under
surveillance and examining the mutant alleles twice or more in the homozygous state, which usually is adequate for mapping purposes. Let me give you an example of how the mapping works. We once found a light-colored mouse, several of them actually, in a certain pedigree,
and we also found an immunological phenotype in that pedigree. You can very clearly see the phenovariant; the wild-type strain is black. If we map using the visible phenotype as our phenotype of interest, with a simple binary assessment, affected or not affected,
then you can see clearly, if we plot all the mutations in the pedigree by their position in the genome, with p-values on a log scale, that only one of them passes the Bonferroni correction threshold and seems to be tightly associated with the mutant phenotype,
and we would judge that one probably to be causative. We see the same thing if we look at any of the 15 different immune phenotypes for which the pedigree was also scored. All of those phenotypes, including the one I'm showing you here, the T-independent B-cell response,
mapped to the very same mutation, so this mutation is causing hypopigmentation and a lot of different immune disturbances. If you were to mouse over that particular mutation, you would immediately see what the gene is. It's Ncstn, encoding nicastrin,
which is known to be an essential gene, and if you were to click on the point, you would find that the computer had already assembled quite a bit of information about this gene. We know the likelihood of damage according to PolyPhen-2. We see where the mutation is. We can click on the gene model and see where the mutation falls,
in which exon; in this case, it is a missense error. If it had caused a splicing problem, it would show you what that was. If you right-click on the mutation, you see the phenotypic performance of that mutation in the T-independent B-cell response screen.
Here are the homozygotes, these are the heterozygotes, and these are reference-allele mice from the same pedigree. These are other controls, wild-type C57BL/6 mice. It's this weakly semi-dominant relationship that gives the strong score in the linkage plot.
Of course, we tend to hit the same genes over and over again as we progress. There are only 24,977 annotated genes in the mouse genome, and we've made 145,000 mutations,
more or less. This gene has been hit in all of these different pedigrees, and when that occurs, the computer knows it, of course, and it sums all of the data from the different pedigrees into what we call a super pedigree. Here, again, this gene scores strongly if we're still looking at the T-independent B-cell response. If you click on the super pedigree
point, and you can adjust the display to scale the size of each point by the likelihood of damage, you find again that several alleles cause this effect of a low T-independent B-cell response, and you can have added confidence that this really is causative of an immune disturbance.
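The super pedigree amounts to grouping observations by gene rather than by pedigree. The sketch below is a minimal version under stated assumptions: each record is a hypothetical dict holding a pedigree ID, the gene hit, and (genotype, phenotype) pairs, and the pooled observations would then go to whatever association test is in use. The numbers and the second gene name are invented; only Ncstn comes from the talk.

```python
from collections import defaultdict
from statistics import mean


def build_super_pedigree(pedigrees):
    """Pool (genotype, phenotype) observations for every allele of a gene,
    across all pedigrees in which that gene was hit."""
    pooled = defaultdict(list)
    for ped in pedigrees:
        pooled[ped["gene"]].extend(ped["observations"])
    return dict(pooled)


def mean_by_genotype(observations):
    """Mean phenotype per genotype class (REF / HET / VAR)."""
    groups = defaultdict(list)
    for call, value in observations:
        groups[call].append(value)
    return {call: mean(values) for call, values in groups.items()}


if __name__ == "__main__":
    pedigrees = [  # invented data for illustration
        {"pedigree": "P1", "gene": "Ncstn",
         "observations": [("VAR", 20.0), ("HET", 60.0), ("REF", 80.0)]},
        {"pedigree": "P2", "gene": "Ncstn",
         "observations": [("VAR", 30.0), ("REF", 90.0)]},
        {"pedigree": "P2", "gene": "GeneX",  # hypothetical second gene
         "observations": [("VAR", 85.0)]},
    ]
    super_pedigree = build_super_pedigree(pedigrees)
    print(mean_by_genotype(super_pedigree["Ncstn"]))
    # {'VAR': 25.0, 'HET': 60.0, 'REF': 85.0}
```

Pooling treats all alleles of a gene as equivalent; an allele with a low predicted likelihood of damage could dilute the signal, which is one reason to weight or inspect points by predicted damage, as described above.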
We never fully trust any mutation until we've recreated it or knocked out the gene with CRISPR-Cas9 targeting, and that's routinely done with anything novel that interests us. In this case, when we did the targeting, we found exactly the same phenotype, and not only in the T-independent B-cell response screen: these were yellow mice
of slightly diminished size, and they had all the other immune phenotypes as well. Of course, we're looking at thousands and thousands of candidate phenotypes, and we like this technique most because it's unbiased, but bias can creep in because those
who are doing the screening are certainly intrigued by associations that may be rather small in effect size or marginal in p-value, but potentially of huge biological interest if they're true. Because we always verify by CRISPR-Cas9 targeting, we don't make mistakes with this process,
but it does tie up that part of the pipeline, and that's now the rate-limiting step for us. We needed an unbiased method to choose candidates for verification, rather than merely hoping that this or that relationship would be correct, so we developed a machine
learning tool called Candidate Explorer to do just that. Candidate Explorer uses approximately 30 different features and is trained to identify true causative mutations based on CRISPR-Cas9 results. So far, we've completed about 400 jobs
in which we targeted the germline equivalent of the ENU allele and determined whether it was true or not. If one sets the cutoff at a rating of good candidate or above,
then one has a precision of about 92% and a recall of about 90%. This is the sweet spot that we tend to use. We usually don't investigate candidates rated below good by Candidate Explorer, which means that something like 90% of our picks are correct.
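Precision and recall have their usual meanings here: of the candidates picked at a given rating cutoff, the fraction that CRISPR-Cas9 targeting validated (precision), and of all validated candidates, the fraction the cutoff would have picked (recall). The sketch below computes both for an ordinal rating scale. Only the labels "potential" and "good" come from the talk; "poor", "excellent", and the toy outcomes are hypothetical.

```python
# Hypothetical ordinal scale, lowest to highest; only "potential" and "good"
# are named in the talk.
RATINGS = ["poor", "potential", "good", "excellent"]


def precision_recall(outcomes, cutoff="good"):
    """outcomes: iterable of (rating, validated) pairs, where validated is the
    CRISPR-Cas9 result. Candidates rated at or above `cutoff` count as picked."""
    min_rank = RATINGS.index(cutoff)
    tp = fp = fn = 0
    for rating, validated in outcomes:
        picked = RATINGS.index(rating) >= min_rank
        if picked and validated:
            tp += 1
        elif picked:
            fp += 1
        elif validated:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    toy = [("excellent", True), ("good", True), ("good", False),
           ("potential", True), ("poor", False)]
    print(precision_recall(toy, "good"))       # picks 3, of which 2 validated
    print(precision_recall(toy, "potential"))  # picks 4, of which 3 validated
```

Lowering the cutoff from good to potential trades precision for recall, which is the kind of relaxation the papain screen later in the talk accepts for a broad survey.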
If a candidate is rated below good, we usually don't pursue it; we wait for another allele to strike the gene and tell us that indeed we have something, or it may change our impression for the worse. Now, I want to tell you how we have used this process to look at something of quite broad interest. We have one screen here denoted papain. This is a screen
for allergic or atopic behavior in mice. Approximately half of all people suffer from allergies. Many of you in the audience probably suffer from allergies and know exactly what they are, but for those who don't, I'll say that allergies range from being a mere
nuisance, like a stuffy nose around the family cat, to being a truly life-threatening problem. Many allergens are proteases; this is true, for example, of the house dust mite allergen, of most pollen allergens, and in other circumstances also.
Interestingly, where allergens, as opposed to most antigens, are concerned, an innate immune response is unnecessary to generate the adaptive immune response, which is overproduction of IgE. It's exaggerated IgE production and subsequent
degranulation of mast cells that create the symptoms of allergy. The general tendency to develop allergic responses is called atopy, and really that's what our screen is about: we are measuring a tendency to develop an allergic response. Atopy is known to have a genetic
component, as has been known for many years; allergies do tend to run in families. However, the loci that are affected have been unknown, at least for the most part. There's no question that it's genetic, and yet obviously you also need the environmental
condition of the agent being present. So what we set out to do was to create an atopic disease in mice. We wondered: can we identify mutations that augment the IgE
response to papain, a cysteine protease that is very allergenic in humans and, we thought, potentially allergenic in mice? The screen for IgE responses to papain was carried out not on all of the mice that
I showed you, but on a total of 16,115 G3 mice from 702 pedigrees, and those pedigrees incorporated a total of 38,585 coding/splicing changes that fell into more than 14,600
genes. In this case, we were looking very broadly and with light coverage, and we had relaxed standards about just what we should pick, which isn't the usual case, but we were doing a broad survey and wanted to see everything that might be a candidate. This is simply to show you that
giving papain does indeed elicit an IgE response, and it does so in a dose-dependent manner; the dose we finally chose to work with was one half milligram per mouse, injected intraperitoneally. You also see that these mice develop a mild eosinophilia,
and these are some of the groups of pedigrees that were screened. You see in each case a comparison between wild-type C57BL/6 mice and the mutant G3 mice in the pedigree, and of course we're interested in the points that depart from wild type in the upward
direction, and of course those points don't mean much unless you run them through the computer and find that they're ascribable to a single mutant locus. I also want to show you that the well-known innate immune sensing pathways, when mutated, really have no effect
on the development of an IgE response: no significant effect with a MyD88 knockout, where you have no MyD88-dependent TLR signaling, and none with loss of signaling by the RIG-I-like helicases or by cGAS, a double-stranded DNA sensor. No problem if you knock out NLRP3,
the main inflammasome sensor, and no problem with a CARD9 mutation, and that applies also to the development of an IgG1 response, which is a marker of atopic-type behavior. When we did the screen, we began to notice that there were some mutations,
like a mutation in STAT6 that we called stationary and a mutation in IκB kinase β that we called impaired, which in the homozygous state, and to some degree in the heterozygous state too, would prevent IgE responses to papain. These mapped as shown here.
On the other hand, we did find mutations causing exaggerated IgE responses, in B lymphoid kinase, for example, and in spleen tyrosine kinase, where we found a strong heterozygote effect. These were all recreated by CRISPR-Cas9 targeting and found to be correct. You'll see this mutation later.
It actually didn't quite score using the screening parameters we set, but we went back and tested it anyway after recreating the mutation. In this screen, we used a rather weak p-value cutoff of 0.05. We insisted on only one
homozygous variant mouse and a total of only 10 G3 mice per pedigree, and on Candidate Explorer ratings of potential or better, rather than good or better. In all, there were 14 low-IgE-response candidates. Those fell in STAT6,
folliculin, IκB kinase β, the IgE heavy chain, and 10 other undeclared candidates. On the other hand, we had high-IgE-response candidates in Itch, in a gene that I won't mention today because my postdoc forbids me to,
in a calcineurin gene, the ITK gene, and all these others shown, as well as 92 undeclared candidates; 103 mutations stood as candidates in all. We went through them, and we were quite selective about what we looked at. We didn't look at all of these other candidates,
but the ones that fell in the known signaling pathways we did look at closely. Those shown in green validated by CRISPR-Cas9 targeting and those in yellow did not, so we don't need to look at those anymore. These we regard as really true, and we can say that there were
13 candidates in all for low IgE and 100 for the high IgE response. Two additional candidates were identified by looking at strictly dominant effects. These were homozygous-lethal mutations in Syk and PLCG2, so in all, where the high response is concerned, we have
confirmed mutations in 10 genes. Nine of those genes encode proteins in pathways that control either class switch recombination or B cell receptor signaling, and quite a lot
is already known about these processes. We know, first of all, that T cells have to differentiate from the Th0 to the Th2 phenotype, or into T follicular helper cells, in order to stimulate B cells in such a way that class switching is biased toward IgE, and this involves
signaling by CD40 ligand and by the IL-4 receptor. We had mutations affecting STAT6, which interacts with JAK3 downstream of the IL-4 receptor; IκB kinase β (impaired), which would sit downstream of CD40; and the IgE heavy chain, which is needed for IgE to be produced. We introduced other
mutations, or looked at other ENU mutations in our collection that hadn't been screened, and found, as one might predict, that ablating all T cells with an LCK mutation prevents IgE production. A JAK3 mutation does so, and mutations in core components of NF-κB signaling
also do so. On the high-IgE side, we had all of these genes with the green up arrows. Several genes appeared to function to limit conversion to the Th2 or T follicular helper phenotype. We also had mutations downstream
of the B cell receptor, the IgE receptor, that would interfere with signaling, and in doing so, there's reason to think that they interfere with the death of IgE-expressing B
cells at some stage, and that may lead to the expansion of IgE clones, with the result that you have an atopic mouse. Remember, BLANCO didn't quite reach our cutoff criteria, but we decided to target it anyway, and it was included as well. Not everything matters,
however: a mutation in PKC beta, which we thought might interfere, had no effect at all on IgE levels. Of particular importance, a lot of these mutations are dominant or semi-dominant, and I've marked each of those with a D. That's important because one might imagine
that any gene in our genomes can sustain severe damage from a heterozygous mutation, which is not so readily policed away, and one can measure the incidence of such mutations by referring to a large database. All in all, mutations increasing IgE responses were confirmed in 11 genes in pathways that
influence class switch recombination and BCR signaling, and in one gene unrelated to those pathways, which I don't think I mentioned clearly: LC3B, a gene that is a marker of autophagy but isn't really required for autophagy to occur.
The mutations that fall in the class switch recombination pathways appear to increase IgE by affecting proteins that normally limit T follicular helper or Th2 differentiation. Those in the BCR pathways seem to cause enhanced survival of IgE-expressing
B cells. High-IgE alleles of seven genes with semi-dominant or dominant effects were identified, and six of those were found in primary screening. Remember, we've damaged
14.4 percent of all genes in the genome, so, scaling up from those six, we might expect about 42 genes (6 / 0.144) in all within these pathways. If we take the human orthologs of the six genes we know of that have semi-dominant effects, together with 36 other genes with average rates of heterozygous
mutation, we can make a loose estimate that something like 55 percent of all people should carry a damaging or destructive allele in at least one of that pool of target atopy genes. But we're not very hard and fast about this, and there are a lot of caveats.
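The arithmetic behind these two numbers can be sketched as follows. The first step is stated in the talk: six semi-dominant genes found at 14.4% saturation extrapolate to 6 / 0.144, or about 42 genes. For the second step, the chance that a person carries at least one damaging allele among those genes, the sketch uses 1 - prod(1 - q_i), assuming genes are independent. The per-gene carrier frequencies q_i behind the 55% figure are not given, so the code back-calculates the uniform frequency that would reproduce it, purely for illustration; all function names are hypothetical.

```python
import math


def extrapolate_gene_count(genes_found, saturation):
    """Scale the number of genes found by the fraction of genes screened."""
    return genes_found / saturation


def prob_carrier_any(carrier_freqs):
    """P(at least one damaging allele), assuming genes are independent."""
    p_none = math.prod(1 - q for q in carrier_freqs)
    return 1 - p_none


def uniform_freq_for(target, n_genes):
    """Per-gene carrier frequency q such that 1 - (1 - q)**n_genes == target."""
    return 1 - (1 - target) ** (1 / n_genes)


if __name__ == "__main__":
    n_genes = round(extrapolate_gene_count(6, 0.144))  # 41.67 -> 42
    q = uniform_freq_for(0.55, n_genes)
    print(n_genes, round(q, 4), round(prob_carrier_any([q] * n_genes), 2))
```

Independence between genes is a strong assumption, and the talk itself presents the 55% figure as a loose estimate with many caveats.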
First of all, we've looked only at the priming stage of atopy; we didn't look at multiple exposures to the allergen in this case. There are lots of other genes that remain as candidates, and frankly, we were very biased when we saw that there were many genes in the
B cell signaling and class switch recombination pathways that stood as candidates, so eventually we must go back and look at the other 92 candidates in the collection. If we did that, almost certainly we'd find that everybody has more than one deleterious mutation encouraging atopy, in which case maybe it's the conjunction of several such mutations that's important.
Again, the one unrelated mutation that we confirmed operates by a mechanism that we still don't understand, and we're only at the beginning of saturation. Nonetheless, what I hope I've been
able to show you is that we could do comparatively light mutagenesis and still find a lot of mutations that bear on this phenotype, the IgE response to a classical allergen. I want you to keep in mind also that these mutations don't compromise survival, and they don't
cause immunodeficiency in the heterozygous state either; very likely in humans, as in mice, they could cause an atopic phenotype. Also, this kind of precise and immediate assignment of cause and effect is only possible in mice. It's certainly not possible by genome-wide
association studies, and mice remain the best genetic model available for predicting how humans will react to particular mutations. This work was the product of quite a large group of people: some do the mutagenesis,
others do the screening, and still others do the informatic work needed to parse the effects of mutations. A CRISPR-Cas9 group is essential, and the sequencing and genotyping group is essential too. But the allergy work was really done by one student, a medical student named Jeff SoRelle, who spent a great deal of time looking at 16,000 mice and measuring their
IgE responses to papain, and then putting the data together more or less as I've shown you. Thanks very much for your attention.