
A Semi-synthetic Organism with an Expanded Genetic Alphabet


Formal Metadata

Title
A Semi-synthetic Organism with an Expanded Genetic Alphabet
Title of Series
Number of Parts
38
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Expansion of the genetic alphabet to include a third base pair not only has immediate utility for a number of applications, such as site-specific oligonucleotide labeling, but also serves as the foundation for an organism with an expanded genetic code. Toward this goal, we have examined a large number of different unnatural nucleotides bearing mainly hydrophobic nucleobase analogs that pair based on packing and hydrophobic interactions rather than H-bonding. Optimization based on extensive structure-activity relationship studies and two screens resulted in the identification of a class of unnatural base pairs that are well recognized by DNA and RNA polymerases. More recently, we have engineered E. coli to import the requisite unnatural triphosphates and shown that DNA containing the unnatural base pair is efficiently replicated within the cell, resulting in the first semi-synthetic organism that stores increased information in its genome.
Transcript: English (auto-generated)
So I'd like to thank the organizers for giving me the opportunity to come and talk at this meeting.
My lab is at the Scripps Research Institute, and my group breaks out into three projects. All of what we're interested in, from one perspective or another, is evolution and trying to understand or manipulate evolution, very much from a chemist's perspective. I'm trained as a chemist and my lab largely takes a chemical approach.
The three projects, very briefly: one, trying to use concepts of evolution to design or identify and optimize novel scaffolds for antibiotic development; two, a very rigorous chemical, physical approach to understanding the role of adaptive mutations in protein function; and three, what I'm going to talk about today, efforts to develop an unnatural base pair with which to expand the genetic alphabet and eventually the genetic code.
So, there it is. That's the genetic alphabet. This is the genetic alphabet of E. coli. And if you can't see it, let me blow it up for you. So I gave this talk in China two years ago and someone came up to me afterwards and said, you know, you have a lot of nerve showing the Japanese flag in China.
But of course, the genetic alphabet is just a long string of Gs, Cs, As and Ts. To a chemist, DNA is actually a remarkably unremarkable polymer. It's really very simple. What makes DNA, of course, utterly unique and remarkable is selective base pairing. G selectively pairs with C and A pairs with T. No other material has that property.
All the diversity of life, all the diversity around us, in fact, all the diversity since the last common ancestor of all life on earth is encoded in a four-letter, two-base pair genetic alphabet. And when my group started at Scripps about 15 years ago, we were very interested in asking the question as to whether or not that alphabet and eventually the genetic code could be expanded.
So we already heard from my good friend Ichiro about his take on this. We both started at about the same time and we've sort of walked along together. Becoming good friends with Ichiro has been one of the greatest things that this project has led to.
And I really like Ichiro's work. In particular, what he talked about was using his unnatural base pair, Ds-Px, to evolve aptamers. And to get to a question that was asked earlier, those aptamers bound better than did the normal aptamers comprised of only G, C, A, and T.
So they did impart novel function that wasn't available in the natural genetic alphabet. These experiments, to my knowledge, are the first example of unnatural biopolymers that were themselves evolved without the intermediacy of a natural biopolymer. So I think this was an absolute landmark study. And for me, it also provided the first real practical demonstration of the use of unnatural base pairs.
So that's Ichiro's in vitro work. What I'm going to talk to you about today, and what's been the driving goal in my lab, is to develop in vivo applications of an expanded genetic alphabet. And so if one's interested in that question, there's a variety of things you have to be concerned with. Here I'm showing this organism.
This happens to be an E. coli. It's gram-negative. Clearly there's a periplasm with two membranes there. And so there's a variety of things that you have to be concerned with. You have to have a DNA element with an unnatural nucleotide in it. You have to have available within the cell the requisite triphosphates of the unnatural nucleotides. You have to have a DNA polymerase that selectively synthesizes DNA containing that unnatural base pair, of course, with high efficiency and high fidelity during the replication process.
And then you have the same story with RNA. You have to have an RNA polymerase. You have to have available within the cell the triphosphate of the ribonucleotide. And you then have to drive transcription with that RNA polymerase. And then, presumably, that message can go out in the cell to the ribosomes and, with tRNAs that are transcribed with the corresponding cognate nucleotide to reconstitute the unnatural base pair during decoding, drive protein synthesis.
And the two things I do want to say at this point: I want to acknowledge the work of Steve Benner, who was really the first to talk about expanding the genetic alphabet in a practical sense and to develop analogs that were designed to be orthogonal.
And I also want to mention Peter Schultz's work on this part of the problem. What Pete did was develop orthogonal tRNA-synthetase pairs, recoding the amber codon and drawing on orthogonal pairs from M. jannaschii and M. mazei archaea that could orthogonally recognize the repurposed amber codon.
And I think that's absolutely beautiful work. In my personal opinion, I think it's a shame and I don't understand why he hasn't won the Nobel Prize for that work yet, because I think it's some of the earliest and best stuff of what we would call synthetic biology today. But I also want to mention from the outset that we planned on stealing Pete's tRNA synthetase pairs.
And instead of encoding them with amber codons, which allows you to incorporate one unnatural amino acid, we wanted to develop an unnatural base pair with which you could write a virtually unlimited number of new codons, hopefully for the incorporation of multiple different unnatural amino acids simultaneously into a protein, maybe in an evolvable context.
And I don't have time to go into this too much in terms of the sort of thing that one could do with that. Protein therapeutics: if you haven't noticed the revolution that protein therapeutics has caused in the therapeutic field, then you haven't been following anything. Because 70% of the INDs, the investigational new drugs, at the FDA two years ago were proteins.
There's an absolute change in how people are thinking about developing proteins, yet proteins have only 20 amino acids. Now we can argue for the rest of the meeting whether proteins need new amino acids for new functions. I would argue they do. If you look at small molecule therapeutics, things like electrophiles are the most common pharmacophore in a small molecule therapeutic, yet electrophiles do not exist in proteins at all.
Things like metal binding centers, things like redox centers: enzymes draw on that sort of activity by using cofactors wherever they can, but it's an entirely different challenge to evolve proteins that have cofactors as well.
So the idea of being able to evolve proteins with things like electrophilic centers for therapeutic applications is the long term goal that drives the effort in my lab. So there are two things I want to introduce about the approach that we took to the challenge.
And the first was that we wanted to try to draw upon a different force than hydrogen bonding. Ichiro also alluded to this earlier. And everyone's familiar with water and oil and the fact that they don't mix. The hydrophobic effect makes oil want to get out of water for a whole bunch of complicated reasons involving solvation, cavity size, and entropy.
And also packing, obviously, between the oil-like molecules. And I'm going to lump all of that under the rubric of hydrophobicity. And so all of our analogs were designed based on oil wanting to get out of water. So the idea was that two very lipophilic nucleobases would pair with each other during replication and not want to pair with the more water-like natural nucleobases.
And now if we're taking the H-bonds out, in the early days we were very concerned about just the stability of the DNA. So that's why the analogs were all sort of large: we were going to replace the cross-strand hydrogen bonding with within-strand packing interactions. So that's one thing I want to say. They're all large and hydrophobic and that's why.
Now the other thing I want to point out about them is there's a lot of them. So we did not want to take sort of what I view as sort of the traditional chemist's approach of very carefully designing an analog and then making it and testing it. I've always been very inspired by medicinal chemistry. About a third of my group works on medicinal chemistry.
And I wanted to approach the problem very much the way... is this yours? I think I just turned it off. I've always been very inspired by the medicinal chemistry approach. So we wanted to simulate that approach. What we wanted to do was not make one but make lots, and then develop assays to analyze them, and then use those assays to develop structure-activity relationships that feed back into the design effort and try to fuel that cycle to optimize analogs.
And so there's a lot of different analogs here. One that I'll talk about in a few seconds is this propynyl isocarbostyril guy.
But this collectively is what we refer to in the lab now as the first-generation set of analogs. So in terms of that SAR, one of the assays that we used was of course just a stability assay, where we synthesize the analogs as phosphoramidites, incorporate them into DNA, and measure the thermal stability, because the midpoint, where 50% of the duplexes are melted, dissociated into single strands, is in a rather complex way related to the thermodynamic stability of the base pair.
So we synthesize different base pairs, we put them in the same sequence context and we measure the effect on stability that way. The much more important SAR was driven by these assays, the kinetics. In the early days it was all steady-state kinetics, so we make a primer and a template, again using phosphoramidite chemistry, and incorporate specific nucleotides at a specific position.
The primer would typically run up right before it, and then we'd look at the ability of different DNA polymerases to extend that primer by incorporating the unnatural nucleoside triphosphate. We refer to that step as synthesis or incorporation because we're actually making the base pair, or we're incorporating the triphosphate.
The next step is unlike natural synthesis because you now have a primer that terminates with an unnatural nucleotide. So the next step we refer to as extension, and that's the step where you incorporate the next correct nucleotide off the now modified primer terminus. So these two steps we would evaluate independently.
So like I said, we synthesize a piece of DNA where the primer terminated here and then look for incorporation, and then we would separately synthesize a piece of DNA where the primer terminated at the unnatural nucleotide itself and look at the incorporation of the next nucleotide. So we would derive second-order rate constants and actually be able to analyze the differences between the different analogs.
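As a rough illustration of what a second-order rate constant from steady-state kinetics means, here is a minimal sketch assuming simple Michaelis-Menten behavior, where the apparent second-order rate constant is kcat/KM. The parameter values are hypothetical and are not taken from the talk.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Steady-state rate v = kcat*[E]*[S] / (KM + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def second_order_rate_constant(kcat, km):
    """Apparent second-order rate constant (catalytic efficiency), kcat/KM."""
    return kcat / km


# Hypothetical parameters for two triphosphates (units: 1/s and mol/L).
natural = {"kcat": 10.0, "km": 5e-6}
analog = {"kcat": 2.0, "km": 50e-6}

eff_natural = second_order_rate_constant(**natural)
eff_analog = second_order_rate_constant(**analog)

# Comparing efficiencies is what lets different analogs be ranked.
relative_efficiency = eff_analog / eff_natural
print(f"analog is {relative_efficiency:.3f}x as efficient as the natural control")
```

At triphosphate concentrations far below KM the rate reduces to (kcat/KM)[E][S], which is why this ratio behaves like a second-order rate constant and can be compared across analogs for the incorporation and extension steps separately.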
So by far the strongest SAR that came out of this first generation study was that large aromatic surface area very much facilitates the incorporation step but makes the extension step very challenging. We were actually able to optimize this but we were never able to optimize that. So in collaboration with Pete Schultz and Dave Wemmer at UC Berkeley, we solved the structure of one of those first generation analogs that I just described.
Now this analog, this PICS-PICS pair, we refer to as a self-pair, and I spent a lot of my time in the early days justifying the use of a self-pair. If you think that's weird, fine, today our best pairs are heteropairs, but just to give you a historical context, a lot of our early efforts were focused on self-pairs.
So this self-pair happens to represent the most canonical of those first generation analogs that were synthesized very well but then terminated extension. So if you look at the structure shown here, you can sort of see why. So they don't pair edge to edge.
Now maybe you look at these analogs and you expect that but again remember we didn't design these to pair with each other. These just came out of an empirical study of synthesizing lots of analogs and looking at all the possible pairs. And these self-pairs are what came out as being sort of the most promising. But they cross-strand intercalate and in our mind's eye we immediately believed that this explained the SAR that we observed.
So during the synthesis, the incorporation step, imagine this mode of binding also occurring during replication in the polymerase active site; maybe this is the template and this is the growing primer. When this triphosphate is incorporated, it picks up all this beautiful packing interaction and gets out of water, which makes it happy, and so that's what drives that very efficient incorporation step.
But to the extent that the template is more locked down in the polymerase active site, with more interactions with the protein, the majority of any distortion required to pinch down the duplex to allow that intercalation to happen is going to be borne by the primer terminus, which mispositions that hydroxyl group for the nucleophilic attack on the next incoming triphosphate, which is why the extension step was slow.
So with that SAR, we returned to our design strategy and asked the question: if the larger hydrophobic analogs were prone to prevent extension, could we design smaller analogs that would not be prone to intercalation, optimize their incorporation step, and then have a pair for which we might be able to simultaneously optimize both steps?
So we again returned to a very med chem, empirical approach where we synthesize lots of analogs. Of course I'm not showing the sugar and the phosphate, for those mathematicians in the audience that maybe didn't know that something else was attached there.
That was supposed to be a joke. In any event, so again we were systematically examining lots of different analogs. Sorry? Okay I'll try not to. You could maybe close your computer.
So again systematically examining lots of different analogs and systematically putting on different halogens, different fluorine substituents, different methyl substituents and again driving the program very much based on the empirical SAR. Now the SAR that came out of these second generation analogs is a little more complicated so let me spend a second to describe it.
The single position on the nucleobase that was by far the most important was the position ortho to the glycosidic linkage. So this is the glycosidic linkage whether it's a C-glycoside or an N-glycoside, this position was by far the most important.
For the insertion step, the synthesis step where you're inserting the triphosphate against its cognate base in the template, it doesn't matter whether you're looking at a nucleobase in the triphosphate or in the template, you want that substituent to be hydrophobic. Makes sense because we're trying to drive this packing interaction, this hydrophobic interaction in the first place. Now the extension step, when you're looking at the nucleobase in the template, you still want that ortho substituent to be hydrophobic.
But the problem is that when it's now in the primer terminus, you need it to be hydrophilic. And the reason, and we should have known this from the beginning, is that if you look at any of the natural nucleobases, they all have an H-bond acceptor at that same ortho position.
And if you look at structures of primer-templates bound to polymerases, the polymerases always donate a hydrogen bond to help orient that primer terminus. So you need to be able to accept that H-bond or you're going to force a desolvation. And so this seemed like a potential physical chemical contradiction. How could we simultaneously optimize both its hydrophobicity to pack and its ability to accept a hydrogen bond?
So at the time, I was fortunate because I had a very talented graduate student and postdoc who simultaneously ran two screens independently of each other. Two screens of 3,600 candidates each. And one screen, I'm not going to go into this too much for time, but one was just a gel-based screen where they looked only at the extension step.
Because that was the rate limiting step for most of our analogs. And the other one was a much more sophisticated plate-based screen where we took two plates that were identical, except one got the unnatural triphosphates in addition to the natural triphosphates, whereas the other only got the natural triphosphates. And we stained with SYBR Green, whose signal depends on the amount of double-stranded DNA present.
And so we looked for wells where there was strong signal, but in the sister well on the other plate, there was no signal. And that assay bakes in extension, incorporation, and fidelity all in one. So from those two independent screens of 3,600 candidate unnatural base pairs, we were pretty excited because both of them identified the same single unnatural base pair.
And that's shown here: what we call MMO2, which is the second of a methyl methoxy series that was a second generation analog, and this 5-methyl sulfur isocarbostyril analog, which was a member of the first generation analog series.
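The sister-well comparison in the plate-based screen can be sketched as a simple filter. This is only an illustration: the well IDs, fluorescence readings, and both thresholds below are hypothetical, and the actual hit criteria used in the screen are not given in the talk.

```python
def find_hits(with_unnatural, natural_only, signal_min, background_max):
    """Return wells that are bright on the plate that received the unnatural
    triphosphates and dark in the sister well that received only natural ones."""
    hits = []
    for well, signal in with_unnatural.items():
        sister = natural_only.get(well)
        if sister is None:
            continue  # no sister well to compare against
        if signal >= signal_min and sister <= background_max:
            hits.append(well)
    return sorted(hits)


# Hypothetical fluorescence readings (arbitrary units).
plate_plus = {"A1": 950, "A2": 120, "A3": 900}
plate_minus = {"A1": 80, "A2": 90, "A3": 870}

hits = find_hits(plate_plus, plate_minus, signal_min=500, background_max=150)
print(hits)  # ['A1']
```

Well A3 is bright on both plates, meaning full-length product forms even without the unnatural triphosphate (a fidelity failure), so only A1 passes. That is the sense in which one readout combines incorporation, extension, and fidelity.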
Now notice the nature of the ortho substituent, that contradiction that I mentioned earlier. Sulfur at this position is more polarizable, it's more hydrophobic than oxygen, but it's still able to accept a hydrogen bond. And an O-methyl group is simply a bond rotation away from either accepting an H-bond or presenting a hydrophobic methyl group for packing.
So at least in our mind's eye, we imagined that that was how the unnatural base pair solved that contradiction. This very quickly reinvigorated our design efforts, and within months we had identified this naphthyl methoxy derivative, NaM, as being a better partner for 5SICS. Now this was the first pair that we could really PCR amplify with high fidelity, and I'll talk about that in a second.
Since its discovery, we have found TPT3 as a better partner for NaM. And this pair will come back at the end of my talk to be important, but I'm going to spend most of the rest of the talk talking about this NaM-5SICS pair.
So at this point, we could no longer use steady-state kinetics to drive SAR, because the unnatural base pairs are synthesized virtually as fast as an AT base pair. And it's not because they're actually chemically equivalent, it's because they're both rate limited by product dissociation. It's just a limitation of steady-state kinetics: you only measure your rate limiting step. And all that tells us is that we had now increased the efficiency of the chemistry step to the point where it was no longer rate limiting for turnover of starting material into product.
So my lab has recently got a rapid stopped-flow injection system, so we're going to be doing pre-steady-state kinetics to get to those numbers more directly. But even so, that will never be fast enough to drive SAR, because it's a rather time consuming assay.
So we developed another assay based on just sequencing. So we would take a PCR reaction, take the amplicon, and give it to our sequencing facility. And they would take it and put it into a standard Sanger sequencing reaction. Of course, that Sanger sequencing reaction doesn't have our unnatural triphosphates in it.
So what you see is an abrupt termination at the position of the unnatural base pair. So if you ratio the intensity, the amplitude of the peaks prior to the unnatural base pair to those after, and you construct calibration curves of known mixtures of DNA that's synthesized with the unnatural base pair, and DNA synthesized with an AT replacing it, you can convert that ratio into a percent of unnatural base pair present.
You normalize that by the amplification level during a PCR reaction, and that gives you a fidelity, and that's the fidelity that I'll talk about now. So we incorporated the unnatural base pair into a lot of different sequences to look at GC and AT sequences to see if there was a sequence bias. Amplification levels were pretty high, and the fidelities per round of replication were pretty high.
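The conversion from a peak ratio to a per-round fidelity can be sketched as follows. Two assumptions are mine rather than the speaker's: the calibration curve is treated as piecewise linear between known mixtures, and retention after n doublings is modeled as fidelity to the power n, with n = log2 of the fold amplification. The calibration points and readings are hypothetical.

```python
import math


def retention_from_ratio(ratio, calibration):
    """Linearly interpolate percent retention from a before/after peak ratio.

    calibration: list of (peak_ratio, percent_unnatural) pairs measured on
    known mixtures (hypothetical values in the example below).
    """
    points = sorted(calibration)
    if ratio <= points[0][0]:
        return points[0][1]
    if ratio >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= ratio <= x1:
            return y0 + (y1 - y0) * (ratio - x0) / (x1 - x0)


def fidelity_per_doubling(retention_fraction, fold_amplification):
    """Assumed model: retention = fidelity ** doublings."""
    doublings = math.log2(fold_amplification)
    return retention_fraction ** (1.0 / doublings)


calibration = [(1.0, 0.0), (5.0, 50.0), (20.0, 100.0)]
percent_retained = retention_from_ratio(12.5, calibration)
fidelity = fidelity_per_doubling(percent_retained / 100.0, 1024)
print(f"{percent_retained:.1f}% retained, fidelity per doubling {fidelity:.4f}")
```

Normalizing by the number of doublings matters because the same overall retention implies a much higher per-round fidelity when it survives more rounds of replication.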
Even sequences where there were two right in a row. Now in this case, there's an approximate here, because what we're looking at in the assay is a drop on a drop, so it becomes a little bit difficult to rigorously characterize. I should mention these little sequences are of course part of a 180-mer, and I'm simply not showing you the rest of the sequence.
And what we're doing here is we embedded the unnatural base pair within a sequence where each of the three nucleotides on both sides were randomized. And what we're trying to do here is say, look, are there any sequences that are particularly prone to lose the unnatural base pair? Because then they would have an amplification advantage, and then we would see this erode. And that did not seem to be the case, so we were pretty enthusiastic.
But in order to examine that a little more carefully, we incorporated the unnatural base pair into a chemically synthesized piece of DNA. We then amplified that piece of DNA 10 to the 24 fold, diluting it out a million fold three times. And during that PCR amplification, of course, what happens is some of the unnatural base pairs are lost, so you produce a population where they've been replaced with a natural pair, and some are retained.
In order to differentiate them, we put them through one more round of PCR where one of our analogs is attached to a biotin tag, which now of course produces two populations: one which is biotin tagged and corresponds to the population that retained the unnatural base pair, and one which is no longer tagged and corresponds to the population that lost the unnatural base pair.
We then can take each population and subject it to Illumina deep, high-throughput sequencing. And we actually ran the whole thing in parallel with a natural sequence with an AT present. And then every 10 to the 3, as I mentioned, we took out an aliquot.
And so all of those populations were analyzed with a minimum number of reads of 1.6 million. So the statistical analysis is pretty reasonable. And so here's the single nucleotide frequency data. So what this F rel minus 1 is: F rel is the frequency in the population that had the unnatural base pair relative to the frequency in the control population that did not have the unnatural base pair, here the frequency of an A at these positions relative to NaM.
So this is A at the minus 1 position to the minus 10. This is the F rel minus 1 of A at the position plus 1 to plus 10. And correspondingly, the same for C, G, and T. This is the population that retained the unnatural base pair as a function of amplification. This is the population that lost the unnatural base pair as a function of amplification.
So the reason we used F rel minus 1: if F rel is greater than 1, that means you have a bias for that nucleotide at that position. If it's negative, sorry, if it's less than 1, you would have a bias against the nucleotide at that position.
F rel minus 1 simply makes it visual in that any value that's positive means you have a bias for, and any value that's negative means you have a bias against. So clearly, here's our largest bias. You can see as you amplify out to 10 to the 24th fold, immediately 5 prime to NaM in the template, you see a preference for C. That's the largest single nucleotide bias that we have.
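The F rel minus 1 statistic can be sketched in a few lines. This assumes reads are equal-length strings aligned so that a given index is the same position relative to the unnatural base pair; the toy reads are hypothetical.

```python
from collections import Counter


def position_frequencies(reads, position):
    """Fraction of reads carrying each base at one aligned position."""
    counts = Counter(read[position] for read in reads)
    total = sum(counts.values())
    return {base: counts.get(base, 0) / total for base in "ACGT"}


def f_rel_minus_one(sample_reads, control_reads, position):
    """F_rel - 1 per base: positive means a bias for, negative a bias against."""
    sample = position_frequencies(sample_reads, position)
    control = position_frequencies(control_reads, position)
    return {
        base: sample[base] / control[base] - 1.0
        for base in "ACGT"
        if control[base] > 0  # skip bases absent from the control
    }


# Hypothetical two-base reads; index 0 is the position of interest.
sample = ["CA", "CA", "CA", "GA"]
control = ["CA", "GA", "AA", "TA"]
bias = f_rel_minus_one(sample, control, position=0)
print(bias)
```

In this toy case C is three times as frequent in the sample as in the control, so its value is +2, while G matches the control and sits at 0.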
Now, of course, a single nucleotide bias isn't enough to tell you everything. Oh, sorry, to give you a gauge of how important that bias was: a C at that position was present in 18.7% of the initial population. Now, of course, it should have been 12.5%, but these are the vagaries of phosphoramidite coupling efficiencies during chemical synthesis.
But nonetheless, that 18.7%, over 10 to the 24 fold amplification, only grew to 24.7%, a pretty small increase. Now, single nucleotide biases aren't enough to give you the whole story, because sequence correlations can hide biases. So for example, if you take GCAT and cyclically permute it, so GCAT, CATG, ATGC, and TGCA, then at every position across only those four sequences, every nucleotide would be present at exactly 25%.
So it would look totally unbiased. But of course, it's highly biased, because there are only four sequences. So correlations hide biases that are not apparent at the single nucleotide level.
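The permutation example can be checked directly. The talk does not name the correlation measure that was used, so mutual information between two positions is used here as a stand-in.

```python
import math
from collections import Counter

# The four cyclic permutations of GCAT.
sequences = ["GCAT", "CATG", "ATGC", "TGCA"]


def base_frequencies(seqs, position):
    """Single-nucleotide frequencies at one position."""
    counts = Counter(s[position] for s in seqs)
    return {base: counts.get(base, 0) / len(seqs) for base in "ACGT"}


def mutual_information(seqs, i, j):
    """Mutual information (bits) between positions i and j."""
    n = len(seqs)
    p_i = Counter(s[i] for s in seqs)
    p_j = Counter(s[j] for s in seqs)
    p_ij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


for pos in range(4):
    print(pos, base_frequencies(sequences, pos))  # every base at 0.25
print(mutual_information(sequences, 0, 1))  # 2.0 bits
```

Every position looks uniform, yet knowing one base fully determines its neighbor (2 bits, the maximum for a four-letter alphabet), which is the kind of hidden bias a pairwise analysis is meant to expose.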
So we did a correlation analysis. So this is the population of the retained. This is the population that lost in the natural base pair. This is just a measure of the correlation. So we're looking now at a plot that maps the sequence against itself. So this peak here, for example, this peak measures the correlation here and here.
So what this tells you is that as you amplify the population that retained the unnatural base pair, a correlation grew in between the minus two and the minus one and the plus one and the plus two. So the flanking dinucleotides, same in the population that lost. There were correlations between the nature of those flanking nucleotides.
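The talk does not say which correlation measure was plotted. As one possible illustration, and purely as an assumption, the sketch below uses mutual information between two sequence positions, a common way to score whether the nucleotide at one position predicts the nucleotide at another. The function name `mutual_information` and the two toy pools are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(seqs, i, j):
    """Mutual information (in bits) between positions i and j of a sequence pool.

    Assumed stand-in for the unspecified correlation measure in the talk.
    """
    n = len(seqs)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    i_counts = Counter(s[i] for s in seqs)
    j_counts = Counter(s[j] for s in seqs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((i_counts[a] / n) * (j_counts[b] / n)))
    return mi

# Toy pool 1: the two positions are fully coupled (each first base fixes the second).
coupled = ["GC", "CA", "AT", "TG"]
# Toy pool 2: all 16 dinucleotides once each, so the positions are independent.
independent = ["".join(p) for p in product("GCAT", repeat=2)]

print(mutual_information(coupled, 0, 1))      # 2 bits: maximal for a 4-letter alphabet
print(mutual_information(independent, 0, 1))  # 0 bits: no correlation
```

Both toy pools have flat single nucleotide frequencies, so only the pairwise score tells them apart.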
But those were the only correlations. Now, when we first got this data, we were a little confused, because in this population, the correlation seemed to grow in and then grow out and then sort of grow back in again. And I'll give you the answer. These correlation values are so small that what we're looking at is random fluctuations, just noise. But nonetheless, what this data tells us, that if we're looking for additional sequence biases,
we only need to consider the flanking dinucleotides. So if we did an analysis of the dinucleotide frequency, this time plotted as a unit circle, where F rel minus one is shown here. So a positive bias for something would correspond to a bubbling out of the data, and a bias against would correspond to a collapse off the unit circle.
So the only significant bias that we have, well, at least the largest bias we had, is right here. So this is in the population that retained the unnatural base pair, and it's a bias for a five prime dinucleotide of CG. Now this largely, maybe the CG bias is a little larger than a CC and a CT and a CA,
but a CG is probably the largest. This largely simply corresponds to this bias for a C at the five prime position. And if you look at the actual numbers again, a CG was present at 2.3% of the population before the amplification and it was only present at 3.5% after a full 10 to the 24 fold amplification.
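To put the quoted numbers on the F rel minus one scale, here is a small sketch. It assumes that F rel is simply the frequency after amplification divided by the frequency before; the talk does not spell out the definition, so treat that ratio, and the function name `f_rel_minus_one`, as assumptions.

```python
def f_rel_minus_one(freq_before, freq_after):
    """Assumed definition: F_rel = freq_after / freq_before; positive means a bias for."""
    return freq_after / freq_before - 1.0

# Frequencies (in percent) quoted in the talk, before and after 10^24-fold amplification.
quoted = {
    "C immediately 5' of NAM": (18.7, 24.7),
    "5' CG dinucleotide": (2.3, 3.5),
}

for label, (before, after) in quoted.items():
    print(f"{label}: F_rel - 1 = {f_rel_minus_one(before, after):.2f}")
```

Under that assumed definition the values come out at roughly 0.3 and 0.5, both positive, which matches the description of a modest bias for C and for CG.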
So we actually stated in the paper, and the reviewers allowed us to state, that this functionally in vitro was a functional third unnatural base pair because these sequence biases are actually less than some of those observed amongst natural sequences. So at this point we had sort of believed that we had demonstrated that we had a fully functional unnatural base pair. So maybe now is the chance for us to charge into our in vivo,
long term goal of trying to use this as the basis of an organism with an expanded code. But again, returning sort of to the medicinal chemistry analogy, if you went to a medicinal chemist and you said, here's a compound that I have, I'd like to start a development program, so pull down $20 million from your pharma company and let's start the program. What the medicinal chemist will ask you is, well, what's the target?
And you'll say, well, who cares? I've got this great activity. And the medicinal chemist will never be interested. You'll never get interest from pharma with that data. And the reason is because the pharma industry has been fooled too many times by ghosts, sort of things that vanish as you try to track them down. So what they want to know at the beginning of a program is what's the target and can you understand the mechanism of action?
Is it a reasonable thing? Is it understandable? So the reason this was an issue for us is because we got a structure of our new base pair, this 5SICS NAM pair, in collaboration with Tammy Dwyer at the University of San Diego. Here's an overlay of 10 NMR structures. Here's the average structure. They're intercalated again. So not as much as the first generation analog was, but they're still intercalated a little bit.
And this was a real moment for us. We sort of realized our pairs are never going to do this. There's nothing to be had from doing this. They're not getting any H bonds. So the duplex is going to do whatever it has to do to sort of unwind and open a little bit and allow those nucleobases to pack on each other, which is the only route to stability they have. Now, in principle, that doesn't bother me, but of course the reason it does bother me
is because the question is why is it – I mean, I just tried to tell you that they're replicated well by different polymerases. But everyone knows that polymerases evolved to recognize a Watson-Crick pair. This looks like a mismatch. This looks like, if you're familiar with an AA zipper motif mis-pair where the A's interdigitate amongst each other. And everyone knows that polymerases evolved to select against them.
So are we chasing a ghost? So to tell you a little more carefully what I mean by that, So if you look at a GC or a CG or a TA or an AT, it doesn't matter, they all form the same structure, the very famous Watson-Crick structure. And it doesn't matter whether it's formed in duplex DNA or if it's formed during replication
by pairing a triphosphate against a templating nucleobase. They all look like this. They're all planar and they all take on this very canonical Watson-Crick structure. Again, ours doesn't look like that. Ours looks like this intercalated structure that polymerases are supposed to select against. So if you think about how a polymerase works, the DNA polymerase looks like a right hand. The template lays down like this.
And when a triphosphate binds, and only when the correct triphosphate binds, it induces a large conformational change of the fingers domain down over the palm and thumb domain. Now that conformational change is supposed to result in a very tight closed complex that rigorously selects for the structure of a Watson-Crick pair. So we had two questions when I approached my friend Andy Marx
about collaborating. Andy solves crystal structures, amongst other things. And we decided that we would try to solve structures of polymerases synthesizing our unnatural base pair. And we had a couple of questions. Number one, was formation of our hydrophobic pair sufficient to drive that same conformational change of the polymerase? And number two, if it was, what is it recognizing?
So here are just the key structures. Here's the binary complex of the primer template bound to the polymerase. And the only part of the polymerase I'm showing is the O and the O1 helix. Those are at the base of that fingers domain that I mentioned. And here's in the binary complex. So in the binary complex, NAM, our analog in the template, is flipped out of the developing duplex.
Now when we add our triphosphate and solve the structure for the ternary complex, what you see is that our unnatural analog in the template flips back into the developing duplex, and you get this large conformational change. Now to see that conformational change, I'm overlaying the binary and the ternary structures here. There is that conformational change of the fingers domain. Now to convince you that it's exactly the same as the conformational change induced by a natural base pair,
this is an overlay of the ternary complexes synthesizing a GC pair, and that's synthesizing our unnatural base pair. So you see at the secondary level, they're absolutely superimposable. And if you actually look at the side chains, and even the bound waters and magnesium ions, they're absolutely superimposable. So the first question, the unnatural base pair triggers that exact same conformational change of the polymerase.
Now the second question, what's it recognizing? If your eyes are good, you can already see, and if they're not, if they're like mine, then let me help you. That was a good day in my lab. A natural base pair is replicated with an induced fit mechanism.
Its formation drives a large conformational change in the polymerase. Our unnatural base pair is replicated by a different, but only subtly different mechanism, a mutual induced fit mechanism. Its formation drives a large conformational change in the polymerase, but that conformational change in the polymerase drives a conformational change in the base pair. I don't think that if we would have tried to develop an unnatural base pair based on most other forces
that are much more directional, like hydrogen bonding or ionic forces, I don't think it would have had both the strength and the plasticity to adapt to the polymerase active site. So having, I hope, convinced you that the unnatural base pair is well replicated, and that we understand the mechanism, it's not some weird artifact.
With that, we were enthusiastic enough to advance to what our long-term goal was, and that is trying to use the unnatural base pair, trying to deploy it in an organism as a basis for an expanded genetic alphabet and eventually a genetic code. So returning to this image of my gram-negative cell, we were immediately confronted with a challenge, and that's this.
How do we get our triphosphates in a cell? So the literature will, if you look in the literature, it will give you a couple different suggestions. I don't have time to go into any of that. If anyone wants to challenge me on why, in order to have a semi-synthetic organism, you'd have to have it be able to import and synthesize the unnatural triphosphates or whatever. We can talk about that later. But none of the strategies worked.
And so the strategy for us that finally did work was based on noting some published literature. So what that literature was, was the following observation. There are a variety of genetic elements that autonomously replicate. So these genetic elements are the genomes of several intracellular bacteria, for example, some chlamydial species, as well as the genomes of several organelles,
like mitochondria and chloroplasts. And the property they have is this. They autonomously replicate, but they don't encode the machinery of triphosphate synthesis. Instead, these genomes are bathing in another organism that has all these nutrients already available. And so it's a well-known thing that what those genomes do is they minimize and scavenge.
The genomes that did not encode the machinery of triphosphate synthesis instead encode dedicated nucleoside triphosphate transporters and just steal them from their host environment. So we got really excited about that. We thought, well, maybe some of them would be useful for us. And in fact, several were found that actually imported G, C, A, or T, deoxys and ribos.
So we got very enthusiastic about that and wrote around the world and requested these genes. And of course, this is the idea that we would express them in bacteria and they would facilitate the uptake of our unnatural triphosphates. I got to be honest, this is not the transporter, nor are these triphosphates. This is just an image I stole off Wiki or the internet someplace.
But that's the idea. So to put it a little more specifically, here's the cytoplasmic membrane. And we envisioned that, well, the outer membrane, we imagined the unnatural triphosphates would get through because they're hydrophilic. They can diffuse through porins. And then once in the periplasm, we imagined that one of these transporters
expressed from a plasmid might facilitate their uptake. So we got a variety of these transporters, as I mentioned. And the second, from an algae species, called PtNTT2, worked very well when we assayed it with radiolabeled ATP. So it worked in our E. coli cells. And then the question is, would it function to import the unnatural triphosphates?
So here's cytoplasmic preps. So we're only looking at cytoplasmic composition. And this data results from the addition of alpha-32P-labeled dATP. So you can see that this is the amount of triphosphate that gets into the cell. This is the amount of diphosphate present, monophosphate.
And then the nucleoside is dark because it's alpha-phosphorus-labeled. So you just don't see it. So this immediately tells you that the triphosphate is getting in, A is getting in, but it's being decomposed. This is probably just the natural life cycle of triphosphates within a cell. They're being metabolized up. They're being brought back down. But, of course, what we were really excited about is that it also acted to import both the triphosphates of 5SICS and NAM.
Once within the cell, you still see decomposition, just like we did with A. There's the triphosphate, there's the diphosphate, there's the monophosphate. Now this is an HPLC assay, so we can actually see the free nucleoside. And so you see it does go all the way down to the free nucleoside.
And this is for NAM. So a couple of things. Number one, it is being decomposed once within the cell. But it's not being decomposed particularly faster than the natural triphosphates. Number two, these triphosphate levels are 30 to 70 micromolar. And those are steady-state levels that persisted for hours after addition of triphosphate to the media.
That's just the level where import is being balanced by the degradation. And the important point is that those triphosphate concentrations are an order of magnitude above the Km that we measured in vitro for different polymerases. So we imagined that this, despite the accumulation of all this other stuff,
that these triphosphate levels might be sufficient to support our first shot on goal. So we constructed two plasmids. One we call an accessory plasmid. And all right now the accessory plasmid, we're naming it the accessory plasmid because that's the plasmid that we eventually imagined expressing things like orthogonal synthetases and all the accessory machinery. But right now all it has is that transporter that I just described.
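Going back for a moment to the point that the intracellular triphosphate levels sit about an order of magnitude above the Km: a quick way to see why that is encouraging is the Michaelis-Menten rate law, v / Vmax = [S] / (Km + [S]). The sketch below assumes simple Michaelis-Menten behavior for the polymerase, which is an idealization, and works only with the ratio [S] / Km, since no Km value is quoted in the talk. The function name `fraction_of_vmax` is made up for this note.

```python
def fraction_of_vmax(substrate_over_km):
    """Michaelis-Menten rate as a fraction of Vmax, given the ratio [S] / Km."""
    return substrate_over_km / (1.0 + substrate_over_km)

# [S] / Km = 10 corresponds to "an order of magnitude above the Km".
for ratio in (0.1, 1.0, 10.0):
    print(f"[S]/Km = {ratio:>4}: v/Vmax = {fraction_of_vmax(ratio):.2f}")
```

At ten times Km the enzyme would be running at about 90% of its maximal rate under this idealized model, so substrate availability should not be the limiting factor.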
And the information plasmid, what we call the information plasmid because it has an unnatural base pair. Now what the information plasmid is is nothing more than pUC19. Simply where AT, we've got a pUC19 fan here. Simply where the only difference between pUC19 and pINF is a single AT was replaced with an unnatural base pair.
Otherwise it's identical to pUC19. Now the unnatural base pair, remember the pair that I told you, this NAM TPT3? It's actually replicated a little bit better than 5SICS NAM. But we hadn't validated it anywhere near the level that we had validated 5SICS NAM for replication biases and structure.
So we were definitely going to take our first shot in vivo with this. But the plasmid that we constructed, we constructed synthetically with this base pair. Now that will come back to be important in a few minutes. But since pINF has this pair, we envisioned that the first round of replication would just immediately replace TPT3 with 5SICS,
if these two were the only ones that we supplied to the media. So here's the experiment. Transform E. coli with that pACS plasmid and induce production of the transporter. Then add your unnatural triphosphates and then transform with the plasmid containing the unnatural base pair. Give it a little time and then recover the plasmid and determine what the fate of the unnatural base pair was.
So important controls are: transform with pUC19 instead of the plasmid containing the unnatural base pair, don't induce the transporter, or don't add the unnatural triphosphates. This is the first data that we actually got.
The graduate student would run this experiment. And so what we're showing here is isolation of pINF after 15 hours, which corresponded to 22 doublings of the E. coli and a 10 to the 7 fold amplification of pINF. Just like that trick that I showed you earlier, the graduate student took the plasmid, recovered it out
of the cells at this point, and amplified it by PCR with an analog that was tagged with biotin. And that separated into two populations and we could differentiate them with the streptavidin super shift. And so here's the gel. So here when you have pUC19, so when you have the template that doesn't have the unnatural base pair in it, it doesn't matter whether you add the triphosphate or if you induce the transporter, you don't see a shift.
That's important because it tells you the unnatural base pair is not randomly inserting through the genome. Now when you have the plasmid containing the unnatural base pair but you don't provide, so the transporter is under IPTG control, when you don't induce the transporter, you see no shift.
When you do induce the transporter but you don't provide the triphosphates, you don't see a shift. Only when you have the unnatural base pair in the template, you provide the triphosphates and you provide for their uptake, you see a shift. Now the shift based on calibration curves that were similar to what I showed you earlier was a shift that implied a fidelity that was greater than 99.7% per division.
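To get a feel for what a per-division fidelity of 99.7% means over the 22 doublings mentioned earlier, here is a small sketch. It assumes the simplest possible model, in which each doubling independently retains the unnatural base pair with the same probability, so retention after n doublings is fidelity to the power n. That model, and the function names, are assumptions for illustration and not a description of the calibration actually used.

```python
def retention_after(fidelity_per_doubling, doublings):
    """Fraction of plasmids still carrying the pair, assuming independent doublings."""
    return fidelity_per_doubling ** doublings

def fidelity_from_retention(retention, doublings):
    """Inverse of retention_after: per-doubling fidelity implied by overall retention."""
    return retention ** (1.0 / doublings)

DOUBLINGS = 22       # quoted in the talk for the 15 hour growth
FIDELITY = 0.997     # lower bound quoted in the talk, per division

retained = retention_after(FIDELITY, DOUBLINGS)
print(f"Retention after {DOUBLINGS} doublings: {retained:.1%}")
print(f"Round trip fidelity: {fidelity_from_retention(retained, DOUBLINGS):.4f}")
```

Under this model, a fidelity above 99.7% per division corresponds to well over 90% of the recovered plasmid still carrying the unnatural base pair after the full growth period.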
So we were pretty excited about that but of course we wanted to be rigorous and we wanted a separate independent assay to demonstrate retention of the unnatural base pair. So we just used the same trick I showed you earlier, the sequencing trick, where we took the plasmid, PCR amplified it this time without a biotin tag and then subjected it to Sanger sequencing and there you see the truncation, the abrupt termination at the position of the unnatural base pair.
So we could actually go into this chromatogram and read what the sequence was. So of course it was exactly where we put it, it wasn't at some new place. Now we thought we clearly unambiguously demonstrated retention within a dividing E. coli cell. So we wrote a paper up, we submitted it and two of three reviewers agreed with me.
One didn't. And what the reviewer didn't like, there were two steps he didn't like. He didn't like this step and he didn't like this step. He didn't like the PCR steps. He wanted us to take, because he thought we were somehow introducing an artifact. He wanted us to take the plasmid directly out of cells and analyze it.
So I actually got these comments back from the journal, I think it was on the 21st of December, and I freaked out a little bit because to me that screamed mass spec. And I don't really, we don't do mass spec in my lab. So I shot out this email to this group at NEB, and the reason I shot the email is because this group had been developing an LC-MS/MS method that they had just published a couple papers on,
where they were able to demonstrate the presence of an epigenetic modification in a plasmid to the sensitivity of one methyl group in a plasmid. So I thought, well, if they can do that, they can probably analyze the retention of our unnatural base pair by the same technique. So I contacted them, I think it was on the 23rd, I think it was the 28th I got an email back from them, from Ivan Corrêa.
And he said okay we bought your unnatural triphosphates, they're commercially available, and here's a work plan and we're going to be done in two weeks.
And so anyone that's run an academic group and tried to manage collaborations, you know how hard this can be. I was like well okay that's good, maybe. And they weren't able to do it. It took them about five weeks. I can't say enough for this collaboration. It was one of the most enjoyable collaborations that I've ever participated in and I owe them a huge, a lot.
Here's the data. So here's the LC-MS/MS trace. So here's dC content, G content, A content. T's down here because it doesn't fly well in their mass spec, that was known. And so what you see is all the same control experiments, they're all on top of each other. And out here in the trace is this little peak for 5SICS. Now that peak amplitude corresponds to just what you'd expect for one per plasmid.
And importantly, it's the mass of 5SICS, not TPT3. The only spot that 5SICS was provided was as a triphosphate to the media. This is unambiguous evidence for replication of the unnatural base pair within the cell. The third reviewer even bought it.
So this actually is a picture of the organism. So since the last common ancestor of all life on earth, it's been four letters, two pairs. This organism is stably, healthily growing, happily growing, while maintaining six letters and three pairs.
So we are continuing to optimize all aspects of the system. We hadn't synthesized a lot of analogs, so we synthesized a lot more of these. Turns out these are NAM analogs. And then we ran another screen. This time we ran a screen of 7,000 candidates because we had more. We actually identified a family of eight, most represented, for example, by this pair I already showed you, this NAM TPT3.
But all eight of these analogs are in vitro replicated better than 5SICS NAM. Which I hope I just convinced you is good enough to replicate in a cell. So this again is like a med chem principle. If you have one molecule and you can't modify it and you go into a med chem program,
most molecules drop out of development not because of affinity, but because of PK, pharmacokinetics. They have toxicity, solubility problems, off-target activity, whatever. It's hugely advantageous to have a panel, a set of molecules with different physical chemical properties in order to aid development.
So we're excited to have the flexibility of having all of these analogs with slightly different physical chemical properties. Now, the unnatural triphosphates don't make the cell sick. But expression of the transporter does. So you're stressing the cell, and this is not uncommon for membrane proteins,
because you're forcing the membrane to accommodate all of these transporters. And so that's why we put the transporter on a high copy plasmid, because we thought there was selection pressure against retaining it, and if it were a single copy, any mutation that came up that would delete it, suddenly the cells would grow better, they would dominate the population. And right in the middle of an experiment we'd lose the ability to import the unnatural triphosphate.
So a graduate student joined my group and asked the question, could we derive a mutant of the transporter that was not toxic? And he actually was able to successfully do that. Because it's no longer toxic, we took it off the plasmid and integrated it into the chromosome. We optimized a whole bunch of different constitutive promoters.
The one that was best was this PlacUV5. It imports beautifully. Here are the growth rates. So this is just an amplitude difference. It just comes from the initial inoculation. The slope is what the growth rate is. So here you see the initial transporter system that we developed. And you see this plateauing here is the toxicity that I was referring to. This slope now shows this bacterial cell bearing that single, mutant, chromosomally located transporter
is growing with an identical rate and it imports identically. We've now done this experiment many different times. And what's exciting about this is that it takes one sort of moving part off the table. We don't have to add something to induce the transporter's production.
Our cell line now is always competent to take up the unnatural triphosphates. We can just take it out of the freezer and run an experiment. We don't have to transform with that first plasmid. We don't have to worry about timing and adding the transporter and giving enough time. So we're excited about that kind of optimization. And this is the kind of optimization that we anticipate applying to all facets of the semisynthetic organism.
Okay, so in the last seconds, how much time do I have? Two minutes. Okay, so all I showed you was retention. The question I always get is, well, what about the next step? What about retrieval? So we put the unnatural base pair within, and this time instead of avoiding a gene, we put it right in the middle of superfolder GFP behind a canonical T7 promoter
and in front of a canonical factor-independent termination sequence. And this time the experiment is to propagate the unnatural base pair within this plasmid, now to import all four unnatural triphosphates, the deoxys and the ribos of 5SICS, or whatever is in the unnatural pair,
and then induce transcription with IPTG to induce the transporter, and then collect the RNA and then analyze it. So understand what the unnatural base pair has to do to survive this assay. It has to stably replicate. The transporter has to bring in all four triphosphates. It has to survive transcription into message.
We then lyse the cells. It has to survive notoriously error-prone reverse transcription back into DNA and then survive PCR amplification. Right there, only where you expect it to be. So we've now transcribed lots of different messages. Surprisingly, maybe not surprisingly, tRNAs transcribe better. They're probably structured and prevent Rho-dependent termination.
But now we can stably and efficiently transcribe, and so the next step is to begin to combine those two to try to look at decoding at the ribosome. So with that, only to thank my group. So the students who worked on the project, Michael Ledbetter and York and Aaron, where's Aaron?
Our graduate students and post-docs, Brian and Ailey. And then I mentioned these collaborations and, again, the NEB group for collaborating and helping us out when we needed it the most. And these agencies for funding, and I thank you for your attention.