
A Semi-synthetic Organism with an Expanded Genetic Alphabet

Video in TIB AV-Portal: A Semi-synthetic Organism with an Expanded Genetic Alphabet

Formal Metadata

A Semi-synthetic Organism with an Expanded Genetic Alphabet
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Expansion of the genetic alphabet to include a third base pair not only has immediate utility for a number of applications, such as site-specific oligonucleotide labeling, but also serves as the foundation for an organism with an expanded genetic code. Toward this goal, we have examined a large number of different unnatural nucleotides bearing mainly hydrophobic nucleobase analogs that pair based on packing and hydrophobic interactions rather than H-bonding. Optimization based on extensive structure-activity relationship studies and two screens resulted in the identification of a class of unnatural base pairs that are well recognized by DNA and RNA polymerases. More recently, we have engineered E. coli to import the requisite unnatural triphosphates and shown that DNA containing the unnatural base pair is efficiently replicated within the cell, resulting in the first semi-synthetic organism that stores increased information in its genome.
So I'd like to thank the organizers for giving me the opportunity to come and talk at this meeting. My lab is at the Scripps Research Institute, and my group is sorted into three projects. What we're interested in, from one perspective or another, is evolution: trying to understand or manipulate evolution, very much from a chemist's perspective. I'm trained as a chemist, and my lab largely takes a chemical approach. The three projects, very briefly, are: one, trying to use concepts of evolution to design, or identify and optimize, novel scaffolds for antibiotic development; two, a very rigorous chemical-physical approach to understanding the role of adaptive mutations in protein function; and three, what I'm going to talk about today, our efforts to develop an unnatural base pair with which to expand the genetic alphabet and eventually the genetic code.

So there it is, that's the genetic alphabet. This is the genetic alphabet of E. coli, and if you can't see it, let me blow it up for you. I gave this talk in Japan and in China two years ago, and someone came up to me afterwards and said, "You know, you have a lot of nerve showing the Japanese flag in China." But of course it's just the genetic alphabet: a long string of G's, C's, A's, and T's.

To a chemist, DNA is actually a remarkably unremarkable polymer; it's really very, very simple. What makes DNA utterly unique and remarkable is selective base pairing: G selectively pairs with C, and A pairs with T, and no other material has that property. All the diversity of life, all the diversity around us, in fact all the diversity since the last common ancestor of all life on Earth, is encoded in a four-letter, two-base-pair genetic alphabet. When my group started at Scripps about 15 years ago, we were very interested in asking whether that alphabet, and eventually the genetic code, could be expanded. So we already heard from my good friend
Ichiro about his take on this. We both started at about the same time and we've sort of walked along together, and becoming good friends with Ichiro has been one of the greatest things that this project has led to. I really like Ichiro's work, and in particular what he talked about: using his unnatural base pair, Ds-Px, to evolve aptamers. To get to a question that was asked earlier, those aptamers bound better than the normal aptamers comprised of only G, C, A, and T, so the unnatural pair did impart novel function that wasn't available in the natural genetic alphabet. These experiments are, to my knowledge, the first example of unnatural biopolymers that were themselves evolved without the intermediacy of a natural biopolymer. So I think that was an absolute landmark study, and for me it also provided the first real practical demonstration of the use of unnatural base pairs.

So that's Ichiro's in vitro work. What I'm going to talk to you about today is what has been the driving goal in my lab: developing in vivo applications of an expanded genetic alphabet. If one is interested in that question, there's a variety of things you have to be concerned with. Here I'm showing an organism, which happens to be E. coli, so it's Gram-negative, with a periplasm between two membranes. You have to have a DNA element with an unnatural nucleotide in it. You have to have available within the cell the requisite triphosphates of the unnatural nucleotides. You have to have a DNA polymerase that selectively synthesizes DNA containing that unnatural base pair, of course with high efficiency and high fidelity, during replication. Then you have the same story with RNA polymerase: you have to have available within the cell
the ribonucleoside triphosphates, and you then have to drive transcription with an RNA polymerase. Then, presumably, that RNA can go out in the cell to the ribosomes and, with tRNAs that are transcribed with the corresponding cognate nucleotide to reconstitute the unnatural base pair during decoding, drive protein synthesis.

There are two things I want to say at this point. I want to acknowledge the work of Steve Benner, who was really the first to talk about expanding the genetic alphabet in a practical sense and who developed analogs that were designed to be orthogonal. I also want to mention Peter Schultz's work on this part of the problem. What Pete did was develop orthogonal tRNA/synthetase pairs, recoding the amber codon and drawing on orthogonal pairs from Methanococcus jannaschii and other archaea that could orthogonally recognize the repurposed amber codon. I think that's absolutely beautiful work. In my personal opinion it's a shame, and I don't understand why he hasn't won the Nobel Prize for that work yet, because I think it's some of the earliest and best of what we would call synthetic biology today. But I also want to mention from the outset that we planned on stealing Pete's tRNA/synthetase pairs. Instead of using them with amber codons, which allows you to incorporate one unnatural amino acid, we wanted to develop one unnatural base pair with which you could write a virtually unlimited number of new codons, hopefully for the incorporation of multiple different unnatural amino acids simultaneously
into a protein, maybe in an evolvable context. I don't have time to go into too much detail about the sort of thing one could do with that in protein therapeutics. If you haven't noticed the revolution that protein therapeutics has caused in the therapeutic field, then you haven't been following anything, because 70% of the INDs, the investigational new drugs, at the FDA two years ago were proteins. There's an absolute sea change in how people are thinking about developing proteins, yet proteins have only 20 amino acids.

Now, we can argue until the rest of the meeting whether proteins need new amino acids for new functions; I would argue they do. If you look at small-molecule therapeutics, things like electrophiles are the most common pharmacophore in a small-molecule therapeutic, yet they do not exist in proteins at all: electrophiles, things like metal-binding centers, things like redox centers. Enzymes draw on that sort of activity by using cofactors wherever they can, but it's an entirely different challenge to evolve proteins that have cofactors as well. So the idea of evolving proteins with things like electrophilic centers for therapeutic applications is the long-term goal that drives the effort in my lab.

There are two things I want to introduce about the approach that we took to the challenge. The first was that we wanted to try to draw upon a different force than hydrogen bonding; Ichiro also alluded to this earlier. Everyone's familiar with water and oil and the fact that they don't mix. The hydrophobic effect makes oil want to get out of water for a whole bunch of complicated reasons involving solvation, cavity size, entropy, and also, obviously, packing between the oil-like molecules. I'm going to lump all of that under the rubric of hydrophobicity, and so all of our analogs were
designed based on oil wanting to get out of water. The idea is that two very lipophilic nucleobases would pair with each other during replication and not want to pair with the more water-like natural nucleobases. Now, since we were taking the H-bonds out, in the early days we were very concerned about just the stability of the DNA, so that's why the analogs were all sort of large: we were going to replace the cross-strand hydrogen bonding with within-strand packing interactions. So that's one thing I would say: they're all large and hydrophobic, and that's why.

The other thing I want to point out about them is that there are a lot of them. We did not want to take what I view as the traditional chemist's approach of very carefully designing an analog and then making it and testing it. I've always been very inspired by medicinal chemistry; about a third of my group works on medicinal chemistry, and I wanted to approach the problem very much that way. (I think I just turned it off. Okay.) What we wanted to do is not make one but make lots, then develop assays to analyze them, and then use those assays to start to develop structure-activity relationships that feed back into the design effort, and try to fuel that cycle to optimize analogs. So there are a lot of different analogs here. One that I'll talk about in a few seconds is this propynyl isocarbostyril guy. Collectively, this is what we refer to in the lab now as the first-generation set of analogs.

In terms of that SAR, there are two sorts of assays that we use. One, of course, is just a stability assay, where we synthesize the analogs as phosphoramidites, incorporate them into DNA, and measure the thermal stability, because the midpoint where 50% of the
duplexes are melted, dissociated into single strands, is in a rather complex way related to the thermodynamic stability of the base pair. So we synthesize different base pairs, put them in the same sequence context, and measure the effect on stability that way.

The much more important SAR was driven by kinetics assays. In the early days it was all steady-state kinetics. We take a primer and a template, again made using phosphoramidite chemistry, and incorporate a specific nucleotide at a specific position in the template; the primer would typically run up right before it. Then we look at the ability of different DNA polymerases to take a triphosphate and extend that primer by incorporating the unnatural nucleotide triphosphate. We refer to that step as synthesis or incorporation, because we're actually making the base pair, or incorporating the triphosphate. The next step is unlike natural synthesis, because you now have a primer that terminates with an unnatural nucleotide. That step we refer to as extension: it's the step where you incorporate the next correct nucleotide off the now-modified primer terminus. These two steps we would evaluate independently. So, like I said, we synthesize a piece of DNA where the primer terminates here and look for incorporation, and then we would separately synthesize a piece of DNA where the primer terminates at the unnatural nucleotide itself and look at the incorporation of the next nucleotide. From this we would derive second-order rate constants and actually be able to quantify the differences between the different analogs.

By far the strongest SAR that came out of this first-generation study was that large aromatic surface area very much facilitates the incorporation step but makes the extension step very challenging. We were able to optimize this, but we
were never able to optimize that. So, in collaboration with Pete Schultz and with Dave Wemmer at UC Berkeley, we solved the structure of one of those first-generation analogs that I just described. Now, this PICS:PICS pair we refer to as a self pair, and I spent a lot of my time in the early days justifying the use of the self pair. If you think that's weird, fine; today our best pairs are heteropairs. But just to give you historical context, a lot of our early efforts were focused on self pairs, and this self pair happens to represent the most canonical of those first-generation analogs, which were synthesized very well but then terminated extension.

If you look at the structure shown here, you can sort of see why: they don't pair edge-to-edge. Maybe you look at these analogs and you expect that, but again, remember we didn't design these to pair with each other. These just came out of an empirical study of synthesizing lots of analogs and looking at all the possible pairs, and these self pairs are what came out as being the most promising. But they cross-strand intercalate, and in our mind's eye we believed that this explained the SAR that we observed. During the synthesis, the incorporation step, imagine this mode of binding also during replication in the polymerase active site: maybe this is the template and this is the growing primer. When this triphosphate is incorporated, it picks up all this beautiful packing interaction and gets out of water, which makes it happy, and that's what drives that very efficient incorporation step. But to the extent that the template is more locked down in the polymerase active site, with more interactions with the protein, any distortion required to pinch down the duplex to allow that intercalation to happen is going to be borne mostly by the primer terminus. That mispositions the hydroxyl group for the next step, the nucleophilic attack on the next incoming triphosphate, which is why the extension step was slow. So with
that SAR we returned to our design strategy and asked the question: if the large and hydrophobic analogs were prone to prevent extension, could we design smaller analogs that would not be prone to intercalate, optimize their incorporation step, and then have a pair for which we might be able to simultaneously optimize both steps? So we again returned to a very med-chem, empirical approach where we synthesize lots of analogs. Of course I'm not showing the sugar and the phosphate, for those mathematicians in the audience who maybe didn't know that something else was attached there. That was supposed to be a joke. But in any event, we were again systematically examining lots of different analogs. (Sorry. Okay, I'll try not to. You could maybe close your computer.) So, again, systematically examining lots of different analogs and systematically putting on different halogen substituents, different fluorine substituents, different methyl substituents, and again driving the program very much based on that empirical SAR.

Now, the SAR that came out of these second-generation analogs is a little more complicated, so let me spend a second to describe it. The single position on the nucleobase that was by far the most important was the position ortho to the glycosidic linkage. So this is the glycosidic linkage, whether it's a C-glycoside or an N-glycoside. This position was by far the most important. For the insertion step, the synthesis step where you're inserting the triphosphate against its cognate base in the template, it doesn't matter whether you're looking at the nucleobase in the triphosphate or in the template: you want that position to be hydrophobic. That makes sense, because we're trying to drive this packing interaction, this hydrophobic interaction, in the first place. Now, for the extension step, when you're looking at the nucleobase in the template you still want that ortho substituent to be hydrophobic, but the problem is that when it's in the
now primer terminus, you need it to be hydrophilic. The reason, and we should have known this from the beginning, is that if you look at any of the natural nucleobases, they all have an H-bond acceptor there at the same position, that ortho position. And if you look at structures of the primer-template bound to polymerases, polymerases always donate a hydrogen bond to help orient that primer terminus. So you need to be able to accept that H-bond or you're going to force a desolvation. This seemed like a potential physical-chemical contradiction: how could we simultaneously optimize both the substituent's hydrophobicity, to pack, and its ability to accept a hydrogen bond?

At the time I was fortunate to have a very talented graduate student and postdoc who simultaneously ran two screens independently of each other, two screens of 3,600 candidates each. I'm not going to go into this too much, for time, but one was just a gel-based screen where they looked only at the extension step, because that was the rate-limiting step for most of our analogs. The other was a much more sophisticated plate-based screen where we took two plates that were identical except that one got the unnatural triphosphate in addition to the natural triphosphates, whereas the other got only the natural triphosphates. We stained with SYBR Green, whose signal depends on the amount of double-stranded DNA present, and we looked for wells on one plate where there was strong signal but in the sister well on the other plate there was no signal. That assay bakes in extension, incorporation, and fidelity all in one. From those two independent screens of 3,600 candidate unnatural base pairs, we were pretty excited because both of them identified the same single unnatural base pair, and that's shown here: what we call MMO2, which is the second of a methyl-methoxy series that was a
second-generation analog, and this 5SICS, this 5-methyl sulfur isocarbostyril analog, which was a member of the first-generation analog series. Now notice the nature of the ortho substituent, given that contradiction I mentioned earlier. Sulfur at this position is more polarizable and more hydrophobic than oxygen, but it's still able to accept a hydrogen bond. And the O-methyl group is simply a bond rotation away from either accepting an H-bond or presenting a hydrophobic methyl group for packing. So, at least in our mind's eye, we imagined that this was how the unnatural base pair solved that contradiction.

This very quickly reinvigorated our design efforts, and within months we had identified this naphthyl methoxy derivative, NaM, as being a better partner for 5SICS. This was the first pair that we could really PCR amplify with high fidelity, and I'll talk about that in a second. Since its discovery we have found TPT3 to be a better partner for NaM, and that pair will come back at the end of my talk to be important, but I'm going to spend most of the rest of the talk on this NaM-5SICS pair.

At this point we could no longer use steady-state kinetics to drive SAR, because the unnatural base pairs are synthesized virtually as fast as an A-T base pair. It's not because they're actually chemically equivalent; it's
because they're both rate-limited by product dissociation. It's just a limitation of steady-state kinetics: you only measure your rate-limiting step. All that tells us is that we had now increased the efficiency of the chemistry step to the point where it was no longer rate-limiting for turnover of starting material into product. My lab has recently got a rapid stopped-flow injection system, so we're going to be doing pre-steady-state kinetics to get at those numbers more directly. But even so, that will never be fast enough to drive SAR, because it's a rather time-consuming assay.

So we developed another assay based on just sequencing. We would take a PCR reaction, take the amplicon, and give it to our sequencing facility, and they would put it into a standard Sanger sequencing reaction. Of course that Sanger sequencing reaction doesn't have our unnatural triphosphates in it, so what you see is an abrupt termination at the position of the unnatural base pair. If you take the ratio of the amplitude of the chromatogram peaks prior to the unnatural base pair to those after, and you construct calibration curves from known mixtures of DNA synthesized with the unnatural base pair and DNA synthesized with an A-T replacing it, you can convert that ratio into a percent of unnatural base pair present. You normalize that by the amplification level during the PCR reaction, and that gives you a fidelity, and that's the fidelity that I'll talk about now.

We incorporated the unnatural base pair into a lot of different sequences, looking at G-C-rich and A-T-rich sequences to see if there was a sequence bias. Amplification levels were pretty high, and the fidelities per round of replication were pretty high, even in sequences where there were two right in a row. Now, in this case there's an "approximately" here, because what we're looking at in the assay is a drop on a drop, so that just becomes a
little bit difficult to actually rigorously characterize. I should mention that these little sequences are of course part of a 180-mer; I'm simply not showing you the rest of the sequence. We even embedded the unnatural base pair within a random sequence, where each of the three nucleotides on both sides was randomized. What we're trying to do here is ask: are there any sequences that are particularly prone to lose the unnatural base pair? Because then they would have an amplification advantage, and we would see this erode. That did not seem to be the case, so we were pretty enthusiastic.

But in order to examine that a little more carefully, we incorporated the unnatural base pair into a chemically synthesized piece of DNA and amplified it 10^24-fold, diluting it out a million-fold three times. During that PCR amplification, of course, what happens is that some of the unnatural base pairs are lost, so you produce a population where they've been replaced with a natural pair and a population where they're retained. In order to differentiate them, we put them through one more round of PCR where one of our analogs is attached to a biotin tag. That produces two populations: one that has a biotin tag, which corresponds to the population that retained the unnatural base pair, and one that is no longer tagged, which corresponds to the population that lost the unnatural base pair. We can then take those populations and subject them to Illumina high-throughput deep sequencing. We actually ran the whole thing in parallel with a natural sequence with an A-T present, and at each of the three dilutions I mentioned we took out an aliquot. All of those populations were analyzed with a minimum of 1.6 million reads, so the statistical analysis is pretty reasonable.

So here's the single-nucleotide frequency data. What this F_rel − 1 is: F_rel is the frequency in the population
that had the unnatural base pair relative to the frequency in the control population that did not have the unnatural base pair. So this is the F_rel − 1 of an A at these positions relative to NaM, from the −1 position to the −10; this is the F_rel − 1 of A at positions +1 to +10; and correspondingly the same for C, G, and T. This is the population that retained the unnatural base pair as a function of amplification, and this is the population that lost the unnatural base pair as a function of amplification. The reason we used F_rel − 1 is this: if F_rel itself is greater than 1, you have a bias for that nucleotide at that position, and if it's less than 1, you have a bias against that nucleotide at that position. F_rel − 1 simply makes it visual, in that any value that's positive means you have a bias for, and any value that's negative means you have a bias against.

So clearly here's our largest bias. You can see that as you amplify out to 10^24-fold, immediately 5′ to NaM in the template you see a preference for C. That's the largest single-nucleotide bias that we have. To gauge how important that bias was: a C at that position was present at 18.7% of the initial population. Now of course this should have been 12.5%, but these are the vagaries of phosphoramidite coupling efficiencies during chemical synthesis. Nonetheless, over 10^24-fold amplification that 18.7% only grew to about 24%, a pretty small increase. Now, single
nucleotide biases aren't enough to give you the whole story, because sequence correlations can hide biases. For example, if you take G, C, A, and T and cyclically permute them, so GCAT, CATG, and so on, and the population consists of only those four sequences, every nucleotide would be present at every position at only 25%. It would look totally unbiased, but of course it's highly biased, because there are only four sequences. So correlations hide biases that are not apparent at the single-nucleotide level.

So we did a correlation analysis. This is the population that retained the unnatural base pair, and this is the population that lost it. This is just a measure of the correlation: we're looking at a matrix, a plot that maps the sequence against itself, so this peak here, for example, measures the correlation between here and here. What this tells you is that, as you amplified, in the population that retained the unnatural base pair a correlation grew in between the −2 and the −1, and between the +1 and the +2, so the flanking dinucleotides. The same holds in the population that lost it: there were correlations between those flanking nucleotides, but those were the only correlations. When we first got this data we were a little confused, because in this population the correlation seems to grow in, then grow out, and then grow back in again. I'll give you the answer: these correlation values are so small that what we're looking at is random fluctuation, just noise. But nonetheless, what this data tells us is that if we're looking for additional sequence biases, we only need to consider the flanking dinucleotides.

So we did an analysis of the dinucleotide frequency, this time plotted on a unit circle where F_rel − 1 is shown here. A positive bias for something would correspond to a bubbling out of the data, and a bias against would correspond to a collapse inside the unit circle. The only significant bias that we have, or at least the largest bias we had, is right here.
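[Editorial illustration, not part of the talk.] The cyclic-permutation example in the passage above can be worked through in a few lines of Python. This is only a sketch under stated assumptions: the control population is taken to be all 256 four-letter sequences in equal proportion, and "correlation" is approximated as the joint frequency of two positions minus the product of their single-site frequencies, since the talk does not specify the measure actually used. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def site_freqs(seqs, pos):
    """Frequency of each base at one position of the population."""
    counts = Counter(s[pos] for s in seqs)
    return {b: counts[b] / len(seqs) for b in BASES}

def f_rel_minus_one(selected, control, pos):
    """F_rel - 1 per base: positive = bias for, negative = bias against."""
    sel, ctl = site_freqs(selected, pos), site_freqs(control, pos)
    return {b: sel[b] / ctl[b] - 1 for b in BASES}

def pair_excess(seqs, i, j):
    """Assumed correlation measure: joint frequency at (i, j) minus
    the product of the two single-site frequencies."""
    joint = Counter((s[i], s[j]) for s in seqs)
    fi, fj = site_freqs(seqs, i), site_freqs(seqs, j)
    return {(a, b): joint[(a, b)] / len(seqs) - fi[a] * fj[b]
            for a in BASES for b in BASES}

# Assumed unbiased control: every 4-mer exactly once.
control = ["".join(p) for p in product(BASES, repeat=4)]
# The four cyclic permutations from the example.
selected = ["GCAT", "CATG", "ATGC", "TGCA"]

single = [f_rel_minus_one(selected, control, pos) for pos in range(4)]
max_single_bias = max(abs(v) for d in single for v in d.values())
max_pair_excess = max(abs(v) for v in pair_excess(selected, 0, 1).values())

print("largest single-nucleotide |F_rel - 1|:", max_single_bias)
print("largest pairwise excess at positions 0 and 1:", max_pair_excess)
```

Every single-position F_rel − 1 comes out as zero, so the population looks unbiased one nucleotide at a time, while the pairwise term is clearly nonzero. That is the sense in which a correlation analysis is needed on top of the single-nucleotide frequencies.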
So this is in the population that retained the unnatural base pair, and it's a bias for a 5' dinucleotide of CG. Now this largely, I mean, maybe the CG bias is a little larger than CC and CT and CA, but CG is probably the largest; this largely, of course, simply
corresponds to this bias for a C at the 5' position. And if you look at the actual numbers again, a CG was present in 2.3 percent of the population before the amplification, and it was only present at 3.5% after a full 10^24-fold amplification. So we actually stated in the paper, and the reviewers allowed us to state, that this was, in vitro, a functional third base pair, because these sequence biases are actually less than some of those observed amongst natural sequences. So at this point we believed that we had demonstrated a fully functional unnatural base pair, so maybe now was the chance for us to charge into our in vivo long-term goal of trying to use this as the basis of an organism with an expanded code. But, again returning to the medicinal chemistry analogy, if you went to a medicinal chemist and said, here's a compound that I have, I'd like to start a development program, so pull down 20 million dollars from your pharma company and let's start the program, what the medicinal chemist will ask you is, well, what's the target? And you'll say, well, who cares, I've got this great activity. And the medicinal chemist will never be interested; you'll never get pharma interested with that data. The reason is that the pharma industry has been fooled too many times by ghosts, things that vanish as you try to track them down. So what they want to know at the beginning of a program is: what's the target, and can you understand the mechanism of action? Is it a reasonable thing, is it understandable? The reason this was an issue for us is that we got a structure of our new base pair, this 5SICS-NaM pair, in collaboration with Tammy Dwyer at the University of San Diego. Here's an overlay of 10 NMR structures; here's the average structure. They're intercalated again, not as much as the first-generation analog was, but they're still intercalated a little bit, and this was a real moment
for us. We sort of realized our pairs are never going to do this; there's nothing to be had from doing this. They're not getting any H-bonds, so the duplex is going to do whatever it has to do, sort of unwind and open a little bit, and allow those nucleobases to pack on each other, which is the only route to stability they have. Now in principle that doesn't bother me, but the reason it does bother me is the question of why. I just tried to tell you that they're replicated well by different polymerases, but everyone knows that polymerases evolved to recognize a Watson-Crick pair. This looks like a mismatch; if you're familiar with it, this looks like an A-A zipper-motif mispair, where the A's interdigitate amongst each other, and everyone knows that polymerases evolved to select against them. So are we chasing a ghost? To tell you a little more carefully what I mean by that: if you look at a GC or a CG or a TA or an AT, it doesn't matter, they all form the same structure, the very famous Watson-Crick structure. And it doesn't matter whether it's formed in duplex DNA or formed during replication by pairing a triphosphate against a templating nucleobase: they all look like this, they're all planar, and they all take on this very canonical Watson-Crick structure. Again, ours doesn't look like that; ours looks like this intercalated structure that polymerases are supposed to select against. So think of how a polymerase works. The DNA polymerase looks like a right hand; the template lays down like this, and when a triphosphate binds, and only when the correct triphosphate binds, it induces a large conformational change of the fingers domain down over the palm and thumb domains. Now, that conformational change is supposed to result in a very tight closed complex that rigorously selects for the structure of a Watson-Crick pair. So we had two questions when I approached my friend Andi Marx about collaborating. Andi solves crystal structures
amongst other things, and we decided that we would try to solve structures of a polymerase synthesizing our unnatural base pair. We had a couple of questions. Number one: was formation of our hydrophobic pair sufficient to drive that same conformational change of the polymerase? And number two: if it was, what is it recognizing? So here are just the key structures. Here's the binary complex of the primer-template bound to the polymerase; the only part of the polymerase I'm showing is the O and O1 helices, which are the base of that fingers domain that I mentioned. Here's the binary complex and here's the ternary. In the binary complex, NaM, our analog in the template, is flipped out of the developing duplex. Now, when we add our triphosphate and solve the structure of the ternary complex, what you see is that our NaM, that analog in the template, flips back into the developing duplex, and you get this large conformational change. To see that conformational change, I'm overlaying the binary and the ternary structures here: there's that conformational change of the fingers domain. Now, to convince you that it's exactly the same as the conformational change induced by a natural base pair, this is an overlay of the ternary complex synthesizing a GC pair and the one synthesizing our unnatural base pair. At the secondary-structure level they're absolutely superimposable, and if you actually look at the side chains, and even the bound waters and magnesium ions, they're absolutely superimposable. So, the first question: the unnatural base pair triggers that exact same conformational change of the polymerase. Now the second question: what is it recognizing? If your eyes are good you can already see, and if they're not, if they're like mine, then let me help you. That was a good day in my lab. A natural base pair is replicated with an induced-fit mechanism: its formation drives a large conformational change in the polymerase. Our unnatural base pair is replicated by a different but
only subtly different mechanism, a mutual induced-fit mechanism: its formation drives a large conformational change of the polymerase, but that conformational change in the polymerase drives a conformational change in the base pair. I don't think that if we had tried to develop an unnatural base pair based on other forces that are much more directional, like hydrogen bonding or ionic forces, it would have had both the strength and the plasticity to adapt to the polymerase active site. So, having hopefully convinced you that the unnatural base pair is well replicated and that we understand the mechanism, that it's not some weird artifact, with that we were enthusiastic enough to advance to our long-term goal, and that is trying to deploy the unnatural base pair in an organism as the basis for an expanded genetic alphabet and eventually a genetic code. Returning to this image of my Gram-negative cell, we were immediately confronted with a challenge, and that's this: how do we get our triphosphates into a cell? If you look in the literature, it will give you a couple of different suggestions. I don't have time to go into any of that; if anyone wants to challenge me on why, in order to have a semi-synthetic organism, you'd have to have it be able to import and synthesize the unnatural triphosphates or whatever, we can talk about that later. But none of those strategies worked, and the strategy that finally did work for us was based on noting some published literature. That literature made the following observation: there are a variety of genetic elements that autonomously replicate. These genetic elements are the genomes of several intracellular bacteria, for example some chlamydial species, as well as the genomes of several organelles, like mitochondria and chloroplasts. And the property they have is this: they autonomously replicate, but they don't encode the machinery of
triphosphate synthesis. Instead, I mean, these genomes are bathing in another organism that has all these nutrients already available, and it's a well-known thing that what those genomes do is minimize and scavenge. The genomes that do not encode the machinery of triphosphate synthesis instead encode dedicated nucleoside triphosphate transporters and just steal the triphosphates from their host environment. So we got really excited about that. We thought, well, maybe some of them would be useful for us, and in fact we found several that actually imported G, C, A or T, deoxys and ribos. So we got very enthusiastic about that and wrote around the world and requested these genes, and of course the idea was that we would express them in bacteria and they would facilitate the uptake of our unnatural triphosphates. I've got to be honest, this is not the transporter, nor are these the triphosphates; this is just an image I stole off a wiki or the internet someplace. But that's the idea. So, putting it a little more
specifically: here's the cytoplasmic membrane. As for the outer membrane, we imagined the unnatural triphosphates would get through because they're hydrophilic; they can diffuse through porins. Then, once in the periplasm, we imagined that one of these transporters, expressed from a plasmid, might facilitate their uptake. So we got a variety of these transporters, as I mentioned, and the second, from a chlamydial, um, an algal species, called PtNTT2, worked very well when we assayed it with radiolabeled ATP. So it worked in our E. coli cells, and then the question was: would it function to import the unnatural triphosphates? So here's
cytoplasmic preps, so we're only looking at cytoplasmic composition, and this data results from the addition of alpha-labeled, 32P-labeled dATP. You can see this is the amount of triphosphate that gets into the cell, this is the amount of diphosphate present, and monophosphate, and then the nucleoside is dark because it's an alpha-phosphorus label, so you just don't see it. This immediately tells you that the triphosphate of A is getting in, but it's being decomposed. This is probably just the natural life cycle of triphosphates within the cell: they're being built up, they're being brought back down. But of course what we were really excited about is that it also acted to import both the triphosphates of 5SICS and NaM. Once within the cell you still see decomposition, just like we did with A: there's the triphosphate, there's the diphosphate, there's the monophosphate. Now, this is an HPLC assay, so we can actually see the free nucleoside, and you see it does go all the way down to the free nucleoside. And this is for NaM. So, a couple of things. Number one, it is being decomposed once within the cell, but it's not being decomposed particularly faster than the natural triphosphate. And number two, these triphosphate levels are, I think it's 30 to 70 micromolar, and those were steady-state levels that persisted for hours after addition of triphosphate to the media; that's just the level where import is being balanced by the degradation. The important point is that those triphosphate concentrations are an order of magnitude above the K_M that we measured in vitro for different polymerases. So we imagined that, despite the accumulation of all this other stuff, these triphosphate levels might be sufficient to support our first shot on goal. So we constructed two plasmids, one we call an accessory plasmid and, all right, now, the
accessory, we're naming it the accessory plasmid because that's the plasmid from which we imagined eventually expressing things like orthogonal synthetases and all the accessory machinery, but right now all it has is that transporter that I just described. And the information plasmid: we call it the information plasmid because it has an unnatural base pair. Now, the information plasmid is nothing more than pUC19 (I think we've got a pUC19 fan here), where the only difference between pUC19 and pINF is that a single A-T was replaced with an unnatural base pair. Otherwise it's identical to pUC19. Now, the unnatural base pair: remember the pair I told
you about, this NaM-TPT3. It's actually replicated a little bit better than 5SICS-NaM, but we hadn't validated it anywhere near the level that we had validated 5SICS-NaM, for things like efficiency, replication biases and structure. So we were definitely going to take our first shot in vivo with 5SICS-NaM, but the plasmid that we constructed, we constructed synthetically with the NaM-TPT3 pair. Now, that'll come back to be important in a few minutes. But since pINF has this pair, we envisioned that the first round of replication would just immediately replace TPT3 with 5SICS, if 5SICS and NaM were the only ones that we supplied to the media. So here's the experiment:
transform E. coli with that pACS plasmid and induce production of the transporter, then add your unnatural triphosphates, and then transform with the plasmid containing the unnatural base pair. Give it a little time, then recover the plasmid and determine what the fate of the unnatural base pair was. The important controls are: transform with pUC19 instead of the plasmid containing the unnatural base pair, don't induce the transporter, or don't add the unnatural triphosphates. This is the first data that we actually got from the graduate student who ran this experiment. What we're showing here is isolation of pINF after 15 hours, which corresponded to 22 doublings of the E. coli and a 10^7-fold amplification of pINF. Just like that trick that I showed you earlier, the graduate student took the plasmid recovered from the cells at this point and amplified it by PCR with an analog that was tagged with biotin; that separated it into two populations, and we could differentiate them with the streptavidin supershift. So here's the gel. When you have pUC19, the template that doesn't have the unnatural base pair in it, it doesn't matter whether you add the triphosphates or induce the transporter: you don't see a shift. That's important, because it tells you the unnatural base pair is not randomly inserting throughout the genome. Now, when you have the plasmid containing the unnatural base pair but you don't induce the transporter (the transporter is under IPTG control), you see no shift. When you do induce the transporter but you don't provide the triphosphates, you don't see a shift. Only when you have the unnatural base pair in the template, you provide the triphosphates, and you provide for their uptake do you see a shift. Now this shift, based on calibration curves similar to the ones I showed you earlier, was a
shift that implied a fidelity greater than 99.7% per doubling. So we were pretty excited about that, but of course we wanted to be rigorous, and we wanted a separate, independent assay to demonstrate retention of the unnatural base pair. So we just used the same trick I showed you earlier, the sequencing trick, where we took the plasmid, PCR-amplified it, this time without a biotin tag, and then subjected it to Sanger sequencing, and there you see the truncation, the abrupt termination at the position of the unnatural base pair. So we can actually go into this chromatogram and read what the sequence was, and of course it was exactly where we put it; it wasn't at some new place. Now, we thought we had clearly, unambiguously demonstrated retention within a dividing E. coli cell, so we wrote a paper up and submitted it, and two of three reviewers agreed with me. One didn't. There were two steps the reviewer didn't like: he didn't like this step and he didn't like this step. He didn't like the PCR steps, because he thought we were somehow introducing an artifact; he wanted us to take the plasmid directly out of cells and analyze it. I got these comments back from the journal, I think it was on the 21st of December, and I freaked out a little bit, because to me that screamed mass spec, and we don't do mass spec in my lab. So I shot out an email to this group at NEB. The reason is that this group had been developing an LC-MS/MS method, which they had just published a couple of papers on, where they were able to demonstrate the presence of an epigenetic modification in a plasmid to the sensitivity of one methyl group in a plasmid. So I thought, well, if they can do that, they can probably analyze the retention of our unnatural base pair by the same technique. So I contacted them, I think it was on the 23rd, and I think it was the 28th that I got an email back from them,
from Ivan Corrêa, and he said: OK, we bought your unnatural triphosphates, they're commercially available, here's a work plan, and we're going to be done in two weeks. Anyone who's run an academic group and tried to manage collaborations knows how hard this can be. I was like, OK, that's good, maybe. And they weren't able to do it in two weeks; it took them about five weeks. I can't say enough for this collaboration. It was one of the most enjoyable collaborations that I've ever participated in, and I owe them a lot. Here's the data. Here's the LC-MS/MS trace: here's dC content, dG content, dA content; dT is down here because it doesn't fly well on their mass spec, which was known. You see all the same control experiments; they're all on top of each other. And out here in the trace is this little peak for d5SICS. That peak amplitude corresponds to just what you'd expect for one per plasmid, and importantly it's the mass of 5SICS, not TPT3. The only place that 5SICS was provided was as a triphosphate to the media. This is unambiguous evidence for in vivo replication of the unnatural base pair within the cell. The third reviewer even bought it. So this actually is a picture of the organism. Since the last common ancestor of all life on earth it's been four letters, two pairs. This organism is stably, healthily, happily growing while maintaining six letters and three pairs. So we are continuing to optimize all
aspects of the system. We hadn't synthesized a lot of analogs, so we synthesized a lot more of these, it turns out, NaM analogs, and then we ran another screen, this time a screen of 7,000 candidates because we had more. We actually identified a family of eight, best represented for example by this pair I already showed you, NaM-TPT3, but all eight of these analogs are replicated in vitro better than 5SICS-NaM, which I hope I just convinced you is good enough to replicate in a cell. This again is like a med chem principle. If you have one molecule and you can't modify it and you go into a med chem program, well, most molecules drop out of development not because of affinity but because of PK, pharmacokinetics: they have toxicity, solubility problems, off-target activity, whatever. It's hugely advantageous to have a panel, a set of molecules with different physicochemical properties, in order to aid development. So we're excited to have the flexibility of having all of these analogs with slightly different physicochemical properties. Now, the unnatural triphosphates don't make the cell sick, but expression of the transporter does. This is not uncommon for membrane proteins, because you're forcing the membrane to accommodate all of these transporters. That's why we put the transporter on a high-copy plasmid: we thought there was selection pressure against retaining it, and if it were single copy, any mutation that came up that would delete it, suddenly the cells would grow better, they would dominate the population, and right in the middle of an experiment we'd lose the ability to import the unnatural triphosphates. So a graduate student joined my group and asked the question: could we derive a mutant of the transporter that was not toxic? And he actually was able to successfully do that. Because it's no longer toxic, we took it off
the plasmid and integrated it into the chromosome. We optimized a whole bunch of different constitutive promoters; the one that was best was this P_lacUV5. It imports beautifully. Here's the growth rate: this is just an amplitude difference, which just comes from the initial inoculation; the slope is what the growth rate is. Here you see the initial transporter system that we developed, and this plateauing here is the toxicity that I was referring to. This slope now shows that this bacterial cell, bearing that single mutant, chromosomally located transporter, is growing at an identical rate, and it imports identically. We've now done this experiment many different times, and what's exciting about this is that it takes one moving part off the table. We don't have to add something to induce the transporter's production; our cell line now is always competent to take up the unnatural triphosphates, so we can just take it out of the freezer and run an experiment. You don't have to transform with that first plasmid, and we don't have to worry about timing, inducing the transporter and giving it enough time. So we're excited about that kind of automation, and this is the kind of optimization that we anticipate applying to all facets of the semi-synthetic organism. OK, so in the last, two minutes? OK. So all I showed you was retention. The question I always get is, well, what about the next step, what about retrieval? So we put the unnatural base pair, this time instead of avoiding a gene, right in the middle of superfolder GFP, behind a canonical T7 promoter and in front of a canonical factor-independent termination sequence. This time the experiment is to try to propagate the unnatural base pair within this plasmid, now importing all four triphosphates, the deoxys and the ribos of 5SICS or whatever is in the unnatural pair, the unnatural triphosphates, and
then induce transcription with IPTG, induce the transporter, and then collect the RNA and analyze it. So understand what the unnatural base pair has to do to survive this assay. It has to stably replicate; the transporter has to bring in all four triphosphates; it has to survive transcription into message; we then lyse the cells; it has to survive notoriously error-prone reverse transcription back into DNA, and then survive PCR amplification. And it's right there, only where you expect it to be. We've now transcribed lots of different messages. Surprisingly, or maybe not surprisingly, tRNAs transcribe better; they're probably structured and prevent Rho-dependent termination. But now we can stably and efficiently transcribe, and so the next step is to begin to combine those two to try to look at decoding at the ribosome. So with that, it remains only to thank my group: the students who worked on the project, Michael Ledbetter and Yorke and Aaron (where's Aaron?), our graduate students, and postdocs Brian and Ailee. And then I mentioned these collaborations, and again the NEB group, for collaborating and helping us out when we needed it the most, and these agencies for funding. And I thank you for your attention.
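Two of the quantitative arguments made earlier in the talk lend themselves to a quick back-of-the-envelope check: how a per-doubling fidelity translates into overall retention of the unnatural base pair, and how much of a polymerase's maximal rate is reached when the triphosphate concentration sits an order of magnitude above K_M. The sketch below is not the speaker's analysis. It assumes that retention is simply fidelity raised to the number of doublings and that the polymerase follows simple Michaelis-Menten kinetics, and all function names are hypothetical.

```python
from math import log2


def doublings_for_fold(fold_amplification):
    """Number of doublings that corresponds to a given fold amplification."""
    return log2(fold_amplification)


def retention_after(fidelity_per_doubling, doublings):
    """Fraction of plasmids still carrying the pair (assumes retention = fidelity ** n)."""
    return fidelity_per_doubling ** doublings


def fidelity_from_retention(retention, doublings):
    """Inverse of retention_after: per-doubling fidelity implied by a measured retention."""
    return retention ** (1.0 / doublings)


def fractional_saturation(substrate_conc, km):
    """v / Vmax under simple Michaelis-Menten kinetics (an assumption, not from the talk)."""
    return substrate_conc / (km + substrate_conc)


if __name__ == "__main__":
    # A 10^7-fold amplification corresponds to roughly 23 doublings.
    print(doublings_for_fold(1e7))
    # 99.7% per doubling over the 22 doublings quoted in the talk.
    print(retention_after(0.997, 22))
    # Substrate at ten times K_M (arbitrary units).
    print(fractional_saturation(10.0, 1.0))
```

Under these assumptions, a fidelity of 99.7% per doubling over 22 doublings still leaves roughly 94% of plasmids carrying the unnatural base pair, and a substrate level ten times K_M puts the enzyme at about 91% of its maximal rate.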