How CPython parser works, and how to make it work better

Transcript
Few people know this, but this talk is about some work that I did with my previous employer. When I left, they asked me to remove their name from the slides, so I'm here with no business affiliation. Unlike probably most talks at this conference, this one is not about writing stuff with Python: it is about Python itself, specifically about CPython, which is the reference implementation, and even more specifically about the parser, which is one part of this implementation.

So what is the parser? When I typed "python parser" into an image search, this is the picture that came up. But actually the parser is the first stage of the interpreter: it converts the source code into a syntax tree, which the compiler then compiles into bytecode, and the bytecode is what the interpreter actually runs.

Now, Python, like any mature programming language, has a formal grammar, which looks something like this. The Grammar file formally defines the productions; it starts with the symbols for the three different kinds of input. Every line defines a new nonterminal symbol, and the notation here is a bit more complicated than the BNF notation that grammars are usually written in: it is more like regular expressions, so we have "+" modifiers and "*" modifiers for repetitions, as well as parentheses for grouping.

The other ingredient is the header file with the definitions of all the tokens. It has 57 types of tokens, numbered 0 to 56, and the numbers above 256 are reserved for nonterminals, which are the ones defined in the grammar. Then we have a parser generator which, based on the formal grammar and the token definitions, produces a file which looks like this. It is just a definition of static data; it doesn't have any code in it. All it does is define a DFA (deterministic finite automaton) for every nonterminal that the grammar defines. Every DFA is defined as an array of states, every state is defined as an array of arcs, and every arc has a label and a destination state.
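As a quick aside (my own illustration, not from the slides): the numbering scheme described here is still visible from Python itself through the standard token module, where terminal symbols are small integers and nonterminal numbers start above NT_OFFSET.

```python
import token

# Terminal token types are small integers; numbers above NT_OFFSET (256)
# are reserved for the nonterminals defined in the Grammar file.
print(token.NT_OFFSET)             # 256
print(token.tok_name[token.NAME])  # 'NAME' -- the one symbol shared by
                                   # all keywords and identifiers
print(token.ISTERMINAL(token.NAME), token.ISNONTERMINAL(300))
```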
Sorry, I thought my screen was being duplicated, but it's not really... I'll have to scroll back. This is what the grammar looks like, with the productions whose right-hand sides look like regular expressions, and this is what the header file with the token definitions looks like.

So, to recap what I'm showing: each DFA is defined as an array of states, and each state is defined as an array of arcs going out of that state. The first item of each arc is the arc label, which is more or less the same as the symbol number, and the second item is the destination state. In fact, there is a special table which converts symbol IDs into arc labels, because symbol number 1 corresponds to all keywords and all identifiers, so we also have to look up which specific keyword a token corresponds to; each of those gets a different arc label number, even though they all share the same symbol ID. Near the very end of the file we have the definitions of the DFAs themselves, and for each DFA we also have the start set, which shows which tokens that specific nonterminal can start with.
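To make that data layout concrete, here is a hypothetical miniature of those tables in Python (the toy production and the label numbers are mine, not CPython's real tables): each DFA is a list of states, each state a list of (label, destination) arcs, and an arc label of 0 marks an accepting state.

```python
# Toy DFA for a production like:  pair: '(' ')'
# State 0 --'('--> state 1 --')'--> state 2 (final).
LPAR, RPAR = 7, 8          # hypothetical terminal label numbers
DFA = [
    [(LPAR, 1)],           # state 0
    [(RPAR, 2)],           # state 1
    [(0, 2)],              # state 2: the 0 arc marks it as final
]

def accepts(dfa, labels):
    state = 0
    for lbl in labels:
        for arc_label, dest in dfa[state]:
            if arc_label == lbl:
                state = dest
                break
        else:
            return False
    # Accept iff the state we ended in carries the 0 arc.
    return any(arc_label == 0 for arc_label, _ in dfa[state])

print(accepts(DFA, [LPAR, RPAR]))  # True
print(accepts(DFA, [LPAR]))        # False
```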
All this is sophisticated enough that I prepared a short demonstration of how it works, for a very simple Python program which looks like this. We parse it as "single input", so that's the topmost production, and accordingly the parse tree starts with just the node for single_input. The DFA which parses single_input is DFA number 0, which looks like this. In the initial state, which is state number 0, we have three arcs going out of it: the first arc, with label number 2, corresponds to the NEWLINE token; the next one to a simple statement; and the third arc to a compound statement. We are looking at the "if" token in the input, and according to the start sets we know that a simple statement cannot begin with "if"; it's not a newline either; but a compound statement can begin with "if". So we transition along the last arc: we add a compound-statement node to the parse tree, and before we actually transition to state number 2, we switch to the DFA which parses compound statements.

For compound statements we have many options, so in the initial state we have nine arcs going out of it, but only the first arc, which is for the "if" statement, can start with the "if" token. So we add an if-statement node to the parse tree, transition along this arc, and so on. Now we switch to the DFA which parses "if" statements. It is sophisticated enough that it didn't fit on the slide, because, as you can see, its definition is complicated by parentheses, repetitions, and optional parts. In the initial state we have only one arc, which with label 97 corresponds to the "if" token. Because it's a terminal symbol, we just add it to the parse tree, consume it from the input, and switch to the next state within the same DFA. This next state once again has only one arc going out of it, but this one is for the "test" nonterminal. The start set says that a test can begin with a number, which is what we have in the input, so we add a test node, switch to the DFA which parses tests, and so on; it goes on for an awfully long time before we actually get back into the same DFA.

Now we're in state number 2, which has one arc that consumes the colon token from the input and transitions to the next state. Once again we have only one arc, corresponding to a suite. The start set confirms that a suite can begin with the "print" token, so we add a suite node to the parse tree and switch to the DFA for suites; once again it takes an awfully long time before we get back here.

The interesting bit is now: we have three arcs going out of state number 4. The arc with label 98 corresponds to the terminal "elif", the one with label 99 to the terminal "else", and the next token in the input is the newline, so it's neither of these. But the arc labeled 0 means that this is a final state: if we didn't match anything else, we are simply done with this DFA and go back to the previous one. So we go back to the compound-statement DFA, where we ended up in a state that only has a 0 arc, which means that no matter what the next token in the input is, we are done, and we go back to the previous DFA. Now we are back in the DFA where we started. There is only one arc, with label number 2, which is NEWLINE, so we consume the newline from the input, add a node to the parse tree, and transition to state number 1, which is final. So we are at the end of input, and we are in the final state of the topmost DFA, which means that the parse has succeeded.
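The push/pop discipline just demonstrated can be sketched in a few lines of Python (a toy of mine with made-up symbol numbers and a two-rule grammar, not CPython's real tables): when an arc is labeled with a nonterminal whose start set contains the current token, the machine pushes that nonterminal's DFA; a 0 arc means the state is final, so the machine pops back to the caller's DFA.

```python
# Toy pushdown walk over two DFAs, mimicking the demo:
#   start: item NEWLINE        item: 'if' NAME
# Nonterminal labels are >= 256; terminals are small numbers.
NEWLINE, IF, NAME = 4, 97, 1
ITEM = 256
DFAS = {
    'start': [[(ITEM, 1)], [(NEWLINE, 2)], [(0, 2)]],
    'item':  [[(IF, 1)], [(NAME, 2)], [(0, 2)]],
}
NONTERM = {ITEM: 'item'}
FIRST = {ITEM: {IF}}     # start set: which tokens 'item' can begin with

def parse(tokens):
    stack = [('start', 0)]          # stack of (DFA name, current state)
    trace = []
    for tok in tokens:
        while True:
            name, state = stack[-1]
            arcs = DFAS[name][state]
            arc = next(((l, d) for l, d in arcs
                        if l == tok or (l in NONTERM and tok in FIRST[l])),
                       None)
            if arc and arc[0] in NONTERM:      # push the sub-DFA
                stack[-1] = (name, arc[1])
                stack.append((NONTERM[arc[0]], 0))
                trace.append('push ' + NONTERM[arc[0]])
            elif arc:                          # consume the terminal
                stack[-1] = (name, arc[1])
                trace.append('shift %d' % tok)
                break
            elif any(l == 0 for l, _ in arcs): # final state: pop back
                stack.pop()
                trace.append('pop')
            else:
                raise SyntaxError(tok)
    # Pop out of any DFAs left in a final state at end of input.
    while stack and any(l == 0 for l, _ in DFAS[stack[-1][0]][stack[-1][1]]):
        stack.pop()
        trace.append('pop')
    return not stack, trace

ok, trace = parse([IF, NAME, NEWLINE])
print(ok, trace)
```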
So what has just happened? We had a program which was 27 bytes long, tokenized into 7 tokens, and this is the parse tree which we produced for it: 64 nodes, which take about a hundred times as much memory as the source code itself. We can see that we have huge chains, a dozen nodes or so, each with a single child. All these chains appear because Python has a rather primitive parser which, given a grammar defined like this, builds the tree literally, instead of skipping all the steps that I have shown.

That's the important bit: the parse tree can take a hundred times as much memory as the source code. If I have a moderately big Python source file of, say, 30 megabytes, the parser would not even be able to load it, because its syntax tree would be too big to fit into memory. That was my motivation for all this work: I had a 30-megabyte file, and I couldn't possibly run it through Python.
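The exact sample program from the slides isn't recoverable from the transcript, but the token count for a similar two-line program is easy to reproduce with the standard tokenize module (the source string below is my own stand-in):

```python
import io
import tokenize

src = "if 1:\n    print(2)\n"   # a stand-in; the talk's exact sample isn't shown
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
print(len(src), "bytes ->", len(toks), "tokens")
for t in toks:
    print(tokenize.tok_name[t.type], repr(t.string))
```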
Let me show what this tree looks like. CST stands for "concrete syntax tree", and there is a standard module called parser which can show you the CST serialized as a structure of nested lists. Every list has two elements: the first one is the symbol ID, that is, 0 to 56 for terminals and something above 256 for nonterminals; the second item is the token string for terminals, or the subtree, another nested list, for nonterminals. The documentation for the parser module actually says that people shouldn't use it, because there is a better module called ast, which stands for "abstract syntax tree"; that is the second kind of syntax tree that Python uses. This example shows what the AST looks like for the same example Python code. You can see that it's much more compact; it's almost human-readable. And the point is that, unlike the CST, which reflects how the program is represented in the source code, the AST reflects what the program actually intends to do. It's actually the AST that the compiler runs on to generate the bytecode.
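That compactness is easy to see from Python today; here is a quick look at the AST of a similar two-line program (my own stand-in, since the slide's exact code isn't in the transcript):

```python
import ast

tree = ast.parse("if 1:\n    print(2)\n")
# The AST names what the program *means*: an If node with a test and a body.
print(ast.dump(tree))
print(type(tree.body[0]).__name__)   # 'If'
```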
This second kind of syntax tree also has its own formal grammar, and this is what that formal grammar looks like; it's in a different format. Here, each kind of AST node is defined together with what its children should be: how many, and what their types are. The funny bit is that there is no generator which takes this file as input: this file is just formal documentation, and the parser that produces the AST from the tokens is handwritten.
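Although no parser is generated from this file, the node shapes it declares do surface at the Python level: each ast node class carries its children's names in _fields. For instance, the grammar file declares If(expr test, stmt* body, stmt* orelse), and the node class exposes exactly those names:

```python
import ast

# The AST grammar declares: If(expr test, stmt* body, stmt* orelse).
print(ast.If._fields)      # ('test', 'body', 'orelse')
print(ast.BinOp._fields)   # ('left', 'op', 'right')
```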
So this handwritten parser is actually fifty-two hundred lines long. It's very old and crusty code, which dates from around 2005 to 2007; nobody touches it, nobody remembers how it works, and the general attitude is "if it ain't broke, don't fix it": it has been working for 10 years, so time has proven that it does work. And, once again, the AST for our sample code is a very simple structure of only 9 nodes, which I'm showing here.
This is what the second parser actually looks like. The example is the function that parses one type of node, and you can see that it has a loop organized with a label and a goto into the middle of the function, something that Dijkstra wouldn't approve of. What it actually does is this: for each node which only has one child, it knows that there is nothing interesting in this node, so there is nothing to do, and it just skips to the next one. We have seen that two thirds of the nodes in the CST have a single child; we don't do anything with them, we just skip to the next node. What I thought was: why do we have all these useless nodes occupying the memory, if we can eliminate them at the time when we generate the CST? That way we save two thirds of the memory right away, and the second parser also becomes simpler, because it no longer has to be concerned with skipping all these useless nodes. This is what an example of the compressed CST looks like: it's now small enough that it fits on one slide without scrolling. At the same time we have the modified function, which no longer has the awkward loop; in fact it doesn't have any loops at all. In each branch of the switch it can now assume that the node has more than one child, which means we can do something meaningful with this node, not just skip it.
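The chain-skipping idea can be sketched on the nested-list form of the CST (a simplified model of mine, not the actual C patch): any node whose only child is another node is replaced by that child, collapsing the single-child chains.

```python
def compress(node):
    """Collapse single-child chains in a nested-list CST.

    A node is [symbol_id, child, ...]; a terminal is [token_id, "string"].
    """
    # Skip through nodes whose only child is itself a node.
    while len(node) == 2 and isinstance(node[1], list):
        node = node[1]
    return [node[0]] + [compress(c) if isinstance(c, list) else c
                        for c in node[1:]]

# A chain 300 -> 301 -> 302 with the only useful node at the bottom
# (all symbol numbers here are made up for the sketch):
cst = [300, [301, [302, [1, 'x'], [22, '='], [2, '1']]]]
print(compress(cst))   # [302, [1, 'x'], [22, '='], [2, '1']]
```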
This shows the serialized CST as it was before and after my fix, and you can see that the CST has become so small that it almost becomes human-readable. The catch here is that the only purpose of the CST is to be traversed by the second parser to build the AST, and then it's discarded; the memory is never used again. So if we have a long-running script, such as a web service, that boots in a split second and then runs for hours and days, it doesn't matter how much memory the CST takes, because it's only there for the first split second. On the other hand, if we have a system utility which runs for a split second and finishes immediately, then the performance of this utility will be dominated by the performance of the parser. That was my case: my script would just look up a value in that 30-megabyte generated dictionary.

With my fix to the CPython parser, the memory consumption of my script was reduced by a factor of 3, so I was curious how much it would affect other standard benchmarks, and these are the results. One of them has shown an improvement by a third, a few of them have shown around a dozen per cent improvement, and nothing has become worse. But actually the benchmark package which I was using was deprecated in October last year, on the grounds that the results it shows are inconsistent and meaningless; I was not doing performance analysis of any deep sort, so I don't know exactly what was wrong with it, but it has been superseded by the "performance" package, and these are the results with it. Something got a little bit better, something got a little bit worse, but the big picture is that nothing changed. My intuition is that "performance" intentionally runs a few warm-up loops before it starts the timer, and this way it intentionally avoids timing the loading and the parsing of the source code. I didn't check inside the performance package, because it's sophisticated enough; but nevertheless, even this result confirms that my patch has no unintended side effects and doesn't break anything, while at the same time it enables a use case which may not be relevant to most Python programmers; but if I bumped into it, maybe other people would bump into it as well.
My desire was to share this fix with the community, to get it included into Python. The way this is done is that you open an issue in the bug tracker, because that's where all the discussions happen and where all the reviews happen; and you're allowed to ping the thread once a month if your patch is not getting enough attention. So I opened two issues in the bug tracker, and one of them actually got accepted and committed; it took me a three-month wait for a review. What that patch does is this. The second, handwritten parser knows that the input that it parses always comes from the first parser, so it's guaranteed to conform to the first formal grammar, and it doesn't have to validate that it conforms to it: it knows for sure that, for example, for the "if" statement, the first child is always the "if" token, and the second child is always a test. But the parser module can take an arbitrary serialized CST as input and run it through this second parser, and if the input doesn't conform to the formal grammar of Python, the Python binary may crash. So they had to implement the validation for this CST themselves, and they implemented it like this: I'm showing just the validation for three different types of nodes, but it's all more or less the same. They check that the number of children of the node and the types of the children are correct, and then recurse into each child node.
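That handwritten validation style can be sketched in Python over the nested-list CST (my own simplification; the real code in the parser module is C, and all the symbol numbers below are made up): one checking function per node type, each hard-coding the expected children and recursing.

```python
# Hypothetical symbol numbers for the sketch.
IF_KEYWORD, COLON, TEST, SUITE, IF_STMT = 97, 11, 305, 304, 297

def validate_if_stmt(node):
    # Hard-coded shape: if_stmt: 'if' test ':' suite  (ignoring elif/else)
    kind, children = node[0], node[1:]
    return (kind == IF_STMT
            and len(children) == 4
            and children[0][0] == IF_KEYWORD
            and children[1][0] == TEST
            and children[2][0] == COLON
            and children[3][0] == SUITE
            and validate_test(children[1])
            and validate_suite(children[3]))

def validate_test(node):    # each nonterminal gets its own checker
    return node[0] == TEST

def validate_suite(node):
    return node[0] == SUITE

good = [IF_STMT, [IF_KEYWORD, 'if'], [TEST], [COLON, ':'], [SUITE]]
bad  = [IF_STMT, [TEST], [IF_KEYWORD, 'if'], [COLON, ':'], [SUITE]]
print(validate_if_stmt(good), validate_if_stmt(bad))  # True False
```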
They call the validation function for each child node, recursively. When I was looking at this code, which has to be updated with every change to the grammar to match the updated grammar, I thought: maybe we can have it generated from the grammar; maybe we can write some DFAs which would traverse the CST to see if it's valid. And then what occurred to me is that we already have these DFAs: the same DFAs which parse the code. We can just run them on the CST instead of on the incoming stream of tokens from the input. This way I could eliminate twenty-four hundred lines of handwritten code with just 72 lines of code for running the DFAs, which we already have in CPython, and which are guaranteed to stay in sync with the parser, because they are what the parser runs.
But after three months in review, my other patch was rejected, with the reasoning, essentially, that it's so big and sophisticated that nobody has the energy to review it, and that after three months I shouldn't hold my breath for anybody to jump up and review it. And if nobody cares as much as I do about running a random 30-megabyte Python script through the interpreter, then it's a showstopper for me. Well, that is the motivation for why I'm here: maybe you can help with this. The issue number is 26415 in the bug tracker; take a look, and if you can review it and find that it's OK, it could get committed.

Now let me explain what my next idea was. If the only purpose of the CST is to get traversed by the second parser and then discarded, why do we have to store it in memory at all? Maybe we can generate one CST node at a time, handle it in the AST parser, and discard it immediately. That way, instead of improving the memory consumption by a factor of 3, we're improving it infinitely: the tree takes zero memory.
What the current parser actually does is this: it has several entry points, one for strings and one for file objects, and it passes the input to the parsing function, which runs a loop inside. The loop extracts one token at a time from the input, and passes each extracted token to the "add token" function. This token-consuming function maintains its state in the parser state structure, which, most importantly, has the stack of the DFAs, keeping track of the current state for each DFA, and a pointer to the CST root node; that's everything it needs to build the tree. At the end of input it returns the CST root to the caller, and the caller passes it to the second function, PyAST_FromNodeObject, which is essentially the second parser: it recursively traverses the tree, and for each subtree of the CST it generates the corresponding subtree of the AST. The important bit here is that the whole of the CST is available by the time the second parser runs, so it can analyze nodes out of order, it can analyze the same node repeatedly, and so on.
My suggestion is to replace PyAST_FromNodeObject with a different function, something like "PyAST_FromTokenizer", which would do more or less the same; but to get each CST node, it would call a "next node" function, which would take the parser state and all the parameters for the tokenizer, which it would just pass on to the tokenizer. This function would generate one node at a time in the output, meaning only the node type and the node string, but no children; and it would return a number indicating the position of the new node relative to the previous node: whether it's the first child of the previous node, or the next sibling, that is, another child of the same parent, or whether the previous node was the last child and we're going back up to the parent.

The difficult question is what we are going to do with the parser module, because we no longer have the CST in memory; there is nothing that could be serialized into the structure of nested lists. I'm not sure anybody is actually using the parser module, especially since it's documented as deprecated; all the analyzers that I have seen use the AST, because it's much easier to use. But with some effort we can make the parser module itself call "next node" recursively, to materialize the CST and serialize it into all these nested lists, and then to discard the whole thing, just for compatibility with the existing consumers. So, there are some difficulties here.
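The streaming protocol suggested here can be modeled with a Python generator (my own sketch of the idea, not the proposed C API): instead of materializing a tree, the first parser yields (event, symbol, string) triples, where the event says whether the new node is a first child, a next sibling, or whether we moved back up to the parent.

```python
# Events describing where the next node sits relative to the previous one.
FIRST_CHILD, NEXT_SIBLING, UP = 0, 1, 2

def walk(node, event=FIRST_CHILD):
    """Stream a nested-list CST as (event, symbol, string) triples."""
    sym = node[0]
    text = node[1] if len(node) == 2 and isinstance(node[1], str) else None
    yield (event, sym, text)
    children = [c for c in node[1:] if isinstance(c, list)]
    for i, child in enumerate(children):
        yield from walk(child, FIRST_CHILD if i == 0 else NEXT_SIBLING)
    if children:
        yield (UP, sym, None)     # last child done, back up to the parent

# Toy tree with made-up symbol numbers:
cst = [297, [97, 'if'], [305, [2, '1']]]
for ev in walk(cst):
    print(ev)
```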
The minor difficulty is that the first parser can no longer have its loop running inside; we have to change it into a generator-like function, and add all the loop state into the parser state structure. But the main difficulty is that we have to rewrite all the fifty-two hundred lines of the second parser in this style. This is an example of one function rewritten in this way: the parser for slice nodes. Slices are when you have square brackets with 1, 2, or 3 expressions separated by colons inside them, and you can see that the old parser knows all kinds of interesting things about its arguments, while the new parser can only check each node once, and it has to go over the nodes in order, which actually simplifies the code. The question is: it would obviously take some time to rewrite the fifty-two hundred lines this way, maybe several weekends, maybe many weekends. Do we want to do this at all? And if, for example, it takes me several weekends to do it, will there be somebody who wants to review that patch, including the rewritten fifty-two hundred lines of code? That's actually all I have, so...
...I put my questions onto the slide. And when I searched for an image to close with, this is the image that came up. Thank you very much. Now, questions?
Q: When you had the problem tokenizing your 30-megabyte source file, what was the limit that you hit? Did you simply run out of memory, or was there some internal structure of a static size?

A: Yes. What I was doing was on my laptop, which only has 2 gigabytes of memory; if I had more memory, I would have had more space, and it would not have been a problem. And even if in the future the sources get larger than they are, it's not going to be a problem for hardly anybody; but if you want to run Python on your phone, it may well be a problem there.

Q: Rather than rewrite all the conversion from the concrete parse tree to the abstract syntax tree, is it not possible to have some variant of the grammar definition generate that code? Because it does look awfully repetitive.

A: I agree that it does look repetitive, and I can actually show this. Most grammars, yacc grammars for example, include a piece of code with each production, which the parser generator inserts into the parser; maybe something similar can be done with this Grammar file, to help generate the parser. But this would change the syntax of this file, which is something formal, and somebody outside of the CPython core may rely on its exact structure. So yes: keeping the same structure of the grammar, it's not possible; if we give ourselves the liberty to rewrite the grammar, then it would be possible, but we may be breaking somebody else's code.

Q: One comment: CPython moved from Mercurial to git five months ago, so all the hg links are now five months out of date. Python is now on GitHub, and has been for five months, so you're now looking at outdated copies.

A: Actually, it's older than five months, because this work has been going on since a year ago; but I'm sorry about the outdated links.

Host: Are there any more questions? Then let's thank the speaker again.

Metadata

Formal metadata

Title: How CPython parser works, and how to make it work better
Series: EuroPython 2017
Author: Skrobov, Artyom
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, adapt, copy, distribute and make publicly available the work or its content, in original or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and pass on the work or content, including in modified form, only under the terms of this license.
DOI: 10.5446/33817
Publisher: EuroPython
Release year: 2017
Language: English

Content metadata

Subject: Computer Science
Abstract How CPython parser works, and how to make it work better [EuroPython 2017 - Talk - 2017-07-12 - PythonAnywhere Room] [Rimini, Italy] The part of CPython core that parses the Python source code is some very old and convoluted code: the time has proven its reliability, but few CPython hackers understand (or care) how it works, or even what exactly it does. There is, however, a good reason to care: for short-running scripts, the performance of CPython may easily be dominated by that of parsing the source code. The talk will describe the two parsers that are involved, it will explain how these two parsers build two different kinds of syntax trees, and then show how the structure of one of the trees can be amended to reduce its memory footprint threefold, with only minor changes necessary in its consumers. It will also suggest other, more invasive improvements, which can yield even better savings. The talk will assume fluency in C and a basic acquaintance with CPython core internals, and will give the attendees an introduction into hacking the parser, guiding their way through to the very tangible end result of reducing Python overall memory consumption by up to 30%, measured at standard micro-benchmarks
