How CPython parser works, and how to make it work better

Video thumbnail (Frame 0) Video thumbnail (Frame 647) Video thumbnail (Frame 1552) Video thumbnail (Frame 2522) Video thumbnail (Frame 4129) Video thumbnail (Frame 5516) Video thumbnail (Frame 7276) Video thumbnail (Frame 8633) Video thumbnail (Frame 9517) Video thumbnail (Frame 10290) Video thumbnail (Frame 11321) Video thumbnail (Frame 11950) Video thumbnail (Frame 12660) Video thumbnail (Frame 13266) Video thumbnail (Frame 14230) Video thumbnail (Frame 15415) Video thumbnail (Frame 16536) Video thumbnail (Frame 17196) Video thumbnail (Frame 17903) Video thumbnail (Frame 18722) Video thumbnail (Frame 19340) Video thumbnail (Frame 20358) Video thumbnail (Frame 21085) Video thumbnail (Frame 22483) Video thumbnail (Frame 23372) Video thumbnail (Frame 24065) Video thumbnail (Frame 25079) Video thumbnail (Frame 25840) Video thumbnail (Frame 26664) Video thumbnail (Frame 28085) Video thumbnail (Frame 28756) Video thumbnail (Frame 29405) Video thumbnail (Frame 30363) Video thumbnail (Frame 31348) Video thumbnail (Frame 32174) Video thumbnail (Frame 32855) Video thumbnail (Frame 33598) Video thumbnail (Frame 34203) Video thumbnail (Frame 34930) Video thumbnail (Frame 35562) Video thumbnail (Frame 36384) Video thumbnail (Frame 37293) Video thumbnail (Frame 38371) Video thumbnail (Frame 39062) Video thumbnail (Frame 39934) Video thumbnail (Frame 40783) Video thumbnail (Frame 41602) Video thumbnail (Frame 42299) Video thumbnail (Frame 43633) Video thumbnail (Frame 44388) Video thumbnail (Frame 45358)
Video in TIB AV-Portal: How CPython parser works, and how to make it work better

Formal Metadata

Title
How CPython parser works, and how to make it work better
Title of Series
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
2017
Language
English

Content Metadata

Subject Area
Abstract
How CPython parser works, and how to make it work better [EuroPython 2017 - Talk - 2017-07-12 - PythonAnywhere Room] [Rimini, Italy] The part of CPython core that parses the Python source code is some very old and convoluted code: the time has proven its reliability, but few CPython hackers understand (or care) how it works, or even what exactly it does. There is, however, a good reason to care: for short-running scripts, the performance of CPython may easily be dominated by that of parsing the source code. The talk will describe the two parsers that are involved, it will explain how these two parsers build two different kinds of syntax trees, and then show how the structure of one of the trees can be amended to reduce its memory footprint threefold, with only minor changes necessary in its consumers. It will also suggest other, more invasive improvements, which can yield even better savings. The talk will assume fluency in C and a basic acquaintance with CPython core internals, and will give the attendees an introduction into hacking the parser, guiding their way through to the very tangible end result of reducing Python overall memory consumption by up to 30%, measured at standard micro-benchmarks
Intel Software Parsing
Pairwise comparison Implementation Roundness (object) Multiplication sign Source code Interpreter (computing) Video game Parsing Mereology Parsing Compiler
Programming language Link (knot theory) Token ring Parsing Bit Line (geometry) Regulärer Ausdruck <Textverarbeitung> Symbol table Product (business) Positional notation Different (Kate Ryan album) Formal grammar output Formal grammar Capability Maturity Model Default (computer science)
Email Computer file State of matter Token ring Code Token ring 1 (number) Parsing Core dump Number Prime ideal Type theory Proof theory Process (computing) Formal grammar Website Formal grammar Arc (geometry)
Arm Identifiability Computer file State of matter Set (mathematics) Angle Regulärer Ausdruck <Textverarbeitung> Mereology Symbol table Product (business) Number Different (Kate Ryan album) Formal grammar Right angle Table (information) Arc (geometry) Chi-squared distribution
State of matter Token ring Syntaxbaum output Parsing Computer programming Formal grammar Arc (geometry) Product (business) Number
Dependent and independent variables Group action Computer virus Fluid statics Token ring Connectivity (graph theory) Statement (computer science) Syntaxbaum Letterpress printing Line (geometry) Arc (geometry) Number
Slide rule Group action Suite (music) State of matter Correspondence (mathematics) Fitness function Letterpress printing Mereology Statistics Symbol table Radical (chemistry) Fluid statics Computer configuration Set (mathematics) Statement (computer science) Social class Convex hull Arc (geometry) Arc (geometry)
Group action Suite (music) State of matter Multiplication sign Syntaxbaum Letterpress printing Open set Statistics Number Fluid statics output Software testing Asynchronous Transfer Mode Arc (geometry)
Fluid statics Suite (music) Token ring State of matter Multiplication sign Syntaxbaum Bit Letterpress printing Arc (geometry) Number
Radical (chemistry) Suite (music) State of matter output Letterpress printing Line (geometry)
Intel Word Different (Kate Ryan album) State of matter Phase transition output Social class Letterpress printing Line (geometry) output Arc (geometry) Number
Token ring Code Source code Range (statistics) Token ring Syntaxbaum Letterpress printing Computer programming Number Semiconductor memory String (computer science) Function (mathematics) Formal grammar Vertex (graph theory) output
Source code Computer file Multiplication sign Source code Token ring Sound effect Bit Letterpress printing Semiconductor memory Number Error message Semiconductor memory Function (mathematics) String (computer science) Vertex (graph theory) output
Source code Parsing Computer virus Suite (music) Standard Model Token ring Syntaxbaum Electronic mailing list Parsing Letterpress printing Semiconductor memory Symbol table Syntaxbaum Element (mathematics) Radical (chemistry) Number Error message Function (mathematics) String (computer science) Vertex (graph theory) Information Data structure output Data type
Module (mathematics) Bytecode Point (geometry) Parsing Suite (music) Code Forcing (mathematics) Source code Parsing Core dump Letterpress printing Computer programming Syntaxbaum Abstract syntax tree Compiler Radical (chemistry) String (computer science) Information Condition number Thumbnail Data type
Source code Computer virus Computer file File format Tape drive Abstract syntax tree Parsing Bit Exponential function Syntaxbaum Abstract syntax tree Pointer (computer programming) Formal grammar output Formal grammar Default (computer science)
Source code Execution unit Computer file Token ring Euler angles Parsing Abstract syntax tree Bit Letterpress printing Line (geometry) Parsing Syntaxbaum Touch typing Formal grammar
Source code System call Mapping Code Multiplication sign Sampling (statistics) Abstract syntax tree Parsing Usability Mereology Syntaxbaum Data structure Loop (music) Formal grammar Default (computer science)
Type theory Functional (mathematics) Message passing Decision tree learning Loop (music) Semiconductor memory Multiplication sign Usability Loop (music) Repeating decimal Asynchronous Transfer Mode
Slide rule Functional (mathematics) NP-hard Multiplication sign Strut Video game Syntaxbaum Usability Branch (computer science) Statistics Dressing (medical) Loop (music) Asynchronous Transfer Mode
Dressing (medical) Suite (music) Parsing Information Letterpress printing Data type
Scripting language Standard deviation Scripting language Abstract syntax tree System software Syntaxbaum Message passing Web service Benchmark Read-only memory Semiconductor memory Personal digital assistant Utility software Data compression
Scripting language Standard deviation Scripting language Divisor Abstract syntax tree Data dictionary Benchmark Parsing Syntaxbaum Benchmark Read-only memory Semiconductor memory Resultant Data compression
Predictability Standard deviation Source code Parsing Benchmark Read-only memory Mathematical analysis Quicksort Resultant Data compression Sound effect
Source code Parsing Numerical digit Multiplication sign Source code Control flow Bit Sound effect Type theory Programmer (hardware) Programmschleife Benchmark Personal digital assistant
Backup Software bug
Vector space Patch (Unix) Formal grammar Statement (computer science) output Parsing Abstract syntax tree output Backup Parsing Syntaxbaum Social class
Module (mathematics) Serial port Validity (statistics) Patch (Unix) Binary code Expression Syntaxbaum Parsing Abstract syntax tree Syntaxbaum Number Type theory Different (Kate Ryan album) Formal grammar output Software testing output Backup
Mathematics Functional (mathematics) Validity (statistics) Code Formal grammar output Streaming media Syntaxbaum
Source code Pairwise comparison Polygon mesh Code Line (geometry) Maxima and minima Parsing Line (geometry) Term (mathematics) Parsing Syntaxbaum Different (Kate Ryan album) Synchronization Message passing Formal grammar
Execution unit Link (knot theory) Patch (Unix) Electronic program guide Normed vector space Interpreter (computing) Parsing Term (mathematics) Message passing Curvature
Message passing Divisor Semiconductor memory Multiplication sign Sampling (statistics) Parsing Syntaxbaum
Point (geometry) Parsing Functional (mathematics) Computer file Token ring Mountain pass Multiplication sign Token ring Parsing Stack (abstract data type) Syntaxbaum Root String (computer science) String (computer science) output Object (grammar) output Proxy server Data type
Trail Parsing System call State of matter Mountain pass Token ring Parsing Stack (abstract data type) Rule of inference Syntaxbaum Root Pointer (computer programming) String (computer science) Network topology output Deterministic finite automaton Vertex (graph theory) Data structure output Routing Data type
Parsing Functional (mathematics) System call Mountain pass Multiplication sign Token ring Syntaxbaum Parsing Bit Stack (abstract data type) Traverse (surveying) Parsing Syntaxbaum Message passing Root String (computer science) Deterministic finite automaton Vertex (graph theory) Object (grammar) output Data type
Functional (mathematics) System call Token ring Mountain pass Parsing Parameter (computer programming) Stack (abstract data type) Parameter (computer programming) Syntaxbaum Traverse (surveying) Telephone number mapping Message passing Root Different (Kate Ryan album) String (computer science) Function (mathematics) Vertex (graph theory) Object (grammar) output
Module (mathematics) Inheritance (object-oriented programming) Civil engineering Electronic mailing list Parsing Parameter (computer programming) Function (mathematics) Syntaxbaum Number Traverse (surveying) Telephone number mapping Semiconductor memory String (computer science) Vertex (graph theory) Data structure Endliche Modelltheorie Proxy server Position operator Asynchronous Transfer Mode
Parsing Civil engineering Syntaxbaum Parsing Parameter (computer programming) Parsing Syntaxbaum Power (physics) Traverse (surveying) Telephone number mapping Loop (music) Function (mathematics) Loop (music) Data type
Pairwise comparison Parsing Execution unit Functional (mathematics) State of matter Code Poisson-Klammer Line (geometry) Generating function Parsing Number Process (computing) Program slicing Order (biology) Right angle Data structure Loop (music) Data type
Code Patch (Unix) Multiplication sign Right angle Line (geometry) Flow separation Loop (music)
Medical imaging
Laptop Computer file Semiconductor memory Syntaxbaum Formal grammar Data structure Data conversion Limit (category theory) Abstract syntax tree Form (programming)
Mathematics Parsing Computer file Code Formal grammar Insertion loss Mereology Data structure System call Parsing Product (business)
Web page Link (knot theory) Network topology Endliche Modelltheorie
few people these but it is about some that they did with my previous employer
when any of them they asked me to remove from the slides and mention their name so I'm here with no business affiliation and unlike problem most books in this conference this is not about writing stuff with Python this is about
life and himself and the specific about the C Python which is the reference implementation and even more specific about the parser which is part of this implementation so what is the by comparison when they put by the the will this is the
picture that came up but actually passes
the 1st is the stage of the interpreter which converts the source quoted in tool are straight which the compiler that compiles and might call time they might code is what the interpreter actually rounds now what this western parts are like the same as
any mature programming language it has formal grammar which looks something like
this if this link open has formally define production 3 starts at the symbols for a 3 different kinds of inputs every line defines a new nonterminal symbol and the notation here is a bit more complicated than the BNF notation that promise us usually written in something more like regular expressions so we have something like plus modifiers and somewhere star modifiers for repetitions
as well as parentheses for grouping and that is the other ingredient is the header file with the definition of all tokens but has 57 types of tokens numbered 0 to 56 and the numbers above 256 are reserved for non-terminals which are the ones that are defined in the grammar then we have parser-generator that based on the formal grammar and a token definition producers a file which looks like this it's just a definition of site stating dated doesn't have any code in it all it does is he defines the DFA wonderfully for every nonterminal that the grammar defines so every did if a is defined as an array of states every state is defined as an array of arcs and every hour has a label and this nation states the also
I thought mystring is duplicative but it's not really the sorry did the process 4 of the 5 primes but this again did you see the proof
of the fact that I can show OK then I'll have to scroll back this is what the grammar looks like with their productions and the right hand sides that looks like a regular expressions and that's what they had a file like with the that the token definitions I don't know how to make it the plaquette again let me try and
right now at this set assessing myself what I'm showing so those to go back each so this state is there that defined as an array of parts of the going out of the state and each arc is defined as in now each dephasing is defined as Arab states that each state is defined as a way of arms going out with the 1st item is the arc label which is more or less the same as the symbol number and the 2nd item is the destination state and in fact we have a special table which converts a symbol ladies into all arc labels because symbol number 1 it corresponds to all keywords and all identifiers so we also have look much which specific keyword it corresponds to so that each each 1 of those you get a different arc label number even though they're all sharing the same symbol 80 and the near the very end of this file we
have the fission of the themselves and for each the way we also have the
starts which shows which tokens the so this specific nonterminal can start with and this is so sophisticated
enough so I prepare a short demonstration
of how this works for very simple Python program which looks like this we pass it as a single input so that's the opera production that record person it according to an upper parse tree starts with they just mobile single-input and the DFA which pass a single input is that if a number 0 which looks like this and in the initial states which estate number 0 we have 3 arcs going out of it so the
1st arc with label number 2 response to the new line next characters simple statement you lose 3rd are 2 component statement and we're looking at that if tokens in the inputs and
according to the start since we know that simple statement cannot begin with if if it's not a new line of compounds that we can begin with if so we transition along the last hour are the combined statement no to the parse tree and before we actually transitional state number 2 we switch
to the new DFA which versus compound statements forward composite but here we have many options so in the initial state we have 9 arcs going out of it only the 1st hour which is for the statement can start with them if spoken so we add if statements notably pass through transition along this and so on now we switch to the BFA which passes if statements its sophisticated enough that it didn't fit on the slide we because you say that it's definition is
all sophisticated heavenly by parenthesis under repetitions and optional parts it In the initial step we have only 1 hour which with a label 97 corresponds to the if spoken because it's a terminal symbol we just added to the
parse tree consume it from the input the switch to the next state understand the same DFA these next state once again has only 1 are going out of it but this 1 it's for a test nonterminal the starts at is that it can be begin with the number of such as for the 2 so we are the test mode
switched of the give which should have 1st who passes test and so on it calls for an awful long time before we actually get back into the same DFA and now we're in state number 2 past 1 Arik which consumes the Cologne Open from the input transitions to the next state once again we have only 1 arc
responding to sweet they starts set confirms that sweet can begin from print token so we added sweet node of the parse tree switched all DFA for sweet once again it takes an awful long time before we get here
and the interesting bit is now we have 3 arcs going out of the state number 4 arc with label 98 corresponding
to the terminal LP if 99 to the terminal else and the next open in the input is the new line so it's neither of these and the arc labeled 0 means that is a final state so if we did much anything which has are just done
with this DFA and go back to the previous 1 so we go back to composite on rear ended
up in stay that only has a 0 are which means no matter what the next token in the input is we done and going back to the previous 1 go back to the real difference was started from there is only 1 arc with label number 2 which is funny line so we consume the land from the input add a note to the past
3 and the transition state number 1 which is fine final so we are at end of input word in the final state of the would be a phase which means that they are since succeeded where so what has just
happened we had the program which was 27 bytes long tokenized until 7 tokens and this is the parse tree which we produced for it 64 nodes takes the with 109 as much memory as the source
code itself and it's not so much at 307 because of almost doesn't range we see that we have a huge the like serious sulfur a dozen nodes we a single child each the in all these big because so that Python has a rather primitive faster which has eluded did defined grammar like this world it looks is seriously instead of
repeat in all steps that have shown just yeah that's more important the bit because the
past together can take a hundred times as much memory as the source code if I have a
moderately big the book by that's a source file of say 30 megabytes that the a person what would not even be able to all all this into memory because they think it's effect would would would would be too big so that was my motivation for all this work I had 30 megabyte file file and I couldn't possibly run it through Python we showed
these this is CST stand for concrete syntax tree there is a standard model
called faster and you can show good the CST sir license so structure of nested lists and every this structure has 2 elements the 1st 1 is the symbol lady so you'd say 0 tool 56 for terminals and something above
256 for terminals so the sec 2nd item is the spoken string for thumbnails forces subtree In an honest at least 4 of nonterminals the departed the condition for passing module the actually says that people shouldn't use it because there is about a module called st stand for abstract syntax tree which is a 2nd kind of syntax tree that
Python uses and they used example shows what the st looks like for the same way example Python code this so that it's much more compact it's almost human readable and the point is that unlike C C which reflects how the program is represented in the source code they st reflects what the pro problem actually intends to do so it's actually the st that the compiler runs on to generate the bytecode
the so the since 2nd
kind of syntax tree actually also has a formal grammar which is what this formal grammar looks like a different format now we have each each kind of St no define and what its children should be how many and what are their tapes and the funny bit is that there is no generator which which takes this file as input this
file is just formal documentation but the parser is handwritten tokens the yeah
yeah so this hand written parser is actually through 52 100 lines long
it's a bit of a very old and thus to
quote which states from 4 thousand 5 to 2007 nobody touches that nobody remembers how it works and the this the general attitude is the if it ain't broke don't fix it the that if it is the it has been working for 10
years time has proven that it does work OK yeah and what was once again the AST 4 hours a sample code is the very simple structure of not 9 nodes which I'm showing here but
but so that this is what these 2nd part actually looks like has
that the example is function that the passes X for a type of modes and what it does it has a loop organized with the label and I go to from the middle of the function something
that Dexter wouldn't approve and what it actually does is that for each note which only has 1 child it knows that there is nothing interesting for this node so nothing to do we just give you the next so we see that 2 thirds of the nodes in the CST have 1 child we don't do anything with them we just skip the the next node what I thought was the why do we have all this useless modes occupying the memory if we can eliminate them at the time when we generate the CST than at the same time whereas savant two-thirds of the memory and also the 2nd person becomes simpler because it doesn't have to
be concerned with this given all this useless modes this is what an example of them compressed CST looks like it's actually small enough so it fits on 1 slide without scrolling and at the same time we have this modified function which no longer has the awkward loop actually doesn't have
any looks at all what it does is that in each branch of the switch it can answer that the note has more than 1 chip child mean we can do something meaningful with this not and but just keep it now the the this
shows the serialized CST is
made before and after my fixed and you can see that the CST has becomes a so small that the ball was becomes human readable the catch here is that the only
purpose of CST still be biased by the 2nd pass until the st and then it's discarded for the memory would never use it again so
if we have a woman in scripts such as a web service for example that Bolton in that split-second run for hours and days and today doesn't matter will how much memory the syste takes because it's only there for the 1st split-second on the other hand if we have a system utility which Francis for a split-second and finishes immediately and the performance of this utility would be dominated by the performance of the past and that was my case I my script was just look up the value in
the 30 megabytes the chill dictionary span indeed there where where they
might fixed of the convo Python parser memory consumption of of my script were reduced by a factor of 3 so I was curious how Hawaii much would affect other standard benchmarks and this of the results 1 of them has shown the improvement by
3rd a few of them have shown around a dozen per cent improvement and nothing has but 1st but actually they use benchmark package which I was using
was the prediction October last year on the grounds that the results it was shown or inconsistent and meaningless for some purposes like I'm I'm not that performance analysis of some sort I don't know what a deep and wanted to on but it had this been superseded where they by performance but the package and these are the results
wizard is something got a little bit better something but a little bit 1st but the big picture is that often changed my intuition is that type of problems intentionally runs at a few warm-up loops before as a certain the timer and this way
indention layout avoids at time and the loading and personal of the source code I didn't show can inside the book by performance it's beacon sophisticated enough but nevertheless even this result and from that my by chance no unintended side-effects doesn't break anything and at the same time it enables a use case which may not be relevant to most of the Python programmers but at least the if I bump into it maybe other people's other a people would bump into it as well yeah so
my desire was to show this but for the community to get it included into Python how somebody that's this is it opens a bug in the buck occur because that's where all the discussions happen where all the reviews happens and they also have a is
there and you're allowed kills this sound being once a month if your bike is not getting enough
attention so I opened tool issues in the vector to occur and what 1 of them at the 2nd 1 of them actually got accepted and committed the the book me a 3 three-month waiting for a review and what it does was and the 2nd hand written parser knows that the input that it passes all this comes from the 1st class so
it's guaranteed to conform with the 1st formal grammar so it doesn't have to validate that it conforms to it it knows for sure that for example for the statement the 1st child is always give token then the 2nd child is always
test but 1st a module can take an arbitrary serialize syste as input and run it through these of 2nd person and then if it doesn't conform to the formal grammar of the python binary which expression not at Memorex so they had to implement the validation for this CST themselves how they implemented this was like this
I'm showing just litigation for 3 different types of nodes but it it was all more or less the same and they are chicken that the number of the children the motives and the types of the children a correct and then recursively this into each
child node but and the callable think function for the child node when I was looking at this code which had to be updated with that every change to the grammar too much the update grammar I thought maybe we can what the generated from the grammar maybe we can write some DFA face which would traverse the CIA to see if it's valid
and than what occurred to me is that we already have this be a face this are that they face which pass around you can just run them on the CST instead of the the token in a stream of the input so this
way I could eliminate 20 400 lines of hand-written code the just 7 2 lines of code for running the difference which we already have in by comparison and which are guaranteed to stay in sync with the positive because they are what the parser runs but after a 3 months
individual my other Pige was
rejected by with a 1 recent himself Sanders because because it's so big unsophisticated and nobody has a link you to review it a 3 months I shouldn't hold my breath for anybody to jump up and review it
and if nobody cares as much as I do about random 30 megabytes by conscripts through the interpreter than stuff like for me whether this is the motivation well I'm
here maybe you can and for this so it will be built 2 6 4 1 5 uptake of that they can we control reviewed and from that it's OK that had committed Oklahoma's explained what might prejudice when that
sample if the only purpose of the CST is to get but but biased by the 2nd pass and discarded whether we have
to store it in the memory at all maybe you can generate 1 sees you know that the time handle it
in the MST parser and discarded immediately that way instead of improving its memory consumption by a factor of 3 we're improving it infinitely tool takes 0 memory
what the current impostor actually does
is it has school entry points 1 string knowledge a wonderful file objects it but passes these input tool 1st of function which runs a look inside the it's extracting 1 the token at the time from the input Boston and this extracted token tool bypasses had open function and this at token kind of consumer function when maintains its state in
the state structure which most importantly has this step all the EDF face the keeping track of the current state for each DFA and is the pointer to
the CST rule not so it's a Benson you is the most to build this tree after a personal use them up at end of input it returns to the C route to the caller and the caller processes
the 2nd function called by st from node objects which is essentially the 2nd pass it recursively that traverses this 3 and 4 for each subtree of the CST
it generates the corresponding subtree of the St the important bit here is that the whole of CST he's available by that time 2nd parser around so it can analyze modes out before the it can be analyzed so as the 1 node repeatedly and
so on my suggestion is so replaced
the by Steve from node object with a different function something like bias due from tokeniser which would more or less the same but for the for getting each CST not you would call this next node functions function which would make the passes states all the parameters for the tokenizer which we will just pass on pulled the tokenizer and to generate
from note in the output which means only node type and mode string but no children and as will return them a number of Europe indicate in the position of the new node relatively tool the previous node so whether it's the 1st child of a parent node and next season another child mold or whether a the previous what was the last child and we're going back to the buyer and the difficult question curious what what are we going to do with the bypass a module because we no longer have the ECC is to use memory there is nothing that it would be serialized until the structure of nested lists and I'm not sure anybody is
actually using the person model even the especially so that it's the command up as deprecated well all analyzer that I have seen our using the ST because it's
much easier to use but we have them some effort we can make the parser itself call next Motorika recursively to materialize the CST FIL sterilize it into all these nested lists and
then to discard the whole thing just for compatibility with the existing consumers so there are some and there are some difficulties here the
minor difficulties that the power still can no longer have a loop run inside
we have to change it to a generator function and accept the look of it until the process state structure but the main difficulty is that we have to rewrite all the 50 to 100 lines of the 100 comparison this is the example of the right and 1 function in this way this is for parser for slice
nodes the slices when you have like square brackets and 2 to 1 or 2 or 3 numbers so separated by colons it inside them and see that they all parser knows all kinds of interesting things about the American but parser can only check each note once and he has to go on these nodes in order which actually simplifies the code
good the question is it would obviously take some time to rewrite the 50 to 100 lines this way maybe several weekends many weekends islands the we want to do this and if for example i takes a several we can still that do this will there be somebody who wants to review that patch including the right and offer 50 to 100 lines of code that's actually all I have and so
on I put my questions into what
will this is the image that came up so I search be fair and be and thank you very much so and questions
but when you have the problem tokenizing year 30 bytes worse file what was the limit that you hit to use simply run out of memory was there some internal structure a static size it the form yes at that I was doing is on my laptop which only has 2 gigabytes of memory so if you bought more memory ahead more sparse and I would have been a problem and in the future we deserve even larger than the are it's not going to be a problem for hardly anybody Bloom if you want to run Python on your phone and it always we're problem a lot in this in the um rather than rewrite all the conversion from the concrete parse tree to the abstract syntax tree is
it not possible to some migrant of grammar definition that to generate occurred because it does look awfully repetitive courage you I
agree that that does look repetitive and I I think that the I can actually show this the like most grammars
for yet for example include the because of beautiful call to with the each production that the parser generator it
inserts into the parser maybe something similar can be done to this book as the OWL file to help the generate and the parser but this would change the syntax of this file which is something formal and somebody outside of the bars so of book by country may rely on this exact structure so you yes what while k the same the in the structure of the grammar it it's not possible if if we a give ourselves they availability to rewrite the grammar than it would be possible but we may be breakends somebody else's code we
and I have once and this is also and if the entire problem is having maintain our I think you could
become 1 this year clearly qualified this thank you broader it's worth the supply from tree move to get for 5 months ago so all the Hg links are now 5 months of data you Sorokin European this web page is a sheet of hypernetwork Python is now has and get hot and has been for 5 months so we're now looking at how models yeah I'm Icelandic is actually all of than 5 months because this work has been going on since the year ago but I'm sorry what is an outdated links this to look and the new questions in the something to them and to thank
Feedback