Zero-Overhead Metaprogramming

Video in TIB AV-Portal: Zero-Overhead Metaprogramming

Formal Metadata

Zero-Overhead Metaprogramming
Using Self-optimizing Interpreters to make Runtime Metaprogramming Fast
Alternative Title
Smalltalk - Zero-overhead
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year

Content Metadata

Subject Area
Slide rule Group action Mobile app Implementation Concurrency (computer science) Proxy server Divisor Observational study Virtual machine Discrete element method Semantics (computer science) Proper map Computer programming Field (computer science) Formal language Time domain Latent heat Single-precision floating-point format Software testing Endliche Modelltheorie Proxy server Metropolitan area network Physical system Oracle Social class Domain name Overhead (computing) Building Projective plane Instance (computer science) Limit (category theory) Cartesian coordinate system Symbol table Hand fan Arithmetic mean Voting Vector space Personal digital assistant Logic Text editor Object (grammar) Communications protocol Oracle Row (database)
Metre Slide rule Context awareness Overhead (computing) Just-in-Time-Compiler Java applet Code Multiplication sign Order of magnitude Computer programming Implementation Proxy server Physical system Overhead (computing) Information Moment (mathematics) Generic programming Instance (computer science) Line (geometry) Cartesian coordinate system Compiler Uniform resource locator Message passing Process (computing) Interpreter (computing) Object (grammar) Mathematical optimization Communications protocol
Code Multiplication sign Semantics (computer science) Formal language Different (Kate Ryan album) Interpreter (computing) Cuboid Abstraction Social class Area Parsing Moment (mathematics) Physicalism Bit Instance (computer science) Variable (mathematics) Parsing Type theory Data management Process (computing) Heuristic Right angle Reading (process) Resultant Frame problem Asynchronous Transfer Mode Implementation Supremum Computer file Virtual machine Branch (computer science) Computer icon Wave packet Root Energy level Integer Conditional probability Compilation album Condition number Pairwise comparison Overhead (computing) Expression Java applet Code Counting Cartesian coordinate system Software maintenance Frame problem Subject indexing Personal digital assistant Network topology Interpreter (computing) Statement (computer science) Codec Object (grammar) Mathematical optimization
Just-in-Time-Compiler Set (mathematics) Parameter (computer programming) Shape (magazine) Mereology Disk read-and-write head Neuroinformatik Formal language Inference Mathematics Positional notation Error message Social class Polymorphism (materials science) Reflection (mathematics) Moment (mathematics) Sound effect Instance (computer science) Message passing Chain Physical system Resultant Asynchronous Transfer Mode Classical physics Point (geometry) Metre Slide rule Dataflow Field (computer science) Power (physics) Chain Cache (computing) Operator (mathematics) Directed set Integer Mathematical optimization Overhead (computing) Standard deviation Focus (optics) Information Surface Primitive (album) Letterpress printing Line (geometry) System call Symbol table Compiler Cache (computing) Uniform resource locator Voting Error message Personal digital assistant Interpreter (computing) Object (grammar)
Bytecode Standard deviation Keyboard shortcut Implementation Scripting language Machine code Proxy server Java applet Combinational logic Virtual machine Mereology Semantics (computer science) Field (computer science) Variable (mathematics) Formal language Time domain Chain Root Envelope (mathematics) Interpreter (computing) Software Energy level Data structure Compilation album Physical system Overhead (computing) Standard deviation Building Java applet Field (computer science) Line (geometry) Cartesian coordinate system Category of being Message passing Process (computing) Personal digital assistant Universe (mathematics) Right angle Object (grammar) Communications protocol Window
Metre Point (geometry) Meta element Implementation Pixel Context awareness Overhead (computing) Level of measurement Java applet Multiplication sign Image processing Compiler Mereology Semantics (computer science) Computer programming Exact sequence Medical imaging Performance appraisal Different (Kate Ryan album) Interpreter (computing) Operator (mathematics) Single-precision floating-point format Energy level Endliche Modelltheorie Teilauswertung Compilation album Physical system Overhead (computing) Just-in-Time-Compiler Weight Java applet Bit Cartesian coordinate system Benchmark Arithmetic mean Message passing Process (computing) Vector space Partial derivative Speech synthesis Object (grammar) Intercept theorem Communications protocol Library (computing)
Metre Meta element Overhead (computing) Demo (music) Reflection (mathematics) Virtual machine Sound effect Cartesian coordinate system Perspective (visual) Computer programming Field (computer science) Chain Interpreter (computing) Chain Object (grammar) Teilauswertung
Overhead (computing) Googol
and that's what I want to achieve and that's a little research project that we did together this proceeding from oracle apps he contributed limits of the experiments and J. and now this defined cuts than in the U. N at on so but in the programming well that as mentioned group here you probably know that kind of stuff I'm sorry I still use my own slide that said that for the more general audience of and so you see here you can imagine just with semantics and that's so it's just that this man higher mean the mean fully brick so the idea you have fewer profound in small talk was a symbol you have your own In sensor reading and and say field right in earnest that's so that kind of metaprogramming on here from people from the the kind of world yes sense and is doesn't understand you implement low proxy in the news doesn't understand and perhaps for vote to call by using fan then and that's not so much in small talk having a proper MIT object protocol so that's not right so serious kind of this style of things and that's what I'm actually really into so so really call to make that stuff fast but actually I want to get here that's researcher editor during my studies in Brussels and the goes is to have something that can modify it is of the behavior of language just by defining a specific some what called me class so in my case I was focusing on concurrency what I wanted I wanted to have some meat object like domain that can redefine so that for instance can provide the semantics of an actor model is in my language so I don't have to inductive logic machine to provide guarantees but I can really just defined like proximal top so either actual as the and or whatever you can imagine so what I rented to be able to do is we define the semantics of writing fields by just providing such a little snippet and what you see here is actually almost sufficient to implement lecture system so because most important factors is that you have all proper isolation between the actors so that that 1 electric can see what the other actors to in and what axis and object trough is and another actor so essentially in the late fusion he just to test whether you in the current vector on another actor if you in the care about you can just access object fields otherwise thrown in a row like imagine that stuff is executed for every single fuel frightens system so it's really got them so if you have no way of optimizing if so far down that's a problem and in general of course everybody knows
programming is slow why do people normally because for instance of Java so but it's not all that much better if you work in this context because cockatoo moment not optimized it is you have for mesoderm location and when we measure that we see roughly of 7 x over overhead it's about the same order of magnitude also and follows sensing was anemic proxies at least on JVM you also see about 7 x over so it's much so as it should be the what you to this assumption is if you and to encourage people to use made a program and and not to a veritable performance-critical code you when you have 0 so either as if the situation now that people find represents which perhaps Mexico to less readable because lines mentally no standard signal as you use doesn't understand and if you go to the 1st time the you generate message for instance to delegate to another object so you know it's a cost of doing the doesn't understand and the effective application but then of course you have that strange doesn't understand the and all those methods popping up and for us to maintain the system that doesn't really know what's going on so yes 0 Laplacian OK now positively is to make the smarter so that all object of metaprogramming facilities are fast so yeah the idea is to look at the end of OK so generic meter programming is pretty slow on job of competitors that France's on pipeline as a for the tracing just-in-time compiler that's actually pretty fast but if you start to implement such a simple made after protocol as i showed then you still get like a 50 per cent overhead because the compiler even so it can optimize all those little since stood enough information to optimize the example I showed you was the act or of MIT object of so we need something better and what you found here
is a nice way of implementing of interpreters and then we also experimented with those made object protocol so I will just be the show you all the way how we implemented interpreters because I think that's that's a very nice idea it's not actually mine I forgot to put 2 of the reference year but and included in this slide so I will put on a hit
the 1st starts explaining exactly it is the job like could on that's application of if you see iconic coat that C interpreter level implementation there just to make sense therefore so if you imagine you
want to implement the little interpreter new probably 1st started as a parser influences for such a code snippet we have conditional statement this a conditional expression than the 2 branches of than adults and that will be passed to a tree like that perhaps on so again the conditions statement of the other 2 sub-expressions are interpreted in the essence of it looks like that we pass a file we get back in that kind of root note here and then we call and execute Knesset and possible perhaps a frame which captures all the local variables in the misidentification so another question is how do you actually need to implement some of those executioners methods for each of those nodes and that's pretty straightforward to the simplest possible way is really just are directly implements the semantics of feeling that you want so for instance you have literals so 0 and the 1 here and the possible just by by parsing traits such an object for the constant value in their and their execution you will just return the fall reading of variables are very similar there we use of offering object I mentioned before and we can assume that the to begin languages like small talk we compiled so in a base that you can just enumerate you know the on your of local variables and then you can just to an index and access a specific index for each variable in that the frame which has an area to store the over this if he rights on the other hand the variable but that's is very similar but the difference is that your 1st 1st if a sup expression to evaluate so here we have 0 assignment statement and 1st you want of course to get the value here so what we do is we went to the child execute all that that note and get back a value which is can store into the training object at a given interest the index of that variable it looking so that's pretty much generic way of implementing and a up 6 industry-based interpreter of very fancy but the main idea of but is here to actually start optimizing that is the interpreter during the execution someone else seems it's important to really get performances to relicense mental on we we actually don't know exactly during compilation what that is so we have here is that the current statement to increment the count if you just look at the code we intuitively know our OK there's only just integers but the comparisons maintenance of course also the integers of objects so 1 round-trip their use in the larger machines is to use seems like smelling teachers radial essentially still a bit and then represent the integer directly and don't have to allocate objects all but yeah so depending on what kind of integers is maybe you need to allocate an object and we went to wanted so for the self-modifying self optimizing interpreters we tried to actually optimise each of those notes for instance that they can handle integers directly it so what that means that the topic we just OK compiled at the lucrative but actually to uninitialized models of and then note that executes will store executed expression 1st gets a value and then call specialized method was value and then that specialize messengers language implemented in sides the heuristics we want to use to optimize all interpreted and In that case I decided I I want to use a type and then actually provide a specific type for the instance variable right of physical type integer and now at the moment you see it's ready to and the very executed and you really exchange that node in the tree and the next time executed intellectually UC optimized note for the integers so that implementation looks like that so instead of that just calling a generic execute method naturally ask the whole subtree to provide us with an integer and then we get a deterrent sometimes in some clue passes it might not be true that that's an integer than we might need to handle that exceptional case and we change our tree but for a surprisingly many I could separate into directly and then we can just optimise it to the integers and in that case on yeah Vincent job like language and implement your interpreted in are you can properly type all since to use directly pure integers and compute optimized on that kind of a AST tree to really use integers without having to erect the result having to tag them and also the class essentially can use direct plus operator off of the process of instead of having to do all kind of management and structured around so far all right low-tier specifically that means they can also store there was a queue integer-valued directly in all frames so we have actually a specific the 2nd area instead of just having an object area there we can direct the story integers to toward boxing and tight taking and other tricks which emotions usually need to apply so that's the basic simple idea of losing self modifying self optimizing interpreters and I want to show you how we actually use that to optimize also
reflection so yeah on that set
part you just to give you a little review what language looks like but it's still a small vote you just was this notation to not confuse the non-small focus but there as you can see that the performance of for instance or at instance like fruit doesn't understand the and here's a global get global and and so on am it's purpose just on of the performed of so standard reflective mesoderm location you and you have an error if all the arguments so now the question is how can we possibly optimizer good while net 1st look at where people came up is this standard messes chance small talk so that's a classic of again paper on the ideas you have some variable was whatever kind of value you can imagine in here and then plus message so this 1 so the more jobless of imitation is see directly that message than operator here and the problem is that when is highly polymorphic so if you would use a set a compiler the we would have to guess what's going to happen there because usually we don't know and the idea of they came up this polymorphic allowing caches is that he actually tried to observe and runtime but what happens and then cash lugar for the operator or the last message said for instance so and and this is interpreted as it's called dispatch chain which essentially is part of the class and mode and little cash the love of results so let's makes use the birds we have all of last year it we have all columns which has an integer object in it so the 0 and that's dispatching starts all in uninitialized case and the idea now is that p that the executed 1st message sent here and then we know it OK we actually got an integer as a receiver the look up you get the integer plus operation the captured and we also have like that check whether the receiver is still an integer because maybe for some reason the call on the flow so large integer or something and the receiver changes so we need also to handle the case so that's the general idea and when you you now apply a compiler to it that compiler can speculate on OK I actually see units like but the column pneumatic sense so there's actually only 1 kind of receiver always been observed and then it can completely in line that trust operator and do directly the plus operation in your message so the benefit here is even interpreter the what's the extra look up and enables inlining inference optimizations by just-in-time compartments so that's the basic idea I'm back to our little reflective example so remember performs a symbol and here we actually just generalize that idea so we have on you and we wrote here and also at the dispatch shape and here's the result on that plus 1st so the notes he gets here plus several provided as a power meter and then all the cash that you have that and and again the check in case some other symbols coming along dynamically at some point and then we will just messed that kind of dispatching again you do the same thing I just showed on the slide before so here we will have again an integer the moment and then you can't cash and in line the plus operator so and that provides actually enough information to from the just-in-time compiler and optimizes and allows us to remove computers your head off reflective mesoderm location or effective field accesses so all a 7 x and the UN it's a just-in-time compiler sees its minimal free anyway OK so that surface
looser 2 categories so this simple reflective axes and also souls like in combination of the doesn't understand the message missing at is performed here so far the question is what do we do this is what I actually was interested in the kind of fancy root object put calls so remember so we
have that kind of MIT object protocol it's really cross-cutting applied to the window application so we want to change every single fueled right yeah so if you have a little example of using that kind of of meat object protocol and in that right well there and still 2 parts here and definite in by to misidentifications also part just leave left out but definitely envelope feud axis is for them it's not static in there and what the semantics and so you can just in a 2 byte code for right the field here or something like that was a native code no we 1st have to reserve what happens on the MIT object level so and then the solution here is applying those caching and dispatching tricks again so the 1st thing is we observed that kind of neat object actually comes along and in our case we see hides actually all electrode and then we can actually just in line that the right to feel and by that we provide enough knowledge to the compilers to optimize interestingly is it's a very generic of structure so we can in case we only have a standard the the semantics of a language he did not actually use all that fancy stuff to do something they can really just directed to the right so in the end they're supposed to be in the lower in the system if you don't use a fancy MIT opted for cost stuff OK so I briefly to check whether is actually faster on what I used for that that's not this meant for you probably know so to speak at to follow sensing of a simpler on because it's made it's much easier for me to experiment with stuff it's called Simple Object machine that's a little small so what exactly in those made of 80 tradition much simpler has been used in a couple of universities in the past this designed for teaching on so it's he has a constant so more clearly expressed performance was usually not so a goal of underlying picked it up and got to like foremost on job a little fast the couple of implementations in C C + + and Java JavaScript are items and also of course small talk so I for the set up here I used
to 2 implementations set of base on our part operators 70 like New from pi pi pi pi is based on that but was a very fast pipe implementation roughly 7 times faster than see Partners Inc of the compilation their users meter tracing and what not going to detail that's the interesting part is just the 2nd 1 uses a completely different technique that's based on Java of understanding to show that actually the technique is general enough to be applicable to all kind of ibm's and all kind of all compilation technologies uh if you're interested you find those 2 little small talks on on the and yet so to show you
the members that's the benchmark to show or a couple of benchmarks social that actually can remove all the reflective overhead while having the kind of fancy meats object would call so imagine all kind of fueled riots all kind of if you do all message sends are at have an extra indirection level an extra intercept they can redefine also semantics and the question was OK if I put it into the system do I actually see over and the end and if so how much so I've tried longbows little small talks and in the end we see about it we were had further on-line per cent depending on the compilation technology and the most important insight here is we don't see the 7 x over here so there's still a little bit of overhead because we have that extra semantics so at any single point in the program they could change again what a few dried means by associating another me object to a base level object so there is some semantics that we still need put to support but there are that are only there is essentially no no or were to speak of so at least not the 7 X to solve for so by modeling the vials made object protocols important or maybe it's not important for you maybe it's really just an academic saying I'm interested in so lets you look at something more of vectors oriented so that's the antecedent came in this is such a Ruby implementation on so we apply the same techniques also in that context kind of independently and he checked that kind of that does actually get a real applications in his real applications uh he looked into a room the image processing library which is a good example is a Ruby people saying also very much of the dynamical small told way OK let's use performed nets used doesn't understand to implement ordinal of processing library and it turns out that's that's really everywhere and is also for every single pixel operation if you like virtually in the multilayer documents on the user's so we have all the benchmarks year and this he actually if you apply that kind of technique you get to 10 to 20 x speedup on processing images in jail so that's just to show and it's really nice and icing on and beyond those also thinking about introducing something like that but in 2 copies so I hope at some point he also have that's and infallible and speech this is on now lets us give the
demo it's not really important so just to sum up the of wanted to show you is that a programming doesn't have to be slow so there's a very simple technique at least from my perspective on missiles dispatch chains which is are essentially just majorized fall off Singh said larger machines already do you can apply that to effective method invocation reflective field access and so on we can also apply that to fancy meter object programming and perhaps enable new kind of applications was that and yeah that's a single most important which for me
so did you have any questions I would like to read out but more need to have a paper draft so that that's not it accepted but should many new I can get it and yeah Christians OK so you're much if at