It's all about the goto
Formal Metadata
Number of Parts | 95
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/32262 (DOI)
FrOSCon 2017, part 84 / 95
Transcript: English (auto-generated)
00:07
Hello, welcome to the third talk this afternoon here in the Audimax.
00:28
It's Derick Rethans from the UK, born in the Netherlands, who has done a lot of work on the internals of PHP and now works for MongoDB as a senior engineer.
00:43
I think you did Xdebug for PHP and contributed to OpenStreetMap. I'll just put on the slide and see. Yeah, okay, so you can do it better than me. So he will tell us a lot about the inner workings of PHP.
01:01
So yeah, and he said we should expect lots of wonkiness, a form of assembly, and trees. So thank you very much, Derick, please. Thanks for the introduction. The only thing he didn't mention is that I like maps, beer and whiskey. So there we go. Next slide. So this afternoon we're going to look at a few things,
01:23
a few different stages. We're going to look at what the stages are, what the conversion stages are, and then the conclusion. The idea of this talk is to let you know how many computer languages (although the examples are mostly PHP) go from a script source all the way to running things on a CPU.
01:42
So we'll go through the different stages that PHP uses and see what is necessary to actually build up a language. Then we'll look at some other interesting techniques for looking at code as well. So let's have a look at what stages we actually have in running code.
02:02
So there are four stages, at least in PHP. The first is parsing the script, which also means converting code into tokens. A lot of those words I'll get to in a moment. The second stage is creating a logical representation of the code. Basically that's where PHP looks at what the code is
02:22
and makes something out of it, right? It needs to understand what the code says. So that's the second step. From there on, we create executable code: we convert this tree into bytecode, and then PHP executes this bytecode in something called the Zend engine.
02:44
All right, so the first stage, the parsing stage. The way a parser works is that it looks at what happens in your script, basically line by line, and if it encounters specific keywords, it switches to a different state. A simple example: if you've ever written a PHP script in your life before,
03:03
a PHP file is basically HTML, right? If you don't write a PHP tag, it just gets sent out straight to the browser. Most people don't do that anymore, but it's still something you can do. So the initial state of the parser in PHP basically says: well, echo things to the output.
03:20
That is called the initial state. But there are a few others as well. The most important one: the moment it sees a PHP opening tag, the parser switches to a different mode called ST_IN_SCRIPTING. That stands for: we're currently parsing PHP code and no longer just putting things straight to the browser.
03:42
But even in the scripting state, things sometimes behave differently depending on which context you're in. For example, if you are in double quotes, variables get interpreted in a different way. So the parser needs a specific state for knowing how to deal with the behaviour
04:01
of parsing things within double quotes. Similarly, if you have nowdocs, which are like a longer form of PHP strings, there's going to be something similar in here. The tokenization step is defined in a big file, zend_language_scanner.l, where the .l stands for lexer.
04:23
It basically defines all the tokens that it could find, and each of them has an action. So if you look at the very simple PHP open tag rule here, it says to only apply it in the initial state, because if you're already in scripting mode, we want to ignore the PHP open tag: it makes no sense to do the transition again.
04:42
You get a parse error for that, actually. The tag is then followed by a space, a tab, or a new line. So what this basically says is that the PHP open tag is an opening angle bracket, a question mark and php, followed by a space, a tab, or a new line,
05:03
which means the new line gets eaten, right? Nothing else happens to it. But it still needs to do something with that new line, because PHP needs to know which line of the code it is currently at; otherwise it can't throw correct errors, which is kind of handy. It then switches to a different state using this BEGIN thing, and returns the token.
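The start-condition mechanism described above can be sketched in Python. This is not PHP's generated scanner: the function name scan_open_tag, the UNSCANNED placeholder token and the tuple layout are assumptions for illustration, and only the open-tag rule from the slide is modelled (in the initial state, <?php followed by one space, tab or newline).

```python
import re

# The open-tag rule as described: "<?php" followed by a space, a tab or a newline.
# A newline right after the tag is consumed as part of the tag.
OPEN_TAG = re.compile(r"<\?php(?:[ \t]|\r\n|\n|\r)")


def scan_open_tag(source: str):
    """Return (tokens, final_state); each token is (state, name, text, line).

    Hypothetical simplification: text before the tag becomes T_INLINE_HTML in
    the INITIAL state; after the tag the scanner is in ST_IN_SCRIPTING and the
    remainder is returned as a single UNSCANNED chunk.
    """
    state, line = "INITIAL", 1
    match = OPEN_TAG.search(source)
    if match is None:
        # No open tag at all: the whole file is sent straight to the output.
        return [(state, "T_INLINE_HTML", source, line)], state

    tokens = []
    if match.start() > 0:
        html = source[:match.start()]
        tokens.append((state, "T_INLINE_HTML", html, line))
        line += html.count("\n")

    tag = match.group(0)
    tokens.append((state, "T_OPEN_TAG", tag, line))
    if tag.endswith(("\n", "\r")):
        line += 1  # the tag ate a newline, so keep the line counter in step

    state = "ST_IN_SCRIPTING"  # the equivalent of BEGIN(ST_IN_SCRIPTING)
    rest = source[match.end():]
    if rest:
        tokens.append((state, "UNSCANNED", rest, line))
    return tokens, state
```

Feeding it "<h1>Drams</h1>\n<?php\necho 1;" gives an inline-HTML token on line 1, the open tag on line 2, and leaves the scanner in ST_IN_SCRIPTING with "echo 1;" starting on line 3. A string like "<?phpx" does not match the rule, so it stays plain output.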
05:21
So for every element it parses, it emits a token. But no meaning is given to this token: the scanner, the tokenizer, only looks at the different keywords that are in there. And finally, PHP comes with a tokenizer extension,
05:40
so you can actually have a look at what the scanner sees. For example, take this script. Most of my examples will be related to a little whiskey app that I wrote called RAM.io, which you're more than welcome to have a look at if you're into whiskey, or rum, or tequila, or any of the other spirits. So, yeah, my examples will be coming from there.
06:00
So this is a really simple, useless script. It doesn't do anything. It defines a class in a namespace, and a particularly interesting class at that. When the tokenizer goes over this, it turns it into all those tokens. I've slightly edited the output that comes out of the tokenizer, and I've removed all the white space in it,
06:21
otherwise it looks really silly. The main constructs it picks out here are T_OPEN_TAG, which you might remember from two slides ago, whose value is the PHP opening tag. Then you get T_NAMESPACE, which is the namespace keyword. Then we get some white space, then T_STRING, with the word RAM.io.
06:41
But as you see, it doesn't actually associate the string RAM.io with the namespace. These are just tokens that it sees, right? It sees the namespace token, it sees a string, and the string has the value RAM.io. It doesn't do anything more than that with it. Similarly, you get the closing semicolon,
07:02
you get white space, then you get the class keyword. You then get a T_STRING whose value is Whiskey. Not every token has a value: T_OPEN_TAG doesn't have one, and T_NAMESPACE doesn't have one, but T_STRING does, because in the end PHP needs to know what the values of these strings are to do anything useful with them.
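A toy tokenizer makes the point that tokens carry no meaning yet: keywords become bare token names, identifiers become T_STRING carrying their text, and nothing links a name to the keyword before it. It is loosely modelled on what the tokenizer extension's token_get_all() reports, but the keyword table, the regular expression and the (name, value) pairs are simplifying assumptions, and the namespace and class names used in the example are hypothetical.

```python
import re

# A small subset of PHP keywords and their token names.
KEYWORDS = {
    "namespace": "T_NAMESPACE",
    "class": "T_CLASS",
    "private": "T_PRIVATE",
    "public": "T_PUBLIC",
    "function": "T_FUNCTION",
}

TOKEN_RE = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<var>\$[A-Za-z_]\w*)"
    r"|(?P<arrow>->)"
    r"|(?P<name>[A-Za-z_]\w*)"
    r"|(?P<char>[;{}()=,])"
)


def tokenize(code: str):
    """Return a list of (token_name, value) pairs; value is None unless needed."""
    tokens, pos = [], 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if m is None:
            raise SyntaxError(f"unexpected {code[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue  # white space dropped, as in the edited slide output
        if kind == "var":
            tokens.append(("T_VARIABLE", text))
        elif kind == "arrow":
            tokens.append(("T_OBJECT_OPERATOR", None))
        elif kind == "name" and text.lower() in KEYWORDS:
            tokens.append((KEYWORDS[text.lower()], None))  # keyword: name only
        elif kind == "name":
            tokens.append(("T_STRING", text))  # identifier: keeps its text
        else:
            tokens.append((text, None))  # single characters such as ';'
    return tokens
```

For "namespace Drams; class Whiskey { private $name; }" the result is a flat list: T_NAMESPACE, then T_STRING "Drams", then ";", and so on. Attaching "Drams" to the namespace is left entirely to the next stage.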
07:22
Similarly, you get T_PRIVATE for the private keyword, T_PUBLIC, T_OBJECT_OPERATOR, which is the arrow operator (->), and a whole bunch of others. From what I remember, there are maybe about a hundred tokens. Those are usually the names you see when you write some PHP code with a syntax error in it. It will tell you about an unexpected token,
07:42
and then it gives you the name of the token, and its value if there is one. So that is what the parsing stage does. But as I said, this stage doesn't do anything logical with the information it finds in the script. That is done by the next step, the parser. The parser converts tokens,
08:03
or a collection of tokens, into a structure that represents the logical information encoded in those tokens. What this looks like is kind of complicated, and this is probably the most complicated part of PHP
08:22
besides its executor. Basically, this is a state machine that starts with nothing. In this case, the nothing is called top_statement: the first thing in a PHP script, where there's no state yet. So what can you put in a PHP script?
08:41
Well, you can put in a class definition, a function definition, a trait declaration, or you can just put in standard code, statements. So those are the four options you have here. The way the parser rule is written, top_statement is either a statement, a function declaration statement, a class declaration statement, or a trait declaration statement,
09:01
and there's a whole bunch of others, but I've left those out because I ran out of space on the slide. Now let's have a quick look at what the class declaration says. If you remember, classes in PHP can have modifiers, so you can have an abstract class or a final class, and then you have the class name.
09:20
Or you can just have the class name right from the start; the modifiers are optional. That is basically what it says. So there are two options here. The pipe tells you that there's an OR: it's either one rule set or the other. The first one is class_modifiers followed by the T_CLASS token. But then we need to check what those class modifiers are first.
09:42
So that's what the parser does next. It looks at this other definition that I've highlighted in pink here, which says that class_modifiers is either a single class_modifier, or class_modifiers followed by a class_modifier. This is the way parsers traditionally resolve
10:03
recursion and things. It doesn't define separate rules for two class modifiers or three class modifiers, because that would be a silly thing to do; especially for statements, you never know how many you're going to have, right? The class_modifier rule then checks that it's either T_ABSTRACT or T_FINAL. And then with each of the rules it finds
10:21
between the curly braces, it then associates an action with it. So if it finds T_ABSTRACT or T_FINAL, it sets $$, the output of this action, accordingly. For T_ABSTRACT that is ZEND_ACC_EXPLICIT_ABSTRACT_CLASS, an internal flag that says this is an abstract class. And you have the same thing for ZEND_ACC_FINAL,
10:42
marking this as a final class. Once it has resolved this rule set, it falls back to the original one: say we have an abstract class Whiskey. It then sees the T_CLASS token and does something with this class. The first thing it does
11:00
it basically assigns the current line number to an internal variable, so that PHP's reflection can tell you between which lines a specific class is defined, for example. Not the most interesting thing. You then get the T_STRING for the name of the class, and then it says extends_from implements_list backup_doc_comment.
11:22
It's a very long name. Basically what it says is that there can be things this class extends from, like inheritance, which is the extends_from that I didn't show on the slide. Then a class can also implement interfaces, which is the second thing. Then there's backup_doc_comment: if a class has a
11:42
specially formatted comment on it, it will also store that so that you can look at the comment later. From those rule sets it then creates actions. What I've put in red here is zend_ast_create_decl. All those zend_ast calls that are actually being run
12:02
as C code later, they build up this tree, the logical interpretation of your script. That's the next step that we have. The parser rules give meaning to the tokens, and from them PHP constructs this AST. You have a rule for every different element: for binary operations,
12:22
for assignment operations, for for loops, if/else statements, and so on. The parser rules are written in a file called zend_language_parser.y, and this file is converted into an enormous C file.
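Going back to the class_modifiers rules above: a left-recursive rule like "class_modifiers: class_modifier | class_modifiers class_modifier" accepts any number of modifiers, and a bottom-up (LALR) parser reduces it left to right, one modifier at a time. The Python sketch below mimics those reductions with a left fold. The numeric flag values are hypothetical stand-ins for ZEND_ACC_EXPLICIT_ABSTRACT_CLASS and ZEND_ACC_FINAL, and rejecting a repeated modifier is an assumption added for illustration, not a description of PHP's exact error handling.

```python
from functools import reduce

# Hypothetical flag values; PHP's real ZEND_ACC_* constants are defined in C.
EXPLICIT_ABSTRACT_CLASS = 1 << 0
FINAL = 1 << 1


def class_modifier(token: str) -> int:
    """class_modifier: T_ABSTRACT { $$ = abstract flag } | T_FINAL { $$ = final flag }"""
    actions = {"T_ABSTRACT": EXPLICIT_ABSTRACT_CLASS, "T_FINAL": FINAL}
    if token not in actions:
        raise SyntaxError(f"unexpected {token}, expecting T_ABSTRACT or T_FINAL")
    return actions[token]


def add_modifier(flags: int, new_flag: int) -> int:
    """Action for `class_modifiers class_modifier`: merge one more flag."""
    if flags & new_flag:
        raise SyntaxError("duplicate class modifier")  # assumed check
    return flags | new_flag


def class_modifiers(tokens: list) -> int:
    """Reduce a non-empty run of modifier tokens left to right.

    The first token is reduced on its own (`class_modifiers: class_modifier`);
    every further token extends the result (`class_modifiers class_modifier`).
    """
    if not tokens:
        raise SyntaxError("expected at least one class modifier")
    flags = [class_modifier(t) for t in tokens]
    return reduce(add_modifier, flags[1:], flags[0])
```

Because the rule refers to itself on the left, the grammar never has to spell out "two modifiers" or "three modifiers" as separate cases.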
12:42
It basically has a state number for each of the different parts that make up this big rule set. The class declaration statement ends up being case 167 and 168, and there are, I think, between one and two thousand of them. It's not a file you want to look at, and also one you don't have to look at
13:02
because it's all auto-generated for you. Unless you want to debug it, I guess. So the parser rules convert the tokens into this abstract syntax tree, and that is a really difficult term to say quickly, so I'm going to call it an AST from now on. I learned that from a previous time I gave this talk. Alright, so what is this AST?
13:22
An AST describes the structure of the parsed script. Each node is a language construct, and a language construct can be a class definition, a function definition, a statement, and so on. I'll show you in a moment. Nested structures are represented through this tree as well, so if you have a nested if statement
13:40
it will look nested in this AST. It also doesn't keep all of your original code and text. It strips out all the white space, and all the comments that are not associated with a class or function, because those you can see back through reflection; the others you can't. And again there's a cool tool that can visualize this AST for you,
14:01
written by one of the other PHP core contributors. The original idea is that from this AST you can actually run some optimizations and figure out which things can never happen. For example, if you have an if (true), you clearly know that one of the branches of this if/else statement can never run, because the condition is always true, and things like that. PHP itself
14:21
doesn't do a lot of these optimizations, but because there's an AST, other extensions like Zend OPcache can look at this AST and optimize it before bytecode is generated from it. Now the reason why PHP itself doesn't do this a lot is that if PHP sees
14:40
a script, every time it sees a script it will parse it, create an AST out of it and execute it. Doing optimizations at this stage is quite expensive, so it makes no sense to do it on every request. Of course, if you put a cache in there like Zend OPcache, then it can spend more time on this optimization the first
15:02
time. That means the first run takes longer, but for all subsequent requests, because of the cache, it doesn't have to do this anymore, so all the optimizations are already built in, making it even faster. This is the reason why PHP doesn't do this itself but OPcache does. And this is just one of the things that OPcache does.
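The if (true) case mentioned above, where one branch can never run, is a typical rewrite an optimiser can apply to the AST before generating bytecode, and it is the kind of work that only pays off when the result is cached. Here is a minimal Python sketch on an invented tuple-based tree: the shapes ('if', cond, then, else), ('const', value) and ('block', ...) are assumptions for this example, and this is not how PHP's AST or OPcache's optimiser is actually laid out.

```python
def fold_constant_ifs(node):
    """Replace ('if', ('const', v), then, else_) by the branch that can run.

    Python truthiness stands in for PHP's here, which is fine for true/false.
    A missing else branch is represented by None; a removed one becomes ('block',).
    """
    if not isinstance(node, tuple):
        return node
    kind = node[0]
    if kind == "if":
        _, cond, then, else_ = node
        cond, then, else_ = (fold_constant_ifs(cond), fold_constant_ifs(then),
                             fold_constant_ifs(else_))
        if cond[0] == "const":
            live = then if cond[1] else else_
            return live if live is not None else ("block",)  # dead if, no else
        return ("if", cond, then, else_)
    if kind == "block":
        return ("block",) + tuple(fold_constant_ifs(child) for child in node[1:])
    return node
```

Conditions that are not constants, such as a variable, are left alone, so only provably dead branches disappear.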
15:21
If you call ast\parse_code on a script, you get one very, very long line. Of course, when I click now it doesn't work, but let me see, scroll. It's a very big, complicated thing to show, so we won't do that. Luckily
15:42
it has another function that formats it for you, and that already looks a lot easier. The start of your script is AST_STMT_LIST (STMT stands for statement), and then it has a constant in it. I'm not sure what that constant is in this case, actually; I don't know where it came from. But then you have AST_CLASS, which is
16:02
then a class. A class has different attributes. It has flags: it could be a static class, an abstract class or a final class, for example. It has a name. It could inherit from another class; it doesn't, so that's why it says null. It can also implement interfaces; there are none
16:21
either, so that's also null. And then you have the list of statements that belong to this class. Statements that belong to a class can be constant definitions, property definitions, or of course methods. In this case there are only two statements in the class. First of all we define
16:41
a private property: a property declaration with the private flag, making it a private property; it has the name name, and it has no default value. Then we get a method definition, and you can probably guess what that one does: it's your constructor. So there's a public constructor,
17:02
it has a list of arguments, or parameters; in this case there's one. You can see that its type is null, so no type is declared for it, it has the name name, and a default value of null. Then for each method, after the parameter declarations, you
17:20
get all the statements that belong to it, and so on. If you have this AST, you can basically write back your original script, except that you probably lose all your new lines, because PHP doesn't necessarily care about those when executing code. For some reason it actually does store them, but I'm not sure why the
17:40
AST extension doesn't show that information; it should be there, because it is necessary to create errors at some point. That's part of it, and of course this goes on a little bit further, but I won't show you that. To link that back
18:00
to where this comes from in the AST itself (just making sure the colors are right): I've linked the different parts that made up my little constructor back to the different elements of the first method declaration. The only one I really want to look at here is the $this->name = $name assignment. The
18:20
first and only statement in this method is where we assign the value of $name to the property $this->name, the name property of the class. That is an assignment operation, the purple link, which is the assignment operator you have in there. An assignment operator has two sides: a variable on the left-hand side and an expression on the right-hand side.
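The constructor body being discussed, $this->name = $name;, can be written out as a small Python tree and printed with one node per line, loosely imitating the formatted dump from the slide. The Node class and the output format are assumptions for this illustration; the kind names AST_STMT_LIST, AST_ASSIGN, AST_PROP and AST_VAR follow the names the php-ast extension uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)


def dump(node, indent: int = 0) -> str:
    """One node per line; plain strings are leaf values such as names."""
    pad = "    " * indent
    if not isinstance(node, Node):
        return f"{pad}{node!r}"
    lines = [f"{pad}{node.kind}"]
    lines.extend(dump(child, indent + 1) for child in node.children)
    return "\n".join(lines)


# $this->name = $name;
body = Node("AST_STMT_LIST", [
    Node("AST_ASSIGN", [
        # Left-hand side: a property fetch, not a plain variable.
        Node("AST_PROP", [Node("AST_VAR", ["this"]), "name"]),
        # Right-hand side: the $name parameter.
        Node("AST_VAR", ["name"]),
    ]),
])
```

Calling print(dump(body)) shows AST_PROP nested under AST_ASSIGN on the left, with its own AST_VAR for this plus the property name, and a separate AST_VAR for $name on the right.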
18:41
The variable is not a simple variable; it is a more complicated one because it is a property, hence it uses AST_PROP instead of AST_VAR. A property fetch consists of two things: on the left-hand side an expression, which in most cases is your variable name,
19:01
in this case this, and then the property, which is name. This little block says we assign the value of $name to $this->name. The AST is still something that PHP cannot execute, because it needs to be converted into something that the Zend executor can run. That is something
19:22
that in PHP we call bytecode. We also call them opcodes; we call them two different things to make it easier. Not quite. Each function, each method, and the main body of the script is represented by something we call an op_array, and an op_array
19:40
is basically a list of opcodes, a linear list. From this AST we need to convert all of the methods and all of the functions into these arrays of opcodes. The Zend engine executes each of those opcodes in turn, unless it sees a function call; then it calls the function, starts executing the op_array belonging to that function or
20:00
method, and at the end it returns to the previous function and continues executing that. You sort of expect that to happen, right? It is very similar to assembly instructions, if you have ever seen those: the language is clearly not the same, but the concept of assembly instructions is very, very similar to PHP opcodes. There is
20:20
a tool that you can use for visualizing them. Actually, there are a few different tools: there is one in OPcache, there is one in phpdbg, and there is a tool that I wrote 15 years ago called VLD. Because I know that tool best, I will be using it. To convert this little bit of AST that I showed at the
20:42
start: it's my constructor, a very, very simple constructor, which we are converting into opcodes. In this case it ended up being seven opcodes, numbered 0 to 6. The first one is called EXT_NOP. NOP stands for no operation; it basically doesn't do anything.
21:01
It is basically a placeholder for function __construct. You don't have to associate any code with it, but it still ends up being in there. Why exactly? I don't know. There are lots of things where you ask why it is like this, and I don't know. Because it is PHP.
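The execution model described just before this listing (each op_array is a flat list run in order; a call starts another op_array and a return resumes the caller) can be modelled roughly in Python. The three-opcode instruction set (ECHO, CALL, RETURN) and the (opcode, operand) pairs are deliberate simplifications; the Zend engine has many more opcodes and a different calling convention.

```python
def run(op_arrays: dict, entry: str = "main") -> list:
    """Execute op_arrays (name -> list of (opcode, operand)) from `entry`.

    Returns everything that was ECHOed, in order.
    """
    output = []
    call_stack = []          # (op_array name, index to resume at) for each caller
    current, ip = entry, 0
    while True:
        opcode, operand = op_arrays[current][ip]
        ip += 1              # by default, carry on with the next opcode in the list
        if opcode == "ECHO":
            output.append(operand)
        elif opcode == "CALL":
            call_stack.append((current, ip))  # remember where to come back to
            current, ip = operand, 0          # start the callee's op_array
        elif opcode == "RETURN":
            if not call_stack:
                return output                 # returning from the entry point ends the run
            current, ip = call_stack.pop()    # resume the caller after its CALL
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
```

With a main op_array that echoes, calls f, and echoes again, the output comes out in call order, because every RETURN picks up exactly where the matching CALL left off.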
21:20
There are lots of interesting and really complicated things in there. Every opcode has up to two operands and a return value. The first opcode after the NOP is called RECV, which stands for
21:41
receive. It basically says we are going to accept an argument that is being pushed on the stack into the function. In this case we are receiving an argument, and we put its value in a variable called !0 (exclamation mark zero). PHP doesn't have exclamation mark syntax
22:01
for variables, as you know. It is internal speak for a compiled variable. It is a placeholder number; you can look it up slightly higher, and it tells you that it is the name variable. This substitution is an optimisation, so that PHP doesn't have to look up where to find a variable every time it sees the
22:21
name of a variable. It is something that came in with PHP 5.1. Then you get EXT_STMT. EXT_STMT doesn't do anything, but it is great for debuggers, because this is the point where they can hook in to pause execution. That is what EXT_STMT is for. If you do not have a debugger loaded, that
22:41
opcode will actually not be there. PHP doesn't generate it for you. Then you get the assignment operator. What did I say? I said every opcode has at most a return value and two operands. But that is not true either. It is true most of the time, but
23:01
it is PHP. The first example is here, and it is a curious one, because if you look at the operands there is only one operand listed. But there are actually two; one is silent, because PHP has lots of shortcuts in it to make things go faster, as you'd expect. So the
23:21
ASSIGN_OBJ opcode assigns a value to a property of a specific variable. For that it needs three operands, right? It needs the variable, the name of the property, and then the value. So that actually needs
23:41
three bits of information. But as I said, you can only have two at most, and here $this is a shortcut: it is implied, so only the property name is listed. But it still needs the value, because that is the normal case for any assignment. So it attaches a
24:01
secondary opcode to this, called OP_DATA, which in some cases, like this one, is used for giving you the third operand: in this case the $name variable. So those two lines basically say we are assigning the value of !0, which is the $name variable, to the name
24:20
property on the $this object. Then it has another EXT_STMT, because you can pause afterwards, and then it does a RETURN null. Every single function and method in PHP always has a return statement at the end. If you don't write one yourself, there will be a return null. If you do write one yourself,
24:40
the return null is no longer there, although in early versions of PHP 5 it actually wasn't removed. Is this clear? Because it's kind of complicated. I know it's a very small example; they will get more difficult, but we need to start small. Now let's have a look at jumps. I've seen people do this on this rock, it's crazy.
25:02
It's also about 700 meters off the ground, but you can't quite see that. Let's have a look at the if statement. The if is not a loop, but it is a very simple language construct, and this is where the gotos come in. As I've said, opcodes are executed one by
25:21
one, in order, in a linear line. If you want to do loops in that, you need to have a way to skip things, and to go back to things if you have a for loop, for example. The if statement is very simple. You either execute the code, the statements inside the branch, or you don't. That's basically what it is.
25:42
In this case we are comparing the value of $a to 42, and if that's true we are going to echo "Life, the Universe and Everything". In the AST it looks as follows. You have the AST_IF, which is your language construct. That can have multiple branches. The comparison in this case is a binary operation.
26:02
You compare a variable to a value, so it has two parts: a left side and a right side. The left side is your variable name, on the left of the binary operator. On the right-hand side you have a constant value, which is 42. The flag for the binary operation is "is equal". It can also be greater than or less than or
26:22
not equal or identical, and stuff like that. "Identical" is, I think, the right word for the three equal signs. If this matches, then there is a list of statements that this can execute. The statement in this case is my echo statement, which echoes "Life, the Universe and Everything".
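The step from this AST to a flat opcode list can be sketched as a tiny compiler: emit the comparison, emit a JMPZ whose target is not known yet, emit the body, then backpatch the jump to point just past it. The tuple-based AST, the `compile_if` name and the operand layout are assumptions for illustration (this is not php-ast's API or PHP's compiler), and EXT_STMT opcodes are left out, so the numbering is shorter than on the slide.

```python
# Hypothetical tuple-based AST for: if ($a == 42) { echo "Life, ..."; }
# Node kinds mirror the ones named in the talk; this is not php-ast's real API.
IF_AST = (
    "IF",
    [(
        "IF_ELEM",
        ("BINARY_OP", "IS_EQUAL", ("VAR", "a"), ("CONST", 42)),
        [("ECHO", ("CONST", "Life, the Universe and Everything"))],
    )],
)


def compile_if(node):
    """Flatten a single-branch IF node into opcodes, backpatching the JMPZ."""
    kind, elems = node
    if kind != "IF" or len(elems) != 1:
        raise ValueError("this sketch only handles a plain if without else")
    _, cond, stmts = elems[0]
    _, flag, (_, var), (_, value) = cond
    ops = [(flag, var, value, "~1")]         # compare, result into temporary ~1
    jmpz_at = len(ops)
    ops.append(None)                         # placeholder: target not known yet
    for _, (_, text) in stmts:
        ops.append(("ECHO", text))
    ops[jmpz_at] = ("JMPZ", "~1", len(ops))  # backpatch: skip over the body
    ops.append(("RETURN", 1))
    return ops


for i, op in enumerate(compile_if(IF_AST)):
    print(i, *op)
```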
26:42
Because this is something that is being conditionally executed, there needs to be a way to skip parts of it, and that is something you see back in the opcodes. In this case we have six opcodes. The EXT_STMT at the start is there again so that debuggers can stop. The first important thing is the IS_EQUAL opcode, which
27:02
compares a variable with an expression. The variable in this case is !0 again, and the value, or the expression, is the constant value 42. That is a very simple one.
27:20
What IS_EQUAL does is compare these values and then return the result of this expression into the ~1 (tilde one) temporary variable. This is a variable that doesn't persist for more than one or two statements, or rather for more than one or two opcodes. Then the next opcode says JMPZ,
27:41
and the Z stands for zero. Basically this looks at the result of the previous operation, and if it is false or zero, it jumps to the opcode that is specified in the second operand. So this says: if the result of $a == 42 is
28:01
false, we are going to jump to opcode number five. That is what this little thing here indicates. In other words, if the if condition doesn't match, we jump over the body so that we don't echo "Life, the Universe and Everything".
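To see why a conditional jump is all an if needs, here is a toy interpreter that runs opcodes strictly one after another and only changes the program counter when a JMPZ is taken. The tuple format and the `run` function are hypothetical; they imitate the opcode names from the listing, not the Zend engine's actual executor, and EXT_STMT is omitted, so the jump target is 3 instead of 5.

```python
def run(ops, variables):
    """Execute opcodes in order; only a taken jump changes the program counter."""
    tmp, out, pc = {}, [], 0
    while True:
        name, *args = ops[pc]
        pc += 1                              # default: fall through
        if name == "IS_EQUAL":
            var, const, dest = args
            tmp[dest] = variables[var] == const
        elif name == "JMPZ":
            src, target = args
            if not tmp[src]:                 # false or zero: take the jump
                pc = target
        elif name == "ECHO":
            out.append(args[0])
        elif name == "RETURN":
            return "".join(out)
        else:
            raise ValueError(f"unknown opcode {name}")


# if ($a == 42) { echo "Life, the Universe and Everything"; }
# (EXT_STMT opcodes left out, so the JMPZ target is 3 rather than 5)
IF_42 = [
    ("IS_EQUAL", "a", 42, "~1"),
    ("JMPZ", "~1", 3),
    ("ECHO", "Life, the Universe and Everything"),
    ("RETURN", 1),
]

print(run(IF_42, {"a": 42}))        # Life, the Universe and Everything
print(repr(run(IF_42, {"a": 7})))   # ''
```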
28:21
This makes sense, right? It is also a lie, because in PHP 7.1 the IS_EQUAL and the jump are contracted into one operation. But for the logic of it, this is basically how it works. There are lots of shortcuts to make things go faster, but it works, so why not keep doing it. It is a contraction,
28:41
but the opcodes are still there. They have not been converted into a specialised opcode, because that would first of all break code like this that looks at where all the jumps are. It needs to be able to figure out later, for example, which code can no longer be executed, so that it can be optimised out, and stuff like that.
29:01
So it is not something that can just be removed from here, but it is a shortcut; not in every case, either, but in most of them. Then at the end we have the RETURN 1, which I don't know where that came from. An if can of course also have an else statement, and in this case the AST_IF now has a list, and
29:20
for each of those cases it has an AST_IF_ELEM. The first AST_IF_ELEM has an expression that compares $a to, in this case, pi, and the second AST_IF_ELEM does not have a condition at all, because it is just your else, so it always gets executed if execution gets that far.
29:40
Then it echoes squares. Now if you look at the opcodes again, there are more jumps in there. First there are the comparison and the jump that you have seen before; that is just your normal if statement. But of course at the end of the first echo, after echo circles, you want to make sure it does not also execute echo squares,
30:00
so there needs to be a jump that jumps out of it. As the last of the statements in the first block of the if statement, there is now a jump instruction, JMP, that jumps straight to number eight. JMP is an unconditional jump. Jumps basically are our gotos; there is nothing more to it. They are a bit more clever, because they are sometimes conditional,
30:22
depending on which jump you have, but in the end they are gotos. And the else branch is going to echo squares. Let's have a look at loops, or rings. Does anybody want to guess what this is? Nobody? Everybody asleep? For loops, yes.
30:41
I was specifically referring to this big ring, on the border of France and Switzerland; this is a very big ring. Yeah, there we go. Alright. So yeah, the for loop. The for loop is the first loop that we are looking at, and if you look at a for loop, it has multiple parts, right? It has your initial
31:01
assignment, it has a condition, it has an end-of-loop operation, and it has the list of statements to run. And those are the four things that come back if you look at the AST_FOR construct in the AST. So the initialization step
31:20
is a list of expressions; remember, you can use commas and stuff in there. You have the condition, which can also be multiple conditions, and then you have the loop expression, which is the thing that gets executed at the end of each iteration. Now if you look at how the opcodes get arranged for this, you see they get a bit out of order, right? Because
31:41
if you look at the first number, the line number, you first do a bit of line two, then a bit of line four, then a bit of line two again. This is kind of weird. But if you think about it, you can rewrite for loops with goto statements. Now, you should never do this in any code you write, but you can do it. So if we do that, it becomes
32:01
a lot clearer how the opcodes actually relate to this. So we rewrite our for loop as the initialization followed by a goto to the condition; if the condition matches, we execute our statements, and at the end of the list of statements it does the blue bit, the end-of-loop increment. And if you rearrange
32:21
it like this, then it is a one-to-one match with the opcodes that you get out of it, and it becomes a lot clearer. So that is the for loop. Yeah, don't ever write code like this yourself, as I said. Anything else interesting in here? Nope. So the while loop is a similar thing, right? You have
32:41
an initial statement, or rather, you can rewrite a for loop as a while loop as well, if you want to do that. So you have the initial statement, you have the while keyword, you have your condition, and then you have the statements in there, and then you can do an end-of-loop instruction, which in this case we haven't done as a separate step, because it's a while loop. So here we have the while: we do the
33:01
assignment, and we do the jump immediately, because it jumps to opcode 6 to do the condition. And if the condition matches, it then jumps back to opcode 4 to do the echo itself. If not, if the JMPNZ instruction doesn't match,
33:21
that is, if the previous result was false in this case, then it will just drop down without having to jump. So rearranging those opcodes makes it a little bit cleverer, because fewer jumps are necessary, speeding things up again. Do-while is a very similar thing, just slightly reorganized.
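The goto rewrite of the for loop, with the condition placed after the body, can be sketched as a small opcode list plus an interpreter. This assumes the loop `for ($i = start; $i < limit; $i++) { echo $i; }`; the opcode names imitate PHP's, but `compile_for` and `run` are hypothetical helpers, not engine code. With this layout each iteration needs only one JMPNZ, whereas a layout with the condition at the top would need two jump instructions per iteration (a conditional jump out and an unconditional jump back).

```python
def compile_for(start, limit):
    """for ($i = start; $i < limit; $i++) { echo $i; } laid out in goto order."""
    return [
        ("ASSIGN", "i", start),             # 0: initialisation
        ("JMP", 4),                         # 1: goto the condition
        ("ECHO", "i"),                      # 2: loop body
        ("PRE_INC", "i"),                   # 3: end-of-loop increment
        ("IS_SMALLER", "i", limit, "~1"),   # 4: condition
        ("JMPNZ", "~1", 2),                 # 5: true -> back to the body
        ("RETURN", 1),                      # 6: reached when the condition fails
    ]


def run(ops):
    """Tiny interpreter: execute opcodes linearly, following jumps."""
    variables, tmp, out, pc = {}, {}, [], 0
    while True:
        name, *args = ops[pc]
        pc += 1
        if name == "ASSIGN":
            variables[args[0]] = args[1]
        elif name == "PRE_INC":
            variables[args[0]] += 1
        elif name == "IS_SMALLER":
            var, limit, dest = args
            tmp[dest] = variables[var] < limit
        elif name == "ECHO":
            out.append(str(variables[args[0]]))
        elif name == "JMP":
            pc = args[0]
        elif name == "JMPNZ":
            if tmp[args[0]]:
                pc = args[1]
        elif name == "RETURN":
            return out
        else:
            raise ValueError(f"unknown opcode {name}")


print(run(compile_for(0, 3)))  # ['0', '1', '2']
print(run(compile_for(5, 3)))  # []
```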
33:41
I won't spend too much time on this slide. Foreach gets more complicated. The foreach loop in PHP loops over elements in an array, or elements in an expression, really. And in order to do that, it needs to keep track internally of where it is in the array. So it has an extra
34:01
pointer, or cursor, that loops through the array. And any time you do a foreach loop, PHP allocates this cursor object, and it does that through an opcode called FE_RESET_R. It allocates this cursor object and also sets it back to the start of the array. The result of this is your
34:20
If I see this correctly, it looks at the array, and if the array is empty, it will not even do this and just jump straight over the loop. Again, this is a shortcut. So the FE_RESET_R with !0 and the jump to 11 basically says: if the array is empty, just jump
34:41
straight out of it and don't bother doing anything. Which is great. Inside every loop it needs to fetch information; it needs to fetch your key and your value. That is the first thing that happens here. FE_FETCH_R fetches the value into the value... it fetches the value into the
35:03
$value variable. That is tricky to say as well. Again, it then advances the cursor, and if it knows it can't advance, it will also jump out of the loop. Then you have the assignment; the assignment at opcode 5 is there for assigning the key.
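To make the cursor bookkeeping concrete, here is a rough Python analogue that logs which opcode would be responsible for each step. Python's own iterator stands in for PHP's internal cursor, the log strings just name the opcodes discussed, and sending an empty array straight to FE_FREE follows the "jump to 11" described here rather than a checked reading of the engine source.

```python
from typing import Any, Callable, Dict, List


def foreach(array: Dict[Any, Any], body: Callable[[Any, Any], None]) -> List[str]:
    """Mimic foreach ($array as $key => $value) and log the opcode-level steps."""
    log = ["FE_RESET_R"]                  # allocate the cursor at the start
    if array:                             # empty array: jump straight to FE_FREE
        cursor = iter(array.items())
        while True:
            step = next(cursor, None)     # FE_FETCH_R: fetch, then advance
            if step is None:              # cannot advance: jump out of the loop
                break
            log.append("FE_FETCH_R")
            key, value = step             # with "$key =>", an extra ASSIGN sets the key
            body(key, value)
            log.append("JMP")             # back to FE_FETCH_R for the next element
    log.append("FE_FREE")                 # release the cursor after the loop
    return log


print(foreach({"x": 10, "y": 20}, lambda k, v: print(k, v)))
```

The FE_FREE at the end is also why jumping into the middle of such a loop cannot work: the fetch step would find no cursor.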
35:21
Because if you don't use the key => value syntax, that opcode is simply missing, because it is not necessary. Now, what again did I say about how many operands an opcode had? It was two, wasn't it? Look at the one on opcode number 3. How many operands does that have? Three of them.
35:40
Yeah, sometimes PHP does that. It has this extended value that often only encodes a jump target. That is the only thing it does, well, with an exception here and there. PHP, right? What can I say? One more important thing I want to point out here
36:01
is that for each foreach loop it needs to allocate this cursor object, and it needs to free it at the end, which you see with the FE_FREE on line 11. This is one of the reasons why, when PHP actually got the goto keyword, which by the way you should use with
36:21
restraint, you cannot jump into loops. Because if you jump straight into, say, line 7, then the cursor wasn't, how do you say that, allocated, right? So if you then
36:41
jump to the point where it needs to fetch something from the cursor, it isn't there, and you can't do that, so it would crash. This is why one of the restrictions of goto in PHP is that it doesn't allow you to jump into a construct, only out of it. Again, please don't use those. And then you get to really complex loops where you have multiple nested levels.
37:01
It sometimes ends up being something like this. Have any of you seen Primer? Don't watch it just once, because you won't understand the film. Okay, so the really complex loops: let me visualize those with the tool, because that makes it a lot easier to see. I'll sort of skip over the opcodes here, because
37:20
you can have a look at them yourself later; I'll put the slides online. And you can actually see in the graph here what gets constructed from the opcodes. And it's very similar to what you would expect, right? In lines two and three it does some initialization; it sets the value of the array. Then you have the foreach,
37:42
where I've got my laser pointer here. In line three, matching this line, there is a decision to be made, right? Either the array is empty or the array is not empty. If the array is empty, we jump straight out of it. We jump to opcodes 12
38:01
and 13 on line 8, which is, I believe, this one here. So we jump straight out of the loop, and that is what the arrow points out here, right? If the array is empty, we jump straight out of it and then we leave the function. Now if the array isn't empty, well, we have the first if statement, and at the first if statement you can either go left, if it's true,
38:21
or right, if it's false. If it is false in this case, you get another if statement, which can again be true or false, so the graph splits once more. Any time you have a conditional, you get an extra split in your graph. And then of course
38:41
they can loop back, because the foreach loop of course loops over your whole array, so it goes back to fetching another value. So this graph is actually part of VLD; the tool can actually create those graphs for you. That's how I create my slides, because I can't draw those things myself. I'm a lazy programmer; I write tools to do the drawing for me.
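Graphs like these come from splitting the linear opcode array wherever control can jump. A common textbook way to do that, not necessarily VLD's actual code, is the leader-based split sketched below: the first opcode, every jump target, and every opcode following a jump or return starts a new basic block, and edges follow jump targets plus fall-through. The listing is a hypothetical if/else (circles/squares) with EXT_STMT opcodes omitted.

```python
JUMPS = {"JMP", "JMPZ", "JMPNZ"}


def basic_blocks(ops):
    """Split a linear opcode list into basic blocks; return (blocks, edges)."""
    leaders = {0}
    for i, (name, *args) in enumerate(ops):
        if name in JUMPS:
            leaders.add(args[-1])            # a jump target starts a block
            if i + 1 < len(ops):
                leaders.add(i + 1)           # so does the opcode after a jump
        elif name == "RETURN" and i + 1 < len(ops):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [(s, e - 1) for s, e in zip(starts, starts[1:] + [len(ops)])]
    block_of = {s: n for n, (s, _) in enumerate(blocks)}
    edges = set()
    for n, (s, e) in enumerate(blocks):
        name, *args = ops[e]
        if name in JUMPS:
            edges.add((n, block_of[args[-1]]))   # jump edge
        if name not in ("JMP", "RETURN") and e + 1 < len(ops):
            edges.add((n, block_of[e + 1]))      # fall-through edge
    return blocks, sorted(edges)


IF_ELSE = [
    ("IS_EQUAL", "a", "pi", "~1"),  # 0
    ("JMPZ", "~1", 4),              # 1: false -> else branch
    ("ECHO", "circles"),            # 2
    ("JMP", 5),                     # 3: skip the else branch
    ("ECHO", "squares"),            # 4
    ("RETURN", 1),                  # 5
]

blocks, edges = basic_blocks(IF_ELSE)
print("blocks:", blocks)
print("edges:", edges)
```

Every conditional jump gives its block two outgoing edges, which is exactly the "extra split in your graph" for each conditional.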
39:02
And then we get exceptions, which are also really difficult. Catch blocks are checked in order. So if an exception is thrown, the first catch gets a chance to catch it, and if that catch doesn't match, it will look at the next catch. If that doesn't match, it will look at the next catch,
39:21
and if there are none left, it will bubble up the exception. So if you look at this, you sort of realize that if you have exception classes that are thrown way more often than others, you should put their catch blocks at the start, because that makes it a bit quicker. It is similar if you do this with a switch/case statement,
39:41
which I don't think I have a slide for. If there is one value that you're matching against almost every single time in your switch, you should put it at the start, because that is the first condition that PHP will check for you. And then the second one and the third one and so on and so on. This is something that's going to be
40:01
addressed in PHP 7.2, where if the values in your case statements are either all integers or all strings, it will actually create a hash map for that, shortcutting the whole thing so it doesn't have to check every single case. Again making it faster. So many little tricks. So what can we do?
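The difference between checking cases in order and the hash-map shortcut described for PHP 7.2 can be sketched in Python. The function names and the "case_N" targets are hypothetical, and the sketch ignores PHP's loose `==` comparison rules entirely; it only shows why case order matters for a comparison chain and stops mattering once a lookup table exists.

```python
def linear_switch(value, cases):
    """Compare against each case in order; return (target, comparisons made)."""
    for count, (case_value, target) in enumerate(cases, start=1):
        if value == case_value:
            return target, count
    return "default", len(cases)


def build_jump_table(cases):
    """Map each case value to its target; the first occurrence of a value wins."""
    table = {}
    for case_value, target in cases:
        table.setdefault(case_value, target)
    return table


def table_switch(value, table):
    """One hash lookup instead of a chain of comparisons."""
    return table.get(value, "default")


CASES = [(1, "case_1"), (2, "case_2"), (3, "case_3")]
print(linear_switch(3, CASES))                   # ('case_3', 3)
print(linear_switch(1, CASES))                   # ('case_1', 1)
print(table_switch(3, build_jump_table(CASES)))  # case_3
```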
40:21
Go ahead. Dammit. I was hoping to skip. FAST_CALL is really complicated; if I have time, I will try to explain it. So, the dead code thing. I mean, sometimes you write code that can never be executed, like things that you do after return statements.
40:41
So if you have this very simple thing where you have an echo 4, a return and an echo 2, clearly the echo 2 is never going to be executed, because we have just returned from the function. So there's some code in VLD, as well as in Xdebug, which makes heavy use of the code analysis parts of VLD. It actually figures out which lines of
41:00
code, or rather which opcodes, cannot be executed, and it marks those with a little star after the opcode number. So you can see that opcode 4, the echo 2, is marked with this little star, because it can never be executed. This is useful to have, because optimizers can use this to get rid of those opcodes and make your code smaller.
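Finding dead code like this is a reachability walk: start at opcode 0, follow fall-through and jump targets, stop at returns, and whatever was never visited can never execute. The sketch below is a hypothetical Python illustration of that idea, not VLD's implementation; it omits EXT_STMT opcodes, so the numbering differs from the slide, and the trailing implicit return also ends up starred here.

```python
# True marks an unconditional jump; conditional jumps can also fall through.
JUMPS = {"JMP": True, "JMPZ": False, "JMPNZ": False}


def reachable(ops):
    """Return the set of opcode indices reachable from opcode 0."""
    seen, todo = set(), [0]
    while todo:
        i = todo.pop()
        if i in seen or i >= len(ops):
            continue
        seen.add(i)
        name, *args = ops[i]
        if name in JUMPS:
            todo.append(args[-1])          # follow the jump target
            if not JUMPS[name]:
                todo.append(i + 1)         # conditional: may also fall through
        elif name != "RETURN":
            todo.append(i + 1)             # ordinary opcode: next one runs
    return seen


def mark_dead(ops):
    """Render the listing, starring opcodes that can never execute."""
    live = reachable(ops)
    return [f"{i}{'' if i in live else '*'} {op[0]}" for i, op in enumerate(ops)]


# echo 4; return; echo 2;  (EXT_STMT opcodes left out)
OPS = [("ECHO", "4"), ("RETURN", "null"), ("ECHO", "2"), ("RETURN", "1")]
print("\n".join(mark_dead(OPS)))
```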
41:23
I have 20 minutes left, but I hope to finish the presentation in 5. So why is it useful to analyse what the paths are? A common thing to do if you write test cases is figuring out what the code coverage of your functions is. So if we have this function
41:40
with if/then/else, we have two unrelated if statements in there: the first is if $a equals true and the second one is if $b equals true. I can make sure that my test case covers all of the lines in this function by issuing just two calls: I can pass in one where $a is true
42:00
and $b is false, and one where $a is false and $b is true. And I've covered every single line in the code, but you haven't tested all of the paths in your code yet, because you haven't tested where both $a and $b are true, or where both are false. Now, several years ago the developer of PHPUnit said, well, we need to fix this,
42:20
and could you write some code for that, Derick? So I did. It took me about a year and a half, because it's complicated, and I got busy with life; that happens. But I did actually get it into Xdebug 2.4. Well, there are two problems with this. It is very difficult to visualize, which I'll explain to you in a moment. Also, it's ridiculously slow, and I'll also explain why that is. So
42:41
if you run this with code coverage, if you set this up yourself, you do those two lines. I'll skip over that as well. It looks better if I show you the output of it, and you can see all six lines are green, right, because I've managed to hit all the lines in the code. But that's a lie.
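The gap between line coverage and path coverage can be reproduced in a few lines of Python. The function `f` and its string line labels are hypothetical stand-ins for the PHP example with two independent ifs; two calls already hit every line, but only two of the four possible paths.

```python
from itertools import product


def f(a, b, trace):
    """Two independent ifs; trace records which (hypothetical) lines ran."""
    trace.append("entry")
    if a:
        trace.append("then-a")
    trace.append("between")
    if b:
        trace.append("then-b")
    trace.append("exit")


def run_calls(calls):
    """Return (lines hit, distinct paths taken) for a list of (a, b) calls."""
    lines, paths = set(), set()
    for a, b in calls:
        trace = []
        f(a, b, trace)
        lines.update(trace)
        paths.add(tuple(trace))
    return lines, paths


all_lines, all_paths = run_calls(product([True, False], repeat=2))
lines, paths = run_calls([(True, False), (False, True)])

print(f"lines: {len(lines)}/{len(all_lines)}")  # lines: 5/5
print(f"paths: {len(paths)}/{len(all_paths)}")  # paths: 2/4
# Each independent if doubles the path count: eight ifs give 2 ** 8 == 256 paths.
```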
43:01
So Xdebug 2.4 has a new flag for code coverage called XDEBUG_CC_BRANCH_CHECK, which is the new magic sauce that adds extra analysis for path and branch coverage. This is really slow to do, because it needs to figure out which branches there are and then which paths you can take through the code
43:21
making use of all these branches. Also, every time you have a conditional statement you double the number of paths that you have. So it's exponential growth, and that is not a great computer science thing to have. So what happens
43:41
here, with just the two if statements, is that you end up having five branches: the one before the if statements, plus a branch on each side of each of the if statements; and then you have four possible paths to go through there. They all have the initial bit, and then you can go left or right, so you can either jump to opcode 5, which is inside the if, or 8, which is outside of it,
44:00
and so on and so on. So what Xdebug actually figured out for you here is which branches are being hit, which is all of them, and which paths are being hit, which, as you can see here, is only two out of four. And as a tricky... yes, this output format is something
44:21
that comes out of Xdebug, with a little contributed script that is in the contrib directory of Xdebug. This is PHP specific. So to visualise this, well, we can do it in the following way. If you look at the graphs from before, well, what we can
44:41
do is draw a solid line for every path that is being hit, and a dotted line for every path that has not been hit. And let's use a different colour for each of the possible paths. With the four paths here, that is quite easy to do. But consider you have eight if statements in there: how many paths are you going to have then?
45:01
You have 256, right? That makes a really, really wide graph. Also, can you really distinguish 256 different colours with your eyes? 256 you might manage. But if you get 1024 you're kind of screwed, right? You just can't do that anyway; it's just too much. So somebody needs to write something clever to visualise this better. And I
45:21
don't do that kind of stuff. So the ball is back in the court of Sebastian, the developer of PHP_CodeCoverage, and he has now spent a year and a half trying to figure out how to do that. So this hasn't gone anywhere yet. So we're going to do a quick recap to go quickly over the things that we talked about, so that we have some time for more questions.
45:40
So we looked at the different stages that we had. We first had the code, the script itself, written in ASCII, well, not quite, UTF-8 really. We converted it to tokens; the tokens are the blocks that make up every single part of the script. The tokens get converted by the parsing stage into the AST, the abstract syntax tree.
46:01
That then gets converted to bytecode, which then gets executed. We have looked at most of the jump structures here, and all the looping structures end up being converted into this linear array of opcodes with jumps in it. We've done some code analysis for fun and profit, to figure out which things can be removed from our code once we store it in our cache. I mean, if
46:20
you're just going to execute it and then throw it away, then why bother removing it in the first place? Because you're going to throw it away anyway. There's no profit in this, just lots of fun. We have looked at a few tools. We have looked at PHP's tokenizer, which shows you all the tokens. We have looked at Nikita's AST extension to visualize the ASTs. We have had a look at VLD, which shows you the graphs for the bytecode, and there we
46:42
have looked at some code coverage. Alright, that was pretty much what I had to talk about. Are there any questions? The FAST_CALL question goes last. Yeah, go ahead. I'll repeat the questions directly.
47:10
The scanner will know a little bit. Sorry, I said I was going to repeat the question and then I didn't.
47:23
To recap, the question is: does the scanner already know something about the OO semantics? Yes, the scanner in PHP does know some things about it. First of all, it knows that methods can be public, private and protected, right? That's one thing it needs to know about. Similarly,
47:41
it needs to know how inheritance works, because you can't do multiple inheritance in PHP, so it has rules built in for that. It also knows, for example, if an interface has already been loaded, having been parsed before,
48:00
and you parse something that implements this interface, then it already knows about it, because the interface is already there. If the class doesn't implement the interface properly, it also knows that at that moment, and it will show you an error. In many cases, it knows something about classes or interfaces from previous scripts
48:22
that have already been loaded, but not always. If it doesn't know, it will resolve these things at runtime. Later.
48:49
Yeah, you're most welcome. Okay, anything else? Yeah? Yes.
49:14
I didn't hear the last bit.
49:23
Alright, so that's a bit of a longer question for me to repeat. In short: how does PHP know that there is a next catch statement? If the first echo statement had been a throw, then, because this happens in the same function, that's easy enough, right?
49:40
You can easily see which catch it would go to, because it just follows down the line to see where the first catch is. If it is a function call that throws the exception, then the moment the exception is thrown, the called function aborts, execution gets passed back to where it was called from, and then PHP can find the first catch statement.
50:03
I actually believe that PHP internally already knows on which op lines there's a catch statement, so it shortcuts that a little bit sometimes. In most cases, it just goes down until it finds the first catch statement.
50:20
If that one doesn't match, then it already knows where the next one is. Something I didn't quite explain: the scanner actually runs twice, sort of; there are two stages, and in the first stage it doesn't know about these things, but in the second stage it figures out where all those catch statements are, and from that it can build up a list of the
50:42
links between the catch statements, as well as the case statements. It is not a full pass of the scanner, but it's something additional that it does, as a sort of second stage. One more?
51:13
Because it also knows between which lines there are try/catch statements.
51:21
That is true. That does not show up in the opcodes, but it shows up in the meta information associated with each op array, which VLD cannot visualise, so it doesn't tell you. It would be nice if it could, but it doesn't. All right, anything else? You can go first.
51:49
The question was basically: what optimizations does OPcache do? It removes dead code if it can. It will also sometimes reorganise things; for example, it will convert post-increment to
52:01
pre-increment, that kind of stuff. It's a bit of a useless conversion, because in PHP there isn't much of a speed difference. It will also, if it can, contract jump instructions, because in some cases the output of the AST-to-bytecode conversion has two jump instructions
52:21
right after each other, and it will optimise that if it can. That's one of the things it does. There's a whole bunch of others, but I can't quite remember what they are. It doesn't do very complex things, although with every version of PHP they're becoming more and more complex, which makes
52:41
it harder to explain. Okay, I hope you don't consider this as mic sharing. Is there a reason why the opcode names and the token names are abbreviated all the time? I mean, why can't it say FAST_RETURN instead of FAST_RET?
53:00
I assume it's "return". So the question is: why are the token and opcode names abbreviated? Is there a reason? Yeah, if I did this now, that probably wouldn't happen. So it is for historical reasons.
53:23
Of course, JMP could just be JUMP, right? There's no space requirement associated with any of them, because they're basically constant values. And it would be quite easy to change VLD to write out the full name of the opcode; there's nothing special
53:40
about them. And it's the same with the names of the tokens. Tokens are a bit different, because the names are already defined in the lexer, so they're more hard-coded, and I can't really do anything about that without defining the whole list myself. But for the opcodes you need to do that anyway, so yeah, there's also no reason why
54:00
in some cases I actually have prefixed them with ZEND_. I don't know why I've done it here, whereas in PHP itself they're all prefixed with ZEND_. Also, I don't know why. Alright, anything else? Yeah, one more? Yeah, the strings
54:21
are URL-encoded only because of VLD. I need to make sure there are no newlines in them to mess up my layout, so that's just... No, no, no, no. In PHP itself, not every single string is URL-encoded. That'd be silly. Alright, yeah?
54:41
Are there linters built on top of the PHP AST? Yes, there are. There's a tool called Phan, which stands for... It's P-H-A-N. Why not? It's written, I think, by Rasmus at Etsy actually; they wrote it to figure out whether
55:00
they can move their PHP 5 code to PHP 7, because Phan can also read PHP 5 code and then see whether the behaviour would change when converting from PHP 5 to PHP 7. So they just do some static analysis of these things, and I believe some other tools are built on top of that. I also believe that
55:20
it should be possible to use the AST to enforce some style requirements. You can't check syntax things with it, right? Because the syntax is being stripped out. But other things, like "every class must have a constructor", that kind of thing you can enforce by looking at the AST.
55:41
Or names of variables, of course; you can do that. There are tools available, and Phan from Etsy is a good one to have a look at, which is open source. I think it's etsy/phan. Anything else? You want your FAST_CALL question? Okay, that's fine.
56:01
Nothing on this side? Alright, I have one more slide, which is my last slide, with a QR code that goes to a link to the slides. They are not up there yet; we'll probably have to wait until Monday. If you go to the URL, you can find not only the slides but also a list of resources: some of the tools I spoke about,
56:22
and some links to more information about PHP internals. There's a great article there written by Nikita about PHP 7's virtual machine. Really, really good article. If you have any questions, feel free to email me. I hope to answer your questions in a reasonable
56:41
time frame, but I can't make any promises there. If that's everything, thanks very much. I hope you learned something, and enjoy the rest of the conference.