
PHP in the graph


Formal Metadata

Title
PHP in the graph
Title of Series
Number of Parts
611
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2017

Content Metadata

Subject Area
Genre
Abstract
Graph databases come with enhanced connectivity of data and a whiteboard-friendly paradigm. They require learning a new graph traversal language that crawls the network and brings back information. It's indeed a fresh new look at how we store a web of data and search through it. We'll meet Gremlin, from Apache TinkerPop, which provides an abstraction layer that makes it easy to express your business logic without fighting with the code, and several open source graph databases, available for testing and toying.
Transcript: English (auto-generated)
Yeah, we're three minutes, but we're live. Good. Thank you. Wow, good room. Let's go, because I have a full list of slides. So: PHP in the graph, mostly about Gremlin.
Has anyone been using Gremlin already? One? Okay, so you're all beginners. You start from nothing, right? Good, I can tell you anything. Okay, I think we have a very young audience. If you want one of them, I can sacrifice one. Anyway, so we're
going to talk about Gremlin, which is one of the languages used to go into a graph and extract data from it. So that's exactly what I'm going to do now. I'm going to start by presenting the graph itself, at least how we organize it.
Okay, we're still live, right? Okay, anyway, so for the first part of the talk, I'm going to tell you about the, well, if I have slides, I guess. Oh, the slides are here. We'll talk about discovery of the graph, presentation of the graph
itself. So you have the basic elements. That's going to be very easy. There are two elements: there are nodes and there are edges, or however you call them. And then we're going to dwell on the most interesting part, which is how to traverse, how to handle this big blob of data. Okay, it's not going to be like SQL, where everything is nice and clean, with columns and values, and everything is
understood, right? Everyone does that, right? Okay, here it's a big mess. So we're going to see how to move around in that. And I will finish with Gremlin in PHP itself, meaning that whatever I'm going to present to you now, well, if you want to use it from PHP, of course, you need the
link at the end. And I'll just show you at the end. So we leave the installation and the dirty, greasy details for the end. For those of you who don't know me, I'm the proud owner of the first elePHPant ever. That's my main achievement in life. I just, you know, haven't torn it. I've been doing
PHP since the last century, and I'm probably, I would say, the negative of a Belgian, if that's possible. I don't speak Walloon, I speak French. And I speak Dutch, not Flemish. Well, very badly, but whatever. And I live outside the country. So probably if you take me and put me in, you know, a photo booth, then
you get a Belgian guy. So, just the contrary. Anyway, how come I ended up working with Gremlin? Just because of this: I do static analysis for a living. So I don't even run the code you plan to run; before that, I actually try to review it. Initially I did that on my own, you know, with my little hands, well, with my little eyes. I realized it was way too
much, okay. Currently it's like three o'clock, so I have probably already reviewed two million lines of code since this morning. The machine is working on that all the time. So basically it takes a piece of PHP, breaks it into tokens, and rebuilds the links that exist between the
lines of code and what's really happening. And if you think a little bit about it, it's not a table. You could put, you know, every if in one table, the switches in another one, and the method calls in a third one. That's going to be a mess. You need relations between all those elements, however artificial they may be, and you want them in something that
you can query. So that's exactly the experience there. So one thing we're going to work on, and that's why you're going to hear me mention WordPress a lot, is a very simple graph, which is the WordPress call graph. Okay, I chose WordPress because it's mostly functions, so it's kind of easy to understand. You take
one function, the function has a name, so that's what I'm going to collect, and from there I know which other PHP functions are being called. Okay, no namespaces, so that makes the world very simple by itself. Most of the PHP functions have been removed, so we won't have
anything that's alone like that. Okay, does everyone understand what we're working on? Right? Okay, so if you understand what we're working on, then you're going to understand this, because what I've shown you before, this concept of linking a function call to its function definition, will end you up
with that. So it's very easy: you can see there are nodes, and there are links, but there are quite a lot of them. So that's actually the usual experience I have. Okay, I think about turning a network into something that I
can understand, so I collect that, and the collection is usually kind of easy, because here what do I need? I need to tokenize PHP code, so there is the tokenizer extension for that. I collect the strings, I select the ones that are important, and I put that in DOT format, and that's it. And then I try to understand what's there, and I don't know,
right? I don't even know where to start, okay? It may be arbitrary, but when you look at a SQL table, you start at the top: ID 0, ID 1, something like that. Here you don't have that. Where is it? Where's the number 0? I don't know. So we need a traversal language, and that's actually the definition of it: a traversal language is something that helps you
navigate through the maze of the graph and actually provides you with interesting data. That's what we want to end up with, okay? And the one I discovered when I started looking for something that was not SQL to support my static auditing was Gremlin. It was, I don't know, like four or five years ago, something like that.
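The collection step described in the talk (function names and their calls, written out in DOT format) can be illustrated with a minimal Python sketch. It covers only the last step, writing DOT; the tokenizing itself is done in the talk with PHP's tokenizer extension and is not reproduced here. The function names and the `to_dot` helper are made up for this illustration.

```python
# Illustrative (caller, callee) pairs; the function names are made up for this sketch.
CALLS = [
    ("render_page", "load_menu"),
    ("load_menu", "load_menu"),
    ("render_page", "escape_html"),
]


def to_dot(calls):
    """Render (caller, callee) pairs as a Graphviz DOT directed graph."""
    lines = ["digraph calls {"]
    for caller, callee in sorted(set(calls)):
        # One directed edge per distinct call relation.
        lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(CALLS))
```

The output can be fed to Graphviz to draw the kind of cloud of dots and links the talk shows, which is exactly the picture that is hard to read without a traversal language.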
It was in version 2, and we're going to work only on version 3 today, which has been the current version for a year or so. So what is important? First, it's a domain-specific language, so it's going to work only for graphs, okay? You can't apply that to something else. We will actually see how the concept
can actually ooze into PHP. That will be the conclusion. It's a programming language. Again, I'm going to compare that with SQL. It is not something where you have templates, and you just fill in a few keywords, and you get some data. No, no. We're going to work. We can have variables. We can maybe not declare classes; that's going to be overkill, I guess. But we can do lots of
things. We can have loops. We can have variables, incrementation, filtering, you know, things like that. So that's the important part. It's open source. The code is online. It's actually in the Apache Incubator at the moment, so you can go and download it. It's also vendor agnostic. We do not depend on one database, okay? It's more standard in that respect than SQL. You can apply that to
many different databases. There's a list of them at the end, so when it's important to know that, you'll have the list. So the letters here are G, V and E. The first thing that is actually kind of stressful when you start is that the vocabulary is not the same, okay? So when
we talk with Gremlin, we have G for the graph, E for edges, and V for vertices. Simple enough. Vertex, if there's one, okay, seems like some kind of Roman language, and edges. You may also hear people talk about nodes, links, and the dataset, or also
objects and relations, and the dataset again, okay? So suddenly it's two or three different words for the same reality. Okay, be ready for that. It's not the only situation where you end up with different words for the same concept, so be ready. We'll try to stick with this one, although I've been learning that on my own, so I usually, you know, mix them. Please
switch them from one to the other. First actor we have: well, we use the G here. That will be a recurring gag. G for graph, maybe for Gremlin. I don't exactly know what it represents, okay? But it's the default one we have, so we'll always start the query with that, and the first ones are the vertices,
which are the nodes, the objects themselves that are in the database. The big dots, red dots, no, black dots you've seen in the earlier cloud. That's it. We have that. What can we say from that? First, you get the full list of them by accessing this method, and if you just want one of them, every
single node has an ID. So just like in databases, all of them get one, okay? The edges will get another one, but you start with that. If you ask for it, well, you get it, and you don't get anything else back, okay? What you see here is really a method call, okay? I have a first object, which is kind of special. We're not
going to go into it for the moment. We'll just work with the rest. The rest are methods, and we're going to call methods. Each method is going to be chained to the preceding one, so a method will probably return a vertex object by itself. No details yet. Of course we're going to be able to go into the details, but that's what you want at the beginning, the first one. If you want to go inside
an object, well, most of the graphs are schema-less, okay? So you can put anything you want inside an object in terms of properties. So if you ask for the object, of course you get the object itself. If you want to go inside the properties, they're called values, and you can have the usual suspects in
terms of value types. So strings, booleans, integers, reals, arrays, which is nice. So we can store full arrays in them. Arrays of arrays, again, that can be interesting to have, but you can have them, and you don't care if it's defined somewhere or not, okay? You can always create one, destroy one. If it
does not exist, it will return null. And any time you end up with a null, that's probably going to stop the whole query anyway. So for example, if V(1) does not exist, then it's a null. The rest will not be done, and we end up with a null. Easy enough
to follow? Nothing surprising, right? Okay, so let's move on. If you want to do some discovery, and again when you're a beginner, that's a really useful tool: valueMap, okay? You've seen that values will give you the values themselves, as long as you know the name. If you don't know, and you want to discover what's inside the graph, then valueMap is going to return you
everything in JSON format, isn't it? Yeah, no, I don't know. Anyway, in an array here, it's interesting. Here I move to another value. You can get the whole list or even just one of them like that. So in terms of discovery, when you don't know
what's in there, you can start with a valueMap. That's important. Our second concept here: the edges, okay? So we have the objects, we now have to link them. That will be the work of the edges. They are stored conveniently in the big E method. You can access any edge by its ID, because edges also have an ID, and the two sets of IDs are separate.
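The vertex steps just described (the full list, one vertex by ID, `values`, `valueMap`) can be modelled in a few lines of plain Python. This is a sketch of the idea, not Gremlin itself: the vertex data, the property names and the helper functions are hypothetical, and a missing vertex is modelled as an empty result rather than a null. The Gremlin expression each helper imitates is given in its docstring.

```python
# Hypothetical vertex store: id -> properties. Schema-less, so any keys are allowed.
VERTICES = {
    1: {"name": "alpha", "deprecated": False},
    2: {"name": "beta", "tags": ["core", "db"]},
}


def V(*ids):
    """Like g.V() or g.V(id): all vertex ids, or only the requested ones that exist."""
    if not ids:
        return list(VERTICES)
    return [i for i in ids if i in VERTICES]


def values(ids, key):
    """Like .values('key'): the property values, skipping vertices without that key."""
    return [VERTICES[i][key] for i in ids if key in VERTICES[i]]


def value_map(ids):
    """Like .valueMap(): every property of each vertex, useful for discovery."""
    return [dict(VERTICES[i]) for i in ids]


print(values(V(1), "name"))   # property lookup when the name is known
print(value_map(V(2)))        # discovery when it is not
print(values(V(99), "name"))  # unknown vertex: nothing flows through the rest of the chain
```

The last call shows the short-circuit the talk mentions: once a step yields nothing, every later step has nothing to work on.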
Actually, we won't care about that later, okay? But there are two separate sets. The only thing that's special about edges is that, well, they have three things. They can have properties just like we've seen for the objects, okay? Same stuff, properties, the same values, so when you're
crawling the graph, you can always, you know, check something along the edge. They have an ID, they have a start, and they have an end, which are integers, which are the IDs of the nodes we've seen previously, okay? And they also have a label, so that's why I put calls here. That's the one we're going to use.
They always have a label, okay? Nodes can have a label, but it's not compulsory. They can have the same label for everyone, that's fine. If they have different labels, then they will probably be, you know, indexed differently, but that depends on the underlying server. So up to now, it's okay. Nodes, we are okay. Edge labels, on the other hand, are always, always needed.
Shall we put them together? Yeah, yeah, of course, we need that. At some point, we'll have to do that. So edge discovery works exactly the same. On top of that, there is the ID: so values for everything, except for IDs and labels, which are used by the language itself, okay?
They get a special method of their own. Otherwise, when you get an E like that, it's an edge. If you get a V, it's a vertex. If you get a null, it's nothing. And otherwise, you get an error. So, shall we put that together? Yeah.
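As a rough illustration of what an edge carries (its own ID, a mandatory label, a start vertex, an end vertex and optional properties), here is a small Python model. The edge IDs, the `CALLS` label spelling and the `line` property are assumptions made for this sketch; the docstrings name the Gremlin steps being imitated.

```python
# Hypothetical edge store. Edge ids form their own set, separate from vertex ids.
EDGES = {
    10: {"label": "CALLS", "outV": 4, "inV": 1, "props": {"line": 120}},
    11: {"label": "CALLS", "outV": 1, "inV": 2, "props": {}},
    12: {"label": "CALLS", "outV": 1, "inV": 3, "props": {}},
}


def E(*ids):
    """Like g.E() or g.E(id): all edge ids, or only the requested ones that exist."""
    if not ids:
        return list(EDGES)
    return [i for i in ids if i in EDGES]


def label(eids):
    """Like .label(): labels have a dedicated step, they are not read with values()."""
    return [EDGES[e]["label"] for e in eids]


def out_v(eids):
    """Like .outV(): the vertex each edge starts from."""
    return [EDGES[e]["outV"] for e in eids]


def in_v(eids):
    """Like .inV(): the vertex each edge points to."""
    return [EDGES[e]["inV"] for e in eids]


def values(eids, key):
    """Like .values('key') on edges: ordinary properties, same as for vertices."""
    return [EDGES[e]["props"][key] for e in eids if key in EDGES[e]["props"]]


print(label(E(10)), out_v(E(10)), in_v(E(10)), values(E(10), "line"))
```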
Now we have the edges, now we have the objects. The meat of Gremlin is, of course, navigating through the network itself. So here is a very complex network, okay? Suddenly, we have like four objects and three links. I used different IDs here, so we can tell the difference. I don't want to say,
oh, this is edge one and it is linking vertex one. They're all different here, but if you do that on your own, they will probably be one, two, three, four, and one, two, three, okay? That's normal. So, for an edge, we said we know the outgoing link, so we can ask for outV, meaning I
want the outgoing side, and I need the vertex, and we end up on two. So, we say E(5); E(5) is this link. The outgoing one is there, okay? The outgoing one will be the start, and the incoming one will be on the other side, okay? This is
called a directed graph. By default, all Gremlin graphs are directed, meaning that there is always a starting point and an ending point, okay? In general, we can have directed or undirected graphs, so if you don't care, well, you just use, you know, out just as you like. We'll see another way to do
the movement in a moment, and there is a way to move from either side. So, the graph is good, but it's just information in it. So, on top of that, there is following links from a node. So, we've seen how to reach the
ends of, well, the limits of an edge. If you are working from a node, then you will start with a V, and instead of outV and inV, you'll use out and in directly. out and in will just say, okay, if you follow the links that are going from this node, tell me the list. You can see here, from V(1), the
incoming nodes to 1 are only 4, so I just get one result. Is it easy? Yeah! I'd like to see nodding heads. On the other hand, you can have several, right? So, if you want the out ones from V, coming from V and going
outside, now we have 2 and 3. And if you want all of them, you use both. Fair enough? And then you can get the list. If you really want to go inside the link and make some checks, then you can also mention the E. Remember, when we are inside an E and we want to go to a V, we have outV and
inV. Now, if we are on a V and we want to reach an E, then you change the letter: outE and inE. Okay? Now, this was probably an epiphany when I was starting to work on that: the chaining. Remember, I told you that, well, G is the start. That brings us a node.
That brings us another node. So, actually, we could start chaining them. Okay? So, let's say we start from V(2), and we want to see what is the ID of something that is in and in again. Okay? The first one, g.V, I start with the ID.
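The navigation and chaining just described can be sketched in plain Python on the four-vertex example. The edge list below is reconstructed from what the talk says about vertex 1 (only 4 comes in, 2 and 3 go out); the helper names are hypothetical, and each one imitates the Gremlin step named in its docstring.

```python
# The small example graph as (start, end) pairs: 4 -> 1, 1 -> 2, 1 -> 3.
EDGES = [(4, 1), (1, 2), (1, 3)]


def out(ids):
    """Like .out(): follow edges from each vertex to the vertices they point to."""
    return [end for v in ids for (start, end) in EDGES if start == v]


def in_(ids):
    """Like .in(): follow edges backwards, to the vertices pointing at each vertex."""
    return [start for v in ids for (start, end) in EDGES if end == v]


def both(ids):
    """Like .both(): neighbours in either direction."""
    return out(ids) + in_(ids)


print(in_([1]))        # g.V(1).in()
print(out([1]))        # g.V(1).out()
print(both([1]))       # g.V(1).both()
print(in_(in_([2])))   # g.V(2).in().in().id(): each step feeds the next one
```

Every helper takes a list of vertices and returns a list of vertices, which is what makes chaining possible. A step that returns a plain value, such as an ID, ends the chain because there is nothing left to navigate from.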
Then I go one way, and I go another one. And I end up with the number 4. And I can chain. And I can go on: out, V, both, explode, and start counting. Okay? I end up with a number. Okay? So, as I mentioned, this is the graph, this is a
number, this is a vertex, another, another. I end up with a number, because here ID will say, okay, once I have found a node at that point of the traversal, I stop, I look into the node, and I get a number. I cannot go on with another in at that point, because it's a
number. I am stuck. I'm in a property. There's nothing more I can do. Okay? We'll see later how we can extract something along the way. Yes. Here, oh, it's just an example. We're going to move to something that is a little more, I
would say, business-like. Okay? We're going to move inside the database, the WordPress database. That's going to be more interesting. There you can start thinking. That's a bigger example. Yeah, it's going to be different. So, remember, you want to have a schema. Okay, it's schema-less, because we don't have any
properties. If you want to remember the WordPress call graph, that's what it's going to look like. And actually, on top of that, I should remove one of the functions, right? Okay, because actually, we have only the function type and this link. Very simple. So, what do we end up with? Okay, I think I got
4,400 function definitions and I got something like 55,000 calls on the current version. Does that make sense? Yeah, for the current
version. Does it make sense, like 4,400 functions in WordPress? Yeah, it's a pretty large piece of software, right? Including all the functions, I mean, all the inclusions they have, all the external libraries. Kind of makes sense. Roughly, each function is, on average, used 11 times. Okay, so what can,
can we find something interesting in that? So, first, if we just convert, no, not one call. You don't use all of them all the time, right? Hopefully. Well, come to me, I'm going to reconfigure it. So, the first, yeah,
this. This, if we start going, yeah. For example, yeah, it could be, it could
be that the function is still here and WordPress itself does not use it, but it's still there because they have previous versions for which they want to provide backward compatibility, and that's it. How? Well, it's way too early for me to answer that, but let me think. When I reach the point where you can just answer that, I'll ask you for the query. Okay? Just a question. Yeah. So, in WordPress, because you actually subscribe to hooks and
pass the functions as strings. Yeah. So do you take those into account as well? No, no. For this example, no. No, okay. Because they're meant to those. Yes, I have to admit, here I needed a base with enough data, so it could be fun and interesting.
At the same time, I didn't check all the PHP special cases. So, yes, you're right, and that was actually in my first slide. There is one where they say, okay, apply_filters, and there is the name of a function. This is not taken into account here, but we're still going to use the graph to navigate.
So, the simple version: if we zoom in and just take a few elements, then we have one function, which has incoming calls. So, those are the functions that may call this function, and this function in its turn may call something else. Okay? So, if we want to just, you know, have a look
at that on this exact example. Again, I start with an ID, I'm still stuck with that, but that will go away. I go out, and I get the name, so I get those function names, and I can get the incoming functions the other way. I'm not using both, because this is a directed graph, so I know
which way they are calling. One function is calling the other, or the function is being called. I can find all the functions that are calling wp_set_password, for example, by using the other one. Okay? Just an application of what we've seen. Okay? So, now a question. Would you imagine that WordPress, okay, yeah,
let's move. I don't know what you're talking about. Now we're there, you know what? Please, take over, that's good. Can we detect the
number of WordPress functions that are actually recursive? What is a recursive function in our graph? In and out are the same. So, that's interesting, we can start there: get intermediate, it calls get parents, good, but this is not the same, so that's not a recursive
function. Same stuff the other way. What do we need here? We need to find a function that calls itself. Okay? At that point, you know how to move along the graph, you're able to go very far, but you don't know how to stop. So, we need a few more functions. Here are two of them. What I have on top, well, g.V is okay, out is okay, values is okay; what you
need are two of them, which are as and retain. So, the one we actually need directly is retain. Retain means that I will only let things through if it's the value that is provided here. Okay? So, this is a filter that says, if you
give me a list of elements, I will only let those pass. And here, I need to tell it which elements are allowed to pass. So, I first name my element. So, I start with something, for example, I start with this one. So, okay, that's myself. Now, I go out there, and I say, oh, is it myself? No, it's not myself,
because I named this one, and I can compare, and that's not the same. So, I stop. Okay? So, I start again. I go there, start at 30, okay, this is this one. I go out here, it's not myself. Here, it's myself. So, I can go on, and I end up here. Then, I get the name, and I display the name. Does that answer the question? Okay. I don't
remember exactly how many I found, five or six. Does it make sense to have recursive functions in WordPress? Why not? Of course. First, why not? It's just a little surprising. Get parent, get_category_parents.
This is for, hmm, menus? Yeah, menus. Typical usage of things that call each other. Well, the submenus will call the parents, and the function is the same. Yeah? Come on, you've talked together, right?
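The "name myself, go out, keep only what is myself" idea for finding recursive functions can be modelled in Python as below. The call graph and its function names are invented for the illustration. The comments map each line to the steps named in the talk (`as`, `out`, `retain`); the exact step names differ between Gremlin versions, so treat that mapping as an assumption rather than a query to copy.

```python
# Hypothetical call graph: function name -> names of the functions it calls.
CALLS = {
    "build_menu": ["escape_html", "build_menu"],
    "escape_html": [],
    "render_page": ["build_menu"],
}


def recursive_functions(calls):
    """Return the functions that call themselves directly."""
    found = []
    for myself, callees in calls.items():  # name the starting vertex: as('myself')
        for callee in callees:             # follow one outgoing call: out()
            if callee == myself:           # the retain filter: only 'myself' passes
                found.append(myself)
                break                      # report each function once
    return sorted(found)


print(recursive_functions(CALLS))
```

Only `build_menu` survives the filter here: `render_page` reaches `build_menu`, which is not itself, so that path is dropped.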
Wait for a slide. Because, in between, I have to show you this. The one we have done here, we said, okay, we start with this, we follow the call, and we end up there, but it's a directed graph, and we can move on the graph the
way we want. Okay? If we can do that, then we can actually rewrite the same query, nearly the same way, actually, but with the in instead of the out. Here it is the same, okay? Here, I name the origin, or I name the following one; that's the same. Now you have this question. Could
you find what I call ping-pong functions? Okay? It's like, hey, oh, I have that. Hey, I send you that. Oh, that's interesting, but hey, that's for you. Okay? Ping-pong functions. How can we do that? We need an extra word. We need an extra filter that will actually filter out things that are ourself, okay? From
what we've seen already, we know everything, except, except, oh, that's a bad transition. Thank you. The two of you are following. So, we still have retain. So what we want to do: we start somewhere, we call, and we don't want a recursive function, okay? A ping-pong function is something, or two functions,
that are sending each other probably the same value. Hopefully not the same, but they are calling each other. Hopefully they will stop at some point, right? But we want the first one not to be itself, and on the way back, we want it to be the same, okay? So, we start with 47, we name it, so we know what to compare with, go out, not
me, out again, me, and then I know my answer. And this will give me directly at least two answers. If there are none of them, that would be zero; otherwise I will get two of them. Well, actually not when starting with
that. Okay, anyway, if we don't start with 47, then we get two of them, because if this calls that, then the converse is true too. So, third guy, next question. So, ping-pong functions, I actually found some. If we go on, we
can do that with three, right? Yeah, yeah, that's what's easy to guess. I can do that with four, and I can do that a long time before you get fed up, right? So, what do we need there? We, we just, we just fire. Yeah, it's fine, it's fine. I got two of
them like that, so. So, we can actually chain again, right? The first one, we don't want it. The second one, we don't want it. The third one, maybe we could actually, you know, name, name the intermediate one, so we don't end up
with recursive functions. That could be an extra thing, but that will probably go out of the, of the slide. So, we go on, we go on. That would be nice to have a loop. So, yes, there are loops. There are loops in, in Gremlin, which are conveniently called repeat. They work with a subquery. So, we are going to put
between the parentheses something that we want to repeat. We mention times, so we know how many times we want to repeat that, here three times. We can also have an until, until some condition is met. This is nice. So, this is, this first rule that, that really shrinks the, the size of the query. The other thing
we have here is the emit. Emit itself will just emit anything that passes through. So, as long as I can run into that, the repeat will repeat itself, okay? When it's done, the emit will say, okay, this value has been found once, I give it to you, okay? If you put in the emit
another filter, except myself, well, it's already there, but we could put except myself, and not emit the thing, okay? So, we can do something in the loop. So, the body of the loop may be complex. And when we end up there at the end of the loop, we can have another condition for the emitting. Like, we just want the end of the
loop, or we want to emit anything we want, we find in between. That's interesting. And you can also put emit here, or before, then it will start immediately emitting. So, it will emit the first one, like a do while. And otherwise, it will just repeat it once. No, that's the contrary. It will
be a while, right? And otherwise, you can put it behind, and it will just repeat that after at least running it once. That's convenient, right? Can you do that in SQL? No. Yeah, no, no, no, no. So, what have you, what have we seen up to now? Well, we've seen the node and the, the vertices.
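The repeat, times and emit behaviour described above can be sketched in plain Python. This is not Gremlin and not the slide code, which the transcript does not include: the call graph, the function names and the helper repeat_out are all hypothetical, and the Gremlin lines in the comments only show roughly which TinkerPop 3 query each call is meant to mimic.

```python
# Toy call graph (hypothetical data): function name -> functions it calls.
CALLS = {
    "a": ["b"],
    "b": ["c", "a"],
    "c": ["d"],
    "d": [],
}


def repeat_out(start, times, emit=None):
    """Follow outgoing 'calls' edges `times` times, starting from `start`.

    emit=None     -> only what is left after the last hop,
                     roughly g.V(start).repeat(out()).times(n)
    emit="after"  -> everything seen after each hop,
                     roughly g.V(start).repeat(out()).emit().times(n)
    emit="before" -> same, plus the starting point itself,
                     roughly g.V(start).emit().repeat(out()).times(n)
    """
    current = [start]
    emitted = []
    if emit == "before":
        emitted.extend(current)  # the start is emitted before the first hop
    for _ in range(times):
        # One pass through the loop body: a single out() hop.
        current = [callee for name in current for callee in CALLS[name]]
        if emit is not None:
            emitted.extend(current)
    return emitted if emit is not None else current


only_end = repeat_out("a", 2)
every_hop = repeat_out("a", 2, emit="after")
with_start = repeat_out("a", 2, emit="before")
```

The order of results in a real Gremlin server may differ from this breadth-first sketch; only the set of emitted elements is meant to match.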
We've seen in and out, which is the, the, the backbone of your navigation. You're going to use that a lot. We've seen a few filters, except, retain, in, without the label, that's going to come, and the first loops. So, now we're going to move to something more interesting. I'm still waiting for your question, right? But
basically, traversing the graph is architected around that. We're going to say, okay, I start with a node, I follow that, that, that, that, and when I end up there, the whole story, if it's matching, then that will be my query, and I want to extract some data from that. That's exactly the chain I want to find, okay? I want to
find one function that calls another one, that calls itself back, okay? That's exactly what we're doing. We just follow the path, and then, at some point, make filters. If anything ends up at the end of the query, that's our result. So, filtering, we've seen a few of them. Another one that is really convenient, I think,
and I really like that, is that you can actually add an extra label on the edge you're moving on, okay? The example is not built for that because I only have one edge, which is call, okay? Imagine that you have classes instead of functions, okay? There will be
classes instantiating, there will be inheritance, there will be extends and implements, for example. So, different relations between all those tools, okay? And we can put all of them into those, and say, okay, I want to follow and find an interface. I'm just going to follow implements, not extends, except if I start with a node
which is itself an interface, because an interface has to extend another interface, while classes will implement an interface. Two different behaviors, okay? That's a good one. If you have more than one, then you can put them just in a row, that will work also, just the same. I'm going to use that a lot, so
that's kind of neat. Hello, internet. So, if you want to filter on vertices, there is the has function. The first one says, does this vertex have a name? Whatever it is, is
it defined? Is it not null? If it's not null, then it will pass, okay? If it's null, then it will not pass, so it's not defined. Otherwise, you can put one value. You can ask for it not to be the value, and those, neq, you're going to see a few of them, are what they call predicates. They look like
functions, and you can apply them. So, here, not equal. There is another one which is equal. There is smaller, larger, things like that, all those operators. Strangely enough, there's supposed to be a regex one, if you want that, but I've never seen it on Neo4j, for example. It depends on the server,
on the underlying server. So, there's a few of them, but not all of them. Within or without: you have a long list of elements, and you can say, okay, if the name is in this list, then it works. But that will be filtering on nodes.
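To make the vertex filter and its predicates concrete, here is a small Python sketch of the same idea. It is only a model: the vertex data is invented, and the helpers has, eq, neq, within and without are hypothetical Python functions named after the Gremlin steps and predicates they imitate.

```python
# Hypothetical vertices; vertex 3 deliberately has no "name" property.
VERTICES = [
    {"id": 1, "name": "wp_die"},
    {"id": 2, "name": "esc_html"},
    {"id": 3},
    {"id": 4, "name": "main"},
]


# Predicates "look like functions": each one returns a test to apply to a value.
def eq(expected):
    return lambda value: value == expected


def neq(expected):
    return lambda value: value != expected


def within(*allowed):
    return lambda value: value in allowed


def without(*excluded):
    return lambda value: value not in excluded


def has(vertices, key, predicate=None):
    """Keep vertices where `key` is defined and, if given, the predicate passes."""
    kept = []
    for vertex in vertices:
        if key not in vertex:
            continue  # property not defined: the vertex does not pass
        if predicate is None or predicate(vertex[key]):
            kept.append(vertex)
    return kept


named = has(VERTICES, "name")                                # like has('name')
not_die = has(VERTICES, "name", neq("wp_die"))               # like has('name', neq('wp_die'))
in_list = has(VERTICES, "name", within("wp_die", "main"))    # like has('name', within(...))
```

Which predicates are actually available depends on the server behind Gremlin, as the talk points out, so this sketch says nothing about what a given database supports.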
Let's say we want to look for dying functions, functions that may end up calling wp_die, okay? What do we get here? So, notice that we don't know which are the functions, so we start with V. I don't put any ID anymore. Okay, the ID was initially good, because I wanted to start
from a point of view that you understand. Now we just say, okay, in the whole graph, any of the vertices, you follow one edge, and it must end up on the function that is called wp_die. And then you show me the name. What do we get? Yeah, wp_die, of course.
wp_die. What is important here? Basically, the query looks like that. I start with something, anything, everything in the data set. I follow something, and I end up with wp_die. And then there, I say, okay, I want the name. The name that I really want is the one at the beginning,
not at the end, right? So, the order of processing is important, okay? Since we're chaining, things are flowing through the pipeline, and each step queries them. Yeah, we have to be careful. So, how do we do that? Yeah, we run the opposite way. So, we start with a function that is called wp_die,
then we go in, and then we find the name of the function. And there we have a list of interesting things, okay? So, the graph is going one way, but we don't care. We have all the means we need to do it the way we want. Okay, so, let's use it this way, okay?
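The difference between the two directions can be sketched in a few lines of Python. The edge list and function names are invented, wp_die is the assumed spelling of the function named in the talk, and the Gremlin in the comments is only an approximation of the two queries being compared.

```python
# Hypothetical "calls" edges: (caller, callee).
EDGES = [
    ("login", "wp_die"),
    ("save_post", "wp_die"),
    ("login", "esc_html"),
]
ALL_FUNCTIONS = ["login", "save_post", "wp_die", "esc_html"]


def out(names):
    """Follow 'calls' edges forward: from callers to callees."""
    return [dst for name in names for (src, dst) in EDGES if src == name]


def in_(names):
    """Follow 'calls' edges backward: from callees to callers."""
    return [src for name in names for (src, dst) in EDGES if dst == name]


# Roughly g.V().out('calls').has('name', 'wp_die').values('name'):
# the cursor sits on wp_die at the end, so that is the only name we get.
forward = [name for name in out(ALL_FUNCTIONS) if name == "wp_die"]

# Roughly g.V().has('name', 'wp_die').in('calls').values('name'):
# starting from wp_die and walking backward leaves the cursor on the callers.
backward = in_(["wp_die"])
```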
Actually, here, for this example, it's kind of simple. You could actually end up with that situation. Initially, I used that, and I didn't really care about the order of the query, okay? So, here, we have the two of them, and I introduce you to your good friend count,
which makes sandwiches, as you all know, right? So, it's counting the number of elements, and I start with V, go out, call, end up with wp_die, count: 84. If I do the other one the other way, I also get 84, because count just ends up counting the number of things that go through it. So, we don't care if it's several times the same, okay? We, it was laughable for us to end up
with all the names, because they were all the same. But there are actually different results. We found another way to end up on wp_die, and that's how we need to count. So, at that point, if you want to make the difference, it is by adding dedup. Now, dedup, can you guess?
Deduplicate, yeah. I have no idea why the hell they decided to have only this one as an abbreviated name. Usually, they have long, you know, human-readable names, okay, and the first time I got that, like, dedup, it's like, what is this, okay, yeah, deduplication.
So, we have, I think there's another unique, okay, I'll answer that. Okay, I think we'll see that later. There is a little problem for that. Maybe the deduplicate will be the same origin, I don't know, but this is the only one that has an abbreviation.
So, here, I say, okay, get a function, you follow, you end up on wp_die, you dedup. So, you only allow one instance of an object that has reached this point, and then you count, and of course, we end up with one, okay? So, no variety. On the other hand, if we do the same the other way, with several of them, then you have 84.
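A small Python sketch shows why count gives the same number in both directions while dedup changes the answer in only one of them. The data is hypothetical (three callers instead of the 84 in the talk), and wp_die is the assumed spelling of the function named by the speaker.

```python
# Hypothetical "calls" edges: (caller, callee).
EDGES = [
    ("login", "wp_die"),
    ("save_post", "wp_die"),
    ("install", "wp_die"),
    ("login", "esc_html"),
]
FUNCTIONS = sorted({name for edge in EDGES for name in edge})

# Forward, roughly g.V().out('calls').has('name', 'wp_die'):
# every caller contributes one traverser, and all of them sit on wp_die.
reached = [
    dst
    for start in FUNCTIONS
    for (src, dst) in EDGES
    if src == start and dst == "wp_die"
]
forward_count = len(reached)                        # ...count()
forward_dedup_count = len(dict.fromkeys(reached))   # ...dedup().count()

# Backward, roughly g.V().has('name', 'wp_die').in('calls'):
# the traversers sit on the distinct callers, so dedup changes nothing.
callers = [src for (src, dst) in EDGES if dst == "wp_die"]
backward_count = len(callers)
backward_dedup_count = len(dict.fromkeys(callers))
```

In the forward version the whole graph is scanned before the duplicates are thrown away, which is the extra work the talk warns about.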
I wonder if the usage of dedup as a really ugly name is not made for one good reason. Here, it's a beautiful name. Is that your name, maybe? No, okay, that happens. Here, what happens is that we scan everything, we follow, we end up on wp_die. We actually end up with 84 elements,
and we just say, okay, I just want one. So, we do a lot of work. We unearth, we find everything, and then we end up on that. On this one, it's a lot more streamlined, because we just find the one we want, follow once, and the first hit gives us directly all the results we want. We don't need the dedup. It's completely useless. And we just manipulated 84 elements.
Here, if you have a million lines to analyze, that's going to be a lot longer. So, anytime you start using dedup, what? You have to be faster. So, on top of that, we've seen counting. So, here is the count. There are a bunch of interesting functions. Limit, okay, like in SQL.
Range is the SQL limit, but with the beginning and the end. Okay, so, you can get the numbers. Tail, for just the end, so that's interesting. There is coin, also, which is nice. Just going to make a sample of that. Just completely arbitrary and random, okay? The number you have here is a percentage. So, we get, basically, a percent of the element,
which means that you can make, you want to test the query, you don't want to run it on the whole database. Directly, coin, it will just make a number of them, like, once in a while, just going to give you data. So, dying functions, back to the start. What do we need? Just rewrote it, to introduce you
with my good friend, select. We've seen, again, the has, remember? Okay, so, I still use the out, because it's interesting. What happens is, I give a name to the element that interests me. I go out, do some checks. When I'm done, wherever I end up in the graph, then I say, okay, select start.
If you reach that point, if from there you can go out and find a wp_die, then when you're done, stop, come back, and get the thing called start, by name. So, the keyword here is select. Select, at any point in the query, will move the cursor, if we call it that,
move it again to a place that has been named, a place you've already been. So, you can go, find an interesting node, go one way, come back, go another way, come back, go a third way, and then end up there. And at the same time, collect lots of interesting information, okay? By here is not actually a function. It depends on the previous method.
Select, and there are a few others, which I may not mention by heart now, but select will say, first name will be displayed by name. So, I mention a property, display that property. I can actually say, start, start, mention two of them, and have two different properties selected from there. Or, I could select several values,
give them different names, and each time select, okay, the first one by name, the second one by ID, the third one by, I don't know, extension, whatever. Whatever that means. So, select and as actually work a lot better than what we've seen. Okay, but yeah, that's exactly what I'm explaining.
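The naming and selecting mechanism can be pictured as each traverser carrying a small dictionary of labels. The Python below is a sketch of that picture only: the graph is invented, callers_of is a hypothetical helper, and the Gremlin in the docstring is an approximation of the query on the slide, which is not part of the transcript.

```python
# Hypothetical graph: vertex id -> properties, plus "calls" edges between ids.
VERTICES = {
    1: {"name": "login"},
    2: {"name": "wp_die"},
    3: {"name": "esc_html"},
    4: {"name": "save_post"},
}
EDGES = [(1, 2), (1, 3), (4, 2)]


def callers_of(target_name, by="name"):
    """Roughly g.V().as('start').out('calls').has('name', target).select('start').by(by)."""
    results = []
    for vertex_id in VERTICES:
        labels = {"start": vertex_id}             # as('start'): remember where we were
        for src, dst in EDGES:                    # out('calls'): move the cursor forward
            if src != vertex_id:
                continue
            if VERTICES[dst]["name"] != target_name:
                continue                          # has('name', target): filter on the far end
            named = labels["start"]               # select('start'): jump back to the named place
            results.append(named if by == "id" else VERTICES[named][by])
    return results


caller_names = callers_of("wp_die")
caller_ids = callers_of("wp_die", by="id")
```

The by argument here plays the role described in the talk: it is not a step of its own, it only tells select which property to display.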
So, let's imagine, oh, 10 minutes, what do I need? Okay, let's finish with this one. The sub-queries. Okay, here, it's getting a little strange. I'm trying to look for functions that are calling both escape HTML and wp_die.
So, something that is probably preparing some error messages, and then will actually push that to wp_die, something like that. So, from what we've seen, okay, we start with the name, we go on. The first one has no constraint, but that would be our result. So, we name it, okay? We don't care about what's the result,
that's on our way. We go out, check the other function call, and then we select the result. So, we know exactly the one we get from the middle. The order of processing is from here, because we start initially with wp_die, and when we end up, we end up with escape HTML. Right? So, we do not start from where we want,
and we do not end up again from where we want. We just have to use select to move everything back to the right place. Unless, of course, you use high-level Gremlin. There is another function called where, which allows you to do sub-queries. So, you moved on the main graph,
and at that point, you don't want to divert and have Gremlin go somewhere. We just want to do a little check on the side. Okay? That's exactly where. Here where says, stop, and go out, find a call, see if there is a wp_die. If you find something, anything, a number, a node, an edge,
where will be happy, it will be considered true. If it doesn't find anything and ends up with a null, then it will be false and will stop. So, basically, I can now start from anywhere. Here, do a side trip, go to wp_die. I found it, so I go back, and of course, we can do the same for the other one. Okay? And that will now look like that.
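The side-check idea can be sketched in Python as a filter that runs a small sub-traversal and only looks at whether it produced anything. The call graph is hypothetical, esc_html and wp_die are assumed spellings of the two functions named in the talk, and where and calls_named are illustrative helpers, not a real API.

```python
# Hypothetical call graph: function name -> functions it calls.
CALLS = {
    "show_error": ["esc_html", "wp_die"],
    "render": ["esc_html"],
    "abort": ["wp_die"],
    "esc_html": [],
    "wp_die": [],
}


def calls_named(target):
    """Side traversal, roughly out('calls').has('name', target)."""
    return lambda name: (callee for callee in CALLS[name] if callee == target)


def where(names, side_check):
    """Keep a name if the side traversal yields anything; the cursor never moves."""
    return [name for name in names if any(True for _ in side_check(name))]


# Roughly g.V().where(out('calls').has('name', 'wp_die'))
#              .where(out('calls').has('name', 'esc_html')).values('name')
dying = where(list(CALLS), calls_named("wp_die"))
both = where(dying, calls_named("esc_html"))
```

Because the main cursor stays on the starting function, no select is needed to come back to it afterwards.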
Oh, and that's a mistake. That should be an in, right? That should be an in. Why the hell do I have to add that? And that would be answering the question about dedup, possibly. For out, actually, that's another typo. I just start with out, and that goes, and that goes, while on the other hand, when I start here,
I have to start with underscore, underscore. Underscore, underscore represents the current node. Okay? Which could make sense here. I could use underscore, underscore, but here, as syntactic sugar, I don't need it. Here, and that's really puzzling when you see that initially, it is because, well, that should be in here. In is actually a keyword for Groovy.
Who the hell is Groovy? Yeah, I mean, did I mention that up to now? Yeah, you have to show me another five minutes, right? Yeah. Actually, this engine is written in Groovy, so most of the code we have here, and we have a programming language, will be based on Groovy, and Groovy has a keyword
which is in, so we cannot start directly with in. It doesn't have a keyword called out, so starting with out is okay, but starting with in is not possible, so we have to use underscore, underscore. This is where we start seeing the limit between the language, the Gremlin language, and the underlying one. There will be, I think they just published
a C implementation of Gremlin, so possibly in this C implementation, in will not be a keyword, and that will flow. So, suddenly the standard is not exactly the same for everyone. Let's finish. If you want, oh yeah, our good friend count,
so if you want to use a where, and end up with something that is a negative assertion, then use count, and you make it equal to zero. And that's probably another typo. So, let's finish with that. That's the full idea when you're writing a query with this language.
You think about the way you want to traverse, then you think about the different conditions you have. It may be on the node, it may be on the side, it may be a little back trip, a side trip, and in the end, you mark in red the one you want reported, and that's how you end up with a Gremlin query.
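The negative assertion mentioned just before this summary, a side check followed by a count that must equal zero, can be sketched the same way in Python. The data and the helper side_count are hypothetical, and the Gremlin in the comment is an approximation in TinkerPop 3 syntax rather than the query from the slide.

```python
# Hypothetical call graph: function name -> functions it calls.
CALLS = {
    "show_error": ["esc_html", "wp_die"],
    "render": ["esc_html"],
    "abort": ["wp_die"],
    "esc_html": [],
    "wp_die": [],
}


def side_count(name, target):
    """Side traversal followed by count(): how many calls from `name` reach `target`."""
    return sum(1 for callee in CALLS[name] if callee == target)


# Roughly g.V().where(out('calls').has('name', 'esc_html'))
#              .where(out('calls').has('name', 'wp_die').count().is(0))
# A count always produces a value, so the check must compare it to zero
# instead of relying on "found something" meaning true.
escapes_but_never_dies = [
    name
    for name in CALLS
    if side_count(name, "esc_html") > 0 and side_count(name, "wp_die") == 0
]
```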
Yeah, so to finish, I'm sorry, I'm going to jump that. There's filter, which we see, we start using actually groovy code, okay? In the closures, and there's a very interesting group count, group by, just the same as in SQL, but there's a group count.
What I really enjoy with that is that you can have several group counts in the same query. Because actually, the group count is just a counter. It just, okay, I get a node, I get a specification in it, I know what I want. I want to count the names of the function, so I collect that, and I put that in an array. But I can do something with that, right?
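Since a group count is just a counter fed on the side, several of them can sit at different points of one traversal. The Python below sketches that with two counters in a single pass. The graph is invented (two distinct functions share the name render), and the two Counter objects stand in for the side-effect maps a Gremlin groupCount would fill.

```python
from collections import Counter

# Hypothetical graph: vertex id -> function name, and vertex id -> called vertex ids.
FUNCTIONS = {1: "render", 2: "render", 3: "login", 4: "wp_die"}
CALLS = {1: [4], 2: [], 3: [4, 1], 4: []}

name_counts = Counter()    # first counter: how many functions carry each name
called_counts = Counter()  # second counter: how many times each name is called

for vertex_id, name in FUNCTIONS.items():
    name_counts[name] += 1                        # counted before the hop
    for callee_id in CALLS[vertex_id]:            # the hop itself, like out('calls')
        called_counts[FUNCTIONS[callee_id]] += 1  # counted after the hop
```

One pass over the data yields both tallies, which is the point made in the talk about getting two different counts from the same query.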
So I just show that to the next one. And the next one is another group count, which is now counting the number of times this function is being called. And I end up with one result, and two different group count at the same time. That's interesting. So, PHP and Gremlin, okay? Well, first of all, you can see that PHP
is a lot better at logos than them. PHP for Gremlin, there is PommeVerte, do not mistake that with pom. It's completely different, right? Yeah, yeah, yeah, great, great movie. So, this one is PommeVerte. Gremlin PHP, it's on Composer, so, okay, composer install, you have it.
There's a lot of old stuff for Gremlin 2. Make sure you start working with Gremlin 3. Gremlin 2 is completely dead, but there's lots of servers running with it, so make sure that the one you choose is the one running. For example, OrientDB, which I will mention in a moment, is still running Gremlin 2. It is not what I've shown here,
and they are going to move to Gremlin 3. It's on the way, but it's not done, okay? So, just make sure when you make your choice that Gremlin 3 is the one that is being supported. Otherwise, for Neo4j, there's a plugin which uses the REST API. What does that look like? Require once, use connection with the host
and the graph being used, open, send, get the results, close. Someone hasn't understood? Okay, so that's very simple. From the PHP point of view, it's very simple. I'd say that the meat is running the queries, building them, okay, maybe fetching a number
of pieces of information from your own PHP before putting that in the query, with the usual injection problems, unless, well, no one knows how to use Gremlin, so maybe we're safe for a moment, but that will happen, okay? So, here, the version is three. There is Gremlin. Initially, I think it was,
there is one of those projects. It was initially one, two, three, four, five projects. Five different projects that were processing different parts of the graph, and they merged, from version zero to version three. You skipped the 10 minutes, thank you. So, each time, each iteration,
one of the projects was merged in, and finally, it's only Gremlin, so just look for Gremlin, or you can call that, it's currently called TinkerPop, okay? It's in the incubator at Apache. You will find the information there. Now, as I say, Gremlin runs on different databases. I don't know if MySQL or MariaDB have done it.
The guy who wrote Gremlin wrote Titan, so the integration is pretty tight. That's a good one. The stable one I use for Gremlin is Neo4j, but Neo4j is running the show with Cypher. So, initially, when I came, Gremlin and Neo4j were very good friends, and it was easy to set up. I have to admit that I came to them with that.
Nowadays, more and more, they're separated, so it's getting more and more difficult to use Neo4j from Gremlin and vice versa. But it still works, so it's okay. Stardog, I don't know. TinkerPop has a server by themselves, and Rexster is the old server from Gremlin 2, okay?
One thing you have to understand, there is a server. The server itself may be the one that installs the underlying servers, okay? So, there's a Gremlin server, and you will tell it, okay, install Titan DB, and you will be using Titan DB.
At the beginning, it's a little confusing. I was expecting Gremlin to be on top of that, but it's the contrary. There is a Gremlin server which turns the query into something that the underlying server understands. That's the way it works, okay? So, basically, you have to download the server, the console, which you will be using to input your data, and the server itself will run.
There is an easy install from them, all in command line, and you have to use this command line, okay? And there it is. Well, thank you for staying with me. Yeah, let's move.
I'm staying, well, I have to go, right? Yeah, three minutes for questions. But I'm staying here, so if you want some more answers, I'll be able to answer. Yeah?
Okay, I'm going to say no. I'm going to say no. The reason is, Gremlin is the top-level language, and it's actually relying on the server to do the actual performance thing. So let me give you an example. Anytime, do I have a query somewhere?
That's not in the end, that's not in this one. Okay, anytime I start with that, for example, based on Neo4j, for example, just behind, instead of in and out, I make a label check. I check if my node is of a certain type. I have different types of that. The thing is, Gremlin does not do any optimization for that.
So it just says, okay, hasLabel, good, and it passes that to Neo4j. Neo4j, on the other hand, the first time it sees the hasLabel, it says, oh, do I have an index for that? Yes, then I check it. I have no way to create the index using Gremlin, because Gremlin does not understand the index concept.
I actually have to rely on Cypher, so when I set up the database, I call Cypher to build the index, and then it will be used. But from Gremlin, though, there is no tool that will tell you, okay, you should do it this way, or you can do it another way. The other trick I use is, as usual with databases, the smaller the amount of data I manipulate,
the better it is, okay? So after the g.V, for example, or the first check, I put a counter, okay? I mentioned that you can include closures, okay? Groovy, so I use a little side effect. I didn't show that, but there are side effects. So I run a little counter, and when I'm done at the end,
I end up with another counter, saying, okay, here is the number of elements I processed, here is the number of elements that I have in the end. When I have too many of them, then I usually go back and say, okay, is there a way that I can make that faster? But at that point, no, this is the problem of the separation of the standard and the implementation. Okay, sorry for the long answer.
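The counter trick from this answer, one counter right after the start and one at the end, can be sketched in Python with generators. Everything here is hypothetical: the element list, the labels and the tick helper are illustrations of the idea, not the Groovy closures the speaker uses.

```python
# Two side-effect counters: elements entering the query and elements surviving it.
counters = {"scanned": 0, "kept": 0}


def tick(key, items):
    """Pass items through unchanged while counting them, like a side-effect step."""
    for item in items:
        counters[key] += 1
        yield item


# Hypothetical graph elements with a label, as in the label check described above.
ELEMENTS = [
    {"label": "Function", "name": "login"},
    {"label": "Class", "name": "Post"},
    {"label": "Function", "name": "wp_die"},
    {"label": "Interface", "name": "Renderable"},
    {"label": "Function", "name": "esc_html"},
]

scanned = tick("scanned", ELEMENTS)                                    # counter after the start
functions = (item for item in scanned if item["label"] == "Function")  # the label check
result = list(tick("kept", functions))                                 # counter at the end
```

A large gap between the two counters suggests the query touches far more data than it returns, which is the signal the speaker uses to go back and look for a faster formulation.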