
Formal Metadata

Title
Graffiti
Subtitle
An embedded graph database
Alternative Title
Graffiti: A historical, distributed graph engine
Number of Parts
490
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Graffiti is the graph engine of Skydive, an open source network analysis tool. Graffiti was created from scratch to provide the features required by Skydive: it is distributed and replicated, it stores the whole history of the graph, it allows subscribing to events on the graph over WebSocket, and it supports visualization. Skydive (https://skydive.network) is an open source analysis tool. It collects information about an infrastructure topology, such as network interfaces, Linux bridges, network namespaces, containers and virtual machines, and stores it in a graph database called Graffiti (https://github.com/skydive-project/skydive/tree/master/graffiti). The graph is:
- distributed: some agents only have a portion of the graph
- replicated: for high availability and load distribution
- historical: every change to the graph is archived, allowing retrieval of the graph at any point in time, or of all the revisions of a set of nodes and edges during a period of time
A custom implementation of the Gremlin language is used to query the graph, with some additional steps, for instance to specify the time context of the query. In addition to the core engine, a WebSocket-based user interface built on D3.js is available to visualize and interact with the graph. This presentation will showcase a demo of Graffiti and try to advocate its use in your own project.
Transcript: English (auto-generated)
So, hello. I'm the first Sylvain, Sylvain Bobo, and we'll talk about Graffiti. Graffiti was created because we were first working on Skydive, which is a networking tool.
One of its features is that it retrieves the topology of your whole network infrastructure and lets you visualize it. The data model that we use for Skydive is a graph.
So we extracted this part and created a brand new project from it, which is Graffiti. The engine is embedded inside Skydive. It's a graph engine which is highly event-based: everything in Skydive goes through events.
It has interesting features. One of them is that it allows you to time travel: you can query the graph as it was at a certain point in time.
It's highly available. We'll see this later, but one of the components is the hub. You can have multiple hubs, which gives you high availability for your graph. It also provides load balancing, because, as we'll see, you can subscribe to multiple hubs
to balance the load. So why did we create our own engine?
First, we were using existing graph engines, but we had the constraint of embedding. That was our first constraint, but we also wanted to be able to easily extend the query language we were using.
In fact, in Skydive, that is in Graffiti, we implemented our own Gremlin parser and executor, because we wanted to be able to add custom steps very easily, even steps that are not really graph related.
So that's the architecture of Graffiti. The first component is the pod. The pod is a small agent. It has just a local graph, only part of the whole graph.
That's where you create the nodes. In the case of Skydive, the pods run on the machines of the infrastructure. The graph is populated on the pod, and then the pod forwards its graph to another component, which is the hub.
The hub has the whole graph. It's kept in memory, but it can also be persisted: you can have a database behind it. As you can see, the hubs are replicated. They are connected to each other, and the pods are connected to multiple hubs, for failover or load balancing.
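The pod/hub split can be pictured with a toy model. This is a minimal Go sketch, assuming a hub that records which pod owns each node; the `Hub` type and its `Sync` method are hypothetical names, not Graffiti's API. A pod (re)sending its local graph replaces whatever it sent before, which is also what a resync after a reconnection amounts to.

```go
package main

import "fmt"

// Hub keeps the whole graph as the union of the partial graphs sent by pods,
// remembering which pod owns each node. Illustrative types, not Graffiti's.
type Hub struct {
	owner map[string]string // node ID -> pod name
}

func NewHub() *Hub { return &Hub{owner: map[string]string{}} }

// Sync replaces everything previously owned by pod with its current nodes.
// Deleting map entries while ranging over the map is safe in Go.
func (h *Hub) Sync(pod string, nodeIDs []string) {
	for id, o := range h.owner {
		if o == pod {
			delete(h.owner, id)
		}
	}
	for _, id := range nodeIDs {
		h.owner[id] = pod
	}
}

// Len returns the number of nodes in the merged graph.
func (h *Hub) Len() int { return len(h.owner) }

func main() {
	hub := NewHub()
	hub.Sync("pod-1", []string{"host1", "host1/eth0"})
	hub.Sync("pod-2", []string{"host2", "host2/eth0", "host2/br0"})
	hub.Sync("pod-1", []string{"host1"}) // pod-1 resyncs; host1/eth0 is gone
	fmt.Println(hub.Len())               // 4
}
```

Keying ownership by pod keeps a resync idempotent: sending the same local graph twice leaves the hub unchanged.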
Regarding the event mechanism, the graph works as a pub/sub.
Internally, when it's used in embedded mode, you register callbacks on the graph, and your code is triggered whenever an event happens on the graph. You can also subscribe to the graph externally through a WebSocket.
That's how the web UI works. You can also publish to the graph, either to a pod or to a hub. It doesn't make any difference: it's the same API.
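To make the embedded pub/sub model concrete, here is a minimal Go sketch of a graph that invokes registered callbacks on every change. `Graph`, `AddListener`, `AddNode` and `DelNode` are hypothetical names chosen for illustration; Graffiti's own listener interface may look different.

```go
package main

import "fmt"

// EventType enumerates the kinds of graph events mentioned in the talk.
type EventType int

const (
	NodeAdded EventType = iota
	NodeUpdated
	NodeDeleted
	EdgeAdded
	EdgeDeleted
)

// Event carries the type and the ID of the affected element.
type Event struct {
	Type EventType
	ID   string
}

// Listener is a hypothetical callback signature, not Graffiti's actual API.
type Listener func(Event)

// Graph is a toy in-memory graph that notifies listeners on every change.
type Graph struct {
	nodes     map[string]map[string]string
	listeners []Listener
}

func NewGraph() *Graph {
	return &Graph{nodes: make(map[string]map[string]string)}
}

// AddListener registers a callback invoked synchronously for every event.
func (g *Graph) AddListener(l Listener) {
	g.listeners = append(g.listeners, l)
}

func (g *Graph) notify(e Event) {
	for _, l := range g.listeners {
		l(e)
	}
}

// AddNode stores a node with its metadata and emits NodeAdded.
func (g *Graph) AddNode(id string, metadata map[string]string) {
	g.nodes[id] = metadata
	g.notify(Event{Type: NodeAdded, ID: id})
}

// DelNode removes an existing node and emits NodeDeleted.
func (g *Graph) DelNode(id string) {
	if _, ok := g.nodes[id]; !ok {
		return
	}
	delete(g.nodes, id)
	g.notify(Event{Type: NodeDeleted, ID: id})
}

func main() {
	g := NewGraph()
	var seen []Event
	g.AddListener(func(e Event) { seen = append(seen, e) })
	g.AddNode("eth0", map[string]string{"Type": "device"})
	g.DelNode("eth0")
	fmt.Println(len(seen)) // 2
}
```

Callbacks here run synchronously in the caller's goroutine; a real engine might instead queue events or deliver them over a WebSocket to external subscribers.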
You can also subscribe to just a portion of the graph. If you are not interested in all the events of the graph, only in a few nodes or a subset, you can subscribe to that. The events that you get are pretty straightforward:
node creation, edge creation, updates and deletes. The messages can be encoded in different formats. You can use JSON, obviously for the web UI, but you can also use Protobuf for performance reasons.
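Subscribing to only part of the graph boils down to filtering events before delivering them. The sketch below assumes the filter is a plain Go predicate over node metadata; the talk does not say how Graffiti expresses subgraph filters, so `Broker`, `Subscribe` and `Filter` are illustrative only.

```go
package main

import "fmt"

// Event is a hypothetical graph event carrying the element's metadata.
type Event struct {
	Kind     string // e.g. "NodeAdded", "EdgeDeleted"
	Metadata map[string]string
}

// Filter selects the subset of the graph a subscriber is interested in.
type Filter func(map[string]string) bool

// Subscriber collects the events that passed its filter.
type Subscriber struct {
	filter Filter
	out    []Event
}

// Broker fans events out to subscribers, applying each one's filter.
type Broker struct {
	subs []*Subscriber
}

// Subscribe registers a subscriber; a nil filter receives every event.
func (b *Broker) Subscribe(f Filter) *Subscriber {
	s := &Subscriber{filter: f}
	b.subs = append(b.subs, s)
	return s
}

// Publish delivers e to every subscriber whose filter matches.
func (b *Broker) Publish(e Event) {
	for _, s := range b.subs {
		if s.filter == nil || s.filter(e.Metadata) {
			s.out = append(s.out, e)
		}
	}
}

func main() {
	b := &Broker{}
	all := b.Subscribe(nil)
	ns := b.Subscribe(func(m map[string]string) bool { return m["Type"] == "netns" })
	b.Publish(Event{Kind: "NodeAdded", Metadata: map[string]string{"Type": "netns"}})
	b.Publish(Event{Kind: "NodeAdded", Metadata: map[string]string{"Type": "veth"}})
	fmt.Println(len(all.out), len(ns.out)) // 2 1
}
```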
Protobuf is what the pods and the hubs use to talk to each other. Regarding history, we keep every modification of the graph: each one generates a new revision of the elements.
That allows us to do two things. First, to say: show me the graph as it was at a specific point in time. But also to see all the modifications that happened to a node or to a subset of the graph.
So you can ask: give me what happened to this node, and you will get all the events, all the modifications and revisions of the node. To use this time context, we introduced a new Gremlin step,
which we called At. Here you can specify your time: for instance, this Gremlin expression says "the graph as it was one minute ago". Or you can specify a date. And you can also give a time plus a period, to get all the revisions within that period.
As a backend, we support Elasticsearch. We also support OrientDB, but I would not recommend using it,
because all our efforts go into the Elasticsearch backend. In the case of Elasticsearch, to keep the indices from getting too big,
we also have a rolling index mechanism. You can say, for instance, that you don't want your index to exceed a specific size or a specific number of nodes. To achieve high availability, we have replication mechanisms between the hubs.
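A rolling index can be sketched as a writer that switches to a fresh index whenever the current one would exceed a document-count or size limit. The Go code below only models the naming and rollover decision: the `graph_archive` prefix and the limits are made-up values, and nothing here talks to Elasticsearch. Elasticsearch also has its own rollover API with conditions such as `max_docs`; whether Graffiti relies on it is not stated in the talk.

```go
package main

import "fmt"

// RollingIndex tracks the index currently receiving writes and rolls over
// to a new one when a limit would be exceeded. Limits are hypothetical.
type RollingIndex struct {
	Prefix   string
	MaxDocs  int
	MaxBytes int

	generation int
	docs       int
	bytes      int
}

// Current returns the name of the index that receives new writes.
func (r *RollingIndex) Current() string {
	return fmt.Sprintf("%s-%06d", r.Prefix, r.generation)
}

// Index records a document of the given size and returns the index it was
// written to, rolling over first if the document would exceed a limit.
// An empty index always accepts its first document.
func (r *RollingIndex) Index(size int) string {
	if r.docs > 0 && (r.docs+1 > r.MaxDocs || r.bytes+size > r.MaxBytes) {
		r.generation++
		r.docs, r.bytes = 0, 0
	}
	r.docs++
	r.bytes += size
	return r.Current()
}

func main() {
	idx := &RollingIndex{Prefix: "graph_archive", MaxDocs: 2, MaxBytes: 1 << 20}
	for i := 0; i < 5; i++ {
		fmt.Println(idx.Index(100))
	}
}
```

Because every hub could otherwise decide to roll over independently, the talk notes that this step runs on a single elected hub.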
For load balancing, the pods connect to the hubs in a round-robin fashion. We handle reconnection: if network connectivity goes down,
the pods automatically reconnect and resync their graph with the hubs. The index rolling, however, has to be done by a single hub, so we have a master election, and for this we use etcd.
To write Gremlin extensions, you have to write them in Go, because Graffiti is written entirely in Go. Your new step will then automatically be available through the REST API.
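The idea of plugging custom steps into the query language can be illustrated with a small step registry. This Go sketch is not Graffiti's extension interface, which the talk does not detail, and it does not parse Gremlin text: it assumes a traversal has already been parsed into a list of step calls, and shows a custom, non-graph-specific step registered next to a basic `Has` step. `Register`, `Run`, `Call` and `NamePrefix` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a graph node with string metadata.
type Node struct {
	ID       string
	Metadata map[string]string
}

// Step transforms the set of nodes flowing through a traversal.
type Step func(in []Node, args []string) []Node

// steps is a hypothetical registry mapping step names to implementations.
var steps = map[string]Step{
	// Has(key, value) keeps nodes whose metadata key equals value.
	"Has": func(in []Node, args []string) []Node {
		var out []Node
		for _, n := range in {
			if len(args) == 2 && n.Metadata[args[0]] == args[1] {
				out = append(out, n)
			}
		}
		return out
	},
}

// Register adds a custom step, the way an extension would.
func Register(name string, s Step) { steps[name] = s }

// Call is one step invocation in an already parsed traversal.
type Call struct {
	Name string
	Args []string
}

// Run applies the calls in order and fails on unknown steps.
func Run(nodes []Node, calls []Call) ([]Node, error) {
	for _, c := range calls {
		s, ok := steps[c.Name]
		if !ok {
			return nil, fmt.Errorf("unknown step %q", c.Name)
		}
		nodes = s(nodes, c.Args)
	}
	return nodes, nil
}

func main() {
	// A custom step: keep nodes whose Name starts with a prefix.
	Register("NamePrefix", func(in []Node, args []string) []Node {
		var out []Node
		for _, n := range in {
			if len(args) == 1 && strings.HasPrefix(n.Metadata["Name"], args[0]) {
				out = append(out, n)
			}
		}
		return out
	})
	nodes := []Node{
		{ID: "1", Metadata: map[string]string{"Type": "veth", "Name": "veth0"}},
		{ID: "2", Metadata: map[string]string{"Type": "veth", "Name": "tap1"}},
		{ID: "3", Metadata: map[string]string{"Type": "netns", "Name": "ns1"}},
	}
	res, err := Run(nodes, []Call{
		{Name: "Has", Args: []string{"Type", "veth"}},
		{Name: "NamePrefix", Args: []string{"veth"}},
	})
	fmt.Println(len(res), err)
}
```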
We did not implement the whole Gremlin specification, only the steps we were really using. So we have all the basic Gremlin steps, plus the ones we added specifically for time selection.
We also added steps for networking purposes. In the case of Skydive, the main user of Graffiti, we added the flow steps, which allow you to retrieve the network flows of your infrastructure.
You can get some metrics, you can get sockets; I won't explain these because they are network-specific. But the point is that they provide a graph transformation: if you use the socket steps, you get a new graph.
And you can subscribe to this new graph and get the same features as with the regular graph. And now, sorry, the demo.
OK, this is the first demo, just to introduce what it is.
This is the web UI that we use for Skydive, the project Graffiti came out of. It was built for Skydive, but you can definitely use it with Graffiti alone. So that's what you can get.
You can explore the graph. It looks like a tree because of Skydive, but it's a regular graph: we can see that a node can have multiple parents. You can walk through the graph and you will get metadata. Nodes can have many metadata fields, and there is a way to render them properly according to your needs.
You can search and select the columns that you want to see. You describe your metadata, and how it is rendered, in the config file. Since this is an embedded project, you embed Graffiti in your own project, you provide metadata, and the web UI renders it properly.
Thanks to the history, for example for the metrics (here for networking purposes), we can plot the data by requesting a period of time, something like this.
Thanks to the web UI, we can tag links and nodes, and it will then generate different views. So yes, that's another view. It's a Kubernetes infrastructure, so you get the physical infrastructure and then the logical one.
Basically we have layers of graphs, and you can select the types, the link types, you want to see. Now we are going to leverage the filtering and subscription mechanism.
When we click on this, the web UI reconnects to Graffiti, subscribing to another subgraph and getting another view, here just the namespaces. You can also do a quick search: the web UI keeps a kind of index, and it opens the matching node.
So, to summarize, when to use Graffiti and when not to use it. If you want to write a Go application with an embedded graph engine, you can definitely use it.
It can be schema-less, or you can have a schema, which is sometimes useful. If, like we did for Skydive, you want a common query language such as Gremlin but need to extend it for specific purposes, you can definitely do this.
If you want a distributed architecture, it's good for that too. We also support a hierarchy of graphs: if you want a subset of the infrastructure to have its own graph and then propagate that graph along a chain, that's possible. And you can attach ACLs, which is pretty useful because you can authorize some personas to do something or to subscribe to something.
When not to use it: it's definitely not good for graph-specific algorithms or high-performance computing. That's not the purpose of the project; it's an embedded project.
And if your nodes or edges carry a lot of metadata, or binary data attached to the metadata, it's not a good fit either. Again, it's an embedded project. The web UI is event-based, as I will show you just after. You can search, as I explained, filter, everything, and you can put everything in a config file so you can really customize it.
That's one view you can get. It's a network infrastructure, and you can see there are plenty of nodes. And I have another demo, which does something completely outside the scope of Skydive, so nothing to do with networking. It's a real demo.
It's a Python demo watching a directory, creating nodes for files and folders, and creating edges for any kind of link between the entities. Basically, the code goes like this: we watch the folder thanks to inotify, and we create a node when we see a file
or a folder; we link the new file to the root node, and if it's a folder, we do the same thing again. So it's fairly simple. This code is in Python, meaning that Graffiti does have
a Python binding: it's not mandatory to interact with Graffiti in Go. I'm going to show you, it's going to be quick.
First we start the watcher, which watches a folder, then I run the script, and we wait a bit. We have a host, and if we expand it, we have our first folder, which is watched. Then we start demo.sh, which is going to create a few files.
Just to show you that it's event-based: we will see the web UI reacting to the events. It creates a few folders, and then finally we will see a symlink between two files.
Just here, that's a symlink, and there are new link types that you can select if you want. Then the filtering again: OK, I want to see only the folders, and that's what I get. And I think that's it, so if there are any questions...
Are there any questions? Yeah. I have a question: how do you handle deletions, or can you handle deletions at all, given the possibility of going back in time?
Sorry, a deletion? Yeah, that's an event, it's part of the events we store. Just repeat the question for... Oh yes: how do we handle the deletion of information, nodes or edges, for example? Basically, we store this information in the data store. It's one of the events we store: we store creations, deletions and updates.
Maybe you want to... You mean how we represent it in the database, the way we mark a node as deleted? Yes: every node basically has a lifetime, with created and deleted times, and we just select the nodes that were alive at the time we query.
And the store grows over time, because at some point you have so much data? Yeah, that's why we have the rolling index. That's one of the reasons. And a revision is kept for every modification.
So that's part of the mechanism. OK. If you use it as a... yes, is there any transaction support?
If you use it internally, I mean if you embed the project within your own project, yes, we do have a mechanism for that. But if you interact with Graffiti from outside the Go scope, like in Python, there is nothing. If you embed it as part of a Go project, yes, there is a kind of mechanism for that.
Sorry, can you repeat? Yeah. But you have to implement it yourself.