
Graph Databases, a little connected tour


Formal Metadata

Title
Graph Databases, a little connected tour
Title of Series
Part Number
59
Number of Parts
119
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Place
Berlin

Content Metadata

Subject Area
Genre
Abstract
Francisco Fernández Castaño - Graph Databases, a little connected tour. There are many kinds of NoSQL databases, such as document databases, key-value stores, column databases and graph databases. In some scenarios it is more convenient to store our data as a graph, because we want to extract and study information about its connections. In these scenarios graph databases are ideal: they are designed and implemented to deal with connected information in an efficient way. In this talk I'll explain why NoSQL is necessary in some contexts as an alternative to traditional relational databases; how graph databases allow developers to model their domains in a natural way, without translating these domain models into a relational model with artificial data such as foreign keys; and why a graph database is more efficient than a relational one, or even a document database, in a highly connected environment. Then I'll explain specific characteristics of Neo4j, as well as how to use Cypher, the Neo4j query language, through Python.
Transcript: English (auto-generated)
Our next talk is about graph databases.
Please welcome Francisco Fernández Castaño. Hi, my name is Francisco Fernández. I'm from Madrid in Spain. I work as a software engineer at biicode, and I also run the C/C++ user group there in Madrid,
and also the Neo4j user group. And today I'm gonna talk about graph databases, a little connected tour. So let's start at the beginning. There's a lot of people talking about NoSQL, big data, why relational databases don't scale,
but these kinds of databases, the graph databases, are based on graph theory. And graph theory is quite an old topic. Let me introduce you to this guy. Probably you will know it, him, sorry. He's Euler, he was a mathematician from the 18th century.
And he's the one to blame for graph theory. He developed a lot of mathematical stuff, including graph theory. And he had a lot of time to think and to ask himself questions. And he used to live in Prussia, in Königsberg, I think that I pronounce it well.
And he asked himself, okay, the old town of Königsberg has seven bridges. Can you take a walk through the town, visiting each part of the town and crossing each bridge only once? Does somebody know the answer? Well, the answer is no.
But this is not the interesting part of this question. With this problem, he started developing graph theory. And thanks to his work, we have these kinds of algorithms, these graph databases and everything. And he ended up defining a graph in this form.
It's a very concise form. A graph is just an ordered pair of a set of vertices and a set of edges that connect those vertices. I have to read it. It's scary, but we are used to dealing with graphs every day. Even my mom is used to dealing with graphs.
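The definition above and the Königsberg question can be sketched in a few lines of Python. This is only an illustration: the vertex names are made-up labels for the four land masses, and the check uses the degree argument usually attributed to Euler (a connected graph has a walk that uses every edge exactly once only if zero or two vertices have odd degree).

```python
from collections import Counter

# A graph as a pair (V, E): a set of vertices plus the edges joining them.
# Königsberg is a multigraph (two land masses can share several bridges),
# so the edges are kept in a list rather than a set.
# Vertex names are illustrative labels, not taken from the talk.
vertices = {"north_bank", "south_bank", "kneiphof", "east_land"}
bridges = [
    ("north_bank", "kneiphof"), ("north_bank", "kneiphof"),
    ("south_bank", "kneiphof"), ("south_bank", "kneiphof"),
    ("north_bank", "east_land"), ("south_bank", "east_land"),
    ("kneiphof", "east_land"),
]


def degrees(edges):
    """Count how many edge endpoints touch each vertex."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count


def has_euler_walk(edges):
    """Degree test only; it assumes the graph is connected."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(has_euler_walk(bridges))  # False: all four land masses have odd degree
```
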
Here we have an example. Here we have a map of the Manhattan underground. On one side we have the stations, which are our nodes. And the connections between the stations are the relationships, or the edges, of our graph.
So probably most of you have come here to Berlin and you have probably run some graph algorithm to find how to get here to Alexanderplatz. Oh, I am in this place. How can I go to Alexanderplatz? Probably it's not the best, the shortest path, but you have found a solution. Okay, but what is a graph database?
Does somebody know what a graph database is? Any idea? No, okay. It's a very simple concept. It's just a database that uses a graph as its main data structure. Today I wanna talk about Neo4j.
And Neo4j implements a property graph. And what is a property graph? Here we have the definition of a property graph in the form of a graph. So a property graph stores nodes and also relationships. These relationships connect our nodes. And both of them can have properties.
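As a rough illustration of the property graph model just described, the sketch below uses plain Python dataclasses. It is not Neo4j's storage format or API; the class names, the `PLAYED_IN` relationship type and the `since` property are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Properties are plain key-value pairs.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    type: str
    # Relationships can carry key-value properties too.
    properties: dict = field(default_factory=dict)


# Illustrative data: two nodes joined by one typed relationship.
clapton = Node({"name": "Eric Clapton"})
cream = Node({"name": "Cream"})
played_in = Relationship(clapton, cream, "PLAYED_IN", {"since": 1966})

print(played_in.start.properties["name"], played_in.type,
      played_in.end.properties["name"])
```
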
And what are properties? Just key-value pairs. Okay, as I told you, today I'm gonna talk about Neo4j. Neo4j is a graph database. It's written in Java. Sorry, it's not Python. It provides ACID transactions, a REST interface,
and the Cypher language, which is a declarative language to query the database. It's open source and it's a NoSQL database. But probably you are asking yourself, why should I care about graph databases?
I usually work with MongoDB or probably Postgres, MySQL, and everything is okay. Why should I learn a new technology? Well, I think that there is one main reason to care about these technologies. And I think that the traditional way,
I mean the traditional way, when I'm working with relational databases, if we're dealing with highly connected data, is a bit artificial, because relational databases weren't designed to deal with connected data.
So probably we have some problems because we have to deal with so much information. We have to deal with foreign keys. If we are working with a many-to-many relationship, we even have to create a new table to hold this meta information. We have to take care that this information is consistent.
So I think that we are mixing our data with our metadata in the relational case. And if we are working with a document database, we have the same problem. If we want to work with connected data, the scenario is even worse, I think.
We have to run some Hadoop process or whatever to get some information. And we cannot get insight in real time. And we will probably face some scalability problems in highly connected domains.
So probably we will have some performance problems. Some guys, the authors of Neo4j in Action, ran an experiment. They wanted to compare the performance of MySQL and Neo4j in a highly connected environment.
So they ran this experiment. They modeled a domain, a social network with users that follow each other. And I think that they stored a million users and a lot of relationships between them. And they compared. They wanted to know, give me the friends of my friends,
the friends of the friends of my friends, down to a depth of five. Here is the table. We can see that at the first level, the times are similar. But when we go deeper, the times are far apart. MySQL takes a long time to finish.
Why? Why does this happen? Probably we will design our relational database in that shape. We will have our user table and then a many-to-many relationship, that is, the relationship between users, in another table.
So each time that we are looking for the friends of one user, we have to look in this table. It's an index lookup. And it has a complexity of log of n because we are searching an index. While when we are working with graph databases, they are designed to get the neighbours for free.
They are stored in a shape such that we get them with constant complexity. What happens when we go deeper? In our relational environment, we get this complexity because for each level of depth that we have,
we have to look into our table. We have to do an index lookup. So it is multiplied by the depth of our lookup. While when we are working with graph databases, we end up with this complexity because we only have to traverse our graph.
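The contrast described above can be sketched in plain Python. This is a toy model under stated assumptions, not how MySQL or Neo4j are implemented: a sorted list searched with `bisect` stands in for an indexed join table (a logarithmic search per lookup), and a dictionary of neighbour lists stands in for nodes that hold direct references to their neighbours. The `follows` data and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical "follows" data: (follower_id, followed_id) pairs.
follows = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6)]

# Relational flavour: a join table kept sorted and searched like an index.
join_table = sorted(follows)


def friends_via_index(user):
    """Binary search over the sorted join table: O(log n) per lookup."""
    lo = bisect_left(join_table, (user, float("-inf")))
    hi = bisect_right(join_table, (user, float("inf")))
    return [followed for _, followed in join_table[lo:hi]]


# Graph flavour: each node keeps direct references to its neighbours.
adjacency = {}
for follower, followed in follows:
    adjacency.setdefault(follower, []).append(followed)


def friends_via_adjacency(user):
    """No search over all relationships: just follow the stored references."""
    return adjacency.get(user, [])


def friends_at_depth(user, depth, neighbours):
    """Users reached after exactly `depth` hops, using the given lookup."""
    frontier = {user}
    for _ in range(depth):
        frontier = {n for u in frontier for n in neighbours(u)}
    return frontier


print(friends_at_depth(1, 2, friends_via_index))      # {4}
print(friends_at_depth(1, 2, friends_via_adjacency))  # {4}
```

Both lookups return the same answer; the difference is that the first pays a search over the whole relationship table at every hop, while the second only touches the neighbours of the nodes it visits.
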
But another reason to think about using graph databases could be that we can translate our domain model in a natural way. When I face a problem, I usually grab a paper and a pen and I end up with this kind of drawing. I have some entities that are related to each other.
The relationships have some semantics. So this is some kind of UML diagram. And if we are using a graph database, we can translate this to our storage directly. We don't have to take care of normalizing the model
and blah, blah, blah, this kind of thing that we have to do when we are working with relational databases. Probably using a graph database for, I don't know, storing documents is not the best solution, but for other scenarios it could be reasonable, okay? What are the use cases for graph databases?
Okay, for example, we have social networks, the well-known use case. Someone follows someone, this is the model of Twitter, for example. Then we have other use cases. For example, geospatial problems. I want to go from point A to point B.
So this is a classic algorithm that is solved using graphs. Also detecting fraud, authorization, network management, building recommendation systems in real time. And there are a lot of other use cases. Oh, okay.
And now I will start talking about Neo4j. Let me introduce you to Cypher. Cypher is a declarative language. It's ASCII-oriented, so in some way, we translate what we are representing into ASCII drawings.
You will see it better in later slides. And we look for patterns, okay? And Neo4j gives us these layers to access the APIs. And on top of them, there is Cypher. Then we can access other APIs, like the traversal API.
We have to write in some JVM language to access these APIs. We can use Jython if we want to. And okay, what is the simplest thing that we can represent using Cypher? This thing. A node that is related to another one. A is related to B.
On the top, we see a drawing. And below, we have the Cypher representation. The translation is very straightforward, as you can see. Okay? Then we can represent other things. For example, here, I'm saying that Eric Clapton played in Cream. We have one node that is Eric Clapton, and we have Cream, which is a band.
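The slides are not part of this transcript, so the following is only a guess at what the ASCII-style patterns look like. The two small Python helpers are hypothetical and simply build Cypher pattern strings: parentheses draw a node and an arrow draws a relationship. The relationship name `PLAYED_IN` is an assumption.

```python
def node(name):
    """Render a node pattern: parentheses look like a circle."""
    return f"({name})"


def rel(rel_type=None):
    """Render a directed relationship, optionally with a type."""
    return f"-[:{rel_type}]->" if rel_type else "-->"


# "A is related to B"
print(node("a") + rel() + node("b"))                           # (a)-->(b)
# "Eric Clapton played in Cream"
print(node("clapton") + rel("PLAYED_IN") + node("cream"))      # (clapton)-[:PLAYED_IN]->(cream)
```
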
And we have a relationship with some semantics. So we are relating the two entities using a graph. Then we have our social network example. We have some users. In Neo4j, we can label our nodes
because probably we want to categorize our nodes. So here I'm saying, okay, I have some users, and they are related. They follow each other. Then I can also add properties to my nodes and to my relationships. Here I'm representing that Eric Clapton
has some properties. In that case, a name that is Eric Clapton, and the relationship also has a property, that is, the date when he started to play in that band. Here I'm trying to represent musicians that play in bands,
and the styles that these bands are labeled with. And what is the simplest thing that I can query with Cypher? This thing. I am asking Neo4j, give me all the nodes that are related by this relationship,
a relationship that is labeled "played in". So it will give me all the nodes that are related by this relationship, and it returns all the nodes. I can look for other things. Here I'm asking Neo4j, okay, give me all the nodes that are related by "played in",
and that on the other side are related by "labeled". So basically it will return all the nodes where a musician plays in a band, and the style of this band. And it returns some properties.
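To make the two-step pattern above concrete without a running Neo4j server, the sketch below imitates it in memory. The Cypher in the comment is a plausible form of the query, not the one from the slide; the relationship names `PLAYED_IN` and `LABELED` and the sample data are assumptions for illustration.

```python
# Cypher that this sketch imitates (relationship and property names assumed):
#   MATCH (m)-[:PLAYED_IN]->(b)-[:LABELED]->(s)
#   RETURN m.name, b.name, s.name

# Illustrative (start, relationship type, end) triples.
relationships = [
    ("Eric Clapton", "PLAYED_IN", "Cream"),
    ("Eric Clapton", "PLAYED_IN", "The Yardbirds"),
    ("Jimmy Page", "PLAYED_IN", "Led Zeppelin"),
    ("Cream", "LABELED", "Blues"),
    ("The Yardbirds", "LABELED", "Blues"),
    ("Led Zeppelin", "LABELED", "Hard Rock"),
]


def match_two_hops(rels, first_type, second_type):
    """Return (start, middle, end) for each start-[first]->middle-[second]->end."""
    return [
        (a, b, c)
        for a, t1, b in rels if t1 == first_type
        for b2, t2, c in rels if t2 == second_type and b2 == b
    ]


rows = match_two_hops(relationships, "PLAYED_IN", "LABELED")
for musician, band, style in rows:
    print(musician, band, style)
```

The nested loop here scans every relationship for each match, which is exactly what a graph database avoids; the point of the sketch is only the shape of the pattern being matched.
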
But we can look for some particular nodes. Here I'm asking Neo4j, look up in your index a node that has a property name with the value Clapton. So we will have a starting point.
We have the node with this value that represents Eric Clapton, and I want to know all the bands that Eric Clapton played in, and the style of each band. This is the goal of this query. And I return some properties of these nodes. In that case, I get the name of Eric Clapton,
the name of the band, and the style of this band. Okay? Then I can look for more patterns here. Here I'm saying to Neo4j, okay, find the Eric Clapton node again,
and give me all the bands that have the style blues. I'm looking for two nodes in that case. I'm asking Neo4j to look up the node with the property name Clapton, and also the node with the property blues. I look for the bands that have these properties,
that have this relationship. And I return the results ordered by some field. So, okay. We can also have optionality in our relationships here.
We have evolved our model, and we still have the relationship between a musician and a band, and musicians can also produce bands. So here we are looking for all the bands
that Clapton played in or produced, and we are filtering by some date. As you can see, to some extent it's similar to SQL. We can also have an optional depth.
Here I'm saying to Neo4j, okay, look for all the nodes that are related in this way, with a maximum depth of five. So it will look for me, and it will give me A1, A2, A3, A4, A5.
All the paths, if there are paths down to a depth of five, it will give me all the nodes, okay? Here we have a more developed example. It's a geospatial problem, and my goal is going from one metro station
in Madrid to another. So I look for a station. I am in Sol, and I want to go to Retiro, okay? So I look for these two nodes. I ask Neo4j to find these two nodes for me,
and then I find all the connections, all the paths that exist between these two stations, okay? So probably I have one, two, three, or four, I don't know, paths that connect Sol with Retiro. And then I reduce,
I add up all the weights between the stations that make up each path, and I get the shortest path. Just notice that Neo4j has all these kinds of graph algorithms implemented. It provides shortest path, Dijkstra, A*,
all of these kinds of graph algorithms are implemented in Neo4j. This was just an example. As I told you, Neo4j gives us a REST API to query, to create nodes, and everything.
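The metro example described above (find every path between two stations, add up the weights along each one, keep the cheapest) can be sketched in plain Python. The network below is invented for illustration: the "Detour" stop, the connections and the weights are not real Madrid metro data, and this brute-force enumeration is not how Neo4j's built-in Dijkstra or A* work.

```python
from functools import reduce

# Illustrative network only: topology and weights are made up.
metro = {
    "Sol": {"Sevilla": 2, "Detour": 1},
    "Sevilla": {"Banco de Espana": 2},
    "Detour": {"Banco de Espana": 6},
    "Banco de Espana": {"Retiro": 2},
    "Retiro": {},
}


def all_paths(graph, start, goal, path=None):
    """Yield every simple path from start to goal (depth-first)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph[start]:
        if nxt not in path:  # never revisit a station on the same path
            yield from all_paths(graph, nxt, goal, path)


def path_weight(graph, path):
    """Add up the edge weights along a path with reduce."""
    hops = zip(path, path[1:])
    return reduce(lambda total, hop: total + graph[hop[0]][hop[1]], hops, 0)


best = min(all_paths(metro, "Sol", "Retiro"), key=lambda p: path_weight(metro, p))
print(best, path_weight(metro, best))
```

Enumerating all paths grows quickly with the size of the network, which is one reason dedicated shortest-path algorithms such as Dijkstra exist.
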
There are some occasions where we need to extend this REST API, so we can extend Neo4j using extensions, managed or unmanaged, so we can write some algorithm using the API, the traversal API, for example, and we can expose this as an endpoint in our API.
This is an example written in Java, sorry. There are drivers for almost every language. As I told you, we access it via the REST API. If you want to use it from Python, I recommend py2neo.
It has a module for Django, I think. And as my conclusion, I want to quote Martin Fowler. Instead of just picking a relational database or probably MongoDB, because on Hacker News
it is the trending thing, we have to think about our data and what we have to do with this data. Probably we have to tend towards polyglot persistence, having two, three, or five databases in our systems to explore this data. If you want to know more about this topic,
I recommend these three books: NoSQL Distilled by Martin Fowler, Neo4j in Action, and Graph Databases. Also, if you want to try it without installing it, I recommend GrapheneDB, which is Neo4j as a service.
There are some free plans to try it. Okay, questions.
So, this is all very new to me and I have only a very vague idea about it, but from what I have seen, my impression is that we basically store records in nodes and we label the edges with the relations.
So, in SQL, when I want to create a new record, I have to put it into a table for which I define the type. So, I define all the attributes in advance and I define what they should look like. Am I required to do it here as well? Do I actually have to define the type of data
which I can store in the node, or can I just do anything with the Cypher statements? You can do anything that you want. There is no predefined schema. Okay, so this reminds me then of the difference between dynamically and statically typed languages.
So, what happens if I write a statement in Cypher that actually doesn't make sense? I would ask for a relation between, or I would create two nodes and connect them with a relation, and then I would create two other nodes that would carry a different type of data
and I would connect them with the same relation. I could create many statements that probably wouldn't make any sense. What happens then? Nothing, no. It allows you to store whatever you want. Okay, so basically the issues or the problems show up at run time
when I run the statement. Probably it will return nothing if you are asking for something that doesn't make sense or something that you didn't store before, but there are no type checks. Are there any advantages that this brings to us?
Like, dynamically typed languages definitely have some advantages out of this. Do we see something similar in the database? Yeah, there are advantages. You can evolve your model as you evolve your program. You are not tied to a schema.
So, for example, if tomorrow, in my example of musicians, I want to add the engineers that engineered the albums of these bands, my old queries will still work and it can evolve without touching anything.
It's more agile, this. It's like the NoSQL philosophy. Yeah, some real-world scenario where this advantage can actually play a role would be interesting to me. Thank you for your answer, thanks. Hi, in the example that you had
where you're searching for two kinds of relationships, whether it was artist and producer or musician and producer or something like that. Yeah, that one. That one. In that query, can the result contain the type of connection? Yes.
So, here you are. In the r variable you are storing the information about the relationship. So you can get that? Okay, thank you. Sorry, this is a silly question. You're adding all your objects and their relations and then you have a database full of stuff.
Are there tools that can sort of introspect that to then just sort of, not UML, but dump out the relationships that you actually have within your database? I can't hear you. Can you repeat the question, please? So, once you have your database full of data, is there something that can output sort of a summary of the relationships that are stored within the database?
Yes, you have a web interface that represents graphically what you have stored. And that's part of Cypher or part of? It's part of Neo4j. Okay. And there are other tools, like Linkurious, I think, that explore this kind of visualization of your data.
Answer this, okay. Thank you for your talk. You said that you get the relationships for free, that there are no indexes. And just about this slide, I wanted to ask how it is implemented that r.date is greater than 1968. So, are there actually some internal indices
for comparison, or is it a linear search? When you are looking up properties, in the background Neo4j uses Lucene. So, when you are looking in that case for the name Clapton, you are using Lucene. So, probably this could be a handicap
of these kinds of databases, because you have to go to the index. Yeah, okay, thank you. You're welcome.
Are there any more questions? No? Okay, thanks a lot for your talk. Thank you.