Graph Databases: Talking about your Data Relationships with Python
Formal Metadata
Title 
Graph Databases: Talking about your Data Relationships with Python

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Release Date 
2017

Language 
English

Content Metadata
Abstract 
Graph Databases: Talking about your Data Relationships with Python [EuroPython 2017 - Talk - 2017-07-14 - PyCharm Room] [Rimini, Italy] Have you ever considered how many relationships you have in your virtual life? Every friend or page liked on Facebook, each connection on LinkedIn or Twitter account followed is a new relationship, not only between two people but also between their data. In Brazil alone, there are 160 million Facebook users. How can we represent and manipulate all these relationships? Graph databases are storage systems that use a graph structure (nodes and edges) to represent and store data in a semantic way. This talk begins by approaching the challenge of representing relationships in relational databases and introduces a friendlier solution using graphs. The definition of a graph database, its pros and cons, and some available tools (Neo4J, OrientDB and TitanDB) will be shown during the presentation, as well as how these tools can be integrated with Python. Outline: Relationships (Relationships in Relational Databases; Graph Definition; Graph approach to represent relationships); Graph Databases (Definition; Advantages); Neo4J (Usage Examples; Integration with Python); Comparison between Graph Databases; Comparison between Neo4J and Relational Database; Application.

00:05
Hello, everyone. Today I'm going to talk to you about graph databases and how to handle the relationships within your data with Python. So, first,
00:23
let me introduce myself. I am a full-stack developer at Labcodes. I work mainly with Django on the back end and AngularJS on the front end, and I am also a student at the Federal University of Pernambuco, where I'm trying to get my master's degree. I'm almost there, and the final part of my master's research is about performing OLAP analysis on graph databases. That's why I'm interested in this topic and why I'm talking to you about it today. I am also a member of the Python Users Group of Pernambuco and of the PyLadies group. "Labcodes, Recife, Pernambuco": I know I said a lot of words that you probably don't understand. I came all the way from Brazil to Rimini to attend EuroPython. It was a really long trip, 19 hours of airplanes and trains, to get here. I live in Recife, which is in the northeast of Brazil, and we have a very active Python community back there. Last month we organized Django Girls, and I was one of the organizers. We also organize meetups for these two user groups. We are very active and really proud of it. I work at Labcodes, as I said before,
02:14
which is a software studio. What is Labcodes? Labcodes is a software studio from Recife to the world. Being a software studio means that we develop solutions based on our clients' desires and needs, whether that is a process, a web app or a product. We can solve a problem, develop a new product or implement a new process in their business. We have 5 years of experience with clients in Brazil and in the US. The technologies we use are mainly Django and JavaScript, with Angular, React and a little bit of Vue.js. We also work with PostgreSQL, Elasticsearch and a lot of others. Labcodes was built with the help of the community, so we are always trying to give something back. We help organize some of the Python user group meetings in our state, and we helped organize Python Brasil in 2014. Since 2012 we have been to every Brazilian Python conference, and we have participated in a lot of Django Girls events as coaches and organizers. We came from the community, from a group of enthusiasts who met at a Brazilian conference, so it's only fair to give back.
04:13
About this talk: I will start by talking about relationships and what I mean by that. Then I will introduce the concept of graph databases. I don't know if anyone has heard of them... has anyone heard of graph databases? Well, that's really good. Then I will talk about Neo4j, the most popular graph database in the industry. There are others too, so we'll compare some of the solutions available today. I will also compare Neo4j with relational databases, because we developers are used to working with relational databases, so I brought some of those concepts here for comparison. Finally, I will talk about some cool applications of graph databases.
05:18
So, let's start with relationships. What is it that I keep talking about?
05:27
Relationships in data are closely related to the relationships that we have in real life. Every time you add a new friend on Facebook, follow someone on Twitter, pin an image on Pinterest or accept a new connection on LinkedIn, you are creating a new relationship. This happens not just in your personal life (now you're friends with someone, or you follow someone), but also in a database, as a relationship between your data and someone else's data. When you add someone as your friend, your profile data is now linked, somehow, to the profile data of the other person. As you can see, there are a lot of relationships on the Internet. In Brazil alone we have 160 million Facebook users, and each one of them adds friends, likes pages and participates in groups. That's a lot of relationships. But how can we represent and manipulate all these relationships in a good way?

Let's take a very common scenario as an example: a social network where a user can be friends with another user or like a page. It works in a similar way to Facebook, just on a small scale so we can understand it. Let's try to represent the data of this scenario using tables, as we would do in relational databases. First, we would have a user table, so we can keep track of our users' data. Each user has a name, gender and age. OK, that's cool. Now we need to store the information of which user is friends with which user, so let's create a table called friends_with. In this table we have two user IDs, and each row represents a friendship between two users. So far so good. Now let's create another table to represent pages, where each page has an ID, a name and also a category. But we also need to store the information of which user likes which page, so let's create a table called likes that connects a user ID to a page ID.

OK, now I have all my data in the tables, my application is running, and I need to know which pages the user named John likes. So let's go back to the user table, where we find out that John has an ID of 1. Now we have to go to another table, the likes table, and get the IDs of the pages that the user with ID 1 likes: the pages with ID 2 and ID 1. Then we have to go back to the pages table and find that the page with ID 1 is Coca-Cola and the page with ID 2 is The Beatles. So, to answer a really simple question, I had to query three different tables just to get the pages that the user likes. That was not fun: it was a lot of going back and forth to figure out which one is which. Basically, tables are not the best way to do this kind of thing.

So how can we use a better data structure to represent this data, so that we can answer this kind of question in a faster and more intuitive way? That's where graphs come in. A graph is a data structure usually represented by the letter G in mathematical notation. It is formed by a set of vertices V and a set of edges E, that is, G = (V, E). Anyone who has studied this in the past knows it is a very simple data structure, and a simple, visual way to represent your data. Given this concept, we can now represent our whole scenario using graphs. Here I have a graph representing the same data that we had before in tables.
11:15
Each circle is a vertex, or node, and each line represents an edge, or relationship, between the nodes. The green ones are the users and the red ones are the pages. As you can see, each relationship also has a label: "friends with" or "likes".
11:47
So, given this representation, it's much easier to find out which pages the user likes. We just find the node with the name John and then follow the lines until we get to the pages that the user likes. That's pretty good: we get a graph view of our data, and it's really nice. But how can we use this kind of structure in our database?
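The contrast between the two representations can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk. It assumes SQLite (through the standard-library sqlite3 module) as the relational store and a plain dictionary as the graph. The relationship type names LIKES and FRIENDS_WITH are assumed spellings, and only the names from the example (John, Mary, Coca-Cola, The Beatles) are filled in.

```python
import sqlite3

# Relational version of the scenario (the friends_with table is omitted).
# Only names are filled in; gender, age and category stay NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, gender TEXT, age INTEGER);
    CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE likes (user_id INTEGER REFERENCES users(id),
                        page_id INTEGER REFERENCES pages(id));
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "John"), (2, "Mary")])
conn.executemany("INSERT INTO pages (id, name) VALUES (?, ?)",
                 [(1, "Coca-Cola"), (2, "The Beatles")])
conn.executemany("INSERT INTO likes (user_id, page_id) VALUES (?, ?)",
                 [(1, 1), (1, 2)])

def pages_liked_sql(name):
    """Answer 'which pages does <name> like?' by joining all three tables."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM users AS u
        JOIN likes AS l ON l.user_id = u.id
        JOIN pages AS p ON p.id = l.page_id
        WHERE u.name = ?
        ORDER BY p.id
        """,
        (name,),
    )
    return [row[0] for row in rows]

# Graph version: each node lists its outgoing (relationship type, target)
# pairs, so the answer is one hop away. Names double as node IDs here.
graph = {
    "John": [("FRIENDS_WITH", "Mary"), ("LIKES", "Coca-Cola"), ("LIKES", "The Beatles")],
    "Mary": [],
    "Coca-Cola": [],
    "The Beatles": [],
}

def pages_liked_graph(name):
    """Follow only the LIKES edges that leave the user's node."""
    return [target for rel_type, target in graph.get(name, []) if rel_type == "LIKES"]

print(pages_liked_sql("John"))
print(pages_liked_graph("John"))
```

Both functions return the same two pages. The SQL version has to join three tables, while the graph version only follows the edges leaving John's node.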
12:17
That's where graph databases come in.
12:22
Graph databases are storage systems that store data in graph structures and let users explicitly store the relationships between data. Because the relationships are explicit, we can retrieve information about them directly. Besides storing relationships explicitly, graph databases have other advantages. They allow more elaborate data analysis. We can use common algorithms from graph theory (I don't know if anyone here has heard of it), such as community detection, pattern recognition or centrality measures. We can run these algorithms on our own databases and find new analytical information about our data.

Another advantage is that graph databases have a very flexible schema. What does that mean? Let's imagine that we now want to introduce the concept of groups into our scenario, like the groups on Facebook that have members. In a relational database we would need to create a new table, probably change the columns of another table, and do a lot of other things to make this work. In a graph database, we only have to add nodes of the type group and connect them to the existing nodes. We don't even have to look at how the database is organized first, which is why I didn't show everything: it's not necessary. I just add the node and connect it to whichever nodes I want. It's that simple.

Another advantage is that recent graph database implementations use NoSQL storage mechanisms. They therefore carry the characteristics of NoSQL databases, such as horizontal scalability: we can improve the performance of a database just by increasing the number of simple machines that run it. We don't need a huge, amazing server. Small, simple machines can run our application, which also means we can use distributed processing to improve its performance. It's really good.
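The flexible-schema point can be illustrated with an in-memory property graph. This is a hypothetical sketch, not how Neo4j stores data internally. Nodes are dictionaries with a label and arbitrary properties, and edges are (source, type, target) triples. The group node "python_group" and the MEMBER_OF relationship type are invented for the example.

```python
# Property graph kept as plain Python data: nodes carry a label plus any
# properties, edges are (source, relationship type, target) triples.
nodes = {
    "john": {"label": "User", "name": "John"},
    "mary": {"label": "User", "name": "Mary"},
    "beatles": {"label": "Page", "name": "The Beatles"},
}
edges = [("john", "FRIENDS_WITH", "mary"), ("john", "LIKES", "beatles")]

def add_node(node_id, label, **properties):
    """Add a node of any label; no table or column has to exist beforehand."""
    nodes[node_id] = {"label": label, **properties}

def connect(source, rel_type, target):
    """Add a typed relationship between two existing nodes."""
    if source not in nodes or target not in nodes:
        raise KeyError("both endpoints must exist")
    edges.append((source, rel_type, target))

# Introducing groups means adding a new label and a new relationship type.
# Nothing that already exists is modified. "python_group" is hypothetical.
add_node("python_group", "Group", name="Python Users")
connect("john", "MEMBER_OF", "python_group")
connect("mary", "MEMBER_OF", "python_group")

def members(group_id):
    """Names of every node with a MEMBER_OF edge pointing at the group."""
    return sorted(nodes[src]["name"] for src, rel, dst in edges
                  if rel == "MEMBER_OF" and dst == group_id)

print(members("python_group"))
```

The existing user and page nodes are untouched. The new concept is just more nodes and more edges, which is the contrast with adding a table and altering columns in the relational version.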
15:33
So, I decided to
15:36
go a little bit further by explaining Neo4j as our example graph database, because it
15:44
is the most popular graph database according to DB-Engines. DB-Engines is a website that lists the available databases, relational and NoSQL, graph databases included, and keeps track of the most popular ones. Neo4j is the most popular in the graph database category. It is implemented in Java and has its own query language, called Cypher, and its data can be accessed through a REST API or a Java API. I'll go through some examples of this. Let's say that we want to create a vertex, or node, as we call it in Neo4j. This is the command we use; we just have to
16:47
use the keyword CREATE. Inside the parentheses, john is just an identifier for the node, and we say that the label of the node is User. Inside the braces we pass all the attributes that the node has: name, gender and age. For this query specifically, I return the node so I can show you the result of running it:
17:21
it's just a single node with the name John. But when we do this using the REST API, we get a JSON object. Inside it we have a graph object that comes with a list of nodes and a list of relationships in our graph. We can see that for now we only have one node, with the label User and the properties we gave it, such as the name John, and no relationships. So let's create a relationship, because that's the whole point of graph databases.
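A rough Python equivalent of that REST call can be written with the standard library. This sketch rests on assumptions that the talk does not state:

- It targets the transactional Cypher endpoint of a Neo4j 3.x server at localhost:7474.
- It passes the node's properties as a parameter map ($props) instead of writing them into the query string.
- The example run only builds the request body. The post_payload helper needs a live server and real credentials.

```python
import base64
import json
import urllib.request

# Assumption: a Neo4j 3.x server on localhost with the transactional Cypher
# HTTP endpoint; Neo4j 4.x and later use /db/<name>/tx/commit instead.
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"

def create_user_payload(properties):
    """JSON body for CREATE, with the properties passed as a parameter map
    rather than pasted into the query string."""
    return {
        "statements": [{
            "statement": "CREATE (u:User $props) RETURN u",
            "parameters": {"props": properties},
            # "graph" asks for the nodes/relationships view described in the talk.
            "resultDataContents": ["row", "graph"],
        }]
    }

def post_payload(payload, user, password, url=COMMIT_URL):
    """POST the payload with HTTP basic auth and return the decoded JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

payload = create_user_payload({"name": "John"})
print(json.dumps(payload, indent=2))
# With a server running: reply = post_payload(payload, "neo4j", "<your password>")
```

Passing values as parameters instead of formatting them into the Cypher string avoids quoting problems and query injection.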
17:59
Using Cypher, let's say that we already have two nodes in our database, John and Mary. First, we retrieve these nodes by name with the keyword MATCH. Then we create a relationship labelled "friends with" between these two nodes, and we return it so we can see it.
18:29
So at first we had Mary and John totally separated, just two nodes in our database. But now, after creating the relationship,
18:39
we have this arrow going from John to Mary, because we used
18:44
this arrow notation in the Cypher query, where an arrow indicates the direction of the relationship. But direction is totally optional: we can have relationships without a direction, or with directions both ways, and you can even add more information to your relationship. In this one I preferred to add a direction, but that's
19:10
totally up to you. As for the REST API request, after creating the relationship we now have our
19:24
graph object with two nodes, John and Mary, and one relationship that connects them. So that's how we can do it using the REST API too.
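Reading that reply from Python can be sketched as follows. The reply below is hypothetical. It is written to match the general shape of the "graph" result format of the Neo4j 3.x transactional endpoint (results, then data, then graph with nodes and relationships). The IDs are made up, and FRIENDS_WITH is an assumed spelling of the "friends with" label.

```python
# Hypothetical reply, shaped like the "graph" result format of the Neo4j 3.x
# transactional endpoint. IDs are illustrative only.
sample_reply = {
    "results": [{
        "columns": ["john", "r", "mary"],
        "data": [{
            "graph": {
                "nodes": [
                    {"id": "0", "labels": ["User"], "properties": {"name": "John"}},
                    {"id": "1", "labels": ["User"], "properties": {"name": "Mary"}},
                ],
                "relationships": [
                    {"id": "0", "type": "FRIENDS_WITH", "startNode": "0",
                     "endNode": "1", "properties": {}},
                ],
            },
        }],
    }],
    "errors": [],
}

def relationships_by_name(reply):
    """Turn the graph view of a reply into (start name, type, end name) triples."""
    if reply.get("errors"):
        raise RuntimeError(reply["errors"])
    names, triples = {}, []
    for result in reply["results"]:
        for item in result["data"]:
            graph = item["graph"]
            for node in graph["nodes"]:
                names[node["id"]] = node["properties"].get("name")
            for rel in graph["relationships"]:
                triples.append((rel["startNode"], rel["type"], rel["endNode"]))
    # Resolve node IDs to names only after every node has been collected.
    return [(names[start], rel_type, names[end]) for start, rel_type, end in triples]

print(relationships_by_name(sample_reply))
```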
19:38
Now let's load all the information that we had before in our scenario.
19:43
After loading everything into Neo4j, this is what we get: all three users and the two pages, with their relationships. Now let's try to retrieve some data from this database. To query it, we use the keyword MATCH. We say: match the users named John that have a "likes" relationship to a page, and return those pages. That's easy, and it returns the pages.
20:26
I took this graph visualization from the Neo4j browser, a web interface that you get as soon as you install it. It's really easy to use and really visual: you can see your data like this, which makes things easy if you are starting with Neo4j. If you are doing a REST API query, the JSON object it returns now has a data object, with the data
21:05
returned for your query. It returns two rows, because the user John likes two pages. One row holds the relationship between John and The Beatles, and the other the relationship between John and Coca-Cola. So we have that information in JSON as well. That's pretty good, but so far we have only used Cypher. Where's the Python?
21:32
To integrate Neo4j
21:35
with your Python applications, we use py2neo, a Python module that integrates Neo4j with your application. It supports both Python 2 and 3. I'll show you how to do everything I showed before using py2neo: creating a node, creating a relationship and querying the database. In this Python code, we import the Graph, Node and Relationship objects from py2neo. With the database running, we get the graph object just by passing the password. Then we start a transaction with begin. Within a transaction we can create lots of nodes and relationships and commit all the changes at once at the end. So, first I created the node for John and called the create method of the transaction. I did the same for Mary, and then I created the relationship. But nothing is sent to the database until we commit: when the code calls the commit method, it pushes everything to Neo4j at once.
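The staging behaviour described above can be illustrated without a server. The class below is a hypothetical stand-in written for this illustration, not py2neo's implementation. Its create() only queues an item, and nothing reaches the store (here, a plain list) until commit() is called.

```python
class StagedTransaction:
    """Hypothetical stand-in (not py2neo's code) for the begin/create/commit
    pattern: changes are staged locally and reach the store only on commit()."""

    def __init__(self, store):
        self.store = store
        self.pending = []
        self.finished = False

    def _check_open(self):
        if self.finished:
            raise RuntimeError("transaction already finished")

    def create(self, item):
        """Queue a node or relationship; the store is not touched yet."""
        self._check_open()
        self.pending.append(item)

    def commit(self):
        """Push every queued change to the store at once."""
        self._check_open()
        self.store.extend(self.pending)
        self.pending = []
        self.finished = True

    def rollback(self):
        """Discard every queued change."""
        self._check_open()
        self.pending = []
        self.finished = True

database = []
tx = StagedTransaction(database)
tx.create(("User", {"name": "John"}))
tx.create(("User", {"name": "Mary"}))
tx.create(("FRIENDS_WITH", "John", "Mary"))
size_before_commit = len(database)  # nothing has been sent yet
tx.commit()
print(size_before_commit, len(database))
```

Batching changes this way means a failure before commit() leaves the store unchanged, which is the main reason to group related creations into one transaction.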
22:56
How can we query? It's just as simple. We have our graph object, and now we use a node selector to select the nodes with the label User and the name John, and we take the first one that matches. Then we match the relationships of type "likes" that start at this node, which represents the user John. We get all of them and print their end nodes, which correspond to the pages that the user likes. py2neo also lets you run Cypher queries directly: you call the run method on the graph, pass it a whole Cypher query, and it returns the response to you. Now, I've talked a lot about Neo4j,
24:09
and you're probably used to it by now, but what are the other options that we have?
24:14
According to
24:19
DB-Engines, I took the other two most
24:21
popular graph databases, OrientDB and Titan, and put together a comparison of the three. OrientDB is a multi-model database, which means that it is not only a graph database: it also supports key-value storage, documents and other ways to store data. Titan works with REST, but it needs a backend database to work with, such as Berkeley DB or another kind of database connected to Titan. All three are implemented in Java, and each one has its own query language: Cypher for Neo4j, an extended version of SQL for OrientDB, and Gremlin for Titan. Comparing those three
25:29
query languages, here is the same query in each of them: which pages does the user John like? We already know Cypher; it's pretty easy and straightforward. OrientDB's SQL is more familiar to those who are used to SQL, because it has the SELECT ... FROM ... WHERE structure we know, but it adds some keywords to work with graphs. It starts from the users named John, selects their "likes" relationships in both directions and expands them to get the nodes at the other end, which brings back the pages. Gremlin has a totally different syntax. It's not something we are used to, but someone who has worked with Groovy should find it easy. It does basically the same thing: it goes through our graph g to its vertices, gets the vertex that has the name John, and then gets the nodes connected to it. The idea behind all three is the same. Now let's compare some things about performance.
27:13
I didn't perform these experiments myself. I took them from papers I found online that compare these three databases, and I thought it would be good to bring them here in case someone is wondering how the three perform against each other. One of the most common operations in a database is simply retrieving a node by its ID. In the first test, carried out by some researchers in Belgium, each of the three databases had to retrieve a node given its ID in a graph with 100 thousand vertices. Four clients performed this operation 100 times, and the researchers measured the average response time. Here we see that Titan is a little slower than the other two, OrientDB is the fastest, and Neo4j is quite close to it. So the three are pretty close, with Titan a bit slower. But this can change if you change Titan's backend: in this experiment they were using Cassandra, and other kinds of backend database could give different results. Another experiment, carried out by some researchers at the Georgia Institute of Technology, looked at the amount of memory each of these databases required for a graph with 32 thousand vertices and 256 thousand edges. We see that OrientDB requires a lot of internal memory to store this kind of graph, Titan requires the least, and Neo4j is in the middle.
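The setup of the first experiment was several clients issuing repeated retrievals by ID, with the average response time as the result. That methodology can be sketched with Python threads. The sketch only illustrates the measurement setup, and it makes these assumptions:

- A plain dictionary stands in for the database, so the numbers it prints say nothing about Neo4j, OrientDB or Titan.
- The defaults of 4 clients and 100 operations per client mirror the figures quoted above.
- Reading "100 times" as per client is itself an assumption.

```python
import statistics
import threading
import time

def average_lookup_time(lookup, ids, clients=4, ops_per_client=100):
    """Run ops_per_client lookups from each of `clients` threads and return
    (average seconds per lookup, number of lookups timed)."""
    timings = []
    lock = threading.Lock()

    def client(offset):
        local = []
        for i in range(ops_per_client):
            key = ids[(offset * ops_per_client + i) % len(ids)]
            start = time.perf_counter()
            lookup(key)
            local.append(time.perf_counter() - start)
        with lock:  # merge this client's timings safely
            timings.extend(local)

    threads = [threading.Thread(target=client, args=(c,)) for c in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return statistics.mean(timings), len(timings)

# Stand-in "database": an in-memory dict keyed by vertex ID.
vertices = {i: {"id": i} for i in range(100_000)}
avg_seconds, lookups = average_lookup_time(vertices.__getitem__, list(vertices))
print(f"{lookups} lookups, average {avg_seconds * 1e6:.2f} microseconds")
```

To benchmark a real database, `lookup` would be replaced by a function that issues one query through the database's client.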
29:37
Bringing this back to relational databases, a concept we are more used to: if we want
29:44
to run the same query from the previous examples in SQL, this would be the query. It is a SELECT ... FROM ... WHERE that has to join these three tables to get the information. For one thing, it is not as legible as the Cypher query, and I think legibility is the advantage of Cypher over SQL. For another, join operations, useful as they are, take a toll on performance, so that's not so good. I also brought a performance experiment, done by
30:31
some researchers at the University of Mississippi, comparing Neo4j with MySQL. They ran two types of queries: structural queries and data queries. A structural query navigates through your data by following the relationships, like a breadth-first search in a tree. A data query only retrieves nodes by their attributes. We can see here that Neo4j is not that good at data queries. If your application mostly retrieves nodes by their attributes, without using the relationships or the topology of the graph much, Neo4j may not be a good choice, because MySQL can perform better. But if you navigate a lot through the relationships in your data, then Neo4j is the way to go, because it's easy to use and faster than MySQL. So you have to analyze what your application is doing and how it uses the data before deciding whether to go for a graph database.
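The difference between the two query types can be sketched on a toy graph. This is an illustration of the concepts, not the queries used in the study. The graph, its node names (single letters) and the age attribute are all hypothetical. The structural query is a breadth-first traversal over the relationships, and the data query is a plain attribute filter that never touches them.

```python
from collections import deque

# Hypothetical toy graph: adjacency lists for the relationships, plus one
# attribute per node. Letters stand in for users.
adjacency = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
ages = {"A": 30, "B": 25, "C": 41, "D": 25, "E": 19}

def structural_query(start, max_depth):
    """Breadth-first traversal: nodes reachable from start within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append((neighbour, depth + 1))
    return reached

def data_query(age):
    """Attribute lookup: ignores the relationships entirely."""
    return sorted(node for node, value in ages.items() if value == age)

print(structural_query("A", 2))  # ['B', 'C', 'D']
print(data_query(25))            # ['B', 'D']
```

A graph database keeps each node's neighbours directly reachable, which favours the first kind of query. The second kind is what relational indexes are built for.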
32:09
So yeah, let's see some use cases, because so far I've only been talking in theory: where is it important to use graph databases? There are several areas where
32:18
we could use graph databases. One of them is social networks, which is the example I've been using so far: it's pretty straightforward to represent users as nodes and relationships as edges. But there is also some work done in bioinformatics, in genetic analysis. I'm not from this area, so I don't know exactly what they do, but it appears that the particles of our DNA have interactions between them, and those interactions are relationships between particles. So they store this kind of information in a graph database, where they can process it better than they could in a relational database. That's an area taking a lot of advantage of the graph structure. Another interesting area using graph databases nowadays is telecommunications, because they can represent the information of a person calling another person, and the connections of the cables,
33:34
easily using graph databases, so they can visually see their network using this kind of database. But what I personally brought to you today, which is important to me, though I don't know about you, is an
33:54
application about Game of Thrones that uses graph databases. It's a really interesting application of graph databases that someone did, and it was amazing, so I brought you here the work of these guys, which just blew my mind. It was Andrew
34:12
Beveridge and Jie Shan,
34:15
and I hope I pronounced those names correctly. They took the time to analyze the network formed by the characters of the book A Storm of Swords, from one of the best-known series. They went through the whole book and registered the relationships between each of the characters, and they also gave a weight to each relationship: if a person was really close to another person, the relationship has a higher weight; if they are not so close, the weight is low. They put this in a CSV file formatted like this: the character Aemon with the character Grenn has a weight of 5, but Aemon with Samwell has a weight of 31. They did this for the whole book and all the characters in it, and you know there are a lot, so that's really amazing. Then someone else, William Lyon, had the brilliant idea to load this information into Neo4j, and he started to play around with it and do some analysis on it. To do this analysis he used a module called igraph; I don't know if anyone has ever heard of it. It's a module that allows you to manipulate graphs using network analysis algorithms such as centrality and community detection. It's really easy to call these algorithms from this module, and it has a really neat way to connect with Neo4j. So let's take a look at how he did this with igraph, Neo4j and Python. He got the graph through py2neo, which connects to Neo4j, loaded all the information from the Neo4j graph into igraph, and then just called the community_walktrap method, taking into account the weight of each relationship. Using this method he was able to identify the clusters, or communities, of characters.
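The pipeline just described (CSV rows, loaded into Neo4j, then pulled into igraph for Walktrap community detection) could look roughly like the sketch below. It assumes the `py2neo` and `python-igraph` packages and a running Neo4j server. The `:Character` label, the `:INTERACTS` relationship type, the Cypher queries, the helper names and the connection details are all assumptions for illustration, not necessarily what the original analysis used. Only `read_edges` and `load_into_neo4j` can run without those dependencies.

```python
import csv
import io
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, int]

# Two rows in the Source,Target,Weight format described in the talk.
SAMPLE_CSV = """Source,Target,Weight
Aemon,Grenn,5
Aemon,Samwell,31
"""

# Assumed graph model: (:Character)-[:INTERACTS {weight}]->(:Character).
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (a:Character {name: row.source})
MERGE (b:Character {name: row.target})
MERGE (a)-[r:INTERACTS]->(b)
SET r.weight = row.weight
"""

FETCH_QUERY = """
MATCH (a:Character)-[r:INTERACTS]->(b:Character)
RETURN a.name AS source, b.name AS target, r.weight AS weight
"""


def read_edges(text: str) -> List[Edge]:
    """Parse Source,Target,Weight rows into (source, target, weight) tuples."""
    return [(row["Source"], row["Target"], int(row["Weight"]))
            for row in csv.DictReader(io.StringIO(text))]


def load_into_neo4j(graph, edges: Iterable[Edge]) -> None:
    """Write the edges into Neo4j; `graph` is assumed to be a py2neo Graph."""
    rows = [{"source": s, "target": t, "weight": w} for s, t, w in edges]
    graph.run(LOAD_QUERY, rows=rows)


def walktrap_communities(graph) -> List[List[str]]:
    """Read the weighted edges back from Neo4j and cluster them with igraph."""
    from igraph import Graph as IGraph  # requires python-igraph

    edges = [(r["source"], r["target"], r["weight"])
             for r in graph.run(FETCH_QUERY).data()]
    # weights=True stores each tuple's third item as the "weight" edge attribute.
    ig = IGraph.TupleList(edges, weights=True)
    clusters = ig.community_walktrap(weights="weight").as_clustering()
    return [[ig.vs[i]["name"] for i in members] for members in clusters]


def main(uri: str = "bolt://localhost:7687",
         auth: Tuple[str, str] = ("neo4j", "password")) -> None:
    """Placeholder URI and credentials; needs py2neo and a running server."""
    from py2neo import Graph

    neo = Graph(uri, auth=auth)
    load_into_neo4j(neo, read_edges(SAMPLE_CSV))
    for number, community in enumerate(walktrap_communities(neo)):
        print(number, community)
```

Calling `main()` runs the whole pipeline against a live server. Walktrap groups vertices that short random walks tend to stay among, so passing `weights="weight"` lets strongly weighted interactions pull characters into the same community.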
36:49
The result of that method was a table looking like this one: we have eight clusters, and each cluster has an ID and a set of characters that are part of that community. The last one is quite small, over here; as it's not deterministic, some stuff like this may happen with your data. He took this information, also computed a centrality measure, and was able to come up with this graph. It represents all the characters of the book, and each color represents a community, or cluster. The size of a node represents the importance of that character according to the centrality measure, and the width of the edges represents the weight of the relationship between the nodes. I don't know if you can see it, but the big node in the blue community is Jon Snow, and if you look around, you see that the blue community represents the people at the Wall, so you see the other characters over there. In the green one, the big node is Daenerys, and around her you see all the characters from her part of the story. In the big yellow one we have Robert, the king, and Tyrion, and you see other important people that represent the main part of the story.
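The visualization maps a centrality score to node size and each relationship weight to edge width. The talk doesn't say which centrality measure was used, so this sketch uses weighted degree (the sum of a character's interaction weights) as a simple stand-in; PageRank or betweenness would slot into the same place. The function names and scaling constants are hypothetical, and the two sample edges are the CSV rows quoted in the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, int]


def weighted_degree(edges: Iterable[Edge]) -> Dict[str, int]:
    """Sum of the weights of all edges touching each node (undirected)."""
    totals: Dict[str, int] = defaultdict(int)
    for source, target, weight in edges:
        totals[source] += weight
        totals[target] += weight
    return dict(totals)


def node_sizes(scores: Dict[str, float], min_size: float = 5.0,
               max_size: float = 50.0) -> Dict[str, float]:
    """Linearly rescale centrality scores into drawing sizes."""
    low, high = min(scores.values()), max(scores.values())
    span = (high - low) or 1.0  # avoid division by zero if all scores are equal
    return {node: min_size + (score - low) / span * (max_size - min_size)
            for node, score in scores.items()}


def edge_widths(edges: Iterable[Edge], scale: float = 0.2) -> List[Tuple[str, str, float]]:
    """Edge width proportional to the interaction weight."""
    return [(s, t, w * scale) for s, t, w in edges]


edges = [("Aemon", "Grenn", 5), ("Aemon", "Samwell", 31)]
centrality = weighted_degree(edges)
sizes = node_sizes(centrality)
print(centrality)  # {'Aemon': 36, 'Grenn': 5, 'Samwell': 31}
```

The sizes and widths can then be handed to any plotting library; keeping the scoring and the scaling separate makes it easy to swap in a different centrality measure.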
38:40
So yeah, this is really fun. I found it a really nice, really interesting application of graph databases: you can play around with the data and analyze it to extract more information from it. It's pretty nice to see how your data behaves in an analysis like this; if you worked with a relational database, you wouldn't have this visual view of your data, which is nice. So yeah, that's it.
39:13
If you have any questions, please feel
39:16
free to ask now or later; I will be around here. I'm from Labcodes, and I have also written a blog post about this presentation covering the same material; you can find all the notes on our Medium account, Labcodes. These are my contacts, so if you have any questions, feel free to reach out. Thanks.
39:45
And we also have stickers over here if you want them.
39:57
Thank you so much. OK, we have time for questions.

Q: Thanks for the amazing talk. I come basically from a SQL and relational database background, and my question is mainly about scaling these graph databases. Maybe with Titan, running on Cassandra, it's conceivable, but for the others I'd like to understand the scaling part: what happens when your graph gets to a size where you cannot hold everything in memory on one server?

A: For Neo4j you have distributed processing, so you can do that easily, and OrientDB also provides a really nice way to distribute your processing, because it's also a NoSQL kind of database, so they already have that. I didn't bring it here, but on their website they have nice tutorials on how you can do this. It's basically the same thing: you bring up instances of the database and spread them out, and they have nice mechanisms to make everything distributed and bring it back together without conflicts, like sharding.

Q: [Follow-up question partly inaudible.]

A: I'm sorry, I don't know much about that. I assume there is something for it, but I didn't read much about it.

Q: Hello, thank you for the talk. I have two questions. The first one: how can I specify a schema for a node in the database, let's say a user with a name and so on?

A: Actually, Neo4j is schemaless, and you don't have a way to put your schema inside the database. You should probably enforce it at the application level instead of in the database. The big thing is that you don't have to come up with a schema.

Q: Yeah, thank you. And the
second question: how can I specify the storage mechanism for a graph, for sparse graphs or dense graphs? Is it optimized by storing the relationships in tables, or in something like linked lists, inside Neo4j?

A: I'm not sure about that; I never looked into whether there are different ways to do it. As far as I know, Neo4j is a black box that takes care of these things for you, and I'm not sure if you are able to tweak it based on the specifics of your data. I know that with Titan you can change your backend as your application needs, but for Neo4j I'm not sure.

We have time for three more questions.

Q: These databases are perfect for storing relationships, but relationships change over time. How do you keep the history? If you change a relationship, the old one is no longer there.

A: Yeah, the next step for me would be to see how you would implement time series applications using graph databases. I'm not sure. You could probably add attributes to your relationship, like a list of attributes on the relationship, where you store the history of what the relationship has been. But I'm not sure how you would really implement a time series application using Neo4j; you can do it by implementing an attribute on your relationship.

Q: Thank you for the presentation. I have two questions. If I'm not wrong, Neo4j has a user interface where we can actually see the graph. How does this interface react if you have a big graph, with hundreds of millions of nodes?

A: Yeah, it limits it. I'm working now on a database that
has about 16 thousand nodes, and when I search for all the nodes it just returns a thousand to me. It limits them because rendering can be really heavy on the front end. So yeah, it limits.

Q: And the second question: I did a quick check on Titan, and there were some posts saying it has been discontinued.

A: Yeah, this morning, actually, someone tweeted me about Titan DB. I didn't have time to look at it, but it seems there is a discontinuation going on, so I'll probably have to update my talk. I just heard about it today; it seems the project changed its name, and I'm going to take a further look.

Q: Very good presentation. I'd like to know: when you use igraph, is it possible to find the shortest path between two nodes directly in the graph database, or is it necessary to get the data out and use igraph to look for the shortest path?

A: You have some libraries you can add to Neo4j, and you call these functions from Cypher, so they are coupled with the graph database. igraph is really nice, but you can do it without it: there are libraries that calculate centrality measures and do pattern recognition using only Neo4j, without igraph.

Thank you again for the great talk, and remember to rate the talks.
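On the last question: once the data is pulled out of the database and processed in Python, finding the shortest weighted path between two nodes is ordinary Dijkstra. Here is a minimal standard-library sketch; the graph is made up and the function name is hypothetical. It assumes non-negative edge weights, which Dijkstra's algorithm requires.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm over non-negative edge weights.

    Returns (total_cost, path), or None if `goal` is unreachable.
    """
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    best: Dict[str, float] = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None


# Hypothetical directed, weighted graph as an adjacency list.
weighted: Graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}
print(shortest_path(weighted, "A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```

Here the cheapest route goes through B and C even though A has a direct edge to C, because 1 + 1 + 1 is less than 4 + 1.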