Neo4j in a .NET world

Formal Metadata

Title: Neo4j in a .NET world
Number of Parts: 110
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
This year, a small team of developers delivered an ASP.NET MVC app with a Neo4j backend, all running in Azure. This isn't a POC; it's a production system. Also, unlike most graph DB talks, it's not a social network! In this session you'll hear the stories of:
- what our project is, why we chose a graph DB, how we modelled it to start with and how our model was wrong,
- how we deploy Neo4j in Azure,
- how we diagnosed and fixed a performance problem that spanned the CLR, the JVM and Azure,
- how we built our own Neo4j client for .NET, complete with a fluent interface and internal logic to serialize .NET expressions into Groovy comparisons (it was only slightly scary).

Transcript: English (auto-generated)
All right, good morning, everyone. We've got a bit of an interesting topic here. So I was always interested to see how many people were going to come along and know what Neo4j was. I know I was talking to one guy who said he's used Neo4j. How many people have used it already? OK, three. OK, a couple of people.
And the rest of you, how many people know what a graph database is? OK, a couple of people. Not so sure. All right, so we'll go through a little bit of terminology to start with. So my name's Tatham Oddie. I'm a principal consultant with a group called Readify, down in Australia. And the reason I'm doing this talk, and why I know about Neo4j, is that I've actually spent the last 12 months with a team using Neo4j as the database on one of our projects. It's our only back-end store, so it's not just something we bolted onto the side; we use it exclusively. And we're in production, with production users and everything like that, so we've gone through that full life cycle. And I actually really quite like it now.
Just to complicate things, we run it in Azure as well, which means we have a Java-based graph database running on Azure with a .NET front end. That made for some interesting debugging problems from time to time, like when we had a performance issue. So, to start off: what is a graph DB? It's a NoSQL store.
So it's not a relational store or anything like that. It's basically like a document database, where we have nodes that hold keys and values; they're property stores. But then we also have relationships between them. So if we start off talking about what a graph is,
if we imagine that these dots are nodes and we have relationships between them, that's a tree structure we'll all be quite familiar with. It looks very common. As soon as we add a link like this, so that there are two ways to get to something, it becomes only a graph; it's no longer a tree. So every tree is a graph, but not every graph is a tree.
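A minimal Python sketch of the tree-versus-graph distinction (Python is used purely for illustration; `is_tree` and the node names are hypothetical). It assumes links are directed from parent to child and calls the structure a tree only when the root has no incoming link, every other node has exactly one, and everything is reachable from the root. Adding one extra link into an existing node, as described above, gives that node two parents, so the check fails.

```python
from collections import defaultdict


def is_tree(edges, root):
    """Return True if the directed (parent, child) edges form a tree rooted at
    `root`: the root has no parent, every other node has exactly one, and
    every node can be reached by walking down from the root."""
    parent_count = defaultdict(int)
    children = defaultdict(list)
    nodes = {root}
    for parent, child in edges:
        parent_count[child] += 1
        children[parent].append(child)
        nodes.update((parent, child))

    if parent_count[root] != 0:
        return False
    if any(parent_count[n] != 1 for n in nodes if n != root):
        return False  # some node is reachable two ways (or by no link at all)

    # Walk down from the root and make sure nothing is left unreachable.
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    return seen == nodes


tree = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E")]
print(is_tree(tree, "A"))                 # True
print(is_tree(tree + [("C", "E")], "A"))  # False: E now has two parents
```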
So this is why it's a graph database. Neo4j is a directed graph database. And what that means is that these links between nodes all have a direction. So the arrow might go that way, that way, and that way. And I missed one. So you always have a direction, whether you like it or not, which is incredibly useful.
And then when you're talking about graphs, there are also the terms cyclic and acyclic, and Neo4j, like graph databases generally, supports both. The graph I've drawn out here is cyclic, because there's a complete cycle
between all of the nodes: there's a way to get from one to the next and get back to where you started. That doesn't always have to be the case, though. If I take that link out, it's still perfectly valid, and now it's acyclic. So it supports both of them. So that's just a little bit of terminology to start with. Neo4j is just one graph database on the market, though it's one of the most mature ones.
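Whether a directed graph is cyclic can be checked with a depth-first search that remembers which nodes are on the current path: reaching one of those again means there is a way back to where we started. A Python sketch (the function name and sample graph are illustrative only); removing the last link of the cycle, like taking the link out in the drawing, makes the graph acyclic.

```python
def has_cycle(edges):
    """Detect a directed cycle using the classic three-colour depth-first search."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back to a node on our own path: a cycle
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


cyclic = [("A", "B"), ("B", "C"), ("C", "A")]
print(has_cycle(cyclic))       # True
print(has_cycle(cyclic[:-1]))  # False: without C -> A there is no way back
```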
And it's the one I'm familiar with, which is why I'm talking about it. There's a good little story, which I only found out about after nine months of using it, about why it's called Neo4j. I absolutely hate the J on the end; I'm trying to ignore that. It's kind of like talking about SQL Server for C++, and it scares the .NET programmers away. But the reason it's called Neo is actually a Matrix reference,
because Neo is fighting the evil tables all through the Matrix series. So that's why it's Neo4j: we're going to fight the evil tables. And it runs some really big websites, really, really big ones, for big-name companies, and there's lots of information about that on their site. More importantly, as far as we care as developers,
you can get it for free. It's the same licensing model as MySQL: you can get a basic version of it, a community edition, and use that for free, and then you only pay when you want stuff like clustering and online backups. So, getting into why we would use a graph database: graph databases are really good in highly connected
environments, where we have lots of things that all relate to each other and we need to walk those relationships, which describes a lot of our domain models. A SQL database, by contrast, is really oriented to dealing with things in aggregate, where we want to sum things and count things and all those types of processes. So anywhere you find you're in a SQL database
doing lots and lots of joins, only getting one or two rows from each table, and it feels really hard and annoying, your queries are ugly, and you have to add all these indexes and everything, that's where you want to be using a graph DB, because it makes that really, really easy. What's nice about that, then, is that it's also very easy to model. So I started a new project a couple of weeks ago,
and, just getting my head around it, I was talking to the product owner and drew out this diagram: OK, we've got people and companies, and companies have assets; companies own an asset, people can recommend companies, people want jobs, and things like that. So I just scribbled this out on my tablet
and then emailed it to the product owner and said, is this the right model? Have I got it right as a high-level concept? And he's totally a business guy; he really doesn't know how to use a computer very well. What he sent me back was a PowerPoint file, because that's what business people draw in, right? Everybody draws in PowerPoint; that's where your mock-ups come from. And he went and redrew it like this.
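The diagram described here is essentially a set of typed, directed relationships. As a rough illustration only, here it is as Python tuples; the entity names and the `OWNS`, `RECOMMENDS` and `WANTS` spellings are invented, since the talk describes these relationships only in words.

```python
from typing import NamedTuple


class Rel(NamedTuple):
    start: str     # the node the arrow leaves
    rel_type: str  # the label written on the arrow
    end: str       # the node the arrow points at


# Hypothetical sample data shaped like the sketch described in the talk.
model = [
    Rel("Acme", "OWNS", "Warehouse"),
    Rel("Alice", "RECOMMENDS", "Acme"),
    Rel("Bob", "WANTS", "Delivery driver job"),
]


def outgoing(rels, node, rel_type):
    """Follow arrows of one type out of `node`."""
    return [r.end for r in rels if r.start == node and r.rel_type == rel_type]


def incoming(rels, node, rel_type):
    """Follow arrows of one type back into `node`."""
    return [r.start for r in rels if r.end == node and r.rel_type == rel_type]


print(outgoing(model, "Acme", "OWNS"))        # ['Warehouse']
print(incoming(model, "Acme", "RECOMMENDS"))  # ['Alice']
```

Because each relationship keeps its direction, asking what Acme recommends (rather than who recommends Acme) correctly returns nothing.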
And that diagram made perfect sense to him, because he just went, well, we have companies. And then he implicitly drew all these different colored arrows. The arrows had directions, and he wrote names on them: they're the relationship types. So he's just a business guy who went and drew it out. So anybody can actually go and model a graph DB,
which is really, really nice. So let's actually have a look: we'll run up Neo4j. I'm going to launch Neo4j, and we'll walk through a bit of how we structure data and some querying of it. Then I'm going to step back into the .NET world, because this talk is about Neo4j and .NET, and talk about how we query it from .NET. And then, for the two projects
I've been using it on, how we deploy it into Azure and also into an on-premises environment. As I go, if you've got any questions, just throw a hand up. As most of the speakers have probably said, though, we've been warned: it's Norway, nobody asks questions. In America, everybody asks questions, because they want to give their opinion rather than actually ask a question. So they talk for a while and pretend it's a question.
So Neo4j is Java-based. If I jump into a temp folder I have here, NDC, I've just downloaded the zip file off their website and unzipped it. And here we go. In here there's a bin folder,
and in that there's just Neo4j.bat, and all I do is double-click it. There's no install process; this just runs it up on my machine. It's going to bind to a port. I've got a Java window in front there. There we go, so the Java window's now running. And if I go over to my browser, I'm just going to go to localhost:7474,
and Neo4j's up and running on my machine. This is the admin console. It's kind of the equivalent of SQL Server Management Studio or something like that. And we can see that we currently have one node, no properties, and no relationships. So let's go and get some data in here, and we can start playing with that. If I minimize my awesome slides...
I'm just going to grab a bit of data I built earlier to save us some time. There are two different query languages for Neo4j.
One's called Gremlin, and one's called Cypher, and what we're going to look at today is Cypher. I just pasted in a Cypher statement there, and we'll go back and have a look at that in a minute. So if I start off in the data browser, we've got this initial node zero, which is what we call our reference node. And I'm going to flip over here into a different visualization mode.
So off node zero, this is the data I've created here; I can go and bring these nodes in. I like to think of node zero as the system, or the project, or the site we're building, or something like that, and I describe the relationships off it in that terminology. So node zero has a user Tatham,
has a user Pat, has a user Tom. And then off these we have other relationships: Tom likes rowing, and Tatham, Tom, and Chrissie all like sailing. So these are very easy relationships, and you can just read through them: the system has a user, Tatham, who likes sailing, and somebody else also likes sailing.
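It can help to see the demo data written out. This is a reconstruction in Python, not the Cypher script pasted in the talk: node 0 (the reference node) and node 1 (Tatham) come from the talk, while the other node IDs, the relationship-type spellings, Pat's activity and the Tom/Tatham friendship are assumptions chosen to be consistent with the query results shown later.

```python
# Nodes are property maps keyed by ID; relationships are directed triples.
nodes = {
    0: {},  # reference ("system") node
    1: {"name": "Tatham"}, 2: {"name": "Pat"},
    3: {"name": "Tom"}, 4: {"name": "Chrissie"},
    5: {"name": "sailing"}, 6: {"name": "rowing"},
    7: {"name": "mountain biking"},
}
relationships = [  # (start, type, end)
    (0, "HAS_USER", 1), (0, "HAS_USER", 2), (0, "HAS_USER", 3), (0, "HAS_USER", 4),
    (1, "LIKES", 5), (1, "LIKES", 7),
    (3, "LIKES", 5), (3, "LIKES", 6),
    (4, "LIKES", 5), (4, "LIKES", 7),
    (2, "LIKES", 7),   # assumed: which activity Pat shares with Tatham
    (3, "FRIEND", 1),  # assumed: stored one way only, Tom -> Tatham
]


def describe(rel):
    """Read a relationship back as a sentence, the way the talk reads the graph."""
    start, rel_type, end = rel
    subject = nodes[start].get("name", "the system")
    return f"{subject} {rel_type} {nodes[end]['name']}"


print(describe((0, "HAS_USER", 1)))  # the system HAS_USER Tatham
print(describe((1, "LIKES", 5)))     # Tatham LIKES sailing
```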
And then we can go and query off that very easily. So what I'm going to do... there we go. Unfortunately, this console doesn't work very well with increased font sizes. Every query we run has to start from somewhere in the graph, because there are no named tables or anything like that.
There's no fixed schema; we just make it all up. So we've got to tell it where to start, which means our query always begins with a START clause. In this case I'll say I want to start from node zero, and let's call that root. Then from there I want to MATCH, and I'm going to say that from root
there's an outgoing has-user relationship, and that goes to a person. Return the person. So it's glorified ASCII art. I get paid to write ASCII art all day; it's awesome. Angle brackets, little dashes. And we can see there that there are four users in the system.
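In Cypher 1.x's START/MATCH syntax, the query just described would look roughly like `START root=node(0) MATCH root-[:HAS_USER]->person RETURN person` (the relationship-type spelling is assumed). The Python sketch below performs the same walk over a hard-coded edge list; the IDs other than 0 and 1 are hypothetical.

```python
# (start, end) pairs for the HAS_USER relationships in the demo data.
HAS_USER = [(0, 1), (0, 2), (0, 3), (0, 4)]
NAMES = {1: "Tatham", 2: "Pat", 3: "Tom", 4: "Chrissie"}


def match_outgoing(pairs, start):
    """Equivalent of start-[:TYPE]->x: only follow arrows leaving `start`."""
    return [end for s, end in pairs if s == start]


people = [NAMES[n] for n in match_outgoing(HAS_USER, 0)]
print(people)  # ['Tatham', 'Pat', 'Tom', 'Chrissie']
```

Starting from node 1 instead returns nothing, because no HAS_USER arrow leaves a user node.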
Simple enough. And node one there is me, so let's run a query against that. I can say start me at node one, match the things I like, through a likes relationship, to an activity, and return the activities.
There we go: I like sailing and mountain biking. You can see the directed relationship there, where I've got a dash and then the arrow off the end. This block here, oops, that block there, describes the relationship, and then I can just name nodes along the way. What I can also do with this, then, is... oops.
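This query follows the arrow forwards from node one, roughly `START me=node(1) MATCH me-[:LIKES]->activity RETURN activity` in Cypher 1.x (the `LIKES` spelling is assumed). A Python sketch over an assumed list of LIKES pairs; Pat's entry is a guess, since the talk only says he shares one activity.

```python
# Directed LIKES relationships as (person, activity) pairs.
LIKES = [
    ("Tatham", "sailing"), ("Tatham", "mountain biking"),
    ("Tom", "rowing"), ("Tom", "sailing"),
    ("Chrissie", "sailing"), ("Chrissie", "mountain biking"),
    ("Pat", "mountain biking"),  # assumed
]


def activities_liked_by(person):
    """me-[:LIKES]->activity: walk out of `person` along LIKES arrows."""
    return [activity for who, activity in LIKES if who == person]


print(activities_liked_by("Tatham"))  # ['sailing', 'mountain biking']
```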
If I say start me at node one, match things that I like to an activity, and then match coming back into that activity (so you see the arrows going two different ways here), I get other people who like that activity as well.
Return that. So there are the other people who like the same activities that I do. Now, can you imagine doing this in SQL? At this point you'd have an activities table, a person table, and a person-activities table, and you'd have done some god-awful number of inner joins already.
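To make the SQL comparison concrete, here is a Python sketch that answers "who else likes what I like?" twice over the same assumed data: once the relational way, joining a person_activity link table to itself, and once the graph way, walking out along LIKES and then back in (the `me-[:LIKES]->activity<-[:LIKES]-other` shape). Table, variable and function names are hypothetical, and the activity table is omitted because only activity IDs are compared. Both versions return one row per path walked, so Chrissie appears twice.

```python
# Relational shape: a person table and a person_activity link table.
person = {1: "Tatham", 2: "Pat", 3: "Tom", 4: "Chrissie"}
person_activity = [(1, 10), (1, 12), (3, 11), (3, 10),
                   (4, 10), (4, 12), (2, 12)]  # Pat's row is assumed


def co_likers_sql_style(me):
    # Roughly: SELECT other.name FROM person_activity pa1
    #   JOIN person_activity pa2 ON pa2.activity_id = pa1.activity_id
    #   JOIN person other ON other.id = pa2.person_id
    #   WHERE pa1.person_id = :me AND pa2.person_id <> :me
    return [person[p2]
            for p1, a1 in person_activity if p1 == me
            for p2, a2 in person_activity if a2 == a1 and p2 != me]


# Graph shape: adjacency kept in both directions.
likes_out, likes_in = {}, {}
for p, a in person_activity:
    likes_out.setdefault(p, []).append(a)  # person -> activities
    likes_in.setdefault(a, []).append(p)   # activity -> people


def co_likers_graph_style(me):
    # me-[:LIKES]->activity<-[:LIKES]-other: walk out, then walk back in.
    return [person[other]
            for a in likes_out.get(me, [])
            for other in likes_in[a] if other != me]


print(co_likers_sql_style(1))    # ['Tom', 'Chrissie', 'Chrissie', 'Pat']
print(co_likers_graph_style(1))  # ['Tom', 'Chrissie', 'Chrissie', 'Pat']
```

The graph version never scans the whole link table for each step; it only follows the adjacency lists of the nodes it has already reached.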
But what I'm really looking for here is other people I don't know who are interested in the same things as me. So I can say me is node one, and I want to match, oops, me likes activity.
And then I want to say where there's not a friend relationship between me and that other person: so, people I'm not friends with. In that little clause there, you'll notice I haven't put any angle brackets in. So even though every relationship always has a direction, I can query across that relationship in either direction.
I'm saying I don't care who was originally friends with whom; it's kind of a bidirectional thing, so the direction isn't really relevant and we'll just ignore it in the querying. And then I can return the person. Now here I've got Chrissie twice, so how does that work? As it queried across the graph, it walked out from me
to two different activities, sailing and mountain biking, then walked back in from each one and found her node twice. So it's returned her twice.
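The not-friends filter and the duplicate rows can be reproduced in a small Python sketch (the data and names are assumptions). The friendship is stored in one direction only, and `are_friends` checks both directions, which is the effect of leaving the arrowheads off the friend pattern. The `distinct` flag mirrors asking for distinct nodes.

```python
# Assumed demo data: directed LIKES pairs and one FRIEND link stored one way.
LIKES = [
    ("Tatham", "sailing"), ("Tatham", "mountain biking"),
    ("Tom", "rowing"), ("Tom", "sailing"),
    ("Chrissie", "sailing"), ("Chrissie", "mountain biking"),
    ("Pat", "mountain biking"),  # assumed
]
FRIEND = [("Tom", "Tatham")]     # assumed direction


def are_friends(a, b):
    # Like a FRIEND pattern with no arrowhead: either direction counts.
    return (a, b) in FRIEND or (b, a) in FRIEND


def strangers_who_share(me, distinct=False):
    rows = []
    for who, activity in LIKES:
        if who != me:
            continue
        for other, theirs in LIKES:
            if theirs == activity and other != me and not are_friends(me, other):
                rows.append(other)  # one row per path walked, so repeats happen
    if distinct:
        rows = list(dict.fromkeys(rows))  # drop repeats, keep first-seen order
    return rows


print(strangers_who_share("Tatham"))                 # ['Chrissie', 'Chrissie', 'Pat']
print(strangers_who_share("Tatham", distinct=True))  # ['Chrissie', 'Pat']
```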
This is a really weird way to step through lines. So obviously, as you'd expect, I can just say give me the distinct nodes. So it's quite a simple query structure. Is everyone happy with that? Questions about that? Cool. What this also lets us do is run some richer projections as well,
where I have a more complex domain structure. So I can say start with me, match the other people who like the same thing where I'm not friends with them, and then, instead of just returning the person, return the person's name as a column. So we're going to create a table here
out of a graph database, a projection, with a count of how many activities we share, and I want to order that by the count of activities, so by how many activities we have in common, descending. There I can see that there are two different things this person Chrissie likes that I like as well, and I'm not friends with her.
There's this other guy, Pat, who I'm not friends with, and there's one thing he likes that I like too. I should say, in case my friends actually watch this video: I am friends with both of those people; it just needs to be in the data set that they're not. Otherwise I'll get some nasty email from them: you said I wasn't friends with you! All right, so here we've been using Cypher as a query language.
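The projection (name plus a count of shared activities, ordered by that count descending) is essentially a group-and-count. A Python sketch using `collections.Counter`, with assumed data chosen to be consistent with the results described in the talk; the function name is hypothetical.

```python
from collections import Counter

# Assumed data: who likes what, and who is friends with whom.
LIKES = {
    "Tatham": {"sailing", "mountain biking"},
    "Tom": {"rowing", "sailing"},
    "Chrissie": {"sailing", "mountain biking"},
    "Pat": {"mountain biking"},  # assumed
}
FRIENDS = {frozenset({"Tatham", "Tom"})}  # unordered pairs: direction is ignored


def suggestions(me):
    """Return (name, activities in common) for non-friends, most in common first."""
    counts = Counter()
    for other, theirs in LIKES.items():
        if other == me or frozenset({me, other}) in FRIENDS:
            continue
        shared = len(LIKES[me] & theirs)
        if shared:
            counts[other] = shared
    return counts.most_common()


print(suggestions("Tatham"))  # [('Chrissie', 2), ('Pat', 1)]
```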
Cypher can also mutate data. In the same way that SQL lets you do mutations (create, update, and delete), you can now do that in Cypher as well. What I will also do, though, is very briefly talk about Gremlin, which is another query language. The way Gremlin works is very imperative,
as opposed to Cypher, which we've been using, which is very declarative. If we look at our model here, what we were doing in Cypher was describing a part of the graph: find something that looks like this, then give us these relevant parts of it. With Gremlin, on the other hand, we very explicitly say, okay, start at this node, then follow out this relationship,
and then when you're here go out along this one, and then get this one. And it's really nicely named: they've got a nice little logo of a character who's a little gremlin, and you can imagine him running around the graph and fetching different parts of it. It's a fluent interface. So here's the way it works (let me magnify that):
I can say go and get me v zero. Now, why is it v? The wonderful thing about the graph world is that it's quite academic, and the terminology depends on who you stumble across. In some places we call them nodes and relationships; in the academic world they call them vertices and edges. Nodes are vertices, relationships are edges. Same thing.
So here I've got vertex zero. Then I can look for all of its outbound edges, so we can see those, and I can get the inbound vertices for them, meaning the vertices those edges point into. Then I can look at the outbound edges from those, where they go on to like things and be friends with people. And then I can filter that
to where it.label equals the friend label. Filter. So that gives me all the friend relationships. And then I can get the incoming vertices, so we've got them, and then I can map out the properties off the end with .map.
And I can see them coming out there. So it's a very imperative, step-by-step language which gives you a lot of power, but generally the world's moving towards Cypher. All right, so we're in a .NET world. We don't want to sit here in the equivalent of Management Studio all day writing our application.
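To show Gremlin's imperative, step-by-step style in code, here is a toy Gremlin-flavoured pipeline in Python. It is not the Gremlin API: the `Pipe` class, the graph layout and the data are invented for illustration, although the step names follow the ones used in the talk (`v`, `outE`, `inV`, `filter`, `map`). As in Gremlin, an edge's "out" vertex is where it starts and its "in" vertex is where it points. Each step turns the current list of vertices or edges into a new list and returns a new `Pipe`, which is what lets the calls chain fluently.

```python
class Pipe:
    """Toy traversal: each step maps the current items to new items."""

    def __init__(self, graph, items):
        self.graph = graph
        self.items = items

    def outE(self):
        # Edges leaving each current vertex.
        return Pipe(self.graph, [e for v in self.items
                                 for e in self.graph["edges"] if e["out"] == v])

    def inV(self):
        # The vertex each current edge points into.
        return Pipe(self.graph, [e["in"] for e in self.items])

    def filter(self, predicate):
        return Pipe(self.graph, [x for x in self.items if predicate(x)])

    def map(self):
        # Property maps of the current vertices.
        return [self.graph["vertices"][v] for v in self.items]


def v(graph, vertex_id):
    return Pipe(graph, [vertex_id])


graph = {
    "vertices": {0: {}, 1: {"name": "Tatham"}, 3: {"name": "Tom"}, 5: {"name": "sailing"}},
    "edges": [  # "out" = where the edge starts, "in" = where it points
        {"out": 0, "in": 1, "label": "HAS_USER"},
        {"out": 0, "in": 3, "label": "HAS_USER"},
        {"out": 1, "in": 5, "label": "LIKES"},
        {"out": 3, "in": 5, "label": "LIKES"},
        {"out": 3, "in": 1, "label": "FRIEND"},  # assumed direction
    ],
}

friends = (v(graph, 0).outE().inV()  # vertex 0 -> its users
           .outE()                    # everything the users point at
           .filter(lambda e: e["label"] == "FRIEND")
           .inV()                     # whoever they are friends with
           .map())
print(friends)  # [{'name': 'Tatham'}]
```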
So what we're going to do is just create a brand new project. Neo4j has a very nice REST API which gives you access to retrieve individual nodes, PUT and POST them, and all that sort of stuff, but also to send queries to it, saying here's my Cypher query, please run this for me. And what we've done over the last 12 months
of working on this production project is build out what we think is a really nice client library for querying it. And of course, in the .NET world, since we've released it, it's on NuGet. "File contains corrupted data." Awesome.
I tried to pull that from my local NuGet store, but we'll download it from the internet instead. This is an open source project, and we work on it quite actively because we have six, seven, eight developers in our team, and every time we need a feature we add it to this. So we've got almost 900 commits now
and 500 tests and all sorts of stuff. And every single time we add a feature, as long as our tests pass, our CI build automatically publishes it to NuGet. So if you use this library at all, you can sometimes be out of date within an hour, because we publish five or six times a day. Which is fun for bug reports.
People say it's broken, and it's like, which version? Oh no, you're two hours behind. All right, so what I need to do here is spin up a new GraphClient (graph, not giraffe), and we'll point that at our local data store.
That's the REST endpoint. One of the things we do need to do is explicitly connect the client, because there's a lot of metadata about how to talk to Neo4j that we don't get until we hit the root endpoint. That lets us see what version of Neo4j it is,
turn features on and off, that sort of stuff. Now, when I talk about the root node, that node zero, it's because we always need somewhere to start from. One of the things about node IDs in Neo4j, though, is that they can change, and we never want to persist them anywhere at all. From a data storage sense, the way they work is, say you've got node six:
when it wants to find the header for that node, it goes to a particular file and jumps in by the node number, six, times the fixed record size in bytes, and that's where it expects the node to be. So if you optimize your data store, things can move. That's why there's a special property we've got, .RootNode, that always tells you where it is. But what I want to do is start a Cypher query.
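A Python sketch of what connecting amounts to, under stated assumptions: the client reads the root REST resource once, remembers the Cypher endpoint and the server version, and asks the server where the reference node is rather than hard-coding an ID. The sample response is abbreviated and modelled on Neo4j 1.x's `/db/data/` document; the exact keys and the version string are assumptions, and `Connection` and `id_from_url` are hypothetical, not Neo4jClient's API. Nothing is sent over the network: `cypher_request` only builds the URL and the JSON body (`query` plus `params`) that would be POSTed.

```python
import json
from urllib.parse import urlparse

# Abbreviated, assumed root response; real responses contain more keys.
SAMPLE_ROOT = {
    "node": "http://localhost:7474/db/data/node",
    "reference_node": "http://localhost:7474/db/data/node/0",
    "cypher": "http://localhost:7474/db/data/cypher",
    "neo4j_version": "1.8",  # placeholder version string
}


def id_from_url(url):
    """Node URLs end in the numeric ID, e.g. .../db/data/node/0."""
    return int(urlparse(url).path.rstrip("/").rsplit("/", 1)[-1])


class Connection:
    """Hypothetical stand-in for what a client learns when it connects."""

    def __init__(self, root_response):
        self.version = root_response.get("neo4j_version")
        self.cypher_endpoint = root_response["cypher"]
        # Look the reference node up instead of persisting an ID that may move.
        self.root_node_id = id_from_url(root_response["reference_node"])

    def cypher_request(self, query, params=None):
        """Return the (url, JSON body) pair for POSTing a parameterised query."""
        return self.cypher_endpoint, json.dumps({"query": query, "params": params or {}})


conn = Connection(SAMPLE_ROOT)
url, body = conn.cypher_request("START root=node({id}) RETURN root",
                                {"id": conn.root_node_id})
print(url)   # http://localhost:7474/db/data/cypher
print(body)  # {"query": "START root=node({id}) RETURN root", "params": {"id": 0}}
```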
So I'm just gonna go client.Cypher, start the query with the identity root, and pass in client.RootNode. In Cypher I can also have multiple start points, so I can say start from this node and that node, and find the stuff that sits between the two of them.
Then what we've done here is, this is gonna be impossible to magnify, I think. I do this, this, there we go. The Start method returns an ICypherFluentQueryStarted. Every time you call one of our methods, we give you back a different interface that represents what's syntactically valid at that point in the query. So here you get a Match clause. Then I can write my match: what I'm looking for here is my own user, so I match through the has-user relationship to a user. But now I want to filter this, because I'm looking for a particular user.
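The idea of handing back a different interface after each call, so that only syntactically valid next steps are offered, can be sketched as a staged builder. This is a minimal Python illustration, not Neo4jClient's API: the class names, the `HAS_USER` relationship name and the Cypher 1.x-style `START ... MATCH ... WHERE ... RETURN` text are all assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class _Parts:
    """Clause text gathered so far; every stage copies it instead of mutating."""
    clauses: List[str] = field(default_factory=list)

    def plus(self, clause: str) -> "_Parts":
        return _Parts(self.clauses + [clause])


class QueryReturned:
    """Final stage: nothing can follow RETURN, so only the text is exposed."""
    def __init__(self, parts: _Parts) -> None:
        self._parts = parts

    @property
    def text(self) -> str:
        return "\n".join(self._parts.clauses)


class QueryFiltered:
    """After WHERE: only RETURN is offered."""
    def __init__(self, parts: _Parts) -> None:
        self._parts = parts

    def return_(self, identity: str) -> QueryReturned:
        return QueryReturned(self._parts.plus(f"RETURN {identity}"))


class QueryMatched(QueryFiltered):
    """After MATCH: WHERE or RETURN."""
    def where(self, predicate: str) -> QueryFiltered:
        return QueryFiltered(self._parts.plus(f"WHERE {predicate}"))


class QueryStarted(QueryMatched):
    """After START: MATCH, WHERE or RETURN."""
    def match(self, pattern: str) -> QueryMatched:
        return QueryMatched(self._parts.plus(f"MATCH {pattern}"))


def start(identity: str, node_id: int) -> QueryStarted:
    return QueryStarted(_Parts([f"START {identity}=node({node_id})"]))


# Hypothetical usage mirroring the demo: find my own user off the root node.
query = (
    start("root", 0)
    .match("root-[:HAS_USER]->user")
    .where("user.Name = 'Tatham'")
    .return_("user")
)
print(query.text)
```

Because `QueryFiltered` has no `match` method, a MATCH after a WHERE simply can't be written; in C# the compiler enforces the same rule through the interface types each method returns.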
I want to find my user, me. I could put a where clause in here as a string, but we don't want magic strings in our .NET code, otherwise there's no point having this nice client library. So first of all I'm gonna create the POCOs: whenever we've got a user node, it has a Name, which is just a public get/set string. An activity, for now, happens to look the same. Then I can say where the identity user here, of type User, has user.Name equal to "Tatham". Then I say I want to return that as a Node<User>, and I need to return the identity user.
So I'm gonna assign that to tathamQuery, and then call tathamQuery.Results.Single(), because I'm only expecting to find one node.
Let's just use the F5 key, which I'm normally not a fan of at all. There we go. So off the client, oops, we've connected to the data store and pulled back the root API response, so we know where all the endpoints we need to talk to are.
Then in this query here (Visual Studio font size settings: it'd be great if they applied to tooltips) you can see we've built out the Cypher query. We've taken the C# lambda, turned it into a Cypher predicate and formatted it out. Then when I invoke the Results property, it retrieves that node for us. If we jump into the Immediate window, tatham is now a Node<User>, and .Reference.Id tells me the node ID.
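Turning a lambda into a Cypher predicate means walking an expression tree. C# hands the client that tree directly; Python has no exact equivalent for lambdas, so this sketch parses a predicate written as Python source with the standard `ast` module instead. It is an illustration of the technique, not Neo4jClient's translator; the `{p0}` parameter placeholders (Cypher 1.x style) and the supported operators are assumptions.

```python
import ast
from typing import Any, Dict, Tuple

# Python comparison operators and their Cypher spelling.
_OPS = {ast.Eq: "=", ast.NotEq: "<>", ast.Lt: "<", ast.LtE: "<=",
        ast.Gt: ">", ast.GtE: ">="}


def to_cypher(source: str) -> Tuple[str, Dict[str, Any]]:
    """Translate a predicate such as "user.Name == 'Tatham'" into Cypher text
    plus a parameter dict, so literal values never get spliced into the query."""
    params: Dict[str, Any] = {}

    def visit(node: ast.AST) -> str:
        if isinstance(node, ast.BoolOp):
            joiner = " AND " if isinstance(node.op, ast.And) else " OR "
            return "(" + joiner.join(visit(v) for v in node.values) + ")"
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = _OPS.get(type(node.ops[0]))
            if op is not None:
                return f"{visit(node.left)} {op} {visit(node.comparators[0])}"
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            return f"{node.value.id}.{node.attr}"  # identity.Property
        if isinstance(node, ast.Constant):
            name = f"p{len(params)}"
            params[name] = node.value
            return "{" + name + "}"  # Cypher 1.x-style parameter placeholder
        raise NotImplementedError(f"unsupported expression: {ast.dump(node)}")

    return visit(ast.parse(source, mode="eval").body), params


text, params = to_cypher("user.Name == 'Tatham' and user.Age >= 30")
print(text)    # (user.Name = {p0} AND user.Age >= {p1})
print(params)  # {'p0': 'Tatham', 'p1': 30}
```

Sending values as parameters rather than inlining them avoids having to escape strings for the query language, which is exactly the kind of encoding problem that comes up later in the talk with Gremlin.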
And if I go .Data, I get the POCO we've deserialized into. So I get nice, standard property support, not dictionaries or anything ugly like that.
From here, I can always start another Cypher query: once you've got a node, you just keep querying off it and walking around the graph. So next I'll do the similar-people query, meaning people who are interested in similar things.
So I start a Cypher query off that node and call it me. Then I match me, likes, an activity, and then somebody else who likes the same activity, who we assume is a person. Oops.
For the where clause here, I can't easily supply a lambda predicate for the whole NOT-relationship part, so I just pass it in as a string. Basically we support string entry for everything and plumb it straight into the query, and where possible we try to create a nicer .NET wrapper for it.
Then I return all of the distinct people, and I can read person.Reference.Id and person.Data.Name.
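What that query asks Neo4j to do can be pictured as a two-hop walk: out along LIKES to each activity, then back along incoming LIKES to other people. A minimal in-memory sketch, assuming the graph is just a list of (person, activity) LIKES pairs; the names and data are hypothetical, and the only filter applied is excluding `me` itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_graph(likes: Iterable[Tuple[str, str]]) -> Tuple[Adjacency, Adjacency]:
    """Index (person)-[:LIKES]->(activity) relationships in both directions."""
    outgoing: Adjacency = defaultdict(set)  # person -> activities they like
    incoming: Adjacency = defaultdict(set)  # activity -> people who like it
    for person, activity in likes:
        outgoing[person].add(activity)
        incoming[activity].add(person)
    return outgoing, incoming


def similar_people(outgoing: Adjacency, incoming: Adjacency, me: str) -> List[str]:
    """Walk me -> LIKES -> activity <- LIKES <- person, excluding me."""
    found: Set[str] = set()  # a set gives DISTINCT for free
    for activity in outgoing.get(me, ()):
        for person in incoming.get(activity, ()):
            if person != me:
                found.add(person)
    return sorted(found)


outgoing, incoming = build_graph([
    ("tatham", "climbing"), ("tatham", "coffee"),
    ("alice", "coffee"), ("bob", "climbing"), ("bob", "coffee"),
    ("carol", "chess"),
])
print(similar_people(outgoing, incoming, "tatham"))  # ['alice', 'bob']
```

Only the nodes one and two hops from `me` are ever touched, which is why this kind of query stays cheap however many unrelated people are in the graph.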
So there in our console we've retrieved those two nodes: references to them, data, everything. So it's actually quite a nice query experience, I think. Biased, slightly biased. All right. At some point, though, we have to be able to modify data in our database.
The way the REST API works, you have to retrieve an entire node and then give the entire node back; you can't just update individual properties. With mutable Cypher you can, but we haven't quite got full support for that in yet. So what I could do is say a person now has an Age property, and then up here, once we've retrieved tatham, call client.Update. We give it the reference to that node, and it gives us a callback.
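The read-modify-write pattern behind that callback can be sketched in a few lines. This is a minimal Python illustration, not Neo4jClient's implementation: `InMemoryNodeStore` is a hypothetical stand-in for a REST endpoint that only accepts whole nodes, and the `User` shape follows the demo's Name and Age properties.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class InMemoryNodeStore(Generic[T]):
    """Stand-in for a REST endpoint that only supports whole-node reads and writes."""

    def __init__(self) -> None:
        self._nodes: Dict[int, T] = {}

    def get(self, node_id: int) -> T:
        return copy.deepcopy(self._nodes[node_id])  # like deserializing a response

    def put(self, node_id: int, data: T) -> None:
        self._nodes[node_id] = copy.deepcopy(data)  # replaces every property


def update(store: InMemoryNodeStore[T], node_id: int, mutate: Callable[[T], None]) -> None:
    node = store.get(node_id)  # 1. retrieve the entire node
    mutate(node)               # 2. let the caller change whatever it wants
    store.put(node_id, node)   # 3. send the entire node back


@dataclass
class User:
    name: str
    age: int = 0


store: InMemoryNodeStore[User] = InMemoryNodeStore()
store.put(1, User(name="Tatham"))
update(store, 1, lambda user: setattr(user, "age", 30))
print(store.get(1))  # User(name='Tatham', age=30)
```

Nothing here guards against two callers updating the same node at once: whichever writes last wins, which is one reason property-level updates through mutable Cypher are attractive.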
It retrieves that node and says, here's the node, now update whatever you want. So I put something in there to update that node. If I run that, go back to our data browser and look at node one, you'll see it's persisted that.
One thing about Neo4j is that it has a very simple node model. Nodes just hold properties and values, and you store primitives in them. That initially feels a little restrictive, but as you build out a graph, a lot of the power comes from keeping very small, simple nodes and walking the relationships between them. So you take a big, complex idea like a customer and break it up: you have a customer node, and you link off to address nodes and things like that. Querying that is incredibly efficient, because you walk from one node to the next, which it's very optimized to do, and you only walk the paths you actually need and retrieve the properties you actually need. So you spend a lot less time managing things like clustered indexes on join columns, which is kind of nice.
The problem, though, is that we've started to load all this data into our graph. It's all nice and NoSQL, and we can see a little of what's going on in our data browser, but I don't have many nodes in here and it's already getting very messy. So how do we actually know what's in the database? We could just make stuff up. This was one of the problems we ran into fairly early on in our project. So one of the things we did, if I go into this one: there's a really nice graph visualization tool called Graphviz, with a language called DOT for describing graphs. So in source control, in the root of our project, we started writing up what we called the reference graph.
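The team wrote their reference graph in DOT by hand. For readers who haven't seen DOT, this hypothetical Python helper emits the same kind of file from (from, relationship, to) triples; the node and relationship names are illustrative only. Graphviz can then render it with `dot -Tsvg reference.dot -o reference.svg`.

```python
from typing import Iterable, Tuple


def to_dot(edges: Iterable[Tuple[str, str, str]], name: str = "ReferenceGraph") -> str:
    """Render (from, relationship, to) triples as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for source, relationship, target in edges:
        lines.append(f'    "{source}" -> "{target}" [label="{relationship}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical reference graph based on the demo's user/activity model.
reference = [
    ("Root", "HAS_USER", "User"),
    ("User", "LIKES", "Activity"),
]
print(to_dot(reference))
```

Keeping the description as plain text means it diffs and reviews like any other source file, which is what lets it live next to the code and stay in step with the POCOs.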
Here we describe an example case of what we would put in our graph, and the nice thing about Graphviz is that it can render this for us. I've got a render-documentation .ps1 PowerShell script, and now when I open the reference graph .svg, I get a nice SVG diagram (as soon as I can zoom, there we go) of an example of what's in my graph. Not what's actually in there, but a source-controlled description of it. Whenever we update any of our models, POCOs, relationships or anything else, we keep this up to date. What we're working on at the moment is a script that can compare this with a real data store, so it documents our schema without the schema being enforced at runtime.
That's a very, very simple project. On the project I've been on for the last 12 months, our reference graph looks like this. You'll see it's, oops, the whole touch thing isn't working at this resolution. You'll see it's an incredibly complex model with lots of really, really highly connected data; there are lines going everywhere. The scenario we're modeling here is a social care system. The police turn up at a house, Mum's overdosed on drugs, there are three kids in the house, and the police pick them up; they have to go into emergency care and then into foster care. So there are lots of relationships we need to trace between people, with the incredibly complex family structures in that socio-economic space. The whole question of who your father is gets very, very complicated. We also have a very tight relationship between forms and how we manage them, and the process around going, okay, they're involved in this foster care program, this is their carer, and that carer has done this authorization for this person, and all that type of stuff. So it's very much about walking around the nodes relating to an individual and querying them very efficiently, and we don't do a lot in aggregate. That's where a graph database made a lot of sense for us: we could grow this out very organically.
What's also nice about this reference graph is that one of our CI builds runs every time we push: if you touch the documentation folder at all, it rebuilds the SVG and drops it on an HTTP location that's referenced from our wiki. So every time we push an update, it's on the wiki. It's similar to that diagram I had where the product owner had drawn everything out. It's actually quite funny: I sent him a file called "what connects to what", and he sent back something called "what connects_color spaghetti". This is a very similar diagram, albeit generated from code. And what we've found is that, not so much business users, but some of the BAs we work with can understand this graph very easily. They'll open the reference graph and go, oh, that's really interesting that we've got that data so close together there. Can we show that on the screen? Can we query that? Can we do this? They get a really easy sense of how our data is actually structured, which is nice.
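The talk mentions a script, still in progress, that would compare the reference graph with a real data store. One hypothetical way to do that is to reduce both sides to (from type, relationship, to type) triples and diff the sets. Neo4j itself can't tell a person node from an activity node, so this sketch assumes each node carries an application-level type value; the function names and data are illustrative, not the team's script.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (from node type, relationship type, to node type)


def observed_triples(relationships: Iterable[Tuple[int, str, int]],
                     node_types: Dict[int, str]) -> Set[Triple]:
    """Collapse concrete relationships (start id, type, end id) into schema triples.
    node_types is assumed to come from a 'type' property stored on each node."""
    return {(node_types[s], rel, node_types[e]) for s, rel, e in relationships}


def compare_schema(reference: Set[Triple], observed: Set[Triple]) -> Dict[str, List[Triple]]:
    return {
        "undocumented": sorted(observed - reference),  # in the data, missing from the reference graph
        "unused": sorted(reference - observed),        # documented, but never seen in the data
    }


reference = {("Root", "HAS_USER", "User"), ("User", "LIKES", "Activity")}
observed = observed_triples(
    [(0, "HAS_USER", 1), (1, "FOLLOWS", 2)],
    {0: "Root", 1: "User", 2: "User"},
)
print(compare_schema(reference, observed))
```

Either list being non-empty is a prompt to update the documentation or to investigate the data, which keeps the schema descriptive rather than enforced.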
So how do we deploy this? You saw how easy it was for me to run it up: I just unzipped something, double-clicked the batch file, and off we went. That's obviously not always going to be the case; you don't always want to just run it as a console process.
So here's one of the projects I've got at the moment. What I'm gonna run is a PowerShell script, setup development environment, which I need to run elevated.
Some good little keyboard gymnastics there to launch that process: Control-Shift-Windows-2. They're all very close. What this script does is install Neo4j as a service. If we look at some of what's gone past, you'll see it checks that Java 1.7 is installed.
We check that based on the Java registration. If it isn't installed, the script downloads it from Oracle, runs the MSI silently, and continues on to the next step. Then we download Neo4j from their distribution site, dist.neo4j.org.
So it says Neo4j is not installed for this project; the install is already cached because I've already downloaded it to disk. We unzip it. Then, in the web.config for this particular project, I say which port I want Neo4j to run on. For each of the different sites we run up, like our staging, production and development instances, the way you segregate the databases is to run multiple complete instances of Neo4j. It's not like SQL Server, with one instance and multiple databases. So I just say in the web.config that it should be on port 8000, and the script patches that into the Neo4j config files: here's your data path, here's the port number you need to be on.
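The patching step can be sketched as a small rewrite of a Java-style `.properties` file. This is a minimal Python illustration of the idea, not the team's PowerShell script; the two key names are assumed from the Neo4j 1.x `conf/neo4j-server.properties` file and should be checked against the version you install. The `Neo4j-<port>` service name follows the convention described in the talk.

```python
from typing import Dict, Tuple


def patch_properties(text: str, overrides: Dict[str, str]) -> str:
    """Rewrite key=value lines of a Java-style .properties file.
    Comments are left alone; keys not present yet are appended at the end."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key}={remaining.pop(key)}")
                continue
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


def instance_settings(port: int, data_path: str) -> Tuple[str, Dict[str, str]]:
    """Per-site service name and config overrides for one Neo4j instance."""
    service_name = f"Neo4j-{port}"  # unique per instance, as in the talk
    overrides = {
        # Key names assumed from the Neo4j 1.x conf/neo4j-server.properties file.
        "org.neo4j.server.webserver.port": str(port),
        "org.neo4j.server.database.location": data_path,
    }
    return service_name, overrides
```

Deriving everything from the one port number set in web.config is what keeps several complete instances (staging, production, development) from colliding on the same box.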
The service name should be Neo4j-8000, so it's unique. We create the service and start it up. Now, totally in the background, if I go to localhost:8000, I've got another Neo4j instance running that a website can talk to. So while that script is called setup development environment, it's just wrapping what we actually use in our production deployment to install Neo4j and run it up on the box. At that point it doesn't matter that any of it is built in Java; you just query it via REST. In the Azure environment, though, this is a bit more complex. How many people have worked with Azure? Okay, a couple.
Azure is obviously fairly nicely set up in the .NET space. We chose it for this project because it gave us a very structured deployment package: as long as we could build that package and hand it off to Azure, they would run it up for us. The organization we're working with is a very small not-for-profit without a lot of IT capability, so we needed to be able to go, oh look, the website's down, redeploy it, and just have it run.
The way that works is that in Azure you have multiple roles. We run up a web role, which is where our web application runs, and then a worker role. The worker role basically invokes a DLL entry point, and then you can do whatever you want on that box. In the Azure config, we describe that we need an endpoint exposed here, and we just say it needs to be an HTTP endpoint. Then in the Azure world we've got blob storage, which is kind of like S3: blobs. Each of the web and worker roles here is an actual VM running a Windows install. We store the JRE and Neo4j itself in blob storage, and when the worker role runs up (this is very similar to the PowerShell script) we download the JRE and install it on the box, download Neo4j and unzip it on the box, ask Azure for the network configuration of where our endpoint should be, and patch that into the config files, because that's the endpoint they open in the firewall. But then we need somewhere to keep our persistent data, because the web and worker roles get completely destroyed and recreated all the time.
So we use another Azure feature called Cloud Drive: in our blob storage we have Neo4j.vhd, which is a virtual hard disk file, the same kind you'd use with Virtual PC or something like that locally. We mount this from blob storage onto the box as an XDrive, and the XDrive is where we store all of our persistent data. We point the Neo4j data store at it, and that gives us persistent storage. Then we keep a .NET process around that monitors whether Neo4j is still running. If that monitoring fails at all, we crash the .NET process, which lets Azure detect that the worker role has failed, and Azure redeploys and rebuilds that worker role for us. That gives us a nice model.
The other thing we can do is scale Neo4j out across multiple nodes. Each needs its own persistent data store, and they synchronize data between them. The way we do that is to have multiple VHDs in here and run up one worker role instance per server node we want in the cluster to host our graph, pointing each of them at an independent VHD. And it all works, just. It felt like a lot of sticky tape, string and glue, but it did all plug together. Then on the front end we have an ASP.NET MVC application.
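The monitoring process described above, which deliberately crashes so that Azure recycles the worker role, can be sketched as a polling loop. This is a minimal Python illustration, not the team's .NET code: the health check assumes the Neo4j 1.x REST root at `/db/data/` on the default port 7474 (the talk's instances use per-site ports such as 8000), and `max_failures` defaults to 1 to match "if that monitoring fails at all".

```python
import time
import urllib.request
from typing import Callable


def neo4j_responds(url: str = "http://localhost:7474/db/data/", timeout: float = 5.0) -> bool:
    """True if the REST root answers with HTTP 200; network and HTTP errors mean down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError, timeouts and refused connections
        return False


def watch(is_healthy: Callable[[], bool], interval_seconds: float = 10.0,
          max_failures: int = 1, sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll until `max_failures` consecutive checks fail, then exit non-zero so
    the host (an Azure worker role in the talk) sees the role as failed."""
    failures = 0
    while True:
        failures = 0 if is_healthy() else failures + 1
        if failures >= max_failures:
            raise SystemExit(1)
        sleep(interval_seconds)


# A worker role entry point would call something like: watch(neo4j_responds)
```

Failing loudly and letting the platform rebuild the whole role is simpler than trying to restart Java in place, since the role reinstalls the JRE and Neo4j and remounts the drive from scratch anyway.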
To get instrumentation all the way through, we also used a monitoring product called New Relic, which is like a runtime profiler. What's really nice about New Relic is that it supports both the .NET CLR and the Java JVM. So we install the New Relic monitoring agent into Neo4j here, and we wrap our MVC application in it as well. That gives us end-to-end traceability of queries, because it puts a correlation ID on the underlying requests. So we can see that we ran this controller here in .NET, with this action, which made this HTTP call with a correlation header on it, which hit the embedded server in Neo4j, which launched this Java class, which ran this query. So we can see really specifically which queries were impacting our performance, which was kind of cool.
Right, so that's the deployment story. One thing I mentioned was that node IDs can change and shift on you, so you can't use them as identifiers at all. But we needed to be able to expose things in URLs. In a lot of cases, say you're looking at a user profile, you can use a username or something like that. That works well enough, but you don't always have a semantic identifier like that. For things like clients we needed a number, because, A, they don't all have unique names, and B, for a lot of the people we're dealing with, we don't even know their name to start with. It's the kid with the red beanie who's always at the skate park. So we needed a way to generate these unique identifiers. Neo4j doesn't have a way of doing this out of the box, and it also doesn't have the sense of schema needed to understand different scopes, like, that's a client and that's a program, and we also wanted to do things like referrals, where a referral obviously needs its own unique number.
So another thing we built out of this project was a library called SnowMaker, whose whole job is to generate unique IDs. It's a distributed, so cloud-capable, guaranteed-unique ID generator, which generates longs rather than GUIDs, in a mostly sequential and mostly dense fashion. So time-wise, it should be mostly one, two, three, four; sometimes you might skip six and get seven, eight; and if you sort exactly by time across multiple nodes, some of them might be interleaved.
So you might get 10, 11, but by the time you get to 20 and 21, it all mostly works out. It's also scope-based, so we can have really targeted scopes: something like client, referrals, one is referral number one for that client, not referral number one in the whole system. The way that works, if I install another NuGet package (this is also something we publish as open source), there we go, is that we have a UniqueIdGenerator.
Some IntelliSense would be awesome. No, we're not gonna get that. There we go.
The UniqueIdGenerator uses an optimistic store of some form. There's a file-based one, which is really only for when you've got one machine, so it's great for dev and that's about it. Or there's another store that uses Azure Blob Storage as the synchronization mechanism. Then off the generator I can say, give me the next ID in a particular scope, say the next ID for a client, and that returns a long.
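SnowMaker's internals aren't covered in the talk. A common way to get mostly sequential, mostly dense IDs from an optimistic store is for each generator to reserve a batch of IDs with a compare-and-swap write, then hand them out from memory; IDs left unused in a batch become the gaps. This Python sketch assumes that design. The class and method names are hypothetical rather than SnowMaker's API, and the in-memory store stands in for the file or Azure Blob Storage stores.

```python
import threading
from typing import Dict, Tuple


class InMemoryOptimisticStore:
    """Holds the next free id per scope; a write succeeds only if the caller
    saw the latest version (the role an ETag plays with blob storage)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}  # scope -> (next free id, version)
        self._lock = threading.Lock()

    def read(self, scope: str) -> Tuple[int, int]:
        with self._lock:
            return self._data.get(scope, (1, 0))  # ids start at 1

    def try_write(self, scope: str, value: int, expected_version: int) -> bool:
        with self._lock:
            _, version = self._data.get(scope, (1, 0))
            if version != expected_version:
                return False  # another generator reserved a batch first
            self._data[scope] = (value, version + 1)
            return True


class UniqueIdGenerator:
    def __init__(self, store: InMemoryOptimisticStore, batch_size: int = 100,
                 max_attempts: int = 25) -> None:
        self._store = store
        self._batch_size = batch_size
        self._max_attempts = max_attempts
        self._batches: Dict[str, Tuple[int, int]] = {}  # scope -> (next, end exclusive)
        self._lock = threading.Lock()

    def next_id(self, scope: str) -> int:
        with self._lock:
            nxt, end = self._batches.get(scope, (0, 0))
            if nxt >= end:  # batch used up (or none yet): reserve another
                nxt, end = self._reserve_batch(scope)
            self._batches[scope] = (nxt + 1, end)
            return nxt

    def _reserve_batch(self, scope: str) -> Tuple[int, int]:
        for _ in range(self._max_attempts):
            start, version = self._store.read(scope)
            end = start + self._batch_size
            if self._store.try_write(scope, end, version):
                return start, end
        raise RuntimeError(f"could not reserve ids for scope {scope!r}")


store = InMemoryOptimisticStore()
web_a, web_b = UniqueIdGenerator(store, batch_size=3), UniqueIdGenerator(store, batch_size=3)
print([web_a.next_id("clients"), web_b.next_id("clients"), web_a.next_id("clients")])  # [1, 4, 2]
```

Two generators sharing one store never hand out the same ID, but their IDs interleave by batch, which matches the "mostly sequential" behavior described in the talk.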
Now imagine we had client one and we want to generate a referral for them. We can just make "client one, referrals" the scope name, and it starts at one and counts up again from there. That worked really nicely for us, and the reason I'm talking about this in a graph talk is that it let us do things like /client/referral/1. That makes a lot of sense in a highly connected space: we can have lots of nodes that all have unique ID 1 persisted on them, because we find them by querying for where they sit in the graph. We only need to identify them uniquely within that scope.
Doing that in the SQL world, you'd be talking about composite keys and stuff like that, and as soon as you say that, most SQL people cringe. Then, just to wrap that up, we'd have a public static class IdScopes with something like Referrals, which takes a long client ID and does the formatting. In here we'd just call IdScopes.Referrals and pass in the client ID, so the scope names are always formatted consistently.
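A Python equivalent of that `IdScopes` helper might look like this. The scope-name format is hypothetical (the talk doesn't show it), and the toy `next_id` counter only stands in for a real generator so the per-client numbering is visible.

```python
from collections import Counter


class IdScopes:
    """Single place that formats scope names (the format itself is hypothetical)."""

    CLIENTS = "clients"

    @staticmethod
    def referrals(client_id: int) -> str:
        return f"client-{client_id}-referrals"


counters: Counter = Counter()


def next_id(scope: str) -> int:
    """Toy single-process counter standing in for the distributed generator."""
    counters[scope] += 1
    return counters[scope]


print(next_id(IdScopes.referrals(1)), next_id(IdScopes.referrals(1)), next_id(IdScopes.referrals(2)))  # 1 2 1
```

Routing every caller through one formatter means a typo in a scope name can't silently start a second, overlapping sequence of referral numbers.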
All right, so let's say we now want to give all of the users we've already got in our system unique IDs. We need some form of migrations, to evolve the schema over time. So what I'll do is open up...
We wait for the preview release of the operating system to wake up. Let's do it in here.
Okay, my Alt-Tab isn't showing anything and I can't click the icon. This is awesome. There's two. And this is where we change machines.
People ask me why I carry two laptops. So, migrations: we always wanted to evolve our schema, very much like you'd evolve a SQL schema over time. Get this projecting. There we go.
What we did in our application (this is a smaller app I'm just starting to work on) is have a migration service that runs on startup. We tried as much as possible to avoid what you'd generally do in the SQL space, where you check which version the database is up to and which changes have been applied, and then roll forward the new changes from there. Because we're in this NoSQL world with a very soft schema, we also wanted to make it almost self-healing. So, where we can, we write our data migrations to look for something: if we find a particular pattern, or don't find it, we go and create it. So here I check that particular super nodes exist. Or, we introduced a date-created property after we'd already created a bunch of users, so we just say, show me all of the users that don't have a date-created property, and set it on them. This code could be improved a lot, but we don't have many migrations yet, so it's very procedural. You can see there's a Gremlin query there following the out relationships. As we move towards mutable Cypher, we'll be able to do these as bulk updates too, sending a single query to the server rather than having to foreach through things like we do at the moment.
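The self-healing style of migration boils down to "find what's missing, fix only that", so running it again does nothing. A minimal in-memory Python sketch of the two examples mentioned (missing super nodes and a missing date-created property); the node shapes and super-node names are hypothetical, and the real service queried Neo4j with Gremlin rather than looping over dictionaries.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

Node = Dict[str, object]


def ensure_super_nodes(nodes: Dict[str, Node], names: List[str]) -> int:
    """Create any well-known 'super' nodes that are missing; return how many."""
    created = 0
    for name in names:
        if name not in nodes:
            nodes[name] = {"name": name}
            created += 1
    return created


def backfill_date_created(users: List[Node], now: Callable[[], datetime]) -> int:
    """Give every user without a date_created property one; return how many changed."""
    changed = 0
    for user in users:
        if "date_created" not in user:
            user["date_created"] = now().isoformat()
            changed += 1
    return changed


users: List[Node] = [{"name": "a"}, {"name": "b", "date_created": "2012-01-01T00:00:00+00:00"}]
print(backfill_date_created(users, lambda: datetime.now(timezone.utc)))  # 1
print(backfill_date_created(users, lambda: datetime.now(timezone.utc)))  # 0
```

Because each step checks before it writes, the service can run on every startup without tracking which steps have already been applied.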
Then, specific to our Neo4j client: you can't always write a predicate for something, say a property not existing and therefore being null, when we don't want to expose it as a nullable property on the POCO. If we introduce a unique ID, we don't want that to be a nullable long. So in the where clauses we also let you build up filters and things like that directly, to support those migrations. That way you can find patterns that aren't even valid in the way you represent your .NET code, and upgrade them. Now, that doesn't always work, for example where we had to clean up bad data that was very hard for us to detect otherwise.
So something else we did, in that idea of treating the root node as the system (let's see if I can bring this up), was to track DB versions in the DB as well.
I love how it gets freaked out about blocked content in the SVG file. Here we go. So off the root node, we say it has versions and a version history, and we write out which version numbers we've applied to the database.
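For migrations that can't detect their own work, recording applied versions in the database itself works as sketched below. This is a minimal Python illustration under the assumption that the version history hanging off the root node can be modeled as a list on a `root` dictionary; the names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Migration = Tuple[int, Callable[[], None]]


def apply_pending(root: Dict[str, List[int]], migrations: List[Migration]) -> List[int]:
    """Run, in version order, every migration not yet recorded on the root node.
    root["versions"] stands in for the version-history nodes off the root."""
    history = root.setdefault("versions", [])
    applied: List[int] = []
    for version, run in sorted(migrations, key=lambda m: m[0]):
        if version in history:
            continue
        run()
        history.append(version)  # only recorded once run() returned without raising
        applied.append(version)
    return applied


log: List[int] = []
root: Dict[str, List[int]] = {}
steps = [(2, lambda: log.append(2)), (1, lambda: log.append(1))]
print(apply_pending(root, steps), apply_pending(root, steps))  # [1, 2] []
```

Storing the history inside the graph means a restored backup carries its own record of which migrations it has already seen.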
This particular file uses Graphviz's auto layout, which isn't always great, but it's generally pretty good.
Another thing that was quite nice about a graph database over a relational database: one of our requirements was that we had to be a fully multi-tenant system. We're a hosted software-as-a-service platform for these care providers, and they needed to be able to just sign up and have their own silo of data. You obviously don't want details about children in complex family scenarios leaking at all.
We needed to be really, really sure we weren't going to leak data from one agency into another. In a traditional relational database, we basically would have put an agency ID column on every single table and made sure all of our queries filtered by it, or we would have had to shard the databases and maintain multiple databases, which makes the setup story a bit harder. One of the nice things about using a graph database is that off the root node we have an individual node for each agency, and all of that agency's data hangs off its node in the graph. We also have some shared nodes for things like reference data, such as a list of countries.
So we'll have a reference data node, then 180 nodes for different countries, and we relate things to those countries in various ways. First of all, unless our queries accidentally walk via reference data (and we have things to protect against that), once they start from the root node and go out, it's very hard for them to get into another agency, even though it's in the same database, because you've walked off into this part of the graph and you can't easily get back over there. And to make sure we haven't created any relationships between agencies, we've got a single Cypher query where we say something like: start at the root node, which is node zero, and go from it via has-agency to an agency, and then through any type of relationship, any number of times, to another agency. You can wrap that in a shortest-path function, and Neo4j will find the shortest path between those two agencies that doesn't go via the root node.
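The same safety check can be expressed as a breadth-first search that refuses to step onto the root node or shared reference-data nodes, reporting any path it still finds between two agencies. This is a minimal in-memory Python sketch of the idea, not the team's Cypher query; the node names are hypothetical and relationship direction is ignored.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]


def add_edge(graph: Graph, a: str, b: str) -> None:
    # Direction is ignored, like a Cypher pattern without arrowheads.
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def shortest_path(graph: Graph, start: str, goal: str, excluded: Set[str]) -> Optional[List[str]]:
    """Breadth-first search that never steps onto an excluded node."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous and neighbour not in excluded:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def leaks(graph: Graph, agencies: Iterable[str], excluded: Set[str]) -> List[List[str]]:
    """Every pair of agencies still connected once `excluded` nodes are removed."""
    found = []
    for a, b in combinations(sorted(agencies), 2):
        path = shortest_path(graph, a, b, excluded)
        if path is not None:
            found.append(path)
    return found


g: Graph = {}
for a, b in [("root", "agency1"), ("root", "agency2"), ("agency1", "client1"),
             ("agency2", "client2"), ("client1", "NZ"), ("client2", "NZ")]:
    add_edge(g, a, b)
print(leaks(g, ["agency1", "agency2"], {"root", "NZ"}))  # []
```

An empty result is the pass condition; any returned path shows exactly which relationships bridge two tenants, which is the starting point for diagnosing the problem.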
If that turns up a non-null value, we go, hold on, there's a relationship between these two agencies; the warning sirens go off on the roof and we diagnose what the problem is. We run that as part of our test cases in all the different environments as we restore data into them. It's a really cool thing to be able to query at the data-structure level, rather than trying to prove that our application code was safe and always filtered on the agency ID column or something like that. So that's multi-tenant safety. The last part to talk about is indexing. I've talked a lot about walking around the graph.
In some cases, though, you do want to perform aggregate operations, like: we've got every single client, there are 100,000 of them, find me the one named Bob. Neo4j has an indexing platform built in, and the provider plugged in by default is Lucene. So when you create a node, first of all, you can have auto-indexing. But you can also say, here's the node I want to create, and I want these key/value pairs in these indexes as well. For tenant safety, we maintain an independent index for each tenant in the system, so we'll have something like agency6-clients as an index.
It is a bit manual at the moment: you have to say, here's my node, and here are the keys and values I want you to put in the index. But we actually found that quite advantageous, because in our node model we've got a concept of also-known-as: James is also known as Jimmy, and Jimmy the Kid, and whatever. That would be one node for James and then also-known-as nodes for all the others, because that's how we store the multiple values. But in the index, whilst we'd create a node called James, when we put it in the index we could say name is James, name is Jimmy, and name is Jimmy the Kid.
And they all point to node one, right? So any match when searching that index maps back to the client, even though the reason we matched them is an also-known-as stored in another node. So you get that level of indirection in the indexes as well, which we found really powerful for representing our domain model.
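The combination of per-tenant indexes and several index entries pointing at one node can be sketched with a toy exact-match index. This is a Python illustration only: `ExactIndex` is a hypothetical stand-in for a Lucene-backed Neo4j index (Lucene adds tokenizing, wildcards and fuzzy matching), and the case-insensitive matching is a design choice of the sketch.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple


class ExactIndex:
    """Toy stand-in for a Neo4j index: (key, value) -> node ids,
    matching whole values case-insensitively."""

    def __init__(self) -> None:
        self._entries: DefaultDict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, node_id: int, key: str, value: str) -> None:
        self._entries[(key, value.lower())].add(node_id)

    def get(self, key: str, value: str) -> Set[int]:
        return set(self._entries.get((key, value.lower()), ()))


# One index per tenant, named like the talk's "agency6-clients".
indexes: Dict[str, ExactIndex] = defaultdict(ExactIndex)


def index_client(agency_id: int, node_id: int, name: str, aliases: Iterable[str]) -> None:
    """Index the client's own name and every also-known-as against the same node."""
    index = indexes[f"agency{agency_id}-clients"]
    for value in [name, *aliases]:
        index.add(node_id, "name", value)


index_client(6, 1, "James", ["Jimmy", "Jimmy the Kid"])
print(indexes["agency6-clients"].get("name", "jimmy the kid"))  # {1}
print(indexes["agency7-clients"].get("name", "Jimmy"))          # set()
```

A search in one agency's index can't return another agency's clients, because those clients were never written into it.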
So that's what I've got for today. What I wanted to show you is that Neo4j is practical in a .NET world. I hate the fact that it has a J on the end of the name, and I keep telling them this, because it scared us off to start with: oh, it's a Java thing that's going to suck to integrate with.
But it's actually becoming a really, really nice platform to use from .NET. In the same way that you don't care what SQL Server is written in, it's just an implementation detail. So it works for us. We've released a lot of open source, in SnowMaker and also the Neo4j client, and that's starting to get a good amount of traction, which shows me there are a lot of developers picking things up. We get a pull request for the Neo4j client about once a week at the moment, and we publish about 10 builds a week. So there's a lot of activity in that space, not just from us, which is awesome. There are the two links to it. Any questions? Yes?
Yep, the query to protect the agencies: how long does that take to run? At the moment we've got three agencies with a couple of hundred thousand nodes in each of them. We run that query, and I haven't instrumented it, but it's somewhere around a minute, or maybe 30 seconds. That's because we use a shortest-path algorithm, and there's a whole bunch of graph algorithms like that in Neo4j. Normally you wouldn't walk every single node in your database like that, but it's very good at chopping out bits that aren't relevant, recognizing sub-graphs, multi-threading that, and all that sort of stuff. There are lots of different algorithms in there. It's really efficient for things like transport routing: say you have lots of different bus stops, each of which is a node, and relationships between them that describe the bus routes. On the relationships you can put data about how long that bus trip takes, and then you can ask Neo4j to find the shortest path between this node and that node, weighted on that property. It just has all that magic in the box.
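The talk doesn't say which algorithm Neo4j uses for weighted paths. Dijkstra's algorithm is the textbook technique for shortest paths with non-negative weights, and this plain-Python sketch applies it to the bus-route example; the stops and trip times are made up, and it illustrates the idea rather than Neo4j's implementation.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# stop -> list of (next stop, minutes); each entry is one directed bus leg
Routes = Dict[str, List[Tuple[str, float]]]


def fastest_route(routes: Routes, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm, weighted on the minutes stored on each leg."""
    best: Dict[str, float] = {start: 0.0}
    previous: Dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        cost, stop = heapq.heappop(heap)
        if stop == goal:
            path = [stop]
            while path[-1] != start:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        if cost > best.get(stop, float("inf")):
            continue  # stale entry: a cheaper way to this stop was already found
        for nxt, minutes in routes.get(stop, []):
            new_cost = cost + minutes
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = stop
                heapq.heappush(heap, (new_cost, nxt))
    return None  # goal not reachable


routes: Routes = {"A": [("B", 10), ("C", 3)], "C": [("B", 4)], "B": [("D", 2)]}
print(fastest_route(routes, "A", "D"))  # (9.0, ['A', 'C', 'B', 'D'])
```

The direct A to B leg looks shorter in hops, but weighting on trip time picks the detour through C, which is exactly the kind of question a property-weighted shortest-path query answers.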
I don't even know how those algorithms run so quickly. They just do, which is good. I don't really want to study path traversal algorithms too much; it's a little bit scary. Any other questions? Yep, sure.
So the question is: is there any security in Neo4j? At the moment, no, there's nothing. It's all at the network level, basically: if you can connect to it, you've got everything, and the idea is that you do all your security in your application. It's very hard to apply a lot of those concepts to a graph model, because as far as Neo4j is concerned, in that first example, it can't tell what's a person node and what's an activity node and those types of things. So you can't really apply the same concepts you could in SQL, like granting read access to that table or that category of data; it's got no way of knowing what the data is. So in that sense, because it's so much more complex, there hasn't really been a lot of push for them to implement something around that. One issue they are having at the moment, though: Neo4j is available as an add-on on Heroku, so you can just add it to your application. Gremlin queries are built on top of Groovy, which is a dynamic scripting layer on top of Java.
So you can just run arbitrary Groovy script. They're having to segment all that off at the moment, and they're trying to deprecate Groovy because of the security risks around it and move to the more declarative Cypher model. It got a little bit scary when we were building our client and initially supporting Gremlin, because we were translating the predicates: taking a C# lambda expression and translating it into a Gremlin predicate, and Gremlin is Groovy script, which runs on Java. I actually ended up writing a method once called EncodeStringAsJavaString. I was like, am I doing something wrong here? So we've hidden all that away and buried it, and Cypher's a lot nicer now as well.
It's a much more descriptive language. Okay. Thank you. I'll be hanging around for any questions.