
Weaviate OSS Smart Graph


Formal Metadata

Title: Weaviate OSS Smart Graph
Subtitle: feature updates, demo and use cases
Number of Parts: 490
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Weaviate is an open-source smart graph that aims to allow anyone, anywhere, at any time to create their own semantic search engines, knowledge graphs or knowledge networks. Weaviate is RESTful- and GraphQL-API based, and built on top of a semantic vector storage mechanism called the contextionary. Because all data is stored in the vector space, Weaviate is ideal for:
- semantically searching through the knowledge graph;
- automatically classifying entities in the graph;
- creating easy-to-use knowledge mappings.

Because the use of formal ontologies is optional, Weaviate can be used to create a P2P knowledge network, which we want to present during this conference. This is a follow-up after the initial design was shared during last year's FOSDEM.

Problem: Creating a knowledge graph can be a complex endeavor, let alone the integration of semantic search models. Bain & Company research among US enterprise CTOs shows that 59% of them believe they lack the capabilities to generate meaningful business insights from their data, and 85% said it would require substantial investments to improve their data platforms.

Solution: Weaviate aims to let anyone create large, enterprise-scale knowledge graphs as straightforwardly as possible. Weaviate's feature set allows anyone to:
- semantically search through the knowledge graph;
- automatically classify entities in the knowledge graph;
- create easy-to-use knowledge mappings.

Weaviate's Contextionary: Weaviate's contextionary is the semantic vector storage mechanism that, unlike traditional storage mechanisms, stores data based on its semantic meaning. For example, if someone stores information about a company with the name Apple, this data object will be found closely related to concepts like the iPhone. Because the pre-trained machine learning model is used algorithmically (as opposed to being retrained), Weaviate is able to learn new concepts fast and in near-realtime. This allows the user to update and manipulate the knowledge graph directly.

Demo & Use Cases: During the session we want to show a few recent use cases to demo how Weaviate can be used. The demo will include querying; semantic querying; adding concepts; going from an ontology to a schema; and more.

Knowledge Network: Because of Weaviate's contextionary, a formal ontology is optional (e.g., "a company with the name Netflix" is semantically similar to "a business with the identifier Netflix Inc."). This allows multiple Weaviates to connect and communicate over a peer-to-peer (P2P) network to exchange knowledge: the knowledge network. During the session we want to demonstrate the first prototype of this network.
Transcript (English, auto-generated)
Yes, thank you so much, and thank you for having me. It's indeed the last talk of the day; it's nice outside and I saw a lot of people already drinking beers, so I'm glad there are still people in the room. Thank you for listening. Last year my colleague was here and he presented Weaviate for the first time, and what we're trying to do, and we've done a lot in the past year. So today I want to show you an update and way more about Weaviate, because a lot of things have changed, including things we changed direction on. So, quickly, the agenda: first, I want to talk about what Weaviate is, because we're new on the scene. Secondly, I want to talk about what has changed since last year's presentation here. Then I want to talk a little bit about the technology, and last but not least, of course, I want to give a demo and show it in action. All right, so that's what I'm going to do.
So, first, about Weaviate. Weaviate is an open-source smart graph. What we mean with that: the open-source part is the usual suspects; the source lives on GitHub and you can run it using Docker, Docker Compose and Kubernetes, we have all of that out of the box for you. It is smart because of something we call the contextionary, and if you have never heard about the contextionary, that's fine, because we invented it, and I'm about to explain what it is and what it does. And I'm going to use one buzzword, sorry for that: sometimes people talk about AI-first architectures, the idea of building systems that have these machine learning models built into them, so that we can build new solutions and new ideas, and that's what we try to solve with semantic models. So what makes it, in our case, what we like to call smart, is the built-in semantic model. As for the graph part: in this room I don't have to explain what a graph is, but we have chosen to use GraphQL. The reason we did that is that there are of course a lot of graph experts who know SPARQL or Cypher, those kinds of things, but sometimes developers find those a little bit more difficult, and we see that they really like to work with GraphQL. So we really embrace GraphQL; it's completely, 100% GraphQL based.
So that's how we define what our smart graph is, and with the smart graph you can do three things. The first is semantic search. What we know from traditional search, if I may call it that, is that we search for keywords, right? So if we write about the company Apple, we actually need to search for 'Apple'. But in Weaviate you don't necessarily have to: if you add a company with the name Apple, but you search, for example, for 'the company, the business related to the iPhone', it will still find your data object, the company with the name Apple. The second thing we can do, based on that, is automatic classification: we can automatically make edges in the graph based on the semantics of your data object. And last but not least we can do knowledge representation, which nowadays is often referred to as a knowledge graph. I wouldn't say we're necessarily a knowledge graph, but you can create similar representations. So those are the three things, the three use cases, that we can help you with.
Something important to share, based on what we did last year. The best way to explain it is to go back a little bit in time. We saw, of course, a lot of databases in the past that were more relationally based: row and column structures and tables. Then, of course, we got the graph databases; people who made some beautiful ones are in this room. A year back we chose to store information with JanusGraph, and what we did was offer that semantic element, the contextionary (I will explain what that is), as a feature. But last year we decided to double down on that semantic element, so we got completely rid of the Janus implementation and basically built everything ourselves. That means we now only have the contextionary to store that information in. And we figured out that we are really happy that we made that decision, because now we could really bring something new to the stage: something different, a different way to handle your data objects and work with them. So we store everything in a semantic space.
That's what we do. Now, if you go 'whoa, semantic space', I'm going to explain what we mean with that; just keep in mind that when I talk about the semantic space, I'm talking about the contextionary. Imagine it like this: say you go to a grocery store, and on your shopping list you have four items: a banana, washing powder, an apple and a piece of bread. If you go into the supermarket and you find the banana, you know that an apple is going to be closer by than the bread or the washing powder. And if you move towards the bread, you know that you are actually getting closer to the washing powder and moving away from the fruit. That's how we store data in the space; that's the metaphor to keep in mind for the problem that we solve. We do that with something that we call the contextionary, and I want to give you a little bit of background on where we're coming from and
what's different from the other solutions that are around. If you go all the way back in time, to the 1950s, there was a famous quote saying that a word can be characterized by the company it keeps. It basically means that the word Paris is more closely related to France than it is to Holland, for example, or the US, and that New York is more closely related to the US than to, for example, Spain. That went for all words, and a lot of research was done there. Then, with the whole machine learning boom, we saw a lot of work being done on machine-learning-based word embeddings: first we got word2vec, then GloVe, and nowadays there is what the academic realm calls the state of the art, called BERT. But when we put on our engineering hat, we really fell in love with GloVe. Why did we fall in love with GloVe? BERT has multiple representations of a word, but GloVe doesn't: GloVe has one vector representation for every word. The critique it often got was that if you, for example, have the name apple, that can of course mean the fruit, but it can also mean the company; BERT set out to solve that problem in a different way. But as engineers we were very happy with this, because now we could index those words in a storage mechanism. So when you start a Weaviate, you can imagine it like this: it's an empty space, and you choose a language (not a programming language, but a spoken language), let's say English, and the space is filled with all those words. For example, near 'apple' you find 'fruit', you find 'company', and you might also find 'iPhone'. The representations that we store are 600-dimensional, which sounds very fancy but just has to do with compression, that kind of stuff. The thing that we did is this: if you store a data object, for example the class Company with the name Apple,
founded in 1976, etcetera, and when I demo I will show you how that actually works. When you store information, what Weaviate does is that it creates a string of the words and the concepts that are in there, and it funnels those concepts down. How does it do that? It takes the Euclidean distance and combines it with (this sounds very fancy, but) a logarithmic function based on the occurrence of the word. So, for example, 'company' might occur less often than the word 'apple', and then we say the word 'company' is more important. We even work with word boosting, so that you can say certain words are important in my data object. What we do with that is that we create our first object in our graph, and it gets its own vector representation. So now we have an empty Weaviate with those words in it, and we store our first data object, and there it is: it lives in the vector space. That's what we mean. So now, if we query for, let's say, 'company' and 'iPhone', it can look in the nearby space and find the data object. So without even having 'iPhone' in the data object, we can still find it.
That's the thing that we created, where we thought: hey, this is our thing, right? This is our golden goose egg, because now we have different ways of creating graphs and of actually querying through them. This example might look something like this. If I have a dataset with companies, this would be my GraphQL query: I say, get me things which are companies, and I want to have their names, but explore them by 'iPhone'. I might get this result: Apple. And as you see, 'iPhone' is nowhere in the data object. And if, on that same dataset, I say, a little more abstractly, organize these companies on the concept 'Redmond', I might get back Microsoft.
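Reconstructed from the description above, the query shown could look roughly like this; the class and field names follow the talk, but the exact explore-filter syntax of that Weaviate version is an assumption on my part:

```graphql
# "Get me things which are companies, their names, explored by iPhone"
{
  Get {
    Things {
      Company(explore: {concepts: ["iPhone"]}) {
        name
      }
    }
  }
}
```

Swapping `["iPhone"]` for `["Redmond"]` would be the variant that surfaces Microsoft.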
That's how we structure our graph. So that's basically what Weaviate is, and as a developer it comes with a few features. First, the contextionary comes all out of the box; you don't have to do your own training, you don't have to set anything up, it comes all out of the box, or all out of the container, I should say. Adding data happens through the HTTP API, and querying data through the GraphQL API. It's completely containerized, so you can run it wherever you want. And because we use that vector space, it is very scalable; you can scale that space tremendously big. I think the biggest one we ever tried was a few billion objects; it gets pretty big. Something that we have in the pipeline, and maybe I can show you this next year, is that we can also create peer-to-peer networks of Weaviates, so that we can point to semantic elements in different graphs and don't have to agree anymore on our ontologies or our schemas. But that's in the making. So, a little bit about GraphQL before I demo, about how we structure our GraphQL; this is the UX of our GraphQL, if you will. We have a Get function; the other one is an Aggregate function, but I'm going to talk about the Get function.
We have a semantic kind, where we make a distinction between things and actions, nouns and verbs. Then we have the class; the class has properties; a property might have a reference, or the property holds the value itself. On top of that you can have these semantic filters: there you see, for example, the explore filter, where you can search for concepts, and you can even move away from concepts or move towards concepts; I will demo that to you in a bit. Well, demo: now we get to the demo, so now you might wonder how it actually looks. What I did is that I spun up a Docker container. If you want to do it yourself, go to our website, semi.technology, and click Weaviate; there you find all the documentation. The installation gives you just a Weaviate, but what I'm going to demo is this news publications dataset, and if you click that one, there's just one simple docker-compose command that you can run, so you can play around with it yourself.
So there's a meta endpoint, which I'm just calling to make sure that it's running well; yes. Let's first look at a schema. This is an example of a schema: here you see I have the class Publication, and it has the property name, the name of the publication, but also, for example, the headquarter geolocation, which is our geo-coordinates type; it has articles, etcetera. This is how we structure the schema, and that's important, because we will see it back when we use GraphQL to query. The things that we actually store look like this: for example, you see an article; the article has an ID; the beacon is a reference in our graph, and why do we call it a beacon? Because we do it in the space, so it's a beacon in the space. Then here you see, for example, a summary of the article, the title of the article, and the URL the article actually comes from. That's how it's structured. We've also created a simple GUI that you can use to look through and search the graph, so you can visualize things, but I want to dive into the GraphQL queries. Let me show you a simple query: I say, get Things, I want to have a Publication, and I want to see the name.
This would be a valid query, and then you see Vogue, Financial Times, Wired, The New Yorker, The Economist, etcetera. Now what I can do is say: I want to explore for the concept, let's say, 'business', and I'm going to limit it to three results, just so it's easier to read. So I do the same query, but explore based on the concept 'business'. If I run this query you see Financial Times, The New York Times, International New York Times, etcetera, but the word 'business' itself is nowhere in there; that's what comes from the contextionary. Or if I would say 'fashion' and run it, then you see it starts, for example, with Vogue. So that's how we structured it and how it works. And that's not only for small strings, but also for larger text objects. For example,
if I say get things, Article, and show me the title of the article, and I run the query, you now see all these articles about a variety of topics. You see Brexit, so you can tell when we actually pulled this dataset, but it's just a variety of topics. But if I now say, well, I want those articles, and I use the explore function again, for the concept, let's say, 'music', and I limit the results again just for readability: same query, but now on the articles. If I run this, you see, fair enough, the first one has the word 'musical', but then it's about Gwen Stefani, and then it's about John Lennon. So the word 'music' is not necessarily in there, but it organizes it like that automatically. And if you want to filter further in this graph, what you can do is this.
The question we had was: how do we do pagination? Because if you have a 600-dimensional space, what would be the next page? What we've done is this: we said, well, we can actually move towards a concept. So I can say: move towards the concept of, for example, 'Beatles'. I guess you already know what will happen if I do that. And I give it a certain force; the force is how strongly you want to push towards this concept inside the vector space. Let's say, a little bit arbitrarily, 85%. If I do that, you now see that the article with John Lennon comes first. And if I say, well, I'm more of a Stones person, I hate the Beatles, then of course you can also move away from the concept 'Beatles': same query, and we see John Lennon is gone. Now the question is: okay, so the traditional graph people are going, yeah, but I haven't seen the graph in action yet.
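The "move towards" query described above might be sketched like this; the `moveTo` filter name and its shape are my reconstruction from the talk, not verified against that Weaviate version:

```graphql
# Articles about "music", pushed towards "Beatles" with 85% force
{
  Get {
    Things {
      Article(
        limit: 3
        explore: {
          concepts: ["music"]
          moveTo: {concepts: ["Beatles"], force: 0.85}
        }
      ) {
        title
      }
    }
  }
}
```

Replacing `moveTo` with a corresponding `moveAwayFrom` filter would give the "Stones person" variant from the demo.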
That's just very simple, because you could say, for example, hasAuthors, and on the author we ask for the author's name. This is how we structure the graph: you see the graph object here, first the title of the article, whether it has authors, and then the authors that are related to this article.
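A sketch of such a graph traversal, following the talk's hasAuthors example; the capitalized reference property and the inline fragment are assumptions about that version's generated schema:

```graphql
# Follow the HasAuthors reference from each article to its authors
{
  Get {
    Things {
      Article(limit: 3) {
        title
        HasAuthors {
          ... on Author {
            name
          }
        }
      }
    }
  }
}
```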
I think, if time allows it, there's one more thing; can I still quickly show one more thing? Yes. So there's one more thing, which I completely forgot, because there's another problem that we also try to solve with this, and I want to show you that, going back to the publications. So, quickly back to the publications: things, Publication, and then the name of the publication. When you glanced over this, you might have noticed that among all these publications we have the International New York Times, The New York Times Company and The New York Times: the same publication in there three times. That is a problem, because of course you want to represent concepts, but in the database we have the same concept represented three times. So we have something for that: we can say that we want to group concepts together. I say: group with the type 'merge', merge them together, and I give it a force, which is how far we need to look in the vector space before merging; let's say five percent. If I now run this query, you'll see that it merged together the International New York Times, The New York Times Company and The New York Times. And if I now do a graph query, so I say hasArticles, and on the article the title of the article, it even merges the different articles from those different publications into the same concept.
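The merge query just described could look roughly like this; the `group` filter shape is reconstructed from the talk and is an assumption on my part:

```graphql
# Merge publications that sit within 5% of each other in the vector space
{
  Get {
    Things {
      Publication(group: {type: merge, force: 0.05}) {
        name
      }
    }
  }
}
```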
So that's what we have; that's how it works. There are way more features, like the automatic classification, which I didn't even get to show you, but you can play around with it, because, well, it's FOSDEM: this software is open source. You can set it up yourself and create your own graphs, or semantic graphs I should say. That's my story in a nutshell. Thank you all for listening. If you like it, go to our website; I just have one question. This is our website: if you go here, you can sign up for the newsletter if you want to learn more, or you can click on the GitHub star button if you want to promote us a little bit. This is our website; thank you very much for listening.
Thank you
[Audience questions, not picked up by the microphone]

Those are great questions, so let me start with the answer to the first one, and sorry if I went over that too quickly; everything I told you is also on the website in detail. What's happening here is that we always use the same algorithm. If you have a data object with a longer sentence, like the summary or the title of the articles that you've seen, it applies the same algorithm: first it takes all the individual words, then it finds the center position between those words, then it weighs them based on occurrence, so certain words are seen as more important than others, and it weighs the position towards those. And then we have the optional word boosting, where you can say that in this specific case a word was very important, so move more towards that. That's how we create those vector positions, regardless of whether you're querying or adding data. That's also why we became agnostic about the fact that GloVe is about single words.
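The steps just described (take the individual words, find a weighted center, optionally boost) can be sketched as follows. Everything here — the vectors, the occurrence counts, the exact weighting formula — is invented for illustration; this is not Weaviate's actual contextionary code:

```python
import math

# Toy 3-dimensional "word vectors"; the real contextionary uses ~600
# dimensions and a trained GloVe vocabulary. All numbers are made up.
vectors = {
    "company": [0.9, 0.1, 0.0],
    "apple":   [0.5, 0.5, 0.1],
    "iphone":  [0.6, 0.4, 0.2],
    "fruit":   [0.1, 0.9, 0.0],
}

# Hypothetical corpus occurrence counts: rarer words get a higher weight,
# a rough stand-in for the logarithmic occurrence weighting in the talk.
occurrences = {"company": 1000, "apple": 5000, "iphone": 800, "fruit": 3000}
TOTAL = 10_000

def weight(word: str, boost: float = 1.0) -> float:
    # Inverse log frequency, optionally multiplied by a user-supplied boost.
    return math.log(TOTAL / occurrences[word]) * boost

def centroid(words, boosts=None):
    """Occurrence-weighted average of the words' vectors."""
    boosts = boosts or {}
    dims = len(next(iter(vectors.values())))
    acc, total_w = [0.0] * dims, 0.0
    for w in words:
        wt = weight(w, boosts.get(w, 1.0))
        total_w += wt
        for i, v in enumerate(vectors[w]):
            acc[i] += wt * v
    return [a / total_w for a in acc]

def nearest(point, exclude=()):
    """Stored word closest to `point` by Euclidean distance."""
    candidates = (w for w in vectors if w not in exclude)
    return min(candidates, key=lambda w: math.dist(point, vectors[w]))

# Querying for "company" + "iphone" lands nearest to "apple":
print(nearest(centroid(["company", "iphone"]), exclude=("company", "iphone")))
```

The same `centroid` function serves both indexing a data object and building a query point, which mirrors the "regardless of whether you're querying or adding data" remark above.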
We learned this early on: the first prototype we did, way back, was very simple. We said, okay, I have the word 'apple', show me what's nearby, and then, as you would expect, as GloVe does, it found 'iPhone' but also 'fruit'. Then we did something very simple: okay, now go and sit in the center between 'apple' and 'iPhone' and show me again what happens, and you see that that's actually successful. And if I may, I can quickly show that. (I should have waited until you said go ahead; thank you.) So there's a contextionary endpoint where you can ask for concepts. If I now literally do what I just said and look up 'apple', you see Apple, iTunes, Google, and so on, and in this example we of course don't see the fruit. But let's say I want apple based not on the company but on the fruit: I concatenate them, 'apple' and 'fruit', and you see how the results now start to get better and better. That's how the algorithm works, and you can play around with this yourself on this endpoint. And the other question...
Is this sustainable for normal servers, or do you have some solutions there where you can pay less?
Yeah, sure. We are of course also a business, so the core is open source, but we've built a shell around that, and we currently have six companies using this on a large scale, in a variety of industries: wholesale, retail, oil and gas, all those kinds of things. These graphs get pretty big, especially if you scale the Kubernetes cluster fast. I could say that we fancily architected that, but it was actually something we got for free, because the data model is just vectors, only vectors, and that's of course very fast to scale and to search through. So the answer to that question is yes.

[Next audience question, not picked up by the microphone]

I don't know, and I love the idea, so we're definitely going to try that out. It's a great idea; we haven't tried that yet. What we currently do is that you run a Weaviate in one language, so Dutch, French, English, of course, but we haven't tried that yet. So if you don't mind, it would be fantastic to try, or of course you can try it yourself, or we can do it together. That would be fantastic; it's a great idea, and I don't know yet.
Thank you