Advanced Search Plays with GraphQL
Formal Metadata
Title: Advanced Search Plays with GraphQL
Series: Berlin Buzzwords 2023 (talk 46 of 60)
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/66619
Transcript: English (auto-generated)
00:09
Thanks, everyone. I appreciate you being here. I know it's like a very long talk, and there are a bunch of other great talks coming on right now. So if you feel like leaving at a certain point to watch a talk and coming back, that's totally fine.
00:23
This is our agenda today. So we are going to start with Search, Atlas Search, which is based on Lucene. It's kind of a basic introduction to MongoDB and to Atlas Search. So if you're already a search expert, this might be a bit basic for you.
00:41
But you'll still get a refresher on Lucene. And the second part, which starts at around 10:10, I'll try to stick to this schedule in case someone wants to join after the first talk, is going to be GraphQL. How many of you are here because of GraphQL? Yeah, that's what I thought. Perfect. The other important part about this presentation
01:02
is that it's very hands-on. So if you feel like following the tutorial while I'm presenting, you can totally do that using your own laptop or just on your phone, looking at the agenda and following the topics there. I get super distracted when I watch someone on stage and I need to have something to play with.
01:22
So I will totally understand if you're on your phones doing different stuff. Don't worry, I won't be offended. This is the workshop itself. It's also fully offline. You can do it later.
01:40
It doesn't require any payment, there is no paywall. You can register for Atlas without using any credit card. You can just provide your email, register, and follow the steps there. All right, let's get started with a short intro about MongoDB.
02:04
I'm sure the people who are leaving are gonna come back later. So the thing that we are gonna be using the most today besides Apache Lucene is the MongoDB document model and the query API. The query API is the API, which is not exactly a language
02:24
but the API that we're using to query documents to query the database. So kind of like the SQL that people use to query a relational database. So what is the deal with these documents?
02:40
Documents are really a superset over any other model that people use when they model their data in a database. So you can model relational connections there as well using documents. You can have foreign keys and make references to other documents. So this is possible. You can model graph data. Of course, if you have very heavily interconnected data,
03:03
like, I don't know, social media is an example that people give when they talk about graph connections. You might wanna use a dedicated database, but that's a very rare case. You can definitely model graph data using a document database as well.
03:20
And you can model time series as well. Time series data, like sensor data that you get every 100 milliseconds: you can put that in buckets inside documents and even archive it using a TTL. So documents are quite universal.
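As a rough sketch of that TTL idea (the collection and field names here are made up for illustration), a TTL index in the MongoDB shell looks like this:

```javascript
// Documents in this collection are deleted automatically once their
// createdAt timestamp is older than 30 days.
db.sensorReadings.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 60 * 60 * 24 * 30 }
);
```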
03:41
When people say a document database and they say a NoSQL database, they don't really usually realize that documents can be used to model all sorts of data. And this is maybe for historical reasons because people use document databases to just put any sort of JSON object there,
04:01
unstructured data and not really use it as a database. But that's not really the case. So let's take a look at the documents that we will be using today in this workshop. So here, this is a program called MongoDB Compass. It's a graphical user interface
04:21
that you can use to connect to your database cluster, MongoDB database cluster. I have already connected to the production cluster that we will be using today. So on the right, we see the databases in this cluster. We have some system ones like admin, config, local, and this is our actual data called soccer. I apologize in advance.
04:42
This workshop was prepared initially by my colleague, Karen, she lives in the US and that's what they call football there. So if I say soccer, you know what I'm talking about. We all know the proper name for the game. But yeah, so we have the football database and we have players collection inside it,
05:01
just a single collection so that we can focus our efforts. And on the right, we see the documents. This is a sample of all the 19,000 documents in the collection. So if we take a look at one of these documents, it looks like an object, like any programming language object.
05:23
And this is what documents in MongoDB look like. So we have different fields. The underscore ID is the primary key. It uses a special type called object ID. Then we have, you can see we have strings, we have numbers.
05:44
More importantly, we can have nested objects or sub documents. This is actually not a very nested object. It's a special date type, but you can have nested sub documents if you want. You can have nested arrays and this is all supported by design.
06:01
So of course you can have sub collections and stuff like that in some relational databases, but this is not by design. So these are features that were added later on. But with documents, with any document database, you can have nested objects and also arrays.
06:21
You can see that this is a very large document. And one of our jobs today will be to build an API that returns just a subset of all that data. You can argue that this is not a great model. Obviously you don't wanna store 200 fields in one single document, but we'll ignore that.
06:43
We will just build our API in such a way that we return just the subset of data that we need for the application. All right, what else do we have here? So here we can query the database. So for example, I can query using an exact match
07:00
so I can get the player with the short name L. Messi. This should return the first document I see here. And you see here that we have just one document. Now, if I misspell it or make any other mistake so that it doesn't match exactly,
07:20
you see that we don't get any documents. This is an exact match. In contrast to that, today we will be using full text search. So in full text search, this whole name will be broken into two separate tokens by the tokenizer.
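For reference, the two exact-match attempts from Compass look roughly like this in the query API:

```javascript
// Exact match: only a byte-for-byte equal value matches, no tokenization.
db.players.find({ short_name: "L. Messi" })  // returns the one document
db.players.find({ short_name: "messi" })     // returns nothing
```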
07:40
And we will be able to also use fuzzy search. So if I just use Messi, this should be able to find it because we are gonna be using full text search as opposed to this exact match search that we see here. All right, back to the documents. So MongoDB Atlas is the platform
08:01
that I will be showing you today. It is built by MongoDB, the company. At its core is the document database MongoDB but it's a lot more than that. It has all sorts of services, including Atlas Search which is the full text search that I was mentioning.
08:20
The whole idea is that you have integrated everything at the same place and it's all integrated with your operational data. So what is Atlas Search and why is it important? In a study done by Gartner, they found that 87% of shoppers that open a website,
08:41
an e-commerce website, go right to the search bar. So this is the first thing that they use. They don't browse the website or go to a certain category. They go and search using the search bar. And another study found that 68% of the people that don't find the result in their first search query just leave the website.
09:02
So the first experience of 87% of shoppers is actually the search bar and that's why search is super important for modern applications. So why do we wanna use Atlas Search? In the past, our clients at MongoDB
09:22
were integrating their MongoDB database with existing services like Elastic and Solr, and all of them use Apache Lucene. So we thought, why don't we bring this into the platform? So we just implemented Atlas Search which uses Apache Lucene under the hood.
09:40
And the whole idea is that you have all the services, everything you need in the same place integrated with the document model. So our goal today, or something that we call the search game, will be to make sure that this search bar
10:01
is as big as a goal net and everything that our users search for is surely gonna be a goal. It's not gonna be a miss. So in order to do that, we need some players. The search game that we're gonna play, this is the application that we will use actually.
10:21
So we want to make sure that everyone who visits the application finds whatever they need and they leave us a good review and they find the products that they need. So let me show you the application before I proceed. This is a real application; the domain is Atlas Search Soccer.
10:41
You probably don't see it, but it's actually atlasearchsoccer.com. It allows you to build your dream team. So if I select the position here and search for a player, I don't really know any forwards playing for the German team right now, but I can do that and search for Thomas Müller.
11:07
And you saw here that I didn't use the umlaut, but I was still able to find Müller because we have fuzzy search. So even if this character is not the right one, I was still able to find the right player.
11:22
I heard that this guy is really good. Kai Havertz, I'm not sure how to spell his name, but yeah, we were able to find him. This is a data set from 2022. So you can see that some of the clubs might be different nowadays, but yeah, we have Kai Havertz, so I can build him here.
11:44
And there is a lot more to this application. So we have text search, full text search. We have wildcard search. So if I search for Szczesny, I don't really know how to spell his name, but if I use a star for the wildcard
12:01
and finish this with NY, I should be able to find him. Yeah, I managed to find him. Another cool thing is the autocomplete. So if I search for Ronaldo, I have the autocomplete search. And here on the right, you actually see the query itself.
12:23
So if I zoom in a little bit, you will see that this is actually the query that is being executed against my database to find this player. If I go to advanced scouting, I have facet search. So facets are these categories that they can select.
12:43
And you see here that I have now filters, the text should include France. I can filter by a team as well, by position and so on. So definitely you can play with this and see how to build these queries yourself. It's kind of a self-documented application.
13:01
It's really cool. And I wanted to show you one more thing that we will actually build ourselves. So let's go back to text and search for Ronaldo. Ronaldo, as you know, is a very popular name in the Latin world. So we see a bunch of Ronaldo's here, but we actually want Cristiano Ronaldo.
13:22
So how do we make this happen? How do we make sure that the users find the most popular player? Well, you can modify the scoring. So every single player here is scored based on the relevance of this text compared to their actual long name. We can use function score.
13:42
And if I zoom in here, we can add the relevance to the overall. The overall is the actual score for this player, like a score by FIFA from one to 100. So we have the overall, we add this to the relevance score,
14:00
and now we actually get Cristiano Ronaldo first with the score of 93. And then the other people are sorted based on their overall score. All right, so this is our simple application. We will build a bunch of this functionality today, everything that I showed you besides facets.
14:30
A random introduction in the middle about me. My name is Stanimira. I live in Bulgaria. I work for MongoDB. I'm a developer advocate there, so my job is to come here, talk to you,
14:40
and actually I would love to hear about your use cases as well after this talk. So you can find me and network with me. If you can't find me, you can reach out to me on LinkedIn, Twitter, or whatever platform you use. I'm also a Google Developer Expert for Angular and a Coursera instructor. I have published courses on Coursera.
15:02
I write some articles, and yeah, I speak at conferences. I've been to Berlin many times, always for conferences, and I really love the city. How many of you here live in Berlin? All right, cool, cool, cool. I don't live here, but I enjoy the city a lot.
15:21
All right, but let's go back to this documents thing. We saw the documents that we will use, but we need a couple more players on our team to make the search game successful. We loaded the data. I won't show you how to load this data, but if you follow the workshop, the link that I shared, there are instructions on how to load this data
15:40
into your own cluster when you deploy it. I showed you the application as well. This allows us to find players with weird names. And how do we search actually using MongoDB Atlas? So we need the data, the database collection. We can create a search index for a collection.
16:01
So the index is for a particular collection, not for the whole database. And then we can query using the $search operator. The $search operator is part of the query API that I mentioned, the language that we use to communicate with the MongoDB database. So let's build a search index.
16:22
The supported types for a search index are all of these types. Today we will be using just text and numerical. We could honestly use just text, but I want numerical for the function scoring.
16:43
All right, so this is MongoDB Atlas. This is the cluster that I have deployed and I loaded the data into. I can open the cluster and I will show you that, yeah, this is the same data that we saw earlier. So in order to build a search index, I need to go to search over here
17:02
and create a search index. I will use the visual editor. At the end of the day, what is generated will be a JSON configuration anyway, but let's start with the visual editor. The index name, I'll keep it as the default because I don't have any other indexes in this cluster.
17:22
And I will select the database and the collection. And that's the configuration, the default configuration that we have. So the index analyzer that we have is Lucene standard. So it will break the player names or all the text in all fields into tokens
17:41
based on like the white space or commas. This is how it will generate tokens. We also have something called dynamic mapping. Now, dynamic mapping is really cool if you're just getting started. It basically goes through all fields in the documents and it says, this field is of this type,
18:00
this field is of this type. It creates mappings for every single field, but it's kind of slow because it will create mappings for all these 200 fields that we saw. So I don't wanna do that even though it's very easy to get started with. I'm gonna go and refine the index. I will disable the dynamic mapping
18:22
and I will add my own field mappings. And it's really simple to add them. So I wanna map the short name of the player and this is of type string, I'll add that. Then I will also map the long name, this kind of the full name of the player,
18:43
Lionel Andrés Messi and so on, or for Ronaldo maybe 10 names. I will also map the overall. And this now is actually a number. So I'm gonna have to change the type to number.
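The JSON configuration that the visual editor generates for these choices looks roughly like this (this follows the Atlas Search index definition format, with dynamic mapping disabled):

```javascript
const indexDefinition = {
  mappings: {
    dynamic: false,        // only the fields listed below get indexed
    fields: {
      short_name: { type: "string" },
      long_name:  { type: "string" },
      overall:    { type: "number" }
    }
  }
};
```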
19:02
And I think this is enough for now. You can create mappings for all the fields that you need. And of course you can refine the indexes later as well, re-index your data. That's everything we need. You can see we support mappings as well. So if you need a multilingual search,
19:21
you can create a mapping from English to Spanish, from English to German. Let's save the changes and create the search index. We see that the build of the index is now in progress and we will receive an alert. I will actually receive an email when it's built. I'm not sure why this is the default behavior,
19:41
but every time I play with it, I get an email. We can follow the status here. It should take up to a minute or so. So let's go back to our presentation while we wait. Again, all of these data types are supported. And in addition to them, there is also support for facets. Again, facets are a clever way to put the data
20:03
into different buckets based on some categories. So a category might be the position of the player or their club or their country. All right. So actually the star player on our team today
20:21
won't be Messi. It won't even be Mbappé. This is an old picture, like the cup is in the previous player's hands now. But anyway, the star player of our team today is Apache Lucene. And I'm sure if you were here yesterday, you have heard about Lucene.
20:41
And if you're at this conference, I'm sure you know about Lucene as well. So I'm not gonna spend too much time on it. It's an open source Java library that the most popular full text search products, like Elastic, Solr, and Atlas Search, use under the hood.
21:02
And what is really powerful about this is that we will be taking the documents from the database, passing them into the Apache Lucene analyzers, and generating some tokens. And these tokens then will be used to create an inverted index
21:20
that we will use to search for data. So what is the deal with this inverted index? The inverted index, if you think about it, is really a structure where every single token points to a document. So if you break L. Messi into two tokens,
21:40
in the inverted index, you will have L. pointing to the document for Lionel Messi. You'll have Messi pointing to the same document. And actually the L. token will point to the documents of every single player that has L. as part of their name.
22:02
But how does this inverted index compare to the MongoDB index? So we have Atlas Search, which creates an inverted index, and MongoDB uses B-tree indexes. So how do they work together? So this is a default MongoDB document
22:24
and by default every MongoDB collection has an index on the primary key, the underscore ID field. So if we have the Manchester United term and we split it into tokens using the standard analyzer, we will have the Manchester and United tokens.
22:46
Now the standard index will be this, using the primary key. The inverted index will be built by Atlas Search using the tokens. And now the tokens, you can see that they point to documents one and two,
23:03
so the Manchester token points to Manchester United and Manchester City, and the United token points to one and three, Manchester United and Newcastle United. This is how they compare. So actually we will use both indexes when we are querying. Let me just elaborate a bit more on that.
23:21
So when we send a query for Manchester as text, first we will go to the inverted index. We will find the one and two there, the values stored in the index, and then we will go to the standard index and find the actual documents. And this is how it works under the hood.
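As a toy picture of the two structures working together (document numbers as on the slide):

```javascript
// Lucene inverted index: token -> document ids.
const invertedIndex = {
  manchester: [1, 2], // Manchester United, Manchester City
  united:     [1, 3], // Manchester United, Newcastle United
};
// B-tree index: _id -> document. A query for "Manchester" first reads
// invertedIndex["manchester"] to get [1, 2], then fetches those two
// documents through the regular _id index.
```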
23:43
These are all the analyzers that Lucene supports and they're also available on the website. They're also supported in MongoDB Atlas. So for example, you can use the keyword analyzer as well and make sure that the whole string stays together.
24:02
And in this case, Manchester United will be just one token that points to the concrete document there. All right, I hope the index is already built and we will proceed with the last stage, which is querying the data using the dollar search operator.
24:30
Okay, we see that the index is now active. It has indexed all these documents
24:40
and we can start using it. Now I will use MongoDB Compass again because it's easier to build queries there. So if I go to aggregations, I see again a sample of my documents. We see a bunch of documents and I will create a new query.
25:02
So create new, where's the stage? So in the query API, we build aggregation pipelines. A pipeline contains different stages.
25:22
It's basically an array with different stages that we execute and each stage transforms the data. This is really zoomed in and yeah, that's why I don't see the stage. So I'm gonna expand that. All right, so this is the input. This will be the modifier, the stage,
25:42
and this will be the output. So the first stage will be $search. Do you see all right? Okay, so in the $search stage, we need to specify the name of the index. The name was default. Because it's default, I can just remove it. If it was something else, I should put the specific name.
26:03
Then I should provide the query. We're gonna be looking for messi and the path. The path will be short name. So when I do that, you see that we have one document matching. And if I search for L. Messi,
26:21
now I will get 10 documents. So I will have L. Messi. I will also have L. Bailey, L. Pellegrini and so on. So this matches the first token. Is that clear? All right, let's keep just messi for now. And I will add one more stage
26:41
to the aggregation pipeline after search. And I will use project. Project basically allows me to filter the fields that I need. So I can say I want just the short name and the club name.
27:02
Yeah, club name exists. And yeah, we have just Messi and the club name. And I also want the overall score. You can see it's old data. He's no longer playing in PSG, but that's another topic, don't wanna get into it.
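Written out as code, the pipeline built in Compass so far looks roughly like this (the index name "default" is omitted because it is the default; field names as in the demo):

```javascript
db.players.aggregate([
  // Stage 1: full text search on the short_name field.
  { $search: { text: { query: "messi", path: "short_name" } } },
  // Stage 2: keep only the fields we care about.
  { $project: { short_name: 1, club_name: 1, overall: 1 } }
]);
```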
27:21
All right, let's go back to the first stage. Did I add a new stage? Oops, sorry. And see what happens if I make a mistake here, if I misspell Messi's name. I don't get any documents. But how do we fix that in full text search?
27:43
I mentioned this a couple of times already. Anyone has any ideas? Fuzzy search, yeah, perfect. So let's implement fuzzy search. It's as simple as going here and saying fuzzy and then providing the maximum edits that we can make.
28:00
So the maximum mistakes that we can make. And now we get eight documents. Let's go to the second stage and see which are the documents. So we have Messa, Messi, Mussi, Mele, and again Messa, a few more Messas, May, May Funk.
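The fuzzy option amounts to one extra field in the text operator; maxEdits is the number of single-character changes (Levenshtein edits) a term may be away from an indexed token:

```javascript
db.players.aggregate([
  {
    $search: {
      text: {
        query: "mesi",           // misspelled on purpose
        path: "short_name",
        fuzzy: { maxEdits: 2 }   // allow up to two single-character edits
      }
    }
  }
]);
```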
28:20
So we got the result that we kind of wanted. But let's see what is the score of these documents. Why were they sorted in this way? To do that, I can use $meta. So I can create a new field called
28:41
search relevance score. I don't think it matters what I call it. I think I messed up this syntax. So I will quickly go to the workshop.
29:03
If you follow the workshop later, you will go through exactly the same thing. So fuzzy matching. Let's grab this real quick.
29:21
So it's $meta searchScore. All right, the search score that we see here is the reason why the results are sorted in this way. So we see the most relevant results first. So we see that Messa and Messi have the same search score together with the other people
29:40
that have just one difference in their name. And May has a lower score. So this is 2.9, this is 2.6 because I guess it's a three letter name. But again, how do we implement this relevance scoring? How do we use the overall to make sure that we see Messi first?
30:03
I need to go back to the search stage and amend it over here. So I'm not gonna bore you with any more live coding. I will just grab this piece of code and paste it.
30:22
I'll go to the other stage again. And now we have a score. So we had two point something before that. Now we have 95 because we are adding the overall field to the relevance score. And now we have the actual player that we want first. Let's go through the syntax here.
30:42
So basically, in the same $search operator, I added score, and I am using a function to sum these two fields. I am adding the overall and the relevance. And this is how simple it is to modify the score.
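Put together, the scored query looks roughly like this, using Atlas Search's function score syntax; the combined score can then be surfaced with $meta as shown earlier:

```javascript
db.players.aggregate([
  {
    $search: {
      text: {
        query: "messi",
        path: "short_name",
        fuzzy: { maxEdits: 2 },
        score: {
          function: {
            add: [
              { path: { value: "overall", undefined: 0 } }, // the 1-100 FIFA rating
              { score: "relevance" }                        // Lucene's text score
            ]
          }
        }
      }
    }
  },
  // Surface the combined score next to each player:
  { $project: { short_name: 1, score: { $meta: "searchScore" } } }
]);
```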
31:02
I can also use multiply and it will have the same result. So now if I go to project, you see that the score is 207 because we're multiplying 90 by 2.3 or something like that.
31:23
All right. So this is the search part. We are right on time. Let's just finish off the Atlas Search part and we can start with GraphQL. So what we did here to summarize
31:40
is we built an aggregation pipeline. Again, an aggregation pipeline is a list of different stages and each stage modifies the results. The next stage gets the modified results and in the end we have an output document or a bunch of documents or we can have just one number if we want.
32:02
So we had search as the first stage, then we had project and we can add as many of these as we want. This is the search query that we used. Again, we used fuzzy search to make sure that we can find Messi even if we misspell his name.
32:21
And we also use relevance scoring. So why is relevance scoring important? Because first of all, everyone uses it and if you don't find your results in the first search query that you provide, you are very likely to leave the page and never come back.
32:40
So we can use function scoring, we can use also boost and we can provide a certain constant. These are all coming again from Lucene and they're all available in Atlas Search. We use function scoring because it's really easy to implement. We can also use boost. Boost allows you to boost certain documents. For example, you can get discounted products first,
33:03
like see if the product is discounted then boost the scoring. And everyone uses that. It feels like a hack because sometimes a certain e-commerce provider is first showing you their own products and maybe this is not very ethical, but they do it. And if everyone is doing it,
33:21
you should do it too as long as it's ethical to your users. Now, a couple of performance tips. First, don't use dynamic mapping where possible; use your own field mappings. You can improve your index space
33:43
using custom analyzers. The second thing is: don't use $sort. This is a sorting stage that you can use in MongoDB, but it doesn't make sense to use it with Atlas Search because all of your results are already sorted based on their relevance score.
34:01
So optimize the relevance score instead of using sort. If you need to use $match or $sort for some reason, you can actually use computed fields. We call them stored source fields, but basically these fields will be stored together
34:24
with your search index. And this will allow you to use $match or $sort in a much more optimized way instead of scanning the whole data set. If you ever have this use case, you will figure it out. Don't worry about it too much right now.
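For reference, the stored source configuration lives in the index definition; a minimal sketch, with fields chosen for illustration:

```javascript
const indexWithStoredSource = {
  mappings: {
    dynamic: false,
    fields: { short_name: { type: "string" } }
  },
  // These fields are stored alongside the Lucene index, so a later
  // $match or $sort can run without fetching the full documents first.
  storedSource: { include: ["overall", "club_name"] }
};
```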
34:43
All right, so to recap the Atlas Search flow that we saw: we have our data, we tokenize it using a Lucene analyzer. We can use different analyzers. Then we create an index and then finally we use the query API to query that data.
35:00
And this is the whole magic behind it. And this is what allows it to be very good at scoring. This amazing animation is done by my colleague, Karen.
35:22
Look her up. She's our search expert. But yeah, basically Atlas Search provides you with all these things. Most of them are coming from Lucene. And I can't really say much about it yet, but we might also have vector search available soon.
35:42
If you're interested in that, and you probably are if you're here. All right, before we start with the second part of the workshop slash tutorial, I can give you this again, if you are interested in following the tutorial later. Again, we went through the first part, which is the Atlas Search part.
36:01
Now we're gonna go through the second part of the workshop, which is the GraphQL part. So together we will build a GraphQL API that exposes the search functionality that we built.
36:21
All right, let's get started with GraphQL. How many people here have used GraphQL before? All right, that's really good. So I can tell you anything. And if you have questions, ask these two people who use it. But let's start a bit back. How many people have used the REST API?
36:40
All right, that's good. So I can use this reference. The REST API usually works the following way. You decide to build a REST API. Let's say that we're building a movies catalog. So we create one, our first endpoint, which allows us to fetch the movies. And then we realize that for this page, we actually don't need everything in this movies document.
37:02
So we create a movies titles endpoint. Then someone comes and tells us this is not the right way to build a REST API. You can't really build movies titles because this is not a real entity in your system. So you might be very smart and create a query parameter for filtering.
37:20
And that's fine, that's totally fine. It's a bit non-standard, but you can do it. And then on the next page, you have the movie details page. So you need the movie itself, a bit more details about it. And you also need the comments, the reviews that people left for this movie. So how do you fetch the data from the backend? You'll send one request for the movie,
37:42
another request for the comments. And then you decide that you wanna be smart. You haven't heard about parallel requests, so you wanna optimize it. You build a movie-with-comments endpoint. And this is maybe not a great idea, but I mean, I've seen systems where you need to send ten requests to get everything you need for a page.
38:01
So people build these kinds of endpoints. And at the end of the day, this is how your REST API might look. And the biggest problem is that usually there is not much communication between the frontend team and the backend team. So the backend team doesn't know which endpoints the frontend needs. So you'll go and fetch from weird endpoints
38:23
and then ask them to build something else, or go and build something else yourself, even though it already exists. And yeah, this is one of the pains of REST. And some people argued that GraphQL is the savior, the solution that can fix these problems
38:41
and many others that come from REST APIs. Now, GraphQL has its own problem. So everything that they list as benefits, like don't take it as a one solution to every problem. Just do your research and see if it fits your own product.
39:03
The difference is that in GraphQL, you have just one single endpoint. By default, maybe slash GraphQL, my API slash GraphQL. And you have a declarative language that you use for building queries. This is how a query using the GraphQL query language looks like.
39:20
So you're saying, I want movies and I want just the title field from the movies, nothing else, nothing more. And this is what you get in return, just the data that you requested. So makes sense, right? Why is it like that in REST? You can also have nested objects. So if you request the title, the IMDB object,
39:41
specifically the rating from it, the year and the directors, this is what you get back. As simple as that. You can have parameters. You can filter by movie ID, for instance. You can see that we have a type here, which is something that we don't necessarily have with REST. We have a type that is actually among DB specific fields.
40:04
So our API needs to understand this type. We can use this to query specifically for the movie that we want. We can actually query multiple entities. So the powerful thing is that we can get the movie together with its comments with just a single request.
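As a sketch, the combined request might be sent as a single query string like this; the movie and comments field names are the illustrative example from the slides, not a real generated API:

```javascript
const movieWithComments = `
  query MovieWithComments($id: ObjectId!) {
    movie(query: { _id: $id }) {
      title
      imdb { rating }
      year
      directors
    }
    comments(query: { movie_id: $id }) {
      text
    }
  }
`;
```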
40:21
That is, if this is how our UI should work. But what exactly is GraphQL? We saw some very magical things happening here. So this is how you use an API, but how do you build it? What's behind it? GraphQL is actually a specification. It's not a specific library or technology
40:42
or a set of libraries. It's a specification that you can implement yourself. And GraphQL is actually a name that people use for two things: the query language, GQL or GraphQL query language, and the server-side runtime that understands how to execute these queries.
41:03
So the GraphQL query language basically has queries that allow you to fetch data and mutations that allow you to change data. It also has something called subscriptions, I'm mentioning this just for consistency, but we are not gonna discuss subscriptions today. So basically, queries get data, mutations change data.
41:25
And if we take a look at one query here, what do we see? So first we have the operation type. We have an operation query. Then we have the name of the operation. We can call it whatever we want. And then we have the variables.
41:42
Then every single field that we see here is something that we can get from the API. And these fields should correspond to functions that know how to resolve the data that should be returned when we request these fields.
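On a hand-written server, that correspondence might look roughly like this; a generic sketch of a resolver map, not the generated Atlas API:

```javascript
const resolvers = {
  Query: {
    // Runs whenever a query selects the "movies" field; whatever this
    // function returns is what the client receives for that field.
    movies: async (parent, args, context) => {
      return context.db.collection("movies").find({}).toArray();
    },
  },
};
```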
42:01
Next to the fields, we have arguments. So every single field can have an argument, not just the top level field there. And our API, our server, should know how to read and parse these arguments and what to do with them. So this is the query language. What about the runtime?
42:20
The runtime, runtime here in this context means the execution environment. So when we run our program, what is the execution environment that can understand a GraphQL query? It's defined by the specification and you can implement it yourself. And basically it allows you to execute GraphQL queries.
42:42
There is one more thing though. So it's not just the query language and the runtime. There is also a type system. And this is really cool because it's a clearly defined contract between the client and the server, the one who requests and the one who responds. So when you request data, if you misspell something or you put the wrong type,
43:03
you will get an error. And this is because you have a clear contract. The contract contains the available operations, so all the field names, the input parameters, and all the possible responses.
43:22
And this is how a typical web application that uses GraphQL might look like. So we have some front-end framework, React, Angular, whatever, that sends queries through a GraphQL client library. This GraphQL client library knows how to convert the query language to an HTTP request.
43:42
It could be another protocol, but yeah, HTTP is simple and universal. This goes to a GraphQL API, and now the GraphQL API can be just a federation point across multiple services. So one GraphQL API can request multiple other services, databases, other APIs, combine the data,
44:04
and pass it back. In the example that we will see, we will have an Angular or React application, it doesn't really matter, which uses a client library called Apollo, and it communicates over HTTP
44:22
with a MongoDB Atlas GraphQL API that we will build now. The GraphQL API exposes data from an Atlas database, a MongoDB database that is hosted in the cloud. So what is the purpose of this GraphQL API? Why are we building it with MongoDB Atlas?
44:40
What are the benefits? And the reason is that they have only 20, 30 minutes and they don't want to spend all the time building resolver functions. It's super easy to build it. It's fully serverless. It has all of the box authentication that we will implement in the next demo.
45:03
And basically it gives you everything out of the box. It's kind of slow and painful to build a GraphQL API on the server yourself. It's very easy to use it on the front end, but then when you have to sit down and build it, it takes some time. There is a lot of boilerplate that you need to write.
45:21
That's why we will just generate it and take it for granted. Again, the slide, if anyone wants to follow, you don't have to, of course, but just in case. So we are going to be following this section over here,
45:42
Atlas GraphQL API, and we'll build our own GraphQL API. I'll keep this open because there are a few configurations that I don't want to write in front of you from scratch. I'll just copy them. So again, MongoDB Atlas,
46:02
this is our database with Atlas Search. We have something called Data Services, which encompasses these things. And now we'll go to App Services, which allows us to build a backend for the database and for Atlas Search. I'll close this window. It instructs us how to write our own
46:22
like MongoDB Atlas application. There are a bunch of guides that you can follow. There's also a GraphQL guide, but I'm not going to follow that. I'll just do it myself.
46:44
All right. Just to make sure that this is connected to the correct cluster, I will go to Linked Data Sources here.
47:02
I'm not sure it's linked to the correct service, so I'll link it myself. So I can select a data source, or maybe I have the wrong, okay. This is the wrong project, so I'll go to my other project.
47:21
So what happened here is I have a project with no database, and I created a new application, and it's created a new cluster for me, but I actually need the cluster with my data. So now I'm in the cluster that has data. I'll build my own application. And when I go to Next here,
47:46
and when I go to Linked Data Sources, yeah, we see that it's linked to the Atlas Search soccer cluster, which is the correct cluster. All right, to enable this GraphQL API,
48:02
I need to do just two things. First, I need to set up data access rules, so who can access the data and what they can do with it. I'll select the players collection over here, and I have some presets over here. So I can select Deny All Access, Read All,
48:21
Read and Write All. I'll just select Read All because I want everyone to be able to see all data about the players. If I wanted, I could start from scratch and define field level permissions so everyone can see just specific fields instead of everything. I could do that, but let's keep it simple for the demo.
48:42
So I'm adding this preset role, and this created a draft for my application, so I need to deploy the changes. So this is an actual backend that someone might be using at the moment, so I need to redeploy any changes that I make.
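For reference, the Read All preset amounts to a role roughly like this; the exact JSON shape varies between App Services versions, so treat it as a sketch:

```javascript
const readAllRole = {
  name: "readAll",
  apply_when: {}, // applies to every authenticated request
  read: true,     // everyone can read every field
  write: false    // nobody can write through the API
};
```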
49:01
And then the second thing is defining a schema. As I mentioned, GraphQL has a strong API contract, so we need a schema, a strongly defined schema with the fields and their types in order to generate this GraphQL API. Again, I need the same collection. I will define my own schema,
49:21
and here I can actually generate the schema using a sample of the documents. So Atlas will do that for me. But because I wanna expose just a specific subset of the fields, I will skip that and define my own schema. I'll go to JSON, and here you can see
49:44
this is a JSON object. You can write everything from scratch, or I can just go to the workshop and copy the schema that I need. All right, so you see here we have around 20 fields.
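A trimmed version of that schema, in the JSON-schema style App Services uses; only the fields listed here end up in the generated GraphQL types (a subset is shown, with field names from the demo):

```javascript
const playersSchema = {
  title: "player",
  bsonType: "object",
  properties: {
    _id:        { bsonType: "objectId" },
    short_name: { bsonType: "string" },
    long_name:  { bsonType: "string" },
    club_name:  { bsonType: "string" },
    overall:    { bsonType: "int" },
    age:        { bsonType: "int" }
  }
};
```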
50:03
That's instead of the 200 fields that the documents have. So I'm gonna save this, review and deploy, and that's everything I need. Now I can just go to the GraphQL tab.
50:23
You can kind of see it here. This is the GraphQL endpoint that I can just copy and start using in my own application, and we will do that. And I can also play with the GraphQL API that was generated using this graphical editor.
50:44
So there is a query that was built for me. If I go and execute it, I should be able to get one player. It's a random player from the database. If I go back to the tutorial and copy the query that we have here,
51:09
this is a bit more interesting query. So let's execute this. All right, what do we have here? We have players, so plural.
51:20
We're gonna limit them to five, just five players. We're gonna get the players with age lower than 20, so younger than 20, and we're gonna sort them by the overall. I don't think I need a sort by actually, so let's remove that.
51:42
And we have some players, I guess. Let's get the overall as well. Yep, we have the overall. Now I can sort by overall descending. This was all generated for me by Atlas,
52:01
by the platform, using the schema that I wrote. So it knows that overall is a numeric field, and it generated the sort by overall descending and ascending. All right, so we have Pedro González López from Barcelona, overall 81, Bukayo Saka with 80, and so on.
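The query itself, roughly as typed into the editor; the query input, the sortBy enum, and the limit argument are all generated from the schema (this assumes age is among the mapped fields):

```javascript
const youngPlayers = `
  query {
    players(query: { age_lt: 20 }, sortBy: OVERALL_DESC, limit: 5) {
      short_name
      club_name
      overall
    }
  }
`;
```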
52:23
We have just five results as well. This is the API. This is what we needed to build. It was like that simple. All right, so we have the API actually,
52:43
and the next thing that we need to do is make it work with the Atlas Search functionality that we saw earlier. So our end goal with this section is to be able to execute this query, searchPlayers. Now searchPlayers is not a collection in the database,
53:01
so we can't use it right now. So we're gonna create a custom resolver. A custom resolver will be a function that executes some custom code, and it understands this input. So we're gonna have query, operator, path, everything that we had in the Atlas Search query, and it returns the data that we need.
53:23
So how do we do that? We need to create a serverless function. Let's go ahead and copy the function. So I can go here to functions right underneath GraphQL, create a new function.
53:41
I'm gonna call this search. We don't need to change anything else, but we can have authentication if we want, and write the implementation here. Now this is some JavaScript code, but it's quite basic. Basically, we get the parameters from the input from this argument that we have here.
54:01
It's an object, so we need to extract them. Then we construct the different pipeline stages. So we have construct search stage, construct limit stage, and we don't need project. The API will handle this for us. Finally, we query the collection.
54:21
So players is our collection. We say players.aggregate pipeline, convert this to array finally, and return the results. I'm not gonna go through the whole code. You have the source later if you wanna see it. All right, so let's save this function.
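A condensed sketch of that function; the workshop has the full source (which also switches on an operator argument for text versus wildcard search), and the input field names here follow the demo loosely:

```javascript
exports = async function (input) {
  // The resolver receives the GraphQL arguments as one object.
  const { searchTerm, path, limit, fuzzyMaxEdits } = input;

  const searchStage = {
    $search: {
      text: {
        query: searchTerm,
        path: path,
        ...(fuzzyMaxEdits ? { fuzzy: { maxEdits: fuzzyMaxEdits } } : {}),
      },
    },
  };

  const players = context.services
    .get("mongodb-atlas") // the linked Atlas data source
    .db("soccer")
    .collection("players");

  return players.aggregate([searchStage, { $limit: limit ?? 5 }]).toArray();
};
```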
54:45
And now we just need to link the function to the GraphQL API, use it as the endpoint for our custom resolver. So I went back to GraphQL. Now I need to go to custom resolvers
55:00
and add the custom resolver. Search players. The parent type is query because it's gonna be a get request. We need to select the function, the payload. So the return type will be an existing type list,
55:23
so an array of some existing type, yeah? It's not default. It's something that I implemented, but you can get it from here. It's universal kind of. It's not specific to the players, but yeah.
55:42
I pasted the code. Yeah, but good question. Thank you for that. Yeah, so the final thing that we need here, so we have the return type. We need the input type, and the input type is specific. We saw that we have some specific arguments. So we're gonna specify a custom type, and I will copy this again from here.
56:06
Where is it? Okay, so this is the input type. It's nothing special. It's just like limit is a number, fuzzy max edit is a number, path is a string, stuff like that. I just put my laptop on do not disturb.
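Roughly, the pasted input type is a small JSON schema like this; the names follow the demo and the exact shape comes from the workshop source:

```javascript
const searchPlayersInput = {
  type: "object",
  title: "SearchPlayersInput",
  properties: {
    searchTerm:    { type: "string" },
    path:          { type: "string" },
    operator:      { type: "string" }, // e.g. "text" or "wildcard"
    fuzzyMaxEdits: { type: "number" },
    limit:         { type: "number" }
  }
};
```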
56:30
Sorry, all right. I'm gonna save the draft, review and deploy again, and let's go ahead and try searchPlayers, okay?
56:48
This is the query. Again, we are searching for Szczesny using the wildcard operator. We are searching on the name path,
57:00
and we're gonna limit the results to five. And yeah, we have Srebni and Szczesny as results. It's that simple. We can also implement custom scoring. This is one of the exercises in the workshop. So if you go here, the bonus challenge is implementing custom scoring for this function.
57:22
So if you wanna play with it later, you can extend the function to have custom scoring. All right, final thing. How do we use this in an actual application? So first, I need some authentication.
57:43
By default, this is completely locked down. No one can connect to it. So I need to enable authentication so that people can use it. And you see that we have a bunch of authentication providers here. We have anonymous login. We have email password, social login, API keys, JWT,
58:00
or even you can execute your own custom code. Now, you may have guessed it. I will use anonymous authentication because it's just the simplest one. And we have like 10 minutes, I think. Okay, so we enabled anonymous authentication. Now everyone can connect to this API.
58:23
So what else do we need? I need to open the actual application. And for this, I'm using a web IDE called CodeSandbox. So this is my actual application in CodeSandbox.
58:41
And I will be using it to connect to the API. So we need to do two things. So first, insert the Atlas app ID on line 17 in index.js. Line 17, we have app ID. This app ID, I should get it from here. Actually, if I go to the homepage,
59:01
you'll see this is the app ID. So I can just copy it and insert it. It's probably broken. I also need the GraphQL endpoint. So I will go to the GraphQL tab. And I showed you earlier, this is the endpoint.
59:21
And now I will open CodeSandbox because it didn't install the dependencies properly. That's why it's not working.
59:48
It actually works. It's just this preview that's not working. But I added the app ID, the GraphQL endpoint, and now the app is actually connected to the GraphQL API.
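The wiring on the front end is small; a minimal sketch using the Realm Web SDK for the anonymous login and Apollo for the queries, where the app ID and endpoint are the placeholders just filled in:

```javascript
import * as Realm from "realm-web";
import { ApolloClient, HttpLink, InMemoryCache } from "@apollo/client";

const app = new Realm.App({ id: "<YOUR-APP-ID>" });

async function makeClient() {
  // Anonymous authentication: no credentials, just a session token.
  const user = await app.logIn(Realm.Credentials.anonymous());
  return new ApolloClient({
    link: new HttpLink({
      uri: "<YOUR-GRAPHQL-ENDPOINT>",
      // Attach the user's access token to every GraphQL request.
      fetch: (uri, options) => {
        options.headers.Authorization = `Bearer ${user.accessToken}`;
        return fetch(uri, options);
      },
    }),
    cache: new InMemoryCache(),
  });
}
```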
01:00:01
Let's see, what else do I have? I have wildcard, because this is the function that I implemented, so I can search for Szczesny, and it works again. Yep, that's it, I think. There is also, again, a bonus challenge, if you wanna try this afterwards.
01:00:23
You can implement custom scoring by connecting your custom scoring function from the previous bonus challenge. That's everything I wanted to show you today. Let's wrap this up. Again, this is the link for the workshop. There is no tracking or whatever.
01:00:42
I will just see how many people clicked on it, so please scan it so I can come again next year. That's everything I wanted to show you again. I hope you enjoyed it. I'll be here, so talk to me after the session. And if you have any questions, or I don't think you're shy, but if you are, you can
01:01:00
find me on LinkedIn or Twitter, and I hope you enjoyed it. Thank you. Thank you, Stanimira. Do we have any questions? Yeah, I think it's... just because of the online audience, yeah.
01:01:23
So I've never heard of MongoDB Atlas before, actually, so thank you for presenting that. I assume that all of that Atlas functionality only ever works with the cloud-hosted MongoDB clusters, right? Some of it, yeah. Atlas Search, yeah, for sure.
01:01:43
There are certain functionalities that can work outside of Atlas. For Community Edition it's a smaller subset; for Enterprise Edition, which is like a self-hosted MongoDB database, it's a bigger one. But yeah, if you have something specific in mind, you can ask me or...
01:02:03
Not really. All right, thank you. Sure. Yeah. Great talk, by the way. So I have a question. When you implemented the custom function, you specified the database and the collection,
01:02:20
right? Does that mean that we can use all of the databases that we connected and then have cross-database queries? Yes, that's true. That's great. Thank you. Yeah. Do we have any other questions? Yeah. I'll just quickly check online.
01:02:41
I haven't seen any yet, but let's just check. Okay. No questions online. If you eventually come up with any questions, Stanimira, are you still around? Very much around. Oh, yeah. So you can always connect with her outside and ask your questions.
01:03:06
I hope you had a good time with her and yeah, thank you very much. Thank you so much, Mira.