
Building a Semantic Search Application in Python, Using Haystack


Formal Metadata

Title
Building a Semantic Search Application in Python, Using Haystack
Title of Series
FOSDEM 2023
Number of Parts
542
Author
Celik, Tuana
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Language
English

Content Metadata

Abstract
So much of our knowledge is recorded as textual data. The knowledge is there, but extracting insights out of it is a challenge. Imagine the time you spend trying to get to that one piece of information that you know is buried somewhere in your piles of documents. In this presentation, we will approach this problem by building our own semantic search application in Python, using Haystack. Haystack is an open source NLP framework and its key building blocks support a variety of semantic search pipelines. In this presentation, we will walk through one particular application of semantic search: question answering. We will also have a look at: - What tasks semantic search enables - Key building blocks - How to leverage Haystack’s open source tooling to use the latest resources in NLP
Transcript: English (auto-generated)
All right, everyone. Can we get a big welcome to Tuana? Can you hear me?
Yes. Great. So if anyone was here for the talk before, just a disclaimer, I'm not as good a public speaker. I do not think I'd enjoy malware so much, but it's all downhill from there, so just FYI. All right. So I'm going to be talking about building a semantic search application in Python, and specifically we're going to be using an open source framework called Haystack, and
that's why I'm here. So a bit about me: I'm a developer advocate at deepset, and we maintain Haystack. And yeah, so this is some information about me, but let's just dive right into it. So the agenda I'm going to follow, I'm going to try to keep the NLP stuff quite high level, and focus on the how-to-build bit, but I do have to give a bit of a high
level explanation, so I'm going to do a brief history on what we mean by semantic search. Please do not judge me for this example: Kardashian sisters. So let's assume we have a bunch of documents, and let's see what would happen if we do some keyword search on it, and let's say we've got the query Kardashian sisters. You might get something a bit like this, which is great, and you can see that there's
some clever stuff going on here, sisters may be associated with siblings and family as well. Keyword search is still very widely used, but this is the type of result you might get from a corpus of documents you might have. But what if that's just not enough? What if I want to be able to ask something like, who is the richest Kardashian sister?
How do I make the system understand what I'm trying to get to? So for that, let's have a look at this. There might be some names you've already seen here, especially the last one there. I think everyone and their grandparents have heard of this by now, ChatGPT. So these are language models. I'm going to briefly walk through where they get such impressive functionality from.
So most of them are based on what we call transformers. And what those are doing is what I try to depict at the top here. So imagine that thing in the middle as the language model. And very, very simply put, obviously every model does something a bit different, or
for slightly different use cases, let's say. Given a piece of text, they will produce some sort of vector representation of that text. They're trained on very vast amounts of text data, and then this is what we get at the end of the day. And this is cool because it's enabled us to do many different things.
We can use those vectors to compare them to each other, like dog might be close to cat but far away from teapot, for example. And that's enabled us to do a lot of different things like question answering, summarization, what we call retrieval, so document retrieval. And it's all thanks to these transformers. And a lot of these use cases are often grouped under the term search because
actually what's happening in the background is a very clever search algorithm. So question answering and retrieval specifically can be grouped under search. All right, how does this work? And I'm very briefly gonna go through what these different types of models do,
and how they do what they do. And I'm gonna talk about the evolution from extractive models to now generative models like ChatGPT, for example. The very simple one, and we're going to build our first semantic search application with this type of model, is often referred to as the reader model, simply a question answering model.
Very specifically, an extractive question answering model. And the way these work is: given a piece of context and a query, they're very good at looking through that context and extracting the answer from it. But they do need that context. Obviously, there are some limitations to these models, because they're limited by input length.
I can't give it just infinite amounts of data. But we have come up with ways to make that a bit more efficient. And we've introduced models that we often refer to as retriever models or embedding models. These don't necessarily have to be language models. I'm going to be looking at language models.
It could also be based on keyword search that we saw before. But what they do is they act as a sort of filter. So let's say you've got a bunch of documents. Let's say you've got thousands and thousands of documents. And the retriever can basically say, hey, I've got this query, and this is the top five, ten most relevant documents that you should look at.
And that means the reader doesn't have to look through everything. So we actually gain a lot of speed out of this. All right, finally, this is all the hype today. And you'll notice, well, one thing you should notice is that the document, the context, anything like that, I've chopped it off. It's just a query.
So these new language models, they don't actually need context. You can give it context, but it doesn't require context. And this is very cool because they produce human-like answers. What they're trained to do, the task to do, is not extracting answers, it's generating answers. And I just want to point out, there are two things here.
It doesn't necessarily have to be answers. So I'm going to be looking at an answer generator, but it can just be prompted to produce some content. It doesn't necessarily have to be an answer to a question. So we've been seeing this, maybe you've seen some of these scenes lately.
So this is ChatGPT again on the theme, who is the tallest Kardashian sister. It hasn't just extracted Kendall for me. It says, the tallest Kardashian sister is Kendall Jenner. Perfect. But let's see what happens if it's not like a question. This is not my creativity, by the way, but I think it's amazing. Write a poem about FOSDEM in the style of a Markdown changelog.
This is what you get, there you go. All right, so these language models are readily available. You might have already heard these names: OpenAI, Cohere. They provide these increasingly large language models. There is a difference when we say language model and large language model, but we'll leave that to the side for now.
Let's not talk about that. There are also many, many, many open source models on Hugging Face. And if you don't know what Hugging Face is, I think very simply put, I like to refer to it as sort of the GitHub of machine learning. So you can host your open source models and other developers can use them in their projects, or even contribute to them. And what's really cool about them, like I said, is that your search results
stop being just simple search results; they are human-like answers. So now let's look at how we use these language models for various use cases. For that, I want to talk about Haystack; this is why I'm here. So Haystack is an open source NLP framework built in Python.
And what it achieves is basically what this picture is trying to show you. You're free to build your own end-to-end NLP application, and each of those green boxes is a high-level component in Haystack. There are retrievers, which we looked at, and there are readers, which we looked at.
We'll look at some different ones as well. And each of these are basically the main class, and you might have different types of readers, different types of retrievers. For example, there could be a reader that is good at looking at paragraphs and extracting answers, but there might be a reader type called table reader that's good at looking at tables and retrieving answers from that.
There are integrations with Hugging Face, so that means you can just download a model off of Hugging Face, but also OpenAI or Cohere; obviously you need to provide an API key, but you are free to use those as well. Alongside that, building an NLP application isn't just about the search component.
You presumably have lots of documents somewhere, maybe PDFs, maybe TXTs. So there are components for you to build what we call your indexing pipeline, so that you can write your data somewhere in a way that can be used by these language models. All right, so some of those components. We already talked briefly about the reader and the retriever.
We're going to be using those. There could be an answer generator, a question generator. We're not gonna look at that today, but that's really cool, because then you can use those questions to train another model, for example. Summarizer, prompt node, we're gonna very briefly look into that. But you get the idea. There's a bunch of components, and each of them might have types under them.
You can use data connectors, file converters, as mentioned, pre-processing your documents in a way that's going to be a bit more useful to the language model, for example. And of course, you need to keep your data somewhere. So you might decide you want to use Elasticsearch or OpenSearch, or you might want to use something a bit more vector optimized. And these are all available in the Haystack framework.
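For orientation, here is roughly where those building blocks live in code. This is a sketch assuming the Haystack 1.x package layout, so module paths may differ in other versions:

```python
from haystack.document_stores import ElasticsearchDocumentStore, FAISSDocumentStore
from haystack.nodes import (
    BM25Retriever,       # keyword-based retrieval
    EmbeddingRetriever,  # vector-based retrieval
    FARMReader,          # extractive question answering ("reader")
    PromptNode,          # instruction-following generative models
    PDFToTextConverter,  # file converter
    PreProcessor,        # document cleaning and splitting
)
```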
So I talked about the nodes, and the idea behind building with these nodes is to build your own pipeline. This is just an example; you really don't have to pay attention to the actual names of these components. But to give you an idea, you are free to decide what path your application should take based on a decision.
For example, here we have what we call the query classifier. So let's say a user enters a keyword. There's no point in doing fancy embedding search, maybe. So you might route it to keyword search. If the user enters something that's more like a human-formed question, you might say, okay, do some what we call dense retrieval or embedding retrieval.
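A branching pipeline along those lines might be sketched like this, assuming a recent Haystack 1.x, where `TransformersQueryClassifier` routes natural-language questions to `output_1` and keyword queries to `output_2`; the store and model choices here are illustrative:

```python
from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, EmbeddingRetriever, TransformersQueryClassifier

document_store = InMemoryDocumentStore(use_bm25=True)  # illustrative document store
keyword_retriever = BM25Retriever(document_store=document_store)
embedding_retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)

pipeline = Pipeline()
pipeline.add_node(component=TransformersQueryClassifier(), name="QueryClassifier",
                  inputs=["Query"])
# Natural-language questions go to embedding ("dense") retrieval...
pipeline.add_node(component=embedding_retriever, name="EmbeddingRetriever",
                  inputs=["QueryClassifier.output_1"])
# ...and keyword queries go to keyword search.
pipeline.add_node(component=keyword_retriever, name="KeywordRetriever",
                  inputs=["QueryClassifier.output_2"])
```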
So that's just an example. Then finally, I'm not going to get into this today at all, but let's say you have a running application. You can just provide it through REST API, and then you're free to query it, upload more files, and index them, and so on. All right, so let's look at how that might look.
First thing you do is install farm-haystack. If you're curious as to why there is farm at the beginning there, we can talk about this later, it's a bit about the history of the company. And then we just simply initialize two things, the retriever. Here we specifically have the embedding retriever. And notice that I'm giving it the document store, so
the retriever already knows where to look for these documents. And then we define an embedding model. So I mentioned that these retrievers could do keyword retrieval, or retrieval based on some embedding representation. So here we're basically saying: use this model name, it's just some model, to create the vector representations.
And then I'm initializing a reader with a very commonly used, let's say, extractive question answering model, again some model, and these are both off of Hugging Face, let's imagine. So we've got this retriever, and it's connected to a document store, and we've got a reader.
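In code, that setup might look like the following sketch (Haystack 1.x; the model names are placeholders, matching the "some model" on the slide, and the in-memory store just stands in for whatever store you use):

```python
# pip install farm-haystack
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, FARMReader

document_store = InMemoryDocumentStore()  # any document store works here

retriever = EmbeddingRetriever(
    document_store=document_store,           # the retriever knows where to look
    embedding_model="some/embedding-model",  # placeholder Hugging Face model
)
reader = FARMReader(model_name_or_path="some/qa-model")  # placeholder QA model
```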
So how would we build our pipeline? We would first initialize a pipeline, and then the first thing we add is the first node, and we're saying, retriever. I'm first adding the retriever. And that input you see, inputs query, is actually a special input in Haystack. And it's usually indicating that this is the entry point. This is the first thing that gets the query.
So okay, you've told it you've got the query. I could leave it here, and this pipeline, if I run it, given a query, it's just dumping out documents for me. That's what the retriever does. It's just going to return to me the most relevant documents. But I want to build a question answering pipeline, so I would maybe add a second node, and I would say, now,
this is the question answering model node. And anything that's the output from the retriever is an input to this node. And that's simply it. You could do this, but you could also just use pre-made pipelines. This is a very common one, so we do have a pre-made pipeline for it.
And it's just simply called an extractive QA pipeline. And you just tell it what retriever and what reader to use. But the pipeline I built before, that's just a lot more flexible. I'm free to add any more nodes to this. I'm free to extract any nodes from this. So it's just a better way to build your own pipeline.
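Continuing the sketch above, both ways of assembling it, plus the run call described next, might look roughly like this (Haystack 1.x):

```python
from haystack import Pipeline
from haystack.pipelines import ExtractiveQAPipeline

# Building it by hand: "Query" is the special entry point.
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"])

# Or the pre-made equivalent:
qa_pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

# Running a query:
result = pipeline.run(query="Who is the father of Arya Stark?")
```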
Then simply what I do is I run what now looks like a very random question, but we'll get to it. And then hopefully you have a working system, and you've got an answer. Great. All right, so I'm going to build an actual example. So I want to set the scene, and I was very lazy. This is actually the exact example we have in our first tutorial on our website.
But let's assume we have a document store somewhere, and it has a bunch of documents, TXT files, about Game of Thrones. I'm going to make this document store a FAISS document store. This is one of the options. So let's assume I've got a FAISS document store. And of course, I want to do question answering, and I want this to be efficient. So we're going to build exactly that pipeline we just saw before,
retriever followed by a reader. Specifically, I'm going to use an embedding retriever. So these are the ones that can actually look at vector representations and extract the most similar ones. And then we are going to have a reader, simply a question answering node at the end. So how would that look? I first initialize my document store.
This is basically, I'm not going through the indexing just now. We'll look at that in a bit. But let's assume that the files are already indexed, and they're in that FAISS document store. And then I've got a retriever. I'm telling it where to look: look at my document store. And I'm using this very specific embedding model off of Hugging Face.
I then tell the retriever to update all of the embeddings in my document store. So it's basically using that model to create vector representations of all of my TXT files. And then I'm initializing the reader, same thing that we did before. I'm just using a specific model off of Hugging Face.
This is trained by the company I work for, too. And then I do the exact same thing I did before. I'm just creating the pipeline, adding the nodes. And then I run, maybe, who is the father of Arya Stark. And this is what I might get back as an answer. Now the thing to notice here: the answers are very short, Eddard, Ned.
And that's because it's not generating answers. It's extracting the answer that's already in the context. So if you see the first answer below, you'll notice that there's Eddard in there. And this pipeline and this model has decided, this is the most relevant answer to you. I could have printed out scores, you can get scores, I just haven't here. And then I said, give me the top five.
And the first two or three, I think, are correct. So, okay, we've got something working. But what if I want to generate human-sounding answers? Eddard is... okay, I've got the answer, but maybe I want a system, maybe I want to create a chatbot that talks to me. So let's look at how we might do that.
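Before moving on, here is the whole extractive example stitched together as one sketch (Haystack 1.x; the model names follow the Haystack tutorial mentioned earlier, so treat them as examples rather than the exact slide content):

```python
from haystack import Pipeline
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever, FARMReader

document_store = FAISSDocumentStore()  # assumes the Game of Thrones TXT files are indexed here

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
document_store.update_embeddings(retriever)  # create vectors for every stored document

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")  # trained by deepset

pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"])

result = pipeline.run(
    query="Who is the father of Arya Stark?",
    params={"Reader": {"top_k": 5}},  # "give me the top five"
)
```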
All right, so this is going to be a bit of a special example, because I'm not going to build a pipeline. The reason for that is, as mentioned before, these generative models don't need context, right? So I should be able to just use them. We've got this node called the prompt node. And what this does is actually a special node, because you can morph it based on what you want it to do.
You might have heard recently this whole terminology around prompt engineering. And that's basically used with models that are able to consume some instruction and act accordingly. By default, our prompt node is basically told, just answer the question, that's all it does.
But you could maybe define a template for it, what we call a prompt template. So I could have maybe said, answer the question as a yes or no answer. And it would give me a yes or no answer, but obviously I need to ask it a yes or no question for it to make sense. Anyway, so I'm just using it like this, like the pure form. And I'm using a model from OpenAI.
Obviously I need to provide an API key. And I'm using this particular one, text-davinci-003. I actually ran these yesterday. So these are the replies I got. And this particular one I ran a few times. So the first time I ran, when is Milos flying to Frankfurt? By the way, spoiler alert, Milos is our CEO.
So I know who Milos is, and I know when he's flying to Frankfurt, or when he flew to Frankfurt. And I get an answer. Milos' flight to Frankfurt is scheduled for August 7th, 2020. This is really convincing sounding. Fine, okay. But this one was actually quite impressive. Again, if I ran the same exact query with this model,
I got: it's not possible to answer this question without more information. This is actually really cool, because clearly this model sometimes can infer that, hey, maybe I need more information to give you an answer. That relates to what we now refer to as hallucination. Maybe you've heard of that term also. These models can hallucinate.
They're tasked to generate answers; they're not tasked to generate answers for you that are truthful. Anyway, let's say: when is Milos traveling somewhere? I love this answer. When he has the time and money available to do so.
And then I guess I don't know which one's my favorite, this one or the next one. Who is Milos? Gosh. A Greek island. Lovely, okay. But the problem here is, you know, I could believe this. They're very realistic, these answers.
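For reference, the bare PromptNode used for those answers might be set up like this (a minimal sketch, Haystack 1.x; the API key is a placeholder):

```python
from haystack.nodes import PromptNode

prompt_node = PromptNode(
    model_name_or_path="text-davinci-003",
    api_key="YOUR_OPENAI_API_KEY",  # placeholder
)
print(prompt_node("When is Milos flying to Frankfurt?"))  # no documents, no pipeline
```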
So we're going to look at how we can use these large language models for our own use cases. And what we're going to do is basically exactly what we did for the extractive QA one. And we're going to use a component that is quite clever, because it's been prompted to generate answers based off of these retrieved documents
and nothing else. It can sometimes not work well, but there are ways to make it work well. And we won't get into like all the creativity behind it. So I'll show you the most basic solution you might get. But this is going to be what we do. It's the same exact pipeline as before. The reader has been replaced by the generator.
So I actually have Milos's ticket to Frankfurt. It was the 14th of November. And as a bonus, I thought I'd try this: my ticket, my Eurostar ticket from Amsterdam to London and back. So I got these, and they are PDFs.
And so now I'm going to start defining my new components. I've got the same FAISS document store. Embedding dimensions is not something you should worry about for now. And I'm defining an embedding retriever. Here what I'm doing is, again, I'm using a model by OpenAI.
So I'm using an API key. So this is the model I'm going to use to create vector representations and then compare it to queries. And this time I'm not using the prompt node. I'm using that clever node there called the OpenAI answer generator. And you might notice it is the exact same model as the one before.
We're going to briefly look at indexing. So we've got the PDF text converter and preprocessor. And let's go to the next slide. As mentioned before, there are pre-made pipelines, so I could have just defined a generative QA pipeline and told it what generator and retriever to use. But let's look at what it might look like if I were to build it from scratch.
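A from-scratch sketch of both pipelines might look like this (Haystack 1.x; the file names and API key are placeholders, and `embedding_dim=1536` matches OpenAI's ada embeddings):

```python
from haystack import Pipeline
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import (EmbeddingRetriever, OpenAIAnswerGenerator,
                            PDFToTextConverter, PreProcessor)

document_store = FAISSDocumentStore(embedding_dim=1536)

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="ada",          # OpenAI embedding model
    api_key="YOUR_OPENAI_API_KEY",  # placeholder
)
generator = OpenAIAnswerGenerator(
    api_key="YOUR_OPENAI_API_KEY",
    model="text-davinci-003",       # same model as the bare PromptNode above
)

# Indexing pipeline: PDF -> text -> preprocessed documents -> document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=PDFToTextConverter(), name="PDFConverter", inputs=["File"])
indexing_pipeline.add_node(component=PreProcessor(), name="PreProcessor", inputs=["PDFConverter"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
indexing_pipeline.run(file_paths=["milos_ticket.pdf", "tuana_ticket.pdf"])  # hypothetical files
document_store.update_embeddings(retriever)

# Querying pipeline: same shape as extractive QA, with the generator at the end.
# (The pre-made alternative: GenerativeQAPipeline(generator=generator, retriever=retriever).)
query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=generator, name="Generator", inputs=["Retriever"])

result = query_pipeline.run(query="Who is traveling to London?")
```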
And first, you see the indexing pipeline. So if you follow it, you'll notice that it's getting the PDF file and then writing that to a document store, given some pre-processing steps. And I then write my and Milos's tickets in there. And then the querying pipeline is the exact same as the extractive QA pipeline you saw before. The only difference is that the last bit is the answer generator, not the reader. This time though, it does have some context, and it does have some documents. What did I get when I ran the same two questions? I got, who is Milos? He's not a Greek island. He is the passenger whose travel data
is on the passenger itinerary receipt. Now this is the only information this model knows, so it can't tell me he's my CEO, because I haven't uploaded any information about my company. So don't make something up; just tell me what you know. If I run when is Milos flying to Frankfurt, I get Milos is flying to Frankfurt
on the correct date and time. And then I had that bonus in there. Who is traveling to London? I would get Tuana Celik is traveling to London. Now, what if I were to run, let's say, when is Alfred traveling to Frankfurt?
What I haven't shown you here, because I think it goes a bit too deep into building these types of pipelines: for the OpenAI answer generator, I could actually provide examples and example documents. So just in case I'm worried that it's gonna make up something somewhere, that there's a time
that this Alfred, who doesn't exist, is traveling to Frankfurt, I can give it some examples saying, hey, if you encounter something like this, just say I don't have the context for it. So I could have just run query_pipeline.run, when is Alfred traveling to Frankfurt, and it would have told me I have no context for this, so I'm not gonna give you the answer.
This model that we saw does do that sometimes. The first example we saw, it did say I don't have enough context for this, but not all the time. So this is how you might use it for your own use cases. You might use large language models for your own use cases and how you might mitigate them hallucinating. So to conclude,
extractive question answering models and pipelines are great at retrieving knowledge that already exists in context. However, generative models are really cool because they can generate human-like answers, but combining them with a retrieval augmented step means that you can use them very specifically for your own use cases.
Haystack, as I mentioned, is fully open source. It's built in Python, and we accept contributions every day. Contributions literally welcome, and I would say every release we have a community contribution in there. Thank you very much. And this QR code is our first tutorial. It is the extractive QA one, the non-cool one, but it is a good way to start.
Thank you very much. Thank you, Tuana.
We have a few minutes for questions. If you have questions for Tuana, we have three minutes for questions; otherwise you can also find her afterwards.