
ChatGPT is lying, how can we fix it?


Formal Metadata

Title
ChatGPT is lying, how can we fix it?
Number of Parts
60
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
ChatGPT was a revolution nobody was ready for. All the social channels have been flooded with prompts and answers that look fine at first glance but turn out to be fabricated. Factuality is the biggest concern about Large Language Models, not only OpenAI's product. If you build an app with LLMs, you need to be aware of this. Retrieval Augmented Language Models seem to be the solution to overcome that issue: they combine LLMs' language capabilities with a knowledge base's accuracy. The talk will review possible ways to implement it with humans in the loop.
Transcript: English (auto-generated)
Yeah, thanks a lot. Definitely, ChatGPT is one of the best buzzwords if you want to apply for this kind of conference. I'm so excited to see what next year is going to bring.
But let's focus on large language models for these 30 or 40 minutes and see whether they are really useful, or whether they are still just a toy we can play with a bit but nothing more. Because we have all seen that large language models, including ChatGPT, were picked up not only by people working in tech, like you, but also by the traditional media and journalists. So that's definitely a topic that will stay hot in the upcoming months, and there are more and more use cases for it.
But we also need to be aware that there are some limits, and that there are some ways to overcome them. The main issue we have with ChatGPT-like models is the fact that they produce responses which may sound reasonable at first glance, but turn out to be counterfactual. I tried asking ChatGPT — this is ChatGPT version 3, because it is becoming more and more complicated to find a good example that goes completely against common knowledge. I just asked for the previous capital of Germany.
And for those who are not German, the response may sound reasonable, right? Frankfurt could have been the previous capital of Germany, why not? But it turns out to be false, because Bonn was actually the previous capital — of West Germany, at least, since Germany was divided into a West and an East part. So that's a simple case in which ChatGPT failed to answer a given prompt correctly. And it's a fact that I was able to get from Google just by typing a simple query.
So this is something our model should have been exposed to during the training phase, because that's how those large language models are trained, but for some reason it couldn't remember it properly and produce the correct answer to my prompt. OK, I could accept that it may simply be wrong, but it would be way better if it could just answer "I don't know" or "I'm not sure, I don't have that information". And this is something we call hallucination: a hallucination is a response of a large language model which is not justified by the data it was trained on. This is a case we really need to avoid, but we have no idea of how to do that yet.
We need to focus a bit on the training process, because that's the reason why those large language models tend to confabulate. This is a screenshot from a great talk by Andrej Karpathy, "State of GPT", which has been released recently.
It describes the whole process of training GPT-like models. The first part, the pre-training, is the most important piece here, because it's done on huge corpora of textual data scraped from the internet. People like to say that we are training those models on the whole internet, but that's not exactly the case; they use publicly available data sets like Common Crawl. That presentation also links to some of the data sets which have been used to train at least some of those models. And basically, as you may see, they're using trillions of words to train the pre-trained model that is then used as a base for further fine-tuning. And that's one of the reasons for hallucinations: we are taking in data of low quality.
Well, my mom used to say that I should be reading books instead of sitting at my computer reading the internet, and she was right about that. And probably we should do the same when training models, if we really care about the quality of the data we put in. Any single book, at least the better ones, goes through some sort of review process, so we can expect that the information put inside a particular book has been confirmed by somebody else, not only the author. That would definitely be a better data set for training general language models, because they would rely on factual information, not on whatever somebody decided to put on the internet. But scraping the internet is what we actually do, because it's just way easier.
And this base model is basically a network trained to predict the next token, nothing really else. The base model is then used in further fine-tuning: we use less data, but of a higher quality, and that's actually where it learns to really work for some specific cases. We have the supervised fine-tuning phase, and then the model can be fine-tuned even further with an additional model in the process: we can create a reward model that ranks the responses returned by the system. Actually, this is a pretty complicated process, but that presentation is available, so if you are interested in more details, please watch it carefully, because it shares a lot of cool things about the training process of these kinds of models. There are actually three different models along the way that we could deploy. You can create a large language model based only on this publicly available huge amount of data, and it will also work pretty well for some cases. But if you put garbage in, you can expect garbage out — and that's exactly what happens with large language models.
The answers provided for our prompts or queries may sound reasonable, but the model is just predicting the next tokens based on the prompt and the rules or knowledge it learned during the training phase. That's the whole objective of the training, and that's what those networks also do during inference. There are two components which go into the prediction. The first one is the internal state of the network that was created during the training phase. That surely includes some rules, but those might be grammar or language rules in general, also including some, let's say, common knowledge. But there is no easy way to check what the model learned during this journey. That's actually one of the issues: there is no way to validate that it really was trained properly and can predict not only the next token, but also factual statements. And then there is the context that we put into the model. Whenever you send a prompt, whenever you query this kind of system, this prompt is used in order to produce those tokens as a response. That includes the prompt you put in, but if you use an interface like ChatGPT, it also involves putting the previous conversation into the context — that happens under the hood. You are not really aware of it, but it has to be done in order to include the context of the whole conversation, not only the latest prompt. And the other thing that causes hallucinations in large language models is the fact that they see the world as if it were frozen.
That's basically because the training process is fairly similar to any other training of neural networks: we need to collect some data, and we need to cut it off at some point just to start the training. That happened for ChatGPT — the cut-off date for ChatGPT was September 2021. So if you ask questions about anything that happened after, it may simply struggle with that. Right now, ChatGPT already responds with answers like "I only have data up to September 2021, and if you are asking about the future, I cannot tell you that." But that behaviour was fine-tuned in after the preview phase. Actually, these models gained a lot just because many people started sending prompts, and the team was able to catch up with all the possible mistakes the model might be making.
And this is completely different if we compare it to humans. That's why I do not like calling it AI at all. This is just another machine learning system, maybe a more sophisticated one, but still, I don't want to use the term AI for a system that is trying to predict the next token. That doesn't seem to match what I understand as intelligence. So contrary to humans, those large language models do not learn continuously. You might be interacting with them for a while, sending different prompts, and the model may seem to remember those facts. But if you just refresh the page in your browser, you won't be able to retrieve the information that you provided before. That's because those models are not trained in real time. That's a process that requires lots of computational power — we would need to come back to the slide about the training process, but it's basically a matter of days in the best case. And it cannot be done on all the incoming prompts, because people may be interacting with those systems in ways that we would never want the model to remember. So continuous learning is something we cannot expect from those models, yet I feel that we all have some high expectations. I really recommend reading the article mentioned here; it's all about the high expectations that we all have for those large language models.
It's exactly the same thing: they are trying to predict the next token. So they may be really good at some language tasks, but they should not necessarily be used as an oracle that we ask a question and expect a correct answer from. Really great article, and there is one lesson to be learned from it: we cannot expect these kinds of models to provide us factual answers, but we can use them to solve language tasks, because that's in fact what they were trained for. Those language tasks include rephrasing or summarization, and that's already something that may help us build some really cool applications based on large language models. This is one of the possible ways to avoid hallucinations: we can think about extending the context and rephrasing the task that we want this kind of system to solve. So it's no longer a knowledge-related task, but rather a language task that they should hopefully be able to solve, because that's something that already happened during the fine-tuning process — better quality data was used to solve this kind of task. That's what we can expect to work pretty well. Extending the context with some data, though, requires us to possess this information already.
So you may ask: what's the reason for including this data if I already have the answer? But in fact, the answer itself is not something we need to have. We only need some sort of reliable knowledge base and a way to extract relevant data to be put into our prompt to extend the context. If I ask the same question but put a single paragraph from Wikipedia at the end of my prompt — that's actually the first paragraph of the whole article about Germany, I think — then the task becomes way easier, because now the model can simply use the provided information, summarize it, and answer my question by extracting the single piece of information that I wanted it to.
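In code, that kind of manual context extension can look roughly like the minimal sketch below. It uses the pre-1.0 openai Python package; the file name holding the retrieved paragraph is made up for the example.

```python
# Minimal sketch of answering a question with extra context pasted into the prompt.
# Assumes the retrieved paragraph is stored in a local file (hypothetical name).
import openai

context = open("germany_intro.txt").read()  # e.g. the first paragraph of the Wikipedia article

prompt = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say that you don't know.\n\n"
    f"Context:\n{context}\n\n"
    "Question: What was the previous capital of Germany?"
)

response = openai.ChatCompletion.create(   # pre-1.0 openai client interface
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,                         # keep the answer deterministic
)
print(response["choices"][0]["message"]["content"])
```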
I cannot put whole Wikipedia into every single prompt. Some say it should be possible to put even your whole knowledge base if you work in a small company. But those models, and I might be wrong here because I couldn't find a reliable source for that information.
But I just found some comments on some forums. There is a limit of tokens, and this is the number of tokens you can provide into a single prompt of ChatGPT-like model. So that means a single token doesn't mean a single word. So it's even less if you count words. And it's quite hard to predict how many words you can put
because that's language dependent. But that basically means that we not only need to provide some relevant information into our prompts, but we also need to extract only those pieces which might be relevant. And that's a challenge that we are trying
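If you want to check how much of that budget a given text actually consumes, you can count tokens yourself. A minimal sketch with the tiktoken library (cl100k_base is the encoding used by the GPT-3.5/GPT-4 family; the example sentence is just an illustration):

```python
# Count how many tokens a piece of text takes in a GPT-style prompt.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-3.5-turbo / gpt-4

text = "Bonn was the capital of West Germany until reunification."
tokens = encoding.encode(text)

print(len(text.split()), "words")   # word count for comparison
print(len(tokens), "tokens")        # usually more tokens than words
```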
But that basically means that we not only need to provide some relevant information in our prompts, we also need to extract only those pieces which might be relevant. And that's a challenge we have been trying to solve for ages: it's the problem of search, and of how to handle it in these new circumstances. There are some attempts to extend those context windows. Anthropic's model claims to support a 100,000-token context window, which seems quite huge. But if you work in any company that has some Slack, some Confluence, a Notion, or whatever other system you use to share the internal knowledge of your company, then if you just combine all those texts together and calculate their length, it will simply turn out to be not enough. And there are some other issues with those huge context windows. For example, if we just increase the context window,
we can actually pass some more data, so this is pretty useful. But we still need to know what to put inside the single prompt, so we are still not solving the problem of search: we still need to find the relevant documents and put them inside our prompt. We can include some more examples — instead of putting in just three, we can maybe put in, I don't know, 100 — but still, there is a limit, just an obviously higher one. That's another point: we need to know what to put inside. Another problem is that we cannot really expect our knowledge base to be limited even to that large number of documents. And this is pretty damn expensive. When you use those models, you usually pay per token used in your prompt, so if you were putting around 100,000 tokens into every single query, then every single call to that API would cost you about $1, if I remember properly. That's not something you could afford in the long term if you support every single user query on your system with LLMs. It also takes a lot of time — passing that data into the prompt and sending it over some communication channel like HTTP definitely needs more time than just sending, I don't know, 100 characters or even 1,000. So there is a solution that many people claim to be, right now, the only way to overcome those problems.
And this is retrieval-augmented language models. It is basically an extension of large language models: we are still using them, but we support the prompt formulation by putting in some additional context in an automated way. For that, we need a reliable search system, and in this case it has to be semantic search, because keyword-based search doesn't apply that well — I will try to describe that further on. So what do I mean when I speak about semantic search? Basically, we are using deep neural models which are capable of encoding any type of data into fixed-dimensional vectors. And those vectors have this great property of being close to each other if they represent a similar concept. So if you take two different pieces of text describing the same idea, you may expect that the vectors produced by such a neural network will be close to each other in some way. We usually calculate this closeness using cosine distance or dot product — there is a variety of options available, but 90% of the time we are using cosine distance.
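As a quick illustration, here is a minimal sketch of that idea using the sentence-transformers library (the model name is just a common default, not necessarily the one you would use in production):

```python
# Encode a few texts and compare them with cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

a = model.encode("Bonn was the previous capital of West Germany.")
b = model.encode("Before reunification, the West German capital was Bonn.")
c = model.encode("Striped blue shirt made from cotton.")

print(util.cos_sim(a, b))  # high score: same idea, different wording
print(util.cos_sim(a, c))  # low score: unrelated concepts
```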
So this is basically it: semantic search is all about comparing the vectors, but comparing them at a really high scale, because your knowledge base may consist of thousands of documents. And if you want to do that, you need proper tooling. Semantic search is really important for creating those retrieval-augmented language models. It relies on the ability of deep neural networks to encode different texts describing the same idea into a vector space in which those vectors will be close to each other. In that example, "striped blue shirt made from cotton" is exactly the same thing as "cotton-made maritime shirt". The idea that somebody had in mind while creating this query was exactly the same; they just described it differently. And we can use that to our advantage, because it can already capture different queries, even though some of the words were never used in our documents. Semantic search, as opposed to keyword-based search, can already handle synonyms and many languages at the same time. There are some pre-trained models available, also provided as APIs by OpenAI or Cohere, and there are plenty of companies offering this kind of system.
So you don't necessarily need to train your own models, but you can always fine-tune the existing ones so they reflect your knowledge in a better way. And in order to serve vector search in production, you need to use some sort of vector database. Qdrant is one of the possible solutions: it's written in Rust and mostly focuses on performance. There are various ways to interact with it — you can run it on a single machine for development purposes, but you can also seamlessly switch to a cluster if you want to. And there are also additional filters that can be applied on top of vector search. Let's say you are looking for a piece of clothing similar to the one presented on an image, but in addition to that, you also want it to be in a specific color or made from a specific fabric. That can also be done, and since it cannot be easily captured by the vector, you need those additional filters to be applicable as well.
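A minimal sketch of that kind of filtered vector search with the qdrant-client Python package — the collection name, payload fields, and the 384-dimensional vectors are made up for the example, and the in-memory mode is only meant for local experiments:

```python
# Vector search combined with payload filters in Qdrant (local in-memory mode).
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)

client = QdrantClient(":memory:")  # for development; point it at a server/cluster in production

client.recreate_collection(
    collection_name="products",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

client.upsert(
    collection_name="products",
    points=[
        PointStruct(
            id=1,
            vector=[0.1] * 384,  # in reality: the embedding of the product description
            payload={"text": "striped blue shirt made from cotton",
                     "color": "blue", "fabric": "cotton"},
        ),
    ],
)

hits = client.search(
    collection_name="products",
    query_vector=[0.1] * 384,  # in reality: the embedding of the query or image
    query_filter=Filter(must=[FieldCondition(key="color", match=MatchValue(value="blue"))]),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload["text"])
```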
This is how the interaction with large language models looks without retrieval augmentation: we just send a prompt, and it is sent directly into the large language model. It's up to you whether you want to put some additional context in, and if you have no idea what should be put into it, you may expect the large language model to simply hallucinate. But if we decide to extend the system to include this retrieval phase, we need another component: an embedding model that converts our prompts, our queries, into a vector representation. You are free to choose whatever works for you — there is huge room for experiments. And you need a knowledge base; a vector database such as Qdrant may play the role of this knowledge base, because it can index the vectors for your documents. Those vectors can then be used to extract relevant documents whenever a new prompt is sent. So when our user sends a prompt, it is enriched with relevant information, which is sent along with the original query into the large language model. This is a pipeline that is implemented in many different libraries that are becoming more and more popular nowadays, but you can also do it on your own — it's not that fancy and not that complicated to implement. So how do we build this knowledge base? It's something that requires some offline processing.
You can index all the documents that you have in your organization, or whichever documents you would like to use — and those might include private data that by no means could have been included in the training phase, not even in the first round. We divide longer documents into chunks, those chunks are vectorized by our embedding model, and the resulting vectors are indexed in the knowledge base along with the text that was used to produce them.
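A rough sketch of that offline indexing step, again with sentence-transformers and qdrant-client — the chunking here is a naive paragraph split, and all names and documents are illustrative:

```python
# Offline step: chunk documents, embed the chunks, index them in the vector database.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(":memory:")  # or a real Qdrant server in production

client.recreate_collection(
    collection_name="knowledge-base",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

documents = [
    "Bonn was the seat of government of West Germany until reunification.",
    "Berlin is the current capital of Germany.",
]

points = []
for doc_id, doc in enumerate(documents):
    for chunk in doc.split("\n\n"):            # naive chunking by paragraph
        points.append(
            PointStruct(
                id=len(points),
                vector=encoder.encode(chunk).tolist(),
                payload={"text": chunk, "doc_id": doc_id},
            )
        )

client.upsert(collection_name="knowledge-base", points=points)
```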
Then, whenever we receive a prompt, we can ask for relevant candidate documents and include them in the prompt. We typically set a limit: we want the k closest, most relevant documents for our prompt.
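And the query-time side of the pipeline, continuing the indexing sketch above and reusing its client and encoder: embed the user prompt, fetch the k most relevant chunks, and paste them into the prompt that goes to the LLM (again with the pre-1.0 openai interface).

```python
# Query step: retrieve the k most relevant chunks and build the augmented prompt.
import openai  # pre-1.0 client interface

question = "What was the previous capital of Germany?"

hits = client.search(                         # `client` and `encoder` from the sketch above
    collection_name="knowledge-base",
    query_vector=encoder.encode(question).tolist(),
    limit=3,                                  # k most relevant chunks
)
context = "\n\n".join(hit.payload["text"] for hit in hits)

prompt = (
    "Answer using only the context below. Say 'I don't know' if it is not there.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(answer["choices"][0]["message"]["content"])
```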
But it doesn't seem to solve all the issues that we have with hallucinations and large language models — there is still something else that we need to do, and that is prompt engineering. I really like the term: we are now trying to find people who are capable of talking to the oracles that we created before. Here is what may happen if you just ask ChatGPT a really simple question. I asked for five countries whose names start with R, and this is the response. The thing to know is that there are only three countries in the whole world whose names start with the letter R.
But because I asked for five, the model wanted to make me happy, because it was rewarded during training for providing answers — and that was the task: I wanted five answers, so it provided five. That's an issue, because I might just assume this is a correct answer if I include it in my pipelines. If I were directly using those responses for some further processing, I would be relying on fake information. However, if I allow the model to return a shorter list when there is not enough data, not enough information, it changes its mind and tries to replace the obviously wrong entries, like Qatar on the previous slide, with some republics — which is right to some extent, but there are plenty of republics around the world, and that's not exactly what I wanted it to produce. So I also need to exclude those republics. I need to know how the model behaves in specific cases, and I need to make sure that it won't be making that mistake.
And finally, if I formulate my prompt the right way, it is able to produce a list of results which not only fulfil my criteria — I wanted five — but are also true.
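The exact wording from the slide isn't reproduced here, but the kind of reformulation described above can look roughly like this illustrative pair of prompts:

```python
# Naive prompt: the model tends to invent entries to reach the requested count.
naive_prompt = "List 5 countries whose names start with the letter R."

# Reformulated prompt: give the model an explicit way out and forbid guessing.
better_prompt = (
    "List up to 5 countries whose names start with the letter R. "
    "Only include sovereign countries, not republics in general. "
    "If there are fewer than 5, list only those you are certain about "
    "and say that no more exist. Do not guess."
)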
And that's something that we still need to learn: how to formulate the prompt so that we retrieve only factual information from our models. Prompt engineering is hard, I have to say. I was trying to figure out some different approaches, and there are cases in which you would expect the model to answer your queries properly, but it just doesn't — we still need to experiment a little. And if you change your large language model — let's say you were dealing with LLMs for a while, experimenting with ChatGPT, and then you switch to, I don't know, LLaMA — then it turns out that you need to learn it all from the very beginning, because the prompts have to be formulated differently. That's what happens if you experiment with different models. But first of all, we need to allow our models to not provide an answer if they do not know it. That's exactly what we need to put into the prompt, because they were rewarded for providing answers, so there is no reason why they would answer
that they don't know it. And that's something we need to include explicitly. You also want to check that the reasoning scheme of the model is correct, so a quite common strategy nowadays is to ask for the reasoning. If you include that kind of request in your prompt, you are able to track step by step how the model was retrieving the information — or at least you may expect it to return that reasoning, because it might still be hallucinating the reasoning scheme as well; that's not something we can avoid in all cases. But the most important thing is to turn the knowledge-oriented prompt into a language task, so summarization or rephrasing of the documents seems to be the way to overcome those problems. And it's also great to expect the model — to include that in our prompts — to return the sources: if you provide N documents in the context, you can ask it to return some sort of identifiers or the number of the document that it used to produce the answer.
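Put together, those prompt-engineering hints might end up in a template along these lines — an illustrative sketch, not the exact prompt from the talk:

```python
# Illustrative prompt template combining the hints above:
# answer only from the context, allow "I don't know", show reasoning, cite sources.
def build_prompt(question: str, documents: list[str]) -> str:
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "You are given numbered documents. Answer the question using only these documents.\n"
        "If the answer is not contained in them, reply exactly: I don't know.\n"
        "Explain your reasoning step by step, then give the final answer\n"
        "followed by the numbers of the documents you used.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("What was the previous capital of Germany?",
                   ["Bonn was the seat of government of West Germany.",
                    "Berlin is the current capital of Germany."]))
```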
But there are still lots of challenges. How do we keep our knowledge base up to date? It is not fixed, and many events happen every single day, so we need a process that keeps the knowledge base fresh and provides the relevant information as soon as it has to be provided. We also need other constraints applied on top of vector search. If you ask for the president of the US and the model's cutoff was September 2021, it might simply be wrong. But even if you have a database
and you have all the information about the past presidents of the US, if you just put all that data into the prompt, it might choose any of them. So we need additional constraints applied on top, and that strongly depends on the use case. One more thing to note: quality assurance with LLMs is not that easy. It is a problem that people try to solve by including yet another model in the whole pipeline, so a different model verifies the answers of the large language model. But still, we do not have a way to do that properly yet — that's really tough.
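One possible shape of that extra verification step — a hedged sketch of an "LLM as judge" call, again with the pre-1.0 openai package and a made-up helper name; it is one way people attempt this, not a solved recipe:

```python
# Ask a second model whether the generated answer is supported by the retrieved context.
import openai

def is_supported(answer: str, context: str) -> bool:
    verdict = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                "Context:\n" + context + "\n\n"
                "Answer:\n" + answer + "\n\n"
                "Is every claim in the answer supported by the context? Reply YES or NO."
            ),
        }],
        temperature=0,
    )
    return verdict["choices"][0]["message"]["content"].strip().upper().startswith("YES")
```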
When it comes to tools which can be used for all of that, the most popular one nowadays is LangChain. But if you have worked with LangChain, you may know that they are trying to stay really up to date with all the recent developments in large language models, and that comes with a cost: it's quite hard to maintain a LangChain application, because the release cycle is so fast that they try to release a new version almost every single day. And if you work with open source, you know that it should take more effort to test things thoroughly and make sure that nothing that already works on top of those libraries gets broken. Haystack and LlamaIndex seem to be better alternatives: they are more mature projects, so to say, and they try to keep the release cycle at a proper pace and at a high level of quality. So that's definitely something I would recommend using.
If you have any questions, I hope we still have a few minutes for them. If not, this QR code will direct you to my LinkedIn profile — please feel free to drop a message anytime. And here are my socials.
So thanks so much for your presentation. It was a bit daunting to get here, but you did very, very well. I think there will be some questions — who has a question for Lucas about this?
Seems to be all clear. Ah — I have two questions. Thanks for the presentation and for the ways of overcoming the challenges. My question is: have you ever tried, or heard of, the option of doing the retrieval afterwards? So basically I ask a rather vague question, I get a response back that hopefully has some truth in it, and then I search for those claims and verify whether they actually are in our retrieval system. Well, I haven't tried it yet.
That sounds like something that might possibly be one of the ways to make sure that the quality of the responses is good. I'm not really sure — I haven't used that so far, but it's a cool idea and I would definitely try it out. Okay, I just want to follow up a little bit, because my problem is that my questions would probably be too vague to actually search in the database in the first place, so I want them to be expanded first before I search. Yeah, that may work. I would definitely love to hear more about the use case you are trying to solve, because if the queries are that limited,
then it would be great to just understand that — maybe you need a different retrieval process under the hood as well. Semantic search works great with retrieval-augmented language models, just because the queries being used are more on the long-tail side: they're pretty long, they may include some keywords, but they come from natural language. So semantic search works way better than keyword-based search there. But maybe you can think about having a hybrid, or even a keyword-based search, if that works for your specific case. It really depends on the use case, but I would love to hear more about what you are trying to solve.
Cool, thank you very much.