Haystack: Semantic Search Ecosystem
Formal Metadata
Number of Parts: 48
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/68806 (DOI)
Berlin Buzzwords 2020, 23 / 48
Transcript: English (auto-generated)
00:09
As you heard, my name is Erica Lecision, and I'm a principal software engineer at EBSCO Information Services. My primary focus is to design search solutions for EBSCO clinical decisions.
00:23
We are the leading provider of evidence-based content for point-of-care practitioners. Clinicians use our search engines to diagnose and treat patients. Today, I'm going to talk to you about our search ecosystem.
00:40
This is really all of the code and infrastructure that come together to create a search experience for our customers. I'll start off with an overview of our search prep environment. This is where we take the content that has been written by our editorial staff, and we transform it into something we can work with.
01:02
We then semantically enrich and decorate this content prior to indexing. It is then available in our search runtime environment. The medical search platform and query intelligence work together to construct runtime-specific
01:20
search strategies based on user intent. We also incorporate two types of feedback into the system, user and expert feedback. Our expert feedback area is really where we engage with our subject matter experts. We work with them to create judgment lists,
01:42
otherwise known as golden sets. These really represent the target that we strive for when tuning our search results. Additionally, we invite them to curate our medical knowledge graph and participate in surveys and experiments.
02:02
We also observe how our end users interact with our search results and incorporate that feedback back into the system. At the heart of our ecosystem is our medical knowledge graph. This allows us to semantically understand our domain.
02:21
Now, today I'm gonna give you a high-level overview of each of these areas, but if you're looking for more information, I invite you to check out Haystack's website. You'll find there a presentation that I did last year on query intelligence. Additionally, I'll be speaking in August about all of the recent work
02:40
that has been done on our medical knowledge graph. Our search ecosystem is really about two to three years old. The majority of the work that the team does is really greenfield initiatives, as well as migrating from an on-prem environment to AWS.
03:01
We currently have about six microservices in production and a handful of on-demand applications. Our medical knowledge graph was conceived about five years ago. It has since gone through quite a bit of transformation over the years.
03:23
It is currently composed of about 350,000 concepts, 700,000 terms, and 2.5 million relationships. Now, terms are primarily our entry point into our knowledge graph.
03:41
It really represents ways that users would search in our system, and it can also represent synonyms. Once we have an entry point into the graph, we can then make a correlation between a term and a concept. Now, we also can leverage relationships so we can traverse through this knowledge graph
04:02
in order to enhance our search experience. We use this medical knowledge graph as part of our search prep environment, as well as our search runtime environment. Our content ingestion pipeline consults heavily with our medical knowledge graph
04:20
in order to semantically understand the text. It then stores it in our Elasticsearch index. Now, naturally, our search runtime environment has to speak in that same language, so it also relies heavily on our medical knowledge graph prior to reading from our index.
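(For illustration only: a minimal sketch of the term/concept/relationship structure described above. The class names, fields, and tiny example graph are assumptions made for this sketch, not EBSCO's actual schema.)

```python
from dataclasses import dataclass

@dataclass
class Concept:
    """A node in the knowledge graph, e.g. 'fever of unknown origin'."""
    concept_id: str
    preferred_label: str

@dataclass
class Term:
    """An entry point into the graph: a way users search, or a synonym."""
    text: str
    concept_id: str  # the concept this term resolves to

@dataclass
class Relationship:
    """A typed, directed edge used for traversal (e.g. brand -> substance)."""
    source_id: str
    target_id: str
    rel_type: str

# A tiny illustrative slice: two terms resolving to the same concept,
# plus one made-up relationship for traversal.
concepts = {"7520000": Concept("7520000", "fever of unknown origin")}
terms = [Term("fever of unknown origin", "7520000"), Term("fuo", "7520000")]
relationships = [Relationship("7520000", "1234567", "IS_A")]  # target ID is invented
```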
04:41
So let's take a peek at our search prep environment. This is an architectural overview of our content ingestion pipeline, and this particular pipeline really handles two main use cases. First, it handles a use case where a content editor has published a piece of content
05:02
and it is ready to be searchable. Another use case is when we completely reprocess all of the content in our system. We refer to this as a content ingestion process. Now, regardless of either use case, the pipeline behaves the same.
05:21
It essentially listens for these events and then takes action. The first thing it does is it goes out to S3 and it fetches that particular piece of content. It goes ahead and transforms it into internal POJOs. We then take chunks of that content
05:41
and we pass it off to a concept mapper in order to semantically understand its meaning. Now, we actually do quite a bit of different types of enrichment or decorating against this content, but the most compelling impact is really the result of our semantic enrichment.
06:01
It is then saved into our Elasticsearch index and then we go ahead and we notify the other services of the outcome of this particular event. I'm gonna go ahead and take a deeper dive into our semantic enrichment process. Here is a snippet sample of text
06:20
that we're gonna go ahead and enrich. The bold here represents a content title, "Fever of unknown origin (FUO) in adults"; below it is a snippet of text. And essentially what we do is we pass these chunks of text to our concept mapping endpoint. And really what the endpoint does
06:41
is it tries to create a connection between these tokens and concepts in our graph. For example, fever of unknown origin is associated with concept ID 7520000. We then go ahead and semantically wrap the original text
07:02
with these concept IDs. Now, FUO is actually an acronym for fever of unknown origin. So we go ahead and we semantically tag it with the same concept ID. In is a stop word. And so therefore we do not semantically enrich it
07:21
because it will not add any benefit. So this is an example result of what it looks like after semantic enrichment has run. You can see here, fever of unknown origin, as well as FUO are both tagged with the same concept ID. You can also observe instances
07:41
where a single piece of text is associated with multiple concepts in our graph. For example, temperature is associated with three concepts. Now this actually might be a great area for us to curate so that we can reduce the number of definitions associated with temperature.
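(A rough sketch of the wrapping step just described. The `<cid=...>` tag format, the lookup dictionary, and the function are assumptions for illustration, not the production concept mapper, which is a service backed by the knowledge graph.)

```python
import re

# Hypothetical term -> concept ID lookup; the real mapping comes from the knowledge graph.
TERM_TO_CIDS = {
    "fever of unknown origin": ["7520000"],
    "fuo": ["7520000"],
    "temperature": ["1001", "1002", "1003"],   # ambiguous: three concepts
}
STOP_WORDS = {"in", "of", "the"}   # stop words like "in" are never tagged

def enrich(text: str) -> str:
    """Wrap known terms with concept-ID tags; leave stop words and unknown text alone."""
    # Longest terms first so "fever of unknown origin" wins over any shorter match.
    for term in sorted(TERM_TO_CIDS, key=len, reverse=True):
        if term in STOP_WORDS:
            continue
        cids = ",".join(TERM_TO_CIDS[term])
        text = re.sub(
            rf"\b{re.escape(term)}\b",
            lambda m, cids=cids: f"<cid={cids}>{m.group(0)}</cid>",
            text,
            flags=re.IGNORECASE,
        )
    return text

print(enrich("Fever of unknown origin (FUO) in adults"))
# -> "<cid=7520000>Fever of unknown origin</cid> (<cid=7520000>FUO</cid>) in adults"
```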
08:00
You'll also observe that there are some portions of text that do not get enriched. And this is essentially because we either don't have a definition in our knowledge graph, or semantically enriching it really wouldn't add any value. So really what we're talking about here
08:21
is taking large chunks of text and comparing it against a knowledge graph. So as you can imagine, you might encounter some particular problems when you do this. Now the issues that we encounter primarily are around acronyms. So let's take a look at some examples. This particular snippet of text
08:41
is "in those with heart failure, who have had." Now the terms "who" and "had" are actually correlated to acronyms in our graph. The term WHO is defined as World Health Organization, and HAD is defined as Hospital Anxiety and Depression.
09:02
Now that's actually not the usage in this particular sentence. So although we have identified these as acronyms, we do not semantically enrich them. So we have a set of rules that we follow to ensure that we don't overly tag our content. We also have to deal with ambiguous acronyms.
09:23
Symptoms of AF is an example of this. AF is associated with many concepts in our graph. And as you can see, when I pass this snippet of text to my concept mapper, I'm able to easily resolve symptoms, but AF is really associated with way too many concepts
09:41
for me to tag it. There would really be no value in me adding all of these concept IDs to this particular acronym. So in order to mitigate this issue, we ask that our content writers follow the rule of first occurrence. And what this does is it allows them to define the meaning of an acronym prior to usage in the content.
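(A toy illustration of how a first-occurrence rule like the one just described could be detected. The pattern, heuristic, and function name are assumptions, not the production logic.)

```python
import re

def register_acronyms(text: str) -> dict:
    """Collect expansions defined on first occurrence, e.g. 'atrial fibrillation (AF)'."""
    acronyms = {}
    for match in re.finditer(r"\(([A-Z]{2,6})\)", text):
        acronym = match.group(1)
        # Guess the expansion as the len(acronym) words preceding the parenthesis,
        # and accept it only if the initials line up with the acronym.
        preceding = text[: match.start()].split()
        candidate = preceding[-len(acronym):]
        if "".join(w[0] for w in candidate).lower() == acronym.lower():
            acronyms.setdefault(acronym, " ".join(candidate).lower())
    return acronyms

doc = "Patients with atrial fibrillation (AF) may be asymptomatic. Symptoms of AF worsen..."
print(register_acronyms(doc))   # {'AF': 'atrial fibrillation'}
```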
10:04
You can see in this particular example that AF is defined by atrial fibrillation. They can then use it freely throughout the content and we now understand its meaning and usage. So now we're able to disambiguate AF
10:22
by comparing it to the expansion that was provided to us. So in this case, we create that connection between atrial fibrillation and AF. This is an example of what our index looks like after semantic enrichment has run. You can see here a mixture of keywords
10:41
as well as concept IDs. Ultimately, when we tokenize this text, we strip out the CID tags and what we're left with are the raw concept IDs as well as the original text. So that summarizes our search prep environment.
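(Before moving on, a minimal sketch of the kind of CID-stripping tokenization just described, keeping both the raw concept IDs and the original keywords. The tag syntax and function are assumed for illustration.)

```python
import re

CID_TAG = re.compile(r"<cid=([\d,]+)>(.*?)</cid>")

def tokenize_for_index(enriched_text: str) -> list[str]:
    """Strip CID markup, emitting concept IDs alongside the original keywords."""
    tokens: list[str] = []
    last = 0
    for match in CID_TAG.finditer(enriched_text):
        tokens.extend(enriched_text[last:match.start()].split())   # plain text before the tag
        tokens.extend(match.group(1).split(","))                   # the raw concept IDs
        tokens.extend(match.group(2).split())                      # the original tagged text
        last = match.end()
    tokens.extend(enriched_text[last:].split())                    # any trailing plain text
    return tokens

print(tokenize_for_index("<cid=7520000>fever of unknown origin</cid> in adults"))
# ['7520000', 'fever', 'of', 'unknown', 'origin', 'in', 'adults']
```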
11:02
Let's now talk a little bit about our search runtime environment. This is an architectural overview of this environment. What happens is we receive a request against the medical search platform. And the first thing it does is it tries to understand the user's intent.
11:22
It sends the query off to query intelligence and query intelligence performs a series of operations. First, it needs to segment the query. It then has a series of decisions it needs to make. Should it expand the criteria that we use? Should it narrow it or perhaps should it keep it?
11:42
Ultimately, what query intelligence does is it returns a set of recommendations to the medical search platform so that it can construct the best runtime search strategy against Elasticsearch. So let's go ahead and take a peek at how we segment queries in query intelligence.
12:05
So this is an example query, paralysis facial nerve. And if we were to use a brute force approach, we could say that these three tokens are associated with four concepts in our graph. You can see here that paralysis
12:21
would be associated with two concepts, whereas facial and nerve are associated with their own. However, if we go ahead and we look for the longest matching tokens, we can become more specific. We can then identify that facial nerve actually corresponds to a single concept in our graph.
12:41
So here we've been able to reduce three tokens to three concepts. But if we actually take permutations into account, we can become even more specific and more targeted. Ultimately, what we're trying to do here is come up with the most precise best case scenario.
13:03
Oftentimes when users enter terms into a search box, they don't really take the order into account. So by looking for the longest matching tokens and also performing permutations along with fuzzy matching, we can then create a more specific mapping between terms and a concept in our graph.
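(A simplified sketch of longest-match segmentation over token permutations, under the assumption of a small in-memory term lookup; the real lookup also applies fuzzy matching, which is omitted here.)

```python
from itertools import permutations

# Hypothetical term -> concept lookup standing in for the knowledge graph.
TERM_TO_CONCEPT = {
    "paralysis": "C_paralysis",
    "facial nerve": "C_facial_nerve",
    "facial nerve paralysis": "C_facial_nerve_palsy",
}

def segment(query: str) -> dict:
    """Prefer the longest phrase (over any token order) that maps to a single concept."""
    tokens = query.lower().split()
    # Try the largest spans first, over every permutation of the query tokens.
    for size in range(len(tokens), 0, -1):
        for perm in permutations(tokens, size):
            phrase = " ".join(perm)
            if phrase in TERM_TO_CONCEPT:
                remainder = [t for t in tokens if t not in perm]
                return {phrase: TERM_TO_CONCEPT[phrase],
                        **{t: TERM_TO_CONCEPT.get(t) for t in remainder}}
    return {}

print(segment("paralysis facial nerve"))
# {'facial nerve paralysis': 'C_facial_nerve_palsy'}  -- three tokens, one concept
```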
13:24
And this is important because we wanna create that bridge between the way someone searches and the way content is structured. Here you can see that our content is actually written as facial nerve palsy, but by semantically understanding that and additionally performing the query segmentation,
13:43
we can create that bridge between the two worlds. Now, just because we semantically understand the query doesn't mean that we really understand the intent of the user. Headache is an example of this. Headache is a really straightforward mapping
14:02
in our knowledge graph. You can see that we have quite a bit of rich synonyms and terms that represent headache. But it turns out that headache is a common side effect of many drugs. It can also be a symptom of many medical conditions. So we really just don't want to blindly pass
14:22
this semantic understanding along to searching. We wanna be a little bit more intelligent about that. So let's take a look at what I mean. Let's walk through some examples. Let's go ahead and take a look at why we would want to expand a particular query.
14:42
So this is an example of a classic query expansion in our model. Here someone has searched for Zoloft and pregnancy. Now we've been able to segment this particular query into two concepts, Zoloft pregnancy. However, there's a problem here.
15:01
We don't actually mention Zoloft in our content. Our content actually mentions antidepressants. So what we can do is interrogate our knowledge graph and try to create that bridge between the query and our content. You can see in this example here that we are able to traverse our knowledge graph
15:23
and create that connection between Zoloft and antidepressant. However, if we do so blindly, then we will create a lot of recall in our system. And that is evident when you look at the expansion of pregnancy. Really the intent of the user here
15:41
is that they would be satisfied with content that matches antidepressant or Zoloft, together with pregnancy. So let's go ahead and walk through an example traversal in our graph to achieve this. When someone searches for Zoloft, we identify that Zoloft is a brand name
16:02
associated with a substance known as sertraline. We then go ahead and examine the relationships in our graph associated with this substance. We identify that it is actually a member of the drug class known as SSRIs. We then expand to its parent, which is antidepressants.
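(To make the traversal concrete, a minimal sketch of a bounded walk up the hierarchy. The adjacency maps and function name are illustrative assumptions, not the real graph store.)

```python
# Tiny adjacency maps standing in for the knowledge graph.
BRAND_TO_SUBSTANCE = {"Zoloft": "sertraline"}
PARENT_OF = {"sertraline": "SSRI", "SSRI": "antidepressant"}

def hypernym_expand(brand: str, hops: int = 2) -> list[tuple[str, int]]:
    """Resolve a brand to its substance, then walk up the hierarchy a bounded number of hops,
    recording the distance of each expansion for later decay weighting."""
    current = BRAND_TO_SUBSTANCE.get(brand, brand)
    expansions = []
    for distance in range(1, hops + 1):
        if current not in PARENT_OF:
            break
        current = PARENT_OF[current]
        expansions.append((current, distance))
    return expansions

print(hypernym_expand("Zoloft"))   # [('SSRI', 1), ('antidepressant', 2)]
```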
16:26
So ultimately in order to create a bridge between Zoloft and antidepressants, we first had to identify that this was an appropriate candidate to be expanded. And then we went ahead and performed two hops
16:40
to create that bridge. So ultimately when query intelligence receives a query such as this, what it does is it returns both segments associated with it, but it automatically performs that hypernym expansion. And notice that it is selective. It does not perform this on pregnancy,
17:02
but it did perform this on the product Zoloft. And you can see in this example payload that we have the concept IDs associated with the expansions. We also represent the distance of these expansions in our knowledge graph. And this allows us to apply a decay weight when searching.
17:23
Here, when the medical search platform actually executes this particular query, it applies the highest boosting on the original term, which is Zoloft. And then you can see that it's been decayed as it expands through the graph.
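(A sketch of how an expansion payload with graph distances could be turned into boosted clauses with a decay. The field names, payload shape, decay factor, and base boost are assumptions, not the actual medical search platform API.)

```python
def expansion_clauses(original: str, expansions: list[dict], base_boost: float = 10.0,
                      decay: float = 0.5) -> list[dict]:
    """Boost the original concept highest; decay each expansion by its graph distance."""
    clauses = [{"term": {"concept_ids": {"value": original, "boost": base_boost}}}]
    for exp in expansions:
        boost = base_boost * (decay ** exp["distance"])
        clauses.append({"term": {"concept_ids": {"value": exp["concept_id"], "boost": boost}}})
    return clauses

# Zoloft expanded two hops to antidepressant, as in the talk's example.
payload = [{"concept_id": "C_ssri", "distance": 1},
           {"concept_id": "C_antidepressant", "distance": 2}]
for clause in expansion_clauses("C_zoloft", payload):
    print(clause)
# C_zoloft gets boost 10.0, C_ssri 5.0, C_antidepressant 2.5
```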
17:41
Again, this allows us to create that bridge between a very specific query and content that is represented in our system. So let's go ahead and talk about the reverse scenario of when we would narrow or we would filter down a segmented query. So I mentioned earlier that AF is really ambiguous.
18:04
And we have the ability to solve this during semantic enrichment, because we follow the rule of first occurrence. But when searching, we don't have context, or at times we have very little context. So what we do is we rely on a variety of sources
18:23
in order to come up with a preferred definition in the absence of context. In this case, we've worked with our subject matter experts and we've also looked at how users interact with this particular query in our system. You can see that more often than not,
18:40
individuals click on atrial fibrillation when they search for AF. It also corresponds to a category in our content. Now, when we perform that search, we go ahead and identify and make that connection between an acronym and its expansion in our content.
19:02
Now, I should mention here that we're creating a default experience for our customers, but we do have the opportunity to get it wrong. So you will see in the future that we will offer the ability for our users to pivot and perhaps give us more insight on their intent.
19:21
All right, so we talked about how we prep our environment and we talked about how we search it, but let's actually dive into how we go ahead and tune this environment. Our search evaluation tool was conceived out of this notion that we are very successful
19:41
when we are tightly coupled with subject matter experts. The tool really creates this bridge between highly technical engineers and non-technical end users or individuals who are really focused primarily in treating patients. So we've built quite a bit of rich functionality
20:02
into this tool, and it's actually used heavily by the engineers and the subject matter experts. One of our most recent additions into this tool is to allow these subject matter experts to curate the graph. You can see here that we're adding very little ability for them to interact with the graph,
20:22
but we will be expanding upon this. Now, this actually is a representation of manual curation, but we also do some automated curation against the graph. And I'll be talking in more detail about that in my discussion in August.
20:41
Most importantly though, we invite these subject matter experts to generate golden sets or these judgment lists. And essentially what we do is we ask them to go ahead and classify the search results into different buckets for a given query. So this is an example golden set that's being generated for the query breast cancer.
21:03
We essentially ask these SMEs or subject matter experts to go ahead and create a classification for each title. Most relevant are those results that we would expect to appear at the top of the page. Irrelevant shouldn't appear at all. And then relevant are fine to appear anywhere
21:22
in the main search results. Notice that we're not asking them to give us position one versus three or seven. We're simply asking them to group them in buckets. Once they're grouped into buckets, it gives us an opportunity to calculate a grade
21:40
on how well we are performing against their expectations. What we ultimately do is we perform two different types of scoring. The first, which is most critical, is we actually compare how our search results are behaving against the most relevant bucket that they've defined.
22:01
We then also calculate a grade against the entire result set. So this is an example of analysis that's been run. You can see for breast cancer, we're performing an A in the top tier result bucket. And then overall, we have a grade of a B.
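(A minimal sketch of bucket-based grading against a golden set. The bucket labels mirror the talk, but the scoring formula and letter cutoffs here are illustrative assumptions, not the production metric.)

```python
def grade(results: list[str], judgments: dict[str, str], k: int = 10) -> dict:
    """Score a ranked result list against SME buckets: most_relevant, relevant, irrelevant."""
    most_relevant = {t for t, b in judgments.items() if b == "most_relevant"}
    top_k = results[:k]
    # Top-tier score: how many 'most relevant' titles made the top of the page.
    top_score = len(most_relevant & set(top_k)) / max(len(most_relevant), 1)
    # Overall score: how many returned titles are at least 'relevant'.
    overall_hits = [t for t in results if judgments.get(t) in ("most_relevant", "relevant")]
    overall_score = len(overall_hits) / max(len(results), 1)

    def to_letter(score: float) -> str:
        return "A" if score >= 0.9 else "B" if score >= 0.8 else "C" if score >= 0.7 else "D"

    return {"top_tier": to_letter(top_score), "overall": to_letter(overall_score)}

judgments = {"Breast cancer in women": "most_relevant",
             "Breast cancer screening": "relevant",
             "Knee osteoarthritis": "irrelevant"}
print(grade(["Breast cancer in women", "Breast cancer screening"], judgments))
# {'top_tier': 'A', 'overall': 'A'}
```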
22:22
You'll also see here that we have an indicator on whether this golden set is curated or not. And essentially what this means is that we give more precedence or more preference to golden sets that have been generated by a group of professionals versus one that was created independently.
22:43
Oftentimes we find ourselves having multiple golden sets for the same query. So we wanna make sure that when we grade ourselves, we use the one that most closely aligns, potentially with our customers. We also use these golden sets heavily when we compare how certain experiments
23:01
are performing in the system. This is an example where we're comparing a baseline against an experiment known as DMP2_fuzzy. You can see for the query alpha-1 that our experiment is performing better in the top tier results as well as overall.
23:23
So it allows us the opportunity to vet out different experiments before promoting it for other eyes to see it. We also rely heavily on what we call a workflow comparison. In this particular example, the user is searching for croup
23:41
and we're comparing our baseline against an individual who is authenticated as a pediatrician. On the left, our baseline shows that the fourth hit is associated with children. Whereas on the right side, you can see here that three of the results are associated with children
24:00
and are probably more important to someone who is personalized as a pediatrician. Now, once we have vetted out these different experiments, we then hand them off to our subject matter experts to play around with them and see what they think. What we do is we create a blind survey
24:21
and we pass them a series of queries and we ask them to rate which result sets are better. Now we randomly pick which side will display the results so that they can't tell the difference. Ultimately, we use this information to decide if an experiment is ready to be promoted to our live environment.
24:44
Now, once it's in our live environment, our user feedback really then comes into play. And we use it to address a certain set of queries that we struggle with. Let's take an example of this. Breast cancer is a pretty generic query and we have a lot of content in our system
25:03
that mentions breast cancer. In fact, all of the titles in this list have the same thing in common. They're actually represented by two different concepts. You have breast cancer and then some other concept, for example, chemoprevention, screening, tumor markers.
25:24
And ultimately when they search, it's hard to distinguish which one of these is really more relevant than the other. But it turns out that breast cancer in women is actually the title that we would like to appear at the top of the page.
25:41
So what we do is we take click data into account and we ultimately use this as a tie-breaking effect. So over time we solve these queries based upon how the users interact with the system. And here you can see that we can achieve that goal
26:01
by applying user behavior. So that summarizes my presentation. I talked about the four main areas of our ecosystem. I talked about how we semantically enrich our content and how we try to understand the user's intent when querying.
26:21
And I also discussed how we incorporate expert and user feedback into the system. And as I mentioned earlier, check out Haystack's website for more information on query intelligence, and definitely check out my presentation in August on our medical knowledge graph. Now, if you like what you see, we're hiring in my team
26:42
as well as throughout the organization. So check out our career website and reach out to me. Thank you. I'll try that again. Fantastic, thank you very much, Erica. We do have some questions.
27:01
You've generated a lot of interest here. So I'm just going to go to Slack and ask some questions for people. So Mia asks, is it possible to create your knowledge graph semi-automatically? How much expert curation did you require? Yeah, that's a great question. So we actually have an automated process
27:20
that rebuilds the knowledge graph daily. We collect quite a bit of information from the public domains, and we also source information from our own content. And essentially what happens is all of this information is reconstructed together in our Neo4j environment. We resolve all of the relationships.
27:41
We perform machine learning based curation to identify duplicates in our graph. And then we actually perform a merging of these duplicates upon exporting this graph. So ultimately we do have a manual process for building the graph, and we have some curation,
28:01
but we're going to continue to embark upon more because ultimately it can be very expensive to have your subject matter experts curate large portions of the graph. So we really try to have them focus on hotspots. So some hotspots might be acronyms, or terms that are associated
28:21
with too many concepts. And also of course, any poor performing queries. Thank you. So our next question comes from Artem. In your example of analyzing tagged text, the position of the concept ID is different from the tagged text.
28:40
Don't you use the concept ID for overlapping with the tagged text during search? Does that make sense? I think so. So when we search, we actually search for a combination of concept IDs, as well as text. And because they're interlaced together,
29:02
when we receive a hit from Elasticsearch, we need to actually perform some type of extraction to make sure that we highlight the correct area. So we do have challenges when they're interlaced together, but it allows us the flexibility to semantically attempt to identify hits in our index,
29:20
and then also fall back on keywords. So we really get the best of both worlds. But then of course, we have to mitigate that when we get the content out of the system. Okay, thank you. Our next question comes from Tito, who asks, when you match a medical concept in the query at search runtime, do you only search the concept identifier for recall,
29:44
or do you also search the keyword string? Yeah, so similar to my last answer, we actually perform both. So we actually rely heavily on dis_max. And I like to describe it in many ways as we throw a lot of spaghetti at a wall. And the one that's stickiest is the one
30:02
that actually contributes to the scoring. So we examine the text using phrase matching. We also use classic Boolean logic. We use multi-match. But ultimately, we are using a mixture of keywords in addition to concept IDs.
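(For illustration, a simplified Elasticsearch-style dis_max query mixing keyword clauses with concept-ID clauses, expressed as a Python dict. The field names, boosts, and index layout are assumptions, not the production query.)

```python
def build_query(query_text: str, concept_ids: list[str]) -> dict:
    """Combine phrase, multi_match, and concept-ID clauses; the best-scoring clause wins."""
    clauses = [
        {"match_phrase": {"body": {"query": query_text, "boost": 5.0}}},
        {"multi_match": {"query": query_text, "fields": ["title^3", "body"]}},
    ]
    # Concept IDs live in the same field as the text, so they can be matched as terms.
    for cid in concept_ids:
        clauses.append({"term": {"body": {"value": cid, "boost": 4.0}}})
    return {"query": {"dis_max": {"queries": clauses, "tie_breaker": 0.3}}}

print(build_query("facial nerve paralysis", ["C_facial_nerve_palsy"]))
```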
30:21
Thank you, I love the spaghetti analogy. So Scott asks, when you evaluate a search experiment with expert rated documents, what do you do with documents that haven't been rated? Yeah, that's a great question. Because as I indicated before, we create these runtime specific search strategies.
30:42
So it can be quite a challenge when you tune. However, these strategies end up acting on a small subset of queries. So for example, I had talked about the hypernym expansion example. So oftentimes, when we perform an experiment, it's usually on a narrow focus
31:02
that we're trying to essentially tune. But our golden sets really tend to represent at least a portion of all areas or all strategies that we try to hit. So you'll see in some cases, we have golden sets that are really focused on acronym hotspots.
31:22
Others that are focused more on punctuation or plural. Others that are focused on specific expansion or narrowing cases. But ultimately, it's only gonna be as good as much data as you can compare. Now, we currently have about 300 golden sets
31:42
or judgment lists. And our subject matter experts are constantly creating more. What's actually really important to us is that queries that perform well should also be represented by these judgment lists, not just those that perform poorly, because we wanna make sure that as we tune, we don't see a degradation in performance.
32:02
So we're constantly running this analysis as we make changes. And then really the rule is that if it isn't performing better than your baseline, we typically don't promote it. Thank you. Well, I'm afraid we can't get away from this one. Avi Rapoport asks, how do you handle new concepts such as COVID?
32:21
Yeah, great question. So we have a regular cadence upon which we update our graph. Now, the update process itself, I had mentioned we actually run it daily right now. But to incorporate those vocabularies from the public sources is a little bit more effort. So we don't do it as frequently.
32:42
We do however, have the ability to add manual concepts into our graph. We tend to want to avoid this approach because you don't have the rich relationships that come from it. But COVID is a great example because we actually have written quite a bit of content on COVID. And so we had to react very quickly.
33:00
So we have the ability to create these concepts, go ahead and curate them, and then we can catch up later with the public vocabularies. Thank you. So we've got time for one last question, I'm afraid, but remember you can join the breakout room afterwards to carry on discussing these interesting concepts with Erica.
33:21
So I'm going to ask on Charles's behalf, how many subject matter experts do you have? And are these full-time employees or consultants that review the graph as needed? That's a great question. So we write our own content. And so we have this incredible staff of medical professionals
33:40
that are top physicians in their fields. They are often seeing patients and they're also writing as part of our content. And so we actually have about, I think 10 of them that represent different areas or different domains or specialties. We meet weekly with them
34:00
and we talk about future initiatives. We discuss any poor performing queries and we also do operations together to curate the graph. So we have quite a few on hand. Some are more involved than others, but ultimately the goal is to have even more involved in the process. As I had mentioned,
34:20
we've been incorporating more manual curation into our tooling in order to allow a wider audience of subject matter experts to work with our graph. Okay, thank you. I actually misunderstood the time slightly. So we do have a little more time for questions if that's okay. Sure.
34:41
Tom Bergman asks, looking at your specific examples, he's wondering why Zoloft was expanded in the query, but pregnancy was not. Yeah, so if I had expanded pregnancy, I would have created too much recall in the system. So pregnancy, if you expand up to the parent,
35:01
ultimately comes down to a female organ, right? It becomes really very generic. And so if I expanded that, the search results probably wouldn't make sense. So when we expand, we really wanna make sure that we're really careful when we do that or when we make that particular decision.
35:21
Really what we try to focus on most is precision. We have expert end users that are working in the system. And for us, we wanna make sure that we really shorten the time it takes to answer these critical clinical questions. So for us, when we do expand and create more results that they have to look through, we wanna make sure that it's always
35:41
with the goal of precision in mind. Thank you. And I think we will have our last question now. This is another one from Tito. How do you evaluate and tune the algorithmic expansions function to ensure the system is not presenting too many matches? How do you avoid watered down results for precise searches?
36:01
Yeah, it's a tough problem to solve. So for us, because we're very selective, we do quite a bit of experiments and blind surveys, ultimately with our subject matter experts to evaluate that. But you do absolutely run into trouble. So even though we have 2.7 million relationships
36:23
in our graph, we really rely heavily, mostly on about 530,000 of those relationships. So we are selective when we actually perform these expansions. And I should mention, even though there's these, about 500,000 that we use,
36:40
it's only during selective periods. So it's really gotta meet a certain threshold in order for us to trigger it. Because otherwise, the search results can become, as you indicated, watered down very quickly. And you can see a real degradation in performance. An example of this is if you went ahead
37:01
and just blindly expanded any substance, if it wasn't a drug brand name and just any random substance, you could see search results that come back that are really not even related to it because you've really traversed too far away in your knowledge graph for it to be relevant.