Entity Linking at scale with Lucene

Formal Metadata

Title
Entity Linking at scale with Lucene
Title of Series
Number of Parts
56
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Signal AI offers a sophisticated platform to support businesses in their decision making. Customers define searches across billions of documents using an extensive DSL that includes concepts such as entities and topics. This metadata is extracted from over 5 million documents each day and made available to end users within 30 seconds of ingestion via a mix of machine learning and text retrieval techniques. Entity Linking is one of the core capabilities of the Signal AI data processing platform. It is a complex system that uses various strategies to achieve the highest quality while retaining excellent throughput characteristics. Back in 2019, one of the existing components of the Entity Linking system was rapidly reaching its limits and could not scale any further. To overcome this limitation, the team took an innovative approach and used Apache Lucene, with its inverted index and term vector capabilities, to identify rule-based entities. Choosing a percolator model meant the team had to revisit the previous architecture, breaking it down into smaller components that follow the Single Responsibility Principle for microservices. This talk will take the audience through the evolution of this service, from its inception until today. It will provide details of the technical decisions and trade-offs that make this component one of the most resilient, fast and cost-effective solutions, capable of handling 20 times as many rules at a fraction of the cost. It will also discuss how the same technology is used to reprocess the entire dataset every night in approximately 15 minutes.
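The percolator model mentioned in the abstract inverts the usual search flow: the rules are stored ahead of time, and each incoming document is matched against them. The sketch below illustrates that idea in plain Python. It is not Signal AI's implementation and does not use the Lucene API. The `Percolator` class, the rule format (an entity ID plus one alias phrase) and the entity IDs are all hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive analyzer)."""
    return re.findall(r"\w+", text.lower())


class Percolator:
    """Stores phrase rules up front, then matches each incoming document against them."""

    def __init__(self):
        # Rules are keyed by their first token so that only rules whose first
        # token occurs in the document are ever evaluated.
        self._rules_by_first_token = defaultdict(list)

    def register(self, entity_id, alias):
        """Register a rule: the alias phrase identifies the (hypothetical) entity ID."""
        phrase = tuple(tokenize(alias))
        if not phrase:
            raise ValueError("alias must contain at least one token")
        self._rules_by_first_token[phrase[0]].append((entity_id, phrase))

    def percolate(self, text):
        """Return {entity_id: [token positions]} for every rule matching the document."""
        tokens = tokenize(text)

        # Tiny per-document inverted index: token -> positions where it occurs.
        positions = defaultdict(list)
        for index, token in enumerate(tokens):
            positions[token].append(index)

        matches = defaultdict(list)
        for token, starts in positions.items():
            for entity_id, phrase in self._rules_by_first_token.get(token, ()):
                width = len(phrase)
                for start in starts:
                    # Verify the full phrase at each candidate position.
                    if tuple(tokens[start:start + width]) == phrase:
                        matches[entity_id].append(start)
        return {entity_id: sorted(hits) for entity_id, hits in matches.items()}


if __name__ == "__main__":
    percolator = Percolator()
    percolator.register("org:signal-ai", "Signal AI")
    percolator.register("org:apache-lucene", "Apache Lucene")
    print(percolator.percolate("Signal AI uses Apache Lucene. Signal AI is fast."))
```

Keying rules by their first token means the cost per document depends on the document's vocabulary rather than on the total number of rules, which is one plausible reason a percolator-style design copes with a growing rule set. The real system's rule language and analysis chain are not described in the abstract.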
Transcript: English (auto-generated)