Typically, when you need to expand a query through a model - for example, to do entity recognition or query tagging - you'd use a separate service. While this architecture is perfectly valid, the extra network hops to the query-expansion microservice add query latency. For autocomplete and other low-latency use-cases, you might want to trade some complexity for speed by implementing a custom query parser.

In this talk, we'll show a working example:

- we'll build a model in Python, using TensorFlow, that does query expansion
- we'll load it with TensorFlow for Java in a Solr query parser
- we'll run queries and get them expanded directly in Solr

You can use this talk and the resources we'll share to implement a query parser for your own use-case. We'll also expand on the architecture trade-offs: for example, as you add nodes and replicas to handle more query throughput, query-expansion capacity grows along with them. Should you need to scale the two separately, you can use coordinator nodes.
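To make the Solr side concrete, here is a minimal sketch of such a query parser, using the TensorFlow Java 1.x API (org.tensorflow:tensorflow). The model path /opt/models/query-expansion, the tensor names input/output, the field name text, and the class name ExpandingQParserPlugin are hypothetical placeholders, not the talk's actual code; the sketch assumes the SavedModel takes the raw query as a scalar string and returns a space-separated string of expansion terms:

```java
// Sketch: a Solr QParserPlugin that expands the incoming query with a
// TensorFlow SavedModel loaded in-process (no extra network hop).
// Hypothetical assumptions: model at /opt/models/query-expansion, scalar
// string input tensor "input", scalar string output tensor "output"
// holding space-separated expansion terms, and a "text" search field.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;
import org.tensorflow.Tensors;

import java.nio.charset.StandardCharsets;

public class ExpandingQParserPlugin extends QParserPlugin {

  // Load the model once per core; SavedModelBundle sessions are safe to
  // share across concurrent inference calls.
  private final SavedModelBundle model =
      SavedModelBundle.load("/opt/models/query-expansion", "serve");

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        // Run the user's query through the model to get expansion terms.
        String expanded;
        try (Tensor<String> in =
                 Tensors.create(qstr.getBytes(StandardCharsets.UTF_8));
             Tensor<?> out = model.session().runner()
                 .feed("input", in).fetch("output").run().get(0)) {
          expanded = new String(out.bytesValue(), StandardCharsets.UTF_8);
        }
        // Combine original and expanded terms into a boolean OR query.
        // A real parser would also run terms through the field's analyzer.
        BooleanQuery.Builder bq = new BooleanQuery.Builder();
        for (String term : (qstr + " " + expanded).split("\\s+")) {
          if (term.isEmpty()) continue;
          bq.add(new TermQuery(new Term("text", term)),
                 BooleanClause.Occur.SHOULD);
        }
        return bq.build();
      }
    };
  }
}
```

You'd then register the plugin in solrconfig.xml, e.g. <queryParser name="tfexpand" class="com.example.ExpandingQParserPlugin"/>, and issue queries such as q={!tfexpand}laptop. Loading the model once per core and keeping inference in-process is exactly what removes the extra network hop described above.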