Scaling Facets to the Stars
Formal Metadata

Title: Scaling Facets to the Stars
Title of Series: Berlin Buzzwords 2021
Number of Parts: 69
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/67355 (DOI)
Language: English
Transcript: English (auto-generated)
00:07
So hi, everyone. I'll start by introducing myself. Who am I? Well, my name is Shikhar Srivastava. I'm a software engineer on the News Search team at Bloomberg, in the European headquarters in London.
00:20
I'm originally from India and have been with Bloomberg since I graduated from college a few years ago. I have experience working across different technology domains, from machine learning and market data to distributed search engines. I'm here to talk about some of the interesting problems we're trying to solve at Bloomberg News. So with this, let's begin our talk.
00:42
And let's start with a quick chat about one of the problems we're trying to solve. We deal with a lot of search queries in News Engineering every day. Some of these are simple search queries, which are easy to handle. But some of them need a bit of extra effort. Let us look at the query written
01:00
on the slide, which answers questions like: give me the number of stories matching apples and oranges. This is just a simple search query that any search engine can resolve. Things get a bit more interesting when I add this bit. Now the query becomes: give me the number of stories matching apples and oranges for the last five years, aggregated by day.
01:23
This has made the query much more complicated for a simple search engine to resolve, because now there are elements of analytics and time series attached to the search query as well. So the goal here is to solve this problem and build a system that can answer user queries
01:42
like this one in an interactive manner. By interactive, I mean the end users of this service are going to be humans and not scripts. We will show you in this talk how we achieved this by scaling Solr facets to the stars. OK, so before going any further, let me try to answer the first question you all might have,
02:03
which is: why Solr? We did some initial research and analyzed a bunch of solutions available out there, some of which, like Druid, Solr, and Cassandra, are mentioned here on the slide. We compared the solutions against each other based on certain parameters: responsiveness, what query flexibility we have,
02:21
what the operational costs of setting it up and maintaining it are, and what analytical features are supported. We found Solr to match everything we needed. And again, this is a subjective choice depending on the use case; for our use case, Solr provided us with everything that we needed, so we decided to go ahead with it.
02:42
OK, so now let's have a quick intro to what Apache Solr is before we move any further. Apache Solr is an open source, enterprise-ready search engine. It is based on another open source Java library called Apache Lucene. Lucene does the actual heavy lifting in Solr: the indexing, searching, and everything else. Solr is a layer on top of Lucene,
03:02
which makes it distributed, scalable, and reliable. This is what makes Apache Solr an enterprise-ready solution for search. Cool. So this talk is about scaling facets, so it's really important to understand what faceting is, right? Faceting is a popular search feature
03:20
which most modern search engines provide. It allows you to group your search results based on certain dimensions or fields of the results. Let's understand this with a simple, quick example. You can see a job search page here, which has a list of all the jobs available at Bloomberg. As you can see, there are around 544 results in total.
03:41
By the way, we are actively hiring on my team. So yeah, that's something. OK, coming back to the example: every job in the results has many more fields associated with it, some of which we can see on the screen, like the experience and the location, and many more behind the scenes that we don't really see in the search results.
04:01
A quick example: the checkboxes that you see on the left-hand side are actually faceting in action. This part of the webpage uses faceting on the field location to group all the results based on the value of that field. The way it works is that the search engine takes all the unique values of this field
04:21
from the search results, creates that many groups, and then assigns every document to one group. So as I said earlier, faceting basically just groups the search results based on a field.
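To make this concrete, here is a minimal sketch of what such a terms facet might look like with Solr's JSON Facet API; the collection name, endpoint, and "location" field are illustrative assumptions, not the actual Bloomberg setup:

```python
import requests  # assumes a local Solr with a hypothetical "jobs" collection

# Terms facet: group all matching jobs by the unique values of "location".
response = requests.post(
    "http://localhost:8983/solr/jobs/query",
    json={
        "query": "*:*",   # match all documents
        "limit": 0,       # we only want the facet counts, not the documents
        "facet": {
            "by_location": {"type": "terms", "field": "location"}
        },
    },
)
for bucket in response.json()["facets"]["by_location"]["buckets"]:
    print(bucket["val"], bucket["count"])  # e.g. "London 120"
```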
04:41
OK, so what's special about Solr? Solr provides a way for this dimension, or field, to be a time range. Now, that gets interesting. Let us look at a quick example of what this means. Assume every search result also has a field called date posted, which just stores the timestamp of when a particular job was posted. If Solr were the search backend used here, I could say: hey Solr, give me all the results faceted on the field date posted,
05:03
and keep the time range to one day. What Solr would do then is create one-day-long groups and assign documents to the correct groups based on the value of their date posted field. The interesting thing is that I can set the time range to anything: one day, one week, one month,
05:22
or just about anything else. This is the feature that actually makes it possible for Solr to solve some of the problems that a traditional time series database would solve, and it is why we chose Solr as well. This is the Solr feature that we will use to solve the problem we discussed earlier.
05:42
OK, moving forward. Now we know that range facets, which we discussed earlier, can be used to solve our problem, but it's not really that straightforward. Faceting has its own challenges. First, faceting is actually very slow compared to a simple search request. The latency of a faceting search
06:00
grows tremendously with the number of documents we have. The next major problem is that faceting is very resource intensive: it takes a lot of CPU when there are a lot of documents to count. And at Bloomberg scale, we have around 3 billion news stories on which we want to apply this faceting solution. Our expected latency
06:20
is somewhere between one and two seconds for the 90th percentile of the heaviest queries. Heavy queries are queries that match a lot of documents. And again, to remind you, this is for the analytics use case and not for the search use case, where the latency should be very small. Moving ahead, let's get started. Now that we know the problem
06:41
we are trying to solve and what challenges it has, this talk will present how we solved it by following an experimental approach. Okay, so before we start with the experiments to solve the faceting challenge, we need to have some basic components ready. The first of these is obviously the Solr setup itself. Solr needs a schema defined
07:00
before we can start ingesting data into it. The data ingested into Solr is called a document, and you can imagine it as a simple JSON object which has all the fields defined in the schema and their values. We started out with a very simple Solr schema which has just three fields: the timestamp, the company, and the topic. And we also did some basic optimizations
07:20
for the faceting use case. We use a special data structure called docValues to store the fields that we will be faceting upon; we will discuss what docValues are later in this talk. The next optimization is that, since we are only concerned with the counts of the results and don't really need to return any content,
07:42
we just don't store the data on disk. This saves us some space and makes things slightly faster, because there are no reads and writes of stored content. The next important component is the query that we will be using for the faceting. It is a simple JSON facet query from the Solr API.
08:00
The query you see on the screen basically asks Solr to give us the results from a start timestamp to an end timestamp, aggregated by a period called the aggregation period, which you can see as gap in this query.
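Since the slide itself isn't visible here, a sketch of what such a range facet query plausibly looks like follows; the collection name, field name, and the concrete start/end/gap values are assumptions:

```python
import requests  # hypothetical "news" collection with a "timestamp" field

facet_query = {
    "query": "apples AND oranges",   # the search part
    "limit": 0,                      # counts only; content isn't stored anyway
    "facet": {
        "counts_over_time": {
            "type": "range",
            "field": "timestamp",
            "start": "NOW-5YEARS/DAY",  # start timestamp
            "end": "NOW/DAY",           # end timestamp
            "gap": "+1DAY",             # the aggregation period
        }
    },
}
response = requests.post("http://localhost:8983/solr/news/query", json=facet_query)
for bucket in response.json()["facets"]["counts_over_time"]["buckets"]:
    print(bucket["val"], bucket["count"])  # one count per day
```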
08:22
Okay, so now let's take a quick sidestep and try to understand what docValues are, which we just mentioned, and see how they speed up faceting. Let us take an example where we have three documents in our Solr collection, as stated here. These three documents have just two basic fields, domain and company. This is a simpler example than our earlier schema, just to make it easier for everyone
08:42
to understand docValues. If we ingest all this data into Solr and index these two fields, Solr will store an inverted index data structure that looks something like this. Now let's say we get a simple query, and the query says: give me all the companies starting with an L.
09:01
Solr will just look up the inverted index for the field company and find that documents one and two match the query, because in the inverted index that's really straightforward to get. Now, what if the query also had a facet part and asked Solr to facet the results based on the domain field of the results we got?
09:23
For this, Solr would have to know what values are stored in the field domain for documents one and two. And that's not really straightforward, as you can see from the inverted index: Solr would have to look through the entire index to find the values of this field for those documents.
09:43
And this is where docValues actually come into play. If the field domain were marked as a docValues field, Solr would store a column-oriented data structure like the one you see in green. Then it can very quickly find the value of the field domain for the documents it matched,
10:01
which are documents one and two. This is how docValues speed up faceting a lot, and not just faceting but other features like sorting as well.
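As an illustration, enabling docValues on a field can be done through Solr's Schema API; this is a sketch with the example's field names, not the schema actually used in the talk:

```python
import requests  # hypothetical "news" collection

# Define facet-friendly fields: docValues on (fast faceting and sorting),
# stored off (we only ever need counts, never the content back).
schema_changes = {
    "add-field": [
        {"name": "domain", "type": "string",
         "indexed": True, "stored": False, "docValues": True},
        {"name": "company", "type": "string",
         "indexed": True, "stored": False, "docValues": True},
    ]
}
requests.post("http://localhost:8983/solr/news/schema", json=schema_changes)
```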
10:20
Okay, now coming back to our discussion of the faceting challenge. Remember, we don't have a final solution yet, so we need to follow an experimental approach and move ahead in iterations. The most important thing we need for that is a well-defined process. The process outlined here is relatively straightforward. We first load all our data into Solr. The next step is to send a lot of faceting queries to Solr. Then we capture the results from each of these queries and generate reports, insights, and aggregates.
10:42
Afterwards, for the next iteration, we change some variables, like the queries themselves, the load distribution, the parallelism, or anything else. And then we repeat all over again from step two, which is sending the queries to Solr again. We keep repeating this process until we reach our final goal and meet our expectations.
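As an illustration, steps two to four might look something like this naive Python script; the Solr endpoint, field name, and parameter ranges are hypothetical:

```python
import random
import statistics
import time

import requests

SOLR = "http://localhost:8983/solr/news/query"  # hypothetical endpoint

def run_iteration(num_queries: int, gap: str) -> None:
    """Steps 2-3: fire random time-range facet queries, report the p90 latency."""
    latencies = []
    for _ in range(num_queries):
        year = random.randint(2016, 2020)
        body = {
            "query": "*:*",
            "limit": 0,
            "facet": {"counts": {
                "type": "range", "field": "timestamp",
                "start": f"{year}-01-01T00:00:00Z",
                "end": f"{year + 1}-01-01T00:00:00Z",
                "gap": gap,
            }},
        }
        t0 = time.monotonic()
        requests.post(SOLR, json=body).raise_for_status()
        latencies.append(time.monotonic() - t0)
    p90 = statistics.quantiles(latencies, n=10)[-1]
    print(f"gap={gap} queries={num_queries} p90={p90:.3f}s")

# Step 4: tweak a variable (here, just the aggregation period) and repeat.
for gap in ("+1DAY", "+7DAYS", "+30DAYS"):
    run_iteration(num_queries=100, gap=gap)
```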
11:01
Okay, so steps two to four might look trivial, but they really take most of the time. There are many ways to go about this. Probably the easiest, which most of us would do, is to use Python or shell scripts, or maybe create a simple Python framework. But there are many drawbacks to doing this as well. We need graphs and charts to get proper insight
11:22
from the results, and we also want to be able to easily tweak as many variables as possible while still maintaining reproducibility. Taking all of this into account, it becomes a totally different problem, and definitely not the one we initially set out to solve. And this is why we used another great tool
11:40
called Apache JMeter. Apache JMeter is open source software that can be used as a load-testing tool for analyzing and measuring the performance of a variety of services. It is primarily used for web services, but you can test just about any server. It is very easy to set up and modify test suites, as JMeter comes with a good GUI for that.
12:02
It has lots of plugins that support many complex testing requirements. It is also very easy to automate, and it supports scripting. The bit that I personally like the most is that it generates beautiful reports, some of which you can see on the screen as well, which give us a lot of insight into the test results.
12:23
So now that we have everything we need to run our experiments, the next step is to actually run them. This talk will present three major experiments which helped us overcome the faceting challenge. There were many more small ones, but these three summarize all the findings from the smaller ones as well. Every experiment helped us understand
12:41
the Solr functionality better and gave us pointers for the next step or the next set of experiments. We iteratively moved forward until we were satisfied with our results, which happens in experiment number three. Okay, so let's start with our first experiment. We just ingested all the data into the basic setup
13:00
which we discussed earlier and used JMeter to send out lots of queries with totally random time ranges, because remember, we are faceting on time ranges. We didn't expect too much from this initial experiment, but the results were just way too bad. We had a near-zero cache hit rate, and the Solr JVM went out of memory really quickly.
13:21
The throughput was really bad, at around 10 queries per minute, and the latency was in the tens of seconds for the facet queries. Then again, remember, we are talking about 3 billion documents, so we didn't expect great results, but this was just way too bad. Okay, so at this point, we had two major unknowns.
13:40
Solr has many different caches. Why did the caches not work? And the second question was: why does the Solr JVM go out of memory? Just how much memory does it need? What is the magic number we should use for our use case? With all of these questions, we moved to the next phase of analyzing what we should do. The only thing we had at this point
14:01
that we could really analyze was the heap dump from the Solr JVM, so we just used it as a starting point and started inspecting it. This is where we used another great tool called Apache, sorry, not Apache, obviously: the Eclipse Memory Analyzer. The Eclipse Memory Analyzer is a fast and feature-rich Java heap analyzer
14:22
that helps you find memory leaks and reduce memory consumption. It is a great tool for inspecting Java heap dumps. It is also very easy to use and really helps you see a lot of things inside your heap dump. We will now go through the information we got while analyzing our heap dump.
14:40
Okay, so this is how the tool looks when you load a heap dump into it. On closer inspection, we were able to see the different caches inside the heap dump. As you can see on the right, the filter cache is clearly visible. Going a bit further, we can also see the size
15:00
of the filter cache when the Solr JVM went out of memory: it's around 690 megabytes. Digging even deeper, and this is where things get really interesting, we can also see what is stored as a key in the filter cache. As you can see, it is actually a time range, represented by a lower timestamp and an upper timestamp.
15:21
This was a really interesting insight we got by using this tool. And if we dig even deeper, we can actually see the value that is stored, or mapped, to this key in the filter cache. As you can see, the value is essentially an array of document IDs. These are the document IDs of all the documents
15:41
that lie within the time range represented by the key. This reverse engineering using the heap dump was a breakthrough in understanding how Solr uses the filter cache for our use case. Okay, so before we go any deeper and try to understand the earlier findings, it's important to know what a filter cache is.
16:00
Solr has many different caches, like I talked about, which serve different purposes. The filter cache is one such cache, typically used for storing the filters used in the fq (filter query) part of a query. It is also extensively leveraged in faceting when using the default facet method, which is fc. In essence, it's just a fast LRU cache
16:22
which stores the last X recently used entries. Usually, the filter cache stores filters, or bit sets. So if a Solr index has, say, 1,000 documents, each such bit set in the filter cache will have a length of 1,000, with all the documents matching a query as one
16:42
and everything else as zero. These bit sets are essentially a compressed way to represent the documents matching a certain query. However, in our use case, as we saw earlier, the filter cache did not store bit sets; it stored an array of document IDs. So let's see how the filter cache works for our use case. As we saw earlier
17:02
in the Eclipse Memory Analyzer, the filter cache in our use case stores a time range, the start and end times, as its key, and for the value it stores an array of all the document IDs within that time range. So let us see what happens when we send a facet query to Solr, to understand
17:21
how the filter cache is storing things. If we send a facet query like this one, which has a time range of the 1st of January 2020 to the 1st of January 2021, and we want the results aggregated by one day, a total of 365 entries will be generated in the cache, which is the total time range divided by the aggregation period.
17:40
Each entry in the cache can be thought of as a time range bucket, which will be one day long and which will store as its value the document IDs of all the documents that lie within that one-day time range. Let's take this example a bit further and see what happens if we send the same query with an aggregation period of seven days this time.
18:02
This time, 52 entries are generated in the cache, as one year has around 52 seven-day buckets. These entries will each span a period of seven days, and the values will be, just like before, all the document IDs within the seven-day period represented by the key.
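One way to picture this structure is as a toy model, a map from (start, end) keys to arrays of document IDs; this is only an illustration of the idea, not Solr's actual implementation:

```python
from datetime import datetime, timedelta

# Toy model of what the heap dump showed: the filter cache maps a
# (start, end) time-range key to the IDs of all documents in that range.
def bucket_entries(start: datetime, end: datetime, gap: timedelta,
                   docs: dict[int, datetime]) -> dict:
    """Build one cache entry per bucket of length `gap` over [start, end)."""
    entries = {}
    lo = start
    while lo < end:
        hi = min(lo + gap, end)
        entries[(lo, hi)] = [d for d, ts in docs.items() if lo <= ts < hi]
        lo = hi
    return entries

docs = {1: datetime(2019, 3, 14), 2: datetime(2019, 3, 14), 3: datetime(2019, 7, 1)}
year = (datetime(2019, 1, 1), datetime(2020, 1, 1))
print(len(bucket_entries(*year, timedelta(days=1), docs)))  # 365 daily buckets
print(len(bucket_entries(*year, timedelta(days=7), docs)))  # 53: 52 full weeks + a partial one
```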
18:22
This really helps us understand the structure of the cache and how it is being used for our faceting use case. Let's take a step further. If you noticed, in my earlier slide I deliberately left out the search part of our Solr query and only talked about the faceting bit. This is because the way Solr computes these facets does not depend on the search query too much.
18:41
This might sound a bit surprising, so let's dive a little deeper to understand how a facet query sent to Solr works under the hood. Okay, let's take a simple query like COVID-19 and vaccines, and say we want to aggregate the count of matching stories from the 20th of January 2020 to the 15th of November 2020,
19:01
with all the results aggregated by one day. This is what the Solr query means. When we send this time range facet query to Solr, the search part of the query, which is COVID-19 and vaccines, goes to the Solr index. The facet part of the query, however,
19:20
generates smaller subqueries. Each of these subqueries has a time range equal to the aggregation period we specified, which is one day, and together they span the total time range of our query, which is the 20th of January to the 15th of November 2020. They are exactly the same as the time range buckets we saw earlier in the filter cache.
19:43
And what really happens is easy to guess: these small time range buckets are then looked up in the filter cache for a match. Finally, we get results from both of these sides, so we have two results in total. The result from the Solr index is all the documents matching only the search query.
20:01
This includes documents that are outside the time range of our facet part; they could be way outside. Similarly, what the smaller time range buckets give us is all the documents within the total time range given in the facet query, so it also includes documents which do not match COVID-19 and vaccines.
20:23
This is something very interesting. Once Solr has these two results, it applies an intersection to give us back all the documents that are within the time range and also match our query.
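Conceptually, the per-bucket facet counts fall out of that intersection; a rough sketch of the idea, with made-up document IDs (Solr does this internally with far more efficient structures):

```python
# Per-bucket facet counts come from intersecting the query's matching
# document IDs with each cached time-range bucket's document IDs.
query_matches = {3, 7, 12, 15, 42}        # docs matching "COVID-19 AND vaccines"
cached_buckets = {                         # filter cache: bucket key -> doc IDs
    ("2020-01-20", "2020-01-21"): {1, 3, 9},
    ("2020-01-21", "2020-01-22"): {7, 8, 12, 40},
}

facet_counts = {bucket: len(query_matches & doc_ids)   # set intersection
                for bucket, doc_ids in cached_buckets.items()}
print(facet_counts)  # counts 1 and 2 for the two one-day buckets
```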
20:41
Now, putting everything together: the filter cache is actually the part that we have to exploit. Not hitting the cache and having to count all the documents in these small time range buckets on every query is really the bottleneck here. It takes a lot of time to count all of these documents when the number of documents is very large, and it also consumes a lot of resources. If we somehow fill this filter cache with everything
21:03
that we will ever need, we can theoretically always hit the filter cache, be really fast, and unblock this bottleneck. So this is what we tried to do, based on our understanding of the filter cache and result construction. We can now apply a few optimizations to, like I said,
21:21
theoretically always hit the cache. For that to happen, we need to pre-fill the filter cache with all possible intervals and all the document IDs in those time ranges, and then rely on the intersection by Solr to give us the correct results. The first thing we need to do is fix the maximum time range supported by our application.
21:41
This is because, if we want to fill everything into the filter cache, we need to know its size, which depends on the maximum time range our application will use in its queries. The next thing is to fix the number of aggregation periods that we support; it cannot be infinite. As we saw earlier, Solr stores small time range buckets as its cache entries,
22:01
which are calculated based on the total time range and the aggregation period in the query. So if we are to pre-fill the cache with all possible time range buckets, the set of aggregation periods, just like the maximum time range, has to be finite. The next important thing is to fix the time boundaries of each of these aggregation periods. Since we know that the small time range buckets
22:21
store timestamps with millisecond granularity, to always hit the cache we need to make sure that the boundaries of all these individual time range buckets are properly aligned and consistent. Every day should always start at the same time; it cannot be totally random queries at random times.
22:41
Okay, so the next thing is to actually fill the filter cache. We can use multiple queries with the search part being *:*, which just means give me everything, over all the aggregation periods and the maximum time range that we are actually going to use. This will fill everything into the filter cache.
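A sketch of what such cache warming might look like; the endpoint, field name, maximum range, and the set of supported gaps are all assumptions:

```python
import requests

SOLR = "http://localhost:8983/solr/news/query"                 # hypothetical
MAX_RANGE = ("2016-01-01T00:00:00Z", "2021-01-01T00:00:00Z")   # fixed max range
SUPPORTED_GAPS = ("+1DAY", "+7DAYS", "+30DAYS")                # fixed, finite set

def warm_filter_cache() -> None:
    """Send *:* facet queries so every possible bucket lands in the filter cache."""
    for gap in SUPPORTED_GAPS:
        body = {
            "query": "*:*",  # match everything; we only care about the buckets
            "limit": 0,
            "facet": {"warm": {
                "type": "range", "field": "timestamp",
                "start": MAX_RANGE[0], "end": MAX_RANGE[1], "gap": gap,
            }},
        }
        requests.post(SOLR, json=body).raise_for_status()

warm_filter_cache()
```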
23:01
And finally, once we have pre-filled the cache, to actually hit it we need to align the time boundaries in our queries with the ones in the filter cache. So we need to normalize our queries to have the same time boundaries as we used to warm the cache in steps three and four.
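A sketch of such normalization, assuming day-aligned buckets: snap every query boundary down to UTC midnight so it coincides with a warmed bucket boundary.

```python
from datetime import datetime, timezone

def normalize_to_day(ts: datetime) -> datetime:
    """Snap a timestamp down to UTC midnight so query boundaries always
    coincide with the day-aligned buckets warmed into the cache."""
    return ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)

raw = datetime(2020, 6, 15, 13, 37, 25, tzinfo=timezone.utc)
print(normalize_to_day(raw).isoformat())  # 2020-06-15T00:00:00+00:00
```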
23:23
These were all just theoretical optimizations which we came up with after analyzing the heap dumps and during our testing. We ran the same JMeter tests as in experiment one, but this time with the optimizations we just discussed. As you can see from the graphs, we had a 100% cache hit rate and our throughput increased significantly, from 10 queries per minute to five queries per second,
23:43
which is around 30 times better. As a result of always hitting the cache, the query latency also dropped drastically, from tens of seconds to well within a second this time. However, as you can see, we were still getting memory errors. But one thing was sure: the analysis we did really helped us. So although we had come quite far from where we started,
24:05
we still didn't understand why Solr was getting these out-of-memory issues. This led us to do some analysis once again, to get more insight into the internal workings of the setup. We know the only heavy-duty memory usage is by the filter cache, which we use to store everything right now.
24:21
On a closer look at what is being stored, we can see that all the different aggregation periods over a time range take the same amount of memory. So if the maximum time range is one year and you are storing both daily and weekly aggregates in the filter cache, they will take the exact same amount of space in the cache, because essentially it's just two copies of the same thing, structured differently.
24:41
And the next thing: even at this point, the only valuable thing we had was the heap dump, so we did a further analysis of it to see if we could get more insights. As you can see, every Solr shard basically has its own filter cache. What this means is that if we reduce the size of the filter cache by X, and let's say we have five shards on a Solr JVM,
25:03
then we would actually reduce the load on the Solr JVM by five X. So this was again a very interesting insight which helped us optimize memory. This time, based on our analysis, we figured that we only need to store the smallest aggregation period in the filter cache,
25:21
and not all the ones supported by the application. Why? Because if we just store the smallest one, we can calculate the coarser ones in the service layer. For example, we could just add up 30 daily buckets and create a monthly aggregation in the service layer, as the sketch below shows.
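A minimal sketch of such a service-layer roll-up, assuming the daily counts have already been fetched from Solr:

```python
# Service-layer roll-up: Solr only caches the smallest aggregation period
# (here, daily); coarser periods are computed by summing daily buckets.
def roll_up(daily_counts: list[int], period_days: int) -> list[int]:
    """Sum consecutive daily buckets into buckets of `period_days` days."""
    return [sum(daily_counts[i:i + period_days])
            for i in range(0, len(daily_counts), period_days)]

daily = [5, 3, 8, 1, 4, 9, 2, 7, 6, 0]   # counts for ten consecutive days
print(roll_up(daily, 7))    # weekly buckets: [32, 13]
print(roll_up(daily, 30))   # "add 30 days" for a monthly bucket: [45]
```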
25:40
The next important thing is that we need to balance the number of shards on a single Solr JVM, as it has a multiplicative effect on memory utilization. Finding the right balance between throughput and memory is really important. Then there are some empirical things that we observed, such as: the moment you commit data into the Solr index, the cache is cleared. So you don't really need to commit until you actually have to.
26:01
It may vary depending on the use case, but for our use case, we were fine committing once a day. And the final thing is that the filter cache is not exclusively used for faceting; it's also used for other things. So it will not always remain warm, or prefilled with every entry that you put inside it. You need a cache warmer, and that's where we use another service
26:21
which sends the same queries again and again over a period of time to keep the cache warm. Okay, so this brings us to the final phase of the experiments, in which we included all these optimizations. And the results this time were really good; they met all our expectations. We were able to scale the range faceting performance from 10 queries per minute to 10 queries per second
26:41
for the heaviest faceting queries, which is about a 60-times improvement. It can easily go much higher than that for simpler queries, but these numbers are for the heaviest ones. And remember, this latency is not for the search traffic but for the analytics, and also on about 3 billion documents. Okay, so the key takeaways from this talk,
27:00
which I think will really help you reach your goals faster: experimentation is the key. The more you experiment and play around with things, the better you can innovate and find solutions to reach your goal. And I cannot stress enough how important it is to have a robust testing system. We used JMeter to quickly validate the findings of our experiments and decide the direction,
27:20
instead of spending too much time on the wrong things. And always keep an eye out for awesome tools. The reverse engineering from the heap dump all the way to understanding the filter cache functionality was only made possible by the Eclipse Memory Analyzer. Be prepared to dig deeper and deeper, all the way to the root of something. It gives you so much insight, which might not be applicable immediately
27:42
but helps you understand ways to reach your goals faster. And the final thing is: you should really iterate fast. This is the model we followed, and it really helped us. With this, I would like to conclude this talk. Thank you so much, everyone, for your time. And as I mentioned earlier, we are actively hiring on our team, so do reach out to me if you're interested.
28:00
Thank you, and I'm ready to take questions. Yes, lovely talk. We have a couple of questions for you in the queue. The first one is: do you have experience using nested facet queries in the JSON Facet API? Any tips or tricks that you could pass along? Yeah, so we actually did investigate quite a bit
28:22
into how to optimize our queries, which were JSON Facet queries, and one of those things was actually nested facet queries as well. We also dug quite deep into the Solr code to understand whether there are ways to optimize them. We thought that whether the heavy part of the nesting is on the inside or on the outer layer
28:42
might affect the performance, but it really doesn't. So as of now, I don't think I have very valuable insight into how you can optimize them. It just works pretty well, I think. Awesome. Second question for you: did you consider allocation profiling options like JFR
29:02
instead of the out-of-memory heap dumps? No, not really. To be honest, we didn't even think about analyzing heap dumps at first. It was the last thing we did, out of frustration, because we had tried reading the Solr code and we had tried going to blogs, reading and researching stuff.
29:20
In the end, when we were frustrated and unable to find any answer, we just thought, okay, let's try this. So maybe that idea is a very good one, and if I did it all over again, I would definitely try it, but we didn't. Yeah, that's fair. There's only so much time in each iteration cycle, right? You have to be pretty ruthless in figuring out what goes into each one.
29:42
Third question for you: did the team consider decomposing timestamps into rougher buckets? Like year, year-month, year-month-day, and then using those in standard term faceting. Sorry, so when you say year, year-month, year-month-day,
30:02
do you mean all the aggregation periods that we talked about, or something different? I think they're referencing pre-computing it, almost like a keyword field: pre-constructing the actual timestamp down into one of those buckets. So you add a field for year,
30:21
a field for year-month, a field for year-month-day, and then use those like a standard terms aggregation, like the facet you showed for category or something like that. Yeah, actually we didn't really think in that direction at all. The way we started is that we just basically wanted counts over all the news stories that we have at Bloomberg,
30:43
and we wanted to use exactly the news schema for Solr that we already have; we didn't really want to modify it too much. We tried to have solutions which would use the same schema as our other news systems. So yeah, no. Yeah, that makes sense. And you guys are in an analytics situation too,
31:02
where you might want to reshape what the buckets are on the fly as questions come up.