Architecting Solr indexing pipelines in Google Cloud Platform

Plain Schwarz

Roy, Shubhro Jyoti

Formal Metadata

Title

Architecting Solr indexing pipelines in Google Cloud Platform

Title of Series

Berlin Buzzwords 2022

Number of Parts

Author

Roy, Shubhro Jyoti

Contributors

N. N. (Moderation)

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/67168 (DOI)

Publisher

Plain Schwarz

Release Date

2022

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

The ubiquity of public cloud platforms has made it easy to offload the operational overhead of maintaining on-premise systems and to scale those systems on demand in a matter of minutes. But architecting secure, scalable systems in the public cloud comes with its own challenges. The problem is further complicated when you are migrating from an on-premise system. Such migrations often require the infrastructure to operate in a hybrid state, where some parts of the system have been migrated to the cloud while the remaining components continue to run on-premise. We must also ensure that the migration is invisible to the user and has no impact on the overall availability of the system during the transition. Recently, Box Search underwent such a migration for our Solr indexing pipeline and document store, which involved moving hundreds of terabytes of customer data from on-premise to GCP. In this talk, we present the overall system architecture, the migration process, and some of the challenges we encountered when running this system in a hybrid state.