
Democratizing Climate Science: Searching through NASA’s earth data with AI at scale

Formal Metadata

Title
Democratizing Climate Science: Searching through NASA’s earth data with AI at scale
Title of Series
Number of Parts
69
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
While everyone was sheltering in place in 2020, a group of citizen scientists decided to tackle the problem of auto-detecting interesting weather patterns in Earth imagery collected by NASA satellites. The challenge was a scale we had never seen before: 20 years' worth of Earth imagery, collected continuously not just by NASA but also by other private and public space agencies across the world, and growing exponentially by the day. We wanted to build a reverse image search engine on this massive unlabeled dataset and automatically detect interesting phenomena such as hurricanes, polar vortexes and melting ice caps. NASA's scientists had performed extensive research to solve this problem in theory, but no one had attempted to build a production-quality system to put it into practice before. SpaceML was started in collaboration with NASA's Frontier Development Lab and Google Cloud, and is built entirely by industry professionals and student mentees around the world on donated time.

In this presentation we will talk about how we solved the problem of applying deep learning to continuously search for interesting weather patterns in petabytes of Earth imagery. We will cover the challenges involved in continuous data processing, indexing and running distributed search while providing a low-latency, highly available search API, and how we used Google Cloud offerings such as Dataflow, Cloud Functions and App Engine along with PyTorch and nearest-neighbor search libraries such as ScaNN, FAISS and Annoy to make it happen. We will detail the end-to-end self-supervised learning system that we built with an eye on cost-constrained usage of cloud resources while maintaining extensibility for other space science endeavors.
We will also touch upon the organizational challenges of building this system with a highly distributed team, including how we employed fast prototyping to build confidence in the system while gradually increasing the scale to petabytes of data. We built this system with the goal of open-sourcing its components to expand the project's applicability beyond space science. In this talk we will describe the architecture of the individual components so that you can leverage them to enable deep learning on any type of dataset in your field.
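The core step of a reverse image search engine like the one described above can be sketched in a few lines. This is a minimal illustration, not the SpaceML implementation: the random vectors stand in for embeddings produced by a self-supervised model, and the brute-force dot-product scan stands in for the approximate indexes (ScaNN, FAISS, Annoy) a production system would use at scale.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a database of image embeddings (10,000 images, 128-dim).
# In practice these would come from a self-supervised PyTorch encoder.
index = rng.standard_normal((10_000, 128)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)  # L2-normalize rows

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar embeddings by cosine similarity."""
    q = query / np.linalg.norm(query)
    scores = index @ q               # cosine similarity via dot product
    return np.argsort(-scores)[:k]   # top-k highest-scoring entries

query = rng.standard_normal(128).astype(np.float32)
top_k = search(query)
print(top_k)  # indices of the 5 nearest neighbors
```

An approximate-nearest-neighbor library replaces the exhaustive `index @ q` scan with a pre-built index (trees, quantization, or graph-based), trading a small amount of recall for orders-of-magnitude lower latency on petabyte-scale collections.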