
Democratizing Climate Science: Searching through NASA’s earth data with AI at scale

Formal Metadata

Title
Democratizing Climate Science: Searching through NASA’s earth data with AI at scale
Title of Series
Number of Parts
69
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
While everyone was sheltering in place in 2020, a group of citizen scientists decided to tackle the problem of auto-detecting interesting weather patterns in Earth imagery collected by NASA satellites. The challenge was a scale we had never seen before: 20 years' worth of Earth imagery, collected continuously not just by NASA but also by other private and public space agencies across the world, and growing exponentially by the day. We wanted to build a reverse image search engine on this massive unlabeled dataset and automatically detect interesting phenomena such as hurricanes, polar vortexes and melting ice caps. NASA's scientists had performed extensive research to solve this problem in theory, but no one had attempted to build a production-quality system to put it into practice before. SpaceML was started in collaboration with NASA's Frontier Development Lab and Google Cloud, and is built entirely by industry professionals and student mentees around the world on donated time.

In this presentation we will talk about how we solved the problem of applying deep learning to continuously search for interesting weather patterns in petabytes of Earth imagery. We will cover the challenges involved in continuous data processing, indexing and running distributed search while providing a low-latency, highly available search API, and how we used Google Cloud offerings such as Dataflow, Cloud Functions and App Engine along with PyTorch and nearest-neighbor search libraries such as ScaNN, FAISS and Annoy to make it happen. We will detail the end-to-end self-supervised learning system that we built with an eye on cost-constrained usage of cloud resources while maintaining extensibility for other space science endeavors.
We will also touch upon the organizational challenges of building this system with a highly distributed team, including how we employed fast prototyping to build confidence in the system while gradually increasing the scale to petabytes of data. We built this system with the goal of open-sourcing its components to expand the project's applicability beyond space science. In this talk we will describe the architecture of the individual components so that you can leverage them to enable deep learning on any type of dataset in your field.
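The core step of a reverse image search engine like the one described above can be sketched in a few lines. This is a minimal illustration, not the SpaceML implementation: the random vectors stand in for embeddings produced by a self-supervised model, and the brute-force dot-product scan stands in for the approximate indexes (ScaNN, FAISS, Annoy) a production system would use at scale.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a database of image embeddings (10,000 images, 128-dim).
# In practice these would come from a self-supervised PyTorch encoder.
index = rng.standard_normal((10_000, 128)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)  # L2-normalize rows

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar embeddings by cosine similarity."""
    q = query / np.linalg.norm(query)
    scores = index @ q               # cosine similarity via dot product
    return np.argsort(-scores)[:k]   # top-k highest-scoring entries

query = rng.standard_normal(128).astype(np.float32)
top_k = search(query)
print(top_k)  # indices of the 5 nearest neighbors
```

An approximate-nearest-neighbor library replaces the exhaustive `index @ q` scan with a pre-built index (trees, quantization, or graph-based), trading a small amount of recall for orders-of-magnitude lower latency on petabyte-scale collections.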