
Using Spark in Weather Applications

Formal Metadata

Title
Using Spark in Weather Applications
Title of Series
Number of Parts
183
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language
Producer
Production Year
2015
Production Place
Seoul, South Korea

Content Metadata

Subject Area
Genre
Abstract
"Many important weather-related questions require looking at numerical weather prediction (NWP) models and the distribution of model parameters derived from ensembles of models. The size of these datasets has restricted their use to event analysis. The ECMWF ensemble has 51 members. Using all of these members for statistical analysis over a long period of time requires very expensive computational resources and a large amount of processing time. While event analysis using these ensembles is invaluable, detailed quantitative analysis is essential for assessing the physical uncertainty in weather models. Even more important is the potential to detect different weather regimes and other interesting phenomena buried in the distribution of NWP parameters that could not be discovered using a deterministic (control) model. Existing tools, even distributed computing tools, scale very poorly for this type of statistical analysis and exploration, making it impossible to analyze all members of the ensemble over long periods of time. The goal of this research project is to develop a scalable framework, based on the Apache Spark project and its resilient distributed datasets (RDDs), for parallel, distributed, real-time weather ensemble analysis. This distributed framework handles parsing and reading GRIB files from disk, cleaning and pre-processing model data, and training statistical models on each ensemble member, enabling exploration and uncertainty assessment of current weather conditions for many different applications. Depending on the success of this project, I will also try to tie in Spark's streaming functionality to stream data as they become available from the source, eliminating a lot of code that manages live streams of (near) real-time data."