
Monitoring Ozone Levels with PostgreSQL


Formal Metadata

Title
Monitoring Ozone Levels with PostgreSQL
Number of Parts
20
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Monitoring Ozone Levels with PostgreSQL: Database Streaming Replication and Monitoring

Postgres is used to manage data from the Ozone Monitoring Instrument aboard NASA's Aura spacecraft. The database implementation must handle large volumes of complex data transmitted continually from the satellite and generated by processing-intensive analyses performed by a team of atmospheric scientists. This talk will describe the architecture and some of the challenges faced. Focus will be given to our replication efforts, software developed for monitoring, and ongoing work to create a decentralized network of services communicating through a RESTful interface.

NASA and its international partners operate several Earth observing satellites that closely follow one after another along the same orbital track. This coordinated group of satellites is called the Afternoon Constellation, or "A-Train" for short. Four satellites currently fly in the A-Train: Aqua, CloudSat, CALIPSO, and Aura. Each satellite has one or more observational instruments that are used together in the construction of high-definition three-dimensional images of the Earth's atmosphere and to monitor changes over time. Aura's instruments include the Ozone Monitoring Instrument (OMI). Data management and processing services for data harvested by OMI are provided by the OMI Science Support Team headquartered at Goddard Space Flight Center. Raw OMI data is received and initially processed at a ground station in Finland, then ingested into the system, where it is analyzed by scientists who submit processing jobs. Earth Science Data Types (ESDTs) are the products of these jobs, and one of the principal types of data managed in the database. Complex and abstract, ESDTs represent the interface between the raw science data and the data management system, and more than 900 are currently defined.

Our current database implementation includes 10 clusters, each running Postgres 9.0.4, divided into three production levels: development, testing, and operations. The central operations cluster handles on average about 200 commit statements per second, contains tables as large as 160 million rows, and is configured for streaming replication. New data is continually being added to the system, and the total quantity is increasing at a rate of about 60% per year. This influx of data, in addition to scientific analyses, can cause the load on the database to vary suddenly, and monitoring software has been developed to provide early warning of potential problems.

The latest implementation of our software architecture uses decentralized services communicating through a RESTful interface. Databases are bundled together with their software component, and schema changes are managed using patch files. A utility has been created to apply the patches and ensure schema consistency as the databases are amended. Perl's Rose::DB is used as an object-relational mapper, and database queries, via HTTP requests, are supported by encoding the query information into JSON. The new platform uses a different data model, making it necessary to sync between the two representations, and causing some difficulty with data duplication.
Transcript: English (auto-generated)
This is Alex Lai here in the front row. Alex is the DBA and I'm the developer and assistant DBA on the OMI Science Support Team at Goddard Space Flight Center. And we're going to talk about monitoring ozone levels with PostgreSQL.
So I'm going to begin by giving just a little bit of background and Alex will talk about how data is brought into the system during a process we call ingestion. Alex is going to go over some of the challenges that we've faced
and then I'll describe some ongoing development in the software platform that affects how we manage the data. So about 450 miles above the Earth's surface is a formation of satellites that are in a polar orbit
that takes them across the equator 14 times each day. And they call that formation the A-train. And all of the members in the A-train are Earth-observing satellites so they've got their instruments pointed downward instead of outward.
The caboose in this A-train, you can probably just barely see here, is a satellite called Aura. Launched in July 2004, Aura has four different instruments for monitoring Earth's atmosphere. The Ozone Monitoring Instrument is an instrument for monitoring stratospheric ozone levels
and that's our instrument, the one that provides us data for our system. Because these guys are in a polar orbit and they go over the North Pole each time they go around the globe, the data is received at a far north ground station,
like this one in Svalbard, Norway. The data is processed a little bit at the ground station and converted into something called an Earth Science Data Type, or ESDT. This is not really the sort of data type that you would normally encounter in a database.
It's really more like a procedure for processing a set of measurements and they're defined by Earth scientists, which I'm not an Earth scientist, but they provide sort of the raw interface between the data coming in from the satellite
and our data management system. And right now there's more than 900 that have been defined that we have to store in the database. These ESDTs get generated and processed,
and the ultimate goal is that the atmospheric scientists will use the results of all this processing to create atmospheric profiles and climate change models. So, Alex is now going to tell you a little bit about the process
of how the data gets in and is processed. Good morning. Can you hear me okay?
I want to show this diagram to give a high-level view of the processing system. In the box you can see the data providers; the previous slide showed the ground system that captures all the data coming down.
They either send us the raw data or do some processing to a certain level first, and that becomes the input to our data processing system.
We also have a large community of scientists relying on the output data to feed into their applications and research. Those scientists come in as users
through the web interface to make requests, and the data gets delivered to the data receiver. Our OMI project operators can also go through the web interface to monitor our processing and submit jobs through the planner,
and see what kind of data needs to be produced. Sometimes we need to reprocess data, a new product comes along, or a scientist changes the logic for how we manipulate the input data to come out with the final product.
And this diagram shows that there are basically two ways the data is stored. The real data does not go into the database:
the images and all the raw instrument captures are stored in the archive system, because that data is very high volume. The other type of data, produced after processing,
is stored in the metadata server; that is where Postgres tracks all the metadata in our system. Either path can trigger the ingest process.
With auto-ingest, which relies on the arrival of new data, we have a daemon that checks for any incoming files stored on our archive node,
and that triggers the ingest process; based on certain criteria we are set up to process that data. The OMI operator can also submit jobs,
override and control the ingest, watch the planner and the load, and submit jobs that way as well. Maybe I need to explain a little bit more about our ongoing situation.
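What that auto-ingest check amounts to is roughly the loop below; this is a minimal sketch, and the paths, the ingested_files table, and the omi_ingest command are illustrative stand-ins rather than the team's actual code.

    #!/usr/bin/perl
    # Daemon-style loop: look for new files on the archive node, skip anything
    # already recorded, and hand the rest to the ingest step.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=omi_metadata', 'ingest', '',
                           { RaiseError => 1, AutoCommit => 1 });

    while (1) {
        for my $file (glob '/archive/incoming/*') {
            my ($seen) = $dbh->selectrow_array(
                'SELECT 1 FROM ingested_files WHERE path = ?', undef, $file);
            next if $seen;

            # Hand the file to the ingest process (hypothetical command).
            system('omi_ingest', $file) == 0
                or do { warn "ingest failed: $file"; next };

            $dbh->do('INSERT INTO ingested_files (path, ingested_at) VALUES (?, now())',
                     undef, $file);
        }
        sleep 60;    # poll again in a minute
    }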
We have been using Postgres for over 10 years to track all the metadata in our legacy system. A few years ago we realized that the data schema needed to be redone,
so we started a new development server, which you see down in the lower corner: MDX, the new system under ongoing development. That system is already partially in operation,
and some applications already point to it. The idea is that the metadata gets updated on both systems, but the priority is to make sure the legacy system keeps going
without losing any data. So once we update the legacy metadata, another trigger goes into the MDX metadata server. However, from time to time we do have a bit of a problem
where, for whatever reason, not 100% of the data successfully makes it into the MDX metadata. That is why you see a box for the SYNC process in our database diagram:
we are trying to catch any missing or problem files that have not been successfully ingested into both systems. Yes. So in the pre-process, are you writing to the same table or a different table?
Can you repeat that? Yes, are they writing to the same table?
No, there are two databases; they are two different databases, not the same one, and the schema is a little different between the two. Yeah, I'm going to show you more in a following slide. This is one of the challenges we are facing: updating basically the same kind of metadata,
but in a different schema format. They are totally different; we have different partitioning, all of it is different. So I'm going to show you later.
The other thing I want to talk about, besides the synchronization, is our backup and asynchronous replication, the strategy we are using right now. A little history: before we were running 9.0,
we had no replication at that time, and we relied pretty much on pg_dump for backups. As our data got very big, the dump took a long time,
performance dropped while it was running, and at that time we only had one server. The data is continually being updated, so in the window between the first dump and the second dump,
if the database crashed we would lose that data. The streaming replication in 9.0 really helped us solve this problem; at least we have a standby database now, which is a little better.
We still do dumps, just less frequently, and we dump from the slave so it doesn't hurt performance on the master. I just want to share some experience about the things we focus on
to make sure the replication stays healthy. One area of concern is the possibility that a WAL file does not get applied properly,
which can cause the replication to fail. Between the master and the slave, sometimes the network is unavailable for a period of time, and we have to get notified about that. There is also the possibility that an SA deletes some WAL files without being aware of it.
And as you saw on the previous slide, we have a pg_dump going on. That copy can take a long time to finish while replication is going on, since we dump from the slave,
so some coordination needs to be done. I'm going to show you a parameter we adjust so the pg_dump can complete while replication continues: max_standby_streaming_delay.
We set it to -1 to make sure WAL replay on the slave waits until the pg_dump finishes. That avoids the pg_dump getting cancelled.
We sometimes use the operating system's ps command to show the status of the processes while replication is going on.
We monitor that status, and if something has been waiting for too long a period we know something is going on and performance may have a bit of a problem. So we are trying to build a couple of automated tools based on that information to alert the SA and the DBA.
We also use the logs to capture information in case we need to troubleshoot a performance issue for a particular period of time. And in Postgres,
the built-in system tables have that information, so we can use queries to see how the WAL is being applied, and if it has not moved on after a certain period of time we also get an alert, things like that.
Those system tables are very helpful for monitoring.
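As a concrete illustration of the kind of check those alert scripts can do against the system tables and functions, here is a minimal sketch using the 9.0 replication functions; the host names, credentials, and thresholds are illustrative, not our production setup.

    #!/usr/bin/perl
    # Ask the master and the slave where they are in the WAL stream, and count
    # sessions stuck waiting on locks, as an early-warning check.
    use strict;
    use warnings;
    use DBI;

    my $master = DBI->connect('dbi:Pg:dbname=omi;host=master.example.gov',
                              'monitor', '', { RaiseError => 1 });
    my $slave  = DBI->connect('dbi:Pg:dbname=omi;host=slave.example.gov',
                              'monitor', '', { RaiseError => 1 });

    my ($written)  = $master->selectrow_array('SELECT pg_current_xlog_location()');
    my ($received) = $slave->selectrow_array('SELECT pg_last_xlog_receive_location()');
    my ($replayed) = $slave->selectrow_array('SELECT pg_last_xlog_replay_location()');

    # Sessions blocked on locks on the master (pg_stat_activity.waiting in 9.0)
    my ($waiting)  = $master->selectrow_array(
        'SELECT count(*) FROM pg_stat_activity WHERE waiting');

    print "master=$written received=$received replayed=$replayed waiting=$waiting\n";
    # A real check would turn the xlog locations into byte offsets, compare them,
    # and e-mail the SA/DBA if the lag or the wait count crosses a threshold.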
There are some other things we keep an eye on: making sure our WAL files are archived in case we need to rebuild or restore, and making sure we don't purge a WAL file before the slave has actually applied it.
We pay extra attention to those displays, monitor the wait states and how long things have been waiting, and watch for any deadlock situation. Email is what we've been using for alerts;
we don't have a pager system, and we try not to work 24 hours a day, so we haven't implemented that. And because all the satellite input data lands on our archive server, in the worst-case scenario we can always go back
and reprocess it; the data can be reproduced. These are just some notes about the performance side.
How much data do we have? Currently we have about 10 clusters of servers at different stages of development: we have development servers, we have test servers,
and we have operations servers. On average it can be 200 commits per second, and sometimes it can hit up to 10,000; it depends on the workload, but that is the average. Also, this kind of accounting is very difficult to measure;
there isn't really a standard for measuring how big your database is, so these are just some indicators. I just want to show you one of the biggest tables: it has 160 million rows, and the growth rate has been about 40-60% over the last three years.
And here is a chart for the legacy system. In 2012 we are already growing, and I think we may be expecting more. Sometimes it's very difficult to predict the growth,
especially since a new provider may join and suddenly want to process 10 years of data, which can suddenly add a large amount to the data we have. So we are just trying to scale our system
to see if we can handle that kind of load. And this is the new system. The bigger challenge is the synchronization I already mentioned; as you can see, it's very cumbersome,
and because the data is continually being updated it is difficult to look at 160 million rows and identify which ones are new and which ones are different.
When we try to synchronize, identifying what needs to be synchronized is the hardest part for the script; once you find out which rows need to be synchronized, the rest is easy. We're still working on finding a better way,
and maybe someone in the audience can suggest a better way to do synchronization when you have different schemas. This is the legacy schema. I'm not going to go through any detail, but the new one is simplified,
basically partitioning the data into different tables and things like that. The synchronization script pretty much has to map between the two different schemas and find the right query
to update the missing data in the new system.
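To give a flavor of what that sync script has to do, here is a minimal sketch that compares a shared key across the two databases; the database names, tables, and the granule_id column are hypothetical, since the real legacy and MDX schemas differ far more than this.

    #!/usr/bin/perl
    # Find keys that exist in the legacy metadata database but are missing
    # from MDX, so they can be re-synced.
    use strict;
    use warnings;
    use DBI;

    my $legacy = DBI->connect('dbi:Pg:dbname=legacy_metadata', 'sync', '',
                              { RaiseError => 1 });
    my $mdx    = DBI->connect('dbi:Pg:dbname=mdx_metadata', 'sync', '',
                              { RaiseError => 1 });

    # Keys already present in the new system
    my %in_mdx = map { $_ => 1 }
        @{ $mdx->selectcol_arrayref('SELECT granule_id FROM granule') };

    # Keys in the legacy system (a real job would walk these in batches,
    # not pull 160 million ids at once)
    my $legacy_ids = $legacy->selectcol_arrayref('SELECT granule_id FROM file_metadata');

    for my $id (grep { !$in_mdx{$_} } @$legacy_ids) {
        print "missing from MDX: $id\n";   # re-ingest or copy the mapped row here
    }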
Now, some of the parameters. This list does not show the complete configuration from the postgresql.conf file; I just put them in a table side by side to make them easier to compare. It may be helpful for beginners who have never used replication. Certain parameters are different between master and slave, and you can probably just read them; it's very straightforward.
And PGCon has a lot of good presentations with very detailed information. As I pointed out before, we made one adjustment because of the pg_dump issue: we set max_standby_streaming_delay to minus one. The trade-off is that during the dump the slave will not be in real time,
which doesn't matter for us; our case doesn't need the slave to be 100% real time. We use the slave pretty much for backup purposes, and also for some long SELECT queries that don't need to update anything.
We point those at the slave to offload the master, which helps boost performance on the master. And these are some other parameters we're using.
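For reference, the kinds of 9.0 streaming-replication settings being described look roughly like this side by side; these are illustrative example values, not the project's actual configuration.

    # postgresql.conf on the master
    wal_level         = hot_standby     # emit enough WAL for a hot standby
    max_wal_senders   = 3               # allow the slave(s) to connect
    wal_keep_segments = 256             # keep extra WAL in case the network drops briefly
    archive_mode      = on
    archive_command   = 'cp %p /wal_archive/%f'

    # postgresql.conf on the slave
    hot_standby                 = on
    max_standby_streaming_delay = -1    # let long pg_dump/SELECT queries finish;
                                        # WAL replay waits instead of cancelling them

    # recovery.conf on the slave
    standby_mode     = 'on'
    primary_conninfo = 'host=master.example.gov port=5432 user=replication'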
That's it for me. And I'll hand it back to Marty.
I'm one of the developers that's been working on the rewrite of our processing platform. And some of the decisions that were made in that rewrite affect the way that we are going to manage the data from here on out. And so I wanted to go over some of those things. The big difference in the new platform
is that we're doing a lot of decentralized kind of components so that instead of having a more monolithic design, we broke it all out. And these are web components like component X, Y, and Z here. And where it was appropriate, we also splintered a little piece of the database off
and integrated that in with its component application. And then these guys communicate with each other using REST. And I'm not sure if everyone here knows, I wasn't quite sure what to expect in the audience. It's our first PGCon. So I don't know how many people are developers.
But REST is basically using standard HTTP requests to interact with a web application. And there are three things that I think are interesting to mention about how this new architecture affects the data management. One concerns the web framework,
and one concerns the ORM that's integrated in, and the other is the system of database patches that we've implemented for doing schema updates. So I'll go over each one of these three. The web framework is, of course, just something that helps to facilitate creating a web application.
In our case, we're using Perl's Mojolicious framework. Mojolicious implements a model-view-controller paradigm where, of course, the view is what the user sees; in this case, it's a web page.
The model is the state of the system, most often the database. And the controller is kind of a mapping between the interactions that the user sends in through the view and what gets done on the model. And here at the bottom you can see some example mappings of URLs.
Again, this is REST, so we're using URLs for the users to do the interaction. They're sending in some sort of HTTP request, and that eventually has an effect on the database. And so in this example, we've got a GET request,
which maps to a SELECT statement on a person table. And we have a POST request, which would map to an UPDATE or an INSERT statement on the same table, and a DELETE, which would just map onto a DELETE. And that's all pretty standard, I guess, for a web framework.
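A minimal Mojolicious::Lite sketch of those mappings, using the person table from the slide; the column names, database name, and JSON responses are illustrative rather than the project's actual routes.

    #!/usr/bin/perl
    use Mojolicious::Lite;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=component_x', 'app', '', { RaiseError => 1 });

    # GET -> SELECT
    get '/person/:id' => sub {
        my $c = shift;
        my $row = $dbh->selectrow_hashref('SELECT * FROM person WHERE id = ?',
                                          undef, $c->param('id'));
        $c->render(json => $row);
    };

    # POST -> INSERT (or UPDATE)
    post '/person' => sub {
        my $c = shift;
        my $p = $c->req->json;
        $dbh->do('INSERT INTO person (name) VALUES (?)', undef, $p->{name});
        $c->render(json => { ok => 1 });
    };

    # DELETE -> DELETE
    del '/person/:id' => sub {
        my $c = shift;
        $dbh->do('DELETE FROM person WHERE id = ?', undef, $c->param('id'));
        $c->render(json => { ok => 1 });
    };

    app->start;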
The thing that's a little interesting about our system is that we've extended Mojolicious a bit so that when the web server starts up, it automatically recognizes the database that it's coupled with. So it'll find out if it has an associated database,
check the database for all its tables, and generate all these URLs, or routes, that are used by a web application to do the basic CRUD operations. So you don't have to manually define all these things.
The new architecture also has an integrated ORM, and of course that's an Object-Relational Mapper; I'm sure everybody here knows that. And similar to the web framework, we've done some things to help automate the operations that the ORM does. So as the server starts up,
the ORM kicks in, also reads the config file, and tries to discover the database as well. The ORM will generate the object methods needed for those CRUD operations. So we've basically got two levels of database operation mappings going on here.
The web server starts up, the web framework generates a lot of URL routes that the user, or the application the user is running, is going to use, and these then make use of the object methods
generated by the ORM, which can then go about manipulating the database on the back end. And so a lot of this happens just kind of magically. Whenever you create a new application with this framework and you define your database, all these methods and routes just get instantiated for you.
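The talk names Rose::DB; the discovery step described here is roughly what Rose::DB::Object::Loader provides, so a sketch along these lines (connection details and class prefix are illustrative) would generate the per-table classes at startup.

    use Rose::DB::Object::Loader;

    # Walk the component's database at startup and generate one
    # Rose::DB::Object class (plus a manager class) per table.
    my $loader = Rose::DB::Object::Loader->new(
        db_dsn       => 'dbi:Pg:dbname=component_x',
        db_username  => 'app',
        db_password  => '',
        class_prefix => 'ComponentX::DB',
    );

    my @classes = $loader->make_classes;

    # The generated classes give the routes their CRUD methods, e.g.:
    #   my $p = ComponentX::DB::Person->new(id => 42)->load;
    #   $p->name('New Name');
    #   $p->save;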
Now, I'll say one thing. That can be a little expensive. So if the database is any size at all, it's going through all these tables, generating all the CRUD operations that would be needed to manipulate that.
But luckily, since this is a web application, it does it all when the server starts up, so we only pay the price one time. And so far, it hasn't been too much of a problem. So since we're doing sort of distributed development with lots of little components that can be developed individually,
it was desirable to have a way that developers could create schema changes as they're working on the component, because remember, these components now, these web applications, have the database tightly integrated in there, their own PostgreSQL database in there.
So what we did was, in addition to the code in the Git repository for the component, we added the schema definition in this base.sql file. And then as changes are made to the schema of the database,
developers add patch files very much in the same way that you might add a patch to software code. You add a DDL patch here. And so new patches come in, and they're committed to the repository. Eventually, you want to migrate these things out
to a running instance of component X, or more exactly, you want to migrate them out to the database that's running within component X. And the software is going to help a little bit with maintaining and
enforcing integrity on these patches, to make sure things don't get crazy. These slides got a little mangled, I'm sorry. So let's imagine that you have a new patch that comes in here. As a patch is added to the Git repository
and tested and committed, the schema definition file is rebuilt; it's regenerated to incorporate all the patches that have come in so far. And an entry is made in a text file called patches_applied.txt
as sort of a bookkeeping method to keep track of all the patches that have gone into base.sql. Now, when we're ready to apply new patches to a running instance, there's an analogous table. You can hardly see it here, but it's called patches_applied.
It's just like the text file in the Git repository, but this is a regular Postgres table that's keeping track of all the patches that have been applied to this instance of the database. So one thing that the patch software will do for us
is take the schema of the running database and compare the patches_applied.txt file with the patches_applied table to see which new patches would need to be added to this database. And it can do a dump,
sort of a hypothetical dump, of what you would get if you took the existing schema and applied all these new patches. And if you compare that to base.sql, you should get the same thing, assuming nobody's done any unauthorized changes to your database. So it's a way of checking and making sure
nobody's been monkeying around when they should be going through the patch mechanism and doing it the way we've decided to do it. Not everybody likes that, by the way.
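A minimal sketch of what the apply step of such a patch utility might look like, assuming a patches_applied.txt listing in the repository and a patches_applied bookkeeping table; the file layout and column names here are illustrative, not the team's actual utility.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=component_x', 'app', '',
                           { RaiseError => 1, AutoCommit => 0 });

    # Patches recorded in the checked-out repository, one file name per line
    open my $list, '<', 'patches_applied.txt' or die "patches_applied.txt: $!";
    chomp(my @lines = <$list>);
    my @repo_patches = grep { /\S/ } @lines;

    # Patches already applied to this running instance
    my %applied = map { $_ => 1 }
        @{ $dbh->selectcol_arrayref('SELECT patch_name FROM patches_applied') };

    for my $patch (grep { !$applied{$_} } @repo_patches) {
        my $ddl = do { local (@ARGV, $/) = ("patches/$patch"); <> };  # slurp the DDL
        $dbh->do($ddl);                                               # apply it
        $dbh->do('INSERT INTO patches_applied (patch_name) VALUES (?)', undef, $patch);
        print "applied $patch\n";
    }
    $dbh->commit;   # a failed patch dies before the commit, leaving the schema untouched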
So the procedure would be something like this. You check out your Git repository and all the patches are in there.
They've all been previously committed. This patch software is built on Perl's Module::Build; it's an extension of that module. Module::Build is normally software for managing the installation of Perl modules, and in this case we've just extended it
to manage the installation of the patches. Does that answer your question? And in addition to using Module::Build, we're using Rose::DB as the object-relational mapper.
And as I said, Mojolicious is a rather new Perl framework that has gotten fairly popular in the last year. Okay, and I've also been working with Alex on a sort of monitoring tool for our databases.
As he said, we've got around 9 or 10 clusters. There are several principal clusters that we're really concerned about when something happens, but we've got a number of others. And what we'd like is kind of a simple tool that provides something like a weather forecast of how the clusters are doing.
So I created a little web app using Mojolicious, since I've already been doing development work with it, that reads a JSON-formatted config file and then throws this stuff up on a screen
in a browser. It gives an icon for each one of the clusters, color-coded to give an idea of how that cluster is doing. It's just a prototype right now, but we've got it reading the config file
and throwing this stuff up on the screen. And it starts a JavaScript event loop that will also create the SSH tunnels out to these databases and then start doing various monitoring operations. And that's the part where I'd love to have some feedback
from people, because I'm not sure exactly what the best things to check are. I know there are various Unix commands, and you can check some of the stat tables. And if you see one of these guys go red or yellow, then you may want to click on it. In this case I clicked on a green one, and then it gives you more information. Again, this is a locally installed web application.
The web applications you can create with Mojolicious are very light, so you can just check one out of the Git repository and install it on your workstation in five minutes, and you don't have to bother the SAs for root access or anything. And that's what we wanted: something simple to monitor our databases.
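The JSON config file for a tool like this might look something like the following; the host names and fields are hypothetical.

    {
      "check_interval_seconds": 60,
      "clusters": [
        { "name": "ops-master",  "host": "db-ops1.example.gov", "port": 5432, "dbname": "omi"     },
        { "name": "ops-slave",   "host": "db-ops2.example.gov", "port": 5432, "dbname": "omi"     },
        { "name": "development", "host": "db-dev1.example.gov", "port": 5432, "dbname": "omi_dev" }
      ]
    }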
So any suggestions, I'd love to hear them. And I guess that brings us to questions. I don't have it posted anywhere, but I have it on my laptop, and I'll be glad to share it. This is something I kind of wrote in my own time too. So some of the stuff we couldn't get out,
like the Module::Build patch stuff, is kind of company stuff, but the monitoring tool has been my own little project. They call it Clustericious, because we have clusters of databases that process things.
And the guy who started Clustericious tried to release some other software, and we just had a devilishly hard time of it, because we're actually contractors.
We're working for the government, and they just put a mountain of red tape into trying to open source anything. So we're in a little bit of a bind on what we can release. Anything else? Yeah. How do you store your measurement data,
your archives, and how are they connected to the metadata? Good question. Maybe we can go back, and Alex may want to jump in on this, but I think he mentioned this back in...
Are you talking about the real data from the instrument that comes down with the images, that volume of data? Exactly, the raw data. Yeah, those are stored in the archive. We've got hundreds of nodes
serving in different data centers and things like that; we have a long history of the data, and that real data never moves into the database. The database only holds the metadata, which describes when we stored the new raw data, its location,
when we take a file through the algorithm processing and to what level, and how we classify the raw ESDTs; we group them by a so-called archive set, and that information is the metadata,
which is in the database and searchable. All applications point to that, and it's how we get access to deliver the data. Our database also tracks all the ordering status and all the processing, even the planning, when we process all the information.
We have another bit of software for that. This is a huge amount of data stored in the archive; how is this handled? That is the question: how is this handled, and how do you set up the connections to your metadata? I can answer that. Well, we have another piece of software
that we wrote for this. It doesn't use a relational database. As you said, there are millions of files constantly coming in from the satellite, because it never rests, and we do an MD5 sum on those and store them in a distributed file archive server
that can span across multiple servers, but the location is determined by the MD5 sum on the contents of the file. So you can easily just generate the address on the fly like that. Do you use any standard software,
or something self-written? It's self-written, but it's not much. I mean, we're still using a lot of Unix-type stuff: just do the MD5 sum and store the file in a directory, where the directory structure is based upon that MD5 digest,
so the path is generated from what you get back from the digest. So it's not too complicated, really. The problem is that there are so many of them, as you said, and sometimes you have to rebalance the servers a bit so that one server doesn't have too many.
We did have this in a database at one time; we were storing the locations in a database, but we always had the data itself just as files.
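A small sketch of that scheme: derive the storage path from the MD5 digest of the file's contents, so the address can be regenerated on the fly. The fan-out layout and archive root are illustrative, not the team's actual layout.

    use strict;
    use warnings;
    use Digest::MD5 ();

    # Map a file to its archive location using the MD5 digest of its contents.
    sub archive_path {
        my ($file, $root) = @_;
        open my $fh, '<:raw', $file or die "$file: $!";
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;

        # e.g. /archive/d4/1d/d41d8cd98f00b204e9800998ecf8427e
        return join '/', $root, substr($digest, 0, 2), substr($digest, 2, 2), $digest;
    }

    print archive_path($ARGV[0], '/archive'), "\n";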
Yeah. So the web components mostly work with the atmospheric scientists who process the data after it comes in; how does that work? I think this is about the front end
and the back end. How the data gets into the database is the internal side of the environment. On the front end, the important thing is how people know what data you have.
And there are a lot of applications; even other agencies and other scientists have their own projects. They can write applications and search engines based on, as you saw on a previous slide, the data receiver.
They don't only go and download files as that kind of user; you can develop an app that talks to certain hosted data and automatically gets the data when new data comes in.
Pretty much we allow them to search, order, and stage the data. They can write an application with the same goal, take the output files, and feed them into their application; that's another way to deliver the data. So there are different kinds of channels for the data.
Exactly.
For storing files, that is the heart and soul.
And there's a kind of a division of labor here that's created a little friction recently. Our team wrote the software, the stuff that does the MD5 sum and stores the files. But the SAs over at NASA are managing the hardware back there, and they can implement
any kind of RAID array or whatever disk system they want. And recently there was a problem: files weren't going in fast enough. There were some delays, and there was a little bit of finger pointing; well, it didn't do this until you guys wrote this crazy software.
But everything you looked at tended to suggest that it was something wrong with the disks. It got resolved. But I don't know much about the hardware implementation, and it could change outside our control. How big is this?
Archives? How big is this? Terabytes? Oh, many terabytes. No, it's, well, let's see. 2004 to now for OMI.
So there's been about eight years' worth of data. Did you write your own solution? Because you know there are many storage solutions out there.
Do you know why you wrote your own solution? You know, the guy I work closely with wrote this solution, and I think he thought it would be kind of neat to write it himself, in a way. But we already had something in place, and it wasn't quite right. And not everybody agrees that we needed to write
something ourselves. But we did, and there was some clever stuff he put into it. But as I mentioned, this caused a little bit of friction. I'm sorry. An application?
Replication. Oh, yeah. On a previous slide we showed the replication; we're using this master and slave standby replication. But that is for the metadata part.
We don't have it for the file data; we don't have it. All of that data is just files. You're asking about the metadata versus the data.
OK. Thank you all very much. Oh, I'm sorry, you're asking about the metadata. Yeah. So far, one database is about 1.25.
We have two databases, and combined we're talking about close to about 50.
Five of them. I'm not sure that there's a good way to measure how big the data is. Right.
OK. Thanks.