PostgreSQL on Hadoop

Formal Metadata

Title
PostgreSQL on Hadoop
Alternative Title
Distributed Analytic Databases
Series Title
Number of Parts
25
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers
Publisher
Release Year
Language
Production Place
Ottawa, Canada

Content Metadata

Subject Area
Genre
Abstract
Bridging the Divide with Distributed Foreign Tables
Apache Hadoop is an open-source framework that enables the construction of distributed, data-intensive applications running on clusters of commodity hardware. Building on a foundation initially composed of the MapReduce programming model and the Hadoop Distributed Filesystem, Hadoop has in recent years expanded to include applications for data warehousing (Apache Hive), ETL (Apache Pig), and NoSQL column stores (Apache HBase). In this talk we describe recent work done at Citus Data that makes it possible to run a distributed version of PostgreSQL on top of Hadoop in a manner that combines the rich feature set and low-latency responsiveness of PostgreSQL with the scalability and performance characteristics of Hadoop. The talk will begin with a high-level overview of Hadoop that focuses on its distributed storage layer and block-based replication model. Next, we will look at the data model of the Apache Hive data warehousing system and explain how it enables features such as schema-on-read, support for semi-structured data, and pluggable storage formats. Finally, we will describe how we leveraged these ideas and Foreign Data Wrappers to build a distributed version of PostgreSQL. This version runs natively on Hadoop clusters and integrates seamlessly with other components in the Hadoop ecosystem.
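As a rough illustration of the schema-on-read idea mentioned in the abstract, the following minimal Python sketch stores records as raw text and applies a schema only when the data is read. It is not taken from the talk and does not use Hive or PostgreSQL; the names `RAW_LINES`, `SCHEMA`, and `read_with_schema` are hypothetical and chosen only for this example.

```python
import json

# Raw records are kept exactly as they arrived; nothing is validated at write time.
# (Hypothetical sample data, for illustration only.)
RAW_LINES = [
    '{"user": "alice", "clicks": "3", "country": "CA"}',
    '{"user": "bob", "clicks": "7"}',
]

# Hypothetical schema: column name -> converter, applied only when reading.
SCHEMA = {"user": str, "clicks": int, "country": str}


def read_with_schema(lines, schema):
    """Project each raw record onto the schema at read time.

    Fields missing from a record become None instead of causing an error,
    which is how semi-structured input can still be queried as rows.
    """
    for line in lines:
        record = json.loads(line)
        yield tuple(
            convert(record[name]) if name in record else None
            for name, convert in schema.items()
        )


rows = list(read_with_schema(RAW_LINES, SCHEMA))
for row in rows:
    print(row)
```

Because the schema lives with the reader rather than the stored data, the same raw lines could be read again under a different schema without rewriting them.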