
DuckDB: Bringing analytical SQL directly to your Python shell.


Formal Metadata

Title
DuckDB: Bringing analytical SQL directly to your Python shell.
Title of Series
Number of Parts
542
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
In this talk, we will present DuckDB. DuckDB is a novel data management system that executes analytical SQL queries without requiring a server. DuckDB has a unique, in-depth integration with the existing PyData ecosystem. This integration allows DuckDB to query and output data from and to other Python libraries without copying it, which makes DuckDB an essential tool for the data scientist. In a live demo, we will showcase how DuckDB performs and integrates with the most used Python data-wrangling tool, Pandas. The talk is catered primarily towards data scientists and data engineers. It aims to familiarize users with the design differences between Pandas and DuckDB and how to combine them to solve their data-science needs. We will give an overview of five main characteristics of DuckDB: 1) Vectorized Execution Engine, 2) End-to-End Query Optimization, 3) Automatic Parallelism, 4) Beyond-Memory Execution, 5) Data Compression. In addition, users will experience a live demo of DuckDB and Pandas in a typical data science scenario, focusing on comparing their performance and usability while showcasing their cooperation. The demo is most interesting for an audience familiar with Python, the Pandas API, and SQL.
Transcript: English (auto-generated)
So let's welcome Pedro Holanda for his talk on DuckDB, and a magnificent snake duck. Yeah, you'd be surprised what you can find as a rubber duck these days, you know. All right.
So, I'm Pedro Holanda. I am one of the main contributors to the DuckDB project, which is an open-source database system, and I'm also the CEO of DuckDB Labs. Today I'm going to talk a little bit about how DuckDB can bring analytical SQL power directly into your Python shell.
To give you an idea of what this talk looks like: I'm going to start with what DuckDB is. Since I'm here talking about yet one more database system, I'm going to motivate why we actually needed to build one more database system, because the existing ones didn't solve the problems we had, and then I'm going to go over the main characteristics of DuckDB, so what actually makes it special. Then I'll go over DuckDB in Python land, so how DuckDB integrates with the Python ecosystem, and I'm going to do a little demo. The basic idea is that we're going to use the infamous New York City taxi data set and try to do some estimation of fare costs, and we're going to use DuckDB, Pandas and PySpark just to see a couple of the differences in the things I'll be talking about, and then a short summary of the talk.
So what is DuckDB? Well, DuckDB was actually born at CWI, which is the research centre for mathematics and computer science in the Netherlands.
What we had there is that a lot of the projects, the PhD student projects and the master projects, are very data-science-y, so usually you have a data science problem and you want to throw a database management system at it, because you are handling data. So initially we thought, OK, we can probably use a database server, use a database connection, and then just transfer the data from the relational database to your Python terminal, for example, which is where your analytical tools are. It turns out that's quite a bad idea, because you are transferring a lot of data, and that's pretty costly. So then you think, OK, this is really not solving our problem; can we draw inspiration from somewhere else? And then of course there's SQLite, the most famous database out there, or at least the most used one.
And it has quite a nice property, which is being an embedded database system. Being an embedded database system means it can run inside your Python process, so you can eliminate this data transfer cost. SQLite comes with one design decision, though: it's not really optimized for analytics. So we kind of wanted to do SQLite in terms of being easy to use and eliminating this data transfer cost, but focused on analytical queries.
And that's kind of how DuckDB was born, and that's also why we frame it as a SQLite for analytics. It also has a very simple installation: if you're in Python, you just do a pip install and you're good. Since it's embedded, there's no server management. Let's say you just want to query a Parquet file: that's two lines of code. There's no starting a server, there's no schema creation; the schema is inferred from the object. So it's very easy and very fast, and we also really focus on fast data transfer between DuckDB and analytical languages and tools like Python and R. DuckDB is currently in pre-release.
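(For illustration, a minimal sketch of what those two lines can look like; the file name is just a placeholder, not from the talk:)

    # pip install duckdb
    import duckdb

    # No server to start, no schema to create: the schema is inferred from the file.
    duckdb.query("SELECT * FROM 'taxi.parquet' LIMIT 5").show()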
I think the last version we released was 0.6, and 0.7 is coming up soon. On the web page there are a few more details about all the things that are in each release. All right, so I'm going to go over some of the main characteristics of DuckDB: the columnar data storage, a little bit about compression, and the vectorized execution engine. These are all core database things; talking about the vectorized execution engine is going to be a bit difficult because Professor Boncz is here, and he actually created that, so I'll try to do it correctly. Then a little bit about end-to-end query optimization, parallelism, and beyond-memory execution.
So, columnar data storage. There are basically two ways you can do it: one is a row store, the other a column store. As an example of the first one we have SQLite, and the whole idea there is that the data of a row is stored consecutively in memory. That basically means that if you want to fetch an individual row, that's very cheap, because it's contiguous in memory; however, you always have to fetch all the columns. In analytical queries you usually have very wide tables, but you really only want a couple of those columns. So what if you only want to use a few fields? In this example, what if you're just interested in the price of a product, but not the store it's sold in? In a column store the layout is such that the data of a column is consecutive in memory, so if you want to access just a couple of columns you can have immense savings on disk I/O and memory bandwidth.
That's why this type of format is really optimized for analytics. To give you a more concrete example, let's say we have a one-terabyte table with 100 columns; for simplicity, let's say all the columns have the same size, and we only require five columns of the table in our analytical query. In a row store like SQLite, reading this whole table from a disk that does around 100 megabytes per second will take you about three hours. If you use a column store model, which is what Pandas and DuckDB do, reading these five columns from disk takes you about eight minutes. So there's a huge
improvement just from picking the correct storage format for your workload.
Compression. I'm not going to go into a lot of detail about the compression algorithms we implement in DuckDB, but what I can tell you is that because you have a column store, the data of a column sits contiguously in memory, which gives you a very good opportunity to compress it, since data from the same column is usually somewhat similar. So you can apply things like RLE, FSST for strings, and Chimp for floating-point numbers, and really decrease the size of your data. In this table here, which I think is from about a year and a half ago, DuckDB 0.2.8 had no compression at all, and a year and a half later we had managed to implement all these things, which got us five times better compression on lineitem, for example, and 3.18 times better compression on the taxi data set I'm going to use later. And why is compression so important? Well, if we go back to the same example, where reading our five columns from disk cost us eight minutes because of the storage format: if we compress these columns, suddenly you don't have to read 50 gigabytes anymore, you read less. And if you apply the best case from the table I showed you, five times compression, that brings the cost down to about one minute and 40 seconds.
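(To make the arithmetic behind those numbers explicit, a back-of-the-envelope sketch using the same assumptions as the example:)

    # 1 TB table, 100 equally sized columns, a disk that reads ~100 MB/s,
    # and an analytical query that touches only 5 of the columns.
    full_scan_s  = 1_000_000 / 100     # row store reads everything: 10,000 s (~2.8 hours)
    five_cols_s  = 50_000 / 100        # column store reads 5/100 of it: 500 s (~8.3 minutes)
    compressed_s = 50_000 / 5 / 100    # with 5x compression: 100 s (1 minute 40 seconds)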
so execution well There's three ways of doing a query execution There's actually one more, but it's not in the slides But SQLite use the top of the time processing which means that you process one row at a time Pandas who's uses column at a time processing which means that you process one column at a time Inducted B uses kind of like a mix of the both which is a technique developed by Peter
The vectorized processing where you process batches of a column at a time So basically the two point of time model from SQLite It was optimized for a time where computers didn't had a lot of memory there was low memory To be used because you only need to really keep one row in memory for all your whole query plan
So the memory was expensive That's what you could do But this comes with a high CPU overhead per tuple because you constantly resetting your caches You don't have any cache conscious algorithm running that piece of data up to the production for query results If you go to the column at a time, which is what Pandas uses
There's already brings like better CPU utilization it allows for SIMD But it comes with the cost of materializing large intermediates in memory It basically means that you need the whole column in memory at that point to process for that operator And of course the intermediates can be gigabytes each So that's pretty problematic when the data sizes are large
And that's why you see for example that Pandas if your data doesn't fit in memory. What does it happen it crashes? And then if you go to the vectorized query processing is actually Optimized for CPU cache locality you can do SIM destructions pipelining and the whole idea is that your intermediates
Are actually going to fit here in a l1 cache So basically you're going to be paying this latency of one and a second to be accessing your data throughout your query plan Instead of paying the latency of a main memory, which is also the case of a database, which is a hundred nanoseconds it seems like small difference, but When you constantly execute it as this really becomes a bottleneck
End-to-end query optimization is of course something we have in DuckDB. We have things like expression rewriting, join ordering, subquery flattening, and filter and projection pushdown, which is a bit more simple but is extremely important and makes a huge difference in the cost of a query. So here's an example of projection pushdown. Let's say you have a table with five columns, A, B, C, D and E, and you want to run a query that's pretty small: it selects the minimum of column A, with a filter saying column A is bigger than zero, and you group by B. The whole point of this query is that you're only using two columns of the table, right? So what the DuckDB optimizer will do is say: OK, in the scan I know I don't need all the columns, I just need A and B, and you simply don't read the other ones. If you do the same thing in Pandas, for example, you apply your filter and then hand the filtered data to the group-by and the aggregator, but while you're doing that filter you're still filtering all the other columns you're not going to use in your query. Of course, you can make this optimization manually, but it's pretty nice that the database system can do it for you.
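(A sketch of that example; the tiny table is made up just to show the two styles:)

    import duckdb
    import pandas as pd

    # A made-up table with five columns a..e
    df = pd.DataFrame({"a": [1, -2, 3], "b": ["x", "y", "x"],
                       "c": [0, 0, 0], "d": [0, 0, 0], "e": [0, 0, 0]})

    # DuckDB: the optimizer pushes the projection down, so the scan only touches a and b
    duckdb.query("SELECT b, MIN(a) AS min_a FROM df WHERE a > 0 GROUP BY b").df()

    # Pandas: the boolean filter materializes all five columns before the group-by
    df[df["a"] > 0].groupby("b")["a"].min()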
DuckDB also has automatic parallelism and beyond-memory execution. DuckDB has parallel versions of most of its operators.
I think all of our scanners, including with insertion-order preservation, are parallelized now, as well as aggregations and joins. Pandas, for example, only supports single-threaded execution. We all have pretty good laptops these days, right? So it's a shame if you cannot really take advantage of parallelism. And DuckDB also supports executing on data that does not fit in memory. It's kind of the never-give-up, never-surrender approach: we are going to execute this query. We try to always have graceful degradation, so that performance doesn't just suddenly fall off a cliff, and the whole goal is really to never crash and always execute the query.
All right, so a little bit about DuckDB in Python land. Basically, we have an API that is DB-API 2.0 compliant, so very similar to what SQLite has, for example: you can create a connection and you can start executing queries. But we also wanted to have something similar to a data frame API, so that people who come from Pandas, for example, still have something familiar to work with. In this example you also create a connection, and then you create this relation, which kind of looks like a data frame; it's just pointing to a table. You can do a show to inspect what's inside the table, and you can apply these chaining operators, like a filter or a projection. In the end this is all lazily executed, and that also allows you to take advantage of DuckDB's optimizer even when you use the chaining operations.
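(A small sketch of the two styles, with a made-up table, assuming the standard duckdb Python package:)

    import duckdb

    con = duckdb.connect()  # in-memory database

    # DB-API style: execute SQL on the connection
    con.execute("CREATE TABLE items (name VARCHAR, price DOUBLE)")
    con.execute("INSERT INTO items VALUES ('duck', 4.2), ('goose', 1.0)")
    print(con.execute("SELECT COUNT(*) FROM items").fetchone())

    # Relational, data-frame-like style: built lazily and optimized as one query
    rel = con.table("items")
    rel.filter("price > 2").project("name, price").show()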
Of course, I talked before about memory transfer costs, so we were also very careful to be tightly integrated with the very common libraries in Python, like Pandas and PyArrow. What actually happens is that, in the end, the Pandas columns are usually NumPy arrays, which turn out to be C arrays, which turns out to be kind of what we use as well. So with a little bit of make-up on the metadata we can just read them directly, and since we're all in the same process, we have access to that piece of memory. In the end that means you can access the data from Pandas in DuckDB without paying any transfer cost, or at least only a constant transfer cost for doing the metadata make-up, let's say. And it's the same thing with PyArrow: we have what is called zero copy, so we can read Arrow objects and output Arrow objects without any extra cost. We also support NumPy and SQLAlchemy, and in Ibis we're actually their default back end, I think since about six months ago.
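(For example, something along these lines; the data frame is made up, and the .arrow() call needs pyarrow installed:)

    import duckdb
    import pandas as pd

    # An existing Pandas data frame living in this Python process
    df = pd.DataFrame({"passenger_count": [1, 2, 2], "tip_amount": [1.0, 2.0, 4.0]})

    # DuckDB scans the data frame in place, without copying the column data ...
    result = duckdb.query(
        "SELECT passenger_count, AVG(tip_amount) AS avg_tip FROM df GROUP BY passenger_count")

    # ... and can hand the result back as a Pandas data frame or an Arrow table
    print(result.df())
    print(result.arrow())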
As you can see, this is our PyPI download count. The Python library is actually our most downloaded API; we have APIs for all sorts of languages. And you can see that in the last month we had about nine hundred thousand downloads, so there are a lot of people trying out and using DuckDB in their Python scripts.
So now it's demo time. Let me get this up. All right, I hope you can see this. This is just installing DuckDB and PySpark, getting our yellow-trip-data data set, and importing the things we're going to use. And here we're just getting a connection from DuckDB and creating a relation.
The relation is over a Parquet file, since DuckDB can read Parquet files, and then you can just print it to inspect what's in there. So if we run this, we can see the columns we have: vendor ID, pickup datetime, passenger count, and you get the types of the columns. You also get a little result preview to get an idea of what your data looks like. I think this data set has about 20 columns, with information about the taxi rides in New York in 2016. And then you can also run a simple query; here I'm just doing a count to know how many tuples there are.
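(Roughly what that setup looks like; the file name is a placeholder, and the exact reader method may differ a bit between DuckDB versions:)

    import duckdb

    con = duckdb.connect()
    # Relation over the Parquet file; nothing heavy happens until we run a query on it
    trips = con.from_parquet("yellow_tripdata_2016-01.parquet")
    print(trips)                          # column names, types and a small result preview
    trips.aggregate("count(*)").show()    # how many tuples are in the data set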
We have about 10 million tuples in this data set. All right, so this function here is just to do a little bit of benchmarking; coming from academia, we do have to do something that's kind of fair, I guess, so I run everything five times and take the median time.
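(The helper is nothing fancy; a sketch of the idea, not the exact code from the demo:)

    import time
    from statistics import median

    def benchmark(fn, runs=5):
        """Run fn several times and report the median wall-clock time in seconds."""
        times = []
        for _ in range(runs):
            start = time.time()
            fn()
            times.append(time.time() - start)
        return median(times)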
And then this is where our demo actually starts. We start off with a data frame, since Pandas can also read Parquet files, and the whole thing about DuckDB, again, is that it's not here as a replacement for Pandas. That's not what I'm trying to sell, but rather something that can work together with Pandas. The cool thing is that we can, again, read and output data frames without any extra cost. So let's say that in the query here we're getting the passenger count and the average tip amount of the trips that had a short distance, and we group by the number of passengers. What we want to know is: for short trips, does the amount of tip depend on the number of passengers in the ride?
And what you can see here is that you can, again, read directly from the data frame, that's what we're doing, and we just have to use the data frame's name in our SQL query. And if you call .df() on the query result, you also get a data frame back out, which is pretty cool because data frames have plotting capabilities, like plot.bar, that DuckDB doesn't have, so you can easily get a very nice chart. You can see here that apparently there is some dirty data, because people are getting tips when there isn't anyone in the ride; I'm not sure what that is. But apparently, if you have more people, seven to nine, maybe the more expensive cars, you get a higher tip.
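(The query in question could look roughly like this; the column names follow the usual NYC yellow-taxi schema and the short-trip threshold is just illustrative:)

    import duckdb
    import pandas as pd

    df = pd.read_parquet("yellow_tripdata_2016-01.parquet")   # placeholder file name

    avg_tips = duckdb.query("""
        SELECT passenger_count, AVG(tip_amount) AS avg_tip
        FROM df                              -- the Pandas data frame, scanned in place
        WHERE trip_distance < 5              -- 'short' trips
        GROUP BY passenger_count
        ORDER BY passenger_count
    """).df()

    avg_tips.plot.bar(x="passenger_count", y="avg_tip")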
And you can do the same thing in Pandas, of course. If you don't have SQL, you have to use Pandas' own language to do the group-by and the average, and you can use the plots directly. The whole point here is to show the difference in execution time.
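(The Pandas-only version of the same thing, roughly:)

    import pandas as pd

    df = pd.read_parquet("yellow_tripdata_2016-01.parquet")   # same placeholder file
    short = df[df["trip_distance"] < 5]
    short.groupby("passenger_count")["tip_amount"].mean().plot.bar()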
So now we're waiting... OK, so that took about a second, and DuckDB took 0.2, 0.25, so that's about a 4-5x difference. And you also have to consider that we're not using a very beefy machine; this is a Colab machine. Imagine if you had more cores: this difference would be even bigger. And then I added Spark, for fun. Spark can actually also read data frames, but it crashes out of memory on my Colab machine, so I had to give up on that and read directly from the Parquet file, but it does output a data frame. I think we're going to have to wait a little bit, but it's doing its best.
So, of course, Spark is not designed for small data sets, but it turns out there are a lot of use cases where you use these small data sets. While it's going, it's warming up a little bit; that's good for the winter, it produces some energy, I think. All right. OK, so it took two seconds, 2.2 seconds for the actual execution, and that's already more than twice what Pandas took. Anyway, for the demo I of course showed you something that's fairly simple. Can you actually do very complicated things?
Maybe not very complicated, but more complicated. So here I'm not really going to go over the query, but the whole idea is that we can, for example, use DuckDB to run a linear regression. Can we estimate the fare from the trip distance? It turns out you can calculate the alpha and the beta with not such a crazy query, and then you can again export the result to Pandas and you get a very nice figure out of it.
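(One way to get the slope and intercept of fare against trip distance with plain SQL aggregates; a hedged sketch with assumed column names, not the exact query from the demo:)

    import duckdb
    import pandas as pd

    df = pd.read_parquet("yellow_tripdata_2016-01.parquet")   # placeholder file name

    # Least-squares fit of fare_amount against trip_distance:
    #   beta = cov(x, y) / var(x),  alpha = mean(y) - beta * mean(x)
    coeffs = duckdb.query("""
        WITH stats AS (
            SELECT AVG(trip_distance)                 AS mx,
                   AVG(fare_amount)                   AS my,
                   AVG(trip_distance * fare_amount)   AS mxy,
                   AVG(trip_distance * trip_distance) AS mxx
            FROM df
        )
        SELECT (mxy - mx * my) / (mxx - mx * mx)           AS beta,
               my - (mxy - mx * my) / (mxx - mx * mx) * mx AS alpha
        FROM stats
    """).df()
    print(coeffs)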
So you can really combine these two to get the best out of both. All right, that was the demo.
Summary. Oh, that's my last slide, good. So yeah, DuckDB is an embedded database system. Again, it's completely open source, under the MIT license. Coming from academia, this is something we were always worried about: giving it back to everyone, because it was initially funded by taxpayer money. So everyone can use it, and one hundred percent of what we do is actually open source.
There's nothing that's closed source. It's designed for analytical queries, so data analysis and data science. It has bindings for many languages; of course, I'm in the Python dev room talking about Python, but we also have Java, for example, and it turns out Java is one of our most downloaded APIs, so I guess that's an interesting sign, and JavaScript and a bunch of others. It has very tight integration with the Python ecosystem; again, the whole idea is that you eliminate transfer costs. It implements the DB-API and the relational API, the relational API again being the more data-frame-like one, and it has full SQL support, so anything you can imagine, like window functions or whatnot, you can just express in DuckDB. And that's it. Thank you very much for paying attention.
Happy to answer questions.
Thank you, Pedro. So we have five minutes for questions.
Thank you. Thanks for the great presentation. You mentioned beyond-memory execution and graceful degradation, that it tries not to degrade too much. Can you shine a little bit more light on what happens under the hood, and to what degree?
Yeah, of course. I think at the moment it's only the ordering operator that actually does that. We have Laurens, who is doing his thesis on this here, so there are a lot of operators that still need research and development; it's more of a goal than something that fully happens today. But the whole goal is that you really don't have these sudden spikes. There's research going on, and in the future there will be more to share, for sure.
Thank you very much for the talk; it's very exciting to see such a powerful tool. I usually work with data warehouses, and I saw on the website that you do not recommend using this with data warehouses. I would like to know why.
So, of course, there's no one solution for all problems, right? There are cases where data warehouses are a very good fit. It turns out that for data science, for example, which is kind of what we preach the most, they're usually not that good, because then you fall back to sending your data outside your database system. You're not really going to be running your Python code inside the system; you can do that, with user-defined functions for example, but they are a bit messy. So you really want to have it embedded in your Python process, so you completely eliminate the transfer cost, because usually what you do is: OK, I have a table with ten columns, I'm only going over four of them, but I'm still hitting really huge chunks of it. That's the bottleneck we try to eliminate.
How do you handle updates?
Yeah, although we are an analytical database system, we do do updates. Mark, I don't know where he is, but he's probably there, developed an MVCC algorithm for OLAP, so we have the same ACID transactional capabilities that you would expect from a transactional database. Of course, if you have a transactional workload you should still go for Postgres or SQLite or another database that handles that type of transactions, but Mark has developed a full-on algorithm to handle updates completely.
How do you compare to Vertica?
Good question. I think in terms of analytical queries, on TPC-H it's probably similar performance, but then again, the whole point is that if you go to the Python process, the data transfer cost will take most of the time there, and DuckDB is really catered for this type of scenario, the embedded scenario.
We have one minute left for one more question. Oh, yeah, I actually have a notebook somewhere with a bunch of examples as well; I'm very happy to share it. I don't know where I'll post it. Ah, the FOSDEM thing, I guess. All right. Thank you a lot, Pedro. Thanks a lot.