Analyzing Data with Python & Docker
Formal Metadata
Title: Analyzing Data with Python & Docker
Series: EuroPython 2016 (Part 92 of 169)
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/21095 (DOI)
Language: English
Transcript: English (auto-generated)
00:02
Hi everyone. Please welcome Andreas. So, good morning everyone. I'm Andreas and today I'm going to talk about analyzing data with Docker. Before I start I want to thank the organizers again for inviting me to the conference.
00:21
It's really great to see a lot of people here for the second or third time and I'm really excited to speak to you today about this. To say something about my own background: it is in science. I've been working in physics and I've been using Python since about 2009 for my own work. And in the last five years I've been mostly working on data science problems.
00:42
Also of course using Python as my main tool of choice. So, we're going to attack this problem as follows. First I'm going to give you a small introduction to data analysis and explain the different scales and the different types of analysis that we can do and why sometimes that might be difficult.
01:03
Afterwards I'm going to talk briefly about Docker so that we all understand what it is and how we can possibly use it. And then I want to give you some examples of how we can containerize our data analysis using this technology. Finally I want to talk about some other possible approaches. I want to show you some relevant technologies that you can use.
01:25
And I want to give you some outlook into the future of containerized data analysis. Okay, so let's get started. Data analysis is a pretty large field. As a data analyst I like graphs, so here you have a graph of the different scales and the different types of data analysis.
01:49
So I tried to segment this a bit from small scale to large scale and from interactive to automated methods. And if you look for example in the upper left quadrant here you would have automated small scale data analysis tasks.
02:07
So this would typically be some scripts or Python code that interacts with your data, for example a local database, and does some analysis on it in a non-interactive way. In the lower left quadrant here you have things that are interactive and possibly user-interface based.
02:24
So a good example for this would be the IPython notebook where you can analyze your data in an interactive and straightforward way. And you can do so very easily using graphical methods and using various types of data sources as well. If you go to the large scale data analysis we have things like Apache Hadoop which is mostly non
02:46
interactive technology that allows us to perform data analysis tasks at very very large scales in a batch way. On the lower right quadrant on the other hand you have tools which are also helping us to
03:02
deal with very large data sets but which are more interactive than traditional for example MapReduce based approaches. Examples for this would be for example Apache Spark or Google BigQuery. So what kind of tools am I going to talk about today? Everything of course.
03:23
I want to show you that using containers can help us in all of these areas. So if we have lots of tools for data analysis you might ask yourself what is actually so difficult about this. Well in my own experience and maybe from your experience several things.
03:42
First, sharing your data and tools is not exactly easy. As a scientist I experienced this myself. I started my PhD in 2009 and I used Python for a lot of things, and back then my analysis
04:01
workflow would basically involve a few hacked-together scripts in Python and some data files that I would keep in directories. So sharing those files, the data and the tools that were used was possible of course, but it was not easy and it was certainly not straightforward to give other people access to these kinds of things.
04:24
This of course leads to problems in reproducing results. So here we see a cell in the process of reproducing, and it can do that because it has all the necessary information available to it. But if we try to reproduce our results in science or in other contexts it's not that
04:44
easy, because often we're lacking the context and several critical parts of the data analysis process. Another thing that is difficult in data analysis is scaling. As you probably know, at the small scale you have a lot of tools available that you can use to analyze your data.
05:06
I mentioned IPython and the IPython notebook earlier and there are a lot of different ways to handle for example the plotting and the processing of data at this scale. But if you go to larger scales you normally need a totally different set of tools.
05:21
So the normal tool set that you use for your small data sets doesn't apply anymore in this world. You need technologies like MapReduce, like Hadoop, and that means you need to rewrite a lot of your data analysis tools when your data gets bigger. So how can Docker help us to overcome some of these problems?
05:45
Well, let's first try to understand what Docker is actually about. Docker is basically a tool that helps us to deploy applications inside of software containers. And if I say software containers you're probably thinking of virtual machines.
06:03
But that's not quite the right picture, because Docker containers work on a process level and they isolate different aspects of the operating system. For example processes, resources and the files that an application sees. This means that some aspects, for example the kernel that your containers are running on, are shared between them.
06:25
And this is exactly what makes Docker very interesting, because it provides a more lightweight way to isolate applications from each other. So this is the basic idea, and of course we need a lot of tooling to make this idea convenient.
06:42
So Docker provides a high-level API that helps you manage, version-control, deploy and network your containers. If you look at the core concepts of Docker, at the basis we have the image, which you can imagine
07:01
as a frozen version of a given system that contains the whole file system that we need to launch a given container. And as you can see here, images are versioned and some images are based on other images and we have also images that are not based on anything else which we call a base image.
07:21
And we'll see later why version-controlling images and building them on top of each other is a great idea. So you can keep your images on your local computer of course, but what makes it convenient to use them is to put them into a registry. Docker has its own registry on Docker Hub, but it's also possible to run your own private registry server.
07:45
Now a container in this sense is nothing but a running instance of an image. So each of these containers here has a given image associated with it and philosophically or like conceptually containers are ephemeral.
08:01
That means that the state of a given container is not saved when it stops working. So that means that in order for containers to be useful for any data processing, we usually want to attach some resources to a container. This is shown here. So containers can run on any number of hosts, and each host that containers run on runs the
08:26
so-called Docker engine, which is responsible for managing, starting, stopping and monitoring the containers on a given host. Now one of the really great things about Docker, which I like a lot, is the ability to network containers together.
08:42
Which is a quite recent feature and which basically abstracts away the networking of different hosts. So we can completely ignore the physical constraints of our network and can construct virtual networks that connect different containers to each other. Which of course is very useful if we have applications that rely on multiple
09:01
containers and multiple services that need to talk to each other over the network. To orchestrate all that there are a couple of tools. For example there is Docker Swarm which makes it easy to deploy Docker containers on a cluster of machines. And there are also, if you ask yourself how do you manage all this, it's through the Docker API which provides a
09:26
REST interface that allows you to create containers, manage them, monitor them and do everything that is possible in the Docker ecosystem. The command line interface, which you will mostly use to interact with Docker on your machine, is nothing else but a client to this Docker API.
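As a minimal illustration of that last point, talking to the same API from Python can look like the sketch below (assuming the Docker SDK for Python is installed; the docker-py version from around the time of the talk exposed a slightly different client class):

```python
# Minimal sketch: talking to the local Docker engine through its API from Python.
# Assumes the "docker" SDK for Python is installed (pip install docker).
import docker

client = docker.from_env()          # connect to the local Docker engine
print(client.version()["Version"])  # same information `docker version` shows
for image in client.images.list():  # roughly equivalent to `docker images`
    print(image.tags)
```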
09:45
Good. So what do I like about Docker? Well, one thing that I really think is great is that images are space efficient. And they are space efficient because they are based on a so-called layered file system which you can imagine
10:05
somewhat like an onion, where you have different layers and you can just add layers on top of an existing layer. And here I have an example of the image that we're going to use later in our data analysis. You can see that in the beginning, when we created this image, we downloaded a lot
10:21
of data, about 124 megabytes, which corresponds to the Ubuntu base image that we used. Then we did a few things: we ran some shell commands, installed some packages and, for example, updated the package lists to get the newest repositories, which added about 38 megabytes to our image size.
10:41
Then we installed Python 3 on the image and afterwards we added our analysis script. And you can see that the last steps, where we add the script, consume only very little space, in this case a few kilobytes. And this is really great because it means that if you make small changes to your images,
11:03
the size of your images on disk will not grow linearly with the number of those images. That means you can build a lot of different versions of your software without worrying about filling up your disk with all the different image files.
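If you want to look at those layers yourself from Python, a small sketch like the following works (the image tag used here, python3-analyze, is a hypothetical name, not necessarily the one used in the talk):

```python
# Sketch: inspecting the layers of an image from Python, equivalent to `docker history`.
# "python3-analyze" is a hypothetical tag for the analysis image described in the talk.
import docker

client = docker.from_env()
image = client.images.get("python3-analyze")
for layer in image.history():                        # newest layer first
    print(layer.get("Size", 0), layer.get("CreatedBy", ""))
```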
11:22
Another thing which is really great is that containers have very little overhead. What I mean by this you can see here: these are two graphs that I took from a paper by IBM from late 2014, where the authors compared the performance of native Linux with various virtualization technologies, in this case Docker and KVM.
11:44
And we're seeing two things. One is the write latency of disk operations; this is the cumulative distribution function, where we want to be on the left side if we want to be fast. The other thing that we're seeing is the input/output operations per second for different use cases.
12:03
And you can see that Docker here imposes actually very little or almost no overhead compared to the native solution, whereas with another virtualization technology, KVM, you can see that there's a significant performance drop. And I don't want to talk these virtualization technologies down, because they're doing something that is very different from Docker.
12:23
You know, they're providing things that are impossible to do with Docker, but you can also see that by doing this, they're incurring a performance penalty. With Docker we don't have that, so we can operate our applications at the same speed as if we ran them on a native system.
12:45
Another thing which is great of course is that containers are self-sufficient. This means that as soon as we have an image that we can run with Docker, we have everything that we need to run our application. So we don't need to install any dependencies on the host system, except Docker of course. And we can rely on the fact that the application bundles everything that it
13:03
needs inside the container, or inside a set of containers so to say. And this makes things like sharing tools for data analysis, or sharing data itself, much easier than relying on a workflow where we would need our users to install a lot of different dependencies on the system.
13:21
Which might be problematic because versions change, systems change and it's always difficult to manage all these different dependencies. And if we can bundle them into an image and run it as a container, then all of these problems disappear. So in that sense, containers can be seen as Lego blocks for data analysis.
13:44
Or if you want to regard that in a more functional context, you could see them as a unit of computation where you have certain inputs, for example configuration data, your data files and possibly other networked containers. You perform some computation on that and you produce an output.
14:03
And this is a very powerful idea because it allows us to construct data analysis workflows that are reproducible and can easily scale to large systems. So here for example, we would have a use case where we would take log files from different sources, for example Apache logs, Nginx logs and use two containers to map out interesting information in those logs.
14:28
Then use another container to aggregate those results, use a container finally to filter those results for things that are interesting to us and pass that on to other containers that for example put that information into a business intelligence system, into a monitoring system or into an archive.
14:49
Okay, so now we talked a lot about the theory. Now I want to show you some very simple example on how to do this actually in practice. And the thing that we're going to look at is the log file analysis.
15:01
So we're going to download some data from the GitHub archive, and we're going to process it and extract some interesting information, and then we're going to perform a reduce step to get a summary of that information over all the log files that we're interested in. The code for this is available on GitHub if you're interested, and as you can see the basic workflow is very simple.
15:26
We have our analysis script that takes some log files from GitHub, launches an analysis process and then produces some output. Okay, and now please keep your fingers crossed because we're going to do a live demo.
15:43
Good, so you can see we have several files in this directory here. If you look at the analyze file, you can see that we're importing a bunch of standard libraries here. We're defining our data directory so I can show you that the data directory
16:02
actually contains a bunch of gzipped JSON files that we're going to analyze. And I mean, the first question that you probably have now is who is pushing commits to GitHub on the first of January? Well, obviously a lot of people. So to analyze those files, we have several functions here in our script.
16:24
We have one function that lists all the files in the directory that have a .json.gz ending. Then we have the analyze file function, which takes a file name, initializes a dictionary of word frequencies, and then opens the file using gzip.
16:43
It then goes through each line of this file, decodes it using the json module, and checks whether the data contained in a given line is a push event. If that's true, there's a commits entry in that event that we can use to extract the words from the commit messages.
17:01
So here we just split each message on non-alphanumeric characters. And for each of the words that we obtain like this, we increase the count in our word frequencies. Finally we return that and that's it. And then we have the reduce function, which takes the results as produced by this analyze file function and just adds
17:23
the counts in those results together, producing a global dictionary of all the different words and their frequencies. And the main block of our script does nothing else than use the get files function to list all the files in the directory,
17:41
analyze each of these files, reduce the results and then print out the statistics. So if we run that, it will take some time to do that, going through each file and calling the analyze and the reduce function at the end.
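Reconstructed as code, the script just described boils down to roughly the following sketch (the actual script is in the speaker's GitHub repository, so the names and field handling here are assumptions):

```python
# Minimal sketch of the local analysis described above; treat names and details as assumptions.
import gzip
import json
import os
import re
from collections import Counter

DATA_DIR = "data"  # directory containing the *.json.gz GitHub archive files


def get_files(data_dir):
    """List all gzipped JSON files in the data directory."""
    return [os.path.join(data_dir, f) for f in os.listdir(data_dir)
            if f.endswith(".json.gz")]


def analyze_file(filename):
    """Count word frequencies in the commit messages of all push events in one file."""
    frequencies = Counter()
    with gzip.open(filename, "rt", encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") != "PushEvent":
                continue
            for commit in event["payload"].get("commits", []):
                for word in re.split(r"\W+", commit.get("message", "")):
                    if word:
                        frequencies[word.lower()] += 1
    return frequencies


def reduce_results(results):
    """Add the per-file word counts together into one global dictionary."""
    total = Counter()
    for result in results:
        total.update(result)
    return total


if __name__ == "__main__":
    counts = reduce_results(analyze_file(fn) for fn in get_files(DATA_DIR))
    for word, count in counts.most_common(20):
        print(word, count)
```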
18:03
And you can see we got a pretty straightforward result. And if you ask yourself who is pushing all those commits to GitHub, well it's apparently JavaScript developers. And you can see that the good Python developers, they seem to be taking a day off on New Year's Day.
18:26
So very simple, very straightforward way to analyze this data. So now let's have a look at how we can take this data analysis and containerize it. To do that we're going to make some changes to our workflow. Instead of having our analysis script work directly with the data, we use it to first create a Docker image and
18:47
then we're going to use a supervisor script, also written in Python, to create a bunch of containers based on this image. Each of them takes a chunk of the data and analyzes it, and finally produces an output that we can then reduce with the supervisor and convert into the result that we are interested in.
19:08
So let's go back to our directory and first have a look at how we create the Docker image. You see here we have a so-called Dockerfile in our directory, which is a file that specifies how the image that we want to create is built.
19:26
And you can see here that we are basing our image on the Ubuntu 16.04 base image. We're saying that I'm the maintainer of that image. And then we're doing a bunch of simple steps.
19:41
First we update the apt cache so we get an up-to-date view of all the packages available. Then we install the python3 package in our system. And finally we copy the docker analyze script, which is in the same directory as the Dockerfile, into the container at this location here.
20:00
And the final line specifies the command that is run when the container starts up. In this case it's the python3 interpreter that runs the file we just put there. So we can use Docker to build that image. We just call docker build and then tag the resulting image with the name that we want to use.
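A Dockerfile along the lines just described might look roughly like this; the file name and the image tag are assumptions rather than the exact contents of the talk's repository:

```dockerfile
# Sketch of a Dockerfile as described above; file name and image tag are assumptions.
FROM ubuntu:16.04
MAINTAINER Andreas

# Refresh the package lists, then install Python 3
RUN apt-get update && apt-get install -y python3

# Copy the analysis script into the image
COPY docker-analyze.py /docker-analyze.py

# Command run when a container starts from this image
CMD ["python3", "/docker-analyze.py"]

# Build it with, for example:
#   docker build -t python3-analyze .
```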
20:23
And as you can see here, we basically did nothing because the image already existed before. But you can see that Docker went through all of the steps, checked whether it already had an image corresponding to the version that we want to have, and then successfully created a new image with the given name. Now we could run that image manually using the run command, which is a bit complicated.
20:45
So let's go through that here. Basically we're saying docker run. We're saying that we want to run that with a given user ID and a given group ID. We want to expose all the ports of the Docker container. We then specify certain environment variables, which I will explain a bit later.
21:02
And we just say that we want to mount this directory here as the data directory and this directory as the output directory. Finally, we specify the name of the image that we want to run. And so if we do that, we just receive the output of the container that is being run, and as you can see it already finished.
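Expressed through the Python Docker SDK instead of the command line, such a manual run looks roughly like this; the image name, host paths, IDs and the environment variable are placeholders, not the exact values from the demo:

```python
# Sketch of the manual `docker run` described above, via the Python Docker SDK.
# Image name, host paths, user/group IDs and the environment variable are placeholders.
import docker

client = docker.from_env()
output = client.containers.run(
    "python3-analyze",                        # image built from the Dockerfile above
    user="1000:1000",                         # run as a specific user and group ID
    environment={"INPUT_FILE_NAMES": "2016-01-01-0.json.gz"},
    volumes={
        "/host/data":   {"bind": "/data",   "mode": "ro"},  # data directory, read-only
        "/host/output": {"bind": "/output", "mode": "rw"},  # output directory
    },
    publish_all_ports=True,                   # expose all ports of the container
    remove=True,                              # clean up the container afterwards
)
print(output.decode())                        # whatever the analysis script printed
```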
21:23
And now let's have a look at our analysis script actually. So like before we have a Python script that operates on a data directory and that produces output in an output directory.
21:40
And we have one function called analyze file that takes a file name and does the same kind of map operation that we saw before in our traditional analysis script. Now we don't have a reduce function, as I will explain later. Instead we only have a main block that takes the input file names from an environment variable, then goes through each one of them, calling the analyze file function and
22:05
writing the result into the output directory that is mounted into the Docker container. And as I said, we need an orchestrator, or some way to start these containers. For this we wrote a simple Python script. Again we specify our container name, the data directory, the output directory and the number of containers that we want to launch.
22:27
That is the parallelization degree of this problem, if you want. And the first thing that we do here is use the Docker Python API to create a Docker client connected to our local Docker engine. Then we retrieve the files from the data directory, analyze each file in a container and reduce the resulting output files.
22:53
So maybe we can step through this in a bit more detail. The analyze file in container function takes a number of files and then creates a
23:01
so-called host config, which specifies the different directories that we want to mount into the container. In this case we want to mount the data directory in read-only mode and an output directory in read-write mode. This host configuration we can then pass to the create container function, where you also pass in the
23:21
container name, the user ID that we want to use, the host configuration that we just created and the environment variables, which just contain a list of the files that were given as a parameter to the function. And now the main function looks like this. So we first retrieve all the files that we want to analyze here.
23:41
We then chunk those files up into pieces of four or five, depending on our parameter N. Then we create, for each of those chunked file lists, a container that performs the map step for each of these files. We append those containers to a list so that we can use them later, and then we wait until all the containers have finished their work of mapping the files.
24:07
As soon as this is done we can call the reduce output files function, which takes all the files that have been created by the containers in the output directory, reduces them and then produces the result that we're interested in. So if we run this now, we have to do that with Python 2 because I
24:24
have only installed the Docker API for that version, but it also works with Python 3 of course. So we call Python 2 with the docker parallelize script. This will launch the containers for the individual files. It will wait for the results, and as you can see it's even a bit faster than before, and in the end we get exactly the same result as before.
24:45
You can see the files that have been created in the output directory by the containers are here. So pretty straightforward to actually go from a workflow where we use normal Python to a containerized workflow where we also use Python but based on a Docker workflow.
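Condensed into code, such a supervisor might look roughly like the sketch below, using the low-level client of the Python Docker SDK, whose calls correspond to the create host config and create container steps described above. The image name, directories, environment variable and chunking scheme are assumptions, not taken from the talk's repository.

```python
# Sketch of the supervisor described above, using the low-level Docker SDK client.
# Image name, directories, the INPUT_FILE_NAMES variable and the chunking are assumptions.
import os
import docker

IMAGE = "python3-analyze"
DATA_DIR = os.path.abspath("data")
OUTPUT_DIR = os.path.abspath("output")
N = 4  # number of containers, i.e. the parallelization degree

client = docker.APIClient(base_url="unix://var/run/docker.sock")  # local Docker engine


def analyze_files_in_container(files):
    """Start one container that maps over the given chunk of files."""
    host_config = client.create_host_config(binds={
        DATA_DIR:   {"bind": "/data",   "mode": "ro"},   # input data, read-only
        OUTPUT_DIR: {"bind": "/output", "mode": "rw"},   # per-file results, read-write
    })
    container = client.create_container(
        IMAGE,
        environment={"INPUT_FILE_NAMES": ",".join(files)},
        host_config=host_config,
    )
    client.start(container)
    return container


files = [f for f in os.listdir(DATA_DIR) if f.endswith(".json.gz")]
chunks = [files[i::N] for i in range(N)]                  # split the work into N chunks
containers = [analyze_files_in_container(chunk) for chunk in chunks if chunk]

for container in containers:                              # wait for all map steps to finish
    client.wait(container)

# The per-file results in OUTPUT_DIR can now be reduced into the final word counts,
# just like the reduce step in the non-containerized script.
```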
25:05
So this was of course a very simple example and I wanted to show you the basics of this approach. In real life the complexity would be higher of course for any real data analysis application, and there are certain advantages and disadvantages associated with this approach.
25:26
So one advantage is of course that it's, as I said, easy to share your data analysis workflows, because now that we have an image with our scripts we can just push that to Docker Hub for example, and anybody can download that image and use it locally on his or her machine.
25:41
Each analysis step is self-sufficient in the way that the container doesn't care about its environment. As you've seen we only specified the input files and the output directory for the container and everything else was inside the container. So there are no dependencies that we need to run this analysis except from the input and the output data.
26:02
As I also showed you, the containerization makes it pretty easy to parallelize our analysis process. For this example we ran everything on a single host, but as I said, with Docker Swarm it's also possible to run this kind of analysis on a multi-machine cluster, so we can easily parallelize our workloads to hundreds or even thousands of Docker containers.
26:24
And the nice thing is also that, with the image-based approach, we have versioning of our data analytics scripts included for free. There are also a few disadvantages. It's of course a bit more complex because we have to prepare our containers for the analysis.
26:41
We need to install Docker on each machine that should perform the data analysis, obviously, and we also lose a bit of interactivity and flexibility in doing our analysis. So which parts are actually missing from this workflow? For me, three things. First, as we've seen, we need a lot of orchestration to make sure that we have all the containers running as they should.
27:07
And for the simple case that I showed here it was not that important but for any real world data analysis you probably need databases, you need maybe task queues. So you have a lot of different things that you need to put together and launch in the right order.
27:21
And so you need a lot of orchestration capabilities to do this in a straightforward and effective way. Another thing is of course dependency management because in most real world data analysis context you want to not only perform the steps of your data analysis that you really need to perform. So for example if you have several types of data and they depend on each other in for example this way, we
27:44
do not want to perform all of the data analysis again if, for example, only this part here or this part here changes. We want to redo only those things that are really necessary for the changed data sets. And finally, we also need a way to manage the resources.
28:00
So in our example we already produce a lot of output files, and in real-world data analysis you will produce many more of those files, and it's also important to manage and version-control those things, for which Docker unfortunately does not provide any good means right now. So I was tinkering with Docker a bit in my own time and I ran into these problems, so I
28:25
decided to start writing a small tool called Rooster, which is built on top of the Docker API. If you were to summarize it in one sentence, you could say that it's make for Docker. So it provides basically the three functionalities that I talked about before.
28:43
So resource management, container orchestration and dependency management. I have to say it's still an early prototype, but I want to show you a bit how it works. So the basic concept of Rooster is a so-called recipe, which specifies three things.
29:01
We have first the resources that we want to use in our data analysis. Then we have the services that we need to run, for example databases. And then we have a sequence of actions that we want to perform in order to carry out the analysis. The resources layer here includes things like versioning, dependency calculation of the different resources, backing
29:20
them up, copying them and distributing them to the machines where we want to perform the analysis. The services section deals with things like starting up the services including the right order to do that, provisioning the resources to those services and networking them together. The action section then is concerned with scheduling the different actions that we need in
29:42
our data analysis, monitoring them, performing exception handling and finally doing some logging for us. Okay, again I want to show you a small live demo here. So what we are going to look at is again really a very simple example where we want
30:05
to load a CSV file into a Postgres database. So if you look at the recipe for this data analysis, we can see we have a
30:21
resources section, where we specify all the resources that we need for this kind of analysis. So first of course we have our CSV file, which comes from the user resources, which we want to mount as read-only, and which has the URL electricity.csv in this case.
30:42
Then we have the Postgres data, which is the database where we want to put the data, and here we tell Rooster that the state of this database depends both on the CSV file and on the converter script that we are using to create the database.
31:01
And that we should create the resource if it doesn't exist, that the URL is Postgres and that it's also a user resource and that we want to mount it in write mode. So finally we have the converter script that performs the conversion between CSV and the Postgres
31:21
database and this comes directly from the recipe and is contained in the converter URL. So much for the resources. The services are listed here; in this case it's only a single service, namely a Postgres database, which uses the Postgres image, exposes this port here to the outside world and makes use of the Postgres data resource that we have defined up here.
31:52
And here you can see that we mount this resource at this location where Postgres will be able to find that and to use that to initialize or work with the database.
32:02
So finally we have the actions section, which in this case also contains only a single entry, which uses the Python 3 image that we created before and executes this convert.py script that takes the data from the CSV file and loads it into the database.
32:21
And this container obviously needs access to both the converter script and the CSV file. So now we can launch this recipe by just saying rooster run and then the recipe name, CSV to Postgres, and you can see that several things are happening now.
32:48
So what Rooster did now is to first check that all the images which we require are available on the system, and then initialize the resources, in this case copy or initialize the Postgres data, make sure that
33:02
the input data is there and also check that the script which we need is present in the recipe. Then mount those resources, create the Postgres service and finally launch the analysis steps or the action phase and give the action access to the Postgres database through a virtual network.
33:21
And this took a while to run, and you can see the output here of both the Postgres container, which created our database, and the Python container, which ran the script that inserted those rows into the database. You can see that we inserted about 35,000 lines of CSV into the Postgres database, and the resulting data is now put here.
33:47
You can see that Rooster also takes care of versioning your data by using a UID-based approach, where we always copy the previous version of the data and provide a link to the parent, so that we can go back in time and, for example, revert to a good state of our database in case anything goes wrong in our analysis.
34:10
Alright, now this is again a pretty simple case. It also works for more complex problems where we have different services and more action steps that depend on each other and of course there are still some open questions here.
34:26
In the example that we looked at earlier we used files to communicate the results of our analysis between containers but there are also different approaches so we could for example use the network or even use the Docker API to communicate that and right now there is no canonical way to do this so to say.
34:43
Also, an open question, especially for distributed systems, is of course how to make the data available to the containers, and there Docker doesn't provide a good solution. We can probably rely on some technologies from things like MapReduce, for example the Hadoop distributed file system, but it's also not clear what the optimal way is to do this kind of thing here.
35:07
Of course there are some other technologies that are interesting in this space. I wanted to just briefly show you two of them here. One of them is Pachyderm which is a US based startup that provides an open source tool for data analysis using
35:24
Docker, and the great thing about their solution is that they provide a version-controlled view on top of your data. So they basically have version control for large data sets and they make it very easy to build a dependency-graph-based analysis workflow
35:40
and I talked yesterday to one of the founders and it's a really great product, so compared to Rooster it also works reliably already. So if you want something that also works at a large scale, you should definitely check it out. Another thing that I wanted to mention here, which is not directly related to Docker but which also helps you with
36:01
managing your dependencies in data analysis, is Luigi, which is a library that was built by Spotify and that can help you build complex data analysis pipelines where you have a lot of interdependencies between your individual data analysis steps. Luigi figures out how to run your analysis and how to only run those steps of the analysis that are really required.
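To give a feel for how Luigi expresses such dependencies, here is a minimal sketch using its public API; the task names and file paths are made up for illustration:

```python
# Minimal Luigi sketch: two dependent tasks; Luigi only re-runs what is missing.
# Task names and file paths are made up for illustration.
import luigi


class AnalyzeLogs(luigi.Task):
    """Map step: produce per-day results."""
    day = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("output/counts-%s.txt" % self.day)

    def run(self):
        with self.output().open("w") as f:
            f.write("...per-day counts for %s...\n" % self.day)


class Summarize(luigi.Task):
    """Reduce step: depends on the per-day analyses."""
    def requires(self):
        return [AnalyzeLogs(day=d) for d in ("2016-01-01", "2016-01-02")]

    def output(self):
        return luigi.LocalTarget("output/summary.txt")

    def run(self):
        with self.output().open("w") as f:
            for target in self.input():
                with target.open("r") as day_file:
                    f.write(day_file.read())


if __name__ == "__main__":
    luigi.build([Summarize()], local_scheduler=True)
```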
36:27
So to summarize, containers are by now a pretty mature technology and they are probably here to stay. They are very useful in a variety of data analysis contexts. They don't solve all of our problems with data analysis though.
36:43
And that means that we need additional tools to handle them effectively. Some of them I showed you and I also showed you how you can use Python in conjunction with Docker to use this kind of approach to data analysis. Okay, so with that I'm at the end. If you're interested in the tool in Rooster you can find it here on GitHub.
37:04
Contributions are highly welcome and I think we have time for some questions so thank you. Thank you. This is useful, exciting and I have a question about running this on the cluster.
37:31
How does Docker Swarm handle this? Say you have a powerful single machine, or you have several of those machines, but they are powerful, multi-CPU machines.
37:41
How does it scale? Will it use all the cores on that powerful machine? Any other bottlenecks? Okay, I didn't do any performance evaluation of that, but Swarm basically transparently handles distributing your containers to the different systems.
38:01
And the great thing about Swarm is that it has almost the same API as the Docker core engine. So you can, for example, use it from Python exactly like you would use Docker on a single machine. And as I said, the containers are completely isolated from each other, so each container runs in its own process. Hence, if you have a multi-core machine you can of course make use of all
38:21
the cores, and the operating system will take care of allocating resources to each of these containers. In that sense a container is not much different from a process running on the operating system. Is that answering your question? Okay.
38:44
Maybe that would be too much overhead but did you consider Dockerizing Apache Spark for this MapReduce thing like just putting Spark workers in Docker containers?
39:02
So I think in general Docker provides a great way to build a local setup where you can test out technologies like MapReduce and Spark in an environment on your own machine. So I think it's definitely possible to have a setup for example running Spark if that's your question.
39:21
And on the other way around it's also of course possible to use for example Docker containers from inside the Spark ecosystem or inside Hadoop. So I know that Hadoop for example has a runner that can make use of Docker containers to perform the map steps. So both of these technologies can be kind of used in conjunction with each other.
39:42
Yeah, but I think your main purpose is to make it as small as possible, self-contained. So I thought maybe Hadoop is a very big thing. Apache Spark is more like lightweight Hadoop for MapReduce.
40:02
So I just thought that may solve your problems with distributing work, serializing results, etc. Okay, that's an interesting point. I didn't look into this, but it's possible that it's a good fit. Any more questions? There?
40:29
You talked a lot about dependencies, but I think there are two kinds of dependencies and we should not confuse them. At least we should focus on the possible and evident differences.
40:44
One is code dependencies, dependencies between software packages, versions and so on. And the other is data dependencies, like models that are built on data, which is built on other data, and so on. Maybe it's kind of a theoretical question, but how do you see these two different concepts of dependencies interacting?
41:15
Is there going to be a single tool or instrument that can solve both or we are going to build completely different tools to solve them?
41:29
I think that is the question. As I said, I think images are a great way of solving the dependency problem with software. So we can use images to make a reproducible environment for analyzing the data that we have,
41:46
where we are sure that all the dependencies and all the software code, for example, is at a given state. For managing the dependency of the data, we need a different tool because Docker is, in my opinion, not the right choice for doing that.
42:03
For example, Pachyderm and other technologies have some support for these kind of things, where you have large datasets that you want to version control and that you want to manage in that sense. Personally, I think that code can also be treated as data in that sense.
42:22
If you would look at the different inputs of your container, as I showed them before, you could also take the software and the scripts that are used for analyzing the other data as data themselves. So in that sense, you can treat those two things under the same paradigm, I think.
42:40
It's, of course, always a question, what is the best practical way of handling these things, because the scale is very different, because code is usually quite small and manageable, whereas data can be very large and cannot be managed effectively using, for example, source code version control systems. Does that answer your question? Okay, good.
43:02
Any other questions? No? Okay, so thanks again.