PyAutoFit: A Classy Probabilistic Programming Language For Data Science

EuroPython

Nightingale, James W.

Formal Metadata

Title

Series Title

EuroPython 2021

Number of Parts

115

Author

Nightingale, James W.

Contributors

Arora, Dhanshree (Moderation)

License

CC Attribution - NonCommercial - ShareAlike 4.0 International:
You may use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or this content, including in adapted form, only under the terms of this license.

Identifiers

10.5446/58768 (DOI)

Publisher

EuroPython

Year of Publication

2021

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

A major trend in data science is the rapid adoption of Bayesian statistics for data analysis and modeling. With modern datasets growing by orders of magnitude in size, the focus is now on developing methods capable of applying contemporary inference techniques to extremely large datasets. To this end, I present PyAutoFit, an open-source probabilistic programming language for automated Bayesian inference that was recently published in the Journal of Open Source Software. I will begin by giving an overview of PyAutoFit’s core features, in particular how it: (1) makes it simple to compose and fit probabilistic models using a range of Bayesian inference libraries, such as emcee and dynesty; (2) handles the 'heavy lifting' that comes with model-fitting, including model composition and customization, outputting results, model-specific visualization and posterior analysis; and (3) is built for big-data analysis, whereby results are output as an SQLite database which can be queried after model-fitting is complete. PyAutoFit was developed by astronomers seeking to fit large libraries of galaxy images to better understand the nature of dark matter. Using this science case, I will describe PyAutoFit’s advanced features, such as multi-level models, automated model-fitting pipelines and support for massively parallel computing infrastructures. The goal of this talk is to introduce the audience to PyAutoFit so they can adopt it for their own use case. The only prerequisite is a basic understanding of object-oriented programming in Python.
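
To make the idea of a "classy" model concrete, the following is a minimal sketch in plain Python of what composing a model as a class and fitting it to data can look like. It does not use PyAutoFit, emcee or dynesty and is not their API: the `Gaussian` class, the `log_likelihood` function and the toy `metropolis` search are hypothetical names introduced here purely for illustration, and the sampler is a deliberately simple stand-in for a real inference library.

```python
import math
import random


class Gaussian:
    """A model component written as an ordinary Python class (hypothetical example)."""

    def __init__(self, centre, normalization, sigma):
        self.centre = centre
        self.normalization = normalization
        self.sigma = sigma

    def profile(self, x):
        """Value of the 1D Gaussian profile at position x."""
        return self.normalization * math.exp(-0.5 * ((x - self.centre) / self.sigma) ** 2)


def log_likelihood(instance, xs, data, noise):
    """Log likelihood (up to an additive constant) of the data given a model instance."""
    chi_squared = sum(((d - instance.profile(x)) / noise) ** 2 for x, d in zip(xs, data))
    return -0.5 * chi_squared


def metropolis(xs, data, noise, start, steps=2000, step_size=0.1, seed=0):
    """Toy random-walk Metropolis search with flat priors.

    Walks over (centre, normalization, sigma) and returns the
    highest-likelihood model instance visited, with its log likelihood.
    """
    rng = random.Random(seed)
    current = list(start)
    current_ll = log_likelihood(Gaussian(*current), xs, data, noise)
    best, best_ll = list(current), current_ll
    for _ in range(steps):
        proposal = [p + rng.gauss(0.0, step_size) for p in current]
        if proposal[2] <= 0.0:  # sigma must stay positive, so reject this proposal
            continue
        proposal_ll = log_likelihood(Gaussian(*proposal), xs, data, noise)
        # Accept with probability min(1, L_proposal / L_current).
        if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
            current, current_ll = proposal, proposal_ll
            if current_ll > best_ll:
                best, best_ll = list(current), current_ll
    return Gaussian(*best), best_ll


# Noise-free synthetic data drawn from a known "true" model (assumed values).
xs = [0.5 * i for i in range(21)]
truth = Gaussian(centre=5.0, normalization=2.0, sigma=1.0)
data = [truth.profile(x) for x in xs]

start = (4.0, 1.0, 2.0)
best, best_ll = metropolis(xs, data, noise=0.1, start=start)
print(best.centre, best.normalization, best.sigma, best_ll)
```

The design point is that the model is just a class whose constructor arguments are the free parameters, so the search only needs to create instances and pass them to a likelihood function. A library such as the one described in the abstract would take over the parts written by hand here: priors, the choice of sampler, and the output of results.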