A major trend in data science is the rapid adoption of Bayesian statistics for data analysis and modeling. With modern datasets growing by orders of magnitude in size, the focus is now on developing methods that can apply contemporary inference techniques to extremely large datasets. To this end, I present PyAutoFit, an open-source probabilistic programming language for automated Bayesian inference that was recently published in the Journal of Open Source Software.

I will begin with an overview of PyAutoFit's core features, in particular how it: (1) makes it simple to compose and fit probabilistic models using a range of Bayesian inference libraries, such as emcee and dynesty; (2) handles the "heavy lifting" that comes with model-fitting, including model composition and customization, outputting results, model-specific visualization and posterior analysis; and (3) is built for big-data analysis, with results output to a SQLite database that can be queried after model-fitting is complete.

PyAutoFit was developed by astronomers seeking to fit large libraries of galaxy images to better understand the nature of dark matter. Using this science case, I will describe PyAutoFit's advanced features, such as multi-level models, automated model-fitting pipelines and support for massively parallel computing infrastructures.

The goal of this talk is to introduce the audience to PyAutoFit so that they can adopt it for their own use case. The only prerequisite is a basic understanding of object-oriented programming in Python.