Big Data Analytics at the MPCDF: GPU Crystallography with Python

Zitieren

EuroPython

Bernardo, Giuseppe di

Formale Metadaten

Titel

Big Data Analytics at the MPCDF: GPU Crystallography with Python

Serientitel

EuroPython 2017

Anzahl der Teile

160

Autor

Bernardo, Giuseppe di

Lizenz

CC-Namensnennung - keine kommerzielle Nutzung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben

Identifikatoren

10.5446/33715 (DOI)

Herausgeber

EuroPython

Erscheinungsjahr

2017

Sprache

Englisch

Inhaltliche Metadaten

Fachgebiet

Informatik

Genre

Konferenz/Talk

Abstract

Big Data Analytics at the MPCDF: GPU Crystallography with Python [EuroPython 2017 - Talk - 2017-07-12 - Anfiteatro 1] [Rimini, Italy] In close collaboration with scientists from MPG, the Max Planck Computing and Data Facility is engaged in the development and optimization of algorithms and applications for high performance computing, as well as in the design and implementation of solutions for data-intensive projects. Python is now used at MPCDF in the emerging area of “atom probe crystallography” (APT): a Fourier spectral analysis in 3D reciprocal space can be simulated in order to reveal both composition and crystallographic structure at the atomic scale of billions APT experimental data sets. The Python data ecosystem has proved to be well suited to this, as it has grown beyond the confines of single machines to embrace scalability. This talk aims to describe our approach to scaling across multiple GPUs, and the role of our visualization methods too. Our data workflow analysis relies on the GPU-accelerated Python software package called PyNX, an open source Python library which provides fast parallel computation scattering. The code is well suited for GPU computing, using both the pyCUDA and pyOpenCL libraries. Exploratory data analysis and performance tests are initially carried on through Jupyter notebooks and Python packages e.g., pandas, matplotlib, plotly. In production stage, interactive visualization is realized by using standard scientific tool, e.g. Paraview, an open-source 3D visualization program which e.g. requires Python modules to generate visualization components within VTK files