
Big Data Analytics at the MPCDF: GPU Crystallography with Python


License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported
Abstract
Big Data Analytics at the MPCDF: GPU Crystallography with Python [EuroPython 2017 - Talk - 2017-07-12 - Anfiteatro 1] [Rimini, Italy] In close collaboration with scientists from the MPG, the Max Planck Computing and Data Facility (MPCDF) is engaged in the development and optimization of algorithms and applications for high-performance computing, as well as in the design and implementation of solutions for data-intensive projects. Python is now used at the MPCDF in the emerging area of "atom probe crystallography": a Fourier spectral analysis in 3D reciprocal space can be simulated in order to reveal both composition and crystallographic structure at the atomic scale from atom probe tomography (APT) experimental data sets containing billions of atoms. The Python data ecosystem has proved to be well suited to this, as it has grown beyond the confines of single machines to embrace scalability. This talk describes our approach to scaling across multiple GPUs, and the role of our visualization methods. Our data-analysis workflow relies on the GPU-accelerated Python package PyNX, an open-source library which provides fast parallel computation of scattering maps. The code is well suited for GPU computing, using both the PyCUDA and PyOpenCL libraries. Exploratory data analysis and performance tests are initially carried out through Jupyter notebooks and Python packages, e.g. pandas, matplotlib, plotly. In the production stage, interactive visualization is realized using standard scientific tools, e.g. ParaView, an open-source 3D visualization program which, for example, uses Python modules to generate visualization components within VTK files.
Transcript: English (auto-generated)
Thanks for coming to my talk. First, I'd like to say that I work for the Max Planck Computing and Data Facility, in short MPCDF, which is a cross-institutional competence center of the Max Planck Society.

We collaborate with scientists from the different Max Planck Institutes of the society to support them from the high-performance computing point of view, and for data science as well. The Max Planck Computing and Data Facility is involved in the development of applications and algorithms for high-performance computing, as well as in implementing and designing solutions for data-intensive projects. So it not only operates state-of-the-art supercomputers, but also provides up-to-date infrastructures for data management and long-term archival. In collaboration with one of these institutes, to be precise the Max Planck Institute for Iron Research (Max-Planck-Institut für Eisenforschung) in Düsseldorf, we are working in the so-called big-data-driven materials science domain.

In particular, I'm supporting them in the emerging area of atom probe crystallography. Their goal is to reveal both the structure and the composition of crystals, and basically they ask us to perform Fourier transforms of large data sets. By large, I mean something like billions of atoms inside a crystal. They plan to retrieve high-quality crystalline data and iteratively apply Fourier transforms, which is imperative if you are interested in zooming in on high-quality sub-volumes. This cycle of operations helps to improve and reconstruct the parameters used for the atom probe tomography. All of this requires state-of-the-art data mining and visualization.
The Fourier transform, or the scattering maps for nanostructures, can in principle be computed with the formula in the upper box, as long as the electron or atomic nuclear density is well defined on a grid fine enough to resolve the atomic positions r inside the crystal. So you can solve the Fourier transform either by using the well-known fast Fourier transform algorithm, which is pretty fast, scales like n log n, and is suitable for large crystalline structures, or by direct calculation, which is a bit slower. But we prefer to follow the latter way, because it is the most general case: in principle you can compute the amplitude of the scattering maps starting from the atomic positions and scattering factors inside the crystal, for any structure model, ideal or non-ideal. The crystal can present some deformation, some tension, some strain. That's why it is very useful to compute this discrete Fourier transform by direct calculation.
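The formula in the box is not legible in this transcript; the standard direct-summation scattering amplitude being described here, for N atoms at fractional coordinates (x_j, y_j, z_j) with scattering factors f_j, is

    F(hkl) = \sum_{j=1}^{N} f_j \, \exp\!\big[ 2\pi i \, (h x_j + k y_j + l z_j) \big],

and the scattering map is the intensity |F(hkl)|^2, evaluated on a grid of (h, k, l) points in reciprocal space.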
Just to give you some numbers: usually we deal with a lot of atoms, as I said, billions of atoms. In this presentation I'm showing results based on 10^8 atoms. The reciprocal or lattice space, denoted by hkl, usually requires a very fine resolution, let's say 10^6 grid points. So in terms of floating-point operations you have 10^8 times 10^6 times 10, where the factor of ten accounts for the algebraic operations involved in the Fourier transform; essentially you are evaluating a sine and a cosine function. So generally you end up with 10^15 flop, which is a huge number. But comparing with the peak performance of modern architectures, around 10^13 floating-point operations per second on a GPU, this algorithm seems well suited for GPU computation.
And I'm going to show you how this algorithm scales perfectly on multiple GPUs, with computation times of the order of minutes. So what are we doing at the MPCDF? Not cooking perfect blue crystals, don't worry. First, why GPU programming in combination with high-level scripting languages like Python? They are polar opposites. GPU programming is highly parallel and very sensitive to the architecture, to the hardware; GPUs are built to optimize floating-point and memory throughput in order to give you tremendously high performance when you address your scientific task. Python, on the other hand, favors ease of use. PyCUDA aims to join these two aspects together. PyOpenCL follows a similar philosophy and can be considered a sort of sister project, but for the rest of the talk I'll just refer to PyCUDA. And why Python?
Well, not only is it easy, user-friendly, easy to learn, and general-purpose, but it's very valuable for the scientific community because it contains a lot of packages that are very useful for addressing your scientific tasks. This allows you to write your code in a dozen lines instead of hundreds or even more in other programming languages, and especially it avoids reinventing the wheel every time. It excels at displaying your data, since scientific visualization is an essential part of the scientific process. And NumPy, the foundational package for scientific computing, gives you powerful n-dimensional arrays, broadcasting of functions, optimized linear algebra, Fourier transform algorithms, and tools for integrating C, C++, and Fortran code.
Perhaps the simplest and most useful program you can write in PyCUDA just multiplies your four-by-four array by two, element-wise. Two things: you import pycuda.driver under the alias cuda, which gives you access to the driver-level CUDA interface, and you import pycuda.autoinit, because it automatically picks an available GPU and runs on it. Then, simply, you define your four-by-four matrix, you allocate memory on the device (by device, I mean the GPU), and literally, host to device, you transfer your NumPy array to the device. And now, in the red box, the most interesting part: you have pure CUDA C code wrapped in Python. Essentially, this code executes on the device and is called from within existing Python code. The same results can be obtained with much less effort using GPU arrays, since PyCUDA offers abstractions, in this case the GPU equivalent of a NumPy array.
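The slide's code is not reproduced in the transcript; what follows is the classic example from the PyCUDA documentation that matches this description, a sketch rather than the exact slide content.

```python
import numpy as np
import pycuda.autoinit            # picks an available GPU and creates a context
import pycuda.driver as cuda      # driver-level CUDA interface
from pycuda.compiler import SourceModule

a = np.random.randn(4, 4).astype(np.float32)   # GPUs prefer single precision
a_gpu = cuda.mem_alloc(a.nbytes)               # allocate memory on the device
cuda.memcpy_htod(a_gpu, a)                     # host-to-device transfer

# pure CUDA C, wrapped in Python ("the red box")
mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;
    a[idx] *= 2;
}
""")
func = mod.get_function("doublify")
func(a_gpu, block=(4, 4, 1))                   # one thread per matrix element

a_doubled = np.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)             # device-to-host transfer
print(a_doubled)
```

And the gpuarray abstraction mentioned above reduces the same task to a couple of lines:

```python
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a_gpu = gpuarray.to_gpu(np.random.randn(4, 4).astype(np.float32))
a_doubled = (2 * a_gpu).get()   # the arithmetic happens on the GPU
```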
So, in agreement with the edit-run-repeat style, PyCUDA has a two-fold aim. First, it aims to simplify the use of existing CUDA C by wrapping it, avoiding the need to reinvent the basics of GPU programming; and on top of that first layer, PyCUDA offers abstractions. PyCUDA thus gives you easy, complete, and Pythonic access to the GPU. This guarantees automatic resource management and error checking; convenience, in the sense that it provides abstractions, as I showed you before; and tight integration with NumPy arrays. Of course it's fast, and it has very good documentation.
Here I'm just reporting some links where you can get more information about PyCUDA. What are we using to analyze our crystals? We are using PyNX, which stands for Python tools for Nanostructures Crystallography. It's an open-source library; the author is Vincent Favre-Nicolin, and the code has been developed at the European Synchrotron Radiation Facility. I'm just talking about the main modules in charge of computing X-ray scattering while getting the benefit of graphical processing units. Just to give a complete overview, I'll enumerate the remaining modules, but these are not touched in the rest of the discussion. The main aim of PyNX is, given a large sample of atoms, let's say billions of atoms, to compute a Fourier transform in one, two, or three dimensions in the reciprocal space, with very fine resolution, using the performance of GPUs. The high performance can be obtained by using either the NVIDIA toolkit along with the PyCUDA library or, as I said, PyOpenCL. At MPCDF, the default Python environment is provided by the Anaconda distribution.
PyNX supports Python version 2.7 and above. It can simply be downloaded from the project website (you can ask for an account and become a developer) or installed with pip install pynx. PyCUDA is required, of course, if you want to run on GPU, and NumPy and Matplotlib if you want to display data. If GPUs are not available, PyNX can run on the CPU as well; in that case it's recommended to import the pyFFTW package. Optionally you can use an external library, cctbx, which stands for Computational Crystallography Toolbox, and which you can likely install with Conda under your Python distribution.
What makes PyNX very valuable is that you can simply use the Python interface: you don't need to learn CUDA. I've just finished saying you don't need to learn CUDA, and now I'm showing a piece of CUDA code, but at least it's useful and nice to have a view of what's going on. Essentially, CUDA_fhkl is your device kernel; you have a device kernel for each module in the PyNX library. A couple of remarks. You index the arrays by combining thread and block indices. For each block you allocate shared memory, because shared memory is best for communication and synchronization between the threads. From global memory you transfer your input data to the shared memory, so the threads of a single block have access to the same portion of the data. Importantly, each thread computes a single reflection, and you also have fast, optimized trigonometric functions.
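The actual kernel is not reproduced in the transcript; below is a hypothetical, stripped-down sketch of the pattern just described (one thread per reflection, atom coordinates staged through shared memory, fast hardware trigonometry), wrapped with PyCUDA as in the earlier example. The kernel name, tile size, and argument list are illustrative, not PyNX's actual code.

```python
from pycuda.compiler import SourceModule

# Launch with block=(TILE, 1, 1); for brevity this sketch assumes the number
# of reflections and of atoms are multiples of TILE, and unit scattering
# factors (f_j = 1).
mod = SourceModule("""
#define TILE 128
__global__ void fhkl_sketch(float *freal, float *fimag,
                            const float *x, const float *y, const float *z,
                            const float *vh, const float *vk, const float *vl,
                            const int natoms)
{
    // one thread per (h, k, l) reflection
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const float h = vh[i], k = vk[i], l = vl[i];
    float re = 0.0f, im = 0.0f;

    __shared__ float sx[TILE], sy[TILE], sz[TILE];
    for (int base = 0; base < natoms; base += TILE) {
        // cooperatively stage a tile of atom coordinates in shared memory,
        // so all threads of the block reuse the same portion of the data
        sx[threadIdx.x] = x[base + threadIdx.x];
        sy[threadIdx.x] = y[base + threadIdx.x];
        sz[threadIdx.x] = z[base + threadIdx.x];
        __syncthreads();
        for (int t = 0; t < TILE; ++t) {
            float s, c;
            // fast hardware sin/cos of the phase 2*pi*(h*x + k*y + l*z)
            __sincosf(6.283185307f * (h * sx[t] + k * sy[t] + l * sz[t]),
                      &s, &c);
            re += c;
            im += s;
        }
        __syncthreads();
    }
    freal[i] = re;   // real part of F(hkl)
    fimag[i] = im;   // imaginary part of F(hkl)
}
""")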
On the slides, a second snippet just continues the loop to take care of the remaining atoms included in your data set. And now, finally, the Python interface. It's simple: you read your data. Essentially, our data providers give us files with the extension .pos, which are essentially made of four columns,
the x, y, and z atomic coordinates, plus a fourth column with the mass-to-charge ratio, which helps to identify the different atomic species in your input data file. It's a good habit to convert the real nanometer units into dimensionless units called fractional coordinates, which depend on the crystal you are exploring. Then you can define your reciprocal 3D space, hkl, and run the function, Fhkl_thread, to compute the Fourier transform. The name of the GPU card to use is passed on the command line, and you can choose whatever you have; for instance, at MPCDF I'm running on the Maxwell architecture, which is very performant. Essentially, what is computed is the formula in the box, this discrete Fourier transform, distributed over several GPUs. What it returns is a tuple: a complex NumPy array, fhkl, and also the computation time, which is sometimes nice to have, especially at the beginning when you want to perform some speed tests.
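The interface code itself isn't captured in the transcript; here is a minimal sketch of the workflow just described. The .pos reading convention (big-endian float32 records), the lattice parameter, the import path, and the exact signature of Fhkl_thread are assumptions and may differ between PyNX versions.

```python
import numpy as np
from pynx.scattering import Fhkl_thread   # assumed path; older releases
                                           # exposed it as pynx.gpu.Fhkl_thread

# .pos file: records of (x, y, z, mass-to-charge), big-endian float32 (assumed)
data = np.fromfile("sample.pos", dtype=">f4").reshape(-1, 4)
x, y, z, m2q = data.T.astype(np.float32)

# convert nanometers to dimensionless fractional coordinates
a_lattice = 0.404                          # lattice parameter in nm, hypothetical
x, y, z = x / a_lattice, y / a_lattice, z / a_lattice

# a fine 3D grid of (h, k, l) points in reciprocal space
h = np.linspace(-2, 2, 128)[:, None, None]
k = np.linspace(-2, 2, 128)[None, :, None]
l = np.linspace(-2, 2, 128)[None, None, :]

# direct summation on the GPU; returns the complex map and the timing
fhkl, dt = Fhkl_thread(h, k, l, x, y, z, gpu_name="GTX 980")
power_spec = np.abs(fhkl) ** 2             # the scattering map |F(hkl)|^2
print("computed %d points in %.1f s" % (fhkl.size, dt))
```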
This is just to give an example of what you can get and display. For instance, on the left, in the h-l plane, I'm showing a contour plot of the scattering maps for a monatomic cubic structure. In this case you can also appreciate the fact that I'm working with non-ideal structures: you can see a slight offset along the vertical direction. On the right, the complex refraction index is computed as a function of the scattering angle. Now, some performance figures. On the left is what I ran on our infrastructure, as I said, the Maxwell architecture, the GTX 980. You vary the number of grid points in the reciprocal space as well as the number of atoms. Please compare with the plot reported in the seminal paper where PyNX was first introduced to the scientific community: nowadays we can reach a throughput per GPU of the order of 4 x 10^11 reflections times atoms per second. More benchmarks.
Please note that the vertical axis is on a logarithmic scale. This shows the difference in computation time between running PyNX on a GPU and on 64 logical CPUs. For instance, looking at the bar chart in the middle, using a resolution of 64^3 in the reciprocal space, you can finish your computation in roughly five minutes, compared to two hours on the CPUs, so there is a factor of about 24 between the two. Here is one more example on a newer generation of GPU, in this case the Pascal architecture: between the Maxwell and the Pascal architectures you gain roughly half an hour, for instance, in the most extreme resolution case. And this is how we deploy our data science project: basically, we submit our jobs to the supercomputing cluster, and in the box there is just a small script showing how you submit your job, because the cluster uses the Slurm Workload Manager.
Now you have raw data, and you would like to convert it into a form that is viewable and understandable to humans: you want to visualize your data, because that's important as well. First, we use scikit-image, a collection of algorithms for image processing developed by the SciPy community and written in the Python language. In particular, I'm using the marching cubes algorithm, which iterates across your data volume trying to find the regions that match your isosurface value. So, in the end, what's the goal? You have a data volume, and from this 3D cube you want to extract a surface of equal value, an isosurface. It takes two input parameters: the data volume, called power_spec, which is essentially the scattering amplitude computed with PyNX, and the isosurface value you choose.
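A minimal sketch of that step; the variable names follow the talk, the stand-in volume and isovalue are assumptions:

```python
import numpy as np
from skimage import measure

# stand-in volume; in practice this is |F(hkl)|^2 from the PyNX run
power_spec = np.random.rand(64, 64, 64)
iso_value = 0.8

# extract the triangulated surface of equal value from the 3D volume
verts, faces, normals, values = measure.marching_cubes(power_spec, level=iso_value)
print("isosurface with %d vertices and %d faces" % (len(verts), len(faces)))
```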
But to be more interactive with the visualization, it is convenient to use VTK, the Visualization Toolkit, which is open-source software for computer graphics, image processing, and scientific visualization. It's a collection of C++ libraries, but it is also wrapped in Python, for instance. So what we need to do is convert the 3D NumPy array given back by PyNX into a VTK XML-based format. And I'm going to use PyEVTK, which is a very easy-to-use Python package that saves NumPy arrays straight into VTK XML-based files.
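A sketch of that conversion with pyevtk; the output file name and the choice of the image-data writer are assumptions:

```python
import numpy as np
from pyevtk.hl import imageToVTK

power_spec = np.random.rand(64, 64, 64)   # stand-in for the PyNX result

# write the 3D array as VTK XML image data (creates fhkl.vti, for ParaView)
imageToVTK("./fhkl", pointData={"power_spec": np.ascontiguousarray(power_spec)})
```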
So once you have this VTK XML file, you can process it using one of the most common applications, like VisIt, Mayavi, or ParaView. We are using ParaView as our main workhorse for 3D analysis. ParaView is an open-source, multi-platform application, useful for visualizing huge amounts of data in 2D and 3D. It's scalable in the sense that you can run it from your notebook up to a cluster or a distributed-memory supercomputer, and it has an intuitive user interface. When you are doing scientific visualization, you need data with a well-defined spatial representation: the data types used in ParaView are meshes, and for our purposes we are using a rectilinear grid, mostly uniform. This is just to say that ParaView is very popular in academic and also government institutions. When you think about ParaView, you might just think of the small client application; in reality, ParaView is a tall stack of libraries, with VTK at the core providing all the functionality for visualization and volume rendering.
Concerning Python, ParaView comes with pvpython, a nice application which allows you to automate your tasks and do your Python scripting for visualization. So, this is the graphical user interface. There are three basic steps when you visualize your data: reading, filtering, and rendering. Like most GUIs, it has a menu bar with all the features included in ParaView, toolbars with the most common features used for visualizing your data, and a pipeline browser where the collection of pipeline objects is presented with indented syntax. You can look inside your data or your pipeline collection and change parameters in the inspector or properties panel, and of course there is help. And finally, the 3D view. So you are probably wondering why I'm talking about ParaView at a Python conference. If you look at this plot, which is an isosurface from my simulations done with PyNX,
behind these plots there is essentially Python scripting. You set up your camera and your parameters, change your parameters, apply all the filters you need (contour plots, thresholds, or whatever), and finally you visualize your data.
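The script behind the plot isn't shown in the transcript; a minimal pvpython sketch of the read-filter-render pipeline might look like this (file name, array name, and isovalue are assumptions):

```python
# run with pvpython, ParaView's Python interpreter
from paraview.simple import *

reader = OpenDataFile("fhkl.vti")            # the VTK XML file written earlier
contour = Contour(Input=reader)              # isosurface-extraction filter
contour.ContourBy = ["POINTS", "power_spec"]
contour.Isosurfaces = [0.8]

Show(contour)                                # add the pipeline object to the view
Render()
SaveScreenshot("isosurface.png")
```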
This is just a collection of filters you may want to apply; a chain of these filters gives you a pipeline object. For my ordinary work I'm using just three of them. For instance, on the right, I make a contour plot of my data and then, for instance, extract an isosurface by looking at a range of values. I can also inspect the data by opening a spreadsheet view, and I can make queries on the data according to some threshold criterion: let's say I want to extract, from all my data, just the cells or grid points matching some criteria. And you also have the nice feature of clicking and dragging in the view to select the data you are interested in. So I'm going to conclude with the next task we want to address. If you are interested in looking for some sub-volumes in your Fourier space, let's call them spots, you want to identify these spots, measure angles and distances, and iteratively pass the results back to the data providers in order to improve their atom probe tomography reconstructions. So ParaView and Python are very useful for addressing this task. And thank you for your kind attention.
We have time for questions.

Hello, thank you for the talk. You say that you use the direct Fourier transform because your atoms are not equi-spaced, right? You're not dealing with perfect crystals. Have you tried the non-uniform fast Fourier transform, which doesn't require an equi-spaced grid?

No, I... I'll have a look at it. Okay. But in terms of performance, I mean, here you are exploiting the benefit of using GPUs. Can you run this non-uniform transform there?

The non-uniform transform actually runs on top of the fast Fourier transform; it's a mathematical theorem that lets you go from an equi-spaced grid to a non-equi-spaced grid. In terms of computational cost, it's n log n, as fast as the fast Fourier transform.
So, as far as I know, there's a lot of software used in crystallography that has been around for decades. When I started, crystallographers were using computers that are mostly unknown today, like SGI, Silicon Graphics machines. Are you aware of any integration of the things that you do with software that has been used by crystallographers more traditionally?

The only thing I know is this integration with the... Could you speak up? Sorry. The only thing I'm aware of is the integration of the PyNX tool with the external library cctbx, the Computational Crystallography Toolbox, for other scientific tasks like, I don't know, computing grazing-incidence scattering, or its use in, it's difficult to pronounce, the ptychography technique, and so on. So that's the only thing I'm aware of. Any other questions?
Then let's thank the speaker again. Thank you.