The convergence of HPC and BigData

FOSDEM VZW

François, Damien

Formal Metadata

Title

Subtitle

What does it mean for HPC sysadmins?

Title of Series

FOSDEM 2019

Number of Parts

561

Author

François, Damien

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/44577 (DOI)

Publisher

FOSDEM VZW

Release Date

2019

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

There are mainly two types of people in the scientific computing world: those who produce data and those who consume it. Those who have models and generate data from those models, a process known as 'simulation', and those who have data and infer models from the data ('analytics'). The former often originate from disciplines such as Engineering, Physics, or Climatology, while the latter are most often active in Remote sensing, Bioinformatics, Sociology, or Management. Simulations often require large amount of computations so they are often run on generic High-Performance Computing (HPC) infrastructures built on a cluster of powerful high-end machines linked together with high-bandwidth low-latency networks. The cluster is often augmented with hardware accelerators (co-processors such as GPUs or FPGAs) and a large and fast parallel filesystem, all setup and tuned by systems administrators. By contrast, in analytics, the focus is on the storage and access of the data so analytics is often performed on a BigData infrastructure suited for the problem at hand. Those infrastructure offer specific data stores and are often installed in a more or less self-service way on a public or private 'Cloud' typically built on top of 'commodity' hardware. Those two worlds, the world of HPC and the world of BigData are slowly, but surely, converging. The HPC world realises that there are more to data storage than just files and that 'self-service' ideas are tempting. In the meantime, the BigData world realises that co-processors and fast networks can really speedup analytics. And indeed, all major public Cloud services now have an HPC offering. And many academic HPC centres start to offer Cloud infrastructures and BigData-related tools. This talk will focus on the latter point of view and review the tools originating from the BigData and the ideas from the Cloud that can be implemented in a HPC context to enlarge the offer for scientific computing in universities and research centres.