Processing Structured Scientific Data in the Cloud
The HDF5 file format has been used extensively in the HPC community to store scientific data such as multi-dimensional arrays. Unfortunately, the traditional HDF5 library is not well suited to applications running in the cloud. To address this, we have developed HDF Kita, a service-based implementation of HDF5. Kita uses object-based storage (e.g. AWS S3) and runs as a cluster of Docker containers. In combination with the service, JupyterHub lets users easily run notebooks in the cloud that can work with an effectively unlimited amount of data and take advantage of the parallelization capabilities of the Kita server.
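To make the idea of chunked, object-based storage concrete, here is a minimal toy sketch. It is not Kita's implementation. It assumes a 1-D dataset split into fixed-size chunks, with a plain Python dict standing in for an object store such as S3. The names `CHUNK`, `write_chunks`, `read_slice`, and the key layout `name/index` are all hypothetical and chosen only for illustration. A slice read fetches only the chunks that overlap the request, and it fetches them concurrently, which is the general pattern a service could use to parallelize reads.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # elements per chunk (hypothetical value for this sketch)


def write_chunks(store, name, values):
    """Split a 1-D dataset into fixed-size chunks, one stored object per chunk."""
    values = list(values)
    for start in range(0, len(values), CHUNK):
        key = f"{name}/{start // CHUNK}"  # hypothetical key layout
        store[key] = values[start:start + CHUNK]


def read_slice(store, name, start, stop):
    """Return values[start:stop], fetching only the chunks that overlap it."""
    if stop <= start:
        return []
    first, last = start // CHUNK, (stop - 1) // CHUNK
    keys = [f"{name}/{i}" for i in range(first, last + 1)]
    # Fetch the needed chunks concurrently; with a real object store each
    # lookup would be a network request, so overlapping them hides latency.
    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = list(pool.map(store.__getitem__, keys))
    flat = [v for chunk in chunks for v in chunk]
    offset = first * CHUNK
    return flat[start - offset:stop - offset]


# A dict stands in for the object store (an assumption of this sketch).
store = {}
write_chunks(store, "temperature", range(10))
result = read_slice(store, "temperature", 3, 9)
print(result)
```

Because each chunk is an independent object, a request for a small region never has to touch the rest of the dataset, and several workers can serve different chunks at the same time.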