
Lessons learnt managing and scaling 200TB glusterfs cluster @PhonePe

Formal Metadata

Title
Lessons learnt managing and scaling 200TB glusterfs cluster @PhonePe
Series Title
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You may use, modify, and reproduce the work or its content for any legal purpose, and distribute and make it publicly available in unaltered or altered form, provided you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
We manage a 200TB glusterfs cluster in production, and along the way we have learned some key lessons. In this session, we will share with you:
- The minimal health checks needed for a glusterfs volume to ensure high availability and consistency.
- The problems we experienced with the current cluster expansion (rebalance) steps in glusterfs, how we managed to avoid the need to rebalance data for our use case, and a proof of concept for a new rebalance algorithm for the future.
- How we schedule our maintenance activities so that we never have downtime, even if things go wrong.
- How we reduced the time to replace a node from weeks to a day.

As the number of clients increased, we had to scale the system to handle the growing load. Here are our learnings from scaling glusterfs:
- How to profile glusterfs to find performance bottlenecks.
- Why the client-io-threads feature didn't work for us, and how we improved our applications to achieve 4x throughput by scaling mounts instead.
- How to improve the incremental heal speed, and the patches we contributed upstream.
- A roadmap for glusterfs based on these findings.
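The abstract does not spell out the checks themselves, but the kind of minimal health checks, profiling, and mount scaling it refers to can be sketched with the standard gluster CLI. This is an illustrative sketch, not the speakers' actual tooling; the volume name `myvol`, host `server1`, and mount paths are placeholders:

```shell
# Confirm all peers in the trusted storage pool are connected
gluster peer status

# Verify every brick process is online and check per-brick disk usage
gluster volume status myvol detail

# List files pending self-heal; a list that keeps growing usually
# indicates a brick or network problem on a replicated volume
gluster volume heal myvol info

# Profile the volume to find performance bottlenecks: collect
# per-brick latency and FOP statistics, inspect them, then stop
gluster volume profile myvol start
gluster volume profile myvol info
gluster volume profile myvol stop

# "Scaling mounts instead of client-io-threads": mount the same volume
# more than once and spread application I/O across the mount points
mount -t glusterfs server1:/myvol /mnt/myvol-1
mount -t glusterfs server1:/myvol /mnt/myvol-2
```

Each additional FUSE mount gets its own client process, which is one way to add client-side parallelism without relying on the client-io-threads translator.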