Reduce the Storage Consumption of Your Storage Clusters with RozoFS

Formal Metadata

Title

Reduce the Storage Consumption of Your Storage Clusters with RozoFS

Title of Series

FOSDEM 2014

Number of Parts

199

Author

Pertin, Dimitri

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/32615 (DOI)

Publisher

FOSDEM VZW

Release Date

2014

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Distributed storage systems such as RozoFS offer the best way to adapt the resources of your system to evolving demand, but data protection entails substantial storage consumption. This topic will interest those who care about the storage consumption of their clusters, which is directly linked to energy consumption and architecture cost. Erasure coding (EC) is a technique that provides the same data protection and availability as traditional block replication while significantly reducing storage usage (e.g. by up to 50%). Of course, EC comes with drawbacks, as it performs complex computations. However, the Mojette transform, which RozoFS uses as its erasure code, offers fast computations since it relies on simple additions. Efforts are being made to open up EC-based systems to data-intensive applications.

The growth of global storage is alarming. IDC's Digital Universe study [1] forecasts that the global amount of data will reach 40 zettabytes (ZB) by 2020. Data protection plays a major role in this storage consumption. The Mojette transform [2] is a mathematical tool from the University of Nantes that computes 'n' redundant projection blocks from 'k' information blocks. Any 'k' blocks among the 'n' are sufficient to retrieve the original data, so the transform behaves like an erasure code. By distributing these 'n' projection blocks over network storage nodes, RozoFS [3] can tolerate 'n-k' node failures (including disk, network and server failures). While providing the same data protection and availability as traditional block replication [4], this technique significantly reduces the required storage capacity (e.g. by up to 50%). Of course, erasure coding comes with drawbacks, as it performs complex computations. The Mojette transform, however, brings fast computations since it relies on simple additions.
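The 'any k among n' property can be illustrated with a small Python sketch. It is not RozoFS code. It assumes that the 'k' information blocks are the rows of an integer grid and that every projection uses a direction of the form (p, 1). The function names, the bin-indexing convention and the simple 'peel off single-pixel bins' decoder are choices made for this illustration only.

```python
from itertools import combinations


def bin_index(row, col, p):
    # Assumed convention for a direction (p, 1): pixels lying on the same
    # discrete line share the value col - p * row.
    return col - p * row


def mojette_forward(grid, directions):
    """Return {p: {bin: sum}} for each direction (p, 1), using additions only."""
    projections = {}
    for p in directions:
        bins = {}
        for row, line in enumerate(grid):
            for col, value in enumerate(line):
                b = bin_index(row, col, p)
                bins[b] = bins.get(b, 0) + value
        projections[p] = bins
    return projections


def mojette_inverse(projections, rows, cols):
    """Rebuild the grid by repeatedly reading bins that hold one unknown pixel."""
    bins = {p: dict(b) for p, b in projections.items()}  # working copies
    pending = {p: {} for p in bins}  # unknown pixels left in each bin
    for p in bins:
        for row in range(rows):
            for col in range(cols):
                b = bin_index(row, col, p)
                pending[p][b] = pending[p].get(b, 0) + 1
    grid = [[None] * cols for _ in range(rows)]
    unknown = {(r, c) for r in range(rows) for c in range(cols)}
    while unknown:
        solved = []
        for row, col in unknown:
            for p in bins:
                b = bin_index(row, col, p)
                if pending[p][b] == 1:
                    # This bin now equals the single remaining pixel.
                    value = bins[p][b]
                    grid[row][col] = value
                    # Subtract the recovered pixel from every projection.
                    for other in bins:
                        ob = bin_index(row, col, other)
                        bins[other][ob] -= value
                        pending[other][ob] -= 1
                    solved.append((row, col))
                    break
        if not solved:
            raise ValueError("these projections do not determine the grid")
        unknown.difference_update(solved)
    return grid


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # k = 3 rows
    directions = [-1, 0, 1, 2]                             # n = 4 projections
    projections = mojette_forward(data, directions)
    for subset in combinations(directions, len(data)):
        kept = {p: projections[p] for p in subset}
        assert mojette_inverse(kept, 3, 4) == data
    print("grid recovered from every choice of 3 projections out of 4")
```

In this sketch a projection with a larger |p| has more bins than a data row has values, so the stored volume is slightly above n/k times the data. As an illustrative calculation (the numbers are not taken from the talk), k = 4 and n = 6 tolerate two lost blocks while storing roughly 1.5 times the data, whereas three-way replication stores 3 times the data for the same tolerance, which is about half.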
RozoFS has many important characteristics for a distributed storage system, such as:
* scalability: clusters of storage nodes can be added on demand;
* openness: compatible with different protocols (CIFS, NFS, ...), Amazon S3, Hadoop, ...;
* transparency: users manage their files exactly as usual;
* management: provides a tool to make administration tasks easier.

[1] http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
[2] Jean-Pierre Guédon and Nicolas Normand, http://link.springer.com/chapter/10.1007%2F978-3-540-31965-88
[3] http://www.rozofs.com/
[4] Hakim Weatherspoon and John D. Kubiatowicz, http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasureiptps.pdf