
Reduce the Storage Consumption of Your Storage Clusters with RozoFS


Formal Metadata

Title
Reduce the Storage Consumption of Your Storage Clusters with RozoFS
Title of Series
Number of Parts
199
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Distributed storage systems like RozoFS are a good way to adapt the resources of your system to an evolving demand, but data protection entails heavy storage consumption. This topic will interest those who care about the storage consumption of their clusters, which is directly linked to energy consumption and architecture cost. Erasure coding (EC) is a technique that provides the same data protection and availability as traditional block replication while reducing storage usage significantly (e.g. up to 50%). Of course, EC comes with drawbacks, as it performs complex computations. However, the Mojette transform, used in RozoFS as its erasure code, brings fast computations since it relies on simple additions. Efforts are being made to open up EC-based systems to data-intensive applications.

The growth of global storage is alarming. IDC's Digital Universe study [1] forecasts that the global amount of data will reach 40 zettabytes (ZB) by 2020. Data protection plays a major role in this storage consumption. The Mojette transform [2] is a mathematical tool from the University of Nantes that computes 'n' redundant projection blocks from 'k' information blocks. Any 'k' blocks among the 'n' are sufficient to retrieve the original data, so the transform behaves like an erasure code. By distributing these 'n' projection blocks over network storage nodes, RozoFS [3] is able to tolerate 'n-k' node failures (including disk, network and server failures), with the same data protection and availability as traditional block replication [4].
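The storage saving quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the (k, n) pairs are assumed example values rather than settings taken from the talk, the function names are hypothetical, and it assumes that each of the 'n' blocks is the same size as an information block and that replication needs 'n-k'+1 full copies to survive 'n-k' failures.

```python
from fractions import Fraction


def erasure_overhead(k: int, n: int) -> Fraction:
    """Bytes stored per byte of user data when k blocks are encoded into n."""
    return Fraction(n, k)


def replication_overhead(failures: int) -> Fraction:
    """Replication needs failures + 1 full copies to survive that many losses."""
    return Fraction(failures + 1)


def compare(k: int, n: int) -> dict:
    """Compare erasure coding and replication at equal failure tolerance."""
    failures = n - k
    ec = erasure_overhead(k, n)
    rep = replication_overhead(failures)
    return {
        "failures_tolerated": failures,
        "erasure_overhead": ec,
        "replication_overhead": rep,
        "saving": 1 - ec / rep,  # fraction of raw storage saved versus replication
    }


# Assumed example parameters, chosen only for illustration.
for k, n in [(2, 3), (4, 6)]:
    r = compare(k, n)
    print(
        f"k={k} n={n}: tolerates {r['failures_tolerated']} failure(s), "
        f"EC overhead {float(r['erasure_overhead']):.2f}x, "
        f"replication overhead {float(r['replication_overhead']):.2f}x, "
        f"saving {float(r['saving']):.0%}"
    )
```

Under these assumptions, k=4 and n=6 survive two failures while storing 1.5 bytes per byte of data, where three-way replication stores 3. That is the 50% saving mentioned in the abstract.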
RozoFS holds many important characteristics for a distributed storage system, such as:
* scalability: clusters of storage nodes can be added on demand;
* openness: compatible with different protocols (CIFS, NFS, ...), Amazon S3, Hadoop, ...;
* transparency: users manage their files exactly as usual;
* management: a tool is provided to make administration tasks easier.

[1] http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
[2] JeanPierre Guédon and Nicolas Normand, http://link.springer.com/chapter/10.1007%2F978-3-540-31965-88
[3] http://www.rozofs.com/
[4] Hakim Weatherspoon and John D. Kubiatowicz, http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasureiptps.pdf
Transcript: English (auto-generated)
So, well, I'm Dimitri, a PhD student from the University of Nantes, France, and a contributor to RozoFS [...] which is one of its main features.
Well, everything starts with the question
In computer
So today what is coming is to distribute your data
Okay, so You can access your storage. I just like to do
Everybody
And provide scalability And Meaning that
Access
So
We [...] the amount of data [...] The point is that data protection plays a major role.
What we want to do is to find a way to reduce this storage consumption. So what we propose is to use erasure coding, which is [...]
The idea here is that when you want to store a file, you actually split it into k data blocks, and using a mathematical tool you are able to compute [...] Then when I [...]
I am able to reconstruct [...]
Here [...] the storage overhead is only 1.5 [...] It actually provides the same data protection that [...]
Brings complex
So To
Means that
It
My Okay, so this talk
This request from And you can provide this Then
It's the time that's
Okay
Oh
I
This project was built
Oh
I Actually here We are
Looking at
The software
Yes
Yes, we are [...] When you take this drive [...] there is a mechanism to reconstruct the data.
Also, this problem is the service [...]
managed the next space [...] And for this case we do have two of the service, which is [...] on different nodes. And if one fails, we can use
Pacemaker to switch the service. Do you support nodes of different sizes? And what type of [...] do you use to choose where to store? So if one node is larger than the other,
do you store more data on that node? How do you handle that? Well, currently we define nodes in volumes. So each time you [...] you consider a volume that [...]
same storage capacity. Then, once you have more nodes, when you define a new volume, you bring the same way nodes [...] So you can have [...] you can share [...] When you want to store files [...] if you don't balance
the projection blocks [...] in order to [...] In this case, normally there is no imbalance [...] over time.
So if you have more questions for Dimitri, you can take them outside. He'll be glad to answer. Thank you.