RestFS: the Next Generation Cloud Storage
Formal Metadata
Title |
Title of Series |
Number of Parts | 199
Author |
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/32617 (DOI)
Publisher |
Release Date |
Language |
Content Metadata
Subject Area |
Genre |
Abstract |
Transcript: English (auto-generated)
00:00
Yeah, my name is First of all
00:30
The problem that we have is that we try to use a common system that is present in our computer or
00:41
in our storage arena, but this sounds like putting tape on the street. We know that we need something new. For this reason we started rethinking storage completely. The main goal is to create
01:14
Another interesting thing: we started the project because we wanted to test and find new technologies and new principles.
01:21
Today it is mainly a big framework for trying out new technology. The first item is quite famous. The concept is: moving computation is cheaper than moving data. This is our first principle. The second comes from Amazon: there is always a failure
01:41
waiting around the corner. And the last comes from a Ruby guy: we have to decouple as much as possible, because we handle so much data. The elements that make up your solution have to evolve in different life cycles, so you need to decouple everything.
02:07
From these three principles we created five pillars. First, we work on objects. That means we separate the data from the metadata. We try to mark each element with a revision or a hash to make a unique
02:24
identification; this makes it faster to look up. We want to introduce a cache, a more intelligent cache based on callbacks. You don't have to poll; you will be notified. This is one of the main concepts of our cache: we will notify you when something changes.
02:40
What about the transmission? We work over the network, where latency and long distances are a big problem. We want to create wide area network storage. We don't want to create a single storage for a single data center. So we need compression, and we need to transfer data only if you need it, and in particular only what has changed.
03:01
We also have to simplify the communication. Today we have many firewalls and many protections, so we have to work over HTTP as much as possible. Obviously we have to be distributed and decentralized, to remove any single point of failure. That means we need
03:21
replication of the data; we need to spread it around on multiple nodes. We also have to look up your data all the time, or as much as possible. You don't have a fixed configuration. You need to retrieve the situation that is best for you at the time that you use the data. And obviously, today, security is quite important, especially after the NSA.
03:46
Well, our structure is very similar to Amazon's, so we have more or less the same concepts. We have buckets and we have objects; objects are stored in buckets. What doesn't exist today in S3 is the cell concept: the bucket resides in a cell.
04:05
I'll try to explain a little bit better. You have a client. The client does a lookup on a DNS name. It receives back a list of IPs of resource locators, a list of sites that contain your data. So you make the query to DNS,
04:24
you find where your bucket is, and then you have a list of resource locators. That means you still don't know exactly where your data is; you know only the cell. For each cell you can retrieve the list
04:42
of the servers that handle your data. So we have three levels. We have a first lookup, more global, that is handled by DNS. We have a second one that tells you: okay, my cell has this dimension, I have this list of servers, please use this one. Or: I have a couple of servers but this one is full,
05:03
please move to another region. That is only for discovery. On top of all this information, the server also returns its load and priority: I'm a big one and I can handle
05:21
400,000 connections, or I'm a small one and I am 90 percent full, and so on. Once you have the list of servers, that is, the servers for the objects and for the data, you can start to retrieve the data. First of all you need to retrieve the properties of your object.
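The three-level lookup described here (DNS name to cells, cell to servers, then a choice by load and priority) can be sketched roughly as follows. This is only an illustration under assumptions: the `ServerInfo` fields, the in-memory `dns` and `cells` dictionaries, and the 90 percent cut-off used as a default are hypothetical stand-ins, not the project's actual API or wire format.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    # Hypothetical fields; the talk only says servers report load and priority.
    address: str
    priority: int   # lower value means preferred
    load: float     # fraction of capacity in use, 0.0 to 1.0


def pick_server(servers, max_load=0.9):
    """Return the preferred server that is not overloaded, or None."""
    candidates = [s for s in servers if s.load < max_load]
    if not candidates:
        return None  # the caller moves on to another cell or region
    return min(candidates, key=lambda s: (s.priority, s.load))


def locate(bucket, dns, cells):
    """Level 1: bucket -> cells (DNS). Level 2: cell -> servers. Level 3: pick one."""
    for cell in dns.get(bucket, []):
        server = pick_server(cells.get(cell, []))
        if server is not None:
            return cell, server
    return None


# Stand-in data: a real deployment would query DNS and the cell's locator service.
dns = {"photos": ["cell-a", "cell-b"]}
cells = {
    "cell-a": [ServerInfo("10.0.0.1", priority=1, load=0.95)],
    "cell-b": [ServerInfo("10.0.1.1", priority=2, load=0.40),
               ServerInfo("10.0.1.2", priority=1, load=0.50)],
}
result = locate("photos", dns, cells)
```

In this example the only server in `cell-a` is over the cut-off, so the client falls through to `cell-b` and takes the server with the better priority.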
05:42
So you discover your bucket, and as a second step you have to discover what your object is. The properties of the object are mainly there to retrieve the signature of each element that you need. Then you have the metadata. What does metadata mean? It means who created the object, the size of the object and many other things.
06:03
The data resides on a different set of servers. That also means you can scale in a different way. If you have many operations that read the metadata and fewer operations that write or read the data, you can dimension the clusters differently.
06:21
So we are in a cell and we have different kinds of clusters: one for metadata, one for objects. All this information is cached by the client. So you have a lookup. Then we have the authentication, for which you can use tokens, a similar idea to
06:42
Kerberos. You have a subscription: I store some objects in my local cache, please notify me. The server has to notify you when something changes. This is a persistent cache; that means it is stored on the disk of the VM.
07:04
That also means that when you restart your VM the data is there, and you only have to revalidate your cache with the hash or with the ID; you don't need to transfer it. When you have to perform an operation you only need to transfer the difference, because you have all the hashes and
07:22
you only have to send the hashes. If you have to read, you check what has changed and retrieve only the blocks you need, because the other blocks are in your cache. The cache is divided between disk and memory to speed things up, but it is mainly persistent. This is difficult to find in other
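The idea of revalidating a cache by hash and fetching only the changed blocks can be sketched like this. It is a minimal illustration, not the project's code: SHA-256, the fixed block size and all function names are assumptions, since the talk does not say which hash or layout is used.

```python
import hashlib


def split_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes, block_size: int):
    """One hex digest per block; SHA-256 is an assumption for this sketch."""
    return [hashlib.sha256(b).hexdigest() for b in split_blocks(data, block_size)]


def blocks_to_fetch(cached_hashes, remote_hashes):
    """Indexes of blocks that are missing from the cache or whose hash differs."""
    return [i for i, h in enumerate(remote_hashes)
            if i >= len(cached_hashes) or cached_hashes[i] != h]


# The client holds an old copy; the server now has a newer version.
cached = block_hashes(b"aaaabbbbcccc", 4)
remote = block_hashes(b"aaaaXXXXccccdd", 4)
needed = blocks_to_fetch(cached, remote)
```

Only the hash lists travel first; the client then requests the block indexes in `needed` and keeps serving the others from its persistent cache.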
07:41
file systems today. At the moment we don't have it, but you can imagine immediately that you can handle disconnected operation: you can sync your file system when you have a connection again. The architecture of the server is divided into three main levels. We have a front-end level that deals with the client.
08:04
We have a manager level, where we have the business logic, and we have a driver level. We have many pieces; we try to put each operation in a separate piece of
08:24
software. As we'll see in the next slide, each element has a dedicated back end, a dedicated database. That means you can scale up a specific function if you have a problem with that specific function. And you can also parallelize all the operations.
08:43
Well, we can now identify the main services: the metadata; as I said, the tokens for authentication; the resource locator on the site; the callback service; and the block device.
09:09
And we As I said why
09:37
And
09:43
We try to use key-value stores, NoSQL, as much as possible. It's quite fast and works very well for caching. It is designed to work with a cache, so we can scale in this case
10:01
and we can be very fast. We try to keep the operations simple. We have a lot of databases, and we try to keep to single operations, atomic operations. You can fire them all, because you have different DBs.
10:21
NoSQL is very fast. On the back end, on the server, we also use a distributed memory cache. Now I want to give you a small example.
10:47
And you have a set of properties:
11:03
segment size, block size, max size. Why do we have block size and max size? Because your object,
11:23
the data, has blocks, so you can specify the block size, like the block size you have today on a file system. Now you can change your block size per file: depending on the type of your file you can decide to use larger blocks or more blocks. And the blocks are also collected in
11:43
segments, to reduce transfers. Sometimes you need 500 blocks; then you can ask for the segment, so you don't have to ask the back end for 500 IDs. You can say: okay, please send me the segment.
12:02
The size of the segment is also something you can decide. Then you have the metadata, where you have all the properties. You can figure out how many blocks there are, how many requests you have to make, and the hash for each block. It looks something like this.
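The arithmetic behind block size and segment size can be made concrete with a small sketch. It assumes the segment size is expressed as a number of blocks per segment, which the talk does not state; the function name and the example sizes are hypothetical.

```python
def layout(object_size: int, block_size: int, blocks_per_segment: int):
    """Return (number of blocks, number of segments) for an object.

    Uses ceiling division so a partial last block or segment still counts.
    """
    blocks = -(-object_size // block_size)
    segments = -(-blocks // blocks_per_segment)
    return blocks, segments


# An object of 500 blocks of 4096 bytes, grouped 100 blocks per segment.
blocks, segments = layout(500 * 4096, 4096, 100)
```

With these numbers a client that needs the whole object makes 5 segment requests instead of 500 block requests, which is the saving the speaker describes.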
12:24
On the other side you have the blocks, which are collected by segment and stored separately. On the block side, we use a hash
12:42
to keep track, to understand whether we are handling the same block. On the properties we use a serial, because we are able to handle versioning. We can enable versioning, and through the serial we also implement a vector clock
13:03
to resolve problems with concurrent commits or disconnected operation. A couple of other notes on the IDs: the ID of a bucket is its name, and the ID of an object is a UUID.
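A vector clock, as mentioned for concurrent commits and disconnected operation, can be sketched as follows. This is the textbook technique, not the project's implementation; representing a clock as a dictionary from node name to counter is an assumption.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b, so b can replace a
    if b_le_a:
        return "after"
    return "concurrent"      # neither saw the other: a conflict to resolve


def merge(a: dict, b: dict) -> dict:
    """Clock to use after reconciling two versions: element-wise maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Two clients committed while disconnected from each other.
laptop = {"laptop": 2, "phone": 0}
phone = {"laptop": 1, "phone": 1}
status = compare(laptop, phone)
merged = merge(laptop, phone)
```

When `compare` reports `concurrent`, the system knows the two commits cannot be ordered and has to keep both versions or apply some merge policy.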
13:21
The ID in the segment is a combination of the position in the segment and the ID itself, together with the chunk data. From this you can read the ID of the single block.
13:42
We build it by composing the bucket name, the segment ID and the position. In this way, we have a unique ID across the whole back end. Here is something else that you can do.
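Composing a block identifier from its parts might look like the sketch below. The field order, the `:` separator and the inclusion of the object UUID are assumptions made for illustration; the talk only says the ID is built from the bucket name, the segment ID and the position.

```python
import uuid


def block_key(bucket: str, object_id: uuid.UUID, segment: int, position: int) -> str:
    """Build one globally unique key for a block (hypothetical format)."""
    return f"{bucket}:{object_id}:{segment}:{position}"


def parse_block_key(key: str):
    """Inverse of block_key; splits from the right so the bucket may contain ':'."""
    bucket, object_id, segment, position = key.rsplit(":", 3)
    return bucket, uuid.UUID(object_id), int(segment), int(position)


key = block_key("photos", uuid.UUID(int=1), 2, 7)
```

Because every component is part of the key, the same position in two different objects or buckets can never collide in the back-end store.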
14:05
You can create a fake object that is a kind of mount point, so that you can connect one cell or one bucket to another bucket. In this way, you can create an infinite file system. And also in this case you have a unique
14:23
path; the path will always be the same. The nice thing is that you can mount buckets inside buckets.
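Mounting one bucket inside another through a special object can be sketched as a path resolver. Everything here is hypothetical: the `MountPoint` marker, the dictionary model of a bucket and the `resolve` function are illustrative only and say nothing about how RestFS stores these objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountPoint:
    # A "fake object" whose only job is to point at another bucket.
    target_bucket: str


def resolve(buckets: dict, bucket: str, path: str, max_hops: int = 16):
    """Follow mount-point objects and return (final bucket, remaining path)."""
    parts = [p for p in path.split("/") if p]
    for _ in range(max_hops):
        for i in range(1, len(parts) + 1):
            entry = buckets[bucket].get("/".join(parts[:i]))
            if isinstance(entry, MountPoint):
                # Cross into the mounted bucket and continue with the rest.
                bucket, parts = entry.target_bucket, parts[i:]
                break
        else:
            return bucket, "/".join(parts)
    raise RuntimeError("mount nesting deeper than max_hops")


buckets = {
    "home": {"notes.txt": b"hi", "projects": MountPoint("team")},
    "team": {"report.txt": b"r", "archive": MountPoint("cold")},
    "cold": {"2013.tar": b"x"},
}
where = resolve(buckets, "home", "projects/archive/2013.tar")
```

The caller always uses the same path under `home`, while the data actually lives two buckets away.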
14:40
As I said, we have revisions. We try to keep revisions and versioning as much as possible: we always keep the last version and we keep the differences from the previous ones. This helps virtualization a little bit; it is very useful in virtualization. If you have to start a new VM
15:03
you can start from a gold image. That means the replication time of the gold image is zero, because you don't have to replicate anything. On to the block storage: today this is a sub-project. It is a little bit more complicated because it is based on distributed
15:23
block replication, mainly based on consistent hashing. We try as much as possible to handle failures without re-replicating or rebalancing the data. We use three copies, or the user can decide how many: write one,
15:45
write two, read, and so on; you can decide how safe your data is. That's quite well known. On the back end, failures are detected by a custom server with a custom protocol, and
16:01
another thing to note is that the replication uses a small custom protocol. In case of failure we have an election to identify who is the master, who has to receive the data and replicate it to the other nodes.
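Placing three copies of a block with consistent hashing can be sketched with a simple hash ring. This is the generic technique under assumptions: SHA-256 as the ring hash, 64 virtual points per node and the class and method names are all hypothetical, and the master election mentioned above is not modeled.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a position on the ring (SHA-256 is an assumption)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points to even out the distribution.
        self._points = sorted((_point(f"{n}#{i}"), n)
                              for n in nodes for i in range(vnodes))
        self._keys = [p for p, _ in self._points]

    def replicas(self, block_id: str, copies: int = 3):
        """Walk clockwise from the block's position, collecting distinct nodes."""
        if not self._points:
            return []
        start = bisect.bisect(self._keys, _point(block_id)) % len(self._points)
        chosen = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == copies:
                    break
        return chosen


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = Ring(nodes).replicas("block-42")
```

The useful property is that removing a node that holds none of a block's copies leaves that block's placement untouched, so a failure only forces re-replication of the blocks the failed node actually held.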
16:21
The first part is that we have a front end, and the front end is a kind of proxy to the back end. As I said, on the cache side we have publish/subscribe
16:42
and we have a persistent cache. As I said, security is important. We use the SSL protocol. You can decide if you want to encrypt at the block level, so that each block can be
17:02
encrypted on the server or on the client, which is obviously much better. And we have ACLs, the NFSv4 scheme of ACLs. Now, why do we use NoSQL, and why do we try to keep everything as key-value with a
17:24
hash? You can see that with Redis you are able to handle 550,000 requests per second with a single DB. That means you can install all the DBs that you want. That is the concept on the side
17:45
inside the site. As for scalability, we don't have any limit in this context. You can also see that this is again much faster, something like 700,000 requests per second.
18:00
We tried with over 10,000, but this is on the back-end platform. The code is organized in four levels, as I already mentioned: we have a protocol interface,
18:23
a service, a manager and a driver. What I want to say is that we followed an approach in which each element can be replaced. Each element is decoupled behind a standard interface. If you want to do something better,
18:41
for example on the storage, you can use our storage, whose name is Pisa, or if you want to use the local disk you can replace the driver and write the blocks on the local disk, and then you don't have to care about replication and so on. In the management, if you want to do something specific in the versioning, you can override the versioning.
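The replaceable driver behind a standard interface can be sketched as below. The class and method names are assumptions for illustration, not the project's real interface; the point is only that the manager layer talks to an abstract driver and does not care which back end stores the blocks.

```python
import hashlib
import os
from abc import ABC, abstractmethod


class BlockDriver(ABC):
    """Hypothetical driver interface; the method names are assumptions."""

    @abstractmethod
    def put(self, block_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, block_id: str) -> bytes: ...


class MemoryDriver(BlockDriver):
    """Keeps blocks in a dictionary; handy for tests."""

    def __init__(self):
        self._blocks = {}

    def put(self, block_id, data):
        self._blocks[block_id] = data

    def get(self, block_id):
        return self._blocks[block_id]


class LocalDiskDriver(BlockDriver):
    """Writes each block to one file under a root directory, no replication."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, block_id):
        # Hash the ID so any block ID maps to a safe file name.
        name = hashlib.sha256(block_id.encode("utf-8")).hexdigest()
        return os.path.join(self._root, name)

    def put(self, block_id, data):
        with open(self._path(block_id), "wb") as f:
            f.write(data)

    def get(self, block_id):
        with open(self._path(block_id), "rb") as f:
            return f.read()


class BlockManager:
    """Manager layer: depends only on the BlockDriver interface."""

    def __init__(self, driver: BlockDriver):
        self._driver = driver

    def store(self, block_id, data):
        self._driver.put(block_id, data)

    def load(self, block_id):
        return self._driver.get(block_id)
```

Swapping `BlockManager(MemoryDriver())` for `BlockManager(LocalDiskDriver("/some/dir"))` changes where the blocks land without touching the manager code.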
19:04
Or if you want another interface: the interfaces are on TCP, and we have two of them. One is S3; we have all the S3 commands, which we are completely replacing. And we have our RestFS one. The big advantage is that you have all the data and
19:32
What do we use? Mainly we use Python for everything, because we believe it is simpler and much faster for prototyping. In the back end we use ZeroMQ and
19:49
a bunch of other technologies. We use OAuth for authentication. We don't believe in storing credentials in our system; it probably makes more sense to use a standalone service.
20:02
For now, the distributed replication of the data is based on a DHT. Today we use Kademlia; we want to move to Pastry or to a multi-dimensional distributed hash. The point is: okay, I told you a lot, and probably you didn't understand it all very well, but why
20:24
would you use RestFS? What is the advantage? You can use it as a home directory. You can have a home directory everywhere, on every one of your devices, a central point like Dropbox, but at the file system level. We have a FUSE mount
20:41
that you can mount, and you don't care where you are. You can use object storage in an application: you can write applications that use the interface directly, which can have advantages. You can also use it for a CDN, because we have this replication. Adaptive replication is one of the things that we want to do:
21:03
adapting to the requests and scaling out the system. For recovery, we don't have any problem replicating across sites. The time is over. Some of these topics will be handled at a conference in 2014 at CERN.
21:27
I think the more interesting part is that you can visit the experiment there in Geneva, I don't know how many meters below the ground, and the famous ring of
21:40
30 kilometers. I don't want money; I am mainly looking for people who are interested in sharing ideas and code. As I said, we use Python mainly because we believe in prototyping; it makes the experiment go forward quickly. Now, the site is not up to date. I hope to update it soon, and we are rewriting
22:08
a big part of it. That means we removed the 0.1 version from the website, because we don't want to confuse people. I believe that in one month
22:22
we'll be able to release 0.2, which has a different scheme for distributing the blocks. So please keep in mind: in one month, check the site and try the download. The nice thing is that, although I talked about a lot of components, you can start by installing it without problems in a single VM, with a single DB and a single server.
22:46
Then you can try it on your system. You don't need to handle 10,000 nodes or something like that. You can start to play, and everything is included. There are many items, which you come to understand once you start.
23:04
Well, I know that you come from Ceph. Our inspiration for the file system comes from Ceph,
23:22
from open FS, from XtreemFS, from tau FS. We try to collect all the very good ideas. From Ceph we took the OSD, and especially the distribution of the data and the auto-scaling of the nodes. Well, I
23:40
think we are more for wide networks, because we are based on HTTP and DNS. Ceph is more for high-performance computing, more connected within the data center, where the latency is low. You have latency, but anyway, I think we are more cross
24:02
We are more on the side of S3. We want to increase the performance, but I think it's not for a single application that puts a very high I/O load on the system; it's more for many requests. I don't know what you want to do in the future.