Introduction to Swift Object Storage
Formal Metadata
Number of Parts | 644
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/41407 (DOI)
Transcript: English(auto-generated)
00:06
Hello, everyone. First of all, thank you for waking up on a very cold and early Sunday morning. Thanks for coming here. I appreciate it.
00:20
My name is Thiago da Silva. I'm going to be talking about an object storage system called OpenStack Swift. I work for Red Hat. I'm a Swift core reviewer, meaning I'm part of the group of people that review, maintain, and merge code.
00:43
And I've been working on Swift for about four years or so, since 2014. I thought I'd start with a brief introduction into object storage in general, like what drove the need for object storage, talk about some of the use
01:05
cases and what requirements those use cases lead to, and then dive into Swift, look at the functionality, do some overview, talk about the architecture, and then just answer some questions. So it's going to be pretty intro and brief.
01:24
So we've all heard a lot about the explosive growth of data that is just being generated. There's been a ton of statistics about the exponential amount of data that is being generated.
01:42
And what's interesting about that is that data is being generated a lot, but the rate of that generation is also growing. So it's really an exponential growth. And the second point that's interesting is that what's driving that generation of data growth is mobile applications and web applications.
02:06
When you talk about data growth, you need to also talk about what types of data is being created. And I try to break that down between two types of data, what I call structured data, and that is data that is organized in a relational form.
02:24
Think of databases. It's data that is easy to analyze, easy to query, typically in columns and rows, in a very tabular form.
02:41
Very easy to consume and analyze. And we have also what is called unstructured data. So that's data without any structure or any order or schema. Your typical example of that is all the media files that are out there being shared across the web,
03:03
document files, research files. Think of from x-rays to even down to JSON files being shared across the web. What is interesting about unstructured data
03:21
is that it makes up about 90% of the data that is being generated today. When you talk about types of data, you also have different types of storage systems for storing different types of data.
03:42
We all know very well block and file storage. Block, data that is divided into chunks of blocks. We know how it's accessed. It's really well understood. File systems, again, very well understood.
04:00
And you have object storage. That is just a logical architecture that is used to manage data as objects or blobs of data. What's interesting about object storage is that when you talk about an object, that object contains the data and the metadata
04:23
about that data. And it usually contains a unique ID. How you access that data is through a unique ID. So it has a very flat address structure, for example, compared to file storage where you have directories and you have files.
04:42
Object storage, typically, is just a very flat, maybe you have buckets or containers to put your objects into. And that's pretty much it. The way you access it is through its unique ID. And the other point that is interesting about object storage is that, differently from block and file,
05:05
is how you access it. Typically, you access the object as a whole. So you're either writing a whole blob of data or you're getting a whole blob of data. Again, compare, for example, to file where you might be able to open and append and write parts of it and read just parts of it.
05:22
Typically, with an object storage, you put and you get the whole thing, the whole blob of data. The other distinction that I wanted to make here that I think is very interesting is that with block and file, you have local access. So you can think of your SATA drive, your local hard drive,
05:43
and you're accessing it locally. Through, for example, your file system, you have an XFS file system that you're accessing locally. Or you have remote connections, too, like your Samba share and whatnot. But with object storage, what's distinct about it is it's always remote.
06:01
Typically, you don't access an object storage through a local connection. So what are the use cases that are driving the need for object storage?
06:21
The first and foremost, the typical one that you see are the private clouds. So the object storage market was pretty much created by Amazon S3. But today, there's many public and private clouds out there that are running OpenStack Swift. I tried to list just some example of them. These are all companies.
06:41
OVH is based here in Europe. They've spoken at many OpenStack conferences talking about a very large cluster that they have running OpenStack Swift and storing a lot of data. What's interesting about this use case is that typically, if you are maintaining that private cloud, you don't know
07:01
what kind of workload you're dealing with. You're just providing a service and your users can be very varied and you don't know. But there are other use cases too. You have what I called web and mobile applications. For example, Wikipedia is an example
07:22
that we always talk about where all the media content in Wikipedia is actually stored on a Swift cluster. Which is interesting to think about that is that I don't think the size of that cluster is too big, but the number of users is just tremendous, right?
07:40
Turkcell is a Red Hat customer. It's a public reference that we talk about. They are a telecom company in Turkey and they've talked extensively about some mobile applications that they built. Something similar to WhatsApp that allows their users to send messages
08:03
and share photos and share videos. And they also built a pretty massive Swift cluster and supporting, again, millions of users. Not only a very large cluster in capacity, but also supporting millions and millions of users.
08:22
Another use case is data archival. And what I mean by data archival here, it can be either what I like to call active or passive data archival or cold storage if you think about it. Video companies that are producing or filming on site need to upload their footage
08:44
so that it can be processed, or whatever they do, at their production site. So Digital Film Tree, for example, is a company that talked a while back about how they were filming on site in Vancouver,
09:00
for example, and they have their offices in Los Angeles and they're able to use a Swift cluster to upload their data to the cloud and download that data from their Los Angeles offices and speed up how that production of the video happened. Nextcloud also has been talking here this weekend.
09:22
They offer an application similar to Dropbox where you can just share files across many devices or with other people. So again, all these different applications allow you to put data on a Swift cluster. You can put it into cold storage and just back up
09:41
or you can actively be sharing content. So based on these use cases, what kinds of requirements does that entail? What can we learn from these use cases? The first one is that you need to provide durability.
10:03
So from an end user perspective who is uploading that data, again, going back to that distinction that I made between block and file, with object storage, you're really abstracting the data from where the data is being stored.
10:20
I don't want to know, I don't care how that data is being stored, but I need the guarantee that it's durable. I can't afford to lose this data. So the object storage system needs to take care of that. From a user's perspective, I don't know where the data is being stored,
10:42
what kind of drive, how it's being updated, how refreshed that system is, what kind of backup it does, or if it's using Flash or hard drive, I don't know. I'm just putting my data on a cloud and it needs to be durable.
11:00
It also needs to be available, meaning I need to be able to access this data 24-7 if I have to. It cannot be unavailable. That's a huge difference, for example, again, from a lot of times even looking at the SLA for EBS on Amazon or what have you
11:24
in terms of block storage on public clouds. They actually provide a fairly low SLA, like 99.9%, compared to object storage that has many nines. It's always available. You're always able to connect.
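To make the "nines" comparison concrete, here is a minimal sketch that converts an availability percentage into allowed downtime per year. The function name and the extra SLA values are illustrative assumptions, not figures from the talk; at 99.9% the result is roughly 8.76 hours per year.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years for simplicity


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return how many hours per year a service may be down at a given SLA."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative SLA levels only; real provider SLAs vary.
for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% available -> {downtime_hours_per_year(sla):.2f} hours down per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the gap between "three nines" and "many nines" matters in practice.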
11:41
Accessibility is really how easy it is to use your API and how accessible it is. That's another thing that drove the need, or another requirement that drove the need for object storage in terms of all your web apps and your mobile apps and how they're accessing your data.
12:02
They need to provide a very simple-to-use API to access that data. And of course, scalability. Nobody starts with 100 petabytes of data. You start small and you need to be able to grow. You need to be able to have that flexibility for your storage system to grow as you grow.
12:23
You need to be able to grow extensively, but you also need to keep it all at low cost. Companies are trying to get away from the storage arrays of old time, and they need systems.
12:41
And that's why we have this room. That's why we have the SDS. That's why we talk about software-defined storage. It needs to be able to be running on low-cost hardware. So based on these use cases and requirements, let's dive into OpenStack Swift
13:00
and just talk a little bit about that. So Swift is a highly distributed, eventually consistent, durable object storage system. It allows you to store your data safely and cheaply.
13:21
Talking a little bit about the OpenStack Swift community. It's an open-source project. It was one of the founding projects of OpenStack. It was produced by or founded by Rackspace. And what drove Rackspace is that they were a public cloud.
13:44
They had the need for an object storage system, so they implemented Swift and then donated that to OpenStack to found OpenStack. What that means, though, is that Swift has been in production running a public cloud for about eight years.
14:01
It's very stable. The API is very stable. The system is very stable. And it has a very active community. We have over 700 total contributors. We have about 20 or so active contributors on a monthly basis. And one of the things, like I said,
14:21
I've been working in Swift for about four years. And one of the things that I always like to highlight about the community is its focus on automated testing. There's a really huge focus on CI and then making sure that the code that we write gets properly tested. So we have really a huge focus
14:42
on how we test every single piece of code that we put in there. As an example, I always kind of thought it was interesting how we merged a recent feature, symlinks. And I looked at the piece of middleware
15:01
that was actually providing the functionality. It was about 300 lines of code. And I looked at the tests, and it's about 2,000 lines of tests. That's a big difference. But I think it speaks to the focus of the community in terms of making sure that you're writing good code. It doesn't mean that we don't have bugs or anything, but we focus a lot on that.
15:22
And it's a very friendly community. It's been a lot of fun to work with those folks. Quick overview. So you access Swift through a RESTful API. It's all through HTTP verbs. So your GET, your PUT, your DELETE, POST.
15:43
The object key, the way you access a given object is through a URL. So differently from other object storage systems where you typically put your data and the system gives you back a unique key, the user itself is who defines
16:00
what that key for that object is. So the key is just a URL. You have, like I talked about before, the flat address space. You can group objects, a collection of objects, into buckets or containers. And you can group containers inside an account.
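As a rough illustration of the account/container/object addressing described here, the sketch below builds an object URL and a PUT request for it without sending anything. The endpoint, account name, and token are hypothetical placeholders; `X-Auth-Token` is the header Swift clients commonly use to pass a token, but check your own deployment's authentication setup.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint; a real storage URL is handed out by the auth system.
STORAGE_URL = "https://swift.example.com/v1"


def object_url(account: str, container: str, obj: str) -> str:
    """Build the URL that acts as the user-chosen key for an object."""
    # Object names may contain "/", so only the object part keeps its slashes.
    return "/".join([
        STORAGE_URL,
        quote(account, safe=""),
        quote(container, safe=""),
        quote(obj, safe="/"),
    ])


def put_request(url: str, body: bytes, token: str) -> Request:
    """Prepare (but do not send) an HTTP PUT that would upload the whole object."""
    return Request(url, data=body, method="PUT", headers={"X-Auth-Token": token})


url = object_url("AUTH_demo", "photos", "2018/fosdem/talk.jpg")
req = put_request(url, b"example bytes", "hypothetical-token")
print(req.get_method(), req.full_url)
```

The slashes inside the object name look like directories, but the namespace is still flat: the whole string after the container is just the object's name.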
16:24
And think of account more like a bank account where you put stuff in as opposed to a user. It's not really, an account is not really a user. It's more like just a group of containers. So it's just a way to organize your cluster
16:40
across multiple accounts. I mentioned already before, it's a highly durable and distributed storage system. You're able to load balance your cluster and you replicate the data many times to maintain data durability or you use erasure coding.
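The trade-off between replication and erasure coding mentioned here comes down to simple arithmetic on raw capacity. The sketch below compares the two; the 10+4 erasure-coding scheme is an illustrative assumption, not a configuration named in the talk.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data with full copies."""
    return float(replicas)


def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data with k data + m parity fragments."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_needed(usable_tb: float, overhead: float) -> float:
    """Raw capacity (TB) required to hold a given amount of user data."""
    return usable_tb * overhead


usable = 100.0  # TB of user data, an arbitrary example figure
print("3 replicas:", raw_capacity_needed(usable, replication_overhead(3)), "TB raw")
print("EC 10+4:   ", raw_capacity_needed(usable, erasure_coding_overhead(10, 4)), "TB raw")
```

Erasure coding needs less raw capacity for the same user data, at the price of more CPU work and more nodes involved in each read and write.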
17:03
But the point is that there's no single point of failure. Depending on how you deploy, you can deploy in such a way that you don't have a single point of failure. An interesting distinction about Swift is that it's an eventually consistent system.
17:22
So it's designed for high availability and partition tolerance, sacrificing a little bit of consistency. The myth that I like to dispel about eventual consistency, that's the number one thing that I hear a lot is that a lot of people equate eventual consistency
17:42
to eventual replication and that's not true. So eventual consistency does not mean eventual replication. It does not mean that Swift, when you put data into Swift, you store just one replica and Swift will eventually replicate your data. Because if that was the case, then you would have a window of time there
18:02
that you might lose your data because you might have just one copy and that disk might fail. That's not the case. Swift will always store your data durably with multiple replicas so that when a system goes down or something, your data will not be lost.
18:22
Shameless plug. There's gonna be another talk later today where it's actually gonna dive a little bit more into how eventual consistency can actually play a really nice role into the Swift architecture. And that's when we talk about, for example, global replication.
18:41
So when you have multiple data centers and you need to have a cluster that goes across or spans across multiple data centers across the globe. That's where eventual consistency really shines.
19:02
Swift has a very rich API exposing a lot of functionality. I list some of them there. I'd like to highlight just a couple of them. Object versioning is when you're able to store or if you have a workload that overwrites objects,
19:21
you're able to maintain old copies of that object. And object expiration is when you're able to store an object and set an expiration time on it and it will be automatically deleted for you. So the system takes care of that automatic deletion. SLO is talking about,
19:41
or is a feature that provides large object support. So similar to other object storage systems, Swift applies an artificial limitation on how big your object can be. And that limit is typically five gigabytes.
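A size limit like this means a larger upload has to be split into segments that are tied together by a manifest. The sketch below shows the idea on in-memory bytes with a tiny segment size. The segment naming scheme is a made-up convention, and the manifest fields (`path`, `etag`, `size_bytes`) follow my understanding of the static large object manifest format, so verify them against the Swift documentation before relying on them.

```python
import hashlib
import json
from typing import List


def split_into_segments(data: bytes, segment_size: int) -> List[bytes]:
    """Cut the payload into consecutive chunks of at most segment_size bytes."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def build_manifest(container: str, name: str, segments: List[bytes]) -> str:
    """Describe each segment so the server can concatenate them on download."""
    entries = []
    for index, segment in enumerate(segments):
        entries.append({
            # Hypothetical naming scheme: zero-padded index keeps segments ordered.
            "path": f"/{container}/{name}/{index:08d}",
            "etag": hashlib.md5(segment).hexdigest(),
            "size_bytes": len(segment),
        })
    return json.dumps(entries, indent=2)


payload = b"x" * 10            # stand-in for a very large object
segments = split_into_segments(payload, segment_size=4)
print(build_manifest("video_segments", "movie.mp4", segments))
```

In a real upload each segment would be sent as its own object first, and the manifest would be uploaded last under the name users download.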
20:01
And that's really artificial just so that you don't have people trying to upload a terabyte object using HTTP. It wouldn't be a good idea to do that. So we put the artificial limitation of saying five gigabytes is the maximum size. By using a feature like SLO, or static large objects, you can segment your one terabyte object
20:22
into smaller chunks, upload them separately, and then provide a manifest file and Swift will concatenate your objects for you. Temporary URL is another feature where, for example, you upload an object and you want to provide temporary access to somebody else
20:45
and you don't want to provide them your authentication secrets. You can produce a signed temporary URL that you can hand it to them, they can use to get that data and that URL will expire after some time. Swift also provides support
21:01
for different types of authentication systems. TempAuth is one of them, Keystone in OpenStack, FormPost. There's also going to be another talk by Christian today talking a little bit about how to develop your own apps and he'll cover a little bit more about authentication.
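The temporary URL feature described above works by signing the request method, expiry time, and object path with a secret key that the account owner has set. The sketch below follows the signing scheme as I understand it from the Swift TempURL documentation; the key, path, and the choice of SHA-256 are assumptions, and which digests a cluster accepts depends on its configuration.

```python
import hmac
import time
from hashlib import sha256
from typing import Optional


def temp_url(path: str, key: str, method: str = "GET",
             ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Return a path with a signature that is only valid until it expires."""
    start = time.time() if now is None else now
    expires = int(start + ttl_seconds)
    # The signed message is the method, expiry timestamp, and path, newline separated.
    hmac_body = f"{method}\n{expires}\n{path}"
    signature = hmac.new(key.encode(), hmac_body.encode(), sha256).hexdigest()
    return f"{path}?temp_url_sig={signature}&temp_url_expires={expires}"


# Hypothetical object path and key, for illustration only.
print(temp_url("/v1/AUTH_demo/photos/talk.jpg", "my-secret-key"))
```

Whoever receives the URL can fetch the object until the expiry time without ever seeing the account's credentials or the signing key.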
21:24
Let's talk a little bit about Swift architecture. Swift has a very modular architecture that allows for the flexibility when you're scaling it out. Covered here, four main components. The proxy is your main gateway to the cluster.
21:46
So that's how users will send their requests and the requests are received at the proxy. And the proxy will then proxy out those requests to the backend storage system.
22:00
The backend storage system is made up of the account server, the container server, and the object server. The account server and the container server, the account is really holding just a list of all the containers for that given account and the container is just holding a list of all the objects for that container.
22:23
And some metadata. The account and container server are typically, data is typically stored on a database on disk and it's replicated properly and all that. And the object server itself really becomes kind of the most important part here where that's the service
22:41
that is storing your data on disk. What's interesting about this modular architecture is that you could have those four services running in one machine or you can have them run in separate machines independently of each other. They are just web servers running
23:02
or implementing the WSGI interface and the communication between them, so not only the communication between the user and the proxy, but the communication between them is also through HTTP. Just giving a little bit more of the flow of the data here.
23:22
So imagine, for example, that a user is putting an object, is writing an object. It sends that request to the proxy and what happens is that the proxy will then replicate that data. So it will write three copies of that data, load balanced and distributed across the cluster in such a way that the algorithm makes sure
23:44
that you're writing the data across different failure domains. So a failure domain can be a rack, a node, or even a whole data center. So if you have multiple data centers, it would make sure that it distributes your data across different data centers, or if you have multiple racks in a data center,
24:02
it would put it in different racks. The other distinction that I like about this slide is that you can also break up your cluster between what we call the access tier and the storage tier. And it allows you to scale them separately from each other. So the access tier is gonna be the proxy
24:22
and it's gonna be very CPU intensive. So if you need to scale out bandwidth, you might just be able to add more and more proxies. But if you need to grow capacity, you might be able just to grow your storage tier and add more disks and add more storage capacity as opposed to having to add more proxy.
24:40
So that's a really neat thing about this architecture. But you can't talk about Swift architecture without talking about the ring. The ring is really about how Swift is able to determine where the data is stored.
25:02
So the ring is just a binary file that is using a hashing ring to be able to determine where the data or where an object is stored on your cluster. So that's something that you produce,
25:23
the ring as an operator when you're deploying your cluster. And the way it's used is when an object request comes in, the proxy is going to hash the name of the object,
25:42
that unique ID, the URL. And from that hash, it can look up where in your cluster the data would be stored. So the location is deterministic by using the ring, and that guarantees that there is no single point of failure.
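The lookup described here can be sketched as: hash the object path, take the top bits of the hash as a partition number, then read the devices for that partition from a precomputed table. This is a simplified toy model, not Swift's ring code. The device list, partition power, and round-robin table are made up, and a real ring also mixes a cluster-wide secret into the hash and balances by device weight.

```python
import hashlib
import struct

PART_POWER = 4                       # 2**4 = 16 partitions; real clusters use far more
PART_SHIFT = 32 - PART_POWER
REPLICAS = 3

# Hypothetical devices spread over three zones (failure domains).
DEVICES = [{"id": i, "zone": i % 3, "ip": f"10.0.0.{i + 1}"} for i in range(6)]

# Toy table: for each replica, the device index for every partition.
# Round-robin assignment happens to put the three replicas in three zones here.
TABLE = [
    [(part + replica) % len(DEVICES) for part in range(2 ** PART_POWER)]
    for replica in range(REPLICAS)
]


def partition_for(path: str) -> int:
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack(">I", digest[:4])[0] >> PART_SHIFT


def devices_for(path: str) -> list:
    """Return the devices that should hold each replica of the object."""
    part = partition_for(path)
    return [DEVICES[row[part]] for row in TABLE]


for device in devices_for("/AUTH_demo/photos/talk.jpg"):
    print(device)
```

Because every node holds the same table and the hash is deterministic, any proxy can compute the location on its own without asking a central metadata service.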
26:04
Every node in your cluster is going to have that ring, and you can look up that information there. Another interesting point about the Swift architecture is what we call storage policies. So we have the support to store data
26:22
across different data storage rules, or replicas, we say. So I talked a lot about three replicas to give an example, but you could define really pools of data, kind of break up your cluster into pools of data, and you might have data that needs to be stored with three replicas or two replicas
26:41
or erasure coding. You can also use this to say, let's say, for legal reasons or whatever, this data needs to stay in Europe. It cannot go to the US, for example. It cannot be replicated to the US. You could also define a pool that maintains that data just in your cluster or your data center that is in Europe.
27:03
So by using storage policies, you can do that. And that's an operator-defined rule that the operator would define. From a user's perspective, the way you would use storage policies is when you're creating a container, you define what storage policy
27:21
that container is going to have, and then every object that you're going to put inside that container is going to use that storage policy. So you could have a policy of flash drives, for example, very high performance or what have you that can be defined by the operator.
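From the user's side, choosing a policy is a single header on the container creation request. The sketch below prepares such a request without sending it. The endpoint, container name, policy name, and token are hypothetical, and `X-Storage-Policy` is the header name as I understand it from the Swift API, so confirm it against your deployment.

```python
from urllib.request import Request


def create_container_request(storage_url: str, account: str, container: str,
                             policy: str, token: str) -> Request:
    """Prepare (but do not send) a PUT that creates a container under a policy."""
    return Request(
        f"{storage_url}/{account}/{container}",
        method="PUT",
        headers={
            "X-Auth-Token": token,
            # Policy names are defined by the operator; this one is made up.
            "X-Storage-Policy": policy,
        },
    )


request = create_container_request(
    "https://swift.example.com/v1", "AUTH_demo", "eu-archive",
    policy="europe-only", token="hypothetical-token",
)
print(request.get_method(), request.full_url)
```

Every object later written into that container inherits the policy, so the application code that uploads objects does not need to know about it.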
27:42
I mentioned the four main components of Swift, but there's also what we like to call the consistency engine. That's really background daemons that are running in the object storage system themselves and just maintaining that eventual consistency. It's making sure that data healing is happening properly,
28:01
bit rot detection, the updates when you put an object, make sure that the container database or the account database is getting updated, expiring the objects. So I gave an example there with my little diagram when the proxy is trying to write to those three replicas and it's not able to write one of them.
28:22
It will actually write to another place to guarantee that durability. So it always will try to write three replicas even if it wasn't able to write to a primary location, and eventually it will heal and it will move that data back into place. So that's all done by that consistency engine.
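The write path just described, where a failed primary is replaced by another location and the data is moved back later, can be modeled in a few lines. This is a toy simulation under stated assumptions, not Swift's implementation: node names are made up, and the majority quorum rule (`replicas // 2 + 1`) is my understanding of how replicated writes are judged successful.

```python
from typing import Callable, List


def quorum_size(replicas: int) -> int:
    """Majority of replicas that must be written for the PUT to succeed."""
    return replicas // 2 + 1


def write_object(primaries: List[str], handoffs: List[str],
                 is_up: Callable[[str], bool]) -> List[str]:
    """Write one copy per primary, substituting a handoff node for each failure."""
    written = []
    spare = iter(handoffs)
    for node in primaries:
        if is_up(node):
            written.append(node)
            continue
        # Primary is unreachable: fall back to the next reachable handoff node.
        for candidate in spare:
            if is_up(candidate):
                written.append(candidate)
                break
    if len(written) < quorum_size(len(primaries)):
        raise RuntimeError("not enough copies written")
    return written


def misplaced(primaries: List[str], written: List[str]) -> List[str]:
    """Copies that background daemons would later move back to a primary."""
    return [node for node in written if node not in primaries]


down = {"node-b"}
copies = write_object(["node-a", "node-b", "node-c"], ["node-x", "node-y"],
                      is_up=lambda node: node not in down)
print("written to:", copies)
print("to be healed:", misplaced(["node-a", "node-b", "node-c"], copies))
```

The client still gets three durable copies at write time; only the placement is temporarily off, which is the part that becomes consistent eventually.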
28:43
That's it. Maybe at this point you're thinking, yeah, Swift is cool, right? I'd like to contribute. We have a lot of work ahead of us. The work continues. Right now we're focusing a lot on the scalability of Swift.
29:00
So container sharding is an issue that we have where there are users out there that are trying to put billions of objects inside the same container, and that you get what we call deep containers, and you need to be able to shard that container. Lots of small files. Again, putting 16 byte files or objects
29:24
inside the cluster can become a problem. Data tiering, just being able to move, when I talk about policies, being able to move your data automatically from a three replica policy to an erasure-coded policy, for example.
29:42
That's another feature that we're working on. We also have a project called Hummingbird that is implementing or porting parts of the system to Golang, and that's for performance reasons. So there's really a ton of work to do there.
30:02
And that's it. You can chat with us if you wanna, or you can contribute. You can find the code on GitHub, or you can find us on Freenode if you have questions, if you wanna talk about. And that's it.
30:22
Any questions? Any questions? Yes. What is it?
30:44
The priority. The priority, so what is the priority of the project Hummingbird? So again, just to restate the purpose. The purpose is to rewrite parts of it in Golang. And the priority, I would say, is very high,
31:00
because again, it affects the scalability of the system. But right now, we're kind of prioritizing deep containers a little bit higher than that. So it's something that we're working on. We actually have patches out there that build the foundation work for being able to port.
31:22
And we're really just talking about the object server that we're going to port, and maybe some of the data consistency background daemons. So it's important, it's very important to us. But there's more important work, like the deep containers that we need to tackle first. But it's parallel work that is already ongoing.
31:42
I thought I saw, here and there, yep. Being a core project of OpenStack, I understand it means that the release schedule, the release dates are tied with the releases of OpenStack, or does Swift have its own cadence?
32:03
I don't think, so does Swift have a separate schedule from the OpenStack cadence? We, I guess you can say that we do. So actually, a lot of OpenStack projects are following sort of their own cadence,
32:21
where they're able to just release their own component. And when there is an OpenStack release, it kind of gets bundled together. And we just tag a specific release that we did, the last release. And we say, okay, this is going to be the Queens release for OpenStack.
32:43
I'm going to, you actually bring up an issue, which is, Swift typically actually gets called out in a nice way nowadays, that you're actually able to run OpenStack Swift outside of OpenStack. You don't need a whole OpenStack cluster to run Swift.
33:02
Swift is its own storage system that you can run without any other OpenStack component. If you have your own authentication system, you don't even need Keystone. You can just spin up a Swift cluster, and it does not depend on
33:21
other OpenStack components. Without...
33:43
Yeah, so, I'm going to repeat, to make sure that I understood your question. So if you have a three-replica system or policy, and you want to reduce that policy to two replicas to lower the cost, yes, you can, without downtime.
34:01
Yes, any more questions? All right, cool, thank you.