
COSI : a brief update


Formal Metadata

Title
COSI : a brief update
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
For applications in Kubernetes, CSI provides a way to consume file/block storage for their workloads. The main motivation behind the Container Object Storage Interface (COSI) is to provide a similar experience for object stores. The basic idea is to provide a generic, dynamic provisioning API to consume the object store, so that app pods can access a bucket in the underlying object store like a PVC. The major challenge for this implementation is that there is no standard protocol defined for object storage, and the COSI project needs to be vendor agnostic. For example, in the case of RGW, the application can request an S3 bucket or a Swift bucket from the same ceph-cosi driver. Ideally, the Kubernetes resource for the bucket can be migrated to a different cloud if the drivers support it, and the application can seamlessly continue with the same k8s object. COSI won't handle the orchestration/management of the object store; rather, it will be another client and provide bucket access on behalf of applications running in Kubernetes. A similar session was given at FOSDEM'21, but the whole project has since gone through design changes, and this talk shares that information.
Transcript: English (auto-generated)
Hello everyone, my name is Jiffin Tony Thottan, and I am currently working as a Senior Software Engineer on Red Hat Ceph Storage.
In this role, my main focus area is supporting the Ceph object store, otherwise known as RGW, for applications running in Kubernetes. Related to this work, I got the chance to get involved with the COSI project.
If you remember, at last year's FOSDEM I gave a presentation about the same topic, COSI. Today I am going to give a brief update on what has happened over the last year and what our current status is for the alpha release.
I am planning to cover the following topics. I will start with an introduction to object storage, then I will talk a bit about CSI in Kubernetes, and then I will move on to the COSI project: what its motivation was, how it was designed, and how users can consume COSI for their workloads.
So, what is object storage and how is it different from file and block storage? As most of you know, traditional file and block storage was introduced to address storage needs at the system level or for a large enterprise.
But when the cloud came in, data grew to a much larger extent, and traditional storage became a bottleneck for that data explosion.
To address this, object storage was introduced. Here, data is broken into small discrete units known as objects.
These can be accessed via simple network APIs like HTTP. Data is organized into logical containers, usually known as buckets, with a flat hierarchy, so applications can access the data very quickly.
In short, the object storage design is cost efficient and scales easily to large quantities of data while maintaining quick access.
Now, let's talk about the Container Storage Interface, otherwise known as CSI: what it solved and what it provides for Kubernetes. Basically, it provides a way to plug file and block storage systems into Kubernetes for the applications running on it.
Before CSI, connecting to storage meant using in-tree volume plugins or drivers, which had to be directly part of the core Kubernetes project.
This made life difficult for both vendors and Kubernetes in terms of flexibility. CSI resolves this by allowing vendors to move that logic into separate drivers.
There are a lot of major players here, like Amazon EBS or Google Cloud storage. This means each vendor can support their own features for their storage,
and core Kubernetes can be more secure and more reliable without having vendor plugins inside its code. Now, let's go over some CSI terminology that is commonly used.
Most of you have heard these terms, and they will help in understanding the COSI project a bit more. The first one is the storage class. Storage classes provide a way for Kubernetes admins to describe their storage provisioners,
for example whether it is Ceph storage or Amazon EBS, and so on. Then there is the persistent volume, a piece of
storage that is provisioned either statically or dynamically through a storage class. Applications request that storage with persistent volume claims,
and these PVCs consume the persistent volume resources: they specify how much storage is needed and what type of access is required, whether read-only, read-write-once, or read-write-many.
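As a quick illustration (my own sketch, not from the talk; the driver name is just an example), a storage class and a claim against it look roughly like this:

```yaml
# A StorageClass describes a provisioner; the CSI driver named here is an example.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-rbd
provisioner: rbd.csi.ceph.com   # example CSI driver
reclaimPolicy: Delete
---
# A PVC requests a piece of storage from that class; the driver provisions a PV for it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-rbd
  resources:
    requests:
      storage: 10Gi
```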
Now, let's talk a bit about how we started the COSI project and what the real motivation behind it was. As you know, Kubernetes is a container orchestrator which was primarily designed for cloud applications,
and cloud applications predominantly use object storage, but CSI only provides file and block storage.
We identified this gap, and to address it we started the COSI project. So what was the main motivation? We need to provide a generic, dynamic provisioning API like CSI, but for consuming an object store,
and applications should be able to access the bucket, I mean the backend bucket in the underlying object store, like a PVC. We need to implement a Kubernetes controller to automate this provisioning with various pluggable provisioners.
It should provide a similar user and admin experience for new and existing buckets. And keeping all this in mind, we need to keep it vendor agnostic,
because object storage does not have a standard protocol; each major player defines their own version of an object store protocol, like S3, Swift, GCS, etc.
So we need to be a generic layer on top of these vendors. And here we are not talking about orchestrating or managing the actual object store; even CSI does not do that.
That needs to be handled separately; for example, Rook is a storage orchestrator which is designed to do that, mainly for Ceph storage. Moving on: a team of three, Jeff, John and Ellen, started a project with this in mind, known as lib-bucket-provisioner.
But it was deprecated in February 2020. We can consider it a predecessor project of COSI.
So what is it? It's a Golang library wrapping a Kubernetes controller. It mainly uses two custom resources to abstract the backend bucket, and users can make requests for a backend bucket via these resources.
Primarily, it was consumed by the Rook community for the Ceph object store project, and NooBaa is another vendor that uses this project.
They use it as an external library in their operators. What this library actually does is watch for the bucket claims, or requests, made by users; it's a controller.
It reconciles and retries those requests the way an operator does, and it creates artifacts like a ConfigMap and a Secret as a result of the bucket claim creation.
This ConfigMap and Secret can be consumed in the application pod for accessing the backend bucket. It also handles the deletion of the Kubernetes resources which were created on its behalf.
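As a rough illustration (my own sketch, not shown in the talk; the storage class name is just an example), a claim against lib-bucket-provisioner looks something like this:

```yaml
# ObjectBucketClaim from the lib-bucket-provisioner API (objectbucket.io/v1alpha1).
# The provisioner reconciles the claim, creates a backend bucket, and emits a
# ConfigMap and Secret of the same name for the application pod to consume.
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-obc
spec:
  generateBucketName: my-app-bucket     # prefix for the generated bucket name
  storageClassName: rook-ceph-bucket    # example class pointing at an RGW provisioner
```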
So when we started the COSI project, on top of that motivation we created a set of goals for COSI to address. They came from the prior experience with lib-bucket-provisioner as well as from the motivation itself.
We need to automate the creation and deletion of buckets, and also granting and revoking access for users. Users need to be able to provision buckets according to their workload application:
whether it's S3, or Azure, or whatever the application supports, we need to provide a bucket for that use case. Users should also be able to use existing buckets and to share a bucket with different users.
We need to support portability among different workloads and across different clusters. So if a user moves from one cloud to another, and the drivers support that mechanism
and that type of access, the application should be able to do its work in both clusters, with both vendors.
And, as I mentioned before, we need to maintain vendor neutrality during this development. It needs to support a large number of vendors, and each vendor has their own features and their own definition of the protocol.
So we need to provide a generic library, a generic layer, for these vendors, and they need to be able to integrate easily with the COSI project. The API should be a kind of standard mechanism for every third-party vendor.
Now, what does the COSI architecture look like? Before moving to the actual architecture, I will talk a bit about the COSI terminology. First, we have the bucket class.
It's a cluster-scoped resource defining the provisioner for the bucket, or we can say for the object store. Using the bucket class, users can request a bucket claim, a namespaced resource which represents a request for a bucket.
It can be a new or an existing bucket. As a result of the bucket claim, a bucket resource will be created. Again it is namespace-scoped, it refers to the bucket claim, and it has the information about the backend bucket.
Now, the second part is that we need to access the bucket. For that we define a bucket access class: a cluster-scoped resource which defines the access mechanism for the bucket.
And the user can request a bucket access for granting access to the backend bucket; again, it's a namespace-scoped resource. We will go a bit deeper into all of these while explaining the diagram. Now, I am going to give a comparison between COSI and CSI.
First, in CSI we have the storage class; similar to that, we have the bucket class, which defines what the backend provisioner looks like.
Then we have the bucket, which is similar to the persistent volume in CSI, and the persistent volume claim corresponds to the combination of bucket claim and bucket access. COSI emphasizes the generality of bucket access by defining access policies and access mechanisms in the bucket access class,
whereas CSI has a less general access policy, with predefined access modes like read-write-once, read-only-many, or read-write-many. Now, I will briefly go through the COSI topology diagram.
On the left-hand side you can see the Kubernetes API server, and then there are the different COSI components: the COSI controller, then the COSI sidecar and the driver.
At the end, we have the vendor. I will explain each component in the upcoming slides. The application will be residing in its own namespace. On the right-hand side, you can see the bucket, bucket claim, and bucket class. Basically, the bucket class and the bucket access class will be created by the admin.
Then the user can request a bucket claim, which will create a bucket, and a bucket access, which refers to the bucket claim (or the bucket) and the bucket access class.
After creating a bucket access, the user will get a secret, and using that secret in the application pod, it can access the backend bucket. That's how things work in general.
And as you can see, with COSI everything runs in different namespaces: the application will be running in its own namespace; the COSI controller, as of now, will be running in kube-system, which is kind of a dependency; and then there is a separate namespace where the sidecar, the driver, and the vendor portion will be running.
As I mentioned, there are three different COSI components. One is the COSI controller manager, which manages the lifecycle of the COSI API objects.
There should be at least one active instance to handle the requests. Then we have the COSI sidecar, which is the integration point between COSI and the driver.
It sends gRPC requests to the driver, and we need one sidecar for each driver. Finally, we have the COSI driver. This is a generic driver into which you can plug your storage vendor's clients or APIs,
so that it can send requests to the vendor provisioner for the bucket-related operations like create, access, or delete.
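To make the sidecar-per-driver relationship concrete, here is a rough sketch (my own, not from the talk) of how a sidecar and a vendor driver might be deployed together; the namespace, image names, and socket path are illustrative assumptions:

```yaml
# Hypothetical deployment of a COSI driver: the sidecar watches the Kubernetes API
# and forwards gRPC calls to the vendor driver over a shared unix socket.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: objectstorage-provisioner
  namespace: objectstorage-system            # illustrative namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: objectstorage-provisioner
  template:
    metadata:
      labels:
        app: objectstorage-provisioner
    spec:
      containers:
        - name: cosi-sidecar
          image: example.registry/objectstorage-sidecar:latest   # placeholder image
          volumeMounts:
            - name: socket
              mountPath: /var/lib/cosi
        - name: vendor-driver
          image: example.registry/ceph-cosi-driver:latest        # placeholder image
          volumeMounts:
            - name: socket
              mountPath: /var/lib/cosi
      volumes:
        - name: socket
          emptyDir: {}
```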
Now we will talk about how COSI can be consumed by your application workload. I am planning to cover scenarios involving creation of the bucket, access to the bucket, and deletion of the bucket.
First, as a prerequisite, the admin needs to create the bucket class and the bucket access class. These provide the details about the backend object storage and define the mechanism for accessing buckets in that backend storage.
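As a rough sketch of that prerequisite (my own illustration; the field names follow the draft alpha API and may differ in the released version, and the driver name and parameters are made up):

```yaml
# Admin-created, cluster-scoped COSI resources (objectstorage.k8s.io/v1alpha1).
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: ceph-s3-class
driverName: ceph.objectstorage.k8s.io      # illustrative driver name
deletionPolicy: Delete                     # remove the backend bucket when the claim goes away
parameters:
  objectStoreName: my-store                # vendor-specific parameters, illustrative
---
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketAccessClass
metadata:
  name: ceph-s3-access-class
driverName: ceph.objectstorage.k8s.io
authenticationType: Key                    # static credentials; IAM is the other planned option
```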
Then there is the brownfield case. The brownfield case means the bucket already exists in the backend, or there is already an existing bucket resource which has been created.
So brownfield means the bucket was already there, and greenfield means the bucket will be created as a result of the bucket claim request. For the brownfield case, at least for alpha, it is not automated:
the admin needs to manually do the work of creating the bucket resource for the backend bucket, so that the user can request bucket access on top of this bucket. Now, first I will go through the greenfield case and how the create-bucket workflow happens there.
Basically, the admin has already created the bucket class. Now the user requests a bucket claim. The request reaches the central controller, which will create a bucket; for that, it sends a create-bucket call to the sidecar and the COSI driver.
When the request reaches the COSI driver, it sends a request to the backend provisioner, I mean the backend vendor, to create the actual backend bucket.
After the backend bucket is created, the central controller will create the bucket resource. That is what happens in the greenfield use case.
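A minimal bucket claim for this greenfield flow might look roughly like this (my own sketch; field names follow the draft alpha API and may differ):

```yaml
# Namespaced request for a new bucket, provisioned through the bucket class above.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClaim
metadata:
  name: my-bucket-claim
  namespace: my-app                  # same namespace as the application
spec:
  bucketClassName: ceph-s3-class
  protocols:
    - S3                             # protocol the workload expects
```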
Now, still in the greenfield case, how do we access this bucket? For that, as I mentioned before, the admin needs to create the bucket access class; the bucket access class and the bucket class are two different things. The user then creates a bucket access. It contains details such as the bucket claim you want to use and the bucket access class.
It also needs to mention the name of the secret which the application pod will consume. Similar to the bucket claim request, the bucket access request first reaches the controller.
To create the bucket access resource, the request is passed to the sidecar and then to the COSI driver to grant the bucket access, and the COSI driver asks the vendor provisioner how to get the access details for that bucket.
Currently we are planning to support the access mechanism in two ways: either static credentials, like an access key and secret key, or IAM-style authentication with the help of the service account given in the application pod.
In both cases it will generate a secret; I mean, all the details will be passed back to the driver and then to the controller. The controller will create the secret and the bucket access after successfully handling the bucket access call from the user.
The secret can be attached to the pod as a volume mount or something like that, and the pod will wait until the bucket access request has succeeded and the secret is generated.
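Putting that together, a bucket access and the pod consuming its secret might look roughly like this (again my own sketch; the field names, image, and mount path are illustrative assumptions):

```yaml
# Namespaced request for credentials to the claimed bucket.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketAccess
metadata:
  name: my-bucket-access
  namespace: my-app
spec:
  bucketClaimName: my-bucket-claim
  bucketAccessClassName: ceph-s3-access-class
  credentialsSecretName: my-bucket-creds     # secret the controller will create
---
# Application pod mounting the generated secret; it starts once the secret exists.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: my-app
spec:
  containers:
    - name: app
      image: example.registry/my-app:latest  # placeholder image
      volumeMounts:
        - name: bucket-creds
          mountPath: /data/cosi              # illustrative mount path
  volumes:
    - name: bucket-creds
      secret:
        secretName: my-bucket-creds
```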
Now we will talk about the brownfield use case. Here there are two cases to consider. One: the backend bucket is already there, but there is no bucket resource for that backend bucket.
The other case is when a user needs to share a bucket they have already created; then only the bucket resource needs to be created. So, in the first case, the backend bucket is there.
The admin will create a bucket resource pointing to that backend bucket in the corresponding namespace. For the other case, they will copy the bucket resource to a different namespace for the application to consume. And maybe I forgot to mention earlier: the bucket access and the bucket need to be in the same namespace
where the application is running. After the admin has created the bucket resource, we can create the bucket access referring to the bucket and the bucket access class,
and we need to provide the secret name as well. The difference is that in the previous use case we referred to the bucket claim, but here we refer to the bucket directly.
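As a rough illustration of this brownfield variant (my own sketch; the resource kinds match the talk, but the field names, especially the reference to the existing backend bucket, are assumptions based on the draft alpha API):

```yaml
# Admin-created bucket resource wrapping an existing backend bucket.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: Bucket
metadata:
  name: existing-bucket
  namespace: my-app
spec:
  bucketClassName: ceph-s3-class
  existingBucketID: legacy-app-data          # assumed field pointing at the backend bucket
  protocols:
    - S3
---
# Bucket access referring to the bucket directly instead of to a bucket claim.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketAccess
metadata:
  name: existing-bucket-access
  namespace: my-app
spec:
  bucketName: existing-bucket                # assumed field for the brownfield reference
  bucketAccessClassName: ceph-s3-access-class
  credentialsSecretName: existing-bucket-creds
```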
The central controller will identify that it is a brownfield use case by the reference to the bucket. Similar to before, it will create a bucket access and send the request to the sidecar.
Then, again, the sidecar and the COSI driver will request the access credentials from the vendor. They are passed back to the sidecar and then to the central controller, which generates the secret. As I mentioned earlier, the secret will be attached to the pod,
and the application can consume the bucket using the details from the secret. Now, for the deletion case: since we have two resources, we need to delete both of them.
First, if you want to revoke the application's access to a bucket, you need to delete the bucket access. The central controller will send the request to revoke access, and the sidecar and the driver will forward that request to the provisioner.
It will revoke access: maybe it will delete the keys or the static credentials, or it will delete the mapping with the service account for IAM-style authentication. Then the secret will be deleted.
What happens if the application is still running with the same secret? It will be affected, since the backend has revoked its credentials. So even though the secret can still be read and the application keeps running, it will get an access-denied message while running its workload.
And when the application gets deleted, it won't be able to start again, because the secret has already been deleted. After deleting the bucket access, we can delete the bucket claim. The bucket claim will only be deleted once all access to that bucket has been removed.
Basically, the idea is that when a user sends a bucket claim delete request, the bucket is first made unavailable so that no new requests can be made on top of it.
Then we wait until the bucket is actually released, and only then will the driver call delete-bucket on the vendor, and the vendor will delete the backend bucket.
Here, all the existing applications can still use this bucket until their access is removed. The workload will continue, but no new request or new workload can be placed on top of this bucket.
Once all the bucket access requests are removed, only then is the backend bucket removed. We may need to do this asynchronously, because deleting the actual bucket can be
a costly operation; there might be a lot of files and data inside that bucket. So the bucket resource may be removed before the deletion of the actual backend bucket completes; that's possible.
Now I will just mention what the secret provides to the application. The secret contains a bucketInfo.json. It will be mounted at a specific location in the application pod, depending on the secret name and the mount path which the user has given.
Basically, it contains the following information: the bucket name and the endpoint details for the object store. It also provides the authentication type, which will either be key-based, meaning static credentials like an access key and secret key, or IAM, or it can be both.
In the case of IAM, the service account needs to be mapped to the service account of the application, so that the metadata server used for authentication can map the service account to the IAM-style account, or something like that.
It also specifies the kind of protocol, whether it's S3 or Azure Blob or Google Cloud, like that. So basically, it has the information needed to access the backend bucket.
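Based on the fields mentioned above, the mounted bucketInfo.json might look roughly like this for key-based S3 access (my own sketch; the exact key names are assumptions and the credentials are placeholders):

```json
{
  "bucketName": "my-app-bucket-3f2a",
  "endpoint": "https://rgw.example.com:443",
  "protocol": "S3",
  "authenticationType": "Key",
  "accessKeyID": "EXAMPLE-ACCESS-KEY",
  "accessSecretKey": "example-secret-key"
}
```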
I guess I have covered most of the things which are planned for the alpha. Post alpha, we have a lot of things to do, a lot of work that needs to be done. Even though I mentioned the IAM-style authentication, it won't be implemented for alpha.
We have the design, but maybe we can expect it in a later release, post alpha, maybe alpha one or alpha two. Another thing is that we need to automate bucket sharing across namespaces.
It's kind of hard if you need to share hundreds or tons of buckets among different namespaces. As I mentioned before, the bucket is a namespaced resource, and if ten applications in ten different namespaces need to access that bucket,
then the bucket resource needs to be created in each of those namespaces. We also need to discuss the bucket lifecycle for shared buckets: what happens when the original shared bucket is deleted?
What should we do when the backend bucket needs to be deleted? That is a decision we need to make for shared buckets. We also need a kubelet plugin so that we can have more checks on the application side.
Currently, as you see, it's just a secret; the application pod attaches the secret and runs if it is available. There is no magic from COSI to validate the details of the application,
such as whether it is compatible with the bucket request. We also need to consider integrating different storage features like quota or encryption, whatever is applicable and can be done generically on the COSI side.
So these are our post-alpha plans. And there are a couple of resources. There is an ongoing KEP review; maybe it will be merged before this presentation, it's in an about-to-merge kind of situation, and hopefully it will be merged before the first week of February.
We have a website, but it's not updated with the current status; we need to update it before alpha. These are the different projects listed on GitHub, like the COSI sidecar project.
Then we have a project which defines the API spec. And, as I mentioned, we have the controller manager which sits at the top. We also have a sample COSI driver, and a Ceph COSI driver, which was written for accessing the Ceph object store.
But both of these drivers are not updated to the latest version, the latest design, so we are working on that. Now, to end the presentation, I would like to thank the following people who worked on
implementing the COSI project. And you can always reach out to us in the Kubernetes Slack channel, #sig-storage-cosi, for questions.
We have weekly meetings every Thursday at 6 p.m. Currently it's kind of a design review meeting, not really a user-facing meeting, but we can discuss all the topics and whatever you are expecting.
And with that, I will conclude my presentation. Thank you, everyone, for patiently watching.
The first question was: which cloud providers are developing or have released COSI drivers? As I mentioned in the presentation, Ceph has a COSI driver, and MinIO supports their own driver, but both are not updated to the current KEP.
Those are the two drivers that I know of, but we see active participation from NetApp and other vendors, and other companies like Pure Storage.