
Data Replication and Migration from Ceph RGW to Cloud


Formal Metadata

Title
Data Replication and Migration from Ceph RGW to Cloud
Series Title
Number of Parts
287
Author
License
CC Attribution 2.0 Belgium:
You may use, modify, copy, distribute and make the work or its content publicly available, in unchanged or modified form, for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
With the seemingly insatiable demand for and growth of data within organizations, secure and quick movement of that data often becomes a challenge in any hybrid cloud environment. External tools and multi-cloud gateway providers add latency overhead and may not leverage the full capabilities of on-prem infrastructure. To overcome this, Ceph distributed storage, often handling petabytes of data, provides ways to move data directly from its on-prem S3 RGW servers to multiple cloud providers, thus easing data movement. In this session, we will briefly discuss the cloud sync and cloud transition features in the Ceph RADOS Gateway (RGW), which can be used to replicate and migrate data to external cloud providers. We will also look into how to configure and enable these features, their current limitations, and future work. As a prerequisite, it would help to know what the Ceph Object Gateway is and how it works (https://docs.ceph.com/en/latest/radosgw/)
Transcript: English (automatically generated)
Hello everyone, welcome to my talk. My name is Soumya, I am part of the Ceph storage development team at Red Hat. Today in this talk, I'd like to discuss how to replicate or migrate data from a Ceph RGW server to the public cloud.
So here is the agenda for the talk today. Firstly, I would like to briefly introduce Ceph and its object gateway server, and the various functionalities they offer. Moving on, we'll see what a hybrid cloud environment is and what challenges one faces while migrating data between cloud entities.
And how Ceph RGW server addresses the same by collaborating directly with public cloud using two functionalities called cloud sync and cloud transition. As the name suggests, cloud sync is used to sync the data and cloud transition is used to transition or migrate the data from Ceph RGW server to public cloud.
Ceph is a distributed object storage solution. It is highly performant and scales well: thousands of clients can connect to and access the underlying Ceph cluster, whose data could range from petabytes to exabytes.
It is suitable for many modern workloads like cloud infrastructure, data analytics, media repositories and archival. More importantly, it can run on any commodity hardware, so there is no vendor lock-in.
One of the key components of Ceph is called RADOS, which stands for Reliable Autonomic Distributed Object Store. As the name suggests, it is object storage, so all the data is stored in the form of objects on the backend file system. It also supports advanced storage capabilities such as data redistribution, self-healing and self-management.
It talks to its clients via a library called librados. So these Ceph clients in turn provide file, block and object storage interface to the actual end users.
So Ceph object gateway or RADOS gateway or simply put RGW server, it is one of the Ceph clients. It provides object storage interface to the end users and it talks to underlying Ceph cluster via librados library.
It provides industry-leading S3 API compatibility, with support for both Amazon AWS S3 and OpenStack Swift. The way it works is that it exposes RESTful HTTP APIs to store objects in the Ceph cluster.
These uploaded objects are immutable: once uploaded, they cannot be modified, but they can be re-uploaded. They are stored in logical containers called buckets, in a flat namespace rather than a traditional file system hierarchy. And as I mentioned, RGW talks to the underlying Ceph cluster via the librados library.
Also, it provides a unified namespace for both S3 and Swift users, which means S3 and Swift users can interchangeably access each other's buckets, and the objects within them, provided they have the right permissions. Another main advantage of using the Ceph object gateway is its support for most of the latest S3 features.
For example, multipart upload, object versioning and object compression. And more importantly, the users can combine multiple data centers which may not be co-located.
And they can provide a global unified namespace using this object gateway and take advantage of underlying Ceph scalability and performance. So in the current storage ecosystem, there is a growing interest in cloud services.
But at the same time, many enterprises like to favor hybrid cloud environment. The reason being it provides flexibility and agility to move workloads from one entity to another cloud entity based on the demand. So the way it works is the hybrid cloud environment, we can unify public, private and on-premises infrastructure.
And that creates a single and flexible distributed computing environment. So while working in it, one can actually leverage the advantages of using the cloud. But at the same time can process and store the sensitive data in a private network or on-premises infrastructure.
So it is very efficient compared to using public cloud or private cloud alone. But there are certain challenges here, and one of the biggest is the movement of data. With so many big-data applications, the demand for and growth of data keep increasing.
A large data transfer could typically take a couple of weeks, which may not be desirable. More importantly, the data movement should be secure and controlled, and the data should be protected, as it could be sensitive in nature.
So let's look into a couple of approaches which enterprises may choose to transfer data in a hybrid cloud environment. One simple approach is to use external tools like rsync, rclone, s3cmd or the AWS CLI.
These tools are simple to use, but there is a lot of manual work involved, because they lack monitoring and error detection abilities. The admin needs to write wrapper scripts to support those features, to monitor the uploads and detect if there are any errors.
The other thing is that they do not support incremental synchronization, so the data needs to be re-uploaded from the beginning if there is any failure, which can hugely impact cost and time.
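To make the "wrapper scripts" point concrete, here is a minimal, purely illustrative sketch (not from the talk) of the kind of retry-and-report logic an admin ends up hand-rolling around such tools; the function name and limits are made up, and the final invocation would in practice wrap a real transfer command such as `rclone sync`.

```shell
#!/bin/sh
# Illustrative wrapper an admin might write around rsync/rclone/s3cmd:
# retry a transfer command a few times and report a failure, since the
# tools themselves offer little monitoring or error handling.
retry_transfer() {
    attempts=0
    max_attempts=3
    until "$@"; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge "$max_attempts" ]; then
            echo "transfer failed after $max_attempts attempts" >&2
            return 1
        fi
        echo "retry $attempts/$max_attempts" >&2
        sleep 1
    done
}

# A real invocation would look like: retry_transfer rclone sync /data remote:bucket
retry_transfer true && echo "transfer ok"
```

Even with such a wrapper, a failed run restarts the whole upload, which is exactly the incremental-synchronization gap described above.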
Another approach which many enterprises are actually considering is to use a multi-cloud gateway service, such as NooBaa. As the name suggests, multi-cloud gateway services connect multiple cloud providers and services as well as on-prem infrastructure services. The clients talk to this gateway service, which handles the data: based on the policies set, it either stores the data or replicates or migrates it between the cloud entities.
This seems very well suited to a hybrid cloud environment, but there are challenges here as well. For example, the data is stored and processed at multiple levels, both in the multi-cloud gateway and in the actual backend cloud or on-premises services.
So there is redundancy in the data, and this leads to management overhead, which may affect IO throughput. Another important point: since the clients talk directly to the multi-cloud gateway service, they are limited by the functionalities that service provides, and may not leverage the full capabilities of the actual underlying on-prem service, in this case the RGW server. Finally, it is a complex system, since there are multiple entities involved.
So if there is any error, it is often not easy to detect and debug, because the debugging needs to be done at various levels. Some of these limitations can be addressed, and the data transfer can be sped up, by making the on-prem and public cloud services collaborate with each other directly.
So that's what the Ceph RGW server plans to do by using its new functionalities, which are called the cloud sync and cloud transition. So cloud sync is used to sync data and cloud transition is used to migrate data from Ceph RGW server to public cloud.
They are full-fledged services: compared to third-party tools, they are secure, reliable and efficient. More importantly, the data transfers are very fast, because the uploads can be processed in parallel and transferred to multiple endpoints simultaneously.
Another thing to note here is that the sync replication can be incremental. That is, if only a few objects within a bucket have changed, only those objects are replicated to the cloud endpoint, not the entire bucket.
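The effect of incremental sync can be illustrated with a small sketch. Note this is illustrative only: RGW's actual incremental sync is driven by its internal change logs rather than by comparing listings, and all names here are hypothetical.

```python
def objects_to_sync(source, dest):
    """Return keys whose ETag differs between source and destination.

    source/dest map object key -> ETag (content hash) from a listing;
    only changed or brand-new objects need to be shipped again.
    """
    return sorted(key for key, etag in source.items() if dest.get(key) != etag)

source_listing = {"a.txt": "etag-1", "b.txt": "etag-2", "c.txt": "etag-3"}
dest_listing = {"a.txt": "etag-1", "b.txt": "etag-stale"}

# Only the modified object and the new one are re-uploaded.
print(objects_to_sync(source_listing, dest_listing))  # ['b.txt', 'c.txt']
```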
Also, since these services are part of Ceph ecosystem, the existing tools can be used to monitor status and detect if there are any errors. So now let's look in detail what these features offer.
So cloud sync is a sync module built on top of the multi-site framework. For those who are not aware, Ceph RGW supports a multi-site framework with which we can replicate data from one zone to another.
Zones here can roughly be thought of as geographical locations where multiple RGW servers are co-located and load-balance the workload. If any data is written to the set of RGW servers in a particular zone, that zone replicates the data to the other zones by logging the changes.
This data replication is asynchronous, which means the zones are eventually consistent. The sync module mechanism is built on top of this very multi-site framework, and using a sync module, in this case the cloud sync module,
one can replicate the data from a zone in a Ceph RGW cluster to an external entity, which can be any other cloud provider. It syncs the data to the remote cloud service using REST APIs which are compatible with AWS S3.
For this very reason, right now this feature works only with S3-compatible clouds, but we do have plans to support other cloud services like Azure and Google Cloud. Also, by default the replication happens at the zone level: whenever there is any change to the data in the zone, it is replicated to the cloud service.
But using multi-site sync policies, one can configure a particular set of buckets to be replicated, rather than the entire namespace.
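As a hedged sketch of that per-bucket granularity, based on the Ceph multisite sync policy documentation (zone, group and bucket names here are placeholders, and the exact flags should be checked against the docs for your release), the configuration could look roughly like this:

```shell
# Zonegroup-level policy: allowed, so nothing syncs unless a bucket opts in.
radosgw-admin sync group create --group-id=group1 --status=allowed
radosgw-admin sync group flow create --group-id=group1 --flow-id=flow1 \
        --flow-type=directional --source-zone=us-east --dest-zone=cloud-zone
radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 \
        --source-zones='us-east' --dest-zones='cloud-zone'

# Bucket-level policy: enable replication only for this one bucket.
radosgw-admin sync group create --bucket=mybucket \
        --group-id=bucket-group --status=enabled
radosgw-admin sync group pipe create --bucket=mybucket \
        --group-id=bucket-group --pipe-id=pipe1 \
        --source-zones='*' --dest-zones='*'
```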
Also, the movement of data is unidirectional: data is replicated from the Ceph RGW server to the cloud endpoint, but not vice versa.
However, there is ongoing work on sync info provider APIs, with which data could be transferred or replicated from cloud endpoints back to the Ceph RGW server as well.
Since it's built on top of the multi-site framework, the configuration is similar: while configuring the target zone, we need to specify the tier type as cloud, and other parameters, such as the cloud credentials, are configured as tier config parameters.
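As a hedged sketch along those lines, based on the Ceph cloud sync module documentation (the zone name, endpoint, credentials and target path below are placeholders):

```shell
# Create a zone whose tier type is "cloud".
radosgw-admin zone create --rgw-zonegroup=default \
        --rgw-zone=cloud-sync-zone --tier-type=cloud

# Mandatory tier config: the remote cloud endpoint and its credentials;
# target_path (optional) controls where replicated objects land.
radosgw-admin zone modify --rgw-zone=cloud-sync-zone \
        --tier-config=connection.endpoint=https://s3.amazonaws.com,connection.access_key=ACCESS,connection.secret=SECRET,target_path=rgw-backup

# Commit the period so the new configuration takes effect.
radosgw-admin period update --commit
```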
So these are the typical configuration parameters to consider when using the cloud sync feature. The endpoint credentials, i.e. the access key, the secret and the endpoint URL, are mandatory. And there are optional parameters, like acls, to map the RGW users to destination cloud users, and target_path, to specify the path on the remote cloud endpoint to which the objects should be replicated.
Moving on to cloud transition: unlike cloud sync, it is used to transition or migrate data from the RGW server to a cloud endpoint.
It is based on the S3 lifecycle configuration. For those who are not aware, S3 provides a lifecycle configuration feature with which we can define a policy, a set of rules, to be applied to a particular bucket. When such a policy is set, the S3 server processes it against groups of objects, applies the rules, and takes certain actions, called transition and expiration. A transition action moves the data from one storage class to another.
For example, we can choose to move data from the standard storage class to an archival one after a certain period. Or one can choose to delete the objects after a certain period, using the same lifecycle policy rules.
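As a concrete illustration of such a policy, this is the dict shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID, prefix, day counts and storage class here are made-up examples:

```python
# Illustrative S3 lifecycle configuration: transition objects under
# "logs/" to an archival storage class after 30 days, and delete them
# after a year.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Transition action: move data to another storage class.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            # Expiration action: delete the objects entirely.
            "Expiration": {"Days": 365},
        }
    ]
}

rule = lifecycle_configuration["Rules"][0]
print(rule["Transitions"][0]["StorageClass"], rule["Expiration"]["Days"])  # GLACIER 365
```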
Cloud transition works in a similar way, except that here the target storage class defined is associated with the remote cloud endpoint. So when a policy is set on a bucket and, while processing its rules, the server detects that the target storage class is a cloud endpoint,
it migrates the data from the Ceph cluster to that remote cloud endpoint. It migrates the data using the same REST APIs used by the cloud sync module, which are, as I mentioned, compatible only with AWS S3 for now.
Also, the data movement is unidirectional: once transitioned, the source object's data is deleted from the cluster. However, there is a parameter which, when set, makes RGW retain the object's metadata even after the transition to the cloud.
This retained metadata can be queried later to see which objects were transitioned to the cloud and what their current state is. This is a new feature, and it will be available with the upcoming Quincy release of Ceph.
To use the cloud transition feature, we need to define a target storage class, specifying the tier type as cloud-s3; similar to cloud sync, all the other config parameters, like the cloud credentials, are configured as tier config parameters.
Normally, for data to be moved across storage classes, we need to specify the data pool details whenever we configure a storage class. But here, since this particular cloud storage class is associated with a remote cloud, we need not configure any data pool for it.
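A hedged sketch of that setup, based on the Ceph cloud transition documentation (the storage class name, endpoint, credentials and target path are placeholders):

```shell
# Define a storage class whose tier type is "cloud-s3"; note that no
# data pool is configured for it.
radosgw-admin zonegroup placement add --rgw-zonegroup=default \
        --placement-id=default-placement \
        --storage-class=CLOUDTIER --tier-type=cloud-s3

# The tier config points the storage class at the remote cloud endpoint;
# retain_head_object=true keeps the object's metadata after transition.
radosgw-admin zonegroup placement modify --rgw-zonegroup=default \
        --placement-id=default-placement --storage-class=CLOUDTIER \
        --tier-config=endpoint=https://s3.amazonaws.com,access_key=ACCESS,secret=SECRET,target_path=rgw-archive,retain_head_object=true
```

A bucket lifecycle rule can then name CLOUDTIER as its transition target, and matching objects get migrated to the cloud endpoint when the rule fires.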
Also note that, as I mentioned, the data transition is unidirectional: once an object is transitioned to the cloud, it cannot be restored back. For the same reason, while configuring transition rules,
this cloud storage class should be specified last among all the storage classes the object transitions through. This is how the configuration looks for the cloud transition feature. The mandatory parameters are, of course, the cloud credentials: access key, secret, endpoint, etc.
And acls can be used to map the RGW users to cloud users; the target path and target storage class are also configurable. And there is the optional parameter I mentioned, retain_head_object. By default, it is set to false.
But if it is set to true, the Ceph RGW server retains the object's metadata even after the transition to the cloud. As for the current state of these features: they work well with S3 cloud endpoints,
but that is also a limitation, since they work only with S3 cloud endpoints, though we plan to address this in the near future. The data transfer is unidirectional. And with respect to the cloud transition feature, once the data is transitioned, the objects cannot be read or transitioned back.
Another thing: in the case of replication, one may expect the object's mtime and ETag to remain intact even after replicating to the target zone. But since here the target zone is actually a cloud endpoint, the mtime and ETag could change. However, the original object's mtime and ETag get stored as external metadata attributes on the destination cloud endpoint.
We have plans to address these limitations in the near future.
For example, we want to support replication and transition to other cloud services as well, like Azure and Google Cloud. And as I mentioned, the data transfer is unidirectional for now, but there is ongoing work called sync info provider APIs.
Using these APIs, one should be able to migrate or replicate data from cloud endpoints back to the RGW server. Also, we want to have federation between RGW and the cloud services by mapping users, roles and bucket policies among them.
And in the case of the cloud transition feature, right now, when an object has been transitioned to the cloud and a user sends a GET request for that object, we fail the request with an InvalidObjectState error. But instead, we could choose to redirect the GET request to the cloud endpoint by sending a
presigned redirect URL to the user, or the RGW server could itself read the object from the cloud and serve that data to its S3 user. Alternatively, the S3 RestoreObject operation could be supported
to restore transitioned objects from the cloud back to the RGW server. That brings me to the end of my talk. Thanks for listening. Please feel free to reach out to me or the community in case of any queries.
Thank you.