State of Ceph
Formal Metadata
Title: State of Ceph
Number of Parts: 40
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/54424 (DOI)
openSUSE Conference 2019, talk 13 of 40
Transcript: English (auto-generated)
00:06
OK, I think it's time, so let's get started. First item for today, if you would like to follow my slides on your device, laptop, mobile device, whatever, you could just visit that link
00:23
or scan the QR code for lazy folks out there. So those are hosted on GitHub. So I'll give you another 10 seconds, more or less. If not, you can always visit them later. 5, 4, 3, 2, 1, seems like everyone is set.
00:43
OK, topic for today is Ceph. And today, I would like to give you an update on, in particular, what's new in Ceph. And with new in Ceph, I mean with the latest release, which is called Nautilus. My name is Kai. I work for SUSE.
01:01
And if you have any questions, reach out by mail, grab me in the hallway, reach out on Twitter, wherever. Everything is fine. We have a lot of things we're going to talk about today. First of all, a quick introduction to Ceph. Just so everyone is aware, this is not a
01:22
"what is Ceph, what is object storage" talk at all. This is really focused on the new features in Ceph. So if you're expecting, I don't know, a complete from-the-ground-up talk about Ceph, that's not what I'm going to talk about, just to set the stage here.
01:41
Nevertheless, that's the only absolute basic picture and slide that I have. I expect everyone has seen that already. Please raise your hand if you have seen that picture already and you're aware of what this is all about.
02:00
At least half of you. So I will explain it really quickly. This just shows and visualizes the Ceph architecture, more or less. Underneath you have the RADOS layer, which stands for Reliable Autonomic Distributed Object Store. Put as simply as possible,
02:23
that's where all your data is replicated and where the brain of it resides. And on top of it, we right now have four different ways to access our data, because we would like to store data in the cluster and we would like to get data out of the cluster again. So we have four different ways.
02:40
To the left, here, that's librados. That's a library you can use to put stuff into the Ceph cluster or to get it out again. You can use almost any language, like C, C++, Python; so choose your poison, I would say. The second thing is the RADOS Gateway, which is a REST gateway
03:05
that is compatible with S3 and Swift. So for those of you who are familiar with AWS and S3, that's more or less how it works. We have RBD, the RADOS Block Device.
03:22
This is mainly used for virtual machines, so it's a block device that can be put underneath a virtual machine. And RBDs are also used, for example, if you put an iSCSI gateway on top to expose a LUN to another node; you would use an RBD, which is nothing more than a block device.
03:44
And last but not least, stable since, not quite sure, I think Luminous, if I'm not mistaken: CephFS. That's a POSIX-compliant file system built on top of Ceph. So there's no need to have an NFS gateway, for example, in between,
04:02
or a Samba gateway, or to take an RBD, format it and then export it to a client; you can directly use CephFS, which is much more performant and much better. So that's enough for the introduction. Let's get to where we are right now. First, an overview: Luminous was released in 2017, then Mimic in May 2018.
04:25
And now we are at Nautilus, which was released in March 2019. As you can see, we're currently on a nine-month release cadence, and we support upgrades from up to two releases back.
04:40
So let's say you're still on a Luminous release, you can upgrade directly to Nautilus. Or if you're already on a Mimic release, you can jump over Nautilus directly to Octopus once it's released. The current release cadence is under heavy debate. I just added a link.
05:01
Don't get confused, this one redirects you to Twitter, where a survey was started last weekend during Cephalocon to ask the audience whether we should switch back to a 12-month release cadence instead of a nine-month release cadence. So please make use of it.
05:21
I would like to switch to a 12-month release cadence again. Ceph, just so you're aware, has some principles; you could also call them themes in the meantime. Five of them. First of all, usability. Ceph, as you hopefully know, or if you've tried it already,
05:42
you will have found out yourself, is rather complex to set up and also to administer and manage. So one goal is to make it easier to consume, for example with the orchestrator API I'm going to talk about later, and also to develop some upgrade automation around it. Quality is the next big topic here.
06:03
Quality means we're trying to collect crash reports. We now have a telemetry module in place, together with better documentation and test suites. Performance: obviously, everyone wants more performance. To be honest, it's rather easy to get more IO out of a Ceph cluster,
06:22
just add more hardware to it. That's how it works. So if you need more IO, just add more hardware; it's as easy as that. The problem is that back in the day, when the OSD layer was developed, it was developed against spinning disks, because that was the thing back then. Meanwhile, everyone is talking about all-flash, obviously,
06:43
because it's getting cheaper and cheaper nowadays. And there is a new project called Crimson, which refactors the OSD stack more or less completely with flash in mind, and which already delivers better performance compared to the old implementation.
07:02
I saw a presentation on it last weekend at Cephalocon; there's a Crimson talk, so if you're interested in that, just check it out. Multisite, which is mainly about S3 management capabilities, including tiering across different pools, for example. That's what multisite is about.
07:21
And last but not least, what Torsten already briefly talked about: containers. Under the ecosystem umbrella, this is just a word for Kubernetes, Rook, and the whole integration there. Let's get directly into the features.
07:43
The most noticeable feature change in Nautilus was the, in quotes, newly added dashboard. The first version of the dashboard was already in Luminous, but it was read-only. In the meantime, it has been heavily improved with management capabilities and whatnot.
08:04
This is how it looks now. For those of you who may have seen the Luminous version already, it looks a little bit different. And because I thought static pictures are just boring, let's see, OK, now it's working, I prepared a quick, one-minute demo.
08:22
That's the dashboard. I won't show you everything, but just so you have a clear idea: you have some top menu items. This is, for example, how the OSD tab looks. You get an overview of the OSDs with some details underneath. We have a fancy overlay if you hover over those little dots there.
08:41
We can set cluster-wide OSD flags, for example noin, so there's no need to go back to the CLI and console anymore. I also want to show you the RBD tab, where we can create, delete and edit RBDs, and also create snapshots of them.
09:03
There is already an existing snapshot, which you may see. And last but not least, the pool tab. So as I said, just a quick, rough overview. And one thing that has noticeably changed, for example, is the Grafana integration underneath. So that's integrated in all the various tabs and views
09:24
in the dashboard. But I thought maybe some moving pictures are more interesting than just static images. As I said already, the dashboard has heavily evolved. It's now built into Ceph, so as soon as you install
09:43
the Ceph package itself and set up a cluster, it's already there. The only thing you have to do is enable the dashboard module. It's a manager module, and then it's up and running. That's all that's needed. It has, as you've already seen, a lot more management functionality. We'll talk about that in a bit.
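For reference, a minimal sketch of what that looks like on the CLI (certificate handling may differ in your environment):

  # enable the dashboard manager module
  ceph mgr module enable dashboard
  # generate a self-signed certificate so the dashboard can serve HTTPS
  ceph dashboard create-self-signed-cert
  # see which URL the dashboard is listening on
  ceph mgr services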
10:02
Metrics and reporting: what I used at the back end to collect the data was Prometheus, and Grafana to visualize it in the dashboard. And the outlook: the whole hardware deployment and service management, that's work in progress.
10:21
In more detail, new functionality: we now support multiple users and roles. In the past, a lot of folks requested things like a read-only user, for example for a monitoring team or for other purposes, and that's what we implemented there. We ship with some default roles,
10:41
like an administrator, an RBD manager and a read-only user, for example. And if you grant a user just those specific permissions, they will only see those specific tabs, or can only edit those things. We also integrated SAML 2.0 for SSO, single sign-on.
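As a rough sketch, creating such a read-only user on the CLI could look like this (user name and password are placeholders):

  # create a dashboard user and restrict it to the built-in read-only role
  ceph dashboard ac-user-create monitoring S3cr3tPassw0rd
  ceph dashboard ac-user-set-roles monitoring read-only
  # inspect the result
  ceph dashboard ac-user-show monitoring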
11:02
We embedded the audit log into the dashboard. You've already seen the new landing page. We also, which is quite cool, translated the whole UI into seven or eight languages in the meantime, hopefully with more to come. So in case there is a language that you are capable of and you would like to contribute,
11:24
that would be the easiest way. We're using Transifex, so it's rather easy to contribute to the community. And on top of that, we are making use of Swagger, which now gives us a nice and, I would say, fancy REST API documentation.
11:42
OSD management, you've seen it already: setting all the various flags on OSDs, for example. We can also now set recovery profiles in the UI. So if you have recovery ongoing in your cluster, you can switch between, I think, low, medium and high priority, so it doesn't impact your cluster that much.
12:04
We have a config settings editor. So yes, seriously, any config option that can be set on the CLI can also be set in the UI. You still have to know what you're doing, I know, but that's one step forward. We can manage pools. We can manage erasure code profiles.
12:23
Also, lately, RBD mirroring configuration was added to the UI, so there's no need to do this on the CLI anymore. You can just add your peer in the UI and then create your RBD mirror on top, which is really nifty. And you have already seen the Grafana dashboards, so you have an idea of how this is supposed to look.
12:45
We still have a CRUSH map viewer, like we had in the old days. It's still just a viewer, nothing else; it just visualizes the CRUSH map, and there's more to come. We have NFS management, and we can do iSCSI target management, which in the meantime was replaced,
13:00
at least on the SUSE side, with ceph-iscsi. In past releases we used lrbd; we have now switched to ceph-iscsi. We support QoS. We can also manage the manager modules in the UI, and Prometheus alerting in the meantime. As you can see, a lot of stuff went into the UI, a lot of progress that we've made.
13:22
A lot of the effort, to be honest, went into reaching feature parity with the old UI that we had apart from the Ceph dashboard, which was called openATTIC. In the meantime we are already ahead, and now I guess we have a good foundation to build on, and there's more to come.
13:43
Another management change, I already talked about it briefly, is what you could call the orchestrator sandwich. It's more or less the orchestrator abstraction. The idea behind it is to have a single abstraction layer that, for example, the CLI and the dashboard can talk to.
14:01
No matter whether the deployment tool underneath is Rook, ceph-ansible, DeepSea or SSH, for example, the commands would always be the same, and you could add other orchestrators to it if you would like. That's exactly what we're currently working on, and that's also the foundation for the Ceph CLI,
14:20
or at least the baseline for a unified Ceph CLI, so to say. What it can do in the meantime: it can fetch your node inventory, it can create and destroy daemons, and it can also blink your device LEDs, in case that's configured correctly, but it's there.
14:42
And on top of that you have the unified CLI, so you have a ceph orchestrator command, and behind it you can do things like, as you can see here, device ls, or OSD create on a node, and there are many more, so you have a unified CLI. And I think that's what I already talked about in the dashboard section.
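As a hedged sketch of how that looks in Nautilus (module and backend names depend on which orchestrator you actually run; Rook is just the example here):

  # enable the generic orchestrator CLI and point it at a backend
  ceph mgr module enable orchestrator_cli
  ceph mgr module enable rook
  ceph orchestrator set backend rook
  # fetch the inventory and check the backend status
  ceph orchestrator device ls
  ceph orchestrator status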
15:02
The idea is to integrate that service management into the UI in the future, to sooner or later move as much as we can from the CLI to the dashboard. So in case you would like to use it, you can use the dashboard; if you would like to use the CLI, you're free to use the CLI as well.
15:23
One of the biggest RADOS features that came with Nautilus: I'm not sure how many of you have dealt with pg_num in the past. In case you've ever deployed a Ceph cluster, I imagine you had to deal with it at least once.
15:41
There's a way to calculate the best pg_num for the pool that you would like to create, for example, but the problem in the past was that you could always increase the number of placement groups, but you could never decrease it. So in case your number was too high and you wanted to decrease it, the only way out was to create a whole new pool
16:01
and migrate your data. That whole black magic now more or less disappears, because we can reduce pg_num as well, which is really, really cool. And on top of that, there's a way to do this automatically.
16:20
So you can turn on a manager module, the pg_autoscaler, which then takes care of all of it so you don't have to worry about it. To be honest, the whole PG calculation that people had to worry about is something that should never have been exposed to the end user. So I think that's the right way of handling it, and it will improve even further in the future.
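A minimal sketch of turning that on (the pool name is made up):

  # enable the autoscaler module and activate it for a pool
  ceph mgr module enable pg_autoscaler
  ceph osd pool set mypool pg_autoscale_mode on
  # see what the autoscaler recommends or is currently doing
  ceph osd pool autoscale-status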
16:42
So that's really, really cool. What else do we have? We're now collecting SMART data from the devices underlying our OSDs and reporting it back to the manager, and you can take a look at the output there.
17:05
On top of that, there is a module which was mainly driven and developed by ProphetStor, and it does calculations and failure prediction on top of that. So if you enable that module, it will scrape your SMART data
17:21
and then it looks for specific errors, error counts and whatnot, and then it will predict: okay, your disk is going to fail, let's say roughly five weeks from now, or in two months, or in five days, and you should replace it soon. There is a built-in local mode, but on top of that, of course, because it was developed by a third-party vendor,
17:43
there's also a cloud mode that they are hosting in their environment, where you send the data to their nodes. And they told us, I never tried it myself, that this has higher accuracy and is, I think, the best thing you can get out of it.
18:02
One version is free, the other one is paid. But as I said, you can also just make use of the local one. How is this handled in the end? It can raise alerts, for example, that something is going to fail, or you can also turn on a mode where the OSD will automatically be marked out
18:20
before it even crashes. The CLI command looks like this, for example: if you do a ceph device ls, you get output like that, and there you get the life expectancy, as I said, five weeks from now, or five to eight days. So this one should be replaced rather soon, and same here, so you get a prediction, which is really nice.
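Roughly, the related commands look like this (the device ID is a placeholder; the prediction mode can be local or cloud):

  # list the devices known to the cluster, including predicted life expectancy
  ceph device ls
  # show the collected SMART/health metrics for one device
  ceph device get-health-metrics <devid>
  # enable the local failure prediction module
  ceph mgr module enable diskprediction_local
  ceph config set global device_failure_prediction_mode local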
18:42
Crash reports: in the past, if a service in Ceph failed, you didn't even notice, to be honest. It was more or less automatically restarted, the logs were stored somewhere on that node, and you just had no clue that something had happened when you came back on a Monday morning.
19:00
What it does now: all the crash dumps are stored under the /var/lib/ceph/crash directory path, and those are also synced to the mons/manager, and there you can do something like a ceph crash info and get all the details on what happened.
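For example (the crash ID is a placeholder):

  # list all crashes the cluster has collected
  ceph crash ls
  # show the metadata and backtrace of a single crash
  ceph crash info <crash-id>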
19:21
That's more or less a request to all of you: there's a module called the telemetry module. What it does is take those crash reports and upload them to the community, but you don't have to be worried; it's just some basic information, like the currently installed Ceph version,
19:40
how your cluster structure looks, so no critical or confidential data. The idea behind it is that we get more and better crash reports, so we can fix things that are out there that we are not aware of. So that's just a warm request: if you have anything like a development cluster or something else running, it would be cool to get some reports.
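Enabling it is basically a one-liner, and you can inspect what would be sent first:

  ceph mgr module enable telemetry
  # preview the report that would be uploaded
  ceph telemetry show
  # opt in and start sending reports
  ceph telemetry on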
20:02
Or, even better, nothing crashes; but to be honest, I think something is going to fail eventually. That's enough for management. Let's switch to RADOS, again more or less, because I already talked about the biggest feature, the pg_num reduction feature,
20:24
but I think that was also meant to be a management feature. From a pure RADOS perspective, the newest thing is Messenger v2. What is it? Sounds interesting. Between the Ceph daemons there is a protocol used for how they communicate, and the problem was that this protocol wasn't encrypted at all
20:42
how they communicate, which is other and the problem was that protocol wasn't encrypted at all and it wasn't even possible to encrypt it and the main driver for the messenger V2 was that everyone asked about, okay, I would like to encrypt that communication between my demons on the various nodes, maybe 100 of them, so isn't there a way to do that?
21:01
Yes, now with Nautilus there is a way to do that, and it's called Messenger v2. The cool thing is, the idea is that we also support dual stack, IPv4 and IPv6; that's not fully complete yet, so right now you can only choose between v4 or v6, but it's in the last stretch.
21:22
We also moved the monitor port to 3300, the IANA-assigned port that we got a while ago, and you don't have to worry: we have dual support for the old and the new messenger version. So if you upgrade to the new monitors, for example,
21:42
they will just listen on the other port as well, and if you have daemons that can already connect to the new one, they will, and if they can't, they will just connect to the old port and Messenger v1.
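After all monitors run Nautilus, msgr2 is switched on with a single command; a quick sketch:

  # enable the new protocol; the mons keep listening on the old port as well
  ceph mon enable-msgr2
  # verify that the mons now advertise both v2 (3300) and v1 (6789) addresses
  ceph mon dump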
22:02
Some other RADOS improvements: you can now set a target memory for the OSDs, which is really cool because in the past it was kind of complicated to say how much memory would be used in the end by the OSD daemons, so now you can limit it. We also added NUMA support; you can now pin OSD daemons to specific NUMA nodes.
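As a sketch, both can be set through the config commands (the values and the OSD ID are just examples):

  # cap the memory an OSD tries to use, e.g. 4 GiB per OSD
  ceph config set osd osd_memory_target 4294967296
  # pin a specific OSD to NUMA node 0
  ceph config set osd.3 osd_numa_node 0
  # check the NUMA placement the OSDs report
  ceph osd numa-status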
22:24
On top of that, we now have improved centralized config management. What does that mean? In the past, you had to store the ceph.conf file on all the various nodes, and now that stuff can be stored on the monitors and managed from there, so that's also really cool because it's stored within an object.
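A rough sketch of moving an existing ceph.conf into the monitors' config store and reading it back:

  # import an existing ceph.conf into the centralized config database
  ceph config assimilate-conf -i /etc/ceph/ceph.conf
  # show everything that is stored centrally
  ceph config dump
  # read a single value back for one daemon
  ceph config get osd.0 osd_memory_target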
22:42
Progress bars were also something that was heavily requested, because there are a lot of long-running tasks that can happen in a Ceph cluster, and the problem in the past was that you just clicked "do it", for example in the dashboard, or you ran a command
23:03
on the CLI, and then it just disappeared and you had no clue what was going on. With the Ceph progress module, which is also reflected in the ceph -s status output, you now get output for it, so you know what's going on,
23:20
at least for the most critical things. A minor fix, not sure if someone ran into that already: misplaced objects are no longer a health warning. Someone asked me why that is even worth mentioning on stage. To be honest, it's a thing because we had some customers who were on duty over the weekend, 24/7, and if on a Sunday morning
23:44
or Saturday evening the cluster switched to a health warning just because of misplaced objects, they told us, really seriously: I could fix that on Monday as well, but now my boss called me to fix it right now. That's the reason we can now turn it off.
24:02
If you still like it, there's a flag to enable it again. BlueStore improvements, let's switch to the next one. BlueStore has a new bitmap allocator, which more or less goes in the same direction as the other improvements on the OSD side: you now get more predictable memory utilization.
24:24
I think that's what everyone is asking for, and less fragmentation, which is the result of it. We now have, or can make use of, intelligent caching, so the memory allocation between the different caches, like the RocksDB cache, the onodes and the data, can be adjusted automatically, which is rather cool.
24:43
We get per-pool utilization metrics, which is rather helpful and also bubbles up into the ceph df command, plus some minor improvements. With, I think, Luminous, yes, there it is, with Luminous, there was a new device class
25:04
map introduced, however you want to call it, and the problem was that before Nautilus, if you wanted to switch to, let's say, the SSD or HDD class, you always had to change the CRUSH map, and then data was shuffled around, and people got totally annoyed by that,
25:21
because why would I want to switch if I then have to shift, let's say, a few terabytes of data when it's not necessary. We fixed that, more or less, so you can now just switch to the new model without the need to shuffle any data.
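For example, steering a pool to a device class via a CRUSH rule now looks roughly like this (rule and pool names are made up):

  # create a replicated rule that only uses SSD-class devices
  ceph osd crush rule create-replicated fast-ssd default host ssd
  # point an existing pool at that rule; per the change described above,
  # this should no longer shuffle data around unnecessarily
  ceph osd pool set mypool crush_rule fast-ssd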
25:41
You can now set a hard limit on the PG log length, which is also really cool, because in some specific situations this just led to uncontrolled memory utilization. There was a bug, and it's now more or less fixed by that. And there is a new erasure code plugin,
26:01
which is called CLAY (coupled-layer) erasure coding, and they promise better recovery efficiency, so the bandwidth and IO used during recovery should be much better. I can't tell, because I haven't tested it, but that's at least what they promise.
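A hedged sketch of trying it out (profile name, k/m/d values and pool name are examples):

  # define a CLAY erasure code profile
  ceph osd erasure-code-profile set clay-profile plugin=clay k=4 m=2 d=5 crush-failure-domain=host
  # create an erasure-coded pool that uses it
  ceph osd pool create ecpool 64 64 erasure clay-profile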
26:23
RGW, what do we have there? In RGW we can now create special zones that you can, for example, get notifications from and listen on.
26:41
This sets up an event stream you can subscribe to, which you could use for things like function-as-a-service: for example, a specific object was placed or changed, whatever, and then something else can be triggered, so you can listen to it. We now have an archive zone which can be used
27:04
so all your data will be replicated there, a copy of your data but with versioning on top, so every change to your objects will be tracked there in a versioned way, which is really nice. And on top of that, we have the possibility to create different tiers,
27:21
so, as I said, let's say you have HDD spinners, SSDs, NVMe and on top of that an archival zone; your objects can move through all of those tiers depending on your rules, so that's another step forward. And the RGW front end changed again: first it was Apache, then it was civetweb,
27:42
and now it was replaced by Beast. Again, as I said, of course better performance and efficiency in everything; why else would we change? So yeah, let's see. RBD. I think the next feature is really interesting for everyone who uses a Ceph cluster underneath their virtual machine environment. We now support RBD live migration, which is really cool,
28:01
or their virtual machine environment. We now support RBD live migration which is really cool so you can migrate an RBD from one pool to another. In the past that wasn't possible, if you create an RBD in a specific pool you are bound to it and now you can just
28:21
live-migrate between different pools, even between replicated and erasure-coded pools. I think that's a big step forward.
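A minimal sketch of the workflow (pool and image names are made up; clients have to be switched over to the target image between prepare and commit):

  # prepare the migration from the old pool to the new one
  rbd migration prepare oldpool/myimage newpool/myimage
  # copy the data over in the background
  rbd migration execute newpool/myimage
  # once everything has moved, make it permanent
  rbd migration commit newpool/myimage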
28:43
Another improvement is the rbd top command. If you have ever used any top command, you know what I'm talking about and how this is supposed to look: you get some output and statistics about your RBD devices, obviously. The command on the CLI, as you could guess, is of course called rbd perf image iotop, which makes total sense to me,
29:01
because why would I call it rbd top if I could call it perf image iotop? Nevertheless, hopefully that will change. So if you would like to give it a try: we had a request a while ago in the IRC channel, I remember, where someone asked, didn't you add the rbd top command? We said yes, it's there.
29:21
He said, I tried it, but my shell always tells me unknown command, where is it? And then we told him, yeah, obviously it's called rbd perf image iotop. That doesn't make sense to you? No. As you can see here, you get what you would expect: read and write IOPS, read and write bytes, same with latency.
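So, to see it for yourself (the pool name is an example):

  # top-like live view of the busiest images in a pool
  rbd perf image iotop --pool mypool
  # one-shot/periodic statistics instead of the interactive view
  rbd perf image iostat --pool mypool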
29:41
So yeah, this also improves the statistics for RBDs and was something that was requested from day one, more or less. RBD miscellaneous: I already talked briefly about the RBD mirroring functionality that was added to the dashboard. The main reason this is now easily possible
30:01
is a set of changes that were made in Nautilus, and also the centralized configuration, which makes it really easy compared to older versions. Also with Nautilus, we now support namespaces, so you can create security domains within pools and lock specific clients into them,
30:22
which is really nifty. We also support pool-level config overrides. In the past, for example, you could activate caching on an individual RBD device, and now you can also do this at the pool level. And the rbd ls command, it sounds like just a minor thing, but it now also lists the creation, access
30:43
and modification timestamps, which again is really helpful for administrators. Enough of that, let's go to CephFS. What do we have there? Yay, multiple file system (multi-FS) volume support is stable now.
31:02
What does that mean? You can now have multiple CephFS file systems within your Ceph cluster. Each of them has its own independent set of RADOS pools and MDSs. So that's something you have to be aware of, but at least it's possible, and it's called stable now with Nautilus.
31:22
We have a subvolume concept, which was mainly copied from the OpenStack Manila driver into the Ceph Manager. So we can now create subvolumes with their own quota, their own CephX user key restrictions and all of that. So that's also really cool.
31:41
And again, the unified CLI: you now have ceph fs volume and ceph fs subvolume commands to fiddle around with those things, which is really handy.
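A rough sketch of those commands (volume and subvolume names are placeholders):

  # create a new CephFS volume, i.e. a file system with its own pools and MDS
  ceph fs volume create myfs
  # create a subvolume with a 10 GiB quota and look up its path
  ceph fs subvolume create myfs mysubvol --size 10737418240
  ceph fs subvolume getpath myfs mysubvol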
32:04
On top of that, if you would like to use NFS, we're obviously using NFS Ganesha, and we now support active-active deployments. Please don't get confused by active-active or active-passive: that doesn't mean you can only have one active and one passive, or one active and one active. In this case it simply means multiple active ones, or one active and multiple passive ones, so it's not bound to just two nodes. Just in case you're wondering about that,
32:23
because I've had this question already. This is now handled thanks to, more or less, Jeff and a lot of changes that he made, and there's a really good talk from him; check it out, a really good talk where he covers that briefly and in detail. So if you're interested in those changes,
32:42
I can only recommend the talk from Jeff Layton. The NFS Ganesha daemons are fully managed via the new orchestrator interface, so the orchestrator interface in the end talks to the NFS Ganesha daemons. This is already fully integrated into Rook, for example, and others are to follow.
33:02
Let's see who is next. And the volume and subvolume concept I showed you just a minute ago is also reflected here, so you can use it on top of CephFS and on your NFS gateways as well.
33:20
The CephFS shell, if I remember correctly, was brought to us by a student who participated in the Outreachy program, and the idea behind it was mainly scripting purposes for CephFS. So let's say the problem was, for example,
33:40
if you wanted to change the quota attributes, you first had to mount the CephFS file system, then you could change the attributes, and then you had to unmount it again. All of that is now possible within the CephFS shell; it does the whole magic in the background for you, nothing you have to worry about. So this is really cool.
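As a very rough sketch, assuming the cephfs-shell tool is installed and can reach your cluster (exact sub-commands may differ between versions):

  # start the interactive shell; it talks to CephFS directly, no mount needed
  cephfs-shell
  # inside the shell, commands roughly follow the coreutils names, e.g.:
  #   mkdir /backups
  #   put ./local-file /backups/
  #   ls /backups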
34:00
And on top of that, some performance improvements to the MDSs, something you would expect from a new release. Cool, let's get to the next topic, which is containers. I just assume everyone already talked about containers briefly over the weekend,
34:21
because everyone is talking about containers, so we're going to talk about containers as well, obviously. If we take a look at containers, we have two different views on them, more or less. First, we could say: okay, containers need to store their data somewhere, we need some storage, so Ceph could be used
34:42
underneath your, let's say, Kubernetes deployment, so we put Ceph underneath. That's one view. The other is: hey, we could maybe move our own services into containers, for scale-out and other reasons, and that's exactly what we are trying to achieve right now.
35:02
The whole idea, I think, is totally clear; nothing I have to tell you over and over again. It's to simplify the OS dependencies and, more or less, the scale-out perspective: scale-out and upgrades. That's the main driver here.
35:21
I think you've heard about it already: Rook. That's at least the operator that we're using from the Ceph side. It's really extremely easy to get it up and running; even I was capable of setting up a container environment with Rook, so you should be capable of doing it as well, obviously.
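As a hedged sketch of the Rook 1.0 quickstart (the example manifests live in the Rook repository, roughly under cluster/examples/kubernetes/ceph; file names may differ between versions):

  # deploy the common resources and the Rook operator
  kubectl create -f common.yaml
  kubectl create -f operator.yaml
  # then let the operator bring up a Ceph cluster
  kubectl create -f cluster.yaml
  # watch the mons, mgr and OSDs come up
  kubectl -n rook-ceph get pods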
35:41
It can already add and remove monitors, and it can deploy other daemons on top. All of that has already gone into Rook development. I think it was three weeks ago, and I just checked briefly and saw that a bug-fix release has already gone out as well, so I think we're already at 1.0.1 in the meantime,
36:02
but the first official release was three weeks ago, so it's now also declared stable. But still, obviously, there's more to come, and bug fixes, because now more and more end users are making use of it.
36:21
On top of that, as already mentioned, we right now have more or less four deployment mechanisms. We have Rook, we have DeepSea, which is driven by SUSE itself and uses Salt underneath, then we have ceph-ansible, and on top of that we now have what they call the SSH orchestrator.
36:43
The SSH orchestrator is planned to be the replacement for ceph-deploy, because ceph-deploy was deprecated, and I know there are a lot of people out there who really love ceph-deploy because it was just a bash script and you could script things yourself, because everyone is capable of writing a bash script,
37:02
and having to learn a deployment tool, man, not everyone is keen on that. So that's why they came up with the SSH orchestrator now, so you don't have to worry, there is something new you can use and it's rather easy. The whole idea here is again to also
37:23
integrate that into Rook and then, in the end, into the dashboard, to have everything in one place. The thing that we're struggling with right now, and what we're trying to figure out, is the initial bootstrapping, because we need at least a single mon
37:40
and a single manager up and running to be able to start the dashboard on top and deploy all the other services from there, so it's more like a chicken-and-egg problem. We're trying to find out what the best way to bootstrap this is, maybe shipping a pre-configured container, just a single one, where the first monitor would spawn, for example. That's work in progress and ongoing.
38:05
Community-wise, for those of you who missed it, the Ceph Foundation is now a thing. What is the Ceph Foundation? It is a directed fund under the Linux Foundation, and the whole idea and driver of this was:
38:20
okay, we have two big vendors who are pushing Ceph, apart from all the other community members, which are obviously Red Hat and SUSE, and the problem was, I tried it myself a couple of years ago, to get some funding for a community event which was driven by Red Hat, and I failed internally because we're not willing
38:42
to send money to the Red Hat folks, and on the other hand they had the same problem, they can't send any money to us. I think it's just forbidden on their bank accounts, I don't know. But the solution to all of that is the Ceph Foundation, which is independent now. It had 31 founding member organizations, and in the meantime three more members have joined,
39:04
and they all give money, and the foundation will make the best use of it for community events. Right now they are thinking about hiring, for example, someone dedicated to documentation, because everyone is complaining that the Ceph documentation is so horrible.
39:21
Yeah, it's documented in the code, I don't get it, but nevertheless they're trying to find someone. In the same way, the idea is to split the community role; they have a community manager who is right now based in the US, and the problem is that our community events, as you can see here on the next slide, are spread more or less around the globe, so it would be easier to have a dedicated role
39:41
maybe in every region as well. And in case you would like to join one of those, the next upcoming one is in the Netherlands; after that there is the one at, yes, obviously, CERN, and the cool thing about it is that CERN right now has a maintenance window, and on the Sunday before,
40:00
so if you arrive early, let's say Sunday morning or Saturday, there is the possibility to go down and take a look at the LHC in person, so that's really cool. I expect a lot of people to join that day just because of CERN and not because of Ceph, so let's see, that's interesting. And then after that
40:20
we have the Ceph Day in London and the Ceph Day in Poland. If you're interested in hosting one, feel free; those are completely driven by the community. You can just reach out and say, yeah, we would like to host a Ceph Day somewhere, let's say somewhere in Europe or in the US, I don't care, just send an email
40:42
and we'll try to help wherever we can. And with that, here are just some links again where you can find my presentation; it's the same one I showed you at the beginning, in case you would like to read up on some facts later on. If not, I will come to an end and would like to ask you for questions,
41:01
if you have any. Or if you're just hungry and would like to have lunch now, I can definitely understand that as well. No questions, which could be a good thing, at least I hope so. If you have any later, just grab me in the hallway, and yeah,
41:23
thanks everyone for joining and enjoy the rest of your day. Thank you.