
Distributed Storage in the Cloud

Formal Metadata

Title
Distributed Storage in the Cloud
Title of Series
Number of Parts
47
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The cloud brought many innovations; one of them is inexpensive, scalable and sometimes secure distributed storage. In this presentation we will talk about the distributed storage options modern clouds offer, ranging from elastic block devices and object storage to sophisticated transactional data stores. We will discuss the benefits and new architecture options such distributed storage systems enable, as well as the challenges and pitfalls you need to be aware of.
Transcript: English (auto-generated)
Hello everyone. Sorry for a slight delay because of the technical difficulties. So let me just jump straight into that. Today I am going to talk to you about distributed storage in the cloud.
And this presentation is going to be a high-level overview, right? There are a lot of different things to cover, so it has to stay at a high level. And note that I am not an expert in all the technologies we are going to cover.
So if I am wrong about something, you know, feel free to use the chat to correct the information I am saying. And that will be a better learning experience for all of us. Now if you think about storage from a high level, you can see that there
is a lot of storage we need to deal with as we build our applications. Some of it is referred to as databases, and it comes under different names.
And I will really refer to all of it as storage. And why is that? Well, because in many cases you can really use those interchangeably.
One or another approach may be better or worse, right? So for example, a few years ago, or now a couple of decades ago I would say, we would often make choices between storing files, let's say like small images, inside the database or on the file system.
And that came with benefits and drawbacks. For example, if you store small files in the database, it may not be super efficient, but you can do it as a part of transaction and so on and so forth, right? That is an example of an interchangeable use.
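A minimal sketch of the "as a part of transaction" point, using Python's built-in sqlite3 module as a stand-in for whatever database is actually in use. The table names, the helper `save_image_with_metadata` and the simulated failure are hypothetical, made up for illustration. It shows that when the file bytes live in the database, the file and its metadata are either both stored or both rolled back.

```python
import sqlite3


def save_image_with_metadata(conn, name, data, fail=False):
    """Insert image bytes and a metadata row in one transaction."""
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute(
            "INSERT INTO images (name, data) VALUES (?, ?)", (name, data)
        )
        if fail:
            # Hypothetical failure between the two writes.
            raise RuntimeError("simulated failure before the metadata write")
        conn.execute(
            "INSERT INTO image_meta (image_id, size_bytes) VALUES (?, ?)",
            (cur.lastrowid, len(data)),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")
conn.execute("CREATE TABLE image_meta (image_id INTEGER, size_bytes INTEGER)")

save_image_with_metadata(conn, "logo.png", b"\x89PNG fake bytes")
try:
    save_image_with_metadata(conn, "broken.png", b"partial", fail=True)
except RuntimeError:
    pass  # the half-finished insert was rolled back

image_count = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
meta_count = conn.execute("SELECT COUNT(*) FROM image_meta").fetchone()[0]
print(image_count, meta_count)
```

With files on a file system instead, the application itself would have to clean up a file whose metadata write failed, which is the trade-off against the lower efficiency of storing small files in the database.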
Now if you speak about databases, databases themselves are very complicated. There are different data models and query languages, and databases can be built for different purposes. And there are a lot of internal design considerations which may fit one use case and not others.
You can see that database technologies come as relational databases, right? Those are sometimes referred to as SQL databases, and everything else can be referred to as NoSQL databases. But they can also be presented as a number of different data models, such
as document stores, key-value stores, time series, graphs, and a lot of others. Now what makes it even more complicated is that there are some databases which are multi-model. That means they can really support multiple data models inside the same database.
And some can even talk different languages and protocols to access the same data. If you think about it from the database design standpoint, there are also different choices you would have.
We often separate databases which are focused on operational, or transactional, workloads from those for analytical workloads, right? Often those are quite different systems, even run by different teams. We can also distinguish systems which are designed as a cache from those that store data persistently for a long time to come.
Some systems are memory-based, others require disk storage. Some systems are designed as natively distributed.
Others are really single-node systems. You know, think about something like MySQL or Postgres. And if you want to really distribute them, typically they will have replication in place. There are column-store and row-store databases, as well as new emerging technologies
such as the blockchain databases you see coming up for certain use cases. So there is a lot of complexity, if you will, when it comes to databases. Okay, now we are talking about storage, specifically distributed storage.
Why do we talk about distributed storage, and in the cloud in particular? Well, one reason is that when you are building large-scale applications, you will often need redundancy, performance and scale, which requires distributed systems to come into play.
But even if you are looking at smaller-scale systems, which could potentially be designed as a single, very important server, in the cloud you don't really do that. The cloud doesn't really work well with that model, which is sometimes referred to as treating your servers as pets, right?
Because you do not really have as much control in the cloud. And in the worst case, you should count on a node that can disappear at any time without a trace.
If you have a physical server, well, you kind of know that even if it crashes, you may often be able to use some advanced recovery tools to recover data from the hard drives. Right? And some of the people who have been in sysadmin roles for many years have experience with putting RAID arrays which fell apart back together, and doing that manually.
That is not how things work in the cloud. In certain cases, if something doesn't work out, you may just have an instance which disappears, because you don't have any physical access to the server; only the cloud vendor has.
You can't do any of that advanced recovery. But well, let's now talk about the cloud. I think if you look at the cloud, and especially storage in the cloud, you can think about different approaches to how you design your application.
One is thinking about the cloud as utility computing. Right. And this is actually an image which came from Amazon itself, which was explaining the cloud to people 10, 15, I don't know how many years ago.
Right. The cloud was not understood then. And they talked about the cloud as a utility, which is something that is kind of undifferentiated and commoditized. Think about water or electricity.
Right. Well, you obviously need them, but they are essentially the same. Right. You don't really care about providers. Where it has evolved, though, is that the cloud turned into a proprietary platform. Right. You can think about that as similar to what we saw 20 years ago with, let's say, the Microsoft ASP.NET platform.
Now we have AWS, Google or Microsoft, each with its own proprietary platform, which has a lot of services you can use together.
But each, of course, comes with some downsides as well. What is interesting, though, is that the same thing is happening as happened with the on-prem development model: open source is gradually catching up. And now we have a choice of how we really use and utilize the cloud.
You can think about really going all in on proprietary services. Right. You get locked in with a cloud vendor. There are probably going to be a lot of tools which are nice, well documented and integrated, with lots of training and so on and so forth.
But you will essentially be a hostage of a single vendor and won't have a choice. Or you can choose an open source stack, like, for example, one coming from the Cloud Native Computing Foundation, which
you can choose to run everywhere: use open source and treat the cloud as a utility and a commodity. Right. So I think these are very important choices to consider. And really, in my experience, different customers, different folks, make different choices in this case.
Some choose to really bet on a single vendor because they believe that allows them to get to market faster. Others value independence and prefer to go with open source. There are actually three choices of technologies you can embrace in the cloud.
One is proprietary solutions by the cloud vendors. Let's say what Amazon does. Then there are proprietary solutions which come from a third party. Right. Let's say from a cloud marketplace or something. And there are the open source solutions.
And this is the framework I am going to use to talk about the different solutions we will see. Now, one thing you will see with a lot of the clouds is that they market their software as open source compatible.
Right. Say, Amazon Aurora is compatible with Postgres or MySQL. But let's make no mistake. That is really one-way compatibility, which says, hey, if you've been running on these open source systems, you will be able to move to this technology.
And then there are going to be some additional advanced features which we hope you will use. And then guess what? You will be locked in, as with any proprietary software. Right. With that, I think it's very important to really think about open source,
especially in the cloud, from a practical point of view. And this is how I approach open source. These are the questions I tend to ask to classify whether the software we are talking about is truly open source or not.
Because you know what? These days there are a lot of folks who market their software as open source when it really is open core. Right. Or some other model.
Just because open source is loved by so many. So what are those important questions to ask to see if it's open source in practice? The first one is, can you deploy the solution in your environments without incurring additional costs?
Right. Like licensing software or something else. It's clear that if you are running something in the cloud, there are always costs. But if you say, hey, I just want to deploy it on my laptop, well, it shouldn't cost you anything.
A solution like Amazon Aurora, well, you can't really deploy on your laptop. Right. With some others, you would have a choice. The second is, do you have a broad choice of vendors if you really need help? I think that's also important, because some software may be open source but practically not very helpful,
because maybe there is only, you know, a single company which has all the experience with it. And the third one is, can you improve the software so it solves your needs better?
Can you contribute to the software, or hire somebody to do so if you need? And I think that is important, especially for larger organizations running open source at scale, which often grow into needs that are not solved by that particular software. You know, think about Facebook, which made a lot of contributions to MySQL because, well, they are running MySQL at extreme scale.
And the product as it was built did not completely match their needs. As we look at solutions, I will focus on the top cloud vendors in the Western world.
Right. AWS, Azure and Google Cloud. For the open source solutions, I will focus on the ones around the Kubernetes or cloud native ecosystem. Why is that? Because I think that Kubernetes is now the leading open source operating system for private and public clouds.
Right. It's like Linux a couple of decades ago, but not for a single node, but actually for a large number of servers. Now, some people ask me why I focus on Kubernetes rather than OpenStack.
And the reality is that I see a lot of people deploying Kubernetes successfully, with much more momentum than OpenStack, which is also deployed, but typically only by a small number of large organizations because it is complicated.
The other upside of Kubernetes is that it is available as a managed solution, both from the same public clouds I mentioned and from a large number of other vendors.
Right. Think about Red Hat or SUSE or VMware. All of them have Kubernetes support in their platforms. And if your company has a strong relationship with one of those vendors, then adopting Kubernetes can be easier for you.
Okay, let's now look at the storage types and the options. I will split storage into two big parts. One is your commodity storage. Right. This is something that has a relatively simple interface.
If you need to move this stuff between different clouds, the migration doesn't require a lot of effort. It doesn't create a very strong lock-in. Often it really makes sense to use this as a building block
as you roll out your more complicated infrastructure. So, for example, node-local storage. Right. That is basically your local disk, which exists in the majority of clouds.
And you use it from your application in pretty much the same way everywhere: you place some local file system on it. There is really not a lot of difference whatever cloud you are running on, or even if you are running outside the cloud.
All cloud vendors have options in this case. But one thing you need to note is that performance can differ by quite a lot. Right. Different cloud vendors and different local storage solutions can have performance which differs by 10x, 100x or more.
Typically NVMe flash storage tends to be the fastest. And in the cloud you should think of local storage as obviously not reliable and not super scalable.
But often it is a great building block for distributed storage, or something you can use as fast and cheap storage for local processing needs. Temporary files, right, and so on. For example, in our case, if you are running the clustered MySQL solution from Percona called Percona XtraDB Cluster,
we often recommend using local flash storage, because the cluster already does replication, and it works very well on non-redundant but very fast local storage to provide you with a highly available, robust environment.
The next one is network block storage. All cloud vendors have something here: it is EBS for Amazon, Azure has Managed Disks, and on GCP it is called Persistent
Disk, as you can see, right. And these often work the same as local storage, but they are actually network-backed, so you can kill the instance, right, or it may die on you, while the data on the network block storage persists.
Right. And that is considered a highly available solution. Now what is interesting in this case is that there are also a number of proprietary vendors in this space, specifically NetApp and Portworx, which provide additional solutions in this regard, right.
They claim to have better performance, and maybe deduplication, backup, replication, all the kind of additional stuff which classically was available in enterprise SAN and NAS solutions.
There are also a bunch of open source options in this case, right. Ceph, Rook, Longhorn, OpenEBS, right. OpenStack also has a block storage project called Cinder.
All of these allow you to build network block storage from basically a bunch of local disks, right. That is something you can do, but again, as I mentioned, in many cases we can consider cloud network block storage
as a utility component which is not seriously differentiated between clouds, and just use it as a component without strong lock-in. Then there is file storage. That is basically your NFS file system, right.
If you need that kind of API, then again, all the cloud vendors have something available. Again, it's not seriously differentiated. In all cases, you just mount it as a local file system. There is a little bit of difference in how permissions are managed, but that is not significant.
Again, NetApp and Portworx have their own interfaces, and the same goes for the open source space: some of the projects allow you to not only talk through a block device protocol
but also expose storage as a network file system. In my experience, this kind of giant shared file system store is not as commonly used these days as it was before.
For example, we used to use NFS for backups on remote servers. Now we are moving those to object stores like S3, right. Or if you say, hey, you know what, I need to have 50 terabytes of images being served by a web server.
Again, that's what we typically serve now from object storage, not from a web server which runs on some sort of NFS file system.
Okay, that brings us to the object store. And that is very valuable, very efficient storage. An interesting property is that as you put data into object storage, it becomes accessible directly by the client through the HTTP or HTTPS protocol, right.
That means you don't really need to maintain additional server infrastructure to serve the files over HTTP. AWS has S3, Azure has Blob Storage, GCP has Google Cloud Storage.
They're all slightly different, especially in terms of uploading data and managing files, but in the end they can all serve the data over HTTP. Here, I think the situation becomes a little bit more interesting with vendor storage. NetApp and Portworx again have their solutions, right,
gateways with S3-like interfaces, right. But what I think is more interesting here is that for all the previous storage types I've mentioned, you typically want to keep your data
in the same place as you host your application, right. It wouldn't make sense for you, or frankly it wouldn't even be possible, to, let's say, run your application server on GCP
and mount EBS storage from Amazon to it. When it comes to object stores, you have additional vendors like Wasabi, Backblaze, DigitalOcean and Linode, and often you can mix and match vendors based on your needs. For example, some of those can be used as a rather low-cost backup option
if you need to store your data at a lower cost compared to what the major clouds would provide. If you look at object stores in the open source space,
that is where MinIO, I think, is the absolutely dominant player in terms of being able to roll out your own open source object store. There are also solutions like Ceph and Rook which you can use.
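To make the earlier point about direct HTTP access concrete, here is a small sketch that builds the URL a client would fetch for an object in each of the three major clouds. The helper name `public_object_url`, the bucket, account and key names are hypothetical, and the URL patterns are assumed to be each provider's common default endpoint style. The sketch only builds strings; a real object would also need to be public or accessed through a signed URL.

```python
from urllib.parse import quote


def public_object_url(provider, container, key, *, region="us-east-1",
                      account="myaccount"):
    """Build the HTTPS URL of an object for a given provider.

    'container' is the bucket (S3, GCS) or the container (Azure);
    'account' is only used for Azure and is a made-up example value.
    """
    safe_key = quote(key)  # percent-encode, but keep "/" so prefixes look like folders
    if provider == "s3":
        return f"https://{container}.s3.{region}.amazonaws.com/{safe_key}"
    if provider == "azure":
        return f"https://{account}.blob.core.windows.net/{container}/{safe_key}"
    if provider == "gcs":
        return f"https://storage.googleapis.com/{container}/{safe_key}"
    raise ValueError(f"unknown provider: {provider}")


s3_url = public_object_url("s3", "my-bucket", "images/cat 1.png")
azure_url = public_object_url("azure", "photos", "images/cat 1.png")
gcs_url = public_object_url("gcs", "my-bucket", "images/cat 1.png")
print(s3_url)
print(azure_url)
print(gcs_url)
```

Because the object is addressed by a plain URL, a browser or CDN can fetch it directly, with no application server in between; only the upload and management APIs differ between providers.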
Okay, with that, let us move to more complicated databases and data stores. The difference with those is that they are rather highly differentiated.
That means that similar offerings are not easily replaceable. If you look at, let's say, PostgreSQL on Amazon and PostgreSQL on Google GCP, while they are both based on Postgres, there is going to be a substantial amount of changes needed to move between them.
And if you specifically look at the proprietary options, for example, Cosmos DB versus Spanner or something like that, then that is going to be a completely different situation, right?
So in this case, if you are choosing the proprietary versions, you typically have much more lock-in. Now, in this space we'll also talk about queues, streams and data pipelines,
which are designed for moving data around, often with persistence. That is not conventionally a database, but it is a very important part of modern data infrastructure, and these are also highly differentiated. So I put them in this section as well.
So what do we have in this case? Well, you can see that for this kind of solution too, there are quite a few different options available, especially on AWS. They are all proprietary in this case,
but broadly comparable. If you look at this space, though, you can see that Kafka has emerged as really the most common standard
for moving data around between database systems. And if you think about the proprietary solutions, you will find that there is both an enterprise software offering from Confluent for Kafka,
as well as managed Kafka from a number of third-party vendors, such as Aiven and Instaclustr. There are also a lot of open source choices in this space.
You can think about Apache Kafka, Pulsar, RabbitMQ, ActiveMQ. Actually, one thing which is interesting about this space is that, similar to the many options available in the cloud, there are also a huge number of options available in the open source space.
Often some solutions are specific to a particular framework or programming language as well. So that list is by far not exhaustive.
Okay. Now let's look at relational databases, and specifically the transactional ones. Here is what our cloud vendors offer us. And you can see that at a high level,
they either offer cloud versions, or database as a service, of open source and proprietary software, or something which is newly built on top of that.
For example, AWS offers Aurora, Azure has SQL Database and the Hyperscale offerings, and GCP has Spanner, which they increasingly promote
as their large-scale distributed database. If you think about the proprietary solutions, well, you can see both our standard enterprise database vendors,
such as Oracle and Microsoft, as well as a number of newcomers such as Yugabyte and CockroachDB, which obviously offer commercial cloud versions of their products, as well as folks like Instaclustr and Aiven,
who manage a number of databases for you. And in the open source space, well, that is where you would see your classical transactional open source databases. I would highlight a couple here. One is obviously PostgreSQL,
which in our experience is gaining the most traction these days. The second one I would mention is YugabyteDB, which is a very interesting distributed database built from the ground up with PostgreSQL compatibility.
And the last one is TiDB, which is built by PingCAP. It is kind of similar to YugabyteDB in architecture but different in implementation details; it is also a distributed database, but this one uses MySQL as its compatibility protocol, not PostgreSQL.
And that is where I can also throw my Percona hat in. We provide distributions for MySQL and Postgres with some additional value, right, typically focused on enterprise use cases, but under a completely open source license.
Relational analytical databases are another segment. You can see it has a completely different set of solutions, from all the cloud vendors and in proprietary solutions as well.
Obviously Oracle has solutions for data analysis, but we also see a number of additional vendors here; Snowflake and Vertica are often seen. I would also mention Oracle HeatWave.
That is something which they came out with recently, and it is a sort of accelerator layer for MySQL which allows it to execute analytical queries much faster.
Unfortunately, Oracle HeatWave only runs on Oracle Cloud, and it's not open source. If you look at open source, there are a lot of solutions here. And generally these are very big ecosystems,
or at least some of them are. You have Spark and Hadoop, which have ecosystems with a lot of tools. There are Presto and Trino. These are systems which are often designed to be able to query multiple data sources at the
SQL level, which can be very helpful for analytical queries. And there are solutions like ClickHouse, MariaDB ColumnStore and TiDB.
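As a rough illustration of what "querying multiple data sources through SQL" means, the sketch below joins tables that live in two separate databases with a single query. It uses SQLite's ATTACH DATABASE from Python's built-in sqlite3 module purely as a small-scale stand-in; this is not how Presto or Trino are implemented, and the table names and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                   # first "data source"
conn.execute("ATTACH DATABASE ':memory:' AS sales")  # second, independent database

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)", [(1, "Ada"), (2, "Linus")]
)
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)", [(1, 10.0), (1, 5.5), (2, 7.0)]
)

# One SQL statement spanning both databases.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM main.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)
```

A federated engine does the same kind of thing across heterogeneous systems, for example joining a table in a relational database with files in object storage, while the analyst only writes SQL.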
I think ClickHouse deserves a special mention in this case because it is one of those databases which is able to talk multiple languages. In addition to its native ClickHouse protocol, it can talk with MySQL or PostgreSQL clients,
which makes it a very useful database for many analytical workloads. And TiDB is interesting; you may notice that it is present both under operational databases and here,
because it is a hybrid database. It can do both transactional and analytical queries. Document stores. Well, a document store is when we do not use a relational data model but store documents, typically JSON, in the database.
Here are the solutions which the clouds offer, and you can see we have native solutions. If you think about the proprietary solutions, MongoDB, in its cloud and enterprise versions, and
Couchbase are the most common in this case. If you're looking for completely open source solutions, well, MongoDB is actually source-available rather than open source. Both MongoDB and Couchbase have community versions. Also, think about relational databases.
Both PostgreSQL and MySQL actually have pretty good JSON support these days, and in many cases they can be used as a document database. MySQL even has a Document Store with a MongoDB-like protocol you can talk to it with.
The next item to cover is key-value stores. Key-value stores, I think, operate in two different areas. One is caching and another is persistent storage.
For caching, you can see a lot of the offerings are dominated by Redis-based solutions. The same goes for proprietary. While there are a lot of key-value stores, I think Redis is really dominating. If you think about open source, there is Redis, of course.
Some applications still use Memcached, which was popular before Redis came of age. I would also mention KeyDB, which is a Redis fork with some additional performance improvements. If you look at persistent key-value stores,
you can see that the clouds offer something else there, different systems optimized as key-value stores. Now, I use the term key-value store very broadly. Some of those systems, including Redis, have a more complicated data model than a plain key-value store, but there is no better description I could come up with.
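The two roles of a key-value store mentioned above, cache and persistent store, can be sketched together in a cache-aside read path. This is a minimal illustration under stated assumptions: `TTLCache` is a hypothetical in-process stand-in for a cache such as Redis or Memcached, a plain dict stands in for the persistent store, and the clock is injected so expiry can be shown without sleeping.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for a real cache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # entry expired, drop it
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def cache_aside_read(cache, store, key, stats):
    """Try the cache first; on a miss, read the persistent store and fill the cache."""
    value = cache.get(key)
    if value is not None:
        stats["hits"] += 1
        return value
    stats["misses"] += 1
    value = store[key]
    cache.set(key, value)
    return value


persistent_store = {"user:1": "Ada"}  # stand-in for a persistent key-value store
stats = {"hits": 0, "misses": 0}
fake_now = [0.0]                      # controllable clock for the example
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])

first = cache_aside_read(cache, persistent_store, "user:1", stats)   # miss
second = cache_aside_read(cache, persistent_store, "user:1", stats)  # hit
fake_now[0] = 31.0                                                   # entry expires
third = cache_aside_read(cache, persistent_store, "user:1", stats)   # miss again
print(first, second, third, stats)
```

The cache can lose its contents at any time without data loss, which is why in-memory systems are acceptable for that role, while the persistent store has to survive node failures.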
In the proprietary space, we have Redis Cloud, Redis Enterprise, and really enterprise versions of all those open source solutions. Most of them have a community and an enterprise version
these days, where the enterprise version is pretty much proprietary. Finally, what I wanted to mention is the time series databases. Time series is an interesting phenomenon.
Purpose-built time series databases are one of the fastest growing categories over the last five years or so. They are very useful for storing various sensor data and so on and so forth.
Whether you think about monitoring our complicated environments or about the growth of the Internet of Things, this becomes increasingly important. All the cloud vendors offer something of their own. And you can see that there are not a lot of commercial database solutions
on the market from the kind of legacy vendors, but there are a lot of solutions from open source vendors, many of which also have a proprietary enterprise version.
And let me mention a few here. One is Prometheus, which is a very interesting database that has really been dominating as a data store for metrics, in the monitoring space in particular.
Then there are Cortex and M3, databases which were built to talk PromQL, like Prometheus does, but offer some more advanced scalability options, as well as VictoriaMetrics, which is actually quite a cool solution
that was initially built around Prometheus but was extended to talk different protocols as well. For example, it can speak the InfluxDB protocol.
InfluxDB is another very popular time series database. It's not just for monitoring but for other things as well, so it is something else to consider. And TimescaleDB, last but not least, is a PostgreSQL extension. Where the other database systems here use their own non-relational data model
and their own query language, which can actually be quite useful for time series needs, TimescaleDB is a PostgreSQL extension, which makes PostgreSQL a lot better at storing time series data.
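
A typical thing you do with time series data such as sensor readings is to roll raw points up into fixed time buckets. The following is a minimal sketch of that idea in plain Python, not the API of any of the databases mentioned; the function `bucket_average` and the sample readings are made up for the illustration, and timestamps are assumed to be whole seconds.

```python
from collections import defaultdict


def bucket_average(points, bucket_seconds):
    """Average (timestamp, value) points per fixed-width time bucket.

    Returns a list of (bucket_start, average) tuples sorted by bucket start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket_start -> [running total, count]
    for ts, value in points:
        start = ts - (ts % bucket_seconds)  # align the timestamp to its bucket
        acc = sums[start]
        acc[0] += value
        acc[1] += 1
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]


# Hypothetical temperature readings as (seconds since start, degrees).
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (95, 23.0), (130, 19.0)]
per_minute = bucket_average(readings, 60)
```

Purpose-built time series systems do this kind of aggregation, plus retention and compression, inside the database and at a much larger scale.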
Okay, with that, let me also explain what Percona's role is in this space of distributed storage in the cloud, and why I'm talking to you about all this.
Now, if you think about what we are doing at Percona, we really look at pushing the boundaries for open source databases. And we really focus on open source, which I define as open source for real,
not some sort of open core software which is marketed as open source but which in the end comes with the same lock-in as proprietary software. At this point, we focus on MySQL, MongoDB, and Postgres.
MongoDB unfortunately became source available in the last few years, but, well, it's kind of too late for us to stop supporting it, as we have so many customers relying on Percona for MongoDB these days.
What we do in terms of software, we provide free and open source software with some additional functionality, both for Linux as well as for Kubernetes.
We have very advanced operators to run MySQL, MongoDB, and Postgres in a Kubernetes environment. And we also built a tool called Percona Monitoring and Management, again 100% open source, which can serve as a single pane of glass
for MySQL, MongoDB, and Postgres, providing you observability and some management features. And what we see as really most important in the market these days is the database-as-a-service experience, right? Instead of installing a bunch of packages, you can deploy a database in a few clicks, right?
That is how we are supporting the open source database market these days. So to sum things up, you can see that distributed storage in the cloud actually gets quite complicated.
You can see that there is no one size fits all, right? There are a lot of different solutions depending on what you need to get done. But I think what is great is that for all the areas, whether you are looking at storage, like the kind of classical file-based storage,
or you are looking at some complicated database or data store, there are a lot of open source solutions available. With that, that's all I had. And I would be happy to answer some questions if there are any.
Thank you very much, Peter. Okay. For the people here in the VDB, please ask a question. You can do it via chat or via microphone.
You now have the permission for it. And for the people in the YouTube live stream, you can go downstairs at live and first come from today on the HS4, and there, please, on the video conference, and then you can come to the VDB here and ask a question.
Perfect. Okay, questions, questions. Any questions? Are they coming in the public chat or somewhere else? Everybody can switch on their microphone and ask a question if you want.
Sounds good, sounds good. You gave a very good overview of the open source software.
Which software are you using for your normal business? Again, for what? Which open source software are you using? Oh, well, I mean, it's quite a lot, right?
I mean, if you look, I use a lot in the Linux test space, open source databases like MySQL, MongoDB, Postgres, right? So, obviously, various tools which can be found.
You know, Grafana is a project I use quite extensively. Yeah. But I think that is kind of not really a very meaningful question, if you will, right?
Because I'm a CEO, right? So, my needs in open source software are quite different. And typically, the work you do kind of defines what open source software you will be using on a daily basis.
Thank you. Other questions from the audience?
Do you code some software for the open source community?
So, do you help on some software there? Well, I mean, look, I personally may write some code, but that is very, very little. Very little these days. I mostly work on the design of a project, right?
Or some testing and usability of the software which Percona provides. Okay. Thank you for that. Interesting. And where do you see all those questions coming up? I don't see them in the public chat.
I'm moderating here because I'm the chairwoman, and when nobody is asking a question, I'm asking questions. I see. I see. So, you're just helping me out a bit. Okay.
Well, everybody is a bit quiet. So, there are no other questions. So, thank you very much, Peter, for your talk. And, yeah. For the rest of the audience, have fun at FrOSCon. The next talk here will start at 5:30.
It's in German. There's a talk issue. So, have fun here and, yeah, maybe see you again. Bye. Okay. Thank you. Bye-bye.