MySQL: Scaling & High Availability
Formal metadata
Number of parts | 644
License | CC Attribution 2.0 Belgium: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/41498 (DOI)
Transcript: English (automatically generated)
00:05
Okay, good afternoon everyone. So I will take you through my journey with MySQL, through almost two decades now, not just one.
00:22
And for me it all starts in 1999, when I was a student at Moscow State University. And by the way, you see there is snow here, right? Because whenever you show Russia you should be showing snow. What is that? Okay, let's see, is that my phone? Maybe that is.
00:46
Okay, yeah, and I got to use MySQL in the first startup I co-founded in 1999, which was called Spylog. You know, Russia, Spylog, what kind of startup is that?
01:03
Well, it was not that kind of startup. It was pretty much the Google Analytics of its day, and at that time we would use MySQL for that kind of thing. Remember, there is no Hadoop, there is no Spark, there is no ClickHouse, there is nothing else in the open source area at that time.
01:24
So, I got to use MySQL and I started with MySQL 3.23 alpha. I was a student, I was not thinking a lot, right, so I used the alpha software version at that time.
01:41
And actually I learned a lot through using that very version of the software. Believe it or not, at that time MySQL had only just been introduced, and MySQL was so small that Michael Widenius would personally reply to many of my questions.
02:00
And actually Monty is here today, Monty can wave, so not the one and only, but one of a few MySQL founders, Monty is out there, so you can chase him down for questions after the talk as well. Well, I didn't ask him, but you know, Monty is always good for a good conversation.
02:22
So what kind of challenges did we have in 1999 with MySQL? MyISAM, and that means MyISAM table locks, which are very painful. With my choice of using alpha software, we also had a lot of crashes, so a lot of learning experience.
02:41
And we also had a basic problem, funny by today's standards: Linux at that time had a two-gigabyte file size limit on any file system, right? So you couldn't have really large tables. And with MyISAM I also had to face the problem of table checks and repairs, right?
03:03
They were taking a lot of time. Now, in 1999 we had those MySQL tricks you guys are perhaps well aware of by now. For example, we would shard across a number of MySQL nodes for scalability. And because of MyISAM table locks we would even shard into multiple tables and databases on a single node, right?
03:27
To make those less painful. And you would have to build lots and lots of summary tables, because aggregating large amounts of data in MySQL was never
03:42
quite pretty. Now, about the same time, in 2000, MySQL goes open source under the GPL license. Before that, and I think many people don't know it, MySQL was not quite under an open source license.
04:02
It was still available for free if you used a proper operating system like Linux. But if you chose to create pain for developers by making them build MySQL on Windows, then you would have to pay for that.
04:23
In 2001, MySQL 3.23 went GA, or stable. That is also the time when the InnoDB storage engine was introduced in the MySQL Max edition. We also had an initial release of MySQL replication.
04:41
Another interesting event at that time is that MySQL was sued by Progress, or the NuSphere company. And there was this very famous lawsuit, kind of testing the GPL in court at that time. Obviously, MySQL won, and that's why it's still continuing its journey.
05:05
Now, for me, the challenge since 2001 was stabilizing InnoDB. As usual, I kind of jumped on that with very early software, and then I learned a lot, and then I had a lot of downtime.
05:21
But that is wonderful when you're running your own startup, right? You don't have a boss to fire you. So I could have those kinds of experiments. And also we had to do a lot of work to get MySQL replication to work. Because MySQL replication at first was relatively simple: hey, let us go ahead and stream all our update statements.
05:43
Maybe we'll supply a little bit of metadata, like the time of statement execution or the random seed. But in the early days, MySQL statement replication worked only in most cases.
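To illustrate why a statement stream needs that extra metadata, here is a minimal sketch in Python rather than MySQL itself. The statement names `INSERT_NOW` and `INSERT_RAND`, the dict-based "tables", and the `apply_statement` helper are all hypothetical stand-ins; the point is only that a replica replaying non-deterministic statements diverges unless the master ships the execution time and random seed along with each statement.

```python
import random
import time

def apply_statement(table, stmt, now, seed):
    """Apply one logged 'statement' to a table (a dict) using the shipped context."""
    rng = random.Random(seed)          # seeding makes the RAND() stand-in repeatable
    if stmt == "INSERT_NOW":
        table["created_at"] = now      # stands in for a statement using NOW()
    elif stmt == "INSERT_RAND":
        table["token"] = rng.random()  # stands in for a statement using RAND()

# The master executes statements and logs each one with its metadata.
master, log = {}, []
for stmt in ("INSERT_NOW", "INSERT_RAND"):
    now, seed = time.time(), random.getrandbits(32)
    apply_statement(master, stmt, now, seed)
    log.append((stmt, now, seed))

# A replica that replays the log with the shipped metadata ends up identical.
replica = {}
for stmt, now, seed in log:
    apply_statement(replica, stmt, now, seed)

# A naive replica that ignores the metadata (its own clock, its own seed) diverges.
naive = {}
for stmt, _now, _seed in log:
    apply_statement(naive, stmt, time.time() + 60, random.getrandbits(32))
```

Anything non-deterministic that is not covered by the shipped metadata still causes drift, which is one way to read "worked only in most cases".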
06:07
Now, the next stage for me was in 2002, when I joined MySQL. And I did a little bit of development, but frankly that was not my thing. And I really transitioned to MySQL support and consulting and led what was called the MySQL High Performance Group.
06:28
In 2002, we were powering what was later known as Web 2.0. And a lot of Web 2.0 was based on MySQL.
06:41
If you really look back at the companies founded in the early 2000s, and if you look at the open source databases, MySQL had by far the majority. It may not have been the only game in town, but it was close to that. And the focus was changing to query optimization, right?
07:04
As well as working with a lot of those larger companies which would be implementing sharding. Also in 2002, we got this website called bugs.mysql.com.
07:21
Now, the interesting thing is that before that, MySQL was managing bugs through a mailing list. And the goal was that there should not be any bugs in a release. But the truth is that this was easy to achieve when you have a mailing list, because you can just forget some emails.
07:41
But as soon as we got bugs.mysql.com, I think we never had a release which had all its bugs completely closed. So that's kind of a funny thing, right? So if you want to promise there are no bugs in your software, remember, don't implement a bug tracking database.
08:04
In 2003, MySQL 4.0 was released, which had improved replication and the query cache. And the first MySQL user conference took place back in Silicon Valley. And I think that was a very important transformation for MySQL, because we brought so many people together to really compare notes
08:29
and really be able to talk face-to-face rather than just communicating through mailing lists and so on and so forth, right? And I think there was really kind of an explosion of MySQL adoption, thinking, and best practices at that time.
08:48
One of the pretty famous outcomes of that is the talks by Brad Fitzpatrick from LiveJournal. Anybody remember LiveJournal? No? Some? Okay.
09:00
Yeah, so that was a pretty popular blogging platform of the time which used MySQL. And he really talked a lot about how exactly things are organized at LiveJournal. And that was well before this really became popular, right? Because right now we see all those engineering blogs from Netflix, Facebook, Google,
09:24
everybody mostly talks about how they're building things, right? But that was not so much the case in the early 2000s. And what is interesting in this kind of back-end architecture is that today we can see a lot of very similar concepts we still use when building MySQL applications.
09:44
There would be some sort of global database with metadata, there would be some sort of shards with user data, you would probably have something like a load balancer, there would be some caching, right? With memcached and so on and so forth, right? A lot of those concepts are quite similar.
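The "global database with metadata plus shards with user data" layout can be sketched as follows. This is an illustrative toy, not LiveJournal's actual code: the `Directory` and `Shard` classes, the least-loaded placement rule, and the function names are assumptions made for the example, and plain dicts stand in for MySQL servers.

```python
from dataclasses import dataclass, field

@dataclass
class Shard:
    """Stand-in for one MySQL shard holding user data."""
    name: str
    rows: dict = field(default_factory=dict)   # user_id -> list of posts

class Directory:
    """Hypothetical global metadata store: maps each user_id to a shard name."""
    def __init__(self, shards):
        self.shards = {s.name: s for s in shards}
        self.user_to_shard = {}

    def assign(self, user_id):
        # Place a new user on the shard that currently holds the fewest users.
        shard = min(self.shards.values(), key=lambda s: len(s.rows))
        self.user_to_shard[user_id] = shard.name
        return shard

    def shard_for(self, user_id):
        # Look the user up in the global metadata; assign a shard on first use.
        name = self.user_to_shard.get(user_id)
        return self.shards[name] if name else self.assign(user_id)

def save_post(directory, user_id, post):
    directory.shard_for(user_id).rows.setdefault(user_id, []).append(post)

def load_posts(directory, user_id):
    return directory.shard_for(user_id).rows.get(user_id, [])

directory = Directory([Shard("shard_a"), Shard("shard_b")])
save_post(directory, 1, "hello")
save_post(directory, 2, "world")
save_post(directory, 1, "again")
```

Because the mapping lives in the directory rather than in a hash function, a user can later be moved to another shard by copying the rows and updating one metadata entry.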
10:02
So in 2003, if you look at what kind of best practices you would adopt for MySQL, one is caching with memcached. That became very important and popular, right? So if you can't get MySQL to handle the load, cheat by caching. And then you would also have massive replication, right?
10:22
So you would often have a MySQL node which would replicate to 10 slaves, and you would often even have people using hierarchical replication, with 10 slaves each of which would replicate to, let's say, 5 more slaves, and that kind of really massive MySQL replication hierarchy became commonplace.
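A rough sketch of how caching and read replicas combine on the read path, assuming a cache-aside pattern. The `Node` and `Cluster` classes are hypothetical, a dict stands in for memcached, and replication is simulated synchronously for simplicity, whereas real MySQL replication is asynchronous and replicas can lag.

```python
import itertools

class Node:
    """Stand-in for one MySQL server; data is a plain dict."""
    def __init__(self, name):
        self.name, self.data, self.reads = name, {}, 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

class Cluster:
    def __init__(self, master, replicas):
        self.master, self.replicas = master, replicas
        self._next = itertools.cycle(replicas)  # round-robin over the replicas
        self.cache = {}                         # stand-in for memcached

    def write(self, key, value):
        self.master.data[key] = value           # all writes go to the master
        for r in self.replicas:                 # simulated replication (async in reality)
            r.data[key] = value
        self.cache.pop(key, None)               # invalidate the stale cache entry

    def read(self, key):
        if key in self.cache:                   # cache hit: no database work at all
            return self.cache[key]
        value = next(self._next).get(key)       # cache miss: ask one replica
        if value is not None:
            self.cache[key] = value
        return value

master = Node("master")
replicas = [Node(f"replica{i}") for i in range(3)]
cluster = Cluster(master, replicas)
cluster.write("user:1", "Peter")
results = [cluster.read("user:1") for _ in range(5)]
```

In this run only the first read reaches a replica; the other four are served from the cache, and the master serves no reads.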
10:42
In 2004, MySQL 4.1 became available. And I would call that the first checkbox release, because at that time a lot of MySQL's focus was driven by sales and marketing: how can we get new features done to serve the customers who are asking for them?
11:08
So in MySQL 4.1, subqueries were introduced, but they were never quite optimized until many years later. Or prepared statements, which I think in MySQL
11:23
are still by far less mature than in Postgres or a number of commercial databases. And that was also the year when MySQL Cluster, or NDB, first became available. For myself, I started blogging about MySQL.
11:43
And I pulled out my first post on LiveJournal about MySQL, which talks about the fact that MySQL doesn't have a hash join and how you can work around it. The funny thing is that 14 years later, MySQL still doesn't have a hash join.
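For readers unfamiliar with the term, here is a minimal sketch of what a hash join does compared with a nested-loop join. The talk does not describe the workaround from the blog post, so this does not attempt to reproduce it; the table contents and function names are made up for illustration.

```python
from collections import defaultdict

def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    """Build a hash table on one input, then probe it once per row of the other."""
    buckets = defaultdict(list)
    for r in right:                              # build phase
        buckets[r[key]].append(r)
    return [(l, r) for l in left                 # probe phase
            for r in buckets.get(l[key], [])]

users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
visits = [
    {"user_id": 1, "page": "/"},
    {"user_id": 1, "page": "/faq"},
    {"user_id": 3, "page": "/"},
]
joined = hash_join(users, visits, "user_id")
```

Both functions return the same pairs; the hash join simply avoids rescanning the second input for every row of the first, which matters when neither side has a useful index.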
12:03
So I guess that is still relevant. Now, in 2005, we got MySQL 5.0, which I would call the second checkbox release, which added stored procedures, views, and triggers. But they weren't very well integrated, I would say.
12:24
You would get triggers, and they wouldn't quite mix well with replication and all this kind of stuff. But hey, you know what? We got a lot of checkboxes in MySQL 5.0. It almost looked like an enterprise database at that point.
12:42
But another important change to MySQL came when Oracle acquired Innobase, the creators of InnoDB. Now, what I think was very impactful for MySQL from outside of this community is that we got the Puppet release. And Puppet was the first of a new generation of automation frameworks.
13:04
Think Puppet, Chef, Ansible, which really started to ingrain new thinking in people: you should not be manually managing your databases, because that doesn't scale. You should puppetize everything or automate it with Chef and so on and so forth.
13:22
That wasn't quite commonplace in 2005, but that was the start of those practices. In 2006, MySQL was dealing with the fallout of the Innobase acquisition. Because even by 2006, most large-scale MySQL users would be using InnoDB as a storage engine.
13:42
So what MySQL did is they bought a company called Netfrastructure, and Jim Starkey, who was the Firebird founder, joined MySQL to implement the Falcon storage engine, which was supposed to be a much better replacement for InnoDB.
14:01
Anybody remember Falcon? Oh, yes, we have some heroes. What also happened that is interesting is that Hadoop started to become available, right? And why is that important? Because over the years, we have started to see some of those analytical workloads
14:21
crunching through a large amount of data, moving from MySQL to Hadoop, right? Because MySQL wasn't quite built for that. For myself, in 2006, I started the MySQL Performance Blog, right?
14:42
And that was a pretty popular blog about MySQL performance for a long time. And you see, we picked the WordPress theme Boxy But Gold here, which nobody else used. And maybe you can see why.
15:01
But that means, even from very far away, our website was very recognizable. And that's also the year when I started Percona, together with Vadim Tkachenko. And we were focused on performance. Percona pretty much stands for performance consultants, right?
15:23
That's where the name comes from. And our focus was helping companies to scale, to optimize MySQL. And we really worked with a lot of companies using MySQL at scale at that time. In 2008, MySQL 5.1 came out, which included features like partitioning and row-based replication.
15:49
Again, as with all those features that were added, it was understood that MySQL needed a safer and more mature replication technology than statement-based replication.
16:01
We also had Sun Microsystems acquire MySQL AB at that time, right? That's the first acquisition in MySQL history. Another thing is that this is the year when we can see the birth of the modern cloud. That is when Amazon EC2 became generally available.
16:25
And again, while the cloud was not nearly mainstream at that time, that's when MySQL started its cloud journey. In 2008, we at Percona saw a lot of what I would call the dysfunction between Oracle slash InnoDB
16:46
and Sun slash MySQL in this case, right? Because while users wanted to use InnoDB and to see it get better, MySQL and Sun wanted to downplay InnoDB and kind of hold off innovation there,
17:05
so that it would be easy to replace with Falcon. And we had a certain opportunity there. And actually a need, because we saw that it was impossible to scale InnoDB without actually writing code and fixing some problems there.
17:24
And that is how we created Percona XtraDB, the fork of InnoDB. That is also when we put in a lot of time and effort and wrote the High Performance MySQL book. It is called the Second Edition, but actually it was a completely rewritten book at that time.
17:46
And some of you may have read that. In 2009, we see the second acquisition in the MySQL timeline, right? We see Oracle acquire Sun Microsystems, and so it gets MySQL.
18:01
And oh my gosh, that was a big deal for everybody. Because at MySQL, we were out there to get Oracle, right? That was the whole idea of the company: hey, we are going to get out there and democratize the database market and really make Oracle pretty much irrelevant.
18:23
But that's not what happened, and I think that was a shock for a lot of people on the team, right? And that's also where we get, as fallout from that, a lot of those rumors that Oracle will kill MySQL. Have you heard about that?
18:42
No? Anyone? Yeah, well, so what is it? Nine years and counting, right? Yet to see. Monty started his second database company, MariaDB, at that time, or the MariaDB project, right? The company had a different name. It was to ensure the future of MySQL
19:04
and to have an independently run MySQL alternative. That is also the year when Amazon RDS for MySQL, the first database as a service, became available.
19:21
And that is also when we get MongoDB, right? I would say the leading NoSQL database was first released. In 2010, we got MySQL 5.5, which had a lot of focus on scalability, finally, right? I would say in 5.5 we got the first big improvement in InnoDB performance
19:45
because the teams were integrated, Falcon was ditched, and really a lot of focus went into making InnoDB actually work very well with MySQL. That is also where we see the first release of Performance Schema
20:01
as a way to really understand better what's going on inside MySQL. Now, if you look at the highlights of the ecosystem at that time, that is when we got the OpenStack initial release, right? And that supported another thing in the cloud ecosystem, the phenomenon of the private cloud, right?
20:21
People were able not only to use solutions like Amazon, right, using their infrastructure, but to have a completely open source way to run a cloud in their own data center. From Percona's side, we decided it's not enough just to modify the InnoDB storage engine.
20:42
There were some performance improvements and features we wanted to do on the server side. So at that time we created Percona Server, which was based on our XtraDB storage engine plus a lot of additional improvements. Also, taking hot backups for InnoDB had been a problem for years, right?
21:08
In the Innobase times, it was actually a rather inexpensive proposition to go and buy InnoDB Hot Backup from Innobase. When Oracle acquired it, it was wrapped into MySQL support, which was actually a pretty expensive value proposition.
21:23
And a lot of folks in the community were suffering from not having a good solution for open source backups. Well, so we created one, which is known as Percona XtraBackup, which I guess some of you have used.
21:42
So what challenges did we have in 2010 with MySQL? Scaling MySQL across multiple CPU cores. By the end of the first decade of the 2000s, CPUs had not really been getting faster. They had been becoming wider, right? They had been getting more and more cores.
22:01
So it was not only a question of getting some big-ass expensive servers to scale across multiple cores. Even very basic servers would have an ever-increasing number of cores, and MySQL was not really designed to scale very well across multiple cores at first. So that required a lot of engineering and performance tuning.
22:24
We also had a lot of work put into MySQL deployment automation. We had a lot of very large MySQL environments at that time, which had to be managed by very small teams of DBAs, right? And that's where automation, using Puppet to manage MySQL and stuff like that, became important.
22:45
And also, the much more prevalent cloud made MySQL automation more feasible, right? Because it's much easier to automate everything if you have a programmable infrastructure. You can spin instances up and down, right? Compared to when your automation is limited because, in the end, you have to have somebody come and rack the new server.
23:04
A specific problem in MySQL at that time was also replication failover automation. Unlike some of the modern database systems, MySQL doesn't really have replication failover built in. So for example, if you have a master and multiple slaves,
23:22
and your master crashed, then what will happen in MySQL? Well, pretty much nothing, right? You will have a bunch of slaves trying to connect to a master which doesn't exist. To recreate that replication infrastructure, you have to select the new master, adjust your load balancer configuration, and so on and so forth.
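The manual steps just listed (select a new master, repoint the remaining slaves, adjust the load balancer) can be sketched as a small simulation. This is not how any particular tool implements failover: the `Server` fields, the `applied_pos` counter, and the dict used as a load balancer configuration are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Server:
    name: str
    applied_pos: int               # how far this server has applied the master's log (simplified)
    alive: bool = True
    master: Optional[str] = None   # name of the server it replicates from, None for a master

def fail_over(servers, load_balancer):
    """Promote the most up-to-date live replica and repoint everything else to it."""
    candidates = [s for s in servers if s.alive and s.master is not None]
    if not candidates:
        raise RuntimeError("no live replica to promote")
    new_master = max(candidates, key=lambda s: s.applied_pos)
    new_master.master = None                       # promoted: no longer replicates from anyone
    for s in candidates:
        if s is not new_master:
            s.master = new_master.name             # repoint the remaining replicas
    load_balancer["write_target"] = new_master.name
    return new_master

servers = [
    Server("db1", applied_pos=1000, alive=False),  # the crashed master
    Server("db2", applied_pos=990, master="db1"),
    Server("db3", applied_pos=998, master="db1"),
]
lb = {"write_target": "db1"}
promoted = fail_over(servers, lb)
```

A real failover also has to bring lagging replicas up to the promoted server's position and make sure the old master cannot accept writes if it comes back, which this sketch leaves out.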
23:41
That had to be done manually, and there have been a number of tools created at that time to help with that, such as MMM (Multi-Master Replication Manager for MySQL) or later MHA. In 2012, looking at that problem as well as the prevalence of the cloud, we introduced our Percona XtraDB Cluster technology,
24:05
which was really the new generation, at that time, of cloud-friendly high availability for MySQL. In 2013, we get MySQL 5.6, and that was, again, focused on better scalability, Performance Schema,
24:24
getting GTIDs to make replication at least somewhat easier to manage, and a lot of improvements to the optimizer. For example, remember, I mentioned that subqueries were introduced in MySQL 4.1, but many common subquery types couldn't really finish executing within the century.
24:48
In MySQL 5.6, a lot of such optimizations were added. And that is also the year when the initial release of Docker happened.
25:00
Just for background. In 2014, we have the first release of Amazon RDS Aurora. And why is that interesting, I think? Because Amazon Aurora is the new kind of software which comes with cloud systems,
25:21
which is kind of based on open source, but not open source itself, and which really promises some additional features, additional scalability, and so on and so forth, but at the risk of very serious lock-in, similar to what pretty much comes with proprietary software.
25:49
In 2015, we got MySQL 5.7 available. In MySQL 5.7, it's kind of getting boring, right? Again, even more scalability.
26:02
We have a JSON document store, parallel and multi-source replication. And this is where we see a lot of NoSQL, and especially JSON-focused, solutions giving MySQL some heat. Because by that time, a lot of the new generation of developers
26:21
would not really understand relational algebra and this kind of complicated language called SQL, right? They would write stuff in JavaScript. They would understand JSON objects, right? And that is the way they would like to work with persistence. That is why a lot of them would use MongoDB at that time, right?
26:44
And MySQL responded to that trend by starting to introduce JSON functions, and also the document store, a MySQL feature which provides an interface quite similar to MongoDB.
27:01
That is also the year when ProxySQL was initially released. And ProxySQL, I think, is an open source tool which you hear a lot about these days if you're part of the MySQL community, and which really has a lot of very nice features for load balancing and otherwise managing MySQL traffic, right?
27:23
You can use it for all kinds of things, from implementing a database firewall, to caching, to load management, right? Or implementing some basic sharding. So that is a very important technology for the MySQL
27:41
ecosystem these days. From our side at Percona, that is when we acquired a company called Tokutek, which had the TokuDB storage engine, to integrate that into MySQL. What kind of challenges did we see with MySQL in 2015?
28:04
Well, as I mentioned, that's when we get a lot of NewSQL and NoSQL solutions coming up. And the wonderful thing with those solutions is there was no need for manual sharding anymore, right? If you look at solutions like MongoDB, Cassandra,
28:22
and so on and so forth, right? They have sharding, which is pretty much done automatically for you. When you need to scale, you just add some more nodes to the cluster, and that works. Yes, you have to trade a lot of SQL features for that, right?
28:41
But you get much easier scalability, which is good enough for some applications. Another benefit developers saw in those NoSQL solutions is that they were schema-less. That means you would not need to manage a schema. Like, hey, if you need to add another column,
29:02
you don't need to alter the table, right? Which can take a lot of time and be quite painful. And as I mentioned as well, there were many developers who didn't quite understand SQL, and that was another issue at that time.
29:23
Now, another challenge we saw at that time is that MySQL single-thread performance was getting worse release after release. And remember, with CPUs not getting faster but getting wider, that was really a problem,
29:42
in that we might not get better performance from MySQL upgrades and CPU upgrades compared to previous releases. So here are some graphs to illustrate that. Now, as you can see here, these are results from different MySQL versions.
30:00
You can see that at high concurrency, every new version would generally provide performance improvements, right? You could argue about the shape of this graph or how much better, but generally, the new version would provide better performance at scale. But at the same time, if you look at single-thread
30:24
performance, that was kind of going down, right? And here is the link to the blog of Mark Callaghan, who is the self-proclaimed champion of making MySQL single-thread performance suck less, right?
30:43
So he has been doing a lot of investigation into this problem, attracting community attention to it. Now, in terms of 2016, there were no big MySQL releases that year,
31:00
but I think there are a couple of important things which happened. One is, from our side, we released the first version of our Percona Monitoring and Management product. And why do I think that is important? Because if you look at the MySQL ecosystem, while there is good open source core database software,
31:22
right, you can think about MySQL Community Edition, Percona Server, MariaDB, they're all open source, right? When it comes to such a key component as a monitoring tool, it generally had to be either SaaS or proprietary. MySQL Enterprise Monitor, Monyog, VividCortex, New Relic,
31:42
right, all of this stuff is not open source. And yes, you can go ahead and build something yourself, you know, using Nagios and some plugins, right, or maybe Zabbix, but a lot of that would be do-it-yourself, which would be, well, pretty hard, right? So that is why we decided to focus our attention
32:03
on the problem of open source monitoring and management first. Another thing which I think is wonderful is that ClickHouse was open sourced, right? And while it has nothing to do with MySQL, the nice thing about it is that it uses the SQL language, it is very fast and massively parallel, and, with the help of ProxySQL, it
32:23
can execute queries over the MySQL protocol, and you can replicate data from MySQL to ClickHouse, in many cases more easily than with something like Hadoop. And I think that is a very cool solution if you don't quite want to go all the way to the kind
32:41
of big data and Hadoop infrastructure, but you want your analytical queries to run much, much faster. Even on a single node, ClickHouse may be about 100 times faster than MySQL, right? And it can scale to a cluster of hundreds of nodes.
33:01
In 2017, we get MySQL Group Replication and also the InnoDB Cluster product, which is closely related. And I think a lot of that was based on the success of Galera replication technology and Percona XtraDB Cluster, which is our product based on that.
33:20
It shows that people do want to have something more managed and easier to use than MySQL asynchronous replication. In 2018, to date, we get the MySQL 8.0.4 release candidate,
33:40
which was just recently released. A lot of very cool stuff, right? I mean, I can't wait for MySQL 8 to finally go GA, and I hope that will happen in the next few months, right? That's what a release candidate is supposed to mean, hopefully. And from our side, we have shipped the RocksDB, or MyRocks, storage engine as part of Percona Server.
34:03
And what MyRocks is, and what's wonderful about it, is that it is a storage engine which Facebook uses to handle a massive amount of MySQL data much more cost effectively in terms of disk space, SSD wear, and performance compared
34:21
to what we ever could get with InnoDB, right? And so now that's available to you in an easy-to-use package from Percona Server. Now, let's look at what the state of MySQL really is overall
34:40
in 2018. I believe by now it is clearly not the only open source database anymore, right? There are other open source databases. PostgreSQL has been growing very rapidly and has been very successful, I think, over the last three to five years. We also have a lot of non-general-purpose
35:00
databases, right? Like there is InfluxDB for time series data, right? There are solutions such as Neo4j if you are looking for graph databases, right? And also Elasticsearch, right? Again, if you have certain queries
35:21
where you want to have full-text search and some other applications, Elastic is much, much better for that than MySQL. But obviously, MySQL is still the number one open source relational database. And I think it really has shown how
35:42
you can use it very successfully with other database technologies as a part of your data layer. If you look at any large-scale technology company these days, they typically are not using MySQL alone. Most commonly you find it being used with, for example,
36:00
Redis for caching, or maybe still Memcached, Elastic for full-text search, and then data will probably be replicated to something like Hadoop for big data analysis. And that replication will use technologies like Kafka, right? So we see MySQL taking a very important place,
36:21
but within a portfolio of multiple technologies in the modern open source data stack. We also see MySQL deployed very commonly in the cloud, increasingly commonly, both as do-it-yourself environments on EC2 as well as with database-as-a-service providers.
36:44
And you can see that all major cloud providers now have a MySQL-compatible, though not open source, database as a service.
37:01
Let's also look at modern MySQL scalability and what that means, right? These are, again, graphs from Oracle marketing, right? So they probably show us the best-case scenario. And we can see here that MySQL can reach more than a million queries per second, right?
37:22
On a single box. That is a lot of queries. And yes, of course, there is some big-ass box that was used here, and these are some very simple queries. But if you really look at a single MySQL instance, it can do hundreds of thousands of queries a second
37:43
of medium complexity. You can generally do tens of thousands of updates or other kinds of operations and traverse tens of millions of rows in your database.
38:00
And you can run successful databases of multiple terabytes in size, right? And again, these are not some extreme numbers, right? We had customers running a 50 terabyte database on a single node, right, for example. Well, that is kind of painful. I'll be honest with you. You know, 50 terabytes on a single node is painful,
38:23
whatever database it is. But some people are doing that, and that's possible. Now, let's do some math to understand what exactly those numbers mean, right, for developers and what kind of applications we could build
38:40
on a single MySQL node. We probably don't want to go with a single MySQL node. We probably would have some sort of replicas, if not for reads, then at least for high availability. But let's just do some math. So even if we reduce those Oracle marketing numbers
39:00
to a much more conservative 100,000 queries per second, and we assume that our application requires 10 queries, right, to serve a user interaction, right, which is pretty common for mobile apps, that gives us about 10,000 of those user interactions a second, 10,000, right, which it can support,
39:21
which gives us this kind of big number of user interactions a day. And if you think about user engagement of about 30 of them per day, that can give us about 20 million daily active users. Of course, you'll say, well, Peter, it cannot be so even, but even if you think about your kind of classical skew across hours,
39:48
you're still looking at 10 million, 15 million users that just a single MySQL can support, right? And why am I talking about this?
40:01
Because MySQL gets a lot of bad press because it is painful to shard. But I would say that for 90%, maybe many more, of the applications out there, you don't really need MySQL sharding, right? And MySQL plus replication can be a wonderful database
40:22
for building applications of medium size, right? Because the majority of applications which are built and exist are not Facebook or Google scale; they are of a much more manageable scale.
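The back-of-the-envelope estimate above can be sketched as a short calculation. This is only an illustration: the 100,000 queries per second, 10 queries per interaction, and 30 interactions per user per day come from the talk, while the function name and the peak-to-average factor of 2 used to model hourly skew are assumptions made here. With perfectly even load the arithmetic gives an upper bound of roughly 29 million daily active users, which the talk rounds down to about 20 million; with the assumed skew it lands inside the 10 to 15 million range mentioned.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_active_users(qps, queries_per_interaction,
                       interactions_per_user_per_day,
                       peak_to_average=1.0):
    """Rough capacity estimate for one MySQL node (hypothetical helper).

    peak_to_average is an assumed skew factor: the node must handle the
    peak hour, so the average sustained load is peak capacity divided
    by this factor. 1.0 means perfectly even load.
    """
    peak_interactions_per_second = qps / queries_per_interaction
    average_interactions_per_second = peak_interactions_per_second / peak_to_average
    interactions_per_day = average_interactions_per_second * SECONDS_PER_DAY
    return interactions_per_day / interactions_per_user_per_day


# Numbers from the talk: 100,000 QPS, 10 queries per interaction,
# 30 interactions per user per day.
even_load = daily_active_users(100_000, 10, 30)

# Assumed skew: peak hour is twice the daily average (illustrative only).
skewed_load = daily_active_users(100_000, 10, 30, peak_to_average=2.0)

print(f"even load:   {even_load / 1e6:.1f} million daily active users")
print(f"skewed load: {skewed_load / 1e6:.1f} million daily active users")
```

Changing the assumed skew factor or the queries per interaction shows how sensitive the estimate is; the point of the exercise is the order of magnitude, not the exact figure.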
40:40
Okay, so what kind of challenges do we see for MySQL in 2018? Well, one is increasing security and compliance requirements. That is a challenge for everyone in this space, right? Especially in Europe, with the GDPR deadline looming over all of us.
41:02
And that really creates a lot of distraction for engineering teams, if you like. As well, it requires product changes that go against usability, which is another requirement, right, which modern developers are pushing for. MySQL in containers, while that's very much wanted,
41:23
that is not really well developed, right? And I think in general, running databases with strong requirements for persistence in containers is kind of still evolving as of right now. We still don't have an easy-to-use scale-out solution.
41:42
MySQL sharding is a pain now, as it was 10 years ago. We can see some new developments like Vitess, which are designed to make it less painful, but they are not so widely adopted yet. Parallel query execution is still not there, right?
42:02
And it's needed now more than ever, right? For MySQL to be more usable for handling medium-complexity queries. Another challenge for MySQL, I believe, is the GPL license, right? Why is that? Because it kind of gets squeezed between permissive licenses,
42:25
like, for example, the one Postgres has, which allow everybody to use the software without fear that they may be violating some GPL license, especially in a not completely open source environment, right?
42:40
And while I understand all of us here love open source, well, complete open source nirvana is just not a reality yet, right? And the fact that MySQL is GPL really restricts adoption in some cases. But on the other side, the GPL does not protect MySQL
43:02
from getting free riders in the cloud, right? If someone wants to run their own cloud MySQL version, right, like Amazon does, they can easily do that, right? In this case, a license such as the AGPL, which MongoDB has,
43:25
is much more protective for a company, right, which does not want somebody to just take their IP, add something on top of that, and run it in the cloud. I also believe that what is very painful and disruptive for MySQL
43:44
is the Oracle reputation. Because there are not too many people who like Oracle. Well, I am not going to run a poll and ask you to raise your hands, to save embarrassment to some of the Oracle folks we have in the audience,
44:01
but believe me, not too many people like Oracle in the open source community. But the fact is, in my opinion, Oracle has done a fantastic job, and all those new MySQL versions, 5.5, 5.6, 5.7, released under the Oracle umbrella,
44:26
have very solid engineering, performance improvements, and I would say better quality than a lot of MySQL versions shipped before that. Now, the real impact of this open source community dislike of Oracle, and this kind of bad reputation,
44:41
is that MySQL was removed from a number of Linux distributions for what I believe are very misguided reasons. And I think overall that really does not help the MySQL community. Okay, well, that's all I have to say about MySQL and its scalability,
45:03
but if you would like to learn more about MySQL, about other open source databases, I would welcome you to join us for Percona Live Conference, taking place in April this year in Santa Clara, United States. And now I'm happy to answer some questions.
45:36
If there are any questions, please raise your hands, and we'll come with a microphone to you.
46:19
That's fine.