We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale

00:00

Formal Metadata

Title
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale
Title of Series
Number of Parts
611
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year2017

Content Metadata

Subject Area
Genre
Abstract
As proxies (and database routers) go, the first one I ever used was the nowdeprecated MySQL Proxy. Since then, I've managed to use MariaDB MaxScale quitea bit (including its fork AirBnB MaxScale), played around with ProxySQL inrecent time, and also started taking a look at MySQL Router. In this quick20-minute overview, we'll discuss why these three exist, a feature comparison,and reasons when to use the right tool for the job.
Proxy serverRouter (computing)Goodness of fitComputer virusProxy serverRouter (computing)Computer animation
Proxy serverClient (computing)Server (computing)Sanitary sewerProxy serverScaling (geometry)Client (computing)Server (computing)Medical imagingThread (computing)Metropolitan area networkBenchmarkCartesian coordinate systemFront and back ends
Proxy serverEnterprise architectureStatement (computer science)Query languageInterpreter (computing)Medical imagingProxy serverCodeQuery languageGame theoryProduct (business)Scripting languageDatabaseInterpreter (computing)Statement (computer science)Scaling (geometry)Computer animation
Menu (computing)Proxy serverAlpha (investment)Level (video gaming)Beta functionCommunications protocolComputer architectureMultiplicationRouter (computing)
Scale (map)Menu (computing)ArchitectureQuery languageDatabaseLogical constantState of matterScaling (geometry)Filter <Stochastik>Right angleQuicksortInformationLogin
Menu (computing)Front and back endsScaling (geometry)Front and back endsMaxima and minimaClient (computing)Binary fileCASE <Informatik>Server (computing)Firewall (computing)Mathematical analysisSoftware developerLoginNetwork topologyQuery languageParsingBranch (computer science)Open sourceRegulärer Ausdruck <Textverarbeitung>Statement (computer science)Product (business)DatabaseCustomer relationship managementLevel (video gaming)Revision controlRight angleReplication (computing)RoutingLibrary (computing)System callMessage passingScripting languageSyntaxbaumParsingXML
BlogPlug-in (computing)Front and back endsContext awareness
Real-time operating systemMathematicsClient (computing)Scaling (geometry)Maxima and minima
Query languageScaling (geometry)Link (knot theory)Client (computing)Server (computing)Connected spaceNumberDirection (geometry)DatabaseProxy serverFront and back endsAuthenticationQuery languageTouch typingCounting
Repository (publishing)Event horizonSource codeClient (computing)Query languageRepository (publishing)AdditionMathematicsLibrary (computing)Event horizonSource codeClient (computing)Open sourceMotion captureMaxima and minimaSequelMereologyScaling (geometry)
InferencePlane (geometry)DatabaseSource codeOpen sourceMultiplication signScripting languageCASE <Informatik>Scaling (geometry)DatabaseProduct (business)Instance (computer science)Limit (category theory)Source codeXML
Stallman, RichardQuicksortSoftwareScripting languageSource codeXML
Router (computing)ArchitectureChemical equationServer (computing)Front and back endsQuicksortBitAxiom of choiceRouter (computing)Cartesian coordinate systemPlug-in (computing)Server (computing)Interface (computing)Front and back endsComputer animationXML
ArchitectureRouter (computing)Chemical equationFront and back endsServer (computing)Pairwise comparisonNetwork topologyVideoconferencingCircleDatabaseRouter (computing)Group actionSoftware testingProxy serverConnected spaceMultiplicationRun time (program lifecycle phase)Multiplication signLastteilungDistribution (mathematics)System administratorRewritingQuery languageCustomer relationship managementCartesian coordinate systemYouTubeBlogFront and back endsServer (computing)Replication (computing)Point (geometry)Software frameworkOperator (mathematics)AuthorizationHeegaard splittingRoundness (object)Electronic mailing listTrailPlug-in (computing)Gastropod shellPairwise comparisonConfiguration space
BlogClient (computing)EncryptionRouter (computing)Proxy serverDemonBinary codeOpen setEmailRouter (computing)DebuggerIntegrated development environmentWebsiteSoftware bugScaling (geometry)Electronic mailing listProxy serverEncryptionStreaming mediaCartesian coordinate systemBinary fileClient (computing)LoginFront and back endsPairwise comparisonMultiplication signDemonComputer animation
Projective planeLoginDemonProxy serverProduct (business)
Router (computing)Projective planeLine (geometry)Game controllerProxy serverFlow separationBlogExtension (kinesiology)SoftwareRouter (computing)QuicksortMultiplication signINTEGRALXMLComputer animation
Proxy serverStatisticsMultiplicationProcess (computing)Projective planeComputer animation
Punched cardMaxima and minimaMultiplication signProxy serverScaling (geometry)Source code
GoogolOpen setQuicksortProduct (business)Open sourceSource code
Router (computing)Axiom of choiceServer (computing)Customer relationship managementDisintegrationMultiplication signInsertion lossRouter (computing)XML
Axiom of choiceOpen setPower (physics)Router (computing)DatabaseYouTubeView (database)Server (computing)Installation artComputer configurationAdditionGene clusterProxy serverPoint cloudLoginScaling (geometry)Binary fileSoftware testingOpen sourcePerpetual motionoutputXML
Connected spaceTransport Layer SecurityQuery languageLimit (category theory)Regular graphXML
Limit (category theory)Query languageSoftware testingProxy serverGroup actionGoogolXML
Electronic mailing listInternet forumEmailHand fanRouter (computing)Group action
Router (computing)Discounts and allowancesGroup actionOnline chatCodeInsertion lossDiscounts and allowances
2 (number)Open setJSON
Proxy serverLattice (order)Process (computing)Hidden Markov modelMathematical optimizationJSON
Regulärer Ausdruck <Textverarbeitung>Mathematical optimizationJSONUML
Range (statistics)Level (video gaming)Proper mapJSON
Level (video gaming)Router (computing)Term (mathematics)Limit (category theory)JSON
Proxy serverJSON
Server (computing)Proxy serverCartesian coordinate systemStructural loadSemiconductor memoryCASE <Informatik>JSON
Virtual machinePresentation of a groupProxy serverInstance (computer science)Server (computing)Product (business)JSON
Computer animation
Transcript: English(auto-generated)
Hello. Good afternoon, everybody. Thank you for coming in this rainy afternoon to hear about the various MySQL proxies. Before we start, I'm curious to know how many here use MySQL Router, or have tried to use it, at least.
OK, what about proxy SQL? Oh, wow. And what about MariaDB MaxScale? OK, excellent. So good. So we can start in 20 minutes. So I'm Colin.
I work at Percona, Inc. And before that, I was at MariaDB as well, doing stuff on MariaDB Server. So a quick question is, what is a proxy? Basically, it's a very lightweight application between the clients and the server. It has to be lightweight, because otherwise, you
introduce latency for not much gains. And there have been benchmarks done by multiple people. So you can see benchmarks against other proxies to see how lightweight it is. And typically, some proxies will use things like EPUL and so forth, and the threading will change as well. It is largely a man in the middle,
you yourself, installed between a client and server. And of course, it can communicate with many clients as well as many server backends. And this idea of proxies, not very new. So this is an image taken from Giuseppe Maggia from 2007.
And he couldn't join us this year at FOSTEM. But this is an image giving you an idea of what a proxy does and leading you on to the fact that there is this thing called MySQL proxy. MySQL proxy was around 10 years ago. It was probably the first ever MySQL proxy out there.
It had an embedded Lua interpreter. Generally, you could define what it could do with query passing through it. And the Lua interpreter was handy, because back then, a lot of people would play this game called the World of Warcraft. And a lot of people could know how to script in Lua. So you could write really short Lua scripts for MySQL proxy.
And today, other databases with an embedded Lua interpreter include things like Tarantool. It is still used in MySQL Enterprise Monitor, from what I gather. And the Lua was extremely flexible. You could rewrite queries.
You could add statements. You could filter. And you could just write these short chunks of Lua code. Lua was very easy to pick up and very easy to write for. It is unfortunate that this product is sort of archived now. But never fear, because there are now multiple other products
for you to use. So we'll start with MariaDB MaxScale, because I think this was, by and large, the first GA-ready proxy out there. It became GA in January of 2015. And the idea was percolating for a lot longer before that.
And there were multiple alpha and beta releases before that. It is a level seven proxy router. It understands the MySQL protocol. It has a fully pluggable architecture. So what MaxScale does is it monitors the state of all database nodes constantly.
And you can also react to monitoring-based information or hints even from filters. So MaxScale largely became sort of this Swiss army knife. It was extremely pluggable. You could also do things like logging.
You could write to other backends besides MySQL. The idea of MaxScale was such that it should be able for you to translate even to other non-Mysql backends if need be. It also has a DB firewall filter, which you can use. Because you can't parse SQL through regular expressions.
You actually need to use a parse tree. You can feed it regex, but the parse tree needs to exist. You could also route via hints. You could rewrite queries. All this is still possible. And the query-level analysis, you could also filter logs. You could also call external tools with MaxScale, things like MariaDB Replication Manager
or even MHA via scripting. It also provided schema-based sharding. One of the most popular use cases for MaxScale that became rather famous was the binary log server, popularized by Booking.com.
So a couple talks ago, Jean Francois, he's got excellent talks about using the binlog server. And the idea there was to not use intermediate masters. So you would have masters served by MaxScale and then get all the slaves to read off MaxScale. And this has been obviously used in production and other database, heavy users of databases
have also got their own versions of binlog servers that are maybe not open source yet. And one of the most popular use cases was just to put it sitting in front of a three-node Galera cluster, three-node being the minimum to start a Galera cluster.
And the MaxScales from 1.0 right up to 1.4 is linked against MySQL client library, so either libmysqlclient or libmariadbclient, because you do need to have access to the parser. And for what it is worth, it is still getting development, the 1.4 branch is still getting development.
If you pay attention to the GitHub tree, you'll realize that there should have been a release on the first of February to fix the blocking of prepared statements, but that release hasn't been out yet. MaxScale also managed to create some kind of ecosystem around it.
The first known plugin was a Kafka backend written by Eve Trudeau. If you look at GitHub logs, it started probably around February of 2015, so shortly about a month after it became GA, and it was announced around middle of that year. And the context, obviously,
was to have real-time change data capture that would come in via your MaxScale client to a Kafka backend, and you'd see this only appear in MaxScale 2.0, which was something that got released in August or September of 2016.
And there was also the first known credible fork called Airbnb MaxScale. I say credible because you may have read news articles about another fork called GPLScale, and that was just a fork out of anger, I think. It was not a real fork that you could use.
And Airbnb MaxScale, the idea was to have a database proxy focusing on connection pooling, reduce the number of direct connections to the MySQL database. They deployed Airbnb MaxScale on all of Airbnb from early 2016, and it's pretty much powering all MySQL that touches it there.
And some of the features I've listed up there, and generally speaking, if a client completes a successful authentication with a backend MySQL server, Airbnb MaxScale will then serve the link between the backend connections and the client connections and park it in a connection pool itself of the backend server.
Now, denialist query rejection, as an addition for them, was quite important because sometimes they use Ruby, and the Ruby VM will trash, and when it trashes, they've also seen bad queries go to do delete where zero equals zero,
and MaxScale would actually stop that from happening. This was obviously no fault of MySQL. It was a fault of their Ruby VM crashing, and this is not a problem that only Airbnb faced. Other people have faced it, too. So having Airbnb MaxScale is quite useful.
So we are at FOSDEM, and this is a free and open source conference, so I have to apologize that I spoke about MaxScale for length, but the abstract did say I was gonna speak about MaxScale, so I would be cheating you if I didn't. So MaxScale 2.0 came out August or September of 2016.
Same GitHub repository. It is not linked against the MySQL client libraries any longer. It's replaced with SQLite, and SQLite is now known to parse MySQL. You can also do change data capture to Kafka. You can also push your binary log events to Avro or JSON,
but the most important thing was the license had changed. It went from GPL V2 to this thing known as the business source license. How many of you are familiar with the business source license? Well, that's quite a lot of you. Hmm? It was in the news, yeah, hard to avoid, yes, okay. So the business source license is time delayed open source.
It is not the first time this has been tried. GoScript and the Aladdin license did this as well. It was GPL after a year during the GoScript days. In this case, they tell you you are not allowed to use Maxwell in production if you have, well, more than three database instances in a free fashion. I don't know how that is enforced
except through this use limitation. And after three years, it becomes GPL. And Richard Stallman himself, back in the day, during the GoScript Aladdin license debacle, he said he considered it a problematical compromise because it gave us free software after a year.
But thankfully, that was back in the GoScript Aladdin days. I'm guessing many of you don't even remember GoScript or Aladdin, because that was in the mid-'90s when free software was sort of just, sort of becoming a bit more famous. So thankfully, we have choice now.
MySQL Router, fully GPL v2, so thank you Oracle. It became GA in October of 2015, and they made a Labs release about a month before that. And I guess they had, if you look at the release notes, you also see that they had internal releases, so they were playing with this for a while.
The idea there is obviously to do transparent routing between all applications and backend MySQL servers. They also have a plugin interface via a harness that you could use. So the harness provides dependency tracking, loading and unloading of plugins, configuration, logging framework, and so forth.
It can do failover, it can do load balancing. It can also distribute application connections in a round-robin fashion. So it'll forward the MySQL packets to a backend server without ever inspecting or modifying them, so you get maximum throughput. And I think one of the coolest things
is that this is one of the key features that you need if you wanted to start playing around with MySQL InnoDB cluster. And your host, Frederick, has actually got a less than five-minute video on YouTube that you could watch in your own time. So just search for MySQL InnoDB cluster on YouTube, and you will be able to see it.
It's a very good video about how you can use MySQL with router and group replication and the new shell, which I'm guessing they were talks about, or will be talks about. And then there is ProxySQL. The main author, Rene, is actually in the audience.
It has been stable since December 2015. There has been many times at many conferences that people did talk about ProxySQL. I would say 2015 was the year we went full circle on proxies, right, because we had three proxies become GA in one year. And the selling point for this is it's by DBAs, for DBAs.
And at Percona, we obviously like ProxySQL a lot, so we've included it inside of Percona HDB cluster 5.7. We've also got a ProxySQL admin tool that you can use for configuring your PXEs to have ProxySQL in front of it. And the idea behind this, of course, is to improve your database operations,
have HA topologies and so forth. It also does things like connection pooling, read-write splitting, things you'd expect from a proxy. I think the very important things for me, as opposed to just going through this entire list, is that it's runtime reconfigurable, so you don't have to restart your ProxySQL.
This is extremely friendly for users, our DBAs. And the monitoring built-in is also extremely useful. This monitoring can also then be pushed out to things like Percona monitoring and management and so forth, or consumed by other things. And yeah, it does query rewriting.
And there have been multiple blog posts and performance tests done on this. And I think one of the best things about ProxySQL is that Rene has spent some time comparing ProxySQL to other proxies out there, including HAProxy and GenX. And he released this sometime in January of this year.
So well worth taking a look at this comparison, because hey, if you find a problem with said comparison, you're always more than welcome to actually report a bug on the mailing list. ProxySQL, of course, has a few missing features if you want to compare it to MaxScale. One of them is front-end SSL encryption,
so that's going from client SSL encryption to the proxy to the application. It does support back-end SSL encryption. And there is an open issue, issue 891. This is kind of important for HIPAA environments and so forth. And I believe it is on the roadmap. It will get fixed in time. So if you care, you can track this on GitHub.
There is no binlog router. And from what I understand, very large internet sites may open up their binlog routers for you to end up using. If you want to stream binary logs to Kafka, there is this wonderful thing called Maxwell's Demon that you can also use. So it's another tool that is actually used in production.
And binlogs to Avro. How many here use Avro, out of curiosity? That's what I thought. And I think the cool thing about ProxySQL is it has lots of great resources. It's great when other people talk about your project,
not just you talking about your project. And we see things, we see Mark Matuso have extensive blogs, several lines. Cluster control also works with ProxySQL, and they talk a lot about it and how you can use cluster control. Pythian talks about it. Percona obviously talks a lot about it. So you can get a lot of resources.
And since we are at FOSDEM, it's also worth talking about the health of these projects. So they're all at GitHub. And Maxwell has been around probably the longest. Router, like every other MySQL software,
is not really developed in the open, so it is sort of dumped onto GitHub from time to time, I guess. Which is why you may see less contributors and so forth, but that's in no way the integration of how the software is being developed. It's just a different process. And ProxySQL also has a bunch of many stars,
multiple forks. The other thing is to see pull requests. There are a lot of pull requests in ProxySQL as well. So do these statistics tell you that the project is healthy? Kind of, if you see regular commits, issues being closed, it shows that people care.
GitHub has this other very interesting thing called punch card. And I thought this was kind of appropriate for FOSDEM as well, because this is the punch card for MagScale, which you see lots and lots of activity from Monday to Friday during working hours. Seems fair.
This is the punch card for ProxySQL, where you see activity all the time. Not because Rene is madly typing away. There are many other contributors to this, but I think this is sort of the spirit of open source, is you also work not when you're paid to work on it.
You work on it because you love working on it, and you improve it, and that shows in the product. And a metric that I don't think is a metric, but if you cared about Google Trends, this is a joke, of course. This is insert time to laugh. This is Google Trends thing, and the red is ProxySQL,
and it starts spiking a lot in August of 2016, and I'm guessing you all read the news and know why. So, what do you use as we come to wrap this up? I think MySQL Router is going to be
very interesting going forward, so it is something to definitely watch. I think ProxySQL is great for you to use today, because it's got a lot of support, it's well integrated, it works with other additional tools like PMM and so forth.
And if you need the binary log router today, before someone else makes a binary log router open source, there is GPL MaxScale that you can use that works, and I highly recommend you, if you need a binlog router to use that. But there's also one other consideration to think about, is that what server are you using?
If you are using MySQL and Percona Server, you can use all these tools, and it wouldn't be a problem. If you're using MariaDB Server, it is pretty obvious MySQL Router will not work for you, so that one choice is now gone,
and that will only leave you the option of ProxySQL and MaxScale. So, again, server choices also play a role, and I would be remiss not to mention Vitesse. I didn't put that in the abstract, but Vitesse doesn't sell itself as a proxy either. It is a database clustering solution for horizontally scaling MySQL, and it powers YouTube.
That alone should make you sort of open your eyes, so if you go to vitesse.io, you will actually be able to try this out with something like Kubernetes. You don't have to run it in the cloud, you can run it yourself, and if you run it yourself, you install something like etcd or zookeeper,
so you also ensure your cluster view is always up to date. Vitesse is definitely interesting, because it makes connections using very lightweight BSON, that each connection only uses 32 kilobytes of RAM. This is very unlike regular MySQL connections, where they could be anywhere between 256 kilobytes of RAM to even three megs.
It also does the whole rewriting of queries, so you can, you know, no limit clause, no problem. You can make sure that you get the query, you don't see a limit, you add it automatically. So, and Vitesse of course works with MySQL and MariaDB as well.
So there are a bunch of resources that are worth taking a look at. The proxy SQL Google group is extremely active. The MariaDB MaxScale Google group is also quite active. MySQL router doesn't have a mailing list, but it has this forum, which you need to log in to use.
I'm not a huge fan of forums, but it is a discussion place. And the Vitesse forum is extremely active as well, and they have a Slack group as well, so you can join your Slack group and chat. I'm guessing not many people here like to use Slack. We are at Fostem.
Okay, so, before I wrap it up, insert ad, Prokona Live, if you're interested in MySQL, you should probably be there, and you get your 30% discount if you use that code.
20 minutes and 43 seconds. Thank you. I'm open for questions. I'm open for questions if you have any, or if you plan to shuffle. Yes.
Can the proxy rewrite a sub-select to a join, for example? Why do you want to rewrite sub-selects? Now, if you use MySQL 5.6 or MariaDB 10, they can process sub-selects.
Join is more performant. Hmm, that is a good question. I have not had to do this recently, because I trust the optimizer, but this is possible, maybe, Rene? If you use regular expressions, you could probably get it going. But yeah, I don't think you need to now,
because it kind of just works in the optimizer. Trust the optimizer. Any other questions? Here, oh, sorry, I couldn't see. Yeah, so the question is, what if you want to do sharding?
And yeah, so they do like range-based sharding, so you can get some level of sharding. It's not a proper sharding solution, but you can get some level of sharding, and tools. Yes, they do.
Well, this router, kind, not yet, but it's in the roadmap, yeah. So, the other two do. And Vitesse obviously does sharding, too. But much harder to set up. By schema, yes.
So, they have their own limitations, in terms of sharding. Question.
So, the question is, where would you put the proxy on the application servers, or dedicated servers, and the advantages and disadvantages? If you're going to use it for load balancing, maybe you want to keep it on a dedicated server. I would say dedicated would make more sense, in case your application server goes down,
or runs out of memory, and kills your proxy. So, I'd just keep it on dedicated machines. Though, for what it's worth, it is from Alkin and Rene. A while back, they made a presentation about how proxy SQL has, one proxy SQL instance has, you know, hundreds of servers sitting below it, as well.
And it works in production. So, I would just use a dedicated machine, overall. Okay. Thank you.