
Scaling MySQL with Python


Formal Metadata

Title: Scaling MySQL with Python
Author: Roberto Polli
Part: 32
Number of Parts: 173
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, copy, distribute and make the work or its content publicly available, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in modified form, only under the terms of this license.
Production Place: Bilbao, Euskadi, Spain

Content Metadata

Abstract:
Roberto Polli - Scaling MySQL with Python

Python is the language of choice for the orchestration part of MySQL 5.6. After a brief introduction to the MySQL replication architecture, the talk presents the Python utilities released by MySQL:
- a set of drivers in pure Python
- mysql-utilities for replication, management and failover
- Fabric, a tool for scaling, sharding and provisioning new servers

You will see how to:
- create resilient configurations in minutes
- use mysql-fabric to create highly available infrastructures

As a plus, we'll show how we implemented a Fabric provider for provisioning new databases via Docker.
Transcript: English (automatically generated)
So this is our agenda. After a brief introduction to MySQL, I will show you what the Python utilities are, how to use them to connect to the database in a fault-tolerant way, and how to create, manage and scale a fault-tolerant infrastructure. Then, towards the end, a few drops of cloud: how to deliver MySQL in a cloud infrastructure.
As I said, I have to thank my company for financing this long trip from Italy to Bilbao. Par-Tec works on migrations of big infrastructures and provides business intelligence solutions and vertical applications for the financial market.
So, you know MySQL. It's a single-process, multi-threaded architecture. What you may not know is that it has a two-tier architecture.
One layer provides the SQL and connection services; the other provides persistence, and it sits behind an API so that anybody can write their own storage backend plugin. I strongly suggest you follow the development of MySQL, because the newer versions are very, very nice. Among the backends you may already know there is InnoDB, which provides MVCC (multi-version concurrency control) plus redo and undo logs, so everything you need for a consistent database. Another important thing is that replication is based on change logs: every transaction is recorded
and handed over to the slaves. Those transaction logs are called binary logs. So, what do we have to do to manage and use a database? It's almost a small nightmare: managing tables, indexes, binary logs and replication inconsistencies, because if you have replication you need a way to verify that the data actually recorded matches what you did yesterday. You download a script from some blog, add random queries, shake strongly, and there is your database solution. Well, those days are over: we have the magical MySQL Utilities provided by the MySQL team. They are written in Python
and include connectors for Django and Fabric. They are pure Python, so if you want to switch your Python implementation from CPython to PyPy, for example, they work smoothly and get faster over time as your Python implementation improves. No more C bindings, and the implementation is obviously multi-platform because it is pure Python. There are the utilities, the driver and the Fabric orchestrator, which we will see later.
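As a hedged illustration of the pure-Python driver (host, credentials and schema below are made up), a minimal connection with MySQL Connector/Python looks like this:

    import mysql.connector

    # Connection parameters are illustrative only.
    cnx = mysql.connector.connect(host="127.0.0.1", port=3306,
                                  user="app", password="secret",
                                  database="employees")
    cur = cnx.cursor()
    cur.execute("SELECT VERSION()")
    print(cur.fetchone())
    cnx.close()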
So, how do you use the MySQL Utilities? It's simple: just type mysqluc and you are in a new command-line client environment, a unique entry point for all the utilities, with contextual help and tab completion; otherwise you can run each utility separately. What can we do with the utilities? We can check the audit log and compare databases, even the data. If I want to check whether my three, ten or one hundred gigabyte database is correctly replicated, but I know there is just one table worth comparing, I can compare that single table between the two databases and check that it is right. The utilities can also produce a diff in SQL format, so I can apply the resulting diff to one of the databases and get everything back in sync.
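A hedged sketch of those commands (server strings, schema and table names are made up):

    mysqluc                                              # interactive console with help and tab completion
    mysqldbcompare --server1=root:secret@db1 --server2=root:secret@db2 \
                   employees:employees --run-all-tests   # compare schema and data
    mysqldiff --server1=root:secret@db1 --server2=root:secret@db2 \
              employees.salaries:employees.salaries --difftype=sql   # emit the differences as SQL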
There are also utilities to check disk usage or to help implement a resilient infrastructure. For example, my address book and e-mail database actually live on MySQL: I run mysqldiskusage and discover that I have almost one gigabyte of e-mail indexes, and I get the figures for the various databases; the log files it reports are essentially the redo logs of my database. So even if I know nothing about MySQL, I just run mysqldiskusage and get a bunch of data and statistics I can learn from: I read the output, check it against the documentation, and learn MySQL. I find this very useful for trainings and real-life classrooms, because people can learn even if they don't yet know everything.
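A hedged example (the connection string is illustrative):

    mysqldiskusage --server=root:secret@127.0.0.1:3306 --all   # databases, binary logs, relay logs, InnoDB files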
There is also a connector for Django; this is a snapshot from the code of the Django connector: if I use Django, I can use the Django database backend provided by the MySQL team; it is ready to use and it checks my Django version.
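A hedged sketch of what that looks like in a Django settings.py (database name and credentials are made up), using the backend that ships with MySQL Connector/Python:

    # settings.py
    DATABASES = {
        "default": {
            "ENGINE": "mysql.connector.django",   # backend shipped with Connector/Python
            "NAME": "employees",
            "USER": "app",
            "PASSWORD": "secret",
            "HOST": "127.0.0.1",
            "PORT": "3306",
        }
    }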
Let's start with the interesting part: replication. Replication in MySQL is asynchronous or, at most, semi-synchronous. The master writes a change log that records all the changes and assigns each transaction an ID; the slaves connect to the master and fetch those transactions. If you use semi-synchronous replication, the master waits and does not acknowledge a transaction until at least one slave has acknowledged that the transaction has been replicated.
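A hedged sketch of enabling semi-synchronous replication with the plugins that ship with MySQL (hostnames and credentials are made up):

    mysql -h master1 -u root -p -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; SET GLOBAL rpl_semi_sync_master_enabled = 1;"
    mysql -h slave1  -u root -p -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; SET GLOBAL rpl_semi_sync_slave_enabled = 1;"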
The basic diagram is the following: the master produces the change log; the slave connects to the master, downloads the change log, stores it locally and then applies it to its local database. You may ask: why replication? Replication gives us availability.
With replication we can scale reads, and we can even partly scale writes by using different indexes on different servers. For example, if I have a big, heavy, slow full-text index on my database, I simply don't provision that index on the master: I configure the slow index only on the slave. The master runs at (almost) full speed; when the master replicates the writes to the slave, the slave's schema maintains the index, and I run the full-text queries only on the slave. In this way I get a very scalable infrastructure without adding complexity, just master-slave replication plus a few tricks to improve performance.
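A hedged illustration of the trick (table and column names are hypothetical): the heavy FULLTEXT index is created only on the slave, and full-text queries are sent only there:

    mysql -h slave1 -u root -p employees -e "CREATE FULLTEXT INDEX ft_notes ON contacts(notes);"
    mysql -h slave1 -u app  -p employees -e "SELECT id FROM contacts WHERE MATCH(notes) AGAINST('invoice');"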
Now I have my replicated infrastructure. There is one utility, mysqlrplshow, that gives me the replication layout: even if I don't know my replication setup, I just run this command, it connects to the master and starts discovering all its slaves, and in one minute I have the topology of the database.
Even if I don't know how to configure replication, it is again just one command away: mysqlreplicate takes care of contacting the master server, provisioning a replication user, running all the replication checks, pointing the slave to the master, setting up replication and starting the download of the binary logs. Again: I don't know MySQL, I want to learn, so I just type one command and the output shows me all the steps needed to set up replication. Moreover, I even get the preliminary checks, so if I learned MySQL five years ago and don't know which checks are required for replication in 5.7, I just run this command and get the information I need. I can also initialise new slaves; we will skip that because we are running out of time.
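A hedged sketch of those two commands (server strings and the replication user are made up):

    mysqlreplicate --master=root:secret@master1:3306 --slave=root:secret@slave1:3306 --rpl-user=repl:replpass
    mysqlrplcheck  --master=root:secret@master1:3306 --slave=root:secret@slave1:3306   # run the preliminary checks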
If I have to set up a fault-tolerant infrastructure, I just run the mysqlfailover command and tell it: go to the master, find all the slaves, and if the master fails, take care of finding the best slave to be elected as the new master, then do it: strip off the old master, elect that slave as the new master and make all the other slaves point to it. It's just one command away, but before using it in production read the documentation carefully, because there are many, many parameters; it is quite flexible and very useful. So it takes you from zero to hero in minutes.
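A hedged example (connection strings are made up):

    mysqlfailover --master=root:secret@master1:3306 --discover-slaves-login=root:secret --failover-mode=auto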
Now the nice part: the Fabric orchestrator. Download the image, and running a full server infrastructure is just one command away. Fabric is a framework for managing, replicating and scaling a MySQL infrastructure. It takes a bunch of servers, turns them into replication groups, configures the replication, monitors and heals the infrastructure, and acts as a directory for read/write splitting: I configure my client to ask Fabric for the layout of the infrastructure, and then the connector knows where the master is and where the slaves are; if it cannot contact the master, it will contact a slave. This is the infrastructure layout: at the centre we have the applications. They ask the Fabric node who the master is and who the slaves are, and then they contact the servers directly; every group is a bunch of servers, and the application talks to the servers directly for SQL, so there is no added latency. Fabric is not a proxy: it just holds the data about who the master is, who the slaves are and which groups you have.
To set things up we have many commands, and everything starts with mysqlfabric: we can manage groups, configure servers and configure providers, which we will see later in the interesting part of the talk. You want to install Fabric? Okay: set the listening address and the credentials for the backing store in fabric.cfg, then run manage setup and manage start. Fabric is ready: you have a server capable of providing you with a fault-tolerant infrastructure.
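A hedged sketch of bringing Fabric up (connection details live in fabric.cfg):

    mysqlfabric manage setup    # create the backing-store schema
    mysqlfabric manage start    # start the Fabric node (XML-RPC, port 32274 by default)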
Now that the server is set up, you want to create groups. Let's start with one high-availability group: you just create the group and add, for example, three servers. Fabric then knows that those servers belong to the same group. When you promote the group, one of those servers is chosen as the master (you can even pass a specific server id): once you do it, that one is the master and the others are the slaves. Fabric takes care of initialising the slaves and replicating everything, and then with lookup you can list the addresses: there is one primary server and two secondary servers. You can even set a weight on the servers: if you want most of the queries to go to the first one, you give it a higher weight; or if you want to test 5.7, you add one 5.7 server and give it a lower weight so you don't disrupt your infrastructure.
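A hedged sketch of these steps (group name and addresses are made up; the set_weight sub-command is quoted from memory, check mysqlfabric help server):

    mysqlfabric group create my_group
    mysqlfabric group add my_group db1.example.com:3306
    mysqlfabric group add my_group db2.example.com:3306
    mysqlfabric group add my_group db3.example.com:3306
    mysqlfabric group promote my_group             # one server becomes the PRIMARY
    mysqlfabric group lookup_servers my_group      # one PRIMARY, two SECONDARY
    mysqlfabric server set_weight <server-uuid> 0.5    # e.g. send fewer reads to a test 5.7 slave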
When you activate the group, Fabric starts monitoring for failures. Then we stop one server with mysqladmin shutdown, and Fabric automatically switches the master role over to one of the slaves: now we have one faulty server, one primary server and one secondary server, all done automatically. To connect, I just type the command you see on the slides (you can also find it in the Docker image): you specify the Fabric host, the database and the SQL credentials. If you ask for a read-write connection, Fabric sends you to the master; if you ask for read-only, Fabric sends you to a slave; and if there is a failure, Fabric points read-write connections to the new master all on its own.
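A hedged sketch of such a client, using the Fabric support in MySQL Connector/Python (hostnames, credentials and the group name are made up):

    import mysql.connector
    from mysql.connector import fabric

    conn = mysql.connector.connect(
        fabric={"host": "fabric1.example.com", "port": 32274,
                "username": "admin", "password": "adminpass"},
        user="app", password="secret", database="employees", autocommit=True,
    )

    # Writes: ask Fabric for the PRIMARY of the group.
    conn.set_property(group="my_group", mode=fabric.MODE_READWRITE)
    cur = conn.cursor()
    cur.execute("INSERT INTO employees.audit (msg) VALUES ('hello')")

    # Reads: a SECONDARY is fine; after a failover Fabric redirects transparently.
    conn.set_property(group="my_group", mode=fabric.MODE_READONLY)
    cur.execute("SELECT COUNT(*) FROM employees.audit")
    print(cur.fetchone())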
To provision a new slave you can just clone the group onto a new target: you add one server and then clone the group onto it; Fabric knows who the master is and takes care of it. In the cloud, Fabric can provision new machines through the OpenStack API. I didn't want to provision new machines, I wanted to provision new containers with Docker, so I implemented that interface using a Docker client. Now, if I register a new provider, I just register a provider of the Docker kind (you see, provider type equals docker); then, when I create a new container from my Docker provider, I just give it an image instead of an OpenStack template, and I get a new machine: a new container, a new database. I get the address of the new server from the server list, add it to my high-availability group, and, well, my solution is scaling.
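A hedged sketch of adding the freshly provisioned container to the group (the address is made up; the server clone sub-command is quoted from memory, check mysqlfabric help server):

    mysqlfabric group add my_group 172.17.0.5:3306
    mysqlfabric server clone my_group 172.17.0.5:3306   # copy the group's data onto the new server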
So, we only have five minutes left, but I'm done. If you're interested, contact me and I will share the content of the missing slides. One thing to add is that the OpenStack interface supports machine snapshots via another client; Docker volumes don't support snapshots yet, but something is moving. I've started tweeting about it: if you're interested, let me know and we can try to find a Docker solution that is good for me and for you. So: replication is easier with Fabric and MySQL; you can clone servers; failover is just one command away. Don't cling to failed masters: if you have a master server that is broken, just take it out and dump it; please don't reuse it. Try Fabric with Docker, and play with Docker volumes. Okay, I've been in a hurry; time is a tyrant. That's all, folks. Thank you very much for your attention.
We do have time for questions, so if you have any, just raise your hand and grab the microphone over there.
Okay, the question was: how can you make the Fabric server itself highly available? Well, you can set up master-slave replication of the Fabric backing database, and you can use a cluster manager to create a resilient infrastructure for Fabric. Anyway, even if your Fabric server dies, the MySQL clients using the Fabric connector already have all the information they need to reach the servers, because when they first connect to Fabric, Fabric gives them the whole layout of the infrastructure, so they know who the master is and who the slaves are. If the Fabric server fails, you should have a cluster manager that checks whether Fabric is up and, if it is down, starts another Fabric instance on another machine. That is very easy, because Fabric is a simple Python application: once you have a replicated backing database, you can start another Fabric instance against that replicated database in a second. Other questions?
A simple question: do you know whether this replication is supported in MariaDB? Replication, I think, is supported in MariaDB, but this kind of setup is not only about replication, it is about scaling easily, because you can have replication with many databases. The advantage of these tools is that with a simple setup you get the ability to have many groups of databases and to manage 10, 20, 100 servers easily, because once you have created the group, you are done. You can even provision machines if you have an OpenStack installation: if you notice that one of the groups has lost two servers, you can just provision two machines, clone the group onto those two machines and add them. So it's not only about replication, it's about orchestration, okay? And you get the Python code; it's GPL, so you can take the code and mash it up. Even with the utilities: if you need something that uses disk usage and audit log parsing, you take those two, import the modules, and mash them up into your own application. I think the real advantage of choosing Python is that it's really easy to do whatever you want with this code. So don't think only about replication, think big: you have a lot of Lego blocks, and you can take them and build whatever you need.
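A toy sketch of that kind of mash-up, done from Python (connection strings are made up); it simply shells out to two installed utilities and combines their output, rather than importing their GPL modules directly:

    import subprocess

    def run(cmd):
        # Run a MySQL utility and capture its textual report.
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    report = run(["mysqldiskusage", "--server=root:secret@127.0.0.1:3306"])
    report += run(["mysqlrplshow", "--master=root:secret@127.0.0.1:3306",
                   "--discover-slaves-login=root:secret"])
    print(report)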
Are you completely screwed if you're stuck on MySQL 5.5? No, because MySQL 5.6 introduces the global transaction ID (GTID). The global transaction ID makes it easy to find the most up-to-date slave, because I can just check which slave has the most transactions applied. The problem with 5.5 is that replication is based on the position within the binary log files, while with a transaction ID you don't need to specify a position: you just tell the master, I already have these blocks of transactions, give me the ones I don't have, whatever the position is, so I can get some parts from one place and some parts from another.
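A hedged illustration of the difference (hostnames, credentials, log file name and position are made up; the GTID form requires gtid_mode=ON on both servers):

    # 5.5-style replication: you must track the binary log file and position yourself.
    mysql -h newslave -u root -p -e "CHANGE MASTER TO MASTER_HOST='master1', MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=120; START SLAVE;"
    # With GTIDs (5.6+) the slave simply asks for the transactions it is missing.
    mysql -h newslave -u root -p -e "CHANGE MASTER TO MASTER_HOST='master1', MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_AUTO_POSITION=1; START SLAVE;"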
I strongly suggest you migrate as soon as possible. Migrate carefully and make your tests, because not every application may play nicely with 5.6, but look at the future: look at 5.6, look at 5.7, and try it even if 5.7 is not yet stable or generally available. Start trying it, because there are many improvements in performance and in the storage engine; read the changelog of 5.7, there is important stuff for people who take care of production, of consistency and so on. Okay, I'm out of time. We can talk at the cafeteria. Thank you very much.