We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Automatic Connection Failover for Asynchronous Replication Channels

00:00

Formal Metadata

Title
Automatic Connection Failover for Asynchronous Replication Channels
Alternative Title
Automatic Asynchronous Replication Connection Failover
Title of Series
Number of Parts
637
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
MySQL Group Replication is a solution that implements a fault-tolerant database system in which redundant components can be removed automatically without compromising the overall system's availability. The Asynchronous Replication Connection Failover feature complements this by further increasing the resilience of replication connections between a replica and a set of sources. It allows asynchronous replicas to switch to a new replication source when the current connection fails. Therefore, an asynchronous replica can automatically switch-over its replication connection to a new member, when it is replicating from a Group Replication group, for instance. In this session we will present how Asynchronous Replication Connection Failover feature automates the process of re-establishing a replication connection to another source, upon failure of the current one. We will also present about how this feature integrates with Group Replication and thus how setting it up can be automated as well. Come and learn more about this new and exciting replication feature in MySQL, directly from the engineers developing the product.
179
Thumbnail
20:09
245
253
Thumbnail
30:06
294
350
Thumbnail
59:28
370
419
491
588
Thumbnail
30:18
OracleLocal GroupComputer programError messageServer (computing)WebsiteConnected spaceClient (computing)Process (computing)Configuration spaceGroup actionMathematicsUser-defined functionMachine visionPresentation of a groupHazard (2005 film)Physical systemConnected spaceInsertion lossMultiplication signMaxima and minimaDatabaseGroup actionReplication (computing)Configuration spaceSheaf (mathematics)Data centerAnalytic continuationCovering spaceComputer hardwareBitServer (computing)Error messageProcess (computing)Fault-tolerant systemMereologyConnectivity (graph theory)Internet service providerHydraulic jumpCASE <Informatik>Type theoryCartesian coordinate systemSoftware developerMathematicsConservation lawLevel (video gaming)Physical lawAnglePattern recognitionField (computer science)EmulatorCondition numberRight angle2 (number)Forcing (mathematics)Parameter (computer programming)GoogolSingle-precision floating-point formatCodecDemosceneInternet forumPrisoner's dilemmaOrder (biology)VideoconferencingDegree (graph theory)
Configuration spaceUser-defined functionLocal GroupConnected spaceClient (computing)Group actionWeightElectronic mailing listDatabaseMathematicsQuery languageThread (computing)Server (computing)Mechanism designFormal verificationVirtual machineState of matterGenderElectric currentUser-defined functionTable (information)Computer configurationSynchronizationConfiguration spaceoutputConnected spaceServer (computing)MathematicsThread (computing)Mechanism designWeightVirtual machinePairwise comparisonSlide ruleParameter (computer programming)Sheaf (mathematics)MereologySingle-precision floating-point formatException handlingOcean currentDatabaseFigurate numberInternet service providerSystem administratorType theoryGroup actionFunction (mathematics)Regulator geneService (economics)Gauge theorySquare numberFitness functionCustomer relationship managementBit rateMachine visionWeb pagePhysical lawAreaGoodness of fit2 (number)WordPattern recognitionArithmetic meanState of matterElectronic mailing listStress (mechanics)Rule of inferenceOrder (biology)Semiconductor memoryComputer animation
User-defined functionConfiguration spaceError messageCountingWeightForcing (mathematics)BuildingConnected spaceMultiplication signNumberBit rateMathematicsDemosceneCondition numberRight angleSelf-organizationComputer configurationRegulator geneArithmetic meanBootingReal numberOrder (biology)2 (number)Thread (computing)Normal (geometry)Replication (computing)Default (computer science)CountingError messageInfinityWeightXMLSource code
WeightError messageCountingWeightConnected spaceComputer animationSource code
Element (mathematics)Computer animation
Transcript: English(auto-generated)
Hello, everyone. Thank you for joining this presentation. I am Hemant Dange and I work for MySQL group application team as a developer. I am here today to present automatic connection failover for asynchronous replication channel.
This presentation is divided in four parts. First, I'm going to present why this feature is needed. Then I will share with you what this feature is about in automatic connection failover section. Then what are the requirement to use this feature and how to use this feature with example in usage and requirements section.
And finally, we'll cover when this feature gets activated and how it handles continuous reconnection failures in error and reconnection section. Now let's begin with first section, that is background. To explain why this feature is needed, let's consider a setup.
We have a group set up in New York data center with three servers, S1 primary, S2 and S3 are secondary. To have a bit better data availability, we have a standby replica set up in London, connected to New York side through asynchronous replication, C1 here.
Due to some hardware failure or some unknown reason, suppose primary S1 fails. So S2 becomes a new primary. The MySQL group replication implements a fault-tolerant database system in which redundant components can be removed automatically
without compromising the overall system's availability. So when S1 fails, it has to become new primary and S1 is removed from the group automatically without any user intention. But what about the asynchronous replication connections even if the standby replica was set up to handle disaster
due to which sometimes whole data center fails. And if asynchronous replication connection need to be set up by DBA every time it fails, then it is not a very reliable setup and loss of data can happen. So what we need to do, so what we need here,
we need a system which can automatically reestablish this failed replication connection with minimal human intervention. We need a system who can independently gather a new member of the group replication group so that a user doesn't have to add a new member
in a group, suppose in this New York data center, S4 is added. So this system should automatically know that okay, new member has been added and it added as a potential source. A system who can detect removed or failed sources.
So when S1 fails, so it should remove from, the system should remove this S1 from its potential source. So next time it shouldn't consider this replication channel to connect to this S1.
A system through which user can prioritize a particular source to which replica should attempt to connect first in case of failure. So if user want that okay, S3 should be considered first for our new connection and not S2.
Whether S2 is a primary or not, it shouldn't matter. I'm like if user want that S3 should be connected. So there should be a way to prioritize a particular source. So we have added all this requirement and many more in automatic connection failover feature.
Automatic connection failover makes deployment fault-ordinate of source failure by automating the process of re-establishment of an asynchronous replication connection to another provided source. That is S2 here.
This feature was introduced in MySQL 8.0.22 and further enhancement related to group replication were added in 8.0.23. So now that you are aware of why and what this feature is, let's jump to how to use this feature. Let's consider there are two sites,
New York and London connected to asynchronous replication channel CH1. In New York data center, there is a group with three servers, S1 primary, S2 and S3 secondary. We also have S4 which is not a member of the group but connected to this group through asynchronous replication channel CH2.
The server UOUID for server 1234R, 111, 2222, 333 and 444 respectively. And group replication group name for this group is GG4G. These are not actually UOUID format. They are mentioned as they will be used
in examples later on. To connect to source VBA needs to provide connection credentials so that replication channel can be set up by replica. The VBA should provide the same credential on the replica for all the potential sources
for a given channel. On the replicas server, create a replication channel with the replication credential of the source server. So we can use this like change replication source command to create
and we have provided host and port details of the S1 server. We also have added source retry account is equal to two. I will tell you this in error and reconnection section by this retry account is set up as two.
Also grant access to performance schema database for this replication user on all the source servers. We need this performance schema database access so that this feature can access details of the source configuration.
We added connection credential. Now let's add source details. There are two ways to add source connection configuration detail. Before I tell you how to add source configuration detail, let's know managed and unmanaged source. The managed sources are sources who have group replication enabled and who have managed type is equal to group replication.
You only need to add one member connection configuration detail. The rest of the members will be automatically gathered by reconnection feature. I mean like more like it is managed source so and some few more features are provided
for managed sources. They will be basically they will be managed by the feature and the unmanaged source are single servers who are not managed by this feature and every details has to be added by group. Mostly these are single servers who are not group members like S4 in our figure earlier
which yeah this figure S4 is a single server and S1, S2, S3 we can add as a managed sources. The asynchronous connection failover admin is UDF
can be used to add managed source configuration detail. For the UDF you need to provide channel name, managed type which is group replication only right now. Managed name which is group replication group name, a UID to identify member of the same group. Host and port details of one of the member of the group
so through which it can gather the rest of the member details. Primary weight of value between one and 100 would be used to assign priority to primary members. Secondary weight of value between one and 100 would be used to assign priority to secondary member. If you want your replica to always stay connected
to primary, add more bigger priority to primary weight in comparison to secondary weight. Suppose here for S2 and S3 we provided S2 secondary S2 and S3 we provided higher priority than S1.
So it would connect to S2 instead of S1. I will tell more about this in later section when we discuss about this failure weight. And right now let's see the asynchronous failure at source, unmanaged sources how to add them.
UDF can, the asynchronous failure at source UDF can be used to add connection configuration detail for an unmanaged alternate source. All the UDF parameters are same only and I already explained in last slide except weight, a value between one and 100 would be used to assign priority to this source.
For single server there's no primary and secondary so and it is unmanaged is not a part of the group. So weight directly would be used. The values inserted can be verified in replication is asynchronous connection failure table
of the performance schema database. So whatever you added in the with using last two UDFs you can check those details in this replication is in synchronous connection failure table. Now we have potential sources added enable the feature for the replication channel CH1
using change replication source command. The source connection auto failover is the option added with this feature. So if you want to enable this feature make it value as one, if you want to disable make it value as zero. And its current status can be checked
through replication connection configuration table which is in performance schema database. So now we have enabled the feature. Now let's start the replication channel. Start the replication channel CH1 on the replica server and check its input output thread status.
It would connect to source S1 as we provided S1 host and port in the change replication source command earlier. So source UUID because source one has UUID of 1111 as I mentioned earlier.
Now to see the asynchronous connection failure mechanism working log into host machine of source S1 and stop the MySQL server. It should connect to S2 or S3
depend on the failure weight, weightage, whichever is higher. Let's consider S2 as higher weight. So it got connected to S2. It can be checked in the replication connection status table.
S1 is removed from replication asynchronous connection failover automatically. The failure feature automatically update this table when source is added or removed. It also updates the failure weight of new primary which is S2 now.
Similarly, if primary is switched to S3, S2 would get secondary weight and S3 weight would be updated with the primary weight and replica would reconnect to this new primary.
If we add a source with higher failure weight than current connected source. Suppose we added a new single server S5 with higher failure weight 90 in comparison to other.
Let's see in the replication asynchronous connection failover. So highest weight is for S5 in comparison to S2, S3 and S4. So replica will reconnect to S5 now because it has the highest rate.
You can again verify that using replication connection status table. And similarly, we have also provided two UDF, the asynchronous connection failover delete managed and for unmanaged UDF also. So the asynchronous connection failover delete managed UDF
can be used to delete managed configuration details. And similarly, the asynchronous connection failover delete source UDF can be used to delete single servers or delete source configuration details of single servers. Now let's move to our last topic, error and reconnection.
The asynchronous replication failover kicks in only when existing replication connection errors out. Otherwise, it doesn't interfere in normal working of IO or SQL thread. So the gathering of source details or any other health checks,
which it does is not done by IO or SQL thread. What I mean to say this feature won't affect performance of IO or SQL thread. It only starts working when the existing IO thread fails. And for all the remaining checks,
which it does, it's done by another thread, not by IO or SQL thread. And okay, reconnection. When asynchronous replication reconnection is triggered, the replica will try to reconnect until it succeeds or stop replica is executed.
So this reconnection trial, like it keeps on trying indefinitely. It won't stop. It will only stop, sorry, infinitely. It will only stop when a stop replica
is executed by DBA manually. Or it succeeds in connecting to any of the available sources. The reconnection is triggered only when source connection auto-failover is this feature is enabled.
It will try connect to each source, source retry count times after every source connect retry interval between each reconnection. So it will try for each source, source retry count times. And after that, when this retry counter exhausted,
the number of trial it has already tried. After those get exhausted, it will try connecting to another source. We do recommend the values, source retry count is equal to three and source connect retry is equal to 10,
which makes the replica retry the connection three times with 10 second intervals between. We are recommending these values because the default values of these options are very large and it would make the switch to a new source, another source.
It will take around 60 days. The default values are so large for source retry count and source connect retry. So prefer label, you should use three and 10 as its values. Also the next source, which would be selected with will be with highest failover weight
for the same channel. I already told you my like, if you, the S5 was selected earlier which we seen in the example earlier. So S5 was selected because it has the highest failover weight. So whenever it tries connecting to a new source,
when after that connection, existing connection fails, it would be picking up the source which is having the highest failover weight in replication asynchronous connection failover. Thank you everyone.
Hope you would like this feature and would be useful for you all. Please check this feature and let us know what you think about this. Thank you, bye.