
Tapping Into the Binary Log Change Stream


Formal Metadata

Title: Tapping Into the Binary Log Change Stream
Number of Parts: 561
License: CC Attribution 2.0 Belgium — You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The row-based binary log is mostly used as a logical replication log for MySQL. However, ever since row-based replication was introduced, it has also been widely used as an integration point between a MySQL server and other components in an IT infrastructure. It is often used as a change-data-capture stream, as a source of data for extract-transform-load operations, or even as an event notification trigger (e.g., propagating transaction information to proxy layers). Commonly deployed setups revolve around collecting/subscribing to data changes and propagating these to downstream consumers through a message bus, such as Kafka. This session will present such use cases, highlight the additional metadata added to the binary log in the latest releases, explain how to make the most of it, and show how to optimize the implementation of a change-capture procedure. We will also showcase a couple of example plugins that tap into the server's binary log stream and export it even before it reaches the binary log files.
Transcript: English(auto-generated)
Alright, hello everyone, my name is Luís and I've been working on MySQL replication for a little over 10 years now. For those who don't know me, I lead the replication team, I'm based out of Portugal, and I've spoken here at FOSDEM a few times already; it's one of the places I like to come to and speak about more technical, deeper kinds of things. And I could talk about replication for hours, I think some of my colleagues are well aware of that. But today I'll just be talking about a very specific thing, which is the binary log and
more specifically, how the binary log is created inside the server, how it is utilized by different components inside the server, including replication and group replication, and how we can look into this framework that exists inside the server, maybe ending this presentation with a little bit of boilerplate code, if you will, to create some plugin that
can actually tap into the server and extract this. So I'll start just, of course, I'll talk a little bit of MySQL replication but not that much, I promise.
And then I'll describe the binary log; I'm pretty sure that everybody here knows what the bin log, for short, is. Actually, one raise of hands: who here uses statement-based replication only? And who here uses row-based replication? The rest, right? So we will also be focusing on row-based replication, for obvious reasons, when it comes to change data capture using the bin log. And then I'll try to look into a couple of use cases, the MySQL group replication plugin for instance, which extracts the changes from the server, pushes them out to this Paxos communication framework, does all this coordination, and then decides something and tells the server to commit or rollback transactions and so on. And then I'll conclude the session. This is basically the plan for this 20-minute session.
So MySQL replication, this is a very common slide that I usually pull up in our sessions. Replication is very simple: master-slave, and then there's a log that is shipped around. This log records the changes that happened in the primary, or the master, and
these changes are then propagated to the slave and the slave applies them and if it generates its own change log, its own binary log, then you can build all these different topologies that for instance Peter was talking about earlier. For better or for worse, MySQL replication is almost like a Swiss army knife or a loaded
gun as Peter said as well. So you can hurt yourself plenty with it, but it can really get you out of some nasty problem situation that you may run into. The interesting thing that we're talking about here today is the binary log and
the master generates a binary log, the one that is shipped around, and the slave also generates a binary log. There's the persistent buffer, which we typically call the relay log, on the slave, which basically receives the changes and persists them on disk while the slave is applying the log, so that we don't saturate memory, for instance. This was very important a long time ago, but not so much nowadays, because we now have parallel slaves, and with different parallelization schemes based on write sets we can come to an almost optimal execution history in terms of parallelization.
So this persistent buffer is not so much needed anymore as it used to be, but it's still there. And we can build multiple replication topologies, you know, simple primary secondary or master slaves, a chain or a tree or a star or a circle or a group now with group replication
where we can have external secondaries from the group so you can mix and match group replication with regular asynchronous replication or you can have an external primary and then replicate into a group.
And why is this important in this session? It's important because we can have all these combinations of topologies together and therefore we can have dedicated instances of MySQL where we can actually play around to do some interesting things related to change data capture where we don't disturb the rest
of the topology. Over the years I've seen a lot of people do these kind of things like having a dedicated slave with row-based replication enabled to be able to mine the binary log, transform it, load it into some other system like Hadoop or something else.
Okay, so the binary log. The binary log is used for replication, but not only replication: point-in-time recovery, integrating with other technologies, rolling upgrades and more. It is very important for rolling upgrades because it makes it so much easier to just do a rolling upgrade on your topology. And as I said, we're focusing on the row-based replication format because it's a feature-rich format for loading data and capturing the changes that happen inside the server as they are recorded in the bin log.
Over the years, the metadata that we put into the bin log in row format has been extended so that it becomes easier for you guys actually implementing these special cases of capturing, transforming and loading the bin log into something else. For instance, in 8.0 we have more metadata when it comes to table definitions, so the table map log event, for those that know the contents of the bin log, has additional information such as the names of the columns, which fields are the primary key, whether a column is signed or unsigned, the character set, and so on and so forth. And at its core it's really just a sequence of events, we call them events; there are some types of events that are control events, like rotate, format description, GTIDs and so on, and then there's the data itself, which is carried over as a query log event or as row events. But in the end it's a sequential history of the execution that happened on the master, with the changes that were produced by them.
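Conceptually, a reader walking the binlog sees exactly this interleaving of control and data events. A minimal sketch of that idea — the event names loosely mirror real binlog event types, but the structures here are invented for illustration, not the actual on-disk format:

```python
# Toy model of the binary log as a sequential list of events.
# The type names are illustrative, not the real wire format.
CONTROL_EVENT_TYPES = {"FORMAT_DESCRIPTION", "ROTATE", "GTID", "XID"}

def split_events(events):
    """Separate control events from data-carrying events."""
    control = [e for e in events if e["type"] in CONTROL_EVENT_TYPES]
    data = [e for e in events if e["type"] not in CONTROL_EVENT_TYPES]
    return control, data

binlog = [
    {"type": "FORMAT_DESCRIPTION"},                      # describes the log format
    {"type": "GTID", "gtid": "source-uuid:23"},          # transaction identifier
    {"type": "TABLE_MAP", "table": "t1",
     "columns": ["id", "name"], "primary_key": ["id"]},  # 8.0-style extra metadata
    {"type": "WRITE_ROWS", "rows": [(1, "alice")]},      # the row change itself
    {"type": "XID"},                                     # transaction commit marker
]

control, data = split_events(binlog)
```

A consumer that only cares about data changes would iterate `data`, while something like ProxySQL's GTID tracking (discussed below) would look only at the control events.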
And how is this change log formed? First of all, and mind you that again I reiterate we're focusing on row-based replication, the way it is formed is that when a transaction begins and a statement is executed at the handler level between the SQL layer of MySQL and the storage engine. MySQL, as many of you probably know, is built on a layered architecture, where you have this SQL layer which has the runtime environment and so on and so forth, and then there's the handler that separates the SQL layer from the storage engine itself. So when the data is propagated from the SQL layer to the storage engine, we intercept the changes, the row changes, the record changes, we capture those, we put those changes in the memory cache, in the memory buffer, and we buffer this until the
commit comes. When the commit comes, what happens is that we get this prepared in the storage engine, so internally we run a two-phase commit for instance between the bin log and the storage engine, but right before we flush these changes to disk, to the file, to the
bin log file, before we persist these changes, this cache in the bin log file, we notify a plugin or plugins that could be listening to these changes. Think about group replication for instance. At this point in time, before we flush the transaction to disk, we notify a listener for this change, group replication for instance,
and group replication takes this data and pushes it out into Paxos. By the time it gets back from Paxos, okay, transactions is actually committed, so we flush it to the bin log and then we notify plugins that might be listening that, you know, I want to be notified once the transaction
commits. But the important thing here is that, to note, is that the capture is all done in memory and by the time a transaction commits, we share these captured changes with plugins or other listeners inside the server. And in this way, you can think of it as a,
you know, an event bus for data changes that is running inside the server. And this stream of changes materializes as a set of files on disk. The term bin log is really very much overloaded nowadays, so it means many things; usually, when people talk about the bin log, they mean the set of files that you can go in and inspect with the mysqlbinlog tool, like Shlomi was talking about earlier here today as well. But at the end of the day, the change stream materializes as a set of files on disk, and there's the mysqlbinlog tool to actually go in, open these files and see what's in there. So, use cases for change capture using the bin log. I'm going to show some examples of projects that build on the bin log to do some kind of change propagation
or some kind of procedure automation that really benefits from this log. I'm not going to talk about things like flashback because Shlomi already did and I don't have it in the slides, but that's another use case that you can use this change log
to actually replay it backwards and do these things like recovering, if possible, recovering an instance to a certain point in time by replaying the change log in reverse order. So it's actually like a compensation action rather than reversing.
Okay, change tracking. I think René is somewhere here. So René, in ProxySQL, for instance, taps into this change stream, and in the ProxySQL case the process that mines the stream is actually ignoring the changes but capturing the metadata that is in the change stream. It's extracting the GTIDs that run through this change stream and collecting a global knowledge of the system, to know and understand which state each of the replicas in my system is in, so that I can make decisions about the routing of queries to the proper replica. This is, very roughly, how ProxySQL implements consistent reads across a set of replicas.
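The idea can be sketched in a few lines: track how far each replica has applied, and route a read that must observe a given transaction only to replicas that have reached it. This is a toy model of the technique, not ProxySQL's actual implementation; all names and the routing policy are invented:

```python
def route_consistent_read(required_txn, replica_progress):
    """Pick a replica that has already applied transaction `required_txn`."""
    candidates = [r for r, applied in replica_progress.items()
                  if applied >= required_txn]
    if not candidates:
        return "primary"   # no replica caught up yet: fall back to the primary
    # Among caught-up replicas, pick the least-advanced one (arbitrary policy).
    return min(candidates, key=lambda r: replica_progress[r])

# Progress mined from each replica's change stream (simplified to integers;
# real GTID sets are per-source-UUID interval sets).
progress = {"replica1": 105, "replica2": 98, "replica3": 112}
```

With this map, a read that must see transaction 100 can go to replica1 or replica3, while a read that must see transaction 120 has to fall back to the primary.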
Shlomi also has this tool called gh-ost, GitHub's online schema transformation tool, which basically tries to do a user-land online schema change by copying data from one version of the table to the newer version of the table and, in the meantime, tapping into the change stream that is being produced while this process is running, collecting those changes to later apply them to the transformed table as well, and doing all of this in an automated fashion and online.
I already mentioned rolling upgrades; the rolling-upgrade process is a really powerful use case, or a really good fit, for logical replication, or change-stream-based replication, where you can have servers in different versions. They probably have, for instance, different on-disk layouts, different physical layouts, so you can rely on the logical stream to make sure that you have a sound way forward with respect to rolling out different versions of the server. Data integration. This is, again, a very common use case. I put here Oracle GoldenGate,
but a lot of people do it with, for instance, Maxwell and other kinds of frameworks, where they tap into the bin log stream, connecting, for instance, as a fake slave to a master. They bring in the bin log, they process it, transform it maybe into some more language-neutral format, JSON or re-encoded in protocol buffers or something like that, and push it out into some message bus, in this case, for instance, Kafka or something else. Some people actually use that as well to move data back and forth between different databases, from MySQL into Oracle or from Oracle into MySQL or from MySQL into some other database, and back and forth. And it serves point-in-time recovery as well, right? We can take a snapshot and rely on the bin log to roll forward some of the change log that happened after this snapshot, up to a certain point in time. So we can roll forward the changes, just like Shlomi did in his flashback presentation, where he said: I want to roll back, to compensate for or revert the state of my server up until this point.
We can do that, of course, in the opposite way, as we all know. So, advanced change data capture. Coming back to this interesting diagram here, you know, we capture the changes while the transaction is executing. Once the transaction
is about to commit, we notify that it's going to be committed and then we flush these changes to the bin log and then we notify that these changes have been flushed and committed in the bin log in the storage engine locally. So let's have a look at the group replication use case. Group replication is a plug-in,
it taps into the server, it has all these, you know, fancy diagrams and stack, very nicely layered architecture, different modules for capture, apply, recovery, conflict handling and so on. And it has this very interesting module here, which is the capture
part. And the capture part is really the guy that implements these hooks to tap into the server: when you load the plug-in, it says, okay, I'm going to register myself as a listener for the events that are propagated, and then at commit time I want to be informed that these events have happened. So I want to take part in the decision of what's going on on this transaction execution timeline, and I want to know what was actually changed. So it intercepts the change buffer as execution takes place, captures it, and, in the case of group replication, pushes it further down in the stack to the group communication engine. And at the very low end, a Paxos consensus round is performed, you know, a majority acknowledges, we all make the same decisions everywhere. We have total order delivery in the system, and then everybody advances in the same order.
In the end, this results in a distributed replicated state machine. But at the end of the day, what I'm really interested in showing here is how group replication intercepts these changes. So the interception takes place here, right before we flush to the bin log. At this point in time, the changes are still in memory, in this buffer, in a memory buffer. It's an IO cache, for those that have actually looked into the code. And this means that the cache is mostly in memory, so it has a fixed-size memory buffer. If the transaction's changes are bigger than this buffer, it spills to disk, so it swaps. But at the end of the day, we can consider it just as a memory buffer. So we take this buffer and we share it with group replication.
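That spill-on-overflow behaviour can be mimicked with a toy buffer: a fixed memory capacity, and a temporary file that takes over once the transaction outgrows it. This is a simplified model of the idea, not the server's actual IO_CACHE implementation; all names and sizes are invented:

```python
import tempfile

class SpillBuffer:
    """Fixed-size memory buffer that spills to a temp file on overflow."""

    def __init__(self, mem_capacity):
        self.mem_capacity = mem_capacity
        self.mem = b""
        self.spill_file = None

    def write(self, data):
        if self.spill_file is None and len(self.mem) + len(data) <= self.mem_capacity:
            self.mem += data                 # still fits in memory
            return
        if self.spill_file is None:          # first overflow: move to disk
            self.spill_file = tempfile.TemporaryFile()
            self.spill_file.write(self.mem)
            self.mem = b""
        self.spill_file.write(data)

    def read_all(self):
        """Return the whole transaction image, wherever it currently lives."""
        if self.spill_file is None:
            return self.mem
        self.spill_file.seek(0)
        return self.spill_file.read()

buf = SpillBuffer(mem_capacity=8)
buf.write(b"aaaa")        # fits in memory
buf.write(b"bbbbbbbb")    # would exceed 8 bytes: everything spills to disk
```

Whether the data is in memory or spilled, `read_all` hands back one contiguous image, which is why the talk can treat the cache "just as a memory buffer" when sharing it with group replication.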
This is the same diagram with slightly more detail. We execute; at commit time we prepare, we start the two-phase commit protocol inside the server. Then we take this and replicate it in the before-commit hook: we give it to group replication. Once group replication says yes, okay, you're good to go, we commit the transaction. Otherwise, we roll back the transaction: maybe there was a conflict, maybe this member was expelled from the group, maybe something wrong happened. But if group replication says yes, the commit procedure continues: we flush and sync to the bin log, we commit to the storage engine. Committing into the storage engine means we externalize to the SQL layer what has been changed, so all locks are released and now you have a different view of the data. When we flush and sync to the bin log, it means that we externalize for consumers of the bin log, so slaves will already see it, and other components in the system that are listening to this replication stream will already see it. When we propagate to group replication, it means that we externalize already there to the members of the group. So that's one minute, okay. So that's when the actual commit takes place; it's externalized to the group here as well. And what are the internal APIs for this? If you go into the source code and you look,
you'll notice that group replication implements these APIs. The before commit hook and then the parameter for this callback which contains these caches that contain the changes.
So it gets this notification with the caches, and the group replication plugin can go in, open these caches, learn the changes and so on and so forth. This is the oversimplified version of the before-commit hook. It basically says: well, I got the caches, I did whatever I needed to do with them, and at the end of the day I send it out to the group; I issue a send-message call to the group communication system. At its core, it is that simple. So the key point is that the capture and extraction are the same as for regular replication, except that it happens earlier in the commit phase.
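The whole mechanism described above can be modeled in a few lines: observers register a before-commit callback, the server invokes them with the transaction's caches before flushing, and a nonzero return vetoes the commit. This is a Python simulation of the idea behind the C++ hooks, with invented names, not the server's real API:

```python
# Simulated before_commit observer pipeline. Real plugins implement this
# in C++ against the server's observer structs; all names here are invented.
observers = []
before_commit_counter = 0

def register_trans_observer(fn):
    """Stand-in for the server-side observer registration."""
    observers.append(fn)

def counting_observer(txn_caches):
    """Mimics the shipped example observer: just count invocations."""
    global before_commit_counter
    before_commit_counter += 1
    return 0                         # 0 means: do not veto the commit

def run_commit(txn_caches, log):
    """Prepare, notify observers, then flush to binlog and commit in the engine."""
    log.append("prepare")
    for fn in observers:
        if fn(txn_caches) != 0:      # e.g. group replication rejecting a conflict
            log.append("rollback")
            return False
    log.append("flush_sync_binlog")  # now visible to binlog consumers
    log.append("engine_commit")      # locks released, data externalized
    return True

register_trans_observer(counting_observer)
log = []
ok = run_commit([("INSERT", "t1", (1, "x"))], log)
```

Group replication plugs into the same slot as `counting_observer`, except that instead of counting it ships the caches through Paxos and returns nonzero when the group rejects the transaction.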
And just to finalize this session, there is this observer plugin, which is kind of like boilerplate code that is also shipped in the source code of the server and implements a very dummy capture plugin. This is, kind of like, the summary of the source code of that plugin: it just implements the before-commit hook, and the only thing it does is increment a counter. The way it works is that when this plugin is loaded, it registers this hook in the server, so that it intercepts the notification in the diagram I showed earlier and then does something; in this case, it just does before_commit_counter++, so it increments this counter. And if you're really interested in looking into the source code and whatnot, of course you can: you just go to GitHub's MySQL server repository and look at the replication_observers_example.cc file. So to conclude, the binary log is a very useful building block. We see it every day; even today we had a presentation explaining it. It is not just for replication, and
since a very long time ago, since early 5.7, we have had this framework that plugins can tap into to listen to notifications. One comment before I close: something we didn't have back then, but now have in MySQL 8, is this infrastructure which we call the service registry, where it's so much easier to register services inside the server, so that we have another, different set of consumers that just go to the service registry, look up the service they want to listen to, register as listeners, and so on. And we are thinking about moving these hooks that I was explaining here into the service registry, to make it even more general for you, me, anyone to write a plugin and actually make use of this infrastructure in a much easier way going forward as well.
So that's that. Thanks for, you know, listening to me and this is really cool stuff. I really hope you enjoyed it as much as I do. Okay, thank you. Any questions? Okay.
The question is, maybe you just answered it in this last sentence. I think the problem we see today is, like you said, that there are many drivers, or many processes, that tap into the binary log, in different programming languages, and any time the binary log format changes, or there's a new data type or whatever, everybody needs to update their code in twelve different programming languages. So the question is, will there be a mechanism where we can just tap into a plain protocol or some API and just get the changes as JSON or whatever, something that is easier to consume. So, to repeat the question: will there be a less opaque format for the bin log, so that we can make full sense out of it? That's the question. We are really looking into that, because the bin log is actually an API to
integrate MySQL into other systems. Someone earlier today was on this stage saying, we have this, this is very good, we have that, that is very good. The real power comes when we combine these two things together. So any two components, MySQL, some other framework,
they need to speak to each other to provide combined value and we see that many times. So we are looking at making the bin log format much more interoperable, if you will. All right, so if you have more questions, just grab me outside and I'll be happy to talk to you. Thank you.