
Heavy Duty Backup with PgBackRest


Formal Metadata

Title
Heavy Duty Backup with PgBackRest
Title of Series
Number of Parts
29
Author
David Steele
Contributors
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language
English
Production Place
Ottawa, Canada

Content Metadata

Subject Area
Genre
Abstract
PgBackRest is a backup system developed at Resonate and open sourced to address issues around the backup of databases that measure in tens of terabytes. It supports per file checksums, compression, partial/failed backup resume, high-performance parallel transfer, async archiving, tablespaces, expiration, full/differential/incremental, local/remote operation via SSH, hard-linking, and more. PgBackRest is written in Perl and does not depend on rsync or tar but instead performs its own deltas which gives it maximum flexibility. This talk will introduce the features, give sample configurations, and talk about design philosophy.

PgBackRest aims to be a simple backup and restore system that can seamlessly scale up to the largest databases and workloads. Instead of relying on traditional backup tools like tar and rsync, PgBackRest implements all backup features internally and features a custom protocol for communicating with remote systems. Removing reliance on tar and rsync allows better solutions to database-specific backup issues. The custom remote protocol limits the types of connections that are required to perform a backup which increases security. Each thread requires only one SSH connection for remote backups.

Primary PgBackRest features:
- Local or remote backup
- Multi-threaded backup/restore for performance
- Checksums
- Safe backups (checks that logs required for consistency are present before backup completes)
- Full, differential, and incremental backups
- Backup rotation (and minimum retention rules with optional separate retention for archive)
- In-stream compression/decompression
- Archiving and retrieval of logs for replicas/restores built in
- Async archiving for very busy systems (including space limits)
- Backup directories are consistent Postgres clusters (when hardlinks are on and compression is off)
- Tablespace support
- Restore delta option
- Restore using timestamp/size or checksum
- Restore remapping base/tablespaces
Transcript: English (auto-generated)
started. Can everyone hear me? I'll take that as a yes. Okay. My name is David Steele. Our topic today is heavy duty backup with pgBackRest. So pgBackRest is a new piece of backup software that you probably have not heard of before, and we're going to run through features and see how it works. I'm the senior data architect at Crunchy Data Solutions. This is pretty recent. I've been working there since April, but I have been developing with Postgres since 1999. All right, so this is our agenda. Obviously at some point we're going to talk about Backrest, but the first thing we're going to talk about is backup in general. Why backup? I want to make a case for that first. Then we'll talk about, once you've decided you do definitely need to back up, we'll talk about how to back up. Then talk about pgBackRest design, performance. There'll be a little philosophy, and then we will have a live demo. All right, so first, why back
up? First and foremost, hardware failure. No amount of redundancy can protect you from this, at least on Postgres. So if your master fails, then you're going to have some problems, and you want to make sure that you can recover that to standbys. You might actually have a multi-machine failure.
You might have, actually I'll cover corruption later. All these sorts of things can happen. You need to have backups, even better continuous backups. The next thing is replication. So when you're doing replication, of course everyone wants to set up streaming replication, but the thing that can happen is your replica will get far enough behind that it gets out of sync
with the master. The replica needs to be able to pull the WAL segments from someplace, and that's when a WAL archive comes in handy. The other thing is if you're bringing up a new replica rather than doing a base backup of the master, you can actually just recover your last backup from the backup server, bring it up, let it stream the WAL that it needs, and then it
will sync up with the master and become a streaming replica, and this puts as little load on the master as possible. The next thing, of course, is corruption. You know, corruption can be caused by hardware or software. The trick with corruption is actually figuring out that it happened. So backups
will help you recover from corruption, but if you don't discover it until a year down the road, then you've got a bit of a problem. This obviously is made better by the page checksums in Postgres, but still currently there's still no system-wide way to look at an entire database and see whether you have corruption or not. Hopefully that's coming. If not, it may be coming in
backrest. The next thing, so these are the sorts of the things, you know, corruption and hardware failure, replication, these are day-to-day operations, or I mean things that can happen. The next thing is sort of an accident, right? So you dropped a table by accident, or you know you had an update script that dropped it, so it wasn't necessarily someone in
production just messing around, but you ran an update script, this table's gone, what do you do? Or somehow you've deleted your most important account, you had lots of fun cascades in there, so now all this data is gone, you need to be able to get it back. Backups can help you with this. You can bring up the
backup, you can export the data that you need, and then bring that back into your production database, and you're good to go. Replicas don't help you with this, because of course if you drop a table on the master that's replicated quickly. You might have replication delay, and that may help, or you know, you may not discover it till hours later that the table's gone, or the account's gone, or something like this, so backups are great for that.
Another thing backups can be good for is development. So this may not be, you may not be able to do this in all environments because of privacy issues, you may be government, or health, or whatever, but copies of production databases can make great development databases. Or staging, you know, you may
have a stage server where you bring over a copy of production, stage everything up, make sure that everything works, and then you follow the same procedure on production later. So being able to get exact copies of production onto another database server quickly is a good thing. Another thing is reporting.
So reporting obviously can be done on streaming replicas, but sometimes you need access to temp tables and other things like that that you can't do on a streaming replica. So one thing you can do is do a daily reporting server where you update the server at midnight every day, you bring it up as a normal master so that people can use temp tables and do writes and do other things
like that, and you refresh every day. If you can do, if you can stand to do reporting without the current day's data, a lot of people can. And the last thing is forensics. So sometimes there's data that was actually deleted on purpose, but you want to go back and look at it. You know, that it, for
whatever reason, I mean it could be something that, you know, something malicious you're trying to track down, it could just be something interesting, you went, oh, we got rid of that, we did that on purpose, but now we want it back. So you can use your backups for that, depending on how far back in time your backups are. Okay, so the next thing, of course, is how to backup. So
we'll talk about backups a little later, but I think everyone, if you're in this room, you probably think backups are important, or you're at least considering it. So the next question is how do you backup? The first thing, of course, is pg_dump. Especially with small databases, this is what most people start with, it's very simple, it's very easy, doing restores is very easy. pg_dump has a couple of problems, though. One is it doesn't scale well. That was weird. Okay, so pg_dump doesn't scale well. So really, if your database goes beyond even, say, a gigabyte, then doing the restores can be quite painful. And if your database goes beyond 100 gigabytes, then it becomes insanely painful, because pg_dump doesn't actually dump out indexes, so they have to be rebuilt when you bring the database back in. The other problem with pg_dump is that it's really, it represents only a point in time in your database. So if you, let's say, you're doing a daily pg_dump at midnight, your server crashes at 10 p.m., everything's lost, so you restore that pg_dump, well, you've lost now 20 hours of data, or 22 hours of data, so that's a problem. There's no way to play forward and recapture all the changes that happen during the day. The other, and lastly, pg_dump takes a lot of locks, so if you're doing partitioning or trigger writing or things like that, a lot of those things can't go on while pg_dump
is running, because it'll take locks on just about everything in the database, and then you're stuck, you'll be holding on your partition creation until the end of the pg_dump. If you've got a big database, that can be a real problem. The next thing, of course, is pg_basebackup. This is also built into Postgres. This is a pretty good tool, actually. It does a lot for you. It gets you a backup. It copies all the WAL archives for you. A couple problems here still, though. pg_basebackup always does a full backup. If you've got a very large database, this can be a problem for you. The other thing is that it's still not an archive management solution. So even though you get the WAL segments you need to make the database consistent, it's not going to provide you with the WAL segments you need to play forward from there. So you still have to have some kind of management solution in place for that. The next thing you can do is manual backups. So call pg_start_backup on your own, copy the files across, and call pg_stop_backup. Or roll your own Perl scripts around that. Again, there's a lot more complexity here than you might think, and you still have the question of what to do with your WAL archive. Then, of course, you've got various third-party tools. OmniPITR, Barman, WAL-E. How many people in the room are using one of those three? Okay. How many people in the room are
using something they've rolled themselves? pg_basebackup, yeah. Right. For pg_basebackup, yeah. Well, for our tool. Oh, for your tool for... Okay. Right. Okay. Yeah, I'm familiar with BART, actually. And then, okay, so is there anyone in the room who's actually using pgBackRest at this point? All right. So, of course, that's your last option.
Sorry, I didn't actually list BART, because I was trying to list things that I know are free and open here, and I'm not sure about BART. Is it actually free and open source, or is it part of the EDB package? Okay. I can check up
on that. I'm happy to add it to the list, but I didn't think it was a free, open-source product, so that's why I listed these. And then, of course, the last one is pgBackRest. This is a new way to back up, and we're gonna talk about that and see why I think it's better than the alternatives that are out there currently.
Okay, so the first thing. Almost everyone uses rsync to do backup. rsync is great. It's so easy, it's so convenient, it's beguiling. You want to use rsync because it does 95% of the job for you. All you really have to do is call pg_start_backup and pg_stop_backup around an rsync, and you're good to go. The problem is rsync has a lot of limitations. So one of the goals with this originally was to get away from rsync, get away from tar, get away from all these tools that would limit what we can do in the future with pgBackRest. So let's look at some of the limitations. First of all, rsync is single-threaded. This is probably the biggest problem. Of course, you could multi-thread rsync by keeping track of everything yourself, but once you've done that, then you've done what pgBackRest has done. So why keep rsync at all at that point? The other problem is it has this issue with the one-second timestamp resolution. So if you run two rsyncs in close proximity, or you get really unlucky, you can actually end up with a problem in your incremental backup: your second rsync may actually miss files. This will happen if the file is modified in the same second as rsync copies it originally. When it goes to...
You can use the checksum-based mode. Sorry? You can use the checksum-based mode. You can, you're right, exactly. You can use checksums, and that will always work of course, but with your 15 terabyte databases it's extremely painful. If your database is large enough, you just have to go ahead and trust the timestamps, or you might as well almost do a full backup. Once you've got to the point of checksumming the entire database, it makes incrementals less attractive, except for space savings, of course. Another thing about rsync is no destination compression. This is a big deal, so when rsync gets to the destination, it's not compressed. Furthermore, if you want to do incrementals with rsync using --link-dest, the previous backup also has to be uncompressed, so you could obviously move the backup and then compress everything and go on, but then you can't do an incremental. The incremental requires the previous backup to be uncompressed, so now you've got two uncompressed copies of your 15 terabyte database lying around, which is unacceptable in a lot of cases. You could run it onto ZFS or something like that. You could do that. You could do that. Of course, that's one way to do it. Not everyone... One thing I started thinking about is a lot of Postgres installations are small ones. They're running on just a VM someplace. Not everyone is running
kind of big metal, and so I wanted a solution that sort of worked for everybody. So you can scale it all the way up. I have customers who are doing this, doing uncompressed backups to ZFS, and then they can bring those up as clusters on ZFS without actually doing any kind of restore. So there's lots of options, but I was trying to work from the simplest
thing and scale all the way up to the biggest installations. So anyway, pgBackRest doesn't... In this philosophy, it doesn't use rsync, tar, or any other tools of that type. It has its own protocol, which supports local and remote operation, and it solves the timestamp resolution issue. It turns out this is a fairly simple thing to do, and I've been thinking of recommending it to the rsync guys. Basically, after you've built the manifest, you wait out the remainder of the current second, and then start copying. There are some other ways to handle this that have been suggested to me as well, but I think that one's the simplest, and I've shown that it works.
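A rough shell sketch of that wait, for illustration only; pgBackRest itself is written in Perl and this is not its actual code:

```
# after the backup manifest is built, sleep out the rest of the current second
ns=$(date +%N)      # nanoseconds into the current second (GNU date)
sleep "$(awk -v n="$ns" 'BEGIN { printf "%.3f", (1000000000 - n) / 1000000000 }')"
# any file modified after this point carries a timestamp newer than the
# manifest build, so the next incremental cannot silently skip it
```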
So let's go through some of the features. So compressions and checks... Compression is performed and checksums are calculated in-stream. So I try not to do anything in place. So if I'm, you know, archiving a WAL file or copying a file, whatever, I don't checksum it or do anything like that first. I actually just start copying the file. Everything's done in-stream. Obviously, that's very efficient, and it also makes it very accurate. One thing you want to know is the size of the file can change while you're copying it, and it's good to know that. So I've got the correct checksum, the correct size, and I can store that on the remote. There's also asynchronous compression and transfer for WAL archiving. So if you have a system that's generating WAL really quickly, you can actually offload that and do that separately, or you can, you know, synchronously push as well. It supports remote or local operation, and you don't have to do anything crazy with loopbacks,
you know, SSH loopbacks or anything. It will natively work locally. So if you want to back up to, say, an NFS mount, that's kosher. You could back up to a backup server. If you're doing remote operation, of course, that requires SSH to operate. It has threading for parallel compression and transfer. This is an important feature,
and this is one of the things that, of course, originally got us away from rsync, because now we can parallelize, and now you can dedicate as many cores to compression as you want, and, you know, those big backups can go a lot faster. There's full, differential, and incremental support. Yeah, go ahead.
Yeah. Oh, yeah. So basically, in essence, you just... So like rsync, backrest builds the entire manifest of the things it's going to copy right at the very beginning. And so then what all I do is I wait.
So let's say that happens at exactly, you know, 10 o'clock, 10 p.m. I wait after the manifest is built. Whatever second I'm currently in, I wait until the next second before I start copying. And what that means is that if the file is modified during that time,
I'll pick up those modifications, because I'm not going to start copying until the next second, and then the timestamp, you know, if it gets modified after that, of course, the timestamp will get updated. It may not get updated right then. Some file systems won't update the timestamps until an fsync, but I don't really care about that during that backup. The only thing I care about is that that timestamp gets modified before the next backup. pg_start_backup does an fsync, so by the time, you know, you start the next backup, you should definitely have those timestamps on disk. Once the manifest is built, I never actually go look at any file metadata again. So like I said, I'm not concerned about that timestamp being updated. I just have to know that it's going to happen
sometime down the road. So yeah, differential and incremental support. A lot of people ask what differential is. Differential is just like incremental, but it's always off of the last full backup. They're a little more flexible because they can be expired on an easier schedule. Incrementals always depend on a long chain of backups,
so for some applications, it's better to use differentials than incrementals. Incrementals are good if you have just a huge day-over-day change and you really can't afford to accumulate that every day throughout the week between your fulls whenever the fulls happen. There are backup and archive expiration policies, so you can define how many full backups you want,
differential, you can define where archive will expire, whether you keep archive for all the full backups or just some of the full backups. It'll still keep archive to make the databases consistent even if you're expiring archive for the older backups. Backups are also resumable, so if you're halfway through a 15 terabyte backup
and it dies or you have to kill it or you have to bring the machine down, you can actually resume that backup. It'll re-checksum everything that's in the backup directory to make sure that it's kosher, and then it will continue. And of course, it will do those checksums in threads, so if you're running four threads or eight threads,
it'll actually do checksums across four or eight cores to make that as fast as possible. You can also do hard linking. This can be handy if you're doing uncompressed backups on ZFS. One thing we'll talk about in a minute, the backup structure looks like a consistent Postgres cluster,
so you can actually point Postgres directly at a backup and bring it up and it'll do recovery and it'll start running. Now, you wouldn't want to do that without taking a snapshot first, of course, because yeah, even with table spaces, yeah.
So what it does, it just rewrites, so it creates a base in it. I'm going to talk about this in a second, but this is handy for that kind of thing. It also works with Postgres 8.3 and above. I've just put in some experimental support for 9.5, but with some of the new recovery options, I don't really have those working yet, and the regression tests do a lot of recovery scenarios,
and so the regression tests are currently broken for 9.5, but by the release, of course, it will be working. So let's look at the backup structure. So this is where... So it's a really clear and simple structure. You have a base directory, which is the Postgres data directory,
and then you'll have a tablespace directory where any tablespaces will go, and then there is a file called backup.manifest, which is a plain text file, which is human readable or machine readable, and it has information about all the files, checksums, timestamps, et cetera. So what I do when I do this is I rewrite the links in pg_tblspc to... Now I do them relative, so it's not... Well, that way you can move the backup directories around and they still continue to work. Postgres is perfectly happy starting up that way. It doesn't have any kind of issue. This isn't meant to be a production database.
It's a backup, but for customers that have very, very large databases that would be almost impossible to restore anywhere, it's great because you can actually just bring up the database in place. It'll do recovery. First you take a snapshot, of course. Take a snapshot, bring the database up in place, do recovery, and then you can do exports,
you can do forensics, you can do whatever you want at that point without having to go and make a big copy of it someplace, which is a problem. Yeah, so like I said, Postgres can be started directly in that backup directory if no compression is used. Another feature is you can actually tell Backrest to copy the archive logs that are needed
to make the backup consistent directly into pg_xlog. So that way you don't even have to write a recovery.conf file or anything. If you want to do point-in-time recovery, of course, you're still going to have to do that. But if all you need is a consistent database, the xlogs you need will be right there. So very, very convenient.
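Going by the pgBackRest documentation, this behavior is a backup option named archive-copy; the exact spelling in the version demoed here may differ:

```
# copy the WAL segments required for consistency into the backup itself
pg_backrest --stanza=main --type=full --archive-copy backup
```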
That's an optional feature. It doesn't do that all the time because of course it can take up more space. So you don't necessarily want that. Okay, so let's look at some performance numbers. Because this is important, this is part of the reason why this was done, is to back up very, very large databases and do it quickly.
So this is a comparison I did with rsync. The two programs work slightly differently, so getting them working in the same mode was a little weird in some ways. The first example works quite well though. So in this case, we're doing one thread,
we're doing level three network compression. That's actually the default for backrest. rsync defaults to six, but I find that level three actually works very well because you get a lot of compression, but it's very fast. So if your destination is uncompressed, then the defaults work pretty well, but all this is configurable, of course.
gzip, gzip. Yeah, I went with gzip just because I wanted the backup directories just to be very accessible to people. And so if you are using compression, then everything's just in gzip format in the directory. So you can actually just do a recursive gunzip and bam, you've got your data back. So I figured there are some better
compression algorithms out there for speed, but I thought I'd just go with the old standby. This is something that might be made optional in the future, but for now it works pretty well. So here we can see pgbackrest does it. So this is a, I'm trying to remember the size of this database.
I think it was half gig or gig. I really should have written this down because now I've forgotten how big the DB was, but it was big enough to get some reasonable benchmarks off of. I think it was four or 500 meg or a gig. Anyway, so backrest did it in 141 seconds and rsync did it in 124.
So rsync was clearly faster. Backrest is written in Perl. So even though I'm using the zlib, you know, there's still buffer management, there's protocol management, all this kind of stuff. So unfortunately not as fast. But the next thing we end up with is multi-thread,
two threads. So the settings are the same. We're still doing L3 compression and we're doing no destination compression. Now we're able to do this in 84 seconds. So that's 1.5 times faster than with one rsync thread. And rsync of course is NA
because it won't do multi-threading. Then the next thing I did was one thread with network compression at L6 and destination compression at L6. And here backrest came in at 334.4 and rsync was 510. So you're saying, hey, rsync doesn't do compression. So how did you do this? Well, I compressed the files on the destination.
So do rsync and then do compression just to give you an idea of how much an advantage the in-stream compression gives you over compressing on the destination. And then the last thing was to run two threads and do the same thing,
network compression at destination compression at L6. And now we're at 2.93 times faster than one rsync thread. So this actually scales pretty well. These benchmarks were made on my laptop, which only has two cores. And of course, in this case, you've got the SSH processes that are running. You've got the compression and decompression.
So basically at this point when I went, and that worked pretty well, but when I went beyond two threads, performance just kind of went down and down and down. Whereas I tested a similar thing on a client's machine, much bigger machine, and I saw much better scaling in that direction. And I have clients that are running up to eight threads for big databases.
And you can keep eight cores busy doing compression if you run it that way. Yeah, well, there are two things. One is threading, and also the fact that the compression.
So if the destination compression is set to L6, I use that same compression for network, right? So basically what happens is the file is compressed on the source side, and then it goes across the network, and then it's stored. And that's it. There's no recompression on the other side. You actually just keep that compression stream and just store it to disk. That's it.
Checksums are done in-stream. So you don't have to go and try to checksum that file at the end. And of course, checksumming the file on disk before you copy it would be dangerous because that file could be changing while you're reading it. So, and also the size is calculated at the same time. So that file goes across and is stored. Now, if you have destination compression turned off,
then the default is to do level three compression. And then on the other side, it will uncompress it and put it on disk for you. But the checksums will still have been calculated in stream. And so all you need to do is just decompress and store. So it's just more efficient all along. So you can see even with one thread, you get some advantage with the in-stream compression.
And when you go to two threads, you just start multiplying that. You can see it's not quite a full multiple. 1.52 times two should be 3.04, because once you start doing multiple threads, now there's synchronization and messaging and all this kind of stuff being passed around. So you get a slight reduction in performance,
but you can multiply that up very well. Sorry, say again? It's a big part because I'm doing my own compression,
I'm not using SSH compression. I can actually keep that compression on the destination side. I don't have to recompress. So that's the big advantage, is just you're compressing one time, you're taking checksums one time, you're doing everything one time, and then that's it. If you want to decompress, that's fine. So you'll have compression and decompression,
but the usual case is to keep it compressed on the destination side. And that's where the real benefit is, just not doing things multiple times. Yeah, exactly. And we've seen this in production environments
where if you actually do try to, say, parallelize without compressing, you can flood a network pretty quickly, especially when you have a 32 shard cluster doing backup and pushing over a network, which was originally one gigabit, so that was a problem. Even a 10 gigabit network
can get pretty easily flooded by that big a backup. So let's talk just a couple minutes about living backups. So we talked earlier about why to back up, why you need to back up. I also have this sort of philosophy of living backups, too. It's extremely important that your backups work
when you need them. And one of the things you can do to make sure that's true is kind of subscribe to this philosophy of find a way to use your backups, right? Don't just have them something that's just part of a DR plan. You want to find a daily way to use them. So for instance, syncing, creating new replicas, offline reporting, offline data archiving. So instead of dumping May CDRs from the production database,
you could actually dump them from a backup. Development, all these sorts of things. Because in my opinion, at least, unused code paths will not work when you need them. And also, if people aren't using the backups on a daily basis, they'll be unfamiliar with the procedure. You want people to be familiar with the backup tools,
familiar with the restore tools, have this be just a normal thing that they deal with. So when something big happens, everyone knows how to deal with it, and you've got documented procedures in place, and you know your backups will work, because you're using them every day. If they don't work, you know it. You get alarms. And then, of course, there are things to do,
regularly scheduled failover, instead of doing it only when you have a disaster situation, to just test these techniques and make sure that everything actually works. Because if you don't do this, when you actually go to use your backups, you may find that the disk got unmounted somehow,
or you think you should have been getting alarms, but you haven't been getting alarms because some account was messed up, or monitoring was messed up, or all these sorts of things. So it's use it or lose it here, right? You've got to use these backups, or you could take the chance of losing your data. So the thing to do is to find good ways to use your backups.
If you can do that, make it part of the life cycle of your system, then when things do go wrong, you'll know what to do. Sorry, this isn't really relevant. I just really love this picture, so I try to incorporate that whenever I can. All right, so let's do a demo. Demos are fun.
Now, I like to script my demos. So this is a real demo. So we're actually going to go through and we're doing real backups and restores.
But what I like to do is write a Perl program that actually goes through all the steps. That way I don't have to stand up here and try to type and mistype and do things poorly. But if you go to, at the end, I'll have a link, if you go to the GitHub site, you can actually get to this Perl program, and there's also a sample output there as well, demo.out. So that's the program run on my machine.
So if you just want to see the commands, you can run through that and take a look. So it prints out all the commands and the output of all the commands that we're going to run. So the first thing we're going to do here is create a cluster. So we can do some testing. We're going to create a backrest.conf file.
I didn't select enough. So you can run backrest entirely from the command line, but it's a lot nicer if you create a configuration file for the things, your repository location and the location of your database, psql if you're running on a special port,
all that kind of stuff. That way you don't have to retype this stuff at the command line every time. But if you're using Chef or something like that, that's going to write everything for you, then maybe that makes sense. Although it can make recovery kind of painful because recovery is often a manual operation. You want that to be as simple as possible. And the last thing we do here is create a repo directory.
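The setup described here looks roughly like the following. The paths are illustrative and the option names follow the pgBackRest documentation of the time, so they may differ slightly between versions:

```
# /etc/pg_backrest.conf
[global:general]
repo-path=/var/lib/backrest        # where backups and the WAL archive are stored

[main]                             # the stanza name used throughout the demo
db-path=/var/lib/pgsql/9.4/data    # the cluster's data directory
```

```
# the repository directory itself has to exist before the first backup
mkdir -p /var/lib/backrest && chown postgres:postgres /var/lib/backrest
```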
Backrest has a default, /var/lib/backrest, but it won't create that for you. You do have to create the repo directory, and if it's not in the standard location, then you'll have to tell backrest where it lives. All right, so let's do a backup.
There it is: pg_backrest --stanza=main --type=full backup. Simple as that. So now we've made a backup. We can see it here. So we've got our first full backup. We've got a backup.info file, which contains information about this backup and subsequent backups.
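On disk the repository then looks roughly like this; the backup label is generated from the backup's start time, so the name here is illustrative:

```
/var/lib/backrest/backup/main/
    20150618-093000F/              <- the full backup just taken
    backup.info                    <- metadata for this and subsequent backups
    latest -> 20150618-093000F     <- symlink to the most recent backup
```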
And then latest is just a link that points to the latest backup. Heike, sorry? It's start backup, stop backup, yeah, copy files. So it is a physical backup, not a logical backup. I'd like to add logical at some point
because logicals can be very handy as well, especially for large systems, but for now it's just a physical backup. So we can take a look at the size of the backup. So the size of the database was 51 meg. The size of our backup is 4.9. You shouldn't expect to generally get this kind
of compression, and obviously since I just created the cluster, almost all these files are zeroed out. So they're very, very small, and you get a very, very efficient backup. The next thing is this backup.info file. So there's a whole, I'm not gonna go through all this, but there's a whole bunch of information in here about this backup, the archive start and stop,
the actual size of the original database, its size in the repo, et cetera, et cetera. We'll see a nice form of this later. You can actually export all this data as JSON, yep.
Well, the threads are actually enabled just to copy files, just to checksum and copy files. So the main part of the program and all the control stuff is done in a single master thread, the main process, and the threads are brought up too. So then what I do is I build a manifest. I segregate things by tablespace so that threads can actually work on different tablespaces. So if you have four tablespaces and four threads, then each one is going to initially work on its own tablespace. As you start to run out of files, threads will bounce along and start working on other tablespaces as well. Generally speaking, compression is your biggest bottleneck, not IO. So you can actually have multiple threads running on a single tablespace, but you wouldn't want all eight of them on one tablespace, certainly not initially, yep. Yeah, let me,
I should have, maybe I should have pointed that out. It's right here in postgresql.conf. So in this case, what I've done is I've turned archive_mode on, set wal_level = archive, and then here's the archive command here: pg_backrest --stanza=main archive-push.
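Spelled out as postgresql.conf settings; %p is the standard archive_command placeholder for the WAL file path:

```
archive_mode = on
wal_level = archive
archive_command = 'pg_backrest --stanza=main archive-push %p'
```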
So backrest has full archive management built in. You're not SCPing files anywhere, copying them and having them picked up. Backrest takes the archive file from cradle to grave. If it's asynchronous, it'll be stored locally and pushed up later, or we're not doing asynchronous archiving here, but it's checksummed as soon as it's pulled off a disk,
and that checksum follows it for its entire lifetime, so you can verify that it's correct and et cetera. So yeah, so that's the setup. In the documentation, of course, too, it'll tell you the basic setup of backrest on Postgres. The good thing about this demo is it's also, I mean, it will show you every step
of setting this thing up and running it, including creating the Postgres cluster. So we're pretty much through here. I mean, there's a lot of information about your database here. So if you, say, misconfigured a system and you start doing a backup from system B to system A's repo, backrest will fail. It'll tell you. It'll say, hey, you know,
the system IDs don't match here. The database version IDs don't match. Something is horribly wrong. It'll also do that for WAL. It reads the WAL headers for versions 8.3 through 9.5, and it will tell you if you're archiving to the wrong repo. So it's just a handy thing. You can fat-finger configurations, or you copy a file from one place to another, and suddenly you're mixing archive log,
and that would be very bad. Yep? Can you compare this to another database? I haven't directly, and the reason is,
well, I try not to be controversial to some extent. But also, you know, Barman is actually based on rsync, so I think by comparing to rsync, I've compared to Barman pretty easily. You know, the same thing is true. I could definitely do some comparisons with base backup, although I think it'd be pretty clear, you know, how that's going to work out, because base backup, of course, isn't threaded.
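For reference, the closest pg_basebackup equivalent is a single-process, always-full, compressed copy, something like the line below, and ongoing WAL archive management is still left to you:

```
pg_basebackup -D /backups/base -F t -z -P
```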
Base backup will at least compress and stream, so that's good, and leave you with a compressed copy. So that's something, but I think you're going to find that the performance would be very similar to single-threaded backrest performance. But I should do some more benchmarks. I just haven't been directly comparing. Also, like, WAL-E is a very special use case, because it's compressing locally, and then pushing to S3. So I know it's really hard to compare those two. They're different use cases. Until I have that S3 support, it would be kind of disingenuous to try to make a comparison there. Of course, backrest will be faster, but that doesn't mean that WAL-E doesn't have its own value.
Yeah, it definitely can. So one of the things I'll be working on this summer is backups from standbys, which is relatively easy to do. It's just more of a configuration problem
than anything else, because now backrest has to be aware of your standby, and some other things like that. Generally speaking, like I said, the whole backup process is very CPU bound. So IO just doesn't tend to be as much of a problem. If you've got a machine with 32 cores, then on the weekends, hopefully you can spare some of those to help you do the full backup.
But the thing about backup is backup can be slow. If you're doing one full backup a week, and maybe a couple of incrementals or a couple differentials, I mean, you can really afford to spend two days creating a full backup, right? So what really needs to be fast is restore. And in that case,
you should be able to dedicate the resources to, so now I'm going to do a restore. I've got the compressed data someplace else. It'll come across the network compressed, and it'll be uncompressed on the destination machine. So now I might say, well, hey, I'm going to dedicate eight cores to this, you know, to getting these checksums done and get this across. You can have massive performance improvements
in restore with parallelism. With backup, it's more of a convenience thing. How long do you want to spend backing up? If you've got enough cores to support it, and you've got enough IO to support it, if you've got a lot of disk sets, a lot of table spaces, then, you know, one, you know, I work on systems that, you know, have minimum of eight table spaces. So if you're doing eight cores on that,
each core is working on its own table space, and you've got one sequential read coming off of a single table space, which isn't that bad, as long as you have IO isolation, of course. This is the, let's see how we're doing on time, pretty good. This is the archive directory. So you can see the archive files,
all segments have been copied over. Each one has a SHA-1 checksum attached to it, which, like I said, stays with it forever, so that you can verify that the, and when the archive files are actually, if they're requested, when they're copied back and decompressed, they're actually checksummed on the way. So if there's something wrong with the archive file,
you're going to find out about it before, if it's been corrupted in place in the repo, you'll get an error from backrest about that, rather than having Postgres potentially, I mean, Postgres would also detect it, because the checksums in the WAL would be bad. And it would know it, and there would be a problem, but you'll find out in advance. And here's the same sort of thing before, you've got this information file, which tells you all kinds of things about the WAL archive, and makes sure that you don't mix WAL between multiple versions. Let's see here. Oh, and also you can see that WAL is stored in its own version directory. And this means I haven't written these functions yet, so right now, if you want to upgrade Postgres to a new version, you would actually create a new stanza, and then start backing up there. Now you're going to be able to just issue an upgrade command to backrest, once you've done the upgrade, it'll read the new information, it'll store that, and then it'll start accepting WAL from that new database.
That way you don't have to, you can still do expiration across multiple versions, you can pin the last version, do some things like that. So these are things that are on the way. Now we're going to do a differential backup. So we can see now that we've got a new backup type.
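The command is the same as before with a different type; both flags are in the pgBackRest documentation:

```
pg_backrest --stanza=main --type=diff backup    # differential: delta against the last full
pg_backrest --stanza=main --type=incr backup    # incremental: delta against the last backup
```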
So it's based on the full backup, but it ends with a D. So it's a differential backup. Latest has been updated to that. And now we can see we've got more archive. Now, if you might recall, our previous backup was 4.9 meg. Adding the differential only brings us up to 5.1. So this is good. This is where incremental really starts to save you.
And if you've got particularly very large databases that with a lot of partitioning, where you're creating new partitions every day, but you're not modifying the old ones, this can just be a lifesaver. It incredibly reduces the size of your backup. For very large data, incremental is an absolute must.
I have an open issue to allow you to specify checksums. Right now, backups are based on timestamp. Restores are always based on checksum, unless you force them off. The theory being here, when you've got a kind of a normally operating database
that checksums are good and you can trust them. If you're doing a restore, by definition, something's kind of gone wrong and you may not know what and you may not be able to trust your timestamps. So in a restore, checksums are always used. The default for restore is Backrest expects the restore directory to be empty. Right? That's the default. But there's a --delta option you can use where it'll say, okay, I'm going to checksum what you have, compare it to the manifest, and only copy what I need to. And this can be very efficient with multiple threads because you can checksum on multiple threads and then you pull from the repo only what you want. So I've seen examples of the one customer with a relatively small database, 30 gig,
but with four threads, they can do restores in under a minute from the NFS mount because you look, most of the database is the same, you copy what you need, whereas a full restore takes about six minutes. So, but that's on one thread actually. So, where are we?
Oh yeah, so, all right. So now it's release time, right? We're going to do a release. Before this, we decide to do an incremental backup. That way, if we need to do some kind of restore, we don't have to replay a lot of WAL segments. If you generate a lot of WAL, you'll want to go ahead and do an incremental here. And to show that we, where we are,
we're going to insert a message in our test tables before release. So we've inserted this message before release, we create a restore point, we do the release, and then we update the message to say after release. So this would be a version table or something.
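Roughly what the demo script runs at this step; the table and database names are made up for illustration:

```
psql -d demo -c "INSERT INTO version_info (message) VALUES ('before release')"
psql -d demo -c "SELECT pg_create_restore_point('release')"
# ... the release is applied here ...
psql -d demo -c "UPDATE version_info SET message = 'after release'"
```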
My databases always have a version table. So after the release, that version would be updated and you know that you're on the new version of the DB. And now, so QA says, okay, the release is no good, please roll back. So now we're going to call restore and immediately get an error because backrest won't attempt to do a restore while Postgres is running. So we're going to need to stop the database and try again. So now we do our restore, we start the cluster and we check for the message. So we did a, let's actually look at the restore. So we did --stanza=main, --type=name, --target=release, --delta. All right, so this is a delta restore based on point-in-time recovery to that name. And we can see backrest actually writes the recovery.conf file for you so that you don't have to do that because that's all the information it needs to do it. Although you can override stuff
if you want on the command line. And now we can see we've gotten back to before the release. Excellent. But then we also, oh yeah, I forgot. It's part of the script. So we got back to where we were before the release. We bring things back up and then this very important update comes in, right?
Well, now QA actually says, you know what? We made a mistake. The release was fine. Just go back to that point after the release rather than trying to reapply everything and keep everyone up all night. We'll go back to that point in time. So we can do that by
right, so in this case we just do a plain restore. So this is a default restore which is going to take us to the end of the WAL stream. Right? And so now we'll be back to where we were after the release and we can see the message here after release. Well, now, uh-oh. So what about that very important update?
Now that's gone. So now we're back to after the release where we wanted to be but we've lost that very important data that got inserted to the database. So now on this system or another system we'd probably do this on another system. So this is an example of lost data being found with a backup. So maybe on another system we do this instead. So now we're going to do a restore
but this time we're going to follow a different timeline. We're going to follow the timeline that was created after that first restore recovery that we did to get back before the release. So that was timeline two. So we're going to recover onto timeline two and voila, there it is. There's our very important update on timeline two. We can do this on another server.
You know, dump the table out and bring it into production. We could do this on production. Lots of options, right? But you have lots of options to get places. And like I said, Backrest takes care of writing all your config files for you and all this kind of stuff. So it's a very simple command-line operation to get to any point that you want.
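Put together, the three restores in this part of the demo look roughly like this; Postgres must be stopped first, and the flag spellings follow the pgBackRest documentation, so the version demoed may differ slightly:

```
# 1. delta restore back to the named restore point taken before the release
pg_backrest --stanza=main --delta --type=name --target=release restore

# 2. plain delta restore, recovering to the end of the WAL stream
pg_backrest --stanza=main --delta restore

# 3. recover along the timeline created by the first restore (timeline 2)
pg_backrest --stanza=main --delta --target-timeline=2 restore
```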
Here's some info. So if you use the info function it will give you kind of a summary of the current status of the backups, oldest backup, latest backup, stuff like that. This isn't really very interesting. This is a lot more interesting. If you do --output=json it'll give you a really, really comprehensive
set of data about your repository, sizes, which backups reference which, timestamps, you name it. All kinds of information. And this can be used to feed monitoring or some kind of admin program or you name it. There's a ton of information here.
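The two forms shown are roughly:

```
pg_backrest --stanza=main info                   # human-readable summary
pg_backrest --stanza=main --output=json info     # full machine-readable detail
```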
And as I get more, I'll be adding it to the structure, of course. And afterwards we stop the cluster, clean up and the demo is complete. Go back to this.
Just, so that's all I got. Are there any questions? Sure, yeah. Yeah, if you, it actually,
so if there are multiple table spaces and you specify multiple threads Backrest will make its best attempt to balance them, put one thread. So if you've got four table spaces and four threads, it's gonna at least initially start by pointing each thread at a separate table space to do its reading and compression. As you work your way down,
let's say one table space, if they're not symmetric, then you're gonna start running into issues where it's gonna run down and multiple threads are gonna end up on the same table space. I've generally found from an IO perspective you can still get away with several threads on one table space without, it depends on how busy your system is, but compression
is generally the bottleneck here, not read IO. Yeah, so yeah, what I want to do is add an option to, you know, sort of a max, you know, per table space sort of option. So if you have two table space, you know, if it gets to that point,
it would only maybe run two threads on that last table space or something like that. I'm kind of thinking about setting some maximums, especially if you're doing eight and it all devolves down to one table space, then that can end up getting a little bit hairy.
Yeah, and that's why I haven't added those maximums yet: it hasn't been a big deal for me because I haven't seen any problems with it in the field. We're always just massively CPU-limited on the backups, so it ends up not being that much of a problem. Josh? Yeah, I've seen that, but when you're doing your transfer over SSH, it's still using SSH compression on the network. Barman? If you set it up that way. If you set it up that way, okay. Yeah, I mean, we've seen the network as the bottleneck as well.
Like I said, that 32-shard cluster I was talking about could saturate the network, but usually it's CPU. A good scheme is to use an incrementally updated image copy: essentially you pick some kind of recovery window for the image copy. It relies on you making the image copy to start with. Then it takes the incrementals and applies them to the image copy. What that means is you never have to take the full copy again; you can keep updating it, and the image copy basically stays as a stable full backup. Is it possible to take those incrementals that you have here and do that kind of thing? Could it be extended to do that? You absolutely could, but it kind of goes against the philosophy, if you will, of full backups, because the idea of a full backup is that every once in a while you're just going to go back in and copy all the data again so you're sure that it was right. You could definitely roll backups along like that, applying incrementals onto another backup. Well, I'm talking about the obvious thing you might want to do. Right, right. What I would probably do in that case is do a full backup, then maybe do weekly differentials, and then hang daily incrementals off of those differentials. I understand; I was just saying there is a way to maintain the image copy, because the only obvious advantage of the image copy is switching it in and recovering.
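A rough sketch of that full/differential/incremental schedule, expressed as a crontab plus retention settings. Times, retention values, and the stanza name are illustrative, and newer releases prefix the retention options with repo1-.

    # crontab: monthly full, weekly differential, daily incremental
    # (in practice you would guard against two backup types firing on the same day)
    0 1 1 * *    pgbackrest --stanza=main --type=full backup
    0 1 * * 0    pgbackrest --stanza=main --type=diff backup
    0 1 * * 1-6  pgbackrest --stanza=main --type=incr backup

    # pgbackrest.conf retention, so old sets expire automatically
    # (newer releases call these repo1-retention-full / repo1-retention-diff)
    [global]
    retention-full=2
    retention-diff=4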
Yeah, you could do that. The next step that I really would like to do is sub-file incrementals based on checksums, because right now if a file has changed I'm copying the whole file, and that can be painful for a one-gig segment. So I'm going to be using checksums to copy only parts of files. Exactly, exactly. I mean, files are already broken up into one-gig segments, so you get some advantage there, unlike Oracle tablespaces, which can be quite massive. Yep.
Yeah, so they're calculated in the protocol layer, and they're actually SHA-1 checksums. What do you checksum? I'm just checksumming the entire file as it comes across, so this isn't related to the WAL checksums or the data file checksums. Those aren't always available; I support all the way back to 8.3, so for the main Backrest checksums I wanted to do something that would be compatible across all versions of Postgres. But I would like to add support for 9.3-and-greater databases so that, if you've got checksums turned on, it can do something more intelligent. I could still obviously checksum the blocks myself and make that backwards compatible, but I'd like to use what Postgres has there. One nice thing is you could tell Backrest to check all the checksums in your database when it does a backup: if you're doing a full backup anyway and you've got all that data in your hands, it seems like a good opportunity to checksum your tables and see that the checksums match. That way, if they don't match, the backup will still succeed, but you'll get some alarm bells going: hey, I found bad checksums that weren't in the last backup, you might want to think about doing a restore here and some recovery.
Right, that's exactly what we're thinking. It's kind of an experiment we want to do first, because the pages are 8K, so we're worried that the checksums may not actually match up with the page for things that are actively being written. And because we're on a live system, we're not sure if the kernel will present us with a torn page or not. So a little bit of experimentation and research is required there, but it's definitely a direction we're interested in going. Yes, sir.
Well, Backrest will actually write the recovery.conf file for you. The WAL, of course, is stored in an archive alongside the backups, so the restore command will actually retrieve the WAL from the archive for Postgres. You're also welcome to write your own recovery.conf file; there's an option to preserve the current recovery.conf file, so if you've already got a complicated one in place and you're doing a restore, you can just preserve the current one. You may have your WAL on S3, you might have it someplace else; Backrest doesn't assume that your WAL is going to be managed by Backrest. Maybe only your backups are managed by Backrest. But in the default case, Backrest will automatically retrieve the WAL for you from the WAL archive, which is stored beside the backups, or wherever you tell it to be stored. It could be stored in a separate repo, but generally it's stored in the same repo as the backups.
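The generated recovery.conf boils down to something like this. A sketch: the stanza and binary name are illustrative, and --type=preserve is the restore option that keeps an existing recovery.conf in place.

    # recovery.conf written by the restore: pull each WAL segment back
    # from the archive on demand
    restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'

    # to keep a hand-written recovery.conf instead:
    #   pgbackrest --stanza=main --type=preserve restore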
Yeah, the archive command is always in Postgres, so Backrest will continue to archive even when a backup isn't running. It's not going to do it just during the backup; it's always going to be keeping track of your WAL for you, whether a backup is running or not, because the archive command that we saw at the beginning, the Backrest archive command, is given to Postgres. So it's going to be continuously archiving no matter what is going on. Every time a WAL segment is pushed from Postgres, it'll archive it.
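That archive command is just the usual postgresql.conf setting, roughly as follows (stanza and binary name assumed):

    # postgresql.conf: Postgres hands every completed WAL segment to Backrest,
    # whether or not a backup is running
    archive_mode = on
    archive_command = 'pgbackrest --stanza=main archive-push %p'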
Yeah, exactly. I see people who don't have archiving solutions in place; they set up streaming replication and they're like, oh well, I'll just stream, it'll be fine. And it's not: it gets desynchronized, and then that's it, they have to do a base backup or they have to do something drastic. If you have good archive management in place, you don't run into that problem; you can just catch up from the archive.
Well, this is what I'm saying. In the docs I actually address this: if you're doing this on ZFS, you must take a snapshot. So let's say you did that, let's say you brought up a cluster on an incremental without doing a restore: you're going to corrupt not only the incremental but the previous full and differential. If you've got hard linking turned on, the whole set is going to be gone. So yeah, that's a pretty stupid thing to do, and there's no way for me to prevent it because I don't have hooks into the file system. What will happen, though, is if you did a recovery on that backup it would not work; it would say no, the checksums don't match up, these aren't the correct files, something went wrong. So you would not get an inconsistent recovery from that situation. But you can certainly go into the backups directory and just delete files; there's nothing I can do to prevent that. The only thing I can do is, when you actually go to do a recovery, tell you: no, this is no good, sorry, I can't use this backup, you've got to pick something else. I will also be working on a validate function, to just go and validate a backup offline: just tell me that the backup is still good before I do a recovery. That's a really simple thing. It's just, you know, so many features, so little time, right? Because I'm going to read those files off disk, and everything that is transferred gets checksummed. So when I read the backup off of disk, I'm going to transfer the file across, and while it's coming across and being decompressed on the database side, it's going to get checksummed at that point. I also have the checksums in the manifest that I originally wrote into that manifest file, so if those two things don't match, that's it: recovery has failed and you'll get an error back. Josh, did you have a question? Yeah.
Yeah, sorry, that's what that last thing was, the info function. So that gives you your... unfortunately it's gone now. Let me just look at demo.out; harder than it should be. So yeah, here at the end. So if you run the info function with output set to JSON,
this will give you the exhaustive list. So here's the stanza, main; it gives you a status code, OK, and so on. It gives you information about the database in that stanza. And then the backup list is an array of all the backups with all the information: the archive start and stop, the version of Backrest that did the backup, the database ID, which you can reference in that database section, information about the size and deltas, the label of the backup, the prior backup, any references to previous backups, the timestamp of the start and stop, and the type. So this is a differential, and you can follow this down. And here's the incremental. And the incremental, of course, has a... wait, that's a diff. That's an incremental. That's weird, I would expect to see that prior. That doesn't look quite right. Well, anyways, yeah. Which? Well, currently it should run on any flavor of Linux. Right now, the threading implementation
uses Perl iThreads, which some systems don't support; Gentoo Linux, for example, doesn't compile Perl with threads. I'm working on taking the threads out right now and replacing them with processes; I'm right in the middle of that. It requires changes to the protocol layer, some abstractions there, so I'm working on that, and then everything will be done in processes. I really already do that, because the remote is a process that I start on the other system to do the protocol layer stuff. So basically I'm going to have local remotes instead that do the compression and copying, and get rid of threads altogether. That should increase the compatibility a lot. But for async archiving I'm also forking, which Windows doesn't like, so I'm going to have to find a way to do that differently, as something that Windows will be happy with. I haven't really gotten any kind of Windows compatibility yet, so there's definitely that issue. Any other questions? I think we're pretty far over time. All right, great. Well, thank you very much.