
PostgreSQL Backup Strategies


Formal Metadata

Title
PostgreSQL Backup Strategies
Number of Parts
25
Author
Magnus Hagander
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place
Ottawa, Canada

Content Metadata

Abstract
Rumor has it all these other users and companies do this thing called "backup" on their database. And people keep saying it's a good idea. But what does it actually mean? In this talk we'll go through the different options we have for PostgreSQL backups, how to use them and some good tips and best practices for setting up a backup strategy that actually works when disaster strikes.
Transcript: English (auto-generated)
How low can I speak while you can still hear me in the back? Okay, I'm not going to try to get you to fall asleep, so I'll try to speak up a little bit. I'm sure it'll be fine. Welcome back from lunch, everyone. Are we going to count how many people fall asleep after lunch? Or do you just want me to not point you out if you do that?
Okay, I see people don't want me to point you out, so I will point you out. That's what I get out of this talk, right? Hopefully you can get something else out of this talk. So again, welcome back after lunch. My name is Magnus Hagander. I work for a company called Redpill Linpro. I work out of Stockholm in Sweden, normally, or in theory, every now and then.
I am a consultant working with Postgres and a couple of other products as well, so I travel to a lot of different places and do my work, but that's where my base is. Within Postgres, I'm part of the core team. I'm one of the committers on the database, but I'm not going to talk about any of that
stuff. I'm now going to focus on the stuff that is actually about using Postgres rather than building it. So if you're here in order to learn how Postgres backups work internally, sorry, you're in the wrong place, even though you're at PGCon. So we're going to talk about backup strategies, because we all need backups, right? We may think we don't need backups, but yeah, we all need backups.
So a typical thing, unfortunately in far too many cases, is that when I meet customers for the first time and ask about the concept of backups, like, what are your backup policies, what is all this stuff, the answer is: but we don't need backups, we have replication. How many people think replication will save you?
Replication is awesome, but it is not a replacement for backup, no matter how many nodes you are replicating to, even if you're using the cloud. Actually, that probably makes it worse. But you come across quite a lot of places where people are literally saying, you know, we have five replicas in four different countries, so we don't need backups.
And you know, then someone actually runs this thing called an SQL query and does drop table. That replicates fast. That replicates much faster than any other data. You measure the replication delay, that's as close to zero as you can get. That tends to go real fast. So replication does not replace backups.
So then people talk, well, you know, we have clustering, right? We have failover, high availability clusters using these $100,000 products. It's the same thing. It doesn't protect you. These are both designed to protect you against hardware failure. They're not designed to protect you against application error. Postgres bugs, yes, we do have them.
Hopefully, fewer than many other products, but we do have them. Or user error. Like, there's no way that these high availability products will actually help you. Another classic we see is, you know, I bought a SAN, right? It comes with a 100% uptime guarantee, so you're safe.
It might even have, you know, built-in snapshots that run inside the same SAN with a 100% uptime guarantee. I mean, that's awesome, isn't it? How many people have ever run into that? I mean, what I say is like, really? Can that possibly work? Does the concept of a 100% guarantee exist? No, it doesn't.
It literally does not exist. I know there's a good news story. It's now a bit over a year ago, back home in Sweden, where one of the largest outsourcing providers was just generally outsourcing, you know, running, I guess we almost call it cloud now, but traditional machines in virtualized environments. They ran it for you. They had one of these beautiful SANs with a 100% uptime guarantee, multiple locations,
you know, every single bell and whistle there was. Guess how fast their corruption replicated? They lost four and a half thousand virtual machines, production systems for, you know, some of the most high-profile government organizations in Sweden, for example. They are, sadly enough, still relying on this, even though it didn't work.
So you need backups, right? That's sort of the first takeaway. There is no real substitute for proper backups. We combine them. We don't do just backups, but you do need the backups. And there's a lot of things that people do when you start talking about backups.
So the first thing you do is, well, you have a plan for backups. This is something that most organizations actually end up having, you know. You have some sort of, you know, what's my backup interval? How often am I going to take this thing? How long am I going to keep the backups around? Some people keep them forever. That eventually becomes expensive, but it's doable.
And you somehow plan for, what's your performance impact? Because backup is not free. It will cost you performance while you're taking your backups. These are part of the backup planning, and this is stuff that we need to do. But what's surprisingly common that people don't have is a restore plan. Backups are useless if you don't restore them.
Backup plans are mostly useless unless you actually plan for what's going to happen when you restore. It usually doesn't matter if it takes a full day to take your backup. As long as it doesn't, you know, prevent production from working, it doesn't matter if it takes a day. If it takes a day to restore, that's not going to be popular, depending, of course, on exactly what the system is.
But while you're restoring, your system is down. We all talk about hot backups, right? We've been talking about this for, I don't know, 15 years in the database world, or 20 years, where we're saying we can take the backups without affecting production. We're never going to have hot restore, because when we start our restore process, the system is already down. So we need it to come back up as quickly as possible.
So that is a much, much more important thing. And it's something that we do need to consider in our planning, which means we might need to combine multiple different solutions for solving different problems. Replication is excellent for restore speed. You just move your virtual IP over to the other machine, and you're done. Unfortunately, it doesn't really protect you.
Printing your full database on, you know, pieces of paper and have someone type it back in, yeah, that's kind of a backup. It takes forever to restore, though. It is also ridiculously expensive, even if you find cheap labor to do the typing. And quite error-prone. So let's take a bit of an overview about the options that we have in Postgres.
We actually have a lot of options. I'm not going to talk about all of them. But if we talk about the sort of core options, we can split them in two in Postgres. We can work with something called logical backups. These are backups at sort of the SQL level of the database. This is what we typically refer to as dumps, which is why the main tool used in
Postgres is pgdump, plus a couple of products around that one. This is what you'll typically find in a lot of sort of non-enterprise databases, if you will. This is the only thing that Postgres did prior to version 8.0. The other option we have is physical backups, where we do the backups at some sort of block or file system level,
well beneath the level of SQL. Now, we can then break this one up into multiple different options as well. We can talk about file system backups. We've got something called pg_basebackup. We can do what we call manual base backups. But they all work roughly the same, and they group into these two main groups, which are logical and physical backups.
And the logical backups are really, again, if you've ever come from a non-Postgres database, if you've come from the open source world of non-Postgres databases, you will have done logical backups, because that's basically the only thing they do. What you get when you do this pgdump thing is you get a big, long SQL script that has all your schema.
It has all your create table, create column, create index, create function, create all the other multitude of objects that you can create in a Postgres database, and then it loads your data with a normal copy command, same as if you were just loading it from a CSV file on disk. There is no shortcut anywhere in what the logical
backups do. They are regular Postgres clients. So you can do that. You can just have a Perl script or Python script that does the same thing. Now, we built one for you that doesn't forget about some objects. So if you're taking backups, we do suggest that you use the built-in tools, and don't build your own. But there's nothing preventing you from doing that if you wanted to. It gives you great flexibility because, again,
it's at the SQL level. You all know that Postgres can do a lot of interesting things with SQL. Well, you can use that for your backups. You can do a lot of cool filtering. You can do a lot of good options with pgdump. But it's not necessarily the greatest one for performance. I'm sure many of you have tried it. Has anyone tried to pgdump a database bigger than 100
gigabytes? OK, a couple of people. Bigger than a terabyte. Bigger than five. Yeah. I am still sad for those of you who had to pgdump something that's bigger than a terabyte. 100 gigabyte is fine. Terabyte, that's when it's starting to get painful. Get up to multiple or tens of terabytes, it's just not
going to work all the way. But it gets you pretty far. And it is very simple to use. Again, pgdump is the main tool that we use to do these dumps. pgdump will dump one database. That is important to remember. If you have multiple databases on your system, you
need to run pgdump multiple times. It is sad when you run into people that forget this. Now they have a backup of one database, and they put the production data in another one. Yes, I've run into this. It uses a regular Postgres connection. What it does is it basically, when pgdump starts, it opens a transaction.
And then using the MVCC functionality in Postgres, that gives you a guaranteed consistent snapshot across your whole database. And while the backup is running, you can do anything else in the database, almost, while it's running. There are locks taken, so you can, for example, not drop the table while it's being backed up. Probably a good thing.
But you can do all your inserts, updates, and deletes, you can create new tables while the backup is running. They won't be in the backup, but you can create them. So you will get a backup that is consistent as of when pgdump was started. So if it takes you 10 hours to run pgdump, the backup will be 10 hours old when it finishes, but it
will be consistent. Likewise, if it takes you a week to run pgdump, these are the interesting things that happens when you get into these multi-terabyte databases. It will take a week to run it. And that's painful, but it will be consistent as of a week ago. pgdump is single-threaded.
It is single-threaded for now, because hopefully none of you have deployed 9.3 beta 1 in production. But with 9.3 beta 1, you can get parallel pgdump. So it will be in 9.3 that you can scale out across multiple CPUs. But other than that, it is single-threaded, meaning it will use a single connection to the database, meaning it
will use one core of your 64-core database server to do the backups. The other ones will just be idle, because that's why you paid for them. But it usually turns out that this is probably a good idea. I would say I might, on a 64-core machine, yeah, I'd probably use parallel pgdump. I'd probably use two or three processes, two or three
in parallelism. But the point is, if we actually have, say we have a 16-core machine, we run 16 cores on pgdump, who's going to run your application? pgdump will kill you in full parallel mode. Now, if you're running backups, you don't want it to do that. And again, it doesn't matter how long it takes.
If you're doing, say, an upgrade using dump restore, you just want it to run as fast as possible. That's when parallel pgdump is awesome. But for pure backups, it's usually not a problem that it's single-threaded, because it's usually not a problem that it takes time. Now, you can dump a number of different formats when you're using pgdump.
It's all very simple. It's selected by dash capital F, which controls your format. Your backups should always, always, always be in custom format, which is dash F c. There is no reason ever to use anything else for your backups. The default is not this. The default is plain.
That will just generate a text file that is a big SQL script. Now again, take this when it comes up to multiple hundreds of gigabytes and try to do anything interesting with this text file. We can go into editor wars, which editor is best, but none of them is going to like this file. None of them will work well with 100 gigabytes large text files.
They just won't. So custom format gives you the ability to do a lot of things. It gives you the ability to index and to inspect your backups while they're sitting over on your backup server. That's a good thing. So always use custom format. When you're using custom format, you also get compression by default. It's standard gzip style compression.
I think the level is five or something by default. Usually, database dumps tend to compress fairly well, given the output format. Even if you're storing pre-compressed binary data in your database, when copy reads it out, it gets escaped into hex, which we can then recompress. May not be the most efficient thing, but there's a
gain to be had from the compression. Now, pgdump supports dumping of separate objects. You can say dump just this table, dump just this schema, dump just these five tables. Lets you do this. When you're using pgdump for backups, you should never use these functions. This is a good tool.
We use it for many other things. But when you're doing backups, you need to backup the whole database. If you don't, you're not guaranteed it will restore. For example, say, well, I have 10 tables in my database. You can run one pgdump for each of those 10 tables manually. The problem is they each get a separate, different transactional context.
They will each be individually consistent with its own table, but say you have a foreign key between these tables. Someone might have modified your data between the two different pgdumps getting into the database. Suddenly, your backup doesn't restore anymore. So when you're running a backup, you should always use pgdump for the full database.
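To make that concrete, here is a minimal sketch of the kind of invocation being described; the database name and output path are illustrative, not from the talk:

    # whole database, custom format; no -t/-n table or schema filters for backups
    pg_dump -Fc -f /backup/mydb.dump mydb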
Now we can then filter things when we restore and say, I only want to restore this one table. But when we put them in the backup, everything should be in there. Now as I mentioned before, pgdump uses regular copy queries. It's a standard SQL thing.
The Postgres is not that fast at doing copy. It's actually pretty slow at doing copy. It's much faster than anything else we have, but it's not that fast. It uses a single back end. So a single TCP connection or Unix domain connection, a single processor core generating IO.
Now Postgres is smart enough with all these copies that it will not blow away your Postgres cache. Because obviously pgdump will have to read all your data. We're backing up your data. There's no way not to read it from disk at one point or another. Now Postgres has this feature which automatically detects when a session is running large sequential scans.
It's reading more data than it's ever going to fit in the cache, and it's just going to go straight through it. It will then confine that process, in this case pgdump or its copy, to a smaller part of the shared buffer cache and just have it evict its own buffers. So it doesn't affect the rest of the system. Now there's still the risk that it can ruin your
file system cache. That depends a lot on how your file system cache works. Usually we end up reading every block just once, so a good file system cache will not at least be ruined because they have similar kind of techniques to detect this type of behavior. But there is always, there will be an impact on your database caching when you run pgdump unless your whole
database fit in RAM. If the whole database is in the cache already, then we don't have to throw anything else out for us to get in. But there is a potential for this. And of course, not only do we read the data, we have to write this dump file somewhere. This is something that I see quite a lot of people who
just don't think about. And then they run pgdump on the database server and write the dump file to disk on the database server because that's the disk that's attached to the database server. Now not only did we just add IO to the same disk that our database is on, we changed what was nice sequential scans reading all the data to scan a little
bit, go over here, and write it. Now go back here and read it, and go over here and write. So you're turning your IO profile into something that's really bad. Now you can also, of course, run the dump on the local machine and store it to a different disk on the local machine. Now you're no longer generating this random IO, and
you're no longer fighting for the same IO bandwidth. But it's important to consider if you're writing it to the same disk, the writing of the dump file is probably going to cause you more performance impact than the reading of the database, even though, yes, it happens in a separate process. It's running on the outside, but it's all on the same hardware.
There is an impact there that we do need to think about. Now as I mentioned, pgdump does compression by default in custom mode. And this is actually something we can use in some sort of sneaky way. So compression, when we run it, is run and happens in pgdump. So you can run pgdump if we run it on the database server.
Obviously, the compression happens on the database server. If you run pgdump somewhere else on the network, and it connects to the database server, then compression happens somewhere else on the network. And the data that's transferred between Postgres and pgdump is uncompressed. That may or may not be what you intended.
Sometimes, network bandwidth is actually relatively cheap. You may have a dedicated network for your backups. And this may not at all be a problem. But it's something worth considering. Now the other use we can do for this compression is we can actually use it to throttle things. Typically, since we're using a single CPU, it tends to be the same for pretty much all the hardware that we
see today, that somewhere between compression level of three to five is where you get the maximum benefit. What that means is, well, you're going to end up, if you're running pgdump on the same machine, you end up with one pgdump using 100% of one CPU core. And then the back end that it's connected to using 100% of a different CPU core. You can't go faster than this.
Your backup will go as fast as possible. But is that what you needed? Is that what you wanted? If you decrease the compression, your load will become IO bound, because you have spare CPU to compress more; you just decided not to do it. So you will create more IO and will be limited in speed by IO.
But if you increase the compression, things will be slower. Your backup will take a longer time. But it will generate less IO. And it will, in particular, generate less IO per second. Does it actually matter if your backup takes an hour or an hour and a half to run? If it doesn't impact the application?
No, it doesn't. I've quite often found that if you increase the compression, the backup runs slower, but the impact on the system is decreased significantly, because pretty much all of our database servers have spare CPU capacity. And with high compression, we're paying in CPU capacity.
With low compression, we're paying with IO capacity. We have CPU to spare. We don't have IO to spare. That's the typical scenario. So if you're seeing a backup case where the backup is impacting your system too much, then try actually increasing the compression. It sounds counterintuitive. But it does quite often help.
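As a sketch of that trade-off (names and paths are illustrative):

    pg_dump -Fc -Z 1 -f /backup/mydb.dump mydb   # low compression: finishes sooner, but generates more IO
    pg_dump -Fc -Z 9 -f /backup/mydb.dump mydb   # high compression: slower and CPU-heavy, but much less IO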
Now, there is another thing that you can quite often do. So we said, in this case, compression happens in pgdump. What if I want to run pgdump somewhere else? Yeah, you can dump it over NFS. NFS, meh, not a big fan. What you want to do is run pgdump somewhere else and write the dump file over there. But now the compression happens in the wrong place.
How do we solve this? SSH. This is a pretty typical thing, right? SSH into your database server, run pgdump with maximum compression and custom format, and just pipe the output back, letting the SSH tunneling do the work. This works really well. Does anyone spot the error in this command?
It's not actually an error that makes it not work. This works. But there is a very stupid mistake in this thing. Sorry? SSH does its own compression. And compression is slow. Compressing already compressed data is even slower.
And it makes it bigger. So if you're using this, you probably want to pass Compression=no to the SSH command. Now, whether SSH does its own compression by default depends on which platform you're on. It might be off by default, in which case this is not necessary. On the other hand, it never hurts to explicitly turn it off.
You might want to also say, well, it's also encrypted. Well, first of all, maybe that's a good thing. You're transferring your sensitive database backups. It might be a good thing that it's being backed up. But even so, encryption is actually much, much cheaper than compression, the way our CPUs and the way our machines work today. Compression is the expensive thing.
Encryption is cheap. But you can, of course, also tell SSH to use what's called a null encryption, which means it doesn't encrypt it. I seldom get around to actually doing that. And now we've got the same thing. Of course, this dash capital Z 9 means we're going to lock up one CPU core at the database server at 100%.
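A sketch of the command pattern being described here, assuming a database host called dbserver; the exact command on the slide is not reproduced:

    # compression happens in pg_dump on the server; tell ssh not to recompress
    ssh -o Compression=no dbserver "pg_dump -Fc -Z 9 mydb" > /backup/mydb.dump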
The client here is only processing this, a simple redirect. But we're sending the data off across the network immediately, which also has the advantage that we're not impacting the IO on the server by writing data there. So when we're doing a restore from pgdump, since we
said, if you're using this plain text format, it all just goes into psql or whatever. Has someone ever tried opening a 200 gigabyte SQL file in the pgAdmin query tool? It's fun. It uses a lot more than 200 gigabytes of RAM. It's not very efficient at that, so don't do that. We use pgrestore.
What pgrestore does, sort of, is it takes this custom format and basically transparently converts it into this 200 gigabyte SQL file, except it doesn't do the whole thing at once. So it will then use a regular libpq connection, a standard Postgres connection, run standard Postgres commands on it, and just restore all the data in there.
And this one does support both the full database restore, which is what we do when everything is broken and we need to restore from backups, and we can use partial database restore. We can restore a single table. We can restore multiple tables. What it guarantees is that if you're doing a full database restore, the data will restore. Because we had this consistent snapshot
when we took the backup. And as long as we're coming from the same one, you have the same sort of basic problem if you restore a single table. Well, if that table had foreign keys pointing somewhere else, yeah, we're not going to be able to restore it, because we'll still enforce primary and foreign keys. But if we have independent groups of tables or something
like that, we can restore these as a single unit. And it's pretty easy to use. Now, what this one does is it basically does a create table, then it does a copy of all your data, then it does create index, and then it adds your constraints. Does anyone think that's fast? These are like basically every single one of these operations are the ones we try to avoid because they're slow, and we're doing four of them.
This can be very slow for large databases. Now, how do we define very slow? When my restore takes multiple weeks, that is very slow. In fact, for a fair number of systems, if my restore takes two hours, that is too slow. It takes a long time to restore these once your database grows up to any level of size.
I'm sure many of you have experienced this much sooner than you hit a terabyte. Backing up a terabyte is fine. Restoring it is not fun. There are a couple of things that you can do to make restore run faster. You can use the dash one flag. This is the easy one. This one's been around forever. It basically runs the whole restore as a single
transaction. What pgrestore does is begin, then restore everything, then commit. The reasons why we like this, well, in general, we get fewer operations. That's not really a big win because our operations are so big anyway. The big wins are that we can enable some optimizations in
the system when we do this. We can, for example, write less data to the transaction log. This will only work if we're not doing replication or WAL archiving. But if we don't do that, it can help us with a lot of things. It also comes with the advantage that if the restore were to crash for some reason, we don't have a half-restored database. If you remember, we looked at what it does.
It does copy, create, index, add constraint. Well, if we're unlucky and it crashes, we do the copy. We don't create our unique indexes. Now we have a dangerous database. There's no protection in that unless you use dash one. Do you have a question? Yeah, well, yeah. It includes all the things. Yeah, it includes triggers. It includes your functions, rules.
No, it loads them after it's loaded the data. So it doesn't run the triggers during the restore. That could create a very strange restore, depending on what the trigger is. So they don't run during that. Yes? I'm getting there.
Now, the thing you can do if you really need this thing to run faster is parallel pgRestore. This is something that we've had since 8.4. So we were able to restore parallel long before we were able to back up parallel. It works like sort of your typical parallel commands, which you do dash j and give it a number. And it'll just run in that many parallel sessions.
So if you have a 32-core machine, you can run dash j 16, that'll give you 16 cores to Postgres and 16 cores to pgRestore. And sometimes that will actually run 16 times faster. That's still pretty good. Now, 16 times faster than a week is still not good enough, but it is much better.
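A minimal sketch of the two variants (database and file names are illustrative; as discussed below, the two flags cannot be combined):

    pg_restore -1 -d mydb /backup/mydb.dump      # single transaction: all or nothing
    pg_restore -j 16 -d mydb /backup/mydb.dump   # 16 parallel sessions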
Now, what it does is it still loads each one of your objects in a single session. So if you have one terabyte table and 500 really small tables, what it will do is start a connection for this one terabyte table, then another connection will finish all of those 500 small ones, and you're still waiting for the big one. We can't parallelize an individual table.
But usually, that's not what your database looks like. You have a number of different objects that are reasonably sized. And it tries to be smart. It knows in which order it has to load them to satisfy dependencies. But other than that, it tries to spread out the load not to get into that situation. It does a reasonably good job. Now, unfortunately, this one is not compatible with dash 1.
You can't use both of them. And in almost every case, if what you're looking for is performance, this one is better. Because this one, you can just tune up the number of processes until you hit the limit of what your disk system can do. Once you've hit the limit of your disk system, well, you're not getting any further. But you will get faster. Now, of course, if you do that, you lose the fact, the
integrity, so that if it crashes halfway through, you will have a half-restored database that you need to drop and restart your attempts. And it doesn't protect you against that. Yes? That n, a typical value if you're running the restore on the same machine, is half the number of cores you have.
Because then you give half the cores to Postgres, half to pgrestore. If you're running it on a different machine, it's the number of cores you have. If that doesn't kill your IO system, then you have Fusion-io, and then you're happy, because now it's fast. There are some other interesting things that you can do. You can turn off fsync. This is something I really shouldn't recommend you do. We should never recommend anyone turn off fsync. It's dangerous. Things will die. Things will get corrupt, and you won't know until it's too late. On the other hand, it's very fast. And if we are doing a whole database restore, only if we're doing the full database restore, the fact is that if it
crashes at some point during the restore, we can just delete everything and start over. In that case, there's no real risk from fsync. The biggest risk from fsync is that you forget to turn it back on. And yes, I've run into this more than once. Actually, I've run into this more than once with the same customer, which is not a good sign.
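A hedged sketch of that sequence, assuming a full-database restore and a data directory under /var/lib/pgsql/data (paths are illustrative):

    # before the restore: set fsync = off in postgresql.conf, then restart
    pg_ctl -D /var/lib/pgsql/data restart
    pg_restore -j 8 -d mydb /backup/mydb.dump
    # afterwards: set fsync = on again, restart, and make sure everything is
    # flushed to disk; the suggestion in the talk is to simply reboot the server
    pg_ctl -D /var/lib/pgsql/data restart
    sync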
So yeah, that does happen. Do not forget to turn it back on. After you've turned it back on, you need to flush. Obviously, you need to restart Postgres because it is that. You also need to flush the operating system caches. If you don't, you will get corruption. And you will get corruption that you won't detect. So how do you flush the OS caches, to be sure?
Well, the easiest thing really is to reboot the server. If you reboot the server, you know it's flushed. There are ways to do it that are completely different in all the different operating systems. But the problem that I have with all of them is they don't give you feedback. For example, in Linux, there is something in /proc called drop_caches, where you can tell the kernel, drop all your caches right now, and it'll
just go do it. But it won't tell you when it's done. You don't know when it's safe to start the database again. You can go in the server room and look at the disks and see when they stop flashing and hope that was it. That doesn't feel very safe to me. And usually, when we do turn off fsync for these restores, we may turn them off in order to save on the order
of hours. You can afford the time to reboot. It doesn't take that long to do that. Final note about this one. Don't forget this. And that's as you mentioned in the back. pgdump backs up a single database. There are objects in Postgres that are not inside databases. These are your users. These are your groups. These are your tablespace definitions.
You need to dump them as well. For this, we use pg_dumpall with the parameter dash g. That dumps only the global objects. If you don't do this, your backups will not restore. You have to manually go in and create the users, which is fine if you have two of them and not if you have hundreds of them.
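A minimal sketch (the output file name is illustrative):

    pg_dumpall -g > /backup/globals.sql   # roles and tablespace definitions only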
This will always generate the plain text dump, but unless you actually have millions of users, that's not a problem. And if you have millions of users, you have a bigger problem. They should not be database users if there are millions of them. Please. No. Minus g means global objects only.
If you just do pg_dumpall, it will dump every single one of your databases in plain text mode plus the globals. Dash g says ignore that, do globals only. So moving on to the other option that we have, which is physical backups. Because we said logical was slow, right? Well, physical is slow, too.
I'm sorry. Everything is slow. We need faster machines. Now, physical backups in Postgres basically comes from this. Postgres stores its databases in files. As I'm sure you've seen, there's lots of them with very funny names. We can back up these files. It's pretty easy. No need to parse. No need to query anything. We don't need to read up.
We don't need to decompress and toast data. We don't need to do all of that stuff. So it's much, much faster. It also becomes architecture dependent. You cannot move a physical backup between two different architectures, say 32 and 64-bit, big and little-endian. That will not work. It will become Postgres version dependent.
You have to restore to the same version. Now, yes, that means the same major version. I think officially you have to restore it to the same major version and the same or later minor version. But in fairness, it works with an earlier one as well. But since you should always be on the latest minor version, that's not a problem. It is also dependent on your compiler flags.
If you are installing your database using a package, say an RPM or a Debian package or even the installers, those package maintainers make sure that they use the same flags so that this is compatible. It's a long time since that broke. It broke in the RPMs in like 8.2 or something when they changed to integer datetimes.
And your paths must be identical. You need to restore it to the same place. That's kind of restrictive. But when you think about it, when you're doing backups, that's not necessarily a problem. Because usually if you want to restore, you want to restore onto equivalent hardware. You didn't suddenly switch when you're restoring. You didn't switch from Linux on Intel to Solaris on SPARC.
You'd probably restore to something similar. So that's usually not a problem. What might be a problem is that it only supports full cluster backups. So you can't say take a backup of this one database out of my 100 databases. You have to back them all up as one step. Again, for backups, quite often that's OK. But really, it is a limitation.
It would be nice not to have that limitation, but we do. So the easiest way to do physical backups really still is offline backups. It is easy. How do we do it? Stop Postgres, take backups, start Postgres. Now, if you can do this, it's the easiest one.
You can back up your files exactly however you want. Because Postgres isn't running. Postgres doesn't care. It's just files. You can use tar, you can use copy, you can use file system snapshots, you can email them away, you can do whatever you want. Now, of course, in most scenarios, this is not an acceptable solution because of this tiny little step up here that says stop Postgres. That means stop your application.
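As a sketch, assuming the data directory lives in /var/lib/pgsql/data (adjust for your packaging):

    pg_ctl -D /var/lib/pgsql/data stop
    tar czf /backup/pgdata-$(date +%Y%m%d).tar.gz /var/lib/pgsql/data   # Postgres must not be running
    pg_ctl -D /var/lib/pgsql/data start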
How many people are in the luxury of saying, well, my application is really only used between 8 and 5? Nobody ever uses it in the evenings, right? Yeah. But if you can, if you have an application that's only used, I mean, I have customers who have applications that are only used for one hour a day, then don't underestimate the simplicity of this option.
Because it just works. Now, you cannot do this if Postgres is running. It may look like you could, but your backup will not work. Another very simple way of doing it is simple snapshot backups, because we're using file system or SAN-based snapshots while the database is running. Now, as long as your snapshot can guarantee that
this, or rather your snapshot system, whether it's LVM, ZFS snapshots, or your SAN, supports atomic snapshots across all Postgres data. That means all your tablespaces, including the pg_xlog directory, which is your transaction log.
If it can guarantee this, then you can just do the snapshot, and you're done. It will just work. Now, in most cases, that's going to be a problem, particularly if these things are on different file systems, you're down to enterprise, very, very expensive SANs to be able to do that. And it's one of those things, if they're not
exactly atomic, if they differ even on the microsecond level, you can bet it's going to work in your testing, and it's going to fail in production. And you won't know until you try to restore. Actually, you won't even know when you restore. You will know it a week after you restore, when you realize that there was corruption in the thing you restored. But it can be quite useful in small installations. And the advantage is you don't have to do anything.
It just works. But what you're usually going to end up doing is online base backups, as we call them. This is non-constrained file system level backups, meaning you can do whatever you want, anywhere you want, while we're running the backup. But we still do the file system level. We do this, and we recover it by using our transaction log.
So we combine this base backup with the transaction log; with just the base backup, it doesn't restore. It needs the transaction log. We can do this with or without what's called log archiving. If we do it with the log archiving, this is the base for point in time recovery, the ability to restore not just the backup, but any arbitrarily chosen point in time, which is where we want to end up going.
Now there are, of course, multiple different ways that we can do this. The easiest way to do an online base backup is to use the integrated base backup functionality. This is new as of Postgres 9.1, but you're all running, who's not running 9.1 or newer? Wow, oh, okay, there you go.
Well, now you've got another reason to upgrade, right? This runs on top of the replication protocol because they're very similar. So in order to use this, you must enable replication. Even if you don't have any actual replication, you must configure Postgres to support replication. You do this by setting the parameter wal_level to archive, or hot_standby, but at least archive.
And you set the value max_wal_senders to at least two. In theory, you can set it to one, but always set it to at least two. Or set it to 10; keep a little bit of margin in case you get network issues, or you want to run multiple backups at the same time, or something like that. And then you run the simple command pg_basebackup.
This simple command right here is the simplest way to do a full backup of Postgres. pg_basebackup: give it a user, because it has to log in for replication. This user must have replication privileges. Dash capital D tells it to write to this directory over here. Dash capital P gives you a progress report.
Unlike pgdump, this one can actually give you a progress report that tells you, no, I've done 25%. You can go have a coffee. I'm not going to be done any time soon. Dash x is the important one. If you omit the dash x, your backup will not be restorable without a log archive. But if you include dash x, pg_basebackup will put
enough transaction log in your backups so that the backup is standalone restorable, that you do not need any log archiving. You do not need anything externally. It just works. In fact, you can dump this in two different formats. If you dump it in a tar format, it will
give you a tar file. If you run it in the default format, which is plain, it will actually reconstruct your data directory, in this case, in this backup directory right here. So after you've run this command, you can actually run pg_ctl, point it at this directory, and say start. And it will start Postgres on your backup. That's the format of the backups that you get, which
means they're completely standalone. They will just boot up, and you've got something running. That's pretty easy. Now, this will give you the one-off backup capability. This is the one you put in your cron job if you don't want to do anything complicated. This is the physical backup equivalent of pgdump.
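Putting those pieces together, a sketch of the configuration and the command; the user name, directory, and values are illustrative:

    # postgresql.conf (changing these requires a restart):
    #   wal_level = archive          # or hot_standby
    #   max_wal_senders = 10
    # pg_hba.conf also needs a replication entry for the backup user
    pg_basebackup -U backupuser -D /backup/base -P -x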
If you want to go all the way, you need to use something called log archiving. It's a bit of a pain in the ass to set up, but once you have it, it becomes fantastically useful. How it works is that whenever Postgres finishes with a log file, and this is transaction log file, we're not talking about your system log that says this user logged in or whatever. It's the binary transaction log.
When it's finished a transaction log, it's sent to the log archive, this new concept over there. When we're restoring, we read it from the log archive. So basically, when we restore, we take this base backup, which was taken at a certain point in time, and then we read all our transaction log back from this log archive. We can roll forward through this, and the feature here
that we're looking for is the one called point in time recovery, which is that we can stop. So say we restore the backup from last night, and we start reading the log archives since last night, and we just keep going. If we keep going till the end, which is a second ago, then our database, as restored, will look like it did a second ago.
We can also tell it to stop at 9 o'clock this morning, which is when someone got back from the pub and did drop tables. That's why you should not log into your database if you get back from the pub at 9 o'clock in the morning. And if you do, you should make damn sure you have working point in time recovery. That's sort of the theory behind this.
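In the 9.x releases being discussed, that stop point goes into recovery.conf in the restored data directory; a sketch with an illustrative timestamp and archive path:

    # recovery.conf
    restore_command = 'cp /wal_archive/%f "%p"'
    recovery_target_time = '2013-05-24 09:00:00'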
The way it works in Postgres is, let's say, both good and bad: it lets you do it any way you want. That's the good part. The bad part is it requires you to define what this way is, even in the simpler cases. What we do is we set a parameter called archive_mode to on. This is one of those annoying parameters that requires a Postgres restart, because once you do
this, we will start up the log archiver subsystem. We'll start a separate process that does this. Then you define a command, archive_command, which is just a shell command that you can define to be anything you want. What Postgres will tell that command is: take this file over here and archive it under this name. It's up to you to define what that means.
A typical example is you rsync it off to your archive server. You may copy it to an NFS mount. But you can do anything you want. You can send it via email. That used to be my go-to for showing that you can do anything you want. Then I thought of a better idea. You can post it to Twitter, split up. It works.
It works for a very short while before you get blocked by Twitter, because you're sending 16 megabyte chunks. That's quite a lot of tweets for one file. And they do detect that. But it works. And you can restore it. It's unusually complicated. I wouldn't recommend it. Another typical thing, if you actually want it to be in the cloud, use Amazon S3 or something like that. That's a very good way of doing it, unlike Twitter.
And the only thing that you promise, other than being able to do this and being able to tell Postgres whether you succeeded or failed, is that you will also be able to define a restore command, which does the same thing the other way around. Postgres will just tell it: this file that you previously archived under this name, give it back to me and put it over here.
So if you rsync it away, well, then you're going to rsync it back. Copy it to an NFS server, copy it back. Email it off, IMAP it back. Post it to Twitter, just don't do that. But the only thing that you guarantee is you can get it back when it's being requested under this specific name. The rest is up to you.
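A sketch of what that contract can look like, using the rsync variant mentioned above; the host name and paths are illustrative:

    # postgresql.conf on the database server
    archive_mode = on                                        # requires a restart
    archive_command = 'rsync -a %p archive:/wal_archive/%f'
    # recovery.conf when restoring: the inverse of the archive_command
    restore_command = 'rsync -a archive:/wal_archive/%f "%p"'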
There are some limitations in this that we've fortunately worked around a little bit more now. It always uses 16 megabyte segments, which means that if you have a database with very low velocity, you might not generate 16 megabytes of transactions in a day, but the ones you generate are very important.
For this, we have a parameter called archive timeout, which tells Postgres to, even if I didn't generate these 16 megabytes in, say, 30 minutes, send it anyway. The problem here is that we'll send the full 16 megabyte file, even if there's nothing in it. And it won't even be zero padded.
It will contain random junk, so it doesn't compress particularly well. So otherwise, you just set that to one second, and now you're done. That's not going to help you. Now, in Postgres up to 9.1, we solved this problem by replication. You have a replication slave. If you get a hardware crash, so you need to restore to anywhere within the past 10 minutes, you just use your
replication slave. If someone dropped the table, well, if you shut the server down, it will archive the file. So that's not the problem. But it still doesn't give you all of that, and it requires a replication slave. Maybe you didn't actually need replication for anything else. For this reason, as of 9.2, we have a tool called pg_receivexlog that you can run, that will
actually pretend to be a replication slave. It is a Postgres replication slave without Postgres. We just took the database out. So it will receive your transaction log in real time and rebuild the log archive. So you'll be running this on your log archiving server somewhere.
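A minimal sketch of running it on the archive host; host, user, and directory are illustrative:

    # streams the transaction log in real time into the archive directory
    pg_receivexlog -h dbserver -U replication_user -D /wal_archive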
So now you really can't run it on Twitter. And it will rewrite the files, and it will generate them byte by byte instead of 16 megabytes by 16 megabytes. And that does solve the problem properly all the way. So in order to use this, you need a base backup again. Luckily, we can generate this base backup very simply.
We can use pg_basebackup again, just remove the dash x. This is the default mode for pg_basebackup. You just say, here, take my base backup, put it over here. Once you've done that, you've got something to work with for your log archive. It does not, at this point, restore without the log archive. That's what the dash x was for. You have to choose.
Actually, you don't. You can still use dash x. Your backups will be unnecessarily large, but they will still work. You can also use what we call the manual method of doing base backups. This is the old way, as well, that you used in earlier versions of Postgres, which is you connect to the database and you say pg_start_backup. And then you copy all your files. And we don't care how you copy them.
You can use copy, rsync, snapshots, tar, whatever. The files will change while you back them up. That's not a problem. We'll be able to restore from whichever state you back them up in, as long as you don't miss any. And then you run pg_stop_backup when you're done. The most common mistake people make with this one is they don't check the error code of every
single step. This is a multi-step operation. If any one of the steps fails, your backup is corrupt and not usable. So check the error codes. And really, use pg_basebackup if you can. It will take care of things for you. It'll make sure things are restored and reset properly if something goes wrong.
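For completeness, a sketch of the manual sequence; the label, paths, and host are illustrative, and the point is that every step's exit code gets looked at:

    psql -d mydb -c "SELECT pg_start_backup('nightly');" || exit 1
    rsync -a /var/lib/pgsql/data/ backuphost:/backup/base/
    rc=$?                                   # remember whether the copy succeeded
    psql -d mydb -c "SELECT pg_stop_backup();" || exit 1
    [ $rc -eq 0 ] || { echo "base backup failed, do not keep it" >&2; exit 1; }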
Now, when we look at the system impact of pg_basebackup, it's sort of similar in a way to pgdump. It has much lower impact. It will read all your data. Unlike pgdump, it will also read the data from your indexes, because it just reads the files on disk. It generates a lot of IO, but it
generates sequential IO. So it's relatively cheap IO. It is single-threaded, and I like to say that's probably a good thing. Otherwise, we'd kill your IO really, really quickly, since we can read so much faster than copy can. We have the same sort of compression story with pg_basebackup.
You can add compression. Compression happens in pg_basebackup. We can use this to throttle this one as well. If we're generating too much IO, increase compression, and it goes slower. But we have even less impact on the system in the form of locking while this thing is running. It doesn't lock anything. You can drop and create tables just fine while the
backup is running. So throttling it might well be a good idea there. When we look at restore performance of this, first of all, it will be orders of magnitude faster than pgrestore. Quite often, orders of magnitude faster. Suddenly, your terabyte database will restore in hours,
not weeks. That's a good thing. It will depend on the distance to the base backup, because the restore has you restoring the full base backup, copying all those files, and then copying every transaction log file since that backup to the point you're restoring to. Now, depending on whether you have a database that is
large but with low velocity or small but with high velocity, either one of those can be the fastest. If you have a very small database that takes huge amounts of updates, restoring the base backup will be really quick. But then you have to load loads and loads of transaction logs, which may have just been updating the very same row, whereas you can have a huge database that's multiple
terabytes, you restore that, and then you have 100 megs of transaction log to replay. So that's going to depend on exactly where you are and what you've got. For that reason, you'll probably want multiple generations of base backups. Exactly how often you take them is going to depend on the velocity of your database and the size of it. And of course, the SLAs that tell you how long you
are allowed to wait for the restore. The only real way to figure that out properly is to test, to try it. So I'm starting to run out of time. And I haven't actually started talking about backup strategies, which is what we said we would have. I've just talked about the options. Let me start with the most important of everything in your backup strategies.
Please make backups. Yes, as a Postgres consultant, I can make a lot of money by helping people who didn't make backups try to recover whatever is left of their database. But I don't like doing that. I like doing more constructive things. So please make backups. Don't forget about them. So how do you make these backups?
Well, you definitely want an online physical backup, really. Nobody does not want this today. Prior to pg_basebackup, maybe you could avoid it because it was kind of annoying. Now, with pg_basebackup, just do it. You want at least that. If you have a mission critical database, if you have a lot of stuff in it, you almost
certainly want point-in-time recovery. Not just, let's restore backup from yesterday. Yeah, you do that late in the afternoon. You've had 1,000 people working eight hours on data in that database. You just threw that away. With point-in-time recovery, maybe you threw away five minutes instead of eight hours. You probably want that, almost certainly.
And then you probably want pgdump as well. Because really, this is backups; it's the whole belts-and-suspenders, whatever-else-we-can-think-of, in-case-everything-else-fails kind of thing. If you can do pgdump, you probably want it. If you have a multi-terabyte database, you can't. It really doesn't work. It's not going to work for you. But if you have a smallish database, just run it.
Maybe not as often. Maybe you do a daily online physical backup and a weekly pgdump. These are also useful. For example, pgdump will actually read all your data through the SQL level. It'll help you detect corruption that may have happened from a hardware failure on your machine. It'll actually go through the data. It has other wins.
Another big thing you have to ask yourself is backup retention. And that's one of the difficult things, is you need to actually go to your business people and ask them what they need. The fact that if we have to restore a database as it looked a year ago, is there even any point to that? Is there anything interesting in the way it
looked that way? Is there a point in it? On the other hand, we might be dealing with, say, financial data or anything that's regulated, where we need to be able to restore it to 10 years ago. That's actually going to be hard. Do use pgdump if you have that requirement, because you're otherwise going to have to restore it on a 10-year-old version of Postgres. And we don't support that.
So pgdump, if you actually need to restore something 10 years from now, that is a plain text format. Plain text formats have not changed in 10 years. We still have ASCII, we still have 8-bit text, we may have UTF-8 on top of it. I think we're going to have UTF-8 10 years from now. I hope we're not going to have Postgres 9.1 10 years from now, for your sake.
We should have done things. So it's all about what does it make sense, or does it make sense to actually be able to do with one second interval restore the database 10 years ago? Or is it OK to say, well, if it's 10 years ago, I can go down to monthly retention? You have to choose month, not second. So a lot of things there. Unfortunately, it comes back down to the business
requirements. It's not something that we as technical people can actually dictate. We can ask for business requirements, and then we can try to build something that at least pretends to do what they ask for, because quite often what they ask for is not doable, or very, very expensive. And then you tell them that, and they say make it cheaper. We need to deal with log file and base backups.
Remember that a restore requires the base backup plus all log files since the backup. It does not allow you to have any holes. Your log archive is really, really, really important. You lose a single file in your log archive, you get a break in what you can restore from. This is also a reason why we need to run base backup.
In theory, we could run one base backup and just have a huge log archive. It will take forever to restore, and if you lose a single file, your backups are gone. So we need some sort of a reasonable level of when we run our base backups. Might be on a daily level, might be on a weekly level. Another common thing is there's no point in keeping anything in the log archive older than the oldest base
backup you're keeping, because we'll always go forward, we can never go back. Backup and replication, you want both, probably. If you can only afford one, it will be backups. OK, backups solve a problem without replication that is more important than the one that replication solves without backups. But you really want both of them. Replication is excellent for recovering from hardware
failure, because you get up and running within seconds instead of minutes, hours, or days, with a much, much shorter service interruption. You need backups as well, but that doesn't mean you don't need replication. Another useful thing that you can do around this, to decrease your restore times, is using lag-behind replicas.
This is not something we support on the streaming replication protocol, but we do support it on file-based replication, which is basically: you create a delay in your replication system intentionally. Normally, we want to keep our replicas as close to the master as possible, but say you create a replica that's running 12 hours behind. That means as long as you detect the error that
somebody did within 12 hours, what you can do is you can actually fast forward that replica, remove the delay, and fast forward it up until just before this problem without having to do a full restore. And we can do this easily if we use file-based replication in Postgres, because we just need to
define a command that looks at the date of the file and just doesn't copy it in until it's old enough, so you have a staging area that sits 12 hours behind. It's fairly simple to set up. It can be very useful if you need to be able to do quick restores on very large systems. Yes?
Oh yeah, absolutely. Absolutely. You'll want one that's up to date to deal with the hardware problem. And then you'll want a lagged one to deal with user problems, or if the hardware problem is it created corruption. So the next really, really important thing, please test your backups. How many people regularly test your database backups?
That's almost a fifth of the room. How many of you have ever tested your database backups? And that was because it didn't crash? So do test your backups. OK, we don't always test your backups. I actually have a customer, they run a fairly important Postgres database. They test their restore every day.
Every morning when the DBA gets in, he restores the database, and he runs test scripts and data verification to make sure it works. Now they have one database. You can't do that if you have 1,000 databases. That would be way too expensive. They have one database, and if this database crashes and doesn't restore, they don't have a company. There's nothing left.
So for them, that makes sense. For most people, that probably doesn't make sense. There's a big difference between that and testing once when you set it up and never again. Is there a feature to test your backups without doing a restore? I don't think there is anything in the queue for testing your backups without doing a restore, because why not just do a restore? I mean, you don't need an extra license for the
separate machine to restore on in Postgres like you do in Oracle. Just boot up a virtual machine somewhere, restore it in there. Then you've actually tested the right thing. The other advantage of it, what I like to do as well, a typical example, is when you're using this for staging or dev, restore from backup.
Don't build a new one. Why? Because you also implicitly train your personnel. Now when this happens at 3 o'clock in the morning, you've actually done it before. You know how it's done. Which is also why your test should be a restore, because then you know the whole process works, and you
know how to do it. Yeah, exactly. So when the developers say, we need a new, updated version of the database for development, a lot of people will go take a copy from the master right away. I say, don't do that. Go to your backups, because now you've tested your backups. And it came for free.
Exactly. At least two. Three, because you trained your people. As I say, well, don't automate. As in, you should automate, but you shouldn't automate everything. There should be the same manual process as there will be when you do a real restore. It needs to be there, again, so that you're used to doing
the right things. Well, I'm already significantly over time, but luckily there was a long break after me. So you got to stay. But thank you very much for showing up. If you have any questions, I think we'll let you out. But feel free to approach me here and ask questions after. And if you have interesting war stories from backups, I'm interested to hear them.
Yes, you're going to be able to get a copy of my slides. Instead of going to blog.hagander.net, you go to hagander.net slash talks. It's there already. The only difference is it'll say some other conference, but it's there.
Thank you.