Avoiding, Detecting, and Recovering From Data Corruption
Formal Metadata
Title: Avoiding, Detecting, and Recovering From Data Corruption
Series: PGCon 2020, talk 30 of 32
Author: Robert Haas
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/52124
Transcript: English (auto-generated)
00:10
Hi, my name is Robert Haas, and this is my talk on avoiding, detecting, and recovering from corruption for PGCon 2020. As many of you know, I am a longtime
00:24
contributor to PostgreSQL and PostgreSQL committer, and I work at EnterpriseDB, where I've worked for about the last 10 years and am currently the chief database scientist. Today I'm going to talk about avoiding, detecting, and recovering from database corruption
00:45
So just to give you a brief overview of how the talk is going to go I'm going to start by trying to give a definition of corruption I'm going to talk a little bit about the three main causes of corruption I'm going to talk about some best practices for avoiding corruption and for detecting corruption
01:06
Then I'm going to talk about some of the possible signs that you have database corruption And finally, I'm going to say a few words about how to recover from corruption This is only a 45-minute talk and there's a lot that could be said about this topic
01:21
So I'm really not going to have time to go into as much detail as somebody might hope. I know also that there are probably a lot of people in this audience who have had a lot of experience with this and may know more about some parts of it than I do, or maybe even all
01:40
parts of it than I do. So I'm certainly happy to hear other people's experiences and suggestions. This is just what I've experienced in my own work, mostly at EnterpriseDB, and a little bit from experiences that I had before joining EnterpriseDB. Because EnterpriseDB is a PostgreSQL support provider, people tend to come to us when they have serious problems, and database
02:06
Corruption is generally a serious problem. So the good thing about that is I actually reasonably often get a chance to See what's going on in cases where someone is having a problem with database corruption And so even though any individual PostgreSQL user probably isn't that likely to have corruption
02:27
Many of the people whose cases I get to hear about actually are people who have had a database corruption problem of one kind or another Which is what inspired me to write this talk. I
02:40
Think it's a little bit Difficult to give a great definition of database corruption, but I think I did my best to come up with something I think there are two main points that I want to emphasize here and the first one is that database corruption really has to do with
03:01
expectations if we store the number four into the database and Then later we try to retrieve the data from the database and instead of getting back the number four We get back the number five or seventy two or nineteen million then
03:21
for some reason our expectation has not been met what we Thought should happen and what actually did happen were different and You know, there could be a variety of reasons why that happens one is that our expectation might be mistaken We may be misunderstanding how the database is intended to function
03:43
a second possibility is that the database has a bug and a third possibility is that there is Database corruption and I think the way that we can distinguish between these scenarios is By saying that for it to be database corruption there has to have been a modification of the
04:06
contents of the database So that's why I defined it here by saying that the database Contents are altered in such a way that queries behave in a in an unexpected manner Relative to the SQL statements that have previously been executed
04:21
so You know if we insert some data and then we try to retrieve it back and there's nothing fancy going on like a trigger That should have modified the data or you know concurrent activity Which should have modified the data Then we really ought to get back the same data that we put into the database and if we don't That's corruption or if we get an error that's corruption or if the system outright crashes when it tries to read the data
04:46
That's corruption. And again, that's all assuming the problem is with the database contents rather than you know For example a bug in the database Generally corruption can happen in three ways
05:01
First of all, you might have bad hardware I think by far the most common example of this is a bad disk because probably many of us have Experienced the scenario where you put some data on the disk and then later when you go to try to access the data You don't get back what you stored or you get back errors That's pretty common
05:21
occasionally you also see bad memory where The memory that is used to store the data on a temporary basis on its way from your keyboard to the disk Actually changes the data, which it's not supposed to do It's supposed to remember the same data that you gave it And so you end up with something on disk that is not what you expected to end up with on disk
05:42
And that is also database corruption The second sort of way that corruption can happen is you can have bad software All of the software that's involved in making PostgreSQL work the way it's supposed to is very complicated You have not only PostgreSQL itself, but you have a file system
06:02
an operating system kernel a backup tool perhaps And maybe other things and all of these are complicated pieces of software and even a very simple piece of software Probably has some bugs and a more complicated piece of software is even more likely to have bugs
06:21
So that's definitely a possible cause of database corruption, and, like bad disks, it's something that happens pretty regularly. And finally, there's user error, which I have found to be a very common cause of corrupted databases, and one of the most common causes of this is faulty backup and recovery procedures: people who
06:44
Develop ways of backing up and restoring PostgreSQL databases that are just not safe that are not what is recommended by the documentation And that are not a good idea and in some ways this is probably partly the fault of our documentation which probably You know is not as good as it could be in terms of telling you what you really need to do in order to
07:06
End up with good backups and good restores, but there's also the problem that people don't necessarily Read that documentation. They don't necessarily follow what that documentation says so In any case
07:21
Very broadly speaking. Those are the possible causes of corruption hardware software and some kind of user error So, how do we detect corruption and how do we avoid corruption I've divided some best practices that I recommend into four areas backup and restore
07:41
configuration, storage, and finally administration. I'll be going through each one of those areas and saying a little bit about each. First of all, backup and restore: this is sort of the short version of the same backup talk that everyone gives when they talk about backup. The most important thing about backups is that you have to take backups; you have to take backups
08:06
regularly. If you do take a backup and then later you have database corruption, you can restore from your backup and get your data back and avoid having any kind of permanent data loss. But in addition to that, the very fact that you can take a backup means that all of your data is still readable
08:27
Something is still there that can be read it might not be the right thing it might not be the content that you were hoping to have but you have some content you have something in your database and That's a good thing to know
08:40
You should also make sure that you can restore your backups if you have backups that you've taken But you can't actually restore from those backups. They're not very useful to you at the end And you should not only go through the act of restoring them But make sure that the restored backups actually look okay that they contain the data that you expect them to contain that the data is accessible
09:02
And that everything generally looks like it's okay again If that should prove not to be the case Then your backup would not be very useful and all of the effort that you went into setting up your backup tool or your backup regime and All the storage space that you used for those backups would really be for nothing
09:23
And in order to do all of this well, I think a really good idea is to use a good, professionally written backup tool rather than a homegrown script. It's been my experience that many PostgreSQL users actually do backups by looking at the documentation
09:41
Following it more or less well and writing a script themselves that does the things that are listed in the documentation And that's not actually a very good idea in my experience because those scripts tend to be very simple and they tend to contain mistakes and sometimes they might be subtle mistakes like Not f-syncing files when you really should so that a concurrent operating system crash can cause a problem
10:04
But sometimes they're really obvious mistakes Many people seem to write homegrown backup scripts that contain no error checks whatsoever Or that do terribly unsafe things like removing the backup label file for no reason and those kinds of mistakes will definitely end in
10:20
database corruption If you use a professionally written backup tool, obviously That's no guarantee that there won't be bugs because as I said before pretty much all complex software contains bugs and backup tools definitely fall into the category of complex software, but Hopefully at least the person who wrote that software is more knowledgeable than you or as knowledgeable as you at least and
10:46
hopefully also, because that software is an important part of their business, they are going to be motivated to find those bugs and fix them. They're presumably going to have multiple customers who are using that same backup tool, and so it's more likely to be thoroughly debugged, and I think your chances are just better than if you write something yourself.
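As a rough illustration, a base backup taken with pg_basebackup, which ships with PostgreSQL, might look something like the sketch below; the host name, user, directory paths, and table name are placeholders, and tools like pgBackRest or Barman are other professionally maintained options.

    # Take a base backup from the primary, streaming the WAL needed to make it consistent
    # (host, user, and paths are hypothetical -- adjust for your environment)
    pg_basebackup -h db1.example.com -U backup_user \
        -D /backups/base-$(date +%Y%m%d) -X stream -c fast -P

    # Periodically prove the backup is restorable: copy it somewhere disposable,
    # start a throwaway instance on a spare port, and run a few sanity queries
    cp -a /backups/base-20200101 /tmp/restore-test
    pg_ctl -D /tmp/restore-test -o '-p 5499' start
    psql -p 5499 -d postgres -c 'SELECT count(*) FROM some_important_table;'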
11:05
Moving on to configuration: there are basically three settings in the PostgreSQL configuration file which you need to get right to avoid serious risks of data corruption, and two of them are pretty easy
11:26
fsync and full_page_writes: both of them need to be left turned on. There are use cases for turning each of them off, but they're pretty narrow, and I think it's actually really hard to get right and really easy to mess up
11:40
If you have fsync turned off and you have an operating-system-level crash, you are very likely to end up with a corrupted database. Basically, the only time you won't end up with a corrupted database in that situation is if the system had been idle for a considerable period of time before the crash. Sometimes people turn fsync off temporarily during their initial data loading
12:04
and then they don't actually turn it back on, or maybe they modify postgresql.conf so they think it's turned back on, but they actually didn't restart the database server, and so it's not really turned back on. Similarly with full_page_writes: some people who are running a copy-on-write file system,
12:23
for example, may think, well, my copy-on-write file system is going to prevent torn pages; it will either write the entire page or none of the page, so I don't need the protection that full_page_writes is documented to offer. In my experience that's generally not the case, and one of the big reasons is that PostgreSQL uses an 8kB block size
12:45
whereas the Linux kernel page cache uses a 4kB block size. So no matter what your file system does and no matter what your disks do, you are not going to get writes that are atomic in an increment larger than 4kB, and your PostgreSQL block size is going to be 8kB,
13:03
unless you are using a very unusual configuration that we really haven't tested very much and that isn't widely used by anybody. So in practice, you just need these things to be on. The wal_sync_method configuration parameter is also pretty important, and
13:22
it's a little harder to set properly, because unlike the previous two examples, the default isn't necessarily safe, and there are a lot more choices than just on and off. On macOS the default value is fsync, but in order to make it safe, you actually need to use the fsync_writethrough method.
13:43
And on Windows, I don't actually have much personal experience, but what I have been told is that you need to use either fsync or fsync_writethrough, or disable write caching on your drive. Generally a good practice here is to run the pg_test_fsync utility, which is included
14:06
with the PostgreSQL distribution, and see whether the method that you've chosen is much, much faster than some other method. If it's a few percent faster, maybe even ten percent faster, that's probably fine. If it's ten or a hundred or a thousand times faster than some other method,
14:24
it's likely that the wal_sync_method that you've chosen doesn't really work on your platform and doesn't really guarantee that the data is durably on disk, and then there is a risk of database corruption in the event of an operating system crash. This is kind of a disappointing area; you would really hope that
14:43
Hardware and software would be designed in such a way that if you said I need this data to be durably on disk It would definitely end up durably on disk. But in practice that's not really how things work. So You really have to be careful about this and make sure that you've got a safe setting
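To make that concrete, here is a minimal sketch of what the relevant postgresql.conf lines and a pg_test_fsync run might look like; the wal_sync_method value shown is just the common Linux default, and the right choice depends on your platform.

    # postgresql.conf -- the three corruption-related settings discussed above
    fsync = on                      # default; do not turn this off on production data
    full_page_writes = on           # default; protects against torn pages
    wal_sync_method = fdatasync     # Linux default; fsync_writethrough is the safe
                                    # choice on macOS and, reportedly, on Windows

    # Compare the candidate methods on your own hardware; a method that is orders of
    # magnitude faster than the others probably is not actually flushing to disk
    pg_test_fsync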
15:03
Another thing that you can do apart from PostgreSQL.conf is run with checksums enabled These are an optional feature and there is some small performance regression It's probably just a couple percent in many workloads But there are times when it can be 10% or even more depending on your workload
15:20
So if you're concerned about the performance impact you should test If your primary concern is corruption detection, then running with checksums enabled is a great idea It can't prevent your database from becoming corrupted, but it does make it a lot more likely That you will notice that you have corruption because it means that every time we read a checksum page from the disk
15:45
the checksum is going to be verified, and if it doesn't verify you will get an error, and if you are paying attention to your logs, you can notice that error and be aware that something has gone wrong. So to use this option you can run initdb with the -k flag when you first create your database cluster, or,
16:03
starting in version 12, you can shut down the database and use pg_checksums with the -e flag to enable checksums and then start the database up again. There is not currently an option for enabling checksums while the database is running.
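For example, the two ways of enabling data checksums just described look roughly like this; the data directory path is a placeholder.

    # At cluster creation time
    initdb -k -D /var/lib/postgresql/data

    # Or, on an existing cluster from version 12 onward, with the server cleanly shut down
    pg_ctl -D /var/lib/postgresql/data stop
    pg_checksums --enable -D /var/lib/postgresql/data
    pg_ctl -D /var/lib/postgresql/data start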
16:22
With regard to storage the best advice that I can give is that you want your storage stack to be as simple as possible and I think this is something that's becoming increasingly challenging for many people and I'm not quite sure what to do about that, but Nevertheless simpler things seem to be more reliable and have fewer problems than more complicated things
16:43
For example, if you use local storage rather than a network file system There's just a lot fewer components involved You have one server involved instead of two and you have a drive controller involved and you have a drive and a file system but you don't have you know, any network interface cards involved for example, and you don't have a
17:04
Switch or a router that's routing packets between two machines and could potentially go wrong And there's generally less to configure as well. So if you can use simple local storage My experience has been that that is a lot more likely to be reliable and to not cause you any problems
17:24
It may certainly be possible to set up things like NFS or iSCSI reliably, but I've seen a lot of setups that seem to be unreliable, and so I've become quite skeptical about those. I think one of the issues is that many people don't really consider very carefully the options that they use for things like NFS setups, and
17:44
There certainly seem to be things that can be done on the NFS side to improve your chances for example NFS v4 I think It tends to be better than earlier versions of NFS You want a hard mount for sure not a soft mount
18:00
So you want to set the hard option, and you want synchronous operation rather than asynchronous operation, for exactly the same reasons that you want to pick a wal_sync_method that's actually reliable and really makes a guarantee that your data is on disk. PostgreSQL really relies on the data being on disk when the system tells it that it's on disk.
18:21
So if it's not on disk yet, PostgreSQL really needs to know that, and it really is very, very important that the operating system doesn't tell it that the data has been written down to the physical disk until that's actually the case. And so in the case of NFS, what that means is you need the sync option, and from what I can tell, it appears that
18:42
you actually need to set that option in two places. You need to set it on the NFS client, that is, the server which is mounting the remote disk, but it seems that you also need to set it in the export on the server. So the NFS server, which is providing the file system as a remote file system,
19:00
also needs to have the sync option configured on that side, in the /etc/exports file or some equivalent, in order to make sure that both the client and the server are completely clear that this is supposed to be a totally synchronous mount. Obviously, there are performance consequences of that, but the positive consequence is that you might be less likely to end up with a corrupted database.
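As a sketch only, and assuming a Linux client and server, the sort of configuration described here might look like this; the host names and paths are placeholders, and you should verify the exact options against your own NFS documentation.

    # On the NFS client (the machine running PostgreSQL), in /etc/fstab:
    nfsserver.example.com:/export/pgdata  /var/lib/postgresql  nfs4  hard,sync  0  0

    # On the NFS server, in /etc/exports:
    /export/pgdata  dbclient.example.com(rw,sync,no_subtree_check)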
19:32
RAID can also help, but it's important not to rely on it too much. The advantage of something like RAID 10 is that for every disk you have, whatever its contents are supposed to be, you have another disk which is expected to have the exact same
19:45
Contents and that means that if something goes wrong with one of your disks and it's either not Accessible at all or the contents seem to have gotten scrambled You have another disk that you can go look at and maybe that one is okay and you can get your data back You know in some cases you can just rip the bad disk right out of the machine and
20:04
Continue using the other copy of your data and be right back in in business or at least somewhat more back in business And that's a really nice place to be but of course you still need backups. It's entirely possible for Both of your disks to either fail or for both of them to get corrupted in the same way. I
20:25
Recommend a lot of caution when choosing a file system I think there is a lot to be said for using file systems which have a good reputation for reliability and which are very widely used I believe that ext4 and XFS are the only Linux file systems where I don't know of a case
20:45
Where somebody had a corruption problem that could be directly traced back to some behavior of the file system I think every other file system that I've seen somebody use at least on Linux Has had that problem
21:00
NFS definitely I know I'm about to make somebody in the audience very upset But my experiences with ZFS have been pretty negative I've seen multiple customers who had very serious problems That seemed like they could not be explained by anything other than horrible bugs in ZFS
21:21
So I can't recommend that even though I know it has cool features I think not losing your data is one of the coolest features that you can have and Ext4 and XFS both seem to do very well there Ext3 was very poor in terms of reliability and ext4 seems to have made a really large improvement and that's really good to see so I
21:47
would recommend sticking with those. Obviously other people may have different experiences, so I can just comment on what I've seen. The other thing that's really important in regard to storage is that you need to monitor both your PostgreSQL logs and your operating system logs for storage-related errors. After all, if
22:07
you have storage related errors, and you're not looking at the logs that the problem is just going to get worse and worse and worse and Eventually, you're probably going to find out about it at a really bad time It is really surprising how many people just kind of assumed that all of this stuff is going to work
22:25
And it's not going to have any problems And I guess the reason why we assume that is because we all know that computers in 2020 are pretty reliable and most of the time they do work But you really want to know if you've got a problem if the operating system reports the problem to PostgreSQL
22:43
then PostgreSQL is going to log an appropriate error message in the log file, assuming that you have your logging set up properly. And you want to notice that: if you see I/O errors, for example, starting to show up in the PostgreSQL logs, that's something where you're going to want to take corrective action as quickly as possible.
23:02
But there are also some times when nothing gets reported to PostgreSQL. PostgreSQL gets told by the operating system that the operation succeeded, but actually there were signs of trouble, and sometimes that's because, from the point of view of the storage system, nothing has actually failed yet. Maybe, for example, the disk is
23:24
relocating your data from a sector that seems to be going bad to some other part of the disk that's still good. So in a case like that there might not be an error that PostgreSQL can report, but there might still be something in your kernel log that tells you that that happened, and that's the kind of thing that you probably want to know about.
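A very simple way to start paying attention to this, sketched below with assumed log locations that vary by distribution and by your logging configuration, is just to search both logs for storage-related complaints on a schedule.

    # PostgreSQL server log: look for I/O, checksum, and fsync complaints
    grep -iE 'could not (read|write|fsync|open)|checksum|invalid page' \
        /var/log/postgresql/*.log

    # Kernel log: look for the storage layer quietly remapping or failing sectors
    dmesg | grep -iE 'i/o error|medium error|remapp|reset'
    journalctl -k | grep -iE 'i/o error|sector'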
23:41
Finally, in terms of best practices, there's the whole topic of how you administer your database. The important thing here, and I feel a little bit silly saying this, is that it's really
24:04
And I'm not talking about text configuration files You know there are a few text configuration files in the PostgreSQL data directory Which of course it's perfectly fine to modify those files with a text editor But the rest of the data files are not designed to be manually modified or removed
24:20
And if you do modify them or remove them, you may end up with database corruption; in fact, it's quite likely. We see people, for example, look at pg_xlog or pg_clog, renamed in newer releases to pg_wal and pg_xact, and they say, well, there are a lot of files in here and some of them are old, so I'm going to remove the old ones, and that
24:43
That typically ends very badly for those customers or those users Because PostgreSQL intends that it should manage those files that it should decide to remove them And if you remove them before it thinks that they're ready to be removed, then you end up with a lot of problems I gave a version of this same talk at PostgreSQL conference in India back in February and
25:07
Many people came up to me afterward and asked questions and one of the most common questions was essentially Hey when you said that we shouldn't modify the data files in any way Did did you really mean it because here's what I'm doing and all of those people then went on to describe things that were not
25:24
very safe And I later wrote a blog post which I've got a link to it here on the slide And there's a few more comments on the blog blog post Proposing other things that people think that maybe they should be doing in terms of manual modification of the database directory and
25:41
Generally, that's a really bad idea It's really really easy to corrupt your data that way The next couple of points on this slide are actually just variants on the same theme When you think about it antivirus software is kind of a crazy idea Antivirus software runs around your machine and looks for files that it thinks are dangerous
26:03
And sometimes it removes them otherwise known as quarantining them and sometimes it Modifies them otherwise known as removing a virus from the file from its perspective And that's pretty crazy. Right? You've got this software that's running around looking at every file on your machine and making
26:22
modifications to the files according to some algorithm that it thinks is good, without having any real idea of how those files were intended to be used. Needless to say, this can mess PostgreSQL up. It is a good idea, if you absolutely have to run antivirus software on your PostgreSQL server, to at least
26:42
Exclude the PostgreSQL data directory, but I recommend not running it on your database server at all Because it's not really going to protect you against anything. It's just going to corrupt your database Sometimes that kind of software Doesn't really totally respect the option saying that the PostgreSQL data directory should be excluded and it just goes and modifies things
27:03
anyway, which is not good. Do not remove postmaster.pid. This file is, to some extent, a text file; I mean, it contains, you know, four or five lines of text, and sometimes people feel that it's okay to remove it, but generally what that lets you do is get two copies of the postmaster to run at the same time on the same data directory,
27:25
which is incredibly bad and will almost definitely corrupt your data. Even if you're only reading the data with the database, there still may be file modifications going on at the physical level, and you really cannot afford to have two copies of the database doing that at the same time.
27:47
Also consider performing pull-the-plug testing: you know, if you have a database server and it's not in active production use, you can just run your workload against it, or a simulated workload that is similar to your actual workload, and then rip the plug out,
28:03
Start it back up again. See if everything looks okay. That's a great idea. It's a great way to find out whether you have problems It doesn't guarantee that you don't have problems But if you do it several times and everything is okay every time That's at least some kind of an indicator that things aren't too bad if you do it once and everything breaks
28:23
well, then, you know, you've probably got a fairly serious problem. So if you do get corruption, what does it look like? Well, typically what it looks like is you get error messages in the log files. It isn't the case that every instance of database corruption causes errors, but it's pretty common.
28:46
sometimes you might just get different content out of the database, but remember that postgresql data files are You know, they have a structure to them. There's a page format. There's a tuple format There's hidden metadata that you as a user don't see but which the database system uses for its own purposes
29:05
And so a lot of times if you just corrupt some random portion of a database page You might get some kind of error when you attempt to access the data that's stored in that page So errors are pretty common result of database corruption unfortunately, the range of errors that is possible here is very wide and there's no simple rule to determine whether a
29:27
Particular error that you might see is the result of database corruption or whether it's the result of something else like a bug or even the result of something that your application did but generally what you want to be on the lookout for is
29:45
errors that seem to be complaining about things that are internal to the database system rather than user-facing things. Here are a few examples that I've seen working here at EnterpriseDB. The first one says that it could not access the status of some transaction, and
30:04
the reason that is given for why it couldn't do that is that it couldn't open some file, pg_xact/0003, because there's no such file or directory. Now, we don't really need to know what that file is or what it does;
30:23
In fact the fact that we don't necessarily know that is a good sign that this message may be indicative of corruption because it's not complaining about Something that we did right like if you in try to insert a row into a database which would violate a foreign key constraint and
30:45
You get an error message saying hey that row would violate the foreign key constraint There's a clear relationship between the SQL statement that you attempted to execute and the error that you got and the problem Is not with the database itself in that case The problem is with the SQL statement and the way that it conflicts with the foreign key constraints that are in place
31:06
But what we have here is a message that's complaining about something that has no direct relationship with anything we did. We may know that we execute read-write transactions against the database, but we don't know what numbers the system uses to identify them internally. So, you know, transaction 3,881,522:
31:27
that's not a number that we chose; that's a number that the system chose, and we didn't create that file pg_xact/0003. The system decided to create that file, and it decided that that file needed to be accessed,
31:43
And now it's telling us that that file is not there. So something has gone badly wrong And all of the examples on this slide are really of that same nature The second one is complaining about not being able to read a block that it expected to be there Again, a block is something that's internal to the system. It's not a row. It's a block
32:06
So the fact that it isn't found must be Because the system made a mistake or someone changed things in a way that the system wasn't expecting Failing to refind a parent key in an index Well, we create indexes, but we don't know what parent keys are as a user
32:23
I mean, if you happen to be a developer, as you're watching this video you may know exactly what this means, but a user might not, and there's no reason they should. So the fact that they're getting an error about it likely means that the database is corrupted. Cache lookup failed is an extremely common kind of message that you see when your system catalogs
32:45
Get corrupted and so forth and so on all of these are messages complaining about Internal things to the database that the user shouldn't need to know about But the system is now complaining about because somehow they've gotten messed up
33:01
Sometimes you don't get an error those cases can be really hard to troubleshoot There are cases where corruption causes a database operation to go into an infinite loop for example an index that is intended to put all of the data in order might contain a circularity and some operation might might just go around and around in a loop forever or
33:23
You know, you might have a crash or something like that that's caused by corruption and those cases are pretty hard to troubleshoot But I think there are a lot less common than these cases that just produce errors So what happens if our database does become corrupted and we know it's corrupted because we saw a funny error message
33:43
Or some other symptom of corruption and now we want to recover our data well The first thing that we really ought to do is try not to recover the data from the damaged database Try to get it back in some other way So for example, if we've got a master and a standby and the standby becomes corrupted just rebuild the standby
34:05
Don't worry about The fact that it became corrupted don't try to fix the corruption. Just throw it out and start over Similarly, if you've got a problem with a master and a standby and the problem is only on the master Maybe fail over to the standby and then later you can rebuild the master
34:23
At a separate time or maybe even if the problem is on both the master and the standby Perhaps you have a backup that you can recover from rather than proceeding with your damaged database or Another thing that's actually quite common is someone may be using PostgreSQL for reporting purposes
34:43
But actually replicating the data into PostgreSQL from some other source and in that case It may be possible to just throw away the whole database cluster and rebuild the data from that external source and that's a great option because it avoids having to repair a damaged database, which is a Pretty difficult and somewhat risky thing to do and you can't really be certain how
35:05
Well, you're going to succeed in getting your data back. So You want to avoid it whenever you can However, that's not always possible Sometimes you really have no Realistic option other than trying to recover the data from the corrupted database
35:22
It may be that the corruption happened a long time ago and you didn't detect it So by the time you do detect it it's present on your master It's present on your standbys and it's present in all of your backups and if it's not coming from some other source then all of your copies are corrupted and they're all corrupted in the same way and
35:41
There's really no help for it. But to try to do the best that you can With the corrupted database in that situation So what should you do if that happens? Well, I'm going to tell you what I recommend Some techniques with which I've had some success But I do want to offer a disclaimer that this is something that should be done with extreme caution
36:02
You should consider hiring an expert to do it for you rather than trying to do it yourself And if you do try to do it yourself Please keep in mind that the advice I'm about to give you is based on my experiences And I hope it will be useful to you, but it is not guaranteed in any way whatsoever So if you try to do this you do it at your own risk and it's entirely possible that you may
36:26
lose or further corrupt your data. It is also entirely possible that the advice I'm about to give you is entirely wrong, or at least wrong for your particular situation, and is actually a terrible idea, and I disclaim responsibility for all of that.
36:47
Basically, what I recommend if you try to do this is a two-part approach, and step one is to make sure that you have a complete backup of the database, and what I mean here is a physical copy of all of your files, or as many of your files as you can access.
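In practice, step one can be as simple as the sketch below, with the paths being placeholders; the key point is that the copy is taken at the file level and, ideally, with the server stopped.

    # Stop the damaged cluster if it is still running, then copy the whole data directory
    pg_ctl -D /var/lib/postgresql/data stop -m fast
    cp -a /var/lib/postgresql/data /safekeeping/data-before-repair-$(date +%Y%m%d)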
37:04
The important thing to understand here is that when you try to recover data you are going to be doing things that may make the problem worse and So if you have made a copy of all of the files before you do any of that you will at least be able to get back to the state that you were in when you first decided to try to recover your data in this way and
37:27
That's good. That means you can try again, or at least not end up any worse off than you were. Having a copy is a really, really important step. Once you've done that, the second step is to try to use pg_dump to back up the contents of the database and then restore those contents into a new database created by a new initdb.
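In the simplest case, that second step might look roughly like this; the port numbers and paths are placeholders, and on a badly damaged cluster the dump step is exactly where things tend to fail, as discussed below.

    # Dump everything from the damaged cluster (here assumed to be running on port 5432)
    pg_dumpall -p 5432 -f /safekeeping/full_dump.sql

    # Create a brand-new cluster and restore into it on a different port
    initdb -D /var/lib/postgresql/newdata
    pg_ctl -D /var/lib/postgresql/newdata -o '-p 5433' start
    psql -p 5433 -d postgres -f /safekeeping/full_dump.sql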
37:47
The reason why you want to proceed in this way is that if you just hack on the existing corrupted database until it seems to run again,
38:01
You may get it running and it may seem like things are okay But you may have lurking problems that you don't find out about until much later So that's kind of a scary prospect and by dumping and restoring everything into a new database You know, you you may have already had some corruption that occurs But at least you know that any hidden metadata or internal structures to the database have been totally
38:25
reinitialized and you're not You're not going to have trouble with that kind of thing down the line at least not as a result of the corruption that you've already experienced naturally there could be Future Corruption events as well, especially if you don't find out why the corruption that you already had happened in the first place
38:43
But I think it's moving in a good direction Sometimes this process goes smoothly this dump and restore process goes smoothly But pretty often it doesn't and there are a couple of different ways that it can go wrong One way that it can go wrong is you can find that actually the database just doesn't start
39:02
Your corrupted database you try to start it up and you cannot start it And this can happen for a variety of reasons for example It may be that all of your wall files were lost or corrupted and you can't get them back from anywhere And without them you can't start the database
39:22
So, what do you do? Well, there is a tool called pg_resetwal, or pg_resetxlog in older releases, that can often allow a corrupted database to start, and the important word in that sentence is start. It's really important to understand that pg_resetwal does not fix the corruption in your database.
39:43
In fact, in some sense, it makes it worse; it's just trying to hit it with a hammer hard enough that the database will start up. And if you have the kind of problem that pg_resetwal can fix, it usually will fix it; it's a pretty reliable tool at what it does.
40:01
It usually will make a database with this kind of problem start. There are exceptions; certainly there are cases where it's not going to work. But if you're missing WAL files or you're missing pg_control, this tool has a pretty good chance of making the database start up again.
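If you do resort to it, a cautious way to invoke it is sketched below; -n is a dry run that only prints what would be changed, and the data directory path is a placeholder. Again, only do this after taking the file-level copy described earlier.

    # Dry run: show the values pg_resetwal would write, without changing anything
    pg_resetwal -n -D /var/lib/postgresql/data

    # Actually reset the WAL so the server can start (the data may still be corrupt!)
    pg_resetwal -D /var/lib/postgresql/data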
40:21
That then gives you the ability to run pg_dump, perhaps. If you have a database that's so badly corrupted that there's no hope of starting it whatsoever, then rather than trying to proceed in the way that I mentioned before, where you try to get it up and run pg_dump,
40:42
another option is to use the pg_filedump tool. This used to be maintained by Tom Lane when he was at Red Hat and is now maintained by some other folks, and newer versions of it have an option that can take a raw PostgreSQL data file and extract your data from it, with a little help from you to tell it things about data types.
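As a hedged illustration, decoding rows straight out of a heap file with a recent pg_filedump looks something like this; the file path and the column type list are placeholders that you have to work out for your own table.

    # -D tells pg_filedump the column types so it can decode tuples into readable values
    pg_filedump -D int,text,timestamp \
        /var/lib/postgresql/data/base/16384/16385 > recovered_rows.txt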
41:01
So if you have a database where you're missing, maybe, a lot of files, and there's no hope of getting the thing to start up, then that tool may be helpful to you. If you get the database to start, the next problem that you might have is that
41:21
running pg_dump might fail, and that's because pg_dump has sanity checks inside of it, and your database is corrupted, so it's probably in some way insane, and so it's possible that you're not going to be able to dump it. One way that you can get around this problem is just drop the stuff that you can't dump:
41:41
if this is an index, it really isn't costing you anything because you don't need the index in order to Dump out the contents of the table or if it's a temporary table that was left behind by a system crash. You could just drop it Sometimes you can drop other objects that you don't really need Maybe you have a stored procedure, but you know what it did so you can just drop it and recreate it in some way
42:06
by some other means and You know Sometimes you may even decide that the contents of a table are not that important and you're just going to blow the whole Table away in the hopes of dumping the rest of the database
42:21
So all of these are techniques that you can try to use to make pg_dump succeed. There are other things that you can do too, for example manual system catalog modifications to try to fix problems that you may find there. Sometimes you can even try to just dump part of a table, if a dump of the whole table fails. Maybe you can use a WHERE clause, perhaps based on the hidden ctid column, to extract just
42:47
some of the blocks or some of the rows in the table, rather than dumping out the whole thing. You can't do that part with pg_dump, but you can write your own SELECT queries.
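A sketch of that block-range approach is below; the table and file names are placeholders, and the range comparisons on ctid shown here need a reasonably recent PostgreSQL release, so on older servers you may have to iterate over blocks some other way.

    # Copy out rows a range of blocks at a time, skipping ranges that error out
    psql -d mydb -c \
      "COPY (SELECT * FROM damaged_table WHERE ctid >= '(0,1)'::tid AND ctid < '(1000,1)'::tid) TO STDOUT" \
      > part_0000.copy
    psql -d mydb -c \
      "COPY (SELECT * FROM damaged_table WHERE ctid >= '(1000,1)'::tid AND ctid < '(2000,1)'::tid) TO STDOUT" \
      > part_1000.copy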
43:04
Sometimes catalog corruption, which is a very common cause of this problem, can be effectively diagnosed by using a tool that EnterpriseDB publishes called pg_catcheck. I designed this tool and I was involved in the development of this tool along with several of my colleagues. I find it pretty useful; your mileage may vary,
43:20
but it goes through and does a whole bunch of sanity checks on your system catalogs and tells you about the problems that it finds, and if you fix those problems, then you will probably be able to run pg_dump, if the problem was catalog corruption. So I think that's pretty useful.
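Running it is meant to be as simple as pointing it at the database with ordinary connection options, something like the sketch below; the connection details and database name are placeholders.

    # Check the system catalogs of one database and report inconsistencies
    pg_catcheck -h localhost -p 5432 -U postgres mydb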
43:43
So, okay, let's suppose that you get the database to start, and you manage to use pg_dump or a raw SELECT query or a COPY command or something like that to get your data out. At this point you may think that you're in pretty good shape, because all you need to do is restore your data, but it's actually still possible to have failures on that side, too.
44:04
basically, the reason why this happens is because at this point, you know that you don't have any problems with any You know hidden database stuff You've converted everything to text format. So there's nothing database There's nothing of the database internals that can get in your way at this point
44:23
But your data can still be logically inconsistent. It might be that whatever kind of database corruption you had resulted in a foreign key violation or a unique constraint violation or something like that which the original database failed to detect as a result of the corruption
44:41
But when you try to restore it into a clean database It does detect it and you get some kind of an error And in this case a human being needs to decide what to do according to the relevant business logic If you have a duplicate record, for example It might be right to drop one of them or merge the two of them or something like that But there's no general formula a human being needs to figure it out
45:03
If you get through all that you're probably in pretty good shape You may not have recovered all of your data but you probably have you know as much of it as could easily be extracted from your original database and Because you dumped and restored into a new database you've gotten rid of any internal problems that the old database had hopefully
45:27
and That's usually a pretty good start Of course, you always want to go back as well and try to figure out why the corruption happened in the first place Otherwise, you may just keep having the same problem, but at least it gets you back on your feet for the moment
45:41
That's all the slides I have and that's all the time we have So thank you very much. If you're listening to this talk if you listen to the whole thing that's really great and I appreciate it a lot and Hopefully we'll have an opportunity to do some online questions and I'm looking forward to that Thanks a lot and we're live with Robert to give some questions and your answers. Go ahead Robert
46:08
Okay, so just looking through the chat here, the first question that I see is from Ammit, who asks: hi, I saw that the size of the pg_control file is zero on a Postgres server.
46:22
What might have caused this? By the way, no disk was full. The short answer is I don't know. I think it's really hard to troubleshoot a problem like this based on just, you know, the amount of information that somebody can provide in a chat. I really have no idea. I mean, it could be a system crash
46:43
after the file was created but before the file was fsynced. Maybe fsync was turned off and a crash happened. Maybe somebody manually zeroed out the file. I mean, I get this kind of question a lot, and I always kind of wonder what people are expecting me to be able to say as an answer, because
47:02
You know in a sense it involves Looking into the past right? How did how did it get that way? And that's often a pretty difficult question to answer. So I really don't have a specific answer for you, but You know, you just have to sort of try to investigate what happened on that on the system
47:22
You know, before the point when you observed that problem. The next question I see here is: is there a way to tell how many rows of a table are corrupted and will be zeroed out if you use zero_damaged_pages = on and vacuum and reindex the table? I think the short answer is no.
47:41
There's really no easy way that I know of to tell how many rows of a table are corrupted. I have had very bad experiences with zero_damaged_pages, for precisely the reason that you allude to in this question, which is that if you turn zero_damaged_pages = on and you start doing stuff, you don't know how many pages it's going to zero, and it might be a lot.
48:03
Um, and the time or two that i've tried to use this it was a lot and nothing good happened after that Um, so I uh, I don't recommend that as a corruption recovery strategy I actually kind of think we should consider just taking that option out completely because I think it's Just way too dangerous, uh to have something like that that just erases your data in a fairly uncontrolled fashion
48:26
Uh, but opinions may vary on that topic. Um: by not using homegrown scripts for backups, I use pg_dump within a script, but I don't think you mean not to script pg_dump; what homegrown scripts do you mean?
48:40
Yeah, I'm not really talking about pg_dump here. That question was from dbl, by the way. I don't really mean pg_dump. I think people do script pg_dump, and that's probably mostly okay. Uh, maybe you want to try to make sure that your dump gets fsynced or that you have some check that it's completely written, but generally I don't think that's a terrible idea. The place where people get into trouble is when they're backing up the database,
49:05
you know, using the hot backup method, where they're calling pg_start_backup and then they're copying the data files and then they're calling pg_stop_backup. Um, and I've just seen so many horrible ways of doing that wrong; it's really terrible.
49:23
Um, I also see another question from amanette here, who asks: how can I recover pg_control? Um, I mean, there's no magic answer to that, right? Like, if you lose a file, any file, there's no magic wand that you can wave that will get you back
49:40
Uh The file exactly as it was, uh, because there's just there's no undelete, uh functionality available Um, there are a couple of things you could do Uh, you could go to a backup and you could try to get pg control from your backup But of course if things have happened since then The contents, uh from the backup might not be very sane compared to the current state of the system and that's a very serious danger
50:06
uh, for that file. Um, also, uh, you know, another thing you can do is you could run pg_resetwal, but as I said in the main body of the talk, that will create a new pg_control file, and it has a good chance of making the database start,
50:23
But it in no way means that the database is not corrupted anymore. The database is Definitely corrupted and and part of the thing that I hope people will take away from this talk is that you can't really undo it Uh, you know Once something has become corrupted
50:41
Like there's no way to go backwards from a corrupted state to an uncorrupted state and just have whatever got damaged Not be damaged anymore because that would basically require some sort of magic that we don't have um so it's very important to make sure that you don't get into this situation in the first place and that if you do get Into this situation you have the kinds of things that I described in the talk to help you
51:03
recover, like backups, standbys, and so on. The next question here is from azim, who asks: why is pg_catcheck not in core or contrib, and did you consider contributing it? Um, uh, we did consider contributing it at the time. I don't remember
51:21
exactly why, uh, we didn't submit that to PostgreSQL. I suppose we still could, uh. If I get a bunch of feedback from people who would be likely to comment in a -hackers discussion and they all say, hey, yes, please submit that, I'll certainly talk to EDB management and see if they're up for that, and they might well be up for that,
51:40
um, but Um, I suppose one of the nice things about having it, uh in a separate place Is that uh, we don't have to argue about whether it's a good idea or not Uh, so, um, yeah Uh, let me see if there's anything else here, uh, what was your bad experience with zfs or zfs on linux with os native
52:02
uh zfs I I don't want to share specific details, uh of of customer experiences because uh If the customers found out that I did that they might not like it very much um, but I I think you know in one case in general terms what I can say is that
52:22
uh the system just Like nothing was working right on that system and I don't really know why so that case is a little ambiguous I I don't think that was zfs on linux. I think it was on Solaris, I think the other case I ran across Was also on solaris and in that second case, but i'm not sure it might have been linux in that second case
52:45
It was definitely the case That the corruption was from zfs because it involved Changing certain zfs options and depending on whether you changed those zfs options or not Uh, you got corruption or not and the kind of corruption that you got
53:02
Again without disclosing any details from the specific customer was something that I personally believe There's literally no way that was a postgresql behavior. Like it was way insaner than that Um, that's all the questions, uh, I see in the uh, the irc channel so, uh,
53:23
Thanks a lot. And I hope you got something useful out of the talk. Thanks