TIB AV-Portal

Corruption Detection and Containment


Formal Metadata

Corruption Detection and Containment
Alternative Title
Survey - Error: invalid page header in block 123 of relation "foo"
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place
Ottawa, Canada

Content Metadata

Abstract
This will not be the most exciting talk, nor is there (currently) a simple answer to make hardware corruption problems go away. But it's important -- without being careful, it's easy for corruption to spread to replicas and backups, leaving data hopelessly lost. Or, a strange crash due to corruption could take many engineering resources to analyze. This talk is about the kinds of hardware corruption that can and do happen, and the ways to detect and contain the corruption as quickly as possible. Additionally, we'll discuss a roadmap of improvements to PostgreSQL to make this an easier process, as well as alternatives (such as detecting corruption in the filesystem). Note: some storage systems do provide strong protections against data corruption. This talk is primarily (though not exclusively) targeted at users of the local filesystem, particularly on Linux. These are the topics that will be addressed:
- Why not deal with this in the filesystem?
- The different kinds of corruption
- When to detect the corruption, and how to contain it
- Data page checksums
- Backups and corruption
- Replication and corruption
- Background and offline detection
- More work to be done
So first of all, a couple of quick survey questions. How many people have seen this error message — "invalid page header"? OK, yes — that's usually the start of a bad day. And how many saw it on a production system? I'm guessing that's the same set of hands, because you never seem to see this in testing, only in production. For which of you did that lead to permanent data loss? A couple. And of those, how many had a backup or replica, and it still led to permanent data loss? OK — so not everyone was as fortunate; no permanent data loss for the people in the back. One thing I'll come back to throughout this talk is that sometimes you end up in a situation where the same block is bad on the primary and in the backup, so you think you can go to the backup, but in reality you can't.
The first question we need to answer is: who should we blame? When you think there's some kind of corruption, there are a lot of different components involved, and we'd like to figure out what the responsibility of each component is — which one is failing, and which components should be checking the others. The first reaction, of course, is to blame the hardware: the RAID controller, the storage system, or the firmware on one of those devices. That's a totally reasonable place to start, but unfortunately it doesn't get you very far, because you don't have a lot of control there. So we won't be talking much about what you can do at that level — most of you have a pretty good idea of which options are more reliable than others. But I think it's important to always check at a higher level anyway, even if you think your storage is reliable. I'll go into that in a little more detail.
What we will be talking about is, first, PostgreSQL itself and what it has to offer, and then how you construct your replication and backup strategies to deal with this problem. The real answer, of course, is that all of the above could be blamed.
Let me go on a quick digression about file systems. A file system seems like a natural place to do corruption checking — to make sure everything is OK before data makes it into a backup or a replica. But this talk, as I mentioned in the abstract, is targeted at people who are not necessarily able to use a SAN or some other storage system that does all this checking for them — people on ordinary local storage on a Linux box or something like that. You could look into ZFS and btrfs: both offer checksums and validation of those checksums, and that could be a reasonable choice. The problem is that neither of these really is a reasonable choice for many people.
Both btrfs and ZFS are either experimental or not widely deployed on Linux. Let me do a quick survey: how many people have evaluated btrfs or ZFS? I only see a couple of hands. And do you use them in production under a database? I see no hands. So perhaps the people who read the abstract of this talk are mostly people who couldn't use that option — it's just not viable for many people. Both of those file systems are copy-on-write, and there's actually a technical reason — it gets a little complicated, and we can discuss it more at the end of the talk if there's time — why copy-on-write and checksums go together. If you're not doing copy-on-write somewhere in the file system or the storage layer, it is difficult to get those checksums right without introducing performance problems or false alarms.
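To make the copy-on-write point concrete, here is a toy Python model — entirely hypothetical, not how any real file system is implemented. With in-place updates, a crash between writing a block's data and writing its checksum leaves a block that merely *looks* corrupt; with copy-on-write, the new data and its checksum are published together by one atomic pointer switch, or not at all.

```python
import zlib

def crc(b: bytes) -> int:
    return zlib.crc32(b)

# In-place update: the crash lands between the data write and the checksum
# write, leaving a block that looks corrupt even though no hardware failed
# -- a false alarm the checking layer would have to tolerate.
block = {"data": b"old", "sum": crc(b"old")}
block["data"] = b"new"                     # ...crash here, before:
# block["sum"] = crc(b"new")               # this line never runs
in_place_consistent = crc(block["data"]) == block["sum"]

# Copy-on-write: the new data and its checksum are assembled off to the
# side, and a single pointer assignment publishes both -- or neither.
old = {"data": b"old", "sum": crc(b"old")}
new = {"data": b"new", "sum": crc(b"new")}
ptr = old                                  # crash before the pointer flip:
cow_consistent = crc(ptr["data"]) == ptr["sum"]

print(in_place_consistent, cow_consistent)  # False True
```

The in-place version ends up internally inconsistent after the simulated crash, while the copy-on-write version stays verifiable at every point.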
It also turns out that copy-on-write file systems are not particularly good for database applications, at least for many workloads. I do know some people who actually prefer, say, ZFS for their database workload, but it has significant disadvantages for many workloads: the copy-on-write semantics lead to fragmentation of the files, so what you think is a sequential read is actually much more random. Just based on that, it's unlikely these file systems are ready now, and even if you wait for them to stabilize and be optimized, they may never be what you want for your database workload. So it's not a great solution.
so that that's kind of 2 to set file systems aside from the rest of the talk and if there's anyone have any questions about processes that would have a lot of it so you know there's a whole yeah the the point was that the file systems also the object the in-memory cache and so that leaves an additional vulnerability often that memory is very large so that there is a greater chance of some area happening there if you have a rampant memory problems that will likely result in crashes instruction regardless but if you have very rare memory problems and then that could be a problem that is that it might not be otherwise I with using a file systems to detect the checksum to detect the structure so as far as
As far as the goals here: some of what we want to accomplish is to detect and contain the corruption. Let me bring up the slide. We'd like to reliably detect corruption as early as we can; we'd like to avoid propagating corruption once we've found it — if you don't know the corruption is there, it will likely propagate — and then we want to have a recovery plan based on this information. All of these are
related: if you have corruption and it goes undetected, it's almost certainly going to make it to the replica or the backup, in which case the corruption is now everywhere. You can go a long time, potentially rotating through and ruining all your backups, and then be totally unable to recover.
First, the detection section. This is what is available in PostgreSQL right now to help with the detection problem — the baseline. I've separated this into detection and containment, though the two are related. When I say detection, I mean the underlying mechanism used to determine whether corruption has happened, assuming you're already looking at the data. In current PostgreSQL there is always a page header check; the error we saw at the beginning of the presentation is the result of that page header check. It's a very simple sanity check, so it misses a lot of errors, and many errors will instead turn into crashes from undetected corruption. So the page header check is available, but it's somewhat weak. Do make use of it — you certainly want to watch for that error in the log files and so forth. It will not take the system down, so you might not notice it even if your application is getting errors; if the application hits the error rarely, and those errors are not seen by somebody on the operations team, you might not notice even though PostgreSQL is noticing. So it's still useful to be aware of that error and at least be on the lookout for it with your monitoring tools. In 9.3, which will be released later this year, there is a much more robust way of detecting errors, which should form a better basis on which to contain corruption and eventually recover from it: the data page checksums in 9.3, which I'll talk about more on the next slide. After that, pg_filedump.
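Since the page header check only surfaces as an error in the server log, the "be on the lookout" advice can be automated with a trivial log scan. A minimal sketch — the function name and sample log lines are invented for illustration:

```python
import re

# Pattern for the classic corruption error shown at the start of the talk.
# The block number and relation name vary, so capture them loosely.
PAGE_HEADER_ERROR = re.compile(
    r'invalid page header in block (\d+) of relation "([^"]+)"'
)

def scan_log_lines(lines):
    """Return (block, relation) pairs for every page-header error found."""
    hits = []
    for line in lines:
        m = PAGE_HEADER_ERROR.search(line)
        if m:
            hits.append((int(m.group(1)), m.group(2)))
    return hits

sample_log = [
    "LOG:  checkpoint complete",
    'ERROR:  invalid page header in block 123 of relation "foo"',
]
print(scan_log_lines(sample_log))  # [(123, 'foo')]
```

A pass like this, wired into whatever monitoring you already run, makes sure the error reaches the operations team even when the application swallows it.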
Checksums in 9.3 are a big improvement over the page header check — essentially a much better version of it. If you have some kind of corruption, it is very likely to be detected. It will even catch the case where a page is good but the file system jumbled the blocks and moved that good block to another location, because the page's block number is incorporated into the checksum calculation. Although that sounds like a theoretical problem — the I/O system mixing up blocks like that — I can assure you it is not: it is a very practical problem, caused by buggy firmware and that kind of thing. It will also detect a partial write in the middle of a page. Overall it's a much more reliable way to detect that the data you read back is not the same data you handed to the storage system.
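The transposed-block case can be illustrated with a toy checksum. This is not the actual 9.3 algorithm — it's a CRC-32 stand-in, and every name here is invented — but it shows why folding the block number into the calculation catches a perfectly intact page stored at the wrong location:

```python
import zlib

BLOCK_SIZE = 8192  # default PostgreSQL page size

def page_checksum(page: bytes, block_no: int) -> int:
    """Toy stand-in for a data page checksum: hash the page contents
    together with the page's own block number, so that a valid page
    returned from the wrong location still fails verification."""
    return zlib.crc32(page + block_no.to_bytes(4, "little"))

page = b"A" * BLOCK_SIZE
stored = page_checksum(page, 0)   # checksum computed when block 0 was written

# Reading block 0 back with the right contents verifies fine...
assert page_checksum(page, 0) == stored
# ...but if buggy firmware hands us this (intact!) page when we asked for
# block 1, the location no longer matches and verification fails.
assert page_checksum(page, 1) != stored
print("transposed block detected")
```

A checksum over the page bytes alone would pass in both cases; mixing in the expected location is what turns "the data is internally fine" into "the data is fine *and* where it belongs".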
Next, pg_filedump. How many of you have heard of this utility? Just a few people — I highly recommend you look at it. It's not perfect, but for what we want to use it for, it's really the only tool out there like it right now, and I hope to change that in upcoming PostgreSQL releases. It's a tool, hosted on pgFoundry, that does somewhat more exhaustive checking than just the header check. You can run it offline against the files; it gives you information about the contents of those files, and if it has a problem interpreting the data pages, it will write an error to its output, which you can grep for. A typical use: if you suspect corruption, run pg_filedump over every file in the database and grep the output for errors — that tells you, offline, whether you have corruption. You wouldn't use it on an online system. If you do suspect corruption, this is going to be one of the only resources you have aside from, essentially, a hex editor, so it's good to be aware of it. I actually recommend that you install it ahead of time on the systems you're concerned about — I don't think it's packaged in most distributions, so it might be good to compile it in advance, so that when a problem strikes you're already a little familiar with it. There was a question about the use cases for this: one use case is trying to verify a base backup after taking it. There are some limits to how useful it is for that, but certainly you want to be able to confirm or deny whether any corruption has happened.
If the backup hasn't been brought to a consistent state, there's always a chance it could give you false positives, but those should be relatively rare — so just use it as a manual process to quickly determine whether you have corruption; it can be very useful. I assume the next question was about prevention versus after-the-fact diagnosis: it depends on what level you're talking about. This will not prevent corruption, but because it detects it, it can help us avoid spreading and propagating it to other places — it can help us contain the corruption, and that alone can be the difference between a recoverable system and an unrecoverable one. Another question here was whether the WAL itself is checksummed — that's an interesting question, and I'm going to go into it a little more in one of the later slides about the role replication plays here; let me come back to your question if that doesn't answer it.
Again, just to quickly repeat: I recommend you familiarize yourself with this tool — download it and compile it — so that when you run into problems it's close at hand.
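The "run it over every file and grep for errors" workflow is easy to automate. We can't assume pg_filedump is present here, so the sketch below does only the cheapest possible offline pass — flagging relation files whose size isn't a whole number of pages, which usually means truncation or stray garbage — as a stand-in for the real loop:

```python
import os
import tempfile

BLOCK_SIZE = 8192  # default PostgreSQL page size

def scan_data_dir(path):
    """Cheap offline sanity pass, in the spirit of looping pg_filedump
    over every relation file: flag any file whose size is not a whole
    number of pages. (pg_filedump itself goes much further and
    inspects the page contents.)"""
    suspects = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.getsize(full) % BLOCK_SIZE != 0:
                suspects.append(full)
    return sorted(suspects)

# Demo on a throwaway directory: one whole page, one truncated file.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "16384"), "wb").write(b"\0" * BLOCK_SIZE)
    open(os.path.join(d, "16385"), "wb").write(b"\0" * 100)
    print(scan_data_dir(d))  # only the 100-byte file is flagged
```

In practice you would replace the size check with an invocation of pg_filedump per file and grep its output, but the shape of the loop — every file, offline, before you trust the backup — is the point.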
When I talk about checking, what I mean is that you want to actively look at the data before relying on it being correct, and there are a couple of levels to that. One: if you're just running queries, you don't want to get wrong results and you don't want a crash. That's what the page header check does, and it's also something the checksums feature in 9.3 will do a very good job of: preventing that kind of propagation. The checksums contain the corruption — they keep it out of the PostgreSQL executor so it doesn't cause serious crashes, and out of your results so a more subtle kind of corruption doesn't cause wrong answers. Containment, on the other hand, is about whether it makes it to backups and replicas. That's really the worst situation: you have one instance of corruption, you believe you're protected because you have a series of backups and replicas and everything seems fine, and by the time you detect the corruption it has already spread. That's the worst case, because at that point you're almost certainly looking at data loss. That's what I mean when I say contain the corruption.
One thing is that I think many PostgreSQL deployments are actually quite vulnerable to corruption, because a base backup makes no attempt to check for it. You're really relying on the disk and the file system to give you back the right information, and if they don't — if there's some kind of corruption there — it's simply copied into the backup, or into the replica when you take a base backup to create one. That's not a good situation. It will be better in the future, but in the meantime we have to do what we can — for instance, using pg_filedump to avoid relying on a base backup without some verification. This is a big deal: backups don't really protect you against these kinds of corruption problems, so you really want to validate your backups, and pg_filedump is really the only tool out there that can help with that right now. But it's not quite so bad, because PostgreSQL actually does have a mechanism that really mitigates this problem: streaming replication. The nice thing about streaming replication — and this goes back to the earlier question — is that it's based on write-ahead log records, WAL records, which are transmitted to the secondary and replayed, and each one of those records carries its own checksum, a CRC. If you have corruption on the master, it's most likely in the data area, where data is actually stored long-term; but if the WAL itself becomes corrupted, the standby will notice immediately — you'll detect it instantly and know there's a problem right away. If it's the data area on the primary that's corrupted, that most likely won't make it to the secondary at all, because replication happens through the streaming channel only: you're not moving files from one file system to another, and you're not relying on that unreliable file system. So streaming replication mitigates this problem quite a lot. There are still things to be careful of: if you resync a streaming replica at some point, you're relying on a base backup to do the resync. How many people here use streaming replication? A lot. I think it's a good mechanism — not perfect, but it mitigates the problem because you're not doing as many base backups, which makes it easier to verify the ones you do take, and makes it more likely that you'll notice corruption before you've taken a base backup and copied it to the replica. So still be careful of those resync operations and new base backups; if you just keep a continuous streaming replication system going, you're much less likely to run into problems.
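The per-record CRC is the key property here. A minimal sketch of the idea — the framing format below is invented, not the real WAL layout: each record carries a checksum of its payload, and replay stops at the first mismatch instead of applying a corrupted record to the replica.

```python
import struct
import zlib

def make_record(payload: bytes) -> bytes:
    """Frame a WAL-style record as: payload length, CRC of payload, payload.
    (Invented layout -- real WAL records are far more elaborate.)"""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def replay(stream: bytes) -> list:
    """Apply records in order, but stop at the first CRC mismatch --
    the standby refuses to replay a record that fails its checksum."""
    applied, off = [], 0
    while off + 8 <= len(stream):
        length, crc = struct.unpack_from("<II", stream, off)
        payload = stream[off + 8 : off + 8 + length]
        if zlib.crc32(payload) != crc:
            break  # corruption detected before it reaches the replica's data
        applied.append(payload)
        off += 8 + length
    return applied

stream = make_record(b"insert") + make_record(b"update")
corrupt = bytearray(stream)
corrupt[-1] ^= 0xFF                       # flip bits inside the second record

assert replay(stream) == [b"insert", b"update"]
assert replay(bytes(corrupt)) == [b"insert"]  # bad record caught, not applied
```

This is why corruption in the WAL stream halts replication loudly rather than quietly poisoning the standby — the failure mode the talk contrasts with blindly copying data files in a base backup.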
Let me repeat the question — it was about full-page writes, which are a way that data pages on the primary make it into the write-ahead log and are transmitted to the streaming replica. That's a valid point, but ordinarily what happens is that the master needs to read the page first. Say you're using checksums on the primary — I know that's not available until 9.3, but go with it for a minute — then modifying a page on the primary involves a read of that page, which involves verification of the checksum, so you actually have a good page, and then it makes it into the write-ahead log and on to the secondary. So you're fine in that case. If you're not using checksums, it is a potential problem, because the page header check isn't good enough: it may falsely tell you the page is OK, and if the corruption is subtle enough to make it through without causing a backend crash, it can make it into the WAL and on to the secondary. So that is still a vulnerability, but in 9.3 with checksums it has a much lower probability. There was also a comment from the audience: the point was that streaming replication is not the only way to avoid problems like this. Londiste, another replication system, is part of SkyTools if I remember correctly — developed at Skype, and open source; I don't have the URL handy, but look for SkyTools. It's a logical replication mechanism, and because of that you're not
copying data pages around, so it's unlikely for corruption to make it into the replica that way. My own view is that in many cases streaming replication, even without checksums, is quite good protection; Londiste might be a little better than what's currently available without checksums, but with checksums in 9.3 I believe streaming replication will be at least as solid as Londiste for this purpose. Next question — the performance penalty of the checksums. It's relatively low in many cases, and we did do significant performance testing. The checksum calculation itself is quite fast: Ants Aasma — I'm not sure I'm pronouncing the name correctly — did quite a bit of research into the checksum algorithm, and we chose one that's vectorizable on modern processors, so it should be very quick to calculate. Ordinarily it's not a problem. There might be a slight overhead in workloads where you have small shared_buffers but large RAM, where you're exchanging a lot of pages between the file system cache and PostgreSQL — not a huge overhead, but something noticeable. Without going into too much detail, the other cost you'll see is additional full-page writes being sent to the WAL; that's actually the most likely performance hit — an increase in full-page writes and WAL volume. Another question: does the overhead compound if you use these checksums on top of btrfs or ZFS, which do their own? I haven't done any specific analysis, but yes, the idea is that both systems are checking the same data, so the costs compound. What I would do is
look at your risk tolerance and ask: is one checksum at one layer good enough for me? The PostgreSQL checksum sits one level higher up, which is often useful, so if you were to choose just one, I'd probably choose the PostgreSQL checksums. But it also depends on why you're on btrfs or ZFS: if you've chosen that file system because it works for you, then great — maybe its checksums are good enough, and that would be a very reasonable option. If you don't like the performance of btrfs or ZFS, and you were only switching to them for the checksums, then it might be better to stick with PostgreSQL's. And that was a very good point just brought up: having both checksums does ultimately offer more protection. For one thing, if you see one checksum fail and the other not, that can be a red flag requiring more investigation. Additionally, not every bit of data related to PostgreSQL is currently checksummed. I would like that to change — I'd like substantially everything to be covered eventually — but the 9.3 implementation covers the data area only, so it leaves out some structures which are still very important. Those would still be vulnerable.
Another thing I should mention about streaming replication: although it offers a lot of protection, if it's been going on for a long time and you have a huge amount of data, you can get totally independent failures on both sides. You really do want error checking on both systems — it does you no good to keep a replica around for a long time with data that was corrupted ages ago and went unnoticed, because when you hit some unrelated failure on the primary and go to the secondary, you find it's bad too. It's not that rare for each side to be corrupted somewhere different. So you have to watch out for those independent failures as well.
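The practical upshot is a periodic scrub pass on every node, primary and standby alike. A sketch — the checksum and storage model are toys, not PostgreSQL internals:

```python
import zlib

BLOCK_SIZE = 8192

def checksum(page: bytes, blk: int) -> int:
    return zlib.crc32(page + blk.to_bytes(4, "little"))

def scrub(storage):
    """storage: list of (block_no, page_bytes, checksum_stored_at_write_time).
    Return the block numbers whose contents no longer match. The point is
    to run a pass like this on the primary AND every standby, since each
    side can rot independently while the other looks healthy."""
    return [blk for blk, page, stored in storage
            if checksum(page, blk) != stored]

good = b"\x00" * BLOCK_SIZE
bad = good[:-1] + b"\x01"                # one flipped bit of silent rot

storage = [(0, good, checksum(good, 0)),
           (1, bad, checksum(good, 1))]  # block 1 rotted after being written
assert scrub(storage) == [1]
print("scrub flagged block 1")
```

A standby that is never read is a standby that is never checked; scheduling a scan like this is what turns "we have a replica" into "we have a replica we can actually fail over to".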
As far as recovery goes, you either fail over to a replica or you restore from backup, and I think you're all aware of those options — there's not a whole lot more to say. Essentially, if you have avoided propagating the corruption, and you've been checking frequently enough and detecting it early enough, then everything should be fine: the high-level design of having backups and replicas will give you the durability and safety you'd expect. If you are missing some piece — you're not detecting errors early, or you are somehow propagating the bad data — then it's very likely to end in permanent loss, and the most you can do at that point is use a little creativity to try to avoid complete disaster. There's no principled way to recover once you've propagated the corruption.
So the question was about streaming replication. Ordinarily the WAL just goes from RAM straight over the network. The question was about the case where there's some replication lag and the WAL data has already made it to disk, so to continue, streaming replication needs to go back to disk, read the WAL records, and send those to the streaming replica. That changes the failure mode a little. What will happen is that the WAL records are checksummed — they have been for a long time, well before 9.3 — so the CRC will detect the error, replication will stop, and your monitoring tools will alert whoever needs to know. You should be able to take it from there and avoid replicating the corruption in that case.

OK, so the question was about copying log files for replication — the old warm-standby kind of process, where you archive WAL files and continuously replay them — whether that kind of system is protected too. Yes, that's right: since the WAL records have a CRC on them, you are protected as long as all you're doing is replaying WAL. The danger point is really the base backup stage. The difference between getting the WAL from the filesystem versus getting it straight from RAM as a streaming replica is that if it's still in RAM, you're less likely to actually hit corruption, because it hasn't had to be read back from disk yet. That's part of a more general point: corruption in the backing file of some data that's hot in memory is less likely to be detected, because the data is still in memory and you're not actually doing reads from disk.

So, a quick summary of the current state of things. I recommend making streaming replication a more important part of your replication strategy, for the purpose of safety, and avoiding frequent base backups without doing some kind of verification on them. And when 9.3 comes out, I recommend that everyone use checksums. If you're concerned about any performance issues, I highly recommend that you start testing now — the betas are out — put a workload on it and see if you notice any difference at all. If you notice anything more than a marginal change — beyond the additional full-page writes going to the WAL — then of course report that to the list. So yes, that's the current situation: get familiar with pg_filedump.

The comment was that pg_dump is also another way to verify. A couple of comments about that. One: that only works on a running system, so it's more of an online check — which is still very useful, so it could be a good way to test, say, the primary. pg_filedump is better for a backup because you don't need to start the backup; you just need to bring it to a consistent state. And also, pg_filedump will not crash the server. If you're concerned about corruption, maybe you don't want to crash the server — if it's corrupt data and pg_dump is trying to dump it out, other things can happen.
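Since I keep leaning on the WAL CRC here, a toy sketch may help. This is purely illustrative — the record layout and the use of zlib's CRC-32 below are my own simplifications, not Postgres's actual WAL format or its CRC implementation:

```python
import struct
import zlib

def make_record(payload: bytes) -> bytes:
    """Build a toy 'WAL record': 4-byte length, 4-byte CRC, then payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack("<II", len(payload), crc) + payload

def verify_record(record: bytes) -> bytes:
    """Return the payload, or raise if the stored CRC doesn't match."""
    length, crc = struct.unpack_from("<II", record, 0)
    payload = record[8:8 + length]
    if len(payload) < length or (zlib.crc32(payload) & 0xFFFFFFFF) != crc:
        raise ValueError("CRC mismatch: refusing to replay this record")
    return payload

rec = make_record(b"INSERT INTO foo ...")
assert verify_record(rec) == b"INSERT INTO foo ..."

# A single flipped bit on disk is caught before the record is replayed.
corrupted = bytearray(rec)
corrupted[10] ^= 0x01  # flip one bit inside the payload
try:
    verify_record(bytes(corrupted))
except ValueError:
    print("corruption detected; replication would stop and alert")
```

The point being made in the talk is exactly this: the replaying side refuses the bad record and stops, rather than silently applying garbage.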
I mean, it might not result in a nice, convenient error like the one on the first slide of this talk — so that's a reason for caution. But in general, yes, I agree pg_dump is a good way to go.

[audience discussion] So, a couple of points came up in the discussion in the audience. Jim brought up the fact that VACUUM could be an alternative online check, instead of a pg_dump. Both VACUUM and pg_dump would be a fairly good way to do an online check, but at the same time both are incomplete. As Jim pointed out, pg_dump will largely miss the indexes, because it's doing sequential scans to just dump all the data out. And VACUUM, because of the visibility map, might skip pages — and maybe the pages it skips are the ones that are coldest, that haven't been touched in a long time, and are therefore more prone to corruption. So both of them can miss things, but they're certainly useful sanity checks for the system. The other comment was that you can make VACUUM look at all the data — I forget what that option is called, but there is one, so there is some way to get VACUUM to read everything.

Any of these checks are going to be incomplete, though, because if you have one bit different in a number somewhere — an account balance — the only thing that's going to catch that is a checksum. It would have to be a fairly subtle kind of corruption to not be detected by a VACUUM that looks at the page — a pretty small amount of corruption, or a strange pattern. So VACUUM is a reasonable online check; like pg_dump, it's not perfect, but it catches many of the cases.

The question was about having errors on the primary versus the standby. There's going to be some connection there if you're copying the corruption over; otherwise you could very easily have corruption on one and not the other. I mean, you can imagine that if you have cold data on both of them, you could get independent corruption events in each data set. And streaming replication might miss some of the corruption because of the WAL mechanism I was describing — since it's not copying the data pages over, which is actually an advantage, unless what you're trying to do is replicate the corruption because you want to look for it, which is an interesting point. What I would recommend in that case is taking a backup of the master and checking that, or using an online check like VACUUM or pg_dump to verify that the data is largely intact.
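To make the one-flipped-bit point concrete, here is a toy sketch — plain Python, nothing Postgres-specific; the "page" here is just a single stored integer. A structural sanity check of the kind VACUUM effectively performs is happy either way; only a checksum notices:

```python
import struct
import zlib

def page_checksum(page: bytes) -> int:
    """Toy page checksum (illustrative; not Postgres's algorithm)."""
    return zlib.crc32(page) & 0xFFFFFFFF

page = struct.pack("<q", 1000)       # an account balance of 1000
stored_checksum = page_checksum(page)

corrupted = bytearray(page)
corrupted[0] ^= 0x04                  # flip one bit: 1000 becomes 1004
corrupted = bytes(corrupted)

# Both versions decode as perfectly valid integers, so any structural
# check ("does the page parse?") passes on both.
assert struct.unpack("<q", page)[0] == 1000
assert struct.unpack("<q", corrupted)[0] == 1004

# Only the checksum distinguishes the two.
assert page_checksum(corrupted) != stored_checksum
print("bit flip invisible to structural checks, caught by checksum")
```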
Let me go into some of the improvements that I'm looking forward to introducing in the future. There's a wiki page where I'm working on some outlines and some ideas — it's not always up to date — but I plan on working on, or at least planning out, several features that will improve the error detection capabilities in Postgres and the tools surrounding it, to make replication and backup safer.
A couple of the things I have in mind. One is that right now Postgres considers all-zero pages to be valid. The reason for that is a technical one having to do with relation extension, but I'd like to try to find some solution, because if zero pages are valid, that means the kind of corruption that just zeros out a bunch of data will always go undetected — or will go undetected whenever it aligns on page boundaries. So that's one thing I'd like to find a solution for.

Another is that, as was pointed out earlier, the SLRU mechanism — the commit log and the other SLRU files — is not currently checksummed. Those can get written to disk and then corrupted, and that's particularly important for the commit log in Postgres: it's a relatively small amount of data, but it's important that it's not corrupted. We focused on data pages first because there are so many more of them, but I think we need a solution there as well.

Temporary files are not as critical, of course, but if you're doing a sort and the external sort spools to disk, you can end up in a bad situation. I've seen this before: a mysterious crash where it's hard to prove it was actually corruption from the filesystem, but in this case at least I was almost certain it was a temporary file that was written out during an external sort, read back in with the wrong contents, and then Postgres crashes. These crashes are mysterious and take quite a bit of time to investigate. So that would be a potential improvement: being able to detect that would make a crash less likely and quicker to diagnose. Additionally, it might offer a canary in the coal mine: if you have a workload that's heavy on hash joins or sorts — something that spills to disk — and you see corruption there, that might alert you to corruption before it hits other areas of your data.
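The zero-page problem from the first item is easy to picture. Here's a sketch of a scanner that merely reports all-zero pages, assuming the default 8 kB block size — an illustration of what such a check would look for, not a real verification tool:

```python
import io

PAGE_SIZE = 8192  # Postgres's default block size

def zero_pages(f, page_size=PAGE_SIZE):
    """Yield the block numbers of pages that are entirely zeros.

    A validator that treats zero pages as valid (as Postgres currently
    does, for relation-extension reasons) would report nothing here.
    """
    zero = bytes(page_size)
    blkno = 0
    while True:
        page = f.read(page_size)
        if not page:
            return
        if page == zero:
            yield blkno
        blkno += 1

# Simulated relation file: page 0 has data, page 1 was zeroed out by
# corruption, page 2 has data.
data = b"\x01" * PAGE_SIZE + bytes(PAGE_SIZE) + b"\x02" * PAGE_SIZE
print(list(zero_pages(io.BytesIO(data))))   # -> [1]
```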
A few other things. Right now, recovery has difficulty distinguishing between corruption that happened somewhere in the middle of the WAL and the actual end of the WAL that it needs to replay. In some cases it can detect the difference — for instance, with a base backup, in certain cases it can detect that it hasn't reached a consistent state yet — but in other situations, when it encounters the corruption, it actually thinks that's the end of the WAL, and it will bring the system up, and that could result in an inconsistent database. So that's potentially quite a bad problem. The reason for it is this: say you have Postgres running and there's a power failure, and while writing that last WAL record, maybe only part of the record actually made it to disk. When you bring the system up and it tries to replay, the first thing that's going to happen is that the CRC check fails. When that happens it needs to be able to recover from that case — it has already reached the real end of the WAL as far as committed transactions go, so that's OK. But the problem is that when you have real corruption, and not just a power failure, it could mistakenly think it has found the end of the WAL. So I'd like to find some solution there as well.
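A toy model of that ambiguity: a replay loop that stops at the first CRC failure cannot, by itself, tell a torn tail from mid-stream corruption. (Illustrative only — this uses a made-up record format, not the real WAL layout.)

```python
import struct
import zlib

def make_record(payload: bytes) -> bytes:
    """Toy record: 4-byte length, 4-byte CRC, then payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack("<II", len(payload), crc) + payload

def replay(wal: bytes):
    """Replay records until a CRC check fails, then stop.

    The problem: a failure at offset X looks the same whether the tail
    was torn by a power failure (stopping is correct) or a record in
    the middle was corrupted on disk (stopping loses committed work).
    """
    replayed, off = [], 0
    while off + 8 <= len(wal):
        length, crc = struct.unpack_from("<II", wal, off)
        payload = wal[off + 8:off + 8 + length]
        if len(payload) < length or (zlib.crc32(payload) & 0xFFFFFFFF) != crc:
            break  # treated as "end of WAL" either way
        replayed.append(payload)
        off += 8 + length
    return replayed

a, b, c = make_record(b"txn-1"), make_record(b"txn-2"), make_record(b"txn-3")

# Torn tail: the last record was only half-written. Stopping is correct.
torn = a + b + c[:6]
assert replay(torn) == [b"txn-1", b"txn-2"]

# Mid-stream corruption: one bit flips inside record 2. Replay stops at
# the same kind of failure, silently discarding committed txn-3.
bad = bytearray(a + b + c)
bad[len(a) + 9] ^= 0x01
assert replay(bytes(bad)) == [b"txn-1"]
```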
And also, of course, we need a way to enable, disable, and control these checksums based on people's risk tolerances and so forth.
Then I'd like a well-accepted method for a base backup to fail if corruption is encountered while it's happening, or a follow-up verification step. This is a big one: we'd like some way to be able to say "you have a good backup." That could be done as part of the base backup, or maybe with a verification step after the fact that fails if corruption is found. That's critical, so that we don't start removing old backups while relying on a bad one. I'd also like a more complete offline checker — which of course ties into the previous point, to validate backups — and a background check that works online. Something a little like pg_dump, but more principled and targeted: really just reading the pages and checking the checksums, and otherwise discarding the data. You might not want to run lots of pg_dumps, since they acquire a consistent snapshot and hold it for a long period of time, and so forth.

The question was whether the 9.3 checksums feature works on online systems. Yes: when Postgres is reading data from disk into Postgres-controlled memory, it validates the checksum at that point. So if you actually restore the backup and bring it into use, then the checksums will come into play. And when I refer to a backup here, I've generally been talking about the online backup — what's called a base backup.
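One possible shape for that verification step, sketched with a hypothetical digest manifest. This is an assumption of mine for illustration — whole-file SHA-256 digests recorded at backup time — not Postgres's page checksums or any existing tool:

```python
import hashlib
import os
import tempfile

def digest_file(path):
    """SHA-256 of a file, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(backup_dir, manifest):
    """Return relative paths whose contents don't match the manifest.

    `manifest` maps relative path -> sha256 recorded when the backup was
    taken. A non-empty result means: do NOT rotate out the older backups
    this one was meant to replace.
    """
    bad = []
    for relpath, expected in manifest.items():
        full = os.path.join(backup_dir, relpath)
        if not os.path.exists(full) or digest_file(full) != expected:
            bad.append(relpath)
    return bad

# Demo: a one-file "backup" that verifies, then fails after bit rot.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "16385")
    with open(p, "wb") as f:
        f.write(b"heap page data")
    manifest = {"16385": digest_file(p)}
    assert verify_backup(d, manifest) == []          # good backup
    with open(p, "wb") as f:
        f.write(b"heap pXge data")                   # corrupted after copy
    assert verify_backup(d, manifest) == ["16385"]
```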
The next question was about pg_upgrade and checksums — what to do with data that was written before checksums entered the system. We'd like a way to enable and disable checksums beyond just the initdb-time option, so that if you upgrade from a cluster without checksums to a new one, you could enable them afterward. That mechanism is not available in 9.3, but the idea is that if you upgrade from a previous system to the new system, afterward you'd be able to enable the checksums. For 9.3 itself — yes, that's correct, because as you point out there are potential problems: you don't want to just copy over unchecksummed files and have them sit there with no checksums. That's essentially unvalidated data, and you most likely want to start from a clean, uncorrupted point anyway. So, not yet, but we would like to provide a mechanism for enabling and disabling checksums.
So, quickly, to conclude: it's the same ideas as before. Use pg_filedump; incorporate streaming replication; test checksums now and use them in production when 9.3 comes out; and just be careful about relying on unverified backups. And look out for some significant improvements in Postgres, starting in 9.3, really, with the checksums feature. Are there any more questions?

So, ordinarily it will give you the table name and the block number. Usually it takes some investigation to find how extensive the corruption is, because you'll get one error, but often there will be many bad blocks. If you're comfortable with OIDs and mapping relations to files, you can make use of that. That's maybe another good recommendation for people: be familiar with mapping relations, by OID, to the actual files on the filesystem — get comfortable with that before you've got a real problem. Like I said, it will give you the table name in the error message, but at the same time you'll want to do some follow-up investigation to find how extensive the corruption actually is.
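For the curious, that mapping is mechanical in the simple case. On a live system, `SELECT pg_relation_filepath('foo')` will tell you directly; offline, a sketch like the following shows the shape of it. (A deliberate simplification: it ignores tablespaces and the fsm/vm forks, and assumes the default 1 GB segment size.)

```python
def relation_file_paths(db_oid: int, relfilenode: int, rel_bytes: int,
                        segment_size: int = 1 << 30) -> list:
    """Paths, relative to the data directory, holding one relation's heap.

    The main file is base/<database OID>/<relfilenode>; segments past
    the first 1 GB get a .1, .2, ... suffix.
    """
    base = f"base/{db_oid}/{relfilenode}"
    nsegs = max(1, -(-rel_bytes // segment_size))  # ceiling division
    return [base if i == 0 else f"{base}.{i}" for i in range(nsegs)]

# A 2.5 GB table with relfilenode 16385 in database 16384:
print(relation_file_paths(16384, 16385, int(2.5 * (1 << 30))))
# -> ['base/16384/16385', 'base/16384/16385.1', 'base/16384/16385.2']
```

Knowing this layout ahead of time makes the jump from "invalid page header in block 123 of relation foo" to the actual file on disk much less stressful.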