The Worst Day of Your Life

Video in TIB AV-Portal: The Worst Day of Your Life

Formal Metadata

The Worst Day of Your Life
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Place
Ottawa, Canada

Content Metadata

Subject Area
Recovering from Crises and Disasters

What do you do when the worst happens? It could be a catastrophic hardware (or even data center) failure, a badly placed rm -rf *, or a PostgreSQL bug. We'll discuss how to recover from disasters that are far outside the usual operating procedure... and how to avoid getting into them in the first place.

Every DBA with real-life experience knows that sinking feeling when you realize that something terrible has happened: PostgreSQL crashes with a PANIC message, you realize you were on the production system when you dropped that table, or you get a status update that "us-east is currently experiencing problems." What do you do? There's no single solution to catastrophic problems, but we can talk about strategies that might help you keep a cool head while everything around you is losing theirs.

We'll talk about things like:
- Dealing with PostgreSQL bugs.
- Catastrophic hardware failures.
- Application and operator error.

And, of course, we'll discuss what you need to do in advance to make the Worst Day of Your Life a little bit less traumatic:
- Backup and recovery strategies and tradeoffs.
- Upgrade procedures.
- Planning for business continuity in major disasters.
Postgres is crashing repeatedly. Queries are returning invalid results, which is always very exciting when you see it. Or you get scary error messages in the log. Or, everyone's favorite, backends running for extended periods without obvious reason, autovacuum statements for example.

Just like every major traumatic situation, when you're a systems administration person and a crisis erupts, you tend to go through all the stages: denial, anger, bargaining, depression, and actually fixing the problem.

Denial is the most dangerous one, because your first reaction is that it's probably something unrelated, or somebody did something wrong; Postgres doesn't have bugs.

Anger: "Well, you were running queries when it happened! I told you not to run queries!" So you're projecting the problem onto somebody else.

OK, bargaining: "All we have to do is repair this, run a vacuum, and then we're done and I can go back to sleep. A restart will fix everything; it always does, it always has."

Depression: it's terrible, it's all EBS's fault, we told you not to use it, but we did. You're starting to realize that this could end up in the news. It's just awful.

And then you fix the problem. It's better if you go straight to this stage. I made a big joke of it, but be aware that humans do this. We are all just people, and we will go through these stages when confronted with one of these horrible situations. So move slowly. The note here is: don't panic.
The very first step: something goes awry, you see the first indication that something has gone bad in a bad way. The very first step is to stop.

Stop, stop, stop. Whatever your first instinct is, it will almost certainly be the wrong one, because a crisis is a problem plus panic. So the first thing you must do is nothing. At least do no harm. If you're down, you're down. Take a deep breath and move cautiously. Don't rush toward a solution unless you are incredibly sure what the solution is. If the power cable is sitting there pulled out in front of you, sure, plug it back in, but it's rarely that straightforward.

Let me take a moment to talk about minimizing communication channels. How many people here work for an organization that has more than three layers between you and the C-level? I don't, which is good, because otherwise I'd be in jail for homicide. There is a theory inside large organizations that the only reason the problem hasn't been solved is that someone with a grand enough title hasn't yelled at the line employee yet. Obviously, as soon as someone with "Director" or "Vice President" in their title is on the phone, you'll get off your ass, shut Facebook, and fix the problem. You have to wonder why they think this, because they hired these people, and anyone who had that attitude toward the job shouldn't be working there. But companies do. So turn off your cell phone, don't answer your desk phone, pick one channel (internal IRC, whatever) and focus on that, because if it's a highly visible problem, everybody in the organization is going to be yelling at you to fix it. Worry about that later; fix the problem first.

Don't delete anything unless you know that's the solution to the problem. If you're out of disk and the text log is taking up 90% of it, truncating that is probably a good first step. And remember, I said text log.

An actual conversation: "I went and deleted everything in the log directory." "Which log directory?" I've had this conversation multiple times. It's so painful. And expensive: good for consulting revenue, but be very careful.

So keep the parts. If you possibly can, make a copy of the database before touching anything, understanding that for multi-terabyte databases that may not be the most practical thing in the world. If you can't, meticulously document what you change. Don't just start stabbing at the on-disk files hoping to get it right.

Something to remember when you're in this existential horror that you've found yourself in: there are tens if not hundreds of thousands of Postgres installations. Postgres works really, really well. It does have bugs, but rule out everything else first. Don't immediately go to "it's a Postgres bug." An example of this is a client, very smart people, who ran into an autovacuum to prevent wraparound. Is everybody familiar with autovacuum for wraparound? They were sure it was broken, because it was running for hours at a time on a multi-terabyte database which had never had that run on it before. Of course it will run for hours; it has to read every single page of the entire database on disk. "It's a Postgres bug." "No, you just have a really big database." One hour later: "It's still running. Is it a Postgres bug?" Talk about minimizing communication channels.
So work up the stack.

Look for errors in the system messages, especially when a backend crashes. If you're running on Linux, you will probably go in there and find a message about the OOM killer, so adjust the OOM killer parameters. Is it really just the disk? Can you read every page on the disk, or do you get an error saying, "Nope, sorry, whole sections of the database are gone"? Then we have an idea what the problem might be. This is a remarkably common problem, especially when people who are not familiar with how databases work are setting up the storage, specifically setting up a SAN or, heaven help us all, NFS volumes.

There can be memory corruption problems. RAM is actually remarkably prone to corruption. Take the number of uncorrectable errors in the spec sheet for the RAM you have in your system and multiply it by the time it has been running, and you'll discover there are probably one or two uncorrectable RAM errors in your hardware right now.

Also the common backup mistakes. Make sure that pg_start_backup completes before taking the snapshot. Use rsync, not scp, to move WAL files around, because scp does not move files atomically and rsync does; otherwise Postgres sometimes wakes up, starts reading a segment that is still being copied, gets to the end of it, and gets indigestion. I've also seen people take just a snapshot backup of the primary without the WAL files, and then they have no segments.
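As one concrete instance of the "look for the obvious first" checks above, here is a minimal sketch of scanning kernel log lines for OOM-killer activity. The function name `find_oom_events` is hypothetical, and the sample lines in the usage note are made up for illustration; the two patterns match the wording Linux kernels commonly use, but exact messages vary by kernel version, so treat the regular expression as an assumption to adjust.

```python
import re

# Phrases Linux kernels commonly log when the OOM killer fires.
# Exact wording differs between kernel versions (assumption: adjust as needed).
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(?:ed)? process",
    re.IGNORECASE,
)


def find_oom_events(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

Feeding it the output of `dmesg`, or the lines of the system log file, gives a quick yes/no on whether a "mysterious" backend crash was the kernel killing the process.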
So here's what to do if Postgres is the suspect. First, eliminate system-level causes. Postgres is just a program running on a computer; it crashes for the same reasons lots of things crash. Isolate the crashing behavior if you can: which tables, which queries are crashing? Are any other processes showing unusual behavior, like some other process on the machine sucking up all of the memory?

Do you have a clean backup? That's always the best solution. If you desperately need the system up, fail over to the backup, as long as you can avoid repeating the problem, which may take a little bit of diagnosis.

Otherwise, a Postgres database is just files on disk; you can patch it. I'm not going to go into all the various ways of patching them, because that is beyond a 45-minute talk.

Something else to remember: the best thing is always to roll back to a known-good system. You may not have that option, but there's no substitute for a solid disaster recovery strategy. And remember, you are voiding the warranty on Postgres by opening it up and poking around in the files directly. It's very easy to get yourself into a worse situation than you were in before, so proceed at your own risk.

Now that I've scared you: there are no recipes for doing this. By its nature, corruption is a one-off situation. So determine the extent of it before continuing, and be sure you can step backwards, either because you're working on a copy (remember: work on a copy) or because you're able to undo the changes you make. If you're not able to copy the entire database, please copy the individual files you're modifying before you modify them.

Index corruption is probably the most common. Indexes have a lot of internal structure compared to the heap, and it's easy to corrupt an index in such a way that the system stops running; it's harder to corrupt the heap that way. With heap corruption you may start getting bad results from individual queries, but indexes have all sorts of internal pointers and internal structure, and it's very easy to screw that up. Drop indexes first and rebuild them. Rather than reindexing, drop them. The good part about dropping an index is that you can drop a corrupt one and that's no problem: it just unlinks the file, so the contents don't have to be valid. If you're getting error messages involving indexes, drop the index, rerun the query, and see if it works.
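The advice to copy individual files before modifying them, and to document every change, can be sketched as a small helper. Everything here is illustrative: `backup_before_edit`, the `.orig` naming, and the tab-separated change log are assumptions, not something described in the talk.

```python
import shutil
import time
from pathlib import Path


def backup_before_edit(path, backup_dir, log_path, note):
    """Copy `path` into `backup_dir` and append a line to a change log.

    Returns the path of the saved copy. Call this before touching the file.
    """
    src = Path(path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.orig"
    if dest.exists():
        # Never overwrite an earlier saved copy.
        raise FileExistsError(dest)

    shutil.copy2(src, dest)  # copy2 also preserves timestamps and mode bits

    # One line per change: when, which file, where the copy went, and why.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{src}\t{dest}\t{note}\n")
    return dest
```

The point of the log is the step-backwards guarantee: at 3 a.m. you can read off exactly which files were touched and where their originals are.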
All the actual data for Postgres is inside the base directory inside the PGDATA directory. Every relation has a relfilenode, which is in pg_class. The files that make up an individual relation, that is, a table or an index, are under base, then the database OID, then the relfilenode, with .1, .2, .3, .4 suffixes for big relations. The heap, unless you've compiled a special build of Postgres, which you probably haven't, is divided into 8 kB blocks. Each block has a variable number of tuples on it. Every row in Postgres has a ctid, which is the block and the tuple within it. It's just a magic column, and you can actually select it and see it for individual rows.

So if you think the problem is in the heap, that is to say the actual data as opposed to the indexes, the first easy fix is to set zero_damaged_pages to true. When Postgres reads a page and thinks it is bad, it will treat it as zeroed, which is to say it has no data on it. Of course you lose data, but at least you get the database back up. And remember, drop the indexes from that table first, so that you don't have indexes pointing to nonexistent tuples.

For really, really bad damage, things like the backend crashing when it touches a particular set of rows, you can use dd to go in and zero those particular pages. Do your math right, because it is very easy to smash something else. And again, drop indexes first.
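"Do your math right" can be made concrete. The sketch below assumes a default build (8 kB blocks, 1 GB segment files) and takes the block number from the first half of a row's ctid; the relfilenode `16384` and the helper names are hypothetical. It only builds the dd command as a list and never runs it; it is meant to be used against a copy of the file, with the server stopped.

```python
BLOCK_SIZE = 8192                  # default PostgreSQL block size (8 kB)
SEGMENT_SIZE = 1024 ** 3           # default size of each relation segment file (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 blocks per file


def locate_block(relfilenode, block_number):
    """Return (file_name, block_in_file, byte_offset) for a heap block."""
    if block_number < 0:
        raise ValueError("block number must be non-negative")
    segment, block_in_file = divmod(block_number, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones are <relfilenode>.1, .2, ...
    file_name = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return file_name, block_in_file, block_in_file * BLOCK_SIZE


def dd_zero_command(path, block_in_file, count=1):
    """Build (but do not run) a dd command that zeroes `count` blocks in place."""
    return [
        "dd", "if=/dev/zero", f"of={path}",
        f"bs={BLOCK_SIZE}", f"seek={block_in_file}", f"count={count}",
        "conv=notrunc",  # without notrunc, dd would truncate the file after the write
    ]
```

For example, block 131073 of relfilenode 16384 lands in file `16384.1` at block 1 of that file. Getting the segment or the `seek` wrong by one is exactly the "smash something else" failure the talk warns about.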
Another classic problem, and this was actually the source of the problem that I talked about right at the beginning, is trouble with the clog, the commit log. Because of the way Postgres works, it needs to know whether or not a transaction is visible, and the clog keeps track of all that. If you have missing or accidentally deleted clog files, you will see it in the text log, because the backends will all be complaining about them. You can recreate a missing file, but remember that when you create one that is all zeros, you're marking all those transactions as rolled back, not committed, so this can cause data to disappear out of the database. However, if you have to do this, frequently it is the only way you can recover.

One thing to keep in mind: sometimes you'll see an error message with a clog value that's gigantic, far beyond the range of anything you have, up in the billions. That is probably a different kind of corruption, so don't just start creating billions of clog files.

It can also do things like cause rolled-back transactions to reappear, and other side effects. I'm not going into details about this, because it's something you can research; it would take a tutorial-length class to go through all the various ways to patch things, though it would be fun.
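To make the clog arithmetic concrete, here is a hedged sketch of working out which commit-log segment file holds a given transaction ID. It assumes a default build: 8 kB pages, two status bits per transaction, and 32 pages per segment file, with segment files named as four hex digits (in releases of the talk's era they live in the pg_clog directory). The helper names are hypothetical, and the plausibility check deliberately ignores transaction ID wraparound.

```python
BLOCK_SIZE = 8192
XACTS_PER_BYTE = 4            # two status bits per transaction
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT  # 1048576
SEGMENT_BYTES = BLOCK_SIZE * PAGES_PER_SEGMENT                      # 262144 (256 kB)


def clog_segment_for_xid(xid):
    """Name of the segment file that stores the status of transaction `xid`."""
    if not 0 <= xid < 2 ** 32:
        raise ValueError("transaction IDs are 32-bit unsigned values")
    return f"{xid // XACTS_PER_SEGMENT:04X}"


def looks_plausible(segment_name, existing_names, slack=1):
    """Rough sanity check: is the segment adjacent to ones already on disk?

    A segment far outside the existing range suggests a different kind of
    corruption, not a missing file. Ignores XID wraparound (simplification).
    """
    present = sorted(int(name, 16) for name in existing_names)
    if not present:
        return False
    return present[0] - slack <= int(segment_name, 16) <= present[-1] + slack


def zero_segment():
    """Contents of an all-zeros segment file, as discussed in the talk."""
    return bytes(SEGMENT_BYTES)
```

A complaint about a segment right next to the ones you have is a candidate for the all-zeros treatment; one that is thousands of segments away is the "gigantic value" case, where creating files is the wrong move.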
Once you've patched things up, one thing you can do is dump and restore. The nice part is that a newly restored database will be pretty much guaranteed to be consistent. Of course, application consistency is something else: it may be missing rows or data that the application thinks should be there. But if you can successfully do the restore, Postgres will be OK. What we actually did, finally, to solve the problem from the beginning was fix the most serious corruption and then do a pg_dump and restore onto a clean host. That worked very well. So if you can do a clean pg_dump and you have the space to put it somewhere, that's a good place to start.

If you have really serious data corruption, especially the kind where individual queries are crashing, or individual regions of tuples cause your backend to crash, you can use COPY to manually copy out individual tables. The nice part about this is you can dump around the corruption, so you don't have to dump the entire table. Then you use a dump to create a new receiving database and copy everything back into it.

This is the nightmare scenario: the system catalog is corrupted. The problem is that the heap is not self-describing. If you don't have a system catalog, the heap is just a bag of bits; it can't be correctly read. Sometimes, if the corruption is logically isolated, you can go in and directly patch isolated areas. One example: I ran into a problem where somehow an entry in the catalog had ended up pointing to itself as its own parent, which was causing no end of hilarity when you tried to use it. I could fix that with just an UPDATE statement. So if you have that kind of thing, you can patch isolated errors. If it's deeply corrupted, you may have to get an expert in to scavenge data.

Tools to help you: pageinspect, a contrib module that lets you dump page information; pg_controldata, which shows the control data for the cluster; and pg_resetxlog, to reset the WAL control information.

Now, of course, the right answer is: don't get yourself into this situation. Plan for the disaster. If you run a PostgreSQL installation of any size, something like what I've described here will happen to you eventually. That's not because Postgres is super buggy. It's because computers suck, hardware sucks, and people or cloud providers spend as little as possible on hardware. Bad things will happen, and sooner or later you will have to deal with them. The best way to survive this is to be prepared.
The first and most important way to avoid this is: test your backups. If you haven't tested the backup, you don't have backups. One of my favorite ways is to give them to developers: take a backup and re-prime the development platform with it overnight. If there's a problem with the data, you will find out right away. Likewise, marketing people are always saying they need to run these giant, complicated marketing report queries on the database, which will cause a transactional database to fall over. Give them their own copy, pulled out of one of the nightly backups. And make sure that your restore steps are automated, because when you actually have to do it, you will probably have to rush.
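A minimal sketch of what "automated restore steps" might look like for a nightly restore test. It assumes, purely for illustration, that the backup is a custom-format pg_dump archive restored into a scratch database; a base-backup-plus-WAL restore would need different steps. The function names, the scratch database name, and the final smoke-test query are hypothetical.

```python
import subprocess


def restore_commands(dump_path, scratch_db, jobs=4):
    """Build the command list for restoring a dump into a scratch database."""
    return [
        ["dropdb", "--if-exists", scratch_db],
        ["createdb", scratch_db],
        # --jobs requires a custom- or directory-format archive.
        ["pg_restore", "--exit-on-error", f"--jobs={jobs}",
         f"--dbname={scratch_db}", dump_path],
        # Trivial smoke test; a real check would query application tables.
        ["psql", "--dbname", scratch_db, "--command",
         "SELECT count(*) FROM pg_class"],
    ]


def run_restore_test(dump_path, scratch_db, run=subprocess.run):
    """Run every step in order, stopping at the first failure."""
    for cmd in restore_commands(dump_path, scratch_db):
        run(cmd, check=True)  # check=True raises CalledProcessError on non-zero exit
```

Run from a scheduler every night, a failure here is an early warning that the backup, not just the restore procedure, is broken. The `run` parameter exists only so the sequence can be exercised without a database.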
The right kind of backups do point-in-time recovery. pg_dump is all very nice: it's good for emergencies, it's good for little quick stuff. But you want point-in-time recovery backups. Keep a reasonable number of backups and their associated WAL segments. WAL-E from Heroku is not a bad tool for this, because you stuff everything into S3 and then move it to Glacier or just delete it on a regular basis. But you want this stuff around, because one of the awful things about corruption is that it can sit there and you don't know about it; it can go undetected in the database for a while.

Make sure you're running with fsync on. I assume everyone here is, because it's horrible if you're not. And make sure the fsync really happens. A lot of the time SAN volumes will lie about whether or not they've really synced everything, because of course they have a battery-backed cache so they "don't need" to fsync, and because they're trying to flatter their benchmark numbers. Run with full_page_writes on, unless you're running on one of the very small handful of file systems that can handle it, because most systems can tear pages. And don't kill -9 anything, please.

Stay up to date on minor versions as they're released. I realize I just told you about a scenario in which someone deployed a minor version and it was a problem. It does happen. Postgres does have bugs; it's a big, complicated piece of software that solves complicated problems. But in general you want to be on the latest version, because there are a lot of data corruption problems fixed in each new one, and a bad minor release is a relatively unusual situation.

Also, plan an upgrade strategy so you can move to new major versions. Sometimes a fix is not backported. Major corruption problems usually are, but it's really painful to see customers stuck on an old version because they have no strategy for upgrading; they've been kicking the "we'll do that major upgrade thing later" ball down the road. pg_upgrade at this point is reliable enough that we're recommending it, so that's one possibility.
Turn on checksums. As of 9.3 there's built-in checksumming on pages. The nice part about this is that it will flag corruption immediately. It doesn't fix the damage, it's not error-correcting, but at least you know. It is very important that you know things are corrupted as soon as possible, because the longer you've gone since the corruption, the more likely it is that you'll be unable to recover from it: you'll have a backup that contains the bad data, or the data is now lost. Or at least use a checksumming file system like ZFS. Turn it on. You do have to do it at creation time, so this may be something you have to do as part of the upgrade strategy, but definitely do that. And that's at initdb time, not createdb: it's per cluster, not per database. initdb and createdb are always one of those things; it takes nothing for me to confuse them.

Another thing: when you're in one of these disasters you often say, "Oh, shoot, I need to fire up a new Postgres instance from the archive." Firing up the instance isn't a big deal, even in a cloud environment. Getting the whole setup just right is: "Oh, right, we had to build Python 2.7 so we could successfully install these packages." If it's 3 in the morning and you haven't had much sleep, you're going to go crazy trying to remember all the things you went through to provision this new host: all the packages you're using, which UUID libraries you built against. So always build these systems using configuration management tools, so you can just push a button and get a new one without spending the entire time worrying about whether you got just the wrong patch. There is nothing worse than going through all of that while you're trying to get the site back up.

And test all this stuff thoroughly, please. Have automated test tools. Do application-level and database-level scans of the data, to make sure that there's no corrupt data lurking somewhere in the nether regions of the database, in those old tuples. You have a table with data from four years ago that hasn't been read in three years. Do you know it's not corrupt? Really? Don't wait for a freeze to tell you that there's a problem. Make this part of the migration or upgrade strategy.

Think about geography. Where do you keep the primary production database? Where do you keep your database backups? How much data can you lose? Always think about this. Meteorites happen. I live in San Francisco; eventually every database in the geographic area will go offline at the same time. This will happen. If you live in Seattle, there's a large nuclear bomb with snow on the top waiting for you. It'll happen. So what's the plan? AWS regions go offline too, as we discovered when the storms came through.

Also remember things like this. I was speaking with a client and I said, "OK, you're in 365 Main, that big data center. There are boats underneath it from the Gold Rush. In a big earthquake, 365 Main is going to fall over and burn. So what's the plan?" "Oh, no, we've got that covered: we're also at Digital Realty across the street." OK. So let's rethink the strategy.

Have a plan for these situations. Have a script, whether it's electronic or a little piece of paper, something that at least doesn't require somebody else's data center to be up for you to get at it. These things are not perfect; frequently you have to go off script. But it's very important to have this list: do this, do this, do this, do this, and if that doesn't work, then it's time to stop and step back. These are the worst possible situations: everything is going wrong at the same time, and you do not have the processor capacity in your brain to sort it all out yourself. Remember, you'll be doing this on no sleep.
honestly so found the bug like I
found 2 months last year in PostgreSQL to these data corruption bugs were ones are reported
so they were anyone else besides nearest industry from movie was really bad yeah so this is the attitude that all the people that a lot of people will have when new report this but yet it's all it's
awful the budget you just found the worst but in the entire world but it was the worst thing in the post developers world they would fix it already and remember no 1 is paid just a fixed dose response if there are a limited number of people can happen post-crisis intervals and there are that many of so we document about the
thorough develop a test case if you can databases are aware understood that if you're having a but that's where the only crops upon your 2 . 5 terabyte databases building nicely this case may prove difficult people understand but document everything even if you think it's not important to Dole over diagnose the problem right away but when I reported this 1st but I blew this I really wouldn't straight to a diagnosis of it and I was wrong and that probably slowed down about text remember that they will need data if your data is sensitive to make sure you have a plan to anomalies so other people can look at because you know if you have gone and if you're in the US if you had the system the health care information or something like that you can't just throw the database onto you know on onto onto a public servant safe here here's everyone's cancer diagnosis records please let's let's fix this but together so I have a plan to economize stuff and file about there is about the fact that there is about filing system proposed you by e-mail or Web link there guidelines please read if you friends critical but now a critical is not does not mean your life is now miserable sadly that's not an operational definition of critical if it's the data corruption or repeatable server failure not your queries running slow considering about actors on because people want know these things exist diseases that databases advanced software they just work it out so he's so it is important to bring to escalate these kinds of issues but remember everyone's busy with their own crises you're crashing
Freezing? Install the debug packages for PostgreSQL, and if you're getting core dumps, get stack traces out of them with the symbols; that's why you want the packages. Or if a process is hung, attach strace to it to see what's coming up. This can be very handy when a backend process is going and going and going, and nothing shows it actually waiting on a lock, but it's never terminating for some reason. Once
you've brought this up, be persistent but polite. The biggest no-no is to file a bug saying, "Oh my God, we're all going to die, this thing brings the server to the ground," and then never reply to the follow-up questions. Monitor these threads. I zone out on -hackers all the time, because it is drinking from a fire hose, but if I raise an issue I will monitor for replies to it. When people ask questions, answer them promptly and answer them thoroughly. And remember that they don't work for you, even if they do. The reality is, if you have a well-documented, repeatable, critical bug, it gets fixed pretty fast. So this works. There's no reason to be cynical about bringing up this kind of thing; it really does work. And if it's a total
disaster, and this is an organizational thing, consider spending money on someone to help you recover and fix the problem. You can do that; I might have a recommendation. And when you get the quote back and you pick yourself up off the floor, if you think Postgres consultants are expensive: that's not expensive. This is expensive.
Compare the cost of getting one bug in Postgres fixed to the annual license on this guy. It's cost-effective.
Any questions? [inaudible] Yes, the slides will be up on The Build, and it
will be announced on Twitter. Yes?
In this case we were able to do the patching at the SQL level, as opposed to having to go in and muck with things on disk. The nice part about this was the application. I'm not saying everyone should go out and modify their schemas to include a last-modified timestamp, but this application had one, and because of that we were able to find these duplicate records and throw away the bad ones. Similarly, we could compare the two databases, effectively using the modified timestamp as a kind of secondary primary key, to find the differences between them and write scripts that later inserted the stuff into the secondary, the new master. What was going on was, in effect, that transactions were almost being repeated. When a tuple was updated on the secondary (the primary was fine, but the primary was badly out of date by the time we noticed), the updated tuple would come in but the old tuple would still persist. So there would be two versions of the same tuple with the same primary key, and which was the good one? By using the modified time, the later one was the good one. We were lucky in that regard. But is that by itself a reason to put a last-modified timestamp on everything? Probably not. We did look at other things.
Generally, when I have had to patch things up on disk, what we would do is look for ridiculously out-of-range values. Frankly, what you'll find is that things like TOAST pointers are a good example, because they're one of the things that are kind of highly structured inside the database. The problem is that when you look at a sequence of bits, you can feed in almost anything and it makes sense; integers can
be anything, you know. But with TOAST-compressed types you can look at things and see: well, this is supposed to be a text type here and the length is 8 million; that doesn't look good. And we can find things that way, searching through those. In the most critical situations, what you can do is in effect rebuild: look at what the tuple is supposed to look like based on the schema and go through manually. But at some point, manually going in and repairing databases [inaudible]; I've personally done that in a relatively limited number of cases. Fortunately, in the case of the very first problem I presented, we were able to do it at the SQL level, so that worked out pretty nicely.
The thing there is, because it has to rewrite the disk image, yeah, you have to dump and reload at that point. Yes?
Sadly, no, I am not aware of one. There's not an integrity check that doesn't involve the Postgres server, which would be a great tool to have, actually, and in my copious retirement I might take a stab at it. But generally you run into these things because you're getting backends crashing or throwing errors, weird messages, and so in effect Postgres itself is the tool to find that stuff.
Yeah, follow the -announce mailing list for Postgres. And don't do what I did (it's too late for me, save yourself), which is dump all of your Postgres mailing lists into a single inbox. I did that for a long time, which meant announcements landed in the same place as -hackers and -general and -performance and all this stuff, and someone would say, "Oh, by the way, this version of Postgres will cause genetic damage in your children," and it got lost among all the other e-mail. So now I separate those out at a different level. So look at the -announce list and read it. It can be eye-glazing, I'm the first to admit, but read it.
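The duplicate cleanup described a little earlier (two versions of a row with the same primary key, where the later last-modified time marks the good one) can be sketched as below. This is not the speaker's actual script, which the talk does not show. It assumes the suspect rows have been fetched into Python dictionaries, and the column names `id` and `last_modified` and the function name `split_duplicates` are hypothetical.

```python
from datetime import datetime


def split_duplicates(rows, key="id", modified="last_modified"):
    """Group rows by primary key and keep only the newest version of each.

    Returns (kept, discarded). When two versions carry the same timestamp,
    the one seen first is kept; such ties deserve a manual look.
    """
    newest = {}
    discarded = []
    for row in rows:
        current = newest.get(row[key])
        if current is None:
            newest[row[key]] = row
        elif row[modified] > current[modified]:
            # The row seen earlier is the stale version.
            discarded.append(current)
            newest[row[key]] = row
        else:
            discarded.append(row)
    return list(newest.values()), discarded


if __name__ == "__main__":
    rows = [
        {"id": 1, "last_modified": datetime(2000, 1, 1), "value": "old"},
        {"id": 1, "last_modified": datetime(2000, 1, 2), "value": "new"},
        {"id": 2, "last_modified": datetime(2000, 1, 1), "value": "only"},
    ]
    kept, discarded = split_duplicates(rows)
    print("keep:", kept)
    print("discard:", discarded)
```

Returning the discarded rows instead of silently dropping them leaves room to review them before anything is deleted from the real database.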
Read the release notes for every version of Postgres, because there can be important stuff in there, and it will also give you a good idea of how important the release really is. If it's, "Well, we fixed a bug in this authentication method you never use," OK, whatever. But if you read it and the message is that if you don't install this release right away you're running on borrowed time, then you definitely want to get on the new version. Minor version upgrades are very fast; fortunately you just have to replace the binaries and bring Postgres back up again. I'm not aware of any time that you couldn't go backwards on a minor version, though you may not want to, like we did here going from 9.3.1 to 9.3.0. So definitely, if somebody's going to be upset at you personally if Postgres goes down, it's worth following -announce. Anything else?
As far as I know it's unanimous: there is very little performance impact. I'm sure you can find a case, and people do some remarkable things, but I would just assume that it will be fine to enable. I feel very comfortable; no reason not to.
It can be a little bit tricky to verify that. You can do smoke tests; I will sometimes literally pull the cord out. You can see if other people have had problems with the same hardware. And if it's an expensive enough system, you can sometimes get around the people who are selling it to you to get actual technical information. One of the questions to ask is: when I issue an fsync against the filesystem, what really happens? Does it really harden everything straight onto disk? Some of these things don't; what they do is just write to the battery-backed cache and call it good. Usually at some point someone will have found out how they work, or they have documented this fact.
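One cheap smoke test for the "what really happens on fsync" question is to time it. The sketch below is an illustration, not a tool from the talk, and the function name `fsyncs_per_second` is hypothetical. It writes a small block and calls `os.fsync` repeatedly, then reports the rate. As a rule of thumb, a single 7,200 rpm disk with no write cache cannot complete much more than about 120 flushed writes per second to the same file (roughly one per rotation), so a far higher rate suggests a cache is acknowledging the writes. That is expected on SSDs and battery-backed controllers, and the number says nothing about whether the cache survives power loss, so this complements a pull-the-plug test rather than replacing it.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, iterations=200, block=b"x" * 8192):
    """Append a block and fsync it `iterations` times; return the observed rate."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to the storage device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Point this at a directory on the volume you want to examine.
    rate = fsyncs_per_second(tempfile.gettempdir())
    print(f"{rate:.0f} fsyncs per second")
```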
But it can be tricky. Obviously this is not something they advertise: "Oh well, it's really fast, but your data can be corrupted" is usually not a bullet point they put in front of you.
Well, battery backups can fail, or you can be down one of the batteries at that moment. It's all probability, of course, if you're willing to accept that probability. The thing about SANs is also that you have to have a strategy for what happens if the SAN fails. This is something that a lot of companies miss: they treat SANs as if they're perfect. Set aside a meteorite coming through the roof and crushing it: I have had a firmware flash take out the whole SAN. Fortunately they were OK, because I had insisted up front that we do a complete backup before flashing the new version.
If you'd asked me two years ago I would have had a slightly different opinion, because these were not quite as fully baked as you'd want them to be. But now, you just make sure they are enterprise-grade, you know, the expensive ones, so that flush actually works, TRIM works, and things like that. But yeah, they're sure fast. One thing to consider, if this works for you: it's often not a bad idea to put the WAL somewhere other than the very fast SSDs, because it's written over and over and over again, and write wear can actually be a problem over time for the WAL. Also, the WAL doesn't benefit as much from the SSD's random-access characteristics, because it's an append-only operation. So especially if you have a hybrid system that has SSDs and fast disks, if you need to put something on a spinning disk, put the WAL there. And there's a question behind you.
Yeah, yeah. In some use cases, write wear, right.
Other people have gotten kind of relaxed, really, about write wear on SSDs, but that comes from how they're using them. If you're running them in a system that is taking writes all the time, write wear can still be an issue. Is that OK? Thank you so much for coming.
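Whether write wear matters for a given system is mostly arithmetic. The sketch below estimates how long a drive's rated endurance would last under a steady write load. Every input is an assumption you would replace with your own numbers: the endurance rating ("terabytes written", as listed on many SSD data sheets), the measured daily write volume, and a write amplification factor. The function name `years_until_worn` and the figures in the example are hypothetical, not from the talk.

```python
def years_until_worn(rated_tbw: float, written_gb_per_day: float,
                     write_amplification: float = 1.0) -> float:
    """Years until the rated endurance (in terabytes written) is used up."""
    if written_gb_per_day <= 0:
        raise ValueError("written_gb_per_day must be positive")
    # Host writes are multiplied by the amplification factor inside the drive.
    tb_per_day = written_gb_per_day * write_amplification / 1000.0
    return rated_tbw / tb_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical drive rated for 150 TB written, taking 200 GB of writes a day.
    print(f"{years_until_worn(150, 200):.1f} years")
```

With those made-up figures the rating is used up in about two years, which is the kind of result that makes a lightly used laptop and a database taking writes all the time look very different.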



