
Formal Metadata

Title
RTFM
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Reading the manual before starting new work is always good practice. However, some situations, like pressure to deliver or lack of attention, may lead to wrong assumptions that cause unpredictable results or even disasters. The talk, in a semi-serious way, will walk the audience through some of the corner cases caused by neglecting the good practice of RTFM.
Transcript: English (auto-generated)
Let me introduce Federico here, our last speaker for the room after a long weekend in which you have been getting all this knowledge from open source and free software projects, so you're probably a little bit tired now. I was lucky to also be on the programme committee this year, and we selected Federico specifically to be here because we know that he has good energy and a very interesting topic to close out our room.
So Federico is probably going to deliver our very last talk in a very good way. He used to live in the UK, but before Brexit he had some enlightening moments and said, I will come back to my home country. So, directly from Italy, please welcome Federico Campoli.
Thank you very much. So, yes, my name is Federico, I live in Italy, and today this is the last talk of the day. We will have some stories - rather technical stuff - so it will keep you awake,
because it has been a very tiring weekend, as FOSDEM normally is. So, a few words about myself. This is my lovely cat, Hodzi, and this is me; it's better that the cat is in front because it's cuter, so it's much better this way. I was born in 1972. I look younger, but as my girlfriend says, I am a Time Lord,
I'm a huge fan of Doctor Who, so that's the reason why I don't look old, but I'm really, really old in my opinion. I started with IT in 1982 with the Tron movie, an amazing movie, and I joined the Oracle DBA secret society in 2004.
This is the way I like to describe that period of my life; it was like being part of a secret society, for sure. In 2006 I discovered this crazy database - at that time it was version 7.4 - and it was a massive bet for me,
and it's a bet I believe I won because now Postgres is gaining momentum, it's becoming something amazing, it's still amazing. And like Devrim, I have a Postgres tattoo on my left shoulder, so I'm really committed to Postgres.
I work as a freelance DevOps and data engineer when I'm not at conferences. Those are my contacts: my technical blog, where I write about Postgres, and my Twitter, so if you want to follow me, I'm very happy about that.
And this is my GitHub page, where I have a couple of interesting projects. One of those I talked about yesterday in the Python devroom: a compactor for Postgres in the cloud without the use of extensions. It's an interesting project. Another one is a replication tool from MySQL to Postgres,
so it can help you escape MySQL if you are using it. And the last one is my LinkedIn. So today we are talking about RTFM. Everybody knows what RTFM means, right? Read the F manual. What does the F stand for?
Fantastic, fabulous, funny, fancy? Well, not exactly. I'm not saying that word now, also because this is live streamed. I don't like to swear on a live stream, but that's it.
So what is this talk about? Telling stories of people who didn't do the RTFM and caused issues - issues and disasters. So, like any other disaster, I decided to rate them with DEFCON levels, like in WarGames. We have the DEFCON 5 disaster: just a startling noise, the DBA is vaguely impressed.
Well, something happened. The second level, DEFCON 4: tripping over your feet, the DBA is alarmed - oh, something wrong is happening. An earthquake for DEFCON 3: jumping on your seat. And believe me, having an earthquake under your seat is something very, very startling.
I live in a quite seismic area of Italy, so sometimes we get this sort of emotion. DEFCON 2: an asteroid dropping from the sky, the DBA freaking out.
And then DEFCON 1: Daleks invading the Earth, the DBA going berserk. It happened to me, going berserk - I was swearing all around the office. Then we have the dramatis personae, still from WarGames. This is Professor Falken: myself, old, ugly, quite cynical, the perfect incarnation of the DBA, right?
And the other one, David Lightman: young, reckless, very, very smart, with ideas that can cause the third world war if he's not careful. So, table of contents.
We have three stories and a wrap-up to discuss what we've seen: the bloat did it, emergency shutdown, and cast a spell. Three stories of RTFM that never happened, what that caused, and how it was fixed - by myself.
So let's start with "the bloat did it". Let's see the context. That was back in 2013, Postgres 9.1 - still the old pg_xlog, not too much monitoring, no fancy things.
And we had this system with very expensive Fusion-io cards. At that time they were pioneering disks on PCI Express, and they were massively expensive. This system worked quite nicely, but there was this strange table we used as a working queue.
All the records were picked up in some way and used for processing some stuff in other places. This table had just 100 million rows - not so big in my opinion, quite big for that time; now it would count as medium-sized compared to the sizes we deal with today.
And this table had two timestamp fields, which were updated one after the other: one for the start and one for the end of the processing, for each row. And each row was 160 bytes long on average.
Now, does anybody know how MVCC works? In Postgres there is no in-place update, so this thing was effectively inserting each row twice every time the row was processed. And this table had indices, so the rows changed pages and the indices started bloating.
And the primary key was on two integer fields. So the problem wasn't exactly the bloat - the table coped quite well performance-wise - the problem was the SSD. On an SSD we have a limited number of writes.
And writing a lot of stuff on this SSD, even with the round robin over the blocks, caused a massive exhaustion of the writes. That was the I/O just on pg_xlog - thank goodness it wasn't a rotating disk, otherwise it would have been much, much worse. And the data files were hit even harder.
So the table was rewritten every day. Every day this table got new blocks, the dead rows started accumulating inside it, and autovacuum, starting every six hours, flushed more blocks to disk, consuming even more of the writes on this Fusion-io.
And in just eight months, the available writes dropped from 80% to 44%. Just eight months left before doomsday. And doomsday, in the SSD world, means a disk in read-only mode.
At some point it would simply have stopped accepting writes - Daleks coming down to invade the Earth. So how was this fixed? Well, I slightly redesigned things around the primary key. I didn't touch the existing table, because it was so deeply coupled with the rest of the system
that I could not change it. But I created a new table aggregating the fields of the primary key: the first field holds the common value, and the second field, which I used as a grouping key, is stored inside an integer array. So by accessing the first field and then the single elements of the array - the positions inside the array -
I could reach exactly the right point inside the other table, without rewriting the rows of the other table. And when the fix came in, the WAL generation rate dropped from 1.5 gigabytes per second to 40 megabytes per second.
And this is the Munin graph of the available writes. This is the mad operation, and then, when the fix came in, everything went flat. No more risk. I estimated that the remaining PB of writes would last for at least 10 years at that writing rate. That was amazing.
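To make the idea concrete, here is a minimal sketch of the side-table approach described above, written with current syntax and made-up names (big_queue, queue_tracker); the real schema was different, but the principle is the same: the big queue table is never updated, only the tiny tracker row is.

    -- hypothetical reconstruction of the technique, not the original schema
    CREATE TABLE big_queue (
        group_id integer NOT NULL,
        item_id  integer NOT NULL,
        payload  text,
        PRIMARY KEY (group_id, item_id)
    );

    -- one row per group: the item ids packed into a positional integer array,
    -- plus a pointer to the last processed position
    CREATE TABLE queue_tracker (
        group_id  integer PRIMARY KEY,
        item_ids  integer[] NOT NULL,
        last_done integer NOT NULL DEFAULT 0
    );

    -- periodic re-aggregation (e.g. daily) of the queue into the tracker
    -- (ON CONFLICT needs Postgres 9.5 or later; the original system was on 9.1)
    INSERT INTO queue_tracker (group_id, item_ids)
    SELECT group_id, array_agg(item_id ORDER BY item_id)
    FROM big_queue
    GROUP BY group_id
    ON CONFLICT (group_id) DO UPDATE SET item_ids = EXCLUDED.item_ids;

    -- claim the next item of a group: only this small row is rewritten,
    -- the 160-byte rows in big_queue are never touched
    UPDATE queue_tracker
    SET last_done = last_done + 1
    WHERE group_id = 42
    RETURNING item_ids[last_done] AS next_item_id;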
So how do you avoid this sort of situation? By doing RTFM. And for RTFM, I can recommend these two links: the MVCC slides and video from Bruce Momjian. I love Bruce's presentations - go to his website, momjian.us.
Watch it, you will learn a lot of stuff. Every time I come to a conference and there's a talk from Bruce, it's something like, bam, my brain explodes every time. So watch this video. It will explain very, very clearly how MVCC works
and how you can avoid the risk of exhausting your writes, but also how to reduce the I/O. Reducing the I/O means having more efficient caching, a more efficient database, and a more responsive system for you.
That was the first story, so let's move to the next one. That was me: I caused the DEFCON 1 situation, the emergency shutdown. That was an amazing discovery. By the way, tomorrow the blog post about this story, with all the explanations, will appear on my blog.
So if you look at the blog tomorrow, you will find an even better description of what I found out from this experience. So, the context: a virtual machine.
That was in 2016, 2017 probably. Postgres 9.6, a virtual machine for business intelligence, a fairly big database, 1.4 terabytes, with a real-time replica from MySQL to Postgres using my tool, pg_chameleon. This tool first reads the rows from MySQL, stores them into Postgres,
and then uses this information to generate the DDL and replay it against Postgres. This is done through a PL/pgSQL function, which replays the data and also manages all the errors that can appear during the replay.
So you can exclude tables, you can do some interesting stuff, and everything worked very, very nicely. Monitoring was not yet implemented - my bad, I was too busy working on the rest. And, yeah, everything went fine, people and analysts were very, very happy.
They gave me this T-shirt as a gift for bringing in Postgres in place of MySQL. And the day started normally. And then I heard screams: people could not run any more queries, they were getting horrible errors. But the database was up as usual.
But the nightly maintenance failed, and I got this message in the log: database is not accepting commands to avoid wraparound data loss in database analytics. This was the only emergency shutdown mode I have ever had in my career on Postgres, starting from 2006.
And I was the cause of it, so I'm very proud of that. I don't know if you have ever read this part of the manual on the postgresql.org website, but it's quite scary. What happens when these messages appear?
It means you have at least one tuple just less than one million transactions away from disappearing from the database. Inside the database architecture, the rows are compared against transaction numbers, and these numbers are just 32-bit integers.
So they wrap around every four billion transactions: they start from one and restart the sequence. Everything bigger is in the future, everything smaller is in the past. So as soon as the counter wraps around, everything suddenly appears to be in the future.
Postgres has a very clever way to avoid this scenario, but it halves the available number space: two billion in the future, two billion in the past of whatever the current number is. And that is the reason why you have to run vacuum, or let autovacuum run, to freeze the rows before this problem happens.
So what causes that? Insufficient autovacuum workers. Databases should have autovacuum on, because autovacuum does a lot of stuff and, in particular, it can save you from having this sort of issue.
But tuning autovacuum is very important, because too few autovacuum workers on databases with a lot of updates may not pick the correct table when it's needed, or the autovacuum to prevent XID wraparound can start too late to save your database.
And the other reason was myself. There is an apparently undocumented behaviour of PL/pgSQL functions - I haven't found any reference about it.
You know, functions in Postgres are a single transaction: when you run a function and it finishes and commits, you get all the changes, otherwise everything rolls back. So they consume just one XID, right? Well, not really. Because if you have an exception block inside your function's body, and inside this exception block there is a DML - insert, delete, update -
each time this DML is executed it consumes an XID. So my function for replaying stuff in pg_chameleon was consuming 100,000 XIDs every time it completed a batch to replay,
because it replays exactly that: a DML in a for loop inside an exception block, each one consuming an XID. So I will show you this in slightly more detail. We create a table ttest with an id integer column - no foreign keys, no primary key, just a simple table, just the data.
Then we have this function, fn_loop_no_exception: we declare a counter, loop over it from 1 to 1000, and do an insert on each iteration. Then we have the same function, but inside the loop we have a begin ... exception when others then null block.
So if there's any error, the function just skips the iteration and continues. It makes little sense in this context, but it can be useful to have the exception.
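The slides showed the actual code; this is a rough reconstruction from the description above (the function names are mine):

    CREATE TABLE ttest (id integer);

    -- 1000 inserts in a plain loop: the whole function is one transaction
    CREATE OR REPLACE FUNCTION fn_loop_no_exception()
    RETURNS void
    LANGUAGE plpgsql
    AS $$
    DECLARE
        v_i integer;
    BEGIN
        FOR v_i IN 1..1000 LOOP
            INSERT INTO ttest (id) VALUES (v_i);
        END LOOP;
    END;
    $$;

    -- same loop, but every insert sits inside its own exception block
    CREATE OR REPLACE FUNCTION fn_loop_exception()
    RETURNS void
    LANGUAGE plpgsql
    AS $$
    DECLARE
        v_i integer;
    BEGIN
        FOR v_i IN 1..1000 LOOP
            BEGIN
                INSERT INTO ttest (id) VALUES (v_i);
            EXCEPTION
                WHEN OTHERS THEN NULL;  -- swallow any error and carry on
            END;
        END LOOP;
    END;
    $$;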
So let's check with this database. My database starts with an age of 3, because I ran three queries before doing that. Select datname, age(datfrozenxid) from pg_database where datname equals 'test': it returns the name and the age of the oldest transaction ID inside the database that is not frozen.
Then I execute fn_loop_no_exception, and after that the age increases by just 1, from 3 to 4, as expected. Now let's do the same with the other function. The age is 5 after running the check query again. Then I run the function and the age jumps to 1006.
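In terms of commands, the check looks roughly like this (the database name 'test' is the one from the example above):

    SELECT datname, age(datfrozenxid)
    FROM pg_database
    WHERE datname = 'test';

    SELECT fn_loop_no_exception();
    -- age(datfrozenxid) grows by 1: one XID for the whole function

    SELECT fn_loop_exception();
    -- age(datfrozenxid) grows by about 1001: one XID per DML in the loop,
    -- plus the XID of the function's own transaction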
So, 1,000 XIDs, one per DML inside the for loop, plus the XID for the function itself. So how was this fixed? This is emergency handling in the database universe.
You forget everything, you silence everything. First thing, silence Slack, because people were screaming in private messages and I was getting a lot of noise from Slack. So: silence Slack, move into an empty meeting room, put a message - "I know" - on the screen and on the door.
And then start the cluster in single-user mode. It's an amazing experience I don't want to repeat ever again. If you have ever worked with psql - psql is fantastic: line completion, backslash commands. Single-user mode? No. It is something horrible.
You get a lot of stuff, a lot of noise, and nothing to help you. You have to type every single letter of your query. Maybe copy and paste works, but there's no way to load files from outside. So I think this interface should be improved.
Or maybe it should be kept that way, so people are so scared that they will try to avoid it at any cost. So: start the cluster in single-user mode, get the ageing tables, vacuum the ageing tables, and do a post-mortem analysis.
Analyse what went wrong and who was the cause of it - myself. And this is a very useful query for getting the ageing tables. I'm not sure if I derived it from the PostgreSQL wiki,
which is an amazing source of information, or from check_postgres, the Bucardo Perl script for Nagios. But it is very useful because it takes into account the real frozen XID not only for the table but also for the associated TOAST table.
Because you can get a table that looks fine, but maybe the associated TOAST table is ageing. So this query gets the oldest XID for any relation - any table or any materialised view. Do not forget about the materialised views: materialised views are basically tables and follow the same rules.
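A sketch along those lines (not the exact query from the slide) could look like this, taking the greater of a relation's age and its TOAST table's age:

    SELECT c.oid::regclass AS relation,
           greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS xid_age
    FROM pg_class c
    LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relkind IN ('r', 'm')   -- ordinary tables and materialised views
    ORDER BY xid_age DESC
    LIMIT 20;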
And when you get this, maybe something fancier can be done: something like using format() to build the VACUUM statements straight from that query and dumping them to a file. This is a very Oracle-style way of working, but I'm still using that approach.
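For example, a small sketch of that "SQL generating SQL" trick - the age threshold here is arbitrary:

    SELECT format('VACUUM FREEZE %s;', c.oid::regclass)
    FROM pg_class c
    WHERE c.relkind IN ('r', 'm')
      AND age(c.relfrozenxid) > 200000000;  -- pick a threshold that suits you
    -- in psql, "\g aging_tables.sql" sends the generated statements to a file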
So, RTFM: what did I miss in this case? Well, I wasn't able to find this behaviour in the documentation. If you look at the part about trapping exceptions in the Postgres online manual,
it says, yes, use exception blocks with caution, because they are more expensive compared to functions with no exception blocks. But it says nothing about the exhaustion of transaction XIDs. So it may be useful to add a warning.
I will add it myself - I will submit a patch for the Postgres manual. First of all, I need help building the whole thing: how do you build the documentation? I tried and I failed, so I need to learn. And that's the amazing part of this job.
You learn constantly. So, cast a spell. That is another amazing story. Let me set the scene: 2012, I started my second job in the UK, on PostgreSQL.
A very large database. At that time there was a mix of environments between Postgres 9.0 and Postgres 9.1; then I upgraded everything to Postgres 9.2. It was an interesting experience. So, we had this large database.
At that time it was large: two terabytes, still on Fusion-io. On that database, I remember I joked about playing Tetris with the tables and tablespaces, because periodically the active partitions were moved onto the Fusion-io, but there was not enough space to keep everything on it.
So the expiring partitions were put on rotating disks, and every month there was this Tetris movement, playing a lot with the space. And they had a horrible design. By horrible, I mean tables with a few fields,
one hstore field, and the Java side pushed its structure straight into that hstore field. That caused a lot of problems until the day I removed it, and we ended up with a 91-field table.
Quite big. And one of my first assignments was doing performance tuning on a query that retrieved just 150 rows in six minutes. OK, the rows were big, but that was absolutely insane. And the storage was SSD.
It was lightning fast, so there had to be something else. We had this super-expensive Fusion-io storage and super-expensive CPU and memory, state of the art in 2013 - we are talking about a 35,000-pound server, bare metal, no virtualisation, everything installed on that machine.
And it performed horribly. So, check the execution plan. Well, first check that the statistics are okay, then check the execution plan. The execution plan was absolutely fine, no problem at all.
There were sub-queries with wrong join criteria. I tried to fix them, but with no success in improving the speed of this query. But at some point I did a select star from the same query, and the query completed in seconds, not minutes.
Oh, some bells started ringing in my head. So, what went wrong here? You know, hstore is not so complex; it's a sort of grandfather of JSON. Whatever you store inside a key,
you get back the text data type. So, if you have a strongly typed language on the other side, you have to cast this data type. And the developers, instead of doing the classic cast - colon, colon, data type -
decided to write a PL/pgSQL function for each type they wanted to cast to. You see where this is going, you are seeing what the problem is. So, we had this function. This is a pseudo-function, it's not exactly the same;
I don't remember how it was written, so I rewrote it trying to follow the same idea. They had written cast_to_integer, cast_to_text - why? - cast_to_float, cast_to_timestamp. They passed the key name and the metadata hstore
as parameters, used them to retrieve the value, and did the cast straight in the returned value. It was an amazing idea, right? Well, and we had this sort of select, with roughly another 80 metadata columns written in this way.
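As a rough, hypothetical reconstruction (the hstore extension must be installed; the table and column names here are made up):

    CREATE EXTENSION IF NOT EXISTS hstore;
    CREATE TABLE big_table (id integer, metadata hstore);

    -- one of these was written for every target type
    CREATE OR REPLACE FUNCTION cast_to_integer(p_data hstore, p_key text)
    RETURNS integer
    LANGUAGE plpgsql
    AS $$
    BEGIN
        RETURN (p_data -> p_key)::integer;
    END;
    $$;

    -- the select list then looked more or less like this, times ~80 columns
    SELECT cast_to_integer(metadata, 'customer_id') AS customer_id
    FROM big_table;

    -- a plain cast does the same job without a PL/pgSQL call per row and column
    SELECT (metadata -> 'customer_id')::integer AS customer_id
    FROM big_table;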
And it was even worse - this was the optimised version. They did this at the lower level of the sub-select, where the sub-select processed hundreds and hundreds of rows
before doing the final filter down to the 160 rows. So, every time you have something like that inside your select list, hstore.so, the shared object, gets accessed, and the hstore stored on disk,
which is basically text with separators, gets transformed in memory into a datum - the library does the magic. There are a lot of context switches between the main memory and the memory used by the shared object.
Then this gets passed back to Postgres, which does the final cast, because it has to cast into the primitive type, and finally the shared object gets discarded. And that is just for one column. Then it starts again, and again, and again, for every single row. Honestly, given all that, six minutes was even fast - incredibly fast, in my opinion.
So, how was this fixed? Well: oh no, our method is OK, you are wrong, you don't understand all the implications, we are using all the structures,
we are using... Man, I just wanted to do a simple cast. Well, there was another function I haven't mentioned yet, which did a slightly different thing: it converted to text and then performed an internal aggregation, in a different way from how array aggregation normally works - not skipping the nulls.
They kept the nulls inside the aggregated array. They had written some sort of different aggregation function just to keep the nulls inside the array. I don't know why - why would people want to keep nulls, which are basically nothing? But they made it that way, and that was the only function left in place,
because there was no other way to change it. Well, at that time I didn't know much about custom aggregates. Custom aggregates are something amazing in Postgres - have a look, there's a blog post about them from Josh Berkus, and one on my blog
on how to build a custom aggregate. You can do fantastic things with them. So, it started with an argument, and when the argument finished, after probably a couple of weeks, I was finally authorised to rewrite the join and get rid of all of that in the select list.
And the query dropped to 10 seconds. It would probably be faster still if I were able to remove the final piece of code as well. So, live long and prosper. And what to do for RTFM?
This is a complex answer, because working with Postgres is not simple. It seems simple, but Postgres is an incredibly complex environment and an incredibly complex product. The first thing to do is read the docs; get into the concept of what a PL/pgSQL function is.
Maybe if they had used SQL functions the damage wouldn't have been so big, because SQL functions are built into Postgres and they get inlined. So it would have been much, much better. Use a procedural language only if you actually need a procedural language.
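For example, roughly the same cast written as a plain SQL function; a simple scalar SQL function like this can be inlined into the calling query, so the per-row call overhead largely disappears (again, the names are illustrative and hstore is assumed to be installed):

    CREATE OR REPLACE FUNCTION cast_to_integer_sql(p_data hstore, p_key text)
    RETURNS integer
    LANGUAGE sql
    IMMUTABLE
    AS $$
        SELECT (p_data -> p_key)::integer;
    $$;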
I've also seen procedural languages used for transforming data in ETL procedures. This is the wrong way to do it, because the performance is completely screwed. The best way to do this sort of stuff is an insert ... select.
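Something along these lines, as a sketch with hypothetical staging and target tables:

    CREATE TABLE staging_table (metadata hstore);
    CREATE TABLE target_table (customer_id integer, customer_name text, loaded_at timestamptz);

    -- set-based ETL instead of a row-by-row PL/pgSQL loop
    INSERT INTO target_table (customer_id, customer_name, loaded_at)
    SELECT (metadata -> 'customer_id')::integer,
           metadata -> 'customer_name',
           now()
    FROM staging_table
    WHERE metadata ? 'customer_id';   -- hstore "key exists" operator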
So, if you can stick to pure SQL as much as possible, you will get the best performance and the least churn in memory caused by context switches. If you have an idea, before starting to implement it
and maybe wasting your time, ask the community. We have a lot of channels for doing that: of course the mailing lists, which are very, very nice, IRC on the Postgres channel on Freenode, there is a Slack, and there is the Telegram channel if you use Telegram.
And in this room there are two admins, me and Janine at the back - we are the admins of the Telegram chat. And the last thing, if you can get one, because it's still quite difficult to find a pure DBA or a Postgres expert at DBA level: hire a DBA.
And, more importantly, listen to the DBA. I remember there was a t-shirt that said: I am a DBA, can we start by assuming that I am always right and then continue the discussion?
Well, it's not really true, because a DBA makes a lot of errors and a lot of mistakes - I'm here as a testimony to that. But it's important: if a DBA comes in and asks for some optimisation, it's not because he is arguing about something abstract,
it's because there is a real problem, even if the problem is not there yet. A DBA usually looks ahead for problems. So, I've been incredibly fast, so let's go for the wrap-up.
I can explain how to cook a carbonara in the extra time if you want. Seriously, do you remember, I presented... How much cream? No cream, no cream. So, the wrap-up.
In the movie, the best - the only - way to win is not to play. Not really true here. RTFM: reading the manual is important, but it's also very important to read between the lines of the manual, and in Postgres in particular we have an amazing opportunity.
Other databases - closed-source databases, or dead databases starting with O - do not allow the users and the DBA to see what's happening, to know exactly what's happening inside the database. They even wrap and obfuscate the procedures. In Postgres,
we have access to the source code. I like to think of the Postgres source code as poetry written in C - this is what I've written in my book. Is it poetry?
Well, you can read it; you don't have to be a C developer to understand what's going on in Postgres. First, because the READMEs are amazing: in every single section, the README will explain a lot of stuff. So if you are in doubt, check inside the source code. And if you are a C developer, you will have a massive advantage
in understanding what's going on inside your database, and maybe you will build something more and more efficient. So, basically, doing RTFM, but on steroids maybe. I think we are finished.
And this is the license. And that's all, folks. I finished 16 minutes early - sorry, I've been very, very fast. Are there any questions? Who has questions?
If I understood right, when you wrapped your query in a subquery it became faster - why? No, it became... sorry, I didn't explain that clearly. Basically, the cast functions, the PL/pgSQL functions, were inside the subquery,
so the query was even slower, because the subquery processes more rows than the upper query. If inside your subquery you have a filter, the subquery first filters the rows and then returns them; but while those rows are being processed by the where condition,
they also get passed through the select list and then discarded as they are filtered out. So moving all the expensive computation outside, to the topmost level - compatible with whatever aggregations you have - can improve your query massively.
I remember there was a query with a format function, a to_char, at the lower level. Just moving this to_char to the upper level changed the query completely: it dropped from seconds to milliseconds.
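As a sketch of that pattern, with made-up names; whether the planner can avoid the extra work depends on the query, but in that case it could not:

    CREATE TABLE events (id integer, created_at timestamptz);

    -- to_char() runs for every row the subquery produces,
    -- even the ones the outer filter then throws away
    SELECT *
    FROM (SELECT id, to_char(created_at, 'YYYY-MM-DD HH24:MI:SS') AS created_txt
          FROM events) AS sub
    WHERE sub.id = 42;

    -- formatting at the top level only touches the surviving rows
    SELECT id, to_char(created_at, 'YYYY-MM-DD HH24:MI:SS') AS created_txt
    FROM events
    WHERE id = 42;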
The first example... sorry? Okay, can you repeat the question? What was the fix applied in the first example? Oh, okay, the first example,
the bloat, the exhaustion of the writes, right, okay. So what I did: I changed the logic around this table. The original table had two timestamp fields, and the processor updated the first timestamp when it started the processing,
and when it finished, it updated the second timestamp to say, okay, I'm finished. Then this was used in a select with an order by on the timestamp to get the rows not yet processed. So this was causing a lot of writes. What I did: I used the primary key, which was a composite key of two values,
to aggregate this information into one common value and an integer array. Are you familiar with arrays in Postgres? The integer array is positional, so by pointing at a single element I could access the exact value of the primary key.
And using this primary key, without updating anything in the big table, I would just look at the row and say, okay, this is the row I want to process. Then, in the other table, I changed the counter, moved my pointer inside the array, without touching anything else. And then I added a periodic re-aggregator, which runs every day,
to collect new rows into this big queue. So, one of the ways to improve performance in Postgres when you deal with queues: keep all the frequently updated fields in one separate table.
Because in Postgres you generate a new row every time you perform an update. If you have a row of 160 bytes and you update a boolean, you generate another 160 bytes, not just the boolean. This is very important to understand, to avoid this sort of thing, but more importantly to avoid bloat on tables and bloat on indices.
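If you want to watch this effect on your own tables, the statistics views show it; a small sketch, using the hypothetical big_queue table from earlier:

    -- every UPDATE leaves a dead row version behind until vacuum cleans it up
    SELECT relname, n_tup_upd, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'big_queue';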
Any other questions? Why didn't autovacuum work in your second example? Well, that was interesting because... can you repeat? Oh, sorry, I forgot to repeat the question. Okay, so, why didn't autovacuum work in my second example?
That was a problem caused by the number of autovacuum workers. There were too many tables that needed to be vacuumed and too few autovacuum workers,
so autovacuum spent a lot of time on the same tables without picking the tables that were really ageing. Okay, why does the exception add XIDs to the count of the function?
Well, I can only suppose. I spoke with a friend of mine who is a SQL developer - I think he has also contributed to some parts of Postgres -
and we came to this conclusion: because the DML inside the exception block may need to be rolled back within the transaction, it needs an extra transaction ID to manage that rollback at the data-area level, inside pg_xact where the commit status is kept.
So you basically get one XID, and then all the XIDs that get committed in a virtual way while the function progresses; when the function commits, everything gets committed and everything becomes visible. So this can be a structural reason:
it's a choice made to make the rollback possible, so that the exception can cancel and roll back the pieces already executed.
Okay, the question is about newer versions of Postgres. The stories were all related to Postgres 9, so it's a flash from the past. In Postgres 12 you would probably get the same results, because not too much has changed:
we still have MVCC, we still have autovacuum, which works the same way as in version 9, and PL/pgSQL functions also work the same way as in 9. I think starting in 9.6, when autovacuum runs
because of the XID wraparound... oh, so the anti-wraparound autovacuum no longer needs to scan the whole table, just the rows that need freezing, so it will be much more effective. So probably the second story wouldn't happen with a newer version.
Okay, cool. Thanks for the information. Does it consume an XID? I think so. Oh yes, the question is whether a savepoint inside a transaction will consume an XID.
I would have to try, but I think it may, because in any case it is managed at the XID level. But I will try - I will try, thank you. I have a question. Okay.
You mentioned ways of getting help from the community, like the Telegram chat and everything; but when you figure out that something is missing from the documentation, how do you contribute to make the documentation better? Well, this is a very good question. So, if you find that there is a missing part in the documentation, how do you contribute?
Well, I would start by listening to the talk by Lætitia, Demystifying Contributing to PostgreSQL, which is a very interesting talk - it should still be online. It's an amazing talk that can drive you to contribute to the Postgres community and shows how to write the documentation.
So I will just - and this is probably what I will do - submit a patch adding this information to an alert box in the Postgres documentation, filling in the missing part.
I have a question there.
Okay. The question is whether using arrays for handling the queues is the best way to manage queues in Postgres. And the answer is yes, definitely. Because with the arrays, in particular if you do not update the array ordering,
you will have a completely static table and you can just use a pointer, an integer, to point at the element inside the array. So you limit the updates to just that tiny little integer. So, yes.
Can I do product placement?
Well, there are a lot. Oh, sorry - okay, the question is where to find out how Postgres works internally. I would start from the manual and from the source code;
that is where I learned how Postgres works internally. I have written a book about this; it will come out next week. But there is a lot of information about it. So I would start from the README inside - I don't remember, the storage or heap tuple part, or the page header, something like that -
that's the README with all the discussion of how the entire thing works. I would start from that. Read the source and you will never go wrong.
When you run EXPLAIN ANALYZE, your query is going to tell you whether it's using... oh, you're mentioning, you asked... and then you will see, okay, it's using this index or it's doing a sequential scan... oh, you mean... sorry, I misunderstood your question. No, you had the right...
Ah, you want to know the internals - yeah, okay: source code, source code, you can't go wrong. But if you want to know what your query is doing, use EXPLAIN or EXPLAIN ANALYZE to display the execution plan, and you will see what the query is doing.
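For example, a minimal sketch, here against the hypothetical big_queue table from earlier (any query works):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT item_id, payload
    FROM big_queue
    WHERE group_id = 42;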
Thanks. Okay, so thank you very much. Thank you very much.