repcloud
Formal Metadata
Number of Parts: 490
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/47506 (DOI)
FOSDEM 2020, talk 297 of 490
Transcript: English (auto-generated)
00:13
Okay, for the next talk, please welcome Federico, who's going to tell us about — which one is this? Repcloud. It's a repacker for Postgres.
00:21
Thank you very much. Hello, everybody. Can you hear me? All nice? Okay. Thank you. My name is Federico. I'm Italian. I lived in the UK for a while, and now back to Italy, so very happy to be here.
00:41
I will present this tool I've written over probably six weeks, but first, let's start with me. This is my cat. He's much cuter. I'm ugly, so it's much better to show him. My Twitter is 4thdoctor_scarf.
01:01
I'm a huge fan of Doctor Who, and I was born in 1972. I look younger, but I'm a Time Lord, right? So, passionate about IT since 1982, joined the Oracle secret society in 2004, fell in love with Postgres in 2006, and I have a Postgres tattoo on my left shoulder, along
01:25
with this guy, who just joined recently, so I'm very committed to the project. I'm also involved in the Postgres community. Before starting, a little disclaimer: I'm not a developer, not really Pythonic.
01:41
I'm a DBA, so I'm hated by everybody, which is the role of the DBA, and I hate everybody. So, to put things in the right perspective: I use tabs, so please let the hate flow through you.
02:00
So let's start with the stuff, with the strong stuff. How many Postgres users are in the room? Whoa, I love you. I love you all, despite being a DBA. PostgreSQL is the most advanced open source database. It's a great product, gaining momentum. I'm very impressed by that, and I'm very happy
02:23
with that. It's available as a database service on cloud providers — RDS, Heroku, other stuff — which limit the actions you can do on PostgreSQL. And unfortunately it has its drawbacks too; it's not good in some areas.
02:44
It needs improvement, or it's just architectural stuff, so we need to live with that. The point I want to stress today is MVCC. How many of you know what MVCC is?
03:00
Thank you. So, multi-version concurrency control is the way Postgres makes things consistent. It can manage concurrent reads and writes with very, very minimal locking. The implementation is very efficient and very different from other databases. Postgres doesn't have a rollback segment, for example, and the snapshots are managed
03:25
on the data page. It's amazing. You get everything you need for reading consistently directly on the data page. The data page is the problem at the same time, because the transaction ID is stored inside the row in two system fields.
03:42
The transaction ID is just a four-byte integer. So you start from one, it reaches four billion, then it wraps again. There's a mechanism to avoid the wraparound failure; that's a topic for a different talk. But two system fields are used to track the visibility of the row, or tuple, as it is called
04:02
in Postgres. One field is xmin, also called the insert transaction ID. When the tuple is created, the transaction ID which created the tuple is stored in this field. And then there's another one, xmax, which is set when the tuple is deleted.
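The xmin/xmax mechanics just described can be sketched as a toy Python model. This is a deliberate simplification — real PostgreSQL visibility also checks commit status, in-progress transactions, and wraparound, none of which are modeled here:

```python
# Toy model of tuple visibility based on the two system fields:
# xmin (inserting transaction) and xmax (deleting transaction).
# Assumes all transactions commit; real Postgres is more involved.

from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    data: str
    xmin: int                    # transaction id that inserted this version
    xmax: Optional[int] = None   # transaction id that deleted it, if any

def visible(t: HeapTuple, snapshot_xid: int) -> bool:
    """A row version is visible to a snapshot if it was inserted by an
    earlier transaction and not yet deleted from that snapshot's view."""
    inserted_before = t.xmin <= snapshot_xid
    not_deleted = t.xmax is None or t.xmax > snapshot_xid
    return inserted_before and not_deleted

row = HeapTuple("hello", xmin=100)
print(visible(row, snapshot_xid=150))   # True: inserted before our snapshot
row.xmax = 120                          # deleted by transaction 120
print(visible(row, snapshot_xid=150))   # False: the delete is visible to us
print(visible(row, snapshot_xid=110))   # True: an older snapshot still sees it
```

The last call shows why dead versions cannot be removed immediately: an older snapshot may still need them.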
04:22
So Postgres doesn't immediately remove the rows from the physical layer; it leaves them in place. And this information is used for tracking row visibility. But there's no field for tracking updates. This is quite a mystery; nobody knows about this thing. But in Postgres, there's no such thing as an update.
04:43
An update is made using an insert and a delete inside the same transaction. The old row version is deleted, the new row version is inserted. So you get two rows; one becomes dead, and so becomes wasted space. But running transactions may still need to see these old rows.
05:05
These old row versions are removed by vacuum, when vacuum kicks in manually or automatically. And when rows are updated, they might move to a different data page. It's not automatic that the row stays inside the same data page.
05:22
The data page is the data block, the physical block. And when this happens, when the row changes the data page, the indices need to be updated as well. And this may result in bloat. The indices bloat massively just because the tables are updated too much.
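The update-as-insert-plus-delete behaviour and the cleanup by vacuum can be simulated in a few lines. The class below is purely illustrative — the names and structure are invented, not PostgreSQL internals:

```python
# Simulation of the behaviour described above: an UPDATE marks the old
# row version dead (sets xmax) and appends a new version; VACUUM later
# removes versions that no running transaction can still see.

class ToyTable:
    def __init__(self):
        self.versions = []   # every row version ever written, live or dead
        self.next_xid = 1

    def insert(self, data):
        self.versions.append({"data": data, "xmin": self.next_xid, "xmax": None})
        self.next_xid += 1

    def update(self, old_data, new_data):
        xid = self.next_xid
        self.next_xid += 1
        for v in self.versions:
            if v["data"] == old_data and v["xmax"] is None:
                v["xmax"] = xid                    # old version becomes dead space
        self.versions.append({"data": new_data, "xmin": xid, "xmax": None})

    def vacuum(self, oldest_running_xid):
        # keep only versions still visible to the oldest running transaction
        self.versions = [v for v in self.versions
                         if v["xmax"] is None or v["xmax"] >= oldest_running_xid]

t = ToyTable()
t.insert("flag=true")
t.update("flag=true", "flag=false")
print(len(t.versions))            # 2: one dead version, one live — bloat
t.vacuum(oldest_running_xid=10)
print(len(t.versions))            # 1: the dead version is reclaimed
```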
05:45
So, how to reduce the risk of bloat? Database design. The data model should avoid tables with large rows; those should be split. The third normal form is very helpful here. Grouping the most updated fields in separate tables is a massive strategy to
06:06
avoid bloat. Because if you have one row of 200 bytes and you update just a boolean flag, you generate another 200 bytes — whether you change the boolean or a different field. Those frequently updated fields should be
06:22
split out and used to look up the rest of the data. Remove the unused indices; there's a way to query Postgres to find that out. Or do routine maintenance: vacuum, which is very low impact but less effective on indices, or reindex.
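The 200-byte-row example above is easy to quantify: the garbage produced by updates scales with the full row width, not with the size of the changed field. A back-of-envelope estimate (the 16-byte narrow-row width is an illustrative assumption for a key plus a boolean):

```python
# Rough estimate of dead-tuple bytes produced before vacuum runs.
# Every update rewrites the whole row version, so garbage scales
# with row width, not with the size of the field you changed.

def dead_bytes(row_width, updates):
    """Bytes of dead row versions generated by `updates` updates."""
    return row_width * updates

wide = dead_bytes(row_width=200, updates=1_000_000)    # boolean lives in a wide row
narrow = dead_bytes(row_width=16, updates=1_000_000)   # boolean split into its own table
print(wide // narrow)   # prints 12: an order of magnitude less garbage
```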
06:40
Reindexing is high impact and highly effective, but until Postgres 11 it's a blocking procedure. It locks your table in read-only mode, and sometimes can prevent reads as well; it depends on how the index is used. As for dealing with existing bloat: I have found databases in very, very bad condition.
07:04
Indices of hundreds of gigabytes for storing just two, three gigabytes of data. So how do you deal with it? The first thing built into Postgres is vacuum full. But this is a massively blocking procedure: you get the table locked in exclusive mode until the vacuum is complete.
07:24
And on a cloud-hosted database with hundreds of gigabytes of tables, it can take days because the IOPS are limited. It's not really feasible. pg_repack is historically the way to repack tables without locking; it's a very good tool.
07:49
But the problem is you need to install the extension, and on a cloud database, you cannot do that. So in June, I had this customer who had exactly this problem.
08:03
A cloud database with the impossibility to install pg_repack. So I came up with this crazy idea — and believe me, it was incredibly crazy considering all I had to develop. And they said, oh, let's go for it.
08:21
And that is repcloud: repacking in cloud, which is the reason for the name. It uses a similar strategy to pg_repack. pg_repack creates a new table, regenerates the data, replays the changes against this table, and then swaps the data physically at system catalog level.
08:45
Repcloud doesn't use that catalog-level swap; it's a logical procedure. It requires minimal access on the database. And for example, on Heroku and RDS — I think RDS recently added pg_repack to the list of extensions.
09:01
So probably it's a better idea to use pg_repack if available. But on all the others that do not allow this sort of extension, repcloud can help reduce the bloat of the table. So, how this thing works: it creates a new table like the original one, empty; it creates a log
09:25
table with the same data type as the original table — Postgres has this fantastic thing called composite data types, which also describe tables — and then a trigger stores the changes happening on the original table into this log table.
09:43
Then the data is copied with just an INSERT ... SELECT into the new table, the logged changes are replayed against the new table, and then it attempts to swap the tables. It renames the old one, renames the new one to the old name, and live long and prosper.
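The copy-and-replay cycle just described can be sketched as a toy simulation. Plain dictionaries stand in for the tables and a list stands in for the trigger-populated log table; the real tool does all of this in SQL with triggers and composite types:

```python
# Simulation of the repack strategy: bulk-copy the original table,
# then replay the changes that the trigger logged while the copy ran,
# so the new table catches up before the swap.

def repack(original, change_log):
    """original: dict key -> row; change_log: list of (op, key, row)."""
    new_table = dict(original)            # bulk copy (the INSERT ... SELECT)
    for op, key, row in change_log:       # replay changes logged meanwhile
        if op in ("insert", "update"):
            new_table[key] = row
        elif op == "delete":
            new_table.pop(key, None)
    return new_table

orig = {1: "a", 2: "b"}
log = [("update", 1, "a2"), ("insert", 3, "c"), ("delete", 2, None)]
print(repack(orig, log))   # {1: 'a2', 3: 'c'} — consistent with the live table
```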
10:02
Not exactly, because the swap is absolutely not trivial. The Postgres dependency system is something you will see if you try to drop a table when there is a view attached to it: you get, oh, I cannot drop it. You have to use CASCADE.
10:21
Don't use CASCADE. It's very dangerous. So even when you rename your table, all the dependent objects follow that table. You have to rebuild exactly the same objects on the new table, and only then can you drop the old table safely. Otherwise, you will lose information.
10:44
So what has to be recreated? Sequences, of course — otherwise any insert with a default value using that sequence will fail — views, materialized views, and foreign keys. But the problem is that any other
11:04
object related to these objects needs to be rebuilt as well. So the entire chain of dependencies has to be walked backward, and then everything has to be dropped and recreated in the correct order; otherwise it will fail.
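The drop/recreate ordering problem is a topological sort: each object must be created after everything it depends on, and dropped in the reverse order. A minimal sketch over a hypothetical dependency graph (object names are invented for illustration):

```python
# Depth-first topological sort of a dependency graph: produces a
# creation order where every object comes after its dependencies;
# dropping happens in the reverse order (dependents first).

def creation_order(depends_on):
    """depends_on: object -> set of objects it depends on."""
    order, seen = [], set()

    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        for dep in depends_on.get(obj, ()):
            visit(dep)          # ensure dependencies are created first
        order.append(obj)

    for obj in depends_on:
        visit(obj)
    return order

deps = {
    "view_v2": {"view_v1"},       # view on a view
    "matview_m": {"view_v1"},     # materialized view on the same view
    "view_v1": {"table_orig"},
    "table_orig": set(),
}
create = creation_order(deps)
drop = list(reversed(create))     # drop order: dependents before dependencies
print(create.index("table_orig") < create.index("view_v1"))   # True
```

In Postgres itself the dependency information lives in the pg_depend catalog, which is roughly what the pgAdmin-derived queries mentioned later have to walk.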
11:20
And we have another problem: the deadlock. A deadlock is when one transaction is waiting for a lock held by another transaction, which in turn is waiting for a lock held by the first. There's no way to exit this chain. So Postgres kills one transaction, essentially at random, and frees the deadlock when it happens.
11:47
And this happens in particular when you have a chain of foreign keys. This table refers to this one, this one refers back, and so on. So if you try to rebuild or drop these foreign keys while they are in use — while they are being
12:05
checked and validated against the data changes — you will get a deadlock. There's no way around it. Here is a very simple schema: the pgbench schema with foreign keys. It's very useful for testing this sort of thing. And the first thing I learned when I tried to run the swap was: oh, deadlock.
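A deadlock is a cycle in the waits-for graph, which is exactly what the database's deadlock detector looks for before killing a victim. A tiny cycle check over a hypothetical waits-for mapping makes the structure concrete:

```python
# Deadlock = cycle in the waits-for graph. Each transaction maps to
# the single transaction it is blocked on (or None if it is running).

def has_deadlock(waits_for):
    """waits_for: transaction -> transaction it waits on, or None."""
    for start in waits_for:
        seen, cur = set(), start
        while cur is not None:
            if cur in seen:
                return True          # we came back to a node: a cycle
            seen.add(cur)
            cur = waits_for.get(cur)
    return False

print(has_deadlock({"T1": "T2", "T2": None}))   # False: plain blocking, resolves itself
print(has_deadlock({"T1": "T2", "T2": "T1"}))   # True: nobody can ever proceed
```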
12:24
So how did I approach this? I decided to add a choice to the tool, so the user can decide which strategy to use. By default, repcloud does nothing when it detects a deadlock:
12:43
it just waits for the database. But it's possible to tell repcloud to kill or cancel the offending query — the other query. The system automatically checks if a potential deadlock is happening, and one of the blocked queries — not the one running the swap — is canceled or killed.
13:05
But sometimes that's not a good strategy. So it's also possible to run just a prepare-swap: everything is done up to the point just before the swap. So the table is created.
13:21
The log table is populated by the trigger. The replay function is running to catch up with the existing table. So it's possible to stop the application and say, OK, now we can swap without the risk of deadlock. A few seconds of downtime for swapping the relations.
13:43
I think that's something that can be afforded, rather than a long-running vacuum full. Also, it can change the fill factor of the table. Fill factor is a very useful thing, and you can set it as a parameter. This is something I'm still working on.
14:00
At the moment you can set the fill factor, but my idea is to also add changing the tablespace and the index tablespace. Everything is configurable at this level. So it's possible to create the new table with a different fill factor, and the new table will be less impacted by bloat, because there will be more space inside the data pages for the updates.
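The fill factor trade-off is simple arithmetic over the 8 kB Postgres page: a lower fill factor packs fewer rows per page, reserving free space so updated row versions can stay on the same page instead of migrating and forcing index updates. The numbers below ignore page and tuple headers, so they are an approximation:

```python
# Back-of-envelope fill factor arithmetic for an 8 kB page.
# fillfactor=100 packs pages full; lower values reserve in-page
# free space for new row versions created by updates.

PAGE_BYTES = 8192

def rows_per_page(row_width, fillfactor=100):
    usable = PAGE_BYTES * fillfactor // 100
    return usable // row_width

print(rows_per_page(200))                  # 40 rows, no room left for updates
print(rows_per_page(200, fillfactor=70))   # 28 rows, ~2.4 kB free per page
```

In Postgres this storage parameter is set with `ALTER TABLE t SET (fillfactor = 70)` (and takes effect for newly written pages), which is why rebuilding the table is the moment to change it.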
14:24
This is one of the strategies for designing tables that might be affected by bloat. It can also clean up the JSONB and JSON fields from null keys, which makes the JSON fields more compact. And it's possible to remove specific keys, but just from JSONB fields,
14:42
because the function that runs this operation only exists for that data type. More will come. And this is the tool, in short. I would like to give some recognition. This is the company that believed in me
15:01
and gave me permission to release this as an open source project. It wasn't supposed to be that, but I think it might help a lot of people. The dependency resolution is derived from the pgAdmin team: I used their queries, adapted to my needs. They were amazing.
15:20
They saved me a lot of time building up this crazy thing. The dependency system in Postgres is insane: you have to go through a lot of outer joins to get everything. And also, the replay strategy is inspired by the pg_dump code. Waiting for the transactions before effectively starting the copy
15:41
is something related to the serializable transaction isolation level. That helped me a lot to understand how to approach this problem. This is the GitHub project; this is my GitHub. There are also other projects there.
16:01
The other important project on this GitHub page is pg_chameleon, a replication tool from MySQL to Postgres. pg_chameleon is very usable, and it's something I want to pick up again; I've been distracted in the last year by this project, and it's something I want to improve.
16:21
And also, this thing is available on PyPI, so you can just install it. It's an alpha release at the moment, but we tested it in production. It worked incredibly well for repacking a 300-gigabyte table. It took about four days, which were supposed to be downtime, and it wasn't.
16:41
So the customer was very, very happy about that. The license terms are the PostgreSQL license. I like this license. I think pg_chameleon started under a BSD license; it will move to the PostgreSQL license too. It makes more sense, it's a Postgres-related project. And if you want to break it and report bugs, or you want to help me build this thing,
17:06
I think I will write the collaboration guidelines very, very soon for this project. I would be very, very happy to get any help with it. These are my contacts. This is my blog, pgdba.org, where I write stuff about Postgres,
17:25
my Twitter, which I already showed, GitHub, and my LinkedIn page. The content is under Creative Commons. And if you want to hear horror stories tomorrow, I will speak in the Postgres devroom about RTFM.
17:43
Does anybody know what RTFM means? Sure you do. So I will tell some interesting and funny stories. It's the last talk of the day, so I have to be entertaining. And that's all, folks. Are there any questions? Thank you for listening.
18:12
It's not directly related to repcloud, but do you consider partitioning as a solution against bloat? Well, partitioning is not designed as a solution for bloat.
18:24
It's for having more efficient storage when you have a monolithic table. A solution for bloat is looking at the storage parameters — fill factor — and trying to avoid unnecessary updates.
18:40
And one thing you should look into is pg_stat_all_tables. There is a field called n_tup_hot_upd or something like that — I don't remember, my shared buffers are limited — but there's something with "hot" in the name. If this counter increases, it means the tuple, when it's updated,
19:02
stays inside the same page. That's a very good thing. So if you have a lot of updates and a similar number in that field, you are doing very, very well. Any other questions? Cool.
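The counter in question is n_tup_hot_upd in pg_stat_all_tables, alongside the total-updates counter n_tup_upd; a ratio close to 1 means most updates were heap-only (HOT) and caused no index churn. A sketch of the check, with made-up counter values standing in for a real query result:

```python
# Fraction of updates that were heap-only (HOT): the new row version
# stayed on the same page, so no index entries had to be rewritten.

def hot_ratio(n_tup_upd, n_tup_hot_upd):
    """Ratio of HOT updates to total updates; 0.0 when there are none."""
    return n_tup_hot_upd / n_tup_upd if n_tup_upd else 0.0

# e.g. SELECT n_tup_upd, n_tup_hot_upd FROM pg_stat_all_tables WHERE relname = '...'
print(hot_ratio(10_000, 9_200))   # 0.92: most updates stay in-page — healthy
```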
19:21
That's all right. Thank you.