Announcing pg_statviz
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 542 | |
Author | ||
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/61621 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
FOSDEM 2023292 / 542
2
5
10
14
15
16
22
24
27
29
31
36
43
48
56
63
74
78
83
87
89
95
96
99
104
106
107
117
119
121
122
125
126
128
130
132
134
135
136
141
143
146
148
152
155
157
159
161
165
166
168
170
173
176
180
181
185
191
194
196
197
198
199
206
207
209
210
211
212
216
219
220
227
228
229
231
232
233
236
250
252
256
258
260
263
264
267
271
273
275
276
278
282
286
292
293
298
299
300
302
312
316
321
322
324
339
341
342
343
344
351
352
354
355
356
357
359
369
370
372
373
376
378
379
380
382
383
387
390
394
395
401
405
406
410
411
413
415
416
421
426
430
437
438
440
441
443
444
445
446
448
449
450
451
458
464
468
472
475
476
479
481
493
494
498
499
502
509
513
516
517
520
522
524
525
531
534
535
537
538
541
00:00
World Wide Web ConsortiumUtility softwareExtension (kinesiology)Mathematical analysisVisualization (computer graphics)Series (mathematics)StatisticsAerodynamicsTrailModul <Datentyp>Component-based software engineeringField extensionGEDCOMInformationTupleComputer virusWritingBit rateStatisticsField extensionUtility softwareExtension (kinesiology)Normal (geometry)Module (mathematics)Core dumpBuffer solutionDatabaseServer (computing)CodeMiniDiscCache (computing)2 (number)BitOrder (biology)Range (statistics)Type theoryPoint (geometry)Computer fileVisualization (computer graphics)Alpha (investment)Dynamical systemSelectivity (electronic)Line (geometry)Client (computing)AudiovisualisierungMultiplication signFront and back endsConnected spaceCountingLimit (category theory)Connectivity (graph theory)Software testingWeightNumberRow (database)Green's functionLaptopScripting languageCondition numberTouch typingEvent horizonPhysical systemIntegrated development environmentCartesian coordinate systemMereologySoftware developerTime seriesComputer configurationDistribution (mathematics)TunisShared memoryParameter (computer programming)Computer animation
10:05
Visualization (computer graphics)Utility softwareScheduling (computing)FrequencyAudiovisualisierungExtension (kinesiology)Table (information)Alpha (investment)Computer clusterComputer virusPartition (number theory)Moment (mathematics)Projective planeScheduling (computing)Software maintenance
11:03
Utility softwareExtension (kinesiology)Visualization (computer graphics)Logic gateSeries (mathematics)Mathematical analysisDatabaseStatisticsInformation securityGroup actionCodeWeb pageRepository (publishing)Template (C++)Directory serviceData structureTerm (mathematics)Software developerMathematicsInformation privacyHeat transferSelf-organizationComputer animation
11:27
Computer animation
11:51
Program flowchart
Transcript: English(auto-generated)
00:05
Hello. So, PGstatvis, from the name, I think you can understand that it's probably something to do with Postgres. And it is. It is a new Postgres extension and utility pair.
00:20
So it comes with its own tool that you use outside of Postgres. It's minimalist. We'll get into that in a moment. It only does the thing it's supposed to do, and it doesn't touch anything else in the system that it's not supposed to. And the purpose of PGstatvis is time series analysis and visualization.
00:46
That's the viz part of Postgres internal statistics. That's the stat part. So, Postgres internally keeps its own statistics. They are cumulative and dynamic statistics, right?
01:04
So you get, like, number of buffers written is a cumulative statistic that keeps going up. You also have dynamic statistics like PGstat activity that tells you what's happening inside your Postgres server at that moment in time. So if you take snapshots of these statistics internally from within Postgres and you perform time series analysis on them,
01:29
you can gain insights into how your server is behaving. So this utility that comes with PGstatvis extension can produce visualizations for
01:41
selected time ranges on the stored snapshots that are inside the database. So you can, for example, take snapshots of your server every 15 minutes during the course of a day, and then analyze it over 24 hours to see what your peak times were and what was happening inside the server at that time.
02:05
I wouldn't recommend taking snapshots more frequently than a minute, and it's easy to see why. If you have too many snapshots, it's harder to see the bigger picture, maybe.
02:20
So the reason for all of this is you want to track your performance over time, and potentially you can perform troubleshooting on why your server is not behaving the way you expect it to and additional tuning. So Minimalist, this is a tiny package that is based on the KISS and UNIX philosophies.
02:48
So keep it simple and sweet, right? And the UNIX philosophy is that it comes with a tool that you can run as a normal Postgres command line tool like psql,
03:03
with the same parameters and everything else, and it allows you very simply to create snapshots of the statistics and visualize them. So it's modular. We'll get into the modules in a minute. It's minimal. It's the least amount of code I could write to make this thing work, and it's unobtrusive.
03:26
So you can take snapshots without affecting any other activity running on your system, and I think that's very important for being able to monitor and analyze in production.
03:40
So the components are Postgres extension, as we said, and a Python utility that retrieves the stored snapshots from the database and creates simple visualizations with them using Matplotlib. The extension is written in plain SQL and PLPG SQL, so there's nothing to put in shared preload libraries,
04:04
so this means that you can just type create extension and you can start using it without even restarting your server. So create extension pgstatvis is all you need to do. We're working on the packaging now to get it distributed through the PGDG repos,
04:25
Postgres Global Development Group repositories, and by extension it will find its way into distributions hopefully soon. The way you install the utility is very simple. You just type pip install pgstatvis. If you tried that this morning, it wouldn't work,
04:42
but I just uploaded the file so you can try it out. As I said, this is a last-minute talk. It's very new. The code is pre-production quality. I would call it alpha code, but you can give it a try for yourself and offer any suggestions or fixes or tell me what I'm doing wrong.
05:04
Now, the extension can be used by super users, but you don't have to. The only thing that the extension needs is PG monitor role privileges in order to be able to select from the internal Postgres statistics tables, and the usage is dead simple.
05:22
To take a snapshot, you just type from within a client select pgstatvis.snapshot. Now, why is there no underscore there? It's because Postgres doesn't like us naming schemas PG underscore something. That's reserved only for core Postgres, so extensions are not allowed to do it.
05:43
What does the command line look like? You just pip install pgstatvis and you have the utility, and the utility, when you ask for help, is a normal Postgres utility. You get your database selection, user name, host name port, et cetera,
06:02
the same way you would connect with any Postgres client, and you've got modules like buff that shows you statistics on the background writer and buffers written to disk, cache hit ratio, checkpoint rate, connections,
06:20
number of tuples, weights that it found in the server during the snapshot, wall generation, and so on. And you can either run analyze, which runs all of the modules at once and generates visualizations, or you can run just one module if you're only interested in buffers. You can only say run buff.
06:45
Most importantly, there's a capital D option that you can use to specify the date range in order to visualize only the time range you're interested in, so like the last 24 hours only. And these are specified, of course, in ISO 8601 format,
07:03
so there's no ambiguity in how to type in dates. And it works something like this. You connect to database FAF as user Postgres, you give it a date range, and it just generates the snapshots and writes the visualizations as PNG to disk.
07:24
And yes, it has a logo, so it's complete. The visualizations look something like this. I apologize if the points are a bit too small for you to see. So as we said, buffers written to disk is a line that keeps going up until stats reset.
07:43
When the stats get reset, it starts from zero again. So perhaps this is more useful. This is the buffer write rate in megabytes per second. So you can see exactly how many buffers your Postgres server was writing to disk at any moment in time. And also you can analyze what was happening because of the background writer.
08:17
Now? Yes, thank you. So you can see here that because this was a test I ran on my laptop
08:25
with a script that was just inserting rows into the same table, the checkpoint line, which is the orange line, didn't do much because it wasn't scheduled activity taken care of by checkpoints, but you can see the back ends were doing most of the work.
08:43
And also you can see that the background writer, which is the green line, didn't get the chance to participate in all this buffer writing because from what we can see from the very low line, its limits were set too low for production. So you can gain insights into the behavior of your Postgres server like this.
09:07
Or you can look at connection versus status count. So you can see how many connections you had coming into your server from clients, how many of them were active, how many of them were idle, how many were idle in transaction, and so on.
09:22
But you can also see which users were taking up those connections. And I think that's really interesting when you have an environment that's used by multiple applications so you can know which developers to blame when it all goes south. Weight events. As I was testing this on my laptop and was overflowing it with IO
09:45
because I was inserting millions of rows into a table, I generated an IO data file read, sorry for the small letters, weight condition. And that was captured by the snapshot that was being taken every ten seconds or so for this example.
10:06
So thank you for listening. The project is going to be live at github.com slash virus slash pgstatviz in a few moments. You can find me on Mastodon. And what the hell, I'll do it right now.
10:26
As we said, this is alpha quality code. Oh, I forgot to say that it doesn't do any scheduling or any maintenance or any partitioning of those internal tables where it keeps the snapshots.
10:41
So you can delete them by hand. You can schedule the snapshots very easily with any tool you like, like cron or pgchron, but I didn't want to make this a dependency on the extension. So you can just configure it yourself.
11:01
And I can just go to settings and make it public right now. Thank you.
11:37
Any questions?
11:45
No, okay. Thanks anyway. Thanks. Thanks for Jimmy.