An Introduction to Memory Contexts
Formal Metadata
Title: An Introduction to Memory Contexts
Author: Chris Travers
Series: PGCon 2019
Number of Parts: 35
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/48297 (DOI)
PGCon 2019, Part 2 of 35
Transcript: English (auto-generated)
00:06
So, thanks everyone for coming to my talk. This talk is entitled, An Introduction to Memory Contexts. So it's basically about how we interact with the memory management subsystem of
00:20
Postgres when we're writing C code in extensions or maybe modifications to Postgres or things like that where memory management is important. So a little bit about myself first and sort of where I bring some expertise to this. My name is Chris Travers. I'm a relatively new contributor to Postgres.
00:44
I have got one patch accepted so far. It was a bug fix for a strange race condition. We can talk about it after, but probably going to breeze through that pretty quickly. I lead the research and development Postgres department at Adjust GmbH.
01:04
And I've been a very long time user of Postgres. I've used Postgres in various capacities back to 1999. So I've been a software developer, software engineer. I work with, at this point, mostly Postgres oriented stuff.
01:23
And that includes extensions and so forth within the database management system. At Adjust, we are one of the very large users of vanilla Postgres. We have somewhere between five and ten petabytes of data in Postgres, depending upon how you count.
01:41
And our systems ingest somewhere between 200,000 and 400,000 requests a second. This all gets delivered in what's more or less a real-time analytics framework that users can then pull the data out of. In order to make all this work, we do a lot of C development in Postgres,
02:02
extensions, data types, aggregates, you name it. And so we do a whole lot of this sort of thing. The whole idea of this talk came out of some questions that my current boss asked me about memory management in Postgres.
02:22
I figured there's a real need for a talk on this topic, so I guess I'd better give one. And then I thought, well, there will probably be people here who know it better than I do, but I guess you're stuck with me. So first of all, why would we program in C in Postgres? First of all, it's very fast.
02:43
You have a language which has a very natural match to the actual machine architecture and what the machine is doing underneath. C will, in many cases, perform even better than C++. In many cases, you can get some optimizations that you would have a hard time getting out of Rust
03:01
as we look at how memory is managed in particular. And this is a really big, important piece of infrastructure when you're actually using Postgres as a development platform. So in Postgres, we can build data types, we can build functions, we can build,
03:22
basically you name it, into Postgres as a series of extensions. And so the best way to do this if performance is absolutely critical is with C. So what are some general problems when we program in C?
03:40
Well, the biggest one, and we don't have perfect solutions to this one, is the fact that linker symbols don't really have natural namespaces in C. And so one of the things that C++ spends a lot of time trying to do is solve this. They give you 15 different ways of solving it. But this is probably the biggest headache
04:03
that people who do a lot of C programming actually run into. You also have the fact that C doesn't have a natural exception handling framework. And then of course you have the fact that you end up having to do a lot of your own low-level memory management and pointer math and all that sort of thing.
04:23
So C has some low-level shortcomings that a lot of people run into when they're doing a lot of C code. And this is one of the reasons why everybody tends to be very afraid of programming in C, especially against something as critical
04:41
as a database management system. Just another point: object-oriented designs have become very popular since C came out. C doesn't really have natural syntax for handling that. So people who are used to an object-oriented framework may end up going to C++ just to get out of some of these paradigms
05:06
that they're not familiar with and get into areas that they're more familiar with. The good news is Postgres has solutions to most of these problems. So you can, in fact, if you need to do namespacing, you can use dlopen, dlsym.
05:21
In fact, if you look at how user-defined functions are managed in Postgres, that's exactly how they're handled. Basically, dlsym gives you a pointer to the function, you call the function and away you go. Now, Postgres has its own exception framework, ereport, elog, these sorts of things.
05:42
If you're not using them in your code and you're writing against Postgres, you should be using them in your code. Object-oriented programming in this case just isn't relevant to us. We tend to think more in something closer to functional programming. And so even though C isn't normally
06:00
a functional programming language, you can think of Postgres as being a functional programming environment. So that's also very important. And finally, this talk is about the pointer and memory management portions of Postgres. Now, let's talk about memory management problems in C
06:21
because as I said, this talk is centered on these issues. You have a few kinds of things that happen. They can cause various kinds of headaches. The first one that people usually run into is the memory leak. You allocate memory, you never free it. What happens?
06:41
You keep allocating memory, it uses more and more memory. Eventually, maybe you run out of memory. That sort of thing can happen. And the other side of this is the double free bug. You freed it, you go to free it again. Now your memory utilization map is corrupt and bad things happen. We definitely don't want that in the database side.
07:06
Finally, you have this problem of heap fragmentation. So you can run into cases where you're allocating and deallocating a lot of memory and the heap can get fragmented and for one reason or another,
07:20
glibc can't or won't reuse some piece of memory inside the heap and keeps allocating to the end of it and you get something that looks like a memory leak but you're not leaking memory. This can be just internal bookkeeping issues
07:41
with, for example, glibc. So Postgres solves all these problems for you by giving you something that looks suspiciously like a very deterministic garbage collection system. So, let's talk about how memory
08:01
is actually managed in Postgres. Anybody here ever worked with memory contexts in any form? A few? Remember, yeah? How many of you have actually read aset.c? Okay, a couple of you have looked at it.
08:22
Okay. Okay. So, if you haven't read aset.c, if you are getting into this area, I would recommend going through and reading the code, also reading the READMEs in the memory manager section. These are sort of the first pieces of documentation
08:41
you should be looking at. So, when we think about how we normally work in C when we're not working in Postgres, we think of buffers being like a region of memory and data filling the buffer. The data is ones and zeros, and then we might do things with it
09:00
that give those ones and zeros meaning. The ones and zeros don't have intrinsic meaning, they have meaning that we give them when we read and do something with the data in that region of memory. Primitive types like char, int, that sort of thing, basically define sizes of buffers. And then we might define arrays or structs,
09:23
which basically give us a larger piece of memory that we can do stuff with inside. Now, some of your pain here comes from the fact that typically you're mallocing and freeing this memory yourself, and therefore you have to be very sure
09:42
that when you're done using the memory, you go ahead and free it. Okay. In the database management system, that's actually a really hard problem because you may allocate some memory in one place, it may need to get used several times, and you may not have an immediate place to plug in
10:00
to free it later when it actually needs to be freed. So, this is a problem. And it's something that most C programmers struggle with to varying degrees while they program systems in C.
10:23
So, typical things that you're told are avoid using the heap, avoid using malloc and free because that will typically either go in the heap or it will go in, you know, mapped memory segments somewhere else, and then use the stack for garbage collection.
10:42
These were the things that were drilled into my head by one of my early C mentors at Adjust, actually. You can use the stack as a garbage collector and get around a lot of problems.
11:01
But in Postgres, we do things differently. This is all focused on the allocation set. I mentioned aset.c; that's what this file defines. Now, let's mention one really big problem with malloc that we haven't mentioned yet: speed.
11:24
Malloc is often thought of as a system call; actually it's a standard library function that often wraps system calls. It may, therefore, effectively clear CPU caches. It may do a whole bunch of work behind the scenes.
11:43
It's not performance optimal. So, we don't really like to call malloc very often, but of course, we have to because, you know, we have to allocate memory that's gonna be used for some period of time, and then we have to free it later. So, what we do in Postgres is we have the allocation set.
12:02
The allocation set is a large series of memory allocations that are going to be freed together. Now, because they're going to be freed together, we can do certain optimizations. We can allocate a large block of memory, and when we run out of that,
12:20
we can allocate a larger block of memory and things like this. This minimizes the number of calls to malloc and to free. We can also make sure that everything in those blocks of memory are freed together. So, now we can protect against memory leaks
12:40
because we have defined points when all of this data is going to go away. Well, maybe not, we'll get to that in a moment, but that's basically the API design, right? So, we can take an allocation set and we can create it,
13:00
we can destroy it, or we can reset it. Does that make sense so far? And then items within it, instead of using malloc, we use palloc, and palloc will then take this current memory context framework, grab the current allocation set
13:22
that you've set it to use, and will allocate within one of those big blocks. There's also a function pfree, which is not used as often, which will free stuff that's been palloced, and then allow that space of that memory allocation set to be reused later.
13:46
So, this is kind of important. I'm a big fan of the idea that when you're programming in C, you need to be aware of what the machine is doing under the hood. So, what we do in the chunk allocator,
14:03
there's also a slab allocator, I'm not gonna get into that one in any significant detail today. It's only used, I think, right now for logical replication, but I could be wrong on that. But the chunk allocator is the one we use for most things. We start off with, by default, an 8K allocation.
14:22
So, we create the memory context, it's going to allocate eight kilobytes of RAM on the heap so you're not having to go through the kernel to get a mapped memory segment, you just get that block on the heap. Now, what this means is, as we go through it,
14:43
we can slice that up and we use it in various ways. this all works very, very nicely. Every allocation contains an additional pointer just before the pointer to the allocation, which is a pointer to the current allocation set.
15:02
That's a little bit complicated, so let me just specify it again. You have an allocation that's, let's say, 10 bytes wide or eight bytes wide. And just prior to the start of that, you're going to have a void pointer that points to the allocation set where that was allocated.
15:21
Reason for that is that if we use pfree, we can take that pointer and we know where to go to free the memory. Allocation sets can have parents. So it can be a tree. This is very important.
15:41
And when we destroy or reset an allocation set, that cascades to all child allocation sets. Does that make sense? So if we destroy an allocation set
16:00
that is tied to, say, the current protocol message, and there's some other processing that's created some other things underneath that, then all of that memory will be guaranteed to be freed together to the operating system. This is nice.
16:20
It also minimizes your calls to free. So practical considerations, at least with glibc, your first few blocks will end up on the heap. We double the size of every subsequent block that's allocated up to, I think, one gigabyte.
16:43
And so once you get, I think, beyond like 64, then you start getting these anonymous mmap'ed memory segments that will live outside the heap and are guaranteed to be released to the operating system when they're freed.
17:01
So we have far fewer mallocs. We have essentially very, very, very few calls to free. And this allows us to more or less guarantee that we will not run into memory leaks.
17:22
So this is actually a pretty good design. And it's one of those things that's a little bit hard to wrap your head around when you start getting into it, but it performs very well. And once you get the hang of it, it's actually really easy to use. So now we've talked about allocation sets.
17:42
Let's talk about memory contexts. A memory context is just an allocation set where we have decided ahead of time what its lifecycle is. They exist as a tree under one called top memory context.
18:02
And top memory context is never destroyed and it's never reset. And if you do either of those things, you can expect something probably to break because it will probably wipe out all kinds of stuff that you might be using under the hood. And it very well might destroy
18:21
several other defined memory contexts, which you might otherwise be relying on. So don't destroy it, don't reset it. So here's a list sort of in a tree format of the sort of the topmost part of the tree
18:43
that you're likely to encounter as you program in C. And we'll get to the lower branches of that in the next few slides. But top memory context stays around for the duration of the backend.
19:00
This is in backend memory, it's not in shared memory. None of these are in shared memory, by the way. And it's designed to store all the global stuff that you need to have persisted through the entire life of the backend. So when we wrote the Kafka foreign data wrapper, the connections to Kafka live in the top memory context.
19:25
Then there's the postmaster context, which is initialized when postmaster starts up, before it forks off backends. I suspect backends might have a copy of that data. I'm not 100% sure on that point.
19:44
But you probably are never going to ever have to use this. I mean, if you're doing like initialization stuff when the database server starts, then you do that. Otherwise it just, it wouldn't be necessary. Next one is cache memory context.
20:01
This is an important one. This is where, for example, SPI cached plans live. And we'll get to that in a few slides. The idea here is this also lives for the length of time that the backend lives. And you should probably not reset or destroy this one.
20:22
But at least if you reset it, you're probably only like going to, I don't know, mess with any of your cache plans and stuff like that. So it's not going to be as catastrophic as if you were to destroy top memory context.
20:43
Yeah, I believe so. Then we have message context. Message context is also something you'd rarely use. Basically it's a memory context
21:01
attached to the particular protocol message you're processing at the time. So Postgres has basically a protocol where the client sends a series of messages to the server, the server processes them, and sends a series of messages to the client. So the specific message you're processing has memory attached to it.
21:20
That's where that lives. Now we start to get into the actual operational stuff where most of the stuff we'd write would live, which is the actual operational context. So you'll have top transaction context. And you probably don't want to use this one either.
21:42
This is where you would store things if they have to survive a rollback to a savepoint. Otherwise you use cur transaction context, which would be basically the current transactional context that will go away if you commit or rollback,
22:01
including a rollback to a savepoint. So then it's a little hard to figure out where to put portal context and error context because in most operations you will find them here,
22:20
at least if you're using explicit transactions. Portal context is for the statement. So the things that have to survive the entire execution of the statement would live there. And error context would store stuff relating to the error
22:42
that you happen to maybe raise or run into or something like that. As I say, they're hard to figure out where to put on this graph, this diagram, because they could also be underneath message context. If you're using the simple protocol, no begin, no...
23:04
No begin, no commit, then they would live up there. But in the extended protocol, they'll always be down here. No wait, in the extended protocol, they could still be up there too if you don't have a begin, I think.
23:24
I'll have to double check that. Now we get into the ones that we encounter on a daily basis. Every plan node has its own memory context.
23:41
So every join, every sequential scan, every index scan, basically every aggregate node in the plan has its own per-plan-node context. Typically, you're not going to use these, but they're important to know that they exist. In a few cases, you will use them,
24:02
particularly when it comes to aggregates. And it's not really the plan node that you'd use. You'd use a child to the plan node. Then we have per tuple memory context. So when we read a row into memory, we have to process it and store it somewhere.
24:21
We have a per tuple memory context. This is special. It has some really cool optimizations to it that if you're not careful in your testing can really bite you hard. We'll talk about how to avoid that as we get a little further. And then we have aggregate contexts that sit underneath the per plan context of the aggregate.
24:45
For logical replication workers, we have an apply context, which survives for the lifetime of the backend worker. And then we have an apply message context, which is just the protocol message again.
25:02
Logical replication, as I mentioned, uses the slab allocator. The difference between the slab allocator and the chunk allocator is that the slab allocator is optimized for equal size allocations, while the chunk allocator can allocate different size chunks, basically as requested.
25:25
So I mentioned there are some cool optimizations in the per tuple memory contexts. First one is it reuses the first 8K block. So if your rows are relatively short, you could process a million or two million
25:43
or a hundred million rows without ever hitting malloc or free. So this is a major performance boost because these are operations you do a whole lot. You don't want something like malloc or free to impose overhead on those operations.
26:04
Next thing is that you may have things at the end of the processing of the row that need to happen. And so the allocation is actually reset at the beginning of the next tuple. So these are reset at the beginnings of the rows
26:20
and they survive until the beginning of the next row. So this helps us really avoid malloc. And in most cases, everything's going to just simply live in this 8K block on the heap.
26:40
Everything's going to be really fast. You don't have a lot of things that you have to worry about. And in the event where you have a very long toast table entry, it will get perhaps a mapped memory segment just for that row. But that's one reason why toast can sometimes have really funny performance implications
27:01
that you don't necessarily think of or understand upfront. So some notes about aggregates. One common mistake is allocating the state material
27:25
for the aggregate on the per tuple context. This is a very bad thing. And one of the reasons why it's very bad is if your tests include basically running an aggregate over let's say 100 or 200 or a million rows
27:42
that are all like four integers, everything will work in your test cases. You run it in production and you have a text field somewhere in that tuple, and suddenly you get garbage out or worse. So this is actually a big trouble spot for a lot of people.
28:01
You want to use AggCheckCallContext to find the context. And you pass in a pointer to a memory context, probably one that's living on your stack, to that function, and it writes the location of the memory context back there so you can use it.
28:25
So there's an example of the call. And you could get a lot of problems here if you're not careful. As we get into the best practices, you'll see how to avoid these problems
28:41
by setting up a proper testing environment which will properly clobber your data and remove all of these optimizations and replace them with de-optimizations to catch problems. So I mentioned pfree.
29:01
You pass in the pointer that you allocated to pfree. It subtracts the size of a void pointer from that, finds the memory context, goes there, and basically updates the bookkeeping information on that memory context to say, this data is now available for reuse.
29:22
Now, if you pass in a null, you'll get a segfault in your database backend which will cause all other concurrent transactions to roll back. Then the postmaster will clean all this shared memory and then start new backends for you.
29:41
So lovely recovery mechanism, but it has performance and production implications. So please be careful about this. So you're probably at this point saying, okay, well now I kind of understand
30:01
some ideas of memory contexts. How do I avoid making subtle, silly mistakes that cause me to go back and spend weeks trying to figure out why Postgres is randomly crashing or why I'm getting garbage data out of my aggregates and some but not all queries and things like that.
30:24
The first thing is you should be aware of the fact that you have three ways of allocating memory in memory contexts, and you should think through the decision to allocate it in each case. palloc is just like malloc, except it operates inside these memory allocation sets.
30:45
palloc0 then writes zeros over the data in that area. So you can make sure that any pointers, for example, are zeroed out and things like that before you go. Unlike palloc, there's no optimization possible here
31:02
with something like mapping in the zero page and doing copy on write. So if you're just doing palloc, typically the palloc will be really fast, but your first write will be slow. Here, palloc0 will be slower. Finally, you can use MemoryContextAlloc
31:22
to specify a memory context when you allocate the memory. So this allows you to actually say, I'm in a per row context, but I actually want to allocate to the aggregate context or to the top level context or to something like that.
31:48
So you want to use that anytime you want to allocate somewhere that might not be the default for where you are at the moment.
32:01
Best practices for aggregates. I mentioned AggCheckCallContext. Use it and check the output. If it returns false, you're not in an aggregate. Something else is wrong. Then use ereport, send a nice error message.
32:24
How did this happen? You're not in an aggregate, but you're calling this function, whatever. And hand that back up. When you're likely to be aggregating or when you're likely to allocate stuff that you will need later, then change the current memory context
32:45
to the aggregate one, and then reset that at the end. There's some few tricks in that, but as you start getting comfortable reading the Postgres source code,
33:00
you'll see plenty of cases where they do exactly that. So the code isn't that difficult to follow in that regard. So using cached memory context versus top-level memory context. Everything that should be freed together should be in the same memory context together.
33:21
That should be your guiding principle anytime you look at these things. Keep stuff that needs to be freed together, together. Top memory context, therefore, is just for things which are true globals that need to persist always. And then in many cases,
33:42
you may want to actually create your own child that you can destroy or reset at times when you control. Avoid creating top-level contexts.
34:00
Yes, you can create as many top-level contexts as you want, but they're a pain to track. And then it's very easy to lose track of them. Then you can actually get memory leaks because you're allocating 8K of RAM that maybe you'll lose track of.
34:20
So don't do that. Create child contexts under top memory context instead. Then you actually have ways you can introspect and look and find things and that sort of thing. Finally, the really, really, really, really big one. And if you get nothing else out of this talk,
34:41
this is the single biggest thing you should follow as far as the best practice. Compile Postgres with --enable-cassert. And test all of your C code on that Postgres instance.
35:00
This makes Postgres much slower. So do not run --enable-cassert in production if performance is important. But it does a number of things for you. It enables a lot of sanity checks at a lot of points. So you can get errors for all kinds of subtle mistakes that would not immediately bite you in production. Let me specify why I said immediately.
35:24
Because these subtle problems can creep up. They can cause problems that you won't immediately see. They may cause data corruption. They may cause all kinds of problems that might not become apparent for days or weeks or months after you've deployed a change. The bigger thing it does
35:42
is it overwrites all of the data in the entire memory context when it destroys or resets it. This means you go from funny problems that you could have a hard time reproducing into reproducible crashes
36:01
that always happen on your test system. This point cannot be said enough. You want subtle bugs to show up as big catastrophic crashes in your test environment even if they would be maybe minor annoyances
36:22
in your production environment. And so there are a large number of subtle problems that maybe your test cases wouldn't catch if you're not running this. So let me walk through an example of this.
36:45
You have an aggregate. You forgot to switch the memory context to the aggregate context and you're doing palloc. Your test cases run against, let's say, a table with a million rows.
37:02
So you have some good test data. Those rows are all four integers wide. So you're going to end up with a situation where you have four integers padded by void pointers, and that's always the same length.
37:22
By the time you get to the aggregate, you're at the end of that. You allocate the state data for the aggregate and now it's after the data for the row. That 8K block gets reused. The same amount of memory gets used for the next row.
37:44
That data is still untouched. You keep running it. Your aggregate appears to work, right? Now you run this in production and you have some variable length type somewhere in your row
38:03
and all of a sudden you get random crashes in Postgres and random garbage out and you name it, right? So that's an absolutely critical thing.
38:23
And as I say, this recommendation is the single most important thing I can recommend when you're working with this. So I'm just gonna take a couple moments, talk about a semi-advanced topic and then we can take maybe a couple minutes for questions and then if there are more questions, you can come up to me after things like that.
38:46
So SPI is the server programming interface. We use it to write user-defined functions and stored procedures in C. Gives us access to the database so we can run queries inside it.
39:00
Very, very useful thing. It's well-documented in the reference documentation. It's a really nice tool. So how does all this work? So SPI has a bunch of its own uses of memory contexts. It puts the SPI stack under the top level context.
39:22
Under top transaction context, it puts the executor one. Unless you're not in a transaction, in which case it will put it in portal context. Now, when you cache a plan,
39:40
it moves it to cache context. So when you allocate a plan, you get basically an executor memory context for SPI. It's under top transaction context or portal context,
40:03
depending upon your environment. And in theory, it's possible to allocate it somewhere else but in practice, that almost never happens. You'd have to be doing something really unusual to have your plan running under, I don't know,
40:26
under top transaction context or something. This is absolutely critical. Each plan has its own memory context. This is what allows us to cache plans. So when we cache a plan, this is the actual code.
40:40
I've reformatted it a little bit. But basically what it does is it just takes the plan context and it resets the parent to the cache memory context. And now the plan isn't gonna go away at the end of the statement or transaction or whatever. Kind of neat. So Postgres has managed memory.
41:02
It's actually really useful. The memory management here is highly performant. It's highly optimized. And once you kind of get used to working with it, you can avoid a lot of big problems that would otherwise be a fairly large burden on your ability to deliver quality robust code
41:23
that you can trust to run inside your database system. So high performance. And this really does most of the work for you. But if you're not careful, you can mess it up. Again, --enable-cassert prevents almost all of the worst problems. Not all of them, but most of them.
41:42
So thank you. Any comments? Any questions?