
Queues in PostgreSQL


Formal Metadata

Title: Queues in PostgreSQL
Number of Parts: 34
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Abstract
Many projects involve something resembling job or message queuing. In this talk I will look at: - the pros and cons of using a plain old relational database for this type of workload rather than specialized technology - approaches and designs - special considerations for PostgreSQL - performance and tuning - potential improvements
Transcript: English (auto-generated)
Okay. Hi, everybody. My name is Thomas Munro and I work for EnterpriseDB. And I'm going to be talking about queues and Postgres. This is a little bit different from all of the other talks, at least in this room today, because it's in the application track. It's about using Postgres; it's not about the internal details of Postgres and Postgres development. So, just a quick note
about myself. I've been working for EnterpriseDB for about a year, where I work on EDB Postgres Advanced Server and also on Postgres itself. I've made a couple of minor contributions to Postgres so far. The one in bold there is SKIP LOCKED, which is going to be mentioned in this talk,
which is why I mentioned that, and a couple of other things. Okay. So, this is the structure of the talk. I'm going to start by talking about what a queue is. So, first of all, just a dictionary definition. Outside North America, queue is kind of a normal word for a line of people or cars or something, right? I heard that the term is catching
on in North America as well, thanks to Netflix, which has a queue feature for queuing up films, right? So it's kind of entering common parlance over here as well. So, that definition of the word is essentially the same as the computing definition, which is a list of data. This is the Oxford dictionary definition: a list of data items, commands,
et cetera, stored in a way that allows them to be retrieved in a definite order. It's usually the order of insertion. I've put definite order in bold there so we can talk about what that actually means. I've just got my own informal taxonomy of queue-like things here. First of all, there are queues, proper queues, first in, first out queues
and priority queues and we'll look at those in a moment. And then we have queues with air quotes which are things commonly called queues which might not strictly fit the computer science definition of queues. For example, the IO queues of operating systems which reorder things and merge things and completely unordered queues which are obviously not
technically queues at all if they don't have an order because it's kind of part of the definition. So, firstly, mostly when you hear the word queue, people are usually thinking of FIFO queues first in, first out and they're often used in low-level systems because they have simple implementations where
the ordering is linked to the physical representation or physical layout in memory and there's, you know, circular buffers and link lists and so on, simple ways to implement those things. But slightly more generally, priority queues are the same sort of thing except where there's some explicit logical ordering where you perhaps put a number on things or, you know, something which
is part of the message controls the order that things go in. And the implementation techniques for priority queues usually look a little bit like sorting systems: they involve trees and the sort of things that you see in sorting algorithms. Then we have specialized queues, and these are queues with air quotes. A good example is an elevator (or lift), or an operating system I/O scheduler.
They can achieve better efficiency by merging together requests, for example for blocks that are adjacent to each other on disk, or something like that. But it sort of looks like a queue if you squint and tilt your head slightly, because
you put things in and things come out and in some order which is perhaps complicated to describe and might even depend on the system clock or other kinds of external factors like where the disk head is or something like that. And finally you might have completely unordered or approximately ordered queues. Now often we don't really care about the order
that things come out of a queue that we put things into. For example, if you have a queue for emails that are going to be blasted out to the world, you might not really care if they get sent out in the right order. But probably you at least care that they come out vaguely in a fair order approximately because you don't want it to be possible for a single thing in the queue to get stuck forever. Like imagine you throw a basketball
off the Niagara Falls or something and it gets stuck at the bottom and doesn't manage to escape even though the water mostly is flowing through. We don't want that to happen. So to exclude that kind of thing, an approximate ordering can be a very useful thing to have. So why would you put a queue into a relational database?
Whenever I talk about this subject to some people, they immediately start saying, well, you should be using Redis or you should be using this, that or the other. And there's a whole universe of different technologies that do something related to queuing or messaging. And those things have all different properties and sometimes some of those specialized technologies are the thing you should be using.
I'm certainly not claiming that you should always try and use your relational database to do queuing. But sometimes it can be the right tool for the job. So some of the characteristics that I think make it worth considering using your existing relational database for queuing things. The key one is this top item here.
You want to do some kind of reliable persistent message processing, which is atomic with respect to other database work that you're doing. Now you can achieve transactional semantics with external messaging systems as well. But then you have to bring XA systems or two-phase commit and other complications into your life.
But for many simple tasks, it's reasonable to put things into existing transactions in your existing relational database. So some other advantages to doing that are that you probably already have a system of backups and failover and you've got a whole bunch of infrastructure in place and you don't really want to add any more moving parts.
But obviously this is only going to make sense if your rates of messages and numbers of consumers are sort of in the range that your database can reasonably handle. That kind of goes without saying. But if you look at the first point and the third point there, if you're trying to do something which is transactional with respect to existing work in your database, then you kind of already need your database to be able to keep up with the work that you're doing.
So combining those things in an existing database that you already have can often make a lot of sense. And the fourth point there, obviously if you really like Postgres and you're good at using it and you like it so much that you even come to conferences about it, then it's going to be a point in favor of considering that.
So let's look at some example use cases. If you're mixing database transactions with some kind of external effects, then you might want to use something that looks a bit like a message queue to separate those things. So in this example we want to book a seat on a plane and we're doing a booking management system of some kind.
And we also want to send SMS messages like on the phone to confirm bookings and seat numbers. So these two operations, updating something in the database to say that this customer has bought a seat on the plane or whatever, is a database transaction and also we've got this external system which can send SMSes to people.
Now this is obviously a ridiculous way to do it because it can go horribly wrong. You begin a transaction, insert something into a database to say this customer has bought a seat on the plane. You then send an SMS to the customer and then try and commit. But right at that moment, something goes horribly wrong.
It's probably not going to be an asteroid hitting the server literally, but it could be a temporary loss of network connectivity or something like that. And so we don't commit. So now we've sent this SMS out to tell someone that they have a seat on a plane and they don't have it because we've forgotten all about it. We didn't commit that transaction. Obviously that's not the right way to do it. So that's take one. If we try take two, begin the database transaction, do all the stuff you need to do to book a seat on a plane and commit that.
And next we're going to send the SMS message. But at that time, suddenly there's a flood or you lose network connectivity or something less dramatic happens. And now we haven't sent any SMS and we haven't got any persistent record of the fact that we intended to do that.
We haven't really succeeded in doing our two jobs. It's a better failure mode than the previous one when we told the customer that they had a seat on a plane when in fact they don't. But it's still not ideal. Take three is we separate these two things into two different transactions. Firstly, we begin our transaction, book a seat on a plane, enqueue something.
In that transaction, we're essentially logging our intention to inform the customer. And then we commit. And then a separate piece of machinery says, ah, there's a job for me to process in that queue. I'm going to try and dequeue that and send the SMS.
And then commit if that succeeds. And you can imagine that there are various failure points in that second step there. But so long as sending SMSes is an idempotent operation (however you prefer to pronounce that), doing it twice doesn't matter.
If you don't have such a setup, then it's possible you might send the SMS to the person twice. But if you don't consider that to be a disaster, or preferably if duplicate sends are prevented, then that can't go wrong, and you have a much better system. Obviously, you also have to consider the possibility that sending an SMS fails.
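To make take three concrete, here is a minimal sketch in SQL; the table and column names are purely illustrative, not from the talk:

    BEGIN;
    UPDATE bookings SET seat = '12A' WHERE booking_id = 42;       -- the real work
    INSERT INTO sms_queue (phone, body)
        VALUES ('+6421234567', 'Your seat 12A is confirmed');     -- log the intention
    COMMIT;
    -- A separate worker later selects a row from sms_queue, calls the SMS
    -- gateway, and deletes the row in its own transaction (dequeue protocols below).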
But because you have this queue in place, you have the possibility to retry that later. OK, so some other things you might like to do with a database queue are farming work out to some large pool of worker servers, or any kind of aggregation or something like that that you want to do that doesn't have to be done in transaction,
because you've got some interactive transaction and you want to do it as fast as possible. But you want to make sure that you go and update some other numbers eventually. That's another good reason to use such a design. OK, so let's talk about how you might implement something like that in Postgres.
So the basic ingredients we're going to use here are, unsurprisingly, a table which is going to have rows representing messages. And for controlling the priority, we're just going to be using normal SQL order by. For signaling between different clients that are connected to this database, we're going to be using notify and listen. How many people are familiar with notify and listen? Everyone, pretty much. Yeah, that's good.
And to deal with concurrency, well, there's a few different options, and we'll look at those in a moment. Now, one thing to note is that the relational model and SQL databases generally don't expose details of the physical ordering. So if you want a FIFO queue, there's no way to get that without really doing it explicitly as some kind of ordering.
So really, we just have priority queues in our database. OK, so the basic protocol is going to be something like insert some message into a table, which would be very unsurprising, and then notify somebody using the notify facility.
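In SQL that basic enqueue protocol might look roughly like this (a sketch with assumed table and channel names, which the later dequeue examples will reuse):

    CREATE TABLE job_queue (
        id       bigserial PRIMARY KEY,
        priority int NOT NULL DEFAULT 0,
        payload  text NOT NULL,
        enqueued timestamptz NOT NULL DEFAULT now()
    );

    BEGIN;
    -- ... whatever other work belongs in this transaction ...
    INSERT INTO job_queue (priority, payload) VALUES (0, 'send SMS to +6421234567');
    NOTIFY job_queue_channel;   -- only delivered to listeners after COMMIT
    COMMIT;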
And of course, notify actually defers notifications until after commit succeeds, so it's kind of a post-commit activity. So if it fails after notify returns, but before you commit, then no notification takes place, which is good. One thing to note about enqueuing things is that if you're very sensitive to time ordering, and you've got sessions inserting concurrently,
it is actually quite difficult to generate a key that increases monotonically with respect to the order of visibility of transactions, because commit order is not necessarily the same as the order that you generate keys in.
I believe there are some other databases that have ways of doing that. I think Oracle has a way to record a commit sequence number and things. We don't have anything like that in Postgres, and that's actually quite a difficult problem. We do? OK, so that's something that's part of the...
So you have to think about that quite hard, I think, to get that to work, if you've got concurrent insertions. But certainly doing it naively by just using a sequence or something, it's not guaranteed to work, so that's just a small note.
Yes, you could, but let's say two different... Forgetting the possibility that two people might see the same time exactly, even if you forget that problem, it's still possible that session A sees a time that's earlier than session B, but then session B commits before session A.
So I'm talking specifically about the correlation between the order that these transactions become visible to someone else, and the time, and it's difficult. And the reason that Oracle provides a way to do a commit sequence number is specifically because they want to...
This is my speculation, but I believe they needed that to do replication, and that's specifically something that I think Andrew is referring to. Yes. Oh, sorry. If it's important. It would be important if you were doing some kind of replication type things,
and that's probably what Andrew is referring to, I think. But most people don't care about that. It depends whether there is actually a strict dependency between the messages in your queue or not. And if you're just using a queue to queue up emails that need to go out or something like that, then you probably don't care.
There isn't really a strict... I'll come back to that point, though. So as for dequeuing protocols, here's take one, very naive. Just go and select one row in the correct order. This will work as long as you don't have concurrent sessions doing it,
and that's obviously not going to be good enough for many use cases, but it could be. This is just a starting point, naively selecting single rows. Doing something and then deleting the row. It's a very simple approach. So if your isolation level is anything below serializable and you try and do this with concurrent sessions,
then you're going to get into trouble pretty soon. And if you try and do it at the serializable level, then at most one overlapping session is going to be able to succeed with these transactions. So this is a kind of worst-case workload for serializable isolation. So we can only really get one session doing this at a time and have it work.
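Take one, sketched against the illustrative table from before:

    BEGIN;
    SELECT id, payload FROM job_queue ORDER BY priority, id LIMIT 1;
    -- do the work for that message here
    DELETE FROM job_queue WHERE id = 1;   -- the id returned by the SELECT above
    COMMIT;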
These are just some numbers from my laptop. I managed to pull out somewhere a bit under 3,000 messages per second, and that's actually just the transaction rate that I can commit simple, small transactions on this particular computer. And those numbers could be completely different on different types of disk subsystems.
So presumably, pulling things out one at a time is not going to be enough for many use cases. So a typical approach is to use explicit row locking. So exactly the same as before, we're selecting rows in order, limiting it to one, but using FOR UPDATE to get a row lock on a single row.
So if we try and do that with loads of clients, here I went up to eight, the total throughput across all clients goes up when you get to two clients but then starts going down. And that shouldn't be surprising because these clients are all going to be competing for locks on the same row,
whichever row is the head of the queue, basically the next. They're all trying to select the same one, so only one of the clients succeeds. And the reason why we got some additional performance when we went to two clients is to do with pipelining and the other work that's going on in the chatter between the client and the server and so on. But it doesn't get you much extra performance and then when you get to three, it starts going down.
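The dequeue statement for this variant would look something like this (illustrative names again):

    BEGIN;
    SELECT id, payload
    FROM job_queue
    ORDER BY priority, id
    LIMIT 1
    FOR UPDATE;                           -- blocks if the head row is already locked
    -- do the work, then delete the row and commit as before
    COMMIT;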
So the next step is to add SKIP LOCKED. Or if you're using Postgres versions before 9.5, you could use advisory locks with a little bit of extra work.
And that says basically instead of queuing up to wait for the same row, they're going to step over the locked rows and enable you to get some more concurrency. And here you can see that it goes up fairly nicely until it gets to about the number of CPU cores I happen to have.
And then it goes up just a little bit more and then you can see it starting to plateau. Presumably if I tried with loads more clients, it would be plateauing or maybe going down slightly or something like that. So what's going on here is that the first client comes along and manages to lock the first row. The second client comes in and you see it tries to lock the first row. Can't get the lock so it just moves to the next one and so on. So you get this nice spreading out of work which enables it to scale.
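The only change from the previous sketch is the locking clause:

    BEGIN;
    SELECT id, payload
    FROM job_queue
    ORDER BY priority, id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;               -- step over rows other workers have locked
    -- do the work, delete the row, commit
    COMMIT;

A common variant, not shown in the talk, folds the whole dequeue into a single DELETE ... WHERE id = (SELECT id ... FOR UPDATE SKIP LOCKED) ... RETURNING statement, so the row is gone as soon as that transaction commits.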
So coming back to the point we made earlier, the order by clause is controlling the time that we start processing each item. But it doesn't control the order that we commit. Those things are probably highly correlated but they're not strictly correlated.
Also anything that dequeues and then rolls back will cause further perturbation of the processing order. So this slightly looser ordering is really good for concurrency while still being approximately fair to all messages. And that's important to prevent the scenario I described before as a basketball going over
the Niagara Falls and getting trapped at the bottom in the churn and never coming out. And what that really means is that this is an extreme example but if you've got an email queue and the rows in this table represent emails that need to be sent out. What you don't want is for some emails to be sent with a very high latency
and some with a very low latency because of the randomness of the churn of the ordering. So you still want some kind of loose order typically. Okay, so what kind of problems do we face when we try to implement these things? So that very simple protocol I described had messages which are locked, worked on by some kind
of worker process while they're locked and then deleted in the same transaction which is very simple. But it doesn't really let us deal with failure modes very well. For example, what if you can't send an SMS, if that's what the worker is trying to do, you probably want to retry.
So one approach is to have a retry counter on messages, leave them in the queue and then give up after some maximum number of retries. But you probably don't want to retry at full speed, because if the SMS service is not working right now, the chances are it's still not going to be working again immediately.
Maybe you want to try again once quickly and then maybe you want to wait a minute or something like that because the service might return later. So then you might want to have a delay time which you put into the messaging queue or you might want to manage that with some other approach. And another kind of resilience which may or may not be useful depending on what sort of problem you're working on.
If it's possible that backends might crash or hang or get stuck but you still want work to go on, then it is possible to design slightly more complex systems where messages can be stolen from an existing worker. And you sort of do that with a protocol involving two transactions which I'm not going to go into in detail.
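One possible shape for the retry counter and the delay, again with assumed column names rather than anything from the slides:

    ALTER TABLE job_queue
        ADD COLUMN attempts   int         NOT NULL DEFAULT 0,
        ADD COLUMN not_before timestamptz NOT NULL DEFAULT now();

    -- On failure, keep the message but push it back with a delay:
    UPDATE job_queue
    SET attempts = attempts + 1,
        not_before = now() + interval '1 minute'
    WHERE id = 1;                          -- the id of the message that failed

    -- The dequeue query then also filters on something like:
    --   WHERE not_before <= now() AND attempts < 10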
Okay, some other considerations when doing these kinds of things. I've seen people use integers as a primary key and they pretty quickly run out and get into trouble. Then there's the question of sequences being inappropriate for generating a strict order if concurrency is involved.
Another problem that you can run into is if you're using B-trees and you've got ordering which is not correlated to the insert order, you can get into a problem of generating a lot of bloat. And the next problem which I mentioned here is something which I haven't actually ever seen on Postgres but I have seen on DB2.
I think it should in theory apply to Postgres although I haven't come across it myself which is that if you've got a table that sometimes has no rows in it and sometimes has millions of rows in it and analyze runs at kind of pretty arbitrary times, you can finish up with some really bad statistics one day kind of at random.
So DB2 actually has a feature to deal with that and I've kind of wondered before whether Postgres ought to have that. You can basically say this table is volatile and it causes the statistics to be tweaked in a certain way that make it less susceptible to going crazy just because it happened to run analyze at a moment when it was empty. Yeah.
So maybe that's a feature that could be looked at for Postgres.
Right, so if there's no ordering requirement at all, in theory you don't actually need to have any indexes on a queue table if you're prepared to just use the CTID to refer to individual rows that you select to lock. Although that might, some people might not be very comfortable with having a table with no primary key or unique identifier or whatever for aesthetic reasons or others.
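A sketch of that index-free variant; it leans on the fact that the locked row can't be updated or moved by anyone else before we delete it in the same statement, but treat it as an illustration rather than a recommendation:

    DELETE FROM job_queue
    WHERE ctid = (SELECT ctid
                  FROM job_queue
                  LIMIT 1
                  FOR UPDATE SKIP LOCKED)
    RETURNING id, payload;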
So and then the last point is actually something I'm going to talk about in more detail. So default vacuum settings could easily be completely insufficient for this kind of workload if you're doing high numbers of message throughput.
And that's not really specific to queue-type workloads at all really; that's just any kind of busy workload that's doing a lot of writes at a high transaction rate. Vacuum can easily become a problem. So here I've plotted some graphs that show in green the number of dead tuples in a queue-like table.
And in magenta, what do you call that color, it's showing the number of messages it's dequeuing per second, using I think a couple of clients connected concurrently. You can see that it's sort of humming along there just slightly below 4,000 messages per second coming out of this queue.
In the top left hand graph here, you can see that the number of dead rows goes up, up, up. And then every minute, that's seconds along the bottom there, so every minute you see that sawtooth shape as vacuum comes along and vacuums up all of those dead tuples. And that's actually looking not too bad, I think, because I'm not really making it do too much work, and that's using default settings, straight-out-of-the-box Postgres, built and installed, and it's kind of doing okay. But if you move over to the next one in the top right hand corner, you can see that the rate at which it dequeues messages starts to go crazy. And this is because in this case I've actually pumped tons and tons of message rows into the table and produced this completely bizarre, unstable performance.
That's a problem. The cyan one, sorry, the cyan one at the bottom is the number of live rows in the table, which in the top left chart is just running along the bottom. In the second one it's backing up, and the reason for that is that I started inserting tons more into it to see if I could kind of overload it.
So what I was trying to do was to kind of overload the whole vacuum setup and make it misbehave, to sort of see a disaster scenario. And, you know, presumably if this happened to you, you'd be unhappy, because you probably don't want your message throughput to become really unstable like that.
Yeah, so at about this point here I stopped inserting new rows.
So in the top right hand chart there, you can see that somewhere around 4,000 there's a line that it wants to be at, but it keeps falling down from there. Anyway, the short version here is that, that's right, that's right.
Yeah, just to repeat, the magenta is the number of messages pulled out per second, dequeued per second. The green is the number of dead tuples and the sort of bluish color is the number of live tuples.
The number of live tuples is going up because in this test I was inserting a megaton, like way more than it could pull out, because I had more clients doing that concurrently. So obviously that's completely unacceptable, and, you know, without too much analysis the basic problem is that vacuum isn't keeping up and it's interfering with the performance. And that's just using the default settings, which are obviously insufficient. In the bottom left chart here, it's the same setup but tweaked so that it vacuums more often; it's got a lower nap time. And we see that the performance becomes very zigzaggy, but at least it doesn't have all those different modes and go completely crazy like that. And then in this final chart here I pulled the nap time right down, to 10 seconds or 8 seconds I think it was, and performance becomes hopefully slightly more acceptable here. But still we see the effects of vacuum interfering with the workload at the back there.
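The talk only mentions lowering autovacuum's nap time; a hedged sketch of that plus the usual per-table levers (values are illustrative, not the ones used for these graphs):

    -- postgresql.conf: wake the autovacuum launcher more often than the
    -- default of one minute (the "nap time" mentioned above).
    --   autovacuum_naptime = 10s

    -- Per-table: trigger vacuum after far fewer dead rows, and remove the
    -- cost-based throttling for this table.
    ALTER TABLE job_queue SET (
        autovacuum_vacuum_scale_factor = 0.0,
        autovacuum_vacuum_threshold    = 1000,
        autovacuum_vacuum_cost_delay   = 0
    );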
Right. Yeah.
Yeah. Okay. So just to fill it in for the recording, I'll just say: the Slony guys, who have a kind of a queue for the change log, change set log, whatever it's called, they just completely escaped this by having multiple tables and switching between them and truncating, so that vacuum wouldn't get involved.
Which is, yeah.
Right.
Yeah.
Hmm. Hmm.
Yeah. Okay. And I'll just try and summarize that if I can for the microphone. When you get a huge backlog, so the queue becomes very large, and then you catch up processing it, you can sometimes finish up with used pages sparsely spread around the heap.
And so it doesn't ever manage to truncate the thing, I suppose that's the basic summary right? So then you get this kind of ongoing effect caused by an earlier moment when you had a very large backlog.
But isn't it the case that if you eventually manage to have an empty queue, you'll still eventually have an opportunity to truncate, right? With a bit of luck, yeah.
Yeah, I don't have too much more, but we can talk about it again in a moment.
Okay, so what things could we do better in Postgres that would help with this kind of workload? Well, one thing I've noticed is that if you have many consumers and you're using the LISTEN/NOTIFY approach to signaling, you can get a kind of stampede, a thundering herd type problem, where you just put one thing in the queue and you notify,
and everybody that's listening on the queue, which might be, let's say you have 64 of these workers, whatever, they're all going to wake up and then go and look in the queue, and that's kind of a waste of CPU. So if we had a kind of signaling mechanism which was more like a function that you call and you just block waiting for it to return, and you had a way to notify just one listener, something like that could be a good addition I think to Postgres
and could be done as an extension to support this kind of workload. The next one is almost a silly thing to put on a slide because it's obviously a gigantic engineering task and not coming to a database near you soon.
It isn't really specific to queuing, but just looking at the behavior of vacuum and trying to get smooth performance from a system like this is a good reminder that other databases don't face this kind of problem, because they use the undo log approach to multi-version concurrency control.
And no doubt such approaches bring other problems into your life, but perhaps at some point Postgres will address that and then you just won't have to worry about that kind of vacuuming problem. Another thing which I thought about a bit in the process of working with queues and in the process of working on SKIP LOCKED
is that queue-like workloads are the worst case, or one of the worst cases, for serializable transaction isolation. And yet serializable transaction isolation is a really good thing that you might want to be using. So it occurred to me that there could be something that, if you sort of tilt your head sideways and squint, might sort of be a bit like SKIP LOCKED
which kicks in when you're using serializable. If you have two sessions that both tell the database that they want some rows and they don't care what order they come in and they've got a limit clause saying you only want one row, then maybe there's some way you could teach the executor to show different rows to the different sessions
because some other session has an SI read lock. It would just be reordering the rows, but it would mean that two sessions that are running at the same time that both say, hey, give me one row out of whatever there is.
I don't care which one it is. If two sessions say that and they're both serializable, maybe one would see one row and another would see another row as a strategy to avoid a strictly unnecessary conflict. That's maybe a bit weird. I don't know. But in some way it's sort of conceptually linked to SKIP LOCKED, but it doesn't have to be so brutal,
because SKIP LOCKED is a kind of primitive mechanism. It sort of shows you an inconsistent view of the database, right? It hides a row from you because someone else happens to have it locked. But maybe there's a more gentle way to do that that could be somehow worked into the SSI mechanism, or rather the executor when it's running in that mode.
That's just an idea. And that's it. Any more questions or comments? Yeah, so Postgres has had advisory locking for a long time
and it's a thing where, using a number, you can say please lock number 42. And if anyone else tries to take advisory lock number 42, they'll have to wait. Or you can try-lock 42 and it'll succeed or fail. So if there's something in your queue-like table
which is some kind of ID that you can munge into an integer, or is an integer for example, then you can use that: you'd put it in the WHERE clause. So whereas SKIP LOCKED goes in the FOR UPDATE clause (you say SELECT blah FROM blah FOR UPDATE SKIP LOCKED LIMIT 1), in this case you would put it into the WHERE clause.
You'd say WHERE pg_try_advisory_lock(id), if the ID is an integer, or a hash of the ID or whatever, some way to convert it into a number, and that's a way to get it.
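Roughly like this (a sketch; beware that the locking function can be evaluated for more rows than the one you end up taking, and that the lock must be released explicitly once the work is done):

    SELECT id, payload
    FROM job_queue
    WHERE pg_try_advisory_lock(id)        -- or pg_try_advisory_lock(hashtext(id::text))
    ORDER BY priority, id
    LIMIT 1;

    -- ... process and delete the row, then release the lock:
    SELECT pg_advisory_unlock(1);         -- the id that was locked above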
So the question is, is there a canned sort of thing we can use? So I'm aware of queue_classic for Ruby people. That's a library that I haven't used. I don't do Ruby myself, but I gather it's quite popular.
And it does all of these tricks and provides a simple interface to that.
So is the problem that the Ruby Python client... Yeah, but does the Ruby driver for libpq not expose the socket
so you can't use select to wait for... Or did whoever implement that not use that trick so that you would essentially be...
Where was the memory leaked or wasted?
Well I think notify and listen are good and very useful.
And it's good that the semantics with sending on commit and so on, that's all great. But I do think that we could probably, as I said in my slide, do something a little bit better. Particularly something that can deliver to one of the listeners. And that means changing, so that you can know how many listeners there are,
that means changing to a model where someone's actually blocking while they wait with a function call or something like that. I think we could probably do something a little bit better. At the back there.
Yeah, I think you're right.
Yeah, I think we could definitely do better than listen and notify.
I think probably a B-tree is the right thing for this. And I think if you have an ordering which is basically going up, like for example the time, and you're consuming things in that order, then a B-tree will behave very well, I think. B-tree bloat comes from filling up a B-tree and then deleting everything out of it,
but in a kind of decimating way so that you leave bits scattered all over it.
Do you have that problem? Interesting. And what kind of thing are you using as the key in the index?
How do you get to that pattern where you've filled up a range of values and then deleted bits from that range?
OK, so you're indexing it by insertion time and they specify when it would be.
But pg_repack would reorganize tuples in the heap.
Right, I see. I see, right. Yeah.
OK. Any other questions?
Ah, yeah. Yeah, that's an interesting idea. I think it's come up on at least one of the mailing lists a couple of times. No, I don't have a plan to do that at the moment.
It does sound interesting though. The thing with LIMIT, the first thing is: do you want to have LIMIT on UPDATE? That seems kind of weird, right? Putting a LIMIT on a SELECT is kind of filtering the results from a query, which seems OK. But doing an UPDATE where you have a LIMIT on it, what does that actually mean in terms of relational algebra or something?
It kind of seems a bit weird. Does it? Oh, then we should add that, yeah. Yeah, I mean, yeah, yeah.
So, yeah.
I think we're out of time and we need to wrap up.