Postgres Performance in 15 Minutes

Video in TIB AV-Portal: Postgres Performance in 15 Minutes

Formal Metadata

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
In 15 minutes, plus Q&A time, Postgres expert Josh Berkus will explain the essentials of making your database performance "good enough" that you can ignore it and move on to other things. This will include:
- Why database configuration is less than 20% of performance
- The 14 settings most people need
- Why connection pooling is essential
- Avoiding bad hardware
- DB performance for the public cloud
- Stupid things your app does which kill performance
Enjoy this fast-paced roundup of PostgreSQL performance essentials.
I'm going to talk a little bit about some basic things you can do to improve Postgres performance for your Django application. That's our running elephant here. Few people know that elephants can run at 20 to 25 miles an hour, which is actually quite fast. Likewise, Postgres can handle thousands of requests per second, so if you're not getting thousands of requests per second, there is probably a reason why.
So what you do is you log in to Postgres, you go into postgresql.conf, you set the hidden parameter "go faster" to 10, and then you're done. Any questions? Now seriously, unfortunately it's not that easy. We in the Postgres world spend a lot of time talking about how we can make things fast. We probably spend more time on that than on just about anything else, except maybe the patch queue and whose turn it is to review things. As a result, if there is something that we could do by default to make Postgres faster, we probably already did it. So the things you need to do to make Postgres faster are going to require work; you can't just change a few configuration settings.
In fact, one of the things that I get paid for is performance tuning at different sites, and this is more or less how I spend my time there. You'll notice that tuning the configuration is a pretty small minority of how I spend my time, and the effect that tuning the configuration has is an even smaller minority. In some environments, changing Postgres settings will have no measurable effect at all on database throughput. So instead I'm going to talk about some of the other things you can do, which have much larger effects on database throughput and responsiveness.
The first one is: do less. The fastest database request is the one you don't make at all. Any time you're adding a piece of code that's going to work with the database, any time you're referencing the ORM or whatever, ask first of all: do I need the database to answer this? Is this something that I could be answering even within the individual Django session, without calling out to the database?
One of the things to think about there is caching, of course. The first option is just the built-in results cache: actually use the results cache. That seems obvious, but for some reason people don't do it as much as they could. If the results cache isn't enough because you need to share things among several backends, then you can use things like Redis and memcached. Also, don't forget about using CDNs for caching large objects that arguably shouldn't be stored in the database at all. There are sites out there storing images in the database, storing compressed data, and that sort of thing. If you do that, then when you retrieve an object once, copy it out of Postgres, load it up into a CDN, and reference it in the CDN by file name instead of retrieving it from the database all the time, because it's a lot easier to scale a CDN than it is to scale a relational database. Things that you don't need a relational database for, put them elsewhere. So caching is the first part: do as much caching as you reasonably can given your concurrency and data consistency model.
The other thing that we see a lot of in doing less is best shown through common mistakes that people make, one of which is polling. I know this is a very simplified example, but it's something I see a lot with Celery-based apps and others: let's have every backend poll the database as fast as it can, polling for jobs with no wait, or with a tiny little wait like 10 milliseconds. This generates thousands to hundreds of thousands of database requests per second, which can end up being the majority of the database traffic.
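A rough back-of-the-envelope sketch shows where those request counts come from, and one common way to soften them. The worker count, the 1 ms query time and the backoff parameters below are assumptions chosen for illustration; exponential backoff is a general technique, not something the talk prescribes.

```python
from itertools import islice
from typing import Iterator


def polls_per_second(workers: int, wait_seconds: float,
                     query_seconds: float = 0.001) -> float:
    """Requests per second generated by `workers` backends polling in a loop.

    query_seconds is an assumed cost of one empty poll (1 ms by default).
    """
    return workers / (wait_seconds + query_seconds)


def backoff_waits(base: float = 0.01, cap: float = 5.0,
                  factor: float = 2.0) -> Iterator[float]:
    """Yield a growing wait for each consecutive empty poll, capped at `cap`."""
    wait = base
    while True:
        yield wait
        wait = min(cap, wait * factor)


# 200 hypothetical workers polling every 10 ms versus every 5 s.
print(round(polls_per_second(200, 0.010)), "requests/s with a 10 ms wait")
print(round(polls_per_second(200, 5.0)), "requests/s with a 5 s wait")
print(list(islice(backoff_waits(), 5)), "first waits with backoff")
```

A worker would reset the backoff to `base` as soon as a poll returns a job, so busy periods stay responsive while idle periods stop hammering the database.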
Another anti-pattern is requesting data that you already have. This is from a real-life example: look up users by the user ID and then return the user ID, which is the equivalent of SELECT id FROM users WHERE id = something. The database does not need to be in the loop here at all; you can save yourself that request. The same goes for requesting data you don't need, for example returning an entire table in order to get the first row. That's great when there's only one row in the table; when there are 10 million rows in the table, it does not work so well. As part of this, if you have wide tables, use the values() or values_list() methods in order to return only the columns you need. Particularly if the table has large objects, large binary fields, large text, and that sort of thing, it can make a huge difference.
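The difference between "fetch everything, keep one value" and "ask only for what you need" can be sketched as follows. This assumes SQLite (from the Python standard library) purely as a runnable stand-in for Postgres, and the `users` table with its wide `bio` column is hypothetical. In Django the frugal version corresponds roughly to `values_list()` plus slicing on the queryset.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, bio) VALUES (?, ?, ?)",
    [(i, f"user{i}", "x" * 2000) for i in range(1, 1001)],
)

# Wasteful: pull every row and every column, then keep one value in Python.
all_rows = conn.execute("SELECT * FROM users ORDER BY id").fetchall()
first_name_wasteful = all_rows[0][1]

# Frugal: ask for one column of one row and let the database stop early.
first_name_frugal = conn.execute(
    "SELECT name FROM users ORDER BY id LIMIT 1"
).fetchone()[0]

# Rough measure of how much data crossed the connection in each case.
transferred_wasteful = sum(len(str(v)) for row in all_rows for v in row)
transferred_frugal = len(first_name_frugal)

print(first_name_wasteful, transferred_wasteful)
print(first_name_frugal, transferred_frugal)
```

Both versions produce the same answer; only the amount of data moved differs, and that gap grows with every row and every wide column in the table.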
Another thing that we see a lot of is doing joins in the application code. That is: let's get this list of things, take the IDs of those things, loop over them, and request each related set of rows from the database one at a time. This means that if my roster has 150 players in it, I'm going to make 150 separate requests to the database, each a round trip with its own latency, to get those players' data. We've spent a good 15 to 20 years in Postgres land optimizing joins and trying to make them perform fast. Let us do our job: use the multi-model stuff so that it gets passed down as a join to the database, and have the database return a joined result set instead of doing what amounts to a nested-loop join in your application.
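The roster example can be sketched by counting queries for both approaches. SQLite is again assumed only as a runnable stand-in for Postgres, and the `roster` and `players` tables and the `run` helper are hypothetical. In Django, `select_related()` and `prefetch_related()` are the usual ways to get the joined behavior from the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
conn.executescript("""
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE roster (id INTEGER PRIMARY KEY, player_id INTEGER);
""")
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [(i, f"player{i}") for i in range(1, 151)])
conn.executemany("INSERT INTO roster VALUES (?, ?)",
                 [(i, i) for i in range(1, 151)])

queries = 0


def run(sql, params=()):
    """Execute a statement and count it as one round trip to the database."""
    global queries
    queries += 1
    return conn.execute(sql, params).fetchall()


# Join done in application code: one query for the list, then one per player.
player_ids = [row[0] for row in run("SELECT player_id FROM roster ORDER BY id")]
names_loop = [run("SELECT name FROM players WHERE id = ?", (pid,))[0][0]
              for pid in player_ids]
loop_queries = queries

# Join done by the database: a single round trip.
queries = 0
names_join = [row[0] for row in run(
    "SELECT p.name FROM roster r JOIN players p ON p.id = r.player_id "
    "ORDER BY r.id")]
join_queries = queries

print(loop_queries, "queries in the loop version;", join_queries, "with a join")
```

With an in-memory database the round trips are nearly free; over a network each of those extra queries also pays the connection latency the talk mentions.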
Having eliminated everything you can and started to do less, the second thing to look at is: let's get rid of some resource-hungry requests, or fix them. If you do a lot of database performance optimization like I do, one of the things you discover is that a tiny fraction of the requests against the database consume a vast majority of the system resources, and those are the only ones you really care about. Maybe all that stuff in the long tail is not as efficient as it could be, but you don't care, because it won't make that much of a difference to fix it.
One of the really good tools for finding these in Postgres is a thing called pgBadger. The way pgBadger works is that you turn on logging for Postgres with every logging option it's got, then you collect those logs and run them through this program. It's written in Perl, but you don't have to hack it to use it. What it provides you with is an incredibly detailed report of everything you're doing. What you really care about for slow requests is that it's got this lovely set of top-query reports: slowest individual queries, most time-consuming queries, most frequent queries, and so on. These are the queries that, through repetition or through individually running slowly, took up the most resources. Those are the ones that you want to fix.
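The idea behind a "most time-consuming queries" report can be sketched in a few lines: group log entries by query text and rank by total time, so that a cheap query run thousands of times shows up next to a slow one run twice. This is a simplified illustration, not how pgBadger is implemented; `top_queries` and the sample log are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_queries(log: Iterable[Tuple[str, float]],
                limit: int = 3) -> List[Tuple[str, int, float, float]]:
    """Return (query, calls, total_ms, share_of_total), ranked by total time."""
    calls: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)
    for query, duration_ms in log:
        calls[query] += 1
        total[query] += duration_ms
    grand_total = sum(total.values())
    if grand_total == 0:
        return []
    ranked = sorted(total, key=lambda q: total[q], reverse=True)[:limit]
    return [(q, calls[q], total[q], total[q] / grand_total) for q in ranked]


# Invented sample: (normalized query text, duration in milliseconds).
sample_log = (
    [("SELECT * FROM report_big", 4000.0)] * 2
    + [("SELECT name FROM users WHERE id = ?", 0.5)] * 2000
    + [("SELECT * FROM sessions WHERE key = ?", 20.0)] * 500
)

report = top_queries(sample_log)
for query, n, total_ms, share in report:
    print(f"{share:6.1%}  {n:5d} calls  {total_ms:9.1f} ms  {query}")
```

In this sample the 20 ms query wins through sheer repetition, which is exactly the kind of request that is easy to miss when looking only at individually slow queries.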
Once you've found those, you start looking at ways to fix them. One of the easiest is adding indexes, because you discover: hey, I'm doing this lookup all the time on this one column that has no index, and now that my table has a million rows, that's kind of bad. Another is fixing filter expressions, your searches and that sort of thing, so that they can use an index. For example, if you're comparing a date value to something that has been converted in Python, we may not be able to use an index on the back end in Postgres for that comparison. Another is to analyze the table: sometimes Postgres's statistics get out of date. That can be because you've got weird update patterns or because autoanalyze isn't keeping up; it could be because you turned autovacuum and autoanalyze off, which is a bad idea. Sometimes you just need to run ANALYZE manually.
Some other special cases: Django's lookup methods allow you to do a lot of text searching. An important thing to understand is that by default that text searching is not supported by indexes in most databases. Even for startswith, if your database is not built with the C locale but with an actual Unicode locale, you need to create a special index that will support startswith, using this thing called varchar_pattern_ops. Search for that string; there are more detailed instructions you can find by Googling it. If you need case-insensitive matching, then you need to use a function or use case-insensitive text (the citext type) in Postgres. For icontains, you basically need to look at some of the Django extensions that support Postgres full-text search.
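As a sketch of what those special indexes look like, the helpers below build the two kinds of Postgres `CREATE INDEX` statements mentioned above: one using the `varchar_pattern_ops` operator class for prefix (startswith) searches, and one on `lower(column)` for case-insensitive matching. The helper names, the index naming scheme and the `users`/`email` example are assumptions for illustration; the statements are only generated as text here, not executed.

```python
def _check_identifier(name: str) -> str:
    """Allow only plain identifiers so the generated SQL cannot be injected into."""
    if not name.isidentifier():
        raise ValueError(f"not a plain SQL identifier: {name!r}")
    return name


def startswith_index_sql(table: str, column: str) -> str:
    """Index that lets LIKE 'prefix%' use an index in a non-C locale.

    varchar_pattern_ops applies to varchar columns; text columns would use
    text_pattern_ops instead.
    """
    t, c = _check_identifier(table), _check_identifier(column)
    return f"CREATE INDEX {t}_{c}_like ON {t} ({c} varchar_pattern_ops);"


def case_insensitive_index_sql(table: str, column: str) -> str:
    """Expression index for queries that compare lower(column) to a value."""
    t, c = _check_identifier(table), _check_identifier(column)
    return f"CREATE INDEX {t}_{c}_lower ON {t} (lower({c}));"


print(startswith_index_sql("users", "email"))
print(case_insensitive_index_sql("users", "email"))
```

An expression index is only used when the query filters on the same expression, so the application side has to compare against `lower(email)` as well.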
Now, most people don't use explicit transactions with their Django applications, but some people do. If you're doing financial software or other things where you need atomic multi-step actions, you can get into trouble in a few areas. One: don't do begin, write, then read, read, read, read, read, read, commit, because we're waiting for all of those reads before we do the commit. If those reads don't need to be part of the transaction, take them out. Another thing not to do: begin, write some stuff, read some stuff, now render some pages for the user to look at, and only then commit the transaction. The worst would be: wait for a user response and then send another query in the same transaction. While that's happening, what we have on the database server is what's called an idle transaction, and idle transactions consume resources, especially locks.
The issue with locks is that locks can block other activity. This is a quick query to check for queries that are being blocked by locks. The general thing about being blocked by locks is that there's no amount of system resources you can throw at it that will make things better, because locks are self-limiting. So this is something to look at and think about if you're doing concurrent writes.
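The "keep the transaction short" advice can be sketched like this: do the reads that only feed validation or rendering before opening the transaction, keep only the writes between BEGIN and COMMIT, and render after the commit. SQLite is assumed as a runnable stand-in for Postgres, and the `accounts` table and `transfer` function are hypothetical; in Django the transactional part would typically sit inside `transaction.atomic()`.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries are exactly the BEGIN/COMMIT statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    # Cheap pre-check outside the transaction: no locks are held yet.
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
    if balance < amount:
        return False

    # Only the writes live inside the transaction.
    conn.execute("BEGIN")
    try:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?", (amount, src, amount))
        if cur.rowcount != 1:      # balance changed since the pre-check
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst))
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    # Rendering pages or waiting for the user happens here, after COMMIT.
    return True


print(transfer(conn, 1, 2, 30))
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```

The balance is re-checked inside the UPDATE itself, so moving the read out of the transaction does not weaken correctness; it only shortens the time during which locks are held.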
Now, the next portion is: let's get some adequate hardware. Notice I'm not saying the best hardware, because really the best hardware is the hardware that is fast enough for what you need and no more; beyond that you're only spending extra money for performance you don't actually need. The corollary to this is that, at its best, Postgres will be as fast as your hardware; we can't be faster than your hardware. If you're throwing it on an AWS T1 tiny instance, do not expect to serve 25,000 requests per second. It's not going to happen.
One of the areas where people tend to under-resource chronically is I/O, partly because in virtual hosting the various hosts make I/O the most expensive resource, and for that reason people tend to under-allocate it. As a result they end up with crappy performance even though they have plenty of CPU and RAM left available. Postgres writes stuff all the time: obviously stuff like writes and commits, but even on a read-mostly workload Postgres is doing things like writing to support replication, doing this thing called hint bits, which I don't want to explain but which causes a lot of constant background writes, and writing statistics about what's in the database. So if you are limited by the I/O throughput on your system, that is going to limit how Postgres performs. So get adequate I/O.
Some examples here: if you're on your own hardware, just go ahead and move to SSDs if you haven't already. There's no good reason not to; they're not even more expensive anymore. If you're in the cloud, look at your IOPS allocation and the storage that you're actually getting; increasing it is not that expensive and can make an order-of-magnitude difference in performance. Also, stop using ext3 on Linux. There have also been some Linux kernel issues in the recent past, which you can read about, that made for terrible I/O performance.
Now, in terms of RAM, it's completely threshold-based. You basically have three thresholds of RAM for a functional system: one is that you can cache the data you need most of the time, the second is that you can cache the whole database, and the third is that the database fits in Postgres's dedicated cache. Where this matters in allocating RAM is if you're near one of these thresholds and, by getting just a little bit more RAM, you can move up to the better threshold; then it is generally worth it.
Some cloud tips here. For Amazon Web Services, currently, use general-purpose storage and over-allocate your storage; you will actually get better throughput than with the other options. Postgres as a service, which is offered by various companies (Heroku, et cetera), saves you administration; it doesn't necessarily help you with performance. So if you are actually performance-constrained by the database, you might end up branching off on your own. And make sure that your app servers and Postgres are in the same availability zone, because latency can kill you.
Now a little bit about scaling infrastructure. Let's assume you've gotten adequate hardware and that's still not doing it for you, so you actually need to scale your infrastructure. Here are a couple of quick, easy things. One is: use recent versions of Postgres. We put performance improvements of various kinds in every release, so upgrading is worthwhile. Next, give Postgres its own server instance. Databases tend to use all the resources, which means they don't share well with other kinds of applications.
The other thing is: use PgBouncer for connection pooling, specifically transaction pooling, with your application, because in Postgres extra connections cost resources even if they're idle. So if you have hundreds of extra connections, you're paying for that. The way PgBouncer works is that it's an event-based pooler: it accepts connections coming in, but a connection only gets handed a real database connection when you actually have a query to run or when you're in a transaction. That saves you a lot of resources on the database.
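The core idea of transaction pooling can be shown with a toy model: clients stay connected to the pooler, but a server connection is borrowed only for the duration of a transaction and returned right after. This is a deliberately simplified sketch, not PgBouncer's implementation; the `TransactionPool` class and the numbers are hypothetical, and a real pooler would queue a waiting client instead of raising an error.

```python
from contextlib import contextmanager
from typing import Iterator, List


class TransactionPool:
    """Toy model of transaction pooling: many clients, few server connections."""

    def __init__(self, server_connections: int) -> None:
        self.size = server_connections
        self.free: List[int] = list(range(server_connections))
        self.peak_in_use = 0

    @contextmanager
    def transaction(self) -> Iterator[int]:
        if not self.free:
            # A real pooler would make the client wait here instead.
            raise RuntimeError("all server connections are busy")
        server_conn = self.free.pop()
        self.peak_in_use = max(self.peak_in_use, self.size - len(self.free))
        try:
            yield server_conn            # held only while the transaction runs
        finally:
            self.free.append(server_conn)


pool = TransactionPool(server_connections=5)

# 300 hypothetical client connections, each running one short transaction.
for client in range(300):
    with pool.transaction() as server_conn:
        pass  # the client's queries would run on server_conn here

print("clients served:", 300, "peak server connections:", pool.peak_in_use)
```

Because each client holds a server connection only while it is inside a transaction, the database sees a handful of busy connections instead of hundreds of mostly idle ones.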
If that's not enough, and you have replicas anyway for redundancy, look at load balancing some of those read requests to your replicas. Load balancing requires some kind of a proxy. There are various third-party proxies out there you can use: PgBouncer kind of does this, HAProxy, that sort of thing. But the most effective way is to use Django database routers, because only within the application code do you know whether you're about to do a read or a write. For that reason, using Django routers and actually having a read connection and a write connection is going to help you a lot.
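A Django database router is a plain class with a few well-known methods, so a read/write split can be sketched without importing Django at all. The aliases `default`, `replica1` and `replica2` are assumptions: they would have to exist in the project's `DATABASES` setting, and the class would be listed in `DATABASE_ROUTERS`.

```python
import random


class ReadReplicaRouter:
    """Send reads to a replica and writes to the primary."""

    # Hypothetical aliases; they must match keys in settings.DATABASES.
    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replicas hold the same data, so relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes are applied on the primary only.
        return db == self.primary


router = ReadReplicaRouter()
print(router.db_for_write(model=None), router.db_for_read(model=None))
```

One design point to keep in mind: a replica can lag slightly behind the primary, so code that must read its own just-written data may need to read from the primary explicitly.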
Now, you can load balance reads generically, but it's much more efficient to load balance specific portions of your workload. For example, you can move any sort of large reporting or analytics that you do off onto a separate server, a separate read replica. Postgres optimizes its own performance a lot better when it has a consistent workload, and you can also do manual tuning for that consistent workload, as opposed to a completely mixed workload. So: moving reporting off onto its own machine; moving a job where you're pulling entire tables in order to refresh a cache onto its own read replica; moving queuing, if you have something like Celery backed by Postgres, which has its own very specific access pattern. All of that is a lot easier for Postgres to cope with if that's all that particular instance of Postgres is doing.
Now, having gone through all the stuff about scaling infrastructure, I'm actually going to get to postgresql.conf. I'm doing this last because it really is the least important thing. I've got some configuration settings in here, and I'll put the slides up later on so you can actually write this out. Basically, Postgres has 230-some configuration variables, and you only care about a tiny handful of them. Here are a few.
We can't determine in Postgres automatically how much RAM is available, so there are a few settings that you need to configure based on the amount of RAM available to Postgres. One is shared_buffers, which is Postgres's dedicated cache; it should be about a quarter of RAM. work_mem is a non-shared memory limit for per-query operations like sorts: 8 to 32 megabytes for your basic web application, maybe 128 megabytes to 1 gigabyte for a reporting or analytics application. Don't over-allocate it; the reason we have a limit there is so that you don't actually run out of RAM. effective_cache_size is simpler: it just basically tells Postgres how much space you have for caching, so about three quarters of RAM. maintenance_work_mem is the memory available for things like autovacuum and analyze.
There are some settings that determine the size and the rate of refresh of the transaction log. The defaults for these are kind of low, at least until Postgres 9.5 comes out, so you generally want to bump them up. And a few other settings: moving the stats, the statistics, to a RAM disk can improve responsiveness a lot, especially in the cloud. For SSDs and for the cloud, you actually want to decrease random_page_cost, which is the cost factor that influences whether we use an index or scan the table. effective_io_concurrency is also put in here.
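The RAM-based settings lend themselves to a small worked example. The sketch below turns an amount of RAM into starting values using common rules of thumb: about a quarter of RAM for `shared_buffers`, about three quarters for `effective_cache_size`, and a `work_mem` picked from inside the ranges mentioned in the talk. The exact fractions and the 16 MB and 256 MB picks are assumptions for illustration, not measured recommendations, and the function name is hypothetical.

```python
from typing import Dict


def suggest_memory_settings(ram_mb: int, workload: str = "web") -> Dict[str, str]:
    """Rule-of-thumb starting points for RAM-dependent postgresql.conf settings."""
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    if workload not in ("web", "analytics"):
        raise ValueError("workload must be 'web' or 'analytics'")

    # Picks from inside the ranges in the talk: 8-32 MB for web applications,
    # 128 MB-1 GB for reporting/analytics. The specific values are arbitrary.
    work_mem_mb = 16 if workload == "web" else 256

    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # about 1/4 of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # about 3/4 of RAM
        "work_mem": f"{work_mem_mb}MB",
    }


def render_conf(settings: Dict[str, str]) -> str:
    """Format the settings as postgresql.conf-style lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


print(render_conf(suggest_memory_settings(16384)))
```

Since `work_mem` is a per-operation limit rather than a global one, the safe value also depends on how many queries run concurrently, which is why the talk warns against over-allocating it.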
Before, I said to turn on all the logging settings for pgBadger; this is that set of settings, and it's in the slides for later. So, to recap: number one, do less querying. Fix your resource-hungry requests. Get adequate hardware. Scale your infrastructure. And then finally, if you've done all those things, or if you're waiting on some of those things, change things in the configuration file.
Questions? We have a few minutes for questions.
Assuming everything is running as fast as you could possibly want, are there any performance hits from all the logging?
Yes, there is performance overhead involved in the logging, because you're doing a lot of writes. But it's very variable, depending on how many queries you're doing per second, how long those queries are, that sort of thing, and on whether the activity log is being stored on the same I/O resource as the rest of the database. I've seen that overhead be anywhere from not measurable to, if you're already close to I/O-saturated and the log is on the same resource as the database, actually causing serious problems.
Could you spend a little more time on analyzing and vacuuming?
Analyzing and vacuuming, yeah. There's a system table you can query that shows you the last time each was performed on your tables. Vacuuming is garbage collection for Postgres. What we're doing is garbage-collecting all of the rows that are dead because they've been replaced by new rows or deleted, and cleaning up some other stuff, cleaning up the index references, that sort of thing. We do that asynchronously, deferred, because we don't want user requests to wait on it. But if vacuum can't keep up for some reason, then you end up with what we call bloated tables and bloated indexes, which have a lot of dead space that hasn't been collected. Analyze is updating Postgres's statistics about what's in your tables so that we can plan and execute queries in the fastest way. These things are normally handled by something called the autovacuum daemon, which is on by default and uses a threshold algorithm to determine when it needs to vacuum and analyze various tables. But if you have atypical usage patterns, sometimes the autovacuum daemon doesn't recognize when it needs to work on things, or if you just have an enormous amount of activity, under the default settings it might not keep up, and you need to bump up the number of workers and their frequency of work.
How do you identify when you need to vacuum?
For bloated tables, search for "Postgres bloated tables" and there are some query examples you can find online. And if you're seeing a lot of queries that are unexpectedly costly, start analyzing those queries to find out whether they are bad because the statistics are bad.
You mentioned no more ext3. Do you have a preferred recommendation for a file system for a Postgres instance, and does that decision change depending on whether you're talking about physical disks versus SSDs?
It doesn't change between physical disks and SSDs. I generally use XFS for smaller transactional databases if I have a choice, and I use ZFS a lot for data warehousing and analytics. The reason for that is that ZFS has a lot of nice tools for volume expansion, copying, and that sort of thing, but it tends to be a little slow on small writes. And ext4 if that's what's installed and you'd have to get special permission to use XFS.
You mentioned SSDs if you're running on your own hardware, but cloud providers also offer SSDs. Does the recommendation hold for actual hardware versus virtual hardware?
Yes. There used to be some trade-offs between SSDs and hard disks. These days the only reason I would use hard disks is if you have large volumes of data, as in multiple terabytes, and you're willing to live with it being slow in order to save some money. That's the only reason to use spinning disks.
I come from the sad world of Microsoft, where index fragmentation matters. I was wondering if that matters in Postgres and how you deal with it.
Yes, it does. A couple of things there. One is, if Postgres is being stored on a SAN or other network share, give it its own partition, so that you get less fragmentation from being interleaved with other files stored on the same volume; that can actually be pretty bad. Beyond that, there aren't really a lot of other things you can do to reduce fragmentation, except for changing how data is written in the database, which is obviously an application change, or recopying stuff. That's actually one of the problems of running Postgres on NTFS: for whatever reason, because of how NTFS writes files, fragmentation gets much worse than it does on Linux file systems. And honestly, we haven't looked into why, because the population of people who run Postgres on Windows and care about performance is pretty small.
That's all the time we have. Thank you very much.