Postgres 10, Performance, and You
Formal Metadata
Title | Postgres 10, Performance, and You
Title of Series | RailsConf 2018
Part | 32 / 88
License | CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers | 10.5446/37328 (DOI)
Production Year | 2018
Production Place | Pittsburgh
Transcript: English (auto-generated)
00:12
Thirty seconds early, but I'm Gabe, and I'm giving the talk on Postgres 10, performance, and you. So, just to give us a quick kickoff:
00:28
Slides not working. Who here uses Postgres? Kind of a show of hands. Oh, whole lot of folks. Okay. I wasn't sure if we had any MySQL users or anyone else.
00:42
So we're here for a lot of the same reasons. We recognize that Postgres has a lot of really great things going for it in the community: native JSON support, native hash support, great community standards for data integrity, continued excellence on a lot of really large and really expensive subqueries, and numerous
01:06
native extensions and extensions supported by the community, with standards built in natively. We're all here for the same reasons, but Postgres 10 got even better as a performant database. So some of the things I'm going to talk about today are
01:21
native table partitioning; hash index resiliency; a big one for a lot of folks, which is full-text index searching on JSON and JSON blobs, which is native now; a big one that I'll talk about in a little more depth, parallel queries and how they affect large joins, indexes, and other portions of the code base; and
01:43
query analyzer accuracy, and changes to how the query planner does efficient work as well. So you'll hear me use some phrases that will probably sound familiar if you're an avid user of Postgres, but I'm not going to jump into how they work in depth.
02:02
So I'm not going to go into how bloat, vacuum, sequential scans, or replication work, or into pruning and statistics. Those also changed in Postgres 10, but I'm not going to cover them in this talk. But a little bit about me first: I joined Heroku a little over a year and a half ago.
02:21
I've been primarily working on Sinatra and Ruby on Rails applications for the better part of the last four years. Right now my role is focused on Postgres, high-availability modes, and large data growth, growth into the terabytes of data. I've worked on more data service engines than I can list, so I'm not going to go through them all;
02:44
that's too much time. On the side, I'm trying to be a lot more positive, so I'm doing a lot of self-help and life-coaching things and trying to focus on inspirational content. So on the side I do some super secret project work on a social interest platform, because I like Facebook,
03:02
but there's a lot of negativity on there. Some of the things I'm looking for are basic things any social interest platform would provide, like profiles, but I'm also looking for curated content: inspirational quotes, TV characters, movies, TV shows, foods, the list goes on. And I'm imagining a lot of usage.
03:25
We're going to jump right into it, and the first thing I'm going to talk about is inspirational quotes. This is one I actually saw when I was traveling to New Zealand, which I thought was fantastic: happiness is not a destination, it's a way of life. And I'm hoping to see millions of these
03:40
populate throughout the code base and through the platform. But wow, that grew fast. Even though quotes aren't very big, they're just small text strings, and most databases even at large scale have a significant amount of RAM and tons of disk space, consider some usage scenarios: most quotes are never updated or revisited after a week,
04:05
and most people only want the most recent and up-to-date quotes; they don't want to see stale stuff. So that's kind of tough: that one table grew really quickly, and doing some quick math, that's way more memory than I have. So, a little bit about how Postgres and some of its caching systems
04:23
are going to work here: Postgres will attempt to load as much as possible into memory and use what it has there, and if there's overflow, it will start writing to disk with pointers to the files, opening them one at a time as it does file writing, so it actually manipulates the disks a lot
04:41
more. Caching is done this way because if you access data, you're probably going to need it again sooner rather than later, so Postgres tries to exploit that in a couple of different ways. But the other problem is that temp disk space is not free. I know we'd all like to believe it is, but it's not; there are a lot of other overhead costs, because temp disk space usually lives on the same disk where the database stores its
05:07
data, and caching records in large quantities can have significant impacts, especially as you concurrently access the same table multiple times with large data sets.
05:21
One of the real issues, and this is something some people might have seen in the past, is that we can easily stop being able to connect to the database entirely if we run too many of these simultaneous, really heavy, bloating or locking queries. So, who likes fast queries? Everyone? I'm hoping everyone likes fast queries and databases staying up and continuing to accept connections. Yes?
05:44
Yes, big thumbs up in the front row. Thank you. So, a great thing that Postgres 10 actually added is native table partitioning. There have been extensions in the community that do this for a while, but let's go into some of the big wins on native partitioning.
06:03
Native partitioning can entirely avoid the vacuum-and-bloat process for bulk loads, deletes, and offloads, so those can happen without causing overhead to your Postgres internals. Slow accesses to your database, connections, things like that, are going to be a lot less costly because you're not stressing that one giant table as much. Our most
06:27
commonly used rows are only going to be in the sections of the partitions we care about, and a lot of the seldom-used data can stay where it needs to, on slow disk storage, or even off in cold storage; we can move it off entirely. And one of the great things about this is that
06:45
query performance improved dramatically; we got a lot more bang for our buck when we actually do the lookups. Previously we'd have to do best-effort random index scans and then random-access reads from the disks. With large data like this, the
07:03
data storage is fragmented in some scenarios, so in average and worst-case scenarios, if you actually work out the complexity, some of these cases end up being worse than sequential scans. Native partitioning is a great way to optimize for this. Another key note here is that pg_partman is still in 10;
07:25
it's still supported, but it doesn't have some of the parallel support that we're going to talk about a little later in this talk. Some other small noteworthy items for native partitioning: it's not natively supported in ORMs; obviously you saw me writing raw SQL in the migration script.
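As a rough sketch of what that raw SQL can look like (the quotes table and its columns here are made up for illustration, not taken from the slides), Postgres 10's declarative partitioning lets you declare a partitioned parent and attach range partitions, for example one per month:

```sql
-- Hypothetical quotes table, range-partitioned by posting date.
CREATE TABLE quotes (
    id        bigserial,
    author_id bigint      NOT NULL,
    body      text        NOT NULL,
    posted_at timestamptz NOT NULL
) PARTITION BY RANGE (posted_at);

-- One partition per month; stale months can be detached and archived
-- to cold storage without bloating or vacuuming the hot partitions.
CREATE TABLE quotes_2018_04 PARTITION OF quotes
    FOR VALUES FROM ('2018-04-01') TO ('2018-05-01');

CREATE TABLE quotes_2018_05 PARTITION OF quotes
    FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');

-- Inserts are routed to the matching partition automatically.
INSERT INTO quotes (author_id, body, posted_at)
VALUES (1, 'Happiness is not a destination.', '2018-04-17');
```

Queries that filter on `posted_at` then only touch the partitions in range, which is where the "only the sections we care about" win comes from.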
07:45
There have been gems that try to make this more consistent in things like Active Record, but none of them are really well supported right now. Another great thing about this is that tables can be created from your partitions, so if you need to do
08:02
what I'd like to call YoMo tables, your year-month tables, or even week tables, this is something you can absolutely do. It's a great way to segment data even further as it migrates through, it's an easy way to understand your data over long periods of time, and you can even create
08:21
partitions from your partitions. I would add a partition meme, but I feel like they're overkill. So now we want to do a pivot: we've had a lot of use of the platform, people are starting to create products, and now we want to do home delivery. How do we get that done? Well, we obviously need customer addresses, so this seems pretty simple.
08:43
And we want a mobile app to be able to access lots of different customer addresses when drivers are out making deliveries, right? We want to take customer service seriously. But we do have a lot of customers, and we have a lot of addresses. Addresses aren't uniform across North America; actually, they're very different even between the US and Canada, and even between different states.
09:06
Different states have different rules and regulations about where delivery drivers can stop, and other really important features and gotchas like that, so nobody gets ticketed on the street. So there might be some important instructions we also need, not just for customer satisfaction
09:21
but for driver notices as well. We can kind of store this unstructured data in Postgres already, right? hstore already exists today; a lot of people use it, I'm sure. It's not new; it's a key-value store, and implementing it seems straightforward enough. Let's just add the hash, right?
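A minimal sketch of that idea (the addresses table and its keys are hypothetical): an hstore column holds the free-form, per-address delivery instructions.

```sql
-- hstore ships as a contrib extension; enable it once per database.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE addresses (
    id           bigserial PRIMARY KEY,
    customer_id  bigint NOT NULL,
    street       text   NOT NULL,
    instructions hstore  -- unstructured key-value delivery notes
);

INSERT INTO addresses (customer_id, street, instructions)
VALUES (1, '123 Main St',
        'gate_code => 4711, parking => "loading zone only"');

-- Look up addresses by a key/value pair inside the hstore.
SELECT street
FROM addresses
WHERE instructions -> 'parking' = 'loading zone only';
```

Without an index, that last query is a sequential scan over the whole table, which is the slowness described next.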
09:41
Yeah, man, those address lookups are real slow. Real slow. The other problem here is that when somebody makes an update to this, our drivers aren't going to get it in real time; it's not going to happen. We could add an index to improve lookups, though, so let's try that and see what happens.
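The index reached for next can be sketched like this (again on the hypothetical addresses table); a hash index only supports equality lookups, which fits key-value retrieval:

```sql
-- A hash index on the column used for the slow equality lookups.
-- On Postgres 9.x this came with the caveats described below;
-- in Postgres 10, hash indexes are WAL-logged and replication-safe.
CREATE INDEX addresses_customer_id_hash
    ON addresses USING hash (customer_id);

-- Equality predicates (the only operator hash indexes support)
-- can now use the index instead of a sequential scan.
SELECT street, instructions
FROM addresses
WHERE customer_id = 1;
```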
10:01
So let's add a hash index and hope for the best, right? It seems like it should work. Maybe things improved. But, show of hands, has anyone seen this error message before? And show of hands, who uses hash indexes in here?
10:20
Not many people. Okay, well, that's good to know. Probably a lot of people know, for good reason, that hash indexes aren't really well supported in Postgres 9.x, for a lot of different reasons, the key one being that replication is not supported, so they're not treated as a first-class citizen.
10:41
The other problem is that you get a lot of misleading failures like that one, which kind of look like a data corruption issue, but it's actually index corruption, because the hash index can't keep up. Another thing is that with extremely large data sets to pull from, sometimes you have to completely axe the hash index and rebuild it from scratch, which is not a great experience.
11:08
So what's the solution if we can't rely on this hashing mechanism but really want to use hstore, since it's mostly supported? Well, Postgres 10 finally brought the hash index into the light and made it a first-class citizen.
11:21
It added first-class support such as replication, and made hash indexes crash-safe through shutdowns and failovers. Replication works, lookups are faster, the growth of each page of the index is consistent with regular indexing practices, and pruning happens much more consistently as well, so indexes don't go stale as often.
11:41
So, going back to this, we can rely on it much more consistently, have better peace of mind, and we do see actual improvement now when checking through and updating the hash itself. The biggest thing I'd like to point out here is that the locking and concurrency issue was a huge problem, and we use hstore at Heroku for a lot of different use cases;
12:06
the reduced locking for concurrent access to records is a big win for us. Even though we don't use a lot of hash indexes with hstore in general, for the few places where we do, this is a really big, important thing for us.
12:21
So, going backwards: we're talking about more curated content on my social media platform, and I want to talk about movies and TV shows and characters that we love, that we can't get enough of. For any big nerds out there: I'm a huge Flash fan; I met John Wesley Shipp for the first time earlier this year, which was fantastic.
12:42
So I want to list all these movies and TV shows, and I want to get the characters for all of them. Each show can have potentially fifty, hundreds, maybe even thousands of characters; if anyone watches Game of Thrones, I'm sure everyone's excited. So each record can have a lot of varying data over time, right? And
13:03
some things we'll want to watch are genres, catchphrases, famous accessories and items, or even just certain ways characters dress or look or interact with other characters. I'm sure some people are going to get a little Swifty after all of RailsConf.
13:24
So we need some way to keep that data unstructured but co-located with the general structure of how we want to address media and characters in general. Let's use a JSON blob, because that seems to make sense for this scenario: we can query third-party data sources and get them in consistent JSON formats,
13:45
so we can reparse them over and over again and get updates. The great thing is that we do have this flexibility, but it opens us up to a lot of other gotchas, one of which is that each row is dynamically sized because of this.
14:03
So it's not consistent what size each record is, and because we can have so many different data sources analyzing each character, each record, this could potentially get a little hairy, as could tracking how this information changes over time. So how can we keep this unstructured and not have to parse it all or create constant relationships through it?
14:26
We could create multiple tables. We could add a lot of one-to-many relationships. We could modify our Rails app entirely to manage this information. We could even do partitioning based on some subset of where the data is co-located, per show.
14:44
But as much as we could do all those things, there is an easier option, and that is using the full-text index searching that was added in Postgres 10. Because we can store whole request bodies, we can store custom-made JSON blobs.
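A minimal sketch of this (the characters table and its contents are made up): Postgres 10 added `to_tsvector` and `ts_headline` support over `json`/`jsonb` values, so the string values inside an unstructured blob can be full-text indexed directly.

```sql
-- Hypothetical table: each character's varying, third-party-sourced
-- data lives in one jsonb document.
CREATE TABLE characters (
    id      bigserial PRIMARY KEY,
    profile jsonb NOT NULL
);

-- A GIN index over the full-text vector of the whole document.
CREATE INDEX characters_profile_fts
    ON characters
    USING gin (to_tsvector('english', profile));

-- Find characters whose unstructured profile mentions a term,
-- e.g. a catchphrase or famous accessory.
SELECT id
FROM characters
WHERE to_tsvector('english', profile)
      @@ to_tsquery('english', 'hammer');
```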
15:01
This allows us a lot more NoSQL, dynamic-schema operations, and that's what we really care about, especially when we're trying to get this off the ground in its initial stages. This adds a lot of flexibility. One of the great things here is that there are a lot of examples of how to do this online now in the Postgres
15:20
materials, and this keeps our data structure unstructured and flexible. The big thing is that indexes can filter on a lot of different views and perspectives, for the kinds of index searches you want to do on your JSON, since
15:46
JSON values are treated as first-class citizens as well. And the other great thing is that this gives us ways to target folks on the platform: maybe we want to target them for being big Marvel fans, or maybe somebody can't get enough of Joey from Friends and just wants to say "How you doin'?" all the time.
16:01
I'm a big fan of Thor, so a lot of hammers for me. But let's jump back for a sec. Quotes were our big takeoff item; that was the big one that got the platform recognized. And people love Throwback Thursday, so let's create a throwback stream of quotes, for folks who want to know: on this day,
16:24
my favorite posters posted these quotes. So we want to be able to replay a stream of old quotes, right? Well, streams should include all of the posters I liked and followed, not just my own posts, so that's a lot more quotes than you would normally see. And
16:41
even getting one month of posts is a lot of queries, especially if you're following a few hundred other people. That's a lot of quotes, especially for power posters who post multiple times a day. So, yeah, here we go.
17:02
Postgres 10 did a lot of things to optimize how we do these types of queries and how fast they come back. I'm going to touch on three. There are more optimizations in the parallel areas of how Postgres 10 returns queries; I'm only covering these three because they're usually the big ones people talk about.
17:22
We're going to talk about parallel bitmap heap scans, gather merges, and merge joins. So let's jump into it. In 9.6, parallel searching and scanning was introduced, but it was a little limited: it only did sequential scans, and sequential scans were only done on the primary tables.
17:46
This created the problem where you couldn't rely on indexes; we had a bit of a trade-off where you either had to rely on your table caching being really effective or on your indexes being really good,
18:00
but you couldn't get an overlap there, which was kind of painful. The big thing about Postgres 10 is that this changed when way more parallelization was added at every level of your table schemes, including indexes. It can do parallel searching through your indexes first; it will actually go through and consult only the index files first, so that it is much more methodical and
18:28
strategic about how queries are executed and how the results are loaded. This requires a lot less data loaded up front, a lot less caching, way fewer results to parse through, and a whole lot fewer disk reads.
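As a sketch of what this looks like from the outside (the table, index name, and settings values here are illustrative, not from the talk):

```sql
-- Postgres 10 can parallelize index scans and bitmap heap scans,
-- not just sequential scans as in 9.6.
SET max_parallel_workers_per_gather = 4;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM quotes
WHERE posted_at >= now() - interval '1 month';

-- Depending on table statistics, the resulting plan can contain
-- nodes such as:
--   Gather
--     -> Parallel Bitmap Heap Scan on quotes
--          -> Bitmap Index Scan on quotes_posted_at_idx
```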
18:41
The next one we're going to talk about is gather. Gather is the mechanism through which Postgres introduced parallel query construction and result retrieval. It was introduced with the idea of workers, obviously, to do asynchronous processing, and
19:00
the big thing here was that there were multiple layers to how results were returned. A lot of the time they were returned in arbitrary fashion, mostly in the order of how fast they came back from each worker. Unfortunately, if you're adding searches or sorts to this portion of the mechanism, that adds a significant amount of overhead, because you're requiring not just the search of the B-tree
19:25
but also an extra ordering layer after the initial result sets are retrieved. A small example: I have a bunch of IDs in my B-tree,
19:40
and I just want to retrieve the primes. So here's a small example of a B-tree. The issue is that when we're going through and checking for primes, the leaf nodes in some scenarios have more content than the root or other leaf nodes, so certain leaf nodes will return faster than others. The root node is our first returned response,
20:05
so 17 was actually the first node. I'll go back here: 17 was the first one we actually saw. On the left side, if you look at the three leaf nodes to the left of 17, 11 has only got one entry, so our first records are going to be 17,
20:26
19, and 11, depending on how fast they return. So we actually get stuff out of order, and another pass is required to sort through the somewhat strange return results of the workers themselves, since they were only trained to return as fast as possible, with no
20:45
context. Well, this changed in Postgres 10, which optimized the collection of worker results. Gather merge sped up parallel worker result retrieval because the workers also kept context of where they encountered the data they cared about, and took that into account as the result sets were
21:07
returned. This removed a lot of the overhead of sorting after the fact, because at the end it ended up being mostly just a flat line. This was really beneficial, especially for really large data sets traversing huge tables.
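The sorted parallel retrieval described above can be sketched with a query plan (the quotes table is again hypothetical):

```sql
-- With parallel workers enabled, an ordered query over a large table
-- can produce a Gather Merge node in Postgres 10: each worker hands
-- back its partial results already sorted, and the leader merges them
-- in order rather than re-sorting the combined set afterwards.
SET max_parallel_workers_per_gather = 2;

EXPLAIN
SELECT author_id, posted_at
FROM quotes
ORDER BY author_id;
```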
21:24
So, kind of revisiting this, we can see there's a whole other side of the B-tree we didn't touch, right? I kind of glossed over it, but for good reason: when we parsed it down, we actually held context. The scan did actually move through the B-tree
21:43
and construct how we expected to see everything. The reason we kept context here is that we tracked the locations at which we visited each node, and
22:01
so this is a much more natural ordering of operations from an intuitive, mental standpoint. It's easier to mentally map how our data is retrieved, too. The great thing is that as we merge the leaves and the roots, we can basically turn this into a linked list, and it becomes a simplified array that's mostly natively ordered, even before we add other order clauses. If you add order clauses,
22:21
they're kept in context along with the parallel workers, so the workers know what to do when ordering as they return results. Then the last one we're going to cover is merge joins, and this one, while performance-wise it doesn't always pack the biggest wallop, is one of the most interesting to me. So,
22:42
applying this to checking all the other posters I care about for Throwback Thursday: I need to join on the other customers' quotes, right? Previously, 9.6 only allowed parallelism on nested and subquery loops, or hash joins, and this doesn't always happen on the primary query of the table.
23:03
So a lot of our subqueries were optimized for this, but not necessarily the originating one we cared about. This adds a lot of overhead, and even though indexing and ordering are great, they could actually make things worse, not better, especially the indexing portion, which is surprising sometimes, but it can be a gotcha.
23:24
Parallel merge join was added to consistently do parallel operations for all of the merges at all levels. So all the subqueries, all the way up the stack, from each sub-table, or even tables
23:42
you've had to rejoin to get different data sets, for temporary tables for example, are now going to be done in parallel as well. This is going to make things much more efficient. The outer sides of the joins, especially when we do no comparisons or checks, are going to benefit the most, because we're not going to be as restrictive there,
24:02
and it's going to be much quicker to run through certain sections of that with less data. Combined with a lot of the parallel changes to the indexes, it's going to be way faster at getting all the data you care about. Again, the parallel changes that also happened to gather merge will give you some natural ordering by default,
24:22
even before you specify order, but will still maintain context for added ordering as well. The last thing I'm going to touch on, which I did mention, is a real-world example from something we track at Heroku, which is part of tracking our issues on our internal and external repos,
24:42
So that we have a good list of items for when we get feature requests Issues bugs things like that and that is actually even though this table is pruned pretty consistently for the issues mapping the GitHub issue to a repo ID It's not huge, but it's not tiny either
25:03
Here is the query plan for the table for a given repo ID. The execution time is not great: this index is basically not being used, even though we expect it to be. In 9.6 it definitely didn't get used at all
25:21
And so we end up doing sequential scans through it, which gives us some pretty poor run times. Here's the great thing: with all the changes in Postgres 10, the parallel index scan got figured out by the query planner, and it's being used now — even on a tiny table. That's ridiculous. I'm going to let that sink in for a sec
25:48
That's a half-a-gig table. That's not even a big table; that's a tiny one. The changes that have occurred and the performance gains are huge. I'm not going to go through query analytics — that's another talk in and of itself
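The before-and-after here can be sketched with `EXPLAIN`. Everything below is illustrative, not the actual Heroku schema or plan output — the table, column, and index names are my own, and the plan shapes are simplified to show the change:

```sql
-- Hypothetical version of the issues-to-repo lookup described above.
EXPLAIN ANALYZE
SELECT id, github_issue_id
FROM issues
WHERE repo_id = 42;

-- 9.6-style behavior: the index is skipped and every row is walked.
--   Seq Scan on issues
--     Filter: (repo_id = 42)

-- Postgres 10: the planner can hand the index to parallel workers.
--   Gather
--     Workers Planned: 2
--     ->  Parallel Index Scan using index_issues_on_repo_id on issues
--           Index Cond: (repo_id = 42)
```

Running `EXPLAIN ANALYZE` before and after an upgrade like this is the quickest way to confirm the planner is actually picking up the parallel plan on your own workload.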
26:03
That is way too much for the time we have to get through today. If you want to hear me talk more about it, or have me explain anything more from what I've seen over the last year and a half to two years, come talk to me after. But there are a few last-minute gotchas with Postgres 10 and Rails that I want to go through
26:25
I touched on some of them, but some of you might not know: Rails 4 doesn't support Postgres 10. Unfortunately, the connection protocol changed in how Postgres 10 works, so the Rails 4 Postgres adapter only supports up to Postgres 9.6. If you want to make the move to 10, you will have to be on Rails 5
26:46
Again, as I've stated before, partitioning is not native to the Ruby on Rails ORM. Specific index types — like the hash indexes and the JSON indexes for the full-text query searching we went over today — and the analytics queries are all not available
27:05
except in raw SQL. So, some things to note. That's one of the reasons I love working at Heroku: we try to offer a lot of tools for stuff like this. With the release of Postgres 10 and the support for it, we've added a lot of extra support and numerous extensions as well
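Dropping to raw SQL from a Rails app usually means `execute` inside a migration. A minimal sketch of the declarative-partitioning case mentioned above — the table, column, and partition names here are my own invention, and this assumes Rails 5.1+ on Postgres 10:

```ruby
# Declarative partitioning (new in Postgres 10) has no ActiveRecord DSL,
# so the DDL goes through raw SQL in a migration. Names are illustrative.
class CreatePartitionedEvents < ActiveRecord::Migration[5.1]
  def up
    execute <<~SQL
      CREATE TABLE events (
        id          bigserial,
        occurred_at timestamptz NOT NULL,
        payload     jsonb
      ) PARTITION BY RANGE (occurred_at);

      -- Each partition covers one month of data.
      CREATE TABLE events_2018_04 PARTITION OF events
        FOR VALUES FROM ('2018-04-01') TO ('2018-05-01');
    SQL
  end

  def down
    execute "DROP TABLE events;"
  end
end
```

The `up`/`down` pair keeps the migration reversible even though none of the DDL goes through the ORM's schema methods.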
27:23
We've added managed Postgres credentials, because the permission schemes are very confusing, as we've found over the years; new updated hardware plans at the same price points; and improved CLI analytics, because our analytics CLIs use the same query analyzer and the same PG internals that have had
27:42
the optimizations done as well. And we are beta-testing a new feature right now, for those of you to test out, called Postgres connection pooling. The Postgres extensions that we offer are numerous, and most of them are natively offered
28:02
We offer a few external ones like PostGIS and things like that. If you want the complete list, go check it out. But the big one that I love here is the time-travel one — that's a big favorite, especially with the new changes to partitioning. Postgres managed credentials was something we actually released late last year, but
28:21
we did this for a lot of different reasons — we had a lot of customer asks for it. Postgres permissions are really difficult to reason about: they exist at a lot of different layers, and there's a lot of nuance in creating granularity, understanding permissions, revoking them, things like that. We still allow customization when we issue managed credentials
28:41
but we've tried to take a lot of the guesswork out of initially creating these credentials and roles for tables. We try to consider things like special tables or special characters — no one likes SQL injection; Bobby Tables can stay at home with his mom
29:00
Name lengths are another big gotcha. There are certain character limits that, if you blow past them, Postgres silently won't tell you about — that's another strange one. We've also published some best practices, so if you want to see some of the other things, you can check them out at the Postgres credentials Dev Center. But I encourage everyone, even if you're not using
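The length gotcha comes from Postgres truncating identifiers to 63 bytes by default (`NAMEDATALEN - 1`) without raising an error, so two long generated names can silently collide. A small sketch of a pre-flight check — `pg_identifier_collision?` is a hypothetical helper of my own, not a Rails or Heroku API:

```ruby
# Postgres truncates identifiers to 63 bytes by default, so names that
# differ only after byte 63 end up referring to the same object.
PG_MAX_IDENTIFIER_BYTES = 63

# Hypothetical helper: true if any of the given names would collide
# after Postgres's truncation.
def pg_identifier_collision?(names)
  truncated = names.map { |n| n.byteslice(0, PG_MAX_IDENTIFIER_BYTES) }
  truncated.uniq.length < names.length
end

long_a = "index_customer_subscription_events_on_account_id_and_created_at_a"
long_b = "index_customer_subscription_events_on_account_id_and_created_at_b"

# These differ only after byte 63, so Postgres would treat them as the
# same identifier.
puts pg_identifier_collision?([long_a, long_b])  # => true
```

Running a check like this over Rails' auto-generated index names catches collisions before the database quietly does.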
29:21
managed credentials, to consider things like separation of concerns, the principle of least privilege, and, even better, external access control. If you're using a third-party vendor that needs data access, you want to give them very restricted access — you don't want to give them access to PII data, for example, some of the things you ensure for your customers
29:42
The other big one we did earlier this year was Heroku PGX plans. I was directly part of this project, which was fantastic — I was very happy and excited about it. We were able to utilize the latest hardware virtualization for better I/O, better CPU, and query optimization, and we've also put all these plans on dedicated
30:05
hardware, so that you get consistent and stable performance. We've documented it in the blog post, but the big wins here are that we've doubled CPU cores at tiers 4 and up, and we're exposing dedicated provisioned IOPS for plans
30:21
And we're increasing data disk size at every tier, while prices either stayed the same or lowered. So please go check that out if you're interested in performance on the hardware side. The one we've had for a long time, which I use on a daily basis — especially when people file support tickets with us — is the Heroku CLI, especially for Postgres
30:44
We've used pg:diagnose, and this is something that has been out there for a while, but I want to reiterate it because Postgres database tuning has been a big deal, and other people have talked about it at the conference. We've had a Dev Center article out there for a long time, and we have the extension CLI to add on to the native
31:02
Heroku PG CLI commands. The big one is pg:diagnose, but if you're looking at sequential scans, or caching and hit rates — some of the things I've talked about today — we actually offer predefined ways of querying that information across all your tables and schemas. So please go check that out as well
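The cache hit rate that tooling like this reports can also be computed by hand from Postgres's statistics views. This is the standard `pg_statio_user_tables` query, not anything Heroku-specific:

```sql
-- Fraction of heap block reads served from shared_buffers rather than
-- disk. A healthy OLTP workload typically sits above ~0.99.
SELECT
  sum(heap_blks_hit)::float
    / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0)
    AS cache_hit_ratio
FROM pg_statio_user_tables;
```

The `nullif` guard keeps the query from dividing by zero on a freshly started database with no block statistics yet.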
31:23
Now, the one that I know a lot of people have heard about, and a lot of companies have started using, is PgBouncer. PgBouncer, for those of you who are or aren't familiar, adds management for Postgres connection pooling. This is basically because that idea isn't really built into Postgres
31:43
so connection pooling is done outside of Postgres, and this adds safeguards: making sure connections don't stay open longer than they're supposed to, and that queries don't kill your database. We've added this as part of connection pooling — that's how it's done. PgBouncer is in beta as
32:03
Heroku PG connection pooling. So why use it? Well, one thing Heroku in general loves to see is stats — we love our stats. This gives better visibility into how your connections behave
32:20
It also gives a lot of guardrails to protect your database and keep it up, especially if you have multiple applications connecting to a single database. And it's really great for asynchronous workloads — for anyone using lots of Active Job or anything like that, this is fantastic. We've offered it as a beta; we're still working on
32:42
getting the best practices out there. We're still trying to get feedback from folks who ask for more customizability or better use-case scenarios, so we're still doing some public beta work on it right now. It is supported and it is well documented, so if you're interested in PG connection pooling, please go check it out
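On the Rails side, the app's own connection pool still matters even with server-side pooling in front. A minimal `config/database.yml` sketch — the values are illustrative, not a Heroku recommendation:

```yaml
# config/database.yml — Rails-side pool sizing (illustrative values).
# Each app process holds up to `pool` connections; with a pooler like
# PgBouncer in front, these map onto a smaller set of real Postgres
# backends.
production:
  adapter: postgresql
  url: <%= ENV["DATABASE_URL"] %>
  # Size the pool to cover web threads plus any Active Job workers
  # running in-process.
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
```

Undersizing this pool shows up as `ActiveRecord::ConnectionTimeoutError` under load, which is a common symptom with thread-heavy Active Job setups.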
33:06
We still offer a client-side buildpack that can run on a dyno. It doesn't have as many of the built-in features that we'll be adding to Postgres connection pooling, but if you're interested in using it separately from the database itself, on your application layer or something like that
33:21
it's still available out there. Whew — and that is it. What a marathon! Thank you so much; please come check out the booth. I'm the last presenter for Heroku today, and as always, thank you again so much for letting me come and talk to you at RailsConf