9.4 On The Floor
Formal Metadata

Title: 9.4 On The Floor
Title of Series: PGCon 2014
Number of Parts: 31
Author: Robert Treat
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/19071 (DOI)
Production Place: Ottawa, Canada
PGCon 2014, Part 1 of 31
Transcript: English (auto-generated)
00:33
Hello, everyone. I think so.
00:42
If I had known there were going to be this many people, I would have done some cocaine first.
01:14
Hey, Chelnick's here. I guess we can start. Came to heckle.
01:27
Okay. I guess we'll go ahead and start. Can everybody hear me? Yay. No. You can't hear me? Well, you can, but you're close by. I don't think you can. Can you hear me?
01:40
I can't. Is that thumbs or middle fingers you're putting up in the air? Thumbs. Okay. Then thumbs means you can hear me. Good. A little bit. A little bit. All right. So 9.4 on the floor. This is a title as suggested by Josh Berkus. This is a talk about 9.4 features in case that wasn't clear.
02:00
It would have been awesome to call it like 9.4 out the door, but that would require actually having it done. And that's not done yet. So we're not doing that. Quick poll. How many people here actually use Postgres? Okay. Good. All right. So you guys are at the right conference. That's a great start. How many of you use Postgres 9.3? Sorry about that.
02:23
Yeah. So 9.4. 9.4 is going to be exciting. Hey, Dan. How's it going? How many of you use a Postgres older than 8.3? Okay. Wow. That's more than I wanted to see. All right. Cool. Just wait. You guys might want to think about upgrading. And as it turns out, there's a release coming up that you should maybe do.
02:43
So all right. Moving along. So I am your humble host, Robert Treat. You can find me on the Twitters or on my website. If you want a copy of the slides, I will post them most likely there or send me a tweet and I will send them to you or something along those lines.
03:02
My background when it comes to Postgres and features and all that stuff, I've been doing Postgres for a long time, been involved in a number of different things. I guess I still work on Postgres occasionally and I at least come and harass the other people when they're harassing me at these talks. I also know a lot about root beer and crocs and shawarmas.
03:21
So if you're into those things, come and find me afterwards and we'll have a good time. Couldn't find any root beer. It's rough. It's rough. My more technical background when it comes to Postgres and why I care about features, the company I work at, OmniTI, we do web consulting and helping build and run large
03:41
websites. We don't always get to pick the database we work with, but when we do get to pick it, we always pick Postgres. Because of that, over the years, we've gotten involved in a lot of Postgres consulting. So every time there's a new release that comes out, we study it and break it down, try to figure out, do we want to upgrade to this or not? If it's like 9.3, then we say no, and if it's like 9.4, I think we say yes.
04:02
So we're pretty excited. I'm actually much more excited about this release than I was the last release. So all the good things I said about 9.3, if you saw that talk last year, I really mean them this year. So with that, let's talk about Postgres. Postgres, as you know, we do about an annual development cycle as far as features that
04:21
get built. The basic path of this from last year, in June of 2013, we branched the code and started 9.4 development. We do a series of commit fests. How many people familiar with the idea of commit fests? OK, not a huge amount. So basically, the way that Postgres development works is people gin up a whole bunch of patches.
04:44
They plan for them to be in a commit fest, which are scheduled throughout the year. During the commit fest, people who normally are writing code and writing features focus on reviewing the existing patches that are out there. And the goal is, in theory, all patches that have accumulated at that time will be
05:00
reviewed and committed or possibly at least sent back with feedback as to here's why your patch isn't getting committed. And so every other month, they go through that process. And then the hope is at the end of January, when we get into the fourth commit fest, everything will get reviewed and committed and put into the new release. What usually happens is that last commit fest in January stretches on for, I guess,
05:23
about four months, because we're in May, while everything gets cleaned up and people really decide, are we going to put these features in or not? Is this worthy and ready and does it need more development? And so it usually takes four or five months for the first beta to come out, which actually has happened.
05:41
So next slide says, this happened just last week, I think to the day, right? That was May 15. So 9.4 beta 1 has come out. Hopefully, most of you knew that. If you didn't, you do know it now. So I would encourage you to go out and download it and test it.
06:00
I got the beta even on MacPorts, which I'm perhaps maybe one of the three people who use MacPorts. But even MacPorts had the beta in it. So you have no excuse to not be testing the beta. Actually, if you're wondering, how do I test a beta? There's actually a wiki page on this, on the Postgres wiki, of how to test a beta.
06:23
I guess it's how to beta test. I would encourage you to go check that out. Basically, it's look at the new features that you're interested in, check those out, run your existing applications against the new version of Postgres, look for some esoteric things, maybe set up replication, see if you can break it. Many people should go and try to do that.
06:43
But we certainly encourage people. And if you want help or have questions, like you have about 48 hours to find the experts and track them down on their feature to get as good feedback as you can get. So beta is now out. With beta comes what should probably be the final release as far
07:02
as feature set goes. It's possible some of the things will not make it in there. And I will also admit that it's possible some of the things in these slides are actually not in there, but I think that they are. I did try to cull out features that have gotten removed. So if you are a hacker and you see a feature that is not actually in the release, please let me know.
07:21
I will strike that from the slides. But I think all this stuff is in there. I think all this stuff will actually make it in there. And so hopefully this is a pretty good overview of what you're going to see in 9.4. Whenever we do these talks, the folks that do them, we have a huge list of features. We make up arbitrary groups, and then we throw features into those groups. Some of these features could go in multiple groups,
07:41
but these were the ones that I picked because they seem good. Hopefully you will enjoy them as well. So on with the feature onslaught. So we'll start with performance stuff. And there's been some good stuff there; everybody likes more performance. I think most people don't want less performance out of their Postgres. So the first feature, reduced WAL for updates.
08:03
As you probably know, when you do updates, you are writing data into the WAL stream. That has to go all over the place, depending on what you're doing with your WAL. That can actually become a bottleneck because of the IO issues there. So they've actually reduced the size of WAL records that are being generated when you're doing updates.
08:20
And so as they reduce the size of that, you can jam more stuff into the WAL, which increases the ability to do more IO. And so that's the trade-off for the performance. Less WAL means less IO, should mean less resource usage in general, and helps enable heavier write activity. So nice little update for free.
08:41
Counterpoint to that, XLOG insert scaling is another thing that's been improved. Reducing lock contention for WAL inserts. That's always one of the big things that people are looking for for performance improvements. Any time you can eliminate lock contention. So this is one of those ones, again, where they went through, cleaned up a little bit there, improved scalability of the WAL system
09:00
by doing that. So everything in, well, maybe not everything in Postgres, but a lot of Postgres and trying to scale it boils down to what's going on with your write-ahead logs and the amount of IO that you have to deal with there. So that can often be a bottleneck. So improvements in that generally help out most use cases that are out there.
09:20
Another performance helper is a thing called pg_prewarm. I said a contrib module. I guess I'm supposed to say it's an extension. So this is an extension. And if you do CREATE EXTENSION pg_prewarm, it will install it on your system. So the goal of pg_prewarm is basically a set of utilities in order to pretty much pull files and data off of disk
09:44
and push them into either your operating system's cache, the file system cache, or you can pull it into shared buffers. Yes, sir? You can do the indexes as well. It will do both. Yes, sir.
10:01
And there's sort of some differences of how you may or may not want to do that. I think usually where people tend to use this type of utility in the old way that they used to be done, let's say you had to do like a kernel upgrade or something like that. So you reboot your system. You've now blown out all the file system cache. You've blown out all your shared buffers. You have a Postgres that you need to get up and running. The way that we would tend to do that in the past
10:21
if we wanted to try to get those files into memory before you start hitting it with queries is try to run like a vacuum on the table or something along those lines to see if you could pull that into memory. The problem with that, obviously, is that vacuum is not necessarily guaranteed to touch all the pages on the table, certainly the larger it is. So this sort of gives you a way to do it from the SQL level and really target,
10:42
here's the relations that I want. There's also some more sort of nuance as to how you can call this, but the main things are, if you want it in the OS cache, there's two ways to do that. You can do it async, which you'll kind of do in the background. So if you're launching up and trying to pull some stuff in, but you need to start accepting queries right away,
11:01
you might want to do it async. If you do it sync, it'll be faster, but obviously, there's much more hit on IOs as more things get read into memory. Usually, I think you probably want to go into the OS cache as opposed to into shared buffers, but there may be a few tables that you have that you know these are the real hotspot tables. They always sit in shared buffers of Postgres. In that case, you can pull those directly into shared buffers.
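What's being described maps onto the extension's SQL interface; the table and index names here are hypothetical:

```sql
CREATE EXTENSION pg_prewarm;

-- Load a known-hot table straight into Postgres shared buffers
SELECT pg_prewarm('accounts', 'buffer');

-- Asynchronously hint an index into the OS file system cache ('prefetch');
-- use 'read' instead for a synchronous OS-cache load
SELECT pg_prewarm('accounts_pkey', 'prefetch');
```

The second argument is the mode: 'buffer' (the default) reads into shared buffers, while 'prefetch' and 'read' target the operating system cache asynchronously and synchronously, as discussed above.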
11:24
So nice little tool. It certainly makes it, I think, easier, especially on cross-platform deployments and that type of thing, an easier way to get this stuff where you really want it to go. So that's a nice extension that's been created. Another thing, and I tried to track down
11:41
more information on this. I actually thought it was kind of hard to find this, but there have been a number of GIN index improvements that I think are actually fairly significant. The two overarching things is a change to the page format for how GIN indexes are stored. This format has actually been compacted. And depending on what your data looks like,
12:01
you'll get somewhere between 2 times and 10 times reduction in size. From numbers that I was looking at and things I saw, 4 times reduction in size seemed pretty easy to achieve. So that's a pretty significant drop in the size of the indexes that you're going to have. There have also been some improvements in how Postgres scans those indexes
12:22
and enables speed ups there as well. Certainly if you have queries that have some frequent items and then some rare items, and you're doing GIN index scans on that, you'll see performance improvements there. There's actually another extension called btree_gin, I believe it is, which sort of mimics what a B-tree does
12:43
but uses GIN indexes for it. And normally people would say, well, if you need stuff that B-trees are good at, just use the B-tree indexes because they're awesome, and that's what everybody uses. There are actually a few benchmarks that show that depending on what you're querying, if you're doing queries with Boolean values
13:01
or sort of a limited set of things that you're querying on, you may actually get faster performance with the btree_gin indexes than regular B-trees. So that was actually kind of surprising to me. And so this is definitely a really good improvement. And I don't think it's gotten a whole lot of press, but it's certainly worth taking a look at.
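A minimal sketch of what using btree_gin on a scalar column looks like, with hypothetical table and column names:

```sql
CREATE EXTENSION btree_gin;

-- btree_gin provides GIN operator classes for plain scalar types;
-- for low-cardinality columns (booleans, small enums) this can
-- sometimes outperform a regular B-tree, per the benchmarks mentioned
CREATE TABLE events (id bigint, active boolean);
CREATE INDEX events_active_gin ON events USING gin (active);
```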
13:21
I think there are going to be new cases where you may be more likely to use GIN indexes where you didn't think you should in the past. There's also a talk on GIN that's going on Friday at 4:15. I was going to say the Russians, but I think Heikki is also in that talk, so I guess I can't say the Russians.
13:40
Cultural issue there if I say that. But I would certainly encourage you to check that out. The GIN improvements were pretty good. And I think, like I said, I haven't seen a lot of information out there about it, so it's definitely worth checking out. Performance-y stuff, I guess this is performance. I call this performance because if you don't have performance
14:01
problems, you probably don't look at this. Starting in 9.4 now, they've separated out planning and execution time when you do EXPLAIN ANALYZE. And this one, I just happened to get this really good EXPLAIN ANALYZE. I thought it was really good because you probably won't ever see this when you normally are looking at your EXPLAIN ANALYZE output. It's a little bit cut off.
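An illustrative 9.4 EXPLAIN ANALYZE tail (the query, names, and timings here are invented, not the speaker's actual slide):

```sql
EXPLAIN ANALYZE SELECT * FROM accounts WHERE id = 42;
--                           QUERY PLAN
-- Index Scan using accounts_pkey on accounts  (cost=0.29..8.31 rows=1 width=44)
--                                (actual time=0.041..0.043 rows=1 loops=1)
--   Index Cond: (id = 42)
-- Planning time: 48.133 ms
-- Execution time: 0.091 ms
```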
14:20
So what you have is basically two lines at the bottom. What you used to get was total execution time when you run EXPLAIN ANALYZE. So we tell you how long the thing took to run. What that didn't tell you was, well, how much time did I spend in planning versus how much time actually executing the query? So now those two things are broken out. So the first one is planning time. The second one is execution time.
14:40
In most cases, your planning time should be insignificant, and your execution time should be what you care about. It just so happened this was a fairly fresh database that nothing had really been accessed, apparently. So when I ran this query, I just happened to get this monstrously large planning time. And I was like, oh, that's the slide I want for my talk. Then you can see sometimes planning time actually
15:01
makes the difference. When you have 10x planning time, if you're trying to optimize that query thinking, I need to get it to execute faster, execution wasn't the problem. After I ran this two or three times, and then after that, planning time was basically minuscule, and execution time was not terribly large either. It's a pretty simple query. But this at least allows you to tell
15:21
how much time am I spending there. And if you're doing a lot of OLTP workloads where you really are looking like sub-second queries and that type of thing, it's certainly possible that you're spending a lot of time in the planner. And so now you have a pretty easy way to see that. Another explain change, adding the grouping columns
15:43
into EXPLAIN. So here I'm doing a query. It's pretty straightforward, just selecting codes and counts from a table. I do a group by, and the new piece that was not there before was this Group Key. So it does a hash aggregate, but it'll tell you, hey, what are you actually grouping on?
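A query along these lines (the table name is hypothetical) shows the new Group Key line in the plan:

```sql
EXPLAIN SELECT code, count(*) FROM tickets GROUP BY code;
--                         QUERY PLAN
-- HashAggregate  (cost=43.90..45.90 rows=200 width=4)
--   Group Key: code
--   ->  Seq Scan on tickets  (cost=0.00..32.60 rows=2260 width=4)
```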
16:01
And if you're doing grouping on functions or something like that, it'll put that whole piece of information in there. The thing that used to be in there that was similar to this, you would see that for certain index scans that will tell you, here's the key I'm scanning on. There's that kind of information. But for some of the aggregate stuff, it wasn't in there. So that's been added in there now. So if you're looking at really complex plans,
16:21
obviously a straightforward example is good for slides. The more complex you have where you have multiple hash aggregates trying to figure out, well, which thing is it actually aggregating in this step? This actually gives you a way to know that pretty clearly. So that's certainly a nice little bonus change there. Another one sort of performance-y related.
16:42
How many of you guys have issues with lock waits and contention, that kind of stuff? When it bites you, it really bites you kind of hard. The amount of information that's now available, I think in previous versions, what you would see, you'd see the top line, and then the detail line would be there where it'd tell you,
17:02
this process is waiting on the lock, and the other one, well, not even that. So you'd know, I guess you'd get the top line. That's probably about it. So a lot more context has been added, including here's the actual query that I'm waiting on. And that's always been a really tricky part. You could get the process ID and then go looking in pg_stat_activity and do a little bit of looking and maybe figure out why it's,
17:23
which part of the relation is it hanging on within that? But this actually tells you, here's the query that was run. And you even know the tuple, right? So that's the disk format there that it's telling you on the relation that you're doing. So certainly a lot more information there than what we've had in the past. So it should make it much easier
17:41
to debug lock waits if you're having those issues. So I'm really happy to see that. I think that's the one we've wanted for a number of years, but finally have it now. So that's cool. So there's performance improvements. Again, what's nice about performance improvements is you just get the new version going, and it's all magically there.
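With log_lock_waits enabled, the richer 9.4 lock-wait messages look roughly like this; the PIDs, relation name, and query here are invented for illustration:

```
LOG:  process 24601 still waiting for ShareLock on transaction 1234 after 1000.012 ms
DETAIL:  Process holding the lock: 24555. Wait queue: 24601.
CONTEXT:  while updating tuple (0,2) in relation "accounts"
STATEMENT:  UPDATE accounts SET balance = balance - 10 WHERE id = 1;
```

The CONTEXT line carries the tuple and relation, and the STATEMENT line the waiting query, which is the extra context described above.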
18:02
So what else is new? So we have a number of administration-related things. How many of you are DBAs or would consider yourself, you're the guy who has to run the database, or gal? How many of you are like developers who, if I don't have to touch the database, I'm much happier? So a lot of hands started to go up, and they're like, oh, wait. No, I have to touch the database.
18:20
So I guess I don't count that. OK. So pretty good mix here, pretty good mix of the crowd. So some administration functions, hopefully to make life easier if you're running the Postgres databases. ALTER SYSTEM is a new command that's been added in. And the goal of ALTER SYSTEM is to provide a way to actually change parameters within the postgresql.conf from the SQL level.
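The command being described, with the follow-up reload the speaker recommends below (the 42MB value is just the running joke about 42):

```sql
-- Persist a setting from SQL; 9.4 writes it to postgresql.auto.conf
ALTER SYSTEM SET work_mem = '42MB';

-- Most settings still need a reload (or, for some, a full restart)
SELECT pg_reload_conf();
```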
18:44
And so this is, I would say, the first incarnation of this. And here you can see a simple ALTER SYSTEM SET work_mem to 42. You realize 42 is obviously the right answer for all queries in the universe and everything thereof. So you set that to 42. The thing you have to sort of know and understand,
19:03
and I think we'll sort of see some best practices evolve here as people get into 9.4. What this basically does behind the scenes is there are now essentially two postgresql.conf files. There's one that you hand edit, and then there's another one that will be included that is machine generated. So when you run commands like this,
19:21
the machine generated one gets updated. And then that one will override whatever was in the hand edited one. So if you do a command like this, the problem is that, depending on what you're setting, you either still need to reload the config file or restart the server, right? So it doesn't sort of magically solve that problem. So you probably want to do like a pg_reload_conf()
19:42
after you do this kind of change. You may have to actually do a restart. If you have like a good function that can like crash the backend, like that's a really good way to restart the server from the SQL level. Maybe someone will make like a contrib or extension for that. So that's the, you still have to deal with that sort of, how do I restart the database?
20:00
But this will at least allow you to change some of those settings and set them from the SQL level. So nice change. \pset, how many people have ever used \pset? Okay, so unfortunately I recognize most of you have your hands up, so I know. So I think like \pset is like a really underused feature. And I actually think that this particular feature,
20:22
which has not been there, is one of the reasons why nobody really uses \pset. So the basic idea, for those that don't know, is that \pset in psql gives you a way to change the formatting of output at the psql level, right? And you use the \pset command, backslash pset, in order to change things like that.
20:41
And so what you didn't have before, which is what you had for most of the other commands, is now if you just do \pset and hit enter, you'll get a list of all the different things you can actually change, right? And that you didn't have before. If you did that, it just wouldn't give you anything back. So now you can see like, oh, here's what I'm actually using and different options I can change.
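In a 9.4 psql session, that looks roughly like this (output abbreviated):

```
=> \pset
border      1
expanded    off
fieldsep    '|'
format      aligned
null        ''
...

=> \pset null '(null)'
Null display is "(null)".
```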
21:03
Also they've added, so that in of itself, I think opens us up so you can go experiment and see like what types of things might I want to change. Null display is like a common one, if you know about this, right? So you can change your nulls to not show as null, like have it be the word null if you're not sure what you're actually looking at. There's a couple of other ones.
21:20
Record separator is one, I think. But anyway, so you can play with that and get output that you'd actually like to see and change for that. It also now works with tab completion. So if you start typing one in, you can do tab completion and get the right option. So it's like one of those like, it's a really little thing, but I think will really help people
21:40
because people are pretty familiar with set commands and a lot of the other stuff in psql. I think they don't use \pset, most people don't, because the only way to know what the options are is like go look in the docs or maybe the source code and figure those out. So this should certainly help people that are trying to get up to speed on that. Another one, setting tablespace options at create time.
22:03
Again, it's one of those like, you sort of after the fact, you're like, huh, yeah, we probably should have had this for a while. Used to be, if you wanna do something like this, so here I'm making a table space and my location is some magic RAM disk because I don't care about my data. And so the option I'm passing in
22:20
is I'm putting in a specific random page cost, right? And so I'm claiming that if I'm on RAM disk, I will get better than equality for random versus sequential reads, which is probably not true, but I'm putting my database on a RAM disk, so clearly I'm not so concerned about these things. You could do this in the past. You would have to do a create statement for the table space
22:41
and then do an alter statement in order to set the different options. So this is just sort of making that a little bit simpler, right? You can do it all in one command and set that up. So yeah, a nice little feature certainly makes it a little bit easier if you're doing like programmatic building of databases to be able to do that all in one step. Another table space improvement.
23:02
So now it's much easier to shuffle objects around, not that we don't like writing scripts to do all this stuff, but now having a simple command to do it for us is nice. So now you can move any object from a table space from one to another, and you can break that down different ways, like if you wanna move all the tables or all the indexes or just everything in this table space over there,
23:20
Postgres will take care of that behind the scenes. Here, I'm moving all my indexes onto my RAM disk, so out of the PG default table space, because you can rebuild your indexes, right? And they're probably gonna be corrupt anyway, so why not? So again, just sort of makes it easier for those that are out there trying to manage and shuffle things around like that.
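The two tablespace changes can be sketched together like this; the path and names are hypothetical, and note that as of beta 1 the bulk-move syntax was still being finalized:

```sql
-- 9.4: set tablespace options in one step at creation time,
-- instead of CREATE followed by a separate ALTER
CREATE TABLESPACE ramdisk LOCATION '/mnt/ramdisk'
  WITH (random_page_cost = 1.0);

-- 9.4: bulk-move objects between tablespaces; at beta 1 this was
-- spelled roughly as follows
ALTER TABLESPACE pg_default MOVE INDEXES TO ramdisk;
```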
23:44
A new view that's in the backend is pg_stat_archiver. This view tries to show you what's going on with the WAL archiver mechanisms that are there, and it's fairly straightforward. It's good if, you know, the more complex you start building out,
24:02
you know, replication setups and having archive files go all over the place in multiple databases that are within the size of your Postgres instances, the more you need this type of information, and a lot of this was actually fairly complicated to get at, like you needed to have command line access and get access to certain utilities or really track, you know, go troll your log files
24:21
to see what information was there. So this makes it a lot easier to get that kind of information. Here, you know, it's pretty straightforward, so like archived count is like the number of times that the archiver has gone off. The last archived WAL, right, so it's giving you the actual WAL file name of that. Most of this is fairly straightforward.
24:40
Failed count, if you break for some reason, or for some reason the WAL archive command isn't actually working, right, that count will start to accumulate, so you can see like suddenly that that's broken. Should make it actually really easy to monitor that. We have a tool that we wrote called OmniPITR, and like we wrote a module like to try
25:01
to monitor this stuff, because it was complicated to monitor. So now you can do a simple query that says like, hey, my failed count is increasing, like obviously something is broken, I gotta go check that out. So that's the type of thing where it's like, now this stuff is like, when you make it exposed at the SQL level, suddenly it's much easier to do all kinds of things, because those are the tool sets that we're used to working with, so.
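A monitoring check like the one described might be as simple as sampling the view and alerting when the failure counter climbs:

```sql
-- Alert if failed_count increases between samples
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count,   last_failed_wal,   last_failed_time
  FROM pg_stat_archiver;
```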
25:21
Also will tell you like the last one that failed, that kind of thing. So yeah, so that's certainly really nice. If you're into monitoring, which I hope you are into monitoring, you should be, then that'll make that much easier for you. I guess this is an improvement. It's probably worth knowing at least. So we've removed Kerberos 5 authentication.
25:42
How many people out there use Kerberos 5? Really, really, man? Okay, your new customers are in trouble. So this has actually been deprecated since 8.3, which is like, by my math, like 11 versions ago or something.
26:01
So yeah, you should use GSSAPI instead. And is Magnus actually in here? I would point you at Magnus if I knew where he was. I think he went to the other room. You may want to query him and say, how do I migrate and make that happen? So 8.3 was a long time ago. So I believe actually 8.3 is not supported anymore.
26:23
It is not, so yeah. So eventually deprecated things do get removed, it turns out. In a similar vein, if you're into security and that kind of stuff. I think we've all given up on security at this point. But if you're still into that
26:42
and still clinging to the hope, there have been some SSL and TLS improvements. New versions being supported, old versions going out. There's also been a change in the default cipher suites. From what I can tell, this should be pretty transparent for just about everybody in the world,
27:02
unless you're actually doing something specific and you've been in your postgresql.conf and modified these types of things. It should be more secure, but generally transparent to all users. If you normally go with the defaults, these defaults are just better defaults. So hopefully you won't see too many troubles. But again, remember when I said test and beta testing?
27:23
Do that. A couple of new config params have showed up that I think will be of help. autovacuum_work_mem. Mostly I think they're self-explanatory, but I will attempt to explain them anyways. So autovacuum_work_mem works like many of the other autovacuum settings that basically just mimic the server settings.
27:42
The default on this is negative one, like with the other ones, and it basically just means use whatever maintenance_work_mem is set to. But if you want to set your autovacuum_work_mem to something different, you now actually have the option. So pretty straightforward, but it's one of those ones where you're like, wait, we already have that feature. I'm pretty sure that was there like two versions ago. And then you look and you're like,
28:00
oh no, I guess we didn't. So, session_preload_libraries. If you're familiar with the preload libraries, there are different options for loading libraries into the backend or when you start Postgres. This now allows you to do it at the session level and call those. This is a foreshadowing of some things that we'll talk about later. And then wal_log_hints.
28:21
How many of you are familiar with or have heard about hint bits? So yeah, painful smiles in the front. Hint bits. So without getting into too long a diatribe about hint bits, hint bits are little pieces of helper information that Postgres keeps for doing queries against your database.
28:41
And it has to pass those around in order for you to get decent performance, I'll phrase it that way. Normally those things are not sent to replicas, right? They're not sent through the WAL. But in certain future replication scenarios that people were working on, again, a little foreshadowing, that will actually be something that you need and something that will be useful.
29:01
So there's now an option to turn that on. It increases the amount of data that's in your WAL, which is why it's not on by default. But again, now you have an option to turn that on if you want to use it. Of these three, I think the one you're most likely to run into is autovacuum_work_mem. So just be aware of that one.
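A postgresql.conf sketch pulling these three together (the values are purely illustrative, not recommendations):

```
autovacuum_work_mem = 256MB                  # default -1 falls back to maintenance_work_mem
session_preload_libraries = 'auto_explain'   # per-session library loading, new in 9.4
wal_log_hints = on                           # WAL-log hint-bit changes; off by default
                                             # because it increases WAL volume
```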
29:22
The other ones you'll probably hit at specific times, but all good. So, all right, so that's performance, some administration improvements. And then we get to my favorite part, believe it or not, is SQL. I guess we're probably, people here like SQL, right? Like, I guess there's a movie like SQL is Not Dead Yet
29:42
or something like that. So if you like SQL and you still use it, I think you'll be pleased with some of the features that are here. I was. So first thing, I always have a hard time making time for things, right? Everyone has so many things to do in their day. It's difficult to make time for all the things that you want.
30:01
There are now some new ways to make time inside of Postgres, because that's the kind of helping database that Postgres wants to be. These are a series of functions that enable you to build date, time, and interval values. So I have an example at the bottom, right? So I do SELECT make_interval. And what's nice about these functions is that they actually use named parameters.
30:22
So you can pass in the parts that you need, right? So we have make interval, passing in two hours, one minute and 1.2 seconds. And what you can almost see down here is two hours, one minute and 1.2 seconds. It was way easier than like all the manipulation that we used to have to go through for this kind of stuff.
30:40
So there's one for interval, there's one for timestamp, and then again for timestamp with time zone. Certainly from a programmatic aspect, this will make your life much, much easier, I think, when it comes to doing this type of stuff. Nice little thing. A much bigger feature is moving aggregates.
31:01
So this is sort of an optimization for aggregates that are called within a moving window. So how many people are familiar with window functions? Okay, wow, I'm actually really surprised it's that many. That's pretty awesome. So if you're familiar with window functions, you understand the idea that as you're working on a result set, there's a window that can move along
31:21
with the result set, right? And look at like two rows ahead or two rows behind, that kind of thing. So there are certain aggregates that can be used within those windows, right? And if you have a moving window, this actually boosts performance for those aggregates within the moving window. Walking you through a very detailed example of this would be extremely painful
31:41
and take a lot of time. So I will say that if you are not reading depesz's blog, you should be, because he has obviously a very awesome write-up on how this works and goes into the details of it. So I'll leave it at that. But the main thing is, if you're doing this type of stuff, if you're using these aggregates with moving windows, that optimization has been done
32:00
and the groundwork has actually been laid for other aggregates to be done that way. So that's certainly a nice bonus. Not my most favorite thing. I think this is my most favorite thing. Lead and lag, those functions I believe are there. Yeah, so this is not like adding those functions in,
32:21
it's optimizing those in the moving window condition. Yeah, so yeah, like sum and average, all those obviously are there. So this is just optimizing that. Ordered-set aggregates are sort of a new thing that's been added in. It's a really new feature in that sense. So it's sort of a new class of aggregates that you can use.
32:41
And the basic ideas or terms that you hear people use when they talk about this are "ordered sets" and "within group", as sort of the mental model. And this will make a little bit more sense once I show some examples of this. This is based on the SQL:2008 standard. So I guess it is more or less standards compliant, and we all love to comply with standards.
33:01
So that's good. A few examples to sort of show how this works. So let's say I have some data. And here basically what I'm doing, this is just some random data that I have, which are dollar amounts and then the number of people who have done this. So this is like a payment table. And so I've got data sort of dispersed amongst this.
33:21
And let's say I wanted to figure out like, well, okay, here's my data and it's all these different amounts. Could I figure out like what is the median amount? And you actually can do this in SQL now. I would say it's fairly painful. And so this makes that easy. And easy is good.
33:40
So here I'm doing a query on that payments table. I'm grabbing a couple of values just so you can see them: the count, the min and the max. And I'm using this new function, percentile_cont, and passing in 0.5, so 50%. And then you use the keywords WITHIN GROUP. So I'm looking within the group of data that I'm looking at, ordered by the amount.
34:02
So that's part of the way that these aggregates work. If you order your data in some way, can you tell me where I am at the 50th percentile? That's sort of the query. And so then the answer I get is 399. So that is my median amount. Everyone quickly goes to Wikipedia or Google, like, median, mean, mode, so I can remember that stuff.
34:21
It's fine. I did it when I was writing the slide. And I've probably got it confused at this point. So if you want to compute median, it's now very easy to do that within your thing. What is also sort of special and fancy about this, let's say you want to find out multiple pieces of info about your data. So here I'm like, can I get 50th percentile, maybe
34:42
90th percentile of that data? You can pass an array into that function with the different amounts that you want. And you'll get an array back with the different amounts as they're calculated. So there's actually a lot of sort of machinations behind the scenes in order to make this stuff work. It was pretty interesting.
35:00
But that certainly allows you to compute statistics on your data much more easily. And if you go back and look, if you remember, 399 is where we say the median is, and 699 is the 90th percentile. I'll jump back really quick. So 699 is here. So most of my data, it does look like, if you do some rough math, you get about 3,000 at the top.
35:23
You're looking at like 12, 14, 15,000 out of all my rows. So that seems like it's probably about right. So that's the quick math. You're like, did these guys do the right thing? And you're like, yeah, probably. So I did not break out my slide rule to check, but I trust them. So that was much easier. If you decide, well, that's all well and good,
35:44
but maybe I do actually want the mode, you can actually get that as well. So there's now, it's not the ode function. It's actually the mode function. So you do the same thing and calculate the mode. The mode was 499, and the median was 399. So anybody want to bravely say what does the mode mean?
36:04
Nobody is that brave. Are you sure? Sure. Most common. Getting a lot of people proud of that. So most common is 499. Let's go back and look at the data. So 499 should be 3,789.
36:22
Right there. It actually has the highest count, so it turns out it is the most common. So yeah, you should check out this thing called PL/R. It's like all statistics and stuff. If you're into modes and means and averages, you would totally like that. I'm not going to revise myself. There you go. Did we obsolete some of your work? No.
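The mode query has the same shape; a sketch against the same hypothetical table:

```sql
-- mode() takes no direct argument; the ORDER BY expression inside
-- WITHIN GROUP defines which value's frequency is being measured.
SELECT mode() WITHIN GROUP (ORDER BY amount) AS most_common
  FROM payments;
```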
36:41
So I mean, this is the thing. You could figure this out somehow, right, with SQL. Like you can see that the things are, if I can figure out what is the highest number and then like order it by the count, descending limit one or something and figure out what the value is, you probably could have got that value. But being able to do it this way, much easier, much nicer. So I think that's going to open up a lot of like,
37:00
the easier we can make it to do statistical analysis on our data, like the better for everybody. There's actually a bunch of other functions in this area as well, so I would encourage you to go check out the docs, look at those. Rank, I think it's rank. It's particularly interesting that you can pass in a value that does not exist within the set
37:21
and say, where would this rank within that set? That type of thing, I find to be kind of interesting. So again, ordered-set aggregates, I think, are really going to open up some things and make it easier for people to do more advanced SQL. So I'm pretty excited about that stuff. The other one, and I was talking to Richard last night, and he said, well, this is completely useless in SQLite.
37:41
But in Postgres, it's totally not completely useless. And I'm so excited. I was overly excited about this feature, and it seems so simple. It's a thing called filter aggregates, and this is adding support for the FILTER clause. How many people have ever heard of the FILTER clause before? So you work on other databases.
38:00
So what this solves is the problem of, let's say, here's my query, and I'm trying to get a count. But let's say I also want to count where the amount's more than 5. So remember, if I go back to my data, so if half of the people were at 499, so I want to say, well, give me a count of everybody and then a count of the number of people who have spent more than 499.
38:24
So the old way you used to have to do this, I can probably remember this off the top of my head. Instead of count(*) FILTER (WHERE amount > 5), which actually looks like what you want, you would have done SUM, paren, CASE WHEN amount greater than 5
38:41
THEN 1 ELSE 0 END, and close paren. And it's not that that doesn't work. It's just that it's really ugly and offensive to your SQL sensibilities. So this is actually a much nicer way to do that. I think people can look at that. And I've certainly been asked by folks, especially the more complex those things get,
39:00
the more people look at those case statements, and they're just like, wait, why are we even doing sums? This doesn't make sense. And when you add nulls into that mix, it gets even more messy. So there's a much easier, nicer way to do that. And certainly actually adds a little bit of portability for Postgres as well by supporting that clause. This is another feature I think I'm probably overly excited about this feature.
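Side by side, the old workaround and the new clause (hypothetical payments table again):

```sql
-- Pre-9.4 workaround: fold the condition into a CASE and sum it.
SELECT count(*)                                     AS total,
       sum(CASE WHEN amount > 5 THEN 1 ELSE 0 END)  AS big_spenders
  FROM payments;

-- 9.4: say what you mean with FILTER.
SELECT count(*)                             AS total,
       count(*) FILTER (WHERE amount > 5)   AS big_spenders
  FROM payments;
```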
39:21
But it makes me feel better about the SQL that I have to write. So that's filter aggregates as well. All right. So that's some SQL stuff that I'm really excited about. Let's talk about a few backend things. I've seen a lot of PL/pgSQL code
39:41
that many of you in this room have written. And I think this is a really good feature for most of you. So now you can get stack traces from your PL/pgSQL. And I find pretty much everything about debugging PL/pgSQL is difficult, but really good in sort of a caveman, esoteric way. But beyond that, it's definitely pretty tough.
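A minimal sketch of getting at the call stack from inside PL/pgSQL (the function name here is ours, invented for illustration):

```sql
CREATE OR REPLACE FUNCTION where_am_i() RETURNS text AS $$
DECLARE
    stack text;
BEGIN
    -- New in 9.4: PG_CONTEXT exposes the current PL/pgSQL call stack
    -- as a text description, one stack frame per line.
    GET DIAGNOSTICS stack = PG_CONTEXT;
    RETURN stack;
END;
$$ LANGUAGE plpgsql;

SELECT where_am_i();
```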
40:02
So this allows you to get stack traces and see what's going on with your PL/pgSQL. Another one: refreshing materialized views concurrently. This is a big thing. If you remember, in 9.3 we got materialized views. The big problem with those was that if you wanted to refresh one, you needed an exclusive lock on the materialized view while you were refreshing it.
40:21
That basically meant it was useful for almost nobody. This now makes it useful for most people, as you can refresh it behind the scenes. I think the thing to be aware of, obviously, is still, if it's a really expensive query, it will take a while for that refresh to happen. But the good news is it's done concurrently, so you don't have to hold the lock while you're doing it.
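A sketch with made-up names; note that the concurrent form requires a unique index on the materialized view:

```sql
CREATE MATERIALIZED VIEW sales_summary AS
    SELECT amount, count(*) AS n
      FROM payments
     GROUP BY amount;

-- Required for REFRESH ... CONCURRENTLY to work:
CREATE UNIQUE INDEX ON sales_summary (amount);

-- Readers keep seeing the old contents while this rebuilds;
-- only a very brief lock is taken for the final swap.
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;
```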
40:45
Yes, they get to use the old version while it's building the new version. And then it will swap it out. Almost like how you'd want it to work. It does hold a lock during the swap for the most infinitesimal amount of time.
41:01
So if you think about it, when you do a concurrent index build, it's sort of the same mechanism as that. Another one: WITH CHECK OPTION for auto-updatable views. Updatable views are really interesting in their current incarnation, because you can update things through a view that you can't actually see. I think that's interesting in ways
41:20
in which I don't use this feature, so it's fine with me. For folks who don't like that sort of side effect, you now have the ability to create those views using the WITH CHECK OPTION clause. And you can specify whether you want to check, is the row that I'm about to update actually visible in this view, or is it visible in this view and in any other parent views that might be looking at this view? So there's some more flexibility there as well.
41:44
So that's good. I saw somebody use a phrase, that this completes the abstraction of views as tables: now you can throw errors if the row isn't actually in the view. It'll throw an error back for that. So it's much more difficult to tell that you're on a view instead of a table.
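A sketch of the clause (view and table names invented):

```sql
CREATE VIEW big_payments AS
    SELECT * FROM payments WHERE amount > 5
    WITH LOCAL CHECK OPTION;  -- CASCADED would also enforce checks
                              -- declared on any underlying views

-- An INSERT or UPDATE through big_payments that would produce a row
-- with amount <= 5 now raises an error, instead of silently creating
-- a row the view itself can't see.
```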
42:00
I know all the hipsters are really excited about this, the JSONB functionality. How many people have not heard of JSONB and its functionality? So everyone has heard of this. It's good. I wasn't planning to spend a lot of time, so that's awesome. So basically, some pros. If you move to JSONB from regular JSON,
42:21
I said no need to reparse data. That might be overselling it. There is less need to reparse data as you're swapping it around. Definitely improved indexing support by a wide margin. And now you get things like equality checking and better querying within the JSON data structures that you have. I think there are some cons to this.
42:40
Whitespace munging, I think, is perhaps more of a problem than people think it is. If you use JSON in your current applications, I would strongly encourage you to test the JSONB stuff while it's still in beta before it gets out into the world and causes trouble. I also noticed there are some JSON and JSONB quirkiness issues where if you cast invalid JSON to JSONB,
43:04
you'll get an error that it's not valid JSON. So there's some quirkiness things like that. Also, I found it takes more disk space. If you have smaller JSON objects, it'll probably take more disk space to use JSONB than it would otherwise. As you get to larger JSON objects and more rows,
43:21
that trade-off starts to go back the other way. There's actually a whole bunch of talks related to JSONB and the different storage formats and working with that data. Tomorrow afternoon in Morris at 256, I would strongly suggest checking that out. If you're using JSON pretty heavily, you absolutely should go to that. If you're thinking about using it,
43:40
then you may want to pick and choose, but it's still worth taking a look. Because Thom Brown is awesome and did some benchmarking, I decided to throw that into the slide deck. I don't know if Thom's here or not. So if you want to look at a quick comparison of JSON versus JSONB, and then he threw in MongoDB
44:01
because you've got to throw in MongoDB. So as you can see, table size of the data he was using was a little bit bigger with JSONB, smaller than MongoDB though. Index size was dramatically smaller, which is a big thing. And the first index size is like if you're indexing the entire JSON documents.
44:21
The second one, single key index is if you just go index a specific piece of the JSON document. Those are smaller as well. You'll note that all the numbers do actually beat MongoDB. But we don't quite lose your data as easily as MongoDB, so it's not exactly on par. But speed also is dramatically faster using JSONB versus regular old JSON.
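A rough sketch of the kind of schema behind numbers like these (table and index names are made up):

```sql
CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);

-- "Indexing the entire document": a GIN index over the whole jsonb value.
CREATE INDEX docs_body_idx ON docs USING gin (body);

-- "Single key index": an expression index on one extracted field.
CREATE INDEX docs_status_idx ON docs ((body ->> 'status'));

-- Containment queries can use the GIN index:
SELECT * FROM docs WHERE body @> '{"status": "paid"}';
```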
44:43
So again, tomorrow afternoon, check it out if you're using JSON or interested in JSONB stuff. And start beta testing to make sure. I think that really needs a good sort of once over to make sure that that's going to work for folks. A couple of other things here.
45:00
recovery_target = 'immediate'. This is sort of a straightforward one. The recovery target, when you're setting up Postgres nodes, like a Postgres replication setup, tells Postgres when it should launch the second instance and actually make it be a legit database. So we had a bunch of options for this. You could do it on the time, or a certain transaction
45:21
ID, or a named point, or when all of the WAL has been replayed and that finishes, which was sort of the normal one. Immediate allows you to tell it: as soon as you get a consistent state, so you know you have a working database, launch that thing up into real database mode. Certainly, I can think of some cases
45:41
where when you're setting up reporting systems or dev instances and that type of thing where you really don't care that you have every wall processed, you're just like, as soon as I get something up and running, please get it up and running, that will make that an option. We are almost out of time. So a couple of quick things about future developments.
46:03
Dynamic background workers. 9.3 added the ability to have background workers, which were basically processes that were spawned when Postgres started, and they could then do background work. But in 9.3 that was only at startup. 9.4 gives you the ability to start those dynamically,
46:20
either from like SQL or sort of other machinations within Postgres. So that's certainly a nice thing, and that is laying some groundwork for some future use. I think the parallel query guys definitely want to take a look at that and use that. Another one, dynamic shared memory, which I have to say is not dynamic shared buffers. Those are two entirely different things.
46:42
Please do not confuse them. But dynamic shared memory is, what it does is it allows the server to create new shared memory segments on the fly. So for example, if you created a new background worker from SQL and you wanted it to have access to some shared memory, it could now create a shared memory segment and use that.
47:02
There's also some message-passing infrastructure that's been put into place. So this will allow backends that want to spawn other backends to be able to talk to the backends that they spawned. So again, it's laying more infrastructure for that type of stuff. Extensible TOAST support. So if you've been waffling about the type of TOAST
47:20
support that you needed, this now gives you some options in that category. You'll have added infrastructure. So the infrastructure is out there. You can support external or alternative compression schemes. I think everything now is zlib. Anyone know TOAST? It's got to be zlib, right? OK. Come on.
47:40
pglz, OK. Which, I mean, so it works for most people if you're doing TOASTed objects with sort of large data being stored that way. But obviously, there's a bajillion compression algorithms. So this should lay the groundwork for people to be able to use other compression algorithms for data, and certainly the type of data that you're trying to compress may make a difference as to what you want to use.
48:02
And last, but certainly not least, replication improvements. This is another one where there's actually like a whole slew of changes that have gone in. And I think probably the way that I would sum this up, if you've heard a lot about replication changes and async is coming and multi-master and all that stuff, I would say for the average user,
48:21
probably not a lot has changed in your world with 9.4 when it comes to replication. There are certainly a lot more features. And if you're the kind of person that likes to build replication systems, you now have all the pieces to build those systems. But the systems themselves and the thing of sort of being an easy to use, like start your Postgres and say dash, dash,
48:41
enable multi-master asynchronous replication distributed global hash table, that flag's not there yet. So all of the stuff that you're reading about that, I think it is worth reading about it so you understand what's going on behind the scenes. Because those systems are definitely coming. But as far as 9.4 is concerned, this is really about setting the groundwork so that third party tool authors, people
49:02
that want to build those systems, have the capabilities to build them. If you're into that type of thing, or if you're like, this is the one feature I really need, so I want to study it in depth: tomorrow at 4:15, Andres is doing a talk on changeset extraction. That I think is probably one of the more
49:21
complicated pieces and more user visible pieces of the suite of changes that they've put in. It's definitely worth going and taking a look. So that's sort of the heads up on replication improvements that are there. So we're also thinking that there will be fewer bugs in replication in 9.4. But I don't want to promise anything.
49:41
So that is replication improvements. Definitely check out the talk if you're into that. And that's mostly it, which is good, because we're out of time. Again, robtreat2 on the Twitters. I will post the slides someplace. I will answer this guy's question first. So with work_mem, you could set it to 16 gig,
50:07
but Postgres will never use more than two. In 9.4, if you set it to 16 gigs, it will actually use 16 gig. Maintenance_work_mem? Maintenance_work_mem, which means you can use it to build a large index and have the whole thing get built in front of you. Again, I saw the speedup
50:21
of building like that. I feel like that worked before. I feel like that worked before. No? Not before nine. Joe says no. I have to believe Joe.