Identifying Hotspots in the PostgreSQL Build Process
Formal Metadata

Title: Identifying Hotspots in the PostgreSQL Build Process
Title of Series: FOSDEM 2014
Number of Parts: 199
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/32547 (DOI)
Part: 184 of 199
Transcript: English (auto-generated)
00:02
Okay. Hi everybody. My name is Shane, first of all, and this is an awesome opportunity
01:30
for me because I'm a PhD student from Canada and most of the conferences I attend are for research and this is like actual stuff is happening here. So it's kind of neat.
01:44
Yeah, so thanks for welcoming me. My talk is about identifying hotspots in the Postgres build system. Actually, before I start, here's my Twitter handle and this would be really awesome for me and my future research if you guys could actually take a look at my
02:05
Twitter handle. I just tweeted out a link and it's a survey, a developer survey. It'll take maybe five minutes of your time to fill out and it's just about how you interact with build systems and these sorts of things, and it would really help me direct my future work
02:21
if you could fill that out for us. Anyways, back to the task at hand. So we did some work identifying hotspots in the Postgres build system and I find it's usually useful for me to take a few minutes to explain what I mean by a build system before I start digging in because there's kind of overlapping terms. Everybody thinks something different
02:44
when I say build system. So if we imagine that this pile of Lego, pile of blocks is the source code of a software project and this spaceship is what we want to deliver to our customers. Well, the build system is the set of order dependent instructions
03:02
that take us from the blocks to the actual spaceship or in software terms it's the make files and configuration scripts, these sorts of things that describe how source code and tech files, documentation files are translated into deliverables and then packaged.
03:24
And developers have been using build systems for quite some time now. Think of the developer's work cycle: thinking of something you want to change, editing the code to implement the change, running the build
03:43
to integrate the change into the deliverables and then testing what you've produced and then you circle back around. But in this model, all builds were thought of as equal. So anytime I ran the build, it was just doing the build step. But in reality, some builds are more equal than other builds. So some builds actually
04:06
take longer than other ones do. And this recent XKCD comic kind of highlights that, saying that one of the number one reasons for developers slacking off is the code is compiling. So they're waiting for the build to actually complete.
04:22
And we've got some quotes from GTK developers who say, you know, the build is abysmally slow and it's preventing them from being able to get their job done. So slow builds are actually impacting people. That's the impression that we get. So what can people do to address build performance
04:43
in the wild? So refactoring is a common technique. So if we imagine that we have a four-file system, an important thing to notice is that this particular file, when we change it,
05:02
takes a while to reintegrate. Because first we need to recompile this object file, then re-archive it, and then it gets linked into two deliverables. So before refactoring this source tree, four commands are triggered when you change util1.c.
05:24
However, if we look at this graph and we find out that util.c is only ever used in the first deliverable, it's never used in the second one, we can actually refactor this code to look something like this, where now if util.c is changed, only one compile command is triggered
05:43
and then one deliverable is relinked. So now only two commands are triggered and we've shaved our compile time in half. But the question is now I'm working on, imagine I'm working on a real system that's not four files.
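The before/after counting in this refactoring example can be sketched in a few lines of Python. The graph shapes follow the talk's four-file example; the names of the archive and the two deliverables are assumptions made up for illustration.

```python
# Toy dependency graph: an edge (a -> b) means "when a changes, the
# command that produces b is re-run". We count how many commands a
# change to util1.c triggers before and after the refactoring.

def commands_triggered(graph, changed):
    """Count build commands (edges) triggered transitively by a change."""
    triggered, frontier = set(), [changed]
    while frontier:
        node = frontier.pop()
        for target in graph.get(node, []):
            edge = (node, target)
            if edge not in triggered:
                triggered.add(edge)
                frontier.append(target)
    return len(triggered)

# Before: util1.c -> util1.o -> libutil.a -> app1, app2 (4 commands)
before = {"util1.c": ["util1.o"], "util1.o": ["libutil.a"],
          "libutil.a": ["app1", "app2"]}
# After: util1.o is linked straight into app1 only, so the archive
# step and the second link disappear (2 commands)
after = {"util1.c": ["util1.o"], "util1.o": ["app1"]}

print(commands_triggered(before, "util1.c"))  # 4
print(commands_triggered(after, "util1.c"))   # 2
```

The same traversal generalizes to the full dependency graphs discussed later in the talk.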
06:00
Where do I focus my effort? Where should I look for gains to refactor? And we thought the most obvious one is look for the files that are the slowest to rebuild. Find the turtles. So we came up with a two-step approach to finding them. To finding the
06:20
slow files. So first step, we extract the dependency graph from your software system. And the way that we do that, I'll explain in the next slide. The second step is we analyze that graph to find out where the slow files are. So going into each step in a little bit more detail, in the first step, we're constructing the dependency graph.
06:45
So we use this tool called MAKAO, which is a tool that will parse a debug trace of a build and it'll understand recursion as well and piece together the dependency graph. So what I'm showing here is actually a Postgres 9.2.4,
07:04
which is probably ancient for you guys now, but that's the version that we studied. And this is the dependency graph. So on the outside edges are the source files. So these are the ones that there's no command to generate them. And then as you work closer in, that's where things are getting compiled and linked
07:24
and these sorts of things. So yeah, we have this tool to build this graph. Now, in that graph, we don't have a cost for an edge yet. So we could look at that graph and just say the cost of recompiling a file is just
07:42
all of the edges that get triggered. But instead, we decided to do something a little bit more practical and look at how much each edge costs. So how long each compile, each link and everything costs. So we did 10 clean rebuilds of Postgres and timed each command.
08:02
And then we take the median of those 10 runs and that's the cost for each edge. So now we've got a nice way of finding out how much it costs to rebuild the file. So we said, let's try this out. So we picked three systems and we said, let's find the slowly rebuilding files.
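The edge-costing step described above can be sketched as follows. The timing helper and the sample durations are illustrative, not measured Postgres data; only the median-of-repeated-runs idea comes from the talk.

```python
# Sketch: time each build command over several clean rebuilds and use
# the median duration as that edge's cost in the dependency graph.
import statistics
import subprocess
import time

def time_command(cmd):
    """Run one build command through the shell and return its wall time."""
    start = time.monotonic()
    subprocess.run(cmd, shell=True, check=True)
    return time.monotonic() - start

def edge_cost(timings_per_run):
    """Median cost of one edge across repeated clean rebuilds."""
    return statistics.median(timings_per_run)

# e.g. ten measured durations (seconds) for one compile command
print(edge_cost([1.9, 2.1, 2.0, 2.3, 1.8, 2.0, 2.2, 2.1, 1.9, 2.0]))  # 2.0
```

Using the median rather than the mean keeps one slow outlier run from inflating an edge's cost.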
08:24
So before I dive in, here are some characteristics of those big graphs for these systems. At the bottom here, highlighted in yellow, is Postgres 9.2.4. We also looked at glib and we looked at qt. And I'm showing in blue the number of edges, or sorry, the number of nodes.
08:43
So these are files. So this is like .c files, .o files, executables, libraries, these sorts of things. And then the red bars are edges or dependencies and the rules that trigger. So you can see we looked at three systems with three different orders of magnitude
09:02
of edges. So up here, qt was the largest; it had over 2.7 million edges in its build dependency graph. glib had about 121,000, and Postgres is the smallest down here.
09:23
So yeah. So we said, now let's go through each one of the source files so that the nodes on the outside of that big graph that have no in edges and trace through the graph to see how much it would cost to rebuild each file. And then we plotted graphs
09:42
that look like this. So on the left-hand side are the .c files, on the right-hand side are the header files, and each point on the graph is one file. And we sorted it according to build costs. So the stuff on the far right is the stuff that took longest to recompile.
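Tracing through the graph from each source file, as just described, amounts to summing the costs of every edge the change triggers. A minimal sketch, with an invented three-edge chain standing in for a real dependency graph:

```python
# Total rebuild cost of touching a source file = sum of the (median)
# costs of every edge its change triggers transitively.

def rebuild_cost(deps, cost, changed):
    """Sum edge costs reachable from `changed` in the trigger graph."""
    total, seen, frontier = 0.0, set(), [changed]
    while frontier:
        node = frontier.pop()
        for target in deps.get(node, []):
            if (node, target) not in seen:
                seen.add((node, target))
                total += cost[(node, target)]
                frontier.append(target)
    return total

# Illustrative chain: compile (2.0 s), archive (0.5 s), link (3.5 s)
deps = {"a.c": ["a.o"], "a.o": ["libx.a"], "libx.a": ["app"]}
cost = {("a.c", "a.o"): 2.0, ("a.o", "libx.a"): 0.5, ("libx.a", "app"): 3.5}
print(rebuild_cost(deps, cost, "a.c"))  # 6.0
```

Sorting all source files by this number produces the ranked plots the talk shows next.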
10:01
So we saw some interesting things here, like for example, down here is where we would expect most .c files to be, triggering one compile and maybe a couple of links. But there were these jumps in the GTK build where some .c files actually triggered several compile commands and several link commands as well.
10:24
So we dug in and found that they had some code generators there where if you changed that .c file it would end up regenerating some code and then recompiling and linking some stuff. So yeah, there was some interesting stuff there. Here on the header files you can see there again steps. Some of these header files were
10:45
included in the source code generator, which meant that they would trigger the same sort of activity. And as you'd expect on the y-axis, at the top here, .c files, if you touch them, tend to rebuild very quickly. Maybe 12 seconds is the top here, whereas if you're touching header files,
11:05
you pay a higher price. This is what Postgres looked like, which was much more, I guess, flat, what we expect. Touching .c files never really took more than two seconds to recompile. But we found that
11:22
if you did touch some of the bad header files, it could take more than three minutes to recompile. And when we turned to QT, this is where we said, oh boy, these guys are in trouble. Because there are some .c files that actually took almost three minutes to recompile, and there were some header files that took over two hours
11:44
to recompile. So if you're unlucky enough to have to touch one of those, it would take forever to fix whatever you were working on. But in any case, we said, this is great. We can sort this stuff, we can find the files that are really slow to rebuild. All we have to do to find the stuff
12:04
you should focus on for refactoring is just draw a simple line. I don't want to wait more than 100 seconds for my code to rebuild, so everything above this line, I need to look at refactoring. Or maybe you're more patient than me and you say 150 seconds. Or maybe you're less patient and you say 50 seconds.
12:24
But either way, all we have to do is draw a line. We said, great, so let's pick an arbitrary line. Something we felt was bad was 90 seconds. We read some books, 90 seconds seems to be how long people say developers are willing to wait. So we sent a list of these header files, or C files,
12:43
that took longer than 90 seconds to the glib developer mailing list. And we asked them, of these files, do you think any of them are performance bottlenecks? And we got some scary responses. Somebody said, for all of them, no, none of them are hot spots.
13:02
We don't care about those files. They're not painful at all. Then we asked them why. He said, because none of these files change that often. So we went back to the drawing board. Originally we said, let's look for the slow files. Maybe it's not the slow files that are the bottlenecks.
13:22
Because they might only rarely change, right? So if they only rarely change, people don't have to pay the price of rebuilding them. So maybe what we should be looking at are the files that change the most often. So we said, okay, how are we going to find the files that change the most often?
13:40
We turned to the version control system, and we did some simple mining. We just looked at, for example, for each file, the number of changes that had happened. We assumed that files that changed a lot in the past will continue to change a lot in the future. So then we said, okay, let's build those same graphs, draw a line there.
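The mining step just described is simple to sketch: count, per file, how many commits touched it, on the assumption that past change frequency predicts future change frequency. The commit data below is a made-up stand-in for real `git log --name-only` output.

```python
# Count how often each file changed across a commit history.
from collections import Counter

def change_counts(commits):
    """commits: iterable of lists of file paths touched per commit."""
    counts = Counter()
    for files in commits:
        counts.update(set(files))  # count a file at most once per commit
    return counts

history = [["a.c", "a.h"], ["a.c"], ["b.c"], ["a.c", "b.c"]]
print(change_counts(history)["a.c"])  # 3
print(change_counts(history)["b.c"])  # 2
```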
14:03
What we found is that many of these files already built very quickly. So we thought, okay, that's not going to work either. So focusing on the files that change the most often also isn't sufficient, because they may already be optimal. There's not much you can do to refactor a file that builds in less than a second.
14:23
So what we suggest is that instead of trying one or the other, you should focus on both. So we call these files build hotspots. They're the files that rebuild slowly, but also they change frequently, so people have to pay that cost of rebuilding a lot.
14:42
So we took this two-step approach that we originally had and turned it into a three-step approach. So we added a third step where we took the version control stats that we mined, integrated it with the stuff we pulled from the dependency graph to get what we call a quadrant plot.
15:03
And this is a pretty simple concept. We just take the rebuild cost and the number of changes for each file and plot them against each other on X, Y grid. So each one of these dots would represent one of the files in your system. And then we split it into quadrants. So based on thresholds that you pick for your project,
15:25
you say, I'm not willing to wait more than 90 seconds, 100 seconds, 20 minutes, whatever it is. That's your threshold for rebuild cost. And then you pick a threshold for the number of changes, so 10 changes in the past is a lot, or 100 changes in the past is a lot,
15:42
and that becomes your threshold. And then we just say, focus on this red area. Those are the files that you want to refactor first. So we said, great, now we've got a metal detector. Let's go try it out on some open source systems and see what kind of hotspots we detect. So for this, we had to pick some thresholds.
16:05
Like I mentioned, we had been playing with 90 seconds because we read in some books that that's a pretty good threshold. And we also needed to pick a number of changes for the threshold, and for that we picked the median. So anything that changes more than the median number of changes for a file in your project, we consider that a frequently changing file.
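The quadrant classification with these two thresholds (90 seconds of rebuild cost, the median change count) can be sketched as below. The file data is invented for illustration.

```python
# A file is a build hotspot when it is both slow to rebuild and
# frequently changed: the upper-right quadrant of the plot.
import statistics

def hotspots(files, cost_threshold):
    """files: {name: (rebuild_cost_seconds, change_count)}"""
    change_threshold = statistics.median(c for _, c in files.values())
    return sorted(name for name, (cost, changes) in files.items()
                  if cost > cost_threshold and changes > change_threshold)

data = {"slow_rare.h": (200.0, 1),   # slow but rarely changed: not a hotspot
        "fast_busy.c": (0.5, 40),    # busy but already fast: not a hotspot
        "hot.h": (180.0, 30),        # slow AND busy: hotspot
        "ok.c": (1.0, 5)}
print(hotspots(data, 90.0))  # ['hot.h']
```

Note how the two single-criterion candidates from earlier in the talk (slow-but-rare, busy-but-fast) both fall outside the red quadrant.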
16:27
So some of the things that we found using quadrant plots for the three systems, three open source systems that we've been exploring. So about 7% of GTK,
16:40
or sorry, GLib's code base was identified as a hotspot using those thresholds. So about 65 GLib files. And some of the things we found are that you don't actually have to refactor all 65. Sometimes they'd all bottleneck into one place, and there were some main culprits,
17:02
we said, that you could focus on. What I've noted here is that the maximum amount of time it takes to rebuild is about two and a half minutes there for GLib. When we turned to QT, poor guys with the two hour long build,
17:24
we found that there were a lot of hotspots, 732 or so, but it was roughly 8% of the source code files, so on par with what we were seeing in GLib. As far as main culprits,
17:40
we started trying to manually analyze this stuff to find some main culprits, but just 732 files was too much for us as outsiders looking in. We weren't really familiar with their system, so instead what we did is we found the components, or just the directories that had the highest hotspot concentrations.
18:03
So these are like, of the files in that directory, how many of them did we find as hotspots? And some of them with really high percentages were the core library, the XML patterns one. But again, yeah, they had a really bad scenario where you could have files that were rebuilding very, very slowly.
18:25
Then we turned to Postgres and we found that only 2% of the files are actually hotspots here, so good job to those of you who are Postgres developers. Way to go. 27 hotspots though, we still found 27 hotspots here.
18:41
And in the worst case, it was just over three minutes to rebuild some of these hotspots. So some of the main culprits that we identified are here, these five files. So actually if these five files were tackled internally, most of these hotspots would actually disappear.
19:04
So if I kind of shift now to what we learned from doing all of these studies on hotspots, is that they kind of highlighted this transitive trend for hotspot-ness, if I could make up a term. So if we imagine that we have an object file that gets linked into a huge list of deliverables,
19:26
a huge list of DLLs and executables, and this file changes frequently, we would flag it as a hotspot. So this C file here would be considered a hotspot.
19:41
But if this C file actually ended up including a header file, this header file can actually, changing this header file can trigger the same amount of build activity. So it's going to end up having to recompile that object file, relink everything. And now if this header file also changes frequently, it's also going to be a hotspot.
20:02
But that's not it. If this header file is also included in another header file, and that header file includes several other header files, and some of those are also hotspots, we've got what we call a super hotspot. So changing this file will actually trigger three hotspots to recompile,
20:25
which then keeps getting compounded the further up you go. So the advice we have here is that you should try to limit the use of these header file hubs. So header files that just include other header files, because they tend to accumulate build activity.
20:44
And if they change frequently, now these become hotspots. So just to quickly sum up what I've been talking about. So we started out saying what I meant by a build system. It's the set of order dependent tasks that have to be executed in order to translate your source code
21:04
and your documentation into something that your customers or your users need. And developers rely on the build system to be quick so that they can make their edits, change their deliverables and test them quickly to keep that feedback loop fast.
21:21
But there are some files that end up taking longer than others to rebuild. And they can really interfere with developers. So what we set out to do is come up with a way of finding these files so that you can refactor them and improve your build performance.
21:40
So we first started out trying to find just the files that rebuild slowly. That didn't work out so well. We tried the files that changed the most often, which also didn't work out so well. But when they were combined, we call these files build hotspots: the ones that rebuild slowly but also change very frequently.
22:04
And we came up with an approach to find these files based on thresholds. Finally, we did some open source studies using our approach, found some hotspots and found some general trends
22:20
that we think could help avoid making build hotspots. So that's all I've got to say about build hotspots for now. I'd be happy to take questions from you guys. But at the same time, I'd like to remind you again that we put a survey up. And you can find it at my Twitter link.
22:42
It would be really helpful to me, and we'd be happy to feed back the results to you guys if you could take the five minutes to fill out the survey. Thanks again for your attention. Are there any questions for Shane?
23:03
I'm wondering how you measured this. Were you using a parallel build, a single process build, please? Yeah, this was just single process. But one of the things we want to look at is using this graph, we can actually find how Make would parallelize things, and we can actually find problems that way.
23:25
My experience nowadays is that even if you have to compile everything, it's not so bad because it parallelizes pretty well. Oh, okay. Even when you're using recursive Make, have you had any problems with that? Well, it seems to be fast enough, though.
23:41
Oh, okay. Cool. Yeah, I mean, I know that recursive Make doesn't parallelize as well. Oh, okay. I guess there's one other thing before we go into questions any further.
24:01
So my supervisor, my Ph.D. supervisor is here. He's at the back, and he's organizing a workshop on release engineering. So maybe he can tell you a couple of words about this. So last year we had the workshop, it's actually called Build Systems.
24:23
We have the release engineering in general. So integration, version control, and all these kind of things. This year we're doing it again because it was quite a big success. We asked people from Google and LinkedIn, Netflix, also from the source there.
24:41
This time we'll be hosted by Google in Mountain View. The deadline is February 28th. You can send in a sort of small abstract there. The event itself is the 11th of April. We have already a keynote speaker from Google. And there's another speaker coming up.
25:06
The deadline is still open, so feel free to submit. So I have another question. Sure, sure. I was wondering, you had a quote about the GTK folks. I was wondering, do they really view slow builds as being a problem?
25:24
I believe so. I believe, yeah. That's interesting. Yeah. I think in our case, it's probably not so much the build for Postgres that's the issue. It's generally on a modern laptop or a Mac like that with an SSD.
25:43
Postgres would build in a couple of minutes, usually. Not so well on my Raspberry Pi, though. Yeah, we keep meaning to buy you something real quick. The bigger problem for us is actually, I would say, running the regression tests. Because those take orders of magnitude longer than actually building the code on those systems.
26:06
For the GTK guys and the QT guys, I've built those packages myself. And they're horrendously slow. I can see how you could make a real difference to their daily work.
26:24
For us, I think, for me at least, it's more the regression test. Okay. So your regression tests, are they smart enough to know only this stuff has changed? I should only run this subset? Or do you just run the whole suite? No, it runs the whole suite. Every time.
26:43
Does this work? Yeah, I mean... I'm already following that thread. Within the company I work for, we have a vastly expanded regression test suite.
27:02
And we have sort of subsets of it that we can run at any one time. But the main Postgres one, I think, is just still the standard parallel or serial schedule. I mean, it's not too bad. It takes a few minutes on a laptop. There are more tests if you run all the isolation tests and stuff.
27:21
Those take a little bit more. How frequently do you run the tests? Do you run them after every change? Well, we run automated builds for the installers that my team looks after. We run regression tests as part of every build, every night. And if we trigger a manual build.
27:46
For the Postgres project, we also run a website, the build farm, at postgresql.org, where we have lots of volunteers that provide time on their machines. So they run a small client that will do a git checkout, build the code, run the regression tests.
28:06
And there's a couple of different ways we can run them. It runs the regression tests across the contrib modules as well. And then reports the results into the build farm website. So that we can see if anything has broken the build on any particular platform. I think we've got about 30 different animals at the moment.
28:23
Different machines, architectures, compilers, combinations of them. Most of those are running at least one build daily, probably more.
28:41
But that's pretty distributed. So overall none of us really see a huge problem in that. The bigger problem is when the build is broken, after making a change.
29:19
So last year at the workshop we had someone from Google who came and had the same issue,
29:27
the tests took days. So they also then started recording dependencies. So they're basically saying this test is actually testing this and this stuff, we can see these three tests. And then they could really nail it down.
29:41
This thing changed, so we only run these tests. Yeah. It's an investment. It also depends how often you change the tests. Yeah, I don't think with Postgres we're at anything like the stage where it would be worth us doing that right now. Certainly with the stuff that we run within the company I work for,
30:02
we have some of our test suites on certain platforms can take close to 24 hours to run even on modern hardware. And there that might be an interesting approach that we should look at. Can all this process be kind of summarized or automated
30:23
in terms that you have a code branch and you just run through some kind of script or piece of software that can identify what's possible. Is it kind of self-contained so to say? So at this stage it's still a research prototype,
30:42
which means it's not automated at all and it's just me tweaking things. But yeah, it could very much be automated. The step that takes a while is the 10 clean rebuilds just to make sure that we've got the right cost for each edge. But that's again a trade-off, right?
31:00
You could build once and say that's the cost and then every time you rebuild you record again. But yes, essentially every step that I do manually right now could be done with a script. There is always a risk that at some point somebody can include
31:21
a couple of edge libraries or something like that that could end up and it would be very difficult to trace back from the software side.
31:52
It could even be integrated with a continuous integration server, so that for every change you see what the impact is on hotspots, to keep coining my term.
32:07
I recall one of the Postgres folks spent some effort splitting up the header files a year or so ago. So I'm not sure if that's actually included in 9.2. The picture might look different.
32:22
We did do it for 9.2. The project does tend to do it every two or three years. Somebody will sit down and just clean up all the header files and remove unnecessary includes. It would be nice to get that as a step in the build farm plan.
32:46
We should exchange emails; I can help you out with that, definitely. I guess, since I've still got some time, I had some bonus slides which name the files. So I mentioned there were five main culprits.
33:04
These are the ones that, through the chain of including, including, including, end up being hotspots because of that. So I don't know if any would jump out at you off the top of your head as having been recently cleaned up. So these five files get dragged in through the main Postgres file,
33:24
through this htup_details.h file. Each time I indent, it means another level of include. So yeah, you can see this one's four levels deep, but it becomes a hotspot because it was included in files
33:42
that were hotspots up the chain. Some more, some more. So yeah. Do you have a tool for that? Do you look for unused header files, or is it based on experience?
34:04
I think we have a tool that looks for unnecessary includes. And besides that, it's a manual process.
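The answer above mentions a tool for unnecessary includes. A common approach, which may or may not be what the Postgres script does, is trial removal: delete one #include at a time and check whether the file still compiles. Here is a minimal sketch; the `compiles` predicate is an assumption standing in for a real compiler invocation such as `cc -fsyntax-only`, and real tools like include-what-you-use do a more precise symbol-level analysis.

```python
import re

def removable_includes(source, compiles):
    """Trial-removal check for unnecessary #include lines.

    `compiles` is a caller-supplied predicate taking source text and
    returning True if it still compiles; in real use it would wrap a
    compiler run. It is abstract here so the sketch stays self-contained.
    Returns the #include lines that can be dropped without breaking the build.
    """
    lines = source.splitlines()
    removable = []
    for i, line in enumerate(lines):
        if re.match(r"\s*#\s*include\b", line):
            # Rebuild the file with this one include removed and re-check.
            trial = "\n".join(lines[:i] + lines[i + 1:])
            if compiles(trial):
                removable.append(line.strip())
    return removable
```

Note that each include is tested independently against the original file; removing several "individually removable" includes at once can still break the build, which is one reason the cleanup stays partly manual.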
34:25
So were there any other questions? Thanks again for inviting me to come out. This was an awesome opportunity. And one more time, there's a survey up there. Thank you.
34:46
So the next session here will be at five and that will be the final, well, the announcement from Magnus. Did I get voted back in again or not? Did I get voted back in again?
35:02
The election results. Yeah. We will pause this here. So thank you again, Shane. Any idea where to get a coffee?
35:44
Thanks for your questions, interesting stuff. The build speed is a big problem, even if you can't fix the heavy parts.
38:08
triple checking
38:47
the testing, because many companies want to see lots of tests, and as soon as you do something you want to test it. So what do you do to be sure? Do you test everything, which takes ages?
39:29
We sell support for community Postgres, but we also take a copy of the code and add additional capabilities
39:40
and various other features, so we sell and support that as well as supporting community Postgres, and obviously contributing to the community and providing installers free of charge, that sort of thing.
40:01
So our regression test suite for just the core database server is based on the same framework as Postgres, but where Postgres has got a few hundred tests, ours has got a few thousand. We don't have the benefit of all the community, so we test the crap out of it, basically.
40:25
And we found with that suite there are certain things; we've not got to the point where we've invested a lot of time in trying to track them down, but we know there are hotspots in there.
40:41
On Solaris SPARC in particular it seems to be very slow, even on a very expensive test machine. I don't mean expensive like a production server for a big website; a build-and-test machine that costs 40,000 or something is fairly pricey,
41:02
and on a 40,000-pound SPARC machine we have a regression test suite that takes 24 hours to run, which on Linux will run in 8 hours. And yeah, it's pretty much bizarre. Is this I/O related or something?
41:23
Yeah, like I say, it hasn't caused enough problems that we've dived in that deeply. I mean, we've done some basic instrumentation and narrowed it down to a few particular tests, but haven't actually come up with any real clues as to what the common factors are, if there are any,
41:40
that kind of thing. It seems to not be particularly efficient on that workload for some reason. You never know. Of course we have customers using that platform, but they don't seem to run into any problems with it on production workloads,
42:01
because the regression test framework does a lot of things that you wouldn't necessarily do on a normal production system. It's doing loads of different things to, you know, set up databases for the tests you're about to run, that sort of thing, which is very random compared to an actual production system that's just doing the same thing,
42:24
running the same web app or ERP system all the time. Yeah, it's an interesting idea. Sooner or later we're going to get to the point where our regression test suite, even on the platforms where we don't have these problems,
42:43
is just going to take much longer to run than we want it to. I'd never considered looking at the dependencies between the tests and the changes
43:05
to figure out where the slowdowns are. Yeah, that would actually be kind of difficult for us, because it's the QA team that would actually develop the tests based on loads of feature requests,
43:21
but those guys wouldn't have any knowledge about how a particular feature was actually implemented. I imagine that would be quite a common problem in a software house where different sets of people are doing the development versus the actual QA.
43:45
I'm not sure how you'd figure out a direct relationship between them programmatically, because we wouldn't want to exclude things. For example, if you create a new test to test a specific
44:00
aggregate that we'd added to the database, you could very easily link that test and say: only run this test if these three source files which implement that aggregate function have changed. But actually we tend to want to do more than that; we would want that test to be dependent on any other code
44:24
paths that may have changed in the database: things that might call the aggregate function, bits of the planner that might figure out how to use it differently within a query. I imagine that might get quite tricky to do.
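The naive file-to-test mapping described above, run a test only when one of its recorded source files has changed, can be sketched in a few lines. The hard part, as the speaker notes, is producing and maintaining the dependency table; it is hand-written here, and all names are hypothetical:

```python
def select_tests(test_deps, changed_files):
    """Pick the tests whose recorded source dependencies overlap the change set.

    test_deps:     test name -> iterable of source files it depends on
    changed_files: files touched by the commit
                   (in practice, e.g. the output of `git diff --name-only`)
    """
    changed = set(changed_files)
    return sorted(t for t, deps in test_deps.items() if changed & set(deps))

deps = {
    "test_aggregate": ["src/agg.c", "src/planner.c"],
    "test_index": ["src/index.c"],
}
select_tests(deps, ["src/planner.c"])  # ['test_aggregate']
```

This captures only direct dependencies; the indirect code paths the speaker worries about (callers of the function, planner changes) would need the table to be over-approximated, which erodes the time savings.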
44:41
Yeah, profiling tools might help there: if you profile while running the test, you can see which functions are called. That's the idea. Run it a few times in different scenarios, and if you get some fairly stable results, start to take that as a pretty good indicator of what matters.
45:10
Well, I've seen three or four talks today, and I can honestly say this one has given me the most to think about. Ah, cool, interesting.
45:20
Ah, good, thank you. I'd never seen that speaker present before; I was a little nervous when we chose that talk, but I think it paid off. Anyway, do you have time to pop down now, or is it too late?
45:42
So Vic said he was going to wander down with you. Right now he's going to take some things back to the booth. Yeah, we've got our election results, and I was standing for re-election, so I really wanted to be, well, there were only three of us running for three places,
46:04
so I think it was a foregone conclusion, but still. Congratulations. Congratulations, thank you.