Merken

Estimating query progress

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Beta
Erkannte Entitäten
Sprachtranskript
if starting you know this guy Mrs. Cecil Daniel and he is the source of the quote that we should start with that quite and attention should right so I wanna start with
running this and showing you at so OK so that we thank so if you you want to know why this doesn't really
work you have to state a lot uh so high and you know what I work this small company called that support and talk about estimating from selected block just show that if you
wanna follow this slides on your lap so this is where you can get them if you tried that close that this is where you can get it and this is more there's a morphable 3 in the repository of this needs a modified was dressed in a slightly modified this out of both the changes I had to add that was the end of the lake this is just for for feature so our
1st line of Mordor say what what the problem is and how would you go about estimating the progress that up for this with my talk this is working right at the very last which kind of traits that might have between the fury of how it should work or and how like scientific papers say it should work and practice which means that what happens when you try to apply those those papers to view life also which is like hitting and report so the
problem is that the problem is very simple trick you it yet right I am a query and running for 5 hours and getting angry calls up that the idea was saturated CPU is dying should limited control on watch what should I do I mean should I ignore the calls for half an hour or more than that the result or should I tell so this is that they a amazing so this is the problem that this is trying to
solve Our do a precise you don't know and I mean posed by default will not tell you anything about how far the queries and it's it's like binary it's not finished is it is hard to decide in general how long will we mean you can try to explain look at because you can try to do some things but in general it it's really hard to say how long we will will take before you actually run it to completion up and you have to take that this decision
isn't for me you you have to just take a chance and so
while they use case is straightforward and simple site should kill it's not that simple letter to think about it more so if you an estimator it's gonna give you something it's it's like a way to get something of the system which could be a number that says how much work is done yet in in an abstract way that we need to find out how much work is completed they between 0 and 100 per cent how much time is remaining how much how much I need to wait before it stops and then there's this new problem that the back and that's actually running the queries busy running queries so we need to somehow going to take that information from inside that process and related to the outside and there are a few more horrible that they will cover in the hovels section so what makes a good estimate what would you like to get from an innocent these 4 parties ie I find I think there's there's quite a strict so you would like the estimated to be 1 of the I don't know if you ever I mean the download from it's modernist is usually you always get more and more bytes of you wouldn't want an estimator that goes back because this is somewhat monotonous granularity out this is important thing because you have progress nation right now in post is goes between 0 and 1 1 is that what it's not that useful if it's not done so you on the estimated with different in in across small timing so you want them to to change while time progresses family the universe you don't want to don't want progress speculation have a large overhead because if it's like doubles that that the pulse of the time from your query justice tracking the progress it's it's not really worth it and you'd like to just needs to be independent of the system is running on so you wouldn't like that the progress do so you will want to progress faster and faster machines and so on source and you'd like to be you'd like it to be abstracted from that the actual speed of the basic thing it's like like a Donald counter in a progress bar is not network independent those fossil fostering slowly slowly what are they a good estimator will not be affected by the changes in the in the underlying systems so
that this is the part where I start talking about the theory how how ideally this would work in in in the model that I set out so stuck with with how to model those that a query so would do you would you call work in a query I'll start with a brief waded through how the executed works as we will be working with the security system like a very simple just to just for you to get a basic understanding of the concepts and the analogy of among so the executer uses a tree structure there is that this tree is a mirror of the plant select like when you do explain you see a planetary the executor has a mirror of the tree and the nodes from 1 g in the other tree our link there was actually can think about this 1 she will be using that the executive tree like it's so like when you explain analyzed you see these 2 cheese superimpose right at the plant gendered 1 gives you the the pines offers occasional 1 gives you the the estimator which is the what of the planet put in at each node of the execution tree produces tuples and has a method that that says the minister of this method either tuple orders knowledge on water will be recovered from the sexual
executing the plan is very simple you just call this function on the top node of the tree until there are no more tuples every time you get to a point in it that the whoever is receding make that the client right so the sacchita is this very simple in its in the way it works and then that's more aggressive uh so
how as a node executes this uh this generic method of achievable from the from the node it is actually is specialized for each node of each node has its own way of producing to right and a sequential scan older will produce a global by matching the next 1 2 bytes from the from the this and for me that will return a sort node is a bit different what sort node would do is when you're asking for the 1st local election you read all the tuples from its downstream nodes sort and give the 1st and then when ask for the 2nd 1 it gives it a 2nd 1 and so a much you know that is every different again because what it does for instance is such as 1 tuple from from the no that's that's downstream of it but we things in the street and it it returns the caller the recurrence if the reference to the fact that the data in memory so you can see you can consult together without exactly think that such method of the downstream and and all kinds of node you can use poses can generate or like all kinds of exact Cuisia nodes that that are that have their own methods to fetch the return of the and usually involves fetching tuples from downstream nodes or reading this class if it's a sequential scholar and next the nodes have additional data uh attached to them that get updated while the queries run we will use these counters to estimate progress and make a said each executed node has a link to the plan node and then no you have planned estimate were the letters said I think this will be there that money to so few examples of this is an execution tree for query right that's up on the top level it's semester of joint and what the executor will do is it will repeatedly ask of this node for tuple and when it returns knowledge will say that that's no this no work because it's a nest no when asked for a tuple will fetch 1 tuple from inner from in child food will ask a given it to them because they will say OK I'm indexed so I'll just go to the index such that the index and you can go to the heat but to the actual people it read here there is occlusion continues with the other child which we wouldn't stand up to find that the matching tuple for the conditions and then once this is done 1 to and so on and so on so right so more or less it's it's it's a recursive process where you just as this guy and then the delegates were to all the other don't so right most of what I'm going to show you the URI section and make possible emotion freeing the practice sections is based on this so this is still like a fundamental paper that's that's all of the classifications in all the other work about progress estimation of so this just want to put the some just so you know that when show is not really my ideas it's it's just the that area that I'm gonna try to kind of they just for you and then and then show you how implementation of this would look so 1 or how do we model the work that needs to be done we do this uh through the number of tuples of return across all execution so we type progress to the number of tuples with no not the number of bytes process not the CPU time spent or you takes or anything that we do we use the number of people that returned and the paper calls this forget next models that get next is actually
precisely this so fortunately the overseas edition method is similar to what the what these
guys say that very model will be so that the concepts going to match right so it was a school the difference get next is like the the basic operation and the progress will be the number of get next called across the entire executive and would is completed will all nodes have had get next called up to to the point where where no more no
right so the estimator would use is that it's the sum so for any given moment in the progress of a query as the sum of 2 would have been returned divided by the sum of tools that will be returned at the end so that it doesn't have so I'm gonna call 1 of those and not 1 of those k so k will be that that have already been written by 1 of the nodes for like an old then n is going to be the number of tuples that be written in local and this looks fine and then the but how the hell do you estimate of rank k is fairly easy right each language and that would just added onto that's kindness and but n is the heart of so that the channel and actually is how do you estimate how we will be returned from an execution was that you can you calculate the
so it's Fujian and as the get
mixed model estimator it going up it's gonna be here for me and so
that's just so you won't get confused by the by the acronym so what's good about GMM it is kind of computer wimples so that this part of the models kind of match get next is Morris will be executed because it's it looks
simple enough the other definition of these it's it's usually 1 of those in right because
that this number will be just should be in theory really just a number that's this is the mean number of tuples and over there and this number will go so this should be 1 of
it's quite fine grained is usually have quite a few tuples with them and so on there is a tuple got getting process all the time so that they the fall and then and then only jump up you can use a lot requiring you reuse all infrastructure that wasn't already has estimates for the nodes like instrumentation for the plan for the executor that is used by explain analyze that tells you how many to blow already processed all that is already done in cycles receiving more of that but as we know of the planet can can estimate in various ways which are not always good so there are some things that that will be challenging now this looks simple
and it's not useful in its in
its simple form because basically all of the
estimates the as which can be very for if you just do the naive thing and and calculate G. and and directly
make at an even moment look at the execution you add up the the tuples returned and estimate and divide you wouldn't get a very good estimate this is where the concept and execution pipeline appears so if
pipeline areas of the definition this is that it's a maximal subtree of the executive plan that executes concurrently which means that it's all of the nodes that will be asked about for a tuple when the route of the pipeline wants to need to produce and then I O back to the
slide once we go through the examples in the local to hold fully closed held in a deconstruct the tree into quite like so then these which are usually sequential scans index thousand if only thousand start you pipeline so this is where your pipeline starts nodes the blocking which means that they need to read the input completely before progressing to start a new pipeline and joints combined pipelines of the
so this is an example of of the of the previous Executive tree with place with with play annotations so uh this is by going 0 programs inappropriate 0 actually this is all 1 by playing and why is that because well because the start and right then this is a little of this is a joint so it joins the divide that this is yet another pipeline that she's a dancer so it that's joins these 2 white and the Samoans years you actually this tree is a single pipeline and it kind of makes sense right when you ask of this node for a tuple it will ask this knowledge and that's this will ask this and that and so on so forth so each time you ask for a tuple In the root node it will execute every node in the tree so that a single life there it is an example of a of a tree with 2 pipelines just so you know that there can be more pipelines there can so uh here I have a nested between them and the sum is found in the hash join and what the what the hash stranger that is up there 1 of the children is a sequential scan and the other 1 is always the fashion world and the hash when you ask it for a tuple will completely drained prior to and only then start returning actually it won't it was there at the return of because this is what we're posters does something a bit outside of the get next model but if you think about it this way it'll it'll drain this this node and then provide joined with with information needs to match tuples coming from the northern sky and and the other side right so that a set of hatch is is a new pipeline so these are
pipelines this this concept of very yeah at the the 1 yet
so the actual parameter parameters of holds consular parameterized are our thing and I like to so that even in the theoretical model of the paper they they don't consider the middle the little about being able to scan over several so we this will be thank all of yeah depend the material no larger if I had about what actually materialize as new pipeline before I read the code and actually know materialize like and will ask the downstream the node or a tuple return but will not drain it's complete so it's it's it's executed concurrently the auditory underneath materializes executing code country concurrently with the with the tree of all material for the rest of the there's a problem with naturalized yes but it'll a you these screen that In in the paper we only consider scans and 3 kinds of John where hash join and exclude him from the mean going outside from the effort to provide so that this might be wrong what I what I assume about materialize about other kinds of nodes that are not mentioned so
it's we have quite planes
uh and there's another concept we need which is driver so each pipeline has a set of viral and drive around celebrating the leaf nodes of the spy plane except once during the ultra tree of the nested loops of for the tree that gets executed for each tuple of that in the subject so you can think about drive around there is no no no no the driver the quite the ones that are that are supplying the whole pipeline with loop all of that but can get hold of and this definition Morris captures that are now gyrus or are important for pipeline because you you can it's there better suited to the estimate how far along the hyperplane is than just looking at all the notes as they're driving the pipe one-third of the pipeline should more must be done and and because there there usually low in the tree very you you can get better estimates about this this and number the number of tuples introduce and so your error is small so that
grayish nodes here actually driving right so it when you started to construct this tree you have for diagonal like all the notes but then because this node isn't that outer subtree of the nest look it is not the right and in progress and so you're done your your you end up with only 1 driver all and it can also kind of makes sense because it's a nest look right so when you see when you finish scanning the interrelation while you're done for each scan of that so for each tuple from the interrelation using a so basically this node is is is driving the just just bear with me and then will come the implementation and you will see that there are issues of so right this is promise to pipelines where all this is that the driver knows just because of its use in the node and then this is not a driver note because it's in our subtree of a nest and this is a diagonal because it just a simple pipeline of and yet again this
is interesting so you have an estimate so this the the subtree will not have diagonals and if Lamar joint and that this merging is is getting fed by 2 and expands and they're both Tribunals with our definition and actually kind of makes sense because the both of driving the pipe they they both are supplying work to the rest of the paper so you can have more than 1 Jaber no that at the time and this brings us to the United States analysts like which is what we are going to be actually using all the paper propose all the whole the whole idea is based on what they call driver officers would just says that if that that ratio between tuples a pipeline has produced and the Bible is to produce is estimated by the same ratio for the driver
right so like let's go back here but here they say that that the ratio of the number of get next calls across of this pipeline divided by the number of calls that despite playing will will do 1 is from completion is approximated by it by how many how far along this node so homely tuples of this this from the will to move this so that's that's quite into it right it's like the driver knows supplying work for the whole pipeline if I know how far along the driver knows is I know how far along the whole hyperlink so it will not calculate that get next estimator across the entire pipeline will only calculate this for we driving all
and so that's the equation 1
no this is for pipelined learning problem for quite ways of being executed being for pipelines that have finished they say and and it's obvious that tuples got the there are equal to the proposal will be written and all of is the quietly not so we have now returned all the 2 so if the pipeline is finished where that where we got we have we we we don't need to calculate and for this pipeline because we just take care and is precise because we know it because we actually found each and every tuple as came through the point now for pipelines that have not started to be executive what we do is is we what we know all tuples return in in another 5 and 0 and we use the parameter estimates for tuples that are to be that this is how this is what the paper Monday as so there is a distinction by planes law in progress by playing finished by plane that have not started yet and this is all fine and then the URI but in imposes at a node can run and then you run again it's it's it's hard even say when when the pipeline section finished so this also is is not that trivial when you actually go and try to but like definition why this is what we and so the other
thing we we will calculate is actually the tuples returned for each by playing and tuples that will be returned wage by plane divide by 2 we we break it down like pipelines we estimate each pipeline separately and we get and actually all the equations you so I I like this and then someone told me this is a honking bad idea and so I just left it here for you to see what would the stock could have been if someone would have told me out of doing like using the glottal nation for every they so OK we so right we now have a definition we can just go and that's was
going the basic idea when I
when I started was 1st calculate pipelines identify drivers we already saw that there are some issues that would comes apart but this is not you can try and do so you start and you have the execution tree and you also have pipelines the schools but when someone when you need to calculate from sexually you walk all the pipelines and you calculate the k and numbers for every for each 1 that's a new device and then you're done and actually you also take advantage of the fact that you walk the whole executive tree to what I did is I I also graph like saved snapshot of the tree to see as a progress and show you 1 of those so
actually this is what the tool produces about this is the this is 1 of the tree that you saw in the of the slides it's annotated with pipelines is it's it's it's great out well for a driver announced and it says only tuples have produced only 2 pools are to be produced what is that and it goes and goes and there's a snapshot of known and because the tool that sampling love love love love love love and then it's done and this is the last lecture room yes there is a problem that you you know the thing in the the nice thing about the DNA is that it doesn't matter because we we use during the all estimated so we only look at driver and all the while it's executing so that's that's going to manage but yeah this is this is not the kind of animation you can get with the with the the study
of the role history of the yes the land kind
of well but what is it what if
all you you might have noticed not said anything because your
quirkiest but it actually will like stop now and what
about the the other hand you have to be of
soda of so that
was that's that's how I wrote it down following of this
is an effect so let's let's do
the same with go you for each bipolar so you need to know that the routine that's most important processing 1 point which pipeline complicated and ended up so 1st of all for each node of the pipeline is the whole interval having process from both the planet has a then if all the driver and also that pipeline already done I assume this pipeline is and then return K as n right because the number of tuples it will return is equal to the number of tuples it has if not of the driver knows has even started yet actually OK if any of the driver knows has started assume despite what currently run was already producing and well editor of K as systemic tuples process that's straight from the from the compass we update ones trying and then if if you work through the markets of very difficult you will see that and for this pipeline will case for for a of actually this is run by because this is this is not the global K this is K 4 is the global k so this is this is this is where where it's a bit tricky DNA says that the ratio between tuples of equal to was processes to move to the process is the same for the whole pipeline and just with titles but I want stands for the whole might not just the the diagonal because otherwise they would misrepresent despite playing in the grand scheme of things right would only give I would only return the k number for the driver old and this will be wrong and he became a number for the whole pipeline so it was really a return K which is the number of tuples process where the entire pipeline and if we multiply it by that proportion for the driver not I will get my estimate for the whole right this is also heard about that right actually just want to introduce it again right now because you have to think about how this at the start of the entire all this community living OK careful so when it starts so when it when you haven't started yet usable understanding as soon as it starts you switch to the so they're so there is this there is there's this moment where were where you can have this this clear for this this was because of as by the we it and I know I believe it cannot but it would have to think about it bit later yeah haven't but then language you that can but can there be can break this again not sure about the the order is vitamins usually start box while OK what yeah that there is this problem of like it is it is this which is used in actually the the in a bit but the paper accounts for a way to right so if if it's not finished and if it's not if it hasn't started its housing or right so then you say 0 4 tuples process and then on the estimator for the work to prove to be so that's another
problem I need to solve was how the how the grab that it's made out of the back end running the query and have it available or other back ends up and I went through shared memory I and actually someone will assignment so that patch from on high accuracy that allow people to to add a hold on 6 user-selected and significant a added a hole in your extension and then when you signal the back and saying I'm sending signal and a 1 and this is the whole light signal it wouldn't run your and interpersonal patch the idea that we needed some way to actually go and that if you have but back-to-back back and it's busy doing stuff give some results of and what happens is that at the back and when it gets the signal it's a little flag and the next time it has a break the check interrupts it will cover great progress stuff shared memory and then 7 will or other migrants from the would this also means is that if you want estimates you don't actually get you ask for them and then once the back and that's running the query gets around to processing your service your signal you will be able to do the rabbit but usually this is indeed of almost immediately as we do check interrupts like at the same point you would be able to cancel a query it will calculate the actual problem so that's why it's if it's ideal I don't know but it's I was very concerned because you can change that once you in in the estimation works there will be a way to out the back it's away as good and then I had to do some changes to corporals dress and mentors around from the least the most controversial police controversial by far is that most of the code is an extension and if it's not in court so the changes in court actually quite small and probably kind knew of acceptable half of so I need to store and private data mining the executor of tree so that the planner it can already use the called from extensions can can add private data to the planetary so we can retrieve it later the executor achieve did not have the feature just edited but it's like a pointer that you can put stuff in memory for for future reference of a big deal I need to all instrumentation that's is being done when the query is run but just the just allowing the user to control how instrument what what happens on instrumentation and location and this there's hope on on the on signaling a about and which I think it's it's funny I can see immediately way vendors but it I mean if you say signal another dangerous ones and this is probably wrong side that's so OK that that
looked like since like something simple and manageable and then you start writing stuff there are things that were not accounted for by the light and all contribute to the driver nodes and they actually give examples of of pipelines like that in the the gloss of brought about that when they get to actually defining the annex so no 1 actually told me how to hold estimator for the user driver no estimator for a pipeline with multiple diagonal just said I think the 1st make you take the drive around and you divide that was process like up with multiple isn't that simple nodes and have more than 2 children like when you have a lot of the and you can have more of so you don't only have joints that have multiple children have lots of stuff and then there's a whole slew of stop bursts really weird like like like stands functions like recursive queries like a really weird stuff low self respecting favor of with the rest it and so you have to do with that of and and rescales would put them through this as well and then the planet and something quite people so some of these who I was able to cancel I already so come up with something that is made use of making it you have more than 1 driver around to you I assume that case so the number of tuples to be repaired to be the average of all the drugs notes take it it can be the sum of actually what it could be the sum right but why average because when you have a margin which is a typical example of something where multiple driver notes it will stop as soon as 1 of the inputs rounds out right so what is associated more or less if you would if you were kind of mashed these tribunals together let's say moved more rest on average this is how far that progress together it's just take the average of and then the number of the number of tuples that will return I to the minimum because it's former joined this is the right thing to do is to say this number of tuples of 100 that the number of get next calls will be done is the minimum across all the across both married join notes whether I mean it was just something to this is so isolated the if if someone has a better idea of how how to deal with multiple driver announced it's it's it's easy to check but this talk running of our now an annoying thing is that for nested loops that land estimated will not account for the actual look so if you have a nested loop join and you have to sequential steps right and this 1 says unintelligible 3 tuples and this said up return of sex to the other 1 is actually gonna retire 18 tuples was going risk 3 times so what I did is I actually won them in an accounting for so when I traverse the tree or nested loop the notes and look at the in the time they making the limited like everything in the of the child by the number of loops in a list at estimates of what the do so it that gives me a bit closer to reality than just using contrast but so this is a quick of how information is represented in the plan and now I have so for I have specific goal for nested loops for former joints along but not for each and every you know as that and I just assume that knows that have more than 2 children are of blocking and will train them before progress but you have independent it's gone it's going do that on we have enough and it's not enough it will not that worried that if you have a union it will it will drain the inputs and then actually you knew usually insert fashion of harshness were ready give me what I want but then it just something to do to get this thing running and some nodes like recursive union which is another 1 of things calls with its it'll question what kind of that love of execution notes can end up increasing those loops number that you get from explain as the Solaris from reading the code I felt the only where you can get something other than looks want is nested loops joins the scans and and then in volume those said because you and I thought non-recursive union that's just not support so that so so the so it actually deserves solutions inch and then actually the are things that kind work can be improved but still that doesn't missing feature that the paper was really big on and then they they say it's quite important which is refining the estimates as you go along and this is a new concept for each node in the ring
but that's so for each node of you meant that maintain 2 numbers the minimum and the maximum number of tuples that and you go about refining those as you go along so that they could have a sort somewhere on the I know that it will never return less tuples that have already been returned by by child node right so I can try go on and push the lower boundary of and there were that and you can buy at the upper boundary also we've pushed down in some of right right like for sort this will actually be exactly the same number of of the of the tuples returned by the but lower notes right so is there the basic idea is that you maintain these 2 numbers for each node and then you never returned anything from outside of the range of and as you progress you go but they come closer and closer before that I have implemented that that they say helps a lot I can see
why you could add some some smarts that they say OK I have a joined on the non move foreign keys so I know ever really 2 people will have a counterpart so it would be 10 and use that money tuples as the as in there as there are in inevitable right or you can say 0 it's 11 joint so it's never going to decrease cardinality when you look at the in it's is only gonna turn as many tuples as as in attendance or more so that's missing and it it it should have guess hello so where where we go from here there are
things are just annoying and that just to be fixed perhaps as a provirus perhaps inside the module and the things that we need to as so 1st of all just say different pipelines like like a sensible will material started by plane is this a driver knowledge I don't have called for each and every by mode and I haven't analyzed every node so for some for some ideas just assume Yang has more than 2 inputs is only block that has 1 and put it on the legal and and so someone have to go look through all these nodes and make sure that pipeline building code for detecting code the correct no it's not that easy to see from brightness often in Figure 7 years of response because of the node being able to be executed multiple times it's pretty hard to declare finished as so so that you can sometimes as simple as that is not that straightforward so you don't get any visibility into 4 nodes and I'm looking sure how we how would you go about getting a number out of that right so you it's hard to say how far along source and sensor nodes of driver notes because they drain the input for the this is kind important if you have a source knowledgeable their use your until it starts all with tuple so Andrew sure how you could see how far along this sort is prisons forehatch I actually go and look at the number of entries in the hash table so I know how far along are we rushing before sort on and there a it's using that shared memory trick of stashing stuffing should memory and other back and take it out so you need to signal that the back and doing the work for it to you to give yes estimated so there's all kind of security problems here I haven't even started looking at this stage I think is on on the water superuser but I haven't really checked but this is something that's reasonable to improve once you but you know what you will not have rescaling as a whole the concept because and I will like you said the risk of our can can just below your escalates out completely actually materialization nodes that what they do is because there is not blocking but if you ask for the same again it will not go down and for or that the channel for a it's actually much cheaper to go over the material node than it is over an index and no right you haven't nested-loop join of that our tree is an index scan it will scan that index repeated if you come at the couple then the material node will retain tuples flowing out from extent and other looks over these much much cheaper but nothing in your account for the cell is going to be scanned a million times so each of these 2 nodes is gonna be random you so who yet so that the problem with materialize you you need to do that that trick walking the tree and manually multiplying by the number of tuples where you rely on the on the estimates for the inner child which can be when the so that's annoying because every change in the cardinality of the inner child in the next to the node will make all the estimates for the ultra knows so much worse but you could say as the plumber from and I like doing that we found in something that executive I'm assuming the planners perfect and if something doesn't when it is so if this if this if if some right if if the planet the wrong date that it's like garbage in garbage out like a and and that the original paper doesn't account for that which is understandable 99 and making fun but but this is it you can account for I mean I'm sure there are papers of building up a sure these guys are the worst are way spiral the meters they actually came up with all that and just kill them because they have of the recursive so what would you have is something that these are things you have to actually years of there's a separate problem which is being famous single tuple which means I have a nested loop and the process of the elders of the there are it should which should in the story that in there are no will at the only 1 to so it doesn't matter that it's a nested loop because I will just facts that want to and is accused the whole of the subtree what's right so there will be no moving even though it's nested loop because the the the driver known just returns 1 to but then if a true there is more than 1 tuple everything goes faster by because you have to actually go and countries the and take several times if this happens this means that the planet we messed up by allowing this huge so we need to be reached but this this actually happens and actually this can be very severe form the estimator because it says OK you have 1 driver like remember you have 1 variable why would the to something that's a to make all this the reason white because this was just at and want to so even if it does want to hold it it pretends that the new sale called the driver notice that other than work continues with all that all that as far as the as the pipeline is considered the driver and all is done and now is just the after after just medial finish of the completion you're you want you want drive more with more input from the driver but then some treaty what's happening is that the bulk of the work is spent here and it's not accounted for at all some
nodes breakout from the from the get model of select fashion old don't we tend to think to the the hospitable and they they returned I think know and they just have the the partial inside and then the parent node of knowing that they will find a world below them grants stuff from the and there is an obvious and they're very common is that it's say you have and the where is like that have you shouldn't but then again it's fine so yeah that's not bitmap index another funny these because they will scan the entire index they will drain index and then move on building the embedding of the the bit maps and it's not that easy to say how far along you are building love so so these nodes do not follow the get mixed model so that they recognize outside of all of all of the papers come are and then so yes if it if it happens that 1 of those is a driver around it can the whole thing can kind of folder then the utility commands so it's it's really cool to to track the progress of a selected but would you also really would like this to fall to track the progress of a coffee or create index right especially when you're doing creating this without concurrently you wanna know if this will be locked for a day or if it's gonna be lot for 10 minutes Our and as these included statements of they do not go through the executive there is no treaty everything I said is a light 1 1 you see 1 of those operations are tracing views could be worked around so like that a good estimator was this would account for those by just having special purpose code for that make it you have utility command is already handled by the special-purpose built you would add a special-purpose gold for estimating the problem like for create and that's I think it would be quite easy to say how far along lot like how how much of the relation indexing for underlies both of 2 for vacuum yet again so for copied if you had somehow say and copying x million roles and and then sort the copper you you would see how far various I think it's you could try to have special purpose cold for for the and so currently but this that where we're spending on what I'm thinking of them as it does some changes to call for for those who confronted with unmodified equals resident they're are not that controversial so if we think about with that I think we can implement the DNA kind works abandoning their progress bar I showed you at the beginning was was impressed by now you know it's it's lovely but it was that's what it means what is that we it's it's returning a slope between 0 and 1 hopefully and adults a graph is filed fight then use for these assumption images it's a it's a way to stop it might be better to return to our excellent data Synthetic explain and an external tool each process this and then give you a number of years the adult file whatever I so like because it might be better to do than in some other formant which refining that should be implemented because should really help with some things because I mean that the whole strength of this model is that the driver knows very well estimated because you're usually quite well know how many rolls would use and when you're doing a sex crime is you know this is relation with the proposal that estimate is usually precise and you can use that to kind of new refine the estimate of the entire pipeline as it goes on and this this should prevent and for the time being the cold that that's starting to positive areas a a little proof of concept do not run this and you should a if you send me because you will have a lot of time and I will fix them but but that it's it would disapprove of concept to see what can we do me is if the paper is even applicable in and if it makes any sense to go in that direction so we have minutes
for questions on it and all of that so of young PDT-style activity it's in shared memory and you were saying it so you could yep I think is the thing about this is that you only get estimates when you will so when you ask for them and they get conflict it not the overhead is there is negligible so so that's that's why I went with the asked of it if you don't wanna have it for each and every word like if you're doing all the people it's it's overhead that you'd rather avoid doing all that calculation of the yeah yeah absolutely yeah out of the for instance for user 0 yeah yeah I had been a lot of this is happening in the 2nd half of the sum of the of the of the of community it if it's not working and the overhead should not be that means the actual overhead of those I haven't measured it but it should be like in the low per cent 2 per cent this is not just walking the tree and then some simple equation it actually using this actually forces start collecting of them into instrumentation stats on the trees like if you explain analyzes will run the query a bit more instrument and then if you with and without so so that's so so it doesn't pay that go ahead but if would be if you could turn it off for for queries you will you only care about the overhead thing if a transfer 10 minutes it's gonna be 2nd the 2nd fractions of a 2nd your OK yeah have any and all that I want to minimize in fact all although of course just make it easier for people to try to run that may be in there on the whole map of with with a copy of the production it was whatever so I wanted to keep that small so whenever I could I would I would go around the is it but if it is in the hope that helps explain things like that more of than let's call
Offene Menge
Retrievalsprache
Prozess <Informatik>
Physikalische Theorie
Vorlesung/Konferenz
p-Block
Quellcode
Sichtenkonzept
Computeranimation
Schätzwert
Tabelle <Informatik>
Retrievalsprache
Videospiel
Dokumentenserver
Mathematisierung
Abgeschlossene Menge
Befehl <Informatik>
Malware
Maßerweiterung
CAM
Computeranimation
Arithmetische Folge
Physikalische Theorie
Code
Klon <Mathematik>
Rechenschieber
CAD
Modelltheorie
Druckertreiber
Verkehrsinformation
Gerade
Implementierung
Resultante
Retrievalsprache
Vervollständigung <Mathematik>
Kontrolltheorie
Gewichtete Summe
Systemaufruf
Zentraleinheit
Binärcode
Computeranimation
Entscheidungstheorie
Entscheidungstheorie
Rückkopplung
Schätzung
Retrievalsprache
Default
Schätzwert
Retrievalsprache
Web Site
Prozess <Physik>
Mathematisierung
Familie <Mathematik>
Zahlenbereich
Computeranimation
Entscheidungstheorie
Physikalisches System
Virtuelle Maschine
Font
Puls <Technik>
Arithmetische Folge
Geschlossenes System
Schätzung
Retrievalsprache
Grundraum
Hardware
Schätzwert
Datennetz
Quellcode
Physikalisches System
Rückkopplung
Funktion <Mathematik>
Einheit <Mathematik>
Overhead <Kommunikationstechnik>
Garbentheorie
Overhead <Kommunikationstechnik>
Retrievalsprache
Punkt
Kontrollstruktur
Wasserdampftafel
n-Tupel
Automatische Handlungsplanung
Physikalische Theorie
Computeranimation
Netzwerktopologie
Metropolitan area network
Knotenmenge
Client
Font
Trennschärfe <Statistik>
Retrievalsprache
Modelltheorie
Datenstruktur
Schätzwert
Lineares Funktional
Binder <Informatik>
Netzwerktopologie
Tupel
Datenstruktur
Wurzel <Mathematik>
Rechter Winkel
Physikalische Theorie
Mereologie
ATM
Ordnung <Mathematik>
Modelltheorie
MUD
Retrievalsprache
Prozess <Physik>
Kontrollstruktur
Klasse <Mathematik>
n-Tupel
Automatische Handlungsplanung
Implementierung
Zahlenbereich
Zentraleinheit
Computeranimation
Übergang
Differenzengleichung
Netzwerktopologie
Knotenmenge
Arithmetische Folge
Sigmoide Funktion
Datentyp
Retrievalsprache
Weitverkehrsnetz
Modelltheorie
Materialisation <Physik>
Quick-Sort
Folge <Mathematik>
Schätzwert
Indexberechnung
Binder <Informatik>
Knotenmenge
Quick-Sort
Portscanner
Netzwerktopologie
Tupel
Datenstruktur
Flächeninhalt
Loop
Verschlingung
Wurzel <Mathematik>
Rechter Winkel
Automatische Indexierung
Zahlenbereich
Festspeicher
Konditionszahl
ATM
Garbentheorie
Modelltheorie
Innerer Punkt
Speicherverwaltung
Instantiierung
Schätzwert
Schätzwert
Retrievalsprache
Nichtlinearer Operator
Adressierung
Punkt
Gewichtete Summe
Momentenproblem
Wellenlehre
Formale Sprache
Gewichtete Summe
n-Tupel
Zahlenbereich
Knotenmenge
Computeranimation
Tupel
Knotenmenge
Arithmetische Folge
Rangstatistik
Zahlenbereich
Sigmoide Funktion
Schätzung
Retrievalsprache
Modelltheorie
Modelltheorie
Schätzwert
Binärdaten
Schätzwert
Retrievalsprache
Matching <Graphentheorie>
Wellenlehre
Gewichtete Summe
n-Tupel
Zahlenbereich
Berechenbarkeit
Information
Knotenmenge
Physikalische Theorie
Computeranimation
Arithmetisches Mittel
Gemischtes Modell
Tupel
Emulation
Datenverarbeitungssystem
Zahlenbereich
Schätzung
Mereologie
Case-Modding
Modelltheorie
Modelltheorie
Quick-Sort
Arithmetisches Mittel
Modallogik
Schätzwert
Schätzwert
Binärdaten
Retrievalsprache
Prozess <Physik>
Gewichtete Summe
n-Tupel
Automatische Handlungsplanung
Berechenbarkeit
Information
Knotenmenge
Computeranimation
Tupel
Knotenmenge
Zahlenbereich
Schätzung
Dreiecksfreier Graph
Weitverkehrsnetz
Modelltheorie
Quick-Sort
Schätzwert
Retrievalsprache
Momentenproblem
Automatische Handlungsplanung
n-Tupel
Gewichtete Summe
Berechenbarkeit
Information
Computeranimation
Font
Knotenmenge
Bildschirmmaske
Schätzung
Flächeninhalt
Quick-Sort
Modallogik
Schätzwert
Binärdaten
Routing
Knotenmenge
Teilbarkeit
Tupel
Flächeninhalt
Physikalische Theorie
Zahlenbereich
Mehrrechnersystem
Modelltheorie
Arithmetisches Mittel
Folge <Mathematik>
Gewichtete Summe
Hash-Algorithmus
n-Tupel
Computeranimation
Netzwerktopologie
Knotenmenge
Hash-Algorithmus
Wurzel <Mathematik>
Optimierung
Quick-Sort
Videospiel
Indexberechnung
Ein-Ausgabe
Knotenmenge
Teilbarkeit
Wendepunkt
Netzwerktopologie
Rechenschieber
Portscanner
Menge
Verbandstheorie
Rechter Winkel
Automatische Indexierung
Loop
Verbandstheorie
Information
Portscanner
Parametersystem
Hash-Algorithmus
Rohdaten
n-Tupel
Knotenmenge
Code
Physikalische Theorie
Computeranimation
Portscanner
Netzwerktopologie
Knotenmenge
Loop
Verbandstheorie
Quick-Sort
Portscanner
Touchscreen
Ebene
Schätzwert
n-Tupel
Zahlenbereich
Knotenmenge
Computeranimation
Eins
Netzwerktopologie
Loop
Tupel
Knotenmenge
Druckertreiber
Menge
Loop
Schätzung
Hyperebene
Verbandstheorie
Druckertreiber
Fehlermeldung
Schätzwert
Schätzwert
Hash-Algorithmus
n-Tupel
Indexberechnung
Implementierung
Statistische Hypothese
Computeranimation
Office-Paket
Netzwerktopologie
Netzwerktopologie
Portscanner
Tupel
Font
Knotenmenge
Druckertreiber
Arithmetische Folge
Rechter Winkel
Loop
Physikalische Theorie
Neunzehn
Druckertreiber
Diagonale <Geometrie>
Schätzwert
Retrievalsprache
Vervollständigung <Mathematik>
Hash-Algorithmus
n-Tupel
Zahlenbereich
Systemaufruf
Statistische Hypothese
Gleichungssystem
Computeranimation
Portscanner
Spezialrechner
Tupel
Knotenmenge
Druckertreiber
Hyperlink
Loop
Weitverkehrsnetz
Druckertreiber
Zeitzone
Schätzwert
Ebene
Retrievalsprache
Parametersystem
Punkt
Gewichtete Summe
n-Tupel
Gleichungssystem
Gesetz <Physik>
Teilbarkeit
Computeranimation
TUNIS <Programm>
Knotenmenge
Tupel
Arithmetische Folge
Schätzung
Ruhmasse
Garbentheorie
Retrievalsprache
Graph
Gewichtete Summe
Zahlenbereich
Information
Knotenmenge
Computeranimation
Portscanner
Netzwerktopologie
Netzwerktopologie
Graph
Font
Druckertreiber
Code
Schätzung
Ruhmasse
Identifizierbarkeit
Druckertreiber
Implementierung
Beobachtungsstudie
Retrievalsprache
Offene Menge
Unterring
Gewichtete Summe
n-Tupel
Computeranimation
Wendepunkt
Rechenschieber
Netzwerktopologie
Metropolitan area network
Druckertreiber
Mehragentensystem
Ruhmasse
Mehrrechnersystem
Weitverkehrsnetz
Binärdaten
Offene Menge
Retrievalsprache
Prozess <Informatik>
Gewichtete Summe
Regulärer Ausdruck
Extrempunkt
Gleitendes Mittel
Information
Sichtenkonzept
Knotenmenge
Computeranimation
Wendepunkt
Portscanner
Netzwerktopologie
Metropolitan area network
Graph
Code
Schätzung
Ruhmasse
Cloud Computing
Druckertreiber
Implementierung
Retrievalsprache
Bit
Prozess <Physik>
Punkt
Momentenproblem
Formale Sprache
n-Tupel
Zahlenbereich
Aggregatzustand
Computeranimation
Eins
Knotenmenge
Geschlossenes System
Anwendungssoftware
Schätzwert
Soundverarbeitung
Prozess <Informatik>
Zirkel <Instrument>
Nummerung
Knotenmenge
Texteditor
Tupel
Druckertreiber
Rechter Winkel
Ordnung <Mathematik>
Diagonale <Geometrie>
Mittelwert
Randverteilung
Retrievalsprache
Einfügungsdämpfung
Gewichtete Summe
Punkt
Prozess <Physik>
Extrempunkt
Rekursivität
Extrempunkt
Information
Computeranimation
Eins
PROM
Netzwerktopologie
Hook <Programmierung>
Code
Fahne <Mathematik>
Speicherabzug
Kontrollstruktur
Kontrast <Statistik>
Quick-Sort
Druckertreiber
Gebundener Zustand
Umwandlungsenthalpie
Lineares Funktional
Systemaufruf
Instantiierung
Ein-Ausgabe
Knotenmenge
Tupel
Dienst <Informatik>
Funktion <Mathematik>
Datenstruktur
Verbandstheorie
Einheit <Mathematik>
Rechter Winkel
Zahlenbereich
Festspeicher
URL
Information
Diagonale <Geometrie>
Schätzwert
Hash-Algorithmus
Multiplikation
Mathematische Logik
Mathematisierung
Automatische Handlungsplanung
n-Tupel
Zahlenbereich
Unrundheit
Maßerweiterung
Patch <Software>
ROM <Informatik>
Code
Überlagerung <Mathematik>
Loop
Multiplikation
Knotenmenge
Unterring
Arithmetische Folge
Mittelwert
Front-End <Software>
Schätzung
Zählen
Retrievalsprache
Verband <Mathematik>
Spezifisches Volumen
Maßerweiterung
Zeiger <Informatik>
Speicher <Informatik>
Schätzwert
Datenmissbrauch
Mailing-Liste
Patch <Software>
Druckertreiber
Loop
Identitätsverwaltung
Ruhmasse
Rekursive Funktion
Innerer Punkt
Gebundener Zustand
Retrievalsprache
Hash-Algorithmus
Extrempunkt
n-Tupel
Zahlenbereich
Knotenmenge
Quick-Sort
Computeranimation
Chipkarte
Metropolitan area network
Knotenmenge
Emulation
Geschlossenes System
Loop
Rechter Winkel
Zahlenbereich
Ruhmasse
Verbandstheorie
Quick-Sort
Cloud Computing
Schlüsselverwaltung
Innerer Punkt
Retrievalsprache
Bit
Prozess <Physik>
Gewichtete Summe
Befehl <Informatik>
Aggregatzustand
Computeranimation
Richtung
Netzwerktopologie
Spezialrechner
Perfekte Gruppe
Softwarewerkzeug
Typentheorie
Trennschärfe <Statistik>
Speicherabzug
Computersicherheit
Meter
Materialisation <Physik>
Quick-Sort
Gleitendes Mittel
Druckertreiber
Figurierte Zahl
ATM
Nichtlinearer Operator
Befehl <Informatik>
Topologische Einbettung
Vervollständigung <Mathematik>
Sichtenkonzept
Rohdaten
Computersicherheit
Gebäude <Mathematik>
p-Block
Quellcode
Ein-Ausgabe
Dateiformat
Knotenmenge
Gefangenendilemma
Tupel
COM
Hochvakuum
Rechter Winkel
Automatische Indexierung
Zahlenbereich
Festspeicher
Beweistheorie
Hochvakuum
Beweistheorie
Tabelle <Informatik>
Schätzwert
Ebene
Hash-Algorithmus
Wasserdampftafel
Mathematisierung
n-Tupel
Zahlenbereich
Zellularer Automat
Code
Fluss <Mathematik>
Loop
Knotenmenge
Bildschirmmaske
Multiplikation
Arithmetische Folge
Spirale
Schätzung
Endogene Variable
Hash-Algorithmus
Vererbungshierarchie
Modelltheorie
Maßerweiterung
Bildgebendes Verfahren
Implementierung
Tabelle <Informatik>
Schätzwert
Materialisation <Physik>
Graph
Relativitätstheorie
Softwarewerkzeug
Indexberechnung
Gemeinsamer Speicher
Elektronische Publikation
Modul
Quick-Sort
Einfache Genauigkeit
Portscanner
Netzwerktopologie
Mapping <Computergraphik>
Druckertreiber
Flächeninhalt
Loop
Ganze Funktion
Rekursive Funktion
Modelltheorie
Innerer Punkt
Schätzwert
Bruchrechnung
Bit
Gewichtete Summe
Gemeinsamer Speicher
Gleichungssystem
Wärmeübergang
Statistische Analyse
Rechnen
Biprodukt
Computeranimation
Netzwerktopologie
Forcing
Retrievalsprache
Overhead <Kommunikationstechnik>
Instantiierung

Metadaten

Formale Metadaten

Titel Estimating query progress
Untertitel Theory and practice of query progress indication
Serientitel PGCon 2013
Anzahl der Teile 25
Autor Urbanski, Jan
Mitwirkende Heroku (Sponsor)
Lizenz CC-Namensnennung - keine kommerzielle Nutzung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben
DOI 10.5446/19046
Herausgeber PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
Erscheinungsjahr 2013
Sprache Englisch
Produktionsort Ottawa, Canada

Inhaltliche Metadaten

Fachgebiet Informatik
Abstract Theory and practice of query progress indication Your query has been running for 70 hours. Should you kill it now or ignore angry calls for a few more hours and hope it returns the result? A question many a DBA have asked themselves. This talk will try to cover some of the techniques the database system could use in order to make decisions like that easier. We'll describe an approach based on existing research papers and report on the attempt of implementing it in a useful way inside PostgreSQL. There is ample scientific literature about reporting query progress in relational database systems. Some of the papers published even mention implementations in PostgreSQL. The practicalities, however, are often skimmed over. The talk will start by describing a method for calculating a progress indicator for running queries proposed by Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy in their 2004 SIGMOD paper. We'll try to see how the terms used in the paper translate to modern PostgreSQL and what practical challenges lie before a hopeful implementer. We'll continue with a demonstration of a module that could be grown into a useful progress indicator solution. The topic will also be an excuse for a little excursion through the PostgreSQL executor and its specific behaviour that needs to be accounted for when calculating query progress. We'll try to give the listeners a basic understanding of how the executor works and familiarize them with nomenclature used in that subsystem.

Ähnliche Filme

Loading...