Toward Implementing Incremental View Maintenance on PostgreSQL
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 35 | |
Author | ||
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/48269 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
PGCon 201930 / 35
1
3
4
10
12
13
16
17
19
20
24
28
29
33
34
35
00:00
Computer animation
00:49
Computer animation
02:24
Computer animation
03:12
Computer animation
04:35
Computer animation
05:35
Computer animation
06:31
Computer animation
08:37
Computer animation
10:15
Computer animation
11:25
Computer animation
13:28
Computer animation
15:36
Computer animation
17:23
Computer animation
19:36
Computer animation
20:28
Computer animation
21:13
Computer animation
22:15
Computer animation
24:11
Computer animation
25:02
Computer animation
26:06
Computer animation
27:24
Computer animation
30:38
Computer animation
31:26
Computer animation
32:24
Computer animation
33:14
Computer animation
35:44
Computer animation
36:55
Computer animation
40:57
Computer animation
Transcript: English(auto-generated)
00:06
Hi, hello everyone Thank you for coming to this talk. The presentation title is to what implementing incremental view maintenance on Postgres and I Would like to talk
00:21
about our work in progress implementation of incremental view maintenance on Postgres This is a technique to update material views Efficiently Before starting I would appreciate if you hold your questions
00:43
Until to the end of presentation. Thank you. I My name is Yugo Nagata an engineer at SRA OSS in Japan. I Have a postgres
01:01
Experiments Take any kind of support consulting and some education about Postgres internal architecture and currently I am in charge of around the section and especially I Am working on incremental view maintenance
01:22
today's topic This is a outline of this talk firstly I Introduce views materialized views and what is incremental view maintenance. This is called
01:42
IBM IBM then I talk about what to be considered to implement this and It's my explaining Our work in progress patches. I patch this idea and how it works Next I'll show some behavior examples and a simple performance evaluation I performed
02:10
Finally, I summarize this after some discussions I'll talk about what is incremental view maintenance
02:25
Let's start about views View is a virtual relation based on the result of stored state query and view is defined by a definition query on base tables
02:41
this is a Sorry, this is a simple example of a simple natural join views In the database only the definition query is stored So and the content state is not content
03:01
So the result is needed to be computed when query is issued to view On the other hand material view passes the results of the definition query
03:20
in the table like form So this does not need to compute the result when a query select query is issued on to this view It enables faster access to data using the normal abuse
03:40
However, the data is not always up to date so materialized view needs to matter maintenance of the data each data the data of material view is computed at the definition time
04:01
If the view is created This way is similar to create table as statements Maran that is the view That the result of the view definition query is computed and said
04:21
The result is inserted into the view To keep consistency between the materialized data and the views tables. We have to maintenance of this view so Refreshing materialized view is a way of the maintenance in process QL
04:45
When refresh materialized view command is issued The contents of materialized view is replaced with the data The result of the query is
05:06
Definition query is computed and result is inserted to the temporary table And then the data is replaced with the current materialized view data Might were by swapping
05:22
this means this need To recomputed all the result of query from scratch to refresh materialized view With concurrently option the materialized view is refreshed with
05:44
A weaker lock and without locking out concurrent select on the materialized view In this case the result of the differential query is computed and then that so this different is
06:04
Computed and this is merged to materialized view This has a weaker lock but it is safe that we need to recomputed all the result of the definition query from scratch
06:37
Incremental view maintenance We call RVM is another technique
06:44
to maintenance view maintenance materialized view This computed and applies only the incremental changes to the materialized view This figure out RVM
07:00
Formally This Base tables data and This is a B definition query and using this table data and query We can compute it content of materialized view
07:20
and Then where some update is issued on the base table. This table is modified to This is updated updated base tables content We can compute The applet so up to date the latest materialized views content by
07:44
Using view definition and this new contents of the table This is the computation and this is that the refresh command is doing on the other hand
08:02
Update to base tables from the from the change of base tables we can compute the change of the materialized view and Applying this change to the old materialized views data and we can get
08:22
New up-to-date materialized view contents data This red line is that what IVM is doing I'll explain the basic theory of IVM in this slide
08:44
So This is a view definition the view definition is described in Relational algebra form this is a simple example This is a natural drawing view of table R and R and S
09:12
Okay and This In this notation This navra are
09:21
embedded Greek Delta R is a tuples deleted from this table and This third triangle are there are is a tuples inserted to this table You think this
09:41
changes deltas and this view definition we can compute changes on view now will be an database In natural joint case this is
10:02
Like this and After that applying these changes to the view we can get Up to date materialized view data This is an example In this case the contents of R is like this
10:24
contents of table S is like this and this is a natural joint view Okay after table R is changed
10:42
R is changed the first rows this data is changed So number R is like this. This is a Or number R. This is an older data and Delta R is this is a new data This is a changes on the table
11:03
and in this case we can compute the change on the view by calculate joining this changes and Table S and the result is like like this number V on delta V
11:25
Finally the view is updated by applying applying these changes like this number V and delta V Applying this way this
11:41
Tuples in view is updated updated like this Next I Explained or Implementation idea of IVM
12:09
There are several considerations for implementing IVM first have to extract changes on base tables
12:22
About this we use after triggers and pruncheon tables I Explained about this in other thread Also logical decoding of a word bar is another idea
12:44
But we have not a conscious this will so we use after trigger second How to compute that delta changes on to be applied to material abuse
13:01
This is basically based on relational algebra algebra In theory we can handle barriers view definition Complex view definition, but We start we start from a simple view definition that is
13:26
selection project on join use Sorry, yeah Fine fine to mutt maintain paint up late, but as abuse There is a two approaches in theory
13:45
One is immediate maintenance and another is deferred maintenance in immediate maintenance views Updated in the same transaction where the base table is updated
14:02
On the other hand in deferred maintenance The views are updated after the transaction is commit For example fan view is access it as a response to use command right to fresh command or
14:23
The view is updated periodically and so on And Rest how to handle views with top of duplicates what distinct clothes?
14:45
This is an example of views with double duplicate There are two semi tuples this is duplicate and Suppose applying this change to this view
15:01
In This case only one couple is this only one top must be deleted rather than deleting all of them We cannot use simple direct a direct settlement because this we
15:21
delete both of two of tuples and matching The Values of the column another problem view with distinct class
15:41
in this case Duplicate is duplicate in best table is eliminated by distinct class Suppose uprising these changes to dispute
16:00
For Everything this purpose a popular should be This tupper should be deleted if and only if Depreciate over the Tupper becomes zero after this step deleted if
16:21
this means Deleting this table only one so if best table had Only one this table this channel deleted disputes, but This best table have to this tuples
16:43
We can delete this top from view because One couple is one couple is left in view in best relations Also, thank inside in this couple Additional topic cannot be in fact if there is already already the same one
17:07
This table is already exist in the view so after is that applying these changes? We cannot insert a new table in this view
17:30
Other implementation or I mean we last year we proposed an I will implementation at PC conf you
17:41
28 2018 this is a POC a proof-of-concept implementation of using a tupper row or IDs Or ID due to all ideas this can handle builds with the priesty correctly
18:03
The OID is used as a primary keys on Tupper's in materials views OID is System current so this is invisible to use us so we can handle views with Tupper duplicate correctly
18:21
if Tupper's Value is same OID is different so we can Use OID Also, this kink is not supported in this implement and in this implementation
18:41
material views can be Incrementally updated using refresh command, so this was a kind of deferred maintenance But This there are some problems on this approach First of all or why the system come is removed
19:02
In postgres 12, so we can't use this approach any longer Also, there was some difficulty to implement difficulty for implementation because this needs Many changed in executor nodes
19:22
for example Therefore After some surveys on IOM we started to work on a new IOM implementation The working in progress part of this new implementation has been submitted to the business hackers mailing list
19:49
From now I explain this implementation This provides a kind of immediate maintenance That is material view
20:02
Updated automatically and incrementally after their best tables are updated This is performed in Aptatouria Also this support of views including the duplicate tupper's or distinct growth in view definition
20:21
For this purpose we use counting algorithm What is counting algorithm? Counting this algorithm is an algorithm for handling tupper duplicates or distinct in
20:40
maintenance in maintenance view maintenance In this algorithm the number of tupper is counted and stored in material view Example sorry This means first tupper Has
21:02
One tupper this tupper has two tupper's answer this tupper is has one diversity and then then Just this changes applied to the view
21:24
for tupper insertion And the tupper had to be inserted into a view that count is increasing Okay
21:41
This is insertion and This counter is incremented and fender tupper are to be deleted from the view the current is decorative And if the count becomes zero and this is one to From one to two, but if this is from
22:04
from one to zero This count if the count becomes zero the tupper must be deleted from the view Next actually how this works in detail and how to implement it
22:24
Think waiting material view to create material views with supporting with IVM support this syntax is used create incremental material view This command create material view
22:42
Which is automatically and incrementally updated after base table change it Please note that this syntax is just tentative in the In the current part so it may be changed in future
23:01
This is a example create a incremental matters view. This incremental is Every option During fighting Matter as view is populated. It's a number of the top up counted by
23:26
Adding counter a star is a star and group by is to the way And this result of counting is stored in the material view as a special common name
23:44
We use IVM counter in the current party this is a equivalent query performed at creating materials view Select count as IVM count and group by this is columns
24:05
In target list and also in the after polygons On all base tables are created
24:20
They are created for insert delete and apply to query and as a statement level trigger and with crunch on tip, I'll explain the next slide and Not that this polygons are created Automatically and internally instead of issuing create trigger settlement directory. This is
24:46
similar to the implementation of foreign key consolines This is a great example Decavalent query of this equation create trigger
25:01
This same a Frontion table is a feature of after three words introduced from since postgres QL 10 Using this feature Changes on table can be referred in the trigger function by using table name specified by
25:28
referencing Class right this in difference in new table as I've a new table all the table as I've been all tape In this case
25:41
IVM all the tape contains a purpose deleted from the table in this in a statement and I've a new table content of toughest and newly inserted to the table in this settlement in theory This is collecting
26:00
Nabla are on data are respectively In the after trigger function change on views are calculated This is performed by rewriting the view definition query Specifically by replacing the base table in the query with a function table specified
26:25
And friends the trigger is created and also Count us and group by is added to count the diversity of tuples and the result in stored inserted into
26:41
temporary tables Which corresponds now will be and There's a way that changes on view and this is a equivalent query Okay This is
27:02
same Previewed query in the current Asta and the counter by is added and since the difference is The table name is the press it by transition tape their tables name And this is all the table on this new table
27:25
After calculate this changes the view can be updated by merging this for each tuple in changes table if Corresponding a tupper already existed in the view the value of I've M count is
27:44
updated instead of executing delete and insert delete or insert a settlement simply And then the current becomes Zero the corresponding Tupper is deleted from the view
28:02
This process is performed by using modifying CTE Like this And this is a fan applying Double division
28:22
and After searching material abuse and if the same Tupper is Existing in the view so I've M current is updated and if not
28:40
If the current becomes zero so Tupper is deleted And Tupper insertion example a updated I've M count If the Tupper is already in the view and if not
29:02
insert the new Tupper's in into the view Finally fan what is the view? Is accessed that is the fan select query is issued for much as a view and there are two cases
29:25
One is view is defined with distinct class and case as us is Distinct is not used Then this thing is used in query all come except to I've M count
29:42
of each tap return Not that the diversity of the tablets are already eliminated by A grape by so this is a correct result of distinct And if distinct is not is not used in view
30:05
Each Tupper is returned I've M count, but it's timeless This is performed by rewriting the query to replace the view with a sub query
30:23
which joins The view itself and generate a series like this generation one to the I mean kind of value
30:40
I'll show some example behavior This is a simple example This created simple matter as a view with I've M option And after and that context is at this and
31:04
Inserting a couple in its best relation best table And then Select query for this view returns the new tupper's That is a view is
31:21
automatically updated This is a simple example This is an example view with double duplicate This is a best table and matters view with incremental option is created like this
31:50
Base table has a four rows, but the contents of matter view is actual three rows
32:01
And this is because diversity is eliminated by group by And then And after that the contents of matter view is that The result return result shows There's four rows
32:28
After inserting a low 5b into the best table and very thing 1 a and 3c from this base table
32:40
After that the contents is correctly updated The before content is that and the new contents is like this 1 a 1 a is deleted and
33:01
the 1b and 1c C is deleted and B is added as an example
33:23
so IBM current clamp is Normally invisible for users as shown before however this can be visible If this is specified explicitly in the target list this may change in future, but
33:43
This means the a 5b is There are two beats in materials view and one is in materials views and then explain command is performed You can see the internal
34:01
mechanism you try uses of generate speeds like this explain result finally I'll show the result of a simple performance evaluation I use the two materialized view of a simple natural joint and the
34:27
Pg bench table is Used the tables are used and the scale factor is 100 and one view is a normal materialized view and another is a materialized view with incremental option and
34:44
The different definition is the same first of all I measured execution time of refresh of a normal materialized view in this case it looks
35:04
It took more than 10 seconds On the other hand after creating index on matters view and updating And one over the best table the view Pg bench account
35:23
It takes only 18 milliseconds And the contents of matters view is updated automatically and correctly this is rapid than normal refresh command
35:44
however after dropping the index I think the best table took about four seconds Although this is a faster than normal refresh
36:00
But this shows the appropriate index is necessary on the test view for efficient IVM this Information is a working progress. So there are many restrictions the current implementation supports
36:26
view with selection projection in a joint and distinct aggregation and group by and Self-join sub query after joining city window function set operations are not supported for now
36:42
However, I would like to deal with a part of Applications by the first list of IVM and about a timing of view maintenance the current
37:01
implementation supports only a kind of Immediate maintenance that is views updated immediately immediately after a base table is modified On the other hand, we also need deferred maintenance In that approach views
37:23
Updated after the transaction for example by the command is a command like refresh So to implement this deferred approach we need I think we need mechanism to maintain change logs of base table
37:44
For example, we're Maintaining that changes Ocado in base table and maintain and control
38:01
What log is needed and what log it's Unnecessary and can be deleted and so on Also, there could be another immediate
38:22
Mean there is implementation in that The view updated at the end of transaction that modified the best tables instead of after triggers In this case Several tables related
38:44
Several table are changes several best table can be changing. So this approach also need to Implement log change log maintenance mechanism as well as deferred maintenance
39:02
There will be discussion about counting algorithm implementation of this part firstly in our implementation IVM count is tolerated as a special current name. So Maybe this name has to be inhibit in user table or at rest in materials view
39:23
However, I'm not sure it is accessible to use such a special current name There is as a better way Second
39:40
General rate series function is used from material views with Table duplicate is accessed Actually This does not Have to be generate a series function. We are able to make a new set of returning function for this pump this purpose
40:02
but there Despite of their performance issues And this is a problem first due to using such a function Planar low number estimation gets wrong. So this can
40:20
have an impact on performance and also the cost of Joining the function and as a view it could be high So due to this performance issues, so maybe
40:40
We might have to add a new plan node To support to implement counting algorithm instead of using a set returning functions other issues
41:02
Fine concurrent transactions modify best tables. There could be a long lock waiting Raise conditions that these are the wrong result of the maintenance This is a big problem
41:22
so We need more investigation investigation for this issue I mean we are investigating now in addition we Could do some optimizations for example
41:42
Counting algorithm is not Necessary not needed if a view does not have distinct work Duplicates double duplicate so in that case We can skip extra work for counting Also fan say overhead of IVM is higher than normal refresh
42:07
We should use Later a normal refresh instead of IVM for make this decision We would we can use
42:21
optimizers cost estimation This might be useful sorry In this talk we explain our implementation of increment implementation of incremental view maintenance On post work you will and this are always a kind of immediate view maintenance using after real
42:48
This can handle views with Topics duplicate and distinct class using counting algorithm
43:04
This is a back in progress and so there are a lot of to do in future By the first Brief of IBM I direct support aggregation and group by and I would like to implement
43:21
Defied maintenance after that of course we also resolve concurrent transaction issued and consider some optimizations This Working in process patch a has been submitted to pieces have got the list
43:42
The subject is implementing Incremental view maintenance, so we are so happy if you give us any suggestion opinion comments on this thread Thank you, that's all any question
44:33
Collect the ID of the piece that would have been affected by the change
44:47
Reusing the view definition with a filter for just
45:25
You're gonna run into problems when you get into aggregates that are not reversible and some are reversible
46:08
But that's a loss Can I put it in the next No Okay The whole point of the story I don't know if anybody else thinks about this but
47:15
I like the idea of actually doing it but if there is some
47:33
Option to do
48:12
If we use low-level trigger Calculation of change the own view is necessary for each rose
48:27
I think It Have a rise overhead So so I think it is better to use statement level
49:22
What do you do what is your opinion about? These are block tables materializing logs versus what I understand to be temporary tables in your implementation right now Seems like you'd be doing inserts into either the temporary table or the materialized view log in either case Why choose one versus the other?
49:43
Why not create a materialized view log? At the temporary table? No
50:04
The question is why we don't use a transition road I'm just trying to work Like Oracle Why not do that
50:22
I just started Small stuff so But To support deferred maintenance
50:41
We need Some log mechanism log maintenance mechanism So I'd Implement designed log mechanism eventually
51:31
This is a somewhat other folk and Pink passing passing a scared statement
51:45
And expanding after this select after this target raised If The table
52:00
If this name is crumb is there just ignore it
53:09
Yes, yes, yes, I'm aware Yes, I Realize this thank you for solution. Thank you. Thank you