PGCon 2018 - Lightning Talks
Formal Metadata
Number of Parts: 37
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/49122 (DOI)
Transcript: English (auto-generated)
00:16
I guess we might as well get started and let him do it. So first off is Joe talking about securing Postgres.
00:23
Joe? Yeah, actually, this is just an idea that came out of the talk that I'm going to do tomorrow, which is much longer on securing Postgres. It was just too detailed to do in that talk. So it's basically just something I thought was a neat idea. It was actually inspired by Magnus. He did a talk at one point on TARDIS for Postgres.
00:45
So this is the idea of doing kind of a history table using RLS, range types, and partitioning. So basically, you start out, you create the base table called timetravel. It's got two regular fields, and then this tr field is basically a time range, a tstzrange.
01:04
And the idea is basically that the upper part of the range is going to be set to infinity for current rows, and it's going to be set to some value for rows that are expired, or in other words, rows that either have been deleted or updated. So in one partition, the first partition,
01:22
I set the constraints so that basically the upper range being infinity goes into that partition. The other one is called history, and so if the upper range of the tr range column is not infinity, it'll end up in that partition. This get_pit function is just kind of a utility function.
01:40
I did this with kind of a custom configuration variable. I know Tom hates when I do that, but it does work pretty well for kind of session variables. And I'm just going to set that thing to a value. If it is a value, then I'll use it. If it's not a value, I'll just use the current clock timestamp. And then I alter the table and enable row-level security,
02:01
and create a policy that basically says, is my tr field, does it contain this timestamp that I'm using? And then finally, there's a trigger that will maintain this for me. So anytime I update a row, I'll make sure that I create the old row with the upper range of the tr
02:23
field set to the current clock timestamp, and any new row, the lower range, has the current timestamp. So that's the function and the trigger down there. So now just generate some data so that we can see how it works.
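A rough sketch of the setup just described might look like the following. The names and details here are guesses rather than the exact code from the slides, and the split of current versus historical rows into two partitions is left out for brevity.

    -- Hypothetical names throughout: current rows carry an upper bound of
    -- infinity in tr; expired rows carry the time they stopped being current.
    CREATE TABLE timetravel (
        id int,
        f1 text,
        tr tstzrange NOT NULL DEFAULT tstzrange(clock_timestamp(), 'infinity')
    );

    -- get_pit(): use the session's point-in-time setting if present, else "now"
    CREATE FUNCTION get_pit() RETURNS timestamptz LANGUAGE sql AS $$
        SELECT coalesce(nullif(current_setting('timetravel.pit', true), '')::timestamptz,
                        clock_timestamp());
    $$;

    ALTER TABLE timetravel ENABLE ROW LEVEL SECURITY;

    -- only rows whose range contains the chosen point in time are visible
    CREATE POLICY pit_visibility ON timetravel
        USING (tr @> get_pit())
        WITH CHECK (true);

    -- on UPDATE or DELETE, keep an expired copy of the old row
    CREATE FUNCTION timetravel_maint() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO timetravel (id, f1, tr)
        VALUES (OLD.id, OLD.f1, tstzrange(lower(OLD.tr), clock_timestamp()));
        IF TG_OP = 'UPDATE' THEN
            NEW.tr := tstzrange(clock_timestamp(), 'infinity');
            RETURN NEW;
        END IF;
        RETURN OLD;
    END;
    $$;

    CREATE TRIGGER timetravel_maint
        BEFORE UPDATE OR DELETE ON timetravel
        FOR EACH ROW EXECUTE PROCEDURE timetravel_maint();

Querying as a non-superuser then only ever shows the rows that were current at the chosen point in time.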
02:41
And then in this first update statement, I'm changing one of the values where ID equals 42. And I'm going to return the timestamp, basically, and set a psql variable, point in time, that I can refer back to later. I then delete another row.
03:00
And you can see when I do my query, I only see the row that I expect to see based on the current time. And I'm going to do a couple more updates, and I see exactly what I would expect to see. But then when I set this config variable, based on that point in time I captured earlier, which was before I deleted that row 4242, now you can see I'm seeing the original update number
03:22
1 instead of update number 3, and I'm seeing row 4242, just as if I had gone back in time. But if I reset myself to the super user, super user does not get affected by RLS, and now I can see all the rows. And that's all I have.
03:48
Thank you for taking my idea and making it more awesome with partitioning and RLS. Next up is Thomas. And yeah, I don't know what this is. We'll see about that.
04:00
OK, hi, everyone. I'm going to be speaking about Scheme and Postgres. So about a year ago, I saw this email while eating my cornflakes in the morning and reading the commit log. And I saw that (does this work? yeah) PL/Scheme had been removed from the manual as something that was supported.
04:20
I had a lot of thoughts that came to me at once, like, well, we actually had PL/Scheme, and now we don't have it. And then I started wondering, what could you actually do with Scheme inside a database? Why would you want to use Scheme inside Postgres? And there's an idea that I had sort of been wondering about in the past in another context altogether.
04:42
If you want to move code closer to data, like typically if you've got some kind of, say, a web server taking requests and then making a bunch of requests against the database to build the response, you might make multiple requests to the database, and you might not want to do that. So you might come up with ideas about using stored procedures or something to reduce the number of round trips that you make.
05:03
Stored procedures, or I should say fmgr functions or extensions, are fun, but they're kind of also not always fun. And I thought, well, what if we could just use alien technology to just make the code move into the database without having to do all that stuff, just using magic? And I thought, well, this would be a nice experiment
05:20
to see if I could do this. So I tried to do this on my flight to Canada, which was quite long. So I managed to get it kind of working, and I wanted to talk about it. Really, what I'm doing here is just trying to raise interest in PL/Scheme, but this is like a use for it.
05:43
So yeah, here's some code doing a bunch of stuff, running a bunch of queries. And if you wrap it in this remote form which I cooked up, then it'll just magically move this code into the database and run it there. So without the pink remote xy thing,
06:00
you can see that it's actually, that block of code is doing three updates, and it's using variables that are in the local scope. If you wrap it in this remote thing and tell it which variables need to be kind of transferred when it does this, you could come up with something clever that could analyze that automatically, but I just did it explicitly. And in this case down here, this is an expression which runs a couple of database queries
06:21
and then builds a string containing bits of both of them. And the whole thing gets pushed down to the database if you wrap it in that remote thing. So basically, this is compiling code and caching it inside the database and reusing it so that it's relatively efficient and getting the result back to you. So you could just put that around it or not.
06:42
It'll still work both ways, either locally or remotely. Another thing which I'm interested in doing with Scheme in Postgres is automatic transaction management. So for example, if you're using serializable and you want to deal with retries, if you explicitly or declaratively mark the chunk of your code that is a transaction,
07:00
then it can help you retry, and that's quite useful. If you combine those two things, then you get finally a system which can automatically retry transactions when it needs to and also do a single round trip to the database. And that's what that's showing there.
07:21
Now, if you push random code into a database and execute it, then, I don't know, presumably this user already had some privileges anyway. It's the same with running stored procedures. If you grant someone the ability to define stored procedures and let them run, then they can do anything that that stored procedure could do.
07:40
Whether this is worse than that, I don't know. The way to manage that is probably based on modules. So yeah, there's probably some module-based solution you could do. My main point here isn't really to say that you should do this, it's really just to show that PL/Scheme is back
08:04
and to raise interest and see if anyone's interested in helping get that back into a usable form. That's all I got.
08:23
Yeah, that was not entirely what I expected when I saw the title. That's interesting. So next up, we have a new extension, PGPG, and Gilles, who's going to teach us all about it. Do you want the clicker? Yeah, thank you. There we go.
08:40
Go ahead. Thanks. I'm currently working on a new extension to emulate global temporary tables in PostgreSQL. This is the Oracle-style global temporary table. This is also what is defined in standard SQL.
09:04
So you have permanent tables created by the DBA. Every user can insert data in this table, but only sees its own rows. Rows are persistent per transaction or per session,
09:22
and we can create the temporary table in any schema. This is not the case with the usual PostgreSQL temporary tables. It works with unlogged tables for performance
09:41
and an extra column that stores the session, the backend PID together with the backend start time. There is also row-level security to make a user see only his rows.
10:03
And there is a view to hide that extra column. The rows are not removed by the backend, that would be too slow, so there is a background worker that
10:24
connects to each database and removes obsolete rows per session or per transaction. It's a five-second nap time for the background worker
10:41
by default, but you can tune this. Next, this is an extension that could be very helpful for Oracle to PostgreSQL migration, because global temporary tables are often found in Oracle databases.
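A rough sketch of that emulation, with invented object names (not the extension's actual ones) and without the background worker, might look like this:

    -- identify the current session: backend PID plus backend start time
    CREATE FUNCTION gtt_session_id() RETURNS text LANGUAGE sql STABLE AS $$
        SELECT pid::text || '/' || backend_start::text
          FROM pg_stat_activity
         WHERE pid = pg_backend_pid();
    $$;

    -- an unlogged table backs the "global temporary table"; the extra column
    -- records which session owns each row
    CREATE UNLOGGED TABLE gtt_backing (
        id         int,
        val        text,
        gtt_sessid text NOT NULL DEFAULT gtt_session_id()
    );

    ALTER TABLE gtt_backing ENABLE ROW LEVEL SECURITY;
    ALTER TABLE gtt_backing FORCE ROW LEVEL SECURITY;

    -- each session sees (and touches) only its own rows
    CREATE POLICY gtt_own_rows ON gtt_backing
        USING (gtt_sessid = gtt_session_id());

    -- users query a view that hides the session column; a background worker
    -- (not shown) would purge rows left behind by finished sessions
    CREATE VIEW my_gtt AS
        SELECT id, val FROM gtt_backing;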
11:00
And if you have an application that is creating and deleting too many tables, you get bloat in your catalog, so this can help you. This is a fresh extension, so it's a bit slow now.
11:21
I have to work more on this extension, but the goal is to make a POC, a proof of concept, to have a patch in PostgreSQL core. Once I have something performant, I will submit a draft to the team,
11:43
and we can see if we have any interest in having patch in the core. That's all. Thank you.
12:07
Leticia and Sarah, come to talk to us about PostgresWomen. Go ahead. Yeah, so initially, it's a group in March.
12:26
Oh, we only have one. You went through that fast. OK, so here's some facts. Women comprise about 50% of our global population as well as the workforce currently, yet their involvement in computer science has dropped over time pretty heavily.
12:42
Men and women used to break even in degrees and careers in computer science back in the 1970s and 80s, and that number has dropped to about 35% on average today, from 50%, obviously. And then specifically for our community, data science majors and careers for women, that number is 12%.
13:00
That is the lowest of all the STEM fields. And I mean, come on, the first programmer in the world was a woman, Ada Lovelace. What happens? What can we do about this? OK, so once we figured it out, we decided we wanted to change it.
13:20
Why? Just because we decided that there were no reasons why women can't be as talented as men. So it's too bad to cut off all those talented people
13:41
out there. So data science isn't a men's only club, right? So please, if you care about the future of not just our community, but just technology in general, everyone has a hand in fixing this. Men, women, all genders, right? So all of you have the ability to be kind, understanding, and encouraging to young girls and women who are
14:02
looking to approach the STEM field. If nothing else, you could be a mentor, or you could even offer advice, like how do you get started, or where do you want to go, or maybe I could give you a recommendation, something, you know? And finally, if you're a woman, step into the light and inspire someone. You never know whose life you could change. And so you can follow us on Twitter, Facebook,
14:22
and we have a mailing list on postgresql.org. All people are welcome to our group, and we'll try to make it better for the future. That's all.
14:48
So next up, we have Federico, who I guess is a chef? Yeah. Oh, perfect. Yep. Anyone hungry? Yeah. It's going to be worse after this. Possibly. Hello.
15:01
So I decided to check if Postgres can be used for cooking the perfect carbonara dish. I'm Italian. I care a lot about cooking, and there's a lot of misunderstanding about how to cook the perfect carbonara dish. So Postgres is amazing. And let's see if we can cook a perfect spaghetti carbonara.
15:22
So first thing, we need to create the tables, the containers. So we have the boiling pot, the mixing bowl, the frying pan. Every table has the natural key on the contents. And we have the temperature, Celsius, sorry. So the next step is to create
15:43
the table for our ingredients. So the ingredient name, quantity, the alternative (we don't have many alternatives, but there are some), notes, status, and the ordering. But it's very important to exclude the blasphemous ingredients. So we add the check constraint (sketched below) to exclude ham, cream,
16:02
belly slices, parmesan, wild rocket, anything. Pineapple; I have also seen spam inside the carbonara. No way. So the next step is, let's add the ingredients: spaghetti. The notes say, if you want it al dente, just subtract one minute from the cooking time.
16:21
And pork cheek, which we can replace with bacon. Sorry, Roman fellows, it's not nice, but you can do it. So grated pecorino cheese, medium-sized eggs, coarse salt, fine salt (two pinches), ground black pepper, olive oil, on conflict do nothing. Then we create a stored procedure,
16:44
a function to boil our carbonara, cook our carbonara. So we raise notice. Let's add eggs, pepper, salt, mix in bowl. So we insert into the mixing bowl and then we mix everything all together with a fork
17:00
and raise notice, perfetto. Next step is we add olive oil in the frying pan, turn on the fire, and add olive oil and pork cheek. There's a bug: I forgot to add the pork cheek to the notice. So turn on the fire, wait until the guanciale is cooked, and don't make it too crispy. So after 10 seconds, molto bene.
17:24
And we add the water with a simple select over generate_series. We wait for the water to warm up. When it's up to 100 Celsius, we can add salt and pasta. And finally, we drain the pasta,
17:41
we delete from the boiling pot the water and coarse salt, put everything together, we add the cooked guanciale, then we add the eggs, mix all together, add the grated cheese, mix all together, the grated Pecorino cheese. So insert into boiling pot, ingredient name, grated Pecorino Romano cheese.
18:01
Buon appetito and returns spaghetti carbonara. This is the output. And if you want to download the gist, there's the link down there so you can download the entire source code. Thank you very much.
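For reference, a loose sketch of the ingredients table described in the talk; the real code is in the linked gist, and the column names here are guesses.

    CREATE TABLE ingredients (
        ingredient_name text PRIMARY KEY,
        quantity        text,
        alternative     text,
        notes           text,
        status          text,
        ordering        int,
        -- no blasphemous ingredients allowed
        CONSTRAINT no_blasphemy CHECK (
            ingredient_name NOT IN ('ham', 'cream', 'pork belly slices',
                                    'parmesan', 'wild rocket', 'pineapple', 'spam'))
    );

    INSERT INTO ingredients (ingredient_name, quantity, notes, ordering)
    VALUES ('spaghetti', '320 g',
            'if you want it al dente, subtract one minute from the cooking time', 1)
    ON CONFLICT DO NOTHING;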
18:21
Thank you. Maybe we should have put that at the end right before that. So now for a more serious topic.
18:40
Let's build an application server. Hello. Hi. I would like you to look at Postgres as an application server. And when we treat Postgres as an application server, we have to think about deployment and all the related issues, like debugging, testing,
19:09
code coverage is a hard thing, I think. And we have to have some code deployment system for functions. Of course, a trigger consists of a function,
19:24
and now I will show a problem with trigger functions. Okay, it continues. We have to do deployment without any effect on production.
19:47
And I'd like to mention doing it lock-free. It's, I think, very important. There is a general pattern for deployment.
20:05
And I have had a good experience when this general scheme is implemented through search_path replacement.
20:23
If a client of the database's functions wants to use a new version of the function API, then we switch the client
20:45
to the new version of the API by replacing a schema in its search_path. This general scheme has been used for a long time, I think.
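A minimal sketch of that search_path-based versioning, with made-up schema, function, and role names:

    CREATE SCHEMA api_v1;
    CREATE SCHEMA api_v2;

    CREATE FUNCTION api_v1.get_order(p_id int) RETURNS text
        LANGUAGE sql AS $$ SELECT 'v1'::text $$;
    CREATE FUNCTION api_v2.get_order(p_id int) RETURNS text
        LANGUAGE sql AS $$ SELECT 'v2'::text $$;

    -- clients call get_order() unqualified; switching a client to the new API
    -- version is just a matter of which schema comes first on its search_path
    ALTER ROLE app_user SET search_path = api_v2, public;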
21:03
And when we work with plain functions, everything is okay. But a trigger is actually not a simple function call.
21:20
When we create a trigger, we resolve the function name at creation time. And then when the trigger
21:42
is fired at runtime, Postgres does not resolve the function name again. So we can
22:02
do something like this, with the red highlight: we can use a wrapper function, and the community already maintains some generic triggers.
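A sketch of that wrapper idea, with invented names (this is not the generic trigger the community maintains):

    -- the trigger is created once against the wrapper; the wrapper calls an
    -- unqualified function holding the real logic, so which implementation runs
    -- is decided by the session's search_path at run time, not at creation time
    CREATE FUNCTION orders_trg_wrapper() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        -- on_orders_change() is a hypothetical regular function that exists in
        -- each versioned API schema (api_v1, api_v2, ...)
        PERFORM on_orders_change(NEW.id);
        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER orders_change
        AFTER INSERT OR UPDATE ON orders
        FOR EACH ROW EXECUTE PROCEDURE orders_trg_wrapper();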
22:27
Or I can imagine something like this: we could declare a trigger with some option to resolve the function name at runtime.
22:46
And that was my message. There are still some issues present, at least around invalidation in triggers, too.
23:02
Thank you. Any question, maybe? Michael?
23:21
Yeah, I think he's a little bit scared to talk about corruption. That's why he was still over there, right? I'm not, I'm scared. Yeah, raise your clicker if you want to move. Okay, thank you. Let's corrupt things. So, who has ever seen corrupted instance of Postgres? Just raise your hand.
23:43
So, you saw corruption, and wah! It's the end of the world. Basically, what happens, what are the kind of messages that you would see from Postgres side if you see a corruption? There are many of them. Like for example, the first one is kind of famous. The could not read block N of relation X, Y, Z
24:01
read only zero out of eight kilobytes. Like, who has seen the first one? Just wondering. The second one is the one about an attribute missing from a relation. If you are there, like, it's pretty bad already. The third one you see if you are using page checksums.
24:21
Ah, okay. Because I heard a lot of good things about page checksums, and actually you suffer, so it justifies its own existence. So, corruption can have many, many, many origins, any kind of corruption. Postgres itself, for example, is one.
24:44
Where you could see, for example, the could-not-read-zero-out-of-eight-kilobytes one, which could be caused by OID consumption, for example, which was a recently fixed bug. There was also in the past a bug related to the visibility map and the replay of some bytes.
25:03
But you could also have things at the file system or the kernel level; the more the application stack gets complicated, the larger the amount of risk you potentially see for any kind of thing. Of course, you have the CPU, memory, and disks, and you also have other external things,
25:21
like, for example, cosmic rays; it's proven they can cause bit flips. Well, if that happens, I don't know. So, there is a wiki page on that. If you see any kind of corruption, first read that. It has a lot of references and things. So, what should you do when a corruption happens?
25:44
First, take a breath. And then you stop Postgres immediately. And you are in save-as-much-as-you-can mode, really, really, at this stage.
26:03
And one thing that people tend to forget before playing with a broken data folder is that you should take a cold copy of the data folder itself, which you can begin to play with, so that you can save as much data as you can from the instance. When it comes to trying to save a corrupted cluster,
26:22
you can have a lot of imagination. For example, for the missing column for a given instance, well, if you have a fixed schema, you can just update the catalogs manually, and you may perhaps be able to get back some data. PostgreSQL also comes with a couple of options.
26:41
zero_damaged_pages, in which case you would actually lose the data which is on the pages which got corrupted. And you can also use ignore_checksum_failure, in which case you may actually crash the instance of Postgres itself. So you have a bunch of options, but be sure that you take a copy of the thing
27:03
before doing anything, you put that in a safe place, and then you try to save as much data as you can. And after that, what can you do to try to detect corruption, to try to find it? You have pg_dump. Who thinks pg_dump is really a corruption detection tool?
27:24
Who has already used it to try to detect that? It's not a corruption detection tool. It's in the name. It dumps data. You don't use it to detect corruption. Okay, it's cool, it does some sequential scan,
27:40
and perhaps you may be able to find some heap pages corrupted along the way, but please don't do that. We have other tools to do that. One of them is pg_catcheck, which checks for consistency in the system catalogs themselves. For example, imagine that you are missing an attribute. Emulate that by doing a manual delete on the pg_attribute table, and you would see that.
28:03
You have amcheck, which is able to do a bunch of checks. You have page checksums. And we also have new tools like pg_verify_checksums, which is able to check, for a stopped cluster, that checksums are in a clean state.
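A couple of the tools mentioned can be driven from SQL; a small hedged sketch with placeholder object names (pg_verify_checksums itself is a command-line tool run against a stopped cluster):

    -- amcheck verifies B-tree index structure from SQL
    CREATE EXTENSION amcheck;
    SELECT bt_index_check('my_table_pkey');          -- lightweight check
    SELECT bt_index_parent_check('my_table_pkey');   -- stricter, heavier locking

    -- as a last resort, and only on a copy of the data directory,
    -- zero_damaged_pages (superuser only) lets reads skip broken pages;
    -- the data on those pages is lost
    SET zero_damaged_pages = on;
    SELECT count(*) FROM my_table;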
28:22
And we also have other tools based on that; there is something called pg_checksums that we can use to enable checksums, disable them, or verify them, and that's all.
28:43
Do you want to do a very fast lightning talk? No, no, no. Okay. Then don't. Then we'll pass it on to the next speaker; we're supposed to do a lightning talk, which is, do you know? What are you talking about? Partitioning advisors. Yes. That's good.
29:01
You weren't planning to talk about the thing that's on the slides? Yes. Perfect. Then go ahead. Thank you. So, I'm Julien Rouhaud from France, and I've been working with Yuzuko Hosoya for quite some months now on a new feature for HypoPG, which is a partitioning advisor for PostgreSQL.
29:22
So, the idea of a partitioning advisor is to create fake partitioning; like, say you have a big table, and you ask, how would my application behave if I had partitioned my table? And the goal of this extension is to be able to do that,
29:42
try a lot of partitions, different partitioning schemes, and whatever. So, HypoPG is an existing extension, quite simple, which already does some kind of advising for indexes. It's a little bit the same. It just lies to Postgres and says,
30:00
how would my query behave if this or that index existed in my database? So, you can define a lot of hypothetical indexes and see your query plans, using EXPLAIN without ANALYZE, and see how Postgres would behave and whether it would use the index or not. There are some other projects using this extension.
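For reference, the existing index advisor works roughly like this (table and column names are made up):

    CREATE EXTENSION hypopg;

    CREATE TABLE hypo AS SELECT generate_series(1, 100000) AS id;

    -- define a hypothetical index; nothing is actually built on disk
    SELECT * FROM hypopg_create_index('CREATE INDEX ON hypo (id)');

    -- plain EXPLAIN (without ANALYZE) will consider the hypothetical index
    EXPLAIN SELECT * FROM hypo WHERE id = 42;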
30:21
I think like Dexter was published recently. So, why doing this feature? There are some problems. Like the first one is real partitioning on Postgres is quite new. It was introduced in PG-10 and quite enhanced in PG-11.
30:44
So, I think most people don't really have enough background on how to partition, on which column to partition and everything. So, the idea is to be able to quickly check with your real size data for this and try a lot of different partitioning schemes.
31:03
So, I will let Hosoya-san continue. So, microphone. Thank you. Yes. Let's see the example. Yes, at first, I'm using this target table or hyper table.
31:21
Three million rows were inserted into this table using generate_series. And in this case, as you know, the query plan is like this. And let's create hypothetical partitioning schemes. You can define the hypothetical partitioned table
31:43
using hypopg_partition_table. You have to specify the target table as the first argument and the PARTITION BY clause as the second argument. And then you can define the hypothetical partitions
32:01
using the hypopg_add_partition function. You have to specify the hypothetical partition name as the first argument, and the PARTITION OF clause and the FOR VALUES clause as the second argument. These functions require quoted strings
32:22
to specify these arguments. And after that, this hypothetically partitioned table and its hypothetical partitions are stored in backend-local memory. And so, let's see the query plan using EXPLAIN again.
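A hedged sketch of the calls just described, following the prototype API as presented in the talk; the exact function names and signatures may differ in what eventually gets released.

    SELECT hypopg_partition_table('hypo', 'PARTITION BY RANGE (id)');

    SELECT hypopg_add_partition('hypo_1',
           'PARTITION OF hypo FOR VALUES FROM (1) TO (1500000)');
    SELECT hypopg_add_partition('hypo_2',
           'PARTITION OF hypo FOR VALUES FROM (1500000) TO (3000001)');

    -- plain EXPLAIN then shows the hypothetical partitions and partition pruning
    EXPLAIN SELECT * FROM hypo WHERE id = 42;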
32:42
So, just like this, you can see the hypothetical partitioning schemes. And also, you can find that partition pruning works well. So, in addition to this example, you can simulate list and hash partitioning
33:01
and partition-wise join and partition-wise aggregation and n-way joins. So, the partitioning advisor is already helpful to design partitioning schemes. Our next steps are as follows. We propose to have better integration in Postgres core, and we will improve size estimation,
33:22
the size estimates for hypothetical partitions. For now, we have been developing for Postgres 11, so we will support Postgres 10 as well, and we will support multi-level hypothetical partitioning, and we'll support hypothetical indexes on hypothetical partitions.
33:41
We plan to release the first version for Postgres 11. We hope it will help all of you. It's a work in progress for now, so we would be happy to have some feedback. That's all. Thank you for listening.
34:13
Hi, everyone. I'm Étienne Bersac from Dalibo. I'm writing Python for PostgreSQL.
34:22
And last year, I wrote a tool you may have heard of. It's called ldap2pg. It's not a migration tool, it's a synchronization tool. When you use LDAP with PostgreSQL, you just set up pg_hba.conf to check the password in an LDAP directory,
34:40
but you still have to create the roles and privileges and options and memberships. And so this is where ldap2pg helps, because it can be very cumbersome to write this manually or to write another script.
35:01
Okay, so there are a few solutions already to create the roles. The first option is just to create them manually. You can even save an SQL file in source control. Since you don't save passwords in it, because the passwords are in LDAP, it's quite easy to save it. But when you have a lot of roles to create,
35:24
it's very hard. So you begin to create a script that queries LDAP to create the roles, and then connects to PostgreSQL to create the roles there. But that's very hard, and you can make a lot of mistakes, and you don't know which roles you have to remove,
35:42
to drop. So there are a lot of such scripts in the industry, but I don't think it's a good solution because there are a lot of bugs in them. Ansible has a few modules to manage roles and privileges that are very nice. However, it's very hard to do LDAP queries in Ansible,
36:04
and so it's very slow when you begin to have a lot of privileges to manage. There is a tool that is unfortunately unmaintained, called pg-ldap-sync. And I borrowed some ideas from it,
36:20
but that's the problem with some open source tools: they're not maintained. One question is, why does ldap2pg manage roles and privileges? It's mainly because of default privileges. You want to create default privileges once you create the role, the owner role itself.
36:45
If you have one single owner role, it doesn't matter to manage privileges. And you're not required to manage privileges with ldap2pg; you can just ignore that and only manage roles. Also, it can be easier to manage roles with ldap2pg
37:03
because the YAML syntax is very simple. So this is ldap2pg. I tried to make it very simple, especially the configuration if you are used to writing Ansible playbooks. It's just like a simple playbook, especially for LDAP and PostgreSQL.
37:24
It runs almost everywhere with minimal dependencies. We already have it installed on very old CentOS systems and so on. Something important is that you don't have to use LDAP. You can just use ldap2pg by writing a simple YAML file
37:42
with all your roles and privileges, and it does the synchronization and the checks for you. The basic logic of ldap2pg is a loop that introspects Postgres and LDAP, so it's able to know that someone should not be a superuser or that someone is missing.
38:04
And the roles are created first, so it can manage the owners and then synchronize the privileges, grants, and revokes. Here is a simple YAML file. The first dictionary, named privileges, allows you to create a group of privileges;
38:21
connect and select are well-known privileges. Each privilege is defined as three queries: one to inspect who has the privilege, one to grant it, and one to revoke it. Then you have the sync_map, which is a list. It's a bit like a task list in Ansible. You just create a role named me.
38:42
You can add options and so on. And you also have a grant rule that says: grant the privilege ro (read-only) to the role me. I think it's pretty obvious. And one thing I did with ldap2pg to be nice to DBAs
39:02
is that there are a lot of messages that are meaningful. They'll help you to debug without knowing how it's done internally. If you have verbose debugging enabled, you will see an LDAP search command that you can copy-paste, and so on. Thank you.
39:21
I hope you enjoy it. Hello, my name is Konstantin and I want to tell you about using standby in production.
39:40
So first of all, the structure of today's talk is as follows. First, deadlocks on the standby. Then we talk about statement timeouts on DDL and how it is replayed on the standby. And third, the vacuum process
40:01
and replaying the truncation of data files on the standby. So first, imagine you have two tables, items and options. Then you open a transaction on your master and alter table options. Then you open a transaction on the standby and select something from items.
40:21
Then alter items on the master and then try to select something from options. So here is a deadlock, and it was not detected until version 10. In version 10, Postgres detects it successfully. Second, DDL statements. When we apply DDL,
40:43
for example altering a table, we usually set options such as statement_timeout and lock_timeout to, for example, 10 milliseconds, as in the sketch below.
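A minimal sketch of running such DDL on the primary with protective timeouts; the table is the one from the example, and the exact ALTER is only illustrative.

    BEGIN;
    SET LOCAL statement_timeout = '10ms';
    SET LOCAL lock_timeout = '10ms';
    ALTER TABLE options ADD COLUMN note text;   -- aborts quickly instead of waiting
    COMMIT;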
41:03
But when this is replayed on the standby, there are no such options. So if you have a huge load on your standby, there will be a lot of locks. So it is a problem. To resolve this problem, I suggest the following approach.
41:21
First, create an HAProxy script to switch your traffic, and stop replication on your active standby. Then apply the ALTER command on the master. Wait until this command is replayed on the inactive standby,
41:41
because we see in the WAL files that there are several AccessExclusiveLocks and there is no unlock. And in our example, we have 75 WAL files until the unlock comes, and it comes with the commit.
42:01
What can we do about this? First of all, I can suggest, for example, an option that can disable truncating the data file for a table. Or our colleagues from Postgres Professional
42:25
suggest another, how to say it, way. They say that the community should try to decrease the number of locks taken on the standby. That's all.
42:40
Thank you. Thank you very much. Next up, we have Grant and Etsuro, who will update us on the Asian community. Thank you. Right, sorry. And click it. Yeah, click here. Just click left and right there.
43:01
Hello, everybody. I'm Grant. I'm from, well, from China. And here is Etsuro, Etsuro Fujita from Japan. Yeah. Here, we two want to give you an introduction to some events from Asia, especially China and Japan.
43:22
And because, well, you know that usually there is a lack of voice from our countries, and there are few people contributing to the community. And this, well, this is not good for the users;
43:40
especially since we use Postgres, we want to contribute back to the community. But still, so here we come: it's the China PostgreSQL Association. And if you attended the Postgres Conference US in Jersey City this year, we had set up a table there to promote the China Postgres Association.
44:05
And it was first launched last year in Beijing at the COPU World Open Source Summit. And you can see that we have the official website here. And, sorry.
44:20
And the purpose is we want to promote the Postgres in China and make some more connections between the international community and the China community. And we are doing some support and services to the whole businesses in China. And we are doing some conferences and meetups.
44:43
And also we do some, make some Postgres book translation, publish them, so that to help the people to promote the Postgres in China. And, oh, we have the Postgres International Consultant Committee.
45:02
And now we have Joshua and we have Bruce, Oleg, Rui, Michael, you can see from US, from European, from different country, so that with the global, with the international community help to make the association more helpful in China. And next month, actually in China,
45:21
we have two events. One is the Open Source China Open Source World Summit, and together with it we are holding the PostgreSQL Open Summit in China next month. And here are our contacts and the Twitter account.
45:41
And I hope in the future we can get more interaction with the global Postgres community. Thank you. Let me introduce PGConf.ASIA.
46:05
This is the second PGConf.ASIA; this is a report on it. Before the main conference, we had an unconference, like at this PGCon.
46:24
And around 30 people joined this unconference and discussed many topics, such as Postgres sharding and the Asian communities.
46:41
The main conference started with a keynote by Oleg from Russia and we had many talks by Bruce and Magnus and many other guys. And in day two, we also had many talks by many people,
47:02
including Dave and Ishii-san. We'd like to thank all the speakers. This is a ceremony at the closing party. They are going to knock open a cask of Japanese sake
47:20
to make a toast with this sake. And yeah, a splash of sake. Yeah, we all enjoyed this sake. Finally, the announcement of the next PGConf.ASIA: it will be held on December 10 to 12, in Tokyo again.
47:41
So let's drink sake together. So I think that sets us up perfectly. Let's surprise their organization and all of us go there just to see what happens.
48:04
And also, every one of you should submit a talk to the conference so they get a lot of speakers. That'd be a lot of fun. Have we found our file? I think it's only one of the folders. Could you hit the top folder? Special offer for SanDisk customers, that seems wrong.
48:22
Yeah, yes, that one. Okay, let's do that one instead. You don't wanna do SanDisk commercial. No, I don't, I wanna do this one. Okay, then do this one. So I hit here? No, that's your mic. Okay, this is what I speak to? Awesome. You clicked on this thing and you speak into that the other way. Okay, cool, cool. Thank you. Hey, I was putting together a talk today and I realized if I change that talk a bit,
48:43
make it shorter, it would become a good lightning talk. And the name of my lightning talk is: is PostgreSQL becoming the real-time analytics database? And to start with, okay, what is a real-time analytics database? What is real-time analytics? It is an emerging workload for databases.
49:03
If you haven't heard of it before, maybe you heard of it in a different name such as an in-memory database or HTAP. And there are different use cases that fall into this workload. I'll think of like a customer-facing dashboard. You log into Google Analytics. It is, when you run a query,
49:21
it's going over billions of records and you want the replies in under a second. So that's the typical real-time analytics workload defined in these use cases. And there are a bunch of new databases, proprietary databases that are specifically built for this workload. What I'll describe in the rest of the lightning talk
49:40
is basically that there are new features in Postgres, and in the extension framework, that make it very competitive for this workload. And I'll talk about five of them. So the first one, by Andres, is just-in-time compilation for faster queries. Andres has been working on this since Postgres 9.6. I think he has a lightning talk tomorrow on the topic.
50:00
A talk, yeah, yeah, a talk on the topic. And the way you read this graph is if your workload fits into memory, basically these changes improve performance by two, three x for free. So the second one is using approximation algorithms for this workload. Think of the dashboard use case again.
50:21
So when you're serving real-time requests, you don't always have to provide exact results. There are extensions such as HyperLogLog or TopN that enable you to provide approximate results, but in real time. The third one is Citus. It's basically scaling out Postgres.
50:41
In here I have a simple architectural diagram. You basically take your tables and you shard them. Behind the covers you have these tables, S1, S2, S3, where each table is a shard. Two things to note here. One is that you typically want to shard on a granular level such as user ID or device ID. That way you can parallelize your computations
51:02
across dozens of machines. The second thing to note is basically why are you doing this? With this way you can keep dozens of terabytes of data in memory. And now you can use hundreds of CPU cores in parallel. The fourth thing, which is new in Postgres 10,
51:21
is native partitioning. When you think of the previous diagram or when you think of Google Analytics, one dimension of the data is user ID, device ID, so there is that dimension to it. A second dimension of the data is the time dimension to it. So you can partition your data by time so that your indexes remain local to your partitions and expiring data becomes simple.
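A small sketch of that time-based native partitioning (PG 10 syntax; the table and partition names are illustrative):

    CREATE TABLE events (
        device_id  bigint,
        event_time timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (event_time);

    CREATE TABLE events_2018_05 PARTITION OF events
        FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');
    CREATE TABLE events_2018_06 PARTITION OF events
        FOR VALUES FROM ('2018-06-01') TO ('2018-07-01');

    -- indexes stay local to each partition
    CREATE INDEX ON events_2018_05 (device_id, event_time);

    -- expiring old data is just dropping (or detaching) a partition
    DROP TABLE events_2018_05;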
51:42
And the common pattern that we see is you shard by a granular key such as device ID or user ID and then you further partition that data by time. What's exciting is that you can use these new features and then the extensions together. So a common pattern in real-time analytics with some of the proprietary databases
52:02
is you have some hot data that you frequently update. Say that's today's data; you do updates, updates and whatnot. And then after a day you basically close out your partition and rotate it into cold storage. And then with large data sets there's another Postgres extension, cstore_fdw, a columnar-store foreign data wrapper.
52:21
You can basically sort that data, compress it with C store and then put that into a partition. And here's an architectural diagram from a different database vendor that talks more about this idea where at the top you have your relational regular heap based structures. You update this and after a while
52:41
you close the shard and then you basically get it into a columnar storage format where you apply compression. To conclude, Postgres is awesome. Proprietary databases built these features in, tailored to real-time analytics workloads. There was a demo that we did at PGCon three years ago.
53:01
It was futuristic at the time. Thanks to new features in Postgres 10 and 11 and thanks to the flexible extension architecture, the future is here today. Thank you.
53:21
There's only one left, you'll get to go soon. Yeah, a lot of pressure. No pressure at all. More food. More food. So sticking with our food. Cool. Cool, so sticking with the food themes from today. First off, my name is Jonathan Katz. And I'm not gonna get as technical as Federico did in terms of implementing a food-driven design.
53:42
This will be more of a philosophical talk about something that's very important to, well, maybe to me. So for those of you who haven't met me yet, I'm from the city of New York and that becomes very clear quickly when we begin having a conversation. And something that's very important in New York is salad. Salad started off as something that was very simple
54:01
and it grew and grew to something that was very robust. You could get 50 different ingredients with it, you can mix them up in all different ways. And that kind of led to users expecting a robust salad experience to the point where they could overanalyze it and parse it and really demand what they want. And really what this is to say is that ordering a salad in New York City is no joke.
54:22
There is a lot that you need to consider in terms of requirements. It's not just putting it together, it's delivery, it's execution, it's getting everything there on time. And given how big New York is, you can run into various challenges. For instance, are all the ingredients available? If the place I get my salad doesn't have my wasabi kale Caesar dressing, well, I'm out.
54:44
Traffic also tends to surge. During lunch or in the work week, a lot of people want their salad and they want it quickly because, well, we gotta get back to work. There could also be performance bottlenecks. Maybe someone is really harassing the person who's making their salad, demanding no, I want this much dressing, this much dressing,
55:02
or like that tomato example up there. Or sometimes someone can't decide what salad they want, which in my opinion is even worse. Also, payments. People are fumbling around for their cash, their credit card's not going through. I mean, come on, I wanna get out of there, I wanna pay for my salad. Of course, uptime. We wanna make sure that the place is open so we can actually order it.
55:21
So the user experience isn't just about the flavor or the features of the salad. It's the whole thing. It's the whole package, the delivery. And the point of this is, how do they deliver this high-performance salad experience? So I actually researched this. Where I work, there's actually a whole cluster of salad places and I went in and I wanted to see
55:43
how could they do this. Actually, this entire talk was gonna be pictorial, but I got really self-conscious taking photos of people within the salad place, so I decided to make some diagrams instead for it. So first, to handle the indecisiveness and or the overcommunication, there's somebody that's preparing the transactions at the door.
56:01
You actually see this person right here and he or she comes up to every person and basically asks what they want. You get a little slip of paper that basically says exactly what your order is and you go to the next step. The next step is gathering the information or basically gathering the ingredients for your bowl. You hand the slip of paper to someone who basically runs around, gets all the ingredients
56:21
and hands you your bowl back. And there's multiple people doing that too, almost asynchronously. And you make sure that you keep everything, keeps all of your ingredients together. You're then assigned to a worker and there's six background working stations where you can basically parallelize chopping up and mixing your salad. And that's great because you're basically getting in and out
56:42
very, very quickly and it's very ordered. Unfortunately, there's still a bottleneck in the process and it has to do with the payment. Now, you can have multiple cashiers but there's potentially a lock there because if the cashier's one, the credit card's not going through, cashier two, the person can't find their money,
57:00
suddenly the queue is backing up and you can't handle the workload enough. It seems like we're stuck in this high-performance salad problem. But fortunately, thanks to the power of the internet, you can order online. You can asynchronously execute this process by ordering ahead of time and you're told exactly when your salad is ready. So you can basically optimize the trip from your office to the place to get your salad
57:21
and you get it and it's all great. But sometimes the system goes down and then they have to shut everything down and the restaurant's locked and you can't actually get your salad during operating hours. And perhaps you ordered it online and then you can't get your salad which actually happened to me
57:40
and I'm basically standing outside with the door locked, I see everything inside, I see my salad, I'm like, open the door, nobody's answering, but what can you do? The conclusion is that this robust salad experience, basically the salad industry in New York listened to the market. They basically saw that there was a really high demand
58:01
for it and as the process, as people kept ordering more and more in salad, they not only improved the process but they even automated it further and realized, hey, we can get this down from being a long process to very quick and make sure all New Yorkers are happy with their lunch. Thank you.