JSON[b] Roadmap
Formal Metadata
Title: JSON[b] Roadmap
Number of Parts: 32
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/52138 (DOI)
PGCon 2020, 16 / 32
Transcript: English(auto-generated)
00:09
Hi, I am happy to speak at the PGCon 2020 virtual edition conference. Today I will talk about the JSONB roadmap.
00:23
I am Oleg Bartunov; I work at Moscow University and at Postgres Professional. I have worked with Postgres since Postgres95, and here is a graphical illustration of my projects in Postgres.
00:41
The first one was introducing locale support, when Postgres became an international database. My colleague Nikita Glukhov is a core developer at Postgres Professional, a Postgres contributor, and the principal developer of SQL/JSON.
01:01
He also contributes many features to the core of Postgres. Teodor Sigaev and I introduced some NoSQL features back in 2003, a long time ago, when we developed the hstore data type,
01:21
which is a Perl-like hash storage without nesting and without arrays, but it was a binary data type with indexing support. hstore very quickly became a very popular extension, and many people have enjoyed those features in Postgres for many, many years. It is still a popular extension.
01:42
In 2012, Postgres got JSON: JSON as textual storage with some validation support. But in 2014, Teodor Sigaev and I introduced JSONB, binary storage with nested objects and array support,
02:07
a lot of indexing, and it became a very popular data type in Postgres. In 2019,
02:22
in PG12, we got jsonpath support, which is part of the SQL/JSON standard. It also has indexing support. So NoSQL Postgres actually has a long history. Here is a very brief overview of JSON in Postgres.
02:44
Actually, we have two JSON data types: one is textual JSON, the other is binary JSON (some people say "better JSON"). This very simple SELECT illustrates the difference between JSON and JSONB. We see that JSON preserves white space, key order, and duplicated keys,
03:11
but JSONB doesn't preserve white space and has no duplicated keys: the last key wins, and all keys are sorted, by length and then by key name.
03:26
JSONB has binary storage. That means we don't need to parse it every time we access it. It has index support, and a lot of functions and arrow operators to access keys.
03:43
So JSONB has great performance thanks to indexes, performance comparable to the very popular NoSQL databases. There is also the jsquery extension, which provides a JSON query language with GIN index support. Postgres has become very popular since introducing JSONB.
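A minimal illustration of that difference and of arrow-operator access (the sample documents are mine; JSONB output follows its length-then-name key ordering):

```sql
-- Textual JSON preserves white space, order, and duplicate keys
SELECT '{"cc":0, "aa": 2, "aa":1}'::json;
--  {"cc":0, "aa": 2, "aa":1}

-- JSONB normalizes: duplicates collapse (last key wins), keys are sorted
SELECT '{"cc":0, "aa": 2, "aa":1}'::jsonb;
--  {"aa": 1, "cc": 0}

-- Arrow operators give direct access without re-parsing the whole value
SELECT '{"a": {"b": [1, 2, 3]}}'::jsonb -> 'a' -> 'b' ->> 2;
--  3
```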
04:04
We see on this plot that the popularity of Postgres has risen since the introduction of JSONB. Other databases stay flat or degrade a bit, but Postgres is growing and growing.
04:26
The success of Postgres, the success of JSON in Postgres, somehow influenced the SQL committee, and the SQL standard recognized JSON: in the fall of 2016 we got the specification of JSON support in SQL.
04:48
It describes a reduced data model and the jsonpath language. The path language describes a projection, the part of the JSON data to be used by the SQL/JSON functions.
05:05
The SQL/JSON functions consist of construction functions, where values of SQL types are converted to JSON values, and query functions, where JSON values are converted to SQL types.
05:21
In PostgreSQL, we chose to support JSONB as a practical subset of the SQL/JSON data model, with ordered and unique keys. It is a good practical approximation of the SQL standard.
05:40
And we first started with the implementation of the jsonpath language, because this is the most important part of SQL/JSON, and it was committed to PG12. The SQL/JSON functions are also very important, but we already have the construction functions, which are more or less wrappers around existing JSONB construction functions.
06:04
We still need to implement the query functions, and we have also implemented indexes. Actually, we can use already existing indexes: the built-in index access methods and the jsquery op classes.
06:24
We need to add jsonpath support to the existing op classes to support some specific features of jsonpath, but this is what we have for now. jsonpath is a flexible query language which specifies the parts of a JSON document.
06:51
The syntax is very familiar to application developers: it uses dot notation to access members.
07:03
The dollar sign describes the current context element. We have access to arrays and array elements, and we can use filters and item methods. Here is an example of how a jsonpath query works.
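As a quick sketch of the notation (standard PG12 jsonpath; the sample document here is my own, not from the talk):

```sql
-- Dot notation, the [*] wildcard, a filter with @ (current item),
-- and member access after the filter
SELECT jsonb_path_query(
  '{"floor": [{"level": 1, "apt": [{"no": 1, "area": 40}]}]}'::jsonb,
  '$.floor[*].apt[*] ? (@.area > 30).no'
);
--  1
```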
07:26
In this example, we consider a house with two floors of apartments, and we want to find the apartments which have rooms with an area in a specified range, more than 40 and less than 90: medium-sized rooms.
07:46
On the first step, we have the dollar sign, which is the JSON document itself. We see the picture, the graphical representation of this house.
08:00
On the second step, we get the floor array. Then we get the two floors and their apartments: two arrays of apartment objects, one per floor. On the next step, we extract five apartment objects, and then each apartment is filtered by the filter expression. The result is a sequence of two
08:27
SQL/JSON items, which you can see in green: these are the apartments with rooms in the specified range (one of them with area 60).
08:41
So this is how the expression works. SQL/JSON standard conformance is in good shape for jsonpath: we support 15 out of 15 features of jsonpath, and PostgreSQL 13 has the best implementation of jsonpath
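The apartment example from the slide can be sketched like this (my reconstruction of the document's shape, not the talk's exact data):

```sql
-- Find medium-sized rooms: area strictly between 40 and 90
SELECT jsonb_path_query(
  '{"floor": [
     {"apt": [{"no": 1, "area": 40}, {"no": 2, "area": 60}]},
     {"apt": [{"no": 3, "area": 25}, {"no": 4, "area": 80}, {"no": 5, "area": 150}]}
   ]}'::jsonb,
  '$.floor[*].apt[*] ? (@.area > 40 && @.area < 90)'
);
-- {"no": 2, "area": 60}
-- {"no": 4, "area": 80}
```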
09:01
in comparison with Oracle, MySQL, and SQL Server. More information about jsonpath you can find in my talk in Milan last year; here is a link. Next, JSONB indexing.
09:25
We have two op classes: the default jsonb_ops, and jsonb_path_ops. This slide shows how keys and values actually get into a JSONB index.
09:43
jsonb_ops extracts keys and values separately. It supports more operations: the top-level exists operators and the contains operator. But merging large posting lists for common keys and values may be quite slow.
10:07
jsonb_path_ops extracts hashes of paths. It supports only the contains operator, but it is much faster and smaller than the default op class for contains queries, and its performance is comparable to the famous NoSQL databases.
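In SQL, choosing between the two looks like this (both op classes are in core):

```sql
CREATE TABLE docs (js jsonb);

-- Default op class: keys and values indexed separately,
-- supports ? (exists) as well as @> (contains)
CREATE INDEX docs_js_idx ON docs USING gin (js);

-- Path-hash op class: only @>, but smaller and faster for containment
CREATE INDEX docs_js_path_idx ON docs USING gin (js jsonb_path_ops);

-- Both indexes can serve this containment query
SELECT * FROM docs WHERE js @> '{"kind": "episode"}';
```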
10:27
So now let's talk about the roadmap ideas. First we will talk about the generic JSON API and discuss the possibility of a grand unification
10:41
of the two JSON data types. Then we'll talk about JSONB partial decompression, SQL/JSON functions, jsonpath as a parameter for JSONB op classes, the jsquery GIN op classes in core, jsonpath syntax extensions, and simple dot notation access to JSON data.
11:05
These are not just ideas: for most of these topics we already have working prototypes, and on the last slide you will see links to the GitHub repositories where you can
11:25
try them, play with them, and give us feedback or your thoughts; we are open for discussion. So let's start with the generic JSON API.
11:41
And the grand unification. Why? We have two JSON implementations: two data types and two implementations of user functions. On the screenshot you can see a lot of JSON functions, and even more JSONB functions.
12:02
And we see some overlap between these functions, so we have a lot of duplicated functionality and code. And if we introduce SQL/JSON support for both JSON and JSONB, we will need a lot of duplicated work. We don't want that; we need a unified interface.
12:35
Also, the coming SQL standard will specify a JSON data type. For example, Oracle 20c in preview already introduced a JSON data type, and they also introduced a binary format, OSON.
12:51
And for the sake of compatibility with the SQL standard, we need just one JSON data type. We wish it to be JSONB internally by default, with some way to specify that it should behave as the old textual JSON.
13:10
So, for example, the syntax may look like JSON with some modifiers. Here comes the idea of a generic JSON API.
13:30
The current JSON API looks like this: we have two separate APIs for JSON and JSONB. JSON has a lexer and parser with a visitor interface, and JSONB uses the JSON lexer and parser for input, and several functions and iterators
13:48
for access. This makes it very difficult to implement jsonpath functions, so jsonpath supports only JSONB.
14:01
It is also not easy to add new features to JSONB, like partial decompression, de-toasting, or different storage formats. That is why we came to a new, generic JSON API. It makes it easy to implement new features and to add JSON support to jsonpath.
14:33
Using this generic API, we removed the old code for JSON functions and operators.
14:43
JSON and JSONB user functions have only separate entry points, in which the datum is wrapped into the generic JSON structure; the bodies of the functions are the same, they just call generic subroutines. SQL/JSON uses these generic subroutines and the generic jsonpath API.
15:07
With the generic API, it is easy to add new features, for example partial access to compressed data, or partial dictionary decompression, where JSONB is stored
15:27
compressed, or with a separate dictionary for keys. To implement such a new feature, we just need to implement the container-ops interface.
15:42
Actually, it has only two mandatory methods, init and iterator init; the other methods can be taken from the default implementation. The generic API is just a step toward a unified JSON data type, because it unifies the implementation of
16:03
the user functions, but it doesn't actually solve the problem of a unified JSON data type as required by the SQL standard. So we have three options. First: do nothing. JSON stays as in the standard, but applications will be slow, because JSON is plain text.
16:28
The second option is to introduce an SQL/JSON compatibility mode, where JSON becomes an alias of JSONB. And the third, most correct option is to implement pluggable storage methods for data types, which do not exist in Postgres yet.
16:53
In the case of JSON, that means only one JSON data type exists, with several storage formats (text, binary, compressed binary) and one set of functions.
17:06
So we see the direction of development from one to three. At the first stage we do nothing, so JSON applications will be happy, but slow.
17:24
Then we introduce a compatibility mode (we need to discuss how to make it right), so JSON becomes an alias of JSONB; in that case there will be no textual JSON.
17:44
And in parallel, we implement pluggable storage, and then we will be happy. I don't know how many releases it will require, but it looks like a good plan.
18:01
An example of using the generic API is JSONB partial decompression. Let me explain what it is. Current JSONB works with fully decompressed binary data, so it cannot benefit from partial PGLZ decompression.
18:24
With the generic JSON API, it is easy to implement partial decompression with resume support for JSONB. We modified the PGLZ code to give the caller the ability to preserve the decompression state between successive calls, so we can resume decompression.
18:48
The prototype currently supports only partial PGLZ decompression, but in principle it is possible to use partial de-toasting, even future de-toasting with slicing.
19:03
Here is an example. We generate a table with 100,000 JSONB values of size 128 kilobytes, which compress to 1.2 kilobytes, so there is no TOAST.
19:22
And another table has long keys, so the JSONB will be TOASTed. Here we measure the performance of access to different keys.
19:40
On the left picture we see the no-TOAST case, on the right the TOAST case. We see a huge difference in the performance of accessing key one, the first key, versus the last key, and we see that partial decompression can greatly improve access to the keys.
20:08
In the TOAST case the effect is somewhat smaller than in the no-TOAST case, but it is still big.
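A sketch of how such a benchmark table can be generated (my reconstruction; the talk's exact generator is not shown in the transcript):

```sql
-- Highly compressible JSONB: many keys with repetitive values.
-- Reading 'key_1' needs only the start of the compressed datum,
-- so partial decompression helps most for early keys.
CREATE TABLE test_comp AS
SELECT (SELECT jsonb_object_agg('key_' || i, repeat('x', 100))
        FROM generate_series(1, 1000) i) AS js
FROM generate_series(1, 1000);

-- Compare access to the first and the last key
EXPLAIN ANALYZE SELECT js -> 'key_1'    FROM test_comp;
EXPLAIN ANALYZE SELECT js -> 'key_1000' FROM test_comp;
```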
20:22
So this is an almost free benefit of partial decompression, based on the generic JSON API. If we compare jsonpath and arrow operators for multilevel, nested JSONB,
20:49
we see that jsonpath is much faster than the arrow operators. The violet line,
21:02
the lower one, is jsonpath with partial decompression of the key, and the green, upper one is the chained arrow operators.
21:25
Because inside a jsonpath function we can do the decompression once, but between chained operator calls that is impossible, so we have slow access.
21:44
So this example illustrates one of the benefits of using jsonpath. The summary of JSONB partial decompression is that access to data at the beginning of a compressed JSONB is much faster,
22:05
without any modification of user functions. JSON support can be added later. Multilevel jsonpath is much faster than chained arrow operators for compressed JSONB, since jsonpath functions benefit from the PGLZ resume between levels.
22:26
It is possible to add partial de-toasting, prefix and slice. And now a life hack: keep the important keys at the beginning,
22:42
and access to these keys will be much faster. SQL/JSON functions. The SQL/JSON functions consist of construction functions; these are actually wrappers around existing JSONB functions.
23:04
You see that we have JSON_OBJECT, which constructs a JSON object (we have the jsonb_build_object function), JSON_ARRAY, JSON_ARRAYAGG, and JSON_OBJECTAGG.
23:21
Also, we have the JSON retrieval functions: JSON_VALUE, which extracts an SQL value of a predefined type from a JSON value; JSON_QUERY, which extracts JSON text from JSON text using an SQL/JSON path expression;
23:41
JSON_TABLE, which queries JSON text and presents it as a relational table; the IS [NOT] JSON test; and JSON_EXISTS. The most important one is JSON_TABLE, a relational view of JSON.
24:01
So here is an example of how to get the rooms from JSON in relational form, as a table, so you can use the result of this SELECT in other SELECTs,
24:24
for example, to join with other tables. SQL/JSON is actually very nice to have, but mostly we are interested in JSON_TABLE.
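A sketch of what such a JSON_TABLE query could look like, following the SQL/JSON draft syntax implemented in the patches (the house table, document shape, and column names are my assumptions):

```sql
-- Unnest apartments from the nested document into rows
SELECT jt.*
FROM house,
     JSON_TABLE(house.js, '$.floor[*].apt[*]'
       COLUMNS (
         no   int PATH '$.no',
         area int PATH '$.area'
       )) AS jt
WHERE jt.area BETWEEN 40 AND 90;   -- result joins like any other table
```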
24:40
Another topic is parameters for op classes. An operator class is a collection, for an access method, of a set of operators and some access-method-specific support functions. You see the names in bold: these are op classes.
25:01
When you create an index, a B-tree for example, you can specify a specific op class. For JSONB, you can use jsonb_ops; this is the default op class, so you can omit it. But if you want to use jsonb_path_ops, you should specify it here.
25:24
We need to extend this infrastructure, because op classes have hard-coded constants. For example, the GiST op classes for tsvector or for arrays use a hard-coded size of the binary signature.
25:44
Op class parameters allow you to define these constants outside of the op classes; the optimal signature size, for example, may differ for different data. This was committed to PG13.
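With the PG13 infrastructure this looks like the following (using the intarray extension's GiST op class as the example):

```sql
CREATE EXTENSION IF NOT EXISTS intarray;

CREATE TABLE arrs (a int[]);

-- Signature length is now a per-index parameter
-- instead of a compile-time constant inside the op class
CREATE INDEX arrs_a_idx ON arrs
  USING gist (a gist__intbig_ops (siglen = 32));
```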
26:03
We can also use different indexing algorithms depending on the data. This matters for indexing non-atomic data: Postgres has non-atomic data types like arrays, JSON, JSONB, tsvector, and others.
26:20
We can specify what part of the column value to index. It's not like a partial index; it is sort of a functional index, but we don't need to use the exact expression in SQL as for a functional index,
26:41
because the op class already knows what to index, while with a functional index the op class doesn't know, it just indexes whatever comes. More details about op class parameters you can find in our talk at PGCon two years ago.
27:06
Here is the syntax for op class parameters, and examples of how to specify the signature. For example, for intarray we can specify a 32-byte signature,
27:21
and the default numranges is 100. For indexing JSONB, we can specify which part of the JSONB to index. In this example, we see a parameter called projection with some jsonpath.
27:45
Op class parameters can be used in JSONB op classes for selective indexing of parts of a JSONB document. We implemented a projection parameter for the in-core op classes for JSONB.
28:04
Here are examples. If we use the jsonpath sequence construction extension, which we will talk about later, we can index multiple paths, multiple branches of the JSONB tree.
28:22
Here is an example: we have a projection parameter which describes two branches of the JSONB tree. This example uses the IMDB names table.
28:42
Let's compare full, functional, and selective JSONB indexes on the IMDB titles. The full index syntax is very simple. For the functional index, we have to create a special function:
29:02
we want to specify that we want to index the id, kind, and year branches from the IMDB table,
29:21
and create the index using this function. The last example is the selective index, which uses op class parameters: we just specify the projection. As you see, we don't need to define any functions; the syntax is very clear.
29:42
And the JSONB op class knows about these parameters, knows what to index. The resulting index size and build time are as expected: the full index is much bigger and takes much more time,
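The three variants can be sketched like this (the projection parameter is the prototype's proposed syntax, not in core; table, function, and key names are my assumptions):

```sql
-- 1. Full index: everything in the document
CREATE INDEX titles_full ON imdb_titles USING gin (js);

-- 2. Functional index: needs a helper function the queries must repeat
CREATE FUNCTION title_proj(js jsonb) RETURNS jsonb
  LANGUAGE sql IMMUTABLE
  AS $$ SELECT jsonb_build_object('id',   js -> 'id',
                                  'kind', js -> 'kind',
                                  'year', js -> 'year') $$;
CREATE INDEX titles_func ON imdb_titles USING gin (title_proj(js));

-- 3. Selective index (prototype): the op class parameter does the work
CREATE INDEX titles_sel ON imdb_titles
  USING gin (js jsonb_ops (projection = '$.id, $.kind, $.year'));
```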
30:03
while the functional index and the selective index have more or less the same size and the same build time. Now let's look at the performance of queries.
30:22
Let's consider a simple query which touches only indexed paths: we want to find the titles of movies of kind "episode" released in 2016.
30:46
For the full and selective indexes, the queries have the same syntax. Here is the execution plan for the selective index.
31:06
We see the time is 386 milliseconds: it uses the index and does some recheck.
31:21
For the functional index, the query must include the index condition: we have to use the same expression in the query as we used to create the index.
31:43
And the execution time is almost two times longer, about 600 milliseconds, because the functional recheck is more difficult. The situation becomes more interesting when we have a more complex query,
32:06
where we have both indexable and non-indexable conditions. For example, let's add a title to the query; we know that the index doesn't know about the title.
32:24
In the case of the selective index, nothing changes in the query, we just add the title. We get an index scan and a recheck, because the index doesn't know about the title.
32:43
So the recheck just filters by title, and we get 382 milliseconds. But if we take the same query as for the functional index and also add the title, it will not work:
33:03
we will get zero results, because it is a wrong query. We have to use a very different query: the simple query plus a separate operator for the title,
33:25
for filtering by title. Then the query works, but the execution time is again slower than for the selective index. The selective JSONB op class supports various operators and additional non-indexed conditions, logically ANDed,
33:47
not ORed but ANDed, as we have seen on the previous slides. It supports array subscripts and strict/lax index-query path combinations.
34:01
It does not yet support filter expressions and item methods. So the main conclusion for parameterized JSONB selective indexes is that the queries are much simpler:
34:23
you don't need to invent special syntax as in the case of a functional index, and it is not only simpler but also faster. The jsquery extension has some op classes which we would like to move to core.
34:48
One is jsonb_path_value_ops: it supports exact and range queries on values, and exact path searches. As you know, the in-core op class jsonb_ops doesn't support this.
35:06
Then jsonb_value_path_ops supports value searches with exact and wildcard path queries. And there is a new one, which we implemented recently to support jsonpath:
35:25
it is the same as jsonb_path_value_ops, but it ignores array path items, and this greatly simplifies the extraction of jsonpath queries. Also, jsquery provides debugging and a query optimizer with hints, but this will not be moved to core.
35:45
Here is an example of how jsonb_path_value_ops is slow for long lax paths. A simple table with empty JSONB:
36:00
we have one million empty JSONB values, and we create an index using jsonb_path_value_ops and try to find all rows which satisfy a condition with this jsonpath.
36:20
Very simple. We get a long planning time, half a millisecond, and three milliseconds execution time. This is quite slow, because for this jsonpath the op class has to extract 1024 entries. There is a GIN debug function for jsonpath in jsquery which shows you all the entries.
36:49
Obviously, such a query must be slow. If we add jsonpath support with the new op class, which does not account for arrays in the path,
37:08
the same query runs much, much faster: the planning time is 10 times shorter and the execution time is 120 times shorter, because for this jsonpath the new op class
37:26
extracts only one entry, as expected, not 1024. So this is a very nice addition to the existing op classes.
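The first two op classes can already be tried with today's jsquery extension (the query shape here is a sketch; the timings quoted above are from the talk's environment):

```sql
CREATE EXTENSION IF NOT EXISTS jsquery;

CREATE TABLE t AS
SELECT '{}'::jsonb AS js FROM generate_series(1, 1000000);

-- Value + path indexing from jsquery
CREATE INDEX t_js_pv_idx ON t USING gin (js jsonb_path_value_ops);

-- A jsquery condition that the index can serve
SELECT count(*) FROM t WHERE js @@ 'a.b.c = 1'::jsquery;
```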
37:43
Planner support functions, planner support for jsonpath. Recently, in PG12, an API for planner support functions appeared. It allows transforming function calls into operator expressions,
38:02
and such operator expressions can be accelerated by indexes. So we implemented this support for JSONB: the jsonb_path_match function, for example, transforms into this operator, and for this operator we have an index. What is also important is that it passes variables to the operator,
38:27
which is otherwise impossible, but jsonb_path_match and friends accept variables. Here is an example: we have this query,
38:42
and we see the execution plan. We have an index on this column in table t1; actually, this is a self join. And we see that the function jsonb_path_match was converted into the operator,
39:05
and the parameters were passed to this operator. For each row, we have a sequential scan on the second table,
39:23
and the variable value is transformed into a constant which can be used by the operator: the jsonpath functions with their variables produce a constant,
39:46
and we get an indexable expression, an index condition with an indexable operator. This is very useful, and we plan to commit this support to Postgres.
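A sketch of the rewrite this enables (jsonb_path_match and the @@ jsonpath operator are in core since PG12; the automatic transformation between the two forms is the prototype):

```sql
-- Written with the function form, with a variable...
SELECT * FROM docs
WHERE jsonb_path_match(js, '$.year == $y', vars => '{"y": 2016}');

-- ...the planner support function lets the planner treat it as the
-- indexable operator form, with the variable folded into a constant:
SELECT * FROM docs
WHERE js @@ '$.year == 2016'::jsonpath;
```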
40:05
jsonpath syntax extensions. The SQL standard describes the syntax for jsonpath, but it misses some very obvious, very convenient functionality,
40:22
which we want to add to Postgres. For example, array construction syntax: from the array 1, 2, 3 we want to add two members, one at the beginning and one at the end.
40:41
This syntax is very easy, it looks very natural. The same with object construction syntax and sequence construction syntax: very natural, and I think very useful. I remind you that sequence construction syntax is useful for specifying several paths for an op class.
41:06
Here is an example of such a sequence. Object and array construction syntax is useful for constructing derived objects. For example, if we consider this query on the IMDB database,
41:30
we get an execution time of about a second and a half. If we don't use these constructors and instead use the standard syntax specified by the SQL standard,
41:49
we can still write this query, but it is longer, more complex, and also slow.
42:01
And if we don't use SQL/JSON functions at all and use arrow operators, we get an even more complicated query, and one that is two times slower. So the syntax extensions are not only more convenient, but sometimes also faster.
42:31
We can use array and sequence construction syntax to simplify queries, for example.
42:43
Another syntax extension is object subscripting. This also looks natural, and object subscripting is easy to use.
43:01
Array items with lambda expressions: if you know what a lambda is, you understand that this is a very powerful way to work with JSON. We already have support for map, reduce, fold, and max functions.
43:21
We can use sequence functions with lambda expressions as well, again map, reduce, fold, and max, and I think this extension would be very useful for developers.
43:44
Simple dot notation access to JSON data. Sometimes it is much easier: for simple queries, dot notation can be much better than arrow operators and functions. For example, we can write j.k1.k2 instead of two arrow operators, or instead of JSON query functions.
44:12
Access to arrays could also be easier and more familiar than using the arrow operator and JSON_QUERY.
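To make the comparison concrete, here is the same access written three ways (the dot notation line is the proposed, unimplemented idea; only the first two run on current Postgres):

```sql
-- Arrow operators (in core)
SELECT js -> 'k1' -> 'k2' FROM t;

-- jsonpath (in core since PG12)
SELECT jsonb_path_query(js, '$.k1.k2') FROM t;

-- Proposed dot notation (idea only, not implemented)
-- SELECT t.js.k1.k2 FROM t;
```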
44:23
This is just an idea, we did not try to implement it; I know that Oracle has already implemented dot notation. So, what is the status of this roadmap?
44:41
The generic JSON API would be nice to implement for PG14; here is a link to the repository on GitHub. The grand unification is not clear; we need help, we need to discuss and elaborate a proposal of what to do.
45:08
JSONB partial decompression also would be nice to have in PG14; we also have a prototype on GitHub.
45:20
The SQL/JSON functions from the SQL standard certainly should go into PG14. We have a repository, we have patches published on the hackers mailing list, and there have been long discussions, so I invite you to join them.
45:44
The planner support functions also would be nice to have in PG14; we also have a prototype on GitHub. jsonpath as a parameter for JSONB op classes
46:02
we want to commit to PG14; we also have a GitHub repository. The GIN op classes from jsquery should be easy to commit to PG14;
46:24
here is a link to the GitHub. The jsonpath syntax extensions also have a repository, and we would also like to commit them to PG14. Simple dot notation access is just an idea.
46:43
Another idea is COPY with support for jsonpath. We don't have any prototype, but it would be very nice, because when you download JSON data from the internet, it usually consists of an array of JSONs,
47:05
and there is no easy way to load that into Postgres. If we had COPY with jsonpath, you could easily load specific parts of the JSON data into Postgres,
47:25
not all of it, but some specific parts. So I think that idea is worth thinking about and trying; we need developers. As you see, the roadmap and all these projects are in more or less good shape.
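A purely hypothetical sketch of what such a COPY could look like (this syntax does not exist; the option name, its shape, and the file path are my invention to illustrate the idea):

```sql
-- Hypothetical: load only id and title from each element
-- of a top-level JSON array in the input file
COPY movies (id, title)
FROM '/tmp/movies.json'
WITH (FORMAT json, JSONPATH '$[*].id, $[*].title');
```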
47:47
We have worked on it for several years; a lot of work has been done, and a lot of work is still needed. So contact me and Nikita for collaboration. This is an open,
48:02
this is a Postgres community project, and we need more developers to implement all of this; together we can do anything. References: you can download the slides from the links above,
48:24
then read about the standard in the technical report, my introduction to jsonpath, the jsquery extension, a link to op class parameters; and you can download the IMDB tables to play with JSON.
48:41
And here is how the JSON-in-Postgres logo transformed. First, we had an elephant with its trunk in the form of the JSON logo.
49:00
Then I added brackets around the elephant to say that this is about JSON, because people don't always know what the JSON logo looks like. Then I posted it on Twitter, and Denis Rosa
49:27
provided feedback in the form of a new version of my elephant, with ears in the form of brackets. Thank you, Denis Rosa. And this is the last slide of my presentation.
49:44
I like it because of the contemplation: sometimes it is just the start of big things. Again, thank you for your attention.
50:01
I hope that we will work together on improving our cool Postgres. You see this guy, who is staring into the distance; this is how I look after finishing the slides.
50:22
The mountain in the background is Makalu, the fifth highest mountain in the world. I took this picture in 2014, when we introduced JSONB,
50:43
and then went on a big trek, a big adventure in the mountains of Nepal; this is a plateau at an altitude of more than 6000 meters. Enjoy this picture.
51:01
Thank you for attending.