Rethinking JSONB
Formal Metadata
Title of Series: PGCon 2015 (part 17 of 29)
License: CC Attribution - ShareAlike 3.0 Unported
DOI: 10.5446/19140
Production Place: Ottawa, Canada
Transcript (English, auto-generated)
00:00
So in this talk we will share some of our thoughts about how we can rethink JSONB, because JSONB has already been released, but it has some problems, and we are going to discuss them today.
00:20
A little about us: I believe everybody knows us already. First we will talk about querying problems, and second about possible compression of JSONB. JSONB was released in 9.4.
00:43
It has some built-in query capabilities: there is a set of operators, including the contains operator, and there are operators to get the values of particular keys. You can create a GIN index on the whole JSONB column,
01:02
or you can create expression indexes on particular keys. Here is a simple example of how one can query a JSONB array: using the contains operator, you can check that an array of text values contains a particular item.
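For illustration, a query along these lines would do that; the table and column names here, js_test and v, are just assumptions for the example:

    CREATE TABLE js_test (v jsonb);
    CREATE INDEX ON js_test USING gin (v);

    -- does the JSONB array stored in v contain the string 'postgres'?
    SELECT * FROM js_test WHERE v @> '["postgres"]';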
01:24
No problem so far. But take another example: suppose you have JSON documents that represent companies, and you would like to find the companies whose CEO or CTO is named Neil.
01:43
You can try to write such an expression: one contains operator about the name, and two more contains operators about the title. But what will such a query actually return?
02:03
Since all of the contains operators are processed independently, they can match different persons. So you actually find companies that have a CEO and a CTO, while someone entirely different is named Neil. It's not the query we wanted.
02:25
The correct version looks like this: you have to duplicate the condition about the person's name, and then the query works correctly.
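To make this concrete, here is roughly what the two variants look like; the table name company and the document layout {"staff": [{"name": ..., "title": ...}]} are assumed for the example:

    -- wrong: the containment checks are independent,
    -- so they may match different staff members
    SELECT * FROM company
    WHERE js @> '{"staff": [{"name": "Neil"}]}'
      AND (js @> '{"staff": [{"title": "CEO"}]}'
           OR js @> '{"staff": [{"title": "CTO"}]}');

    -- correct: name and title must sit in the same array element,
    -- at the cost of duplicating the name condition
    SELECT * FROM company
    WHERE js @> '{"staff": [{"name": "Neil", "title": "CEO"}]}'
       OR js @> '{"staff": [{"name": "Neil", "title": "CTO"}]}';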
02:40
But you have to duplicate conditions, and when you have to express some complex logic, the size of the expression can grow exponentially. There is another approach to express such a query:
03:00
you can write an EXISTS subselect and use the jsonb_array_elements function, which unnests a JSONB array into a table, and then write an expression about a particular row of that table.
03:23
The problem with this approach is that it lacks indexing support, and it also has quite heavy syntax. So we had to choose between these two options.
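Sticking with the same assumed company table, the subselect form of the search looks roughly like this:

    -- correct, and no duplicated conditions, but not indexable
    SELECT * FROM company c
    WHERE EXISTS (
        SELECT 1
        FROM jsonb_array_elements(c.js -> 'staff') AS s
        WHERE s ->> 'name' = 'Neil'
          AND s ->> 'title' IN ('CEO', 'CTO')
    );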
03:42
And then we created another option: the JsQuery extension (JSONB query), which is available on GitHub for 9.4.
04:03
JsQuery has a match operator, and you put the whole query into a single jsquery value. Just as tsquery encapsulates a whole full-text search query, jsquery behaves in a similar way,
04:22
and there is indexing support for this single operator. JsQuery supports all the basic capabilities for querying: you can check an expression against any element of an array, any key of an object,
04:43
anything in the JSON document at any level. It has a special dollar sign to refer to the current element, so in this expression both conditions with the dollar sign apply to the same element of the array.
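As a sketch, the CEO/CTO search from before could be written as a single jsquery value; the path syntax shown here is approximate, so check the README on GitHub for the exact grammar:

    SELECT * FROM company
    WHERE js @@ 'staff.#(name = "Neil" AND (title = "CEO" OR title = "CTO"))';

    -- the GIN operator classes shipped with the extension can index the @@ operator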
05:03
It has a grammar, it has all the basic operations for scalars, and it has a path notation to refer to different values inside the JSON. On GitHub you can find a quite comprehensive README.
05:27
It can also be used for schema checking, because there are IS-type operators which can check that particular values in the JSON
05:41
have particular types. And here is a comparison of the same query written in the different manners; JsQuery provides quite a short and convenient form for the query.
06:03
Nice. But there is still a problem with JsQuery. It does have indexing support, and it's rich enough for typical applications, but it's not extensible, because JsQuery has a fixed grammar with a fixed number of supported operators.
06:27
You can't just create a new data type, a new function, or a new operator in Postgres and use it in JsQuery.
06:40
If we wanted to support that, we would have to duplicate half of Postgres inside JsQuery: the system catalog, planner, executor, and so on. We are not going that way. The right way is to support all these features at the SQL level.
07:02
And I would like to note that this problem isn't new. We already had the same problem with arrays and hstore, but with JSONB it becomes even more urgent.
07:21
Here is an example with arrays. You create a table with an array column and a GIN index on it. If your query condition is that any element is equal to 10,
07:42
you get a sequential scan, and in various other formulations you also get a sequential scan; the index is not used. But arrays have special operators, overlaps,
08:04
contains, and so on, which can express the same things, and in that case the index will be used for such queries.
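A minimal illustration of the difference, with an assumed table arr_test:

    CREATE TABLE arr_test (a int[]);
    CREATE INDEX ON arr_test USING gin (a);

    SELECT * FROM arr_test WHERE 10 = ANY (a);    -- sequential scan
    SELECT * FROM arr_test WHERE a @> ARRAY[10];  -- can use the GIN index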
08:23
Why do we have to write the query that way? It's because the PostgreSQL planner can use an index only for particular forms of expressions: "column operator value", ORDER BY column, and KNN ordering. For the index to be used, we
08:43
have to make our query look like the first form, "column operator value". That's what was done for arrays: there are operators for overlaps, contains,
09:01
is-contained-by, and the query matching operator. But there are other expressions. You can search for an element which is greater than 10;
09:22
you can write that with the ANY syntax. But if you want to search for an element between 10 and 20, you have to write a subselect: you can't write "ANY(a) BETWEEN 10 AND 20", because ANY must always be on the right-hand side of an operator.
09:41
That's quite a crutch. With a subselect you're fine, but you have heavy syntax and no indexing support. Yet GIN could support such queries.
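In concrete terms, using the same assumed arr_test table:

    -- "element greater than 10" fits the ANY syntax
    SELECT * FROM arr_test WHERE 10 < ANY (a);

    -- "element between 10 and 20" needs a subselect, and loses index support
    SELECT * FROM arr_test t
    WHERE EXISTS (
        SELECT 1 FROM unnest(t.a) AS e
        WHERE e BETWEEN 10 AND 20
    );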
10:05
That's no problem for GIN itself; it's an infrastructure problem that the planner can't pass such expressions to the access method. So this is the picture we have with GIN and GiST
10:22
indexes: "column operator value" is supported, and everything else is not. What we would like is to support different kinds of expressions with our indexes.
10:42
We have a similar problem with hstore. With hstore you can, for instance, create a GIN index, but if you fetch the value of a key and then check it for equality,
11:00
the GIN index won't be used; you get a sequential scan. If you use the contains operator instead, everything is fine: a bitmap index scan using the GIN index. But if you create an expression index,
11:22
then the contains operator won't use it, and the scan will be sequential, because the expression has to match the index expression exactly. So if you have an index on key0, you have to write an expression on key0, not a contains expression.
11:42
In that case, with the expression index, you can get an index scan for such expressions, and you can also get index scans for inequalities, which is not possible with a GIN index,
12:03
because inequalities can't be expressed with the contains operator. So that is the picture we have when indexing hstore, and in this situation we can say that SQL becomes not so declarative,
12:25
because there are different indexes and the form of your expression determines which index can be used: you have to write the expression in one form for one index and in another form for another index, as the following sketch shows.
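A minimal hstore illustration of that asymmetry, with an assumed table h_test:

    CREATE EXTENSION IF NOT EXISTS hstore;
    CREATE TABLE h_test (h hstore);

    CREATE INDEX ON h_test USING gin (h);
    SELECT * FROM h_test WHERE h -> 'key0' = 'value';  -- seq scan, GIN index unused
    SELECT * FROM h_test WHERE h @> 'key0=>value';     -- bitmap scan on the GIN index

    CREATE INDEX ON h_test ((h -> 'key0'));            -- B-tree expression index
    SELECT * FROM h_test WHERE h -> 'key0' = 'value';  -- index scan
    SELECT * FROM h_test WHERE h -> 'key0' > 'value';  -- inequalities also work here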
12:42
The situation we would like to have is one where at least GIN and GiST support such expressions. We are not sure about B-tree expression indexes,
13:00
whether they should support contains or not, but that's not the main problem. We have quite a similar situation with JSONB, which simply accumulates
13:24
the problems of arrays and hstore. So with JSONB, for subselects we have the following consequences:
13:43
no index support and heavy syntax. It would be nice to work out these two issues. The first part of our proposal is a new syntax
14:01
to query JSONB and arrays in a short and convenient way. It's available in our GitHub. The idea is to write a condition
14:22
about any element or each element of an array, give an alias to the element of the array or the value of the object, and then write an expression over that alias. Some examples: query for the elements divisible by three,
14:47
or query for the elements between 1 and 10, just for arrays. Or search for rows where all the scalars in the JSONB are numbers,
15:09
or where the JSONB array contains at least one object.
15:21
Or the example I gave before: it searches for rows where, in any department, any of the staff satisfies a given condition. Since we give the alias s to a particular person,
15:48
both of the conditions apply to the same person and everything works as intended. Another example: find the departments
16:05
where all the staff have a salary of more than 1,000. Here there is an "each element" construction, which means that every element of the array has to satisfy the given condition.
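Since this proposed syntax is not in any released PostgreSQL, the following is only a schematic sketch of the idea, reusing the assumed company table; the real keywords and grammar in the GitHub proposal may differ:

    -- "any element": some staff member named Neil who is the CEO or CTO
    SELECT * FROM company c
    WHERE ANY ELEMENT s OF c.js -> 'staff' (
          s ->> 'name' = 'Neil'
      AND s ->> 'title' IN ('CEO', 'CTO')
    );

    -- "each element": every staff member earns more than 1,000
    SELECT * FROM company c
    WHERE EACH ELEMENT s OF c.js -> 'staff' (
        (s ->> 'salary')::numeric > 1000
    );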
16:22
Actually, the implementation of this for now is just syntax sugar. For instance, if you create a view with such a condition and then look at the view definition, you can see that inside there is actually
16:40
an array-elements (unnest) call and then some complex condition with an aggregate over it. But what we actually want to do is to allow index usage for complex expressions
17:02
and even subselects. What is specified in an operator class? In an operator class we specify a set of supported operators and a set of support functions needed by the access method.
17:22
In effect, the operator class specifies a grammar of which expressions can be used by the index. Right now this grammar is trivial: it's just the set of common "column operator value" expressions.
17:45
You just specify the set of operators in the operator class, and the optimizer recognizes such expressions in queries. But we would like to specify a richer grammar
18:03
to support a wider set of queries. The idea is to use a grammar in operator classes. In that case we would have an automaton
18:25
with transitions between a source and a destination state, and such an operator class could recognize sequences of operators.
18:42
For instance, what could an operator class for hstore look like? There is an operator to fetch values, and it changes the state from one to two.
19:00
In state number two we support equalities and inequalities on a particular key, and then we go to the final state. The other operators, which work on the whole hstore, don't specify these parameters;
19:25
that means they just go from the initial state straight to the final state. So this is how it looks as a graph: there is a source and a destination.
19:42
We can apply the set of operators directly to the hstore, or we can fetch some value and then apply equalities or inequalities to that value.
20:00
This way we can support a wider range of expressions for hstore. And if we extend this scheme a little further, we could also have support for subselects,
20:22
by allowing particular unnest functions, specified in the op class, to be used in the indexed expression. So you can index chained operators and use the index to search such chains? Yes. And not just chains of two,
20:42
but chains of three? Yes, yes. There could be a self-referencing arc, for instance for JSONB, which can have unlimited depth; then you just have a self-referencing arc and everything is fine.
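To make the "chain" idea concrete, these are the kinds of expressions such a grammar would let an index recognize, reusing the assumed h_test and company tables; the state numbering is only illustrative:

    -- a chain of two: "->" moves the automaton from state 1 to state 2,
    -- and the comparison in state 2 leads to the final state
    SELECT * FROM h_test WHERE h -> 'key0' > '5';

    -- with a self-referencing arc, arbitrarily deep JSONB paths could chain too
    SELECT * FROM company WHERE js -> 'staff' -> 0 ->> 'name' = 'Neil';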
21:03
But there is a lot of work to achieve this. We are working on a prototype with hstore, which is more or less working, but there are a lot of changes needed to make this work. First, we need to make these changes
21:22
in the system catalog, which is not a big problem. Then we have to change the planner to support such complex expressions, to actually evaluate the automaton. And the question here is: how high could the
21:42
planning overhead be for complex queries? This requires a lot of testing. We also have to change the interface of access methods, because right now the executor passes the access method an array of scan keys. These scan keys are conjoined with
22:05
the AND operator, and each scan key is a "column operator value". But we need to support complex expressions, so we need to pass not an array of scan keys
22:22
but something like a scan tree. And we have to change operator classes as well, because the operator class interface currently also assumes
22:41
"column operator value" and passes the pieces separately to the support functions, not all together. So this interface also requires work.
23:00
So we can see that there is a lot of work, but it would solve our problems with JSONB, and it would also help with our long-standing problems with arrays and hstore. And we could also index OR.
23:22
Yes. With GIN you can't pass an OR between two expressions down into the index: AND gets passed, but OR does not. And the thing is that GIN can evaluate OR quite
23:41
efficiently, but as it stands, when you have OR you get a BitmapOr, which is much less efficient than doing it inside GIN. OK, so this work was supported by Wargaming. And we have another topic, JSONB compression.
24:13
Another benefit of this work is that it lays the groundwork for VODKA indexes, because our work on VODKA
24:24
was blocked by these infrastructure problems. This is one step toward building the required infrastructure for VODKA. Now, about JSONB format compression. For example,
24:45
we can compare how the Citus sample data set is stored. It is given as JSON; the raw JSON is about one and a half gigabytes.
25:08
If we import it into JSONB, we get about one and a half gigabytes of heap, plus an index for the primary key and, for instance,
25:22
a GIN index on product IDs of about 90 megabytes. Another approach is to store it in plain flat tables: for the similar-product IDs, which were an array, we would have a separate table,
25:43
so it becomes master-detail with a foreign key, and so on. Then we have about 1 gigabyte of total heap size.
26:01
But we also need to be able to assemble the whole document by product ID, so we would have a B-tree index on the similar-product IDs, which is also about 300 megabytes,
26:21
and also an index on product ID. Yet another approach is to put the similar-product IDs into an array, and that appears to be the most compact way to store this data in Postgres.
26:42
So here is a graph comparing the storage of the fully normalized (third normal form) schema, the arrays, and JSONB. We would say that JSONB takes about the same size as the normalized way to store this data,
27:06
while the arrays appear to be the most compact way. So why is the JSONB storage overhead so large in comparison with arrays?
27:21
Here is a part of a document (I trimmed it so it fits on the slide) and its binary representation. First we have quite large headers, then quite a large section of key names,
27:45
and then a relatively short section of actual values. So the overhead of the headers and key names is relatively huge.
28:01
Our first idea for compressing JSONB is to maintain a dictionary of all the JSONB keys. Then we can reference keys just by numbers,
28:24
their identifiers, compress the headers using varbyte encoding, and also compress small numbers with varbyte encoding instead of storing them as full numerics. The negative aspect of this approach
28:42
is that we have to maintain this dictionary of key names. There is a prototype of this as the jsonbc extension: we simply have a table for the dictionary, for now just one global table,
29:01
and in C we have functions to get the ID by key name and, the other way around, to return the name by key ID.
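A minimal sketch of what such a global dictionary could look like at the SQL level; the table name, columns, and lookup functions here are assumptions for illustration (the prototype does the lookups in C and also inserts keys it hasn't seen yet):

    CREATE TABLE jsonbc_dict (
        id   serial PRIMARY KEY,
        name text   NOT NULL UNIQUE
    );

    -- key name -> numeric ID (simplified: assumes the key is already present)
    CREATE FUNCTION jsonbc_get_id(key_name text) RETURNS int AS $$
        SELECT id FROM jsonbc_dict WHERE name = key_name;
    $$ LANGUAGE sql STABLE;

    -- numeric ID -> key name
    CREATE FUNCTION jsonbc_get_name(key_id int) RETURNS text AS $$
        SELECT name FROM jsonbc_dict WHERE id = key_id;
    $$ LANGUAGE sql STABLE;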
29:21
So jsonbc gives quite a compact representation. This is the same document as a few slides before; the data is about the same, but the numerics are compressed, and the headers also take much less space because they are varbyte-encoded
29:42
and reference keys directly by their IDs. So it is quite a compact representation. Are you going to store that dictionary outside the field? Yes, outside. But when you have a lot of documents, the size of the dictionary is very small compared to them.
30:06
And you can see how much space customer reviews for one year take in JSONB and in arrays; in jsonbc it takes even less than in arrays,
30:21
because jsonbc also compresses the numerics, so it takes even less space. We have another idea as well: to extract everything
30:42
except the raw data into a schema, maintain a dictionary of schemas, and reduce each document to only the raw data plus a reference to its schema. This also has negative aspects. First, we have to maintain the dictionary of schemas.
31:02
Second, there could be very many schemas: in the worst case, as many schemas as documents. Even if you have a limited number of keys,
31:25
if every document has its own structure, with unlimited depth and so on, you could end up with an enormous number of schemas. So in the first approach you have the risk
31:40
of having very, very many keys, and in the second approach this risk is even larger, because the number of schemas can be very large. We don't even have a prototype yet,
32:03
only an estimation of how compact it could be: for the customer reviews, we get a 16% space saving in comparison
32:20
with the jsonbc extension. So we are in doubt whether this approach is worth the effort, or whether we should just concentrate on maintaining the dictionary of keys. Yes? With your dictionary of schemas,
32:42
would you in turn use a dictionary of keys, or would you keep them as they are? Well, this talk is about rethinking, and we are still weighing two possible approaches. We haven't selected anything in particular yet, but in further discussion we can probably
33:03
find something even better than both of them. I'd say one of the other problems with the schema approach, to give you an idea of a use case that would break it, because we actually have it in production: presumably, if you're storing schemas, people who don't store null keys,
33:22
that is, who simply don't store anything for a key they have no data for, produce a new schema every time. So if I have keys one to five, that's one schema; if I have keys one to four, that's a second schema; if I have keys two to five, that's a third schema. Yes. And there's the use case where you store values in the keys themselves.
33:41
Yes, then you have an unlimited number. So yes, if you have values stored in keys, then you probably don't want this kind of compression; just use current JSONB. I think the most promising approach is probably the straight dictionary of keys.
34:01
Probably. And that doesn't run into Josh's problem. From what we have seen so far, even in quite degenerate cases,
34:21
you see at most thousands of different schemas among millions of documents. Well, unless you're effectively storing data in the keys; then it isn't worth it for that particular use case, and it isn't a general solution.
34:41
Hold on, let me check that. No, I'm just kidding. Let's say it's at least thousands, and possibly a lot more. Is it so? Oof. 72 million, for that particular use case.
35:01
Interesting. So actually it's 50% as many schemas as they have documents, and that's just because when they don't have something for a key, they don't store the key at all. But if you're storing things as a JSON schema, a missing key makes it a different schema.
35:22
So you would have to change the application to store null keys; I don't know how popular that would be. OK, so we need to start some work on this very hard plan
35:51
for querying. We should have a prototype of this querying approach
36:05
for hstore quite soon, but for now it relies on a lot of crutches: the scan trees aren't implemented yet, and so on. So the interfaces of access methods and operator classes
36:28
need some rework, and I would like other Postgres hackers
36:41
to join us with their ideas about how to implement this better. We need your feedback. What should you do? We have shown you the problem; there are a lot of challenges. The problem appeared in ancient Postgres,
37:04
introduced very early on. With hstore we improved the situation, but we also opened Pandora's box, and with JSONB the problem is now out in the open.
37:21
We have exposed these problems, and they are a burden. So do you understand the problem? Yeah, this is a real problem. If you want Postgres to be extensible in a real way,
37:42
if we support JSONB, a complex object with structure, we need to understand what to do. For 9.4 we have JsQuery, which is a kind of workaround for the problem for some time.
38:01
But in the long-term future we have to solve it and have an extensible way to query JSONB with index support. The problem is that there is no standard for the query language. Recently I found one proposal for the SQL standard,
38:23
but it looks like a limited version of JsQuery, the same approach, so I don't think it's good. Another approach, which is closer to what we want, is Couchbase's N1QL; they have a more or less
38:45
similar language, but it doesn't support regular tables. I like the idea of building out the language. Yes; the proposed syntax is just sugar, so if you can suggest something better, please do.
39:03
Yes, because in general we have to support subselects for index search. In the proposed syntax we have the ANY keyword, which means that one element of the set
39:24
satisfies the expression, and EACH, which means that every element of the set satisfies the expression. But one could need, for instance, 3 of 10 elements to satisfy a given expression,
39:41
and you can't express that with ANY, or you would have to write many, many expressions. So we have to support subselects in some general way. Earlier we had the idea of a separate planner
40:04
node for "any element", but that particular node on its own doesn't give us indexing support. When we thought through the different possible situations,
40:22
we decided that the right way is to allow subselects to be accelerated by an index. In the slides, the proposal is about extending the operator class,
40:42
but it could also be possible not to extend the operator class and instead just add a new function to the interface. For instance, the access method could have a new function,
41:03
something like "amcansearch": it would get an expression on its input and return whether it can search for this expression using the given index or not. That's an alternative approach.
41:20
The access method could then pass that expression on to the operator class, so the planner asks the access method, the access method asks the operator class, and the answer goes back. But with this approach the details would be hidden inside the interface function:
41:43
you would see no structure of the supported queries in the system catalog. Whereas if you specify the operator class in the system catalog as described, then the set
42:03
of possible chains supported by the operator class is visible to the planner, the executor, and so on.
42:44
What about index support for common table expressions? Well, in general, what a query like that means is that
43:10
there is some unnest-like function which turns a container into its elements, and then there is some aggregate over them.
43:23
If we support custom aggregates in an extensible manner, then we are fine. You could create a custom aggregate, for instance
43:43
"three of them", or a custom aggregate with a parameter for how many elements should satisfy the condition, and then add support for that aggregate to the operator class.
44:01
So the idea is to register in the operator class not only the operators that can appear in a chain, but also pairs of unnest functions and aggregate functions. Then it would work, as in the sketch below.
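For instance, today a "3 of N elements match" condition has to be spelled out as an unnest plus an aggregate, which is exactly the function/aggregate pair the operator class would need to know about (table and column names assumed, as before):

    SELECT * FROM company c
    WHERE (
        SELECT count(*)
        FROM jsonb_array_elements(c.js -> 'staff') AS s
        WHERE (s ->> 'salary')::numeric > 1000
    ) >= 3;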
44:24
Any other questions? Is JsQuery, the JsQuery that you have on GitHub, pretty much ready to go? Yes. OK, very short question then. You haven't been talking much about it,
44:43
so I thought it was still in heavy development. We have given quite a few talks about JsQuery already, so in this talk we didn't say much about it,
45:00
but it's quite ready. You can check it out and send bug reports if you find some bugs, and we will fix them. We don't want to have JsQuery in core,
45:21
because in core we would like to have some extensible way of querying JSON. But I can use JsQuery right now? Yes. I mean, assuming you want to go in this direction, what we're going to need to do
45:40
is to sort of work out what can be done in manageable chunks rather than landing everything in one great big hit. Yes. So one possible path is, at first, not to touch subselects
46:01
and just support chains of operators, for instance just for hstore. But even that could be quite heavy, because we still need the interface changes.
46:25
Actually, the interface changes could be done before the support of operator chains. For instance, we could support scan trees
46:42
being passed into access methods, and that alone would already have some benefits, for instance OR inside GIN. Or, when we change the interface of GIN operator
47:02
classes, we could get efficient range searches for btree_gin. You know that if you search with btree_gin for column greater than 0 and column less than 10, it will find all the values
47:24
greater than 0 and all the values less than 10, and then intersect them, because all the quals are passed to the operator class separately.
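A small illustration of the btree_gin case (the table here is just an assumption):

    CREATE EXTENSION IF NOT EXISTS btree_gin;
    CREATE TABLE t (i int);
    CREATE INDEX ON t USING gin (i);

    -- executed today as two partial-match scans (i > 0 and i < 10)
    -- whose results are intersected afterwards
    SELECT * FROM t WHERE i > 0 AND i < 10;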
47:40
So the GIN operator class can't see that it's a range; it just sees two inequalities separately, and then GIN does two partial-match scans and intersects them itself, which is very inefficient. If we could pass the whole structure of the index search
48:07
query into the operator class, we would get these benefits. So we could do this in the following steps: first, do the infrastructure changes for access methods
48:23
and operator classes; then add support for operator chains; and then add support for subselects. The problem is that we don't know what the overhead in the planner
48:41
could be, so we need to test it on complex queries. But we hope the planner will cope, because what we specify in the operator class is a kind of grammar,
49:02
and that grammar works not on raw text but on an already parsed tree, and evaluating a small automaton over a parsed tree should not be so hard.
49:22
We hope so, at least. Thank you.