Logo TIB AV-Portal Logo TIB AV-Portal

I know greater-than-or-equal-to when I see it!

Video in TIB AV-Portal: I know greater-than-or-equal-to when I see it!

Formal Metadata

I know greater-than-or-equal-to when I see it!
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Place
Ottawa, Canada

Content Metadata

Subject Area
A tour of the operator class facility A PostgreSQL operator class tells the database server how to sort, match, and index values of a particular data type, and an operator family ties related data types together for the purpose of these operations. They constitute agreements between data type authors and data type-independent code, with most data types defining at least one operator class. This talk will explore the theory behind operator classes and families, including the assumptions data type-independent code is and is not entitled to make based on available operator classes. We will walk through the creation of new operator classes, some practical and others deliberately perverse, and examine some exceptional operator classes already present in PostgreSQL. Writing a query expression such as "val greater than = 1" without knowing the data type of "val" is almost always a mistake. It will work for the vast majority of data types, but it will malfunction in rare cases. We will examine the use of operator classes to adapt such an expression to each data type. This talk caters to writers of code intended to deal with unforeseen data types. Application code such as replication systems and client interfaces are affected, as are backend features such as integrity constraints and join algorithms. As an author of such software, you will leave the talk equipped to audit for mistakes and substitute code that harnesses operator classes and families to process values of arbitrary type. The talk will also help prospective authors of new data types.
area classes choice enterprises Context matchings cellular interactive knot part Operations Indexable dual Computer animation software operations data types classes systems
Observation Context concurrent code time scans loss argument part independent Operations hash core square extent classes systems area constraints information systems The list bits Zugangsverfahren processes hash orders data types Results record classical classes Slides functionality non-existence Barriers table fields elements specific terms operations Indexable contrast structure association tasks conditions data types addition default comparison key information unique operations Databases applications timestamps classes Indexable words Computer animation Query functions topology statements key objects table family
Context equal singularities water Ordinary Differential Equations Operations different DAS errors classes systems The list open subsets category input organization sort data types systems Results record classes functionality server Regular Expressions The list inequalities number production terms operations Indexable integers man data types comparison focus Gleichheitszeichen operations total catalog applications classes inclusion redundancy Computer animation Query functions topology statements wireless discrepancy integers family
Observation Context code equal time singularities argument shape part Operations image orders different single pattern classes area The list sampling bits measures alternatives hash orders triangle pattern sort data types classes classical control Barriers diagonals patched characteristics number naturally operations Representation integers structure optimal comparison Multiple validation surface polygons operations volume limitations exploitation Indexable Computer animation different operating topology vertices speech table
classes Actions functionality decision singularities similar water part Operations orders memory operations case pattern classes systems conditions data types default comparison key Stream operations effects inclusion Indexable numerous Computer animation hash Query case topology orders pattern sort box data types distinguish boil Results record
choice Context Actions equal cloud platforms perspective Operations fraction sign PIT different case core sets scratch errors systems classes exception hard Development The list effects meetings scans category means processes hash orders sort boil geometric point Slides similar help CAP events rules templates goodness Authorization Representation ORM data types comparison default matchings Stream operations code applications SCSI classes software Query case functions topology statements libraries Observation code decision time directions acidication sets loss functions data replication part total Semantics strategy hash box series area Cover algorithm PCI 3rd backend open subsets configuration website Right data types distinguish record classes classical non-existence functionality server services inequalities objects naturally operations testing optimal conditions modules response validation Gleichheitszeichen expressive front end plan Indexable voting Computer animation key box
Context time singularities ones sets clients Operations splitting analog hash core classes recognition constraints information systems storage elements hash computer scientist input sort data types geometric classes web pages point functionality implementation server control storage elements number terms operations level integers conditions architecture data types default comparison response key demo logical consistency operations basis classes Indexable Location voting Computer animation logic Query case functions topology objects table integers
but also good afternoon my name is no match you little background about myself I made PostgreSQL letter and enterprise TV employee and here today to talk about plasticity operators class facility operator classes are part of the provision post rescuers made for its users to extend the postcritical indexing capabilities solves the tell about a few other aspects of PostgreSQL you'll indexing interactive upper classes heavily it will take a closer look at the SQL used to define operator classes and I'll discuss the implications of some the design choices you can make when defining the operator classes and discusses the context of familiar data types while operator classes did originate in the context of indexing system now uses them and several other areas of go through those to be aware of them and finally I'll be talking about the implications for the cell for you right and how you can make sure that your software makes the assumptions that it's entitled to make and no more
1 of the hallmarks of PostgreSQL on generals its extensibility and this is particularly strong in the context of its indexing capabilities there to relevant layers there the 1st is the index access method layer and the 2nd is the operator class the index access method where currently the 5 index access that use of the tree hash this gene and but it is theoretically possible that your own as a user but in practice they mostly been added to the core code as part of major releases some of you may have seen a talk just the last slide about a proposal to add to the x axis so expect that was to grow in the future but in the context of 1 major release that's that's the list as it stands today now in order to index values of a particular data type the system requires certain knowledge particularly that the index access methods do not know about specific data types instead they constantly consulted database object called and operator class to get the data type specific knowledge they need in order to index values of a particular data type now in contrast user extension in this area is quite alive and well if you find an extension of that an extension that adds a datatype place like PostgreSQL world trade barrier for pDx you regularly find that they include additional offers classes so that flexibility definitely being used in in the field what is an operator classics glue between an index access method and addicted to it provides the user access method with the operations of requires on that data types and the facts that requires in order to do the full task of indexing values of that data type specifically that the blue consists of functions and operators of the details of which are specific to each index access now for most of this talk obese focusing on the the tree index access method that is the most prominent and it's the d fault if you wish to create index statement without specifically naming different x axis that also the only 1 that supports uniqueness so it's the only 1 used in primary key constraints and the constraints at this time but if we as much time at the end of you know and a couple of other axis methods that so to give a bit of background on and how the tree works it creates a data structure and this boils down to a to a method for maintaining a sorted list and it does so in a way that's relatively efficient in the face of contrary modifications concurrent indexes and updates to that list so the data specific information required by the tree is the bill is that is a function that's able to place objects of the data type in a word pairs function function takes 1 2 arguments the datatype question returns an indication of whether the 1st is part of the arguments is less than or equal to or greater than sex we consider these SQL statements here each of these make some reference operator classes in order order to do this job when you should be table statement the requests primary key system what's up what's called the default the tree operator class associated with the datatype that call and associates that operated classic indexical parades in order to implement this constraint then whenever you insert new rows into a table course not index entries for each of those rows so in the 1st synset statement runs it's created the 1st row and the index there's no need to compare and since
the job of this B-tree indexes to maintain the values found in the colonies in order when the system executes the 2nd insert has to decide does january 1st 2015 come before or after January 1st 2014 so it's the comparison function found in that operator class that has the knowledge that January 1st 2015 is to be considered greater than January 1st 2014 what happens when a query for the Query planner it looks at the operators that appear in the where clause and it examines their membership operator classes to find opportunities to use and extends to satisfy the square in this particular query that less than operator happens to be a member of the same operator class used in the index that implements the primary key it's about observation tells the planet that an index scan on that index can possibly satisfy the square there's a tightly related concept operator classes called operator families and their role is to extend the operations that operator classes due to cover situations that involve more than 1 day when you're indexing you're only dealing with 1 type is a particular column has a particular data type that's a long term thing however when you're actually searching and indexed search condition may or may not be that exact same datatype and operator families adding in the concept of comparing between different they considered a slight modification of the queries we saw previous slide there's an operator family called time on which ties together all the comparisons between 3 data types that they data and the 2 times death as a result of the existence of an operator family the system knows that it can search through an index that was originally done and date value using a timestamp search now the
this is the SQL statement used to create operator family man created operator class this year is is simply a mimic of an operator class already exists in the base system namely the D. fall off the tree operator class for the integer data after some preliminary material it specifies the name the outside the access method which supplies the bulk of the material is a list of functions and operators associated ordinal numbers now the ordinal numbers that you'd expect to appear depend on the axis this is what is typical for beaches there's 1 mandatory function entry and 5 mandatory ob ob operator the function entry is 0 is that comparison function I mentioned earlier takes 2 values the datatype tells you whether the 1st is less than equal to or greater than the 2nd that's used actually maintain the it's actually find where when Roebel wireless relative to the other rows that already exist in that the 5 operators use for a different role there used to to decide when a particular query can use that X 1 a particular where clause is applicable search that text these 5 operators are expected to have the mathematical properties of the total water knowledge that is to say it each of them must be trans and given to input values you must get a true result from either operator 1 or operator 3 or operator 5 false result from the other 2 if you're saying have operator 1 and operated 3 return true indicating that the values about less than and equal to each other be an error in the definition of the operator class no it may seem kind of redundant to have to explicitly say that the operator can that looks like an equal sign is in fact inequality as well it turns out that the system otherwise does not make too many assumptions about the behavior of operator based on its name as far as the rest of the system is concerned you can create an operator that's named with equal sign and do multiplication the system has no problem with that it's when you actually specify operators in a crate operator class stating that the system is now given liberty to assume these have the mathematical properties you expect of Paris but for the rest of this talk and a couple terms fairly carefully when I refer to the equal sign operator I'm specifying the operator whose name consists of a single equal sign regardless of the behavior that it when I use the term equality operator I'm referring to an operator regardless of its name that behaves like a mathematical equality now when you look at it in this context you see you the create operator family statements pretty trivial to create operator class statements were all and things have a 1 thing to be aware of is that once the statement is done executed system catalog entries
have a difference of focus the system catalog entries a focus on the operator family so each of those up those operator alliance will create an entry the PGA M. catalog and any function 1 create an entry in the PGA and Proc catalog those will actually be tied up to the operator family and the operator classes assorted treated like leaves off the operator the this sort of discrepancy between the organization of the SQL and the organization the catalogs is probably a historical 1 operator families were introduced in 8 . 3 whereas operator classes in products beginning last and so when operator families are introduced the catalogs change more than the SQL Server text the system does not
limit you to a single operator class for each data type and access and if you define more than 1 you can only define 1 of them being the fall of the and find the others and that they represent in the context of Peachtree alternative sort orders for particular data now for some data types this is kind of a hard concept to that the integers have 1 natural source now you could do something crazy like order by the number of 1 bits and its representation or something like that but you really have to strain the think of another way to sort images that's not just silly but a datatype where there are multiple valid sort orders is the poly-A so consider these 3 shapes here and Nagin we created a table of columns and enter the vertices of the shapes in the polygons that column we like to venture a guess of which of these 3 shapes is going to be considered greatest according to the greater operate when applied the drawing triangles in Uranus why do you think that OK well aid anywhere else on gas under nobody gets in there and why is that well all very true observations and a trick question there is no there is no greater than operator for polygon that's all there is an equal sign up and it seems that have but you also have pretty much the kind of comparisons that that I think there are credible for these and thereby parameters well things like that is also comes up in the natural world questions like What is the largest lake well Lake Bikel the largest volume Lake Superior has a large area it even becomes of exotic measures like 1 surface diagonal which is 10 unique is the largest I namely hydrologists just think about this stuff sometimes that's how this flexibility is not something that's actually used to much in the court code and 1 example of it being used as a 90 fold operator classical text pattern parts this is a bit obsolescent even sensor 9 . 1 inches keyword but it too existed for a long time and still exist for backward compatibility support like the optimization like operator I think 1 reason that hasn't been popular it had non-default the tree operator classes is that you need a different list of those 5 operators for each 1 of them were released about 4 sometimes you can reuse the equality operator in less than half the trouble is that in order to have to let to less than operators that take the same data even need to have different names in not actually be the last example and they need to be in different schemas so there's really only 1 operator named people like to use when talking about less than that's less than current so I think that's been a barrier to sit exploitation of speech in a similar problem affects hash indexes where it's not a problem is for all the other operators classes Gene destined as the years don't have this problem that you necessarily need use different operators each upper class but hopefully if we have extra time i'll get into that moribund basically those operator classes you give the person defining the operator class more control over structural aspects of the index of convert sample had to operator classes supporting the exact same operators but with different performance characteristics as not something that arises in the context of the tree patch
as I said well operator classes originated in the context of indexing now ground affects several other
parts of system memory go through those are the most prominent is water by which users typically tree operator classes you right order by x the system looks up the datatype that's and looks at the the default B-tree operator classes that data and then uses the comparison function specified by the and operator class to put the values in order but there's also rarely use syntax to specify which operate class you want a little squiggle there is the less-than operator of the text pattern obs operator class that is twice a lesson operator it's listed as operator what you can sort of get the idea of why people don't like he's ultimate operator names convey the predefined we're SQL queries yeah well after Pearl you know about the stress with and there's a new way that operator classes and I can play play a part in order by introducing 9 . 1 I believe where an operator class can declare that if you have a query against an index column being applied to an operator that the index is then able to return values in order given by an operator this is added to support nearest neighbor queries so this is the k-nearest neighbor text search query the ordering by the some of similarity to a particular search conditions this is currently was supported by them just stop the class or to the guest index access but that may change another quite important use of operator classes beyond indexing is free quality is come up in numerous parts of the system things like union group by for example in processing group by the system receives a new row pesticide does this rope along with some existing group does row started down and have to make that decision and the system needs to effectively compare that value the group key to the group keys of existing groups the way it does that is using an operator class either of the tree operator class or hatch class now this is sort of invisible in many cases 1 case where it becomes apparent that something non-trivial is happening using dealing with the data type that can have values that are considered equal even though there distinguishable given example numerical type values 1 . 0 1 . 0 0 are considered equal that's a good thing because otherwise you'd have to say 1 of less than the other that is only a half but they are distinguishable that if you store them both in column and check the bytes stored and there's different bytes are in this region is that now the tree and hash operators classes of the numeric take notice they know that those values are equally distinguishable you see this in result of this so this is a select distinct query which is boils down to group by and as a result of those 2 being considered equal we only get 2 rows and the out if there instead being grouped on whether they're indistinguishable we'd see all 3 values again not actually a surprising result per say but 1 of the things that shows is that if you have a data type that has this kind of situation where values of distinguishable yet equal need be very careful about where you draw that line in and consider 2 values to be equal because that's going to have effect in all kinds of other parts of the system going effect by it's going to affect union of all these parts of the systems refer your default the tree operator class are going to make decisions based on fact yeah here of you will you may
actually know what the system does that I I would get you might just truncate the trailing zeros when it calculates the hash value basically the yeah the the default has shot glasses Americans you specifies its own hash function does more than just hospital raw bytes and you you may be looking at all of the you know that yeah this was on all of loss you would the yes that yeah I think that that is the difficulty in ever treating text values that are distinguishable is equal to the cover the hash function knows how to represent that of the the ash to the same value for the service right something that unfortunately picture is not as clean as I just painted it and that there are parts of the system that you might well expect to be the same as things like by unions are but they actually use the equal sign operators post using inequality operate these include expressions like in that is distinct from the case x when also and all if is in this category in that you might expect them to be doing and quality comparison but you actually just get equal sign operator whatever that may be now this sneaky because most popular datatypes have an equal sign operator that is also the quality of so you don't notice that it's a good test data types that exist exhibits the difference is the box the type of box datatype geometric representation of a box but it is no beef tree operator classes no hash operator class does have an equal sign operator which compares area so 1st of these queries use Foursquare users select distinct which uses the default be people hatch operator class associated with boxed oops there is none so great fails the 2nd example uses the in expression and here you get error comparison so you know this new these queries that any operators in all any like equal sign anywhere so this is just illustrating how the things that are introduced through through operators library classes can creep and other behaviors system no other major things that separate classes we use for joints so processing emerged join that's only possible because the observation that the join condition that's the ball the equal sign that query is a member of a B-tree operator class planner notices that has emerged on the list of things you can possibly do also uses the comparison function to drive of sort nodes in the event that actually chooses to do the we merge join similar thing with hash join and full hash operator classes fractions has suffered classes generally don't have the people want that's the exact same queries we saw the previous slide and here the the hash operator classes existence allows the planner generated the hash indexes themselves actual on this fast indexes those have been deprecated for some time however hash joins are not going to be deprecated anytime and academically foreseeable future so almost all data types that includes B-tree operator classes also include action of class if you're curious for testing purposes 1 exception to that is the money data types in the court system has a B-tree operator class does not have action class I'm not sure very so everything I said so far has really been focused on the perspective of the system what is the system do based on the upper classes that have been defined benefits about which is your application due to make the right decisions based on what I've talked about so far but if there's 1 lesson I can drive home 1 thing I'd like you to take away it's that anytime you write a equal b a equals sign be that only has well-defined meaning if you know the data types if you're using that as a template for values of unknown data type behavior that statement is not well-defined leads the system that makes no provision to keep that in the world so In order to start doing better reaction to answer another question 1st question is what we call the semantics does your application really need and the kind of applications and thinking of here actually things like object-relational mappers caching layers replication system things that deal with data types of the original authors may not have had in mind in and have access to because of the that that layoffs offers being employed site that has its own custom data type with the house and I tight about a couple notions of equality equal sign quality which you never want to use Beecher hash shoppers classic quality which is reasonable use used that many parts of the system is also kind of a 3rd century quality that I call exactMatch and this is only treating things is equal if they're in fact indistinguishable in this world 1 . 0 is not equal to 1 that is the system uses this in a couple places 1 of them is in the context of processing or row update it uses exact match a quality and indexed columns to decide whether a new index entry is required that's the hotter he only tuple optimization but it's also used in a couple places the Query planner and its use in materialized you refresh coming up for now the system actually implements that using a binary comparison of the data that will be stored on disk but if you want to use exact match a quality on in your code your application side so you can do it by doing a bite wise comparison of the output of the 2 data there are some caveats to this because their social service settings that change the output of values things like date style and extra floated it's so you make sure set those in a disciplined
way for going to this kind of comparison but that's how you do it the question to ask yourself is given as part of my software do I want to treat 1 . 0 and 1 . 0 0 6 so for example you caching layer and doing intestine elevation human error on the side and validation if you get values that are equal but distinguishable whereas in other cases like something that's going to be tied to a primary key index you might wanna use operator class-based quality instead you have the same semantics as the index you live in if you if I among those choices your choice is to use operator class a quality than this question of any action look up the operator which used to do that now if the code runs in the back and there's a module called typecast which is convenient API for retrieving it the algorithm is used boils down to look for a D fold the tree operator class if 1 exists using quote equality operator that operators class otherwise look for hash operator classes is its quality operate you can implement that you can implement that same are at the front end using a system cannot query David now I believe is not distinct from just uses the equal sign up and Gracia so yeah that similar names not similar underlying the data the the choice to use equal sign operators and things like is distinct from and in in I think was probably a mistake probably mistake too late to fix but I'm having for designing at all from scratch again would probably be systematic is 1 way to look aside the question of that's true yes it's intermediate yeah because at the making of the development of the material I've used right that you wanted use is distinct from around this problem was over all the those people in serve that of and finally note that not all types have the a notion of quality at all XML doesn't stay son doesn't for practical purposes you consider box doesn't because it's equal sign operators declared in the opposite class so days on the that's that's a big step forward but yet so whenever writing software like this have a fallback plan unless you're willing to just the support anything is the data that's not covered here but there are enough of them that's where that always 1 yeah i in in something like PLP do SQL code I don't think there is this found in seeking the release of her to find that out the rut run into the sand together with area that sounds like a good idea of what it was like those that apply to that that the that kind of thing that I think that the Kneser-Ney series is what's the date style problem stop so their islands of of there's another side it is the responsibility of voter implementing a data type an acid they a set of the opposite classes so as I said it's important to pick carefully where quality semantics opposite effect everything that provide great comparisons all that stuff but many data types that have sort of ambiguous choice in this area actually declined to include any kind of comparison function so it's not haptic it I do encourage you to make considerable efforts to do come up with the notion of a quality fidelity because without that can and that seeking group on it here is pretty limited many things to so even if it's hard trying to come the quality of dividing tree opposite class of if once you've done that you may as well including default hash upper class usually relatively simple at that point whether use up operator classes of other index access methods varies more in the nature of the type of a gene if you're if you're datatype has the nature of being a container like reaI XML also follows category that's a good clue to look at opportunities to add access method or excuse me aging upper class just it's a little hard to guess it's a little hard to generalize from what I've seen the development of GIS classes usually come the opposite direction that someone thinks up a new search strategy and says 0 what data types is search search strategy appropriate for an ad that I can give general rule about when you stop at class help that's the core of the material I wanted to cover reduced next time out that more than I can cover but I'd like to take a pause for any questions folks have this time in car there is a lot more now than the server that all the way to the end of but if you want to serve yes so what you do at that the people around the world and that was the well this is the guy is not necessary for 1
yeah that that's a good point and when you define medieval B-tree average class OK if you're less than greater than a somewhat arbitrary detectors at a reasonable sense of comparison that I think there are probably some other types of basis the same recognition step or the sheer yet a few more things you can check out if you like to learn more about this topic the 1st page listed here is the core documentation for about the 1st half of what I talked about today goes in more detail than I wanted to today great reference you like to learn about other operator class other index access methods and how the upper classes are created a red references the ones the ship with post rescue well and particularly for gene guests there these control bottles that we implement the treaty in terms of those of those access methods those great basic demos if you like to see sort of with guest is capable of that you can't just leave you with the tree geometry datatypes have guessed operator classes that illustrate that pretty nicely but if you like to have a sense of how to write the client there's actually piece of server that recommend looking at namely the foreign key implementation if you start at the server function called a at foreign key constraint you can see how it picks the operators that will later use to consistently implement back strain I was quickly go through a couple other access methods where they're operator class looks like hash is the simplest of the access methods always as one function 1 operator of the operators of quality the function is a hash function in the same sense that any computer science textbook discusses discuss attached a little more interesting is gene operator class gene operator classes of for mandatory functions but no thing that always differs about them is almost always different about them is that the storage clause of genus targeted towards cases of objects that have substructure object that can be split into several what's called the so for example raise this is an American built and operated class for gene index integer race the storage clause indicates the data type keys that are extracted from the input values so start with an input of an array of integers splits keys each key is a particularly integer the architecture of a gene index is of the tree of features so the top level of the tree is indexed by keys and each key is then associated with the B Tree of locations in the in the table that contains somehow at all and that's a there are some simplifications with in in the gene implementation of the cases that have fewer keys but that's the general case features feature function 1 serves the exact same function and serves the tree access had to it a compares 2 values of storage time and is used to maintain that top-level features in almost every case you can simply use the same function 1 using the default the tree operator class of your storage to functions 2 and 3 year responsible for splitting in input value into its constituent keys like splitting an array into the elements that erases speak most interesting function is function for the so called consistent function this plays a role is actually doesn't have any direct analog in the context of the trees in that this function allows the set of operators supported by gene operator class to be flexible and not be understood in advance by the access so there can be any number of operators name gene operator Alaska 1 operators doesn't operators gene itself has no preconceived notions about what operators will here it's actually this consistent function that's responsible for understanding all the operators specified in this opera last so when a query is run in the planner determines the gene index can satisfy that query every time you want to ask the question is this particular search key consistent with this particular value in the the index that is good this this search condition return true given this particular search value of this particular index entries calls that consist of function passes the ordinal the operator that's being used in this particular search and that consistent functions were responsible for the logic of interpreting the operation represented by the operator and this is kind of what I was talking about earlier about how you have considerably more control the context the mn puts us about time so any any last questions 1 thank you for coming and going to oppose the thank