
PostgreSQL in Django 1.8


Formal Metadata

Title
PostgreSQL in Django 1.8
Part Number
14
Number of Parts
46
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Among the topics are: A survey of the new Django 1.8 PostgreSQL features. Using migrations with PostgreSQL in interesting ways. Real-life applications of the new field types. Basic model design for good performance on PostgreSQL.
Transcript: English (auto-generated)
Okay, I'm going to plunge right on in because we have so much stuff to talk about.
Specifically, we're going to plunge right in on two things: Django 1.7 introduced native migrations, and Django 1.8 introduced django.contrib.postgres. There's so much cool stuff in here and so little time, so let's plunge forward. The first thing I want to say is that most of what I'm going to talk about in Django came out of a crowdfunded project (Kickstarter or Indiegogo, I forget, I think it was Kickstarter), and its author deserves endless praise for these wonderful new features. So first I'm going to talk about migrations, and everyone's going to say, wait, migrations aren't Postgres-specific, why are you talking about migrations? And the reason is that, for some of the stuff we're going to talk about,
they're an enabler for other things, and Django 1.7 migrations are just amazing. So if you haven't used them, just go to your laptop right now, I won't be offended, and move to using migrations. And thanks, Andrew, for those. So, a quick overview for anyone who hasn't used the 1.7 migrations
or have been treating them like a black box: migrations are built around this notion of operations. Each operation in a migration moves the database backward or forward through its schema migration timeline, so it adds a field, adds a table that corresponds to a model, does something like that. There are lots of individual operations,
but there are two super interesting ones from our point of view, which are RunSQL and CreateExtension. These are operations that come with the core Django framework now, and you can use them in your own migrations. RunSQL, you can probably guess what it does.
It applies raw SQL directly to the database, and now everyone's saying, oh god, it's one of those database guys talking about raw SQL, I'm going to lunch again. But really, work with me here, it's very, very handy. Specifically, it's very useful for things you can't do directly using the models yet, like creating indexes for some of the new types we're going to talk about. There's also a CreateExtension operation, which runs a CREATE EXTENSION command; you can sort of see a pattern forming in this naming convention. The extension mechanism in Postgres, which was introduced in 9.1 or something like that, is a little bit like pip for Postgres, in that it allows you to add packaged extensions to your database. Previously, you had to drop a .so file here and run some SQL there and do all this stuff, and it was just horrible. Specifically, we're going to talk about the hstore extension, which needs to be added to your database before use.
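As a taste, a migration that installs it might look roughly like this. This is a minimal sketch; the app name "myapp" and the dependency are made up.

from django.db import migrations
from django.contrib.postgres.operations import HStoreExtension

class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0001_initial'),  # hypothetical previous migration
    ]

    operations = [
        # CreateExtension('hstore') would also work; HStoreExtension is
        # the convenience operation from django.contrib.postgres.operations.
        HStoreExtension(),
    ]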
It doesn't come with the core Postgres installation. A lot of this will make more sense later; at the moment, I'm just teasing you. One thing to notice about hstore is that if you have it, it adds a query every time you connect to the database, but we'll talk about that also. So, that's all you need to know
about the new extension facility. Let's talk about 1.8. 1.8 adds some new field types, which are array fields, range fields, and an hstore field. 1.9 is going to add some more cool stuff, but this is what we get in 1.8. So, an array field. In Postgres, since pretty much the dawn of time,
arrays are first class types. You can have a single column that is an array. Now, old school database people, like their heads explode at this point because this is denormalization and this is horrible. Forget those people, they're old and in the way. This is a really, really cool feature.
An array field is now a field that you can import from django.contrib.postgres, and it lets you use those directly. Before, there were tons of things you could find on Django Snippets and the like that would let you do it, but now it's fresh out of the box. And these map into Python arrays. Why I said array here is because I'm old and in the way.
What I mean, of course, is a list. One thing to remember is that Postgres arrays are of homogeneous type, unlike Python lists, which are not. So you pick a type, ints, string, something, and that's what you get in a Postgres array. Python will let you put, of course,
anything you want in a list, but don't do that if this is what you're trying to model. The other thing is that Postgres multidimensional arrays are rectangular. Remember Fortran? Just like that; isn't it great that we're honoring our ancestors in this way?
So, they're traditional multidimensional arrays. You say, I'm gonna create a three-dimensional array with dimensions of five, seven, and 10, and the whole thing appears and is populated for you. Individual entries can be null if you need to represent a hole in the array, but it's not like a list of list structures where each individual element could be any size you want.
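To make that concrete, here's a minimal model sketch; the model and field names are invented, but the ArrayField import is the real one:

from django.db import models
from django.contrib.postgres.fields import ArrayField

class Reading(models.Model):
    recorded_at = models.DateTimeField()
    # Homogeneous element type: every entry is a float.
    # size is an optional maximum length for the array.
    samples = ArrayField(models.FloatField(), size=100)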
So once you've got it, as the dog chasing the car said, now that you've caught it, what are you going to do with it? You have array field queries. The basic one is contains. The contains matches if the array field on the left contains all of the entries of the list on the right.
So ['a', 'b', 'c'] contains ['a', 'b'], but it does not contain ['a', 'd']. Order is not important in a contains query, so the entries can appear in any order you want. You also have contained_by, which is effectively the reverse:
it matches if the list on the right contains all the entries of the field on the left. So ['a', 'b', 'c'] is not contained by ['a', 'b'], but ['a', 'b'] is contained by ['a', 'b', 'c']. Order's not important here either. Okay, that's easy.
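In ORM terms, assuming a hypothetical Post model with tags = ArrayField(models.CharField(max_length=10)), those two lookups read something like this:

# Rows whose tags include both 'a' and 'b', in any order:
Post.objects.filter(tags__contains=['a', 'b'])

# Rows whose tags are all drawn from ['a', 'b', 'c']:
Post.objects.filter(tags__contained_by=['a', 'b', 'c'])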
And then you have overlap, which is the "any" version of this: ['a', 'b'] overlaps ['a', 'd'], but it does not overlap ['c', 'd']. Those lookups return a boolean. Then you also have the __len predicate, which returns the length of the field on the left
as an integer. This is kind of approximate syntax; you wouldn't actually write it like this in your Python, but you get the idea: if a row is storing a two-element value and you apply length to it, you get two. Now, something to remember about this is that unless you've created an expression index, which we'll talk about in a bit, this is going to do a full table scan.
It will pick up every single row in the table, rattle through it to calculate the length, and filter on it. So you probably don't want to do that on a big table. You can also apply transformations to it, such as an index. For example, filtering on __0 matches everything whose first element is the string 'a'. If there's no entry zero (for example, the array is of length zero), it doesn't match; it simply returns false, there's no error. So if you put five million there, and there are no five-million-length arrays, you'll just get back false for everything. One thing is that you can't specify the position programmatically; you build the zero into the lookup name. Unless you do string substitution and build the kwargs yourself, of course. You can also slice the array, which is pretty cool. You can say: anything whose entries zero to one equal this, or, probably more usefully, whose entries zero to two contain 'a'.
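Sketched against the same hypothetical Post model, the rest of these look roughly like so:

# At least one element in common:
Post.objects.filter(tags__overlap=['a', 'd'])

# Arrays of exactly two elements (a full table scan without an expression index):
Post.objects.filter(tags__len=2)

# First element is 'a'; shorter arrays simply don't match, no error:
Post.objects.filter(tags__0='a')

# Slice [0:2] equals ['a', 'b']:
Post.objects.filter(tags__0_2=['a', 'b'])

# Building the position kwarg programmatically:
position = 0
Post.objects.filter(**{'tags__%d' % position: 'a'})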
You can also index array fields. So, you can just say db_index=True, and you're done, right? Wrong, sorry. It creates a B-tree index on the array, which is pretty useless for an array. So this is one of the downsides right now of using these: if you want indexing on them, and you almost certainly do want indexing, you have to do some special magic. So let's talk about that. But first, let's talk about how Postgres does indexing. Postgres supports different kinds of index, which most people who are just using the Django ORM never see, because you only get one kind out of it (unless you're using GeoDjango), which is a B-tree index. B-tree indexes are great. They're really nearly the optimal solution to a particular problem: they're very fast, they're compact, and they provide total ordering. So, you can walk a B-tree index up one side
and down the other. You can use it to accelerate queries like greater than, less than, or equal. B-tree indexes are great, but they're not perfect. Specifically, a B-tree index requires a totally ordered type, like integers. Every integer is greater than, less than, or equal to another integer. Those operations apply to any two integers.
Strings are totally ordered. Floats are totally ordered. Points, arrays, hstores, and things like that are not totally ordered. What does it mean for one array to be greater than or less than another? Well, you can make up something. You can say, well, we're gonna compare the elements in order and then, you know, all that. But most people don't use arrays that way. That's not a very interesting way of using an array.
What you do do on arrays are things like inclusion, like the contains operation. What you want to say is: I want to find all the instances of an array that contain this particular element, no matter where it is in the array. So Postgres has you covered. It has two different types of indexes,
which are GiST and GIN indexes. For those studying along at home, GiST stands for Generalized Search Tree, and GIN stands for Generalized Inverted Index. So now you know. GIN indexes are generally used for types
that contain keys and values. Arrays, in that case, they just contain values, not keys. Hstores and JSON are good examples of things that are key value pairs or just value lists. GIN indexes are generally used for that kind of data structure. GIST indexes are generally used for types that partition a mathematical space.
So a point or a range or a rectangle is an example of something that would be indexed using GiST. If you want, talk to me in the hall and I can explain the details; for the moment, just roll with it. And the good part is, once they're created, they just work. You don't have to do anything magic to them once you have them.
They're updated and maintained and managed and dumped and restored; all the good stuff happens to them in Postgres. So arrays support GIN indexes, generalized inverted indexes. The nice part is these accelerate contains, contained_by, and overlap. Postgres will use the index as appropriate to make these guys go faster.
It doesn't help length or slice operations though, so be aware of that. What this looks like at the SQL level is: CREATE INDEX ON app_model USING gin (field). That USING gin is the part that indicates that what you want is a GIN-type index.
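There's no model-level way to ask for this yet, so it goes in a RunSQL migration; a sketch, with the table, field, and index names as placeholders:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('app', '0002_add_array_field'),  # hypothetical
    ]

    operations = [
        migrations.RunSQL(
            "CREATE INDEX app_model_field_gin ON app_model USING gin (field);",
            reverse_sql="DROP INDEX app_model_field_gin;",
        ),
    ]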
One thing to note is GINs can be large, especially if there's a lot of data in the underlying table. And they're not free to update. Just don't run around creating them just because. Create them if you are going to be doing the operations that will be accelerated by them. Specifically, if it's a small table, don't create a GIN index.
Unless it's a pedagogical exercise for yourself. Let's say you want to index length. For example, a very common query is, get me all the arrays where the length is greater than seven. I'm having a hard time coming up with an example of why you'd want to do this, but the world is wide.
So people do this. What you can do is create an expression index. array_length is Postgres's array-length function. Along with the field, you have to tell it which dimension you want the length of (remember, multidimensional arrays), so dimension one. And dimension numbers are one-based.
Why yes, Postgres was invented in the early 90s. Why do you ask? You can also index slice operations like that. So if you're going to be doing constant queries on does this slice equal that slice, you can do those. The double parens indicate this is an expression index. So it's actually going to calculate this expression for each entry in the table,
and stuff it into an index for you. Postgres is smart enough that when you do the query, and it sees a matching expression, it'll use the index instead of running the expression again. So that's pretty cool. And also, Postgres arrays are one-based as well. What can I say? It seemed like the thing to do at the time.
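As RunSQL operations, those expression indexes might look roughly like this; placeholder names again, and note the doubled parentheses marking an expression:

from django.db import migrations

length_index = migrations.RunSQL(
    # Index the length of dimension 1 to accelerate __len queries.
    "CREATE INDEX app_model_field_len ON app_model ((array_length(field, 1)));",
    reverse_sql="DROP INDEX app_model_field_len;",
)

slice_index = migrations.RunSQL(
    # Index a slice; remember, Postgres array subscripts are 1-based.
    "CREATE INDEX app_model_field_slice ON app_model ((field[1:2]));",
    reverse_sql="DROP INDEX app_model_field_slice;",
)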
Okay, so now we have all this machinery for doing array fields, but why would you ever want to use an array field? Well, the first one is, the underlying data really is an array. You want to store an array. You're getting things like, for example, you know, a very common situation is, you're recording raw sensor input from a sensor, and the sensor is saying, okay, at this timestamp,
I recorded 23 samples, here they are. And you want to just dump this into a field. You could denormalize, you could normalize it, sorry, not denormalizing it, and create two tables, one for the base sensor and one for each of those, but wow, that's a lot of overhead. So just stuff it in an array field. One of the best uses for an array is as a replacement for a many-to-many table.
So, for example, the classic social networking problem: you have things and you have people, and people like things. Now, in your basic relational database model, the way you'd build this is you'd build an intermediate many-to-many table
between people and things, and every time someone likes a thing, you insert an entry into this many-to-many table. And then you have one billion people and 12 billion things, and, you know, an 85-petabyte many-to-many table. Huh, okay, probably not.
So what you can also do is, in the people table, store an array of everything this person likes, and in the thing table, store an array of everyone who likes this thing. The entries in an array can be keys, so these could be integers that index into the other side, much more compact and efficient to query, and then you index each field. I'm not sure I would build Facebook this way, but if you have a smaller system, this could work very well. You could also use this for denormalizing the results of an expensive query. For example, you're caching the results of a query: you might keep the many-to-many table for some other reason, but have an optimization where you store the result as a denormalization. Denormalizations like this generally need to be approached with caution. I won't say never do them, but you do have to worry about maintaining them, making sure the data stays up to date, things like that. But it can be very useful.
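A hedged sketch of that likes-as-arrays shape; every name here is invented, and you'd GIN-index both array columns with RunSQL as above:

from django.db import models
from django.contrib.postgres.fields import ArrayField

class Person(models.Model):
    # IDs of Things this person likes, replacing a many-to-many table.
    liked_thing_ids = ArrayField(models.IntegerField(), default=list)

class Thing(models.Model):
    # IDs of People who like this thing.
    liker_ids = ArrayField(models.IntegerField(), default=list)

# "Who likes thing 42?" without touching a join table:
fans = Person.objects.filter(liked_thing_ids__contains=[42])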
Okay, and now we have hstore fields. Who's ever used hstore in Postgres? Okay, so hstore is a semi-built-in hash store data type. It's like a Python dict that can only take strings as keys and strings as values. So it's a single level,
and it has to be a string on each side. Before the JSON type existed in Postgres, this was the only way of storing unstructured, associative data like this. It's not super powerful, and you'll notice the "semi" in semi-built-in; we'll talk about what that means. So first of all, how do you get hstore to work? Because if you log into your average Postgres database
and try to create an hstore column (you know, CREATE TABLE blah blah blah, x hstore), it throws an error saying it has no idea what the hstore type is, because it's not actually built into Postgres. It has to be installed in a particular database before you can use it; it's not part of core. The good part is, django.contrib.postgres
comes with an hstore extension operation that'll install it for you. So you create an empty migration, add the HStoreExtension operation to it, and it'll apply it, and you're done. So that's good. There's this one weirdness which only really matters if you're really pushing the performance of your database,
and for some reason you're not using connection pooling, which you should be. The problem is that every time you connect to the database, psycopg2, which basically everybody is using if you're using Python, has to query for the object identifier of the hstore type, because it could be different in every database. So this adds one query to the connection.
It's usually not a big deal, especially because Django has this thing about asking the database all sorts of questions on every connection, like what's your timezone and things like that. So, one more query among friends. But you do have to be aware that's what's going on. If you see this query fly by, now you know what it is.
So hstore types are represented in Python as a dict. The keys and values must be strings, not integers, not lists, not anything else. They're translated to and from the database encoding, so if your database is in something besides UTF-8, they'll be translated to the right encoding inside of Python.
Please say your database is in UTF-8, because it's horrible if it's not. Really, don't do that; it makes DBAs cry. But in the off chance it's not in UTF-8, it'll do the translation for you. hstore supports contains and contained_by.
Both the key and the value have to match in these. And you have has_key, which matches fields containing a particular key. So that's cool. And you have has_keys, which takes a list. So that's pretty useful. And then there's keys,
which matches against the list of keys in the field. So you can say: give me all the rows whose field has keys containing either 'a' or 'b'. So that's pretty cool. And values does the same for the values of the hstore field, so you can query on values as well.
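Assuming a hypothetical Item model with attrs = HStoreField(), those lookups come out like this:

# Key and value must both match:
Item.objects.filter(attrs__contains={'isbn': '12345'})

# Key presence:
Item.objects.filter(attrs__has_key='color')
Item.objects.filter(attrs__has_keys=['color', 'size'])

# Rows whose key list has either 'color' or 'size' in it:
Item.objects.filter(attrs__keys__overlap=['color', 'size'])

# And the same idea against the values:
Item.objects.filter(attrs__values__contains=['red'])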
hstore fields support GIN indexes, with the exact same syntax: CREATE INDEX ON app_model USING gin (field). That accelerates contains and the has-key lookups, but not contained_by. Sorry, that's just the way the GIN index is built for an hstore.
So why would you use an hstore field? It's great for storing very rare attributes. Andrew actually touched on this in his great talk earlier about database anti-patterns. You write a CMS or some other system, even an inventory control system or something like that, and you send it to the customer. And you send it to a bunch of customers.
And you want the customer to be able to add attributes to items. For example, in an inventory control system, they might want to be able to add an ISBN if the item is a book, or a color if it's a thing that comes in colors or sizes. But not every item is going to have that, and you don't know, when you ship your product
out to the customer, which attributes they're going to want. Well, you could create individual fields in the database, but that's, first, kind of hard in Django, and second of all, that way madness lies from a migrations and implementation point of view. What you can do instead is create a single hstore field and use it as the place to store all of these random attributes.
Generally, my rule of thumb is, if there are going to be fields that are null 95% of the time, consider an hstore field instead. But one thing to remember in this use is that null fields take zero space in PostgreSQL. They don't actually take any room on disk. So you're not costing any space
by creating null fields that are rarely used. And another use for these is an attribute that's populated very, very rarely, one that's not worth creating a whole field for; this is another solution to that. And user-defined attributes, which we just talked about.
That being said, if you're doing greenfield development right now, you probably want to use JSON instead, especially once 1.9 comes out and we have first-class JSON support in Django. But if you have to do something right now, there's no JSON type in 1.8, so use hstore if you need it right away.
Or you're trying to plug into an existing database which has hstore. Now, in increasing order of coolness, we come to range fields. PostgreSQL now has native range types. Range types span a range of a scalar type. So for example, [1,8] as an int range includes all of those values.
That's an inclusive range, so it includes both endpoints. You can also write bounds as exclusive: [1,8) goes from one up through seven, but doesn't include eight. So far so good? And that's the default. Notice that this is the PostgreSQL syntax.
Obviously you can't write this in Python and have it be syntactically legal. That would be awfully cool, wouldn't it? Bet there are languages that'll let you do that. Anyway, the default is this range closed on one end and open on the other. If you omit a bound, it means all values greater than or less than.
Some types, particularly dates, also have a special infinity value; I guess there's a special end-of-days value in those types, I don't know. So if you see one of those coming out of a query, now you know what it is. psycopg2, the database adapter that pretty much,
there's no reason not to use, and if you're using Django, you have to work hard not to use it, includes a Python range base type that handles all the various boundary cases and the infinity special cases, and that's what the range fields in Django are built on top of. Out of the box, 1.8 supports integer range and big integer range,
so 32-bit and 64-bit integers, a float range, a datetime range, and a date range. I am pleased to say that the datetime range is timestamp with time zone in Postgres, because if you're using timestamp instead of timestamptz, you're probably making a very bad mistake.
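Here's a quick sketch of range fields on a model. The names are invented; note that DateTimeRangeField comes out as tstzrange on the Postgres side, and the values ride on psycopg2's range objects:

from django.db import models
from django.contrib.postgres.fields import DateTimeRangeField, IntegerRangeField
from django.utils import timezone
from psycopg2.extras import DateTimeTZRange, NumericRange

class Event(models.Model):
    seats = IntegerRangeField()    # int4range in Postgres
    period = DateTimeRangeField()  # tstzrange in Postgres

# Creating one; '[)' bounds are the default, so 1 is included and 8 is not:
Event.objects.create(
    seats=NumericRange(1, 8),
    period=DateTimeTZRange(timezone.now(), None),  # omitted upper bound
)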
So everybody go and make sure that you're using timestamptz, not timestamp. Contains, contained_by, and overlap work about the way you'd expect, comparing two ranges. There's also fully_lt and fully_gt, which are true if both the upper bound
and lower bound of the field are greater than or less than the comparison value, so the whole range is to one side or the other. And adjacent_to is true if two ranges exactly bump up against each other; there's no space between them, no values. This is a place where the parenthesis, the open bound, is useful,
because with closed bounds there's no way of doing this with a continuous type. There's no way of writing two closed float ranges that exactly bump up against each other, because there's always another float that you can shove in between. So this is why an open bound is important. There's also not_lt, which matches if the field does not contain any points
less than the comparison value, and not_gt, which works the other way around. Range fields use GiST indexes; you could probably have guessed the syntax, but here it is: CREATE INDEX ON app_model USING gist (field). Note that for now
you have to drop this into a RunSQL migration. There's no way of saying this at the Django model level. That's okay, that's what RunSQL's for. All the comparison operators that we just described are accelerated by having a GiST index on a range.
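Against the hypothetical Event model from before, the lookups and the index sketch read like this:

from django.db import migrations
from psycopg2.extras import NumericRange

Event.objects.filter(seats__contained_by=NumericRange(0, 50))
Event.objects.filter(seats__overlap=NumericRange(5, 20))
Event.objects.filter(seats__fully_lt=NumericRange(100, 200))  # whole range below
Event.objects.filter(seats__adjacent_to=NumericRange(8, 20))  # touching, no overlap

# And the GiST index itself, dropped into a RunSQL operation:
gist_index = migrations.RunSQL(
    "CREATE INDEX myapp_event_period_gist ON myapp_event USING gist (period);",
    reverse_sql="DROP INDEX myapp_event_period_gist;",
)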
So why would you use this? Well, here's a problem. Let's say you're running a hotel, and your rule is: don't allow two bookings to be inserted in the database for the same room where the dates overlap. Okay, so how do you solve this problem? There's actually no way of solving this
with traditional unique constraints, because there's nothing that's necessarily unique. You could say room 102 for Monday through Friday, and room 102 for Tuesday through Saturday. Those two taken together are not unique, but it's still not valid to have both in the database at the same time. You double-book the room.
Scratch, scratch, scratch. So how do we solve this problem? Postgres to the rescue. We have a facility in Postgres called an exclusion constraint, which is a relatively new feature (9.0, I think), and it will keep this situation from arising. It's a generalization of the idea of a unique constraint.
Let's stop for a moment and think about what unique means. You can say unique means, well, no two values can be the same for this column when you insert them. Okay, that's fine. But you can also say: don't allow any two values, both of which pass the equality operator, in there. Now, we've said the same thing, but we've said it in a slightly different way.
We've said: don't allow two things where this particular operator, equality, matches to be in the database at the same time. So we could say, well, what if the operator isn't equality? What if the operator is some other operator? We could say: don't allow two things in the database at the same time where the overlaps operator matches them.
Hmm, well, that's interesting. And then we say, well, okay, let's let you AND them together. So we'll say these two fields can't pass this operator, and these two fields can't pass this operator, and, and, and. So we've generalized the idea of unique to let you use any operators in combination with AND.
And I think an example would be very helpful here. So there's a catch: you have to have a single index for this whole shebang, and since range types require a GiST index, the index has to be a GiST index.
But the problem is, we talked about this example where we're using room, and room's just an integer or a string, you know, however the hotel wants to represent rooms. That's a scalar value, and scalar values don't have GiST indexing. Uh oh, we just blew it. I was going to show you this great example of how to solve this problem in Postgres and now I can't. Well, of course I can.
Because there's a module called btree_gist, which lets you create GiST indexes on most simple scalar types. It's a Postgres extension, part of contrib. It has to be installed in the database, but it ships with Postgres, which means you can use the CreateExtension migration operation to get it into your database. So let's talk about how we'd actually use this.
So, you know, import the models, import the range field, and here's a booking, and here's the room, and here's the range. Okay. This would probably be a foreign key to a room, but you know, you get it. That's easy. And then we say, okay, what does it look like? There's the integer primary key that Django created for us, and the room, and the range, and we're all set.
Okay, so far so good. Now we create this extension, btree_gist, and we add this exclusion constraint. Notice what we're doing here: we're saying add this constraint where the room is compared WITH = and the date range WITH &&. And && is the Postgres version of the overlaps operator.
So when you do an overlaps query in Django, what you're gonna get is this double ampersand. Okay, now let's try adding some bookings with all of this. Say we're gonna add a booking and save. That worked. Add another and save: fine, because it's the same room but an entirely different range.
Add another, save, so far so good, because it's the same time range but a different room. Okay, one more: room, do do do, one, two, three, save. Notice, however, this range overlaps that one. Bang. Oh, look, it didn't let me do that.
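Here's a hedged reconstruction of that demo as code. The app name "hotel", the constraint name, and the field names are all invented; the ALTER TABLE is my approximation of the SQL on the slide:

# models.py
from django.db import models
from django.contrib.postgres.fields import DateTimeRangeField

class Booking(models.Model):
    room = models.IntegerField()   # probably a ForeignKey in real life
    period = DateTimeRangeField()

# migrations/0002_no_double_booking.py
from django.db import migrations
from django.contrib.postgres.operations import CreateExtension

class Migration(migrations.Migration):

    dependencies = [('hotel', '0001_initial')]

    operations = [
        # btree_gist lets the scalar room column take part in a GiST index.
        CreateExtension('btree_gist'),
        migrations.RunSQL(
            "ALTER TABLE hotel_booking ADD CONSTRAINT no_double_booking "
            "EXCLUDE USING gist (room WITH =, period WITH &&);",
            reverse_sql="ALTER TABLE hotel_booking "
                        "DROP CONSTRAINT no_double_booking;",
        ),
    ]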
Pretty cool. And notice that this means the constraint is being enforced at the database level. Sure, you could write code in Python that would do the query and, if it returned an overlapping range, raise an exception. But the nice part about doing it in the database is, if you're doing bulk imports, or if you have other kinds of queries,
this means the database itself enforces it. So you should use range fields to represent ranges. I hope you get a lot of value out of this slide. So you probably figured that one out. It's more natural, and you get better operations than the traditional high-low pair
of stuff in the database. And you get more database integrity and more interesting operators available. Now, coming soon, we have JSON fields. They're not in 1.8, but I'm almost sure they've landed for 1.9. These are fields that support arbitrary JSON structures. They're still a little bit of a work in progress, both on the Django side and
on the Postgres side. Postgres has two JSON types, JSON and JSONB. Sorry about that. JSON stores the raw text of the JSON blob, whitespace and all. It is literally the text; it is, in fact, a wrapper around Postgres's text type. JSONB is a compact, indexable representation. It's similar to BSON,
but it's a lot better. So why use JSON instead of JSONB? It's faster to insert, since it doesn't have to process the data. It just shoves the raw text in the database. And JSON allows for two highly dubious features that people use in JSON, which are duplicate object keys at the same level
and stable object key order. There's nothing in the JSON spec that says these features are available, but people still use them. Bad people; they should feel bad. This is also okay if you're just logging JSON, like you have a log table that's accumulating API calls or something like that, where you don't really need to process the stuff,
you just wanna have it somewhere. So for pretty much every other application, you want JSONB. It can be indexed in useful ways, unlike JSON. The forthcoming JSON field in Django uses JSONB as its underlying representation, so just roll with it. JSONB has GIN indexing.
For JSON, the only kind of indexing available is a B-tree index that treats the value like a string, which is pretty much useless. With JSONB, you get these operators, and the query has to be against the top level of the object for the index to be useful. So if you have a nested JSON structure,
you can't do an indexed query for a key that's way down in the structure. It has to be at the top level. That's being worked on. You can query nested objects, but only along paths that are rooted at the top level. So why would you use JSON support in general? You're logging JSON data.
You want audit tables that work across multiple schemas. This is a very common problem in relational databases, where you want a single audit table that handles all the other tables, rather than having one audit table for every other table, because that way madness lies. It's a nice way of pickling Python objects so that other tools can read them.
The standard libraries that pickle Python objects into JSON kind of put a lot of cruft into them for my taste, but they work. And for all the things that you used to use hstore for, like user-defined objects, rare fields, things like that.
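For reference, the 1.9 shape is expected to look roughly like this; hedged, since it hadn't shipped at the time of this talk, and the model here is invented:

from django.db import models
from django.contrib.postgres.fields import JSONField

class APICall(models.Model):
    received_at = models.DateTimeField(auto_now_add=True)
    payload = JSONField()  # stored as jsonb

# Top-level key lookups, per the indexing caveat above:
calls = APICall.objects.filter(payload__status='ok')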
The new fields have admin widgets that go with them, which is really very cool. The hstore and JSON widgets are really only good for debugging, because you basically get the raw text spat out at you. But why are you using the admin for this? And there's the unaccent filter, which I'm out of time for, so just read about it in the documentation.
And thank you. Questions? If in Postgres the arrays are one-indexed, and the example you gave in a queryset filter
was zero-indexed, is Django converting them? Yeah, it's Python-style indexing within Django, because otherwise you would probably go insane, and it does the zero-to-one conversion for you. Okay, cool, thank you.
Quick question about expression indexes. Are expression indexes B-tree indexes, or can they also be other kinds? You can have B-tree, GIN, and GiST expression indexes, yes. It's a little bit of an advanced class, but, for example, if the output of the operation,
like a slice, is a GIN- or GiST-indexable type, then by all means you can have a GIN or GiST expression index, and Postgres does the right thing. The various geo extensions do this a lot for cool stuff.
I'm used to having to be a super-user to create extensions. Is there any hope to not need that requirement when I'm putting it in migrations? Nope. Okay. Yeah, sadly, because you could really seriously screw up your database with a bad create extension,
super-user access is going to be required, which is really annoying if you're, for example, on RDS, where you don't have super-user. So in that case, you can't use CreateExtension. You write many angry letters to Amazon asking for that extension in the next release. Sorry, one more question. Of course.
Do you have any tips on migrating MySQL to Postgres? Hi Russ. Not that I can deliver in a small thing like this. I will say that actually the database layer is usually only about 25% of the nightmare. The rest of the nightmare is at the application level.
If you've been really ruthless about your database agnosticism in Django, you can probably get away without too much hassle, and there are tools that will probably do it for you. But a surprising number of MySQL applications rely on things like the return-primary-key-on-insert behavior,
the query-for-null thing that MySQL does in InnoDB, and things like that. So you spend most of the time looking for MySQLisms in the code, actually, rather than in the database. There are tools out there that will do the dump-and-load conversion for you.
Hi, are there any performance improvements moving from a many-to-many table to an array? Potentially huge because when you think about how the many-to-many query works, it has to potentially suck up a huge ton of rows and go through these enormous indexes.
Indexes are fast, but they're not infinitely fast, and there are upper bounds. I mean, a one-billion-row index is not gonna be any more fun than a billion-row table. But potentially this could be a huge win, especially if you're using a GIN index, because of the way a GIN index works:
if, for example, you have a value of four, the GIN index records everything that has a four in it and can go pull those rows potentially much faster than running through a many-to-many table. Thanks. And it's smaller. You mentioned that the new fields will only work with JSONB.
What do you recommend if you want to use plain JSON for those sorts of dirty features? Evaluate your life choices. Unfortunately, the bare JSON type does not have a lot of that good stuff in it. You can get some of it, if you absolutely must, by writing expression indexes
that extract those fields out of the JSON type and index them as B-tree things. At some point, though, you're probably better off using JSONB unless you absolutely must have those two misfeatures. Sure, well, if querying isn't necessarily a goal
and you just want to get access to it, could you subclass, like, a text field and then make it at least accessible? Well, in theory, you could create it and then alter the column type back to JSON, and probably everything would work just fine at that point.
I haven't tried it, but it would certainly be worth a go. The other thing, of course, is if you want the small but not zero performance enhancement on inserts that JSON has over JSONB. Absolutely, okay. Your booking example is really nifty.
I'm wondering a couple other things. Would it be possible to create multiple conditions using the btree_gist extension? If so, how would you recommend handling exceptions? You could create, as in, more than one predicate on the constraint? Not quite sure of the question, sorry. So let's say that in addition to the date ranges,
you also didn't want certain types of rooms to be booked. Yeah, essentially the only reason I had to use the btree_gist hack in that case is because one of the fields only has GiST indexing,
and the whole thing has to be a GiST index; all the indexes that appear in the exclusion constraint have to be of the same type. So that's why I had to force the integer or string that was the room into it. You can write exclusion constraints
as any combination of Boolean predicates ANDed together. Okay, so effectively you could have a situation where there are multiple kinds of integrity errors that you'd have to handle exceptions for? You can have 12 different exclusion constraints on the same table. There's absolutely no limit to how many you can have, except that, of course, it has to check them all
on every insert, which could be a little bit annoying. So you'll only get one out. It won't tell you, oh, by the way, these 12 different exclusion constraints all failed. It'll hit one, stop the insert, and you'll get that one. And then you can rerun it. When it comes to giving feedback to a user when there are certain kinds of integrity errors, how would you recommend handling that on the Django side? That is a little more complicated.
In that case, I would actually do both: query the database in advance to get the right kind of error back to the user, but still have the constraint in case you have other applications that don't run through the same UI and try to insert data, or you try to bulk-load it from someplace else
using the COPY command or something like that. Because what will inevitably happen is, take the room example: your hotel's running fine, and then hotels.com comes in and says, we'd love to send you reservations, but we have this API. And, you know, so that kind of thing. I'm sure room reservation systems
are a tiny bit more complex than that example, but you get the idea, so. Do you have any insight into what might be coming in 1.9 and beyond with regards to Postgres? Well, at this point, 1.9's pretty well locked down, so you can just read the release notes. You know, I'm not a contributor to that stuff directly,
so the release notes would be the best source. I wasn't certain of your involvement. Thanks for the great talk. How do you feel about the funding model of the contributions to Django for the Postgres stuff, and are there any other things that you'd like to see funded in a similar way?
Um, that's probably a bigger question than I can answer right here. I will say I have very strong opinions that companies that use open source should give a little more back than they do. There are certainly features I would love to see,
like being able to push different foreign-key cascading models down to the database, specifying those at the model level. For example, if you want to have something other than delete, you'd be able to specify that and have it implemented by the database rather than by Django.
But yeah, getting money for these kinds of extensions is an issue. Kickstarter is great, but it does tend to be a popularity contest, and it locks out a lot of developers who are brilliant programmers but don't have the high profile, and that's a shame, I think.
So tell your employer to write big checks is my answer. Great, and if you need anything done with your Postgres database, give us a call. And thank you. Thank you.