
Schemaless SQL


Formal Metadata

Title
Schemaless SQL
Subtitle
The Best of Both Worlds
Number of Parts
20
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Schemaless databases are a joy to use because they make it easy to iterate on your app, especially early on. And to be honest, the relational model isn't always the best fit for real-world, evolving, and messy data. On the other hand, relational databases are proven, robust, and powerful. Also, over time, as your data model stabilizes, the lack of well-defined schemas becomes painful. How are we supposed to pick one or the other? Simple: pick both. Fortunately, recent advances in Postgres allow for a hybrid approach that we've been using at Heroku. The hstore datatype gives you key/value in a single column, and PLV8 enables JavaScript and JSON in Postgres. These and others in turn make Postgres the best document database in the world. We will explore the power of hstore and PLV8, explain how to use them in your project today, and examine their role in the future of data.
Transcript: English (auto-generated)
I'm going to talk about a technique that I'm calling schemaless SQL. My name is Will Leinweber. Before I start, I want to do a public service announcement: put this in your psqlrc file. It should be the default. It's fantastic.
What this does is it automatically creates savepoints while you're in a transaction. When you screw something up, like you type FROM instead of FOR, it doesn't blow away your transaction, and you can just keep going. It only does this when you're in interactive mode: if you're pasting scripts into psql, it's fine, it won't do it. But when you're interactively in the console, it saves your butt all the time.
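Concretely, the psqlrc line he's describing is the standard psql `ON_ERROR_ROLLBACK` setting:

```sql
-- In ~/.psqlrc: after each statement inside a transaction, psql issues an
-- automatic savepoint and rolls back to it on error, so one typo doesn't
-- abort the whole transaction. 'interactive' limits this to console use,
-- so pasted scripts still fail fast.
\set ON_ERROR_ROLLBACK interactive
```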
This is fantastic, and it's impossible to discover. It's called ON_ERROR_ROLLBACK. Put that in. I work at Heroku on the Heroku Postgres team. Before I really get into it, I want to give my perspective of where I'm coming from. I'm an application developer. I actually just started using Postgres a year and a half ago, when I joined Heroku.
Before that, I had heard about Postgres, but I never really used it myself. I used other databases. I used Postgres indirectly because before coming to Heroku, I was a customer of Heroku. I pushed my application to Heroku and it would run on Postgres on Heroku, and I'd be like, oh, that's fine, whatever.
But then after joining the company, I really began to grow a much deeper appreciation for the database and I want to share that. Before my journey, I started with PHP and MySQL, which I don't think is that uncommon of a start. This is sort of what my applications looked like.
It was just a big tangle of logic separated by HTML, separated by more logic, and then some MySQL statements thrown in. It was a real huge mess. Then I discovered ActiveRecord, which is the Ruby ORM that comes with Rails.
This blew my mind. This was phenomenal because I was able to deal in my mind with relations instead of the actual tables. I could start describing in a declarative way how I could think about my data. I had authors and books, and they have a relationship between each other
in a way that's more than just a foreign key. This just made everything really simple to deal with and to conceptualize in my mind. I loved it. I fell in love. But still, everything wasn't perfect. I still had to deal with migrations. I still had to write my code, do a migration, deploy it.
There's this window when my code is live and the migration hasn't run yet, so everything's crashing. I'm just one developer. I don't have a team or DBAs to handle this. It was all me. Things just weren't perfect; I just felt something was wrong. So I kept looking for other solutions.
Then I came across CouchDB. I'm curious, has anyone here played with CouchDB at all? Yeah. Just as a brief overview, because this isn't a CouchDB talk, one of the great features CouchDB has is that it's all over HTTP.
So I can stick the load balancers and reverse caching proxies that I have for my web stack in front of my database. I can take those same technologies that I understand, put them in front of my database, and get some of the same advantages. It's RESTful-ish. Some people get angry when I call it RESTful because it's not quite REST. It has multi-master replication built in, which I thought was really cool.
Turns out my startup didn't need it because we never got that big, but it was cool that it was there. You query it with these MapReduce views, which are more like directly manipulating a Postgres index B-tree, which I'm going to get into more later. At the time I thought that was cool, but now that I know more about how Postgres works,
yeah, let Postgres do all that for me. But the really awesome thing about Couch is that it's a schemaless database. I describe my documents as JSON, which is perfect for me as a web application developer, because I'm dealing with JSON and JavaScript on the front end at the same time; I already have to know that. But these documents, just like relations did for me with ActiveRecord,
these documents fit my worldview much better. It's really easy to understand the love that a lot of people have for Mongo and Couch, which I think is too easily dismissed in the Postgres community as these immature technologies, which they are. But I do want to echo some of the things that Peter said in the keynote.
They're moving fast, and it's important and easy to understand why developers are clinging to these technologies. It's because documents, they're fun to deal with. SQL, it's kind of hard. Tables, they suck. They get big, and they're cumbersome to change.
Migrations, they're painful, so instead of doing them when you should, early on, they just become more painful later when you're forced to do them. Third normal form is a pain, and often the data that I deal with in my application isn't truly relational. It can be modeled in a relational manner, but it's not inherently relational.
And so that sort of mapping is an impedance mismatch. I want my storage to look like my code, because I am one person doing all the layers; when they're too disjoint, switching my mind from one place to the other is a barrier that slows me down.
And so if I can keep the code in my application tier similar to what's in my database, as much as that makes sense, I'm able to go much faster. And as a concrete example of some of this, this is what the schema looked like for the startup I founded before coming to Heroku.
And so what we did, it was sort of like music versioning, like code versioning but for music. And so a song, there were many songs. The song had a title, it had a pointer out to an artist's document, and other metadata at that level. Each song had an array of versions, and individual versions had lyrics that could change per version,
it had the date that it was created, and metadata at that level. Each version had a series of tracks, and tracks had their own information, and so on and so forth. And I realized that I could have done that in SQL.
But it was hard, because going back there, that file, it has to know about everything above it. It has to know about the track that it's associated with, the version it's associated with, and the song it's associated with. And all those foreign keys, if I had a files table, it was just like a lot. And it was hard to keep that all in my head at once.
And so storing it as a single document, a single song, and having it all in one place, that made a lot of sense to me. So what does this have to do with Postgres? I gave this talk at RailsConf, and I went into all these awesome things that Postgres can do, all old hat for people here, but none of them knew that any of this stuff was in Postgres.
So this is sort of building on this platform. Postgres is serving more as a platform for data, rather than just a database. Another thing that makes Postgres great: one of my colleagues, Dan, is giving a talk tomorrow about WAL-E, which ships the write-ahead logs to S3 for super durability. And one of the things that I realized after coming back to Postgres, coming back to relational databases, that I didn't realize I had missed, was transactions.
Being able to psql into my database, type BEGIN, and play around, and then have the safety net of being able to do a ROLLBACK if anything should go wrong, is just phenomenal. I didn't realize how powerful that was until I went away from it for a while. But still, I started to miss documents.
I don't miss them anymore, because I brought them back into Postgres. So I'm going to talk about two things: hstore, which I'm sure everyone here is familiar with, but more importantly I'm going to talk about how we use it at Heroku; and then PLV8, which is out now, with more coming soon.
So, hstore. Is there anyone who has not used hstore, so that I should get into it? Okay, I'll get into the details then. It's a key/value store that you can fit into an individual column. So you have all your other columns, but then you can have an hstore column. And what's really great is that you can have indexes on it.
And so one of the things I ask a lot of people is: have you ever just serialized data and stuck it in an opaque column that no one could ever use? Has anyone done that here? See, much, much less. When I asked that at RailsConf, everyone raised their hand, because a lot of the time they don't want to do a migration.
They don't want to add another table. If they can just get away with shoving it into a column, and make the tradeoff that they can't query on it, that's good enough for a lot of people. And so the fantastic thing about hstore is that it lets you have that serialized key/value data, but also query on it.
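As a sketch of the kind of setup he's describing (the table and key names here are my own illustration, not from the talk):

```sql
CREATE EXTENSION hstore;

CREATE TABLE products (
  id         serial PRIMARY KEY,
  company    text,
  attributes hstore            -- free-form key/value data lives here
);

-- '->' pulls a single value out; '||' merges in new or updated keys
SELECT attributes -> 'color' FROM products;
UPDATE products SET attributes = attributes || 'color => blue'::hstore;

-- and you can query it like any other column
SELECT company FROM products WHERE attributes -> 'color' = 'red';

-- a GIN index for broad key/value queries...
CREATE INDEX products_attrs_idx ON products USING gin (attributes);

-- ...or a functional index on one hot key (note the double parentheses)
CREATE INDEX products_color_idx ON products ((attributes -> 'color'));
```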
So this is the basic syntax; this is from the documentation on postgresql.org, and if you want to get into the details, I recommend checking it out there. It has this arrow operator for getting a single key out, and you can update stuff with the concatenation operator. But the fantastic thing is you can treat it like any other column.
You can do joins on it. So this query would give you all of the companies that have a red product. And you can create indexes on these operators. You can have a generalized index to get sort of a broad index, or, if you know you're going to access one particular key a lot,
it helps to have a functional index. One of the things that really frustrates me is that you need two parentheses around the expression. If you just do one, it doesn't work. And one of the funny things with this is, I saw that and thought, oh, I can't create an index on this, so I switched to the underlying function.
I'm not sure of the name, but there's an underlying function that the arrow operator resolves to. But if you create an index on that function and then use the arrow operator, which just resolves to that function, it doesn't pick up the index. So those are two gotchas. And installing hstore is super easy if you're on Postgres 9.1 or later.
It's just an extension. It comes in the standard contrib. You can use it; it's been in Postgres since like 8.2, 8.3, for a while. Of course, before, you had to do the backslash-i and find the path to it. But the extension support in 9.1 is fantastic: CREATE EXTENSION hstore.
One of the things that is super exciting for me is that the next version of ActiveRecord has built-in support for hstore. And the excitement, at least in my community, the Ruby community, for hstore is phenomenal. We put out a blog post announcing support for this, because we launched 9.1 and have the extensions.
And the traffic it got — people went nuts about hstore. And we're starting to see a real change in people, which I'll get into later, actually adopting advanced database features. And this is sort of a vanguard of this culture change,
at least in the Ruby community. If you're using the GIN index, you can use this. And this is sort of what it looks like from Ruby. I can say: products where my attributes column has the key color with the value red, and that gets subbed in there to prevent SQL injection.
One of the other things with ActiveRecord — ActiveRecord 3, which would be interesting for you guys — is that it actually does prepared statements automatically. So when your application boots up and first runs a query, if you're using Postgres, it will prepare the statement and then hold on to it. And there are some caveats with prepared statements,
like if the plan changes significantly, it's not going to update that. But since it prepares just when the application boots up, and applications typically restart somewhat frequently, it's usually pretty good. We've only seen a couple of support tickets from people needing to turn this off because of getting bad plans too frequently.
If you want to see a demo of how to use this in Rails, one of my colleagues wrote an open source app to show it off, and you can check it out. And the integration is fantastic. I use a different library called Sequel, which also has support for it. So, how we use this at Heroku: we run the database service,
and every 15 to 30 seconds we check in on each and every database that we run, and we ask it some questions like: are you running? How many tables do you have? How many connections? How big are the tables? Do you have a lot of WAL segments waiting that haven't been shipped off? Because if it gets above anything more than 10,
we worry that if anything should happen, we wouldn't have the WAL shipped. So we check in very frequently. But the checks that we do have changed a lot over time, as we discover new things that we should be checking and realize there are other things we've checked previously that we don't need anymore. So we have gigs and gigs and gigs of these observations,
and if I were to do a migration on that every time I wanted to check something new and experiment and see if this is going to be a useful check or not, I'd be taking my database down all the time. And so instead, we just have an hstore column that we put these observations in, and it's mostly the same,
but the structure sort of changes over time, and that's okay. Another tool that we've open sourced is called wcld, which stands for "wc -l as a service." It's a program written in Go: if you format your logs as key/value pairs like this,
it'll take that and shove it into an hstore. And we use this for a lot of things. We use it to keep track of our billing across the platform, all the events. We just collect our log streams and shove them into an hstore.
And Go is a great fit for this, because it can spin off goroutines to collect all these different log streams and then write them into Postgres. And so this sort of approach, this bulk-bag approach, of having your regular columns that you would have,
but then just adding an hstore column and storing data in there. And over time, if I see that this particular key isn't going anywhere, I'll promote it out to a proper column. And that way, I have the flexibility of trying stuff out without having to do a full migration and add a column. I can add it in there.
And then if it's not useful, forget about it. And if it is useful, take the time to write it into a proper column. However, I'll be honest that I don't actually do the promote as much as I probably should because the hstore is perfectly fine. And that's all pretty good. But one of the problems is you can't nest an hstore in an hstore. So it's just one flat structure.
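The promote-a-key pattern he describes could be sketched like this (table and key names are illustrative, not from the talk):

```sql
-- a key that has proved stable in the hstore bag gets promoted
-- to a proper typed column...
ALTER TABLE observations ADD COLUMN num_connections integer;

-- ...backfilled from the hstore, after which new writes go to
-- the real column instead
UPDATE observations
   SET num_connections = (data -> 'num_connections')::integer;
```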
And the other problem is that it's only strings. The keys are only strings and the values are only strings. And so we get around this by having, as a typical Rubyist, just some conventions. And so we have these naming conventions where if a key ends with underscore at, we'll consider it a date time. And then in my application code, we'll parse that into a date. If it starts with num or ends in size, we'll convert it to an integer.
Booleans end with question marks, and then we just use strings for everything else. And we do those observations I was talking about, depending on how long the checks are taking at a particular time, on the order of one to three million an hour.
And we're just shoving them all into hstores, and it's been fantastic. And the great thing is you can use it today. But the new and exciting part is — yeah, I actually don't know. But I'm using it and I like it.
And so PLV8 is the interesting part. PLV8 is the V8 JavaScript engine in Postgres. It was started by Hitoshi Harada, who's giving a talk that you're not seeing because it's happening right now, and then continued by other people, notably Andrew Dunstan.
So why JavaScript? Well, I mean, it's everywhere. There's JavaScript in every single browser. It's easy to get started with. And there's even JavaScript on the server now with Node.js, though I'm not a particular fan. And it has good parts. Has anyone seen the JavaScript: The Good Parts book?
Yeah, it's like this big. And then you put it next to the JavaScript reference, which is this big. And the argument of the good parts is that mastery of a language isn't knowing everything in the language — it's knowing what to avoid. And unfortunately, in JavaScript there's a lot to avoid.
But if you stick to the good parts of it, it's actually an okay language. Important for us at Heroku is that it's a trusted language. JavaScript was built to be sandboxed inside a browser, and having that same sandboxing model in Postgres is perfect for us to offer it as a PL language,
because you can't break out and crash things — that's the big one. The V8 JavaScript engine is fantastic. It's written by Google. It just-in-time compiles to native code, which makes it really fast. It has modern techniques like a generational garbage collector, and so on and so forth.
So, you know, people here are like: oh, there have been other PLs for a long time — PL/Python, PL/Perl, PL/Ruby — why am I not interested in those? And it's because JavaScript is really suited to manipulating these data structures in JSON,
whereas other languages are more general purpose. And so you can get PLV8 at Google Code. Installing it is pretty easy, as far as these things go. You can also do it with the PGXN thing — I'm not really sure about pinning it with the equals-one — but then installing it works.
Using PLV8, I'm going to go through a Fibonacci example which, while it isn't the greatest example as far as benchmarking and such goes, gives a good feel for an introduction to how you would use it.
So here's Fibonacci done as just a regular PL/pgSQL function, the naive, no-caching, recursive Fibonacci. And so you run it, going in increments of five from zero to 35,
and it takes about three minutes. This is just on my MacBook. So let's look at what that would look like if we did it in JavaScript. This is the same implementation, the same recursive naive Fibonacci. And it's fast.
You know, granted, PL/pgSQL is not really a numeric language. However, this shows off the advances you get from compiling to native code with V8. I didn't want to stop there. I wanted to see how far I could push it, so I introduced memoization.
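A memoized PLV8 Fibonacci along these lines might look like the following — a sketch, since the talk's actual code isn't in the transcript:

```sql
CREATE FUNCTION fib(n integer) RETURNS integer AS $$
  // memoize within the recursion: this collapses the naive
  // exponential call tree down to a linear number of steps
  var memo = {0: 0, 1: 1};
  function f(n) {
    if (!(n in memo)) memo[n] = f(n - 1) + f(n - 2);
    return memo[n];
  }
  return f(n);
$$ LANGUAGE plv8 IMMUTABLE STRICT;

SELECT n, fib(n) FROM generate_series(0, 35, 5) AS n;
```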
It's cheating, but then it's really fast. And that's sort of how you would go about using a JavaScript function. So let's go back to documents. Here's a sample JSON document that has a name,
a numeric type, and then an array, and nested in the array some smaller documents of phone numbers, just to show off all the different features of JSON. And so I made a table that just has a data column, with a bunch of random documents of this structure.
I think I put a million in there. So here's a basic function that would pull a numeric type out of a JSON document. Let's walk through it. The function takes the key that you want to pull out, and then it takes the document as text.
It returns a numeric, and what it does is use the built-in JSON parsing: it parses the data, pulls out the key, and returns that. And here's how you'd use it. If you're doing something like getting the average over all of them, it is slow. It has to parse every single document each time, just to pull out the one key it wants.
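The function he's walking through is roughly this shape (my reconstruction; the table name `people` is an assumption for illustration):

```sql
CREATE FUNCTION get_numeric(key text, data text) RETURNS numeric AS $$
  // parse the whole document just to read out one key
  return JSON.parse(data)[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;

-- slow without an index: every row's document is re-parsed
SELECT avg(get_numeric('age', data)) FROM people;
```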
The second example here: I've just put all those ages into a regular, standard table and had Postgres do it, and it's much faster. So you are currently paying quite a huge penalty for doing this parsing. But as you start getting to more and more complicated queries, the difference becomes much smaller.
And so here you're using a common table expression to get a bunch of different statistics. And it's still slower, but it's no longer several orders of magnitude slower. But now that we have that basic building block, we can improve on it. Pulling out a specific age, again, is slow, but you can create an index on it,
because Postgres doesn't care that this index came from PLV8. Once you create the index, it knows to treat it just like any other index it has. And so the second one, using the index, works just fine. And I'm super interested in Postgres 9.2, with the index-only scans, to improve on this even further,
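The index over that function is an ordinary expression index (again a sketch, assuming the `get_numeric` function and `people` table from above):

```sql
-- building the index crawls every document once; the function has to be
-- declared IMMUTABLE for Postgres to allow it in an index
CREATE INDEX people_age_idx ON people ((get_numeric('age', data)));

-- after that, lookups can use the index like any other
SELECT count(*) FROM people WHERE get_numeric('age', data) = 33;
```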
so I don't have to do any of the fallback to JSON once I have my indexes built. And for those of you who are familiar with CouchDB, this is the exact same tradeoff that Couch has. In Couch, you create a JavaScript MapReduce view, and the first time you create the view it's really slow, because it has to crawl through each and every document creating the view,
just as Postgres does when it's creating the index. But afterwards, once you have the index, it's perfectly fast. With one notable exception: Couch cannot combine arbitrary numbers of views into one view, whereas Postgres is perfectly happy combining arbitrary index lookups together with bitmap checking.
And so in this example here, I'm just taking two different indexes that I made. It combines them together, does the recheck on the ones it has to, and returns the results. And this is fantastic. I'm super happy. Thank you, everyone who worked on any of this index stuff.
This is phenomenal. Thanks. And then, of course, the great news is Postgres 9.2 with the native JSON datatype. Once the datatype is a default datatype coming with Postgres and you can install PLV8,
the combination is fantastic. Now, I wrote this before there was the backport extension of the JSON datatype. So if you don't want to use the backport to 9.1, one of the things I've been doing is — I discovered domains. You can create a sort of poor man's datatype with a domain: just make a function that checks whether it's valid JSON.
If it can't parse it, it returns false; if it can, it returns true. And here are two examples. The one on the left is valid JSON, so it returns true. "lol" is unfortunately not valid JSON, so it returns false. And then you can create your table using this domain.
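The poor man's JSON domain he describes would look something like this (names are my own):

```sql
-- validity check: valid JSON parses, invalid JSON throws
CREATE FUNCTION valid_json(txt text) RETURNS boolean AS $$
  try { JSON.parse(txt); return true; }
  catch (e) { return false; }
$$ LANGUAGE plv8 IMMUTABLE STRICT;

SELECT valid_json('{"valid": true}');  -- true
SELECT valid_json('lol');              -- false

-- a domain over text that only admits parseable JSON
CREATE DOMAIN json_text AS text CHECK (valid_json(VALUE));

CREATE TABLE docs (data json_text);
```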
However, you're probably better off just using the backported JSON datatype. So what can we do when we have arbitrary JavaScript in the database? One of the things that excited me the most was Mustache. Has anyone used the Mustache templating language? No? Okay, I'll go into it. Mustache is a really great templating language because it has no logic at all.
You just have the keys that you want to pull out; in your template, you surround them with two curly braces. And it can do some basic looping over things, and some really basic conditionals on whether something exists or not, but you can't put in any application code.
And one of the things I've seen time and time again, as new templating languages come out, is people shoving too much logic into the view, when really the view should just be handed the data and a template and go. Mustache is implemented in several languages, and there's a JavaScript implementation. So the technique is: you create your function definition,
you paste in everything from the library's source code, and then add just a little snippet at the end to parse the JSON for the view, put it together with the template you give it, and return it rendered. The function's name is html — it doesn't actually produce validated HTML, that's just the name — because it's intended for web pages.
And so the top string there is what a Mustache template looks like. With the pound notation I'm iterating over all my things; the period prints the current thing; and then that's the end of my loop. And so if I have a things array in my JSON of "world", "from", "will", I can use this function and template it together.
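The shape of that wrapper is roughly the following sketch, with the mustache.js library source pasted in where the comment indicates (I'm using the current `Mustache.render` API; older versions of the library named it `to_html`):

```sql
-- 'html' is just the function's name; it performs no HTML validation
CREATE FUNCTION html(template text, view text) RETURNS text AS $$
  // ... paste the full mustache.js library source here ...
  return Mustache.render(template, JSON.parse(view));
$$ LANGUAGE plv8;

-- iterate over the "things" array, printing each element
SELECT html('{{#things}}{{.}} {{/things}}',
            '{"things": ["world", "from", "will"]}');
```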
Is it useful, having a templating language inside Postgres? I don't know, but it's pretty cool, and people find it exciting. But more useful — using that same technique of taking a JavaScript library intended for the web or for Node.js
and putting it in Postgres — and actually useful — is JSONSelect. This is sort of like XPath or CSS selectors, but for JSON documents. And so, using that same method, I pretend to be a Node package manager module: I make an exports object, I paste in all the code, and then I do that same thing.
I match my document against the selector that I sent in, and then return it back as JSON. So this is a function that takes my selector, takes JSON, and returns JSON. And as a reminder, these are the documents I'm dealing with:
the people's names, the ages, and the phone numbers. Now that I have JSONSelect, I can say: give me the first name — so name, nth-child(1), of data, as name — and give me all the phone numbers as phone. And so this goes through all my documents, and I have sort of an expressive way
to look at the semi-structured nature of these JSON documents. And so, instead of having to have an individual function like my get-numeric, now I can just say: give me the age out of this document.
Of course, since it's coming back as JSON, I have to cast it to an integer to do the comparison as an integer. This is the same example I did before, but now with this sort of meta-language built on top of a PLV8 function. Now, there are a lot of rough edges.
Notably, all that casting you have to do: since I'm just returning JSON, if I'm actually returning a string, I have to parse it and cast it to a string; if it's an integer, I have to cast it as an integer. And it's sort of weird to run into problems in this new language, JSONSelect, that you have to learn.
However, there's really exciting promise here. And so, to take it to the end: what's the worst thing I could do with JavaScript in my database? I think it's this. What this is doing is passing in arbitrary JavaScript source.
It evaluates it and then immediately executes it. And this is really exciting, because instead of just having SQL injection, I can have SQL and JavaScript injection together in one. And I'm thrilled. So here are some uses of what you can do with it.
First one here: just returning the date of when I made this slide. Second one: taking an arbitrary large number and turning it into a string, which happens to be the word JavaScript in base 36. And third, something that's actually sort of useful: taking my data, pulling the name out, splitting it on each character, sorting, and joining.
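Those three snippets can be sketched in plain JavaScript, runnable in Node.js outside the database; the exact number below is one I worked out myself, not necessarily the one on the slide, and the sample document is made up:

```javascript
// 1. The date when the code runs (on the slide: the date it was made).
const today = new Date().toDateString();
console.log(today);

// 2. A large number whose base-36 representation spells "javascript".
const word = (1960303047177089).toString(36);
console.log(word); // "javascript"

// 3. Pull a name out of a document, split it into characters,
//    sort them, and join them back into a string.
const doc = { name: "Postgres" };
const sorted = doc.name.split("").sort().join("");
console.log(sorted); // "Pegorsst"
```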
So this is showing off how, if you just need something real quick to manipulate your data and you know JavaScript better than SQL, which I do, you can just do this. And so, pulling this back into where I'm coming from, Ruby, and then applying it to other languages:
all of the work on the database end should be transparent to me as an application developer. It's important that we start working from both ends. This stuff in Postgres is great; we should expose more of the power of Postgres in a nice way, so that I can then build up libraries from Ruby
or other languages, Python, Perl, and bring both of them up from both ends. And so, just an example of a project that I have: taking these PLV8 JSON documents and putting them back into ActiveRecord. So this is ActiveRecord.
I made a Product class that has a primary key as an integer, a data column, which is just JSON, and then some timestamps. And so the data in Ruby just shows up to me as a hash, a dictionary. And so I have a cost; I can set it to three. I can just put nested information in there in my native language.
And then my library knows how to serialize that to JSON and deserialize it back for me. So I can save it and work with it, and that's a very nice way to do it. And this serialization and deserialization is really easy to implement. But the real interesting thing is when you start wanting to query on that.
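That serialize/deserialize round trip is essentially this, sketched here with plain Ruby and the standard json library rather than the actual ActiveRecord integration; the attribute names are made up:

```ruby
require "json"

# A hash in my native language, with nested data, like the product example.
data = { "cost" => 3, "tags" => ["new", "sale"], "dims" => { "w" => 10 } }

stored = JSON.generate(data)  # what the library writes to the JSON column
loaded = JSON.parse(stored)   # what it hands back to me as a hash

puts stored
puts loaded["cost"]
```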
And so I just made 30 products here, each with a cost from 1 to 30. And, borrowing from hstore's little arrow notation, I decided that my JSONSelect operator is going to be the arrow for the JSON data type. And so I'm pulling out cost greater than 10.
Now, the tricky thing here, in sort of the rough edges, is that because I want to treat 10 as a number, I have to make my library smart enough to realize that and then change the query to cast it to numeric. One of the interesting things about JSON is that it doesn't have integers; it only has a general number type. And so I can't really get away with casting it to an int,
because it may be a float, or exponential notation, or whatever. So I have to fall all the way back up to numeric in the number-type tree. So this is sort of a proof of concept. It works; I can deal with that. And once this is sort of usable,
I want to move on and start building libraries for other languages, Python, Clojure, some of these other ones, so that everyone can use this, and put together a package of useful PLV8 functions so that you don't have to know about them and go seek them out.
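For example, the generated SQL for that cost-greater-than-10 query, plus the function-cost bump that bit me, might look roughly like this; the json_select function, table name, and cost value are all illustrative:

```sql
-- Hypothetical query the Ruby library could emit for cost > 10,
-- casting the JSON result up to numeric rather than int:
SELECT *
FROM   products
WHERE  (json_select('.cost', data))::text::numeric > 10;

-- Telling the planner this function is expensive, so it doesn't
-- re-evaluate it more often than necessary:
ALTER FUNCTION json_select(text, json) COST 1000;
```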
And one of the things that sort of bit me is that I had to learn about function costs, because JSONSelect is a monstrous function, and Postgres was re-evaluating it more frequently than it had to until I bumped up the function cost a lot. So, sort of in closing: what we have now is friendly documents
that developers really love. And having them in a world-class database that's been around, that's proven and fast, is just awesome. And there are a lot of interesting possibilities and promise in having a data type and a PL language that are really suited to each other.
They're designed together, they work together, and the impedance mismatch in using them is so low that it's just a joy to work with. And more importantly, for the Postgres community, expanding out into the mindshare of application developers
is another way to get people using advanced database features. With all these ORMs, one of their selling points is that they're database-agnostic, which isn't really a selling point for you when you're making your application. It's a selling point for them, to get as many users as they can.
But once you build your application, you're not really going to change databases. It's super expensive, it's super painful to change. So once you pick your database at the beginning, you're probably not going to change it. And so the database-agnosticism of an ORM doesn't really matter for you.
But people are still like, oh, I want to keep it portable, I don't want to use these advanced features. That's starting to change, though, especially with things like exposing full-text search and geolocation to people, and now hstore and PLV8;
Ruby developers and some developers from other languages, from what I can tell, are starting to take advantage of these features, which is fantastic. Another thing that is interesting is now that you can have one language all the way from the browser to the application tier to the database, you can start playing a little bit looser with where your computations are done,
and it's easier to move your code from one tier to another as it makes sense, which is exciting. But could all this be better? Yeah. So I just have a little wish list of things, and you can take them or leave them. First, I'm super happy that the JSON data type made it in.
If people use it, then we can work on making the storage better, not repeating the keys and so on, and making lookups more efficient. But if no one ends up using it, then it's good that the time wasn't spent a priori.
Eliminate reparsing: if we could somehow store the parsed JSON in a different format, the functions wouldn't have to reparse the JSON each time. I don't know the technicalities of how that could be done, but it would speed things up considerably. A general JSON index, if we can get that into GiST.
Automatic function costs: I did not know that functions could have costs, and that you can change them, until my colleague Dan was like, oh, you should just bump up the function cost. I assume that Postgres could notice that a function is very expensive and bump it up itself, maybe.
An HTTP Postgres would be phenomenal. If we're speaking JSON, I could point some of my queries at one of my read-only slaves, and because I've crafted the query, I know it's going to be exactly what my client wants, and skip my whole application tier.
This was something I was able to do in CouchDB, and it was really great: I just made my view and I could send it straight out. That's all I have. Thank you.