
Using PostgreSQL with Redis


Formal Metadata

Title: Using PostgreSQL with Redis
Alternative Title: PostgreSQL + Redis
Number of Parts: 25
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place: Ottawa, Canada

Content Metadata

Abstract
Using native wrappers and a Foreign Data Wrapper for two-way Redis integration. Redis is a high-performance in-memory distributed data store which can work well in conjunction with PostgreSQL in certain types of applications. The talk will introduce native wrappers for the Redis API, which allow pushing data into Redis at high speed, as well as use of both the native API and the Redis Foreign Data Wrapper to pull data from Redis. We will also discuss possible scenarios where use of each of these is a win, and the trade-offs involved.
Transcript: English (auto-generated)
OK, good morning, everybody. First up, you're going to have to excuse me if I every so often have to blow my nose, because my allergies are playing up somewhat. So we're going to talk about Postgres and Redis.
And we'll look at, here's basically what we're going to cover. We're going to talk a bit about what Redis is and how it works. Then we'll take a look at the Redis foreign data wrapper. Then we'll look at a companion package which wraps Redis command API.
And finally, we're going to have a look at a case study of a high performance ad server that's based on a combination of Postgres and Redis and uses these utilities.
Redis is a high performance in-memory key value data store. Now, it will actually write to disk every so often. And how it does that is configurable. And it also has some replication features, which I won't be covering here. But essentially, this is what it is. It's a pretty simple data store.
The nice thing about it, the thing about it that makes it attractive for use with Postgres is that it does have some structured data types, as we'll see.
It has almost no configuration required to set it up. Those are the commands you need on a recent Fedora instance to install it, run it, and get talking to it. The last one is just the command line client
for talking to Redis. And that's all you have to do. Now, when you get to production, you're going to tune it a bit. Principally, you're going to tune how much memory it uses. But to start off with, that's all you need to do.
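Something like the following is what that setup looks like on a recent Fedora box; the exact package and service names on the slide are assumptions here:

    sudo yum install redis            # install the Redis server
    sudo systemctl start redis.service
    redis-cli                         # command line client for talking to Redis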
It's very, very simple. Redis, as I said, it's a key value data store. And the keys are just strings. That's all they are. But the values are rather different. They can be scalar values. They can be strings.
Or they can be integers. So you have increment operations, which are useful for doing things like web counters and so on. And they can also be structured. And these are the four structures that they support. They have lists of things. They have unordered sets of things. They have ordered sets of things.
And in effect, these are actually scored data. So you can insert a value into these things along with some score. And then the ordering is according to the score. And you can get back that score. And the last thing they have is a single value
can consist of a set of name/value pairs, somewhat like an hstore. Yeah. Pretty much, I believe. I don't think we're actually doing any of that.
I think that is, yeah. So these hashes play a big part in the way that we're using the Redis in conjunction with Postgres.
There's a very simple command set. This command set is unfortunately quite large, but all the commands are very simple. There's nothing like SQL. It's basically a command and a few parameters.
That's the URL where you can go and have a look at the command set if you're interested. Here are some examples. So the first example is setting a scalar value. You simply say set the name of the object
and the value of the object. The second command is setting some values on a hash key. So this is simply an arbitrary name and then a set of property name/value pairs.
The third is adding some values to a set. The fourth is adding some values to the end of a list. And the fifth is setting some values in an ordered set.
The scores, by the way, can be floating point numbers. They don't have to be integers.
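A sketch of those five examples, with made-up key names (the names on the slide may differ):

    SET hits:total 42                                            # scalar value
    HMSET web_sessions:abc123 browser firefox username fred     # hash: name/value pairs
    SADD web_sessions_keys web_sessions:abc123                   # unordered set
    RPUSH recent_clicks ad17 ad23 ad17                           # append to a list
    ZADD leaderboard 3.5 ad17 1.25 ad23                          # ordered (scored) set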
OK. There is no creation command for a Redis object; it simply springs into existence when you add some value to it. So it's almost schemaless. The only thing is that if you try to use a list operation on a set, or something like that, then it objects.
But apart from that, Redis is pretty much totally schemaless. All right, and we have a single global namespace for all the objects of all types. So you can't have two objects of different type with the same name. Every object has to have a unique name.
So yeah, people tend to build unique names along this sort of pattern. One of the things you can do is this command, keys,
which will find all the keys that match a particular pattern, using glob-style patterns. The Redis documentation actually warns against using this command in production. And we'll see later on how to get around this problem by using a different command to find the keys that we're
going to need. So Redis users basically create, in effect, tables; that is to say, they segregate the global namespace into portions. Either they use a prefix — they'll put a date, or a session ID, or something like that on part of the key, something which is going to segregate it — and then they use the keys command that we talked about. Or what they do is this: they keep the keys in a separate set. So you might have a set called, say, hits:keyset, and you add each key to that. Then, if you want to go and find all the keys that are in your set, instead of using the keys command, you simply get the members of that set, which is a very fast operation, and process those. The downside is that the application has to make use of that. Redis itself is unable to match up the members of a set, or indeed the values returned by the keys command, to anything else. So your application has to do all the work there.
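A sketch of the two approaches, with hypothetical names — the glob-style KEYS scan that the docs warn against, and the keyset alternative:

    KEYS hits:*                        # scans the whole keyspace; avoid in production
    SADD hits:keyset hits:2013-05-24   # maintain the key names in a set as you write...
    SMEMBERS hits:keyset               # ...and read them back cheaply instead of using KEYS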
There is a client library which the Redis community has produced. It's called hiredis. It's pretty simple, and it's available on GitHub. So the Redis foreign data wrapper is something Dave Page originally wrote several years ago, back in the 9.1 days, when we were first starting off with foreign data wrappers. Recently it's been brought up to date so that it runs with Postgres 9.2, and somewhat extended by me.
And recently, we moved it from Dave's personal GitHub space into this more generic name. So that is now the authoritative source for the foreign data wrapper. The tip of it, by the way, is currently broken.
I'll tell you about the breakage when I tell you the latest stuff I've done in a while. So originally, basically, it only supported the scalar data type. It simply ignored everything else
and said, oh, well, it's not a scalar. I'm not going to give you back anything. And it had no support for segmenting the namespace in any way. So basically, it treated the whole Redis namespace as a single table.
And about the only thing you could do is you could say, well, I want just that one key. So it would actually push down a qual from Postgres on the key name for equality only. And if it did that, then it would actually go and just fetch that key and not give you back all the rows.
Otherwise, it would give you back all the rows and it was up to Postgres to apply any where conditions. But of course, that might mean you get a whole lot of data back from Redis, and then you're having to filter them on the Postgres side.
So starting around last December, I started doing some of these updates. The first thing I did was to make it work with Postgres 9.2. Then all of the data types became supported, so you can have a table that will deal with set objects or list objects or hash objects or ordered set objects, as well as with scalar objects. The second thing we did was to provide an option where you could specify a key prefix for the table. So then, if you say that the prefix is 'hits:', it will only fetch keys whose names match that prefix. Or it can use the keyset functionality
that we talked about before. You provide the name of a set, and then the keys of the table are the values contained in that set. So for everything except scalars,
the data comes back as an array. And that's actually returned as a Postgres array literal. So if you actually define the table as being an array of text, say, then it will actually break that up into an array of text, which you can then
index, and we use that in a variety of ways. So hash tables are the most important type, because really, they map most closely to Postgres records.
The Postgres record is essentially a mapping from a set of names, column names, to a set of values. So a Redis hash is exactly like a row, in that respect.
So if you're using a hash type table, then the best thing is to define the table as having an array of text that's the second column. And then what you can actually do is it's possible to turn that into JSON, or HSTORE, or indeed to a record.
So here's an example. We're going to take this. We're going to create a foreign table called webSessions, which is going to have a key. All these tables pretty much have to have one or two columns, a key and an array of text.
We see that the table type is called hash, and we're using the prefix method to choose which keys we want. And we can simply select star from webSessions, and we'll get a bunch of sessions, one row per session.
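A minimal sketch of that definition, assuming the server and option names used by the Redis foreign data wrapper (check the repo's README for the exact spellings):

    CREATE EXTENSION redis_fdw;
    CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw
        OPTIONS (address '127.0.0.1', port '6379');
    CREATE USER MAPPING FOR PUBLIC SERVER redis_server;

    -- one row per session; 'value' comes back as an array of alternating
    -- field names and values
    CREATE FOREIGN TABLE web_sessions (key text, value text[])
        SERVER redis_server
        OPTIONS (tabletype 'hash', tablekeyprefix 'web_sessions:');

    SELECT * FROM web_sessions;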
So to use that with an hstore, what we can do — because hstore has the ability to turn an array of key/value pairs into a record — is create a record type, for instance, which has these properties: id, browser, and username. And then we use hstore's populate_record to turn that into records of this type. This works pretty well. There's also a new extension which I recently
wrote, called JSON object, which pretty much does the same thing as hstore's populate_record, but does it from a JSON object.
So if what you want is a JSON object, rather than an hstore or a record, you can actually create one from this. And of course, in 9.3, we have functions to turn JSON into records; that's part of the new JSON work.
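A sketch of turning those hash rows into typed records via hstore; the composite type and its fields are taken from the example above:

    -- requires the hstore extension
    CREATE TYPE web_session AS (id text, browser text, username text);

    -- hstore(text[]) pairs up the alternating keys and values,
    -- and populate_record() maps them onto the composite type
    SELECT (populate_record(null::web_session, hstore(value))).*
    FROM   web_sessions;

    -- from 9.3 on, json_populate_record() can do the same job from a JSON object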
So here's the thing: don't ever use the key prefix in production. The ad server that we're going to talk about
actually tried it — that was the first pattern they wanted to use, and they did; they said, we don't want to bother with having to keep the keys in a keyset. Then they came to me a couple of weeks later and said, it's not performing very well. I said, well, what happens if you try it with a keyset? And suddenly, magically, it just worked really fast. All they had to do was, as they were setting the values, also do an sadd of the key. And then when they fetched it, it was just very fast. This is the pattern that is actually recommended by the Redis
docs. They say: don't use keys, use sets of keys. OK, the other thing you can do is use a set of keys as a kind of where-clause filter on a table. You can put the keys you want in a Redis set, define a table which uses that set as its keyset, and then pull that back. That, in effect, will subset the data that you want. And they're actually using this in the ad server. It works pretty well.
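A sketch of that pattern, with invented names; tablekeyset names the Redis set whose members become the table's keys:

    -- on the Redis side the application does: SADD interesting_keys web_sessions:abc123
    CREATE FOREIGN TABLE interesting_sessions (key text, value text[])
        SERVER redis_server
        OPTIONS (tabletype 'hash', tablekeyset 'interesting_keys');

    SELECT * FROM interesting_sessions;   -- only the keys listed in the set come back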
We don't actually have support for a lot of push-down in the foreign data wrapper yet, so you're going to need to use these sorts of tricks, at least at the moment. If somebody wants to fund push-down, we can have a look at it, but we don't have any plans on the table for that. OK, so as I mentioned, in 9.3 we've got json_populate_record, which means you could avoid the use of hstore
if you want to turn stuff into a record. Post 9.3, it would probably be a good idea to have something which didn't actually have to use these intermediate types, JSON or hstore, to create a record. And we should have a function, either in core or as an extension, which would
take an array of key value pairs and simply produce a record of the required type directly. That would be quite possible, and it's probably worth doing. In fact, it's probably worth having in core. It's sufficiently general use.
It would certainly be useful for use with the Redis data wrapper. OK, here's the thing I wrote recently, which has currently caused the data wrapper to be broken. The Redis foreign data wrapper currently treats every Redis object as a table row. But what if we actually treated it as being a table instead of a row?
And the classic case for doing this would be sets or lists. So what you can actually do is, or what you will be able to do when I unbreak this, is define a Postgres table to correspond
to a single Redis object, and then get the values in that object back as a set of rows. That has all sorts of possible useful values. So the sets and lists will simply
come back as single-field rows. Hashes are going to come back as key/value rows with two fields. And with ordered sets, you can choose one or two fields; if you have two fields, the second is going to be the score that corresponds to the value in the ordered set.
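A sketch of what such a definition might look like once that work is finished — the option name here is an assumption based on the FDW's README, and the feature was described as broken at the time of the talk:

    -- one Redis ordered set exposed as a whole table of (value, score) rows
    CREATE FOREIGN TABLE leaderboard (value text, score numeric)
        SERVER redis_server
        OPTIONS (tabletype 'zset', singleton_key 'leaderboard');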
And particularly after my experiences of the last couple of days — so I now know how to write writable foreign data wrappers, after the blackhole experience, not that that required a lot of code — I'm planning to start work very soon on having the Redis foreign data wrapper able to write values back to Redis.
But in the meanwhile, we can't do that. And we certainly can't do it in Postgres 9.2, which doesn't have support for writable foreign data wrappers anyway. So what we've done is wrap that library which is also used in the foreign data wrapper — the hiredis library. We've wrapped its functionality in some Postgres functions, which are fairly simple. This work was sponsored by IVC, who are the creators of the ad server.
And it's available on their Bitbucket repo. It basically has four functions: redis_connect, redis_disconnect, redis_command, and redis_command_argv.
Now, the Redis foreign data wrapper makes a connection every time you select; it doesn't keep persistent connections hanging around. But this is really designed for much lower-level operations. So we didn't want to be having to connect and disconnect
every time you push a single value. You might want to push a million values from a table. And I didn't want to be having to make a connection and break a connection for every single value that I was pushing to Redis.
So basically, what you have to do when you're using this library is to set up a handle using Redis connect. You then use that handle in the Redis command and Redis command argv commands. And then when you're finished, you disconnect.
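A sketch of that lifecycle; the function names follow the talk, but the exact signatures are assumptions, so check the Bitbucket repo:

    SELECT redis_connect(0);                 -- handle 0: local host, database 0
    SELECT redis_command(0, 'flushdb');      -- run a utility command
    SELECT redis_command_argv(0, 'hmset', 'web_sessions:abc123',
                              'browser', 'firefox', 'username', 'fred');
    SELECT redis_disconnect(0);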
Yeah? So far as you know, does the
I don't know of anything in the foreign data wrapper API. I mean, basically, it sets up. I mean, you've got to remember, once you've got that, from the point of view of Postgres, you've just basically got a table.
So everything has to be handled inside the foreign data wrapper code. I don't know of anything that would give you access to arbitrary commands. So even when we get writable foreign data wrappers, there's probably still going to be some value in this library because you're going to be able to do things like flush
all if you want to delete, to clean out the database and that sort of stuff, to get access to the utility commands of Redis.
OK, so the first argument to Redis connect is a handle. It's an arbitrary integer which you choose. It needs to be a number between 0 and 15, in fact. And it's simply an index into an array. It's an extremely lazy way to do it. And so I apologize for that.
And I probably ought to fix it and make it a fancier name and then look that up in a hash table. But I don't right now because I was in a hurry. All the remaining arguments are optional. They all default to something.
So normally what you want, if you're running on the local host and you're running with Redis database 0 — Redis databases are also numbered 0 to 15, by the way — is simply to give it the handle number that you want and then use the defaults.
Ignore duplicate says, if we've already created a handle with this, don't object, just return. And that was found useful by IVC in some of their code.
OK, as I mentioned, these are persistent connections. So you need to set them up and break them down explicitly. So these other things are basically very thin layers over the library functions, which
have almost identical names. If you want to use one, just use redis_command_argv. redis_command is the first one I wrote, but unfortunately mapping that onto variadic C functions doesn't work terribly well. So there's redis_command_argv, where you actually pass an explicit vector of arguments terminated by a null, I believe, and that works a whole lot better. With redis_command, you can only have up to four arguments after the command string. If you need more than that, then it objects.
At the moment, the arguments are text values, but I might switch it from using variadic text to variadic "any". I wasn't actually all that familiar with variadic "any" at the time I wrote this originally. I've since become familiar with it and done one or two things with it, particularly in the JSON space.
And it has certain limitations, but I don't think they particularly apply in this case, and it would make things a lot easier, because you wouldn't have to cast things to text. Josh found with one of the other utilities
I wrote that it was a whole lot better when we used variadic any. So the uses are to push data into Redis and also to run utility statements. And it's being used for both of those things in the ad server. So I've also added in the ability to push a record.
And basically, what this will do is pull all the data out of the record without you actually having to name it explicitly and manipulate it. You can just pass in the record. You can tell it if you want to push the keys, which you normally do. You can tell it the keyset you want the keys added to. You can tell it what the prefix is going to be, and what the key fields are in the record.
And then your record could be hundreds of fields long, and you wouldn't have to do anything; it would just set that. This is basically for pushing a hash. And it will set up the keyset for you as well, if you don't pass null in there,
as well as all the hash rows. So it works quite nicely. At the moment, this is not in use, and they're actually doing some looping code in PL/pgSQL; this is going to make life a whole lot simpler in the ad server code.
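A hypothetical call illustrating the shape of that record-pushing function — the name and argument order here are guesses from the description, not the actual signature:

    -- push each row of 'hits' into Redis as a hash named 'hits:<id>',
    -- also adding each key to the 'hits:keyset' set
    SELECT redis_push_record(0, h, true, 'hits:keyset', 'hits:', ARRAY['id'])
    FROM   hits h;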
So why use Redis? Well, it's fast. It's actually really fast. It's not terribly safe. Its notion of transactions is almost nonexistent.
And to the extent that it's existent, it doesn't make terribly much sense, at least not in relational terms. So it's useful for certain types of data. If it's data that you can afford to lose,
or at least to lose a little bit of, then it's certainly useful. And the real reason is that, given the constraints of the project that we were working on, we could not have achieved the performance goals simply
by stashing all the data in Postgres. We would have overwhelmed the system. So the use case is an ad server. And basically, our attitude was, if Redis crashes, it's not a tragedy. But if the whole system is slow, then it is a tragedy.
Because all our customers are going to leave if we don't serve the ads back really fast. So this project was created by IVC. It was done pretty much on a shoestring. And they're a sort of small boutique software house
in Cary, North Carolina that I've worked with for quite a few years. They've sponsored a number of extremely useful Postgres developments, such as extensible enums. And they're a bunch of good people. So most of these slides are based on information
from them. So here are the system goals that they had. They need to be able to serve 10,000 ads per second per CPU. So they're actually running on eight-core machines in the database. So that's 80,000 per second.
They're using some old hardware that they managed to acquire very cheaply. They did not want to spend up big on hardware and overcome the limitations by running on super-fast hardware. The timing constraints are that they
need to be able to answer the question of which ad in two stages. The first stage is actually answered from Postgres. And what it does is it says, what are the ads we can legally supply to comply with this request?
And it comes up with a list of around about 30. The second question it answers is, of those 30, which one is going to be the best to serve based on statistical analysis of what ads have recently been served in this particular area,
and which ones are actually going to pay the best? Because it's based on pricing, and there's a fairly complex algorithm which is applied. That answer is computed from the data that is held in Redis.
It records the ad requests, the confirmations, and the clicks. And it pretty much needs to be a 24-7 operation. So here's a physical view of the system. Essentially, we have these four systems at the bottom.
They're Dell 2950s, which are not high-end machines, right? They don't have a huge amount of memory. All of these run Postgres databases. Two of them run the transactional database and a hot standby, for answering that first question. And two of them run the backend system — the data warehouse, in effect — that does all the analytics and the billing and the back office support. Everything else runs on virtual hosts,
run from the Xen hosts out on the right-hand side. And that includes Redis; Redis is running out there. Redis is actually chewing up almost all of that 128 gigabytes — two instances, one on each of those Xen hosts — and it runs with about, I think, 97 gigabytes of memory. OK, here's a logical view of how it works.
At the front end, the ads are actually served out of Nginx, which deals with the file system — it pulls the ads straight off the file system. The queries come in through Node.js, which we found to be extremely fast. It talks to Postgres at the top and then to Redis here
and answers that query. And it's answering that query in about 10 to 12 milliseconds per query. The transactional database holds only two tables.
And we actually construct those two tables every 10 minutes. And they're constructed in the business database, which is a conventional OLTP-type system.
We construct that, we push that data out to the transactional database in its highly denormalized form, build the indexes there, and then we simply swap it for the old one using a rename. And that provides pretty much no interruption whatsoever. We have not noticed any slowdown at all as we're doing the rename. It's working very well.
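The swap-by-rename approach described here might look roughly like this (table names invented):

    BEGIN;
    ALTER TABLE ads RENAME TO ads_old;
    ALTER TABLE ads_staging RENAME TO ads;
    COMMIT;
    DROP TABLE ads_old;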
We also replicate some data into the data warehouse from the business database for doing the business analytics. And that is done on a selective basis using Londiste 3, from the SkyTools package.
So there we are. We have six Postgres databases. I think I've pretty much just about covered this slide in what I just said. Yeah, the other piece of this is that we have pgpool doing load balancing over the transactional databases, and failover.
And if anything else falls over, it's not a tragedy. But if the transactional database falls over, then we're not serving ads anymore and we're losing money. And they've tried quite a lot of stuff to make it fall over.
They've pulled plugs and crashed machines in various ways, and it's still stayed up. It's fairly robust. OK, so business database, as I said, is a pretty conventional OLTP-type system. It's a fairly normalized system.
We've got tables for ads and advertisers and publishers and IP locations and stuff. There's only a handful of users who ever touch this database — our business customers, that is to say: the people who publish our ads and the people who advertise.
But they actually tried, in the first instance, serving the ad requests direct out of this database. They did not get to first base. They could not sustain anything like the load that was necessary, certainly not with the hardware that they had on hand.
So we pushed these materialized views out to the ad serving database. One of the keys to this is that we use Postgres advanced data types a lot.
You know, the size of an ad is a Postgres geometric type. It's a box. There's extensive use of arrays of various types. We use some text search vectors and text search queries to match keywords.
And we use GIN. I think we've got just about every index type that Postgres supports, all on the one table. And it's all working extremely well.
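A made-up fragment in the spirit of what's being described — a box for the ad size, arrays, a tsvector for keyword matching, and GIN indexes:

    CREATE TABLE ads (
        ad_id     integer PRIMARY KEY,
        size      box,          -- ad dimensions as a geometric type
        regions   text[],       -- one of several array columns
        keywords  tsvector      -- matched against a tsquery at serve time
    );
    CREATE INDEX ads_keywords_idx ON ads USING gin (keywords);
    CREATE INDEX ads_regions_idx  ON ads USING gin (regions);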
They've tried stressing this, and on the current setup, what happened was that they were actually able to saturate their network pipe before they were able to stress the system. So they don't know what the actual stress point of the system is yet.
So here's the ad serving database. As I said, we simply copy the data in this very denormalized form every three minutes and then create the indexes and swap them out. And they are absolutely convinced that this could not be done without Postgres.
They're very Postgres-friendly, and they wanted me to tell you that: thank you very much. OK. So let's say we're getting 10,000 impressions per second.
All of these impressions are actually going into Redis. And then they're pulled from Redis in batches using the foreign data wrapper and the set model every six minutes for statistical analysis.
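The batch pull is, in effect, just an INSERT ... SELECT over a foreign table; a rough sketch with invented names:

    INSERT INTO impressions_staging (key, data)
    SELECT key, value
    FROM   redis_impressions;   -- foreign table defined with a keyset, as above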
If they'd been doing this with individual inserts into Postgres, it would have been overwhelming. They couldn't have done it. They use partitioning to run their nightly batch processing
to produce billing reports and other statistical analysis. And again, they're extremely grateful for the way that partitioning works with Postgres. They think it works very well for them.
OK. Oh, one other thing. The impressions are not stored directly into Postgres. But that's not where the money is.
The money in serving ads is when people actually click on an ad. That data is actually recorded straight into the recording database as well as into Redis. But that's only a tiny fraction of the traffic
because you bring up a web page, you're going to see 100 ads. And most often, you're not going to click on any of them. So clicks don't account for very much. But impressions are almost all the traffic. But of course, you might pay $0.01 per 1,000
for impressions, whereas you might pay $0.50 or $1 for a click. OK, so basically, this recording database does all the back office processing. They initially tried pulling this data
from the data in the business database that they required in using DBLink. They found that was way too slow. And so they replicate that data in using Sky Tools. And it's a lot faster.
Fraud detection. Fraud detection is actually a major issue in ad serving. And there's a considerable incentive on publishers to generate clicks on the ads that they're publishing so that they get some revenue. And you need to be able to analyze
the patterns of impressions and clicks in order to prevent fraud. This is actually done in two stages. Firstly, they have some fraud detection that is running pretty much in real time
against the Redis database, where the clicks are initially stored. They also do longer term fraud detection in the recording database, because they only keep basically a day's worth of impressions at any one
time in the Redis database. They clean it out pretty regularly. That means you need to store a lot of data in the recording database. So we know pretty much this — this is telling us what we already know here. In the ad server, as I said, 94 of the 128 gigabytes on each of those Xen hosts is dedicated to the Redis VM. And almost all of that is used by Redis. Redis is the only thing that runs on its VM.
So they have found that the stuff that we did was a lot faster and made life a lot easier for them than if they had to write this themselves. And they found good performance characteristics from it.
So there's stuff going the other way, too. One of the things that you need when you're making a decision about which ad to serve is what prices are being paid for impressions and clicks, because that helps you to decide whether you
serve this ad or that ad. And so that actually gets pushed from the business database into Redis so that you can make the decision in Redis, or the Redis client, in effect, can make the decision. And on the other hand, the impression and click data
is pulled from Redis into the recording database via the foreign data wrapper. So we have interaction both ways. This has been in production since March, scaling pretty well, as I said.
And it was done on a shoestring: pretty much two guys, with some help from me, did this in the space of three months. So here are their conclusions.
Rich data types, absolutely essential. Couldn't have done without them. The indexes and the different index types helped them a lot. Good interaction between Redis and Postgres. Essentially, we're using Redis to do what Redis does well and Postgres to do what Postgres does well.
This is the key to good hybrid systems. I'm convinced that we're going to see more and more hybrid systems of this kind. So picking one tool to do all the work is probably not going to cut it for a whole lot of applications. And you're going to see people
combining different sorts of data stores for different sorts of uses here. Somebody else was telling me about using Postgres with Hadoop the other day, and we're seeing a number of cases of that and so on. Node.js is very fast. Its concurrency is superb. And they are pretty much in love with it.
And the costing, they think that their development cost was approximately 2% of what a commercial alternative would have cost. It's all open source. So that's it.
Yeah. Node.js is essentially answering all the dynamic questions. So the only thing that Nginx is doing is actually serving the ad. In other words, the answer comes back from Node.js and says, this is the ad to serve,
and then Nginx pushes it out. The actual query is handed off by Nginx to Node.js, and the whole app is written as a Node.js app. OK?
Yeah.
Not sure what the wire protocol is, to tell you the truth, but it's probably something fairly much like that.
Right.
You're going to get an error. I mean, all these commands check that you can talk to Redis, and they throw a Postgres exception. So the thing comes in and says: OK, have I got a connection? No, I haven't got a connection — ereport ERROR. OK, now I'm going to run this command. Well, if that didn't work, I'll report the result back with an ereport ERROR.
Yes. Yeah.
Yeah, I don't know. It's probably worth looking at to see whether we could. The trouble with not doing ephemeral connections is that we would have to have some API to make them persist.
Now, maybe, and there's nothing in the foreign data wrapper model which actually would do that, but that might not be an objection. It might be that we could simply say, well, if there's a connection there, use it, and if not, connect, or something like that.
Yeah.
Yeah, we might be able to do it, and maybe we could say, you know, use a, it could be an option on the foreign table. Use persistent foo or whatever, and then if foo hadn't been set up, you would set it up,
and if it had, you would just use it.
Yeah. Yeah, that's probably worth doing. Yeah, no.
I mean, no. No, there's no nesting of data types at all.
Yeah, it's basically a key value store. Yeah, yeah, yeah.
Not sure to tell you the truth. I haven't written that part of it. OK?
All right.