
PostgreSQL features for Internet of Things


Formal Metadata

Title
PostgreSQL features for Internet of Things
Alternative Title
PostgreSQL features for IoT
Part Number
16
Number of Parts
110
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Transcript: English(auto-generated)
I'm going to speak about PostgreSQL, also known as Postgres.
Full disclosure, I haven't got any devices attached to me apart from the microphone. I'm going to be talking about databases for the Internet of Things, specifically Postgres. I'm one of the people that write Postgres, so please shout out abuse if you really don't like what you hear.
So who's heard of Postgres? Show of hands please. Keep your hands up if you know something about Postgres. So Postgres is an open source database. Come on in, don't worry.
It's an open source database with a BSD-style license. A lot of people think that it's a relational database. People that don't like it tend to call it a relational database because they feel that that somehow limits it and makes it 20 years old, as I've heard it said. So Postgres is actually an object-relational database, which makes it highly extensible.
And what that's given us over the years is the easy ability to add new data types, add new types of index. And specifically some of the things I'm going to be talking to you about today is our ability, fairly recently, to store JSON data within the database in a compressed, indexable form.
And that's known as JSONB, as well as our ability to store GIS data within the database, which is via a package called PostGIS.
So what I was asked to do was do a demo. So I'm going to do a demo just on my laptop. And what I wanted to do is just give you a very simple database to think about. It's a single table, and it's got two columns in it, one called TS, which is a timestamp, and another column called J, which is of type JSONB.
And into that simple table, I've got a pretty simple blob of JSON, which has got a field called device, which has got a number.
And then 12 things that I'm calling measures, but obviously they could be temperature or position or air pressure or anything you wanted that to be. So the JSON that I'm storing, the schema doesn't matter. This is just a demo.
The number of fields in the JSON block doesn't matter. That's just a demo as well. So I'm just giving you a flavor of what's possible. And on that table, I'm adding a single index. And I'm going to be using a new type of index that we have in Postgres 9.5.
We've got a block range index or a BRIN index. And this is the type of index that's specifically designed to be low overhead as you insert data into it. And it's also specifically designed to have a very low overhead as the table grows. So it is specifically designed for tables of like terabytes and above size.
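The schema just described can be sketched like this. The talk only names the two columns, ts and j, so the table name readings is an assumption for illustration:

```sql
-- Single table: a timestamp plus a JSONB payload
-- (table name "readings" is assumed; the talk names only the columns)
CREATE TABLE readings (
    ts timestamptz NOT NULL,
    j  jsonb       NOT NULL
);

-- Block Range (BRIN) index, new in PostgreSQL 9.5: it stores only a
-- per-block-range summary (min/max of ts), so insert and storage
-- overhead stay low even as the table grows to terabytes
CREATE INDEX readings_ts_brin ON readings USING brin (ts);
```

Because a BRIN index holds one small summary per range of table blocks rather than one entry per row, it stays tiny on append-mostly tables, which is why it suits the sensor-data workload described here.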
And what I'm here to tell you about is really that that's specifically for the Internet of Things. So this is the bit where I wish I'd put my mic on. So I'll attempt to do that here if you let me know if there's a problem with the audio.
So what I've done here is just set up a simple database. As you can see, I've got a single table in the database as I've just described. The current table has got 2.2 million rows in it.
And that is 430 megabytes. So rather than do anything attaching to the Internet, I'm just going to give you a demo just straight off my laptop. So if we truncate that down to zero bytes in size,
what I'm now going to do is run a test where I actually load data directly into that database, single records at a time just so that you get a flavor of roughly how fast Postgres is.
So I'm loading 100,000 records using two connected sessions through to the database. And each individual session is inserting into the database.
So it's transactionally recording each and every individual insert. So in this particular example, I've got 11.7, nearly 12,000 transactions per second going into the database on my laptop.
And that's using fully safe, fully sound transaction semantics. If you don't want that, if you think that that kind of thing is over the top, then we can do it a slightly different way.
What I've got here is the ability to use relaxed transaction semantics using a feature called synchronous commit equals off. What this does is it doesn't flush the transaction to disk when you commit.
It just leaves it in memory and it's then flushed half a second later. So if you're willing to potentially lose data in the event of a crash, we can greatly increase the performance from the server. So by enabling this feature, I get 32,000 transactions per second off of two sessions.
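The relaxed-durability setting mentioned here is an ordinary session (or per-role) setting, along these lines. The role name sensor_writer is a hypothetical example:

```sql
-- Relaxed durability: COMMIT returns without waiting for the WAL flush.
-- A crash can lose the last fraction of a second of commits, but it
-- never corrupts data that has already been flushed.
SET synchronous_commit = off;            -- for the current session

-- or apply it only to a dedicated ingest role (name is illustrative):
ALTER ROLE sensor_writer SET synchronous_commit = off;
```

Note this is a per-transaction durability knob, not a data-integrity one: transactions still apply atomically; only the flush-on-commit guarantee is relaxed.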
So as we can see, we're getting 30,000 transactions per second using full individual transactional semantics.
What I can also do is I can actually load data into the server using bulk mechanisms. Previously, I've prepared a data set that's got 2.2 million rows in it. And if I load that into the server, you'll see that that loads significantly faster than the individual row mechanisms.
And hopefully, that's going to come back about now. So 14 seconds, 2.2 million rows loaded, and that means it's loading a million records every six seconds into the database.
So why show you two things? Well, obviously, if you want to just load the database with full transactional semantics that works,
if you want to accumulate records into batches before loading, that is an even faster approach. Both of those are still transactionally sound, so you can still query the database while that's happening.
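The bulk mechanism referred to is Postgres's COPY command. A minimal sketch, with an illustrative file path and assuming the same two-column table as in the demo:

```sql
-- Bulk load accumulated batches via COPY (path and format illustrative)
COPY readings (ts, j) FROM '/tmp/readings.csv' WITH (FORMAT csv);

-- Readers are never blocked: the loaded rows become visible to other
-- sessions only once the loading transaction commits, all at once.
```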
So people can still be accessing their application all the while you're doing it. So you can put any form of table lock on the system. All that we require is that you cannot access data until it's been transactionally committed. Once it's committed, everybody can see it.
So back to the details. So I've been loading JSON blobs. What I was doing, actually, the program that did that was actually randomly generating the JSON blob and then inserting that.
So it wasn't predefined data. It was actually calculating that and loading that as part of that demo. So let's just have a look at the numbers that we looked at. That's 30,000 JSON messages per second, which works out to be 10 million messages per hour on my laptop.
So if we had a bigger server with greater capability, you can imagine what you'd be able to do with that. The bulk data load was a million JSON objects loaded in six seconds, which gives you 36 billion messages per hour loaded into my laptop.
So imagine what you'd be able to do with a server. What I'm going to talk about in a minute is the fact that we also have Postgres clustering with a technology called Postgres XL. And we've got basically linear performance for that.
So what we'll be able to do is... I think your coefficients are off a little bit there. I think it's 36 million per hour, not a billion. Okay, that's a shame. Are you sure about that? Yeah, it's four times faster, right?
No. If it's one million in six seconds, 120,000 per second. Okay. It's locked. The world needs hearing. So where does all this come from?
We ran a project recently called Axle. It was an EU funded project. So almost all of you sitting in the room and paying your taxes, I hope, and that's actually gone towards funding Postgres for big data. What did we put into Postgres as a result of this project?
Well, BRIN indexes, which I've just described. We've also put row level security into the database, which is of sufficient grade that you can have very complex medical privacy rules inside the database.
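A hypothetical sketch of what such a row-level security policy could look like for the demo table; the app.device_id setting and the policy logic are assumptions for illustration, not the medical-grade rules the talk alludes to:

```sql
-- Row-level security (PostgreSQL 9.5): a patient's session sees only
-- rows from their own device
ALTER TABLE readings ENABLE ROW LEVEL SECURITY;

CREATE POLICY own_readings ON readings
    FOR SELECT
    USING (j ->> 'device' = current_setting('app.device_id'));
```

Once enabled, the policy is applied transparently inside the database, so even ad hoc queries cannot see rows the policy excludes.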
So, for example, what we'd be able to do there is have devices that load things like blood sugar measurements and blood pressure and heart rate sensors loading data into a system that nobody but you or your doctor or somebody that your doctor had temporarily authorized
would be allowed to see those readings. So one of the types of applications I hope we'll be able to do with open source software within ten years is that you'll be sitting there stressing about something
and you'll hear an ambulance coming towards you and you'll go, that's strange, what's the ambulance for? And it will actually be a sensor on you that's detected that your blood pressure and heart rates have risen badly as a result of your stress
and they've predicted that you're going to have a stroke and so they've sent an ambulance to come and get you before you've even realized it's happening. So that's the kind of thing that we'll be able to do with the transaction rates that I'm talking about. So we'll be able to take a blood pressure or glucose level reading
from everybody in Europe every minute. That's the type of numbers that we're talking about here. So what have we got in terms of Postgres clustering? The idea is that we're going to have a database that is a single system image
but it's composed of many separate data nodes. This is just about to be released as beta. We've benchmarked it using the industry standard TPCH benchmark
which is a complex decision support benchmark not necessarily related to the Internet of Things and the red lines are the performance of PostgresXL. The upshot of that is on a 16 node cluster we've got quite a few of those queries going 16 times faster.
The more complex queries in some cases the performance dropped to like 4 or 8 times but overall we've got some very good performance. The most important thing is that for simple queries we've demonstrated linear performance gains or linear scalability.
So if you can imagine the type of things we'd be able to do if you added 128 node cluster you'd be able to scale that up significantly. There's some other graphs that show measurements of linear scalability.
So the blue line is Postgres and then the lines above are 2, 4 and 8 node performance. You can see it's scaling upwards very nicely. We've done that test at 16 nodes at the moment. We think there's no particular reason why we can't go to 32 and 64
but we're just sort of working our way up at the moment. One of the other things that we have recently released is a tool called PGlogical and what this is is a publish and subscribe data transport extension for Postgres
and the idea here is that you'll be able to connect databases together into distributed data architectures. So for example you'll be able to have devices feeding into a regional centre
and then regional centres feeding into area level centres up to potential central data warehouses with a model that we'd be able to send reference data down to the individual devices
and then collect detailed data and send that back up the chain the other direction. So a replication in multiple different directions but here we're sending specifically the individual data items we're not sending things like changes to the index or vacuum records and things like that
don't get sent via this mechanism. The PGlogical stuff all works cross version now so one of the things that I think will be possible in future is you'll be able to install Postgres on your individual devices
and those devices won't need to be upgraded for ten years but it will still be able to send data back through a distributed data architecture like this to a central server where the central servers were actually upgrading each year as the software improves.
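A publish/subscribe setup with pglogical is configured with SQL function calls along these lines. The node names and connection strings below are placeholders:

```sql
-- On the provider (e.g. a regional centre):
SELECT pglogical.create_node(
    node_name := 'region_1',
    dsn       := 'host=region1 dbname=iot');
SELECT pglogical.replication_set_add_table('default', 'readings');

-- On the subscriber (e.g. the central data warehouse):
SELECT pglogical.create_node(
    node_name := 'central',
    dsn       := 'host=central dbname=iot');
SELECT pglogical.create_subscription(
    subscription_name := 'from_region_1',
    provider_dsn      := 'host=region1 dbname=iot');
```

Because only the logical row changes are shipped (not index or vacuum activity, as noted above), provider and subscriber can run different Postgres versions, which is what makes the ten-year device scenario plausible.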
So that's one of the design objectives there. So this was 25 minutes and I wanted to say everything I wanted to say fairly quickly so there'd be some time for questions so I've got about six minutes left. What I'd like to do is just summarise some of the things I've told you about Postgres.
So most importantly the model you use for your data is flexible. You can use a schemaless design if you choose or you can use a relational model or a mixture of the two and there's other ways of doing things as well.
So we support a number of different data types, that sort of flat structured data but we also support document style, text style data or GIS data as well. Hopefully I've demonstrated that the data loading rates are easily sufficient
for you to consider using Postgres for your applications. Although I didn't give a demonstration of the performance of it, the BRIN style indexes I would remind you that the BRIN index was in place while I loaded that data.
So you could see how little the BRIN index slowed down the data load. But the BRIN index itself acts very similarly to partitioning in that it will allow you to access date ranges in that table with high performance.
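The date-range access pattern the BRIN index is designed for looks like an ordinary range query; the planner skips every block range whose min/max summary cannot overlap the predicate:

```sql
-- Only block ranges whose ts summary overlaps January 2016 are scanned
SELECT count(*)
FROM   readings
WHERE  ts >= '2016-01-01'
AND    ts <  '2016-02-01';
```

This is the sense in which it "acts like partitioning": old block ranges are skipped wholesale without the table actually being split up.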
So that's what it's specifically designed to do. Also being announced is Postgres-XL, that's going to be announced in the next week or so. And I've also showed you or at least discussed the fact that we've got row-level security if you need it
and we've also got integrated data transport facilities to create not just a single server but a whole distributed data architecture that is appropriate for the Internet of Things. The last point is that many people like Postgres because the BSD style license
means you've got the capability to install it on devices and not worry too much that someone is going to be chasing you with some copyright infringement or some sort of babble about GPL. So it's particularly suitable for embedded devices as well as big data.
So that's Postgres. I hope I've given you a good flavor of that specifically for the Internet of Things. So over to questions. Who's got a question?
Hello John. You have to bring the microphone to the... I'm sorry? You have to walk to the person who... I'll walk to you, thanks. How are you? It's an SSD. Luckily enough I can afford one now after years of not...
I just checked your figures again. So you can do 100 million per hour with the... That's good. I'll check yours myself as well. Anybody else? Sorry?
So if I can put some data in a couple... Can I also index by field in the JSON item? So the question is can we index the individual JSON items?
And the answer is definitely yes. We've got index types that allow you to index particular elements of the JSON blob and you can have different index types on each of the individual items if you wish. We can in fact just index the whole blob so you've got a kind of everything index approach.
So we do that using an index type called GIN which stands for Generalized Inverted Index which is essentially the same type of index that Elasticsearch and Google uses. Other questions please?
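Both approaches described in the answer can be sketched against the demo table; the index names are illustrative:

```sql
-- "Everything" index over the whole JSONB document
CREATE INDEX readings_j_gin ON readings USING gin (j);

-- Containment query that can use the GIN index
SELECT * FROM readings WHERE j @> '{"device": 42}';

-- Or a B-tree expression index on one extracted field
CREATE INDEX readings_device ON readings ((j ->> 'device'));
```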
So it basically means that... Sorry? The microphone please. How did you do that? So go ahead. Sorry. So basically it means that your JSON is also added to the GIN index. We can change the GIN index. Absolutely correct. Yes. So I have one question. Do you always recommend to go through an intermediate software that goes to the database
or can the IoT devices, the chips, should they talk directly to the cluster server? For a database guy, as long as you put it in Postgres, I don't mind how it gets there. There's a lot of other considerations that you need to think about.
In some cases it's possible to go direct, in some cases not. Do you know about libraries for small devices that allow you to connect to Postgres? Well obviously we just use the Postgres client libraries for that.
So they're available from a number of different languages. There's language bindings for basically everything. I do have a question. Sorry, that wasn't supposed to be invasive. It's a capability in Postgres that you can access it anywhere from anything.
Can you come with the microphone? Yep. In one of my previous jobs I did a lot of CouchDB, which is all JSON all over the place. And the previous job I did four years of Postgres.
So I was wondering with your knowledge of Postgres, maybe your knowledge, I don't know if you have CouchDB. CouchDB, everything is JSON and it's optimized for JSON. So I was wondering if you could do a parallel with advantages and disadvantages of the-
Well obviously I have an opinion on CouchDB and I prefer Postgres. So what I can tell you, I think a direct comparison I think would be unfair in many ways.
What I can tell you with Postgres is that it's a database platform that contains many different elements. So we are innovating and progressing with Postgres technology very rapidly in a number of different areas. And all of the areas of progress all interlock and interreact.
So the types of things that you're getting there, enhanced security, enhanced indexing, new data types, all come together in Postgres. And that's why I work on it and why I recommend it.
Because if you design your own data store, you're basically losing all of the momentum and all of the additional features that you get with the Postgres package. So Postgres may not always be the first to innovate in use of new features like a new data type like JSON.
But we do innovate in many areas. So the PostGIS implementation for example is widely acknowledged to be the best in the world. And obviously things like transactional semantics, robustness, as well as performance is what Postgres is noted for.
So I don't want to diss other projects. I just want to point out that Postgres has got a lot of momentum and a lot of focus with lots of different features coming together in one place. Any other questions?
Is the BRIN index specific to a particular data type? So BRIN indexes can be used with any data type. They're specifically designed for use with naturally ordered things.
So the example of date time there is useful because the data has a natural ordering. So another example of something that you might want to use would be a sort of order ID which would naturally increment over time. That's for its best use case.
If the data is sort of all over the place, then it is much harder to use that sort of index. Postgres has got six different types of index, all suited for different use cases. So B-tree is useful for OLTP: high volatility inserts, updates, deletes.
Or Brin is more suitable for historical tables and we also got text style indexes as well of various kinds.
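The contrast drawn here can be shown concretely; the orders table and column names are hypothetical:

```sql
-- B-tree: high-churn OLTP lookups, updates and deletes on a hot key
CREATE INDEX orders_id_btree ON orders USING btree (order_id);

-- BRIN: large, naturally ordered historical data, e.g. an insertion
-- timestamp or a monotonically increasing order ID
CREATE INDEX orders_ts_brin  ON orders USING brin  (created_at);
```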
So the parallel query support that we're making progress on is intra-node parallel query.
Which means you'll be able to scale it to the number of CPUs within a particular node. The technology I'm presenting here with Postgres XL allows you to scale across multiple nodes.
So that you can actually get a much higher degree of parallelism using XL than you can with the intranode parallel query. The parallel query that we've got doesn't cover all types of query. So the Postgres 9.6 version that will be out later this year,
maybe September if we're lucky, will contain the first elements of intranode parallel query.