PostgreSQL - The Database for Industry 4.0 and IoT
Formal Metadata

Title: PostgreSQL - The Database for Industry 4.0 and IoT
Number of Parts: 160
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/33719 (DOI)
Language: English
Transcript: English (auto-generated)
00:04
So, first thing, who am I? I've been using PostgreSQL since around 1996. I first used it with PHP; I was young and I needed the money. I've essentially been using PostgreSQL longer than I've used Python.
00:21
I joined the Python community around 2003, with the EuroPython in Charleroi. I formed my own company, querying databases other than PostgreSQL and putting the results into Excel. Due to my involvement in the Python community and later in the PostgreSQL community,
00:42
I was spotted by 2ndQuadrant, a global PostgreSQL support and development company, and invited to run the German subsidiary of 2ndQuadrant. Now, we're talking about PostgreSQL as a database for the Internet of Things and Industry 4.0.
01:02
So, while people are still streaming in, I will spend some time discussing what's behind those terms and make clear what we are focusing on. For the Internet of Things, I found a very interesting graphic, which is much better than bullet points.
01:27
It started out with tracking everything. The big, big craze: your entrance badges now have an RFID chip so they can be scanned.
01:41
The idea is that you put computers into everything, everywhere. It started out with simply tracking things as they were moved around, and more and more intelligence, whatever you call intelligence, is being put into these things.
02:03
It's a big, big buzzword. You can formulate it another way: computer chips became cheaper, so it became economically viable to put them into shoes or drinks or whatever.
02:21
Which part of the Internet of Things should we focus on in this talk? In the media, there is often something about household appliances, because everybody wants in on this buzzword: your fridge will order the milk for you. I don't think so.
02:41
Amazon had those Dash buttons where you can order a specific product, at an unspecified price, from Amazon to have it delivered. Even washing machine companies tell you that the washing machine will be updatable with new washing programs.
03:01
Just to be a little bit serious: a washing machine turns left and turns right at different speeds; how much of that can you really update, and how much is a gimmick? Anyway, those connected appliances push us into a very, very important discussion. Who owns the data? Who shall have access to the data?
03:25
If they see Harold is always washing white clothes at a high temperature and he always uses the special program to wash out blood, maybe they start to get suspicious.
03:43
Does he just get nosebleeds, or is he a serial killer? So even with household appliances, there's a discussion. I want to avoid this discussion of the social impacts in this talk. We had a very interesting talk this morning about the ethics of IT,
04:04
and there is an open space on it. So: important topic, but not the focus here. The other term in the title is Industry 4.0. The interesting thing is that it's a German thing, a German government thing.
04:23
The German government set up a working group and called it Industry 4.0. Of course, they actually called it Industrie 4.0, because it's the German government and they speak German. Anyway, there are multiple aspects to Industry 4.0.
04:44
One aspect is that there is more sensor data to be collected. They hope that the machines will communicate with each other. And there is a fascinating sentence in the final report of the working group: humans and machines communicate and cooperate.
05:02
So if you have ever shouted at your printer, "you shitty beast, why don't you print", you're way ahead. Anyway, do we really need release numbers on our industrial revolutions? I found a very interesting graphic that tries to explain what the 4.0 in Industry 4.0 means.
05:30
In the first industrial revolution, we switched from horses pulling things to water power and steam, mainly in Great Britain. Second, mass production, which mainly started with the butchers in the United States,
05:45
and was then copied by the automotive industry. Third, computers and automation. And the fourth part: cyber-physical systems. I think that's the part we can focus on, and it's something some of our customers at 2ndQuadrant are really talking to us about.
06:04
All those production machines produce a lot, a lot of data. And usually, they write it to something that essentially evaluates to /dev/null. Maybe it's written to some memory chip on the machine, or to some log file which gets deleted.
06:25
But more often than not, all the data is thrown away and never used again. The idea, and the part of Industry 4.0 I want to focus on in this talk, is the interconnection of systems and what to do with it.
06:44
So, let's find the focus for this talk, because the Internet of Things and Industry 4.0 are big, big topics, with a lot of people making a lot of money talking about them. We know there will be more sensors, more computing devices, more connected objects.
07:01
They will produce more data. Watching and analyzing log files would be a good thing to do; I think that's a statement most could subscribe to. It would be good if we looked at the warnings in our log files and then took, or deliberately didn't take, action.
07:24
And finding your stuff is something that would really be helpful in a professional setting. There are ethical problems if everybody in the world can track your personal belongings. But just in a professional environment, think what happens if every worker spends five
07:43
minutes searching for something, and we're only talking about personal office supplies or tools; we haven't even started on the production materials flying through the company. How much productivity is lost? And it's not lost to fun activities like playing Minesweeper. It's lost to sad activities like searching for stuff.
08:04
One part of this sensor-data story that we are really discussing with our customers, and which they want to implement, is the maintenance aspect. If you want an air-conditioning system with a very high assurance that it will really keep on running,
08:28
or a heating system that keeps on running, you have to do regular maintenance. That regular maintenance is set at specific time intervals, and then you have to change specific parts much earlier than needed.
08:43
And sometimes, even with a set interval, it will be too late and the machine will break down. So regular intervals give you additional costs, because you have to do the replacement, and maybe you do it too soon or too late because the intervals are fixed.
09:02
There is another way, the fix-when-broken maintenance method. You just wait until your air conditioner breaks down, and then you start fixing it. The good thing is you only change the parts when they're really broken; you don't change them too soon. The bad thing is the interruption of service. Now, this is the part of the sensor data and Industry 4.0 story that I really get excited about.
09:26
It's really nice if you grab the sensor data from the machines that are actually working, air conditioners, heating machines on the production floor, on the shop floor, analyze the data, see what happens or does not happen, and can do dynamic maintenance.
09:44
You save a lot of money and you can keep the equipment running. Now, why should you share that logging data via the internet and consolidate it in a big database somewhere? You could always handle it locally, and we all know the jokes about the check engine light.
10:03
You can "fix" it by putting a Post-it over the check engine light. I know this from developing software for end users. With all the warnings, I really saw how the training for the second or third generation of users of my software went: "Oh, this warning always comes up, just press OK."
10:22
The local handling of error messages, warnings and check engine lights leaves a lot to be desired. Those local check engine lights, those local things giving information, rely on a specific set of parameters that are frozen when the system is developed, distributed and rolled out.
10:47
Maybe for newer generations of the machine there are new consumables, new filter materials that are more thorough, that take more dust out of the air in the air conditioner, or that last longer, and your given set of parameters is static.
11:07
If you can communicate the data to a central system, you can update those parameters, and you can also add new analyses. So our focus, working backwards: more data has to be stored somewhere, and more data has to be analyzed.
11:24
Now, why should you store that data in PostgreSQL? Something about PostgreSQL. PostgreSQL is an open source object-relational database system. Very important: it's not just a relational database system, it's an object-relational database system.
11:42
So there are more features than just tables. It has more than 15 years of active development. We conform strongly to the official ANSI SQL standard. You never hit 100% of a standard, but we try very hard to be very close to it.
12:01
It runs on all major operating systems. Currently, for PostgreSQL 10, we are phasing out OS/2 and some very exotic HP-UX versions. And I put something in very bold letters: PostgreSQL was built to be extended.
12:22
That is, getting new components into PostgreSQL was planned into the design of PostgreSQL. Some specifications, very good: the maximum database size is unlimited; you can put all your tweets into PostgreSQL.
12:40
The maximum number of rows per table is also unlimited. Currently we have a maximum table size of 32 TB and a maximum field size of 1 GB. And those are sizes that are really tested, supported, and working in practice.
13:01
A very, very important part, and the speaker before me was talking about licenses, is the PostgreSQL license. It's kind of an MIT license; that's the legalese, I've shortened it: do what you want with the software, and if it breaks, don't come crying. So you can do what you want with the software, you can bundle it, you can resell it.
13:26
And this gives us some very nice properties. First: not every piece of open source software is equal. Some open source software is owned by a company.
13:41
And companies are sometimes very good for society. Think of kiwi.com, who threw an awesome party last night. Sometimes they have "don't be evil" in their slogan for a while, but sometimes they become evil, and sometimes they vanish. I'm a company owner myself, so this is no critique of capitalism.
14:04
I just know that computer companies are not around forever, and their current policies are not around forever. So this is very important: PostgreSQL is owned by the PostgreSQL Global Development Group. It's like Python, which is owned by the Python Software Foundation.
14:21
It's just safer, because the PostgreSQL Global Development Group is not really a legal entity you could sue or remove from the planet. The development of PostgreSQL is very, very open. We have open discussions about new architecture, new features, new commits, all on the mailing list.
14:42
The discussions that led to going one way or the other are always searchable. When we are talking about complex technology, the things the database deliberately does not do are also very important. The development history, what led to doing it this way, is all out in the open.
15:01
We have a big resilience against being bought by some company, which has happened to other open source projects. We at 2ndQuadrant are one of the biggest contributors to the current PostgreSQL code. We have not been there the whole 15 years, but currently we still contribute less than 50% of the new code.
15:22
Currently it's 30 to 40 percent, and we have four of the 22 committers on our staff. Even if somebody bought 2ndQuadrant, one of the big contributors to PostgreSQL, it would not really matter to the project. There are still a lot more contributors,
15:41
and if they bought us, our people would not have to stay with the company; they could go and work as developers somewhere else. That gives very strong resilience to the PostgreSQL project, and it builds trust, the trust that also led me to put my own money into PostgreSQL. The license of PostgreSQL is open to various business models,
16:03
which is very important, because you can't survive by selling stuffed animals with a logo on them. It's not possible. You need to try different business models. We at 2ndQuadrant have the business model of selling support, development and early access. Other companies, like our main competitor, EnterpriseDB,
16:23
sell a proprietary fork of PostgreSQL with some additional features. Greenplum, TransLattice and StormDB did the same thing: they made their own versions. Many contributors to PostgreSQL have a kind of Einstein model of employment.
16:41
There are government employees in Vienna or somewhere in the United States who, in their time as DBAs, also do development for PostgreSQL. We have a very specific culture of PostgreSQL development. There are some big companies, like EDB, 2ndQuadrant, credativ, Cybertec,
17:00
Crunchy Data Solutions and some more, all over the world. We compete for the same customers while contributing to the same code base. It's a very interesting and not conflict-free environment, but it's also very forward-moving. Now, why store all that data in a database,
17:23
and why in a relational database? The idea is, if you get sensor data, log files, persistent tracking from your stuff and put it into a database, you have a common ground to put your analyzers on it. That analyzer can be machine learning
17:41
or a program that analyzes the data; it can also be data scientists. That's a big benefit of a relational database, because there are many, many tools for non-programmers to do analytics on relational data. So, having PostgreSQL as an intermediary
18:02
for all the data from your Internet of Things and Industry 4.0 systems gives people from fields outside of computer science and programming the possibility to come in and analyze the data with their own tools and methods. Why should you store document data, like sensor data and log files, in PostgreSQL?
18:23
You could use a document database. But PostgreSQL, for three versions now, has had a JSONB data type to store documents. And PostgreSQL has a proven query language: SQL, the Structured Query Language, proven by decades of experience,
18:42
designed so that non-computer-scientists can write queries. JSONB data can be queried with SQL. JSONB has full index support, so searching for attributes within your JSONB objects can be index-supported, and the PostgreSQL optimizer
19:03
knows how and when to use which index to get just the right rows. You don't have to program it yourself. Later we can, I call it the Hannah Montana/Miley Cyrus pattern, have the best of both worlds: pulling regularly queried attributes out
19:23
from your JSONB object into columns, while what is seldom queried stays in the JSONB object and is just read from there. The JSONB data type is founded on multiple generations of code, so it's not brand-new code; of course the code is fresh,
19:44
but the underlying code has been used and used and used before. It started out in 2003 with Teodor Sigaev's hstore data type, which was made to store something like Perl hashes. It was great if you were a highly trained Russian astrophysicist
20:06
and wanted to store data very efficiently. The challenge was: if you were not a highly trained Russian astrophysicist, it was very challenging to deal with hstore. So it took off in specialized communities,
20:22
like astrophysics and the biological data community, but for the wider world it was not really graspable what you do with hstore and how to put data in. On that foundation of hstore, on all the experience of indexing non-relational data stored in such fields,
20:41
the JSONB and JSON data types were integrated into PostgreSQL. They share a big common code base within PostgreSQL: the long experience with indexing this kind of data, very good indexes for it, optimizations for it. And JSON is the data type that runs the web.
21:03
Now, how do you do JSONB? Here's a book database, very short: you have a title, case-sensitive text, you have an ISBN, which has its own ISBN data type, and you can create an index on the books' pubinfo field.
21:24
And the publishing info field can be a JSONB object with any number of attributes. Just by creating this index you can search on any attribute
21:41
of all the JSONB objects, index-supported. You can search them anyway, without writing your own program, just by querying; but by setting up an index, the search is index-supported. "Index-supported", just a short explanation: if you read through all the rows in the database, you have to pull a lot of data, which takes a lot of I/O.
22:02
If you have an index, it's a shortcut: it's like the difference between searching for a word by reading the whole book every time, or just using the index at the front of the book. That's the kind of query you can do on JSONB data; I have one example, sketched below.
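A minimal sketch of that book table and index, assuming the isn extension for the ISBN data type (table and attribute names are illustrative, not the exact slide):

```sql
CREATE EXTENSION IF NOT EXISTS isn;  -- provides the isbn data type

CREATE TABLE books (
    id      serial PRIMARY KEY,
    title   text NOT NULL,
    isbn    isbn,
    pubinfo jsonb   -- publishing info: a JSONB object with any attributes
);

-- One GIN index makes every attribute inside pubinfo searchable
CREATE INDEX idx_books_pubinfo ON books USING gin (pubinfo);

-- Containment query, answered via the index
SELECT title
FROM books
WHERE pubinfo @> '{"publisher": "Example Press"}';
```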
22:22
That brings me to the parallel normalization I mentioned. When I'm developing database applications these days, I always have, the Germans will know it, a z.b.V. column, "zur besonderen Verwendung", for special use. That is, I always have one column in my tables for the specifications my customers did not tell me about when they asked for a new table.
22:41
This column is created as JSONB, so every attribute that belongs in that table but was not discussed when the specifications were written just goes into that JSONB column. So I have a user table without the Twitter handle in it;
23:01
I put the Twitter handle into the JSONB field and it's stored. If there's a next cool thing, like FlipChat or TwitchChat, which is Twitter but for chatting and with dogs or whatever, I can put the new handle into that z.b.V. column. Now, if that thing really takes off
23:23
and you want to move it into the relational world, you can add a column for that attribute, transform all your stored data, and you don't have to adjust your applications:
23:42
you can write a trigger which, every time a row is updated, looks into the JSONB field and writes the value into the regular column. So you have the Hannah Montana/Miley Cyrus pattern, the best of both worlds: you have the JSONB, which can still be used, and you have the column data, which can be used in your analytics or reporting.
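A minimal sketch of that promotion, assuming a hypothetical users table with a JSONB column named zbv holding a twitter_handle attribute:

```sql
-- Promote the attribute to a real column
ALTER TABLE users ADD COLUMN twitter_handle text;

-- Backfill the column from the stored JSONB data
UPDATE users SET twitter_handle = zbv ->> 'twitter_handle';

-- Keep the column in sync while old applications still write the JSONB
CREATE OR REPLACE FUNCTION users_sync_twitter() RETURNS trigger AS $$
BEGIN
    NEW.twitter_handle := NEW.zbv ->> 'twitter_handle';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_users_sync_twitter
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE users_sync_twitter();
```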
24:03
Once you have migrated all your applications to use the column, you can drop the attribute from the JSONB storage for good. Why columns, and why in PostgreSQL? Why not just use a primary index and a JSONB column? Why do all this dance to get the best of both worlds?
24:21
We get documentation through the column names. We have additional optimizations that are even stronger than the GIN index we can have on JSONB. And we have a lot of analytic tools which deal really well with column-based table data. Now, another thing that makes PostgreSQL a really, really good choice
24:43
for all those new data types from new sensors is extensibility. Extending PostgreSQL with a new data type, one that can be used from all applications and can be index-supported, is something that can be done in a PhD... no, you don't need a PhD; a master's thesis, yes.
25:01
That's roughly the effort it takes to create one, and a lot of people do it. I have one example: Lidar data. It's one of the big things with all those self-driving cars coming up, and currently it's used for geographical purposes: planes fly over a city and do a Lidar scan. You get lots and lots and lots of data points.
25:23
Storing them takes a specific form, because if you just put every coordinate as an integer into its own column, the overhead with those millions and millions of points would make it infeasible to search anything.
25:41
So one guy, who works with the Canadian government on nature research, created the pgPointcloud data type: he extended PostgreSQL with a data type specific to Lidar, with specific ways of storing Lidar data optimally.
26:05
He compresses it, makes it searchable, and exploits features that are specific to Lidar data. And this ability to add optimized data types has been in PostgreSQL from the beginning; it was designed for it.
26:20
And with all those new sensors reporting things we hadn't even thought of last year, PostgreSQL is prepared to take such extensions in. One thing my colleague Álvaro and the other PostgreSQL developers put into the last versions is very cool if you have a lot of data with a natural ordering.
26:45
What's a natural ordering? If you have time-based data, like log files, the lines in the log files have an ever-increasing timestamp. You don't jump around in time; it's slowly growing.
27:02
If you do a Lidar scan from a plane, that plane usually moves in one direction, so you have a natural order in your data. The BRIN index, block range index, allows you, for every block of data on the hard drive of your system,
27:22
to store just the beginning and the end of the data in it. Instead of having an index entry for every row, you just have the beginning and the end of each data block. That's similar to old libraries, or old companies with physical storage of customer files:
27:43
from A to B, from C to E, from F to whatever; that's what we store in the BRIN index. What's good about it? It's a very, very small index for a lot of data. The downside is that the index doesn't take you directly to the row,
28:03
but if you have something like temporal data, which is strictly ordered by time, or geographical data from a plane scan, and you use a BRIN index, you get a very small index. A very small index means a lot of the index fits in your cache, maybe in main memory, maybe even in the level 2 or level 3 cache of your processor, so it can be used very quickly and you get to the right data very fast.
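A minimal sketch of a BRIN index on a time-ordered sensor log (table and column names are illustrative):

```sql
CREATE TABLE sensor_log (
    logged_at timestamptz NOT NULL,  -- naturally increasing over time
    device_id integer,
    payload   jsonb
);

-- Stores only the min/max of logged_at per block range,
-- so the index stays tiny compared to a B-tree
CREATE INDEX idx_sensor_log_brin ON sensor_log USING brin (logged_at);
```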
28:24
One thing that will be in PostgreSQL 10, which is scheduled to come out in autumn this year, is declarative partitioning. What is partitioning, and why should we care?
28:43
If you store your data in a table, it comes in and is appended at the end; we're talking about append-only data here. Now, if you have log files from robots in a factory, you are not really interested in what a robot was doing two years ago.
29:04
That's how Skynet came to power: nobody looked at what it did two years ago. But in a practical environment, you don't look at data from two years ago. You just need the last six months, or four months, or three months, because earlier data is no longer relevant for today's actions.
29:21
So you need an efficient way to delete the data that was produced more than six months ago, more than seven months ago, more than eight months ago. If you have everything in one big file, say January, February, March, April, and now in May you decide to delete January,
29:43
you have a hole in your database file, which will be filled up again; and if the data from May, which goes into that free space, is larger than January's, then you have some of it here and some of it there, which makes it hard to search.
30:00
Also, deleting data from a database takes a lot of I/O, because we have to mark every record as invalid; that's the way PostgreSQL works. If you have a partition for every month, you just tell the system to drop all the data from January. You have a partition for January,
30:21
and you can just drop that partition. I've sketched it here with data for February and data for March. In earlier versions of PostgreSQL, you had to work very hard to do this partitioning: you had to write trigger or rule code to move the rows into the right partition.
30:42
Now with 10.0, and that's one thing Robert Haas and the other guys from EDB really pushed forward, you have declarative partitioning, which lets you say, declaratively: put this data there, and drop that partition afterwards. I'm really looking forward to this feature.
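A minimal sketch of PostgreSQL 10 declarative partitioning for such a monthly log (names are illustrative):

```sql
CREATE TABLE robot_log (
    logged_at timestamptz NOT NULL,
    robot_id  integer,
    message   text
) PARTITION BY RANGE (logged_at);

CREATE TABLE robot_log_2017_01 PARTITION OF robot_log
    FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');
CREATE TABLE robot_log_2017_02 PARTITION OF robot_log
    FOR VALUES FROM ('2017-02-01') TO ('2017-03-01');

-- Expiring January is a cheap metadata operation,
-- not a row-by-row DELETE
DROP TABLE robot_log_2017_01;
```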
31:01
Another thing: we have a lot of features in PostgreSQL, and there are additional special features. I told you about pgPointcloud before. I also briefly mentioned, it was on a slide, PostGIS for geographical data. There are a lot more extensions out there running.
31:23
Extensions are something like packages. We all love a specific programming language, and currently I think we have five to seven packaging systems in Python; it's great to have a large variety,
31:41
but it's challenging to decide which one to deal with. In PostgreSQL it was worse: we didn't even have any structure for extensions, so people distributed makefiles and instructions on how to install them, which kind of worked but was not really scalable. Since 9.1, thanks to my former colleague Dimitri,
32:02
who was the lead developer of that extension infrastructure, we have CREATE EXTENSION name and ALTER EXTENSION ... UPDATE, so the whole life cycle of extended functionality for PostgreSQL databases can be managed with commands: create, update, drop.
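The life cycle looks like this (extension name and version are just examples):

```sql
CREATE EXTENSION postgis;                  -- install packaged functionality
ALTER EXTENSION postgis UPDATE TO '2.3';   -- upgrade in place (example version)
DROP EXTENSION postgis;                    -- remove it again
```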
32:21
That gives us a nice way of packaging and distributing this functionality. Another thing to look at in these Internet of Things and Industry 4.0 applications: you might want to communicate with devices
32:43
somewhere in the world, and you have a master database where all your company data is; in the internet-exposed database, you want to have only the data that is really, really necessary to communicate with one specific device,
33:02
because data that's not in that database cannot be compromised or stolen, so keep that exposed database small. With 9.4, logical decoding was introduced in PostgreSQL. Logical decoding gives you all the changes
33:23
that happen to your database in a logical, kind of human-readable format, but also in a machine-readable format. Every change that happens to any table in your database is written by PostgreSQL to a binary log, the write-ahead log, anyway, and with logical decoding we found a way
33:42
to extract information from that binary log and use it for different purposes. With pglogical, an open source extension distributed by 2ndQuadrant and based on that logical decoding within PostgreSQL, you can do the following. Say you have two tables that are relevant
34:00
for the database supporting your R2-D2 microservice, and you just want to replicate those two tables to the outside. You use pglogical to get that data streamed out; it transcodes the write-ahead log into readable form, and that's essentially it.
34:22
You set up the node, your master database; you create a replication set defining which tables should be replicated; you add the tables you want replicated to that set; and on the smaller database,
34:41
which lives with your microservice or out on the internet, you create a subscription to that publication on your master database, and you can stream those changes out. The good thing is, the data being streamed out does not really put any strain on your transactional processing. A sketch of those steps follows.
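A sketch of those steps using pglogical's SQL functions (node names, set names and connection strings are illustrative; check the pglogical documentation for the exact signatures of your version):

```sql
-- On the provider (master) database
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn       := 'host=master.example.com dbname=factory'
);
SELECT pglogical.create_replication_set('device_set');
SELECT pglogical.replication_set_add_table(
    set_name := 'device_set',
    relation := 'public.device_readings'
);

-- On the small, internet-exposed subscriber database
SELECT pglogical.create_node(
    node_name := 'edge',
    dsn       := 'host=edge.example.com dbname=device'
);
SELECT pglogical.create_subscription(
    subscription_name := 'device_sub',
    provider_dsn      := 'host=master.example.com dbname=factory',
    replication_sets  := ARRAY['device_set']
);
```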
35:02
You can do the same thing the other way around. So you have data collected by various databases, each for specific devices, and you can stream those changes with pglogical; the logical decoding itself is included in PostgreSQL,
35:20
and pglogical is a free and open source extension under a BSD-style license. Stream that data from multiple databases into one master database, so you collect the information from various branches and put it together in one big database. We found a way to do it online, without impacting the actual processing
35:43
on the databases and without writing ETL scripts. Another feature that's becoming better and better in PostgreSQL comes from the realization that it would be nice to have all data in PostgreSQL, but that's unrealistic. You have legacy systems in other databases,
36:03
you have new "legacy" data in new humongous databases, or you have very strange data sources, and you want to consolidate them all in one database because you want a report across the production data, the financial data, and some marketing data,
36:22
and you want to consolidate it in one report. PostgreSQL has foreign data wrappers, an implementation of the SQL/MED (Management of External Data) standard, which has been in the SQL standard since 2003. We implemented it in 9.1 for reading external data; since 9.3 you can read and write other databases
36:43
that are not PostgreSQL, from PostgreSQL, and the integration of external data and the features we are able to put into those foreign data wrappers get better with every version. And there is one very fascinating thing: the Multicorn library,
37:01
which allows you to write foreign data wrappers in Python. Why would you do that? They have one example on their site: you can create a foreign table which reads your IMAP account. So any kind of data that's somehow accessible from Python
37:32
can be pulled into a PostgreSQL table. So you can join your production data with your Twitter feed or with your IMAP mailbox.
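A sketch of that IMAP example with Multicorn (the wrapper class and option names follow Multicorn's documented IMAP example, but treat the details as assumptions and verify them against your version):

```sql
CREATE EXTENSION multicorn;

-- A server backed by Multicorn's bundled IMAP wrapper
CREATE SERVER imap_srv FOREIGN DATA WRAPPER multicorn
    OPTIONS (wrapper 'multicorn.imapfdw.ImapFdw');

-- Columns map to IMAP message headers
CREATE FOREIGN TABLE mailbox (
    "Message-ID" character varying,
    "From"       character varying,
    "Subject"    character varying
) SERVER imap_srv OPTIONS (
    host     'imap.example.com',
    port     '143',
    login    'user@example.com',
    password 'secret'
);

-- Query your inbox like any other table
SELECT "From", "Subject" FROM mailbox LIMIT 10;
```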
37:41
Is it really useful? I don't know. Now, the little stuff. Who here has ever been trampled by an elephant? Oh, give this guy a towel. Anyway, has anybody ever been bitten by a bug or by a fly?
38:00
Yeah, so it's the little stuff that counts. One major nuisance of sorting and ordering data is collations. If you're still in ASCII with A to Z, it's fine. But if you need to order Cyrillic characters,
38:21
or you have the people from, Miley Cyrus, yeah, cool; if you use JSONB, you can also have Hannah Montana, very good. Anyway, if you sort data and you have people with Croatian or Slavic names or Turkish names,
38:47
it's very challenging to know what sorts where and why. Even within Germany, there are different conventions about where an umlaut sorts and how to do it. So, collations. The funny thing about collations:
39:01
the linguists and the computer scientists are still updating them. PostgreSQL doesn't reinvent the wheel, so we were using operating system facilities like glibc to do the sorting. Now, from how a database works, you have indexes which point to records,
39:24
and the sorting in those indexes is based on how you sort values. Looking up values in an index requires the same sort order as when the index was created. Just imagine the index being sorted another way:
39:40
it's sorted from Z to A and you expect A to Z; you never find anything. Now the operating system guys, in minor security releases of their operating systems, changed the sort order. So you had A-B in version 1 and B-A as the correct sort order in version 2. Not with those letters, but with Cyrillic or other scripts.
40:03
Now with 10.0, my colleague Peter Eisentraut integrated ICU, the International Components for Unicode library, and you can optionally base your collations on ICU, which additionally gives you a correct sort order for emojis.
40:22
So with PostgreSQL 10.0, not only can you make your users happy with a stable collation and not corrupt your indexes through an operating system change, but you can also sort those things correctly. Peter wrote a blog post about it; there's even a standard for how to sort emojis.
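A minimal sketch of an ICU-backed collation in PostgreSQL 10 (the locale string and table are examples):

```sql
-- Define a collation backed by ICU instead of the C library
CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');

-- Use it per query ...
SELECT name FROM customers ORDER BY name COLLATE german_icu;

-- ... or per column
CREATE TABLE customers2 (name text COLLATE german_icu);
```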
40:41
More good news. One of the challenges of recent PostgreSQL releases was the interface. We had pgAdmin 3, which kind of worked, which was nice. Then pgAdmin 4 was created, and we got some challenging discussions
41:01
at the booth when we were at conferences. Customers, once they got to know that I'm not a bad guy, asked: hey, I had real problems when I tried this pgAdmin, is it me? No, it's not you. So we at 2ndQuadrant hired two guys in Brazil, William and Rafael, who had written OmniDB.
41:21
They were young; they were not with us yet. They wrote it in ASP.NET, and, good news, here's the final present from this talk: we have a EuroPython 2017 preview. They rewrote it in Django and Python, and it's available at this URL. I have two screenshots. It has autocompletion, and it's tolerant; it's OmniDB.
41:46
It connects to many, many different databases, and on their wish list, what they're planning to program, is even support for moving data or structure between those databases.
42:01
So it's free and it's available as a preview. I asked those guys to create it just for EuroPython, so I'd have something to give away at the end besides my towels, and that's the present. And that gives us two minutes for questions. Thank you very much. Well, okay, one question.
42:30
Okay, you showed us the example of how you make an index on the JSONB field. I was wondering, is it possible to build the index only on specific keys in the JSONB field,
42:43
in order to improve performance, for example? Yes, yes, yes. That's possible; sorry, I skipped a slide. It has been possible for a long time: in PostgreSQL we have partial and functional indexes. You can create an index on a part of, oh sorry, that's not the JSONB one, that's the other slide.
43:03
Okay, it crashes. Anyway, you can create an index on just one attribute, you can create a functional index, and you can even do that with normal columns, for example an index on upper(first_name). Functional indexes have been in PostgreSQL since nearly forever,
43:21
and they also work with JSONB. So you can have an index on just one attribute, yes; some examples are sketched below. But we're very proud that you can have one index covering everything, which is cool.
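A sketch of those options for a JSONB column, reusing the hypothetical users table with its zbv JSONB column:

```sql
-- Expression index on a single JSONB key
CREATE INDEX idx_users_twitter ON users ((zbv ->> 'twitter_handle'));

-- Partial index: only rows that actually contain the key
CREATE INDEX idx_users_twitter_part ON users ((zbv ->> 'twitter_handle'))
    WHERE zbv ? 'twitter_handle';

-- Classic functional index on a normal column
CREATE INDEX idx_users_upper_name ON users (upper(first_name));
```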
43:43
There are benchmarks, and compared to the loudest document database on the market, we get similar or faster numbers. Alright, that was it. There's another question. Time is up? Time's up, sorry. So you have to find him at his booth. I'll be at our booth or somewhere around, and the guy who got trampled by the elephant, there's a towel for you.
44:01
Thank you very much. Don't forget that. And don't forget to rate the talk.