We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

PostgreSQL and Python, a match made in heaven

00:00

Formal Metadata

Title
PostgreSQL and Python, a match made in heaven
Title of Series
Number of Parts
160
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
PostgreSQL and Python, a match made in heaven [EuroPython 2017 - Talk - 2017-07-11 - PyCharm Room] [Rimini, Italy] his talk is about a love story. The perfect love between PostgreSQL and Python. PostgreSQL is the default database choice for many Python developers, because it is robust, stable and open source. In 2ndQuadrant, we breathe PostgreSQL and we love python too, using it as much as possible for internal and external open source projects. We want to share our love for python and PostgreSQL and how they work together. So let us tell you a real love story
IntelDemonSoftwareSoftware developerOpen sourceAxiom of choiceDatabaseDefault (computer science)Table (information)Inheritance (object-oriented programming)E-textReplication (computing)BackupRadio-frequency identificationMoment (mathematics)Linear regressionSystem callTable (information)Cartesian coordinate systemIterationSheaf (mathematics)Group actionOperator (mathematics)Game controllerCausalityMaxima and minimaDatabase transactionExpert systemProjective planeSpherical capOpen sourceReplication (computing)Mechanism designWordInsertion lossObservational studyAreaActive contour modelSoftware testingMereologyPosition operatorMilitary baseDirectory serviceService (economics)Computer fileData recoveryLogicLattice (group)Software developerCASE <Informatik>Different (Kate Ryan album)Crash (computing)Impulse responseInstance (computer science)Process (computing)TouchscreenDefault (computer science)Speech synthesisDatabaseArithmetic meanInternetworkingPhase transitionView (database)CloningServer (computing)Goodness of fitMultiplication signProgramming languageMathematicsRaw image formatConcurrency (computer science)Axiom of choiceDirection (geometry)Core dumpMatching (graph theory)Structural loadHeegaard splittingBackupPhysicalismFreewareSoftware bugPlug-in (computing)Computer animationLecture/Conference
Type theoryFunction (mathematics)Operator (mathematics)Price indexExtension (kinesiology)CircleObject-relational mappingDevice driverDatabaseBackupOpen sourceAreaArmType theoryPoint (geometry)Different (Kate Ryan album)FreewareOrder (biology)Inductive reasoningMassStrategy gameRelational databaseLine (geometry)Moment (mathematics)CodeCausalityWireless LANWrapper (data mining)Open sourceDigitizingPosition operatorLibrary (computing)Procedural programmingBeat (acoustics)ResultantComputer fileStandard deviationComputer programmingSubject indexingDiffuser (automotive)Functional (mathematics)Musical ensembleSoftware developerVideoconferencingCodierung <Programmierung>Group actionFood energyState of matterSurfaceInsertion lossSoftwareBus (computing)InternetworkingTable (information)Projective planeDevice driverMultiplication signSpherical capGreatest elementStress (mechanics)Address spaceRight angleFrame problemExtension (kinesiology)Interface (computing)Pattern languageDatabaseLatent heatWordOperator (mathematics)Confidence intervalCycle (graph theory)BitCore dumpObject-relational mappingWritingService (economics)BackupQuery language
Revision controlAlpha (investment)BackupCustomer relationship managementDatabaseOpen sourcePlastikkarteText editorElectronic visual displayQuery languageFormal languageDebuggerCommon Language InfrastructureDefault (computer science)Complete metric spaceStatisticsExtension (kinesiology)Graph (mathematics)Event horizonSoftwareSystem callReplication (computing)Real-time operating systemMoment (mathematics)DampingOpen sourceComputer configurationMereologyGraph (mathematics)Revision controlExtension (kinesiology)DatabaseMeta elementGroup actionQuery languageText editorResultantDescriptive statisticsSystem administratorCircleProcedural programmingGraphical user interface2 (number)Cross-platformCartesian coordinate systemBoolean algebraFunction (mathematics)Parallel portInformation retrievalSoftware testingArithmetic meanRadical (chemistry)Default (computer science)Server (computing)User interfacePlanningTable (information)WordWorkloadPlastikkarteBackupAlpha (investment)CASE <Informatik>Complete metric spaceClient (computing)Process (computing)CodeVirtualizationCycle (graph theory)Video game consolePattern languageSoftware developerWhiteboardOpen setRepository (publishing)Sanitary sewerPosition operatorAuditory maskingSpherical capExpressionSocial classKey (cryptography)Structural loadMultiplication signFormal languagePresentation of a groupWeb 2.0Digital photographyPoint (geometry)Computing platformDegree (graph theory)Sparse matrixPlotterCore dumpCausalityLeak
Presentation of a groupDistanceObject-oriented programmingDatabaseLecture/Conference
BitExtension (kinesiology)Particle systemCondition numberWorkloadMereologyDatabaseGoodness of fitLecture/Conference
Row (database)DatabaseOrder (biology)Subject indexingSequenceTable (information)Query languageMeeting/Interview
Query languageRow (database)MereologyGroup actionProcess (computing)PlanningSelectivity (electronic)2 (number)CausalitySensitivity analysisSpeech synthesisArithmetic meanRight angleMachine visionSet theoryBootingNumberLinear regressionLecture/Conference
DatabaseGrass (card game)Software developerGroup actionLecture/ConferenceMeeting/Interview
WebsiteDistribution (mathematics)DatabaseConstructor (object-oriented programming)CuboidPhysical lawStaff (military)Order (biology)CodeTheoryLecture/Conference
DatabaseProcedural programmingMereologyPresentation of a groupMathematicsAtomic numberRight angleFunctional (mathematics)TrailTable (information)Data storage deviceStaff (military)GradientProcess (computing)Personal digital assistantMusical ensembleSheaf (mathematics)Numbering schemeAuthorizationDifferenz <Mathematik>Moving averagePatch (Unix)Cartesian coordinate systemDifferent (Kate Ryan album)Lecture/Conference
Right angleWeb pageElectronic mailing listSoftwarePosition operatorLecture/Conference
Transcript: English(auto-generated)
Thank you for being there. Okay. This is a story about love. So I think that's the moment to talk about love. What is the meaning of loving what you do?
It's not a rhetorical question. I want an answer from you. I really want an answer. What is the meaning for you of doing what you love? Go on. All of those were good answers. Okay.
I have another question for you. And what do you love?
It's another non-rhetorical question. What do you love? Coffee. Me too.
Those are good answers too. I can say to you what I love. I love coffee, obviously, but I love innovation. Okay. And thankfully I'm not the only one. I'm lucky enough to work with people that love innovation too
and bring innovation to the world every day. I don't work in Wonderland, but I work for second quadrant. I'm Giulio Calacocci. I'm an open source developer and every day I work with Python, writing in Python for POSGES and with POSGES.
If you want to tweet something, you can use the EuroPython hashtag or the PMPLove hashtag for this talk. So, I promised you a love story. Probably one of the biggest love stories of all is the match made in heaven between Python and POSGES.
You can see the elephant and the python in love between each other because one of the questions that I've been asked not only yesterday but also in different conferences was, Python and POSGES get along together well.
Yes, a lot. They have a lot of things in common. Really, a lot of things in common. First one, the first letter is a P and they are both open source. Both of them have a huge, really, really huge community
that is really, really involved in the development of the programming language and of the database. In the case of POSGES, the community is responsible for the robustness of the database. Both are powerful. Python is powerful, it's flexible.
POSGES is powerful and stable. Both of them are really well documented. You can find a lot of documentation on a lot of stuff on the internet on Python and POSGES together.
You know already Python because this is a Python conference. So, please, allow me to introduce you POSGES. One important thing. The name is POSGESQL or POSGES or if you feel like a close friend, a PG, okay? If you call POSGES like this, the elephant will be happy.
Please, please, please, please don't call him POSGRIT or POSGRESQL or any other iteration between the letters of the name. Otherwise, the elephant will be confused.
I'm not sure that he will answer your questions. Okay. POSGES or POSGESQL? Okay. Is de facto the default database choice for many Python developers? There are a lot of developers around the world
that have chosen POSGES as the default database for the application. Obviously, most projects support more than one database but usually tests and development is done on POSGES.
Why? Because POSGES is free. It's free as a speech. POSGES is stable. POSGES is free to reach and we are going to see just the tip of the iceberg of the features that are part of POSGES. And POSGES is released frequently. POSGES is released once a year with a big release with new features
and one time every three months with bug fixes and minor features. We have a lot of features because it's feature-rich.
In POSGES, the first one is the MDCC that obviously is not something that only POSGES implements but is something that POSGES supports and is the multi-version concurrency control that is something that makes you sure that your transactions are isolated between others.
So when you do something, your database is always consistent because the first directive for a database is to keep your data safe and POSGES is particularly good in doing that. I've seen databases during my work crashing in really bad ways
but I've seen people recovering the data until the moment of the crash. So POSGES is safe. POSGES has the transactional DDL that is something that is less common than MDCC and basically when you do a change to your table,
for example, you drop a column, if you do that on a transaction, that action is transactional. Every operation that involves changing your database is safe and transactional.
So who is working on the database at that moment is safe until the transaction is committed. It is something that not happens in every kind of database. Every biggest database like Oracle does not support something like this. POSGES supports table scenarios.
Table scenarios is something that allows you to create a father table with a lot of one or more child tables to allow you to split your data in sub-tables and access them through the father table using a logic.
And from POSGES 9.5 you can do that also on remote servers so you can split and shard your data on remote servers and split evenly the data and the load on different instances of POSGES. POSGES supports full-text search.
Something that is very important, especially for people that use Django and develop in Django it's so important that Django have a plugin that allows you to use full-text search on POSGES easily. Then we have the replication.
Replication is something that allows you to create a new server that is a master and clone the server on another instance that is a standby. The standby is always synchronized with the master and in case of disaster, if the master goes down you can promote the standby and reduce to a minimum the downtime in case of a bad situation.
Then POSGES supports physical backup. And this is cool because, at least for me because I am a disaster recovery expert. Physical backup is something that makes you copy your data directory
while your server is running. You don't have to stop your server to make a backup. And you're going to copy the physical file of the server and not asking for SQL dump. So everything happens while the server is running
with no downtime for you. So taking backups, it's easier and it's faster. POSGES supports the point-in-time recovery. That's it, like a kind of magic because if someone drops a table at 10 o'clock in the morning you can recover your database at 9 and 59
and restart just a second before the disaster. Then supports JSONB. JSONB is a specific type of data that was think of people that like to use JSON especially people that like the NoSQL type of data
that can store entire documents written in JSON inside your database so you can mix between a relational database and something different. And another cool thing, and we will talk about it later
but now I introduce it to you are the foreign data wrapper, FDV. It allows you to connect to every kind of source of data outside the database. For example, you can connect to another database not only POSGES but also MySQL, Oracle, DB2
or you can use as source a text file, you can use as source a remote service. You can use everything and import in your database as a table and you can do queries on that and if the remote source allows it
you can also insert data right straight inside your database. And we have also POSGES. POSGES is just partial extension that has extra types, extra functions, extra operations, index enhancement for handling just partial data.
POSGES has a lot of functions that allow you to move inside a geographic area and find points that are near and is very important for people that develop using GIS.
The relation between POSGES and Python is a filter cycle because as I said a lot of Python developers choose POSGES and because of this the software that is based on POSGES
and as an example that are based on POSGES and documentation of POSGES is better than the other. So more people are going to use them. So POSGESQL community grows. If the POSGESQL community grows the database grows stronger. If the database grows stronger more people are going to use
Python and POSGES together and this is a cycle because of this the Python community grows and the cycle restarts. This is important. But now it's time to talk about the love story and let's get down in business.
This is a showcase of things that have been done using Python and POSGES and because a lot of people ask me, okay, but can you tell me at least a couple of projects that have been created using Python and POSGES. Sorry. Yes, I can. I created a talk on that.
So, fifth thing. We want to connect to POSGES using Python. We have a lot of Python ORMS like SQL alchemy, Pee Wee, Pony ORM and Django ORM. Okay. They are written in Python and they handle POSGES
and they have one thing in common. Every ORM that I named uses Psycho PG-2. It is, no, no, clap your hands because it's a moment of Italian pride for me because the main developers of Psycho PG-2 are Italian.
Really. So, Psycho PG-2 implements the Python DB API 2.0. It's based on only PQ that are the libraries that are the standard libraries for POSGES and it's open source because LG GPL.
So, it's really free. It's really easy to use. So much easy that a lot of people uses it and I've seen projects around the internet wrapping Psycho PG-2 to create an even easier interface for that.
So, it's really, really powerful. Obviously, it's not the only one. There are a lot of other drivers. These are the most famous. We have PyPos SQL. PygreSQL, PG-8000, OCPGDB and a lot of words and letters together.
By the way, Psycho PG-2 is not the only one. There are a lot and everyone is valid.
Okay, what if I tell you that you can use Python inside POSGES? Every, let's say, serious database usually have an internal procedural language that allow you to write procedure to handle your data
in a faster way that you would do in an external program, okay? POSGES allow you to load Python inside the database and use Python in these libraries inside your database to modify your data. You can use Python to react to action
like insert on update or you can use it to prepare your data for a huge dump of data or whatever you want. You can use POSGES in Python, Python inside the POSGES and write Python code inside POSGES.
This adds the flexibility of Python straight inside POSGES. But wait, I promised you that I would talk more about frame data wrapper. As I said, you can use the frame data wrapper technology to connect to something external. Usually, as I said, frame data wrappers are written in C.
But if you use multicorn, that is a software that is written in Python, you can use Python to write frame data wrappers. In 2016, in the Italian Python Conference,
I've had a talk on this and I've written, just for fun, free frame data wrappers where I think the longest one was 150 lines of code, comments included, that allow me to connect, for example, to SoundCloud and query the service
asking for songs straight inside the database and then I was able to see the results of my research on a table. And that project started because I was really trying to organize my music collection and then POSGES helped me to do that.
If you want to see that, you can find the code on my GitHub, it's that one. There is also a video, unfortunately it's in Italian, so if you don't understand Italian, you can skip that. Another thing that you want,
after having loaded the data, through frame data wrappers, having loaded Python inside your POSGES, is to back up your database because I really can't stress enough people of the importance of making a backup. You can do that using Python.
There are a couple of software that allow you to do that. The most famous are Wall-E. It's a really nice program that really acts nicely with FUS and S3 from Amazon. And another one that is a bit famous is Barman.
So, let's talk about Barman. Why you would like to use Barman? Because it's open source. It's one of the most used backup tools for POSGESQL. It's filter-rich, it's easy to use because one of the points that the developers have
while developing it is to keep it simple. And it's developed by a team of nice people. How I know that? Because I'm one of those people. I'm one of the Barman developers. So, let me introduce Barman for a bit.
We are going to release the 2.2 version. The alpha version of the 2.2 will be released on July 17. And the killer feature that we are going to introduce is the rsync-based parallel backup that allows you to copy faster and really, really faster
because we have done some tests on that. Copy a huge database on your backup server. And this is important because, like I said, more people use Python software with POSGESQL
the better the software becomes. And Barman is an example of that. I can say that because I worked on Barman for a lot of years. And Barman has grown stronger because of the people that reported back to us that founded some features like this one.
And now, if Barman is stable, it's because of this. Okay, I'm not able to pronounce correctly. So I'm going to call it HA.
Okay. The meaning of that word is that you want your database to be available to all the people most of the time because if your database is down, probably the one of your concurrent is not. So, the more you reduce your downtime,
the better it is for you. How you can handle that? As I said before, we have streaming replication that allow you to synchronize more than one server with your master. I said so. How can you handle the promotion of one or more servers
in case of disaster? You can use this Python software that's called Patrony that is written by Zalando. So, it's used by them. It's open source because I have an MIT license. It's written in Python. It is based on Zookeeper or etcd or console
because you have more than one option. It depends on what you know. It's really powerful and to explain it simply, when your database shuts down, your master, there is a discussion between the other nodes
that decide which node will be promoted to the master and it happens almost automatically. So, no more calls at the middle of the night because the database is shut down. Obviously, we have to recover it,
but you can do it the morning after. We have talked about database. We have talked about HA. Now, we want to handle our database maybe easily. There are a lot of tools that allow you to do that. We have the OmniDB tool that's open source
and have been recently rewritten using Django. The 2.0 version of this software has been written from scratch using Django. It's cross-platform and has a nice PostgreSQL editor.
This software is written by two really amazing guys from Brazil and now they work for second quadrant. Those are my colleagues. Then, we have another tool that works historically near Postgres.
It's PG admin 4. It's open source. It's multi-platform. It gives you the ability to see the plan of your query because when you create a query, Postgres analyzes it and decides how to act. You can see the plan that Postgres creates for the query
and you can see if you can optimize your query to be faster or to act differently. Another thing that is really, really useful, especially for developers, gives you the ability to debug the procedure language that is called the PLPGSQL while writing a start procedure.
Usually, it's not easy to debug start procedures. You can do that putting a breakpoint on your code and step-by-step analyze it. After the graphical interface I showed you,
because OmniDB and PG admin 4 are graphical, we have the command line that is the tool that most of the DBA or the CDAMIN loves. We have the default client tool that is PSQL that is part of Postgres, is released with Postgres.
It's really powerful and it's born to work with Postgres by the people that develop Postgres and have a lot of meta comments that allow you to perform actions like retrieving descriptions of tables, retrieving descriptions of schemas,
or following foreign keys, or listing database just with a backslash command. Usually, it's a backslash and one letter, so it's faster and easy to use. But, obviously, it's not perfect. I have some issues and browsing the web,
searching for tools, nice tools for this presentation, I have discovered that exists PG-CLE that is obviously less powerful than PSQL, but gives you syntax highlighting, smart completion,
and always try to pretty print the output of the tables on your terminal. Obviously, it's not like PSQL, but if you have to write just a quick query, it's faster. Smart completion is obviously contextual of what you're writing.
The last one is the workload analyzer because you want to monitor how Postgres behaves. You can do that using the software that is developed by Dalibault, that is a French company that works on Postgres like second quadrant. It's open source and is composed by two parts.
One part is an extension that resides in Postgres and is written, obviously, in C because it's part of Postgres, and one part is the user interface that is entirely written in Python and allow you to see real-time graphs and see performance chart
so you can inspect your database while it's working and see when or why, helps you understand why your database is under-load on that moment. If you have a huge amount of locks, or a fault of a couple of queries that are really long, for example,
you can do that only monitoring your database, and this is a Python tool to do that. What I show you is just the tip of the iceberg of the possibility of the things that are around the world written in Python and Postgres.
What I show you is the result of love and passion, and that's why I said that passion is a keyword for this talk. What I want from you is to get involved and spread love and being part
of the virtual cycle that I showed you before. To do that, participating in conferences like this or like these two that are the next conferences on Postgres, that we have the Italian PG Day, that it's going to happen in Milan in 2017-13 of October,
or the Postgres QL Conference Europe that will happen in Vasa on October 24-27, 2017. In this conference, in these events, you can talk with people that write the code of Postgres.
You can talk with people that are there to help you to write better Python software with Postgres because it's their job. So it's important to meet these people and to get involved in that circle of virtual cycle.
Thank you. Okay, thanks to you. Who has questions for him?
I'm here for an interesting presentation. I'm curious about... Oops. I'm curious... I'm curious about how the Postgres database works.
Okay, no. Sorry for the technical issue.
So, you heard the question, I have to repeat that. No, okay, you asked me if the workload analyzer will impact the performance of that database. Yes, it could marginally, but if you're in a situation that your database
is always under either workload, maybe you want to spend a bit of horsepower on trying to understand what's going on. It's something you can deactivate. It's an extension. It's part of Postgres, but you can turn it on and off.
So you can turn it on when you need it, turn it off, not impacting your performances. Fine? Okay, good. Other questions?
So we have like a table of 100 million records, right? We want to do select one. So is there anything specific to Postgres
that's going to allow us to do faster queries and searches than, let's say, MySQL or some other database? So if we have like 100 million rows, right, and you have a select query, just select one from those,
is there something specific to Postgres that's going to allow us to optimize the database, like clustering, indexing? I'm repeating the question because,
so I'm sure that everyone heard. He asked me if exists something that helps you to create faster queries. This is the sense of the question, right? That helps you writing faster queries. Yes. As I said, PgAdmin 4, for example,
makes you see the plan of a query. So you write a simple query, just like a select star from something with a million rows, that for process are not a lot of rows to be honest. And you can see how Postgres intend to act on that query.
And this is graphical. You see what happens and you can decide that you can work and split easily that query in a query and a subquery or something that's more performance-like. It depends on how Postgres reacts. What you are searching is just inside the database.
It's part of Postgres. Okay, another question? Someone else? By the way, I will be at the second group booth in the Piazza room. So if you have questions, you can find me
or you can search for this guy. Stand up, Marco. For the technical question or if you want to ask something that's more marketing or stuff related, you can ask to that girl that is there or to the girl that is in the booth, okay? Okay.
Any questions down there? Ah, I didn't see you. Thanks a lot for your talk. I'll try this. Can you hear me? You talked a lot about the relationship between Python and Postgres and I actually know this pretty well because we work with both at my company.
But I'm wondering for other people who don't have experience with PostgreSQL yet but are already Python developers, do you have any good how-to material about using PostgreSQL with Python? Because I think people are pretty familiar with databases in Python, so they might be using Mongo or MySQL or something.
How can they get started with Postgres? Do you have any tips for them?
Okay, so he asked me some tips, okay, to get started in Postgres, maybe migrating from MySQL or Oracle and stuff, right? Okay, probably the biggest tip I can give you is read the documentation
because Postgres documentation is really, really organized and you can find what you need, exactly what you need, just with a simple search on the documentation website. There are a lot of tutorials on how to install Postgres
and usually for biggest distribution is just command line. And getting started, it's almost pure SQL. Postgres is almost pure SQL, not strange construct. So if you know, generally speaking, SQL theory,
you're able to use Postgres just out of the box. So one issue that we have when we're developing for databases is that we wind up with the schema. We may have hundreds of tables. We may have dozens of stored procedures, functions,
everything like that in the database. What is the best way to organize and to manage that process? Do you know of any tools that allow you to actually manage and keep all that stuff in sync? You can do it on the command line, yes.
You can do it on the command line, but are there tools that allow you, for example, that if you have a stored procedure and you make a change to that stored procedure that you can automatically publish all the changes to the database from your repository? Can I answer using that? Yes, okay. There are tools, you have heard the question?
Yeah, okay. There are tools that allow you to do that. They are not written in Python, so we're not part of my presentation. There are tools that allow you to keep track of your changes and diff different schemas, see the changes, patch the schemas and reload them.
Then there are techniques that allow you to do that with less impact, but our techniques are not specific tools to do that. In my experience working in second quadrant in these years, I've seen people doing a lot of strange stuff to do that,
and I've always seen that working. So, Borges is tough. It can handle every kind of change, and because I said it's transactional. You can try that. No, it's not. Roll back. Everything is good, okay? So, another question?
Ah, it's more a follow-up remark. It's more a follow-up remark to what you just asked. There is a lambic from the same author as SQLAlchemy that is a Python tool to track schemas. He knows more, I guess, but doesn't do everything, I guess,
but might be worth looking into. All right, one more? Okay. If you search on Google for Postgres is awesome on GitHub,
you will find a web page that has a list that is always updated of software for Postgres, so we find the tools you were asking, you will find the collection of tools for Postgres, and if you don't know that, there is also Python is awesome. Okay? Thank you.
All right.