We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Introducing Asynchronous SQLAlchemy

00:00

Formal Metadata

Title
Introducing Asynchronous SQLAlchemy
Subtitle
An overview of the asynchronous support introduced in SQLAlchemy 1.4
Title of Series
Number of Parts
115
Author
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
With the increasing popularity of asynchronous frameworks, such as FastAPI, communicating asychronously with databases has become a hot topic in Python development. While there were already a lot of database drivers with asynchronous support available, support for asynchio in Object-Relation Mapping (ORM) frameworks was a lot rarer. That is, until the release of version 1.4 of SQLAlchemy back in March of 2021, which added support for Asynchronous I/O to one of the most popular ORM frameworks for Python. In this talk, Sebastiaan Zeeff will give an overview of the newly added support for asyncio in SQLAlchemy. He will go over how easy it is to set up and use the new AsyncEngine, show you how to avoid getting bitten by some of the pitfalls, and explain how you can use the synchronous functions and methods of SQLAlchemy under asyncio. While this talk is an introduction to SQLAlchemy's asyncio support, it is aimed at those who already have some experience with ORM and asyncio. Extensive knowledge of SQLAlchemy is not required, although having a bit of experience with the framework won't hurt.
19
Thumbnail
43:26
Goodness of fitBitMereologyPresentation of a groupMeeting/Interview
Function (mathematics)outputPairwise comparisonTask (computing)Loop (music)Event horizonBroadcast programmingDatabaseCartesian coordinate systemoutputEndliche ModelltheorieSystem callElectronic mailing listFunction (mathematics)Query languageSynchronizationDifferent (Kate Ryan album)Multiplication signPresentation of a groupMereologyObject (grammar)Client (computing)AdditionResultantCodeTerm (mathematics)Software developerFlow separationLoop (music)BitTask (computing)Dependent and independent variablesGradientComplete metric spaceEvent horizonDatabaseGroup actionServer (computing)Functional (mathematics)Service (economics)Pairwise comparisonOnlinecommunityStandard deviationLibrary (computing)Acoustic shadowSocial classMethodenbankComputer fileThread (computing)Perspective (visual)Process (computing)Operator (mathematics)Self-organizationExecution unitInformation technology consultingNumberEnterprise architectureRegular graphMobile appStatement (computer science)Ocean currentLevel (video gaming)Entire functionInstance (computer science)Computer animation
DatabaseDependent and independent variablesQuery languageoutputBroadcast programmingGroup actionRevision controlSynchronizationExtension (kinesiology)RepetitionGroup actionObject-relational mappingServer (computing)Coma BerenicesFunctional (mathematics)Statement (computer science)ResultantBitContext awarenessLetterpress printingCustomer relationship managementMereologyFunction (mathematics)Connected spaceUniform resource locatorSet (mathematics)Revision controlPoint (geometry)Extension (kinesiology)outputQuery languageAdaptive behaviorMusical ensembleProjective planeCodeCycle (graph theory)Cartesian coordinate systemInformationCASE <Informatik>Level (video gaming)ExpressionDifferent (Kate Ryan album)SoftwareInterface (computing)Object (grammar)Sinc functionDatabase transactionUniversal product codeDatabaseSynchronizationField (computer science)Endliche ModelltheorieTrailTimestampDefault (computer science)Instance (computer science)Functional (mathematics)String (computer science)Multiplication signSystem callRight angleMultilaterationComputer animation
Default (computer science)Server (computing)Letterpress printingFunction (mathematics)MetadataSynchronizationObject-relational mappingSynchronizationoutputIntegrated development environmentFunctional (mathematics)Structural loadKey (cryptography)Instance (computer science)MetadataInformationRelational databaseStatement (computer science)System callException handlingMereologyLetterpress printingDefault (computer science)Parameter (computer programming)BitSocial classMultiplication signDatabaseError messageWeb 2.0Level (video gaming)TouchscreenFunctional (mathematics)Attribute grammarFlow separationObject (grammar)Operator (mathematics)NumberSet (mathematics)Green's functionField (computer science)Endliche ModelltheorieLine (geometry)Cartesian coordinate systemCodeCASE <Informatik>Computer configurationComputer animation
Shared memoryPlanningAsynchronous Transfer ModeDefault (computer science)InformationSystem callMaxima and minimaProjective planeCodeEndliche ModelltheorieBitSoftware repositoryDependent and independent variablesDatabaseMultiplication signComputer animationLecture/Conference
Number theoryPlane (geometry)Lecture/Conference
Transcript: English(auto-generated)
Wait, well, today we have Sebastian, he will be talking about the asynchronous SQLAlchemy. How are you, Sebastian? Oh, I'm fine. Thank you for asking. Yes.
Where are you streaming from? I'm streaming from the Netherlands in Gijsveig, that's near the Hague. Oh, nice. I'm in Amsterdam. How? Nice. Cool. Okay, I wish you good luck and thanks.
Yes. You can start whenever you want. Well, hello, everyone. Welcome. My name is Sebastian. I'll introduce myself a little bit more later. But today we're going to talk about asynchronous SQLAlchemy. It's a very cool new and exciting feature that's part of SQLAlchemy 1.4 that has just been released.
And this beautiful painting is of an alchemist. You can see it at the Mauritshuis in the Hague, if you're ever there. It's a beautiful painting of an alchemist and I thought it was fitting for this presentation. So since we're talking about asynchronous SQLAlchemy, we also first need to discuss what the ingredients will be for this talk.
And for that, it's basically this. We're going to talk about what the differences are between synchronous and asynchronous input output. Now, mind you, this is not going to be an introduction of AsyncIO in Python because this talk isn't about that.
I assume that you've already seen it, but we are going to discuss what the consequences are of AsyncIO for SQLAlchemy and what you have to do. Then we're going to do a small comparison between synchronous and asynchronous SQLAlchemy. What are the differences? Which parts stay the same? Do you have to relearn the entire library again?
Or can you just rely on your existing knowledge? And we're also going to talk about a very important problem that happens as soon as you start talking about asynchronous input output. And that is eager versus lazy loading in your SQLAlchemy objects.
And this is going to be very important. So we're going to dedicate quite a bit of time in this presentation to that issue. Well, first of all, who am I? Well, I'm Sebastian. I'm 35 years old. I live in the Netherlands and I'm currently working as a code Smith and developer for Ordina Pythoners.
Ordina is an IT service company in the Netherlands. We mainly work for larger clients. So we are an IT service company. We do a lot of consultancy work. And you can think, well, governmental organizations or banks or other larger organizations.
That's mainly what we do. And we really try to focus within our Pythoners unit on enterprise-grade Python. So really developing Python as enterprise applications. So it's really a lot of fun. We have a fairly large team and it's a lot of fun to work there. I'm also one of the three owners of Python Discord. I don't really like the term owner, but it's a Discord terminology.
So we just stuck with that. And Python Discord is an online community for Python enthusiasts. We currently have over 200,000 members, which is a lot. But obviously it's a little bit like Slack. So the actual active number of members is much lower.
But still, we currently have over 100 volunteers who dedicate some of their time to discussing Python, helping others with Python, or organizing Python-related events. So that's really cool. And I'm really glad that I can be a part of that. And I'm really proud of that.
So that is who I am. So now let's start talking about asynchronous SQLAlchemy. Well, first, let's start to think, let's first think about what it means for an application to be asynchronous.
And one of the things that's really important is that we're talking about asynchronous input and output. That's where the name Async IEO comes from. The IEO stands for input and output. And that's really what's going on here. Now, first, consider a very simple synchronous Python application, for instance, a very simple SQLAlchemy application.
As a developer, you may have an idea that your application is structured in functions and classes and modules and files and packages and that you jump all over the place. But if you think about it from a Python perspective, basically your application boils down to a list of instructions.
And this is what we call a single threaded application. We just have a thread of instructions and Python will execute those one by one for you. So that is really simple. So Python goes to the first instruction and it goes to the second instruction and it will only move on to the next instruction once that instruction is completed.
It's not really relevant or important for us to really talk about what an instruction means. This is really a very abstract explanation. But what is interesting is when we hit this grad instruction here, this is an instruction that performs input and output. And in this application, that is making a query to a database server and getting a result back.
What's going to happen here in a synchronous application is that this call is going to be made. SQL query will be sent to the database. There it goes. And then what you get back is a response with your data and Python will just sit here on this call waiting for the response to come back.
And only once it's ready, will it move on to the next instruction. And this is basically what your single threaded application normally does. In asyncio, we're working with a slightly different model. So what we're working with here is that we have an event loop.
And an event loop means that you can have multiple tasks that are all scheduled to happen at some time. So instead of having your application as a long list of instructions under the hood, it's still that. It's better to visualize your application in terms of separate tasks that you want to execute.
You always have a current task that Python is currently executing. But in addition to that task, you also have other tasks that are scheduled that are not yet being executed. They're just waiting there in that event loop until they get a chance to run as well. Something interesting will happen here if we try to do the same.
If Python will first just execute those instructions one by one. But as soon as it hits this input output statement, then something interesting happens. We're going to wait for the future result of this operation. We're going to await it with the await keyword.
And basically what that means, it sets the process of input output in action. It sends out the request, but then it also signals to the event loop. This task is currently waiting for something. It cannot do anything else. Let's just put it back into our event loop like this.
And then in the meantime, while task one is waiting for the result of that asynchronous input and output, another task can become the current task and can start executing instructions again. This is the basic, very simplified idea of asyncio, very high-level perspective.
But what is important here for SQLAlchemy is that every little bit of input and output that we're going to do needs to be probably executed in such an asynchronous way. Because if you start executing synchronous input output calls, so calls that really need to wait before they get the result back,
then you will block the entire event loop. So it's important that every task that executes such an input output operation really does that in an asynchronous manner. At the same time, if you now consider your new SQLAlchemy app, your fully asynchronous SQLAlchemy app,
this is still something that is mostly about the SQLAlchemy application itself. It doesn't really have anything to do with the database. So what that means is that, yes, we need to make sure that every bit of input and output, ooh typo there, input and output is scheduled in an asynchronous way within our application.
But the actual request that we need to send to the database and the response that we get back, the actual SQL query is going to be mostly the same as what it would have been in a regular synchronous SQLAlchemy app.
So that asynchronous input output part is really about how you schedule your operations within your own application. And this has one really big advantage for every one of us. That means that basically all of what you know about SQLAlchemy already, how you build queries, how you select, how you use with, and so on, will translate almost one to one to asynchronous situations as well.
It's just about how you send out that query that's going to be different. And I think this is really reassuring. There were a lot of people fearing that when AsyncIO was introduced in the standard library, we would need to build some kind of a shadow standard library with all kinds of asynchronous features.
You may have had that fear about SQLAlchemy as well, that you needed an entirely new SQLAlchemy with only asynchronous coroutine functions, that you had to relearn the entire thing. But that's not the case. You can basically use what you already know.
I think that's really great. Do notice that there's an old though here in this sentence, in the middle sentence. And that's what we're going to talk about later. There are some consequences when you're using things like implicit or lazy loading of data. And that is going to be really important later.
So, right. This was very high level. Let's see some asynchronous SQLAlchemy in action. What I'm going to do is I'm going to compare the asynchronous piece of code with an asynchronous piece of code.
And both of them will take the same action. And then we can really compare what the differences are between the two versions. Mind you, these examples are going to be very simple, very simplified, because this is not a talk about very advanced features, which you can do with the SQLAlchemy query expressions.
This is much more about the difference between synchronous and asynchronous SQLAlchemy. For those of you who are interested in reproducing these results, the versions I've used are Python 3.9. I'm using SQLAlchemy 1.4.22. And there's an important note in the documentation about this.
And that is that the Asyncio extension is currently considered to be beta-level software. They predict that the interface is not going to change a lot from now. But you need to be aware of that. It's probably going to be, I don't know, I'm not involved with the project myself.
I'm just a happy user of it. But my suspicion is that it will be released in full with the 2.0 release. When it comes to adapters, I'm using asyncpg version 0.23 for my async database adapter and pyscho-pg2 2.9.1 for my sync adapter.
And I'm querying a PostgreSQL 13.3 database. And I'm running it in an Alpine-based Docker container. Now, you basically know everything about my setup. And this is, I think, all you need to replicate my examples.
Right. So let's see something in action. So first of all, we're just going to set up the SQLAlchemy engine. And this is the synchronous engine. And it's just using SQLAlchemy.createEngine. I enter the database URL. And one precaution, never hard code your credentials like this,
EuroPython, colon, EuroPython in production code. But here for this example, it's clear. And my database name is also EuroPython. And it's located, hosted on the local host and should be accessible over port 9876. I've also enabled the echo, which means that SQLAlchemy is going to output
a lot of information that we can use later to explain some of the stuff that's going on. I've trimmed most of it in the output that I'm going to show you because there's a lot of output. But I am going to show you some relevant parts of it. And I've enabled the future is true, which means that I'm mostly going to use the new 2.0 style querying
that will be part of SQLAlchemy starting from 2.0. 1.4 is actually a really nice compatibility release. You can already start to use that new style query language. And then your application will be future compatible,
so forward compatible with the new 2.0 version. Since the async part uses that exclusively, I've chosen to do that as well for asynchronous part here. So right, now I've got an engine. Let's see how that works in an asynchronous engine. Well, it basically works the same.
The only thing I'm doing different here is that I'm importing the AsyncIO extension for SQLAlchemy. And that contains a function called createAsyncEngine. And I can use that to create an asynchronous engine for myself. So you can see here, I've specified the asyncpg adapter in my URL,
but the rest of the data is basically the same. So at this point, the two pretty much look the same. There are no awaits here yet, which is also an important lesson. When you create an engine, there's no connection being made with the database yet, so you don't even know if your settings work. You will only see that
when you first start communicating with your database. But now I've got an asynchronous engine, and I've got an asynchronous engine that I can work with. Well, let's execute a very simple SQL statement. This is basically the same introductory statement, SQL statement that you can find in the tutorial.
I've just changed the text to Hello Europe, Python 2021. So I create my statement, SQLAlchemy.text, and if I use the synchronous engine, I can just do withEngine.connect as connection or com, and then I can execute the statement, and I can print a result,
and it prints Hello Europe, Python 2021. Now, what's the asynchronous version of that? Well, that looks like this, and the first thing that you should notice is that the statement that I'm issuing is exactly the same, and that goes back to what I said earlier. The part that needs to go to the database
is obviously just going to be the same queries that you're used to. The only thing that's really different is how you set up the connection. Now I have an async context manager, async withEngine.connect, and when you actually execute your statement on the connection, you also have to await that result. Now, once I have that result back,
that's going to be a regular result again because I'm not lazily loading in the results from the database, so if I then print the result, I get the same string out again. So if you compare the two, you can see that they're pretty much identical only for setting up the connection, so the actual part needs to do the input and the output with your database.
That's something that's going to be different, but the statements that you're going to execute are mostly going to be the same, so everything you know about SQLAlchemy, about making select statements, is just going to carry over to the asynchronous SQLAlchemy. Right. That looked rather simple, but what's the catch here then?
Well, I think the main catch is that you cannot simply rely on implicit input and output, and that's something that happens in SQLAlchemy quite a lot, especially if you're using it naively. You're not really considering this, then SQLAlchemy will make implicit calls to your database almost all the time.
To illustrate that, I'm going to introduce a very simple object-relational mapper model. This is a traveler, and this is a really simple traveler. We're going to keep track of an ID, a created ad field, a name of the traveler, and an age of the traveler, and what's important here is that the created ad field has a server default.
This means that if you create a traveler instance and you save it in the database, if there's no value for created ad, then the SQL server will run a SQL function to get the current timestamp, and it will use that as the default for that field. What's important there is that this is something
that happens in your database, so your application doesn't know that value until it has made a request to the database to ask for that value that's been created. That's something that we can see in action. Here, when I create a traveler object,
Sebastian, that's my name, and I'm age 35, and I can save that into the database with an ORM session. Then I start my transaction, and I add my object. Since I'm using session.begin, this will auto-commit the object, so it will actually save the object in the database.
That's basically what we can see here. This is the echo output of SQLAlchemy trimmed down a little bit, and here you can see the actual database transaction going on. This is until the print with the separator, as you can see on the screen.
You can really see here that SQLAlchemy will insert this new traveler into the database. Now, we want to look at what happens when we want to print the created ad date for Sebastian. When the print statement hits here, everything below the separator line, you can see that SQLAlchemy has to issue
another SQL statement to the database to get that newly created created ad value because that was created in the database because it was a database default. Even a simple attribute access like Sebastian.createdAd here can implicitly send a request to the database for information.
In synchronous applications, that's fine, that works, but in an asynchronous application, that's not really going to work. Here we're trying to do the same thing. I'm creating asynchronous session for my ORM using the async session class.
I've also specified expire on commit is false. This is a nice setting that means that my ORM object will not completely will be invalidated after a commit so that I prevent the number of inputs and outputs I need. Just basically, I'm going to do the same thing.
I'm going to use my async session now with async width, and I'm going to add the database object. Well, that part's going to work fine. But then when I get to the part where I want to print the created ad here, let's just move it up a little bit.
Then it will try to retrieve that data from the database. It's going to make another call to get that created ad value, and then you can see that you get a exception. In this case, the exception is a little bit cryptic, something about a green light. We'll get back to that later. But what it basically says here was
IO attempted in an unexpected place, and that is what is happening here. We cannot just use lazy loading in an asynchronous situation. So what's the solution for this? Well, for defaults, there is a really simple option. If you go back to your ORM model,
you can add a mapper argument called eager defaults, and this means that when you create your new object, it will also return the value of the default that was created in the database. So we're going to execute the same bit of code again. I'm just going to create another Sebastian,
and what you can see now in the SQL that's being emitted is that there's a returning part, and that returning part has Trevor.createdAd in it. So the database is going to send back that piece of information during the creation of the object, and then the print happens without failure because the information is already available.
So this is something you really need to think about. If there's anything that needs to be lazy loaded, you need to take care of that. Well, there's also another situation in which lazy loading is really something you need to consider, and that is with relationships. Here you can see that the traveler class
has been slightly modified, and now it has a relationship with a country, and that's the destination country. So there's a destination ID, which is a foreign key to country.ID, and there's a destination relationship attribute, so we can easily access that country
if we have a traveler instance. So what is the issue here? Well, it's basically the same. Let's say that I'm going to add Sebastian to the database again, and Sebastian likes to travel to Norway because they have a nice environment that we don't have here. You can hike there in the mountains.
So I'm going to add Sebastian here, but now I have an issue. If I now go to retrieve Sebastian from the database again with my session, and I'm then going to look at Sebastian's destination, then I can see the destination is not loaded, and this is because by default,
SQLAlchemy only lazily loads in related models, and there's a simple reason for that. If you have a very complicated database with a lot of related models, if it were to retrieve all that information at once when it gets a single instance for a single object, then you would make huge SQL queries all the time,
so that's not really something that's nice. So the problem here is in asynchronous code, normally when you then access the attribute, implicit IO will take place to load that destination from the database, but in an asynchronous situation, that simply cannot happen.
So what's the solution? Well, the solution is to modify our statement, and one thing that we can do is we can do a joint load, and this basically tells SQLAlchemy when you retrieve Sebastian from the database, you should also already load the related destination field.
That means that all the information will be prepared, the database will be sent to us in one single go, and now if I execute the same statement and I get Sebastian from the database, you can see that if I print Sebastian, then destination is filled with Norway, and if I print something with where Sebastian is going,
it's going to say that Sebastian is traveling to Norway. So it's really important in an asynchronous SQLAlchemy that you think about lazy versus eager loading, and you need to use eager loading where possible. You really need to think about what are the attributes that I want to use on this ORM model,
and you really need to load in that information beforehand, or you have to load it in after, but in an explicit IO call using the asynchronous methods. Otherwise, you will just get errors all the time. So that's really important. You really need to take eager loading into account. Well, then finally, what if I want to run something
that specifically uses asynchronous IO function? Well, there's also a solution for that, and that is runSync, and what asyncsession.runSync basically does, it takes a synchronous IO function or a function that performs synchronous IO functions
and it will turn those into asynchronous calls under the hood using something called greenlets. And this is really close to the database adapter level. It's not something you see, and that means that it allows you to use
the metadata.createAll, which is a synchronous function. Normally, you can run that with runSync, and then SQLAlchemy using greenlet will take care of... Hello, Sebastian. Yes. Sorry to interrupt. We'll take care of... Ah, am I over time now?
Yeah, it's over time. Ah. If you want, you can finish, or we can go for the questions. I need only about one minute or so. Okay, okay. Go ahead. So under the hood, metadata.createAll will now be called asynchronously.
You can even do this with your custom SQLAlchemy functions, but it's really important. This only works for SQLAlchemy IO functions. If you try any other kinds of IO, like web requests, they will not be transformed into asynchronous calls. So this is a really neat feature,
and it also allows you to use the old synchronous style. Just put it in a function and run it with runSync, and then suddenly, all your synchronous calls will still work. So it's pretty neat, but do be aware of how it works. Well, and then the summary. I think the most important thing is that you don't have to be scared of asynchronous SQLAlchemy.
Most of your knowledge will be directly transferable. You do need to think carefully about operations that perform input and output, and if you really want, you can still run synchronous functions that do synchronous database IO functions with runSync. And that's it for me.
Thank you very much. Okay, thank you so much, Sebastian. It was really nice. I've been using it for some time, SQLAlchemy 2.0, and I find it super cool. Yeah.
You have some questions. The first one is if you will be able to put the code in a GitHub public repo and share it in the chat later. Yes, I will be able to. I need to tidy it up a little bit, but I will share it. Then are there any plans to change the defaults
for a synchronous mode to use eager loading? I suspect nearly everyone will hit these issues at least once. That's a good question, but I cannot answer it because I'm not involved in the project itself. I'm just an enthusiastic user of SQLAlchemy, but there are some issues with eager loading
by default, and that's basically when you have a lot of relationships and you were to always eagerly load all those relationships. That means that a very simple SQL call that would only normally get you a minimal amount of information suddenly turns in this very heavy, long-running SQL call in your database
because it needs to load all those relationships between models, and there could be many-to-many relationships in there and all kinds of other relationships. So I suspect that they will put the responsibility at your feet. You really have to think about what is the information that I want to use instead of that they will just load everything for you
because that can be very expensive, but that's my guess. Okay, thank you so much. There are a couple of more questions. I also have a question, but I will add it in the channel. Thank you so much. Yes. Thank you.