
The Python's stability promise


Formal Metadata

Title
The Python's stability promise
Title of Series
Number of Parts
141
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Many modules you use and love have a portion of their implementation written in other languages, and for that a Python extension needs to be made. Python offers a C-API that allows people to extend the language, and, Python being a nice glue language, C is also a bridge to many other languages. So if everything is simple, what's the deal with stability? Changes in the C-API might break functionality in older versions, so PEP 387 saves the day with a policy for backward compatibility. Starting with Python 3.2, the Limited API was introduced: a subset of Python's C-API that promises that code using it can be compiled with one version and run on many others. Also, having a Stable ABI compatible wheel allows you to have only one wheel per OS, and not one wheel per Python version, which can simplify your release process. This talk will introduce the Limited API concept and provide the necessary information to include it in your project.
Transcript: English (auto-generated)
Thank you very much for being here. You're really brave. I was expecting an empty room, so I'm happy that at least you're curious about this topic. The main reason why I'm giving this talk is because once I searched for a talk on this topic and I couldn't find anything.
Only blog posts and discussions scattered around. So I hope that you're still interested in this thing. And a bit of a disclaimer: this is a really controversial topic. So you might have your own opinion, and I really encourage you to talk to me later on. I want to know what you think about it, because it is a problem that sadly is still not solved.
And the other thing. There is a couple of references in the slides that are not true. So I encourage you to find which ones are not true. Okay? Then I give you some homework. So this is how it started. The other day I wanted to, you know, for some reasons, wanted to get a list of all
the files that I had on my backup SSD. And I said, okay, it was not really a large backup. As you can see, there are only two million files. So I went there and I said, what do you use to do that for some large file system, right? You can use glob, right? I guess that everyone here knows glob. So you do something like this.
You get your files. But the problem that I found is that it was a little bit slow, right? But then I said, what is the better approach? What is everyone else encouraging people to use? Pathlib, right? Who here uses pathlib rather than glob? Okay. I'm really happy that you're not using glob. Nothing against it, but pathlib is faster. So I thought, okay, I'll give it a try. I know it returns an iterator and I'm casting it to a list, so of course there is some overhead.
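The slides with the two listings are not part of the transcript. A minimal sketch of the comparison, assuming a recursive `**` pattern for glob and `rglob("*")` for pathlib, could look like this. The `demo()` helper and the tiny temporary tree are illustrative only; no timing claim is made here.

```python
import glob
import tempfile
from pathlib import Path


def list_with_glob(root):
    """List everything under root with glob's recursive '**' pattern."""
    return glob.glob(str(Path(root) / "**" / "*"), recursive=True)


def list_with_pathlib(root):
    """Same listing with pathlib; rglob() is lazy, so the loop forces it."""
    return [str(p) for p in Path(root).rglob("*")]


def demo():
    """Build a tiny throwaway tree and list it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub" / "deep").mkdir(parents=True)
        for name in ("a.txt", "sub/b.txt", "sub/deep/c.txt"):
            (root / name).write_text("x")
        return set(list_with_glob(root)), set(list_with_pathlib(root))
```

Both functions return files and directories alike. Note that glob skips dot-files by default while `rglob` does not, so on a real backup the two counts can differ.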
But still, it's way faster. So then I thought, is it fast enough? Remember, this is only the motivation for the story. And then I thought, hmm, maybe I can use another language. Does anyone here program in something besides Python, maybe other languages?
So I thought, which other popular language do I know that I can use to beat this? Here you have a clue about the language that I was using. Can you guess which language it was? You can shout if you want. Maybe if I zoom in on the picture a little bit.
Of course, it was C++, because you have the sky, all the tones of blue, and we all love C++, so I was using C++. So the good thing is that, as you know, you can extend Python using the C API. Who here has written an extension for Python in C, for example, even if it's just a hello world?
Okay. So people are at least aware of it. You can extend it, right? So we will create a new project there for the extension. I hope that you can read well there. Yeah, I think you can. I want you to answer this to me. What is the tool we use to create a new, empty project for C API extensions?
Wait. Why nobody knows the tool? Come on. We have this tool since forever. No? Okay. Of course, we use cargo, right? So I have like a proper version, so if you don't believe me, you have there that at
least it's the Python package manager. It's okay. So we do cargo new and something. And then you go to something, and then you have all the things that you want to write. Everyone is following, right? So this is the base thing that you can get, of course, and then you can see here a base structure of things, and then we can build things with cargo, like you usually
do with cargo build, and then you have your wheel generated, right, for some random project. Cool. Everyone is following. Perfect. So in C++, we have the filesystem directory iterator, and this is something that was integrated a couple of versions ago, and I was curious, like, could I do something
that is better with this? So this is the implementation of the function, a glob written in C++. It's really simplistic. Maybe it's not really optimal or anything like that. As you can see, there is just an option to say whether it's recursive or not, and then we have some lists and iterators on that.
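The C++ source shown on the slide is not in the transcript. As a rough Python stand-in for the structure described (a directory iterator plus a recursive switch), here is a sketch built on `os.scandir`. The name `fast_glob` and its signature are assumptions for illustration, not the speaker's implementation.

```python
import os
import tempfile
from pathlib import Path


def fast_glob(root, recursive=True):
    """Collect file paths under root, descending only if recursive is set."""
    found = []
    pending = [os.fspath(root)]
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    if recursive:
                        pending.append(entry.path)
                else:
                    found.append(entry.path)
    return found


def demo():
    """Return (recursive count, top-level-only count) for a tiny tree."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub" / "deep").mkdir(parents=True)
        for name in ("a.txt", "sub/b.txt", "sub/deep/c.txt"):
            (root / name).write_text("x")
        return len(fast_glob(root)), len(fast_glob(root, recursive=False))
```

An explicit stack avoids recursion limits on deep trees; unlike the glob listing earlier, this variant returns files only.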
So these are the times that we got, and with fast glob, I got this. So, of course, what is the outcome that I managed to get from this? Who would have thought that using a compiled language is faster than Python? It's like a groundbreaking thing, right? I mean, it's 100% times faster than glob, so of course I was ready for this, right, going on Reddit, being the number one post, whatever, changing the life of many
Python developers, but the package raises a couple of questions that we still need to answer. I don't know if you can already see some issues with the wheel that we have there. So which are these questions, right? What happens if people are not using Python 3.11? Can they use that package?
Do I need to provide one per Python version? What about changes in the C API? Can I only reduce the scope to whatever is included in 3.11, for example? There were a couple of things included in the limited API from 3.11 onwards. Will everything explode? Will something change internally: opaque types, slot IDs, internal macro changes? Many
things that we need to figure out. So luckily for us maintainers, we have PEP 384. This is a historical PEP. If you go to the documentation, it really says this is just for historical purposes, but this is what evolved into that little page in the documentation that you have seen many times for sure.
In a nutshell, you enable the limited API with a define at the beginning of your code, before Python.h. You can specify the version of Python there, or you can just leave 3 as well. If you're using Windows, you link with one DLL, not the other one, then the tag of the wheel changes, and then you will have something like abi3, and you might have
some performance issues depending on what you're using. So I wanted to include this other PEP as well, PEP 387, because the other day in the CPython panel, there was some discussion about incompatible changes, so I wanted to show you, in case you didn't know, how these things kind of work, right? I thought that this was an old discussion, and when I took this screenshot, I thought, okay, 2020 was the last time someone discussed it or amended it.
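Going back to the version you specify before Python.h: the macro is `Py_LIMITED_API`, and its value uses the same hexadecimal layout as `PY_VERSION_HEX`. A small sketch of how that value is put together; the helper names are made up for illustration.

```python
import sys


def limited_api_hex(major, minor):
    """Hex value for Py_LIMITED_API targeting CPython major.minor and newer."""
    return (major << 24) | (minor << 16)


def format_define(major, minor):
    """Render the C preprocessor line that goes before '#include <Python.h>'."""
    return f"#define Py_LIMITED_API 0x{limited_api_hex(major, minor):08X}"


def running_interpreter_is_covered(major, minor):
    """True if the interpreter running this script is at least major.minor."""
    return sys.hexversion >= limited_api_hex(major, minor)
```

For example, `format_define(3, 7)` gives `#define Py_LIMITED_API 0x03070000`, which is the kind of line the talk places at the top of the extension source.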
I noticed that a couple of months ago, there was also some discussion as well. So this thing is still evolving. So they're still figuring it out, or maybe, as you saw in the panel, things are changing on how to change things. So in a nutshell, this covers everything that is public API.
If you were in the panel, you know that sometimes we have public API that should not be public API. The policy says something like: incompatible changes are only accepted if they have a large benefit-to-breakage ratio, and then you need to follow a deprecation process, with the warning present for at least two consecutive minor releases.
There are some exceptions that happen from time to time, and you don't need to care about soft deprecations, which is when you should stop using something but no warning is put there. Okay. And I wanted to include these slides because there was a question about, like, how do people deprecate stuff? So in a nutshell, I don't want to bore you with this, because it's only text.
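The "add warnings" part of that process can be illustrated at the Python level with the standard `warnings` module. This is only a sketch: `old_api` and `new_api` are hypothetical names, and C-level deprecations in CPython use different machinery.

```python
import warnings


def new_api(x):
    """The replacement that callers are expected to migrate to."""
    return x + 1


def old_api(x):
    """Deprecated alias: still works, but warns during the deprecation period."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return new_api(x)
```

A soft deprecation, by contrast, would leave `old_api` silent and only mark it as discouraged in the documentation.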
Discuss the change, add warnings, wait for Python releases, check for feedback, final removal, and for any questions, you ask the Steering Council. Okay. In case someone thought, oh, the pronunciation of this guy is really bad: I'm not saying ABI or API in a weird way, they're two different things. So in case you don't know, pretty sure you all know, ABI, application binary interface,
is just for linking, the whole thing for the internals of Python, and API is what you're using from what Python provides to write your extensions. And this is more or less, in a nutshell, what they do: the ABI is related to the abi3 on your tag, and the other things are related to functions.
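To make the "abi3 on your tag" remark concrete: a wheel filename ends in three tags (Python, ABI, platform). The sketch below reads them and answers which CPython 3 minor versions a wheel claims to support. It is deliberately simplified: it ignores build tags, compressed tag sets such as `cp37.cp38`, and non-CPython interpreters, and the filenames in the usage note are hypothetical.

```python
def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def claims_support_for(filename, minor):
    """Does this wheel's tag claim to work on CPython 3.<minor>?"""
    python_tag, abi_tag, _ = parse_wheel_tags(filename)
    if not python_tag.startswith("cp3") or len(python_tag) < 4:
        return False  # simplified: only plain cp3X tags are handled
    built_for = int(python_tag[3:])
    if abi_tag == "abi3":
        return minor >= built_for  # stable ABI: that version and newer
    return minor == built_for  # version-specific ABI: exactly one version
```

With a hypothetical `fastglob-0.1.0-cp37-abi3-manylinux_2_17_x86_64.whl`, `claims_support_for(name, 12)` is true, while a `cp311-cp311` wheel only claims 3.11.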
Okay. I promised you that was a boring part, but it was really necessary, because talking about policies and stuff is not really fun. So what's the catch? We're using the Python limited API. The subset is limited, right? So you will find that, of all the things that you have available, you cannot
use them all. So that's really a problem, because maybe you have a large project using, you know, some C API, and then it's a problem. Functions are slower than the macros. Most of the macros are not part of the limited API. So if you were using one of those, that's also a problem. There is no guarantee that when you have set up the limited API, your whole code conforms
with this. We'll see something later on, an example. There's no guard against calling invalid functions, that is, functions that are outside the limited API, and you might need some extra functionality that is not provided. So can I maybe, I don't know, generate a wheel for each Python version?
Some of you, I guess, know cibuildwheel? Okay, a few hands. It's a really amazing project, it's really simple, and I think that it's really magnificent that you can generate everything with it. But the problem that I have with this is that then you encounter projects like this. Maybe you don't visit the pages on PyPI, but this is the NumPy release, for example.
Nothing bad about it, but as you can see, many things around. So a little bit confusing. So why is that a problem? And now I want to tell you a little, little story. So when we were releasing this project that I'm working on, if you're not aware of Qt,
just imagine a framework to do graphical interfaces and other applications; in a nutshell, it does it all really well. But one of the problems is it has many, many, many, many, many modules, classes, methods inside the classes. It's a huge thing, right? So of course if you ask me how challenging it was to port this thing to Python, it took many, many years.
So for technical preview, we had a wheel that was 400 megabytes, okay? As you know, PyPI, the limit is 100 megabytes, and per project it's 10 gigabytes, so imagine. We had support for the three main desktop platforms and also some different architecture in some of those, and we had Python 2 support as well.
So that means we were having like a minimum of four per Python version. So this doesn't sound so bad, maybe, still for you, but it's not really an optimal scenario. Because first of all, we needed to ask for an exception, right? So these are a couple of slides with a few technical details that I just want to share there. I will try to go fast, so don't worry about it.
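To put rough numbers on this: the 400 MB wheel size and the four builds per Python version come from the talk, while the count of five supported Python versions below is purely an assumption for illustration. The function is a back-of-the-envelope sketch and ignores that Python 2 cannot share an abi3 wheel.

```python
def release_footprint(wheel_mb, builds_per_version, python_versions, abi3=False):
    """Return (number of wheels, total megabytes) for a single release."""
    # With abi3, one build per platform covers every supported Python version.
    versions_built = 1 if abi3 else python_versions
    wheels = builds_per_version * versions_built
    return wheels, wheels * wheel_mb
```

Under these assumptions `release_footprint(400, 4, 5)` is 20 wheels and 8000 MB for a single release, close to the 10 GB per-project limit mentioned above, while the abi3 variant needs 4 wheels and 1600 MB.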
So first of all, the thing that we did is that most of the macros that we were using we reimplemented as functions. We needed the buffer protocol, which was added in 3.11, so we implemented our own. Some flags for debugging and for noticing when modules are loading, and the other functions that you can see there are some special cases. Because you want to have the source code of your project with a function that of course
is valid when you are using the limited API, when you are not using the limited API, when you are using Python 3, and when you are using Python 2. Remember this was many years ago. So just for your information, in case you are not familiar with functions and macros, this is for example the simplest one, to get an item from a list, and this is the macro form. So you see we leave aside all the protection or checking of the parameters, arguments, whatever.
So I guess maybe you would say, oh, yeah, but that's really convenient, so let's only use macros, but that's really, really bad, because then your code will explode with any change, right? So then we did other low-level things. I don't know if someone has had the delight of working with PyTypeObject, but yeah, we needed to change some slots inside,
adapt some stuff there, so eh, really boring things. And then we were just creating our own heap types, which is a lot of work, whatever, whatever, so this is the part that's technical, but anything. So we couldn't use the whole limited API for this, and the problem was with this,
because as you know, many Linux distributions do not use setup.py for packaging things, right? So we needed to use CMake and other things. So we ended up with source code that is filled with Py_LIMITED_API ifdefs everywhere. So of course, the team made an amazing effort, and I'm really thankful, at least for, this is one of our external developers that did a lot of the heavy lifting,
and if you're interested in this topic, which might be a little bit boring, but necessary, you can find it there, at this URL. We also then experimented a little bit. Who here maintains a package that has PyPy wheels? I'm sorry, but it's difficult, and yeah, but that's another side, there are
many things that conflict and things that do not work well when you try to have everything in the same place. So is this still an issue? One of the things that I do in my spare time is I'm one of the moderators of PyPI, so if at some point you request a limit increase, maybe you will see my face there, and then I started to notice a pattern.
So people were showing me, oh, yeah, we have this following release, it's like each release is two gigabytes, and sometimes we release like every two weeks or something. Some other people, oh, yeah, whatever, whatever, so 900 megabytes, 400 megabytes, and so on and so forth, and it's really a never-ending story. A lot of projects out there, now with the whole boom of data science and machine learning,
they have these huge wheels, so they are providing one wheel per Python version. So of course, storage is not infinite, so it is still an issue. So yeah, many projects, I just put there all the ones that I could find quickly. So back to our project. The whole magic that you can do for this, to compile these kind of things, is that
for the whole infrastructure that we had before, first of all, the pyproject file, this is a simple one, and then you have a simple setup.py, and on the top, I think that in my file, I had the limited API definition as well, but this is not enough
to create an abi3 wheel. So for that, what I needed to have is these little things here. So is someone still using only setup.py here, and not pyproject.toml? No? I am glad to hear. So still, it's a little bit wonky, so you still need to use this file, at least we
defined it. Maybe you will tell me, no, but you can specify here, limited API. That doesn't change the tag. So I'm sorry, but that's the problem. And then when you do that, of course, we go back to our beautiful tool that we all know and use, and then you can see there that then we have the abi3 wheel, right? So yeah, this is what the limited API guarantees you, of course, that you will have a package
that will be compatible with the many Python versions that you saw there, and the example is 3.7, so that means that I can install all the packages that I have there on 3.7 onwards, so that's one of the caveats. So is anyone here already using abi3 packages for the things that they
project there? I see one hand. Only one hand? Oh, my God. Okay. I hope that you can take this with you. So the problem with this thing is that you will say, okay, cool, so after I do that, it's everything validated there. What do you think? Can you say yes or no?
So it is not validated. So you can still manually modify the labels of your wheel, put abi3 there, whatever, whatever, and it's still an invalid thing, but there are no internal mechanisms that you can have to stop this kind of thing. So luckily for us maintainers, there are a couple of packages around.
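The extra file that sets the wheel tag is not visible in the transcript. One common way to do it with setuptools and wheel, and this is an assumption about what the slide showed, is a `[bdist_wheel]` section in `setup.cfg` with a `py_limited_api` entry. The sketch below keeps that fragment as a string and reads it back with `configparser`, only to show which key carries the tag; `wheel_tag_setting` is a made-up helper.

```python
import configparser

# Assumed contents of setup.cfg; cp37 matches the 3.7 example from the talk.
SETUP_CFG = """\
[bdist_wheel]
py_limited_api = cp37
"""


def wheel_tag_setting(cfg_text):
    """Return the py_limited_api value from setup.cfg text, or None if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("bdist_wheel", "py_limited_api", fallback=None)
```

As the talk stresses, this setting only changes the label on the wheel. It does not check that the compiled extension really stays inside the limited API.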
The first one is auditwheel, which maybe you have heard about, used mainly to see if you are complying with all this PEP 600 and stuff, but there's also abi3audit. That is a really cool tool that I found while doing some research. If someone here is developing this thing, please let's talk later. And you can see that it really reports some
stuff there. You can see there that, for example, this is a binding generator of PyQt, which is another project, and they are not really abi3 compliant, because they have a symbol that is not part of the limited API. So that's really a problem, what I was saying about it not being validated.
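The idea behind such a check can be sketched in a few lines: take the Python-related symbols an extension imports and compare them with the set the stable ABI allows. This is not how abi3audit is implemented. The allow-list below is a tiny illustrative subset, and `_PyHypothetical_InternalHelper` is an invented name standing in for a private symbol.

```python
# Tiny illustrative subset of function names that belong to the limited API.
ILLUSTRATIVE_LIMITED_API = frozenset({
    "PyArg_ParseTuple",
    "PyErr_SetString",
    "PyList_GetItem",
    "PyList_New",
    "PyLong_FromLong",
    "PyModule_Create2",
})


def non_limited_symbols(imported, allowed=ILLUSTRATIVE_LIMITED_API):
    """Return the CPython-looking symbols that are not in the allowed set."""
    return sorted(
        name
        for name in imported
        if name.startswith(("Py", "_Py")) and name not in allowed
    )
```

A real audit needs the full stable ABI list for the targeted version and has to extract the imported symbols from the shared library itself.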
So then you say, okay, that's too much work, can I rely on a binding generator here? Who here uses a binding generator? Okay, a few hands. So if you're using pybind11, for example, you have heard about it, you don't have support in pybind11 for generating wheels with abi3. I think CFFI has one, but anyway, internally we're using Shiboken, where we do have it. Well,
it's simple, we pass a flag and it works, but you can still generate it, you need to use it for my work as well, but we promise you it will be valid. Or if you're using Rust, you can, of course, use PyO3, and that also generates wheels that are compliant with abi3.
So okay, everything is good, right? Everyone loves the stable ABI and limited API. I don't see the following person in the crowd, so I think that I will not be ashamed to show the comments, okay. So then I read this one. And then I thought, it's really important: what we are currently using to have
an easy life is really painful for the people who are developing the internals of Python. So when I saw this post, of course, I could share a little bit of the sentiment here with Mark, that, of course, changing things internally will break ABI compatibility forever.
So I invite you to look for that. There is a lot of insight inside this post, a lot of people complaining and maybe saying, oh, we can solve this in Python 4, but what happened with Python 4, and many other things. And it's really problematic, right? Because so far, at least most of the changes that you have seen before in the talks,
the ones improving the speed and everything, they really require that you start modifying some stuff. And this can be really, really tricky. So of course, it's kind of like a love and hate relationship, if I did something. Has anyone read this, by the way, or not? It was new. Oh, yeah, so you know. Yeah, you know for sure.
Okay. So yeah, because at the moment, I was saying that it should be like this, right? I mean, the CPython core developers developing everything, running around and then improving and getting more versions, but it's usually like this, because it's really a burden on their side. But we need to somehow pick a side, right? So are we doomed, with all these things that I am telling you now, so that, of course, you
will not be motivated to use it? And I don't want to trick your mind, but maybe not. HPy, which you saw a talk about before, might be the answer. If you don't know it, or you didn't watch the talk, maybe you can watch the recording afterwards. But if you go at least to the website, you can see there that one of the goals there
is to beat this, to have some kind of universal binaries so that, of course, we will forget about it and live happily ever after. So final remarks. Oh, I used less time than I was expecting. Final remarks: I would say that, as a maintainer, this is still an open discussion. I saw another comment on discuss.python.org, and in 2022 they were still
saying something like, yeah, we should sit together, let's go to a sprint, let's discuss the stable ABI and everything. And yeah, I think that, still, if you are maintaining a project, I would say: please adopt the limited API. I think it is better in that regard. You will find sometimes symbols, after you generate your wheels and your users start
to use them, you will find, like, oh, invalid symbol, or we don't find this reference, whatever, because the limited API breaks between versions, so it's really interesting for you to see that and report and help back. So now the question that I open to you, and if you want to ask me now a question or something, or maybe we can discuss later, is should we back this promise overall?
So yeah, thank you very much for your time, and that's what we do. Thank you, Christian. Python packaging has come a long way, but it's nice to see a talk that also talks
about the problems that we still have there. We have time for questions. There are two microphones here in the room, and you can also ask your questions on Discord, so please come up to the microphone, or if you don't want or you are a remote participant, you can also ask on Discord. No, but I still can hear, okay, you can try.
Does this one work? Yeah, yeah, yeah. Great. Thank you for a great talk. I think this is a topic that deserves more attention than it gets. So awesome.
And the question, when you migrated to universal ABI and you had to migrate to heap types, did you also consider migrating to module state, or did you keep the global variables for your types around? We kept the global variables for some time. But there might be some changes in the future that we got, but yeah, we kept it.
Thank you. You're welcome. And if you are too shy, don't worry, we can catch up later. And if you have any questions, please approach me and let's discuss about this uncomfortable and boring but necessary topic. Yes, this is the right place for a conference. Thank you very much. Please give another round of applause for Christian.