
Python’s Journey: From Upstream to Enterprise


Formal Metadata

Title
Python’s Journey: From Upstream to Enterprise
Title of Series
Number of Parts
131
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Have you ever wondered how Python gets from the first alpha version upstream to years of stability in your enterprise Linux systems? And what products and useful components are created for you along the way? In this talk, Lumír will take you through the incredible journey of Python delivery from the first alpha version shipped to Fedora Linux a couple of days after the official upstream release, through containers developers can use for testing with many old and new Python releases in their CI, to Red Hat Enterprise Linux and its main and alternative Python application streams and containers with various Python versions ready to be deployed to production environments with years of required stability. In this talk, Lumír will talk about: * Python maintainers’ focus on speed of delivery in Fedora and stability and reliability in RHEL. * How to use containers based on Fedora for early adoption of new Pythons in CI/CD pipelines. * What challenges do we face during ten years of maintenance of old Python interpreters. Come and learn how you can benefit from our efforts, use a modern development environment, and deploy your apps with guaranteed stability.
Transcript: English(auto-generated)
Hello, everyone. My name is Lumír Balhar. Who am I? A little bit about me: I'm a senior software engineer at Red Hat, working mostly with Python and for Python. I'll describe that later, because that is basically what the whole talk is about. I'm also a code
of conduct team member at this year's EuroPython. I'm a Python community organizer, organizing meetups for more than a decade now, and also PyCon CZ. I'm a medic in the Czech Red Cross, a volunteer firefighter, a drummer, and a couple more things. So staying close to me probably makes you safer than average. That's probably not the best thing
to say before a social event, but you know what I mean. So what's the motivation for this talk? At the beginning of my career, I was a Linux administrator, which basically meant administering hundreds of Linux machines and installing thousands of packages from
a Linux distribution every single day. I knew it was open source, right? But I had no idea how much work is behind every single RPM package. And today I'm going to tell you how much work is behind the Python you are probably using
every day. So that's about me and about the motivation. And now we can take a look at who you are. I'm not a magician, nor an LLM, so I'm not going to guess. But the statistics are clear here. When it comes to software, on the left side there is something which might be too new, untested, and unstable for you.
And on the right side, something which is too old and too featureless. And so you are probably somewhere in the middle. You want something which is tested. You want something which is new enough to have the features you want. But you also want something which
is still supported, right? So you are probably somewhere in the middle of this bell curve. But we are on both sides, at both extremes. On the one side, we work with the latest possible software in Fedora Linux, and on the other, with software that is not that fresh, I would say. But
I'm going to tell you why it's interesting and maybe important for you, and how you can all benefit from the work we do. And when I say we, I mean all Linux administrators, maintainers, and creators. So it's not just about Fedora, CentOS, whatever.
It's about every single Linux distribution, every single Linux maintainer. And where are the maintainers when it comes to code flow? They are usually the invisible people in the middle, typically invisible until something breaks badly. Then you know that
there is a maintainer who will take care of it. But usually they are the invisible people in the middle. On the one side, at this conference and for this specific example, you have the Python core developers upstream. On the other side, you have users downloading and installing RPM packages or any other kind of packages. And in the
middle, there are Linux developers and distribution maintainers: quite a lot of invisible people with a lot of work to do. So I started as a Linux administrator. Then I switched jobs a couple of times, working on networks and related things. And then I had the opportunity
to join the team maintaining the Fedora Linux distribution. And what I can tell you is that Fedora loves Python, really. We are trying to make Fedora the best distribution for Python developers ever. And you can benefit from our effort, even if you are
not using Fedora at all. I will describe that later. So when I say Fedora loves Python, I mean it: we have a lot of Pythons, really. Well, you know, 2.7, not that much. And there is always one main Python, the one that is the base for everything else in a specific release of the Fedora Linux distribution. I will talk about it in a bit. But it doesn't
really matter whether you are developing Python software for a very old Linux distribution or a very fresh one. Whatever your deployment, whatever your architecture, you can probably find the right Python version in Fedora. So you can use Fedora as a development machine,
you can develop the software, test it there, and then deploy it basically anywhere you want. And it's not just about the number of Python releases we have, all of them at the same time. It's also a little bit about speed, because Fedora is basically the first one when it comes to fresh releases. As you can see on the slide, it took
us only four days to package the first alpha of Python 3.13 last year. It took us another four days to update to the first beta. And when the final release came out, it was in Fedora Linux on the same day. Which is very important. You might think,
oh, this is all just an ad for Fedora Linux. But it's not. You can benefit from that as well, and I will show you how in a minute. And you might think, why should I be interested in alpha releases of Python, right? They are usually released a week
after the final release of the previous version: 3.12 final is out, and a week later 3.13 alpha 1 is out. That's true, but you have to take into account the five months of development included in the first alpha version, because after the first beta of
the Python development cycle, no new features are allowed in Python. Wing, wing. So, well, when I'm saying that, it's important because it's our world, basically. It's important for us. But it might have a lot of benefit
for you as well. We are packaging all of that into Fedora, that's true. But we are also providing those tools to you, and we created something we call fedora-python-tox, which is obviously a combination of Fedora, all the Pythons available on top of Fedora, and tox, which is an important tool in testing Python applications
and libraries, because it can basically take all the Pythons you have installed and all the different versions of libraries you support, create a huge matrix, and test in all the different environments. It's really, really interesting. So you have Fedora as a base.
You have all the Pythons there. You have tox on top of it. And it all fits into a container, fedora-python-tox. You can download it. You can use it in your CI/CD systems. You can use it for testing, which means that even if you're not using Fedora at all,
you don't like Fedora, you prefer Debian-based systems or whatever, you can still take that container and test with the first alpha version four days after it was released upstream, without the need to compile it on your own. You can test with the whole bunch of Pythons from 3.6 up to 3.13, and very soon 3.14 alpha 1, in a couple of months.
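To make the matrix idea concrete, here is a minimal sketch in plain Python. It is not tox itself and not the container's configuration. The interpreter list, the library versions, and the `lib` label are made-up examples, and the environment names only imitate the usual tox naming style.

```python
from itertools import product

# Hypothetical inputs: neither list comes from the talk's slides.
pythons = ["3.6", "3.12", "3.13"]
library_versions = ["1.0", "2.0"]

# One test environment per (interpreter, library version) combination.
matrix = [
    f"py{python.replace('.', '')}-lib{version.replace('.', '')}"
    for python, version in product(pythons, library_versions)
]

for environment in matrix:
    print(environment)
```

Three interpreters and two library versions already give six environments, which is why having every interpreter preinstalled in one container image saves so much setup work.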
So that's for developers. And where is the benefit for the regular user? That's kind of a different story. As I mentioned earlier, there is always one main Python: the thing
that runs when you type python3. And that's important because even if you are just a regular Linux user, it runs a lot of tools you might not know about, and it's the baseline for everything else in the distribution. And that's a completely different story. Because
if you package a Python as an alternative one, like 3.6, it's just a Python. It's there. You can use it to develop your software, test it, whatever you want. You can create a virtual environment, install all your dependencies from PyPI into the virtual environment, and it just works. That's fine. But for the main Python in a Fedora release, the situation
is completely different, because thousands and thousands of packages in the Linux distribution depend on Python. And that's not just the case for Fedora. It means that we really have to be careful, and it takes a lot of time to prepare a new main Python for the next
Fedora release. There are 4,500 RPM packages that need Python to build, and there are 5,600 RPM packages that need Python to run. Those are huge numbers. And again, it's not just about the Linux distribution. It's about the benefit behind
all that for you all. And all of that has to be ready when we finally switch the Python from 3.12 to 3.13 or any other new release you can imagine. And that preparation for a switch is basically 12 months of continuous work, rebuilding all the packages. We have
to rebuild all of them because the packages contain the .pyc files, the bytecode caches, which might be completely different in the new Python version. So we are rebuilding them and we are of course finding bugs, right? Because there are people like Victor, and Victor likes to remove stuff from Python. So it sometimes
breaks something, and I would say oftentimes. So we are rebuilding all the thousands and thousands of packages in the Fedora distribution, and we are finding bugs already with alpha 1, which is something you might think is completely
unimportant, something you don't want to care about, but we do. And we are not just rebuilding them and finding the bugs. We are reporting the bugs upstream, I mean on GitHub or any other tool like that. We are fixing the problems right away. We are sending pull requests. We are taking the patches back to the Fedora
distribution to have it ready for our users, and so on and so forth. And those are countless: hundreds of bugs and hundreds of pull requests filed every year just to make the whole Python ecosystem ready for the next major Python release. And that's really something.
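The point about rebuilding because of the bytecode caches can be seen from any interpreter. The sketch below only inspects values that the standard library exposes; the module name `example.py` is a made-up placeholder and no file is read or written.

```python
import importlib.util
import sys

# Each interpreter version tags the caches it writes, e.g. "cpython-313".
cache_tag = sys.implementation.cache_tag

# Four bytes at the start of every .pyc file; they change when the
# bytecode format changes, so caches from another version are not reused.
magic = importlib.util.MAGIC_NUMBER

# Where the cache for a hypothetical module "example.py" would live.
pyc_path = importlib.util.cache_from_source("example.py", optimization="")

print(cache_tag)
print(magic.hex())
print(pyc_path)
```

Running the same snippet under two different Python versions prints different tags and different magic numbers, which is why a package that ships precompiled caches has to be rebuilt for each new main Python.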
So when you install the Python 3.13 final release in October this year, we will have already spent 12 months testing it, filing bugs, and preparing it to work. Well, sometimes it works, like
"yay, you are so nice, sending a pull request to us. That's good. I can merge it right away and release a new version" of pytest, requests, or, imagine, some other popular library. Sometimes it's not that nice: "Alpha 1, why do you care? Alpha 1, close, and don't bother me, please. Send me a text message when the beta comes out." So after all of that work, we
are finally ready to switch the main Python in a specific Fedora release. New versions of Python are usually faster than the older ones, so that's a benefit for all the users. Even if you are just using some application, maybe a GUI application
or a CLI, it will probably run faster and consume less memory. So that's the benefit for regular Linux users, and not just for them, but for basically the whole Python community. And after that, you might think, all right,
12 months, quite a lot of work. You tested all of them, thousands of packages, nice, you sent some pull requests. So after the release, you can finally rest and sleep, right? Well, I would like to say yes, but no. The problem is that after we do all this work in the Fedora
distribution and upstream, we have to do it also for the more enterprise distributions. All of them are open source, but that includes completely open ones like CentOS Stream, and Red Hat Enterprise Linux as well, and that's completely different.
The package set is obviously much smaller, because it wouldn't be feasible to support thousands and thousands of RPM packages written in Python in an enterprise distribution. But there are basically the same problems for all three of them, and we have to come up with different solutions. And there is one, the biggest one,
the scary one, and that's the support. As I mentioned earlier, if you are on one side of the bell curve, you want the latest and greatest software, everything fresh, new versions. That's awesome for us, and it's, let's
say, easier for us, because when a security vulnerability is found in some software, upstream usually fixes it, or we fix it for upstream, or anybody else does; it's open source, anyone can fix it. Then they release a new version saying it's fixed in that version,
and you can just rebase to it in the Linux distribution, and that's great. We can do that in Fedora Linux, and that's fine, and again, we are helping the ecosystem by fixing those bugs, proposing the changes, and taking the patches back to the Fedora distribution or updating to the latest version. But on the other side, for enterprise Linux,
there is a completely different promise: we don't need it to be fresh, it might smell a little bit, but it has to be secure. Make sure it's stable and make sure it's safe, that's all we ask. And that's kind of challenging, because the release
cycle of the Fedora distribution is twice a year, in fall and spring, and that's fine, and the support for N minus 1 ends a week after N plus 1 is released, am I right? It doesn't really matter. Every half a year there is a new Fedora, but that's not true for
enterprise Linux. Nobody in the enterprise world wants to update their Linux distribution every half a year; on hundreds and thousands of machines, they wouldn't do anything else. And that brings us to an interesting problem, an interesting collision. The table on the left shows you which Pythons are the main ones in which enterprise Linux version, which might be important:
you can see that 3.6 is in RHEL 8 and 3.9 is in RHEL 9. There are also alternatives, so in enterprise Linux you can also use different Python versions, not just the main one it was released with. But the problem is that enterprise Linux has much, much
longer support, like ten years, and maybe more if you have enough gold, you know? But the baseline is ten years, and what now? Upstream support for Python 3.6, for example, ended in September 2021, and our support for enterprise Linux 8 will
end in 2029. That's eight more years, during which upstream will slowly remove the branch from the GitHub repository and slowly forget about the famous Python 3.6 release, by the way the first one with f-strings, so that's why this one is famous, right?
And that's ongoing, that's a rolling wheel. So on the one hand, we can rebase and update and do everything, and people are ready for new software, but on the other hand, they want to keep it stable. That's a problem, and I will show you why with two examples of the same problem,
each with the upstream solution, the Fedora solution, and the enterprise solution. So the first one is CVE-2021-23336 in urllib. The number is not that important,
and if you have no idea what a CVE is, it's basically just an enumeration of security vulnerabilities, because they are hard to remember otherwise. The second part of the identifier is the year in which that specific number was registered, and the rest is
just a counter. The vulnerability was discovered in Python itself, in urllib: the W3C recommends using the ampersand as the separator
between key-value pairs in a URL. If you have a URL, as you can see on the slide, with key-value, key-value, key-value, the pairs could be separated by an ampersand, a semicolon, or both, and the recommended way was to use only the ampersand, of course.
But somebody discovered that in Python, by default, both the semicolon and the ampersand were allowed, and that might be a problem, because when Python communicates with a different application and they use different settings, the same URL might mean different things at different levels, for example for proxies and web caches. So that's a problem.
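The ambiguity is easy to reproduce with `urllib.parse.parse_qs`. This is a small sketch that assumes a Python release which already contains the fix, since the `separator` keyword was added together with it. The query string is an invented example, not the one from the slide.

```python
from urllib.parse import parse_qs

# Invented query string that mixes both separators.
query = "a=1;b=2&c=3"

# Passing the separator explicitly avoids relying on whatever default
# the interpreter (or the distribution's patched build) happens to use.
by_ampersand = parse_qs(query, separator="&")
by_semicolon = parse_qs(query, separator=";")

print(by_ampersand)  # {'a': ['1;b=2'], 'c': ['3']}
print(by_semicolon)  # {'a': ['1'], 'b': ['2&c=3']}
```

The two results disagree about which keys exist at all, which is the kind of mismatch a proxy or web cache in front of the application can be tricked by.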
How can we solve that? Well, upstream said that if the recommendation is to use the ampersand, let's switch to the ampersand, like, now. Okay, that's one way to look at it, and we can kind of do the same, maybe, maybe not. In Fedora, we decided to do the same,
basically. We not only updated the supported releases of Python to the latest versions, which included the fix, so we can say that the vulnerability is fixed on our side as well, but we also took the fix from the newer Pythons and backported
it to the older ones. So basically the Pythons in Fedora, and also in your CI/CD systems very soon, are in better shape than upstream, because we also fixed it in the versions that are unsupported upstream. That was fine, but the problem is that the change is backward
incompatible, right? If you have software for, I don't know, something enterprise, let's say (I can disclose the details later), you cannot do that. You cannot just switch the default and hope that it will work; we are talking
about banks, right? So it might not be that simple. What we decided to do on the enterprise level is adjust the patch from upstream to keep the old default, the old behaviour, allowing both the ampersand and the semicolon, but showing you a warning if you use that default. So if you are proactively changing
that behaviour in your code, that works fine and nothing happens for you. But if you stay with the default, it will show you a warning in the log files, and if you are debugging the application, you will see: oh, don't do that, really.
Don't use the default. Set one or the other, but don't use the default. Don't use both. And we basically provide multiple ways to set the default, because sometimes enterprises are not really allowed or able to change the applications they are running, even if they
are written in Python itself. You have a bunch of very old code full of spiders and cobwebs and whatever, and you don't even touch it: "I don't want to. Please don't make me." So for enterprises, we usually provide a way to reverse the change or to
set the default some other way: "but please don't make us change the code, don't make us touch the application." So you can do it in the Python code itself. That makes perfect sense. You can do it in a config file in /etc for the whole system, for the whole machine, and you can also do it with an environment variable, which makes
it possible to do that for a single user, for example. So that's it: a single problem, but multiple different solutions. Our goal is to make sure that everything is secure. Whether you accept new versions, you are using the latest Fedora with an old Python, or you are using enterprise Linux, CentOS Stream, whatever, our
goal is to make it safe. We are usually also participating in the discussions about the possible solutions, and that's going to be the next example. As I mentioned, the second part of the CVE is the year it was registered, and this one was fixed 15 years
after the registration year. The problem is that the tarfile module allows directory traversal, which means that a specially crafted tar archive can be used to overwrite something on your system, because tar is powerful, right? It was used to back up whole systems
to tapes, so it has to be capable of backing up hard links, special files, and everything that Linux supports. That is not used much nowadays, but it was back then, and the format is still the same. So the format is powerful, and it's well-documented,
so when somebody requested to register the vulnerability, saying that it is possible to take a tar archive, tell Python to extract it, and destroy your system, upstream said: yeah, it's documented that way. Read the documentation before you use it, and don't unpack untrusted
sources. Don't do that, man. It's a really bad idea. And we agreed with that, right? It's documented. It's the same situation as with PyYAML: please don't use PyYAML with the default loaders and so on. Everything is documented, but people usually don't read. So it might be a good idea to have a safer default. If you want to use something
special about tar files, you can, but it might be a good idea to have a safer default. So Petr Viktorin, our former colleague, who is here at EuroPython as well and is now hosting the keyboard open space, decided to fix that problem. It required writing a PEP,
and he did that while we were on the same team. That's why I can talk about it right now. So he proposed: okay, let's have different levels of trust in tar archives. One can be "fully trusted": I know what is inside, however special it might
be, and I fully trust the tar I have. Another might be "tar", which means supporting all the features Linux has. And "data" means: please extract only files, but don't do anything special to my system, don't overwrite /etc/passwd
and stuff like that, right? So he proposed that PEP, it was accepted, and there is a deprecation period in Python 3.12 and 3.13 saying that if you use the default, it's a bad idea and you should switch, because the default will be different in Python 3.14, and it will then be only the fully trusted, sorry, the default will be data
then. So what could we do in Fedora? We basically just updated to the latest versions of Python where the fix is implemented, and also fixed all the Python releases that are unsupported upstream,
so again, they are in better shape than upstream. But in the enterprise, we were even more strict: we implemented the fix and patched it so that the default is the same as it will be in Python in a year, to be as safe as possible, because
this vulnerability was kind of critical. So Linux maintainers and Linux distribution developers are also part of those discussions and part of those solutions. So what's the summary, what's the final takeaway? It's that hundreds and hundreds of bugs are
resolved before you can even imagine them. Before you first run your Python code, the bugs that might be in your favourite library are already solved, because somebody spent 12 months testing it. Contributors also help you prepare your applications and
libraries, especially if they are packaged somewhere, in some Linux distribution or some other distribution format. You can benefit from the work we do in Fedora everywhere, and it's not just about the fixed bugs. If you are really interested in a new Python, and you are really pissed off at how long it takes for GitHub Actions and all the other
CI providers to include the new Python, Fedora is the way, really, and it's ready to be deployed in GitHub Actions and used with the latest Python. If you want to, I don't know why, but if you want to, you can use a single Python version for decades, but there
are so many nice features, right? Don't do that. Update instead. And the final thing is that the enterprise Linux distributions are not just built on top of open source, they are part of building open source, and that's something I didn't know back then
as a Linux administrator, and something I'm really proud of now. Thank you. Again, there's a problem. Thank you so much for your presentation. If you have any questions,
you can move to the microphone. Okay. So I remember reading something on the Fedora mailing list about a private Python version, so it was kind of surprising to me when you mentioned that not switching in time might break the system. So can you
elaborate a bit more on what the private Python version in Fedora is about? I don't think there's anything like that. There's no private Python version in Fedora. There is always one main Python, which everything else in Fedora that is written in Python or based on Python runs on, and that's completely publicly available
for everybody. So if you run python3 in your console, it will run the same Python as is used by tools like DNF, let's say. So it's exactly the same thing. The only difference is that the alternatives stand alone, just the interpreters and nothing more,
but the main Python has to work together with all the hundreds of RPM packages. You're welcome. Okay. So it's time for the coffee break. Enjoy it. See you at 3.30 p.m.