
PyRun - Shipping the Python 3.7 runtime in just 4.8MB


Formal Metadata

Title
PyRun - Shipping the Python 3.7 runtime in just 4.8MB
Subtitle
How to put Python on a diet without losing functionality
Number of Parts
118
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
Python has become the de facto standard tool for many people to write tools, command scripts, smaller applications and even large applications. On Windows, it is fairly easy to build application bundles using e.g. py2exe, but on Unix, the situation is less obvious, unless you want to rely on OS-specific Python distributions, which often require several hundred MB worth of installation on the system and are usually customized in distribution-specific ways. Instead of relying on OS-installed Python distributions on Unix, our open-source eGenix PyRun provides a more or less complete Python runtime (interpreter and stdlib modules) in a single file, which can be "installed" by simply copying the binary to the destination system. The file can be as small as 4.8MB for Python 3.7, by using compressors such as upx. Due to its size, it's also the perfect Python distribution for Docker containers. The talk will show how PyRun works, how it is built, and how to customize it to include additional modules and applications.
Transcript: English(auto-generated)
So thanks for the introduction. Can you all hear me? Do I need to speak up a bit? Yeah, it's OK. Good. Great. So PyRun. This is something that I do for my day job.
It's not just EuroPython. So, just to give you a little bit more background on what I actually do during the day: I have my own consulting company, I'm the CTO of a fintech company in Cyprus, I'm a senior software architect, and I've done lots and lots of things in the Python community.
You can read up on all of that on my blog. I don't want to go into too much detail here. Let's jump directly into the talk. So what is PyRun? PyRun has a long history. It started in the late 1990s; I think it was around 1998, but I don't really remember exactly.
And it started as a completely different project. The project was called mxCGIPython. At the time, I named all my tools mx-something because of a naming conflict I had with the Zope Corporation packages over mxDateTime. And the idea there was that I wanted
to use Python on the typical FTP-based web hosters that you had at the time. In those days, you couldn't just upload a Python script and run it. You could do that with Perl; they all supported Perl at the time.
But Python was not really a thing there. So what I wanted to do was sneak Python onto the web hosters' machines, which was kind of like hacking an executable in there. And it worked really well, because I found that you could upload it into the cgi-bin directory,
and then you could also upload a shell script, which turns your uploaded file into an executable. So I thought, OK, let's try this: take Python, make it really small, and upload it, because FTP space was expensive at the time. And I wanted it to be really easy to do.
So I just created a single file out of Python, uploaded it via FTP to the hoster, and ran the script, and then I had an executable Python interpreter, and I could upload my Python scripts to the cgi-bin directory and run them. And I was not alone in wanting
to run Python on one of these hosters. There were actually quite a few other people. So the project (sorry about that, I'll just press here) had quite a few contributors, who then created these single-file Python binaries for lots
and lots of different systems. At that time, it was not like today, where Unix basically means Linux, maybe FreeBSD, and maybe a Unix system like Mac OS. There were lots and lots of different systems: Solaris, HP-UX, all kinds of variants of those.
So you first needed to figure out what system was running at the FTP hoster, and then you could upload the correct binary. That lasted a couple of years. It then petered out in the early 2000s, because the web hosters started to support Python.
And then basically, I dropped that project. Now, in 2009, my company wanted to produce a product, one part of which is a server application that's written in Python and needs to run on Linux. And I needed some way to ship this product to clients.
And the problem that I ran into was that if I were just to use the OS-based Python installation, there were so many variants of it that I couldn't really support them, because I was basically a one-man show as a company. So I needed something that had basically a stock
configuration, something where I knew exactly what kind of Python to expect. And so I remembered that I had this mxCGIPython project. I revived it and turned it into a bit more than just a single executable. For Windows, the solution for us was very easy.
We could just use py2exe for this. This is why PyRun currently does not run on Windows: we don't have a need there. But on Unix, there was no appropriate solution for that. So I revived mxCGIPython,
and I beefed it up a bit. So I had a business requirement. Let's add some more business aspects to this. So the business requirement for me was to create a single executable that has the complete, or more or less the complete, Python runtime, including the standard library in a single file so that installing Python on a machine
literally becomes a single copy operation. And I wanted it to work on Linux, on FreeBSD, on Mac OS. Those were the ones that I was interested in at the time. It probably does work on other systems as well. Now, how does this work? How many of you know the Freeze tool in Python?
OK. I bet you didn't know about this five years ago, or maybe, let's say, 10 years ago. Python 3 is a bit older now. So the freeze tool takes Python modules, compiles them to bytecode,
and then stores the bytecode in a C struct or array. And then it puts everything into C files, and then compiles everything as Python C extensions, and then puts that into a module file that you can then link.
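To make the freeze step above concrete, here is a minimal Python sketch of the same idea: compile a module's source to a code object, serialize it with marshal, and render the bytes as a C array that a C compiler could build into the executable. The M_<name> symbol naming and the 16-values-per-row layout are loosely modelled on the output of CPython's Tools/freeze; this is an illustration, not freeze's actual code, and it leaves out the table of frozen modules that freeze also generates. The exec_frozen() helper mimics what the interpreter does when it imports such a module.

```python
import marshal
import types


def freeze_module(name: str, source: str) -> tuple:
    """Compile module source to bytecode and render it as a C array.

    Returns (marshalled_bytes, c_source). The C layout is modelled loosely
    on Tools/freeze output; it is not byte-for-byte identical.
    """
    code = compile(source, "<frozen %s>" % name, "exec")
    data = marshal.dumps(code)
    # Mangle dotted names into a C identifier: "pkg.mod" -> "M_pkg__mod".
    symbol = "M_" + name.replace(".", "__")
    rows = [
        "    " + ",".join(str(b) for b in data[i:i + 16]) + ","
        for i in range(0, len(data), 16)
    ]
    c_source = "static unsigned char %s[] = {\n%s\n};\n" % (symbol, "\n".join(rows))
    return data, c_source


def exec_frozen(name: str, data: bytes) -> types.ModuleType:
    """Mimic importing a frozen module: unmarshal the code object and
    execute it in a fresh module namespace."""
    module = types.ModuleType(name)
    exec(marshal.loads(data), module.__dict__)
    return module


if __name__ == "__main__":
    data, c_source = freeze_module("hello", "GREETING = 'hello from a C array'\n")
    print(c_source.splitlines()[0])  # the array declaration line
    print(len(data), "bytes of marshalled bytecode")
    print(exec_frozen("hello", data).GREETING)
```

Note that marshalled bytecode is specific to the Python version that produced it, which is one reason a frozen runtime has to be rebuilt for each Python release.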
And so this is how you can get Python code into an executable or a library. This tool has been around for ages. It was written by Guido himself. Later, Mark Hammond extended it to also work on Windows. I don't exactly know why Guido wrote this,
but he probably had some use case for it. Nowadays, it's used for importlib, because importlib has a bootstrap problem: if you want to run Python, you first need to get the Python code from somewhere, and importlib is itself written in Python. So the issue is that if you want to import something, the import has to use importlib,
and so you need to somehow figure out how to do this. The way it works is that a small core part of importlib is frozen into Python. And this is why freeze started to be used again. When I started to write PyRun, the freeze tool
was not maintained anymore, so I had to do some fixes to make it work again. So how does it work? Essentially, I wanted to take the standard library, which is mostly Python modules plus a few C modules. So I had to do two things. The first was to get all the C modules, the extensions that are built in the standard build process of Python,
to not be compiled as shared libraries, but instead as static libraries, so that you can link them directly into the interpreter. The second step was to take all the Python modules in the standard library, convert them to C extensions as well, and also link them statically. So you get everything into a single file.
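One way to see these kinds of modules side by side in a running interpreter is the sketch below; it is an illustration, not part of PyRun. classify_module() reports whether a top-level module is statically linked in (listed in sys.builtin_module_names), frozen (bytecode embedded in the executable, as importlib's _frozen_importlib always is), a shared C extension loaded from disk, or plain source. It relies on the private _imp module, which is a CPython implementation detail. In a PyRun-style build one would expect most of the standard library to report "builtin" or "frozen"; note that recent CPython versions (3.11+) also freeze a few startup modules such as os by default, so results vary by version.

```python
import _imp  # CPython implementation detail
import importlib.machinery
import importlib.util
import sys


def classify_module(name: str) -> str:
    """Report how the running interpreter provides the top-level module `name`."""
    if name in sys.builtin_module_names:
        return "builtin"   # C module statically linked into the executable
    if _imp.is_frozen(name):
        return "frozen"    # marshalled bytecode embedded in the executable
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return "unknown"
    if spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "shared"    # C extension loaded from a .so/.pyd file at runtime
    return "source"        # Python module read from the file system (or a zip)


if __name__ == "__main__":
    for mod in ("sys", "_frozen_importlib", "math", "_json", "json"):
        print("%-18s %s" % (mod, classify_module(mod)))
```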
Now, of course, you can do this for a single application. But it's a bit tedious to redo everything for every single application that you want to run. Or let's say you do a release cycle for your product and always have to run all these things again, which I didn't really like. So what I decided to do was to just take the standard Python,
turn that into a single executable, and then ship the application code, the Python code, as a zip module, so that you essentially get two files: the PyRun executable and the zip file with the packages, the Python code.
Now, that was relatively easy to do. But then I wanted a little bit more, because I thought, well, we're almost there. We already have something which is more or less identical to Python, and it's tiny, and it would be really nice to use it pretty much everywhere. Instead of virtualenvs, for example, you copy your PyRun
and you're done. You don't need a separate installation for a virtualenv. So I thought I'd add the Python command line as well. Now, the problem is that, because of the way PyRun works, it cannot use the C command-line parsing that we have in Python; it has to use Python code for this instead.
So I had to rewrite most of the command-line parsing that Python does in C in Python, and then, again, do the same thing: wrap everything, put it through freeze, and put it into the executable. So I managed to do that. That's very nice.
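A minimal sketch of the "application code as a zip" half of this setup, using a hypothetical package named zipdemo_pkg: the code is written into a zip archive, the archive is put on sys.path, and CPython's standard zipimport machinery imports from it. This only illustrates the stock mechanism; how PyRun locates the zip at startup is not covered here.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_package_zip(zip_path: str, files: dict) -> None:
    """Write {archive_name: source_text} into a zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, text in files.items():
            zf.writestr(arcname, text)


def add_zip_to_path(zip_path: str) -> None:
    """Make packages inside the zip importable via the zipimport path hook."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()


if __name__ == "__main__":
    app_zip = os.path.join(tempfile.mkdtemp(), "app.zip")
    build_package_zip(app_zip, {
        "zipdemo_pkg/__init__.py": "VERSION = '1.0'\n",
        "zipdemo_pkg/util.py": "def double(x):\n    return 2 * x\n",
    })
    add_zip_to_path(app_zip)
    import zipdemo_pkg.util
    print(zipdemo_pkg.util.double(21), zipdemo_pkg.__file__)
```

Shipping an update then means replacing only the zip, while the runtime executable stays the same.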
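The following is a hypothetical sketch of what parsing the interpreter's command line in Python can look like; it is not PyRun's actual implementation. It handles only a small subset of CPython's options (-c, -m, -i, -O, -V, - and a script path) and does not support combined flags such as -OO. The rule it illustrates is that option parsing stops at -c, -m or the script name, and everything after that is handed to the program as its own arguments.

```python
from typing import List, NamedTuple, Optional


class Invocation(NamedTuple):
    mode: str               # "command", "module", "script", "stdin", "version" or "interactive"
    target: Optional[str]   # code string, module name or script path
    args: List[str]         # remaining arguments for the program
    interactive: bool       # -i: enter the REPL afterwards
    optimize: int           # number of -O flags seen


def parse_python_args(argv: List[str]) -> Invocation:
    """Parse a small subset of the python command line in pure Python.

    Hypothetical sketch: separate flags only (no "-OO" or "-ic" clusters).
    """
    interactive = False
    optimize = 0
    for i, arg in enumerate(argv):
        if arg in ("-V", "--version"):
            return Invocation("version", None, [], interactive, optimize)
        if arg == "-i":
            interactive = True
        elif arg == "-O":
            optimize += 1
        elif arg in ("-c", "-m"):
            if i + 1 >= len(argv):
                raise SystemExit("Argument expected for the %s option" % arg)
            mode = "command" if arg == "-c" else "module"
            # Everything after the code string / module name belongs to the program.
            return Invocation(mode, argv[i + 1], argv[i + 2:], interactive, optimize)
        elif arg == "-":
            return Invocation("stdin", None, argv[i + 1:], interactive, optimize)
        elif arg.startswith("-"):
            raise SystemExit("Unknown option: %s" % arg)
        else:
            # The first non-option argument is the script; the rest are its arguments.
            return Invocation("script", arg, argv[i + 1:], interactive, optimize)
    # No script, command or module given: start the interactive prompt.
    return Invocation("interactive", None, [], True, optimize)


if __name__ == "__main__":
    print(parse_python_args(["-O", "-c", "print('hi')", "extra"]))
```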
It is a bit slower, of course, because it's Python code doing the command-line parsing, but there's a trade-off there: when importing things from the C extensions that freeze builds, the import is a lot faster than going to the file system,
because it doesn't have to go to the file system. File system access is always very slow; if you load everything into RAM, it's much faster. So I could afford to make it a little bit slower, gain all the flexibility, and have it work. And it even supports interactive use now. You can start it and it comes up with a command prompt, so you can use it
just like regular Python. So this is essentially where I am. I now have PyRun. It's open source. It's a free, more or less drop-in replacement for a standard Python runtime. It doesn't use hundreds of megabytes in the file system. You don't have to install it anywhere. It works with Python 2.7,
3.6, and 3.7 now. It also supports lots of older versions: 2.4 is the oldest version that I still support, not in the current version of PyRun, but in previous ones. And it runs on all the platforms that I wanted it to run on.
The executable size is between 3.7 megabytes for Python 2.7 and 4.8 megabytes for Python 3.7. Of course, I'm cheating a bit: I'm using UPX compression for this, but it still works. The startup time is a bit slower. So this is what it looks like.
And I wanted to not only talk, but also show some stuff. So this is the project. Let's go here, for example, for 2.7, in case anyone is still interested in that.
So here you go. 3.7, can you read that? Should I make it bigger? No, it's okay, good. So just to demonstrate how this works, let me just do this and you can actually see it. So this is the UPX version.
Where is it, 2.7, UPX. So this is what you get when you run it. And it works in the standard way. You can import stuff, you can do all kinds of things; basically, it's a standard Python. You can also run pip with it.
So I can do it like this. Actually, let me see whether it's already installed. It is already installed. So I can install pip, I can install setuptools, and I can then also run pip.
pip will then use... this is a bit annoying. I can then install something, let's say this one.
Okay, that installed something, and then I can go here and run this. This is a game of life in Python. So that was 2.7. Let's go to 3.7. 3.7 is a little bit... let me turn this off.
So you can see here, down here, 4.7 megabytes.
The original one, the uncompressed one is 14.4 megabytes, which also is not that big. But it's amazing that you can actually compress it down to that size. And it works in the same way as the 2.7 one I just showed. So let's go back here.
By the way, if you have questions, just feel free to ask, and I can answer them right away. These are some use cases of eGenix PyRun. I'm pretty sure that there are lots more.
These are mostly the ones that I came up with. I know there are other projects that try to do similar things, and they are better at marketing than I am, so they come up with more use cases. The ones that we really care about: we're independent of the OS Python installation. That's the most important one. We want something small to easily ship to clients.
We want to easily make it available as a download without using too much bandwidth. It's extremely good for Docker containers: because it's so small, you can easily put it into a container image, and then loading the image is very fast.
It's much faster than having a regular Python installation in there. And it's very easy to build single-file applications out of it. I'm going to go into that in some more detail, because what I've added is integrated zip file support for PyRun. The way it works is very simple. You create your Python application.
Very important: you have to add a __main__ module. How many of you know what __main__ does? Some of you. Okay, so the way it works is that when you have a zip file and you have Python run that zip file, then if it finds a __main__ module
at the top level of that zip file, it will execute it. So it works basically like the typical main execution part (the if __name__ == "__main__" block) that you put into Python scripts. And so if you have an application like that, or a script, you just add this __main__ module to your zip file,
you then concatenate PyRun and that zip file, and you produce a new file, let's say hello. You make it executable and you're done. You don't need compilation anymore. This basically made my day, because I did not have to send zip files around anymore
for, let's say, application updates. I could just create a new executable and send that around. And just to show you how that works, I prepared a little something here.
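A minimal sketch of the concatenation trick, assuming launcher holds the bytes of the pyrun binary. It works because a zip archive is located through its end-of-central-directory record at the end of the file, and both the zipfile module and CPython's zipimport account for data prepended to the archive. In the usage example, placeholder bytes stand in for the binary and the current interpreter runs the result, so it can be tried without PyRun; with a real pyrun prefix you would execute the output file directly.

```python
import io
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def build_single_file_app(launcher: bytes, main_source: str, out_path: str) -> None:
    """Append a zip containing __main__.py to `launcher` and mark it executable."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("__main__.py", main_source)
    with open(out_path, "wb") as f:
        f.write(launcher)          # e.g. the contents of the pyrun executable
        f.write(buf.getvalue())    # the application zip goes right behind it
    mode = os.stat(out_path).st_mode
    os.chmod(out_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Placeholder launcher bytes, not a real binary, so we start the archive
    # with the current interpreter instead of executing it directly.
    out = os.path.join(tempfile.mkdtemp(), "hello")
    build_single_file_app(b"\x7fELF (placeholder, not a real binary)\n",
                          "print('Hello from a single-file app')\n", out)
    result = subprocess.run([sys.executable, out], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    print(result.stdout, end="")
```

The standard library's zipapp module applies the same idea with a shebang line instead of a binary prefix, for example zipapp.create_archive("myapp_dir", "myapp.pyz", interpreter="/usr/bin/env python3"), where myapp_dir is a hypothetical directory containing __main__.py.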
So you can see there, there's the __main__ module. It's just a very simple hello world, right? And then I put that into a hello zip file. Actually, I don't want to unzip it. I just want to read it.
Forget it. Let me just do that again. So I put the __main__ in there, right? And then I concatenate the two. I can choose the slightly bigger one, the 14 megabyte one, which is faster to load,
or I can use the UPX one, which is smaller. So let's use the UPX one. I simply put the hello zip behind that and create this one, hello, using the UPX version.
And you make it executable. You run it. There you go. So, as you can see now... thanks. So it's like magic, huh?
It's like you turn 100 megabytes into 4.7 megabytes. Yeah, so it's really, really easy to create these single-file apps now. These two or three steps that I have here, I'm probably going to turn into a shell script or something as well, so that you can just run that.
Okay, so customizing PyRun. Of course, this was the easy way to do things: you just get the PyRun for your platform. Maybe you have to compile it yourself; normally, we provide binaries for these.
We haven't done that in a while, because the build farm that we had basically crashed and I haven't had time to fix it and get it running again. The nice thing is that you don't actually have to be a core developer to customize PyRun. If you know the right places to fix
and the right places to tweak, then it's not that hard. Of course, you should know a bit about how the freeze tool works. I put a special file in here, pyrun_extras.py.
In that file, you just import whatever packages you want to have, and freeze will automatically find them and integrate them into the package. So it's very easy to add new modules. It's a bit harder to exclude modules. Let's say you have a Python package
that has test modules, and you don't want to ship the test modules with your product; then you typically want to exclude them. To exclude things, you have to go into the Makefile, into the excludes variable, and add whatever you want to exclude. Let's say your test sub-package: you exclude that so it doesn't get integrated into the package.
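Freeze discovers dependencies with the standard library's modulefinder module, which scans bytecode for import statements. The sketch below uses a hypothetical package myapp and a stand-in extras script to show both halves of the workflow described above: importing a package in the extras file pulls in the package and its stdlib dependencies, and an excludes list keeps a test sub-package out. How this maps onto PyRun's Makefile variable is an assumption; the sketch only demonstrates the underlying ModuleFinder behaviour.

```python
import os
import sys
import tempfile
from modulefinder import ModuleFinder


def write_file(path: str, text: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def make_demo_project(root: str) -> str:
    """Create a hypothetical package 'myapp' plus an extras script; return the script path."""
    write_file(os.path.join(root, "myapp", "__init__.py"),
               "from myapp import core\n"
               "try:\n"
               "    from myapp import tests  # absent once excluded\n"
               "except ImportError:\n"
               "    tests = None\n")
    write_file(os.path.join(root, "myapp", "core.py"), "import json\n")
    write_file(os.path.join(root, "myapp", "tests", "__init__.py"), "# test suite\n")
    extras = os.path.join(root, "extras.py")
    write_file(extras, "import myapp  # every module imported here gets frozen\n")
    return extras


def find_frozen_modules(script: str, project_root: str, excludes=()) -> set:
    """Return the module names a freeze-style build would collect for `script`."""
    finder = ModuleFinder(path=[project_root] + sys.path, excludes=list(excludes))
    finder.run_script(script)
    return set(finder.modules)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    extras = make_demo_project(root)
    found = find_frozen_modules(extras, root, excludes=["myapp.tests"])
    print(sorted(name for name in found if name.startswith("myapp")))
```

Because the sketch's myapp/__init__.py wraps the tests import in try/except ImportError, the package still imports cleanly when the sub-package has been left out of the build.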
And then the next step is if you want to add custom C extensions that you have, or maybe you have dependencies that have C extensions. For those, of course, you have to tell Python, when it compiles, to add them to the executable
that comes out. In order to do that, you have to use the Modules/Setup file. How many of you know, or have ever edited, this file in a Python installation when compiling it? Extremely few, okay. So, here's some fun.
So, let's say, what's the name again? Let's say we go here. This is the standard distribution of Python. It has a few patches, because I need to do a few tweaks for PyRun, but not really that many.
And the way that Python determines whether to compile any of these modules that you see here into the executable that you're building or into a shared module is that it looks into the Setup file, which looks like this.
This is an extremely old file. It still references the Makefile.pre.in logic that Python used in the very early days to build C extensions. In the very early days, you did not have anything like distutils; you had to basically do everything yourself. You used this Makefile.pre.in concept, and then you added your configuration
into one of these Setup files, and this would then make Python compile your extension into the executable or into a shared library. And then you could put it into your package. There's lots of description here. As you can see, it looks a lot like a Makefile.
Then you come down to this section. This section down here tells Python which modules to actually integrate into the executable. And then there is, let me see where I can find it.
So all these modules that you see here are statically compiled and put into the... right, here it is... statically compiled into Python, and everything that comes below this *shared* marker
in here will be compiled into a shared library. And as you can see here, I commented out that *shared* line. So everything that comes below it is still going to be compiled into the executable, because that's what I wanted. And as you can see here, these are just, you know,
C modules that Python's stdlib uses, and then I had to make a few fixes, because I wanted to add some of the modules that typically don't get added as a static version into Python. This file is not, let's say,
well maintained anymore, because nowadays everything normally gets compiled as a shared library. So some parts are missing, like the various SHA modules here, for example. They were not in that file; I had to add them. Some things are also removed. For example, I don't use tkinter, so I did not put that in.
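The static/shared rule described above can be sketched in a few lines of Python. classify_setup() is a hypothetical, simplified reading of a Modules/Setup-style file: comments and variable definitions (NAME=value) are skipped, backslash continuations are joined, module lines start out static, and a bare *shared* or *static* line switches the mode for everything below it. CPython's real makesetup script handles more cases, and the Setup excerpt here is illustrative rather than copied from a real file.

```python
from typing import Dict, List


def classify_setup(text: str) -> Dict[str, str]:
    """Map module name -> "static" or "shared" for a Modules/Setup-style text.

    Simplified sketch: module lines start out static, and a bare *shared*
    (or *static*) line switches the mode for every module line below it.
    """
    # Strip comments and join backslash-continued lines into logical lines.
    logical: List[str] = []
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)

    mode = "static"
    modules: Dict[str, str] = {}
    for line in logical:
        words = line.split()
        if not words:
            continue
        if words[0] in ("*static*", "*shared*"):
            mode = words[0].strip("*")
        elif "=" in words[0]:
            continue                  # variable definition, e.g. SSL=/usr/local/ssl
        else:
            modules[words[0]] = mode  # first word names the module
    return modules


# Illustrative excerpt, not copied from a real Setup file.
SETUP_EXAMPLE = """\
math mathmodule.c _math.c
_struct _struct.c
_sha256 sha256module.c
#*shared*
_sqlite3 _sqlite/module.c _sqlite/connection.c \\
    -lsqlite3
"""

if __name__ == "__main__":
    print(classify_setup(SETUP_EXAMPLE))  # *shared* commented out: all static
    print(classify_setup(SETUP_EXAMPLE.replace("#*shared*", "*shared*")))
```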
And then, if you want to add other stuff, you can just go down here and append it. The way it works is that you tell Python where the C code is and whether you need any parameters; for example, here are the things that you have to add for SQLite.
And then you just add it to the Setup file, and essentially you just let make run and everything works out by itself. So, that was a short tour through the Setup file. I don't think I have time
to actually show the compilation, but let me go through the changes that I had to make to go from 3.6 to 3.7. And this is interesting. I don't know, is Victor here? No. Victor Stinner made some changes between those two releases, and because the import logic in Python
sometimes changes from release to release or there are new ways integrated into the Python build process of how to configure certain things, you always have to touch the code base a bit. There was a lot to do from 3.5 to 3.6.
Going from 3.6 to 3.7 was basically just a few hours' work. This is just to give you an idea of how you can port PyRun to new Python versions. It's actually quite easy. You just take what's there already for the existing Python version and then tweak the patches a bit.
You then have to adapt your Setup file a bit, because new modules get added, of course. Others may need some tweaks in terms of definitions that you have to add. And then basically, things just work. The freeze tool itself also sometimes needs some fixes, because, like I said,
I had to tweak freeze to make it work with Python again. What I did is I copied the freeze tool into PyRun, into the source code package, so that I can apply changes to it as well. I had to make some fixes there too, and once that was done, everything worked.
So this is where you can get PyRun. Like I said, the released source and binary versions are a bit older. The released ones only support 3.5 as the latest version, plus 2.7.
What I will do is put the current version that I already have working up on GitHub, so you can download the sources and compile it yourself. Compiling it yourself is pretty easy. The package comes with a Makefile and some documentation. You just have to run make distribution
and then basically you're done. Everything should then be done for you, and you can just pick up the PyRun release package from the distribution directory and use that. Yep, that's it. Thank you for your attention.
We do have time for a couple of questions. There is a microphone there. You can just ask away. Hello, thank you for the talk. I have a question. Is it possible to use PyRun to embed Python in binaries,
or, if not, how much work would it be to create a statically linked library to be used by executables? You want to take PyRun and put it into some other executable? I would like to have a library that I can then use
to embed Python in, for example, a C program, so that the library doesn't have any external dependencies. That would require some work, but it's possible. Yes, definitely. So essentially what you have to do is,
basically, change the way that the main function works in PyRun. You have to remove that and turn it into a library, but I suppose you could do the same trick with libpython, let's say, or a libpyrun, and then get everything integrated.
You probably have to define some new entry points for the library to get everything working, but it should be possible, yes. Okay, thank you. Okay, we have time for one more question. Anybody?
Well, I have a question, but let's see if there is a question from the audience. Yes, go to the... well, no, no, it's okay. Oops, sorry. Thank you. Thanks for the talk. With PyRun, it looks great and everything, but I'm just wondering whether there are
any limitations, some Python features that you cannot use with it. I mean, are there any downsides, or is it a silver bullet for all problems? Well, there's one downside with this whole approach, and this is also why, for example, the Python test suite does not fully run: some packages put extra files,
let's say text files or symbol files or whatever, into the packages themselves, and because of the way those packages are written, they try to access the file system to get to that particular extra file that they have put into the package,
and, of course, PyRun doesn't have a file system. It also doesn't emulate a file system, so the files are not available. I had that issue with PyRun because, for example, the Python grammar is one of those files. It gets put into a special file that gets installed in the file system, and in order to make that file available,
I had to take the file, integrate it into PyRun as part of the build process, and then write some extra code to make it available via the standard APIs inside PyRun. So I had to do some of those tweaks, but only for things that Python itself needed. If you have something like that,
then you would have to do those tweaks yourself, but that's pretty much the only limitation I know of. I could imagine that some packages with external C extensions might be hard to compile into a static version,
because sometimes the way those shared libraries are built is very complex, and if you want to turn everything into a static library, you can run into problems. But it's just a matter of effort; you can get this working in pretty much any case.
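The data-file limitation can be reproduced with any importer that does not load from a plain directory. In this sketch, a hypothetical package datapkg is imported from a zip archive: building a path from __file__ and calling open() fails, while pkgutil.get_data() succeeds because it asks the package's loader, and zipimport's loader implements get_data(). Modules frozen into an executable are a different case: CPython's stock FrozenImporter does not provide get_data(), which fits with the extra code the speaker describes having to write for PyRun.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def make_zipped_package(zip_path: str, package: str = "datapkg") -> None:
    """Build a zip with a hypothetical package that ships a data file."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(package + "/__init__.py", "")
        zf.writestr(package + "/data.txt", "hello data\n")


def read_via_file_system(package: str) -> bytes:
    """Fragile pattern: assumes the package lives in a real directory."""
    module = __import__(package)
    path = os.path.join(os.path.dirname(module.__file__), "data.txt")
    with open(path, "rb") as f:
        return f.read()


def read_via_loader(package: str) -> bytes:
    """Robust pattern: let the package's loader fetch the resource."""
    return pkgutil.get_data(package, "data.txt")


if __name__ == "__main__":
    bundle = os.path.join(tempfile.mkdtemp(), "bundle.zip")
    make_zipped_package(bundle)
    sys.path.insert(0, bundle)
    print(read_via_loader("datapkg"))
    try:
        read_via_file_system("datapkg")
    except OSError as exc:
        print("open() failed:", type(exc).__name__)
```

On Python 3.9+, importlib.resources.files("datapkg").joinpath("data.txt").read_bytes() goes through the same loader machinery.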
All right, let's thank Marc again.