
PIP Internals


Formal Metadata

Title
PIP Internals
Title of Series
Part Number
41
Number of Parts
173
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language
Production Place
Bilbao, Euskadi, Spain

Content Metadata

Subject Area
Genre
Abstract
Xavier Fernandez - PIP Internals. pip is certainly one of the most used packages in the Python ecosystem, but what actually happens when you pip install foo? The talk will mainly focus on two aspects: how does it perform an installation and resolve dependencies, and how does pip find installation candidates and select the 'best' one?
Transcript: English (auto-generated)
Thank you. So like he said, I'll talk to you about the pip internals. I guess you all know pip and use it almost every day. But I guess you don't really know what it really does.
At least not everything. So first, a few disclaimers. pip has only one API. It's not really one, but it's the command-line interface. So everything I'm going to talk to you about is only true for this version of pip and might change in the future.
But obviously, the internals don't change completely from one version to the next. But clearly, you shouldn't rely on the functions I'm about to show you. So this talk corresponds to the latest pip version,
so 7.1. I'll present three main parts, I guess the ones you use the most, corresponding to the installation of packages. So first, how does it resolve dependencies, or doesn't it?
How does it perform an installation? And in the end, how does it select a package? So first, how do you tell pip what you want to install? There are three main ways. Either you directly pass an argument,
like a package name, like Django, or a link to a file, or a path to a directory or a file. You can also install an editable package. So it's the -e or --editable option with a path.
This is used mostly in development, to be able to change a package while you test it. And finally, requirements files. I think you certainly know these three ways.
And there is now a new one since 7.1 called constraints. It's a little bit special because what it does is set constraints on the package versions, but this won't cause the installation.
So you have to ask for, let's say, Django to be installed, and then, through the constraints file, you can say what version you want to install. The structure that holds this information is called a requirement set.
It's just a big bag of requirements. And yes, you can, of course, ask pip to install several at the same time. So you can ask for several editable packages,
also plain packages, give it a requirements file or two, and also constraints. You can combine as many options as you want. So like I said, the requirement set is just a big bag of requirements.
And from the command options, so the four kinds I just showed you, pip will call add_requirement on all of them, beginning with constraints. So what does add_requirement do?
It does a few things. First, it checks for environment markers. If you don't know what environment markers are, they are defined in the current PEP, PEP 345, but will be extended in the next package metadata PEP,
so PEP 426. They allow you to specify required dependencies for a specific environment. So for example, you can put in your setup.py file something like that. It will only require asyncio for 3.3, since you get it for free in later versions.
So you can do this kind of thing, or even something a little bit more complex. So this is an example from the wheel package. And it is a combination of environment markers and extras.
So what it says is that if you ask pip to install wheel with the signatures extra, it will always install keyring, because it's required. And on every platform but Windows, it will also install pyxdg.
Environment markers are certainly useful to avoid logic in your setup.py files, and will allow you to have these optional requirements in wheel files.
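The slides are not part of this transcript, so the following is only a sketch of what the two examples could look like as setup() keyword arguments, using the "extra:marker" key syntax of extras_require. The helper requirements_for and its tiny marker evaluator are hypothetical and understand only == and != comparisons, far less than the real marker grammar.

```python
# Sketch of setup() keyword arguments that use environment markers in
# extras_require keys ("extra_name:marker"). Nothing is installed here.

# The asyncio case from the talk: only needed on Python 3.3.
ASYNCIO_EXAMPLE = {
    "extras_require": {
        ':python_version=="3.3"': ["asyncio"],
    },
}

# The wheel-style case: an extra combined with a platform marker.
WHEEL_STYLE_EXAMPLE = {
    "extras_require": {
        "signatures": ["keyring"],
        'signatures:sys_platform!="win32"': ["pyxdg"],
    },
}


def _marker_matches(marker, environment):
    """Hypothetical evaluator for  name=="value"  and  name!="value"  only."""
    if "!=" in marker:
        variable, value = marker.split("!=")
        return environment[variable.strip()] != value.strip().strip('"')
    variable, value = marker.split("==")
    return environment[variable.strip()] == value.strip().strip('"')


def requirements_for(extras_require, extra, environment):
    """Hypothetical helper: collect the requirements selected for one extra.

    Use extra="" for the unconditional (no extra) requirements.
    """
    selected = []
    for key, requirements in extras_require.items():
        name, _, marker = key.partition(":")
        if name != extra:
            continue
        if marker and not _marker_matches(marker, environment):
            continue
        selected.extend(requirements)
    return sorted(selected)
```

With an environment dict such as {"sys_platform": "win32", "python_version": "3.3"}, the signatures extra would select only keyring, while the same call on a Linux environment would select keyring and pyxdg.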
So this was the first thing add_requirement does: checking the markers, whether we should add the requirement to the bag or not. Then it will check if the same requirement was already supplied, and will reject any double requirements. So you can't ask for Django version one,
and then ask it for Django version greater than or equal to one. It will reject it for now. Except, and this is important, if the previous requirement was a constraint, and then it will always use the constraint version.
So if you have something like that, putting Django 1.8.3 in a constraints file, and then asking pip to install 1.4.3, it will use the version specified in the constraints file. This can be really useful to force pip
to install a custom version or a local version with small patches, whatever the packages ask for. And then finally, and this is how pip currently resolves dependencies: if you already have one requirement,
and another package requires the same package, it will just ignore it. So this is an example. Imagine you have a package simple with an install requirement on pep8 at version 1.2, and you ask pip to install simple and pep8 1.3.
What will be called is add_requirement simple, then add_requirement pep8 1.3, and when it resolves dependencies and tries to add pep8 1.2, that will be completely ignored.
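The rules described so far can be sketched as a toy class. This is not pip's code: ToyRequirementSet, its arguments and the error message are all hypothetical, and version specifiers are kept as opaque strings rather than being compared.

```python
class ToyRequirementSet:
    """Hypothetical stand-in for the requirement set; not pip's code."""

    def __init__(self):
        # name -> (specifier, is_constraint)
        self.requirements = {}

    def add_requirement(self, name, specifier="", constraint=False,
                        parent=None, marker_ok=True):
        """Return True if the requirement ends up in the bag, else False."""
        # 1. Environment markers: skip requirements that do not apply here.
        if not marker_ok:
            return False
        existing = self.requirements.get(name)
        if existing is None:
            self.requirements[name] = (specifier, constraint)
            return True
        # 2. A previous constraint always wins: keep its specifier, but the
        #    package is now really requested, so it is no longer "only" a
        #    constraint.
        if existing[1]:
            self.requirements[name] = (existing[0], False)
            return True
        # 3. A dependency on something already in the bag is ignored.
        if parent is not None:
            return False
        # 4. Two user-supplied requirements for the same name are rejected.
        raise ValueError("Double requirement given: %s" % name)
```

Replaying the example from the talk: adding simple, then pep8 with "==1.3", then pep8 with "==1.2" and parent="simple" leaves pep8 at "==1.3", and the last call returns False.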
So now you have a big bag of requirements, and pip calls a method named prepare_files that will, as it says, prepare the files for the installation. It will first handle all unnamed requirements. Unnamed requirements are editable ones, or when you ask it to install a path or file,
then the named requirements, and after all that, the newly found ones, so the dependencies of the previous requirements. So what does prepare_files do? First it will check whether the package is already installed,
and whether the currently installed version matches the requirement. Then it will try to find a link corresponding to the requirement: if you say pip install Django, pip has to know where the files to install Django are.
So this is done in find_requirement, and I will talk more about this function in the last part. And there is an interesting point here: the wheel caching, since version seven. When it sets the link for the requirement,
it will check whether it has already built a wheel for this link, and if so, it will use it, giving a faster install. Then it will unpack all the requirements it found in a build location,
and compute the metadata, so running setup.py egg_info to get the requirements, and compute the dependencies, possibly with extras. And that's the way it detects new dependencies and new requirements, and puts them back in the requirement set.
So for instance, if you pip install flake8, it will find the three requirements, pep8, pyflakes, and mccabe. It will populate the links for all of them, so calling find_requirement for each of these three packages.
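The discovery loop just described (prepare a requirement, read its metadata, put newly found dependencies back in the bag) can be sketched as a simple work queue. The DEPENDENCIES table is a hypothetical stand-in for what the metadata step would report, and prepare_order is an illustrative name, not a pip function.

```python
from collections import deque

# Hypothetical metadata table standing in for what egg_info would report.
DEPENDENCIES = {
    "flake8": ["pep8", "pyflakes", "mccabe"],
    "pep8": [],
    "pyflakes": [],
    "mccabe": [],
}


def prepare_order(requested, dependencies=DEPENDENCIES):
    """Return the names in the order they would be prepared and unpacked."""
    queue = deque(requested)
    seen = []
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already in the bag: later mentions are ignored
        seen.append(name)
        # Newly discovered dependencies go back into the bag.
        queue.extend(dependencies.get(name, []))
    return seen
```

Calling prepare_order(["flake8"]) walks flake8 first and then its three dependencies, which matches the four requirements that end up in the build location.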
And in the build location, it will unpack the four requirements, so flake8, pep8, pyflakes, and mccabe, with either a setup.py to install with setuptools,
or, if the files were wheels, just a directory with the metadata. So as I've already said, since pip 7, you have a new wheel caching option. That will work given two things. First, you need to have the wheel package
installed in your environment. And then you should not be using the --download or --no-cache-dir options. If you use either of the two, the wheel caching won't happen. And then what it will do
is this: pip has all the requirements unpacked in the build location, and it will try to build a wheel for each of them. Obviously, it will ignore requirements that are already wheels. It will also ignore editable requirements, or links pointing to a version control system.
Since those could be subject to changes. And of course, if you change your files, you don't want to use the last wheel you've already built. So the thing is, these requirements
are prone to changes, and you don't want to build a wheel for something that will change. And of course, you can opt out with the --no-binary option, passing it either the :all: tag or a specific package.
And there are some other special cases, but I won't talk about them. And then, if the wheel building was successful, it will replace the previous link with a file link to the wheel file and unpack it in the build location.
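The exclusions listed above can be summarised as a small predicate. This is a sketch under stated assumptions: should_build_wheel is a hypothetical name, links are plain strings, the project name is guessed from the file name, and the other special cases the speaker skips are not modelled.

```python
def should_build_wheel(link, editable=False, no_binary=()):
    """Hypothetical predicate; `link` is a URL or path given as a string.

    `no_binary` is a collection of project names, or contains ":all:".
    """
    filename = link.rsplit("/", 1)[-1]
    if filename.endswith(".whl"):
        return False  # already a wheel, nothing to build
    if editable:
        return False  # editable checkouts are expected to change
    if link.startswith(("git+", "hg+", "svn+", "bzr+")):
        return False  # version control links are expected to change
    if ":all:" in no_binary:
        return False  # the user opted out for everything
    # Toy guess of the project name: text before the first dash.
    project = filename.split("-")[0].lower()
    return project not in {name.lower() for name in no_binary}
```

For example, a link ending in Django-1.8.3.tar.gz would be eligible, unless no_binary contains ":all:" or "django".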
Otherwise, it just ignores the failure and keeps the previously unpacked sdist. So now pip has all the requirements it needs
in a directory, one for each requirement. And there are basically three possibilities. Either the requirement is a wheel. So it's already unpacked, and wheels are basically the unpacked package
the way it should be installed. So all pip has to do basically is move the directory or the files into the site-packages directory to install it. And there is some complexity with scripts and data that I won't talk about. It will also write a RECORD file
to ease uninstallation of the package. So this is for the wheel files, it's quite simple. Then, yes, okay. And of course for a wheel, you just need pip. You don't need setuptools.
So how does pip install setup.py files? It's using some kind of hack to make sure that you always have setuptools, even if your setup.py only uses distutils.
So this is not exactly the command, but that's the idea. Basically opening setup.py, compiling it and executing it with setuptools available, so that setuptools has already monkey-patched distutils.
And most importantly, pip install calls setup.py install with some options, mainly --record to record the installed files, and --single-version-externally-managed to avoid some setuptools hacks.
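As the speaker says, the slide did not show the exact command either, so the following is only a sketch of the idea: a one-line program that imports setuptools before executing setup.py, followed by the install arguments mentioned above. The names SHIM and install_command are hypothetical, and the function only builds the argument list without running it.

```python
import sys

# One-liner that imports setuptools before executing setup.py, so that a
# distutils-only setup.py still runs with setuptools' patches in place.
SHIM = (
    "import setuptools; __file__={path!r}; "
    "exec(compile(open(__file__).read(), __file__, 'exec'))"
)


def install_command(setup_py, record_file):
    """Build (but do not run) the argument list for a setup.py install."""
    return [
        sys.executable,
        "-c",
        SHIM.format(path=setup_py),
        "install",
        # Write the list of installed files, needed later to uninstall.
        "--record", record_file,
        # Plain install into site-packages, without the egg machinery.
        "--single-version-externally-managed",
    ]
```

The resulting list could be handed to subprocess from the project directory; it is kept as data here so the sketch has no side effects.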
If you only run setup.py install with setuptools, you will end up with an invocation of easy_install that will also try to install all the requirements, which might not be what you want. And if you directly run setup.py install
with distutils, you will end up without any installed-files.txt, meaning that pip won't be able to uninstall the package you just installed. So always prefer pip install. That will run the setup.py of the current directory
and avoid the previous issues. And for editable requirements, it's quite simple. It's the same. It will just run setup.py develop with the option --no-deps. So now you have the whole
bag of requirements, the requirement set. And what pip does is first uninstall the old version, if it needs to. Then it tries to install the new package, and on success, all is good and the install is committed.
But if the install doesn't work, it will roll back the uninstall. But it will only do this for the last installed package, not the whole install command. Meaning that if you have packages A and B, where A version one requires B version one, A version two requires B version two, and so on.
You start with A1 and B1 in your venv. Then when you run pip install A==2, pip will uninstall B1 (well, it might) and install B2 successfully. Then it will uninstall A1,
but unfortunately the install of A2 crashes. So A1 will be restored, but B won't change. So you might end up with A1 and B2. So it's good to know that pip won't roll back the whole install, just the last one. It also tries to respect the provided order
and, since a recent version, will try to install dependencies first. But there is no order guarantee. And if you really need it, it means that, to run the setup.py of your package, you need to have some requirement.
You should be using setup_requires. But of course there are some drawbacks: it is currently owned by setuptools. So pip doesn't know about setup_requires, and when you invoke setup.py,
it's setuptools that finds it and tries to install the requirement with easy_install, and easy_install won't honor the options you gave pip. Hopefully pip might take control of this feature in the future. So now to the biggest part,
which is: when you pip install foobar, how does pip know which file it should be using? Because you might have several PyPI servers, maybe a private PyPI, maybe a wheelhouse, and some other solutions. So pip is exposed to different versions,
formats, or locations. And all this magic happens in find_requirement. So you give it a requirement, and whether you want an upgrade or not, and it will return the file to install. The way it does it is first calling find_all_versions
that will basically find everything you could install on your system. So just for Django, it will find more than a hundred installation candidates, and on my setup with a private PyPI and a wheelhouse, it finds more than 200 possibilities. Then it will filter on the version specifier you might have given,
and with the --pre option, it will remove all the pre-releases. No, it will keep the pre-releases. And finally, it will sort by version, so preferring the latest one. Depending on whether you told it to upgrade or not,
it will choose the latest one or the currently installed one. So the main point is find_all_versions, which has three main sources. Indexes, extra indexes, yes. So they come from the options --index-url,
--extra-index-url or --no-index. You also have find-links, and also dependency links, which are on the way to deprecation, but not really. It's complicated. So indexes expect something like scheme://some-index,
and pip will try to go to /package-name to list all available files. But of course, you can give it all kinds of spellings of the name, and pip won't try all of them because they are basically endless. So with modern indexes, what pip does is
simply go to PyPI, /simple/python-dateutil, and PyPI will redirect to the correct location. So this is with modern indexes. With older and simpler ones, what it will do is go to the given index, /package-name.
It will return a 404, and then pip will simply go to the simple index page, which on PyPI is quite big, and will match all the links against a normalized form. And this is only done on the first index, because you can have multiple indexes, and the first one will give it the correct name.
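The fallback of matching against a normalized form can be sketched as follows. The normalization rule used here is the one from PEP 503 (lower-case, runs of '-', '_' and '.' collapsed to '-'), which is an assumption: the talk does not state the exact rule pip 7.1 applied. match_in_simple_index is a hypothetical helper name.

```python
import re


def normalize(name):
    """Lower-case and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def match_in_simple_index(requested, index_names):
    """Hypothetical fallback: find the index's own spelling of `requested`.

    `index_names` stands for the project names listed on the index page.
    Returns the matching name as the index spells it, or None.
    """
    wanted = normalize(requested)
    for candidate in index_names:
        if normalize(candidate) == wanted:
            return candidate
    return None
```

Whatever spelling the user typed, the lookup ends on the one name the index actually uses, which is why it only needs to be done against the first index.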
So you go from /simple/ to /simple/python-dateutil. So now you have three main sources: index URLs, find-links, and dependency links. They are put in a function called sort_locations that will return two lists,
file locations and URL locations. And what is important is that there is a special case for find-links: if you give it a directory, it will list all the files in it. So this is how a wheelhouse is built.
Just put all the wheel files you want in a directory, and give it through find-links. Through the find-links option, you can also give direct locations, and the same via dependency links. There are a lot of specific cases for URL locations
that correspond to the indexes: pip will check whether it's a secure location, and you can add some with the --trusted-host option. And then all these pages are parsed as HTML,
basically to collect all the links. And there are some special cases for those. I won't talk about this. And all these links are then put through a big filter called package_versions, that will push some links to the end, and select or reject on lots of criteria.
Basically whether your system supports the format; removing some Mac OS files, this is really a special case I found; whether you asked it to reject wheels or sdists for this package. It will also check the compatibility of the wheel files,
try to extract the version from the file name, and reject the link if it can't. It also checks the Python version, which is something I didn't know. And it also checks everything about internal or external files.
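Two of these checks, the supported format and the version extracted from the file name, can be sketched with plain string handling. version_from_filename and the SDIST_EXTENSIONS tuple are hypothetical, and the sketch ignores wheel tag compatibility, the Python version check and the other criteria listed above.

```python
import re

# Assumed list of archive extensions accepted as sdists in this sketch.
SDIST_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tgz", ".zip")


def _normalize(name):
    return re.sub(r"[-_.]+", "-", name).lower()


def version_from_filename(filename, project):
    """Return the version string for `project`, or None to reject the link."""
    if filename.endswith(".whl"):
        # name-version[-build]-pythontag-abitag-platformtag.whl
        parts = filename[:-len(".whl")].split("-")
        if len(parts) < 5:
            return None
        name, version = parts[0], parts[1]
    else:
        for extension in SDIST_EXTENSIONS:
            if filename.endswith(extension):
                stem = filename[:-len(extension)]
                break
        else:
            return None  # unsupported format
        # sdist names look like name-version; the name may contain dashes.
        name, _, version = stem.rpartition("-")
    if not version or _normalize(name) != _normalize(project):
        return None
    return version
```

A link whose file name does not yield a version for the requested project is simply dropped from the candidate list.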
Finally, if you come back here, you see that you have the main sources: files, URLs, find-links, and dependency links. And pip will always prefer file sources, then direct find-links, then URL links, and finally dependency links.
And this is basically what find_all_versions returns. And this is an example for pep8. So you find all the files from the wheelhouse, then all the files from my private PyPI, and then from the normal PyPI. Each entry contains the package, the version, and the direct link.
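The selection steps described earlier (filter on the specifier, drop pre-releases unless asked, sort by version, then honour the upgrade flag) can be sketched over entries of that shape. Everything here is illustrative: Candidate, parse and select are hypothetical names, the version ordering is a toy that only handles dotted numbers with an optional suffix, and the specifier is modelled as a plain callable.

```python
import re
from collections import namedtuple

# Package name, version string, and where the file lives.
Candidate = namedtuple("Candidate", "project version location")

_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(.*)$")


def parse(version):
    """Toy version key: (release tuple, is_final). Needs a leading digit."""
    match = _VERSION.match(version)
    release = tuple(int(part) for part in match.group(1).split("."))
    return release, match.group(2) == ""


def select(candidates, accepts, installed=None, upgrade=False, pre=False):
    """Return the candidate to install, or None to keep what is installed.

    `accepts` is a callable (version string -> bool) standing in for the
    version specifier of the requirement.
    """
    # Without upgrade, an installed version that satisfies the requirement wins.
    if installed is not None and accepts(installed) and not upgrade:
        return None
    usable = [
        candidate for candidate in candidates
        if accepts(candidate.version) and (pre or parse(candidate.version)[1])
    ]
    if not usable:
        raise LookupError("no matching candidate found")
    # Prefer the latest version among what is left.
    return max(usable, key=lambda candidate: parse(candidate.version))
```

The location field is carried along untouched; a fuller sketch would also use it to break ties between sources in the preference order given above.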
Thank you. If you have any questions. Yeah, we do have a few minutes for questions. So, is there any question around? Yes.
Have you looked at Conda at all? And could you speak about the differences, any advantages of using pip versus Conda? No, I haven't checked the internals of Conda. Okay, that's cool.
Could you explain what a wheel is, please? What is a wheel? It's a kind of new package format, but new as in it's already a few years old.
And what it is, basically: if you take a source distribution and you install it in site-packages, and then you take the directory it installed in site-packages and you zip it,
it's basically a wheel. So to install a wheel, you just have to unzip it and put the directory in site-packages. So it provides a much faster install.
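The "zip it, then unzip it" description can be illustrated with the standard zipfile module. This is only a toy: the archive built here is not a valid wheel (a real one also carries WHEEL and RECORD files and a tagged file name), and both function names are hypothetical.

```python
import io
import zipfile


def build_toy_wheel():
    """Return the bytes of a zip archive laid out like an installed package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("simple/__init__.py", "VALUE = 1\n")
        archive.writestr(
            "simple-1.0.dist-info/METADATA", "Name: simple\nVersion: 1.0\n"
        )
    return buffer.getvalue()


def install_toy_wheel(wheel_bytes, site_packages):
    """'Install' by unzipping into the target directory; return the names."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        archive.extractall(site_packages)
        return sorted(archive.namelist())
```

No build step and no setup.py run are involved, which is where the faster install comes from.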
You talked about constraints. Isn't it a bit dangerous to use constraints?
Well, there is no surprise. You won't use constraints by accident. So if you tell pip to install a specific version, it's because you meant it. Yeah, but actually I don't see a real use case where you want to override a dependency version.
I think one use case is that you know you don't want to use the version someone might require. The truth is you might then directly use requirements files,
but I think it's the use case of the guy that added this feature. I think it's Robert Collins. It's for OpenStack, where they have a lot of requirements and they want to pin all of them. So they just put everything in the constraints file.
And every time they call pip install, they just give it the constraints file and they are sure that everyone will have the same versions, because when you give it a constraints file, it won't install everything that is inside. It will only install something if something on the side requires it.
Constraints files are just there to pin a version on a requirement; unlike requirements files, they won't install everything.
Okay, hi. I have a very easy question. Why do the versions of your software jump so fast? Is each release incompatible, or is it just the way the software currently works?
It doesn't. Well, maybe you are referring to the fact that it went from 1.5 to 6. Yes. Well, it's just because they dropped the one, but you really should think of 6 as almost like 1.6. So is it semantic versioning currently, or just?
I don't, well, clearly the path from 1.5 to 6 is not semantic versioning. Okay. I don't think the developers plan to stick
to semantic versioning, I don't know. Okay, thank you. Okay, so it's supposed to be the end of this talk. So what we can do is those who want to go to lunch, you can go now and make as much noise as you.