
Python on Windows is Okay, Actually

Formal Metadata

Title
Python on Windows is Okay, Actually
Title of Series
Number of Parts
132
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Packages that won't install, encodings that don't work, installers that ask too many questions, and having to own a PC are all great reasons to just ignore Windows. Or they would be, if they were true. Despite community perception, more than half of Python usage still happens on Windows, including web development, system administration, and data science, just like on Linux and Mac. And for the most part, Python works the same regardless of what operating system you happen to be using. Still, many library developers will unnecessarily exclude half of their potential audience by not even attempting to be compatible. This session will walk through the things to be aware of when creating cross-platform libraries. From simple things like using pathlib rather than bytestrings, through to all the ways you can get builds and tests running on Windows for free, by the end of this session you will have a checklist of easy tasks for your project that will really enable the whole Python world to benefit from your work.
Transcript: English(auto-generated)
Thank you. Glad to see a pretty good turnout here. In general, talks about Windows don't get a huge amount of interest at a lot of Python conferences. And in fact, Windows tends not to get a lot of attention at Python conferences in general. I want to start off with a few reasons why that shouldn't be the case, but the fact that it is,
it's kind of reflected by the makeup of the laptops you see around most Python conferences. Typically, most people are here with a Mac. There's a decent chunk on Linux, and there's a few people with Windows. I will say, EuroPython has a lot more Windows machines and a lot more PCs than any other Python conference I've been to, and I feel so welcome here, it's so good.
But this is a rough breakdown anecdotally from the conferences I've been to, predominantly Mac. If you start looking at what the speakers are using up on stage, it gets even more clear that this is a Mac-dominated community. And as I say, I designed this for a US audience, where this would totally be true.
In here, I can already see it's not. There's only a few glowing apples facing me right now, which is very pleasant. This is our community, how it looks for the public spaces we're in, when we're meeting with each other, when we're hanging out. But does it reflect any sort of reality? I went and grabbed some actual stats from a number of sources.
Downloads from the Python package index. This is rough and approximate, but if you look at the packages that are interesting to all platforms, you tend to see breakdowns like this. Majority Linux, roughly even Mac and Windows downloads. If you look at packages from Conda, Windows gets a much bigger chunk.
The data science world is more into Windows than PyPI in general. If you look at the PSF survey that was run last year, a full 50% of respondents were using Windows. We're now at the point where half the Python community is on Windows. If you look at users of Visual Studio Code,
again, just over 50% are using it on Windows for Python, and roughly even Mac and Linux. If you go to PyCharm, it's even more of a majority. About 70% of PyCharm users are using it on Windows, and again, smaller on the other two. And finally, this ratio is a little unfair,
but this is the python.org downloads by operating system. People who downloaded Python from the official website, the ratio is off, because very few people should be downloading Python for their Linux distro from that website. Very few people are going to be getting their Mac installers from there when you can get them from Brew. But just to give you a sense of scale,
this is 14 million downloads per month of Python for Windows. That's over 150 million downloads a year of Python for Windows. Even at its most conservative estimate, from the actual data, we have a community that looks like this. But when you look around the conferences, we're seeing this part of it.
We just simply don't have this big half of the community showing up to our conferences, engaging with our open source developments, giving talks, releasing projects, but they are using it. There's a huge, quiet mass of people using Python for work, making a living, feeding their families with Python on Windows
that are simply not visible, that are not present in our community, not visible in a way that other platforms are. We have essentially a divide in our community. Now, I don't want to make this sound like it's all the conferences' fault because I don't believe that's true.
A lot of these people, and I've spoken to many of them, simply would never consider coming to a conference. They would never think that they would be welcomed at a Python conference. It's never even entered their mind that that's a thing that they'd want to do, that there are people doing things that they can engage with, that they can share with, who would want to have them there.
And as a result, we end up with such a biased view of what the Python community looks like from the conferences we do have. I believe a lot of this comes from the code that is out there, the projects that are out there, in particular the opinions that are put out there, and I want to start addressing some of those today.
This session, I'm going to go through five questions, five topics, phrases, questions. For each one of these, I'm going to bring up two ways in which you can be wrong about this, and these are all real opinions that I've heard from people that have been stated publicly. I haven't had to make anything up here. And many of these you'll look at and go,
well, that's obviously wrong, and if so, I'm glad that that's your reaction to it. If you're a Windows user, you'll look at it and go, that's obviously wrong. But the problem is there are people out there who don't recognize this and aren't acknowledging it, so I'm not calling you out, but I want to bring these up so that I can address them with one simple way to improve the situation.
For each of these topics, I'm going to give you one simple thing you can do with your code, your library, your projects, your tools, that will make that code better, not just for Windows users, but for users on any platform. All of these should improve your code generally, and at the end there's one simple checklist, so have your cameras ready for that one, of the things that you can just go off today,
check, possibly change about your project or how it's presented, that will be more inclusive, more welcoming to the full Python community that we have available to us. Everyone ready? Okay.
First up, how do I run Python? You see the initial instructions on a lot of people's pages. Pip install this, and then type this command, and now you can run the code. The first assumption a lot of people make is that everything is going to be on path. Now, what this means is if you're in a terminal,
you can simply type the name, and it will automatically locate either a user-specified copy of the tool or a machine-wide site one or a system one or some model like this to locate and figure out what to run. So say you've just installed black. You'll type black, and that will run that tool that was just installed,
because it can look it up on the path, and it knows which one to run. Sounds great. Unfortunately, on Windows, this is not how the path is designed to work. It's often how it's abused to work, but it's not how it's meant to work, and so it doesn't always work this way. Path on Windows starts with the system files.
The PATH variable is how the system locates system libraries. You can come along and add your app to this, and now your app can be found automatically, but know that it's not looking in, like, a user bin directory. It's not looking in a specific place where there are only files to be found on PATH. Your entire app is now available there,
and because your app can be there, it means their app can be there, and someone else is going to come along with some other app and declare that they're the most important app in the world, and they need to be first, and guess how easy it is to find your system files now, and this is when everything starts going wrong. Modifying the path on Windows is really just rolling the dice
to figure out what you're actually going to get. Next assumption: typing python3 is going to launch Python. This is not true. On Windows, the executable is called python.exe. The .exe is optional, but there's no 3 in this name. There's no 3.7. There's no python2.
There's just one Python executable that's the same for all versions of Python on Windows. Now, arguably, this is not the right way to do it, but it's the way it is currently done, and you can pretend that it should be some other way, but that doesn't actually help anyone because it's not true.
What is true is that Python on Windows also comes with the py launcher. This is a slightly different tool. It's py.exe, but you can run it with py. This tool automatically finds the latest install of Python on someone's machine and will launch it, so it's the equivalent of typing python3 on a Linux system,
which is hopefully going to be the latest version of Python that you want to run. The downside of this for launching a tool, say, black, is that black has no way of saying, I need to be run with py.exe. You have to say, using py, launch black,
which brings me to my one simple suggestion for these problems: the python -m option. Typically, you'll type python followed by the name of a script, something.py, or the full path to a script, and it will go and launch that file. The -m option lets you type the name of a module.
So just as if you started Python and imported that module, this command will do the same thing. It will automatically import that module, find if it's runnable, and run it. And so my recommendation is, make sure your module can be run in that way. Don't rely on someone installing black and then typing black. Tell them that they can use py -m black
and make sure that that works. This actually works better on all systems. The main reason you have to activate virtual environments is to make this work automatically. But if you don't do that, then the full path is going to work just fine. Whatever version of Python you launch, using -m will run in that version of Python.
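As a rough sketch of the same idea from inside Python code: when one tool needs to launch another, it can build the command from the running interpreter's full path plus -m, rather than relying on a bare command name being found on PATH. The helper names module_command and run_module are made up for this example, and json.tool is used only because it is a standard-library module that supports -m.

```python
import subprocess
import sys


def module_command(module, *args):
    """Build a command that runs `module` with the current interpreter."""
    # sys.executable is the full path of the running Python, so this does
    # not depend on PATH or on an activated virtual environment.
    return [sys.executable, "-m", module, *args]


def run_module(module, *args, input_text=None):
    """Run `python -m module args...` and capture its output as text."""
    return subprocess.run(
        module_command(module, *args),
        input=input_text,
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # json.tool is a standard-library module that can be run with -m.
    result = run_module("json.tool", input_text='{"tool": "black"}')
    print(result.stdout)
```

For your own project, the equivalent is to make sure the package has a `__main__.py` (or the module has an `if __name__ == "__main__":` block) so that -m has something to run.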
Pip actually recommends always using -m pip on all platforms to make sure that you install the packages into the version of Python that you are going to launch. Because as soon as your path gets messed up, you can pip install in the wrong place, and then you go to import it and it's all broken. So it works everywhere. But please document it. You've got a little section in your README
that says this is how you run this project. Add a line: python -m, name of the project. That's all it takes, and suddenly you've enabled a big group of people to use your project on their system. Handling paths.
Hopefully you think this is one of these silly assumptions. Of course I know that. Of course everyone uses forward slash. No. Here's a sample path on Windows. Those are backslashes, not forward slashes. Which means if you come along and do something like this,
it splits it into exactly one element because there's no forward slashes in this string at all. That is a bad idea. Now, Windows gets a little bit interesting here because if you give it a path with forward slashes instead of backslashes, it'll interpret it just fine. It'll automatically convert the forward slashes to backslashes,
locate your file just perfectly all right. But if you're getting anything from a user, if you're getting anything from os.scandir or the glob module or the command line, it's going to come with backslashes. And if your code assumes that it's going to have forward slashes, you're going to be broken on a number of systems.
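The slide with the sample path is not in the transcript, so the path below is a made-up stand-in. This sketch shows the failure described above: splitting a Windows path on forward slash yields a single element. It also shows that pathlib's PureWindowsPath, which can be used on any operating system, accepts either separator.

```python
from pathlib import PureWindowsPath

# Hypothetical sample path, standing in for the one on the slide.
raw = r"C:\Users\example\project\data.txt"

# Naive string handling: there is no "/" in the string, so nothing is split.
naive_parts = raw.split("/")

# pathlib understands Windows separators, even when run on Linux or macOS.
real_parts = PureWindowsPath(raw).parts

# Forward slashes are accepted too and compare equal to the backslash form.
same_path = PureWindowsPath("C:/Users/example/project/data.txt")

if __name__ == "__main__":
    print(naive_parts)
    print(real_parts)
    print(same_path == PureWindowsPath(raw))
```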
The next common assumption that people make with path handling is that they know how to do it. So on POSIX systems, this is the root directory. On Windows systems, this is the root directory.
It needs to be handled differently. You can't assume a single slash at the start is the root directory. On Windows, this is also the root directory. So now splitting on backslash doesn't help either. This is not a root directory. This is not a directory at all. Feel like you know how to handle paths yourself?
There's a couple of edge cases. Okay, who knew that this is a root directory? Three, four. I didn't until I started doing research and someone pointed out, hey, guess what? There's another way to refer to the root directory and it involves a GUID. And now all of your code for handling C:\ is broken.
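The specific paths shown on the slides are not in the transcript, so the ones below are illustrative guesses at the kinds of cases being described: a POSIX root, a drive root, a network share, and a drive-relative path. The sketch lets pathlib work out the "anchor" (drive plus root) instead of looking for a leading slash by hand; anchor_of is a hypothetical helper name.

```python
from pathlib import PurePosixPath, PureWindowsPath


def anchor_of(raw, windows=True):
    """Return the drive-plus-root prefix that pathlib finds in `raw`."""
    path_type = PureWindowsPath if windows else PurePosixPath
    return path_type(raw).anchor


if __name__ == "__main__":
    # All example paths here are hypothetical.
    print(anchor_of("/usr/lib", windows=False))        # POSIX root
    print(anchor_of(r"C:\Users\example"))              # drive root
    print(anchor_of(r"\\server\share\docs\a.txt"))     # network share root
    print(anchor_of("C:notes.txt"))                    # drive, but no root
```

The last case has a drive but no root: it is relative to the current directory on drive C, which is easy to get wrong with string handling.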
So one simple idea. Don't handle paths. Don't manipulate them like strings. Use pathlib. This has been part of the Python standard library since 3.4. There's a backport, so you can install the pathlib module
for older versions if you need to. And it comes with an object-oriented model for paths. They're essentially still strings underneath, but you get objects to manipulate them with. To add more items to it, use forward slash on all platforms: it's the divide operator, and with a string it will join segments to the path.
There are properties to reduce it, to take pieces off the end, properties for the various parts of it, names, suffixes, functions to change those, so you don't have to split strings or anything. There are file system operations such as glob straight on it, which will give you new path objects from that directory, and there are comparisons.
Basic comparisons are handled. They're one step better than comparing strings. They're deliberately not at the level of comparing, like, inodes or anything. They're not going to tell you this is exactly the same file, though there are other functions for doing that more reliably.
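A minimal sketch of the pathlib operations just described: joining with the divide operator, reading and changing pieces of a path, globbing, and comparison. The directory and file names are invented for the example, and PureWindowsPath is used so the Windows behaviour can be seen on any platform.

```python
import tempfile
from pathlib import Path, PureWindowsPath


def settings_path(base, project):
    """Join segments with the divide operator instead of string concatenation."""
    return base / project / "settings.toml"


def backup_name(path):
    """Swap the suffix without splitting strings by hand."""
    return path.with_suffix(".bak").name


def python_files(directory):
    """Glob directly on a path; each result is a new Path object."""
    return sorted(Path(directory).glob("*.py"))


def demo_glob():
    """Create two files in a temporary directory and glob for the .py one."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.py").write_text("", encoding="utf-8")
        (Path(tmp) / "b.txt").write_text("", encoding="utf-8")
        return [p.name for p in python_files(tmp)]


# Hypothetical base directory and project name.
example = settings_path(PureWindowsPath(r"C:\Users\example"), "project")

if __name__ == "__main__":
    print(example, example.parent, example.name, example.suffix)
    print(backup_name(example))
    print(demo_glob())
    # Windows path comparison ignores case and separator style.
    print(PureWindowsPath("C:/Users/Example") == PureWindowsPath(r"c:\users\example"))
```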
Settings. If you've got a command line tool or something that isn't a straight Python API, you've probably got some big mix of command line arguments and environment variables and hopefully a configuration file where people can put settings so that they're always used, like proxy settings, username, password, whatever. Where do you keep these?
One assumption that I've seen made is that tilde (~) is the home directory, and on Windows, this is not the case. Tilde on Windows is simply a character that you can use anywhere in a path that you like, and so if you try and navigate to the tilde directory and someone actually has one called that, then you've just gone into some other directory.
If not, it's probably not there. It certainly doesn't resolve to any of the specific user folders. One way to find those, and I hesitate to say this. I'll get to the real recommendation in a bit. One way to find that is to look at the USERPROFILE environment variable, which will tell you the root of the user's profile directory, which leads to the second assumption.
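As a small illustration of the point about tilde, here is a sketch using pathlib. To a path object, "~" is just an ordinary name; it only becomes the home directory when Python is asked to expand it explicitly. The helper names and the settings.ini file name are hypothetical.

```python
from pathlib import Path, PureWindowsPath


def literal_parts(raw):
    """Show how a Windows path containing ~ is split: ~ is a normal name."""
    return PureWindowsPath(raw).parts


def expand(raw):
    """Explicitly ask Python to replace a leading ~ with the home directory."""
    return Path(raw).expanduser()


def home_directory():
    """Let Python locate the home directory for the current platform."""
    return Path.home()


if __name__ == "__main__":
    print(literal_parts(r"~\settings.ini"))
    print(expand("~"))
    print(home_directory())
```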
The home directory is the best directory for keeping configuration settings. I personally don't believe this is the case on any platform, though I know people like to argue about that, and certainly other people know POSIX better than I do, so I'm willing to listen to them in that case, but certainly on Windows,
if you find the user's profile directory, that's not a good place to put anything. It comes with a whole set of subdirectories. There's a group (Documents, Pictures, Videos) that are user-visible. The user can see these, can put whatever they want in there, can modify things that are in there, and then there's a set of hidden directories, of which the AppData folder is the most common,
which then breaks down into slightly different semantics. The local one is guaranteed to stay only on that current machine. The roaming one might automatically sync to another machine that the user signs into, and so you have all these different places where you may want to put configuration. You may want to put settings. If the user should see it and edit it, maybe Documents is the right place.
If it should automatically go to every machine that user ever uses, perhaps roaming is the right place. If it's a local cache, maybe AppData\Local is the right place. But the easy way to deal with all this is to get the appdirs module. It's a single-file package. I believe it's MIT licensed,
so most people should have no problem dropping in a simple file if that's what you want. But it comes with functions that work across multiple platforms to provide these directories. So in your code, you can simply say, appdirs, give me the user data dir. Here's the name of my app. Here's my author. It will make sure it doesn't collide with other people's apps or your own apps.
And it gives you some somewhat opinionated directories, but they all end up in generally good spots for configuration, caches, logs, and per-machine locations as well if that's where you need to look or modify.
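To give a feel for what such a helper does, here is a simplified, standard-library-only sketch. It is not the appdirs implementation. The function name user_config_dir, its parameters, and the exact directory choices are assumptions made for illustration; a real project would more likely call the library than maintain this by hand.

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath


def user_config_dir(app_name, author, platform=None, env=None, roaming=False):
    """Pick a per-user configuration directory for the given platform.

    `platform` and `env` default to the running system; they are parameters
    only so the behaviour can be demonstrated for other platforms.
    """
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env

    if platform == "win32":
        # Roaming data may sync to other machines; local data stays put.
        base = env["APPDATA"] if roaming else env["LOCALAPPDATA"]
        return PureWindowsPath(base) / author / app_name

    home = PurePosixPath(env["HOME"])
    if platform == "darwin":
        return home / "Library" / "Application Support" / app_name

    # Other POSIX systems: honour XDG_CONFIG_HOME when it is set.
    base = env.get("XDG_CONFIG_HOME")
    return (PurePosixPath(base) if base else home / ".config") / app_name


if __name__ == "__main__":
    # "MyApp" and "MyCompany" are placeholder names.
    print(user_config_dir("MyApp", "MyCompany"))
```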
Text encodings, always a lot of fun. First assumption, UTF-8 is always the correct encoding. Okay, let's start with a brief history lesson. Who knows what happened in 1985?
Windows 1.0 was released. Who knows what happened in 1991? Unicode 1.0 was released. So for six years, Windows was shipping with international support without Unicode even existing. So there's no way that UTF-8 was the right encoding for that.
Now, Windows is really good at backwards compatibility. You might have noticed that you can build or install applications for early versions of Windows, and they'll continue to work, which means that all of the support that was there in 1985, 86, 87, 88, 89, 90 is still there.
You may be familiar with these as the A and W APIs. The A APIs are a whole lot of the Windows APIs that end in a capital A. They take char* string arguments. They use what's called the system code page. So a char in C has 256 possible values.
There's more letters than that in the world. We know that. How do you map between them? The code page is a system-wide setting that says use this particular table to map these characters into what they really mean. The W APIs are the new ones, the replacements. They use 16-bit characters. The encoding is always UTF-16.
Natively, Windows uses UTF-16 internally, without surrogate pairs, which we can chat about later if you're interested. So in fact, for Windows, UTF-8 is never correct. You've always got to convert it, generally to UTF-16, for the operating system to be able to use it. Which comes to the next assumption.
There's a lot of negatives on this, so let me go through and explain. We know that bytes are not text. It's why we have Python 3. It's why we have two different types. We have a bytes type and a str type.
But there is an assumption that I have seen come up from time to time, and in fact, I had to write a PEP and make changes to help deal with this, where if you have something that's stored in bytes and you pass it to the operating system and then get it back, it will come back just as bytes. Bytes in, bytes out, and never be corrupted
because it's just some blob of data. Now, in POSIX, this works for things like file system paths, which are arguably text because they get shown to a user, but if you have a path as a blob of bytes and pass that into the OS and get it back, then it'll be just fine because the OS ignores it and says it's just a blob of bytes.
Not true on Windows. Because as I mentioned, the native encoding is UTF-16, which means if you pass a blob of bytes to something that the OS thinks is text, such as a file system path, it's going to convert it to UTF-16. It will do that with your code page, so you've got 256 possibilities for what the letters will turn into.
Then when you get it back, it will go back to your code page, and if there's anything that doesn't encode properly, you'll get question marks. So the one simple idea is to simply use strings. Python has built in all the ability to do the OS conversions for all platforms.
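The question-mark effect can be imitated with Python's own codecs. This is only a sketch of the idea, not a call into the Windows API (whose conversion rules may differ in detail), and cp1252 is chosen here simply as an example code page.

```python
def round_trip_through_code_page(text, code_page="cp1252"):
    """Encode text to a single-byte code page and decode it again.

    Characters the code page cannot represent are replaced with "?",
    which is the kind of loss described above.
    """
    encoded = text.encode(code_page, errors="replace")
    return encoded.decode(code_page)


if __name__ == "__main__":
    print(round_trip_through_code_page("caf\u00e9.txt"))            # survives
    print(round_trip_through_code_page("\u65e5\u672c\u8a9e.txt"))   # is lost
```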
So if you're even concerned about invalid encodings in POSIX file system paths, Python can handle that. It can encode them into strings in a way that will be preserved through string processing, and then when you pass it back out, it will convert them back correctly, and you won't lose those invalid encoded characters.
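The mechanism that preserves invalid bytes is the surrogateescape error handler, which, as far as I know, is what os.fsdecode and os.fsencode use on POSIX systems. The sketch below applies it explicitly with UTF-8 so it behaves the same on every platform; the file name is made up.

```python
def decode_path_bytes(raw):
    """Turn path bytes into a str, smuggling undecodable bytes through."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_path_text(text):
    """Turn the str back into exactly the bytes it came from."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    raw = b"report-\xff.txt"          # \xff is never valid in UTF-8
    text = decode_path_bytes(raw)     # still an ordinary str
    print(text.upper().lower() == text)
    print(encode_path_text(text) == raw)
```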
For Windows, obviously, it's Unicode to Unicode, back to Unicode. It does have to do some transcoding, but it's a reliable mapping, and so letting Python do the OS conversions is generally the safest thing to do for all platforms. For I/O stream conversions, anytime you open a file
and rather than reading bytes, you want to read text, you need to do your own conversion. This one's probably the hardest recommendation of the day because choosing the encoding to use for reading and writing user data can get really messy really quick, so this is why I've quoted part of the Zen of Python up here. In the face of ambiguity, refuse the temptation to guess.
You will still be tempted to guess. I don't blame you. Do what you need to do. When you have the opportunity to say, this file must be UTF-8, do that. If you don't know what encoding the file you've been given is in, I would suggest assuming it's UTF-8, and if it fails, complain to the user.
Tell them it needs to be UTF-8. Let them change it, or let them tell you what encoding it should be. But ultimately, and as part of my work preparing PEPs 528 and 529, which deal with encoding, we decided there was no safe way to ever make a change to how Python does stream encoding by default.
The only safe way to approach it is for you as the application developer to know what encoding it should be because you have some extra information or you can force the data provider to provide the right thing, or as a library developer, let the user specify, or let the caller specify what encoding it should be.
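One way that advice might look in a library is sketched below: default to UTF-8, let the caller override the encoding, and fail with a clear message instead of guessing. The function names and the wording of the error are assumptions for this example.

```python
from pathlib import Path


def decode_user_bytes(data, encoding="utf-8", source="<input>"):
    """Decode bytes with a caller-chosen encoding, defaulting to UTF-8."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Refuse to guess: tell the user what was expected and how to fix it.
        raise ValueError(
            f"{source} is not valid {encoding}; save it as {encoding} "
            "or pass the encoding it was written in"
        ) from exc


def read_user_text(path, encoding="utf-8"):
    """Read a user-supplied file as text without relying on the OS default.

    Note: unlike open() in text mode, this does not translate newlines.
    """
    return decode_user_bytes(Path(path).read_bytes(), encoding, source=str(path))
```

A caller who knows the file came from a legacy system could then write read_user_text(path, encoding="cp1252"), while everyone else gets UTF-8 and an explicit error when that is wrong.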
But please refuse the temptation to guess. I think we all care about this one, right? How do I make my code work? A lot of people are very concerned that nobody can install packages on Windows,
and this is simply not true. Most Python packages are pure Python code. There is absolutely no reason they can't be installed on Windows because that's simply extracting a zip file. That works totally fine all of the time. There is some code that includes C code.
For those, you need a compiler. Now, if you do want people on Windows to be able to install it, the two options are tell them they need a compiler, and the easiest way to get the compiler is to install Visual Studio Community Edition, which is totally free and includes a single checkbox
that gives you the right compilers for Python. Or the other option is to build a wheel. So wheels are a different form of packaging that can go on the Python package index. They are precompiled, which means you will compile the package for Windows 32 or 64-bit and publish it already built.
Users can simply download and extract that. I mentioned Windows compatibility earlier. It works totally fine. You can build them on the latest Windows 10. It will largely work all the way back to as far back as Python supports, and it will continue working in the future, so the compatibility is not the concern there. The concern is generally not being able to build on Windows at all
or not having your own machine for it. But I'll point out that in that case, the problem isn't that nobody else can install on Windows. It's a personal problem if you don't have the ability to install on Windows. Similarly, people refuse to do it
because they claim they can't test on Windows, and that's just absolutely false. This idea comes up far too often. I'm trying not to get too upset about it right now, but it is something that people say, and frankly, I don't think there's any good excuse for this.
More than half the Python community is running Python just fine on Windows. If you want to engage and include that community, you might have to step outside of your shell and do a little bit of work. That might involve getting a virtual machine from somewhere, but you can do that.
This is not outside of the bounds of your ability as a developer. So you get two ideas for the price of one here. Either get continuous integration set up or collaborate. There are many, many free continuous integration providers.
You don't want to install Windows on your home machine. That's fine. There are plenty of people that will let you use theirs. Automatically, you can set it up. Visual Studio Team Services will give you free Windows, Mac, and Linux builds all in the one system. You can build all your wheels in one go and publish them all from there, and it has a free tier that will more than get you started.
AppVeyor is also available. You tend to see that show up in combination with other CI providers that will do the other platforms. But these are free. There's no excuse for not having access to this stuff because everyone is dying to give it away to you. If you are interested in Visual Studio Team Services,
I can talk to you about that for days, so come and find me afterwards. But the other option is to collaborate, is to actually get on your GitHub page and say, I'm sorry, we don't have great Windows support right now. If anyone can help me occasionally test the library, which is download the sources, run the tests, and let me know whether there's a problem or not,
that would be very much appreciated. If there's someone who at release time can install this, can have the compilers, can produce a wheel that I can upload and publish, I'm willing to accept your help. It's not a big step. We have a large community of people who are willing to do this, but are largely not going to walk into your project
and demand the right to do it. You have to invite them because it is your project, and if you care about it, it's very easy to invite people to come and help you out. So, where to from here? Here's the checklist that I offered earlier. If you're releasing a project, ask yourself,
does -m work on it? If it's a tool, can people use python -m to start running my code? Do I manipulate paths by hand? Am I using string operations, regex, on paths? Do you put your configuration in weird places?
Do you force people to navigate into places they may never have looked at on their machine to change settings? Do you keep your text in strings, or do you just keep it floating around as bytes and assume that all the encoding is going to work out on its own? And do you have continuous integration for your project or collaborators who are helping you out,
even on a part-time basis? Python's a great community, and it's really, really sad to have a split between so many of the people that make it such a great time.
I would love to see even just these little things, changing projects that people are releasing, making it easier for people to use and share the projects that make Python great, and to engage and to fix that breach in the community, bring more people together, bring more people to these conferences,
and overall make Python even more of the greatest community on earth. Thank you. I'll take questions out in the hallway afterwards.