
Python 2 is dead! Drag your old code into the modern age


Formal Metadata

Title
Python 2 is dead! Drag your old code into the modern age
Number of Parts
132
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
The clock is ticking on Python 2.7, with support to be dropped in January 2020. With major dependencies such as Django, NumPy and pandas moving to Python 3 only, the time has come for even big established codebases to consider upgrading. Many organisations are still postponing for various reasons; we will attempt to demonstrate that with a bit of planning and perseverance, and the assistance of some handy tools, we can embrace the future! This session will provide a first-hand perspective on how we upgraded a large (~65,000 lines of Python code) 8-year-old Django project with multiple external dependencies from Python 2.7 to Python 3.6. We will briefly discuss the benefits of upgrading to Python 3, and architectural considerations. The session will primarily focus on the practicalities of upgrading the code itself. We will not try to provide a single “best” solution for upgrading to Python 3, but rather will introduce some of the available tools, provide an insight into how we used them, and their advantages and disadvantages from our experience. We will discuss preparatory steps and approaches, strategies for dealing with external dependencies, and “gotchas” that we encountered during the process. The aim of this session is to provide an example of how a Python 3 upgrade on an established commercial product can be successfully completed, and to furnish audience members with a set of tools and strategies to help them with their own projects. Prerequisites: basic knowledge of Python.
Transcript: English (auto-generated)
Hello, everyone. Welcome. Can you hear me? Okay. Awesome. So, a little bit about me. Just a little bit. I'm Rebquok on GitHub, Twitter and most other places. I'm a software
developer at Ecometrica. We're here in Edinburgh. We are hiring, but only in Montreal at the moment, so if you're interested, come and talk to me later. I'm also an ex-psychologist. I haven't spoken at a conference since my academic days, which were a long time ago.
So bear with me if I'm a little bit rusty. So this talk is about how we as an individual company went about upgrading a large established Python 2.7 code base to Python 3.6. I'll
talk a bit about our experience and our general approach and some of the gotchas and pitfalls we encountered along the way. This was only our approach. It's certainly not the only way to do it. I'm by no means telling you that it's the right way to go about it. And I'm not going to attempt to give you a single best way of approaching
the problem. Instead, I just hope to give you a useful case study and share with you some of the lessons that we learnt. So why did we, and why should you, even want to upgrade to Python 3? Well, the main reason obviously is that the deadline for dropping
Python 2.7 support is approaching pretty fast: January 2020, so less than a year and a half away now. Major projects are also either dropping or planning to drop support
for Python 2, including Django, NumPy, SciPy and pandas, and there are lots of others. So in the not too distant future, you're going to be stuck with old versions that only get bug fixes, if you're lucky, and no new features. So in terms of motivation to upgrade, that's the stick. What's the carrot? Why should you
want to embrace Python 3 rather than just grumbling about how annoying it is that you're being pushed into it? That could fill, and has filled, several other talks, so I'm just going to touch on a few highlights here. I've got some references at the end of
the slides that go into more detail if you're interested in that. So first, there's that Unicode thing. Python 3 gets rid of the overloaded string type where objects can represent either textual or binary data. This link here is a nice
description of how it came about, and of why Python 3 largely exists to fix that. In Python 3, str is always a text string, Unicode by default, and binary data has its own bytes type. There's also better iteration. So in Python 2, you have a lot of pairs of functions that do the same
thing except that one's eager and one's lazy. Python 3 eliminates the separate lazy versions and instead makes the remaining functions lazy, so things like range(), zip(), map() and dict.items() no longer build lists. Iterating over them works exactly the same way, but it no longer creates an
intermediate list, which makes it harder to write code that accidentally uses up lots of memory. We also have some restrictions on comparisons: you now can't do nonsense comparisons between different types. Incidentally, 'foo' is
greater than 4 according to Python 2. We also get extended unpacking. I won't go into detail on this, but in Python 3, you get the nice star notation for unpacking both iterables and dictionaries, the latter with a double star. That one I came across
quite recently, but it's a nice way of making new dictionaries from existing ones. We get the option of keyword-only arguments in Python 3. So this is a function with two positional arguments and one keyword argument. This is how we would
do it in Python 2. Any of these three ways of calling it would be valid, but they might not do what you expected them to. You can use the same definition in Python 3, but you can optionally add a bare star argument, and
that means that the keyword argument that follows it has to be passed by name. So now, only that first way of calling it is valid. So when you use keyword-only arguments, you can avoid accidentally passing too many positional arguments to a function and then having them misinterpreted as the
keyword argument. F-strings are awesome; they're totally a reason to go all the way to Python 3.6, and they're the reason that we did. As well as simple variable substitution, they can contain arbitrary Python expressions,
including method and function calls. They're more readable, more concise and less prone to error, and they're also faster than other ways of formatting strings. And then there's asyncio, which is the new concurrency
module that was introduced in Python 3.4. I'm not going to say much more about it, because I don't know a lot more than that, but I'm told it's very cool. So, a little bit about the project that we're dealing with. Ecometrica's mapping platform is a big Django project that does some cool stuff with GIS data, and some of
my colleagues will tell you more about that if you would like to know. It works with GDAL and other underlying GIS libraries to import and transform mapping data sets, display areas of interest on an interactive map interface, and run user-defined queries
across multiple data layers. It's about eight years old, it consists of around 70,000 lines of Python code in the core project, and it has a bunch of dependencies, including some of the typically temperamental GIS ones. This is what it looks like.
We're looking at national parks here in the Amazon, highlighted in purple. We can upload and display layers like this one, which shows land cover in 2010. We can explore individual areas of interest and have a look at the results of some user-defined
questions based on information from raster data sets, like carbon density or biomass within an area, or how land use has changed over time, and there are a number of ways that those can be displayed. So there are a bunch of useful tools that are
out there to help you with your Python 3 upgrade. I'll take a look at a few of them, but this isn't by any means an exhaustive list. There's lots of help out there; these are some of the ones that we used. So first up, you want to
upgrade your project; that's all well and good, but what about all your dependencies? Will they still work when you upgrade? There's a quick way to do a first check, and that's the caniusepython3 package, which does what it says on the tin. You pip install it, and then you just
run it on your dependencies from the command line in a variety of different ways. caniusepython3 relies on projects being classified on PyPI as supporting at least one version of Python 3, so it's not perfect: it depends on the project declaring that it's Python 3
compatible; otherwise it won't find it. Next, there's a tool called 2to3, which I think most people will have heard about. It's usually installed with the Python interpreter as a script, and it reads Python 2 code and
applies a series of fixes to transform it into valid Python 3 code. So here is an example of a little Python 2.7 program that just takes some input from the command line and says hello to you: welcome to your Python of whatever year you want. We just run it from
the command line with a list of files or directories to transform, so 2to3 welcome.py, and 2to3 outputs a diff of the fixes that it's going to make for Python 3. You can see here it's identified the print
statements and raw_input, which changes to input in Python 3, and it also picked up the change to the exception syntax. So that's a useful first start
for us. Linting can also help you. Pylint has a --py3k flag, which will highlight Python 3 incompatible code. When you run it on our little example program, it prints out a list of identified Python 3
issues. Note that it identified this one, which 2to3 didn't, so neither of them is perfect, and you still need to review your code, but they can help. So just
briefly, a side note about supporting Python 2 and 3. Our main project, the mapping project, is end-user software. It doesn't need to support external developers, and we were happy to go Python 3 all the way with it, but we did have external dependencies,
and those need to continue to support both Python 2 and 3, and there are tools to help you with that too. future and six are libraries that provide utilities for writing Python 2 and 3 compatible code. Modernize is built on top of 2to3. It's used
in a very similar way to 2to3, but it's more conservative: it uses six to try to fix up code to be both Python 2 and 3 compatible rather than just changing it all to Python 3. tox is helpful to let
you run your tests with specific environments, so you can make sure that your tests are going to run under every version of Python that you plan to support. And the Python docs and Django docs also have a lot of useful information on porting to Python 3 but still maintaining compatibility with
Python 2. So, on to what we actually did. First things first, we needed Python 3 on our system. We were on Ubuntu 16.04, which ships with Python
3.5, but we wanted Python 3.6 because of f-strings. So there was a little bit more setup involved, but only a little bit. We had to install these additional packages, but that was pretty much it.
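Since f-strings were the reason for targeting 3.6, here is a small illustrative comparison of the same message built with %-formatting, str.format() and an f-string. The `greeting` function is hypothetical, not code from the project; only the f-string line needs Python 3.6+, and it evaluates expressions such as the method call inline:

```python
def greeting(name, version):
    """Build the same message three ways; `version` is a (major, minor) tuple."""
    # %-formatting, the usual Python 2 style: positional placeholders
    percent = "Hello, %s! Welcome to Python %d.%d." % (name.title(), version[0], version[1])
    # str.format(): named fields, but the values still sit apart from the text
    fmt = "Hello, {n}! Welcome to Python {v[0]}.{v[1]}.".format(n=name.title(), v=version)
    # f-string (Python 3.6+): the expressions, including the method call, are inline
    fstr = f"Hello, {name.title()}! Welcome to Python {version[0]}.{version[1]}."
    return percent, fmt, fstr


print(greeting("ada", (3, 6))[2])  # Hello, Ada! Welcome to Python 3.6.
```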
And other than that, we were using virtualenv with virtualenvwrapper, so we just specify our Python 3 version when we create the virtualenv. There really wasn't much else that we had to change in our deployment process. That pretty much covered it. So the first thing that
we really needed to do in terms of upgrading the code itself was some research. You need to learn about the main differences between Python 2 and 3. The Unicode issue is the one that everyone knows about, but there are lots of others, and it's worth reading up on them before you start. python3porting.com
has a free online book with guidance on porting to Python 3 and a pretty comprehensive description of the differences. And the python-future project's cheat sheet, which is here, is
also a useful reference. Next, we had a look at the project's test coverage. We were going to use our unit tests as a tool to help figure out whether our upgraded code was working, so it was important to have decent test coverage
before we started. It could have been better, but it was respectable, so we didn't spend a lot of time improving the test coverage specifically for this process. So, dependencies. This is the one that tends to put people off
upgrading. We want to use Python 3, but we rely on external dependencies, and dependencies X, Y and Z don't support Python 3 yet, so we just give up until they do. For quite a while, we periodically checked the blocking requirements and just put things off until the
list looked better. But by the end of last year, our list of pending dependencies was looking fairly manageable, as more and more packages were supporting Python 3. This is the result when we ran caniusepython3. It
doesn't look so good. We've got 11 projects blocking our Python 3 upgrade. We had about 75 in total in our requirements file, so it could have been worse, but still. Things start to look up a bit, though, when we look at the list in more detail. Three of these are
things that we didn't use anymore, so we just took them out. Another four showed up because they don't have Python versions identified in their classifiers on
PyPI, so they weren't correctly identified by caniusepython3, but they all had Python 3 compatible versions that we could upgrade to. There was Python scrubber. We took that out, too. That wasn't compatible,
but it also wasn't a very active package, it was a bit out of date, and there was another, more up-to-date package that did the same thing and was compatible, so we replaced it with that. Then django-migration-testcase: that one also wasn't compatible at the time, but
helpfully someone had already made a PR to upgrade it, so we used that. Then we had yas3fs, a package for syncing your local files with S3. That one did cause some issues. It looked good to start with.
There was a pull request adding Python 3 support that had been merged, but it turned out to address only a few fixes, so we added the remaining Python 3 support to that package ourselves. And then the last one on here, django-hashedfilenamestorage. That's a library that we maintain at Ecometrica, so
our bad for not upgrading it sooner. The same went for a couple of other dependencies that we install from private repos, so we upgraded those ourselves, maintaining Python 2 compatibility for other users. We added CI and tox to make sure that we're testing under multiple Python versions and keeping our compatibility.
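For libraries that have to keep working on Python 2, a common pattern is to opt into Python 3 behaviour with `__future__` imports and branch on the interpreter version only where the language genuinely differs. A minimal sketch using only the standard library (six and future provide ready-made helpers of this kind); `to_text` and `average` are hypothetical names, not from the libraries mentioned above:

```python
from __future__ import absolute_import, division, print_function

import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821  (this name only exists on Python 2)
else:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return `value` as text (unicode on 2, str on 3), decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def average(values):
    # With `division` imported, / is true division on both Python 2 and 3
    return sum(values) / len(values)
```

Running the same test suite under both interpreters, which is what tox automates, is what actually demonstrates compatibility; the sketch only shows the shape of the code.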
So in the end, we whittled this list down to only a few that we really needed to put any significant effort into fixing up. Next was the exciting bit: fixing the code. Updating the code is actually quite a daunting process to start, because you know
that your changes are going to be so widespread throughout the project. But in this respect, it was nice to be working with a Django project where the code was mostly divided up nicely into Django apps, so we worked app by app. The first thing we did was to run 2to3 on the entire
app, keeping the backup files that 2to3 generates so that we could easily check back on the previous version of the code. We pretty much just accepted all the changes that it suggested. Then we ran the tests on just that app and fixed the code as necessary until
the tests were passing. We committed the changes app by app, which made things a little easier for code review. Then we ran Django, and something invariably broke, so we fixed it again until it ran properly. We ran the application so we could
manually check the functionality of that app, and then we proceeded to the next app. That got our code mostly working. But the
next step was to review it and refactor things. Here again, committing app by app was useful. It helped keep things together, and it also made it easier for other people to code review. My code reviewers are actually in this room and will attest to the fact that it was
still pretty horrible to do. In the previous step, we had just fixed up code until the tests worked, so now we needed to review the diffs more carefully, in particular to fix up 2to3's
over-conservativeness. 2to3 is designed to convert Python 2 code into code that's valid on any version of Python 3. In
some cases, it may add extra code that you don't actually want. The main case we found was converting the new iterators to lists unnecessarily. Whether you actually need to convert to a list depends on the use
case, but 2to3 tends to be over-conservative and wrap everything in list() whether or not it's needed. It also sometimes wraps print calls in extra parentheses, especially if you've got print statements that have
already been written as function calls. There's also the specific case of callable(), which was removed in Python 3.0 and reintroduced in Python 3.2.
So it doesn't need to be replaced for newer Python versions, but 2to3 still does it. Then we did quite a lot of refactoring, especially in places where we'd been doing manual bytes-to-string conversions. Those sometimes got a bit convoluted
because we'd just done what was necessary to get the tests to pass and get the app running. With a bit more attention, Python 3 generally allowed us to simplify things quite a lot. And just as a warning, if you use from __future__
import unicode_literals, it helps to keep your Python 2 and 3 compatibility, but it does sometimes introduce subtle issues. The python-future project has quite a good review of the pros and cons of using it for Python 2 and 3
compatibility. Next up is linting. We didn't actually do this. I wish we had, but I didn't know about it until later. It would have
definitely avoided a few issues. Once your porting is done, you can run Pylint with the --py3k flag, which will highlight some Python 3 incompatible code that your tests might not have found. And then user testing.
We dedicated quite a lot of time to front-end manual testing. It's tedious, but it did find issues that our unit tests didn't. It was also useful to have our GIS specialists, who are familiar with the platform's data, review it and make sure that processed data sets looked like they should and that queries
generated the expected results. Then, just as everything was going so well and we thought we were more or less done, we ran into one final hurdle: a library called gdal2mbtiles.
gdal2mbtiles is a library that generates map tiles from georeferenced files so that you can display them with a mapping library like Mapbox. It's a bit fiddly to install, so we install it separately in our deployment steps. It's also only minimally used in the main mapping
project; it's mainly used in a side project. So it slipped under the radar when we were assessing dependencies and during our initial testing, and upgrading it turned out to be a mammoth task, which I don't have time to go into, just when we thought we were more or less done. So the moral of the story is: check all your dependencies, no matter where they're coming from.
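One way to catch dependencies like this, installed outside the requirements file, is to compare what is actually installed in the environment with what the requirements file lists. A rough sketch, assuming Python 3.8+ (for `importlib.metadata`) and a simple requirements file of `name==version`-style lines; `requirement_names`, `unlisted_packages` and `report` are hypothetical helpers, and name normalisation follows PEP 503:

```python
import re
from importlib import metadata  # standard library from Python 3.8


def normalise(name):
    # PEP 503 normalisation: case-insensitive, and runs of "-", "_", "." are equivalent
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines):
    """Extract project names from simple requirements-file lines."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        # skip blanks, pip options (-r, -e, --index-url ...) and bare URLs
        if not line or line.startswith("-") or "://" in line:
            continue
        # the name ends at the first version specifier, extra, marker or space
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalise(match.group(0)))
    return names


def unlisted_packages(installed, requirement_lines):
    """Installed project names that the requirements file never mentions."""
    listed = requirement_names(requirement_lines)
    return sorted({normalise(name) for name in installed} - listed)


def report(requirements_path):
    """Print every installed project missing from the given requirements file."""
    installed = [d.metadata.get("Name") for d in metadata.distributions()]
    with open(requirements_path) as handle:
        for name in unlisted_packages([n for n in installed if n], handle):
            print(name)
```

Calling `report("requirements.txt")` inside the project's virtualenv prints the candidates. Expect transitive dependencies and tooling such as pip in the output as well, so treat it as a checklist to review by hand; anything installed outside the environment may not show up at all.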
There were a bunch of gotchas that we encountered, things that tripped us up along the way. Most of them were the result of a lack of thoroughness in the first step, when we should have been learning about the differences between Python 2 and 3. But some are maybe a little less immediately obvious, a little more
obscure, and not necessarily identified by tools like 2to3. One is rounding. The rounding strategy changed in Python 3. In Python 2, it works the way you were taught at school: exact halves are rounded away from zero. So rounding 2.5 gives you 3, and rounding
3.5 gives you 4. In Python 3, that's changed, and exact halves are rounded to the nearest even number. This is banker's rounding. The advantage is supposed to be that it's unbiased, so it produces better results in
operations that involve lots of rounding, whereas the old way is biased away from zero. But now rounding 2.5 gives you 2, while 3.5 still gives you 4, and that may introduce bugs that you didn't expect.
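Where code depended on the old behaviour, one option is to round explicitly with the decimal module's `ROUND_HALF_UP` mode, which rounds ties away from zero. A minimal sketch; `round_half_away` is a hypothetical helper, and like Python 2's `round()` it returns a float (Python 3's `round(x)` returns an int when `ndigits` is omitted):

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(value, ndigits=0):
    """Round like Python 2's round(): exact halves go away from zero.

    Decimal(value) captures the float's exact binary value, so 2.675 (stored
    as 2.67499999...) still rounds down to 2.67, as it did in Python 2.7.
    """
    quantum = Decimal(1).scaleb(-ndigits)  # 10 ** -ndigits, e.g. 0.01 for ndigits=2
    return float(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))


# Python 3's built-in round() uses round-half-to-even ("banker's rounding")
print(round(2.5), round(3.5))                      # 2 4
print(round_half_away(2.5), round_half_away(3.5))  # 3.0 4.0
print(round_half_away(-2.5))                       # -3.0
```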
Exceptions. The message attribute on exceptions no longer exists in Python 3. If your tests don't actually exercise every exception that you raise, you may miss these, and 2to3 doesn't pick them up. If you've got custom-
defined exceptions, they may have their own message attribute. Django has some, so you need to check anything that's not a built-in exception and find out whether its message attribute is valid or not.
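For built-in exceptions, `str(exc)` (or `exc.args[0]`) gives you the message text in Python 3, and a custom exception whose callers rely on `.message` should set it explicitly. A minimal sketch; `UploadQuotaExceeded` and `describe` are hypothetical names, not from Django or the project:

```python
class UploadQuotaExceeded(Exception):
    """Hypothetical custom exception that sets .message explicitly."""

    def __init__(self, message, limit):
        super().__init__(message)
        self.message = message  # BaseException has no .message in Python 3
        self.limit = limit


def describe(exc):
    """Best-effort message text that works for any exception in Python 3."""
    # Python 2 code often used exc.message, which raises AttributeError on
    # built-in exceptions in Python 3; str(exc) is the portable replacement.
    message = getattr(exc, "message", None)
    return message if isinstance(message, str) else str(exc)


try:
    int("not a number")
except ValueError as exc:
    print(describe(exc))  # invalid literal for int() with base 10: 'not a number'

try:
    raise UploadQuotaExceeded("Upload quota exceeded", limit=100)
except UploadQuotaExceeded as exc:
    print(describe(exc), exc.limit)  # Upload quota exceeded 100
```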
Hash. In Python 3.3 and up, the built-in hash function uses a random seed for each Python process when hashing strings and bytes, which means that hash returns different values in different Python processes. That was introduced to address a security vulnerability, so while you can turn it off with the PYTHONHASHSEED environment variable, you really shouldn't.
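Anything that has to match across processes, such as a cache key, therefore needs a deterministic digest rather than `hash()`. A minimal sketch using SHA-256 from `hashlib` over a canonical JSON encoding; `cache_key` and the example arguments are hypothetical, and it assumes the key parts are JSON-serialisable:

```python
import hashlib
import json


def cache_key(prefix, *parts):
    """Return a cache key that is identical in every Python process.

    Unlike hash(), whose str/bytes results are salted per process since
    Python 3.3, a SHA-256 digest depends only on the input bytes.
    """
    # sort_keys and fixed separators give one canonical text for the same values
    payload = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    return prefix + ":" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same arguments give the same key, in this process or any other
print(cache_key("layer-stats", "land_cover", 2010))
```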
In a few places, we were using hash on cache keys, which meant that whenever you had a new Python process, you didn't find your cached items anymore. Pickle also turned out to be a problem. Objects that were pickled in Python 2 can give you Unicode errors when
you try to unpickle them in Python 3. The pickle protocols also changed: Python 2 has protocols 0 to 2, and Python 3.6 has 0 to 4. So if you need objects to be loadable by both Python 2 and Python 3, you have to make sure you specify a protocol they both support, and if you're
using django-redis, that defaults to the latest protocol, so if you're using the default, objects pickled under Python 3 can't be read back under Python 2. Sorting and comparing things: I don't really have time to go through that in much detail, but in Python 3 you now need to be comparing values of compatible types. You'll
get errors if you don't, and it can sometimes give you some odd bugs that you didn't expect. This is just a rough estimate based on the git commits on the core mapping project. So, as for lessons learned from this: well, upgrading
any project to Python 3 is going to be hard work, but it doesn't have to be too painful. It went more smoothly than we expected, really, once we got started. If you're not quite ready to embark on your Python 3 upgrade yet, you can make your Python 2 code as Python 3 compatible as possible, and in
the more recent parts of the code base where we did this, upgrading was much simpler. Being familiar with the changes is really useful, and it also lets you know what new things you can take advantage of. 2to3 is really good. It's a fantastic tool, but it can only do so much, so you really need
to review everything that it does. You can't rely on it to find everything. Tests are your friend: if your test suite covers most of your major code paths, then you can be reasonably confident your code is working. Check all your dependencies, not just the ones in your requirements file. Lastly, be
prepared to spend some time upgrading third-party libraries. Don't give up or justify putting off your upgrade just because the maintainers haven't done it for you yet. That's it for me. Thank you. There are some
resources on the slides covering things that I didn't go into in much detail, and I'll upload the slides later.