Python 2 is dead! Drag your old code into the modern age
Formal Metadata
Number of Parts | 132
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/44980 (DOI)
EuroPython 2018 (25 / 132)
Transcript: English (auto-generated)
00:07
Hello, everyone. Welcome. Can you hear me? Okay. Awesome. So, a little bit about me. Just a little bit. I'm Rebquok on GitHub, Twitter and most other places. I'm a software
00:25
developer at Ecometrica. We're here in Edinburgh. We are hiring, but only in Montreal at the moment. But if you're interested, then come talk to me later. I'm also an ex-psychologist. I haven't spoken at a conference since my academic days, which were a long time ago.
00:42
So bear with me if I'm a little bit rusty. So this talk is about how we as an individual company went about upgrading a large established Python 2.7 code base to Python 3.6. I'll
01:02
talk a bit about our experience and our general approach and some of the gotchas and pitfalls we encountered along the way. This was only our approach. It's certainly not the only way to do it. I'm by no means telling you that it's the right way to go about it. And I'm not going to attempt to give you a single best way of approaching
01:24
the problem. Instead, I just hope to give you a useful case study and share with you some of the lessons that we learnt. So why did we and why should you even want to upgrade to Python 3? Well, the main reason obviously is that the deadline for dropping
01:45
Python 2.7 support is approaching pretty fast. January 2020, so less than a year and a half away now. Major projects are also either dropping or planning to drop support
02:00
for Python 2, including Django, NumPy, SciPy, Pandas, but there are lots of others. So in the not too far off future, you're going to be stuck with old versions that are only being bug fixed if you're lucky and you don't get any new features. So in terms of motivation to upgrade, that's the stick. What's the carrot? Why should you
02:25
want to embrace Python 3 rather than just grumbling about how annoying it is you're being pushed into it? That would and has filled at least another few talks. So I'm just going to touch on a few highlights here. I've got some references at the end of
02:43
the slides that go into more detail if you're interested in that. So first, there's that Unicode thing. So Python 3 gets rid of the overloaded string type where objects can represent either textual or binary data. This link here is a nice
03:04
description of how it came about, why Python 3 largely exists to fix that. In Python 3, it's always a text string and Unicode by default. There's also better iteration. So in Python 2, you have a lot of pairs of functions that do the same
03:25
thing except that one's eager and one's lazy. Python 3 eliminates all of the eager versions and instead makes everything lazy. So everything's an iterator. Iterating over them works exactly the same way, but it no longer creates an
03:43
intermediate list. So it makes it harder to write code that accidentally uses up lots of memory. We also have some restrictions on comparators. So you now can't do nonsense comparisons between different types. Incidentally, the string "foo" is
04:03
greater than 4 according to Python 2. We get some advanced unpacking. I won't go into detail on this, but in Python 3, you get the nice star notation for unpacking both iterables and especially dictionaries. So this one I came across
04:22
quite recently, but it's a nice way of making new dictionaries from existing ones. We get the option of keyword-only arguments in Python 3. So this is a function with two positional arguments and one keyword argument. This is how we would
04:42
do it in Python 2. And any of these three methods of calling it would be valid, but they might not do what you expected them to. You can use the same definition in Python 3, but you can optionally add this star argument, and
05:01
that means that the keyword argument that follows it has to be called by name. So now, only that first method of calling it is valid. So when you use keyword only arguments, you can avoid accidentally passing too many arguments to a function and then having them misinterpreted as the
05:21
keyword argument. F-strings are awesome and are totally the reason to go all the way to Python 3.6 and the reason that we did go to Python 3.6. As well as plain variable substitution, they can contain any Python expressions
05:44
including method and function calls. They're more readable, they're more concise, they're less prone to error, and they're also faster than other ways of formatting strings. And then there's asyncio, which is the new concurrency
06:00
module that's been introduced in Python 3.4. I'm not going to say much more than that about it, because I don't know a lot more than that about it, but I'm told it's very cool. So a little bit about the project that we're dealing with. So Ecometrica's mapping platform is a big Django project that does some cool stuff with GIS data, and some of
06:25
my colleagues will tell you more about that if you would like to know. It works with GDAL and other underlying GIS libraries to import and transform mapping data sets, display areas of interest on an interactive map interface, and run user-defined queries
06:41
across multiple data layers. It's about eight years old, it consists of around 70,000 lines of Python code in the core project, and it has a bunch of dependencies, including some of the typically temperamental GIS ones. This is what it looks like.
07:00
We're looking at national parks here in the Amazon, highlighted in purple. So we can upload and show display layers like this one, which shows land cover in 2010. We can explore individual areas of interest and have a look at results of some user-defined
07:23
questions based on information from raster data sets like carbon density, biomass within an area, or how land use has changed over time, and there's a number of ways that those can be displayed. So there are a bunch of useful tools that are
07:45
out there to help you with your Python 3 upgrade. I'll take a look at a few of them, but this isn't by any means an exhaustive list. There's lots of help out there, these are some of the ones that we used. So first up, you want to
08:04
upgrade your project, that's all well and good, but what about all your dependencies? Will they still work when you upgrade? And there's a quick way to do a first check, and that's the caniusepython3 package, which does what it says on the tin. So you pip install it, and then you just
08:21
run it on your dependencies from the command line in a variety of different ways. caniusepython3 relies on projects being classified on PyPI as supporting at least one version of Python 3, so it's not perfect. It depends on maintainers declaring that the package is Python 3
08:43
compatible, otherwise it won't find it. So next, there's a tool called 2to3, which I think most people will have heard about. It's usually installed with the Python interpreter as a script, and it reads Python 2 code and
09:02
applies a series of fixes to transform it into valid Python 3 code. So here is an example of a little Python 2.7 program that just takes some input from the command line and says hello to you, welcome to your Python whatever year you want. So we just run it from
09:24
the command line with a list of files or directories to transform, so 2to3 welcome.py, and 2to3 outputs a diff of the fixes that it's going to make for Python 3. So you can see here it's identified print
09:41
statements, and raw_input, which changes to input in Python 3, and it also picked up the change to the exception syntax. So that's a useful first start
10:04
for us. Linting can also help you. So Pylint has a --py3k flag, which will highlight Python 3 incompatible code. So when you run it on our little example program, it prints out a list of identified Python 3
10:25
issues. Note that it identified this one, which 2to3 didn't, so neither of them is perfect, and you still need to review your code, but they can help. So just
10:45
briefly a side note about supporting Python 2 and 3. So our main project, our mapping project is end user software. It doesn't need to support external developers, and we were happy to go Python 3 all the way with it, but we did have external dependencies,
11:00
and those need to continue to support both Python 2 and 3, and there are tools to help you with that too. So future and six are libraries that provide utilities for writing Python 2 and 3 compatible code. Modernize is built on top of 2to3. It's used
11:21
in a very similar way to 2to3, but it's more conservative, so it uses six to try and fix up code to be both Python 2 and 3 compatible rather than just changing it all to Python 3. Tox is helpful to let
11:40
you run your tests with specific environments, so you can make sure that your tests are going to run under every version of Python that you plan to support. And the Python docs and Django docs also have a lot of useful information on porting to Python 3 but still maintaining compatibility with
12:01
Python 2. So going on to what we actually did. First things first, we needed Python 3 on our system. We were on Ubuntu 16.04. That ships with Python
12:24
3.5, but we wanted Python 3.6 because f-strings. So there was a little bit more setup involved, but only a little bit. So we had to install these additional packages, but that was pretty much it.
12:40
And other than that, we were using virtualenv with virtualenvwrapper. So we just specify our Python 3 version when we create the virtualenv. And there really wasn't that much else that we had to change in our deployment process. That pretty much covered it. So the first thing that
13:02
we really needed to do in terms of upgrading the code itself was some research. So you need to learn about the differences, the main differences between Python 2 and 3. The Unicode issue is the one that everyone knows about, but there are lots of others and it's worth reading up on the differences before you start. python3porting.com
13:25
has a free online book. It has guidance on porting to Python 3 and a pretty comprehensive description of the differences. And the python-future project's cheat sheet, which is here, is
13:42
also a useful reference. So next we had a look at the project's test coverage. So we're going to use our unit tests as a tool to help figure out whether our upgraded code was working. So it's important to have decent test coverage
14:01
before we started. It could have been better, but it was respectable. So we didn't spend a lot of time improving the test coverage specifically for doing this process. So dependencies. This is the one that tends to put people off
14:22
upgrading. So we want to use Python 3, but we rely on external dependencies and dependency X, Y, and Z doesn't support Python 3 yet, so we just give up until they do. And for quite a while, we periodically checked the blocking requirements and just put stuff off until the
14:42
list looked better. But by the end of last year, our list of pending dependencies was looking kind of manageable. More and more packages are supporting Python 3. So this is the result when we ran caniusepython3. It
15:03
doesn't look so good. We've got 11 projects blocking our Python 3 upgrades. We had about 75 total in our requirements file, so it could have been worse, but still. But things start to look up a bit when we look at the list in a bit more detail. So three of these are
15:27
things that we didn't use anymore, so we just took them out. Another four showed up because they don't have Python versions identified in the classifiers on
15:41
PyPI, so they weren't correctly identified by caniusepython3, but they did all have Python 3 supported versions that we could upgrade to. There was Python scrubber. We took that out, too. That wasn't compatible,
16:00
but it also wasn't a very active package, and it's a bit out of date, and there was another more up-to-date package we could replace it with that did the same thing and was compatible, so we did that instead. django-migration-testcase. That one also wasn't compatible at the time, but
16:20
helpfully someone had already made a PR to upgrade it, so we used that. Then we had yas3fs. yas3fs is a package for syncing your local files with S3. That did cause some issues. It looked good to start with.
16:41
There was a pull request supporting Python 3 that had been merged, but it turned out only to address a few fixes, so we added the remaining Python 3 support to that package. And then the last one on here, django-hashedfilenamestorage. That's a library that we maintain at Ecometrica, so our
17:00
bad for not upgrading it sooner. The same went for a couple of other dependencies that we install from private repos, so we upgraded those ourselves. We maintained Python 2 compatibility for other users. We added CI and tox to make sure that we're testing under multiple Python versions and we keep our compatibility.
17:24
So in the end, we whittled this list down to only a few that we really needed to put any significant effort into fixing up. So next was the exciting bit, fixing the code. Actually, updating the code is quite a daunting process to start, because you know
17:42
that your changes are going to be so widespread throughout the project. But in this respect, it was kind of nice to be working with a Django project where the code was mostly divided up nicely into Django apps. So we worked app by app. The first thing we did was to run 2to3 on the entire
18:01
app, keeping the backup files that 2to3 generates so that we could easily check back on the previous version of the code. Pretty much, we just accepted all the changes that it suggested. And then we ran the tests on just that app, fixed the code as necessary until
18:22
the tests were passing. And committed the changes app by app, made things a little bit easier for code review. And then we ran Django, something invariably broke. We fixed it again until it ran properly. We ran the application so we could
18:43
manually check the functionality of that app. And then we kind of proceeded to the next app. So that got our code mostly working. But the
19:03
next step was to review it and refactor things. So here again, committing app by app was useful. It helped keep things together. It also made it easier for other people to code review. My code reviewers are actually in this room and will attest to the fact that it was
19:21
still pretty horrible to do. But in the previous step, we also just fixed up code until the tests worked. So now what we needed to do was to review the diffs more carefully. In particular, to fix up 2to3's over
19:41
conservativeness. So 2to3 is designed to convert Python 2 code to be valid Python 3 code for any version of Python. In
20:01
some cases, it may add extra code that you don't actually want. So the main cases we found of this were converting new iterators to lists unnecessarily. So whether you actually need to convert to a list depends on your current use
20:20
case. 2to3 tends to be over conservative and wrap everything in lists when it isn't necessarily needed. It also sometimes wraps print statements with extra parentheses, especially if you've got print statements that have
20:40
been expressed as functions already. There's also the specific case of callable, which was initially removed in Python 3.0 and reintroduced in Python 3.2.
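As a rough illustration of the over-conservative fixes being described, here is a small sketch. The dictionary and function names are invented for the example and are not taken from the project.

```python
# Hypothetical data, used only for illustration.
layers = {"land_cover": 2010, "biomass": 2015}

# 2to3 typically rewrites Python 2's `layers.keys()` like this:
wrapped = list(layers.keys())

# If you only iterate, the list() wrapper can usually be dropped.
names = sorted(name for name in layers)

# The wrapper is still needed when the dict is modified while looping.
for name in list(layers.keys()):
    if layers[name] < 2012:
        del layers[name]


def greet():
    return "hello"


# callable() is available again from Python 3.2, so it can be used directly.
is_callable = callable(greet)
```

Whether a given `list()` call can go depends on how the result is used afterwards, so each one needs a look during review.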
21:02
So it doesn't need to be replaced for newer Python versions, but 2to3 still does it. Then we did quite a lot of refactoring. So especially places where we've been doing manual bytes to string conversions. They sometimes got a bit convoluted
21:22
because we'd just done what was necessary to get tests to pass and get the app running. With a bit more attention, Python 3 generally allowed us to simplify things quite a lot. And then just as a warning, if you use from __future__
21:41
import unicode_literals, it helps to keep your Python 2 and 3 compatibility, but it does sometimes introduce some sort of subtle issues. The python-future project has quite a good review of the pros and cons of using that for 2 and 3
22:00
compatibility. Next up is linting. We didn't actually do this. I wish we had, but I didn't know about it and didn't discover it until later. It would have
22:20
definitely avoided a few issues. Once your porting is done, you can run Pylint with the --py3k flag, which will highlight some Python 3 incompatible code that your tests might not have found. And then user testing.
22:42
So we dedicated quite a lot of time to front-end manual testing. It's tedious, but it did find issues that our unit tests didn't. And it's also useful for us to have our GIS specialists who are familiar with the platform data review it and make sure that processed data sets look like they should and queries
23:00
generated expected results. And then everything was going so well, we thought we were more or less done, and we ran into one final hurdle, which is this library called gdal2mbtiles.
23:20
So gdal2mbtiles is a library that generates mapping tiles from georeferenced files and lets you display them with a mapping library like Mapbox. It has some extra fiddliness around installing. We install it separately in our deployment steps. It's also minimally used in the mapping
23:40
project. It's kind of used in a side one. So it slipped under the radar when we were assessing dependencies and during our initial testing and upgrading it turned out to be a mammoth task that I don't have time to go into, but just when we thought we were more or less done. So the moral of the story is check all your dependencies no matter where they're coming from.
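One way to catch packages that never appear in a requirements file is to inspect what is actually installed in the environment. The sketch below is only an assumption about how that could be done with the standard library's `importlib.metadata` (Python 3.8 or later); it is not how caniusepython3 works internally, and the function names are made up. Like the PyPI classifiers, it only reports what maintainers have declared.

```python
from importlib import metadata

PY3_CLASSIFIER = "Programming Language :: Python :: 3"


def declares_python3(classifiers):
    """Return True if any trove classifier claims Python 3 support."""
    return any(c.startswith(PY3_CLASSIFIER) for c in classifiers)


def undeclared_distributions():
    """Names of installed distributions that declare no Python 3 classifier."""
    missing = []
    for dist in metadata.distributions():
        classifiers = dist.metadata.get_all("Classifier") or []
        if not declares_python3(classifiers):
            # Fall back to a placeholder if the metadata has no name.
            missing.append(dist.metadata.get("Name") or "<unknown>")
    return sorted(missing)


if __name__ == "__main__":
    for name in undeclared_distributions():
        print(name)
```

A package showing up here is not necessarily incompatible; it just has no declaration, so it needs checking by hand.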
24:03
So there were a bunch of gotchas that we encountered, things that tripped us up along the way. Most of them were a result of lack of thoroughness in the first step when we should have been learning about the Python 2 to 3 differences. But some are maybe a little bit less immediately obvious, a little bit more
24:21
obscure, not necessarily identified by things like 2to3. One is rounding. So the rounding strategy changed in Python 3. In Python 2, it works the way you were taught at school. Exact halves are rounded away from 0. So rounding 2.5 gives you 3, rounding
24:41
3.5 gives you 4. In Python 3, that's changed, and exact halves are rounded to the nearest even number. This is banker's rounding. The advantage is supposed to be that it's unbiased, so it produces better results with
25:00
operations that involve rounding, whereas the old way is biased towards the upper value. But now rounding 2.5 will give you 2. 3.5 still gives you 4. But it may introduce bugs that you didn't necessarily expect.
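A short sketch of the difference, plus one possible way to get the old behaviour back where existing results depend on it. The helper name `round_half_up` is hypothetical, and using the `decimal` module is an assumption for the example, not necessarily what the project did.

```python
from decimal import Decimal, ROUND_HALF_UP

# Python 3's built-in round() sends exact halves to the nearest even integer.
print(round(2.5))  # 2
print(round(3.5))  # 4


def round_half_up(value):
    """Round to an integer with exact halves going away from zero (Python 2 style)."""
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(round_half_up(2.5))   # 3
print(round_half_up(-2.5))  # -3
```

Going through `str(value)` avoids carrying binary floating-point noise into the `Decimal`.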
25:23
Exceptions. The exception.message attribute no longer exists. If your tests don't actually exercise every exception that you handle, then you may miss uses of it. This doesn't get picked up by 2to3. If you've got custom
25:41
defined exceptions, they may have a message attribute. Django has some, so you kind of need to check anything that's not a core exception and find out whether the message attribute is valid or not.
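A minimal sketch of the situation: built-in exceptions have no `message` attribute in Python 3, while a custom exception may still define one itself. `UploadError` and `describe` are hypothetical names used only for this example.

```python
def describe(exc):
    """Return an exception's text without relying on the removed .message attribute."""
    return str(exc)


class UploadError(Exception):
    """Hypothetical custom exception that defines its own message attribute."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


try:
    raise ValueError("bad layer")
except ValueError as exc:
    has_message = hasattr(exc, "message")  # False on Python 3
    text = describe(exc)                   # "bad layer"
```

Using `str(exc)` or `exc.args` works for both kinds of exception, which saves checking each class individually.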
26:01
Hash. In Python 3.3 and up, the built-in hash function uses a random seed for each Python process, and that means that hash returns different values in different Python processes. That was introduced to address a security vulnerability, so while you can turn it off, you really shouldn't.
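For values that have to be identical across processes, such as keys in a shared cache, one option is to derive them from a deterministic digest rather than from the built-in `hash`. This is a sketch under that assumption; `stable_cache_key` is a hypothetical helper, and the talk does not say which fix was actually used.

```python
import hashlib


def stable_cache_key(prefix, value):
    """Build a cache key that is the same in every Python process."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"


# Hypothetical usage: the key no longer depends on the per-process hash seed.
key = stable_cache_key("query", "national parks in the Amazon")
```

The digest is longer than an integer hash, but it does not change when the process restarts.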
26:21
In a few places, we were using hash on cache keys, which meant that whenever you had a new Python process, you didn't find your cached items anymore. Pickle also turned out to be a problem. Objects that are pickled in Python 2 give you unicode errors when
26:41
you try to unpickle them in Python 3. The pickle protocols also change, so you have 0 to 2 in Python 2, 0 to 4 in Python 3, so if you need to load objects pickled in Python 2 and Python 3, you have to make sure you specify the right protocol, and if you're
27:00
using django-redis, that defaults to the latest protocol, so if you're using the default, it won't work in Python 3. Sorting and comparing things, I don't really have time to go through that in much detail, but you now need to be using the same type in Python 3. You'll
27:20
get errors if you don't, and it can sometimes give you some odd bugs that you didn't expect. So this is just from a rough estimate of the git commits on the core mapping projects. So for lessons learned from this, well, upgrading
27:41
any project to Python 3 is going to be hard work. It doesn't have to be too painful. It went more smoothly than we expected, really, once we got started. If you're not quite ready to embark on your Python 3 upgrade yet, you can make your Python 2 code Python 3 compatible as much as possible, and in
28:00
the more recent parts of the code base where we did this, upgrading was much simpler. Being familiar with the changes is really useful. It also lets you know what new things you can take advantage of. 2to3 is really good. It's a fantastic tool, but it can only do so much, so you really need
28:20
to review everything that it does. You can't rely on it to find everything. Tests are your friend. If your test suite covers most of your major code paths, then you can be reasonably confident your code is working. Check all your dependencies, not just the ones in your requirements file. Lastly, be
28:43
prepared to spend some time upgrading third party libraries. Don't give up or justify putting off your upgrades just because the maintainers haven't done it for you yet. That's it for me. Thank you. There's some
29:03
resources on things that I didn't go into in much detail, and I'll upload the slides later.