Moving big projects to Python 3
Formal Metadata
Number of Parts | 118
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/44841 (DOI)
EuroPython 2019, 30 / 118
Transcript: English (auto-generated)
00:04
So, my name is Lennart; don't bother about my last name. If somebody asks me how to pronounce it, I get self-conscious and then I mispronounce it.
00:21
I was born in Sweden, but I live in Poland with my wife, daughter, two cats and way too many fruit trees. I've been using Python since Python 1.5.2 and I've been working with Python and the web since 2001. I wrote the book on how to move from Python 2 to Python 3.
00:43
You can find it in HTML and PDF on python3porting.com. It's open source on GitHub. And I work for BriteCore. At BriteCore, we make the type of software that insurance companies use to deal with insurance policies and claims, so it's not very interesting unless you are in the insurance
01:04
industry. We work completely remotely and, yes, we are hiring, so if you want a job and you are looking for remote work, you can talk to me. I'm new to the whole recruiting bit and I've never done this before, but talk to me anyway.
01:21
We are not on Python 3 yet. We have just started. It's an ongoing effort. But at my last job, at Shoobx, which is also a very nice company and which most likely is also looking for people, although I should warn you that the system there is insanely complex, we successfully moved this large and insanely complex system to Python
01:47
3 last year. So let's step back in time, back to the Stone Age, when you or somebody at your current job made some sort of application in Python.
02:04
And this is you, back in the Stone Age, with your web framework. And whoever did this, you or the other person at your job, did such a good job that this application is still running. It's probably a web app.
02:21
It's probably some old version of TurboGears, web2py or maybe even Zope. And you have for years now been bravely running away from Python 3. But you can't run any longer, because Python 2 is committing suicide.
02:41
But don't be afraid of Python 3. A lot of people are afraid of it and think it's horrible and bad and everything, but it's not the Killer Rabbit of Caerbannog. It's just regular old Python. Now, the hard part of porting to Python 3 is getting your system into a state where
03:02
it's easy to port, and fixing up your system like this is a benefit for you anyway. The porting itself is quite easy. It's what comes first that is hard. And the first step of that is to stop being a fire department, because many large
03:22
organizations are constantly just putting out fires in their applications. And that's not a good situation in which to port to Python 3, because if the changes that you are making to your system as a part of normal development keep breaking it and
03:42
turning into problems that you have to fix in a panic, then moving to Python 3 is going to create several of these fires, and that's going to be a big problem. Also, if you are in constant firefighting mode, you don't have time to move to Python 3. So you have to first get development to be normal and calm and regular.
04:09
So you have to get out of firefighting mode. Now, how to do that is in itself a whole talk or maybe a whole conference, and I would not be the person to do that anyway, because I'm not a DevOps guy.
04:24
I'll just mention some things that I've seen being done to fix this situation. And this slide here assumes that your software is a service of some sort, a web app or some other service, and that you have a production environment that you need to
04:41
keep up. Because that's the firefighting that I've seen and that I know, and I don't even know if you can have firefighting if you have some other sort of application. But if you do have firefighting in another sort of situation, then come talk to me, because I'm interested in hearing why you're
05:03
firefighting. So to port to Python 3, you need to have tests, because otherwise you don't know if it's going to work on Python 3. But tests also help with stability. So if you are firefighting and there is a problem, add a test to make sure that never happens again.
05:22
Always add tests. And you have to run those tests, and that means that for any sizable project you need to have continuous integration. If you have a production server, you also need staging servers to test things on.
05:40
You should have automatic deployment. Deploying the latest release, or just making the latest release from master or from a production branch, should just be pushing a button. You shouldn't need to do anything more; everything else should be automatic.
06:01
Extra points if this is done automatically every night to a staging server, so you know that your release procedures actually work. And monitoring. Of course, as the previous speaker said, you should know if there is a problem before
06:23
your users know it. And there are some Python-specific things you can do, too. You should run in an isolated production environment, and that means a buildout or a virtualenv and maybe some sort of containers. Containers are very in now and have been for several years, and
06:43
that helps so you don't get weird interactions with the operating system. Take Docker, for example. For most of you, what I'm going to say now is probably obvious, but I only realised it in the last few months, since I started working at BriteCore, so I'm going to mention it because it's new to me.
07:02
If you use Docker in production, you quite often have to rebuild the Docker images. For example, every time you have new requirements, you rebuild the Docker images, because a part of the image is the virtualenv that you create and install all the packages into. And if you do that, and some new requirement creates a conflict when installing, you get
07:26
that error when building the Docker images, not when pushing to production, and that's a really good thing, because your deployment doesn't mess up production; it breaks when you're building the images. In addition, you can then use those images in continuous integration and maybe even develop
07:45
on them, so that your developers have exactly the same environment as production. So that's really nice, and it's like, oh, wow, now I understand why everybody is talking about Docker. It took me years. So with all these things in place, your firefighters can take it easy, and you can
08:04
go on to preparing, or you can go on to planning, which I'm going to talk about later, or you can do both at the same time. So there are two stages here, preparing and planning, and they are independent; you can do both at any time.
08:21
And the first step in preparing is that you should pin the versions of all your packages, every requirement that you have. And if you don't know what pinning means, it means that in your requirements file you specify exactly which version to install; not at least this version or less than that version, but exactly which
08:41
version should be installed. And unfortunately I have not found a way to require this in pip, to tell pip that everything has to have a pinned version. So one way you can do this is to verify in the install script, or if you have Docker
09:03
in the image build, what you installed: with pip freeze you get a list of exactly what was installed, and you compare that to your requirements file, so that you haven't installed something that's not in the requirements file, for example. Another way to do it is to add hashes to the requirements file.
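The `pip freeze` comparison just described could be sketched roughly as follows. This is a minimal illustration and not something shown in the talk: the helper names `normalize`, `parse_pins` and `compare` are made up here, and the parser only understands plain `name==version` lines written without spaces around `==`.

```python
import re


def normalize(name):
    """Normalize a project name so 'Some_Pkg' and 'some-pkg' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines, strict):
    """Return {name: version} for 'name==version' lines.

    With strict=True (the requirements file) an unpinned line is an error;
    with strict=False (pip freeze output) such lines are skipped.
    """
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # blank lines and options such as -r or -e
            continue
        requirement = line.split(";", 1)[0].split()[0]  # drop markers and trailing options
        if "==" not in requirement:
            if strict:
                raise ValueError("requirement is not pinned: %s" % raw.strip())
            continue
        name, version = requirement.split("==", 1)
        pins[normalize(name.split("[", 1)[0])] = version  # ignore extras like pkg[extra]
    return pins


def compare(requirements_lines, freeze_lines):
    """List installed packages that are missing from, or differ from, the pins."""
    wanted = parse_pins(requirements_lines, strict=True)
    installed = parse_pins(freeze_lines, strict=False)
    problems = []
    for name, version in sorted(installed.items()):
        if name not in wanted:
            problems.append(
                "%s==%s is installed but not in the requirements file" % (name, version)
            )
        elif wanted[name] != version:
            problems.append(
                "%s is pinned to %s but %s is installed" % (name, wanted[name], version)
            )
    return problems
```

In an install script or image build, the freeze lines could come from running `python -m pip freeze` in the target environment, and a non-empty result from `compare` would fail the build.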
09:23
Then you're specifying not just which version, but which exact package file to install, so you install a specific wheel or a specific egg or something. You can have several hashes for each version, so you can say all these are okay.
09:40
This has the benefit that as soon as you specify one hash, pip will refuse to install anything that doesn't have a hash. So that way you know that you are getting exactly what you want when you're installing it. It's extra maintenance, extra work to get all these hashes in, but it also means that
10:01
if somebody uploads a malicious package to the Cheese Shop, you won't download it by mistake. You know exactly what you're installing. So use one of those ways to make sure that you know exactly what you have when you're installing.
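As a rough sketch of the hash approach: pip accepts `--hash=sha256:<digest>` options after a pinned requirement, and several of them may be listed for one version. The helpers below (`file_sha256` and `hashed_requirement`, both hypothetical names) compute the digest of a downloaded wheel or sdist and format such a line; the `pip hash` command can print the same digest.

```python
import hashlib


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_requirement(name, version, digests):
    """Format one pinned requirement with every acceptable hash.

    Each hash goes on its own continuation line, which pip's
    requirements file format allows via a trailing backslash.
    """
    parts = ["%s==%s" % (name, version)]
    parts.extend("--hash=sha256:%s" % digest for digest in sorted(digests))
    return " \\\n    ".join(parts)
```

Passing `--require-hashes` to `pip install` turns on hash checking explicitly, and in that mode pip also insists that every requirement is pinned with `==`.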
10:21
You might also, as part of preparing, want to increase the test coverage even more, because it's very good to have high line coverage when porting to Python 3, so there are no hidden Python 2 statements somewhere that you missed in the porting. What percentage of test coverage you want is really a matter of opinion.
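As a cheap complement to coverage, a syntax-only scan run under Python 3 can flag files that still contain Python 2-only statements, even in code no test reaches. The sketch below is an illustration only (the function name `python3_syntax_errors` is made up), and it catches syntax problems such as `print "x"`, not semantic changes such as bytes versus text.

```python
import ast


def python3_syntax_errors(sources):
    """Map filename -> error message for sources that Python 3 cannot parse.

    `sources` is a mapping of filename to source text. Nothing is executed;
    the code is only parsed with the running (Python 3) interpreter's grammar.
    """
    failures = {}
    for filename, text in sorted(sources.items()):
        try:
            ast.parse(text, filename=filename)
        except SyntaxError as error:
            failures[filename] = "line %s: %s" % (error.lineno, error.msg)
    return failures
```

In practice the mapping would be built by reading every `.py` file in the project; files that parse cleanly can still fail at runtime, so this narrows the manual reading rather than replacing it.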
10:43
100% is obviously awesome, but for a big project that's generally unobtainable. 90-95% seems reasonable. And you can bridge the gap by reading all the lines that are uncovered: before every big release, or at least before you're trying to do the last big pushes,
11:09
you actually check all the uncovered code and just read it manually, because at some point that gets easier than writing tests for it. When testing, there's one big thing that you might encounter, and that's the philosophy
11:26
when it comes to unit testing that you should test each function separately: you should have one test for one function, and every call from that function out to other functions
11:41
should be mocked. But if you do that, you only test that the function is doing what you tell it to do. You don't actually test that it works. And if the APIs it calls then change, the test will still pass, and this is a huge problem
12:01
with Python 3, obviously, because the standard library changes. So this type of testing is practically useless when porting to Python 3. So if you are doing this in your unit tests, if you follow this principle of mocking out all the calls from a function when you test it, then you need to have 95%
12:25
coverage from your integration tests; your unit tests you can basically ignore. After this you need to upgrade your dependencies. You have to make sure that you're using the latest Python 2 compatible version of all your dependencies.
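The mocking problem described above can be illustrated with a small, invented example. `read_greeting` is a hypothetical function whose collaborator is mocked in the unit-style check; the mock returns text, so the check passes, while a more realistic collaborator that returns bytes (as binary file reads do on Python 3) makes the same function fail.

```python
import io
from unittest import mock


def read_greeting(opener):
    """Python 2 style code: assumes .read() returns something that joins with a str."""
    data = opener("greeting.txt").read()
    return "Hello, " + data


def mocked_check():
    """Unit-test style: the collaborator is a mock that returns whatever we told it to."""
    fake_file = mock.Mock()
    fake_file.read.return_value = "world"
    opener = mock.Mock(return_value=fake_file)
    return read_greeting(opener)


def integration_check():
    """Integration style: a real file-like object that yields bytes on Python 3."""
    def opener(path):
        return io.BytesIO(b"world")
    return read_greeting(opener)
```

On Python 3, `mocked_check()` returns normally while `integration_check()` raises `TypeError`, which is the kind of breakage only the integration-level check reveals.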
12:45
And after you have done that, you have to make sure that all of your dependencies are also Python 3 compatible. And you may have to replace or, in the worst case, port those dependencies. But since those are separate packages, that's
13:01
generally relatively easy to do unless the package is highly magical, but by today most highly magical popular packages already support Python 3. And if they don't, like for example MySQL-python, there are forks of them that people are moving over to that do support Python 3.
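One rough way to get a first list of suspect dependencies is to look at the trove classifiers each package declares. The sketch below is an assumption-laden illustration: the function names are made up, it uses `importlib.metadata` and therefore has to run under Python 3.8 or later in an environment where the dependencies are installed, and classifiers are self-declared, so a missing classifier is a hint rather than proof.

```python
from importlib import metadata


def declares_python3(classifiers):
    """True if any trove classifier claims Python 3 support."""
    prefix = "Programming Language :: Python :: 3"
    return any(classifier.startswith(prefix) for classifier in classifiers)


def packages_without_python3_classifier():
    """Names of installed distributions that do not declare Python 3 support."""
    missing = set()
    for dist in metadata.distributions():
        classifiers = dist.metadata.get_all("Classifier") or []
        if not declares_python3(classifiers):
            missing.add(dist.metadata.get("Name") or "<unknown>")
    return sorted(missing)
```

Packages on the resulting list still need to be checked by hand, for example by running their own test suites under Python 3.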
13:22
So this stage can take a significant time, especially if you have not been keeping your dependencies up to date. I have met and talked to people here that are still running on Python 2.6 because they actually can't upgrade to Python 2.7, and stuff like this. So if you're in that situation, expect this
13:42
to take time. And then you come to stage 3, planning, unless you already did it. And planning here is a lot about how many people in your team should do the porting, whether all of them should be involved, and whether you should move to Python 3 directly
14:05
or have Python 2 and Python 3 compatible code for a while. And there are three questions I have for you. The first is: can you stop adding features and stop firefighting, and for how long can you do that? Because porting will in the best
14:22
case take two weeks, and in the worst case, even if you do everything in one go, it can still take months. Can you stop adding features and stop firefighting that long? Do you have some deep magic that only a few of your developers understand? Because that
14:44
deep magic carries a big risk of being difficult to port to Python 3, and that bit will then block everything else. And how big is your team? If you have 50 people you can't put all of them on porting to Python 3. That's just a logistical nightmare; the mythical
15:05
man-month remains mythical even with Python 3. So you can put 10 people on doing this, maybe more; 20, I think, is stretching it. Unless you are very good in your organisation
15:23
at putting a lot of people on doing one thing. And if your system is already split up into multiple separate services, then you can put one team on each of these services, so you can easily put 5 or 10 people on each service, and then you are way ahead
15:44
of the game. But most of these old systems are monoliths. So one strategy here is to do it all in one go. If you don't have deep magic and you can stop adding features for a month, why not do it all in one go? Well, it takes less time to do it, and it's less work in total, a little bit, but
16:08
not a lot less work. And you can aim directly for Python 3 code, which is a benefit and speeds things up. But there is a high risk in doing this. Say you put all your 7 developers on porting to Python 3 for two
16:23
weeks and then you discover that there is some huge issue that means you kind of have to stop right now. Well, then you go back to adding features and fixing bugs, and your two branches are going to start to diverge. And there is a risk that when
16:41
you start with Python 3 again half a year later, you basically have to throw away all the work that you did during those two weeks. So it's a very high-risk strategy. And of course all other work has to stop. So slow and steady is a safer strategy. And this means that you aim to write code that
17:03
will run under both Python 2 and Python 3. You still run it on Python 2 in production until everything works under Python 3, and then you can switch. This is the low-risk version. It doesn't disrupt normal operations. It's a little bit more work
17:24
and, more importantly, it takes longer, because you're still going to do all your other work at the same time, so Python 3 gets pushed a little bit to the side, and it can take half a year to get through all of this because not everybody is working on it.
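To make "code that runs under both Python 2 and Python 3" concrete, here is a small standard-library-only sketch. The names `PY2`, `text_type`, `to_text` and `host_of` are illustrative; the six library provides ready-made equivalents of this kind of shim, such as `six.text_type` and `six.moves.urllib.parse`.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    from urlparse import urlparse
else:
    text_type = str
    from urllib.parse import urlparse


def to_text(value, encoding="utf-8"):
    """Return value as text: unicode on Python 2, str on Python 3."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def host_of(url):
    """Extract the host from a URL given as either bytes or text."""
    return urlparse(to_text(url)).netloc
```

Keeping the version checks in one small module like this means the rest of the code base imports the shim and stays free of `if PY2` branches, which also makes the later cleanup easier.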
17:41
And of course you need dual version support, which means it takes a little bit more work. What you can do, if you have a development team that is small enough to fit into one big house, is start with a Python 3 sprint for all the developers, but aim not
18:02
for Python 3 only but for Python 2 and Python 3 compatible code, so it runs on both. That way, when you come back half done, you can switch to having a dedicated team do the last bit, or just do it as a background task when you don't have anything that is really, really critical. And this is what we did at Shoobx. We rented a house in southern Spain
18:24
during the winter, when it's low season, so it was cheap. We got almost all the developers in there and we tried to move to Python 3 for a week. And we got almost the whole way there. We got a fair bit done. So of course we weren't done, but
18:46
you know, we had solved most of the critical issues, and it's a lot of fun to get everybody into one room and just hack away on something. So this is low risk, because you're aiming for Python 2 and Python 3 compatible code. It only disrupts your normal operation briefly, for a week
19:04
or two or however much you want to take. And everybody gets on board and feels involved, which is good. It's not just one or two guys sitting in a corner porting to Python 3 while everybody else just sits and goes, oh, Python 3, Python 2 was good, we
19:20
shouldn't have, and stuff like that. So everybody gets involved, which is good. The drawback is that you do still need dual version support. It's still fairly slow, although not as slow as the really slow version. Then you come to the actual porting stage, and there are
19:43
several things you need to do here. You will now start to run your tests under Python 3. This will obviously fail, and that's okay, but your continuous integration system still needs to
20:02
run them under Python 3 and make sure that as much as possible runs under Python 3, because otherwise people will add back incompatible code. And if you have some people trying to port to Python 3 while other people are adding Python 2 code, you're going to backslide and
20:20
you're never, ever going to get done. The trick to stopping this is continuous integration. But of course you cannot just let your continuous integration say no, this failed because it doesn't work on Python 3, because in the beginning basically all tests will fail, or
20:40
in fact it probably won't even be able to find the tests in the beginning. So what you need to do is get your CI gurus, the people who know your continuous integration system well, to set it up to keep track of which tests have once passed
21:01
under Python 3. And if they passed under Python 3, then the CI run should fail if that test no longer passes under Python 3. That way, every time you change something and some tests stop working under Python 3, you're going to have to fix that. And sometimes that means that you make a small little change. It's an itsy-bitsy little change.
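The bookkeeping described here, failing the build only when a test that has once passed under Python 3 stops passing, could be sketched like this. It is a minimal illustration under stated assumptions: the function names and the JSON state file are made up, and how the per-test results are collected depends entirely on your test runner and CI system.

```python
import json


def check_ratchet(known_passing, results):
    """Compare this run's Python 3 results with the tests known to pass.

    `known_passing` is a set of test ids that passed under Python 3 before.
    `results` maps test id -> True (passed) or False (failed or errored).
    A known test that is missing from `results` counts as a regression.
    Returns (regressions, updated_known_passing).
    """
    regressions = sorted(
        test_id for test_id in known_passing if not results.get(test_id, False)
    )
    now_passing = {test_id for test_id, passed in results.items() if passed}
    return regressions, set(known_passing) | now_passing


def run(state_path, results):
    """Load the stored state, record newly passing tests, return regressions."""
    try:
        with open(state_path) as handle:
            known = set(json.load(handle))
    except FileNotFoundError:
        known = set()
    regressions, updated = check_ratchet(known, results)
    with open(state_path, "w") as handle:
        json.dump(sorted(updated), handle, indent=2)
    return regressions
```

A CI job would call `run` after the Python 3 test run and exit with a non-zero status if the returned list is non-empty. Tests that have never passed under Python 3 are ignored, so the build stays green while the port is incomplete.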
21:24
You just fix a little bug and suddenly lots of things that used to work under Python 3 no longer work under Python 3, and then you have to spend a whole day fixing all this. Sometimes it's the tests that need fixing. It's really boring work, but these things happen, and you have to do it to stop the backsliding. We turned it off briefly
21:45
for a firefighting thingy at Shoobx and forgot to turn it back on for a month or two, and there was loads of incompatible code added during this time, even though everybody knew they shouldn't do it. It just happens by mistake, and you basically have to go on
22:04
and fix a lot of issues that you already fixed once. So that's really annoying. So have this. Stop the backsliding. I'm sure you know what 2to3 is already. It's the tool that will convert Python 2 code to Python 3 code. What's really helpful
22:25
here is to use modernize. Modernize is a set of extensions to 2to3 that will convert Python 2 code to code that is compatible with both Python 2 and Python 3, and it does this by using the six compatibility layer. There's another compatibility layer called
22:45
python-future. It also has its own 2to3 extensions, but python-future inserts a lot of magic, primarily into Python 2, to make it look more like Python 3, and this magic has bitten me several times, so my recommendation is to not use python-future but to rely on
23:05
python-modernize. And as I mentioned, the first errors you will get are errors that actually prevent you from even finding the tests to run. The test runner won't find anything, because you will just get import errors everywhere, and behind those import errors
23:25
there are usually either other import errors or syntax errors. So you're going to have to fix that. And the way to fix that, especially in the beginning, is to figure out what is wrong, find one of the python-modernize or 2to3 fixers that will fix
23:45
that specific wrongness, and then run it. Maybe even just on the file where you had the problem, because if you start by just going, oh, I'm just going to run python-modernize on everything and go on from there, then when you find errors, those errors may
24:03
be in lines that have already been changed, and then you don't know if that error was really there from the start or if it's an error that was introduced by running the fixers. So in the beginning you need to do this slowly and carefully, one fixer at a time, maybe even on one file at a time, just to fix that file, and then you run
24:28
the tests again, and the import error you get is in some other location. So, good; you fix that and then you go on to the next import error. You could of course,
24:42
once you find the error, just go, oh, I know what to do here, this is an easy thing to fix. And it's tempting to just change the code, save it and run the thing again, but the problem is that the next error you will get, three lines down, is the same thing
25:00
again, and doing that quickly gets very boring. So use these fixers to run on files, so it doesn't get so boring, because they will fix several places at one time, but do one fixer at a time. And then you just fix, fix, fix, fix, and this is where the book is finally useful,
25:24
because the book is about how to fix these errors. And as you get more confident, you can start running those fixers on maybe a whole directory at a time, and things like this, because you're starting to get a better feel for what is happening. But if you run them
25:47
on a lot of files at once and you're several people doing this, you're going to get merge conflicts, so this is why it's good to do it one file at a time if you're many. Don't forget that you have scripts in your development environment. Usually you have some
26:04
sort of helper scripts to create test data or to copy databases from production so you can test on real data locally, these kinds of things. Loads of these little helper scripts are going to have to be ported too. If they run in a separate virtual environment, you can actually do that first as practice, as a good way to get up and running on porting.
26:26
If they run in the same virtual environment as your main application, that's usually because they import that application to do things, and then you're going to have to port them last. But don't forget that these also have to be ported,
26:43
and the sooner the better, basically. You also need to write data migration tests. You have to take the data that you have that was generated under Python 2 and make sure that you can still load it under Python 3 and that you get the right thing, that you get Unicode when you expect
27:02
Unicode, that the encodings are still correct. Basically, any time you're loading data from a database or disk, you need to have a test there, and if it doesn't work, you need to write migration scripts. And if you're using pickles, well, I'm sorry, you're in deep shit.
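A data migration test for pickles might look roughly like the sketch below. The byte string is hand-written here to mimic what Python 2's `pickle.dumps('caf\xc3\xa9', 2)` produces for a UTF-8 encoded byte string, and `load_py2_text` is a hypothetical helper; real data should of course come from fixtures captured from the Python 2 system.

```python
import pickle

# Protocol 2 pickle of the Python 2 str 'caf\xc3\xa9' (UTF-8 bytes for "café").
PY2_PICKLE = b"\x80\x02U\x05caf\xc3\xa9q\x00."


def load_py2_text(data, encoding="utf-8"):
    """Load a Python 2 pickle, decoding its 8-bit strings with `encoding`.

    Python 3's default for this is ASCII, which fails on non-ASCII data.
    Passing encoding="bytes" instead keeps such strings as bytes objects.
    """
    return pickle.loads(data, encoding=encoding)
```

A test would assert that `load_py2_text(PY2_PICKLE)` gives the expected text, and the same pattern extends to whole fixture files. Which encoding is right depends on how the data was written under Python 2, which is why this needs tests rather than guesses.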
27:28
So once all tests pass, or maybe even before, you try to push Python 3 to staging; try to run this on staging under Python 3. This is going to fail the first few
27:41
times, and that's okay. And then, once everything seems to work, test it properly on staging with production data. Check that everything works fine, click through everything, be thorough. And once that also works, you push it to production, or, if you don't have a production environment,
28:02
then you make a release. If you have production and you can actually move one customer at a time to Python 3, do that. Take it slow and careful. If you need to migrate the database to get onto Python 3, try starting everything read-only if possible, so you know that it at least works
28:28
in that situation first, before you enable editing. If you can fall back to Python 2, be prepared to fall back to Python 2 if there's an error. And then, when you have it in production
28:47
and have had it for a few weeks or so, you party. You're done. Yeah, I got it on Python 3! And after a party you have to clean up, and that's not so fun, but once you've cleaned up after the party, you have to clean up the code, and that is a lot of fun. Now you can get rid of all
29:06
those Python 2 backwards compatibility things, and that feels very satisfying. This is a really nice part of the project, getting rid of all the old cruft. See this as an opportunity to just
29:23
prettify your code in general. Just go through it, fix it up, remove anything old and ugly, PEP 8 it, maybe run it through Black to get everything formatted exactly as it should be, and things like this. Make your code feel new and shiny again. It doesn't take very long to do this,
29:44
actually, because you have to go through the code to remove the old Python 2 backwards compatibility things anyway. Prettifying and cleaning up the code in general is something you basically get for free, and in general, even with a big system, this just takes a few days.
30:01
So do it; it feels really nice. And then, done. You're up on Python 3, the code doesn't even run on Python 2 anymore, everything is fine and finished, and you have all the new features of Python 3 and you can start using them. So in summary: stop firefighting, prepare and plan
30:26
in whatever order you want, fix the tests on Python 3, push to staging and production, and then clean up. That's the general plan. Any questions? [Audience question, start not captured] ...and going to 2 plus 3 compatibility
31:13
and support also for the long term, because we are talking about people who will still be using Python 2 in years to come. And then we have another big project related to the first one,
31:26
and after the experience with this, we were thinking of going straight to Python 3 and dropping support for Python 2. Now, I got from your talk that maybe this could have been
31:43
the result of using future, because exactly what we got is that we spent so much time fixing the Python 2 part after migrating to 2 plus 3. The 3 was working well; the 2 was suddenly broken everywhere. So would you recommend, if someone still has to support 2 and 3,
32:06
to really... I mean, is it futurize versus modernize and six? Yeah, I recommend modernize and six, then, if you need to run on both Python 2 and Python 3. Yes, I mean, if you're already using futurize... And we have it, both at BriteCore and at Shoobx, in the list of our requirements;
32:28
futurize is there because it's being used by other packages that we are using. So people are using it successfully, so if you are using it successfully and it's working, then
32:40
that's fine. But if you're not already using futurize, I would recommend against it, because I think it's more trouble than it's worth. Anything else? All right. Yeah, come and talk to me about your experience in trying
33:08
to move to Python 3. That's interesting too. So thank you.