Moving big projects to Python 3

Formal Metadata

Title
Moving big projects to Python 3
Subtitle
Did you think the language differences were the difficult bit?
Title of Series
Number of Parts
118
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Next year Python 2 will no longer be maintained. But you have a monster code base with clever tricks and libraries that don't support Python 3, and your data may be stored in a format that is hard to move to Python 3. And that's the easy bit. This talk focuses on the process of moving, not the code changes, because it's the process that is the hard part. How do you get your code into a state where it's ready to move? How do you get the whole team on the boat to Python 3? All Python 3 talks I have seen, including those I have given, and all the texts on how to port, including the book I wrote, focus on the code changes. With increasing backwards compatibility in Python 3 and forward compatibility in Python 2, this has actually become a lesser problem for big code bases.

The extra issues of large, old code bases
- Can you stop adding features? (1 min)
- Separate team vs getting everyone on it (2 min)
- Python 2 compatibility: You need it (1 min)

The steps
- Fix your development process (2 min)
- Replace old libraries, or take over maintenance and port them (2 min)
- Make sure your tests are solid (1 min)
- Run 2to3 but only backwards compatible fixers (2 min)
- Run tests on Python 3 to stop backsliding (4 min)
  - Run all tests: Expensive or slow
  - Store passed tests
  - Detect tests that change
  - Turning it off adds a lot of extra work
- Port all your little utilities and tool scripts (1 min)
- Fix fix fix fix (1 min)
- Add tests with Python 2 data, to test migration (2 min)
  - You might need migration scripts
- Extra careful staging tests (1 min)
- Production: Try, fail, repeat (1 min)
- Clean the code up (3 min)
Transcript: English (auto-generated)
So, my name is Lennart; don't bother about my last name. If somebody asks me how to pronounce it, I get self-conscious and then I mispronounce it.
I was born in Sweden, but I live in Poland with my wife, daughter, two cats and way too many fruit trees. I've been using Python since Python 1.5.2 and I've been working with Python and the web since 2001. I wrote the book on how to move from Python 2 to Python 3.
You can find it in HTML and PDF on python3porting.com. It's open source on GitHub. And I work for Brightcore. At Brightcore, we make the type of software that insurance companies use to deal with insurance policies and claims, so it's not very interesting unless you are in the insurance
industry. We work completely remotely and, yes, we are hiring, so if you want a job and you are looking for remote work, you can talk to me. I'm new to the whole recruiting bit and I've never done this before, but talk to me anyway.
We are not on Python 3 yet. We have just started. It's an ongoing effort. But at my last job, at a company called Shoebox, which is also a very nice company and which is most likely also looking for people, although I should warn you that the system there is insanely complex, we successfully moved this large and insanely complex system to Python
3 last year. So let's step back in time, back to the Stone Age, when you or somebody at your current job made some sort of application in Python.
And this is you, back in the Stone Age, with your web framework. And whoever did this, you or the other person at your job, did such a good job that this application is still running. It's probably a web app.
It's probably on some old version of TurboGears, web2py or maybe even Zope. And you have for years now been bravely running away from Python 3. But you can't run any longer, because Python 2 is committing suicide.
But don't be afraid of Python 3. A lot of people are afraid of it and think it's horrible and bad and everything, but it's not the Killer Rabbit of Caerbannog. It's just a regular old Python. Now, the hard part of porting to Python 3 is getting your system into a state where
it's easy to port, and fixing up your system like this is a benefit for you anyway. The porting itself is quite easy. It's what comes first that is hard. And the first step of that is to stop being a fire department, because many large
organizations are constantly just putting out fires in their applications. And that's not a good situation in which to port to Python 3, because if the changes that you are making to your system as a part of normal development keep breaking it and
turning into problems that you have to fix in a panic, then moving to Python 3 is going to create several of these fires, and that's going to be a big problem. Also, if you are in constant firefighting mode, you don't have time to move to Python 3. So you have to first get development to be normal and calm and regular.
So you have to get out of firefighting mode. Now how to do that is in itself a whole talk or maybe a whole conference, and I, at least, would not be the person to give it anyway, because I'm not a DevOps guy.
I'll just mention some things that I've seen being done to fix this situation. And this slide here assumes that your software is a service of some sort, a web app or some other service, and that you have like a production environment that you need to
keep up. Because that's the firefighting that I've seen and that I know, and I don't even know if you can have firefighting if you have some other sort of application. But if you do have firefighting in another sort of situation, then come talk to me, because I'm interested in hearing why you're
firefighting. So to port to Python 3, you need to have tests, because otherwise you don't know if it's going to work on Python 3. But tests also help with stability. So if you are firefighting and there is a problem, make sure you have a test to make sure that never happens again.
Always add tests. And you have to run those tests, and that means that for any sizable project you need to have continuous integration. If you have a production situation, if you have a production server, you also need staging servers to test things on.
You should have automatic deployment. Deploying the latest release or just making the latest release from master or from a production branch should just be pushing a button. You shouldn't need to do anything more, and everything else should be automatic.
Extra points if this is done automatically every night to a staging server, so you know that your release procedures actually work. And monitoring. Of course, as the previous speaker said, you should know if there is a problem before
your users know it. And there are some Python-specific things you can do, too. You should run in an isolated production environment, and that means a buildout or a virtualenv and maybe some sort of containers. Containers are very in now and have been for several years, and
that helps so you don't get weird interactions with the operating system. Take Docker, for example. For most of you what I'm going to say now is probably obvious, but I only realised it in the last few months while working at Brightcore, so I'm going to mention it because it's new to me.
If you use Docker in production, you quite often have to rebuild the Docker images. For example, every time you have new requirements, you rebuild the Docker images, because a part of the image is the virtualenv where you install all the packages. And if you do that, and some new requirement creates a conflict when installing, you get
that error when building the Docker images, not when pushing to production, and that's a really good thing, because your deployment doesn't mess up production; it breaks when you're building the images. In addition, you can then use those images on continuous integration and maybe even develop
on them so that your developers have exactly the same environment as production. So that's really nice, and it's like, oh, wow, now I understand why everybody is talking about Docker, it took me years. So with all these things in place, your firefighters can take it easy, and you can
go on to preparing, or you can go on to planning, which I'm going to talk about later, or you can do both at the same time. So there's two stages here, preparing and planning, and they are independent, you can do both at any time.
And the first preparation step is that you should pin all the versions of all your packages, every requirement that you have. And if you don't know what pin means, it means that in your requirements file you specify exactly which version, not at least this version or less than that version, but exactly which
version should be installed. And unfortunately I have not found a way to require this in pip, to tell pip that everything has to have a pinned version. So one way you can do this is to verify in the install script, or if you have Docker,
in the image build, what you installed: by running pip freeze you get a list of exactly what you installed, and you compare that to your requirements file, so that you haven't installed something that's not in the requirements file, for example. Another way to do it is to add hashes to the requirements file.
Then you're specifying not just which version, but which exact package to install, so you install a specific wheel or a specific egg or something, you can have several hashes for each version, so you can say all these are okay.
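As a rough illustration of the hash approach: pip's hash-checking mode reads requirement lines of the form `name==version --hash=sha256:<digest>`. The sketch below builds such a line with Python's hashlib. The function name and the sample package name are made up for illustration, and in a real setup the bytes would be read from the wheel or sdist file you want to allow.

```python
import hashlib


def requirement_with_hash(name, version, artifact_bytes):
    """Build a pinned requirement line with a sha256 hash.

    artifact_bytes is the raw content of the distribution file
    (hypothetical input here; normally read from a .whl or .tar.gz).
    """
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return "{}=={} --hash=sha256:{}".format(name, version, digest)


if __name__ == "__main__":
    # "example-package" is a placeholder, not a real dependency.
    print(requirement_with_hash("example-package", "1.0", b"fake wheel content"))
```

Several `--hash` options can follow the same requirement, one per acceptable file, which matches the "all these are okay" idea in the talk.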
This has the benefit that as soon as you specify one hash, pip will refuse to install anything that doesn't have a hash. So that way you know that you are getting exactly what you want when you're installing it. It's extra maintenance, extra work to get all these hashes in, but it also means that
if somebody uploads a malicious package to the Cheese Shop (PyPI), you won't download that by mistake. You know exactly what you're installing. So do one of those, and make sure that you know exactly what you have when you're installing.
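The pinning check described above could look something like the following. This is a minimal sketch, not a complete requirements parser: it assumes simple `name==version` lines, ignores option lines such as `-r` or `--hash`, and does not handle URLs or environment markers. The function names are invented for this example, and the `pip freeze` output is passed in as text so the check can run anywhere.

```python
import re

# A line counts as pinned if it looks like "name==version" (extras allowed).
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?==[^=\s;]+")


def _clean_lines(text):
    """Yield requirement lines without comments, blanks or option lines."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line and not line.startswith("-"):
            yield line


def unpinned_requirements(requirements_text):
    """Return the requirement lines that do not pin an exact version."""
    return [line for line in _clean_lines(requirements_text)
            if not PINNED.match(line)]


def _names(text):
    """Collect normalised package names from 'name==version' lines."""
    found = set()
    for line in _clean_lines(text):
        if "==" in line:
            name = re.split(r"[\[=]", line, maxsplit=1)[0]
            found.add(name.strip().lower().replace("_", "-"))
    return found


def not_in_requirements(freeze_text, requirements_text):
    """Packages that pip freeze reports but the requirements file lacks."""
    return sorted(_names(freeze_text) - _names(requirements_text))
```

An install script or Docker build could call both functions and exit with an error if either returns a non-empty list.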
You might also, as part of preparing, want to increase the test coverage even more, because it's very good to have high line coverage when porting to Python 3, so there are no hidden Python 2 statements somewhere that you missed in the porting. What percentage of test coverage you want is really a matter of opinion.
100% is obviously awesome, but for a big project that's generally unattainable. 90-95% maybe seems reasonable. And you can bridge the gap by reading all the lines that are uncovered: before every big release, or at least before you try to do the last big pushes,
you actually check all the uncovered code and just read it manually, because at some point that gets easier than writing tests for it. When testing, there's one big thing that you might encounter, and there's this philosophy
when it comes to unit testing that you should test each function separately, you should have one test for one function, and every call from that function out to other functions
should be mocked. But if you do that, you only test that the function is doing what you tell it to do. You don't actually test that it works. And if the API it calls then changes, the test will still pass, and this is a huge problem
with Python 3, obviously, because the standard library changes. So this type of testing is practically useless when porting to Python 3. So if you are doing this in your unit tests, if you have this principle and follow it, mocking out all the calls from a function when you test it, then you need to have 95%
coverage from your integration tests; your unit tests you can basically ignore. After this you need to upgrade your dependencies. You have to make sure that you're using the latest Python 2 compatible version of all your dependencies.
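To make the earlier point about fully mocked unit tests concrete, here is a small sketch. Everything in it is hypothetical: `Storage` stands in for a dependency whose API became stricter (it now insists on bytes, much like many APIs did in Python 3), and `store_greeting` is "old" code that still passes text. The mocked test keeps passing, while the test that uses the real object catches the problem.

```python
from unittest import mock


class Storage:
    """Hypothetical dependency whose save() now requires bytes."""

    def save(self, key, data):
        if not isinstance(data, bytes):
            raise TypeError("data must be bytes")
        return len(data)


def store_greeting(storage, name):
    # Old-style code: builds a text string and hands it to save().
    return storage.save("greeting", "hello " + name)


def mocked_unit_test():
    """Passes: it only checks that save() was called as we told it to be."""
    storage = mock.Mock()
    store_greeting(storage, "world")
    storage.save.assert_called_once_with("greeting", "hello world")
    return True


def integration_style_test():
    """Fails: the real Storage rejects the text string."""
    try:
        store_greeting(Storage(), "world")
    except TypeError:
        return False
    return True
```

The mocked test reports success even though the code cannot work against the real dependency, which is why coverage from tests that exercise the real calls matters for the port.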
And after you have done that, you have to make sure that all of the dependencies you have are also Python 3 compatible. And you may have to replace or, in the worst case, port those dependencies. But since those are separate packages, that's
generally relatively easy to do unless the package is highly magical, and by today most highly magical popular packages already support Python 3. And if they don't, like for example MySQL-python, there are forks of them that people are moving over to that do support Python 3.
So this stage can take a significant time, especially if you have not been keeping your dependencies up to date. I have met people and talked to people here that are still running on like Python 2.6 because they actually can't upgrade to Python 2.7 and stuff like this. So if you're in that situation, expect this
to take time. And then you come to stage 3, planning, or you already did it. And planning here is a lot about how many people in your team should do the porting, should all of them be involved, and should you move to Python 3 directly
or should you have Python 2 and Python 3 compatible code for a while. And there's three questions there I have for you. The first is: can you stop adding features and stop firefighting, and for how long can you do that? Because porting will in the best
case take two weeks, and in the worst case, even if you do everything in one go, it can still take months. Can you stop adding features and stop firefighting that long? Do you have some deep magic that only a few of your developers understand? Because that
deep magic has a big risk of being difficult to port to Python 3, and that bit will then block everything else. And how big is your team? If you have 50 people you can't put all of them on porting to Python 3; that's just a logistical nightmare. The mythical
man-month remains mythical even with Python 3. So you can put 10 people on doing this, maybe more; 20 I think is stretching it, unless you are very good in your organisation
at putting a lot of people on doing one thing. And if your system is already split up into multiple separate services, then you can put one team on each of these services, so you can easily put 5 or 10 people on each service, and then you are way ahead
of the game. But most of these old systems are monoliths. So there are some different strategies here. One is to do it all in one go. If you don't have deep magic and you can stop adding features for a month, why not do it all in one go? Well, it takes less time to do it, and it's less work in total: a little bit, but
not a lot less work. And you can aim directly for Python 3 code, which is a benefit and speeds things up. But there is a high risk in doing this. Say you put all your 7 developers on porting to Python 3 for two
weeks and then you discover that there is some huge issue that means you have to stop right now. Well, then you go back to adding features and fixing bugs, and your two branches are going to start to diverge. And there is a risk that when
you start with Python 3 again half a year later, you basically have to throw away all the work that you did during those two weeks. So it's a very high-risk strategy. And of course all other work has to stop. So slow and steady is a safer strategy. And this means that you aim to write code that
will run under Python 2 and Python 3 at the same time. You keep running it on Python 2 in production until everything works under Python 3, and then you can switch. This is the low-risk version. It doesn't disrupt normal operations. It's a little bit more work
and, more importantly, it takes longer, because you're still going to do all your other work at the same time, so Python 3 gets pushed a little bit to the side, and it can take half a year to get through all of this because not everybody is working on it.
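For readers who have not written dual-version code, the following is a small sketch of what "runs under Python 2 and Python 3 at the same time" can look like without any third-party library. The helper names are chosen for this example only; compatibility layers such as six provide helpers of this kind, so in a real project you would more likely import them than write your own.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821  (this name only exists on Python 2)
else:
    text_type = str


def ensure_text(value, encoding="utf-8"):
    """Return value as text: unicode on Python 2, str on Python 3."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


def sorted_items(mapping):
    """Iterate over a dict in a way that works on both versions.

    Avoids dict.iteritems(), which no longer exists on Python 3.
    """
    return sorted(mapping.items())
```

The point of the design is that version checks live in one small place, so the rest of the code base can be written once and cleaned up easily after the switch.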
And of course you need dual version support, which means it takes a little bit more work. What you can do, if you have a development team that is small enough to fit into one big house, is start with a Python 3 sprint for all the developers, but not aim
for Python 3: aim for Python 2 and Python 3 compatible code, so it runs on both. That way, when you come back half done, you can switch to having a dedicated team do the last bit, or just do it as a background task when you don't have anything really critical. And this is what we did at Shoebox. We rented a house in southern Spain
during the winter, when it's low season, so it was cheap. We got almost all the developers in there and we tried to move to Python 3 for a week. And we got almost the whole way there; we got a fair bit done. Of course we weren't done, but
we had solved most of the critical issues, and it's a lot of fun to get everybody into one room and just hack away on something. So this is low risk because you're aiming for Python 2 and Python 3 compatible code. It only disrupts your normal operation briefly, for a week
or two or however much you want to take. And everybody gets on board and feels involved, which is good. It's not just one or two guys sitting in a corner porting to Python 3 while everybody else sits and goes, oh, Python 3; Python 2 was good, we
shouldn't have, and stuff like that. So everybody gets involved, which is good. The drawback is that you still need dual version support. It's still fairly slow, although not as slow as the really slow version. Then you come to the actual porting stage, and there's
several things you need to do here. You will now start to run your tests under Python 3. This will obviously fail and that's okay, but your continuous integration system still needs to
run them under Python 3 and make sure that as much as possible runs under Python 3, because otherwise people will add back incompatible code. And if you have some people trying to port to Python 3 while other people are adding Python 2 code, you're going to backslide and
you're never ever going to get done. The trick to stopping this is continuous integration. But of course you cannot just let your continuous integration say no, this failed because it doesn't work on Python 3, because in the beginning basically all tests will fail, or
in fact it probably won't even be able to find the tests in the beginning. So what you need to do is get your CI gurus, the people who know your continuous integration system well, to set it up to keep track of which tests once passed
under Python 3. And if they passed under Python 3, then the CI run should fail if that test no longer passes under Python 3. That way, every time you change something and some tests stop working under Python 3, you're going to have to fix that. And sometimes that means that you make an itsy-bitsy little change,
you just fix a little bug, and suddenly lots of things that used to work under Python 3 no longer work under Python 3, and then you have to spend a whole day fixing all this. Sometimes it's the tests that need fixing. It's really boring work, but these things happen and you have to do that to stop the backsliding. We turned it off briefly
for a firefighting thingy at Shoebox and forgot to turn it back on for a month or two, and there was loads of incompatible code added during this time, even though everybody knew they shouldn't do it. It just happens by mistake, and you basically have to go on
and fix a lot of issues that you had already fixed once. So that's really annoying. So have this. Stop the backsliding. I'm sure you know what 2to3 is already. It's this tool that will convert Python 2 code to Python 3 code. What's really helpful
here is to use modernize. Modernize is a set of extensions to 2to3 that will convert Python 2 code to code that is compatible with both Python 2 and Python 3, and it does this by using the six compatibility layer. There's another compatibility layer called
python-future. It also has its own 2to3 extensions, but python-future inserts a lot of magic, primarily into Python 2 to make it look more like Python 3, and this magic has bitten me several times, so my recommendation is to not use python-future but to rely on
python-modernize. And as I mentioned, the first errors you will get are errors that actually prevent you from even finding the tests to run. The test runner won't find anything because you will just get import errors everywhere, and behind those import errors
there's usually either other import errors or syntax errors. So you're going to have to fix that. And the way to fix that, especially in the beginning, is to figure out what is wrong, find one of these python-modernize or 2to3 fixers that will fix
that specific problem, and then run it. Maybe even just on the file where you had the problem, because if you start by running python-modernize on everything and go on from there, then when you find errors, those errors may
be in lines that have already been changed, and then you don't know if that error was really there from the start or if it was introduced when running the fixers. So in the beginning you need to do this slowly and carefully, one fixer at a time, maybe even one file at a time, just to fix that file, and then you run
the tests again, and the import error you get is in some other location. Good: then you fix that, and then you go on to the next import error. You could of course,
once you find the error, go, oh, I know what to do here, this is an easy thing to fix, and it's tempting to just change the code, save it and run the thing again. But the problem is that the next error you get, three lines down, is the same thing
again, and doing that quickly gets very boring. So use these fixers on whole files so it doesn't get so boring, because they will fix several places at one time, but do one fixer at a time. And then you just fix, fix, fix, fix. And this is where the book is finally useful,
because the book is about how to fix these errors. As you get more confident, you can start running those fixers on maybe a whole directory at a time and things like this, because you're starting to get a better feel for what is happening. But if you run it
on a lot of files at once and there are several of you doing this, you're going to get merge conflicts, so this is why it's good to do it one file at a time if there are many of you. Don't forget that you have scripts in your development environment. Usually you have some
sort of helper scripts to create test data, or to copy databases from production so you can test on real data locally, these kinds of things. Loads of these little helper scripts are going to have to be ported too. If they run in a separate virtual environment, you can actually do that first, as practice; it's a good way to get up and running on porting.
If they run in the same virtual environment as your main application, that's usually because they import that application to do things, and then you're going to have to port them last. But don't forget that these also have to be ported,
and the sooner the better, basically. You also need to write data migration tests. You have to take the data that you have that was generated under Python 2 and make sure that you can still load it under Python 3 and that you get the right thing: that you get Unicode when you expect
Unicode, and that the encodings are still correct. Basically, any time you're loading data from a database or disk you need to have a test there, and if it doesn't work you need to write migration scripts. And if you're using pickles, well, I'm sorry, you're in deep shit.
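A data migration test of the kind described above might be sketched like this, using the pickle case as the example. `PY2_PICKLE` is written out by hand to mimic what Python 2 produces for a protocol 2 pickle of the byte string `'caf\xc3\xa9'` (UTF-8 for "café"); in a real test you would load fixture files that were actually written by your Python 2 system. The `load_legacy` helper is a name invented for this sketch, and UTF-8 is an assumption about how the old data was encoded.

```python
import pickle

# Hand-written stand-in for data pickled by Python 2 (protocol 2, a str
# containing UTF-8 encoded bytes). Real tests should use real fixtures.
PY2_PICKLE = b"\x80\x02U\x05caf\xc3\xa9q\x00."


def load_legacy(payload, encoding="utf-8"):
    """Load a Python 2 pickle under Python 3, decoding old str as text."""
    return pickle.loads(payload, encoding=encoding)


def load_legacy_as_bytes(payload):
    """Load a Python 2 pickle, keeping old str values as bytes."""
    return pickle.loads(payload, encoding="bytes")


def test_legacy_pickle_gives_text():
    value = load_legacy(PY2_PICKLE)
    assert isinstance(value, str)
    assert value == "caf\u00e9"
```

Loading the same payload with the default settings raises a UnicodeDecodeError under Python 3, because the default encoding for Python 2 strings is ASCII. That is the sort of surprise these tests are meant to surface before production does.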
So once all tests pass, or maybe even before, you try to push Python 3 to staging: try to run this on staging under Python 3. This is going to fail the first few
times, and that's okay. And then once everything seems to work, test it properly on staging with production data to check that everything works fine. Click through everything; be thorough. And once that also works, you push it to production, or, if you don't have a production environment,
then you make a release. If you have production and you can actually move, like, one customer at a time to Python 3, do that. Take it slow and careful if possible. If you need to migrate the database to get onto Python 3, try starting everything read-only, so you know that it at least works
in that situation first, before you enable editing. If you can fall back to Python 2, be prepared to fall back to Python 2 if there's an error. And then, when you have it in production
and have had it there for a few weeks or so, you party. You're done. Yeah, I got it on Python 3! And after a party you have to clean up, and that's not so fun. But once you've cleaned up after the party, you have to clean up the code, and that is a lot of fun. Now you can get rid of all
those Python 2 backwards compatibility things, and that feels very satisfying. This is a really nice part of the project, getting rid of all the old cruft. See this as an opportunity to just
prettify your code in general. Just go through it, fix it up, remove anything old and ugly, PEP 8 it, maybe run it through Black to get everything formatted exactly as it should be, and things like this. Make your code feel new and shiny again. It doesn't take very long to do this,
actually, because you have to go through the code to remove the old Python 2 backwards compatibility things anyway. Prettifying and cleaning up the code in general is basically something you get for free, and in general, even with a big system, this just takes a few days.
So do it; it feels really nice. And then, done. You're up on Python 3, the code doesn't even run on Python 2 anymore, everything is fine and finished, and you have all the new features of Python 3 and you can start using them. So in summary: stop firefighting; prepare and plan
in whatever order you want; fix the tests on Python 3; push to staging and production; and then clean up. That's the general plan. Any questions?

...and going to 2 plus 3 compatibility
and support, also for the long term, because we are talking about people who will still be using Python 2 in years to come. And then we have another big project related to the first one,
and after the experience with this we were thinking of going straight to Python 3 and dropping support for Python 2. Now, I got from your talk that maybe this could have been
the result of using future, because exactly what we got is that we spent so much time fixing the Python 2 part after migrating to 2 plus 3. The 3 was working well; the 2 was suddenly broken everywhere. So would you recommend, if someone still has to support 2 and 3,
to really... I mean, is it futurize versus modernize and six?

Yeah, I recommend modernize and six if you need to run on both Python 2 and Python 3, yes. I mean, if you're already using futurize... and we have it, both at Brightcore and at Shoebox, in the list of our requirements.
Futurize is there because it's being used by other packages that we are using, so people are using it successfully. So if you are using it successfully and it's working, then
that's fine. But if you're not already using futurize, I would recommend against it, because I think it's more trouble than it's worth. Anything else? All right. Yeah, come and talk to me about your experience in trying
to move to Python 3; that's interesting too. So thank you.
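The backsliding check from the porting stage of the talk (keep track of which tests once passed under Python 3, and fail the CI run if one of them stops passing) can be sketched as follows. This is only one possible shape for it, under stated assumptions: test results arrive as a dict mapping a test id to True or False, the baseline is kept in a JSON file, and all function names are invented here. How you extract the results from your test runner and where your CI system stores the file are left open.

```python
import json
import os


def load_baseline(path):
    """Return the set of test ids that have passed under Python 3 before."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return set(json.load(fh))


def save_baseline(path, test_ids):
    """Store the baseline as a sorted JSON list, which diffs nicely."""
    with open(path, "w") as fh:
        json.dump(sorted(test_ids), fh, indent=2)


def check_ratchet(previously_passing, current_results):
    """Compare a Python 3 test run against the stored baseline.

    Returns (regressions, new_baseline). A test that once passed and is
    now failing or missing is a regression. The baseline only ever grows,
    so renamed or deleted tests must be removed from it by hand.
    """
    now_passing = {test for test, ok in current_results.items() if ok}
    regressions = sorted(previously_passing - now_passing)
    return regressions, previously_passing | now_passing
```

In a CI job, the build would fail whenever `regressions` is non-empty and would save the new baseline otherwise, so early runs where almost everything fails are still green.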