Writing an autoreloader in Python

Formal Metadata

Title
Writing an autoreloader in Python
Alternative Title
Writing a Python autoreloader
Title of Series
Number of Parts
118
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Reloading your code changes quickly is an integral part of the development experience in frameworks like Django and Flask that developers have come to rely on. But how do they work under the hood and what challenges do you face while writing one? In this talk I will explore how I refactored the autoreload implementation in Django 2.2 and the lessons I learned along the way (hint: it's surprisingly complex!). I will also be introducing a library I've developed to simplify this if you ever find yourself writing your own.
Transcript: English (auto-generated)
Can you guys all hear me? All working? Okay. So, yes, my name is Tom. I'm going to talk to you today about writing an autoreloader in Python. I've broken the talk down into four sections. We're going to talk about what an autoreloader is, we're going to
talk about Django's implementation, we're going to talk about how I rebuilt it, and we're going to talk about the aftermath, what happened after I rebuilt it. Sounds good? So, firstly, what is an autoreloader? Like all good programmers, I Googled this and nothing came up, which surprised me. There was no definition of an autoreloader,
although it's a common development term. So I wrote this definition, which sounds sufficiently technical and vague, so it's a component in a larger system that detects and applies changes to source code without developer interaction. So, raise your hands here if you use an autoreloader in your day-to-day life, in some kind of framework. So, yes, pretty much everyone, right? Raise your hands if you could write one, or you
know in detail how it works. So, okay, one and a half people. So this is why I find them interesting. They're really common. Every developer, or most developers use them. They're a critical part of frameworks, like Django. If the autoreloader doesn't work, as you will find out later, it's kind of a big deal, even though they're not
a production thing, they're not really well understood, and they're really language-specific. An autoreloader in Python is very different from an autoreloader in JavaScript. So as an example of an autoreloader, a really simple one would be automatically refreshing your browser tab every time you change an HTML file or a JavaScript file. That's
an autoreloader. So a special case of an autoreloader is a hot reloader, and this is the holy grail of autoreloaders, because they're really fast and really efficient. So it reloads the changes to your code without restarting the system. So a really simple example of this is changing the style sheet on a web page. This is kind of hot reloading.
The browser can take the changes to the style sheet and it can apply the new styles to the page without refreshing the tab. You can hot reload CSS. And these are impossible to write safely in Python in the general case, and I'll tell you why. And a special shout out to Erlang where you hot reload code while deploying. That's how you deploy
code in Erlang. You hot reload it in production. I wouldn't suggest doing that in Python. So you might say, Tom, Python has reload. Isn't that a hot reloader? Isn't this implementation hot reloading a module? So reload does nothing but reimport the module. All it does is you give it a module and it just reimports it. So, yes, this is technically
hot reloading a single module, but you need a lot more before this is a hot reloader. I don't know how well that translates into other languages than English, but what I mean is reloading a single module is very different than hot reloading an entire system or components within an entire system. And the reason for this is dependencies
are the enemy of a hot reloader. And Python modules have lots of interdependencies. So all hot reloaders have one thing in common. They all leverage language or framework features that manage dependencies between things. So in Erlang, for example, everything uses message passing. So if you want to hot reload a component in an Erlang system,
you can just bring it down and you can bring it up again. The messages, there's no dependencies between things. The dependency is message passing, which is quite easy to, it's quite a nice interface to hot reload on. CSS, it's not really a programming language, so you can just take it down, remove the style sheet from the page and add a new one and the browser takes care of the rest. React.js has a hot reloader and it leverages how React components work
themselves. So React is all about removing components from a page and adding them again and having React take care of laying out the page for you or rendering the HTML. So hot reloading a component in React is just deleting the component and adding a new one, which is really quite nice in React because it's how it works. So imagine that you could write a hot reloader in Python. So this is a
little bit wordy. You import a function inside your module. So you have a module, your_module.py, and in it: from another_module import some_function, okay? So you have a reference to that function in your module. You then replace the code in some_function with some new code. So you've
rewritten it, you fixed a bug or something. After your hot reloader kicks in, what does your_module.some_function reference? If it references the old code, then your hot reloader hasn't worked properly, so it's not right. Okay? So you could go through and find all modules or all references, all places that reference some_function. You could
then hot reload those as well, and you could cascade, and then find all the modules that reference the module that references that one, and you can go through the whole tree of objects. This is, it just sounds complicated. It sounds really complicated, and it's really impossible to do in the general case. For any given Python programme, it's
impossible to do that safely. So for limited smaller cases, it may work. For example, IPython has a hot reloader that works in a lot of cases, but it works, it leverages how IPython is just a shell. So you don't hot reload an entire programme, you kind of hot reload parts of the
REPL that you're using. Similarly, if you have a single reference to something, then you can hot reload that safely. You can use reload if you have one reference to one module, you can call reload and you can replace the reference. That's hot reloading, that works. But to do it in the general case, you will end up with bugs, and what's worse than
having an autoreloader that doesn't work is an autoreloader that you can't trust. So if you end up with some bugs in development, hard to track down ones, you're missing, something's not right, and it's because the hot reloader hasn't worked properly, that's a terrible development experience. You're going to be spending time chasing bugs that don't exist. So how do we reload code in Python? We
turn it off and on again. We restart the process on every code change over and over again. So this is kind of like refreshing the browser window every time you make a change to a JavaScript file. You lose all the state in the process, so you lose any connections that are open, et cetera, and it starts again from fresh. This
ensures that the system or the programme is right. It works, pretty much, rather than a hot reloader where you might have some kind of bugs or you can't reload code properly. So this is how the Django autoreloader works. When you run manage.py runserver, Django re-executes manage.py runserver again with a
specific environment variable set. The child process actually runs Django, so it runs the entire framework, it imports all your modules and does all the stuff that you want it to do, and it watches for any file changes. When a change is detected, it exits with exit code 3, and the parent Django process
restarts it. If it exits with another code, it's an unexpected error and it terminates or it shows you a message that's useful. So it's quite a simple loop. You have a process that's a supervisor and it will restart the child process when it exits. This is the most common and simplest form of an autoreloader. A little bit of the history of the Django autoreloader. The first
commit was in 2005. No major changes until 2013 when inotify support was added. kqueue support was added in 2013 and it was removed one month later, which is never a good sign. I will talk about what inotify and kqueue are later on, but the point here is Django code is usually very high quality, and there's
lots of emphasis on testing and readability. The autoreloader stood out to me as definitely an old and crufty part of Django. The code was very different, and purely because it was something that wasn't well understood. It kind of worked, don't touch it, leave it alone. The code was definitely not idiomatic and it was very hard to extend, and it was append-only code, right? Everyone's seen this, you kind of
just chuck features on, et cetera, you bolt it on and you hope it works. So there were some new features that we wanted to add to the autoreloader that just wouldn't have worked with the current implementation, so we needed to rewrite it. So the summary so far. An autoreloader is a common development tool. Hot reloaders are really hard to
write in Python. Python autoreloaders restart the process on code changes, and the Django autoreloader was old and hard to extend. To the fun part, we're going to rebuild the autoreloader. I like breaking things down into sections, so there's three or four steps. First one is we need to find files to monitor. We can't reload on code changes if we don't know what code we're changing or we
need to watch for. We need to wait for the changes and we need to trigger a reload. We need to make it testable, of course, especially if you're refactoring an old implementation, and for bonus points, make it efficient. You shouldn't prematurely optimise stuff, so get it working and then optimise things. Cool. So, finding files to
monitor. Everyone here knows sys.modules; it contains all the modules that are currently loaded by Python. Python has quite a few modules, so just running a hello world in IPython has 642 modules loaded. Python itself, just importing sys and printing len(sys.modules), has 42 modules loaded, so that's quite a few modules. Sometimes
things that are not modules end up in sys.modules. sys.modules is effectively a dictionary and it can be modified by arbitrary Python code. Some libraries do some crazy things, especially in development. For example, typing.io isn't a module, even though it's in sys.modules; it's a class. This was actually a bug in the
Django autoreloader implementation. I naively assumed that things in sys.modules are modules, which isn't true. Python imports are really dynamic as well. It's one of the most flexible and best parts of Python. You can import from zip files, you can import from pyc files.
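The point about non-module entries can be illustrated with a minimal sketch. It is not Django's code; the helper name `loaded_modules` and the fake entry name are made up for illustration. It only assumes that `sys.modules` is a plain dictionary that any code can write to, as described above.

```python
import sys
import types


def loaded_modules():
    """Return (name, module) pairs from sys.modules that are real modules.

    `loaded_modules` is a hypothetical helper name, not part of Django.
    """
    # Copy the items first: sys.modules can change while we iterate over it.
    return [
        (name, module)
        for name, module in list(sys.modules.items())
        if isinstance(module, types.ModuleType)
    ]


# Arbitrary code can put anything in sys.modules, including non-modules.
sys.modules["not_really_a_module"] = object()
try:
    names = {name for name, _ in loaded_modules()}
    print("not_really_a_module" in names)  # False
    print("sys" in names)  # True
finally:
    del sys.modules["not_really_a_module"]
```

Filtering on `types.ModuleType` is one defensive choice; it also skips entries that are `None` or class objects rather than assuming every value has module attributes.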
You can write arbitrary loaders in Python to do random things on imports. So this guy here wrote, in 60 lines of code, an importer that imports code directly from GitHub. So you can do from GitHub underscore com dot whoever dot username import project and it will import that code, download the code from
GitHub, install it or make it available to Python and it's there. Don't do this in production, but there's a lot of magic that can go into imports. They're not as simple as a file in the file system and a module in memory. The more common use cases for these kind of loaders are pytest. Pytest rewrites the bytecode of your test files, so it changes the
assert keywords that you use into a function call that pytest can do things with. Cython as well, which is a library for letting you write C extension modules in a nicer syntax than C. It can import, you can compile the module on import, which is quite handy in development, I guess. So yeah, there isn't
always a mapping between a module and an actual unique file, or you could have two modules with the same file, et cetera. So what can you do? What can you do if someone wants to import code directly from GitHub in development? You can't really do anything. The point here is the imports are very dynamic and not all changes can be detected. So we
can try our best to detect them. And this is a really simple implementation of something to list all the files that are installed, or modules that are loaded. So each module has a __spec__ attribute, and that object has an origin, which is the path to the location, which can be a zip file, et cetera,
et cetera. All of these code samples are really simplistic. So the actual implementation in Django is over 40 lines long. I actually was going to include a slide with it on, but it just didn't work. It was too big. But this is conceptually what we want to do. We want to iterate over sys.modules and we want to return a list of all of the file
paths we want to monitor. Pretty simple. So we found the files we want to monitor. We want to watch for changes and trigger a reload. So all file systems report the last modification time of a file. So there's a function, os.stat. You can give it a file path and it returns a structure. One of the fields on the structure is the mtime, which is
the last modification time of the file. And we can use this to detect changes to a file. And the important thing to note here is that the last modification time is pretty abstract. It can mean different things on different platforms and operating systems. So file systems can be weird. HFS, which
was the default file system on macOS before the latest version, had a one second time resolution. So there were no nanoseconds. In the previous slide, that's the timestamp, including nanoseconds. HFS would just be to the second. Windows has 100 millisecond intervals, so files may appear in the future. Linux, it depends on your hardware clock.
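Reading the modification time with os.stat can be sketched as follows. This is an illustration rather than the code from the slides: the helper name `get_mtime_ns` is an assumption, and the example sets the timestamp with `os.utime` instead of rewriting the file, so that it does not depend on the timestamp resolution of whatever file system it runs on.

```python
import os
import tempfile


def get_mtime_ns(path):
    """Return the last modification time in nanoseconds, or None if missing.

    `get_mtime_ns` is a hypothetical helper name used for illustration.
    """
    try:
        return os.stat(path).st_mtime_ns
    except OSError:
        # The file was deleted, never existed, or cannot be read.
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.py")
    with open(path, "w") as f:
        f.write("x = 1\n")
    before = get_mtime_ns(path)

    # Move the modification time two seconds forward explicitly. Anything
    # can set this value, which is one of the caveats mentioned in the talk.
    os.utime(path, ns=(before, before + 2_000_000_000))
    after = get_mtime_ns(path)

    print(after > before)  # True
    print(get_mtime_ns(os.path.join(tmp, "missing.py")))  # None
```

Using `st_mtime_ns` keeps the value as an integer; how many of those digits are meaningful still depends on the file system, as described above.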
So the current time of the Linux kernel is cached in memory and is updated by some kind of clock every 10 milliseconds normally. Python does a great job of abstracting operating system specifics away, but you really can't escape from the realities of the file system that you're running on. Case in point, macOS has a case-insensitive file system by default, which isn't
something that you can abstract away. So there could be different system calls or different ways that you find the last modification time of a file on different platforms. Python can abstract that away. What the actual modification time means, you can't abstract away. Network file systems can be even weirder, and they mess
things up completely. os.stat is generally really fast, except if it's on a network file system. That could require like a network access, right? So if you're for some reason developing a system on a network file system and that network file system lives on the other side of the world, for whatever reason you want to do that, the stat could have a huge latency. Clocks might be out of sync as
well. If you have two developers working on it, one clock might be completely wrong, one clock might be right. So you end up with one developer writing a file, the other developer reads the file, or the autoreloader kicks in, and the times are different. The times are one year in the future, one year in the past. And the time can be set by
anything. You can change the last modification time of a file arbitrarily. It doesn't mean that the file has been modified, and the mtime not changing doesn't mean the file hasn't been modified. So the reason we use this, despite all those limitations, is it's really easy to implement, it's generally efficient unless you're
running on a really weird network file system, and it has pretty good cross-platform support. So here's a really simple implementation of an autoreloader that uses stat. There's a function called watch_files; we have a dictionary that maps the file paths that we've seen to the modification time as reported by the file system. We have a while True loop, and we
go through and iterate through each of the files returned from the previous function that we wrote. We call os.stat on the path, and we get the modification time, and we get the previous modification time. And if they differ, then we exit with exit code three, otherwise we sleep for one second. So really simple, obviously there's a lot
more to this, if the file doesn't exist, if it's been deleted, et cetera, et cetera, this again is a simplistic implementation. So we've found files for our autoreloader, we can wait for changes, so how do we make it testable? So when I was researching this talk, I went through and looked at
a bunch of other projects that use an autoreloader. It surprised me, there are not many tests for autoreloaders in the wider ecosystem. So the Tornado project has two, Flask has three, and Pyramid has six. And most of these are high level integration tests, they're like spawn a process, touch a file, assert that the process exits with exit code three. The point here is
not to shame these projects by saying, oh, it sucks, they don't have any tests, the point here is it is a hard thing to test, usually. Obviously, these autoreloaders work very well, and more tests doesn't always mean that it works, but it's a hard thing to test. And the reason is, an autoreloader is generally an infinite loop that runs in threads and relies on a big ball of
external state, which is the file system. And each of these things is hard to test by themselves. But they're even harder when you combine them together. So how do we make things testable? And this isn't some crazy idea that I've had, it's just to use generators. So if we make our autoreload implementation a generator, the
only modification we do is add a parameter telling the function how long to sleep for, and we yield after each iteration of the loop. And it lets you write slightly better tests. So this is a simple test. We create a reloader, creates the generator, we call next on it, which ticks, so it has one tick of the loop, then it hits
the yield and returns to this test. We fiddle with a file somehow, we mutate the state of the disk, and we call next again, and it should exit with exit code 3. So we have a way to pause the autoreloader, essentially, and it allows us to make changes to the file system and then resume it. So you can extend this test to work with symbolic links, permission errors, files
being intermittently available, et cetera, et cetera. So we've made it a little bit more testable. And how do we make it efficient? So surprisingly there were two slow parts to the autoreloader in Django. The first one was iterating the modules, which surprised me, and the second one is checking for file system modifications. On an SSD, checking for the file
system modifications with os.stat is really quite fast. Iterating the modules every second was the slowest part, especially if you have a really large Django app with maybe 5,000 modules loaded. So how do we make it efficient? We can just use an LRU cache. So we have a function, the one we wrote before, get files to
watch. We call another function with the frozenset of all of the modules that we have currently loaded. That function, sys_modules_files, takes the modules and it has an LRU cache on it. And it returns the same implementation that we had before. So in reality, like, sys.modules can change, but after an app is booted, it doesn't really change
that much. You might import something in a function, so it can mutate, but in the happy path, it doesn't. So you can just cache the results of all of this. And you can skip all the processing, checking if it's a zip file, resolving symlinks, et cetera. It can all just be cached into a single list and returned without needing to iterate through them. In the
Django implementation, on this MacBook with a solid state drive, it took up 30% of the time of each autoreloader tick, which was quite a lot of time. So can we skip the standard library? Raise your hands here. Has anyone during a debugging experience or process edited a system library file, standard library
file? OK, so it happens, but not very many people. And maybe a specific type of developer would. In the general case, no one really does that. The average developer won't need to. So it would be quite good if we could just skip watching them. We could just skip all of the system packages, all of the standard library. We don't need to rewatch them. They don't really change. This is actually a
lot harder than it sounds, because how do we know where the standard library is? I googled it, I got to a Stack Overflow answer, and I was like, OK, good, this is going to be simple. There were 20 answers, and each of them was different, which is never a good thing. So the first one was this: site.getsitepackages(). That's cool. It's not available in a
virtual environment, so that's no good. We can call this function. That works, but it returns a single path. Some Linux distributions have more than one site packages directory. So I went to IRC, and I asked, and I was like, OK, I feel like I'm pretty experienced with Python. I've never needed to do this before. Why is it so hard? Am I missing something? Someone linked me to a project. I
think it was related to coverage, and I couldn't find the code snippet for this, but it used five or six different ways to try and detect the standard library, and it fell back to checking whether site-packages is in the path of the file. So at this point, it boils down to risk versus reward. It might not be safe to do this in all cases. What happens if your project is called
site-packages for whatever reason? And if you make a mistake, then it's going to frustrate users. The autoreloader won't work in all cases, and that's just not a nice place to be in. And no other autoreloader I could find does this. So the gains could be huge. You could reduce the number of packages or modules you're searching for by 70, 80%. It's not safe
to do in the general case, so it doesn't get done. But what you can do is use file system notifications. So calling stat repeatedly is kind of wasteful. You're just asking, are we nearly there yet? Are we nearly there yet? It would be nice if the operating system could tell you when a file is modified. So you say, tell me when this file is modified, and then you just
wait, and the operating system will tell you. So each platform has different ways of handling this. Watchdog is a Python library, and it implements five different ways, and it's 3,000 lines of code. And all file system notifications on operating systems are
directory-based, whereas we care about files, which makes it a little bit harder, because you get notifications for any file in a directory which has changed, and you need to filter them out and make sure that it's only files you care about. Notifiers are also potentially expensive. They're generally designed for longer-term monitoring. They're designed for a daemon that's watching a bunch of files, and it performs an action when a change is made. In our flow, we're going to
create and destroy them quickly. Every time a Python process shuts down, Django restarts it, has to create a new watch with the kernel, and it's going to use more resources than it should. So this is the actual feature that we wanted to add to the Django autoreloader. It was using a system called Watchman from Facebook. So
Watchman is a daemon that runs on your machine, and it handles all of the icky differences between platforms for you. You register watches with it, it does the right thing, and it returns changes to you over a socket. And it handles Git changes, which is one of the reasons you want to add Watchman in the first place. If you check out a new branch in Git, you're
going to have hundreds of notifications flying at your process, saying that everything's been changed. But with this, it will wait until all of the checkout has finished, and then it will send one single bulk update telling you that the process is finished. Otherwise what might happen is your process is told that one file has been changed, and mid-checkout it's going to restart,
and it's going to be in an inconsistent state if the checkout is still happening after the Django process has restarted. And the daemon can be shared with other projects. So if you have a JavaScript project that also uses Watchman, quite a few of them do, they can share the watches and generally make it more efficient. So this is how we do it in pseudocode with Watchman. We connect to
some kind of Watchman server, we tell it what files to watch, and in the while True loop, we just tell it to wait. This waits on a socket for a message from the Watchman daemon, and if there are any changes, we exit with exit code 3. In this way, we don't write any platform-specific code, and we don't have any issues regarding weird OS X versions that don't
use a particular library or something like that. Cool. So we've made it efficient as well. So the aftermath. The code was much more modern, and it's easy to extend. It was faster, and it can use Watchman if available. There were 72 tests. This is in Django,
and it's no longer a dark corner of Django. I might be a little bit biased in saying that, seeing as I wrote it, but it was certainly, in my opinion, a little bit better. So it's all good. I'm a genius, worked first time, tests are green, ship it, et cetera. Everyone's happy. Not quite. These are all issues from the Django ticket tracker after we released version 2.2,
the new autoreloader. There are quite a few of them, unfortunately. So more tests doesn't always mean that it works. So this is my favourite issue, and the issue is it doesn't work on Windows, essentially, without using Watchman. So it doesn't work intermittently. So I want to
highlight this, because this is a great example of how you can make what seems to be a really simple optimisation that makes sense, and have it completely backfire in a way that you don't know why. So in the Django implementation that I discussed before, we might be watching for a file that doesn't exist yet. Some files, Python files in Django, if
they are there, that's a change. So for example, take models.py: if you were to create a directory and add a models.py file to it, the stat reloader, on the first tick where it detects the file is there, doesn't pick that up as a modification, because it's the first time it's seen it. Only on the second modification, where it
can compare the previous modification time to the current one, does it reload. So I was like, okay, that's a corner case, need to fix that. So we store the time of the last loop, and if the previous mtime is None, which means that we haven't seen this file before, and the modification time of
the file is greater than the time of the last loop, then we reload, okay? This doesn't work on Windows 25% of the time, and I could never work out why. So the process would restart, and it just wouldn't work, but then you would restart it manually and it
would work, and on all other platforms it worked fine. If you know Windows and you want to tell me why this is, please do, because it keeps me up at night and I don't know. But the point here is you get all kinds of strange behaviour across different operating systems, different disks, and different configurations, and simple optimisations can bite you. So keep it simple if you're writing your own, and keep it really simple.
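The first-sighting check described above can be sketched roughly as follows. This is a reconstruction from the description in the talk, not Django's actual code; the function name `changed_paths` and its parameters are hypothetical.

```python
import os


def changed_paths(paths, mtimes, last_loop):
    """Yield the paths that look modified since the previous tick.

    mtimes maps path -> mtime recorded on earlier ticks (absent = never seen).
    last_loop is the timestamp of the previous tick.
    """
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            continue  # the file does not exist (yet), keep waiting for it
        previous = mtimes.get(path)
        mtimes[path] = mtime
        if previous is None:
            # First sighting: only count it as a change if the file was
            # written after the last tick. This is the "simple optimisation".
            if mtime > last_loop:
                yield path
        elif mtime > previous:
            yield path
```

A caller would run this once per tick, record the tick time, and exit the process when anything is yielded. Note that the first-sighting branch compares a filesystem timestamp against a clock reading taken elsewhere, which is exactly the kind of comparison that can behave differently across platforms and disks.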
In conclusion, don't write your own autoreloader. Use this library. This is a library from Pylons called hupper, and it's a fantastic library. In the abstract of this talk you may have seen that I was going to present a library that I wrote myself that took all of this knowledge and distilled it into a library. This is that library, which someone else has written, probably better
than I could. So check it out if you are writing your own framework and you want to add an autoreloader, it's really good. Cool. I'd just like to thank Onfido, which is the company I work for. They're paying for me to come here and give this talk. We are in the business of identity verification. It's a really interesting problem space from the theoretical, like what is an identity, to the more
interesting, how do you handle millions of identity checks as fast as possible, with as little fraud as possible. So if you're interested in any of this, Onfido specifically, come talk to me afterwards, send me an email or check our careers page. Any questions?
So we have time for a couple of questions. So does your autoreloader handle
it properly if editors do weird stuff when saving files, like creating a copy first and then renaming? Can you say that again? I'm sorry. Like, many editors nowadays do safe saves, so they don't overwrite the file but rather create a new one and then replace the original with it. So Watchman handles that for you quite nicely, so
it looks at common patterns where you create a separate file and then you do an atomic move. The stat reloader handles that as well, because it doesn't watch for the new file, the .new file, which is then moved. So as far as it knows, the individual path has been changed but not the other one. Okay, thank you.
Hi. If restarting the process isn't really an option, let's say you have plugins for an
application and you can still kind of control how the code in the plugin looks because you're defining the API. Would you say reloading without restarting is possible or just don't? So the exception where hot reloading works is a plugin system where you have a single reference to that plugin, or you control the plugin and you
know that you can safely delete the reference and reimport it. It doesn't always work if you have, for example, a C extension module that the plugin relies on. It might have some initialisation code, and you can't really safely hot reload those at all. So it depends. You can write a hot reloader in
some specific cases, and a plugin one, a pure Python plugin one, is definitely one of those. But if you run into weird issues, it's safer to just restart the process. So a good implementation might do both. If you can detect a change, if you can somehow diff the changes and work out what needs to be updated, you could hot reload simple changes and
then fall back to a restart if needed. Thank you. Cool. All right. One super quick question. There you go. I tried Watchman years ago when it just came out. Is the API better now, easier to use?
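As a rough illustration of the "hot reload, else restart" idea from that answer, here is a minimal sketch for pure Python plugins. The helper name `reload_plugin` is hypothetical, and it assumes the application holds exactly one reference to each plugin module and replaces it with the returned value.

```python
import importlib
import sys


def reload_plugin(name):
    """Try to hot reload a plugin module by name.

    Returns the fresh module, or None when the caller should fall back
    to restarting the whole process instead.
    """
    module = sys.modules.get(name)
    if module is None:
        # Never imported before, so a plain import is enough.
        return importlib.import_module(name)
    source = getattr(module, "__file__", None)
    if not source or not source.endswith(".py"):
        # Built-in or extension module: not safe to hot reload.
        return None
    try:
        return importlib.reload(module)
    except Exception:
        # Anything odd (syntax error, failing import): restart instead.
        return None
```

A caller might keep plugins in a single dictionary and do `plugins[name] = reload_plugin(name)`, restarting the process whenever the result is `None`. Objects created from the old module's classes are not updated by this, which is one reason the single-reference condition matters.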
The API is better-ish, but it's still a little bit harder to use than I would have liked. The simplistic code that I showed, where you register a file, is nothing like what you actually need to do. It's directory-based, so you need to work out, from the set of files, which common directories you want to watch, minimising the number of directories that you watch. It doesn't take care
of any of that for you. But in general, it's quite nice. I mean, in the simplistic case, you say watch this and you just get notifications on a socket. And it provides utilities for filtering out specific files, regular expressions on the files, et cetera, in a way that's quite cross-platform and takes a lot of code off your hands. But it's definitely more complicated than I would have liked. All right. Let's thank Tom again.
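The "work out which common directories to watch" step mentioned in that answer could look something like the sketch below. It is only an illustration of the idea, with a hypothetical helper name `common_roots`; it makes no Watchman calls and is not how Django implements it.

```python
from pathlib import Path


def common_roots(paths):
    """Reduce file paths to the smallest set of directories covering them all."""
    # Collect each file's parent directory, shallowest first, so that
    # ancestors are always considered before their descendants.
    directories = sorted(
        {Path(p).parent for p in paths}, key=lambda d: len(d.parts)
    )
    roots = []
    for directory in directories:
        # Skip directories that already sit beneath a chosen root.
        if not any(root in directory.parents for root in roots):
            roots.append(directory)
    return roots
```

Each returned directory would then be registered as one watch, with file-level filtering done on the notifications that come back.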
Thank you very much.