
How to read (code)

Formal Metadata

Title: How to read (code)
Number of Parts: 118
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
When you learn a new language, like French or German or even English, you first learn how to read. Then you learn how to write. When you learn a new programming language, you first learn how to write. And that's it. Imagine that you were never formally taught how to read, and that you were told you should just figure it out … by writing … a whole bunch. How would that even work? I don't think it would. If you can't read, you can't write. It's that simple. Do you think Shakespeare would be Shakespeare if he had never read a single book in his entire life? No. Nothing is created in a vacuum. Good writers are good writers because they're good readers. Just as reading is an invaluable skill, so too is reading code. It's a skill that's never formally taught, but it's essential nonetheless. In this talk I'll show you how to effectively read code so that you might write better code.
Transcript: English (auto-generated)
All right, how to read code. These examples will look a bit different: I'm going to give you an Atom example. We've seen a lot of VS Code and a lot of PyCharm, but my preferred editor is Atom, though I believe you could apply many of these
ideas in whatever IDE or editor you prefer. To get this kind of interactivity in Atom, I've installed a plugin called Hydrogen. Hydrogen plugs right into Jupyter and lets you execute code line by line. Why am I doing this? Well, live coding is
a little bit crazy, so this is going to be live coding with training wheels. Everything has been pre-baked so that I don't have to type. You can see I'm quite
is that reading code is a lot like archaeology: you're not going to read it from top to bottom. You have to scrape and dig to figure out what's going on inside. If you don't like the archaeology metaphor, perhaps surgery is
more appropriate. Either way, reading code is really about understanding what a specific block of code is trying to do. Over the next 30 minutes, I'm going to take you through an example of reading some code. To make it a bit fun, I thought, hey, let's try
to build a rock, paper, scissors bot. I had this idea because I'm teaching a full-time data science program in Canada, and my students really needed practice finding a block of code that we could leverage, then retool, inspect, and see exactly
what was going on. I chose rock, paper, scissors because humans are terrible at generating random numbers. We think we're quite good, but we're not; we have predictable patterns. I read on Reddit a couple of days ago that in rock, paper, scissors you should just
always pick paper, because men tend to choose rock. Building on that thesis, I suspect that someone might play rock, rock, paper, or paper, paper, scissors: there's going to be some kind of predictable pattern. To dissect that pattern, I thought I could use a Markov chain to try
and figure out what was going on. Markov chains are quite complex, though, so I wanted a starting point and found one by Googling. This is a wonderful article, guest-posted by Ankur Nankan, who apparently has a book. I read through the blog post, literally just grabbed his code, and wanted to better understand what was going on. All of these
examples (they're not really slides) are available at this GitHub link, so if you want to follow along, please grab it. And if you want to read the post from which this is derived, you can go to this website. But the first step in reading any piece of code is to pretty much just copy and paste it,
and so that's exactly what I've done here. Here is the block of code that I lifted, and straight away it doesn't seem too bad. I can sort of understand what's going on: there seems to be an initialization method, a next_state method, and a generate_states method.
It seems to be using NumPy. I'm not quite sure what this part is doing just yet; we'll have to give it a closer look. But when I'm lifting code, I like to put it into Atom so that I can instantiate all of it and just run it to see what's happening.
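Since the lifted code only appears on screen, here is a minimal sketch of the kind of class being described, reconstructed under assumptions: an initializer that passes the matrix through np.atleast_2d and builds two mirror-image dictionaries, a next_state method that samples with np.random.choice and its p argument, and a generate_states loop. It is not the blog post's verbatim code, and the transition probabilities are made-up example values.

```python
import numpy as np


class MarkovChain:
    """Reconstructed sketch of the lifted class (not the original post's exact code)."""

    def __init__(self, transition_matrix, states):
        # Coerce the nested list into a 2-D NumPy array.
        self.transition_matrix = np.atleast_2d(transition_matrix)
        self.states = states
        # Two mirror-image lookups: state -> row index, and row index -> state.
        self.index_dict = {self.states[index]: index for index in range(len(self.states))}
        self.state_dict = {index: self.states[index] for index in range(len(self.states))}

    def next_state(self, current_state):
        # Take the current state's row and sample the next state with those weights.
        row = self.transition_matrix[self.index_dict[current_state], :]
        return np.random.choice(self.states, p=row)

    def generate_states(self, current_state, no=10):
        future_states = []
        for i in range(no):  # i is never read inside the loop
            next_state = self.next_state(current_state)
            future_states.append(next_state)
            current_state = next_state
        return future_states


# Assumed example values: row r holds the probabilities of moving from states[r].
transition_matrix = [[0.8, 0.19, 0.01],
                     [0.2, 0.7, 0.1],
                     [0.1, 0.2, 0.7]]
weather_chain = MarkovChain(transition_matrix, states=["Sunny", "Rainy", "Snowy"])
print(weather_chain.next_state(current_state="Sunny"))
print(weather_chain.generate_states(current_state="Snowy", no=5))
```

Note that state_dict is built but never used by these methods, and the loop variable i is never read; both are the kind of detail the inspection steps in the talk surface.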
The article provided this transition matrix, which defines where you are and where you're likely to go next. It seemed a bit clunky, but I'm just going to try it and see what happens. I'll stuff it into an instance of the Markov chain, call it weather_chain, and
then define the states. That seems to be what the author was doing with this block of code. So I'm just going to run it and see what outputs it generates. There seems to be an argument in next_state called current_state. Okay, that makes sense: what does a Markov chain do? It takes your current
state and predicts the next one. So if I just run this method call, okay, hmm, it's generating some outputs. Occasionally you see Rainy, but most often it hits Sunny, because that's what the transition matrix favors. Passing in Snowy looks like this. And if I look at the next method implemented in
this class, there's generate_states. It seems to take a number; we'll see what's going on here. So far, all I've done is copy and paste the code and run the pieces in my Atom editor with Hydrogen. I haven't done anything
interesting. I understand what I need to put into it, and I understand what I'm going to get out of it, but I don't really understand how it works just yet. I see it's using a dictionary, and I don't really know what this piece is doing. To better understand and read this code properly, I will drop it
into a brand new file, and I will try to do some surgery, surgery that is as minimally invasive as possible. So what I've come up with is: let's grab the pieces from the script before, the transition matrix and the states, and load those up. And my version of minimally
invasive surgery is actually to chop off the head. What I recommend, when you're dealing with a class and want to understand exactly what's going on inside it, is to take all of the code and literally just lop
off the head. So if you do that, I'll just comment out those header lines, and I'll replace them with something that wasn't intended for this: I'm going to grab Mock from unittest and build up this self object. This will allow
me to keep all of the pieces of code pretty much exactly as they are. So I'm going to lop off the head and bring in Mock as self. This object doesn't look like much, but it lets me proceed with business as usual. So now I can run each of these pieces of code and
everything will just work. If I go look at self now (and this is the amazing part about Atom and Hydrogen), I can highlight bits of code and see exactly what's going on inside them. So once I've
lopped off the head and brought in this mock self object, it's time to dig deeper and see what's going on in the blocks of code that I don't fully understand just yet. This seems confusing to me. Okay, that
seems to be where all the heavy lifting is happening. Now I want to inspect it more closely and run a fine-tooth comb through all of these pieces. This is my silly little inspecting GIF. So right now we have states: Rainy, Sunny, and Snowy. This is the transition matrix that matches up to
them. I want to see exactly what each of these pieces is doing, though. So, working on the mock self object, I'm going to see what this line does. Okay, it just looks like it's turning the matrix into a NumPy
array and making sure that it is at least 2D. Cool. I'll put states into self.states, and now I want to see what this next piece is doing. Right away, something doesn't sit right: we're building two different dictionaries that seem to do the same thing. It
looks like one maps the states to integers, and the other maps the integers back to the original states. I think I understand why: we're going to index into a transition matrix, and a NumPy array is indexed by integer positions, not by labels like Sunny. So I get it, but I think I'm going to have to come back to these
pieces. They seem okay for now. The code works, and I love code that works; code that works is code that works. But I think we can do a better job, so I'm going to come back to these. For now, this is just a big, long piece of code to effectively map the states to integer positions and back. So I'll go
grab a current state, Sunny, and stuff it back into next_state. I'll run this piece and, oops, I must have run something out of order, so let me run everything again. There we are. It's still giving me state outputs, but I
think we now need to go deeper on this object. Again, with Atom and Hydrogen, reading code is pretty damn easy: I can inspect exactly what's going on from the inside out. So current_state, we know, is
defined as Sunny. I can highlight it and see, okay, that's Sunny. Going into index_dict, which is just a dictionary, I pass in Sunny and get back zero. Okay, yeah, that totally makes sense: we are mapping Sunny to zero. Then I think what's happening here is that this is your transition
matrix, and with a little bit of subsetting we can pass in a row and columns. This would be the zeroth row and all of the columns, so I would expect this block of code to give me the first row back. That seems to be
exactly what it does. Now np.random.choice: what I know about it is that if I drop in a list like one, two, three, it will give me back one of those elements at random. It seems this p argument sets the probability of drawing each element of the array passed
in. So I'm passing in the states, and the row from the transition matrix supplies their probabilities. Okay, cool. I sort of know what's going on now; I hope you can follow as well. With these pieces, I'm now going to generate, or rather
see what generate_states is meant to do. I'll build this loop, and now I'll take the loop apart to better understand what's going on. I've already done this, but effectively I've copied and pasted the loop body, and instead of the for i in range(...) line, let's just start with one iteration and see what that
does. So we build the future_states object, set the count to one, and run our next_state. Oh, that's the thing we just teased apart up here. So we run next_state, store the result in a variable, append it to future_states, and then set current_state to the state that
we just generated. Looking into future_states now: oh, it's Sunny. Actually, it seems like this i isn't even used inside the loop. By breaking the loop apart and seeing what was actually going on, I notice that I could replace i with an underscore to denote
that it doesn't really matter. It doesn't matter which iteration of the loop we're in; it just matters that we run the loop a whole bunch of times to generate a bunch of states. Okay, so we've gone a little bit deeper. Now I want to make some changes. I saw that this doesn't feel all that right,
and this is probably pretty verbose. I'd like to make it more legible so that others reading my code can better understand what's going on in here. So I'll move into the adjust script. And while I'm making these adjustments: I was talking about rock, paper, scissors, and I'm giving you
Sunny, Rainy, and Snowy. Those were the states the original post used. So I'm going to take Sunny, Rainy, and Snowy, and, looking at this transition matrix now, I don't even know how I would codify
rock, paper, scissors into it. What I'm trying to do is build a rock, paper, scissors bot that tries to beat a human by exploiting the fact that humans probably aren't the best at generating random numbers. And it seems like this transition matrix is doing most of the work. How
would I even build it in the first place? I'm probably going to get a sequence of observed states: rock, paper, scissors, or, in a string of days, Sunny, Sunny, Rainy, Rainy, Rainy, Sunny, Snowy, Snowy. It's probably going to look like this, and a hand-written transition matrix isn't all that helpful. I'm going to need to figure out how to move my
states into something that looks sort of like this, and back out the transition matrix. So as we move forward, let's switch to rock, paper, scissors: replace Sunny with rock, Snowy with paper, and Rainy with scissors. So I'll do
that here. I'll build up the states. When I reached this point in trying to read this code, I Googled how to build a transition matrix and found this post. You can go look at it, but in the interest of time I'm just going to drop it in, copy and paste it, and run
it, because that was the first recommendation. So the code I grabbed from Stack Overflow to build one of these transition matrices looks like this. Immediately, I'm not really sure what's going on, but I'll execute the code and try to get to an output. It seems like the last bit is stored in
Probs. Let's see what Probs holds: it looks sort of like a matrix, except it's been normalized across the entire matrix rather than row by row. That's not precisely what I want, but it gets me pretty close. So I'm going to backtrack a little and see what was going on in this whole Window function
business. Window took the states and got me closer to Probs. So I'll execute this block of code, and oh no, it's a generator, so I'll explode it with list() and inspect the first five pieces. It's
pairing things up. Okay, what's going on here? Rock, rock, paper, rock, scissors. Oh, I see what it's doing: it's building rock to rock, rock to paper, just chaining each piece to the next piece. Well, hmm, this seems like a
big old function just to pair the place you're at with the next place. When I was reading this, a light sort of went off: I could just take the list, stagger a copy of it by one, and zip the two together. zip is a vanilla Python function that comes batteries included.
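The stagger-and-zip idea can be sketched as follows. The Stack Overflow helper itself isn't shown in the transcript, so window below is a hypothetical generic sliding-window generator standing in for it; the point is that zip(states, states[1:]) yields the same consecutive pairs with no helper at all.

```python
from itertools import islice


def window(seq, n=2):
    """Hypothetical sliding-window generator (a stand-in, not the borrowed code)."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        # Drop the oldest item and append the next one.
        result = result[1:] + (elem,)
        yield result


states = ["rock", "rock", "paper", "rock", "scissors"]

# A generator has to be "exploded" with list() before you can look inside it.
print(list(window(states))[:5])

# Batteries-included version: pair each item with the one staggered after it.
pairs = list(zip(states, states[1:]))
print(pairs)
# [('rock', 'rock'), ('rock', 'paper'), ('paper', 'rock'), ('rock', 'scissors')]
```

zip stops at the shorter of its inputs, so the staggered copy simply drops the last element's missing successor.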
If I stagger the list, zip it up, and then peek inside with list(), I can replicate everything this function did up here. By running the code and getting to the output, something clicked:
I could just reach for zip and replace all of that nonsense with code that looks like this. We're still not there, though; we need to get to probabilities. We're trying to take states and move them into a transition matrix so that we can build our Markov chain. So, taking one step forward, we will, I think,
go into counts and see what it was doing. So counts, it looks like, huh, was just going from one state to the next and counting the transitions up. Well, I know there's another Python tool for that in collections called Counter. So what I can do is take our zipped-up, staggered states,
force them into a list, put that into a new object, and run Counter on top of it. Running Counter, I think we get (rock, rock): 5, (rock, paper): 2, and so on, pretty much the same output. Okay. Now, with these pieces,
it looks like this is just a dictionary with tuples as the keys and the counts as the values. Maybe I can call items() on it, unpack each entry, and capture x, y, and count in each of these places,
and then stuff it into a pandas DataFrame with those locations and counts. So I've taken someone else's code from Stack Overflow, read through it, executed it, seen what was going on,
and now I'm almost there. It's still not perfect, but the original code wasn't perfect either. So with this, I'm going to leave the adjust script and refactor my code over here. Running from the top again, we'll bring in rock, paper, scissors. And actually, before we get to the transition matrix creation
step, I think we need to solve the problem I wasn't thrilled with when we started. We've got these two dictionaries doing the same thing (well, not exactly the same thing, but the reverse of each other). I don't want them separate; I want it all packaged together, so I can run it and keep things clean. As a data scientist, I know that in my toolbox
I have something called LabelEncoder, which is imported from sklearn.preprocessing. LabelEncoder is an overpowered tool, I think. It lets you say, hey, the states we had are rock,
paper, and scissors. We can fit the states with this LabelEncoder, or call fit_transform, which takes all of these states and transforms them into integers. Everything needs to be an integer because we're doing index locations in a matrix, and we can't just plop rock,
paper, scissors in there. So with this encoded chain and the LabelEncoder, I can now call a method named inverse_transform, which takes the generated chain and maps it back to labels. That's exactly what this block of code was trying to do. Peeking at what was going
on in this thing, it seems I was mapping rock to zero and zero to rock. Well, LabelEncoder does exactly that, with a built-in inverse_transform method. So I'll take this chain that's been transformed,
and do the same bits we did in the last script: zip the staggered chain together, run Counter on top of it, and then build out a matrix to pre-populate. To do this, I'll find out how many unique states there are (it should just be three) and quickly build a matrix of zeros with a list
comprehension. So this would be zero, zero; zero, one; zero, two. Now I can unpack the whole Counter object and fill the matrix in with some vanilla Python. So this is now my matrix. I think we've got
it. I'll put it into a function called chainToMatrix (it's the exact code above) and run it on top of our chain. So here is the matrix, but we need to normalize it so that I can
pass it into the function that samples from the transition matrix: each row needs to be a probability distribution that sums to one. So is this normalized? No, not yet. But my normalize function can be pretty damn easy: it takes the sum of the row and divides each entry
in the row by that sum. When I run this on a couple of examples, treating each as a row, each sums to one. Because it's a matrix, I need to apply this function to every row, so I'll just iterate: for row in matrix,
normalize it, append it, and return the result. Now I can take the entire matrix and use it as the transition matrix. That looks all right. So we've gone from a chain of states to
pretty much what I need to generate the Markov chain. Now I'm just going to rebuild the next-state function with my pieces. Before, we had this whole big block using index_dict; instead, I'm going to use my LabelEncoder inverse_transform business.
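A minimal pure-Python sketch of the refactored pipeline, under stated assumptions. The talk uses scikit-learn's LabelEncoder (fit_transform, inverse_transform, classes_); to keep the example dependency-free, SimpleLabelEncoder below is a hypothetical stand-in that mimics those names. chain_to_matrix, normalize, normalize_matrix and next_state echo the steps the talk describes (chain to matrix, normalize, normalize matrix, rebuilt next state), but their bodies are assumptions, not the author's code.

```python
import random
from collections import Counter


class SimpleLabelEncoder:
    """Hypothetical pure-Python stand-in mimicking LabelEncoder's method names."""

    def fit(self, labels):
        # Sorted unique labels, mirroring how LabelEncoder orders classes_.
        self.classes_ = sorted(set(labels))
        self._codes = {label: code for code, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        return [self._codes[label] for label in labels]

    def fit_transform(self, labels):
        return self.fit(labels).transform(labels)

    def inverse_transform(self, codes):
        return [self.classes_[code] for code in codes]


def chain_to_matrix(chain):
    """Count i -> j transitions in an integer-encoded chain into an n x n matrix."""
    n = len(set(chain))  # assumes codes are 0..n-1, as the encoder produces
    matrix = [[0] * n for _ in range(n)]
    for (x, y), count in Counter(zip(chain, chain[1:])).items():
        matrix[x][y] = count
    return matrix


def normalize(row):
    """Divide each entry by the row total so the row sums to 1 (all-zero rows left as-is)."""
    total = sum(row)
    return [value / total for value in row] if total else row


def normalize_matrix(matrix):
    return [normalize(row) for row in matrix]


def next_state(current, encoder, matrix):
    """Sample the label that follows `current`, weighted by its row of the matrix."""
    row = matrix[encoder.transform([current])[0]]
    code = random.choices(range(len(row)), weights=row)[0]
    return encoder.inverse_transform([code])[0]


states = ["rock", "rock", "paper", "rock", "scissors", "paper", "paper", "rock"]
encoder = SimpleLabelEncoder()
chain = encoder.fit_transform(states)  # [1, 1, 0, 1, 2, 0, 0, 1]
matrix = normalize_matrix(chain_to_matrix(chain))
print(encoder.classes_)  # ['paper', 'rock', 'scissors']
print(next_state("rock", encoder, matrix))
```

Sorting the unique labels keeps matrix rows aligned with the integer codes, which is the job the two hand-built dictionaries were doing. One gap this sketch leaves open: a state that only appears as the last element of the chain gets an all-zero row, which random.choices rejects in recent Python versions.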
I can peek inside and see the classes. Here's the transition matrix. Let's run it. It seems to work. Now that I have most of the pieces ready to go, I'm going to package it up, because it seems
like something that would be useful to other people. So this is the code I came up with. It's not all that different, save for this piece here. As I was building this presentation, I thought: I've kind of gotten rid of NumPy and pandas, so it'd be cool if I could build this all in pure Python. And so I
spent some time building a label encoder for myself. It's not all that difficult. Maybe you can take the ideas I've shown you in this presentation and peel it apart for yourself; I knew I wouldn't have time to fully explain it. But the rest of the convenience functions (chain to matrix, normalize, and normalize matrix)
have all been presented; you've seen them. I've taken the Markov chain class given in the blog post and rewritten some of its methods to use them. So now it's time to play with this thing. I've packaged it up for you, and it's available on my GitHub
or PyPI as Mark, so you can pip install it. This lets me bring in a Markov chain from Mark, take the chain we've been playing with (rock, rock, paper, rock, scissors), build up that chain, and then predict next states, generate a whole bunch of next states,
or use something different entirely. If you read the synopsis of this presentation, you saw that I made reference to Shakespeare. I thought it'd be kind of cool to go to Project Gutenberg, grab a text of Shakespeare, and see how well my Markov chain could do on it. Who needs GPT-2? So we'll build two
functions to quickly go through this: take the entire text of something like Othello and generate some sentences using our pure Python Markov chain builder. I've been running a couple of these, and some
look like they could be Shakespeare, while a bunch are just nonsense. I'll let you read through a couple and see if they pass the sniff test; most don't. I also wanted to take The Wizard of Oz and see if I could do the exact same thing. It runs pretty quickly. This is actually
running through the entire book, and it did it in about 20 milliseconds with pure Python. Here are some Markov chain-generated Wizard of Oz sentences. They're not perfect, but they do all right. So in summary, what should you
take from this? Reading code is a lot like archaeology, and a lot like surgery. If you're going to perform surgery, you need a tool; my preferred tool is Atom and Hydrogen. I'd like you not to be scared of copying and pasting code. Just make sure that you give credit where credit is due
and make appropriate changes. When you're copying and pasting code in order to read it, get to an output as quickly as possible. See what the inputs are and what the outputs are; that gives you a better sense of what's going on. If the code is encapsulated in some kind of class, it's hard to get into, unless you lop off the head and
use my little mock self object instead. Once you've done that, rerun all the attributes and methods to make sure you didn't break anything, and then go over the blocks of code you suspect are doing the heavy lifting with a fine-tooth comb. If something is especially complex, take it apart, do some inlining, and see what each step is trying
to do. Peek inside iterators and generators by exploding them with list(). Loops can often be difficult to parse, so delooping them by extracting the loop body and running it by hand seems to work. Try to slim down and replicate the outputs
of those borrowed bits of code. Anytime you're mapping labels onto matrix positions, you should probably use LabelBinarizer or LabelEncoder; these things are overpowered and really great. And this should be obvious, but maybe it isn't: after you've made all these changes, make sure things
still work before you try to encapsulate them in a class. That is my presentation. Here is the tool that you can use to play with it, and all of the code is available at this repository. I think I have two minutes, so I'll toss it over for some questions. Thanks for your attention.
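The "lop off the head" and "deloop" steps from the summary can be sketched in a few lines with unittest.mock.Mock standing in for self. The method bodies follow the shape of the talk's weather example, but the values are assumptions, and plain lists replace the NumPy array so the snippet needs only the standard library.

```python
import random
from unittest.mock import Mock

# With the head attached, the code lived inside a class, e.g.:
# class MarkovChain:
#     def __init__(self, transition_matrix, states):
#         ...
# Lopping off the head: comment out those header lines and
# stand in a Mock for `self` so the bodies run unchanged.
self = Mock()

# Assumed example inputs (plain lists instead of np.atleast_2d, stdlib only).
transition_matrix = [[0.8, 0.19, 0.01], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
states = ["Sunny", "Rainy", "Snowy"]

# Former __init__ body, pasted in and run line by line.
self.transition_matrix = transition_matrix
self.states = states
self.index_dict = {self.states[index]: index for index in range(len(self.states))}
self.state_dict = {index: self.states[index] for index in range(len(self.states))}

# Former next_state body, inlined so each intermediate value can be inspected.
current_state = "Sunny"
row_index = self.index_dict[current_state]
row = self.transition_matrix[row_index]

# "Delooped" generate_states: one pass of the loop body, run by hand.
future_states = []
next_state = random.choices(self.states, weights=row)[0]
future_states.append(next_state)
current_state = next_state
print(row_index, row, future_states)
```

A Mock accepts any attribute assignment and hands the assigned value back on access, which is why the pasted bodies run without the class around them.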