Deletion Driven Development: Code to delete code!
Formal Metadata
Number of Parts: 67
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers: 10.5446/37667 (DOI)
Production Place: Cincinnati
Ruby Conference 2016, part 54 of 67
Transcript: English (auto-generated)
00:17
Welcome to a talk that I like to call Deletion Driven Development.
00:21
My name is Chris Arcand. Here's what I look like on Twitter and GitHub. I'm a really social person and love meeting new friends at conferences and whatnot, so be sure to say hello. My username everywhere is just chrisarcand. We are here in the lovely city of Cincinnati, Ohio. I've never been here before, but I'm really enjoying the week, especially the giant plates of cheese with a little bit of chili underneath.
00:42
It's really good. I hail from northwest of here, up in Minnesota on the Canadian border. Minnesota goes by a bunch of different names you've probably heard before. One is the Land of 10,000 Lakes. There's the North Star State, and that super cold place. If you're from Canada, you might know it as the Great White South.
01:04
And if you're at a Ruby conference like this one, you might vaguely remember it as that one place where those JRuby guys live, right? So like these two guys, I live in the Twin Cities of Minneapolis and St. Paul. We have absolutely gorgeous summers there and have beautiful forests and lakes to enjoy.
01:21
In the winter, we love playing hockey, and the winters always look as beautiful as they do in that bottom picture, except if you've been there, you know that I'm lying. It often looks a lot like this. That's a man on a snow bicycle riding through a blizzard. After such a blizzard, sometimes things look a lot like this. This is a line of cars parked on the street, if you can't see.
01:43
But hey, I'm just going to repeat that the summers are lovely, and you should at least come visit during the summer sometime if you're nearby. So I'm a Ruby developer at what Aaron Patterson has always described as a small startup, you might have heard of us, called Red Hat. I work remotely out of Minnesota. There's no engineering office there.
02:01
And at Red Hat, I work on ManageIQ. ManageIQ is an open source cloud management platform that powers Red Hat CloudForms downstream. It basically aggregates all of your enterprise infrastructure into one place and adds a bunch of functionality on top of it. The code base is hosted on GitHub. It's easy to find, and you can learn even more about it by talking with me afterward or at ManageIQ.org.
02:23
And we're always on the lookout for good developers. If you're interested in joining us, please see me, send me an email, whatever. I also have a ridiculous amount of swag with me at this conference, so if you feel like you don't have enough stickers or shirts or lanyards or screen cloths or even ManageIQ candies, please seek me out here at the conference.
02:43
So why am I here? I am here because I love programming. And as such, you can probably imagine that I love writing code. However, there's something else I love even more. And that is I love deleting code. So Ruby has been a successful programming language for some time now, and we as Ruby developers might now maintain legacy applications that have been developed on for many years.
03:05
A consequence of our long-term success is that these applications may contain unused, obsolete, and unnecessary code. Now I'm going to tackle a specific case today. I'm going to talk about methods that stand worthless and dead, unused by any callers in the application.
03:20
And that might be fine for frameworks, where a public API is exposed but never called within the framework itself. But in an application, it just adds cruft. Now, how does code like this end up in our projects? There are a couple of reasons I can think of. Think of a developer who, at the beginning of the project years ago, adds methods that they think will be useful someday but actually aren't.
03:42
They never actually get used, the implementation might change underneath them, and they don't actually work anymore. That's a kind of over-engineering. There's also poorly written code. Imagine a brand new, inexperienced developer joins the project, which is great. They write a very specific method that isn't very flexible and is completely unhelpful anywhere besides the single spot where it's used.
04:04
Maybe it could be refactored and written a little better, more generally. Now hopefully these two examples are just caught in code review, but sometimes they aren't. It also could be that things have just been refactored over time and methods just aren't needed anymore. So you might ask, who cares? The short answer is that unnecessary code is confusing.
04:22
It adds complexity where there shouldn't be any, and it creates an unnecessary maintenance burden on future developers. And it makes you scroll more in your text editor, which is annoying. But don't just take my word for it, other people think so too. There's a great post by Ned Batchelder called Deleting Code from 2002.
04:40
And in this post there's a snippet that I'd like to share: "If you have a chunk of code you don't need anymore, there's one big reason to delete it for real rather than leaving it in a disabled state: to reduce noise and uncertainty. Some of the worst enemies a developer has are noise or uncertainty in his code, because they prevent him from working with it effectively in the future."
05:03
Now, before we get wound up trying to implement a feature and find ourselves lost in the noise and uncertainty, I ask: what if we could programmatically find unused code to delete ahead of time, before we're trying to implement something else in that area of the code? It turns out we can, to an extent. So today I'm going to describe how a static code analyzer can be built to find potentially uncalled methods.
05:26
Now, because Ruby is a dynamic, duck-typed language and this analyzer is only static, it's not going to be 100% accurate. But it has the potential to point out areas of our code where we can clear out some cruft and add a bunch of deletions to our GitHub stats.
05:40
So we start with some Ruby code that we want to analyze. The first thing we need to do is transform the code into some data structure that we can reason with. Which brings us to part one, parsing the code. Now some of you know how language parsing works and are very, very aware of how it works with Ruby, some of you maybe aren't. And I think it's important for everyone to understand how things work from the ground up.
06:02
So this is really a high level overview of how general language parsing works from a grammar, how Ruby does it, and how we're going to do it. So, do you understand the following sequences of characters and how do you know? The boy owns a dog. Okay. A boy bites the dog.
06:21
Kind of weird, but okay. Loves boy the. Now how could you programmatically determine which of those are correct and which are not? There's a way to do it, and I'm going to throw out a couple of definitions that might be very familiar to you. One is a context-free grammar, or CFG. It's a set of rules that describe everything contained within the language.
06:42
It basically answers the question: what sentences are in the language and what are not? We also have Backus-Naur form, or BNF. This is one of the two main notation techniques for describing a CFG. So here is a context-free grammar for all of the sequences of characters that I showed you earlier.
07:01
It's really simple to look at. Everything you see on the slide here is a symbol. And symbols are split into two groups. The non-terminals with the little brackets there and the terminals that I'll put as all caps. The way it works is a symbol on the left is replaced with an expression on the right. An expression can be a combination of non-terminals or terminals and a non-terminal is always replaced via some rule.
07:26
Now, terminals are the actual tokens found within the language. They terminate at that point; that's why they're called terminals. So let's look at that first example: the boy owns a dog. Notice I didn't say sentences, I said sequences of characters. We don't actually know that this set of characters up here is an actual sentence.
07:43
But we have a rule for it and we can try to apply it. Now if this thing is truly a sentence, that means it has to be a subject followed by a predicate, at least within our simple grammar. Now if we try and split it out and say the boy is a subject and owns a dog is a predicate, we can keep following the rules by replacement.
08:01
So if the boy is truly a subject, a valid subject, it has to be an article followed by a noun. And again, we'll go through and see, alright, let's try an article is the and noun is boy. Again we go through, we see the rule for article, it has to be the terminal the or a, which it is.
08:22
Doing the other side, there's boy or dog, terminals found, it parses correctly. You can also do the other side, parsing down until you find those terminals. So in the end, we've identified every part of what we call a sentence in our grammar. And if we do a little bit of rearranging around, we can see this.
08:43
This is a parse tree. It's the discrete representation of our language that we can use to reason about the sentence. What about a boy bites the dog? Again, if you go through it all, it has the same sort of structure, so it parses out correctly. It's totally fine. But do boys often bite dogs? Maybe, could happen, seems a little weird.
09:03
We'll come back to that. Loves boy the. As you can imagine, this one doesn't work out. There's a reason why. Well, if this is a sentence, it has to begin with a subject. And if that's a subject, it has to begin with an article, and an article isn't valid if it's anything but the or a. It's a syntax error. It doesn't belong in the language.
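The replacement process just walked through can be sketched as a tiny recursive-descent recognizer. The slide's exact grammar isn't in this transcript, so the rules below are an assumption reconstructed from the three examples (notably, that a predicate is a verb followed by a subject). It is a hand-written top-down parser for illustration only, not how Ruby's own parser works.

```ruby
# Toy grammar, reconstructed as an assumption from the talk's examples:
#   <sentence>  ::= <subject> <predicate>
#   <subject>   ::= <article> <noun>
#   <predicate> ::= <verb> <subject>
#   <article>   ::= THE | A
#   <noun>      ::= BOY | DOG
#   <verb>      ::= OWNS | BITES | LOVES
TERMINALS = {
  article: %w[the a],
  noun:    %w[boy dog],
  verb:    %w[owns bites loves]
}.freeze

RULES = {
  sentence:  %i[subject predicate],
  subject:   %i[article noun],
  predicate: %i[verb subject]
}.freeze

# Tries to match +symbol+ starting at tokens[pos].
# Returns [tree, next_pos] on success, or nil on a syntax error.
def parse_symbol(symbol, tokens, pos)
  if TERMINALS.key?(symbol)
    word = tokens[pos]
    return nil unless TERMINALS[symbol].include?(word)

    [[symbol, word], pos + 1]
  else
    children = []
    RULES.fetch(symbol).each do |child|
      result = parse_symbol(child, tokens, pos)
      return nil unless result

      tree, pos = result # advance past the matched child
      children << tree
    end
    [[symbol, *children], pos]
  end
end

# Returns the parse tree, or nil if the words are not a sentence.
def parse_sentence(text)
  tokens = text.downcase.split
  result = parse_symbol(:sentence, tokens, 0)
  return nil unless result && result[1] == tokens.length # no leftover words

  result[0]
end

p parse_sentence("The boy owns a dog")
p parse_sentence("Loves boy the") # => nil
```

Both "the boy owns a dog" and "a boy bites the dog" come back as trees, because the recognizer only checks syntax and knows nothing about meaning; "loves boy the" fails at the first article and returns nil.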
09:21
So in programming terms, the written sentences or the code we're talking about here equate to these conclusions. The boy owns a dog makes sense. A boy bites the dog is technically correct. It's maybe not what we meant, though. In software terms, that could be a software bug, right? You write Ruby, you don't have a syntax error, but it might be the wrong thing. Well, maybe a boy bites the dog, maybe he doesn't. I don't know.
09:43
And then "loves boy the" is a syntax error. It doesn't work. So what does this all have to do with Ruby? Well, you can ask Ruby the same thing. How does Ruby know the meaning of these characters? How does Ruby know that this is a class definition named Person? That it has two methods, initialize and say_hello, that there's an instance variable @name, et cetera, et cetera.
10:01
Well, Ruby does the same thing that we did. My English examples were easy because we skipped lexing, tokenization, and actual programmatic parsing. We just kind of went on intuition saying, this looks like a subject. But hopefully it captured the high-level essence of parse trees for you. It's a bit more complex in how Ruby actually accomplishes it. Here's what CRuby does.
10:21
So Ruby has the infamous parse.y grammar file, which it gives to Bison. The resulting parser code is used to scan through your Ruby files and tokenize them, then parse the tokens into an abstract syntax tree, which is then compiled to instructions for the virtual machine. Now, the parser generated by Bison is what's called an LALR(1) parser. I'm not going to describe how an LALR parser works today, because it's a bit out of scope.
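To see the lexing and parsing steps from Ruby itself, the standard library's Ripper (built from CRuby's parse.y) can tokenize source and return its own nested-array tree. This is a stdlib stand-in for illustration; it is not the ruby_parser gem used later in the talk, and its node names differ.

```ruby
require "ripper"

source = "def greet(name); end"

# Lexing: split the source into tokens.
p Ripper.tokenize(source)

# Parsing: Ripper's S-expression for the same code (a different format from ruby_parser's).
p Ripper.sexp(source)

# Source that is not valid Ruby does not parse; Ripper.sexp returns nil in that case.
p Ripper.sexp("def greet(")
```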
10:44
However, I'm going to plug this very excellent book. So the diagram I just showed you is from this book called Ruby Under a Microscope. It's by a fantastic human being by the name of Pat Shaughnessy. In it, he explains all about Ruby internals. The first chapter is all about tokenization and parsing that I just showed you,
11:01
including an in-depth explanation of the parsing algorithm. It's a really fascinating read; you should definitely go check it out. So how are we going to do it? We're going to do something a little different. We're going to use a gem called ruby_parser, which is a Ruby parser written in Ruby using Racc. Racc is an LALR(1) parser generator. So let's take a look at an example.
11:20
We'll have a class named Person with a method greet that takes in a name and just says "Hello, name". So you can initialize a new Person and greet "RubyConf". ruby_parser has a class method called for_current_ruby that returns a parser instance using the grammar for the currently running version of Ruby, and you then pass your code to that instance's parse method.
11:45
And it gives you back this. Now, this is an S expression, or sexp. It's a notation for nested list data, and it originally comes from Lisp. Nested list data is tree-structured, so it's the perfect notation for describing parse trees.
12:01
So, awesome. This data contains the structure of our code. You can see the block node up here is the top-level context with a class named person. Within that is a definition node named greet that takes in an argument name, etc. So now that we have a parser to put our code in a format we can work with, we now need a way to process it, which brings us to part two, processing the S expression.
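The tree just described can be pictured as plain nested arrays. ruby_parser returns Sexp objects (an Array subclass), so the structure below is a hand-written approximation of what RubyParser.for_current_ruby.parse produces for the Person example; exact node shapes may differ between ruby_parser versions. The each_node walker is a hypothetical helper that visits every typed node depth-first.

```ruby
# Hand-written approximation of ruby_parser output for:
#   class Person
#     def greet(name)
#       puts "Hello #{name}"
#     end
#   end
#   Person.new.greet("RubyConf")
SEXP = [:block,
  [:class, :Person, nil,
    [:defn, :greet, [:args, :name],
      [:call, nil, :puts,
        [:dstr, "Hello ", [:evstr, [:lvar, :name]]]]]],
  [:call, [:call, [:const, :Person], :new], :greet, [:str, "RubyConf"]]]

# Pre-order walk: yields every Array whose first element is a Symbol (a node type).
def each_node(sexp, &visitor)
  return unless sexp.is_a?(Array)

  visitor.call(sexp) if sexp.first.is_a?(Symbol)
  sexp.each { |child| each_node(child, &visitor) }
end

types = []
each_node(SEXP) { |node| types << node.first }
p types

defs = []
each_node(SEXP) { |node| defs << node[1] if node.first == :defn }
p defs
```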
12:25
So before we do something really useful with our sexp, we need a way to easily manipulate it and start getting some general information that we care about: information like where exactly methods are defined, and in what classes. We're going to begin processing everything by building a very minimal class called MinimalSexpProcessor.
12:44
And the goal of this class is simply dispatch: calling a method for a given node type in our S expression, if one exists. So in our initialize method, we'll build a sort of dispatch hash. We'll take the public methods in this interface, find all that start with the prefix process_,
13:01
and key them within a hash according to their suffix. That is, if we had a method named process_defn, we would find the method by its prefix, take the suffix, and store the method name as a symbol in the processors hash, keyed by that suffix. Note that the suffix corresponds to a node type in our sexp.
13:24
Next we'll write the main method for our processor. Every node in the tree will be passed into this method. It simply looks into the processor hash to call the correct processor method given the current expression's node type. If there isn't one, we call a default method that we can set as an option. If we don't have a method to call and didn't set a default, let's just return nil.
13:44
Also, we'll put a cute little warning output to say we didn't recognize the node type and are calling the default method if that's actually what we're doing. Now this class, if you noticed, is pretty worthless. There are no processors. It's simply a base class. So to demonstrate an example of a processor subclass, I'm going to define a subclass called SillyProcessor.
14:07
Within our SillyProcessor, we'll define two processor methods. If we encounter a method definition in the expression, process_defn will be called. The other, process_not_defn, will be the method that we set as the default for all other node types.
14:21
So the methods just call puts to tell us about the nodes. If we encounter a method definition node, we print "PROCESSING A METHOD DEFINITION NODE" in all caps, and for everything else, we just print the node type. Both of these methods then call a method named process_until_empty, which iteratively shifts the next node off the expression and calls process on it until the expression is empty.
14:46
Every processor method calls this at the end to move on to the next node. Lastly, we'll fill in our initialize method to call super first and then set the options that we want. So we'll say, hey, if you don't have a processor for the current node, call process_not_defn.
15:04
And we're going to turn off warnings because we expect most of the nodes probably won't be identified. It looks like that. So we'll put together a little demo. Again, we'll just get the S expression from Ruby parser and then call process on that.
15:24
And we can see this is what it looks like. It's about what you expect. It goes through, finds all the nodes, and for the method definition node, it does something different. Now, you might be thinking, wow, that's pointless. That's because it is. But we've now added the next tool to our tool belt. Now we're at the point that we can run whatever code we want at a given node.
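Here is a minimal sketch of the dispatching base class and the SillyProcessor subclass described above. It assumes plain nested Arrays in place of ruby_parser's Sexp objects; the option names (@default_method, @warn_on_default) and the say helper are hypothetical, and the talk's actual code may differ. SillyProcessor also collects its output in lines so it can be inspected.

```ruby
# Base class: dispatches each node to a process_<type> method if one exists.
class MinimalSexpProcessor
  def initialize
    @default_method = nil
    @warn_on_default = true
    # A public method process_defn is registered under the key :defn.
    # Helpers such as process_until_empty match the prefix too; that is
    # harmless because no node has the type :until_empty.
    @processors = {}
    public_methods.each do |name|
      next unless name.to_s.start_with?("process_")

      @processors[name.to_s.delete_prefix("process_").to_sym] = name
    end
  end

  # Calls the processor for this node's type, else the default, else returns nil.
  def process(exp)
    return nil if exp.nil? || exp.empty?

    type = exp.first
    if (meth = @processors[type])
      send(meth, exp)
    elsif @default_method
      warn "unknown node type #{type}, calling #{@default_method}" if @warn_on_default
      send(@default_method, exp)
    end
  end

  # Shifts elements off exp one at a time, processing each child node.
  def process_until_empty(exp)
    until exp.empty?
      child = exp.shift
      process(child) if child.is_a?(Array)
    end
  end
end

class SillyProcessor < MinimalSexpProcessor
  attr_reader :lines

  def initialize
    super
    @default_method = :process_not_defn
    @warn_on_default = false # most node types have no processor
    @lines = []
  end

  def process_defn(exp)
    say "PROCESSING A METHOD DEFINITION NODE"
    exp.shift # drop the :defn node type
    process_until_empty(exp)
  end

  def process_not_defn(exp)
    say "Node type: #{exp.shift}"
    process_until_empty(exp)
  end

  private

  def say(text)
    @lines << text
    puts text
  end
end

sexp = [:block,
  [:class, :Person, nil,
    [:defn, :greet, [:args, :name], [:call, nil, :puts, [:lvar, :name]]]]]
processor = SillyProcessor.new
processor.process(sexp)
```

Note that processing consumes the expression in place (every level is emptied by shift), so pass a fresh copy if the tree is needed afterward.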
15:41
And this allows us to build more complex things like, say, a MethodTrackingProcessor. Using some information from ruby_parser, we can now record where we see method definitions in classes, along with their line numbers. We'll set up this class with the same options as our SillyProcessor, but we have a couple of new things.
16:00
We have two new stacks, a method stack and a class stack, which we'll use to keep track of where we are in the code, or the tree. We also have a method_locations hash that we'll populate with method signatures as keys and file names and line numbers as values. The file and line number are taken directly from ruby_parser.
16:23
We'll define a couple of processor methods. First, process_defn shifts the node type off the expression, then shifts off the name, and then calls an in_method helper, passing it a block that processes the rest of the node. So it doesn't do much other than signify that we're in a method by calling in_method. The class processor does the same thing.
16:41
So process_class calls in_class, and process_until_empty is the same as what you saw before. Here are the two location methods that actually do the work. This might be a little hard to see, but it's very simple: they both push the current method or class onto their respective stacks, yield to the block passed in, and pop it off once the block returns.
17:04
With in_method, we also record the current method signature and its location in our method_locations hash. Another thing to think about is that entering a new class means entering a new method namespace, since different methods could be in that context. So we save the old method stack when we go into a class, use a new one for that class,
17:22
and then revert to the old stack when we're done processing it. Lastly, a couple of little helpers. The current class name is the first thing on the class stack, and the current method name is the first thing on the method stack. Then we build a signature in the form you're used to seeing: the class name, a hash, and the method name, like Person#greet.
17:46
Great. So let's expand on our example a little bit. We have a Person with greet. We're going to add a little say_goodbye method, which, magically, does almost the same thing. We also have a class called Dog with a bark method that woofs.
18:01
And then we're only going to call greet there. The important thing to pay attention to is where the methods are defined, right? So we have greet on line two, and we have methods on lines six and 12. So if we do the same thing as before and pretty print the method locations, you can see that we found Person#greet on line two, Person#say_goodbye on six, and Dog#bark on 12.
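A condensed, self-contained sketch of the method tracking idea follows. To carry line numbers without the ruby_parser gem, it uses a hypothetical Node class (an Array with a line attribute, mimicking the line that ruby_parser's Sexp objects carry) and a local s helper; the file name example.rb is made up, while lines 2, 6, and 12 follow the example above. Dispatch uses respond_to? instead of a prebuilt hash to keep it short.

```ruby
# Hypothetical stand-in for ruby_parser's Sexp: an Array that remembers its line.
class Node < Array
  attr_accessor :line
end

def s(*items, line: nil)
  Node.new(items).tap { |node| node.line = line }
end

# Records "Class#method" => "file:line" for every method definition.
class MethodTrackingSketch
  attr_reader :method_locations

  def initialize(file)
    @file = file
    @class_stack = []
    @method_stack = []
    @method_locations = {}
  end

  def process(exp)
    return unless exp.is_a?(Array) && exp.first.is_a?(Symbol)

    handler = "process_#{exp.first}"
    respond_to?(handler) ? public_send(handler, exp) : exp.each { |child| process(child) }
  end

  def process_class(exp)
    _type, name, _superclass, *body = exp
    in_class(name) { body.each { |child| process(child) } }
  end

  def process_defn(exp)
    _type, name, *rest = exp
    in_method(name, exp.line) { rest.each { |child| process(child) } }
  end

  private

  # Entering a class starts a fresh method namespace; restore the old one after.
  def in_class(name)
    @class_stack.unshift(name)
    saved_methods, @method_stack = @method_stack, []
    yield
  ensure
    @method_stack = saved_methods
    @class_stack.shift
  end

  def in_method(name, line)
    @method_stack.unshift(name)
    @method_locations[signature] = "#{@file}:#{line}"
    yield
  ensure
    @method_stack.shift
  end

  # The "first thing" on each stack is the innermost class or method.
  def signature
    "#{@class_stack.first}##{@method_stack.first}"
  end
end

sexp = s(:block,
  s(:class, :Person, nil,
    s(:defn, :greet, s(:args, :name), s(:call, nil, :puts, s(:lvar, :name)), line: 2),
    s(:defn, :say_goodbye, s(:args, :name), s(:call, nil, :puts, s(:lvar, :name)), line: 6)),
  s(:class, :Dog, nil,
    s(:defn, :bark, s(:args), s(:call, nil, :puts, s(:str, "Woof!")), line: 12)))

tracker = MethodTrackingSketch.new("example.rb")
tracker.process(sexp)
pp tracker.method_locations
```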
18:23
Perfect. So awesome. We now know where methods are defined. The generic processors that we've built so far to process the S expression tree and record method locations provide the footing on which we can build our tool. The only thing left to do is process the call nodes within the tree, see what's being called, and line those calls up with the method definitions we're now tracking.
18:43
Which brings us to part three, building the dead method finder. Yes, we finally reached the point where we can build the dead method finder that we've been working towards.
19:02
So we'll do that. DeadMethodFinder will subclass MethodTrackingProcessor. In this one, there are two important collections that we'll maintain. The first, known, is a hash containing sets; we'll use it to maintain a mapping of method names to the set of classes that define them. The second, called, will just be a set of methods that were called.
19:25
Here are two processor methods. In process_defn, we key into that known hash, adding the class name currently being processed to the set of classes that define this method, and then call process_until_empty on the remaining sexp nodes.
19:41
In process_call, we add the method being called to the set of called methods. Next, we'll define an uncalled method. This is where we take the difference between known and called methods, in other words, the ones that weren't called. For each uncalled method, we then key into the known hash to find where a method by that name is defined, by class and line number.
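The known-versus-called bookkeeping can be sketched on its own. This version is a simplification under stated assumptions: it walks ruby_parser-shaped nested Arrays directly instead of subclassing a tracking processor, skips line numbers, and handles only :class, :defn, and :call nodes. Built-in calls like puts and new land in the called set too, but the set difference ignores them because they are never defined here.

```ruby
require "set"

# Known-vs-called bookkeeping over ruby_parser-shaped nested Arrays (hypothetical sketch).
class DeadMethodSketch
  def initialize
    @known = Hash.new { |hash, name| hash[name] = Set.new } # method name => defining classes
    @called = Set.new                                       # every method name seen in a :call
    @class_stack = []
  end

  def process(exp)
    return unless exp.is_a?(Array)

    case exp.first
    when :class
      @class_stack.push(exp[1])
      exp.drop(2).each { |child| process(child) }
      @class_stack.pop
    when :defn
      @known[exp[1]] << @class_stack.last
      exp.drop(2).each { |child| process(child) }
    when :call
      @called << exp[2] # [:call, receiver, method_name, *args]
      exp.drop(1).each { |child| process(child) }
    else
      exp.each { |child| process(child) }
    end
  end

  # Method names that are defined but never called, mapped to their classes.
  def uncalled
    (@known.keys - @called.to_a).to_h { |name| [name, @known[name].to_a] }
  end
end

sexp = [:block,
  [:class, :Person, nil,
    [:defn, :greet, [:args, :name], [:call, nil, :speak, [:str, "Hello"]]],
    [:defn, :say_goodbye, [:args, :name], [:call, nil, :speak, [:str, "Goodbye"]]],
    [:defn, :speak, [:args, :words], [:call, nil, :puts, [:lvar, :words]]]],
  [:class, :Dog, nil,
    [:defn, :bark, [:args], [:call, nil, :puts, [:str, "Woof!"]]]],
  [:call, [:call, [:const, :Person], :new], :greet, [:str, "RubyConf"]],
  [:call, [:call, [:const, :Dog], :new], :bark]]

finder = DeadMethodSketch.new
finder.process(sexp)
p finder.uncalled
```

Here say_goodbye is the only method reported, because greet, speak, and bark all appear as the method name of some :call node.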
20:02
We'll also have a little helper method called plain_method_name. This is just because the method name from ruby_parser is a string, but we're going to use symbols. It looks like that. So let's expand on our example once again. With greet and say_goodbye, instead of calling puts directly, let's have a little helper called speak that does it for us.
20:23
We'll also have a pet_dog method in Person that takes in a dog object and sends it pet. And in Dog, we'll add a little attribute accessor called fed, because hey, maybe you want to keep track of whether or not you fed the dog. And we'll add a pet method as well. So to add a little bit of realism, I have a dog named Ruben.
20:41
So I'll create myself with Person.new, and my dog there. I'll greet RubyConf, Ruben will bark, and then I will pet Ruben. So we process the sexp, print out the uncalled methods, and we see that Person#say_goodbye is supposedly uncalled,
21:01
and Dog#pet is supposedly uncalled. Now if you look closely, you'll find that this is wrong. On line 38, I pet Ruben, but we're not detecting that pet is actually called. Why is that? It's because we've hit an edge case.
21:21
The problem is that our process_call records send as the method being called, which, strictly speaking, it is. But we want to take into account that sending a message implies a direct call to the method it names. So looking at our S expression for that, we can find where the actual method being called is and add some logic to say that that is the method being called,
21:40
and handle the edge case. We'll say, hey, when the method being called is send, public_send, or __send__, look through the arguments, find the literal being sent, and treat that as the method being called. Let's try it again. Great. Dog#pet is being called, so it no longer shows up in our output.
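The send special case boils down to picking a different method name out of the call node. The sketch below assumes ruby_parser's shape for dog.send(:pet), roughly [:call, [:lvar, :dog], :send, [:lit, :pet]]; effective_method_name is a hypothetical helper name. A dynamic argument such as send(name) can't be resolved statically, so it falls back to send itself.

```ruby
# Methods whose first argument names the method actually being invoked.
SEND_METHODS = %i[send public_send __send__].freeze

# Returns the method a ruby_parser-style :call node effectively calls.
def effective_method_name(call_node)
  _type, _receiver, name, first_arg = call_node
  return name unless SEND_METHODS.include?(name) && first_arg.is_a?(Array)

  case first_arg.first
  when :lit then first_arg[1].is_a?(Symbol) ? first_arg[1] : name # send(:pet)
  when :str then first_arg[1].to_sym                               # send("pet")
  else name # dynamic target such as send(method_name): unknown statically
  end
end

p effective_method_name([:call, [:lvar, :dog], :send, [:lit, :pet]])
p effective_method_name([:call, [:lvar, :dog], :bark])
```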
22:01
Now this is an improvement, but it's still not correct. What about this guy here? We never use this attribute accessor, fed, right? So it should be in the output. Looking at our S expression for that, we realize that attr_accessor is itself a method call that defines methods, a getter and a setter, which is why our method tracking processor doesn't find them.
22:22
Let's add another case in our call processor to handle that. We'll say, hey, when you encounter attr_accessor, go through its arguments and record them as known methods. The helper record_known_method does the same thing that we were doing in process_defn, except it also double-checks that the method location was recorded.
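Figuring out which methods an attr_* call defines is a small lookup. This sketch assumes ruby_parser parses attr_accessor :fed roughly as [:call, nil, :attr_accessor, [:lit, :fed]]; attr_defined_methods is a hypothetical helper, and recording locations (as record_known_method does) is left out.

```ruby
# Methods defined by an attr_* call, given its ruby_parser-style :call node.
def attr_defined_methods(call_node)
  _type, receiver, name, *args = call_node
  return [] unless receiver.nil? # only bare attr_* calls inside a class body

  attrs = args.select { |arg| arg.is_a?(Array) && arg.first == :lit }.map { |arg| arg[1] }
  case name
  when :attr_reader   then attrs
  when :attr_writer   then attrs.map { |attr| :"#{attr}=" }
  when :attr_accessor then attrs.flat_map { |attr| [attr, :"#{attr}="] } # getter and setter
  else []
  end
end

p attr_defined_methods([:call, nil, :attr_accessor, [:lit, :fed]])
```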
22:43
While we're here tinkering, let's also pretty up that output into something that's a little more readable. Let's define a method called report that looks through the uncalled methods. For each of them, we look up the location of that method via method_locations in our parent class, throw all of that into a nicely formatted report, and skip a class if it has no uncalled methods.
23:03
If there are, we join it all together and print it out. Processing the code now is just getting the S expression, processing it, and calling report. So here's what it looks like after we've fixed the attr_accessor case and prettied up our output. You can see that, awesome.
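A report method like the one described might look roughly like this. The input shapes are assumptions for illustration: uncalled maps a class name to method names, and locations maps "Class#method" to "file:line". The example data is hypothetical apart from say_goodbye on line six.

```ruby
# Formats potentially uncalled methods grouped by class, skipping empty classes.
def report(uncalled, locations)
  sections = uncalled.filter_map do |klass, names|
    next if names.empty? # nothing to report for this class

    lines = names.map do |name|
      signature = "#{klass}##{name}"
      "  #{signature.ljust(24)}#{locations.fetch(signature, '(location unknown)')}"
    end
    ["#{klass}:", *lines].join("\n")
  end
  return "No potentially uncalled methods found." if sections.empty?

  "Potentially uncalled methods:\n\n" + sections.join("\n\n")
end

puts report(
  { "Person" => [:say_goodbye], "Dog" => [:fed, :fed=], "Cat" => [] },
  { "Person#say_goodbye" => "example.rb:6" }
)
```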
23:22
Person#say_goodbye is reported as uncalled, and indeed it isn't called anywhere. And our attribute accessor isn't used either. Awesome, so great, perfect, awesome. Now remember that this static analysis is not always 100% accurate, and these methods are only potentially uncalled. So we do some manual checking ourselves to be sure that they actually are deletable.
23:42
So as suspected, it looks like they are deletable in this very easy example, so we delete them. Doesn't it feel great? Isn't deleting code fun? It's perfect. So hey, we did it, we made a dead method finder. Now we can start finding code to potentially delete left and right, yes? Time to open a dozen pull requests with a bunch of red deletions, right?
24:00
Are we truly done? No, we aren't. Ruby is complex to parse, and Ruby has a lot of edge cases. But the good news is that handling edge cases is easy. Think about, for example, ruben.fed = true, if we actually did use that accessor. If we add that to the code here, you'll notice that the setter is still marked as uncalled.
24:24
Now, if my dog were here and saw that him being fed was an issue, he would probably say, just deal with it. And that is an actual picture of my dog wearing sunglasses. Well, back to the edge case. It's an attrasgn node, which means attribute assignment: the variable ruben has the message fed= sent to it with the value true.
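Recording that attrasgn node as a method call can be sketched like this. It assumes ruby_parser-style node shapes written as plain arrays, and the helper name collect_calls is hypothetical. Because call and attrasgn nodes both keep the message name in the same slot, one branch handles both.

```ruby
require "set"

# Sketch: treat attribute assignments as calls. Assumption: ruby_parser-style
# shapes as plain arrays, e.g. `ruben.fed = true` =>
#   [:attrasgn, [:lvar, :ruben], :fed=, [:true]]
# collect_calls is a hypothetical helper name.
def collect_calls(exp, called = Set.new)
  return called unless exp.is_a?(Array)

  # Both node types carry the message name in the third slot.
  called << exp[2] if %i[call attrasgn].include?(exp.first)
  exp.drop(1).each { |child| collect_calls(child, called) }
  called
end

ast = [:block,
       [:lasgn, :ruben, [:call, [:const, :Dog], :new]],
       [:attrasgn, [:lvar, :ruben], :fed=, [:true]]]
p collect_calls(ast).include?(:fed=) # => true
```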
24:42
So we can record that, again, as a method call. Perfect. Looks great. There are so many other edge cases, though. What about Rails methods? Every bit of Rails DSL in controllers and models that you're used to using would be an edge case. And there are a lot of them, right? There's after_commit, there's before_create, there's after_create, there's before_update, there's before_destroy,
25:03
there's before_filter, except it's not called before_filter anymore, now it's before_action. around_save, validate, validates, validates_length_of, after_validation, validates_format_of, validates_uniqueness_of, validates_confirmation_of. Yeah, you get it. Makes me sad. And what about my own DSL? In ManageIQ, we actually have our own virtual column implementation for ActiveRecord.
25:22
Now, this digs deep into ActiveRecord internals to allow us to treat any method as a database column, amongst other things. It's mainly used for reporting purposes: reporting attributes of entire tables with extra attributes sprinkled in. So, for example, you could have a class Disk that has_many partitions, and you
25:41
might define a virtual column called allocated_space, of type integer, that uses those partitions. So we actually add a DSL to Rails models in the form of these virtual column calls. The point here, though, is that neither this DSL nor the Rails DSL calls are that difficult to handle; each is just another edge case. All of these methods look essentially the same: most of the time, their arguments are symbols naming methods to be called.
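Handling those DSL calls could look roughly like the following sketch. The DSL_MACROS list is a hypothetical, deliberately incomplete selection; virtual_column stands in for the virtual column macro described above (the exact name is an assumption); and the node shapes are assumed ruby_parser-style arrays. Only positional symbol literals are treated as method names, so an options hash such as type: :integer is skipped, since :integer names a type rather than a method.

```ruby
require "set"

# Hypothetical, deliberately incomplete list of DSL macros whose symbol
# arguments name methods that will be called for us. virtual_column is an
# assumed name for the virtual column macro.
DSL_MACROS = Set[
  :after_commit, :before_create, :after_create, :before_update,
  :before_destroy, :before_action, :around_save, :validate,
  :after_validation, :virtual_column
]

# Assumption: ruby_parser-style S-expressions as plain arrays.
def collect_calls_with_dsl(exp, called = Set.new)
  return called unless exp.is_a?(Array)

  if exp.first == :call
    _, _receiver, name, *args = exp
    called << name
    if DSL_MACROS.include?(name)
      # e.g. before_create :set_defaults =>
      #   [:call, nil, :before_create, [:lit, :set_defaults]]
      args.each do |arg|
        called << arg[1] if arg.is_a?(Array) && arg.first == :lit && arg[1].is_a?(Symbol)
      end
    end
  end
  exp.drop(1).each { |child| collect_calls_with_dsl(child, called) }
  called
end

ast = [:class, :Disk, [:const, :ApplicationRecord],
       [:call, nil, :before_create, [:lit, :set_defaults]],
       [:call, nil, :virtual_column, [:lit, :allocated_space],
        [:hash, [:lit, :type], [:lit, :integer]]]]
p collect_calls_with_dsl(ast).include?(:set_defaults) # => true
```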
26:06
So, basically, we can go in and record all of those as calls. The point I'm trying to make with all these little edge cases is that, as with most things, with the right tools the job isn't very difficult and customization is easy. The other thing that's easy: you can run this kind of code on your project right now.
26:24
There is a Ruby gem called debride. Now, the author of this gem told me last night that it's pronounced "debreed," but apparently Gorbypuff's eye doctor pronounces it "debride," so we'll go with that. To debride something is to remove dead, contaminated, or adherent tissue and/or foreign material, which sounds a lot like deleting useless code.
26:44
When I first thought of programmatically finding dead code to delete, I went down the exact same path we just did, starting with racc and ruby_parser. I then discovered the lovely simplicity of processing S-expressions with a gem called sexp_processor. It was created to make generic processing of the S-expressions produced by ruby_parser easy.
27:03
It also provides a MethodBasedSexpProcessor subclass that does the method and class tracking we did with our method tracking processor today. Then, to my great delight, I stumbled on debride, which does exactly what we did today with our dead method finder. What's more, everything you see here on the stack, from ruby_parser on, is written by the same person.
27:23
Each one of these projects was written by Ryan Davis of the Seattle Ruby Brigade. And you people are in serious luck, because Mr. Ryan Davis is here at this conference. In fact, Ryan, are you here in this room? There you are. Can I get a round of applause for Ryan for all the fantastic stuff?
27:41
Thank you. So I've been hacking on debride off and on for the past several months, customizing it for ManageIQ and finding crufty code to delete in a project that started on Rails 1.2.3 nearly a decade ago, and I thought it would be fun to rebuild the basic concepts for you today. All of the code you've seen is a modified, minimalist version of ZenSpider's very excellent work.
28:04
So what does debride provide that we haven't covered today? Well, it covers more edge cases. Our simple method tracking processor and dead method finder are the core of what debride does, but there's so much more to consider. What about defining methods on singleton classes? What about the numerous other uncommon bits of Ruby syntax, like calling methods with :: (double colon)? All those little edge cases?
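As one example of such an edge case, singleton method definitions show up as a different node type. This sketch assumes ruby_parser-style shapes as plain arrays, where def self.breeds becomes a defs node whose receiver comes before the method name; the helper known_methods is hypothetical.

```ruby
# Sketch: treat singleton-method definitions as known methods too.
# Assumption: ruby_parser-style shapes as plain arrays, where
#   def bark; end        => [:defn, :bark, [:args], [:nil]]
#   def self.breeds; end => [:defs, [:self], :breeds, [:args], [:nil]]
# known_methods is a hypothetical helper name.
def known_methods(exp, found = [])
  return found unless exp.is_a?(Array)

  case exp.first
  when :defn then found << exp[1] # instance method name
  when :defs then found << exp[2] # singleton method name (receiver is exp[1])
  end
  exp.drop(1).each { |child| known_methods(child, found) }
  found
end

ast = [:class, :Dog, nil,
       [:defn, :bark, [:args], [:nil]],
       [:defs, [:self], :breeds, [:args], [:nil]]]
p known_methods(ast) # => [:bark, :breeds]
```

The sketch deliberately drops the receiver, so a singleton method and an instance method with the same name collapse into one entry; a real tool would need to keep that distinction.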
28:22
Ruby is a crazy flexible language, and there are many cases yet to be handled. Debride also adds all sorts of lovely little options, like excluding particular files, whitelisting methods based on a pattern, and focusing on a particular path. There's also a Rails mode, just like the one you saw earlier.
28:40
So I mentioned I've been hacking on debride to find even more dead code. Besides adding your own DSL, whitelisting patterns, and all that, you can easily do something like this to apply other criteria for what might be deletable code. Remember what I said about methods that are maybe way too context-specific, that might have only one caller? Here's a really hacky way to find those methods.
29:01
Instead of keeping a set of called methods, we can keep a hash where every value is a call count. Then every time we encounter a call, we just increment that count. And then the ones with a count of one are the ones that are called once. Perfect. So those are, again, cases where a method might not even really need to exist.
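The call-count variation can be sketched like this. The helper names count_calls and called_once are hypothetical, and the node shapes are assumed ruby_parser-style arrays; this illustrates the idea rather than reproducing the speaker's debride changes.

```ruby
# Sketch of the "called exactly once" variation. Assumption: ruby_parser-style
# S-expressions as plain arrays; count_calls and called_once are hypothetical.
def count_calls(exp, counts = Hash.new(0))
  return counts unless exp.is_a?(Array)

  # A hash of name => count replaces the set of called names.
  counts[exp[2]] += 1 if %i[call attrasgn].include?(exp.first)
  exp.drop(1).each { |child| count_calls(child, counts) }
  counts
end

# Known methods with exactly one call site: candidates for folding into the caller.
def called_once(known_methods, counts)
  known_methods.select { |name| counts[name] == 1 }
end

ast = [:block,
       [:defn, :helper, [:args], [:nil]],
       [:defn, :util, [:args], [:nil]],
       [:defn, :run, [:args],
        [:call, nil, :helper], [:call, nil, :util], [:call, nil, :util]]]
p called_once(%i[helper util run], count_calls(ast)) # => [:helper]
```

Using Hash.new(0) means reading a never-called name returns 0 without inserting it, so uncalled methods simply fall out of the "called once" list.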
29:21
Maybe it's something so specific that the caller itself could just handle it. It depends on what you're doing. I've been busy enough hacking and deleting code that, although I've opened a couple of little things on the project, I've still got plenty more that I want to refine and then share. So pushing more work upstream is definitely something I plan to do.
29:41
Yes, please. Ryan says yes, please. So remember, tools like this are but one tool in the toolbox when looking for ways to clean up your code base. Today we looked at one way, parsing and statically analyzing Ruby itself, to find potentially uncalled code. But there are so many more awesome tools to use in combination. For example, there's Old Code Finder by Tom Copeland,
30:02
which is a Ruby gem that basically checks code content by date and authorship in Git. So maybe you have someone named Fred who left the company many, many years ago. Well, you might want to take a look at his code specifically, because there might be more stuff you can get rid of. There's also Unused by Josh Clayton over at thoughtbot. It's written in Haskell.
30:20
It uses ctags to statically find unused code, so it's not tied to any one programming language. I am a huge, huge fan of ctags. I haven't looked into Unused yet, but I really, really want to. Someone mentioned, I think last night, that there's also a library called Scythe. I hadn't heard of it, but I see a couple of nods. That's another one you should check out that I forgot to add.
30:43
So before I go, here's a parting message for you. If you used Merb before it was merged into Rails, you might recognize this: no code is faster, has fewer bugs, is easier to understand, and is more maintainable than no code at all. So when you go home from this awesome conference, delete some code.
31:02
It feels fantastic. Thank you.