
Strategic refactoring using static analysis


Formal Metadata

Title
Strategic refactoring using static analysis
Number of Parts
96
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract
Decoupling legacy code bases is not only hard; often we don’t even have a clear idea of the couplings that currently exist in our system. Without a clear overview of the current state we can’t make sound decisions about what we should do to improve. This leads to refactorings that are subjective; such refactorings might not yield value except for some subjective measure of “less ugly”. Refactoring with a purpose: I would argue that reading code is like walking around in the forest, enjoying the small details - the scale is 1:1. Static analysis can draw maps from a higher perspective, which is critical for strategic thinking. By using static analysis we can make a map of the current couplings in our system, make a plan for refactorings based on that map, and verify that we have achieved the goals we set out to accomplish by re-running the analysis. This can make refactoring easier and cheaper, and yield code bases that can be proven to be more modular. Using examples from NRK TV (tv.nrk.no and psapi.nrk.no) I will demonstrate, using NDepend, how to analyze unwanted couplings and how to write rules to make sure they never again appear in your code base.
Transcript: English (auto-generated)
Okay, good morning everyone. Well, I've left the slide up for a while, so you've probably seen the topic: strategic refactoring using static analysis. So first, who am I? I'm Bjorn Einar. I work at NRK.
That's the national Norwegian broadcaster. Well, the Norwegians among you know what this is; for the foreign people, it's like the BBC of Norway. I like to use this hashtag when I talk about NRK TV, if you want to tweet.
So, the agenda for today. I want to try to keep it loose, so if you have any questions, feel free to interrupt me. Just throw questions at me, just shout them out so I can hear them. I can't see you very well because of the lights, so shout loudly. I'll try to go through a simple definition of what I mean by strategy,
and what I mean when I say static analysis in this context. I'll be showing you the domain we're working in. Then I'll go through NDepend, which is the tool I'm using when I do static analysis. Then we'll look a little bit at refactoring at the assembly and project level.
Then I'll go a little bit more into detail on NDepend, and we'll look at namespaces and some more complex metrics and things like that. Everything I'll be presenting today is real-life code; it's all stuff we have in production.
And there are no secrets; I'm happy to show everything we have here. So just ask questions and I'll dig into it. First, some definitions. Well, refactoring: it's basically rewriting the code without changing any of the functional specification, in order to improve some other metric or some other aspect of the code. That's the way I see it.
When I say strategic in this context I just mean strategic at the domain level which I'll go into a little bit. Or strategic that we're talking about refactoring at assembly, dependency, namespace level.
Not necessarily making one method cleaner or nicer by some other metric. When I say static code analysis, well, that's a big topic, and I'm no expert in static analysis in general; I'm not going to claim that. In the .NET space there's everything from StyleCop to ReSharper to FxCop and the code analysis tools,
and NDepend, which are the ones I know of that do static analysis. It's everything that runs and analyzes your code without actually executing it.
StyleCop works very much at the syntactic level of your code: how you name your methods and so on, lower case, upper case. It can't do any of the strategic stuff we're talking about here. FxCop and ReSharper do a lot of these things as well.
I'll show you a little bit. But to me, the real choice of static analysis tool is NDepend, and you'll see why when we get into the details. So that's the tool I chose to use here. And, well, you know, we're talking about legacy applications here.
For a definition of that, I'll refer you to Karolina's talk in an hour. But I guess all the code we have in production is defined as legacy, I don't know. So, I'm a big fan of military strategy and such.
And, you know, it's sad. It's like when you listen to gangster rap and you feel cool when you refactor stuff. It's very sad. Some of us do this. You feel like a general when you're refactoring and making strategic decisions. I like these metaphors, even though I think they're quite sad.
But there are some similarities, right? Instead of sitting in the trenches, in the code base, we're able to lift our perspective and make some higher-level decisions about our code. So we want to make some kind of map of our code, showing assemblies, namespaces, the dependencies between them and so on.
This is how I think of a lot of the clean code movement that's going around. Not that it's bad to write clean code, but people seem to be like this guy.
I'm the cleanest fighter. He's in the trenches. His friends are rotting corpses, smelling around him, but they claim to be the cleanest. It's a very, very local optimization to clean yourself when you're in a trench, right?
And I feel a lot of the same with clean code. Rewriting a method is fine, do it, but it's not going to achieve any higher-level strategic objective for your business. At least, that's my claim. So, I think I've narrowed it down a bit: we'll talk about strategic refactoring of our applications, and the applications I'll show are NRK TV, NRK Radio and NRK Super, with NDepend.
So we've sort of narrowed it down and made it a little bit more concrete. I'll just quickly show you the domain we're working in. This is NRK TV, where you can watch TV programs; there are lots and lots of programs in here.
There's a login, so I'm logged in and have my content, and I can continue watching things I've started. There are my programs. We have a very high focus on privacy, so, for example, now apparently I've seen Freke Votellingen, which is a Danish series about erotica, so I can delete that because I probably don't want people to see that. And there's some sex education, which I should probably delete as well, right?
So it's personalized, there's some live TV, TV program guides, categories, so that's the TV.
Then there's the TV for children, aimed at the younger age group, which looks a bit different. I think this is quite interesting: can anyone guess why there are Q and X here even though there are no programs under Q and X? Why don't we filter them out?
It's because these kids are learning the alphabet, so if we take out Q and X, we confuse the kids. It's a very different age group. And it's not only cartoons in here; it looks a bit cartoonish, but there's also some serious stuff about learning, learning the alphabet, and we also go into heavier topics, like substance abuse and things like that, so it's a mix of education and entertainment.
And then there's NRK radio, which is a live radio. And that's a bit different, you can listen to, you get some notifications of what is currently playing, and these things are a bit different.
And then there's the API team that I'm working on, which is the backend; we'll talk about the backend here. That's where basically all these applications go and fetch their data. This is a quick overview of the domain, this is our story, and parts of it are in Norwegian.
I joined around here, in September last year, so I can't take credit for any of this, but basically the idea is that there are lots and lots of clients. The stuff we just looked at is only the web clients, but there are smart TVs and apps and all kinds of clients playing the content from the same API.
But we'll be looking mostly at the frontend stuff on the web. I took this picture before I left; this is our team, and these are the frontend people here, these four. Ainar is doing a talk later today here, there's Stola, who is on the backend team,
and Steinar, who is in the audience, sitting back here; it's a nice little group. There's our operations guy, ops, he has a cap that says no, and there's the release manager who, you know, clicks Octopus and decides what goes into production. Again, just an overview of all the clients; the stuff we'll be looking at now is the desktop, the web stuff, and how that relates to the backend.
I took this, this is showing how many users we have on the desktop, and there's a lot, almost like a million unique users a week.
This is just to show that the desktop stuff is really important, and I chose a metric that shows that it's really important. Like if you look at how many minutes people actually watch content, that's different. And this is just a little bit of background of why we chose to do refactoring of these applications to begin with.
And that was during the implementation of this part, which is the direct feed; I took this screenshot just yesterday. This is content that is live but not necessarily broadcast on TV. And implementing this feature simply took too long: we spent several months getting the MVP of this product out.
The reason being that the code base was too complex and too hard to work with, basically. And we decided, you know, something needs to be cleaned up here; we need to be able to move faster than we were able to. And basically all the stuff you've seen now, NRK Super, NRK Radio and NRK TV, is the same web application.
It's just configured to look different, running on a different host. Which was probably a good idea at some point, because you could very quickly get something out that looked quite similar, but then they've grown apart over time: you had the login stuff on one page,
you had some live view coming in with play on the radio side, and the TV stuff looks completely different. So at some point, things that looked quite similar end up being very different. Another aspect is that, as you can see here, on the children's side they watch mostly on the tablet.
They don't use the desktop that much, whereas NRK TV is very heavily used there. So when we want to make changes for our very, very large audience on the desktop side, we always need to think about what happens to this other stuff, which isn't as business critical. So we have these strategic considerations as well, and the fact that these applications are intertwined is not too good.
Okay, so we need a plan, right? This was posted not too long ago; this is Montgomery and the plan for D-Day. And the overall plan is just one very simple page.
And I think we need sort of the similar things when we work with complex code bases. We need to have a simple plan when we work with complex stuff. It says like, heavy air bombing from as soon as light permits until after 8 hours. And there's the forward body, and there's the main body to follow, and it says down here, the key note of everything to be simplicity.
And this is quite a heavy, large operation. The other similarity I found is that after they took the beaches and landed the main troops, the objective was to cut dependencies as well: cut railroads, cut tunnels, cut bridges, to be able to sustain further operations.
After that, I think the metaphor breaks down, but that's fine; all metaphors should. So, NDepend. Let's start looking at the tool and see some concrete examples. The thing about NDepend is, it's true what they say here: you get insight into apps you didn't have before; there's a breadth and depth of information.
But let's just take a look at how that looks. We can build a very small and simple application, like a console app: you know, if args.Any(), we'll do something, right?
We'll do, I'm not going to write too much code, but I just need, we'll say, you know,
hi NDC, and else, you know, I don't know, something else, doesn't matter. We're not going to run this code, right, we're just going to analyze it statically, so we're just going to build it.
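For reference, the little demo program described above would look something like this. A minimal sketch, reconstructed from the talk; the exact strings and structure are illustrative:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // One if/else gives this method a cyclomatic complexity of 2,
        // which matches the max and average NDepend later reports.
        if (args.Any())
        {
            Console.WriteLine("hi NDC");
        }
        else
        {
            Console.WriteLine("something else");
        }
    }
}
```

The point is that we only build this, producing an exe in the output folder; the program is never executed, because NDepend works on the compiled assembly.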
So I built the code; there should be a folder here with the application. We're not going to run it, we're just doing static analysis on it. This is NDepend, the full tool, and what I can do is just say: analyze some assemblies in a folder.
I'll take out the Visual Studio host exe and just analyze this exe file. Do your job, you know, generate your reports. And it's true, there's lots of data. It says here that my max and average cyclomatic complexity is two,
which makes a lot of sense: if this, do that, else do this. But I already have critical rules violated; actually I've violated six rules already, right? We can look at what they are; it's not that important, but the idea here is that there's a breadth and variety of information
and graphs and charts and things we can analyze about the code base. And I just talked about like, what did I say, like simple, the simplicity is the key to it all. So how do we use a complex tool in a simple way?
So the first thing is that we sort of, we make a plan, where we say, we get an overview of the assemblies, and the first mission is to, you know, cut the back end from the front end, I'll show you in the code after a while, and apply some custom metrics when we refactor,
I'll go into the details now. But the thing is to come up with a simple plan. The way this looks: all three websites you saw, which are one website, one code base today, have some couplings into the API for historical reasons. So the first part of the plan is: cut those dependencies.
The second part of the plan is to take the Super part out of the other applications. You could probably get the radio out as well, and we could do a lot more things, but that's the plan: two things. So that's a simple plan, right? The first thing NDepend can do for us is generate a map of all the dependencies we have.
So what you see here is that the front end web application, it has some common settings, common logger, common cache, it's fine, right? But there's this arrow here that points to web API models, so it's using some code from some back end models for some reason. And that has transient dependencies that will pull in the entire back end library,
also Elasticsearch, which is used in the back end for the indexing, and so you get this entire thing. So the first objective is basically cut this line, right? And that's what I mean by strategic, it's just like, cut that line, I don't care how it's done.
And NDepend can also tell me: that line, what does it depend on? It still doesn't make a lot of sense to me reading it out; there's some profile configuration code that is using some get-series-ID method from a view model. But I don't really care; all I want to do is give this to someone and say: delete these dependencies.
I don't care how you achieve that objective. At that point we're at the tactical level, and I'm not going to go into details on that, that's at a local level, you can refactor it, you can duplicate the code, whatever, to get rid of it. I'll show you how we can build this as well, the graphs.
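An objective like "this assembly must never reference that one" can also be locked in as a custom NDepend rule, so the build warns if the dependency ever comes back. A minimal CQLinq sketch; the assembly names here are placeholders for illustration, not the actual NRK assembly names:

```csharp
// Warn if the frontend web assembly ever references the backend models again.
// "Frontend.Web" and "WebApi.Models" are hypothetical names.
warnif count > 0
from a in Application.Assemblies
where a.Name == "Frontend.Web"
      && a.IsUsing("WebApi.Models")
select a
```

Checked in alongside the project, a rule like this runs on every analysis, which is what makes the "cut this line" objective verifiable rather than a one-off cleanup.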
And this is just to show you that the same image can be generated from ReSharper using the architect overview there. So this is basically the same image with the dependencies between assemblies, it's just oriented a bit differently.
I can just show you how this is done. So right now I'm going to analyze this super application,
which has been split out from the code base. When I analyze it, I can ask NDepend to view application assemblies only in a graph, and it will draw this... where did it go?
It will generate this nice little diagram for me, which is the same we saw. And I can ask it, the way I've generated these graphs is just build a graph of code elements involved in this dependency, for example.
This is quite heavy in this case, so it's a more difficult objective to cut out, but you can get this.
So what you see is that even though I'm at this level only caring about cutting these assemblies, there's still like 73 violated rules,
8,000 violations and 7 critical rules. But the way I use NDepend is that we can't go and start refactoring and cleaning up all of that; I'm just using it to achieve some specific objectives. And I think it's important, when we work with a tool like this, not to get all hung up on doing everything the tool tells us,
because then we would have needed to start a lot earlier, not many, many years into the project. But anyway, what you can see here is that what we've achieved at this point is getting rid of the backend assemblies. In this case, the Super site has been split out, just renamed and duplicated out, but you can see that there are no backend references.
Just that simple thing reduced the size of the package that we deployed from 300 something megabytes to 100. Still way too big and I think now we're down at maybe 30, but just this one change does have dramatic impacts.
At this point what we had done is that we had separated the super website out, duplicated all the code, get rid of the backend references, which is all good,
but we duplicated 12,000 logical lines of code. We just copy-pasted it; we took all the complexity from Super and NRK TV and duplicated it, which gives us the opportunity to change one without affecting the other, but there's still lots of debt and legacy code
in both of the applications. The question is how do we get rid of that? The image that I used to explain what we're doing is just basically we had the yarn or string, blue and red string intertwined into a ball. How do you separate that out?
Do you try to get the threads out one by one, or do you just duplicate the ball and cut off the blue parts from one end and the red parts from the other? To know that you're actually cutting out thread and not just duplicating it, we're using NDepend on the build server.
I'll just show you how we've set that up. We basically measure all of that: I set up a nightly build with NDepend. I can go into details later on how it's all set up.
That gives us these trend charts. This one is lines of code, but I've set it up with NDepend to track lines of code in the Super assemblies, lines of code per namespace, lines of code for generated view code, lines of code for controllers,
and lines of code for the managers, plus average and max cyclomatic complexity in the different namespaces of the duplicated code. What you see is that when we duplicated the code, it wasn't that long ago; it was just in time for this talk, the 1st of June.
That was merged into the master branch, and in the Super assemblies alone we have 10,000 logical lines of code. That's a lot more in plain lines of code; logical lines are basically just your statements and expressions and so on. At this point, we hadn't really achieved any cleaning up.
Then the guys started deleting code. You see we have deleted approximately, I would say, 40% of the code from super, which was TV and radio stuff, which is not used at all. We can see by the complexity that we haven't really reduced any complexity.
I think that is because we have just deleted the controllers and the views and the stuff that is really separate, but we haven't started going into refactoring the methods where they could be really simplified inside controllers and so on. I would say at this point, we should continue into refactoring
more local complexity before we stop. And this is nice, because we can go to the managers and say: we need a few more weeks. We've already accomplished this; we've reduced the complexity and the amount of code by half, but we need some more time to clean up some of the other stuff.
They can take this to their managers and show what they're doing, so we're actually able to prove that we're getting something out of it. I think too many times developers go: oh, the managers don't understand why we need time for refactoring. But we need results; a lot of the time we just say, but it's cleaner now,
we've done some major refactoring, it's better now, but I think we should show with some graphs and charts how we've actually improved.
I'll show you a little bit of how we go about setting up the metrics we use when we're refactoring. One of the most powerful things NDepend gives you, in addition to these charts and graphs
and trends that you can dig into, is almost a domain model of your code: types, methods, fields, namespaces, assemblies, which you can query with what they call CQLinq, so we can use LINQ to query the code. I can even metaprogram against the code base
to make this metric. I'll just show you how do I make a custom metric like that for my code base. We can look at the dashboard again. It should be here. It's a bit like flying a spaceship navigating around here, but this is one of the graphs I made.
How do we make these graphs that contain, for example, lines of code for controllers or views? This is the code required: application namespaces where the name contains "Controllers", their child methods, sum of lines of code. This is how you give it a name,
and this is the unit. If I want to make another metric, I just take an existing one; normally I start with one of NDepend's built-in metrics that looks like what I want and tweak it for my own use. I can make a new metric in here, and I can say: where the name contains,
I don't know, ASP. You see I get the result immediately down here, and I can call it lines of code for ASP; this is the generated code in the ASP namespace. Now I see there are 2,000 lines of code. I can make a new trend chart that I call NDC, and I can say
I want to have this... I guess that's in... here it just shows up. The name is the one I gave it here in the comments, and the unit shows up here, and I can add that to my chart, and I get a graph,
and I add that in, and it will show up in the reports on the build server. The level of metaprogramming is super simple if you know LINQ and have any idea that your application consists of assemblies, namespaces and methods. Basically, you can start writing very complex and nice queries. This is how you do cyclomatic complexity.
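As a sketch, a custom trend metric like the "lines of code in controllers" one described above might look like this in CQLinq. The TrendMetric comment header is what gives the metric its name and unit; the namespace filter is illustrative, not the exact NRK query:

```csharp
// <TrendMetric Name="LoC in controllers" Unit="Loc" />
// Sum the logical lines of code of every method declared in a
// namespace whose name contains "Controllers".
let loc = Application.Namespaces
   .Where(n => n.Name.Contains("Controllers"))
   .ChildMethods()
   .Sum(m => m.NbLinesOfCode)
select loc
```

Once a query like this is saved, its value can be added to a trend chart and recorded by the nightly build, which is how the graphs shown in the talk were produced.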
This is how they do IL nesting depth, and you can find all the rules. You can look at your dependency graph again and go: I want to cut this as well, this common stuff. And I can
generate a code rule that matches if this dependency exists, check it in so it generates a warning on the build server, and then refactor the dependency out and watch it go green. So it's almost like test-driven,
but we're still at the assembly level right now, how to cut project references and assembly references. I also need to give some credit
to the guys who actually deleted the code, because I haven't touched the code base; I just put up the metrics and so on. I'm on another team, the API team, so I'm not touching the frontend code. These are Tobias and Stian; they did all the deletions. I should actually show you how it looks,
because it looks very simple to delete 2,000 logical lines of code, and then you look at some of the pull requests right here, and it's like remove unused controllers from super, and you have this 11,000 lines of code,
and this is maybe 1,500 logical lines of code, and you code review this stuff, and it's like: files changed, 200 files changed. Okay. So this is where it gets tactical. This is where I move out; I like to stay at the strategic level when it comes to code reviewing this stuff.
As you see, people go a little bit: aren't you test-driven, don't you want tests to look after you? And I'm like, yeah, you're deleting your tests as well, right? So you need to really know what you're doing. The only thing you can do is rely on your integration tests
that are testing your UI, and your manual testing process, to catch any errors that you introduce right here. But I think they had some fun as well. I was looking through the pull requests yesterday, and some of them said things like, yeah,
oh, I got a little bit eager, right? And I think people get a little bit more eager when they actually get these graphs and measurements, and they get some feedback from the build server that they're doing a good job, and we can go to management, you know? So it gets a little bit competitive, actually getting rid of all the legacy code and getting rid of complexity.
So you have these hairy ones as well. Oh, this is a small one, it's only 2,000 lines of code. In the beginning it was a bit demotivating, because you're like, oh, I deleted 8,000 lines of code, and I would see only this little dip in the graph,
because converted into logical lines of code it wasn't that much, and we had to delete a lot more. But that just ended up being real, telling you how things really are and not how you want them to be. So this was the assembly level, and now we can go in and look internally at assemblies. This is when it gets really nasty,
because unlike F#, which tells you that you have to declare things in order and can't reference things cyclically, C# lets you have references between namespaces without really giving you a warning. The only tool that I've seen that really gives these warnings
is NDepend, if you use it quite actively. So here's how it looks: if I open this and look at, okay, what are the namespaces inside here? All the red arrows are namespaces that are mutually dependent. And if you try this on your own code base, unless you are very strict about it and you use a tool like this,
this is often how things look. It can be innocent things sometimes, like the interfaces were put in the same assembly as the implementation, or something was using concrete types that are quite easy to clean out, and sometimes they're just horrible. So here we have, you can see,
okay, so my models and my controllers have a mutual dependency, which means the models reference the controllers and the controllers reference the models. So why is a model using the controllers? Now we're inside the assembly, looking at how the namespaces relate, right? So I can say, okay.
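Those red arrows can also be found with a query. A rough CQLinq sketch, close in spirit to NDepend's built-in rule about mutually dependent namespaces (the property names follow NDepend's vocabulary, but treat this as an approximation):

```csharp
// <Name>Namespaces mutually dependent</Name>
// Hedged CQLinq sketch; runs inside NDepend's query editor.
warnif count > 0
from n in Application.Namespaces
where n.NamespacesUsed.Intersect(n.NamespacesUsingMe).Any()
select new { n, cycleWith = n.NamespacesUsed.Intersect(n.NamespacesUsingMe) }
```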
And now graphs don't work very well anymore, because there are just too many namespaces, so we use the matrix view. I say: keep only the involved namespaces and show me this cycle. This is pretty cool. Well, where did the matrix go?
It's this spaceship view again. I just need to find it... I think it hid behind here, yeah. These are the models namespace and the controllers namespaces, and I can click them and get a view of only the related stuff.
This is probably why it's hard to do just by visual inspection and reading code without a proper tool, because it is quite complex. So what you see: the green ones here are the ones that go from controllers to models, which are fine, but where are the others? The blue one is going in the other direction.
So you can see, what is that one? It's the extra material season model and the series controller that have a relation. And there's one more blue guy like that. And this is the same slide I just showed you, in real life.
So this is the dependency. Basically there's one method, which NDepend tells me about, in the model class, that gets some URL to a season reference for a program; it uses the MVC framework to get the series controller
and call a method on it. And after we had split things out and cleaned out some of the code, this was dead. This is dead code, so I could easily delete it; it's only used on TV anyway. And when I run the analysis again, I get this nice little one-directional arrow. So at this point, basically, we can start iteratively just cleaning out.
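To lock in a one-directional arrow like that, a dependency rule of roughly this shape can be checked in, so the build server warns if the cycle ever comes back. The namespace names here are illustrative, not the real ones from the code base:

```csharp
// <Name>Models must not use Controllers</Name>
// Illustrative CQLinq sketch; "MyApp.Models" and "MyApp.Controllers"
// are hypothetical namespace names.
warnif count > 0
from n in Application.Namespaces
where n.Name == "MyApp.Models" && n.IsUsing("MyApp.Controllers")
select n
```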
And hopefully, if you get all those dependencies straightened out, you get a nice non-cyclic view. You should really try this on your own code base. I tried it on some open source projects and so on, and the code isn't as clean as people like to think,
because we a lot of times don't run this kind of analysis. The last thing I just wanted to show you is how this stuff is actually just part of the build process.
So the first thing I showed you was just analyzing some folders and an assembly manually. But what you can do is configure these projects, like the one I'm working on now. It's an NDepend project file. So this is basically... I can show you where it is.
It's just in the repo. For example, in Super here, there's the NDepend project file, which is generated by the NDepend tool. Where did it go?
I normally modify this a little bit by hand as well. It tells you which assemblies you want to scan and which assemblies are part of the analysis, and the tool helps you build these, but you can easily modify them. Where are the code analysis rules?
Here are all the definitions of the rules, the trend-chart series, and the custom stuff that you made. So the charts that I made by right-clicking and so on just get saved like this, and it's quite easy to edit as pure text as well. This is all the definition for the reports that get run by NDepend. And to get those on the build server,
it's basically, we just go into the build configuration in TeamCity. There's a build step; we're using FAKE, so we're just running a build command. And after that I have an NDepend step, which is a built-in integration
with TeamCity. So I just say: there's an NDepend runner, there's my NDepend project file, which contains all the metrics, all the graphs, all the rules, all the warnings and so on. And I just say, compare it to the last successful build. And then you get these nice graphs that everyone can enjoy.
There are a lot more features in the tool that I haven't even looked at, like comparing code changes between versions. The code diff in GitHub only tells you which lines of text
have changed, right? Whereas NDepend has a higher-level view of the code, so it can tell you more logically what has changed, and where, in your code between commits. There are a lot more advanced queries and metrics you can do. What I would like to hear about is other people showing how they do
refactoring at a strategic level in a tool like this, because I think this gives us sort of a common model to talk about refactorings that are at a higher level than just working with the code itself. And I would really like to hear how other people do it.
What I really miss is more data on what effect these metrics have. And what I mean by that is: okay, so we've done this stuff. We reduced the amount of code by a factor of two. We reduced complexity by this and this much. What effect does that have on the speed of the team? I know that's really hard to measure, but that's something I would be very interested in understanding.
The stuff about the managers, I think that's really important, because what I've found is that when I'm able to give these graphs to my managers, they can show this: this is how the code looked before, a mess of red arrows going in all directions, and this is how it looks now.
Can they get two more weeks to clean up this stuff? Chances are they will: yeah, cool, keep going. But a lot of the time, what I find developers do instead is that they go to the managers and say, we've been refactoring. We're really proud. We made a lot of good progress. Can we continue?
If you can't show any evidence of your improvements, it's very hard to actually get any more time. The main reason I'm doing this talk is that I would like other people to share their experiences at this level.
Most of what I've heard up until now when I talk to people is like, yeah, I've used NDepend. Oh, what did you use it for? And it's always like, oh, this client wanted me to come in and analyze someone else's code base to give them a report on its status. But I haven't seen too many experiences of how it's actually being used.
So, does anyone have any questions or comments? Or things you... yeah, sure.
Yes, yes, okay. So the question is, if I understand you correctly: when you integrate it into continuous integration, can you stop the build process if there's something wrong?
And yes, okay, I will show you. So in this case, I'm not doing it right now. I probably should, but I haven't done it like that. But this is our CI build, and I have this nightly NDepend build, so it's just on the side, and it goes red all the time
because the number of inspection errors from NDepend is too high. But what I could do, if I go into, okay, if I go into the CI build now,
hopefully the guys won't be too annoyed, and I'll go to the build step, I add NDepend, and I'll say the project file is, hold on, see if I can find it.
So you would have to configure all the rules just like we have in the project file and you say this is where it's going to run. And you say,
now it will run that, and we can set a build failure condition: fail if inspection errors are more than zero. Right now they're going to have a problem, because the number of inspection errors in this analysis, as it's currently configured,
is something like 8,000. So if I save it now, they need to actually fix 8,000 inspection errors for the build to go green.
Yes, so the comment is that you could tell it what is important and what should stop the build or not. And you can. That's what you would do in here. You would go in here and say, what are the rules that are violated right now? Methods with too many parameters,
for example, is a critical rule that stops them from releasing the build. And you could open this rule, which is defined in the NDepend project file. And it says, sorry about the messy windows here, warn if count is larger than zero. And you could, with "inform if" or whatever the command is called, take the level of the warnings down, and then just say that these are the definitions that will break the build as inspection warnings, and set it up to break when inspection warnings are more than zero
or when inspection errors are. So you could disable failing the build on warnings, fail it on errors, and go in and decide which of the things that are errors now should be errors, which warnings, and so on. So definitely you can, and I would say you should do it, or I should do it,
and definitely, when you remove a dependency, you should add a rule, add it into CI, and make it stop the build. The thing is, what I'm showing you now is just what we have actually done, and we haven't done that yet. But what I showed you now
is how to do it, and we could and probably should do it like that, and then just tighten the rules. The point I wanted to make, in a way, is just that it's possible to use NDepend
without caring about all the fancy features to begin with. You can use it strategically to do a few things and not even care about the rest, because I think what happens sometimes is that people open the tool, they get 10,000 warnings, and they're like: this is super cool, but there's no way I can relate to 10,000 warnings. And they close the tool and go about their day-to-day work again.
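For reference, the "methods with too many parameters" rule discussed a moment ago is roughly this shape in CQLinq; the threshold of 8 is my assumption. Flagging such a rule as critical in NDepend is what turns its violations into build-breaking errors rather than mere warnings:

```csharp
// <Name>Methods with too many parameters</Name>
// Hedged CQLinq sketch; the threshold is an assumption.
warnif count > 0
from m in JustMyCode.Methods
where m.NbParameters > 8
orderby m.NbParameters descending
select new { m, m.NbParameters }
```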
But no, I think I should definitely go in, sort out what should be errors and what should be warnings, and break the build. And the other thing is just that we have four build agents and I have one license, so I just run NDepend on one of the agents,
so my license is for one agent. If I want to run it on each CI build, I would probably have to license all the agents, which would cost a little bit more money. And I feel confident to go and ask for that money now, but when I started the project, I was just experimenting with NDepend to see if it had any value. Now that I see it has value for us, I would probably ask for money
to license it on the other agents and start integrating it into our CI builds. Thanks. Any other questions? Cool. If you want to see any details of it, just ping me afterwards.
Oh, yeah. So the question is: you can use the metrics to show your managers that you're making progress,
but how do you convince them that these are good metrics to focus on? I don't know, actually. I think managers like graphs, so they see graphs going down.
They're like... you know, users going up, complexity going down. That's my simple answer to it. That worked for me. But sure, that can be a problem. You still have to convince people. The only thing that I found, and there are papers on this, I don't have the reference right now,
is that when you double the amount of code, or the complexity of the code, the time to understand it roughly quadruples; the cost grows faster than linearly. And when all your namespaces are referring to each other, you have to understand the entire thing, which is maybe ten times larger, and so takes a hundred times longer to understand
than if you can understand one namespace as a module at a time. So I think you could argue that there is some science behind this, saying that if you reduce the amount of code, and the amount of code that depends on other code, then understanding it, reading it, and fixing it is faster.
But I think that's the best I can give you. Try the graphs, and if the graphs don't work, go look for the papers. If that doesn't work, move along. Oh, that's a good question.
So the question is: when you use dependency inversion, how can you track all the types and stuff? I didn't mention that. After having used NDepend, I hate assembly scanning
with a passion. When you just scan assemblies and pick up interfaces, where the reference is just a string to some other assembly, it's like when you have a bloodhound chasing you and you cross a river to get rid of your tracks. NDepend doesn't have anything to work with,
because you just told the container to go and find all the types in this file at runtime, doing whatever metaprogramming and reflection on your code. So the thing is, I've gotten more and more focused on helping NDepend now.
So instead of doing assembly scanning, you explicitly say: implement this interface with this class. Then it's possible to see what's being used where. I think being explicit about which interfaces you are using, and what concrete types are behind the implementations and so on, helps a bit.
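As a sketch of the difference: the types below are made up, and the container API is Microsoft.Extensions.DependencyInjection, which I'm using purely for illustration; the talk doesn't say which container NRK used.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Hypothetical types, for illustration only.
public interface ISeasonRepository { }
public class SeasonRepository : ISeasonRepository { }

public static class Composition
{
    public static ServiceProvider Build()
    {
        var services = new ServiceCollection();

        // Explicit registration: this code statically references both the
        // interface and the concrete type, so a static analyzer like
        // NDepend sees the dependency in its graph.
        services.AddSingleton<ISeasonRepository, SeasonRepository>();

        // Assembly scanning (pseudocode): only a string names the assembly,
        // so static analysis has nothing to follow, and wiring mistakes
        // surface only at runtime.
        // ScanAssembly("NRK.Tv.Data.dll");

        return services.BuildServiceProvider();
    }
}
```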
But sure, that's my best answer. You can come and try it later if you want, and see what it can and can't find. But being explicit about your types and interfaces
helps NDepend and helps your analysis. We've gotten bitten a lot of times by assembly scanning: you can't see it statically, nothing breaks at compile time, and then you run it and things break at runtime. You need runtime analysis to catch that, and that NDepend can do too.
But if you just help it a little bit statically, everybody's happier, I think. Okay, thanks.