
Deep Design Lessons


Formal Metadata

Title
Deep Design Lessons
Number of Parts
110
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
It seems that the word 'design' has been surrendered to UX over the past few years. We don't talk about the internal design of software as much as we used to. However, there are still things to learn. In this talk, Michael Feathers will outline and codify some of the gems that he feels are less understood today.
Transcript: English (auto-generated)
there, we're describing integers, right? We're describing the ranges of integers in various programming languages. What's odd about this is that quite often we don't think about the particular limits on integers and on the other data types in our languages. We know the limits are there, we hope our values will always fall within those ranges, and we know that overflow and underflow are possible. But it's interesting to consider what these ranges mean for program organization at a larger level. Okay, so with integers we definitely have questions of overflow and underflow and of which values are valid. Some of these problems can go away. In many languages today you have promotable integral types, meaning integers that are effectively unbounded. You can keep going and going and they never overflow, because they build a bigger internal representation to handle very, very large sums. But the big question we always have to deal with is what is appropriate for our domain.
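Here is a small Python sketch that makes the contrast concrete. Python's own `int` is one of these promotable integral types. `wrap_int32` is a hypothetical helper that emulates the two's-complement wraparound a fixed-width 32-bit signed integer (Java's `int`, for example) exhibits on overflow.

```python
# Python's built-in int is arbitrary precision: it promotes to a larger
# internal representation instead of overflowing.
INT_MAX = 2**31 - 1            # largest 32-bit signed value
promoted = INT_MAX + 1         # no overflow in Python


def wrap_int32(n: int) -> int:
    """Emulate 32-bit two's-complement wraparound (hypothetical helper)."""
    n &= 0xFFFF_FFFF                          # keep only the low 32 bits
    return n - 0x1_0000_0000 if n >= 0x8000_0000 else n


print(promoted)                # 2147483648
print(wrap_int32(promoted))    # -2147483648
```

The unbounded type removes overflow, but it does not answer the domain question. An account balance of ten digits is still representable, and whether it is valid is a separate matter.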
So let's take a look at an example. Here's a class called Account. I'm sure you've seen this kind of thing in many little presentations; it's almost the fundamental object-oriented program, right? What do we know about this class? What's permissible with it? Well, if we look at it, it seems like any integer can come in: we can pass in any number as our initial balance. We can deposit values, and they get added onto the balance. We can subtract them, and that's a withdrawal. And we can get the balance. All very simple things to do. What are the appropriate ranges for these things? It's funny, there's no real constraint in the code as we're looking at it now. But a reasonable constraint might be to say, look, we're only going to allow positive balances in our bank account. That might be a particular type of bank account we care about; "no overdrafts" might be the best way of looking at it. How do we change this code to make that possible? Can we do it with the type system?
It's kind of odd to think about. It depends on what language you're working in, because some languages don't have unsigned integer types. If they do, it makes things a lot easier. If they don't, you're stuck putting constraints inside the code to handle the conditions where things can go wrong, right?
So we'd probably want a check right at the beginning, in our constructor. We'd also want a check in deposit, verifying that the value coming in is positive, or at least non-negative. And in withdraw, we'd want to check that the withdrawal never puts our balance into a negative state. Now of course, in doing these things, we end up adding all sorts of exception checking, because a user can still call the code in ways that would put it into a state where it's no longer valid. This really has to do with the limitations of static type checking in languages, and the limits of types in general. This view of types can also be extended a bit, since I like to write these kinds of annotations down when I'm thinking about things. We can look at the values coming in to these functions and annotate them: look, these take values between zero and int max. Get balance can return a value between zero and int max.
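Below is a minimal Python sketch of the checked Account with the range annotations written as comments. It is an illustration rather than the code from the slides. It assumes a 32-bit `INT_MAX` (Python's own `int` has no such limit) and uses `ValueError`/`OverflowError` as the failure signal. The method names follow the talk's description but are otherwise hypothetical.

```python
INT_MAX = 2**31 - 1  # assumption: mirror a 32-bit int; Python's int is unbounded


class Account:
    """An account that permits no overdrafts: balance stays in [0, INT_MAX]."""

    def __init__(self, initial_balance: int) -> None:
        # initial_balance: [0, INT_MAX]
        if not 0 <= initial_balance <= INT_MAX:
            raise ValueError("initial balance must be in [0, INT_MAX]")
        self._balance = initial_balance

    def deposit(self, value: int) -> None:
        # value: [0, INT_MAX - balance], so the new balance stays <= INT_MAX
        if value < 0:
            raise ValueError("deposit must be non-negative")
        if value > INT_MAX - self._balance:
            raise OverflowError("deposit would push the balance past INT_MAX")
        self._balance += value

    def withdraw(self, value: int) -> None:
        # value: [0, balance], so the balance never goes negative
        if value < 0:
            raise ValueError("withdrawal must be non-negative")
        if value > self._balance:
            raise ValueError("withdrawal would overdraw the account")
        self._balance -= value

    def get_balance(self) -> int:
        # returns: [0, INT_MAX]
        return self._balance


acct = Account(100)
acct.deposit(50)
acct.withdraw(30)
print(acct.get_balance())  # 120
```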
Those range annotations can tell us what's really legal, what's permissible with this particular class. Now, here's something funny. When we have this data-representation view of an object, what happens when we start using inheritance? Some of you have used design by contract. Have you heard of it? Okay, it's an idea that was first articulated by Bertrand Meyer with Eiffel, and it has since been adopted by just about every contracts library you'll ever see; C# has contracts libraries, for instance. These libraries let you assert particular conditions and tie them to the code, so that you can say: look, I have a precondition for this function, and if I pass in values that meet that precondition, I'm guaranteed that the postconditions will hold. As a result, it's rather easy for us to reason about things. Now, what happens when you start to use inheritance? That's when things start to get a little strange. Take a look at this class right here. Would it be permissible for us to subclass it and put checks inside the subclass to verify that the values coming in are only positive? Would that work out okay? A subclass of this class. Nobody wants to answer, oh God.
Well, of course you could do it, right? You could definitely write a subclass of this class, override deposit so that it checks the incoming values, and override withdraw so that you can't withdraw into a negative balance. The thing about that, though, is that it would mess up your polymorphic substitutability. You've heard of the Liskov Substitution Principle? Yeah, okay.
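The subclassing scenario just described can be sketched in Python to show the substitution failure. `NonNegativeAccount` and the client function `apply_correction` are hypothetical names. The base `Account` here deliberately has no precondition on `deposit`, matching the unconstrained class in the talk.

```python
class Account:
    """Base class: deposit accepts any integer (no precondition)."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance

    def deposit(self, value: int) -> None:
        self.balance += value


class NonNegativeAccount(Account):
    """Subclass that STRENGTHENS deposit's precondition to value >= 0."""

    def deposit(self, value: int) -> None:
        if value < 0:
            raise ValueError("negative deposit rejected")
        super().deposit(value)


def apply_correction(account: Account) -> None:
    # Hypothetical client, written against Account's contract,
    # which allows negative values.
    account.deposit(-5)


base = Account(10)
apply_correction(base)                  # works: balance becomes 5

try:
    apply_correction(NonNegativeAccount(10))
    substitutable = True
except ValueError:
    substitutable = False               # the subclass broke a valid caller

print(base.balance, substitutable)      # 5 False
```

A caller that was correct for `Account` fails when handed the subclass. That is the LSP violation, and it comes from the subclass accepting *less* than its parent.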
That's a definite LSP violation. What's interesting is that you can look at the same thing from a data perspective, where the rule is that subclasses can weaken preconditions and strengthen postconditions. In other words, a subclass is allowed to accept more.
But it has to guarantee at least as much; it can't guarantee any less than its superclass did. So there's a way of looking at this as a data constraint on the input values a class accepts and on the output values that come back from it. If you have this mindset, you think about the constraints on things coming in, the constraints on things going out, and the constraints that remain within the class. That helps your reasoning about how things fit together in a program. Again, that's all data; it's all constraints. Let's try another example. Suppose I've got my Account class and it uses a tax table, okay?
Here I have a method called apply, and I pass a tax table into the account. What does that actually do? If we look at it, it adds the rate for a particular category onto the balance. You can see that the category is a variable that's part of this class, right? Now, when you do this, what kind of constraints do we have around this operation? It's kind of funny, because a lot of them are silent. Say we have a constraint on the balance: it has to be positive, or at least non-negative. Then we'd have to make sure that rate for category can't return negative values, right?
Otherwise we could end up with a negative balance, and that could be kind of awkward. The other constraint here is subtle: we have to pass in a category, some piece of data, that rate for category understands. Of course, that isn't really specified in this code, but category could be an enumeration. It could also be an integer. In that case you'd have to ask yourself: am I handling all the cases, all 2^16 or 2^32 possible integers? And I have to handle errors gracefully and make sure all of that works. So we have constraints running in both directions when we do this sort of thing. Now, have you ever heard of tell, don't ask?
Have you heard of that idea? Okay. In object-oriented programming, it's something that was written up by Dave Thomas and Andy Hunt back in the early 2000s. The idea is that object orientation works much better when you're not asking objects questions but telling them to do something for you, right?
It's a subtle thing, because when you think about it, you might ask, what difference does it make? But it's subtle and also very valuable: it helps you arrive at better encapsulation decisions. The slightly awkward thing about this code is that we're asking the table for something, some integer. That means we have to be very aware of what comes back. We have to know whether it plays well with the stuff we want to add it to. Is it negative? If it's not negative, everything's okay. If it's negative, we're in trouble. We need to be aware of that constraint, and that makes things a little awkward: we have coupling between these two objects. Let me show you an example of something that doesn't have as much coupling, because it's a pure tell system.
Take a look at what's going on here. Here's our deposit method. I've added something new to it: a transaction record object, and I tell it to record the balance and the value. Now, what can go wrong when I call this? Do I have to worry about the types of balance and value as much? The important thing is that the transaction record's record method will take the full range of those values, right? And we don't have to care about anything being supplied back to us, because nothing is. We don't have to care about values falling outside the range we can handle.
In essence, any problem that occurs is going to be a problem on the other end. We give the data to them, and we'll never even hear about a problem unless it throws an exception back in our face, right? We pass information in, and the receiver can choose to ignore it or choose to record it. It can do all those things, but we've basically passed the problem on. It can't impact us any longer.
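The ask/tell contrast can be sketched side by side in Python. This is a minimal illustration under stated assumptions, not the code from the slides. `Category`, `TaxTable`, `TransactionRecord`, the rates, and the choice to record the post-deposit balance are all hypothetical. `apply` asks the table for a value and has to validate it. `deposit` tells the record about the change and has nothing to validate.

```python
from enum import Enum


class Category(Enum):
    STANDARD = "standard"
    REDUCED = "reduced"


class TaxTable:
    """Hypothetical table; rates are illustrative integers, not real data."""

    def __init__(self, rates: dict[Category, int]) -> None:
        self._rates = rates

    def rate_for_category(self, category: Category) -> int:
        # Silent constraint: an unknown category raises KeyError.
        return self._rates[category]


class TransactionRecord:
    """Hypothetical collaborator that accepts any pair of ints."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, int]] = []

    def record(self, balance: int, value: int) -> None:
        self.entries.append((balance, value))


class Account:
    def __init__(self, balance: int, category: Category,
                 record: TransactionRecord) -> None:
        self.balance = balance
        self.category = category
        self._record = record

    def apply(self, table: TaxTable) -> None:
        # ASK: we pull a value out and must check its range ourselves,
        # because a negative rate could leave a negative balance.
        rate = table.rate_for_category(self.category)
        if self.balance + rate < 0:
            raise ValueError("rate would make the balance negative")
        self.balance += rate

    def deposit(self, value: int) -> None:
        self.balance += value
        # TELL: we hand data over and get nothing back to validate.
        self._record.record(self.balance, value)


record = TransactionRecord()
acct = Account(100, Category.STANDARD, record)
acct.apply(TaxTable({Category.STANDARD: 7, Category.REDUCED: 2}))
acct.deposit(25)
print(acct.balance, record.entries)  # 132 [(132, 25)]
```

Because the tell-style `deposit` only hands data to `record`, a test can pass in a `TransactionRecord` and inspect `entries`, with no return values to stub.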
And this is a neat thing about object orientation: in many cases we can organize our objects so that we aren't getting things from other objects. If we're not getting things from other objects, we don't have to worry about them passing us wrong values. If we flip it around and say, look, we're just going to pass things to other objects to get our work done, we have a better chance of avoiding errors. It's fascinating when you do that. From a design-by-contract point of view, the contract for a method like this is really skinny. It's just saying: look, I passed you an integer, any valid integer, and you accept it, right?
There's no particular constraint on it, and because of that, we have a better chance of producing code with fewer errors. Have people seen much of this style of programming, where it's all about notifications and things along those lines? It's cool, okay? If you look around at some frameworks, like the NUnit testing framework, internally they have a very tell-don't-ask style. This is the JUnit testing framework in Java, and you'll see this pattern used over and over again. Some of the worst muddles you'll ever get into in object-oriented code come from collaborations between yourself and several other objects. I grab from this one, I grab from that one, I grab from another, I do some things, and then I pass the result on to somebody else or change my internal state. When you discover that you want to test code like that, you find a terrible problem, because now you have to mock out all this craziness. You have to know what state this object will be in when you call it, and what state that one will be in. You also have to deal with time dependencies. When you organize your design so that you're pushing things out, your testing is often much easier.
And your design ends up a lot more decoupled. Now, the funny thing when you do this in a design is that you tend to get a bit of a dichotomy: you end up with objects that pass things to other objects, right? But then what do you pass?
Well, here I'm just passing data: balances, values, and so on. As you move further into this tell-don't-ask style of programming, you start to consolidate some of these primitives into bigger objects, which are really just data objects. So you have a layer of little messages that you pass around, and then you have hubs, which are the things that receive and handle those messages. Does that sound like anything else? It's kind of like networking, isn't it? You have these autonomous things passing messages back and forth, and the receivers decode the messages and do things with them. So the design tends to dichotomize.
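Here is one way to picture the "messages and hubs" split in Python, as a sketch under assumptions. `Deposited` is a hypothetical message type that consolidates the balance and value primitives into one immutable data object. `Hub` is a hypothetical minimal publish/subscribe dispatcher, not any particular framework's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Deposited:
    """An immutable message object that bundles primitives together."""
    account_id: str
    balance: int
    value: int


class Hub:
    """Hypothetical hub: accepts messages and forwards them to subscribers."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Deposited], None]] = []

    def subscribe(self, handler: Callable[[Deposited], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: Deposited) -> None:
        # Tell, don't ask: each handler gets the message; nothing comes back.
        for handler in self._subscribers:
            handler(message)


received: list[Deposited] = []
hub = Hub()
hub.subscribe(received.append)
hub.publish(Deposited(account_id="acct-1", balance=125, value=25))
print(received)
```

The contract between sender and hub is entirely the shape of `Deposited`, which is the "it's all data at the boundaries" point made next.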
And again, the contract between these things comes down to data, just like what we saw with our constraints a little earlier. So let's go a little further with this, looking at data again. Is there anything wrong with this code? I'm just going to camp on this until somebody tells me.
I know you can't be happy with this. Or if you are happy, tell me about it. Am I happy with this code? No? Come on, speak up.
Okay. Go ahead, yeah. Yeah, that would be one way. It's funny, because what you're doing is presenting a solution, but what's the problem? What he said was, why doesn't the client choose which function to call? That's fair, but what's the problem? Is there a problem here?
Beyond the... Okay, so if you look at it from a single-responsibility point of view, the function is doing two different things, right? There's an old piece of advice from way back in the structured programming days: never pass control flags into functions. And what's funny is that you can look at it from all different directions. One of them is single responsibility, since inside the function you have to decode these flags. Another reason this is particularly bad: if I'm passing a control flag into this run method, how does anybody know what true or false mean? It's very error-prone. Have you seen code like this before?
Raise your hands if you've seen code like this. Okay, yeah. Lots of people do this sort of thing; they just don't see that there's a problem. At the call site, true and false don't really mean much to us. Would it be better to call the specific functions we care about? Would it be better to have an enumeration and pass in stepwise or continuous?
We could do that, but then why not just have separate functions, right? That's another argument. One reason you might pass an enumeration is that this is part of an API and you want to keep the API very narrow. You don't want a whole series of functions that each do various things.
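The three options just discussed (control flag, enumeration, separate functions) can be compared in a short Python sketch. Only `run` and the stepwise/continuous modes come from the talk. `Mode`, `run_in`, `run_stepwise`, and `run_continuously` are hypothetical names, and the returned strings stand in for real behavior.

```python
from enum import Enum


# Smell: a control flag. At the call site, run(True) says nothing.
def run(stepwise: bool) -> str:
    if stepwise:
        return "stepping"
    return "running continuously"


# Option 1: an enumeration keeps the API narrow but names the choice.
class Mode(Enum):
    STEPWISE = "stepwise"
    CONTINUOUS = "continuous"


def run_in(mode: Mode) -> str:
    return "stepping" if mode is Mode.STEPWISE else "running continuously"


# Option 2: separate functions, so the caller picks the behavior directly
# and nothing has to be decoded inside.
def run_stepwise() -> str:
    return "stepping"


def run_continuously() -> str:
    return "running continuously"


print(run(True), run_in(Mode.STEPWISE), run_stepwise())
```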
So you make the enumeration part of your API. That could work. But anyway, there's something smelly about this. I want to compare it to something else that's also a little smelly and see if we can find the underlying principle. Okay, this is some Ruby code. What's wrong with this code? If anybody screams that it's Ruby, I'll have a fit, right?
Anything wrong with this at all? I may not have the best example for this. But what I think is funny here, and this happens all the time in code, is that we have a layer separation. One part of the program constructs a data string, and another part decodes that string in order to do something. Now think about that. We do that sort of thing all the time with XML and JSON, right? Particularly when we're passing data from one system to another, over a network or something along those lines.
What gets really crazy is when you do that kind of thing inside your program. What makes it crazy inside a program? One part of your program constructs something, only for another part of your program to take it apart and parse it. So you hope you get both ends right. On one end you're constructing something, and on the other end you're parsing it. Whenever you modify the data format over here, you have to modify it over there too. That's a natural coupling between two different parts of the system.
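The same shape appears in this hypothetical Python sketch. The talk's slide used Ruby, and the `DEPOSIT|id|amount` format here is made up. One function builds a string, another parses it, and the format knowledge is duplicated in both. Passing a typed object instead means nothing has to be encoded or decoded.

```python
from dataclasses import dataclass


# Private language: one part of the program encodes a string...
def build_command(account_id: str, amount: int) -> str:
    return f"DEPOSIT|{account_id}|{amount}"


# ...and another part must decode it, duplicating knowledge of the format.
# An account_id containing "|" silently breaks the round trip.
def handle_command(command: str) -> tuple[str, int]:
    verb, account_id, amount = command.split("|")
    if verb != "DEPOSIT":
        raise ValueError(f"unknown verb {verb!r}")
    return account_id, int(amount)


# Alternative: pass a typed object; nothing is encoded, so nothing is lost.
@dataclass(frozen=True)
class DepositCommand:
    account_id: str
    amount: int


def handle(command: DepositCommand) -> tuple[str, int]:
    return command.account_id, command.amount


print(handle_command(build_command("acct-1", 50)))   # ('acct-1', 50)
print(handle(DepositCommand("acct-1", 50)))          # ('acct-1', 50)
```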
What's common to this example and the previous one is that both are cases where we can be lossy, where we can lose information in the program, right? When you call that function with true or false, what does that mean? You have to go inside the function to understand what true or false mean.
And over here it's like, oh, construct this string and hope we did it right, and then we have to do the parsing. But we've lost information that could have been kept in the type system or in the explicit representation of the objects. So we're introducing the possibility of error, and we're also making it harder to understand what's going on. I look at this as a smell I call private language, okay? These are just two examples of it. Private languages spring up quite often when people don't see direct ways of doing things. Have you ever seen this kind of thing? You have some part of a very big function, and people say, okay, I'm going to set this flag here because I've discovered this problem.
Then they go further and further down. Now they've discovered another part of the problem, so they set this flag here, and down here they set another flag. Then the whole bottom of the method is like: if this is true and this is true and that's false, then do this; if that's true and that's true and that's false and that's true, then do that. Have you ever seen code like that? Yeah, okay. Again, this is a private language.
You're building up one representation in order to decode it later to solve a problem. So it's a code smell. I don't think I've ever seen anybody write it up or talk about it. What's interesting to me is that it shows up when you think about things from a data perspective. As much as we love behavior in objects, we can still see data, as I said earlier, at the boundaries when we pass a message from one object to another. We still see data in the parameter list. We see data in the structures passed back and forth. And we definitely see data in these little private means of communication that we build up inside objects. That's a little awkward, and it's a nice thing to know about.
Okay, so to summarize: translating to strings loses constraints. Something else I find interesting is that this happens all the time with error handling in applications. What's everybody's favorite method of handling errors in an application?
Logging. What? Logging. Logging? Yeah, logging is good. What about the E word, exceptions? Well, I guess it's overstating it to call that a favorite, right? Anybody here love exceptions? No, okay. I think exceptions are sometimes one of those beneficial evils.
The very easy thing to do with exceptions is to throw one as soon as you discover a problem and hope that somebody higher in the stack will catch it, right? And you hope the behavior when you throw that exception is well defined. Is it really clear to whoever receives it what should happen? Is this something I want to terminate the application for? Do I want to log it at a higher level? Do I want to retry? It's a very easy way to defer problems. But the funny thing is that it has the same issue. We have local information about a context, we produce an exception, and we throw it back to another context that doesn't have the inner context at all, right?
Then we try to make a decision based on that. So again, it's like building a private language between this piece of the program and the other piece. Can you get past that? Of course. You can add a stack trace to your exception. You can add context, capturing the variables from that lower context and so on. But again, you're now in the position of writing a program to decode your program. At the upper level, you have to parse that information and understand what to do with it. So this split between inner and outer contexts, these private languages, seems rather pervasive in applications. And again, I think we see it when we look at data constraints in particular areas of code.
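The mitigation just mentioned, attaching the lower context to the exception, can be sketched in Python. `WithdrawalRejected`, `withdraw`, and `handle_request` are hypothetical. The structured fields spare the upper level from parsing a message string. As the talk notes, though, the upper level still has to interpret those fields, so the private language is narrowed rather than eliminated.

```python
class WithdrawalRejected(Exception):
    """Carries structured context instead of a message the caller must parse."""

    def __init__(self, account_id: str, balance: int, requested: int) -> None:
        super().__init__(f"withdrawal of {requested} rejected for {account_id}")
        self.account_id = account_id
        self.balance = balance
        self.requested = requested


def withdraw(account_id: str, balance: int, requested: int) -> int:
    # Lower context: knows exactly why the operation failed.
    if requested > balance:
        raise WithdrawalRejected(account_id, balance, requested)
    return balance - requested


def handle_request(account_id: str, balance: int, requested: int) -> str:
    # Upper context: decides using the fields, not by decoding message text.
    try:
        withdraw(account_id, balance, requested)
        return "ok"
    except WithdrawalRejected as err:
        shortfall = err.requested - err.balance
        return f"declined: short by {shortfall}"


print(handle_request("acct-1", 10, 25))  # declined: short by 15
```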
This leads me to something else people aren't very aware of these days. Has anybody heard of this one? Postel's Law? It's been dawning on me over the past couple of years who the people are that really understood design in the context of software development, and that people don't listen to that often. It's the architects of the internet, right? There are so many tacit bits of knowledge baked into the work the IETF has done, into the protocols we use day to day, TCP/IP, things like fail fast, all these other things. This one is neat because it tends to hold for networking systems, and it also works out well for Unix command-line utilities. And you know what? Once you understand it, you start seeing applications of it all over the place in the code you work on. So what's it about? Be conservative in what you send and liberal in what you accept. What in the world does that mean? Well, let's say you're writing some utility, okay?
Your utility is supposed to accept tab-delimited values, rows and rows of them. You tabulate them and produce a number for each line, something like the sum of that line's values. Should you throw away your input if somebody puts a space rather than a tab between their fields? It depends on whether spaces are significant. If they're just numbers, you might say, well, I'll be a little lenient and accept both spaces and tabs, right? Maybe you don't even document it; you just accept it. Doing that makes your program a little more useful.
Now, when you're writing out your values, should you put out tabs and spaces willy-nilly? No, pick one. Always produce tabs or always produce spaces, right? What you're doing is creating a funnel. You're allowing lots of people to produce input for you and being very forgiving in what you accept. But you're also saying: look, you can count on me to produce exactly this format.
That's how you deal with errors. You accept what's produced for you, and you try to leave the world a better place by producing output in a very tight format. Has anybody seen examples of this elsewhere in programming?
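The tab-delimited utility described above can be sketched in Python as one example of Postel's Law. This sketch assumes every field is an integer and picks a hypothetical output format of "row number, tab, row sum". The reader is liberal: `str.split()` with no argument splits on any run of whitespace, so tab-separated, space-separated, and mixed lines all parse the same way. The writer is conservative and always emits exactly one tab between fields.

```python
def parse_rows(text: str) -> list[list[int]]:
    """Liberal reader: accepts tabs, spaces, or any mix; skips blank lines."""
    return [[int(field) for field in line.split()]
            for line in text.splitlines() if line.strip()]


def format_sums(rows: list[list[int]]) -> str:
    """Conservative writer: always 'row number<TAB>sum', one line per row."""
    return "\n".join(f"{i}\t{sum(row)}" for i, row in enumerate(rows, start=1))


data = "1\t2\t3\n4 5\t6\n\n7  8 9\n"    # mixed separators, a blank line
print(format_sums(parse_rows(data)))
```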
It's funny: at a very high level, you can look at Liskov substitution as almost a variation of this. Subclasses should be permissive in what they accept but restrictive in what they produce. That's the whole precondition/postcondition idea I was talking about earlier, right? And when you think about it even more deeply, you start to realize that this is fundamental to all systems, not just networking systems or computing systems. When you build pipes in a plumbing system, isn't one end a little wider so it can connect to the next piece, which is a little narrower? The general idea is to be permissive on the input and more restrictive on the output.
By doing that sort of thing, you end up building robust systems over time. Again, this is something you start to see when you look at data formats. One of the weird experiences I had early on as a programmer was at a biomedical company, where I was lucky enough to work on translators, or parsers, for a data format designed by some biologists. And generally that's okay, because biologists are very good at designing data formats. Isn't that true? No. It's the usual thing: people know their domain, they feel they can do this sort of thing, they do it, they make some mistake, and they never quite understand what the mistake was.
The data format I was working with was called Flow Cytometry Standard, or FCS. And there was a very cool thing about the FCS format: you could save data as comma-separated values, in binary, or as arbitrary-precision numbers, with all these data formats and headers and everything else. You could save data in a million different ways in FCS, which meant writing a writer was very, very easy. Writing a reader was a real pain, because you had no idea which of those formats people had chosen to write things out in. I think if anybody on the team developing this standard had really been aware of that, they would have said, oh no, we have to be restrictive, because writing readers for this is going to be a pain. And I think you don't see developers making that same mistake all that often, because they are aware of the pain involved in receiving something and having to deal with it.
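The asymmetry is easy to see in code. This is a hedged sketch of a hypothetical mini-format, not FCS itself: the "spec" allows three encodings (csv, tsv, and little-endian 32-bit floats, here called f32le), named by a header line. A writer is free to implement just one of them, while a reader has to handle all three, plus anything unrecognized.

```python
from __future__ import annotations

import struct

# Hypothetical mini-format: "<encoding>\n<payload>", where the encoding is
# one of "csv", "tsv", or "f32le" (little-endian 32-bit floats).


def write_values(values: list[float]) -> bytes:
    """A writer may pick any allowed encoding, so one branch is enough."""
    return b"csv\n" + ",".join(repr(v) for v in values).encode("ascii")


def read_values(data: bytes) -> list[float]:
    """A reader must cope with every encoding any writer might have chosen."""
    header, _, body = data.partition(b"\n")
    if header == b"csv":
        text = body.decode("ascii")
        return [float(x) for x in text.split(",")] if text else []
    if header == b"tsv":
        text = body.decode("ascii")
        return [float(x) for x in text.split("\t")] if text else []
    if header == b"f32le":
        if len(body) % 4:
            raise ValueError("binary payload is not a whole number of floats")
        return list(struct.unpack(f"<{len(body) // 4}f", body))
    raise ValueError(f"unknown encoding: {header!r}")


print(read_values(write_values([1.5, 2.0])))  # [1.5, 2.0]
```

Every encoding a permissive spec adds is one more branch that every reader must carry, while writers are free to ignore it; a restrictive spec would collapse read_values to a single branch.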
The funny thing is, even if you haven't heard of Postel's Law, it's probably something you've encountered, and you almost have it baked into you by now. You have this sense that that's just the way you organize things. So it's an interesting thing to be aware of. Let's look at data one last time.
What's wrong with this code? Now if nobody finds anything wrong with this code, I'm just gonna walk off the stage. Sorry? Too many dots, okay. But you know, when you think about it, that shouldn't be a problem because dots take up very little ink, right? So they're green. Dots are green relative to the other punctuation we have in languages, right?
Okay, that was a joke, right? This is a classic example of a violation of the Law of Demeter. You've heard of the Law of Demeter, right? The notion is that when you have an object, you shouldn't be letting its internals out for other people to use. And there are a bunch of reasons why this can be awkward.
One of them is that it puts a burden on the person who's using this. They not only have to know how to deal with an account; they have to know that an account has a calculator, that a calculator has a superannuation table, that superannuation tables have cells, and that you address them this way. You dramatically increase the burden on people trying to understand your stuff, right?
Beyond that, from a dependency point of view, code written this way is very easy to break. Maybe I decide the calculator shouldn't live on the account; maybe it belongs on another object that the account holds. Anything you do to alter internal structure can alter these chains of objects and force recompilation or cause test failures.
And from a compilation point of view, in a statically typed language, any time I touch one of these intermediate classes, it's probably going to force this code to recompile. In a dynamically typed language, you probably want to rerun all the tests for all these intermediate pieces too, right?
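A minimal Python sketch of the two styles, using hypothetical Account, Calculator, and SuperannuationTable classes modeled on the example above (the actual slide code is not reproduced in this transcript). The chained call reaches through every intermediate object; the delegating call lets Account hide where the calculator and its table actually live.

```python
from __future__ import annotations


class SuperannuationTable:
    """Hypothetical lookup table of rates keyed by (row, column)."""

    def __init__(self, cells: dict[tuple[int, int], float]):
        self._cells = cells

    def cell(self, row: int, column: int) -> float:
        return self._cells[(row, column)]


class Calculator:
    def __init__(self, table: SuperannuationTable):
        self.table = table

    def rate_for(self, age: int, term: int) -> float:
        # Only the calculator knows that ages are rows and terms are columns.
        return self.table.cell(age, term)


class Account:
    def __init__(self, calculator: Calculator):
        self.calculator = calculator

    def rate_for(self, age: int, term: int) -> float:
        # Delegate instead of exposing the chain of collaborators.
        return self.calculator.rate_for(age, term)


account = Account(Calculator(SuperannuationTable({(40, 10): 0.035})))

# Demeter violation: the caller must know three types and how they connect.
chained = account.calculator.table.cell(40, 10)

# Demeter-friendly: the caller knows only Account; internals can move freely.
direct = account.rate_for(40, 10)
```

If the calculator later moves to another object held by the account, only Account.rate_for changes; every caller written in the chained style would have to be found and edited.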
So you're convinced it's bad? It's kind of bad, right? What about this? Is this code okay? It's Ruby code. That makes it inherently bad, doesn't it? Anyway, I'm not going to paint this as an ideal vision of what code should be, but I feel more comfortable with this code than with the previous code. Even though it's a little terse and a little weird, it relies on something that is more functional in style than object-oriented. What we have here is a series of transformations.
We have a list of events. We map each event to its date, so we only pull out the date of each event. We sort all those dates. We do a uniq on those, which removes all the duplicates. Then we call each_cons(2), which gives me each consecutive pair of dates. Then we map those pairs to the difference between the dates, build a frequency histogram, and select the entries within certain ranges. Now, I'm sure some of you are looking at this and thinking, oh my God, that's horrible. It's code written by mathematicians with no design sense at all. There are much tamer examples of this sort of thing that I feel much happier about, but I don't feel bad about the style itself regardless.
What makes this style useful and interesting is that it's fully transformational. There is no state in events being modified. When I do a sort, I get something completely new out: a completely new array. When I do a uniq, I get a new array too. Nothing I do in a later part of the sequence can change anything earlier in the sequence. The other thing is that, generally, each of these steps accepts an array and returns an array.
So we don't have the big type issue we were talking about a little earlier, where I've got five different types of things all the way down my chain. Here, it's just a bunch of array transformations giving me the things I want. Can you still get into trouble with this? Yeah: somebody could take the map operation off of arrays, but they'd be shot, right? That would be a terribly nasty thing to do; it would break everybody's code in the world. But it's interesting to notice that in one context this Demeter-ish chaining is actually okay, and in another it isn't quite okay.
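For readers who don't use Ruby, here is a rough Python rendering of the chain described above. It is an approximation under assumptions: the Event class, its occurred_on field, measuring gaps in days, and the inclusive low/high range filter are stand-ins for details not shown in the transcript. Each step produces a new value, so nothing earlier in the pipeline is modified.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    occurred_on: date  # hypothetical field standing in for the event's date


def gap_histogram(events: list[Event], low: int, high: int) -> dict[int, int]:
    """Every step builds a new value; nothing earlier in the chain is mutated."""
    dates = [e.occurred_on for e in events]          # map each event to its date
    ordered = sorted(dates)                          # sort: returns a new list
    unique = list(dict.fromkeys(ordered))            # uniq: drop duplicates, keep order
    pairs = list(zip(unique, unique[1:]))            # like each_cons(2): consecutive pairs
    gaps = [(later - earlier).days for earlier, later in pairs]  # difference, in days
    histogram = Counter(gaps)                        # frequency histogram: gap -> count
    return {gap: n for gap, n in sorted(histogram.items()) if low <= gap <= high}


events = [Event(date(2024, 1, d)) for d in (5, 1, 2, 2, 4, 15)]
print(gap_histogram(events, 1, 7))  # {1: 2, 2: 1}
```

The intermediate values here are mostly lists (with a Counter and a dict at the end), so the chain exposes very few types, which is the property the talk contrasts with the account/calculator/table chain.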
A lot of that comes down to change characteristics: how easy is it to change the code when you're doing these things? So it doesn't feel like there's any strict rule here. What I think is neat about this, and I was looking for a word for it, is a concept from category theory, which gets bandied about in the Haskell and functional programming communities, called an endomorphism.
An endomorphism is essentially a function that accepts a value of some type and returns a value of that same type. It can do things to the values, right? But it accepts that type and returns that same type. And you'll see that, roughly, almost everything here is endomorphic. We accept an array, we return an array. We accept an array, we return an array.
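A small Python sketch of the endomorphism idea, with hypothetical step functions: each maps List[int] to List[int], so any of them can be chained with the others, and the result of chaining is again a function of the same type. The compose helper is illustrative, not from any library.

```python
from functools import reduce
from typing import Callable, List

# An endomorphism on List[int]: a function from List[int] back to List[int].
Endo = Callable[[List[int]], List[int]]


def compose(*steps: Endo) -> Endo:
    """Chain endomorphisms left to right; the result is again an endomorphism.
    With no steps, this is the identity function."""
    def run(xs: List[int]) -> List[int]:
        return reduce(lambda acc, step: step(acc), steps, xs)
    return run


def drop_negatives(xs: List[int]) -> List[int]:
    return [x for x in xs if x >= 0]


def ascending(xs: List[int]) -> List[int]:
    return sorted(xs)


def dedupe(xs: List[int]) -> List[int]:
    return list(dict.fromkeys(xs))  # keeps first occurrence of each value


clean = compose(drop_negatives, ascending, dedupe)
print(clean([3, -1, 2, 3, 0]))  # [0, 2, 3]
```

Because the composed pipeline has the same type as its parts, it can itself be a step in a bigger pipeline (for example compose(clean, clean)). Endomorphisms with composition and the identity form a monoid, which is one way of saying why this kind of chaining stays so flexible.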
And we're able to chain things together in a very nice style. Now, has anybody ever seen a framework called jMock? Some of you have. There are variations of this in the .NET space too: those very strongly DSL-ish testing frameworks where you can say this.should.equal(that), and so on.
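As a rough illustration of that style, here is a tiny fluent assertion object in Python. It is hypothetical and does not mirror jMock's actual API; what it shows is that every call in the chain returns the same Expectation type, so a long chain exposes only one type to the caller.

```python
class Expectation:
    """Hypothetical fluent assertion object (not jMock's or any real library's API).
    Every method returns an Expectation, so a long chain exposes only one type."""

    def __init__(self, actual):
        self._actual = actual

    @property
    def should(self) -> "Expectation":
        # Pure readability: returns the same object so the chain reads like English.
        return self

    def equal(self, expected) -> "Expectation":
        if self._actual != expected:
            raise AssertionError(f"expected {expected!r}, got {self._actual!r}")
        return self

    def be_greater_than(self, bound) -> "Expectation":
        if not self._actual > bound:
            raise AssertionError(f"expected {self._actual!r} > {bound!r}")
        return self


def expect(actual) -> Expectation:
    return Expectation(actual)


# Reads like English; every link in the chain is an Expectation.
expect(5).should.equal(5).should.be_greater_than(3)
```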
You have long message chains that make the entire thing read like English. You might look at that and say, well, gee, that's a Demeter violation, right? But it's dealing with a trade-off: on one side, things that can cause trouble and break later when people move things around; on the other, something a bit more expressive that quite often relies on the same type being used throughout the chain of operations. Again, this is something you see when you look at the data that goes between things. Now, the funny thing about all this is that the topics I've discussed so far have been kind of mix and match.
I think what they have in common is looking at the data representation between things. That's something we can pay an awful lot more attention to as we develop code, and I don't see anybody talking about it very much in design, so I wanted to share it with you. I did want to mention a couple of things at the end, though, about the Law of Demeter. We just saw cases where what we consider to be the Law of Demeter is not quite what you might expect it to be. There are other cases where un-Demetered code is okay. So when you have this kind of thing going on, can anybody describe what systems it tends to happen in more than others? Do you see it in data acquisition systems, or in compilers? I don't know. I tend to see this sort of thing a lot in database-centric applications. I don't know about you guys, right? You ever see that sort of thing?
I have a bill of sale, and I get this thing dot this thing dot this thing dot this thing, because you have records and subrecords and sub-subrecords based on whatever your data schema happens to be, right? So how do you get rid of that when you're working in an application? How do you adhere to the Law of Demeter when you're doing that sort of thing?
It's kind of hard, isn't it? One thing you can do when you have a data model is say that one thing is primary, like the bill of sale. Even though it has subobjects, I take the methods that were on those subobjects, push them up to the top, and have high-level operations that let me manipulate the substructure without having to care about it all that much. Does it work? It can at times, right? But in database-centric OO applications we always have this tension between an overarching thing that acts as a container and all the things underneath it that you also need to change. So it's very typical that in a database-centric application, the Law of Demeter stuff doesn't work out as well. If you're doing things that are more workflow-like, where I have one processing step, and I pass this thing on to another processing step and then another, quite often you can have a common data representation you pass across and not have as many Demeter violations. You're not reaching into things as much.
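One way to push operations up to the primary object is sketched below in Python. BillOfSale and LineItem are hypothetical; callers ask the bill to add items or report totals instead of navigating into its line items themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str
    quantity: int
    unit_price_cents: int


@dataclass
class BillOfSale:
    """Hypothetical aggregate: callers use its high-level operations
    instead of reaching into bill.items[i].quantity themselves."""

    items: list[LineItem] = field(default_factory=list)

    def add(self, sku: str, quantity: int, unit_price_cents: int) -> None:
        # Merge into an existing line if the SKU is already present
        # (keeping that line's original unit price); otherwise add a new line.
        for item in self.items:
            if item.sku == sku:
                item.quantity += quantity
                return
        self.items.append(LineItem(sku, quantity, unit_price_cents))

    def quantity_of(self, sku: str) -> int:
        return sum(item.quantity for item in self.items if item.sku == sku)

    def total_cents(self) -> int:
        return sum(item.quantity * item.unit_price_cents for item in self.items)


bill = BillOfSale()
bill.add("A", 2, 150)
bill.add("B", 1, 300)
bill.add("A", 1, 150)
print(bill.quantity_of("A"), bill.total_cents())  # 3 750
```

The tension mentioned above still shows: items is public, so nothing stops a caller from reaching in anyway, and every new need for the substructure tends to become another method on the bill.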
I had an experience years ago that opened my eyes to something rather fascinating. I was visiting a team that had the worst Law of Demeter violations I've ever seen in my life, and it took me a long time to figure out why they were happening. It was a giant Java project, and it turned out that every time a visitor came to their site, they loaded a user, and when they loaded the user, they loaded up every bit of data that could possibly be accessed from that user and used in that session. So you can imagine what that was like. Each time through, you have the session, you navigate to the user, and you're able to get the entire history of everything, all this stuff. I was looking at this and saying, this is just madness. We'd look at particular functions, and you've got one sub-piece that talks to this, which talks to this, which talks to this, which talks to this. It took me a long time to understand why that was happening, and the reason was what I just said: they loaded everything all at once, right?
In most database-centric applications, what you end up saying is, look, I need a bill of sale, I need a transaction record, stuff like that. You're dealing with one piece plus its immediate subsidiaries, and through joins and so on you get only the piece you care about. As a result, you have a constrained sphere of what you're working with, right?
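A sketch of that constrained loading, using Python's built-in sqlite3 module and a hypothetical in-memory schema: one query joins a bill of sale with its immediate line items and loads nothing else that happens to be reachable, such as the customer history table.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: bills, their line items, and a customer history
    table that the query below deliberately never touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE bills (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE items (bill_id INTEGER REFERENCES bills(id), sku TEXT, quantity INTEGER);
        CREATE TABLE customer_history (customer TEXT, event TEXT);
        INSERT INTO bills VALUES (1, 'acme'), (2, 'globex');
        INSERT INTO items VALUES (1, 'A', 2), (1, 'B', 1), (2, 'C', 5);
        INSERT INTO customer_history VALUES ('acme', 'signed up'), ('acme', 'complained');
        """
    )
    return conn


def load_bill(conn: sqlite3.Connection, bill_id: int):
    """Load one bill plus its immediate line items with a single join, and nothing more."""
    rows = conn.execute(
        """
        SELECT b.id, b.customer, i.sku, i.quantity
        FROM bills AS b
        LEFT JOIN items AS i ON i.bill_id = b.id
        WHERE b.id = ?
        ORDER BY i.sku
        """,
        (bill_id,),
    ).fetchall()
    if not rows:
        return None  # no such bill
    found_id, customer = rows[0][0], rows[0][1]
    # LEFT JOIN yields one row with NULL sku for a bill with no items; skip it.
    items = [(sku, qty) for _, _, sku, qty in rows if sku is not None]
    return {"id": found_id, "customer": customer, "items": items}


conn = make_demo_db()
print(load_bill(conn, 1))
# {'id': 1, 'customer': 'acme', 'items': [('A', 2), ('B', 1)]}
```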
It was fascinating to notice that because they didn't do that, they ended up with far worse Demeter issues than they would have had otherwise. It led me to the notion that databases are good not just for storing data but also for the constraints they place on your development. They force you to get down to a smaller crux, a smaller change set in the things you need to pull in, modify, and then push back into your database. So that was interesting. So yeah, the Law of Demeter is a rather powerful and interesting thing. I think what makes it overlap with this talk a bit is that it is about representation. When you have a structure that looks like this, you are making the relationships between pieces very obvious, and that's good at times, except when that structure changes. So the cases where you want to make things very explicit are cases where the structure of the data holds information, rather than just the data itself.
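A minimal Python sketch of data whose shape carries meaning, using a hypothetical expression tree. Both trees contain the same leaves, 1, 2, and x, but they nest differently, so they evaluate to different results.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Mul:
    left: Expr
    right: Expr


Expr = Union[Num, Var, Add, Mul]


def evaluate(expr: Expr, env: Dict[str, int]) -> int:
    """Walk the tree; the meaning comes from how the nodes are nested."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    raise TypeError(f"unknown node: {expr!r}")


# Same leaves (1, 2, x), different shapes, different meanings.
a = Add(Num(1), Mul(Num(2), Var("x")))  # 1 + (2 * x)
b = Mul(Add(Num(1), Num(2)), Var("x"))  # (1 + 2) * x
print(evaluate(a, {"x": 4}), evaluate(b, {"x": 4}))  # 9 12
```

Code that works with trees like these naturally reaches through them (a.right.left.value, for example), which is exactly the un-Demetered look that is justified when the structure itself is the information.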
Parse trees in compilers are one case where that happens. It's not just that you have an equals sign, a variable, and a literal; it's the entire tree structure that has meaning. So you want structured data like this, and it ends up looking very un-Demetered in a way. But I think as programmers we just need to be aware of when we need that kind of thing, when we need a complicated data representation, and when we don't. You can go all the way back and look at all the different ways data appears at each step in your program, and quite often that gives you insights you would not necessarily have had otherwise. So anyway, that was my weird talk. It's all about data representations across things and how that ties into different aspects of design. Any questions or comments? Yes? Yeah, sure. There's one, and there's the other?
Yeah, okay. So the comment is that the problem with the first example is that it takes you deeper into the object, whereas these calls are all off of the same object. And that's fair. All these things are methods on Enumerable or Array, so they are just aspects of the same protocol. So yes, essentially that's the thing. And it's funny, because now that this style has become more popular, many people are adopting this point of view. They're saying that a key aspect of Demeter is the amount of type exposure you have: how many types are you really exposing in a context? In this case, it's generally just one, and that makes things a lot easier to deal with.
That's fair. Yes? Okay. The question was: would I say that if we're using an ORM, we end up with a structure like this, that it's bad, and that we should avoid it? I think data has to be structured in certain ways in order to make it easy to access. The representation you use inside your program doesn't have to be exactly the same, though it will probably map across quite easily. My bias is always to pull up only the information we need, in the form we care about, into objects that don't have to expose as much. But there's a real tension there, because you can end up producing a lot more classes to represent the same data.
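A sketch of pulling up only what you need: a hypothetical CustomerSummary holds just the two fields one screen might use, built from a nested dict that stands in for what an ORM could return. The cost is exactly the tension described above: one more class per view of the same data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CustomerSummary:
    """Hypothetical narrow view: only what one screen needs, with no way to
    navigate back into orders or their line items."""

    name: str
    open_order_count: int


def summarize(record: Dict[str, Any]) -> CustomerSummary:
    # `record` stands in for the nested structure an ORM might hand back.
    orders = record.get("orders", [])
    return CustomerSummary(
        name=record["name"],
        open_order_count=sum(1 for order in orders if order["status"] == "open"),
    )


record = {
    "name": "acme",
    "orders": [
        {"status": "open", "items": [{"sku": "A", "quantity": 2}]},
        {"status": "shipped", "items": []},
        {"status": "open", "items": []},
    ],
}
print(summarize(record))
# CustomerSummary(name='acme', open_order_count=2)
```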
So I think it's really a fundamental tension, and the indication that it really is fundamental is the fact that ORM is not a solved problem. It's always causing people trouble. That's inherent to the mismatch between database schemas and the object structures in our programs. Any questions or comments at all?
Okay, I think I'm a little bit early, but thank you very much.