
Overcoming Our Obsession with Stringly-Typed Ruby


Formal Metadata

Number of Parts: 65
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
We use Strings. A lot. We use them for pretty much everything that isn't a number (it's jokingly referred to as "stringly-typed"). It ends up creating unnecessary complexity in our applications, making the boundaries between classes and modules hard to understand. What if we used Ruby's power to create richer data types? These types would define clearer boundaries in our system, and make our code much easier to understand.
Transcript: English (auto-generated)
I'm Dave Copeland, otherwise known as davetron5000 on Twitter.
I wrote a book about how to be a good software engineer, and another about how to build awesome command-line applications in Ruby. Neither is particularly applicable to this talk, but I hope what we learn here will help us build better applications and be better programmers. So, the title of the talk:
Overcoming Our Obsession with Stringly-Typed Ruby. "Stringly typed" is what we're talking about. Does anybody know what I mean by stringly typed? Oh, and you came anyway. If you've ever heard of Jeff Atwood, otherwise known as Coding Horror on Twitter, he writes a programming blog, and he has a great post from a few years ago.
He goes through different kinds of programmer jargon, very amusing phrases for things we do as programmers, and the one I found most amusing was "stringly typed." As he puts it, it's a riff on strongly typed, used to describe an implementation that needlessly relies on strings when programmer- and refactor-friendly options are available.
We use strings all the time. Input comes as strings. Output needs to be a string. Rails loves strings. Databases love strings. But using them when we really shouldn't can get us into trouble, so let's look at a motivating example.
So let's talk about zip codes. If you're not from the United States, this is what you'd call a postal code: a group of digits that helps the post office figure out where a particular piece of mail is supposed to go, a sort of pre-sorting. The basic rule of a zip code is that it's five digits.
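A quick Ruby check shows why "five digits" matters more than "number": the usual ways of turning a zip-code string into an Integer lose or change information when it begins with zero. The sample values here are illustrative; the `String#to_i` and `Kernel#Integer` behavior is standard Ruby.

```ruby
ZIP = "00901" # an illustrative five-digit zip code that starts with zeros

# String#to_i reads decimal digits and silently drops the leading zeros.
puts ZIP.to_i               # prints 901
puts ZIP.to_i.to_s.length   # prints 3: two digits are gone

# Kernel#Integer treats a leading "0" as an octal prefix, so "00901" is rejected...
begin
  Integer(ZIP)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end

# ...while a zip-code-shaped string using only the digits 0-7 is quietly read as octal.
puts Integer("02134")       # prints 1116, not 2134
```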
It's not a number. It looks like a number, but it's always five digits, even if it begins with zero; zip codes in Puerto Rico, for example, start with 009. And, like I said, they map to a particular area of the United States as defined by the post office. You can sort of read into them: the first couple of digits give you a sense of where in the country it is,
and you can see that the numeric value of a zip code is lower on the East Coast than on the West Coast, but generally it's just a meaningless string of digits. With that in mind, we're going to look at a very simple application designed to send letters to people. We have a database of addresses,
and we have a third-party API that we're going to use to send the letters: your account is overdrawn, you're due for a refund, that sort of thing. That service takes care of printing, folding, stamping, and mailing the letters. And we do what programmers do, which is make two systems talk to each other
and wrap some service to do something more specific to what we need. So this is the Ruby API that the third-party mailing service provides us; that's why the implementation isn't shown. We configure the letters we might want to send in their system, and each one has an ID. So whenever we want to send a letter,
we find it by its letter ID, which gives us back an object with a method called mail that takes the four parts of an address as strings: a street, a city, a state, and a zip code. Calling that mails the letter to that address. To build our part of the system, we'll make a database of addresses,
and we'll use strings to store them, because that's what the database gives us. It's a pretty reasonable table design. Then we'll have some code of our own. This bit of code reads address information from the command line and stores it in the database. For the purposes of this talk, we can assume this $database thing is some super-simplified way
of getting data into the database. It's nothing like ActiveRecord, just something simple so we don't have to worry about the database layer. We can use this bit of code to add addresses to our database. Now, to actually send the letters, we're going to make two classes. The first one's job is to wrap
the third-party mailing service with something a little friendlier for the internals of our system. Our system deals with an address as a hash, so this letter sender takes the ID of the letter we'd like to send and a hash of the address to send it to, and it handles figuring out from that hash how to call the third-party service.
The second class we need handles getting an address out of the database. It takes the ID of an address and the ID of a letter. Using the address ID, it again uses our simple database API to get the address out, and then it calls the letter sender.
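The slides with this code aren't in the transcript, so here is a minimal reconstruction to make the flow concrete. Everything marked hypothetical is an assumption: the stubbed `MailingService::Letter` API (`find` plus a four-string `mail`), the in-memory `SimpleDatabase` behind `$database`, the `store_address` stand-in for the command-line code, and the `AddressMailer` name for the second class. What it does reproduce is the shape of the design: every address part is a String and every address is a Hash.

```ruby
# HYPOTHETICAL stand-in for the third-party mailing service's Ruby API.
# The real implementation isn't shown in the talk; this stub just records calls.
module MailingService
  class Letter
    SENT = [] # every mailing, so we can see what went out

    def self.find(letter_id)
      new(letter_id)
    end

    def initialize(letter_id)
      @letter_id = letter_id
    end

    # Takes the four parts of an address, all as plain strings.
    def mail(street, city, state, zip_code)
      SENT << { letter_id: @letter_id, street: street, city: city,
                state: state, zip_code: zip_code }
    end
  end
end

# HYPOTHETICAL super-simplified database: each table maps an id to a row hash.
class SimpleDatabase
  def initialize
    @tables = Hash.new { |tables, name| tables[name] = {} }
  end

  def insert(table, row)
    id = @tables[table].size + 1
    @tables[table][id] = row
    id
  end

  def find(table, id)
    @tables[table].fetch(id)
  end
end

$database = SimpleDatabase.new

# HYPOTHETICAL stand-in for the command-line input code: stores whatever strings it gets.
def store_address(street, city, state, zip_code)
  $database.insert(:addresses,
                   { street: street, city: city, state: state, zip_code: zip_code })
end

# Wraps the third-party service; the rest of our system passes addresses as hashes.
class LetterSender
  def send_letter(letter_id, address)
    letter = MailingService::Letter.find(letter_id)
    letter.mail(address[:street], address[:city], address[:state], address[:zip_code])
  end
end

# HYPOTHETICAL name for the second class: looks an address up by id, then mails it.
class AddressMailer
  def initialize(letter_sender = LetterSender.new)
    @letter_sender = letter_sender
  end

  def mail(address_id, letter_id)
    address = $database.find(:addresses, address_id)
    @letter_sender.send_letter(letter_id, address)
  end
end

peach_pit = store_address("45 South Fair Oaks Avenue", "Beverly Hills", "CA", "90210")
AddressMailer.new.mail(peach_pit, 12)

walshes = store_address("1675 East Altadena Drive", "Altadena", "CA", "Walsh")
AddressMailer.new.mail(walshes, 12) # nothing anywhere objects to "Walsh"
```

Running it mails both letters, and the "Walsh" address goes through just as happily as the Peach Pit's. The wrapper method is called `send_letter` rather than `send` so it doesn't shadow Ruby's built-in `Object#send`.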
This is very simplified, right? But we've separated concerns, and it probably has shades of an application that you work on. It has all the hallmarks of what we do: we're integrating with something, we're wrapping it, we're getting things out of a database, and everything in here is a string. But it's pretty simple.
So let's see it in action. So we'll store an address, 45 South Fair Oaks Avenue, Beverly Hills, California, and the only zip code everybody knows, 90210. Okay, so we've created an address. This is the address of the Peach Pit from 90210, if anybody ever watched that. So let's send the Peach Pit a letter.
Okay, we sent letter number 12 to the Peach Pit. No explosions; everything looks great. The happy path works. Let's send another one: 1675 East Altadena Drive in Altadena, California. This is the address of the house that they used on 90210 to represent
where Brandon and Brenda Walsh lived; that's where the actual house is. We're going to send that house a letter, but instead of using the proper zip code for Altadena, California, we're just going to type "Walsh". The database successfully inserted that row, so we now have an address that is clearly wrong. Let's send the Walshes a letter.
Okay, the letter was sent. That looks good, right? Except that the address is wrong. "Walsh" is not a zip code, yet our system was totally happy to send a letter to an address with that as its zip code. So what might go wrong in this case? If we're lucky, our third-party mailing service will blow up and tell us that this is a bad zip code
and refuse to send this letter. Every other possibility is worse. What if it gets sent to the post office? What will they do? Will they return it to us? Maybe. Will they trash it? They might. Will they try to deliver it? Sure. And if they do, where will it go? Is there a West Altadena Drive they'll send it to by mistake?
Is there an East Altadena Drive in some other town in California where it ends up? Who knows? We've given them an invalid address and made their job hard, and if this letter was important, it might not get where it needs to go. Clearly there's a bug in our system somewhere. You could argue that the problem is in reading the input, right?
We have this zip code as a string, but not every string is a zip code. So maybe we restrict our input to reject anything that doesn't look like a zip code. Okay, we could do that. Now when you're adding addresses to the database, we'll never allow an invalid zip code. But I could just go into the database
and insert an invalid zip code, right? Nothing is stopping me, and this is a real thing that could happen. We might need to bulk-import data from some other place. Someone who isn't a Ruby programmer might need to get data into the database. There might be a need to load data through something other than the cheesy command-line thing we wrote. So this is a real possibility.
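Restricting the input, as described above, could look like this sketch. The method name `read_zip_code` and the exact pattern are assumptions, since the slide isn't in the transcript. One Ruby-specific detail is worth knowing: anchor the pattern with `\A` and `\z`, not `^` and `$`, because in Ruby `^` and `$` match at line boundaries, so a multi-line string such as `"90210\nWalsh"` slips past `/^\d{5}$/`.

```ruby
# Exactly five ASCII digits, anchored to the start and end of the whole string.
ZIP_CODE_FORMAT = /\A\d{5}\z/

# HYPOTHETICAL validating version of the command-line input code:
# reject anything that doesn't look like a zip code before storing it.
def read_zip_code(raw_input)
  zip_code = raw_input.to_s.strip # drop surrounding whitespace, e.g. the newline from gets
  unless zip_code.match?(ZIP_CODE_FORMAT)
    raise ArgumentError, "#{zip_code.inspect} is not a five-digit zip code"
  end
  zip_code
end

puts read_zip_code("90210\n") # prints 90210

begin
  read_zip_code("Walsh")
rescue ArgumentError => e
  puts e.message # prints: "Walsh" is not a five-digit zip code
end

# Why the anchors matter: ^ and $ match at line boundaries in Ruby.
puts "90210\nWalsh".match?(/^\d{5}$/)       # prints true
puts "90210\nWalsh".match?(ZIP_CODE_FORMAT) # prints false
```

This only guards the command-line path; anything that writes to the database some other way bypasses it entirely.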
So we could say, well, maybe our database design was bad and we should do a better job of designing it. We could do something like this. If you're familiar with Postgres or other databases, this is called a check constraint: the database will refuse to insert or update a row in the addresses table unless the zip code matches this regular expression.
That's darn handy; it would prevent bad data from getting into the database. But there's nothing stopping me from going into a console and calling the letter sender directly with an invalid address. And again, this is a reasonable thing to want to do. What if it's an emergency and I don't have time to get the address into the database? What if someone else sees this class and wants to use it
and expects it to handle this error case, and it obviously doesn't? What if we need to send letters to addresses that aren't in our database and don't have the protections we just added? There are a lot of real reasons to call it directly. So we could add a check in the letter sender, but then what are we doing? We're adding zip code checks everywhere. The system we're looking at is pretty small,
but in any reasonably sized system, having to check this everywhere is untenable. You can't build a maintainable system this way, because every time you want to add, change, or modify anything that deals with the zip code, you have to remember to add a check that it's correct. That is not good.
What we really need is a better way to deal with this. If we think about our system, it looks like this: we've got the three classes we made, and they're all pretty small, pretty single-purpose, pretty easy to understand, and probably pretty easy to test. It feels like we did a really good job designing our application,
so why are we having this problem? I've drawn the parts of the application as blobs instead of the traditional boxes to emphasize the point: the boundaries between these parts are very loose, ill-defined, and weak, so anything can come and go. We really want only zip codes going around
in our code base, but we've done nothing to indicate that. In fact, we're allowing any string at all through our code base, so these boundaries are bad, which means it's easy to do the wrong thing and easy to create a problem. To be very clear about what I mean, a boundary
is what's coming and going, if you think of a method: what arguments are expected. If I'm calling this method, what am I supposed to pass in? What am I not supposed to pass in? What am I going to get back? If the answer is that you can pass in anything and get back anything, it's really hard to know what the method does, it's really hard to use it correctly,
and it's really hard to change it. That's not good. What you really want to say is, "I expect a zip code, and I will return an address," or whatever the case may be. But that's not what we do. We put strings and hashes everywhere; we use them all the time. Why? Because
input is strings. Input from Rails is a hash. When we produce output, it's always a hash. APIs want hashes, or JSON objects, or whatever the case is. The problem is that we have strings with no expectation of what's supposed to be in them, and hashes where we have no idea what the keys are supposed to be, and that makes the code very hard to understand.
But there's a reason we do this, and it's not because we're stupid or lazy or bad programmers. It's that there's a convenience to it. If we wanted to build a system in Java, for example, yes, you can make a string and you can make a hash, but it's not as easy as it is in Ruby, and the APIs around strings and hashes in Java
are somewhat cumbersome to use, so you don't tend to build Java systems on strings and hashes. Instead, you think really hard and create special-purpose classes for everything before you start writing code. In Ruby, though, we don't have to do that. We can start exploring the design
of our system, figuring out what classes we need and what the boundaries are by executing code, and strings and hashes let us do that, because we don't have to decide up front what goes in them. I can just push code around and get things kind of working before I think too hard about what's supposed to go in these strings
or what's supposed to be in these hashes. The problem is that if we leave the system like that, it stays very hard to understand and very hard to modify. So after we've figured out the boundaries between our classes, we need to make use of data types to define explicitly what is supposed to be coming
and going: what the boundaries are. A data type is a set of possible values. Integers are a data type: there are infinitely many of them, but not every number is an integer. These are integers and these are not, which gives you some feel for a data type, right? A string can be anything.
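Ruby's own `Integer()` conversion makes the "set of possible values" idea concrete: it accepts only strings that spell out an integer and raises `ArgumentError` for everything else. The helper name `integer?` is hypothetical; passing base 10 explicitly stops a leading zero from being read as octal.

```ruby
# True if +text+ belongs to the set of strings that spell a base-10 integer.
def integer?(text)
  Integer(text, 10) # raises ArgumentError for strings outside the set
  true
rescue ArgumentError
  false
end

puts integer?("42")    # prints true
puts integer?("-7")    # prints true
puts integer?("4.2")   # prints false
puts integer?("Walsh") # prints false
```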
A zip code can only be five digits, so the set of possible zip codes is much, much smaller than the set of possible strings. A data type also defines the allowed operations. What can you do with a number? You can add it, multiply it, divide it. You can add strings, or really concatenate them, but you can't divide
a string; that doesn't make any sense. Division is not an allowed operation on a string. Thinking about our zip codes, what operations are allowed on them? Yet so many things can be done to a string. Finally, a data type lets you assign meaning to its possible values. If I give you the string 20005,
it is meaningless. But if I tell you that string is a zip code, you know it means Logan Circle in Washington, D.C. There's meaning to these values. If you can describe your data as a set of possible values, the allowed operations, and some meaning for those values, you've said a lot about your system. You've said a lot about the boundaries, about what's coming
and going, a lot more than you get with just a bunch of strings. So let's compare our string-based implementation to what we would actually like to have. A string's possible values: anything can be a string. Operations: a ton. Its meaning can be anything, or really, it has no meaning. A zip
code is only five digits, so there are only 100,000 possible zip codes. Operations: probably none, at least not so far. And they have a meaning. We've implemented our system using the thing on the left, and we want to use the thing that's
on the right. That's the source of our problems. If we could arrange things differently, our system might be better. With firm boundaries, these circles would represent the only places where we actually need to deal with a string, and everywhere else in our system could be safe in the knowledge that it's getting a valid zip code. We could reuse those parts of the system. We could add to
those parts of the system without also adding all these stupid checks, because we could say the data type we want is a zip code and be sure that's what's coming and going. That makes things a lot easier, and the bug we saw would not have manifested itself in the same way. So let's make one. You could do it in a lot of ways. I think making a class is the best
way, but it's certainly not the only way. We need to get the tainted, potentially invalid string into our class at some point, so the initializer takes a string that we expect to contain the five digits of the zip code. But since the string can be anything,
we do need to check. If the string is not exactly five digits in a row, we raise an exception; otherwise, we're good to go. Just these four lines of code help us dramatically. If we create a zip code with 90210, that's valid, so zip1 is assigned and we can use it. If we try to create the zip code
with Walsh, that will explode, and zip2 will never be assigned. That means that anywhere in our code where we have a reference to an instance of the zip code class, we know we have a valid zip code. We don't have to check; it's already been checked. It comes to us as a correctly modeled data type. Obviously, Ruby lets you
subvert all of this, and I'm not going to talk about that, but the intention is encoded in our code: it's very clear what we're trying to accomplish. We need one operation, which is to get the characters back out, because the service further down the line needs a string. So we'll provide a method, raw_zip_code, that just returns it.
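Here is a minimal sketch of the class just described, assuming the check is a regular expression for exactly five digits (the slide's exact code isn't in the transcript). It raises in the initializer, so an invalid ZipCode can never be constructed, and it exposes raw_zip_code as its one operation. Copying and freezing the string is an extra precaution not taken from the talk: it stops a caller who later mutates the original string from invalidating the object.

```ruby
# A zip code: exactly five digits, kept as a String so leading zeros survive.
class ZipCode
  FORMAT = /\A\d{5}\z/

  # string:: the raw, possibly invalid input, e.g. from a form or the command line
  def initialize(string)
    unless string.is_a?(String) && string.match?(FORMAT)
      raise ArgumentError, "#{string.inspect} is not a valid zip code"
    end
    # Copy and freeze so later changes to the caller's string can't affect us.
    @raw_zip_code = string.dup.freeze
  end

  # The one operation needed so far: the digits as a String, for handing to
  # code (like the mailing service) that expects plain strings.
  def raw_zip_code
    @raw_zip_code
  end
end

zip1 = ZipCode.new("90210")
puts zip1.raw_zip_code # prints 90210

begin
  zip2 = ZipCode.new("Walsh") # raises, so zip2 never holds a ZipCode
rescue ArgumentError => e
  puts e.message # prints: "Walsh" is not a valid zip code
end
```

With this in place, the raw string would only be pulled out at the edge of the system, when the letter sender hands the address to the third-party service; everything in between passes ZipCode objects.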
Note that this operation happens to return our internal representation, but we could represent the zip code internally as an array of five characters if we wanted to; it doesn't really matter. What the method provides is the operation "give me the raw string of this zip code." If we were to implement our system using this class,
then we would reduce the places in the system where we could have an invalid zip code, and the internals of our system would be just what we want: nice, firm, clearly defined boundaries. I'd say what we've just been through is part of the design process. We figure out what problem we're
trying to solve, and then we shove our code around trying to figure out how to solve it. What is the code supposed to look like? What classes do I need? Get it to work, and once you understand what you're trying to do and how you're going to do it, you can define the boundaries you've created between your classes by using data types and creating classes that match them. That makes everything
easy to understand and is generally good. So we've seen how to make a class, and we know generally what it was trying to accomplish and why we care. But how do we actually let people know? How would I know I'm supposed to pass in a zip code and not a string? How would I know I'm getting a zip code back and not some other
type of object? Ruby has no way in the language to do this. You can't declare that a variable is supposed to be of a certain type, and there's no way to look at a Ruby program and figure that out. You have to run the program and see what happens, and there's no reason to think that running it twice would give the same results. So we have a bit of an issue, but
we can follow a few conventions that make it pretty simple. If you have a variable named zip_code, it stands to reason that it's supposed to be an instance of the ZipCode class, because our code base contains a class called ZipCode. That's a reasonable assumption, so it's a convention we can follow. If we need multiple zip codes in play, then we can
use _zip_code as a suffix on the variable names: sender_zip_code, we can assume, is probably an instance of ZipCode. Method names work the same way. If a method's name has zip_code in it, I can assume it returns an instance of the ZipCode class. And you can and should document the public
methods. That makes it unambiguously clear what is expected: what's coming, what's going, what the boundary is, what class I should pass. And if you do that, RDoc will hook it all up and cross-link everything if you ever generate and look at the RDoc. Here's an example. This geolocate method takes zip_code as an argument, so we can assume
that it's an instance of ZipCode, but since it takes two seconds of our lives to help someone, we'll write documentation that makes it clear, because this is a public method. The nearest_zip_code method is the same deal: since its name has zip_code in it, we can assume we're getting a ZipCode back, but again, we'll take two seconds to write that out because this is a public method
and it might be helpful. Our private method centerpoint also clearly takes a zip code, but since it's private, no one else will see it, and it's likely to change during refactoring, there's no real reason to write a documentation comment for it, although you could; it can't really hurt. So we talked about
a convention for that: how we can communicate to other developers, and to ourselves, what the boundaries are. Can we enforce these boundaries? We'll talk about whether or not we should, but let's see what it would look like if we tried to enforce them, writing our code so that it would not work to misuse the data
types. You can check the type: is_a? is a method on every object, and it returns true if the object is an instance of the given class or one of its subclasses. So if we made a subclass of ZipCode, this would still return true. That's one way to check. If you don't want to allow subclasses, you can use instance_of?.
This returns true if zip_code is a ZipCode, but not if it's an instance of a subclass of ZipCode, so it's a stronger check of the argument's type. A looser check would be to look at the methods it supports. Maybe we don't care whether it's a ZipCode, only that it responds to raw_zip_code. respond_to? is a way to do
that. When people talk about duck typing in Ruby, this is kind of what's going on. This is called structural typing, which means the type of something has to do with its structure, not its name. It's a weaker check, but it's a check. For checking return values, you can't easily access the return values
of every method the way you can with the arguments, so you'd have to do it in a test. In the tests for every method that returns a specific data type, you would assert that the result is an instance of the class you want, and you'd have to remember to always do this. So that's what it looks like, and if you had to write every single routine like that, it would probably be pretty painful.
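The three checks just described, from strictest to loosest, look like this in plain Ruby, followed by a public method documented in the RDoc style the talk recommends and guarded with is_a?. The pared-down `ZipCode`, the `ZipPlusFour` subclass, the `DuckZip` struct, and `zip_code_label` are all hypothetical, here only to show how the checks differ.

```ruby
# Pared-down ZipCode exposing the raw_zip_code operation.
class ZipCode
  attr_reader :raw_zip_code

  def initialize(raw_zip_code)
    @raw_zip_code = raw_zip_code
  end
end

# HYPOTHETICAL subclass, used only to show how the checks treat subclasses.
class ZipPlusFour < ZipCode
end

# HYPOTHETICAL unrelated object that merely "quacks" like a ZipCode.
DuckZip = Struct.new(:raw_zip_code)

zip = ZipCode.new("90210")
plus_four = ZipPlusFour.new("90210")
duck = DuckZip.new("90210")

# is_a?: the class or any subclass (or an included module).
p [zip.is_a?(ZipCode), plus_four.is_a?(ZipCode), duck.is_a?(ZipCode)] # prints [true, true, false]

# instance_of?: exactly that class; subclasses don't count.
p [zip.instance_of?(ZipCode), plus_four.instance_of?(ZipCode)] # prints [true, false]

# respond_to?: a structural ("duck typing") check on supported methods.
p [zip.respond_to?(:raw_zip_code), duck.respond_to?(:raw_zip_code),
   "90210".respond_to?(:raw_zip_code)] # prints [true, true, false]

# HYPOTHETICAL public method, documented and guarded at its boundary.
#
# zip_code:: a ZipCode
#
# Returns a String label for the zip code.
def zip_code_label(zip_code)
  unless zip_code.is_a?(ZipCode)
    raise TypeError, "expected a ZipCode, got #{zip_code.class}"
  end
  "ZIP #{zip_code.raw_zip_code}"
end
```

For return values, the corresponding test assertion in Minitest would be `assert_kind_of ZipCode, result`, or `assert_instance_of` for the stricter check.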
Right? So, should we do that? Is there any value in it? Some programmers think it is unwise to write code where the types aren't checked, that it's a bad idea to allow code without these checks all over the place. And that's fine. You could write type checks everywhere if you wanted to. In Ruby, I think that would be pretty painful,
and I don't think we need to. If that's the kind of code you want to write, there are other languages that are tailor-made for lots of type checking. Ruby is not one of them, and I think that's fine. So, then the question is: should we ever bother checking these types? Is there any point to it? I think there are two times it might be useful: if there's some sort of
risk involved, which we'll talk about, or if you're doing a big refactor. So, risk. Whenever I'm deciding this, I ask myself a couple of questions. How often do I expect to get the wrong type? Is this something that's really hard to screw up, where it's obvious what to pass? Or is it really easy to screw up, so that everyone is going to pass the wrong thing in?
Then I balance that against what happens if the wrong type does come in. Will Ruby raise a clear exception on its own, so I don't have to bother checking? Or will the medical laser cut off someone's thumb, in which case maybe I do want to check? You do this calculus in your head to figure out whether adding a type check would actually help.
I don't often add them, but I do think about this when deciding whether one is useful. A big refactor is the other case. Say you've worked in a statically typed language with a compiler and you want to do a big refactor, by which I mean you're making fundamental changes to internal core objects,
and that can be tricky because it's a big change. With a compiled language, you can change the types however you need to, compile everything, and get a big slew of errors about everywhere you've gotten the types wrong; then you fix them, and you're done. This is called leaning on the compiler, if you've ever heard of that. It's darn handy. It makes it really easy
to do these sorts of refactors in a compiled language like Java. Obviously we don't have a compiler in Ruby, but we do have a test suite, and we can combine it with some type checks to fake one. So, we change whatever core types we want, then add explicit type checks anywhere in our code that uses the things we just changed.
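A minimal sketch of such a temporary check, assuming a ZipCode value type has just replaced plain strings. expect_type! is a hypothetical helper name, and centerpoint here is only a stand-in for a call site touched by the refactor, not the talk's actual method:

```ruby
class ZipCode
  attr_reader :raw_zip_code

  def initialize(raw_zip_code)
    @raw_zip_code = raw_zip_code
  end
end

# Hypothetical helper: an explicit, temporary type check for refactoring.
# The message names the call site, so a failing test points straight at it.
def expect_type!(value, klass, context)
  return value if value.is_a?(klass)

  raise TypeError, "#{context}: expected #{klass}, got #{value.class} (#{value.inspect})"
end

# Stand-in call site that used to take a String and now expects a ZipCode.
def centerpoint(zip_code)
  expect_type!(zip_code, ZipCode, "centerpoint") # TEMPORARY: delete once the suite is green
  [0.0, 0.0] # placeholder coordinates; the real lookup is out of scope here
end

centerpoint(ZipCode.new("12345")) # migrated caller: fine

begin
  centerpoint("12345") # a caller the refactor missed
rescue TypeError => e
  puts e.message # prints: centerpoint: expected ZipCode, got String ("12345")
end
```

Once the test suite runs clean, the expect_type! calls come out again; they exist only to turn the suite into a makeshift compiler for the duration of the change.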
We add those type checks in lots and lots of places, so that when we run our test suite, they all blow up with nice, helpful error messages about where we've misused the data types we created. We fix the errors, and then we remove all the explicit type checks, because they don't need to be there anymore. So, this is the sort of thing to keep in your pocket if you're facing a large change that you want to make
and you're a little nervous about pulling it off. Adding these explicit type checks can help. But in both cases, checking explicitly is not something you're likely to do very often. So, on to the other problem: the world wants a string.
HTTP is strings. Rails is strings. Databases are strings. Output is strings. Input is strings. So what do we do when we have nice, well-defined data structures but have to live in a world that wants these dirty strings? There are two things you can do. First, you can adopt some protocols that will
turn your data type into a string in the standard way that various parts of Ruby's libraries expect. Second, you can create wrappers that take strings and turn them into your data type. By using those wrappers judiciously in just the right places, you avoid having to convert strings to your data type and back throughout
your code. So let's see a few examples. First, protocols. Has anyone read Confident Ruby by Avdi Grimm? If you haven't, you should buy it. It's excellent, full of all kinds of tidbits about Ruby, and super helpful. Chapter 4.2 talks about the protocols that the runtime and the standard library use for turning
objects into simple types like strings. For example, here's our ZipCode class. If we try to use it in IRB, the string version is this horrible #<ZipCode:0x...> thing. I don't know what that is. puts does the same thing, and so does string interpolation.
So, what's going on here is that Ruby is using a protocol to turn your object into a string. The default implementation of that protocol produces this mess. That's to_s, if you couldn't guess. So we can just alias to_s to our raw_zip_code method. We're saying: the string representation of our zip code is its raw zip code. And now
it works. Wonderful. But there's a second protocol, and I bet we've all seen this error before: "no implicit conversion of ZipCode into String". You've probably seen it with a boolean, like puts "some string" + true. At first it used to drive me crazy, because it's like, dude, you have to_s, you know how to convert it into a string. But this is actually Ruby
doing a type check. Ruby is saying: if you want to concatenate a non-string object onto a string, that's probably wrong. You're probably making a mistake, so I'm not going to let you do it; I'm going to blow up. And you can tell Ruby, no, I am right. I know what I'm doing.
I would like this converted to a string. You do that by implementing to_str. to_str is not implemented by default, which is why we get this exception. If you implement to_str, you're sending a very strong message to everyone: it is fine to convert me into a string, and this is exactly how I'd like you to do it.
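A minimal sketch of both protocols on the ZipCode class, assuming raw_zip_code holds the zip code as a string. LooseZipCode is a hypothetical variant that defines only to_s, included to show the TypeError Ruby raises when to_str is missing; the ERB line uses the standard library's ERB#result_with_hash:

```ruby
require "erb"

class ZipCode
  attr_reader :raw_zip_code

  def initialize(raw_zip_code)
    @raw_zip_code = raw_zip_code
  end

  # Explicit conversion: used by puts, string interpolation, and ERB's <%= %>.
  alias_method :to_s, :raw_zip_code

  # Implicit conversion: lets String#+ and other String-expecting methods accept us.
  alias_method :to_str, :raw_zip_code
end

# Hypothetical variant with to_s but no to_str.
class LooseZipCode
  def initialize(raw_zip_code)
    @raw_zip_code = raw_zip_code
  end

  def to_s
    @raw_zip_code
  end
end

zip = ZipCode.new("90210")
puts "Zip: #{zip}" # interpolation calls to_s
puts "Zip: " + zip # String#+ calls to_str
puts ERB.new("<p><%= zip %></p>").result_with_hash(zip: zip)

begin
  "Zip: " + LooseZipCode.new("90210")
rescue TypeError => e
  puts e.message # prints "no implicit conversion of LooseZipCode into String"
end
```

Aliasing both methods to the same reader keeps the explicit and implicit string forms identical, which is what makes the type safe to drop into templates without a presenter.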
And, oh geez, it works. So that is great. These protocols are also useful in web templating languages, like the ERB we're showing here, which will call to_s or to_str, depending on the templating language, to turn your object into a string. So you don't have to wrap your data structures in presenters
that turn them into strings, as long as their string representation is appropriate. For simple types like this one, I think it is; for a more complex type, maybe you wouldn't want to do it this way. There are some other protocols that have nothing to do with strings. There's to_a, which converts an object into an array. If you implement it,
your object can be used with the splat operator to be expanded into the elements of an array, which is darn cool. to_int is like to_str but for integers, so if you have something like a price or a discount, it's a good way to go. And to_h, added in Ruby 2.0, turns your object into a hash.
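A minimal sketch of to_a, to_int, and to_h, using two hypothetical value types invented for illustration (Coordinates and PercentOff) and a hypothetical describe method to show splatting into arguments:

```ruby
# Hypothetical value type: a latitude/longitude pair.
class Coordinates
  def initialize(lat, lng)
    @lat = lat
    @lng = lng
  end

  # Array conversion: the splat operator calls to_a.
  def to_a
    [@lat, @lng]
  end

  # Hash conversion, for code that wants a plain Hash.
  def to_h
    { lat: @lat, lng: @lng }
  end
end

# Hypothetical value type: a whole-number percentage discount.
class PercentOff
  def initialize(percent)
    @percent = percent
  end

  # Implicit integer conversion: Kernel#Integer tries to_int first.
  def to_int
    @percent
  end
end

def describe(lat, lng)
  "#{lat},#{lng}"
end

point = Coordinates.new(40.7, -74.0)
p [*point, 10.0]   # => [40.7, -74.0, 10.0]
p describe(*point) # => "40.7,-74.0"
p point.to_h

p Integer(PercentOff.new(25)) # => 25
```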
So, those are some protocols for turning your custom data type into a more primitive one that the standard library and other libraries might expect. The second technique is wrappers. We're getting strings out of our database, as we saw before, and we want to turn those strings into our zip code
data type. We could add a line of code here to do that, but then we'd have to remember to add that line everywhere we call select_from. We don't want to do that; we'd like to isolate the conversion inside the database layer. This goofy database handle we have doesn't have that feature, but we can add it using the power of SimpleDelegator.
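A minimal sketch of the wrapper described here, assuming the database handle exposes a select_from(table) method that returns an array of string-valued hashes. FakeDatabaseHandle and ConvertingDatabase are hypothetical names and the five-digit validation in ZipCode is an assumption; only SimpleDelegator and __getobj__ come from Ruby's standard delegate library:

```ruby
require "delegate"

# Value type that refuses invalid input (five-digit US format assumed here).
class ZipCode
  attr_reader :raw_zip_code

  def initialize(raw_zip_code)
    raise ArgumentError, "invalid zip code: #{raw_zip_code.inspect}" unless raw_zip_code.to_s.match?(/\A\d{5}\z/)

    @raw_zip_code = raw_zip_code
  end
end

# Hypothetical stand-in for the "goofy database handle": rows are hashes of strings.
class FakeDatabaseHandle
  def initialize(tables)
    @tables = tables
  end

  def select_from(table)
    @tables.fetch(table).map(&:dup)
  end

  def table_names
    @tables.keys
  end
end

# Every method not defined here is forwarded to the wrapped handle.
class ConvertingDatabase < SimpleDelegator
  def initialize(handle)
    super
    @conversions = {}
  end

  # DSL-ish registration: convert(:addresses, "zip_code", ZipCode)
  def convert(table, column, klass)
    @conversions[[table, column]] = klass
    self
  end

  # Intercept select_from: the wrapped handle does the query, we do the conversion.
  def select_from(table)
    raw_results = __getobj__.select_from(table)
    convert_results(table, raw_results)
  end

  private

  def convert_results(table, raw_results)
    raw_results.map { |row| convert_row(table, row) }
  end

  # Convert registered columns; leave every other value alone.
  def convert_row(table, row)
    row.each_with_object({}) do |(column, value), converted|
      klass = @conversions[[table, column]]
      converted[column] = klass ? klass.new(value) : value
    end
  end
end

db = ConvertingDatabase.new(
  FakeDatabaseHandle.new({ addresses: [{ "street" => "1 Main St", "zip_code" => "90210" }] })
)
db.convert(:addresses, "zip_code", ZipCode)

row = db.select_from(:addresses).first
p row["zip_code"].class # => ZipCode
p db.table_names        # => [:addresses] (forwarded by SimpleDelegator)
```

A real handle's select_from probably takes more arguments (columns, conditions), which a production wrapper would need to forward; this sketch keeps the signature to a single table name.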
Does anybody use SimpleDelegator? Okay, cool. With SimpleDelegator, you inherit from it, like we're doing here, and on line six of the slide we create an instance of our wrapping database around the real handle. Any message we send to that database is passed through to the underlying object. So
we've created a wrapper around our goofy database handle that passes all the methods through. But since the wrapper is a class, we can add methods to it and give it better functionality. We can also intercept methods. So we'll add a method called convert, a DSL-ish thing that tells the database we'd like to convert
the zip_code column of the addresses table into a ZipCode, and it records that conversion here in a lookup table. Then we can override the select_from method to do the conversion inside. The first thing we have to do is call the original object's select_from method to have it
do the work of selecting from the database and getting those strings back. SimpleDelegator gives you a method called __getobj__, which I assume is named that way because no sane person would define a method with that name, so it's safe to assume it will be available. It returns the object we're wrapping. We then call select_from on that object
to get the raw results. Then all we need to do is convert those raw results, full of strings, into a result set that contains our data type. So convert_results takes the raw results, which are an array of hashes, and converts each row into a hash that contains our
data type. convert_row is a little gnarly, so the takeaway is that this is possible, not the details of how this code works. Basically, we go through the hash we're given, and if there's a conversion registered for that table and column, we apply it. Otherwise we leave the value alone, and at the end we return the
converted row. That's a way you can add wrapping to anything you want. It lets you consolidate the conversion in the one place where it's needed instead of doing it everywhere. Now, I'm guessing you're not using a database mapping tool invented for a conference talk, but maybe
something else, like ActiveRecord. ActiveRecord has this feature; it's called serialize. You give it the name of the column and an optional class that handles the conversion. By default it does this silly thing with YAML, but we don't want that; we want to do the conversion ourselves. So we give it our ZipCode class, and then we implement two class methods on it, load
and dump. load takes the string from the database and is expected to return the data type we want. dump takes our data type and is expected to return a string representation suitable for storing in the database. Notice we're using the to_str protocol here, so this code is actually fairly generic. If we had a richer system of types, we could
generalize this somewhat. And that's all you need for ActiveRecord to use your custom data types. Here's what happens if you do this: remember, you can't construct an invalid ZipCode, so if someone does put invalid data in the database, it will blow up the second it is fetched. So this will, A, tell you
what data is bad and where someone was trying to access it, and, B, it will keep an address with an invalid zip code from getting into your system and infecting it with bad data. It absolutely prevents it, which lets the internals of our system keep being written as if they always get valid data, because we have ensured
that they will. Sequel: I don't know if anybody uses it. I haven't used it in a very long time, but it supports this as well. I have never run this code; it's just to demonstrate that it's supported. Basically, you use the serialization plugin and register a serializer: you give it a name and then two lambdas,
one that's like the dump we saw before and is expected to produce the string, and the other like load, which is expected to return the actual object. Then you add a couple of method calls to your model, and now with Sequel you have all of this goodness as well. So this makes it really easy to use
custom data types without having to worry about strings leaking into your system. This whole talk has used zip codes because they're simple and easy, but there are lots of other places you could do this, even with the simplest things. Take email addresses: we store those as strings all the time.
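A minimal sketch of an email data type, assuming, as the talk does, that addresses differing only in letter case are the same address. Strictly, RFC 5321 says the local part must be treated as case-sensitive (while discouraging anyone from relying on that), so the downcasing here is a policy choice. The Email class and its normalized method are hypothetical:

```ruby
# Hypothetical value type for email addresses.
class Email
  attr_reader :address

  def initialize(address)
    @address = address.strip
  end

  # Comparison key: case-insensitive, per the assumption above.
  def normalized
    address.downcase
  end

  def ==(other)
    other.is_a?(Email) && normalized == other.normalized
  end
  alias_method :eql?, :==

  # Equal emails must hash equally so they work as Hash keys and with uniq.
  def hash
    normalized.hash
  end

  # Keep the address as the user typed it for display.
  def to_s
    address
  end
end

a = Email.new("Alice@Example.com")
b = Email.new(" alice@example.com")

p "Alice@Example.com" == "alice@example.com" # => false
p a == b                                     # => true
p [a, b].uniq.size                           # => 1
```

Defining eql? and hash alongside == is what lets the equality rule hold everywhere Ruby compares objects, not just in explicit == checks.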
These two email addresses are the same. They're the same email address, but comparing their string representations returns false. So using a string is probably not a good idea; we end up doing this crap all the time, calling downcase and so on, and it's terrible. What if we had a class, a data structure, right? With an Email data type,
these two emails are the same, so the data type can reflect that. Prices, too. If you're doing e-commerce, you have prices everywhere. So what is one third off of a hundred dollars? It is not that: 33.333333333333336 is not a price. Price.new(100), discounted by a third: 33.33 is a price.
So by using a simple data structure, we can enforce the rules of our system and make sure prices are always correct. It also forces you to think about things like this: I'm sure there's some mathematical operation we could do that would have us debating whether to round up or round down. Well,
you should think about that, and the Price class would be a great place to encode that decision. Enumerated types, like states or provinces, status codes, or error codes, are also great candidates for data structures, because they have a well-defined, obvious set of possible values. Okay, so that is my entire spiel. If you take away
anything, think about using data types when you're designing and building your code. Don't forget to define those boundaries, because you already have something in mind: data that is coming and going. Instead of letting that knowledge drift off somewhere, into documentation or a wiki, put it in code. When you make a data structure explicitly
in code, you're putting your knowledge and intent into the code, so that others can benefit and your code is easier to read, write, and understand. So thank you. Here's a link to the slides if you want to check them out later; they're online. If you have a crappy consulting job, you can come have a great job at my company, so talk to me later about that.
And that is all I have, so thank you.