s/regex/DSLs/: What Regex Teaches Us About DSL Design - TIB AV-Portal


Formal Metadata

Title

s/regex/DSLs/: What Regex Teaches Us About DSL Design

Title of Series

Ruby Conference 2015

Number of Parts

66

Author

Contributors

Ruby Central, Inc.

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/37570 (DOI)

Production Place

San Antonio

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Many Ruby domain-specific languages go for beauty over usability - and it shows, when you try to use them. But one of programming's oldest, most common DSLs - regular expressions - is both as ugly and as persistent as a cockroach. What makes regexes tick? By breaking down their design, we'll learn concrete principles that go deeper than "Englishy": principles like "composability" and "deep domain integration." We'll learn how to get precise about the API design and boundaries of our DSLs. We'll write a micro-DSL that is usable without monkeypatching.



Transcript: English (auto-generated)

00:16

My name's Betsy Haibel, and this afternoon we're going to be speaking about regexes,

00:20

and specifically their DSL design, and what we can learn from it when we're designing other DSLs. So just to keep everyone on the same page, we're going to start with a quick introduction to regular expressions for anyone who's not familiar with them, or for anyone in the audience who could use a refresher since they haven't worked with them in a while.

00:44

Here's the simplest Regex I can think of. It searches a given text for the letters D, O, and G in that order and with no characters between them. So it'll match any of these strings here. And here's a less trivial example.

01:04

In this one, we use the period wildcard to match any character. Since this wildcard matches any character, the regular expression D period G, which is now on the screen, thank you Google, can match the strings dig, D space G, D exclamation point G, or a lot of other things.
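The slides are not part of the transcript, so here is a minimal Ruby sketch of the two regexes described so far. The sample strings are illustrative assumptions, not the ones shown on screen.

```ruby
# Literal characters match themselves, in order, with nothing in between.
dog = /dog/
# The period is a wildcard: it matches any single character except a newline.
d_any_g = /d.g/

%w[dog dogs hotdog underdogs].each do |s|
  puts "#{s.inspect} matches /dog/" if dog.match?(s)
end

["dig", "d g", "d!g", "dog"].each do |s|
  puts "#{s.inspect} matches /d.g/" if d_any_g.match?(s)
end

# The wildcard needs exactly one character, so "dg" does not match.
puts d_any_g.match?("dg")
```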

01:25

There are a lot of other little wildcards. They can match more specific things as well. Word characters, whitespace, even a thing called a word boundary, which is the first or last character of any given word. Both characters and wildcards can be grouped

01:41

if the default groupings aren't powerful enough. And you can specify the number of characters to be matched with other wildcards like star and plus. The specifics matter less right now than the mere fact that there are a lot of things you can do. Getting a little more complex, you can use capture groups to single out

02:00

specific subsets of your match for special treatment, and a back reference to refer to a previously captured capture group (also, Peter Piper picked a peck of pickled peppers) later on within a single regular expression. So we've got all of these building blocks, and individually, they're pretty simple.
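As a rough illustration of those building blocks in Ruby (the sample text is an assumption made up for this sketch):

```ruby
text = "the dog saw the the cat"

# \w is a word character, + means "one or more", \s is whitespace.
first_two_words = text[/\w+\s\w+/]
puts first_two_words  # the dog

# \b is a word boundary, so this counts "the" only as a whole word,
# skipping the "the" buried inside "other".
whole_word_count = "the other the".scan(/\bthe\b/).length
puts whole_word_count  # 2

# Parentheses create a capture group; \1 is a back reference to whatever
# the first group captured, so this finds a word repeated twice in a row.
doubled = text.match(/\b(\w+)\s+\1\b/)
puts doubled[0]  # the the
puts doubled[1]  # the
```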

02:26

Good little computer. There we go. And I'm not going to pretend that all regex are simple. This, for example, is an email validation regex that someone, somewhere, for some reason, recommended that other programmers use in production.

02:43

These simple elements that make up regexes can be combined in ghastly, hieroglyphic-esque ways, and often are. So at this point, you may be wondering some things. Things like whether it is possible to learn about designing DSLs,

03:01

or indeed about designing anything from something that produces screenfuls of mess, and that can't even fully parse an email address in the process. Because, of course, that email validation regex I just showed you did not actually work. And the answer is that regex are old.

03:22

Like C, like shell scripts, like Vim, regex are gawky and horrible, and everyone has used them for decades anyway. They are too bloody useful to erase. They are too bloody useful to give up, no matter how much we try to replace them with tools that are nominally aesthetically prettier.

03:41

Anything that's that bloody useful has to teach us design lessons, whether its surface seems polished or not. The biggest goal of software design, over and above how elegant things are, is getting the damn thing to work. And regex, bless them, do that if nothing else. Some of that, as we will see later in this talk,

04:00

is because they get to cheat, but we can still learn from the ways they cheat. So, how old are regex anyway? They were first defined as a mathematical concept back in 1958. They were an outgrowth of set theory used for describing the grammar of regular languages. A decade later, they were implemented as a simple, independent programming language.

04:23

Note that this first implementation treated them as a programming language in their own right. A few years after that, they began to see wider use when they were embedded into a concrete tool, the Unix utility grep. They then became embedded in more and more powerful tools, such as sed and awk,

04:40

and were embedded into the programming language Perl in 1987 as a first-class language concept. In other words, regular expressions got a lot more powerful and useful, and therefore a lot more used, when they became a domain-specific language for string processing, embedded within a more general-purpose language. In the 28 years since Perl came on the scene,

05:01

regex implementations have been baked into countless other programming languages. We're at the point where they're considered a language feature, rather than a language in their own right. Most programmers have forgotten that or never knew. And when I frame regex historically like that, contrasting their early days as a programming language in their own right,

05:22

with their modern days as an embedded DSL, it naturally raises the question, what are DSLs anyway? Are they appreciably different from programming languages? Well, I don't necessarily think that the C2 Wiki

05:40

is an authoritative source, but it's someplace where a lot of smart people have had a number of informed opinions over the years. And they define DSLs, in a consensus that was reached through a sheer,

06:02

stunning amount of debate, as programming languages designed specifically to express solutions to problems in a specific domain. There are a lot of spirited discussions about the merits of this pattern, because two programmers, three opinions, and C2 Wiki, but it's universally agreed

06:21

by all of these programmers with all of these opinions that both the potential beauty and the potential horror of DSLs stem from their place as languages in their own right, because languages are difficult to design. They also do some talking about whether regex are actually a DSL, fascinatingly enough.

06:42

A lot of people don't think they're complex enough to count as a language. To each their own, but I am the kind of person who will die on the hill that CSS and SQL are also programming languages. And regex have far more complex control structures,

07:02

even if these control structures are not actually powerful enough to avoid this kind of email validation regex, and to let you express those ideas in a more concise fashion. But that cautionary tale aside, which is absolutely what we think of

07:21

when we think of regex in fear, most production regex in the wild are a lot closer to this basic example. And while D.G isn't necessarily what we think of, it's a perfectly valid regular expression, and it exemplifies one of regex's

07:41

genuine intuitive strengths. It's not that far a leap to figure out that a regex containing the letter D will match on the letter D. More generally expressed, we can call this feature of regular expressions tight domain integration. Wow, I timed that right. Remember, DSLs are programming languages

08:00

designed specifically to express solutions to problems in a specific domain. When DSLs tie themselves closely to the quirks and structure specific to a domain, they get a leg up in solving domain-specific problems. This is something that goes a bit deeper than the ordinary programmer's superpower of naming things.

08:22

You're not just importing concepts from the problem domain into Ruby. You're replacing the logic of Ruby with the logic of that problem domain. Regexes get to cheat a bit when it comes to this tight domain integration. They're a text processing language, and they're written using text. Most DSLs we write don't get that automatic cheat,

08:42

but we can express this tight domain integration with a little more work to figure it out. For example, we're going to build a query language that runs Twitter searches, targeted Twitter searches specifically, and we'll start with the simplest query possible,

09:00

which is searching my Twitter feed for photos of my cat. We can see here why that is the simplest thing possible, or we will in about 10 seconds. At this point, right, you don't really need a DSL to express the thought. A simple hash interface would convey my intent as clearly,

09:22

and implementing that interface would be far more straightforward. But what if you want photos of cats in my general social circle? Suddenly, a more complex query language starts to make sense. These two examples are roughly comparable,

09:42

but when we start to add more complicated logic around the network diagram of my Twitter friends, then our quote unquote simple hash interface starts to look a lot less simple. This hash below would be difficult for the search function to parse and difficult to actually use. It would be difficult to document and difficult to remember.

10:01

This is happening because we're defining our API in Ruby's terms rather than our domain's terms. It starts to look like a bad DSL, actually, and specifically one without tight domain integration. In the first example, by admitting that we were writing a DSL, we were able to maintain a tight focus on the core domain concepts, which ultimately led to a smoother design.
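The hashes from the slides are not in the transcript. As a hedged sketch of the problem, assume a hypothetical hash-based search interface; every key and handle below is invented for illustration. The simple query stays flat, while the "friends of friends" query forces the caller (and whatever parses it) to deal with arbitrarily nested shapes.

```ruby
# Hypothetical hash interface: fine for "photos of my cat in my own feed".
simple_query = { from: "example_user", media: :photos, about: "cat" }

# The same style stretched over a social graph: nesting encodes relationships.
complex_query = {
  media: :photos,
  about: "cat",
  from: {
    any_of: [
      { user: "example_user" },
      { followed_by: "example_user" },
      { followed_by: { followed_by: "example_user" },
        except: { blocked_by: "example_user" } }
    ]
  }
}

# A consumer has to recurse through whatever shape it is handed.
# Nesting depth is a crude measure of how much structure it must understand.
def depth(value)
  case value
  when Hash  then 1 + (value.values.map { |v| depth(v) }.max || 0)
  when Array then value.map { |v| depth(v) }.max || 0
  else 0
  end
end

puts depth(simple_query)   # 1
puts depth(complex_query)  # 4
```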

10:23

Now, you'll note one thing that I am not saying here. A lot of people talk about strings like this as examples of successful API design because they're Englishy. What's actually happening, though, is more complex.

10:42

The two examples we're going to be looking at in about five seconds are both RSpec from different years of the framework. They're both, I suppose, Englishy in the loose way that we were using the term before. That is to say they both use English words to name things, and their grammar occasionally

11:01

causes those English words to flow together in a way that apes an English sentence. The top example is definitely the Englishier of the two. It's pretty much a sentence in its own right, but it's been supplanted by the second style as RSpec has evolved, which is against what we'd be thinking if Englishier

11:22

was always the goal of API design. It's been replaced by that for a lot of reasons, among them a much cleaner implementation. It actually isn't any harder to work with in practice, which goes against the idea that Englishy is the goal.

11:40

The mark of a good DSL isn't how closely it approaches English, it's whether it enables programmers to write programs. The RSpec DSL neatly encapsulates domain concepts like test cases and assertions, achieving the same tight and necessarily intuitive domain integration that regex achieve by having dog match dog.
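The two RSpec snippets are not reproduced in the transcript. Assuming they are the older `should` style and the newer `expect` style, the toy below shows why the second can have a cleaner implementation. It is a stand-in written for this note, not RSpec's actual code: `EqMatcher`, `Expectation`, and the helper methods are all hypothetical.

```ruby
# A tiny matcher object shared by both styles.
EqMatcher = Struct.new(:expected) do
  def matches?(actual)
    actual == expected
  end
end

def eq(expected)
  EqMatcher.new(expected)
end

# Older style: reads like a sentence, but needs a method on every object.
class Object
  def should(matcher)
    matcher.matches?(self) or raise "expected #{matcher.expected.inspect}, got #{inspect}"
  end
end

# Newer style: a plain wrapper object, no monkeypatching required.
class Expectation
  def initialize(actual)
    @actual = actual
  end

  def to(matcher)
    matcher.matches?(@actual) or raise "expected #{matcher.expected.inspect}, got #{@actual.inspect}"
  end
end

def expect(actual)
  Expectation.new(actual)
end

(1 + 1).should eq(2)     # Englishier
expect(1 + 1).to eq(2)   # same assertion, smaller footprint
```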

12:00

And only some of RSpec's tight domain integration comes from it choosing good names for things. The vocabulary of the DSL makes sense, but languages are made of grammar as well as vocabulary. And this brings us to our next big principle of good DSL design, namely composability. If I want to make a regex that searches

12:22

for either dog or cat, the answer's pretty easy. Regex's grammar is simple and for the most part intuitive: "or", combination, and back references are really as complicated as it ever gets. Since all it's doing is providing a facility for simple text matching, and since it's made out of text, it once again gets to cheat and for the most part

12:42

lean on its own structure to develop a grammar. Since most domains aren't quite such natural fits for one character after the next, they need to develop more complex composition rules. When we build Ruby DSLs, we are building languages that are implemented in Ruby, and which lean on the Ruby parser.
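Going back to the dog-or-cat regex for a moment, a small Ruby sketch of how little grammar it needs (the sample sentence is an assumption):

```ruby
# The pipe means "or"; the pieces simply sit next to each other.
dog_or_cat = /dog|cat/
puts dog_or_cat.match?("my cat")  # true

# Grouping composes with the other building blocks: an optional "s"
# and word boundaries wrap around the alternation.
animals = /\b(dog|cat)s?\b/
found = "I have two cats and a dog".scan(animals).flatten
p found  # ["cat", "dog"]
```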

13:00

And because of that, we're constrained by Ruby's grammar in deciding which composition rules to adopt. In practice, this leads us toward three basic shapes. The first and simplest is the class macro DSL. Specifically, the class macro with a lot of configuration options. This sort of example is useful as a top level

13:22

hook interface between a library and classes that want to make use of its features. It's how a lot of the Rails framework, for example, is expressed, as well as a lot of image attachment libraries. It's not necessarily that expressive, because you can only build concepts with it that can be expressed in a configuration hash, but it's easy to read and easy to implement and hard to screw up.
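A minimal sketch of the class macro shape, assuming a hypothetical attachment library. `Attachable`, `has_attachment`, and the option names are invented for illustration and do not belong to any real gem.

```ruby
module Attachable
  # The "macro": a class-level method that records a configuration hash
  # and defines a small instance method based on it.
  def has_attachment(name, options = {})
    attachment_options[name] = { styles: {}, required: false }.merge(options)
    define_method("#{name}_styles") do
      self.class.attachment_options[name][:styles].keys
    end
  end

  def attachment_options
    @attachment_options ||= {}
  end
end

class User
  extend Attachable

  has_attachment :avatar,
                 styles: { thumb: "100x100", large: "600x600" },
                 required: true
end

p User.attachment_options[:avatar][:required]  # true
p User.new.avatar_styles                       # [:thumb, :large]
```

Everything the macro can express has to fit in that options hash, which is the expressiveness limit described above.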

13:47

The next most complex of the DSL styles we're going to talk about is method chaining. In this style, which will hopefully appear on the screen now, you use a series of methods that return self to build code sequences

14:01

that continuously refine what an object means before using that object. This is a very common JavaScript DSL structure, but in the Ruby world, I've mostly only seen it in test libraries, like Mocha mocks or RSpec matchers. Honestly, I wish it were used much more often. Since it's designed around the idea of continuously modifying objects, it's easy to manipulate

14:21

and reason about, and it can be bent to match a lot of different domain models. In our example Twitter query DSL, our composition rules focus on the shapes of the relationships that people have with each other. In Mocha, they focus on the different properties of mock objects. In each case, the grammar which defines how elements can be composed also echoes the domain structure.
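The talk's Twitter query DSL is only shown on slides, so this is a guess at its general shape rather than the speaker's code. `TweetQuery` and its method names are hypothetical; the point is just that each method refines the object and returns `self`.

```ruby
class TweetQuery
  attr_reader :conditions

  def initialize
    @conditions = []
  end

  def from(handle)
    add("from #{handle}")
  end

  def followed_by(handle)
    add("followed by #{handle}")
  end

  def with_photos
    add("with photos")
  end

  def about(topic)
    add("about #{topic}")
  end

  def to_s
    conditions.join(", ")
  end

  private

  # Every public method funnels through here and returns self,
  # which is what makes the calls chainable.
  def add(condition)
    @conditions << condition
    self
  end
end

query = TweetQuery.new.followed_by("example_user").with_photos.about("cats")
puts query  # followed by example_user, with photos, about cats
```

Because the query is an ordinary object, it can be built up in steps, passed around, or extended by subclassing.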

14:41

In other words, tight domain integration matters at both the vocabulary and the grammar levels of a domain-specific language. The last common Ruby DSL style is the block structure. In its simplest form, the one-level block DSL, it's a common choice for tiny configuration DSLs.

15:03

It provides a really pretty interface with a minimum of implementation. A little computer? And the, there we go.
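A one-level block DSL can be sketched in a few lines. `Uploader` and its settings are hypothetical names chosen for this example.

```ruby
module Uploader
  Config = Struct.new(:bucket, :max_size_mb, :retries)

  def self.config
    @config ||= Config.new("default-bucket", 5, 3)
  end

  # The whole "DSL": hand the config object to the caller's block.
  def self.configure
    yield config
    config
  end
end

Uploader.configure do |c|
  c.bucket  = "cat-photos"
  c.retries = 1
end

puts Uploader.config.bucket       # cat-photos
puts Uploader.config.max_size_mb  # 5
```

Because the block receives the config as an explicit argument, `self` inside the block is unchanged, which keeps this form easy to reason about.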

15:22

You can also build nested block DSLs. Since this style pushes you toward code that takes on a tree or a nested structure, it's a strong choice when the pattern echoes the landscape of that domain. In the Rails routing DSL, for example, the tree shape echoes the directory structures that web routes visually imitate.
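To make the nested shape concrete, here is a toy routing DSL. It is not how Rails implements routing; `RouteMap`, `namespace`, and `page` are hypothetical names, and `instance_eval` is one common way (assumed here) to evaluate the nested blocks.

```ruby
class RouteMap
  attr_reader :paths

  def initialize
    @paths = []
    @prefix = []
  end

  def self.draw(&block)
    map = new
    map.instance_eval(&block)  # the block runs with self set to the map
    map.paths
  end

  # Each nesting level pushes a path segment, runs the inner block,
  # then pops the segment again.
  def namespace(name, &block)
    @prefix.push(name)
    instance_eval(&block)
  ensure
    @prefix.pop
  end

  def page(name)
    @paths << "/" + (@prefix + [name]).join("/")
  end
end

paths = RouteMap.draw do
  page "about"
  namespace "admin" do
    page "users"
    namespace "reports" do
      page "weekly"
    end
  end
end

puts paths
# /about
# /admin/users
# /admin/reports/weekly
```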

15:42

This block structure is a common one in Ruby DSLs. It defines a grammar that feels removed from the ordinary one-method-after-another rhythm of Ruby, and so it feels DSL-y in the same way that arranging things in sentences feels English-y. It's not that hard to implement necessarily from a lines-of-code perspective,

16:01

but because it relies on passing blocks of code in between different contexts, it's sometimes hard to reason about. When things go wrong, it can be difficult to intuit or even find the context in which any given line of code is executing. And this leads to one of my most common frustration points with other people's DSLs,

16:21

namely them using the block structure inappropriately because it looked DSL-y. The slide illustrating my point should appear in a few seconds, but in the interest of time. The abstraction that they try to implement with these inappropriate block structures

16:43

doesn't neatly fall into a nested structure necessarily. And so when I write code that tries to fit what I'm trying to express within this nested structure that doesn't fit it very well, I wind up needing to pass around procs a lot or use a bunch of instance_evals or both

17:02

in order to get things done in a DRY way. Worse yet, because I'm passing around all of these blocks that are evaluated in various contexts that I know very little about immediately, I need to read the library's code and really know a lot about what context

17:20

these blocks are being evaluated in. I need to care about the internals in a way that I wouldn't necessarily need to care in a less leaky abstraction. And to be frank, this talk was inspired by a DSL that made me do that. It also was designed in a way that wasn't easy to extend or modify, and so I wound up needing to monkey patch it a lot.
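The DSL that prompted the talk is not named, so the following is a generic sketch of the leak being described, with hypothetical `Builder` and `Report` classes. When a library evaluates your block with `instance_eval`, `self` silently changes: local variables still work, but your own instance variables do not.

```ruby
class Builder
  attr_reader :items

  def initialize
    @items = []
  end

  def item(name)
    @items << name
  end

  def self.build(&block)
    builder = new
    builder.instance_eval(&block)  # self inside the block is now the builder
    builder.items
  end
end

class Report
  def initialize
    @title = "weekly"
  end

  def entries
    local_title = "local copy"
    Builder.build do
      item local_title     # local variables are captured by the closure
      item @title.inspect  # @title is looked up on the Builder, so it is nil
    end
  end
end

p Report.new.entries  # ["local copy", "nil"]
```

Nothing at the call site hints that `@title` will be `nil`; you only find out by reading how `Builder.build` evaluates the block.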

17:41

It was a really bad, perfect storm of frustration, and so I was trying to write a talk to figure out why I hated that entire process. So all through the project I was working with that DSL on, I wanted two big things from it. I wanted it to be easily extensible with ordinary object-oriented techniques so that I didn't need to monkey patch it all the time,

18:03

and I wanted to be able to easily merge blocks of code written in that DSL. In other words, I wanted its grammar to allow for better composability, and when I started working on this talk, I figured that those two were the same thing. I really did think I was going to wind up

18:21

proving that DSLs were irrelevant, and I was wrong. And here's why. Regexes are made of strings. You can trivially build a regex with Ruby using perfectly ordinary Ruby string manipulation. You don't need to use class_eval and feel dirty about it the way I did in the regex examples I was showing before.
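For instance, using only Ruby's standard `Regexp` class (the word list is an assumption for the example):

```ruby
animals = %w[dog cat]

# Plain string manipulation: join the words with "|" and wrap the result.
pattern = Regexp.new("\\b(?:#{animals.join('|')})s?\\b")
p pattern.source                       # "\\b(?:dog|cat)s?\\b"
p "hotdogs and cats".scan(pattern)     # ["cats"]

# Regexp.union does the joining for you and escapes special characters,
# so "d.g" here means a literal period, not a wildcard.
literal = Regexp.union("d.g", "c*t")
p literal.match?("d.g")  # true
p literal.match?("dig")  # false
```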

18:43

And I figured that as long as I was going to say that you can do stuff like this with your DSL, it was going to be perfectly fine, it was going to be great. And this talk was going to just be about how to make it possible to do that stuff. But if we accept that domain-specific languages are just languages, then what actually is the difference

19:03

between combining regex fragments with Ruby and intermixing Ruby with other languages? What's the difference between the regex with embedded Ruby up top and the JavaScript with embedded Ruby below? There isn't all that much of one.

19:22

And if we poke at our instinctive ew reaction to that JavaScript with embedded Ruby, we can figure out why. So in this example, we're initializing a JavaScript array and then using embedded Ruby to manually build up a set of literal push calls that reassembles a Ruby array in JavaScript world. When I've seen this first example in the wild,

19:41

and yes, I have seen it in three different production code bases, God help me, it's generally been in the context of a web application view. In other words, the developer was writing that code to transfer an in-memory Ruby array on the server to an array on the client. But of course, there's another more widely accepted way to do that, it's the example below.

20:02

You just write an API endpoint on the server that returns the array and then the client-side JavaScript accesses it using an ordinary Ajax call. In writing embedded Ruby, we're ignoring an existing well-defined interface for transferring information between the client-side and the server-side.
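The two slides can be approximated on the Ruby side like this. The template and the data are assumptions for illustration; `ERB` and `JSON` are Ruby standard libraries. The second cat name is chosen to show what goes wrong when one language is spliced into another by hand.

```ruby
require "erb"
require "json"

cats = ["Tom", 'Mr "Whiskers"']

# Style 1: Ruby interpolated into JavaScript source, one push call per item.
template = ERB.new(<<~JS)
  var cats = [];
  <% cats.each do |cat| %>
  cats.push("<%= cat %>");
  <% end %>
JS
generated_js = template.result(binding)
puts generated_js
# The second push comes out as cats.push("Mr "Whiskers""); which is not
# valid JavaScript: Ruby knows nothing about JavaScript's quoting rules.

# Style 2: the payload an API endpoint would return. JSON is the defined
# interface between the two languages, and it handles the escaping.
payload = JSON.generate(cats)
puts payload
```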

20:25

And in ignoring that interface, we can figure out what's going wrong. It's not just that we're ignoring the interface, by the way, when I first had this ew reaction to the array push,

20:42

I didn't actually know enough JavaScript to understand that there was an accepted way to not bullshit that. But if there's a defined interface for us to ignore, then that means that we must have two objects that the interface is between. In this case, the objects are the Ruby server

21:00

and the JavaScript client, but we can as easily think about that as the Ruby and the JavaScript. We can think of the languages as kind of objects in the CS meta sense. This is a little easier to understand when we look at the regex example. It's very clear that the two different objects

21:22

are the languages themselves. And if a chunk of any given language is an object in its own right, in, again, some very interesting meta sense, then what we're doing when we use Ruby to compose a regex or assemble a JavaScript array is crossing those object boundaries of the language.

21:43

Those interpolated Ruby strings are not actually spiritually different from using instance_eval to call a private method. They're reaching into the JavaScript's business and messing around with it, which is part of why code generated using this method is so very hard to understand and debug.
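For anyone who hasn't seen the trick being compared to, here is a minimal sketch, assuming the reference is to Ruby's `instance_eval`. The `Account` class and its `audit_log` method are invented for illustration.

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  private

  # Internal detail that callers are not supposed to touch.
  def audit_log
    "balance=#{@balance}"
  end
end

account = Account.new(100)

# Going through the public interface is refused.
begin
  account.audit_log
rescue NoMethodError => e
  puts "refused: #{e.class}"
end

# instance_eval runs the block with self set to the account, so the
# private method is reachable and the object boundary is bypassed.
puts account.instance_eval { audit_log }
```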

22:02

And so once we've got that mental framework in place, what's the difference between interpolating Ruby into JavaScript, like the example above, and interpolating Ruby into RSpec? And I know I just said a really weird thing.

22:21

RSpec is written using Ruby, so it sounds funny to talk about interpolating Ruby into RSpec. But again, in order for a DSL to be useful, it needs to be a language in its own right. We need to give it that respect. And so we need to accord RSpec that respect. And RSpec is kind of weird in this way, right,

22:42

because it expects you to embed Ruby into it, but it expects you to embed this Ruby in specific, cordoned off, and well-defined places. When you embed Ruby in a place that isn't one of those, like by using an each loop to define a group of similar examples, then you're crossing language boundaries, and it feels icky in the way

23:01

that that always does and should always do. If I were to try to use ordinary, object-oriented techniques to try and extend RSpec, like I wanted to be able to do with that bad DSL I was talking about earlier, that would also be crossing those boundaries.

23:21

When was the last time you tried to extend the class that all describe blocks build instances of? For that matter, when was the last time, outside of Sam's talk earlier, that you thought about the fact that describe blocks instantiate an object? RSpec's language design successfully hides these implementation details from you,

23:42

just like a good library and a good language should. You don't think about C when you're writing Ruby, unless you're doing weird optimization. More than that, it successfully obscures its own Ruby-ness. We nearly forget, using it, that it was written in Ruby, and therefore must be made up of the objects and classes that make up all Ruby implementations.

24:03

We get to do that because RSpec has removed the need to think about it. Instead of asking users to use ordinary object-oriented techniques to extend RSpec, its maintainers have defined some specific extension APIs, such as the shared example API and the matcher API.
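The idea of a small, defined extension API can be sketched in plain Ruby. This is not how RSpec is implemented; `Checks`, `Checker`, `define_rule`, and the rule names are all hypothetical, chosen only to show one registered extension point next to a fixed grammar.

```ruby
# Hypothetical micro-DSL for validating records; every name is illustrative.
module Checks
  @rules = {}

  class << self
    # The one supported extension point: register a named rule.
    def define_rule(name, &predicate)
      @rules[name.to_sym] = predicate
    end

    def rule(name)
      @rules.fetch(name.to_sym) { raise ArgumentError, "unknown rule: #{name}" }
    end
  end

  # Built-in vocabulary.
  define_rule(:present) { |value| !value.nil? && value != "" }
end

class Checker
  def initialize
    @expectations = []
  end

  # The grammar stays fixed: expect(field, rule_name).
  def expect(field, rule_name)
    @expectations << [field, Checks.rule(rule_name)]
    self
  end

  # Returns the fields whose rule did not hold for the record.
  def failures(record)
    @expectations.reject { |field, predicate| predicate.call(record[field]) }
                 .map(&:first)
  end
end

# A user adds a new word to the language without reopening Checker.
Checks.define_rule(:positive) { |value| value.is_a?(Numeric) && value > 0 }

checker = Checker.new.expect(:name, :present).expect(:amount, :positive)
p checker.failures({ name: "Ada", amount: -5 })
```

New rules go through `define_rule`, so users get new vocabulary while the internals of `Checker` stay closed.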

24:22

And for matters connected to the actual purpose of RSpec, namely the structure of example groups and examples and expectations, you're expected to still not interfere. In other words, any language's rules of composition stay within the language.

24:40

Composability is not about how easy it is to cross language boundaries to do whatever you want. It's about how easy it is to do what you want in a sensible way while staying within the bounds of the language. And that's great and all, but it doesn't solve one of the problems I had with that other DSL, the terrible one that I'm deliberately not naming. I couldn't do all the things

25:03

I wanted to do with it, period. Never mind sensibly while staying within the bounds. That's why I needed to monkey patch its internals. And so how do we avoid that problem in our DSL designs? Well, we can provide a small defined extension API like RSpec does. And that lets us define new words in the language

25:23

without bending its grammar out of shape. But there's another way, and I like this one better. And it's very simple. One of the beautiful things about regular expressions is that they search within text and they occasionally replace text.
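That whole problem space fits in a few lines of Ruby. This is just a small sketch using standard `String` and `Regexp` methods; the sample text is made up.

```ruby
text = "What regex teaches us about DSL design: regex is small."

# Searching within text.
puts text.match?(/regex/)          # is there a match at all?
p text.scan(/\b[A-Z]{3}\b/)        # collect every three-capital-letter word
p text.index(/DSL/)                # where does the first match start?

# Occasionally replacing text.
puts text.sub(/regex/, "DSLs")     # replace the first occurrence
puts text.gsub(/regex/, "DSLs")    # replace every occurrence
```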

25:42

They do not try to do anything more. They do not claim to do anything more. They have chosen one specific problem space and they don't try to solve any other problems. As Stack Overflow's funniest answer is quick to remind us, regular expressions can only parse regular languages, and those are a very small subset

26:01

of all the languages in the world. They have their limits. They are not a complete parsing engine for anything, especially not HTML. And also, again, not to beat a dead horse, email validations.

26:21

And that is totally okay because they do not need to do anything but search text. I'm going to call this closed domain integration. It's not enough to integrate deeply with the domain. You need to go up to the limits of that domain and no further. And in order to get there,

26:41

you need the flip side of this coin, namely constraining the domain definition so that you know where those limits are. It's okay to define these limits with big red placeholder boxes like RSpec does and say user code goes here. But you need to have that really specific definition. You need to know where those boxes lie.

27:02

If you do that, it makes the problem of covering the domain completely one that is even solvable in the first place. So I'll start wrapping up now. As Rubyists, we are not going to stop writing DSLs anytime soon. It's one of the things everyone jokes about us,

27:22

but actually it's a strength because DSLs are very powerful, and they're kind of cool when they're done right. So the question then becomes, how do we write the good ones rather than the ones that Aaron is having feelings about right here?

27:42

And so you can treat your DSL like you would any other API. You can expose what people need. You can close off the other stuff. You can stay close to the domain you're describing and have sensible composition rules, and you can keep everything small enough to be complete.

28:03

Getting there though is again a very hard problem. While a good DSL is often more usable than a good vanilla library API, a bad DSL is much less usable as we've all experienced than a bad vanilla library API.

28:20

I'm not saying right now that you're doomed to screw up because obviously you've seen this talk and every DSL you design from now on is going to be perfect, but a good DSL is a lot more work than a decent vanilla API, and that's something that you've got to respect. You're going to need to write that decent vanilla API

28:41

anyway in order to implement the DSL. And so I'm going to suggest that you do that first and figure out if you need more and let things lie like that. That's everything I need to say right now. I'm Betsy Haibel again. I'm BetsyTheMuffin on Twitter,

29:01

which is going to pop up on the screen in about five seconds. I am very sorry about the AV issues. I'm not entirely sure what's going on with Google Docs. This talk is going to be up on my website at the URL on the screen shortly after this talk, probably sometime during the lightning talks or dinner whenever I can get a decent lock onto GitHub

29:23

and with the conference internet, really. I tweet about books, code, my cat, and feminism at BetsyTheMuffin, and I co-organize a meetup back home called Learn Ruby in DC. This is an informal space for newbies to ask questions and find mentorship.

29:41

If you are interested in making a meetup like that in your own hometown, or if you also run a meetup like that and want to talk shop, then please talk to me. I think it's a really good model for building the community, and I would love to share nice stories and also pitfalls so you can avoid them.

30:03

I work for a great little organization called ActBlue that builds fundraising tech for Democratic candidates and causes. We focus on small-dollar donations, which is a surprisingly powerful thing.

30:21

Our average donation size is around $30, and we've raised nearly $850 million over the approximately one decade we've been in business. This really helps those donors' voices be heard

30:41

in a way that keeps the party accountable to the voices of people who only have $30 to spare at a time. It's something that means a lot to me. We are also committed to building sustainably at the kind of scale that can bring in that much money over time.

31:01

We have a modern, tested stack, and we have a focus on maintaining culture that ... Well, my third day was one of our biggest days of all time, right, and pretty much everyone on my team HipChatted me over the course of the day saying, by the way, Betsy, I know it's end of quarter. You're going to close your laptop at 5:30,

31:22

and you're going to have dinner, and you're going to do everything but be on call. And we're also hiring Rails, UX, and DevOps people right now, so if the values I just outlined sound good to you, if they resonate, then please talk to me. I'd love to work with you.

31:43

Many thanks to Noel Rappin, Kenzie Connor, Chris Hoffman, Tina Wiest, and the entire membership of Arlington Ruby Users Group for invaluable feedback while I was developing this talk.

32:06

That I personally have built. I have not built enough ... I have not built enough things that actually require a DSL. Like, I really do take the responsibility to go up to those bounds and no further quite seriously. And so I've built some templating stuff

32:22

that I'm pretty proud of, but other than that, I haven't worked in any problem spaces that I feel require that level of power. Unfortunately, that's all been proprietary stuff, so I can't point you to a GitHub repo. The question from Walter is whether I have any mental litmus tests for when something does want a DSL.

32:42

So, for that, let's kind of go back a few slides. A lot of slides, hi there. Now you're working quickly. What the hell? So, if you can see in that second example on the bottom,

33:01

we're getting an increasingly complex hash interface. And one of the things about that is that as you acquire more and more options for what any given library access point, we'll call it that even though that sounds really fancy and it's not a fancy concept.

33:24

When any given method call that's at the front edge of your API winds up taking a lot and a lot and a lot of parameters, you should start thinking about ways to encapsulate all of those parameters within an object.
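One way to do that encapsulation is a small builder whose methods each return `self`, so calls can be chained. This is only a sketch: `SearchParams`, `search`, and the option names are hypothetical and not taken from the slides.

```ruby
# Hypothetical parameter object; every name here is illustrative.
class SearchParams
  attr_reader :filters, :order, :page_size

  def initialize
    @filters = {}
    @order = nil
    @page_size = 25
  end

  # Each method records one option and returns self so calls can chain.
  def where(conditions)
    @filters.merge!(conditions)
    self
  end

  def sorted_by(field, direction = :asc)
    @order = [field, direction]
    self
  end

  def per_page(count)
    raise ArgumentError, "count must be positive" unless count.positive?
    @page_size = count
    self
  end
end

# Hypothetical front-edge method: it takes one object, not a growing hash.
def search(params)
  "filters=#{params.filters.keys.inspect} order=#{params.order.inspect} per_page=#{params.page_size}"
end

params = SearchParams.new
                     .where(status: "open")
                     .sorted_by(:created_at, :desc)
                     .per_page(10)
puts search(params)
```

Validation lives next to the option it belongs to (see `per_page`), which is harder to arrange when everything arrives as one options hash.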

33:41

And a lot of the time, a nice, simple method chaining DSL is a great way to actually build that parameter object in a way that's clean and readable. One of the questions I kind of anticipated getting was someone calling me on differences between RSpec and Minitest because they're very different stylistically

34:01

in terms of implementation. But in terms of the ways the Minitest DSL has evolved over the years and the RSpec DSL has evolved over the years, one of the interesting things is that they've actually evolved toward each other. I think that it's valid

34:23

to want something like the full-on Test::Unit prefix-everything-with-test style. It drives me bonkers. And through the years, we've seen a lot of things like RSpec, like Minitest spec syntax, like Shoulda, that attempt to impose more structure

34:41

than the test case magic API gives you. And there's no hard and fast rules in programming, so this is going to be matters of taste. But the outer edges of the RSpec spec API

35:00

with describe blocks and it blocks seem to be something that a lot of different things just ultimately eventually decide works for test cases, even if that's not where they start out. Cool, wonderful. Well, I will let you all get to the lightning talks. Thank you so much.