Beneath The Surface: Harnessing the True Power of Regular Expressions in Ruby - TIB AV-Portal

Shamrell-Harrington, Nell

Formal Metadata

Title

Beneath The Surface: Harnessing the True Power of Regular Expressions in Ruby

Title of Series

Ruby Conference 2013

Number of Parts

50

Author

Shamrell-Harrington, Nell

License

CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/37463 (DOI)

Production Place

Miami Beach, Florida

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Many of us approach regular expressions with a certain fear and trepidation, using them only when absolutely necessary. We can get by when we need to use them, but we hesitate to dive any deeper into their cryptic world. Ruby has so much more to offer us. This talk showcases the incredible power of Ruby and the Onigmo regex library Ruby runs on. It takes you on a journey beneath the surface, exploring the beauty, elegance, and power of regular expressions. You will discover the flexible, dynamic, and eloquent ways to harness this beauty and power in your own code.

Transcript: English (auto-generated)

00:02

My name is Nell and I used to be intimidated by regular expressions. Has anyone else here

00:21

felt intimidated by regular expressions? I see a lot of hands that just went up. I used to look at a RegEx like this and I would feel a sense of dread in my heart. Now what this RegEx does is it validates Visa credit card numbers. Now once I knew the context, I could kind of see what was going on, kind of pick out clues here

00:43

or there, but I had no idea how I'd ever write something like this. It's human nature to fear what we don't understand. Now it took time, but once I understood how a RegEx parser actually works, how it does that magic where it finds that match in the string, I realized it was simply a process. A logical process like any program

01:05

that I could grasp. Then I knew how to use RegEx without fear. How to harness their power to match exactly what I wanted, exactly when I wanted. I'm here today to share this knowledge with you. To help you move beyond your fear

01:22

by understanding how regular expressions work beneath the surface. When it comes to RegExes, knowledge truly is power. And today I'm going to show you how that power can be yours. Ruby and regular expressions work together in a harmony, in a symphony

01:41

of code. If I was really gonna learn regular expressions anywhere, I'm so glad I learned them in Ruby. What we see in Ruby, however, when we use things like the match method, is just the tip of a very large iceberg. A lot more goes on beneath the surface in the Onigmo regular expressions library. Let's take a dive together beneath that surface.

02:05

The Onigmo regular expression engine was introduced in Ruby 2.0. Ruby passes regular expressions and strings to Onigmo, and Onigmo handles the actual matching. Now Onigmo is actually a fork of the Oniguruma RegEx engine that was used in Ruby 1.9.
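From the Ruby side, that hand-off is a single method call. The following is a minimal sketch; the pattern and string are illustrative assumptions and do not come from the talk.

```ruby
# Regexp#match hands the pattern and the string to the regex engine.
# It returns a MatchData object on success and nil on failure.
PATTERN = /Ruby/          # illustrative pattern, not from the talk
TEXT    = "I love Ruby"   # illustrative string, not from the talk

result = PATTERN.match(TEXT)

if result
  # result[0] is the matched text; begin(0) is where the match starts.
  puts "Matched #{result[0].inspect} at index #{result.begin(0)}"
else
  puts "No match"
end
```

Everything between the call and the returned MatchData object happens inside the engine.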

02:22

Both of these provide the standard RegEx features you'd find in any engine. But what these two do is they handle multibyte characters, such as Japanese text, particularly well. Onigmo adds some new features that were introduced in Perl 5. Now Patrick Shaughnessy, who I know is in attendance here, there he is, has a great

02:42

article entitled Exploring Ruby's Regular Expression Algorithm. I'll include a link to that in my resource notes. Now in this article, he lays out the workflow of Onigmo. When Ruby first passes a RegEx to Onigmo, Onigmo reads the RegEx and parses it into an abstract syntax tree. An abstract syntax tree

03:02

simply represents some code, in our case a regular expression, in a tree form that's easier for Onigmo to compile. Onigmo then compiles this tree into a series of instructions for the engine to execute. Now these instructions can be represented by a finite state machine. Now what on earth is that?

03:22

A finite state machine is a mathematical model that shows how something works. It's like a diagram or a map that shows how something can get from one state to another. This will be clearer with an example, so let's go ahead and create one. I'm first going to create a finite state machine for a dog. In particular, this is my parents'

03:42

dog Annie. She's a very cute whippet-Irish terrier mix. And like most dogs, she loves to go in and out of the house all day, every day. So each of these two circles, these nodes, represents a state that Annie can be in at any given time. She can either be in the state of being in the house, or she can be in the state of being

04:00

out of the house. So how does she get from one state to the other? Well, if she's in the state of being in the house, she can go through her doggy door and transition to the state of being out of the house. Likewise, when she gets bored outside, she can go through the doggy door again and be in the state of being in the house. So that's an example of a finite state

04:22

machine. But even with an example, a very cute example if I do say so myself, just those words, finite state machine, are still quite a mouthful. Let's break it down. The machine is what I'm modeling. In our example, that was Annie the dog. State means we're modeling the states that a

04:41

machine can be in. In the case of Annie, she can either be in the house or she can be out of the house. Finite means there are a limited number of states our machine can be in. States are often limited by physical reality. Annie really can't suddenly be under the ocean, unless she's in Miami and

05:01

playing on the beach, I suppose, or suddenly be on the moon. In a computer, physical memory is not infinite. There's only so much a computer can process before it will crash. Therefore, the number of states a computer process can be in is usually limited by physical memory. Now before I move on, I want to

05:21

mention that, like many dogs, Annie loves to stand halfway in the house and halfway out of the house. Now, in these cases, she's in multiple states simultaneously. There are ways a computer process can be in multiple states simultaneously as well, including regexes. Now it's out of the scope of this presentation,

05:40

but the article Regular Expression Matching Can Be Simple And Fast by Russ Cox delves into an algorithm by Ken Thompson that allows for this. I'll also include a link to this in my resource notes. So let's make a finite state machine for this regex. This regex looks for the word force in any string that I pass to it. So

06:01

when I use this regex in Ruby, I'm gonna declare it. I'm gonna declare my string, then I'm gonna call match on my regex and pass it my string. After Onigmo reads the regex and parses it into that abstract syntax tree, then compiles it into those instructions, my finite state machine will look something like this. A regular expression tries to match a string one

06:21

character at a time, starting with the leftmost character. So the first character this regex sees is that capital letter U. Now that doesn't match the path to the next state. It would need a lowercase f for that, so it stays there on that starting state. Next it sees the lowercase s. Now that

06:40

still doesn't match, so it still doesn't move from that starting state. Now it's gonna do this for several characters, so let's go ahead and fast forward. When we come to this lowercase f, now things start to get interesting. A character in the string matches the path, which means my finite state machine can move on to the next state. Then

07:00

it sees a lowercase o in the string. Once again, that matches the path to the next state. It does the same thing with the r and the c and the e, and we have a match. We've reached that final state in our finite state machine, which means we're at a matching state. Onigmo passes the information back to Ruby, then Ruby

07:22

returns a MatchData object containing our match. In this case, it's the word force. Now that was a pretty simple example. For our next example, let's try something a little more complicated. Let's try a regular expression that uses alternation. This regular expression will match

07:41

a capital letter Y, followed by either the characters o-l-k or the characters o-d-a. I'm providing two alternate ways my regular expression can find a match. So in Ruby, again, I'm going to declare my regex, declare my string, the word Yoda. I'm gonna call match on my regex and pass it that string. And this time my

08:03

finite state machine looks a little bit different. There are two paths that can lead to a successful match. So after it matches the Y in my string, it now has to make a choice. Which path should it try first? In the case of alternates, a regex engine

08:21

will always try the leftmost alternate first. But before it tries that o-l-k path, it saves both the point in the string where it is and the state it's at to what's called the backtrack stack. Every time my regex chooses one path over the other, it saves the position in the string and the state just in case the

08:40

match fails and it needs to try the other option. I like to think of it as being kind of like a choose your own adventure book. It's marking a place that it can come back to. And it's a good thing it did. As soon as it sees that D in the string, it knows it has no way to get from its current state to that finishing matching state. Now, normally, having no path to

09:02

the finishing state would cause the regex to fail. However, because it has something in that backtrack stack, it can backtrack to the point where it chose which path to follow and try the other one. This time things go better. After it matches the lowercase o, it's next

09:21

able to match the D and the A, and hurrah. This time we have a match. Back in Ruby, it returns the MatchData object containing our match, which is the entire string in this case, Yoda. Now finite state machines become even more interesting when we use quantifiers. Now it's easy

09:43

to look at this regular expression with our human brains and see the word no followed by a plus sign. However, Onigmo sees this as a capital letter N, followed by a lowercase o, and a plus sign quantifier. That plus sign after the o means the o can appear one or more times.

10:03

So in Ruby, again, I'll declare my regex and my string. This time, the string is the word no, famously yelled by Luke Skywalker in The Empire Strikes Back. I'm gonna call match on my regex and pass it that string, and this is what my finite state machine looks like. It's pretty simple at first. It

10:22

matches that capital N, then it matches the first lowercase o, and now our regex has a dilemma. Technically, it has a correct match right here. It has a viable match. It could go ahead and return this back to Ruby and declare it found that match. But it also sees

10:40

more lowercase o's in the string. It could either return the match here, or it could follow that curved o path and loop back on itself into the same state. So what should it choose? It chooses to keep looping and match that second o. By default, the plus sign quantifier is

11:02

a greedy quantifier. A greedy quantifier will always keep looping as long as there is more string to match. A greedy quantifier will always match as much of the string as it can get its greedy little arms around. Even if it has a successful match, it will always be hungry for more. It's greedy. A greedy quantifier uses maximum

11:25

effort for maximum return. It will try every permutation of the regex to see if it can get the biggest match possible from the string. So it just keeps on looping and matches that third o, then the next o, and we have

11:41

a match. Back in Ruby, I'm gonna get my MatchData object back, and it matched the entire string. Capital letter N with all four of the lowercase o's. But what if I want the opposite behavior? What if I want to match as little of the string as possible? I would use a lazy quantifier. Whoa. Did someone

12:06

just lean on the light? Oh, it's OK. It was mood lighting. Lazy quantifiers deserve darkness. So lazy quantifiers do the opposite of greedy quantifiers. They match the least number of characters possible. Lazy quantifiers use

12:24

minimum effort for minimum return. They're lazy. They do just enough to get the job done, and then they stop. I make a quantifier lazy by adding a question mark right after the quantifier. The plus sign is the quantifier. The question mark is a modifier

12:42

on that quantifier that makes that quantifier lazy. So when I match my string using this regex, again, I declare my regex and my string, calling match on my regex and passing it that string, it's gonna start off the same. It matches that capital N,

13:00

then that lowercase o, and now it finds itself with that same choice. Should it keep looping, or should it go ahead and return the match right here? Well, because this is a lazy quantifier, it chooses to go ahead and return the match. It's done just enough to get the job done. It's done.
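The two choices can be compared side by side in Ruby. This is a minimal sketch that assumes the string is "Noooo"; the talk only says it is the word no with four lowercase o's, and the constant names are illustrative.

```ruby
# Assumed input: a capital N followed by four lowercase o's.
SHOUT = "Noooo"

GREEDY = /No+/    # + keeps looping for as long as it can match another o
LAZY   = /No+?/   # +? stops as soon as it has a viable match

greedy_match = GREEDY.match(SHOUT)
lazy_match   = LAZY.match(SHOUT)

puts greedy_match[0]  # => Noooo
puts lazy_match[0]    # => No
```

The only difference between the two patterns is the question mark after the plus sign.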

13:20

And back in Ruby, I'm gonna get my match back, and notice that I have the capital letter N and only one lowercase o this time. This choice, whether to keep looping or return the match, is the essence of greedy and lazy quantifiers. The difference between them is that a greedy quantifier will always choose to keep looping whenever able, and a lazy quantifier will always return

13:43

the match as soon as it has a viable one. Now, even though greedy quantifiers are greedy, they're also reasonable. If a greedy quantifier matches an extra character, but then that character is needed later in the regex to make a successful match, it will go ahead and give the character back. It will

14:02

always choose to allow for an overall match versus holding onto any extra characters. So let's try another example, but this time let's use the star quantifier. Now, before I continue, I should recommend that you use the star quantifier with caution. A star quantifier after a character means that character

14:20

can appear any number of times. And any number of times includes zero times. So, the dot character matches any character. Then we have the star quantifier. It will be any character appearing any number of times, followed by the word moon. So in Ruby I'm gonna declare my regex and declare my string. In this case it's

14:41

another of my favorite lines from Star Wars. It's "That's no moon." I then call match on my regex and pass it the string. And in my finite state machine, it first sees that capital letter T. Now that matches the dot metacharacter path. So it's able to move on to the next state, and when it gets to this state, there are two paths that it

15:01

can take. If it finds a lowercase m in the string, it can go ahead and move on to that next state. Or if it finds any character at all, it can again loop back in on itself, follow that dot character path, and be back at the same state. So h, lowercase h, matches any character. So it goes ahead and matches it. It then sees

15:22

the lowercase a in the string, and again, that matches the any character path. So it's gonna do this for a while, so let's go ahead and fast forward a little bit. Until we get to that lowercase m. This is where things start to get interesting. My regex has a dilemma. It can either take that path that matches the lowercase m, or

15:41

it could take that looped any character path. And what should it choose? Well, because that star quantifier on the dot character in my regex is greedy, it keeps on looping over that any character path. It does this again for the o, and the second o, and the n, and uh oh. My regex has

16:03

no more characters left in the string for it to match. And it's still not at that finishing matching state. So it now has to make another choice. Should it backtrack and give back some of those characters it matched? Or should it declare the match a failure? Well remember, greedy quantifiers are reasonable. The star

16:24

quantifier, or, pardon me, the dot with the star quantifier goes ahead and surrenders some of the characters that it matched, starting with the most recent one it matched. So it surrenders that lowercase n and that doesn't make things better. So it goes to the next one. It's that lowercase o. Still no

16:42

match for that m path. Then it surrenders the next o. Still no match. Things are looking grim. Until it gives back that lowercase m. Now we have a match, and we can move on to the next state, where it again matches the lowercase o, then the second o, then

17:03

the n, and huzzah. We now have a match. Back in Ruby, I'll get back my MatchData object, which contains the entire string, "That's no moon". So with backtracking, we were able to find a successful match. But backtracking tends to be slow. When

17:24

you hear someone complain about how regular expressions are slow, they're probably complaining about backtracking. It's great when backtracking lets my regex find a match when it otherwise wouldn't have, but when it doesn't find that match, when my match fails, all that work, all that extra memory

17:40

it used is for nothing. So let's look at an example of this. This regex will match the capital letter n followed by the lowercase o appearing one or more times, followed by a lowercase w appearing one or more times. In Ruby, I once again declare my regex. I'm matching the string no again, and call match on

18:01

my regex and pass it that string. In my finite state machine, it matches the capital letter n. Then it matches the lowercase o. Then it loops and matches the other o's, because remember that plus sign on the lowercase o is greedy. Then the next o, and the next, and the next, and uh oh. Once again it's

18:22

at the end of the string and it hasn't reached that final matching state. It hasn't found that lowercase w appearing one or more times. Now it has to make a choice. Should it fail the match now, or should it try to backtrack? Well, because it's a greedy quantifier and therefore reasonable, it goes ahead and backtracks.

18:42

It gives back some of the characters. So it gives back that lowercase o at the end. Still no match. That's not a w. Then it goes to the next o, and the next one until it gets here. That's the last lowercase o it can backtrack over, and there's definitely no way it can make a match now.
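Both backtracking outcomes described so far can be reproduced in Ruby. This is a sketch under assumptions: the exact strings ("That's no moon" and "Noooo") and the constant names are inferred from the spoken description, not copied from the slides.

```ruby
# Backtracking that pays off: .* first swallows the whole string, then
# gives characters back until the literal "moon" can match.
MOON_PATTERN = /.*moon/
puts MOON_PATTERN.match("That's no moon")[0]   # => That's no moon

# Backtracking that is wasted: o+ gives back its o's one at a time, but
# no w ever appears, so the match fails and Ruby returns nil.
NOW_PATTERN = /No+w+/
p NOW_PATTERN.match("Noooo")                   # => nil
```

In the second case the engine does the same kind of work as in the first, but there is nothing to show for it.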

19:01

So it fails the match. Now that backtracking was a lot of extra effort. Sometimes that effort is worth it. But when it's not, there's a third kind of quantifier. The possessive quantifier. A possessive quantifier allows you to control the backtracking in your Regex. Possessive quantifiers do not backtrack. They either make a

19:24

match on the first try or they fail the match. So let's look at an example of this. I make a quantifier possessive by adding a plus sign after it. This Regex matches a capital letter n, followed by a lowercase o appearing one or more times, and now I've added a

19:40

second plus sign after it. One of these plus signs is the actual one or more quantifier, and the other is again a modifier on that quantifier that makes it possessive. After the o, my Regex also looks for lowercase w appearing one or more times. In my finite state machine, my Regex first matches

20:00

the capital letter n, then it matches the lowercase o, and it proceeds the same, matches the next lowercase o, and the next, and the next one, until it comes to here, where it has that same dilemma. It hasn't yet found that lowercase w appearing one or more times.

20:23

It didn't find it on the first try through the string. It has to decide whether it should backtrack and give back some of those extra o's in the hopes it might find that lowercase w somewhere, or it needs to give up and fail right now. A possessive quantifier always chooses to fail rather than

20:41

give up any of the characters that it matched. Possessive quantifiers save both time and memory by making a Regex fail faster. You use a possessive quantifier when you know there's a point in your Regex where continuing, where backtracking would be pointless. The match will fail no matter how many permutations it tries.
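In Ruby, the possessive form is written with a second plus sign. A minimal sketch, assuming the same "Noooo" string as before; the constant names are illustrative. Both patterns fail on this input, and the difference is only in how much work the engine does before giving up.

```ruby
SHOUT_WITHOUT_W = "Noooo"   # assumed input, no w anywhere

BACKTRACKING = /No+w+/      # o+ may give characters back before failing
FAIL_FAST    = /No++w+/     # o++ never gives characters back

p BACKTRACKING.match(SHOUT_WITHOUT_W)   # => nil
p FAIL_FAST.match(SHOUT_WITHOUT_W)      # => nil

# When the w's are present, the possessive pattern matches as usual.
p FAIL_FAST.match("Nooooww")[0]         # => "Nooooww"
```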

21:01

Use possessive quantifiers with caution. They can potentially cause unexpected failures. Generally I've found the best place to use them is within smaller sub-expressions or nested quantifiers within your regular expression. When used carefully, they can significantly speed up a regular expression's matching.
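One way such an unexpected failure can arise is when the pattern needs a character that the possessive quantifier has already consumed. The patterns below are hypothetical examples chosen to show this and are not from the talk.

```ruby
# Both patterns ask for: N, one or more o's, then one final o.
TRAILING_O_GREEDY     = /No+o/    # o+ gives back one o for the final o
TRAILING_O_POSSESSIVE = /No++o/   # o++ keeps every o, so the final o never matches

p TRAILING_O_GREEDY.match("Noooo")[0]    # => "Noooo"
p TRAILING_O_POSSESSIVE.match("Noooo")   # => nil
```

The possessive version fails on input that looks as though it should match, which is the kind of surprise the caution refers to.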

21:21

So far I've taken you through the bits and pieces of how a regular expression works. It's good information to know and great theory to understand, but it doesn't explain how to practically use a Regex in your everyday code. Crafting a regular expression for a specific need is as much an art as a science. In

21:41

this last section, I'm going to take you through crafting a regular expression from scratch for use in real, functioning code. Back in May, Avdi Grimm tweeted a regular expression challenge. It was to create Ruby code using the gsub method and a regex that would convert a snake case string into a camel case string. Now I was away on vacation and unplugged

22:02

at the time, so I didn't see this until much later. I'd like to present my solution for you now and take you step by step through how I developed it. First step was to whiteboard the requirements for my solution. First thing it needs to do is find the first letter of the string and capitalize

22:21

it. Then it needs to find any character that follows an underscore and capitalize that one. Finally, it needs to remove the underscores from the string. These steps will transform a snake case string into a camel case string. So let's start with that first step. I need to find the first letter of my

22:40

string and capitalize it. Now I'm a test-driven developer and I develop my regular expressions through the same red-green-refactor process. So first I write a spec where I define the basic structure of my program. I'm creating a class called CaseConverter, and I'm adding a method to that class called upcase_chars. I expect

23:03

when I pass a lowercase string to upcase_chars, it will return that same string with the first letter capitalized. Next I draft a regular expression just to capture that first character in the string. I'm gonna use the \A shorthand, which anchors my regular

23:21

expression to the beginning of my string. Next it's gonna need to find the first letter at the beginning of the string. In my first draft of this regex, I use the \w shorthand, which will match any word character. So let's plug this into the actual upcase_chars method. I define my regex, then I call

23:41

gsub on my string and pass it that regex. Next I use a block and tell it, for every character that regex matches, to call upcase on that character. So when I run my spec, my spec passes. But there's a problem with this regex. I want to capitalize the first letter of my string,

24:03

even when that string starts with an underscore. Now in this spec, I state that when I pass the string _method to the upcase_chars method, I expect to receive that string back with the first letter capitalized. Now when I run this with my current code, with that current regular expression, this

24:22

spec fails. Let's take a look at the error message from that spec. I expected to get back a string with the lowercase m capitalized, but I got back that lowercase string instead. Something is not right here. There's a problem with the \w shorthand. Sure,

24:41

it matches all word characters, but in its mind, all word characters include underscores. If I pass it a string that starts with an underscore, it will match the underscore, not the first letter. My Ruby code will then call upcase on the underscore, and naturally nothing will happen. I was presenting a draft of this, and

25:02

I was watching a chat from some students, and when I said this line, someone responded, sure, if you upcase an underscore, it becomes a hyphen. It doesn't. I need to be more specific. Instead of the \w shorthand, I'm gonna use a character class.
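To make that failure concrete, here is a minimal sketch of the first draft. The constant and method names are made up for illustration, and the pattern is reconstructed from the spoken description (\A followed by \w), since the slide itself isn't part of this transcript.

```ruby
# Reconstructed first draft: anchor to the start of the string,
# then match a single word character. Note that \w also matches "_".
FIRST_DRAFT = /\A\w/

# Hypothetical helper standing in for the talk's first version of the method.
def upcase_first_draft(string)
  string.gsub(FIRST_DRAFT) { |char| char.upcase }
end

upcase_first_draft("method")   # => "Method"
upcase_first_draft("_method")  # => "_method" (the underscore was matched, not the m)

"_method"[FIRST_DRAFT]         # => "_"
"_".upcase                     # => "_" (upcasing an underscore changes nothing)
```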

25:23

This character class will match any lowercase letter from a to z, which is exactly what I need to capitalize and nothing more. Next, I'm gonna allow my regex to match one underscore at the beginning of the string. Finally,

25:41

I'm gonna add in a question mark after that underscore, which makes that underscore optional. This regex will match both a string with an underscore at the beginning of it and a string without one. So in my code, I'm gonna plug this regex into my upcase_chars method, and this time my spec passes. So I'm ready to

26:01

move on to the next requirement for my solution. Find any character that follows an underscore and capitalize that character. Again, I'm going to define a spec. I expect that when I pass the string some_method to my upcase_chars method, it will

26:20

return that same string, but with the letter s and the letter m capitalized. So to do this, I now need a regex that will match both the first lowercase letter of the string and any lowercase letter directly following an underscore. I take my current regex and I add an alternate to it. This will now match

26:42

the first lowercase letter of a string or any lowercase letter in the string. Now to make that alternate specific to lowercase letters that follow underscores, I add in a look-behind. This look-behind adds a context to that last character class, to that alternate. It states that it will only match the lowercase letter

27:03

if it directly follows an underscore. So when I add this regex to my code and run my spec, my spec passes. Now it's time to move on to the final requirement for my solution. I need to remove any underscores from the string.
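The regex built up over the last few steps can be sketched roughly as follows. This is a reconstruction from the spoken description, not the code from the slides: the spellings CaseConverter and upcase_chars, the constant name, and the choice of an instance method are all assumptions.

```ruby
class CaseConverter
  # Two alternatives, as described in the talk:
  #   \A_?[a-z]     an optional leading underscore, then the first lowercase letter
  #   (?<=_)[a-z]   a lowercase letter directly preceded by an underscore (look-behind)
  UPCASE_TARGETS = /\A_?[a-z]|(?<=_)[a-z]/

  def upcase_chars(string)
    # Upcasing a match like "_m" leaves the underscore alone and yields "_M".
    string.gsub(UPCASE_TARGETS) { |match| match.upcase }
  end
end

converter = CaseConverter.new
converter.upcase_chars("method")       # => "Method"
converter.upcase_chars("_method")      # => "_Method"
converter.upcase_chars("some_method")  # => "Some_Method"
```

The look-behind only checks that an underscore precedes the letter. It does not consume the underscore, so the underscore stays in the string for the later removal step.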

27:22

Again I create a spec. For this requirement, I'm going to have a separate method in my CaseConverter class called remove_underscores. When I pass in a string with an underscore in it, I expect to get back that same string with the underscore removed. Now my regex for this method is actually pretty

27:41

easy. I just need to find a literal underscore in my string. So in my CaseConverter class, I create my method, remove_underscores, and I declare my regex, which is just a literal underscore. Then I call gsub on the string I pass into it, and I tell it to replace anything that matches this regex with an empty string. This effectively

28:03

removes all underscores from the string. I pass gsub both a regex and an empty string, and when I run my spec, my spec passes. So finally, I now have two separate methods for my solution. I need to combine them

28:21

into one method to combine the results. I create another spec, this one for a method called snake_to_camel in my CaseConverter class. When I provide it a string with all lowercase letters and an underscore, I expect the method will return that string with the letter s and the letter m capitalized and the underscore removed.

28:44

My snake_to_camel method will first call upcase_chars on the string that's passed into it, then call remove_underscores on the result of that upcase_chars method. When I run my spec, my spec passes.
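Put together, the whole solution might look something like the sketch below. It follows the steps described in the talk, but the exact method spellings, the use of instance methods, and the regex literals are assumptions, since the slide code is not part of this transcript.

```ruby
class CaseConverter
  # Capitalize the first lowercase letter (allowing one leading underscore)
  # and any lowercase letter that directly follows an underscore.
  def upcase_chars(string)
    string.gsub(/\A_?[a-z]|(?<=_)[a-z]/) { |match| match.upcase }
  end

  # Replace every literal underscore with an empty string.
  def remove_underscores(string)
    string.gsub(/_/, "")
  end

  # Upcase first, then strip underscores. The order matters, because the
  # look-behind in upcase_chars needs the underscores to still be present.
  def snake_to_camel(string)
    remove_underscores(upcase_chars(string))
  end
end

converter = CaseConverter.new
converter.remove_underscores("some_method")  # => "somemethod"
converter.snake_to_camel("some_method")      # => "SomeMethod"
```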

29:00

The code I presented here is available at this GitHub address. I'll also be tweeting out this address after this presentation. There is definitely more than one solution to this challenge, and I highly encourage anyone who's interested to submit a pull request or tweet out a solution. I'll retweet it. It'd be great to have some discussion going.

29:21

Life with regexes is a journey. A journey that is at first fraught with peril. But it becomes much easier as you learn and understand what happens beneath the surface. Here are a few tips to help you along your way. Powerful, elegant regular expressions are not developed all at

29:41

once. Develop your regexes in small pieces. Make sure those individual pieces work, then combine them together into larger wholes. When you write a regular expression, you are programming in another language. The language of the regex parser. Like any program, regexes need to be developed iteratively.
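As one way to picture that advice, the snake case example can be split into two small patterns that are each checked on their own before being joined. The constant names here are invented for illustration, and using Regexp.union to join them is one option among several, not necessarily how the talk's code was written.

```ruby
# Piece 1: the first lowercase letter, allowing one leading underscore.
FIRST_LETTER = /\A_?[a-z]/

# Piece 2: a lowercase letter directly preceded by an underscore.
AFTER_UNDERSCORE = /(?<=_)[a-z]/

# Check each piece by itself first.
"some_method"[FIRST_LETTER]      # => "s"
"some_method"[AFTER_UNDERSCORE]  # => "m"

# Then combine the verified pieces into a larger whole.
COMBINED = Regexp.union(FIRST_LETTER, AFTER_UNDERSCORE)
"some_method".gsub(COMBINED) { |match| match.upcase }  # => "Some_Method"
```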

30:01

They come in drafts. Whenever I'm crafting a regular expression for use in Ruby, I first develop it in Rubular. Rubular is a fantastic site that allows you to easily create and test regular expressions on test strings. Now, a tip I picked up from Myron Marston on the Ruby Rogues Parley list was to, whenever

30:21

I create a regular expression in Rubular, make a permalink of it. Rubular allows you to make permalinks of any regular expression you create on there. Then I copy the URL for that permalink and paste it into a comment in my code. This has been enormously helpful whenever I've needed to come back to a regular expression that I wrote a few weeks ago, or anyone

30:41

who's not as into regular expressions as I am needs to edit my code. Regular expressions are a programming language of their very own. Like any programming language, they can be learned. They are a logical system and process that, at its core, is no different from any program that takes in a certain input and

31:00

returns a certain output. Regular expressions are powerful. So powerful they inspire fear in many of us. But that fear can be overcome through understanding. Here's the call to action. Fire up Rubular. Experiment with greedy, lazy, and possessive quantifiers. Play with regular expressions. Have fun with

31:21

them. Move past your fear and take a dive beneath the surface. I think you'll be amazed at what you find. I'm Nell Shamrell. I'm an engineer with Blue Box. That's my Twitter handle. I used a ton of resources in putting this presentation together. That's a link to all of them. All of them are fantastic. I'll tweet out that link after this presentation

31:42

as well. Please check it out. Explore all these awesome works by many authors that helped me put this together. And with that, I'm ready for any questions. We have thirteen minutes. We have plenty of time.