Time to Use Regular Expressions More, Um, Regularly
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 60 | |
Author | ||
License | CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/37411 (DOI) | |
Publisher | ||
Release Date | ||
Language | ||
Producer | ||
Production Year | 2018 |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
Transcript: English (auto-generated)
00:11
So, welcome, welcome, welcome. Thank you so much. Some housekeeping things. How many of you have seen me do a talk before? I just want to see. Okay, that's a fair amount of you. Okay, thank you very much. And let me just say thank you, thank you, thank
00:22
you, thank you, thank you. You may know that I've been doing this for 35 years and this is the last talk I'm ever going to do. And so, let me just take a minute to say that if you guys hadn't bought my books and gone to my classes and stuff, I would have had to get a job and I don't think I could have handled that. I mean, seriously, it's just, so, respect, thank you, thank you, thank you, all of you. How many of you
00:42
ever used a regex of any kind? That's almost everybody, right? Copy and paste it. How many of you are, you know, pretty badass about regex? That's all right, come on. I'm hoping not to bore you, okay, because I will tell you about what we're going to do. We're going to start from the basics and we'll, you know, we'll wrap it up. My goal is, I want you, if you're not already good in regex,
01:09
I'll make you kind of good in regex. We'll go through the things you need to know. But my approach to this is going to be a little bit different because I want to be sure that the focus is PowerShell. I mean, part of what I'm here for is even
01:22
if you know regex, I hope to teach you some things about how PowerShell does it because the cmdlets and the various parameters are good, but they're a little wonky and sometimes you have to punch out to .NET in order to get this stuff. And finally, I want to say thank you to Lisa at Brookstone.
01:40
Did you notice yesterday that Jeffrey was having a little trouble sometimes where his Surface would kind of like not do anything and he'd have to talk for a bit? I was sitting up front and I only noticed it because mine was doing that too. I'm thinking it's one of those Windows as a service updates that makes Windows better. And, no, and seriously, I smoked it out.
02:01
I have a Bluetooth mouse because it's a Surface. It only has one USB port. It's not a 3.1 for some reason, but I have a USB port. And so for the last few days I've been using my Bluetooth mouse and the system just, and it's really annoying.
02:23
So I finally said this is crazy. I need a wired mouse. So yesterday after I left, I walked over and said where am I going to find a wired mouse? I'm in Bellevue. This is Microsoft world. I should be able to find one easily. I know. I'll go to the Microsoft store. The Microsoft store doesn't have wired mice, and I couldn't figure out why.
02:47
So they said I should go to the Brookstone, which was nearby. I knew that wasn't going to work, but there was nobody in the Brookstone. So there was this nice woman and she said, no, we don't do those. I don't know where you'd get one of those. I said, really? I said, because I'm doing this presentation and I need a wired mouse.
03:02
She said, wait a minute. She went back to her office and she gave me her mouse and I'm going to bring it back to her. So without Lisa, this talk would not be possible. So, okay. Other thing too is I'm not sure what the PowerShell organizers intended,
03:21
but I'm not going to talk for an hour and 45 minutes straight because you all would kill me. And there would be lots of, you know, dampness in the underwear and such. So having done this for 35 years, it's real hard to sit for that long. So what I want to do is I'm going to go 50 minutes until 10 of, 10, and we'll take 10 minutes.
03:44
Just get up and stretch. I don't know if there's coffee or anything like that, but at least we'll get a chance to do a bio break and that sort of thing. Yes, that means there will be 10 minutes left that are not in the presentation. Be sure to kill me on the evaluations for that. But anyway, so our goals in this talk, I'm assuming that very few of you, you know, on the order of 10% are RegEx experts.
04:03
So you probably know more about this than me, which is fine. I want you to understand why we'd use them. Because, you know, we always hear that RegEx is good. It would be nice to get some information about, you know, how exactly that is. Then from there, as I said, we want to drop into syntax. Now, the worst thing about RegEx in most people's minds is the syntax.
04:22
And they're probably right. Well, that's not true. I guess it's the engine as well. But anyway, we're going to start off with just a little bit of syntax. Just enough to do the world's simplest RegExes. And the reason we're going to do that is we're going to then go to the engine.
04:41
It's been my experience that when I look at RegEx papers or whatever books, they all get heavy deep and real about what the syntax is, you know, to write the long strings of line noise, which is great. We'll get to that. But the problem is, once you know how to do a RegEx, so how many of you have done a fair amount of RegEx? Let me see the hands real quickly. I want to ask these people.
05:00
So you learn some RegEx, and within a week, did you end up writing some RegEx pattern that looked really good to you? But you're running a thing, and it's been, you're waiting five minutes to get the answer, and I've got an i7. It should be faster than this. And it just boils down to, if you don't know the engine, then you're wandering around in the dark. Even if I could give you the Vulcan mind meld,
05:23
and you could learn all of the syntax, you'd still be dangerous in RegEx. So we're going to go to the engine, because the engine is important. It's how PowerShell thinks. That will lead us to realize that there are parts of PowerShell where PowerShell becomes greedy. And I mean that in a technical sense. The sad part is, you can make it not greedy,
05:43
but your only alternative to greedy is it's got to be lazy. I've had employees like that. But you know, it's, so anyway, so we're going to do that. And then I want to jump over to the specifics of running it on PowerShell. Because there's a couple of odd things, not odd things,
06:01
but different from what you've seen in a Unix or Linux implementation or something like that. From there, then we will meet the RegEx-y commands. There are specific RegEx-y specific commands. And then with the time that's left, we'll do that syntax. The reason I did that is very specific.
06:21
I've done this talk a few times, not many times. Because here was my thinking. I told some of you this before. I like to do these talks. And sometimes I run out of stuff. There's nothing interesting to me. And so what I'll do is I'll say, you know, we all have this list of stuff I really need to learn that we don't get to.
06:41
And RegEx has been around since 1951. So we've had plenty of opportunities to learn it. And so my thought is if I can package it up into a little talk, then I can save you guys a tremendous amount of trouble. And that way you can check that off your box of stuff I really need to know, okay? So, all right then. So why RegEx, you know, the basic stuff?
07:00
First of all, it's an old, well-understood tool. I mean, it is so old, it's older than me. And that's old. And so that's, and what that leads to is something that's even cooler. Is that if you want to talk about a platform-independent tool, this is the poster child. I mean, I don't think there's an operating system
07:20
on this planet that doesn't support RegExs. And probably lots of other stuff too. I mean, you know, my phone probably knows how to do RegExs. It's a sad thought, but it probably does, you know? So that's pretty nice. That means even if you never pick up PowerShell again, which would be a terrible, terrible error, then you still have the stuff that you'd need to know
07:42
to use it in Unix, Linux, blah, blah, blah, blah, blah, whatever, okay? So we can also use it for things like input validation. I've written, for example, when I was building ASP scripts or something like that, the question is, somebody wants to create a password to make an account on my system. Well, does the password meet our criteria, you know?
08:02
What about this thing? If I'm logging, if someone's signing up for my newsletter, I don't want it anymore, but when I did, you know, the identifier, as it is for everybody, is the email. But you know that there was a problem, I guess there still is, where if you're not careful, you ask for an email and somebody types a SQL injection string. Now, I don't know about you,
08:21
but when SQL injection became a thing, which was around '02 or something like that, I was sitting down, getting ready to write some VBScript in my ASP pages, and saying, how do I get past this? And everybody, ding, regex, and all of a sudden, life got a whole heck of a lot easier, you know? We can use it to find and update text.
08:41
Imagine you've got a website. A website is gonna be a folder full of text files. Maybe your company's name has changed from Acme to Ajax or something like that, and you want to be able to make those changes. The easiest way to do it, and the Unix people have been doing it for a million years, is with a tool called grep. I'm sure you've heard the phrase. And grep stands for?
09:01
Global regular expression print. That's right. Only we sad geeks know that stuff, you know? Sadly, I know that too. And there's a grep clone, there's something that behaves very much like grep in PowerShell, called Select-String, and we're gonna meet that guy as well.
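The Acme-to-Ajax rename described above can be sketched with Select-String plus the -replace operator. This is only an illustration under stated assumptions: the scratch folder, the file names, and their contents are all made up here, and a real site would want backups and attention to file encoding before rewriting anything.

```powershell
# Hypothetical scratch folder standing in for a website's content.
$site = Join-Path ([System.IO.Path]::GetTempPath()) ('regex-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $site | Out-Null
Set-Content -Path (Join-Path $site 'index.html') -Value 'Welcome to Acme', 'Contact us'
Set-Content -Path (Join-Path $site 'about.html') -Value 'About Acme Corp'

# grep-style search: which lines in which files mention the old name?
$hits = Get-ChildItem -Path $site -Filter *.html | Select-String -Pattern 'Acme'
$hits | ForEach-Object { '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line }

# Search and replace: rewrite only the files that had a hit.
$hits | Select-Object -ExpandProperty Path -Unique | ForEach-Object {
    # The parentheses read the whole file before Set-Content reopens it.
    (Get-Content -Path $_) -replace 'Acme', 'Ajax' | Set-Content -Path $_
}

# Confirm nothing still mentions the old name.
$remaining = Get-ChildItem -Path $site -Filter *.html | Select-String -Pattern 'Acme'
```

Select-String does the finding and -replace does the editing; both take a regex, so the plain word "Acme" could be swapped for any pattern.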
09:21
So a couple of examples I can think about is, you've got a folder full of files because it's your website, and you want to do a global search and replace on all the files. Or alternatively, this is another one we've all been struggling with for the last 20 years, is we have all these files that have been sitting in our computers for the last however many years,
09:40
whether we're a university or a hospital, maybe we're just a public company, a publicly traded company. Is there any information in that that has personally identifiable information? You may know that Server 2012 introduced a file classification infrastructure. But if you wanted to be able to type in the things you want to look for,
10:01
they gave you two options. B-star, you can use that. Anything fancier, you gotta go to regex. And in the last 10 years, Microsoft has been doing this more and more. So you're gonna find that even if you don't use PowerShell, this is going to be useful stuff. And the best thing, how many of you are OneNote users?
10:21
Okay, great, because if you're a PowerShell user and you're not a OneNote user, I hope you're an Evernote user, because I don't know about you, but if something takes me more than 30 seconds to figure out in PowerShell, I copy and paste it into my OneNote, which is on my phone and it's on my iPad and all that kind of stuff, which is just absolutely lovely.
10:41
I have everything on OneNote, but you know what OneNote can't do? It can't do regular expression searches. If I could do regular expression searches on my OneNotes, I'd never have to leave the house. It is actually a feature request. It's been on UserVoice, on the OneNote UserVoice, for the last two years or something like that.
11:02
But go vote it up, go vote it up, and one day, one day. Anyway, now what's it bad for? Well, you know, it's not a parser. It can parse a little bit, but it matches patterns. It's not a parser. So you still end up having to write parsing code if you're doing this stuff. It's quite odd looking,
11:22
as I've already said a couple times. It looks like line noise. It is said to be hard. You know, once you get into it though, it's kind of addictive. And let me suggest right now that if you want to play with this, get on the web and there are about a million sites that have a list of all the English words. These lists are good or bad, it doesn't matter.
11:43
But if you've got tens of thousands or hundreds of thousands of words, it can be tremendous fun. Because at that point you can, even if you're not cheating in crossword puzzles, I just find it fun to say, are there, for example, I came up with this question this morning I haven't answered yet, which is are there any English words
12:00
that start with the letter X and have two vowels next to each other afterwards? So this kind of stuff we can all find out. Anyway, now not everything's good, it's not good at everything. You'd think that, when you think about text processing, the discovery of palindromes could be interesting. Things that are the same forward and backward. Noon, civic, stuff like that.
12:20
It's not particularly good. At least I've never figured out how to do a general purpose palindrome detector. If you told me a length, like you said it's an eight character word, then I could write that regex, okay? So let's start from regexes. By the way, also, as I think you've seen in these talks, please ask questions.
12:41
If something's not clear, remember that when I'm talking text stuff, it's the 10% that are laughing with me and don't think by being quiet that that's good. No, I mean, if something I said doesn't make sense, if I forget to define something, please, please tell me. So the notion is that you need a regex pattern and your regex pattern needs,
13:01
I'm using my surface, which means unfortunately I can't read this as well. So there's a regex, the notion is we get a string and the string is what you wanna find the patterns in. So you've got this big pile of text and you're searching for some word or something like that. That text is called the target string.
13:21
Some people just say the target, some people say the string. And then there's the pattern itself, the regex pattern itself. And the regex pattern can be as simple as regular old text. So the simplest stuff, it's just regular old literals. For example, if you wanna see if there's any Bs in the text just search on B, that's the actual pattern.
13:40
In this case, we've got a very simple one, just "be", that's all there is to it. Well, will it match? Well, it will match. It's gonna try to apply itself to the target information. So for example, if we said something like "to be or not to be",
14:00
how does regex work? Well, we're gonna see in more detail. But the notion is the "b" matches that "b" and the "e" matches that "e", so we're happy. That was a match. But notice the way regex normally works, and you can change this behavior.
14:20
But by default, regex's only question is what? Is there a match? Because if we say "to be or not to be", how many "be"s are there in there? There's at least two. And yet regex in general, you heard the weasel words, there's a way to adjust these things. But regex in general, leftmost first. I'm gonna say that again, I'll have a slide on that,
14:41
but leftmost first. Notice also it doesn't always necessarily match the places you'd imagine. For some reason, if the "be" isn't at the start of a word, we don't tend to see it; "Abe" wouldn't jump out at us as having a "be". And notice also, if we say "many consider Abe Lincoln to be the best president", well, the match is the "be" in "Abe"; we haven't hit that "be" in "to be".
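The literal pattern and the leftmost-first rule can be tried at a prompt. A minimal sketch, assuming the target strings quoted in the talk; it uses the .NET [regex]::Match method (case-sensitive by default) only to expose the position of the match, and the variable names are just for illustration.

```powershell
# Literal pattern: each character has to appear exactly as written.
$pattern = 'be'

'to be or not to be' -match $pattern      # True
$Matches[0]                               # be

# "be" occurs twice, but only the leftmost occurrence is reported.
$first = [regex]::Match('to be or not to be', $pattern)
$first.Index                              # 3

# A match can sit inside a longer word: the "be" in "Abe" comes
# before the "be" in "to be", so that is the one reported.
$abe = [regex]::Match('Many consider Abe Lincoln to be the best president', $pattern)
$abe.Index                                # 15
```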
15:04
Make sense? That's the easy one, good. Great, great, great. And in that case, those parts of the pattern are called literals. The alternative is a metacharacter. A metacharacter is a wildcard. But the thing is, they are,
15:24
the one I'm gonna show you is dot. Dot says any one piece of text. So it's a wildcard, but it's a wildcard that's taking its meds, you know? And so that would mean, so if we have that pattern, this guy, metacharacter's dot,
15:43
it matches any single character. So "be.": now dot matches anything, but there has to be something there. And so "be." would match, as you see here; if we say "be.", we need to match a "b" and an "e"
16:01
and then anything. So we got that, a "b" and an "e", you gotta have that "e". Bet, abet, antebellum, bear. It would not match just "be". If the target string were just "be", it would not match. And why? Because the "b" would match "b", the "e" would match "e", and the dot would have nothing to do.
16:23
So it would not be sufficient. Make sense? That's an important point. The reason I stress that is that all of you know the asterisk wildcard that's existed in the Windows world for longer than the beginning of time. And that one, that one's easy going. You can give it nothing and it still matches.
16:42
With a dot, that is not the case, okay? Yes, please. Would it match if the string was "be" followed by a space? I think so. Yes, I think so. Because dot matches everything except for the newline character.
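The behavior of the dot, including the audience question about a trailing space, can be checked directly. A small sketch using the words mentioned in the talk; the $results table is just a convenience for this example.

```powershell
$pattern = 'be.'

$results = [ordered]@{
    'bet'        = 'bet'        -match $pattern   # True
    'antebellum' = 'antebellum' -match $pattern   # True, via "bel"
    'be'         = 'be'         -match $pattern   # False: nothing left for the dot
    'be+space'   = 'be '        -match $pattern   # True: a space is a character
    'be+newline' = "be`n"       -match $pattern   # False: by default the dot skips the newline
}
$results
```

Unlike the * wildcard used with file names, the dot must consume exactly one character, which is why "be" on its own fails.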
17:02
So I'm, oh, oh, yeah, yeah, absolutely. Or my favorite is when I accidentally save the text file as Unicode. That never works well with the regex. But that leads to the next question. Whenever we see a meta character, my question is always, oh, wait a minute, I wanna be able to use that thing, you know?
17:21
And so, what if I wanted to use a period? Well, if you've done any Unix work or C work or any, you know, out of that part of the universe, then you will not be surprised. But if not, it's a backslash. Backslash tends to be what we call in programming languages an escape character. I don't know what its official word is in regex.
17:44
So, and so, here's the extreme example I can think of: if you wanted to do will.i.am, that's the way that you would do that. Where the dots would be, you have to have backslash dots. Does that make sense? Questions, anybody? You guys got all quiet, it's like this.
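A short sketch of the backslash escape, using the will.i.am example. The "willXiXam" string is invented here purely to show what the unescaped dots would let through; [regex]::Escape is the .NET helper that adds the backslashes for you.

```powershell
$escaped   = 'will\.i\.am'   # each dot must be a literal period
$unescaped = 'will.i.am'     # each dot is a one-character wildcard

$r1 = 'will.i.am' -match $escaped      # True
$r2 = 'willXiXam' -match $escaped      # False
$r3 = 'willXiXam' -match $unescaped    # True: the wildcards accept the X's

# Let .NET do the escaping when the literal text comes from somewhere else.
$auto = [regex]::Escape('will.i.am')   # will\.i\.am
```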
18:01
You could hear a pin drop. I'm gonna mention, I have a reference slide here about these other metacharacters. We're gonna meet all those guys as time goes on here. So, it's all well and good to talk about regexes, but wouldn't it be nice to be able to try one out? Well, there are a lot of things in PowerShell that will use regexes. But the basic one is you could take any string
18:22
and use -match. And by the way, a little sidebar here. If you've done PowerShell for a while, you probably know the comparison operators. And if you've only done it for a little time, you've probably learned -eq is equals and -ne is not equals. And then you probably also learned early on
18:41
that -eq doesn't like wildcards. So, if we're doing Active Directory stuff and I say SamAccountName -eq "J*", I'm not gonna get everybody whose name starts with J. You'd have to have an actual name of J asterisk. We've got a J.Lo, why not a J asterisk? I don't know, you know, but J star. So, and in that case, what do you gotta use?
19:02
The -like operator. Well, you knew that, most of you knew that. -eq is exact equal, -like lets us use wildcards. What everyone forgets is that 85% of the time, whatever that cmdlet is will also take -match. Now, that doesn't always happen. A lot of what I do is Active Directory stuff
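The three comparison operators side by side, as a sketch. The name "Jane" is made up, and the ^ anchor (start of string) goes slightly beyond what has been covered so far; it is used here only so that -match behaves like the "starts with J" wildcard.

```powershell
$name = 'Jane'   # hypothetical account name

$byEq    = $name -eq    'J*'    # False: -eq treats the asterisk literally
$byLike  = $name -like  'J*'    # True: -like understands wildcards
$byMatch = $name -match '^J'    # True: -match takes a regex (^ = start of string)

# -eq only succeeds when the name really is "J*".
$literal = 'J*' -eq 'J*'        # True
```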
19:21
and when I'm teaching people beginning PowerShell, I wanna get them into it. You can't just teach PowerShell in a vacuum. It's nice if you can point it at something. I like Active Directory because pretty much everybody understands enough about Active Directory. But it's a sad thing that when you do, Active Directory's most generic retrieval tool
19:40
is Get-ADUser, in case you don't know. And you could have, you know, you could do the usual comparisons. It's sad to see though that there is no -match in Active Directory, bummer. But it's because of the way it's built under the hood. But there's ways to get around it, of course. So the way this works is that I have my target string
20:01
and I apologize because I have to point at the screen but I don't wanna talk to the screen. So I'm coming back to you to do this. And if it looks strange, it is, sorry. So that's our target string and I wanna see whether "sh" is in there. The way I do it is I just say there's the string, which of course could be a variable or something coming across the pipeline, whatever.
20:20
And -match and then "sh". The response, which you will get on the console, is a True or a False rather than a $true or $false, right? Yes? Didn't that bother anybody? Doesn't $true bother you?
20:44
Well, dollar signs, okay, why is there a dollar sign on the front of a variable name? Well, because they cost memory. But, but that means that dollar sign true is a variable
21:01
and truth is not variable, that's what's wrong with this country. I know we do constant stuff. Anyway, so there we see. And additionally, you don't see it unless you ask for it, but there is a $Matches, which is an array because we'll be able to put lots of, we'll see a lot of stuff in it later.
21:21
And so if we do this, if we say PowerShell, like that, and then I also, I not only want to see the result but I also want to see matches, I see true and I see what it matched. We were looking for SH period and it matched SH. That's super important. It's particularly super important
21:40
when you're just getting started. Because if all PowerShell tells you is true or false, you matched it, you don't know whether you almost matched it. This is a case where almost isn't what you're looking for. I mean, if you're lazy, you can, because it's easy to build sloppy filters, you know, excuse me, sloppy patterns. So that's the reason I always say to people if you haven't done this before,
22:00
start out with the dollar matches at zero. Notice it's zero because by default, even if there's 15 things that could match, we always get the first one, which by the way, also, I'm going to say it again, is the leftmost. Always remember the rule of the leftmost when it comes to regex. Any questions on that?
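A minimal sketch of the comparison described above, assuming the target string is `PowerShell` and the pattern is `sh` as in the talk. The variable names are illustrative, not from the slides. One detail the spoken version glosses over: `$Matches` is technically a hashtable keyed by group number or name, so `$Matches[0]` is the entry for the whole match.

```powershell
# Target string and pattern taken from the talk; variable names are illustrative.
$target = 'PowerShell'

# -like compares against a wildcard pattern; -match compares against a regex.
$likeResult  = $target -like '*sh*'
$matchResult = $target -match 'sh'   # a successful -match also fills $Matches

# $Matches[0] is the text of the leftmost match. It is worth looking at,
# because True alone does not say what was actually matched.
$firstMatch = $Matches[0]

"like: $likeResult  match: $matchResult  matched text: $firstMatch"
```

Because `-match` ignores case by default, the matched text comes back as it appears in the target (`Sh`), not as it was typed in the pattern.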
22:21
Now, that's nice, but that's not what I want. I mean, I want those wild, wild cards, the ones that haven't been taking their meds for a while. The ones that I say, go forth and suck down the universe and they do that. You know, all in RAM in no time at all. So very, very exciting stuff. Now, so if we have a pattern B dot E, okay?
22:40
So that's only going to match, you know, one character. So it's a wild card. So as I say there, you see that I did the stuff in color. And by the way, did I pick an okay font? Can you guys read this in the back okay? Okay, if you can't, just let me know. We could just do the old, you know, magic blow up thing, right? So anyway. Thank you, Dr. Arsenovitch.
23:03
And so it'll match oboe. It would match able, but not bone. Because again, those of you who know regex wouldn't even think this, but beginners tend to see, oh, well, let's see, that's a bone. So there's a, where was my bone? Ah, buh buh buh buh buh buh buh buh. Where's bone? Is that, oh, got a B, that matches.
23:21
We got an O, that matches. Oh, we just skipped that one and that. Yeah, you can't do that. You can't do that because dot is positional and it only takes one character, okay? So sometimes it would be nice to say, I don't want to worry so much about it. Maybe I want a little more open-ended. And the more open-ended thing is not an asterisk,
23:40
as you'd expect from the stuff that we do. It's the dot, which we already have. And then after it, you can put a plus, or as you'll see later, an asterisk. We'll talk about an asterisk in a moment. The plus's job is it says, it modifies. That's an important thing. Technically, it's called a quantifier. That modifies the behavior of dot. And so it says, okay, dot, where is it?
24:04
There we go. So okay, dot. I see dot here, and then there's an asterisk after it. And that means, now I'm gonna tell you about the asterisk. Asterisk means zero or more matches. Plus means one or more matches.
24:22
That's important. That's a big distinction. You'll get it quickly if you haven't done this before, but it is a thing we all trip over at some point. Because what is dot star? Dot star is the old asterisk that we've known for ages. It'll match a null, it'll match a nothing, and it'll match a thousand characters if you want.
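The dot, plus, and star behavior described above can be checked directly. This is a small sketch using the words mentioned in the talk (`oboe`, `able`, `bone`); the two-character target `be` is an extra assumption added here to show the difference between "one or more" and "zero or more".

```powershell
# b.e : literal b, exactly one character of any kind, literal e
$dotResults = foreach ($word in 'oboe', 'able', 'bone') {
    [pscustomobject]@{
        Word    = $word
        IsMatch = ($word -match 'b.e')
    }
}
$dotResults   # oboe and able match; bone does not (two characters sit between b and e)

# .+ needs at least one character between b and e; .* is happy with none.
$plusOnBe = 'be' -match 'b.+e'
$starOnBe = 'be' -match 'b.*e'
"b.+e on 'be': $plusOnBe   b.*e on 'be': $starOnBe"
```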
24:41
Okay, that make sense? All right, good. And always when you're getting started, try to think like PowerShell. Play PowerShell in your head, you know? So for example, this guy, B plus E matches bear. Well, now what, okay? So because it's B, A,
25:03
actually it doesn't do that, sorry. If I did, sorry, that one does. Because it's B, and then this matches the A and the Y, and then I've got the, does that make sense? Experts, I know I'm boring you. Beginners, is this making sense?
25:20
What's that? I'm sorry, second place. B, A, Y, E. Oh, for the star, yeah, yeah, yeah, okay.
25:41
Thank you. Okay, other thing to know about regex, oh please, hi, short. Dot says anything goes in here. So for example, if I had a pattern G dot OST, then ghosts would match there, okay?
26:02
But if there were a ghost, that wouldn't. You can't fit two characters in there. Again, that's the thing that's, if you're new to regex, because we're, I don't know about you, we're so used to that asterisk that we've been using for, since the dawn of time. Good question. Other questions, anybody? Great.
26:22
Oh, good question, yeah, yeah, yeah, yeah, yeah. DOS had a sync, that's right. The question mark in DOS, I don't even know if it works anymore, but the question mark in DOS did dot, because it was just one character. It was a wildcard for just one. That's exactly right. I forgot, and I haven't used that in ages, but yeah, but it's still there, cool.
26:40
What's that? Oh, I was talking about in DOS. Excuse me, when I said DOS, in the command line interface. DOS. I think we're on to tres or cuatro by now. Anyway, so yes, good questions. Thank you, thank you, thank you. What else, anything else?
27:02
Yes, please. So the question is, where do I have to place, where do I have to place the plus or the asterisk? So the plus and the asterisk are close cousins. They do almost exactly the same thing. They are quantifiers,
27:20
which means they come after the pattern. So for example, if I wanted to say, if I just wanted a string of Es, then my pattern would be E, because that's the pattern, and then I'd have a plus. Why no dot? Because I'm saying the pattern I'm looking for is E, and I want all of them I can get.
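As a quick illustration of a quantifier attached to a literal instead of a dot, here is a sketch with a made-up target string (`beeeep` is an assumption, not from the slides).

```powershell
# e+ : the atom is the literal e, and + asks for one or more of them in a row.
$null = 'beeeep' -match 'e+'
$runOfEs = $Matches[0]
"matched run: $runOfEs"

# With no e at all there is nothing for e+ to match.
# Note that a failed -match leaves the previous $Matches contents in place.
$noEs = 'bp' -match 'e+'
"match on 'bp': $noEs"
```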
27:41
So an E with a plus, that says, match how many Es? One to a million. Okay, thank you. Anybody else got any questions? All right, great, let's go. Let's see how the engine works. So the regex engine, this is how things actually happen, and I should mention there is some regex terminology
28:02
I should be sharing with you. One is when we have a pattern like that, the pattern is composed of pieces. I call them chunks. Most people call them chunks. The correct phrase is atoms. And an atom is a single unit that gets something done.
28:23
Now, there's some distinctions here. Important to get this. A literal like B, that is an atom. E, the E next to it, that's an atom too. But this guy, dot plus. Dot plus together is an atom, and why?
28:42
Because who's the main event there? Dot. What does plus do? It's a quantifier. It says, okay, you like this? We can have this many of them. Questions? Great, thank you. All right, so we talked about that. The engine scans, as we've already seen,
29:02
the engine scans left to right, trying to match things one at a time. And so, basically, I want to show you how the engine matches, and I have been trying for quite a while to build a graphic to do this. So I'm gonna come over here, because I want to narrate, and I need to be able to, there we go.
29:22
Otherwise I'll look dumb when I'm doing this. So, dumber. So here's the deal. I got a pattern. My pattern is going to be B, B-E. Doesn't get easier than this. I'm gonna ask the question, is B-E in the word abeam? And yes, it's a trivial question, so of course you saw that that was the answer.
29:40
So how do we find that, though? Conceptually, if you want to understand the RegEx engine and you do, think about three cursors, three pointers, whatever you want to call them. The first one says, where am I in the pattern? That's the red one.
30:03
Sorry, I should say. That's the red one, okay? And that keeps track of what we're currently matching. Then there's, this is the target, this is the target string. Think of this as the cursor that says, this is where we're gonna try to match. And you'll see another one in a minute.
30:21
So the first thing that happens, we start off like that. On the pattern, we're pointed at the first letter there, the leftmost, which is B. And on the target, we are at the A, and now the engine says, are they the same? And the answer is, well heck, they aren't.
30:41
When that happens, don't touch the red. Just bump the green. So we bump the green, since there was no match. So now we've moved, this thing has moved from there to there, it's gone from A to B. Everybody with me on this? Okay, because this could be a terrible graphic.
31:02
I made this up last night, and I'm hoping that this is the ultimate answer. Because previously, I just had to have two laser pointers, the red and the green, so that we could keep track of what was happening, and, oh no, I've really done this on several occasions. I did this in Slovenia about a year ago,
31:20
well, a little less than that, and had more people in a talk than has ever been at the Slovenian show in the last 25 years. So dual laser pointers are the trick to getting better attention as a speaker. So where are we now? It's that we're still pointing at the B in the pattern, because that doesn't move.
31:42
We bump this guy over. Is that a B? Is that a B? Yes, it is. Hot diggity, we've got a match. When we have a match, we bump them both. Now, have we finished our job yet? No. Why have we not finished our job? Well, we put this guy to work, but we haven't put this guy to work yet.
32:01
So what have we got now? So we're happy. Notice that we've got a partial match. That's what I got in that orange color there. And now we've moved everything over, and the question is, is E equal to E? That's a toughie. But yes, we're gonna say that is. And that means hot diggity. We're gonna keep, what do we do? We bump red and green.
32:23
But what does that do? Oh, uh oh, red's got nothing left. What does that mean? We matched, great. Does that make any sense? I know that was easy, but we had to do an easy one, okay? Let's get a little more complicated and then a little more complicated. We're gonna do two more. All right, so far?
32:41
No more patterns? Success, success. Questions on this? We good? Because if you get this, once you get the engine, you can leave. Rest is easy, just, you know, you read it off, cheat, cheat, cheat. So, RegEx concept, the left shall be first, okay? It always works that way. The interesting thing about it is,
33:00
there are times that if you've done some weird stuff with wildcards, which we, you know, we end up doing with metacharacters, sometimes you'll look at what you actually got and say, how on earth does anyone think that's the leftmost? But that's why we're doing the engine. Okay, let's do a second one. This was, this one wasn't interesting,
33:23
because the first time we found a partial match, it turned out to be great. But what if, what if it wasn't abeam, so we'd match the B? Maybe it's about, or something like that. So it'd be like, hey, we got a B, we're really happy, and now we're gonna match the E with the O, and you can't match E with an O,
33:40
so we have to start over. So, let's do a situation where we do a, we have a failed partial match, and then have to backtrack a bit. Backtrack is a bad phrase in human conversation, but in the regex world, you'll use the word backtrack a lot, okay?
34:00
So, here's our deal. I want to, and we're gonna add a third cursor here, so here's my question. Regex, is there BE in imbibe? I did this specifically because I was thinking about this summit, because I've been to all of these, and there's a lot of imbibing, so I thought that would be interesting. Can we be in imbibe?
34:20
And so, hopefully not now, but still, you know. So, same as before, all sessions in 4.1 have been canceled for the rest of the day, because Minasi is a clumsy idiot, so. So, red cursor here, green cursor here, sorry.
34:41
Red cursor here, green cursor here, and so our first thing we look at, does B equal I? No match, what do we do? We bump the green, leave the red where it was. Moved over here, now it's our next question. Does M match? It doesn't match, bump the green.
35:01
Is this making sense? Is it starting to feel normal? Is it starting to feel like, yeah, you know? And so, so now what happens? So now we're trying to match a B to a B, to a B. A B, finally, excellent. I'd be so happy just knowing that I've done that. So because we had, because we did that, when we got a match, we bumped the red and the green,
35:23
okay, and before we do, we have a partial match, right? We've already got a B to B, we might have to backtrack. If we gotta backtrack, we have to remember where to start. Where did we start off when we started looking for this? Well, we already rejected this,
35:41
we rejected this, we rejected this. If this goes wrong, we're gonna back up to the point after this. So we're gonna create an extra little cursor there that says, if we screw up, you go back to this guy and go to the one after that guy, all right? So remember, we last started from B, that didn't work, now we're gonna start from I next, if it doesn't work.
36:03
We try to be positive, you know? So, all right, so that's where we are now. We have a partial match, we have a partial match, we got a B, and we're hoping that the next thing will be good, okay? Now we try to say, uh-oh, let's see, so the B matched, after the B comes an E,
36:22
and after the B comes an I, oh, E does not equal I, bummer, so we've gotta go back for another pass. This is where we're gonna do the backtrack. The notion is we're gonna move back to the last position that we didn't start from, from the rightmost position that we didn't start from last time.
36:40
So what are we gonna do now? We start like that, notice that we have moved our starting point before, when we got our almost pattern, we were at B, and we remembered that because it had the blue cursor on it. We go to the next one, so now we're starting from I. Does E equal I? It does not. What are we gonna do now?
37:01
Don't touch the red, bump the green. Bump the green, by the way, if you have ever written a regex engine, or you know the deep internals of the regex engine, it is more complex than this, but this is close enough that it's a mental model that you can use in order to get a feel for what your system's doing or not doing, all right? So with that in mind, so now we got B and B,
37:23
that's good, we're happy. Once again, let's remember that, because if this fails, we're gonna have to go to the next character, and then we bump, and so now we have E and E. What does that mean? Because the next thing we would do is that I should move the red arrow over.
37:41
That would mean we've satisfied the pattern, and we're done. We did indeed demonstrate that we can B in imbibe, okay? Questions? Please. What if the I was actually a B? Okay, oh, great, great, great. So what would end up happening there would be that we'd rolled off the end.
38:03
What you'd find is that eventually this guy would, eventually we would run out of target. If we run out of target and we haven't matched, then that means it was a fail. That's a dollar sign false rather than dollar sign true. Does that make sense? You're looking like I'm completely wrong. Yeah, that's not what I asked. Oh, I'm sorry, go ahead. Do it again.
38:22
I can't receive it, but all right. If the second I in imbibe was a B, right? Oh, okay, the second I in imbibe, so it'd be I-M-B-E-D. Then what would end up happening, oh, we would keep doing this, I'm almost matched, but then I'm not, I'm almost matched, but then I'm not.
38:40
But then how do you back up to look for a new match? Just keep working with that. So the idea is, let's say that I match, I'm sorry, I didn't include it. So let's say I match on this B. I remember that that's where I matched at. Okay, great, where I got a partial match. When that fails, I've got to remember this B,
39:02
because if I don't remember this B, then I don't know where to get started. So we're matching the B as much as it is the position on the line that it's analyzing. Don't think of it as matching the B and it matches that position. No, I think that's a great question. You're basically asking, what if there's no match? I mean, how do we know that there's no match?
39:21
Okay, I'm saying, and I'm being stupid, I'll get it. No, no, no, go ahead. He's trying to ask about multiple partial matches. So yeah, it's a partial match. So you've got BI did not match. BB does not match BE.
39:41
But then you have to figure out, oh, I have to look at the second B and go back and match that with the first one. I have done an incomplete representation of corporate. I'm sorry, I'm just trying to get the, yeah. So I think what would happen here. And talk to me later.
40:04
The last step on the previous slide would just be done again, one character over. So you would just keep. No, I think he's found a niche condition where I screwed up in which case I'll go back and look at it. But definitely the basic idea. Basic idea is we try to match, we try to match, we try to match.
40:21
When we get a little bit of a match, great. If we get, if we exhaust it, terrific. We got a match. If not, oh, we got a partial one. You got to forget about that, back up a bit. Not back to where you were before, but to the next one over. And you slowly step your way through the target until you either fall off the end, which means that you had a false on the target
40:42
or find something in which case it's good, okay. And I apologize. I promise we'll talk about this later, okay. Because you might've caught something and I want to know. Since I want this award winning presentation to be. You know, great, great, great.
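The three-cursor model walked through above can be written down as a short sketch. This is not how the .NET engine is implemented (as noted in the talk, the real thing is more complex), and it only handles literal patterns with no metacharacters. `Find-LiteralMatch` and its parameter names are hypothetical, introduced here for illustration: `$red` is the position in the pattern, `$green` the position in the target, and `$start` plays the role of the remembered starting position to fall back from.

```powershell
function Find-LiteralMatch {
    param(
        [string]$Pattern,   # literal characters only, no metacharacters
        [string]$Target
    )

    # $start is the remembered starting position for the current attempt.
    for ($start = 0; $start -le $Target.Length - $Pattern.Length; $start++) {
        $red   = 0        # where we are in the pattern
        $green = $start   # where we are in the target

        # While the characters agree, bump both red and green.
        # String comparison with -eq ignores case, like -match does by default.
        while ($red -lt $Pattern.Length -and
               $Pattern.Substring($red, 1) -eq $Target.Substring($green, 1)) {
            $red++
            $green++
        }

        # Red has nothing left: the whole pattern was satisfied.
        if ($red -eq $Pattern.Length) {
            return [pscustomobject]@{
                Success = $true
                Index   = $start
                Value   = $Target.Substring($start, $Pattern.Length)
            }
        }
        # Otherwise any partial match is abandoned, and the next attempt
        # starts one character to the right of the remembered start.
    }

    # Ran out of target without satisfying the pattern.
    [pscustomobject]@{ Success = $false; Index = -1; Value = $null }
}

Find-LiteralMatch -Pattern 'be' -Target 'abeam'
Find-LiteralMatch -Pattern 'be' -Target 'imbibe'
Find-LiteralMatch -Pattern 'be' -Target 'about'
```

In the `imbibe` case the attempt that starts at the first B is a failed partial match, and the search only succeeds on a later attempt that starts at the second B.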
41:00
So we saw two of them. We saw a relatively straightforward one. Then we saw one that involved a little bit of backtracking, but here's the one that's really fun. Now let's add a wild card that's off its meds, okay. Our old friend dot plus or star, excuse me, dot plus or dot star, one of those. And so let's imagine that my target string,
41:22
because again, I've hung out with you guys at other sessions, is going to be beer house. That seemed like a, you know, that's where the imbibing would happen. And so the question would be, can I find, can I find B dot plus E? Okay, so remember, you're regex experts, you've figured it out already. B dot plus E.
41:42
Now, half of you said to me when I said, what are you here for? Why do you want to use regex? What do you want to learn? Half of you said, you know what the problem is? When I write one, I can't read them anymore. Yes, we believe, many people believe that regex is a write only language. One. One could argue. One is working, I'll need to read it again. One is working.
42:01
But the problem though, is you have to be really careful. Remember how I said, don't just take the true or false, look at what's actually matching? Because if you, it's really, there are times it's late Friday and you're like, yes, this is working. And I have an exit book that wasn't matched. And it works for three months and all of a sudden the phone call arrives at 3 a.m. You don't want to be in the time zone at that point. So, all right, so what's the parallel effect?
42:22
It looks like a B, what is that? That's a literal? What does dot plus mean? It's got to be one. What character? It doesn't care. Anything that matches, including spaces, who exposed me before, that's the first one, there we go. So that's great. So we're looking for something that looks like B,
42:42
at least one character I haven't specified, but maybe. B. This is important.
43:01
You know wildcards, you know star. Star leads you down the wrong road. This one not only can consume all characters, you have to get a truck in front of it to make it stop. The job of this thing, that guy, dot plus, its job is to go forth
43:23
and just from that point just consume everything in the string. Because that's my job, I can. Why are you doing it? Because I can. It's a dumb phrase, isn't it? Because I can. I can drink my own urine. It just does not seem like it. It's just, what's that? That's well referenced.
43:41
Absolutely not. Why do I drink my own urine? Because it's sterile and I like the taste. You know us. No, I, no, I, there's just, that is a fallacy. The reason people, people think that urine is sterile because a German chemist who was writing about it, remember this original article was in German,
44:02
which means no Americans ever read it. And well, you know, look, 75 years ago, all the technical stuff was there. But what he actually said was, the amount of viruses, mess, and whatever else you would find in urine is small enough that we can consider it to be zero. But, so just understand.
44:23
Don't drink your urine. It's important not to drink your urine. Some of the wine around here might taste a lot like it. I meant it in some restaurants. I was not impugning the lovely West Coast wine. Yes, yes, yes.
44:41
Somebody had a question, somebody. Yes, please. So more or less a question about the matches that come out. Yeah. Right. So, are you, would it potentially be smart then to use Regex to validate the matches that come out?
45:01
We know that there's some Regex out there. At least it's not urine. Okay, so, I'm sorry, I'm not using Regex to validate. So say you do a Regex, right, you have the example of she rather than, you know, like. Exactly, yeah, yeah, yeah. Or whatever the example is. You have three-letter match instead of a two-letter match.
45:22
Anyway, if you wanted to validate the output of those matches in PowerShell, really the only way to make sure that you were getting the correct, like, limit to be. Hang on, hang on. I guess I'm not understanding. What do you mean validate the matches? So, you said that potentially you could write a Regex that would return a match.
45:41
Oh, yeah, yeah, not what you expect. Yes, that happens all the time. So not what you expect. So you do the number count, right? Did you take that match and say, I mean, I know you can build a more complex Regex, but I'm just saying, the same market. Short answer, that's a great question I don't know, and I don't have enough excess CPU to answer the question. Okay. But no, but seriously, my email is markupnasty.com,
46:03
it will be until I'm dead, unless I get married. No, that wouldn't work. Anyway, so. So, you know, anything you don't get answered, send me an email and I'll do my best to answer your questions. Other questions, anybody? Absolutely, oh no, that's why we're doing this, so we can walk through this one.
46:21
This is fun. No, I'm serious, because the first time you see like, you say a bad word. You say holy before it, so it's not that bad, but still. I mean, hell, if distinguished fellows can use the F word on stage, then.
46:47
Are you dissing my friend Jeffrey Snover? No, I'm not a distinguished fellow. He's not a distinguished fellow anymore. He's CTO of Azure now, so he's no longer distinguished. He has some new title now, so.
47:01
But when I have dinner with him tomorrow night, I'll tell him he said that. So. So, so what are we doing here? We're saying, give me a B, that's easy. Give me an E, that's easy. What's in the middle? I'm not sure, okay? So let's just see what happens. So the plus quantifier again, what does it do? It says one or more, right?
47:21
One or more. Okay. So we're gonna try to match B dot plus E, and again, what is dot plus? As many dots as we can get, from one to whatever. Zero does not count. All right, so again, our red's in place the way you'd expect it, on the first item. The, this, oh, by the way, I should mention also, look at that little B versus big B.
47:41
Nobody asked me that question. Most, every other regex I know of, there are some that aren't that way, but any of the regexes that I know, start out case sensitive, and you've gotta tell them to be insensitive. PowerShell, however, just starts out insensitive. There's a microphone where it's Windows.
48:00
I can't speak to those things, but yeah. Yeah, so PowerShell out of the box, PowerShell regex out of the box, and by the way, what you guys just said was funny, but it's not true. Dot net regex, which is what PowerShell regex is built on, is case sensitive by default. You have to tell dot net regex to be insensitive.
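The case-sensitivity difference described above can be seen side by side. This sketch assumes a target with a capital B (`Beerhouse`) and the lowercase pattern `b.+e`; the variable names are illustrative. `-cmatch` is PowerShell's case-sensitive counterpart to `-match`, and `[regex]::IsMatch` goes straight to the .NET class.

```powershell
$target = 'Beerhouse'   # capital B in the target, lowercase b in the pattern

# PowerShell operators
$psDefault = $target -match  'b.+e'   # case-insensitive by default
$psExact   = $target -cmatch 'b.+e'   # explicitly case-sensitive

# .NET regex class
$netDefault = [regex]::IsMatch($target, 'b.+e')   # case-sensitive by default
$netIgnore  = [regex]::IsMatch(
    $target, 'b.+e',
    [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)

"-match: $psDefault  -cmatch: $psExact  .NET default: $netDefault  .NET IgnoreCase: $netIgnore"
```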
48:24
Yes, exactly. Sorry, just being a nice guy. Referring to anybody by the name of, by the word insensitive makes me feel bad. So this does match. This would not match on most other regexes without some adjustment. Well, great, so we've got a match. We're really happy about that. What do we do? We bump the green, left the red in place, right?
48:42
Oh, sorry, we did that for the, forgive me. We matched, so they both move. Great. So the B matches the B. All right. Now we've got a dot plus, because remember, how many atoms are in this? Three. Three, exactly. That's an atom, that's an atom, that's an atom. Okay, great. So that means that we've matched the B.
49:02
Now we gotta find out what happens with the dot plus. Now, so the question is, what do we have on this side? We've got some text starting here. We have a dot plus. How much of that is matched by dot plus? And I gave it away on the PowerPoint, but why did I do that on the PowerPoint?
49:21
What does dot plus have the power to do? Suck up everything. Now, your rational brain, having seen lots of other things, says, well, no, it must just step along and step along and step along. No, it doesn't do that. What it does is says, mine, all mine! And so, bam, that's it.
49:41
So at this point, have you ever turned the speakers on while you're using regex? Yeah, you should see. So that'd be awesome. Microsoft had like a sound-producing regex. That would be neat. Ooh, oh, damn. Ooh, good, hot dog! I think there's a business opportunity there.
50:04
Hey, it's time I said we were gonna take a break. So let's take 10, okay? Get out, get a stretch. Come on, do it. You know how. It's hard to sit there and learn things. And come ask me questions. I don't need a break.
50:23
The string variables match originally and why I think this works. Because I'm talking a lot, and I'll discover this later. Was it Snover said yesterday about what we're supposed to do? Okay, welcome back. Welcome back. How's everybody doing? Everybody happy so far? Yep. We're about at the halfway point. I said to you,
50:40
I said to you that it is often the case that I don't finish the slides, okay? That's why I'm doing the, you know how in school, they teach you the journalistic inverted pyramid? I'm trying to do the important stuff first. If you don't get to the syntax stuff, it's easy. It's right there. There's examples. There's stuff you can type and a lot of kind of stuff. But this engine you gotta get.
51:00
Okay, questions on what we're doing so far? Good. Okay, good. All right then. So where were we? So we found that B matched B, that wasn't a surprise. And we found out, again, the reason I'm doing this particular example is that people don't get that this is what dot plus does. It's just, it's a wild man. So what have we got now?
51:20
So now we're in a situation, let's see what's going on there, right? Is it, it just eats everything. So now we're in a situation where we have matched. We have matched the B. We have matched the dot plus. Good so far. But then we have an E. Do we have an E over here that's left over that can match?
51:42
Look closely. You wanna say yes, the E's right there. But dot plus has already grabbed it. Is there an E at the end? There's not. Because dot plus has sucked it all up, mine, mine, all mine, the match fails.
52:01
That make sense? Well I don't know about you, but that makes me sad. What's the answer? The answer is backtrack. With backtracking what you do is that first we matched the B, and then there was all this stuff. And there's my E over here. But dot plus ate that one too.
52:23
There's no E for me, we gotta stop. Now the engine backtracks. It says okay, dot plus, little bit of lithium but let's just back up one okay? So now, yeah, yeah, yeah, so I'm sorry about that, let me back this up. So now what we wanna do,
52:42
I haven't got this part in the animation yet, but the idea is that we're used to everything, we've now backed dot plus off. So what do we have? B matched B, dot plus matched all of this except for the E, why, because we backed it off. So B is happy, B's got B,
53:00
dot plus has everything almost. Is there an E at the end of this now? Is there a free E at the end of beer house? Yes there is, because we backed that one. So if there wasn't, it would go back to the last E in beer? If that hadn't worked,
53:20
it would keep backtracking. So like when I was doing something where I was grabbing HTML and I was trying to parse that, I was being lazy. And I said here, find this stuff, and that's what this data is, and just keep going. And occasionally I did HTML where it would just go off to la la land because this back, think about the exponential nature of this. What if I've got a bunch of kids?
53:41
So it's like, first we do backtrack, backtrack, backtrack, and what if the next time, as we step over, we have to do more backtrack, backtrack, backtrack. So backtracking can make life really miserable, but we have to have it in order to get wild cards working. So if we were at PowerShell Summit Germany, and we were going to a Bierhaus. Then we'd be fine, exactly. There'd be no E. There'd be, yeah.
54:01
And we'd have to backtrack all the way back to the E. But we'd have to wait and put the verbs at the end. Anyway, so we totally don't want that to happen. And that's why that's,
54:21
because this is silly. I mean, it'll get the job done, but it's really slow. What we'd like to do instead is we have the alternative. Now this behavior is called greedy. I told you there's something called greedy. That's it. The behavior is called greedy. We have to instead make it lazy. I wish they'd come up with better words. So your choices are you can have greedy or lazy.
54:42
Have I mentioned I'm retiring in an hour? It's an interesting thing about IT. It's like I did a talk about 10 years ago about IPv6, an introduction to it. And I would ask the crowd, no, I'd say to the crowd, so how many of you are looking at the IPv6? How many of you are hoping that you'll be able to retire before IPv6? And all the hands go up.
55:02
Anyway, so to do that, all we have to do is we just have to change our pattern a little bit. Where's it go? We gotta put a question mark in it. So here we see, here's the original,
55:20
this is the easy match. This is, so my target, dollar T, is beer house. Then I've got dollar M is gonna be my match, the result of whatever the match is. So I say dollar T dash match B E. What comes back? Comes back that it's true, and that we match B E. That make sense?
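The "easy match" just described looks roughly like this. The names `$t` and `$m` follow the talk; writing the target as the single word `beerhouse` is an assumption, since the transcript cannot show how the slide spelled it.

```powershell
$t = 'beerhouse'      # the target string
$m = $t -match 'be'   # $m holds the True/False result of the match

$matchedText = $Matches[0]
"result: $m   matched text: $matchedText"
```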
55:40
We saw that before, that was easy. Then we said, hey, wait a minute, let's do the greedy thing. If we do the greedy thing, just think about what happens here. What matched? Beer house. How much, how much of that string got used?
56:02
All of it. Here's what I said to you. I said, find me a match that is B, some stuff, and E. If I said to you, beer house, if I said B, some stuff, and E,
56:23
what would you think the answer would be? Well for me, I think B matches B, dot matches E, and E matches E. So it should be B E E. You'd think, right? I mean, don't you,
56:40
if I had talked about the engine and said, here, find me a B followed by, I don't care, followed by E, you'd be like, simple, Mark. What are you, blind? Yeah, getting there, but I mean, still, B E E. But no, what did we get? We got the whole beer house. Wait a minute, that's nuts. That's just nuts. That's a side effect of greed. That's why we have a beer house. It sounds like fun, but it's not gonna end well.
57:03
Never does. Although having one that's good, particularly during the disease, during the cold season, you know, beer is good for what ails you. So anyway, ar ar ar ar ar ar ar ar ar ar ar ar ar ar ar ar. But if you're in Seattle, it'll be a bitter ale.
57:23
Oh, stick around, it gets worse. So what happened there was when we did Greedy, we matched, but the pattern we matched was all of the beer house. Now let's talk about Lazy. How does Lazy work? Well, the way Lazy works is that with Lazy,
57:41
B matches B, dot plus is now in Lazy mode. It's like, I'm so not taking, you know what? I'm over material goods, I do not want a lot of stuff. But it's my job, so I'll take one. And then we'll see. Okay, so now B's matched B, dot has matched E. What's the only thing left?
58:00
E. It makes a very big difference whether we're lazy or greedy. Because a greedy pattern, what did it get? The whole string. Did I do that too fast, does that make sense? Someone yell if I get, okay. This is a slightly trivial case for the lazy dot plus.
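The greedy and lazy walkthrough above can be reproduced at a prompt. This is a minimal sketch, assuming the target on the slide is the single word 'beerhouse'; the variable names are illustrative.

```powershell
# Target string from the talk, assumed to be the single word 'beerhouse'.
$t = 'beerhouse'

# Greedy: .+ grabs as much as it can, then backs off until an 'e' fits.
$null = $t -match 'b.+e'
$greedy = $Matches[0]

# Lazy: .+? takes one character, then checks whether an 'e' follows.
$null = $t -match 'b.+?e'
$lazy = $Matches[0]

"greedy: $greedy"   # greedy: beerhouse
"lazy:   $lazy"     # lazy:   bee
```

The only difference between the two patterns is the question mark after the plus.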
58:25
Absolutely. But otherwise, in a non-trivial case, it would go one by one. Yes, absolutely. So if there were to match, right, what would happen? In that case, dot plus would do what? It would say, on the first pass, okay, we got that.
58:41
That's not it yet, but I'll take that. It just keeps, but it takes them one at a time and then sees if there's an E after it. That's the way it works. Oh, you're not an E? Okay. I'll take you, but. I mean, and as you said, this is about as trivial of an example as I can give you. Imagine how in the real world, if you're trying to do, you're actually searching for bunches of stuff, this is how these things can get exponential
59:02
in terms of time. Question? If you had multiple paragraphs, and the first paragraph began with a B, and the last paragraph ended in an E, would it match all of that? So when you say paragraphs, are you trying to pick up on this new line thing? Yeah, you get a new line, right? Will that pass by the new line?
59:21
Good point, that's a great question. So I'm handing you a document, some text file, and the B starts in one paragraph. There's some extra wrinkles within lines that I can get to. I can type if I don't, it's there. Basically, the standard behavior is the dot matches everything except the new line.
59:47
So what happens instead here? Now we decided to make it lazy. So again, let's go back to B dot plus E, that was greedy. How do we convert greedy to lazy? We look inside and ask ourselves, do we want to be?
01:00:01
Do we want to be greedy? And we don't. But the thing is, we're not big enough to get like Zen or something like that. So we go lazy. And so that's all I did. I added that question mark there. And now look, it's still true. What's the text that matched? Again, there's other tools we'll talk about in a minute
01:00:21
but that, you know, dash match will still let you do a lot of stuff. You know, if you're careful about remembering to pull up the matches. Does that make sense? Questions anybody? All right, cool. Yes, please. So in terms of the pattern matching, it just simply says, just to say, okay, character at a time? Yep, yes it does.
01:00:40
If you're lazy, then yes, you only, but you know, that could be inefficient too. I mean, stop thinking about it. What if, you know, Alan, like 95% of the cases, you really use beer house that you want. You know what I mean? Because it should be clear by now that you're gonna be happiest at regex if you have some knowledge of the kind of data that you're looking at. And the better knowledge you have.
01:01:02
Okay, all right, thanks, great. So that was great. Engines, that was the essential thing. We got through that. Phew, okay, we still got a half an hour. We're gonna get this done. Regex tools. So there's lots of PowerShell regex tools out there.
01:01:20
There's dash match, dash replace and dash split. They take regex as well. Here's a really, here's a kind of interesting, silly one, but if you say, if you pass in a string, this is a sentence dash split, and you pass it E dot, for example, then E dot space, I don't, yeah, E dot,
01:01:41
I mean, E dot, yeah, yeah. Then it ends up yanking, whenever it sees an E dot, it yanks that out and adds a new line to it. So that's why this, if you've never seen split, that's what it does. But the idea is that was the sentence, and then afterwards, it becomes that. See, we've got multiple lines out of it.
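A rough sketch of the split being described. The sentence on the slide is not in the transcript, so the input here is an assumption.

```powershell
# Hypothetical input; the sentence on the slide is not in the transcript.
$parts = 'This is a sentence' -split 'e.'

$parts          # three strings, one per line: 'This is a s', 't', 'ce'
$parts.Count    # 3
```

Each "e plus one more character" is consumed as a delimiter, which is why letters go missing. The final e survives because nothing follows it for the dot to match.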
01:02:01
There's some E's and spaces missing. Make sense? Yeah. I've never found a real world use for it, but it just seems like it'd be wonderfully destructive if I could just figure out what I could do with it. I wanna briefly talk about case matching in PowerShell. In the PowerShell world, we often don't care about case, but if you do, then there is,
01:02:22
match has a cousin, cmatch. Cmatch, and you may know about that, because there's also, what, there's a ceq and a clike, or whatever it is. So you should not be surprised that we have it. There's also a notmatch, by the way. So all those things are out there.
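A small sketch of the case-sensitive cousins just mentioned. The strings are made up for illustration.

```powershell
'BeerHouse' -match    'beer'        # True: -match ignores case by default
'BeerHouse' -cmatch   'beer'        # False: -cmatch is case sensitive
'BeerHouse' -cmatch   'Beer'        # True
'BeerHouse' -notmatch 'wine'        # True: nothing matched, so -notmatch succeeds
'BeerHouse' -ceq      'beerhouse'   # False: -ceq is the case-sensitive -eq
```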
01:02:41
All right, questions? Ultimately, if we're dealing with PowerShell, excuse me, if we're dealing with PowerShell and regex, it's always a good idea to remember that where do we get this from? PowerShell is really merely surfacing the underlying .NET regex classes.
01:03:01
And so, but the PowerShell guys are, they have to make decisions about how complicated they're gonna make this for us, and so they decided to shield us from some of the specificity. And so, we got more kinds of stuff that we can do here. So, one way to get to the .NET, to the .NET regex,
01:03:26
is that there is, it's a cast. You can do this. I could say something like, dollar sign R equals regex cast, and then that is the pattern. So what we're doing here is we're warming up
01:03:42
this thing to be able to do kinds of pattern matches for us. So now I got dollar sign R. It's a regex type of object. And then I can say this. I can say something like, rather than dash match, it looks like dollar sign R dot matches.
01:04:00
So it's a method. And then inside, there's the target. And so what's that gonna do? Well, let's see. So we have, we got this. Our pattern is dot dot bb dot. What does dot dot bb dot mean? Is that going to ever be greedy?
01:04:22
Is dot dot, is dot dot bb dot gonna be, and does dot dot bb dot, is that ever gonna become greedy? No, why? What's missing? Plus or asterisk, exactly, exactly. So it's never gonna be greedy. It's never gonna be lazy. It's a nice regex. So, if I then match that against yabba-dabba-doo,
01:04:46
it's going to find that, oh, look. Well, this is the cool thing. The first thing that dot net, that grabbing the dot net cast means that we now get multiple responses. Well, that's pretty exciting. No?
01:05:01
I don't get out much. That's pretty exciting. Because, seriously, you know how it is. When you're working with something initially, the more data you can get about it, the happier you are. And so, if for no other reason, I would tell you to consider using the regex cast the whole time you're starting to play with this. Because it makes it a little easier.
01:05:21
Can we get the rest of the PowerShell to give us all? Absolutely. Stay tuned for SLS. And look what it says. It says true. It said on there, captures. We'll talk about captures in a little bit. Index, notice where we found it. This one was at the zeroth location, at the sixth location. And it actually shows you the text, I like that.
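The cast and the Matches method described here might look like the sketch below. The pattern and target are the ones spoken in the talk; the variable names are assumptions.

```powershell
$r = [regex]'..bb.'
$target = 'yabba-dabba-doo'

# -match stops at the first hit.
$null = $target -match '..bb.'
$Matches[0]                          # yabba

# The .NET Matches() method returns every hit, with its position.
$all = $r.Matches($target)
$all | Select-Object Index, Value    # yabba at index 0, dabba at index 6
$all.Count                           # 2
```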
01:05:40
I like that. From a pedagogical point of view, if I'm teaching people how to do this, or if I'm just hoping to get this done in the next three hours, so that I can then solve some problem or something, I like that. Now, if you want the most, any questions on that? Please. I noticed there's more than one. No, no, no. Sorry, that was the point I was making.
01:06:02
No, the point I was making was, if we did this with the pure PowerShell dash match, it would give us just this one. Right? This gives us all of them. And because you know that that information is sitting in matches, it's then easy to pull up stuff right now. Okay. But if you want total control,
01:06:21
if we need total control, then you use new dash object. And basically, what are we doing? If you've never used new dash object, then the notion is that the .net's got this bunch of assembly molds where you can build different stuff for you. That's a, you know, that's a class. And then we just pull down the regex object, and we're going to do it by talking directly to it,
01:06:42
and that gives us a few extra things that we could be doing that we weren't doing before. Now, by the way, I'm going to get this scrolled off because this, it would be at the building over there. New dash object can get a bit big, and the syntax is in the PowerPoints, I promise. In the next slides, we'll be seeing it. Look what's happening.
01:07:00
It says, dollar sign regex equals, and you can call it anything. I'm just calling it dollar sign regex because I'm 60 and I can't remember what I did with something a minute later, so. New dash object, regex, and then there's our pattern again. And then we do all this other stuff, okay? Now, why would I want to do that? Gosh, that looks like what you just had, Mark. Yes, it is similar, but I haven't told you the good part.
01:07:23
And that's a sample run of what that looks like. Looks like what we just saw. Okay, so why is this good, Mark? Well, let's do a sidebar before we talk about that. Sidebars, you may know that PowerShell for the last two versions has had a kind of object called a time span. Come on in, guys, we've got seats. Come on in. I'm not even talking about you, no problems.
01:07:42
And time span is a chunk of date time, essentially, and you can decide how long you can create one of these things to be. Anybody ever had to use a time span? Yeah, oh, learn them, learn them. They're weird. When you let them be like, why do they do it that way? The Jeffrey must have known. There must have been a reason. But anyway. So.
01:08:01
The distinguished race. All hail Jeffrey, he is object oriented. So. Anyway, so with time spans, you can create them by subtracting one date from another. But the other way is you can actually say new dash time span, you can tell it how many seconds, how many minutes, stuff like that. But I don't want that.
01:08:21
I want smaller intervals. So you could do something like this. You could say something like, normally it looks like dollar sign span equals new dash time span dash seconds one. That's the smallest one you can make officially. But, here's the interesting thing. If you have a time span, you can actually add numbers to it.
01:08:42
And that will make the time span a little bit bigger. And so if you do this. Dollar sign TS span equals, we're gonna do a cast to time span, and the integer one. That will say smallest thing we could possibly have.
01:09:01
Now, how many of you played with this kind of stuff before? Oh, because you're gonna be surprised. How big is that? It is .001 milliseconds. Full stop, hang on. I know about this, because I do active directory stuff, and so I'm looking at dates and times all the time. So I know more than I wish I did.
01:09:23
And I know, and you probably, you heard this sometime, but you forgot it. The way most of Windows thinks in terms of time are time ticks that are how long? 100 nanoseconds. You would think this would be 100 nanoseconds, but it's not, it's way bigger than that.
01:09:41
So nevertheless, if you do this, if you ever need a short time span, I'm gonna show you an example where you want to, then you say time span cast to 10,000, and that will give you a one millisecond time span. Okay? I know you're saying, why are we using this, Mark? Just trust me, we're getting there. Handling timeouts. If you don't go infinite on one of your regexes,
01:10:06
you're not working hard enough, okay? So that time's gonna come, and you want to hear something bizarre? If you build a regex in PowerShell, you can't say time it out in three minutes.
01:10:20
I'm thinking if the regex doesn't get done in three minutes, I'm leaving, right? So it'd be nice if people have a timeout. Here's the cool part, this is why I'm telling you this. The .NET class includes a timeout. Now I've got two things about it. First of all, didn't appear until .NET 4.5. And the PowerShell guys, I think, didn't know about it
01:10:42
because I mentioned it; I was talking about regexes with a particular individual in the PowerShell hierarchy. And he was saying that regex was a dumbass idea, and why didn't text go away? And I said yes, that's probably true, but it hasn't. And did you know that .NET now lets you do a timeout
01:11:01
because he was starting to say, and you know it goes infinite sometimes, and I said, do you know that changed in .NET 4.5? And he said, oh. So I'm hoping that very soon we'll see that incorporated in the PowerShell tools. Until then, here's what you do. Yes, that looks messy, just copy that from the PowerPoint. Here's what it looks like.
01:11:21
There's our new dash object, that's our pattern. This is the blah blah. Regex experts in the room, that's how you set multi-line? Ignore case? Isn't it doing that already? No, and why?
01:11:42
Because we're building it straight off .NET, which means it is case sensitive, and I don't know about you, but I just don't need that that often. I have enough reasons that I can blow things up like feeding Unicode to a text file, but anyway. So, yeah, you don't do it, but I would say you don't do it twice, but apparently I'm not that smart. So, and then there's the max time.
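The slide itself is not in the transcript, so the following is only a sketch of the idea: build the regex straight from the .NET class, passing a pattern, the options, and a time span for the timeout. The pattern, the option set, and the 500 millisecond value are assumptions chosen for illustration.

```powershell
# Hypothetical pattern and values, chosen only for illustration.
$pattern = '^b.+?e'
$options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Multiline'
$maxTime = [timespan]::FromMilliseconds(500)   # [timespan]10000 would be 1 ms (10,000 ticks)

# Regex(String, RegexOptions, TimeSpan) is the .NET 4.5+ constructor that takes a timeout.
$regex = New-Object System.Text.RegularExpressions.Regex -ArgumentList $pattern, $options, $maxTime

try {
    $m = $regex.Match("Beerhouse`nbarleywine")
    $m.Value                                   # Bee
}
catch {
    # A match that runs longer than $maxTime surfaces as a RegexMatchTimeoutException.
    if ($_.Exception.GetBaseException() -is [System.Text.RegularExpressions.RegexMatchTimeoutException]) {
        'The regex timed out.'
    }
    else { throw }
}
```

Without the IgnoreCase option the capital B in the sample would not match, which is the case-sensitivity point made above.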
01:12:03
Now you can do something off of there, and in this case, just for fun, I'm sure you have free time. Build one of these, give it a timeout of three milliseconds. In my experience on a Windows 10 box, with a fairly good processor, everything takes a minimum of about 4.5 milliseconds.
01:12:22
Having said that, I just tested all that stuff in my room a few hours ago, and everything's way faster. I don't know what they did. No, no, I'm dead serious. I have three Regexes I can normally put up, and we're doing this. Down there. So, you'll see this about Regex engines. People are constantly tweaking. I don't know what Microsoft did,
01:12:41
but they did something in the last six months that's changed the way that this stuff works, okay? So, that's how to get, we talked about getting insensitive. I wanna make sure, is there any questions on this? So, there's several ways that we say I'm gonna be case insensitive. Yes, please. So, on your previous slide. Yeah, yeah, yeah.
01:13:01
So, dollar sign, max time. Oh, sorry, is a time span. Okay. That's a time span variable. Okay. You're right, I should have used the same one, I should have, duh. Thank you, I'll fix that. This is why I teach, because I learn more from you guys than you learn from me, so.
01:13:21
You get an error. It is an error. You actually get red text, yeah. It was funny, when Jeffrey was talking about how they were gonna play with the error text, I kept waiting for someone to say, I have an idea, why don't we make it white on black so people can read it, you know? It's like the first, I love PowerShell, but the first time I used it, I'm like, did you guys hire the wired guys
01:13:40
to do the color or something like that? It's just very pretty, can't read it. Anyway. And yes, I know you can change it, I just get tired of changing the box. I mean, because at this point, my, let me show you something. At this point, if I try to open my PowerShell prompt, at this point,
01:14:01
it takes that long for the thing to get, oh, okay, so it was, it was pretty, oh, that's a date time example, that's why I just wanted it, I just, it's a one-liner in the profile, I'm happy to share with the community. I'm not saying anything about it, I'm just kind of interested when it happens, that's all.
01:14:41
Yes, yes, that's right, that's exactly right, yeah. But you do get the true or false information, you just have to extract it from another member. Other questions, anyway. All right, great. Let's talk about the star of the show. This is the grep for Windows, it's select-string. Basically what they did is all the little quirks I've been talking to you about,
01:15:01
most of them, they make it a little easier. It doesn't change the greedy or lazy, if you've got a crappy regex pattern in dash match, it's gonna be a crappy regex pattern in select-string as well. And as I said, I'm hoping that in the next six months or a year or something like that, that we will see an option for the timeout stuff, that would be just really, really nice.
01:15:20
So, the notion here is this is nice, because it's good for a number of things. First of all, you can point it at a folder full of files and you can tell it to go look for those files and pull them out and do all the regex matches on those guys. Additionally, it loves the pipeline, it's not pipeline-deaf like a sad number of things are. You can just pack it with a bunch of stuff and you can get everything that you want.
01:15:45
By default, it is built to take text files and go boom, boom, boom, boom, boom, boom, boom, boom. It thinks in terms of lines. It is not one of those tools, the so-called multi-line tools, that see an entire page as being one big line.
01:16:02
It sucks in line-by-line and it reports line-by-line. So, you give it a file with 500 lines and 13 of them matched, it will show you just the 13 that matched. And if you know how to ask it nicely, it'll tell you what position, what line number, what lines are around it,
01:16:21
to kind of help you, like you gotta go find this stuff. It's something that clearly, you know how sometimes there are things in Microsoft tools where you say, these guys never use this stuff, you know? This is something where those guys use this stuff. It has to be because there's a lot of good stuff in here. Can select-string tail like grep here?
01:16:43
I wish I were a Unix guy, so I don't get that. I think there is a tail option there in a newer version somewhere. So, Get-Content -Wait. Yeah, Get-Content, like, -Wait. Okay, thank you. So, you use -Wait for tail? Was that what I was hearing? How does that go?
01:17:00
I don't know. It's in there somewhere, please. The other side of the head, I think. Anyway, sorry, I don't know. Here, forgive me. Okay, so, stuff we got. Pattern is the essential one. And by the way, it's positional parameter one, positional parameter two. They so know you're doing folders.
01:17:22
There's only two positional parameters. And the second one is, what folder and what wildcard should we be using to pull the files off? So, that's the first thing, okay? Case sensitive, we'll make it case sensitive. All matches, again, same story. What if I have 13 lines, I pull one out, and one of those lines has five copies?
01:17:43
Well, our default behavior in regex is what? We don't really care what's in there. It's just a yes or no question. But if you want those things, you can add dash all matches, and then it'll return all of those. I found this tremendously useful. I was trying to go through my Outlook and find all the non-delivery reports
01:18:02
for my mailing list that I used to do. And you pull this stuff off, and it's hierarchical data, but SLS is smart enough that it not only grabbed the emails off of the Outlook object, it also then could grab out the attachments and scan them. It's a great tool. I mean, I just can't tell you how awful it,
01:18:21
if you're doing regex-y stuff, at some point, you're gonna find this thing. So, let's see, what else? This is another great one. So, there's path. Path says, where is it? Literal path is in those situations where you want to give it a literal that perhaps has a dot, a star, or something in it. And this way, it knows that there's no wildcards. In contrast, path can have wildcards.
01:18:41
That's the only difference. Dash context is very interesting. It's dash context and then an integer. And that says, you know this is a file with 10,000 lines. When you find one, do me a favor, show me the one above and below it, or show me the five above and the five below it. That's gorgeous. It's just absolutely gorgeous. If you're ever trying to go through big piles of files, which I've done because I'm trying to collect
01:19:02
all my old columns and see which ones might have some value, not many at this point because they're old, but you know, so. It's all good stuff. Any questions? Let's see some examples. So, let's stuff the pipeline. We'll have a string. I just love, love, love SLS, comma, you'll love it too.
01:19:22
Let's punch that into SLS. And remember, what's the first positional parameter? The pattern. So what I gotta do here, I'm sorry, I'm sorry, forgive me, SLS is the alias for select string. Just take me a second. I know SLS, what does it stand for?
01:19:40
So, what's that gonna show us? It's going to, let me, time, get short. Basically this is showing us the stuff that we want because you see the dot matches? And the dot matches there is gonna show us all of those matches. So, it's, I hope you get a feel.
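A sketch of the pipeline example, assuming the string on the slide reads roughly as it was spoken and that the pattern was the word love.

```powershell
$text = "I just love, love, love SLS, you'll love it too"

# The first positional parameter is the pattern; sls is the alias for Select-String.
$first = $text | Select-String 'love'
$every = $text | Select-String 'love' -AllMatches

$first.Matches.Count                          # 1
$every.Matches.Count                          # 4
$every.Matches | Select-Object Index, Value   # where each hit starts
```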
01:20:00
Here's an example. When I did that, you get a lot of data, and so if you pump it out to OGV, here's an example. I had a couple of text files, and they had some lines, and I'm looking for the number of times, and I know you can't see it, so, but.
01:20:20
So that's your, so this is your command up here. There's our command up there, select string path. It goes to this particular folder, pattern saw, all matches, punch it into out grid view. And so what does it show us? It shows us up top, there's the yes or no
01:20:43
for ignore case, and then there's line number. I found this, and what text field did I do? Then there's the line itself. It shows you the whole line. Then, I know I sound like I'm really excited about a command line, but it's just, this is nice. It does pretty much almost exactly what I want,
01:21:01
and you don't really find that often without writing a script, you know? So, that's the path we got it from. Again, what was the pattern? S-A-W, and then, look what we got here. So like, here, this line, I've got a saw here. I saw a plane, I wish I saw a flower. I used a saw to cut down a tree. I saw a saw well cat.
01:21:23
They don't exist, but it just, you know, it was a good adjective, so I thought I'd use it. So there's two, and notice how it reports it. See, saw and saw, over there. So, make sense, questions? Anybody? Okay.
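The file-based run can be sketched the same way. The file name and its lines below are hypothetical stand-ins for the ones on the slide, and the output goes to the console rather than to Out-GridView so it also works without a GUI.

```powershell
# Hypothetical sample file; the real file names and lines are not in the transcript.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'saw-demo.txt'
Set-Content -Path $file -Value @(
    'I saw a plane'
    'Nothing to report here'
    'I used a saw to cut down a tree'
    'I saw a saw'
)

# Only matching lines come back; -AllMatches keeps every hit on each line.
$hits = Select-String -Path $file -Pattern 'saw' -AllMatches
$hits | Select-Object LineNumber, Line, @{ n = 'Count'; e = { $_.Matches.Count } }

# -Context 1 also captures one line before and one line after each matching line.
$ctx = Select-String -Path $file -Pattern 'tree' -Context 1
$ctx.Context.PreContext      # Nothing to report here
$ctx.Context.PostContext     # I saw a saw

Remove-Item $file
```

Piping `$hits` to Out-GridView instead of Select-Object gives the grid shown in the talk.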
01:21:42
As I've said before, as I've suggested before, one of the things the dot cannot do is match a new line. Dot can't match a new line. You can change that, though. You can change that behavior. It's either called a single line mode or a multiple line mode, and they're easy. There's just a few switches to do, and it's right there on the PowerPoints.
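The slides are not part of the transcript, so this is only a sketch of one way to flip that behavior: the inline (?s) option, which is the .NET single-line mode. The two-line string is made up.

```powershell
$twoLines = "beer`nhouse"

$twoLines -match 'b.+s'        # False: by default the dot stops at the new line
$twoLines -match '(?s)b.+s'    # True: in single-line mode the dot matches the new line too
```

Multiline mode, (?m), is a different switch: it changes where ^ and $ match, not what the dot matches.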
01:22:00
I won't go into detail because I want to do some other stuff. Here is one of the best pieces of advice I can give you. Don't paint yourself in the corner. Don't think you can do this by yourself. You can, but it'll take a lot of time. Let's be very clear. The thing I love, you know, the coolest thing about PowerShell, I can't tell you how many times I've had friends
01:22:21
who are brilliant dot net coders, and I don't know if you know this, are there any developers in the room? Okay, stop listening for a minute. So when I go to developer conferences, they're like, Mark, you're a great speaker. Well, thank you, that's really terrific. Do I get a check? And then, you know, some of these conferences don't pay so well. And so, oh yes.
01:22:49
Yeah, I don't know. So, oh yes. So they'll grab me. We'll have dinner. It's like, why do you work with IT pros? So what do you mean?
01:23:01
They're so dumb. I said, they're not dumb. What's your IP address? We don't get no respect. So, but the great part is, oh, dot net's wonderful. Yeah, yeah, yeah, yeah, yeah. What can you do with that PowerShell? And then you show them a new dash out there. Because like, if you're a dot net coder, you can read the MSDN documentation because that's so helpful. And more alternatively,
01:23:21
what you'll probably do is you end up spending three hours writing some code to test the stupid thing out. Because that's the problem. In PowerShell and in dot net, because dot net is PowerShell, right? But I mean, is that, you have to look at an object and you think, oh, an AD user, does it have all the information that I want? We got to write a lot of code to test that if you're just doing it for Visual Studio. But what can we do? We do new dash object and we can poke around with get member.
01:23:41
You show it to someone. All of a sudden, the faces are like, I want that. I so want that. I mean, the great thing about PowerShell is the immediate gratification. You need that for regex too. And so, if this sounds scary, save yourself some time. There's some great online testers or you can buy them. There's a $25 one called RegexBuddy that I'm told is very nice.
01:24:01
Somebody showed me Expresso. Expresso, is that its name? You said it's free? It's free. And that's, what, a Windows app or something like that? Cool. I like some of these locations. RegExr.com is the, this is wonderful. This is a free thing these guys put up. They've got a cheat sheet about all of those regex operators we're not gonna have time to get into. And it's really terrific stuff.
01:24:24
The other one is regex 101 is very nice. Now, you should be aware that regex engines can be a little bit different. They can be just a little bit different from each other. And so, no, I mean that. I mean, 95% of the regexes will run just fine on any engine. But you will find some funny little dark corners where maybe, I don't know,
01:24:42
the Java engine behaves differently than the .NET engine. So it'd be nice to have one that's based on the .NET engine. And there is one. It's called regex hero. It's online and it runs off the .NET engine, which is great. The only problem is it requires Silverlight.
01:25:00
There's a large company out of Redmond that's trying to kill that product for reasons that make no sense to me at all. But those are cool things to have. Definitely. Oh nice, regex workbench. Do I have that visual studio to join it? Cool, thank you. Regex workbench, thanks, thanks.
01:25:22
And that's a free download or, okay. Or you have to go to Software Assurance and pay them an extra $100 a week or something. How many people are in your building? Only one is using this tool. It doesn't matter. How many heartbeats are there in the building? Regex, tell me about regex golf.
01:25:54
And thank you, you brought up a very important point I was gonna forget. There is no right answer when it comes to regex. People who know regex are smart people
01:26:02
and they love to outdo each other. I mean, never walk into a room with regex people and imagine that you can tell them this is the proper pattern to detect email addresses. None of you will get out of the room alive. Because, I'm serious, regex is fun and everybody who loves it has a website. And so, no it's true, and they have forums.
01:26:20
And so if you're really stuck you could do this. Just post on the forum, I don't know how to do this. Inside of five minutes you will A, have 30 different answers that will all work. And there'll be three dead members of the forum because they will have been, you know. So, seriously, efficiency yes. I mean, you know, don't make your regexes take three minutes. But, really don't sweat, you know,
01:26:41
don't sweat getting it too, you know, too heavy, deep, and real, okay? So, the other thing, and there's an example of, there are a million sites out there. And by the way, they're usually really nice folks. You know, unless you suggest that you have a regex that's more efficient than your anatomy. So the only thing left is regular expression syntax. I've got just eight minutes, but let me do a little bit here.
01:27:01
So, first is a character class, character class. Sometimes I want to be able to say this or that or that or that. We do that with square brackets. And you can have a range, zero to nine. What does that mean? That means that we'll match anything from zero to nine. B to V will match anything
01:27:20
from the letter B to the letter V. Or we can have a pile of things. That was called a range; this is called a set. A set is where you just list characters. I created this set, what's that do? Matches vowels, okay? You can put them together. You could have B dash C, F, B, N.
01:27:41
What does that mean? It takes B, it takes C, it takes F and B and N. Note the square brackets around it, they're called custom classes. Why do they call them custom classes? Because there are some built-in, baked-into-the-box classes. Does that make sense so far? When we get past them, the other ones will all start to fall apart. These are the ones to get started with. For example, I had this thing,
01:28:01
is TH five digit match zero to nine? Is it gonna match? It is because zero to nine fits against that five. Is there a digit match? And here, what is it? We're doing A through F, five, six, seven, H dash J. So what we're saying is we'll take numbers, but the numbers that we'll take
01:28:21
will be five or six or seven. And so is there a digit there? Well, five matches the five. Does that make sense? I'm sorry to speed up, but I wanna just run through. You can see that they're easy once you just see them. And all these pages that you're gonna have have got examples. So check them out. Also, there are probably people who are doing a better job explaining this than me, so online stuff can help as well.
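A few custom classes in action. The targets are hypothetical, since the exact strings on the slide are unclear in the transcript.

```powershell
# A range: any digit from zero to nine.
$null = 'xy5' -match '[0-9]'
$Matches[0]                      # 5

# Ranges and single characters combined in one class.
'xy5' -match '[a-f567h-j]'       # True: 5 is in the class
'xy8' -match '[a-f567h-j]'       # False: x, y and 8 are all outside it

# A set: just a list of characters, here the vowels.
$null = 'regular' -match '[aeiou]'
$Matches[0]                      # e
```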
01:28:42
Okay, regex also includes some predefined classes. They are things that are prefixed with a backslash and then a number. Excuse me, a letter. So for example, backslash W, little W, means word characters. They have defined certain things as being characters that are in words.
01:29:00
So notice, what do you have in the word characters? A to Z, a to z, zero to nine, underscore, and that's it. I would maybe put some other stuff, but they don't, okay? Notice the lowercase. So for example, here's a quick jump ahead. So what's backslash little W do?
01:29:21
Backslash little W means one character from what group? Backslash little W. What's the class? What's in that class? Word characters. Word characters, so what? A to Z, zero to nine, underscore. Okay? Good. How could I find, by a pile of text,
01:29:42
you know enough to do this? So any character we imagine in a word, it's not exactly right, but it's close enough. Every character we have in a word can be described as backslash little W. How can I find a word? Well, we've got a class called backslash little W. That's just one character, though.
01:30:01
How would I say one or more word characters? Backslash W with a quantifier, which was? Does that make sense? If you haven't heard that, think about that. That's where regex starts to take off. That's where you've got your foot in the accelerator.
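The answer being fished for is backslash w followed by a plus. A minimal sketch with a made-up string:

```powershell
$text = 'Hello, world_1!'

# One or more word characters: the first "word" in the text.
$null = $text -match '\w+'
$Matches[0]                      # Hello

# Every run of word characters, via the .NET Matches method.
$words = [regex]::Matches($text, '\w+') | ForEach-Object { $_.Value }
$words                           # two items: Hello and world_1
```

Note that the underscore and the digit count as word characters, while the comma, the space and the exclamation mark do not.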
01:30:20
In other words, the only things. So, technically speaking, then, dot is a predefined class. Full stop. Is a predefined class in itself, though, if it's? I suppose, yeah, yeah, that's a good point. You're right. The dot essentially is a predefined class that matches everything except the new line. Good call, I like that. So.
01:30:41
Does slash capital D match new line? No, I'm not sure. I'm not sure. I know that backslash little S, backslash big S do include new line. From what you're saying, maybe it would be logical,
01:31:01
but I'm sorry, I've never tried it. It's a really good question. Anyway, so little W means word characters. Backslash big W means anything but that. So it tends to be something, and it's inverse. The lowercase is the something, the capital is the inverse.
01:31:23
Backslash D is digits, zero through nine. What would backslash big D be? Everything but zero through nine. And so you can create your own character classes. S is interesting.
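The lowercase and uppercase pairs are easy to try out, including the question from the audience. As far as the .NET documentation describes it, backslash capital D is any character that is not a decimal digit, and that includes a new line. A quick sketch with made-up strings:

```powershell
'a1 b' -match '\d'      # True: there is a digit
'abc'  -match '\d'      # False: no digit anywhere
'123'  -match '\D'      # False: nothing but digits
"`n"   -match '\D'      # True: a new line is a non-digit
'ab_1' -match '\W'      # False: every character is a word character
'ab 1' -match '\W'      # True: the space is not a word character
```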
01:31:40
Backslash S is white space, which generically means different things to different people, but here it means these particular things. Mostly we're looking at what? Line feeds and carriage returns, which boils down to did you make the text file with Notepad or all the other tools, right? So, oh, another thing that's very sexy about this also is that there are some tools where if you feed Windows tools an in quotes Unix file,
01:32:03
in other words a text file that doesn't have CRLF at the end of each line, it'll suck it up anyway. All the PowerShell stuff I've come across in this area has no problem with that. I mean, even gc, for that matter. I was surprised, I never noticed that before. Even Get-Content will handle that text file, which is really, really nice. And you know why? Remember I told you to download the file
01:32:21
with the million words or something like that? They're all off Unix sites, so they only have line feeds at the end of them, which means Notepad's not gonna like them. But everything else will like it, okay? Now, at this point it is a law that if you're doing a regex presentation, you're not allowed out of the room until you've done the social security example, okay? So, it's like a rule, I don't wanna go to jail.
01:32:41
So, we know that digit is what, backslash d? So our matching pattern is going to be backslash little d, little d, little d. Why? Because social security numbers are, you know, three digits, and then two, and then four. Make sense? Except that 20% of the room says, what's a social security number?
01:33:01
We don't have those in my country. I know, I know, sorry. I don't know what your social security numbers look like or I would do that, okay? So, anyway, we grab that match and we see, in this case, that they match properly. It was true, and the value we pulled out was that.
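A rough sketch of the three-two-four digit pattern described here, assuming Python's `re` module instead of PowerShell. The exact pattern on the slide is not in the transcript, so the dashes between the groups are an assumption, and the record below is made up.

```python
import re

# Three digits, two digits, four digits. The dash separators are an assumption.
ssn_pattern = re.compile(r"\d\d\d-\d\d-\d\d\d\d")

# Made-up record with a fake number, purely for illustration.
record = "Name: Jane Doe, SSN: 123-45-6789, Dept: Sales"

match = ssn_pattern.search(record)
found = match is not None                    # did the pattern match at all?
value = match.group(0) if match else None    # the text that was pulled out

print(found, value)
```

This mirrors the two things shown in the talk: a true/false answer and the matched value. In PowerShell the same pair comes from the `-match` operator and the `$Matches` automatic variable.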
01:33:20
So, this would be good either for checking whether any of that bad PII is there or for stealing that bad PII if you have access to the data. So we talked about quantifiers. We've already met two quantifiers. One quantifier is plus, because it amps things up. So, if I give you a character and I say,
01:33:41
plus, what does that mean? That, how many? One and up. What's a star? Zero and up. We can be more specific. Maybe you wanna say, I only wanna have, oh, I don't know, somewhere between two, I got an X here, and I expect to see somewhere between two and seven Xs.
01:34:02
These would be, by the way, consecutive again. There's nothing in the middle. The only thing that matches this is X, X, X, X, X, et cetera, et cetera, et cetera. Would X match it? X would not match it. Why would X fail? Because the minimum we want is two Xs, okay?
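The two-to-seven range can be tried out with a small sketch, again assuming Python's `re` module. `fullmatch` is used so that the whole string has to fit the pattern; that choice is mine, not something stated in the talk.

```python
import re

# Between two and seven consecutive x characters.
pattern = re.compile(r"x{2,7}")

# Hypothetical test strings: too few, minimum, maximum, too many, interrupted.
candidates = ["x", "xx", "xxxxxxx", "xxxxxxxx", "xxyxx"]
results = {s: pattern.fullmatch(s) is not None for s in candidates}
for s in candidates:
    print(s, results[s])

# Without anchoring, a search inside eight x's still succeeds,
# but it only grabs the first seven.
partial = pattern.search("xxxxxxxx").group(0)
print(len(partial))
```

A single `x` fails because the minimum is two, and `xxyxx` fails as a whole because the run has to be consecutive.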
01:34:22
Does it make any sense? You guys are quiet. There's the optional quantifier. This is a great one. Now, I spend a lot of time out of the country, and I love my friends that I see outside of the country, but for some reason, they don't know how to spell things properly. Because, no, really, Brits, great people,
01:34:41
Canadians, love them. I mean, they're like five meters away. See, I even sounded Canadian from over there. I said meters. But this colour thing. "La-boor"? What happened? Well, she went into "la-boor" and didn't survive. If it had been labor, she could've survived it. So the trick is, this quantifier, the question mark,
01:35:03
this says zero or one. So it'll accept one u in colour, or no u in color. Now, unfortunately, it would also match that. We're getting there, okay? But see, the question mark, that's what the optional one does.
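The question-mark quantifier can be sketched as follows, assuming Python's `re` module. The word list is made up, and the unanchored example at the end is only one illustration of an unintended match, not necessarily the one shown on the slide.

```python
import re

# u? means zero or one u, so both spellings are accepted.
pattern = re.compile(r"colou?r")

# Hypothetical inputs: both spellings, one u too many, and a missing o.
words = ["color", "colour", "colouur", "colr"]
results = {w: pattern.fullmatch(w) is not None for w in words}
for w in words:
    print(w, results[w])

# An unanchored search also finds the pattern inside longer words.
inside = pattern.search("colorful").group(0)
print(inside)
```

Only the `u` is optional here, since the question mark applies to the single character immediately before it.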
01:35:22
But so this means zero or one, as in like that. So if your pattern said C-O-L-O-U question mark R, it would match color and colour, okay? Whose fault was that, by the way? Why do the other countries that were once
01:35:42
helpless victims of the British Empire all misspell things in the same way but the Americans don't? No, no, what the man said. Noah Webster wanted us to be a distinct language, which was a dumbass idea. And that's why, like, everybody else spells defence
01:36:01
with a C and we do S. And, oh, and it's time for me to stop, I'm so sorry. Hang on, let me, yikes. See that nice Jeff Hicks man in the back? He did not even come down and take the podium, see? So that's all we had. You're gonna get the rest of the PowerPoint. Let me ask you some questions.
01:36:21
Have I convinced anybody to go out and play with regex? Have I frightened anyone away? So, "I'm never going to do that ever again"? Okay, that's entirely possible, okay. So I hope it inspired you. Remember that it's okay to cheat. There's lots of examples, and you're not cheating by using them. Use the online regex tester. Select-String is the thing.
01:36:40
And with that, let me say for the very last time ever, I really appreciate you guys being here. Go back and learn some stuff from that and remember to use this knowledge for good and not for evil. Thank you.