Just when you thought you couldn’t refactor any more…
Formal metadata
Title |
Series title |
Number of parts | 69
Author |
License | CC Attribution - ShareAlike 3.0 Unported: You may use and modify the work or content for any legal and non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in modified form, only under the terms of this license.
Identifiers | 10.5446/37778 (DOI)
Publisher |
Year of publication |
Language |
Content metadata
Subject area |
Genre |
Abstract |
Transcript: English (automatically generated)
00:12
Good afternoon. My name is Claudio. I'm really excited to be here at RubyConf. As I was saying, this is actually a talk about Ruby,
00:23
about the Ruby programming language. It's a language that we all love to write, we love to read, and it's a language that is a little old, but it's still evolving. As a matter of fact, there is a new release every year and there are new methods added every year.
00:41
So, this is just an overview of some of the methods that I'm gonna touch on during this talk. Some of them have been in Ruby for many years, and others are new: they were just added in 2.4, released last December.
01:01
The format that I chose for this talk is a refactoring journey, and I invite you all to come on this journey with me. What we're gonna do is take a problem, actually a real problem that I had in my previous job, and then try to find the solution in Ruby with a first pass, and then iterate
01:24
through the problem using some of these methods to make the code better. So, whether you are a beginner or you are an expert Ruby developer, I hope you will all learn something from this talk.
01:44
So, now that the doors have magically closed, we can get started. So, let me give you some context here, because this is the example, the problem that we're gonna use to look at those methods. It has to do with YouTube URLs.
02:02
In my previous job, I had to deal with YouTube URLs a lot. If you have ever watched a video on YouTube, you have probably seen a URL of the first type, that is a YouTube video. The second URL is also a YouTube video, it's just a shorter format. The third and the fourth one are URLs for YouTube channels
02:24
and then the last one is for a YouTube playlist. So, what we have here is different URLs, but really different strings that mean different things. And what we're gonna try to write
02:41
is a method called parse that accepts a string as input and tells us what type of YouTube resource we are looking at. So, in the first case, that is a YouTube video. So, ideally this method parse would tell us
03:01
it's of type video and that's the ID of the video. In the second case, that is a YouTube channel, so that's what we expect. We expect type channel and the name of the channel. And the third one, it's not a YouTube URL, so it's simply type unknown. Now, because we are Rubyists, we're not gonna be happy
03:23
with just writing code that is correct and complete. We want something more. So, we're also gonna try to write code that is clear, meaning readable, and compact to follow the philosophy of do not repeat yourself.
03:40
So, this is the problem that I'm gonna talk about in the next 30 minutes. And before I start looking into that, I'm just gonna stop here for a second and give you all a chance to think about this. If this was like a job interview or something like that and this was the problem, what methods would you use
04:01
if you had all the methods of the Ruby standard library available to try to solve this?
04:21
Okay, so let's start. And I'm gonna give it a first pass. So, what I can observe here is that those URLs start in different ways. The first one starts with youtube.com/watch, and I can imagine that is a video. The second one just starts with youtube.com/,
04:42
and then I can imagine that's a channel. The last one doesn't even have YouTube, so maybe unknown. So, my first attempt is gonna be to use the method start_with? from String. And this is straight from the documentation: start_with? returns true if the string starts
05:03
with one of the prefixes given. So, you can ask whether "hello" starts with "hell": true. You can also ask whether "hello" starts with "heaven" or "hell". It's also true, because one of the prefixes is a match. Does "hello" start with "heaven" or "paradise"? That's false.
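The three documentation examples just described can be checked directly. A minimal sketch (the variable names are only for illustration):

```ruby
greeting = "hello"

# start_with? accepts one or more prefixes and returns true if any of them matches.
one_prefix = greeting.start_with?("hell")               # true
any_prefix = greeting.start_with?("heaven", "hell")     # true: one of the prefixes matches
no_prefix  = greeting.start_with?("heaven", "paradise") # false: no prefix matches

puts one_prefix, any_prefix, no_prefix
```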
05:23
So, using this method, we can try to find the type of a URL. And this is what our first pass would look like. So, how does this rank for what we're trying to achieve?
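The slide itself is not part of the transcript, so the following is only a sketch of what such a first pass might look like. The return shape (a hash with a symbol `:type`) and the example URLs are assumptions made for illustration.

```ruby
# First pass: classify a URL purely by its prefix.
# Assumption: the result is a hash such as { type: :video }.
def parse(text)
  if text.start_with?("youtube.com/watch")
    { type: :video }
  elsif text.start_with?("youtube.com/")
    { type: :channel }
  else
    { type: :unknown }
  end
end

p parse("youtube.com/watch?v=abc123_-XYZ") # hypothetical video URL
p parse("youtube.com/")                    # the bare homepage is also reported as a channel
p parse("example.com/")
```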
05:42
Well, it's not perfect. I would say it's clear, meaning it's pretty readable. It reads pretty much like English: "if text start_with?" and so on. It's not compact because we have some repetition there. The method start_with? appears twice. It's not complete. We're only trying to match the type.
06:02
We're not returning the ID or the name. And even worse, it's not correct. And the reason why I say that is because if your URL is simply youtube.com slash, that is not a channel. That is actually YouTube's homepage. And the reason is that YouTube has some strict rules
06:22
about how the URLs are formed. And this is basically the rule. When you have a video on YouTube, every video has a unique ID. And this ID is always 11 characters. And it's not any 11 characters.
06:42
They are all letters, digits, underscores or hyphens. So I don't know if you knew this. Maybe this is something you can take away from this talk. When you have a channel name, it also has to have at least one of these characters. So as you can see, it's not as easy
07:01
as just saying start_with?. We need something more powerful. We need to deal with regular expressions. So for the next iteration, we can use this method, String#match?. This is actually a new method. It was just added in Ruby 2.4.
07:21
And it lets you compare a string with a regular expression and it's gonna tell you whether it's a match or not. In the first case, the string Ruby is a match for R dot dot dot, which means R and any three characters, that's true. But if you just try R dot dot, that's false
07:42
because it's missing one character. And if you try P dot dot dot, it's also false because the first letter is an R, it's not a P. So using this method, we can now rewrite our code to look like this. If it's a match for a certain regular expression,
08:01
then it's a video, otherwise it's a channel, otherwise it's unknown. Now let me explain what this regular expression means. For instance, the first one is youtube.com/watch?v=. That has to match exactly. And then there is this thing in square brackets.
08:22
What the \w means is match any letter, any digit, and the underscore. So it's all there in that \w. The only other thing that we want to match is the hyphen, so we just add it there. And then the number 11 is a quantifier in curly brackets.
08:42
And it means we need to match exactly 11 occurrences of those characters. So that's exactly what we want. That's what a YouTube video URL looks like. In the second case, we still have the same square brackets and we have the plus sign. That means it has to match at least one occurrence.
09:01
So a channel name needs at least one character. And so with this in mind, we now have a solution that is correct. We are now actually matching videos, channels, or unknown. And I would say it's still pretty clear. It still reads pretty much like English. It's not complete. We're only returning the type and not the ID or a name.
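A sketch of how this second pass could be written with String#match?. The exact regular expressions from the slides are not in the transcript; the `\A` and `\z` anchors, the `%r{...}` literal, and the example URLs are assumptions of this sketch.

```ruby
# String#match? (Ruby 2.4+) only answers true or false.
puts "Ruby".match?(/R.../) # true: "R" followed by any three characters
puts "Ruby".match?(/P.../) # false: there is no "P" in the string

# %r{...} avoids escaping "/"; "." and "?" are escaped so they match literally.
# \A and \z anchor each pattern to the whole string, because an unanchored
# pattern matches anywhere inside the text (for example, a 12-character ID
# would still contain 11 matching characters).
def parse(text)
  if text.match?(%r{\Ayoutube\.com/watch\?v=[\w-]{11}\z})
    { type: :video }
  elsif text.match?(%r{\Ayoutube\.com/[\w-]+\z})
    { type: :channel }
  else
    { type: :unknown }
  end
end

p parse("youtube.com/watch?v=abc123_-XYZ") # hypothetical 11-character ID
p parse("youtube.com/some-channel")        # hypothetical channel name
p parse("youtube.com/")                    # the homepage is no longer a channel
```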
09:23
And it's not compact. As you can see, we have some repetition: text.match?, text.match?. So let's give it another try and let's try to remove this duplication. To do that, we're gonna use the triple equals operator (===).
09:42
This is what the Ruby documentation says. Following a regular expression literal with a triple equal operator allows you to compare against a string. And even more important, it's used in case statements. What that means is that if you want, you can explicitly use the triple equal,
10:02
try to compare R dot dot dot to Ruby, that's true. But you don't have to because if you use a case statement, then internally, Ruby is gonna use that operator. So if you say case Ruby when P dot dot dot, Ruby is actually using that operator. And if it's true, then it's gonna return starts with P.
10:23
Otherwise, it's gonna return starts with R. So it's kinda the same as before but it allows us to remove some duplication because we don't have to repeat match. So in short, we can take the code we had before and slightly change it to use a case when statement.
10:43
Case text, when it matches the first regular expression, it's a video, and so on. So the code is still correct. It's still pretty clear and now it's compact. We're not repeating any method. But it's not complete. We're only returning the type.
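The case/when version might look like the sketch below. As before, the anchored patterns, the hash return shape, and the example URLs are assumptions; the point is only that `when` calls `===` on each regular expression.

```ruby
# Regexp#=== is what a case statement uses behind the scenes.
puts(/R.../ === "Ruby") # true

def parse(text)
  case text
  when %r{\Ayoutube\.com/watch\?v=[\w-]{11}\z} then { type: :video }
  when %r{\Ayoutube\.com/[\w-]+\z}             then { type: :channel }
  else { type: :unknown }
  end
end

p parse("youtube.com/watch?v=abc123_-XYZ")
p parse("youtube.com/some-channel")
p parse("example.com/")
```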
11:03
And if you remember, we wanted to return two things. For instance, when we match a video, we wanna return the type and the ID. When we match a channel, we want the type and the name. So we are maybe halfway through it but we're still missing one part.
11:21
In short, what we want is when we have a match, the thing that we're matching, we wanna return it. So if we look back at the code that we had, those 11 characters that we're matching, we wanna capture them and then we wanna give them a name, ID, and then return that in the result.
11:41
Same thing with the channel. When we're matching a channel name, we wanna return it. So how do we do that? How can we capture and name these groups? We can use another feature of the Ruby standard library called regular expression captures.
12:01
What it means is when you write regular expressions in Ruby, you can actually group a part of the expression in parentheses and you can give it a name. And if you do that, then when you have a match, you can ask just for that match in return.
12:21
This is an example from the Ruby docs again. Let's say you wanna match the string $3.67 and you have a regular expression that is the dollar sign, then this group in parentheses that we call dollars, it's any number of digits, then you have dot
12:41
and then another group called cents. So once you have a match, then you can just say give me from the last match just the dollar amount. And that is what that dollar-tilde symbol ($~) means: it's a special variable that holds the value of the last match.
13:00
So you can just say give me the dollar amount of the last match, you're gonna get three, give me the cent amount of the last match, you're gonna get 67. So with this in mind, we can now write our code in a slightly different way. It starts to look a little complicated, but it's not very different from before.
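The dollars-and-cents example can be reproduced as follows. This is a small sketch; the variable names `price_pattern`, `dollars`, and `cents` are chosen here for illustration.

```ruby
# Two named groups: (?<dollars>...) and (?<cents>...).
price_pattern = /\$(?<dollars>\d+)\.(?<cents>\d+)/

price_pattern.match("$3.67")

# $~ holds the MatchData of the last successful match in the current scope.
dollars = $~[:dollars]
cents   = $~[:cents]

puts dollars # 3
puts cents   # 67
```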
13:22
We have the same regular expressions, but now we're capturing the thing that we're matching. So in the first case, we're capturing and we're calling that ID. And then if it's a match, we're gonna take that and return it in the resulting hash as ID. In the second case, we're matching what is the channel name
13:42
and then we're returning it. And so finally, we have a solution that is correct and it's also complete. So if we were not writing Ruby, we could just stop here and say, well, this just works. I don't care if it's readable, if it's complex, let's just go to some other problem.
14:01
But we're writing Ruby, so we wanna make things a little better. Specifically in this case, it's still not compact because we're still repeating something here. This ID, this name that we gave, we actually see three times and same with name below.
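A sketch of the version with named captures, under the same assumptions as before (anchored patterns, symbol keys, hypothetical URLs). Note how `id` and `name` each appear three times, which is the repetition being pointed out.

```ruby
def parse(text)
  case text
  when %r{\Ayoutube\.com/watch\?v=(?<id>[\w-]{11})\z}
    # `when` ran Regexp#===, which also set $~ to the MatchData.
    { type: :video, id: $~[:id] }
  when %r{\Ayoutube\.com/(?<name>[\w-]+)\z}
    { type: :channel, name: $~[:name] }
  else
    { type: :unknown }
  end
end

p parse("youtube.com/watch?v=abc123_-XYZ")
p parse("youtube.com/some-channel")
p parse("example.com/")
```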
14:21
So is there a way not to have this repetition? Well, I'm glad you asked, because there is. It's another method that was just added in Ruby 2.4 and it's called named_captures.
14:40
The concept is very similar to what we just saw with captures. It's just a more compact and elegant way. Basically, when you're capturing multiple groups, you can just call this single method named captures and you get back all of them. You don't have to individually specify, I want the dollar amount or I want the cent amount.
15:02
This is the same example as before, so if we just call named captures on the last match, we're just gonna get a hash back that says dollars three, cents 67.
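In code, the same example with MatchData#named_captures could look like this (variable names are illustrative). The returned hash uses string keys.

```ruby
match = /\$(?<dollars>\d+)\.(?<cents>\d+)/.match("$3.67")

# named_captures returns every named group at once, keyed by group name.
amounts = match.named_captures

puts amounts["dollars"] # 3
puts amounts["cents"]   # 67
```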
15:22
So let's see how we can use this in the code that we have. Let me break it down. For instance, let's say that we're trying to match the URL youtube.com slash conflicts. That is the name of a channel. When it's a match, this named capture is just gonna return a hash, in this case, name conflicts.
15:44
All we need is just then to add the other part that we want, the fact that the type is a channel, and we have back what we were looking for. So in this case, name conflicts type channel. So the code is becoming a little more dense,
16:01
but it's doing exactly what we want. And this is what the code looks like. When we're capturing a video URL, then we get the ID back in named_captures and we just attach type video and we have the result that we want. When we're capturing the channel name, it's the same thing.
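One way this step might be written is sketched below. Because named_captures returns string keys, this sketch also uses string keys and string values for the type; that choice, like the patterns and URLs, is an assumption rather than the code from the slides.

```ruby
def parse(text)
  case text
  when %r{\Ayoutube\.com/watch\?v=(?<id>[\w-]{11})\z}
    $~.named_captures.merge("type" => "video")
  when %r{\Ayoutube\.com/(?<name>[\w-]+)\z}
    $~.named_captures.merge("type" => "channel")
  else
    { "type" => "unknown" }
  end
end

p parse("youtube.com/watch?v=abc123_-XYZ")
p parse("youtube.com/some-channel")
```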
16:23
Now is this any better than before? Well, I don't know. Maybe we now are just hitting our head against the wall because it feels like we're not really getting anywhere. Like we still have code that's not really clear. It's not compact. Now we're repeating something else.
16:43
So it feels like we are iterating, but we're really not getting there. And this is a feeling that you all might experience when you're trying to make your code better. You think you're following some path that's gonna lead somewhere and then it doesn't. So normally when this happens,
17:00
I take a coffee break or a lunch break or I just go home and sleep, and then in the morning I'm just like, wow, now I know what I have to do. I have to use something completely different. So we don't have time to go home and sleep, so I'm just gonna skip that part. But it does work like that.
17:20
When you are looking at a problem too much, you don't see exactly why things are happening. And in this case, this repetition is happening because what we've had so far is either if, else if, else if, or case, when, when. And it's almost in the nature of these structures to have some repetition
17:40
because you are listing all the options. As I said at the beginning, there are even more options of YouTube URLs, so if we had to do that, we would probably end up with some repetition. So is there a way in Ruby not to do that and instead to have Ruby itself do that for you?
18:01
Yes. There is. So this is really the biggest jump that we are gonna make now. We're gonna jump and talk about enumerables. So enumerables is a class and really the structure that sets Ruby apart from many other languages.
18:23
What are enumerables? Arrays, hashes, ranges, things that you can iterate through. And the enumerable class has this find method that is very powerful. Once again from the docs. If you use find, it passes each entry to the block
18:42
and it's returning the first for which the block is not false. So basically you have a list of objects and Ruby's gonna go through all of them one by one and try to see if there's one of them that matches your condition and as soon as it finds one, it's gonna stop and return that one.
19:01
The first example. If you have all the integers from 18 to 99 and you say find a number that is divisible by 17, internally Ruby's gonna say if 18 is divisible by 17 then return it. Else if 19, else if 20 and so on.
19:20
But you don't have to write that and as soon as it reaches 34, it's just gonna return that for you. Another characteristic of this method is that if it does not find a match, for instance if you're trying to find a number between 18 and 29 that's divisible by 17, by default it's just gonna return nil.
19:41
But you can actually override that. You can tell Ruby what to return if there is no match. You do that with that extra argument, ifnone. So in the last case, I'm just saying if you don't find a match, just return the value zero. So it's really powerful because then you don't have to write if, else if.
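The three cases of Enumerable#find described here can be sketched as follows. The variable names are illustrative; note that the `ifnone` argument must be something callable, such as a lambda.

```ruby
# Stops at the first number for which the block is truthy.
first_match = (18..99).find { |number| (number % 17).zero? }

# No number between 18 and 29 is divisible by 17, so find returns nil.
no_match = (18..29).find { |number| (number % 17).zero? }

# With ifnone, the lambda is called and its result is returned instead of nil.
with_default = (18..29).find(-> { 0 }) { |number| (number % 17).zero? }

p first_match  # 34
p no_match     # nil
p with_default # 0
```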
20:02
You just use the power of enumerables. So now jumping back to our problem, this is really like the biggest change that we're gonna see. We're not dealing with cases anymore. What we are doing here is we are recognizing that we have a list of patterns.
20:22
Those are the same patterns that we talked about before. The first one is the regular expression that matches a video ID and that is gonna be of type video. The second one for a channel is the type channel. So we're just basically giving a name to these things called patterns. And then what we're telling Ruby to do is
20:41
just go through these patterns one by one, try to see if the text that you get is a match for any of these regular expressions. As soon as you find a match, just return what we wanted from before. Type video, ID one, two, three. So once again, it's pretty powerful
21:00
because now we only have eight lines of code and we don't have any method that's been repeated. The only portion that's missing is that gray line. How can we return something, what we want in that case? Basically, how can we tell this find method
21:21
that as soon as it finds a match, it has to return something specific, what we want it to return and not just the first match. To do that, we have to use the break statement. So what break does is exactly that.
21:41
If you have a loop, you can use break to break out of the loop and return a specific result. In this example, in the first case, we're not using break, we're just saying stop as soon as you find a number that's divisible by 17.
22:02
Find is simply gonna return the number itself, 34. But if you wanna return something else, for instance, the string "found it", then we can just use break and then we say break with that value when you find it. And so that's what we're gonna do here
22:21
and we're gonna use break. So now this code is becoming very dense but it's really getting to the core of it. It's saying go through all the patterns, so patterns.find, try to see if there is a match for a regular expression. If there is, stop there,
22:41
take what you captured in the match, for instance, ID and the ID of the video and then just add the type that you matched. So if we're matching a video URL, we're gonna get type video and the ID. If we're matching a channel URL, we're gonna get type channel and the name.
23:02
If we're not matching anything, if you remember, we wanted just to return type unknown and we can just put it there. We can just use this extra argument of find to just say this is what you can return if there is no match. And so this is getting close to really being very compact.
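Putting find, break, and the extra ifnone argument together might look like the sketch below. The PATTERNS constant, its shape (a hash from regular expression to type), the string keys, and the example URLs are assumptions made for illustration.

```ruby
# break inside the block makes find return the given value immediately.
found = (18..99).find { |number| break "found it" if (number % 17).zero? }
p found

# Hypothetical list of patterns, each mapped to the type it identifies.
PATTERNS = {
  %r{\Ayoutube\.com/watch\?v=(?<id>[\w-]{11})\z} => "video",
  %r{\Ayoutube\.com/(?<name>[\w-]+)\z}           => "channel",
}

def parse(text)
  # If no pattern matches, find calls the lambda and returns its result.
  PATTERNS.find(-> { { "type" => "unknown" } }) do |regex, type|
    if (match = text.match(regex))
      # Return the captured groups plus the type instead of the [regex, type] pair.
      break match.named_captures.merge("type" => type)
    end
  end
end

p parse("youtube.com/watch?v=abc123_-XYZ")
p parse("youtube.com/some-channel")
p parse("example.com/")
```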
23:26
You can actually make it even more compact if you inline the if condition and this is as compact as it can get probably. It's still correct, it's still complete but now we're dealing with clear. Like is it readable?
23:42
That is probably the most subjective of those characteristics. Some people might find it readable, some people like compact code, very dense. Other people like to be a little more verbose. So I don't have an answer for that. What we can try to do is we can see if we can do a last iteration
24:01
to make it a little more explicit. So for our last iteration, just gonna take a step back and remind ourselves what we wanted to get. We wanted to have a method that is able to parse all these different formats of URLs.
24:25
Now in the code that we have so far, we're actually only dealing with two of those formats, the video ID and the channel name. If we had to put all of them in this code, it would actually look more like this and now it starts becoming even less readable
24:41
because I guess it's a very long hash. And this pattern structure that we identified, now it doesn't really have a specific meaning. It's including many patterns. We have patterns for videos, patterns for channels, patterns for playlists. So just for ourselves, just to make the code more readable,
25:04
what is important and what's also one of the hardest things to do is to name things. Since we have identified that some of these are video patterns and some of them are channel patterns and playlist patterns, it actually makes sense to separate them in different constants
25:21
because if we give them a name, then we kind of don't even have to read the regular expression. We just say, oh, this is the list of video patterns. If ever YouTube has a new way, we can just add it there but we don't have to actually read. Once it's there, it's there. We have channel patterns and we have playlist patterns.
25:40
So it's very declarative. It's very readable. The only step that's missing is we want an object that's basically gonna go through all of them. First, it's gonna try all the video patterns one by one and stop as soon as it finds a match. Then it's gonna go through all the channel patterns and so on.
26:01
And so this is the very last method from the Ruby standard library that I'm gonna introduce today and that is Enumerator.new. What this method does is it creates a new enumerator object which can be used as an enumerable.
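As a small illustration of Enumerator.new on its own (the numbers and names below are made up for this sketch): the block receives a yielder, and everything pushed into it becomes an element of the enumeration, so several lists can be walked as if they were one.

```ruby
small_then_large = Enumerator.new do |yielder|
  [1, 2, 3].each    { |number| yielder << number }
  [10, 20, 30].each { |number| yielder << number }
end

# The enumerator responds to Enumerable methods such as find and to_a.
first_even   = small_then_large.find(&:even?)
all_elements = small_then_large.to_a

p first_even   # 2
p all_elements # [1, 2, 3, 10, 20, 30]
```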
26:20
So it's an object that you can tell Ruby to iterate through. And this is what we're gonna want here. We want an object that basically first goes through all the video patterns, then through all the channel patterns and playlist patterns. And this is how you write that code. You just use Enumerator.new and then once again,
26:42
it's very declarative. Inside the block, you say well first for each video pattern, add it to the patterns and then if it's a match, the type is video. Then do the same for all the playlist patterns and for all the channel patterns. And so here we are now with this code
27:00
that might be too dense. As I said before, some people might prefer another format and I'm happy to discuss this with all of you after this talk. But I think it has some advantages. The first one is that really all the first lines of code, they're just declarations. We're just saying these are our video patterns,
27:20
channel patterns, playlist patterns and patterns. So that part really is gonna stay as it is. And the bulk of your code is just the last three lines of code, that's where we're doing the real iteration. So if the first lines seem a little scary, those are just things that you're gonna write once and they're gonna stay like that
27:40
unless YouTube adds more formats and you're just gonna add them there. And your real logic is just those three lines at the bottom. So in my opinion, this code is correct and it's complete for sure. It's compact for sure, maybe too much. And in my opinion it's clear because of this,
28:01
because it's really defining things as they come. And instead if you have like an if, else if or a case statement, you can't escape from it. Like as you're reading, you have to read all the conditions diagonally in a way. And so this is basically where I'm gonna stop.
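The final shape described in this part of the talk could be sketched as follows. The slide code is not in the transcript, so every constant name, every regular expression (including the youtu.be, channel, and playlist forms), the string keys, and the example URLs are assumptions for illustration only.

```ruby
# Declarations: one named list of patterns per resource type (all hypothetical).
VIDEO_PATTERNS = [
  %r{\Ayoutube\.com/watch\?v=(?<id>[\w-]{11})\z},
  %r{\Ayoutu\.be/(?<id>[\w-]{11})\z},
]
PLAYLIST_PATTERNS = [
  %r{\Ayoutube\.com/playlist\?list=(?<id>[\w-]+)\z},
]
CHANNEL_PATTERNS = [
  %r{\Ayoutube\.com/channel/(?<id>[\w-]+)\z},
  %r{\Ayoutube\.com/(?<name>[\w-]+)\z},
]

# One enumerator that walks all lists in order, pairing each regex with its type.
PATTERNS = Enumerator.new do |patterns|
  VIDEO_PATTERNS.each    { |regex| patterns << [regex, "video"] }
  PLAYLIST_PATTERNS.each { |regex| patterns << [regex, "playlist"] }
  CHANNEL_PATTERNS.each  { |regex| patterns << [regex, "channel"] }
end

# The actual logic: stop at the first matching pattern, or fall back to unknown.
def parse(text)
  PATTERNS.find(-> { { "type" => "unknown" } }) do |regex, type|
    match = text.match(regex)
    break match.named_captures.merge("type" => type) if match
  end
end

p parse("youtu.be/abc123_-XYZ")
p parse("youtube.com/playlist?list=some-list")
p parse("youtube.com/some-channel")
p parse("example.com/")
```

With this layout, supporting a new URL format only means adding one regular expression to the relevant list; the `parse` method itself stays unchanged.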
28:22
I'm just gonna add one very last thing. Those patterns that I typed at the top, they just say youtube.com. But really YouTube URLs can also be www.youtube.com. They can be http://youtube.com or https://youtube.com. You don't wanna add all of those variants at the top.
28:44
You can just tell match to match an interpolated regular expression that has optional captures. It's basically matching every regular expression and also these variants. Whether there is www or not, it's still gonna be a match.
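A sketch of that idea, assuming the patterns are stored without anchors so they can be interpolated into a larger expression. The PREFIX constant, the pattern list, and the example URLs are hypothetical, and non-capturing optional groups are used here for the scheme and the www part.

```ruby
# Unanchored patterns: the anchors and the optional prefix are added in parse.
PATTERNS = {
  %r{youtube\.com/watch\?v=(?<id>[\w-]{11})} => "video",
  %r{youtube\.com/(?<name>[\w-]+)}           => "channel",
}

# Optional "http://" or "https://", then optional "www.".
PREFIX = %r{(?:https?://)?(?:www\.)?}

def parse(text)
  PATTERNS.find(-> { { "type" => "unknown" } }) do |regex, type|
    # Interpolating a Regexp embeds it as a group inside the new expression.
    match = text.match(/\A#{PREFIX}#{regex}\z/)
    break match.named_captures.merge("type" => type) if match
  end
end

p parse("https://www.youtube.com/watch?v=abc123_-XYZ")
p parse("www.youtube.com/some-channel")
p parse("http://youtube.com/")
```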
29:04
So I know that was a lot after lunch. But I really wrote this talk to just give you an overview of the power of some methods that are inside the Ruby standard library. And as somebody said yesterday in another talk,
29:22
Ruby is very powerful and you can do amazing things with it, both in a good way and in a bad way. Like you can just play around, there are no rules. These are some of the concepts that I introduced. And in my opinion this was a great journey to have.
29:41
If you have different opinions, or if you have seen something that gave you some other ideas, just feel free to talk to me. And yeah, that's all I have. And once again, the slides are available. And I'm up for questions if you have any. Thank you.
30:05
The question is, would I consider extracting those into separate classes? Yes, I would. I tried to keep it compact in this talk. But yeah, that's a good idea. For instance, those video, channel, and playlist patterns
30:23
can definitely be their own classes. So you can play around with it. I really can't see anything. But if you have questions, just come find me. I'm gonna be here for the rest of the conference. So just hit me up. Thank you.