Edge Segmentation - Part II

Formal Metadata

Title: Edge Segmentation - Part II
Number of Parts: 21
License: CC Attribution 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Transcript: English (auto-generated)
Welcome to today's lecture in Image Analysis I. We are still in this region here in terms of the contents of image analysis: we are with the mid-level processes. Last week we discussed point extractors, and towards the end we started with edge extractors, because segmentation is about point, edge and region extraction. We ticked off point extraction; now we are with edge extraction. And the first thing is: when we discuss linear structures in an image, we have to differentiate between edges, or step edges, and lines, or bars. Here I'll focus on edge extraction. What is an edge, then? An edge is an area in the image where we have a sudden change between bright and dark image areas. Typically these edges correspond to the boundaries of homogeneous regions, and thus they very often correspond to object boundaries, but not always.
And I think this is where we stopped last time. We said, well, we have a certain series of operations we have to apply to detect edges. First we have to find out which pixels may be in the vicinity of an edge or contain an edge. Then we have to define the actual edge pixels, which may be based on thinning. We may estimate the position of the edge with sub-pixel accuracy. After this step, we still have kind of a digital image, and for every pixel we know whether it's an edge or not. We may even know the precise position of the edge inside the pixel, but we do not yet know how these edge pixels belong together. For instance, we wouldn't know that this edge pixel here and this one actually correspond to, or are parts of, the same edge. So the next step is to apply edge tracking to get edge pixel streaks, or edge pixel strings: we have to know how these edge pixels belong together to form longer edge pixel strings. And then, of course, all of this is still applied on a pixel-to-pixel level, so we will afterwards approximate these edge pixel streaks by some mathematically defined curves. Depending on which operator we apply, some of these steps may not be necessary.
This is the standard case if we work with gradient operators. Right, and now I'll have a closer look at all of these individual steps. And I'll also discuss local methods and global methods,
and the ones I'm discussing first are the local methods. So the first step was the extraction of edge regions. There are different strategies we can apply: we could apply template matching, we could apply gradient operators, where we can use the first or the second derivatives, and we can use parametric edge models to look for edges that have a specific shape. Template matching, well, it's listed first here, but it's actually not a method that's used very often.
Never mind, I could just as well eliminate it; it can be done, right? So you can define specific filter matrices which are susceptible to edges in a particular direction. For instance, if you use this filter matrix, you will extract vertical edges. Using this one, you will identify pixels along a horizontal edge. And then you have two additional filter matrices for edges that locally follow a diagonal direction, right? So there is a small set of these templates. You apply convolution of the image with these four masks. If the output value is larger than a threshold, then you save the output value and the orientation of the best mask, which is the one having the strongest response, right? And in this way, you can identify edge pixels. This is relatively sensitive to image contrast changes and noise, and actually it's not really used anymore today.
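A minimal sketch of this in Python (numpy/scipy assumed; the mask values are illustrative examples, not the specific matrices from the slides):

```python
import numpy as np
from scipy.ndimage import convolve

# Illustrative directional masks (example values):
MASKS = [
    np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),    # vertical edges
    np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]),    # horizontal edges
    np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]]),    # one diagonal
    np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]]),    # other diagonal
]

def template_edges(img, threshold):
    # Convolve the image with all four masks.
    responses = np.stack([convolve(img.astype(float), m) for m in MASKS])
    strength = np.abs(responses).max(axis=0)        # strongest response per pixel
    orientation = np.abs(responses).argmax(axis=0)  # index of the winning mask
    # A pixel is an edge candidate if the best response exceeds the threshold.
    return strength > threshold, orientation
```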
What's used much more frequently is gradients. Remember, we discussed gradients in the section on image pre-processing. So the gradient is a vector containing the first derivatives in x and y. How do we get the first derivatives? We convolve the image with a gradient operator. Now a gradient, remember, is a vector; its two components are the first derivatives in x and y respectively. And like every vector, it has a direction and a length.
The length is also called magnitude or amplitude, right? And if you compute the length, or norm, of the gradient, then the length of the gradient is a very good indicator of whether there is an edge or not. We can also get, sorry, we shouldn't read 'amplitude' here, we should actually read 'direction'; sorry, this is a hard one. So the direction angle of our vector is orthogonal to the edge, and we get the direction of the edge by subtracting or adding 90 degrees to the arctangent of the ratio of the first derivatives. Now, if we have a cross section across the edge, this would be the gray value profile.
So we would have a low gray value on one side and a large one on the other. It might also be the other way around, so we start with the large one and then go to the low one. If we take the norm of the gradient, we will always have a shape something like this, right? The actual edge position corresponds to the strongest gradient magnitude. And how do we identify edge regions there? Well, it's simple: we compute the gradient magnitude at every pixel, then we apply a threshold to the gradient magnitude, and all the pixels where the gradient magnitude is above the threshold correspond to edge regions. This threshold is sometimes called hysteresis threshold in the literature. Now, how can you get the threshold? That's a difficult one, because the actual size
of a gradient which you consider an edge depends on the image contrast. If you have a very high contrast, then you would apply a relatively large threshold. If you have a low contrast in your image, you would apply a lower threshold. So this type of threshold selection is always a bit difficult. It's a good idea to make your threshold dependent on the noise level of the image, and the noise level can be estimated from the image itself. There are methods out there to estimate it; I have no time to go into detail, but if you have an image, you can get an estimate of the variance of the noise component. And then, well, if you know the variance of your noise component, it means you basically know how accurate your gray values in the homogeneous areas are.
So for instance, if your standard deviation sigma were five gray values, a typical value by the way, well, then if the difference you observe is two, is this significant? No, because it may actually just be due to noise. How large would we expect the difference to be? Well, you would expect it to be at least three times the noise level, because if it's larger than three sigma, and if you assume that the noise is normally distributed, well, you could no longer explain your gray value difference by noise. And this means it's probably something that's related to the image content, so you could assume it's actually a significant edge. So we can actually apply a hypothesis test: you make the threshold dependent on the noise level of your image. This is a very good way of doing this. Right?
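A minimal sketch of this thresholding (numpy/scipy assumed; for simplicity, the propagation of the noise through the derivative filter is neglected and the raw 3-sigma rule is applied to the magnitude):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_regions(img, sigma=1.0, sigma_noise=5.0):
    img = img.astype(float)
    # First derivatives via derivative-of-Gaussian filters.
    gx = gaussian_filter(img, sigma, order=(0, 1))  # derivative in x (columns)
    gy = gaussian_filter(img, sigma, order=(1, 0))  # derivative in y (rows)
    magnitude = np.hypot(gx, gy)                    # length of the gradient vector
    # 3-sigma rule: a difference is significant only if it cannot be
    # explained by normally distributed noise alone.
    return magnitude > 3.0 * sigma_noise, magnitude, gx, gy
```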
Any questions? Now, a very famous operator that's based on this principle is the Canny operator. The Canny operator works just like this; the only difference is that it actually applies two different thresholds. It applies the first threshold and then extracts the edges. And then afterwards, well, we shall see, or we said this already, that one part of edge detection is edge tracking. And if, during the tracking, the operator realizes that it has reached an end pixel, it will still continue to track the edge if the gradient magnitude is still above a second, smaller threshold. This means that, basically, a strong edge is considered to be evidence that the edge should perhaps be continued even in areas where the local image contrast is a bit smaller than the one indicated by the first threshold, because the fact that we got there by tracking an edge is assumed to be a prior that makes it more likely for an edge to occur. This is why the Canny operator applies two such hysteresis thresholds. But other than that, it's very much this procedure: we compute the gradient magnitudes and apply a threshold to them; in the case of Canny, we apply a second one, and we also add pixels that are between the second and the first threshold to the edge in the edge tracking process.
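A minimal sketch of the hysteresis idea (numpy/scipy assumed; the full Canny operator also includes smoothing and the thinning step discussed below): weak candidate pixels survive only if their connected component contains at least one strong pixel.

```python
import numpy as np
from scipy.ndimage import label

def hysteresis(magnitude, t_low, t_high):
    weak = magnitude > t_low                            # all candidate edge pixels
    strong = magnitude > t_high                         # definitely edge pixels
    labels, n = label(weak, structure=np.ones((3, 3)))  # 8-connected regions
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True              # regions touched by a strong pixel
    keep[0] = False                                     # label 0 is the background
    return keep[labels]
```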
Okay, this is edge region extraction. And of course, as indicated here, edge regions have a certain width. If you apply such a threshold, what you get is a picture in which every pixel is flagged as being an edge pixel or not, but you cannot expect these streaks of edge pixels to have a width of only one pixel. The edge regions will have a width of several pixels, the width of course depending on the operator
you use for computing the gradients. If you use a derivative of Gaussian operator, then it will depend on the sigma value of the Gaussian. Any questions? We can also get edges from the second derivatives.
Then we would use a Laplace operator, or an LoG operator if you want to apply some smoothing as well. And then, well, at edges, the second derivatives vanish, so we go for zero crossings. Mind you, we are not simply looking for areas where the value of the second derivatives is zero. Why? Because flat regions will also have a second derivative of zero: if there are no changes in the gray values, the result of the Laplace operator will also be zero. Instead, we go for zero crossings, so we go for pixels where one neighbor is negative and the other one is positive.
So then we have a change of the sign of the output of the LoG operator; these pixels are edge pixels. A good thing about this is that we don't need any thresholds. In the earlier case, in the case of the Canny operator, we needed a threshold for the gradient magnitude. Here, we don't need a threshold; we just have to look for the sign changes. Very big advantage. But it comes at a cost, and the cost is that the second derivatives are considered to be a bit noisier than the first derivatives, so we have more noise than in the other case. Another advantage is that we don't need to thin out the edge regions. There, we had several pixels next to the edge that were above the threshold; here, there's only one pixel where we have the change of the sign. So we don't have to apply the second step, the thinning step, if we use the second derivatives. We also get closed sets of edge pixels, so-called closed contours, because essentially the edge then is a level set: a curve containing all pixels that have an output of zero of the Laplace operator. It's like a contour line in a map; every edge is such a zero contour line. This is something we cannot expect in the gradient case. But, as I said earlier, we may get noisier results; most people use the Canny operator.
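A minimal sketch of the zero-crossing detection (numpy/scipy assumed): filter with a Laplacian of Gaussian and mark pixels where the sign of the output changes between horizontal or vertical neighbors; note that no threshold appears anywhere.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_zero_crossings(img, sigma=2.0):
    out = gaussian_laplace(img.astype(float), sigma)  # LoG-filtered image
    pos = out > 0
    zc = np.zeros_like(pos)
    # Mark a pixel when its sign differs from the right or lower neighbor.
    zc[:, :-1] |= pos[:, :-1] != pos[:, 1:]
    zc[:-1, :] |= pos[:-1, :] != pos[1:, :]
    return zc
```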
Any questions? Right, so gradient calculation. We have an image; we get the first derivatives in x and in y, here with a derivative of Gaussian operator, and then we get this gradient magnitude. Of course, the fact that these images are largely gray is because the first derivative can be positive or negative: there is a shift in the scale of the output of the derivative operator such that zero corresponds to a gray level of about 127. So everything that's kind of average gray here has a value of zero; black areas have strong negative values, white areas have strong positive values. Well, after computing the gradient norm, the gradient amplitude, there are no negative values anymore. So here, areas that are black correspond to areas where the gradient magnitude, the length of the gradient vector, is close to zero; white areas correspond to places where we have a strong gradient magnitude.
And now, of course, this is the type of picture to which we apply the threshold. This is for a derivative of Gaussian operator with a width of one, with a sigma value of one. For sigma three, of course, we see that some of these noisier edges here kind of disappear. We discussed this in the context of scale space: if we increase the scale parameter, some fine details are lost and the other ones are emphasized. And if we go even further with our standard deviation of the Gaussian, this effect is increased.
Any questions? Interesting thing, by the way: we kind of have a bright rim here. Where do you think this comes from? What do you do with the convolution if the filter matrix doesn't fit within the image? There are different things you can do. I can't remember exactly what was done here; I think it was zero padding, so you assume everything outside of the picture is zero. And then, of course, you get a strong edge at the image border, and this is what we see here. It might not be the smartest thing to do, but you should be aware of it.
Right. Gradient directions. We had this already: in homogeneous areas, the computation of the gradient direction involves the division of two very small numbers. This means it's numerically unstable, and this means we just get noise. We cannot trust these gradient orientations in areas where the magnitudes are small. But in areas where the gradient magnitudes are large, that is, near the dominant image edges, our gradient orientations contain information about the direction of the edge.
about the direction of the edge. This is the AOG output, again, using two different values for sigma. AOG was Laplacian of Gaussian.
And sigma, of course, being the width of the Gaussian. And now, again, this is a representation where zero corresponds to every grade. So black is a strong negative output, white is a strong positive output. And what we see by the AOG operator is that next to an edge, we always have a strong maximum
and a strong minimum next to each other. And the actual edge is in the serial crossing between the black and white line. And of course, sigma will have an impact on both BPA. So now we've defined the edge pixels,
either by applying a threshold to the gradient magnitudes or by finding the serial crossings in here. Any questions?
So the second step is thinning out. We don't have to apply it if we use zero crossings; we only have to apply it if we apply a threshold to the first derivatives, to the gradient magnitudes. So we have to reduce the size of the edge regions so that we are only left with a line of a width of one pixel. This is called thinning, or skeletonization, or non-maxima suppression. Now, if you work with gradient magnitudes, the position of the edge corresponds to a local maximum of the gradient magnitude. But since we have a 2D image, which we always have, of course, we are only allowed to look for a maximum in one direction: in the direction that's orthogonal to the edge. And the direction orthogonal to the edge is the direction indicated by the gradient vector. So we are only allowed to look for a maximum in the gradient direction. If we looked for a local maximum along the edge as well, we would cancel out edge pixels along the edge, which is of course not what we want. We want to reduce the width of the edge region, but we do not want to thin out the edge along the edge. And this is the reason why non-maxima suppression is applied in the direction orthogonal to the edge direction, and this means in the direction of the gradient vector. If we wish, we may afterwards carry out sub-pixel positioning of the edges.
We could also use global techniques, such as the Hough transform, which is something I'll discuss later; in this part, I'll stick with the local techniques. So, non-maxima suppression: for every pixel, we check two of its neighbors, whether there is a larger value of the gradient magnitude or not. If there is not, then this pixel here is a local maximum and we keep it; the others are eliminated from the edge pixels. And of course, which two neighbors we compare depends on the direction of the gradient. If the gradient direction is like this or like this, then we take these two neighbors. If it's in this direction or this direction, we take these two neighbors. If the gradient is in a diagonal direction, either this way or that way, we take these two neighbors, and otherwise these two neighbors. So this pixel only survives if these two gradient magnitudes are smaller. And this is what we get: here are the gradient magnitudes of our Plato image, computed using a sigma value of three; we apply a threshold, we apply non-maxima suppression, and this is what we are left with.
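A minimal sketch of non-maxima suppression with the gradient direction quantized to the four neighbor cases just described (numpy assumed; rows are taken to increase downwards):

```python
import numpy as np

def non_maxima_suppression(magnitude, gx, gy, threshold):
    h, w = magnitude.shape
    # Gradient direction in degrees, folded into [0, 180).
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    out = np.zeros((h, w), dtype=bool)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            m = magnitude[r, c]
            if m <= threshold:
                continue
            a = angle[r, c]
            if a < 22.5 or a >= 157.5:    # gradient ~along x: compare left/right
                n1, n2 = magnitude[r, c - 1], magnitude[r, c + 1]
            elif a < 67.5:                # diagonal (down-right)
                n1, n2 = magnitude[r - 1, c - 1], magnitude[r + 1, c + 1]
            elif a < 112.5:               # gradient ~along y: compare up/down
                n1, n2 = magnitude[r - 1, c], magnitude[r + 1, c]
            else:                         # diagonal (down-left)
                n1, n2 = magnitude[r - 1, c + 1], magnitude[r + 1, c - 1]
            out[r, c] = m >= n1 and m >= n2   # survives only as local maximum
    return out
```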
So now all of these edge regions have only a width of one pixel. Are we finished? Let's assume our goal is, for one reason or the other, to extract Plato's nose. Okay, let's find the contour of Plato's nose. Well, pretty obvious, it's this one, right? Yeah, sure, you may see this. But what we have here is actually a digital image having many pixels, and all we know is, for every pixel, whether it's an edge pixel or not. We do not yet know, for instance, that here we actually have a pretty long streak of edge pixels that belong together and correspond to the boundary of an object.
All we know is, for every pixel, that it's an edge pixel or it is not an edge pixel. So we still have some way to go until we get what we want. This is another example. Again, we see how the result depends on the choice of the degree of smoothing we applied before computing the first derivatives. So this is the result we get after computing gradient magnitudes, applying a threshold to the gradient magnitudes and carrying out non-maxima suppression, and the results are for two different degrees of smoothing used when computing the first derivatives with a derivative of Gaussian operator. I have a question. Aha, I have an online question, oops. Doesn't non-maxima suppression only work
with an odd number of pixels? If the line is, at a given point, two or four pixels wide, will it be able to extract the maximum? Yes, it will. Because what happens is, if you have, oops, you won't see this. If you have something that's four pixels wide, and this is the maximum, so I have to place my hand such that people at the computer have a chance of seeing this. Oops. Okay, it doesn't help that everything kind of looks mirrored in the camera. So if this is the maximum, then this one will have a neighbor with a higher value, so it will not be the one that's a local maximum, and it will be suppressed. This one will also have a pixel with a higher value next to it. This is the only one that has two neighbors where both values are smaller than the value at this pixel, so this will be the only one that survives, right? Sorry, sorry for choosing the wrong thing; it's embarrassing, I didn't want to, it was thoughtless. Okay, does this answer your question?
Right. Sub-pixel positioning. What we can do is take the gradient magnitudes, all right, and this would be the position of the maximum here, and we can approximate the shape of the amplitude by a polynomial. The typical case is that you choose the two next neighbors, so you would take these three points and approximate them by a second-order polynomial. And then you would see that probably the peak of the polynomial would be somewhere to the left, because otherwise this magnitude here would be larger than that one. So we approximate these three points by a second-order polynomial, and then we go for the maximum of the polynomial. Having estimated the coefficients of the second-order polynomial of this shape, and of course this is a cross section in the direction of the gradient again, we can take the derivative of this second-order polynomial and set it to zero. This is the equation we get, and we can solve for s, which is the offset of the actual edge position in the direction of the gradient. In this case, s would be the offset from the center of this pixel here to the actual edge position where our polynomial has its maximum, and this is the actual position. So in this way, we can locate the edge with an accuracy that's better than one pixel.
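The parabola fit has a closed-form solution; a minimal sketch (my notation: g_minus, g_0 and g_plus are the magnitudes at the maximum pixel and its two neighbors along the gradient direction):

```python
def subpixel_offset(g_minus, g_0, g_plus):
    # Fit g(s) = a*s^2 + b*s + c through the points s = -1, 0, +1:
    #   a = (g_plus + g_minus - 2*g_0) / 2,  b = (g_plus - g_minus) / 2
    # Setting the derivative to zero gives the peak at s = -b / (2a).
    denom = 2.0 * (g_minus - 2.0 * g_0 + g_plus)
    return (g_minus - g_plus) / denom if denom != 0 else 0.0

# Example: neighbors 4 and 8 around a maximum of 10 shift the edge
# by +0.25 pixels towards the larger neighbor.
print(subpixel_offset(4.0, 10.0, 8.0))  # 0.25
```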
Any questions? Still, all we have is an edge image. So all we have here is a digital image where we know for every pixel whether it's an edge or not, but we don't know how the pixels belong together, and here I'll discuss edge tracking. It's a local technique, and there are many different flavors to it; by the way, this is not the only way in which this can be done, but it is a typical way. The technique is called edge tracing. We start with the first edge pixel we can find: we go through the image line by line, check every gray value, whether it's zero or one, and we stop at the first pixel that has a value of one, which is an edge pixel. Go to this pixel here. And now, starting from this pixel, we connect it to its neighbors.
So we check its eight neighbors, and we see whether there is another edge pixel. If there is one, we add it to the edge pixel string. We do the same here. Well, we find another edge pixel, so we add it to the edge pixel string. Same here, same here, right?
So now we already have a string of edge pixels starting here, and now we are here. What happens here? Here we see that we actually have two neighbors, or two directions we could follow. We could do one of two things: we could follow one direction and continue, or we could say, okay, obviously this is something like a junction point or a node point, and insert a node point here. And this is what I did here, so I insert a node point here and I stop tracking. Now, of course, we started here, and now we have an edge pixel string going from here to here. But of course, the fact that I started here was based on pure chance: it happened to be the first one in the image. But actually, as we see, it might be a good idea to follow in the other direction as well. So what one does is, one stops the tracing at this position, one inverts the order of the edge pixel chain, and then one continues the same thing in the other direction. So I add the next edge pixel, the next one, and the next one, and here, in this case, well, we are at the end of the edge. There is no neighboring edge pixel left, so we insert a node point and say we are finished here. And then we repeat the whole thing, starting from the next remaining edge pixel, which would be that one, and we track in this direction and in this direction. So in this case, we would have extracted two edge pixel strings.
So this is edge tracing. There are different flavors to it. For instance, in this case I just looked at the eight neighbors, but I could, of course, also consider the direction of the gradient and only look at the neighbors that are approximately orthogonal to the gradient direction, and so on and so forth. Right? So this is a local technique. I said already that we can use global techniques, such as the Hough transform, which I'll discuss in a bit; that's another way of doing it. So that's edge tracing, edge tracking.
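A minimal sketch of such a tracer on a binary edge image (numpy assumed; junction handling is simplified to stopping at any pixel with more than one continuation):

```python
import numpy as np

NBRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def _follow(edges, visited, r, c):
    string = [(r, c)]
    visited[r, c] = True
    h, w = edges.shape
    while True:
        nxt = [(r + dr, c + dc) for dr, dc in NBRS
               if 0 <= r + dr < h and 0 <= c + dc < w
               and edges[r + dr, c + dc] and not visited[r + dr, c + dc]]
        if len(nxt) != 1:      # dead end or junction: insert a node point, stop
            break
        r, c = nxt[0]
        visited[r, c] = True
        string.append((r, c))
    return string

def trace_edges(edges):
    visited = np.zeros(edges.shape, dtype=bool)
    strings = []
    for r in range(edges.shape[0]):          # scan the image line by line
        for c in range(edges.shape[1]):
            if edges[r, c] and not visited[r, c]:
                string = _follow(edges, visited, r, c)
                string.reverse()             # invert the chain, then continue
                string += _follow(edges, visited, r, c)[1:]
                strings.append(string)
    return strings
```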
Are we finished yet? Well, to a certain degree, but not quite. Remember we said, okay, I want to have Plato's nose, or the boundary of Plato's nose. Well, at least now, for all of these white pixels, I know how they are connected. But this is still at the very local level, right? So we still have the jagged pattern of the pixels in this edge pixel string, and we would like to approximate this using some mathematically defined functions. One type of function that is quite obvious: polygons. All right? For instance, assume this is your edge pixel string.
All right? So it's a connected set of edge pixels, always with a distance of one or 1.4 pixels between neighbors. And now we want to approximate this by a polygon. A very famous algorithm for doing so is the Douglas-Peucker algorithm; I hope I pronounce this correctly, I actually don't know. All right, so what do we do? We start by approximating this edge pixel string by a straight line from the starting point to the end point. Whether this is a good approximation, well, depends on the shape of the curve, of course. What we can do is check: what is the largest distance of any point on the original curve from this red approximation? So we select the point having the largest distance from the approximation. All right, and this will be this point here. And now we check how large this distance is. If it's below a given threshold, we say, okay, this is good enough. So let's say our threshold is one pixel: if the point that is furthest away from this straight line has a distance smaller than one pixel, that's good enough, and then we will be finished. But very often, the distance will be larger than such a threshold, and what we do then is introduce this point into the curve. This gives us the next approximation. What do we do next? We repeat this recursively: we repeat it for this area and we repeat it for this area. So we get these points having the maximum distance, and then we insert the corresponding points. And we repeat this recursively for all the polygon segments.
And at some stage, let's just assume that this will be the largest distance here, but that it's smaller than our threshold, so we're basically done. But of course, in this way the red curve is not yet an optimal approximation of the blue one. So what we do is compute adjusted straight lines fitted through all of the segments, and then we intersect these straight lines to give us the polygon vertices. And in this case, it may actually happen that some small parts that were inserted earlier disappear.
So we have to make sure that we get a consistent shape of our polygon. And now we have a vector representation of our edge. And this is already very much what we would like to have when we describe object boundaries. How do we describe object boundaries? Well, by polygons. In every map, objects are described as polygons; the polygons describe the boundaries. The same could be done for faces in images, or for cars. So this is actually a first step of vectorization: we go from a raster representation to a vector representation of our boundaries, which is very meaningful and brings us already closer to what we want to have at the end.
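A minimal sketch of the recursive splitting (numpy assumed; the final least-squares adjustment of the segments mentioned above is omitted):

```python
import numpy as np

def douglas_peucker(points, tol):
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]
    ab = b - a
    norm = np.hypot(ab[0], ab[1])
    if norm == 0:                       # degenerate: start equals end point
        norm = 1.0
    # Perpendicular distance of every point from the line through a and b.
    d = np.abs(ab[0] * (pts[:, 1] - a[1]) - ab[1] * (pts[:, 0] - a[0])) / norm
    i = int(np.argmax(d))
    if d[i] <= tol:                     # good enough: keep only the endpoints
        return np.array([a, b])
    left = douglas_peucker(pts[: i + 1], tol)   # recurse on both halves
    right = douglas_peucker(pts[i:], tol)
    return np.vstack([left[:-1], right])        # point i appears only once
```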
Now, if we apply this to the Plato picture, have we already identified Plato's nose? No way. Back to our example, looking at Plato's nose: what we know now is that we would have approximated this edge by a polygon, we would also have approximated this edge by a polygon, and this one, and all of the others as well. We still have to find out which of the polygons we have corresponds to the boundary of the nose we're looking for. But this is another story. This involves model knowledge; this is nothing we can do just like that. We could analyze all of these edges, and perhaps we have some kind of a model of what the edge of such a nose would look like, and then, based on this model knowledge, we can perhaps select the right one. But that's another story; for the time being, we do not yet know this. So we always have to be a bit careful. If we see this, we think, oh, it's pretty easy, right? Obviously this is it. Well, yes, sure, for us. The computer doesn't have our experience, right? We still have to move on and introduce more knowledge if we want to do such an interpretation task. But for the time being, we are already happy to have such a representation of every edge extracted in the image.
Any questions? Do we have to use polygons? Any opinions? Sorry, I didn't get you acoustically. Yes, yes. But perhaps we could also save memory, in other words, by using other types of curves, maybe. So it's quite common to use splines, for instance. Sometimes we even have an idea of the geometrical shape of the objects we're looking for; then we might look specifically for these curves. So it might make sense to just
extract straight line segments, if this is what our model requires, or ellipses or circles, and then we can apply a method such as the Hough transform, which I'll discuss in detail. So, as kind of a summary: how do we get from edge pixels,
from edge regions, to edges? Well, we first thin out the edge regions such that these streaks are one pixel wide, and we identify the nodes, which are the edge pixels that have either only one neighbor or more than two. Then we apply edge tracing,
and get these individual edge pixel streaks, each of them indicated by a different color here, and afterwards we approximate them accordingly. So these are the steps we have to carry out. Any questions?
Then, of course, we come back to your question about closed contours. It all works in the same way, with one exception, and this exception is right at the start. Because then, of course, the problem is: you're starting at a point, and the starting point and the end point are identical, so what can you do? Well, your contour is still represented by a vector of points. The vector has a certain length; you take half of that length, and this gives you the index of a point roughly opposite to the starting and end points. So here, A and B, these two points, would actually be identical. The blue line would go on like this, but you would choose the point somewhere near the middle, and the first approximation of the closed contour would be the connection of these two points; then you apply this procedure to both halves of the contour in the very same way. So the only difference with closed contours is really the start: you cannot start with the connection of the starting and the end point, because they are identical, so you select the start point, which is the same as the end point of course, and some point on the other side. And if you represent the edge pixel string by a vector, as I said earlier, then you just take the length of the vector, divide it by two, and this is the index into the vector, okay? More questions? Good question, by the way. Right. Global methods.
Those of you who are geodesists, there are not so many, I see one at least: we have seen this in my lecture on image processing already. The Hough transform is a method to search for curves of a given type of parametrization. So in this case, we're not looking for polygons; we're looking for simpler curves that can be described by a small number of parameters. And the classical case is to look for straight lines, of course. So if we look for a straight line, we need to define two parameters,
because a straight line in an image has two parameters, and a good parametrization is to use the length of the orthogonal vector from the origin of the coordinate system to our straight line and the direction of that orthogonal vector. So it's this form of the straight line equation: x times the cosine of the direction angle of the normal, plus y times the sine of the direction angle of the normal, minus rho, which is the distance of the straight line from the origin, equals zero. And the parameters of the straight line are rho and theta. Now, the Hough transform is applied to edge pixel images, typically after thinning. So every edge pixel situated on a straight line will satisfy this constraint.
Typically, if you know the straight line, this is an equation in the image space that connects all points on that straight line, that is, all points x and y. Now, we exchange our way of looking at this equation. If we have an edge pixel image, we know for every pixel whether it's an edge pixel. If we now consider x and y to be fixed, this becomes a function in a two-dimensional space whose axes are theta and rho. So actually, every pixel corresponds to some curve in this space, and this space is called the parameter space, because its axes correspond to the parameters of the curve we're looking for; in this case, the parameters of a straight line. And this equation suddenly becomes the equation of all straight lines that pass through our point at the coordinates x and y. This is called the pencil of straight lines passing through x, y, and in the other space, well, this corresponds to such a curve given by this equation for one point P at x, y. So the pencil of lines passing through our point P corresponds to such a curve in this parameter space. And here we also see: we have x and y, we have our vector, that's the orthogonal vector to our straight line; theta is the direction angle,
rho is the length of that vector, and thus it's also the distance of the straight line from the origin. Now, what happens if we have three points that are all on the same straight line? Then for every point, we get a curve like this in the parameter space; we get these three curves. And the interesting thing is that all of these curves will intersect in the same point, and this point corresponds to the straight line that connects our three points, because each of these pencils of lines will contain this red line here. This red line has a specific pair of parameters, rho zero and theta zero, so these are the parameters of this specific straight line, and all the pencils of lines indicated by these three edge pixels will intersect at this point. And this gives us a way of actually detecting straight lines, because what we do is, we define something like a digital image in this parameter space; we call this an accumulator,
we initialize all of its elements with zero, and then we go through our image and check all of the pixels. Wherever we have an edge pixel, we increase the value of the accumulator by one along the curve corresponding to that point in the parameter space, and we do this for the next edge pixel, and for the next one, and so on. And now, if we have straight lines with many edge pixels, then at the parameters of these straight lines, we will have local maxima in the accumulator. For instance, let's assume that we only had these three edge pixels; what would the accumulator be like after filling it? Well, here we would have a value of zero, along this curve we would have a value of one, here as well, here as well, except at this intersection point, where the accumulator would actually have a value of three. So we have to look for maxima in this accumulator; they correspond to the straight lines, and this is the way we extract them.
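A minimal sketch of filling the accumulator for straight lines (numpy assumed; the discretization of theta and rho is an arbitrary example):

```python
import numpy as np

def hough_lines(edges, n_theta=180, n_rho=200):
    ys, xs = np.nonzero(edges)                   # coordinates of edge pixels
    rho_max = np.hypot(*edges.shape)             # largest possible |rho|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in zip(xs, ys):
        # The pencil of lines through (x, y): one rho for every theta.
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.round((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), bins] += 1       # one vote per line
    return acc, thetas, rho_max                  # maxima in acc are lines
```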
Okay, any questions? It's a global technique; there is no local tracking here. Can we expand this to other curves as well? Yes, we can. The problem is we have to keep track of the number of parameters our curves have, because the dimensionality of the parameter space is identical to the number of parameters. So it's pretty simple with a straight line: it has two parameters, so it's a 2D accumulator. What about a circle? Ouch, three. So we'd have a 3D accumulator; well, it could be done. In particular, if we knew the radius, it would be pretty easy, right? So assume we have a circle,
well, if we have four edge pixels that are on the circle, what would be the corresponding curve in the parameter space? And what is the parameter space? If we keep the radius fixed, then our parameter space is again two-dimensional; its axes are the coordinates of the center, and each of these edge pixels corresponds to a circle in this parameter space. The point where all these circles intersect corresponds to the center of the circle. Of course, you could expand this to 3D and also make the radius of the circle a parameter; the third dimension would correspond to the radius. But this would already be a 3D Hough transform, and it would require a lot of memory: we suddenly have a 3D accumulator, a 3D array.
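A minimal sketch for the fixed-radius case (numpy assumed): the accumulator has the size of the image, and every edge pixel votes along a circle of the known radius around itself.

```python
import numpy as np

def hough_circles_fixed_radius(edges, radius, n_angles=360):
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=int)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    dy = np.round(radius * np.sin(angles)).astype(int)
    dx = np.round(radius * np.cos(angles)).astype(int)
    for y, x in zip(*np.nonzero(edges)):
        cy, cx = y + dy, x + dx                  # candidate circle centers
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)      # vote for all candidate centers
    return acc                                   # maxima are circle centers
```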
What about an ellipse? Ouch, five parameters, a five-dimensional accumulator space. Now it's getting more and more difficult, so we would only apply this to curves with a very small number of parameters. This is just an example for straight lines. So let's take this image; this is the norm of the gradient. We apply a thresholding of the gradient norm, we apply a thinning of the edge pixel regions, and then we get an accumulator like this. So this would be the result we have. Here we have the direction angle of the normal vector and the distance rho, and we see we do have some strong maxima, but they're always a bit blurred. So we have regions with high accumulator values, but they are always a bit blurred, so we really have to look for the local extrema in there, and they correspond to straight lines. And if you take the straight lines, you can superimpose them onto the image to check, and we see that this works quite well. Are we finished now?
The problem is: where do these edges stop? Where do these straight lines start and stop? Well, what we get is a pair of parameters. We do not know where in the image the straight line starts, and we do not know where it ends. As a matter of fact, as in this case, we may actually have two straight line segments that have the same parameters but perhaps should be split. So what we actually have to do next is go back to the image and check for support of these straight lines that are defined by a pair of parameter values, to find out where we actually have an edge in the image and where we don't. So for instance, this line here has to be cut to this segment from here to here, and the rest of it has to be deleted; and the same applies to all of the other straight lines. So you have to go back to the image space and look where we actually have support for each of these line segments.
Any questions? This is a very old principle; I think it was invented in the 1960s already. So, for what purposes do we need edges? Either we directly go for linear objects, or we go for object boundaries. Image matching would be another application: we might look for corresponding edges to get a reconstruction of edges in 3D. The requirements we have completely depend on the application; sometimes accuracy is more important than in other cases. This is pretty much the same thing. We also have this point: it's very important that the methods we apply are invariant against geometric and radiometric distortions. Radiometric distortions, well, for instance, if you apply the Canny operator, there it's all about the threshold, right? How do we choose the threshold? If we have a low contrast, we choose a lower threshold; if we have a strong contrast, a larger one. And this invariance is not so easy to get. The extracted edges are usually incomplete, so we do need some pre-processing.
It's always a good idea to smooth the images before extracting edges. And the interpretation itself is not a part of edge extraction; edge extraction just delivers a good representation of where object boundaries may be. But whether or not the edges we have detected are the boundaries of the objects we are searching for, this is something that needs more knowledge, and that is what happens in later parts of the process. And of course, the requirements usually depend on the application. Any questions? Done with edges?