Point Segmentation - Part IV
Formal Metadata

Title: Point Segmentation - Part IV
Number of Parts: 21
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/59685 (DOI)

Content Metadata

Subject Area: Image analysis
Part: 10 / 21
Transcript: English (auto-generated)
00:10
So hello, sorry for the break we made. We were discussing point extractors in the morning session, and we came up with a series of potential point extractors,
00:25
and then we started to ask ourselves how we can describe a point so that afterwards, by comparing the descriptions, we can find out which of the points correspond to each other.
00:42
Well, we said in the beginning that very often people took little image windows around a certain point and then compared these image windows; the problem is that this is not invariant with respect to scale and rotation,
01:02
and so of course people tried to come up with other ways of describing a point, and the one I will discuss now in this lecture is the SIFT descriptor. It is the basis that allows us
01:37
to kind of connect images together about which you do not know much else
01:43
except that they kind of overlap, so the image content overlaps; this is the information we have. Well, we actually don't even have to know this: we can also use the matching to find out whether they overlap at all, for instance
02:02
if you can't find any corresponding points, then probably they don't overlap. But in order to do so, we cannot expect to know anything about the images in terms of how they are related; the general point of this is to find out how they are related to each other. It can also be used for object detection, of course:
02:20
I mean, you take a picture of the object that you're looking for, you describe the features on this model of the object, and then you find the corresponding features in the search image; if you get a sufficient number of matches between your model image and the other image, you have detected the object, and also its position in the image.
02:40
That's the range of applications of this problem; this is where points are used. Basically, they are used for geometric purposes, so you can find out the relative poses of the images. We shall see that these descriptors
03:01
will pop up in a slightly different form also when we discuss handcrafted features for characterizing objects. So, we said:
03:29
SIFT descriptors. When people speak about SIFT features, I said this already, they usually mean points extracted by the DoG detector
03:41
combined with the SIFT feature descriptor. It is scale invariant, which is achieved by defining the descriptor at the scale of the point; the point being extracted in scale space, we know its scale. The descriptor is based on histograms of gradient directions, and we achieve
04:01
rotation invariance by referring all directions to the main direction of the point. This is called the orientation of the point, and it is just the maximum of a histogram of gradient orientations; if there are multiple maxima, we will have multiple points at the same position, all of them with different
04:21
orientations. So, for the orientation we compute a histogram of gradient directions: theta will be the direction of the gradient vector; we have one such angle at every position, and we compute it
04:42
in a local neighborhood of the point, defined in terms of pixels at the scale at which the point was extracted. Now we build a histogram with a bin width of ten degrees, so we have thirty-six
05:01
bins in our orientation histogram. Basically, we could just count how often every orientation angle occurs; that would be the standard way of computing a histogram. Instead, we don't count, but take the sum of the magnitudes of the gradients in each bin, which means that strong gradients
05:20
will have a high impact on the histogram, and this is what we want: the main orientation we refer to should be dominated by the strong gradients. Then we look for the maximum of this weighted histogram, and this is the main direction.
05:42
If we have multiple strong maxima, we have multiple points with different orientations. As the descriptor refers to the orientation of the point, the main gradient direction, we achieve rotation invariance: if we have the same object
06:01
and we take a picture like this and another picture like that, hopefully we find the main direction in the one image like this and in the other one like that, and then we can do the comparison, because the difference in the orientations compensates for the rotation about the viewing axis. This is what's meant by
06:21
rotation invariance.
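A minimal sketch of this weighted orientation histogram in Python; the helper name and the 0.8 threshold for accepting secondary maxima are assumptions for illustration, not values given in the lecture:

```python
import numpy as np

def main_orientations(patch, n_bins=36, peak_ratio=0.8):
    """Dominant gradient direction(s) of a patch, in degrees.

    36 bins of 10 degrees; each pixel votes with its gradient magnitude,
    so strong gradients dominate the histogram.  peak_ratio is an assumed
    threshold for accepting secondary maxima.
    """
    gy, gx = np.gradient(patch.astype(float))        # image gradients
    magnitude = np.hypot(gx, gy)                      # gradient magnitude
    theta = np.degrees(np.arctan2(gy, gx)) % 360.0    # gradient direction

    hist, _ = np.histogram(theta, bins=n_bins, range=(0.0, 360.0),
                           weights=magnitude)         # magnitude-weighted votes

    # Main direction = maximum of the weighted histogram; further strong
    # maxima yield additional orientations for the same point.
    peaks = np.flatnonzero(hist >= peak_ratio * hist.max())
    return (peaks + 0.5) * (360.0 / n_bins)           # bin centres in degrees
```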
07:09
For the descriptor, we first rotate the local grid so that its axis corresponds to the main direction; we really rotate the image content, at the scale at which the point was extracted,
07:21
and now, in this rotated grid, we consider a neighborhood of 16 by 16 pixels. Why? Why not! The descriptor is a handcrafted
07:42
feature vector, a descriptor vector designed by a person. Its inventor, Lowe, probably tried different values for these parameters and then said: okay, using 16 by 16 gives a reasonable tradeoff between speed and accuracy.
08:01
So it's a handcrafted feature vector; you can of course criticize the fact that this is only 16 by 16. Anyway, each of these blocks of 16 by 16 pixels is now split into further sub-blocks
08:20
of 4 by 4 pixels each, and we compute the gradients, their magnitudes and directions. As the whole image patch has been aligned with the main orientation, all of these directions and gradient vectors refer
08:42
to the orientation of that point. So we have altogether 16 such tiles, and what we do next is compute histograms: one histogram for each of these tiles, in the way described earlier,
09:01
so we do not just count how often every orientation occurs, but take the sum of the gradient magnitudes, so that strong gradients have a strong impact. These histograms have 8 entries, so the bin width is 45 degrees;
09:21
well, much more would not make sense: we only have 16 pixels in each tile, so we only have 16 gradient vectors per tile. What we see here is a kind of distribution: the directions here are the bin centers, and the length is the size of the
09:41
corresponding histogram entry. We have one such histogram for each of these tiles, and what we do then is take these histograms as vectors and stack them on top of each other, like this: we take the first histogram,
10:01
the histogram of the first tile, which is a vector with 8 elements, and put it at the start of our long descriptor vector; then we take the second one, that's the next 8 entries, this one the next 8 entries, and so on, until all 16 histograms are collected in one long
10:21
descriptor, which now has 128 elements. 128: that's 16 histograms times 8 entries. So it's a 128-dimensional descriptor which collects all of these histograms.
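A sketch of this descriptor assembly, simplified to what was described above; the real SIFT additionally weights the votes with a Gaussian window and distributes them over neighbouring bins, which is omitted here, and the helper name is an assumption:

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor of a 16x16 patch already rotated to the main
    orientation: 4x4 tiles, one 8-bin weighted histogram per tile."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.degrees(np.arctan2(gy, gx)) % 360.0

    histograms = []
    for i in range(0, 16, 4):                    # 4x4 grid of tiles ...
        for j in range(0, 16, 4):                # ... of 4x4 pixels each
            hist, _ = np.histogram(theta[i:i+4, j:j+4],
                                   bins=8, range=(0.0, 360.0),
                                   weights=mag[i:i+4, j:j+4])
            histograms.append(hist)              # stack tile histograms
    return np.concatenate(histograms)            # 16 tiles x 8 bins = 128
```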
10:41
These histograms are considered to be characteristic, representative for that point, and this is the descriptor that's stored. Now, after stacking all of these histograms there is actually a normalization step:
11:01
we normalize the vector so that it has unit norm. This is supposed to lead to some degree of invariance with respect to contrast changes: if you have poor contrast, in general your histogram entries would be smaller, and you compensate for this
11:20
by normalizing the vector: its length, its norm, is restricted to become 1. And this is the famous SIFT descriptor.
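In code, the normalization is a single line; this reuses the hypothetical sift_like_descriptor helper from the sketch above:

```python
d = sift_like_descriptor(patch)
d = d / np.linalg.norm(d)    # scale the length of the vector to 1
```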
11:40
Now, you can imagine that this type of distribution of the gradient orientations may not only be characteristic for a point extracted using some point detector; it might also be characteristic for objects, and this is an application of this type of vector as well. These kinds of vectors, based on histograms of gradient directions, are
12:01
very important features for describing objects. Any questions? So, that's the descriptor. Why 8 entries? Well, yet another handcrafted choice.
12:22
Why 4 by 4 tiles? There was certainly some comparison at some stage, and it turned out that this is a pretty good descriptor. Now, how do we measure the similarity of two descriptors?
12:48
We actually measure dissimilarity: if we use this descriptor, the measure we use for finding out whether two descriptors belong together or not is the Euclidean distance
13:02
of these descriptors. If two descriptors are similar, the Euclidean distance will be small, because the components will be close to each other and there will be no large differences in the components. So we can compile all of these
13:20
differences into one distance, and that's what's commonly used in matching systems. The Euclidean distance between two descriptors is computed by taking the
13:42
differences of the components, squaring them, summing them up, and taking the square root. So the entries that we compare here are sums of gradient magnitudes falling into a specific direction bin at a specific position relative to the point.
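As a formula in code, with d1 and d2 two 128-element descriptor vectors:

```python
import numpy as np

def descriptor_distance(d1, d2):
    """Euclidean distance: component-wise differences, squared, summed,
    and finally the square root."""
    return np.sqrt(np.sum((d1 - d2) ** 2))   # same as np.linalg.norm(d1 - d2)
```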
14:01
So, the first feature: this is our point, and the first feature is the sum of all gradient magnitudes for which the gradient direction falls into this first bin; this is the first entry, and it's also the first element
14:22
on the feature axis. The second one is this magnitude here, and so on and so forth: third, fourth, and so on, seventh, eighth, ninth, tenth, and so on,
14:41
and after that the whole thing is normalized such that its norm becomes 1.
15:01
And it still is; it was really a major step forward in computing point descriptors, in the characterization of points. So once more: for every point, we
15:38
consider the 16 by 16 pixels around it; for every pixel
15:41
we have the gradient direction and the gradient magnitude. For each of the blocks of 4 by 4 pixels we compute the histogram and take its elements, which we consider to be a vector: on the x-axis we have the orientation bins, and there we have the sums of the magnitudes of all gradients in the bin,
16:01
referring to this 4 by 4 pixel tile; and we have one for the next tile, and for the next one. We stack all of these orientation histograms on top of each other, 16 of them, in this order, and then we normalize the whole thing, so that we arrive at the final descriptor.
16:21
We do this for every point we extract from the first image, and we do this for every point we extract from the second image. Then we build a spatial index from the descriptors of the first image, and we take every descriptor from the second image and search for its nearest neighbor among the descriptors of the first image.
16:51
We also take the distance to the second nearest neighbor, because a match is only trustworthy if there is a large distance to the second nearest neighbor compared to the nearest point.
17:20
The nearest neighbor in feature space is the nearest point in this space having one hundred and twenty-eight dimensions, the feature space of the descriptors.
17:56
Hopefully that is the same point in the other image, but we can't be sure,
18:02
and one indication that it's a wrong match is that there is another point which is just as similar, or only a bit less similar. What you typically do in these cases is compute the ratio of the distance to the nearest point and the distance to the second nearest point, and if that ratio is,
18:21
let's say, 0.9, then you say: okay, no, I don't trust this match. If this ratio is around 0.5, one is twice as far away as the other, so you can be pretty sure about this feature. But even if you only use such points,
18:41
you always have to take into account that you may still have outliers, may still have wrong matches, and these then have to be eliminated in some other way, for instance when we estimate the geometric transformation between the images from the feature matches.
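A sketch of this matching procedure, assuming SciPy's kd-tree as the spatial index and an assumed ratio threshold of 0.8 (the lecture only says that 0.9 is not trustworthy and 0.5 clearly is):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc1, desc2, max_ratio=0.8):
    """Nearest-neighbour matching with a ratio test.

    desc1: (N1, 128) descriptors of image 1, desc2: (N2, 128) of image 2.
    Returns index pairs (i2, i1); wrong matches may remain and must be
    removed later, e.g. while estimating the geometric transformation.
    """
    tree = cKDTree(desc1)                  # spatial index over image 1
    dist, idx = tree.query(desc2, k=2)     # two nearest neighbours each
    ratio = dist[:, 0] / dist[:, 1]        # nearest vs. second nearest
    keep = ratio < max_ratio               # reject ambiguous matches
    return np.column_stack([np.flatnonzero(keep), idx[keep, 0]])
```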
19:11
A question from the audience: what happens if the size of the image doesn't divide,
19:22
if it's not divisible by 16? Will we then only be able to compute some of the histograms? Well, the window is only 16 by 16 pixels; but if the image were, for example,
19:44
27 pixels wide? Images you take with a camera are normally much larger. And even if your image is small, it's not necessarily too small,
20:01
even if it doesn't divide into tiles exactly. The 16 by 16 window is not a tiling of the image: you have a certain image and this is your extracted point, so you always take the 16 by 16 neighborhood around that point. The only case where this
20:23
wouldn't fit is if your point were too close to the image border, but this is a case you exclude; or if your image were so small that the neighborhood would be larger than the image, but then I would argue it doesn't make sense to connect these images anyway, because what you see in 16 by 16 pixels is exactly nothing.
20:40
More questions? So, what are applications of point extraction? Well, image orientation, of course,
21:01
that is, computing the alignment of two images with respect to each other, which we want to determine from the points. Then the determination of parallaxes: we pick two images and want to find out which points belong together, so that we can then compute the 3D object coordinates
21:21
of the points by spatial intersection; the parallax is the difference in the x coordinates, which can be translated into the distance from the camera. And in general, image matching: image matching means we want to find conjugate points between pairs of images.
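For the normal case of a stereo pair (parallel viewing directions, baseline perpendicular to them), this translation is the classical relation Z = c * B / p; a tiny worked example with made-up numbers:

```python
def distance_from_parallax(parallax, base, focal):
    """Normal stereo case: distance = focal length * base / x-parallax.
    focal and parallax in the same units, base in metres."""
    return focal * base / parallax

# made-up numbers: focal length 100 mm, base 60 m, x-parallax 10 mm
print(distance_from_parallax(10.0, 60.0, 100.0))   # -> 600.0 metres
```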
21:40
The requirements depend on the application: sometimes geometric accuracy, localization accuracy, is very important; another requirement is uniqueness or rarity. It's always problematic if we have points or patterns that occur many, many times in the image.
22:00
Imagine a zebra crossing: if you extract a corner of the zebra crossing, the next corner of the same zebra crossing looks the same, and the corner of the zebra crossing further on also looks exactly the same. These are not good points. Having said that, well, how can you differentiate them?
22:21
Not really. So we will have to deal with the fact that we get wrong matches; that's a fact of life. And in general, we want to have invariance against geometric and radiometric distortions. Now SIFT: against which of these distortions is it invariant? Well, it is invariant
22:41
to some degree to radiometric changes. If the brightness changes, well, that's covered by the fact that we only use gradients, so the brightness change will cancel out. Contrast changes, well, that's partly compensated by the normalization of the feature vector. Other than that, well, you can have very
23:02
large radiometric differences, for instance if people try to connect an image taken in daytime with one taken at night; that's hopeless. There are other methods with which you can achieve this, but with SIFT you can't do this properly, it doesn't work at all.
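This can be made concrete with the hypothetical sift_like_descriptor sketch from above: an additive brightness change drops out in the gradients, and a multiplicative contrast change drops out in the normalization.

```python
import numpy as np

d_ref = sift_like_descriptor(patch)
d_ref = d_ref / np.linalg.norm(d_ref)

# simulated radiometric change: contrast scaled, brightness shifted
d_rad = sift_like_descriptor(0.6 * patch + 40.0)
d_rad = d_rad / np.linalg.norm(d_rad)

assert np.allclose(d_ref, d_rad)   # the descriptor is unchanged
```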
23:21
What about geometric distortions? Well, it's rotation invariant. It's scale invariant to some degree: experience shows that the scale difference can't be very large, otherwise it doesn't work very well anymore, but it is scale invariant to some degree. Is it affine invariant?
23:42
It is affine invariant in the expanded version, the ASIFT version, where we have the ASIFT detector; you remember, there you simulate images taken from different viewing directions, and then of course if you extract
24:01
your SIFT descriptors from the simulated images, each is also representative for a specific viewing direction, and this means that this variant of the SIFT descriptor would be affine invariant; but the standard SIFT is not affine invariant.
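The idea behind the affine invariant variant can be sketched roughly as follows, assuming OpenCV with SIFT available; a real ASIFT implementation samples rotations as well as tilts and maps the keypoints back to the original image frame, which is omitted here:

```python
import cv2

def keypoints_on_simulated_views(image, tilts=(1.0, 1.4, 2.0)):
    """Detect SIFT features on views simulated for different tilts,
    here crudely modelled as a compression along x."""
    sift = cv2.SIFT_create()
    results = []
    for t in tilts:
        h, w = image.shape[:2]
        view = cv2.resize(image, (int(w / t), h))        # simulated tilt
        kp, desc = sift.detectAndCompute(view, None)     # SIFT per view
        results.append((t, kp, desc))
    return results
```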
24:20
With very different viewing directions, you can see that the number of matches you get between your SIFT features goes down. I mentioned this in the morning: together with a former PhD student, we used oblique aerial cameras for this experiment; he was working on
24:40
affine invariant descriptors for image matching. There are aerial camera systems out there where you do not only take images in the vertical viewing direction: you typically have five cameras, one looking downwards, two looking sideways with an angle of,
25:01
say, 35 degrees, and two looking in forward and backward oblique directions. There you have of course very different viewing directions, and it showed that using affine invariant descriptors you get a lot more matches, which means that if you estimate geometric parameters,
25:21
the estimates are much smoother, the estimation process is much more stable, and the quality of the parameters is much better. Well, there is a test, I mean a test made some time ago, I have to say. There is a colleague
25:42
from Italy, Roberto, who compared different point extractors and descriptors for matching. He concluded that single-scale point detectors, in particular the Förstner operator, provide a higher accuracy if the images are suitable,
26:00
that means if the scale difference between the images is not very large. Then of course you can use the single-scale point detectors to find points suitable for matching, and if you do so, the accuracy of these points is higher.
26:22
The most accurate one was the Förstner detector, but it's relatively slow, because you have this estimation procedure for the sub-pixel positions of the points. He also compared affine invariant detectors;
26:41
of course, if you use an affine or scale invariant detector, you can also connect images which are not taken from the same viewing direction or at the same scale, which you can't do very well with the Förstner detector. Then you still have the somewhat limited accuracy, but you can improve this if you apply least-squares matching:
27:01
if you have established the relation between corresponding points, you can then take local image patches, initialize the transformation with the known orientation difference, and then locally adapt the transformed image patch,
27:21
which has already been corrected for these large changes in the geometry, and you can achieve a fine localization of the point in the other image. You can actually improve the geometric quality to the level which you can achieve using the single-scale point detectors, but in a more general scenario than the one in which those detectors are applicable.
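One building block of least-squares matching, as a sketch: a single Gauss-Newton step for a pure translation between a template and an approximately aligned patch. Real least-squares matching iterates this and also estimates affine and radiometric parameters; the function name is an assumption.

```python
import numpy as np

def refine_shift(template, patch):
    """One least-squares step for the sub-pixel shift (dx, dy) that moves
    `patch` onto `template` (both arrays of equal shape).

    Linearization: patch(x + d) ~ patch(x) + gx*dx + gy*dy, then the
    normal equations (A^T A) d = A^T e are solved for d."""
    gy, gx = np.gradient(patch.astype(float))
    e = (template.astype(float) - patch.astype(float)).ravel()   # residuals
    A = np.column_stack([gx.ravel(), gy.ravel()])                # design matrix
    dx, dy = np.linalg.solve(A.T @ A, A.T @ e)
    return dx, dy
```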
27:41
Now, all of these tests were based on images that were suited for the single-scale point detectors, so it's a bit difficult to generalize. What would have happened if they had had strong scale differences? Either
28:01
the results would have looked different, or it wouldn't have worked in the first place. So simple point detectors do not work well when we have different resolutions and large differences in viewing directions;
28:20
for those we can use scale invariant and affine invariant detectors. SIFT features turned out to be very suitable, so you could say there is a pre-SIFT era and a post-SIFT era in terms of point-based matching; it was really a major step forward.
28:43
We briefly mentioned SURF. SURF is based on similar principles, but it uses moving average filters throughout, which it computes in a very efficient way, and the descriptor also uses features that can be computed in a very efficient way,
29:00
so it's much faster to compute. And this is still a pretty active field of research, so other detectors and descriptors are still being developed; currently the focus is on learning extractors and descriptors from training samples,
29:21
and here of course everybody who works in this field uses deep learning approaches. (A question about how SURF differs from SIFT.) Well, in the scale space analysis
29:48
you can see that both are quite similar, but SURF uses moving average filters, and if you compute these in a very efficient way, you can just do all the calculations faster. SURF also uses the Hessian detector
30:01
and not the DoG detector at the different scale levels. But other than that, the whole thing is essentially SIFT simplified: doing away with some of the stuff that's required by theory, such that it gets faster.
30:20
The descriptor goes in a similar direction: there is something similar to the orientation histograms, but the features are computed in a very efficient way,
30:41
so it's the same principle, but all the calculations are simplified so that they get faster. It's still based on gradient orientations, but the gradients at the different scales are computed in a different way.
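The efficiency trick behind these moving average (box) filters is the integral image: after one pass of cumulative sums, any box sum costs four array lookups, independent of the box size. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Cumulative sums along both axes."""
    return img.astype(float).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1], in O(1) via the integral image ii."""
    s = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        s -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        s -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s
```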
31:07
So, that's what I wanted to say about point extractors.