
Point Segmentation - Part IV


Formal Metadata

Title: Point Segmentation - Part IV
Number of Parts: 21
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Transcript: English (auto-generated)
So hello. We were discussing point extractors in the morning session, and we came up with a series of potential point extractors. Then we started to ask ourselves how we can describe a point so that afterwards, by comparing the descriptions, we can find out which of the points correspond to each other. Well, we said in the beginning that very often people took little image windows around a point and then compared these image windows, and the problem is that this is not invariant with respect to scale and rotation. So of course people tried to come up with other ways of describing a point, and the one I will discuss now in this lecture is the SIFT descriptor.
It allows us to connect images together about which we do not know much else, except that they overlap, so that the image content overlaps; this is the information we have. Actually, we don't even need to have that: we can also use it to find out whether two images overlap at all, because if we can't find any corresponding points, then probably they don't. But in order to do so, we cannot expect to know anything about how the images are related to each other; the general point of this is precisely to find out how they are related. It can also be used for object detection, of course: you take a picture of the object you are looking for, you describe the features on this model of the object, and then you find the corresponding features in the new image; if you get a sufficient number of matches between the model and the other image, you have detected the object in that image. That's the range of applications of this problem. This is where points are used: basically they are used for geometric purposes, so that you can find out the relative poses of images. We shall see that these descriptors will pop up in a slightly different form also when we discuss handcrafted features for characterizing objects.
So, the SIFT descriptor. When people speak about SIFT features — I said this already — they usually mean points extracted with the DoG detector combined with the SIFT feature descriptor. It is scale invariant, which is achieved by defining the descriptor at the scale of the point: the point is extracted in scale space, so we know its scale. The descriptor is based on histograms of gradient directions, and we achieve rotation invariance by referring all directions to the main direction at the point; this is called the orientation of the point, and it is simply the maximum of a histogram of gradient orientations. If there are multiple maxima, we generate multiple points at the same position, all of them with different orientations.
So, orientation: we compute a histogram of gradient directions, where theta is the direction of the gradient vector. We have one such angle at every pixel position, and we compute it in a local neighbourhood of the point, defined in terms of pixels at the scale at which the point was extracted. We build this histogram with a bin width of ten degrees, so we have thirty-six bins in our orientation histogram. Now, basically we could just count how often every orientation angle occurs — that would be the standard way of computing a histogram — but we don't count; instead we take the sum of the magnitudes of the gradients in each bin, which means that strong gradients will have a high impact on the histogram. And this is what we want: the main orientation should be dominated by the strong gradients. Then we look for the maximum of this weighted histogram, and this is the main direction. If we have multiple strong maxima, we generate multiple points with different orientations. As the descriptor refers to the orientation of the point, that is, to the main gradient direction, we achieve rotation invariance: if we have the same object and take one picture like this and another picture like this, we determine the main direction in the one image and in the other, and then we can do the comparison, because the difference in the orientations compensates for the rotation about the viewing axis. This is what is meant by rotation invariance.
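To make the orientation assignment concrete, here is a minimal NumPy sketch of the idea just described — my own illustration, not the original implementation; the Gaussian weighting window used in practice is omitted, and the peak_ratio threshold for accepting secondary maxima is an assumed parameter:

    import numpy as np

    def dominant_orientations(patch, n_bins=36, peak_ratio=0.8):
        # Gradient direction and magnitude at every pixel of the patch.
        gy, gx = np.gradient(patch.astype(float))
        theta = np.degrees(np.arctan2(gy, gx)) % 360.0   # direction in [0, 360)
        mag = np.hypot(gx, gy)                           # gradient magnitude

        # Magnitude-weighted histogram: bin width 360 / n_bins = 10 degrees.
        hist, _ = np.histogram(theta, bins=n_bins, range=(0.0, 360.0),
                               weights=mag)

        # Main direction = maximum; strong secondary peaks give extra
        # orientations (and thus extra points at the same position).
        peaks = np.where(hist >= peak_ratio * hist.max())[0]
        bin_width = 360.0 / n_bins
        return [(p + 0.5) * bin_width for p in peaks]    # bin centres in degrees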
The axis of the descriptor grid corresponds to the main direction, so we really rotate the image content, at the scale at which the point was extracted. And now, in this rotated grid, we consider a neighbourhood of 16 by 16 pixels. Why? Well, why not? The descriptor is a handcrafted feature vector; the person who invented it could have chosen different values for these parameters, and he found that 16 by 16 gives a reasonable tradeoff between speed and accuracy. So it's a handcrafted feature vector, and you can of course criticize the fact that this is only 16 by 16. Anyway, this block of 16 by 16 pixels is split into further sub-blocks of 4 by 4 pixels each, and we compute the gradients, that is, the magnitudes and the directions. As the whole image patch has been aligned with the main orientation, all of these gradient directions refer to the orientation of the point. So we have altogether 16 such tiles, and what we do next is compute histograms: one histogram for each of these tiles, in the way described earlier. We do not just count how often every orientation occurs, but take the sum of the gradient magnitudes, so that strong gradients have a strong impact. These histograms have 8 entries, so the bin width is 45 degrees. Well, more would not make sense: we only have 16 pixels in each tile, so we only have 16 gradient vectors per tile. What we see here is a kind of distribution: the directions are the bin centres, and the length is the value of the corresponding histogram bin, and we have one such histogram for each of these tiles. What we do then is take these histograms as vectors and stack them on top of each other: we take the histogram of the first tile, which is a vector with 8 elements, and put it at the top of a long descriptor vector; then we take the second one, which gives the next 8 entries, this one the next 8 entries, and so on, until all 16 histograms are collected in one long descriptor, which then has 128 elements — that is, 16 histograms times 8 entries. So it is a 128-dimensional descriptor which collects all of these histograms.
These histograms are considered to be characteristic, or representative, for that point, and this is the descriptor that is stored. After stacking all of these histograms there is actually a normalization: we normalize the vector so that it has unit length. This is supposed to lead to some degree of invariance against contrast changes: if you have poor contrast, your gradient magnitudes will in general be smaller, and you compensate for this by normalizing, so that the length, the norm of the vector, becomes 1. And this is the famous SIFT descriptor. You can imagine that this type of distribution of the gradient orientations may not only be characteristic for a point extracted by some point detector; it may also be characteristic for objects, and this is indeed an application of this type of vector as well: such vectors based on histograms of gradient directions are very important features for describing objects. Any questions? So, that's the descriptor. Why 8 entries? Well, yet another parameter choice. Why 4 by 4 tiles? There was certainly some comparison at some stage, and it turned out that this is a pretty good descriptor.
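As an illustration of the whole construction, here is a simplified sketch, assuming the 16 by 16 patch has already been rotated to the main orientation. Real SIFT additionally applies a Gaussian weighting window and trilinear interpolation between bins, which are omitted here:

    import numpy as np

    def sift_like_descriptor(patch):
        # Expects an oriented 16x16 grayscale patch.
        assert patch.shape == (16, 16)
        gy, gx = np.gradient(patch.astype(float))
        theta = np.degrees(np.arctan2(gy, gx)) % 360.0   # gradient directions
        mag = np.hypot(gx, gy)                           # gradient magnitudes

        desc = []
        for i in range(0, 16, 4):          # 4 rows of 4x4 tiles
            for j in range(0, 16, 4):      # 4 columns of 4x4 tiles
                t_theta = theta[i:i+4, j:j+4].ravel()
                t_mag = mag[i:i+4, j:j+4].ravel()
                # 8-bin, magnitude-weighted histogram (bin width 45 degrees).
                hist, _ = np.histogram(t_theta, bins=8, range=(0.0, 360.0),
                                       weights=t_mag)
                desc.extend(hist)          # stack the 8 entries of this tile

        desc = np.asarray(desc, dtype=float)    # 16 tiles * 8 bins = 128 elements
        norm = np.linalg.norm(desc)
        return desc / norm if norm > 0 else desc    # normalize to unit length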
Now, how do we measure the similarity of two descriptors? Actually we measure dissimilarity: with this descriptor, the measure we use for finding out whether two descriptors belong together or not is the Euclidean distance between the descriptors. If two descriptors are similar, the Euclidean distance will be small, because the corresponding components will be close to each other; for wrong matches we get large differences in the components. We can compile all of these component differences into one distance, and that's what is commonly used in matching systems: we take the differences of the components, square them, sum them up, and take the square root.
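Written out, for two descriptors u and v the distance is

    d(u, v) = \sqrt{ \sum_{i=1}^{128} (u_i - v_i)^2 }

and the two descriptors are considered similar when this distance is small.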
Remember what the entries are that we use here: they are sums of gradient magnitudes falling into a specific direction bin at a specific position relative to the point. So, this being our point, the first feature is the sum of all gradient magnitudes in the first tile for which the gradient direction falls into the first bin — this is also the first element on the feature axis; the second one is the magnitude sum of the next bin, and so on and so forth: third, fourth, ... seventh, eighth, then ninth, tenth for the next tile, and so on. And after that the whole thing is normalized such that the norm becomes 1. This was a big deal — and it still is: it was really a major step forward in computing point descriptors, in the characterization of points.
So, once again: for every point we consider 16 by 16 pixels; for every pixel we have the gradient direction and the gradient magnitude; for each of the blocks of 4 by 4 pixels we compute the histogram and take its elements, which we consider to be a vector: on the x-axis we have the orientation bins, and there we have the sums of the magnitudes of all gradients in the bin, referring to this 4 by 4 pixel tile. We have one such histogram for the next tile and for the next one; we stack all of these orientation histograms on top of each other, 16 of them, in this order, and then we normalize the whole thing — and so we arrive at the descriptor.
We do this for every point we extract from the first image, and we do this for every point we extract from the second image. Then we build a spatial index from the descriptors of the first image, and we take every descriptor from the second image and search for its nearest neighbour among the descriptors of the first image. We also determine the distance to the second nearest neighbour. The nearest neighbour in the feature space — the space having a hundred and twenty-eight dimensions, the feature space of the descriptors — is hopefully the same point in the other image, but we can't be sure. One indication that it is a wrong match is that there is another point which is similar, only a bit less similar. So what you typically do is compute the ratio of the distance to the nearest neighbour and the distance to the second nearest neighbour. If that ratio is, let's say, 0.9, then you say: no, I don't trust this match; if the ratio is somewhere around 0.5 — one is twice as far away as the other — you can be pretty sure about the match. Even so, you always have to take into account that you may still have wrong matches, which then have to be eliminated in some other way, for instance when we estimate the geometric transformation between the images based on the feature matches.
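A brute-force sketch of this matching procedure with the ratio test might look as follows. In practice a spatial index such as a k-d tree replaces the full distance computation; the ratio threshold is a tuning parameter, here set to the lecture's example value of 0.5:

    import numpy as np

    def match_descriptors(desc1, desc2, ratio=0.5):
        # desc1: (N, 128) descriptors of image 1 (needs N >= 2),
        # desc2: (M, 128) descriptors of image 2.
        matches = []
        for i, d in enumerate(desc2):
            dist = np.linalg.norm(desc1 - d, axis=1)    # Euclidean distances
            order = np.argsort(dist)
            nearest, second = dist[order[0]], dist[order[1]]
            # Accept only if the nearest neighbour is clearly better than
            # the second nearest; otherwise the match is ambiguous.
            if nearest < ratio * second:
                matches.append((order[0], i))           # (index in image 1, image 2)
        return matches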
[Question from the audience: what happens if the size of the image is not divisible by 16 — if the image is, for example, 27 pixels wide?] Well, the image does not have to divide into tiles. You can always do it: you have a certain image and you have your extracted point, and you take the 16 by 16 neighbourhood around that point. The only case where this wouldn't fit is if the point is too close to the image border — but this is a case you exclude — or if your image were so small that the neighbourhood would be larger than the image; but then I would argue it doesn't make sense to try to connect such images anyway, because what you see is 16 by 16 pixels and exactly nothing more.
More questions? So, what are the applications of point extraction? Well, image orientation, of course — that is, computing the alignment of two images with respect to each other, which we want to derive from the points. Then the determination of parallaxes: we take two images and want to find out which points belong together, so that we can then compute the 3D object coordinates of the points by intersection; the parallax is the difference in the x coordinates, which can be translated into a distance from the camera. And, in general, image matching — image matching means we want to find conjugate points between pairs of images.
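For the stereo normal case this is the familiar parallax equation: with baseline B, principal distance c and x-parallax p = x' − x'', the distance from the camera is

    Z = \frac{c \cdot B}{p}

so large parallaxes correspond to points close to the cameras.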
The requirements depend on the application. Sometimes geometric accuracy — localization accuracy — is very important. Another requirement is uniqueness, or rarity: it is always problematic if we have points or patterns that occur many, many times in the image. Imagine a zebra crossing: if you extract a corner of the zebra crossing, the next corner of the same zebra crossing looks exactly the same, and the corner of the next zebra crossing further on also looks exactly the same. These are not good points — how can you differentiate between them? Not really. So we will have to deal with the fact that we get wrong matches. And in general we want to have invariance against geometric and radiometric distortions. Now, against which of these distortions is SIFT invariant?
Well, it is invariant to some degree to radiometric changes. If the brightness changes, that is covered by the fact that we only use gradients, so a brightness change will cancel out; contrast changes are partly compensated by the normalization of the feature vector. Other than that, if you have very large radiometric differences and you try to connect such images, it is hopeless — there are other methods with which you can achieve this, but with SIFT it just doesn't work. What about geometric distortions?
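The reason can be seen directly from a simple linear radiometric model: if I'(x) = a · I(x) + b with a > 0, then

    \nabla I' = \nabla (a I + b) = a \, \nabla I

so the brightness offset b vanishes in the gradients and the gradient directions are unchanged, while the common contrast factor a scales all magnitudes equally — and is therefore removed again by the unit-length normalization d → d / ‖d‖ of the descriptor.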
Well, it is rotation invariant, and it is scale invariant to some degree: experience shows that the scale difference can't be very large — then it doesn't work very well anymore — but it is scale invariant to some degree. Is it affine invariant? It is affine invariant in the extended version, the ASIFT version, where we have the ASIFT detector: you remember, there you simulate images taken from different viewing directions, and if you extract your SIFT descriptors from the simulated images, each descriptor is also representative for a specific viewing direction. This means that this variant of the SIFT descriptor is affine invariant, but standard SIFT is not affine invariant.
And with very different viewing directions you can see that the number of matches you get with SIFT goes down. I mentioned a colleague this morning; together with her former PhD student we used oblique aerial cameras for an experiment — he was working on affine invariant descriptor matching. There are aerial camera systems out there where you do not only take images in the vertical viewing direction: you typically have five cameras, one looking downwards, two looking sideways at an angle of, say, 30 to 45 degrees, and two looking in forward and backward oblique directions. There you have, of course, very different viewing directions, and the experiments showed that using affine invariant descriptors you get a lot more matches, which means that if you estimate geometric parameters from them, the estimation process is much more stable and the quality of the parameters is much better.
Well, there was also a comparison made some time ago, by a colleague from Italy, Roberto, who compared different point extractors and descriptors for matching. He concluded that single-scale point detectors, in particular the Förstner operator, provide a higher accuracy if the images are suitable — that means, if the scale difference between the images is not very large. Then, of course, you can use the single-scale point detectors to find points suitable for matching, and if you do so, the accuracy of these points is higher. The most accurate one was the Förstner detector, but it is relatively slow, because you have this estimation procedure for the sub-pixel positions of the points. He also compared affine invariant detectors.
Of course, if you use an affine or scale invariant detector, you can also connect images which are not taken from the same viewing direction or at the same scale — which you can't do very well with the Förstner detector. Then you still have a somewhat limited accuracy, but you can improve this if you apply least squares matching: once you have established the relation between corresponding points, you can take local image patches, initialize the transformation with the known orientation difference, and then locally adapt the transformed image patch, which has already been corrected for the large changes in the geometry. In this way you achieve a fine localization of the point in the other image, and you can actually improve the geometric quality to the level which you can achieve with the single-scale point detectors — but in scenarios in which those detectors could not have been applied directly.
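A common way to write this least squares refinement (the lecture does not give the formula explicitly) is: estimate the parameters of an affine transformation A, t between the template patch g1 and the search image g2 by minimizing the sum of squared grey value differences over the patch,

    \hat{A}, \hat{t} = \arg\min_{A,\, t} \sum_{x \in \text{patch}} \big( g_1(x) - g_2(A x + t) \big)^2

initialized with the scale and orientation differences already known from the descriptor matches; radiometric parameters can be estimated along with the geometric ones.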
Now, all of these tests were based on images that were suited for the single-scale point detectors, so it is a bit difficult to generalize: if there had been strong scale differences, the single-scale detectors wouldn't have worked in the first place, and the results would have looked different. So simple point detectors do not work well with images having very different resolutions or large differences in viewing directions.
For these cases we can use scale invariant and affine invariant detectors, and SIFT features turned out to be very suitable; you could say there is a pre-SIFT era and a post-SIFT era in terms of point-based matching — in my opinion, it was really a major step forward. We also briefly mentioned SURF. SURF is based on similar principles, but it uses moving average filters, which it computes in a very efficient way, and the descriptor also uses features that can be computed very efficiently, so it is much faster to compute. This is still a pretty active field of research: other detectors and descriptors are still being developed, and currently the focus is on learning detectors and descriptors from training samples,
and here, of course, everybody who works in this field uses deep learning approaches. [Question from the audience about how SURF differs from SIFT.] Well, in the scale space analysis SURF uses moving average filters, and if you compute these in a very efficient way, you can simply do all the calculations faster; it also uses the Hessian detector and not the DoG detector at the different scale levels. But other than that — I mean, I think the motivation was: let's simplify everything, let's do away with some of the things required by theory, so that we can make it faster. And the descriptor is something similar: the features are computed in a very efficient way, so it is the same principle, but all the calculations are simplified so that they get faster. It is still based on gradient orientations, but the gradients at the different scales are computed in a different way.
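The efficiency trick behind these moving average (box) filters is the integral image; a minimal sketch of the idea (my illustration, not SURF's actual code):

    import numpy as np

    def integral_image(img):
        # Cumulative sum table: ii[r, c] = sum of img[0..r, 0..c].
        return img.astype(float).cumsum(axis=0).cumsum(axis=1)

    def box_sum(ii, r0, c0, r1, c1):
        # Sum of img[r0:r1+1, c0:c1+1] from four lookups in the integral
        # image; the cost is constant regardless of the filter size.
        # Border handling is omitted for brevity.
        total = ii[r1, c1]
        if r0 > 0:
            total -= ii[r0 - 1, c1]
        if c0 > 0:
            total -= ii[r1, c0 - 1]
        if r0 > 0 and c0 > 0:
            total += ii[r0 - 1, c0 - 1]
        return total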
So, that's what I wanted to say about point extractors.