Postprocessing
Formal Metadata

Title: Postprocessing
Number of parts: 21
License: CC Attribution 3.0 Germany: You may use, modify, reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers: 10.5446/59674 (DOI)
Series: Image analysis, part 21 of 21
Transcript: English (automatically generated)
00:10
Hello, good morning everybody. Welcome to the morning session of image analysis one.
00:23
And you've seen this slide a couple of times already. It kind of tells us where we are in the bigger scheme of things. As far as model-based image analysis is concerned, we are still in this region where we discuss segmentation,
00:41
where we extract features from the images to get an abstract image representation. We are in these, what we call the mid-level processes. We have seen quite a few segmentation algorithms in the last couple of weeks. Last week we discussed a refinement based on the snakes algorithm.
01:03
And now it's about post-processing. The problem we have when we apply some kind of segmentation, independently of which algorithm we use, independently of what we want to extract, points, edges, regions,
01:20
we will never exactly get what we want. Some things we would like to have will always be missing. Some things that are there are completely irrelevant for our task. For instance, let's assume my goal is to reconstruct this house in 3D as a vector model.
01:44
And I want to do so using edges. To do 3D, I would need a second picture anyway. But let's take one step back and say, okay, now I'm gonna apply an edge detector, and these are the edges I get. Which are the edges that would be relevant
02:01
for my reconstruction task. It would be all the edges that define roof planes, or planar segments of the roof. Do I get them? Yes. I get this one, I get that one too, yeah.
02:21
Do I get all of them? No. Here I locally have some poor local contrast. You see some of the edges couldn't be found. Here I also have poor local contrast. So I do not get all of the edges. Of some of them I only get a small part. And some might be completely missing.
02:40
Here I also only get half of the entire edge I would be interested in. These ones related to the second half of the dome are completely missing. So some of the edges that would be relevant for my task are not there.
03:03
Are all of the edges I get relevant? No. These ones, well, they are edges of the shadow that's cast by the house. And they are completely irrelevant to the reconstruction problem. Unless I want to use it to infer 3D,
03:22
which one can do. I call it the poor man's 3D. If I can't afford the second image, I might be able to do something with the shadow if I know the direction to the sun, but this is something people most of the time don't do anymore.
03:41
So it's always problematic: the result I get from edge extraction will always contain a lot of irrelevant information, and it will not contain all the information I would be interested in. And the fact is that people, when they started with work on 3D reconstruction from images or object detection
04:01
from images, well, in the 1960s, when people started with work on that, they thought it would be the job of one or two generations of doctoral students, of PhD students. This was a gross underestimation of the problem. And what caused this gross underestimation
04:21
was the fact that these people didn't take into account this very problem, that there is always uncertainty related to what you extract from the images. So the extracted segments are usually incomplete. We'd like to compare our abstract image representation,
04:45
the region adjacency graph, the feature adjacency graph, whatever, with the model. But the extracted segments are usually incomplete and, at the same time, there are too many of them. So we have a lot of irrelevant information. We have to find out what's relevant and what's not relevant. And it's quite common after the segmentation
05:04
to do some post-processing to improve the results of the segmentation, to make it easier to compare the data, in terms of the segmentation results, in terms of the abstract image representation, with what's predicted by the model.
05:20
This is supposed to increase robustness. And the typical strategy really is grouping. What does grouping mean? You group features you have extracted, you combine them based on some criteria. So grouping is not the only possible way of post-processing the segmentation results,
05:41
but it's one family of methods, so to speak, that's used very often. Incomplete segmentation, but what's that?
06:01
How come you can tell, even though most of the information is missing? Well, our brain is pretty good at finding out, okay, there should actually be a connection here. So our brain is pretty good at grouping. This is what's missing.
06:22
In those cases, we can infer what the object is. Actually, we have less information than we have here, but it contains the more important part. And we still can infer, if you see this, that this is probably a wine glass. But can we do something like this for the computer as well?
06:40
What are the rules that make our brain know that we have to connect these lines here and not others? And this will lead to one family of grouping algorithms. So there are several options for post-processing:
07:03
feature selection or elimination. So we just select the features that are interesting for us based on some model already. We can have 2D feature grouping, so we apply grouping in image space based on some criteria or we may apply grouping in object space, which has some pros and cons, we shall discuss this.
07:23
Post-processing of the results of region extractors, well, that's very often based on morphological operations, which is also something that we discuss and that's typically applied in 2D as well. In general, we can use bottom-up and top-down methods for grouping.
07:40
Top-down, there we include model knowledge. We kind of already consider the assumptions about our model to group relevant features and also to select the features that are more likely to correspond to object parts than others. Or bottom-up, there we basically do not want to use
08:00
such model knowledge, but we try to come up with some generic rules for how to group features that do not require a specific model or model knowledge at all. This leads to perceptual organization, or perceptual grouping. And we'll have a look at some examples of these strategies in the following.
08:21
And let's start with bottom-up processing in 2D. So we are in 2D, we want to work in image space. We want to group results from a feature extractor in image space. We always have to find an equilibrium
08:43
between an over-segmentation and an under-segmentation. What's an under-segmentation? An under-segmentation is a situation where we miss many segments that would be relevant for us. If it's about regions, well, our regions are so large that they straddle multiple objects.
09:02
If it's about edges, well, many important edges that correspond to object boundaries, and this is the typical assumption if you work with edges, are not contained in the extraction result. Over-segmentation, well, here the relevant information is there, but there is also an abundance
09:22
of irrelevant information. We've just extracted too much. Many of the extracted regions are noisy. This hinders us, this makes it difficult to compare the model with the results of the segmentation, because there's just too much there that is not really contained in the model.
09:40
So we typically try to find an equilibrium between these two extreme cases, where we get most of the information we are interested in without getting too much noise. It will never be possible to get a perfect result. But can we move on from here? Can we complete incomplete and incorrect segmentation results?
10:00
And well, there are some strategies for doing so. A very famous one is perceptual grouping. It goes back to the so-called Gestalt theory in psychology. Gestalt, that's a German word that means shape, but the whole thing is still called Gestalt theory in English
10:22
because, well, the people who invented this theory were German psychologists in the 1930s or the 1920s, Professor Wertheimer among them. And so it was really the Gestalt school of psychology. The idea was, well,
10:42
or the main question these psychologists asked themselves was how do people perceive the environment? What are the fundamentals of human perception? How do humans perceive the environment? Why can a human know that when he or she sees
11:02
this picture here, that this is a bicycle, even though most of the parts of the drawing are missing? And they came up with some psychological experiments where they had relatively large groups of people, and they let them do this very thing: gave them incomplete pictures and let them complete these.
11:23
And then they evaluated this statistically and they found out that most humans do this in a very similar way. And they came up with this, with some rules that are based on this statistical evaluation of experiments with humans,
11:43
with the way in which these humans would complete incomplete figures. And people have tried to convert or to apply these principles that were found by this group of psychologists
12:00
also to image processing and segmentation, to complete incomplete or incorrect segmentation results. The basic idea is we try to go for something that's considered to be most regular, sorted, stable, and balanced. And this is also what humans do.
12:22
So these are these rules. And these rules are called the Gestalt principles. For instance, the psychologists found out that near things are perceived as belonging together. For instance, if you have a series of points and if you present these to a human,
12:43
how would he or she connect these points? Well, he or she would connect them like this, because these points here are nearer to each other than those other points. So proximity. Translated to computer programs,
13:02
this means, well, if you want to complete something by connecting some edges, for instance, you would rather connect the edges that are close to each other than edges that are far away from each other. Pretty clear, isn't it? Similarity: objects that are identical or almost identical are perceived as belonging together.
13:20
So if you have these six points here, a set of white points and a set of dark points, then most people would connect points having the same color because they are similar. Continuity, among several possibilities, the most simple and regular option for grouping is preferred. For instance, if you have this series of points,
13:42
it kind of looks arbitrary; if you want to connect them, most people connect them like this, because it gives you the most continuous kind of connection between the points. You do not have sharp edges. You could also connect them like this or like this, but that is not as continuous as this one.
14:04
Simple here means, well, how would you call it? They're more simple and regular. Yeah. So you can consider this to be like, well, a function that has small curvature.
14:28
Convexity: so if you have something like this, most people would connect these points to form a convex contour. Symmetry: so if you have something like this,
14:41
most people would connect them such that you have a symmetric shape afterwards. Objects having the same orientation are typically grouped together, and also small objects are more frequently considered to be background; so if you have larger and smaller objects, most people would focus on the bigger ones.
15:03
So these are rules that were found out by psychologists. This is the way humans deal with such kinds of arbitrary shapes, how they connect them, kind of trying to make sense of the world. And all of them can be translated into some rules
15:20
that can be applied to extracted edges or extracted points if you want to group them. For instance, if you have small edge segments which you want to connect to form longer lines, you would, for instance, look at orientation and proximity: if you had lines like this that have approximately the same orientation
15:41
that are close to each other, you would tend to connect them. If they are like this, you wouldn't, and if they are like this, you wouldn't bother, for instance. Okay. Any questions? So these are the famous Gestalt principles. A colleague of mine actually wrote a book about that.
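The proximity principle just described can be turned into a small program: group extracted points so that any two points closer than a threshold end up in the same group. This is a minimal sketch, not code from the lecture; the threshold value and the single-linkage (union-find) grouping strategy are illustrative assumptions.

```python
# Sketch: grouping 2D points by the Gestalt principle of proximity.
# Points closer than `max_dist` land in the same group.
from math import hypot

def group_by_proximity(points, max_dist):
    """Single-linkage grouping via union-find: any two points closer
    than max_dist are merged into the same group."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (xi, yi), (xj, yj) = points[i], points[j]
            if hypot(xi - xj, yi - yj) < max_dist:
                union(i, j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())

# three points near the origin, two near (10, 10)
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(group_by_proximity(pts, max_dist=2.0))
# -> two groups: the cluster near the origin and the one near (10, 10)
```

The same union-find skeleton extends to edge segments: simply add an orientation-similarity test next to the distance test before calling `union`.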
16:01
When was it? Three years ago. The book was published three years ago. So it's gone a bit out of fashion, but there are still people who work in this field. Right. So we are still in these mid-level processes.
16:21
The goal is, the goal of perceptual organization is, of course, to apply grouping without the use of specific object knowledge. So we want to find some rules for grouping that are ideally independent of the object. And of course we have found these rules: humans work like this, and humans do a pretty good job of perceiving the world.
16:43
So then, we have these rules. Let's just apply them to our problems as well and see how far they go. Now, if we apply post-processing on the basis of these Gestalt principles, we speak about perceptual grouping and perceptual organization.
17:02
It's about separating foreground and background without specifying which object is concerned. It's about simplifying the segmentation results by eliminating some stuff, by combining other stuff such that we get something that's already a bit closer
17:21
to what we expect the results to be in an ideal image. So the idea is that we do not want to,
17:40
or the main principle here is, we do not want to have information about specific object types. This should be applicable under all circumstances. In reality, though, we will still use some information about the objects to make things faster. The features do not only contain the points,
18:00
edges or regions, but also more valuable information. So then we have knowledge about straightness, parallelism, specific shapes that can be applied. In reality, we typically apply some model knowledge, because there is no grouping method that really provides good results for all possible applications.
18:22
Here are just some examples, which are of course heavily based on object knowledge in this case. So here, the example is about extracting buildings from aerial images. So we have two aerial images of the same building.
18:42
The building looks slightly different in these two images because the images were taken from two different points. And well, we apply an edge extractor to both of these images, and this is what we get. We see the typical thing: we get quite a few of the edges that are relevant for us,
19:02
Edges related to boundaries of roof planes, but we also get a lot of unnecessary stuff. We get all of the edges of the shadow. We do get these parallel pairs of edges that come from small superstructures on the roof
19:22
that have a certain extent. And here, well, these parallel lines, for instance, well, they just almost double the number of edges we have without adding much to our problem. So what could be done first? We could actually think about: okay, if we have pairs of edges that are parallel
19:44
and which kind of overlap, let's just forget about the second one. Let's just represent them by one single edge. And if we do that, if we find pairs of edges that have the same orientation, remember, that was one of the Gestalt principles,
20:02
and they are close to each other, yet another Gestalt principle, then we can just say: okay, if this is the case, we remove one of them and just let one of them survive. And in this way, we get from here to here, where we have a much simpler representation of the image,
20:22
where all of these parallel lines that are close to each other, which would lead to competing hypotheses about what belongs together and what is relevant, are eliminated. It's a much simpler problem starting from here. This could already be seen as some kind of application of a Gestalt principle,
20:43
or it is an application of a Gestalt principle. For that purpose, in order to do that, we do not even need object knowledge, really. And then we could further look for relevant image structures here, and here it's now already about defining which of the extracted edges may be relevant
21:03
for our problem and which are not relevant. So starting from now, we will use model knowledge. What's the model knowledge? It's about buildings. We could assume the building to mostly consist of right angles. The thing is, the photograph is a perspective projection.
21:23
So actually a rectangle will not remain a rectangle. Well, but if we have a near-nadir view, roof planes are almost parallel to the image plane, and then rectangles will still be almost rectangles, or at least close to rectangles.
21:44
So, okay. So what can we tell? Well, we might assume that if we have a roof plane, pairs of its edges will be parallel to each other, and they will be at some min and max distance
22:01
from each other. We can use this to find pairs of straight edges that are almost parallel to each other. So you would compare the difference of the rotation angles to a threshold, and the distance between the edges to thresholds, and check whether they end in some kind of buffer
22:22
and have kind of a similar starting point, such that you can combine them to form something that's close to a rectangle. And if you do that, well, starting from these two images, you come up with these. So you can select these edges here, and here you always have pairs of edges
22:42
which fulfill the constraint that they are almost parallel, and they start in a similar region along the direction of the edge. So this is the first step. And then we can move on and further look:
23:02
well, if we have such a parallel pair, do we actually find some edges in this buffer, defined around the connection of the two end points, with roughly the width of the pair? If we do, then we can actually form
23:21
U-shaped structures consisting of three straight edge segments. And here we have all of these U-shaped structures. Many of them will already correspond to roof planes or roof plane segments. Some of them will not, but it's much easier to now use this result
23:46
for actually finding out which of these U-structures are relevant for my task. There it is, oops, starting from here, right? So by first grouping pairs of edges
24:01
and then triplets to form these U-shapes, we have come up with a result that is much easier to interpret and to compare against the model. Why is it about U-shapes? Why don't the authors look for complete closed rectangles? Well, first of all, every pair of edges may actually be contained in two U-shapes anyway,
24:21
which, if you combine them yet again, form something like a closed polygon. So this would be the next step. On the other hand, by using these U-shapes, you take into account the fact that, well, if you have such a quadrilateral, well, one edge may always be missing
24:41
due to poor local contrast. And then the missing edge can just be inferred from the remaining ones. And this way, well, from the raw segments, you actually come up with something that's already pretty close to the kind of structures we are looking for.
25:02
From there, it's much easier to reconstruct the building. And if you have a pair of images for 3D reconstruction, then you can easily do it. So it's just an example, a pretty old one, by the way, but it still shows the principle there on the last slide. Any questions?
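The parallel-pair test in this building example can be sketched as follows. This is not the original authors' code; the threshold names (`max_angle`, `min_dist`, `max_dist`) and the midpoint-based distance measure are illustrative assumptions standing in for whatever the original system used.

```python
# Sketch: hypothesise two straight edge segments as bounding a roof
# plane if their orientations differ by less than max_angle and their
# perpendicular distance lies in [min_dist, max_dist]. Thresholds are
# illustrative assumptions, not values from the lecture.
from math import atan2, pi, hypot

def orientation(seg):
    (x1, y1), (x2, y2) = seg
    return atan2(y2 - y1, x2 - x1) % pi  # undirected line orientation

def perp_distance(seg, point):
    (x1, y1), (x2, y2) = seg
    px, py = point
    dx, dy = x2 - x1, y2 - y1
    # distance from point to the infinite line through the segment
    return abs(dy * (px - x1) - dx * (py - y1)) / hypot(dx, dy)

def is_parallel_pair(s1, s2, max_angle, min_dist, max_dist):
    da = abs(orientation(s1) - orientation(s2))
    da = min(da, pi - da)                 # orientations wrap at 180 deg
    if da > max_angle:
        return False
    mid2 = ((s2[0][0] + s2[1][0]) / 2, (s2[0][1] + s2[1][1]) / 2)
    return min_dist <= perp_distance(s1, mid2) <= max_dist

a = ((0, 0), (10, 0))       # horizontal edge
b = ((0, 5), (10, 5.2))     # almost parallel, about 5 pixels away
c = ((0, 0), (0, 10))       # perpendicular edge
print(is_parallel_pair(a, b, max_angle=0.1, min_dist=2, max_dist=8))  # True
print(is_parallel_pair(a, c, max_angle=0.1, min_dist=2, max_dist=8))  # False
```

A buffer test on the segment end points, as described in the lecture, would then be added on top of this before forming the U-shaped triplets.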
25:27
Post-processing using morphological operations. If we apply region-based segmentation, very often we have the problem, again, that the segmentation is not perfect. This is an example from the book by Professor Gubman, an old book.
25:43
Let's assume we have this photograph here. We apply a simple thresholding algorithm to extract these bright points here. And what we see here is actually a target. The target consists of some small reflective areas,
26:02
and the pattern of these reflective areas encodes the target name. It's a bit like a predecessor of a QR code. So if you can extract all of these bright dots and then check their mutual arrangement,
26:21
you can infer which target you see. Is it target number nine? Is it target number one? Is it target number five? The problem is, well, we have some kind of blooming, so we do not only have bright areas where we have these spots, but some of them are actually connected. And if you apply a thresholding algorithm,
26:40
well, the problem is that if you now try to find connected components of foreground objects, some pairs of really bright dots are connected. And this would be our arrangement of foreground objects, which is different from the real one
27:01
and we couldn't read the target number here, so to speak, we couldn't determine the target number. So again, we have to post-process this. What we would like to do is, kind of, get rid of this connection between these two dots. Can we get rid of it?
27:21
Can we get rid of all of these isolated things here, which are obviously artifacts? Can we also get rid of this connection, this one? And the answer is yes, we can, and the way we do this is we apply morphological operations. In this case, this result being a binary image
27:43
after thresholding, we apply binary morphological operations. So what do we need? We need to define a structure element, which is essentially a small binary image with an anchor point. And typically all the elements in this small image
28:03
will be set to one, but it could also be some other pattern. We could also use a circle here, but the most common thing is to have a square of side length S. And the first operation we have is: we shift our structure element over the image,
28:22
pixel by pixel. At every pixel, we compare the contents of the structure element with the contents of the picture. And if inside the area of the structure element there is at least one pixel where we have a one in the structure element and a one in the image,
28:42
then the result of the operation at this position is one. And the result is written into the output image at the position indicated by the anchor point. And this operation is called dilation. It's based on a comparison of the gray values of the structure element and the image.
29:02
The question we ask is: is there a one in both images at this position? And all of these answers are combined by logical OR to define the output. So if there is at least one pixel for which the answer to this question is yes, they are identical, then the output of the entire operation at that position will be one.
29:22
What does this mean? If we have a foreground object which is a square of size D, well, if we apply the dilation, the size of the foreground is enlarged. And if the anchor point is in the center
29:42
of the structure element, and this is usually the case, it will be enlarged by half the size of the structure element in all directions. Okay, so this is the dilation. It dilates the foreground, it makes the foreground larger.
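The dilation just described, shifting a square structure element of ones over a binary image and combining the per-pixel comparisons with a logical OR, can be sketched in a few lines. This is a pure-Python illustration on a list-of-lists image; the 3x3 all-ones structure element with a centred anchor is an assumed, typical choice.

```python
# Minimal sketch of binary dilation with a size x size square structure
# element of ones, anchored at the centre.
def dilate(image, size):
    """An output pixel is 1 if ANY pixel under the structure element
    is 1 in the input image (logical OR over the neighbourhood)."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                image[y + dy][x + dx]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
                if 0 <= y + dy < h and 0 <= x + dx < w))
    return out

img = [[0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
print(dilate(img, 3))
# the single foreground pixel grows by one pixel in every direction,
# i.e. by half the structure element size, as stated in the lecture
```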
30:03
So this is the first morphological operation. The second one is erosion. So it's the same setup. We have a structure element. We consider the structure element in this example to consist of all ones. It is a small binary digital image. We shift it over the image.
30:22
And now again, we do this comparison, these comparisons, for every pixel of the structure element at every position. And if there is at least one pixel where we have a one in the structure element and a zero in the image underneath, then the result is zero. This also means that the result of this operation
30:43
is only one if every pixel having a value of one in the structure element coincides with a pixel having a one in the input image. If there is at least one pixel for which this is not the case, then the result at this position is zero. And again, the result is written into the output image
31:00
at the position of the anchor point. What does this mean for a square object of a certain size? Well, if you apply an erosion with this structure element and the assumption here is that all of the pixels of this structure element have a value of one. Now, if we apply an erosion to this foreground object here,
31:24
it will be smaller afterwards. It will be shrunk by half the side length of the structure element in all directions. So the foreground is made smaller, it's eroded.
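A minimal sketch of the erosion with SciPy's binary morphology (again a made-up 5 x 5 square and a 3 x 3 structure element of ones):

```python
import numpy as np
from scipy import ndimage

img = np.zeros((11, 11), dtype=bool)
img[3:8, 3:8] = True               # 5 x 5 foreground square

se = np.ones((3, 3), dtype=bool)   # every structure-element pixel is one

eroded = ndimage.binary_erosion(img, structure=se)

# The foreground shrinks by half the side length of the structure
# element (one pixel) in all directions: 5 x 5 becomes 3 x 3.
```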
31:41
Okay, so that's the second approach. Any questions? Well, what we really use is a combination of the two to solve the problem I just mentioned. So let's assume we have this foreground object here. We see that actually, well,
32:01
probably we have two foreground objects that are connected by a thin line of foreground pixels. So this would be exactly this case here. Well, what can we do? Of course, we would like to preserve the shape of these foreground objects here.
32:21
So what we do is we first apply an erosion. In this case, using a circular structure element. And what will this erosion do? Well, it will make the foreground object smaller by the radius of the circle. So from this original shape, we get this one. And we see that in this erosion process,
32:42
the small kind of connection between the two main object parts is completely eliminated. So whereas here we have one foreground object, after the erosion we have two of them. These two blocks have been separated. And this is exactly what we wanted. The other thing is that these protrusions
33:03
of one of these blocks have also been eliminated. But of course, the foreground is much smaller than it was here. So we want to go back to the original size, and this is what we achieve next by dilation, using the same structural element. So we have the erosion result. Now we apply the dilation,
33:21
which extends the size of the foreground objects after the erosion. And we end up with this result of the process where we now have these two object parts separated. And we have kind of simplified the shape because the small extrusions have been eliminated.
33:42
And this operation, where we first apply the erosion and then the dilation, is called morphological opening, or binary morphological opening if it's applied to binary images.
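A minimal sketch of such an opening with SciPy (the two blocks and the one-pixel bridge are made up):

```python
import numpy as np
from scipy import ndimage

img = np.zeros((9, 20), dtype=bool)
img[2:7, 2:7] = True      # first block
img[2:7, 13:18] = True    # second block
img[4, 7:13] = True       # thin one-pixel bridge between the blocks

# Opening = erosion followed by dilation with the same structure element.
opened = ndimage.binary_opening(img, structure=np.ones((3, 3), dtype=bool))

# The bridge is thinner than the structure element, so it is eliminated:
n_before = ndimage.label(img)[1]     # one connected foreground component
n_after = ndimage.label(opened)[1]   # two components after the opening
```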
34:01
Are the structural elements for the erosion and the dilation the same? Oh, they are the same. It's actually the same structural element; it's just shifted over the image. And this is exactly what we want: we want to separate these two blocks.
34:22
No, I mean, after you did the separation: after we apply the dilation, can the two parts get connected with each other again? As you see, they do not. Now, an important parameter is, of course,
34:40
the size of the structural element. And it has to be adapted to the width of such connections you still consider to be an artifact, right? So if two blocks are connected by a thin line of pixels, how thin is thin, so to speak?
35:02
Until which size do you consider this connection to be thin and actually due to a segmentation error? And at which stage is it so wide that it's actually no longer a segmentation error but it's an output of the segmentation process? Well, and of course, this depends on model knowledge.
35:20
Let's assume this were building extraction. Well, if this is a meter in object space, it is probably not a part of the building, because we can't squeeze ourselves through some kind of alley that only has a width of a meter. What would be the minimum width
35:42
you expect for a house part? Maybe two meters, three meters. And this would kind of define how large you select the size of the structure element, okay? So we can choose this size very well
36:01
if we have an idea about the pixel size. In remote sensing, we always have this knowledge. We might not have it in close range applications, because an object twice as far away will appear much smaller than when it is closer to the camera. In the aerial case, with satellite images or aerial images, you usually know the pixel size, okay?
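This reasoning, from a minimum width in object space plus the pixel size to a structure-element size, can be written down as a tiny helper (a hypothetical function; the name and parameters are not from the lecture):

```python
import math

def se_size_pixels(min_width_m: float, gsd_m_per_px: float) -> int:
    """Smallest odd structure-element side length (in pixels) that covers
    a given minimum width in object space (metres)."""
    n = math.ceil(min_width_m / gsd_m_per_px)
    return n if n % 2 == 1 else n + 1  # odd size gives a central anchor point

# Example: connections narrower than 2 m are segmentation errors, and the
# ground sampling distance is 0.25 m per pixel -> 9 x 9 structure element.
size = se_size_pixels(2.0, 0.25)
```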
36:26
Any more questions? The shape of the structure element will also depend on the model? Yeah, I mean, it's actually quite uncommon to use a circle; it's more common to use a square.
36:45
A square of a certain side length. And the size of that square would be adapted using the considerations I just gave. Okay, more questions?
37:10
Good question. Well, why is it called opening? Yes, the short answer is I don't know.
37:24
But we'll see another common operation in a minute where it's pretty clear why it's called closing. And perhaps closing came first, and then the opposite operation was simply named in reverse. So here we have closing.
37:42
We have the same shape. And now, where opening was first an erosion and then a dilation, well, in closing it's the other way around. So we start with a dilation. We make our object bigger
38:02
and then we apply a morphological erosion to the result, and this is what we get. And this is called the closing. So what does the closing do? Well, it also simplifies the shape. It does not separate connected areas; on the contrary.
38:20
If we had had a hole in the foreground object, it would have been closed. If we had had a small area here inside of this block that was considered to be background, after the dilation this hole would have been closed, and after the subsequent erosion it would still be closed.
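A minimal sketch of the closing with SciPy (made-up foreground square with a one-pixel hole):

```python
import numpy as np
from scipy import ndimage

img = np.zeros((11, 11), dtype=bool)
img[2:9, 2:9] = True
img[5, 5] = False  # a one-pixel hole inside the foreground

# Closing = dilation followed by erosion with the same structure element.
closed = ndimage.binary_closing(img, structure=np.ones((3, 3), dtype=bool))

# The dilation fills the hole; the subsequent erosion restores the
# outer boundary but leaves the hole closed.
```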
38:41
Here we can see that this kind of happened here with the small intrusion of the object boundary. So closing really closes holes and that's where the name comes from. And perhaps opening was just used to kind of define that it was exactly the opposite direction or the opposite order in which the two basic operations
39:03
were applied. I don't know. Question? So a typical example is we have some segmentation result. Okay, we see we have a big foreground object
39:20
which may be the one we're interested in. It has some irregularities at its boundaries. It has some holes. And we have some small foreground regions that are probably artifacts, classification errors. What do we do? Well, we define a structure element, the size of the structure element will be adapted
39:43
to the size of the foreground objects we expect and also to the size of the holes we expect to be closed. And then we first apply an opening. So we first apply an erosion which will make the holes bigger, which will remove all of these spurious foreground regions
40:04
which will simplify the shape of our region. And then we apply a dilation, and what we end up with is some kind of foreground region where all of these small erroneous regions have been eliminated.
40:20
If we had had two such regions that were connected by a thin line of pixels, thinner than the size of the structure element, the two regions would have been separated. The contour has been simplified, but the holes are still there.
40:41
So what do we do next? Now we apply a closing, and we end up with a region without holes. All these spurious elements have also been removed, and so have the irregularities of the object boundary. So this is a typical process.
41:00
We first apply an opening to get rid of all of these artifacts here, to remove the small classification errors, to simplify the contour, and to split blocks that are connected by a thin line of pixels; and then we apply a closing to fill these holes
41:22
and then move on with the remaining foreground regions. Now to the example from earlier. This is what we had. We first applied an erosion with a three by three,
41:41
so a structural element of size three by three pixels, all pixels set to one. So now all of these dots are separated; then we apply the dilation, and we end up having only what we want in this simple case. Any questions?
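The whole pipeline just described, opening first and then closing, might look like this with SciPy (the segmentation result is simulated with made-up artifacts):

```python
import numpy as np
from scipy import ndimage

# Simulated segmentation result: one large foreground region with a
# one-pixel hole, plus a few spurious foreground pixels.
img = np.zeros((30, 30), dtype=bool)
img[8:22, 8:22] = True
img[14, 14] = False                         # hole (classification error)
img[2, 2] = img[25, 5] = img[4, 27] = True  # spurious foreground

se = np.ones((3, 3), dtype=bool)

# Opening removes the spurious regions and simplifies the contour ...
opened = ndimage.binary_opening(img, structure=se)
# ... and the subsequent closing fills the remaining hole.
clean = ndimage.binary_closing(opened, structure=se)
```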
42:03
So this is typically what we apply to improve the results of the segmentation. What about 3D? 3D of course requires image pairs to reconstruct straight lines in 3D,
42:22
curves in 3D, points in 3D, surfaces in 3D. We can directly apply segmentation in 3D data, for instance if we apply segmentation to digital surface models and then of course we have access
42:40
to 3D information directly. Right? But we can always determine 3D segments from 2D segments and their correspondences. And then we can also apply the post-processing in 3D. So what are the advantages of applying post-processing in 3D?
43:02
Well, first of all, we have access to relations that don't really make sense in 2D. For instance, if we have two planes in object space, we can find out whether they're coplanar or not. This doesn't make sense in image space, right? Because in image space,
43:20
well, there is only one plane, the image plane. The notion of an object plane defined in images doesn't make any sense. Working with planes only makes sense in object space. And there we can get some useful information. Are two planes coplanar? Where do the planes intersect?
43:41
And so on and so forth. The second thing that's useful in 3D is that geometrical constraints are not affected by geometrical distortion. So for instance, we may have some constraints that relate to the angles between edges or planes.
44:03
For instance, most buildings have walls that are perpendicular to each other. So the angle between the walls is 90 degrees. Well, if we check in object space, we can compare the actual angle of two planes to a value of 90 degrees.
44:20
No problem. What about image space? Then let's assume our model says the two edges intersect at a given angle. Can we apply this constraint in the image? Well, angles are distorted by perspective transformation. Photographs are perspective images of the 3D world. The angles in the 3D world
44:41
will not be identical to the angles we observe in the image. So actually, these geometrical constraints cannot really be exploited in image space, only in some special cases. So we have some advantages if we apply the post-processing in 3D. There are also some disadvantages, of course.
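To make the point about angles concrete, here is a toy pinhole projection of a right-angle corner that is tilted relative to the image plane (all coordinates are invented for illustration):

```python
import numpy as np

def project(P, f=1.0):
    """Pinhole projection of one 3D point (camera at origin, plane z = f)."""
    P = np.asarray(P, dtype=float)
    return f * P[:2] / P[2]

C = np.array([1.0, 0.5, 5.0])       # corner point in object space
A = C + np.array([1.0, 0.0, 1.0])   # endpoint of edge 1 (tilted in depth)
B = C + np.array([0.0, 1.0, 0.0])   # endpoint of edge 2

def cos_angle(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

cos3d = cos_angle(A - C, B - C)            # exactly 0: a 90 degree angle
cos2d = cos_angle(project(A) - project(C),
                  project(B) - project(C))  # clearly non-zero after projection
```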
45:01
The whole thing is computationally more complex. We can no longer work with the grid. Here is a series of relations we could check in 2D and in 3D.
45:21
For instance, if we have parallel curves in 2D, well, are the curves parallel in 3D space? Not necessarily. If they intersect at a certain angle in 2D space, can we infer that angle in 3D? No, we can't. If they intersect in 2D space,
45:43
can we automatically infer that they intersect in 3D? No, we can't. There is one exception, and this is the exception we usually have in remote sensing when we look downwards. Because then the roof planes are almost parallel to the image plane,
46:02
and then we can make such assumptions. For instance, we can consider edges that are parallel in object space to also be parallel in image space, and it does make sense to search for parallel edges. This only works under the assumption
46:21
that the object edges are approximately parallel to the image plane. Imagine an aircraft flies over an area. You look at the eaves and the ridge line of the building. Well, they are horizontal. The image plane is horizontal. So yeah, they are approximately parallel,
46:42
so this parallelism is indeed preserved. You can search for parallel edges in image space. It doesn't make sense to do anything related to planes in image space, as I already said. So what can we check in 3D? Well, for instance, and this is from a PhD thesis
47:03
that was done some time ago. There, we explored the feature adjacency graph to find trihedral corners in the image. In the graph, you would look for points with three emanating edges. And you could do this in the two images of a pair
47:20
and then reconstruct such trihedral corners in 3D by spatial intersection. And then if you have such trihedral corners, you could group them. And again, under which circumstances would you group them? Well, the idea is that the emanating edges of such trihedral corners would kind of correspond
47:42
to, well, edges of the roof. And of course, you would group them, for instance, if they were collinear. So if you have pairs of edges of such trihedral corners that kind of point towards each other, you can check whether they are collinear; in this case here, they are not collinear. And this is something you can test using some geometrical process.
48:02
Or the same here: this edge here is not collinear with this one, so they probably do not belong together. You can describe a saddleback roof here. In 3D, you also have the advantage that we can use coplanarity. For instance, this pair of edges here defines a plane, whereas these two edges over there are collinear.
48:24
So they could be combined, but you see that the plane defined by this pair of edges here is not coplanar with that pair of edges. Perhaps these two trihedral corners shouldn't be combined to form part of such a saddleback roof.
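Such a collinearity test can be sketched as follows (the tolerances and coordinates are hypothetical; a real system would derive them from the measurement accuracy):

```python
import numpy as np

def edges_collinear(p0, d0, p1, d1, angle_tol=0.05, dist_tol=0.1):
    """Test whether two 3D edges, each given by a point and a direction,
    are collinear: directions (anti)parallel, and the second point close
    to the line carrying the first edge."""
    d0 = np.asarray(d0, float); d0 = d0 / np.linalg.norm(d0)
    d1 = np.asarray(d1, float); d1 = d1 / np.linalg.norm(d1)
    if np.linalg.norm(np.cross(d0, d1)) > angle_tol:
        return False  # directions differ too much
    w = np.asarray(p1, float) - np.asarray(p0, float)
    # perpendicular distance of p1 from the line through p0 along d0
    return bool(np.linalg.norm(np.cross(w, d0)) < dist_tol)

# Two ridge-edge fragments of neighbouring trihedral corners:
same_ridge = edges_collinear([0, 0, 3], [1, 0, 0], [5, 0, 3], [-1, 0, 0])
offset_ridge = edges_collinear([0, 0, 3], [1, 0, 0], [5, 1, 3], [-1, 0, 0])
```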
48:41
This is an example from my own work. It was post-processing of a planar segmentation result. So we have some results of planar segmentation, one, two, three, four, five, six planar segments.
49:02
And now the question was, well, how can I get improved definitions of the boundary polygons of these individual roof plane segments? And of course, if you have a roof, if you have pairs of roof planes next to each other,
49:21
you can find out, well, what is the geometrical relation between these two roof planes. For that purpose, you of course first need to know which roof planes are neighbors, and this can be based on the boundaries
49:41
of the extracted planes and a neighborhood relation in a region adjacency graph. And then here you get first approximations of the boundary polygons of all of these roof planes. Here I've only shown one example, right? So we consider this roof plane here.
50:01
We extract all the boundary pixels in the raw segmentation result. And then we define segments of boundary polygons that always separate pairs of roof planes, or a roof plane from the background. So all the yellow pixels here are from the boundary of this roof plane with that one.
50:21
The red pixels here are the boundary between the roof plane and the background. Green pixels are the boundary between the roof plane and this roof plane; this one is between these two planes, and so on; and we have the pink one that's again between this plane and the background. So we have a first initialization
50:41
of the roof plane boundary. And now each of these boundary segments is classified. And there may be three cases. We may have an intersection, and if we have an intersection, we can of course replace the entire polygon
51:01
by a straight line, by the intersection line. We can have a step edge, and then we have to extract the step edge precisely. And we can actually have actually both an intersection and a step edge. And then we have to further split the roof boundary polygons into segments that correspond to step edges and other segments that correspond to intersection lines.
51:24
And the question is, how can we decide about this? And the answer is, well, what can we do if we have such an intersection? What will happen? We just compute the intersection line between these two planes. This intersection line will have some position in the image
51:41
and then we just check whether the extracted boundary points are in the vicinity of this intersection line. If this is the case, then there is probably an intersection, and we can replace the boundary polygon by this straight line, by the intersection line. And this decision was, in my case,
52:02
done based on statistical tests. And if you have identified such a boundary polygon segment to be consistent with the assumption that there is an intersection of the planes, so that it is a 3D straight line, then you just replace the boundary polygon segment by such a straight line. This works from here to here.
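The intersection-line check can be sketched like this (a simplified geometric version with a fixed tolerance; the thesis used statistical tests, which this toy omits):

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2):
    """Intersection line of two planes n.x = d: returns a point on the
    line and a unit direction."""
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    direction = np.cross(n1, n2)
    # Point lying on both planes and on the plane through the origin
    # perpendicular to the line:
    A = np.vstack([n1, n2, direction])
    p = np.linalg.solve(A, np.array([d1, d2, 0.0]))
    return p, direction / np.linalg.norm(direction)

def points_near_line(points, p, u, tol):
    """Are all 3D points within tol of the line through p along u?"""
    w = np.asarray(points, float) - p
    return bool(np.all(np.linalg.norm(np.cross(w, u), axis=1) < tol))

# Saddleback roof: planes z = 10 - y and z = 10 + y meet in the ridge.
p, u = plane_intersection([0, 1, 1], 10.0, [0, -1, 1], 10.0)

# Noisy boundary points extracted near the ridge (made-up values):
pts = [[0.0, 0.02, 9.99], [1.0, -0.03, 10.01], [2.0, 0.01, 10.0]]
is_intersection = points_near_line(pts, p, u, tol=0.1)
```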
52:21
In these cases, the rest are all step edges. For step edges, you have to apply an edge extraction scheme, then approximate them by polygons, and then come up with a grouping of the polygons, which you can then use to get the first 3D model of the...
52:44
So this was the first step I did or second step after plane extraction to reconstruct buildings automatically in 3D using a plane model. Okay.
53:01
Any questions? Here, by the way, in a pre-processing step I don't discuss here, I also merged planar segments. Originally I had a strong over-segmentation, so I merged coplanar segments beforehand; it is on this result that I define the boundary edges.
53:22
And this is also a nice application of the region adjacency graph. More questions? Top-down post-processing in 3D. So here we rely heavily on model knowledge.
53:45
And this is again from a PhD thesis, the one that was based on these trihedral corners. So the first thing was to reconstruct
54:02
trihedral corners in 3D. And then, well, these corners were grouped. They were grouped based on some bottom-up processing methods, but already including model knowledge. Or first, they were aggregated.
54:22
So for instance, pairs of such corners were looked for that, for instance, had one collinear edge. And it was also about the type of relation between these edges: for each of the edges you knew whether it was vertical, horizontal, or oblique.
54:44
And this restricted the aggregation. You could only merge or aggregate, for instance, a trihedral corner where one edge pointing downwards would connect to one pointing upwards. You wouldn't connect an edge pointing downwards
55:02
with another edge pointing downwards. This was something that was prohibited by the generic model already. This wouldn't make any sense. And based on these simple ideas, you could aggregate; for instance, you could tell, okay, corner A and component B,
55:22
they are kind of connected. They can be grouped, because they have a special spatial relation. Same between B and C, and between C and D. And component F could also be connected to component B already.
55:42
or component F could also be connected to component B already. And so you had a first result of grouping. You could tell how these abstracted triangle corners were connected. Well, how can you detect the building in something like this?
56:01
The input to the actual interpretation was this graph, whose nodes corresponded to these trihedral corners and whose edges corresponded to spatial relations between these trihedral corners. Because now you could come up with some standard roofs.
56:24
These standard roofs, or parameterized roof models, would have some arrangement of trihedral corners, where the names you have here kind of describe what type of trihedral corner it is. For instance, there is one type
56:41
where you have a trihedral corner with one edge that points vertically upward. So this would be such a graph of trihedral corners and their connections. And what you observe in the image, well, parts of the graph you get by grouping in the image
57:01
will fit to parts of the graphs you get from your models. Why only to parts? Well, because you never see all of these corners in the image. Some of them will always be hidden. So you actually end up with a graph matching problem. What you can do is, for each of these models,
57:20
you can define typical views: the horizontal view, the vertical view, oblique views, and so on. And then you can infer which of the nodes of your graph you actually see in that view. And this leads to some sub-graphs that would be characteristic for such an object
57:42
if extracted from the image. And you still always have to be aware of the fact that one of these elements might actually be missing, due to poor contrast: we just couldn't extract these edges. So we end up with a graph matching problem. You try to find such a sub-graph,
58:01
or you try to match a sub-graph of the graph you have after the grouping with the graphs predicted by the model, or more specifically with the typical aspects of the graphs you infer from the object model. And then you have a matching. And if you get a match between one such sub-graph
58:21
and a sub-graph of what you extract from the image, you have actually inferred that the object shape is present. In this case, you could match this component here with a sub-graph derived from a saddleback-roof-shaped building, and then you could say, okay, these components can be aggregated,
58:42
as they correspond to an image primitive that represents a saddleback roof. They are already explained by the saddleback roof, and from there you move forward and just work with the rest of the extracted trihedral corners.
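The matching step can be sketched as a small brute-force subgraph search (an illustrative toy with made-up corner types and relations; the thesis matched aspects of parameterized roof models):

```python
from itertools import permutations

# Corners extracted from the image, each with a type label, plus the
# spatial relations found by grouping (all labels are illustrative).
img_nodes = {"A": "eave", "B": "ridge", "C": "ridge", "D": "eave", "F": "eave"}
img_edges = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("C", "D"), ("B", "F")]}

# Aspect predicted by a saddleback-roof model: an eave-ridge-ridge-eave chain.
mod_nodes = {1: "eave", 2: "ridge", 3: "ridge", 4: "eave"}
mod_edges = [(1, 2), (2, 3), (3, 4)]

def subgraph_match(mod_nodes, mod_edges, img_nodes, img_edges):
    """Assign each model corner to a distinct image corner so that corner
    types and all model relations are preserved (brute force)."""
    ids = list(mod_nodes)
    for cand in permutations(img_nodes, len(ids)):
        m = dict(zip(ids, cand))
        if all(mod_nodes[i] == img_nodes[m[i]] for i in ids) and \
           all(frozenset((m[a], m[b])) in img_edges for a, b in mod_edges):
            return m
    return None

match = subgraph_match(mod_nodes, mod_edges, img_nodes, img_edges)
```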
59:00
So this way, again, you have a picture, you have extracted edges and the corresponding trihedral corners. You get neighborhood relations; these are the grouping results. You have a graph that's predicted from your object model,
59:20
you do a sub-graph matching between the graph extracted from the image and the one predicted by the model, and then you get this. So we have already explained these trihedral corners here, and they correspond to a saddleback-roof building. The result isn't perfect,
59:41
but the building indeed is a saddleback-roof building. That's always the case when you work with primitives: you always get an approximation. There is no primitive in the model for a building shape where you actually have one wall that's not orthogonal
01:00:00
to its neighbors. So this is one example of model-based or top-down aggregation. The fact that these one, two, three, four edges or trihedral corners were selected
01:00:21
and grouped to form an aggregate that could be explained by a building primitive, that's based on model knowledge. It's based on the fact that we have a model of the building primitive already. Without that knowledge, this would not have been possible. We could still have found out which of these trihedral corners have emanating edges that are collinear.
01:00:41
These could be potential matches, but deciding which of them belong together to correspond to one such building primitive, that required knowledge about building primitives, that required model knowledge. Any questions?
01:01:01
Final example. This is something I did in my PhD thesis as well. Let's assume again we want to reconstruct buildings in 3D, in a semi-automatic setting, and this is what my PhD thesis was about: we click a couple of points in an image
01:01:20
that give you a coarse position of a building in that image, and the fine measurement should be done automatically. How can this be done? Well, you have initial values for all the parameters of your object. So you can back project your object to the image, and you get this back projection here.
01:01:41
Then you also extract edges from the image, and then you match the extracted edges with the back projected ones. So here the black edges would be those extracted by some edge detector, and the red ones are the result of the back projection using initial values for the parameters of the building.
01:02:01
What do you do next? Well, you match extracted edges with the back projected edges and then determine improved parameters of the building. And here, of course, the fact that you know the back projection helps you to understand
01:02:21
which edges belong together. There is no back projected edge that can explain these small edgelets, so they are probably not relevant; they are likely some disturbance. Here you have the straight line edge that is very likely to correspond to this object edge. And here, well, you have pairs,
01:02:43
so probably you would match both of them with the edge, and the final edge would be somewhere here. So it kind of helps. This can also be seen as a kind of grouping, because you know that all the edges that match one of the back projected edges
01:03:01
correspond to the same object edge. So you group them implicitly. So this is model-based grouping in a semi-automatic procedure. Or you could actually get these initial values also from an automated process
01:03:20
which is then, of course, not semi-automatic but fully automatic. It is some kind of refinement of the shape of the extracted object by what is called wireframe fitting. Wireframe? You represent your object just by the edges. This is a wireframe representation.
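The core of the matching can be sketched with a toy pinhole camera (all coordinates and the distance threshold are invented; edgelets are reduced to their midpoints for simplicity):

```python
import numpy as np

def project(points3d, f=1.0):
    """Pinhole projection of 3D points (camera at origin, image plane z = f)."""
    P = np.asarray(points3d, dtype=float)
    return f * P[:, :2] / P[:, 2:3]

def dist_to_line(p, a, b):
    """Distance of 2D point p from the infinite line through a and b."""
    d = b - a
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)
    return abs(n @ (p - a))

# One edge of the back projected wireframe (initial building parameters):
a, b = project(np.array([[0.0, 0.0, 10.0], [4.0, 0.0, 10.0]]))

# Midpoints of edgelets extracted from the image (made-up coordinates):
edgelets = np.array([[0.05, 0.002], [0.21, -0.003], [0.9, 0.3]])

# Implicit grouping: edgelets close to the back projection are taken to
# support the same object edge; the third one is rejected as a disturbance.
support = [tuple(p) for p in edgelets if dist_to_line(p, a, b) < 0.01]
```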
01:03:42
You match this back projected wireframe with the image edges, so you fit it to the image content. That's why it's called wireframe fitting. Any questions? This is something I also did in my PhD thesis.
01:04:01
It is the automatic refinement of the building models, of the building primitives. That was all I wanted to say about post-processing.