HIECTOR: Hierarchical object detector for cost-efficient detection at scale
Formal Metadata
Series: FOSS4G Firenze 2022 (talk 19 of 351)
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/69226
Production Year: 2022
Transcript (English, auto-generated)
00:01
So hi everybody, I'm Nate. I'm a data scientist at Sinergise, and today I'm going to present HIECTOR, a hierarchical object detector using multi-scale satellite imagery. First, some motivation. I don't know about you, but when I first encountered satellite
00:21
or aerial imagery, it was with Google Maps, back in 2006 or so when I was quite young. The first thing I wanted to do was find my house, and yes, there it was, very nice. Years later I started working at an Earth observation company and got introduced to Sentinel-2. But there you find out that
00:41
if you try to find your house in Sentinel-2, you can't really see it: it is smaller than 10 meters by 10 meters, so it doesn't even cover one Sentinel-2 pixel. This means that detection of individual man-made objects from such medium-resolution imagery is essentially not possible; you need high-resolution imagery, which is very expensive,
01:02
so detection over large areas is limited to organizations with a lot of resources and money. But it turns out that man-made objects, so buildings, roads and so on, are actually very sparse if you look at a large scale, at a large AOI,
01:22
and cover a very small percentage of an area of interest, especially if that area is large, maybe country level or even continental level. This means we can leverage a hierarchical approach with different levels of detection.
01:40
We use the lowest level for a rough estimation and then increase the accuracy as we go along, which minimizes the amount of very high-resolution imagery that we need. For this we use a hierarchical divide-and-conquer approach. First, we start with detection at medium resolution.
02:02
Medium resolution means Sentinel-2 or maybe even Landsat; in our use case we use ESA Sentinel-2 imagery with the 10-meter bands, so blue, green, red and near-infrared. The idea is to use this imagery
02:20
for approximate localization of the built-up areas. You want to know where the built-up areas actually are, because simply not ordering the very expensive high-resolution imagery over the rest can save you a lot of money. Here is an example over Azerbaijan: the detected built-up areas are in red,
02:43
and the other areas are where we do not expect any buildings or other man-made objects to be.
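To make this first level concrete, here is a minimal sketch, not taken from the HIECTOR code base, of how the four 10-meter Sentinel-2 bands could be fetched for one grid chunk with the sentinelhub Python package (the talk later mentions that the code base works with Sentinel Hub). The bounding box, time interval, output size and processing level below are placeholder assumptions.

```python
# Minimal sketch (not HIECTOR code) of fetching the four 10 m Sentinel-2 bands
# for one grid chunk via the sentinelhub package; bbox, dates, size and
# processing level are illustrative placeholders.
from sentinelhub import SHConfig, SentinelHubRequest, DataCollection, MimeType, BBox, CRS

config = SHConfig()  # assumes Sentinel Hub credentials are already configured

evalscript = """
//VERSION=3
function setup() {
  return {input: ["B02", "B03", "B04", "B08"], output: {bands: 4}};
}
function evaluatePixel(s) {
  return [s.B02, s.B03, s.B04, s.B08];  // blue, green, red, near-infrared
}
"""

request = SentinelHubRequest(
    evalscript=evalscript,
    input_data=[SentinelHubRequest.input_data(
        data_collection=DataCollection.SENTINEL2_L1C,      # processing level assumed
        time_interval=("2021-06-01", "2021-06-30"),
    )],
    responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
    bbox=BBox((47.0, 40.0, 47.1, 40.1), crs=CRS.WGS84),    # one placeholder grid chunk
    size=(512, 512),
    config=config,
)
bands = request.get_data()[0]  # numpy array of shape (height, width, 4)
```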
03:02
But of course, for the areas that do contain built-up areas, you still need to detect the buildings somehow. As I said, Sentinel-2 is not really sufficient to accurately see the footprint or bounding box of your house. So we move to Airbus imagery at higher resolution, Spot in our case, and the idea here is to detect large and medium objects. That covers most cases,
03:21
but in some regions, especially dense urban areas with small buildings, townships and so on, individual buildings are very hard to separate even at 1.5-meter resolution. For those you need something like Pleiades imagery at half-meter resolution,
03:42
and the idea is to order Pleiades only over those areas where the previous level is not enough, to accurately detect the small objects. The final product then looks something like this: in green you have the Pleiades imagery, and in red the detections, both those on the Pleiades imagery
04:03
and the predictions on Spot, combined together into a single map. Of course, to do these detections we need some kind of model. It is not the point of this talk to go into technical details about the model,
04:22
but just to give you an idea: we use a model that detects oriented bounding boxes directly. So we are not doing pixel-based segmentation; we take the object detection approach, where you localize each object by finding its bounding box,
04:40
and the model uses a ResNet backbone with a feature pyramid network to be, in a sense, scale invariant. Another practical point is that for each object the model produces many overlapping bounding boxes
05:01
with different pseudo-probabilities, which is why we apply non-maximum suppression to keep only one box per building.
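To illustrate that post-processing step, here is a minimal non-maximum suppression sketch using torchvision's standard nms. It works on axis-aligned boxes for simplicity; the oriented boxes the HIECTOR detector produces need a rotated-box variant, so this shows only the idea, not the actual implementation. The boxes and scores are made up.

```python
# Minimal axis-aligned NMS illustration: keep the highest-scoring box and drop
# overlapping boxes above an IoU threshold.
import torch
from torchvision.ops import nms

# made-up raw detections: boxes as (x1, y1, x2, y2) and their pseudo-probabilities
boxes = torch.tensor([
    [10.0, 10.0, 50.0, 50.0],
    [12.0, 11.0, 52.0, 49.0],    # near-duplicate of the first box
    [100.0, 100.0, 140.0, 150.0],
])
scores = torch.tensor([0.92, 0.85, 0.70])

keep = nms(boxes, scores, iou_threshold=0.5)   # indices of the boxes to keep
final_boxes, final_scores = boxes[keep], scores[keep]
print(keep)  # tensor([0, 2]) -> one box per building remains
```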
05:21
As far as training goes, you of course need to train the model. We trained two models: one for Sentinel-2 imagery, and a combined model to which we fed both Spot and Pleiades. For Sentinel-2 it was quite easy: we had images available across the whole country of Azerbaijan, and we also had building footprints for the entire country. The ground-truth data was not so accurate,
05:41
but it was accurate enough to train this kind of model. And remember, the idea at this level is not to train an accurate building detector but to find built-up areas rather than single buildings. For Spot and Pleiades, as mentioned before, the imagery is quite expensive.
06:00
So we only had image data over small AOIs; these images covered only, for example, nine square kilometers, which is just 0.01% of the country's area. Another thing with these images is that they have a high spatial resolution,
06:20
which means that all the problems in the reference data, such as geolocation inaccuracies, become even more prevalent. So we took some areas and manually validated the buildings in them to get a very clean reference data set, which enabled us to train an accurate model.
06:42
For these two levels we trained a single-stage rotation detector. The main engineering part of our procedure is the inference: it takes a lot of engineering effort to make inference over a large area scalable.
07:05
What we do is this: say you have an area of interest; you first split it into a grid of small chunks, over which you order Sentinel-2 imagery. For each of these sub-chunks
07:20
you then apply the model, which gives you the detections of where the built-up areas are. Once these built-up areas are recognized, you order Spot imagery over them and again apply the same procedure as before,
07:43
this time with the Spot/Pleiades model, to get detections on those areas. But here we come to a problem: we have ordered Spot imagery across all the areas that we suspect are built-up,
08:01
but how do we know where to then order Pleiades imagery? So we sub-split those areas again (there is a lot of splitting in this procedure), and for each of these sub-splits we calculate a so-called drill-down index, which is a function of the number of buildings in that chunk
08:22
and of the confidence of the model in its predictions. The idea is that if an area has a lot of small buildings, or a lot of buildings in general, and the confidence of the model is poor, then you need to verify your predictions there using Pleiades imagery. Once you have identified those areas,
08:45
you order Pleiades imagery, run the model on it again, and then combine the predictions from the Spot level and the Pleiades level into your final map.
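The drill-down decision is perhaps easiest to see as code. The talk does not give the exact formula of the drill-down index, so the weighting below (number of detections times one minus mean confidence), the threshold value and all names are hypothetical; the sketch only mirrors the logic described above.

```python
# Hypothetical sketch of the drill-down decision: for each sub-chunk of a Spot
# tile, combine how many buildings were detected with how unsure the model was,
# and order Pleiades only where that score is high. The exact index used by
# HIECTOR is not specified in the talk.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Detection:
    chunk_id: str      # which sub-chunk of the Spot tile the box falls into
    confidence: float  # pseudo-probability of the detected box

def drill_down_index(detections: List[Detection]) -> float:
    """More buildings and lower confidence -> higher need for Pleiades."""
    if not detections:
        return 0.0
    mean_conf = sum(d.confidence for d in detections) / len(detections)
    return len(detections) * (1.0 - mean_conf)

def chunks_to_order(spot_detections: List[Detection], threshold: float = 5.0) -> List[str]:
    """Return the sub-chunk ids over which Pleiades imagery should be ordered."""
    per_chunk: Dict[str, List[Detection]] = {}
    for det in spot_detections:
        per_chunk.setdefault(det.chunk_id, []).append(det)
    return [cid for cid, dets in per_chunk.items() if drill_down_index(dets) >= threshold]

# usage with made-up detections: chunk "A" is dense and uncertain, "B" is sparse and confident
dets = [Detection("A", 0.4)] * 12 + [Detection("B", 0.9)] * 3
print(chunks_to_order(dets))  # -> ['A']: order Pleiades only there
```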
09:01
Of course, this is all well and good if you're working on small areas, but the whole idea here is to scale, and that brings a problem. Say you have an area the size of Azerbaijan, as discussed, and you grid it into,
09:22
say, 100 by 100 meter or one kilometer by one kilometer cells; you still get a lot of chunks, and if each one takes 10 to 15 seconds to process, the whole run will take a lot of time. Here we leverage the capabilities of Ray, one of the open-source libraries that we use in our team.
09:43
What Ray does, first of all, is let you parallelize across the cores of a single machine. Say you have a server lying around for something like this: you can speed up your processing by running it with Ray, which makes sure your workload fully utilizes all the cores.
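As a minimal sketch of that pattern with Ray: turn the per-chunk work into remote tasks and let Ray schedule them over all cores. `process_chunk` below is a placeholder for the real download-and-detect step, not HIECTOR code.

```python
# Minimal Ray sketch: turn per-chunk work into remote tasks so all cores of a
# single machine are used; process_chunk stands in for the real workflow step.
import ray

ray.init()  # starts Ray locally and uses all available cores by default

@ray.remote
def process_chunk(chunk_id: int) -> int:
    # placeholder for: fetch imagery for this chunk, run the detector,
    # return the number of detected buildings
    return chunk_id % 7

futures = [process_chunk.remote(i) for i in range(1000)]  # schedule all chunks
results = ray.get(futures)                                # gather when done
print(sum(results), "buildings detected (dummy numbers)")
```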
10:02
But there is only so much you can do with a single machine. If you want to go bigger, or run over a larger area, you need some cloud computing; in our company we use AWS. Here you can use Ray again:
10:21
you need to set up a cluster configuration and a Docker image, but once that is done, all you need to do is launch Ray with that configuration, and it will parallelize your computation across cheap AWS EC2 instances.
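On the driver side the change is small; this sketch assumes a cluster has already been launched with Ray's cluster launcher (`ray up`) from your own AWS autoscaler configuration file, whose contents are not shown here.

```python
# Sketch of the driver once an AWS cluster is up (e.g. launched with
# `ray up cluster.yaml`, where cluster.yaml is your own autoscaler config):
# connecting to the running cluster is the only change, and the remote tasks
# from the previous sketch then run on the EC2 workers instead of local cores.
import ray

ray.init(address="auto")  # attach to the existing cluster instead of starting a local one
```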
10:42
I know this sounds expensive, but in reality it's not: for an area the size of Azerbaijan, the processing cost was on the order of tens or hundreds of euros. That is not much, and it's affordable for smaller companies or, if need be, even individuals. Of course, once you have your data,
11:04
meaning your predictions, you want to validate them to see if they are any good and if what was produced actually makes sense. We validated in two separate steps. First we did validation on the Sentinel-2 level,
11:22
and what we found is that the built-up areas are detected quite nicely, even for a few isolated buildings. The main question we had was: say you have a single building somewhere, or maybe a small village with two or three houses; will we still be able to catch those areas
11:40
with this Sentinel-2 approach? It turns out that yes, we can. The calculation we made showed that we retain around 40% of the entire area of interest, so 60% can be completely discarded and never processed further, and in doing so we only lose 0.6% of the buildings.
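Those two numbers come from a check along these lines; the sketch below is one plausible way to compute them with geopandas, assuming the retained grid cells, the reference footprints and the AOI are available as vector files in a common projected CRS. The file names are placeholders, and this is a reconstruction of the described calculation, not the project's validation code.

```python
# Hypothetical reconstruction of the check described above: how much of the AOI
# survives the Sentinel-2 filtering, and what fraction of reference buildings
# falls inside the retained cells. File names are placeholders.
import geopandas as gpd

retained_cells = gpd.read_file("builtup_cells.gpkg")   # chunks kept after the Sentinel-2 level
buildings = gpd.read_file("reference_buildings.gpkg")  # ground-truth footprints
aoi = gpd.read_file("aoi.gpkg")                        # full area of interest

area_retained = retained_cells.unary_union.area / aoi.unary_union.area

# a building is "kept" if its footprint intersects at least one retained cell
kept = gpd.sjoin(buildings, retained_cells, how="inner", predicate="intersects")
building_recall = kept.index.unique().size / len(buildings)

print(f"area retained: {area_retained:.1%}, buildings kept: {building_recall:.1%}")
```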
12:01
A natural question to ask is: why even use Sentinel-2, why not just use something like OpenStreetMap? It turns out that in many countries, for example Azerbaijan, or in parts of Africa, OpenStreetMap is not really accurate
12:21
or not up to date. In this Azerbaijan use case, using OpenStreetMap we would lose 3% of the buildings, much more than with Sentinel-2 imagery. We then also validated the Spot and Pleiades predictions, and the main takeaway was that
12:41
what we did makes sense. Pleiades is much better at detecting smaller buildings, as expected and as intended, because the spatial resolution is higher. On the graph we also see that the pseudo-probability distribution for Pleiades is shifted much higher,
13:02
so the model is more confident, again because it gets a higher spatial resolution. One problem we faced when we looked at the imagery is false positive detections. This is a caveat: we had limited imagery for Spot and Pleiades, and that limited imagery was focused mostly on built-up areas,
13:21
which means the model didn't see many areas without buildings during training, and so we had some problems with false positives, both in built-up and non-built-up areas. Like I said, there is no free lunch:
13:42
this approach does lead to some loss in accuracy, it can't be free, but maybe we can make the lunch a bit cheaper with ingredients of similar quality. Here we have a table comparing the accuracy, measured as the mean average precision score,
14:03
against the cost saving relative to using just Pleiades; that is, if you wanted to use only Pleiades, what would your cost and accuracy be? Looking at the first row, we see that for Pleiades alone
14:20
the mean average precision is 0.45. If we use just the first level of HIECTOR, so Sentinel-2 to get the built-up areas and then order Pleiades over those, we already get a saving of approximately 2.5 times with a negligible loss in accuracy;
14:40
there is not much of a difference. Similarly, if we use only Spot, it's much cheaper than Pleiades, but there is a loss in mean average precision. With HIECTOR, so with all three levels
15:01
and with a threshold that we found works nicely, the accuracy is somewhere between what you get from Pleiades and what you get from Spot alone, but the cost saving is large: you basically save 20 times the amount of money,
15:21
and we feel this is a nice result for most use cases. Of course, if you require very high precision and have unlimited money, you might consider using only Pleiades, but for most companies I think the hierarchical approach makes a lot of sense. These are scores we calculated
15:43
on our manually validated areas, so if you work on a different area the exact numbers may not carry over, but the rough picture should be the same. After we had this model tuned
16:00
and built for Azerbaijan, we also tried some transfer learning and fine-tuned the model on Dakar, to see how well it transfers to a completely different geographical region. First of all, we used the Sentinel-2 model without fine-tuning, because we had basically
16:21
no training data for Dakar; we just ran the Sentinel-2 model as is. You can see on the graph that the model is much less confident, but overall the quality seemed to be quite okay. We also tried to fine-tune
16:42
the Spot/Pleiades model. As I said, we had no reference data, at least not of the quality we wanted, so we manually labeled a data set of 7,000 buildings and trained the model on that. It turns out that this actually leads
17:01
to quite good results. We compared them to Google's Open Buildings data set, which provides building footprints across basically the whole of Africa, and the results are, I think, comparable, if not even favorable for HIECTOR.
17:21
So far I have been talking about what we did; the question in the audience is probably, how can I use it, and what have we made publicly available? HIECTOR is fully open source: we've published the models and the labels for Dakar on an open AWS S3 bucket
17:40
that you can freely access. We've also published the full HIECTOR code on our Sentinel Hub GitHub under the MIT license, and there is an example notebook to get you started that does inference only,
18:00
so no training; but for a lot of use cases this may be enough to try out what HIECTOR is about. A few words about the code base: it allows you to reproduce both the training and the inference steps in full, but there are, of course, some prerequisites
18:21
which are unfortunately unavoidable. You need a Sentinel Hub account, because we work with Sentinel Hub, and you need some credits for the very high-resolution imagery; how much depends on your area of interest. Our code base is also tailored to be used with AWS,
18:44
so you need an S3 bucket for data storage and access to EC2 spot instances.
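For the Sentinel Hub prerequisite, the one-time credential setup typically looks like the following sketch with the sentinelhub package; the client ID and secret are placeholders for your own OAuth client, and the AWS side (S3 bucket, instance credentials) is configured separately through your usual AWS tooling.

```python
# Sketch of the one-time Sentinel Hub credential setup needed before running
# the inference notebook; the values are placeholders for your own OAuth client.
from sentinelhub import SHConfig

config = SHConfig()
config.sh_client_id = "<your-sentinel-hub-oauth-client-id>"
config.sh_client_secret = "<your-sentinel-hub-oauth-client-secret>"
config.save()  # stores the profile so later requests can pick it up
```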
19:01
I think that about covers my presentation, so if you have any questions, please let me know and I would be happy to answer. All right, thank you, guys.