Accelerating Open Source Geospatial Machine Learning
Formal Metadata
Number of Parts | 295
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/43514 (DOI)
FOSS4G Bucharest 2019 (70 / 295)
Transcript: English (auto-generated)
00:07
Hi everyone, my name is Ryan Lewis and I'll be walking you through what we've been doing for the last three years with an initiative we call SpaceNet. Before I get into all of that, just show of hands, how many people have worked with SpaceNet data, worked with some of the code?
00:23
Alright, not as many, right? So this will be informative, so I will try not to burn through too much of the introductory piece. Before I get underway, a quick shout out to some of my colleagues in the room. So Jake Shermeyer is sitting up here in front, he's helped with all the challenges to date, he'll be leading SpaceNet 6.
00:41
Nick Weir, who is hanging out in the hall, but he's really excited about this presentation, even though he's not in here. He was the manager for SpaceNet 4, and then from our AWS partners we've got Joe, everyone knows Joe. The last presentation covered this well, so I'm not going to belabor the point, but oftentimes we get asked, why did we start SpaceNet? What was our intent?
01:05
And it really goes back to almost five years ago when we first started looking at companies that were producing new remote sensing data sets. On the one hand we have this trend of lower cost imagery or more available imagery, whether it's from satellite or drone or something like that.
01:21
On the other hand we have increasingly commoditized machine learning capabilities, primarily through the open source community. What happens when these two things intersect? All the presentations we've heard today have covered aspects of that, and certainly other presentations here at this conference have hit on that. But the question then kind of jumps to the present, which is why haven't we seen greater adoption in either enterprise scale products or services,
01:49
or why haven't we seen it deployed in some of the use cases that often get cited in press releases? And this came out from HOT OSM. They're just now starting to experiment with incorporating machine learning tech into the Tasking Manager.
02:05
And the answer that comes from that is that, as you all know, there are still a fair number of market challenges to incorporating machine learning capabilities into spatial-specific products. One, and this is no surprise, obviously the questions from the last talk hit upon this,
02:24
which is there is simply a lack of labeled data sets. There's been a big change since we started this effort three years ago, but relative to other markets or applications, we'd still say there's limited data. Second is that we're still talking about trying to use transfer learning for models.
02:43
There hasn't been as much benchmarking for geospatial-specific models as we'd like to see. And then last, but not least, and this has certainly changed in the last year, I would say, there has historically been a dearth of open source models or software to spin up models very quickly.
03:03
This is changing, but it's still something we'd say is a barrier to entry. And so Nick, Jake, and I, our broader company is an investment firm. Imagine all of us were in a company; you'd say, all right, we have very limited data, we're going to have to figure out how to do some transfer learning for models,
03:22
and there's very limited software to start. Those are three hurdles in a row that would lead us as a company to say, we're going to focus on something else. And so collectively we said, what can we do to help the open source community accelerate research in this area? And that's really what led to us starting SpaceNet.
03:41
And so for the last three years, SpaceNet has been run as a non-profit LLC, dedicated exclusively to accelerating open source applied AI work in the geospatial domain. And we've been very fortunate to do this in concert with these four partners. And then effective this month, which has been really cool, if I can get the slides to work, we added two more.
04:06
And so you may have seen this, we added Capella Space last week, and then next week I'm stealing some of their thunder, so I apologize. We'll be adding Topcoder as well. Essentially, think about what we do in four general pillars.
04:20
We have data sets, the open source algorithms from competitions, and then we have rigorous evaluation and benchmarking. I'm not going to get into much of the evaluation today, but I'll put up links at the end to highlight where you can get access to that work. And I'm happy to talk offline about any of the specific projects or algorithms
04:41
that I'm going to briefly touch on here. But with the data sets, we focus exclusively on building footprints and roads, so all foundational mapping applications. To date, we've hosted four challenges through Topcoder. We'll be launching our fifth next week. From that, we've open sourced 18 algorithms.
05:01
And then for each one of those challenges, we have provided rigorous benchmarking on both the data set and those models. This is important, especially in this community, because we've had a lot of lessons learned over the years. One of them is how to make data accessible.
05:20
For our first challenge, we put out a labeled data set, but we did not have the correct license. Essentially, we did not make it as easy for commercial companies to use those data as it was for researchers or academics. And so over the years, we've tried to build out best practices to ensure that the data sets are not only accessible worldwide and easy to use,
05:45
but also available for corporations to use, so they can bring them into their own product research. Just last week, we announced the release of four more cities. So that brings our total data set to ten cities.
06:02
Essentially, what this means is that we will have either a WorldView-2 or WorldView-3 image for each one of these cities. The exception is Atlanta, where we have 27 images shot over about a four-minute period for an off-nadir analysis. With each one of these cities, we have building footprints and/or road labels.
06:26
And the intent, since the start, has been to build iteratively on this data set. And so not only can you go back and test your results against previous challenges and use the code from those challenges,
06:40
but you can also then begin to incrementally build your work to build out, hopefully, a more mature model. I mentioned we've done four challenges. Just like the data set where we're focused on incremental development, our challenges are focused on incrementally increasing complexity. And so the goal in the beginning was,
07:01
let's just see how well people do on identifying building footprints in one city. And the quick story that we always like to tell is that when we started it, the competition host that we were talking to at the time said, well, you're going to need a tiebreaker because everyone's going to get 100%. And the F1 score was 0.2. So we didn't need a tiebreaker.
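To make the F1 figure concrete, here is a minimal sketch of how a footprint F1 score can be computed. It is an illustration under stated assumptions, not the challenge's actual scoring code: axis-aligned boxes stand in for building polygons, a prediction counts as a match when its intersection over union (IoU) with an unmatched ground-truth footprint is at least 0.5, and the function names `iou` and `f1_score` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(predicted, truth, threshold=0.5):
    """F1 over footprints; the 0.5 IoU threshold is an assumption of this sketch."""
    unmatched = list(truth)
    true_pos = 0
    for p in predicted:
        # Greedily pair each prediction with its best remaining ground-truth box.
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= threshold:
            true_pos += 1
            unmatched.remove(best)
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy data: one prediction is exact, one is a false alarm, one building is missed.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(0, 0, 10, 10), (50, 50, 60, 60)]
score = f1_score(predicted, truth)
```

In the toy data, precision and recall are both one half, so the score is 0.5. A score of 0.2 means far more missed or spurious footprints than correct ones.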
07:22
And the reality is that it is a very hard problem, particularly when you get into more complex geospatial domains. Right now, we'll be launching SpaceNet 5 next week, and that is going to be focused exclusively going back to buildings, roads, and routes, which I'm going to talk about in a little bit.
07:42
It never ceases to amaze us, the level of interest, and probably a better word is demand, for both data sets and models in this community. These are just some basic metrics for us, but since we launched the data set, we just crossed our 450 millionth unique hit on the data set.
08:05
I just checked the numbers today. That's almost 500 total terabytes downloaded across 82 countries. That's really significant, and it highlights the importance of continuing to put out this type of information for this community. Not surprisingly, in a lot of the talks,
08:21
we've discussed how geospatial or computer vision problems are making their way into the leading machine learning and AI conferences. It's not surprising who's downloading these data sets, whether it's academics or corporations. As the title of this talk implies, it's important to put in context what sort of progress we've made.
08:44
We've open-sourced these data, we've hosted these challenges and open-sourced these models, and we've done some of our homework off these models. How well are we doing? There are a lot of metrics. Often in our world, usually in the AI domain, you hear this very binary discussion.
09:00
It's either you don't use it, or we have completely autonomous mapping capabilities. The reality is there's a ton of nuance in between those two poles. Think about it as levels, if you will, with zero being completely manual efforts to build out maps and five being a completely automated solution.
09:22
In our perspective, we still haven't even touched level four at all in terms of building out some sort of semi-automated mapping function at scale. Really, I'm going to critique my own slides, it's really not even level three, or there should be a huge asterisk next to that,
09:42
because what we've seen in algorithmic performance, you could maybe make an argument for level three in a US city like Las Vegas, but applying those same models to one of our other cities like Khartoum, you'd say that there is a lot of manual annotation that still has to go in to address the machine learning projections.
10:03
When you say, I have a model, it performs well, I think we could use this in something like the HOT OSM Tasking Manager, you have to be really precise about what you mean by that, because there's still a lot of work to be done. Which is why, when we're thinking about challenges and data sets, one of the big thoughts for us is,
10:20
how do we expand the generalizability of models? So with SpaceNet 5, we're going back to road networks, but this time, instead of just asking challenge participants to extract road networks and routes, we're now asking for a third dynamic, a third projection, which is travel time estimates, based on road type,
10:42
which has a standard speed limit assigned to that road type. This then really gets into the question of optimal routing. We think this is going to be particularly challenging, one, because of the new cities that we added, but two, the complexity of extracting the road type,
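The idea of turning road type into travel time, and travel time into routing, can be sketched as follows. This is an illustrative sketch only, not the SpaceNet 5 scoring or baseline code: the speed table, the edge format `(node, node, length in meters, road type)`, and the function names are all assumptions made for this example. It uses a plain Dijkstra search weighted by seconds rather than by distance.

```python
import heapq

# Hypothetical speed limits per road type, in km/h (assumed values).
SPEED_KMH = {"motorway": 100.0, "primary": 60.0, "residential": 30.0}


def travel_time_s(length_m, road_type):
    """Seconds needed to traverse a segment at the speed assigned to its type."""
    speed_ms = SPEED_KMH[road_type] * 1000.0 / 3600.0
    return length_m / speed_ms


def fastest_route(edges, source, target):
    """Dijkstra over an undirected road graph, weighted by travel time."""
    graph = {}
    for u, v, length_m, road_type in edges:
        t = travel_time_s(length_m, road_type)
        graph.setdefault(u, []).append((v, t))
        graph.setdefault(v, []).append((u, t))

    best = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        t, node = heapq.heappop(queue)
        if node == target:
            break
        if t > best.get(node, float("inf")):
            continue  # Stale queue entry; a faster way to this node was found.
        for nxt, dt in graph.get(node, []):
            cand = t + dt
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(queue, (cand, nxt))

    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


# The direct residential street is shorter, but the motorway detour is faster.
edges = [
    ("A", "B", 1000.0, "residential"),
    ("A", "C", 1500.0, "motorway"),
    ("C", "B", 1500.0, "motorway"),
]
path, seconds = fastest_route(edges, "A", "B")
```

With these assumed speeds, the 1 km residential segment takes 120 seconds while the 3 km motorway detour takes 108 seconds, so the fastest route goes through C. This is why a misclassified road type changes the optimal route even when the extracted geometry is correct.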
11:02
particularly in certain geographies or weather situations, we anticipate being very challenging. So how do we plan to structure this? So we have our legacy cities that have been in the data sets since SpaceNet 3, which launched almost two years ago. We added four more, and I'll explain the mystery city in a second,
11:23
but we'll have now included in the training, as well as the public leaderboard, both Moscow and Mumbai, and then we'll be including both in the leaderboard, as well as your final scoring, San Juan, and then last but not least, in an effort to push increasingly more towards generalizability,
11:42
we now have a mystery city. This is the first time we've done this, and we will not expose what the city is, or its labels, or the imagery until after the challenge, so it's going to be a blind test. And the whole point is pretty obvious here, that you can't build a model specific to the city. You'll have to build something that is designed to reach across different geographies.
12:03
We're really excited to see what sort of results this produces. From a timeline perspective, this month's been busy for us. In addition to adding the two new partners that I mentioned, we released the algorithmic baseline version one earlier this month.
12:21
For anyone who was at Nick Weir's Solaris talk or workshop on Monday, we'll be releasing an updated baseline in Solaris next week. Last week, we released the data set. We will be starting the challenge next week on Tuesday.
12:40
That challenge will run to October 25th, and then we will be announcing the winners and distributing the prize awards on November 8th. Each one of our challenges runs about eight weeks, give or take. In terms of important links, this is where you can access anything on SpaceNet,
13:00
specifically on the fifth challenge here. We have pre-registration open. I checked yesterday, about 75 people have registered thus far. If you have any questions, I have cards, or any of my colleagues have cards, we're happy to talk afterwards. These are the primary links. It's always frustrating to do a presentation and talk about all this cool quantitative work and not actually present it.
13:23
I will just highlight some links here. I apologize, the crazy font we use doesn't translate well. We have our emails up there if you want to reach out to us directly with any questions. The main place we put a majority of our research is our blog, the CosmiQ Works blog,
13:40
which is called The DownLinQ, with a Q. Everything we do has a Q. You can find that on Medium, and that's where we describe the datasets, describe our baseline, have links to all the code, and things to that effect. Before I turn it over to questions, I'll just put this up. It's just a brief survey. It's just asking you what you would like to see us do next,
14:02
whether it is new datasets that are non-imagery. Maybe you'd like us to do deeper time series work. This is something that all the partners on our weekly calls, we debate continuously about where we should be pushing either challenges, some of our own applied research, or things to that effect.
14:21
Don't worry. We are going to get the dataset both STAC and COG compliant, so that's just assumed. With that, I'll turn it over for questions. Thank you.
14:46
Thanks. Maybe to give you first feedback on this, are there any plans on going beyond just feature detection, but actual, let's say, semantic classification of entire scenes? That's a good question. We've talked about it, but no specific plans.
15:01
Right now in the near term, especially given the announcement with Capella, our goal, or our targeted goal, I should say, somewhat tentative, for SpaceNet 6 would be to make our first multimodal challenge, so to have both imagery and SAR data labeled over one location.
15:21
Broader classification is something that comes up a lot. Some of our colleagues, Adam and Nick, have done a little bit of this work, but not through a challenge, just some of our own applied research. It's a direction we'd like to go eventually.
15:41
It also falls to feedback. It would be nice also if future challenges could be based on open data, so not, let's say, very high resolution data. You get the training data, you develop a model, but then basically you don't have access to,
16:01
or it's difficult to get access to very high resolution data, but if you go lower in the resolution, then it's open, you develop a model, you can use it, everyone else can use it, so perhaps. And just as a follow-up, because something we talk about is when you talk about lower resolution, we've kind of had two camps.
16:24
One is saying that we should focus something almost in the classical Landsat domain. The other thought we've had is just going to more of moderate resolution, so like one meter to like five meter GSD, so like a preference. Just curious, in your mind. It's just a preference towards open, open satellite imagery.
16:46
Okay. And just to say that there is an open imagery with high resolution,
17:03
so it's not because we talk open data imagery that we lose the high resolution. That was the only point. A kind of open data related to city, there is open data related to city, so you can use this kind of data.
17:28
I don't say that it covers the whole planet, but for several cities, there is aerial acquisition, and it's open data.
17:41
That's only the point. Maybe just to complement that, and as I said yesterday, for example, when I talk about what we're doing in Wallonia right now on land cover mapping, there's more and more open data from actual governments of, let's say, aerial imagery, and there's more and more teams doing actually classifications of that. And so we are hoping that we can actually, for example, take the entire Walloon region, which will have classified very high resolution as well,
18:02
and to be able to open that as a data set for training, for example, after the project is finished. Yoni in the front, and then the back. Could you share a little bit more about your long-term vision and strategies,
18:21
so thinking a year, two years, five years even ahead, what would you like to see and where would you like to be? Yeah, that's a good question. I think when we first started this, it was just could we even get it to work, and do we even know how to host a challenge? Now that we've had a fair amount of momentum,
18:43
certainly in the next year or two, we want to move simply out of one type of image, one type of data type, certainly multimodal. More broadly, looking down the road, we'd like to move out of imagery or just only imagery or space-based data. So many geospatial problems are not just resolved with one data type or only space data.
19:07
And we've had some companies talk to us about being able to incorporate some other information types. There's a lot of inherent challenges just with that, which we think are compelling. But at that point then, there's just a lot more work for us to do.
19:22
I know one thing that's come up frequently just in the last three months that would take a fair amount of thinking is what would it be like to incorporate street-level data? So I think over time, that's where we want to go, because that's where we eventually are going to have to move if we're going to build out even more robust foundational maps.
19:41
Another option, which is a little more provocative, is if we were to include features that were not classic foundational map features. So objects that move or things to that effect, that's certainly something that we've seen more government-backed organizations release. Yet we haven't seen a real consistent labeling method for a lot of those data sets.
20:07
And so incorporating something like that could be useful for answering other questions. But as we've found, if we're going to do that, we want to be really precise about the question we're asking. So let's say we did cars. That's a really popular topic.
20:22
We probably want to integrate that with something that we've done along the road network. Say, can we infer something about traffic, even though these are very static pictures, taken at only one specific time in the day? Do you have any insights on the participants of the previous challenges?
20:41
So how many people are coming from research, from maybe companies and private people? Yeah, that's a good question. I would say we have some insight, not as much as we would want. So at a high level, an overwhelming majority of the participants, and certainly almost all of the winners, are international.
21:03
So that's very compelling. So 32 countries were represented in the last challenge, which was good. From a corporation perspective, we have had several companies participate. They have not won. And that's not a...that sounded terrible.
21:22
It's not that they necessarily didn't have the skills, but competitions kind of have their own dynamic to them. We've learned that when people score at a certain rate, you either have to keep up with the submissions or you kind of step down. And we've seen some of that behavior with anonymous corporations.
21:43
But that's really only qualitative. Long story short, we haven't seen as many corporate participants as we want. A majority are either individual researchers or they're a part of research labs. What we have found, though, which is less qualitative, is corporations using the data, but much more so the models,
22:05
and incorporating that into their product suite. That's something that we hear a lot of feedback on, and certainly a lot of encouragement for us to keep doing what we're doing. Thank you.