
Open Source Point Cloud Semantic Segmentation Using AI/ML


Formal Metadata

Title
Open Source Point Cloud Semantic Segmentation Using AI/ML
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year
2022

Content Metadata

Abstract
Assigning semantic labels to points within a point cloud aids both in visual interpretation of the data and as a preprocessing step for other forms of analysis, like building footprint extraction, hydrological modeling, and biomass estimation. Our talk will focus primarily on earth observation data and airborne lidar data sources in particular, where labels are commonly aligned with the classes specified in the ASPRS LAS specification (e.g., ground, vegetation, and building), but we are also beginning to explore the extension of these same methods to data generated by commodity, consumer-grade devices like iPhones. For many years, hand-tuned models have been developed for this segmentation task, building on reasonable assumptions about the data: for example, that ground points should include the lowest-elevation returns within a local window, or that building segments should typically be planar. Within the past decade, we have seen a surge in AI/ML-powered models that are in many cases able to outperform the prior methods, learning novel features and adapting to the intrinsic variability of the data. We will provide an overview of the open source ecosystem powering this trend, from benchmark datasets like US3D and DALES to machine learning frameworks (e.g., PyTorch and TensorFlow) and key libraries such as PDAL, Open3D, and PyG.
Transcript: English (auto-generated)
All right, thanks, Mike. So right, my name is Brad Chambers. I work for a small consulting company called Grover Consulting Services. But as Mike mentioned, I have worked on and contributed to PDAL for, I think, over 10 years at this point. And by the way, great talk, Howard. Always entertaining listening to you.
A little tough to follow you, but hopefully you guys will enjoy this talk. So we're going to talk a little bit about semantic segmentation. And I would invite any of you, if you missed it, there was another talk earlier today from Oslandia. It was an online session where they also talked a little bit about point cloud semantic segmentation.
And it had a lot more in the way of background on what exactly the task is and some of the different models and methods that are available. So I don't know when those videos or slides become available, but I would encourage you to go and look that one up. So today, the point of the talk is going to be to, again, just briefly describe
what the problem is and what we are trying to do when we talk about semantic segmentation of point clouds. And we will also be looking specifically at some benchmark data sets that are available, and at some of the metrics that are typically used to measure performance on this particular task.
And then we will spend most of the time talking about some of the open implementations that are available. And you won't be surprised to find that there is a heavy emphasis on some of the things that PDAL can do to make this a little bit easier.
So this is just an example. This tweet was actually, I think, a year old yesterday, where I was showing an example of highlighting building points in a point cloud. And so you can see those are highlighted in kind of this orange-red color. And then there's another one here
where it's highlighting the vegetation. And so semantic segmentation really is just the process of assigning a semantic label to each of the points in the point cloud. In the raster domain, this would be labeling each pixel; it's really the same concept. Now, in LAS land, there is kind of this predefined set
of labels. And so typically, the most common ones that people talk about are ground, building, and vegetation. But there are others, right? You can get into water classes, bridge decks. And then people might be interested,
if their data supports it, in going further into things that the LAS spec doesn't necessarily cover, like vehicles, power lines, that type of thing. And when you start doing that type of classification, you end up working with custom classes. So there's not really anything that's unified or widely accepted for some of those. But they are also kind of in the weeds; a lot of times, it can be a little bit tricky to train the tools to detect those classes.
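For reference, the standard labels this talk keeps coming back to map to fixed codes in the ASPRS LAS specification. A minimal Python lookup of the common ones might look like this; the selection shown is just the subset discussed here, not the spec's full table:

    # Common ASPRS LAS classification codes (LAS 1.4); just the subset
    # discussed in this talk, not the full table from the spec.
    LAS_CLASSES = {
        1: "Unclassified",
        2: "Ground",
        3: "Low Vegetation",
        4: "Medium Vegetation",
        5: "High Vegetation",
        6: "Building",
        9: "Water",
        17: "Bridge Deck",
    }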
So the basic metric that we will be looking at, or one of them, the most intuitive one for a lot of people, is the overall accuracy. And this really is just the number of correctly classified points divided by the total number of points. The problem with this is that if you are after some of those classes that are perhaps
underrepresented and smaller in number, your performance on a particular class gets kind of lost, and it's hard to tell how well you're actually performing. And so for that reason, what we often look at instead is this metric called the intersection over union.
And that's just looking at, exactly as it sounds, the intersection of the truth labels with your predicted ones, divided by the union of all of those, for that particular class. And so that way, you can hone in on the performance of, individually, all of your ground labels, all of your buildings, all of your vegetation, et cetera.
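As a minimal sketch (not the talk's own code), these metrics, including the mean IoU that comes up next, can be computed from two label arrays in a few lines of NumPy:

    import numpy as np

    def segmentation_scores(truth, pred, classes):
        """Overall accuracy, per-class IoU, and mean IoU for 1-D label arrays."""
        oa = np.mean(truth == pred)
        ious = {}
        for c in classes:
            intersection = np.sum((truth == c) & (pred == c))
            union = np.sum((truth == c) | (pred == c))
            # IoU is undefined for a class absent from both truth and prediction
            ious[c] = intersection / union if union else float("nan")
        miou = np.nanmean(list(ious.values()))
        return oa, ious, miou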
And it's still common, then, to take that and take the mean over it and report a mean intersection over union score. OK, so in terms of benchmark data sets, the two that I really am focusing on in this talk, and they're not the only ones out there,
but these are the ones that are most closely aligned with aerial LiDAR surveys, which is what I mostly deal with myself. There are others that might be more room scans and some very mainstream benchmark data sets, like ModelNet and ScanNet and S3DIS.
There are quite a few out there. And a lot of the libraries and frameworks that you'll find for point cloud semantic segmentation will reference those benchmarks. These two are a little bit newer. They're not brand new. I'd say they're both within the past probably two or three years. So US3D comes from a US cities collection that's available.
And they picked two sites, one in Jacksonville and one in Omaha. And they've got a large number of training tiles, but it's a relatively low point density. And so if you think back to Howard's chart, actually, about how these point densities are beginning
to skyrocket, this kind of represents what would have been mainstream. And actually, I think this data was collected closer to 10 years ago. So it's a relatively low resolution, but there's still plenty of this type of data that you might find publicly available, or not necessarily even public.
Data was often collected at this resolution. And as we mentioned, those classes really tend to be mostly saturated in ground, vegetation, and building. Those alone are easily like 90% of the entire data set. This did strive to also classify points
as either water or bridge, and then there are some that are still left as simply unlabeled. DALES is much higher density. This was a survey done with a RIEGL sensor in Canada, closer to 25 points per square meter.
But it still shows kind of the same story, where we've got ground, vegetation, and building really dominating most of the labels in the point cloud. There are some others, and this is a good example of where we get into some of the non-standard classes of cars, trucks, even fences.
But they're really much smaller numbers. All of those together were less than 2% of the entire data set. So for the rest of this talk, I'm really gonna be focused on US3D. That's where I've done most of my work. And so tools. There are some frameworks out there,
and we're beginning to see some more development of tools that are publicly available as open source software for semantic segmentation. And so Open3D has a machine learning kind of sub-project called Open3D ML. There's also PyG, which is PyTorch Geometric.
And these do a really good job of taking kind of state-of-the-art algorithms and keeping relatively current with what we might be seeing in the literature. And so there's all sorts of models in there that you can train on.
Where some of the drawbacks exist is that they really seem to be focused more on academic or robotics applications. You see lots of data sets and readers and writers for, still, ASCII data, PCD, PLY.
And as I mentioned earlier, a lot of the benchmarks that they're measuring themselves against, again, are those more mainstream, not aerial LiDAR survey benchmarks. So what does PDAL bring to the table?
So as I would guess many here already know, the PDAL users in the room, we have the flexibility of supporting many file formats. And so we can still support ASCII, PCD, these types of things that people have been using historically in some of the past research
and the other frameworks that you would find. But we also have that support for LAS and LAZ; as Howard mentioned, when we get into the open data holdings, that's how really all of this aerial LiDAR data is encoded. And then PDAL, of course, does have support for reading COPC as well.
PDAL also brings to the table the ability to manipulate and generate certain features. And so while you can do machine learning with just some basic information, you know, coordinates, maybe intensity, stuff that's natively available in the point cloud,
there's a lot of power in also being able to manipulate it. And so Ferry is up there because you can rename and shuffle around some of the dimensions, in PDAL-speak. Assign would allow you to adjust things, so you can rescale data or do certain
algebraic operations on combinations of dimensions. And then create is this whole separate idea of being able to generate new features from the data itself. And so a lot of these are geometric features taken from point cloud neighborhoods:
eigenvalues, normals, and these covariance features, which you'll see in a lot of the literature, where they talk about estimates of linearity, planarity, sphericity, and verticality. And those are all computed by PDAL.
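To make that concrete, here's a minimal sketch of chaining those stages with the python-pdal fluent bindings. The stage names (filters.ferry, filters.assign, filters.normal, filters.eigenvalues, filters.covariancefeatures) are PDAL's own; the file name and the particular dimension choices are just illustrative:

    import pdal

    # Sketch of feature manipulation and generation; the input file and
    # the specific dimensions/values are illustrative placeholders.
    pipeline = (
        pdal.Reader.las("tile.laz")
        # ferry: copy/rename a dimension, here stashing raw intensity
        | pdal.Filter.ferry(dimensions="Intensity=>RawIntensity")
        # assign: algebraic rescaling of a dimension
        | pdal.Filter.assign(value="Intensity = Intensity / 256")
        # neighborhood-based geometric features
        | pdal.Filter.normal(knn=8)
        | pdal.Filter.eigenvalues(knn=8)
        | pdal.Filter.covariancefeatures(knn=8, feature_set="Dimensionality")
    )
    pipeline.execute()
    features = pipeline.arrays[0]  # now carries Linearity, Planarity, etc.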
Now, PDAL doesn't internally support any machine learning on its own, but it does have a real strength in that the Python ties are already there, through the Python filter and through the PDAL Python bindings. And so while Python is certainly not the only name in the game when it comes to machine learning, there are clearly a lot of people there;
be it PyTorch, TensorFlow, Open3D, PyG, all of these mainstream libraries are really rooted in Python. So there's an easy connection there to do some neat things. So backing up a little bit, actually, one of the first filters
that I probably contributed myself to PDAL was this thing called PMF, the Progressive Morphological Filter. We later followed that up with SMRF, the Simple Morphological Filter, and then CSF, the Cloth Simulation Filter. And all of these were focused on discriminating between ground and non-ground points.
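As a quick sketch of what that looks like with the Python bindings (file names hypothetical), filters.smrf marks the ground returns in the standard Classification dimension:

    import pdal

    # Ground/non-ground segmentation with SMRF; ground points come back
    # with Classification == 2, everything else is left as-is.
    pipeline = (
        pdal.Reader.las("tile.laz")
        | pdal.Filter.smrf()
        | pdal.Writer.las("ground.laz")
    )
    pipeline.execute()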
And, you know, that alone is a form of semantic segmentation, right? And we've already seen that it addresses a majority of the points, like 60-ish percent or higher in many data sets. And so there's already a big bang for your buck in just doing that alone. And it's also kind of foundational to other things
that you might want to do, whether it be height above ground or other products. Then we also, over time, and I've already mentioned a few of these, started developing more primitives, the features that you can calculate from the data. So again, that's eigenvalues and normals,
or the height above ground. And out of that, we actually saw, and this was separate work that was posted probably a couple of years ago, but there's a slide deck and a Python notebook that you can go and look at, where they took a lot of those stock PDAL filters and built up a classifier
just based on some pretty simple heuristics to be able to do ground, building, and vegetation. And if we take what they provided at the time and write it in more modern PDAL using the newest Python bindings, this is really all it looks like. And so it's taking an EPT data source,
running SMRF to get the ground labels, computing a height above ground, then computing eigenvalues, and using those dimensions it's just looking at some simple conditions. So one of them is asserting
that things are likely to be trees if they were still unclassified after the ground segmentation step, if the height above ground was greater than or equal to two meters, if the smallest eigenvalue was larger than a set number,
and if the number of returns minus the return number was something greater than one. And if you can wrap your head around all of that, it kind of makes sense as something that might represent a tree: it was tall enough, the data was kind of scattered,
and it was in an area that did have multiple returns, but it also wasn't the last return. Similarly, you can take that same short list of dimensions and come up with a crude classifier for the roofs. Those points that were unclassified, and here they must have been looking for tall buildings,
because it was over seven meters; they were looking for small eigenvalues; and they were looking for a situation in which these were points that were last returns. And using that, they output it to LAS.
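A hedged reconstruction of that pipeline might look like the following. The 2 m and 7 m height cutoffs and the return-number tests come from the talk; the EPT URL and the 0.01 eigenvalue threshold (the "set number" mentioned above) are placeholders, and the class codes assume the standard LAS assignments:

    import pdal

    # Reconstruction of the heuristic classifier described above, using
    # standard LAS codes (1 unclassified, 2 ground, 5 high veg, 6 building).
    # The EPT URL and the 0.01 eigenvalue threshold are placeholders.
    pipeline = (
        pdal.Reader.ept("https://example.com/ept.json")
        | pdal.Filter.smrf()        # ground -> Classification 2
        | pdal.Filter.hag_nn()      # adds HeightAboveGround
        | pdal.Filter.eigenvalues(knn=8)
        | pdal.Filter.assign(value=[
            # tall, scattered, not a last return -> tree
            "Classification = 5 WHERE Classification == 1 && "
            "HeightAboveGround >= 2 && Eigenvalue0 > 0.01 && "
            "(NumberOfReturns - ReturnNumber) > 1",
            # taller still, planar, last return -> roof
            "Classification = 6 WHERE Classification == 1 && "
            "HeightAboveGround > 7 && Eigenvalue0 < 0.01 && "
            "ReturnNumber == NumberOfReturns",
        ])
        | pdal.Writer.las("classified.laz")
    )
    pipeline.execute()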
So I actually took that, modified it slightly to run with the US3D dataset that I mentioned earlier, and looked at it a couple of ways. So the first was just measuring the overall accuracy and the mean intersection over union for only that three-class case, which it was actually designed for: building, vegetation, and ground. And you can see that these are maybe fair scores.
The overall accuracy is in the 80% kind of range, and the intersection over union is pretty low. And then if you extend that to all five classes, which for US3D adds water and bridge, unsurprisingly those numbers, especially the intersection over union scores, go way down,
because it was never designed to detect those. But this does also highlight that we don't see nearly as much of a drop in the overall accuracy, because those classes were so small to begin with. They just don't really impact the score significantly.
So more recently, we had done some work using a more modern, true machine learning approach called KPConv, which is what I determined to be roughly state of the art, anyway, at the time I did the study, which was about a year ago. We linked that up with PDAL
so that we had support for LAS, resampling of the data, and some of this feature generation. It can be pretty simply installed. I'm a big Conda user, and so this is an example of the Conda environment; that's all that's needed to get everything installed on the system and up and running.
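Something along these lines, as a hypothetical environment.yml approximating what's described (the exact, pinned environment lives in the repo):

    # Hypothetical environment.yml approximating the setup described;
    # see the talk's repo for the actual, pinned environment.
    name: kpconv-pdal
    channels:
      - conda-forge
    dependencies:
      - python=3.9
      - pdal
      - python-pdal
      - numpy
      - scikit-learn
      - pytorch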
Here is the GitHub repo where I have my code, which is really just modifications of the original author's own PyTorch code. This isn't necessarily meant to be pretty code, and if you take a look at it, it might be a bit frustrating in places, but it was really done as a proof of concept
to show that it could be done. So that's out there. And I'll just summarize the key changes. There is an LAS dataset that needs to be defined, because, again, before this they weren't handling LAS at all. And so some of the things that we add to that
are that we define some of these additional classes that are specific to the LAS spec, and then we can actually go in and tell it which of those classes to ignore. Just because LAS specifies all of these doesn't mean that we have the data to support it, or that we're going to make any attempt to learn some of the smaller classes.
So that list that you see there limits it to just the five classes that I mentioned earlier. And then there's an LAS reader in the utils folder. And again, it's using PDAL's more modern Python bindings. So it's reading the file,
limiting it to points that actually do fall within that class range. I think there might have been some other points that had kind of erroneous, kind of weird classifications, and so I just tossed them out at the beginning. It computes these covariance features, the linearity and verticality and those things that I mentioned earlier,
and then uses PDAL's Poisson filter to downsample it so that no two points are any closer than one meter to one another. And it takes all of these dimensions, stacks them up into NumPy arrays, and passes them back out. And then from there, it's really able to use all of the stock stuff that came with the original KPConv implementation.
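A simplified sketch of such a reader, not the repo's exact code; the five class codes follow the LAS spec for the classes named earlier, and only two of the covariance features are passed along here:

    import numpy as np
    import pdal

    # Illustrative LAS reader: keep the five classes, add covariance
    # features, Poisson-thin to a 1 m spacing, and hand back NumPy arrays.
    def read_las(filename, keep=(2, 5, 6, 9, 17)):
        limits = ",".join(f"Classification[{c}:{c}]" for c in keep)
        pipeline = (
            pdal.Reader.las(filename)
            | pdal.Filter.range(limits=limits)       # drop stray labels
            | pdal.Filter.covariancefeatures(knn=8)  # Linearity, Verticality, ...
            | pdal.Filter.sample(radius=1.0)         # Poisson downsampling
        )
        pipeline.execute()
        a = pipeline.arrays[0]
        xyz = np.stack([a["X"], a["Y"], a["Z"]], axis=1)
        feats = np.stack([a["Linearity"], a["Verticality"]], axis=1)
        return xyz, feats, a["Classification"]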
And so what we see here is that we've reiterated the heuristics there on the top line. And we can see that with KPConv and PDAL, we're now up in the upper 90s, percentage-wise, for overall classification accuracy, in both the three-class and five-class problems.
And our mean intersection over union scores are significantly higher, both up in the 90s as well. And the last one there is just kind of a data point. So DPNet was the top-performing algorithm in the paper that was written for this US3D benchmark.
We don't have a lot of information about that algorithm, and I've never tried to run it myself, but that 0.945 mean intersection over union was, you know, the best at least a couple of years ago. And we're up there close to it, running like 500 iterations, I think, on a desktop machine.
So, you know, presumably we could maybe do better if we did some more hyperparameter tuning and longer runs and all of this, but I think it's reasonable where we are right there. So I'll just wrap it up with some of my own desires for the community at large,
and especially the machine learning community, which isn't necessarily who this audience is: I would like to see broader adoption of PDAL across the community, so that we don't have to do this work after the fact to hook it up and get all the benefits
of being able to support the different data formats and generate the features. I mean, so much of that is already provided. People don't need to be rewriting it themselves. I'd also like to see the same authors, or future semantic segmentation model authors,
publishing results against US3D and DALES, so that the aerial LiDAR survey community can have an understanding of: okay, great, I see how it performs against the Stanford bunny and all that, but this is how it might perform against data
that I'm actually using in my own job. And then I think there are also some really interesting things that we've just not really begun to look at, in terms of connecting some of these methods and models with COPC, and being able to really look into wide-scale inference of these semantic segmentation labels on very massive point cloud data sets. And with that, that's my talk. If there are any questions.