Exploiting PDAL and Entwine in the wild

Formal Metadata

Title
Exploiting PDAL and Entwine in the wild
Number of Parts
295
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The PDAL and Entwine stack is a powerful toolkit for managing and exploiting massive point clouds - and small point clouds, and weird point clouds. Because they're given freely to the community, small enterprises can pick them up and do incredible things - which are normally the reserve of governments, infrastructure providers, and universities. Come for a whirlwind tour of how these tools have been deployed by a tiny business on airborne LiDAR; photogrammetric point clouds; and huge bathymetric surveys - from raw data through to beautiful visualisations, which are also data as foundation infrastructure. Then stay for some words on how you don’t need to be a programmer to give back to the community which grows and supports these capabilities. It's part technical, part research, part business, and part provocation.
28:17
Transcript: English (auto-generated)
Thanks for coming and staying to the end of the session. It's been a really long and eventful conference, so it's good to see everyone excited at the end. So my name's Adam Steer, and I
am going to talk about exploiting PDAL and Entwine in the wild. And hopefully this talk is going to work full screen.
Come on, that'll do. That's OK, like that. OK, so just like Connor's talk, this is going to be about the PDAL library and Entwine working together in the wild. And even in Bucharest, you can find PDAL out there.
We're just walking around a cafe, and we found the logo just sitting there. So it's everywhere. So I'll just introduce myself. I'm a freelance geospatial consultant. I run my own little company, like a lot of people here. And one of the points of this talk
is to sort of show you how, using the tools that we've got from people like Connor, all of the people that just have small companies, or single-person companies even, can do some pretty amazing stuff. I hope it's pretty amazing anyway. I think it is. So I've been involved in OSGeo stuff for a little bit.
I was on the organizing committee for the Oceania regional conferences. I have a bunch of OSGeo things. And I'm pretty bad at actually doing things about it. So sorry about that. We'll get there. And I used to be a field scientist. But since science went crazy with funding stuff,
I turned into a data wrangler. So I came from driving these things around over this stuff, Antarctic sea ice. And I really hope this works. Yes, cool. In order to make models like this which give us data about the sea ice, so this we're trying to capture elevation and then figure out
how thick the sea ice is underneath that. Then I turned into a data wrangler. So if you've been coming to FOSS4G for a little while, you might have seen me talk about this in 2017. I used the Point Data Abstraction Library to basically work as the infrastructure behind a PyWPS
service that gave you point cloud products on demand. So this is a rasterized hill shade from a complex polygon clip from a 1,600 square kilometer LIDAR data set that the government that collected it still doesn't quite know what to do with.
So we prototyped that service. Unfortunately, I left that role and the project died. So the link on the bottom is really small to read. But you can go on GitHub and find it if you want to have a look and play with it. I don't recommend running it in production. So that's a little bit about what I've done.
So what do I do now with PDAL and Entwine and what's happened since then? So basically, use these tools for analysis and processing and data visualization. And I like this little slide. It's a great visualization of some data.
And if anyone can guess what data is being visualized there, stick your hand up because not everyone gets it. Single shot, double shot. So the left hand is, yeah. It's great. It's just made. But it's a great example because not everyone gets that.
So it's a cool example of how visualization is different for everybody as well. So I should have made that bigger. So for anyone who's watching live or if you're sitting in here with a laptop, you can find the talk there. And some of the slides have an interactive background.
And you can play around with them. Or you can listen or do both if you want. So the first use of PDAL and Entwine in the wild that I want to tell you about is hydrography. Because normally, we think about these tools for looking up above the ground. There's actually quite a big set of use cases
for multi-beam sonar and analyzing the numerous data formats that come along with hydrographic surveys. So the first use case is actually, sorry, I'm going to see if I can make this a bit bigger.
Now I have really big eyes, sorry. OK, I should have tested this earlier. But anyway, this project has nothing
to do with visualizing data or displaying data. It's part of a quality assurance tool for hydrographic surveys. So this pipeline, this little PDAL pipeline, JSON block, is all about just extracting boundaries from a bunch of hydrographic survey points that are stored as ASCII X, Y's and Z's.
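The slide's JSON is not reproduced in this transcript. As a minimal sketch of what a boundary-extraction pipeline of this kind could look like, the Python below builds the JSON using two real PDAL stages, `readers.text` and `filters.hexbin`. The function name and the file name are hypothetical, and the assumption that the ASCII file has a header line naming its X, Y and Z columns is mine, not the speaker's.

```python
import json


def boundary_pipeline(xyz_path):
    """Build a PDAL pipeline (as a dict) that reads ASCII XYZ points
    and tessellates them into a coverage boundary."""
    return {
        "pipeline": [
            # readers.text expects a header line naming the dimensions (e.g. X,Y,Z)
            {"type": "readers.text", "filename": xyz_path},
            # filters.hexbin bins the points into hexagons and reports the
            # resulting boundary in the pipeline metadata, not as points
            {"type": "filters.hexbin"},
        ]
    }


if __name__ == "__main__":
    # "survey_line.xyz" is a placeholder file name
    print(json.dumps(boundary_pipeline("survey_line.xyz"), indent=2))
```

The printed JSON could be saved and run with the `pdal pipeline` command; the boundary polygon then has to be read from the metadata output rather than from a written point file.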
And what we want to do is just get a coverage out, like the boundaries that Connor showed in the last talk, and compare them with what was surveyed. So this just extracts boundaries from those otherwise unwieldy ASCII data formats that hydrographers do not want to get away from. This one here, very similar.
It's just saying it's a very short PDAL pipeline wrapped up in a Python function. Actually, no, this is a Python function that drives a PDAL pipeline as a Python library to get the density of these ASCII points. So you feed it a big ASCII file, and it comes back and says, your points are,
you have 10 points per square meter, or whatever it is. And that, again, is just designed for quality assurance for surveys, because a contractor will go out and collect some data, and you want to check that they've met the specification. So really simple tools. And because Geoscience Australia and FrontierSI
were happy to open source it, you can go to the link there and have a look through a bunch of Python notebooks and run all the tests or have a look at how they're built. And that particular case is just a neat way of PDAL turning up somewhere surprising.
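To make the density check described above concrete, here is a small stand-in written in plain Python rather than PDAL. It is not the open-sourced tool: it simply bins ASCII X/Y values into square cells and reports points per square metre over the occupied cells. The function name, the cell size and the sample data are all illustrative assumptions.

```python
import math
from collections import Counter


def points_per_square_metre(lines, cell=1.0):
    """Estimate point density from ASCII 'X Y Z' lines.

    Points are binned into square cells of side `cell` (metres); the result
    is total points divided by the area of the occupied cells.
    """
    cells = Counter()
    for line in lines:
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue
        try:
            x, y = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip header or malformed lines
        cells[(math.floor(x / cell), math.floor(y / cell))] += 1
    if not cells:
        return 0.0
    return sum(cells.values()) / (len(cells) * cell * cell)


# Made-up sample: four soundings falling in two 1 m cells
sample = ["X Y Z", "0.1 0.1 -10.0", "0.5 0.5 -10.2", "1.5 0.5 -11.0", "1.6 0.4 -11.1"]
print(points_per_square_metre(sample))
```

Counting only occupied cells avoids diluting the figure with empty water between survey lines, which is one reasonable choice for checking a contractor's specification; a bounding-box area would give a lower number.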
It works well there, because you don't have to write all of the readers and writers yourself. You can just plug them in, drive it with Python. It's pretty easy, and it works. So for me, those are all winning things. And you can make interactive notebooks that let people walk through it all. So another much prettier use case for hydrographic surveying is just making maps.
So this is just over 4 billion points of multibeam sonar data over 700,000 square kilometers that was collected when they were searching for a lost aircraft off the west coast of Australia. So it's a lot of data to get through and map.
And this map was made completely in QGIS using PDAL to drive the processing of the ASCII hydrographic data into something that could be used to draw this map. So here's the first pipeline, and it just steps through.
So basically, the first process was to take the ASCII data, reproject it into something that sort of worked in one map that size instead of having multiple UTM zones, add some dimensions. So we could say, well, it's ground data, and we can give it a color.
And then clean up any noise, invert the z-axis of the data. In a lot of hydrographic surveys, the z-value comes out positive, and it's like, well, it's not up, it's down. So we just fixed that. And then we write it out as LAZ,
and that, for someone just processing a bunch of data, that works really well, because you can throw away these big fluffy ASCII files, store it as LAZ, save yourself a lot of time and space on disk, a lot of space on disk at least. But this is, basically you can just collect every one of those big ASCII files,
run this pipeline once, and it's done on all of them. That's more or less all of the magic code that you have to write. And then you can switch to something else, like a Bash shell, and query all of those LAZ files, stuff them into a database. So this is another block of PDAL, driven by a Bash script,
that just queries the metadata for every one of those LAZ files, and then stuffs the data into a PostGIS database. So here, grab all the ASCII from a THREDDS server, which is an interesting exercise in itself. Reproject, invert, get the boundaries, stuff it into PostGIS,
and we'll make an Entwine index for good measure, which we'll see in a bit, and do stuff, like make pretty maps. And here's the result of Entwine-ifying it. So when you're watching this presentation at home, you can play with this data set, and you can browse around and have a look at it all.
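The processing steps listed above (reproject, mark as ground, remove noise, invert Z, write compressed LAS) can be sketched as a single PDAL pipeline. The stage types below are real PDAL stages, but the function name, the file names and the coordinate systems are placeholders, the colour assignment is left out, and the option spellings (in particular the `value` expression of `filters.assign`, which older releases wrote as `assignment`) should be checked against the installed PDAL version.

```python
import json

# Row-major 4x4 matrix that negates Z and leaves X and Y unchanged
FLIP_Z = "1 0 0 0 0 1 0 0 0 0 -1 0 0 0 0 1"


def hydro_pipeline(ascii_path, laz_path, in_srs, out_srs):
    """Build a PDAL pipeline dict turning ASCII soundings into a LAZ file."""
    return {
        "pipeline": [
            {"type": "readers.text", "filename": ascii_path},
            # bring every survey into one common coordinate system
            {"type": "filters.reprojection", "in_srs": in_srs, "out_srs": out_srs},
            # label everything as ground (ASPRS class 2)
            {"type": "filters.assign", "value": "Classification = 2"},
            # statistical outliers are re-labelled as noise (class 7) ...
            {"type": "filters.outlier", "method": "statistical"},
            # ... and then dropped
            {"type": "filters.range", "limits": "Classification![7:7]"},
            # depths recorded as positive numbers become negative elevations
            {"type": "filters.transformation", "matrix": FLIP_Z},
            # LASzip-compressed output
            {"type": "writers.las", "filename": laz_path, "compression": "laszip"},
        ]
    }


if __name__ == "__main__":
    # all four arguments are placeholders
    print(json.dumps(hydro_pipeline("in.xyz", "out.laz", "EPSG:32746", "EPSG:3577"), indent=2))
```

Because the file names are parameters, the same function can be called once per ASCII file in a loop, which matches the "run this pipeline once on all of them" idea in the talk.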
So this is visualized using the Potree WebGL viewer, straight from an Entwine Point Tile (EPT) index. And you can see that the RGB colors that I've applied come out, otherwise you would just be looking at black points. And so then you can do other stuff.
Once you've got that EPT index, say we wanted to make an elevation model of part of the sea floor. It's just a really simple query: grab a box from the EPT and go for it. And then you can dump it back into QGIS and make pretty maps and spit them out again. But I'm using more time than I thought, so I'm gonna have to go faster.
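A box query against an EPT index followed by rasterization might look like the sketch below. `readers.ept` and `writers.gdal` are real PDAL stages; the function name, the URL, the bounds and the resolution are hypothetical values chosen only for illustration.

```python
import json


def ept_dem_pipeline(ept_url, bounds, tif_path, resolution=10.0):
    """Build a PDAL pipeline dict: read a bounding box from EPT, write a raster.

    bounds is (xmin, ymin, xmax, ymax) in the coordinate system of the index.
    """
    xmin, ymin, xmax, ymax = bounds
    return {
        "pipeline": [
            {
                "type": "readers.ept",
                "filename": ept_url,
                # PDAL bounds syntax: ([xmin, xmax], [ymin, ymax])
                "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])",
            },
            {
                "type": "writers.gdal",
                "filename": tif_path,
                "resolution": resolution,  # cell size in map units
                "output_type": "mean",     # mean Z of the points in each cell
                "gdaldriver": "GTiff",
            },
        ]
    }


if __name__ == "__main__":
    # placeholder URL and bounds
    demo = ept_dem_pipeline("https://example.com/seafloor/ept.json", (0, 0, 5000, 5000), "dem.tif")
    print(json.dumps(demo, indent=2))
```

Only the points inside the box are fetched, which is why the query stays small even when the index holds billions of points.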
So going back up above the ground, we can do similar stuff with landscape-style data. So this is 1,600 square kilometers of the Australian Capital Territory again. This is my favorite toy data set, because it's really pretty. So it's colorized with PDAL, and then made into an Entwine index,
and that lets us do stuff like clip buildings out. So we might not be interested in all of it, we just want a building. So this is a two-stage process. First we have to collect points from inside a bounding box, and then, lower down, clip them by a complex polygon. I've truncated the long line of coordinates at the bottom there. But I think in either current releases of Entwine
or upcoming ones, you can just go straight to the complex polygon clip and you don't have to do it in two stages, which is really cool. And then you can just get stuff like this. So this is the National Museum of Australia. It's come straight from that 30 billion point data set, without having to search through tiles,
I've just gone, here's the bounding box, get me the data. And on the way, so I've put the dimensions up there, because if you're looking at the Z dimension there, the point that I've picked is at 13 meters. So on the way between my Entwine index and this little data set, I've got the building out as height above ground. So I'm looking at building heights rather than absolute elevations.
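The two-stage clip with heights above ground could be sketched as follows. This is an illustration under assumptions, not the pipeline from the slide: the function name and arguments are hypothetical, the index is assumed to carry ground-classified points, and `filters.hag_nn` is the current name of the height-above-ground stage (older PDAL releases called it `filters.hag`).

```python
import json


def building_pipeline(ept_url, bounds, footprint_wkt, out_path):
    """Build a PDAL pipeline dict: bounding-box query, height above ground,
    polygon clip, then write heights as Z."""
    xmin, ymin, xmax, ymax = bounds
    return {
        "pipeline": [
            # stage 1: fetch only the points inside a bounding box
            {
                "type": "readers.ept",
                "filename": ept_url,
                "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])",
            },
            # compute HeightAboveGround while the surrounding ground points
            # are still present (needs Classification == 2 points)
            {"type": "filters.hag_nn"},
            # stage 2: clip to the building footprint polygon (WKT)
            {"type": "filters.crop", "polygon": footprint_wkt},
            # overwrite Z with the height above ground
            {"type": "filters.ferry", "dimensions": "HeightAboveGround=>Z"},
            {"type": "writers.las", "filename": out_path},
        ]
    }


if __name__ == "__main__":
    # placeholder footprint and bounds
    wkt = "POLYGON((10 10, 40 10, 40 30, 10 30, 10 10))"
    print(json.dumps(building_pipeline("https://example.com/act/ept.json", (0, 0, 50, 50), wkt, "building.laz"), indent=2))
```

Computing heights before the polygon clip matters: once the cloud is cut down to the footprint there may be too few ground points left to measure against.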
So that's all done with fairly simple tooling. And here we go. So again, the workflow: colorize, Entwine-ify, do stuff. So it's a lot shorter than what I did with the hydrographic data. Granted, there are fewer steps because the data was already in a decent format.
Oh, this one worked great. So this one is another above ground example. This is 8,000 square kilometers of river basin. There's not so many buildings to care about there, but if you're looking at it from a scientific point of view, you might be caring about the tree-ness.
So for a given unit, say you're looking at satellite data and going, well, I've got 25 meter satellite pixels. I can do some tree detection and figure out how many trees are in it. We've got this giant LIDAR data set. How can we use that to QA it? Well, now that we have it entwined, we can use PDAL to grab a little section, run some Python code.
I won't show you the Python code because it's pretty long, but it basically just collects all of the tree-classified points in the data, writes out a 2D histogram and gives us back a rasterized tree-ness histogram as a GeoTIFF, which looks like this. So it's come out a little bit blurry, but the coincident aerial photo
is the base map there. And on top, blue-colored areas are very few trees, up to brighter yellow, which is nearly fully tree-covered. And you can sort of see that the trees follow the river or old river paths. So it's a really neat way of being able to test,
check against other things or make these maps from scratch if they don't exist. So this one's set for 10 meter pixels. So you might go, well, that kind of matches up with Sentinel-1, or you can do whatever you like. But again, it's pretty easy. Colorize, Entwine-ify, do stuff. And it's all those really short units
using PDAL to drive stuff and Entwine as the storage medium. And that's a really easy storage medium to work with. So I'm gonna go to smaller scales now. So we've gone from 700,000, well, whole continents to sea floors to sort of landscape scales.
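The tree-ness raster from the river basin example comes down to a 2D histogram of vegetation-classified points. The speaker's own script is not shown in the talk, so the following is only a plain-Python sketch of the idea: the function name, the choice of ASPRS classes 3 to 5 as "tree", and the sample points are assumptions, and writing the grid out as a GeoTIFF is left out.

```python
import math
from collections import Counter

# ASPRS low, medium and high vegetation; which codes count as "tree" is an assumption
TREE_CLASSES = {3, 4, 5}


def tree_count_grid(points, cell=10.0):
    """Count tree-classified points per square cell.

    points is an iterable of (x, y, classification) tuples;
    the result maps (column, row) cell indices to a point count.
    """
    counts = Counter()
    for x, y, cls in points:
        if cls in TREE_CLASSES:
            counts[(math.floor(x / cell), math.floor(y / cell))] += 1
    return dict(counts)


# Made-up points: three vegetation returns and one ground return
pts = [(1.0, 1.0, 5), (2.0, 3.0, 5), (15.0, 1.0, 4), (5.0, 5.0, 2)]
print(tree_count_grid(pts))
```

With `cell=10.0` each count corresponds to one 10 metre pixel, so the grid can be compared directly against satellite pixels of the same size.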
And now, what if you're looking at smaller scales than that? Is Entwine still a useful tool? And I think it is, because this is a cool project that I had and it's the only picture I can show you of it. But in there is a 30 point per meter airborne scan. And the original data set for this building, a stadium on the lower right,
was just an un-georeferenced terrestrial LIDAR scan. It's about 1.2 billion points. So it's super dense. Like if you do a less density, it's like 10,000 points per meter. There's a lot of data there. And our challenge was to co-register the two. And we're like, what do we do?
So in this case, we were able to find a few co-registered points and then use tools that are all contained in PDAL to find the relationship between those points, apply a transformation and glue the two data sets together and then visualize them. So this was part of a sort of stand in a basketball court and then zoom out through the building,
out to sort of the entire landscape scale project. And again, workflow. This is a bit longer. Find co-registration points, generate the matrix, apply it using PDAL. Come on. Entwine-ify and do stuff.
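The talk does not say how the matrix was derived from the matched points. Assuming the relationship has already been reduced to a rotation about the vertical axis plus a translation, the "apply it using PDAL" step could be sketched with the real `filters.transformation` stage as below. The helper names, file names and numbers are hypothetical.

```python
import json
import math


def rigid_matrix(yaw_deg, tx, ty, tz):
    """4x4 matrix: rotate by yaw_deg about the Z axis, then translate."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matrix_option(m):
    """Flatten a 4x4 matrix row by row into the whitespace-separated
    string that filters.transformation takes as its matrix option."""
    return " ".join(repr(float(v)) for row in m for v in row)


def georeference_pipeline(in_path, out_path, m):
    """Build a PDAL pipeline dict that applies matrix m to every point."""
    return {
        "pipeline": [
            in_path,  # reader inferred from the file extension
            {"type": "filters.transformation", "matrix": matrix_option(m)},
            out_path,  # writer inferred from the file extension
        ]
    }


if __name__ == "__main__":
    # placeholder rotation and offsets
    m = rigid_matrix(12.5, 690000.0, 6090000.0, 570.0)
    print(json.dumps(georeference_pipeline("scan_local.laz", "scan_georef.laz", m), indent=2))
```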
So again, there we could clip out bits of the building or do different things if you knew the bounding box and wanted to do stuff. And then everyone wants to talk about these things, so I'm gonna talk about them as well. Little remotely piloted aircraft, as we call them properly where I come from.
This one's really cool. So that little aircraft weighs 300 grammes. I can launch it from, oh no, killed. Anyway, the point of that one was to show a cliff model that I made with that. And you can see, I don't know how many people are rock climbers, but you can see the ring bolts. You can measure the ring bolts on the cliff.
And then because it's viewed in JavaScript as well, you can add more and more stuff to it. But that's beyond the scope of this talk, I think. But this one's a real use case. So going back to our ACT LIDAR, Australian Capital Territory LIDAR data set. We had a case where we wanted to test change detection.
So the upper data set there was flown with the RPA. Took about six minutes to fly it, collected a bunch of photos, processed it. And we didn't have ground control points. It's like, well, we'll use whatever GPS comes with the drone. And then we have the challenge of, well, how do we know where it is? Does it match up with anything else? So we're able to use the Point Data Abstraction Library
to classify ground in the RPA data, use an iterative closest point method with ground class points from the fully geodetically controlled LIDAR data set, match the two, and then generate a difference model. So here, in the top model, that's colored by difference.
So blue is not very different, yellow is very different. And this is, the LIDAR data was flown in 2015. The RPA data was collected in 2018. And you can see a building's been built. And a little bit of ground's been moved around around it as well. That was a great little use case for those capacities.
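The matching step described above (classify ground in the RPA cloud, then run iterative closest point against ground points from the controlled LIDAR) can be sketched as a branching PDAL pipeline. `filters.smrf`, `filters.range` and `filters.icp` are real stages and `tag`/`inputs` are standard pipeline keys; the function name and file names are placeholders, and this is not claimed to be the pipeline used in the project.

```python
import json


def icp_pipeline(lidar_path, rpa_path, out_path):
    """Build a PDAL pipeline dict that aligns RPA ground points to LIDAR ground points."""
    return {
        "pipeline": [
            # fixed cloud: keep only ground (class 2) from the controlled LIDAR
            {"type": "readers.las", "filename": lidar_path, "tag": "fixed_raw"},
            {"type": "filters.range", "limits": "Classification[2:2]",
             "inputs": ["fixed_raw"], "tag": "fixed"},
            # moving cloud: classify ground in the RPA data, then keep it
            {"type": "readers.las", "filename": rpa_path, "tag": "moving_raw"},
            {"type": "filters.smrf", "inputs": ["moving_raw"], "tag": "moving_classified"},
            {"type": "filters.range", "limits": "Classification[2:2]",
             "inputs": ["moving_classified"], "tag": "moving"},
            # the first input is treated as fixed, the second as moving
            {"type": "filters.icp", "inputs": ["fixed", "moving"]},
            {"type": "writers.las", "filename": out_path},
        ]
    }


if __name__ == "__main__":
    # placeholder file names
    print(json.dumps(icp_pipeline("lidar_2015.laz", "rpa_2018.laz", "rpa_ground_aligned.laz"), indent=2))
```

This only writes the aligned ground points. The transformation that the ICP stage reports in its metadata would still need to be applied to the full RPA cloud, for example with `filters.transformation`, before building the difference model.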
And again, classify the ground, generate transforms, apply the transformation, Entwine-ify, do stuff. And I think that's all the stuff I had to say, no more. That's the technical part of the talk. And now the second part of the talk that I promised
is how having these communities that build this stuff and all work together to support it lets people like me, who's just one person, do these things, so that you can play around with many thousands of kilometers worth of LIDAR data and make all these magical things happen and just pop them out of thin air
and go, well, it's done now. So I know a number of big companies in Australia are still struggling, they're sort of shipping multiple terabytes of hard drives around when they need to analyze LIDAR data. I keep talking to them and say, well, we can just do this. And one day it'll start working. But the basic practice in my little enterprise,
I guess, is being open by default. And that's because all of this stuff that I use happens because of passionate people. So I want to pick up that model and use it. So when I do stuff or write about things on my blog, the whole recipe is there, there's nothing held back.
It doesn't help my business to keep things private because I want to be in the business of making cool stuff happen, not the business of protecting IP. If I was a lawyer, I might think differently, but I don't because I'm one guy and there's lots of smart people. And I am standing on the shoulders of giants.
I wouldn't be able to do any of this if it wasn't for the people that are here. And also the organization that supports it. So I'm relatively new to OSGeo. Even though I've been using open source GIS things for decades, I think four years ago I was like,
oh, what's OSGeo, I should go check that out and come to a FOSS4G, and that was the first time I ever did. So it's there and it's worth getting involved in. But it works as long as we contribute, but I don't know how to write C++. I can hack around in Python and Bash and stuff, but I can't make commits to make Entwine better
or PDAL better or anything. I struggle with JavaScript. All my visualizers are just really janky, thrown-together JavaScript. And that's my programming ability. So what can I do? We can support the people that do know all this stuff and that's by coming here and,
I'm out of time so I'm just gonna go through, and supporting things like this. So getting involved in running and growing the community that we depend on. And that's it. Thank you for coming.
And just one more thing. The links at the bottom of the page there are sort of long form reads of all of the stuff that I've shown there. You can go and look at it all and take it and do what you need. Does anyone have any questions?
Yeah, sure. Oh, thank you. Do you have some experience of using PDAL for object detection inside the clouds? For example, poles. No, I don't yet. It's been on my list of things to do, but it's highly dependent on being funded
to do it these days. Okay, thank you. Thanks, thanks so much Adam for the talk. I have a question which is mainly addressed
both to you and to Connor. I was curious when the amount of points is really exploding and you're storing it to some buckets on Amazon or on your own hard disk. Connor already mentioned it goes from seconds to minutes to query if you want specific points. Is there a strategy with any noticeable difference
if you spread your points over different resources or it doesn't really matter? One thing I've been meaning to test and I haven't and maybe Connor's done it is one of the valuable things about the EPT structure is that I've been sending out these little bounding boxes you could just split your query
across multiple bounding boxes and as long as you weren't getting the edge points twice you could send your query out across multiple processes or scale it out across as many processes as you liked rather than having one process chugging down lots and lots of points. Does that sort of answer your question or? Yeah, I was also curious whether it matters
whether you disperse your points over multiple resources instead of multiple hard disks, multiple buckets. That's probably a Connor question. That's a lower level layer than I know about. So yeah. I don't think it would help all that much. Maybe if you were trying to build like a full globe's worth maybe you'd get a little bit of benefit. I should have clarified though
when I was talking about the timings in mine that was timing for downloading all the data and running all the algorithmic stuff on the client. So the query itself is quite fast and it's only a few files, a few megabytes but I'm reaching out from Iowa City to S3 somewhere downloading it all and the actual PDAL pipeline stuff was running on my laptop. So that's where that time comes from.
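The idea raised in the first answer, splitting one EPT query into several bounding boxes so that separate processes can each fetch a piece, can be illustrated with a small tiling helper. The speaker says this has not been tested, so this is only a sketch; the function name and the example extent are hypothetical.

```python
def split_bounds(bounds, nx, ny):
    """Split (xmin, ymin, xmax, ymax) into nx * ny equal tiles.

    Neighbouring tiles share an edge, so a point lying exactly on a shared
    edge could be returned by both queries and may need de-duplicating.
    """
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    tiles = []
    for i in range(nx):
        for j in range(ny):
            tiles.append((xmin + i * dx, ymin + j * dy,
                          xmin + (i + 1) * dx, ymin + (j + 1) * dy))
    return tiles


# Placeholder extent split into a 2 x 2 grid of query boxes
for tile in split_bounds((0, 0, 1000, 1000), 2, 2):
    print(tile)
```

Each tile can be turned into its own `readers.ept` bounds string and handed to a separate worker process; whether that actually speeds things up depends on where the time goes, which the final answer suggests is mostly download and client-side processing.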
All right, so it's totally scalable in a sense. Cool, any more questions? I thought this was a question.
All right, well thanks everyone. And enjoy the.