Exploiting PDAL and Entwine in the wild
Formal Metadata
Number of Parts | 295
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/43439 (DOI)
FOSS4G Bucharest 2019 (145 / 295)
Transcript: English (auto-generated)
00:07
Thanks for coming and staying to the end of the session. It's been a really long and eventful conference, so it's good to see everyone excited at the end. So my name's Adam Steer, and I
00:23
am going to talk about exploiting PDAL and Entwine in the wild. And hopefully this talk is going to work full screen.
00:46
Come on, that'll do. That's OK, like that. OK, so just like Connor's talk, this is going to be about the PDAL library and Entwine working together in the wild. And even in Bucharest, you can find PDAL out there.
01:03
We're just walking around a cafe, and we found the logo just sitting there. So it's everywhere. So I'll just introduce myself. I'm a freelance geospatial consultant. I run my own little company, like a lot of people here. And one of the points of this talk
01:21
is to sort of show you how, using the tools that we've got from people like Connor, all of the people that just have small companies, or even single-person companies, can do some pretty amazing stuff. I hope it's pretty amazing anyway. I think it is. So I've been involved in OSGeo stuff for a little bit.
01:42
I was on the organizing committee for the Oceania regional conferences. I have a bunch of OSGeo things. And I'm pretty bad at actually doing things about it. So sorry about that. We'll get there. And I used to be a field scientist. But since science went crazy with funding stuff,
02:01
I turned into a data wrangler. So I came from driving these things around over this stuff, Antarctic sea ice. And I really hope this works. Yes, cool. In order to make models like this which give us data about the sea ice, so this we're trying to capture elevation and then figure out
02:23
how thick the sea ice is underneath that. Then I turned into a data wrangler. So if you've been coming to FOSS4Gs for a little while, you might have seen me talk about this in 2017. I used the Point Data Abstraction Library to basically work as the infrastructure behind a PyWPS
02:41
service that gave you point cloud products on demand. So this is a rasterized hillshade from a complex polygon clip from a 1,600 square kilometer LIDAR data set that the government that collected it still doesn't quite know what to do with.
03:01
So we prototyped that service. Unfortunately, I left that role and the project died. So the link on the bottom is really small to read. But you can go on GitHub and find it if you want to have a look and play with it. I don't recommend running it in production. So that's a little bit about what I've done.
03:23
So what do I do now with PDAL and Entwine and what's happened since then? So basically, I use these tools for analysis and processing and data visualization. And I like this little slide. It's a great visualization of some data.
03:41
And if anyone can guess what data is being visualized there, stick your hand up because not everyone gets it. Single shot, double shot. So the left hand is, yeah. It's great. It's just made. But it's a great example because not everyone gets that.
04:00
So it's a cool example of how visualization is different for everybody as well. So I should have made that bigger. So for anyone who's watching live or if you're sitting in here with a laptop, you can find the talk there. And some of the slides have an interactive background.
04:21
And you can play around with them. Or you can listen or do both if you want. So the first use of PDAL and Entwine in the wild that I want to tell you about is hydrography. Because normally, we think about these tools for looking up above the ground. There's actually quite a big set of use cases
04:40
for multi-beam sonar and analyzing the numerous data formats that come along with hydrographic surveys. So the first use case is actually, sorry, I'm going to see if I can make this a bit bigger.
05:08
Now I have really big eyes, sorry. OK, I should have tested this earlier. But anyway, this project has nothing
05:20
to do with visualizing data or displaying data. It's part of a quality assurance tool for hydrographic surveys. So this pipeline, this little PDAL pipeline, JSON block, is all about just extracting boundaries from a bunch of hydrographic survey points that are stored as ASCII X, Y's and Z's.
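A boundary-extraction pipeline of this kind might look roughly like the sketch below. It is not the pipeline from the slide: the file name, the column header and the choice of stages are assumptions for illustration. It uses PDAL's `readers.text` and `filters.hexbin` stages, and builds the pipeline JSON with the standard library only, so the PDAL Python bindings are needed only to actually run it.

```python
import json

def boundary_pipeline(xyz_path, header="X Y Z", separator=" "):
    """Build a PDAL pipeline (as a JSON string) that reads ASCII XYZ points
    and computes a hexagon-based boundary around them."""
    stages = [
        {
            "type": "readers.text",
            "filename": xyz_path,
            # Assumed column order; this sketch assumes the file has no header row.
            "header": header,
            "separator": separator,
        },
        # filters.hexbin tessellates the points and derives a boundary polygon.
        {"type": "filters.hexbin"},
    ]
    return json.dumps({"pipeline": stages}, indent=2)

# "survey_points.xyz" is a hypothetical file name.
pipeline_json = boundary_pipeline("survey_points.xyz")
print(pipeline_json)
```

With the bindings installed, `pdal.Pipeline(pipeline_json).execute()` would run it, and the computed boundary should then be available from the pipeline metadata rather than from an output file.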
05:40
And what we want to do is just get a coverage out, like the boundaries that Connor showed in the last talk, and compare them with what was surveyed. So this just extracts boundaries from those otherwise unwieldy ASCII data formats that hydrographers do not want to get away from. This one here, very similar.
06:01
It's just saying it's a very short PDAL pipeline wrapped up in a Python function. Actually, no, this is a Python function that drives a PDAL pipeline as a Python library to get the density of these ASCII points. So you feed it a big ASCII file, and it comes back and says, your points are,
06:21
you have 10 points per square meter, or whatever it is. And that, again, is just designed for quality assurance for surveys, because a contractor will go out and collect some data, and you want to check that they've met the specification. So really simple tools. And because Geoscience Australia and FrontierSI
06:42
were happy to open source it, you can go to the link there and have a look through a bunch of Python notebooks and run all the tests or have a look at how they're built. And that particular case is just a neat way of PDAL turning up somewhere surprising.
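The density check mentioned a little earlier can be approximated without PDAL at all. The sketch below is an assumption about how such a number could be computed, not the code from the linked notebooks: it bins points into square cells and divides the point count by the area of the cells that actually contain points. The function name and cell size are made up for illustration.

```python
import math

def points_per_square_meter(points, cell_size=1.0):
    """Estimate density as point count / area of grid cells containing points.
    `points` is an iterable of (x, y) or (x, y, z) tuples in metric coordinates."""
    occupied = set()
    count = 0
    for p in points:
        # Integer cell index of the point in a regular grid.
        occupied.add((math.floor(p[0] / cell_size), math.floor(p[1] / cell_size)))
        count += 1
    if count == 0:
        return 0.0
    return count / (len(occupied) * cell_size ** 2)

# Four made-up soundings falling into two 1 m cells.
sample = [(0.1, 0.1, -50.0), (0.6, 0.4, -50.2), (1.5, 0.5, -51.0), (1.7, 0.2, -51.1)]
print(points_per_square_meter(sample))
```

Counting only occupied cells keeps gaps in coverage from diluting the figure. A specification check is then a comparison of the result against the contracted minimum.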
07:00
It works well there, because you don't have to write all of the readers and writers yourself. You can just plug them in, drive it with Python. It's pretty easy, and it works. So for me, those are all winning things. And you can make interactive notebooks that let people walk through it all. So another much prettier use case for hydrographic surveying is just making maps.
07:23
So this is just over 4 billion points of multibeam sonar data over 700,000 square kilometers that was collected when they were searching for a lost aircraft off the west coast of Australia. So it's a lot of data to get through and map.
07:42
And this map was made completely in QGIS using PDAL to drive the processing of the ASCII hydrographic data into something that could be used to draw this map. So here's the first pipeline, and it just steps through.
08:01
So basically, the first process was to take the ASCII data, reproject it into something that sort of worked in one map that size instead of having multiple UTM zones, add some dimensions. So we could say, well, it's ground data, and we can give it a color.
08:21
And then clean up any noise, invert the z-axis of the data. In a lot of hydrographic surveys, the z-value comes out positive, and it's like, well, it's not up, it's down. So we just fixed that. And then we write it out as LASzip,
08:40
and that, for someone just processing a bunch of data, that works really well, because you can throw away these big fluffy ASCII files, store it as LASzip, save yourself a lot of time and space on disk, a lot of space on disk at least. But this is, basically you can just collect every one of those big ASCII files,
09:01
run this pipeline once, and it's done on all of them. That's more or less all of the magic code that you have to write. And then you can switch to something else, like a Bash shell, and query all of those LAZ files, stuff them into a database. So this is another block of PDAL, driven by a Bash script,
09:21
that just queries the metadata for every one of those LAZ files, and then stuffs the data into a PostGIS database. So here, grab all the ASCII from a THREDDS server, which is an interesting exercise in itself. Reproject, invert, get the boundaries, stuff it into PostGIS,
09:41
and we'll make an Entwine index for good measure, which we'll see in a bit, and do stuff, like make pretty maps. And here's the result of Entwine-ifying it. So when you're watching this presentation at home, you can play with this data set, and you can browse around and have a look at it all.
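The ASCII-to-LAZ step in the workflow recapped above could be sketched as follows. This is an illustration under stated assumptions, not the pipeline shown on the slide: the file names and coordinate systems are placeholders, the stage options are minimal, and the step that adds classification and color dimensions is left out. The Z inversion is done here with `filters.transformation` and a matrix whose third diagonal element is -1, which is one of several ways to flip the sign.

```python
import json

# Row-major 4x4 matrix that leaves X and Y alone and flips the sign of Z.
INVERT_Z = "1 0 0 0 0 1 0 0 0 0 -1 0 0 0 0 1"

def hydro_to_laz_pipeline(xyz_path, laz_path, in_srs, out_srs):
    """Build a PDAL pipeline JSON string: read ASCII, reproject, drop noise,
    invert Z, write compressed LAS."""
    stages = [
        {"type": "readers.text", "filename": xyz_path,
         "header": "X Y Z", "separator": " "},
        {"type": "filters.reprojection", "in_srs": in_srs, "out_srs": out_srs},
        # Flag statistical outliers as noise (class 7), then drop them.
        {"type": "filters.outlier", "method": "statistical"},
        {"type": "filters.range", "limits": "Classification![7:7]"},
        {"type": "filters.transformation", "matrix": INVERT_Z},
        {"type": "writers.las", "filename": laz_path, "compression": "laszip"},
    ]
    return json.dumps({"pipeline": stages}, indent=2)

# File names and EPSG codes are placeholders, not the ones used in the talk.
print(hydro_to_laz_pipeline("swath_001.xyz", "swath_001.laz", "EPSG:4326", "EPSG:3577"))
```

Because the pipeline is just a string built from a file name, looping it over a directory of ASCII files is a matter of calling the function once per file.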
10:02
So this is visualized using the Potree WebGL viewer, straight from an Entwine Point Tile index. And you can see that the RGB colors that I've applied come out, otherwise you would just be looking at black points. And so then you can do other stuff.
10:20
Once you've got that EPT index, say we wanted to make an elevation model of part of the sea floor. It's just a really simple query: a box from the EPT, and go for it. And then you can dump it back into QGIS and make pretty maps and spit them out again. But I'm using more time than I thought, so I'm gonna have to go faster.
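A box query against an EPT index that goes straight to a raster might be sketched like this. The URL, bounds, resolution and output name are hypothetical; the stages are PDAL's `readers.ept` and `writers.gdal`, with the mean elevation per cell chosen here as the raster value.

```python
import json

def ept_box_to_raster(ept_url, bounds, tif_path, resolution=10.0):
    """Build a PDAL pipeline JSON string that pulls one bounding box out of
    an EPT index and rasterizes it. `bounds` is ((xmin, xmax), (ymin, ymax))."""
    (xmin, xmax), (ymin, ymax) = bounds
    stages = [
        {
            "type": "readers.ept",
            "filename": ept_url,
            # readers.ept takes bounds as "([xmin, xmax], [ymin, ymax])".
            "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])",
        },
        {
            "type": "writers.gdal",
            "filename": tif_path,
            "resolution": resolution,
            "output_type": "mean",
            "gdaldriver": "GTiff",
        },
    ]
    return json.dumps({"pipeline": stages}, indent=2)

# Hypothetical EPT endpoint and box, in the coordinate system of the index.
print(ept_box_to_raster("https://example.com/seafloor/ept.json",
                        ((1000, 6000), (2000, 7000)), "seafloor_dem.tif"))
```

The resulting GeoTIFF can be loaded into QGIS like any other raster.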
10:41
So going back up above the ground, we can do similar stuff with landscape-style data. So this is 1,600 square kilometers of the Australian Capital Territory again. This is my favorite toy data set, because it's really pretty. So it's colorized with PDAL, and then made into an Entwine index,
11:00
and that lets us do stuff like clip buildings out. So we might not be interested in all of it, we just want a building. So this is a two-stage process. First we have to collect points from inside a bounding box, and then lower down, clip them by a complex polygon. I've truncated the long line of coordinates at the bottom there. But I think it's either current releases of Entwine
11:22
or upcoming ones, you can just go straight to the complex polygon clip and you don't have to do it in two stages, which is really cool. And then you can just get stuff like this. So this is the National Museum of Australia. It's come straight from that 30 billion point data set, without having to search through tiles,
11:40
I've just gone, here's the bounding box, get me the data. And on the way, so I've put the dimensions up there, because if you're looking at the Z dimension there, the point that I've picked is at 13 meters. So on the way between my Entwine index and this little data set, I've got the building out as height above ground. So I'm looking at building heights rather than absolute elevations.
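The two-stage clip with heights above ground could be sketched as below. The URL, bounds and footprint are invented, and the stage choice is an assumption: `filters.hag_nn` is the nearest-neighbor height-above-ground filter in recent PDAL releases (older releases call it `filters.hag`), and it needs ground-classified points inside the bounding box. Height is computed before the polygon crop so that ground points around the building are still available.

```python
import json

def building_heights_pipeline(ept_url, bounds, footprint_wkt, out_path):
    """Build a PDAL pipeline JSON string: bounding-box query, height above
    ground copied into Z, then a polygon clip to the building footprint."""
    (xmin, xmax), (ymin, ymax) = bounds
    stages = [
        {"type": "readers.ept", "filename": ept_url,
         "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])"},
        # Requires class 2 (ground) points in the queried box.
        {"type": "filters.hag_nn"},
        # Overwrite Z with the computed HeightAboveGround dimension.
        {"type": "filters.ferry", "dimensions": "HeightAboveGround=>Z"},
        {"type": "filters.crop", "polygon": footprint_wkt},
        {"type": "writers.las", "filename": out_path, "compression": "laszip"},
    ]
    return json.dumps({"pipeline": stages}, indent=2)

# Hypothetical footprint; a real one would carry many more vertices.
footprint = "POLYGON((100 100, 200 100, 200 180, 100 180, 100 100))"
print(building_heights_pipeline("https://example.com/act/ept.json",
                                ((50, 250), (50, 230)), footprint, "building.laz"))
```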
12:03
So that's all done with fairly simple tooling. And here we go. So again, the workflow: colorize, Entwine-ify, do stuff. So it's a lot shorter than what I did with the hydrographic data. Granted, there are fewer steps because the data was already in a decent format.
12:24
Oh, this one worked great. So this one is another above ground example. This is 8,000 square kilometers of river basin. There's not so many buildings to care about there, but if you're looking at it from a scientific point of view, you might be caring about the tree-ness.
12:40
So for a given unit, say you're looking at satellite data and going, well, I've got 25 meter satellite pixels. I can do some tree detection and figure out how many trees are in it. We've got this giant LIDAR data set. How can we use that to QA it? Well, now that we have it entwined, we can use PDAL to grab a little section, run some Python code.
13:01
I won't show you the Python code because it's pretty long, but it basically just collects all of the tree-classified points in the data, writes out a 2D histogram and gives us back a rasterized tree-ness histogram as a GeoTIFF, which looks like this. So it's come out a little bit blurry, but the coincident aerial photo
13:24
is the base map there. And on top, blue-colored areas have very few trees, up to brighter yellow, which is nearly fully tree-covered. And you can sort of see that the trees follow the river or old river paths. So it's a really neat way of being able to test,
13:42
check against other things or make these maps from scratch if they don't exist. So this one's set for 10 meter pixels. So you might go, well, that kind of matches up with Sentinel-1, or you can do whatever you like. But again, it's pretty easy. Colorize, Entwine-ify, do stuff. And it's all those really short units
14:01
using PDAL to drive stuff and Entwine as the storage medium. And that's a really easy storage medium to work with. So I'm gonna go to smaller scales now. So we've gone from 700,000, well, whole continents to sea floors to sort of landscape scales.
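The tree-ness raster described a little earlier was produced by Python code that was not shown in the talk. A minimal stand-in, assuming "tree-ness" means the fraction of points in each cell that carry an ASPRS vegetation class (3, 4 or 5), could look like this. The real code may well count points differently, and writing the grid out as a GeoTIFF is left out.

```python
import math

TREE_CLASSES = {3, 4, 5}  # ASPRS low, medium and high vegetation

def treeness_grid(points, cell_size):
    """Return {(col, row): fraction of points in that cell classified as
    vegetation}. `points` is an iterable of (x, y, classification)."""
    totals, trees = {}, {}
    for x, y, cls in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        totals[key] = totals.get(key, 0) + 1
        if cls in TREE_CLASSES:
            trees[key] = trees.get(key, 0) + 1
    return {key: trees.get(key, 0) / n for key, n in totals.items()}

# Made-up points: two land in the first 10 m cell, two in its neighbor.
pts = [(1.0, 1.0, 5), (2.0, 2.0, 2), (12.0, 3.0, 5), (15.0, 5.0, 4)]
print(treeness_grid(pts, 10.0))
```

With a cell size matched to the satellite pixel size, each value can be compared directly against the tree cover detected in the corresponding pixel.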
14:20
And now, what if you're looking at smaller scales than that? Is Entwine still a useful tool? And I think it is, because this is a cool project that I had and it's the only picture I can show you of it. But in there is a 30 point per meter airborne scan. And the original data set for this building stadium on the lower right,
14:41
was just an un-georeferenced terrestrial LIDAR scan. It's about 1.2 billion points. So it's super dense. Like if you do a less density, it's like 10,000 points per meter. There's a lot of data there. And our challenge was to co-register the two. And we're like, what do we do?
15:00
So in this case, we were able to find a few co-registered points and then use tools that are all contained in PDAL to find the relationship between those points, apply a transformation and glue the two data sets together and then visualize them. So this was part of a sort of stand in a basketball court and then zoom out through the building,
15:21
out to sort of the entire landscape scale project. And again, workflow. This is a bit longer. Find co-registration points, generate the matrix, apply it using PDAL. Come on. Entwine-ify and do stuff.
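The "generate the matrix, apply it" step can be illustrated with a deliberately simplified estimator. The sketch below assumes the scan only needs a rotation about the vertical axis plus a 3D translation, which is an assumption made here to keep the math closed-form; the talk does not say which PDAL tools or which transform model were used. The result is formatted for PDAL's `filters.transformation` stage, and the matched points are invented.

```python
import math

def estimate_transform(src, dst):
    """Least-squares rotation about the vertical axis plus a 3D translation
    mapping matched points `src` onto `dst` (lists of (x, y, z) tuples)."""
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]  # source centroid
    cd = [sum(p[i] for p in dst) / n for i in range(3)]  # target centroid
    dot = cross = 0.0
    for s, d in zip(src, dst):
        ax, ay = s[0] - cs[0], s[1] - cs[1]
        bx, by = d[0] - cd[0], d[1] - cd[1]
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # best-fit rotation angle in the XY plane
    cn, sn = math.cos(theta), math.sin(theta)
    tx = cd[0] - (cn * cs[0] - sn * cs[1])
    ty = cd[1] - (sn * cs[0] + cn * cs[1])
    tz = cd[2] - cs[2]
    return [[cn, -sn, 0.0, tx], [sn, cn, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def transformation_stage(matrix):
    """Format a 4x4 matrix as a filters.transformation pipeline stage."""
    values = " ".join(f"{v:.10f}" for row in matrix for v in row)
    return {"type": "filters.transformation", "matrix": values}

# Invented control points: a 90 degree turn and a shift of (10, 20, 5).
scan_pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
world_pts = [(10, 20, 5), (10, 21, 5), (9, 20, 5)]
print(transformation_stage(estimate_transform(scan_pts, world_pts)))
```

A scan that is also tilted or scaled would need a full 3D fit, which this sketch does not attempt.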
15:42
So again, there we could clip out bits of the building or do different things if you knew the bounding box and wanted to do stuff. And then everyone wants to talk about these things, so I'm gonna talk about them as well. Little remote piloted aircraft as we call them properly where I come from.
16:01
This one's really cool. So that little aircraft weighs 300 grammes. I can launch it from, oh no, killed. Anyway, the point of that one was to show a cliff model that I made with that. And you can see, I don't know how many people are rock climbers, but you can see the ring bolts. You can measure the ring bolts on the cliff.
16:21
And then because it's viewed in JavaScript as well, you can add more and more stuff to it. But that's beyond the scope of this talk, I think. But this one's a real use case. So going back to our ACT LIDAR, Australian Capital Territory LIDAR data set. We had a case where we wanted to test change detection.
16:40
So the upper data set there was flown with the RPA. Took about six minutes to fly it, collected a bunch of photos, processed it. And we didn't have ground control points. It's like, well, we'll use the, whatever GPS comes with the drone. And then we have the challenge of, well, how do we know where it is? Does it match up with anything else? So we're able to use the Point Data Abstraction Library
17:03
to classify ground in the RPA data, use an iterative closest point method with ground class points from the fully geodetically controlled LIDAR data set, match the two, and then generate a difference model. So here, in the top model, that's colored by difference.
17:23
So blue is not very different, yellow is very different. And this is, the LIDAR data was flown in 2015. The RPA data was collected in 2018. And you can see a building's been built. And a little bit of ground's been moved around around it as well. That was a great little use case for those capacities.
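The ground-to-ground matching described above could be laid out as a branching PDAL pipeline, sketched below. File names and tags are hypothetical and every stage uses default options; it assumes the LIDAR file already carries ground classifications, that `filters.smrf` is an acceptable ground classifier for the RPA cloud, and that `filters.icp` treats its first input as the fixed reference and the second as the moving data.

```python
import json

def icp_alignment_pipeline(lidar_path, rpa_path, out_path):
    """Build a PDAL pipeline JSON string that aligns RPA ground points to
    LIDAR ground points with iterative closest point."""
    stages = [
        # Fixed reference: ground-class points from the controlled LIDAR survey.
        {"type": "readers.las", "filename": lidar_path, "tag": "lidar"},
        {"type": "filters.range", "inputs": ["lidar"],
         "limits": "Classification[2:2]", "tag": "lidarground"},
        # Moving data: classify ground in the RPA cloud, then keep ground only.
        {"type": "readers.las", "filename": rpa_path, "tag": "rpa"},
        {"type": "filters.smrf", "inputs": ["rpa"], "tag": "rpaclassified"},
        {"type": "filters.range", "inputs": ["rpaclassified"],
         "limits": "Classification[2:2]", "tag": "rpaground"},
        {"type": "filters.icp", "inputs": ["lidarground", "rpaground"]},
        {"type": "writers.las", "filename": out_path},
    ]
    return json.dumps({"pipeline": stages}, indent=2)

# Hypothetical file names.
print(icp_alignment_pipeline("act_2015.laz", "rpa_2018.laz", "rpa_ground_aligned.laz"))
```

This only aligns the ground points. Applying the resulting transformation to the full RPA cloud, and differencing the two surfaces, would be separate steps.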
17:42
And again, classify the ground, generate transforms, apply the transformation, Entwine-ify, do stuff. And I think that's all the stuff I had to say, no more. That's the technical part of the talk. And now the second part of the talk that I promised
18:00
is how having these communities that build this stuff and all work together to support it lets people like me, who's just one person, do these things, so that you can play around with many thousands of kilometers worth of LIDAR data and make all these magical things happen and just pop them out in thin air
18:20
and go, well, it's done now. So I know a number of big companies in Australia are still struggling, they're sort of shipping multiple terabytes of hard drives around when they need to analyze LIDAR data. I keep talking to them and say, well, we can just do this. And one day it'll start working. But the basic practice in my little enterprise,
18:42
I guess, is being open by default. And that's because all of this stuff that I use happens because of passionate people. So I want to pick up that model and use it. So when I do stuff or write about things on my blog, the whole recipe is there, there's nothing held back.
19:02
It doesn't help my business to keep things private because I want to be in the business of making cool stuff happen, not the business of protecting IP. If I was a lawyer, I might think differently, but I don't because I'm one guy and there's lots of smart people. And I am standing on the shoulders of giants.
19:23
I wouldn't be able to do any of this if it wasn't for the people that are here. And also the organization that supports it. So I'm relatively new to OSGeo. Even though I've been using open source GIS things for decades, I think four years ago I was like,
19:41
oh, what's OSGeo, I should go check that out and come to a FOSS4G, and that was the first time I ever did. So it's there and it's worth getting involved in. But it works as long as we contribute, but I don't know how to write C++. I can hack around in Python and Bash and stuff, but I can't make commits to make Entwine better
20:02
or PDAL better or anything. I struggle with JavaScript. All my visualizers are just really janky, thrown-together JavaScript. And that's my programming ability. So what can I do? We can support the people that do know all this stuff and that's by coming here and,
20:22
I'm out of time so I'm just gonna go through, and supporting things like this. So getting involved in running and growing the community that we depend on. And that's it. Thank you for coming.
20:40
And just one more thing. The links at the bottom of the page there are sort of long form reads of all of the stuff that I've shown there. You can go and look at it all and take it and do what you need. Does anyone have any questions?
21:00
Yeah, sure. Oh, thank you. Do you have some experience of using PDAL for object detection inside the clouds? For example, poles. No, I don't yet. It's been on my list of things to do, but it's highly dependent on being funded
21:23
to do it these days. Okay, thank you. Thanks, thanks so much Adam for the talk. I have a question which is mainly involved
21:41
both to you or to Connor. I was curious when the amount of points is really exploding and you're storing it to some buckets on Amazon or on your own hard disk. Connor already mentioned it goes from seconds to minutes to query if you want specific points. Is there a strategy with any noticeable difference
22:01
if you spread your points over different resources or it doesn't really matter? One thing I've been meaning to test and I haven't and maybe Connor's done it is one of the valuable things about the EPT structure is that I've been sending out these little bounding boxes you could just split your query
22:21
across multiple bounding boxes and as long as you weren't getting the edge points twice you could send your query out across multiple processes or scale it out across as many processes as you liked rather than having one process chugging down lots and lots of points. Does that sort of answer your question or? Yeah, I was also curious whether it matters
22:40
whether you disperse your points over multiple resources instead of multiple hard disks, multiple buckets. That's probably a Connor question. That's a lower level layer than I know about. So yeah. I don't think it would help all that much. Maybe if you were trying to build like a full globe's worth maybe you'd get a little bit of benefit. I should have clarified though
23:00
when I was talking about the timings in mine that was timing for downloading all the data and running all the algorithmic stuff on the client. So the query itself is quite fast and it's only a few files, a few megabytes but I'm reaching out from Iowa City to S3 somewhere downloading it all and the actual PDAL pipeline stuff was running on my laptop. So that's where that time comes from.
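The idea raised in the first answer above, splitting one query into several bounding boxes without picking up edge points twice, was described as untested. One way it could be approached is sketched here, with made-up function names: cut the box into tiles that share exact edge coordinates, query each tile separately, and then keep a point only in the tile that "owns" it under a half-open rule.

```python
def split_bounds(xmin, ymin, xmax, ymax, nx, ny):
    """Split a box into nx * ny tiles, each (xmin, ymin, xmax, ymax).
    Neighboring tiles share exactly the same edge coordinates."""
    xs = [xmin + i * (xmax - xmin) / nx for i in range(nx)] + [xmax]
    ys = [ymin + j * (ymax - ymin) / ny for j in range(ny)] + [ymax]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(ny) for i in range(nx)]

def owns(tile, x, y, full):
    """True if (x, y) belongs to this tile. Tiles are half-open, so a point
    on a shared edge belongs to one tile only; the outer max edges of the
    full box stay closed so no point is lost."""
    txmin, tymin, txmax, tymax = tile
    in_x = txmin <= x < txmax or (x == txmax and txmax == full[2])
    in_y = tymin <= y < tymax or (y == tymax and tymax == full[3])
    return in_x and in_y

def ept_bounds(tile):
    """Format a tile as a readers.ept bounds string."""
    txmin, tymin, txmax, tymax = tile
    return f"([{txmin}, {txmax}], [{tymin}, {tymax}])"

full_box = (0, 0, 10, 10)
tiles = split_bounds(*full_box, 2, 2)
for t in tiles:
    print(ept_bounds(t))
```

Each bounds string can go to its own process. Filtering every returned point through `owns` afterwards removes duplicates along shared edges, on the assumption that the reader's own bounds test is inclusive.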
23:22
All right, so it's totally scalable in a sense. Cool, any more questions? I thought this was a question.
23:42
All right, well thanks everyone. And enjoy the.