Continental Scale Point Cloud Data Management with Entwine
Formal Metadata
Title | Continental Scale Point Cloud Data Management with Entwine |
Alternative Title | Continental Scale Point Cloud Data Management and Exploitation with Entwine |
Title of Series | |
Author | |
Contributors | |
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. |
Identifiers | |
Publisher | |
Release Date | 2019 |
Language | English |
Content Metadata
Subject Area | |
Abstract | The defining characteristic of point cloud data is that they are large, and tools such as [Entwine](https://entwine.io) and the Entwine Point Tile specification can help you overcome their bigness. We will discuss how we used Entwine and EPT to construct point cloud web services for the [USGS 3DEP LiDAR data](https://usgs.entwine.io) of the United States as an Amazon Public Dataset. We will also demonstrate how to leverage EPT web services with open source software such as [PDAL](https://pdal.io) to extract information, enhance data utility, and reduce data volume for tasks such as filtering, object identification, and visualization. You will learn about how these tools work together with others such as [GDAL](https://www.gdal.org/) and [PROJ](https://proj4.org/) to provide data management and processing pipelines for expansive data holdings. |
Keywords | General |
00:05
the caribbean.
00:09
Let's get started with the next talk. Thanks for coming to this last session. Now we have Connor Manning from the United States talking to us about continental scale point cloud management with Entwine. It really is continental scale, so here we go.

Like he said, I'm Connor Manning, and I'm here to talk to you about really, really large point clouds made with some open source software called Entwine and a little bit of PDAL. So first I'm going to go over a little bit of some of the open source software tools that make up
00:47
these projects: PDAL ("poodle" or "pee-dal", either pronunciation is fine) and Entwine.
00:53
So first I want to talk about PDAL a bit. It is the Point Data Abstraction Library, and it is used to translate and manipulate point cloud data. For people who are familiar with GDAL, which is probably quite a few of you, it has a similar scope in point cloud land to what GDAL has in raster and vector land. PDAL provides you a processing pipeline to develop workflows, which are composed of stages, and stages are readers, writers, and filters. An example of a simple pipeline might be something like: read a couple of LAS files, reproject one of them to match the other, and write the output to a TIFF.
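The simple pipeline described here can be sketched as a PDAL pipeline definition. This is a minimal illustration, not the slide from the talk: it only builds the JSON in Python, and the file names, the `EPSG:26915` target, and the helper name `simple_pipeline` are placeholders chosen for the example. The stage names (`readers.las`, `filters.reprojection`, `filters.merge`, `writers.gdal`) are PDAL stages.

```python
import json


def simple_pipeline(las_a, las_b, target_srs, out_tif, resolution=1.0):
    """Build a PDAL pipeline dict: reproject las_a, merge with las_b, rasterize."""
    return {
        "pipeline": [
            # Reader for the file whose coordinate system needs to change.
            {"type": "readers.las", "filename": las_a, "tag": "a"},
            # Filter: reproject only the first file to the target SRS.
            {
                "type": "filters.reprojection",
                "inputs": ["a"],
                "out_srs": target_srs,
                "tag": "a_reprojected",
            },
            # Reader for the file that is already in the target SRS.
            {"type": "readers.las", "filename": las_b, "tag": "b"},
            # Filter: combine both point streams into one.
            {"type": "filters.merge", "inputs": ["a_reprojected", "b"]},
            # Writer: rasterize the merged points to a GeoTIFF.
            {
                "type": "writers.gdal",
                "filename": out_tif,
                "resolution": resolution,
                "output_type": "mean",
            },
        ]
    }


# Placeholder inputs, for illustration only.
pipeline = simple_pipeline("a.las", "b.las", "EPSG:26915", "out.tif")
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

Saved to a file, a definition like this would typically be run with `pdal pipeline <file>.json`.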
01:31
But they can also get quite complex.
01:35
Because these stages are composable, you can develop some pretty complex workflows. I'm not going to go through the details of this one, but we're doing some reading from an EPT data source, which is what I'm going to go over shortly. We do some reprojection and denoising, and what we end up with is just the ground points from this data set, and we write the output to both a TIFF and a LAZ file. So the building blocks that PDAL gives you are very powerful. It's pretty unopinionated about how you compose your workflows; it gives you small building blocks on which to build.
02:13
So you might imagine some workflows. Probably a lot of people who work with point clouds already have a lot of workflows in mind. For example, you might be seeing how close your trees are to your power line or your train track, or maybe you're concerned with stripping all that out and you're interested in the train itself.
02:30
Or maybe you have some post-earthquake point cloud model and you'd like to figure out how to turn that into a DEM at different resolutions, so you're playing around with some settings to figure out how to do volumetric change detection type stuff.
02:48
City planning type things, such as setbacks from curbs, figuring out where to put signs, et cetera.
02:53
Or maybe just measuring something in a place that's not very easy to reach all the time. So probably everybody has workflows in mind, and a lot of people can think of software and tools that do that. But what about when your data, instead of looking like those, looks a little more like this? This is, I think, a sixty billion point city gathered with mobile lidar, so they drove cars around with lidars attached to them. Or countries:
03:19
this is all of the Netherlands, six hundred forty billion points and many terabytes. Or large states: this is Kentucky in the USA. Data at this magnitude is sometimes delivered as flight lines, but more frequently it will be delivered as lots and lots of full-density tiles with fixed
03:40
bounds, which can be very difficult to work with. I mean, people aren't delivering map tiles and raster tiles this way, but you do see point clouds delivered this way a lot. So I'm going to talk to you next about the software that I think is a better way to do delivery of lidar data, and the software behind this is
04:00
called Entwine. It's point cloud organization software that enables you to efficiently query, analyze, visualize, and enrich your very large point cloud collections. It's very scalable, up to trillions of points, which we'll see shortly, and it's built with parallelization in the cloud in mind. What Entwine does is generate a new format called Entwine Point Tiles, or the EPT format. This is a static file structure that's agnostic to the encoding, so you can swap out the backend compression depending on your use case, or you can use industry standards like LAZ, et cetera.
04:40
It's got a flexible attribute schema, so you're not bound by a fixed, predefined set of types, and it is fully lossless. A really important thing here is that it's lossless in the strictest sense of the word, such that the input data set is fully reconstructable from the EPT. That is really important when you're looking at multi-terabyte data sets, because if you're going to undertake a transformation that converts these multiple terabytes to another multiple terabytes, it would be great if you didn't have to keep both of them around and you could put one in cold storage. So the EPT format has been designed to maintain every aspect of the information from the input in the EPT itself, so that you could theoretically reproduce the input completely from the EPT.
05:27
And so this plays. This is just a visual representation. I'll mention that EPT is an octree structure, and this is kind of what an octree is, visually represented. You can see the point budget slider being slid up and down, and as we decide "yes, I can load more points" or "no, I want fewer points", we can discard the ones that are least relevant depending on what we're
05:47
currently looking at.
05:50
So it's kind of like slippy map tiles, or map tile services, for point clouds.
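To make the octree and point budget idea concrete, here is a small sketch. In the EPT specification, nodes are keyed as `D-X-Y-Z` (depth and grid position), and the hierarchy maps each key to its point count. The breadth-first budget walk below is only an illustration of "load coarse nodes first, stop when the budget is spent"; it is not the viewer's actual algorithm, and the toy `hierarchy` values are invented for the example.

```python
from collections import deque


def children(key):
    """Return the eight child keys of an EPT node key 'D-X-Y-Z'."""
    d, x, y, z = (int(part) for part in key.split("-"))
    return [
        f"{d + 1}-{2 * x + dx}-{2 * y + dy}-{2 * z + dz}"
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]


def select_nodes(hierarchy, budget, root="0-0-0-0"):
    """Walk the tree coarse-to-fine, collecting nodes until the budget is hit."""
    selected, total = [], 0
    queue = deque([root])
    while queue:
        key = queue.popleft()
        count = hierarchy.get(key, 0)
        if count <= 0:
            continue  # node does not exist (or is not described here)
        if total + count > budget:
            break  # loading this node would exceed the point budget
        selected.append(key)
        total += count
        queue.extend(children(key))
    return selected, total


# Invented toy hierarchy: node key -> number of points stored in that node.
hierarchy = {
    "0-0-0-0": 1000,
    "1-0-0-0": 800,
    "1-1-0-0": 700,
    "2-0-0-0": 900,
    "2-2-0-0": 600,
}

nodes, loaded = select_nodes(hierarchy, budget=2600)
print(nodes, loaded)
```

Because a node's children can be computed from its key alone, a client needs no explicit listing of parent/child links, only the per-node point counts.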
05:58
So this is a bit longer video, but I'm just going to show some of the visualization and scalability of how big the EPT stuff can scale to. This is a four trillion point dataset, I think slightly less, but approximately four trillion individual points, for the entire United States interstate system. As we're zooming around, you can see loading happening all over the country, but the part that we're interested in fills in quite quickly. Some people are probably thinking, "Well, yeah, visualization, I mean it's kind of cool, but that's really not why we're using point clouds, right?" But the key thing to think about here is that if we can load the data that we care about very quickly, on demand, with millisecond response times, we can probably do a lot of other things too. So after this
06:48
video finishes playing I'll talk about some of the analytics and exploitation that you can do with this data structure. There's another twenty seconds or so. Anybody have any early questions, real quick? [inaudible audience question] A company that I don't think I can say the name of, sorry.
07:21
The video is the only public thing.
07:24
So the data is structured as a whole bunch of files on disk in an octree format, so there's no server. Typically you would store them in something like S3, a distributed file system, or a bare metal server, and you can use any encoding you wish for that data. In particular, most lidar people use LAZ, so it's a bunch of LAZ files with some metadata that lets you access them this way. Like I said, Entwine's scope isn't really just about visualization. We try to be somewhere in the middle of this gradient between being able to view your data and being able to do things with your data, probably a little bit more towards the exploitation side, but somewhere in the middle. There are a lot of projects way over in the green, and a number of them over in the blue, and not quite as many in the middle. So EPT doesn't try to be the best at visualization, but it tries to be the most useful all-round format. So I'll talk about some analysis and exploitation, like I said, and this is going to be using PDAL.
08:32
So I'll go through just a couple of workflows and how you might use EPT to solve them, similar to before. For example, maybe you're interested in this lovely patch of forested hill, but what you're really interested in is modeling how the water might flow over it. So you'd like to get rid of all the vegetation, and what you really want is some sort of
08:52
watertight mesh or raster or some other derivative product from the point cloud. Like before, you can probably think of ways to do this, right? You may have done this before, or something similar. But what if that patch of land is in a, I think this one is a five hundred billion point data set that spans multiple terabytes? Now it may be more complicated, and you're probably
09:13
now thinking, "I need to go query that tile database, figure out which tiles overlap, then I have to go and do downloads." It can be difficult when your data exists in an ecosystem this large.
09:26
But with the EPT reader in PDAL, using the spatially accelerated data structure, it's actually quite easy. You can see at the top there we have an EPT reader, and the important part is that we're querying by only the bounds we care about. Then we do some operations on it: we're detecting noise, running a ground algorithm, filtering out the non-ground points, and writing the output to a TIFF. Even in a multi-terabyte dataset like that one was, this would probably take only a few seconds, or on the order of minutes.
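The workflow just described (bounded EPT read, noise detection, ground classification, ground-only filter, raster output) might look roughly like the pipeline below. This is a sketch under assumptions, not the slide's exact content: the URL, bounds, and output name are hypothetical, and the particular choice of `filters.outlier` and `filters.smrf` for the noise and ground steps is mine, since the talk does not name the algorithms.

```python
import json


def format_bounds(xmin, ymin, xmax, ymax):
    """Format a 2D query box the way readers.ept expects its 'bounds' option."""
    return f"([{xmin}, {xmax}], [{ymin}, {ymax}])"


def ground_raster_pipeline(ept_url, bounds, out_tif, resolution=1.0):
    """Build a PDAL pipeline: bounded EPT read -> denoise -> ground -> raster."""
    return [
        # Only the nodes overlapping 'bounds' are fetched from the EPT source.
        {"type": "readers.ept", "filename": ept_url, "bounds": bounds},
        # Flag statistical outliers as noise (Classification 7 by default).
        {"type": "filters.outlier"},
        # Classify ground points, ignoring the points flagged as noise.
        {"type": "filters.smrf", "ignore": "Classification[7:7]"},
        # Keep ground points only (Classification 2).
        {"type": "filters.range", "limits": "Classification[2:2]"},
        # Rasterize the remaining ground points.
        {
            "type": "writers.gdal",
            "filename": out_tif,
            "resolution": resolution,
            "output_type": "idw",
        },
    ]


# Hypothetical endpoint and query box, for illustration only.
bounds = format_bounds(500000, 4100000, 501000, 4101000)
pipeline = ground_raster_pipeline(
    "https://example.com/some-dataset/ept.json", bounds, "ground.tif"
)
print(json.dumps(pipeline, indent=2))
```

The point of the design is that the `bounds` option does the work of the "query the tile database, find overlaps, download" steps: the reader walks the octree and requests only the nodes that intersect the box.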
09:56
Another, kind of orthogonal, example: this is the state of Kentucky, also half a trillion points. How would you do something like generate a reasonable boundary for it? You might think of taking the headers of the files and mashing all the bounds together, but then you get a bunch of jagged edges. It's not necessarily something
10:17
you'd want to display as your user-facing footprint. This is also really easy with the EPT reader, because it's structured in a hierarchical manner by resolution. The key part here is that resolution there, which is quite coarse: I'm querying for points that are typically four hundred meters apart, and then I'm just using PDAL's hexbin
10:37
filter to create a hex boundary on that data. Like I said, that Kentucky set is multiple terabytes, and I think this takes about five or six seconds or so on my laptop.
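A coarse boundary query of this kind can be sketched with the EPT reader's `resolution` option followed by `filters.hexbin`. The URL is hypothetical, and the `threshold` value of 1 is an assumption made for the example (at such a coarse resolution each hexagon holds few points); the talk does not state the settings used.

```python
import json


def boundary_pipeline(ept_url, resolution=400, threshold=1):
    """Build a PDAL pipeline that derives a hexagon boundary from coarse data."""
    return [
        # 'resolution' limits the read to octree depths whose point spacing
        # is about this coarse, so only a small part of the data is fetched.
        {"type": "readers.ept", "filename": ept_url, "resolution": resolution},
        # Tessellate the points into hexagons; the resulting boundary is
        # reported in this stage's metadata rather than in the point data.
        {"type": "filters.hexbin", "threshold": threshold},
    ]


# Hypothetical endpoint, for illustration only.
pipeline = boundary_pipeline("https://example.com/kentucky/ept.json")
print(json.dumps(pipeline, indent=2))
```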
10:49
Another thing you can do with the EPT structure and with PDAL is enrichment of the data. What I mean by that is that you can add new attributes later, shown at the bottom there, and you don't need to add them to the EPT set itself. There are no rewrites involved, and you can write these locally. This is something we'll see a little bit later. For example, if you have a web service that you don't have write access to, you can still swap out its attributes for attributes that you've designed. Examples of that might be things like normals that you're going to reuse for lots of different algorithms, or workflow results like classifiers.
11:29
A typical example would be replacing the classification of some service with the output of a better version of a classification algorithm.
11:39
There's a lot of stuff on here, and a lot of it's not all that important, but the point to note here is the EPT addon writer, which maps a dimension that's the result of our workflow. Here we've assigned a classification with some awesome ground algorithm, and then we map that to a path, which in this example is on our local computer. Later we can map these paths back into the attributes in the point cloud. So if you have all sorts of different classifications for different contexts, or you're comparing different algorithms, you can swap them out dynamically this way.
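The write-then-read-back pattern can be sketched with PDAL's `writers.ept_addon` stage and the `addons` option of `readers.ept`. As I understand the two options, the writer maps an output path to a dimension name and the reader maps a dimension name back to a path. The URL and local path are hypothetical, and `filters.smrf` stands in for the "awesome ground algorithm" mentioned in the talk.

```python
import json


def write_addon_pipeline(ept_url, addon_path, dimension="Classification"):
    """Compute a new classification and store it as an add-on beside the EPT."""
    return [
        {"type": "readers.ept", "filename": ept_url},
        # Stand-in for whatever algorithm produces the new attribute values.
        {"type": "filters.smrf"},
        # Writer: output path -> dimension to store there.
        {"type": "writers.ept_addon", "addons": {addon_path: dimension}},
    ]


def read_with_addon_pipeline(ept_url, addon_path, dimension="Classification"):
    """Read the EPT with the add-on values swapped in for the dimension."""
    return [
        # Reader: dimension -> path of the add-on that supplies its values.
        {
            "type": "readers.ept",
            "filename": ept_url,
            "addons": {dimension: addon_path},
        }
    ]


# Hypothetical locations, for illustration only.
url = "https://example.com/some-dataset/ept.json"
path = "/tmp/addons/my-ground"
write_step = write_addon_pipeline(url, path)
read_step = read_with_addon_pipeline(url, path)
print(json.dumps(write_step, indent=2))
print(json.dumps(read_step, indent=2))
```

Keeping several add-on directories side by side, one per algorithm, is what makes it possible to swap classifications at read time without touching the original dataset.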
12:13
Now we're going to shift a little bit. I think there have been a lot of talks about Cesium and 3D Tiles; I actually see a 3D Tiles shirt out there. So somebody might have been thinking, "Well, this sounds kind of like 3D Tiles. What are the differences? Why would I use one or the other?" First, what they are: Cesium is a rendering library, and 3D Tiles is a format. So the analog would be that Cesium is like Potree (which I guess I haven't mentioned; Potree was the visualizer I was using earlier), and 3D Tiles is the format, like EPT. In general Cesium is really good for mixed media types, because you can do things like mix up your building models and terrain models and point clouds, and you can load them all up in a single renderer. They've also got a flexible tiling format, so you can define how you want to split your data, and it's just a really robust rendering library.

But for point clouds in particular, and I'm going to compare it with EPT here, there are some drawbacks. These aren't really things that Cesium is missing that they should have added; their scope is a little more toward the visualization side. So when you start to look at it for things like exploitation, you're missing some important things. One example of an advantage of EPT over 3D Tiles is that you can build EPT with open source tools, which would be Entwine. With Cesium you need to use Cesium ion for building things. In general the format is just more oriented toward visualization. You can't use standard lidar encodings, and the compression is optimized for the GPU, so if you wanted to read it in things like PDAL it might be a little clunky. And in general, non-renderable attributes are deprioritized. For example, if you upload something to Cesium ion, it strips out the things that aren't renderable, like your GPS time, your scan angle, and your return number, which are really important to people who really care about lidar and are using it for deriving things. The last one there is that the metadata for the equivalent EPT is much larger in Cesium. Because EPT is an implicit octree, we can embed a lot of information just in our node structure, while Cesium explicitly has to list a lot of implicit things. And that's on the roadmap for them.
14:34
So I'll come back to that in a little bit. We're going to switch again real quick to a new project I've been working on called EPT Tools. This is a JavaScript library that can run in the browser or Node.js, and it has tools to work with EPT data. Right now there aren't very many; you can see there are only three tools. One of them is validate, so we can check out the metadata
14:54
and make sure it looks good, which would be useful if you are creating your own EPT and not using Entwine. Then, to go back to the 3D Tiles stuff, there's a tile command that translates EPT to 3D Tiles as a one-time transformation, so this would be duplicating your data in yet another format. Or, perhaps more interesting, there's the live translation of EPT to 3D Tiles. That looks something like this: you just serve an EPT project root, and
15:21
then you point Cesium at that root, and Cesium makes 3D Tiles requests that are automatically converted by the server from the EPT. So the EPT server responds to that with 3D Tiles data directly. More interesting than that, though, is that it's actually idempotent and stateless, so you can run that in a Lambda. With something like AWS Lambda and API Gateway, for example, or the equivalent in some other cloud, you can have a serverless reflection of all of your point clouds in EPT as 3D Tiles
15:56
for very cheap, because you're not paying for a server; you're paying for the milliseconds of the actual transformations, so you don't have a server running all the time. So the last thing I'm going to talk about, with just a couple of minutes left, is an example project of using tools like this to manage a very large data collection at scale.
16:17
So the lidar people from the US, or people who have worked with lidar for some time, might be familiar with the US Geological Survey's 3DEP program, which gathers lots and lots of lidar. Here are some stats about the data set: it goes back about fifteen years, it's seventy-plus terabytes,
16:38
and it existed as tiled data in S3 as LAZ. So the USGS just had it sitting there for a long time and people were downloading it, but how can we leverage it and do other things? Can we look at all of it? Can we write software against it? Can we query and filter it? And most importantly, can we get Amazon to pay for it, perhaps? The answer to that was yes, actually, to all of these. Through the AWS Public Datasets program, Amazon paid for the compute for converting all the data from tiled data to EPT, as well as hosting in S3 for at least the next two years.
17:20
And here's kind of the portal of what we ended up with. You can see all the footprints, and I talked before about how these were generated. You can see the point count at the top, over ten trillion, but all these boundaries were generated in, I think, four or five minutes on my laptop. So they're not perfect, they're quite coarse, but that's pretty good for something like this, and you wouldn't be able to do this with every structure. You can see down at the bottom there are little Potree and Plasio dots, and those are two open source renderers, and on the right you have Cesium, which goes to the on-demand reflected 3D Tiles transformation. So this data only exists as EPT, but we do reflect it to Cesium as well.
18:06
And here's an example of just what you can do with this website: I'm filtering down to the five hundred billion points or so and just loading up all of this in Potree. You can also run analytics. The full EPT dataset is available over HTTP, so you can do things like sample the Z values. Actually, this data set is quite noisy: as you can see, the standard deviation of the Zs is three hundred, but this state has an elevation change of approximately ten or twenty meters, because it's Iowa.
18:37
You can also do sampling on the classification. So this is querying, at a really low resolution, that same really large data set and counting the values that come back as classification. You can see one that's kind of weird there, the 229; I'm not sure what that would be. But that's all I've got. I'm not going to put up a whole bunch of links. You should only need this one, because after the session's over I'm going to have a blog post on the main page there with these slides and with all the links to all the projects I've talked about. So if you need to remember a link, that's the only one you should need.
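The kind of summary described here (spread of Z values, counts per classification code) is simple to compute once a coarse sample of points is in hand. The sketch below uses a tiny synthetic sample rather than real 3DEP data, and the helper name `summarize` is made up for the example; it only illustrates why a handful of noise points can push the standard deviation of Z far beyond the real relief of flat terrain.

```python
from collections import Counter
from statistics import pstdev


def summarize(points):
    """Summarize Z spread and classification counts for sampled points."""
    zs = [p["Z"] for p in points]
    return {
        "z_min": min(zs),
        "z_max": max(zs),
        "z_stddev": pstdev(zs),
        "classes": Counter(p["Classification"] for p in points),
    }


# Synthetic sample: gently varying terrain, mostly unclassified or ground.
clean = [
    {"Z": z, "Classification": c}
    for z, c in [(300, 2), (302, 2), (305, 1), (298, 2),
                 (301, 1), (303, 2), (299, 2), (304, 1)]
]
# The same sample plus one stray high return, standing in for noise.
noisy = clean + [{"Z": 5000, "Classification": 1}]

clean_summary = summarize(clean)
noisy_summary = summarize(noisy)
print(clean_summary["z_stddev"], noisy_summary["z_stddev"])
print(noisy_summary["classes"])
```

A standard deviation far larger than the known elevation range is therefore a quick signal that the sample contains unfiltered noise, and an unexpected key in the classification counts flags codes worth investigating.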
19:14
The post isn't there yet, so give me a little bit of time. But thank you, that's all I've got.

Thanks, Connor. Does anyone have any questions?

I do have swag for questions.

So what's the timeline, what's the roadmap for EPT Tools? There are only three that you listed. What more do you plan on having?

Well, this is the first time I've actually publicized it. I've given it out to a couple of people who have used it, but it's going to depend on community involvement and on what people think would fit in this space, people who say, "Hey, I need this, there should be this tool." So people like you will be the drivers of the roadmap. So what can you get? A spork, the Entwine sporks.

So for every addition, when you're writing add-ons, do they consume the same amount of space on your storage as the original data set?

No. For an add-on you can specify the type. For classification I think it's eight or sixteen bits, one of the two. Eight? Thanks, Martin, it's eight bits. So if you're writing your own classification, it will take up eight bits times the number of points that you actually run the classifier on. And, since I mentioned the octree structure, you don't have to write an add-on for the full set; you can write one for a subset. So for all those queries, like the bounds queries we were talking about, or the resolution queries, if you have add-ons that only go to a certain resolution, you can write them as a subset, so they will only take the amount of space of the attribute type times the number of points that you actually apply them to.

[inaudible]
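The storage rule stated in this answer (attribute size times the number of points the add-on covers) is easy to put into numbers. The figures below are illustrative only: the half-trillion point count echoes the datasets mentioned in the talk, the one percent subset is invented, and the result is the raw size before any compression the storage encoding may apply.

```python
def addon_size_bytes(point_count, bytes_per_value=1):
    """Raw (uncompressed) size of an add-on: one value per covered point."""
    return point_count * bytes_per_value


# An 8-bit classification written for every point of a 500 billion point set.
full = addon_size_bytes(500_000_000_000)
# The same attribute written only for a subset of 5 billion points, for
# example a bounded region or the coarser levels of the octree.
subset = addon_size_bytes(5_000_000_000)

print(full / 1e9, "GB for the full set")
print(subset / 1e9, "GB for the subset")
```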
I appreciated your slide on lossless in the beginning, and I appreciated that it included ordering. Does it mean that if I have the state of Iowa in, I don't know, one thousand seven hundred tiles, and I give it to the Entwine encoder, I can get back those one thousand seven hundred tiles with the same naming, with every point in the same order? And if yes, how do you implement that?

Yes. On the entwine.io website there's a link called Entwine Point Tiles on the sidebar, and that's the description of the format. The key parts, as far as what you specifically asked: for every file that comes in, we add a new attribute called the OriginId, which maps back to the combination of the file's metadata, so everything in the LAS headers, all the VLRs, all the EVLRs too. We store all that, and we store the filename itself. By default we don't include a point ID, because typically the points are already ordered by GPS time, so we use GPS time as an implicit ordering. But there's a flag on Entwine where you can say to store the point ID, and it will tag every single point with its order in its origin file. So hopefully that answers it.

Anything else? We have a few minutes, so more questions are welcome. Or more sporks to give away.

Well, first of all, thanks. One point: I think I saw a demo already on the web where you switch between the USA and Holland. Was that done on the fly [inaudible], or was everything reprojected?

In that specific case, I think everything was reprojected. For the public services that we kind of post as demos, we actually do something that all the lidar people won't like very much, which is that we put it in Web Mercator, because then everybody can interact with it very easily, right? I mean, it's demo data; it's not really meant for everything. But yes, in those instances that was all Web Mercator. Something I didn't mention about the 3D Tiles one, which is actually a pretty nice benefit of the EPT Tools stuff, is that Cesium only supports, I think, lat/long, Web Mercator, and ECEF, or some combination thereof. I think the points are ECEF but the metadata must be lat/long. Part of the on-the-fly transformation of the EPT Tools stuff is doing the reprojection from whatever your data set is in. So if your data is in UTM or some local coordinate system, as long as you have the coordinate systems and they can be reprojected to show up correctly on a globe, you can do that translation. But in your specific case, I think all the data was in the same SRS.

Okay, thanks. Anyone else with a short question?

My question was about Greyhound.

I actually should mention this, I guess. Some of you might have seen me talking about Greyhound a couple of years ago, which was a server that did a lot of these features, and this is the first time I'm presenting EPT. Greyhound was a live server, so you had a live server up and running at all times, and what Greyhound did was translate a black box format that wasn't documented to anybody, but it served it kind of like EPT does. That black box format, which you weren't supposed to look at because we wanted the abstraction layer of Greyhound, has been solidified into EPT. And because we've implemented the ability to read it statically in things like Potree, et cetera, and PDAL, there's no server involved there. So the need for Greyhound kind of goes away when you have a static format that's more recognizable or more usable. I think the space for Greyhound, or what was Greyhound, might move towards the EPT Tools kind of thing. If you do want a live server for some reason, or maybe it's a series of Lambdas, I think that's where it would go, if you want to be filtering on the server or something like that. So Greyhound probably doesn't exist anymore, but it would be in EPT Tools.

Thanks, everyone. [inaudible]
