
Cloud Optimized Point Cloud: Compressed, Geospatial, Lossless and Compatible Data Organization for Analysis Ready Point Cloud Data


Formal Metadata

Title
Cloud Optimized Point Cloud: Compressed, Geospatial, Lossless and Compatible Data Organization for Analysis Ready Point Cloud Data
Number of Parts: 351
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2022

Content Metadata

Abstract
Point cloud data are an important component of geospatial data workflows, but the software and formats used to manage them often involve compromises that work against efficient storage and processing. While commonly seen characterizing topographic information in LiDAR applications, point cloud data are an important driver of change detection in SAR workflows and provide the raw data that brings the physical world into the augmented one through handset capture on devices like the iPhone 12+. COPC.io is an open specification by Hobu, Inc. for organizing point cloud data in LAZ that makes it streamable over HTTP, selectable by resolution or spatial window, and adaptable to existing point cloud workflows in a backward-compatible way. We will discuss the design choices and evolution of COPC, demonstrate its use in PDAL and QGIS scenarios, and show how COPC can be used in the cloud for management of massive point cloud collections.
Transcript: English (auto-generated)
Thanks a lot, Mike. So I'd like to talk to you today about Cloud Optimized Point Cloud — COPC — whichever way you alternate the pronunciation. It's kind of a thing where acronym pronunciation in different languages is always a little different,
and I always found that really interesting. So however you want to pronounce it in your language is just fine with me. The lights are up a little high, but how many of you in this audience are doing anything with lidar or point cloud content? All right, that's a good set. How many of you have used PDAL, or LAStools,
or WhiteboxTools, or laspy? Most of you as well, OK. So it's going to be a friendly audience. I'm excited. My name's Howard Butler. I have a company in Iowa City, like Mike said, called Hobu, Inc. We're a five-person team.
We're an open-core company that's focused on lidar data and point cloud data management. So if you have a country-sized point cloud and you want to know what the heck to do with it — how do you distribute it, how should it be organized so that people can access it and do things with it — come to us. We work on this COPC project
I'm talking about today, on Entwine, which is software for organizing point cloud data, and on the PDAL project. So one of the realities of point cloud data — a really important property, I think, for people to think about — is that it's read many more times than it's written.
So data formats and organization should really be tilted toward that reality. Also, a lot of software uses point cloud data in a kind of pan-and-scan search mode: you're looking for objects, you're filtering data based on the position of another object in space.
So data organization that supports fast searching without a lot of overhead is a really important property. Third, point cloud data is fluffy, right? What I mean by that is, in a billion-point scan of the Duomo, taking one point out of it
does not materially change the information content of that file or of that data. And data formats that are organized and structured with that reality in mind can do a lot. The open formats that we have in our ecosystem — LAS, LAZ, E57 — have pretty good interoperability,
especially in the geospatial lidar domain, but there are closed point cloud formats that can do more. So backing up 15 or 20 years ago, lidar was kind of a niche technology:
governments could do it, it was really expensive, and systems were capturing points at quite slow rates. Even then, the files were quite large for the time. And here's a curve everybody is used to seeing these days — the hockey-stick curve — and it really applies to data
in geospatial lidar scans. The data volumes and the densities — points per meter, or whatever measure is being used to declare it — are rapidly increasing. That corresponds directly to the resolution of the scan, and everybody wants as much as possible. I mean, more is always better, right? But software, formats, and data organization need to keep up with this challenge.
So what is COPC? COPC is an LAZ 1.4 file. It's backward compatible: your LASzip library, your laspy library that reads LAZ data, your QGIS software that reads LAZ data — they can all read it just as an LAZ file if they want.
But LAZ has a mechanism for storing metadata, and what we're doing is adding metadata to spatially organize that LAZ file, and then we're clustering the storage of those chunks — that data structure — into the file itself.
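As a sketch of what that spatial organization looks like: the COPC specification keys each compressed chunk of points by an octree voxel key (level, x, y, z), where every node subdivides into eight children one level down. The `VoxelKey` class below is my own illustrative naming, not code from any COPC library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoxelKey:
    """COPC-style octree address: a depth plus a grid position at that depth."""
    level: int  # 0 is the root voxel covering the whole point cloud cube
    x: int
    y: int
    z: int

    def children(self):
        """The eight child voxels one level down (standard octree subdivision)."""
        return [
            VoxelKey(self.level + 1, 2 * self.x + dx, 2 * self.y + dy, 2 * self.z + dz)
            for dx in (0, 1)
            for dy in (0, 1)
            for dz in (0, 1)
        ]


root = VoxelKey(0, 0, 0, 0)
print(len(root.children()))  # 8
print(root.children()[0])    # VoxelKey(level=1, x=0, y=0, z=0)
```

In the actual file, each such key is paired with a byte offset and length, which is what lets a reader jump straight to the chunks it cares about.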
What does this allow us to do? First, like a Cloud Optimized GeoTIFF, it allows opt-in spatial access. What I mean by that is that an application that doesn't want the complexity of spatially selecting or filtering the content can just read the file front to back,
read the content, and do whatever it wants with it. But COPC does allow selective decompression. So an application that does put in the effort, or has the capability, to figure out where it should actually decompress data can do so in a performant way. And of course, just like Cloud Optimized GeoTIFF, this is backward compatible with LAZ.
We have to thank Martin Isenburg of rapidlasso — he had a company called rapidlasso, and he passed away this past September — for developing LASzip in the mid-2000s. He released it as open source in 2011 and 2012. He was, of course, instrumental in our community,
instrumental in the ASPRS LAS specification, and his LAStools software, with his company rapidlasso, has been very influential in our community. It was very sad to see him pass. But we can take LAZ, add a little bit of — essentially — metadata to it,
and get a lot for it. So why would we choose to do this with LAZ? LAZ has pretty much industry-wide read support. For the most part, software that does geospatial lidar, or even point clouds that might not be geospatial, has LAZ support,
certainly on the read side. It provides efficient lossless compression of the content. It's open source software, and this past winter we changed the license from LGPL to Apache. So embedding the LASzip code base into, say, an App Store-type deployment
is a possibility now without any restriction. And there are JavaScript, C++, Python, and Rust implementations of it. Downsides: LASzip and LAZ can be kind of CPU-intensive. It's slowish to compress, because it's trying to squeeze as much fluff
out of the cloud as it can; it's faster to decompress. It's chunked, to allow you to seek across the file and only decompress the chunks you need, and that's the property we're taking advantage of with COPC. But it has a complex compression model, meaning it was organized and developed in support of storing what's called linear-mode pulsed lidar —
you know, a typical lidar laser scan. That presupposes a data organization and a compression model, and not all lidar, and certainly not all geospatial point cloud data — say you were generating point clouds from matched imagery — fits that model particularly well. But it still does okay,
and the interoperability is still worth it. So, you know, there's a quote from Sullivan, the famous architect: form follows function, right? LAZ certainly follows its function, which is to provide a lossless compressed organization of ASPRS LAS data. And, you know, software engineers like to solve problems
that can't be solved. We've always had this kind of back-and-forth on my engineering team about what the ideal point cloud format would be. Everybody starts putting up their list of features they might want, and many of those features are in competition with, or completely opposed to, each other.
And so there is no right format. The ideal doesn't exist, and what we can hope to do is avoid this, though — which is creating yet another one. And so that's what we're doing with COPC, right? It's backward compatible with LAZ. It's just an augmentation, or an extension, of LAZ
for applications that want to opt into it. So if we take our laundry list of features that we wanted and knock off a few that are probably in opposition to what we're doing with LASzip, we still get quite a large list. For LAZ, the features that are missing,
certainly in comparison to the commercial formats that might be available, are the ability to select spatially for the data you want, the ability to select for resolution, and the ability to do those in combination. And so what the organization of a COPC file is doing is allowing applications,
if they choose to opt in, to take on that part of the challenge. So spatial access, meaning a windowing sort of operation, whether it's a 2D or 3D window; and then resolution, where the data are decimated into a pyramid. Typically in point cloud data, you have two kinds of indexing structures.
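The resolution pyramid falls straight out of the octree: point spacing roughly halves at each level, so a reader can compute the shallowest level that satisfies a target resolution. A hedged sketch — the root spacing here is a made-up number; a real reader would take it from the COPC metadata in the file:

```python
import math


def level_for_resolution(root_spacing: float, target: float) -> int:
    """Shallowest octree level whose point spacing is at least as fine as `target`.

    Spacing roughly halves at each level: spacing(d) = root_spacing / 2**d.
    """
    if target >= root_spacing:
        return 0  # the root level alone is already coarse enough
    return math.ceil(math.log2(root_spacing / target))


# A hypothetical root spacing of 32 m is satisfied at 2 m by level 4 (32 / 2**4 == 2).
print(level_for_resolution(32.0, 2.0))   # 4
print(level_for_resolution(32.0, 64.0))  # 0
```

An application asking for a coarse overview simply stops descending the tree at that level instead of reading every chunk.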
One is called an octree, and the other might be a k-d tree. In COPC's case, we're doing an octree. Another important property that COPC supports because of its organization is HTTP partial access, right? An application reaching out over the network,
over HTTP, can hop and skip around the file to select and fetch only the bytes it might want. This allows applications to control how they traverse that file, how they might operate in parallel — if you had multiple processes filtering or selecting that data — and how they might do so according to whatever volume
they can manage at a point in time. So instead of data as a service, with Cloud Optimized GeoTIFF and Cloud Optimized Point Cloud the data is the service, right? Just at rest, sitting on an HTTP server, an application can consume it as a service if it chooses to,
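That hop-and-skip access is nothing more exotic than standard HTTP range requests: the hierarchy gives the reader a byte offset and length for each chunk, and the reader turns those into a `Range` header. A minimal sketch, with illustrative numbers rather than offsets from a real file:

```python
def range_header(offset: int, length: int) -> dict:
    """HTTP Range header that fetches `length` bytes starting at byte `offset`.

    A COPC reader issues GET requests carrying headers like this against a
    plain HTTP server; the offset and length for each chunk come from the
    octree hierarchy stored inside the file itself.
    """
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


print(range_header(1_000_000, 65_536))  # {'Range': 'bytes=1000000-1065535'}
```

Because each request is independent, an application can fan these fetches out across as many parallel workers as it likes.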
or it could simply download all the content and process it front to back. This is a concept that's really important in an emerging organizational framework called Cloud Native Geospatial. Matt Hanson and the crew had a very packed session this morning about STAC — hopefully some of you saw it.
COPC is certainly trying to be well aligned with that approach: being able to partially access content if you desire; applications that can control how they filter and process data; and being able to do that in parallel, with many workers or many applications or many processes touching it at once.
Why do we want to augment LAZ? Well, LAZ right now is the storage and transmission format of record for most of the public lidar. If you go to an agency's website to download lidar for your country, nine and a half times out of 10 it's going to be LAZ tiles that you're fetching. There are petabytes of LAZ content out there,
and it's well established, it's widely implemented, there's open source software for it, and it's based on the LAS specification. And we want this Cloud Optimized GeoTIFF property of opt-in, spatially accelerated access — but let's do that for point clouds. How does it do it? So there's octree metadata.
In this particular picture, down below, it's the blue box out on the right, and that tells an application, if it seeks to it and consumes it: here's how I might manipulate or move around in the gray box where the points are.
Applications can control which chunks they read for which resolutions, and an application can figure out what data it needs to select without over-reading, over-decompressing, or consuming lots of network bandwidth. COPC only supports point data record formats 6, 7,
and 8, which are kind of the common sweet spot for LAZ archives, right? Most point cloud data is not waveform data; it's just geospatial point cloud data. It also supports extra bytes, which some of the agencies are starting to provide as content. So what? You have this data format. Everyone has a data format.
What can you do with this thing? One of the things we developed is a browser-based web application. This particular one is not open source, but others are available that could do the same. It's able to decompress, filter, select, and put this data in a web browser — in this case it's using Cesium.
So this is a substantial file — I think it's 330 megabytes — and we're just browsing, manipulating, and filtering through the content and providing it to a browser-based application for people to visualize. This application is available at viewer.copc.io.
If you have a COPC file — maybe because you converted it using QGIS, recently released by the Lutra team through our crowdfunding effort — you can actually drag and drop it from your desktop and load it locally in this viewer if you want to. So that's a thing we just released.
So what else can you do with a COPC file? Because of the resolution and windowing selectivity, it makes it easy for applications to control how much data they're going to select. So this is that 330-megabyte file.
Say I want a two-meter overview — in this case it's the intensity of that point cloud we were looking at in the movie. I only need to download 1.7 million of those points, as opposed to all 360 million of them, to satisfy the data at a two-meter resolution.
And as an application, I can control how I do that by changing my window. I could slide a bounds-limiting window across the data. I could change the resolution. I could do that in parallel. I could do that over HTTP. I could do that over a local file system.
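Sliding a bounds-limiting window is, at its core, axis-aligned box intersection against the octree nodes' bounds: keep only the nodes whose cubes overlap the query window. A small sketch with made-up node keys and bounds (not decoded from any real file):

```python
def intersects(a, b):
    """True if two axis-aligned 3D boxes (minx, miny, minz, maxx, maxy, maxz) overlap."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


# Hypothetical node bounds an application might have decoded from the hierarchy.
nodes = {
    "0-0-0-0": (0, 0, 0, 100, 100, 100),   # root covers everything
    "1-0-0-0": (0, 0, 0, 50, 50, 50),      # lower-corner child
    "1-1-1-0": (50, 50, 0, 100, 100, 50),  # upper-corner child at low z
}
window = (60, 60, 0, 80, 80, 10)  # a 2D-ish query window with a thin z range

hits = [key for key, box in nodes.items() if intersects(box, window)]
print(hits)  # ['0-0-0-0', '1-1-1-0']
```

Only the chunks behind those hit keys would then be fetched and decompressed; everything else is skipped entirely.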
So as an application consuming this content, I have a lot of control over how I attack it with computing. So what? The bottom bullet is out of date here: QGIS 3.26, which was released in June — I think we're at 3.26.2 right now —
supports COPC as its internal point cloud format. So if you drag an LAZ file or an LAS file onto QGIS, it'll process it into a COPC file for you and start to display it, allowing you to do fantastic 3D GIS visualization. It'll let you do 2D GIS with it too, if you want.
There are more features coming out from the Lutra team, and make sure to catch Martin Dobias's talk tomorrow afternoon. In this particular movie, we're navigating a scene in — I think it's Melbourne, Australia. It's, I don't remember,
a 10-gigabyte file or something like that. So over the internet, a really large file organized as one data structure, and the application is using a frustum-and-camera query to figure out: where am I located, what data do I need to select — dereferencing that metadata, selecting those points, filtering them, and pulling them into the browser.
So what? Okay, I can make pretty pictures, pretty movies. I can filter data for a little preview image. So what? Well, we worked with Microsoft Planetary Computer to provide all of the USGS lidar point cloud content in a STAC catalog,
organized as Cloud Optimized Point Cloud files. So if you're using Planetary Computer and you're doing US-based data processing, you can touch any one of those tiles, ask questions of it, and filter and construct data. In this case, I was generating a height-above-ground surface for the park in central Chicago.
There's a Jupyter notebook you can go visit. It's asking the STAC API that's available: hey, which tiles would I select for this particular window, or whatever filter I would use to select items? Then it's filtering that data with PDAL for height above ground — maybe I would buffer it out, maybe I would control how that algorithm works.
And in this case, we're just producing a little image. If you want to find out more about PDAL and the state of things there, make sure you see Mike Smith's talk tomorrow afternoon. He'll be giving a kind of state-of-PDAL upstairs, I think.
So what else can you do with COPC? And this, in my opinion, is the most important thing about Cloud Optimized GeoTIFF, and why it has been such a successful thing in our industry: it's just a TIFF file — and likewise, COPC is just an LAZ file, right? Applications do not have to add special support to read the LAZ content out of a COPC file.
That's a really important property, because it allows applications a smooth ramp to adding support instead of a big discontinuous cliff. They can add support for it at their leisure. And it allows the rest of the ecosystem to build out data archiving, data management, and data storage
however they might want to manage the content. We had been working on something called Entwine Point Tile over the last few years — if you follow FOSS4G, you would have seen Connor Manning talk about EPT. One of the challenges with Entwine Point Tile is that, in concept, it's the same thing
as Cloud Optimized Point Cloud — it's chunks of an octree. The problem is that they're exploded across a file system or an object store. If you never have to move that data, that's not really a problem: say you're in Amazon or Google Cloud, you don't move content out of S3 or GCS. But if you're not — say you're at desktop scale, on a laptop, doing QGIS —
you don't want a million files, right? That's a really big problem for you. The other thing that isn't so great about exploded object stores — exploded data structures, let's say — is that they don't fit the concepts of a STAC Collection and a STAC Item particularly well, because conceptually they're one STAC Item, but they're actually lots of little objects. And so it makes things a little more challenging.
Something like COPC makes that more convenient and easy, and it lets people control how big their tiles are — maybe we process data at such-and-such a tile resolution and tile size, and they get to control that. As for our roadmap: in November of 2021
we finalized the specification. PDAL 2.4.0 was released this past March — I think we're up to 2.4.3 — and it has both readers.copc and writers.copc, so you can filter, construct, and read that data, and do whatever you need to with it.
Safe Software has FME support on their roadmap. QGIS, of course, has support for it. OpenDroneMap just added support, so you can export a COPC file. The Python laspy library has it, and so on. Take-homes for today, if you go back and want to think about this or how to use it: lots of open source, and no official pronunciation.
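For the PDAL readers.copc and writers.copc mentioned above, a pipeline that pulls a windowed, two-meter-resolution subset out of a remote COPC file might look like the sketch below. The filename, bounds, and resolution values are placeholders of my own, and the option names should be checked against the PDAL readers.copc documentation:

```python
import json

# A minimal PDAL pipeline sketch: read a spatial window of a (hypothetical)
# remote COPC file at a coarse resolution, then write the selection back out.
pipeline = {
    "pipeline": [
        {
            "type": "readers.copc",
            "filename": "https://example.com/data.copc.laz",  # hypothetical URL
            "bounds": "([635577, 639003], [848882, 853538])",  # illustrative window
            "resolution": 2.0,  # coarse overview instead of full density
        },
        {"type": "writers.copc", "filename": "subset.copc.laz"},
    ]
}

# PDAL consumes pipelines as JSON, e.g. via `pdal pipeline` on the CLI.
print(json.dumps(pipeline, indent=2))
```

Because the reader only touches the chunks that match the bounds and resolution, the same pipeline works equally well against a local file or a plain HTTP server.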
And I have party favors if you ask a question, so I'm happy to take any questions. Thank you very much.