State of GDAL: what's new in 3.8 and 3.9?
Formal Metadata
Number of Parts | 156
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68541 (DOI)
Transcript: English(auto-generated)
00:00
So, my name is Even Rouault, and I'm a free and open source software developer, mostly focused on the GDAL, MapServer, PROJ, and QGIS projects. In this talk, I go over the changes GDAL has received during the past year with the new 3.8 and 3.9 releases, and I will also talk about future directions.
00:24
So, for those who don't know much about GDAL, here it is in just one slide. GDAL stands for the Geospatial Data Abstraction Library, which is sometimes a black box you use without even realising
00:41
it when you use most C or C++ open source or closed source GIS software. As of today, it handles more than 250 different file formats and network, database, or service protocols.
01:01
It is released under the MIT licence, which is a very permissive software licence, and we release a version with new features about every six months, and bug fix releases every two months.
01:21
So, let's now dive into what's new in these GDAL versions. First, there is ongoing work at the OGC on the Features and Geometries JSON specification, often shortened to JSON-FG, which is an extension of the well-known GeoJSON format.
01:50
It is aligned with other ongoing OGC developments, in particular OGC API - Features.
02:03
So, let's look at a simplified example of the new features of JSON-FG. One of them is the ability to handle coordinate reference systems other than WGS84.
02:21
So, you have this coordRefSys element that you can attach to a feature collection or a feature. For now, it's limited to what I would call well-known CRSs, for example EPSG codes. Another main change is the addition of a place element, which is an alternate place where
02:47
you can put the geometry. So, if you use a non-WGS84 CRS, you put your geometry in there, but you can also put geometries that you cannot encode at all in GeoJSON, such as 3D geometries like
03:05
polyhedra or prisms, and there will probably also be enhancements to support curved geometries in this place element. JSON-FG also brings a standardized way to express timestamps or time ranges,
03:24
and you can also declare the feature type of an element, so a feature collection can contain different feature types. So, in GDAL 3.8, we have added a dedicated driver to handle this new format.
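A minimal sketch of the JSON-FG members just described, built with Python's standard `json` module. The member names (`coordRefSys`, `place`, `time` with a `date` member, `featureType`) reflect my understanding of the JSON-FG draft, which is still evolving, so check the current specification before relying on them. The coordinate values are illustrative only and are not an exact reprojection.

```python
import json

# Assumed JSON-FG member names (draft specification); coordinates are illustrative.
EPSG_3857 = "http://www.opengis.net/def/crs/EPSG/0/3857"


def make_feature(feature_type, place_xy, lon_lat, date):
    """Build one JSON-FG-style feature.

    - "place" holds the geometry in the collection's non-WGS84 CRS (coordRefSys).
    - "geometry" keeps a plain GeoJSON (WGS84 lon/lat) version for GeoJSON readers.
    - "time" carries temporal information in a dedicated member.
    - "featureType" lets one collection mix several feature types.
    """
    return {
        "type": "Feature",
        "featureType": feature_type,
        "time": {"date": date},
        "place": {"type": "Point", "coordinates": list(place_xy)},
        "geometry": {"type": "Point", "coordinates": list(lon_lat)},
        "properties": {},
    }


collection = {
    "type": "FeatureCollection",
    "coordRefSys": EPSG_3857,
    "features": [
        make_feature("Building", (278000.0, 6250000.0), (2.497, 48.85), "2024-06-01"),
        make_feature("Road", (278100.0, 6250100.0), (2.498, 48.851), "2024-06-02"),
    ],
}

print(json.dumps(collection, indent=2))
```

A plain GeoJSON reader simply ignores the unknown members and still finds a usable WGS84 `geometry`, which is what makes the format an extension rather than a replacement.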
03:43
It shares a lot of code with the GeoJSON driver, but it is kept separate for clarity. On the write side, the driver will automatically put geometries into the place element if they are expressed in a non-WGS84 CRS, and it will also write the corresponding geometry
04:02
in the geometry element when possible. Multiple layers can be read or written using the special featureType attribute. There's a mapping between the time element and OGR fields, and on the read side,
04:21
we have minimal support for some of the 3D geometries. Here is an example with the small file I showed just before: you can see that the layer name is retrieved from the featureType element,
04:43
we can detect the non-WGS84 CRS, the geometry is taken from the place element, and the time property is properly recognized as such.
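The read-side behaviour described above can be sketched as follows. `split_into_layers` is a hypothetical helper, not GDAL's implementation: it groups features into layers by `featureType`, takes the geometry from `place` when present (in the declared `coordRefSys`), and otherwise falls back to the GeoJSON `geometry`, which is always WGS84 longitude/latitude (`OGC:CRS84`). The fallback layer name `"features"` is an arbitrary choice.

```python
from collections import defaultdict

CRS84 = "OGC:CRS84"  # GeoJSON "geometry" is always WGS84 longitude/latitude


def split_into_layers(collection):
    """Hypothetical reader: group JSON-FG features into per-featureType layers."""
    layers = defaultdict(list)
    default_crs = collection.get("coordRefSys")
    for feat in collection.get("features", []):
        # Layer name from featureType; "features" is an arbitrary fallback name.
        layer_name = feat.get("featureType", "features")
        if feat.get("place") is not None:
            # Prefer "place": it can hold non-WGS84 or non-GeoJSON geometries.
            geom, crs = feat["place"], feat.get("coordRefSys", default_crs)
        else:
            geom, crs = feat.get("geometry"), CRS84
        layers[layer_name].append(
            {
                "geometry": geom,
                "crs": crs,
                "time": feat.get("time"),
                "properties": feat.get("properties", {}),
            }
        )
    return dict(layers)


sample = {
    "type": "FeatureCollection",
    "coordRefSys": "http://www.opengis.net/def/crs/EPSG/0/3857",
    "features": [
        {"type": "Feature", "featureType": "Building",
         "place": {"type": "Point", "coordinates": [1.0, 2.0]},
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
         "time": {"date": "2024-06-01"}, "properties": {}},
        {"type": "Feature", "featureType": "Road", "place": None,
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
         "properties": {"name": "A1"}},
    ],
}

layers = split_into_layers(sample)
print(sorted(layers))  # ['Building', 'Road']
```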
05:02
So, in GDAL 3.8, we also added a driver for the PMTiles format. PMTiles is a cloud-friendly container that makes it possible to serve tiles efficiently using only object storage functionality.
05:20
It is the equivalent of Cloud Optimized GeoTIFF or FlatGeobuf, but for tiled datasets. It is quite close to the MBTiles (Mapbox Tiles) format, which uses an SQLite 3 layout, but here it's a dedicated structure optimized to efficiently navigate through tiles
05:48
with an index that has been cleverly designed, in particular to handle tiles that have the same content. I've put a link to a presentation that Brandon Liu gave last year at FOSS4G Prizren
06:03
about Protomaps, and he has a talk later this morning too. The OGR driver supports reading and writing vector tiles in the Mapbox Vector Tiles format, and it shares a lot with the existing MBTiles and MVT drivers, so you have exactly the same set of creation options.
06:27
Of course, if you need finely customized options, you're probably better off using tippecanoe to create your PMTiles datasets. We also have a /vsipmtiles/ virtual file system, which lets you directly access low-level parts of PMTiles datasets
06:50
to extract, for example, the metadata document or a given MVT tile directly from the PMTiles file.
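To illustrate how a PMTiles-style index can stay compact, here is a sketch based on my reading of the PMTiles v3 specification: tiles are numbered by zoom level and then along a Hilbert curve, so nearby tiles get nearby IDs, and a directory entry can carry a run length so that consecutive tiles with identical content (for example, empty ocean tiles) share one entry. The function names are hypothetical, this is not the API of any PMTiles library, and the exact curve orientation should be checked against the specification's reference implementation.

```python
def hilbert_xy_to_d(n, x, y):
    """Position of cell (x, y) along a Hilbert curve covering an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def zxy_to_tile_id(z, x, y):
    """Tile ID: all tiles of lower zoom levels come first, then Hilbert order within zoom z."""
    tiles_before = (4 ** z - 1) // 3  # 1 + 4 + ... + 4^(z-1)
    return tiles_before + hilbert_xy_to_d(2 ** z, x, y)


def run_length_entries(tiles):
    """Collapse consecutive tile IDs with identical content into (first_id, run_length, content)."""
    entries = []
    for tile_id, content in sorted(tiles.items()):
        if entries:
            first, run, prev = entries[-1]
            if prev == content and first + run == tile_id:
                entries[-1] = (first, run + 1, prev)
                continue
        entries.append((tile_id, 1, content))
    return entries


# Made-up data: four zoom-1 tiles, three of which have identical "ocean" content.
tiles = {
    zxy_to_tile_id(1, x, y): ("land" if (x, y) == (1, 0) else "ocean")
    for x in range(2)
    for y in range(2)
}
print(run_length_entries(tiles))  # [(1, 3, 'ocean'), (4, 1, 'land')]
```

Because the Hilbert curve keeps spatial neighbours adjacent in ID space, runs of identical tiles (large empty areas at high zoom) tend to be long, which is what keeps the directory small.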
07:04
We have new drivers for different product specifications of bathymetry-related raster datasets: the S-102, S-104, and S-111 IHO standards.
07:21
They are all based on an abstract specification, S-100, and use an HDF5 file container. They are read-only drivers currently. The S-102 driver reads bathymetric surface products, which give depth and uncertainty.
07:42
It's similar to the existing BAG driver. The S-104 driver is for water level information for surface navigation products, that is, water level height and trend, and you can have several timestamps in such files. Each timestamp will be exposed as a separate GDAL subdataset.
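With the GDAL Python bindings, `Dataset.GetSubDatasets()` returns subdatasets as (name, description) pairs. The sketch below instead parses the raw `SUBDATASETS` metadata domain, where GDAL reports them as `SUBDATASET_<n>_NAME` / `SUBDATASET_<n>_DESC` items, so that it runs without GDAL installed. The names and descriptions in the sample are made up; the real strings for S-104 files are whatever GDAL reports.

```python
import re

_KEY = re.compile(r"^SUBDATASET_(\d+)_(NAME|DESC)$")


def list_subdatasets(metadata):
    """Return [(name, description), ...] from a SUBDATASETS-domain metadata dict.

    Entries are ordered by their numeric index, so SUBDATASET_10 sorts after SUBDATASET_2.
    """
    entries = {}
    for key, value in metadata.items():
        match = _KEY.match(key)
        if match:
            index, kind = int(match.group(1)), match.group(2)
            entries.setdefault(index, {})[kind] = value
    return [(entries[i].get("NAME"), entries[i].get("DESC", "")) for i in sorted(entries)]


# Made-up metadata for a file with several timestamps (not real S-104 strings).
sample_metadata = {
    "SUBDATASET_10_NAME": "hypothetical_t10",
    "SUBDATASET_10_DESC": "water level, timestamp 10",
    "SUBDATASET_1_NAME": "hypothetical_t1",
    "SUBDATASET_1_DESC": "water level, timestamp 1",
    "SUBDATASET_2_NAME": "hypothetical_t2",
    "SUBDATASET_2_DESC": "water level, timestamp 2",
}

for name, desc in list_subdatasets(sample_metadata):
    print(name, "->", desc)
```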
08:03
And the S-111 driver is for surface current products, where you have a band with the current speed and another one with the current direction, again at multiple timestamps. We have a new command line utility called gdal_footprint,
08:23
which computes the polygonal envelope of a raster. It is based on the existing gdal_polygonize utility, but with new options to really address the use case of computing footprints of rasters.
08:41
It takes into account nodata values or alpha bands. For a multi-band dataset, you can decide how validity is determined: either a pixel is valid as soon as one band is valid, or only if all bands are valid. You can decide to compute the footprint only on an overview,
09:05
instead of the full-resolution dataset, so if you want coarser footprints computed faster, you can work with overviews. And you have different options to reproject to a target CRS, or to densify or simplify the footprints.
09:24
You can split multipolygons into separate polygons, and you can also remove parts whose area is smaller than a given value. It's also accessible as a C function, which you can use from Python.
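The two ways of deciding pixel validity in a multi-band dataset can be sketched with plain Python lists as masks. The helper names are hypothetical; gdal_footprint then polygonizes the resulting mask, and as far as I recall it exposes this choice through a `-combine_bands` option with `union` and `intersection` values (check `gdal_footprint --help` for your version).

```python
def valid_mask(band, nodata):
    """Per-band validity: True where the pixel is not equal to the nodata value."""
    return [[value != nodata for value in row] for row in band]


def combine_band_masks(masks, mode):
    """Combine per-band masks into one footprint mask.

    mode="union": a pixel is valid as soon as one band is valid.
    mode="intersection": a pixel is valid only if all bands are valid.
    """
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")
    combine = any if mode == "union" else all
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[combine(m[r][c] for m in masks) for c in range(cols)] for r in range(rows)]


# Two 2x3 bands sharing nodata = 0 (made-up values).
band1 = [[1, 0, 3], [0, 0, 5]]
band2 = [[0, 0, 7], [2, 0, 0]]
masks = [valid_mask(band1, 0), valid_mask(band2, 0)]
print(combine_band_masks(masks, "union"))         # [[True, False, True], [True, False, True]]
print(combine_band_masks(masks, "intersection"))  # [[False, False, True], [False, False, False]]
```

The union gives the larger footprint (any data at all), while the intersection gives the area where every band is usable, which is often what you want for multispectral imagery.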
09:45
Is anyone here familiar with VRT, the virtual raster format of GDAL? Has anyone tried to create virtual mosaics with hundreds of thousands of files? And has anyone needed to create VRTs of VRTs to work around such use cases?
10:05
The new GTI driver, which stands for GDAL Raster Tile Index, will really help you if you have a very large collection of tiles for which you want to create a virtual mosaic. It basically uses an OGR vector driver as a backend,
10:25
to store the file name and the footprint of each tile. Typically, you want to use it with efficient drivers like GeoPackage, FlatGeobuf, or PostGIS.
10:42
Besides being able to handle arbitrarily large collections of tiles, it also has smaller (or not so small) enhancements over VRT. Typically, you can benefit from the spatial index of the backend you use.
11:01
So you can retrieve the relevant tiles almost immediately, hopefully in just a few tens of milliseconds. You have control over the Z-order if you have overlapping tiles. And it supports on-the-fly reprojection if you have a collection of tiles in different projections.
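How a tile index answers a read request can be sketched as a bounding-box query followed by a sort. This is a simplified illustration under stated assumptions: the real GTI driver delegates the lookup to the spatial index of its vector backend and works with actual footprints, whereas this sketch scans a Python list of envelopes. `TileRecord`, `priority` and `tiles_for_window` are hypothetical names standing in for the location column, a user-chosen sort field, and the driver's query.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy) in the index CRS


@dataclass
class TileRecord:
    location: str      # path or URL of the tile raster (the "location" column)
    bbox: BBox         # envelope of the tile footprint, already in the index CRS
    priority: int = 0  # hypothetical sort field: higher values are painted on top


def intersects(a: BBox, b: BBox) -> bool:
    """Strict envelope overlap test (tiles merely touching the window are skipped)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def tiles_for_window(index: List[TileRecord], window: BBox) -> List[TileRecord]:
    """Tiles overlapping `window`, ordered bottom to top for compositing."""
    hits = [t for t in index if intersects(t.bbox, window)]
    return sorted(hits, key=lambda t: t.priority)


index = [
    TileRecord("a.tif", (0, 0, 10, 10), priority=2),
    TileRecord("b.tif", (5, 5, 15, 15), priority=1),
    TileRecord("c.tif", (20, 20, 30, 30), priority=0),
]
print([t.location for t in tiles_for_window(index, (8, 8, 12, 12))])  # ['b.tif', 'a.tif']
```

Tiles are drawn in the returned order, so where `a.tif` and `b.tif` overlap, `a.tif` ends up on top; `c.tif` is never opened because its envelope misses the requested window.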
11:24
It also correctly supports alpha bands when you have overlapping tiles: it takes the alpha band into account when compositing the final result of the mosaic. You can create a GTI index using the existing gdaltindex utility,
11:45
which has been enhanced with a number of new options for this use case. Or, for example, if you have your catalogue of tiles in some database, you can of course use the OGR API to programmatically create a GTI dataset.
12:07
Basically, a GTI tile index requires a vector layer with a column holding the dataset location and a geometry holding its polygonal footprint, plus a few metadata items that help the driver quickly instantiate the virtual mosaic.
12:27
Typically, you will have the resolution, the extent, the CRS, the data type, and the number of bands. The metadata can be embedded in the vector formats that support it.
12:45
Currently, that is GeoPackage, of course, and the FlatGeobuf and PostGIS drivers have been enhanced to be able to store and retrieve layer metadata. Or you can provide that metadata in a small XML file, and I've put an example of such a file on the slide.
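The XML file from the slide is not reproduced in this transcript. As a stand-in, the sketch below generates a comparable file with Python's `xml.etree.ElementTree`. The element names (`GDALTileIndexDataset`, `IndexDataset`, `IndexLayer`, `LocationField`, `ResX`, `ResY`) come from my recollection of the GTI driver documentation and should be checked there; the file, layer and field names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_gti_xml(index_dataset, index_layer, location_field, res_x, res_y):
    """Return a small GTI-style XML document as a string (element names assumed)."""
    root = ET.Element("GDALTileIndexDataset")
    for tag, text in [
        ("IndexDataset", index_dataset),    # vector dataset listing the tiles
        ("IndexLayer", index_layer),        # needed if that dataset has several layers
        ("LocationField", location_field),  # field holding each tile's path
        ("ResX", res_x),                    # resolution of the virtual mosaic
        ("ResY", res_y),
    ]:
        ET.SubElement(root, tag).text = str(text)
    return ET.tostring(root, encoding="unicode")


print(build_gti_xml("tiles.gpkg", "tiles", "location", 10, 10))
```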
13:04
It's really small. It points to the vector dataset where your tiles are referenced, and it contains the layer name, if it's a multi-layer dataset, the field name holding the location of each tile,
13:21
and, for example, the resolution of the virtual mosaic. Let me quickly recap past developments. In GDAL 3.6, we introduced a new API for vector datasets,
13:41
which is based on a column-oriented way of representing data. Basically, data for a given field is packed together in memory, which makes it more CPU-friendly and also storage-friendly,
14:02
because it increases the efficiency of compression algorithms. In GDAL 3.8, we continued improving that with a few enhancements in the Parquet driver
14:21
to better handle attribute and spatial filtering, and we also added a write side to this Arrow API with a WriteArrowBatch() method, where you can group together a number of features that you want to write efficiently. We have a generic implementation that works with all OGR drivers that have write support,
14:44
and we have a specialized implementation for the Arrow and Parquet drivers, which obviously have a natural way of implementing that new API. ogr2ogr has been enhanced to use these read and write sides of the Arrow API,
15:01
and so, for example, a GeoPackage to Parquet translation is now three times faster, and Parquet to Parquet roughly ten times faster. There have been enhancements in the Parquet driver, with support for reading
15:23
quite complex data structures with nested lists and maps, which are mapped to JSON-serialized strings to be more friendly with other drivers. We have full spatial filtering, not just bounding box intersection,
15:42
and we have also implemented the features of the new GeoParquet 1.1 specification: when we write a geometry, we also write its bounding box, which allows faster filtering on the read side, and on the write side, we support sorting features spatially,
16:04
which groups nearby features together and makes more effective use of Parquet statistics for efficient reading. We also support the GeoArrow encoding,
16:20
which is an alternative encoding to WKB that uses a more natural data structure. For example, lines are represented as lists of XY pairs, which allows faster processing and better compression.
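To see why GeoArrow is described as a more natural, columnar structure, compare it with WKB for line strings. The sketch encodes one 2D LineString as little-endian WKB (a 1-byte byte-order flag, a 4-byte geometry type, a 4-byte point count, then 16 bytes per point) and lays several lines out in a GeoArrow-like separated form: an offsets array plus one array per coordinate. Plain Python lists stand in for Arrow buffers here, so this only illustrates the layout, not the actual GeoArrow or GDAL implementation.

```python
import struct


def linestring_to_wkb(coords):
    """Little-endian WKB for a 2D LineString: byte order, type 2, point count, XY doubles."""
    out = struct.pack("<BII", 1, 2, len(coords))  # 1 = little endian, 2 = LineString
    for x, y in coords:
        out += struct.pack("<dd", x, y)
    return out


def linestrings_to_geoarrow_like(lines):
    """Columnar layout in the spirit of GeoArrow's separated linestring encoding.

    offsets[i]:offsets[i + 1] is the coordinate range of line i; x and y are contiguous
    arrays, so all X values sit together (good for SIMD and for compression).
    """
    offsets, xs, ys = [0], [], []
    for coords in lines:
        for x, y in coords:
            xs.append(x)
            ys.append(y)
        offsets.append(len(xs))
    return {"offsets": offsets, "x": xs, "y": ys}


lines = [[(0.0, 0.0), (1.0, 1.0)], [(2.0, 2.0), (3.0, 3.0), (4.0, 5.0)]]
print(len(linestring_to_wkb(lines[0])))  # 41 = 9-byte header + 2 points * 16 bytes
print(linestrings_to_geoarrow_like(lines))
# {'offsets': [0, 2, 5], 'x': [0.0, 1.0, 2.0, 3.0, 4.0], 'y': [0.0, 1.0, 2.0, 3.0, 5.0]}
```

With WKB, every geometry is an opaque blob that must be parsed; with the columnar layout, a reader can compute, say, the X range of all lines by scanning one array.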
16:43
We have added a geometry coordinate precision framework: a way to specify, typically for text-based formats, the number of decimals that are significant,
17:01
so now you can specify that in a unified way, and drivers that support it also store that coordinate precision as metadata, so when you do a format translation, in the best case the coordinate precision will be preserved,
17:21
which makes files smaller and avoids storing insignificant digits. We also have technical enhancements to GDAL driver plugin support, which has always existed:
17:42
the ability to build a GDAL driver as a separate shared library that is loaded at runtime. Previously, all plugins were loaded when GDAL was initialized, which could take some tens or hundreds of milliseconds,
18:00
and now drivers are only loaded when needed. Among various other topics, small or not so small, we have a new driver for the vector MiraMon format, a format developed mostly in Catalonia.
18:23
We have the TileDB driver, which has been enhanced with support for the multidimensional API. We have a performance improvement in GeoPackage creation: the spatial index creation used to be a very slow step, and now it's three to four times faster.
18:42
We have a line-of-sight algorithm to compute the visibility between two points, taking into account a digital elevation model, and we have a number of technical updates to components we use in GDAL.
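A simplified line-of-sight test can be sketched by sampling the segment between the two points and comparing its height with the terrain under each sample. This is an independent illustration, not GDAL's algorithm (which, if I recall correctly, is exposed in GDAL 3.9 through the C function `GDALIsLineOfSightVisible()`); it assumes pixel coordinates inside the grid, absolute heights, and nearest-neighbour terrain lookup.

```python
import math


def is_line_of_sight_visible(dem, a, b):
    """Return True if the straight segment from a to b never dips below the terrain.

    dem: list of rows of elevations; a, b: (col, row, z) with z an absolute height.
    The segment is sampled at half-pixel steps and compared with the nearest DEM cell.
    """
    (xa, ya, za), (xb, yb, zb) = a, b
    steps = max(1, math.ceil(2 * max(abs(xb - xa), abs(yb - ya))))
    for i in range(steps + 1):
        t = i / steps
        x = xa + t * (xb - xa)
        y = ya + t * (yb - ya)
        z_line = za + t * (zb - za)
        col, row = math.floor(x + 0.5), math.floor(y + 0.5)  # nearest cell
        if dem[row][col] > z_line:
            return False
    return True


dem = [
    [0, 0, 0, 0, 0],
    [0, 0, 10, 0, 0],  # a 10 m ridge in the middle of row 1
    [0, 0, 0, 0, 0],
]
print(is_line_of_sight_visible(dem, (0, 1, 5), (4, 1, 5)))    # False: the ridge blocks the view
print(is_line_of_sight_visible(dem, (0, 1, 15), (4, 1, 15)))  # True: the sight line passes above
```

Sampling at half-pixel steps is a simple trade-off; a production implementation would rather walk every cell crossed by the segment so that no thin obstacle is skipped.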
19:02
gdaladdo has new enhancements to partially refresh overviews: sometimes you have an update in a virtual mosaic and you want to refresh just part of your overviews, and you can now do that easily and efficiently.
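The core of a partial overview refresh is working out which overview pixels a changed full-resolution window touches at each decimation factor. The sketch below does that arithmetic with a hypothetical helper; it ignores the extra margin that resampling kernels wider than the decimation factor (such as cubic) would need. In gdaladdo itself, if I recall correctly, this is driven by options such as `--partial-refresh-from-source-timestamp`; check `gdaladdo --help` for the exact list.

```python
def overview_windows(x_off, y_off, x_size, y_size, factors):
    """For a changed full-resolution window, return the overview window to recompute per factor.

    The window is expanded outward (floor of the start, ceiling of the end) so that every
    overview pixel overlapping the changed area is refreshed. Returns
    {factor: (x_off, y_off, x_size, y_size)} in overview pixel coordinates.
    """
    result = {}
    for f in factors:
        x0 = x_off // f
        y0 = y_off // f
        x1 = -(-(x_off + x_size) // f)  # ceiling division
        y1 = -(-(y_off + y_size) // f)
        result[f] = (x0, y0, x1 - x0, y1 - y0)
    return result


# A 10x10 pixel update at (100, 50) in the full-resolution raster.
print(overview_windows(100, 50, 10, 10, [2, 4, 8]))
# {2: (50, 25, 5, 5), 4: (25, 12, 3, 3), 8: (12, 6, 2, 2)}
```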
19:25
And a small preview of the next version: we will have filter pushdown support for multi-file GeoParquet datasets. The TileDB driver will be enhanced with nodata and overview support. The gdal_viewshed utility will be multi-threaded, so it will be faster.
19:48
We have support for 64-bit object identifiers in the OpenFileGDB driver, but not in the proprietary FileGDB driver, because Esri has not provided that support for their own format.
20:02
We have a new driver for the OpenDRIVE format, and there will be a talk about it later in this conference, I don't remember when, and we will probably also have support for the 16-bit floating-point data type,
20:20
especially for the Zarr format. To conclude, I would like to thank all the sponsors who make it possible to have daily bug fixes, timely review of pull requests, and monitoring of security issues. So if your organization critically depends on GDAL,
20:44
you can also join as a sponsor. And that's it, thank you. Thank you very much, Even, for your great presentation
21:01
with all the nice new features in GDAL and a preview of what's coming up. The floor is now open to the audience for questions. We have a microphone going around. Who has a question? There's a question in the back.
21:29
Thank you, Even, for this talk. Very interesting. I was very interested in the GDAL tile index (GTI) with many files.
21:43
Does it also support different projections? You were talking about on-the-fly reprojection, so can all those small files be in different projections? Yes. In the tile index, all the footprints will be reprojected
22:03
into a single CRS, because you need a unified CRS to be able to find them. This will also be the CRS used for the virtual mosaic. But if a tile itself is in another CRS,
22:20
it will basically use GDAL's warping engine underneath to reproject the tile into the GTI CRS. And is that reprojection done at read time, when you read the VRT? Yes, when you actually need to open that tile
22:42
and extract information from it. So it works. Obviously, it will not be the fastest way, because on-the-fly reprojection still takes time, but it works, and it gives you the same result as if you had materialized the reprojection
23:01
of the tile to the target CRS. Okay, thank you. We have time for another question. Anyone? No? Then I have a quick question. You presented the organizations that are now supporting GDAL development.
23:25
Is this going well? In the past, there were some challenges. Can you comment a bit on sustainability now? Yeah, it's always a bit challenging, because sometimes organizations change policies
23:43
or lay off employees, so it's not a good time to ask them for money. Sometimes we rely on just one or two people in an organization, and they have to navigate through their hierarchy
24:02
to make a case for sponsoring free and open source software. So it's definitely not easy, and it requires effort to make sure they continue their sponsorship.
24:21
Thanks a lot, Even. A round of applause for Even, and support his work.