
State of GDAL (versions 3.6 and 3.7)

Formal Metadata

Title
State of GDAL (versions 3.6 and 3.7)
Series Title
Number of Parts
266
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, as long as you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
FOSS4G 2023 Prizren. This talk will give a status report on the GDAL software, focusing on recent developments and achievements in the 3.6 and 3.7 GDAL versions released during the last year, but also on the general health of the project. The topics discussed will be as varied as the scope of GDAL is, covering the new single CMake build system, full open source write vector support for the Esri FileGeodatabase format, an Arrow-based column-oriented read API for vector layers implemented in the Arrow, (Geo)Parquet, GeoPackage and FlatGeobuf drivers, a new vector layer API for table relationship management, new raster drivers for the JPEG-XL, KTX2, BASISU and NSIDCbin formats, multi-threaded read capabilities in the GeoTIFF driver, multiple performance improvements in the GeoPackage driver, an advanced API to read compressed raster data, a new vector driver for the General Transit Feed Specification (GTFS), support for the new Seek Optimized ZIP (SOZip) specification, etc.
Transcript: English (automatically generated)
So, my name is Even Rouault. I'm an independent free and open source developer, mostly focused on GDAL, MapServer, QGIS and PROJ. I will now give you
an overview of the changes that GDAL has received in the last GDAL 3.6 and 3.7 releases, and I will talk a bit about future directions too. Next slide, please. So, what is GDAL in just one slide?
First, it stands for the Geospatial Data Abstraction Library, which is a black box you often use without realizing it when you want to read or write geospatial formats in most C or C++ open source or closed-source GIS software. As of today,
it handles more than 250 different formats, network protocols and services. It is released under the MIT license, which is super permissive. We release a version every six months with new features, and bug fix
releases every two months. Next slide, please. Thanks. So, one of the major items of the GDAL 3.6 version has been the full transition to the CMake build system. In
GDAL 3.5, it was introduced in addition to the existing autotools and Visual Studio nmake build systems, which had existed since forever but showed their age. CMake offers many advantages,
in particular a single cross-platform solution with good integration with many third-party tools. The transition to CMake went really well, so as of today it is the only build system available.
We have extensive documentation of all the build options, which are mostly related to the dependencies of GDAL, and also of how to build drivers as plugins or to skip drivers from being compiled.
GDAL has supported the Esri File Geodatabase format for a long time, first with a driver which uses the closed-source SDK. Many years ago we came up with the OpenFileGDB driver, which is a fully open source driver without any external dependency.
That driver only supported read operations until now, but we managed to complete the reverse engineering to the point where the format was very well known and we could add write support. So in GDAL 3.6, you can do whatever you expect from a GDAL driver with write capabilities.
You can create a new file geodatabase and add or remove layers. You can update or delete attributes and features. You can create spatial and attribute indices,
as well as use more advanced functionalities such as support for field domains or layer relationships. A quite fun fact is that the basic licensing of the proprietary vendor's software doesn't even offer that functionality. So with GDAL you can do that for free.
Nyall Dawson from QGIS also integrated those new capabilities in the QGIS user interface, so from the QGIS browser you can easily explore, create or edit a file geodatabase.
Still related to that format, there is now read support for raster datasets. This functionality was not available in the proprietary SDK, so there was really no easy solution up to now.
You previously had to use an external tool, the ArcRasterRescue utility. Now its functionalities are fully integrated as a GDAL raster driver, with all the things you expect, like being able to read the CRS and the georeferencing,
access just a part of the tiles, or use pyramids for faster multi-resolution display. The driver also supports all the compression methods, and you can access the value attribute tables as a GDAL raster attribute
table. The current version only reads raster datasets that have been produced by ArcGIS 10, and in the next GDAL 3.8 version you will be able to read datasets from ArcGIS 9. Now,
let's talk a bit about the topic of column-oriented vector reading. In GDAL 3.5 we had two new drivers for the Parquet and Arrow columnar formats, which were only using the traditional OGR feature-based
API. What is important about Arrow, I think, is the whole ecosystem that is around it. You have many Apache projects, such as Spark, Drill, Kudu and Parquet,
and you also have other projects that use it to facilitate exchange between different software. You have pandas and GeoPandas, you have the R spatial ecosystem, you have TileDB, which is a multidimensional sparse and dense array storage solution, you have DuckDB, and now GDAL.
So you may wonder what exactly we call a column-oriented representation of a table. Here we have a really simple example of a table with four rows and three
columns, or attributes. You have one attribute with an object ID, another one with a timestamp, and the last one is a geometry. It is shown as WKT, but generally it will be encoded in a binary way for efficient processing. In a row-based, or feature-based, memory buffer,
you will find the data for the first row, column by column, followed by the data for the second row, and so on. Whereas in a columnar memory buffer, you will first find all the data for a given column, organized in a
consecutive way in memory, which enables better compression as well as using the superscalar capabilities of CPUs. Concretely, what was implemented in GDAL is a new method in the OGRLayer
class, which returns an ArrowArrayStream C structure. An ArrowArrayStream C structure is a standardized API that enables the user to get a schema, which is a set of attributes with names and types.
Then you have a get_next method with which you can iterate through the data, in batches of many features organized in a columnar way. I should also mention that, despite Arrow being in the name,
there is no dependency on the Arrow C++ library. It is just a set of C structures, which are shared by many projects and enable them to exchange data. So there are no new runtime requirements: it will be available in all GDAL builds.
If you use the Python API, you have two new methods: one which returns data using the structures of the PyArrow library, and a self-contained implementation which returns data as a set of NumPy arrays.
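To make the two memory layouts concrete, here is a minimal pure-Python sketch of the same small table stored row by row and column by column. The field names and values are made up for illustration, and plain Python lists stand in for the binary buffers that a real Arrow implementation would use.

```python
# Hypothetical 4-row, 3-column table: object ID, timestamp, geometry (as WKT).
FIELDS = ("objectid", "timestamp", "geometry")
ROWS = [
    (1, "2023-06-26T09:00:00", "POINT (1 2)"),
    (2, "2023-06-27T10:30:00", "POINT (3 4)"),
    (3, "2023-06-28T11:15:00", "POINT (5 6)"),
    (4, "2023-06-29T12:45:00", "POINT (7 8)"),
]


def to_row_buffer(rows):
    """Row-oriented layout: all values of row 1, then all values of row 2, ..."""
    return [value for row in rows for value in row]


def to_column_buffers(fields, rows):
    """Column-oriented layout: one contiguous sequence of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(fields)}


row_buffer = to_row_buffer(ROWS)
columns = to_column_buffers(FIELDS, ROWS)

print(row_buffer[:3])       # the three values of the first row
print(columns["objectid"])  # all object IDs, stored consecutively
```

In the columnar form, all values of one attribute share a type and sit next to each other, which is what makes compression and vectorized processing easier.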
Previously we discussed the API. Now let's see a bit how it is implemented. There is a generic implementation, which is available for all OGR drivers, even those which don't specifically implement the new API or which
are not even column-oriented. Of course there is some overhead in doing this bridge between a row-oriented
API and the columnar representation. But if you are going to do it at some point anyway, it is better to let GDAL do it, because for some of the drivers you may have an optimized implementation. So you also have the drivers
which have this optimized implementation. Of course you will find the Parquet and Arrow drivers. You also have TileDB, when you want to read TileDB sparse arrays. And the GeoPackage and FlatGeobuf drivers
have been improved to have a specific implementation of the Arrow API, even if they are not column-oriented, and this saves the overhead of the generic implementation.
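The generic bridge from a row-oriented source to columnar batches can be pictured roughly as follows. This is only a conceptual sketch in plain Python, not GDAL's actual implementation: `iter_columnar_batches` and its parameters are hypothetical names, and features are modelled as dictionaries.

```python
from itertools import islice


def iter_columnar_batches(features, field_names, batch_size=2):
    """Consume row-oriented features and yield column-oriented batches.

    Each batch maps a field name to the list of values of that field for
    up to `batch_size` consecutive features.
    """
    iterator = iter(features)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # Every value is copied from its feature into the matching column:
        # this per-value work is the overhead of bridging the two models.
        yield {name: [feat.get(name) for feat in chunk] for name in field_names}


# Made-up features for illustration.
features = [
    {"objectid": 1, "name": "a"},
    {"objectid": 2, "name": "b"},
    {"objectid": 3, "name": "c"},
]
batches = list(iter_columnar_batches(features, ["objectid", "name"], batch_size=2))
```

A driver whose storage is already columnar can skip this copying step and hand over its columns more directly, which is where a dedicated implementation gains its speed.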
So, what about performance? Here I've listed the timings for loading a quite large dataset, which is made of 3.2 million records with the building footprints of New Zealand, and each record has a number of attributes.
You can see in the list that for the first two listed formats, Parquet and TileDB, you can get a four to six times performance improvement factor using the Arrow API, which is expected given that those formats are column-oriented.
We also have a heavily tuned GeoPackage implementation, and it performs extremely well. It can even be slightly faster than GeoParquet in multi-threading mode. Of course there is some trade-off, because Parquet compresses
very well, whereas GeoPackage has no native compression. The FlatGeobuf implementation also brings a substantial performance improvement. And for good measure I've also listed shapefile. Shapefile has no
dedicated implementation of the new API, so it goes through the generic translation layer, and here you can see that there is a 1.5 times slowdown. So what about the new drivers added in the recent versions? In GDAL
3.6, we added two closely related drivers that deal with GPU-optimized texture formats: the Basis Universal format and KTX2. Those formats enable a very quick
translation to GPU compressed texture formats. It should be noted, though, that the GDAL drivers only use CPU-based encoding and decoding routines. Also in GDAL 3.6,
we added support for JPEG XL compression. It was only available as a TIFF codec until now, and now it is a free-standing driver, so when you have a JPEG XL file, you can open it. JPEG XL is a new
compression method that competes with modern codecs such as AVIF, HEIF or WebP. It has two modes, lossy and lossless, and the lossless mode is quite interesting if you are dealing with 12- or 16-bit data.
Like the TIFF JPEG XL codec, it relies on the open source libjxl library. In GDAL 3.7, the NSIDCbin raster driver has been added.
This is a driver to access the raw binary format for sea ice concentration datasets that are produced by the US National Snow and Ice Data Center. There is also a new OGR driver for the GTFS format.
GTFS is the General Transit Feed Specification, and it is used for public transportation schedules and their associated geographic information. The driver exposes the text tables with proper attribute typing, and it also
builds OGR geometries for the stations and trips. ogrinfo, which is the utility you use to get access to the structure and the metadata
of vector layers, has been enhanced to be available from programming languages. The main function is a C function, GDALVectorInfo, which is accessible through Python and all the other SWIG bindings of GDAL.
But the main addition is a new JSON-based output that you can get with the -json switch. So now you can get information about the geometry types, the CRS, the field definitions,
and every other output of ogrinfo in a machine-readable way. You no longer have to use grep or awk or any other ad hoc parsing of ogrinfo output.
In the random category, I should mention the ogr_layer_algebra script, which is now available as a fully official utility. This utility can compute intersections, unions and differences between two
layers. The OGRLayer API has also been improved with two methods to modify features, and in particular the UpdateFeature method can bring substantial performance improvements for drivers that
implement it. Currently the MongoDB, PostgreSQL and GeoPackage drivers implement it, and for GeoPackage, in some cases you can get up to a four times performance improvement when you want to update, for example, just one single attribute in millions of rows.
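Since a GeoPackage is an SQLite database, the benefit of writing only the modified attribute can be sketched with Python's `sqlite3` module. This illustrates the idea only; it is not how the GeoPackage driver is implemented, and the table, column and function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buildings ("
    "fid INTEGER PRIMARY KEY, name TEXT, height REAL, geom BLOB)"
)
# Made-up rows: the BLOB stands in for an encoded geometry.
conn.executemany(
    "INSERT INTO buildings VALUES (?, ?, ?, ?)",
    [(i, f"building {i}", 10.0, b"\x00" * 256) for i in range(1, 1001)],
)


def rewrite_whole_row(conn, fid, name, height, geom):
    """Rewrite every column of the row, even those that did not change."""
    conn.execute(
        "UPDATE buildings SET name = ?, height = ?, geom = ? WHERE fid = ?",
        (name, height, geom, fid),
    )


def update_one_column(conn, fid, height):
    """Write only the column that actually changed."""
    conn.execute("UPDATE buildings SET height = ? WHERE fid = ?", (height, fid))


# Both calls change the height, but the second one does not have to
# re-send the name and the geometry blob.
rewrite_whole_row(conn, 41, "building 41", 12.5, b"\x00" * 256)
update_one_column(conn, 42, 12.5)

rows = conn.execute(
    "SELECT fid, name, height FROM buildings WHERE fid IN (41, 42) ORDER BY fid"
).fetchall()
```

The larger the untouched columns are (geometries in particular), the more the partial update saves.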
PNG decompression has been substantially improved, so you can get almost a two times acceleration factor, and it is available in the PNG driver and all drivers that use PNG, such as GeoPackage raster,
WMS or WMTS. There is also a new framework which has been added in the raster API to be able to directly access compressed data without decompressing it.
It enables lossless conversion between different formats that use the same codec, for example JPEG or JPEG XL. What has been implemented for now is a bit limited,
but the API opens the door to losslessly converting from a GeoPackage using JPEG compression to a tiled GeoTIFF using JPEG compression as well. For those who heavily use TIFF or Cloud Optimized GeoTIFF,
you may be interested in this performance improvement in the GeoTIFF driver: when you want to read several tiles at once, it can parallelize the decompression, and in the best cases you can get five times faster decompression.
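The principle of decompressing several independent tiles in parallel can be sketched with the Python standard library. This is a conceptual illustration under simple assumptions, not the GeoTIFF driver's code: the "tiles" are synthetic byte blocks compressed with zlib, and `decompress_tiles` is a hypothetical helper.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Synthetic "tiles": independently compressed blocks of raw pixel bytes.
raw_tiles = [bytes([i]) * 65536 for i in range(8)]
compressed_tiles = [zlib.compress(tile) for tile in raw_tiles]


def decompress_tiles(tiles, num_threads=4):
    """Decompress independent tiles with a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(zlib.decompress, tiles))


decoded = decompress_tiles(compressed_tiles)
```

This only works because each tile is compressed on its own, so no tile needs the decoded content of another one.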
I have another talk later today about SOZip, which is an enhancement to the ZIP format that enables reading random parts within a big compressed file in the ZIP.
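The general idea behind reading a random part of a compressed file, namely independently decodable chunks plus an index of their offsets, can be sketched with Python's `zlib` module. This is a simplified illustration and not the SOZip specification itself; the chunk size and the function names are assumptions made for the example.

```python
import zlib

CHUNK_SIZE = 32 * 1024  # uncompressed bytes per independently decodable chunk


def compress_with_index(data, chunk_size=CHUNK_SIZE):
    """Return (compressed, offsets); offsets[i] is the position in the
    compressed stream at which uncompressed chunk i starts."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))
        out += compressor.compress(data[start:start + chunk_size])
        # A full flush ends the current block on a byte boundary and resets
        # the compression dictionary, so decoding can restart at this point.
        out += compressor.flush(zlib.Z_FULL_FLUSH)
    out += compressor.flush(zlib.Z_FINISH)
    return bytes(out), offsets


def read_range(compressed, offsets, position, length, chunk_size=CHUNK_SIZE):
    """Read `length` (> 0) uncompressed bytes at `position`, decoding only
    the chunks that overlap the requested range."""
    first = position // chunk_size
    last = (position + length - 1) // chunk_size
    end = offsets[last + 1] if last + 1 < len(offsets) else len(compressed)
    decoder = zlib.decompressobj(-15)
    decoded = decoder.decompress(compressed[offsets[first]:end])
    skip = position - first * chunk_size
    return decoded[skip:skip + length]


data = bytes(range(256)) * 1024  # 256 KiB of sample data
compressed, offsets = compress_with_index(data)
part = read_range(compressed, offsets, 100_000, 50)
```

Without such restart points, reaching byte 100,000 of a deflate stream would require decompressing everything that precedes it.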
SOZip has quite interesting use cases, if you have, say, multi-gigabyte GeoPackage files and you want to be able to use them directly from the ZIP. So, to almost conclude, let's have a quick look at
the GDAL 3.8 version that will be released this November. We will have a driver to read the OGC Features and Geometries JSON format, which is an extension of GeoJSON that has CRS support. You also have time and 3D geometries.
There will also be a new driver to read vector tiles encoded in a PMTiles archive. PMTiles is a cloud-friendly tile container that enables you to serve tiles efficiently from
an object storage. And the TileDB driver will also be enhanced with an implementation of the multidimensional API.