How to deal with a massive geographic database when surrounded by data scientists?
Formal Metadata
Title: How to deal with a massive geographic database when surrounded by data scientists?
Number of Parts: 351
Author: Régis Haubourg
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/69029 (DOI)
Production Year: 2022
Transcript: English (auto-generated)
00:00
Thank you, Giorno. Hi, everybody. I'm really happy to be here and to speak to such a crowded audience about sort of a journey through my last year, learning a new project that was really exciting to me. I ended up dealing with a bunch of data scientist
00:21
teammates, whom I had never encountered before in my life. And we faced the need to build a giant geographic database, handle it, and reproduce it very often. So this is my journey. I hope you will find some learnings
00:41
in it and transmit them to students, or whatever, or yourself. So, I am Régis Haubourg. I was a GIS administrator. Before that, I was an agronomist working on soils, rivers and pollutants. And I worked for some years
01:01
at Oslandia as a project manager and SQL developer. And for one year now, I have been a research engineer at CSTB, the French scientific and technical institute for buildings. I've been involved in QGIS as a funder, then a contributor, an OSGeo charter member,
01:20
and president of the French local chapter for two years. And I also love PostgreSQL. So this is the team of brilliant guys I joined one year ago. They are in three remote locations in France.
01:42
They are all experts on buildings: data analysts, data scientists, all of them DevOps. And I joined a project that had been running for three years. What is the CSTB, this French institute? They write the norms and construction standards.
02:02
They monitor and build models for housing and housing performance. They also do a lot of environmental stuff, like life-cycle assessment of buildings, to know how much carbon is emitted when building a house or retrofitting it, how much water is needed, and all this. And we are starting to work on the biodiversity of buildings.
02:23
That's strange. We are also a service provider, a commercial one, testing construction products: anything, a window, a roof, a tile. We do training for builders to learn how to install windows correctly.
02:43
We help the French government and local authorities deal with their data and their policies. There is a division that does some 3D and BIM stuff. And we are now doing digital strategies. And GIS has been coming to this institute
03:02
for one year, since I arrived, in fact. The project I'm working on, I talked about this morning at 9, so please go and see the whole dedicated talk about it. It's about building a reference data set
03:20
to fight against climate change. The French have all the data on buildings in France. So we build this kind of fun stuff for anyone to just check their house or building and become aware that they have to do something to improve the situation: insulate their houses and all this,
03:41
and then find funding and people to help them do it. Well, the rationale is that one third of carbon emissions are probably linked to buildings. So that's a huge part to address if we want to attain our goals. Most of the carbon is emitted during the building process,
04:04
not when you heat or cool your house. So we have to stop focusing on new, shiny, passive houses and focus on the existing buildings. And this is the path that CSTB took three years ago
04:21
with this project: stop focusing on new things, focus on existing things. And we don't know what the buildings are. So this is where I landed one year ago, in a team that had already finished the work of building this database.
04:41
And I'm a GIS guy, and more of a database admin guy. And I learned a few things. This is my background, maybe like all of you: OSGeo stuff. This is theirs: Pandas DataFrames before anything else, rather than the database.
05:01
They are really killer guys in science, machine learning, statistics. What they do is crazy; I'm really impressed. And I was shy. The database, when I entered the project, was also frightening: 7 terabytes, 23 million buildings,
05:23
of which we make maybe 10 versions each year, so it grows. 41 million street addresses, 70 million houses, all the cadastral parcels, and all the external data sets we put on top of it, like the energy performance diagnoses.
05:42
Well, it's very, very large. So I fell in love with the challenge. Then I started to think, oh my god, me alone with this team. But I couldn't help myself.
06:02
And I jumped into the project. And then I hit a wall. I didn't understand anything of the pipeline. I'm not that much of a Python guy, and it's full Python. I'm used to doing some QGIS plugins and small stuff; they are making huge pipelines with parallel computing
06:25
in web environments. So, lots of stuff to learn. Conda, do you use Conda, Anaconda? Who likes Conda? OK. I come from a developer team where
06:42
we stayed as close as possible to the system packages. And Conda is for Windows users; it's gigantic. It's easy to use, but when it comes to industrial use, it's big. I discovered gigantic YAML files, because the team had fallen in love with YAML three years before.
07:04
Most importantly, they were each developing alone on their own pipeline, with different philosophies and no cross code reviews. And it started to be a problem. Very few unit tests; it depended on the person.
07:21
Some had none; others were very well tested. Continuous integration was just starting. And we had a lot of complex dependencies, with web applications, like you saw, and machine learning prediction pipelines, and a data model changing all the time at each version, which
07:42
is an issue. So I didn't know how to jump into the project. Analyzing the code was too big a task. So I started by just trying to use it in QGIS, use case first. Be yourself the next user of the database.
08:00
Try to just load data, list the tables in the Postgres database, then try to configure the web app to build a new application. And it was a bit hard.
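If you want to play the next user of the database yourself, a first check is to ask the PostGIS catalog what clients will actually see; a minimal sketch:

    -- List the spatial tables as clients see them (PostGIS catalog view).
    -- Untyped geometry columns show up as generic GEOMETRY with SRID 0,
    -- which is what makes QGIS and other clients struggle.
    SELECT f_table_schema, f_table_name, f_geometry_column, type, srid
    FROM geometry_columns
    ORDER BY f_table_schema, f_table_name;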
08:22
First learning: QGIS and 400-column tables are not a love story. Huge columns; data scientists make arrays, arrays, arrays, sometimes multi-dimensional arrays. And they put all this in Postgres tables,
08:42
because it's easy: they are using data frames. But as users, we just can't use it. So I had multi-geometries and all this. And it's all denormalized data sets: the building table was carrying the parcel geometries and the list of walls of each building and each house, with a lot of redundancy.
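A hypothetical sketch of what normalizing that looks like, with made-up building and wall tables rather than the project's real schema:

    -- Requires PostGIS for the typed geometry column.
    CREATE TABLE building (
        id        bigint PRIMARY KEY,
        city_code text,
        geom      geometry(MultiPolygon, 4326)
    );

    -- Instead of arrays of wall attributes stored inside the building row,
    -- one row per wall, joined back only when needed:
    CREATE TABLE wall (
        id          bigint PRIMARY KEY,
        building_id bigint NOT NULL REFERENCES building(id),
        height      numeric
    );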
09:05
It's almost impossible to use those data as is, in QGIS at least, which is the main GIS used in France. From the database administrator's point of view,
09:22
I started to monitor things, because there was nothing before. There was no server monitoring, you know, the kind that alerts you before your disk is full, the database server crashes and you lose your data. So there was no alerting, and no monitoring of the frequent queries and all this.
09:40
So there were a lot of data copies and missing indices. And strangely, that made me appear like a savior. Just try it at home: just add an index and it's a thousand times faster, right? People really like this.
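The savior move is usually no more than this kind of statement; a sketch on the hypothetical building table above:

    -- A GiST index, the classic fix for slow spatial filters and joins:
    CREATE INDEX building_geom_idx ON building USING gist (geom);

    -- And a plain B-tree index for frequent non-spatial lookups:
    CREATE INDEX building_city_idx ON building (city_code);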
10:03
Next, the database listing. If you want to do GIS, teach people, when they create views or tables, to type the geometry: it's a MultiPolygon, with EPSG:4326 as the SRID, for example. If you don't do this, all the clients struggle to list the data.
10:20
They have to read the entire data set to find out what the SRID is and what the geometry type is, and that kills not only the clients but also the database. So type your data.
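Typing a geometry column, and keeping views typed, looks roughly like this (hypothetical names again):

    -- Declare the geometry type and SRID so clients read them from the
    -- catalog instead of scanning the whole table:
    ALTER TABLE building
        ALTER COLUMN geom TYPE geometry(MultiPolygon, 4326);

    -- In views, cast the expression, otherwise the type is lost:
    CREATE VIEW building_outline AS
    SELECT id,
           ST_Boundary(geom)::geometry(MultiLineString, 4326) AS geom
    FROM building;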
10:41
Then, to monitor, I slowly started with a nice tool, pg_activity, which I really recommend: you see live all the queries running. Then you have other tools, like the pg_stat_statements extension, which records the frequent queries so you can analyze the history. And on top of it, you can use a lot of different tools, like temBoard, PoWA or pgBadger,
11:00
to analyze, over the long run, the bottlenecks of the people querying the database. So you really catch the missing indexes, the wrong models, the normalization errors.
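For example, with pg_stat_statements loaded, the heaviest queries are one SELECT away (column names as in PostgreSQL 13 and later):

    -- pg_stat_statements must be listed in shared_preload_libraries, then:
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- The ten queries costing the most time overall:
    SELECT calls,
           round(total_exec_time) AS total_ms,
           round(mean_exec_time)  AS mean_ms,
           query
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;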
11:20
But doing this as a database admin, doing it for the others, is a bottleneck too. So you have to teach the others to do it. Fortunately, data scientists learn fast; that's really cool. So I taught them to EXPLAIN every query they were running.
11:41
Some of their queries run for two weeks on a massive machine. So you should use this tool from Dalibo, explain.dalibo.com: you just paste your explain plan and you get all the costs and times, with explanations, and you can share it on the internet and discuss it with people.
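The habit to teach is simple; a sketch on the hypothetical tables from earlier, whose plan you can paste into explain.dalibo.com:

    -- ANALYZE actually executes the query; BUFFERS adds I/O details.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT b.id, count(*) AS wall_count
    FROM building b
    JOIN wall w ON w.building_id = b.id
    GROUP BY b.id;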
12:01
And you learn how the internals of the database work, and really quickly you see what you can improve. So just do it. Another thing surprised me: they love enums, enumerations, because in Python, that's how you do it. You have value lists for tables.
12:21
Let's say the value list of the insulation types or window types: you have enums. And they were not versioned, so all the enums were shared by all the versions of our data. So I had to say, sorry, we are going back to pure tables, relation tables,
12:41
what we call domains in the relational world, or lists of values. That's not fancy, it's just a table, but it works: you can export it in your package, you can export it to anyone, and it works. Enums are not interoperable between the different worlds.
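The move away from enums looks roughly like this; hypothetical names, not the project's actual schema:

    -- Instead of: CREATE TYPE insulation_kind AS ENUM ('internal', 'external');
    -- use a plain reference table, versionable and exportable like any data:
    CREATE TABLE insulation_kind (
        code  text PRIMARY KEY,
        label text NOT NULL
    );
    INSERT INTO insulation_kind VALUES
        ('internal', 'Internal insulation'),
        ('external', 'External insulation');

    -- A foreign key gives the same integrity guarantee as the enum did:
    ALTER TABLE wall
        ADD COLUMN insulation text REFERENCES insulation_kind(code);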
13:05
With hundreds of column properties, it's just not possible. So we started this. It was new to them, which surprised me: data scientists are the best of the best, and they had never done this at school.
13:21
So if you're a teacher, please teach it. Buildings are not simple. Oh, there's a missing slide; it's not an issue. Buildings: try to define what a building is. They hadn't done it before. So we took some time to find all the possible cases and to build a realistic model for all the use cases,
13:42
but still a usable one. After this, there were the coding practices. As I said, they were different between pipelines. And code review, they didn't want to go there. Before joining the project, I asked them if they wanted,
14:02
if they were ready to do code review, to submit their code to the others. They said, no, we don't want it, it's too slow, we are in a hurry. And in fact, that was not the real reason. They were afraid, because it's hard to be humble and show all the ugly things we do;
14:24
we all do this, we write ugly code when we are alone. And watching the practices in the QGIS project for years had taught me that going through code review, being humble, is the only key to good,
14:41
maintainable code. And it goes faster in the end. That's a learning. Conda, maybe that's it, that's the hell of Python. But yeah, it's better now.
15:02
We have Mamba; at least it's faster. So Mamba will save a lot of energy. It's all about climate change, and we are building huge CIs consuming so much processor time. So let's try to be, I don't know,
15:21
performant, and not use that much electricity. Mamba is cool, it's a C++ implementation and it goes maybe a thousand times faster than before, but it's still not that fast, I think. So when you build images on the fly, every time you commit something and launch the CI, it counts.
15:42
Another point: meta-generation hell. Who has a clue what that means? No? Have you ever, once in your life, thought: oh, I have been copy-pasting this code 10 times, let's build a library
16:01
to generate the SQL for you. It's the start of hell, really. And in my team, they love meta-generation of SQL for all kinds of things. So they reinvented SQL, but in YAML. And that YAML file is so big that GitLab refuses
16:25
to display it. It's a configuration file, but maybe we have not only configuration but also data in this YAML file. And in fact, all the table names, the columns, the column types, the metadata of the columns, they are all in this YAML file.
16:42
So when you merge two branches, what could go wrong? So let's split it into separate parts, separate files; maybe the Linux philosophy, where everything is a file and a small file is dedicated to one thing, is the right philosophy.
17:01
And they didn't want to start, because it looked really nice from a theoretical point of view to have one YAML file. No, it's not a good idea. And as it was a 30,000-line, yeah, 30,000-line config file,
17:21
they couldn't edit it by hand. So they made their own Jupyter notebook to edit the file on the fly using the Postgres table definitions. Hell. Same thing, in the end, for the Mapbox / MapLibre configuration. There was a JSON file.
17:41
I said, hey, we have it all in the YAML file, let's make a wrapper to generate all the styles. But then you have to give up all the complexity, all the powerful features. So in the end, you are stuck going back to manual editing.
18:04
This is the commit message of my colleague: "I did the best I could, sorry for the broken code." Meta-generation... the human mind is not made for this. It works for the person who coded it the first time; maintainers can't come in.
18:22
So let's remove layers. That's what happened to me: oh, I'd like to use a window function to generate that view. OK, it's not in your YAML wrapper. What can I do? Crap. I don't just have to learn Postgres; I also have to learn the Python code
18:42
and the ORM code and the YAML code that produce the query in the end. And it's probably three to four times longer and harder than doing the addition by hand in raw SQL sometimes. So do templating only if you really need it
19:04
and you have the skills to maintain it and to make a really nice library out of it. And don't forget that someone who is maybe not a coder will have to change the content, the name of a column, for example. He won't go and git clone, compile, conda and all this.
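To illustrate, the kind of one-off view that is easy in plain SQL but painful through a YAML generator; a sketch on the same hypothetical tables:

    -- Rank buildings by wall count within each city: one window function
    -- on top of an aggregate, hard to express through a generator.
    CREATE VIEW building_wall_rank AS
    SELECT b.id,
           b.city_code,
           count(w.id) AS wall_count,
           rank() OVER (PARTITION BY b.city_code
                        ORDER BY count(w.id) DESC) AS rank_in_city
    FROM building b
    LEFT JOIN wall w ON w.building_id = b.id
    GROUP BY b.id, b.city_code;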
19:21
There is a lot of cool stuff like this, but what could go wrong when you run it on the same server as PostgreSQL? The OOM killer!
19:41
We started to split Postgres from the libraries and clients. Then there is a lot on the cultural side. It's all about teaching: how powerful SQL is, how far we can go with joins, good indices, EXPLAIN and all this.
20:02
I already told you about it. The code review process, I told you; I'm going fast. Go lean, try to learn to work together, and don't multitask too much. That's a change for researchers. And for techies, I will finish with some numbers.
20:24
One thing: I am slowly getting older, and the caveat was that the junior developers just thought I was telling the truth with a big T, even when I was wrong. That's a learning for me.
20:40
So use SQL to store data and to process and handle as much data as possible, and then, at the very end, you finish in Pandas, Julia, whatever it is.
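In practice that means pushing the heavy lifting into the database and fetching only small results; a hypothetical sketch:

    -- Aggregate millions of rows server-side; the client receives one
    -- small row per city to finish in Pandas, Julia, or whatever:
    SELECT city_code,
           count(*) AS building_count,
           avg(ST_Area(geom::geography)) AS avg_footprint_m2
    FROM building
    GROUP BY city_code;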
21:01
Some unpopular opinions: in the end, you will have a traditional relational database, and it will be Postgres and PostGIS. And if you really need it, you can then denormalize, do NoSQL downstream, produce massive GeoParquet files on the data lake, data warehouses if you like.
21:21
But at the start, you have to query the data, and the others are not so easy to query. Try sobriety: do not burn electricity into thin air, please. And if you have a shiny new cool tool that is one more layer, it adds a combinatorial complexity that you should avoid.
21:43
I'll finish with a massive heart for my team. I've been criticizing them for the whole talk; I'm sorry for that. They surprised me: each time I told them something, two days later it was done and learned. They learn so fast. So what I liked in this project
22:01
is that we mixed two really different cultures. We all went out of our own comfort zones, and this culture mix made us progress. So that's the story of FOSS4G, I think. Thank you.