
How to deal with a massive geographic database when surrounded by data scientists?

Formal Metadata

Title
How to deal with a massive geographic database when surrounded by data scientists?
Series Title
Number of Parts
351
Author
License
CC Attribution 3.0 Unported:
You may use, modify, and reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of Publication
Language
Production Year: 2022

Content Metadata

Subject Area
Genre
Abstract
The Scientific and Technical Center for Building (CSTB) built the first French database of buildings and houses to address climate change challenges, supporting knowledge and decision-making for massive retrofit. The pipeline factory intersects massive datasets (21 million buildings, >400 descriptors) and keeps adding new predictions and external datasets all the time. It makes it possible to run analyses and predictions for all the climate-change-related indicators, such as the relation between housing prices and energy performance, heat wave impact, solar potential, etc. While the first versions were a direct image of the classical data scientist's approach (i.e. a massive dataframe driven by massive YAML config files and cryptic meta-templated scripts), ease of use and access performance soon became a limiting factor. This is a major concern, since this dataset will be a long-term foundation of derived information systems. Between a brute-force approach based on scaling resources up and the old-fashioned « data diet » of normalization and optimization, the truth is not easy to find. With liberal use of cartoonish humor, this talk will explore the benefits of normalizing back hugely redundant geographic datasets and building public interfaces (a public SQL model, APIs, vector tiles, OGC APIs) so that end users can analyze this dataset efficiently, while the data manager team gains stability from those good old database constraints.
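The normalization idea from the abstract can be illustrated with a minimal, hypothetical sketch (the schema below is invented for illustration, not CSTB's actual model): repeated attributes are factored out of a redundant buildings table into their own table, and a foreign key plus a CHECK constraint let the database itself reject inconsistent rows, rather than leaving validation to ad hoc scripts.

```python
import sqlite3

# In-memory database; SQLite needs foreign keys switched on explicitly.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical normalized schema: city attributes live in one place,
# buildings reference them by code instead of repeating the values.
conn.executescript("""
CREATE TABLE city (
    insee_code TEXT PRIMARY KEY,   -- French municipality code
    name       TEXT NOT NULL
);
CREATE TABLE building (
    id         INTEGER PRIMARY KEY,
    insee_code TEXT NOT NULL REFERENCES city(insee_code),
    floor_area REAL CHECK (floor_area > 0)  -- reject nonsensical inputs
);
""")

conn.execute("INSERT INTO city VALUES ('75056', 'Paris')")
conn.execute("INSERT INTO building VALUES (1, '75056', 120.5)")

# A building pointing at an unknown city is refused by the database.
try:
    conn.execute("INSERT INTO building VALUES (2, '99999', 80.0)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The same "good old constraints" pattern carries over unchanged to a production PostgreSQL/PostGIS setup; the point is that the guarantees move from pipeline code into the schema.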
Keywords