Geospatial big data analytics for sustainable smart cities
Formal Metadata
Number of Parts | 266
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/66422 (DOI)
FOSS4G Prizren Kosovo 2023 | 176 / 266
Transcript: English (auto-generated)
00:08
I'm currently working as an assistant professor at Istanbul Technical University, Department of Geomatics Engineering, Istanbul. Here is the table of contents of my presentation.
00:23
I will give brief information about sustainability, the Sustainable Development Goals, the European Green Deal, and net zero targets, and I will continue with smart city definitions and components. Then I will connect those topics with geospatial big data: how to handle geospatial
00:46
big data, for which we have solutions like parallel and cluster processing and frameworks like Dask-GeoPandas and Apache Sedona. I will briefly talk about data file formats like
01:02
GeoParquet, and I have benchmark tests of the geospatial big data frameworks. I will then give the case study, which is about sustainable smart energy management in order
01:22
to achieve the United Kingdom's net zero carbon emission targets. So let's start. Firstly, I will talk about the problem definition. As you see, the increasing urbanization across the world makes cities more crowded
01:40
and complex. This situation brings along many social, environmental, and economic problems such as housing, traffic, density, and air pollution. It is necessary to overcome these problems and manage cities effectively, and that is
02:00
why we say that there is a need for sustainable smart cities that utilize information and communication technologies. So in the European Green Deal, there are some targets in order to achieve the Sustainable
02:22
Development Goals. The European Commission's Green Deal sets ambitious goals for reducing greenhouse gas emissions and investing in environmentally friendly technologies. The European Green Deal sets a target of zero carbon emissions by 2050, and there is also a 2030
02:50
goal on the way to a carbon-neutral continent, a reduction of at least 50% by 2030, and
03:05
successfully implementing these goals requires some firm actions that utilize geospatial big data. Let's briefly talk about smart cities.
03:23
A smart city can be defined as a city that uses information technologies to improve its services and management and to solve its problems. We see that computers, mobile phones, even people,
03:40
humans, generate massive amounts of data, which increase day by day. We need to take care of this massive data and handle this big data in order to achieve these Green Deal goals and carbon-neutral goals. Looking at smart
04:04
cities, we have several different themes like smart mobility, smart people, smart living, smart economy, and smart government, and today I will be focusing on the smart environment, which is related to green buildings, green energy, and green urban planning.
04:25
So how do we handle geospatial big data? As we see, we have both structured and unstructured data from several big data sources. We need to handle those by using parallel processing, cluster processing, or cloud
04:44
computing, since they are too massive to store, process, analyze, and even visualize otherwise. We cannot use traditional methods and approaches to handle such data.
05:03
We have Apache Sedona and Dask-GeoPandas, two open source frameworks to process geospatial big data. They are big data analytics tools, and they use parallel
05:20
processing and spatial indexes. Parallel processing is a method that uses two or more processors in parallel to process computational tasks in partitions, so it splits the task into several partitions and assigns one partition to each processor core, which
05:44
greatly reduces the time it takes to process the data. In addition, multi-core processors provide better performance and lower power consumption while handling this data, and those frameworks also use spatial indexes to increase the speed of processing.
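The two ideas described here, splitting the work into partitions and using a spatial index to avoid checking every geometry, can be sketched in plain Python. This is only an illustrative sketch and not the code used in the talk: the grid index stands in for the spatial indexes the frameworks use, rectangles stand in for building polygons, a thread pool stands in for the worker processes of a cluster, and all names and values (`CELL`, `build_grid_index`, `join_partition`, `parallel_spatial_join`, the sample data) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

CELL = 10.0  # grid cell size in map units (assumed value for this sketch)


def build_grid_index(boxes, cell=CELL):
    """Map each grid cell to the ids of the boxes that overlap it."""
    index = defaultdict(list)
    for box_id, (xmin, ymin, xmax, ymax) in boxes.items():
        for cx in range(int(xmin // cell), int(xmax // cell) + 1):
            for cy in range(int(ymin // cell), int(ymax // cell) + 1):
                index[(cx, cy)].append(box_id)
    return index


def join_partition(points, boxes, index, cell=CELL):
    """Join one partition: point id -> id of the first box containing it."""
    matches = {}
    for point_id, (x, y) in points:
        # Only boxes registered in the point's grid cell are candidates.
        for box_id in index.get((int(x // cell), int(y // cell)), []):
            xmin, ymin, xmax, ymax = boxes[box_id]
            if xmin <= x <= xmax and ymin <= y <= ymax:
                matches[point_id] = box_id
                break
    return matches


def parallel_spatial_join(points, boxes, n_partitions=4):
    """Split the points into partitions, join each one, merge the results."""
    index = build_grid_index(boxes)
    partitions = [points[i::n_partitions] for i in range(n_partitions)]
    result = {}
    # Threads only illustrate the partition/assign/merge pattern here;
    # a real framework distributes partitions across processes or machines.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = pool.map(join_partition, partitions,
                            repeat(boxes), repeat(index))
        for partial in partials:
            result.update(partial)
    return result


# Hypothetical data: two building footprints and three certificate points.
boxes = {"A": (0, 0, 5, 5), "B": (20, 20, 30, 30)}
points = [("p1", (1, 1)), ("p2", (25, 22)), ("p3", (50, 50))]
joined = parallel_spatial_join(points, boxes, n_partitions=2)
print(joined)
```

Points that fall in no footprint, like `p3` here, are simply left out of the result, which corresponds to an inner join.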
06:08
Here is the general framework of Apache Sedona. It has several developer tools like Apache Zeppelin, Tableau, Jupyter, GeoPandas, and RStudio. In this study we have used
06:26
Jupyter Notebook as the developer tool. Sedona can work with both vector and raster data sources through its spatial query processing layers. We can conduct spatial k-nearest neighbors,
06:45
spatial join, NDVI, etc. It has computing engines, Apache Spark and Flink. In this study we have used Apache Spark for the performance test of Apache Sedona to handle
07:02
geospatial big data, and it can also work with different spatial data formats like GeoParquet, GeoJSON, shapefile, and GeoTIFF. It can also work in cloud environments:
07:20
Amazon Web Services, Microsoft Azure, Google Cloud, and Databricks. Let me also mention Dask-GeoPandas. It has a client, the user-facing entry point for cluster users, and it works with Jupyter Notebook and the Python programming language. It has a scheduler
07:43
which manages state and sends tasks to workers for execution, and the distributed cluster also has several workers which compute tasks and store and serve computed results to other workers or clients. Here is the geospatial big data administration
08:06
model framework for sustainable smart cities. Firstly, we have model design and development of the framework, and we have a data warehouse which stores the geospatial big data, and
08:22
we can develop smart city applications, for example, smart environment applications. In the analysis and analytics phase, we have geospatial big data frameworks to process the data. For example, we can spatially join the data and create some clusters, spatial
08:49
clusters, for instance with k-means clustering, and after the processing step, we can visualize the data, for example, in Datashader. This is the general framework which we have used
09:05
for developing sustainable smart city applications. I will continue with the study area. We have the United Kingdom, the island of Great Britain, and the countries England and Wales. There is energy
09:26
performance of buildings data for England and Wales. In these data sources, there are lots of attributes like property type, which describes the type of property such
09:41
as house, flat, or maisonette, and built form, such as detached, etc. It also contains environment impact, which is a measure of the property's current impact on the environment in terms of carbon dioxide emissions, and many other useful attributes. This is at the England
10:10
and Wales scale, and we have also used the Ordnance Survey OpenMap Local data source. We have extracted the buildings of the two countries, England and Wales. This is
10:23
the polygon data, and this is the point vector data. These are the two data sources we have used in our study, and here is the methodology. We have tested two different analyses, spatial join and spatial clustering, in both Apache Sedona and Dask-
10:48
GeoPandas. Firstly, we downloaded the data from the relevant sources, as I mentioned before. We used GeoParquet and GeoPackage data sources, and we also tested the
11:10
performance of those data file types. After obtaining the data from the data sources, we saved it in Parquet file format, which creates spatial partitions and provides faster
11:31
data manipulation. This is the second data source, EPC. We read it in Parquet file format, and we conducted the spatial join here by using Dask-GeoPandas. The reason
11:52
for doing this analysis was to create building-scale data analytics for the study area, to observe
12:07
which regions and which buildings have higher energy efficiency and which buildings have lower energy scores, so that we can manage this energy issue and carbon dioxide emissions,
12:27
and so that we can create some policies. For example, we can change the zoning plans, so
12:46
we can increase the green areas or reduce the residential areas. It lets us administrate the city as a whole in order to successfully apply the Sustainable Development Goals in
13:05
terms of sustainable smart cities. This is another spatial join. We read the data in Apache Sedona, in GeoParquet file format as well, and UPRN was the address identifier
13:21
of the buildings. We spatially joined the two datasets, and these are the Dask-GeoPandas and Apache Sedona spatial joins, which we conducted in the Jupyter Notebook Python environment. We also conducted k-means clustering in order to cluster the
13:46
buildings, to observe where the spatially clustered energy efficiency regions are, and this is the Apache Sedona code for k-means. We defined the
14:08
k parameter with the elbow method and found the number of clusters to be two. Here is the performance test of the parallel processing systems, both for the spatial join and the clustering.
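The transcript does not include the clustering code shown on the slide, so the following is only a minimal, framework-free sketch of choosing k with the elbow method. It assumes 2D point coordinates as features, a deterministic farthest-point initialisation, and a simple "largest second difference of the inertia" rule as the elbow criterion; the function names and the sample points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def init_centers(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm; returns (labels, centers, inertia)."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


def elbow_k(points, k_max=4):
    """Pick the k where the inertia curve bends most (needs k_max >= 3)."""
    inertia = {k: kmeans(points, k)[2] for k in range(1, k_max + 1)}
    return max(
        range(2, k_max),
        key=lambda k: inertia[k - 1] - 2 * inertia[k] + inertia[k + 1],
    )


# Hypothetical building coordinates forming two well-separated groups.
buildings = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11)]
best_k = elbow_k(buildings)
labels, centers, inertia = kmeans(buildings, best_k)
print(best_k, labels)
```

In practice the elbow is often read off a plot of inertia against k; the second-difference rule here is just one simple way to automate that reading.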
14:23
For the clustering, we faced problems, and we reduced the data in order to make it work in Apache Sedona, so it is seen that it has a shorter run time
14:46
compared with the spatial join. Both frameworks have similar performance, but Apache Sedona is a little bit faster in terms of run time
15:02
in the performance test. However, using parallel processing requires really big data. When you use small datasets, you will see that it makes no sense to use those parallel processing systems. The traditional methods work faster on relatively small datasets,
15:24
and we have visualized the building-scale energy efficiency analytics using Datashader. You can see that the yellow buildings have more energy efficiency, sorry, lower energy efficiency, and we can make more detailed solutions for those regions in order to increase
15:50
the sustainability of the cities in terms of smart energy, green energy. Looking at the results, the data-driven tracking of smart cities for climate action
16:02
is crucial. We can use the data in order to plan the cities, zoning plans, green areas, and buildings, and take realistic actions to achieve those Green Deal targets
16:25
all over the world. And the development of a holistic geospatial big data administration model gives us computational power to handle geospatial big data, and we have
16:40
compared the two geospatial big data analytics tools. We can use those open source geospatial big data frameworks to efficiently handle these massive data sources, and revealing the potential of geospatial big data analytics was the result of our
17:02
study. Thank you so much. If you have any questions, I will be happy to answer them. Thank you.