Geospatial big data analytics for sustainable smart cities

Formal Metadata

Title
Geospatial big data analytics for sustainable smart cities
Title of Series
Number of Parts
266
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
This presentation focuses on the role of geospatial big data analytics tools in advancing sustainable smart cities. It emphasizes that achieving United Nations Sustainable Development Goals (SDGs) related to sustainable cities, clean energy, industry, innovation, and climate action can be facilitated by effectively implementing the smart city concept, which relies on location-based data and technologies like big data, Geographic Information Systems (GIS), cloud computing, and the Internet of Things (IoT). The research delves into the practical application of these tools, particularly Dask-GeoPandas and Apache Sedona, for handling geospatial big data in the context of smart cities. Performance comparisons reveal that these cluster computing systems outperform traditional methods, providing faster and more efficient data handling, which is essential for urban management in smart cities. The talk also highlights the advantages of the GeoParquet data format, which is faster and more compact than other formats like GPKG. Additionally, the presentation emphasizes the significance of open data sources, such as Energy Performance Certificates (EPC) data and mapping data, in analyzing the energy efficiency of domestic buildings, aligning with net zero carbon emission goals. By leveraging geospatial big data analytics tools, cities can effectively manage urban infrastructure and buildings while advancing sustainability and energy efficiency objectives. This study underscores the potential of these tools for smart infrastructure and buildings and suggests that future research could explore larger spatial datasets and cloud-native platforms to further test their capabilities.
Transcript: English (auto-generated)
I'm currently working as an assistant professor at Istanbul Technical University, Department of Geomatics Engineering, Istanbul. Here is the table of contents of my presentation.
I will give brief information about sustainability, the Sustainable Development Goals, the European Green Deal, and net zero targets, and I will continue with smart city definitions and components. Then I will connect those topics with geospatial big data and how to handle geospatial
big data. We have solutions like parallel and cluster processing and frameworks like Dask-GeoPandas and Apache Sedona. I will briefly talk about data file formats like
GeoParquet, present benchmark tests of the geospatial big data frameworks, and give the case study, which is about sustainable smart energy management in order
to achieve the United Kingdom's net zero carbon emission targets. So let's start. Firstly, I will talk about the problem definition. As you see, increasing urbanization across the world makes cities more crowded
and complex. This situation brings along many social, environmental, and economic problems such as housing, traffic, density, and air pollution. It is necessary to overcome these problems and manage cities effectively, and that is
why we say that there is a need for sustainable smart cities that utilize information and communication technologies. So in the European Green Deal, there are some targets in order to achieve sustainable
development goals. The European Commission's Green Deal sets ambitious goals for reducing greenhouse gas emissions and investing in environmentally friendly technologies. The European Green Deal targets net zero carbon emissions by 2050, and there is also a 2030
goal on the way to a carbon-neutral continent, which is a reduction of at least 50% by 2030.
Successfully implementing these goals requires firm actions that utilize geospatial big data. Let's briefly talk about smart cities.
A smart city can be defined as a city that uses information technologies to improve its services and management and to solve its problems. We see that computers, mobile phones, and even people
themselves generate massive amounts of data, which increase day by day. We need to take care of this massive data and handle this big data in order to achieve these Green Deal and carbon-neutral goals. Looking at smart
cities, we have several different themes like smart mobility, smart people, smart living, smart economy, and smart government, and today I will be focusing on the smart environment, which is related to green buildings, green energy, and green urban planning.
So how to handle geospatial big data? As we see, we have both structured and unstructured data from several big data sources. We need to handle those by using parallel processing, cluster processing, or cloud
computing, since they are too massive to store, process, analyze, and even visualize otherwise. We cannot use traditional methods and approaches to handle those data.
We have Apache Sedona and Dask-GeoPandas, two open source frameworks to process geospatial big data. They are big data analytics tools, and they use parallel
processing and spatial indexes. Parallel processing is a method that uses two or more processors in parallel to process computational tasks in partitions, so it splits the task into several partitions and assigns one partition to each processor core, which
greatly reduces the time it takes to process the data. In addition, multi-core processors provide better performance and lower power consumption while handling this data, and those frameworks also use spatial indexes to increase the speed of processing.
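As a rough illustration of the two ideas just described, partitioned parallel processing and spatial indexing, here is a minimal pure-Python sketch. It is not the Dask-GeoPandas or Apache Sedona API. The function names, the uniform grid index, the cell size, and the synthetic points are all assumptions made for illustration, and a thread pool stands in for the processor cores (a process pool or a cluster would be needed for real multi-core speedup in Python).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_grid_index(points, cell_size):
    """Map each grid cell (ix, iy) to the indices of the points that fall in it."""
    index = defaultdict(list)
    for i, (x, y) in enumerate(points):
        index[(int(x // cell_size), int(y // cell_size))].append(i)
    return index


def query_bbox(points, index, cell_size, bbox):
    """Return indices of points inside bbox, visiting only the overlapping cells."""
    xmin, ymin, xmax, ymax = bbox
    hits = []
    for ix in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
        for iy in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
            for i in index.get((ix, iy), ()):
                x, y = points[i]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(i)
    return sorted(hits)


def split_into_partitions(items, n_partitions):
    """Split a list into roughly equal contiguous partitions."""
    size = max(1, -(-len(items) // n_partitions))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_in_bbox(partition, bbox, cell_size=10.0):
    """Work done by one worker: index its own partition, then query it."""
    index = build_grid_index(partition, cell_size)
    return len(query_bbox(partition, index, cell_size, bbox))


def parallel_count(points, bbox, n_partitions=4):
    """Split the task, give one partition to each worker, and combine the results."""
    partitions = split_into_partitions(points, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = pool.map(lambda part: count_in_bbox(part, bbox), partitions)
        return sum(partial_counts)


# Synthetic data (hypothetical): a 100 x 100 lattice of points.
points = [(float(x), float(y)) for x in range(100) for y in range(100)]
total = parallel_count(points, bbox=(10.0, 10.0, 19.0, 19.0))
print(total)  # 100
```

The result does not depend on the number of partitions, which is the property that lets a framework spread the same query over as many workers as are available.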
Here is the general framework of Apache Sedona. It has several developer tools like Apache Zeppelin, Tableau, Jupyter, GeoPandas, and RStudio. In this study we have used
Jupyter Notebook as the developer tool. Sedona can work with both vector and raster data sources through its spatial query processing layers. We can conduct spatial k-nearest neighbors,
spatial join, NDVI, etc. It has computing engines, Apache Spark and Flink. In this study we have used Apache Spark for the performance test of Apache Sedona to handle
geospatial big data, and it can also work with different spatial data formats like GeoParquet, GeoJSON, shapefile, and GeoTIFF as well. It can also work in a cloud environment,
such as Amazon Web Services, Microsoft Azure, Google Cloud, and Databricks. Let me also mention Dask-GeoPandas. It has a client, the user-facing entry point for cluster users, and it works in the Jupyter Notebook environment with the Python programming language. It has a scheduler
which manages state and sends tasks to workers for execution, and the distributed cluster also has several workers, which compute tasks and store and serve computed results to other workers or clients. Here is the geospatial big data administration
model framework for sustainable smart cities. Firstly, we have model design and development of the framework, and we have a data warehouse which stores the geospatial big data, and
we can develop smart city applications, for example, smart environment applications. In the analysis and analytics phase, we have geospatial big data frameworks to process the data. For example, we can spatially join the data and create spatial
clusters, for example with k-means clustering, and after the processing step, we can visualize the data, for example, in Datashader. This is the general framework which we have used
for developing sustainable smart city applications. I will continue with the study area. The study area is in the United Kingdom, on the island of Great Britain: the countries of England and Wales. There is energy
performance of buildings data for England and Wales. In this data source, there are lots of attributes like property type, which describes the type of property, such
as house, flat, or maisonette, and built form, such as detached, etc. It also contains environmental impact, which is a measure of the property's current impact on the environment in terms of carbon dioxide emissions, and many other useful attributes. This is at the England
and Wales scale, and we have also used the Ordnance Survey OpenMap Local data source. We have extracted the buildings of the whole two countries, England and Wales. This is
the polygon data, and this is the point vector data. These are the two data sources we have used in our study, and here is the methodology. We have tested two different analyses, spatial join and spatial clustering, in both Apache Sedona and
Dask-GeoPandas. Firstly, we have downloaded the data from the relevant sources, as I mentioned before. We have used GeoParquet and GeoPackage data formats, and we have also tested the
performance of those data file types. After having the data from the data sources, we have saved it in Parquet file format, which creates spatial partitions and provides faster
data manipulation. This is the second data source, EPC. We have read it in Parquet file format, and we have conducted the spatial join here by using Dask-GeoPandas. The reason
for doing this analysis was to create building-scale data analytics for the study area to observe
which regions and which buildings have higher energy efficiency and which buildings have lower energy scores, so that we can manage this energy issue and carbon dioxide emissions,
and so that we can create policies. For example, we can change the zoning plans so
we can increase the green areas or reduce the residential areas. It allows us to administer the city as a whole in order to successfully apply the sustainable development goals in
terms of sustainable smart cities. This is another spatial join. We have read the data in Apache Sedona, in GeoParquet file format as well, and the UPRN (Unique Property Reference Number) identified the address
of each building. We have spatially joined the two datasets, and these are the Dask-GeoPandas and Apache Sedona spatial joins, which we have conducted in the Jupyter Notebook Python environment. We have also conducted k-means clustering in order to cluster the
buildings and observe where the spatially clustered energy efficiency regions are, and this is the Apache Sedona code for k-means. We have defined the
k parameter with the elbow method and found the number of clusters to be two. Here are the performance tests of the parallel processing systems, both for the spatial join and the clustering.
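To make the clustering step more concrete, the following is a small pure-Python sketch of k-means with an elbow-style choice of k. It is not the Apache Sedona code shown in the talk. The toy coordinates are hypothetical, the deterministic farthest-point initialisation is an assumption chosen for reproducibility, and picking the k with the largest second difference of the inertia curve is only a simple numeric stand-in for reading the elbow off a plot.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption, used so runs are repeatable)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def elbow_k(ks, inertias):
    """Pick the k where the inertia curve bends most (largest second difference)."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical building coordinates forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
ks = [1, 2, 3, 4, 5]
inertias = [kmeans(points, k)[2] for k in ks]
best_k = elbow_k(ks, inertias)
labels, centroids, _ = kmeans(points, best_k)
print(best_k, labels)  # 2 [0, 0, 0, 0, 1, 1, 1, 1]
```

On real building data the inertia curve is usually smoother than in this toy case, so the elbow is often confirmed visually rather than chosen by a single numeric rule.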
For the clustering, we have faced problems, and we reduced the data in order to make it work in Apache Sedona, so it is seen that it has a shorter run time
compared with the spatial join. Both frameworks have similar performance, but Apache Sedona is a little bit faster in terms of run time
in the performance test. However, using parallel processing requires really big data. When you use small datasets, you will see that it makes no sense to use those parallel processing systems. The traditional methods work faster on relatively small datasets,
and we have visualized the building-scale energy efficiency analytics using Datashader. You can see that the yellow buildings have the lower energy efficiency, and we can make more detailed solutions for those regions in order to increase
the sustainability of the cities in terms of smart energy and green energy. Looking at the results, the data-driven tracking of smart cities for climate action
is crucial. We can use the data in order to plan the cities, the zoning plans, green areas, and buildings, and take realistic actions to achieve those Green Deal targets
all over the world. The development of a holistic geospatial big data administration model gives us the computational power to handle geospatial big data, and we have
compared the two geospatial big data analytics tools. We can use those open source geospatial big data frameworks in order to efficiently handle these massive data sources, and revealing the potential of geospatial big data analytics was the result of our
study. Thank you so much. If you have any questions, I will be happy to answer them. Thank you.