
Big size meteorological data processing and mobile displaying system using PostgreSQL and GeoServer


Formal Metadata

Title
Big size meteorological data processing and mobile displaying system using PostgreSQL and GeoServer
Title of Series
FOSS4G 2014 Portland
Number of Parts
188
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Language
English
Production Year
2014
Production Place
Portland, Oregon, United States of America

Content Metadata

Subject Area
Genre
Abstract
Gaia3D has developed a meteorological data mobile web service using PostgreSQL and GeoServer for weather forecasters in the Korea Meteorological Administration (KMA). The system displays weather charts, weather prediction information, weather images, and observation data on the mobile web, enabling rapid decisions by weather forecasters when they are out of the office or in remote environments. I will deliver a presentation about the experience of developing and launching a mobile web service showing weather charts by tuning daily-updated big spatial data at the database level. Weather charts generated by this system are displayed using OpenLayers Mobile after inserting large vector datasets into PostGIS and rendering them with GeoServer. The system processes 67 million rows of spatial data (approximately 35 GB) and generates more than five thousand weather charts every day. In the previous system, it took five hours to insert the data into PostGIS and tens of seconds to publish a single weather chart with GeoServer. There was also the problem that the file size in PostGIS grew without limit. Gaia3D decided to fix these problems and improve the system in terms of data input, data management, and data display. Consequently, the performance of data input increased about a hundred times and the performance of data display increased about two hundred times. Finally, KMA could manage the system successfully and stably without growth of the file size beyond three days of data. (The system shows data up to the previous 72 hours.)
Transcript: English (auto-generated)
Hi. Hello, everyone. My name is Jeon Gwon, and I work at Gaia3D, based in South Korea. I really appreciate that I have a chance to deliver a presentation at FOSS4G 2014. This project was mainly managed by PJ Jeon, a member of the OSGeo chapter in South Korea, and the system is currently in use at the Korea Meteorological Administration (KMA). The title is "Big size meteorological data processing and mobile displaying system using PostgreSQL and GeoServer". Let's go. Generally speaking, using PostgreSQL without tuning can lower the performance and quality of a service; tuning according to the situation and environment makes the service better. I will speak about the experience of our team successfully launching the weather chart service at KMA using several tuning techniques chosen for our situation. First, the background. The KMA
mobile weather chart service is processed like this: collected observation data is modeled and converted into grid data, which is raster data. After that, vector weather charts are generated through vectorization, and the weather charts for the service are generated through rendering to images. Looking at the service architecture, KML data is first inserted into PostGIS, and the data in PostGIS is linked to GeoServer for the service, which uses HTML5, OpenLayers Mobile, and jQuery Mobile. Weather data is mostly low-resolution imagery, with cells of around five kilometers by five kilometers. There are also concepts not found in general GIS data, such as isobaric surfaces, various analysis models, and observation times and dates. Furthermore, unlike general GIS data, data is generated quite frequently, from a few times to several hundred times per day. Lastly, weather data should always be up to date. This system uses six spatial tables to handle the weather data, which is generated four times per day. From this data, more than 5,000 weather charts are generated every day, amounting to about 35 gigabytes and 67 million rows. A gigantic amount of data is collected, generated, processed, and extracted every day; the number of spatial data rows exceeds the population of South Korea. Next, I will go over some of the problems we had to
overcome. Due to the surge of weather data, there were three problems. First, it took too much time to insert data. Second, the data was not properly managed, because a large amount of data accumulated every day. Lastly, it took too much time to search the data. In short, there were problems all around the database. These problems happened because of a lack of understanding of the characteristics and situation of weather data. At the beginning of system development, people simply thought of putting the data into PostGIS and serving it with GeoServer, without understanding the characteristics of the data. However, weather data is quite unique, so customization was required before developing the system. More specifically, it usually took five hours to insert all the data needed to generate the weather charts, and the size of the data grew by 35 gigabytes every day. Also, it took tens of seconds to display a single weather chart on a mobile device. Based on this situation, we set our own goals. We wanted to improve the system like this: inserting all the data in less than 30 minutes by using addBatch and executeBatch; keeping the size of the data fixed by using partitioning and TRUNCATE; and lastly, displaying a weather chart within a few seconds by
improving the indexes. First, the improvement of import speed for vectorized data. There was a big difference in speed depending on how the data was imported. It took more than 24 hours to import the 67 million rows one by one, but the time is shortened dramatically if the data is imported in batches. The graph on the right side shows the result of our test: inserting after gathering batches of 3,000 rows shortens the time by more than several hundred times. One weather chart KML file has about 3,000 rows of data, and we ran our import speed test on this data. Importing it one row at a time took 109 seconds. But as you can see here, using addBatch to gather rows before executing saved a huge amount of time: it took 8.9 seconds when calling executeBatch after every 100 addBatch calls, and only 1.1 seconds after 3,000 addBatch calls. This clearly shows that how you import makes a huge difference.
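The talk names JDBC's addBatch/executeBatch as the key to this speed-up. Below is a minimal sketch of that pattern; the table and column names, the connection string, and the row type are hypothetical, since the talk does not show the actual schema or code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.util.List;

// A minimal sketch of the addBatch/executeBatch technique from the talk.
// Table and column names are invented for illustration.
public class BatchInsert {

    record Row(Timestamp chartTime, int levelHpa, String wkt) {}

    public static void insertRows(Connection conn, List<Row> rows) throws Exception {
        conn.setAutoCommit(false); // commit once per batch, not once per row
        String sql = "INSERT INTO weather_chart (chart_time, level_hpa, geom) "
                   + "VALUES (?, ?, ST_GeomFromText(?, 4326))";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int n = 0;
            for (Row r : rows) {
                ps.setTimestamp(1, r.chartTime());
                ps.setInt(2, r.levelHpa());
                ps.setString(3, r.wkt());
                ps.addBatch();            // queue the row on the client side
                if (++n % 3000 == 0) {    // flush every ~3,000 rows, roughly
                    ps.executeBatch();    // the size of one weather chart file
                }
            }
            ps.executeBatch();            // flush the remainder
        }
        conn.commit();
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/kma", "user", "password")) {
            insertRows(conn, List.of()); // supply rows parsed from KML here
        }
    }
}
```

The batch size of 3,000 mirrors the test above, where one executeBatch per 3,000 addBatch calls cut 109 seconds down to 1.1 seconds.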
The second task is how to keep the data file size stable; originally it grew by 35 gigabytes per day. Let me go over how we put this much data into PostgreSQL. The problem with PostgreSQL, or its difference from other DBMSs, is that writes are append-only: when updating or deleting, the old data isn't removed from the database. Deleted rows are only marked as deleted rather than physically removed, so execution is fast and filtering is still possible. However, this makes the size of the database grow extremely large, sometimes even bringing the system down as performance degrades. Our weather data increased by 35 gigabytes per day, so we had to
solve this problem. Let me go into more detail. On Oracle, if a row B is updated to B′, the old B goes into the snapshot area and disappears after the transaction completes. On PostgreSQL, however, the record from before the update is kept and the new record is added after the update, so the data grows with every transaction. Because of this feature, PostgreSQL provides a function called VACUUM. Like a real vacuum cleaner, this function plays the role of cleaning up dead data, and it can run automatically. As you can see in this slide, B and G are updated to B′ and G′, and E is deleted. In PostgreSQL, the old B, G, and E rows remain where they originally were, but when VACUUM is executed, the space held by B, G, and E is returned to the free space map (FSM). However, when the data is too big, the data file can keep growing even though VACUUM is executed. Unlike a plain VACUUM, the function called VACUUM FULL can decrease the file size: as in this image, it reclaims unused space and packs the remaining data into the empty spaces. However, applying VACUUM FULL to three days of KMA data took 50 hours, and while VACUUM FULL runs, the table is locked and nothing else can be done. So VACUUM FULL was not a good option.
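Both maintenance commands can be issued over a plain JDBC connection. A small sketch follows, with a hypothetical table name; note that PostgreSQL refuses to run VACUUM inside a transaction block, hence the autocommit setting.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the two maintenance commands discussed above. VACUUM reclaims
// dead rows into the free space map without shrinking the file; VACUUM FULL
// rewrites the table and shrinks the file, but holds an exclusive lock for
// the whole run (50 hours on three days of KMA data, per the talk).
public class VacuumDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/kma", "user", "password");
             Statement st = conn.createStatement()) {
            conn.setAutoCommit(true); // VACUUM cannot run inside a transaction
            st.execute("VACUUM weather_chart_0");      // cheap, non-blocking
            // st.execute("VACUUM FULL weather_chart_0"); // shrinks the file,
            // but locks the table for the duration -- avoided in this system
        }
    }
}
```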
So we tried partitioning instead. Partitioning is a function provided by the database, used when the data size is large. To use partitioning, we divided the weather charts into seven tables by day of the week, from Monday to Sunday. We defined, per day, which table to insert into and which table to delete from. For example, on Sunday, data is inserted into table 0 and the data in table 4 is deleted. Across the seven tables, data is kept in three tables at a time, and the same work is repeated every day. To delete the old data, we use TRUNCATE instead of VACUUM; it takes less than one second, because it simply throws the data blocks away. As a result, we could easily and quickly keep only the most recent three days of data.
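A sketch of what this daily rotation could look like in code, assuming the seven tables are named weather_chart_0 through weather_chart_6 and table 0 is Sunday's, as in the example above; the real naming and scheduling are not shown in the talk.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.time.LocalDate;

// Sketch of the day-of-week rotation described above: today's data goes into
// one table and the table holding the oldest data is truncated, so only about
// three days of data remain. TRUNCATE drops the data blocks outright, so it
// finishes in under a second regardless of table size.
public class DailyRotation {
    public static void rotate(Connection conn) throws Exception {
        int today = LocalDate.now().getDayOfWeek().getValue() % 7; // Sunday = 0
        int expire = (today + 4) % 7; // e.g. Sunday: insert into 0, empty 4
        try (Statement st = conn.createStatement()) {
            st.execute("TRUNCATE TABLE weather_chart_" + expire);
        }
        // New rows for today are then inserted into weather_chart_<today>.
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/kma", "user", "password")) {
            rotate(conn);
        }
    }
}
```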
Lastly, the improvement of search speed by resetting the indexes. The flow for improving search speed is: first, check the data condition through analysis; next, find the actual queries and check the query plan analysis; and finally, improve the indexes. For the data condition analysis, it is important to grasp the number of rows in each table. Using a plain SELECT COUNT(*) takes too much time; however, using the statistics table pg_class, the row count can be estimated quickly. If you execute the query shown at the top right, the result appears, as shown at the bottom right, within a few seconds.
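The exact query is only shown on the slide, but the standard trick is to read the planner statistics in pg_class instead of scanning the table. A sketch, with a hypothetical table name:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Fast row-count estimate: instead of SELECT COUNT(*), read the planner
// statistics kept in pg_class.reltuples. The value is an estimate refreshed
// by VACUUM/ANALYZE, but it returns instantly even on tables with tens of
// millions of rows.
public class RowEstimate {
    public static long estimate(Connection conn, String table) throws Exception {
        String sql = "SELECT reltuples::bigint FROM pg_class WHERE relname = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, table);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : -1L;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/kma", "user", "password")) {
            System.out.println(estimate(conn, "weather_chart_0"));
        }
    }
}
```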
Next, I will move to another topic, a little bit away from the database. We used SQL Views, a GeoServer feature that works with PostGIS, to manage layers easily rather than just publishing plain tables. A SQL View lets you register a specific SQL query when a database data source is created, and it can be used with spatial databases such as Oracle Spatial, PostGIS, and ArcSDE. Our system consists of six tables, so the queries are necessarily complex, but they can be managed easily by creating a SQL View. In addition, in the process of making a SQL View, work that would otherwise be done in GeoServer or on the client can be pushed into the query itself. The query at the top right can be managed as a layer having the same properties as the result at the bottom right. The next page shows how this query is used.
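The actual view SQL is only visible on the slide, so the snippet below is a hypothetical illustration of the kind of query a GeoServer SQL View can wrap. GeoServer substitutes %...% view parameters per request (optionally validated with a regexp), so one registered query can serve many charts from several underlying tables.

```java
// Hypothetical sketch of a query that could be registered as a GeoServer
// SQL View over the per-day tables; the real view from the talk is not
// shown. %chart_time% is a GeoServer view parameter filled in per request.
public class SqlViewSketch {
    static final String VIEW_SQL =
        "SELECT chart_time, level_hpa, geom FROM weather_chart_0 " +
        "WHERE chart_time = '%chart_time%' " +
        "UNION ALL " +
        "SELECT chart_time, level_hpa, geom FROM weather_chart_1 " +
        "WHERE chart_time = '%chart_time%'"; // ...and so on for the rest

    public static void main(String[] args) {
        System.out.println(VIEW_SQL);
    }
}
```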
To improve query speed, the SQL statements that are actually executed have to be found. While doing this, we found the SQL statements being executed internally by using the statistics view pg_stat_activity, and worked out how much time each one took. The top right shows the result of executing the query at the bottom left: the currently executing queries are shown, along with when each started, how much time it has taken, and which queries are slow. The query at the bottom right is a little complicated; it is a sample of the query executed internally for a layer registered as a SQL View. You can see that the SQL View registered on the previous page appears as a long inner query over the table. On Oracle, the analysis functions belong to commercial tools, and the basic client doesn't have them. PostgreSQL, however, provides the EXPLAIN ANALYZE command in its basic installation, and pgAdmin III, PostgreSQL's UI tool, has the same function. After opening the SQL window, if you click the explain analyze button on the query tab, you can graphically see what is going on in the query, and you can also see the result of executing the EXPLAIN ANALYZE command. Using this result, the query can be analyzed.
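A sketch of that monitoring step, assuming PostgreSQL 9.2+ column names in pg_stat_activity (older releases used procpid and current_query); the connection details are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Lists the statements currently running, when each started, and how long it
// has been running -- enough to spot the slow internally generated SQL that a
// GeoServer SQL View layer actually sends to the database.
public class SlowQueryMonitor {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT pid, now() - query_start AS runtime, query "
                   + "FROM pg_stat_activity "
                   + "WHERE state = 'active' ORDER BY runtime DESC";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/kma", "user", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%d %s %s%n",
                        rs.getInt("pid"), rs.getString("runtime"),
                        rs.getString("query"));
            }
        }
        // A candidate slow query can then be fed to EXPLAIN ANALYZE
        // (command line or pgAdmin III) to see its actual plan and timings.
    }
}
```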
Based on the previous analysis results, we then had to decide how to set up the indexes. First of all, an index is created over the columns used in the WHERE clause. The spatial column uses a different index type, so it has to be set up separately. Next, the columns with the most distinct values should come first; that way, most rows are filtered out early, which is more effective. Also, columns compared with an equality operator (=) should come before columns filtered with operators such as <, >, and LIKE, which is more efficient. Lastly, unnecessary indexes should be removed, because they hurt insert performance. As a result of the index corrections, data capacity decreased by 20% and search speed
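A hypothetical illustration of these rules, with invented column names: the equality-filtered, high-cardinality columns lead the B-tree index, and the geometry column gets its own GiST index, since spatial columns cannot share a plain B-tree index.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the index rules above, with hypothetical names. Column order in
// the B-tree follows the rules: equality-compared, high-cardinality columns
// first; the spatial column gets a separate GiST index.
public class IndexSetup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/kma", "user", "password");
             Statement st = conn.createStatement()) {
            st.execute("CREATE INDEX idx_chart_lookup "
                     + "ON weather_chart_0 (chart_time, model, level_hpa)");
            st.execute("CREATE INDEX idx_chart_geom "
                     + "ON weather_chart_0 USING GIST (geom)");
            // DROP INDEX any index the query plans never use: each extra
            // index slows every insert.
        }
    }
}
```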
increased by 6 to 25 times; it seems that the bigger the table, the bigger the improvement. Lastly, the improvement results. This demo shows the 300 hPa isobaric chart with temperature and isotachs, the ground-level temperature chart, and the 800 hPa isobaric chart with mixing ratio and temperature. An isobaric chart shows a surface of equal pressure in the upper air. Let's go to the real site in Korea. This is the KMA mobile weather chart site. It consists of three menus. The first is the meteorological data type; we select GDAPS. Then we can select the isobaric surface; I select 925 hPa. Another data type list appears under that category; I select temperature and humidity. The weather chart is served within a few seconds. As you can see, the weather chart is displayed on the site.
As I said earlier, we obtained results by tuning according to three scenarios. We found that inserting data in a way appropriate to the system's conditions is very important, and we improved insert performance about 100 times. Second, we keep the file size fixed, regardless of the accumulating data, by combining partitioning and TRUNCATE. Lastly, we achieved about a 20-times improvement in query time by composing proper indexes for the queries. In conclusion, PostgreSQL is a really great DBMS, and its performance is in no way lower than other DBMSs. For us, it fit perfectly with GeoServer, so availability is high. However, better performance comes only after tuning with a thorough understanding of its features. Thank you for listening to my presentation. My manager is over there; he knows everything in this presentation.
Thank you.