Big size meteorological data processing and mobile displaying system using PostgreSQL and GeoServer
Formal Metadata

Title: Big size meteorological data processing and mobile displaying system using PostgreSQL and GeoServer
Title of Series: FOSS4G 2014 Portland
Number of Parts: 188 (this talk: 172)
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/31597 (DOI)
Production Year: 2014
Production Place: Portland, Oregon, United States of America
Transcript: English (auto-generated)
00:04
Hi. Hello, everyone. My name is Jeon Gwon, and I work at Gaia3D, based in South Korea. I really appreciate that I have the chance to deliver a presentation at FOSS4G
00:22
2014. This project was mainly managed by PJ Jeon, one of the OSGeo chapter members in South Korea, and it is currently in use at the Korea Meteorological Administration (KMA). The title is "Big size meteorological data processing and mobile displaying system using PostGIS
00:44
and GeoServer". Let's go. Generally speaking, using PostGIS without tuning can lower the performance and quality of service. Tuning according to the
01:01
situation or environment makes the service better. I will speak about how our team successfully launched the weather chart service at KMA by using several tuning techniques suited to the situation. First, the background. The KMA
01:25
mobile weather chart service is processed like this. Oh, sorry. Collected observation data is modeled and converted into grid data, which is raster data. After that, vector weather charts are generated
01:45
through vectorization, and weather charts for the service are generated as images. If you look at the service architecture here, KML data is first
02:03
inserted into PostGIS, and the data in PostGIS is linked to GeoServer for the service, which uses HTML5, OpenLayers, and jQuery Mobile. Weather data is mostly
02:22
low-resolution imagery, such as five kilometers by five kilometers per cell. Also, there are somewhat different concepts, such as isobaric surfaces, various kinds of analysis models, and observation time and date. Furthermore, unlike
02:44
general GIS data, the frequency of data generation is quite high, from a few times to several hundred times per day. Lastly, weather data should always be up to date. This system uses six spatial tables to handle the weather data
03:07
generated four times per day. From this data, more than 5,000 weather charts are generated, reaching about 35 gigabytes and 67 million rows. A gigantic amount
03:27
of data is collected, generated, processed, and extracted every day. The number of spatial data rows is larger than the population of South Korea. That is a lot. I will go over some problems that we had to
03:48
overcome. Due to the surge of weather data, there were three problems. Firstly, it took too much time to collect data. Secondly, data was not
04:03
properly managed, because lots of data accumulated every day. Lastly, it took too much time to search the data. In all, these are problems all around the database. The reason these problems happened was a lack of understanding of the characteristics
04:25
and situation of weather data. At the beginning of system development, people simply thought of putting the data into PostGIS and serving it with GeoServer without understanding the characteristics of the data. However, weather
04:45
data is quite unique, so customizing was required before development of the system. More specifically, it usually took
05:01
five hours to insert all the data needed to generate the weather charts, and the size of the data grew by 35 gigabytes every day. Also, it took more than tens of seconds to search for a single weather chart on mobile devices. Based on this situation, we set our own goals. We wanted to improve the system like this:
05:24
inserting all data in less than 30 minutes using addBatch and executeBatch; keeping the size of the data fixed using partitioning and TRUNCATE; and lastly, retrieving a weather chart within a few seconds through
05:41
improvement of the indexes. First, improvement of the import speed for the vectorized data. There was a big difference in speed according to how the data was imported. It took
06:02
more than 24 hours to import the 67 million rows one by one. However, the time shortens dramatically if the data is imported in batches. The graph on the right side shows the result of the test, and it shows a speedup of more than several hundred times
06:24
when inserting after gathering batches of 3,000 rows. One weather chart KML file has about 3,000 rows of data, and we
06:42
ran our import speed test using this data. When importing one by one, it took 109 seconds. But as you can see here, using addBatch to gather rows before executing saved a huge amount of time. It took 8.9 seconds to execute the
07:01
batch after every 100 addBatch calls, and only 1.1 seconds after every 3,000 addBatch calls. This clearly shows that how you import makes a huge difference. The second task is about how to keep the
07:23
data file size stable, which originally grew by 35 gigabytes per day. I will go over
07:41
how to store a large amount of data in PostgreSQL. The difference between PostgreSQL and other DBMSs is that PostgreSQL does not overwrite rows in place. When updating or deleting, the old data is not removed from the database; rows are only marked as deleted, so execution is fast and filtering is
08:04
possible. However, this makes the size of the database extremely large, sometimes bringing the system down due to degraded performance. Also, the weather data increased by 35 gigabytes per day, and we had to
08:22
solve this problem. I will go over this in more detail. On Oracle, if a row B is updated to B′, the old B goes into a snapshot and disappears after the transaction completes. However, on
08:41
PostgreSQL, the old record is kept and a new record is added when a row is renewed. This makes the data grow with every transaction. Owing to this feature, PostgreSQL provides a function called
09:01
VACUUM. Like a real vacuum cleaner, this function plays the role of cleaning up dead data, and it executes automatically. As you can see here in this slide, B and G are renewed to B′ and G′, and E is deleted. In PostgreSQL, the old B, G, and E rows remain where
09:24
they originally were, but if VACUUM is executed, the B, G, and E space is recorded in the free space map (FSM) for reuse. However, when the data is too big, the data file can keep growing
09:41
even though VACUUM is executed. Unlike a plain VACUUM, the function VACUUM FULL can decrease the file size. As in this image, it reclaims the useless space and
10:00
pushes the data back into the empty spaces. As a result of applying this function to three days of KMA data, it took 50 hours. Furthermore, during a VACUUM FULL the database is locked and can't do anything, so VACUUM FULL is not a good option. So we tried
10:25
to do partitioning. Partitioning is a function provided by the database, used when the data size is huge. To use partitioning, we divided the weather chart data into seven tables by day of the week, from Monday to Sunday.
10:44
We defined which table to insert into for each day, and also which table to delete for each day. For example, on Sunday, data is inserted into table 0 and the data in table 4 is deleted.
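The rotation just described can be sketched as follows. This is a minimal illustration, not KMA's actual code: the table name pattern and the four-day offset between the insert table and the truncated table are assumptions taken from the Sunday example in the talk (insert into table 0, delete table 4).

```python
# Sketch of the seven-table rotation: one table per weekday, insert into
# "today's" table, and empty an older one with TRUNCATE. TRUNCATE discards
# the table contents in one step, so it is far faster than DELETE and leaves
# no dead rows behind for VACUUM. The table names and the 4-day offset are
# assumptions based on the talk's Sunday example, not the real KMA schema.

DELETE_OFFSET = 4  # on day 0 (Sunday), table 4 is emptied

def rotation_for(day: int):
    """Return (insert_table, truncate_table) for a weekday index 0-6."""
    insert_table = "weather_chart_%d" % (day % 7)
    truncate_table = "weather_chart_%d" % ((day + DELETE_OFFSET) % 7)
    return insert_table, truncate_table

def truncate_sql(table: str) -> str:
    # Runs in under a second even on a multi-gigabyte table.
    return "TRUNCATE TABLE %s;" % table

insert_t, old_t = rotation_for(0)  # Sunday
print(insert_t)                # weather_chart_0
print(truncate_sql(old_t))     # TRUNCATE TABLE weather_chart_4;
```

With a scheme like this, the database never holds much more than a week of charts, so the 35 GB/day growth is bounded without ever needing VACUUM FULL.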
11:07
Across the seven tables, data is inserted into each day's table, and the same work is repeated every day. Here, in order to delete data, it takes less than one second to use TRUNCATE
11:20
instead of VACUUM; it simply blows the data away. As a result, we could easily and quickly keep the data within M-1 days. Lastly, the improvement of query speed by resetting the indexes.
11:45
The flow of query-speed improvement is: first, check the data conditions; next, find the queries and check the query plans; and lastly, improve the indexes. Sorry. It's important
12:14
to figure out the number of rows of each table for the data condition analysis. It generally takes too much time to use SELECT
12:23
count(*). However, using the statistics catalog pg_class, the row count can be estimated quickly. If you execute the query shown at the top right, the result appears, like the bottom right, within a few seconds. Next, I
12:47
move to another subject, a little away from the database. I use SQL views in conjunction with PostGIS and GeoServer for easily managing layers, not just using a plain table as a layer. GeoServer can
13:01
manage a specific SQL query when a database data source is registered. This can be used with spatial databases like Oracle Spatial, PostGIS, and ArcSDE. This system consists of six tables, so the queries are forced to be complex, but they can be managed easily by creating a SQL view. In addition, in the process of
13:23
making a SQL view, other work can be performed in the query instead of in GeoServer or the client. The query at the
13:41
top right can be managed as a layer having the same properties as the result at the bottom right. On the next page, I show how to use the query on the right side.
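As a rough illustration of this idea, the query joining the six tables can be hidden behind a view (or registered directly as a GeoServer SQL view). Every table and column name below is an invented placeholder, not the actual KMA schema:

```python
# Illustration only: consolidate the spatial tables behind one view so
# GeoServer can publish it as a single layer. All identifiers here are
# invented placeholders, not the real KMA schema.

CREATE_VIEW_SQL = """
CREATE OR REPLACE VIEW weather_chart_view AS
SELECT m.chart_id, m.model_name, m.isobaric_level, m.observed_at, g.geom
FROM chart_meta AS m
JOIN chart_geom AS g USING (chart_id);
"""

def layer_query(model: str, level: int) -> str:
    """Build the per-layer SELECT; in GeoServer this would be a SQL view
    with placeholders that the server substitutes per request."""
    return ("SELECT * FROM weather_chart_view "
            "WHERE model_name = '%s' AND isobaric_level = %d;" % (model, level))

print(layer_query("GDAPS", 925))
```

The point of the design is that the six-table join lives in exactly one place; GeoServer and the mobile client only ever see one simple layer.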
14:00
The actually executed SQL statements should be found in order to improve query speed. While doing this, we found the actually executed SQL statements using the statistics table pg_stat_activity and figured out how much time they took. The top right
14:20
is the result of executing the query at the bottom left. The currently executing queries are shown: when each started, how much time it has taken, and which queries are slow. The query at the bottom right is a little complicated; it is a sample of an internally executed query registered as a SQL view. You can see that the
14:44
SQL view registered on the previous page shows up as a long query on the table. Oracle's plan-analysis functions come with its professional tools, and the basic client doesn't have them. However,
15:04
PostgreSQL provides a command-line analysis function in its basic installation, and pgAdmin III, which is a UI tool for PostgreSQL, also has that function. After opening the SQL window, if you click the
15:20
Explain analyze button on the query tab, you can graphically see what's going on in the query, and you can also see the result of executing the EXPLAIN ANALYZE command. Using this result, the query can be analyzed.
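The two quick checks described above can be sketched as simple query builders. Reading reltuples from the pg_class catalog returns the planner's row-count estimate in milliseconds, where SELECT count(*) would scan the whole 67-million-row table; EXPLAIN ANALYZE is what pgAdmin III runs behind its button. The table name below is a placeholder:

```python
# Two diagnostics from the talk, written as query builders.

def estimated_rows_sql(table: str) -> str:
    # reltuples is the planner's row-count estimate, refreshed by
    # VACUUM / ANALYZE; reading it avoids a full scan of the table.
    return ("SELECT reltuples::bigint AS estimated_rows "
            "FROM pg_class WHERE relname = '%s';" % table)

def explain_analyze(query: str) -> str:
    # EXPLAIN ANALYZE actually runs the query and reports per-node timings.
    return "EXPLAIN ANALYZE " + query.strip().rstrip(";") + ";"

print(estimated_rows_sql("weather_chart_0"))
print(explain_analyze("SELECT * FROM weather_chart_0 WHERE isobaric_level = 925"))
```

Note that EXPLAIN ANALYZE really executes the statement, so it should be run on SELECTs, or inside a rolled-back transaction for writes.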
15:42
We should consider how to set the indexes, based on the previous analysis results. First of all, an index is set with all the columns used in the WHERE clause. Here, the spatial column has its own index type, so it should be set
16:01
separately. Next, the column with the most distinct values should come first; like this, it is more effective to filter out rows early. Also, columns compared with an equality operator such as = should come first, which is more
16:21
efficient than filtering with operators such as <, >, and LIKE. Lastly, unnecessary indexes should be removed because of their bad effect on insert performance. As a result, data capacity decreased by 20% due to the index correction, and query speed
16:41
increased by 6 to 25 times. It seems that the bigger the table size, the bigger the improvement. Lastly, the improvement results. This demo shows the 300 hPa isobaric chart with temperature and isotherms, then
17:03
a surface chart with temperature, and lastly the 800 hPa isobaric chart with mixing ratio and temperature. Isobaric means the same pressure on an upper-air cross-section. Let us
17:29
go to the real site in Korea. This is
17:41
KMA's mobile weather chart site. It consists of three tabs. The first is the meteorological data type. We selected GDAPS. We can select an isobaric level.
18:01
I selected the 925 hPa isobaric level. Another data type appears under the category. I select temperature and humidity.
18:23
This weather chart is served in a few seconds. We can see that
18:42
this weather chart is shown on this site.
19:02
As I previously said, we obtained results by tuning according to the three scenarios. We found out that conditions appropriate to the system are very important for inserting data, and we improved insert performance by 100 times. Secondly, we keep the file size steady at M-1 days
19:24
of data by combining partitioning and TRUNCATE. Lastly, we got about a 20-times query time improvement through composing proper indexes for the queries. In conclusion, PostgreSQL is
19:44
a really great DBMS, and its performance is never lower than other DBMSs'. For us, it fits perfectly with GeoServer, so availability is high. However, better performance will be gained after tuning with a full
20:00
understanding of its features. Thank you for listening to my presentation. My manager is there. He knows everything in this presentation.
20:26
Thank you.