Spatial Tajo: Supporting Spatial Queries on Apache Tajo
Formal Metadata
Number of Parts | 183
License | CC Attribution - NonCommercial - ShareAlike 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/32041 (DOI)
Production Year | 2015
Production Place | Seoul, South Korea
FOSS4G Seoul 2015, 100 / 183
Transcript: English (auto-generated)
00:04
His title is Spatial Tajo: Supporting Spatial Queries on Apache Tajo. So please start.
00:20
Hello everyone, my name is Hyung-gu Jo, and I'm doing a master's course in computer science at Gunsan National University. The title of my presentation is Spatial Tajo: Supporting Spatial Queries on Tajo.
00:45
The presentation is organized as follows: what Spatial Tajo is, the motive for its development, why I chose Tajo, the plan for the implementation of the plugin, the parts implemented,
01:03
the parts not yet implemented, and the conclusion. What is Spatial Tajo? Briefly, it is a plugin to provide spatial queries for Tajo. In detail, it is a plugin to provide and perform
01:21
spatial queries on data sets using SQL in a distributed data system, Tajo. It provides spatial functions, supports spatial data types for spatial queries, supports indexing of spatial data,
01:40
and allows the use of raster data. Then, what was the motive for developing this program? As interest in big data has grown over the last decade,
02:04
attempts to analyze spatial big data have naturally increased as well. I also thought that the volume of the data containing spatial information that I was attempting to analyze could come close to big data.
02:22
The data held by my lab are a collection of tweets on Twitter in New York Times, which do not consist only of spatial information, but I attempted a few analyses using the data. When I analyzed the data,
02:42
analysis using Hadoop was the trend, and I too conducted my analysis using Hadoop. However, I had to use MapReduce to conduct an analysis using Hadoop, and I often thought that being able to use SQL whenever I conducted an experiment would be convenient.
03:05
It is somewhat difficult for analysts to use MapReduce freely, and since it is essentially batch processing, the more data there is, the higher the latency becomes.
03:27
So, I researched existing software and solutions. Of course, there are good ones among the existing software and solutions. Traditional spatial databases and DBMSs include
03:43
Oracle Spatial and Graph, which can be installed into Oracle Database as a plugin, and MySQL DBMS. Both are satisfactory solutions, but they are commercial products. Some are difficult to set up as a cluster.
04:02
My university did not provide them separately, and solutions with recurring costs were not an appropriate option. Using PostGIS, which is free of charge, seemed to be a good method. However, since it is not itself software
04:21
to analyze big data, I set it aside. As for NoSQL, there are document-oriented databases such as MongoDB, CouchDB, and RethinkDB. I could have used these, but I already had structured data, so I did not need to.
04:43
HBase was modeled after Google's Bigtable, but it is somewhat difficult for analysts to use. Hive is convenient, since it uses HiveQL, which is not SQL but is similar to it; however, I did not judge that Hive,
05:03
which uses MapReduce, would be an appropriate choice. Other solutions that use Hadoop include GeoMesa, Esri GIS Tools for Hadoop, and SpatialHadoop. Esri GIS Tools for Hadoop is closer to a set of tools or libraries
05:22
prepared for analysis using Hadoop, while SpatialHadoop is closer to a combination of a spatial plugin with Hadoop. Overall, in that they are free of charge, although a little effort has to be invested,
05:41
it seems that performing spatial queries using Hive or using GeoMesa may be most appropriate. However, I finally decided to build, by myself, a plugin that allows Tajo, which is free and open source software with low latency, to perform spatial queries.
06:06
Then, why did I choose Tajo? First, I will introduce Tajo before I speak
06:20
about why I chose Tajo. Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad hoc queries, online aggregation, and extract-transform-load (ETL) processes
06:40
on large data sets stored on the Hadoop Distributed File System and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution
07:01
and data flow across a variety of query evaluation strategies and optimization opportunities. Then, why did I choose Tajo? There are a few features
07:20
that became the reasons for my choice. Since Tajo is designed to run on Hadoop, using Hadoop can alleviate my concern about data distribution and also supports the necessary modules.
07:41
Tajo has an external table feature, so it can bring in existing files and query them. Like HBase or Hive, it does not support UPDATE syntax, but it can overwrite. Tajo supports standard SQL and does not use MapReduce,
08:02
so it performs better than processing with MapReduce, has fault tolerance, and supports dynamic scheduling for long-running queries. Tajo is convenient in terms of installation,
08:21
construction, and operation. Of course, it is a project that is still growing, so it is certainly difficult to solve problems when it fails. And if you need something, you can implement it yourself and attach it in the form of a plugin. Tajo as described above has merits and demerits
08:43
as compared to the solutions or software in the examples, but it is free and open source software, so I decided to implement the plugin on Tajo, which I judged to be appropriate
09:01
considering the characteristics introduced. The plan for the implementation of the plugin is broadly divided into four steps. First, for the implementation of spatial functions
09:22
for spatial queries, I decided to implement 11 basic functions, including distance and equals, and the functions for the conversion of spatial data types. Second, the addition of spatial data types,
09:45
which means custom types containing spatial information, such as point and linestring. Third, the KNN query, which means the implementation for smooth performance of KNN after
10:00
the implementation of spatial functions. Fourth, indexing spatial data, which means the implementation of an index that allows retrieval of the necessary data when making spatial queries.
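To make the relationship between a distance function and a KNN query concrete, here is a minimal sketch in Python. It is not the speaker's code and not Tajo's API: the names `distance` and `knn` are hypothetical, points are assumed to be stored as plain x/y columns, and distance is assumed to be planar Euclidean distance. The query is answered by brute force, without any index.

```python
import heapq
import math


def distance(a, b):
    """Planar Euclidean distance between two (x, y) points (an assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def knn(rows, query, k):
    """Return the k rows closest to `query`; each row is (id, x, y).

    Brute force: every row is scored with the distance function, which is
    what a KNN query amounts to before any spatial index is available.
    """
    return heapq.nsmallest(k, rows, key=lambda r: distance((r[1], r[2]), query))


# Hypothetical sample table with primitive coordinate columns.
rows = [("a", 0.0, 0.0), ("b", 3.0, 4.0), ("c", 1.0, 1.0), ("d", 10.0, 10.0)]
nearest = knn(rows, (0.0, 0.0), 2)
```

In SQL terms this corresponds roughly to ordering by a distance function and limiting the result to k rows, which is why the distance function has to exist before the KNN query can run.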
10:23
The parts that have been implemented so far include spatial functions, the KNN query, and spatial data indexing. For spatial functions, I implemented most of the primary functions using JTS.
10:41
I carried out KNN queries using the implemented spatial functions, and they are working smoothly. For spatial data indexing, following the way indexes operate in Tajo and SpatialHadoop,
11:01
I implemented a two-level R-tree using Sort-Tile-Recursive (STR). The two-level R-tree consists of a global index and local indexes, as shown in the picture. The process of building the two-level R-tree index is
11:22
as follows. First, Tajo workers divide the stored data set into areas using STR. Second, they build a local index for each area. Third, they extract only the data
11:41
at a certain level from each of the local indexes and build a global index using them. The process of reading the two-level R-tree index is as follows. First, Tajo workers read the global index
12:00
and find the search keys. Second, they find the local indexes corresponding to the search keys. Third, they find the search keys in the local indexes. Lastly, they read the data directly from the storage and construct tuples.
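The build and read steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's code: all function names are hypothetical, each local "index" is simplified to a list of points with its bounding box (the talk builds R-trees locally), the global index is a flat list of bounding boxes, and the lookup is a rectangular range query standing in for the search keys.

```python
import math


def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def str_partition(points, capacity):
    """Sort-Tile-Recursive packing: sort by x, cut into vertical slices,
    sort each slice by y, then cut each slice into tiles of `capacity`."""
    n_parts = math.ceil(len(points) / capacity)
    n_slices = math.ceil(math.sqrt(n_parts))
    per_slice = n_slices * capacity
    by_x = sorted(points)
    parts = []
    for i in range(0, len(by_x), per_slice):
        slice_ = sorted(by_x[i:i + per_slice], key=lambda p: p[1])
        for j in range(0, len(slice_), capacity):
            parts.append(slice_[j:j + capacity])
    return parts


def build(points, capacity):
    """Build the local partitions, then a global index over their MBRs."""
    local = str_partition(points, capacity)
    global_index = [(mbr(part), i) for i, part in enumerate(local)]
    return global_index, local


def range_query(global_index, local, box):
    """Read path: consult the global index first, then only the local
    partitions whose MBR overlaps the query box."""
    hits = []
    for part_mbr, i in global_index:
        if intersects(part_mbr, box):
            hits.extend(p for p in local[i]
                        if box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3])
    return sorted(hits)


# Hypothetical data: a 4 x 4 grid of points, four points per area.
points = [(x, y) for x in range(4) for y in range(4)]
global_index, local = build(points, capacity=4)
result = range_query(global_index, local, (0, 0, 1, 1))
```

The benefit of the two levels is that a query touches only the areas whose bounding box overlaps it; in this example only one of the four areas is scanned.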
12:27
The parts that are not yet implemented include spatial data types, some spatial functions, KNN query optimization, other spatial data indexes, and modularization. Spatial data types are currently not implemented,
12:42
so there is the inconvenience that spatial functions have to use primitive types. I'm going to resolve this inconvenience of the spatial functions once spatial data types are implemented.
13:03
First, as for spatial functions, I'm going to implement the functions that are not yet implemented, such as length, area, and centroid, and then optimize each of the spatial functions and the KNN queries. Next, as for the indexing of spatial data,
13:22
I'm going to implement a quadtree or k-d tree using [unclear], as well as the R-tree. Lastly, I'm going to modularize the plugin. Currently, it is combined with Tajo, since it is difficult to separate it, but after going through the final modularization
13:45
of the plugin, it will be distributed in the form of a plugin. Conclusion: what is Spatial Tajo? It is a plugin to provide spatial queries for Tajo.
14:01
What was the motive for its development? I began developing it because I wanted to do analysis using SQL in a distributed data warehouse system instead of using MapReduce or similar batch processing.
14:21
Why did I choose Apache Tajo? I did so because I can use it without great concern about distributed storage, it supports SQL syntax, it is better than MapReduce, it guarantees fault tolerance, and above all, I can implement
14:40
and attach what I need in the form of a plugin myself if necessary. The plan for implementation is as follows. As the overall plan, I'm going to implement spatial functions for spatial queries and spatial data types, allow running KNN queries, and support
15:04
the indexing of spatial data. The current status is as follows. The parts implemented so far include most spatial functions, the running of KNN queries, and support for indexing
15:22
through implementing a two-level R-tree. The parts not yet implemented include spatial data types, the remaining spatial functions, the optimization of spatial functions and spatial queries, other methods of indexing
15:42
spatial data, and the modularization of the plugin. For today's presentation, I mainly referred to the documentation and source code of Apache Tajo, SpatialHadoop, PostGIS, etc. There are also books and websites with information
16:04
about the contents. For Q&A, please email your questions to the email address stated here and I will answer them in detail. Thank you for listening to my presentation.
16:20
Thank you. Any questions? Have you looked at libspatialindex for doing your indexing, rather than trying to do the indexing all on your own?
16:40
Lib..., sorry? libspatialindex, it's a C library for doing R-tree and k-d tree indexing. No. No? Oh, yes. Any comments or questions?
17:00
How long do you think it will take to implement the remaining functions? Is there no plan, or can you not plan it?
17:23
I don't know, sorry. Okay. Any other questions or comments? Okay, thank you.