Accelerating GeoSpatial Data Analytics With Pivotal Greenplum Database

Video in TIB AV-Portal: Accelerating GeoSpatial Data Analytics With Pivotal Greenplum Database

Formal Metadata

Accelerating GeoSpatial Data Analytics With Pivotal Greenplum Database
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Year
Production Place
Seoul, South Korea

Content Metadata

Subject Area
As a typical big data application, geospatial analysis has been receiving extensive attention from both academia and industry. As they collect massive amounts of geospatial data, more and more manufacturers and research institutions find that analysis over geospatial data on existing legacy architectures does not scale. The reason is typically twofold. On one hand, extending traditional databases to support modern, complex geospatial data analytics is rather challenging. On the other hand, integrating emerging techniques from other big data applications into traditional databases may suffer from compatibility issues, resulting in poor performance or even painful debugging tasks. Specifically, most of today's general-purpose relational databases (e.g., Oracle and Microsoft SQL Server, together with their geospatial components) are designed primarily as OLTP systems. Their shared-disk or shared-everything architectures are optimized for high-throughput transaction execution at the expense of analytical query performance. In contrast to the existing relational database systems, Pivotal offers the Greenplum Database (GPDB), an extensible relational database platform that uses a shared-nothing, massively parallel processing (MPP) architecture to vastly accelerate online analytical processing (OLAP) over geospatial big data. Even better, GPDB can seamlessly integrate in-database analytical processing with our extended analytics stacks, such as heterogeneous Hadoop environments and in-memory data grids. Recent reports from Gartner scored Pivotal GPDB highly on data warehousing and analytics. We design and develop geospatial analytics toolkits on GPDB in three respects. First, we migrate the latest PostGIS project into GPDB so that GPDB can run as a spatial database system for regular GIS users.
Second, we extend the spatial component with various types of advanced geospatial functions, such as geospatial group-by, similarity search and network-constrained scenarios. Third, we are working to support associative retrieval of data across geospatial and other data domains, i.e., queries involving both geospatial and non-spatial information, such as RDF (GeoSPARQL queries), text (spatial keyword search) and time (trajectory search). Above all, we aim to engage the full breadth of big data developers in geospatial analytics. This talk will briefly introduce (1) the architecture of Pivotal GPDB, which provides automatic high-performance parallelization of geospatial data loading and data processing, (2) GPDB's extensive and growing library of in-database geospatial analytic functions, and (3) the capability to build a comprehensive geospatial data analytics platform around Pivotal GPDB. I will provide examples of how data science teams may transform billions of geo-tagged customer records to tackle the real-world problem of identity resolution in one minute. I will also discuss our plan to make Pivotal Greenplum Database open source in the coming quarters.
Spring (hydrology) Computer animation Food energy
Group action Code Multiplication sign Execution unit Source code Numbering scheme Set (mathematics) Data analysis Mereology Proper map Dimensional analysis Heegaard splitting Sign (mathematics) Casting (performing arts) Different (Kate Ryan album) Semiconductor memory Negative number File system Error message Partition (number theory) Vulnerability (computing) Area Boss Corporation Relational database Structural load Data storage device Cluster analysis Funktionalanalysis Sequence Demoscene Arithmetic mean Process (computing) Internet service provider Summierbarkeit Cycle (graph theory) Quicksort Spacetime Open source Image resolution Patch (Unix) Tape drive Translation (relic) Graph coloring Scherbeanspruchung 2 (number) Operator (mathematics) Energy level Software testing Metropolitan area network Mathematical optimization Form (programming) Computer architecture Multiplication Distribution (mathematics) Forcing (mathematics) Weight Physical law Planning Analytic set Database Basis <Mathematik> Cartesian coordinate system System call Subject indexing Computer animation Doubling the cube Query language Hybrid computer Object (grammar) Table (information) Local ring
Trajectory Building Java applet Code Multiplication sign Execution unit 1 (number) Set (mathematics) Insertion loss Solid geometry Bit rate Semiconductor memory Bus (computing) Videoconferencing Office suite Series (mathematics) Endliche Modelltheorie Physical system Predictability Mapping Structural load Shared memory Staff (military) Funktionalanalysis Lattice (order) Sequence Data mining Message passing Internet service provider Dataflow Open source Similarity (geometry) Code Theory Field (computer science) Power (physics) Attribute grammar Number Product (business) Form (programming) Computer architecture Graph (mathematics) Key (cryptography) Inheritance (object-oriented programming) Interface (computing) Mathematical analysis Planning Database Cartesian coordinate system Subject indexing Cache (computing) Film editing Computer animation Visualization (computer graphics) Point cloud Game theory Table (information)
Implementation Computer animation Personal digital assistant Multiplication sign Data storage device Wave packet Row (database)
Sequel Code Execution unit BEEP Computer programming Programmer (hardware) Goodness of fit Strategy game Lecture/Conference Profil (magazine) Different (Kate Ryan album) Semiconductor memory Partition (number theory) Relational database Data storage device Memory management Basis <Mathematik> Bit Database Sphere Computer animation Query language Summierbarkeit Table (information) Row (database)
Hello everyone. [unclear] I come from Pivotal, a start-up company set up by EMC [unclear]. Some of you may have used our products before, such as Spring [unclear]. This is the talk right after lunch [unclear], so first I want to give you my opinion. There are some popular NoSQL solutions for geospatial data analysis, and most of them are built on Hadoop [unclear]. When a single machine is not enough, we consider using a cluster to deal with the data, and Hadoop is the other popular choice. In the Hadoop approach you translate the high-dimensional data into one dimension [unclear], then you can store it in a key-value store [unclear], and then store the data blocks in a distributed file system ([unclear], or GPFS from IBM) [unclear]. The other way is to use a parallel relational database to store the data [unclear]. In an earlier session a speaker (I do not remember the name) gave a very interesting talk [unclear] about [unclear] being faster than PostGIS running on Postgres, and another point was that the index is not that useful. But in my experiments this is different. [unclear] Which solution is better, SQL or NoSQL, depends a lot on the data partitioning. For some particular queries you can choose a data partitioning, I mean a distribution across the cluster, that works fine for that kind of query. But for a general-purpose data store you cannot rebuild the data partitioning for every query.
So sometimes we will find that NoSQL is faster, but most of the time [unclear] is still faster. For example, I have performed experiments on 16 segments [unclear], comparing the index with [unclear] the geohash, which is commonly used in NoSQL [unclear]. You can use a function like that to [unclear] the space-filling curve [unclear]. We can see that with the index the query is really a lot faster. So I do not know why he said that [unclear] there is not much difference between with the index and without the index; I am confused too. But in my experiment I find that with the index we can [unclear] query the data very efficiently. In the query plan we can find that at the low level [unclear] a sequential scan [unclear], whereas with the index we can access only a subset of the data, so it should be faster than without the index. That is why I am confused by this: the index is very useful.
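The idea mentioned above, translating two-dimensional coordinates into a one-dimensional key with a space-filling curve so that a key-value store can hold them, can be sketched as follows. This is a minimal illustration of Z-order (Morton) bit interleaving, the scheme that geohash-style keys are built on. It is not code from the talk; the function names and the 16-bit grid resolution are assumptions made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return key


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize lon/lat onto a 2^bits by 2^bits grid and return the Z-order key."""
    cells = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * cells)   # column index of the grid cell
    y = int((lat + 90.0) / 180.0 * cells)    # row index of the grid cell
    return interleave_bits(x, y, bits)
```

Points that are close on the map often, but not always, receive nearby keys. That is one reason a range scan over such a one-dimensional key usually needs a filtering step, whereas a spatial index can restrict the scan to the matching subset directly.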
So why do we use PostGIS rather than the NoSQL approach? Because PostGIS has developed many, many geospatial functions [unclear]. If you want to translate those functions to NoSQL or Hadoop [unclear], you need to port your application to NoSQL, and for now you [unclear] geospatial queries in [unclear]. For some companies, especially start-up companies, it is hard to do this job [unclear], and it is not efficient enough. The efficiency depends on the data partitioning policy. If you partition your data smartly you can speed up some queries, such as k-nearest-neighbour queries [unclear], but for some [unclear] functions you will find that the data partitioning is not [unclear]. That is another argument for a database. OK, because we do not have much time, [unclear] my talk will quickly be about the Greenplum Database: its architecture and features, and then our open-source plan. If we have some time we can discuss some case studies. The Greenplum Database has been developed for around 12 years [unclear]. It is popular in some areas, though not as popular as the Oracle database [unclear].
It is an MPP architecture. There is a master that coordinates the segments [unclear]. Each segment node [unclear], and the segments are connected to one another by a high-speed interconnect. This means we can provide [unclear] processing of large-scale data sets, and we can run queries in parallel. For example, if you want to know [unclear] by city, you write a SQL query and the database generates a query plan. In the plan you can find motions such as redistribution [unclear], so that we can finish the join, because the join spans different segments. The plan is generated by the query optimizer [unclear], which is a really important tool for high performance. We have now separated the optimizer from the database, which means it can be used on multiple platforms [unclear]. We gave the optimizer a lovely name, Orca, and it will be open source this year [unclear].
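The redistribution step described above can be pictured with a small simulation. The sketch below is only an illustration of the general technique, hash-distributing rows on the join key so that matching rows land on the same segment and each segment can join locally. It does not reproduce Greenplum's executor, and the function names and the default of four segments are assumptions made for the example.

```python
from collections import defaultdict


def distribute(rows, key, n_segments):
    """Assign each row to a segment by hashing its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[hash(row[key]) % n_segments].append(row)
    return segments


def redistribute_join(left, right, left_key, right_key, n_segments=4):
    """Redistribute both inputs on the join key, then join segment by segment."""
    left_segs = distribute(left, left_key, n_segments)
    right_segs = distribute(right, right_key, n_segments)
    result = []
    for seg in range(n_segments):
        # Rows with equal join keys hash to the same segment, so each
        # segment can build and probe its own local hash table.
        table = defaultdict(list)
        for r in right_segs.get(seg, []):
            table[r[right_key]].append(r)
        for l in left_segs.get(seg, []):
            for r in table.get(l[left_key], []):
                result.append({**l, **r})
    return result
```

In a real MPP system the per-segment loops run concurrently on different machines and the redistribution travels over the interconnect. Here they run one after another in a single process, which is enough to show why no segment needs to see another segment's rows during the join.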
[unclear] Loading is important because in geospatial scenarios we need to load data all the time. For example, in Beijing, China, we have about 66 or 67 thousand taxis. If you want to monitor the traffic of the city, a great deal of data will be injected into the database, and the master may become a bottleneck. But do not worry: we can load the data to the segments directly [unclear]. The data can be loaded through any segment, and each [unclear] dispatches the data to the correct segment, so the master will never be a bottleneck here. This is very important in many scenarios [unclear] with many moving objects, such as mobile phones, taxis, buses and so on, where you need to load the data really quickly. We can also load data from a file system directly, and [unclear] we can access Hadoop files directly. You can create an external table [unclear]; the data is really stored in Hadoop, not in the database. I think this is really interesting, because we can provide a hybrid solution [unclear]. We also know that some data are better stored by row and some by column [unclear], so splitting is OK: we can provide user-defined storage [unclear]. It also provides [unclear] for the entire cluster; in particular, we can set up priorities [unclear], so it is more flexible for companies using this database. [unclear] We focus on analytics [unclear]. Here are some papers; I would like to share our technology [unclear]. These are some papers we published this year, in which you can find our query optimizer [unclear].
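The remark above that some data are better stored by row and some by column can be made concrete with a toy comparison. The sketch below is an assumption-laden illustration, not the database's storage engine: the field names (`taxi_id`, `lon`, `lat`, `speed`) are hypothetical, and every row is assumed to carry the same fields.

```python
def to_columns(rows):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def avg_speed_row_store(rows):
    """Row layout: every whole record is visited to read one field."""
    return sum(row["speed"] for row in rows) / len(rows)


def avg_speed_column_store(columns):
    """Column layout: only the 'speed' column is read."""
    speeds = columns["speed"]
    return sum(speeds) / len(speeds)
```

Both functions return the same answer. The difference is in how much data has to be touched: an aggregate over one attribute favours the column layout, while fetching or updating a complete record favours the row layout, which is the usual motivation for letting different tables or partitions use different layouts.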
We want to support geospatial. Before the open-sourcing we did not provide a solution for [unclear], but now we plan to support geospatial directly [unclear], including building indexes [unclear]. And not only that: something like trajectories is really interesting. A single GPS point is sometimes not that meaningful, because we cannot understand much from an x and a y. But if we put them together, I mean if you consider a sequence of GPS points [unclear], it is much more useful, and we can query at a higher semantic level. So we can compare the similarity between two trajectories. For example, every day you go from home to the office; if I find a co-worker with a similar route, maybe you can share a car next time [unclear], as in some carpooling applications, which is more interesting. [unclear] GeoSPARQL [unclear]; Oracle has implemented [unclear], and we will support it in the future [unclear], so I will skip [unclear] this index. [unclear] This database also supports analysis of your data [unclear]: you can use a model to do data clustering, data mining or prediction. We also support user-defined functions: you can write your code in Java, in C, in R [unclear] as a user-defined function. We have also developed an open-source library for [unclear] analytics.
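Comparing two trajectories, as in the home-to-office example above, needs a distance between point sequences rather than between single points. The talk does not say which measure is used, so the sketch below swaps in dynamic time warping (DTW), one common choice, purely as an illustration. It assumes planar (x, y) coordinates and uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because a point may be matched to several points of the other sequence, two trips along the same route sampled at different rates still come out close. For latitude and longitude spanning more than a small area, a geodesic distance would be a better fit than the planar one assumed here.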
[unclear] You can download it [unclear], and you can also use it with [unclear] the Greenplum Database to do some analytics [unclear]. Here are some examples [unclear]. Now let us talk about the bigger picture. Here [unclear] the products are either already open source or will be open source in the future. [unclear] This is an in-memory cache [unclear] that you can use to speed up your processing, and this is a messaging middleware. [unclear] The in-memory database is open source [unclear]. It is a useful tool; I know that it is used [unclear] where many people [unclear] tickets online, so there is a high load on the system. [unclear] Then there is SQL on Hadoop, so you can access data from Hadoop with just the SQL interface. And lastly Greenplum, which as we discussed is an MPP-based parallel database [unclear]. We also have [unclear] solution: this product is from VMware [unclear], and it is now open source, which means you can [unclear] for your private cloud and deploy the database there, so you can provide a cloud database [unclear]. As for the Greenplum Database, [unclear] by the end of October [unclear] it will be open source [unclear]. [unclear] Apache [unclear]; for some reason it was delayed [unclear]. [unclear] We held a meeting on the open-source Greenplum Database in Beijing, and more than 100 people attended [unclear]. [unclear] Many people are [unclear] on Greenplum geospatial functions [unclear] open-source community [unclear]. Now we are working on how to accelerate geospatial processing on our MPP architecture. You know that in PostGIS the most important table is the spatial reference table, and most of the functions access this table. If the table is stored on disk, that will be really slow, so now we improve this with an in-memory cache. That means when you start using geospatial functions this table is loaded into memory directly, and when you access the table again it will be faster. [unclear]
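The spatial reference table optimisation described above amounts to keeping a small, frequently read lookup table in memory. The sketch below shows only the general caching pattern. The class name, the `slow_lookup` callable and the placeholder definitions are hypothetical, and nothing here reflects how the database itself implements the cache.

```python
class SpatialRefCache:
    """Keep spatial reference definitions in memory after the first read."""

    def __init__(self, slow_lookup):
        # slow_lookup stands in for reading one row of the on-disk table.
        self._slow_lookup = slow_lookup
        self._cache = {}
        self.misses = 0

    def get(self, srid):
        """Return the definition for an SRID, reading from disk only once."""
        if srid not in self._cache:
            self.misses += 1
            self._cache[srid] = self._slow_lookup(srid)
        return self._cache[srid]

    def preload(self, srids):
        """Warm the cache up front, as when the table is loaded at start-up."""
        for srid in srids:
            self.get(srid)
```

A geospatial function that transforms many geometries with the same SRID would then pay for the disk read once and hit memory afterwards. The speaker's point in the Q&A, that not all data fits in memory, is why this pattern suits a small reference table better than the data tables themselves.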
Here are some use cases [unclear]. That is it; any questions?
OK, thank you very much. That was right on time. Are there any questions from the floor? [unclear] I have got a couple. The polymorphic storage is really interesting: you are combining the row store and the column store in the same implementation. What kind of queries and what kind of geospatial data does that really perform well on, from your experience?
So, data partitions?
No. Early on you were talking about [unclear] NoSQL, but I was asking particularly about the row store versus the column store, which you get in a single polymorphic storage and which I guess you implement together, and depending on the query or depending on the data you decide which way to go. How does it work exactly?
[unclear] We developed an application for Beijing [unclear]. Each day it generates [unclear] hundreds of [unclear], and I create a separate table for each day. For the most recent [unclear] the data is still [unclear], and the cold data we just push to [unclear], because you know that for the database [unclear] the size of a relational database table [unclear], so you partition it into small pieces.
And one last thing I was going to ask, which you answered at the end, was about the in-memory storage. Are you doing most of your work just in memory, or [unclear], and what is the speed difference?
[unclear] the spatial reference table [unclear]. Yes, it is a good question, because we cannot store all the data in memory. So for now we consider [unclear] a solution where the programmer specifies which data to keep in memory, so it depends on the program. But the programmer [unclear], so we are considering using some caching strategy; that will be done later.
If there is nothing else, then I would like to thank you again for your talk.
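The answer above describes creating a separate table for each day of data and treating recent and cold days differently. A minimal sketch of that routing, and of working out which daily tables a date-range query has to touch, might look like this. The base name `taxi_gps`, the `ts` field and the naming scheme are assumptions made for the example, not details given in the talk.

```python
from datetime import date, datetime, timedelta


def partition_name(base, ts):
    """Name of the daily table that a timestamp belongs to."""
    return f"{base}_{ts:%Y%m%d}"


def route_rows(base, rows):
    """Group incoming rows by the daily table they should be written to."""
    partitions = {}
    for row in rows:
        partitions.setdefault(partition_name(base, row["ts"]), []).append(row)
    return partitions


def partitions_for_range(base, first_day, last_day):
    """Daily tables a query over [first_day, last_day] needs to scan."""
    days = (last_day - first_day).days
    return [
        partition_name(base, first_day + timedelta(days=i))
        for i in range(days + 1)
    ]
```

With this layout a query over the last few days reads only a handful of small tables, and older daily tables can be moved to cheaper storage without touching the recent ones.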