
Graph Analytics on Massively Parallel Processing Databases

Formal Metadata

Title
Graph Analytics on Massively Parallel Processing Databases
Number of Parts
611
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year
2017

Content Metadata

Abstract
As graph processing moves to the mainstream, a large number of specialized graph engines have emerged. However, for many enterprises, much of their important data resides in relational databases, and SQL is the most common workload. So is it reasonable to suggest that relational data processing engines can be used to solve graph problems in a productive and performant manner? The answer to this question is: “Yes!”

In this talk, we will address the use of massively parallel processing (MPP) databases for graph analytics workloads. We will share some recent findings from the Apache MADlib (incubating) project, including the design of graph data structures, the implementation of common graph algorithms, and performance results.

Graph analytics is becoming an important part of enterprise computing. With roots in academia going back many decades, the last 10-15 years have seen a huge surge of interest in this topic to address a wide range of modern use cases, from cybersecurity to social networks to supply distribution chains. Enterprises have made significant investments in infrastructure, software, and training of their employees, all centered around SQL. So how can an enterprise add graph analytics to their business without the cost and complexity of moving to specialized graph processing engines? And what are the tradeoffs?

Graph analytics is a new area of innovation in Apache MADlib, which is a SQL-based open source library for scalable in-database analytics. It provides parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. Many existing analytics products do not scale in a way that makes it convenient and economical to operate on large data sets. The graph methods in Apache MADlib have been designed to take advantage of the shared-nothing, scale-out parallelism offered by modern parallel database engines.
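The sketch below makes the idea of solving a graph problem with relational operations more concrete. It is a minimal illustration and not MADlib's implementation. It computes single-source shortest paths with Bellman-Ford. Each round joins the current distance "table" with an edge table and then groups by destination to take the MIN. Plain Python lists and dicts stand in for database tables.

The following are assumptions made for illustration only:

* the function name `sssp_relational`,
* the `(src, dest, weight)` edge layout,
* the requirement that every edge endpoint appears in the vertex list,
* the absence of negative-weight cycles.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical edge "table" row: (src, dest, weight). Not MADlib's actual schema.
Edge = Tuple[int, int, float]


def sssp_relational(vertices: List[int], edges: List[Edge], source: int) -> Dict[int, float]:
    """Bellman-Ford shortest paths phrased as repeated JOIN + GROUP BY MIN.

    Assumes every edge endpoint is in `vertices` and there are no
    negative-weight cycles (these are not detected).
    """
    inf = float("inf")
    # "dist" table: one row per vertex, like an initial INSERT ... SELECT.
    dist = {v: (0.0 if v == source else inf) for v in vertices}

    for _ in range(len(vertices) - 1):
        # JOIN dist ON dist.id = edge.src, then GROUP BY edge.dest taking
        # MIN(dist.val + edge.weight). Candidates are built from this round's
        # snapshot of dist before any row is updated.
        candidates: Dict[int, float] = defaultdict(lambda: inf)
        for src, dest, weight in edges:
            d = dist[src]
            if d < inf:
                candidates[dest] = min(candidates[dest], d + weight)

        # UPDATE dist SET val = candidate WHERE candidate < val
        changed = False
        for v, c in candidates.items():
            if c < dist[v]:
                dist[v] = c
                changed = True

        # No rows updated in this round: distances have converged.
        if not changed:
            break

    return dist
```

For example, `sssp_relational([0, 1, 2, 3], [(0, 1, 1.0), (0, 2, 4.0), (1, 2, 2.0), (2, 3, 1.0)], 0)` returns `{0: 0.0, 1: 1.0, 2: 3.0, 3: 4.0}`.

This set-at-a-time formulation suits a shared-nothing engine better than vertex-by-vertex traversal. In an MPP database, the edge table would typically be spread across segments by a distribution key. Each round's join and aggregation could then run on all segments in parallel.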