Graph Analytics on Massively Parallel Processing Databases

FOSDEM VZW

McQuillan, Frank

Formal Metadata

Title

Graph Analytics on Massively Parallel Processing Databases

Title of Series

FOSDEM 2017

Number of Parts

611

Author

McQuillan, Frank

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/42067 (DOI)

Publisher

FOSDEM VZW

Release Date

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

As graph processing moves to the mainstream, a large number of specialized graph engines have emerged. However, for many enterprises, much of their important data resides in relational databases, and SQL is the most common workload. So is it reasonable to suggest that relational data processing engines can be used to solve graph problems in a productive and performant manner? The answer to this question is: “Yes!”

In this talk, we will address the use of massively parallel processing (MPP) databases for graph analytics workloads. We will share some recent findings from the Apache MADlib (incubating) project, including the design of graph data structures, implementation of common graph algorithms, and performance results.

Graph analytics is becoming an important part of enterprise computing. With roots in academia going back many decades, the last 10-15 years have seen a huge surge of interest in this topic to address a wide range of modern use cases, from cybersecurity to social networks to supply distribution chains.

Enterprises have made significant investments in infrastructure, software, and training of their employees, all centered around SQL. So how can an enterprise add graph analytics to their business without the cost and complexity of moving to specialized graph processing engines? And what are the tradeoffs?

Graph analytics is a new area of innovation in Apache MADlib, which is a SQL-based open source library for scalable in-database analytics. It provides parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

Many existing analytics products do not scale in a way that makes it convenient and economical to operate on large data sets. The graph methods in Apache MADlib have been designed to take advantage of the shared-nothing, scale-out parallelism offered by modern parallel database engines.
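To make the idea of "graph data structures" in a relational engine concrete, the following is a minimal sketch, not MADlib's API or implementation. It assumes a graph stored as two plain tables, `vertex(id)` and `edge(src, dest, weight)`, and computes single-source shortest paths with a Bellman-Ford style iteration written as set-based SQL (a join plus `GROUP BY ... MIN`). SQLite from the Python standard library stands in for an MPP database; the table names, the helper tables `dist` and `upd`, and the functions `build_graph` and `sssp_bellman_ford` are all hypothetical, and the example assumes the graph has no negative-weight cycles.

```python
import sqlite3

# Hypothetical example graph, stored relationally as (src, dest, weight) rows.
EDGES = [
    (0, 1, 1.0),
    (0, 2, 4.0),
    (1, 2, 2.0),
    (1, 3, 6.0),
    (2, 3, 3.0),
]
VERTICES = [0, 1, 2, 3, 4]  # vertex 4 has no incoming edges (unreachable)


def build_graph(conn):
    """Create hypothetical vertex and edge tables and load the sample graph."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE vertex (id INTEGER PRIMARY KEY)")
    cur.execute("CREATE TABLE edge (src INTEGER, dest INTEGER, weight REAL)")
    cur.executemany("INSERT INTO vertex (id) VALUES (?)", [(v,) for v in VERTICES])
    cur.executemany("INSERT INTO edge (src, dest, weight) VALUES (?, ?, ?)", EDGES)
    conn.commit()


def sssp_bellman_ford(conn, source):
    """Shortest distances from `source`, relaxing all edges per round in SQL.

    Assumes no negative-weight cycles. Unreachable vertices are omitted.
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS dist")
    cur.execute("CREATE TABLE dist (id INTEGER PRIMARY KEY, d REAL)")
    cur.execute("INSERT INTO dist (id, d) VALUES (?, 0.0)", (source,))
    n = cur.execute("SELECT COUNT(*) FROM vertex").fetchone()[0]

    # Bellman-Ford needs at most |V| - 1 rounds; stop early once nothing improves.
    for _ in range(max(n - 1, 0)):
        cur.execute("DROP TABLE IF EXISTS upd")
        # Relax every edge at once: best candidate distance per destination,
        # computed from the previous round's snapshot of dist.
        cur.execute("""
            CREATE TEMP TABLE upd AS
            SELECT e.dest AS id, MIN(d.d + e.weight) AS d
            FROM dist AS d JOIN edge AS e ON e.src = d.id
            GROUP BY e.dest
        """)
        # Keep only candidates that strictly improve on a known distance.
        cur.execute("""
            DELETE FROM upd
            WHERE EXISTS (SELECT 1 FROM dist AS x WHERE x.id = upd.id AND x.d <= upd.d)
        """)
        if cur.execute("SELECT COUNT(*) FROM upd").fetchone()[0] == 0:
            break  # converged
        # Upsert the improved distances (id is the primary key of dist).
        cur.execute("INSERT OR REPLACE INTO dist (id, d) SELECT id, d FROM upd")

    return dict(cur.execute("SELECT id, d FROM dist ORDER BY id").fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_graph(conn)
    print(sssp_bellman_ford(conn, 0))  # {0: 0.0, 1: 1.0, 2: 3.0, 3: 6.0}
```

The design choice here is that each round is a single join-and-aggregate over whole tables rather than a per-vertex loop. That is the kind of set-based work a shared-nothing engine could, in principle, spread across segments (for example, if edges are distributed by `src`), although how a real MPP database plans and distributes these queries is outside what this sketch shows.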