As graph processing moves into the mainstream, a large number of specialized graph engines have emerged. However, for many enterprises, much of their important data resides in relational databases, and SQL is the most common workload. So is it reasonable to suggest that relational data processing engines can be used to solve graph problems in a productive and performant manner?
The answer to this question is: “Yes!”
In this talk, we will address the use of massively parallel processing (MPP) databases for graph analytics workloads. We will share some recent findings from the Apache MADlib (incubating) project, including the design of graph data structures, the implementation of common graph algorithms, and performance results.
Graph analytics is becoming an important part of enterprise computing. The field has roots in academia going back many decades, but the last 10 to 15 years have seen a huge surge of interest in it, driven by a wide range of modern use cases, from cybersecurity to social networks to supply distribution chains.
Enterprises have made significant investments in infrastructure, software, and employee training, all centered around SQL. So how can an enterprise add graph analytics to its business without the cost and complexity of moving to specialized graph processing engines? And what are the tradeoffs?
Graph analytics is a new area of innovation in Apache MADlib, a SQL-based open source library for scalable in-database analytics. MADlib provides parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data.
Many existing analytics products do not scale in a way that makes it convenient and economical to operate on large data sets. The graph methods in Apache MADlib have been designed to take advantage of the shared-nothing, scale-out parallelism offered by modern parallel database engines.