CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching

Cite

ACM SIGMOD

Bhattarai, Bibek Liu, Hang Huang, H. Howie

Formal Metadata

Title

CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching

Title of Series

SIGMOD 2019

Number of Parts

155

Author

Bhattarai, Bibek

Liu, Hang

Huang, H. Howie

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/43055 (DOI)

Publisher

ACM SIGMOD

Release Date

2019

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Subgraph matching finds all distinct isomorphic embeddings of a query graph on a data graph. For large graphs, current solutions face the scalability challenge due to expensive joins, excessive false candidates, and workload imbalance. In this paper, we propose a novel framework for subgraph listing based on Compact Embedding Cluster Index (idx), which divides the data graph into multiple embedding clusters for parallel processing. The sub has three unique techniques: utilizing the BFS-based filtering and reverse-BFS-based refinement to prune the unpromising candidates early on, replacing the edge verification with set intersection to speed up the candidate verification, and using search cardinality based cost estimation for detecting and dividing large embedding clusters in advance. The experiments performed on several real and synthetic datasets show that the sub outperforms state-of-the-art solutions on average by 20.4times for listing all embeddings and by 2.6times for enumerating the first 1,024 embeddings.