CATAPULT: Data-driven Selection of Canned Patterns for Efficient Visual Graph Query Formulation

ACM SIGMOD

Huang, Kai; Chua, Huey; Bhowmick, Sourav; Choi, Byron; Zhou, Shuigeng

Formal Metadata

Title

Title of Series

SIGMOD 2019

Number of Parts

155

Author

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/43067 (DOI)

Publisher

ACM SIGMOD

Release Date

2019

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Visual graph query interfaces (a.k.a. GUIs) widen the reach of graph querying frameworks across different users by enabling non-programmers to use them. Consequently, several commercial and academic frameworks for querying a large collection of small- or medium-sized data graphs (e.g., chemical compounds) provide such visual interfaces. The majority of these interfaces expose a fixed set of canned patterns (i.e., small subgraph patterns) to expedite query formulation by enabling pattern-at-a-time instead of edge-at-a-time construction. Canned patterns to be displayed on a GUI are typically selected manually based on domain knowledge. However, manual generation of canned patterns is labour intensive. Furthermore, these patterns may not sufficiently cover the underlying data graphs to expedite visual formulation of a wide range of subgraph queries. In this paper, we present a generic and extensible framework called CATAPULT to address these limitations. CATAPULT takes a data-driven approach to automatically select canned patterns, thereby taking a concrete step towards the vision of data-driven construction of visual query interfaces. Specifically, it first clusters the underlying data graphs based on their topological similarities and then summarizes each cluster to create a cluster summary graph (CSG). The canned patterns within a user-specified pattern budget are then generated from these CSGs by maximizing coverage and diversity, and minimizing the cognitive load of the patterns. An experimental study with real-world datasets and visual graph interfaces demonstrates the superiority of CATAPULT compared to traditional techniques.
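
The budgeted selection step mentioned in the abstract can be illustrated with a small sketch. The Python below is not CATAPULT's actual algorithm; it is a minimal greedy selection written under several assumptions that do not come from the abstract: graphs are modelled as sets of labelled edges, a pattern "covers" a graph when all of its edges appear in that graph (a crude stand-in for subgraph isomorphism), diversity is one minus the largest Jaccard similarity to an already selected pattern, and cognitive load is approximated by edge count. The names select_patterns, covers, jaccard, and load_weight are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumption: an edge is a pair of vertex labels, a graph is a set of such edges.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def covers(pattern: Graph, graph: Graph) -> bool:
    """Crude stand-in for subgraph isomorphism: all pattern edges occur in the graph."""
    return pattern <= graph


def jaccard(a: Graph, b: Graph) -> float:
    """Jaccard similarity of two edge sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_patterns(candidates: List[Graph], data: List[Graph],
                    budget: int, load_weight: float = 0.05) -> List[Graph]:
    """Greedily pick up to `budget` patterns, trading off coverage gain,
    diversity, and an edge-count proxy for cognitive load."""
    if not data:
        return []
    selected: List[Graph] = []
    covered: Set[int] = set()          # indices of data graphs covered so far
    remaining = list(candidates)

    def score(p: Graph) -> float:
        hits = {i for i, g in enumerate(data) if covers(p, g)}
        gain = len(hits - covered) / len(data)              # marginal coverage
        diversity = 1.0 - max((jaccard(p, s) for s in selected), default=0.0)
        load = load_weight * len(p)                         # assumed load proxy
        return gain + diversity - load

    while remaining and len(selected) < budget:
        best = max(remaining, key=score)
        selected.append(best)
        covered |= {i for i, g in enumerate(data) if covers(best, g)}
        remaining.remove(best)
    return selected
```

In a fuller pipeline the candidates would be drawn from the cluster summary graphs rather than supplied by hand, and the coverage test would use real subgraph matching; the equal weighting of the three terms here is an arbitrary choice for illustration.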