Mining frequent itemsets is an established approach to data mining and is supported by production data mining solutions. For example, one can gain insights into buyers' behavior by analyzing frequent co-occurrences of products in shopping baskets. In contrast, frequent subgraph mining (FSM), the graph variant of frequent itemset mining, evaluates not only the co-occurrence of entities but also the relationships among them, i.e., structural patterns. However, existing implementations are all research prototypes that are tailored to textbook problems.

In our talk, we want to give an introduction to the FSM problem on distributed collections of graphs and to our implementation in Gradoop, an open-source system for scalable graph analytics based on Apache Flink. In contrast to other iterative graph algorithms such as PageRank, in FSM the search space is discarded, while the intermediate results of the iterations are the desired output. Here, the major technical challenge is the appropriate use of Flink's distributed iterations. We will explain different implementation approaches, discuss implementation details that influence scalability, and show benchmark results.

Intended audience and goal of the talk: Developers and analysts interested in relationship-centric data mining techniques.