Graph pattern matching has been widely used in e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G , compute the set M(Q,G) of matches of Q in G . However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q,G) . This paper studies top- k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node u o . Given G and Q , it is to find those nodes in M(Q,G) that match u o , instead of the large set M(Q,G) . (2) We study two classes of functions for ranking the matches: relevance functions δ r () based on, e.g., social impact, and distance functions δ d () to cover diverse elements. (3) We develop two algorithms for computing top- k matches of u o based on δ r (), with the early termination property, i.e., they find top- k matches without computing the entire M(Q,G) . (4) We also study diversified top- k matching, a bi-criteria optimization problem based on both δ r () and δ d (). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top- k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.
Jiahua XieHongmei ChenLizhen Wang
Aissam AouarSaïd YahiaouiLamia SadegKadda Beghdad Bey
Jiefeng ChengXianggang ZengJeffrey Xu Yu
Xianggang ZengJiefeng ChengJeffrey Xu YuShengzhong Feng