About

A heterogeneous information network (HIN) is a graph representation of a dataset in which nodes and edges are multi-typed. Examples include DBLP, Yago and Twitter. Knowledge discovery in HINs has attracted growing research attention over the last decade. One promising topic is measuring the relevance between two data objects, which enables applications such as similarity search, classification and clustering. Sun et al. [3], Huang et al. [2] and Fang et al. [1] proposed data structures named meta-path and meta-structure, collectively known as meta-graphs, to capture the rich semantic information embedded in an HIN. However, most existing works assume that the structures used in a given application are either provided by domain experts or discovered by enumeration. Neither approach is efficient or scalable for large heterogeneous information networks. The goal of this work is therefore to develop a systematic algorithmic framework for the efficient discovery of meta-graphs in large heterogeneous information networks. In particular, the framework will be applied to graph querying, and this "Query-by-example" application will demonstrate its benefits.
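
To make the notions of typed nodes and meta-paths concrete, the following is a minimal Python sketch. It is not the framework developed in this project. The toy bibliographic graph, the node names and the helper functions count_instances and pathsim are all hypothetical and chosen only for illustration. The relevance score follows the PathSim definition from [3]: twice the number of path instances between x and y, divided by the number of path instances from x to itself plus the number from y to itself, under a symmetric meta-path.

```python
from collections import defaultdict

# Toy HIN (illustrative data, not taken from DBLP): every node and edge has a type.
node_type = {
    "alice": "Author", "bob": "Author", "carol": "Author",
    "p1": "Paper", "p2": "Paper", "p3": "Paper",
}
edges = [
    ("alice", "writes", "p1"),
    ("bob", "writes", "p1"),
    ("bob", "writes", "p2"),
    ("carol", "writes", "p2"),
    ("carol", "writes", "p3"),
]

# Treat edges as undirected so a meta-path can be walked in both directions.
# Simplification: edge types are stored but only node types are matched below.
adj = defaultdict(set)
for src, _edge_type, dst in edges:
    adj[src].add(dst)
    adj[dst].add(src)


def count_instances(start, end, meta_path):
    """Count walks from start to end whose node types follow meta_path."""
    if node_type[start] != meta_path[0]:
        return 0
    frontier = {start: 1}  # node -> number of partial walks ending at it
    for wanted_type in meta_path[1:]:
        next_frontier = defaultdict(int)
        for node, n_walks in frontier.items():
            for neighbour in adj[node]:
                if node_type[neighbour] == wanted_type:
                    next_frontier[neighbour] += n_walks
        frontier = next_frontier
    return frontier.get(end, 0)


def pathsim(x, y, meta_path):
    """PathSim score of x and y under a symmetric meta-path, as defined in [3]."""
    denom = count_instances(x, x, meta_path) + count_instances(y, y, meta_path)
    return 2 * count_instances(x, y, meta_path) / denom if denom else 0.0


# Author-Paper-Author: two authors are related if they co-wrote a paper.
APA = ["Author", "Paper", "Author"]
print(pathsim("alice", "bob", APA))
print(pathsim("alice", "carol", APA))
```

In this sketch the meta-path APA is supplied by hand, which is exactly the assumption the project aims to remove: the goal is to discover such structures automatically rather than rely on an expert or on enumeration.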


References
[1] Y. Fang, W. Lin, V. W. Zheng, M. Wu, K. C. C. Chang, and X. L. Li. Semantic proximity search on graphs with metagraph-based learning. In ICDE, pages 277–288. IEEE, 2016.
[2] Z. Huang, Y. Zheng, R. Cheng, Y. Sun, N. Mamoulis, and X. Li. Meta structure: Computing relevance in large heterogeneous information networks. In SIGKDD, 2016.
[3] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. PVLDB, 4(11), 2011.

People

Supervisor:
Dr. Reynold C.K. Cheng
ckcheng[at]cs[dot]hku[dot]hk
Student:
Florence Y. Fung