Link Prediction and Topological Feature Importance in Social Networks

Curiskis, S.A., Osborn, T.R. and Kennedy, P.J.

    The problem of link prediction describes how to ac- count for the development of connection structure in a graph. There are many applications of link predic- tion, such as predicting missing links and future links in online social networks. Much of the literature has focused on limited characteristics of the graph topol- ogy or on node attributes, rather than a broad range of measures. There is a rich spectrum of topologi- cal features associated with a graph, such as neigh- bourhood similarity scores, node centrality measures, community structure and path-based distance mea- sures. In this paper we formulate a supervised learn- ing approach to link prediction using a feature set of graph measures chosen to capture a wide range of topological structure. This approach has the ad- vantage that it can be applied to any graph where the connection structure is known. Random forest learning models are used for their high accuracy and measures of feature importance. The feature impor- tance scores reveal the strength of contribution of the topological predictors for link prediction in a variety of synthetically generated network datasets, as well as three real world citation networks. We investigate both undirected and directed cases. Our results show that this approach can deliver very high model pre- cision and recall performance in certain graphs, and good performance generally. Our models also con- sistently outperform a simpler comparison model we developed to resemble earlier work. In addition, our analysis of variable importance for each dataset re- veals meaningful information regarding deep network properties.
Cite as: Curiskis, S.A., Osborn, T.R. and Kennedy, P.J. (2015). Link Prediction and Topological Feature Importance in Social Networks. In Proc. Thirteenth Australasian Data Mining Conference (AusDM 2015) Sydney, Australia. CRPIT, 168. Ong, K.L., Zhao, Y., Stone, M.G. and Islam, M.Z. Eds., ACS. 39-50
pdf (from pdf (local if available) BibTeX EndNote GS