This paper presents a detailed study of the behavior of three content-based collaborative filtering metrics (correlation, cosine and mean squared difference) when they are applied to several rating matrices with different levels of sparsity. A total of 648 experiments were carried out, varying the following parameters: the metric used, the number of neighbors (k), the sparsity level and the type of result (mean absolute error, percentage of incorrect predictions, percentage of correct predictions and capacity to generate predictions). The results are illustrated in representative two- and three-dimensional graphs. The conclusions of the paper emphasize the superiority of the correlation metric over the cosine metric, and the unusually good results of the mean squared difference metric on matrices with high sparsity levels, which suggests directions for future study.
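As a rough illustration of the three metrics named in the abstract, the following sketch computes each one over the items that two users have both rated. It is not taken from the paper: the function names, the use of `None` for a missing rating, the 1 to 5 rating scale, and the form of the mean squared difference similarity (one minus the mean of the normalized squared differences) are all assumptions made for this example.

```python
import math


def co_rated(u, v):
    """Pair up ratings for items both users rated (None marks a missing rating)."""
    return [(a, b) for a, b in zip(u, v) if a is not None and b is not None]


def pearson(u, v):
    """Pearson correlation over co-rated items; None if it cannot be computed."""
    pairs = co_rated(u, v)
    if len(pairs) < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    den = math.sqrt(
        sum((a - mean_a) ** 2 for a, _ in pairs)
        * sum((b - mean_b) ** 2 for _, b in pairs)
    )
    return num / den if den else None


def cosine(u, v):
    """Cosine similarity over co-rated items; None if there is no overlap."""
    pairs = co_rated(u, v)
    if not pairs:
        return None
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
    return num / den if den else None


def msd_similarity(u, v, r_min=1, r_max=5):
    """Assumed MSD form: 1 - mean of squared differences normalized to [0, 1]."""
    pairs = co_rated(u, v)
    if not pairs:
        return None
    span = r_max - r_min
    mean_sq = sum(((a - b) / span) ** 2 for a, b in pairs) / len(pairs)
    return 1.0 - mean_sq


if __name__ == "__main__":
    # Two hypothetical users rating four items; they share only items 0 and 3.
    user_a = [5, 3, None, 1]
    user_b = [4, None, 2, 2]
    for metric in (pearson, cosine, msd_similarity):
        print(metric.__name__, metric(user_a, user_b))
```

As the rating vectors become sparser, the set of co-rated items shrinks, so a metric either loses reliability or cannot be computed at all. In this sketch that case is reported as `None`, which is one way to picture the "capacity to generate predictions" result type.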
Cite as: Bobadilla, J. and Serradilla, F. (2009). The Effect of Sparsity on Collaborative Filtering Metrics. In Proc. Twentieth Australasian Database Conference (ADC 2009), Wellington, New Zealand. CRPIT, 92. Bouguettaya, A. and Lin, X., Eds. ACS. 9-17.