Improving Scalability and Performance of Random Forest Based Learning-to-Rank Algorithms by Aggressive Subsampling
Ibrahim, M. and Carman, M.
Random forest based Learning-to-Rank (LtR) algorithms exhibit performance competitive with other state-of-the-art algorithms. Traditionally, each tree of the forest is learnt from a bootstrapped copy (sampled with replacement) of the training set, in which approximately 63% of the examples are unique, although some studies show that sampling without replacement also works well. The goal of using a bootstrapped copy instead of the original training set is to reduce the correlation among individual trees, thereby making the prediction of the ensemble more accurate. In this study, we investigate whether we can decrease the correlation among the trees even further without compromising accuracy. Among several potential options, we focus on the size of the sub-sample used to learn each individual tree, and investigate the performance of a random forest based LtR algorithm as this sub-sample size is reduced. Experiments on LETOR data sets reveal that a substantial reduction in training time can be achieved using only a small amount of training data. Moreover, accuracy is likely to increase while the same level of performance stability as the baseline is maintained. Thus, in addition to the existing benefit of being completely parallelizable, this study empirically identifies yet another property of random forest based LtR algorithms that makes them one of the top contenders for large-scale LtR.
Cite as: Ibrahim, M. and Carman, M. (2014). Improving Scalability and Performance of Random Forest Based Learning-to-Rank Algorithms by Aggressive Subsampling. In Proc. Twelfth Australasian Data Mining Conference (AusDM14), Brisbane, Australia. CRPIT, 158. Li, X., Liu, L., Ong, K.L. and Zhao, Y. Eds., ACS. 91-99.
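The abstract does not come with code, but the core idea of shrinking the per-tree sample can be sketched with an off-the-shelf random forest. The snippet below is a minimal illustration only, not the authors' algorithm: it uses scikit-learn's RandomForestRegressor as a stand-in for a pointwise LtR model on synthetic query-document features, with the max_samples parameter acting as the sub-sample size knob; the data, the 100-tree forest, and the 5% rate are placeholder choices for the example.

```python
# Minimal sketch (assumed setup, not the paper's implementation): compare a
# forest whose trees see a full bootstrap of the training set against one whose
# trees are grown on an aggressively reduced sub-sample.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 46))                   # placeholder query-document features
y = rng.integers(0, 3, size=10_000).astype(float)   # placeholder graded relevance labels

# Baseline: each tree is learnt from a bootstrap of the full training set
# (~63% unique examples per tree).
baseline = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)

# "Aggressive subsampling": each tree is grown on a much smaller random sample,
# which lowers both tree correlation and training time.
subsampled = RandomForestRegressor(
    n_estimators=100, max_samples=0.05, bootstrap=True, n_jobs=-1, random_state=0
)

for name, model in [("full bootstrap", baseline), ("5% sub-sample per tree", subsampled)]:
    t0 = time.perf_counter()
    model.fit(X, y)
    print(f"{name}: trained in {time.perf_counter() - t0:.2f}s")
```

In a real LtR setting the sub-sampled and baseline forests would then be compared on held-out queries with a rank metric such as NDCG, rather than on training time alone.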