An Empirical Study of Learning from Imbalanced Data
Zhang, X. and Li, Y.
No consistent conclusions have been drawn from existing studies regarding the effectiveness of different approaches to learning from imbalanced data. In this paper we apply bias-variance analysis to study the utility of different strategies for imbalanced learning. We conduct experiments on 15 real-world imbalanced datasets, applying various re-sampling and induction bias adjustment strategies to the standard decision tree, naive Bayes and k-nearest neighbour (k-NN) learning algorithms. Our main findings include: Imbalanced class distribution is primarily a high bias problem, which partly explains why it impedes the performance of many standard learning algorithms. Compared to the re-sampling strategies, adjusting induction bias can more significantly vary the bias and variance components of classification errors. In particular, the inverse distance weighting strategy can significantly reduce the variance errors for k-NN. Based on these findings we offer practical advice on applying the re-sampling and induction bias adjustment strategies to improve imbalanced learning.
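As an illustration of the inverse distance weighting idea mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes each neighbour votes with weight 1/d, where d is its Euclidean distance to the query; the abstract does not state the exact weighting scheme. The function names, the `eps` parameter and the toy dataset are hypothetical and chosen only for this example.

```python
import math
from collections import Counter, defaultdict


def k_nearest(train, query, k):
    """Return the k (distance, label) pairs closest to `query` (Euclidean)."""
    pairs = [(math.dist(x, query), label) for x, label in train]
    return sorted(pairs, key=lambda pair: pair[0])[:k]


def majority_knn_predict(train, query, k=3):
    """Plain k-NN: every neighbour casts one equal vote."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def idw_knn_predict(train, query, k=3, eps=1e-9):
    """k-NN with inverse distance weighting (assumed weight: 1 / distance)."""
    votes = defaultdict(float)
    for distance, label in k_nearest(train, query, k):
        # eps (an assumption) avoids division by zero for exact matches.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Toy imbalanced data, invented for illustration: four "neg" points, one "pos".
train = [
    ((0.0, 0.0), "neg"),
    ((1.0, 0.0), "neg"),
    ((0.0, 1.0), "neg"),
    ((1.0, 1.0), "neg"),
    ((2.0, 2.0), "pos"),
]
query = (1.8, 1.8)

plain = majority_knn_predict(train, query, k=3)
weighted = idw_knn_predict(train, query, k=3)
```

On this toy data the query lies very close to the single minority point, yet two of its three nearest neighbours belong to the majority class. Equal voting therefore favours the majority class, while weighting by inverse distance lets the nearby minority point dominate the vote.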
Cite as: Zhang, X. and Li, Y. (2011). An Empirical Study of Learning from Imbalanced Data. In Proc. Australasian Database Conference (ADC 2011), Perth, Australia. CRPIT, 115. Heng Tao Shen and Yanchun Zhang, Eds., ACS. 85-94.