Conferences in Research and Practice in Information Technology
  

Online Version - Last Updated - 20 Jan 2012

 

 

An Empirical Study of Learning from Imbalanced Data

Zhang, X. and Li, Y.

    No consistent conclusions have been drawn from existing studies regarding the effectiveness of different approaches to learning from imbalanced data. In this paper we apply bias-variance analysis to study the utility of different strategies for imbalanced learning. We conduct experiments on 15 real-world imbalanced datasets, applying various re-sampling and induction bias adjustment strategies to the standard decision tree, naive Bayes and k-nearest neighbour (k-NN) learning algorithms. Our main findings include: Imbalanced class distribution is primarily a high bias problem, which partly explains why it impedes the performance of many standard learning algorithms. Compared to the re-sampling strategies, adjusting induction bias can more significantly vary the bias and variance components of classification errors. In particular, the inverse distance weighting strategy can significantly reduce the variance errors for k-NN. Based on these findings we offer practical advice on applying the re-sampling and induction bias adjustment strategies to improve imbalanced learning.
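    For readers unfamiliar with the inverse distance weighting strategy named in the abstract, the following is a minimal illustrative sketch and not the authors' implementation. The function name `knn_predict`, the `eps` smoothing term, and the toy data are assumptions made for this example only. Each of the k nearest neighbours votes with weight 1/distance instead of 1, so a close minority-class neighbour can outweigh several more distant majority-class neighbours.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weighted=True, eps=1e-9):
    """Predict a label for `query` from `train`, a list of (features, label).

    With weighted=True each neighbour votes with weight 1 / (distance + eps);
    with weighted=False each neighbour casts one equal vote (plain k-NN).
    `eps` is an assumed smoothing term that avoids division by zero.
    """
    # Compute Euclidean distances and keep the k nearest neighbours.
    neighbours = sorted(
        ((math.dist(features, query), label) for features, label in train),
        key=lambda pair: pair[0],
    )[:k]

    # Accumulate (possibly distance-weighted) votes per class label.
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + eps) if weighted else 1.0

    return max(votes, key=votes.get)


# Hypothetical toy data: one minority point close to the query,
# several majority points farther away.
train = [
    ((0.0, 0.0), "minority"),
    ((1.0, 0.0), "majority"),
    ((1.0, 1.0), "majority"),
    ((2.0, 2.0), "majority"),
]
query = (0.1, 0.0)

plain_vote = knn_predict(train, query, k=3, weighted=False)
weighted_vote = knn_predict(train, query, k=3, weighted=True)
```

    On this toy data the unweighted vote follows the two majority neighbours, while the weighted vote follows the single nearby minority neighbour. The sketch shows only the voting mechanism; it does not reproduce the paper's bias-variance analysis.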
Cite as: Zhang, X. and Li, Y. (2011). An Empirical Study of Learning from Imbalanced Data. In Proc. Australasian Database Conference (ADC 2011), Perth, Australia. CRPIT, 115. Heng Tao Shen and Yanchun Zhang, Eds., ACS. 85-94.