Empirical Study of Bagging Predictors on Medical Data

Liang, Guohua and Zhang, Chengqi

This study investigates how well bagging learns from imbalanced medical data. Highly accurate prediction models matter to data miners in general, and especially in imbalanced medical applications, where practitioners are more interested in the minority class than in the majority class. A traditional supervised learning algorithm, however, struggles to predict the minority class accurately, even when it scores well on the most commonly used evaluation metric, Accuracy. Bagging is a simple yet effective ensemble method that has been applied to many real-world applications. Several questions nonetheless remain open: whether bagging outperforms single learners on medical data-sets; which learners are the best predictors for each medical data-set; and what the best achievable predictive performance is for each medical data-set when sampling techniques are applied. We perform an extensive empirical study of 12 learning algorithms on 8 medical data-sets using four evaluation metrics: True Positive Rate (TPR), True Negative Rate (TNR), Geometric Mean (G-mean) of the accuracy rates of the majority and minority classes, and Accuracy. In addition, statistical analyses support the validity of the conclusions of this research.
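
The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `imbalance_metrics`, the convention that the minority class is the positive class, and the confusion-matrix counts are all assumptions chosen for illustration. The example shows how a classifier can reach high Accuracy while its TPR and G-mean stay low on imbalanced data.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute TPR, TNR, G-mean and Accuracy from confusion-matrix counts.

    Assumption: the minority class is treated as the positive class, so
    TPR is the accuracy on the minority class and TNR is the accuracy on
    the majority class.
    """
    positives = tp + fn
    negatives = tn + fp
    total = positives + negatives

    tpr = tp / positives if positives else 0.0
    tnr = tn / negatives if negatives else 0.0
    g_mean = math.sqrt(tpr * tnr)  # geometric mean of the two class accuracies
    accuracy = (tp + tn) / total if total else 0.0

    return {"TPR": tpr, "TNR": tnr, "G-mean": g_mean, "Accuracy": accuracy}


# Hypothetical counts: 10 minority cases and 90 majority cases.
# The classifier finds only 2 of the 10 minority cases.
metrics = imbalance_metrics(tp=2, fn=8, tn=88, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, Accuracy is 0.9 while TPR is only 0.2, which is why Accuracy alone can be a misleading measure when the minority class is the one of interest.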

Cite as: Liang, Guohua and Zhang, Chengqi (2011). Empirical Study of Bagging Predictors on Medical Data. In Proc. Australasian Data Mining Conference (AusDM 11), Ballarat, Australia. CRPIT, 121. Vamplew, P., Stranieri, A., Ong, K.-L., Christen, P. and Kennedy, P. J. Eds., ACS. 31-40.
