Multiple Imputation on Partitioned Datasets

Furner, M. and Islam, M.Z.

    This paper discusses the impact of modifying partition-discovering missing value imputation techniques and, through this process, develops a novel imputation algorithm that combines partition discovery and multiple imputation, two state-of-the-art techniques. We discuss the difference between global and partition-discovering imputation techniques and show how these techniques have developed over time through modifications to existing techniques in the literature. Beginning by examining the role of missing value imputation in light of the world's increasing demand for data analysis, we review the current state of the art in global and partition-discovering imputation techniques and categorise a variety of existing algorithms into these classes. This review includes an in-depth discussion of one algorithm from each category (EMI and SiMI) to build an understanding of how each works before novel techniques are developed. This is followed by the presentation of several variants of the SiMI algorithm, which serve as a launchpad for the discussion of our proposed technique, the MultiSiMI algorithm, which is shown to improve on SiMI's quality of imputation on 6 of the 7 datasets tested. This technique is the major contribution of the paper. Each section presenting a variant of SiMI reports experimental results for that variant, to show how intelligent modifications to existing algorithms can result in superior novel techniques such as MultiSiMI. We conclude by reviewing the contributions of the paper and recommending some future research directions.
Cite as: Furner, M. and Islam, M.Z. (2015). Multiple Imputation on Partitioned Datasets. In Proc. Thirteenth Australasian Data Mining Conference (AusDM 2015) Sydney, Australia. CRPIT, 168. Ong, K.L., Zhao, Y., Stone, M.G. and Islam, M.Z. Eds., ACS. 59-68