This paper discusses the impact of modifying partition-discovering missing value imputation techniques. Through this process it develops a novel imputation algorithm that combines two state-of-the-art techniques: partition discovery and multiple imputation. We discuss the difference between global and partition-discovering imputation techniques, and show how these techniques have developed over time through modifications to existing techniques in the literature.
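The toy sketch below makes the global versus partition-discovering distinction concrete. It contrasts two deliberately simple approaches. The global approach fills every missing value with the mean of the whole column. The partition-based approach fills it with the mean of only the records in the same partition. Several assumptions apply. The partition labels are given by hand here, whereas partition-discovering techniques such as SiMI discover groups of similar records from the data itself. Plain mean imputation is not how EMI or SiMI estimate values. The function names and data are hypothetical and for illustration only.

```python
from statistics import mean

def global_impute(values):
    """Global approach (illustrative): fill None with the mean of ALL observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def partition_impute(values, partitions):
    """Partition-based approach (illustrative): fill None with the mean of the
    observed values that share the same (assumed, pre-given) partition label."""
    groups = {}
    for v, p in zip(values, partitions):
        if v is not None:
            groups.setdefault(p, []).append(v)
    fallback = mean(v for v in values if v is not None)  # used if a partition has no observed values
    return [
        (mean(groups[p]) if p in groups else fallback) if v is None else v
        for v, p in zip(values, partitions)
    ]

# Hypothetical column with two clearly different groups of records.
values = [1.0, 2.0, None, 10.0, 11.0, None]
partitions = ["A", "A", "A", "B", "B", "B"]
print(global_impute(values))                 # [1.0, 2.0, 6.0, 10.0, 11.0, 6.0]
print(partition_impute(values, partitions))  # [1.0, 2.0, 1.5, 10.0, 11.0, 10.5]
```

The global estimate of 6.0 fits neither group. The partition-based estimates of 1.5 and 10.5 follow the local structure of each group. This is the intuition behind imputing within discovered partitions.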
We begin by examining the role of missing value imputation in light of the world's increasing demand for data analysis. We then review the current state of the art in global and partition-discovering imputation techniques, and categorise a variety of existing algorithms into these two classes. This review includes an in-depth discussion of one algorithm from each category, EMI and SiMI. The aim is to build a clearer understanding of how each one works before we develop novel techniques.
We then present several variants of the SiMI algorithm. These serve as a launchpad for our proposed technique, the MultiSiMI algorithm, which is the major contribution of this paper. MultiSiMI is shown to improve on SiMI's imputation quality on 6 of the 7 datasets tested. Each section that introduces a SiMI variant also presents experimental results for that variant. These results show how intelligent modifications to existing algorithms can lead to superior novel techniques such as MultiSiMI. We conclude by reviewing the contributions of the paper and recommending some directions for future research.
Cite as: Furner, M. and Islam, M.Z. (2015). Multiple Imputation on Partitioned Datasets. In Proc. Thirteenth Australasian Data Mining Conference (AusDM 2015), Sydney, Australia. CRPIT, 168. Ong, K.L., Zhao, Y., Stone, M.G. and Islam, M.Z., Eds. ACS. 59-68.