Statistical learning methods are commonly applied in content-based video and image retrieval. Such methods require a large number of examples, which are usually obtained through manual annotation: human raters review images and assign semantic concept labels. Human judgement, however, cannot be regarded as the ultimate truth because it is subjective and prone to error. These issues can be addressed by collecting multiple judgements per example, but evaluating and resolving disagreement between raters is problematic. Moreover, the nature of rater disagreement, and how to minimise it, is not yet well explored. In this paper we present the results of a user study specifically designed to investigate human judgement of digital imagery. We discuss the influence of factors such as the size and type of the semantic vocabulary on inter-rater agreement. We demonstrate the application of latent class analysis for combining multiple judgements. Known from applications in the medical and social sciences, this statistical technique allows robust, quantitative evaluation of multiple judgements per subject. We believe it is well suited for application during the evaluation and modelling phase in semantic image and video retrieval.
Cite as: Volkmer, T., Thom, J.A. and Tahaghoghi, S.M.M. (2007). Exploring Human Judgement of Digital Imagery. In Proc. Thirtieth Australasian Computer Science Conference (ACSC2007), Ballarat, Australia. CRPIT, 62. Dobbie, G., Ed. ACS. 151-160.