Indexing for Fast Categorisation
Shanks, V.R., Williams, H.E. and Cannane, A.
Automatic categorisation is an important technique for the management of large document collections. Categorisation can be used to store or locate documents that satisfy an information need when the need cannot be expressed as a concise list of query terms. Inverted indexes are used in all query-based retrieval systems to allow efficient query processing. In this paper, we propose the application of inverted indexes to categorisation with the aim of developing a fast, scalable, and accurate approach. Specifically, we propose successful variants of inverted indexing to reduce index size: first, quantisation of term-category weights; second, compression of the quantised weights; and, last, storing only those weights that significantly impact the categorisation process. We show that our technique permits fast, accurate categorisation: index size is reduced by orders of magnitude compared to conventional inverted indexing, and the accuracy of categorisation is preserved.
Cite as: Shanks, V.R., Williams, H.E. and Cannane, A. (2003). Indexing for Fast Categorisation. In Proc. Twenty-Sixth Australasian Computer Science Conference (ACSC2003), Adelaide, Australia. CRPIT, 16. Oudshoorn, M. J., Ed. ACS. 119-127.
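The abstract names three ways to shrink a term-category inverted index: quantising weights, compressing the quantised values, and keeping only weights that matter. The following is a minimal sketch of how those three ideas could fit together, not the authors' scheme, which the abstract does not specify. It assumes uniform b-bit quantisation, a fixed pruning threshold on the quantised weight, variable-byte compression of category-ID gaps and weights, and scoring by summing weights per category. All function names, parameters (`bits`, `threshold`) and the toy weights are hypothetical.

```python
from collections import defaultdict


def quantise(w, w_max, bits):
    """Uniformly map a weight in [0, w_max] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return round(w / w_max * levels)


def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, low-order group first;
    the high bit is set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:  # last byte of this number
            numbers.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return numbers


def build_index(weights, bits=4, threshold=1):
    """Build term -> compressed postings of (category-ID gap, quantised weight).

    weights[term][category_id] is a real-valued term-category weight.
    Postings whose quantised weight falls below `threshold` are dropped
    (an assumed pruning criterion, for illustration only)."""
    w_max = max(w for cats in weights.values() for w in cats.values())
    index = {}
    for term, cats in weights.items():
        postings, prev = [], 0
        for cat in sorted(cats):
            q = quantise(cats[cat], w_max, bits)
            if q < threshold:
                continue  # prune: too small to change the ranking much
            postings.extend([cat - prev, q])  # gaps keep the integers small
            prev = cat
        if postings:
            index[term] = vbyte_encode(postings)
    return index, w_max


def categorise(index, doc_terms):
    """Return the category with the highest summed quantised weight, or None."""
    scores = defaultdict(int)
    for term in doc_terms:
        data = index.get(term)
        if data is None:
            continue
        nums = vbyte_decode(data)
        cat = 0
        for gap, q in zip(nums[0::2], nums[1::2]):
            cat += gap  # undo the gap encoding
            scores[cat] += q
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    # Hypothetical weights; categories 0 = sport, 1 = politics, 2 = finance.
    weights = {
        "goal": {0: 0.9, 1: 0.05},
        "match": {0: 0.7, 2: 0.1},
        "vote": {1: 0.8, 2: 0.02},
        "election": {1: 0.95},
        "stock": {2: 0.85, 1: 0.1},
    }
    index, _ = build_index(weights, bits=4, threshold=3)
    print(categorise(index, ["goal", "match", "vote"]))  # prints 0
```

In this toy run, 4-bit quantisation with a threshold of 3 reduces the nine weights to five postings of two bytes each. Raising `threshold` or lowering `bits` shrinks the index further at the cost of coarser category scores; how far that trade-off can go without losing accuracy is the question the paper addresses empirically.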