Index Compression Using Fixed Binary Codewords

Anh, V.N. and Moffat, A.

Document retrieval and web search engines index large quantities of text. The static costs associated with storing the index can be traded against dynamic costs associated with using it during query evaluation. Typically, index representations that are effective and obtain good compression tend not to be efficient, in that they require more operations during query processing. In this paper we describe a scheme for compressing lists of integers as sequences of fixed binary codewords that has the twin benefits of being both effective and efficient. Experimental results are given on several large text collections to validate these claims.

Cite as: Anh, V.N. and Moffat, A. (2004). Index Compression Using Fixed Binary Codewords. In Proc. Fifteenth Australasian Database Conference (ADC2004), Dunedin, New Zealand. CRPIT, 27. Schewe, K.-D. and Williams, H. E., Eds. ACS. 61-67.
