Recently, several offline data compression schemes have been published that expend large amounts of computing resources when encoding a file, but decode the file quickly. These compressors work by identifying phrases in the input data and storing the data as a series of pointers to these phrases. This paper explores the application of an algorithm for computing all repeating substrings within a string to phrase selection in an offline data compressor. Using our approach, we obtain compression similar to that of the best known offline compressors on genetic data, but poor results on general text. It seems, however, that an alternative approach based on selecting repeating substrings is feasible.
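As a rough illustration of the phrase-and-pointer idea described above, the following Python sketch finds repeating substrings by brute force, picks a few as phrases, and stores the input as a mix of phrase indices and literal characters. It is not the algorithm studied in the paper: the function names, the quadratic substring enumeration, the scoring heuristic `(length - 1) * (occurrences - 1)`, and the greedy parse are all assumptions made for illustration only.

```python
from collections import defaultdict


def repeating_substrings(text, min_len=2):
    """Brute force: map every substring of length >= min_len that occurs
    at least twice to the list of its start positions."""
    positions = defaultdict(list)
    n = len(text)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            positions[text[i:j]].append(i)
    return {s: p for s, p in positions.items() if len(p) >= 2}


def select_phrases(text, k=2):
    """Hypothetical selection rule: keep the k repeats with the largest
    (length - 1) * (occurrences - 1); ties are broken alphabetically."""
    reps = repeating_substrings(text)
    ranked = sorted(reps, key=lambda s: (-(len(s) - 1) * (len(reps[s]) - 1), s))
    return ranked[:k]


def encode(text, phrases):
    """Greedy left-to-right parse: emit the index of the longest phrase
    matching at the current position, otherwise a literal character."""
    by_length = sorted(range(len(phrases)), key=lambda k: -len(phrases[k]))
    tokens, i = [], 0
    while i < len(text):
        for k in by_length:
            if text.startswith(phrases[k], i):
                tokens.append(k)          # pointer into the phrase list
                i += len(phrases[k])
                break
        else:
            tokens.append(text[i])        # no phrase matches: literal
            i += 1
    return tokens


def decode(tokens, phrases):
    """Decoding only follows pointers, so it is a single linear pass."""
    return "".join(phrases[t] if isinstance(t, int) else t for t in tokens)
```

On a short DNA-like string such as `"ACGTACGTTTACGT"`, `decode(encode(text, phrases), phrases)` reproduces the input, and the token list is shorter than the text. The sketch also shows the asymmetry mentioned above: phrase selection does all the expensive work, while decoding is a simple lookup.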
Cite as: Turpin, A. and Smyth, W.F. (2002). An Approach to Phrase Selection for Offline Data Compression. In Proc. Twenty-Fifth Australasian Computer Science Conference (ACSC2002), Melbourne, Australia. CRPIT, 4. Oudshoorn, M. J., Ed. ACS. 267-273.