The archiving and maintenance of vast quantities of data is a key challenge for the current use of information technology. When storing large repositories, possibly mirrored at multiple sites, an archiving system aims to reduce both storage and transmission costs. Delta compression is a key component of many archiving and backup systems. A file may be stored succinctly as a sequence of references to other files in the collection, establishing a dependency relationship between files. On the one hand, exploiting long dependency chains provides excellent compression. On the other hand, if a file is stored so compactly that it depends on hundreds of other files, then retrieving it from the archive may be costly in both time and resources.
This paper assesses the scalability of delta compression of typical data collections. We use experiments to model and examine the dependency relationship, and quantify the cost of full use of dependencies. We propose strategies to reduce dependencies and yet retain highly effective compression.
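The retrieval cost described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the collection is modelled as a mapping from each file to the set of files it references directly; the function name `transitive_dependencies` and the sample file names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def transitive_dependencies(refs: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return every file that must be available to reconstruct `target`.

    `refs` maps a file name to the files it references directly
    (an assumed, simplified model of a recursively stored collection).
    """
    needed: Set[str] = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for dep in refs.get(current, set()):
            if dep not in needed:  # visit each dependency only once
                needed.add(dep)
                stack.append(dep)
    return needed


# Hypothetical collection: each file lists the files it copies content from.
refs = {
    "v1.txt": set(),
    "v2.txt": {"v1.txt"},
    "v3.txt": {"v2.txt"},
    "v4.txt": {"v3.txt", "v1.txt"},
}

# Number of other files that retrieving v4.txt would touch in this model.
cost = len(transitive_dependencies(refs, "v4.txt"))
```

In this toy model, `v4.txt` references only two files directly, yet reconstructing it requires every earlier version. The size of the returned set is one simple proxy for the retrieval cost that grows as dependency chains lengthen.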
Cite as: Molfetas, A., Wirth, A. and Zobel, J. (2014). Scalability in Recursively Stored Delta Compressed Collections of Files. In Proc. Australasian Web Conference (AWC 2014), Auckland, New Zealand. CRPIT, 155. Trotman, A., Cranefield, S. and Yang, J. Eds., ACS. 21-30.