Optimized XPath evaluation for Schema-compressed XML data

Böttcher, S., Hartel, R. and Heindorf, S.

    XML has become the de facto standard for data exchange in enterprise information systems. But whenever XML data is stored or processed, e.g. in the form of a DOM tree representation, the XML markup causes a huge blow-up of memory consumption compared to the data, i.e., the text and attribute values, contained in the XML document. In this paper, we present an optimized XPath query evaluation for XSDS, an XML compression approach that removes information which is redundant because it can be derived from the existing XML Schema definition (XSD). XSDS thereby allows XML data to be stored and exchanged in a space-efficient and still queryable way. Previous papers have shown that XSDS generally reaches stronger compression ratios than other approaches like gzip, bzip2, and XMill, and that XPath queries can be evaluated on XSDS-compressed data. In this paper, we show that optimizing the query evaluation on XSDS-compressed data by using the given schema information speeds up query evaluation by a factor of 13, reaching evaluation times that are more than 5 times faster than those of JAXP, the standard Java XPath evaluator. The speed-up was reached by avoiding the decompression of large parts of the structure while evaluating the query.
Cite as: Böttcher, S., Hartel, R. and Heindorf, S. (2012). Optimized XPath evaluation for Schema-compressed XML data. In Proc. Australasian Database Conference (ADC 2012) Melbourne, Australia. CRPIT, 124. Zhang, R. and Zhang, Y. Eds., ACS. 137-144