In this paper, we study the problem of measuring
the structural similarity of a large number of source
schemas against a single domain schema, which is useful
for improving the quality of searching and ranking
large volumes of source documents on the Web with
the help of structural information. After analyzing
the unsuitability of existing edit-distance-based
methods, we propose a new similarity measure
model that caters for the requirements of the problem.
Given the asymmetric nature of comparing source
schemas with a domain schema, similarity-preserving
rules and an algorithm are designed to filter out
uninteresting elements in source schemas in order
to optimize the similarity computation.
Based on the model, a basic algorithm and an
improved algorithm are developed for structural similarity
computation. The improved algorithm makes
full use of a new coding scheme that is devised to
reduce the number of comparisons. The complexities of
both algorithms are analyzed, and extensive experiments
show the significant performance gain achieved
by the improved algorithm.
Cite as: Li, J., Liu, C., Yu, J.X., Liu, J., Wang, G. and Yang, C. (2008). Computing Structural Similarity of Source XML Schemas against Domain XML Schema. In Proc. Nineteenth Australasian Database Conference (ADC 2008), Wollongong, NSW, Australia. CRPIT, 75. Fekete, A. and Lin, X., Eds. ACS. 155-164.