Discovering Conditional Functional Dependencies in XML Data

Vo, L. T.H., Cao, J. and Rahayu, W.

    XML data inconsistency has become a serious problem since XML was widely adopted as a standard for data representation on the web. XML-based standards such as OASIS, xCBL and xBRL have been used to report and exchange business and financial information. Such standards focus on technical rather than semantic aspects. XML Functional Dependencies (XFDs) have been introduced to improve XML semantic expressiveness. However, existing approaches to XFD discovery, which have been proposed mainly to enhance schema design, are not capable of dealing with data inconsistency. They cannot find a proper set of semantic constraints from the data, and thus are insufficient for capturing data inconsistency. In this paper we propose an approach, called XDiscover, to discover a set of minimal XML Conditional Functional Dependencies (XCFDs) from a given XML instance to improve data consistency. The XCFD notion extends XFDs by incorporating conditions into XFD specifications. XCFDs can be used to constrain data processing and also to detect and correct non-compliant data. XDiscover incorporates pruning rules into the discovery process to improve search performance. We present several case studies to demonstrate the effectiveness of our approach.
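
    To make the idea of a conditional dependency concrete, the following is a minimal sketch of checking one such rule against an XML instance. It is not the XDiscover algorithm, which discovers rules rather than checking a given one. The element names, the sample data, and the helper `find_violations` are all assumptions invented for illustration. The rule tested here reads: among orders where `country` is `AU`, `postcode` determines `city`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; element names are invented for illustration.
SAMPLE = """
<orders>
  <order><country>AU</country><postcode>3000</postcode><city>Melbourne</city></order>
  <order><country>AU</country><postcode>3000</postcode><city>Sydney</city></order>
  <order><country>AU</country><postcode>6000</postcode><city>Perth</city></order>
  <order><country>US</country><postcode>3000</postcode><city>Springfield</city></order>
</orders>
"""

def find_violations(root, record_path, condition, lhs, rhs):
    """Return {lhs_values: sorted rhs values} for groups that break the dependency.

    The dependency lhs -> rhs is only checked on records whose child elements
    match every (tag, value) pair in `condition`.
    """
    groups = defaultdict(set)
    for record in root.findall(record_path):
        if any(record.findtext(tag) != value for tag, value in condition.items()):
            continue  # condition not met, so the dependency does not apply
        key = tuple(record.findtext(tag) for tag in lhs)
        groups[key].add(record.findtext(rhs))
    # A group violates the dependency when one lhs key maps to several rhs values.
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}

root = ET.fromstring(SAMPLE)
violations = find_violations(root, "order", {"country": "AU"}, ["postcode"], "city")
print(violations)  # {('3000',): ['Melbourne', 'Sydney']}
```

    The US order shares postcode 3000 but is ignored because it fails the condition, which is the sense in which a condition narrows the scope of an ordinary functional dependency.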
Cite as: Vo, L. T.H., Cao, J. and Rahayu, W. (2011). Discovering Conditional Functional Dependencies in XML Data. In Proc. Australasian Database Conference (ADC 2011), Perth, Australia. CRPIT, 115. Heng Tao Shen and Yanchun Zhang, Eds., ACS. 143-152.