Exploit Keyword Query Semantics and Structure of Data for Effective XML Keyword Search

Nguyen, K. and Cao, J.

Keyword search is a natural and user-friendly mechanism for querying XML data in information systems and Web-based applications. Because keyword queries are ambiguous and have limited expressiveness, one of the key tasks is to identify and return meaningful fragments as results. In this paper, we first study query keyword patterns in order to infer the user’s search intention behind the input keywords. As a result, the keywords in a query are classified as either required information or search conditions (predicates). In addition, unlike previous work, our approach returns only the desired fragments as results. Each returned result must satisfy the search conditions rather than simply contain all query keywords. To further prune irrelevant fragments, we introduce a novel notion called Relevant Lowest Common Ancestor (RLCA), which effectively and precisely captures the fragments that are meaningful and relevant to the given keyword query. We conducted extensive experimental studies to demonstrate the effectiveness of our approach.

Cite as: Nguyen, K. and Cao, J. (2010). Exploit Keyword Query Semantics and Structure of Data for Effective XML Keyword Search. In Proc. 21st Australasian Database Conference (ADC 2010), Brisbane, Australia. CRPIT, 104. Shen, H.T. and Bouguettaya, A., Eds. ACS. 133-140.
