Ranking-Constrained Keyword Sequence Extraction from Web Documents
Chen, D., Li, X., Liu, J. and Chen, X.
Given a large volume of Web documents, we consider the
problem of finding the shortest keyword sequence for
each document such that, when the keyword sequence
is submitted to a given search engine, the
corresponding Web document is identified and
ranked first in the results. We
call this system an Inverse Search Engine (ISE).
Whenever a shortest keyword sequence is found for
a given Web document, the corresponding document
can be returned as the first document by the given
search engine. The resulting keyword sequence is
search-engine dependent. The ISE therefore can be
used as a tool to manage Web content in terms of
the extracted shortest keyword sequences. In this
way, a traditional keyword extraction process is constrained by the document ranking method adopted
by a search engine. The significance is that all
searchable documents on the World Wide Web
can then be partitioned according to their keyword
phrases. This paper discusses the design and implementation of the proposed ISE. Four evaluation measures are proposed and used to show the effectiveness and efficiency of our approach. The experimental
results establish a test benchmark for further research.
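The problem statement above can be illustrated with a minimal sketch. It is not the authors' algorithm: the scoring function is a hypothetical stand-in for a real search engine (conjunctive matching with summed term frequency), the names `toy_scores`, `ranks_first`, `shortest_keyword_sequence` and `DOCS` are invented for illustration, and a keyword sequence is simplified to an unordered combination of terms drawn from the document.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; a real ISE would query an actual search engine.
DOCS = {
    "a": "web search engine ranking",
    "b": "web document keyword extraction",
    "c": "keyword ranking web search",
}


def toy_scores(query, docs):
    """Assumed ranking function: score = summed term frequency of the
    query terms, for documents that contain every query term."""
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        if all(tf[term] > 0 for term in query):
            scores[doc_id] = sum(tf[term] for term in query)
    return scores


def ranks_first(doc_id, query, docs):
    """True if doc_id matches the query and strictly outscores all others."""
    scores = toy_scores(query, docs)
    if doc_id not in scores:
        return False
    return all(scores[doc_id] > s for d, s in scores.items() if d != doc_id)


def shortest_keyword_sequence(doc_id, docs, max_len=3):
    """Brute-force search: try term combinations in order of increasing
    length and return the first one that ranks doc_id first."""
    terms = sorted(set(docs[doc_id].lower().split()))
    for length in range(1, max_len + 1):
        for query in combinations(terms, length):
            if ranks_first(doc_id, query, docs):
                return query
    return None  # no combination up to max_len ranks the document first


for doc_id in DOCS:
    print(doc_id, shortest_keyword_sequence(doc_id, DOCS))
```

Because the result depends entirely on `toy_scores`, swapping in a different ranking function would generally change the extracted sequences, which mirrors the search-engine dependence noted in the abstract. The exhaustive search is exponential in the number of terms and is only meant to make the definition concrete.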
Cite as: Chen, D., Li, X., Liu, J. and Chen, X. (2009). Ranking-Constrained Keyword Sequence Extraction from Web Documents. In Proc. Twentieth Australasian Database Conference (ADC 2009), Wellington, New Zealand. CRPIT, 92. Bouguettaya, A. and Lin, X., Eds. ACS. 161-169.