Constructing Good Quality Web Page Communities

Hou, J. and Zhang, Y.

    The World Wide Web is a rich source of information and continues to grow in size and complexity. Capturing the features of the Web at a higher level, so as to support information classification and efficient retrieval on the Web, is becoming a challenging task. One natural approach is to exploit the linkage information among Web pages. Previous work in this area, such as HITS, starts from a set of retrieved pages and derives a Web community, that is, a group of pages related to the query topics. Because the set of retrieved pages may contain many pages unrelated to the query topics (noise pages), the resulting Web community is sometimes unsatisfactory. In this paper, we propose an innovative algorithm that eliminates noise pages from the set of retrieved pages and thereby improves its quality. This improvement enables existing community construction algorithms to construct good quality Web page communities. The proposed algorithm reveals and exploits the relationships among the Web pages concerned at a deeper level. Numerical experiment results show the effectiveness and feasibility of the algorithm. The algorithm can also be used on its own to filter out unnecessary Web pages, reducing the management cost and burden of Web-based data management systems. The ideas in the algorithm can also be applied to other hyperlink analysis tasks.
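HITS, the prior work named above, scores each page in the retrieved set as a hub (it links to good authorities) and as an authority (it is linked to by good hubs), and the top-ranked pages form the community. The following is a minimal sketch of that standard iteration, not the authors' noise-elimination algorithm, whose details the abstract does not give. The function name `hits`, the dict-of-lists link representation, the fixed iteration count and the L2 normalisation are assumptions made for illustration. A noise filter of the kind the paper proposes would run on the set of retrieved pages before a routine like this.

```python
import math


def hits(links, iterations=50):
    """Sketch of the standard HITS hub/authority iteration.

    links: dict mapping each page to the list of pages it links to.
    Every page mentioned (as source or target) is treated as part of the
    retrieved set. The iteration count is an assumed fixed value rather
    than a convergence test.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum of (new) authority scores of the pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        # L2-normalise so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Usage on a tiny hypothetical link graph: "e" has no links at all.
hub, auth = hits({"a": ["c", "d"], "b": ["c", "d"], "c": ["d"], "e": []})
print(max(auth, key=auth.get))  # the page cited by the most hubs
```

In this toy graph page "e" ends with zero hub and authority scores. An unrelated page that does carry links, however, can still attract score and distort the community, which is the problem the paper's pre-filtering step targets.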
Cite as: Hou, J. and Zhang, Y. (2002). Constructing Good Quality Web Page Communities. In Proc. Thirteenth Australasian Database Conference (ADC2002), Melbourne, Australia. CRPIT, 5. Zhou, X., Ed. ACS. 65-74.