Performance and Cost Tradeoffs in Web Search

Craswell, N., Crimmins, F., Hawking, D. and Moffat, A.

Web search engines crawl the web to fetch the data that they index. In this paper we re-examine that need and evaluate the network costs associated with data acquisition, as well as alternative ways in which a search service might be supported. As a concrete example, we use the Research Finder search service provided at http://rf.panopticsearch.com, together with information derived from its crawl and query logs. Based on an analysis of the Research Finder system, we introduce a hybrid arrangement in which queries are evaluated partly against a centrally maintained index covering a subset of the collection, and partly by forwarding them to the local search services maintained by the sites that make up the remainder of the collection. We also examine various ways in which crawling costs can be reduced.
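The abstract describes the hybrid arrangement only at a high level. The Python below is a minimal sketch of the routing idea under stated assumptions, not the authors' implementation. The names `CentralIndex` and `hybrid_search`, the raw term-count scoring, and the modelling of each local search service as a plain callable are all hypothetical. The sketch also assumes that scores returned by different sources are directly comparable, which real systems generally cannot assume; merging results from independent engines is a hard problem in its own right.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical result type: (url, score). Scores from all sources are
# ASSUMED to be on a comparable scale so they can be merged directly.
Hit = Tuple[str, float]

# Hypothetical model of a site's own search service: query -> hits.
LocalService = Callable[[str], List[Hit]]


@dataclass
class CentralIndex:
    """Toy inverted index over the crawled (centrally indexed) subset."""
    postings: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def add(self, url: str, text: str) -> None:
        # Record a term frequency for every whitespace-separated term.
        for term in text.lower().split():
            docs = self.postings.setdefault(term, {})
            docs[url] = docs.get(url, 0.0) + 1.0

    def search(self, query: str) -> List[Hit]:
        # Score = sum of term frequencies of query terms (deliberately naive).
        scores: Dict[str, float] = {}
        for term in query.lower().split():
            for url, tf in self.postings.get(term, {}).items():
                scores[url] = scores.get(url, 0.0) + tf
        return list(scores.items())


def hybrid_search(query: str, central: CentralIndex,
                  remote: List[LocalService], k: int = 10) -> List[Hit]:
    """Evaluate part of the query centrally, forward it to the local
    services covering the rest of the collection, then merge."""
    hits = central.search(query)
    for service in remote:
        hits.extend(service(query))
    # Deduplicate by URL, keeping the highest score seen for each.
    best: Dict[str, float] = {}
    for url, score in hits:
        best[url] = max(score, best.get(url, float("-inf")))
    # Rank by descending score, breaking ties by URL for determinism.
    ranked = sorted(best.items(), key=lambda h: (-h[1], h[0]))
    return ranked[:k]


if __name__ == "__main__":
    central = CentralIndex()
    central.add("http://a.example/p1", "web search cost")
    central.add("http://a.example/p2", "crawl cost")

    def site_b(q: str) -> List[Hit]:
        # Stand-in for a remote site's own search engine.
        return [("http://b.example/x", 1.5)] if "crawl" in q else []

    print(hybrid_search("crawl cost", central, [site_b]))
```

Run as a script, this prints `[('http://a.example/p2', 2.0), ('http://b.example/x', 1.5), ('http://a.example/p1', 1.0)]`. The design choice being illustrated is only the split: crawled sites are answered from the central index, while the remaining sites are queried live. The crawl traffic saved for those sites is traded for per-query network requests.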

Cite as: Craswell, N., Crimmins, F., Hawking, D. and Moffat, A. (2004). Performance and Cost Tradeoffs in Web Search. In Proc. Fifteenth Australasian Database Conference (ADC2004), Dunedin, New Zealand. CRPIT, 27. Schewe, K.-D. and Williams, H. E., Eds. ACS. 161-169.
