Comparison of Texts Streams in the Presence of Mild Adversaries

Malkin, M. and Venkatesan, R.

    Text sifting is a method of quickly and securely identifying documents for database searching, copy detection, duplicate email detection and plagiarism detection. A small amount of text is extracted from a document using hash functions and is used as the document's fingerprint. We build upon previous work by Broder et al. and Heintze, specifically addressing a certain set of attacks that we discovered to be very powerful against previous systems. We achieve robustness against these attacks with a new selection process. We also give theoretical and experimental results for these and other attacks on text sifting functions.
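
As a rough illustration of the fingerprinting idea described in the abstract, the following Python sketch hashes overlapping word sequences and keeps a small subset of the hashes as the fingerprint. It is a generic example of hash-based selection, not the selection process proposed in the paper. The function names, the window length `k`, the fingerprint size `size`, and the "keep the smallest hashes" rule are all assumptions made for illustration.

```python
import hashlib


def shingles(text, k=5):
    """Return all overlapping k-word sequences of the text (hypothetical helper)."""
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))]


def hash64(s):
    """Map a string to a 64-bit integer using the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")


def fingerprint(text, k=5, size=8):
    """Keep the `size` smallest shingle hashes as the fingerprint.

    Assumption: selecting the smallest hashes is one simple selection rule;
    the paper's own selection process is not reproduced here.
    """
    hashes = {hash64(s) for s in shingles(text, k)}
    return set(sorted(hashes)[:size])


def overlap(fp_a, fp_b):
    """Crude similarity score: shared hashes divided by all hashes seen."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)
```

Two documents can then be compared with `overlap(fingerprint(a), fingerprint(b))`; a score near 1 suggests the texts share most of their selected word sequences. A selection rule this simple is exactly the kind of scheme an adversary may try to defeat with small edits, which is the problem the abstract says the paper addresses.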
Cite as: Malkin, M. and Venkatesan, R. (2005). Comparison of Texts Streams in the Presence of Mild Adversaries. In Proc. Third Australasian Information Security Workshop (AISW 2005), Newcastle, Australia. CRPIT, 44. Safavi-Naini, R., Montague, P. and Sheppard, N., Eds. ACS. 179-186.