Text sifting is a method of quickly and securely identifying documents for database searching, copy detection, duplicate email detection and plagiarism detection. A small amount of text is extracted from a document using hash functions and is used as the document's fingerprint. We build upon previous work by Broder et al. and Heintze, specifically addressing a certain set of attacks that we discovered to be very powerful against previous systems. We achieve robustness against these attacks with a new selection process. We also give theoretical and experimental results for these and other attacks on text sifting functions.
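The general idea of a hash-based document fingerprint can be sketched as follows. This is a minimal illustration in the spirit of shingle-based sketching, not the selection process proposed in the paper, and it makes no attempt to resist the attacks the paper addresses. The function names, the shingle width `k`, the fingerprint size `size`, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=4):
    """Return every run of k consecutive words (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < k:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def shingle_hash(shingle):
    """Map a shingle to a 64-bit integer using the first 8 bytes of SHA-256."""
    digest = hashlib.sha256(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(text, k=4, size=8):
    """Keep the `size` smallest shingle hashes as the document's fingerprint."""
    hashes = {shingle_hash(s) for s in shingles(text, k)}
    return frozenset(sorted(hashes)[:size])


def resemblance(fp_a, fp_b):
    """Rough overlap score between two fingerprints (Jaccard on the kept hashes)."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

Two documents that share long passages will tend to share selected hashes, so their fingerprints overlap, while unrelated documents will almost never do so. The score here is only a crude indicator computed on the retained hashes, not a calibrated similarity estimate.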
Cite as: Malkin, M. and Venkatesan, R. (2005). Comparison of Texts Streams in the Presence of Mild Adversaries. In Proc. Third Australasian Information Security Workshop (AISW 2005), Newcastle, Australia. CRPIT, 44. Safavi-Naini, R., Montague, P. and Sheppard, N., Eds. ACS. 179-186.