Signature Extraction for Overlap Detection in Documents

Finkel, R.A., Zaslavsky, A., Monostori, K. and Schmidt, H.

    Easy access to the Web has led to increased potential for students cheating on assignments by plagiarising others' work. By the same token, Web-based tools offer the potential for instructors to check submitted assignments for signs of plagiarism. Overlap-detection tools are easy to use and accurate in plagiarism detection, so they can be an excellent deterrent to plagiarism. Documents can overlap for other reasons, too: old documents are superseded, and authors summarize previous work identically in several papers. Overlap-detection tools can pinpoint interconnections in a corpus of documents and could be used in search engines. We describe a web-accessible text registry based on signature extraction. We extract a small but diagnostic signature from each registered text for permanent storage and comparison against other stored signatures. This comparison allows us to estimate the amount of overlap between pairs of documents, yet the total time required is linear in the total size of the documents. We compare our algorithm with several alternatives and present both efficiency and accuracy results.
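
    The abstract does not say how a signature is selected, so the following is only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes a generic hash-sampling scheme: every run of consecutive words is hashed, and a fixed fraction of the hashes is kept as the signature. The names `signature`, `estimated_overlap`, `chunk_words` and `keep_one_in` are hypothetical and chosen for illustration.

```python
import hashlib
import re


def signature(text, chunk_words=5, keep_one_in=4):
    """Return a set of sampled chunk hashes for `text`.

    Assumption: chunks are overlapping runs of `chunk_words` words, and a
    chunk is kept when its hash is divisible by `keep_one_in`. This is a
    generic sampling rule, not the selection rule from the paper.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    selected = set()
    # One pass over the words, so the cost is linear in the text size.
    for i in range(len(words) - chunk_words + 1):
        chunk = " ".join(words[i:i + chunk_words])
        digest = hashlib.md5(chunk.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        if value % keep_one_in == 0:
            selected.add(value)
    return selected


def estimated_overlap(sig_a, sig_b):
    """Fraction of the smaller signature that also occurs in the other one."""
    if not sig_a or not sig_b:
        return 0.0
    return len(sig_a & sig_b) / min(len(sig_a), len(sig_b))
```

    Under these assumptions, only the signatures need to be stored, and two registered texts can be compared without rereading either document. Dividing by the smaller signature is one possible choice; it lets a short text that is fully contained in a longer one score close to 1.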
Cite as: Finkel, R.A., Zaslavsky, A., Monostori, K. and Schmidt, H. (2002). Signature Extraction for Overlap Detection in Documents. In Proc. Twenty-Fifth Australasian Computer Science Conference (ACSC2002), Melbourne, Australia. CRPIT, 4. Oudshoorn, M. J., Ed. ACS. 59-64.