Document DNA: Content Centric Provenance Data Tracking in Documents

Rinck, M., Hinze, A. and Bainbridge, D.

    This paper presents Document DNA, a new content-centric approach to provenance data tracking. We present the results and analysis of our exploratory study on the re-use and re-finding of content contained in digital documents. The study's results support our content-centric approach, as users have difficulty keeping track of re-used content. Document DNA is a distributed approach that tracks content when it is copied and pasted between documents by attaching a signature to the content. This signature evolves with the changes made to the content, so those changes can be traced. By choosing a distributed approach, we achieve independence from central management systems. Since the Document DNA is adapted on the fly, no post-hoc analysis is needed, which makes our approach very accurate. This paper includes a detailed description of the initial study, the theoretical concept and, finally, the prototype that implements this concept as a Microsoft Word add-in.
Cite as: Rinck, M., Hinze, A. and Bainbridge, D. (2014). Document DNA: Content Centric Provenance Data Tracking in Documents. In Proc. Thirty-Seventh Australasian Computer Science Conference (ACSC 2014) Auckland, New Zealand. CRPIT, 147. Thomas, B. and Parry, D. Eds., ACS. 57-66
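
The abstract describes a signature that travels with copied content and evolves as that content is edited. The following is a minimal sketch of that idea, not the authors' implementation: the `Fragment` class, the hash-chain representation of the signature and the `shared_history` helper are all assumptions made for illustration, and the Microsoft Word prototype described in the paper may work quite differently.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def _digest(parent: str, text: str) -> str:
    """Hash the previous signature entry together with the current text."""
    return hashlib.sha256(f"{parent}\n{text}".encode("utf-8")).hexdigest()


@dataclass
class Fragment:
    """Hypothetical piece of content that carries its own signature chain."""
    text: str
    dna: List[str] = field(default_factory=list)

    @classmethod
    def create(cls, text: str) -> "Fragment":
        # Newly authored content starts a fresh chain.
        return cls(text, [_digest("", text)])

    def copy(self) -> "Fragment":
        # Copy and paste: the signature travels with the content,
        # so no central registry has to be consulted.
        return Fragment(self.text, list(self.dna))

    def edit(self, new_text: str) -> "Fragment":
        # An edit appends a new entry derived from the previous one,
        # so the chain records how the content evolved.
        return Fragment(new_text, self.dna + [_digest(self.dna[-1], new_text)])


def shared_history(a: Fragment, b: Fragment) -> int:
    """Number of leading signature entries two fragments have in common."""
    count = 0
    for x, y in zip(a.dna, b.dna):
        if x != y:
            break
        count += 1
    return count


original = Fragment.create("Provenance matters.")
pasted = original.copy()                      # pasted into another document
revised = pasted.edit("Provenance matters a lot.")
print(shared_history(original, revised))      # common ancestry length
```

Under these assumptions, two fragments with a non-zero shared history descend from the same original content, and the point where their chains diverge marks where one of them was edited independently.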