A System for Managing Data Provenance in In Silico Experiments

Trevathan, J., Atkinson, I., Read, W., Sim, N. and Christensen, C.

    In silico experiments use computers or computer simulation to speed up the rate at which scientific discoveries are made. However, the voluminous data generated in such experiments are often recorded in an ad hoc manner without regard to workflow, and the recording often lacks rigorous business rules. The absence of stringent auditing and reporting policies makes it difficult to repeat experiments and largely denies independent parties the ability to verify study results. This paper presents a data provenance management system built on the ICAT metadata storage service, which serves as a viable schema for representing in silico experiments. The system provides a portal interface to integrate ICAT with job execution. We have built on a data repository that can handle data of arbitrary size, complexity and type. The repository can be used in practice to compare, validate and aid in the repetition of historic experiments. Furthermore, data can be verified against external repositories and sources, which will ultimately enhance the scientific merit of in silico experimentation. Our proposed system augments existing applications and therefore does not require users to modify their current experimentation platform. A test case for a pharmacological study is presented to illustrate the proposed system’s versatility for reporting and auditing of experiments and their results.
Cite as: Trevathan, J., Atkinson, I., Read, W., Sim, N. and Christensen, C. (2011). A System for Managing Data Provenance in In Silico Experiments. In Proc. Australasian Database Conference (ADC 2011), Perth, Australia. CRPIT, 115. Heng Tao Shen and Yanchun Zhang, Eds., ACS. 65-74.