Privacy-Preserving Distributed Queries for a Clinical Case Research Network

Schadow, G., Grannis, S.J. and McDonald, C.J.

We present the motivation, use case, and requirements of a clinical case research network that would allow biomedical researchers to perform retrospective analyses on de-identified clinical cases joined across a large-scale (nationwide) distributed network. Building on semi-join adaptive plans for fusion queries, we discuss how joining can be done in a way that protects the privacy of the individual patients involved. Our method is based on a cryptographically strong keyed-hash algorithm (HMAC). The hash values are truncated, and the resulting hash collisions in semi-join filters are exploited to limit the ability of an apprentice-site to re-identify patients in the filter. As a measure of privacy we use likelihood ratios. Since the join key is based on real person identifiers, we need to apply the methods of record linkage to hashing and semi-join filters. We find that multiple disjunctive rules, as used in deterministic matching, lead here to a higher privacy risk than rules based on a single identifier vector.
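
The truncated keyed-hash idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The choice of SHA-256 as the underlying hash, the 12-bit truncation width, the identifier string format, and the function names truncated_hmac, build_filter, and candidates are all assumptions made for illustration only.

```python
import hmac
import hashlib


def truncated_hmac(key: bytes, identifier: str, bits: int = 12) -> int:
    """Return the leading `bits` bits of HMAC(key, identifier) as an integer.

    SHA-256 and the default width are assumptions; the abstract only says
    that HMAC values are truncated.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).digest()
    full = int.from_bytes(digest, "big")
    # Keep only the most significant `bits` bits of the digest.
    return full >> (len(digest) * 8 - bits)


def build_filter(key: bytes, identifiers, bits: int = 12) -> set:
    """Build a semi-join filter: the set of truncated hashes of the join keys."""
    return {truncated_hmac(key, ident, bits) for ident in identifiers}


def candidates(key: bytes, local_identifiers, semi_join_filter, bits: int = 12) -> list:
    """Return local records whose truncated hash appears in the filter.

    Because of truncation, this list can contain false positives: records
    that collide with a filter entry without being in the sender's set.
    """
    return [
        ident
        for ident in local_identifiers
        if truncated_hmac(key, ident, bits) in semi_join_filter
    ]


if __name__ == "__main__":
    shared_key = b"hypothetical-shared-secret"          # assumed key distribution
    sender = ["SMITH|JOHN|1950-01-01", "DOE|JANE|1972-05-17"]  # made-up identifier vectors
    receiver = sender[:1] + [f"PERSON|{n}|1960-01-01" for n in range(5000)]

    flt = build_filter(shared_key, sender, bits=8)
    hits = candidates(shared_key, receiver, flt, bits=8)
    print(len(flt), "filter entries;", len(hits), "candidate matches at the receiver")
```

The shorter the truncation, the more local records collide with each filter entry, so a site that receives the filter cannot tell which of its colliding records is the one the sender actually holds. The cost is that true matches must be separated from collisions in a later step.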

Cite as: Schadow, G., Grannis, S.J. and McDonald, C.J. (2002). Privacy-Preserving Distributed Queries for a Clinical Case Research Network. In Proc. IEEE ICDM Workshop on Privacy, Security and Data Mining (PSDM 2002), Maebashi City, Japan. CRPIT, 14. Clifton, C. and Estivill-Castro, V., Eds. ACS. 55-65.
