Tree Based Scalable Indexing for Multi-Party Privacy-Preserving Record Linkage
Ranbaduge, T., Christen, P. and Vatsalan, D.
Recently, the linking of multiple databases to identify
common sets of records has gained increasing recognition
in application areas such as banking, health,
insurance, etc. Often the databases to be linked contain
sensitive information, where the owners of the
databases do not want to share any details with any
other party due to privacy concerns. The linkage of
records in different databases without revealing their
actual values is an emerging research discipline known
as privacy-preserving record linkage. Comparison of
records in multiple databases requires significant time
and computational resources to produce the resulting
matching sets of records. At the same time, preserving
the privacy of the data is becoming more problematic
with the increase of database sizes.
We propose a novel indexing (blocking) approach
for privacy-preserving record linkage between multiple
(more than two) parties. Our approach is based on
Bloom filters to encode attribute values into bit vectors.
The Bloom filters are used to construct a single bit
tree, where the encoded records are arranged into
different blocks. The approach requires the parties
to only participate in a secure summation protocol
to find the best bits to construct the trees in a balanced
manner. Leaf nodes of the trees will contain
the blocks with encoded records. These blocks can
finally be compared using private comparison and
classification techniques to determine the similar record
sets in different databases. Experiments conducted
with datasets of sizes up to one million records show
that our protocol is scalable with both the size of the
datasets and the number of parties, while providing
better blocking quality and privacy than a
phonetic-based indexing approach.
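
The blocking idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bigram tokenisation, the double-hashing scheme (SHA-1 plus SHA-256), the 64-bit filter length, the simulated masked summation, and the "closest to half" split criterion are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import hashlib
import secrets
from typing import List, Tuple


def qgrams(value: str, q: int = 2) -> List[str]:
    """Split a string into overlapping substrings of length q."""
    return [value[i:i + q] for i in range(len(value) - q + 1)]


def bloom_encode(value: str, length: int = 64, num_hashes: int = 2) -> List[int]:
    """Encode an attribute value into a Bloom filter (list of 0/1 bits).

    Assumption: double hashing with SHA-1 and SHA-256 over bigrams.
    """
    bits = [0] * length
    for gram in qgrams(value.lower()):
        h1 = int(hashlib.sha1(gram.encode("utf-8")).hexdigest(), 16)
        h2 = int(hashlib.sha256(gram.encode("utf-8")).hexdigest(), 16)
        for i in range(num_hashes):
            bits[(h1 + i * h2) % length] = 1
    return bits


def bit_counts(filters: List[List[int]], length: int) -> List[int]:
    """Count, per bit position, how many of one party's filters have a 1."""
    counts = [0] * length
    for bf in filters:
        for pos, bit in enumerate(bf):
            counts[pos] += bit
    return counts


def secure_sum(party_counts: List[List[int]], modulus: int = 2 ** 32) -> List[int]:
    """Simulate a masked summation in a single process.

    The first party adds a random mask, every party adds its own counts
    to the running total, and the first party removes the mask at the end,
    so no intermediate total reveals an individual party's counts.
    """
    length = len(party_counts[0])
    mask = [secrets.randbelow(modulus) for _ in range(length)]
    running = mask[:]
    for counts in party_counts:
        running = [(r + c) % modulus for r, c in zip(running, counts)]
    return [(r - m) % modulus for r, m in zip(running, mask)]


def best_split_bit(totals: List[int], num_records: int) -> int:
    """Assumed criterion: the bit set in closest to half of all records."""
    half = num_records / 2
    return min(range(len(totals)), key=lambda pos: abs(totals[pos] - half))


def split_block(filters: List[List[int]], pos: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Split one tree node into two child blocks on the chosen bit."""
    zeros = [bf for bf in filters if bf[pos] == 0]
    ones = [bf for bf in filters if bf[pos] == 1]
    return zeros, ones


if __name__ == "__main__":
    # Three hypothetical parties, each holding a few surnames.
    parties = [["smith", "smyth", "jones"], ["smith", "brown"], ["jones", "miller"]]
    encoded = [[bloom_encode(name) for name in names] for names in parties]
    totals = secure_sum([bit_counts(filters, 64) for filters in encoded])
    pos = best_split_bit(totals, sum(len(names) for names in parties))
    for filters in encoded:
        zeros, ones = split_block(filters, pos)
        print(pos, len(zeros), len(ones))
```

Because every party splits on the same globally chosen bit position, records that share that bit value end up in corresponding blocks at each party; applying the split recursively to each child block would yield the tree whose leaves are the final blocks.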
Cite as: Ranbaduge, T., Christen, P. and Vatsalan, D. (2014). Tree Based Scalable Indexing for Multi-Party Privacy-Preserving Record Linkage. In Proc. Twelfth Australasian Data Mining Conference (AusDM14), Brisbane, Australia. CRPIT, 158. Li, X., Liu, L., Ong, K.L. and Zhao, Y. Eds., ACS. 31-42.