The Privacy of k-NN Retrieval for Horizontal Partitioned Data - New Methods and Applications

Amirbekyan, A. and Estivill-Castro, V.

    Recently, privacy issues have become important in clustering analysis, especially when data is horizontally partitioned over several parties. Associative queries are the core retrieval operation for many data mining algorithms, especially clustering and k-NN classification, so algorithms that efficiently support k-NN queries are of special interest. We show how to adapt well-known data structures to the privacy-preserving context and what the overhead of this adaptation is. We present an algorithm for k-NN in secure multiparty computation, based on the private computation of several metrics. As a result, we can offer three approaches to associative queries over horizontally partitioned data, with progressively less security. We show privacy-preserving algorithms for data structures that induce a partition on the space, such as KD-Trees. Our next preference is our Privacy Preserving SASH. However, we demonstrate that the most effective approach to achieving privacy is for each party to keep a separate data structure, on which associative queries run separately, followed by a secure combination step that produces the overall output. This idea not only enhances security but also reduces the communication cost between data holders. Our results and protocols also enable us to improve on previous approaches for k-NN classification.
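The data flow of the separate-data-structures approach can be illustrated with a minimal Python sketch. It is not the protocol from the paper: the function names (local_knn, combine, partitioned_knn) are hypothetical, the local search is a brute-force scan rather than a KD-Tree or SASH, and combine is a plain-text placeholder for the secure combination step, so it offers no privacy at all. It only shows why local queries suffice: each of the global k nearest neighbours is among the k nearest neighbours of the party that holds it.

```python
import heapq
import math


def local_knn(points, query, k):
    """One party's local step: the k (distance, point) pairs nearest to query.

    Assumption: brute-force scan with Euclidean distance; a real party
    would query its own index structure instead.
    """
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in points))


def combine(candidate_lists, k):
    """Placeholder for the secure combination step.

    Assumption: this merges candidates in the clear and is NOT secure;
    a secure multiparty protocol would take its place.
    """
    merged = (c for candidates in candidate_lists for c in candidates)
    return heapq.nsmallest(k, merged)


def partitioned_knn(parties, query, k):
    """Run the local query at every party, then combine the candidates."""
    return combine([local_knn(points, query, k) for points in parties], k)


if __name__ == "__main__":
    # Two parties, each holding a horizontal partition (disjoint records).
    party_a = [(0.0, 0.0), (5.0, 5.0), (2.0, 2.0)]
    party_b = [(1.0, 1.0), (9.0, 9.0)]
    for dist, point in partitioned_knn([party_a, party_b], (0.0, 0.0), 3):
        print(point, round(dist, 3))
```

In this arrangement each party sends at most k candidates to the combination step, regardless of how many records it holds, which is consistent with the reduced communication cost mentioned above.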
Cite as: Amirbekyan, A. and Estivill-Castro, V. (2007). The Privacy of k-NN Retrieval for Horizontal Partitioned Data - New Methods and Applications. In Proc. Eighteenth Australasian Database Conference (ADC 2007), Ballarat, Australia. CRPIT, 63. Bailey, J. and Fekete, A., Eds. ACS. 33-42.