We study the problem of clustering fingerprints with at most p missing values (CMV(p) for short) naturally arising in oligonucleotide fingerprinting, which is an efficient method for characterizing DNA clone libraries. We show that already CMV(2) is NP-hard. We also show that a greedy algorithm yields a min(1 + ln n; 2+p ln l) approximation for CMV(p), and can be implemented to run in O(nl2p) time. Furthermore, we introduce other variants of the problem of clustering fingerprints with at most p missing values based on slightly different optimization criteria and show that they can be approximated in polynomial time with ratios 22p
Cite as: Figueroa, A., Goldstein, A., Jiang, T., Kurowski, M., Lingas, A. and Persson, M. (2005). Approximate Clustering of Fingerprint Vectors with Missing Values. In Proc. Eleventh Computing: The Australasian Theory Symposium (CATS2005), Newcastle, Australia. CRPIT, 41. Atkinson, M. and Dehne, F., Eds. ACS. 57-60.
(from crpit.com)
(local if available)