In this study we design a novel algorithm for finding common segments in multiple DNA sequences. To make pattern searching more efficient, we combine hash encoding, quicksort, and ladderlike stepping and/or interval jumping techniques. Because multiple sequence alignment of DNA sequences drawn from large genomic databases is usually time consuming, we develop a three-phase methodology that searches for common sub-segments and reduces the time complexity of pattern matching. In the first phase (coding), DNA nucleotide sequences are transformed into a numerical space. In the second phase (sorting), quicksort is used to reorder the encoded data. In the last phase (searching), ladderlike stepping and interval jumping rules are proposed to make the numerical comparisons more efficient. In addition, two interval segmentation techniques, uniform partition and bitwise partition, are applied before the interval jumping procedure. The segmentation methods are designed according to the length of the search pattern, and the proposed ladderlike searching algorithms provide robust and improved performance. Experimental results show that the algorithms can reduce the time complexity from O(m L_i (L_i - m + 1) + m L_j (L_j - m + 1)) to O(|I_i| + |I_j|).
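The three phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the base-4 encoding, the function names (`encode_kmers`, `common_codes`, `decode`, `common_segments`) and the use of a plain two-pointer scan as a stand-in for ladderlike stepping are all assumptions made for illustration. Python's built-in `sorted` (Timsort) is used in place of quicksort, and interval jumping with uniform or bitwise partitioning is not modelled.

```python
from typing import List

# Assumed encoding: each nucleotide maps to one base-4 digit.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def encode_kmers(seq: str, m: int) -> List[int]:
    """Phase 1 (coding): turn every length-m window into an integer."""
    if m <= 0 or len(seq) < m:
        return []
    modulus = 4 ** m
    value = 0
    codes = []
    for i, ch in enumerate(seq):
        # Rolling update: shift in the new digit, drop the oldest one.
        value = (value * 4 + CODE[ch]) % modulus
        if i >= m - 1:
            codes.append(value)
    return codes


def common_codes(a: List[int], b: List[int]) -> List[int]:
    """Phases 2 and 3: sort both code lists, then step through them together."""
    # Phase 2 (sorting): built-in sort stands in for quicksort here.
    a_sorted = sorted(set(a))
    b_sorted = sorted(set(b))
    # Phase 3 (searching): advance whichever pointer holds the smaller value,
    # so each list is traversed at most once.
    i = j = 0
    common = []
    while i < len(a_sorted) and j < len(b_sorted):
        if a_sorted[i] == b_sorted[j]:
            common.append(a_sorted[i])
            i += 1
            j += 1
        elif a_sorted[i] < b_sorted[j]:
            i += 1
        else:
            j += 1
    return common


def decode(value: int, m: int) -> str:
    """Map an integer code back to its length-m nucleotide string."""
    chars = []
    for _ in range(m):
        chars.append(LETTERS[value % 4])
        value //= 4
    return "".join(reversed(chars))


def common_segments(s1: str, s2: str, m: int) -> List[str]:
    """Return the length-m segments shared by s1 and s2, ordered by code."""
    shared = common_codes(encode_kmers(s1, m), encode_kmers(s2, m))
    return [decode(v, m) for v in shared]
```

For example, `common_segments("ACGTAC", "TTACGG", 3)` returns `["ACG", "TAC"]`. Once the codes are sorted, the comparison step is linear in the sizes of the two code lists, which is the kind of saving over pairwise window comparison that the abstract reports.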
Cite as: Pai, T.-W., Chang, W.-Y., Chang, M.D.-T., Chu, J.-H. and Tai, H.L. (2004). Ladderlike Stepping and Interval Jumping Searching Algorithms for DNA Sequences. In Proc. Second Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand. CRPIT, 29. Chen, Y.-P. P., Ed. ACS. 93-98.