Phylogenetic Tree of Prokaryotes Based on the Complete Genomes using Fractal and Correlation Analyses

Yu, Z.-G. and Anh, V.

    We develop a fast algorithm for deriving species phylogeny based on the measure representations of DNA sequences and protein sequences proposed in our previous papers (Yu et al., Phys. Rev. E 64, 031903 (2003); Phys. Rev. E 68, 021913 (2003)). Because of the way they are constructed, these two measures are treated as random multiplicative cascades. Such multiplicative cascades commonly have built-in multifractal structures. In this paper we propose to use an iterated function system (IFS) model to simulate the multifractal structures. After the multifractal structures are removed from the original measures, the two kinds of resulting sequences become stationary time series suitable for cross-correlation analysis. We can then define two kinds of correlation distance between two organisms, one from each kind of sequence. Using a large data set of prokaryote genomes, we produce two species trees that are largely in agreement with previously published trees obtained with different methods. These trees also agree well with currently accepted phylogenetic theory.
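As a rough illustration of two steps named in the abstract, the sketch below builds a K-string frequency measure for a DNA sequence and computes a correlation-based distance between two such sequences. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the lexicographic ordering of K-strings over `ACGT`, and the distance formula `(1 - r) / 2` are all assumptions made here, and the IFS fitting and removal of the multifractal structure is omitted entirely (plain mean-centering stands in for it).

```python
from itertools import product
import math

ALPHABET = "ACGT"  # assumed letter ordering; not taken from the abstract


def kmer_measure(seq, k):
    """Relative frequency of every length-k string, in lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing letters outside ACGT
            counts[j] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid k-strings")
    return [c / total for c in counts]


def correlation_distance(x, y):
    """Hypothetical distance (1 - r) / 2, where r is the Pearson correlation.

    Mean-centering stands in for the IFS-based removal of multifractal
    structure described in the abstract, which is NOT implemented here.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    r = cov / math.sqrt(var_x * var_y)
    return (1 - r) / 2
```

With this choice the distance lies between 0 (perfectly correlated measures) and 1 (perfectly anti-correlated), and a matrix of pairwise distances could be passed to any standard tree-building method.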
Cite as: Yu, Z.-G. and Anh, V. (2004). Phylogenetic Tree of Prokaryotes Based on the Complete Genomes using Fractal and Correlation Analyses. In Proc. Second Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand. CRPIT, 29. Chen, Y.-P. P., Ed. ACS. 321-326.