Empirical Knowledge and Genetic Algorithms for Selection of Amide I Frequencies in Protein Secondary Structure Prediction

Hering, J.A., Haris, P.I. and Innocent, P.R.

    Here we investigate an extension of a previously suggested 'automatic amide I frequency selection procedure', in which we introduce an additional criterion that uses empirical knowledge of regions within the amide I band (1600-1700 cm⁻¹) found to be particularly sensitive to protein secondary structure. We show that the genetic algorithm provides a solution with good protein secondary structure prediction accuracy. Using an evaluation set of 13 infrared spectra of proteins not contained in the reference set, we demonstrate that our method makes good predictions for proteins it has not seen during training. In the present study, the genetic algorithm is guided towards solutions that select a larger number of empirically determined, structure-sensitive amide I frequencies. This yields a minor improvement in prediction accuracy for α-helix and β-sheet structure compared to our previous study, in which no such knowledge was provided. Despite the very limited number of protein spectra in the reference set (18), the neural networks generalized with an overall average standard error of prediction of 4.36% on the evaluation set, which is even better than the 4.8% obtained in the analysis of the reference set. This clearly indicates the potential of our approach once more protein infrared spectra become available to base the analysis on.
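The idea of guiding the genetic algorithm towards empirically structure-sensitive frequencies can be illustrated with a fitness function that combines a prediction-error term with a bonus for the share of selected frequencies lying in known sensitive sub-bands. The following is a minimal sketch under stated assumptions, not the authors' implementation. The 2 cm⁻¹ frequency grid, the sub-band limits in `SENSITIVE`, the bonus `weight`, the truncation-selection GA, and `toy_error` are all hypothetical. In particular, `toy_error` is only a placeholder for the neural-network standard error of prediction that the paper uses.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Candidate amide I frequencies, 1600-1700 cm^-1 on an assumed 2 cm^-1 grid.
FREQS: List[int] = list(range(1600, 1701, 2))

# Hypothetical structure-sensitive sub-bands (cm^-1); illustrative only,
# NOT the regions used in the paper.
SENSITIVE: List[Tuple[int, int]] = [(1620, 1640), (1650, 1658), (1680, 1690)]
SENSITIVE_MASK = [any(lo <= f <= hi for lo, hi in SENSITIVE) for f in FREQS]


def fitness(chrom: Sequence[int],
            error: Callable[[Sequence[int]], float],
            weight: float = 0.5) -> float:
    """Lower is better: prediction error minus a bonus proportional to the
    share of selected frequencies that fall in the sensitive sub-bands."""
    selected = [i for i, bit in enumerate(chrom) if bit]
    if not selected:
        return float("inf")  # an empty selection cannot be evaluated
    share = sum(SENSITIVE_MASK[i] for i in selected) / len(selected)
    return error(chrom) - weight * share


def toy_error(chrom: Sequence[int]) -> float:
    """Hypothetical stand-in for the network's prediction error:
    simply prefers selections of about 10 frequencies."""
    return abs(sum(chrom) - 10) / 10


def evolve(error: Callable[[Sequence[int]], float], pop_size: int = 30,
           generations: int = 50, p_mut: float = 0.02,
           seed: int = 0) -> List[int]:
    """Bit-string GA: truncation selection (the better half survives, which
    keeps the best individual), one-point crossover, and bit-flip mutation.
    Returns the selected frequencies of the best chromosome."""
    rng = random.Random(seed)
    n = len(FREQS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=lambda c: fitness(c, error))[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover position
            child = [1 - g if rng.random() < p_mut else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda c: fitness(c, error))
    return [f for f, bit in zip(FREQS, best) if bit]


if __name__ == "__main__":
    print(evolve(toy_error))
```

Raising `weight` pushes the search towards selections dominated by sensitive frequencies, even at some cost in the error term. Setting it to zero recovers a purely error-driven selection, analogous to the earlier procedure that used no such knowledge.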
Cite as: Hering, J.A., Haris, P.I. and Innocent, P.R. (2004). Empirical Knowledge and Genetic Algorithms for Selection of Amide I Frequencies in Protein Secondary Structure Prediction. In Proc. Second Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand. CRPIT, 29. Chen, Y.-P. P., Ed. ACS. 345-350.