Conferences in Research and Practice in Information Technology
  

Online Version - Last Updated - 20 Jan 2012

 

 
Home
 

 
Procedures and Resources for Authors

 
Information and Resources for Volume Editors
 

 
Orders and Subscriptions
 

 
Published Articles

 
Upcoming Volumes
 

 
Contact Us
 

 
Useful External Links
 

 
CRPIT Site Search
 
    

Building a Generic Graph-based Descriptor Set for use in Drug Discovery

Lock, P., Le Mercier, N., Li, J. and Stumptner, M.

    The ability to predict drug activity from molecular structure is an important field of research both in academia and in the pharmaceutical industry. Raw 3D structure data is not in a form suitable for identifying properties using machine learning so it must be reconfigured into descriptor sets that continue to encapsulate important structural properties of the molecule. In this study, a large number of small molecule structures, obtained from publicly available databases, was used to generate a set of molecular descriptors that can be used with machine learning to predict drug activity. The descriptors were for the most part simple graph strings representing chains of connected atoms. Atom counts averaging seventy, using a dataset of just over one million molecules, resulted in a very large set of simple graph strings of lengths two to twelve atoms. Elimination of duplicates, reverse strings and feature reduction techniques were applied to reduce the path count to about three thousand which was viable for machine learning. Training data from twenty six data sets was used to build a decision tree classifier using J48 and Random Forest. Forty three thousand molecules from the NCI HIV dataset were used with the descriptor set to generate decision tree models with good accuracy. A similar algorithm was used to extract ring structures in the molecules. Inclusion of thirteen ring structure descriptors increased the accuracy of prediction.
Cite as: Lock, P., Le Mercier, N., Li, J. and Stumptner, M. (2009). Building a Generic Graph-based Descriptor Set for use in Drug Discovery. In Proc. Eighth Australasian Data Mining Conference (AusDM`09) Melbourne, Australia. CRPIT, 101. Kennedy P. J., Ong K. and Christen P. Eds., ACS. 167-174
pdf (from crpit.com) pdf (local if available) BibTeX EndNote GS