Conferences in Research and Practice in Information Technology
  

Online Version - Last Updated - 20 Jan 2012

 

 

Building a Dynamic Classifier for Large Text Data Collections

Kalinov, P., Stantic, B. and Sattar, A.

    Due to the lack of in-built tools to navigate the web, people have to use external solutions to find information. The most popular of these are search engines and web directories. Search engines allow users to locate specific information about a particular topic, whereas web directories facilitate exploration over a wider topic. In the recent past, statistical machine learning methods have been successfully exploited in search engines. Web directories remained in their primitive state, which resulted in their decline. Exploration however is a task which answers a different information need of the user and should not be neglected. Web directories should provide a user experience of the same quality as search engines. Their development by machine learning methods however is hindered by the noisy nature of the web, which makes text classifiers unreliable when applied to web data. In this paper we propose Stochastic Prior Distribution Adjustment (SPDA) - a variation of the Multinomial Naïve Bayes (MNB) classifier which makes it more suitable to classify real-world data. By stochastically adjusting class prior distributions we achieve a better overall success rate, but more importantly we also significantly improve error distribution across classes, making the classifier equally reliable for all classes and therefore more usable.
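
    As a rough illustration of why class priors matter in a Multinomial Naïve Bayes classifier, the sketch below trains a tiny MNB model and classifies the same document under two different prior distributions. It is not the SPDA procedure, whose adjustment rule is not given in this abstract; the function names (train_mnb, predict), the toy documents, and the hand-picked adjusted prior are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit a Multinomial Naive Bayes model with Laplace smoothing.

    Returns (log_lik, log_prior), where log_lik[c][w] = log P(w | c)
    and log_prior[c] = log P(c) estimated from class frequencies.
    """
    vocab = {w for d in docs for w in d}
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        counts[label].update(doc)

    log_lik = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        log_lik[label] = {w: math.log((c[w] + alpha) / total) for w in vocab}

    class_freq = Counter(labels)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_freq.items()}
    return log_lik, log_prior


def predict(doc, log_lik, log_prior):
    """Return the class maximising log P(c) + sum of log P(w | c).

    Words outside the training vocabulary are ignored.
    """
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_lik
    }
    return max(scores, key=scores.get)


# Hypothetical, deliberately imbalanced toy data: three "sport" documents
# and a single "politics" document.
docs = [["ball", "goal"], ["ball", "team"], ["ball", "match"], ["vote", "ball"]]
labels = ["sport", "sport", "sport", "politics"]
log_lik, log_prior = train_mnb(docs, labels)

# With priors estimated from the skewed training set, the majority class wins.
with_empirical_prior = predict(["ball"], log_lik, log_prior)

# Assumption: a manually chosen prior, standing in for whatever adjusted
# prior a method such as SPDA might produce.
adjusted_prior = {"sport": math.log(0.4), "politics": math.log(0.6)}
with_adjusted_prior = predict(["ball"], log_lik, adjusted_prior)

print(with_empirical_prior, with_adjusted_prior)
```

    In this toy setting the word likelihoods are unchanged between the two calls; only the prior differs, which is enough to move the decision for an ambiguous document from the majority class to the minority class.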
Cite as: Kalinov, P., Stantic, B. and Sattar, A. (2010). Building a Dynamic Classifier for Large Text Data Collections. In Proc. 21st Australasian Database Conference (ADC 2010) Brisbane, Australia. CRPIT, 104. Shen H.T. and Bouguettaya, A. Eds., ACS. 113-122