Decreasing Uncertainty for Improvement of Relevancy Prediction

Zhang, L., Li, Y. and Bijaksana, M.A.

    As one of the key techniques of Information Retrieval (IR) and Information Filtering (IF), Text Classification focuses on classifying textual documents into predefined categories using classifiers learned from labelled or unlabelled training samples. Binary text classification is the main branch of Text Classification and involves predicting the relevance of documents to users or categories. However, current binary text classifiers cannot clearly describe the difference between relevant and irrelevant information because of knowledge uncertainty, which arises from the imperfection of knowledge mining techniques and the limitations of feature selection methods. This paper proposes a relevance prediction model that reduces this uncertainty to improve the performance of binary text classification. The model forms and trains the decision boundary by partitioning the training samples into three regions (the positive, boundary and negative regions), so that the extracted knowledge discriminates between relevant and irrelevant information. It then produces six decision rules, corresponding to six different situations of the related objects, to help make relevance predictions for those objects. Extensive experiments have been conducted on two standard datasets, RCV1 and Reuters21578. The experimental results show that the proposed model significantly improves the performance of binary text classification, indicating that it is effective and promising.
Cite as: Zhang, L., Li, Y. and Bijaksana, M.A. (2014). Decreasing Uncertainty for Improvement of Relevancy Prediction. In Proc. Twelfth Australasian Data Mining Conference (AusDM14) Brisbane, Australia. CRPIT, 158. Li, X., Liu, L., Ong, K.L. and Zhao, Y. Eds., ACS. 157-164