Decreasing Uncertainty for Improvement of Relevancy Prediction
Zhang, L., Li, Y. and Bijaksana, M.A.
As one of the key techniques of Information Retrieval
(IR) and Information Filtering (IF), Text Classification
focuses on classifying textual documents into predefined
categories using classifiers learned from labelled or
unlabelled training samples. Binary text classification
is the main branch of Text Classification; it involves
predicting the relevance of documents to users or
categories. However, current binary text classifiers
cannot clearly describe the difference between relevant
and irrelevant information because of knowledge
uncertainty, which stems from the imperfection of
knowledge mining techniques and the limitations of
feature selection methods. This paper proposes a
relevance prediction model that decreases the related
uncertainty to improve the performance of binary text
classification. It forms and trains the decision boundary
by partitioning the training samples into three regions
(the positive, boundary and negative regions) to ensure
that the extracted knowledge discriminates between
relevant and irrelevant information. It then produces
six decision rules, corresponding to six different
situations of the related objects, to help make relevance
predictions for those objects. A large number of
experiments have been conducted on two standard
datasets, RCV1 and Reuters21578. The experimental
results show that the proposed model significantly
improves the performance of binary text classification
and is therefore effective and promising.
Cite as: Zhang, L., Li, Y. and Bijaksana, M.A. (2014). Decreasing Uncertainty for Improvement of Relevancy Prediction. In Proc. Twelfth Australasian Data Mining Conference (AusDM14) Brisbane, Australia. CRPIT, 158. Li, X., Liu, L., Ong, K.L. and Zhao, Y. Eds., ACS. 157-164