Sentiment classification of blog posts using topical extracts

Zhou, Z., Zhang, X. and Vines, P.

Unlike news stories and product reviews, which usually have a strong focus on a single topic, blog posts are often unstructured, and opinions expressed in blog posts do not necessarily correspond to a specific topic. This can lead to unsatisfactory performance of sentiment classification. In this paper we report our pilot study on addressing topic drift in blogs. We examine this phenomenon by manual inspection and establish a ground truth. Our annotations show that topic drift is indeed very common, with all documents sampled showing a considerable degree of drift, averaging over 80%. The topical sentences are extracted from each post to produce an extract data set. We propose to address the topic drift problem by classifying the blog posts using the sentence-level polarities of topical extracts. We propose two models for aggregating the sentence polarities and evaluate them by comparing their performance to that of a popular word-based model. Our preliminary results suggest that topical extracts can provide a concise but more accurate representation of the sentiment polarity of the blog posts. More importantly, sentence-level polarities are potentially more reliable evidence than word distributions with regard to document polarity prediction.
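The idea of classifying a post from the sentence-level polarities of its topical extract can be illustrated with a minimal sketch. The abstract does not specify the two aggregation models, so the majority-vote and mean-score rules below are illustrative stand-ins rather than the paper's models; the input format, the function names, and the example scores are likewise assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical input format: (sentence text, is_topical, polarity score),
# with the score in [-1.0, 1.0] as produced by some sentence-level classifier.
Sentence = Tuple[str, bool, float]


def select(sentences: List[Sentence], topical_only: bool) -> List[Sentence]:
    """Return the topical extract, or all sentences if topical_only is False."""
    return [s for s in sentences if s[1]] if topical_only else list(sentences)


def classify_by_vote(sentences: List[Sentence], topical_only: bool = True) -> str:
    """Assumed rule 1: majority vote over the signs of sentence polarities."""
    chosen = select(sentences, topical_only)
    pos = sum(1 for _, _, p in chosen if p > 0)
    neg = sum(1 for _, _, p in chosen if p < 0)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def classify_by_mean(sentences: List[Sentence], topical_only: bool = True) -> str:
    """Assumed rule 2: sign of the mean sentence polarity score."""
    chosen = select(sentences, topical_only)
    if not chosen:
        return "neutral"
    mean = sum(p for _, _, p in chosen) / len(chosen)
    if mean > 0:
        return "positive"
    if mean < 0:
        return "negative"
    return "neutral"


# Made-up post about a phone that drifts into an unrelated complaint.
post: List[Sentence] = [
    ("The new phone has a great camera.", True, 0.8),
    ("Battery life is disappointing.", True, -0.4),
    ("The screen is sharp and bright.", True, 0.6),
    ("My commute was awful today.", False, -0.9),
    ("The traffic made me miserable.", False, -0.8),
]

print(classify_by_vote(post))                      # topical extract only
print(classify_by_mean(post))                      # topical extract only
print(classify_by_mean(post, topical_only=False))  # whole post, drift included
```

In this made-up example the off-topic sentences pull the whole-post score negative, while the topical extract alone is classified as positive, which is the kind of effect topic drift is expected to have on document-level polarity.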

Cite as: Zhou, Z., Zhang, X. and Vines, P. (2012). Sentiment classification of blog posts using topical extracts. In Proc. Australasian Database Conference (ADC 2012), Melbourne, Australia. CRPIT, 124. Zhang, R. and Zhang, Y. Eds., ACS. 71-80.
