Conferences in Research and Practice in Information Technology
  

Online Version - Last Updated - 20 Jan 2012

 

 
Home
 

 
Procedures and Resources for Authors

 
Information and Resources for Volume Editors
 

 
Orders and Subscriptions
 

 
Published Articles

 
Upcoming Volumes
 

 
Contact Us
 

 
Useful External Links
 

 
CRPIT Site Search
 
    

Detecting Topic Labels for Tweets by Matching Features from Pseudo-Relevance Feedback

Zhang, J., Liu, D., Ong, K.L., Li, Z. and Li, M.

    Detecting a suitable topic label for short texts, e.g. tweets from Twitter, is an important component in many applications including diversity ranking, clustering, information retrieval, and information filtering. To automatically detect topic labels however is a major challenge. The character limit of a short text means the lack of a significant feature space to adequately describe its content in relation to other short texts in a given collection. Therefore, methods like LDA, TF-IDF or similarity measures all fail due to their sensitivity to a small feature space. And when a collection of related short texts are considered, e.g., from a Twitter search, the result set collectively exhibits sparsity and high dimensionality { a nightmare for information processing. A solution to this problem is to expand the feature space through a process known as pseudo-relevance feedback. Unfortunately, they disappoint when subjected to real-world conditions. The fundamental problem lie in the level of noise present in both the short texts and the feedback source, which is often the World Wide Web. We propose a novel pseudo-relevance feedback algorithm to accurately identify topic labels for short texts. Our algorithm robustly handles noise in both the short texts and the feedback source through a method called `feature matching'. Empirical results confirm the efficacy of our algorithm.
Cite as: Zhang, J., Liu, D., Ong, K.L., Li, Z. and Li, M. (2012). Detecting Topic Labels for Tweets by Matching Features from Pseudo-Relevance Feedback. In Proc. Data Mining and Analytics 2012 (AusDM 2012) Sydney, Australia. CRPIT, 134. Zhao, Y., Li, J. , Kennedy, P.J. and Christen, P. Eds., ACS. 9 - 20
pdf (from crpit.com) pdf (local if available) BibTeX EndNote GS