Cyberbullying Detection based on Text-Stream Classification
Nahar, V., Li, X., Pang, C. and Zhang, Y.
Current studies that treat cyberbullying detection as a text classification
problem mainly assume that the streaming text can be fully labelled. However,
the exponential growth of unlabelled data in online content makes this
assumption impractical. In this paper, we propose a session-based framework for
the automatic detection of cyberbullying in large volumes of unlabelled
streaming text. Because streaming data from Social Networks arrives at the
server system in large volume, we incorporate an ensemble of one-class
classifiers into the session-based framework. The proposed framework addresses
the real-world scenario in which only a small set of positive instances is
available for initial training. Our main contribution in this paper is to
automatically detect cyberbullying in real-world situations, where labelled
data is not readily available. Our early results show that the proposed
approach is reasonably effective for the automatic detection of cyberbullying
on Social Networks. The experiments indicate that the ensemble learner
outperforms the single-window and fixed-window approaches when learning from
positive and unlabelled data.
Cite as: Nahar, V., Li, X., Pang, C. and Zhang, Y. (2013). Cyberbullying Detection based on Text-Stream Classification. In Proc. Eleventh Australasian Data Mining Conference (AusDM13), Canberra, Australia. CRPIT, 146. Christen, P., Kennedy, P., Liu, L., Ong, K.L., Stranieri, A. and Zhao, Y. Eds., ACS. 49-57.