Towards a Feature Rich Model for Predicting Spam Emails containing Malicious Attachments and URLs
Tran, K.N., Alazab, M. and Broadhurst, R.
Malicious content in spam emails is increasing in the
form of attachments and URLs. Malicious attachments
and URLs attempt to deliver software that can
compromise the security of a computer. These malicious
attachments also try to disguise their content to avoid
virus scanners used by most email services to screen for
such risks. Malicious URLs add another layer of disguise,
where the email content tries to entice the recipient to
click on a URL that links to a malicious Web site or
downloads a malicious attachment. In this paper, based on
two real-world data sets, we present our preliminary
research on predicting the kind of spam email most likely
to contain this highly dangerous content. We
propose a rich set of features for the content of emails to
capture regularities in emails containing malicious
content. We show these features can predict malicious
attachments with an area under the precision-recall curve
(AUC-PR) of up to 95.2%, and up to 68.1% for URLs. Our
work can help reduce reliance on virus scanners and URL
blacklists, which often do not update as quickly as the
malicious content they attempt to identify. Such methods
could reduce the many different resources now needed to
identify malicious content.
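
The AUC-PR figure quoted above summarises how well a classifier ranks malicious emails ahead of benign ones. The abstract does not say how the area was computed, so the following is only a minimal sketch that assumes the common average-precision estimate. The function name `average_precision`, the 1/0 label encoding, and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate the area under the precision-recall curve as average precision.

    labels: 1 for malicious, 0 for benign (assumed encoding).
    scores: classifier scores, higher meaning more likely malicious.
    Tied scores are ranked in input order; no special tie handling.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank emails from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank; each positive adds a recall step of 1/total_pos.
            precision_sum += true_pos / rank
    return precision_sum / total_pos


# Toy data, made up for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(toy_labels, toy_scores), 4))  # 0.8333
```

Unlike the area under the ROC curve, this measure does not reward a classifier for the many benign emails it correctly ranks low, which suits settings where malicious emails are a small minority of the data.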
Cite as: Tran, K.N., Alazab, M. and Broadhurst, R. (2013). Towards a Feature Rich Model for Predicting Spam Emails containing Malicious Attachments and URLs. In Proc. Eleventh Australasian Data Mining Conference (AusDM13) Canberra, Australia. CRPIT, 146. Christen, P., Kennedy, P., Liu, L., Ong, K.L., Stranieri, A. and Zhao, Y. Eds., ACS. 161-171