Integrated Scoring For Spelling Error Correction, Abbreviation Expansion and Case Restoration in Dirty Text
Wong, W., Liu, W. and Bennamoun, M.
A growing number of language and speech applications use text from online sources as input. Despite this rise, little work exists on integrated approaches for cleaning dirty text from such sources. This paper presents a mechanism of Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration (ISSAC). ISSAC was first conceived as part of the text preprocessing phase of an ontology engineering project. An evaluation of ISSAC on 400 chat records shows an accuracy of 96.5%, compared with 74.4% for the existing approach based on Aspell alone.
Cite as: Wong, W., Liu, W. and Bennamoun, M. (2006). Integrated Scoring For Spelling Error Correction, Abbreviation Expansion and Case Restoration in Dirty Text. In Proc. Fifth Australasian Data Mining Conference (AusDM2006), Sydney, Australia. CRPIT, 61. Peter, C., Kennedy, P. J., Li, J., Simoff, S. J. and Williams, G. J., Eds. ACS. 83-89.