Clustering and Classification of Maintenance Logs using Text Data Mining
Edwards, B., Zatorsky, M. and Nayak, R.
Spreadsheet applications allow data to be stored with low
development overheads, but also with low data quality.
Reporting on data from such sources is difficult using
traditional techniques. This case study uses text data
mining techniques to analyse 12 years of data from dam
pump station maintenance logs stored as free text in a
spreadsheet application. The goal was to classify the data
as scheduled maintenance or unscheduled repair jobs.
Data preparation steps required to transform the data
into a format appropriate for text data mining are
discussed. The data is then mined by calculating term
weights to which clustering techniques are applied.
Clustering identified some groups that contained
relatively homogeneous types of jobs. Training a
classification model to learn the cluster groups allowed
those jobs to be identified in unseen data. Yet clustering
did not provide a clear overall distinction between
scheduled and unscheduled jobs.
With some manual analysis to code a target variable
for a subset of the data, classification models were trained
to predict the target variable based on text features. This
was achieved with a moderate level of accuracy.
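
The abstract does not name the weighting scheme or the classifier used, so the following is only a minimal sketch of the general idea: weight the terms in each free-text log entry, then predict a hand-coded scheduled/unscheduled label from those weights. TF-IDF weighting, the nearest-centroid classifier, the function names and the six log entries are all illustrative assumptions, not the authors' method or data.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real logs would need more cleaning.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency for every term in the training set.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}

def unit(vec):
    # Scale a sparse vector to unit length (an empty vector stays empty).
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def tfidf(doc, idf):
    # Term frequency times IDF; terms unseen in training are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return unit({t: c * idf[t] for t, c in counts.items()})

def cosine(a, b):
    # Both inputs are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def fit_centroids(vectors, labels):
    # One unit-length centroid per class: the sum of that class's vectors.
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for t, w in vec.items():
            acc[t] += w
    return {label: unit(acc) for label, acc in sums.items()}

def predict(text, idf, cents):
    # Assign the class whose centroid is most similar to the entry.
    vec = tfidf(text, idf)
    return max(sorted(cents), key=lambda label: cosine(vec, cents[label]))

# Hypothetical, hand-labelled log entries (not taken from the case study).
train = [
    ("monthly inspection of pump 1 completed", "scheduled"),
    ("annual service of pump 2 motor", "scheduled"),
    ("monthly inspection of pump 3 completed", "scheduled"),
    ("pump 2 tripped replaced faulty relay", "unscheduled"),
    ("repaired leaking seal on pump 1", "unscheduled"),
    ("emergency callout pump 3 failed replaced bearing", "unscheduled"),
]
texts = [text for text, _ in train]
labels = [label for _, label in train]

idf = fit_idf(texts)
cents = fit_centroids([tfidf(t, idf) for t in texts], labels)

print(predict("monthly inspection of pump 2", idf, cents))
print(predict("replaced faulty bearing on pump 1", idf, cents))
```

The same term-weight vectors could be passed to a clustering routine instead of a classifier, which corresponds to the unsupervised step the abstract describes before the target variable was coded by hand.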
Cite as: Edwards, B., Zatorsky, M. and Nayak, R. (2008). Clustering and Classification of Maintenance Logs using Text Data Mining. In Proc. Seventh Australasian Data Mining Conference (AusDM 2008), Glenelg, South Australia. CRPIT, 87. Roddick, J. F., Li, J., Christen, P. and Kennedy, P. J., Eds. ACS. 193-199.