S2MP: Similarity Measure for Sequential Patterns
Saneifar, H., Bringay, S., Laurent, A. and Teisseire, M.
In data mining, computing the similarity of objects is an
essential task, for example to identify regularities or to
build homogeneous clusters of objects. For sequential
data, which arise in various fields of application (e.g.
series of customer purchases, Internet navigation),
comparing the similarity of sequences is
very important. Some similarity measures, such as
Edit distance and LCS, are already suited to simple sequences,
but these measures are not relevant in the case of complex
sequences composed of sets of items, as is the case
of sequential patterns. In this paper, we propose a new
similarity measure taking the characteristics of sequential
patterns into account. S2MP is an adjustable measure
depending on the importance given to each characteristic
of sequential patterns according to context, which is not
the case for existing measures. We have evaluated the
accuracy and quality of S2MP against Edit distance by
using both in a clustering of sequential patterns. The results
show that the clusters obtained by S2MP are more
homogeneous. Moreover, these clusters are more precise
and more complete than the clusters obtained using
Edit distance. The experiments also show that S2MP
is efficient in terms of calculation time and memory
usage.
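
To illustrate the distinction drawn in the abstract, the following minimal Python sketch computes Edit distance (Levenshtein) and LCS length on simple sequences. It is not S2MP, whose definition is not given here; the function names and sample data are hypothetical and chosen only for illustration. The last lines treat each itemset of a sequential pattern as one atomic symbol, which is one naive way to apply Edit distance to such patterns.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of hashable elements."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Simple sequences: every element is a single item.
simple_a = ["a", "b", "c", "d"]
simple_b = ["a", "c", "d", "e"]

# Sequential patterns: every element is an itemset (hypothetical data).
pattern_a = [frozenset("ab"), frozenset("c")]
pattern_b = [frozenset("a"), frozenset("c")]    # first itemset overlaps pattern_a
pattern_c = [frozenset("xy"), frozenset("c")]   # first itemset is disjoint

print(edit_distance(simple_a, simple_b))
print(lcs_length(simple_a, simple_b))
# Itemsets compared as atomic symbols: partial overlap is ignored,
# so both comparisons below yield the same distance.
print(edit_distance(pattern_a, pattern_b))
print(edit_distance(pattern_a, pattern_c))
```

Under this atomic treatment, `pattern_b` and `pattern_c` are equally distant from `pattern_a`, although only `pattern_b` shares an item with it in the first itemset. This is the kind of information about itemset content that a measure designed for sequential patterns would need to take into account.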
Cite as: Saneifar, H., Bringay, S., Laurent, A. and Teisseire, M. (2008). S2MP: Similarity Measure for Sequential Patterns. In Proc. Seventh Australasian Data Mining Conference (AusDM 2008), Glenelg, South Australia. CRPIT, 87. Roddick, J. F., Li, J., Christen, P. and Kennedy, P. J., Eds. ACS. 95-104.