10 Feb 2004
STAMP: On Discovery of Statistically Important Pattern Repeats in Long
Sequential Data

Speaker: CAO Huiping
Abstract
This paper focuses on mining periodic patterns while tolerating
some degree of imperfection, in the form of random replacements
of events in an otherwise perfect periodic pattern. It proposes a
new metric, namely generalized information gain, to identify
patterns with events of vastly different occurrence frequencies
and to adjust for deviations from a pattern. In particular, a
penalty can be associated with gaps between pattern
occurrences. This is especially useful in locating repeats in
DNA sequences. In this paper, the authors present an effective mining
algorithm, STAMP, to simultaneously mine significant patterns
and the associated subsequences under the model of generalized
information gain.
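
The abstract does not give the formula for generalized information gain, so the following is only a rough sketch of the idea, not the paper's definition. It assumes that an event's information is the negative logarithm of its empirical frequency (so rare events weigh more), that a fully matching repeat earns the pattern's total information, and that a repeat with replaced events is penalized by the information of the events it misses. The names `event_information` and `pattern_score`, the `*` wildcard, and the scoring rule itself are all hypothetical.

```python
import math
from collections import Counter


def event_information(sequence):
    """Assumed weighting: -log of each event's empirical frequency.

    The log base is the number of distinct events (at least 2), so an
    event occurring with probability 1/|E| has information 1.
    """
    counts = Counter(sequence)
    n = len(sequence)
    base = max(len(counts), 2)
    return {e: -math.log(c / n, base) for e, c in counts.items()}


def pattern_score(sequence, pattern, info, wildcard="*"):
    """Hypothetical score of a periodic pattern over consecutive segments.

    Every event in `pattern` (other than the wildcard) must be a key of
    `info`. A segment that matches all non-wildcard positions adds the
    pattern's information; a segment with replaced events subtracts the
    information of the pattern events it failed to match.
    """
    period = len(pattern)
    pattern_info = sum(info[e] for e in pattern if e != wildcard)
    score = 0.0
    for i in range(0, len(sequence) - period + 1, period):
        segment = sequence[i:i + period]
        missed = [p for p, s in zip(pattern, segment)
                  if p != wildcard and p != s]
        if missed:
            score -= sum(info[p] for p in missed)  # penalty for the deviation
        else:
            score += pattern_info  # reward for a clean repeat
    return score


noisy = "abcabcabdabc"    # third repeat has 'c' replaced by 'd'
perfect = "abcabcabcabc"
info = event_information(noisy)
noisy_score = pattern_score(noisy, "abc", info)
perfect_score = pattern_score(perfect, "abc", info)
```

Under these assumptions the rare event `d` carries more information than the frequent event `a`, and the sequence with one replaced event scores lower than the perfect repetition when both are scored with the same event weights.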