HKU Research  The University of Hong Kong
Department of Computer Science

 

10 Feb 2004

STAMP: On Discovery of Statistically Important Pattern Repeats in Long Sequential Data
Speaker: CAO Huiping

Abstract

This paper addresses the mining of periodic patterns that tolerate some degree of imperfection, modeled as random replacement of events in an otherwise perfect periodic pattern. It proposes a new metric, generalized information gain, that identifies patterns whose events have vastly different occurrence frequencies and adjusts for deviation from a pattern. In particular, a penalty can be associated with gaps between pattern occurrences, which is especially useful for locating repeats in DNA sequences. The authors present an effective mining algorithm, STAMP, that simultaneously mines significant patterns and their associated subsequences under the generalized information gain model.
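To make the idea of an information-based score with a gap penalty more concrete, here is a minimal Python sketch. It is not the paper's definition of generalized information gain, which the abstract does not spell out. It assumes a simplified scoring rule: each event carries information equal to the negative logarithm of its empirical frequency (so rare events count for more), a pattern's information is the sum over its non-wildcard events, each period-length segment that matches the pattern adds that information, and each non-matching segment subtracts a penalty. The function names, the `*` wildcard, and the `gap_penalty` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def event_information(sequence):
    """Information of each event: -log(p), with the alphabet size as log base.

    Assumption: probabilities are estimated from the sequence itself.
    """
    counts = Counter(sequence)
    total = len(sequence)
    base = max(len(counts), 2)  # avoid a log base of 1 for one-symbol input
    return {e: -math.log(c / total, base) for e, c in counts.items()}


def pattern_information(pattern, info):
    """Sum of event information over non-wildcard positions ('*' is a wildcard)."""
    return sum(info[e] for e in pattern if e != "*")


def gain_with_gap_penalty(sequence, pattern, gap_penalty=None):
    """Illustrative score: reward matching segments, penalize non-matching ones.

    Simplification (not the paper's formula): the sequence is cut into
    consecutive segments of the pattern's length; a matching segment adds
    the pattern's information and a non-matching segment subtracts
    `gap_penalty` (defaults to the pattern's information).
    """
    info = event_information(sequence)
    p_info = pattern_information(pattern, info)
    penalty = p_info if gap_penalty is None else gap_penalty
    period = len(pattern)
    score = 0.0
    for start in range(0, len(sequence) - period + 1, period):
        segment = sequence[start:start + period]
        matches = all(p == "*" or p == s for p, s in zip(pattern, segment))
        score += p_info if matches else -penalty
    return score


if __name__ == "__main__":
    # Two matching segments around one non-matching segment.
    print(gain_with_gap_penalty("abbaab", "ab"))
    print(gain_with_gap_penalty("abbaab", "ab", gap_penalty=0.5))
```

Lowering `gap_penalty` makes the score more tolerant of interruptions between pattern occurrences, which mirrors the role the abstract assigns to the penalty on gaps.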


Comment?  Send to dbgroup@cs.hku.hk