Abstract
Clustering is the process of grouping a set of objects into classes of
similar objects. Although definitions of similarity vary from one
clustering model to another, in most of these models the concept of
similarity is based on distances, e.g., Euclidean distance or cosine
distance. In other words, similar objects are required to have close
values on at least some subset of dimensions. In this paper, we explore a more
general type of similarity. Under the pCluster model we propose, two
objects are similar if they exhibit a coherent pattern on a subset of
dimensions. For instance, in DNA microarray analysis, the expression
levels of two genes may rise and fall synchronously in response to a set
of environmental stimuli. Although the magnitude of their expression
levels may not be close, the patterns they exhibit can be very much alike.
Discovery of such clusters of genes is essential in revealing significant
connections in gene regulatory networks. E-commerce applications, such as
collaborative filtering, can also benefit from the new model, which
captures not only the closeness of values of certain leading indicators
but also the closeness of (purchasing, browsing, etc.) patterns exhibited
by the customers. We introduce an algorithm to detect such clusters and
evaluate its effectiveness on several real and synthetic data
sets.
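
To make the idea of a coherent pattern concrete, the sketch below checks whether two objects differ by a roughly constant shift on a chosen subset of dimensions. The abstract gives no formula, so the pairwise score, the threshold `delta`, the function names, and the sample values are all assumptions made for illustration, not the paper's algorithm.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """Illustrative score: how differently x and y change between dimensions a and b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_coherent(x, y, dims, delta):
    """True if x and y change by nearly the same amount between every pair of dims."""
    return all(p_score(x, y, a, b) <= delta for a, b in combinations(dims, 2))


# Hypothetical expression levels: far apart in magnitude, yet rising and
# falling together on the first three dimensions.
gene1 = [10, 40, 25, 90]
gene2 = [110, 140, 125, 20]

coherent_on_subset = is_coherent(gene1, gene2, dims=[0, 1, 2], delta=1)
coherent_on_all = is_coherent(gene1, gene2, dims=[0, 1, 2, 3], delta=1)
print(coherent_on_subset, coherent_on_all)
```

A distance-based measure would treat these two objects as dissimilar because their values are about 100 apart, while the pattern-based check accepts them on dimensions 0 to 2 and rejects them once dimension 3 is included.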