Approximation Clustering Models and Methods For Various Data Formats

Boris Mirkin
DIMACS, Rutgers University

Abstract

Clustering is a discipline devoted to finding and describing homogeneous groups of data entities. In contrast to conventional clustering, which processes data in terms of either entities or variables, approximation clustering aims to process data matrices as they are. The principal idea is to approximate a given data table by a "cleaned" model matrix corresponding to a cluster structure. We consider three types of data tables (those of (dis)similarity, object-to-variable, and contingency) and three types of cluster structures (single clusters, partitions, and hierarchies). This places a considerable part of the existing clustering techniques in a unified mathematical framework and yields advanced computational procedures and interpretation aids.
 
