Revised: September 28, 2006

Published: October 12, 2006

**Keywords:**algorithms, matrix approximation, projective clustering

**ACM Classification:**G.1.3, F.2.1

**AMS Classification:**68W25, 65F15

**Abstract:**
[Plain Text Version]

Frieze, Kannan, and Vempala (JACM 2004) proved that a small sample
of rows of a given matrix $A$ spans the rows of a low-rank
approximation $D$ that minimizes $\|A-D\|_F$ within a small additive
error, and the sampling can be done efficiently using just two
passes over the matrix. In this paper, we generalize this result in
two ways. First, we prove that the additive error drops
exponentially by iterating the sampling in an adaptive manner
(*adaptive sampling*). Using this result, we give a
pass-efficient algorithm for computing a low-rank approximation with
reduced additive error. Our second result is that there exist $k$
rows of $A$ whose span contains the rows of a multiplicative
$(k+1)$-approximation to the best rank-$k$ matrix; moreover, this
subset can be found by sampling $k$-subsets of rows from a natural
distribution (*volume sampling*). Combining *volume
sampling* with *adaptive sampling* yields the existence of a
set of $k+k(k+1)/\epsilon$ rows whose span contains the rows of a
multiplicative $(1+\epsilon)$-approximation. This leads to a PTAS for
the following NP-hard projective clustering problem: Given a set $P$
of points in $\mathbb{R}^d$, and integers $j$ and $k$, find subspaces
$F_1, \dotsc, F_j$, each of dimension at most $k$, that minimize
$\sum_{p \in P} \min_{i} d(p,F_i)^2$.