
The Singular Value Decomposition and Principal Component Analysis

As a follow-up to the last post, I want to briefly discuss the relationship between the SVD and Principal Component Analysis (PCA).

SVD

The last post highlighted the fact that the existence of the SVD proves that an $m \times n$ matrix $A$ maps the unit sphere in $\mathbb{C}^n$ to a hyperellipse in $\mathbb{C}^m$. In this way, if we write

\[\begin{equation} A = U \Sigma V^* \end{equation}\]

then the columns of $U$ point along the principal semiaxes of the hyperellipse and the columns of $V$ are their respective preimages.
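To make this concrete, here is a minimal numpy sketch (the matrix and the names here are my own, chosen purely for illustration) showing that $A$ maps each column $v_i$ of $V$ to $\sigma_i u_i$, i.e. to a semiaxis of the hyperellipse:

```python
import numpy as np

# A small example matrix (hypothetical, just for illustration)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Reduced SVD: U is 5x3, s holds the singular values, Vt is V transposed
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each right singular vector v_i is mapped to sigma_i * u_i,
# i.e. the columns of V are the preimages of the hyperellipse semiaxes
print(np.allclose(A @ Vt.T, U * s))  # True
```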

Principal Component Analysis

Principal Component Analysis is a method used in data analysis to find the primary directions of variation in the data. Here we provide a brief explanation.

Given $m$ measurements $a_1, a_2, \ldots , a_m$ of an $n$-dimensional vector, stack these measurements as the rows of an $m \times n$ matrix $A$. For what follows it is crucial that we also center the columns by subtracting their means, so that the mean of each dimension is zero. Then the covariance matrix of the measurements is given by

\[\begin{equation} S = \frac{1}{m-1}A^TA \end{equation}\]
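As a rough sketch (the data here are random and purely illustrative), centering the columns and forming $S$ with numpy might look like this:

```python
import numpy as np

# Hypothetical data: m = 100 measurements of an n = 4 dimensional vector
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 4))

# Center each column (dimension) so its mean is zero
A = A - A.mean(axis=0)

# Covariance matrix S = A^T A / (m - 1)
m = A.shape[0]
S = A.T @ A / (m - 1)

# Sanity check against numpy's estimator (rowvar=False treats columns as variables)
print(np.allclose(S, np.cov(A, rowvar=False)))  # True
```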

The matrix $S$ describes the variation in the data, $A$. Now, all that we need to do is find the directions in which we have the most variation. This can be done by computing the eigendecomposition of the matrix $S$.

\[S = X \Lambda X^{-1}\]

where $\Lambda$ is a diagonal matrix of eigenvalues sorted from largest to smallest and $X$ is the matrix whose columns are the corresponding eigenvectors. These eigenvectors point in the directions of greatest variation. Since $S$ is symmetric positive semi-definite, this decomposition exists, and $X$ can be chosen to be orthogonal, so $X^{-1} = X^T$. It is then common to keep only the first $k$ columns (dimensions), with $k$ chosen to meet some threshold. We can measure how much information we are keeping by comparing the sum of the $k$ largest eigenvalues to the sum of all the eigenvalues.
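A minimal sketch of this procedure, under the assumption that we keep enough components to explain, say, 95% of the variance (the threshold and variable names are my own):

```python
import numpy as np

# Illustrative centered data and its covariance matrix, as above
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 4))
A = A - A.mean(axis=0)
S = A.T @ A / (A.shape[0] - 1)

# S is symmetric positive semi-definite, so eigh applies;
# it returns eigenvalues in ascending order, so flip to descending
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the smallest k whose eigenvalues capture (say) 95% of the total variance
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95)) + 1

# Project the centered data onto the first k principal directions
A_reduced = A @ eigvecs[:, :k]
```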

PCA is usually explained in terms of the eigendecomposition of the covariance matrix of the data, just as we have done here. But, as we will see next, it can be explained just as well using the SVD.

PCA vs SVD

For this section, we assume that the matrix $A$ is real. Using (1), we can rewrite (2) as

\[\begin{align*} S &= \frac{1}{m-1}(U \Sigma V^T)^T(U \Sigma V^T)\\ &= \frac{1}{m-1} V\Sigma^TU^TU\Sigma V^T \end{align*}\]

Since $A$ is real, $U$ and $V$ are orthogonal, so $U^TU = I$ and $\Sigma^T U^T U \Sigma = \Sigma^T \Sigma = \Sigma^2$ (where $\Sigma^2$ denotes the $n \times n$ diagonal matrix of squared singular values). The last equation becomes

\[\begin{equation} S = V\frac{\Sigma^2}{m-1}V^T \end{equation}\]

Comparing (3) with the eigendecomposition $S = X \Lambda X^{-1}$, and recalling that $V$ is orthogonal so $V^T = V^{-1}$, we see that

\[\begin{align} X &= V\\ \Lambda &= \frac{\Sigma^2}{m-1} \end{align}\]

That is, the matrix $X$ describing the directions of greatest variation in the data is the same as the matrix $V$ containing the preimages of the semiaxes of the hyperellipse that results from applying $A$ to the unit sphere. The eigenvalues $\lambda_i$ of the covariance matrix are related to the singular values $\sigma_i$ of $A$ by

\[\lambda_i = \frac{\sigma_i^2}{m-1}\]
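As a closing sanity check, here is a small numpy sketch (random illustrative data, assuming distinct eigenvalues so that each direction is determined only up to sign) verifying both relations numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))
A = A - A.mean(axis=0)          # PCA assumes centered columns
m = A.shape[0]

# Eigendecomposition of the covariance matrix (flipped to descending order)
S = A.T @ A / (m - 1)
eigvals, X = np.linalg.eigh(S)
eigvals, X = eigvals[::-1], X[:, ::-1]

# SVD of the centered data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# lambda_i = sigma_i^2 / (m - 1)
print(np.allclose(eigvals, s**2 / (m - 1)))   # True

# Columns of V agree with the eigenvectors of S up to sign
print(np.allclose(np.abs(Vt), np.abs(X.T)))   # True
```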