These notes contain some of the considerations I made, when writing the JvCoords package. These notes are likely not of general interest.
library(microbenchmark)
Operating on columns seems noticeably faster than operating on the rows of a matrix.
n <- 100 x <- matrix(rnorm(n*n), n, n) microbenchmark( colSums(x), rowSums(x))
A vector operating on a matrix operates on the rows of the matrix (because matrices are stored column by column):
A <- matrix(1, 3, 4) x <- 1:3 A + x A * x
In order to operate on columns, we can either transpose the matrix before
and after the operation, or (mis-)use the scale()
function:
A <- matrix(2, 3, 4) x <- 1:4 t(t(A) + x) sweep(A, 2, x, FUN="+") scale(A, center = -x, scale = FALSE) A <- matrix(rnorm(10000), 100, 100) x <- rnorm(100) microbenchmark( t(t(A) + x), sweep(A, 2, x, FUN="+"), scale(A, center = -x, scale = FALSE))
X <- matrix(2, 3, 4) a <- 1:4 t(t(X) * a) sweep(X, 2, a, FUN="*") X %*% diag(a) scale(X, center = FALSE, scale = 1/a) X <- matrix(rnorm(10000), 100, 100) a <- rnorm(100) microbenchmark( t(t(X) * a), sweep(X, 2, a, FUN="*"), X %*% diag(a), scale(X, center = FALSE, scale = 1/a))
The best way to compute column averages of a matrix seems to be colMeans()
:
n <- 100 x <- matrix(rnorm(n*n), n, n) microbenchmark( colMeans(x), colSums(x) / n, apply(x, 2, mean), check = function(x) {isTRUE(all.equal(x[[1]], x[[2]]))})
n <- 100 x <- matrix(rnorm(n*n), n, n) microbenchmark( apply(x, 2, sd), sqrt(colSums(x^2) / (n-1) - colSums(x)^2 / n / (n-1)), check = function(x) {isTRUE(all.equal(x[[1]], x[[2]]))})
Column means and standard deviations can also be found by inspecting the
corresponding attributes in the output of the scale()
function
n <- 100 x <- matrix(rnorm(n*n), n, n) microbenchmark( scale(x, center=TRUE, scale=FALSE), scale(x, center=FALSE, scale=TRUE), scale(x, center=TRUE, scale=TRUE) )
The (reduced) SVD
$$A = U D V^\top$$
of an $n \times p$-matrix $A$ can be
computed using the La.svd()
function in R:
This returns matrices s$u
and s$vt
and a vector s$d
such that
A == s$u %*% diag(s$d) %*% s$vt
.
X <- matrix(1:15, 3, 5) s <- La.svd(X) all.equal(X, s$u %*% diag(s$d) %*% s$vt) str(s) round(t(s$u) %*% s$u, 3) round(s$vt %*% t(s$vt), 3)
X <- matrix(1:15, 5, 3) s <- La.svd(X) all.equal(X, s$u %*% diag(s$d) %*% s$vt) str(s) round(t(s$u) %*% s$u, 3) round(s$vt %*% t(s$vt), 3)
The length of the vector s$d
is always min(n, p)
. Since this is a
reduced SVD, the matrices $U$ (for $n > p$) and $V$ (for $n < p$) are not
square, and thus not orthogonal. But we always have
$$
V^\top V = I_{\min(n,p)}
\qquad\mbox{and}\qquad
U^\top U = I_{\min(n,p)},
$$
i.e. the matrices $U$ and $V$ have orthogonal columns.
The function La.svd()
has two arguments, nu
and nv
, which control
how many columns of $U$ and $V$, respectively, are required. The help
page promised that "The computation will be more efficient if both
nu <= min(n, p)
and nv <= min(n, p)
, and even more so if both are zero."
Experimentally, it seems that the only case which makes a difference is the
case where both nu=0
and nv=0
:
n <- 100 X <- matrix(rnorm(n*n), n, n) microbenchmark( La.svd(X), La.svd(X, nu=0), La.svd(X, nv=0), La.svd(X, nu=1, nv=1), La.svd(X, nu=1, nv=0), La.svd(X, nu=0, nv=1), La.svd(X, nu=0, nv=0) )
In this section we discuss methods the shift and scale each individual variable of a multivariate data set so that each variable has meanĀ $0$ and varinanceĀ $1$.
It seems that the scale()
function is not an efficient way to
standardize data:
n <- 100 x <- matrix(rnorm(n*n), n, n) std1 <- function(x) { scale(x) } std2 <- function(x) { n <- nrow(x) col.mean <- colMeans(x) s <- sqrt(colSums(x^2) / (n-1) - colSums(x)^2 / n / (n-1)) t((t(x) - col.mean) / s) } std3 <- function(x) { n <- nrow(x) col.mean <- colMeans(x) s <- sqrt((colSums(x^2) - col.mean^2*n) / (n-1)) t((t(x) - col.mean) / s) } std4 <- function(x) { col.mean <- colMeans(x) xt <- t(x) - col.mean n <- ncol(xt) s <- sqrt(rowSums(xt^2) / (n-1)) t(xt / s) } check.fn <- function(values) { all(sapply(values, function(x) { isTRUE(all.equal(values[[1]], x, check.attributes = FALSE)) })) } microbenchmark(std1(x), std2(x), std3(x), std4(x), times=500L, check = check.fn)
We list the code to transform (additional) data to and from the space of the standardized data.
n <- 100 x <- matrix(rnorm(n*n), n, n) n <- nrow(x) col.mean <- colMeans(x) s <- sqrt((colSums(x^2) - col.mean^2*n) / (n-1)) y <- t((t(x) - col.mean) / s)
$$ y = D^{-1}(x-m) $$
x2 <- x[1,] y2 <- (x2 - col.mean) / s stopifnot(isTRUE(all.equal(y2, y[1,])))
$$ x = Dy + m $$
y2 <- y[1,] x2 <- y2 * s + col.mean stopifnot(isTRUE(all.equal(x2, x[1,])))
The aim of whitening data $X \in \mathbb{R}^{n\times p}$ is to find an affine coordinate transform $Y = AX + b$, such that the column sums of $Y$ all equal $0$ and $\mathrm{Cov}(Y) = I_q$, where $q$ is the rank of the matrix $X$ after centering. There are infinitely many ways to whiten $X$.
Here we use an approach to compute $Y = C^{-1/2} (X - \bar X)$, where $X - \bar X$ denotes the matrix X with the column means subtracted from each column, and $C = \mathrm{Cov}(X)$: Using SVD we can write $X - \bar X$ as $X - \bar X = U D V^\top$ and thus we get $$ C = \mathrm{Cov}(X - \bar X) = \frac{1}{n-1} (X - \bar X)^\top (X - \bar X) = \frac{1}{n-1} V D U^\top U D V^\top = \frac{1}{n-1} V D^2 V^\top $$
This works both for $n \leq p$ and for $n \geq p$.
n <- 3 p <- 20 X <- matrix(rnorm(n*p), n, p) X <- t(t(X) - colMeans(X)) s <- La.svd(X) C <- t(s$vt) %*% diag(s$d^2) %*% s$vt / (n-1) all.equal(C, cov(X))
The matrix $C \in \mathbb{R}^{p \times p}$ has rank at most $\min(n-1, p)$. The rank can be lower, if rows or columns of $X$ are linearly dependent. For $n > p$, and if $C$ is invertible, the matrix $C^{-1/2}$ can be found as $$ C^{-1/2} = \sqrt{n-1} V D^{-1} V^\top. $$
n <- 20 p <- 7 X <- matrix(rnorm(n*p), n, p) X <- t(t(X) - colMeans(X)) s <- La.svd(X) Q <- t(s$vt) %*% diag(1/s$d) %*% s$vt * sqrt(n-1) C <- cov(X) all.equal(Q %*% C %*% Q, diag(7))
In this case the data can be whitened by multiplying it with $C^{-1/2}$ from the right:
n <- 100 p <- 5 X <- matrix(rnorm(n * p), n, p) X <- t(t(X) - colMeans(X)) s <- La.svd(X, nu = 0) Q <- sqrt(n-1) * t(s$vt) %*% diag(1/s$d) %*% s$vt Y <- X %*% Q round(cov(Y), 3)
Since, for $n > p$, we have $V\in\mathbb{R}^{p\times p}$, we have $Y V \in \mathbb{R}^{n\times p}$ and $$ \mathrm{Cov}(Y V) = \frac{1}{n-1} V^\top Y^\top Y V = V^\top V = I_p. $$ Thus, instead of $Y$, we can use $\tilde Y = Y V = \sqrt{n-1} X V D^{-1} V^\top V = \sqrt{n-1} X V D^{-1}$ as a whitened version of $X$, thus slightly reducing computational cost:
n <- 100 p <- 5 X <- matrix(rnorm(n * p), n, p) X <- t(t(X) - colMeans(X)) s <- La.svd(X, nu = 0) Q <- sqrt(n-1) * t(s$vt) %*% diag(1/s$d) Y <- X %*% Q round(cov(Y), 3)
If $q = \mathrm{rank}(X - \bar X) < p$, and in particular if $n \leq p$,
the data is contained in an affine sub-space of dimension $q \leq \min(n-1, p)$,
and in order to have identity covariance, the data must be projected down into a
space of dimensionĀ $q$. We can find $q$ as the number of non-zero entries
of s$d
.
n <- 5 p <- 100 X <- matrix(rnorm(n * p), n, p) X[1,] <- X[2,] # now the rank of the centred matrix should be 3 X.centred <- t(t(X) - colMeans(X)) n.comp <- min(n-1, p) s <- La.svd(X.centred, nu = 0, nv = n.comp) eps <- 1e-14 cols <- which(s$d[seq_len(n.comp)] >= eps) vt <- s$vt[cols, , drop=FALSE] d <- s$d[cols] Y <- sqrt(n-1) * X.centred %*% t(vt) %*% diag(1/d) round(cov(Y), 3)
The code shown in the above listing also works for the full rank case, and we use (a variant of) this code in the package.
Again, we list the code required to transform additional data to and from the space of the whitened data.
n <- 100 x <- matrix(rnorm(n*n), n, n) col.mean <- colMeans(x) x.centred <- t(t(x) - col.mean) n <- nrow(x) p <- ncol(x) n.comp <- min(n-1, p) s <- La.svd(x.centred, nu = 0, nv = n.comp) eps <- 1e-14 cols <- which(s$d[seq_len(n.comp)] >= eps) vt <- s$vt[cols, , drop=FALSE] d <- s$d[cols] y <- x.centred %*% t(vt) %*% diag(sqrt(n-1) / d)
The transform to convert into the space of whitened data is $$ y = \sqrt{n-1} D^{-1} \; V^\top (x-m). $$
x1 <- x[1,] y1 <- as.vector(diag(sqrt(n-1) / d) %*% vt %*% (x1 - col.mean)) stopifnot(isTRUE(all.equal(y1, y[1,])))
Using $V V^\top x = x$ for all $x$ in the pre-image of the whitened space (true???), we find that the inverse transform is $$ x = \frac{1}{\sqrt{n-1}} V D y + m $$
y1 <- y[1,] x1 <- as.vector(t(vt) %*% diag(d / sqrt(n-1)) %*% y1 + col.mean) stopifnot(isTRUE(all.equal(x1, x[1,])))
Again, we list the code required to transform additional data to and from the space of the whitened data.
n <- 100 x <- matrix(rnorm(n*n), n, n) col.mean <- colMeans(x) x.centred <- t(t(x) - col.mean) n <- nrow(x) p <- ncol(x) n.comp <- min(n-1, p) s <- La.svd(x.centred, nu = 0, nv = n.comp) eps <- 1e-14 cols <- which(s$d[seq_len(n.comp)] >= eps) vt <- s$vt[cols, , drop=FALSE] y <- x.centred %*% t(vt) colnames(y) <- paste0('PC', 1:ncol(y))
For comparison, we compute the same result using prcomp()
:
m <- prcomp(x, rank.=99) stopifnot(isTRUE(all.equal(abs(y), abs(m$x))))
The transform to convert into the PCA space is $$y = V^\top (x-m).$$
x1 <- x[1,] y1 <- as.numeric(vt %*% (x1 - col.mean)) stopifnot(isTRUE(all.equal(y1, y[1,], check.attributes = FALSE)))
Using $V V^\top x = x$ for all $x$ in the pre-image of the whitened space (true???), we find that the inverse transform is $$ x = V y + m $$
y1 <- y[1,] x1 <- as.vector(t(vt) %*% y1 + col.mean) stopifnot(isTRUE(all.equal(x1, x[1,])))
In this section we load the jvcoords
library and compare the speed
of our functions to the speed of alternative implementations from
the standard libary.
library(jvcoords)
x <- iris[,1:4] check.fn <- function(values) { all(sapply(values, function(x) { isTRUE(all.equal(values[[1]], x, check.attributes = FALSE)) })) } microbenchmark(jvcoords={m<-standardize(x);m$y}, scale(x), check = check.fn)
loadNamespace("whitening") x <- as.matrix(iris[,1:4]) microbenchmark(jvcoords=whiten(x), whitening=whitening::whiten(x))
x <- iris[,1:4] microbenchmark(PCA(x), prcomp(x), princomp(x), times=1000L)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.