big_SVD | R Documentation
An algorithm for partial SVD (or PCA) of a Filebacked Big Matrix through the eigendecomposition of the covariance between variables (primal) or between observations (dual). Use this algorithm only if one dimension is much smaller than the other; otherwise, use big_randomSVD.
big_SVD(
  X,
  fun.scaling = big_scale(center = FALSE, scale = FALSE),
  ind.row = rows_along(X),
  ind.col = cols_along(X),
  k = 10,
  block.size = block_size(nrow(X))
)
| Argument | Description |
|---|---|
| X | An object of class FBM. |
| fun.scaling | A function with parameters X, ind.row and ind.col that returns a data.frame with $center and $scale for the columns corresponding to ind.col, so that each of their elements is transformed as $\frac{X_{i,j} - center_j}{scale_j}$. Default doesn't use any scaling. You can also provide your own center and scale by using as_scaling_fun(). |
| ind.row | An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices. |
| ind.col | An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices. |
| k | Number of singular vectors/values to compute. Default is 10. |
| block.size | Maximum number of columns read at once. Default uses block_size(). |
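As a rough illustration of what a `fun.scaling` function computes, the sketch below defines a hypothetical `my_scaling` for an ordinary in-memory matrix. It is an assumption that mirrors the shape of the functions returned by big_scale() (arguments X, ind.row, ind.col; a data.frame with center and scale columns as output); it is not part of bigstatsr and has only been reasoned about for base R matrices, not FBMs.

```r
# Hypothetical scaling function (not part of bigstatsr): one center and one
# scale value per column in `ind.col`, computed over the rows in `ind.row`.
my_scaling <- function(X, ind.row = seq_len(nrow(X)), ind.col = seq_len(ncol(X))) {
  sub <- X[ind.row, ind.col, drop = FALSE]
  data.frame(center = colMeans(sub),
             scale  = apply(sub, 2, sd))
}

# Apply (X[i, j] - center[j]) / scale[j] to the selected rows of a small matrix
set.seed(42)
X <- matrix(rnorm(20 * 5), nrow = 20)
sc <- my_scaling(X, ind.row = 1:10)
X_scaled <- sweep(sweep(X[1:10, ], 2, sc$center, "-"), 2, sc$scale, "/")

# Each scaled column should now have mean 0 and standard deviation 1
round(colMeans(X_scaled), 10)
apply(X_scaled, 2, sd)
```

With big_scale() as in the Examples, the centering uses column means and the scaling uses column standard deviations, which is what this sketch reproduces by hand.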
To get $X \approx U \cdot D \cdot V^T$ (restricted to the k largest singular values):

- if the number of observations is small, this function computes $K_{(2)} = X \cdot X^T \approx U \cdot D^2 \cdot U^T$ and then $V = X^T \cdot U \cdot D^{-1}$;
- if the number of variables is small, this function computes $K_{(1)} = X^T \cdot X \approx V \cdot D^2 \cdot V^T$ and then $U = X \cdot V \cdot D^{-1}$;
- if both dimensions are large, use big_randomSVD instead.
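Both routes can be checked on a small in-memory matrix. The sketch below is a hypothetical helper, `svd_via_eigen`, written in base R only; it is not the bigstatsr implementation. It uses a full eigen() and keeps the first k eigenpairs, and the rule for picking the dual route (`nrow(X) <= ncol(X)`) is an assumption made for illustration. Singular values should match svd(), and singular vectors should match up to a sign per column.

```r
# Partial SVD through an eigendecomposition (illustrative sketch only).
svd_via_eigen <- function(X, k) {
  if (nrow(X) <= ncol(X)) {
    # Dual: K2 = X X^T ~ U D^2 U^T, then V = X^T U D^{-1}
    eig <- eigen(tcrossprod(X), symmetric = TRUE)
    d <- sqrt(eig$values[1:k])
    u <- eig$vectors[, 1:k, drop = FALSE]
    v <- sweep(crossprod(X, u), 2, d, "/")
  } else {
    # Primal: K1 = X^T X ~ V D^2 V^T, then U = X V D^{-1}
    eig <- eigen(crossprod(X), symmetric = TRUE)
    d <- sqrt(eig$values[1:k])
    v <- eig$vectors[, 1:k, drop = FALSE]
    u <- sweep(X %*% v, 2, d, "/")
  }
  list(d = d, u = u, v = v)
}

set.seed(1)
A <- matrix(rnorm(30 * 200), nrow = 30)  # few observations: dual route
B <- t(A)                                # few variables: primal route
k <- 5

ref <- svd(A, nu = k, nv = k)
res_dual <- svd_via_eigen(A, k)
res_primal <- svd_via_eigen(B, k)

# Singular values agree with svd(); vectors agree up to sign
all.equal(res_dual$d, ref$d[1:k])
all.equal(res_primal$d, ref$d[1:k])
all.equal(abs(res_dual$u), abs(ref$u))
all.equal(abs(res_dual$v), abs(ref$v))
```

Only the small $n \times n$ (or $p \times p$) matrix is eigendecomposed, which is why this approach is cheap only when one dimension is small.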
A named list (an S3 class "big_SVD") of

- d, the singular values,
- u, the left singular vectors,
- v, the right singular vectors,
- center, the centering vector,
- scale, the scaling vector.
Note that to obtain the Principal Components, you must use predict on the result. See examples.
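The relation between these components and the principal component scores can be sketched with base R alone. Below, a hypothetical list `fit` mimics the components of a "big_SVD" object, but it is built with svd() on a scaled in-memory matrix rather than with big_SVD. The Examples compute the same quantities by hand (test$u %*% diag(test$d) and a sweep()-based projection) and compare them with predict().

```r
# Hypothetical stand-in for a "big_SVD" result, built with base R.
set.seed(2)
X <- matrix(rnorm(50 * 8), nrow = 50)
train <- 1:40
k <- 3

ctr <- colMeans(X[train, ])
scl <- apply(X[train, ], 2, sd)
Xs <- scale(X[train, ], center = ctr, scale = scl)
s <- svd(Xs, nu = k, nv = k)
fit <- list(d = s$d[1:k], u = s$u, v = s$v, center = ctr, scale = scl)

# Scores of the training rows: U D, which equals the scaled data times V
scores <- sweep(fit$u, 2, fit$d, "*")
all.equal(scores, Xs %*% fit$v, check.attributes = FALSE)

# Scores of new rows: scale with the *training* center/scale, then project on V
X_new <- X[-train, ]
X_new_s <- sweep(sweep(X_new, 2, fit$center, "-"), 2, fit$scale, "/")
scores_new <- X_new_s %*% fit$v
dim(scores_new)  # 10 x 3
```

New rows must be scaled with the stored center and scale vectors, not with their own statistics; otherwise their scores are not comparable with those of the training rows.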
Large matrix computations are made block-wise and are not parallelized, so that the size of these blocks does not have to be reduced. Instead, you can use Microsoft R Open or OpenBLAS to accelerate these block matrix computations. You can also control the number of cores used with bigparallelr::set_blas_ncores().
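The block-wise strategy can be sketched for the dual cross-product $K_{(2)} = X \cdot X^T$: since $X X^T = \sum_b X_b X_b^T$ over column blocks $X_b$, the matrix can be accumulated while reading at most block.size columns at a time. `blockwise_tcrossprod` is a hypothetical base-R helper working on an in-memory matrix, not bigstatsr's code. The blocks are processed sequentially, and each tcrossprod() call is a dense matrix product delegated to the BLAS, which is where an optimized multithreaded BLAS can help.

```r
# Hypothetical helper: accumulate K = X X^T by reading at most
# `block.size` columns of X at a time (sequential loop over blocks).
blockwise_tcrossprod <- function(X, block.size) {
  K <- matrix(0, nrow(X), nrow(X))
  for (start in seq(1, ncol(X), by = block.size)) {
    cols <- start:min(start + block.size - 1, ncol(X))
    # Each block contributes X[, cols] %*% t(X[, cols])
    K <- K + tcrossprod(X[, cols, drop = FALSE])
  }
  K
}

set.seed(3)
X <- matrix(rnorm(20 * 1000), nrow = 20)
K <- blockwise_tcrossprod(X, block.size = 128)
all.equal(K, tcrossprod(X))
```

Larger blocks mean fewer, bigger BLAS calls, which is why parallelizing over blocks (and thus shrinking them) is avoided.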
See also: prcomp
library(bigstatsr)

set.seed(1)

X <- big_attachExtdata()
n <- nrow(X)

# Using only half of the data
ind <- sort(sample(n, n/2))

test <- big_SVD(X, fun.scaling = big_scale(), ind.row = ind)
str(test)
plot(test$u)

pca <- prcomp(X[ind, ], center = TRUE, scale. = TRUE)

# same scaling
all.equal(test$center, pca$center)
all.equal(test$scale, pca$scale)

# scores and loadings are the same or opposite
# except for the last eigenvalue, which is equal to 0
# due to centering of columns
scores <- test$u %*% diag(test$d)
class(test)
scores2 <- predict(test) # use this function to predict scores
all.equal(scores, scores2)
dim(scores)
dim(pca$x)
tail(pca$sdev)
plot(scores2, pca$x[, 1:ncol(scores2)])
plot(test$v[1:100, ], pca$rotation[1:100, 1:ncol(scores2)])

# projecting on new data
X2 <- sweep(sweep(X[-ind, ], 2, test$center, '-'), 2, test$scale, '/')
scores.test <- X2 %*% test$v
ind2 <- setdiff(rows_along(X), ind)
scores.test2 <- predict(test, X, ind.row = ind2) # use this
all.equal(scores.test, scores.test2)
scores.test3 <- predict(pca, X[-ind, ])
plot(scores.test2, scores.test3[, 1:ncol(scores.test2)])