# predict.big_SVD: Scores of PCA

In bigstatsr: Statistical Tools for Filebacked Big Matrices

## Description

Get the PCA scores associated with a singular value decomposition (an object of class big_SVD).

## Usage

    ## S3 method for class 'big_SVD'
    predict(
      object,
      X = NULL,
      ind.row = rows_along(X),
      ind.col = cols_along(X),
      block.size = block_size(nrow(X)),
      ...
    )

## Arguments

| Argument | Description |
|---|---|
| object | A list returned by big_SVD or big_randomSVD. |
| X | An object of class FBM. |
| ind.row | An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices. |
| ind.col | An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices. |
| block.size | Maximum number of columns read at once. Default uses block_size. |
| ... | Not used. |
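As a hedged illustration of how ind.row and ind.col fit together, the sketch below fits an SVD on a subset of rows and columns and then projects held-out rows. The simulated data, the object names (obj, ind.train, ind.test, ind.keep) and the choice k = 5 are assumptions made for this example only. It assumes that the same column indices passed to big_SVD should also be passed to predict, so that the loadings line up with the columns being read.

```r
library(bigstatsr)

set.seed(1)
# Small simulated FBM: 50 samples, 20 variables (illustrative data only)
X <- as_FBM(matrix(rnorm(50 * 20), nrow = 50, ncol = 20))

ind.train <- 1:30   # rows used to compute the SVD
ind.test  <- 31:50  # rows to project afterwards
ind.keep  <- 1:15   # columns used in the SVD (positive indices only)

obj <- big_SVD(X, fun.scaling = big_scale(),
               ind.row = ind.train, ind.col = ind.keep, k = 5)

# Without X: scores of the training rows
scores.train <- predict(obj)

# With X: scores of the requested rows, using the same columns as the fit
scores.test <- predict(obj, X, ind.row = ind.test, ind.col = ind.keep)

dim(scores.train)
dim(scores.test)
```

With these inputs, both results have one column per computed PC and one row per index in the corresponding row set.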

## Value

A matrix of size $n \times K$, where $n$ is the number of samples corresponding to the indices in ind.row and $K$ is the number of PCs computed in object. If X is not specified, this returns the scores of the training set of object.
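To make the shape and meaning of the returned matrix concrete, here is a minimal base R sketch of the underlying linear algebra. It does not call bigstatsr. It assumes column centering and scaling estimated on the training rows, and all names (X_train, ctr, scl, K, and so on) are hypothetical. Training scores are $U D$, which equals the scaled training matrix times $V$; new rows are projected by applying the training centers and scales and multiplying by $V$.

```r
set.seed(42)
X <- matrix(rnorm(40 * 6), nrow = 40, ncol = 6)
train <- 1:30
new   <- 31:40

# Centering and scaling are estimated on the training rows only
ctr <- colMeans(X[train, ])
scl <- apply(X[train, ], 2, sd)
X_train <- scale(X[train, ], center = ctr, scale = scl)

K <- 3
dec <- svd(X_train, nu = K, nv = K)

# Training scores: U %*% D, identical to the scaled data times V
scores_ud <- dec$u %*% diag(dec$d[1:K])
scores_xv <- X_train %*% dec$v
all.equal(scores_ud, scores_xv, check.attributes = FALSE)

# New rows: apply the training centers and scales, then project on V
X_new <- scale(X[new, , drop = FALSE], center = ctr, scale = scl)
scores_new <- X_new %*% dec$v
dim(scores_new)  # n = 10 rows, K = 3 columns
```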

## See Also

predict, big_SVD, big_randomSVD

## Examples

    set.seed(1)

    X <- big_attachExtdata()
    n <- nrow(X)

    # Using only half of the data
    ind <- sort(sample(n, n/2))

    test <- big_SVD(X, fun.scaling = big_scale(), ind.row = ind)
    str(test)
    plot(test$u)

    pca <- prcomp(X[ind, ], center = TRUE, scale. = TRUE)

    # same scaling
    all.equal(test$center, pca$center)
    all.equal(test$scale, pca$scale)

    # scores and loadings are the same or opposite
    # except for last eigenvalue which is equal to 0
    # due to centering of columns
    scores <- test$u %*% diag(test$d)
    class(test)
    scores2 <- predict(test) # use this function to predict scores
    all.equal(scores, scores2)
    dim(scores)
    dim(pca$x)
    tail(pca$sdev)
    plot(scores2, pca$x[, 1:ncol(scores2)])
    plot(test$v[1:100, ], pca$rotation[1:100, 1:ncol(scores2)])

    # projecting on new data
    X2 <- sweep(sweep(X[-ind, ], 2, test$center, '-'), 2, test$scale, '/')
    scores.test <- X2 %*% test$v
    ind2 <- setdiff(rows_along(X), ind)
    scores.test2 <- predict(test, X, ind.row = ind2) # use this
    all.equal(scores.test, scores.test2)
    scores.test3 <- predict(pca, X[-ind, ])
    plot(scores.test2, scores.test3[, 1:ncol(scores.test2)])

### Example output

    List of 5
     $ d     : num [1:10] 172.5 117.6 89.6 87.5 87.2 ...
     $ u     : num [1:258, 1:10] -0.1015 -0.0914 -0.0951 -0.0798 -0.0901 ...
     $ v     : num [1:4542, 1:10] 0.00304 -0.00274 0.02779 -0.01381 0.00599 ...
     $ center: num [1:4542] 1.32 1.6 1.56 1.69 1.05 ...
     $ scale : num [1:4542] 0.684 0.572 0.616 0.511 0.692 ...
     - attr(*, "class")= chr "big_SVD"
    [1] TRUE
    [1] TRUE
    [1] "big_SVD"
    [1] TRUE
    [1] 258  10
    [1] 258 258
    [1] 3.037803e+00 3.014497e+00 2.977890e+00 2.953051e+00 2.877736e+00
    [6] 7.611047e-15
    [1] TRUE


bigstatsr documentation built on April 5, 2021, 5:08 p.m.