cv.svd: Cross-Validation for choosing the rank of an SVD...
In patperry/r-bcv: Cross-Validation for the SVD (Bi-Cross-Validation)


Perform Wold- or Gabriel-style cross-validation to determine the appropriate rank of an SVD approximation to a matrix.

  cv.svd.gabriel(x, krow = 2, kcol = 2, 
                 maxrank = floor(min(n - n/krow, p - p/kcol)))
                 
  cv.svd.wold(x, k = 5, maxrank = 20, tol = 1e-4, maxiter = 20)
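The default `maxrank` for `cv.svd.gabriel` is bounded by the size of the held-in block. With `krow` row folds and `kcol` column folds, about `n - n/krow` rows and `p - p/kcol` columns remain for fitting, so no fitted rank can exceed the smaller of the two. A small sketch of that default, assuming `n` and `p` stand for `nrow(x)` and `ncol(x)`; `gabriel_default_maxrank` is a hypothetical helper, not part of the package:

```r
# Hypothetical helper reproducing the default maxrank expression from the
# cv.svd.gabriel usage line; n and p are assumed to be nrow(x) and ncol(x).
gabriel_default_maxrank <- function(n, p, krow = 2, kcol = 2) {
  # The held-in block has about n - n/krow rows and p - p/kcol columns,
  # so its rank (and any fitted rank) is at most the smaller of the two.
  floor(min(n - n / krow, p - p / kcol))
}

gabriel_default_maxrank(50, 20)  # min(25, 10) = 10
```

For the 50-by-20 matrix used in the examples, this gives 10, the same value passed explicitly there as `maxrank=10`.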

`x`	the matrix to cross-validate.
`k`	the number of folds (for Wold-style CV).
`krow`	the number of row folds (for Gabriel-style CV).
`kcol`	the number of column folds (for Gabriel-style CV).
`maxrank`	the maximum rank to cross-validate up to.
`tol`	the convergence tolerance for `impute.svd`.
`maxiter`	the maximum number of iterations for `impute.svd`.

These functions are for cross-validating the SVD of a matrix. They assume a model $X = U D V' + E$, where $U D V'$ is the signal and $E$ is noise, and try to find the rank at which to truncate the SVD of `x` so as to minimize prediction error. Here, prediction error is measured as the sum of squared residuals between the truncated SVD and the signal part.
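To make the target of cross-validation concrete: in a simulation where the signal is known, the prediction error of every rank-r truncation can be computed directly, and cross-validation tries to estimate this curve without access to the signal. A minimal base-R sketch; the helper `truncated_svd` and the simulation settings (which mirror the examples) are illustrative assumptions:

```r
# Simulate a rank-2 signal plus Gaussian noise, as in the examples.
set.seed(1)
n <- 50; p <- 20; k <- 2
u <- matrix(rnorm(n * k), n, k)
v <- matrix(rnorm(p * k), p, k)
signal <- u %*% t(v)
x <- signal + matrix(rnorm(n * p), n, p)

# Hypothetical helper: rank-r truncation of the SVD of x (rank 0 is the zero matrix).
truncated_svd <- function(x, r) {
  if (r == 0) return(matrix(0, nrow(x), ncol(x)))
  s <- svd(x, nu = r, nv = r)
  s$u %*% diag(s$d[seq_len(r)], r, r) %*% t(s$v)
}

# Sum of squared residuals between each truncation and the (known) signal.
maxrank <- 10
true_err <- sapply(0:maxrank, function(r) sum((truncated_svd(x, r) - signal)^2))
names(true_err) <- 0:maxrank
true_err
which.min(true_err) - 1   # rank with the smallest true prediction error
```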

For both types of cross-validation, in each replicate we leave out part of the matrix, fit an SVD approximation to the left-in part, and measure prediction error on the left-out part.

In Wold-style cross-validation, the holdout set is "speckled", a random set of elements in the matrix. The missing elements are predicted using `impute.svd`.
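The speckled scheme can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `impute_svd_sketch` is a hypothetical stand-in for `impute.svd` (a simple EM-style loop that alternates a rank-r SVD fit with refilling the missing entries), and `wold_cv_sketch` is likewise hypothetical. It reuses one random partition of the entries for every rank and returns a replicates-by-ranks matrix laid out like `msep`.

```r
# Hypothetical stand-in for impute.svd: EM-style rank-r imputation.
impute_svd_sketch <- function(x, r, tol = 1e-4, maxiter = 20) {
  miss <- is.na(x)
  filled <- x
  filled[miss] <- 0                 # initial guess for the missing entries
  if (r == 0) return(filled)        # rank 0 predicts zero for missing entries
  for (iter in seq_len(maxiter)) {
    s <- svd(filled, nu = r, nv = r)
    fit <- s$u %*% diag(s$d[seq_len(r)], r, r) %*% t(s$v)
    # relative change in the imputed values, used as the stopping rule
    change <- sum((fit[miss] - filled[miss])^2) / max(sum(fit[miss]^2), 1e-12)
    filled[miss] <- fit[miss]
    if (change < tol) break
  }
  filled
}

# Hypothetical Wold-style CV: k speckled folds, ranks 0..maxrank.
wold_cv_sketch <- function(x, k = 5, maxrank = 10) {
  fold <- sample(rep_len(seq_len(k), length(x)))  # one partition for all ranks
  msep <- matrix(NA_real_, k, maxrank + 1, dimnames = list(NULL, 0:maxrank))
  for (i in seq_len(k)) {
    held <- fold == i
    x_train <- x
    x_train[held] <- NA             # leave out the speckled holdout set
    for (r in 0:maxrank) {
      xhat <- impute_svd_sketch(x_train, r)
      msep[i, r + 1] <- mean((xhat[held] - x[held])^2)
    }
  }
  msep
}

set.seed(1)
x <- matrix(rnorm(50 * 2), 50, 2) %*% matrix(rnorm(2 * 20), 2, 20) +
  matrix(rnorm(50 * 20), 50, 20)
colMeans(wold_cv_sketch(x, k = 5, maxrank = 4))
```

Each rank requires a full iterative imputation per fold, which illustrates why Wold-style cross-validation is the more expensive of the two.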

In Gabriel-style cross-validation, the holdout set is "blocked". We permute the rows and columns of the matrix and leave out the lower-right block. We use a modified Schur complement to predict the held-out block. In Gabriel-style cross-validation, there are `krow * kcol` folds in total.
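A sketch of the blocked scheme, following the bi-cross-validation estimate of Owen and Perry (2009) as we understand it: after permuting rows and columns, write the matrix as [A B; C D] with D held out, and predict D by `C %*% pinv(A_r) %*% B`, where `A_r` is the rank-r truncated SVD of A. The helpers `gabriel_fold_sketch` and `gabriel_cv_sketch` are hypothetical and may differ from the package's implementation in details such as how the partitions are drawn.

```r
# Hypothetical: prediction errors for one (row set, column set) holdout.
gabriel_fold_sketch <- function(x, rows_out, cols_out, maxrank) {
  A <- x[-rows_out, -cols_out, drop = FALSE]
  B <- x[-rows_out,  cols_out, drop = FALSE]
  C <- x[ rows_out, -cols_out, drop = FALSE]
  D <- x[ rows_out,  cols_out, drop = FALSE]
  s <- svd(A)
  err <- numeric(maxrank + 1)
  err[1] <- mean(D^2)               # rank 0 predicts every entry of D as zero
  for (r in seq_len(maxrank)) {
    # pseudoinverse of the rank-r truncation of A: V_r diag(1/d_r) U_r'
    A_pinv <- s$v[, seq_len(r), drop = FALSE] %*%
      diag(1 / s$d[seq_len(r)], r, r) %*% t(s$u[, seq_len(r), drop = FALSE])
    D_hat <- C %*% A_pinv %*% B
    err[r + 1] <- mean((D - D_hat)^2)
  }
  err
}

# Hypothetical: krow * kcol folds, one row of the result per fold.
gabriel_cv_sketch <- function(x, krow = 2, kcol = 2, maxrank = 10) {
  rowsets <- split(sample(nrow(x)), rep_len(seq_len(krow), nrow(x)))
  colsets <- split(sample(ncol(x)), rep_len(seq_len(kcol), ncol(x)))
  msep <- NULL
  for (i in seq_len(krow)) {
    for (j in seq_len(kcol)) {
      msep <- rbind(msep, gabriel_fold_sketch(x, rowsets[[i]], colsets[[j]], maxrank))
    }
  }
  colnames(msep) <- 0:maxrank
  msep
}

set.seed(1)
x <- matrix(rnorm(50 * 2), 50, 2) %*% matrix(rnorm(2 * 20), 2, 20) +
  matrix(rnorm(50 * 20), 50, 20)
colMeans(gabriel_cv_sketch(x, krow = 2, kcol = 2, maxrank = 5))
```

When the matrix is exactly rank r and A also has rank r, `C %*% pinv(A_r) %*% B` reproduces D exactly, which is the intuition behind this prediction rule.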

`call`	the function call
`msep`	the mean square error of prediction (MSEP); this is a matrix whose columns contain the mean square errors in the predictions of the holdout sets for ranks 0, 1, ..., `maxrank` across the different replicates.
`maxrank`	the maximum rank for which prediction error is estimated; this is equal to `ncol(msep) - 1`.
`krow`	the number of row folds (for Gabriel-style only).
`kcol`	the number of column folds (for Gabriel-style only).
`rowsets`	the partition of rows into `krow` holdout sets (for Gabriel-style only).
`colsets`	the partition of the columns into `kcol` holdout sets (for Gabriel-style only).
`k`	the number of folds (for Wold-style only).
`sets`	the partition of indices into `k` holdout sets (for Wold-style only).
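Because each column of `msep` holds the prediction errors for one rank, a natural summary is the column mean across replicates, and the rank with the smallest mean is one candidate choice. A minimal sketch of that rule, using a hypothetical helper `choose_rank` and a made-up toy matrix; this is an illustration, not necessarily the rule used by `print.cvsvd` or `summary.cvsvd`:

```r
# Hypothetical helper (not part of the package): average msep over
# replicates (rows) and return the rank whose average MSEP is smallest.
# Column 1 is assumed to correspond to rank 0, column 2 to rank 1, and so on.
choose_rank <- function(msep) {
  avg <- colMeans(msep)
  as.integer(which.min(avg)) - 1L
}

# Toy msep-like matrix: 2 replicates, ranks 0 to 3 (made-up numbers).
toy <- rbind(c(5.0, 3.0, 1.00, 2.0),
             c(6.0, 2.0, 1.50, 2.5))
choose_rank(toy)   # column means 5.5, 2.5, 1.25, 2.25 -> rank 2
ncol(toy) - 1      # maxrank covered by this matrix: 3
```

With the objects from the examples, `choose_rank(cvg$msep)` would apply the same rule to the Gabriel-style results.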

Gabriel's version of cross-validation left out a single element of the matrix at a time, which corresponds to n-by-p-fold cross-validation. Owen and Perry generalized Gabriel's idea to larger holdouts, showing that 2-by-2-fold cross-validation often works better.

Wold's original version of cross-validation did not use the EM algorithm to estimate the SVD. He recommended using the NIPALS algorithm instead, which has since faded into obscurity.

Wold-style cross-validation takes a lot more computation than Gabriel-style. The default values of `maxrank`, `tol`, and `maxiter` have been chosen to give up some accuracy in the name of expediency; they may need to be adjusted to get the best results.

Patrick O. Perry

Gabriel, K.R. (2002). Le biplot–outil d'exploration de données multidimensionnelles. J. Roy. Stat. Soc. Series B 40 186–196.

Owen, A.B. and Perry, P.O. (2009). Bi-cross-validation of the SVD and the non-negative matrix factorization. Annals of Applied Statistics 3(2) 564–594.

Wold, S. (1978). Cross-validatory estimation of the number of components in factor and principal components models. Technometrics 20(4) 397–405.

`impute.svd`, `plot.cvsvd`, `print.cvsvd`, `summary.cvsvd`

  # load the package (installed under the name bcv)
  library(bcv)
  set.seed(0)  # for reproducibility

  # generate a rank-2 matrix plus noise
  n <- 50; p <- 20; k <- 2
  u <- matrix( rnorm( n*k ), n, k )
  v <- matrix( rnorm( p*k ), p, k )
  e <- matrix( rnorm( n*p ), n, p )
  x <- u %*% t(v) + e
  
  # perform 5-fold Wold-style cross-validation
  (cvw <- cv.svd.wold( x, 5, maxrank=10 ))
  
  # perform (2,2)-fold Gabriel-style cross-validation
  (cvg <- cv.svd.gabriel( x, 2, 2, maxrank=10 ))

patperry/r-bcv documentation built on May 24, 2019, 8:20 p.m.
