Description
This function performs sparse PCA (sPCA) and optimises the number of variables to keep on each component using repeated cross-validation.
Arguments
X: A numeric matrix (or data frame) which provides the data for the sparse principal components analysis. It should not contain missing values.
ncomp: Integer, if data is complete
nrepeat: Number of times the cross-validation process is repeated.
folds: Number of folds in 'Mfold' cross-validation. See Details.
test.keepX: Numeric vector of the different numbers of variables to test from the X data set.
center: (Default=TRUE) Logical, whether the variables should be shifted to be zero centered. Only set to FALSE if the data have already been centered. Alternatively, a vector of length equal to the number of columns of
scale: (Default=TRUE) Logical indicating whether the variables should be scaled to have unit variance before the analysis takes place.
BPPARAM: A BiocParallelParam object indicating the type of parallelisation. See Examples.
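A minimal base R illustration of what center and scale mean, assuming they behave like the arguments of the same name in base::scale() (the center description above follows that function's wording; whether tune.spca passes them to scale() unchanged is an assumption). The matrix X is a toy example.

```r
# Toy 3 x 3 matrix, filled column-wise: columns are (1,2,3), (4,5,6), (10,20,30).
X <- matrix(c(1, 2, 3, 4, 5, 6, 10, 20, 30), nrow = 3)

# center = TRUE, scale = TRUE: every column gets zero mean and unit variance.
Xcs <- scale(X, center = TRUE, scale = TRUE)
colMeans(Xcs)        # all (numerically) zero
apply(Xcs, 2, sd)    # all one

# center given as a vector: one value is subtracted from each column, no scaling.
Xv <- scale(X, center = c(2, 5, 20), scale = FALSE)
Xv[2, ]              # second row becomes 0 0 0
```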
Details
Essentially, for the first component and for a grid of numbers of variables to select (keepX), the data are split into training and test sets over a number of repeats and folds (nrepeat and folds), and the components extracted in each split are compared against those from an sPCA model fitted on all the data to determine the optimal keepX. To keep at least 3 samples in each test set, so that the test data can be scaled reliably for the comparison, folds must be <= floor(nrow(X)/3).
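The procedure above can be sketched for the first component in base R. This is a simplified illustration under stated assumptions, not the mixOmics implementation: the sparse loading is obtained by keeping the keepX variables with the largest absolute first-PC loadings (a hard-truncation stand-in for sPCA), test samples are scaled on their own (inferred from the 3-sample rule), and agreement is measured as the absolute correlation between test-sample scores from the training model and from the full-data model. The functions sparse_loading, cv_cor_keepX and tune_keepX are hypothetical.

```r
# Hypothetical stand-in for sPCA on one component: first PCA loading of the
# scaled data, keeping only its keepX largest absolute entries (assumption;
# mixOmics uses its own sparse algorithm).
sparse_loading <- function(X, keepX) {
  v <- prcomp(X, center = TRUE, scale. = TRUE)$rotation[, 1]
  v[rank(-abs(v), ties.method = "first") > keepX] <- 0
  v / sqrt(sum(v^2))  # renormalise to unit length
}

# Mean absolute correlation between test-set scores from models trained
# without that fold and scores from the model fitted on all samples.
cv_cor_keepX <- function(X, keepX, folds, nrepeat = 1) {
  n <- nrow(X)
  if (folds > floor(n / 3)) stop("folds must be <= floor(nrow(X)/3)")
  t_full <- drop(scale(X) %*% sparse_loading(X, keepX))
  cors <- numeric(0)
  for (r in seq_len(nrepeat)) {
    # Balanced random fold labels: every test set gets >= 3 samples.
    fold_id <- sample(rep_len(seq_len(folds), n))
    for (k in seq_len(folds)) {
      test <- fold_id == k
      v_train <- sparse_loading(X[!test, , drop = FALSE], keepX)
      # Test samples are scaled on their own (assumed), hence the 3-sample rule.
      t_test <- drop(scale(X[test, , drop = FALSE]) %*% v_train)
      # abs(): the sign of a principal component is arbitrary.
      cors <- c(cors, abs(cor(t_test, t_full[test])))
    }
  }
  mean(cors)
}

# Grid search: keep the test.keepX value with the highest mean correlation.
tune_keepX <- function(X, test.keepX, folds, nrepeat = 1) {
  scores <- vapply(test.keepX,
                   function(k) cv_cor_keepX(X, k, folds, nrepeat),
                   numeric(1))
  list(cor = setNames(scores, test.keepX),
       keepX = test.keepX[which.max(scores)])
}

set.seed(42)
X <- matrix(rnorm(40 * 20), nrow = 40)  # toy data: 40 samples, 20 variables
res <- tune_keepX(X, test.keepX = c(5, 10, 15), folds = 3, nrepeat = 2)
res$keepX
```

Taking the absolute correlation avoids penalising a training model whose component happens to point in the opposite direction to the full-data component.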
The number of selected variables for each following component is then optimised sequentially. If the number of observations is small (e.g. < 30), Leave-One-Out Cross-Validation is recommended, which can be achieved by setting folds = nrow(X).
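One common way to move from one sparse component to the next is deflation: once the loading of a component is fixed, its contribution is subtracted from the data and the next component is extracted from the residual matrix. The sketch below assumes such a deflation scheme (it is not taken from the mixOmics source) and again uses a hard-truncation stand-in for sPCA. The helpers truncated_svd_loading and sequential_spca are hypothetical, and the latter takes one already-chosen keepX per component rather than tuning them.

```r
# Hypothetical stand-in for one sPCA component: first right singular vector of
# the (already scaled) matrix, truncated to its keepX largest absolute entries.
truncated_svd_loading <- function(X, keepX) {
  v <- svd(X, nu = 0, nv = 1)$v[, 1]
  v[rank(-abs(v), ties.method = "first") > keepX] <- 0
  v / sqrt(sum(v^2))
}

# Extract components one after another; keepX holds one value per component.
sequential_spca <- function(X, keepX) {
  X <- scale(X)
  loadings <- matrix(0, nrow = ncol(X), ncol = length(keepX))
  for (h in seq_along(keepX)) {
    v <- truncated_svd_loading(X, keepX[h])
    loadings[, h] <- v
    scores <- X %*% v           # n x 1 component scores
    X <- X - scores %*% t(v)    # deflation: remove this component from X
  }
  loadings
}

set.seed(1)
X <- matrix(rnorm(40 * 10), nrow = 40)
L <- sequential_spca(X, keepX = c(3, 5))
colSums(L != 0)  # 3 5: selected variables per component
```

Because each component is extracted from the residual of the previous ones, keepX for component 2 can only be judged after keepX for component 1 is fixed, which is why the tuning proceeds component by component.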
Value
A tune.spca object containing:
The function call.
The selected number of variables on each component.
The correlations between the components from the cross-validation runs and those from the model which used all of the data in training.
Examples
library(mixOmics)
data("nutrimouse")
set.seed(42)
nrepeat <- 5
tune.spca.res <- tune.spca(
X = nutrimouse$lipid,
ncomp = 2,
nrepeat = nrepeat,
folds = 3,
test.keepX = seq(5, 15, 5)
)
tune.spca.res
plot(tune.spca.res)
## Not run:
## Parallel processing over repeats using BiocParallel with more workers (CPUs).
## On non-Windows machines, BiocParallel::MulticoreParam() can be used instead
## for faster computation.
BPPARAM <- BiocParallel::SnowParam(workers = max(parallel::detectCores()-1, 2))
tune.spca.res <- tune.spca(
X = nutrimouse$lipid,
ncomp = 2,
nrepeat = nrepeat,
folds = 3,
test.keepX = seq(5, 15, 5),
BPPARAM = BPPARAM
)
plot(tune.spca.res)
## End(Not run)