tune.spca | R Documentation |
This function performs sparse PCA and optimises the number of variables to keep on each component using repeated cross-validation.
tune.spca(
X,
ncomp = 2,
nrepeat = 1,
folds,
test.keepX,
center = TRUE,
scale = TRUE,
BPPARAM = SerialParam(),
seed = NULL
)
X |
a numeric matrix (or data frame) which provides the data for the sparse principal components analysis. It should not contain missing values. |
ncomp |
Integer, if data is complete |
nrepeat |
Number of times the Cross-Validation process is repeated. |
folds |
Number of folds in 'Mfold' cross-validation. See details. |
test.keepX |
numeric vector for the different number of variables to
test from the |
center |
(Default=TRUE) Logical, whether the variables should be shifted
to be zero centered. Only set to FALSE if data have already been centered.
Alternatively, a vector of length equal the number of columns of |
scale |
(Default=TRUE) Logical indicating whether the variables should be scaled to have unit variance before the analysis takes place. |
BPPARAM |
A BiocParallelParam object indicating the type of parallelisation. See examples. |
seed |
Set a number here if you want the function to give reproducible outputs. Not recommended during exploratory analysis. Note that if RNGseed is set in 'BPPARAM', it will be overwritten by 'seed'. |
Essentially, for the first component, and for a grid of the number of
variables to select (keepX), a number of repeats and folds, data are
split to train and test and the extracted components are compared against
those from a spca model with all the data to ascertain the optimal keepX.
In order to keep at least 3 samples in each test set for reliable scaling
of the test data for comparison, folds must be <= floor(nrow(X)/3).
The number of selected variables for the following components will then be
sequentially optimised. If the number of observations is small (e.g. < 30),
it is recommended to use Leave-One-Out Cross-Validation which can be
achieved by setting folds = nrow(X).
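The bound on folds described above can be checked before calling the function. The following is a minimal base R sketch on toy data; the matrix and the helper check_folds are hypothetical and not part of the package, and the sketch covers only the stated Mfold bound, not the Leave-One-Out setting.

```r
# Toy data standing in for X: 40 samples, 10 variables (values are arbitrary)
set.seed(1)
X <- matrix(rnorm(40 * 10), nrow = 40, ncol = 10)

# Largest fold count that still leaves at least 3 samples in every test set
max_folds <- floor(nrow(X) / 3)

# Hypothetical helper (not a package function): TRUE if the requested
# number of folds respects the bound folds <= floor(n / 3)
check_folds <- function(folds, n) {
  folds <= floor(n / 3)
}

max_folds
check_folds(5, nrow(X))
check_folds(20, nrow(X))
```

With 40 samples, any fold count up to 13 satisfies the bound, so folds = 5 passes and folds = 20 does not.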
A tune.spca object containing:
The function call
The selected number of variables to keep on each component
The correlations between the components from the cross-validated studies and those from the study which used all of the data in training.
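To illustrate what a correlation between cross-validated and full-data components means, here is a rough base R sketch. It is not the package's implementation: it assumes ordinary PCA via prcomp as a stand-in for sparse PCA, uses simulated data, looks only at the first component, and projects the held-out samples with the training centering and scaling, which may differ from how the function scales test data.

```r
set.seed(42)
n <- 30
p <- 8

# Simulated data with one dominant direction plus noise (assumption)
latent <- rnorm(n)
X <- outer(latent, runif(p, 0.5, 1.5)) + matrix(rnorm(n * p, sd = 0.3), n, p)

# Assign each sample to one of 3 folds at random
folds <- 3
fold_id <- sample(rep(seq_len(folds), length.out = n))

# Reference: first component scores from a model fitted on all the data
full_fit <- prcomp(X, center = TRUE, scale. = TRUE)
ref_scores <- full_fit$x[, 1]

# For each fold: fit on the training samples, project the held-out samples,
# and correlate their scores with the reference scores of the same samples.
# abs() is used because the sign of a principal component is arbitrary.
fold_cor <- sapply(seq_len(folds), function(k) {
  test <- fold_id == k
  train_fit <- prcomp(X[!test, , drop = FALSE], center = TRUE, scale. = TRUE)
  test_scores <- predict(train_fit, newdata = X[test, , drop = FALSE])[, 1]
  abs(cor(test_scores, ref_scores[test]))
})

fold_cor
mean(fold_cor)
```

A correlation close to 1 in every fold suggests the component is stable under resampling; in the tuning setting, this kind of comparison would be repeated for each candidate keepX value.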
library(mixOmics)
data("nutrimouse")
nrepeat <- 5
tune.spca.res <- tune.spca(
X = nutrimouse$lipid,
ncomp = 2,
nrepeat = nrepeat,
folds = 3,
test.keepX = seq(5, 15, 5),
seed = 42
)
tune.spca.res
plot(tune.spca.res)
## Not run:
## parallel processing using BiocParallel on repeats with more workers (cpus)
# Check if the environment variable exists (during R CMD check) and limit cores accordingly
max_cores <- if (Sys.getenv("_R_CHECK_LIMIT_CORES_") != "") 2 else parallel::detectCores() - 1
# Setup the parallel backend with the appropriate number of workers
BPPARAM <- BiocParallel::MulticoreParam(workers = max_cores)
tune.spca.res <- tune.spca(
X = nutrimouse$lipid,
ncomp = 2,
nrepeat = nrepeat,
folds = 3,
test.keepX = seq(5, 15, 5),
BPPARAM = BPPARAM
)
plot(tune.spca.res)
## End(Not run)
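The worker count in the parallel example relies on parallel::detectCores() - 1, which can evaluate to NA (when detection fails) or 0 (on a single-core machine). A slightly more defensive variant, offered as a sketch and not as the package's recommendation, could look like this:

```r
# TRUE when R CMD check has set its core-limit environment variable
under_check <- nzchar(Sys.getenv("_R_CHECK_LIMIT_CORES_"))

# detectCores() may return NA if the number of cores cannot be determined
detected <- parallel::detectCores()
if (is.na(detected)) detected <- 1L

# Use 2 workers under R CMD check; otherwise leave one core free,
# but never go below a single worker
max_cores <- if (under_check) 2L else max(1L, detected - 1L)
max_cores
```

The resulting max_cores can then be passed as the workers argument when constructing the BPPARAM object, as in the example above.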