mmscaModelSelection: Model selection for MMSCA


View source: R/mmscaModelSelection.R

Description

Performs model selection for mmsca(): it selects the regularization parameters and the number of components.

Usage

mmscaModelSelection(
  X,
  ridgeSeq,
  lassoSeq,
  grouplassoSeq,
  elitistlassoSeq,
  ncompSeq,
  tuningMethod = "BIC",
  groups,
  nrFolds = NULL,
  itr = 1e+06,
  nStart = 1,
  tol = 1e-07,
  coorDes = TRUE,
  coorDesItr = 100,
  printProgress = TRUE
)

Arguments

X

A data matrix (an object of class matrix).

ridgeSeq

A sequence of candidate values for the ridge penalty. Specify 0 if this tuning parameter is not wanted.

lassoSeq

A sequence of candidate values for the lasso penalty. Specify 0 if this tuning parameter is not wanted.

grouplassoSeq

A sequence of candidate values for the group lasso penalty. Specify 0 if this tuning parameter is not wanted.

elitistlassoSeq

A sequence of candidate values for the elitist lasso penalty. Specify 0 if this tuning parameter is not wanted.

ncompSeq

A range of integers for the number of components that need to be examined.

tuningMethod

A string indicating which model selection method should be used. "BIC" selects the Bayesian information criterion, "IS" the index of sparseness, and "CV" cross-validation (CV) with the EigenVector method. If CV is used, the number of folds nrFolds needs to be chosen; it should be an integer less than nrow(X). The data are then split into equal-sized chunks in order of appearance.

groups

A vector specifying which columns of X belong to what block. Example: c(10, 100, 1000). The first 10 variables belong to the first block, the 100 variables after that belong to the second block etc.

itr

The maximum number of iterations (a positive integer)

tol

Convergence is determined by comparing the loss function value after each iteration: if the difference is smaller than tol, the analysis has converged. The default value is 1e-07.

coorDes

A boolean; the default shown in Usage is TRUE. If coorDes is FALSE, the majorizing function used to estimate the component weights W conditional on the loadings P is optimized using matrix inverses, which can be slow. If set to TRUE, the majorizing function is optimized (or partially optimized) using coordinate descent; in some cases coordinate descent will be faster.

coorDesItr

An integer specifying the maximum number of iterations for the coordinate descent algorithm; the default shown in Usage is 100. You do not have to run this algorithm until convergence before alternating back to the estimation of the loadings. The tolerance for this algorithm is hardcoded and set to 10^-8.

printProgress

A boolean: TRUE will print the progress of the model selection

nrFolds

An integer that specifies the number of folds cross-validation should use if tuningMethod == "CV". The number of folds needs to be lower than nrow(X).

nStart

The number of random starts the analysis should perform. The first start will be a warm start. Custom starting values cannot be supplied.
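The groups and nrFolds descriptions above both describe a partition of X: columns into blocks, rows into folds. The following is a minimal sketch of how those descriptions can be read, written in base R only. It is not the package's internal code; the names blocks, firstCol, lastCol and foldId are hypothetical, and the contiguous fold assignment is an assumed reading of "in order of appearance".

```r
# Illustration only: base R, not the package's internal code.

# groups = c(10, 100, 1000) gives the block sizes in column order.
groups <- c(10, 100, 1000)
lastCol <- cumsum(groups)          # last column index of each block
firstCol <- lastCol - groups + 1   # first column index of each block
blocks <- data.frame(block = seq_along(groups), firstCol, lastCol)
print(blocks)

# Contiguous, equal-sized folds in order of appearance (assumed reading):
# rows 1-10 form fold 1, rows 11-20 form fold 2, and so on.
n <- 100        # number of rows, as in nrow(X)
nrFolds <- 10   # must be smaller than n
foldId <- ceiling(seq_len(n) * nrFolds / n)
print(table(foldId))
```

Under this reading, a single block covering all variables is specified as groups = ncol(X), which is what the Examples section uses.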

Value

A list containing:
results A list with ncomp elements each containing the following items in a list

bestNcomp The number of components with the best value for the chosen tuning index

Examples

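# Load the package (name taken from the repository, trbKnl/sparseWeightBasedPCA)
library(sparseWeightBasedPCA)

# Simulate a 100 x 10 data matrix
X <- matrix(rnorm(100 * 10), 100, 10)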

out <- mmscaModelSelection(X, 
            ridgeSeq = seq(0, 1, by = 0.1), 
            lassoSeq = 0:100, 
            grouplassoSeq = 0,
            elitistlassoSeq = 0, 
            ncompSeq = 1:3, 
            tuningMethod = "CV", 
            groups = ncol(X), 
            nrFolds = 10, 
            itr = 100000, 
            nStart = 1, 
            coorDes = FALSE, 
            coorDesItr = 100, 
            printProgress = TRUE)

# Inspect the model selection results for the optimal number of components
# according to the tuning method
out$results[[out$bestNcomp]]
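
The sequences passed in the example can add up to a large search. As a rough sketch, assuming every combination of the candidate values is evaluated (the documentation does not state this), the number of candidate models can be counted with base R before starting a long run. The object name grid is hypothetical.

```r
# Illustration only: counts candidate models under the assumption that
# the full Cartesian product of the sequences is examined.
ridgeSeq <- seq(0, 1, by = 0.1)
lassoSeq <- 0:100
grouplassoSeq <- 0
elitistlassoSeq <- 0
ncompSeq <- 1:3

grid <- expand.grid(
  ridge = ridgeSeq,
  lasso = lassoSeq,
  grouplasso = grouplassoSeq,
  elitistlasso = elitistlassoSeq,
  ncomp = ncompSeq
)

# Number of parameter combinations; with CV each may be fitted once per fold
nrow(grid)
```

If the count is larger than the available time allows, shortening lassoSeq or ridgeSeq is the most direct way to reduce it.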

trbKnl/sparseWeightBasedPCA documentation built on July 22, 2020, 10:29 p.m.