cvSelectParams: Fold-Specific Selection of Contrastive and Penalization...


View source: R/scPCA.R

Description

A wrapper function that calls the internal fitting functions to select the optimal contrastive and penalization parameters via cross-validation. For internal use only.

Usage

cvSelectParams(
  fold,
  target,
  background,
  center,
  scale,
  n_eigen,
  alg = alg,
  contrasts,
  penalties,
  clust_method,
  n_centers,
  max_iter,
  linkage_method,
  n_medoids,
  parallel,
  clusters,
  eigdecomp_tol,
  eigdecomp_iter
)

Arguments

fold

Object specifying cross-validation folds as generated by a call to make_folds.
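As a rough illustration of what a single fold represents, the base R sketch below splits observation indices into training and validation sets. The names n, v, fold_id and folds are hypothetical, and the object actually returned by make_folds has its own structure, which is not reproduced here.

```r
# Hypothetical setup: 20 observations split into 5 folds.
set.seed(1)
n <- 20
v <- 5

# Randomly assign each observation to one of the v folds (balanced sizes).
fold_id <- sample(rep(seq_len(v), length.out = n))

# For each fold, hold out its observations for validation and train on the rest.
folds <- lapply(seq_len(v), function(k) {
  list(
    training = which(fold_id != k),
    validation = which(fold_id == k)
  )
})
```

Each element of folds plays the role that the fold argument plays here: it identifies which rows are used for fitting and which are held out.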

target

The target (experimental) data set, in a standard format such as a data.frame or matrix.

background

The background data set, in a standard format such as a data.frame or matrix. Note that the number of features must match the number of features in the target data.
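The feature-count requirement can be checked before any fitting is attempted. The following is a minimal sketch on toy data; check_features is an illustrative helper and not part of scPCA.

```r
# Hypothetical toy data: 5 target and 8 background observations, 3 features each.
set.seed(1)
target <- matrix(rnorm(15), nrow = 5, ncol = 3)
background <- matrix(rnorm(24), nrow = 8, ncol = 3)

# Illustrative helper (not an scPCA function): stop if the feature counts differ.
check_features <- function(target, background) {
  target <- as.matrix(target)
  background <- as.matrix(background)
  if (ncol(target) != ncol(background)) {
    stop("target and background must have the same number of features")
  }
  invisible(TRUE)
}

check_features(target, background)
```

The two data sets may have different numbers of observations (rows); only the number of columns has to agree.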

center

A logical indicating whether the target and background data sets should be centered to mean zero.

scale

A logical indicating whether the target and background data sets should be scaled to unit variance.
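To make the effect of center and scale concrete, the sketch below applies base R's scale() to a toy matrix. It only illustrates what centering to mean zero and scaling to unit variance do column by column; how scPCA applies these steps internally is not shown here.

```r
# Hypothetical toy data: 10 observations, 4 features, non-zero mean and non-unit variance.
set.seed(1)
x <- matrix(rnorm(40, mean = 5, sd = 3), nrow = 10, ncol = 4)

# Center each column to mean zero and scale it to unit variance.
x_std <- scale(x, center = TRUE, scale = TRUE)

# Column means are now (numerically) zero and column standard deviations are one.
col_means <- colMeans(x_std)
col_sds <- apply(x_std, 2, sd)
```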

n_eigen

A numeric indicating the number of eigenvectors (or sparse contrastive components) to be computed. The default is to compute two such eigenvectors.

alg

A character indicating the SPCA algorithm used to sparsify the contrastive loadings. Currently supports iterative for the Zou et al. (2006) implementation, var_proj for the non-randomized Erichson et al. (2018) solution, and rand_var_proj for the randomized Erichson et al. (2018) result.

contrasts

A numeric vector of the contrastive parameters. Each element must be a unique non-negative real number. The default is to use 40 logarithmically spaced values between 0.1 and 1000.

penalties

A numeric vector of the L1 penalty terms on the loadings. The default is to use 20 equidistant values between 0.05 and 1.
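Grids matching the defaults described for contrasts and penalties can be built in base R as follows. This is a sketch of one way to obtain such values, not necessarily how the package constructs them; the name grid is hypothetical.

```r
# 40 logarithmically spaced contrastive parameters between 0.1 and 1000.
contrasts <- exp(seq(log(0.1), log(1000), length.out = 40))

# 20 equidistant L1 penalty terms between 0.05 and 1.
penalties <- seq(0.05, 1, length.out = 20)

# Every (contrast, penalty) pair that a full grid search would evaluate.
grid <- expand.grid(contrast = contrasts, penalty = penalties)
nrow(grid)
```

With these defaults a full search covers 40 x 20 = 800 hyperparameter combinations per fold, which is why shorter vectors are worth considering on large data sets.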

clust_method

A character specifying the clustering method to use for choosing the optimal contrastive parameter. Currently, this is limited to k-means, partitioning around medoids (PAM), or hierarchical clustering. The default is k-means clustering.

n_centers

A numeric giving the number of centers to use in the clustering algorithm. If set to 1, cPCA, as first proposed by Abid et al., is performed, regardless of what the penalties argument is set to.

max_iter

A numeric giving the maximum number of iterations to be used in k-means clustering, defaulting to 10.
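The roles of n_centers and max_iter can be illustrated with base R's kmeans(), whose centers and iter.max arguments play the same parts. The toy data and the mapping of argument names are assumptions made for this sketch; the internal call made by scPCA is not shown here.

```r
# Hypothetical toy data: two groups of 20 points in two dimensions.
set.seed(1)
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2)
)

# k-means with 2 centers, capped at 10 iterations.
fit <- kmeans(x, centers = 2, iter.max = 10)

# Cluster sizes and the number of iterations actually used.
table(fit$cluster)
fit$iter
```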

linkage_method

A character specifying the agglomerative linkage method to be used if clust_method = "hclust". The options are ward.D2, single, complete, average, mcquitty, median, and centroid. The default is complete.
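Each of the listed linkage options is a valid method for base R's hclust(). The sketch below runs every one of them on toy data and cuts each tree into two clusters; the data and the variable names are hypothetical.

```r
# Hypothetical toy data: two groups of 20 points in two dimensions.
set.seed(1)
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2)
)
d <- dist(x)

# The linkage options named in the argument description.
linkages <- c("ward.D2", "single", "complete", "average",
              "mcquitty", "median", "centroid")

# Build one tree per linkage method and cut it into two clusters.
labels_by_linkage <- lapply(linkages, function(m) {
  cutree(hclust(d, method = m), k = 2)
})
names(labels_by_linkage) <- linkages

# Cluster sizes under the default (complete) linkage.
table(labels_by_linkage[["complete"]])
```

Different linkage methods can assign borderline points differently, so the chosen hyperparameters may depend on this setting.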

n_medoids

A numeric indicating the number of medoids to consider if n_centers is set to 1. The default is 8 such medoids.

parallel

A logical indicating whether to invoke parallel processing via the BiocParallel infrastructure. The default is FALSE for sequential evaluation.

clusters

A numeric vector of cluster labels for observations in the target data. Defaults to NULL, but is otherwise used to identify the optimal set of hyperparameters when fitting the scPCA and the automated version of cPCA.

eigdecomp_tol

A numeric providing the level of precision used by eigendecomposition calculations. Defaults to 1e-10.

eigdecomp_iter

A numeric indicating the maximum number of iterations performed by eigendecomposition calculations. Defaults to 1000.
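To show how a tolerance and an iteration cap interact in an iterative eigensolver, here is a minimal power-iteration sketch in base R. It swaps in a textbook method purely for illustration: power_iteration is a hypothetical helper, not the solver scPCA uses, and it only recovers the eigenvalue of largest magnitude.

```r
# Illustrative helper (not an scPCA function): power iteration for a symmetric matrix.
power_iteration <- function(a, tol = 1e-10, max_iter = 1000) {
  v <- rep(1, ncol(a)) / sqrt(ncol(a))  # normalized starting vector
  lambda <- 0
  converged <- FALSE
  for (i in seq_len(max_iter)) {
    w <- as.vector(a %*% v)
    v_new <- w / sqrt(sum(w^2))
    # Rayleigh quotient: current estimate of the leading eigenvalue.
    lambda_new <- sum(v_new * as.vector(a %*% v_new))
    # Stop once successive estimates differ by less than the tolerance.
    converged <- abs(lambda_new - lambda) < tol
    v <- v_new
    lambda <- lambda_new
    if (converged) break
  }
  list(value = lambda, vector = v, iterations = i, converged = converged)
}

# Small symmetric test matrix.
a <- matrix(c(4, 1, 0,
              1, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

fit <- power_iteration(a)                      # stops on the tolerance
fit_short <- power_iteration(a, max_iter = 2)  # stops on the iteration cap
```

A looser tolerance or a lower iteration cap makes each decomposition cheaper at the cost of less precise eigenvectors; whichever limit is reached first ends the computation.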

Value

Output structure matching that of either fitCPCA or fitGrid (or their parallelized variants, bpFitCPCA and bpFitGrid, respectively).

References

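Erichson NB, Zheng P, Manohar K, Brunton SL, Kutz JN, Aravkin AY (2018). Sparse principal component analysis via variable projection. arXiv preprint.

Zou H, Hastie T, Tibshirani R (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286.
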
scPCA documentation built on Nov. 8, 2020, 6 p.m.