DTD_cv_lambda_cxx: Cross-validation for digital tissue deconvolution

View source: R/function_crossValidation.R


Cross-validation for digital tissue deconvolution

Description

Our descent generalized FISTA implementation includes an l1 regularization term (see train_deconvolution_model). This function performs an 'n.folds'-fold cross-validation to find the best-fitting regularization parameter.
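To illustrate the idea of choosing lambda by k-fold cross-validation, here is a minimal base-R sketch. It is not the package's implementation: 'cv_select_lambda', 'fit.fun' and 'loss.fun' are hypothetical names, and the two functions are placeholders for "train a model on the training folds" and "evaluate it on the held-out fold" (DTD uses its own loss for this).

```r
# Hypothetical sketch of n.folds-fold cross-validation over a lambda grid.
# fit.fun(train.idx, lambda) and loss.fun(model, test.idx) are placeholders
# supplied by the caller; they are NOT functions of the DTD package.
cv_select_lambda <- function(lambda.seq, n.samples, n.folds, fit.fun, loss.fun) {
  # Assign every sample to one of n.folds buckets of (nearly) equal size
  fold.id <- sample(rep_len(seq_len(n.folds), n.samples))
  mean.loss <- vapply(lambda.seq, function(lambda) {
    fold.loss <- vapply(seq_len(n.folds), function(k) {
      test.idx  <- which(fold.id == k)
      train.idx <- which(fold.id != k)
      model <- fit.fun(train.idx, lambda)   # train on all other folds
      loss.fun(model, test.idx)             # evaluate on the held-out fold
    }, numeric(1))
    mean(fold.loss)
  }, numeric(1))
  # The best lambda minimises the average held-out loss
  list(mean.loss = mean.loss, best.lambda = lambda.seq[which.min(mean.loss)])
}

# Toy usage: the "model" is just lambda, and the loss is smallest at 0.3
set.seed(1)
res <- cv_select_lambda(
  lambda.seq = c(0, 0.1, 0.3, 1),
  n.samples = 20, n.folds = 5,
  fit.fun = function(train.idx, lambda) lambda,
  loss.fun = function(model, test.idx) (model - 0.3)^2
)
res$best.lambda  # 0.3
```

Averaging the held-out loss over all folds, rather than using a single split, makes the choice of lambda less dependent on which samples happen to land in the test set.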

Usage

DTD_cv_lambda_cxx(
  lambda.seq = "none",
  tweak.start,
  X.matrix,
  n.folds = 5,
  lambda.length = 10,
  train.data.list,
  cv.verbose = TRUE,
  warm.start = FALSE,
  estimate.c.type = "direct",
  NORM.FUN = "norm2",
  NESTEROV.FUN = "positive",
  ST.FUN = "softmax",
  inv.precision = 1e-12,
  ...
)

Arguments

lambda.seq

numeric vector, NULL, or "none". The FISTA is optimized over this series of lambdas. If 'lambda.seq' is NULL, a generic series of lambdas, depending on the dimensions of the training set, is generated. If 'lambda.seq' is "none", no cross-validation is performed; only one model, with lambda = 0, is trained on the complete data set.

tweak.start

numeric vector, starting vector for the DTD algorithm.

X.matrix

numeric matrix with features/genes as rows and cell types as columns. Each column of X.matrix is a reference expression profile.

n.folds

integer, number of folds (buckets) in the cross-validation.

lambda.length

integer, how many lambdas are generated (only used if 'lambda.seq' is NULL).

train.data.list

list with two entries, 'mixtures' and 'quantities', each a numeric matrix. The train/test splits of the cross-validation are drawn from this list (see the vignette via 'browseVignettes("DTD")' for details).
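A toy 'train.data.list' can be built by hand, which helps to see the expected shapes. This sketch assumes, consistent with the model Y = XC used in 'estimate.c.type', that 'mixtures' is genes x samples and 'quantities' is cell types x samples; the numbers are random and the gene, cell type and sample names are made up for illustration.

```r
# Toy reference matrix: 4 genes x 2 cell types (random, made-up values)
set.seed(42)
X.matrix <- matrix(runif(8, 1, 10), nrow = 4,
                   dimnames = list(paste0("gene", 1:4), c("typeA", "typeB")))

# Cell-type quantities: cell types x samples, each column sums to 1
n.samples <- 6
a <- runif(n.samples)
quantities <- rbind(typeA = a, typeB = 1 - a)
colnames(quantities) <- paste0("mix", seq_len(n.samples))

# Noise-free mixtures following the linear model Y = X C (genes x samples)
mixtures <- X.matrix %*% quantities

train.data.list <- list(mixtures = mixtures, quantities = quantities)
str(train.data.list)
```

Real training mixtures are usually simulated from single-cell or sorted profiles and contain noise; the vignette describes how the package itself prepares this list.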

cv.verbose

logical, should information about the cross-validation process be printed to the screen?

warm.start

logical, should the solution of the previous model in the cross-validation be used as the starting point for the next model? Note that the warm start begins with the least penalized model.
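The warm-start idea can be sketched as a sweep over increasing lambdas in which each fit starts from the previous solution. 'warm_start_path', 'fit.fun' and 'toy.fit' are hypothetical names used only for illustration; the real optimizer is the package's FISTA.

```r
# Hypothetical sketch of a warm-started sweep over lambda.
# fit.fun(start, lambda) is a placeholder returning a fitted g-vector.
warm_start_path <- function(lambda.seq, tweak.start, fit.fun) {
  lambda.sorted <- sort(lambda.seq)   # least penalized model first
  current.start <- tweak.start
  path <- vector("list", length(lambda.sorted))
  for (i in seq_along(lambda.sorted)) {
    g <- fit.fun(current.start, lambda.sorted[i])
    path[[i]] <- g
    current.start <- g                 # next fit starts from this solution
  }
  names(path) <- paste0("lambda_", lambda.sorted)
  path
}

# Toy fit: shrinks the start vector towards zero by lambda, keeping g >= 0
toy.fit <- function(start, lambda) pmax(start - lambda, 0)
path <- warm_start_path(c(0.5, 0, 0.2), tweak.start = c(1, 0.6, 0.3),
                        fit.fun = toy.fit)
path[["lambda_0.5"]]  # approximately 0.3 0.0 0.0
```

Starting from the least penalized model means every later, more penalized fit begins close to a reasonable solution, which typically reduces the number of iterations needed.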

estimate.c.type

string, either "non_negative" or "direct", indicating how the algorithm finds the solution of arg min_C ||diag(g)(Y - XC)||_2.

  • If 'estimate.c.type' is set to "direct", there is no regularization (see estimate_c);

  • if 'estimate.c.type' is set to "non_negative", the estimates "C" must not be negative (non-negative least squares; see estimate_nn_c).
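For the "direct" case, the minimizer has a closed form via the weighted normal equations. The sketch below takes the objective ||diag(g)(Y - XC)||_2 literally, which gives the weight matrix G = diag(g^2); if the package defines G as diag(g) instead, replace g^2 by g. 'direct_c_sketch' is a hypothetical helper, not the package's estimate_c, and passing 'inv.precision' to solve() as its tolerance is only an analogy to the argument of the same name.

```r
# Sketch of the "direct" (unconstrained) weighted least-squares estimate of C.
# Assumes G = diag(g^2), i.e. the objective ||diag(g) (Y - X C)||_2 as written.
direct_c_sketch <- function(X, Y, g, inv.precision = 1e-12) {
  G <- diag(g^2)
  XtGX <- t(X) %*% G %*% X
  XtGY <- t(X) %*% G %*% Y
  # Solve (X^T G X) C = X^T G Y; solve() stops if XtGX is numerically
  # singular with respect to the given tolerance
  solve(XtGX, XtGY, tol = inv.precision)
}

# Toy check: noise-free mixtures are recovered exactly
X <- matrix(c(5, 1, 2, 1, 4, 3), nrow = 3)          # 3 genes x 2 cell types
C.true <- matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2)   # 2 cell types x 2 samples
Y <- X %*% C.true
C.hat <- direct_c_sketch(X, Y, g = c(1, 2, 0.5))
C.hat  # equals C.true up to floating-point error
```

Solving the linear system with solve(A, b) avoids forming the explicit inverse (X^T G X)^-1, which is both cheaper and numerically more stable.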

NORM.FUN

string; after each gradient descent and Nesterov step, the resulting tweak/g-vector can be normalized. There are three implementations:

  • 'identity': No normalization, every entry stays as it is.

  • 'n2normed': the vector is scaled to ||g||_2 = 1

  • 'n1normed': the vector is scaled to ||g||_1 = 1
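The three normalization options amount to one-line functions. These are local re-implementations for illustration ('norm_identity', 'norm_n2' and 'norm_n1' are hypothetical names), not the package's own definitions.

```r
# Local sketches of the three normalization options listed above
norm_identity <- function(g) g                  # no normalization
norm_n2 <- function(g) g / sqrt(sum(g^2))       # afterwards ||g||_2 = 1
norm_n1 <- function(g) g / sum(abs(g))          # afterwards ||g||_1 = 1

g <- c(3, 4, 0)
norm_n2(g)  # 0.6 0.8 0.0
norm_n1(g)  # 3/7 4/7 0
```

Because g is constrained to g_i >= 0, sum(abs(g)) equals sum(g) here; the abs() only matters for vectors with negative entries.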

NESTEROV.FUN

string, sets the Nesterov function. The current implementation restricts the result to be non-negative (due to the optimization constraint g_i ≥ 0).
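A "positive" Nesterov step can be pictured as the usual momentum extrapolation followed by a projection onto g_i ≥ 0. This is a sketch under assumptions: the momentum weight (k - 1) / (k + 2) is one common choice and may differ from the schedule used by the package, and 'nesterov_positive' is a hypothetical name.

```r
# Sketch of a Nesterov (FISTA-style) extrapolation step projected onto g >= 0.
# The momentum weight (k - 1) / (k + 2) is an assumed, commonly used schedule.
nesterov_positive <- function(g.current, g.previous, k) {
  momentum <- (k - 1) / (k + 2)
  extrapolated <- g.current + momentum * (g.current - g.previous)
  pmax(extrapolated, 0)   # enforce the constraint g_i >= 0
}

step <- nesterov_positive(g.current = c(1, 0.2), g.previous = c(0.5, 0.8), k = 4)
step  # 1.25 0.00
```

Without the final pmax(), the extrapolation could push entries below zero (the second entry would be -0.1 here), violating the constraint g_i ≥ 0.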

ST.FUN

string, sets the soft thresholding function.
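Soft thresholding is the proximal map of the l1 penalty, S_lambda(x) = sign(x) * max(|x| - lambda, 0). Assuming the default "softmax" refers to this max(|x| - lambda, 0) operator, a base-R sketch ('soft_threshold' is a hypothetical name) is:

```r
# Soft-thresholding operator: shrinks every entry towards zero by lambda
# and sets entries with |x| <= lambda exactly to zero
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

st <- soft_threshold(c(-2, -0.5, 0.3, 1.5), lambda = 1)
st  # -1.0  0.0  0.0  0.5
```

Setting small entries exactly to zero is what makes the l1 penalty select a sparse subset of genes in the g-vector.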

inv.precision

numeric, precision used when inverting X^T G X, which is required for the least-squares solution.

...

all further parameters that are passed to the C++ optimization function.

Details

For an example, see 'browseVignettes("DTD")'.

Note that our optimizer exists as both an R and a C++ implementation. Hence, there are two cross-validation functions, calling either the R or the C++ implementation:

DTD_cv_lambda_R and DTD_cv_lambda_cxx.

Value

list of length 2:

  • 'cv.obj', list of lists: the DTD model for each lambda and every fold.

  • 'best.model', list. DTD model optimized on the complete data set with the best lambda from the cross validation.


MarianSchoen/DTD documentation built on April 29, 2022, 1:59 p.m.