nbldaControl: Control parameters for trained NBLDA model.




Description

Defines the control parameters to be used within the trainNBLDA function.


Usage

nbldaControl(folds = 5, repeats = 2, foldIdx = NULL, rhos = NULL,
  beta = 1, prior = NULL, transform = FALSE, alpha = NULL,
  truephi = NULL, target = 0, phi.epsilon = 0.15,
  normalize.target = FALSE, delta = NULL, multicore = FALSE, ...)



Arguments

folds: A positive integer. The number of folds for k-fold model validation.


repeats: A positive integer. The number of repeats for k-fold model validation. If NULL, 0 or negative, it is set to 1.


foldIdx: A list with the indices of the hold-out samples for each fold. It should be a list where folds are nested within repeats. If NULL, folds and repeats are used to define the hold-out samples.


rhos: A vector of tuning parameters that control the amount of soft thresholding performed. If NULL, it is automatically generated within trainNBLDA using tuneLength, i.e. the length of the grid search. See details.


beta: A smoothing term. A Gamma(beta, beta) prior is used to fit the Poisson model. It is recommended to leave it at the default value of 1. See Witten (2011) and Dong et al. (2016) for details.


prior: A vector of length equal to the number of classes, giving the prior class probabilities. If NULL, all classes are assumed to be equally likely.


transform: A logical. If TRUE, count data are transformed using a power transformation. If alpha is not specified, the power transformation parameter is automatically calculated using a goodness-of-fit test. See Witten (2011) for details.


alpha: A numeric value within [0, 1] to be used for the power transformation.


truephi: A vector of length equal to the number of variables, giving the true overdispersion parameter for each variable. If a single value is given, it is replicated for all variables. If a vector whose length differs from the number of variables is given, its first element is used and replicated for all variables. If NULL, estimated overdispersions are used in the classifier. See details.


target: A value for the shrinkage target of dispersion estimates. If target is NULL, a small value that minimizes the average squared difference is automatically used as the target value. See getT for details.


phi.epsilon: A positive value controlling the number of features whose dispersions are shrunk towards 0. See details.


normalize.target: A logical. If TRUE and target is NULL, the target value is estimated using normalized dispersion estimates. See getT for details.


delta: A weight within the interval [0, 1] that is used while shrinking dispersions towards the shrinkage target (0 by default). When delta = 1, the initial dispersion estimates are shrunk completely to the target value. When delta = 0, no shrinkage is performed on the initial estimates.


multicore: A logical. If a parallel backend is loaded and available, the function runs in parallel on multiple CPUs.


...: Further arguments passed to trainNBLDA.
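The foldIdx argument expects hold-out indices with folds nested within repeats. The base R sketch below shows one plausible way to build such a list by hand. The helper name make_fold_idx and the element names ("Rep1", "Fold1", ...) are hypothetical; this page does not state the exact layout or names that trainNBLDA requires, so treat the structure as an assumption.

```r
# Illustrative only: hold-out indices with folds nested within repeats.
# make_fold_idx and the "Rep"/"Fold" names are hypothetical, not part of NBLDA.
make_fold_idx <- function(n, folds = 5, repeats = 2, seed = 1) {
  set.seed(seed)
  idx <- lapply(seq_len(repeats), function(r) {
    # Randomly assign each of the n samples to exactly one fold.
    fold_of <- sample(rep(seq_len(folds), length.out = n))
    out <- lapply(seq_len(folds), function(k) which(fold_of == k))
    names(out) <- paste0("Fold", seq_len(folds))
    out
  })
  names(idx) <- paste0("Rep", seq_len(repeats))
  idx
}

fold_idx <- make_fold_idx(n = 20, folds = 5, repeats = 2)
str(fold_idx$Rep1)  # five integer vectors of hold-out sample indices
```

Within each repeat every sample is held out exactly once, which is the usual requirement for k-fold validation.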


Details

rhos controls the level of sparsity, i.e. the number of variables (or features) used in the classifier. If a variable does not contribute to the discrimination function, it should be removed from the model. By setting rhos within the interval [0, Inf), it is possible to control the number of variables that are removed from the model. As the upper bound of rhos decreases towards 0, fewer variables are removed. If rhos = 0, all variables are included in the classifier.
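The effect of the tuning parameter can be illustrated with the generic soft-thresholding operator. This is a minimal sketch, not the code used inside trainNBLDA: the function soft_threshold and the vector effects are made up for illustration, and the package applies thresholding to its own class-specific estimates rather than to raw numbers like these.

```r
# Generic soft-thresholding operator: move x towards 0 by rho, clipping at 0.
soft_threshold <- function(x, rho) sign(x) * pmax(abs(x) - rho, 0)

# Hypothetical per-feature effect sizes (0 means "no contribution").
effects <- c(-0.8, -0.1, 0.05, 0.3, 1.2)
rhos <- c(0, 0.2, 0.5, 1, 2)

# Number of features that remain nonzero at each value of rho.
kept <- sapply(rhos, function(rho) sum(soft_threshold(effects, rho) != 0))
names(kept) <- rhos
kept
```

With these made-up values the number of surviving features falls from 5 at rho = 0 to 3, 2, 1 and finally 0 as rho grows, matching the description that smaller values of rhos remove fewer variables.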

truephi controls how the Poisson model differs from the Negative Binomial model. If the overdispersion is zero, the Negative Binomial model reduces to the Poisson model. Hence, the results from trainNBLDA are identical to the PLDA results from Classify when truephi = 0.
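The limiting relationship between the two distributions can be checked numerically with base R alone. The sketch below assumes the common parameterisation in which the variance is mu + phi * mu^2, so that the size argument of dnbinom equals 1 / phi; the values of mu, x and phis are arbitrary choices for illustration.

```r
mu <- 5       # hypothetical mean count
x  <- 0:20    # counts at which the probability mass is evaluated

# R parameterises the Negative Binomial by size = 1 / overdispersion.
nb_pmf <- function(phi) dnbinom(x, size = 1 / phi, mu = mu)

# Largest absolute gap to the Poisson pmf as the overdispersion shrinks.
phis <- c(1, 0.1, 0.001, 1e-8)
gap <- sapply(phis, function(phi) max(abs(nb_pmf(phi) - dpois(x, lambda = mu))))
names(gap) <- phis
gap
```

The gap shrinks steadily as phi approaches 0, which is why a classifier built with zero overdispersion behaves like its Poisson counterpart.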

phi.epsilon is a threshold used to shrink estimated overdispersions towards 0. The Poisson model assumes that there is no overdispersion in the observed counts. However, this is not a valid assumption for highly overdispersed count data. NBLDA performs shrinkage on the estimated overdispersions. Although the amount of shrinkage depends on several parameters such as delta, target and truephi, some of the shrunk overdispersions might be very close to 0. By defining a threshold for the shrunk overdispersions, it is possible to set very small overdispersions to exactly 0. If an estimated overdispersion is below phi.epsilon, it is shrunk to 0. If phi.epsilon = NULL, the threshold is set to 0. Hence, all the variables with very small overdispersion are included in the NBLDA model.
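The interplay of delta, target and phi.epsilon can be sketched in a few lines of base R. The helper shrink_dispersion below is hypothetical and is not the NBLDA implementation: it assumes a weighted average of the target and the raw estimate in the style of Yu et al. (2013), followed by the thresholding rule described above. The input values are made up.

```r
# Hypothetical helper, not NBLDA code: shrink raw overdispersion estimates
# towards a target with weight delta, then set values below phi.epsilon to 0.
shrink_dispersion <- function(phi_hat, target = 0, delta = 0.5,
                              phi.epsilon = 0.15) {
  if (is.null(phi.epsilon)) phi.epsilon <- 0  # NULL disables the threshold
  phi_adj <- delta * target + (1 - delta) * phi_hat
  phi_adj[phi_adj < phi.epsilon] <- 0
  phi_adj
}

phi_hat <- c(0.02, 0.2, 0.5, 1.5)               # made-up raw estimates
shrink_dispersion(phi_hat)                      # small values become exactly 0
shrink_dispersion(phi_hat, phi.epsilon = NULL)  # shrunk, but nothing zeroed
```

In this sketch a feature whose shrunk overdispersion falls below the threshold is treated as having no overdispersion at all, while phi.epsilon = NULL keeps every shrunk estimate as it is.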


Value

A list with all the control elements.


Author(s)

Dincer Goksuluk


References

Witten, D. M. (2011). Classification and clustering of sequencing data using a Poisson model. Ann. Appl. Stat., 5(4), 2493-2518. doi:10.1214/11-AOAS493.

Dong, K., Zhao, H., Tong, T., & Wan, X. (2016). NBLDA: negative binomial linear discriminant analysis for RNA-Seq data. BMC Bioinformatics, 17(1), 369. http://doi.org/10.1186/s12859-016-1208-1.

Yu, D., Huber, W., & Vitek, O. (2013). Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics, 29(10), 1275-1282.

See Also

getT, getAdjustDisp


Examples

nbldaControl()  # Return the default control parameters.
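Individual defaults can be overridden by name. The sketch below uses only arguments listed in the usage above and runs only when the NBLDA package is installed; the chosen values are arbitrary, and the layout of the returned list is not inspected beyond printing its top level.

```r
# Runs only if the NBLDA package is available on this machine.
if (requireNamespace("NBLDA", quietly = TRUE)) {
  # Ten folds, three repeats, and a lower threshold for small overdispersions.
  ctrl <- NBLDA::nbldaControl(folds = 10, repeats = 3, phi.epsilon = 0.1)
  str(ctrl, max.level = 1)
}
```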

NBLDA documentation built on May 2, 2019, 12:21 p.m.