trainNBLDA: Train Model over Different Tuning Parameters

View source: R/trainNBLDA.R

trainNBLDAR Documentation

Train Model over Different Tuning Parameters

Description

This function fits the Negative Binomial classifier using various model parameters and finds the best model parameter using the resampling based performance measures.

Usage

trainNBLDA(x, y, type = c("mle", "deseq", "quantile", "tmm"),
  tuneLength = 10, metric = c("accuracy", "error"), train.control = nbldaControl(), ...)

Arguments

x

a n-by-p data frame or matrix. Samples should be in the rows and variables in the columns. Used to train the classifier.

y

a vector of length n. Each element corresponds to a class label of a sample. Integer and/or factor types are allowed.

type

a character string indicating the type of normalization method within the NBLDA model. See details.

tuneLength

a positive integer. This is the total number of levels to be used while tuning the model parameter(s) in grid search.

metric

which criteria should be used while determining the best model parameter? overall accuracy or average number of misclassified samples?

train.control

a list with control parameters to be used in the NBLDA model. See nbldaControl for details.

...

further arguments. Deprecated.

Details

NBLDA is proposed to classify count data from any field, e.g., economics, social sciences, genomics, etc. In RNA-Seq studies, for example, normalization is used to adjust between-sample differences for downstream analysis. type is used to define normalization method. Available options are "mle", "deseq", "quantile", and "tmm". Since "deseq", "quantile", and "tmm" methods are originally proposed as robust methods to be used in RNA-Sequencing studies, one should carefully define normalization types. In greater detail, "deseq" estimates the size factors by dividing each sample by the geometric means of the transcript counts (Anders and Huber, 2010). "tmm" trims the lower and upper side of the data by log-fold changes to minimize the log-fold changes between the samples and by absolute intensity (Robinson and Oshlack, 2010). "quantile" is quantile normalization approach of Bullard et al. (2010). "mle" (less robust) divides total counts of each sample to the total counts (Witten, 2010). See related papers for mathematical backgrounds.

Value

an nblda object with following slots:

input

an nblda_input object including the raw count data and response variable. See nblda_input for details.

result

an nblda_trained object including the results from cross-validated and final models. See nblda_trained for details.

call

a call expression.

Author(s)

Dincer Goksuluk

References

Witten, DM (2011). Classification and clustering of sequencing data using a Poisson model. Ann. Appl. Stat. 5(4), 2493–2518. doi:10.1214/11-AOAS493.

Dong, K., Zhao, H., Tong, T., & Wan, X. (2016). NBLDA: negative binomial linear discriminant analysis for RNA-Seq data. BMC Bioinformatics, 17(1), 369. http://doi.org/10.1186/s12859-016-1208-1.

Anders S. Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106

Witten D. et al. (2010) Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology, 8:58

Robinson MD, Oshlack A (2010). A scaling normalization method for differential expression analysis of RNA-Seq data. Genome Biology, 11:R25, doi:10.1186/gb-2010-11-3-r25

Examples

set.seed(2128)
counts <- generateCountData(n = 20, p = 10, K = 2, param = 1, sdsignal = 0.5, DE = 0.8,
                            allZero.rm = FALSE, tag.samples = TRUE)
x <- t(counts$x + 1)
y <- counts$y
xte <- t(counts$xte + 1)
ctrl <- nbldaControl(folds = 2, repeats = 2)

fit <- trainNBLDA(x = x, y = y, type = "mle", tuneLength = 10,
                  metric = "accuracy", train.control = ctrl)

fit
nbldaTrained(fit)  # Cross-validated model summary.


NBLDA documentation built on March 18, 2022, 7:51 p.m.