trainNBLDA: Train Model over Different Tuning Parameters



Description

This function fits a negative binomial classifier over a range of tuning parameter values and selects the best value using resampling-based performance measures.

Usage

trainNBLDA(x, y, type = c("mle", "deseq", "quantile", "tmm"),
  tuneLength = 10, metric = c("accuracy", "error"),
  train.control = nbldaControl(), ...)

Arguments

x

an n-by-p data frame or matrix. Samples should be in the rows and variables in the columns. Used to train the classifier.

y

a vector of length n. Each element corresponds to a class label of a sample. Integer and/or factor types are allowed.

type

a character string indicating the normalization method used within the NBLDA model. See Details.

tuneLength

a positive integer. This is the total number of levels to be used while tuning the model parameter(s).

metric

the criterion used to determine the best parameter: overall accuracy ("accuracy") or average number of misclassified samples ("error").

train.control

a list of control parameters to be used in the NBLDA model. See nbldaControl for details.

...

further arguments. Deprecated.
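The two options for metric can be illustrated with a minimal base R sketch. The labels below are made up, and the way the results are pooled over folds is an assumption for illustration only; it is not taken from the package source.

```r
# Hypothetical true and predicted class labels from two resampling folds.
truth <- list(c(1, 1, 2, 2, 2), c(1, 2, 2, 1, 1))
pred  <- list(c(1, 2, 2, 2, 2), c(1, 2, 1, 1, 2))

# Overall accuracy: proportion of correctly classified samples,
# pooled over all folds (assumed aggregation).
accuracy <- mean(unlist(pred) == unlist(truth))

# Average number of misclassified samples per fold.
avg_error <- mean(mapply(function(p, t) sum(p != t), pred, truth))

print(accuracy)
print(avg_error)
```

A higher accuracy and a lower error both indicate a better tuning parameter value, so the direction of the comparison depends on which metric is chosen.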

Details

NBLDA was proposed for classifying count data from any field, e.g. economics, social sciences, or genomics. In RNA-Seq studies, for example, normalization is used to adjust for between-sample differences before downstream analysis. The type argument defines the normalization method. Available options are "mle", "deseq", "quantile" and "tmm". Since "deseq", "quantile" and "tmm" were originally proposed as robust methods for RNA-Sequencing studies, one should choose the normalization type carefully. In more detail, "deseq" estimates the size factors by dividing each sample by the geometric means of the transcript counts (Anders and Huber, 2010). "tmm" trims the lower and upper tails of the data by log-fold change and by absolute intensity in order to minimize the log-fold changes between the samples (Robinson and Oshlack, 2010). "quantile" is the quantile normalization approach of Bullard et al (2010). "mle" (less robust) divides the total count of each sample by the grand total count (Witten, 2010). See the related papers for the mathematical background.
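The "mle" and "deseq" descriptions can be sketched in base R as follows. This is an illustration on a made-up toy matrix, not the package's internal code. The median-of-ratios step for "deseq" follows the size factor estimator described by Anders and Huber (2010), and the variable names are hypothetical.

```r
# Toy count matrix: 3 samples (rows) by 4 features (columns), all counts positive.
x <- matrix(c(10, 20, 30, 40,
              20, 40, 60, 80,
               5, 10, 15, 20),
            nrow = 3, byrow = TRUE)

# "mle"-style size factors: each sample's total count over the grand total.
size_mle <- rowSums(x) / sum(x)

# "deseq"-style size factors: divide each sample's counts by the per-feature
# geometric means (taken across samples), then take the median ratio per sample.
geo_means <- exp(colMeans(log(x)))
size_deseq <- apply(x, 1, function(counts) median(counts / geo_means))

print(size_mle)
print(size_deseq)
```

In this toy example the second sample has exactly twice the counts of the first, and both estimators assign it twice the size factor. The sketch assumes strictly positive counts, since the geometric mean is computed on the log scale.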

Value

an nblda object with the following slots:

input

an nblda_input object including the raw count data and response variable. See nblda_input for details.

result

an nblda_trained object including the results from cross-validated and final models. See nblda_trained for details.

call

a call expression.

Author(s)

Dincer Goksuluk

References

Witten, DM (2011). Classification and clustering of sequencing data using a Poisson model. Ann. Appl. Stat. 5(4), 2493–2518. doi:10.1214/11-AOAS493.

Dong, K., Zhao, H., Tong, T., & Wan, X. (2016). NBLDA: negative binomial linear discriminant analysis for RNA-Seq data. BMC Bioinformatics, 17(1), 369. http://doi.org/10.1186/s12859-016-1208-1.

Anders S, Huber W (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.

Witten D, et al. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology, 8:58.

Robinson MD, Oshlack A (2010). A scaling normalization method for differential expression analysis of RNA-Seq data. Genome Biology, 11:R25, doi:10.1186/gb-2010-11-3-r25

Examples

set.seed(2128)
counts <- generateCountData(n = 20, p = 10, K = 2, param = 1, sdsignal = 0.5, DE = 0.8,
                            allZero.rm = FALSE, tag.samples = TRUE)
x <- t(counts$x + 1)
y <- counts$y
xte <- t(counts$xte + 1)
ctrl <- nbldaControl(folds = 2, repeats = 2)

fit <- trainNBLDA(x = x, y = y, type = "mle", tuneLength = 10,
                  metric = "accuracy", train.control = ctrl)

fit
nbldaTrained(fit)  # Cross-validated model summary.

NBLDA documentation built on May 2, 2019, 12:21 p.m.