generateCountData: Generate Count Data


generateCountData    R Documentation

Generate Count Data

Description

This function generates count data, e.g., RNA-Sequencing data, for both classification and clustering purposes.

Usage

generateCountData(n, p, K, param, sdsignal = 1, DE = 0.3, allZero.rm = TRUE,
  tag.samples = FALSE)

Arguments

n

number of samples.

p

number of variables/features.

K

number of classes.

param

overdispersion parameter. It corresponds to the size argument of the rnbinom function, so larger values give less overdispersion, and the Negative Binomial distribution approaches the Poisson distribution as param increases.

sdsignal

a nonzero numeric value. As sdsignal increases, the observed counts differ more strongly among the K classes.

DE

a numeric value in the interval [0, 1]. This is the proportion of all variables that differ significantly among the K classes. The remaining variables are assumed to contribute nothing to the discrimination function.

allZero.rm

a logical. If TRUE, columns containing only zeros are dropped.

tag.samples

a logical. If TRUE, row names are generated automatically by tagging each sample, e.g., "S1", "S2", etc.
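
The behaviour of param and allZero.rm described above can be illustrated with base R alone. The sketch below does not call generateCountData and does not reproduce its internals. It only shows (1) how the size argument of rnbinom controls overdispersion, and (2) one common way to drop all-zero columns, which is assumed here to be equivalent to what allZero.rm = TRUE does. The mean mu, the chosen size values, and the toy matrix x are hypothetical values picked for illustration.

```r
# Base-R sketch; none of this calls NBLDA itself.
set.seed(1)
mu <- 10         # hypothetical mean count, chosen for illustration
n_draws <- 1e5   # number of simulated counts per setting

# For rnbinom(size = s, mu = mu) the variance is mu + mu^2 / s, so a larger
# size (the role played by param) means less overdispersion and a variance
# closer to the Poisson variance, which equals mu.
sizes <- c(0.5, 5, 500)
nb_var <- sapply(sizes, function(s) var(rnbinom(n_draws, size = s, mu = mu)))
theoretical_var <- mu + mu^2 / sizes
print(rbind(size = sizes, empirical = nb_var, theoretical = theoretical_var))

# Dropping columns that contain only zeros (an assumed equivalent of
# allZero.rm = TRUE, not the package's own code).
x <- cbind(g1 = c(3, 0, 7), g2 = c(0, 0, 0), g3 = c(1, 2, 0))
x_kept <- x[, colSums(x) != 0, drop = FALSE]
print(x_kept)
```

The empirical variances shrink toward mu as size grows, which is the Poisson limit mentioned for param. The drop = FALSE argument keeps the result a matrix even if only one column survives.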

Value

x, xte

count data matrices for the training and test sets.

y, yte

class labels for the training and test sets.

truesf, truesfte

true size factors for the training and test sets. See Witten (2011) for more information on estimating size factors.

Author(s)

Dincer Goksuluk

Examples

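# Fix the random seed so the simulated counts are reproducible.
set.seed(2128)
# Simulate 20 samples, 10 features and 2 classes; keep all-zero columns
# and tag the rows as "S1", "S2", ...
counts <- generateCountData(n = 20, p = 10, K = 2, param = 1, sdsignal = 0.5, DE = 0.8,
                            allZero.rm = FALSE, tag.samples = TRUE)
# Show the first rows of the training count matrix.
head(counts$x)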


NBLDA documentation built on March 18, 2022, 7:51 p.m.