newCountDataSet: Generate a simulated sequencing data set using a negative...


View source: R/newCountDataSet.R

Description

Generate two n x p data sets, a training set and a test set, as well as outcome vectors y and yte of length n indicating the class labels of the training and test observations.

Usage

newCountDataSet(n, p, K, param, sdsignal, drate)

Arguments

n

Number of observations desired.

p

Number of features desired. Note that a proportion drate of the features will differ between classes, though some of those differences may be small.

K

Number of classes desired. Note that the function requires n to be at least 4K, i.e. there must be at least 4 observations per class on average.

param

The dispersion parameter for the negative binomial distribution. The negative binomial distribution is parameterized using "mu" and "size", as in the R function "rnbinom". That is, Y ~ NB(mu, param) means that E(Y) = mu and Var(Y) = mu + mu^2/param. So when param is very large this is essentially a Poisson distribution, and when param is small there is a lot of overdispersion relative to the Poisson distribution.
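The mean and variance relationship described for param can be checked empirically with base R. The following is a minimal sketch, not part of the package: the values of mu and param and the sample size are illustrative assumptions, and only "rnbinom" from the stats package is used.

```r
# Sketch: check E(Y) = mu and Var(Y) = mu + mu^2/param for rnbinom(mu=, size=).
# mu, param and the sample size are illustrative choices, not package defaults.
set.seed(1)
mu <- 5
param <- 10                      # passed as `size` to rnbinom

y <- rnbinom(1e5, mu = mu, size = param)
emp_mean <- mean(y)              # should be close to mu
emp_var <- var(y)                # should be close to theo_var
theo_var <- mu + mu^2 / param    # 5 + 25/10 = 7.5

# With a very large size the variance approaches the mean (Poisson-like).
y_pois_like <- rnbinom(1e5, mu = mu, size = 1e6)
var_pois_like <- var(y_pois_like)
```

Comparing emp_var with theo_var, and var_pois_like with mu, shows the overdispersion shrinking as param grows.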

sdsignal

The extent to which the classes are different. If this equals zero then there are no class differences and if this is large then the classes are very different.

drate

The proportion of differentially expressed genes.

Value

list(.) A list of output. "sim_train_data" is the training data, a q*n data matrix. "sim_test_data" is the test data, a q*n data matrix. The column names of these two matrices are the class labels of the n observations. Features with 0 total counts are removed, so the q retained features are those with total count > 0 in the data set, and q <= p. "truesf" gives the size factors for the training observations. "isDE" gives the differential expression label of each gene.
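To illustrate why q can be smaller than p, here is a minimal base R sketch of dropping features with zero total counts from a features-by-observations matrix. The toy matrix and the variable names are hypothetical, and this is not the package's own implementation, only the filtering rule the text describes.

```r
# Hypothetical toy matrix: 4 features (rows) x 3 observations (columns).
counts <- matrix(c(3, 0, 1,
                   0, 0, 0,
                   2, 5, 0,
                   0, 0, 0),
                 nrow = 4, byrow = TRUE)
colnames(counts) <- c("1", "2", "1")   # class labels stored as column names

# Keep only features whose total count across observations is > 0.
keep <- rowSums(counts) > 0
filtered <- counts[keep, , drop = FALSE]

p <- nrow(counts)            # features requested
q <- nrow(filtered)          # features retained, q <= p
labels <- colnames(filtered) # class labels survive the filtering
```

Here two of the four features have zero total counts, so q is 2 while p is 4. The drop = FALSE argument keeps the result a matrix even if only one feature remains.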

Examples

dat <- newCountDataSet(n = 40, p = 500, K = 4, param = 10, sdsignal = 0.1, drate = 0.4)

zhangli1109/ENTC documentation built on Nov. 10, 2020, 11:16 p.m.