SimulateData: Data Simulation Function to Study the Performance of TreeFDR.


Description

We include five scenarios ('S1'-'S5'). S1-S3 are phylogeny-informative (clade-consistent) scenarios, while S4 and S5 are phylogeny-noninformative (clade-inconsistent) scenarios. In 'S1', we simulate two large clusters of differentially abundant OTUs, and OTUs in the same cluster share the same fold change (effect size). In 'S2', we relax this assumption and generate variable fold changes for OTUs within the same cluster. In 'S3', we simulate many small clusters (10) of differentially abundant OTUs with the same effect sizes. In 'S4', we again simulate two large clusters of differentially abundant OTUs but allow opposite effects for OTUs within the same cluster, which violates the assumption that closely related OTUs have similar effects. In 'S5', we randomly pick 10% of the OTUs without regard to the underlying phylogeny.
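The scenarios differ mainly in which OTUs receive non-zero log fold changes and how those changes are distributed within a cluster. The base-R sketch below illustrates that idea only. The helper assign_beta, its arguments effect and n.small, the choice of the two largest clusters (S1, S2, S4) or the smallest clusters (S3), and the sign and spread rules are all hypothetical assumptions, not StructFDR internals.

```r
# Hypothetical sketch: assign true log fold changes (beta) to OTUs under
# scenarios S1-S5, given cluster memberships. Illustrative only; this is
# NOT the StructFDR implementation.
assign_beta <- function(clustering, scene, effect = 1, n.small = 10) {
  stopifnot(scene %in% paste0("S", 1:5))
  nOTU <- length(clustering)
  beta <- numeric(nOTU)

  if (scene == "S5") {
    # S5: 10% of OTUs picked at random, ignoring clusters (and the tree)
    idx <- sample(nOTU, round(0.1 * nOTU))
    beta[idx] <- effect * sample(c(-1, 1), length(idx), replace = TRUE)
    return(beta)
  }

  # S1, S2, S4: the two largest clusters; S3: the n.small smallest clusters
  sizes <- table(clustering)
  ids <- as.integer(names(sizes))
  chosen <- if (scene == "S3") {
    ids[order(sizes)][seq_len(min(n.small, length(ids)))]
  } else {
    ids[order(sizes, decreasing = TRUE)][1:2]
  }

  for (k in chosen) {
    idx <- which(clustering == k)
    sgn <- sample(c(-1, 1), 1)  # one direction per cluster
    beta[idx] <- switch(scene,
      S1 = sgn * effect,                                 # identical effects
      S2 = sgn * effect * runif(length(idx), 0.5, 1.5),  # variable, same sign
      S3 = sgn * effect,                                 # identical effects
      S4 = effect * sample(c(-1, 1), length(idx), replace = TRUE)  # mixed signs
    )
  }
  beta
}

# Usage with hypothetical memberships: 400 OTUs in 20 clusters
set.seed(1)
cl <- rep(1:20, times = c(40, 36, rep(18, 18)))
table(assign_beta(cl, "S1") != 0)  # 76 OTUs (clusters 1 and 2) are non-zero
```

Under these assumptions, S1 and S4 pick the same OTUs and differ only in whether the signs agree within a cluster, which is exactly the contrast between the clade-consistent and clade-inconsistent settings.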

Usage

SimulateData(nCases = 50, nControls = 50, nOTU = 400, nCluster = 20, depth = 10000,
             p.est, theta, scene, signal.strength = 4, otu.no.min = 40,
             otu.no.max = 80, zero.pct = 0, balanced = FALSE)

Arguments

nCases, nControls

the number of case and control samples.

nOTU

the number of OTUs simulated.

nCluster

the number of clusters into which the OTUs are partitioned (see clustering under Value).

depth

the mean library size (sequencing depth). Library sizes are drawn from a negative binomial distribution with size = 25.

p.est, theta

the parameters of the Dirichlet distribution: the proportion vector (p.est) and the dispersion parameter (theta).

scene

the simulation scenario: one of 'S1', 'S2', 'S3', 'S4' or 'S5' (see Description).

signal.strength

the signal strength (related to the mean and standard deviation of the log fold changes).

otu.no.min, otu.no.max

the minimum and maximum numbers of differentially abundant OTUs. Defaults are 40 and 80.

zero.pct

the percentage of non-differential OTUs within the cluster/clade. Applicable to 'S1' and 'S2' only.

balanced

a logical value indicating whether the fold changes are applied to case samples only (FALSE: increase or decrease in cases, no change in controls) or to both case and control samples (TRUE: increases in both case and control samples). A balanced design gives similar power for all OTUs.
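Taken together, these arguments describe a count-generating model. The following is a minimal base-R sketch of one draw under stated assumptions, not the package's code. The helper names simulate_counts and rdirichlet1 are hypothetical. The Dirichlet parameterization alpha = p.est * (1 - theta) / theta is an assumption (the convention used by the dirmult package), since this page does not state it. Only the balanced = FALSE case (fold changes applied to cases only) is shown, and the sketch returns raw counts rather than the normalized X that SimulateData returns.

```r
# Hypothetical sketch of one simulation draw: NB library sizes, Dirichlet
# OTU proportions, fold changes applied to cases, multinomial counts.
# ASSUMPTION: alpha = p.est * (1 - theta) / theta (dirmult convention).
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)  # independent gamma draws
  g / sum(g)                                 # normalize to a Dirichlet draw
}

simulate_counts <- function(p.est, theta, beta, nCases = 50, nControls = 50,
                            depth = 10000) {
  n <- nCases + nControls
  y <- rep(c(0, 1), c(nControls, nCases))   # 0 = control, 1 = case
  lib <- rnbinom(n, size = 25, mu = depth)  # library sizes, as documented
  alpha <- p.est * (1 - theta) / theta
  X <- matrix(0L, nrow = length(p.est), ncol = n)  # rows: OTUs; cols: samples
  for (j in seq_len(n)) {
    p <- rdirichlet1(alpha)
    if (y[j] == 1) p <- p * exp(beta)       # balanced = FALSE: cases only
    X[, j] <- rmultinom(1, size = lib[j], prob = p / sum(p))
  }
  list(y = y, X = X, lib = lib)             # raw (unnormalized) counts
}

# Usage with hypothetical inputs: flat proportions, 40 OTUs with fold change 4
set.seed(2)
p.est <- rep(1 / 400, 400)
sim <- simulate_counts(p.est, theta = 0.02,
                       beta = c(rep(log(4), 40), rep(0, 360)))
dim(sim$X)
```

Drawing the proportions per sample before applying the fold change keeps the between-sample overdispersion from the Dirichlet step, so case/control differences must be detected against realistic biological variation rather than pure multinomial noise.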

Value

y

a vector of group membership. 0 = control, 1 = case.

X

a matrix of normalized OTU counts (rows: OTUs; columns: samples).

beta.true

a vector of the true log fold changes for all OTUs. 0s for non-differential OTUs.

D

a matrix of the cophenetic distances among the simulated OTUs.

tree

a simulated coalescent tree of the 'phylo' class.

clustering

a vector of cluster memberships for the OTUs based on PAM.
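The D, tree and clustering components are related: PAM groups the OTUs using distances along the tree. The sketch below assumes that PAM is run on the cophenetic distance matrix, which this page suggests but does not state. As a stand-in for the simulated coalescent tree, it builds a stats::hclust tree on random points (purely illustrative). If the ape package is available, ape::rcoal() together with cophenetic() would give a coalescent 'phylo' tree and its cophenetic distances instead.

```r
# Stand-in sketch: build a tree, compute its cophenetic distance matrix D,
# then partition the OTUs with PAM. The hclust tree on random points is a
# substitute for the coalescent 'phylo' tree described on this page.
library(cluster)  # pam(); a recommended package shipped with standard R

set.seed(3)
nOTU <- 400
nCluster <- 20
pts <- matrix(rnorm(nOTU * 2), ncol = 2)   # hypothetical OTU coordinates
hc <- hclust(dist(pts), method = "average")
D <- as.matrix(cophenetic(hc))             # OTU-by-OTU tree distances
clustering <- pam(as.dist(D), k = nCluster, cluster.only = TRUE)
table(clustering)                          # cluster sizes
```

Because cophenetic distances are read off the tree, PAM clusters built from them correspond roughly to clades, which is what makes the clade-consistent scenarios meaningful.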

Author(s)

Jun Chen

References

Jian Xiao, Hongyuan Cao and Jun Chen (2016). False discovery rate control incorporating phylogenetic tree increases detection power in microbiome-wide multiple testing. Submitted.

Examples

# Generate data sets for scenarios S1-S5
library(StructFDR)
data(throat.parameter)
# Scenario S1
data.obj <- SimulateData(nCases = 50, nControls = 50, nOTU = 400, nCluster = 20,
    depth = 10000, p.est = throat.parameter$p.est, theta = throat.parameter$theta,
    scene = 'S1', signal.strength = 4)
# Scenario S2
data.obj <- SimulateData(nCases = 50, nControls = 50, nOTU = 400, nCluster = 20,
    depth = 10000, p.est = throat.parameter$p.est, theta = throat.parameter$theta,
    scene = 'S2', signal.strength = 4)
# Scenario S3
data.obj <- SimulateData(nCases = 50, nControls = 50, nOTU = 400, nCluster = 100,
    depth = 10000, p.est = throat.parameter$p.est, theta = throat.parameter$theta,
    scene = 'S3', signal.strength = 4)
# Scenario S4
data.obj <- SimulateData(nCases = 50, nControls = 50, nOTU = 400, nCluster = 20,
    depth = 10000, p.est = throat.parameter$p.est, theta = throat.parameter$theta,
    scene = 'S4', signal.strength = 2)
# Scenario S5
data.obj <- SimulateData(nCases = 50, nControls = 50, nOTU = 400, nCluster = 20,
    depth = 10000, p.est = throat.parameter$p.est, theta = throat.parameter$theta,
    scene = 'S5', signal.strength = 4)

StructFDR documentation built on May 2, 2019, 9:44 a.m.