simData: Simulate different scenarios of abundance change in entities


Description

simData simulates different abundance patterns for entities under different conditions. Each entity corresponds to a node on a tree. More details about the simulated patterns can be found in the vignette via browseVignettes("treeAGG").

Usage

simData(tree = NULL, data = NULL, obj = NULL, scenario = "S1",
  from.A = NULL, from.B = NULL, minTip.A = 0, maxTip.A = Inf,
  minTip.B = 0, maxTip.B = Inf, minPr.A = 0, maxPr.A = 1,
  ratio = 2, adjB = NULL, pct = 0.6, nSam = c(50, 50),
  mu = 10000, size = 50, n = 1, fun = sum)

Arguments

tree

A phylo object. Used only when obj is NULL.

data

A matrix representing a table of values, such as counts, collected from real data. Rows are the entities corresponding to the tree leaves and columns are the samples. Used only when obj is NULL.

obj

A leafSummarizedExperiment object that includes a list of matrix-like elements, or a matrix-like element in assays, and a phylo object in metadata. In other words, obj provides the same information given by tree and data.

scenario

“S1”, “S2”, or “S3” (see Details). Default is “S1”.

from.A, from.B

The node labels of branches A and B, between which the signal is swapped. Both default to NULL. In the simulation, two branches (A and B) are selected to have differential abundance under different conditions. One can specify these two branches or let simData choose them. (Note: if from.A is NULL, from.B is set to NULL.)

minTip.A

The minimum number of leaves in branch A

maxTip.A

The maximum number of leaves in branch A

minTip.B

The minimum number of leaves in branch B

maxTip.B

The maximum number of leaves in branch B

minPr.A

A numeric value between 0 and 1. The minimum abundance proportion of leaves in branch A.

maxPr.A

A numeric value between 0 and 1. The maximum abundance proportion of leaves in branch A.

ratio

A numeric value. The proportion ratio of branch B to branch A. This value is used to select the branches (see Details). If no pair of branches has exactly this ratio, the pair whose ratio is closest to ratio is selected.
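The closest-ratio rule can be illustrated with a minimal base-R sketch. The branch names and proportions below are hypothetical, and the code is not simData's internal implementation; it only shows how a pair (A, B) could be picked so that the B-to-A ratio is as close as possible to ratio.

```r
# Hypothetical abundance proportions for four candidate branches
# (illustrative values, not taken from any real data set).
prop <- c(b1 = 0.05, b2 = 0.12, b3 = 0.20, b4 = 0.31)
ratio <- 2

# All ordered pairs (A, B) of distinct branches.
pairs <- expand.grid(A = names(prop), B = names(prop),
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$A != pairs$B, ]

# Ratio of branch B to branch A for each pair.
pairs$ratio <- unname(prop[pairs$B] / prop[pairs$A])

# Keep the pair whose ratio is closest to the requested value.
best <- pairs[which.min(abs(pairs$ratio - ratio)), ]
print(best)
```

No pair in this toy example has a ratio of exactly 2, so the nearest one is returned instead.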

adjB

A numeric value between 0 and 1 (only for scenario “S3”). Default is NULL. If NULL, branch A and the selected part of branch B swap their proportions. If a numeric value is given, e.g. 0.1, the selected part of branch B decreases to one tenth of its proportion and the amount removed from branch B is added to branch A. For example, assume there are two experimental conditions (C1 and C2), and that branch A has 10 and branch B has 40 in C1. If adjB is set to 0.1, then in C2 branch B becomes 4 and branch A becomes 46, so that the total proportion stays the same.
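The arithmetic of the adjB example can be checked with a few lines of base R. This is only a worked illustration of the numbers quoted above; the variable names are hypothetical and, for simplicity, the whole of branch B is treated as the selected part.

```r
# Proportions in condition C1 (values from the example above).
A_c1 <- 10
B_c1 <- 40
adjB <- 0.1

# In C2, branch B shrinks to adjB times its C1 value ...
B_c2 <- adjB * B_c1
# ... and the amount removed from B is added to A.
A_c2 <- A_c1 + (B_c1 - B_c2)

# Branch totals in C2; the overall total is unchanged.
print(c(A = A_c2, B = B_c2))
print(A_c1 + B_c1 == A_c2 + B_c2)
```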

pct

The percentage of leaves in branch B that have differential abundance under different conditions (only for scenario “S3”).

nSam

A numeric vector of length 2 giving the sample sizes for the two conditions.

mu, size

The parameters of the Negative Binomial distribution (see mu and size in rnbinom). They are used to generate the library size of each simulated sample.
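As a rough illustration of what mu and size control, the sketch below draws library sizes with rnbinom using the default values from Usage. Drawing all samples in one call and the group labels C1 and C2 are assumptions made for the example, not a description of simData's internals.

```r
set.seed(42)

nSam <- c(50, 50)   # samples per condition (default in Usage)
mu   <- 10000       # mean library size
size <- 50          # dispersion; larger values give less spread

# One library size per simulated sample.
lib_size <- rnbinom(n = sum(nSam), mu = mu, size = size)

# Hypothetical condition labels for the two groups.
group <- rep(c("C1", "C2"), times = nSam)

summary(lib_size)
tapply(lib_size, group, mean)
```

With this parameterisation the variance is mu + mu^2 / size, so lowering size makes library sizes vary more between samples.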

n

A numeric value specifying how many count tables are generated with the same settings. The default is 1, which yields a single count table. If n is greater than 1, the output is a list of matrices (count tables). This is useful when multiple simulations are needed.

fun

A function used to derive the count at each internal node from its descendant leaves, e.g. sum or mean. The function takes a numeric vector containing the counts of an internal node's descendant leaves.
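To make the role of fun concrete, the sketch below aggregates leaf counts to a single internal node in base R. The leaf labels, counts, and the choice of descendant leaves are hypothetical; the code only mimics the idea of applying fun to the descendant leaves within each sample.

```r
# Hypothetical counts of three leaves (rows) in four samples (columns).
leaf_counts <- matrix(c(5, 0, 2, 7,
                        3, 1, 4, 0,
                        2, 6, 1, 3),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("t1", "t2", "t3"),
                                      paste0("S", 1:4)))

# Suppose an internal node has leaves t1 and t2 as descendants.
desc <- c("t1", "t2")

# Apply the aggregation function to the descendant leaves, sample by sample.
node_sum  <- apply(leaf_counts[desc, , drop = FALSE], 2, sum)
node_mean <- apply(leaf_counts[desc, , drop = FALSE], 2, mean)

print(rbind(sum = node_sum, mean = node_mean))
```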

Details

simData simulates a count table for entities that correspond to the nodes of a tree. The entities are in rows and the samples from different groups or conditions are in columns. The library size of each sample is drawn from a Negative Binomial distribution with the mean and size specified by the arguments mu and size. Within a sample, the counts of the entities that are mapped to the leaf nodes are assumed to follow a Dirichlet-Multinomial distribution. The parameters of the Dirichlet-Multinomial distribution are estimated from a real data set, specified by the argument data, via the function dirmult (see dirmult). To generate different abundance patterns under different conditions, three scenarios are provided: “S1”, “S2”, and “S3” (specified via scenario). The vignette provides figures that explain these three scenarios (try browseVignettes("treeAGG")).
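The sampling scheme described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's code: the Dirichlet parameters alpha are hypothetical (simData estimates them from real data via dirmult), the helper sim_one is made up for the example, and no differential-abundance scenario is applied.

```r
set.seed(7)

# Hypothetical Dirichlet parameters for five leaves.
alpha <- c(t1 = 2, t2 = 5, t3 = 1, t4 = 8, t5 = 4)
n_samples <- 6

# Step 1: library size of each sample from a Negative Binomial.
lib_size <- rnbinom(n_samples, mu = 10000, size = 50)

# Step 2: for each sample, draw leaf proportions from a Dirichlet
# (normalised Gamma draws), then counts from a Multinomial.
sim_one <- function(N, alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  p <- g / sum(g)
  rmultinom(1, size = N, prob = p)[, 1]
}

counts <- vapply(lib_size, sim_one, integer(length(alpha)), alpha = alpha)
rownames(counts) <- names(alpha)
colnames(counts) <- paste0("S", seq_len(n_samples))

print(counts)
```

Each column of counts sums to the library size drawn for that sample, which is the defining constraint of the Multinomial step.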

Value

A treeSummarizedExperiment object.

Author(s)

Ruizhu Huang

Examples

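library(treeAGG)
data("tinyTree")

# Toy count table: 10 entities (rows, one per tree leaf) by 10 samples (columns)
set.seed(1)
y <- matrix(rnbinom(100, size = 1, mu = 10), nrow = 10)
colnames(y) <- paste("S", 1:10, sep = "")
rownames(y) <- tinyTree$tip.label

# Combine the tree and the count table in one object
toy_lse <- leafSummarizedExperiment(tree = tinyTree,
                                    assays = list(y))

# Estimate the Dirichlet-Multinomial parameters from the toy data
res <- parEstimate(data = toy_lse)

# Simulate a count table with the default scenario ("S1")
set.seed(1122)
dat1 <- simData(obj = res)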

markrobinsonuzh/treeAGG documentation built on May 26, 2019, 9:32 a.m.