View source: R/dacomp_generate_example_data.R
This function generates a two-sample dataset based on the kostic
dataset (Kostic et al. 2012) from the phyloseq
package (McMurdie and Holmes 2013). Simulated data is generated by a procedure similar to the one presented in Brill et al. 2019, Subsection 4.1. See additional details below.
dacomp.generate_example_dataset.two_sample(
  n_X = 30,
  n_Y = 30,
  m1 = 30,
  signal_strength_as_change_in_microbial_load = 0.1
)
n_X: Number of samples from the first group.
n_Y: Number of samples from the second group.
m1: Number of differentially abundant taxa.
signal_strength_as_change_in_microbial_load: A number in the range 0-0.75, indicating the fraction of the microbial load of group Y that is added due to the simulated condition. The complement of this fraction is the fraction of the microbial load of group Y that is distributed across taxa as in group X.
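To make the definition of signal_strength_as_change_in_microbial_load concrete, the following sketch works through the arithmetic it implies. The variable names and the baseline load of 9000 are illustrative assumptions, not part of the package.

```r
# Illustrative arithmetic only; these names are not part of dacomp.
signal <- 0.1            # fraction of group Y's load added by the simulated condition
baseline_load <- 9000    # hypothetical load distributed across taxa as in group X

# If added / (baseline + added) = signal, then added = baseline * signal / (1 - signal).
added_load <- baseline_load * signal / (1 - signal)
total_load <- baseline_load + added_load

fraction_added <- added_load / total_load        # equals signal
fraction_as_in_X <- baseline_load / total_load   # equals 1 - signal
```

Under this reading, a signal strength of 0.1 means that one tenth of the total load in group Y comes from the added signal, and the remaining nine tenths follow the group X distribution.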
Data is generated as follows.

In the first step, we generate a list of vectors of relative frequencies to sample from: only healthy subjects from the kostic colorectal dataset are selected. Samples with fewer than 500 reads are dropped. Only OTUs that appear in 2 or more subjects are retained.

In the second step, samples for group X are generated. For each sample, a vector of frequencies is chosen at random from the list generated in the first step. The observed samples are multinomial random variables with a probability vector matching the selected frequencies, and a total number of reads realized from a Poisson distribution whose mean is the median number of reads across the samples retained in the first step.

In the third step, samples for group Y are generated. For each sample, a vector of frequencies is chosen at random, as for group X. The frequencies of the differentially abundant taxa are increased, with the increase realized from a Poisson random variable, such that the total increase in microbial load across all differentially abundant taxa is equivalent to the signal strength specified by the user. Observed counts are sampled based on the updated frequencies.

This function requires the phyloseq package from Bioconductor.
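The three steps can be sketched in base R on toy data. This is a minimal illustration and not the package's implementation: the reference frequencies are simulated rather than taken from the kostic data, and all names (freq_list, draw_sample, the Poisson weights used to spread the added load) are assumptions made for the example.

```r
# A toy sketch of the three steps; not the package's implementation.
set.seed(1)

# Step 1 (stand-in): a small list of relative-frequency vectors.
# In the documented procedure these come from healthy subjects in the kostic data.
n_taxa <- 20
reference_counts <- matrix(rpois(5 * n_taxa, lambda = 50) + 1, nrow = 5)
freq_list <- lapply(seq_len(nrow(reference_counts)), function(i) {
  reference_counts[i, ] / sum(reference_counts[i, ])
})
median_reads <- median(rowSums(reference_counts))

# Draw one sample: Poisson total number of reads, then multinomial counts.
draw_sample <- function(prob, mean_reads) {
  n_reads <- rpois(1, lambda = mean_reads)
  as.vector(rmultinom(1, size = n_reads, prob = prob))
}

n_X <- 4; n_Y <- 4; m1 <- 3; signal <- 0.1
select_diff_abundant <- sample(n_taxa, m1)

# Step 2: group X samples use a randomly chosen frequency vector unchanged.
X <- t(sapply(seq_len(n_X), function(i) {
  draw_sample(freq_list[[sample(length(freq_list), 1)]], median_reads)
}))

# Step 3: group Y samples get extra mass on the differentially abundant taxa,
# so that the added mass is a fraction `signal` of the new total.
Y <- t(sapply(seq_len(n_Y), function(i) {
  prob <- freq_list[[sample(length(freq_list), 1)]]
  weights <- rpois(m1, lambda = 10) + 1   # assumed way of spreading the increase
  added <- (signal / (1 - signal)) * weights / sum(weights)
  prob[select_diff_abundant] <- prob[select_diff_abundant] + added
  draw_sample(prob / sum(prob), median_reads)
}))

counts <- rbind(X, Y)
group_labels <- c(rep(0, n_X), rep(1, n_Y))
```

The resulting counts matrix has one row per sample and one column per taxon, with group X rows first, which mirrors the layout described for the returned value.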
A list with the following entries:
counts: A counts matrix with (n_X + n_Y) rows and 1384 columns; rows represent samples, columns represent taxa.
group_labels: A vector of group labelings, with values 0 and 1.
select_diff_abundant: A vector containing the indices of taxa that are differentially abundant.
taxonomy: A table for the taxonomic affiliation of OTUs in the simulated dataset.
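As a quick sanity check on a returned list, a helper along the following lines may be useful. The function check_two_sample_dataset is hypothetical and not part of dacomp. It assumes that group_labels has one entry per row of counts and that select_diff_abundant holds column indices of counts. It is demonstrated on a toy list rather than on real output.

```r
# Hypothetical helper, not part of dacomp: checks the shape of the returned list.
check_two_sample_dataset <- function(data, n_X, n_Y) {
  stopifnot(
    is.matrix(data$counts),
    nrow(data$counts) == n_X + n_Y,
    length(data$group_labels) == nrow(data$counts),   # assumed: one label per sample
    all(data$group_labels %in% c(0, 1)),
    all(data$select_diff_abundant %in% seq_len(ncol(data$counts)))  # assumed: column indices
  )
  invisible(TRUE)
}

# Toy stand-in with the documented entries (taxonomy omitted for brevity).
toy <- list(
  counts = matrix(rpois(6 * 4, lambda = 5), nrow = 6),
  group_labels = c(0, 0, 0, 1, 1, 1),
  select_diff_abundant = c(2, 4)
)
check_ok <- check_two_sample_dataset(toy, n_X = 3, n_Y = 3)
```

With real output, the same call would take the list returned by the generator together with the n_X and n_Y values passed to it.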
Brill, Barak, Amnon Amir, and Ruth Heller. 2019. Testing for Differential Abundance in Compositional Counts Data, with Application to Microbiome Studies. arXiv Preprint arXiv:1904.08937.
Kostic, Aleksandar D, Dirk Gevers, Chandra Sekhar Pedamallu, Monia Michaud, Fujiko Duke, Ashlee M Earl, Akinyemi I Ojesina, et al. 2012. Genomic Analysis Identifies Association of Fusobacterium with Colorectal Carcinoma. Genome Research 22 (2). Cold Spring Harbor Lab: 292–98.
McMurdie, Paul J, and Susan Holmes. 2013. Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PloS One 8 (4). Public Library of Science: e61217.
## Not run: 
library(dacomp)
set.seed(1)
data = dacomp.generate_example_dataset.two_sample(m1 = 100,
                                                  n_X = 50,
                                                  n_Y = 50,
                                                  signal_strength_as_change_in_microbial_load = 0.1)
## End(Not run)