View source: R/dacomp_generate_example_data.R
This function generates a dataset with a continuous phenotype, based on the kostic
dataset (Kostic et al. 2012) from the phyloseq
package (McMurdie and Holmes 2013). Simulated data is generated by a procedure similar to the one presented in Brill et al. 2019, Subsection 4.1. See additional details below.
dacomp.generate_example_dataset_continuous(
  n,
  m1 = 30,
  signal_strength_as_change_in_microbial_load = 0.1
)
n: Number of samples.
m1: Number of differentially abundant taxa.
signal_strength_as_change_in_microbial_load: A number in the range 0-0.75, indicating the fraction of the microbial load that is added to the measured ecosystem if the phenotype for the sample is equal to 1. For phenotypes with lower values, the change in the microbial load is proportional to the value of the measured phenotype.
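As a rough illustration of the proportional rule described for signal_strength_as_change_in_microbial_load, the sketch below computes the expected load added to a single sample. The helper name `expected_added_load` and the `original_load` argument (the total reads in the sample before any signal is added) are hypothetical and are not part of the dacomp API.

```r
# Expected load added to one sample under the proportional rule:
#   added = signal_strength * phenotype * original_load
# `expected_added_load` and `original_load` are illustrative names only.
expected_added_load <- function(signal_strength, phenotype, original_load) {
  stopifnot(signal_strength >= 0, signal_strength <= 0.75)  # documented range
  stopifnot(phenotype >= 0, phenotype <= 1)                 # phenotype lies in [0, 1]
  signal_strength * phenotype * original_load
}

# With the default signal strength of 0.1 and a sample of 10000 reads:
full_signal    <- expected_added_load(0.1, 1.0, 10000)  # phenotype 1: 10% of the load
partial_signal <- expected_added_load(0.1, 0.4, 10000)  # phenotype 0.4: 4% of the load
```

Under these assumptions, a sample with phenotype 0.4 receives 40% of the increase that a sample with phenotype 1 would receive.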
Data is generated as follows.
In the first step, we generate a list of frequency vectors to sample from: only healthy subjects from the Kostic colorectal dataset are selected. Samples with fewer than 500 reads are dropped. Only OTUs that appear in 2 or more subjects are retained.
In the second step, a random phenotype is sampled for each sample from a Uniform(0,1) distribution.
In the third step, samples are generated. For each sample, a vector of frequencies is chosen at random. The counts of the differentially abundant taxa are then increased, with the additions realized from a Poisson random variable. The inserted signal is such that a phenotype with a value of 1 is equivalent to an increase in the microbial load of signal_strength_as_change_in_microbial_load, expressed as a fraction of the original microbial load.
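The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the reference counts are synthetic stand-ins for the filtered Kostic data, the reference counts are used directly rather than resampled from frequency vectors, and the added load is split evenly across the differentially abundant taxa. The function name `simulate_continuous` and its arguments are hypothetical.

```r
set.seed(1)

# Synthetic stand-in for the reference data (assumption): 20 samples x 50 taxa.
reference_counts <- matrix(rpois(20 * 50, lambda = 40), nrow = 20)

# Step 1: drop samples with fewer than 500 reads, keep taxa seen in >= 2 samples.
reference_counts <- reference_counts[rowSums(reference_counts) >= 500, , drop = FALSE]
reference_counts <- reference_counts[, colSums(reference_counts > 0) >= 2, drop = FALSE]

simulate_continuous <- function(reference_counts, n, m1, signal) {
  n_taxa <- ncol(reference_counts)
  # Step 2: one Uniform(0,1) phenotype per simulated sample.
  covariate <- runif(n)
  # Pick which taxa carry the signal.
  select_diff_abundant <- sample(n_taxa, m1)
  counts <- matrix(0L, nrow = n, ncol = n_taxa)
  for (i in seq_len(n)) {
    # Step 3: choose a reference vector at random ...
    base <- reference_counts[sample(nrow(reference_counts), 1), ]
    # ... and add Poisson counts whose expected total is
    # signal * phenotype * original load (even split is an assumption).
    lambda_per_taxon <- signal * covariate[i] * sum(base) / m1
    added <- rpois(m1, lambda = lambda_per_taxon)
    base[select_diff_abundant] <- base[select_diff_abundant] + added
    counts[i, ] <- base
  }
  list(counts = counts,
       covariate = covariate,
       select_diff_abundant = select_diff_abundant)
}

sim <- simulate_continuous(reference_counts, n = 10, m1 = 5, signal = 0.1)
```

Because the expected added load scales with the phenotype, samples with larger covariate values tend to show larger counts in the selected taxa.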
A list with the following elements:
counts: A counts matrix with n rows and 1384 columns; rows represent samples, columns represent taxa.
covariate: The measured phenotype.
select_diff_abundant: A vector containing the indices of taxa that are differentially abundant.
taxonomy: A table for the taxonomic affiliation of OTUs in the simulated dataset.
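One way to sanity-check a returned object against the structure described above is sketched below. The helper `check_example_dataset` is hypothetical and not part of dacomp, and the `mock` list is a placeholder built only to show the expected shape; it assumes that covariate holds one value per sample.

```r
# Hypothetical helper: verify that a list has the documented elements and shapes.
check_example_dataset <- function(x, n) {
  stopifnot(is.list(x))
  expected_names <- c("counts", "covariate", "select_diff_abundant", "taxonomy")
  stopifnot(all(expected_names %in% names(x)))
  # counts: n rows (samples) and 1384 columns (taxa), as documented.
  stopifnot(nrow(x$counts) == n, ncol(x$counts) == 1384)
  # covariate: assumed to hold one phenotype value per sample.
  stopifnot(length(x$covariate) == n)
  # select_diff_abundant: indices must point at columns of counts.
  stopifnot(all(x$select_diff_abundant %in% seq_len(ncol(x$counts))))
  invisible(TRUE)
}

# Placeholder object with the documented shape (not real simulated data).
mock <- list(
  counts = matrix(0L, nrow = 5, ncol = 1384),
  covariate = runif(5),
  select_diff_abundant = c(3L, 17L, 256L),
  taxonomy = data.frame(placeholder = character(1384))
)
check_ok <- check_example_dataset(mock, n = 5)
```

In practice the same check could be applied to the output of dacomp.generate_example_dataset_continuous, with n set to the requested number of samples.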
Brill, Barak, Amnon Amir, and Ruth Heller. 2019. Testing for Differential Abundance in Compositional Counts Data, with Application to Microbiome Studies. arXiv Preprint arXiv:1904.08937.
Kostic, Aleksandar D, Dirk Gevers, Chandra Sekhar Pedamallu, Monia Michaud, Fujiko Duke, Ashlee M Earl, Akinyemi I Ojesina, et al. 2012. Genomic Analysis Identifies Association of Fusobacterium with Colorectal Carcinoma. Genome Research 22 (2). Cold Spring Harbor Lab: 292–98.
McMurdie, Paul J, and Susan Holmes. 2013. Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PloS One 8 (4). Public Library of Science: e61217.
## Not run:
library(dacomp)
set.seed(1)
data = dacomp.generate_example_dataset_continuous(n = 100,
  m1 = 30, signal_strength_as_change_in_microbial_load = 0.1)
## End(Not run)