Generation of Dirichlet-Multinomial Random Samples


Description

Random generation of Dirichlet-Multinomial samples.

Usage

Dirichlet.multinomial(Nrs, shape)

Arguments

Nrs

A vector specifying the number of reads or sequence depth for each sample.

shape

A vector of Dirichlet parameters, one for each taxon.
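The Details section writes the distribution in terms of the mean proportions \{\pi_j\} and the overdispersion \theta rather than the shape vector. Assuming the parameterization used by the dirmult package (\pi_j = \gamma_j / \sum_k \gamma_k and \theta = 1 / (1 + \sum_k \gamma_k)), the conversion from shape can be sketched as follows. shape_to_pi_theta is a hypothetical helper for illustration, not an HMP function.

```r
# Hypothetical helper (not part of HMP): map a Dirichlet shape vector (gamma)
# to mean proportions pi and overdispersion theta, ASSUMING the
# parameterization pi_j = gamma_j / sum(gamma), theta = 1 / (1 + sum(gamma)).
shape_to_pi_theta <- function(shape) {
  stopifnot(is.numeric(shape), all(shape > 0))
  total <- sum(shape)
  list(pi = shape / total, theta = 1 / (1 + total))
}

params <- shape_to_pi_theta(c(2, 3, 5))
params$pi     # 0.2 0.3 0.5
params$theta  # 1/11 (about 0.0909)
```

Larger shape totals give a smaller \theta, that is, less overdispersion relative to a plain multinomial.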

Details

The Dirichlet-Multinomial distribution (Mosimann, 1962; Tvedebrink, 2010) is given by

\textbf{P}\left(\textbf{X}_i = \textbf{x}_i; \{\pi_j\}, \theta\right) = \frac{N_i!}{x_{i1}! \cdots x_{iK}!} \, \frac{\prod_{j=1}^{K} \prod_{r=1}^{x_{ij}} \left\{ \pi_j (1-\theta) + (r-1)\theta \right\}}{\prod_{r=1}^{N_i} \left\{ (1-\theta) + (r-1)\theta \right\}}

where \textbf{x}_i = \left[ x_{i1}, \ldots, x_{iK} \right] is the random vector of counts for K taxa (features), called the RAD vector; N_i = \sum_{j=1}^{K} x_{ij} is the total number of reads (sequence depth); \{\pi_j\} are the mean taxa proportions (the RAD-probability mean); and \theta is the overdispersion parameter.
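To make the formula concrete, the probability can be evaluated term by term in R. This is a minimal sketch written directly from the equation above, working on the log scale to avoid overflow; dm_pmf is a hypothetical helper, not an HMP function, and it takes \{\pi_j\} and \theta directly rather than the shape vector.

```r
# Hypothetical helper (not part of HMP): Dirichlet-multinomial probability
# in the (pi, theta) form, computed on the log scale.
dm_pmf <- function(x, p_mean, theta) {
  stopifnot(length(x) == length(p_mean), all(x >= 0), theta >= 0, theta < 1)
  N <- sum(x)
  # Multinomial coefficient N! / (x_1! ... x_K!)
  log_coef <- lgamma(N + 1) - sum(lgamma(x + 1))
  # Numerator: prod_j prod_{r=1}^{x_j} { pi_j (1 - theta) + (r - 1) theta }
  log_num <- 0
  for (j in seq_along(x)) {
    r <- seq_len(x[j])  # empty when x_j = 0, contributing a factor of 1
    log_num <- log_num + sum(log(p_mean[j] * (1 - theta) + (r - 1) * theta))
  }
  # Denominator: prod_{r=1}^{N} { (1 - theta) + (r - 1) theta }
  r <- seq_len(N)
  log_den <- sum(log((1 - theta) + (r - 1) * theta))
  exp(log_coef + log_num - log_den)
}

# Sanity check: summing over every count vector with N = 4 reads and K = 3 taxa
p_mean <- c(0.2, 0.3, 0.5)
theta <- 0.1
grid <- expand.grid(x1 = 0:4, x2 = 0:4)
grid <- grid[grid$x1 + grid$x2 <= 4, ]
probs <- apply(grid, 1, function(g) dm_pmf(c(g[1], g[2], 4 - sum(g)), p_mean, theta))
sum(probs)  # equals 1 up to floating-point rounding
```

Setting \theta = 0 collapses the denominator to 1 and the numerator to \prod_j \pi_j^{x_{ij}}, so the expression reduces to the ordinary multinomial probability.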

Note: Although the test statistic supports unequal numbers of reads across samples, its performance in that setting has not yet been fully tested.

Value

A data matrix of taxa counts where the rows are samples and columns are the taxa.
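One standard way to produce such a matrix is the two-stage (compound) construction: for each sample, draw taxa proportions from a Dirichlet distribution with the given shape, then draw that sample's reads from a multinomial with those proportions. The sketch below follows that construction in base R; rdm_sketch is a hypothetical function, and HMP's own implementation of Dirichlet.multinomial may differ in its details.

```r
# Hypothetical sketch (not HMP's code): two-stage Dirichlet-multinomial sampler.
# Rows of the result are samples, columns are taxa.
rdm_sketch <- function(Nrs, shape) {
  stopifnot(all(shape > 0), all(Nrs >= 0))
  K <- length(shape)
  counts <- matrix(0L, nrow = length(Nrs), ncol = K)
  for (i in seq_along(Nrs)) {
    # Stage 1: Dirichlet(shape) proportions via normalized Gamma(shape, 1) draws
    g <- rgamma(K, shape = shape, rate = 1)
    p <- g / sum(g)
    # Stage 2: distribute Nrs[i] reads across the K taxa
    counts[i, ] <- rmultinom(1, size = Nrs[i], prob = p)[, 1]
  }
  counts
}

set.seed(1)
sim <- rdm_sketch(Nrs = c(1000, 2500, 500), shape = c(2, 3, 5))
dim(sim)      # 3 3 (samples by taxa)
rowSums(sim)  # 1000 2500 500
```

Base R has no Dirichlet sampler, so normalizing independent Gamma draws is the usual substitute. Each row sums exactly to its entry of Nrs, including when read depths differ across samples.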

Author(s)

Patricio S. La Rosa, Elena Deych, Berkley Shands, William D. Shannon

References

Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions. Biometrika 49, 65-82.
Tvedebrink, T. (2010). Overdispersion in allelic counts and theta-correction in forensic genetics. Theor Popul Biol 78, 200-210.

Examples

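	library(HMP)
	library(dirmult)  # provides dirmult(); may already be attached with HMP
	data(saliva)
	
	### Set the number of reads per sample (20 samples, 15,000 reads each)
	Nrs <- rep(15000, 20)
	
	### Fit Dirichlet-multinomial parameters to the saliva data
	fit.saliva <- dirmult(saliva, trace = FALSE)
	
	### Simulate counts, using the fitted Dirichlet parameters (gamma) as shape
	dirmult_data <- Dirichlet.multinomial(Nrs, fit.saliva$gamma)
	dirmult_data[1:5, 1:5]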
