Function to simulate allele frequencies for independent loci from a Dirichlet model


The simufreqD function simulates single-population allele frequencies for independent loci. Allele frequencies are generated as random deviates from a Dirichlet distribution, whose parameters control the mean and the variance of the simulated allele frequencies.


simufreqD(nloc = 1, nal = 2, alpha = 1)



nloc: the number of loci to simulate


nal: the number of alleles per locus. Either an integer, if all loci have the same number of alleles, or an integer vector, if the number of alleles differs between loci


alpha: the parameter used to simulate allele frequencies from the Dirichlet distribution. If the nloc loci all have the same number of alleles, alpha can either be the same for all alleles, in which case alpha is an integer (the default is one, which gives a uniform distribution), or differ between alleles at a given locus, in which case alpha is a matrix of dimensions nal x nloc.

When the number of alleles differs between loci, alpha can either be the same or differ between alleles at a given locus. In the first case, alpha is a vector of length nloc; in the second case, alpha is a matrix of dimensions nal x nloc, where NAs are introduced for alleles not seen at a given locus.
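
As a sketch of the last case, the alpha matrix for loci with differing numbers of alleles can be built in base R by padding each column with NA. The parameter values below are arbitrary and chosen only for illustration, and the row count is assumed to be the largest number of alleles at any locus. The final call is left commented out because it assumes the package providing simufreqD is loaded.

```r
nal <- c(2, 3, 4)                      # number of alleles at each of 3 loci

# Illustrative (made-up) Dirichlet parameters, one vector per locus
alpha_list <- list(c(1, 1),
                   c(2, 1, 1),
                   c(1, 1, 1, 5))

# Rows are alleles, columns are loci; NA marks alleles absent at a locus
alpha <- sapply(alpha_list, function(a) c(a, rep(NA, max(nal) - length(a))))

dim(alpha)                             # rows = max(nal), columns = number of loci
# simufreqD(nloc = length(nal), nal = nal, alpha = alpha)
```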


Allele frequencies for independent loci are simulated using a Dirichlet distribution with parameter alpha. At a given locus L with n alleles, the allele frequencies are modeled as a random vector p = (p1, ..., pn) following a Dirichlet distribution with parameters alpha = (alpha1, ..., alphan), where p1 + ... + pn = 1 and alpha1, ..., alphan > 0.
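
To illustrate the model, Dirichlet deviates can be drawn by normalizing independent gamma variates, which is a standard construction. The sketch below uses only base R. The helper name rdirichlet_sketch is hypothetical, and this is not claimed to be the code used inside simufreqD.

```r
# Hypothetical helper: draw n Dirichlet(alpha) vectors by normalizing
# independent Gamma(alpha_i, 1) variates.
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # shape is recycled along the vector, so with byrow = TRUE each row
  # receives the shapes alpha_1, ..., alpha_k in order
  g <- matrix(rgamma(n * k, shape = alpha), nrow = n, ncol = k, byrow = TRUE)
  g / rowSums(g)   # each row now sums to 1
}

set.seed(1)
p <- rdirichlet_sketch(1000, c(1, 1, 1))   # 1000 replicates of a 3-allele locus
range(rowSums(p))                          # every row sums to 1
colMeans(p)                                # close to alpha_i / sum(alpha)
```

With all parameters equal to one, the draws are uniform over the set of valid frequency vectors, and each mean frequency is close to 1/3. Larger alpha values with the same proportions leave the means unchanged but reduce the variance.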


A matrix containing the simulated allele frequencies. The data are presented in the format used by the Journal of Forensic Sciences for genetic data: allele names are given in the first column, and each remaining column holds the frequencies for one marker, so the frequencies of a given allele are read across a row. When an allele is not observed at a given locus, the value is coded NA (instead of "-" in the original format).


The code used here to generate random Dirichlet deviates was previously implemented in the gtools package.


Hinda Haned


Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions, vol 2. John Wiley & Sons, 1995.

Wright S. The genetical structure of populations. Ann Eugen 1951;15:323-354.




# simulate allele frequencies for 5 markers with 2, 3, 4, 5, and 6 alleles, respectively

simufreqD(nloc = 5, nal = c(2, 3, 4, 5, 6), alpha = 1)
