draw_all_unstructured: Simulate random allele frequencies and genotypes from the...

draw_all_unstructured    R Documentation

Simulate random allele frequencies and genotypes from the unstructured model

Description

This function returns simulated ancestral allele frequencies and genotypes without structure: at each locus, all individuals draw their genotypes independently and identically from a Binomial distribution with the same ancestral allele frequency. The function is a wrapper around draw_p_anc() that adds features such as requiring polymorphic loci, and its options mimic those of draw_all_admix() where applicable. Importantly, by default fixed loci (those where all individuals are homozygous for the same allele) are redrawn from the start (starting from the ancestral allele frequencies), so the output contains no fixed loci. Below, m_loci (also m) is the number of loci and n_ind is the number of individuals.
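The procedure described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the helper name sketch_unstructured is hypothetical, the uniform range [0.01, 0.5] is the documented default for p_anc, and redrawing the ancestral allele frequency of a fixed locus is one reading of "from the start" that applies when p_anc is not supplied.

```r
# Hypothetical helper, NOT the bnpsd implementation: a base-R sketch of the
# unstructured model with redrawing of fixed loci.
sketch_unstructured <- function(n_ind, m_loci) {
  p_anc <- rep(NA_real_, m_loci)
  # genotype matrix: loci along rows, individuals along columns
  X <- matrix(NA_integer_, nrow = m_loci, ncol = n_ind)
  # at the start, every locus still has to be drawn
  to_draw <- rep(TRUE, m_loci)
  while (any(to_draw)) {
    idx <- which(to_draw)
    # (re)draw ancestral allele frequencies for these loci (assumed default range)
    p_anc[idx] <- runif(length(idx), min = 0.01, max = 0.5)
    # each genotype is Binomial(2, p_anc[i]), i.i.d. across individuals;
    # rep(..., times = n_ind) matches the column-major filling of the matrix
    X[idx, ] <- matrix(
      rbinom(length(idx) * n_ind, size = 2, prob = rep(p_anc[idx], times = n_ind)),
      nrow = length(idx), ncol = n_ind
    )
    # a locus is fixed if all individuals are homozygous for the same allele
    counts <- rowSums(X)
    to_draw <- counts == 0 | counts == 2 * n_ind
  }
  list(X = X, p_anc = p_anc)
}

set.seed(1)
out_sketch <- sketch_unstructured(n_ind = 5, m_loci = 10)
dim(out_sketch$X)
```

Because only the fixed loci are redrawn on each pass, the loop ends with exactly m_loci polymorphic loci, which matches the documented guarantee.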

Usage

draw_all_unstructured(
  n_ind,
  m_loci = NA,
  beta = NA,
  p_anc = NULL,
  require_polymorphic_loci = TRUE,
  maf_min = 0,
  verbose = TRUE
)

Arguments

n_ind

The number of individuals to draw (required).

m_loci

The number of loci to draw. Required except when p_anc below is provided and is a vector, in which case the number of loci equals the length of p_anc and the value of m_loci passed is ignored.

beta

Shape parameter for a symmetric Beta distribution for the ancestral allele frequencies p_anc. If NA (default), p_anc is uniform with range in [0.01, 0.5]. Otherwise, p_anc has a symmetric Beta distribution with range in [0, 1]. Has no effect if the p_anc option is non-NULL.
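As a rough illustration of the two cases, the following base-R sketch draws ancestral allele frequencies either uniformly or from a symmetric Beta. The helper name sketch_p_anc is hypothetical; the actual draws are delegated to draw_p_anc() in the package, whose internals are not shown here.

```r
# Hypothetical helper illustrating the two documented cases for `beta`;
# this is an assumption-based sketch, not the code of draw_p_anc().
sketch_p_anc <- function(m_loci, beta = NA) {
  if (is.na(beta)) {
    # default: uniform with range in [0.01, 0.5]
    runif(m_loci, min = 0.01, max = 0.5)
  } else {
    # symmetric Beta: both shape parameters equal `beta`, range in [0, 1]
    rbeta(m_loci, shape1 = beta, shape2 = beta)
  }
}

set.seed(1)
p_unif <- sketch_p_anc(1000)
p_beta <- sketch_p_anc(1000, beta = 0.5)
range(p_unif)
range(p_beta)
```

Small values of beta concentrate the frequencies near 0 and 1, whereas large values concentrate them near 0.5.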

p_anc

If provided, it is used as the ancestral allele frequencies (instead of drawing random ones). Must either be a scalar or a length-m_loci vector. If scalar, m_loci is required, and the returned p_anc is the scalar value repeated m_loci times. If p_anc is a vector, its length is used to define m_loci and the value of m_loci passed is ignored. If a locus was fixed and has to be redrawn, the ancestral allele frequency in p_anc is retained and only genotypes are redrawn.
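The scalar and vector cases can be tried as follows. This sketch uses only the arguments listed under Usage; the requireNamespace() guard and the variable names out_scalar and out_vector are additions for illustration, so that the snippet is skipped where bnpsd is not installed.

```r
# Runs only if the bnpsd package is available
if (requireNamespace("bnpsd", quietly = TRUE)) {
  library(bnpsd)

  # scalar p_anc: m_loci is required, and the returned p_anc is documented
  # to be the scalar repeated m_loci times
  out_scalar <- draw_all_unstructured(
    n_ind = 5, m_loci = 10, p_anc = 0.3, verbose = FALSE
  )
  length(out_scalar$p_anc)

  # vector p_anc: the number of loci is taken from its length
  out_vector <- draw_all_unstructured(
    n_ind = 5, p_anc = c(0.1, 0.2, 0.3, 0.4), verbose = FALSE
  )
  nrow(out_vector$X)
}
```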

require_polymorphic_loci

If TRUE (default), returned genotype matrix will not include any fixed loci (loci that happened to be fixed are drawn again, starting from their ancestral allele frequencies, and checked iteratively until no fixed loci remain, so that the final number of polymorphic loci is exactly m_loci).

maf_min

The minimum minor allele frequency (default zero), to extend the working definition of "fixed" above to include rare variants. This helps simulate a frequency-based locus ascertainment bias. Loci with minor allele frequencies less than or equal to this value are treated as fixed (passed to fixed_loci()). This parameter has no effect if require_polymorphic_loci is FALSE.
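The extended definition of "fixed" can be illustrated with a small base-R sketch. The helper name sketch_is_fixed is hypothetical and the code is an assumption about the rule stated above (minor allele frequency less than or equal to maf_min); it is not the source of fixed_loci(), which may differ in details such as the handling of missing values.

```r
# Hypothetical helper: flags loci whose sample minor allele frequency is
# less than or equal to maf_min. Not the actual fixed_loci() code.
sketch_is_fixed <- function(X, maf_min = 0) {
  # sample allele frequency per locus (genotypes count 0, 1 or 2 alleles)
  p_hat <- rowMeans(X) / 2
  # minor allele frequency
  maf <- pmin(p_hat, 1 - p_hat)
  maf <= maf_min
}

# toy genotype matrix: 4 loci (rows) by 4 individuals (columns)
X_toy <- rbind(
  c(0, 0, 0, 0),  # fixed for one allele
  c(2, 2, 2, 2),  # fixed for the other allele
  c(0, 1, 0, 0),  # rare variant, sample MAF = 1/8
  c(1, 1, 0, 2)   # common variant, sample MAF = 1/2
)
sketch_is_fixed(X_toy)                   # TRUE TRUE FALSE FALSE
sketch_is_fixed(X_toy, maf_min = 0.125)  # TRUE TRUE TRUE FALSE
```

With maf_min = 0 only truly fixed loci are flagged, which recovers the default behavior.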

verbose

If TRUE, prints messages for every stage in the algorithm.

Value

A named list with the following items (which may be missing depending on options):

  • X: An m_loci-by-n_ind matrix of genotypes.

  • p_anc: A length-m_loci vector of ancestral allele frequencies.

Examples

# dimensions
# number of loci
m_loci <- 10
# number of individuals
n_ind <- 5

# draw all random allele freqs and genotypes
out <- draw_all_unstructured( n_ind, m_loci )

# return value is a list with these items:

# genotypes
X <- out$X

# ancestral AFs
p_anc <- out$p_anc


StoreyLab/bnpsd documentation built on July 29, 2023, 3:31 a.m.