In MaikeMorrison/FSTruct: Measure variability in population structure estimates

Q_simulate

R Documentation

Simulate one or more Q matrices using the Dirichlet distribution

Description

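Simulates Q matrices by drawing vectors of membership coefficients from a Dirichlet distribution parameterized by two quantities: \alpha > 0, which controls the variability of the coefficients, and \lambda = (\lambda_1, \lambda_2, \ldots, \lambda_K), which sets the mean of each of the K ancestry coefficients. Equivalently, each vector is drawn from a Dirichlet distribution with parameter vector \alpha\lambda = (\alpha\lambda_1, \ldots, \alpha\lambda_K).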
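A single membership vector under this model can be drawn with the standard gamma-normalization construction of the Dirichlet distribution, using the parameter vector \alpha\lambda. The sketch below uses base R only. The function name `rdirichlet_one` is hypothetical and is not part of FSTruct, and this is not necessarily how Q_simulate draws its samples internally.

```r
# Hypothetical helper (not part of FSTruct): draw one membership vector
# q ~ Dirichlet(alpha * lambda) by normalizing independent gamma draws.
rdirichlet_one <- function(alpha, lambda) {
  stopifnot(alpha > 0, all(lambda > 0), isTRUE(all.equal(sum(lambda), 1)))
  # Independent Gamma(alpha * lambda_k, rate = 1) variables ...
  g <- rgamma(length(lambda), shape = alpha * lambda, rate = 1)
  # ... divided by their total form a Dirichlet-distributed vector
  g / sum(g)
}

set.seed(1)
q <- rdirichlet_one(alpha = 1, lambda = c(1 / 2, 1 / 4, 1 / 4))
q       # three nonnegative membership coefficients
sum(q)  # 1, up to floating-point error
```

Averaging many such draws recovers \lambda, which is why \lambda is described as the mean of the ancestry coefficients.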

Usage

Q_simulate(alpha, lambda, popsize, rep = 1, seed)

Arguments

`alpha`	A number greater than 0 that sets the variability of the membership coefficients under the Dirichlet model. The variance of coefficient k is `Var[q_k] = \lambda_k(1-\lambda_k)/(\alpha+1)`, so larger values of `alpha` lead to lower variability. `alpha` can also be a numeric vector, in which case `rep` populations of `popsize` individuals are simulated for each entry of `alpha`.
`lambda`	A vector of K positive numbers, one per ancestral cluster, that sets the mean membership in each cluster across the population. The entries must sum to 1.
`popsize`	The number of individuals to include in each population.
`rep`	The number of populations to generate for each value of `alpha`. Default is 1.
`seed`	Optional; sets the random seed. Supply a value to make the simulated results reproducible.
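To see how `alpha` and `lambda` together set the spread of the coefficients, the variance formula given for `alpha` can be evaluated directly. The sketch below computes Var[q_k] for the three-cluster mean used in the Examples section across several values of \alpha. The helper name `dirichlet_var` is hypothetical and not part of FSTruct.

```r
# Hypothetical helper (not part of FSTruct): theoretical variance of each
# membership coefficient, Var[q_k] = lambda_k * (1 - lambda_k) / (alpha + 1).
dirichlet_var <- function(alpha, lambda) {
  lambda * (1 - lambda) / (alpha + 1)
}

lambda <- c(1 / 2, 1 / 4, 1 / 4)
# One row per alpha value, one column per cluster;
# larger alpha gives a smaller variance for every cluster.
v <- t(sapply(c(0.1, 1, 10), dirichlet_var, lambda = lambda))
v
```

With alpha = 1 (the middle row), this gives Var[q_1] = 1/8 and Var[q_2] = Var[q_3] = 3/32.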

Value

A data frame containing the simulated ancestry vectors. Each row represents a single simulated individual. The data frame has the following columns:

rep: Which population the row belongs to (a number between 1 and the parameter rep)
ind: Which individual in each population the row corresponds to (a number between 1 and the parameter popsize)
alpha: The alpha value used for that population.
Pop: A unique identifier for each population, of the form alpha_rep, built from the alpha and rep columns of that row.
spacer: a repeated ":" to make simulated Q matrices match the output of population structure inference software.
q1, q2, ..., qK: Membership coefficients, one column per ancestral cluster (each row sums to 1).
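The column layout above can be illustrated with a small base-R sketch that builds a data frame of the same shape for a single `alpha` value. This is a hypothetical reimplementation for illustration only: `simulate_Q_sketch` is not part of FSTruct, its `reps` argument stands in for Q_simulate's `rep`, and the exact string format of `Pop` (here `paste0(alpha, "_", rep)`) is an assumption.

```r
# Hypothetical sketch (not FSTruct's implementation): build a data frame with
# the documented columns rep, ind, alpha, Pop, spacer, q1..qK for one alpha.
simulate_Q_sketch <- function(alpha, lambda, popsize, reps = 1) {
  K <- length(lambda)
  n <- popsize * reps
  # n x K matrix of gamma draws; column k has shape alpha * lambda[k]
  g <- matrix(rgamma(n * K, shape = rep(alpha * lambda, each = n)), nrow = n)
  # Normalizing each row gives q ~ Dirichlet(alpha * lambda)
  q <- g / rowSums(g)
  colnames(q) <- paste0("q", seq_len(K))
  rep_id <- rep(seq_len(reps), each = popsize)
  data.frame(
    rep = rep_id,                               # population index
    ind = rep(seq_len(popsize), times = reps),  # individual within population
    alpha = alpha,
    Pop = paste0(alpha, "_", rep_id),           # assumed identifier format
    spacer = ":",
    q
  )
}

set.seed(1)
Qs <- simulate_Q_sketch(alpha = 1, lambda = c(1 / 2, 1 / 4, 1 / 4),
                        popsize = 50, reps = 100)
head(Qs)
```

With popsize = 50 and 100 populations, the result has 5000 rows, one per simulated individual.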

Examples

# Simulate ancestry for 100 random populations of 50 individuals each.
# In this example, each Q matrix has 50 individuals, whose mean
# ancestry across the 3 ancestral clusters is (1/2, 1/4, 1/4).
# The variance of the membership coefficient for cluster i is
# Var[q_i] = lambda_i(1-lambda_i)/(alpha + 1).
# Here lambda_1 = 1/2 and lambda_2 = lambda_3 = 1/4, so with alpha = 1,
# Var[q_1] = 1/8 and Var[q_2] = Var[q_3] = 3/32.

Q <- Q_simulate(
  alpha = 1,
  lambda = c(1 / 2, 1 / 4, 1 / 4),
  popsize = 50,
  rep = 100,
  seed = 1
)

MaikeMorrison/FSTruct documentation built on Aug. 26, 2023, 7:01 a.m.