Q_simulate: Simulate one or more Q matrices using the Dirichlet...

Q_simulate    R Documentation

Simulate one or more Q matrices using the Dirichlet distribution

Description

Simulates Q matrices by drawing vectors of membership coefficients from a Dirichlet distribution parameterized by two variables: \alpha, which controls variability, and \lambda = (\lambda_1, \lambda_2, \ldots, \lambda_K), which controls the mean of each of the K ancestry coefficients.
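The parameterization described above can be checked numerically in base R. The sketch below is not FSTruct code: rdirichlet_sketch is a hypothetical helper, and it assumes that the Dirichlet concentration parameters are \alpha\lambda_k, which is the choice that gives mean \lambda_k and variance \lambda_k(1-\lambda_k)/(\alpha+1). It draws Dirichlet vectors as normalized independent gamma variables and compares the empirical moments with those values.

```r
# Hypothetical helper (not part of FSTruct): draw n vectors from a
# Dirichlet distribution with concentration parameters alpha * lambda,
# using normalized independent gamma draws.
rdirichlet_sketch <- function(n, alpha, lambda) {
  K <- length(lambda)
  # Column k holds n gamma draws with shape alpha * lambda[k].
  g <- matrix(
    rgamma(n * K, shape = rep(alpha * lambda, each = n)),
    nrow = n, ncol = K
  )
  # Dividing each row by its sum yields one Dirichlet draw per row.
  g / rowSums(g)
}

set.seed(1)
alpha <- 1
lambda <- c(1 / 2, 1 / 4, 1 / 4)
x <- rdirichlet_sketch(1e5, alpha, lambda)

# Empirical moments of each coefficient.
empirical_mean <- colMeans(x)
empirical_var <- apply(x, 2, var)

# Moments implied by the (alpha, lambda) parameterization.
theoretical_var <- lambda * (1 - lambda) / (alpha + 1)

print(rbind(empirical_mean, lambda))
print(rbind(empirical_var, theoretical_var))
```

Increasing alpha in this sketch shrinks theoretical_var toward 0 while leaving the means at lambda, which matches the role of \alpha as a variability parameter.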

Usage

Q_simulate(alpha, lambda, popsize, rep = 1, seed)

Arguments

alpha

A number greater than 0 that sets the variability of the membership coefficients under the Dirichlet model. The variance of coefficient k is Var[q_k] = \lambda_k(1-\lambda_k)/(\alpha+1), so larger values of \alpha lead to lower variability. alpha can also be a numeric vector, in which case rep populations of popsize individuals are simulated for each entry of alpha.

lambda

A vector that sets the mean membership of each ancestral cluster across the population. The vector must sum to 1.

popsize

The number of individuals to include in each population.

rep

The number of populations to generate. Default is 1.

seed

Optional; sets the random seed. Use if reproducibility of random results is desired.

Value

A data frame containing the simulated ancestry vectors. Each row represents a single simulated individual. The data frame has the following columns:

  • rep: Which population the row belongs to (a number between 1 and the parameter rep)

  • ind: Which individual in each population the row corresponds to (a number between 1 and the parameter popsize)

  • alpha: The alpha value used for that population.

  • Pop: alpha_rep (where rep and alpha are the first and third columns as described in this list). Serves as a unique identifier for each population.

  • spacer: A repeated ":" that makes simulated Q matrices match the output of population structure inference software.

  • q1, q2, etc.: Membership coefficients (sum to 1).
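The column layout listed above can be illustrated with a small base R sketch. This is a hypothetical reconstruction of the layout only, not the package's implementation: the names n_rep and layout, the "_" separator in Pop, and the row order (grouped by rep) are assumptions, and the membership coefficients are stand-in Dirichlet draws built from normalized gamma variables.

```r
# Hypothetical illustration of the column layout; not FSTruct code.
set.seed(1)
alpha <- 1
lambda <- c(1 / 2, 1 / 4, 1 / 4)
popsize <- 4   # individuals per population
n_rep <- 2     # number of populations (the rep argument)
n <- popsize * n_rep

# Stand-in membership coefficients: normalized gamma draws,
# i.e. Dirichlet draws with concentration alpha * lambda (assumed).
g <- matrix(
  rgamma(n * length(lambda), shape = rep(alpha * lambda, each = n)),
  nrow = n
)
q <- g / rowSums(g)
colnames(q) <- paste0("q", seq_along(lambda))

# Assemble the columns in the order described in the Value section.
pop_index <- rep(seq_len(n_rep), each = popsize)
layout <- data.frame(
  rep = pop_index,
  ind = rep(seq_len(popsize), times = n_rep),
  alpha = alpha,
  Pop = paste(alpha, pop_index, sep = "_"),  # assumed separator
  spacer = ":",
  q
)

print(layout)
```

Each value of Pop identifies one simulated population, so splitting the data frame by Pop gives one Q matrix per population.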

Examples

# Simulate ancestry for 100 random populations of 50 individuals.
# In this example, each Q matrix has
# 50 individuals.
# On average these individuals have
# mean ancestry (1/2, 1/4, 1/4)
# from each of 3 ancestral clusters.
# The variance of each cluster i is
# Var[q_i] = lambda_i(1-lambda_i)/(alpha + 1)
# Here lambda_1 = 1/2,
#      lambda_2 = lambda_3 = 1/4

Q <- Q_simulate(
  alpha = 1,
  lambda = c(1 / 2, 1 / 4, 1 / 4),
  popsize = 50,
  rep = 100,
  seed = 1
)


MaikeMorrison/FSTruct documentation built on Aug. 26, 2023, 7:01 a.m.