draw_genotypes_admix: Draw genotypes from the admixture model


View source: R/draw_genotypes_admix.R

Description

Given the Individual-specific Allele Frequency (IAF) matrix p_ind for m loci (rows) and n individuals (columns), this function draws the genotype matrix X (same dimensions as p_ind) from the Binomial distribution. The result is equivalent to X[ i, j ] <- rbinom( 1, 2, p_ind[ i, j ] ) for every locus i and individual j, but the function is more efficient. If admix_proportions is provided as the second argument (a matrix with n individuals along the rows and k intermediate subpopulations along the columns), the first argument p_ind is instead treated as the intermediate subpopulation allele frequency matrix (which must be m-by-k), and the IAF matrix is equivalent to p_ind %*% t( admix_proportions ). In this case the function computes the IAF matrix in parts only and never stores it in full, which greatly reduces memory usage. If admix_proportions is missing, p_ind is treated as the IAF matrix.
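The two cases described above can be sketched in base R. This is an illustration only, not the bnpsd implementation: the helper draw_genotypes_chunked, its chunk_size argument, and the choice to split the work by blocks of loci are all assumptions made here, since the description only says the IAF matrix is computed "in parts".

```r
# Base-R sketch only; all names below are illustrative, not bnpsd internals.
set.seed(1)
m_loci <- 6
n_ind <- 4
k_subpops <- 2

# hypothetical inputs: subpopulation allele frequencies (m-by-k)
# and admixture proportions (n-by-k, each row sums to 1)
p_subpops <- matrix(runif(m_loci * k_subpops), nrow = m_loci, ncol = k_subpops)
admix_proportions <- cbind(
  seq(0, 1, length.out = n_ind),
  seq(1, 0, length.out = n_ind)
)

# Case 1: the full IAF matrix is available.
# A single vectorized call replaces the per-entry rbinom( 1, 2, p_ind[ i, j ] ).
p_ind <- p_subpops %*% t(admix_proportions)
X_full <- matrix(rbinom(length(p_ind), 2, p_ind), nrow = m_loci, ncol = n_ind)

# Case 2: only a block of IAF rows exists at any time.
# Chunking by loci is an assumption of this sketch.
draw_genotypes_chunked <- function(p_subpops, admix_proportions, chunk_size = 2) {
  m_loci <- nrow(p_subpops)
  n_ind <- nrow(admix_proportions)
  X <- matrix(0L, nrow = m_loci, ncol = n_ind)
  for (start in seq(1, m_loci, by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1, m_loci)
    # IAF values for this block of loci only
    p_chunk <- p_subpops[rows, , drop = FALSE] %*% t(admix_proportions)
    X[rows, ] <- rbinom(length(p_chunk), 2, p_chunk)
  }
  X
}

X_chunked <- draw_genotypes_chunked(p_subpops, admix_proportions)
```

Both X_full and X_chunked are m-by-n integer matrices with entries in 0, 1, 2. They are separate random draws, so they follow the same distribution but are not expected to be identical.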

Usage

draw_genotypes_admix(p_ind, admix_proportions = NULL)

Arguments

p_ind

The m-by-n IAF matrix (if admix_proportions is missing) or the m-by-k intermediate subpopulation allele frequency matrix (if admix_proportions is present).

admix_proportions

The optional n-by-k admixture proportion matrix (used to draw data from the admixture model with reduced memory, by not fully forming the IAF matrix). If it is provided and both admix_proportions and p_ind have column names that disagree, the function stops as a precaution, since this suggests the data are misaligned or otherwise inconsistent.
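A precaution of this kind could look roughly like the following base-R sketch. The helper name check_subpop_names, the error message, and the interpretation of "disagree" as "not identical, including order" are assumptions for illustration; the actual check in bnpsd may differ.

```r
# Hypothetical helper, not part of bnpsd: stop if both inputs have
# column names and those names are not identical (order included).
check_subpop_names <- function(p_subpops, admix_proportions) {
  names_p <- colnames(p_subpops)
  names_q <- colnames(admix_proportions)
  if (!is.null(names_p) && !is.null(names_q) && !identical(names_p, names_q)) {
    stop("Column names of `p_ind` and `admix_proportions` disagree!")
  }
  invisible(TRUE)
}

# toy inputs with matching subpopulation labels
p_subpops <- matrix(c(0.1, 0.5, 0.2, 0.4), nrow = 2,
                    dimnames = list(NULL, c("S1", "S2")))
admix_ok <- matrix(c(0.8, 0.3, 0.2, 0.7), nrow = 2,
                   dimnames = list(NULL, c("S1", "S2")))
# same values, but columns labeled in the opposite order
admix_bad <- admix_ok
colnames(admix_bad) <- c("S2", "S1")

ok <- check_subpop_names(p_subpops, admix_ok)
bad <- try(check_subpop_names(p_subpops, admix_bad), silent = TRUE)
```

Here ok is TRUE, while bad holds a "try-error" object because the labels are swapped. If either matrix lacks column names, the sketch skips the check entirely.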

Value

The m-by-n genotype matrix. If admix_proportions is missing, the row and column names of p_ind are copied to this output. If admix_proportions is present, the row names of the output are the row names of p_ind, while the column names of the output are the row names of admix_proportions.
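The naming rule for the admixture case can be illustrated with a small base-R sketch. The locus, individual, and subpopulation labels are made up for this example, and the draw is done directly with rbinom rather than with draw_genotypes_admix.

```r
# Base-R illustration of how output names follow the inputs
# in the admixture case; all labels here are hypothetical.
set.seed(2)
p_subpops <- matrix(
  c(0.1, 0.5, 0.9, 0.2, 0.4, 0.8),
  nrow = 3,
  dimnames = list(c("locus1", "locus2", "locus3"), c("S1", "S2"))
)
admix_proportions <- matrix(
  c(0.8, 0.3, 0.2, 0.7),
  nrow = 2,
  dimnames = list(c("ind1", "ind2"), c("S1", "S2"))
)

# IAF matrix: loci along rows, individuals along columns
p_ind <- p_subpops %*% t(admix_proportions)

# genotypes with row names from p_subpops and
# column names from the row names of admix_proportions
X <- matrix(
  rbinom(length(p_ind), 2, p_ind),
  nrow = nrow(p_ind),
  dimnames = list(rownames(p_subpops), rownames(admix_proportions))
)
```

The rows of X are labeled by locus and the columns by individual, matching the rule stated above.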

Examples

library(bnpsd)

# dimensions
# number of loci
m_loci <- 10
# number of individuals
n_ind <- 5
# number of intermediate subpops
k_subpops <- 2

# define population structure
# FST values for k = 2 subpops
inbr_subpops <- c(0.1, 0.3)
# non-trivial admixture proportions
admix_proportions <- admix_prop_1d_linear(n_ind, k_subpops, sigma = 1)

# draw allele frequencies
# vector of ancestral allele frequencies
p_anc <- draw_p_anc(m_loci)

# matrix of intermediate subpop allele freqs
p_subpops <- draw_p_subpops(p_anc, inbr_subpops)

# matrix of individual-specific allele frequencies
p_ind <- make_p_ind_admix(p_subpops, admix_proportions)

# draw genotypes from intermediate subpops (one individual each)
X_subpops <- draw_genotypes_admix(p_subpops)

# and genotypes for admixed individuals
X_ind <- draw_genotypes_admix(p_ind)

# draw genotypes for admixed individuals without p_ind intermediate
# (p_ind is computed internally in parts, never stored in full,
# reducing memory use substantially)
X_ind <- draw_genotypes_admix(p_subpops, admix_proportions)

bnpsd documentation built on Aug. 25, 2021, 5:07 p.m.