sim_geno_cline: Simulate genetic data from a cline


View source: R/sim_geno_cline.R

Description

This function generates a data frame of simulated genotypic data sampled from a genetic cline. The user supplies the sampling sites, the number of individuals, the level of inbreeding, and the cline parameters. The cline parameters are flexible and can model both sigmoid clines and stepped clines with introgression tails.

Usage

sim_geno_cline(transect_distances, n_ind, Fis, decrease, center, width,
  pmin = 0, pmax = 1, deltaL = NULL, tauL = NULL, deltaR = NULL,
  tauR = NULL)

Arguments

transect_distances

The distances along the transect for the simulated sampling sites. A numeric vector.

n_ind

The number of diploid individuals sampled at each site. Either a single numeric value (for constant sampling), or a numeric vector equal in length to transect_distances.

Fis

The inbreeding coefficient, Fis, for each site. Must be between 0 and 1 (inclusive). Either a single numeric value (for constant inbreeding), or a numeric vector equal in length to transect_distances.

decrease

Is the cline decreasing in frequency? TRUE or FALSE.

center

The location of the cline center, in the same distance units as transect_distances. Numeric, must be greater than 0.

width

The width of the cline, in the same distance units as transect_distances. Numeric, must be greater than 0.

pmin, pmax

Optional. The minimum and maximum allele frequency values in the tails of the cline. Default values are 0 and 1, respectively. Must be between 0 and 1 (inclusive). Numeric.

deltaL, tauL

Optional delta and tau parameters describing the left exponential tail. Both must be supplied to generate a tail. Default is NULL (no tail). Numeric; tauL must be between 0 and 1 (inclusive).

deltaR, tauR

Optional delta and tau parameters describing the right exponential tail. Both must be supplied to generate a tail. Default is NULL (no tail). Numeric; tauR must be between 0 and 1 (inclusive).
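The center, width, pmin, pmax and decrease arguments together shape the sigmoid part of the cline. As a rough sketch, assuming the standard hyperbolic-tangent cline form (this page does not spell out the formula, so general_cline_eqn may parameterize it differently), the expected allele frequency at distance x could be computed as below. The helper sigmoid_cline_p is hypothetical, not part of the package, and it ignores the exponential tails.

```r
# Hypothetical helper (not exported by the package): expected allele
# frequency along a sigmoid cline WITHOUT exponential tails.
# Assumes the standard tanh form
#   p(x) = pmin + (pmax - pmin) * (1 + tanh(2 * (x - center) / width)) / 2
# and flips the sign of (x - center) for a decreasing cline.
sigmoid_cline_p <- function(x, decrease, center, width, pmin = 0, pmax = 1) {
  stopifnot(width > 0, pmin >= 0, pmax <= 1, pmin <= pmax)
  # Signed distance from the center, oriented so frequency rises with dist
  dist <- if (decrease) center - x else x - center
  pmin + (pmax - pmin) * (1 + tanh(2 * dist / width)) / 2
}

# Decreasing cline with center 100 and width 30, sampled every 20 units
p <- sigmoid_cline_p(seq(0, 200, 20), decrease = TRUE, center = 100, width = 30)
round(p, 3)
```

Under this assumed form the frequency at the center is halfway between pmin and pmax, and the maximum slope is (pmax - pmin)/width, so with the defaults pmin = 0 and pmax = 1 the width is the inverse of the maximum slope.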

Details

This function calls general_cline_eqn for each sampled point along the cline to generate the expected allele frequency at that point. The calculated allele frequency and provided Fis are then used to calculate the expected genotype frequencies for each site, according to the equations:

AA = p^2 + p(1-p)Fis

Aa = 2p(1-p)(1-Fis)

aa = (1-p)^2 + p(1-p)Fis
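The genotype-frequency equations above translate directly into R. The helper expected_geno_freqs below is a hypothetical illustration, not a function exported by the package; it takes a single allele frequency p and a single Fis value.

```r
# Hypothetical helper: expected genotype frequencies for one site, given
# allele frequency p of the focal allele and inbreeding coefficient Fis.
expected_geno_freqs <- function(p, Fis) {
  stopifnot(p >= 0, p <= 1, Fis >= 0, Fis <= 1)
  c(AA = p^2 + p * (1 - p) * Fis,        # homozygotes for the focal allele
    Aa = 2 * p * (1 - p) * (1 - Fis),    # heterozygotes, reduced by inbreeding
    aa = (1 - p)^2 + p * (1 - p) * Fis)  # homozygotes for the other allele
}

freqs <- expected_geno_freqs(p = 0.3, Fis = 0.1)
freqs       # AA = 0.111, Aa = 0.378, aa = 0.511
sum(freqs)  # the three frequencies always sum to 1
```

The Fis terms move a fraction Fis of the expected heterozygotes, 2p(1-p)Fis, into the two homozygote classes in equal halves, which is why the total stays at 1 for any Fis.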

Sampled genotypes are then drawn from a multinomial distribution, with the expected genotype frequencies as the probabilities. From those sampled genotypes, empirical allele frequencies (emp.p) and empirical Fis estimates (emp.f) are calculated as:

emp.p = (2*AA + Aa)/(2N)

emp.f = (Hexp - Hobs)/Hexp

where AA is the number of homozygotes for the focal allele, Aa is the number of heterozygotes, N is the number of sampled individuals (so 2N is the number of sampled gene copies), and Hexp and Hobs are the expected and observed heterozygosities. Empirical Fis values are then passed to correct_fis, which sets values that are undefined or < 0 to 0.
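The sampling and estimation steps for a single site can be sketched with base R's rmultinom. This is a minimal sketch, not the package's code: the helper sample_site is hypothetical, and it assumes Hexp is computed from the empirical allele frequency as 2 * emp.p * (1 - emp.p) and Hobs as the observed proportion of heterozygotes. The final correction only mimics the behaviour described for correct_fis; it does not call that function.

```r
# Hypothetical sketch of one site: draw N diploid genotypes, then estimate
# the empirical allele frequency and Fis. Assumes Hexp = 2 * emp.p * (1 - emp.p)
# and Hobs = proportion of heterozygotes; the package may compute them differently.
sample_site <- function(p, Fis, N) {
  probs <- c(AA = p^2 + p * (1 - p) * Fis,
             Aa = 2 * p * (1 - p) * (1 - Fis),
             aa = (1 - p)^2 + p * (1 - p) * Fis)
  # One multinomial draw of N individuals across the three genotype classes
  counts <- setNames(as.vector(rmultinom(1, size = N, prob = probs)), names(probs))
  # Allele frequency: focal-allele copies divided by 2N gene copies
  emp.p <- (2 * counts[["AA"]] + counts[["Aa"]]) / (2 * N)
  Hexp <- 2 * emp.p * (1 - emp.p)
  Hobs <- counts[["Aa"]] / N
  emp.f <- (Hexp - Hobs) / Hexp
  # Mimic the described correct_fis rule: undefined (Hexp = 0) or negative -> 0
  if (!is.finite(emp.f) || emp.f < 0) emp.f <- 0
  c(counts, emp.p = emp.p, emp.f = emp.f)
}

set.seed(123)
sample_site(p = 0.5, Fis = 0.1, N = 20)
```

When a site is fixed for one allele (emp.p of 0 or 1), both Hexp and Hobs are 0, so the raw estimate is 0/0 = NaN; this is the "undefined" case that the correction sets to 0.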

Value

A data frame of simulated genetic data sampled from the cline. Columns are:

Examples

# Simulate genotype data from a decreasing cline
# with center at 100, width of 30.
# Sites are 20 units apart, from 0 to 200.
# 20 individuals are sampled at each site.
# Inbreeding is constant at Fis = 0.1.

set.seed(123)
sim_geno_cline(transect_distances = seq(0, 200, 20),
               n_ind = 20, Fis = 0.1,
               decrease = TRUE,
               center = 100, width = 30)

# Simulate genotype data from an increasing cline
# with center at 272, width of 91.
# The minimum and maximum allele frequencies
# are 0.07 and 0.98, respectively, and
# there is an introgression tail on the
# left side with deltaL = 29 and tauL = 0.8.

# Sites are 12 units apart, from 162 to 306 (13 sites).
# At each site, the number of individuals sampled
# is drawn from a normal distribution with
# mean = 25 and sd = 5, then truncated to an integer.
# Inbreeding is constant at Fis = 0.

set.seed(123)
sites <- seq(162, 312, 12)  # 13 sites: 162, 174, ..., 306
ind_sampling <- as.integer(rnorm(length(sites), 25, 5))
sim_geno_cline(transect_distances = sites,
               n_ind = ind_sampling,
               Fis = 0, decrease = FALSE,
               center = 272, width = 91,
               pmin = 0.07, pmax = 0.98,
               deltaL = 29, tauL = 0.8)

tjthurman/BAHZ documentation built on May 30, 2020, 8:28 a.m.