sim_microsats: Simulate microsatellite data using the coalescent


Description

Simulate microsatellite data using the coalescent.

Usage

sim_microsats(theta, n_ind, n_loc, n_pop = NULL, mutation_model = "smm",
  ancestral_allele_size = 80, ms_options = NULL, p_single = NULL,
  sigma2 = NULL)

Arguments

theta

The per-locus population-scaled mutation rate, theta = 4Nmu (see Details).

n_ind

An integer indicating the number of diploid individuals to be simulated, or a vector indicating how many individuals to sample from each simulated population.

n_loc

An integer indicating the number of loci to be simulated.

n_pop

An integer indicating the number of populations to be simulated. If NULL, then assume n_pop = 1.

mutation_model

A character string indicating the mutation model to use. Currently, only the strict stepwise mutation model of Ohta and Kimura (1973) ('smm') and the two-phase model of Di Rienzo et al. (1994) ('tpm') are implemented. Default is 'smm'.

ancestral_allele_size

An integer indicating the ancestral allele size (i.e., the number of repeats in the ancestral allele). Default is 80 (see Details).

ms_options

A string with additional options to pass on to ms.

p_single

The probability of a single-step mutation, used in the 'tpm' model.

sigma2

The variance in allele size, used in the 'tpm' model.
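To make the roles of p_single and sigma2 concrete, the following is a minimal sketch of a two-phase mutation step, not the package's own implementation. It assumes that a multi-step change is drawn from a rounded normal distribution with mean zero and variance sigma2; the distribution microsimr actually uses may differ. The helper tpm_step and all parameter values are hypothetical.

```r
# Sketch only: a two-phase style mutation step under assumed distributions.
set.seed(1)

p_single <- 0.8   # hypothetical probability of a single-step mutation
sigma2   <- 16    # hypothetical variance of multi-step changes
n_mut    <- 10L   # hypothetical number of mutations on a lineage

tpm_step <- function(p_single, sigma2) {
  if (runif(1) < p_single) {
    # single-step phase: gain or lose exactly one repeat
    sample(c(-1L, 1L), size = 1)
  } else {
    # multi-step phase (assumption): rounded zero-mean normal draw
    as.integer(round(rnorm(1, mean = 0, sd = sqrt(sigma2))))
  }
}

steps  <- vapply(seq_len(n_mut), function(i) tpm_step(p_single, sigma2),
                 integer(1))
allele <- 80L + sum(steps)  # start from the default ancestral allele size
```

Setting p_single = 1 in this sketch reduces it to a strict stepwise model, since every mutation then changes the allele by exactly one repeat.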

Details

A tree is first simulated using 'ms'. Mutations are simulated along the branches of the tree following a Poisson distribution with lambda proportional to branch length times theta (4Nmu).

The number of mutations along each branch is then transformed into gains or losses of repeat units using the mutate_microsats function.
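The two steps above can be sketched for a single branch as follows. This is an illustration under stated assumptions rather than the code inside sim_microsats or mutate_microsats: the branch length is a made-up value, the proportionality constant between lambda and branch length times theta is taken to be 1, and each mutation is treated as a strict stepwise change of one repeat.

```r
# Sketch only: Poisson mutations on one branch, then stepwise repeat changes.
set.seed(42)

theta         <- 5     # population-scaled mutation rate (4Nmu)
branch_length <- 0.3   # hypothetical branch length in coalescent units
ancestral     <- 80L   # default ancestral allele size

# Number of mutations on the branch (proportionality constant assumed to be 1)
n_mut <- rpois(1, lambda = theta * branch_length)

# Under a strict stepwise model each mutation adds or removes one repeat
steps  <- sample(c(-1L, 1L), size = n_mut, replace = TRUE)
allele <- ancestral + sum(steps)
```

Because gains and losses can cancel, the final allele size differs from the ancestral size by at most n_mut repeats.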

Details of each mutation model are given in the vignette.

References

Di Rienzo, A., Peterson, A. C., Garza, J. C., Valdes, A. M., Slatkin, M., & Freimer, N. B. (1994). Mutational processes of simple-sequence repeat loci in human populations. Proceedings of the National Academy of Sciences of the United States of America, 91(8), 3166–3170.

Ohta, T., & Kimura, M. (1973). A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genetical Research, 22(2), 201–204. Reprinted 2007 in Genetical Research, 89(5-6), 367–370. http://doi.org/10.1017/S0016672308009531

Examples

# Generate a simulated dataset for a single population with theta = 5,
# in which 100 loci are genotyped for 20 individuals
simd_data_1pop <- microsimr::sim_microsats(theta = 5,
                                           n_ind = 20,
                                           n_loc = 100,
                                           n_pop = 1)

# Generate a simulated dataset for three populations with theta = 12,
# in which 100 loci are genotyped for 20 individuals sampled from each of
# the three populations, with a migration parameter of 20
simd_data_3pop <- microsimr::sim_microsats(theta = 12,
                                           n_ind = c(20, 20, 20),
                                           n_loc = 100,
                                           n_pop = 3,
                                           ms_options = "-I 3 40 40 40 20")
# Notice the 'ms_options' parameter: -I tells ms to use an island model of
# population structure. The three 40s tell ms to sample 40 chromosomes
# (20 diploid individuals) from each of the three modeled populations, and
# the final 20 is the migration parameter, which ms interprets as the scaled
# rate 4*N0*m. Additional scenarios, including population growth, asymmetric
# migration, isolation, etc., can be modeled by using the right combination
# of options. These are laid out in the ms manual. Examples can be added
# depending on users' needs. Please request additional examples through the
# GitHub issues page.
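
The island-model string used above can also be assembled from the sampling design rather than typed by hand, which keeps the chromosome counts consistent with n_ind. This is a small sketch; the variable names n_ind and mig are illustrative only, and it assumes two chromosomes per diploid individual as in the example.

```r
# Sketch only: build the ms island-model option string from the design.
n_ind <- c(20, 20, 20)   # diploid individuals sampled per population
mig   <- 20              # migration parameter passed to ms after the sample sizes

ms_options <- paste("-I",
                    length(n_ind),                     # number of populations
                    paste(2 * n_ind, collapse = " "),  # chromosomes per population
                    mig)
ms_options
```

The resulting string is "-I 3 40 40 40 20", matching the value passed to ms_options in the three-population example.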

andersgs/microsimr documentation built on May 12, 2019, 2:42 a.m.