simulate_g2: Simulate g2


View source: R/simulate_g2.R

Description

This function can be used to simulate genotype data, draw subsets of loci and calculate the respective g2 values. Every subset of markers is drawn independently to give insights into the variation and precision of g2 calculated from a given number of markers and individuals.

Usage

simulate_g2(n_ind = NULL, H_nonInb = 0.5, meanF = 0.2, varF = 0.03,
  subsets = NULL, reps = 100, type = c("msats", "snps"), CI = 0.95)

Arguments

n_ind

number of individuals to sample from the population

H_nonInb

true genome-wide heterozygosity of a non-inbred individual

meanF

mean realized inbreeding f

varF

variance in realized inbreeding f

subsets

a vector specifying the sizes of marker-subsets to draw. Specifying subsets = c(2, 5, 10, 15, 20) would draw marker sets of 2 to 20 markers. The minimum number of markers to calculate g2 is 2.

reps

number of resampling repetitions

type

specifies which g2 formula to use: "snps" for large datasets and "msats" for smaller datasets.

CI

confidence level of the intervals to calculate (defaults to 0.95)
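Because the f values are drawn from a beta distribution (see Details), not every combination of meanF and varF is feasible: a beta distribution with mean m only exists if its variance is below m(1 - m). The following is a minimal base-R sketch of that check. The helper name beta_shapes_from_moments is hypothetical and not part of inbreedR, and the package may validate its inputs differently.

```r
# Hypothetical helper (not part of inbreedR): convert a mean and a variance
# into the two shape parameters of a beta distribution by moment matching.
beta_shapes_from_moments <- function(meanF, varF) {
  stopifnot(meanF > 0, meanF < 1, varF > 0)
  # A beta distribution with this mean requires varF < meanF * (1 - meanF)
  if (varF >= meanF * (1 - meanF)) {
    stop("varF must be smaller than meanF * (1 - meanF)")
  }
  k <- meanF * (1 - meanF) / varF - 1
  c(shape1 = meanF * k, shape2 = (1 - meanF) * k)
}

# Shape parameters implied by the default meanF and varF
shapes <- beta_shapes_from_moments(meanF = 0.2, varF = 0.03)
shapes
```

With the defaults (meanF = 0.2, varF = 0.03) the condition holds, since 0.03 < 0.2 * 0.8 = 0.16.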

Details

The simulate_g2 function simulates genotypes from which subsets of loci can be sampled independently. These simulations can be used to evaluate the effects of the number of individuals and loci on the precision and magnitude of g2. The user specifies the number of simulated individuals (n_ind), the subsets of loci (subsets) to be drawn, the heterozygosity of non-inbred individuals (H_nonInb) and the distribution of f among the simulated individuals. The f values of the simulated individuals are sampled randomly from a beta distribution with mean (meanF) and variance (varF) specified by the user (e.g. as in Wang 2011). This enables the simulation to mimic populations with known inbreeding characteristics, or to simulate hypothetical scenarios of interest.

For computational simplicity, allele frequencies are assumed to be constant across all loci and the simulated loci are unlinked. Genotypes (i.e. the heterozygosity/homozygosity status at each locus) are assigned stochastically based on the f values of the simulated individuals. Specifically, the probability of an individual being heterozygous at any given locus (H) is expressed as H = H0 * (1 - f), where H0 is the user-specified heterozygosity of a non-inbred individual (H_nonInb) and f is the individual's inbreeding coefficient drawn from the beta distribution.
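The procedure described here can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the pool size n_loci, the 0/1 coding of heterozygosity, and the helper draw_subset are choices made for this sketch only.

```r
set.seed(1)  # for reproducibility of this sketch only

n_ind    <- 50    # number of simulated individuals
n_loci   <- 200   # size of the simulated locus pool (assumption of this sketch)
H_nonInb <- 0.5   # heterozygosity of a non-inbred individual (H0)
meanF    <- 0.2
varF     <- 0.03

# Beta shape parameters matching the requested mean and variance of f
k <- meanF * (1 - meanF) / varF - 1
f <- rbeta(n_ind, shape1 = meanF * k, shape2 = (1 - meanF) * k)

# Per-individual probability of being heterozygous: H = H0 * (1 - f)
H <- H_nonInb * (1 - f)

# Rows = individuals, columns = loci; 1 = heterozygous, 0 = homozygous.
# Every locus is an independent Bernoulli draw, i.e. loci are unlinked.
het <- matrix(rbinom(n_ind * n_loci, size = 1, prob = rep(H, times = n_loci)),
              nrow = n_ind, ncol = n_loci)

# Draw marker subsets independently of each other (hypothetical helper)
subsets <- c(2, 5, 10)
draw_subset <- function(size) het[, sample(n_loci, size), drop = FALSE]
subset_list <- lapply(subsets, draw_subset)
```

Each element of subset_list is an individuals-by-loci 0/1 matrix from which a g2 estimate could then be computed. Repeating the draw reps times per subset size gives the spread of estimates that the function summarises.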

Value

simulate_g2 returns an object of class "inbreed". The functions 'print' and 'plot' can be used to print a summary and to plot the g2 values with their means and confidence intervals.

An 'inbreed' object from simulate_g2 is a list containing the following components:

call

function call.

estMat

matrix with all g2 estimates. Each row contains the values for a given subset of markers

true_g2

"true" g2 value based on the assigned realized inbreeding values

n_ind

specified number of individuals

subsets

vector specifying the marker sets

reps

repetitions per subset

H_nonInb

true genome-wide heterozygosity of a non-inbred individual

meanF

mean realized inbreeding f

varF

variance in realized inbreeding f

min_val

minimum g2 value

max_val

maximum g2 value

all_CI

confidence intervals for all subsets

all_sd

standard deviations for all subsets
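To make the relation between these components concrete, the sketch below uses the standard definition of g2 in terms of inbreeding, g2 = var(f) / (1 - mean(f))^2, and summarises a matrix laid out like estMat (one row per subset, one column per repetition). The f values and the matrix are random placeholders, not real g2 estimates. That the package derives true_g2, all_CI and all_sd in exactly this way, in particular with quantile-based intervals, is an assumption.

```r
set.seed(42)  # for reproducibility of this sketch only

# Placeholder f values standing in for the assigned realized inbreeding
f <- rbeta(1000, shape1 = 0.87, shape2 = 3.47)

# g2 expressed through the mean and variance of f
true_g2 <- var(f) / (1 - mean(f))^2

# Placeholder stand-in for estMat: 3 marker subsets x 100 repetitions
est_mat <- matrix(rnorm(3 * 100, mean = true_g2, sd = 0.05),
                  nrow = 3, ncol = 100)

# Row-wise summaries: one interval and one standard deviation per subset
CI     <- 0.95
probs  <- c((1 - CI) / 2, 1 - (1 - CI) / 2)
all_CI <- t(apply(est_mat, 1, quantile, probs = probs))
all_sd <- apply(est_mat, 1, sd)
```

Here all_CI has one row per subset with the lower and upper bound in its two columns, and all_sd has one entry per subset.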

Author(s)

Marty Kardos (marty.kardos@ebc.uu.se) & Martin A. Stoffel (martin.adam.stoffel@gmail.com)

Examples

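library(inbreedR)
# The mouse data are loaded and converted here, but simulate_g2() itself
# takes no genotype argument and does not use them.
data(mouse_msats)
genotypes <- convert_raw(mouse_msats)
# Simulate 10 individuals and draw 100 subsets each of 4, 6, 8 and 10 loci
sim_g2 <- simulate_g2(n_ind = 10, H_nonInb = 0.5, meanF = 0.2, varF = 0.03,
                      subsets = c(4, 6, 8, 10), reps = 100,
                      type = "msats")
plot(sim_g2)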

Example output

[1] "done with subsampling of 4 loci"
[1] "done with subsampling of 6 loci"
[1] "done with subsampling of 8 loci"
[1] "done with subsampling of 10 loci"

inbreedR documentation built on Feb. 2, 2022, 5:09 p.m.