MkGenoErrors: Simulate Genotyping Errors

View source: R/SimGeno.R

MkGenoErrors    R Documentation

Description

Generate errors and missing values in a (simulated) genotype matrix.

Usage

MkGenoErrors(
  SGeno,
  CallRate = 0.99,
  SnpError = c((5e-04/2)^2, 5e-04/2, 5e-04 * (1 - 5e-04/2)),
  ErrorFM = function(E) {
    matrix(c(1 - E - (E/2)^2, E, (E/2)^2,
             E/2, 1 - E, E/2,
             (E/2)^2, E, 1 - E - (E/2)^2),
           3, 3, byrow = TRUE)
  },
  Error.shape = 0.5,
  CallRate.shape = 1
)

Arguments

SGeno

matrix with genotype data in Sequoia's format: 1 row per individual, 1 column per SNP, and genotypes coded as 0/1/2.

CallRate

either a single number for the mean call rate (genotyping success), OR a vector with the call rate at each SNP, OR a named vector with the call rate for each individual. In the third case, ParMis (an argument of SimGeno) is ignored, and individuals in the pedigree (as id or parent) that are not included in this vector are presumed non-genotyped.

SnpError

either a single value which will be combined with ErrorFM, OR a length 3 vector with the hom->other hom, hom->het, and het->hom error rates, OR a vector or 3 x nSnp matrix with the genotyping error rate(s) for each SNP.
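
As a quick check on the default shown under Usage, the length 3 vector can be evaluated by hand. The sketch below only reproduces that arithmetic in base R; the element names are illustrative labels that follow the order given above, and are not something the function is known to require.

```r
# Evaluate the default SnpError vector from the Usage section.
E <- 5e-04
SnpError <- c((E / 2)^2, E / 2, E * (1 - E / 2))

# Illustrative labels only (assumption), in the documented order.
names(SnpError) <- c("hom_to_other_hom", "hom_to_het", "het_to_hom")
print(SnpError)
```
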

ErrorFM

function taking the error rate (scalar) as argument and returning a 4x4 or 3x3 matrix with probabilities that actual genotype i (rows) is observed as genotype j (columns).
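
To make the row/column convention concrete, the default ErrorFM from the Usage section can be evaluated for an example error rate. This is a minimal sketch in base R; the value E = 0.01 and the dimnames are chosen here for illustration only.

```r
# Default ErrorFM as given under Usage: rows = actual genotype (0/1/2),
# columns = observed genotype (0/1/2).
ErrorFM <- function(E) {
  matrix(c(1 - E - (E / 2)^2, E, (E / 2)^2,
           E / 2, 1 - E, E / 2,
           (E / 2)^2, E, 1 - E - (E / 2)^2),
         3, 3, byrow = TRUE)
}

M <- ErrorFM(0.01)  # example error rate (assumption)
dimnames(M) <- list(actual = c("0", "1", "2"), observed = c("0", "1", "2"))
print(M)

# Each row is a probability distribution over observed genotypes.
print(rowSums(M))
```

With this default, a heterozygote (row "1") is observed as either homozygote with probability E/2 each.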

Error.shape

first shape parameter (alpha) of the beta distribution of per-SNP error rates. A higher value results in a flatter distribution.

CallRate.shape

as Error.shape, for per-SNP call rates.
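
One way to picture the two shape arguments is to draw per-SNP rates from a beta distribution whose first shape parameter is fixed and whose second is chosen so that the mean matches the requested rate. The sketch below assumes that parameterisation (beta = alpha * (1 - mean) / mean); it is not taken from the package source, and draw_rates is a hypothetical helper.

```r
set.seed(1)

# Hypothetical helper: draw n rates from Beta(shape, shape2), with shape2
# chosen so that the expected value equals mean_rate (assumption).
draw_rates <- function(n, mean_rate, shape) {
  shape2 <- shape * (1 - mean_rate) / mean_rate
  rbeta(n, shape1 = shape, shape2 = shape2)
}

err_rates  <- draw_rates(1e5, mean_rate = 5e-04, shape = 0.5)  # cf. Error.shape
call_rates <- draw_rates(1e5, mean_rate = 0.99,  shape = 1)    # cf. CallRate.shape

print(summary(err_rates))
print(summary(call_rates))
```
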

Value

The input genotype matrix, with some genotypes replaced, and some set to missing (-9).
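
The shape of the returned value can be illustrated without the package. The following base R sketch replaces each genotype according to a 3x3 error matrix and then sets a fraction of calls to -9; apply_errors is a hypothetical stand-in written for this illustration, with a single error rate and call rate for all SNPs, and is not the implementation of MkGenoErrors.

```r
set.seed(42)

# Hypothetical stand-in: M is a 3x3 matrix (rows = actual, columns = observed),
# call_rate is one number applied to every genotype (assumption).
apply_errors <- function(SGeno, M, call_rate) {
  Obs <- SGeno
  for (g in 0:2) {
    idx <- which(SGeno == g)
    if (length(idx) > 0) {
      # Observed genotype drawn from the row of M for actual genotype g.
      Obs[idx] <- sample(0:2, length(idx), replace = TRUE, prob = M[g + 1, ])
    }
  }
  # Genotyping failures are coded as -9.
  Obs[runif(length(Obs)) > call_rate] <- -9
  Obs
}

E <- 0.01
M <- matrix(c(1 - E - (E / 2)^2, E, (E / 2)^2,
              E / 2, 1 - E, E / 2,
              (E / 2)^2, E, 1 - E - (E / 2)^2),
            3, 3, byrow = TRUE)

# Toy input: 20 individuals (rows) by 10 SNPs (columns), coded 0/1/2.
SGeno <- matrix(sample(0:2, 20 * 10, replace = TRUE), nrow = 20, ncol = 10)
Obs <- apply_errors(SGeno, M, call_rate = 0.95)

print(table(actual = SGeno, observed = Obs))
```

The cross-tabulation shows the same layout as the input, with most genotypes unchanged, a few replaced, and some coded -9.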


sequoia documentation built on Sept. 8, 2023, 5:29 p.m.