microhaplotype_geno_err_matrix: create matrix C for probability of observed genotypes from...

View source: R/geno_err_models.R

microhaplotype_geno_err_matrix    R Documentation

create matrix C for probability of observed genotypes from microhaplotype data

Description

This is intended for the case where the genotypes in question are composed of alleles that are actually multi-SNP haplotypes obtained from next-generation sequencing data. In other words, all the SNPs occur on a single read, so their phase is known. It allows for a SNP-specific sequencing error rate. The haplotypes must be named as strings of A, C, G, or T (though they could be strings of any characters; the function does not check this). For now, we assume that if a SNP is multiallelic, a genotyping error is equally likely to produce any of the alternate alleles. We also currently assume that, at each SNP, genotyping errors are equally likely in either direction.
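To make the error model described above concrete, here is a minimal R sketch of how per-SNP error rates might combine into the probability that a read from one true haplotype is reported as another. It is not CKMRsim's implementation: the function name hap_read_prob is hypothetical, and the sketch assumes that errors are independent across SNPs and that the "alternate alleles" at a SNP are the ones observed at that position among haps.

```r
# Hypothetical sketch, NOT CKMRsim's code: probability that a read from true
# haplotype `from` is reported as haplotype `to`. Assumptions:
#  - errors occur independently at each SNP, with per-SNP rate err[s];
#  - a miscalled SNP is equally likely to show any of the other alleles
#    observed at that position among `haps`;
#  - errors are symmetric (A -> G is as likely as G -> A).
hap_read_prob <- function(from, to, haps, err) {
  snps_from <- strsplit(from, "")[[1]]
  snps_to <- strsplit(to, "")[[1]]
  nsnp <- length(snps_from)
  err <- rep_len(err, nsnp)  # recycle per-SNP error rates
  # character matrix: one row per haplotype, one column per SNP
  allele_mat <- do.call(rbind, strsplit(haps, ""))
  prob <- 1
  for (s in seq_len(nsnp)) {
    n_alleles <- length(unique(allele_mat[, s]))
    if (snps_from[s] == snps_to[s]) {
      prob <- prob * (1 - err[s])                # SNP read correctly
    } else {
      prob <- prob * err[s] / (n_alleles - 1)    # error split evenly over alternates
    }
  }
  prob
}

# haplotype-to-haplotype matrix for the haplotypes used in the Examples
haps <- c("AACC", "GACC", "GATA", "GTCC", "GTTC")
H <- outer(haps, haps,
           Vectorize(function(a, b) hap_read_prob(a, b, haps, err = 0.005)))
dimnames(H) <- list(true = haps, read = haps)
H["AACC", "GACC"]  # 0.005 * 0.995^3
```

Because this sketch only spreads error over haplotypes that appear in haps, a row of H can sum to less than 1 when some allele combinations are absent from haps. At a triallelic SNP, an error rate of 0.03 would be split as 0.015 to each of the two alternate alleles.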

Usage

microhaplotype_geno_err_matrix(
  haps,
  snp_err_rates = 0.005,
  dropout_rates = 0.005,
  scale_by_num_snps = FALSE
)

Arguments

haps

character vector of strings that denote the haplotypes at the locus, for example "CCAG", "CTAG", "GCAG", etc. These should be in the same order as they are given in the allele frequency definitions, so that the genotypes made from them are ordered correctly. All elements of haps must have the same number of characters. haps cannot be a factor.

snp_err_rates

Vector of sequencing error rates, one for each SNP in the haplotype. It is recycled if its length is less than the number of SNPs in the haplotypes.

dropout_rates

Haplotype-specific rates of allelic dropout. Recycled if necessary.

scale_by_num_snps

Logical. If TRUE, the error rate is divided by the number of SNPs in each microhaplotype.
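The argument rules above (haps not a factor, equal-length haplotypes, recycling of snp_err_rates and dropout_rates, optional scaling by the number of SNPs) can be sketched as a small R helper. The name prep_err_args is hypothetical and not part of CKMRsim, and applying the scaling to every per-SNP rate after recycling is an assumption about how scale_by_num_snps behaves.

```r
# Hypothetical helper, NOT part of CKMRsim: validates `haps` and expands the
# error-rate arguments to full length, following the argument descriptions.
prep_err_args <- function(haps, snp_err_rates = 0.005, dropout_rates = 0.005,
                          scale_by_num_snps = FALSE) {
  if (is.factor(haps)) stop("haps cannot be a factor")
  nsnp <- unique(nchar(haps))
  if (length(nsnp) != 1) {
    stop("all haplotypes must have the same number of characters")
  }
  snp_err <- rep_len(snp_err_rates, nsnp)           # one rate per SNP
  # assumption: scaling divides each per-SNP rate by the number of SNPs
  if (scale_by_num_snps) snp_err <- snp_err / nsnp
  list(
    snp_err = snp_err,
    dropout = rep_len(dropout_rates, length(haps))  # one rate per haplotype
  )
}

args <- prep_err_args(c("AACC", "GACC", "GATA"), snp_err_rates = c(0.01, 0.002))
args$snp_err  # 0.010 0.002 0.010 0.002
args$dropout  # 0.005 0.005 0.005
```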

Examples

# five haplotypes in alphabetical order
haps <- c("AACC", "GACC", "GATA", "GTCC", "GTTC")

# make the matrix C
C_mat <- microhaplotype_geno_err_matrix(haps)

# look at the first part of it
C_mat[1:5, 1:5]

eriqande/CKMRsim documentation built on Aug. 2, 2024, 7:23 a.m.