View source: R/geno_err_models.R
microhaplotype_geno_err_matrix | R Documentation |
This function is intended for genotypes whose alleles are multi-SNP haplotypes (microhaplotypes) obtained from next-generation sequencing data. All the SNPs occur on a single read, so their phase is known. The function allows for a SNP-specific sequencing error rate. The haplotypes should be named as strings of A, C, G, or T, though they could be strings of any characters, since the function does not check this. For now, if a SNP is multiallelic, genotyping errors to any of the alternate alleles are assumed to be equally likely. Genotyping errors are also currently assumed to be equally likely in either direction at a SNP.
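Because the error rate is SNP-specific, it can help to see which SNP positions distinguish one haplotype from another. The following is a minimal sketch in base R, not code from the package: the names `snp_mat` and `mismatches` are hypothetical, and nothing here is claimed to reflect how the function computes its probabilities internally.

```r
# Hypothetical illustration only; not part of the package.
haps <- c("AACC", "GACC", "GATA")

# Split each haplotype into single characters:
# one row per haplotype, one column per SNP position.
snp_mat <- do.call(rbind, strsplit(haps, split = ""))
rownames(snp_mat) <- haps

# Count the SNP positions at which each pair of haplotypes differs.
n <- nrow(snp_mat)
mismatches <- matrix(0L, n, n, dimnames = list(haps, haps))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    mismatches[i, j] <- sum(snp_mat[i, ] != snp_mat[j, ])
  }
}

mismatches
```

Haplotypes that differ at more positions would need more simultaneous sequencing errors to be mistaken for one another, which is the intuition behind a per-SNP error rate.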
microhaplotype_geno_err_matrix(
haps,
snp_err_rates = 0.005,
dropout_rates = 0.005,
scale_by_num_snps = FALSE
)
haps |
Character vector of strings that denote the haplotypes at the locus, for example "CCAG", "CTAG", "GCAG", etc. These should be in the same order as they are given in the allele frequency definitions, so that the ordering of genotypes made from them will be correct. Every element of haps must have the same number of characters. haps cannot be a factor. |
snp_err_rates |
Vector of expected sequencing error rates, one for each SNP in the haplotype. It is recycled if its length is less than the number of SNPs in the haplotypes. |
dropout_rates |
Haplotype-specific rates of allelic dropout. Recycled if necessary. |
scale_by_num_snps |
Logical. If TRUE, the error rate is divided by the number of SNPs in each microhaplotype. |
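The argument requirements above can be checked before calling the function. This is a rough sketch in base R under stated assumptions: the variable names (`per_snp_rates`, `per_hap_dropout`, `scaled_rates`) are hypothetical, recycling is shown with `rep(..., length.out = )`, and the scaling step is only one plausible reading of scale_by_num_snps, since the function's internals are not shown on this page.

```r
# Hypothetical pre-flight checks; not part of the package.
haps <- c("CCAG", "CTAG", "GCAG")
snp_err_rates <- c(0.005, 0.01)
dropout_rates <- 0.005

# haps must be a character vector (not a factor) of equal-length strings.
stopifnot(is.character(haps), !is.factor(haps))
stopifnot(length(unique(nchar(haps))) == 1)

# Number of SNPs in each microhaplotype.
num_snps <- nchar(haps[1])

# Recycle the per-SNP error rates to one value per SNP.
per_snp_rates <- rep(snp_err_rates, length.out = num_snps)

# Recycle the dropout rates to one value per haplotype.
per_hap_dropout <- rep(dropout_rates, length.out = length(haps))

# ASSUMPTION: with scale_by_num_snps = TRUE, each per-SNP rate
# is divided by the number of SNPs.
scaled_rates <- per_snp_rates / num_snps
```

Recycling explicitly like this makes it easy to see which rate ends up attached to which SNP when a short vector is supplied.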
# five haplotypes in alphabetical order
haps <- c("AACC", "GACC", "GATA", "GTCC", "GTTC")
# make the matrix C
C_mat <- microhaplotype_geno_err_matrix(haps)
# look at the first part of it
C_mat[1:5, 1:5]
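When reading a slice such as the one above, it may help to know how many diploid genotypes a set of haplotypes can form. The sketch below assumes, without confirmation from this page, that the rows and columns of the matrix index unordered genotypes. The enumeration order used here is arbitrary and is not claimed to match the ordering the function uses; `n_genos`, `idx`, and `genos` are hypothetical names.

```r
# Hypothetical illustration only; not part of the package.
haps <- c("AACC", "GACC", "GATA", "GTCC", "GTTC")
n_haps <- length(haps)

# Unordered genotypes: n homozygotes plus choose(n, 2) heterozygotes.
n_genos <- n_haps * (n_haps + 1) / 2

# Enumerate them from the upper triangle (including the diagonal)
# of an n x n grid of haplotype pairs.
idx <- which(upper.tri(diag(n_haps), diag = TRUE), arr.ind = TRUE)
genos <- paste(haps[idx[, "row"]], haps[idx[, "col"]], sep = "/")

n_genos
head(genos)
```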