ErrToM | R Documentation |
Generate a matrix with the probabilities of observed genotypes (columns) conditional on actual genotypes (rows), or return a function to generate such matrices (using a single value Err as input to that function).
ErrToM(Err = NA, flavour = "version2.0", Return = "matrix")
Err |
estimated genotyping error rate, as a single number, or 3x3 or 4x4
matrix, or length 3 vector. If a single number, an error model is used that
aims to deal with scoring errors typical for SNP arrays. If a matrix, this
should be the probability of observed genotype (columns) conditional on
actual genotype (rows). Each row must therefore sum to 1. If
|
flavour |
matrix-generating function, or one of 'version2.0',
'version1.3' (='SNPchip'), 'version1.1' (='version111'), referring to the
sequoia version in which it was used as default. Only used if |
Return |
output, 'matrix' (default), 'function', or 'vector' |
By default (flavour
= "version2.0"), Err
is
interpreted as a locus-level error rate (rather than allele-level), and
equals the probability that an actual heterozygote is observed as either
homozygote (i.e., the probability that it is observed as AA = probability
that observed as aa = Err
/2). The probability that one homozygote is
observed as the other is (Err
/2)^2
.
The inbuilt 'flavours' correspond to the presumed and simulated error structures, which have changed with sequoia versions. The most appropriate error structure will depend on the genotyping platform; 'version0.9' and 'version1.1' were inspired by SNP array genotyping while 'version1.3' and 'version2.0' are intended to be more general.
Pr(observed genotype (columns) | actual genotype (rows)):
version2.0:
0 | 1 | 2 | |
0 | (1-E/2)^2 | E(1-E/2) | (E/2)^2 |
1 | E/2 | 1-E | E/2 |
2 | (E/2)^2 | E(1-E/2) | (1-E/2)^2 |
version1.3
0 | 1 | 2 | |
0 | 1-E-(E/2)^2 | E | (E/2)^2 |
1 | E/2 | 1-E | E/2 |
2 | (E/2)^2 | E | 1-E-(E/2)^2 |
version1.1
0 | 1 | 2 | |
0 | 1-E | E/2 | E/2 |
1 | E/2 | 1-E | E/2 |
2 | E/2 | E/2 | 1-E |
version0.9 (not recommended)
0 | 1 | 2 | |
0 | 1-E | E | 0 |
1 | E/2 | 1-E | E/2 |
2 | 0 | E | 1-E |
When Err
is a length 3 vector, or if Return = 'vector'
these
are the following probabilities:
hom|hom: an actual homozygote is observed as the other homozygote
het|hom: an actual homozygote is observed as heterozygote
hom|het: an actual heterozygote is observed as homozygote
and Pr(observed genotype (columns) | actual genotype (rows)) is then:
0 | 1 | 2 | |
0 | 1-E_1-E_2 | E_2 | E_1 |
1 | E_3 | 1-2E_3 | E_3 |
2 | E_1 | E_2 | 1-E_1-E_2 |
The only assumption made is that the two alleles can be treated equally, i.e. observing actual allele $A$ as $a$ is as likely as observing actual $a$ as $A$, and so e.g. P(obs=1|act=0) = P(obs=1|act=2).
When the SNPs are scored via sequencing (e.g. RADseq or DArTseq), the 3rd error rate (hom|het) is typically considerably higher than the other two, while for SNP arrays it tends to be similar to P(het|hom).
Depending on Return
, either:
'matrix'
: a 3x3 matrix, with probabilities of observed genotypes
(columns) conditional on actual (rows)
'function'
: a function taking a single value Err
as input, and
generating a 3x3 matrix
'vector'
: a length 3 vector, with the probabilities (observed given
actual) hom|other hom, het|hom, and hom|het.
EstEr
to estimate genotyping error rate as a length 3
vector.
ErM <- ErrToM(Err = 0.05)
ErM
ErrToM(ErM, Return = 'vector')
# use error matrix from Whalen, Gorjanc & Hickey 2018
funE <- function(E) {
matrix(c(1-E*3/4, E/2, E/4,
E/4, 1-2*E/4, E/4,
E/4, E/2, 1-E*3/4),
3,3, byrow=TRUE) }
ErrToM(Err = 0.05, flavour = funE)
# equivalent to:
ErrToM(Err = c(0.05/4, 0.05/2, 0.05/4))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.