By default, this function produces a matrix that can be used for converting
a matrix of multiallelic genotype probabilities to a matrix of allele copy
number probabilities via matrix multiplication. If inverse = TRUE, the
Moore-Penrose pseudoinverse of the matrix is returned, to be used for converting
allele copy number probabilities to approximate multiallelic genotype
probabilities via matrix multiplication.
genoConvMat(ploidy, n_alleles, inverse = FALSE)
ploidy | An integer indicating the ploidy.
n_alleles | An integer indicating the number of alleles.
inverse | A logical value. If TRUE, the Moore-Penrose pseudoinverse of the conversion matrix is returned.
If we know the probabilities of multiallelic genotypes, it is easy to derive
allele copy number probabilities from them. For example, in a tetraploid
with three alleles, the probability of having two copies of allele 0 is
the sum of the probabilities of genotypes 0011, 0012, and 0022. The
matrix generated when inverse = FALSE is used for calculating these
sums via matrix multiplication.
CM = A
where M is the matrix of multiallelic
genotype probabilities, with genotypes in rows and individuals in columns,
A is the matrix of allele copy number probabilities, with allele
copy numbers in rows and individuals in columns, and C is the
matrix generated by genoConvMat.
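The relationship CM = A can be illustrated with a minimal base R sketch. The helper names enumerate_genotypes and build_conv_mat, and the row labels such as "a0_2", are hypothetical and exist only for this illustration; they are not the package's implementation, and the row and column order and names returned by genoConvMat may differ.

```r
# Hypothetical helper: list multiallelic genotypes as sorted allele vectors,
# one genotype per row (row order here is NOT guaranteed to be VCF order).
enumerate_genotypes <- function(ploidy, n_alleles) {
  grid <- as.matrix(expand.grid(rep(list(0:(n_alleles - 1)), ploidy)))
  keep <- apply(grid, 1, function(g) all(diff(g) >= 0))
  unname(grid[keep, , drop = FALSE])
}

# Hypothetical helper: 0/1 matrix with allele copy numbers in rows and
# multiallelic genotypes in columns.
build_conv_mat <- function(ploidy, n_alleles) {
  genos <- enumerate_genotypes(ploidy, n_alleles)
  geno_names <- apply(genos, 1, paste, collapse = "")
  conv <- matrix(0L, nrow = n_alleles * (ploidy + 1), ncol = nrow(genos))
  row_names <- character(nrow(conv))
  i <- 1
  for (a in 0:(n_alleles - 1)) {
    copies <- rowSums(genos == a)        # copies of allele a in each genotype
    for (k in 0:ploidy) {
      conv[i, ] <- as.integer(copies == k)
      row_names[i] <- paste0("a", a, "_", k)  # illustrative label only
      i <- i + 1
    }
  }
  dimnames(conv) <- list(row_names, geno_names)
  conv
}

C <- build_conv_mat(4, 3)

# Genotypes that contribute to "two copies of allele 0"
two_copies_a0 <- colnames(C)[C["a0_2", ] == 1]

# One individual with equal probability for every genotype
M <- matrix(1 / ncol(C), nrow = ncol(C), ncol = 1,
            dimnames = list(colnames(C), "ind1"))
A <- C %*% M
```

In this sketch two_copies_a0 holds the genotypes 0011, 0012, and 0022 from the example above, and the copy number probabilities for each allele in A sum to one.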
If instead we know the probabilities of allele copy numbers, we can estimate
multiallelic genotype probabilities using the inverse of C (setting the
function to inverse = TRUE):
M = C^{-1}A
Because C is generally not invertible (it may be non-square or rank-deficient), the Moore-Penrose pseudoinverse is used in place of C^{-1}. It gives the minimum-norm least-squares solution to the equation.
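As a small illustration of the pseudoinverse step, the sketch below builds C by hand for a diploid with two alleles and computes its Moore-Penrose pseudoinverse from a singular value decomposition in base R. The function name pinv and the tolerance are assumptions made for this example; this is not necessarily how genoConvMat computes its result.

```r
# Diploid, two alleles. Columns are genotypes 00, 01, 11.
# Rows are 0, 1, 2 copies of allele 0, then 0, 1, 2 copies of allele 1.
C <- matrix(c(0, 0, 1,
              0, 1, 0,
              1, 0, 0,
              1, 0, 0,
              0, 1, 0,
              0, 0, 1),
            nrow = 6, byrow = TRUE)

# Hypothetical helper: Moore-Penrose pseudoinverse via SVD,
# dropping singular values that are numerically zero.
pinv <- function(x, tol = sqrt(.Machine$double.eps)) {
  s <- svd(x)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*%
    (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

Cinv <- pinv(C)

# Genotype probabilities for one individual, then the round trip
M  <- matrix(c(0.7, 0.2, 0.1), ncol = 1)
A  <- C %*% M        # allele copy number probabilities
M2 <- Cinv %*% A     # estimated genotype probabilities
```

In this tiny case C has full column rank, so M2 recovers M up to floating-point error. With more alleles or higher ploidy C can be rank-deficient, and the result is only an approximation.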
If inverse = FALSE, a matrix with allele copy numbers in rows and
multiallelic genotypes in columns. All values are either 0 or 1,
indicating whether or not that multiallelic genotype corresponds to
that copy number for that allele.
If inverse = TRUE, a matrix with multiallelic genotypes in rows and
allele copy numbers in columns, with numbers ranging from -1 to +1. These
numbers indicate how much each allele copy number probability contributes
to each multiallelic genotype probability.
Rows and columns are named to assist the user with interpretation of
the matrix. Genotypes are ordered according to the VCF specification.
Alleles are numbered starting at zero.
Allele copy numbers are ordered from 0 to ploidy for allele 0,
then 0 to ploidy for allele 1, etc.
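The genotype ordering can be sketched in base R as follows. The function name vcf_order_genotypes is hypothetical, and the sketch assumes the index rule from the VCF specification, in which a genotype with sorted alleles k_1 <= ... <= k_P has index equal to the sum over m of choose(k_m + m - 1, m). It is meant as an illustration of the ordering, not as the package's own code.

```r
# Hypothetical helper: genotype strings sorted by their VCF index.
vcf_order_genotypes <- function(ploidy, n_alleles) {
  # all allele combinations, keeping only sorted (non-decreasing) genotypes
  grid <- as.matrix(expand.grid(rep(list(0:(n_alleles - 1)), ploidy)))
  keep <- apply(grid, 1, function(g) all(diff(g) >= 0))
  genos <- unname(grid[keep, , drop = FALSE])
  # VCF index: sum over positions m of choose(k_m + m - 1, m)
  idx <- apply(genos, 1, function(g) {
    m <- seq_along(g)
    sum(choose(g + m - 1, m))
  })
  genos <- genos[order(idx), , drop = FALSE]
  apply(genos, 1, paste, collapse = "")
}

diploid_order    <- vcf_order_genotypes(2, 3)
tetraploid_order <- vcf_order_genotypes(4, 3)
```

For a diploid with three alleles this gives 00, 01, 11, 02, 12, 22, which is the familiar order of genotype likelihoods in a VCF file.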
Lindsay V. Clark
# say we have a tetraploid with three alleles
convmat1 <- genoConvMat(4, 3, inverse = FALSE)
convmat2 <- genoConvMat(4, 3, inverse = TRUE)

# generate some multiallelic genotype probs for this example
genoprobs <- matrix(nrow = 15, ncol = 2,
                    dimnames = list(genotypeStrings(4, 3, sep = ""),
                                    c("ind1", "ind2")))
genoprobs[,1] <- sample(c(5000, sample(100, 14)))
genoprobs[,1] <- genoprobs[,1]/sum(genoprobs[,1])
genoprobs[,2] <- sample(c(900, sample(100, 14)))
genoprobs[,2] <- genoprobs[,2]/sum(genoprobs[,2])

# convert to allele dosage probabilities
alprobs <- convmat1 %*% genoprobs

# convert back to multiallelic genotype probabilities
genoprobs2 <- convmat2 %*% alprobs