genoConvMat: Matrices for Conversion between Multiallelic Genotype...


View source: R/prob_convert.R

Description

By default, this function returns a matrix for converting a matrix of multiallelic genotype probabilities into a matrix of allele copy number probabilities via matrix multiplication. If inverse = TRUE, it instead returns the Moore-Penrose pseudoinverse of that matrix, which converts allele copy number probabilities into approximate multiallelic genotype probabilities, again via matrix multiplication.

Usage

genoConvMat(ploidy, n_alleles, inverse = FALSE)

Arguments

ploidy

An integer indicating the ploidy.

n_alleles

An integer indicating the number of alleles.

inverse

A logical value. If TRUE, the pseudoinverse matrix is returned.
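The two integer arguments fully determine the size of the matrix. The sketch below computes the expected dimensions from the definitions in this page: one row per (allele, copy number) pair, and one column per unordered genotype, of which there are choose(ploidy + n_alleles - 1, ploidy). The helper `conv_dims` is hypothetical and not part of the package.

```r
# Hypothetical helper: expected dimensions of the inverse = FALSE matrix,
# derived from the definitions above (not taken from the package source)
conv_dims <- function(ploidy, n_alleles) {
  c(copy_numbers = n_alleles * (ploidy + 1),             # rows
    genotypes = choose(ploidy + n_alleles - 1, ploidy))  # columns
}

conv_dims(4, 3)  # 15 rows, 15 columns
conv_dims(6, 4)  # 28 rows, 84 columns
```

With inverse = TRUE the same two numbers apply, with rows and columns swapped.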

Details

If we know the probabilities of multiallelic genotypes, it is easy to derive allele copy number probabilities from them. For example, in a tetraploid with three alleles, the probability of having exactly two copies of allele 0 is the sum of the probabilities of genotypes 0011, 0012, and 0022. The matrix generated when inverse = FALSE computes these sums via matrix multiplication:

CM = A

where M is the matrix of multiallelic genotype probabilities (genotypes in rows, individuals in columns), A is the matrix of allele copy number probabilities (allele copy numbers in rows, individuals in columns), and C is the matrix generated by genoConvMat.
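To make the construction of C concrete, here is a base-R sketch that rebuilds a 0/1 conversion matrix from scratch and applies it as CM = A. It assumes the row order described under Value (copy numbers 0 to ploidy for allele 0, then allele 1, and so on); `conv_mat_sketch` is a hypothetical re-implementation for illustration, not the package's code, and its dimnames may differ from those of genoConvMat.

```r
# Hypothetical re-implementation of the inverse = FALSE matrix
conv_mat_sketch <- function(ploidy, n_alleles) {
  alleles <- 0:(n_alleles - 1)
  grid <- as.matrix(expand.grid(rep(list(alleles), ploidy)))
  # keep each unordered genotype once (alleles in non-decreasing order)
  genos <- grid[apply(grid, 1, function(g) !is.unsorted(g)), , drop = FALSE]
  geno_names <- apply(genos, 1, paste, collapse = "")
  C <- matrix(0, nrow = n_alleles * (ploidy + 1), ncol = nrow(genos),
              dimnames = list(NULL, geno_names))
  for (j in seq_len(nrow(genos))) {
    for (a in alleles) {
      k <- sum(genos[j, ] == a)            # copies of allele a in genotype j
      C[a * (ploidy + 1) + k + 1, j] <- 1  # row for (allele a, k copies)
    }
  }
  C
}

C <- conv_mat_sketch(4, 3)
# row 3 = allele 0 with exactly two copies
colnames(C)[C[3, ] == 1]   # "0011" "0012" "0022"

# A = CM for two made-up individuals
set.seed(1)
M <- matrix(runif(2 * ncol(C)), ncol = 2)
M <- sweep(M, 2, colSums(M), "/")  # each individual's probabilities sum to 1
A <- C %*% M
colSums(A[1:5, ])  # copy-number probabilities for allele 0 sum to 1
```

Because every genotype has exactly one copy number per allele, each column of C contains one 1 per allele, so each column of A sums to n_alleles.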

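If instead we know the probabilities of allele copy numbers, we can estimate multiallelic genotype probabilities by inverting C (calling genoConvMat with inverse = TRUE):

M = C^{-1}A

In general, however, C has no ordinary inverse: it is usually not square, and even when it is (a tetraploid with three alleles gives a 15 x 15 matrix) its rows are linearly dependent, because within each allele the copy-number rows sum to a row of ones. The Moore-Penrose pseudoinverse C^{+} is therefore used in place of C^{-1}, so in practice M ≈ C^{+}A, which is the minimum-norm least-squares solution to CM = A.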
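The pseudoinverse step can be sketched in base R via the singular value decomposition: if X = U D V^T, then X^+ = V D^{-1} U^T, keeping only non-negligible singular values (MASS::ginv() takes the same approach). The conversion matrix is rebuilt inline under the row ordering stated under Value; `pinv` and the inline construction are illustrative assumptions, not the package implementation.

```r
# Moore-Penrose pseudoinverse via the SVD
pinv <- function(X, tol = sqrt(.Machine$double.eps)) {
  s <- svd(X)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Hypothetical conversion matrix for a tetraploid with three alleles
ploidy <- 4; n_alleles <- 3
grid <- as.matrix(expand.grid(rep(list(0:(n_alleles - 1)), ploidy)))
genos <- grid[apply(grid, 1, function(g) !is.unsorted(g)), , drop = FALSE]
C <- matrix(0, n_alleles * (ploidy + 1), nrow(genos))
for (j in seq_len(nrow(genos))) {
  for (a in 0:(n_alleles - 1)) {
    C[a * (ploidy + 1) + sum(genos[j, ] == a) + 1, j] <- 1
  }
}

qr(C)$rank < nrow(C)   # TRUE: C is square here but singular

Cplus <- pinv(C)
set.seed(1)
M <- matrix(runif(2 * ncol(C)), ncol = 2)
M <- sweep(M, 2, colSums(M), "/")
A <- C %*% M
M_hat <- Cplus %*% A         # least-squares estimate of M
max(abs(C %*% M_hat - A))    # ~0: M_hat reproduces A
max(abs(M_hat - M))          # usually nonzero: A does not determine M uniquely
```

This is why the back-converted probabilities are described as approximate: any genotype-probability vector that maps to the same A is an equally good least-squares fit, and the pseudoinverse picks the one with the smallest norm.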

Value

If inverse = FALSE, a matrix with allele copy numbers in rows and multiallelic genotypes in columns. All values are either 0 or 1, indicating whether or not that multiallelic genotype corresponds to that copy number for that allele.

If inverse = TRUE, a matrix with multiallelic genotypes in rows and allele copy numbers in columns, with numbers ranging from -1 to +1. These numbers indicate how much each allele copy number probability contributes to each multiallelic genotype probability.

Rows and columns are named to assist the user with interpretation of the matrix. Genotypes are ordered according to the VCF specification. Alleles are numbered starting at zero. Allele copy numbers are ordered from 0 to ploidy for allele 0, then 0 to ploidy for allele 1, etc.
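The VCF genotype ordering can be computed directly. For alleles sorted so that a_1 <= ... <= a_P, the 0-based position is the sum over m of choose(a_m + m - 1, m), which for diploids reduces to the VCF formula k(k+1)/2 + j. The sketch below uses that formula to list tetraploid, three-allele genotypes in VCF order and builds copy-number labels in the order described above. `vcf_index` is a hypothetical helper, and the copy-number label format is an assumption; the package's own dimnames may be formatted differently.

```r
# 0-based position of a genotype in VCF order
vcf_index <- function(alleles) {
  a <- sort(alleles)
  m <- seq_along(a)
  sum(choose(a + m - 1, m))
}

vcf_index(c(0, 1))  # 1 (diploid 0/1)
vcf_index(c(2, 0))  # 3 (diploid 0/2)

# Genotype labels for a tetraploid with three alleles, in VCF order
ploidy <- 4; n_alleles <- 3
grid <- as.matrix(expand.grid(rep(list(0:(n_alleles - 1)), ploidy)))
genos <- grid[apply(grid, 1, function(g) !is.unsorted(g)), , drop = FALSE]
pos <- apply(genos, 1, vcf_index)
geno_labels <- apply(genos[order(pos), , drop = FALSE], 1, paste, collapse = "")
head(geno_labels, 6)  # "0000" "0001" "0011" "0111" "1111" "0002"

# Copy-number order: allele 0 with 0..ploidy copies, then allele 1, ...
# (hypothetical label format)
cn_labels <- paste0("allele", rep(0:(n_alleles - 1), each = ploidy + 1),
                    "_", rep(0:ploidy, times = n_alleles))
head(cn_labels, 6)  # "allele0_0" "allele0_1" "allele0_2" "allele0_3" "allele0_4" "allele1_0"
```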

Author(s)

Lindsay V. Clark

See Also

enumerateGenotypes

Examples

library(ploidyverseClasses)
set.seed(1)  # make the random example reproducible

# say we have a tetraploid with three alleles
convmat1 <- genoConvMat(4, 3, inverse = FALSE)
convmat2 <- genoConvMat(4, 3, inverse = TRUE)

# generate some multiallelic genotype probs for this example
# (a tetraploid with three alleles has 15 possible genotypes)
genoprobs <- matrix(nrow = 15, ncol = 2,
                    dimnames = list(genotypeStrings(4, 3, sep = ""),
                                    c("ind1", "ind2")))
genoprobs[,1] <- sample(c(5000, sample(100, 14)))
genoprobs[,1] <- genoprobs[,1]/sum(genoprobs[,1])
genoprobs[,2] <- sample(c(900, sample(100, 14)))
genoprobs[,2] <- genoprobs[,2]/sum(genoprobs[,2])

# convert to allele copy number probabilities
alprobs <- convmat1 %*% genoprobs

# convert back to approximate multiallelic genotype probabilities
genoprobs2 <- convmat2 %*% alprobs

ploidyverse/ploidyverseClasses documentation built on May 25, 2019, 2:21 p.m.