Description Usage Arguments Details Value Author(s) See Also Examples
These functions provide some utilities for processing multiallelic genotypes, including multinomial probability, adjustment of genotype frequencies under certain mating schemes, and ordering of genotypes according to the VCF specification. They are all implemented in Rcpp, so that they can be used from R or from Rcpp.
1 2 3 4 5 6 7 8 9 10 11 | dmultinom(x, prob)
dDirichletMultinom(x, prob, alpha)
nGen(ploidy, nalleles)
enumerateGenotypes(ploidy, nalleles)
indexGenotype(genotype)
genotypeFromIndex(index, ploidy)
alleleCopy(genotype, nalleles)
makeGametes(genotype)
selfingMatrix(ploidy, nalleles)
|
x |
A vector of integers indicating number of items of each category sampled, for example copy numbers of alleles within a genotype (when estimating genotype frequencies under HWE), or sequence read depth for each allele (when estimating genotype likelihood). |
prob |
A numeric vector the same length as |
alpha |
An overdispersion parameter. Lower values result in more overdispersed
results, whereas higher values will cause the output of
|
ploidy |
An integer indicating the ploidy of the organism. |
nalleles |
An integer indicating the number of alleles, including the reference allele. |
genotype |
An integer vector indicating a genotype. The length of the vector should be the same as the ploidy of the organism. Each item in the vector represents one copy of the locus, with the integer representing an allele. The reference allele should be represented as zero, and alternative alleles as integers above zero. The vector must be sorted in ascending order. |
index |
The index (starting at zero) of a genotype in the VCF order. |
Since these functions are designed to be run repeatedly in computationally
intensive loops, they don't perform error checking. It is highly advisable
to perform error checking at a higher level in the code, for example checking
once that the ploidy is above zero before making millions of calls to
nGen
with that ploidy value.
dmultinom
should give identical results to
stats::dmultinom
, but be faster and accessible
from within Rcpp.
Where K is the length of x, p is the vector of probabilities, α is the overdispersion parameter, and n is the sum of x:
n = sum(i = 1, ..., K) x[i]
The output of dmultinom
is
n! * prod(i = 1, ..., K)((p[i] ^ x[i])/x[i]!)
And the output of dDirichletMultinom
is
n! * Γ(α) / Γ(n + α) * prod(i = 1, ...K)(Γ(p[i] * α + x[i])/ x[i]! / Γ(p[i] * α))
dimultinom
and dDirchletMultinom
return a single numeric value
between 0 and 1 indicating the multinomial or Dirichlet-multinomial
probability, repectively.
nGen
returns an integer indicating how many unique genotypes are
possible for a given ploidy and number of alleles.
enumerateGenotypes
returns an integer matrix, with as many columns as
the ploidy, listing all possible genotypes. Each row represents one genotype.
Each element represents an allele in the genotype, with zero indicating the
reference allele and higher integers indicating alternative alleles. The
genotypes are ordered according to the VCF specification.
indexGenotype
returns an integer indicating the index of a genotype
within the VCF specification (i.e. in what row of the matrix returned by
enumerateGenotypes
would we find that genotype), starting at index zero.
genotypeFromIndex
returns an integer vector indicating the genotype
at a given index (starting at index zero, and according to the VCF
specification).
alleleCopy
returns an integer vector indicating the genotype in an
alternative format: there is one element for each allele (including the
reference allele and all alternative alleles), and the value of the element
indicates how many copies of that allele that genotype has. This format
can be useful for input to dmultinom
.
makeGametes
returns an integer matrix listing all possible gametes.
The number of columns will be equal to half the ploidy of the input genotype.
Each row represents one gamete. Gametes may be listed multiple times in
order to reflect their relative frequency. Fully polysomic inheritance is
assumed with no double reduction.
selfingMatrix
returns a numeric matrix with parent genotypes in rows and
progeny genotypes in columns, in VCF order. Matrix elements indicate the
frequency with which each parent genotype will produce each progeny genotype
after self-fertilization. Polysomic inheritance and no double reduction are
assumed. If f is a vector of genotype frequencies and A is
the matrix output by this function, fA is the vector of genotype
frequencies after one round of self-fertilization.
Lindsay V. Clark
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | # Likelihood of genotype 0/1/2/2 having 20 reads for 0, 25 reads
# for 1, and 35 reads for 2.
dmultinom(c(20, 25, 35), c(0.25, 0.25, 0.5))
dDirichletMultinom(c(20, 25, 35), c(0.25, 0.25, 0.5), 9)
# A tetraploid with two alternative alleles
nGen(4, 3)
enumerateGenotypes(4, 3)
# A diploid with one alternative allele
nGen(2, 2)
enumerateGenotypes(2, 2)
# Indices for two diploid genotypes
indexGenotype(c(0, 2))
indexGenotype(c(1, 1))
# Indices for two tetraploid genotypes
indexGenotype(c(0, 0, 0, 0))
indexGenotype(c(0, 0, 2, 2))
# Convert a tetraploid genotype to copy number format
alleleCopy(c(1, 1, 1, 2), 4)
# Generate gametes
makeGametes(c(0, 1)) # Diploid heterozygous
makeGametes(0:3) # Tetraploid fully heterozygous
makeGametes(0:5) # Hexaploid fully heterozygous
makeGametes(c(0,0,0,1)) # Tetraploid partial heterozygote
# Self-fertilization matrix for a biallelic diploid
selfingMatrix(2, 2)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.