GetWeightedMeanGenotypes: Export Numeric Genotype Values from Posterior Probabilities
In lvclark/polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

GetWeightedMeanGenotypes

R Documentation

Export Numeric Genotype Values from Posterior Probabilities

Description

These functions calculate numerical genotype values using posterior probabilities in a "RADdata" object, and output those values as a matrix of taxa by alleles. GetWeightedMeanGenotypes returns continuous genotype values, weighted by posterior genotype probabilities (i.e. posterior mean genotypes). GetProbableGenotypes returns discrete genotype values indicating the most probable genotype. If the "RADdata" object includes more than one possible inheritance mode, the $ploidyChiSq slot is used for selecting or weighting inheritance modes for each allele.

Usage

GetWeightedMeanGenotypes(object, ...)
## S3 method for class 'RADdata'
GetWeightedMeanGenotypes(object, minval = 0, maxval = 1, 
                         omit1allelePerLocus = TRUE, 
                         omitCommonAllele = TRUE,
                         naIfZeroReads = FALSE, 
                         onePloidyPerAllele = FALSE, ...)

GetProbableGenotypes(object, ...)
## S3 method for class 'RADdata'
GetProbableGenotypes(object, omit1allelePerLocus = TRUE,
                     omitCommonAllele = TRUE,
                     naIfZeroReads = FALSE, 
                     correctParentalGenos = TRUE,
                     multiallelic = "correct", ...)

Arguments

`object`	A `"RADdata"` object. Posterior genotype probabilities should have been added with `AddGenotypePosteriorProb`, and if there is more than one possible ploidy, ploidy chi-squared values should have been added with `AddPloidyChiSq`.
`...`	Additional arguments, listed below, to be passed to the method for `"RADdata"`.
`minval`	The number that should be used for indicating that a taxon has zero copies of an allele.
`maxval`	The number that should be used for indicating that a taxon has the maximum copies of an allele (equal to the ploidy of the locus).
`omit1allelePerLocus`	A logical indicating whether one allele per locus should be omitted from the output, in order to reduce the number of variables and prevent singularities for genome-wide association and genomic prediction. The value for one allele can be predicted from the values from all other alleles at its locus.
`omitCommonAllele`	A logical, passed to the `commonAllele` argument of `OneAllelePerMarker`, indicating whether the most common allele for each locus should be omitted (as opposed to simply the first allele for each locus). Ignored if `omit1allelePerLocus = FALSE`.
`naIfZeroReads`	A logical indicating whether `NA` should be inserted into the output matrix for any taxa and loci where the total read depth for the locus is zero. If `FALSE`, the output for these genotypes is essentially calculated using prior genotype probabilities, since prior and posterior genotype probabilities are equal when there are no reads.
`onePloidyPerAllele`	Logical. If `TRUE`, for each allele the inheritance mode with the lowest `\chi ^ 2` value is selected and is assumed to be the true inheritance mode. If `FALSE`, inheritance modes are weighted by inverse `\chi ^ 2` values for each allele, and mean genotypes that have been weighted across inheritance modes are returned.
`correctParentalGenos`	Logical. If `TRUE` and if the dataset was processed with `PipelineMapping2Parents`, the parental genotypes that are output are corrected according to the progeny allele frequencies, using the `likelyGeno_donor` and `likelyGeno_recurrent` slots in `object`. For the ploidy of the marker, the appropriate ploidy for the parents is selected using the `donorPloidies` and `recurrentPloidies` slots.
`multiallelic`	A string indicating how to handle cases where allele copy number across all alleles at a locus does not sum to the ploidy. To retain the most probable copy number for each allele, even if they don't sum to the ploidy across all alleles, use `"ignore"`. To be conservative and convert these allele copy numbers to `NA`, use `"na"`. To adjust allele copy numbers to match the ploidy (adding or subtracting allele copies while maximizing the product of posterior probabilities across alleles), use `"correct"`.

Details

For each inheritance mode m, taxon t, allele a, allele copy number i, total ploidy k, and posterior genotype probability p_{i,t,a,m}, posterior mean genotype g_{t,a,m} is estimated by GetWeightedMeanGenotypes as:

g_{t,a,m} = \sum_{i = 0}^k p_{i,t,a,m} * \frac{i}{k}

For GetProbableGenotypes, the genotype is the one with the maximum posterior probability:

g_{t,a,m} = i | \max_{i = 0}^k{p_{i,t,a,m}}

When there are multiple inheritance modes and onePloidyPerAllele = FALSE, the weighted genotype is estimated by GetWeightedMeanGenotypes as:

g_{t,a} = \sum_m [ g_{t,a,m} * \frac{1}{\chi^2_{m,a}} / \sum_m \frac{1}{\chi^2_{m,a}}]

In GetProbableGenotypes, or GetWeightedMeanGenotypes when there are multiple inheritance modes and onePloidyPerAllele = TRUE, the genotype is simply the one corresponding to the inheritance mode with the minimum \chi ^2 value:

g_{t,a} = g_{t,a,m} | \min_m{\chi^2_{m,a}}

Value

For GetWeightedMeanGenotypes, a named matrix, with taxa in rows and alleles in columns, and values ranging from minval to maxval. These values can be treated as continuous genotypes.

For GetProbableGenotypes, a list:

`genotypes`	A named integer matrix, with taxa in rows and alleles in columns, and values ranging from zero to the maximum ploidy for each allele. These values can be treated as discrete genotypes.
`ploidy_index`	A vector with one value per allele. It contains the index of the most likely inheritance mode of that allele in `object$priorProbPloidies`.

Author(s)

Lindsay V. Clark

Examples

# load dataset
data(exampleRAD_mapping)

# run a genotype calling pipeline; 
# substitute with any pipeline and parameters
exampleRAD_mapping <- SetDonorParent(exampleRAD_mapping, "parent1")
exampleRAD_mapping <- SetRecurrentParent(exampleRAD_mapping, "parent2")
exampleRAD_mapping <- PipelineMapping2Parents(exampleRAD_mapping,
                                 n.gen.backcrossing = 1, useLinkage = FALSE)


# get weighted mean genotypes
wmg <- GetWeightedMeanGenotypes(exampleRAD_mapping)
# examine the results
wmg[1:10,]

# get most probable genotypes
pg <- GetProbableGenotypes(exampleRAD_mapping, naIfZeroReads = TRUE)
# examine the results
pg$genotypes[1:10,]

lvclark/polyRAD documentation built on June 14, 2025, 10:08 p.m.

lvclark/polyRAD index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lvclark/polyRAD
Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

GetWeightedMeanGenotypes: Export Numeric Genotype Values from Posterior Probabilities
In lvclark/polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

Export Numeric Genotype Values from Posterior Probabilities

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Related to GetWeightedMeanGenotypes in lvclark/polyRAD...

R Package Documentation

Browse R Packages

We want your feedback!

lvclark/polyRAD Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

GetWeightedMeanGenotypes: Export Numeric Genotype Values from Posterior Probabilities In lvclark/polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

Export Numeric Genotype Values from Posterior Probabilities

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Related to GetWeightedMeanGenotypes in lvclark/polyRAD...

R Package Documentation

Browse R Packages

We want your feedback!

lvclark/polyRAD
Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

GetWeightedMeanGenotypes: Export Numeric Genotype Values from Posterior Probabilities
In lvclark/polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids