pgen: Genotype Probability

View source: R/round_robin.R

pgenR Documentation

Genotype Probability

Description

Calculate the probability of genotypes based on the product of allele frequencies over all loci.

Usage

pgen(gid, pop = NULL, by_pop = TRUE, log = TRUE, freq = NULL, ...)

Arguments

gid

a genind or genclone object.

pop

either a formula to set the population factor from the strata slot or a vector specifying the population factor for each sample. Defaults to NULL.

by_pop

When this is TRUE (default), the calculation will be done by population.

log

a logical if log =TRUE (default), the values returned will be log(Pgen). If log = FALSE, the values returned will be Pgen.

freq

a vector or matrix of allele frequencies. This defaults to NULL, indicating that the frequencies will be determined via round-robin approach in rraf. If this matrix or vector is not provided, zero-value allele frequencies will automatically be corrected. For details, please see the documentation on correcting rare alleles.

...

options from correcting rare alleles. The default is to correct allele frequencies to 1/n

Details

Pgen is the probability of a given genotype occuring in a population assuming HWE. Thus, the value for diploids is

P_{gen} = \left(\prod_{i=1}^m p_i\right)2^h

where p_i are the allele frequencies and h is the count of the number of heterozygous sites in the sample (Arnaud-Haond et al. 2007; Parks and Werth, 1993). The allele frequencies, by default, are calculated using a round-robin approach where allele frequencies at a particular locus are calculated on the clone-censored genotypes without that locus.

To avoid issues with numerical precision of small numbers, this function calculates pgen per locus by adding up log-transformed values of allele frequencies. These can easily be transformed to return the true value (see examples).

Value

A vector containing Pgen values per locus for each genotype in the object.

Note

For haploids, Pgen at a particular locus is the allele frequency. This function cannot handle polyploids. Additionally, when the argument pop is not NULL, by_pop is automatically TRUE.

Author(s)

Zhian N. Kamvar, Jonah Brooks, Stacy A. Krueger-Hadfield, Erik Sotka

References

Arnaud-Haond, S., Duarte, C. M., Alberto, F., & Serrão, E. A. 2007. Standardizing methods to address clonality in population studies. Molecular Ecology, 16(24), 5115-5139.

Parks, J. C., & Werth, C. R. 1993. A study of spatial features of clones in a population of bracken fern, Pteridium aquilinum (Dennstaedtiaceae). American Journal of Botany, 537-544.

See Also

psex, rraf, rrmlg, rare_allele_correction

Examples

data(Pram)
head(pgen(Pram, log = FALSE))

## Not run: 
# You can also supply the observed allele frequencies
pramfreq <- Pram %>% genind2genpop() %>% tab(freq = TRUE)
head(pgen(Pram, log = FALSE, freq = pramfreq))

# You can get the Pgen values over all loci by summing over the logged results:
pgen(Pram, log = TRUE) %>%  # calculate pgen matrix
  rowSums(na.rm = TRUE) %>% # take the sum of each row
  exp()                     # take the exponent of the results

# You can also take the product of the non-logged results:
apply(pgen(Pram, log = FALSE), 1, prod, na.rm = TRUE)

## Rare Allele Correction ---------------------------------------------------
##
# If you don't supply a table of frequencies, they are calculated with rraf 
# with correction = TRUE. This is normally benign when analyzing large 
# populations, but it can have a great effect on small populations. To help 
# control this, you can supply arguments described in 
# help("rare_allele_correction"). 


# Default is to correct by 1/n per population. Since the calculation is 
# performed on a smaller sample size due to round robin clone correction, it
# would be more appropriate to correct by 1/rrmlg at each locus. This is 
# acheived by setting d = "rrmlg". Since this is a diploid, we would want to
# account for the number of chromosomes, and so we set mul = 1/2
head(pgen(Pram, log = FALSE, d = "rrmlg", mul = 1/2)) # compare with the output above

# If you wanted to treat all alleles as equally rare, then you would set a
# specific value (let's say the rare alleles are 1/100):
head(pgen(Pram, log = FALSE, e = 1/100))

## End(Not run)

poppr documentation built on May 29, 2024, 5:54 a.m.