calc_single_stats: Calculate standard single-SNP genetic statistics

calc_single_stats R Documentation

Calculate standard single-SNP genetic statistics

Description

These functions calculate basic genetic statistics from SNP data contained in snpRdata objects, splitting samples by any number of provided facets. Because each of these statistics uses only a single SNP at a time, any SNP-specific facet levels are ignored.

Usage

calc_pi(x, facets = NULL)

calc_maf(x, facets = NULL)

calc_ho(x, facets = NULL)

calc_private(x, facets = NULL, rarefaction = TRUE, g = 0)

calc_hwe(
  x,
  facets = NULL,
  method = "exact",
  fwe_method = "BY",
  fwe_case = c("by_facet", "overall")
)

calc_he(x, facets = NULL)

calc_allelic_richness(x, facets = NULL, g = 0)

Arguments

x

snpRdata. Input SNP data.

facets

character. Categorical metadata variables by which to break up analysis. See Facets_in_snpR for more details.

rarefaction

logical, default TRUE. Should the number of segregating sites be estimated via rarefaction? See details.

g

numeric, default 0. If doing rarefaction, controls the number of alleles/gene copies to rarefy to. If 0, this will rarefy to the smallest sample size per locus. If g < 0, this will rarefy to the smallest sample size per locus minus the absolute value of g. If positive, this will rarefy to g, and any loci where the smallest sample size is less than g will be dropped from the calculation.
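The three cases for g can be sketched in base R as follows. This is only an illustration of the rules as described above; resolve_g and its arguments are hypothetical names and are not part of snpR.

```r
# Hypothetical helper (not a snpR function): resolve the rarefaction size for
# one locus from `g` and the gene-copy counts per population at that locus.
resolve_g <- function(g, n_per_pop) {
  smallest <- min(n_per_pop)
  if (g == 0) return(smallest)             # rarefy to the smallest sample size
  if (g < 0) return(smallest - abs(g))     # smallest sample size minus |g|
  if (smallest < g) return(NA_real_)       # locus dropped: too few gene copies
  g                                        # rarefy to exactly g
}

n_per_pop <- c(40, 36, 52)   # made-up gene-copy counts for three populations
resolve_g(0, n_per_pop)
resolve_g(-2, n_per_pop)
resolve_g(20, n_per_pop)
resolve_g(50, n_per_pop)     # NA: the smallest sample (36) is below g
```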

method

character, default "exact". Defines the method to use for calculating p-values for HWE divergence. Options:

  • exact: Uses the exact test as described in Wigginton et al. (2005).

  • chisq: Uses a chi-squared test.

See details.

fwe_method

character, default "BY". Type of Family-Wise Error correction (multiple testing correction) to use. For details and options, see p.adjust. If no correction is desired, set this argument to "none".
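As a rough illustration of what the correction does, the base R function p.adjust can be applied directly to a vector of p-values. The p-values below are made up for the example.

```r
# Made-up p-values standing in for per-locus HWE test results.
p <- c(0.001, 0.012, 0.030, 0.200, 0.650)

# Benjamini-Yekutieli correction, the default used here.
p_by <- p.adjust(p, method = "BY")

# "none" leaves the p-values unchanged.
p_none <- p.adjust(p, method = "none")

# Every method accepted by p.adjust.
p.adjust.methods
```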

fwe_case

character, default c("by_facet", "by_subfacet", "overall"). How should Family-Wise Error correction (multiple testing correction) be applied?

  • "by_facet": Each facet supplied (such as pop or pop.fam) is treated as a set of tests.

  • "by_subfacet": Each level of each subfacet is treated as a separate set of tests.

  • "overall": All tests are treated as a set.
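The difference between these cases is only in how the p-values are grouped before correction. The sketch below mimics the grouping with base R; the data frame, its column names, and its values are hypothetical and do not reflect how snpR stores results internally.

```r
# Hypothetical table of raw p-values, one row per test.
tests <- data.frame(
  facet    = c("pop", "pop", "pop", "pop", "pop.fam", "pop.fam"),
  subfacet = c("A", "A", "B", "B", "A.1", "B.1"),
  p        = c(0.001, 0.040, 0.020, 0.300, 0.010, 0.500)
)
by_correct <- function(p) p.adjust(p, method = "BY")

# "overall": all tests form a single set.
tests$p_overall <- by_correct(tests$p)

# "by_facet": one set of tests per facet.
tests$p_by_facet <- ave(tests$p, tests$facet, FUN = by_correct)

# "by_subfacet": one set of tests per level of each facet.
key <- paste(tests$facet, tests$subfacet)
tests$p_by_subfacet <- ave(tests$p, key, FUN = by_correct)

tests
```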

Details

The data can be broken up categorically by sample metadata, as described in Facets_in_snpR.

Value

snpRdata object with requested stats merged into the stats socket

Functions

  • calc_pi(): \pi (nucleotide diversity/average number of pairwise differences)

  • calc_maf(): minor allele frequency

  • calc_ho(): observed heterozygosity

  • calc_private(): find private alleles

  • calc_hwe(): p-values for Hardy-Weinberg Equilibrium divergence

  • calc_he(): expected heterozygosity

  • calc_allelic_richness(): allelic richness (standardized number of alleles per locus via rarefaction)

\pi

Calculates \pi (nucleotide diversity/average number of pairwise differences) according to Hohenlohe et al. (2010).
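For a single locus, the Hohenlohe et al. (2010) estimator is one minus the probability of drawing two identical gene copies without replacement. A minimal base R sketch, assuming allele counts for one locus are already available (pi_locus is a hypothetical name, not a snpR function):

```r
# Hypothetical helper: pi at one locus from a vector of allele counts.
pi_locus <- function(allele_counts) {
  n <- sum(allele_counts)   # total gene copies sampled at the locus
  1 - sum(choose(allele_counts, 2)) / choose(n, 2)
}

pi_locus(c(6, 4))    # 1 - (15 + 6) / 45 = 24/45
pi_locus(c(10, 0))   # monomorphic locus: 0
```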

HE

Calculates traditional expected heterozygosity 2pq. Note that this will produce results almost identical to \pi.
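For a bi-allelic locus, the without-replacement form of \pi works out to 2pq multiplied by n/(n - 1), where n is the number of gene copies sampled, so the two values converge as n grows. A small base R sketch of that relationship, using a hypothetical helper name:

```r
# Hypothetical helper: 2pq at one bi-allelic locus from two allele counts.
he_locus <- function(allele_counts) {
  p <- allele_counts[1] / sum(allele_counts)
  2 * p * (1 - p)
}

counts <- c(6, 4)
n <- sum(counts)

he <- he_locus(counts)           # 2 * 0.6 * 0.4 = 0.48
pi_equiv <- he * n / (n - 1)     # 0.48 * 10 / 9 = 24/45

# With a large sample the correction factor is close to 1.
big <- c(600, 400)
he_locus(big) * sum(big) / (sum(big) - 1)
```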

HO

Calculates observed heterozygosity.
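As an illustration, observed heterozygosity at one locus is the proportion of called genotypes that are heterozygous. The sketch below assumes two-character genotype strings with "NN" marking missing data; both the encoding and the helper name are assumptions made for the example.

```r
# Hypothetical helper: observed heterozygosity at one locus.
ho_locus <- function(genotypes, missing = "NN") {
  called <- genotypes[genotypes != missing]   # drop missing genotypes
  a1 <- substr(called, 1, 1)                  # first allele of each genotype
  a2 <- substr(called, 2, 2)                  # second allele of each genotype
  mean(a1 != a2)                              # fraction of heterozygotes
}

ho_locus(c("AA", "AC", "CA", "CC", "NN"))   # 2 of 4 called genotypes: 0.5
```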

maf

Calculates minor allele frequencies and notes the identities and counts of the major and minor alleles.
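A base R sketch of the same bookkeeping for one locus follows. It assumes two-character genotype strings with "NN" as the missing code; the helper and the names of its output elements are hypothetical and need not match the columns snpR returns.

```r
# Hypothetical helper: major/minor allele identities, counts, and maf.
maf_locus <- function(genotypes, missing = "NN") {
  called <- genotypes[genotypes != missing]
  counts <- sort(table(unlist(strsplit(called, ""))), decreasing = TRUE)
  two_alleles <- length(counts) > 1
  maj_count <- as.integer(counts[1])
  min_count <- if (two_alleles) as.integer(counts[2]) else 0L
  list(
    major = names(counts)[1],
    minor = if (two_alleles) names(counts)[2] else NA_character_,
    maj_count = maj_count,
    min_count = min_count,
    maf = min_count / (maj_count + min_count)
  )
}

res <- maf_locus(c("AA", "AC", "CC", "AA", "NN"))
res$maf   # A is seen 5 times and C 3 times, so maf = 3/8
```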

private alleles

Determines whether each SNP carries a private allele across all levels in each sample facet. Returns an error if no sample facets are provided. If rarefaction is requested, the estimated number of private alleles will be calculated according to Smith and Grassle (1977). Note that the standardized sample size (g) will vary across loci due to differences in sequencing coverage at those loci, equal to the smallest number of alleles sequenced in any population at that locus minus one. Instead of weighted averages, the value stored in the $weighted.means slot in the returned value is the total number of private alleles per population.
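Without rarefaction, the underlying idea is simply that an allele is private when it is observed in exactly one facet level. The sketch below shows only that simple case with made-up counts; it does not implement the rarefaction estimator, and find_private is a hypothetical name.

```r
# Hypothetical helper: rows are populations, columns are alleles at one locus.
find_private <- function(allele_counts) {
  present <- allele_counts > 0              # is the allele observed at all?
  is_private <- colSums(present) == 1       # observed in exactly one population
  # TRUE for each population holding an allele found nowhere else.
  rowSums(present[, is_private, drop = FALSE]) > 0
}

allele_counts <- rbind(
  popA = c(A = 12, C = 8),
  popB = c(A = 20, C = 0),
  popC = c(A = 18, C = 0)
)
find_private(allele_counts)   # only popA carries the C allele
```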

hwe

Calculates a p-value for the null hypothesis that a population is in HWE at a given locus. Two methods are available:

  • "exact": Exact test according to Wigginton, JE, Cutler, DJ, and Abecasis, GR (2005). Slightly slower.

  • "chisq": Chi-squared test. May produce poor results when the observed or expected counts for any genotype are small.

For the exact test, code derived from http://csg.sph.umich.edu/abecasis/Exact/snp_hwe.r
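For orientation, a textbook chi-squared HWE test for one bi-allelic locus can be written in a few lines of base R. This is a generic sketch with one degree of freedom and no continuity correction; whether the "chisq" method here uses exactly these choices is an assumption, and hwe_chisq is a hypothetical name.

```r
# Hypothetical helper: chi-squared HWE p-value from three genotype counts.
hwe_chisq <- function(n_AA, n_Aa, n_aa) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)            # frequency of allele A
  q <- 1 - p
  observed <- c(n_AA, n_Aa, n_aa)
  expected <- n * c(p^2, 2 * p * q, q^2)      # HWE genotype expectations
  stat <- sum((observed - expected)^2 / expected)
  pchisq(stat, df = 1, lower.tail = FALSE)
}

hwe_chisq(25, 50, 25)   # counts match HWE exactly, so the p-value is 1
hwe_chisq(50, 0, 50)    # no heterozygotes at all: very small p-value
```

Monomorphic loci give expected counts of zero and therefore NaN in this sketch, which is one reason small counts are a problem for the chi-squared approach.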

allelic richness

Calculates the allelic richness, the estimated number of alleles per locus standardized for sample size via rarefaction according to Hurlbert (1971). Note that the standardized sample size (g) will vary across loci due to differences in sequencing coverage at those loci, equal to the smallest number of alleles sequenced in any population at that locus minus one. Weighted averages are weighted by g.
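The rarefaction step can be sketched for one population at one locus as the expected number of distinct alleles in a random subsample of g gene copies. The helper name and the example counts below are hypothetical.

```r
# Hypothetical helper: expected number of distinct alleles among g gene copies
# drawn without replacement from the observed allele counts.
richness_locus <- function(allele_counts, g) {
  n <- sum(allele_counts)
  stopifnot(g <= n)   # cannot rarefy to more gene copies than were sampled
  # For each allele: 1 - P(allele is absent from the subsample).
  sum(1 - choose(n - allele_counts, g) / choose(n, g))
}

richness_locus(c(6, 4), g = 10)   # full sample: both alleles are seen, 2
richness_locus(c(9, 1), g = 2)    # the rare allele is often missed: 1.2
```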

Author(s)

William Hemstrom

References

Wigginton, JE, Cutler, DJ, and Abecasis, GR (2005). American Journal of Human Genetics.

Hohenlohe et al. (2010). PLOS Genetics.

Hurlbert (1971). Ecology.

Smith and Grassle (1977). Biometrics.

Examples

# base facet
x <- calc_pi(stickSNPs)
get.snpR.stats(x)

# multiple facets
x <- calc_pi(stickSNPs, facets = c("pop", "pop.fam"))
get.snpR.stats(x, c("pop", "pop.fam"))

# HWE with family-wise error correction
## Not run: 
x <- calc_hwe(stickSNPs, facets = c("pop", "pop.fam"))
get.snpR.stats(x, c("pop", "pop.fam"))

## End(Not run)

hemstrow/snpR documentation built on March 20, 2024, 7:03 a.m.