do_bootstraps: Generate bootstrapped windowed statistics
In hemstrow/snpR: Whole-Genome Analysis Tools for Use with Single Nucleotide Polymorphism Data

View source: R/bootstrapping_functions.R

Description

do_bootstraps creates a distribution of bootstrapped smoothed values for requested statistics.

Usage

do_bootstraps(
  x,
  facets = NULL,
  boots,
  sigma,
  step = 2 * sigma,
  statistics = "all",
  nk = TRUE,
  par = FALSE,
  triple_sigma = FALSE,
  gaussian = TRUE,
  do.p = TRUE,
  p.alt = "two-sided"
)

Arguments

`x`	snpRdata object.
`facets`	character or NULL, default NULL. Categories by which to break up bootstraps.
`boots`	numeric. Number of bootstraps to generate.
`sigma`	numeric. Designates the width of windows in kilobases. Full window size is 6*sigma if `triple_sigma` is TRUE.
`step`	numeric or NULL, default `2*sigma` (non-overlapping windows). Designates the number of kilobases between each window centroid. If NULL, windows are centered on each SNP.
`statistics`	character. Designates the statistic(s) to smooth, typically "all", "single", or "pairwise". See details for options.
`nk`	logical, default TRUE. If TRUE, weights SNP contribution to window averages by the number of observations at those SNPs.
`par`	numeric or FALSE, default FALSE. If numeric, the number of cores to use for parallel processing.
`triple_sigma`	Logical, default FALSE. If TRUE, sigma will be tripled to create windows of 6*sigma total.
`gaussian`	Logical, default TRUE. If TRUE, windows will be gaussian smoothed. If not, windows will be raw averages.
`do.p`	logical, default TRUE. Determines if p-values should be calculated for sliding windows.
`p.alt`	character, default "two-sided". Specifies the alternative hypothesis to be used. Options: "less": probability that a bootstrapped value is as small or smaller than observed. "greater": probability that a bootstrapped value is as large or larger than observed. "two-sided": probability that a bootstrapped value is as or more extreme than observed.

Details

Bootstraps are conducted as described by Hohenlohe et al. (2010). For each bootstrap, this function draws a random window position, then draws random statistics from all provided SNPs to fill each observed position in that window, and calculates a smoothed statistic for that window using Gaussian smoothing. Note that a "position" column must be present in the SNP metadata of the snpRdata object to do any window calculations.
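The procedure above can be sketched in base R. This is a minimal illustration, not the snpR implementation: the toy data, the helper names `smooth_window` and `one_boot`, the choice to center windows on a randomly drawn SNP, the half-width of 3*sigma, and drawing statistics with replacement are all assumptions made for the example.

```r
# Toy data: SNP positions (bp) and a per-SNP statistic (hypothetical values).
set.seed(1)
pos  <- sort(sample(1:1e6, 200))
stat <- runif(200)

# Gaussian-smoothed mean of `values` observed at `positions` around `center`.
smooth_window <- function(center, positions, values, sigma_bp) {
  w <- exp(-(positions - center)^2 / (2 * sigma_bp^2))
  sum(w * values) / sum(w)
}

# One bootstrap: pick a random window center, keep the SNP positions observed
# in that window, and fill them with statistics drawn at random from all SNPs.
one_boot <- function(pos, stat, sigma_bp, half_width = 3 * sigma_bp) {
  center <- sample(pos, 1)  # assumption: center on a randomly chosen SNP
  in_win <- abs(pos - center) <= half_width
  filled <- sample(stat, sum(in_win), replace = TRUE)  # assumption: with replacement
  smooth_window(center, pos[in_win], filled, sigma_bp)
}

# A small null distribution of smoothed values.
boots <- replicate(100, one_boot(pos, stat, sigma_bp = 50e3))
```

Because the observed positions within each window are kept and only the statistics are shuffled, the null distribution reflects the real SNP density along the genome.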

Bootstraps for multiple statistics can be calculated at once. If the statistics argument is set to "all", all calculated statistics will be bootstrapped. If it is set to "single", all non-pairwise statistics will be bootstrapped; if it is set to "pairwise", all pairwise statistics will be bootstrapped. Individual statistics can also be requested by name ("pi", "ho", etc.). All statistics bootstrapped at the same time are calculated using the same randomly filled windows, and thus do not represent independent observations between statistics. Bootstrapping them together is, however, computationally more efficient.

The data can be broken up categorically by snp or sample metadata, as described in Facets_in_snpR.

As described in Hohenlohe et al. (2010), the contribution of individual SNPs to window averages can be weighted by the number of observations per SNP by setting the nk argument to TRUE, as is the default. For bootstraps, nk values are randomly drawn for each SNP in each window.
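As a rough illustration of this weighting, the sketch below multiplies each SNP's Gaussian kernel weight by its number of observations. The function name `weighted_window` and the toy values are hypothetical, and using `nk` itself as the multiplier is an assumption; the exact form used by snpR is not stated here.

```r
# Gaussian-smoothed window average with optional per-SNP observation weights.
weighted_window <- function(center, positions, values, nk, sigma_bp) {
  kern <- exp(-(positions - center)^2 / (2 * sigma_bp^2))
  w <- kern * nk  # assumption: weight = kernel * number of observations
  sum(w * values) / sum(w)
}

positions <- c(1000, 2000, 3000)
values    <- c(0.1, 0.5, 0.9)

# Equal sample sizes: nk cancels and only the kernel matters.
even <- weighted_window(2000, positions, values,
                        nk = c(50, 50, 50), sigma_bp = 1000)

# A poorly sampled third SNP contributes less to the window average.
skew <- weighted_window(2000, positions, values,
                        nk = c(50, 50, 5), sigma_bp = 1000)
```

With equal `nk` the weights reduce to the plain Gaussian kernel, so the weighting only changes results where sampling depth differs between SNPs.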

Possible centers for windows can be either SNPs (if step is NULL) or points every step kilobases from the 0 position of each SNP-level facet category (chromosome, etc.).
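The two ways of choosing centers can be sketched as follows for a single chromosome. The helper `window_centers` is hypothetical, and stopping the grid at the last SNP position is an assumption rather than documented snpR behavior.

```r
# Candidate window centers (in kb) for one chromosome.
window_centers <- function(positions_kb, step = NULL) {
  if (is.null(step)) {
    # No step: every SNP position is a center.
    return(sort(unique(positions_kb)))
  }
  # Fixed step: a regular grid starting at position 0.
  seq(from = 0, to = max(positions_kb), by = step)
}

snp_kb <- c(12, 48, 305, 610)

centers_snp  <- window_centers(snp_kb)              # centered on SNPs
centers_grid <- window_centers(snp_kb, step = 150)  # every 150 kb from 0
```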

If do.p is TRUE, p-values are calculated for smoothed values of a statistic from the bootstrapped null distribution of that statistic, using an empirical cumulative distribution function. Note that in this case the minimum possible p-value for a window depends on the number of bootstraps calculated: if only 1000 bootstraps were performed, the minimum possible p-value is about 0.001, or one in a thousand.
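A minimal sketch of this kind of empirical p-value, using base R's `ecdf()`, is shown below. The helper `boot_p`, the simulated null values, doubling the smaller tail for the two-sided case, and flooring the result at one over the number of bootstraps are all assumptions for illustration and may differ from what snpR does internally.

```r
set.seed(42)
boots    <- rnorm(1000)  # hypothetical bootstrapped smoothed values (null)
observed <- 2.5          # hypothetical observed smoothed value

boot_p <- function(observed, boots, alt = c("two-sided", "less", "greater")) {
  alt <- match.arg(alt)
  Fn <- ecdf(boots)
  p_less    <- Fn(observed)             # share of boots <= observed
  p_greater <- mean(boots >= observed)  # share of boots >= observed
  p <- switch(alt,
              less        = p_less,
              greater     = p_greater,
              "two-sided" = min(1, 2 * min(p_less, p_greater)))
  # Assumption: p cannot be resolved below one bootstrap's worth of mass.
  max(p, 1 / length(boots))
}

p_obs <- boot_p(observed, boots, "greater")
```

With 1000 bootstraps, any observed value beyond every bootstrapped value gets a p-value of 1/1000 in this sketch, which is why more bootstraps are needed to resolve smaller p-values.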

Note that this function will return an error if equivalent windowed statistics have not first been calculated for the designated facets. For example, if the "chromosome" facet is requested with a sigma of 200 and a step of 50, do_bootstraps will error unless calc_smoothed_averages has already been run with the same facet, sigma, and step values.

Value

A snpRdata object with bootstrapped windows merged into the window.bootstraps slot. If do.p is TRUE, p-values for the bootstrapped statistics will also be merged into the stats or window.stats slots.

Author(s)

William Hemstrom

References

Hohenlohe et al. (2010). PLOS Genetics

Examples

# add statistics
dat <- calc_basic_snp_stats(stickSNPs, "chr", sigma = 200, step = 150)

# do bootstraps
dat <- do_bootstraps(dat, facets = "chr", boots = 100, 
                     sigma = 200, step = 150)

# fetch results, bootstraps and then p-values on original stats
get.snpR.stats(dat, "chr", "bootstraps")
get.snpR.stats(dat, "chr", "single.window")

hemstrow/snpR documentation built on July 5, 2025, 4:38 a.m.