calc_pairwise_ld: Pairwise LD from SNP data.

calc_pairwise_ld    R Documentation

Pairwise LD from SNP data.

Description

calc_pairwise_ld calculates LD between each pair of SNPs.

Usage

calc_pairwise_ld(
  x,
  facets = NULL,
  subfacets = NULL,
  ss = FALSE,
  par = FALSE,
  CLD = "only",
  use.ME = FALSE,
  sigma = 1e-04,
  window_sigma = NULL,
  window_step = window_sigma * 2,
  window_gaussian = TRUE,
  window_triple_sigma = TRUE,
  verbose = FALSE,
  .prox_only = FALSE
)

Arguments

x

snpRdata. Input SNP data. Note that a SNP metadata column named 'position', containing SNP positions in base pairs, is required.

facets

character. Categorical metadata variables by which to break up analysis. See Facets_in_snpR for more details.

subfacets

character, default NULL. Subsets the facet levels to run. Given as a named list: list(fam = "A") will run only fam A; list(fam = c("A", "B"), chr = 1) will run only fams A and B on chromosome 1; list(fam = "A", pop = "ASP") will run samples in either fam A or pop ASP; list(fam.pop = "A.ASP") will run only samples in both fam A and pop ASP.

ss

numeric or FALSE, default FALSE. Number of SNPs to subsample.

par

numeric or FALSE, default FALSE. If numeric, the number of cores to use for parallel processing.

CLD

TRUE, FALSE, or "only", default "only". Specifies if the CLD method should be used in addition to (TRUE) or instead of ("only") the haplotype-based methods. See details.

use.ME

logical, default FALSE. Specifies if the Minimization-Expectation haplotype estimation should be used. See details.

sigma

numeric, default 0.0001. If the ME method is used, specifies the convergence threshold: haplotype frequencies are accepted once the difference between successive steps falls below this value.

window_sigma

numeric, default NULL. Size of windows, in kb, within which to calculate LD values, if requested. Windows will be two times window_sigma in size unless window_triple_sigma is TRUE, in which case they will be six times window_sigma.

window_step

numeric or NULL, default two times window_sigma (non-overlapping windows if window_triple_sigma is FALSE). Size of the steps between windows, in kb.

window_gaussian

logical, default TRUE. If TRUE, windows will be gaussian-smoothed. Otherwise, raw averages will be returned. See calc_smoothed_averages for details.

window_triple_sigma

logical, default TRUE. If TRUE, window_sigma values will be tripled prior to averaging.

verbose

Logical, default FALSE. If TRUE, some progress updates will be reported.

.prox_only

Logical, default FALSE. Primarily for internal use. If TRUE, returns only a proximity table of LD values, not a snpRdata object.

Details

Calculates linkage disequilibrium between each pair of SNPs using one of several methods. By default, uses Burrows' Composite Linkage Disequilibrium method.

If CLD is not "only", haplotypes are estimated either via direct count after removing all "0101" double heterozygote haplotypes (if use.ME is FALSE) or via the Minimization-Expectation method (an expectation-maximization approach) described in Excoffier, L., and Slatkin, M. (1995) (if use.ME is TRUE). While the latter method is likely more accurate, it can be very slow and often produces qualitatively equivalent results, so it is not preferred for casual or preliminary analysis. Either method will calculate D', r-squared, and the p-value for that r-squared.
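The haplotype-based route can be illustrated with a small base R sketch for a single pair of biallelic SNPs. This is not snpR's internal code: the function names (em_haplotypes, ld_stats), the genotype count table, the starting frequencies, and the chi-square test used for the p-value are all assumptions made for illustration. Only double heterozygotes have ambiguous phase, so the iteration splits them between the two possible haplotype pairs in proportion to the current frequency estimates and stops once the largest change falls below sigma.

```r
# Made-up genotype counts for two biallelic SNPs: rows give copies of allele A
# at SNP 1 (0, 1, 2), columns give copies of allele B at SNP 2 (0, 1, 2).
geno_counts <- matrix(c(10,  6,  1,
                         5, 20,  6,
                         1,  7, 12),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("0", "1", "2"), c("0", "1", "2")))

# Hypothetical helper: iterative haplotype frequency estimation for one SNP pair.
em_haplotypes <- function(g, sigma = 1e-4, max_iter = 1000) {
  # Haplotypes contributed by genotypes whose phase is unambiguous
  known <- c(
    AB = 2 * g["2", "2"] + g["2", "1"] + g["1", "2"],
    Ab = 2 * g["2", "0"] + g["2", "1"] + g["1", "0"],
    aB = 2 * g["0", "2"] + g["1", "2"] + g["0", "1"],
    ab = 2 * g["0", "0"] + g["1", "0"] + g["0", "1"]
  )
  n_dh  <- g["1", "1"]   # double heterozygotes: AB/ab or Ab/aB
  n_hap <- 2 * sum(g)    # two haplotypes per individual
  freq  <- c(AB = 0.25, Ab = 0.25, aB = 0.25, ab = 0.25)  # assumed start
  for (i in seq_len(max_iter)) {
    # Expected share of double heterozygotes carrying AB/ab
    cis   <- freq[["AB"]] * freq[["ab"]]
    trans <- freq[["Ab"]] * freq[["aB"]]
    w <- if (cis + trans > 0) cis / (cis + trans) else 0.5
    # Updated frequencies from known plus expected haplotype counts
    new_freq  <- (known + n_dh * c(w, 1 - w, 1 - w, w)) / n_hap
    converged <- max(abs(new_freq - freq)) < sigma
    freq <- new_freq
    if (converged) break
  }
  freq
}

# Hypothetical helper: D, signed D', r-squared, and a p-value from haplotype
# frequencies. Assumption: the p-value is a 1 df chi-square test on 2N * r^2;
# the test used by snpR may differ.
ld_stats <- function(freq, n_ind) {
  pA <- freq[["AB"]] + freq[["Ab"]]
  pB <- freq[["AB"]] + freq[["aB"]]
  D  <- freq[["AB"]] - pA * pB
  D_max <- if (D >= 0) {
    min(pA * (1 - pB), (1 - pA) * pB)
  } else {
    min(pA * pB, (1 - pA) * (1 - pB))
  }
  r2 <- D^2 / (pA * (1 - pA) * pB * (1 - pB))
  p  <- pchisq(2 * n_ind * r2, df = 1, lower.tail = FALSE)
  c(D = D, Dprime = D / D_max, r2 = r2, p = p)
}

freq  <- em_haplotypes(geno_counts)
stats <- ld_stats(freq, n_ind = sum(geno_counts))
```

Repeating this for every pair of SNPs in every facet level is what makes the haplotype-based methods slow relative to CLD.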

Since this process involves many pairwise comparisons, it can be very slow. As an alternative, average LD values can be calculated within sliding windows using the window_ family of arguments. This will be substantially faster, but LD values for individual SNP pairs will not be returned. See calc_smoothed_averages for details.
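The window arguments can be pictured with the following base R sketch. It is only an illustration of the arithmetic described in the argument list, not the calc_smoothed_averages implementation: the function smooth_ld_windows, the assignment of one position to each LD value, the placement of window centers, and keeping the gaussian standard deviation at window_sigma when windows are tripled are all assumptions.

```r
# Hypothetical helper: average LD values in sliding windows.
# pos_bp: one position (bp) per LD value; ld: the LD values themselves.
# window_sigma and window_step are given in kb, as in calc_pairwise_ld.
smooth_ld_windows <- function(pos_bp, ld, window_sigma,
                              window_step = window_sigma * 2,
                              gaussian = TRUE, triple_sigma = TRUE) {
  sigma_bp <- window_sigma * 1000
  step_bp  <- window_step * 1000
  # Windows span 2 * sigma in total, or 6 * sigma if sigma is tripled
  half_width <- if (triple_sigma) 3 * sigma_bp else sigma_bp
  centers <- seq(min(pos_bp), max(pos_bp), by = step_bp)
  avg <- vapply(centers, function(ctr) {
    in_win <- abs(pos_bp - ctr) <= half_width
    if (!any(in_win)) return(NA_real_)
    if (gaussian) {
      # Weight each value by a gaussian kernel centered on the window
      w <- dnorm(pos_bp[in_win], mean = ctr, sd = sigma_bp)
      sum(w * ld[in_win]) / sum(w)
    } else {
      mean(ld[in_win])   # raw average
    }
  }, numeric(1))
  data.frame(center_bp = centers, mean_ld = avg)
}

# Made-up positions and LD values along a 1 Mb stretch
set.seed(1)
pos_bp <- sort(runif(200, 0, 1e6))
ld     <- runif(200)
win    <- smooth_ld_windows(pos_bp, ld, window_sigma = 50)
```

With window_sigma = 50 and the default step of twice that, window centers fall every 100 kb, and each window reaches 150 kb to either side because window_triple_sigma defaults to TRUE.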

In contrast, Burrows' Composite Linkage Disequilibrium (CLD) can be calculated very quickly via the cor function from base R. calc_pairwise_ld will perform this method alongside the other methods if CLD = TRUE and by itself if CLD = "only". For most analyses, this will be sufficient and much faster than the other methods. This is the default behavior.
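To see why the CLD route is fast, consider this minimal base R sketch. It assumes genotypes are coded as counts of one allele per individual (0, 1, or 2), which is an illustrative choice; the data are simulated, and snpR's internal formatting and missing-data handling may differ.

```r
# Hypothetical genotype matrix: rows are individuals, columns are SNPs,
# entries are allele counts (0, 1, or 2); NA marks a missing call.
set.seed(42)
n_ind <- 50
snp1 <- rbinom(n_ind, 2, 0.3)
# snp2 copies snp1 for most individuals, so the two loci are correlated
snp2 <- ifelse(runif(n_ind) < 0.7, snp1, rbinom(n_ind, 2, 0.3))
snp3 <- rbinom(n_ind, 2, 0.5)
geno <- cbind(snp1 = snp1, snp2 = snp2, snp3 = snp3)
geno[3, "snp3"] <- NA

# One call gives every pairwise correlation at once; each pair uses only
# the individuals genotyped at both loci.
cld_r  <- cor(geno, use = "pairwise.complete.obs")
cld_r2 <- cld_r^2
```

Because cor works on the whole matrix in a single vectorized call and needs no haplotype phasing, the cost per SNP pair is far lower than estimating haplotype frequencies pair by pair.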

The data can be broken up categorically by SNP metadata, sample metadata, or both, as described in Facets_in_snpR.

Heatmaps of the resulting data can be easily plotted using plot_pairwise_ld_heatmap.

Value

A snpRdata object with linkage results stored in the pairwise.LD slot. Specifically, this slot will contain a list holding any LD matrices in a nested list, broken down by facet and then by facet level, and a data.frame containing all pairwise comparisons, their metadata, and calculated statistics in long format for easy plotting.

Author(s)

William Hemstrom

Keming Su

References

Zaykin, D. (2004). Genetic Epidemiology

Excoffier, L., and Slatkin, M. (1995). Molecular Biology and Evolution

Lewontin (1964). Genetics

Examples

## Not run: 
# not run, slow
## CLD
x <- calc_pairwise_ld(stickSNPs, facets = "chr.pop")
get.snpR.stats(x, "chr.pop", "LD")

## standard haplotype frequency estimation
x <- calc_pairwise_ld(stickSNPs, facets = "chr.pop", CLD = FALSE)
get.snpR.stats(x, "chr.pop", "LD")

## End(Not run)

# subset for specific subfacets (ASP and OPL, chromosome IX)
x <- calc_pairwise_ld(stickSNPs, facets = "chr.pop",
                      subfacets = list(pop = c("ASP", "OPL"), 
                                       chr = "groupIX"))
get.snpR.stats(x, "chr.pop", "LD")

## Not run: 
## not run, really slow
# ME haplotype estimation
x <- calc_pairwise_ld(stickSNPs, facets = "chr.pop", 
                      CLD = FALSE, use.ME = TRUE,
                      subfacets = list(pop = c("ASP", "OPL"), 
                                       chr = "groupIX"))
get.snpR.stats(x, "chr.pop", "LD")

## End(Not run)

hemstrow/snpR documentation built on July 15, 2024, 7:14 p.m.