get_ld_variants_by_window: Get linkage disequilibrium data for variants

View source: R/linkage_disequilibrium.R

get_ld_variants_by_window R Documentation

Get linkage disequilibrium data for variants

Description

Gets linkage disequilibrium data for variants from the Ensembl REST API. There are four ways to query:

Genomic window centred on variants:

get_ld_variants_by_window(variant_id, genomic_window_size, ...)

Pairs of variants:

get_ld_variants_by_pair(variant_id1, variant_id2, ...)

Genomic range:

get_ld_variants_by_range(genomic_range, ...)

All pair combinations of variants:

get_ld_variants_by_pair_combn(variant_id, ...)

Usage

get_ld_variants_by_window(
  variant_id,
  genomic_window_size = 500L,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair(
  variant_id1,
  variant_id2,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_range(
  genomic_range,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair_combn(
  variant_id,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

variant_id

Variant identifiers, e.g., 'rs123'. This argument is to be used with either get_ld_variants_by_window() or get_ld_variants_by_pair_combn(). With get_ld_variants_by_pair_combn(), all pairwise combinations of the elements of variant_id define the pairs of variants to query. Note that this argument is not the same as variant_id1 or variant_id2, which are used with get_ld_variants_by_pair().

genomic_window_size

An integer vector specifying the genomic window size in kilobases (kb) around the variant indicated in variant_id. This argument is to be used with function get_ld_variants_by_window(). Currently, the Ensembl REST API does not allow values greater than 500 kb. A window size of 500 means looking 250 kb upstream and 250 kb downstream of the variant passed as variant_id. The minimum value for this argument is 1L, not 0L.

species_name

The species name, i.e., the scientific name in all lowercase letters with spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).

population

Population for which to compute linkage disequilibrium. See get_populations() for how to find the populations available for a species.

d_prime

D' is a measure of linkage disequilibrium. d_prime defines a cut-off threshold: only variants whose D' ≥ d_prime are returned.

r_squared

r^2 is a measure of linkage disequilibrium. r_squared defines a cut-off threshold: only variants whose r^2 ≥ r_squared are returned. The lower bound for r_squared is 0.05, not 0; the upper bound is 1.

verbose

Whether to be verbose about the HTTP requests and the status of their responses.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

variant_id1

The first variant of a pair of variants. Used with variant_id2. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().

variant_id2

The second variant of a pair of variants. Used with variant_id1. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().

genomic_range

Genomic range formatted as a string "chr:start..end", e.g., "X:1..10000". See genomic_range() for an easy way to create these ranges from vectors of start and end positions. This argument is to be used with function get_ld_variants_by_range().
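
The argument conventions described above can be sketched in base R without calling the Ensembl REST API. The helper names below (to_species_name, window_flank_bp, make_range_string) are hypothetical and are not part of ensemblr; the package's own genomic_range() may behave differently. Using combn() to enumerate pairs is an assumption about what "all pairwise combinations" means, not a statement of how get_ld_variants_by_pair_combn() is implemented.

```r
# Hypothetical helpers (not part of ensemblr) illustrating the argument formats.

# Scientific name -> Ensembl species name: lowercase, spaces become underscores.
to_species_name <- function(scientific_name) {
  gsub(" ", "_", tolower(scientific_name), fixed = TRUE)
}

# Flank on each side of the variant, in base pairs, for a window given in kb.
# Assumes the window is split evenly, as described for genomic_window_size.
window_flank_bp <- function(genomic_window_size) {
  stopifnot(all(genomic_window_size >= 1L), all(genomic_window_size <= 500L))
  genomic_window_size * 1000 / 2
}

# Build "chr:start..end" strings from vectors of chromosomes and positions.
make_range_string <- function(chr, start, end) {
  stopifnot(all(start <= end))
  sprintf("%s:%d..%d", chr, as.integer(start), as.integer(end))
}

# All unordered pairs of three variant identifiers, one pair per row.
variant_id <- c("rs6978506", "rs12718102", "rs13307200")
pairs <- t(combn(variant_id, 2))

to_species_name("Homo sapiens")    # "homo_sapiens"
window_flank_bp(c(1L, 500L))       # 500 and 250000 bp on each side
make_range_string("X", 1, 10000)   # "X:1..10000"
nrow(pairs)                        # 3 pairs from 3 identifiers
```

Three identifiers yield three pairs; in general n identifiers yield n(n-1)/2 pairs, so the number of pairs grows quadratically with the length of variant_id.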

Value

A tibble of 6 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.

population

Population for which linkage disequilibrium was computed.

variant_id1

First variant identifier.

variant_id2

Second variant identifier.

d_prime

D' between the two variants.

r_squared

r^2 between the two variants.
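
Because the result has one row per pair of variants, ordinary data frame operations apply to it. The following is a minimal sketch on a hand-built table with the same six columns; the variant identifiers and LD values are invented placeholders, not Ensembl results, and a plain data.frame stands in for the tibble.

```r
# Placeholder table mimicking the six documented columns.
# Identifiers and LD values are invented for illustration only.
ld <- data.frame(
  species_name = "homo_sapiens",
  population   = "1000GENOMES:phase_3:CEU",
  variant_id1  = c("rs_example_a", "rs_example_a", "rs_example_b"),
  variant_id2  = c("rs_example_b", "rs_example_c", "rs_example_c"),
  d_prime      = c(1.00, 0.60, 0.95),
  r_squared    = c(0.90, 0.10, 0.45),
  stringsAsFactors = FALSE
)

# Keep pairs passing a stricter cut-off than the query defaults,
# then order the remaining pairs by decreasing r^2.
strong <- ld[ld$r_squared >= 0.4 & ld$d_prime >= 0.9, ]
strong <- strong[order(strong$r_squared, decreasing = TRUE), ]
strong[, c("variant_id1", "variant_id2", "d_prime", "r_squared")]
```

Filtering after retrieval is useful for comparing several thresholds on one result without repeating the query.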

Examples

# Retrieve variants in LD within a 1 kb window centred on the variant:
# 1 kb means 500 bp upstream and 500 bp downstream of the variant.
get_ld_variants_by_window('rs123', genomic_window_size = 1L)

# Retrieve LD measures for pairs of variants:
get_ld_variants_by_pair(
  variant_id1 = c('rs123', 'rs35439278'),
  variant_id2 = c('rs122', 'rs35174522')
)

# Retrieve variants in LD within a genomic range
get_ld_variants_by_range('7:100000..100500')

# Retrieve all pair combinations of variants in LD
get_ld_variants_by_pair_combn(c('rs6978506', 'rs12718102', 'rs13307200'))


ramiromagno/ensemblr documentation built on Oct. 19, 2023, 11:12 a.m.