get_LD: Procure an LD matrix for fine-mapping

View source: R/get_LD.R

get_LD R Documentation

Procure an LD matrix for fine-mapping

Description

Calculate and/or query linkage disequilibrium (LD) from reference panels (UK Biobank, 1000 Genomes), a user-supplied VCF file, or a user-supplied pre-computed LD matrix. If necessary, query_dat is automatically lifted over to the genome build of the target LD panel before the query is performed.

Usage

get_LD(
  query_dat,
  locus_dir = tempdir(),
  standardise_colnames = FALSE,
  force_new_LD = FALSE,
  LD_reference = c("1KGphase1", "1KGphase3", "UKB"),
  query_genome = "hg19",
  target_genome = "hg19",
  samples = character(0),
  superpopulation = NULL,
  local_storage = NULL,
  leadSNP_LD_block = FALSE,
  fillNA = 0,
  verbose = TRUE,
  remove_tmps = TRUE,
  as_sparse = TRUE,
  subset_common = TRUE,
  download_method = "axel",
  conda_env = "echoR_mini",
  nThread = 1
)

Arguments

query_dat

SNP-level summary statistics subset to query the LD panel with.

locus_dir

Storage directory to use.

standardise_colnames

Automatically rename all columns to a standard nomenclature using standardise_header.

force_new_LD

Force a new LD subset to be created, even if a previously saved one is available.

LD_reference

LD reference to use:

  • "1KGphase1" : 1000 Genomes Project Phase 1 (genome build: hg19).

  • "1KGphase3" : 1000 Genomes Project Phase 3 (genome build: hg19).

  • "UKB" : Pre-computed LD from a British European-descent subset of UK Biobank (genome build: hg19).

  • "<vcf_path>" : User-supplied path to a custom VCF file to compute LD matrix from.
    Accepted formats: .vcf / .vcf.gz / .vcf.bgz
    Genome build : defined by user with target_genome.

  • "<matrix_path>" : User-supplied path to a pre-computed LD matrix.
    Accepted formats: .rds / .rda / .csv / .tsv / .txt
    Genome build : defined by user with target_genome.
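
As a rough illustration of the "<matrix_path>" option, the sketch below writes a toy LD matrix to an .rds file. It assumes, without confirmation from this page, that the row and column names of the matrix should be the SNP identifiers used in query_dat. The rsIDs and the file name are hypothetical, and the get_LD call is left commented out because it needs echoLD and a matching query_dat.

```r
# Toy 3 x 3 LD matrix; the SNP IDs are hypothetical placeholders.
snps <- c("rs0001", "rs0002", "rs0003")
ld <- matrix(c(1.0, 0.8, 0.1,
               0.8, 1.0, 0.2,
               0.1, 0.2, 1.0),
             nrow = 3, dimnames = list(snps, snps))

# Save the matrix in one of the accepted formats (.rds).
matrix_path <- file.path(tempdir(), "custom_LD.rds")
saveRDS(ld, matrix_path)

# The path would then be supplied as LD_reference, with the genome build
# stated explicitly through target_genome:
# LD_list <- echoLD::get_LD(query_dat = query_dat,
#                           LD_reference = matrix_path,
#                           target_genome = "hg19")
```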

query_genome

Genome build of the query_dat.

target_genome

Genome build of the LD panel. This is automatically assigned to the correct genome build for each LD panel except when the user supplies custom vcf/LD files.

samples

[Optional] Sample names to subset the VCF by. If this option is used, the GRanges object will be converted to a ScanVcfParam for usage by readVcf.

superpopulation

Superpopulation to subset LD panel by (used only if LD_reference is "1KGphase1" or "1KGphase3"). See popDat_1KGphase1 and popDat_1KGphase3 for full tables of their respective samples.

local_storage

Storage folder for previously downloaded LD files. If LD_reference is "1KGphase1" or "1KGphase3", local_storage is where VCF files are stored. If LD_reference is "UKB", local_storage is where the compressed NumPy array (npz) LD files are stored. Set to NULL to download the VCF or npz files from remote storage instead.

leadSNP_LD_block

Only return SNPs within the same LD block as the lead SNP (the SNP with the smallest p-value).
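
The definition of the lead SNP given here (smallest p-value) can be sketched in base R as follows. The column names SNP and P and the toy values are assumptions for illustration only. How echoLD delimits the LD block around that SNP is not shown.

```r
# Toy summary statistics; column names and values are illustrative.
dat <- data.frame(
  SNP = c("rs0001", "rs0002", "rs0003"),
  P   = c(1e-4, 3e-9, 0.02),
  stringsAsFactors = FALSE
)

# The lead SNP is the row with the smallest p-value.
lead_snp <- dat$SNP[which.min(dat$P)]
print(lead_snp)
```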

fillNA

Value to fill LD matrix NAs with.
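
A minimal base R sketch of what filling NAs in an LD matrix amounts to, using a toy matrix. This only illustrates the idea and is not necessarily how echoLD implements it.

```r
# Toy LD matrix with missing entries for one SNP pair.
ld <- matrix(c(1,   NA,  0.3,
               NA,  1,   0.5,
               0.3, 0.5, 1),
             nrow = 3)

# Replace every NA with the chosen fill value (0 is the documented default).
fillNA <- 0
ld[is.na(ld)] <- fillNA
```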

verbose

Print messages.

remove_tmps

Remove all intermediate files like vcf, npz, and plink files.

as_sparse

Convert the LD matrix to a sparse matrix.
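
For intuition, a dense matrix with many zero entries can be converted to a sparse representation with the Matrix package, as sketched below. Using Matrix::Matrix here is an assumption made for illustration; the exact class that get_LD returns is not stated on this page.

```r
library(Matrix)

# Toy LD matrix: mostly zeros apart from the diagonal and one SNP pair.
ld <- diag(4)
ld[1, 2] <- ld[2, 1] <- 0.9

# Convert to a sparse representation; only non-zero entries are stored.
ld_sparse <- Matrix::Matrix(ld, sparse = TRUE)
print(class(ld_sparse))
```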

subset_common

Subset the LD matrix and query_dat to only the SNPs that are common to both.
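
The kind of subsetting described here might look like the following base R sketch. It assumes that the LD matrix carries SNP IDs as row and column names and that the summary statistics have a SNP column; all identifiers are hypothetical.

```r
# Toy LD matrix with SNP IDs as dimnames.
snps_ld <- c("rs0001", "rs0002", "rs0003")
ld <- diag(3)
dimnames(ld) <- list(snps_ld, snps_ld)

# Toy summary statistics that only partly overlap with the LD matrix.
dat <- data.frame(
  SNP = c("rs0002", "rs0003", "rs0004"),
  P   = c(0.01, 0.2, 0.5),
  stringsAsFactors = FALSE
)

# Keep only the SNPs present in both objects.
common  <- intersect(dat$SNP, rownames(ld))
ld_sub  <- ld[common, common, drop = FALSE]
dat_sub <- dat[dat$SNP %in% common, , drop = FALSE]
```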

download_method

Download method to use:

  • "axel" : Multi-threaded

  • "wget" : Single-threaded

  • "download.file" : Single-threaded

  • "internal" : Single-threaded (passed to download.file)

  • "wininet" : Single-threaded (passed to download.file)

  • "libcurl" : Single-threaded (passed to download.file)

  • "curl" : Single-threaded (passed to download.file)

conda_env

Conda environment to search in (default: "echoR_mini"). If NULL, all conda environments are searched.

nThread

Number of threads to parallelize over.

Value

A named list containing:

  • "LD": Symmetric LD matrix of pairwise SNP correlations.

  • "DT": Standardised query data filtered to only the SNPs included in both query_dat and the LD matrix.

  • "path": The path to where the LD matrix was saved.
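
To make "symmetric LD matrix of pairwise SNP correlations" concrete, the sketch below computes Pearson correlations between the columns of a simulated genotype dosage matrix. The data and SNP IDs are made up, and get_LD may compute LD differently internally.

```r
set.seed(1)

# Simulated genotype dosages (0, 1 or 2) for 20 samples at 3 hypothetical SNPs.
geno <- matrix(sample(0:2, 60, replace = TRUE), nrow = 20,
               dimnames = list(NULL, c("rs0001", "rs0002", "rs0003")))

# Pairwise correlations between SNPs give a symmetric SNP x SNP matrix.
ld <- cor(geno)
print(round(ld, 2))
```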

See Also

Other LD: check_population_1kg(), compute_LD(), filter_LD(), get_LD_1KG(), get_LD_1KG_download_vcf(), get_LD_UKB(), get_LD_matrix(), get_LD_vcf(), get_locus_vcf_folder(), ldlinkr_ldproxy_batch(), plot_LD(), popDat_1KGphase1, popDat_1KGphase3, rds_to_npz(), saveSparse(), save_LD_matrix(), snpstats_get_MAF()

Examples

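# Subset the bundled BST1 locus summary statistics to the first 50 SNPs
query_dat <- echodata::BST1[seq(1, 50), ]
# Store results in a locus-specific subfolder of the temporary directory
locus_dir <- file.path(tempdir(), echodata::locus_dir)
# Retrieve LD from the 1000 Genomes Phase 1 reference panel
LD_list <- echoLD::get_LD(
    locus_dir = locus_dir,
    query_dat = query_dat,
    LD_reference = "1KGphase1")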

RajLabMSSM/echoLD documentation built on May 12, 2024, 3:23 a.m.