get_LD_1KG: Compute LD from 1000 Genomes

View source: R/get_LD_1KG.R

get_LD_1KG    R Documentation

Description

Downloads the subset of the 1000 Genomes (1KG) VCF that overlaps your locus coordinates, then uses ld to calculate LD on the fly.
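
To make the computation step concrete, here is a minimal sketch of LD measured as the correlation (r) between genotype dosages. It uses base R's cor() on a made-up genotype matrix as a stand-in. It is not the ld routine the package calls, and the SNP names and genotypes are invented for illustration.

```r
# Toy genotype matrix: rows = samples, columns = SNPs,
# values = alternate-allele dosage (0, 1 or 2). All values are made up.
geno <- cbind(
  snpA = c(0, 1, 2, 1, 0, 2),
  snpB = c(0, 1, 2, 1, 0, 2),  # identical to snpA
  snpC = c(2, 1, 0, 1, 2, 0)   # mirror image of snpA (2 - snpA)
)

# Pairwise Pearson correlation between SNP columns (signed r).
ld_r <- cor(geno)

# Squared correlation (r^2), which drops the sign.
ld_r2 <- ld_r^2
```

In this toy matrix snpA and snpB are perfectly correlated, while snpA and snpC are perfectly anti-correlated, so the signed r differs but r^2 does not.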

Usage

get_LD_1KG(
  locus_dir,
  query_dat,
  query_genome = "hg19",
  LD_reference = "1KGphase1",
  superpopulation = NULL,
  samples = character(0),
  local_storage = NULL,
  leadSNP_LD_block = FALSE,
  force_new = FALSE,
  force_new_maf = FALSE,
  fillNA = 0,
  stats = "R",
  as_sparse = TRUE,
  subset_common = TRUE,
  remove_tmps = TRUE,
  conda_env = "echoR_mini",
  nThread = 1,
  verbose = TRUE
)
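
A call might look roughly like the sketch below. The toy query_dat, its column names (SNP, CHR, POS, P), the placeholder SNP identifiers and positions, and the choice of "EUR" as superpopulation are all assumptions made for illustration; replace them with real summary statistics before running. The query itself needs the echoLD package and network access, so it is switched off by default here.

```r
# Hypothetical summary statistics: column names and values are
# illustrative assumptions, not real data.
query_dat <- data.frame(
  SNP = c("rs_example1", "rs_example2", "rs_example3"),
  CHR = c(4, 4, 4),
  POS = c(15700000, 15710000, 15720000),
  P   = c(1e-4, 5e-9, 0.02),
  stringsAsFactors = FALSE
)
locus_dir <- file.path(tempdir(), "example_locus")

# Opt-in switch: the query downloads data, so it only runs when enabled
# and when echoLD is installed.
run_query <- FALSE
if (run_query && requireNamespace("echoLD", quietly = TRUE)) {
  # getFromNamespace() works whether or not the function is exported.
  get_LD_1KG <- utils::getFromNamespace("get_LD_1KG", "echoLD")
  ld_out <- get_LD_1KG(
    locus_dir = locus_dir,
    query_dat = query_dat,
    query_genome = "hg19",
    LD_reference = "1KGphase3",
    superpopulation = "EUR",
    nThread = 1
  )
}
```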

Arguments

locus_dir

Storage directory to use.

query_dat

SNP-level summary statistics subset to query the LD panel with.

query_genome

Genome build of the query_dat.

LD_reference

LD reference to use:

  • "1KGphase1" : 1000 Genomes Project Phase 1 (genome build: hg19).

  • "1KGphase3" : 1000 Genomes Project Phase 3 (genome build: hg19).

  • "UKB" : Pre-computed LD from a British, European-descent subset of UK Biobank. Genome build : hg19.

  • "<vcf_path>" : User-supplied path to a custom VCF file to compute LD matrix from.
    Accepted formats: .vcf / .vcf.gz / .vcf.bgz
    Genome build : defined by user with target_genome.

  • "<matrix_path>" : User-supplied path to a pre-computed LD matrix.
    Accepted formats: .rds / .rda / .csv / .tsv / .txt
    Genome build : defined by user with target_genome.

superpopulation

Superpopulation to subset LD panel by (used only if LD_reference is "1KGphase1" or "1KGphase3"). See popDat_1KGphase1 and popDat_1KGphase3 for full tables of their respective samples.

samples

[Optional] Sample names to subset the VCF by. If this option is used, the GRanges object will be converted to a ScanVcfParam for usage by readVcf.

local_storage

Storage folder for previously downloaded LD files. If LD_reference is "1KGphase1" or "1KGphase3", local_storage is where VCF files are stored. If LD_reference is "UKB", local_storage is where compressed NumPy array (npz) LD files are stored. Set to NULL to download the VCF or LD npz files from remote storage instead.

leadSNP_LD_block

Only return SNPs within the same LD block as the lead SNP (the SNP with the smallest p-value).

fillNA

Value used to replace pairwise LD (r) between two SNPs when it is NA (default: 0).
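
The replacement can be sketched in base R as follows. The matrix values are made up, and this illustrates the idea rather than the package's internal code. NA entries typically arise when a correlation is undefined, for instance when a SNP shows no variation in the selected samples.

```r
snps <- c("snpA", "snpB", "snpC")

# Toy LD matrix in which every pair involving snpB is undefined (NA).
ld <- matrix(
  c(1,   NA, 0.4,
    NA,  1,  NA,
    0.4, NA, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(snps, snps)
)

fillNA <- 0  # same default as in the Usage section

# Replace every NA entry with the fill value; other entries are untouched.
ld_filled <- ld
ld_filled[is.na(ld_filled)] <- fillNA
```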

as_sparse

Save/return LD matrix as a sparse matrix.
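
As a rough illustration of what a sparse representation buys you, the sketch below converts a mostly-zero toy LD matrix with the Matrix package. Whether the function uses exactly this conversion internally is an assumption; the values are made up.

```r
library(Matrix)

snps <- paste0("snp", 1:4)

# Toy LD matrix: only one off-diagonal pair is in LD.
ld <- diag(4)
ld[1, 2] <- ld[2, 1] <- 0.8
dimnames(ld) <- list(snps, snps)

# Sparse storage keeps only the non-zero entries.
ld_sparse <- Matrix(ld, sparse = TRUE)

# Converting back recovers the dense matrix.
ld_dense <- as.matrix(ld_sparse)
```

Sparse storage pays off when most SNP pairs have an LD of exactly 0, which becomes more common once NA values have been filled with 0.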

subset_common

Subset LD_matrix and dat to only the SNPs that are common to them both.
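
The intersection step can be sketched in base R like this. The object names ld and dat, the SNP column, and all values are illustrative assumptions.

```r
ld_snps <- c("snpA", "snpB", "snpC")

# Toy LD matrix covering snpA, snpB and snpC.
ld <- diag(3)
dimnames(ld) <- list(ld_snps, ld_snps)

# Toy summary statistics covering snpB, snpC and snpD.
dat <- data.frame(
  SNP = c("snpB", "snpC", "snpD"),
  P   = c(1e-6, 0.01, 0.5),
  stringsAsFactors = FALSE
)

# SNPs present in both objects.
common <- intersect(rownames(ld), dat$SNP)

# Keep only the shared SNPs in each object.
ld_sub  <- ld[common, common, drop = FALSE]
dat_sub <- dat[dat$SNP %in% common, , drop = FALSE]
```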

remove_tmps

Remove all intermediate files like vcf, npz, and plink files.

conda_env

Conda environment to search in (default: "echoR_mini"). If NULL, all conda environments are searched.

nThread

Number of threads to parallelize over.

verbose

Print messages.

Details

This approach is used because other API-based query tools limit the size of the window that can be queried. It has no such limitation, allowing you to fine-map loci more completely.

See Also

Other LD: check_population_1kg(), compute_LD(), filter_LD(), get_LD(), get_LD_1KG_download_vcf(), get_LD_UKB(), get_LD_matrix(), get_LD_vcf(), get_locus_vcf_folder(), ldlinkr_ldproxy_batch(), plot_LD(), popDat_1KGphase1, popDat_1KGphase3, rds_to_npz(), saveSparse(), save_LD_matrix(), snpstats_get_MAF()


RajLabMSSM/echoLD documentation built on May 12, 2024, 3:23 a.m.