get_LD_1KG: Compute LD from 1000 Genomes
In RajLabMSSM/echoLD: echoverse module: LD downloading and processing

Description

Downloads the subset of the 1000 Genomes (1KG) VCF that matches your locus coordinates, then uses ld (from snpStats) to compute LD on the fly.

Usage

get_LD_1KG(
  locus_dir,
  query_dat,
  query_genome = "hg19",
  LD_reference = "1KGphase1",
  superpopulation = NULL,
  samples = character(0),
  local_storage = NULL,
  leadSNP_LD_block = FALSE,
  force_new = FALSE,
  force_new_maf = FALSE,
  fillNA = 0,
  stats = "R",
  as_sparse = TRUE,
  subset_common = TRUE,
  remove_tmps = TRUE,
  conda_env = "echoR_mini",
  nThread = 1,
  verbose = TRUE
)

Arguments

`locus_dir`	Storage directory to use.
`query_dat`	SNP-level summary statistics subset to query the LD panel with.
`query_genome`	Genome build of the `query_dat`.
`LD_reference`	LD reference to use:
  - "1KGphase1": 1000 Genomes Project Phase 1 (genome build: hg19).
  - "1KGphase3": 1000 Genomes Project Phase 3 (genome build: hg19).
  - "UKB": Pre-computed LD from a British, European-descent subset of UK Biobank (genome build: hg19).
  - "<vcf_path>": User-supplied path to a custom VCF file to compute the LD matrix from. Accepted formats: .vcf / .vcf.gz / .vcf.bgz. Genome build: defined by the user with `target_genome`.
  - "<matrix_path>": User-supplied path to a pre-computed LD matrix. Accepted formats: .rds / .rda / .csv / .tsv / .txt. Genome build: defined by the user with `target_genome`.
`superpopulation`	Superpopulation to subset LD panel by (used only if `LD_reference` is "1KGphase1" or "1KGphase3"). See popDat_1KGphase1 and popDat_1KGphase3 for full tables of their respective samples.
`samples`	[Optional] Sample names to subset the VCF by. If this option is used, the GRanges object will be converted to a ScanVcfParam for usage by readVcf.
`local_storage`	Storage folder for previously downloaded LD files. If `LD_reference` is "1KGphase1" or "1KGphase3", `local_storage` is where VCF files are stored. If `LD_reference` is "UKB", `local_storage` is where the compressed NumPy array (npz) LD files are stored. Set to `NULL` to download the VCF/npz files from the remote storage system.
`leadSNP_LD_block`	Only return SNPs within the same LD block as the lead SNP (the SNP with the smallest p-value).
`fillNA`	Value used to replace pairwise LD (r) between two SNPs when it is `NA` (default: 0).
`as_sparse`	Save/return LD matrix as a sparse matrix.
`subset_common`	Subset `LD_matrix` and `dat` to only the SNPs that are common to them both.
`remove_tmps`	Remove all intermediate files like vcf, npz, and plink files.
`conda_env`	Conda environment to search in (default: "echoR_mini"). If `NULL`, all conda environments are searched.
`nThread`	Number of threads to parallelize over.
`verbose`	Print messages.
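
The effect of `fillNA`, `subset_common`, and the lead-SNP rule used by `leadSNP_LD_block` can be illustrated on toy data. The following is a minimal base-R sketch, not the package's implementation: it uses `stats::cor()` on a made-up genotype matrix as a stand-in for the real LD computation, and the `SNP` and `P` column names in `query_dat` are assumptions made for illustration.

```r
# Toy genotype matrix: 6 samples x 4 SNPs, coded as alt-allele counts (0/1/2).
# matrix() fills column-wise, so each row of values below is one SNP.
geno <- matrix(
  c(0, 1, 2, 1, 0, 2,   # rs1
    0, 1, 2, 1, 0, 2,   # rs2: identical to rs1
    2, 1, 0, 1, 2, 0,   # rs3: mirror image of rs1
    1, 1, 1, 1, 1, 1),  # rs4: monomorphic, so its correlation is undefined
  ncol = 4,
  dimnames = list(paste0("sample", 1:6), paste0("rs", 1:4))
)

# Stand-in for the LD step: pairwise Pearson correlation (r) between SNPs.
# The zero-variance SNP triggers a warning and yields NA entries.
LD_raw <- suppressWarnings(stats::cor(geno))

# fillNA: replace undefined pairwise LD with a fixed value.
fillNA <- 0
LD_filled <- LD_raw
LD_filled[is.na(LD_filled)] <- fillNA

# Hypothetical summary statistics; rs9 is absent from the LD panel.
query_dat <- data.frame(
  SNP = c("rs1", "rs3", "rs4", "rs9"),
  P   = c(0.01, 1e-6, 0.5, 0.2)
)

# subset_common: keep only SNPs present in both the LD matrix and the data.
common <- intersect(query_dat$SNP, colnames(LD_filled))
LD_sub <- LD_filled[common, common, drop = FALSE]
dat_sub <- query_dat[query_dat$SNP %in% common, ]

# Lead SNP: the SNP with the smallest p-value.
lead_snp <- dat_sub$SNP[which.min(dat_sub$P)]
```

After this runs, `LD_sub` and `dat_sub` cover the same three SNPs, and `lead_snp` names the SNP that an LD-block filter would be centred on.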

Details

This approach is taken because other API query tools limit the size of the window that can be queried. This approach has no such limitation, allowing you to fine-map loci more completely.
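
A call might be assembled as in the sketch below. Several things here are assumptions rather than documented behaviour: the `SNP`, `CHR`, `POS`, and `P` column names in `query_dat`, the placeholder SNP IDs and positions, and the `"EUR"` superpopulation code. The function is looked up with `utils::getFromNamespace()` so the sketch works whether or not `get_LD_1KG` is exported by echoLD. The query itself is switched off by default because it needs the package, a network connection, and real summary statistics.

```r
# Hypothetical query data: replace with a real subset of your summary statistics.
query_dat <- data.frame(
  SNP = c("rs_a", "rs_b", "rs_c"),         # placeholder IDs, not real rsIDs
  CHR = 4,
  POS = c(15700000, 15710000, 15720000),   # placeholder hg19 positions
  P   = c(1e-8, 0.02, 0.3)
)

# Arguments taken from the Usage section; unlisted arguments keep their defaults.
ld_args <- list(
  locus_dir       = file.path(tempdir(), "example_locus"),
  query_dat       = query_dat,
  query_genome    = "hg19",
  LD_reference    = "1KGphase3",
  superpopulation = "EUR",                 # assumed superpopulation code
  fillNA          = 0,
  nThread         = 1
)

run_query <- FALSE  # set to TRUE to download the VCF subset and compute LD
if (run_query && requireNamespace("echoLD", quietly = TRUE)) {
  get_LD_1KG <- utils::getFromNamespace("get_LD_1KG", "echoLD")
  LD_res <- do.call(get_LD_1KG, ld_args)
  str(LD_res, max.level = 1)               # inspect what was returned
}
```

Building the argument list first keeps the call reproducible and makes it easy to switch `LD_reference` between "1KGphase1" and "1KGphase3" without retyping the rest.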

Other LD: check_population_1kg(), compute_LD(), filter_LD(), get_LD(), get_LD_1KG_download_vcf(), get_LD_UKB(), get_LD_matrix(), get_LD_vcf(), get_locus_vcf_folder(), ldlinkr_ldproxy_batch(), plot_LD(), popDat_1KGphase1, popDat_1KGphase3, rds_to_npz(), saveSparse(), save_LD_matrix(), snpstats_get_MAF()