get_LD_1KG_download_vcf: Download VCF subset from 1000 Genomes

View source: R/LD_1KG_download_vcf.R

Description

Query the 1000 Genomes Project for a subset of its individual-level VCF files.

Usage

get_LD_1KG_download_vcf(
  query_granges,
  query_genome = "hg19",
  LD_reference = "1KGphase1",
  superpopulation = NULL,
  samples = character(0),
  local_storage = NULL,
  locus_dir = tempdir(),
  save_path = echotabix::construct_vcf_path(locus_dir = locus_dir, subdir = "LD",
    target_path = LD_reference, query_granges = query_granges),
  query_save = TRUE,
  force_new = FALSE,
  conda_env = "echoR_mini",
  nThread = 1,
  verbose = TRUE
)

Arguments

query_granges

GRanges object used to query the target_path file. Alternatively, variant-level summary statistics, which are converted into a GRanges object by construct_query.
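A query region can also be built directly as a GRanges object instead of being derived from summary statistics. The sketch below assumes the Bioconductor package GenomicRanges is installed. The coordinates are arbitrary illustrative values, and the unprefixed chromosome name ("4" rather than "chr4") is an assumption: it must match the naming style of the VCF being queried.

```r
# Requires Bioconductor's GenomicRanges (attaches IRanges and GenomeInfoDb too)
library(GenomicRanges)

# Illustrative region only: arbitrary coordinates on chromosome 4.
# Assumption: no "chr" prefix, to match the seqnames of the target VCF.
query_granges <- GRanges(
  seqnames = "4",
  ranges = IRanges(start = 15700000, end = 15800000)
)
query_granges
```

The resulting object can then be passed as query_granges = query_granges.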

query_genome

Genome build of query_granges.

LD_reference

LD reference to use:

  • "1KGphase1" : 1000 Genomes Project Phase 1 (genome build: hg19).

  • "1KGphase3" : 1000 Genomes Project Phase 3 (genome build: hg19).

  • "UKB" : Pre-computed LD from a British European-descent subset of UK Biobank (genome build: hg19).

  • "<vcf_path>" : User-supplied path to a custom VCF file from which to compute the LD matrix.
    Accepted formats: .vcf / .vcf.gz / .vcf.bgz
    Genome build: defined by the user with target_genome.

  • "<matrix_path>" : User-supplied path to a pre-computed LD matrix.
    Accepted formats: .rds / .rda / .csv / .tsv / .txt
    Genome build: defined by the user with target_genome.
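The accepted LD_reference values fall into four groups: the two 1000 Genomes panels, UK Biobank, a custom VCF path, or a pre-computed matrix path. The helper below is hypothetical (it is not part of echoLD) and only restates those documented names and file extensions as a base-R sketch; it does not reproduce how echoLD actually dispatches internally.

```r
# Hypothetical helper (not part of echoLD): sorts an LD_reference value into
# the categories documented above, using only the listed names and extensions.
classify_LD_reference <- function(LD_reference) {
  if (LD_reference %in% c("1KGphase1", "1KGphase3")) return("1000 Genomes")
  if (identical(LD_reference, "UKB")) return("UK Biobank")
  # Custom VCF: .vcf, .vcf.gz or .vcf.bgz
  if (grepl("\\.vcf(\\.gz|\\.bgz)?$", LD_reference, ignore.case = TRUE)) {
    return("custom VCF")
  }
  # Pre-computed LD matrix: .rds, .rda, .csv, .tsv or .txt
  if (grepl("\\.(rds|rda|csv|tsv|txt)$", LD_reference, ignore.case = TRUE)) {
    return("pre-computed LD matrix")
  }
  stop("Unrecognised LD_reference: ", LD_reference)
}

classify_LD_reference("1KGphase3")         # returns "1000 Genomes"
classify_LD_reference("my_panel.vcf.bgz")  # returns "custom VCF"
classify_LD_reference("LD_matrix.rds")     # returns "pre-computed LD matrix"
```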

superpopulation

Superpopulation by which to subset the LD panel (used only if LD_reference is "1KGphase1" or "1KGphase3"). See popDat_1KGphase1 and popDat_1KGphase3 for full tables of their respective samples.
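Subsetting by superpopulation amounts to keeping the sample IDs whose superpopulation label matches the requested one. The base-R sketch below uses a toy table: the column names (sample, superpop) and the placeholder sample IDs are assumptions for illustration only, so check popDat_1KGphase1 / popDat_1KGphase3 for the real layout.

```r
# Toy sample table: column names and sample IDs are hypothetical placeholders.
popDat <- data.frame(
  sample   = c("S1", "S2", "S3", "S4"),
  superpop = c("EUR", "EUR", "AFR", "EAS"),
  stringsAsFactors = FALSE
)

superpopulation <- "EUR"
# Keep only the samples that belong to the requested superpopulation
selected <- popDat$sample[popDat$superpop == superpopulation]
selected  # returns c("S1", "S2")
```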

samples

[Optional] Sample names to subset the VCF by. If this option is used, the GRanges object will be converted to a ScanVcfParam for use by readVcf.
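In VariantAnnotation, a ScanVcfParam can carry both a genomic region (which) and a set of sample names (samples), so a single readVcf call returns only those rows and genotype columns. The sketch below assumes the Bioconductor packages GenomicRanges and VariantAnnotation are installed; the region and sample IDs are arbitrary illustrative choices, and vcf_path is a placeholder. It shows the general mechanism, not echoLD's exact internal code.

```r
# Requires Bioconductor's GenomicRanges and VariantAnnotation
library(GenomicRanges)
library(VariantAnnotation)

# Illustrative region and 1000 Genomes-style sample IDs (arbitrary choices)
query_granges <- GRanges("4", IRanges(15700000, 15800000))
samples <- c("HG00096", "HG00097")

# Restrict a VCF read to both the region and the listed samples
param <- ScanVcfParam(which = query_granges, samples = samples)
vcfSamples(param)

# With a bgzipped, tabix-indexed VCF (vcf_path is a placeholder):
# vcf <- readVcf(vcf_path, genome = "hg19", param = param)
```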

local_storage

Storage folder for previously downloaded LD files. If LD_reference is "1KGphase1" or "1KGphase3", local_storage is where the VCF files are stored; if LD_reference is "UKB", it is where the compressed NumPy array (.npz) LD files are stored. Set to NULL to download the VCFs or LD .npz files from the remote storage system.
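The rules for local_storage can be read as a small decision table. The helper below is hypothetical (not part of echoLD) and only restates the documented cases; the behaviour for custom VCF or matrix paths is not described here, so the sketch returns NA for them rather than guessing.

```r
# Hypothetical helper (not part of echoLD) restating the local_storage rules.
ld_file_source <- function(LD_reference, local_storage = NULL) {
  # NULL means: fetch from the remote storage system
  if (is.null(local_storage)) return("remote download")
  if (LD_reference %in% c("1KGphase1", "1KGphase3")) return("local VCF files")
  if (identical(LD_reference, "UKB")) return("local .npz LD files")
  # Custom paths: behaviour not documented for this argument
  NA_character_
}

ld_file_source("1KGphase1")               # returns "remote download"
ld_file_source("UKB", "/data/UKB_LD")     # returns "local .npz LD files"
```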

locus_dir

Locus-specific storage directory; used to construct the default save_path.

save_path

Path to save the queried VCF subset to.

query_save

Whether to save the queried data subset.

conda_env

Conda environment to search in (default: "echoR_mini"). If NULL, all conda environments will be searched.

nThread

Number of threads to parallelize over.

verbose

Print messages.

Source

query_dat <- echodata::BST1
locus_dir <- file.path(tempdir(), echodata::locus_dir)
query_granges <- echotabix::construct_query(query_dat = query_dat)
vcf_subset.popDat <- echoLD:::get_LD_1KG_download_vcf(
  query_granges = query_granges,
  LD_reference = "1KGphase1",
  locus_dir = locus_dir
)

See Also

Other LD: check_population_1kg(), compute_LD(), filter_LD(), get_LD(), get_LD_1KG(), get_LD_UKB(), get_LD_matrix(), get_LD_vcf(), get_locus_vcf_folder(), ldlinkr_ldproxy_batch(), plot_LD(), popDat_1KGphase1, popDat_1KGphase3, rds_to_npz(), saveSparse(), save_LD_matrix(), snpstats_get_MAF()


RajLabMSSM/echoLD documentation built on May 12, 2024, 3:23 a.m.