eQTLcatalogue_query: Iterate queries to _eQTL Catalogue_

View source: R/eQTLcatalogue_query.R


Iterate queries to eQTL Catalogue

Description

Determines which datasets to query using qtl_search. Uses coordinates from stored summary stats files (e.g. GWAS) to determine which regions to query from eQTL Catalogue. Query results for each locus can be stored as separate files, or merged together to form one large file containing all query results.

Usage

eQTLcatalogue_query(
  sumstats_paths = NULL,
  output_dir = file.path(tempdir(), "catalogueR_queries"),
  qtl_search = NULL,
  multithread_tabix = FALSE,
  method = c("REST", "tabix"),
  quant_method = "ge",
  split_files = TRUE,
  merge_with_gwas = TRUE,
  force_new_subset = FALSE,
  query_genome = "hg19",
  conda_env = "echoR_mini",
  nThread = 1,
  verbose = TRUE
)

Arguments

sumstats_paths

A list of paths to any number of summary stats files whose coordinates you want to use to make queries to eQTL Catalogue. If you wish to add custom names to the loci, simply supply these as the names of the path list (e.g. c(BST1="<path>/<to>/<BST1_file>", LRRK2="<path>/<to>/<LRRK2_file>")). Otherwise, loci will automatically be named based on their min/max genomic coordinates (see the sketch after the column list below).

The minimum columns required in these files to make queries are:

SNP

RSID of each SNP.

CHR

Chromosome (can be in "chr12" or "12" format).

POS

Genomic position of each SNP.

...

Optional extra columns.
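
A minimal sketch of a valid input (not taken from the package itself; the file contents, locus name, and values below are placeholders):

library(data.table)

## Illustrative locus file with the minimum required columns (SNP, CHR, POS)
## plus one optional extra column.
locus_dt <- data.table(
    SNP  = c("rs0000001", "rs0000002"), # RSIDs (placeholders)
    CHR  = c("4", "4"),                 # "chr4" format is also accepted
    POS  = c(15700000, 15750000),       # genomic positions (placeholders)
    Beta = c(0.12, -0.08)               # optional extra column
)
locus_path <- file.path(tempdir(), "BST1.tsv.gz")
fwrite(locus_dt, locus_path, sep = "\t")

## Naming the path vector adds a custom locus name (here, "BST1") to the query.
sumstats_paths <- c(BST1 = locus_path)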

output_dir

The folder where you want the merged GWAS/QTL results to be saved (set to NULL to skip saving). If split_files=FALSE, all query results will be merged into one file and saved as <output_dir>/eQTLcatalogue_tsv.gz. If split_files=TRUE, the query results will instead be split into smaller files and stored in <output_dir>/.

qtl_search

This function will automatically search for any datasets that match your search criteria. For example, if you search "Alasoo_2018", it will query the datasets:

  • Alasoo_2018.macrophage_naive

  • Alasoo_2018.macrophage_Salmonella

  • Alasoo_2018.macrophage_IFNg+Salmonella

You can be more specific about which datasets you want to include, for example by searching "Alasoo_2018.macrophage_IFNg". You can also search by tissue or condition type (e.g. c("blood","brain")); any QTL datasets whose name or metadata contains those substrings (case-insensitive) will be queried as well (see the illustrative values below).
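
Illustrative qtl_search values at different levels of specificity (the dataset names follow the patterns described above; not an exhaustive list):

qtl_search <- "Alasoo_2018"                 # all Alasoo_2018 datasets
qtl_search <- "Alasoo_2018.macrophage_IFNg" # one specific dataset
qtl_search <- c("blood", "brain")           # substring match on names/metadata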

multithread_tabix

Multi-thread within a single tabix file query (useful when you have one to several large loci).

method

Method for querying eQTL Catalogue:

  • "REST" (default): Uses the REST API. Slow but can be used by anyone.

  • "tabix": Uses a tabix query. Fast, but requires the user to first get their IP address whitelisted by the EMBL-EBI server admin by putting in a request here.

Note: "tabix" is roughly 17x faster than the REST API, but it is currently far less reliable because tabix queries tend to get blocked by eQTL Catalogue's firewall. See here for more details.
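
A hedged sketch of switching the query backend (all arguments come from the Usage section above; sumstats_paths is assumed to be a named vector of file paths as sketched earlier):

## "REST" works for anyone; "tabix" additionally requires a whitelisted IP
## address and a tabix executable found via `conda_env`.
gwas_qtl <- catalogueR::eQTLcatalogue_query(
    sumstats_paths = sumstats_paths,
    qtl_search = "Alasoo_2018.macrophage_naive",
    method = "tabix",
    conda_env = "echoR_mini")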

quant_method

eQTL Catalogue contains more than just eQTL data. For each dataset, the following kinds of QTLs can be queried (see the brief sketch after this list):

gene expression QTL

quant_method="ge" (default) or quant_method="microarray", depending on the dataset. catalogueR will automatically select whichever option is available.

exon expression QTL

*under construction* quant_method="ex"

transcript usage QTL

*under construction* quant_method="tx"

promoter, splice junction and 3' end usage QTL

*under construction* quant_method="txrev"
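
For now, a typical call simply keeps the default gene expression QTLs (a sketch under the same assumptions as above; sumstats_paths is a hypothetical named vector of locus file paths):

gwas_qtl <- catalogueR::eQTLcatalogue_query(
    sumstats_paths = sumstats_paths,
    qtl_search = "Alasoo_2018.macrophage_naive",
    quant_method = "ge") # "ex", "tx", and "txrev" are still under construction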

split_files

Save the results as one file per QTL dataset (with all loci within each file). If split_files=TRUE, this function will return the list of paths where these files were saved. A helper function is provided to import and merge them back together in R (a generic sketch follows below). If split_files=FALSE, this function will instead return one big merged data.table containing results from all QTL datasets and all loci. split_files=FALSE is not recommended when you have many large loci and/or many QTL datasets, because you can only fit so much data into memory.
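
A generic sketch of re-importing split files (this uses plain data.table rather than the package's own helper; sumstats_paths is assumed as in the earlier sketch):

library(data.table)

## With split_files=TRUE, the returned value is the list of saved file paths.
paths <- catalogueR::eQTLcatalogue_query(
    sumstats_paths = sumstats_paths,
    qtl_search = "Alasoo_2018",
    split_files = TRUE)

## Read each saved file back in and stack them into one data.table
## (fread needs the R.utils package installed to read gzipped files).
gwas_qtl <- rbindlist(lapply(paths, fread), fill = TRUE, idcol = "source_file")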

merge_with_gwas

Whether you want to merge your QTL query results with your GWAS data (convenient, but takes up more storage).

force_new_subset

By default, catalogueR will use any pre-existing files that match your query. Set force_new_subset=TRUE to override this and force a new query.

query_genome

The genome build of your query coordinates (i.e. the data supplied via sumstats_paths). If your coordinates are in hg19, catalogueR will automatically lift them over to hg38 (as this is the build that eQTL Catalogue uses).

conda_env

Conda environment to search for a tabix executable in.

nThread

The number of CPU cores you want to use to speed up your queries through parallelization.

verbose

Show more (TRUE) or fewer (FALSE) messages.

See Also

Other eQTL Catalogue: eQTLcatalogue_fetch(), eQTLcatalogue_header, eQTLcatalogue_iterate_fetch(), eQTLcatalogue_search_metadata(), fetch_restAPI(), fetch_tabix(), merge_gwas_qtl(), meta

Examples

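# Example GWAS loci provided by echodata (Nalls et al. 2019 Parkinson's disease GWAS).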
sumstats_paths <- echodata::get_Nalls2019_loci(limit_snps = 5)
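
# Query the BST1 locus against a single eQTL Catalogue dataset (REST API by default).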
GWAS.QTL <- catalogueR::eQTLcatalogue_query(
    sumstats_paths = sumstats_paths$BST1,
    qtl_search = "Alasoo_2018.macrophage_naive")
