COLOC_run: Iteratively run coloc on merged GWAS-QTL datatables


COLOC_run    R Documentation

Iteratively run coloc on merged GWAS-QTL datatables

Description

Runs colocalization tests (coloc.abf) on merged GWAS-QTL data.tables generated by eQTLcatalogue_query. Iteratively runs coloc across each:

  • QTL dataset

  • GWAS locus

  • QTL gene

NOTE: Assumes that each file is within a subfolder named after the QTL dataset it came from.
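The folder convention in the note above means the QTL dataset label can be read off each file's parent directory. A minimal base-R sketch of that idea follows; the paths and dataset names are hypothetical, and this page does not show how COLOC_run itself parses its input paths.

```r
# Hypothetical paths laid out as <root>/<QTL dataset>/<file>.
# These names are invented for illustration, not produced by catalogueR.
paths <- c(
  "results/Alasoo_2018.macrophage_naive/locus1.tsv.gz",
  "results/Alasoo_2018.macrophage_naive/locus2.tsv.gz",
  "results/GTEx.brain/locus1.tsv.gz"
)

# Take the immediate parent folder of each file as its QTL dataset name.
qtl_id <- basename(dirname(paths))

# Group the files by dataset, one list element per QTL dataset.
paths_by_qtl <- split(paths, qtl_id)
print(names(paths_by_qtl))
```

Files that sit directly in the root folder would be labelled with the root folder's name under this scheme, so the layout is worth checking before a run.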

Usage

COLOC_run(
  gwas.qtl_paths,
  save_path = tempfile(pattern = "coloc_results", fileext = ".tsv.gz"),
  top_snp_only = TRUE,
  split_by_group = FALSE,
  method = "abf",
  coloc_thresh = 0.8,
  compute_n = NULL,
  nThread = 1,
  verbose = TRUE
)

Arguments

gwas.qtl_paths

Query results paths from eQTLcatalogue_query.

save_path

Where to save results to.

top_snp_only

Only include the top SNP (the one with the highest SNP-wise PP.H4, which is usually the one with the smallest p-value) instead of all SNPs. Can be useful for reducing data size.

split_by_group

Split files by QTL group when saving.

method

Method for querying eQTL Catalogue:

  • "REST" (default): Uses the REST API. Slow but can be used by anyone.

  • "tabix": Uses tabix query. Fast, but requires the user to first get their IP address whitelisted by the EMBL-EBI server admin by putting in a request here.

Note: "tabix" is about 17x faster than the REST API but is currently far less reliable, because tabix queries tend to get blocked by eQTL Catalogue's firewall. See here for more details.

coloc_thresh

Colocalization Posterior Probability threshold, using the formula: (PP.H3 + PP.H4 >= coloc_thresh) & (PP.H4 / PP.H3 >= 2).
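As an illustration, the threshold rule can be written as a small vectorised R function. This is only a sketch of the stated formula, not the package's internal code; the helper name is_colocalized and the toy values are assumptions made for the example.

```r
# Hypothetical helper: applies the documented rule to summary-level
# posterior probabilities (vectorised over rows).
is_colocalized <- function(PP.H3, PP.H4, coloc_thresh = 0.8) {
  (PP.H3 + PP.H4 >= coloc_thresh) & (PP.H4 / PP.H3 >= 2)
}

# Toy values, invented for illustration.
pp <- data.frame(
  PP.H3 = c(0.05, 0.60, 0.10),
  PP.H4 = c(0.90, 0.35, 0.30)
)
pp$colocalized <- is_colocalized(pp$PP.H3, pp$PP.H4)
print(pp)
```

Only the first toy row passes both conditions: the second fails the ratio test and the third fails the sum test. Note that a row with PP.H3 equal to 0 yields an infinite (or undefined) ratio under this literal reading of the formula.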

compute_n

How to compute per-SNP sample size (new column "N").
If the column "N" is already present in dat, this column will be used to extract per-SNP sample sizes and the argument compute_n will be ignored.
If the column "N" is not present in dat, one of the following options can be supplied to compute_n:

  • 0: N will not be computed.

  • >0: If any number >0 is provided, that value will be set as N for every row. **Note**: Computing N this way is incorrect and should be avoided if at all possible.

  • "sum": N will be computed as: cases (N_CAS) + controls (N_CON), so long as both columns are present.

  • "ldsc": N will be computed as effective sample size: Neff = (N_CAS+N_CON) * (N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))[(N_CAS+N_CON)==max(N_CAS+N_CON)]).

  • "giant": N will be computed as effective sample size: Neff = 2 / (1/N_CAS + 1/N_CON).

  • "metal": N will be computed as effective sample size: Neff = 4 / (1/N_CAS + 1/N_CON).
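The string options above can be sketched in base R as follows. The function name compute_n_sketch is hypothetical and this is not the package's implementation. The "ldsc" branch assumes the formula's final term means the mean case proportion over the rows where N_CAS + N_CON is at its maximum.

```r
# Hypothetical helper: per-SNP sample size from case/control counts.
compute_n_sketch <- function(N_CAS, N_CON,
                             method = c("sum", "ldsc", "giant", "metal")) {
  method <- match.arg(method)
  total <- N_CAS + N_CON
  switch(method,
    sum = total,
    ldsc = {
      # Assumed reading: normalise by the mean case proportion among
      # the rows with the largest total sample size.
      prop_cas <- N_CAS / total
      total * prop_cas / mean(prop_cas[total == max(total)])
    },
    giant = 2 / (1 / N_CAS + 1 / N_CON),
    metal = 4 / (1 / N_CAS + 1 / N_CON)
  )
}

# Toy counts, invented for illustration.
N_CAS <- c(1000, 2000)
N_CON <- c(3000, 2000)
n_sum   <- compute_n_sketch(N_CAS, N_CON, "sum")
n_giant <- compute_n_sketch(N_CAS, N_CON, "giant")
n_metal <- compute_n_sketch(N_CAS, N_CON, "metal")
n_ldsc  <- compute_n_sketch(N_CAS, N_CON, "ldsc")
```

The "metal" value is always exactly twice the "giant" value, since the two formulas differ only in the numerator.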

nThread

The number of CPU cores you want to use to speed up your queries through parallelization.

verbose

Print messages.

Value

If top_snp_only=TRUE, returns SNP-level stats for only the SNP with the highest colocalization probability (SNP.PP.H4). If top_snp_only=FALSE, returns SNP-level stats for every SNP. In either case, summary-level coloc stats are added in the columns PP.H0, PP.H1, PP.H2, PP.H3, PP.H4.
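To make the top_snp_only behaviour concrete, the sketch below keeps the row with the highest SNP.PP.H4 within each group of a toy table. The grouping column gene, the SNP identifiers, and all values are invented for illustration; only the column name SNP.PP.H4 comes from this page, and the grouping actually used by COLOC_run may differ.

```r
# Toy SNP-level results (invented values).
res <- data.frame(
  gene = c("A", "A", "A", "B", "B"),
  SNP = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  SNP.PP.H4 = c(0.10, 0.85, 0.05, 0.40, 0.60),
  stringsAsFactors = FALSE
)

# Within each gene, keep only the row with the highest SNP.PP.H4.
top <- do.call(rbind, lapply(split(res, res$gene), function(d) {
  d[which.max(d$SNP.PP.H4), , drop = FALSE]
}))
rownames(top) <- NULL
print(top)
```

which.max returns the first maximum, so ties are resolved in favour of the earlier row in this sketch.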

See Also

Other coloc: COLOC_corplot(), COLOC_get_example_res(), COLOC_get_res(), COLOC_heatmap(), COLOC_merge_res(), COLOC_report_summary()

Examples

gwas.qtl_paths <- catalogueR::eQTLcatalogue_example_queries()
coloc_QTLs <- catalogueR::COLOC_run(gwas.qtl_paths = gwas.qtl_paths)

RajLabMSSM/catalogueR documentation built on Jan. 1, 2023, 10:45 a.m.