POLYFUN: Run PolyFun+SUSIE fine-mapping pipeline
In RajLabMSSM/echofinemap: echoverse module: Fine-mapping functions

POLYFUN

R Documentation

Description

Uses the echolocatoR wrapper for SUSIE instead of the `POLYFUN_finemapper` function, which uses a Python script provided with PolyFun.

Usage

POLYFUN(
  dat,
  LD_matrix,
  locus_dir,
  polyfun_path = NULL,
  mode = c("precomputed", "parametric", "non-parametric"),
  method = c("SUSIE", "FINEMAP"),
  max_causal = 5,
  compute_n = "ldsc",
  credset_thresh = 0.95,
  rescale_priors = TRUE,
  case_control = TRUE,
  conda_env = "echoR_mini",
  force_new = FALSE,
  nThread = 1,
  verbose = TRUE,
  ...
)

Arguments

`dat`	Fine-mapping results data.
`LD_matrix`	Linkage Disequilibrium (LD) matrix to use for fine-mapping.
`locus_dir`	Locus-specific directory to store results in.
`polyfun_path`	[Optional] Path to PolyFun directory where all the executables and reference data are stored. Will be automatically installed if set to `NULL` (default).
`mode`	PolyFun can run in several different modes corresponding to how SNP-wise prior causal probabilities (i.e. priors) are computed:
"precomputed" : Using precomputed prior causal probabilities based on a meta-analysis of 15 UK Biobank traits. The meta-analysis was performed as part of the original PolyFun publication.
"parametric" : Computing prior causal probabilities via an L2-regularized extension of stratified LD-score regression (S-LDSC). This is a relatively simple approach, but the prior causal probabilities may not be robust to modeling misspecification. Gathered from the ".snpvar_ridge_constrained.gz" output files from PolyFun.
"non-parametric" : Computing prior causal probabilities non-parametrically. This is the most robust approach, but it is computationally intensive and requires access to individual-level genotypic data from a large reference panel (optimally >10,000 population-matched individuals). Gathered from the ".snpvar_constrained.gz" output files from PolyFun.
`method`	Method to conduct the fine-mapping step with (using priors generated by PolyFun):
"SUSIE" : Uses SUSIE.
"FINEMAP" : Uses FINEMAP.
`max_causal`	The maximum number of non-zero effects (and thus causal variants).
`compute_n`	How to compute per-SNP sample size (new column "N"). If the column "N" is already present in `dat`, this column will be used to extract per-SNP sample sizes and the argument `compute_n` will be ignored. If the column "N" is not present in `dat`, one of the following options can be supplied to `compute_n`:
`0` : N will not be computed.
`>0` : If any number >0 is provided, that value will be set as N for every row. Note: Computing N this way is incorrect and should be avoided if at all possible.
`"sum"` : N will be computed as: cases (N_CAS) + controls (N_CON), so long as both columns are present.
`"ldsc"` : N will be computed as effective sample size: Neff = (N_CAS+N_CON) * (N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))[(N_CAS+N_CON)==max(N_CAS+N_CON)]).
`"giant"` : N will be computed as effective sample size: Neff = 2 / (1/N_CAS + 1/N_CON).
`"metal"` : N will be computed as effective sample size: Neff = 4 / (1/N_CAS + 1/N_CON).
`credset_thresh`	The minimum mean Posterior Probability (across all fine-mapping methods used) of SNPs to be included in the "mean.CS" column.
`rescale_priors`	If prior probabilities are supplied, rescale them so that they lie between 0 and 1 and sum to 1 (i.e. `rescaled_priors = priors / sum(priors)`).
`case_control`	Whether the summary statistics come from a case-control study (e.g. a GWAS of having Alzheimer's Disease or not) (`TRUE`) or a quantitative study (e.g. a GWAS of height, or an eQTL) (`FALSE`).
`conda_env`	Conda environment to use.
`force_new`	If saved results already exist in the given `locus_dir`, skip re-running fine-mapping and use them (default: `force_new = FALSE`). Set to `TRUE` to ignore these files and re-run fine-mapping.
`nThread`	Number of threads to parallelise across (when applicable).
`verbose`	Print messages.
`...`	Additional arguments passed to the chosen fine-mapping `method`.
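
The effective sample size options listed under `compute_n` can be sketched in plain R. This is a minimal illustration of the formulas only, not the code the package runs: `compute_neff` is a hypothetical helper, and the `"ldsc"` branch assumes the denominator is the mean case proportion among the SNPs whose total sample size equals the maximum.

```r
# Hypothetical helper (not part of echofinemap): illustrates the
# per-SNP sample size formulas described for `compute_n`.
compute_neff <- function(n_cas, n_con,
                         method = c("sum", "ldsc", "giant", "metal")) {
  method <- match.arg(method)
  n_tot <- n_cas + n_con
  switch(method,
    sum   = n_tot,
    giant = 2 / (1 / n_cas + 1 / n_con),
    metal = 4 / (1 / n_cas + 1 / n_con),
    ldsc  = {
      prop_cas <- n_cas / n_tot
      # Assumption: average the case proportion over the SNPs that have
      # the largest total sample size.
      ref_prop <- mean(prop_cas[n_tot == max(n_tot)])
      n_tot * prop_cas / ref_prop
    }
  )
}

# Made-up per-SNP case and control counts.
n_cas <- c(1000, 1000, 800)
n_con <- c(3000, 3000, 2400)

neff_sum   <- compute_neff(n_cas, n_con, "sum")
neff_giant <- compute_neff(n_cas, n_con, "giant")
neff_metal <- compute_neff(n_cas, n_con, "metal")
neff_ldsc  <- compute_neff(n_cas, n_con, "ldsc")
```

With these formulas the `"metal"` value is always twice the `"giant"` value for the same SNP, so the choice between them changes the scale of N rather than the ranking of SNPs.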

Value

The same input SNP-wise dat but with the following additional columns:

"CS" : Credible Set of putative causal SNPs.
"PP" : Posterior (Inclusion) Probability of each SNP being causal, or belonging to the causal Credible Set.
"POLYFUN.h2" : The normalized heritability (h^2) used as prior probabilities during fine-mapping.

Source

PolyFun publication

PolyFun GitHub repo

Examples

locus_dir <- file.path(tempdir(),echodata::locus_dir)
dat <- echodata::BST1
LD_matrix <- echodata::BST1_LD_matrix

dat2 <- echofinemap::POLYFUN(locus_dir=locus_dir,
                             dat=dat,
                             LD_matrix = LD_matrix,
                             method="SUSIE")
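
Running the call above requires a working PolyFun installation, so the following sketch uses a small made-up data frame in place of the returned `dat2` to show how the added columns might be inspected. The values are invented, the `prior` column is introduced here only for illustration, and treating `CS > 0` as credible set membership is an assumption about how `CS` is encoded.

```r
# Toy stand-in for the returned data (values are made up); the column
# names "CS", "PP" and "POLYFUN.h2" follow the Value section.
res <- data.frame(
  SNP        = c("rs1", "rs2", "rs3", "rs4"),
  POLYFUN.h2 = c(2e-6, 1e-6, 5e-7, 5e-7),
  PP         = c(0.90, 0.06, 0.03, 0.01),
  CS         = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Rescale per-SNP heritability so that the priors sum to 1,
# mirroring the formula given for `rescale_priors`.
res$prior <- res$POLYFUN.h2 / sum(res$POLYFUN.h2)

# Assumption: CS > 0 marks SNPs assigned to a credible set.
cs_snps <- res$SNP[res$CS > 0]

# Rank SNPs by posterior probability, highest first.
ranked <- res[order(-res$PP), c("SNP", "PP", "prior", "CS")]
```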

RajLabMSSM/echofinemap documentation built on Jan. 3, 2023, 1:42 a.m.