POLYFUN: Run PolyFun+SUSIE fine-mapping pipeline

View source: R/POLYFUN.R

POLYFUN    R Documentation

Description

Uses the echolocatoR wrapper for SUSIE instead of the POLYFUN_finemapper function, which uses a Python script provided with PolyFun.

Usage

POLYFUN(
  dat,
  LD_matrix,
  locus_dir,
  polyfun_path = NULL,
  mode = c("precomputed", "parametric", "non-parametric"),
  method = c("SUSIE", "FINEMAP"),
  max_causal = 5,
  compute_n = "ldsc",
  credset_thresh = 0.95,
  rescale_priors = TRUE,
  case_control = TRUE,
  conda_env = "echoR_mini",
  force_new = FALSE,
  nThread = 1,
  verbose = TRUE,
  ...
)

Arguments

dat

Fine-mapping results data.

LD_matrix

Linkage Disequilibrium (LD) matrix to use for fine-mapping.

locus_dir

Locus-specific directory to store results in.

polyfun_path

[Optional] Path to PolyFun directory where all the executables and reference data are stored. Will be automatically installed if set to NULL (default).

mode

PolyFun can run in several different modes corresponding to how SNP-wise prior causal probabilities (i.e. priors) are computed:

  • "precomputed" : Using precomputed prior causal probabilities based on a meta-analysis of 15 UK Biobank traits. The meta-analysis was performed as part of the original PolyFun publication.

  • "parametric" : Computing prior causal probabilities via an L2-regularized extension of stratified LD-score regression (S-LDSC). This is a relatively simple approach, but the prior causal probabilities may not be robust to modeling misspecification. Gathered from the "*.snpvar_ridge_constrained.gz" output files from PolyFun.

  • "non-parametric" : Computing prior causal probabilities non-parametrically. This is the most robust approach, but it is computationally intensive and requires access to individual-level genotypic data from a large reference panel (optimally >10,000 population-matched individuals). Gathered from the "*.snpvar_constrained.gz" output files from PolyFun.

method

Method used for the fine-mapping step (with priors generated by PolyFun).

  • "SUSIE": Uses SUSIE

  • "FINEMAP": Uses FINEMAP

max_causal

The maximum number of non-zero effects (and thus causal variants).

compute_n

How to compute per-SNP sample size (new column "N").
If the column "N" is already present in dat, this column will be used to extract per-SNP sample sizes and the argument compute_n will be ignored.
If the column "N" is not present in dat, one of the following options can be supplied to compute_n:

  • 0: N will not be computed.

  • >0: If any number >0 is provided, that value will be set as N for every row. **Note**: Computing N this way is incorrect and should be avoided if at all possible.

  • "sum": N will be computed as: cases (N_CAS) + controls (N_CON), so long as both columns are present.

  • "ldsc": N will be computed as effective sample size: Neff = (N_CAS+N_CON) * (N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))[(N_CAS+N_CON)==max(N_CAS+N_CON)]).

  • "giant": N will be computed as effective sample size: Neff = 2 / (1/N_CAS + 1/N_CON).

  • "metal": N will be computed as effective sample size: Neff = 4 / (1/N_CAS + 1/N_CON).
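
The compute_n options above can be sketched in base R. This is only an illustration on a hypothetical three-SNP table, not the package's internal code. It assumes the "ldsc" expression is meant to take the mean case fraction over the SNPs with the largest total N, which is one reading of the flattened formula.

```r
# Hypothetical per-SNP case/control counts (illustrative values only).
dat <- data.frame(
  SNP   = c("rs1", "rs2", "rs3"),
  N_CAS = c(1000, 1200, 1200),
  N_CON = c(3000, 3600, 2800)
)

n_total   <- dat$N_CAS + dat$N_CON
case_frac <- dat$N_CAS / n_total

# "sum": cases plus controls.
n_sum <- n_total

# "ldsc" (assumed reading): total N scaled by each SNP's case fraction,
# relative to the mean case fraction among SNPs with the largest total N.
n_ldsc <- n_total * case_frac / mean(case_frac[n_total == max(n_total)])

# "giant" and "metal": harmonic-mean style effective sample sizes.
n_giant <- 2 / (1 / dat$N_CAS + 1 / dat$N_CON)
n_metal <- 4 / (1 / dat$N_CAS + 1 / dat$N_CON)

print(data.frame(SNP = dat$SNP, n_sum, n_ldsc, n_giant, n_metal))
```

Note that "metal" is exactly twice "giant" for every SNP, so the two options differ only by a constant scale factor.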

credset_thresh

The minimum mean Posterior Probability (across all fine-mapping methods used) of SNPs to be included in the "mean.CS" column.
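
As a rough sketch of how such a threshold could be applied, the base R snippet below averages per-method posterior probabilities and flags SNPs that meet credset_thresh. The column names "SUSIE.PP" and "FINEMAP.PP", the "mean.PP" helper column, and the 0/1 coding of "mean.CS" are assumptions made for illustration; the package's own implementation may differ.

```r
credset_thresh <- 0.95

# Hypothetical fine-mapping results with one PP column per method.
res <- data.frame(
  SNP        = c("rs1", "rs2", "rs3"),
  SUSIE.PP   = c(0.99, 0.97, 0.10),
  FINEMAP.PP = c(0.97, 0.85, 0.02)
)

# Average the posterior probabilities across all method-specific PP columns.
pp_cols <- grep("\\.PP$", colnames(res), value = TRUE)
res$mean.PP <- rowMeans(res[, pp_cols])

# Flag SNPs whose mean PP reaches the threshold (1 = included, 0 = not).
res$mean.CS <- as.integer(res$mean.PP >= credset_thresh)

print(res)
```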

rescale_priors

If prior probabilities are supplied, rescale them so that they sum to 1 and each lies between 0 and 1 (i.e. rescaled_priors = priors / sum(priors)).
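
A minimal base R illustration of this rescaling, using hypothetical prior values:

```r
# Hypothetical per-SNP prior causal probabilities (illustrative values only).
priors <- c(rs1 = 2e-5, rs2 = 6e-5, rs3 = 1.2e-4)

# Divide by the total so the priors sum to 1 within the locus.
rescaled_priors <- priors / sum(priors)

print(rescaled_priors)
```

The relative ranking of SNPs is unchanged; only the overall scale differs.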

case_control

Whether the summary statistics come from a case-control study (e.g. a GWAS of having Alzheimer's Disease or not) (TRUE) or a quantitative study (e.g. a GWAS of height, or an eQTL) (FALSE).

conda_env

Conda environment to use.

force_new

If saved results already exist in the given locus_dir, skip re-running FINEMAP and use them (default: force_new = FALSE). Set to TRUE to ignore these files and re-run FINEMAP.

nThread

Number of threads to parallelise across (when applicable).

verbose

Print messages.

...

Additional arguments passed to the chosen fine-mapping method.

Value

The same input SNP-wise dat but with the following additional columns:

  • "CS" : Credible Set of putative causal SNPs.

  • "PP" : Posterior (Inclusion) Probability of each SNP being causal, or belonging to the causal Credible Set.

  • "POLYFUN.h2" : The normalized heritability (h^2) used as prior probabilities during fine-mapping.

Source

PolyFun publication

PolyFun GitHub repo

See Also

Other polyfun: POLYFUN_compute_priors(), POLYFUN_download_ref_files(), POLYFUN_find_folder(), POLYFUN_finemapper(), POLYFUN_gather_annotations(), POLYFUN_gather_ldscores(), POLYFUN_help(), POLYFUN_import_priors(), POLYFUN_initialize(), POLYFUN_munge_summ_stats(), POLYFUN_prepare_snp_input(), POLYFUN_run_ldsc()

Examples

locus_dir <- file.path(tempdir(), echodata::locus_dir)
dat <- echodata::BST1
LD_matrix <- echodata::BST1_LD_matrix

dat2 <- echofinemap::POLYFUN(locus_dir = locus_dir,
                             dat = dat,
                             LD_matrix = LD_matrix,
                             method = "SUSIE")

RajLabMSSM/echofinemap documentation built on Jan. 3, 2023, 1:42 a.m.