create_prs: Pruning+thresholding polygenic risk score creation based on...

create_prs    R Documentation

Pruning+thresholding polygenic risk score creation based on summary statistics

Description

Create a polygenic risk score based on summary statistics from prior GWAS/pQTL discovery studies. Variants in high linkage disequilibrium (LD) can be decorrelated with the conditional argument to prevent double-counting of their effects.
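As general background, a polygenic risk score is usually computed as a weighted sum of effect-allele dosages, with the discovery-study effect sizes as weights. The sketch below illustrates that idea on toy data. The matrix, variant names and effect sizes are hypothetical, and it is not meant to reproduce what create_prs() does internally.

```r
# Hypothetical toy data: 4 individuals x 3 variants, coded as
# effect-allele dosages (0, 1 or 2).
dosages <- matrix(
  c(0, 1, 2,
    1, 1, 0,
    2, 0, 1,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("id", 1:4), c("rs_a", "rs_b", "rs_c"))
)

# Hypothetical per-allele effect sizes (log odds ratios) from a discovery GWAS.
betas <- c(rs_a = 0.10, rs_b = -0.20, rs_c = 0.05)

# Weighted sum of dosages: one score per individual.
prs <- as.vector(dosages %*% betas[colnames(dosages)])
names(prs) <- rownames(dosages)
prs
```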

Usage

create_prs(
  variant_data,
  gwas_info,
  remove_indels = FALSE,
  imp_threshold = 0.8,
  binary_outcome = TRUE,
  exclude_extreme_associations = TRUE,
  maf_filter = 0.001,
  LDplot = FALSE,
  pruning_threshold = 0.75,
  pruning_filter = "p",
  pval_threshold = 5e-08,
  conditional = FALSE,
  cond_window = 35000,
  cond_N = 60000,
  cond_stepwise = TRUE,
  ridge = FALSE,
  lambda = 0,
  scale = FALSE,
  flowchart = TRUE
)

Arguments

variant_data

An object in the format returned by extract_variants().

gwas_info

An object generated by get_trait_variants() or get_pQTLs().

remove_indels

If TRUE, removes indels.

imp_threshold

Imputation quality threshold, based on R^2. Any variant with lower imputation R^2 is removed.

binary_outcome

Set to TRUE for binary traits, and FALSE for continuous outcomes (including pQTLs).

exclude_extreme_associations

If TRUE, removes variants with an odds ratio > 5 or < 1/5.
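The odds ratio rule can be sketched as follows, assuming the summary statistics store effects as log odds ratios. The data frame and its column names are hypothetical and do not reflect the actual structure of gwas_info.

```r
# Hypothetical summary statistics; column names are illustrative only.
sumstats <- data.frame(
  variant = c("rs_a", "rs_b", "rs_c", "rs_d"),
  beta    = c(log(1.2), log(6), log(0.1), log(0.5)),  # log odds ratios
  stringsAsFactors = FALSE
)

odds_ratio <- exp(sumstats$beta)

# Keep variants whose odds ratio lies within [1/5, 5].
keep <- odds_ratio <= 5 & odds_ratio >= 1 / 5
filtered <- sumstats[keep, ]
filtered$variant
```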

maf_filter

Variants with a MAF below the specified threshold are filtered.
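For illustration, a minor allele frequency filter on a dosage matrix could look like the sketch below. It assumes dosages coded 0 to 2 per individual. The toy matrix and the deliberately high threshold are hypothetical, and the package may compute MAF differently.

```r
# Hypothetical dosage matrix: 4 individuals x 3 variants.
dosages <- matrix(
  c(0, 0, 1,
    0, 1, 2,
    0, 1, 2,
    1, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("rs_a", "rs_b", "rs_c"))
)

# Frequency of the counted allele, then fold to the minor allele.
allele_freq <- colMeans(dosages) / 2
maf <- pmin(allele_freq, 1 - allele_freq)

# Threshold chosen only to make the toy example filter something.
maf_filter <- 0.2
kept <- names(maf)[maf >= maf_filter]
kept
```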

LDplot

If TRUE, plots the LD matrix (squared correlation matrix of variants).

pruning_threshold

Variants in LD (squared correlation) >= pruning_threshold with another variant are pruned: one variant of each such pair is removed, as determined by pruning_filter.

pruning_filter

The criterion used to decide which variant of a high-LD pair is kept. The default, "p", keeps the variant with the lower p-value; keeping the variant with the higher MAF is also possible.
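One common way to implement p-value-based pruning is a greedy pass over the variants in order of significance. The sketch below assumes that strategy. The LD matrix and p-values are hypothetical, and the package's own pruning procedure may differ in detail.

```r
# Hypothetical LD matrix (squared correlations) for three variants.
ids <- c("rs_a", "rs_b", "rs_c")
ld <- matrix(
  c(1.00, 0.90, 0.10,
    0.90, 1.00, 0.20,
    0.10, 0.20, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(ids, ids)
)

# Hypothetical GWAS p-values.
pvals <- c(rs_a = 1e-10, rs_b = 1e-12, rs_c = 1e-9)
pruning_threshold <- 0.75

# Visit variants from most to least significant; keep a variant only if
# it is not in high LD with any variant already kept.
kept <- character(0)
for (v in names(sort(pvals))) {
  if (!any(ld[v, kept] >= pruning_threshold)) {
    kept <- c(kept, v)
  }
}
kept
```

Here rs_a and rs_b form a high-LD pair, so only rs_b (the lower p-value) is retained alongside rs_c.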

pval_threshold

Variants with GWAS p-values > pval_threshold are discarded. Set to 1 to turn off.

conditional

If TRUE, uses marg2con() to decorrelate variants in LD if these are within a given bp distance on the same chromosome.

cond_window

The genomic distance, in base pairs, within which variants in LD are decorrelated.

cond_N

The sample size of the original GWAS from which the marginal estimates were derived, or an approximation of it.

cond_stepwise

If TRUE, iteratively applies conditional analysis conditioning on the top variant within the specified window, each time removing variants for which p > pval_threshold, until only conditionally independent variants remain. When set to FALSE, conditional joint analysis (i.e. of several variants jointly) is applied within the specified window.

ridge

If TRUE, applies a ridge penalty. This only applies when conditional is set to TRUE.

lambda

The parameter controlling the degree of ridge regularization.

scale

If TRUE, centers and standardizes the polygenic risk score.
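Centering and standardizing typically means subtracting the mean and dividing by the standard deviation. The sketch below assumes the sample standard deviation is used, which matches base R's scale(). The raw scores are hypothetical.

```r
# Hypothetical raw scores.
prs <- c(-0.10, -0.10, 0.25, 0.00)

# Center to mean 0 and scale to standard deviation 1.
prs_scaled <- (prs - mean(prs)) / sd(prs)

# Base R's scale() gives the same result for a single vector.
prs_scaled_base <- as.vector(scale(prs))
```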

flowchart

If TRUE, plots a flowchart describing the creation of the polygenic risk score.

Value

A list containing several data.frames with all relevant information. The risk score is stored in element 'prs'.

Examples

# vte_prs <- create_prs(vte_extracted_variants, vte_gwas_info)
# hist(vte_prs$prs$prs)

vincent10kd/polygenic documentation built on Feb. 25, 2024, 10:17 a.m.