LDpred2: LDpred2
In bigsnpr: Analysis of Massive SNP Arrays

snp_ldpred2_inf

R Documentation

LDpred2

Description

LDpred2. Tutorial at https://privefl.github.io/bigsnpr/articles/LDpred2.html.

Usage

snp_ldpred2_inf(corr, df_beta, h2)

snp_ldpred2_grid(
  corr,
  df_beta,
  grid_param,
  burn_in = 50,
  num_iter = 100,
  ncores = 1,
  return_sampling_betas = FALSE,
  ind.corr = cols_along(corr)
)

snp_ldpred2_auto(
  corr,
  df_beta,
  h2_init,
  vec_p_init = 0.1,
  burn_in = 500,
  num_iter = 200,
  sparse = FALSE,
  verbose = FALSE,
  report_step = num_iter + 1L,
  allow_jump_sign = TRUE,
  shrink_corr = 1,
  use_MLE = TRUE,
  p_bounds = c(1e-05, 1),
  alpha_bounds = c(-1.5, 0.5),
  ind.corr = cols_along(corr),
  ncores = 1
)

Arguments

`corr`	Sparse correlation matrix as an SFBM. If `corr` is a dsCMatrix or a dgCMatrix, you can use `as_SFBM(corr)`.
`df_beta`	A data frame with 3 columns: `⁠$beta⁠`: effect size estimates `⁠$beta_se⁠`: standard errors of effect size estimates `⁠$n_eff⁠`: either GWAS sample size(s) when estimating `beta` for a continuous trait, or in the case of a binary trait, this is `4 / (1 / n_control + 1 / n_case)`; in the case of a meta-analysis, you should sum the effective sample sizes of each study instead of using the total numbers of cases and controls, see \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.biopsych.2022.05.029")}; when using a mixed model, the effective sample size needs to be adjusted as well, see \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.xhgg.2022.100136")}.
`h2`	Heritability estimate.
`grid_param`	A data frame with 3 columns as a grid of hyper-parameters: `⁠$p⁠`: proportion of causal variants `⁠$h2⁠`: heritability (captured by the variants used) `⁠$sparse⁠`: boolean, whether a sparse model is sought They can be run in parallel by changing `ncores`.
`burn_in`	Number of burn-in iterations.
`num_iter`	Number of iterations after burn-in.
`ncores`	Number of cores used. Default doesn't use parallelism. You may use `bigstatsr::nb_cores()`.
`return_sampling_betas`	Whether to return all sampling betas (after burn-in)? This is useful for assessing the uncertainty of the PRS at the individual level (see \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1101/2020.11.30.403188")}). Default is `FALSE` (only returns the averaged final vectors of betas). If `TRUE`, only one set of parameters is allowed.
`ind.corr`	Indices to "subset" `corr`, as if this was run with `corr[ind.corr, ind.corr]` instead. No subsetting by default.
`h2_init`	Heritability estimate for initialization.
`vec_p_init`	Vector of initial values for p. Default is `0.1`.
`sparse`	In LDpred2-auto, whether to also report a sparse solution by running LDpred2-grid with the estimates of p and h2 from LDpred2-auto, and sparsity enabled. Default is `FALSE`.
`verbose`	Whether to print "p // h2" estimates at each iteration. Disabled when parallelism is used.
`report_step`	Step to report sampling betas (after burn-in and before unscaling). Nothing is reported by default. If using `num_iter = 200` and `report_step = 20`, then 10 vectors of sampling betas are reported (as a sparse matrix with 10 columns).
`allow_jump_sign`	Whether to allow for effects sizes to change sign in consecutive iterations? Default is `TRUE` (normal sampling). You can use `FALSE` to force effects to go through 0 first before changing sign. Setting this parameter to `FALSE` could be useful to prevent instability (oscillation and ultimately divergence) of the Gibbs sampler. This would also be useful for accelerating convergence of chains with a large initial value for p.
`shrink_corr`	Shrinkage multiplicative coefficient to apply to off-diagonal elements of the correlation matrix. Default is `1` (unchanged). You can use e.g. `0.95` to add a bit of regularization.
`use_MLE`	Whether to use maximum likelihood estimation (MLE) to estimate alpha and the variance component (since v1.11.4), or assume that alpha is -1 and estimate the variance of (scaled) effects as h2/(m*p), as it was done in earlier versions of LDpred2-auto (e.g. in v1.10.8). Default is `TRUE`, which should provide a better model fit, but might also be less robust.
`p_bounds`	Boundaries for the estimates of p (the polygenicity). Default is `c(1e-5, 1)`. You can use the same value twice to fix p.
`alpha_bounds`	Boundaries for the estimates of `\alpha`. Default is `c(-1.5, 0.5)`. You can use the same value twice to fix `\alpha`.

Details

For reproducibility, set.seed() can be used to ensure that two runs of LDpred2 give the exact same results (since v1.10).

Value

snp_ldpred2_inf: A vector of effects, assuming an infinitesimal model.

snp_ldpred2_grid: A matrix of effect sizes, one vector (column) for each row of grid_param. Missing values are returned when strong divergence is detected. If using return_sampling_betas, each column corresponds to one iteration instead (after burn-in).

snp_ldpred2_auto: A list (over vec_p_init) of lists with

⁠$beta_est⁠: vector of effect sizes (on the allele scale); note that missing values are returned when strong divergence is detected
⁠$beta_est_sparse⁠ (only when sparse = TRUE): sparse vector of effect sizes
⁠$postp_est⁠: vector of posterior probabilities of being causal
⁠$corr_est⁠, the "imputed" correlations between variants and phenotypes, which can be used for post-QCing variants by comparing those to with(df_beta, beta / sqrt(n_eff * beta_se^2 + beta^2))
⁠$sample_beta⁠: sparse matrix of sampling betas (see parameter report_step), not on the allele scale, for which you need to multiply by with(df_beta, sqrt(n_eff * beta_se^2 + beta^2))
⁠$path_p_est⁠: full path of p estimates (including burn-in); useful to check convergence of the iterative algorithm
⁠$path_h2_est⁠: full path of h2 estimates (including burn-in); useful to check convergence of the iterative algorithm
⁠$path_alpha_est⁠: full path of alpha estimates (including burn-in); useful to check convergence of the iterative algorithm
⁠$h2_est⁠: estimate of the (SNP) heritability (also see coef_to_liab)
⁠$p_est⁠: estimate of p, the proportion of causal variants
⁠$alpha_est⁠: estimate of alpha, the parameter controlling the relationship between allele frequencies and expected effect sizes
⁠$h2_init⁠ and ⁠$p_init⁠: input parameters, for convenience