fineboost_normal: A genetic fine-mapping method using kernel-based Forward...

View source: R/fineboost_normal.R

fineboost_normal    R Documentation

A genetic fine-mapping method using kernel-based Forward Stepwise boosted regression model.

Description

This function uses a kernel-based FS-boost framework to find causal fine-mapped SNP sets from GWAS or QTL effect size data. It fits the regression Y = Xb + e, where b is a sparse vector of coefficients with local signal clusters. Unlike other fine-mapping methods, the algorithm is model-free with respect to the coefficients b.

Usage

fineboost_normal(X, Y, M = 1000, Lmax = 5, LD = NULL, step = 0.1,
  kern_tau = 0.01, method = "LS", kernel = "L2",
  stop_thresh = 1e-04, na.rm = FALSE, intercept = TRUE,
  standardize = TRUE, coverage = 0.95, clus_thresh = 0.1,
  min_within_LD = 0.5, min_between_LD = 0.25,
  min_clus_centrality = 0.5, nmf_try = 5, verbose = TRUE)
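
As a rough illustration of the call above, the sketch below simulates a small data set under Y = Xb + e with two non-zero coefficients and passes it to fineboost_normal. The simulated data, the seed, and the object names (X, Y, b, fit) are assumptions made for this example. The call is guarded so that it only runs when the fineboostR package is installed.

```r
set.seed(1)
N <- 200   # samples
P <- 50    # features (SNPs)

# Correlated columns: each column is a noisy copy of its left neighbour,
# a crude stand-in for local LD between adjacent SNPs.
X <- matrix(0, N, P)
X[, 1] <- rnorm(N)
for (j in 2:P) {
  X[, j] <- 0.8 * X[, j - 1] + sqrt(1 - 0.8^2) * rnorm(N)
}

# Sparse coefficient vector with two causal features.
b <- rep(0, P)
b[c(10, 35)] <- c(1, -1)
Y <- as.vector(X %*% b + rnorm(N))

# Only attempt the fit when the package is available.
if (requireNamespace("fineboostR", quietly = TRUE)) {
  fit <- fineboostR::fineboost_normal(X, Y, M = 1000, Lmax = 5,
                                      verbose = FALSE)
  str(fit, max.level = 1)
}
```

All arguments not listed in the call keep the defaults shown in the Usage block.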

Arguments

X

The design matrix X (N times P) with samples/individuals along the rows and putatively correlated ordered features (SNPs) along the columns.

Y

The response vector of length N.

M

The maximum number of boosting iterations to run. Default is 1000.

Lmax

The maximum number of local signal clusters fitted. Default is 5.

LD

The external LD matrix for the P features of interest. Defaults to NULL, in which case, in-sample LD is used.
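
When LD is NULL, the matrix has to be estimated from X itself. A minimal sketch of one way to do that, assuming LD is measured as the Pearson correlation between columns of X (the package may compute it differently). The toy matrix and the name ld_in_sample are made up for this example.

```r
set.seed(2)
X <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)
X[, 2] <- X[, 1] + 0.3 * rnorm(100)   # make features 1 and 2 strongly correlated

# P x P matrix of pairwise Pearson correlations between features.
ld_in_sample <- cor(X)

round(ld_in_sample[1:3, 1:3], 2)
```

An external LD matrix passed through the LD argument would need the same P x P shape, with rows and columns in the same feature order as the columns of X.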

step

The step size used in boosting iterations. Default set to 0.1.

kern_tau

The smoothing intensity of the kernel averaging at each boosting iteration. Default set to 0.01.

method

The boosting update method: either 'LS' or 'FS', indicating the LS-Boost and FS-epsilon methods respectively. Default is 'LS' (LS-Boost).
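
The two update rules differ only in how far the selected coefficient moves. The sketch below shows one plain (kernel-free) boosting step for each, following the standard LS-Boost and FS-epsilon definitions. It is not the package's kernel-smoothed update, and boost_step and its arguments are hypothetical names. It assumes the columns of X are centred and scaled to unit norm.

```r
# One boosting step on the current residual r with coefficients b.
# Assumes unit-norm columns, so crossprod(X, r) gives the univariate
# least-squares coefficient of r on each column.
boost_step <- function(X, r, b, step = 0.1, method = c("LS", "FS")) {
  method <- match.arg(method)
  u <- drop(crossprod(X, r))
  j <- which.max(abs(u))            # feature most correlated with the residual
  # LS-Boost moves by a fraction of the LS coefficient; FS-epsilon by a fixed amount.
  delta <- if (method == "LS") step * u[j] else step * sign(u[j])
  b[j] <- b[j] + delta
  list(b = b, r = r - delta * X[, j], j = j)
}

set.seed(3)
X <- scale(matrix(rnorm(100 * 5), 100, 5))
X <- sweep(X, 2, sqrt(colSums(X^2)), "/")   # unit-norm columns
Y <- 3 * X[, 2] + rnorm(100, sd = 0.1)
Y <- Y - mean(Y)

ls_fit <- boost_step(X, r = Y, b = rep(0, 5), step = 0.1, method = "LS")
fs_fit <- boost_step(X, r = Y, b = rep(0, 5), step = 0.1, method = "FS")
```

In both cases the same feature is selected; only the size of the move along that feature differs.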

kernel

The nature of the kernel used for smoothing. Can be either 'L1', 'L2', 'epanechnikov' or 'prune'. 'L1' kernel uses a L-1 norm based kernel, 'L2' uses a L-2 norm based kernel, 'epanechnikov' uses an Epanechnikov kernel and 'prune' uses a uniform kernel on all SNPs with high LD to the optimal SNP at each boosting iteration.

stop_thresh

The stopping threshold (a small number) for the objective function; once it is attained, the boosting iterations stop automatically. Default is 1e-04.

na.rm

Drop samples with missing values in Y from both the Y and X inputs. Default set to FALSE.

intercept

Boolean; if there is an intercept in the model to fit. Defaults to TRUE.

standardize

Boolean; if the columns of X need to be standardized. Defaults to TRUE.
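
A minimal sketch of what standardizing the columns and handling an intercept usually amount to, assuming the conventional choices: centre and scale each column of X to unit standard deviation, and absorb the intercept by centring Y. Whether the package uses exactly these conventions is not stated here, and the object names are made up for this example.

```r
set.seed(4)
X <- matrix(rnorm(50 * 4, mean = 2, sd = 3), 50, 4)
Y <- rnorm(50, mean = 10)

X_std <- scale(X, center = TRUE, scale = TRUE)  # mean 0, sd 1 per column
Y_centred <- Y - mean(Y)                        # intercept absorbed by centring

round(colMeans(X_std), 10)
apply(X_std, 2, sd)
```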

coverage

A number between 0 and 1 (close to 1) specifying the coverage of the estimated signal clusters. Default set to 0.95.
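
One common way to turn per-SNP weights into a set with a given coverage is to take the highest-weight SNPs until their cumulative weight reaches the target. The sketch below illustrates that construction. It is an assumption about how coverage is typically applied, not a description of the package internals, and coverage_set and the weights are hypothetical.

```r
coverage_set <- function(w, coverage = 0.95) {
  w <- w / sum(w)                       # normalise weights to sum to 1
  ord <- order(w, decreasing = TRUE)    # strongest SNPs first
  k <- which(cumsum(w[ord]) >= coverage)[1]
  if (is.na(k)) k <- length(w)          # guard against rounding when coverage is ~1
  sort(ord[seq_len(k)])
}

w <- c(0.02, 0.60, 0.30, 0.06, 0.02)
coverage_set(w, coverage = 0.95)
```

Lowering coverage shrinks the set; raising it towards 1 pulls in progressively weaker SNPs.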

min_within_LD

The minimum value of LD permitted for SNPs within a local signal cluster. Default is 0.5.

min_between_LD

The maximum value of LD permitted between SNPs in two different local signal clusters. Default is 0.25.

nmf_try

The number of NMF initializations used to fix the confidence sets. Default is set to 5.

verbose

If verbose = TRUE, information about the objective and progress at each iteration of the kernel-based boosting procedure is reported.

min_clus_centrality

The minimum value of cluster centrality required for a SNP to make the cut in a local signal cluster. Default is set at 0.5.

min_abs_corr

Minimum of absolute value of correlation allowed in a credible set. The default, 0.5, corresponds to squared correlation of 0.25, which is a commonly used threshold for genotype data in genetics studies.
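
The correlation filter described for min_abs_corr can be sketched as follows: compute the smallest absolute pairwise correlation among the SNPs in a candidate set and keep the set only if it reaches the threshold. min_abs_corr_of_set and the toy data are hypothetical, and the package's own check may differ in detail.

```r
min_abs_corr_of_set <- function(X, set) {
  if (length(set) < 2) return(1)           # a single SNP is trivially pure
  R <- abs(cor(X[, set, drop = FALSE]))
  min(R[upper.tri(R)])                     # smallest off-diagonal |correlation|
}

set.seed(5)
z <- rnorm(300)
X <- cbind(z + 0.2 * rnorm(300),   # SNPs 1-3 share a common signal
           z + 0.2 * rnorm(300),
           z + 0.2 * rnorm(300),
           rnorm(300))             # SNP 4 is unrelated

tight <- min_abs_corr_of_set(X, c(1, 2, 3))
loose <- min_abs_corr_of_set(X, c(1, 2, 4))
c(tight = tight, loose = loose) >= 0.5
```

A set containing the unrelated SNP falls below the 0.5 threshold and would be discarded under this rule.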

Value

A "fineboost" object with the following elements:

N
P
Lmax
beta

Y = Xb + e.

beta_path
weights_path
profile_loglik
obj_path
csets

kkdey/fineboostR documentation built on Jan. 1, 2023, 4:48 p.m.