gmap: Finemapping using Bayesian Linear Regression Models

View source: R/genomic_bayes.R

gmap    R Documentation

Finemapping using Bayesian Linear Regression Models

Description

In the Bayesian multiple regression model, the posterior density of the model parameters depends on the likelihood of the data given the parameters and a prior probability for the model parameters. The choice of the prior for marker effects can influence the type and extent of shrinkage induced in the model.
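
The effect of the prior on shrinkage can be illustrated with a minimal base R sketch for a single marker. It assumes a normal prior b ~ N(0, vb) and a known residual variance ve, so the posterior mean has the closed form w'y / (w'w + ve/vb). The data are simulated and all object names are illustrative; this is not the sampler used by gmap.

```r
# Toy data: one centered and scaled marker and a phenotype (all values simulated)
set.seed(1)
n <- 200
w <- as.numeric(scale(rnorm(n)))   # centered and scaled genotype
y <- 0.3 * w + rnorm(n)            # phenotype simulated with a marker effect of 0.3
y <- y - mean(y)

ve  <- 1                           # residual variance, assumed known in this sketch
ols <- sum(w * y) / sum(w * w)     # least-squares estimate (no prior)

# Posterior mean under the normal prior b ~ N(0, vb)
post_mean <- function(vb) sum(w * y) / (sum(w * w) + ve / vb)

vb_grid <- c(1, 0.01, 0.0001)      # from a vague prior to a tight prior
shrunk  <- sapply(vb_grid, post_mean)
print(round(c(ols = ols, shrunk), 4))
```

A smaller prior variance vb pulls the estimate more strongly towards zero, which is the shrinkage referred to above.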

Usage

gmap(
  y = NULL,
  X = NULL,
  W = NULL,
  stat = NULL,
  trait = NULL,
  sets = NULL,
  fit = NULL,
  Glist = NULL,
  chr = NULL,
  rsids = NULL,
  ids = NULL,
  b = NULL,
  bm = NULL,
  seb = NULL,
  mask = NULL,
  LD = NULL,
  n = NULL,
  vg = NULL,
  vb = NULL,
  ve = NULL,
  ssg_prior = NULL,
  ssb_prior = NULL,
  sse_prior = NULL,
  lambda = NULL,
  scaleY = TRUE,
  shrinkLD = FALSE,
  shrinkCor = FALSE,
  formatLD = "dense",
  pruneLD = TRUE,
  r2 = 0.05,
  checkLD = TRUE,
  h2 = NULL,
  pi = 0.001,
  updateB = TRUE,
  updateG = TRUE,
  updateE = TRUE,
  updatePi = TRUE,
  adjustE = TRUE,
  models = NULL,
  checkConvergence = FALSE,
  critVe = 3,
  critVg = 5,
  critVb = 5,
  critPi = 3,
  ntrial = 1,
  nug = 4,
  nub = 4,
  nue = 4,
  verbose = FALSE,
  msize = 100,
  threshold = NULL,
  ve_prior = NULL,
  vg_prior = NULL,
  tol = 0.001,
  nit = 100,
  nburn = 50,
  nit_local = NULL,
  nit_global = NULL,
  method = "bayesC",
  algorithm = "mcmc"
)

Arguments

y

A vector or matrix of phenotypes.

X

A matrix of covariates.

W

A matrix of centered and scaled genotypes.

stat

Dataframe with marker summary statistics.

trait

Integer used for selecting traits in covs object.

sets

A list of character vectors where each vector represents a set of items. If the names of the sets are not provided, they are named as "Set1", "Set2", etc.

fit

List of results from gbayes.

Glist

List of information about genotype matrix stored on disk.

chr

Chromosome for which to fit BLR models.

rsids

Character vector of rsids.

ids

Vector of individuals used in the study.

b

Vector or matrix of marginal marker effects.

bm

Vector or matrix of adjusted marker effects for the BLR model.

seb

Vector or matrix of standard error of marginal effects.

mask

Vector or matrix specifying if marker should be ignored.

LD

List with sparse LD matrices.

n

Scalar or vector of number of observations for each trait.

vg

Scalar or matrix of genetic (co)variances.

vb

Scalar or matrix of marker (co)variances.

ve

Scalar or matrix of residual (co)variances.

ssg_prior

Scalar or matrix of prior genetic (co)variances.

ssb_prior

Scalar or matrix of prior marker (co)variances.

sse_prior

Scalar or matrix of prior residual (co)variances.

lambda

Vector or matrix of lambda values.

scaleY

Logical indicating if y should be scaled.

shrinkLD

Logical indicating if LD should be shrunk.

shrinkCor

Logical indicating if cor should be shrunk.

formatLD

Character specifying LD format (default is "dense").

pruneLD

Logical indicating if LD pruning should be applied.

r2

Scalar providing the r2 threshold used in pruning.

checkLD

Logical indicating if it should be checked that LD matches summary statistics.

h2

Trait heritability.

pi

Proportion of markers in each marker variance class.

updateB

Logical indicating if marker (co)variances should be updated.

updateG

Logical indicating if genetic (co)variances should be updated.

updateE

Logical indicating if residual (co)variances should be updated.

updatePi

Logical indicating if pi should be updated.

adjustE

Logical indicating if residual variance should be adjusted.

models

List structure with models evaluated in bayesC.

checkConvergence

Logical indicating if convergence should be checked.

critVe

Scalar providing the z-score threshold used in checking convergence for Ve.

critVg

Scalar providing the z-score threshold used in checking convergence for Vg.

critVb

Scalar providing the z-score threshold used in checking convergence for Vb.

critPi

Scalar providing the z-score threshold used in checking convergence for Pi.

ntrial

Integer providing the number of trials used if convergence is not obtained.

nug

Scalar or vector of prior degrees of freedom for genetic (co)variances.

nub

Scalar or vector of prior degrees of freedom for marker (co)variances.

nue

Scalar or vector of prior degrees of freedom for residual (co)variances.

verbose

Logical; if TRUE, prints more details during iteration.

msize

Integer providing the number of markers used in computation of sparseld.

threshold

Scalar providing the threshold used in adjustment of B.

ve_prior

Scalar or matrix of prior residual (co)variances.

vg_prior

Scalar or matrix of prior genetic (co)variances.

tol

Convergence criterion used in gbayes.

nit

Number of iterations.

nburn

Number of burn-in iterations.

nit_local

Number of local iterations.

nit_global

Number of global iterations.

method

Method used (e.g. "bayesN", "bayesA", "bayesL", "bayesC", "bayesR").

algorithm

Specifies the algorithm.
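
The arguments b, seb and n describe marginal (single-marker) summary statistics. As a minimal sketch of where such quantities come from, the base R code below regresses a simulated phenotype on each column of a centered and scaled genotype matrix, one marker at a time. The data, the object stat_toy and its column names are illustrative assumptions; they are not necessarily the layout that gmap or glma uses.

```r
# Simulated individual-level data (illustration only)
set.seed(2)
n <- 100
m <- 5
W <- scale(matrix(rnorm(n * m), n, m))     # centered and scaled genotypes
y <- as.numeric(0.5 * W[, 2] + rnorm(n))   # phenotype driven by marker 2
y <- y - mean(y)

# Marginal effect of each marker: least-squares slope of y on that marker alone
ww <- colSums(W^2)
b  <- as.numeric(crossprod(W, y)) / ww

# Standard error of each marginal effect from that marker's residual variance
res_var <- sapply(seq_len(m), function(j) {
  e <- y - W[, j] * b[j]
  sum(e^2) / (n - 2)                       # intercept and slope are fitted
})
seb <- sqrt(res_var / ww)

# Collect the results; column names here are hypothetical
stat_toy <- data.frame(marker = paste0("m", seq_len(m)), b = b, seb = seb, n = n)
print(stat_toy)
```

Each row agrees with what lm(y ~ W[, j]) reports for the slope and its standard error.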

Details

This function implements Bayesian linear regression models to provide unified mapping of genetic variants, estimate genetic parameters (e.g. heritability), and predict disease risk. It is designed to handle various genetic architectures and scale efficiently with large datasets.
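
As a small illustration of the heritability estimate mentioned above, the sketch below computes h2 = vg / (vg + ve) from variance components. The draws are hypothetical numbers chosen for the example, not output from gmap, and the summary by posterior mean and equal-tailed interval is one common choice rather than a description of what the function reports.

```r
# Hypothetical posterior draws of the variance components (not qgg output)
vg_draws <- c(0.28, 0.31, 0.30, 0.33, 0.29)
ve_draws <- c(0.72, 0.70, 0.69, 0.68, 0.71)

# Heritability for each draw: share of the total variance that is genomic
h2_draws <- vg_draws / (vg_draws + ve_draws)

h2_mean <- mean(h2_draws)                       # posterior mean
h2_ci   <- quantile(h2_draws, c(0.025, 0.975))  # equal-tailed 95% interval
print(round(h2_mean, 3))
print(round(h2_ci, 3))
```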

Value

Returns a list structure including the following components:

bm

Vector or matrix of posterior means for marker effects.

dm

Vector or matrix of posterior means for marker inclusion probabilities.

vb

Scalar or vector of posterior means for marker variances.

vg

Scalar or vector of posterior means for genomic variances.

ve

Scalar or vector of posterior means for residual variances.

rb

Matrix of posterior means for marker correlations.

rg

Matrix of posterior means for genomic correlations.

re

Matrix of posterior means for residual correlations.

pi

Vector of posterior probabilities for models.

h2

Vector of posterior means for heritability.

param

List of current parameters used for restarting the analysis.

stat

Matrix of marker information and effects used for genomic risk scoring.
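
The components bm and dm are typically used together: dm to rank markers by their posterior inclusion probability, and bm as weights in a genomic score. The sketch below shows both steps in base R on hypothetical values. The numbers, the 0.5 cutoff and the genotype matrix W are assumptions made for illustration and do not come from a gmap fit.

```r
# Hypothetical posterior summaries for five markers (not real gmap output)
rsids <- paste0("rs", 1:5)
bm <- c(0.001, 0.120, -0.004, 0.045, 0.000)   # posterior mean effects
dm <- c(0.02, 0.93, 0.05, 0.41, 0.01)         # posterior inclusion probabilities

# Rank markers by inclusion probability, highest first
ord <- order(dm, decreasing = TRUE)
ranked <- data.frame(rsid = rsids[ord], dm = dm[ord], bm = bm[ord],
                     stringsAsFactors = FALSE)
print(ranked)

# Markers passing an illustrative cutoff of 0.5 (the cutoff is an assumption)
candidates <- ranked$rsid[ranked$dm > 0.5]
print(candidates)

# Genomic score for three hypothetical individuals: genotypes times effects
set.seed(3)
W <- matrix(rnorm(3 * 5), nrow = 3,
            dimnames = list(paste0("id", 1:3), rsids))
score <- drop(W %*% bm)
print(score)
```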

Author(s)

Peter Sørensen

Examples


library(qgg)

# Plink bed/bim/fam files
bedfiles <- system.file("extdata", paste0("sample_chr",1:2,".bed"), package = "qgg")
bimfiles <- system.file("extdata", paste0("sample_chr",1:2,".bim"), package = "qgg")
famfiles <- system.file("extdata", paste0("sample_chr",1:2,".fam"), package = "qgg")

# Prepare Glist
Glist <- gprep(study="Example", bedfiles=bedfiles, bimfiles=bimfiles, famfiles=famfiles)

# Simulate phenotype
sim <- gsim(Glist=Glist, chr=1, nt=1)

# Compute single marker summary statistics
stat <- glma(y=sim$y, Glist=Glist, scale=FALSE)
str(stat)

# Define fine-mapping regions
sets <- Glist$rsids

# Replace the chromosome labels "21" and "22" with "1" and "2"
Glist$chr[[1]] <- gsub("21","1",Glist$chr[[1]])
Glist$chr[[2]] <- gsub("22","2",Glist$chr[[2]])

# Fine map
fit <- gmap(Glist=Glist, stat=stat, sets=sets, verbose=FALSE, 
            method="bayesC", nit=1500, nburn=500, pi=0.001)
            
fit$post  # Posterior inference for every fine-mapped region
fit$conv  # Convergence statistics for every fine-mapped region

# Posterior inference for marker effect
head(fit$stat)             


psoerensen/qgg documentation built on March 9, 2024, 10:02 p.m.