sbl: Sparse Bayesian Learning for QTL Mapping and Genome-Wide Association Studies"

knitr::opts_chunk$set(echo = TRUE)

Introduction

Single marker models that detecting one locus at a time are subject to many problems in genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping, which includes large matrix inversion, over-conservativeness after Bonferroni correction and difficulty in evaluation of total genetic contribution. Such problems can be solved by a multiple locus model which includes all markers in the same model with effects being estimated simultaneously. The sparse Bayesian learning method (SBL), implemented in sbl package, is a multiple locus model that can handle extremely large sample size (>100,000) and outcompetes other multiple locus GWAS methods in terms of detection power and computing time.

sbl package installation

sbl can be downloaded and installed locally. The download link is here.

install.packages('sbl_0.1.0.tar.gz', repos=NULL, type='source')

SBL for QTL mapping and GWAS

The usage of sbl to perform QTL mapping and GWAS is very simple (Note: Please remove the markers without variation before running the program):

  1. Load input data
    • A vector of response variables (observations of the trait).
    • A design matrix for fixed effects, which represents non-genetic factors that contribute to the phenotypic variance. This can be the intercept only.
    • A matrix storing genotype indicators. The original three genotypes ($a_1a_1$, $a_1a_2$, $a_2a_2$) should be coded to numeric indicator as (-1, 0, 1) or (0, 1, 2).
library('sbl')

# load example data
data(phe)
data(intercept)
data(gen)
  1. Call sblgwas() function to perform regression and detect significant markers.
# A minimal invocation of "sblgwas()" function looks like: 
fit1<-sblgwas(x = intercept, y = phe, z = gen)

# Restuls of markers surrounding the second simulated QTL with non-zero effect in the example data
fit1$blup[c(17:21),]

Users can arbitrarily set the value of t between [0, 2] to control the sparseness of model, default is -1.

# Setting t = 0 leads to the most sparse model  
fit2<-sblgwas(x = intercept, y = phe, z = gen, t = 0)
fit2$parm

# Setting t = -2 leads to the least sparse model  
fit3<-sblgwas(x = intercept, y = phe, z = gen, t = -2)
fit3$parm

max.iter and min.err are two thresholds to stop the program when either of them is met. max.iter defines the maximum number of iterations that the program is allowed to run, default is 200; min.err defines the minimum threshold of mean squared error of random effects estimated from the current and the previous iteration, default is 1e-6.

# Set max.iter and min.err to control the convergence of the program 
fit4<-sblgwas(x = intercept, y = phe, z = gen, t = -1, max.iter = 300, min.err = 1e-8)
fit4$parm


Try the sbl package in your browser

Any scripts or data that you put into this service are public.

sbl documentation built on May 1, 2019, 10:06 p.m.