knitr::opts_chunk$set(tidy=FALSE, cache=TRUE,
                      dev="png", 
                      message=FALSE, error=FALSE, warning=FALSE)

Note: if you use fastGenoGAM in published research, please cite:

Stricker and Engelhardt, et al. (2017) GenoGAM: Genome-wide generalized additive models for ChIP-seq analysis Bioinformatics, 33:15. 10.1093/bioinformatics/btx150

Quick guide (with HDF5)

This is the brief version of the usual workflow of fastGenoGAM. It involves:

Getting started

The example dataset is restricted to the first 100kb of the first yeast chromosome.

  1. We set some required meta variables
library(fastGenoGAM)

## A.
## specify folder and experiment design path
wd <- system.file("extdata/Set1", package='fastGenoGAM')
folder <- file.path(wd, "bam")
expDesign <- file.path(wd, "experimentDesign.txt")

## B. 
## set HDF5 folder where the data will be stored
## Note, don't usually use /tmp because all your data and
## results get deleted otherwise later
hdf5_folder <- tempdir()
settings <- GenoGAMSettings(hdf5Control = list(dir = hdf5_folder))

## C.
## register parallel backend (default is "the number of cores" - 2)
## Here we also use the SnowParam backend which is advised for larger data
## For yeast MulticoreParam should also do fine
BiocParallel::register(BiocParallel::SnowParam(worker = 2))

## D.
## specify chunk and overhang size. It is possible to skip this,
## but the automatic specification would prevent us from using
## multiple tiles in such a small example.
chunkSize <- 50000
overhangSize <- 1000

## E.
## the experiment design reflecting the underlying GAM
## x = position
design <- ~ s(x) + s(x, by = genotype)
  1. Read in the data to obtain a Robject{"GenoGAMDataSet"} object. Warnings about out-of-bound ranges can be ignored, as they occur during the shifting process when the genome gets tiled. It is trimmed accordingly. I have not yet figured out how to silent those warnings. The usual suppressWarnings does not seem to work in the parallel Snow environment.
## build the GenoGAMDataSet with HDF5 backend
ggd <- GenoGAMDataSet(
  expDesign, directory = folder,
  chunkSize = chunkSize, overhangSize = overhangSize,
  design = design,
  settings = settings, hdf5 = TRUE
)

ggd
  1. Compute Size factors
## compute size factors
ggd <- computeSizeFactors(ggd)

ggd

## the data
assay(ggd)

Fitting the model

  1. Compute model with fixed hyperparameters
## compute model without parameter estimation to save time in vignette
result <- genogam(ggd, lambda = 4601, theta = 4.51)

result

## the fit and standard error
fits(result)
se(result)
  1. Compute log p-values
computeSignificance(result)

pvalue(result)

Plot results of the center 10kb, where both tiles are joined together.

ranges = GRanges("chrI", IRanges(45000, 55000))
plotGenoGAM(result, ranges = ranges)

FAQ

Armadillo Error

Problem: An error occurs during model fitting along those lines:

error: matrix multiplication: incompatible matrix dimensions: 22333830147200x5360120267008000 and 4294972496x1

Solution: First, make sure you have all Armadillo dependencies installed correctly. See here

Second, the error is most likely related to the fact, that Armadillo is using 32bit matrices, thus causing problems for large matrices fastGenoGAM is using. The solution is to enable ARMA_64BIT_WORD, which is not enabled in RcppArmadillo by default. This should have been done during compilation, but if it fails for some reason you can do it manually with #define ARMA_64BIT_WORD 1 in my_R_Directory/lib/R/library/RcppArmadillo/include/RcppArmadilloConfig.h. See here.

Acknowledgments

We thank Alexander Engelhardt, Mathilde Galinier, Simon Wood, Herv\'e Pag`es, and Martin Morgan for input in the development of fastGenoGAM

Session Info

sessionInfo()

References



gstricker/fastGenoGAM documentation built on May 17, 2019, 8:56 a.m.