cleanCounts: Remove effects from covariates of interest

View source: R/cleanCounts.R

cleanCountsR Documentation

Remove effects from covariates of interest

Description

This function removes the effect from covariates of interest (such as GC content) from experimental counts

Usage

cleanCounts(object, effectNames, byNames = NULL, log = TRUE)

Arguments

object

an epigraHMMDataSet

effectNames

a character vector with the names of assays for which the effect will be removed from the experimental counts. Names in 'effectNames' must be assays stored in the epigraHMMDataSet 'object'.

byNames

a character vector with the name of an assay containing stratification variables which will be used to define stratum-specific effects. Examples of byNames assays include the 'peaks' assay from 'initializer()'. In this case, models will be fit separately for peaks and non-peaks regions. This can be useful for effects such as GC content, which are known to have a differential effect between peaks and non-peak regions. Default is NULL, i.e., effects will be removed without stratification.

log

a logical indicating if the effect from 'effectNames' should be log-transformed in the regression model (default is TRUE)

Value

An epigraHMMDataSet with an 'offset' assay filled in.

Author(s)

Pedro L. Baldoni, pedrobaldoni@gmail.com

References

https://github.com/plbaldoni/epigraHMM

Examples


# Creating dummy object
gc <- rbeta(3e3,50,50)

countData <- list('counts' = rbind(matrix(rnbinom(2e3,mu = 7.5,size = 10),ncol = 1),
                                   matrix(rnbinom(3e3,mu = exp(0.5 + 8*gc),size = 5),ncol = 1),
                                   matrix(rnbinom(2e3,mu = 7.5,size = 10),ncol = 1)),
                  'gc' = matrix(c(rbeta(2e3,50,50),gc,rbeta(2e3,50,50)),ncol = 1))

colData <- data.frame(condition = 'A', replicate = 1)
object <- epigraHMMDataSetFromMatrix(countData,colData)

# Initializing
object <- initializer(object = object,controlEM())

# Cleaning counts
object <- cleanCounts(object = object,effectNames = 'gc',byNames = 'peaks')

# Plotting the cleaned data
#par(mfrow = c(2,1))
#smoothScatter(log1p(assay(object))~assay(object,'gc'),xlab = 'gc',ylab = 'log counts')
#smoothScatter(as.numeric(log(assay(object)+1) - assay(object,'offsets'))~assay(object,'gc'),
#              xlab = 'gc',ylab = 'log cleaned counts')


plbaldoni/epigraHMM documentation built on Oct. 15, 2023, 7:53 p.m.