batch_counts: Perform different batch corrections using limma, sva, ruvg,...
In elsayed-lab/hpgltools: A pile of (hopefully) useful R functions

batch_counts

R Documentation

Perform different batch corrections using limma, sva, ruvg, and cbcbSEQ.

Description

I found this note, which is the clearest explanation of what happens with batch effect data: https://support.bioconductor.org/p/76099/

The note reads:

Just to be clear, there's an important difference between removing a batch effect and modelling a batch effect. Including the batch in your design formula will model the batch effect in the regression step, which means that the raw data are not modified (so the batch effect is not removed), but instead the regression will estimate the size of the batch effect and subtract it out when performing all other tests. In addition, the model's residual degrees of freedom will be reduced appropriately to reflect the fact that some degrees of freedom were "spent" modelling the batch effects. This is the preferred approach for any method that is capable of using it (this includes DESeq2).

You would only remove the batch effect (e.g. using limma's removeBatchEffect function) if you were going to do some kind of downstream analysis that can't model the batch effects, such as training a classifier.

I don't have experience with ComBat, but I would expect that you run it on log-transformed CPM values, while DESeq2 expects raw counts as input. I couldn't tell you how to properly use the two methods together.
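The distinction between modelling and removing can be sketched in base R on a single simulated gene. This is only a toy illustration under stated assumptions: the data, effect sizes, and variable names are invented, and nothing here calls hpgltools, limma, or DESeq2.

```r
# Toy illustration in base R: one gene, two conditions, two batches.
# All values are simulated for the sake of the example.
set.seed(1)
condition <- factor(rep(c("control", "treated"), times = 4))
batch <- factor(rep(c("a", "b"), each = 4))
# Simulated truth: condition effect of 2, batch effect of 3 (log scale).
log_expr <- 5 + 2 * (condition == "treated") + 3 * (batch == "b") +
  rnorm(8, sd = 0.1)

# Modelling: batch is a term in the design; the data are left untouched.
modelled <- lm(log_expr ~ condition + batch)
coef(modelled)["conditiontreated"]  # estimate of the condition effect
df.residual(modelled)               # 8 samples - 3 coefficients = 5

# Removing: subtract the estimated batch term from the data themselves.
batch_shift <- coef(modelled)["batchb"] * (batch == "b")
removed <- log_expr - batch_shift

# A naive refit on the "cleaned" data forgets the degree of freedom
# that was already spent on estimating the batch effect.
naive <- lm(removed ~ condition)
df.residual(naive)                  # 8 samples - 2 coefficients = 6
```

In this balanced design both fits give the same condition estimate, but the naive refit reports one more residual degree of freedom than it should, which is why modelling the batch is preferred whenever the downstream method allows it.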

Usage

batch_counts(
  count_table,
  method = TRUE,
  expt_design = NULL,
  batch1 = "batch",
  current_state = NULL,
  current_design = NULL,
  expt_state = NULL,
  surrogate_method = NULL,
  surrogates = NULL,
  low_to_zero = FALSE,
  cpus = 4,
  batch2 = NULL,
  noscale = TRUE,
  ...
)

Arguments

`count_table`	Matrix of (pseudo)counts.
`method`	Method used for batch/surrogate estimation: a string naming the method, TRUE to use limma, or FALSE to leave the data alone.
`expt_design`	Model matrix defining the experimental conditions/batches/etc.
`batch1`	Column in the design table describing the first batch covariate to remove (default "batch").
`current_state`	Current state of the expt in an attempt to avoid double-normalization.
`current_design`	Redundant with expt_design above, but provides another place for normalize_expt() to send data.
`expt_state`	Current state of the data.
`surrogate_method`	Also redundant for normalize_expt()
`surrogates`	Number of surrogates or method to estimate them.
`low_to_zero`	Set negative entries to 0 to avoid shenanigans.
`cpus`	Number of CPUs to use when parallelizing intensive operations.
`batch2`	Column in the design table describing the second covariate to remove (only used by limma at the moment).
`noscale`	Used for combatmod; when TRUE, the scaling parameter is removed from the invocation of the modified ComBat.
`...`	More options for you!
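As a rough sketch of two of the arguments above, the following base R snippet builds a hypothetical design table with the columns that `batch1` and `batch2` would name, and shows the kind of clamping that `low_to_zero` presumably performs. The sample names, values, and the clamping line itself are assumptions for illustration, not code taken from hpgltools.

```r
# Hypothetical design table: one row per sample, with the columns that
# batch1 = "batch" and batch2 = "strain" would refer to.
design <- data.frame(
  condition = c("wt", "wt", "mut", "mut"),
  batch = c("a", "b", "a", "b"),
  strain = c("x", "y", "y", "x"),
  row.names = c("s1", "s2", "s3", "s4")
)

# A made-up "corrected" matrix in which subtraction pushed values below zero.
corrected <- matrix(
  c(10.2, -0.4, 3.1, 0.0,
    -1.5,  7.7, 2.2, 5.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), rownames(design))
)

# Assumed equivalent of low_to_zero = TRUE: clamp negative entries at zero.
clamped <- corrected
clamped[clamped < 0] <- 0
```

Negative values can appear after a correction because an estimated batch term is subtracted from values that were already small, and most count-based tools downstream will not accept them.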

Value

The 'batch corrected' count table and the new library sizes. Please remember that the library sizes which come out of this may not be what you want for voom/limma; using them could lead to spurious differential expression values.
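To see why the library sizes deserve a second look, here is a small base R sketch. It assumes that library size means the column sum of the table, and it uses a deliberately crude per-batch centring as a stand-in for a real correction; it is not what batch_counts() does internally, and all counts are made up.

```r
# Made-up counts: 3 genes by 4 samples, with s3 and s4 in a second batch.
counts <- matrix(
  c(100, 120, 210, 230,
     50,  40,  95, 110,
     10,  12,  11,   9),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("s", 1:4))
)
batch <- factor(c("a", "a", "b", "b"))

# Crude stand-in for a correction: on the log2 scale, centre each batch
# on the gene's overall mean.
log_counts <- log2(counts + 1)
batch_means <- t(apply(log_counts, 1, function(g) tapply(g, batch, mean)))
gene_means <- rowMeans(log_counts)
adjusted_log <- log_counts - batch_means[, as.character(batch)] + gene_means
adjusted <- 2^adjusted_log - 1

# Library size taken here as the column sum (an assumption).
lib_before <- colSums(counts)
lib_after <- colSums(adjusted)
round(cbind(lib_before, lib_after))
```

The column sums shift after the adjustment, so any normalization factor derived from them no longer reflects the sequencing depth of the original samples.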

Examples

## Not run: 
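 ## 'table' is a matrix of counts and 'design' describes the samples.
 ## Arguments are named so that 'design' is not taken as 'method' by position.
 limma_batch <- batch_counts(table, expt_design = design, batch1 = "batch", batch2 = "strain")
 sva_batch <- batch_counts(table, method = "sva", expt_design = design)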

## End(Not run)

elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.