calcBatchEffects: Calculate the Batch Effects in a given data set

View source: R/calcBatchEffects.R

calcBatchEffectsR Documentation

Calculate the Batch Effects in a given data set

Description

Calculates for each gene in every batch the median distance to the other batches and the p-value resulting from the Kolmogorov-Smirnov test.

Usage

calcBatchEffects(data, samples, adjusted=TRUE, method="fdr",
BPPARAM=SerialParam())

Arguments

data

a data.table with one column indicating the sample, one the features and a value column indicating the beta value or a matrix with rows as features and columns as samples

samples

data frame with two columns, the first column has to contain the sample numbers, the second column has to contain the corresponding batch number. Colnames have to be named as "sample_id" and "batch_id".

adjusted

should the p-values be adjusted or not, see "method" for available adjustment methods.

method

adjustment method for p-value adjustment (if TRUE), default method is "false discovery rate adjustment", for other available methods see the description of the used standard R package p.adjust.

BPPARAM

An instance of the BiocParallelParam-class that determines how to parallelisation of the functions will be evaluated.

Details

calcBatchEffects

  1. medians Compares the median value of all beta values belonging to one batch with the median value of all beta values belonging to all other batches. Returns a matrix containing this median difference value for every gene in every batch, columns define the batch numbers, rows the gene names.

  2. p-values Compares the distribution of all beta values corresponding to one batch with the distribution of all beta values corresponding to all other batches and returns a p-value which defines if the distributions are the same or not. Standard two sided Kolmogorov-Smirnov test is used to calculate the (adjusted) p-values.

Value

a matrix containing medians and p-values for all genes in all batches

See Also

ks.test

p.adjust

correctBatchEffect

Examples

## Shortly running example. For a more realistic example that takes
## some more time, run the same procedure with the full BEclearData
## dataset.

## Calculate fdr-adjusted p-values in non-parallel mode
data(BEclearData)
ex.data <- ex.data[31:90, 7:26]
ex.samples <- ex.samples[7:26, ]


res <- calcBatchEffects(data = ex.data, samples = ex.samples, method = "fdr")

## How to handle data-sets without defined batches
## https://github.com/Livia-Rasp/BEclear/issues/22
library(data.table)
data(BEclearData)

DT <- data.table(ex.samples)[, .(sample_id)]

## set the batch_id equal to the sample_id
## this way samples are treated as batches
DT[, batch_id := sample_id]

res <- calcBatchEffects(data = ex.data, samples = DT)

David-J-R/BEclear documentation built on April 17, 2023, 2:19 p.m.