BEclear-package: Correction of batch effects in DNA methylation data
In BEclear: Correction of batch effects in DNA methylation data

Description Details Author(s) References See Also Examples

Provides some functions to detect and correct for batch effects in DNA methylation data. The core function BEclear is based on Latent Factor Models and can also be used to predict missing values in any other matrix containing real numbers.

BEclear-package

correctBatchEffect: The function combines most functions of the BEclear-package to one. This function performs the whole process of searching for batch effects and automatically correct them for a matrix of beta values stemming from DNA methylation data.
BEclear: This function predicts the missing entries of an input matrix (NA values) through the use of a Latent Factor Model.
calcBatchEffects: Compares the median value of all beta values belonging to one batch with the median value of all beta values belonging to all other batches. Returns a matrix containing this median difference value for every gene in every batch, columns define the batch numbers, rows the gene names.
And compares the distribution of all beta values corresponding to one batch with the distribution of all beta values corresponding to all other batches and returns a p-value which defines if the distributions are the same or not.
calcSummary: Summarizes the results of the calcBatchEffects function
calcScore: Returns a table with the number of found genes with found p-values less or equal to 0.01 and median values greater or equal to 0.05. A score is calculated depending on the number of found genes as well as the magnitude of the median difference values, this score is divided by the overall number of genes in the data and returned as "BEscore". See the methods details for further information and details about the score calculation.
makeBoxplot: A simple boxplot is done with boxes either separated by batches or by samples and describe the five number summary of all beta values corresponding to a batch or a sample, respectively. The batch_ids are shown on the x-axis with a coloring corresponding to the BEscore.
clearBEgenes: A function that simply sets all values to NA which were previously found by median value comparison and p-value calculation and are stored in a summary. The summary defines which values in the data matrix are set to NA.
countValuesToPredict: Simple function that counts all values in a matrix which are NA
findOutsideValues: A method which lists values below 0 or beyond 1 contained in the input matrix. These entries are stored in a data.frame together with the corresponding row and column position of the matrix.
replaceOutsideValues: A method which replaces values below 0 or beyond 1 contained in the input matrix. These entries outside the boundaries are replaced by 0 or 1, respectively.

Ruslan Akulenko, Markus Merl, David Rasp

\insertRef

Akulenko2016BEclear

Useful links:

https://github.com/David-J-R/BEclear
Report bugs at https://github.com/David-J-R/BEclear/issues

data(BEclearData)
## Calculate the batch effects
batchEffects <- calcBatchEffects(data = ex.data, samples = ex.samples,
adjusted = TRUE, method = "fdr")
med <- batchEffects$med
pvals <- batchEffects$pval

## Summarize p-values and median differences for batch affected genes
sum <- calcSummary(medians = med, pvalues = pvals)

## Calculates the score table
score.table <- calcScore(data = ex.data, samples = ex.samples, summary = sum)

## Simple boxplot for the example data separated by batch
makeBoxplot(
  data = ex.data, samples = ex.samples, score = score.table,
  bySamples = FALSE, main = "Some box plot"
)

## Simple boxplot for the example data separated by samples
makeBoxplot(
  data = ex.data, samples = ex.samples, score = score.table,
  bySamples = TRUE, main = "Some box plot"
)

## Sets assumed batch affected entries to NA
cleared <- clearBEgenes(data = ex.data, samples = ex.samples, summary = sum)
## Counts and stores number of entries to predict
numberOfEntries <- countValuesToPredict(data = cleared)
## Not run: 
## Predicts the missing entries
predicted <- imputeMissingData(data = cleared)

## Find predicted entries outside the boundaries
outsideEntries <- findOutsideValues(data = predicted)

## Replace predicted entries outside the boundaries
corrected <- replaceOutsideValues(data = predicted)

## End(Not run)