View source: R/correctBatchEffect.R
This method combines most functions of the BEclear package into one.
It performs the whole process of searching for batch effects and
automatically correcting them for a matrix of beta values stemming from
DNA methylation data.
data 
any matrix filled with beta values. Column names have to be sample IDs corresponding to the IDs listed in "samples"; row names have to be gene names. 
samples 
data frame with two columns: the first column has to contain the sample numbers, the second the corresponding batch number. The columns have to be named "sample_id" and "batch_id". 
adjusted 
whether the p-values should be adjusted; see "method" for available adjustment methods. 
method 
adjustment method for the p-values; the default
method is "false discovery rate adjustment". For other available methods see
the description of the used standard R package 
mediansTreshold 
the threshold at or above which median values are regarded as batch affected, provided the criterion for p-values is also met. 
pvaluesTreshold 
the threshold at or below which p-values are regarded as batch affected, provided the criterion for medians is also met. 
rowBlockSize 
the number of rows that is used in a block if the
function is run in parallel mode and/or not on the whole matrix. Set this,
and the "colBlockSize" parameter to 0 if you want to run the function on the
whole input matrix. See 
colBlockSize 
the number of columns that is used in a block if the
function is run in parallel mode and/or not on the whole matrix. Set this,
and the "rowBlockSize" parameter to 0 if you want to run the function on the
whole input matrix. See 
epochs 
the number of iterations used in the gradient descent algorithm
to predict the missing entries in the data matrix. See

lambda 
constant that controls the extent of regularization during the gradient descent 
gamma 
constant that controls the extent of the shift of parameters during the gradient descent 
r 
length of the second dimension of variable matrices R and L 
outputFormat 
whether the final data matrix should be saved as an .RData file
or as a tab-delimited .txt file in the specified directory.
Allowed values are "RData" and "txt".
See 
dir 
path to the directory in which the predicted matrix should be stored. The current working directory is the default. 
BPPARAM 
An instance of the

fixedSeed 
determines whether the seed should be fixed, which is important for testing 
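To make the roles of "epochs", "lambda", "gamma" and "r" more concrete, the following is a minimal sketch of a generic regularized matrix factorization fitted by stochastic gradient descent. It is not the package's internal code: the function name impute_sketch, its default values, the initialization and the toy matrix are assumptions made for illustration only. The shapes of L and R follow the description of "r" above (r is their second dimension).

```r
# Generic latent factor imputation by stochastic gradient descent.
# Hypothetical helper for illustration; NOT the BEclear implementation.
impute_sketch <- function(D, r = 2, epochs = 500, lambda = 0.01,
                          gamma = 0.05, seed = 1) {
  set.seed(seed)                                 # fixed seed, reproducible run
  n <- nrow(D)
  m <- ncol(D)
  L <- matrix(runif(n * r, 0, 0.5), n, r)        # row factors,    n x r
  R <- matrix(runif(m * r, 0, 0.5), m, r)        # column factors, m x r
  observed <- which(!is.na(D), arr.ind = TRUE)   # only known entries are fitted

  for (epoch in seq_len(epochs)) {               # epochs: number of passes
    for (k in seq_len(nrow(observed))) {
      i <- observed[k, 1]
      j <- observed[k, 2]
      err <- D[i, j] - sum(L[i, ] * R[j, ])      # residual of current estimate
      Li <- L[i, ]                               # keep old value for R update
      # gamma scales the step, lambda shrinks the factors (regularization)
      L[i, ] <- Li + gamma * (err * R[j, ] - lambda * Li)
      R[j, ] <- R[j, ] + gamma * (err * Li - lambda * R[j, ])
    }
  }

  filled <- D
  pred <- L %*% t(R)                             # reconstructed n x m matrix
  filled[is.na(D)] <- pred[is.na(D)]             # only missing entries replaced
  filled
}

# Toy rank-1 matrix with one missing entry (made-up values)
D <- outer(c(0.2, 0.4, 0.6, 0.8), c(0.5, 0.75, 1.0))
D[2, 3] <- NA
filled <- impute_sketch(D)
```

In this sketch a larger "gamma" takes bigger steps per update, a larger "lambda" pulls the factors toward zero, and "r" sets how many latent factors describe each row and column. Observed entries are left untouched; only the NA positions receive predicted values.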
correctBatchEffect
The function performs the whole process of searching for batch
effects and automatically corrects them for a matrix of beta values stemming
from DNA methylation data. In doing so, it runs most of the functions from
the BEclear package in a logical order.
First, median comparison values are calculated by the calcBatchEffects
function, followed by the calculation of p-values, also by the
calcBatchEffects function. With the results from the median comparison and
p-value calculation, a summary data frame is built using the calcSummary
function, and a scoring table is established by the calcScore function.
The entries found in the summary are then set to NA in the input matrix
using the clearBEgenes function, the imputeMissingData function is used to
predict the missing values, and finally predicted entries outside the
boundaries (values lower than 0 or greater than 1) are corrected using the
replaceOutsideValues function.
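The detection, clearing and boundary steps described above can be sketched in base R on toy values. This is an illustration under stated assumptions, not the package's code: all numbers, gene names and batch names are made up, the threshold values are chosen only for the example, and the boundary correction is assumed to move out-of-range predictions to the nearest boundary.

```r
# Toy per-gene, per-batch statistics (hypothetical numbers)
medians <- matrix(c(0.01, 0.08, 0.03, 0.10), nrow = 2,
                  dimnames = list(c("geneA", "geneB"), c("b1", "b2")))
pvalues <- matrix(c(0.50, 0.005, 0.20, 0.30), nrow = 2,
                  dimnames = dimnames(medians))

# A gene/batch pair is flagged only if BOTH criteria are met
mediansTreshold <- 0.05   # illustrative value
pvaluesTreshold <- 0.01   # illustrative value
flagged <- medians >= mediansTreshold & pvalues <= pvaluesTreshold

# Toy inputs in the documented shape: genes in rows, samples in columns
samples <- data.frame(sample_id = c("s1", "s2", "s3", "s4"),
                      batch_id  = c("b1", "b1", "b2", "b2"),
                      stringsAsFactors = FALSE)
data <- matrix(c(0.20, 0.90, 0.25, 0.85, 0.30, 0.50, 0.35, 0.55), nrow = 2,
               dimnames = list(c("geneA", "geneB"), samples$sample_id))

# Set every flagged gene to NA in all samples of the affected batch
cleared <- data
hits <- which(flagged, arr.ind = TRUE)
for (k in seq_len(nrow(hits))) {
  gene  <- rownames(flagged)[hits[k, "row"]]
  batch <- colnames(flagged)[hits[k, "col"]]
  ids   <- samples$sample_id[samples$batch_id == batch]
  cleared[gene, ids] <- NA
}

# After imputation, predictions outside [0, 1] are moved to the boundary
predicted <- c(-0.03, 0.40, 1.02)   # made-up predicted beta values
corrected <- predicted
corrected[corrected < 0] <- 0
corrected[corrected > 1] <- 1
```

Here only geneB in batch b1 satisfies both criteria, so only its two entries (samples s1 and s2) are cleared; geneB in batch b2 has a large median difference but fails the p-value criterion and is kept.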
A list containing the following fields (for detailed information see the documentation of the corresponding functions):
medians
A data.frame containing all median comparison values corresponding to the input matrix.
pvals
A data.frame containing all p-values corresponding to the input matrix.
summary
The summarized results of the median comparison and p-value calculation.
score.table
A data.frame containing the number of found genes and a BEscore for every batch.
clearedData
The input matrix with all values defined in the summary set to NA.
predictedData
The input matrix after all previously NA values have been predicted.
correctedPredictedData
The predicted matrix after the correction for predicted values outside the boundaries.
## Short running example. For a more realistic example that takes
## some more time, run the same procedure with the full BEclearData
## dataset.
## Whole procedure that has to be done to use this function.
## Correct the example data for a batch effect
library(BEclear)
data(BEclearData)
ex.data <- ex.data[31:90, 7:26]
ex.samples <- ex.samples[7:26, ]
# Note that the row and column block sizes are set to 10 only to get a
# short runtime. Otherwise, either use the default values or see the
# details section of imputeMissingData.
result <- correctBatchEffect(
  data = ex.data, samples = ex.samples,
  adjusted = TRUE, method = "fdr", rowBlockSize = 10, colBlockSize = 10,
  epochs = 50, outputFormat = "RData", dir = getwd()
)
# Extract the individual fields of the returned list
medians <- result$medians
pvals <- result$pvals
summary <- result$summary
score <- result$score.table
cleared <- result$clearedData
predicted <- result$predictedData
corrected <- result$correctedPredictedData

Loading required package: BiocParallel
INFO [2021-01-29 18:14:37] Transforming matrix to data.table
INFO [2021-01-29 18:14:37] Calculate the batch effects for 4 batches
INFO [2021-01-29 18:14:38] Adjusting p-values
INFO [2021-01-29 18:14:38] Generating a summary table
INFO [2021-01-29 18:14:38] Calculating the scores for 4 batches
INFO [2021-01-29 18:14:38] Removing values with batch effect:
INFO [2021-01-29 18:14:38] 70 values ( 5.83333333333333 % of the data) set to NA
INFO [2021-01-29 18:14:38] Starting the imputation of missing values.
INFO [2021-01-29 18:14:38] This might take a while.
INFO [2021-01-29 18:14:38] BEclear imputation is started:
INFO [2021-01-29 18:14:38] block size: 10 x 10
INFO [2021-01-29 18:14:38] Impute missing data for block 1 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 2 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 3 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 4 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 5 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 6 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 7 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 8 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 9 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 10 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 11 of 12
INFO [2021-01-29 18:14:38] Impute missing data for block 12 of 12
INFO [2021-01-29 18:14:38] Replacing values below 0 or above 1:
INFO [2021-01-29 18:14:38] 0 values replaced
