Nothing
#' calcScore
#'
#' @aliases calcScore
#'
#' @title calculate batch effect score
#'
#' @seealso \code{\link{calcBatchEffects}}
#' @seealso \code{\link{calcSummary}}
#' @seealso \code{\link{correctBatchEffect}}
#'
#' @description Returns a table with the number of found genes with found
#' p-values less or equal to 0.01 and median values greater or equal to 0.05.
#' A score is calculated depending on the number of found genes as well as the
#' magnitude of the median difference values, this score is divided by the
#' overall number of genes in the data and returned as "BEscore". See details
#' for further information and details about the score calculation. The
#' returned data.frame is also stored in the specified directory as .RData file.
#'
#' @details The returned data frame contains one column for the batch numbers,
#' 11 columns containing the number of genes found in a certain range of the
#' median difference value and a column with the calculated BEscore. These
#' found genes are assumed to be batch affected due to their difference in
#' median values and their different distribution of the beta values. The higher
#' the found number of genes and the more extreme the median difference is, the
#' more severe is the assumed batch effect supposed to be. We suggest that there
#' is no need for a batch effect correction if the BEscore for a batch is less
#' than 0.02. BEscores between 0.02 and 0.1 are lying in a "gray area" for which
#' we assume a not severe batch effect, and values beyond 0.1 certainly describe
#' a batch effect and should definitely be corrected.\cr
#' The 11 columns containing the numbers of found genes count the median
#' difference values which are ranging from >= 0.05 to < 0.1 ; >= 0.1 to < 0.2;
#' >= 0.2 to < 0.3 and so on up to a limit of 1.\cr
#' The BEscore is calculated by the sum of the weighted number of genes divided
#' by the number of genes. Weightings are calculated by multiplication of the
#' number of found genes between 0.05 and 0.1 by 1, between 0.1 and 0.2 by 2,
#' between 0.2 and 0.3 by 4, between 0.3 and 0.4 by 6 and so on.
#'
#' @param data any matrix filled with beta values, column names have to be
#' sample_ids corresponding to the ids listed in "samples", row names have to
#' be gene names.
#' @param samples data frame with two columns, the first column has to contain
#' the sample numbers, the second column has to contain the corresponding batch
#' number. Colnames have to be named as "sample_id" and "batch_id".
#' @param summary a summary \code{\link[data.table]{data.table}} containing the
#' columns "gene", "batch", median" and "p-value" and consists of all genes which
#' were found in the median and p-value calculations, see
#' \code{\link{calcSummary}} function for more details.
#' @param saveAsFile determining if the data.frame should also be saved as a file
#' @param dir set the path to a directory the returned data.frame should be
#' stored. The current working directory is defined as default parameter.
#'
#' @export calcScore
#' @import futile.logger
#' @import data.table
#' @importFrom outliers dixon.test
#' @usage calcScore(data, samples, summary, saveAsFile=FALSE, dir=getwd())
#'
#' @references \insertRef{Dixon1950}{BEclear}
#' @references \insertRef{Dixon1951}{BEclear}
#' @references \insertRef{Rorabacher1991}{BEclear}
#'
#' @return A data.frame is returned containing the number of found genes assumed
#' to be batch affected separated by batch and a BEscore for every batch. Furthermore
#' there's a column dixonPval giving you a p-value regarding each BEscore according
#' to a Dixon test.
#' The data.frame is also stored in the specified directory as .RData file, if
#' saveAsFile is \code{TRUE}.
#'
#' @examples
#' ## Shortly running example. For a more realistic example that takes
#' ## some more time, run the same procedure with the full BEclearData
#' ## dataset.
#'
#' ## Whole procedure that has to be done to use this function.
#' data(BEclearData)
#' ex.data <- ex.data[31:90, 7:26]
#' ex.samples <- ex.samples[7:26, ]
#' ## Calculate the batch effects
#' batchEffects <- calcBatchEffects(data = ex.data, samples = ex.samples,
#' adjusted = TRUE, method = "fdr")
#' med <- batchEffects$med
#' pvals <- batchEffects$pval
#'
#' # Summarize p-values and median differences for batch affected genes
#' sum <- calcSummary(medians = med, pvalues = pvals)
#'
#' # Calculates the score table
#' score.table <- calcScore(data = ex.data, samples = ex.samples, summary = sum)
calcScore <- function(data, samples, summary, saveAsFile = FALSE, dir = getwd()) {
## take batch ids
batches <- unique(samples$batch_id)
flog.info(paste("Calculating the scores for", length(batches), "batches"))
## take number of genes
numGenes <- nrow(data)
## count the medians
DT <- lapply(batches, FUN = function(x, y){
data.table(batch_id = x,
count05 = y[batch_id == x & median >= 0.05 & median < 0.1, .N],
count1 = y[batch_id == x & median >= 0.1 & median < 0.2, .N],
count2 = y[batch_id == x & median >= 0.2 & median < 0.3, .N],
count3 = y[batch_id == x & median >= 0.3 & median < 0.4, .N],
count4 = y[batch_id == x & median >= 0.4 & median < 0.5, .N],
count5 = y[batch_id == x & median >= 0.5 & median < 0.6, .N],
count6 = y[batch_id == x & median >= 0.6 & median < 0.7, .N],
count7 = y[batch_id == x & median >= 0.7 & median < 0.8, .N],
count8 = y[batch_id == x & median >= 0.8 & median < 0.9, .N],
count9 = y[batch_id == x & median >= 0.9, .N])}, summary)
DT <- do.call(rbind, DT)
## calculate BEscores
DT[, BEscore := (count05 * 1 + count1* 2 + count2 * 4 + count3 * 6 + count4 * 8
+ count5 * 10 + count6 * 12 + count7 * 14 + count8 * 16
+ count9 * 18)/numGenes ]
## calculate outlier according to the dixon test
DT[, dixonPval := as.numeric(NA)]
if(sum(!is.na(DT$BEscore)) >= 3 && max(DT$BEscore) != min(DT$BEscore)){
pval <- dixon.test(DT$BEscore, two.sided = FALSE)
DT[BEscore == max(DT$BEscore), dixonPval := as.numeric(pval$p.value)]
}
if (saveAsFile) {
filename <- "score.table.Rdata"
save(DT, file = paste(dir, filename, sep = "/"))
}
return(DT)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.