calcScore: calculate batch effect score

View source: R/calcScore.R

Description

Returns a table with the number of genes per batch whose p-value is less than or equal to 0.01 and whose median difference is greater than or equal to 0.05. A score is calculated from the number of these genes and the magnitude of their median differences; this score is divided by the total number of genes in the data and returned as "BEscore". See Details for how the score is calculated. If saveAsFile is TRUE, the returned data.frame is also stored in the specified directory as an .RData file.

Usage

calcScore(data, samples, summary, saveAsFile=FALSE, dir=getwd())

Arguments

data

a matrix of beta values. Column names must be the sample IDs listed in "samples"; row names must be gene names.

samples

data frame with two columns: the first must contain the sample numbers, the second the corresponding batch numbers. The columns must be named "sample_id" and "batch_id".

summary

a summary data.table with the columns "gene", "batch", "median" and "p-value", listing all genes that were found in the median and p-value calculations; see the calcSummary function for more details.

saveAsFile

logical; determines whether the returned data.frame should also be saved as a file.

dir

path to the directory in which the returned data.frame should be stored. Defaults to the current working directory.
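
As a rough illustration of the input shapes described above, the following sketch builds a toy beta-value matrix and a matching samples data frame in base R. The gene names, sample IDs, batch labels and random values are invented for this illustration and are not taken from BEclearData.

```r
# Toy inputs with the shapes described above; all names and values are made up.
set.seed(1)
sample_ids <- paste0("s", 1:6)
gene_names <- paste0("gene", 1:4)

# Beta values lie between 0 and 1; rows are genes, columns are samples.
toy_data <- matrix(
  runif(length(gene_names) * length(sample_ids)),
  nrow = length(gene_names),
  dimnames = list(gene_names, sample_ids)
)

# Two columns named exactly "sample_id" and "batch_id".
toy_samples <- data.frame(
  sample_id = sample_ids,
  batch_id = rep(c("b1", "b2"), each = 3),
  stringsAsFactors = FALSE
)

# Every column of the matrix should have a batch assignment.
stopifnot(all(colnames(toy_data) %in% toy_samples$sample_id))
```

A check like the final stopifnot call is a cheap way to catch mismatched sample IDs before running the slower batch effect calculations.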

Details

calcScore

The returned data frame contains one column with the batch numbers, 11 columns with the number of genes found in a given range of the median difference value, and one column with the calculated BEscore. The genes found are assumed to be batch affected because of their difference in median values and the different distribution of their beta values. The more genes are found and the more extreme their median differences are, the more severe the batch effect is assumed to be. We suggest that no batch effect correction is needed if the BEscore of a batch is below 0.02. BEscores between 0.02 and 0.1 lie in a "gray area" for which we assume a batch effect that is not severe, and values above 0.1 clearly indicate a batch effect that should be corrected.
The 11 count columns cover the median difference ranges >= 0.05 to < 0.1, >= 0.1 to < 0.2, >= 0.2 to < 0.3 and so on, up to a limit of 1.
The BEscore is the weighted sum of these gene counts divided by the total number of genes. The number of genes found between 0.05 and 0.1 is weighted by 1, between 0.1 and 0.2 by 2, between 0.2 and 0.3 by 4, between 0.3 and 0.4 by 6 and so on.
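
The weighting scheme above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: sketch_be_score is a hypothetical helper, the weights beyond 6 (8, 10, ..., 18) are an assumed continuation of "and so on", and the ranges as listed give ten bins in this sketch, so the exact column layout in BEclear may differ.

```r
# Hypothetical helper, not part of BEclear: score one batch from the median
# differences of its flagged genes.
sketch_be_score <- function(median_diffs, n_genes_total) {
  # Bin edges as listed in the text: [0.05, 0.1), [0.1, 0.2), ..., [0.9, 1].
  breaks <- c(0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
  # Weights 1, 2, 4, 6 come from the text; 8 to 18 are an assumption.
  weights <- c(1, 2, 4, 6, 8, 10, 12, 14, 16, 18)
  bins <- cut(abs(median_diffs), breaks = breaks,
              right = FALSE, include.lowest = TRUE)
  # Values below 0.05 fall outside all bins and are not counted.
  counts <- as.vector(table(bins))
  list(counts = counts, score = sum(counts * weights) / n_genes_total)
}

# Made-up median differences for one batch in a data set of 100 genes.
res <- sketch_be_score(c(0.06, 0.07, 0.15, 0.25, 0.95, 0.01), n_genes_total = 100)
print(res$counts)
print(res$score)
```

Dividing by the total number of genes, rather than by the number of flagged genes, keeps the score comparable between batches of the same data set.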

Value

A data.frame containing, for every batch, the number of genes assumed to be batch affected and a BEscore. A further column, dixonPval, gives a p-value for each BEscore according to a Dixon test. If saveAsFile is TRUE, the data.frame is also stored in the specified directory as an .RData file.
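
To read the BEscore column against the thresholds suggested in the Details section, a small helper along these lines may be useful. It is a sketch only: interpret_be_score and the mock table are hypothetical, only the column name BEscore is taken from the description above, and treating a score of exactly 0.1 as "gray area" is an assumption because the text does not specify the boundary.

```r
# Hypothetical helper, not part of BEclear.
interpret_be_score <- function(score) {
  ifelse(score < 0.02, "no correction needed",
    ifelse(score <= 0.1, "gray area", "correct batch effect")
  )
}

# Mock score table with made-up values; the "batch" column name is illustrative.
mock_scores <- data.frame(
  batch = c("b1", "b2", "b3"),
  BEscore = c(0.005, 0.06, 0.31),
  stringsAsFactors = FALSE
)
mock_scores$interpretation <- interpret_be_score(mock_scores$BEscore)
print(mock_scores)
```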

References

Dixon (1950)

Dixon (1951)

Rorabacher (1991)

See Also

calcBatchEffects

calcSummary

correctBatchEffect

Examples

## Quick example. For a more realistic example that takes longer,
## run the same procedure on the full BEclearData dataset.

library(BEclear)

## Full procedure required before calling this function.
data(BEclearData)
ex.data <- ex.data[31:90, 7:26]
ex.samples <- ex.samples[7:26, ]

## Calculate the batch effects
batchEffects <- calcBatchEffects(
  data = ex.data, samples = ex.samples,
  adjusted = TRUE, method = "fdr"
)
med <- batchEffects$med
pvals <- batchEffects$pval

## Summarize p-values and median differences for batch-affected genes.
## Named summaryTable to avoid masking base::sum.
summaryTable <- calcSummary(medians = med, pvalues = pvals)

## Calculate the score table
score.table <- calcScore(data = ex.data, samples = ex.samples, summary = summaryTable)

Example output

Loading required package: BiocParallel
INFO [2021-01-11 16:52:53] Transforming matrix to data.table
INFO [2021-01-11 16:52:53] Calculate the batch effects for 4 batches
INFO [2021-01-11 16:52:54] Adjusting p-values
INFO [2021-01-11 16:52:54] Generating a summary table
INFO [2021-01-11 16:52:54] Calculating the scores for 4 batches

BEclear documentation built on Nov. 8, 2020, 8:07 p.m.