makeBoxplot: produce simple predefined boxplot for methylation data

View source: R/makeBoxplot.R

makeBoxplot R Documentation

produce simple predefined boxplot for methylation data

Description

Draws a simple boxplot with one box per batch or one box per sample. Each box shows the five-number summary of all beta values belonging to that batch or sample, respectively. The batch_ids are shown on the x-axis and are colored according to the BEscore.
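As a rough illustration of what each box summarizes, the sketch below computes five-number summaries per sample and per batch with base R's fivenum. The toy matrix, the sample and batch names, and the variable names are made up for illustration; this is not the internal code of makeBoxplot.

```r
## Toy beta-value matrix: 4 genes (rows) x 4 samples (columns).
## All values are made up; matrix() fills column by column.
betas <- matrix(
  c(0.10, 0.20, 0.30, 0.40,
    0.15, 0.25, 0.35, 0.45,
    0.50, 0.60, 0.70, 0.80,
    0.55, 0.65, 0.75, 0.85),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:4))
)

## Sample annotation in the layout expected for "samples"
samples <- data.frame(
  sample_id = paste0("s", 1:4),
  batch_id = c("b1", "b1", "b2", "b2")
)

## One summary per sample: each column is summarized on its own
perSample <- apply(betas, 2, fivenum)

## One summary per batch: pool all beta values of the batch's samples
batchOfColumn <- samples$batch_id[match(colnames(betas), samples$sample_id)]
perBatch <- lapply(
  split(seq_len(ncol(betas)), batchOfColumn),
  function(cols) fivenum(as.vector(betas[, cols, drop = FALSE]))
)

print(perSample)
print(perBatch)
```

Each summary holds the minimum, lower hinge, median, upper hinge and maximum, in that order.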

Usage

makeBoxplot(data, samples, score, bySamples=FALSE, col="standard",
main="", xlab="Batch", ylab="Beta value", scoreCol=TRUE, log = FALSE)

Arguments

data

a matrix of beta values. Column names have to be the sample_ids listed in "samples"; row names have to be gene names.

samples

data frame with two columns: the first has to contain the sample numbers, the second the corresponding batch numbers. The columns have to be named "sample_id" and "batch_id".

score

data frame produced by the calcScore function. It contains the number of presumably batch-affected genes and a BEscore, which is needed for coloring the batch_ids.

bySamples

logical. If TRUE, the boxes are separated by samples; if FALSE (the default), they are separated by batch_ids.

col

colors for the boxes, as in the standard R boxplot function. If set to "standard", boxes are colored batch-wise (if separated by samples) or the standard color "yellow" is used (if separated by batches).

main

main title for the box plot. Default is an empty string.

xlab

label for the x-axis of the box plot. Default is "Batch".

ylab

label for the y-axis of the box plot. Default is "Beta value".

scoreCol

should the batch_ids on the x-axis be colored according to the BEscore or not? If not, black is used as the color for all batch_ids.

log

TRUE, if the y-axis should be on a logarithmic scale.
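The layout requirements for "data" and "samples" described above can be checked before plotting. The following is a minimal sketch; checkInputs is a hypothetical helper written for this illustration and is not part of BEclear, and the toy objects are made up.

```r
## Hypothetical helper (not part of BEclear): stops with an error if
## the inputs do not follow the layout described under Arguments.
checkInputs <- function(data, samples) {
  stopifnot(
    is.matrix(data),
    is.data.frame(samples),
    ## exactly these two columns, in this order
    identical(colnames(samples), c("sample_id", "batch_id")),
    ## every column of the matrix must be a known sample
    !is.null(colnames(data)),
    all(colnames(data) %in% samples$sample_id)
  )
  invisible(TRUE)
}

## Made-up toy inputs: 3 genes x 2 samples
toyData <- matrix(
  c(0.1, 0.5, 0.9, 0.2, 0.4, 0.8),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
toySamples <- data.frame(
  sample_id = c("s1", "s2"),
  batch_id = c("b1", "b2")
)

checkInputs(toyData, toySamples)
```

Running such a check first tends to give a clearer error message than a failure deep inside the plotting code.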

Details

makeBoxplot

The color code for the batch_ids on the x-axis provides a simple "traffic light" that helps the user decide whether to correct for an assumed batch effect. Green means no batch effect, yellow a possible but not severe batch effect, and red an obvious batch effect that should be corrected. The traffic light colors are set according to the BEscore from the calcScore function: values from 0 to 0.02 are colored green, values from 0.02 to 0.1 yellow, and values over 0.1 red.
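The thresholds described above can be sketched with base R's cut. The function name beScoreColor is hypothetical, and the handling of values lying exactly on 0.02 or 0.1 is an assumption, since the text does not say which color applies at the boundaries; this is not necessarily how makeBoxplot implements the mapping.

```r
## Hypothetical helper (not part of BEclear): map BEscores to the
## traffic light colors. Assumption: a score exactly on a threshold
## (0.02 or 0.1) falls into the lower category.
beScoreColor <- function(beScore) {
  as.character(cut(
    beScore,
    breaks = c(0, 0.02, 0.1, Inf),
    labels = c("green", "yellow", "red"),
    include.lowest = TRUE  # so that a score of exactly 0 is green
  ))
}

## Made-up scores for four batches
exampleScores <- c(b1 = 0, b2 = 0.01, b3 = 0.05, b4 = 0.5)
exampleColors <- beScoreColor(exampleScores)
print(exampleColors)
```

A vector like this could be used to color axis labels one batch at a time.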

Value

Returns a boxplot on the graphics device with the features explained above.

See Also

calcScore

boxplot

correctBatchEffect

Examples

## Short-running example. For a more realistic example that takes
## more time, run the same procedure with the full BEclearData
## dataset.

## Whole procedure that has to be done to use this function.
library(BEclear)
data(BEclearData)
ex.data <- ex.data[31:90, 7:26]
ex.samples <- ex.samples[7:26, ]

## Prepare the data for the box plots
## Calculate the batch effects
batchEffects <- calcBatchEffects(
  data = ex.data, samples = ex.samples,
  adjusted = TRUE, method = "fdr"
)
meds <- batchEffects$med
pvals <- batchEffects$pval

## Summarize p-values and median differences for batch affected genes
## (named batchSummary to avoid masking base::sum)
batchSummary <- calcSummary(medians = meds, pvalues = pvals)

## Calculate the BEscore for the batch_id colorings of the x-axis
score <- calcScore(
  data = ex.data, samples = ex.samples, summary = batchSummary
)

## Simple boxplot for the example data separated by batch
makeBoxplot(
  data = ex.data, samples = ex.samples, score = score, bySamples = FALSE,
  main = "Some box plot"
)

## Simple boxplot for the example data separated by samples
makeBoxplot(
  data = ex.data, samples = ex.samples, score = score, bySamples = TRUE,
  main = "Some box plot"
)

uds-helms/BEclear documentation built on April 16, 2023, 12:44 a.m.