evaluatePCA: Evaluation of differences between batches, using PCA


Description

This function calculates the Bhattacharyya distance between the batches of a data set. The lower this distance, the more alike the batches. Depending on perBatch, the result is either a distance matrix giving the Bhattacharyya distance for every pair of batches, or the average of these distances.
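As a rough illustration of the quantity involved, the sketch below computes the Bhattacharyya distance between two multivariate normal distributions from their means and covariance matrices. It assumes the Gaussian closed form; the helper name bhattacharyya_gauss is hypothetical and is not part of the package, whose internal computation is not shown on this page.

```r
# Hypothetical helper (not from the package): Bhattacharyya distance
# between two multivariate normals N(mu1, S1) and N(mu2, S2).
bhattacharyya_gauss <- function(mu1, S1, mu2, S2) {
  S <- (S1 + S2) / 2                      # pooled (average) covariance
  d <- mu1 - mu2                          # difference of the means
  # Mahalanobis-type term: (1/8) * d' S^{-1} d
  mahal <- drop(t(d) %*% solve(S, d)) / 8
  # log-determinant helper, numerically safer than log(det(M))
  logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  # Covariance term: (1/2) * log(det(S) / sqrt(det(S1) * det(S2)))
  covterm <- 0.5 * (logdet(S) - 0.5 * (logdet(S1) + logdet(S2)))
  mahal + covterm
}

I2 <- diag(2)
d_same  <- bhattacharyya_gauss(c(0, 0), I2, c(0, 0), I2)  # identical batches
d_shift <- bhattacharyya_gauss(c(0, 0), I2, c(2, 0), I2)  # shifted mean only
```

Identical distributions give a distance of zero, and the distance grows as the means move apart or the covariances diverge.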

Usage

evaluatePCA(X, Y, npc = 2, plot = FALSE, batch.colors, scaleX = TRUE,
            legend.loc = "topright", legend.col = 2, ..., perBatch = TRUE)

Arguments

X

Data matrix: rows are samples, columns are features (metabolites in this case).

Y

Batch information: a data.frame with columns SCode, Batch and

npc

Number of PCs to include in the low-dimensional representation.

plot

Logical: should a score plot be shown?

batch.colors

Colors to be used for individual batches.

scaleX

Logical: should standardization (zero mean, unit variance) be applied to all columns? Default: TRUE.

legend.loc

Location of the legend.

legend.col

Number of columns in the legend.

...

Further graphical arguments.

perBatch

Logical: should the result be given as a distance matrix between batches (the default), or as one average distance?
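To make the roles of scaleX and npc concrete, here is a minimal sketch of the kind of preprocessing they describe: standardize the columns, run a PCA, keep the first npc score columns, and summarize each batch in that low-dimensional space. It uses simulated data and base R's prcomp; the PCA routine evaluatePCA itself relies on is not stated on this page, so treat this as an assumption.

```r
set.seed(1)
# Simulated stand-in data: 60 samples, 5 features, 3 batches of 20 samples
X <- matrix(rnorm(60 * 5), nrow = 60, ncol = 5)
batch <- factor(rep(c("B1", "B2", "B3"), each = 20))
npc <- 2

# scale. = TRUE standardizes columns to unit variance (the scaleX idea)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Low-dimensional representation: the first npc principal component scores
scores <- pca$x[, seq_len(npc), drop = FALSE]

# Per-batch mean vectors and covariance matrices in score space
by_batch <- split(as.data.frame(scores), batch)
centers  <- lapply(by_batch, colMeans)
covs     <- lapply(by_batch, cov)
```

Batch means and covariances of this kind are the inputs a Gaussian Bhattacharyya distance would need.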

Value

Returns Bhattacharyya distances between batches. If perBatch == TRUE, a distance matrix is returned, otherwise the average value of all distances is returned.
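The relation between the two return forms can be sketched with a toy symmetric distance matrix. The values below are made up for illustration, and averaging over the distinct batch pairs (the upper triangle) is one plausible reading of "the average value of all distances"; the package's exact averaging is not spelled out here.

```r
# Toy batch-by-batch distance matrix (made-up values, not package output)
batches <- c("B1", "B2", "B3")
D <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(batches, batches))

# Average over the distinct pairs only, ignoring the zero diagonal
avg_dist <- mean(D[upper.tri(D)])
```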

Author(s)

Ron Wehrens

See Also

evaluateDuplos

Examples

data(BC)
set.1.lod <- min(set.1[!is.na(set.1)])

## do correction, only first ten metabolites of set.1
set.1.corrected.Q0 <-
  apply(set.1[,1:10], 2, doBC, ref.idx = which(set.1.Y$SCode == "ref"),
        batch.idx = set.1.Y$Batch, minBsamp = 4,
        seq.idx = set.1.Y$SeqNr, method = "lm",
        imputeVal = 0)
set.1.corrected.Q2 <-
  apply(set.1[,1:10], 2, doBC, ref.idx = which(set.1.Y$SCode == "ref"),
        batch.idx = set.1.Y$Batch, minBsamp = 4,
        seq.idx = set.1.Y$SeqNr, method = "lm",
        imputeVal = set.1.lod)

huhnPCA.A0 <- evaluatePCA(set.1.corrected.Q0, set.1.Y, perBatch = FALSE,
                          plot = TRUE, legend.loc = "bottomright")
title(main = paste("Q: Interbatch distance:", round(huhnPCA.A0, 3)),
      sub = "NA imputation: 0")
huhnPCA.A2 <- evaluatePCA(set.1.corrected.Q2, set.1.Y, perBatch = FALSE,
                          plot = TRUE, legend.loc = "bottomright")
title(main = paste("Q: Interbatch distance:", round(huhnPCA.A2, 3)),
      sub = "NA imputation: LOD")

## which batches are more similar?
## (perBatch = TRUE returns the batch-by-batch distance matrix)
B2B <- evaluatePCA(set.1.corrected.Q2, set.1.Y, plot = FALSE,
                   perBatch = TRUE)
dimnames(B2B) <- list(levels(set.1.Y$Batch), levels(set.1.Y$Batch))
plot(hclust(as.dist(B2B)))
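Besides the dendrogram, the most similar pair of batches can be read directly from a batch-by-batch distance matrix. The sketch below uses a small made-up matrix named B2B_toy (hypothetical, so that it runs without the package data) in place of B2B.

```r
# Made-up distance matrix standing in for the result of perBatch = TRUE
batches <- paste0("B", 1:3)
B2B_toy <- matrix(c(0,   0.4, 1.2,
                    0.4, 0,   0.9,
                    1.2, 0.9, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(batches, batches))

# Row/column indices of all distinct pairs (lower triangle)
pair_idx <- which(lower.tri(B2B_toy), arr.ind = TRUE)

# Pick the pair with the smallest distance
closest <- pair_idx[which.min(B2B_toy[lower.tri(B2B_toy)]), ]
closest_pair <- rownames(B2B_toy)[closest]

# The same matrix can be clustered as in the example above
hc <- hclust(as.dist(B2B_toy))
```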

rwehrens/BatchCorrMetabolomics documentation built on May 28, 2019, 10:42 a.m.