Package MKomics

Introduction

Package MKomics includes a collection of functions that I found and find useful for the analysis of omics data. It includes:

The package requires Bioconductor package limma, which can be installed via

## Install package BiocManager
install.packages("BiocManager")
## Use BiocManager to install limma
BiocManager::install("limma")

We first load the package.

library(MKomics)

Moderated Tests Based on Package limma

The functions to compute the moderated t-test and one-way ANOVA were motivated by the fact that my students had problems to understand and correctly adapt the code of the limma package [@limma].

Moderated t-Test

We generate some artificial data and compute moderated t-tests.

## One-sample test
X <- matrix(rnorm(10*20, mean = 1), nrow = 10, ncol = 20)
mod.t.test(X)

## Two-sample test
set.seed(123)
X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20),
           matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20))
g2 <- factor(c(rep("group 1", 10), rep("group 2", 10)))
mod.t.test(X, group = g2)

## Paired two-sample test
subjID <- factor(rep(1:10, 2))
mod.t.test(X, group = g2, paired = TRUE, subject = subjID)

Moderated 1-Way ANOVA

We generate some artifical data and compute moderated one-way ANOVAs.

set.seed(123)
X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20),
           matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20))
gr <- factor(c(rep("A1", 5), rep("B2", 5), rep("C3", 5), rep("D4", 5)))
mod.oneway.test(X, gr)

We can also perform a moderated one-way ANOVA with repeated measures.

X <- rbind(matrix(rnorm(6*18), nrow = 6, ncol = 18),
           matrix(rnorm(6*18, mean = 1), nrow = 6, ncol = 18))
gr <- factor(c(rep("T1", 6), rep("T2", 6), rep("T3", 6)))
subjectID <- factor(c(rep(1:6, 3)))
mod.oneway.test(X, gr, repeated = TRUE, subject = subjectID)

Pairwise moderated t-tests

As a optional post-hoc analysis after mod.oneway.test one can use pairwise moderated t-tests (only unpaired). One should carefully think about the adjustment of p values in this context.

pairwise.mod.t.test(X, gr)

Pairwise Comparisons

Often we are in a situation that we want to compare more than two groups pairwise.

FC and logFC

In the analysis of omics data, the FC or logFC are important measures of effect size and are often used in combination with (adjusted) p values [@Shi2006].

x <- rnorm(100) ## assumed as log-data
g <- factor(sample(1:4, 100, replace = TRUE))
levels(g) <- c("a", "b", "c", "d")
## modified FC
pairwise.fc(x, g)
## "true" FC
pairwise.fc(x, g, mod.fc = FALSE)
## without any transformation
pairwise.logfc(x, g)

The function returns a modified FC. That is, if the FC is smaller than 1 it is transformed to -1/FC. One can also use other functions than the mean for the aggregation of the data.

pairwise.logfc(x, g, ave = median)

Aggregating Technical Replicates

In case of omics experiments it is often the case that technical replicates are measured and hence it is part of the preprocessing of the raw data to aggregate these technical replicates. This is the purpose of function repMeans.

M <- matrix(rnorm(100), ncol = 5)
FL <- matrix(rpois(100, lambda = 10), ncol = 5) # only for this example
repMeans(x = M, flags = FL, use.flags = "max", ndups = 5, spacing = 4)

1- and 2-Way ANOVA

Functions oneWayAnova and twoWayAnova return a function that can be used to perform a 1- or 2-way ANOVA, respectively.

af <- oneWayAnova(c(rep(1,5),rep(2,5)))
## p value
af(rnorm(10))
x <- matrix(rnorm(12*10), nrow = 10)
## 2-way ANOVA with interaction
af1 <- twoWayAnova(c(rep(1,6),rep(2,6)), rep(c(rep(1,3), rep(2,3)), 2))
## p values
apply(x, 1, af1)
## 2-way ANOVA without interaction
af2 <- twoWayAnova(c(rep(1,6),rep(2,6)), rep(c(rep(1,3), rep(2,3)), 2), 
                   interaction = FALSE)
## p values
apply(x, 1, af2)

Correlation Distance Matrix

In the analysis of omics data correlation and absolute correlation distance matrices are often used during quality control. Function corDist can compute the Pearson, Spearman, Kendall or Cosine sample correlation and absolute correlation as well as the minimum covariance determinant or the orthogonalized Gnanadesikan-Kettenring correlation and absolute correlation implemented in package robustbase [@robustbase].

M <- matrix(rcauchy(1000), nrow = 5)
## Pearson
corDist(M)
## Spearman
corDist(M, method = "spearman")
## Minimum Covariance Determinant
corDist(M, method = "mcd")

MAD Matrix

In case of outliers the MAD (median absolute deviation = median of the absolute deviations from the median) may be useful as measure of scale.

madMatrix(t(M))

Similarity Matrices

First, we can plot a similarity matrix based on correlation.

M <- matrix(rnorm(1000), ncol = 20)
colnames(M) <- paste("Sample", 1:20)
M.cor <- cor(M)
corPlot(M.cor, minCor = min(M.cor), labels = colnames(M))

There is a second implementation based on package \pkg{ComplexHeatmap} [@ComplexHeatmap].

corPlot2(M.cor, minCor = min(M.cor), labels = colnames(M))

Next, we can use the MAD.

## random data
x <- matrix(rnorm(1000), ncol = 10)
## outliers
x[1:20,5] <- x[1:20,5] + 10
madPlot(x, new = TRUE, maxMAD = 2.5, labels = TRUE,
        title = "MAD: Outlier visible")
madPlot2(x, new = TRUE, maxMAD = 2.5, labels = TRUE,
        title = "MAD: Outlier visible")
## in contrast
corPlot2(x, new = TRUE, minCor = -0.5, labels = TRUE,
        title = "Correlation: Outlier masked")

Colors for Heatmaps

Nowadays there are better solutions e.g. provided by Bioconductor package ComplexHeatmap [@ComplexHeatmap]. This is my solution to get a better coloring of heatmaps.

## generate some random data
data.plot <- matrix(rnorm(100*50, sd = 1), ncol = 50)
colnames(data.plot) <- paste("patient", 1:50)
rownames(data.plot) <- paste("gene", 1:100)
data.plot[1:70, 1:30] <- data.plot[1:70, 1:30] + 3
data.plot[71:100, 31:50] <- data.plot[71:100, 31:50] - 1.4
data.plot[1:70, 31:50] <- rnorm(1400, sd = 1.2)
data.plot[71:100, 1:30] <- rnorm(900, sd = 1.2)
nrcol <- 128
## Load required packages
library(RColorBrewer)
myCol <- rev(colorRampPalette(brewer.pal(10, "RdBu"))(nrcol))
heatmap(data.plot, col =  myCol, main = "standard colors")
myCol2 <- heatmapCol(data = data.plot, col = myCol, 
                     lim = min(abs(range(data.plot)))-1)
heatmap(data.plot, col = myCol2, main = "heatmapCol colors")

String Alignment

String Distances

In Bioinformatics the (pairwise and multiple) alignment of strings is an important topic. For this one can use the distance or similarity between strings. The Hamming and the the Levenshtein (edit) distance are implemented in function stringDist [@Merkl2009].

x <- "GACGGATTATG"
y <- "GATCGGAATAG"
## Hamming distance
stringDist(x, y)
## Levenshtein distance
d <- stringDist(x, y)
d

In case of the Levenshtein (edit) distance, the respective scoring and traceback matrices are attached as attributes to the result.

attr(d, "ScoringMatrix")
attr(d, "TraceBackMatrix")

The characters in the trace-back matrix reflect insertion of a gap in string y (d: deletion), match (m), mismatch (mm), and insertion of a gap in string x (i).

String Similarities

The function stringSim computes the optimal alignment scores for global (Needleman-Wunsch) and local (Smith-Waterman) alignments with constant gap penalties [@Merkl2009]. Scoring and trace-back matrix are again attached as attributes to the results.

## optimal global alignment score
d <- stringSim(x, y)
d
attr(d, "ScoringMatrix")
attr(d, "TraceBackMatrix")

## optimal local alignment score
d <- stringSim(x, y, global = FALSE)
d
attr(d, "ScoringMatrix")
attr(d, "TraceBackMatrix")

The entry stop indicates that the minimum similarity score has been reached.

Optimal Alignment

Finally, the function traceBack computes an optimal global or local alignment based on a trace back matrix as provided by function stringDist or stringSim [@Merkl2009].

x <- "GACGGATTATG"
y <- "GATCGGAATAG"
## Levenshtein distance
d <- stringDist(x, y)
## optimal global alignment
traceBack(d)

## Optimal global alignment score
d <- stringSim(x, y)
## optimal global alignment
traceBack(d)

## Optimal local alignment score
d <- stringSim(x, y, global = FALSE)
## optimal local alignment
traceBack(d, global = FALSE)

sessionInfo

sessionInfo()

References



Try the MKomics package in your browser

Any scripts or data that you put into this service are public.

MKomics documentation built on Aug. 8, 2021, 5:06 p.m.