# require(knitr) knitr::opts_chunk$set(message=FALSE,warning=FALSE,comment=NA, crop = NULL) rm(list=ls())
library(pander) library(ggplot2) library(gridExtra)
The purpose of this vignette is to show the possibilities offered by the LMWiRe package.
LMWiRe stands for Linear Models for Wide Responses. This package was created to analyse models with wide responses and a multi-factor design of experiment and provide an implementation of ASCA and APCA derived methods.
The model used in this example is a three-way ANOVA with fixed effects. This document presents all the usual steps of the analysis, from importing the data to visualising the results. Details on the methods used can be found in the articles of @Thiel2017 and @Guisset2019.
The package is actually in its development stage and is available on GitHub at the url: https://github.com/bgovaerts/LMWiRe/tree/develop. It can be installed via the remotes::install_github()
function. The package needs to be loaded afterwards.
# if (!requireNamespace("remotes")) install.packages(pkgs="remotes", dependencies="Depends") # remotes::install_github("ManonMartin/LMWiRe", dependencies = TRUE) library("LMWiRe")
Before any analysis, the UCH
data set needs to be loaded. The LMWiRe package contains the data set and it can be loaded with the data()
function. The load()
function is also useful to import your own data.
data("UCH") UCH$formula <- "outcomes ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time"
The UCH (Urine-Citrate-Hippurate) data set is described in @Thiel2017 and @Guisset2019 and is issued form a metabolomics experiment. In this experiment, 36 samples of a pool of rat urine samples were spiked with two molecules Citrate and Hippurate according to a $3^2$ full factorial design in the quantities of these two molecules. The spiked samples were analyzed by ^1^H NMR at two different time after defrozing and over two days. Two of the spectra where finally missing at the end of the experiment.
The UCH data set is a list containing 3 elements:
outcomes
matrix with 34 observations of 600 response variables representing the spectra from the \textsuperscript^1H-NMR$ spectroscopy, design
matrix with 34 observations and 4 explanatory variables and formula
for the General Linear Model (GLM) used. For the purpose of this example, only 3 factors of interest will be studied : Quantities of Hippurate and Citrate and Time after defrozing.
The design matrix contains the information about each observation for the four variables: Hippurate, Citrate, Day and Time. Only 3 of these variables are used in the model. The function plotDesign
is useful to observe the design.
pander(head(UCH$design)) plotDesign(design = UCH$design, x = "Hippurate", y = "Citrate", rows = "Time", title = "Design of the UCH dataset")
This plot confirms that the design is a full 3x3x2 factorial design replicated twice with 2 missing values. The design is then not balanced.
The 600 response (outcomes
) variables represent, for each observation, the intensities of the ^1^H NMR spectra. These spectra can be visualized by the plotLine
function.
plotLine
functionHere, annotation are added to the ggplot
in order to highlight the Hippurate (green) and Citrate (red) peaks.
cit_peaks <- annotate("rect", xmin=c(2.509), xmax=c(2.709), ymin=-Inf, ymax=Inf, alpha=0.2, fill=c("tomato")) hip_peaks <- annotate("rect", xmin=c(7.458,3.881), xmax=c(7.935,4.041), ymin=-Inf, ymax=Inf, alpha=0.2, fill=c("yellowgreen")) p1 <- plotLine(Y = UCH$outcomes, title = "H-NMR spectrum", rows = c(3), xlab = "ppm", ylab = "Intensity") p1 + cit_peaks + hip_peaks
plotScatter
functionplotScatter
function allows to visualize the values of two outcomes variables in highlighting with colors or markers the values of the design factors. Here, it is used to show that the $3^2$ factorial design can be recovered from the intensities of the Hippurate and Citrate peaks in the spectra.
# xy corresponds to citrate (453) and hippurate peaks (369) plotScatter(Y = UCH$outcomes, xy = c(453, 369), design = UCH$design, color = "Hippurate", shape = "Citrate") # Or plotScatter(Y = UCH$outcomes, xy = c("2.6092056","3.9811536"), design = UCH$design, color = "Hippurate", shape = "Citrate")
plotScatterM
functionplotScatter
function allows to visualize the values of a series of outcomes variables in highlighting with colors or markers the values of the design factors. It is done here for all peaks of Citrate et Hippurate
plotScatterM(Y = UCH$outcomes, cols = c(133, 145, 150, 369, 453), design = UCH$design,varname.colorup = "Hippurate", varname.colordown = "Citrate")
plotMeans
functionplotMeans
allows finally to visualize the mean values of a response variable for different levels of the design factors. Here we show the evolution of the Citrate peak higth with respect to the three design factors of interest.
Note that the results of this function must be interpreted with caution when designs are unbalanced.
plotMeans(Y = UCH$outcomes, design = UCH$design, cols = c(453), x = c("Citrate"), w = c("Hippurate"), z = c("Time"), ylab = "Intensity", title=c("Mean reponse for main Citrate peak"))
The function pcaBySvd
is useful to compute a PCA decomposition of the outcomes
matrix. The linked usefull functions are :
pcaScreePlot
to obtaine a scree plotpcaLoading1dPlot
for the loading plotspcaScorePlot
for the score plotsResPCA = pcaBySvd(UCH$outcomes) pcaScreePlot(ResPCA, nPC = 6)
The score plots show a quite effect of the three design factors which will be more clearly highlighted by ASCA and APCA.
pcaScorePlot(resPcaBySvd = ResPCA, axes = c(1,2), title = "PCA scores plot: PC1 and PC2", design = UCH$design, color = "Hippurate", shape = "Citrate", points_labs_rn = FALSE) pcaScorePlot(resPcaBySvd = ResPCA, axes = c(1,2), title = "PCA scores plot: PC1 and PC2", design = UCH$design, color = "Time", shape = "Hippurate", points_labs_rn = FALSE) pcaScorePlot(resPcaBySvd = ResPCA, axes = c(3,4), title = "PCA scores plot: PC3 and PC4", design = UCH$design, color = "Time", shape = "Citrate", points_labs_rn = FALSE)
In the first two loading plots a mix of Citrate and Hippurate peaks already appear.
p2 <- pcaLoading1dPlot(resPcaBySvd = ResPCA, axes = c(1,2), title = "PCA loadings plot UCH", xlab = "ppm", ylab = "Intensity") p2 + hip_peaks + cit_peaks
The formula of the ANOVA-GLM model used in this analysis is the 3 ways crossed ANOVA model:
UCH$formula
The first step of ASCA+ is to build the (GLM) model matrix from the experimental design matrix and the model. Each factor is reencoded with multiple binary variables using contr.sum
coding. The model matrix is a \emph(34xp) with p being the total number of parameter in the ANOVA model for one response.
The function lmwModelMatrix()
encodes the design matrix as a model matrix.
resLmwModelMatrix = lmwModelMatrix(UCH) pander::pander(head(resLmwModelMatrix$modelMatrix))
lmwEffectMatrices()
is the used to estimate the GLM model and decompose the outcomes matrix effect matrices for every model term. This function calculates also type III effect contributions (in %) and generates a linked barpot
.
resLmwEffectMatrices = lmwEffectMatrices(resLmwModelMatrix)
The contributions from each effect is outputted from lmwEffectMatrices
.
pander(resLmwEffectMatrices$variationPercentages) resLmwEffectMatrices$varPercentagesPlot
lmwBootstrapTests()
allows to apply parametric bootstrap test to determine whether an effect is significant or not. Use first a small value of nboot
(e.g. nboot=100) to develop your code and increase then it (e.g. nboot=1000) in ordre to get an accurate value for the p-values.
resLmwBootstrapTests = lmwBootstrapTests(resLmwEffectMatrices = resLmwEffectMatrices, nboot=100) # Print P-values pander::pander(t(resLmwBootstrapTests$resultsTable))
ASCA/APCA/ASCA-E decomposition allow to represent the information from the effect matrices in a space of reduced dimensions through PCA. The function lmwPcaEffects()
has a method argument to define which method to use, namely ASCA
, APCA
or ASCA-E
.
The ASCA method performs PCA on the pure effect matrices.
resASCA = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, method="ASCA", combineEffects = list(c("Hippurate", "Time", "Hippurate:Time")))
The contribution of each principal component of the effects is estimated and reported in tables with the function lmwContributions()
.
resLmwContributions = lmwContributions(resASCA)
lmwEffectMatrices
is retained here.pander::pander(resLmwContributions$totalContribTable)
pander::pander(resLmwContributions$effectTable)
pander::pander(resLmwContributions$contribTable)
effectTable
for the combination of effects mentioned in lmwPcaEffects().
pander::pander(resLmwContributions$combinedEffectTable)
resLmwContributions$plotTotal resLmwContributions$plotContrib
The loadings can be represented on a line plot with the function lmwLoading1dPlot()
to conveniently compare them with the original spectral profiles.
all_loadings_pl <- lmwLoading1dPlot(resASCA, effectNames = c("Hippurate", "Citrate","Time", "Hippurate:Time", "Hippurate+Time+Hippurate:Time", "Residuals"), axes = 1, xlab = "ppm")
The score matrices are represented two components at a time on a scatterplot with the function lmwScorePlot()
.
# Hippurate hip_scores_pl <- lmwScorePlot(resASCA, effectNames = "Hippurate", color = "Hippurate", shape = "Hippurate") hip_loadings_pl <- all_loadings_pl$Hippurate + hip_peaks grid.arrange(hip_scores_pl,hip_loadings_pl, ncol=2) # Citrate cit_scores_pl <- lmwScorePlot(resASCA, effectNames = "Citrate", color = "Citrate", shape = "Citrate") cit_loadings_pl <- all_loadings_pl$Citrate + cit_peaks grid.arrange(cit_scores_pl,cit_loadings_pl, ncol=2) # Time tim_scores_pl <- lmwScorePlot(resASCA, effectNames = "Time", color = "Time", shape = "Time") time_peaks <- annotate("rect", xmin=c(5.955364), xmax=c(6.155364), ymin=-Inf, ymax=Inf, alpha=0.2, fill=c("royalblue")) tim_loadings_pl <- all_loadings_pl$Time + time_peaks grid.arrange(tim_scores_pl,tim_loadings_pl, ncol=2)
Hippurate:Time
# Hippurate:Time hiptim_scores_pl <- lmwScorePlot(resASCA, effectNames = "Hippurate:Time", color = "Hippurate", shape = "Time") hiptim_loadings_pl <- all_loadings_pl$`Hippurate:Time` + time_peaks + hip_peaks grid.arrange(hiptim_scores_pl,hiptim_loadings_pl, ncol=2)
Hippurate+Time+Hippurate:Time
Scores and loadings of a combination of effects, here "Hippurate+Time+Hippurate:Time"
can also be represented.
# Hippurate+Time+Hippurate:Time hiptimInter_scores_pl <- lmwScorePlot(resASCA, effectNames = "Hippurate+Time+Hippurate:Time", color = "Hippurate", shape = "Time") hiptimInter_loadings_pl <- all_loadings_pl$`Hippurate:Time` + time_peaks + hip_peaks grid.arrange(hiptimInter_scores_pl,hiptimInter_loadings_pl, ncol=2)
For interaction terms of combined effects however, a better graphical representation is possible with the function lmwEffectPlot()
(see below).
resid_scores_pl <- lmwScorePlot(resASCA, effectNames = "Residuals", color = "Day", shape = "Day", drawShapes = "segment") resid_loadings_pl <- all_loadings_pl$Residuals grid.arrange(resid_scores_pl,resid_loadings_pl, ncol=2)
We can also represent the scores with a matrix of plots with lmwScoreScatterPlotM()
. This graph allows to observe multiple variables simultaneously.
lmwScoreScatterPlotM(resASCA,PCdim=c(1,1,1,1,1,1,1,2), modelAbbrev = TRUE, varname.colorup = "Citrate", varname.colordown = "Time", varname.pchup="Hippurate", varname.pchdown="Time", title = "ASCA scores scatterplot matrix")
Finally the loadings could also be represented as a scatter plot.
# adding labels to points for the Hippurate peaks labels = substr(colnames(UCH$outcomes),1,4) labels[-c(369, 132, 150, 133, 149, 144, 145, 368, 151)] <- "" lmwLoading2dPlot(resASCA, effectNames = c("Hippurate"), axes = c(1,2), points_labs = labels)
This function can only be applied to the ASCA
method.
lmwEffectPlot(resASCA, effectName = "Hippurate", x = "Hippurate") lmwEffectPlot(resASCA, effectName = "Citrate", x = "Citrate") lmwEffectPlot(resASCA, effectName = "Time", x = "Time")
Hippurate:Time
lmwEffectPlot(resASCA, effectName = "Hippurate:Time", x = "Hippurate", z = "Time") lmwEffectPlot(resASCA, effectName = "Hippurate:Time", x = "Time", z = "Hippurate")
Hippurate+Time+Hippurate:Time
lmwEffectPlot(resASCA, effectName = "Hippurate+Time+Hippurate:Time", x = "Hippurate", z = "Time") lmwEffectPlot(resASCA, effectName = "Hippurate+Time+Hippurate:Time", axes = c(1:3), x = "Time", z = "Hippurate")
The APCA method performs PCAs on the effect matrices augmented by the residuals. The same functions are used.
resAPCA = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, method="APCA")
different shapes with drawShapes
argument
lmwScorePlot(resAPCA, effectNames = "Hippurate", color = "Hippurate", shape = "Hippurate", drawShapes = "ellipse") lmwScorePlot(resAPCA, effectNames = "Citrate", color = "Citrate", shape = "Citrate", drawShapes = "ellipse") lmwScorePlot(resAPCA, effectNames = "Time", color = "Time", shape = "Time", drawShapes = "ellipse") lmwScorePlot(resAPCA, effectNames = "Time", color = "Time", shape = "Time", drawShapes = "polygon") lmwScorePlot(resAPCA, effectNames = "Time", color = "Time", shape = "Time", drawShapes = "segment") lmwScorePlot(resAPCA, effectNames = "Hippurate:Time", color = "Hippurate", shape = "Time", drawShapes = "segment") lmwScorePlot(resAPCA, effectNames = "Hippurate:Time", color = "Hippurate", shape = "Time", drawShapes = "polygon")
lmwScoreScatterPlotM(resAPCA, effectNames = c("Hippurate", "Citrate", "Time", "Hippurate:Time"), modelAbbrev = TRUE, varname.colorup = "Citrate", varname.colordown = "Time", varname.pchup="Hippurate", varname.pchdown="Time", title = "APCA scores scatterplot matrix")
lmwLoading1dPlot(resAPCA, effectNames = c("Hippurate", "Citrate", "Time", "Hippurate:Time"), axes = 1)
The ASCA-E method performs PCA on the effect matrices then adds the residuals to compute the augmented scores.
resASCAE = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, method="ASCA-E")
Contributions and loadings are similar to the ASCA results.
lmwScorePlot(resASCAE, effectNames = "Hippurate", color = "Hippurate", shape = "Hippurate") lmwScorePlot(resASCAE, effectNames = "Citrate", color = "Citrate", shape = "Citrate") lmwScorePlot(resASCAE, effectNames = "Time", color = "Time", shape = "Time") lmwScorePlot(resASCAE, effectNames = "Hippurate:Time", color = "Hippurate", shape = "Time")
lmwScoreScatterPlotM(resASCAE, effectNames = c("Hippurate", "Citrate", "Time", "Hippurate:Time"), modelAbbrev = TRUE, varname.colorup = "Citrate", varname.colordown = "Time", varname.pchup="Hippurate", varname.pchdown="Time", title = "ASCA-E scores scatterplot matrix")
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.