# require(knitr)
knitr::opts_chunk$set(message=FALSE,warning=FALSE,comment=NA, crop = NULL)
rm(list=ls())
library(pander)
library(ggplot2)
library(gridExtra)

Introduction

The purpose of this vignette is to show the possibilities offered by the LMWiRe package.

LMWiRe stands for Linear Models for Wide Responses. This package was created to analyse models with wide responses and a multi-factor design of experiment and provide an implementation of ASCA and APCA derived methods.

The model used in this example is a three-way ANOVA with fixed effects. This document presents all the usual steps of the analysis, from importing the data to visualising the results. Details on the methods used can be found in the articles of @Thiel2017 and @Guisset2019.

Installation

The package is actually in its development stage and is available on GitHub at the url: https://github.com/bgovaerts/LMWiRe/tree/develop. It can be installed via the remotes::install_github() function. The package needs to be loaded afterwards.

# if (!requireNamespace("remotes")) install.packages(pkgs="remotes", dependencies="Depends")
# remotes::install_github("ManonMartin/LMWiRe", dependencies = TRUE)
library("LMWiRe")

Data importation

Before any analysis, the UCH data set needs to be loaded. The LMWiRe package contains the data set and it can be loaded with the data() function. The load() function is also useful to import your own data.

data("UCH")
UCH$formula <- "outcomes ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time"

Data exploration

The UCH (Urine-Citrate-Hippurate) data set is described in @Thiel2017 and @Guisset2019 and is issued form a metabolomics experiment. In this experiment, 36 samples of a pool of rat urine samples were spiked with two molecules Citrate and Hippurate according to a $3^2$ full factorial design in the quantities of these two molecules. The spiked samples were analyzed by ^1^H NMR at two different time after defrozing and over two days. Two of the spectra where finally missing at the end of the experiment.

The UCH data set is a list containing 3 elements:

For the purpose of this example, only 3 factors of interest will be studied : Quantities of Hippurate and Citrate and Time after defrozing.

Design

The design matrix contains the information about each observation for the four variables: Hippurate, Citrate, Day and Time. Only 3 of these variables are used in the model. The function plotDesign is useful to observe the design.

pander(head(UCH$design))
plotDesign(design = UCH$design, x = "Hippurate", 
           y = "Citrate", rows = "Time",
           title = "Design of the UCH dataset")

This plot confirms that the design is a full 3x3x2 factorial design replicated twice with 2 missing values. The design is then not balanced.

Outcomes visualization

The 600 response (outcomes) variables represent, for each observation, the intensities of the ^1^H NMR spectra. These spectra can be visualized by the plotLine function.

plotLine function

Here, annotation are added to the ggplot in order to highlight the Hippurate (green) and Citrate (red) peaks.

cit_peaks <- annotate("rect", xmin=c(2.509), xmax=c(2.709), 
              ymin=-Inf, ymax=Inf, alpha=0.2, 
              fill=c("tomato")) 

hip_peaks <- annotate("rect", xmin=c(7.458,3.881), xmax=c(7.935,4.041), 
              ymin=-Inf, ymax=Inf, alpha=0.2, 
              fill=c("yellowgreen"))

p1 <- plotLine(Y = UCH$outcomes,
         title = "H-NMR spectrum",
         rows = c(3),
         xlab = "ppm",
         ylab = "Intensity")

p1 + cit_peaks + hip_peaks

plotScatter function

plotScatter function allows to visualize the values of two outcomes variables in highlighting with colors or markers the values of the design factors. Here, it is used to show that the $3^2$ factorial design can be recovered from the intensities of the Hippurate and Citrate peaks in the spectra.

# xy corresponds to citrate (453) and hippurate peaks (369)
plotScatter(Y = UCH$outcomes,
            xy = c(453, 369),
            design = UCH$design,
            color = "Hippurate", 
            shape = "Citrate")
# Or
plotScatter(Y = UCH$outcomes,
            xy = c("2.6092056","3.9811536"),
            design = UCH$design,
            color = "Hippurate", 
            shape = "Citrate")

plotScatterM function

plotScatter function allows to visualize the values of a series of outcomes variables in highlighting with colors or markers the values of the design factors. It is done here for all peaks of Citrate et Hippurate

plotScatterM(Y = UCH$outcomes, cols = c(133, 145, 150, 369, 453), 
             design = UCH$design,varname.colorup = "Hippurate", 
             varname.colordown = "Citrate")

plotMeans function

plotMeans allows finally to visualize the mean values of a response variable for different levels of the design factors. Here we show the evolution of the Citrate peak higth with respect to the three design factors of interest. Note that the results of this function must be interpreted with caution when designs are unbalanced.

plotMeans(Y = UCH$outcomes,
          design = UCH$design,
          cols = c(453),
           x = c("Citrate"),
          w = c("Hippurate"),
          z = c("Time"),
          ylab = "Intensity",
          title=c("Mean reponse for main Citrate peak"))

Principal Component Analysis

The function pcaBySvd is useful to compute a PCA decomposition of the outcomes matrix. The linked usefull functions are :

ResPCA = pcaBySvd(UCH$outcomes)
pcaScreePlot(ResPCA, nPC = 6) 

The score plots show a quite effect of the three design factors which will be more clearly highlighted by ASCA and APCA.

pcaScorePlot(resPcaBySvd = ResPCA, axes = c(1,2), 
             title = "PCA scores plot: PC1 and PC2", 
             design = UCH$design,
             color = "Hippurate", shape = "Citrate",
             points_labs_rn = FALSE)

pcaScorePlot(resPcaBySvd = ResPCA, axes = c(1,2), 
             title = "PCA scores plot: PC1 and PC2", 
             design = UCH$design,
             color = "Time", shape = "Hippurate",
             points_labs_rn = FALSE)

pcaScorePlot(resPcaBySvd = ResPCA, axes = c(3,4), 
             title = "PCA scores plot: PC3 and PC4", 
             design = UCH$design,
             color = "Time", shape = "Citrate",
             points_labs_rn = FALSE)

In the first two loading plots a mix of Citrate and Hippurate peaks already appear.

p2 <- pcaLoading1dPlot(resPcaBySvd = ResPCA, axes = c(1,2), 
               title = "PCA loadings plot UCH", xlab = "ppm", 
               ylab = "Intensity")

p2 +  hip_peaks + cit_peaks

Model estimation and effect matrix decomposition

Model formula

The formula of the ANOVA-GLM model used in this analysis is the 3 ways crossed ANOVA model:

UCH$formula

Model matrix generation

The first step of ASCA+ is to build the (GLM) model matrix from the experimental design matrix and the model. Each factor is reencoded with multiple binary variables using contr.sum coding. The model matrix is a \emph(34xp) with p being the total number of parameter in the ANOVA model for one response.

The function lmwModelMatrix() encodes the design matrix as a model matrix.

resLmwModelMatrix = lmwModelMatrix(UCH)
pander::pander(head(resLmwModelMatrix$modelMatrix))

Model estimation and effect matrices decomposition

lmwEffectMatrices() is the used to estimate the GLM model and decompose the outcomes matrix effect matrices for every model term. This function calculates also type III effect contributions (in %) and generates a linked barpot.

resLmwEffectMatrices = lmwEffectMatrices(resLmwModelMatrix)

Effects importance

The contributions from each effect is outputted from lmwEffectMatrices.

pander(resLmwEffectMatrices$variationPercentages)
resLmwEffectMatrices$varPercentagesPlot

Bootstrap tests and quantification of effects importance

lmwBootstrapTests() allows to apply parametric bootstrap test to determine whether an effect is significant or not. Use first a small value of nboot (e.g. nboot=100) to develop your code and increase then it (e.g. nboot=1000) in ordre to get an accurate value for the p-values.

resLmwBootstrapTests = 
  lmwBootstrapTests(resLmwEffectMatrices = resLmwEffectMatrices, 
                                         nboot=100)

# Print P-values
pander::pander(t(resLmwBootstrapTests$resultsTable))

ASCA/APCA/ASCA-E decomposition

ASCA/APCA/ASCA-E decomposition allow to represent the information from the effect matrices in a space of reduced dimensions through PCA. The function lmwPcaEffects() has a method argument to define which method to use, namely ASCA, APCA or ASCA-E.

ASCA

The ASCA method performs PCA on the pure effect matrices.

resASCA = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, 
                        method="ASCA", 
                        combineEffects = list(c("Hippurate", "Time", 
                                                "Hippurate:Time")))

Contributions

The contribution of each principal component of the effects is estimated and reported in tables with the function lmwContributions().

resLmwContributions = lmwContributions(resASCA)
pander::pander(resLmwContributions$totalContribTable)
pander::pander(resLmwContributions$effectTable)
pander::pander(resLmwContributions$contribTable)
pander::pander(resLmwContributions$combinedEffectTable)
resLmwContributions$plotTotal
resLmwContributions$plotContrib

Scores and loadings Plots

The loadings can be represented on a line plot with the function lmwLoading1dPlot() to conveniently compare them with the original spectral profiles.

all_loadings_pl <- lmwLoading1dPlot(resASCA, 
                              effectNames = c("Hippurate", "Citrate","Time",
                                              "Hippurate:Time",
                                              "Hippurate+Time+Hippurate:Time", 
                                              "Residuals"), 
                              axes = 1, xlab = "ppm")

The score matrices are represented two components at a time on a scatterplot with the function lmwScorePlot().

Main effects

# Hippurate
hip_scores_pl <- lmwScorePlot(resASCA, effectNames = "Hippurate", 
             color = "Hippurate", shape = "Hippurate")

hip_loadings_pl <- all_loadings_pl$Hippurate +  hip_peaks

grid.arrange(hip_scores_pl,hip_loadings_pl, ncol=2)

# Citrate
cit_scores_pl <- lmwScorePlot(resASCA, effectNames = "Citrate", 
             color = "Citrate", shape = "Citrate")
cit_loadings_pl <- all_loadings_pl$Citrate +  cit_peaks

grid.arrange(cit_scores_pl,cit_loadings_pl, ncol=2)

# Time
tim_scores_pl <- lmwScorePlot(resASCA, effectNames = "Time", color = "Time", 
                              shape = "Time")

time_peaks <- annotate("rect", xmin=c(5.955364), xmax=c(6.155364), 
              ymin=-Inf, ymax=Inf, alpha=0.2, 
              fill=c("royalblue")) 

tim_loadings_pl <- all_loadings_pl$Time +  time_peaks

grid.arrange(tim_scores_pl,tim_loadings_pl, ncol=2)

Interaction Hippurate:Time

# Hippurate:Time
hiptim_scores_pl <- lmwScorePlot(resASCA, effectNames = "Hippurate:Time", 
                                 color = "Hippurate", shape = "Time")
hiptim_loadings_pl <- all_loadings_pl$`Hippurate:Time` + time_peaks + hip_peaks

grid.arrange(hiptim_scores_pl,hiptim_loadings_pl, ncol=2)

Combination of effects Hippurate+Time+Hippurate:Time

Scores and loadings of a combination of effects, here "Hippurate+Time+Hippurate:Time" can also be represented.

# Hippurate+Time+Hippurate:Time
hiptimInter_scores_pl <- lmwScorePlot(resASCA, 
                                 effectNames = "Hippurate+Time+Hippurate:Time", 
                                 color = "Hippurate", shape = "Time")

hiptimInter_loadings_pl <- all_loadings_pl$`Hippurate:Time` + 
                            time_peaks + hip_peaks

grid.arrange(hiptimInter_scores_pl,hiptimInter_loadings_pl, ncol=2)

For interaction terms of combined effects however, a better graphical representation is possible with the function lmwEffectPlot() (see below).

Model residuals

resid_scores_pl <- lmwScorePlot(resASCA, effectNames = "Residuals",
                                color = "Day", shape = "Day", 
                                drawShapes = "segment")


resid_loadings_pl <- all_loadings_pl$Residuals 

grid.arrange(resid_scores_pl,resid_loadings_pl, ncol=2)

We can also represent the scores with a matrix of plots with lmwScoreScatterPlotM(). This graph allows to observe multiple variables simultaneously.

lmwScoreScatterPlotM(resASCA,PCdim=c(1,1,1,1,1,1,1,2),
                     modelAbbrev = TRUE,
                     varname.colorup = "Citrate",
                     varname.colordown  = "Time",
                     varname.pchup="Hippurate",
                     varname.pchdown="Time",
                     title = "ASCA scores scatterplot matrix")

Finally the loadings could also be represented as a scatter plot.

# adding labels to points for the Hippurate peaks
labels = substr(colnames(UCH$outcomes),1,4)
labels[-c(369, 132, 150, 133, 149, 144, 145, 368, 151)] <- ""

lmwLoading2dPlot(resASCA, effectNames = c("Hippurate"), 
                 axes = c(1,2), points_labs = labels)

Effects plot

This function can only be applied to the ASCA method.

Main effects

lmwEffectPlot(resASCA, effectName = "Hippurate", x = "Hippurate")
lmwEffectPlot(resASCA, effectName = "Citrate", x = "Citrate")
lmwEffectPlot(resASCA, effectName = "Time", x = "Time")

Interaction Hippurate:Time

lmwEffectPlot(resASCA, effectName = "Hippurate:Time", 
              x = "Hippurate", z = "Time")
lmwEffectPlot(resASCA, effectName = "Hippurate:Time", 
              x = "Time", z = "Hippurate")

Combination of effects Hippurate+Time+Hippurate:Time

lmwEffectPlot(resASCA, effectName = "Hippurate+Time+Hippurate:Time", 
              x = "Hippurate", z = "Time")
lmwEffectPlot(resASCA, effectName = "Hippurate+Time+Hippurate:Time", 
              axes = c(1:3), x = "Time", z = "Hippurate")

APCA

The APCA method performs PCAs on the effect matrices augmented by the residuals. The same functions are used.

resAPCA = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, 
                        method="APCA")

Scores Plot

different shapes with drawShapes argument

lmwScorePlot(resAPCA, effectNames = "Hippurate", 
             color = "Hippurate", shape = "Hippurate", drawShapes = "ellipse")

lmwScorePlot(resAPCA, effectNames = "Citrate", 
             color = "Citrate", shape = "Citrate", drawShapes = "ellipse")

lmwScorePlot(resAPCA, effectNames = "Time", 
             color = "Time", shape = "Time", drawShapes = "ellipse")
lmwScorePlot(resAPCA, effectNames = "Time", 
             color = "Time", shape = "Time", drawShapes = "polygon")
lmwScorePlot(resAPCA, effectNames = "Time", 
             color = "Time", shape = "Time", drawShapes = "segment")

lmwScorePlot(resAPCA, effectNames = "Hippurate:Time", 
             color = "Hippurate", shape = "Time", drawShapes = "segment")
lmwScorePlot(resAPCA, effectNames = "Hippurate:Time", 
             color = "Hippurate", shape = "Time", drawShapes = "polygon")
lmwScoreScatterPlotM(resAPCA,
                     effectNames = c("Hippurate", "Citrate", "Time",
                                     "Hippurate:Time"),
                     modelAbbrev = TRUE,
                     varname.colorup = "Citrate",
                     varname.colordown  = "Time",
                     varname.pchup="Hippurate",
                     varname.pchdown="Time",
                     title = "APCA scores scatterplot matrix")

Loadings plot

lmwLoading1dPlot(resAPCA, effectNames = c("Hippurate", "Citrate", 
                                        "Time", "Hippurate:Time"), axes = 1)

ASCA-E

The ASCA-E method performs PCA on the effect matrices then adds the residuals to compute the augmented scores.

resASCAE = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, 
                         method="ASCA-E")

Contributions and loadings are similar to the ASCA results.

Scores Plot

lmwScorePlot(resASCAE, effectNames = "Hippurate", 
             color = "Hippurate", shape = "Hippurate")
lmwScorePlot(resASCAE, effectNames = "Citrate", 
             color = "Citrate", shape = "Citrate")
lmwScorePlot(resASCAE, effectNames = "Time", 
             color = "Time", shape = "Time")
lmwScorePlot(resASCAE, effectNames = "Hippurate:Time", 
             color = "Hippurate", shape = "Time")
lmwScoreScatterPlotM(resASCAE,
                     effectNames = c("Hippurate", "Citrate", "Time",
                                     "Hippurate:Time"),
                     modelAbbrev = TRUE,
                     varname.colorup = "Citrate",
                     varname.colordown  = "Time",
                     varname.pchup="Hippurate",
                     varname.pchdown="Time",
                     title = "ASCA-E scores scatterplot matrix")

Session info

sessionInfo()

References



bgovaerts/LMWiRe documentation built on Sept. 17, 2022, 12:32 a.m.