In bgovaerts/LMWiRe: Linear Models for Wide Responses

# require(knitr)
knitr::opts_chunk$set(message=FALSE,warning=FALSE,comment=NA, crop = NULL)
rm(list=ls())

library(pander)
library(ggplot2)
library(gridExtra)

Introduction

The purpose of this vignette is to show the possibilities offered by the LMWiRe package.

LMWiRe stands for Linear Models for Wide Responses. This package was created to analyse models with wide responses and a multi-factor design of experiment and provide an implementation of ASCA and APCA derived methods.

The model used in this example is a three-way ANOVA with fixed effects. This document presents all the usual steps of the analysis, from importing the data to visualising the results. Details on the methods used can be found in the articles of @Thiel2017 and @Guisset2019.

Installation

The package is actually in its development stage and is available on GitHub at the url: https://github.com/bgovaerts/LMWiRe/tree/develop. It can be installed via the remotes::install_github() function. The package needs to be loaded afterwards.

# if (!requireNamespace("remotes")) install.packages(pkgs="remotes", dependencies="Depends")
# remotes::install_github("ManonMartin/LMWiRe", dependencies = TRUE)
library("LMWiRe")

Data importation

Before any analysis, the UCH data set needs to be loaded. The LMWiRe package contains the data set and it can be loaded with the data() function. The load() function is also useful to import your own data.

data("UCH")
UCH$formula <- "outcomes ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time"

Data exploration

The UCH (Urine-Citrate-Hippurate) data set is described in @Thiel2017 and @Guisset2019 and is issued form a metabolomics experiment. In this experiment, 36 samples of a pool of rat urine samples were spiked with two molecules Citrate and Hippurate according to a $3^2$ full factorial design in the quantities of these two molecules. The spiked samples were analyzed by ^1^H NMR at two different time after defrozing and over two days. Two of the spectra where finally missing at the end of the experiment.

The UCH data set is a list containing 3 elements:

an outcomes matrix with 34 observations of 600 response variables representing the spectra from the \textsuperscript^1H-NMR$ spectroscopy,
a design matrix with 34 observations and 4 explanatory variables and
a formula for the General Linear Model (GLM) used.

For the purpose of this example, only 3 factors of interest will be studied : Quantities of Hippurate and Citrate and Time after defrozing.

Design

The design matrix contains the information about each observation for the four variables: Hippurate, Citrate, Day and Time. Only 3 of these variables are used in the model. The function plotDesign is useful to observe the design.

pander(head(UCH$design))
plotDesign(design = UCH$design, x = "Hippurate", 
           y = "Citrate", rows = "Time",
           title = "Design of the UCH dataset")

This plot confirms that the design is a full 3x3x2 factorial design replicated twice with 2 missing values. The design is then not balanced.

Outcomes visualization

The 600 response (outcomes) variables represent, for each observation, the intensities of the ^1^H NMR spectra. These spectra can be visualized by the plotLine function.

`plotLine` function

Here, annotation are added to the ggplot in order to highlight the Hippurate (green) and Citrate (red) peaks.

cit_peaks <- annotate("rect", xmin=c(2.509), xmax=c(2.709), 
              ymin=-Inf, ymax=Inf, alpha=0.2, 
              fill=c("tomato")) 

hip_peaks <- annotate("rect", xmin=c(7.458,3.881), xmax=c(7.935,4.041), 
              ymin=-Inf, ymax=Inf, alpha=0.2, 
              fill=c("yellowgreen"))

p1 <- plotLine(Y = UCH$outcomes,
         title = "H-NMR spectrum",
         rows = c(3),
         xlab = "ppm",
         ylab = "Intensity")

p1 + cit_peaks + hip_peaks

`plotScatter` function

plotScatter function allows to visualize the values of two outcomes variables in highlighting with colors or markers the values of the design factors. Here, it is used to show that the $3^2$ factorial design can be recovered from the intensities of the Hippurate and Citrate peaks in the spectra.

# xy corresponds to citrate (453) and hippurate peaks (369)
plotScatter(Y = UCH$outcomes,
            xy = c(453, 369),
            design = UCH$design,
            color = "Hippurate", 
            shape = "Citrate")
# Or
plotScatter(Y = UCH$outcomes,
            xy = c("2.6092056","3.9811536"),
            design = UCH$design,
            color = "Hippurate", 
            shape = "Citrate")

`plotScatterM` function

plotScatter function allows to visualize the values of a series of outcomes variables in highlighting with colors or markers the values of the design factors. It is done here for all peaks of Citrate et Hippurate

plotScatterM(Y = UCH$outcomes, cols = c(133, 145, 150, 369, 453), 
             design = UCH$design,varname.colorup = "Hippurate", 
             varname.colordown = "Citrate")

`plotMeans` function

plotMeans allows finally to visualize the mean values of a response variable for different levels of the design factors. Here we show the evolution of the Citrate peak higth with respect to the three design factors of interest. Note that the results of this function must be interpreted with caution when designs are unbalanced.

plotMeans(Y = UCH$outcomes,
          design = UCH$design,
          cols = c(453),
           x = c("Citrate"),
          w = c("Hippurate"),
          z = c("Time"),
          ylab = "Intensity",
          title=c("Mean reponse for main Citrate peak"))

Principal Component Analysis

The function pcaBySvd is useful to compute a PCA decomposition of the outcomes matrix. The linked usefull functions are :

pcaScreePlot to obtaine a scree plot
pcaLoading1dPlot for the loading plots
pcaScorePlot for the score plots

ResPCA = pcaBySvd(UCH$outcomes)
pcaScreePlot(ResPCA, nPC = 6)

The score plots show a quite effect of the three design factors which will be more clearly highlighted by ASCA and APCA.

pcaScorePlot(resPcaBySvd = ResPCA, axes = c(1,2), 
             title = "PCA scores plot: PC1 and PC2", 
             design = UCH$design,
             color = "Hippurate", shape = "Citrate",
             points_labs_rn = FALSE)

pcaScorePlot(resPcaBySvd = ResPCA, axes = c(1,2), 
             title = "PCA scores plot: PC1 and PC2", 
             design = UCH$design,
             color = "Time", shape = "Hippurate",
             points_labs_rn = FALSE)

pcaScorePlot(resPcaBySvd = ResPCA, axes = c(3,4), 
             title = "PCA scores plot: PC3 and PC4", 
             design = UCH$design,
             color = "Time", shape = "Citrate",
             points_labs_rn = FALSE)

In the first two loading plots a mix of Citrate and Hippurate peaks already appear.

p2 <- pcaLoading1dPlot(resPcaBySvd = ResPCA, axes = c(1,2), 
               title = "PCA loadings plot UCH", xlab = "ppm", 
               ylab = "Intensity")

p2 +  hip_peaks + cit_peaks

Model estimation and effect matrix decomposition

Model formula

The formula of the ANOVA-GLM model used in this analysis is the 3 ways crossed ANOVA model:

UCH$formula

Model matrix generation

The first step of ASCA+ is to build the (GLM) model matrix from the experimental design matrix and the model. Each factor is reencoded with multiple binary variables using contr.sum coding. The model matrix is a \emph(34xp) with p being the total number of parameter in the ANOVA model for one response.

The function lmwModelMatrix() encodes the design matrix as a model matrix.

resLmwModelMatrix = lmwModelMatrix(UCH)
pander::pander(head(resLmwModelMatrix$modelMatrix))

Model estimation and effect matrices decomposition

lmwEffectMatrices() is the used to estimate the GLM model and decompose the outcomes matrix effect matrices for every model term. This function calculates also type III effect contributions (in %) and generates a linked barpot.

resLmwEffectMatrices = lmwEffectMatrices(resLmwModelMatrix)

Effects importance

The contributions from each effect is outputted from lmwEffectMatrices.

pander(resLmwEffectMatrices$variationPercentages)
resLmwEffectMatrices$varPercentagesPlot

Bootstrap tests and quantification of effects importance

lmwBootstrapTests() allows to apply parametric bootstrap test to determine whether an effect is significant or not. Use first a small value of nboot (e.g. nboot=100) to develop your code and increase then it (e.g. nboot=1000) in ordre to get an accurate value for the p-values.

resLmwBootstrapTests = 
  lmwBootstrapTests(resLmwEffectMatrices = resLmwEffectMatrices, 
                                         nboot=100)

# Print P-values
pander::pander(t(resLmwBootstrapTests$resultsTable))

ASCA/APCA/ASCA-E decomposition

ASCA/APCA/ASCA-E decomposition allow to represent the information from the effect matrices in a space of reduced dimensions through PCA. The function lmwPcaEffects() has a method argument to define which method to use, namely ASCA, APCA or ASCA-E.

ASCA

The ASCA method performs PCA on the pure effect matrices.

resASCA = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, 
                        method="ASCA", 
                        combineEffects = list(c("Hippurate", "Time", 
                                                "Hippurate:Time")))

Contributions

The contribution of each principal component of the effects is estimated and reported in tables with the function lmwContributions().

resLmwContributions = lmwContributions(resASCA)

The table of the percentage of contribution of each effect to the total variance outputted from lmwEffectMatrices is retained here.

pander::pander(resLmwContributions$totalContribTable)

Table of the variance percentage explained by each Principal Component in each model effect decomposition.

pander::pander(resLmwContributions$effectTable)

Table of the variance percentage explained by each Principal Component of each effect reported to the percentage contribution of the given effect to the total variance.

pander::pander(resLmwContributions$contribTable)

Equivalent of the effectTable for the combination of effects mentioned in lmwPcaEffects().

pander::pander(resLmwContributions$combinedEffectTable)

Moreover the function also outputs a barplot either with the ordered contributions per effect or across all the PCs of the different effects.

resLmwContributions$plotTotal
resLmwContributions$plotContrib

Scores and loadings Plots

The loadings can be represented on a line plot with the function lmwLoading1dPlot() to conveniently compare them with the original spectral profiles.

all_loadings_pl <- lmwLoading1dPlot(resASCA, 
                              effectNames = c("Hippurate", "Citrate","Time",
                                              "Hippurate:Time",
                                              "Hippurate+Time+Hippurate:Time", 
                                              "Residuals"), 
                              axes = 1, xlab = "ppm")

The score matrices are represented two components at a time on a scatterplot with the function lmwScorePlot().

Main effects

# Hippurate
hip_scores_pl <- lmwScorePlot(resASCA, effectNames = "Hippurate", 
             color = "Hippurate", shape = "Hippurate")

hip_loadings_pl <- all_loadings_pl$Hippurate +  hip_peaks

grid.arrange(hip_scores_pl,hip_loadings_pl, ncol=2)

# Citrate
cit_scores_pl <- lmwScorePlot(resASCA, effectNames = "Citrate", 
             color = "Citrate", shape = "Citrate")
cit_loadings_pl <- all_loadings_pl$Citrate +  cit_peaks

grid.arrange(cit_scores_pl,cit_loadings_pl, ncol=2)

# Time
tim_scores_pl <- lmwScorePlot(resASCA, effectNames = "Time", color = "Time", 
                              shape = "Time")

time_peaks <- annotate("rect", xmin=c(5.955364), xmax=c(6.155364), 
              ymin=-Inf, ymax=Inf, alpha=0.2, 
              fill=c("royalblue")) 

tim_loadings_pl <- all_loadings_pl$Time +  time_peaks

grid.arrange(tim_scores_pl,tim_loadings_pl, ncol=2)

Interaction `Hippurate:Time`

# Hippurate:Time
hiptim_scores_pl <- lmwScorePlot(resASCA, effectNames = "Hippurate:Time", 
                                 color = "Hippurate", shape = "Time")
hiptim_loadings_pl <- all_loadings_pl$`Hippurate:Time` + time_peaks + hip_peaks

grid.arrange(hiptim_scores_pl,hiptim_loadings_pl, ncol=2)

Combination of effects `Hippurate+Time+Hippurate:Time`

Scores and loadings of a combination of effects, here "Hippurate+Time+Hippurate:Time" can also be represented.

# Hippurate+Time+Hippurate:Time
hiptimInter_scores_pl <- lmwScorePlot(resASCA, 
                                 effectNames = "Hippurate+Time+Hippurate:Time", 
                                 color = "Hippurate", shape = "Time")

hiptimInter_loadings_pl <- all_loadings_pl$`Hippurate:Time` + 
                            time_peaks + hip_peaks

grid.arrange(hiptimInter_scores_pl,hiptimInter_loadings_pl, ncol=2)

For interaction terms of combined effects however, a better graphical representation is possible with the function lmwEffectPlot() (see below).

Model residuals

resid_scores_pl <- lmwScorePlot(resASCA, effectNames = "Residuals",
                                color = "Day", shape = "Day", 
                                drawShapes = "segment")


resid_loadings_pl <- all_loadings_pl$Residuals 

grid.arrange(resid_scores_pl,resid_loadings_pl, ncol=2)

We can also represent the scores with a matrix of plots with lmwScoreScatterPlotM(). This graph allows to observe multiple variables simultaneously.

lmwScoreScatterPlotM(resASCA,PCdim=c(1,1,1,1,1,1,1,2),
                     modelAbbrev = TRUE,
                     varname.colorup = "Citrate",
                     varname.colordown  = "Time",
                     varname.pchup="Hippurate",
                     varname.pchdown="Time",
                     title = "ASCA scores scatterplot matrix")

Finally the loadings could also be represented as a scatter plot.

# adding labels to points for the Hippurate peaks
labels = substr(colnames(UCH$outcomes),1,4)
labels[-c(369, 132, 150, 133, 149, 144, 145, 368, 151)] <- ""

lmwLoading2dPlot(resASCA, effectNames = c("Hippurate"), 
                 axes = c(1,2), points_labs = labels)

Effects plot

This function can only be applied to the ASCA method.

Main effects

lmwEffectPlot(resASCA, effectName = "Hippurate", x = "Hippurate")
lmwEffectPlot(resASCA, effectName = "Citrate", x = "Citrate")
lmwEffectPlot(resASCA, effectName = "Time", x = "Time")

Interaction `Hippurate:Time`

lmwEffectPlot(resASCA, effectName = "Hippurate:Time", 
              x = "Hippurate", z = "Time")
lmwEffectPlot(resASCA, effectName = "Hippurate:Time", 
              x = "Time", z = "Hippurate")

Combination of effects `Hippurate+Time+Hippurate:Time`

lmwEffectPlot(resASCA, effectName = "Hippurate+Time+Hippurate:Time", 
              x = "Hippurate", z = "Time")
lmwEffectPlot(resASCA, effectName = "Hippurate+Time+Hippurate:Time", 
              axes = c(1:3), x = "Time", z = "Hippurate")

APCA

The APCA method performs PCAs on the effect matrices augmented by the residuals. The same functions are used.

resAPCA = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, 
                        method="APCA")

Scores Plot

different shapes with drawShapes argument

lmwScorePlot(resAPCA, effectNames = "Hippurate", 
             color = "Hippurate", shape = "Hippurate", drawShapes = "ellipse")

lmwScorePlot(resAPCA, effectNames = "Citrate", 
             color = "Citrate", shape = "Citrate", drawShapes = "ellipse")

lmwScorePlot(resAPCA, effectNames = "Time", 
             color = "Time", shape = "Time", drawShapes = "ellipse")
lmwScorePlot(resAPCA, effectNames = "Time", 
             color = "Time", shape = "Time", drawShapes = "polygon")
lmwScorePlot(resAPCA, effectNames = "Time", 
             color = "Time", shape = "Time", drawShapes = "segment")

lmwScorePlot(resAPCA, effectNames = "Hippurate:Time", 
             color = "Hippurate", shape = "Time", drawShapes = "segment")
lmwScorePlot(resAPCA, effectNames = "Hippurate:Time", 
             color = "Hippurate", shape = "Time", drawShapes = "polygon")

lmwScoreScatterPlotM(resAPCA,
                     effectNames = c("Hippurate", "Citrate", "Time",
                                     "Hippurate:Time"),
                     modelAbbrev = TRUE,
                     varname.colorup = "Citrate",
                     varname.colordown  = "Time",
                     varname.pchup="Hippurate",
                     varname.pchdown="Time",
                     title = "APCA scores scatterplot matrix")

Loadings plot

lmwLoading1dPlot(resAPCA, effectNames = c("Hippurate", "Citrate", 
                                        "Time", "Hippurate:Time"), axes = 1)

ASCA-E

The ASCA-E method performs PCA on the effect matrices then adds the residuals to compute the augmented scores.

resASCAE = lmwPcaEffects(resLmwEffectMatrices = resLmwEffectMatrices, 
                         method="ASCA-E")

Contributions and loadings are similar to the ASCA results.

Scores Plot

lmwScorePlot(resASCAE, effectNames = "Hippurate", 
             color = "Hippurate", shape = "Hippurate")
lmwScorePlot(resASCAE, effectNames = "Citrate", 
             color = "Citrate", shape = "Citrate")
lmwScorePlot(resASCAE, effectNames = "Time", 
             color = "Time", shape = "Time")
lmwScorePlot(resASCAE, effectNames = "Hippurate:Time", 
             color = "Hippurate", shape = "Time")

lmwScoreScatterPlotM(resASCAE,
                     effectNames = c("Hippurate", "Citrate", "Time",
                                     "Hippurate:Time"),
                     modelAbbrev = TRUE,
                     varname.colorup = "Citrate",
                     varname.colordown  = "Time",
                     varname.pchup="Hippurate",
                     varname.pchdown="Time",
                     title = "ASCA-E scores scatterplot matrix")

Session info

sessionInfo()

References

bgovaerts/LMWiRe documentation built on Sept. 17, 2022, 12:32 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bgovaerts/LMWiRe Linear Models for Wide Responses

In bgovaerts/LMWiRe: Linear Models for Wide Responses

Introduction

Installation

Data importation

Data exploration

Design

Outcomes visualization

plotLine function

plotScatter function

plotScatterM function

plotMeans function

Principal Component Analysis

Model estimation and effect matrix decomposition

Model formula

Model matrix generation

Model estimation and effect matrices decomposition

Effects importance

Bootstrap tests and quantification of effects importance

ASCA/APCA/ASCA-E decomposition

ASCA

Contributions

Scores and loadings Plots

Main effects

Interaction Hippurate:Time

Combination of effects Hippurate+Time+Hippurate:Time

Model residuals

Effects plot

Main effects

Interaction Hippurate:Time

Combination of effects Hippurate+Time+Hippurate:Time

APCA

Scores Plot

Loadings plot

ASCA-E

Scores Plot

Session info

References

R Package Documentation

Browse R Packages

We want your feedback!

bgovaerts/LMWiRe
Linear Models for Wide Responses

`plotLine` function

`plotScatter` function

`plotScatterM` function

`plotMeans` function

Interaction `Hippurate:Time`

Combination of effects `Hippurate+Time+Hippurate:Time`

Interaction `Hippurate:Time`

Combination of effects `Hippurate+Time+Hippurate:Time`