Multivariate tools for compositional data analysis: the ToolsForCoDa package

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(fig.align = "center")
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(fig.width = 6, fig.height = 6) 
rm(list=ls())

Introduction

The **ToolsForCoDa** package was originally created to provide functions for canonical correlation analysis with compositional data (@Graffelman2018), based on the centred logratio (clr) transformation of the compositions. It has subsequently been extended with additional tools for the multivariate analysis of compositional data in the R environment. Currently, this package (version 1.1.0) provides functionality for

* log-ratio principal component analysis (LR-PCA).
* log-ratio canonical correlation analysis (LR-CCO).
* log-ratio discriminant analysis (LR-LDA).

Both CCO and LDA rely on the inversion of a covariance matrix. The covariance matrix of the clr transformed compositions is structurally singular. The programs `lrcco` and `lrlda` resolve this with the use of a generalized inverse.

Functionality for the analysis of compositional data in the R environment can also be found in the packages **compositions** (@compositions), **robCompositions** (@robCompositions), **easyCODA** (@easyCODA) and **zCompositions** (@zCompositions). For further reading on compositional data, see the seminal text by Aitchison (@Aitchison) and several recent statistical textbooks (@Pawlowsky, @Filzmoser, @Greenacre, @VanDenBoogaart).

The remainder of this vignette presents an example session showing how to perform the three aforementioned types of analysis.

1. [Installation](#installation)
2. [LR-PCA](#lrpca)
3. [LR-CCO](#lrpco)
4. [LR-LDA](#lrlda)

## 1. Installation

wzxhzdk:1

## 2. Logratio principal component analysis (LR-PCA)

We consider the composition of 37 Pinot Noir samples, consisting of the concentrations of Cd, Mo, Mn, Ni, Cu, Al, Ba, Cr, Sr, Pb, B, Mg, Si, Na, Ca, P and K, together with an evaluation of the wine's aroma (@FrankKowalski).

wzxhzdk:2

We apply closure to the chemical concentrations by dividing them by their total, and use `lrpca` to perform LR-PCA.
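The closure and clr steps, and the principal component analysis built on them, can be sketched in base R. This is a minimal illustration on simulated data, not the implementation used by `lrpca`: the helper names `closure` and `clr` and the simulated matrix `X` are assumptions introduced here for the example.

```r
# Minimal base-R sketch of closure, clr transformation and log-ratio PCA.
# All data are simulated; `closure` and `clr` are illustrative helpers
# defined here, not functions taken from ToolsForCoDa.
set.seed(1)
X <- matrix(rlnorm(20 * 4), nrow = 20,
            dimnames = list(NULL, c("A", "B", "C", "D")))

# Closure: divide each row by its total so every composition sums to 1.
closure <- function(x) x / rowSums(x)

# clr: log of each part minus the row mean of the logs.
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}

Xc <- closure(X)
Z  <- clr(Xc)
Zc <- scale(Z, center = TRUE, scale = FALSE)  # column-centre the clr data

# PCA via the singular value decomposition of the centred clr matrix.
sv        <- svd(Zc)
eigvals   <- sv$d^2 / (nrow(Zc) - 1)  # variances of the components
explained <- eigvals / sum(eigvals)   # fraction of total variance
scores    <- sv$u %*% diag(sv$d)      # principal component scores

round(explained, 3)
```

Because each clr-transformed row sums to zero, the last eigenvalue is numerically zero. This is the structural singularity of the clr covariance matrix mentioned in the introduction.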
wzxhzdk:3

We study the decomposition of compositional variance, and the decay of the LR-PCA eigenvalues by means of a screeplot:

wzxhzdk:4

We construct a covariance biplot, using `jointlim` to establish sensible limits for the x and y axes. Column markers for the clr transformed variables are multiplied by a constant (2.5) for a better visualization, and the amount of explained variance is indicated on the coordinate axes.

wzxhzdk:5

This biplot reveals that the logratio $\ln{(Na/Pb)}$ has a large variance and is tightly correlated with the first principal component. The variable `Aroma` correlates with the first principal components

wzxhzdk:6

and, as the biplot suggests, `Aroma` correlates positively with the logratio $\ln{(Cr/Sr)}$:

wzxhzdk:7

We note that the function `lrpca` also calculates condition indices, which may prove useful for detecting proportionality or one-dimensional relationships (@Graffelman2021).

## 3. Logratio canonical correlation analysis (LR-CCO)

Two examples of LR-CCO are given below. The first example concerns a small artificial data set, where both the X set and the Y set are compositional, and is described in Section 3.1 of Graffelman et al. (2018). The second example concerns major oxide compositions of bentonites, where the X set is compositional and the Y set is not.

### 3.1 Artificial data

We first load two artificial 3-part compositions.

wzxhzdk:8

We make the ternary diagrams of the two sets of compositions:

wzxhzdk:9

We perform the compositional canonical analysis:

wzxhzdk:10

And we reproduce the results in Table 1 of Graffelman et al. (2018).
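Before reading off the individual results, it may help to see how canonical correlations can be computed when both clr covariance matrices are singular. The sketch below is a minimal base-R illustration on simulated 3-part compositions, not the code of `lrcco`. It assumes a Moore-Penrose inverse square root built from an eigendecomposition (the hypothetical helper `pinv_sqrt`), and compares the result with `stats::cancor` applied to the clr data after dropping one column from each set.

```r
# Canonical correlations of two clr-transformed compositions, using a
# generalized (Moore-Penrose) inverse. Simulated data; `clr` and
# `pinv_sqrt` are illustrative helpers, not ToolsForCoDa functions.
set.seed(2)
n <- 50
X <- matrix(rlnorm(n * 3), n, 3)
# Y depends on X, so the canonical correlations are not close to zero.
Y <- X[, c(2, 3, 1)] * matrix(rlnorm(n * 3, sdlog = 0.5), n, 3)

clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}
Zx <- scale(clr(X), center = TRUE, scale = FALSE)
Zy <- scale(clr(Y), center = TRUE, scale = FALSE)

# Inverse square root restricted to the non-null eigenvectors of a
# symmetric positive semi-definite matrix.
pinv_sqrt <- function(S, tol = 1e-10) {
  e    <- eigen(S, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)
  V    <- e$vectors[, keep, drop = FALSE]
  V %*% diag(1 / sqrt(e$values[keep]), nrow = sum(keep)) %*% t(V)
}

Sxx <- cov(Zx)
Syy <- cov(Zy)
Sxy <- cov(Zx, Zy)

# Singular values of Sxx^(-1/2) Sxy Syy^(-1/2) are the canonical correlations.
K           <- pinv_sqrt(Sxx) %*% Sxy %*% pinv_sqrt(Syy)
cancor_ginv <- svd(K)$d[1:2]  # the third singular value is numerically zero

# Reference: dropping one clr column per set removes the singularity
# without changing the space spanned by the columns.
cancor_ref <- cancor(Zx[, -1], Zy[, -1])$cor

rbind(ginv = cancor_ginv, reference = cancor_ref)
```

Both routes give the same two canonical correlations, since a 3-part composition carries only two dimensions of log-ratio information.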
The canonical correlations are obtained as:

wzxhzdk:11

The canonical weights of the X set and the Y set are obtained by:

wzxhzdk:12

The canonical loadings of the X set and the Y set are obtained by:

wzxhzdk:13

The adequacy coefficients of the X set and the Y set:

wzxhzdk:14

The redundancy coefficients of the X set and the Y set:

wzxhzdk:15

Finally, we make the full set of biplots for LR-CCO given in Figure 2 (@Graffelman2018). In each biplot, the canonical variates are multiplied by a convenient scalar to facilitate the visualization.

wzxhzdk:16

Panel A shows the logratios $\log{(x_2/x_3)}$ and $\log{(y_1/y_2)}$ to have long links that run parallel to the first canonical variate, which has the largest canonical correlation; these logratios are highly correlated. The canonical biplot shows the association between the two sets of compositions, which is not visible in the ternary diagrams above.

### 3.2 Canonical analysis of bentonites

In this subsection we treat the canonical analysis of bentonites. The X set concerns the concentrations of 9 major oxides, measured in 14 samples in the US (@Cadrin). A canonical analysis of this data set has been previously described (@Reyment), and is extended here with biplots. The Y set concerns two isotopes, $\delta D$ and $\delta^{18}O$.

wzxhzdk:17

We clr-transform and column-center the major oxides after deleting MnO, which is outlying and had many zeros (these had been replaced with 0.001). We standardize the isotopes.

wzxhzdk:18

The two canonical correlations are large:

wzxhzdk:19

We construct a biplot of the data:

wzxhzdk:20

We overplot the biplot with the canonical X-variates, which allows one to inspect the original samples (@Graffelman2005). For plotting, the canonical variate is scaled with a convenient scaling factor (here 0.45). This factor does not affect the interpretation of the biplot, but gives the samples a convenient spread.
The logratio $\log{(Na/Mg)}$ (among others) almost coincides with the first canonical variate, which correlates with $\delta^{18}O$. However, interpretation should proceed with care because of the small sample size.

wzxhzdk:21

## 4. Logratio discriminant analysis (LR-LDA)

We use archeological data from the UK (@Tubb) to illustrate LR-LDA. This dataset consists of measurements of nine oxides on 48 archeological samples from three regions in the UK. We first prepare the data:

wzxhzdk:22

Next, we carry out LR-LDA by passing the compositions in `Oxides` to the function `lrlda`. Internally, `lrlda` applies the clr transformation to the data.

wzxhzdk:23

The group sizes are obtained with:

wzxhzdk:24

The group mean vectors of the clr transformed compositions are given by:

wzxhzdk:25

The scores of the linear discriminant function are obtained by:

wzxhzdk:26

The confusion matrix for the training observations is:

wzxhzdk:27

Posterior probabilities for the classifications are obtained by:

wzxhzdk:28

We extract biplot coordinates for group centers, individual observations and variables, and construct the LDA biplot.

wzxhzdk:29

The LR-LDA biplot shows perfect separation of the three UK regions and suggests that a single logratio such as $\log{(MgO/Al_2O_3)}$ (among other possibilities) is capable of discriminating the three regions. A boxplot of this logratio confirms this.

wzxhzdk:30
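The idea behind a log-ratio discriminant analysis with a singular within-group covariance matrix can be sketched in base R as follows. This is a minimal illustration on simulated data with three groups, not the implementation of `lrlda`. The group means in `mu`, the helpers `clr` and `pinv`, and the use of priors proportional to group size are all assumptions made for the example.

```r
# Linear discriminant analysis on clr-transformed compositions, using a
# generalized inverse of the pooled within-group covariance matrix.
# Simulated data; `clr` and `pinv` are illustrative helpers.
set.seed(3)
n_g    <- 15
groups <- factor(rep(c("g1", "g2", "g3"), each = n_g))
# Hypothetical group-specific log-means for a 4-part composition.
mu <- rbind(g1 = c(0, 0, 0, 0), g2 = c(2, 0, -1, 0), g3 = c(0, 2, 0, -1))
X  <- exp(mu[as.character(groups), ] +
          matrix(rnorm(3 * n_g * 4, sd = 0.3), ncol = 4))

clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}
Z <- clr(X)

# Group means and pooled within-group covariance of the clr data.
counts <- as.vector(table(groups))
M <- rowsum(Z, groups) / counts
R <- Z - M[as.character(groups), ]
W <- crossprod(R) / (nrow(Z) - nlevels(groups))

# Moore-Penrose inverse of a symmetric positive semi-definite matrix.
pinv <- function(S, tol = 1e-10) {
  e    <- eigen(S, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)
  V    <- e$vectors[, keep, drop = FALSE]
  V %*% diag(1 / e$values[keep], nrow = sum(keep)) %*% t(V)
}
Wi <- pinv(W)

# Linear discriminant scores, one column per group:
# z' W^- m_g - 0.5 m_g' W^- m_g + log(prior_g)
prior  <- counts / sum(counts)
const  <- 0.5 * diag(M %*% Wi %*% t(M)) - log(prior)
scores <- Z %*% Wi %*% t(M) -
  matrix(const, nrow(Z), nlevels(groups), byrow = TRUE)

pred <- factor(levels(groups)[max.col(scores, ties.method = "first")],
               levels = levels(groups))
table(observed = groups, predicted = pred)
```

Each observation is assigned to the group with the largest score. The generalized inverse is needed because `W` annihilates the vector of ones, exactly as the clr covariance matrix does.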

References



ToolsForCoDa documentation built on April 3, 2025, 7:47 p.m.