Introduction

In this vignette, we illustrate how to construct the GMD-biplot and screeplot using the tobacco data set from [@Satten2017]. This data set includes 15 smokeless tobacco products: 6 dry snuffs, 7 moist snuffs, and 2 toombak samples from Sudan. Three separate (replicate) observations (starting with sample preparation) were made of each product, so that in total 45 observations are available. Each observation has a 271 × 1 vector of taxon counts. To make the measurements comparable, we consider the centered log ratio (CLR) transformation of the data set. Additionally, the squared weighted UniFrac distance, denoted $\Delta$, is used to measure the distance between samples. The corresponding similarity kernel $H$ is derived from $\Delta$ using the Gower's centering matrix.

Step 1: Loading the tobacco data set

We first load our R package GMDecomp.

library(GMDecomp)

The data object tobacco_clr in the package include

Step 2: Generalized Matrix Decomposition

To construct the GMD-biplot and screeplot, we need to first perform the generalized matrix decomposition [@Allen2014] of the data with respect to $H$. This can be easily achieved using the following line:

tobacco.gmd <- GMD(X = tobacco_clr$data, H = tobacco_clr$H, Q = diag(1, dim(tobacco_clr$data)[2]), K = 10)

Note that here we don't have a similarity kernel for the OTUs, so $Q$ is set to be an identity matrix. One can set $Q$ to be any informative positive semi-definite matrix, if such information is available. Also, here we set $K = 10$, since we want to display the screeplot the top 10 GMD components. If only the GMD-biplot is needed, one can set $K = 2$, which may save computational time.

tabacco.gmd is a list of class gmd, which consists of the following variables.

Step 3: The GMD-biplot and screeplot

Once the GMD outputs are obtained, the screeplot can by easily constructed as follows.

screeplot(tobacco.gmd) #the screeplot of the top 10 GMD components 

Note that one can select specific OTUs to display in the GMD-biplot. For this analysis, we display the top 3 OTUs that have the longest arrows.

gmd.order = order(rowSums(tobacco.gmd$V[,1:2]^2), decreasing = T)
plot.index = gmd.order[1:3]
plot.names = tobacco_clr$otu.names[plot.index]
biplot(fit = tobacco.gmd, index = plot.index, names = plot.names, sample.col = tobacco_clr$sample.color, sample.pch = tobacco_clr$sample.pch, arrow.col = 'grey50')

References



taryue/GMDecomp documentation built on Nov. 5, 2019, 10 a.m.