# Example for msma" In msma: Multiblock Sparse Multivariable Analysis

r Sys.Date() @Atsushi Kawaguchi

The msma package provides functions for a matrix decomposition method incorporating sparse and supervised modeling for a multiblock multivariable data analysis.

# Preparation

Install package (as necessary)

if(!require("msma")) install.packages("msma")


library(msma)

#predir = unlist(strsplit(getwd(), "ADNI"))
#source(paste(predir, "ADNI/Multimodal/program/package/Ver2/src.r", sep="")); library(mvtnorm)


# Getting started

Simulated multiblock data (list data) by using the function simdata. Sample size is 50. The correlation coeficient is 0.8. The numbers of columns for response and predictor can be specified by the argument Yps and Xps, respectively. The length of vecor represents the number of blocks. That is, response has three blocks with the numbers of columns being 3, 4, and 5 and predictor has one block with the number of columns being 3.

dataset0 = simdata(n = 50, rho = 0.8, Yps = c(3, 4, 5), Xps = 3, seed=1)
X0 = dataset0$X; Y0 = dataset0$Y


The argument comp can specify the number of components. The arguments lambdaX and lambdaY can specify the regularization parameters for X and Y, respectively.

• One Component
fit01 = msma(X0, Y0, comp=1, lambdaX=0.05, lambdaY=1:3)
fit01


The plot function is available. In default setting, the block weights are displayed as a barplot.

plot(fit01)

• Two Component
fit02 = msma(X0, Y0, comp=2, lambdaX=0.03, lambdaY=0.01*(1:3))
fit02


# Single Block

Two matrics are prepared by specifying arguments Yps and Xps.

dataset1 = simdata(n = 50, rho = 0.8, Yps = 5, Xps = 5, seed=1)
X1 = dataset1$X[]; Y1 = dataset1$Y


## Principal Component Analysis (PCA)

If input is a matrix, a principal component analysis is implemented.

(fit111 = msma(X1, comp=5))


The weight (loading) vectors can be obtained as follows.

fit111$wbX  The bar plots of weight vectors are provided by the function plot. The component number is specified by the argument axes. The plot type is selected by the argument plottype. par(mfrow=c(1,2)) plot(fit111, axes = 1, plottype="bar") plot(fit111, axes = 2, plottype="bar")  The score vectors for first six subjects. lapply(fit111$sbX, head)


The scatter plots for the score vectors specified by the argument v. The argument axes is specified by the two length vector represents which components are displayed.

plot(fit111, v="score", axes = 1:2, plottype="scatter")
plot(fit111, v="score", axes = 2:3, plottype="scatter")


When the argument v was specified as "cpev", the cummulative eigenvalues are plotted.

par(mfrow=c(1,1))
plot(fit111, v="cpev")


### Other functions in R for PCA

There is the R function prcomp to implement PCA.

(fit1112 = prcomp(X1, scale=TRUE))
summary(fit1112)
biplot(fit1112)


The ggfortify package is also available for the PCA plot.

### Sparse PCA

If lambdaX (>0) is specified, a sparse principal component analysis is implemented.

(fit112 = msma(X1, comp=5, lambdaX=0.1))
par(mfrow=c(1,2))
plot(fit112, axes = 1, plottype="bar")
plot(fit112, axes = 2, plottype="bar")


### Supervised Sparse PCA

The outcome Z is generated.

set.seed(1); Z = rbinom(50, 1, 0.5)


If the outcome Z is specified, a supervised sparse principal component analysis is implemented.

(fit113 = msma(X1, Z=Z, comp=5, lambdaX=0.02))

par(mfrow=c(1,2))
plot(fit113, axes = 1, plottype="bar")
plot(fit113, axes = 2, plottype="bar")


## Partial Least Squres (PLS)

If the another input Y1 is specified, a partial least squres is implemented.

(fit121 = msma(X1, Y1, comp=2))


The component number is specified by the argument axes. When the argument XY was specified as "XY", the scatter plots for Y score against X score are plotted.

par(mfrow=c(1,2))
plot(fit121, axes = 1, XY="XY")
plot(fit121, axes = 2, XY="XY")


### Sparse PLS

If lambdaX and lambdaY are specified, a sparse PLS is implemented.

(fit122 = msma(X1, Y1, comp=2, lambdaX=0.5, lambdaY=0.5))
par(mfrow=c(1,2))
plot(fit122, axes = 1, XY="XY")
plot(fit122, axes = 2, XY="XY")


### Supervised Sparse PLS

If the outcome Z is specified, a supervised sparse PLS is implemented.

(fit123 = msma(X1, Y1, Z, comp=2, lambdaX=0.5, lambdaY=0.5))
par(mfrow=c(1,2))
plot(fit123, axes = 1, XY="XY")
plot(fit123, axes = 2, XY="XY")


# Multi Block

Multiblock data is a list of data matrix.

dataset2 = simdata(n = 50, rho = 0.8, Yps = c(2, 3), Xps = c(3, 4), seed=1)
X2 = dataset2$X; Y2 = dataset2$Y


## PCA

The input class is list.

class(X2)


The list length is 2 for 2 blocks.

length(X2)


list of data matrix structure.

lapply(X2, dim)


The function msma is applied to this list X2 as follows.

(fit211 = msma(X2, comp=1))


The bar plots for the block and super weights (loadings) specified the argument block.

par(mfrow=c(1,2))
plot(fit211, axes = 1, plottype="bar", block="block")
plot(fit211, axes = 1, plottype="bar", block="super")


### Sparse PCA

If lambdaX with the length of 2 (same as the length of blocks) are specified, a multiblock sparse PCA is implemented.

(fit212 = msma(X2, comp=1, lambdaX=c(0.5, 0.5)))


The bar plots for the block and super weights (loadings).

par(mfrow=c(1,2))
plot(fit212, axes = 1, plottype="bar", block="block")
plot(fit212, axes = 1, plottype="bar", block="super")


### Supervised Sparse PCA

If the outcome Z is specified, a supervised analysis is implemented.

(fit213 = msma(X2, Z=Z, comp=1, lambdaX=c(0.5, 0.5)))
par(mfrow=c(1,2))
plot(fit213, axes = 1, plottype="bar", block="block")
plot(fit213, axes = 1, plottype="bar", block="super")


## PLS

If the another input (list) Y2 is specified, the partial least squared is implemented.

(fit221 = msma(X2, Y2, comp=1))

par(mfrow=c(1,2))
plot(fit221, axes = 1, plottype="bar", block="block", XY="X")
plot(fit221, axes = 1, plottype="bar", block="super", XY="X")

par(mfrow=c(1,2))
plot(fit221, axes = 1, plottype="bar", block="block", XY="Y")
plot(fit221, axes = 1, plottype="bar", block="super", XY="Y")


### Sparse PLS

The regularized parameters lambdaX and lambdaY are specified vectors with same length with the length of lists X2 and Y2, respectively.

(fit222 = msma(X2, Y2, comp=1, lambdaX=c(0.5, 0.5), lambdaY=c(0.5, 0.5)))

par(mfrow=c(1,2))
plot(fit222, axes = 1, plottype="bar", block="block", XY="X")
plot(fit222, axes = 1, plottype="bar", block="super", XY="X")

par(mfrow=c(1,2))
plot(fit222, axes = 1, plottype="bar", block="block", XY="Y")
plot(fit222, axes = 1, plottype="bar", block="super", XY="Y")


### Supervised Sparse PLS

(fit223 = msma(X2, Y2, Z, comp=1, lambdaX=c(0.5, 0.5), lambdaY=c(0.5, 0.5)))

par(mfrow=c(1,2))
plot(fit223, axes = 1, plottype="bar", block="block", XY="X")
plot(fit223, axes = 1, plottype="bar", block="super", XY="X")

par(mfrow=c(1,2))
plot(fit223, axes = 1, plottype="bar", block="block", XY="Y")
plot(fit223, axes = 1, plottype="bar", block="super", XY="Y")


# Parameter Selection

The number of components and regularized parameters can be selected by the function optparasearch. The following options are available.

criteria = c("BIC", "CV")
search.methods = c("regparaonly", "regpara1st", "ncomp1st", "simultaneous")


## Single Block

### PCA

• The regparaonly method searches for the regularized parameters with a fixed number of components.
(opt11 = optparasearch(X1, search.method = "regparaonly", criterion="BIC"))
(fit311 = msma(X1, comp=opt11$optncomp, lambdaX=opt11$optlambdaX))

• The regpara1st identifies the regularized parameters by fixing the number of components, then searching for the number of components with the selected regularized parameters.
(opt12 = optparasearch(X1, search.method = "regpara1st", criterion="BIC"))
(fit312 = msma(X1, comp=opt12$optncomp, lambdaX=opt12$optlambdaX))

• The ncomp1st method identifies the number of components with a regularized parameter of 0, then searches for the regularized parameters with the selected number of components.
(opt13 = optparasearch(X1, search.method = "ncomp1st", criterion="BIC"))
(fit313 = msma(X1, comp=opt13$optncomp, lambdaX=opt13$optlambdaX))

• The simultaneous method identifies the number of components by searching the regularized parameters in each component.
(opt14 = optparasearch(X1, search.method = "simultaneous", criterion="BIC"))
(fit314 = msma(X1, comp=opt14$optncomp, lambdaX=opt14$optlambdaX))


The argument maxpct4ncomp=0.5 means that 0.5$\lambda$ is used as the regularized parameter when the number of components is searched and where $\lambda$ is the maximum of the regularized parameters among the possible candidates.

(opt132 = optparasearch(X1, search.method = "ncomp1st", criterion="BIC", maxpct4ncomp=0.5))
(fit3132 = msma(X1, comp=opt132$optncomp, lambdaX=opt132$optlambdaX))


### PLS

For PLS, two parameters $\lambda_X$ and $\lambda_Y$ are used in arguments lambdaX and lambdaY to control sparseness for data X and Y, respectively.

(opt21 = optparasearch(X2, Y2, search.method = "regparaonly", criterion="BIC"))
(fit321 = msma(X2, Y2, comp=opt21$optncomp, lambdaX=opt21$optlambdaX, lambdaY=opt21$optlambdaY))  ## Multi Block The multi block structure has dataset3 = simdata(n = 50, rho = 0.8, Yps = rep(4, 5), Xps = rep(4, 5), seed=1) X3 = dataset3$X; Y3 = dataset3$Y  ### PCA (opt31 = optparasearch(X3, search.method = "regparaonly", criterion="BIC")) (fit331 = msma(X3, comp=opt31$optncomp, lambdaX=opt31$optlambdaX, lambdaXsup=opt31$optlambdaXsup))

(opt32 = optparasearch(X3, search.method = "regparaonly", criterion="BIC", whichselect="X"))
(fit332 = msma(X3, comp=opt32$optncomp, lambdaX=opt32$optlambdaX, lambdaXsup=opt32$optlambdaXsup))  (opt33 = optparasearch(X3, search.method = "regparaonly", criterion="BIC", whichselect="Xsup")) (fit333 = msma(X3, comp=opt33$optncomp, lambdaX=opt33$optlambdaX, lambdaXsup=opt33$optlambdaXsup))


### PLS

This is computationally expensive and takes much longer to execute due to the large number of blocks.

(opt41 = optparasearch(X3, Y3, search.method = "regparaonly", criterion="BIC"))
(fit341 = msma(X3, Y3, comp=opt41$optncomp, lambdaX=opt41$optlambdaX, lambdaY=opt41$optlambdaY, lambdaXsup=opt41$optlambdaXsup, lambdaYsup=opt41$optlambdaYsup))  In this example, it works by narrowing down the parameters as follows. (opt42 = optparasearch(X3, Y3, search.method = "regparaonly", criterion="BIC", whichselect=c("Xsup","Ysup"))) (fit342 = msma(X3, Y3, comp=opt42$optncomp, lambdaX=opt42$optlambdaX, lambdaY=opt42$optlambdaY, lambdaXsup=opt42$optlambdaXsup, lambdaYsup=opt42$optlambdaYsup))


Another example dataset is generated.

dataset4 = simdata(n = 50, rho = 0.8, Yps = rep(4, 2), Xps = rep(4, 3), seed=1)
X4 = dataset4$X; Y4 = dataset4$Y


With this number of blocks, the calculation can be performed in a relatively short time.

(opt43 = optparasearch(X4, Y4, search.method = "regparaonly", criterion="BIC"))
(fit343 = msma(X4, Y4, comp=opt43$optncomp, lambdaX=opt43$optlambdaX, lambdaY=opt43$optlambdaY, lambdaXsup=opt43$optlambdaXsup, lambdaYsup=opt43\$optlambdaYsup))


## Try the msma package in your browser

Any scripts or data that you put into this service are public.

msma documentation built on June 25, 2021, 5:09 p.m.