Example for msma

Atsushi Kawaguchi

The msma package provides functions for matrix decomposition methods incorporating sparse and supervised modeling for multiblock multivariable data analysis.

Preparation

Install package (as necessary)

if(!require("msma")) install.packages("msma")

Load package

library(msma)

Getting started

Multiblock data (a list of data matrices) is simulated using the function simdata.

The sample size is 50 and the correlation coefficient is 0.8. The numbers of columns for the response and the predictor are specified by the arguments Yps and Xps, respectively, and the length of each vector gives the number of blocks. That is, the response has three blocks with 3, 4, and 5 columns, and the predictor has one block with 3 columns.

dataset0 = simdata(n = 50, rho = 0.8, Yps = c(3, 4, 5), Xps = 3, seed=1)
X0 = dataset0$X; Y0 = dataset0$Y 
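As a quick check, the block structure can be inspected with base R (a minimal sketch; it assumes simdata returns X and Y as lists of matrices, as they are used below):

```r
# Inspect the simulated multiblock structure.
# Y0 should have three blocks with 3, 4, and 5 columns;
# X0 should have one block with 3 columns; all blocks have 50 rows.
lapply(Y0, dim)
lapply(X0, dim)
```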

The data generated here is applied to the msma function.

One Component

The argument comp can specify the number of components. The arguments lambdaX and lambdaY can specify the regularization parameters for X and Y, respectively.

First, we set comp=1, which will perform an analysis with 1 component.

fit01 = msma(X0, Y0, comp=1, lambdaX=0.05, lambdaY=1:3)
fit01

The plot function is available. By default, the block weights are displayed as a bar plot.

plot(fit01)

Two Components

Next, we set comp=2, which will perform an analysis with 2 components.

fit02 = msma(X0, Y0, comp=2, lambdaX=0.03, lambdaY=0.01*(1:3))
fit02

Single Block

Two matrices are prepared by specifying the arguments Yps and Xps.

dataset1 = simdata(n = 50, rho = 0.8, Yps = 5, Xps = 5, seed=1)
X1 = dataset1$X[[1]]; Y1 = dataset1$Y 

Principal Component Analysis (PCA)

If the input is a single matrix, a principal component analysis is performed.

(fit111 = msma(X1, comp=5))

The weight (loading) vectors can be obtained as follows.

fit111$wbX

Bar plots of the weight vectors are provided by the function plot. The component number is specified by the argument axes, and the plot type is selected by the argument plottype. Furthermore, since this function is built on R's barplot function, its arguments are also available. In the following example, the variable-name labels on the horizontal axis are scaled to 0.7 with cex.names=0.7 and rotated perpendicular to the axis with las=2.

par(mfrow=c(1,2))
plot(fit111, axes = 1, plottype="bar", cex.names=0.7, las=2)
plot(fit111, axes = 2, plottype="bar", cex.names=0.7, las=2)

The score vectors for the first six subjects:

lapply(fit111$sbX, head)

Scatter plots of the score vectors are produced by setting the argument v="score". The argument axes takes a vector of length two indicating which components are displayed.

par(mfrow=c(1,2))
plot(fit111, v="score", axes = 1:2, plottype="scatter")
plot(fit111, v="score", axes = 2:3, plottype="scatter")

When the argument v is set to "cpev", the cumulative proportion of eigenvalues is plotted.

plot(fit111, v="cpev", ylim=c(0.7, 1))

Other functions in R for PCA

Base R provides the function prcomp to implement PCA.

(fit1112 = prcomp(X1, scale=TRUE))
summary(fit1112)

The rotation (loading) matrix is almost the same as the output of msma, and the agreement can be made closer by tightening the convergence tolerance with the argument ceps as follows.

fit1113 = msma(X1, comp=5, ceps=0.0000001)
fit1113$wbX

Plotting the scores with the signs flipped shows that similar scores are obtained; the sign of a component is arbitrary.

par(mfrow=c(1,2))
biplot(fit1112)
plot(-fit1113$sbX[[1]][,1:2],xlab="Component 1",ylab="Component 2")

The ggfortify package is also available for the PCA plot.
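For instance (a sketch assuming the ggfortify and ggplot2 packages are installed), its autoplot method draws a PCA plot directly from a prcomp fit:

```r
# PCA plot via ggfortify's autoplot method for prcomp objects,
# with loading arrows and labels overlaid.
library(ggfortify)
autoplot(prcomp(X1, scale = TRUE), loadings = TRUE, loadings.label = TRUE)
```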

Sparse PCA

If lambdaX (>0) is specified, a sparse principal component analysis is implemented.

(fit112 = msma(X1, comp=5, lambdaX=0.1))
par(mfrow=c(1,2))
plot(fit112, axes = 1, plottype="bar", las=2)
plot(fit112, axes = 2, plottype="bar", las=2)

Supervised Sparse PCA

A binary outcome Z is generated.

set.seed(1); Z = rbinom(50, 1, 0.5)

If the outcome Z is specified, a supervised sparse principal component analysis is implemented.

(fit113 = msma(X1, Z=Z, comp=5, lambdaX=0.02))
par(mfrow=c(1,2))
plot(fit113, axes = 1, plottype="bar", las=2)
plot(fit113, axes = 2, plottype="bar", las=2)

Partial Least Squares (PLS)

If a second input Y1 is specified, partial least squares is performed.

(fit121 = msma(X1, Y1, comp=2))

The component number is specified by the argument axes. When the argument XY is set to "XY", scatter plots of the Y scores against the X scores are produced.

par(mfrow=c(1,2))
plot(fit121, axes = 1, XY="XY")
plot(fit121, axes = 2, XY="XY")

Sparse PLS

If lambdaX and lambdaY are specified, a sparse PLS is implemented.

(fit122 = msma(X1, Y1, comp=2, lambdaX=0.5, lambdaY=0.5))
par(mfrow=c(1,2))
plot(fit122, axes = 1, XY="XY")
plot(fit122, axes = 2, XY="XY")

Supervised Sparse PLS

If the outcome Z is specified, a supervised sparse PLS is implemented.

(fit123 = msma(X1, Y1, Z, comp=2, lambdaX=0.5, lambdaY=0.5))
par(mfrow=c(1,2))
plot(fit123, axes = 1, XY="XY")
plot(fit123, axes = 2, XY="XY")

Multi Block

Multiblock data are represented as a list of data matrices.

dataset2 = simdata(n = 50, rho = 0.8, Yps = c(2, 3), Xps = c(3, 4), seed=1)
X2 = dataset2$X; Y2 = dataset2$Y 

PCA

The input is of class list.

class(X2)

The list has length 2, corresponding to the 2 blocks.

length(X2)

The dimensions of each matrix in the list:

lapply(X2, dim)

The function msma is applied to this list X2 as follows.

(fit211 = msma(X2, comp=1))

Bar plots for the block and super weights (loadings) are selected by the argument block.

par(mfrow=c(1,2))
plot(fit211, axes = 1, plottype="bar", block="block", las=2)
plot(fit211, axes = 1, plottype="bar", block="super")

Sparse PCA

If lambdaX is specified as a vector of length 2 (the number of blocks), a multiblock sparse PCA is performed.

(fit212 = msma(X2, comp=1, lambdaX=c(0.5, 0.5)))

The bar plots for the block and super weights (loadings).

par(mfrow=c(1,2))
plot(fit212, axes = 1, plottype="bar", block="block", las=2)
plot(fit212, axes = 1, plottype="bar", block="super")

Supervised Sparse PCA

If the outcome Z is specified, a supervised analysis is implemented.

(fit213 = msma(X2, Z=Z, comp=1, lambdaX=c(0.5, 0.5)))
par(mfrow=c(1,2))
plot(fit213, axes = 1, plottype="bar", block="block", las=2)
plot(fit213, axes = 1, plottype="bar", block="super")

Nested Component

A vector of length 2 can be given to the comp argument to perform a nested component analysis, which considers multiple components even in the super component. The first element of the vector corresponds to the number of block components and the second element to the number of (nested) super components.

(fit214 = msma(X2, comp=c(2,3)))

In this example, there are 2 block components and 3 super components.

fit214$wbX

For the block weights, the number of blocks is 2 since there are two data matrices, and the numbers of rows are 3 and 4, matching the number of variables in each block.

The number of columns in each matrix is 2, the first element of the vector specified by the comp argument (the number of block components).

par(mfrow=c(1,2))
plot(fit214, axes = 1, axes2 = 1, plottype="bar", block="block", las=2)
plot(fit214, axes = 2, axes2 = 1, plottype="bar", block="block", las=2)
fit214$wsX
par(mfrow=c(2,3))
for(j in 1:2) for(i in 1:3) plot(fit214, axes = j, axes2 = i, plottype="bar", block="super")

PLS

If a second input (list) Y2 is specified, multiblock partial least squares is performed.

(fit221 = msma(X2, Y2, comp=1))
par(mfrow=c(1,2))
plot(fit221, axes = 1, plottype="bar", block="block", XY="X", las=2)
plot(fit221, axes = 1, plottype="bar", block="super", XY="X")
par(mfrow=c(1,2))
plot(fit221, axes = 1, plottype="bar", block="block", XY="Y", las=2)
plot(fit221, axes = 1, plottype="bar", block="super", XY="Y")

Sparse PLS

The regularization parameters lambdaX and lambdaY are specified as vectors whose lengths equal the numbers of blocks in X2 and Y2, respectively.

(fit222 = msma(X2, Y2, comp=1, lambdaX=c(0.5, 0.5), lambdaY=c(0.5, 0.5)))
par(mfrow=c(1,2))
plot(fit222, axes = 1, plottype="bar", block="block", XY="X", las=2)
plot(fit222, axes = 1, plottype="bar", block="super", XY="X")
par(mfrow=c(1,2))
plot(fit222, axes = 1, plottype="bar", block="block", XY="Y", las=2)
plot(fit222, axes = 1, plottype="bar", block="super", XY="Y")

Supervised Sparse PLS

If the outcome Z is specified, a supervised sparse PLS is implemented.

(fit223 = msma(X2, Y2, Z, comp=1, lambdaX=c(0.5, 0.5), lambdaY=c(0.5, 0.5)))
par(mfrow=c(1,2))
plot(fit223, axes = 1, plottype="bar", block="block", XY="X", las=2)
plot(fit223, axes = 1, plottype="bar", block="super", XY="X")
par(mfrow=c(1,2))
plot(fit223, axes = 1, plottype="bar", block="block", XY="Y", las=2)
plot(fit223, axes = 1, plottype="bar", block="super", XY="Y")

Parameter Selection

Number of Components

The function ncompsearch searches for the optimal number of components, using either cross-validation (argument nfold) or an information criterion (argument criterion).

Single Block

PCA

(ncomp11 = ncompsearch(X1, comps = c(1, 5, 10*(1:2)), nfold=5))
plot(ncomp11)
(ncomp12 = ncompsearch(X1, comps = 20, criterion="BIC"))
plot(ncomp12)

PLS

ncomp21 = ncompsearch(X2, Y2, comps = c(1, 5, 10*(1:2)), nfold=5)
plot(ncomp21)

Multi Block

A multiblock dataset with five blocks for both X and Y is generated.

dataset3 = simdata(n = 50, rho = 0.8, Yps = rep(4, 5), Xps = rep(4, 5), seed=1)
X3 = dataset3$X; Y3 = dataset3$Y 

PCA

ncomp31 = ncompsearch(X3, comps = 20, criterion="BIC")
plot(ncomp31)

Nested

(ncomp32 = ncompsearch(X3, comps = list(20, 20), criterion="BIC"))
par(mfrow=c(1,2))
plot(ncomp32,1)
plot(ncomp32,2)

PLS

ncomp41 = ncompsearch(X3, Y3, comps = c(1, 5, 10*(1:2)), criterion="BIC")
plot(ncomp41)

Combined Search

The number of components and the regularization parameters can be selected by the function optparasearch. The following options are available.

criteria = c("BIC", "CV")
search.methods = c("regparaonly", "regpara1st", "ncomp1st", "simultaneous")

Single Block

PCA

(opt11 = optparasearch(X1, search.method = "regparaonly", criterion="BIC", comp=ncomp11$optncomp))
(fit311 = msma(X1, comp=opt11$optncomp, lambdaX=opt11$optlambdaX))
(opt12 = optparasearch(X1, search.method = "ncomp1st", criterion="BIC"))
(fit312 = msma(X1, comp=opt12$optncomp, lambdaX=opt12$optlambdaX))
(opt13 = optparasearch(X1, search.method = "regpara1st", criterion="BIC"))
(fit313 = msma(X1, comp=opt13$optncomp, lambdaX=opt13$optlambdaX))
(opt14 = optparasearch(X1, search.method = "simultaneous", criterion="BIC"))
(fit314 = msma(X1, comp=opt14$optncomp, lambdaX=opt14$optlambdaX))

The argument maxpct4ncomp=0.5 means that 0.5$\lambda$ is used as the regularization parameter when the number of components is searched, where $\lambda$ is the maximum of the candidate regularization parameters.

(opt132 = optparasearch(X1, search.method = "ncomp1st", criterion="BIC", maxpct4ncomp=0.5))
(fit3132 = msma(X1, comp=opt132$optncomp, lambdaX=opt132$optlambdaX))

The result of the regpara1st method depends on the number of components, whose default is 10. The number of components is set here as follows.

(opt133 = optparasearch(X1, search.method = "regpara1st", criterion="BIC", comp=5))
(fit3133 = msma(X1, comp=opt133$optncomp, lambdaX=opt133$optlambdaX))

PLS

For PLS, two parameters $\lambda_X$ and $\lambda_Y$ are used in arguments lambdaX and lambdaY to control sparseness for data X and Y, respectively.

(opt21 = optparasearch(X2, Y2, search.method = "regparaonly", criterion="BIC"))
(fit321 = msma(X2, Y2, comp=opt21$optncomp, lambdaX=opt21$optlambdaX, lambdaY=opt21$optlambdaY))

Multi Block

PCA

(opt31 = optparasearch(X3, search.method = "regparaonly", criterion="BIC"))
(fit331 = msma(X3, comp=opt31$optncomp, lambdaX=opt31$optlambdaX, lambdaXsup=opt31$optlambdaXsup))
(opt32 = optparasearch(X3, search.method = "regparaonly", criterion="BIC", whichselect="X"))
(fit332 = msma(X3, comp=opt32$optncomp, lambdaX=opt32$optlambdaX, lambdaXsup=opt32$optlambdaXsup))
(opt33 = optparasearch(X3, search.method = "regparaonly", criterion="BIC", whichselect="Xsup"))
(fit333 = msma(X3, comp=opt33$optncomp, lambdaX=opt33$optlambdaX, lambdaXsup=opt33$optlambdaXsup))

Nested

ncomp1st

(opt341 = optparasearch(X3, search.method = "ncomp1st", criterion="BIC", comp=c(8, 8)))
(fit341 = msma(X3, comp=opt341$optncomp, lambdaX=opt341$optlambdaX, lambdaXsup=opt341$optlambdaXsup))

regparaonly

(opt342 = optparasearch(X3, search.method = "regparaonly", criterion="BIC", comp=c(4, 5)))
(fit342 = msma(X3, comp=opt342$optncomp, lambdaX=opt342$optlambdaX, lambdaXsup=opt342$optlambdaXsup))

regpara1st

(opt344 = optparasearch(X3, search.method = "regpara1st", criterion="BIC", comp=c(8, 8)))
(fit344 = msma(X3, comp=opt344$optncomp, lambdaX=opt344$optlambdaX, lambdaXsup=opt344$optlambdaXsup))

The simultaneous search is computationally expensive and takes much longer to execute.

(opt345 = optparasearch(X3, search.method = "simultaneous", criterion="BIC", comp=c(8, 8)))
(fit345 = msma(X3, comp=opt345$optncomp, lambdaX=opt345$optlambdaX))

PLS

This is computationally expensive and takes much longer to execute due to the large number of blocks.

(opt41 = optparasearch(X3, Y3, search.method = "regparaonly", criterion="BIC"))
(fit441 = msma(X3, Y3, comp=opt41$optncomp, lambdaX=opt41$optlambdaX, lambdaY=opt41$optlambdaY, lambdaXsup=opt41$optlambdaXsup, lambdaYsup=opt41$optlambdaYsup))
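The cost of such a search can be gauged with base R's system.time (a minimal sketch; opt41b is a fresh name used only for this timing run):

```r
# Time the parameter search; "elapsed" is the wall-clock time in seconds.
st = system.time(
  opt41b <- optparasearch(X3, Y3, search.method = "regparaonly", criterion = "BIC")
)
st["elapsed"]
```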

In this example, the computation becomes tractable by narrowing down the parameters to be selected, as follows.

(opt42 = optparasearch(X3, Y3, search.method = "regparaonly", criterion="BIC", whichselect=c("Xsup","Ysup")))
(fit442 = msma(X3, Y3, comp=opt42$optncomp, lambdaX=opt42$optlambdaX, lambdaY=opt42$optlambdaY, lambdaXsup=opt42$optlambdaXsup, lambdaYsup=opt42$optlambdaYsup))

Another example dataset is generated.

dataset4 = simdata(n = 50, rho = 0.8, Yps = rep(4, 2), Xps = rep(4, 3), seed=1)
X4 = dataset4$X; Y4 = dataset4$Y 

With this number of blocks, the calculation can be performed in a relatively short time.

(opt43 = optparasearch(X4, Y4, search.method = "regparaonly", criterion="BIC"))
(fit343 = msma(X4, Y4, comp=opt43$optncomp, lambdaX=opt43$optlambdaX, lambdaY=opt43$optlambdaY, lambdaXsup=opt43$optlambdaXsup, lambdaYsup=opt43$optlambdaYsup))



msma documentation built on Aug. 25, 2023, 9:07 a.m.