sgspls: Sparse Group Subgroup PLS


Description

Fit a PLS model to two blocks of data via the sparse group subgroup Partial Least Squares (sgspls) algorithm. The sgspls algorithm enables selection of variables at the group, subgroup and single feature levels.

Usage

sgspls(X, Y, ncomp = 2, mode = "regression", keepX = NA, keepY = NA,
  max.iter = 500, tol = 1e-12, scale.x = T, scale.y = F,
  groupX = rep(1, ncol(X)), groupY = rep(1, ncol(Y)), subgroupX = rep(1,
  ncol(X)), subgroupY = rep(1, ncol(Y)), indiv_sparsity_x = rep(0, ncomp),
  subgroup_sparsity_x = rep(0, ncomp), indiv_sparsity_y = rep(0, ncomp),
  subgroup_sparsity_y = rep(0, ncomp), ...)

Arguments

X

A matrix of regressors (n x p). By default, the columns are centered to have mean zero and scaled by their standard deviation (see scale.x).

Y

A matrix of continuous responses (n x q). By default, the columns are centered to have mean zero but not scaled (see scale.y).

ncomp

The number of components to include in the model.

mode

A character string specifying the type of PLS algorithm to use; one of "regression" or "canonical".

keepX

Numeric vector of length ncomp giving, for each component, the number of groups to select in the X loadings. The default selects all groups.

keepY

Numeric vector of length ncomp giving, for each component, the number of groups to select in the Y loadings. The default selects all groups.

max.iter

The maximum number of iterations. Default is 500.

tol

A positive real number giving the convergence tolerance for the iterative PLS algorithm. Default is 1e-12.

scale.x

Logical. If TRUE (the default), scale the predictors by their standard deviation.

scale.y

Logical. If TRUE, scale the responses by their standard deviation. Default is FALSE.

groupX

A vector giving the group membership of each column of X (see Examples). The default places all X variables in a single group.

groupY

A vector giving the group membership of each column of Y, analogous to groupX. The default places all Y variables in a single group.

subgroupX

A vector giving the subgroup membership of each column of X (see Examples). The default places all X variables in a single subgroup.

subgroupY

A vector giving the subgroup membership of each column of Y, analogous to subgroupX. The default places all Y variables in a single subgroup.

indiv_sparsity_x

Individual sparsity parameter (a value between 0 and 1, one per component) controlling the sparsity within subgroups for the X block.

subgroup_sparsity_x

Subgroup sparsity parameter (a value between 0 and 1, one per component) related to the number of subgroups selected in the PLS X weights.

indiv_sparsity_y

Individual sparsity parameter (a value between 0 and 1, one per component) controlling the sparsity within subgroups for the Y block.

subgroup_sparsity_y

Subgroup sparsity parameter (a value between 0 and 1, one per component) related to the number of subgroups selected in the PLS Y weights.

...

Additional arguments passed to lower-level functions.
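A short base-R sketch of building groupX and subgroupX for a hypothetical X whose groups have unequal sizes. The layout (three groups of 4, 6 and 2 columns, each split into subgroups of at most 2 consecutive columns) and the helper names group_sizes, within_pos and local_sub are invented for illustration. Like the Examples, the sketch numbers subgroups consecutively across the whole block so that each subgroup label belongs to exactly one group; whether sgspls strictly requires that convention is an assumption here.

```r
# Hypothetical layout: 12 columns of X in 3 groups of 4, 6 and 2 columns
group_sizes <- c(4, 6, 2)
groupX <- rep(seq_along(group_sizes), times = group_sizes)

# Position of each column inside its own group: 1 2 3 4 1 2 3 4 5 6 1 2
within_pos <- sequence(group_sizes)

# Split each group into consecutive subgroups of at most 2 columns
subgroup_size <- 2
local_sub <- ceiling(within_pos / subgroup_size)

# Start a new subgroup label whenever the group or the local subgroup changes,
# so labels run 1, 2, 3, ... across the whole block (as in the Examples)
new_subgroup <- c(TRUE, diff(groupX) != 0 | diff(local_sub) != 0)
subgroupX <- cumsum(new_subgroup)

# Sanity check: every subgroup lies inside exactly one group
stopifnot(all(tapply(groupX, subgroupX, function(g) length(unique(g))) == 1))

groupX     # 1 1 1 1 2 2 2 2 2 2 3 3
subgroupX  # 1 1 2 2 3 3 4 4 5 5 6 6
```

The two vectors would then be supplied as sgspls(X, Y, groupX = groupX, subgroupX = subgroupX, ...), with X having 12 columns.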

Value

sgspls returns an object of class "sgspls", a list that contains the following components:

weights

a list containing the X and Y PLS weights.

scores

a list containing the X and Y PLS scores.

names

a list containing the variable names of the X and Y blocks.

parameters

a list containing the parameters of the model that was fitted.

Author(s)

Matthew Sutton m5.sutton@hdr.qut.edu.au

References

Liquet, B., Lafaye de Micheaux, P., Hejblum, B. and Thiébaut, R. A group and sparse group partial least square approach applied in genomics context. Submitted.

Lê Cao, K.-A., Martin, P.G.P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.

Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.

See Also

Tuning functions: calc_pve, tune_sgspls, tune_groups. Model performance and estimation: predict.sgspls, perf.sgspls, coef.sgspls.

Examples

set.seed(1)
n <- 50; p <- 510

size.groups <- 30; size.subgroups <- 5
groupX <- ceiling(1:p / size.groups)
subgroupX <- ceiling(1:p / size.subgroups)

X <- matrix(rnorm(n * p), ncol = p, nrow = n)

beta <- rep(0, p)
bSG <- -2:2; b0 <- rep(0, length(bSG))
betaG <- c(bSG, b0, bSG, b0, bSG, b0)
beta[1:size.groups] <- betaG

y <- X %*% beta + 0.1 * rnorm(n)

model <- sgspls(X, y, ncomp = 3, mode = "regression", keepX = 1,
                groupX = groupX, subgroupX = subgroupX,
                indiv_sparsity_x = 0.8, subgroup_sparsity_x = 0.15)

reg_coef <- coef(model, type = "coefficients")

# Check model fit
cbind(reg_coef$B[ , , 3], beta)

# Restrict the comparison to the first group (variables 1-30), the only
# group with nonzero true coefficients
cbind(reg_coef$B[ , , 3], beta)[1:30, ]
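The simulated beta in the Examples is sparse at all three levels that sgspls is designed to select: group, subgroup and single feature. The base-R sketch below does not call sgspls; it rebuilds beta exactly as the Examples do and lists the active groups, the active subgroups, and the zero coefficients inside active subgroups. The names active_groups, active_subgroups and zero_within are illustrative only and are not part of the package.

```r
# Rebuild the simulated coefficient vector from the Examples (no sgspls needed)
p <- 510
size.groups <- 30; size.subgroups <- 5
groupX <- ceiling(1:p / size.groups)       # 17 groups of 30 variables
subgroupX <- ceiling(1:p / size.subgroups) # 102 subgroups of 5 variables

beta <- rep(0, p)
bSG <- -2:2; b0 <- rep(0, length(bSG))
beta[1:size.groups] <- c(bSG, b0, bSG, b0, bSG, b0)

active <- beta != 0

# Group level: groups containing at least one nonzero coefficient
active_groups <- which(tapply(active, groupX, any))
# Subgroup level: subgroups containing at least one nonzero coefficient
active_subgroups <- which(tapply(active, subgroupX, any))
# Individual level: zero coefficients that sit inside active subgroups
zero_within <- which(!active & subgroupX %in% active_subgroups)

active_groups     # group 1 only
active_subgroups  # subgroups 1, 3 and 5
zero_within       # variables 3, 13 and 23 (the 0 in each -2:2 block)
```

A well-fitting model should therefore concentrate its nonzero coefficients in group 1, within subgroups 1, 3 and 5.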

matt-sutton/sgspls documentation built on June 22, 2019, 10:21 a.m.