bigsgpls: Unified Algorithm for sparse group PLS methods


View source: R/generalised_algorithm.R

Description

Function to perform big sparse group Partial Least Squares (sgPLS) in the context of datasets that are divided into groups of variables. The sgPLS approach enables selection at both the group and single-feature levels.

Usage

bigsgpls(X, Y, regularised = "none", keepX = NULL, keepY = NULL,
  H = 3, alpha.x = 0, alpha.y = 0, case = 2, epsilon = 10^-6,
  ng = 1, big_matrix_backing = NULL, ind.block.x = NULL,
  ind.block.y = NULL, scale = TRUE, GPU = FALSE, lambda = 0)

Arguments

X

matrix or big.matrix object for data measured on the same samples. Corresponds to predictors in Case 2.

Y

matrix or big.matrix object for data measured on the same samples. Corresponds to responses in Case 4.

regularised

type of regularisation (can be one of "none", "sparse", "group" or "sparse group")

keepX

numeric vector of length H specifying the number of variables to keep in the X-loadings for each component (the penalisation parameter). By default all variables are kept in the model.

keepY

numeric vector of length H specifying the number of variables to keep in the Y-loadings for each component (the penalisation parameter). By default all variables are kept in the model.

H

the number of components to include in the model.

alpha.x

The mixing parameter (a value between 0 and 1) controlling the within-group sparsity for the X dataset.

alpha.y

The mixing parameter (a value between 0 and 1) controlling the within-group sparsity for the Y dataset.

case

matches the Algorithm in the paper (1 = SVD, 2 = W2A, 3 = CCA, 4 = Regression).

epsilon

A positive real, the tolerance used in the iterative algorithm.

ng

The number of chunks used to read in the data and process them in parallel.

big_matrix_backing

The folder to use for file-backed output. If NULL then the output is not file backed.

ind.block.x

a vector of integers describing the grouping of the X-variables (see the sketch at the end of this argument list).

ind.block.y

a vector of integers describing the grouping of the Y-variables.

scale

If TRUE then the PLS data blocks are standardized to zero means and unit variances. Default TRUE.

GPU

If TRUE then use the GPU for calculation of the chunks in the cross product. Default FALSE.

lambda

Lambda for use in Case 3, the CCA implementation of PLS. Default 0.
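
The documentation above leaves the exact encoding of ind.block.x and ind.block.y implicit. Below is a minimal sketch, assuming the same convention as the related sgPLS package: the vector lists the cumulative index of the last variable in each group, with the final group omitted. Check the package source if your version differs.

## Hypothetical grouping: ten groups of five X-variables (50 in total).
## The cut-point convention here is an assumption borrowed from sgPLS.
group.sizes <- rep(5, 10)
ind.block.x <- cumsum(group.sizes)[-length(group.sizes)]
ind.block.x
## [1]  5 10 15 20 25 30 35 40 45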

Examples

## Simulate a small dataset: 500 samples, 50 predictors,
## with the first five variables driving the response.
set.seed(1)
n <- 500
p <- 50
X <- scale(matrix(rnorm(n * p), ncol = p, nrow = n))
y <- X[, 1:5] %*% 1:5 + rnorm(n)

## Convert to big.matrix objects for out-of-memory processing.
library(bigmemory)
X.bm <- as.big.matrix(X)
y.bm <- as.big.matrix(y)

## Register a parallel backend for chunked computation.
library(doParallel)
registerDoParallel(cores = 2)
getDoParWorkers()

## Sparse PLS regression (case = 4) with 4 components, keeping
## 5 variables per component, processing the data in 10 chunks.
fit.PLS <- bigsgpls(X.bm, y.bm, case = 4, H = 4, ng = 10,
                    keepX = rep(5, 4), regularised = "sparse")
pred.fit <- predict(fit.PLS, newX = X, ng = 1)
round(pred.fit$Beta, 3)
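
Building on the example above, a hedged sketch of a sparse group fit on the same simulated data. It assumes ind.block.x follows the sgPLS cut-point convention and that, for regularised = "sparse group", keepX gives the number of groups (rather than single variables) retained per component; adjust the values if your version of the package interprets them differently.

## Sparse group fit (a sketch under the assumptions stated above).
ind.block.x <- seq(5, 45, by = 5)            # ten groups of five variables
fit.sgPLS <- bigsgpls(X.bm, y.bm, case = 4, H = 4, ng = 10,
                      regularised = "sparse group",
                      keepX = rep(2, 4),     # assumed: groups kept per component
                      alpha.x = 0.5,         # within-group sparsity mixing
                      ind.block.x = ind.block.x)
round(predict(fit.sgPLS, newX = X, ng = 1)$Beta, 3)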
