bigsgpls: Unified Algorithm for sparse group PLS methods


View source: R/generalised_algorithm.R

Description

Function to perform big sparse group Partial Least Squares (sgPLS) in the context of datasets that are divided into groups of variables. The sgPLS approach enables selection at both the group and single-feature levels.

Usage

bigsgpls(X, Y, regularised = "none", keepX = NULL, keepY = NULL,
  H = 3, alpha.x = 0, alpha.y = 0, case = 2, epsilon = 10^-6,
  ng = 1, big_matrix_backing = NULL, ind.block.x = NULL,
  ind.block.y = NULL, scale = TRUE, GPU = FALSE, lambda = 0)

Arguments

X

matrix or big.matrix object for data measured on the same samples. Corresponds to predictors in Case 2.

Y

matrix or big.matrix object for data measured on the same samples. Corresponds to responses in Case 4.

regularised

type of regularisation (can be one of "none", "sparse", "group" or "sparse group")

keepX

numeric vector of length H specifying the number of variables to keep in the X-loadings for each component (the penalisation parameter). By default all variables are kept in the model.

keepY

numeric vector of length H specifying the number of variables to keep in the Y-loadings for each component (the penalisation parameter). By default all variables are kept in the model.

H

the number of components to include in the model.

alpha.x

The mixing parameter (a value between 0 and 1) controlling the within-group sparsity for the X dataset.

alpha.y

The mixing parameter (a value between 0 and 1) controlling the within-group sparsity for the Y dataset.

case

matches the Algorithm in the paper (1 = SVD, 2 = W2A, 3 = CCA, 4 = Regression).

epsilon

A positive real, the tolerance used in the iterative algorithm.

ng

The number of chunks used to read in the data and process them in parallel.

big_matrix_backing

The folder to use for file-backed output. If NULL then the output is not file backed.

ind.block.x

a vector of integers describing the grouping of the X-variables (see the sketch at the end of this argument list).

ind.block.y

a vector of integers describing the grouping of the Y-variables.

scale

If TRUE then the PLS data blocks are standardized to zero means and unit variances. Default TRUE.

GPU

If TRUE then use the GPU for calculation of the chunks in the cross product. Default FALSE.

lambda

Lambda for use in Case 3, the CCA implementation of PLS. Default 0.
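
The documentation above leaves the exact encoding of ind.block.x and ind.block.y implicit. Below is a minimal sketch, assuming the same convention as the related sgPLS package: the vector lists the cumulative index of the last variable in each group, with the final group omitted. Check the package source if your version differs.

## Hypothetical grouping: ten groups of five X-variables (50 in total).
## The cut-point convention here is an assumption borrowed from sgPLS.
group.sizes <- rep(5, 10)
ind.block.x <- cumsum(group.sizes)[-length(group.sizes)]
ind.block.x
## [1]  5 10 15 20 25 30 35 40 45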

Examples

## Simulate a small dataset: 500 samples, 50 predictors,
## with the first five variables driving the response.
set.seed(1)
n <- 500
p <- 50
X <- scale(matrix(rnorm(n * p), ncol = p, nrow = n))
y <- X[, 1:5] %*% 1:5 + rnorm(n)

## Convert to big.matrix objects for out-of-memory processing.
library(bigmemory)
X.bm <- as.big.matrix(X)
y.bm <- as.big.matrix(y)

## Register a parallel backend for chunked computation.
library(doParallel)
registerDoParallel(cores = 2)
getDoParWorkers()

## Sparse PLS regression (case = 4) with 4 components, keeping
## 5 variables per component, processing the data in 10 chunks.
fit.PLS <- bigsgpls(X.bm, y.bm, case = 4, H = 4, ng = 10,
                    keepX = rep(5, 4), regularised = "sparse")
pred.fit <- predict(fit.PLS, newX = X, ng = 1)
round(pred.fit$Beta, 3)
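
Building on the example above, a hedged sketch of a sparse group fit on the same simulated data. It assumes ind.block.x follows the sgPLS cut-point convention and that, for regularised = "sparse group", keepX gives the number of groups (rather than single variables) retained per component; adjust the values if your version of the package interprets them differently.

## Sparse group fit (a sketch under the assumptions stated above).
ind.block.x <- seq(5, 45, by = 5)            # ten groups of five variables
fit.sgPLS <- bigsgpls(X.bm, y.bm, case = 4, H = 4, ng = 10,
                      regularised = "sparse group",
                      keepX = rep(2, 4),     # assumed: groups kept per component
                      alpha.x = 0.5,         # within-group sparsity mixing
                      ind.block.x = ind.block.x)
round(predict(fit.sgPLS, newX = X, ng = 1)$Beta, 3)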
