subsampling: Subsampling

View source: R/subsampling.R

Description

Linear multiple output subsampling using multiple processors

Usage

subsampling(x, y, intercept = TRUE, weights = NULL, grouping = NULL,
  groupWeights = NULL, parameterWeights = NULL, alpha = 1, lambda,
  d = 100, train, test, collapse = FALSE, max.threads = NULL,
  use_parallel = FALSE, algorithm.config = lsgl.standard.config)

Arguments

x

design matrix, matrix of size N × p.

y

response matrix, matrix of size N × K.

intercept

whether the model should include intercept parameters.

weights

sample weights, vector of size N × K.

grouping

grouping of the features, a factor or vector of length p. Each element specifies the group of the corresponding feature.

groupWeights

the group weights, a vector of length m (the number of groups).

parameterWeights

a matrix of size K × p.

alpha

the α value: 0 for group lasso, 1 for lasso; a value between 0 and 1 gives a sparse group lasso penalty.

lambda

lambda.min relative to lambda.max, or the lambda sequence for the regularization path (that is, a vector, or a list of vectors with one lambda sequence per subsample).

d

length of the lambda sequence (ignored if length(lambda) > 1).

train

a list of training samples, each item of the list corresponding to a subsample. Each item in the list must be a vector with the indices of the training samples for the corresponding subsample. The length of the list must equal the length of the test list.

test

a list of test samples, each item of the list corresponding to a subsample. Each item in the list must be a vector with the indices of the test samples for the corresponding subsample. The length of the list must equal the length of the train list.

collapse

if TRUE the results for the subsamples will be collapsed into one result (this is useful if the subsamples are non-overlapping; a construction sketch follows the argument list).

max.threads

Deprecated (will be removed in 2018); use use_parallel = TRUE instead and register a parallel backend (see the 'doParallel' package). The maximal number of threads to be used.

use_parallel

If TRUE the foreach loop will use %dopar%. The user must register the parallel backend.

algorithm.config

the algorithm configuration to be used.
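
As an illustration (not taken from the package itself), the train and test arguments can be built as non-overlapping partitions of the samples, which is the situation where collapse = TRUE is useful. The sketch below assumes N samples, four disjoint folds, and the 'doParallel' package as the parallel backend:

N <- 100                                                 # number of samples (assumed)
folds <- split(sample(1:N), rep(1:4, length.out = N))    # 4 disjoint folds
test  <- folds                                           # each fold is a test set once
train <- lapply(folds, function(idx) setdiff(1:N, idx))  # remaining samples train

library(doParallel)                                      # backend for use_parallel = TRUE
cl <- makeCluster(2)
registerDoParallel(cl)
# fit <- lsgl::subsampling(x, y, lambda = 0.1, train = train, test = test,
#                          collapse = TRUE, use_parallel = TRUE)
stopCluster(cl)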

Value

Yhat

if collapse = FALSE, a list of length length(test) containing the predicted responses for each of the test sets; if collapse = TRUE, a list of length length(lambda) (a comparison sketch follows the value list).

Y.true

a list of length length(test) containing the true responses of the test samples.

features

number of features used in the models.

parameters

number of parameters used in the models.
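
As a reference for the structure of the returned object, the following is a minimal sketch (not part of the package) of how Yhat and Y.true could be compared directly. It assumes collapse = FALSE and that, for each test set i, Yhat[[i]] is a list of prediction matrices, one per lambda value; check str(fit$Yhat) to confirm the exact layout. The helper name subsample_mse is hypothetical:

# Mean squared test error for each lambda index (rows) and subsample (columns)
subsample_mse <- function(fit) {
  sapply(seq_along(fit$Yhat), function(i) {
    sapply(seq_along(fit$Yhat[[i]]), function(j) {
      mean((fit$Yhat[[i]][[j]] - fit$Y.true[[i]])^2)
    })
  })
}
# err <- subsample_mse(fit.sub)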

Author(s)

Martin Vincent

Examples

set.seed(100) # This may be removed, it ensures consistency of the daily tests

## Simulate from Y=XB+E, the dimension of Y is N x K, X is N x p, B is p x K

N <- 100 #number of samples
p <- 50 #number of features
K <- 25  #number of groups

B <- matrix(sample(c(rep(1,p*K*0.1),rep(0, p*K-as.integer(p*K*0.1)))),nrow=p,ncol=K)
X1 <- matrix(rnorm(N*p,1,1),nrow=N,ncol=p)
Y1 <- X1%*%B+matrix(rnorm(N*K,0,1),N,K)

## Do cross subsampling

train <- replicate(2, sample(1:N, 50), simplify = FALSE)
test <- lapply(train, function(idx) (1:N)[-idx])

lambda <- lapply(train, function(idx)
  lsgl::lambda(
    x = X1[idx, ],
    y = Y1[idx, ],
    alpha = 1,
    d = 15L,
    lambda.min = 5,
    intercept = FALSE
  )
)

fit.sub <- lsgl::subsampling(
 x = X1,
 y = Y1,
 alpha = 1,
 lambda = lambda,
 train = train,
 test = test,
 intercept = FALSE
)

Err(fit.sub)

## Do the same cross subsampling using 2 parallel units
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

# Run subsampling
# Using a lambda sequence ranging from the maximal lambda to 0.1 * maximal lambda
fit.sub <- lsgl::subsampling(
 x = X1,
 y = Y1,
 alpha = 1,
 lambda = 0.1,
 train = train,
 test = test,
 intercept = FALSE,
 use_parallel = TRUE
)

stopCluster(cl)

Err(fit.sub)
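
## A possible follow-up, not part of the original example: after inspecting
## the subsampling error, one would typically refit on the full data set at a
## suitable regularization level. This sketch assumes lsgl::fit accepts the
## same x, y, alpha, lambda and intercept arguments as subsampling.
fit.final <- lsgl::fit(
 x = X1,
 y = Y1,
 alpha = 1,
 lambda = 0.1, # relative lambda.min, as above
 intercept = FALSE
)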
