Fit a regularized lasso model using customized training
Customized training is a simple strategy for making predictions on test data when the features of the test data are available at the time of model fitting. The method clusters the data to find training points close to each test point and then fits a GLM elastic net model separately in each training cluster. In this way, customized training is a localized method for transductive learning. In contrast with local regression, however, instead of fitting a separate regression model for each test point, customized training fits only one model for each of a handful of clusters.
customizedGlmnet() to fit a
glmnet() model with customized
cv.customizedGlmnet() to do the same while choosing the
regularization parameter and potentially the number of groups using
predict() methods are
implemented for both
Scott Powers, Trevor Hastie, Robert Tibshirani
Maintainer: Scott Powers <email@example.com>
Scott Powers, Trevor Hastie and Robert Tibshirani (2015) "Customized training with an application to mass specrometric imaging of gastric cancer data." Annals of Applied Statistics 9, 4:1709-1725.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
require(glmnet) # Simulate three groups, each with a different sparse linear model # producing the response, and fit customized training model (with CV) to the # synthetic data # Simulation parameters: n = m = 300 p = 100 q = 10 K = 3 sigmaC = 10 sigmaX = sigmaY = 1 set.seed(5914) # Produce the synthetic data: beta = matrix(0, nrow = p, ncol = K) for (k in 1:K) beta[sample(1:p, q), k] = 1 c = matrix(rnorm(K*p, 0, sigmaC), K, p) eta = rnorm(K) pi = (exp(eta)+1)/sum(exp(eta)+1) z = t(rmultinom(m + n, 1, pi)) x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p) y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY) x.train = x[1:n, ] y.train = y[1:n] x.test = x[n + 1:m, ] foldid = sample(rep(1:10, length = nrow(x.train))) # Fit the customized training model with CV: fit2 = cv.customizedGlmnet(x.train, y.train, x.test, Gs = 1:3, family = "gaussian", foldid = foldid) # Print the optimal number of groups and value of lambda: fit2$G.min fit2$lambda.min # Print the customized training model fit: fit2 # Compute test error using the predict function: mean((y[n + 1:m] - predict(fit2))^2) # Plot nonzero coefficients by group: plot(fit2)