| glmtrans | R Documentation | 
Fit a transfer learning generalized linear model through elastic net regularization, using a target data set and multiple source data sets. It also implements a transferable source detection algorithm, which helps avoid negative transfer in practice. It currently supports Gaussian, logistic, and Poisson models.
glmtrans(
  target,
  source = NULL,
  family = c("gaussian", "binomial", "poisson"),
  transfer.source.id = "auto",
  alpha = 1,
  standardize = TRUE,
  intercept = TRUE,
  nfolds = 10,
  cores = 1,
  valid.proportion = NULL,
  valid.nfolds = 3,
  lambda = c(transfer = "lambda.1se", debias = "lambda.min", detection = "lambda.1se"),
  lambda.seq = list(transfer = NULL, debias = NULL, detection = NULL),
  detection.info = TRUE,
  target.weights = NULL,
  source.weights = NULL,
  C0 = 2,
  ...
)
| target | target data. Should be a list with elements x and y, where x is a predictor matrix with rows as observations and columns as variables, and y is the response vector. | 
| source | source data. Should be a list of sublists, where each sublist is a source data set with elements x and y, defined as in the target data. | 
| family | response type. Can be "gaussian", "binomial" or "poisson". Default = "gaussian". | 
| transfer.source.id | transferable source indices. Can be either a subset of  
 | 
| alpha | the elastic net mixing parameter, with 0 ≤ alpha ≤ 1. Default = 1. | 
| standardize | the logical flag for standardizing the x variables prior to fitting the model sequence. The coefficients are always returned on the original scale. Default = TRUE. | 
| intercept | the logical flag indicating whether the intercept should be fitted. Default = TRUE. | 
| nfolds | the number of folds. Used in the cross-validation for GLM elastic net fitting procedure. Default = 10. Smallest value allowable is  | 
| cores | the number of cores used for parallel computing. Default = 1. | 
| valid.proportion | the proportion of target data to be used as validation data when detecting transferable sources. Useful only when transfer.source.id = "auto". | 
| valid.nfolds | the number of folds used in the cross-validation procedure when detecting transferable sources. Useful only when transfer.source.id = "auto". | 
| lambda | a vector indicating the choice of lambdas in transferring, debiasing and detection steps. Should be a vector with names "transfer", "debias", and "detection", each component of which can be either "lambda.min" or "lambda.1se". Component  
 | 
| lambda.seq | the sequence of lambda candidates used in the algorithm. Should be a list of three vectors with names "transfer", "debias", and "detection". Default = list(transfer = NULL, debias = NULL, detection = NULL). "NULL" means the algorithm will determine the sequence automatically, based on the same method used in  | 
| detection.info | the logical flag indicating whether to print detection information or not. Useful only when transfer.source.id = "auto". | 
| target.weights | weight vector for the target instances. Should be a vector with the same length as the target response. Default = NULL. | 
| source.weights | a list of weight vectors for the instances from each source. Should be a list whose length equals the number of sources. Default = NULL. | 
| C0 | the constant used in the transferable source detection algorithm. See Algorithm 2 in Tian, Y. & Feng, Y. (2023). Default = 2. | 
| ... | additional arguments. | 
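The input format described above can be assembled from ordinary matrices and vectors. The following is a minimal sketch with simulated data; the object names (`x.target`, `make_source`, `source.list`, `w.target`, `w.source`) and the dimensions are illustrative assumptions, not part of the package, and the `glmtrans` call is only run when the package is installed.

```r
set.seed(1)
p <- 20            # number of predictors (illustrative)
n.target <- 50     # target sample size (illustrative)
n.source <- 80     # sample size of each source (illustrative)

# Target data: a list with a predictor matrix x (rows = observations)
# and a response vector y
x.target <- matrix(rnorm(n.target * p), nrow = n.target, ncol = p)
y.target <- as.numeric(x.target[, 1:3] %*% c(1, -1, 0.5) + rnorm(n.target))
target <- list(x = x.target, y = y.target)

# Source data: a list of sublists, each with its own x and y
make_source <- function(n) {   # hypothetical helper, not part of glmtrans
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  y <- as.numeric(x[, 1:3] %*% c(1, -1, 0.5) + rnorm(n))
  list(x = x, y = y)
}
source.list <- list(make_source(n.source), make_source(n.source))

# Optional weights: one vector for the target, one vector per source
w.target <- rep(1, length(target$y))
w.source <- lapply(source.list, function(s) rep(1, length(s$y)))

# Fit only if the package is available
if (requireNamespace("glmtrans", quietly = TRUE)) {
  fit <- glmtrans::glmtrans(target, source.list, family = "gaussian",
                            target.weights = w.target,
                            source.weights = w.source)
}
```

Equal weights are used here only to show the expected shapes: `w.target` matches the length of `target$y`, and `w.source` holds one vector per source data set.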
An object with S3 class "glmtrans".
| beta | the estimated coefficient vector. | 
| family | the response type. | 
| transfer.source.id | the transferable source index. If in the input,  | 
| fitting.list | a list of other parameters of the fitted model. | 
w_a: the estimator obtained from the transferring step.
delta_a: the estimator obtained from the debiasing step.
target.valid.loss: the validation (or cross-validation) loss on target data. Only available when transfer.source.id = "auto".
source.loss: the loss on each source data. Only available when transfer.source.id = "auto".
threshold: the threshold to determine transferability. Only available when transfer.source.id = "auto".
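As a brief illustration of the return value, the sketch below fits a small Gaussian model on data simulated with `models` and looks at the documented components. The problem sizes are arbitrary assumptions chosen to keep the run short, and the code is skipped when the package is not installed.

```r
if (requireNamespace("glmtrans", quietly = TRUE)) {
  library(glmtrans)
  set.seed(0, kind = "L'Ecuyer-CMRG")

  # Simulate a target data set and two source data sets (sizes are illustrative)
  D <- models("gaussian", type = "all", n.target = 100, K = 2, p = 50)
  fit <- glmtrans(D$target, D$source)

  class(fit)                 # S3 class of the fitted object
  length(fit$beta)           # length of the estimated coefficient vector
  fit$family                 # response type used in the fit
  fit$transfer.source.id     # sources selected as transferable
  names(fit$fitting.list)    # other fitted quantities, such as w_a and delta_a
}
```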
Tian, Y. & Feng, Y. (2023). Transfer learning under high-dimensional generalized linear models. Journal of the American Statistical Association, 118(544), 2684-2697.
Li, S., Cai, T.T. & Li, H. (2020). Transfer learning for high-dimensional linear regression: Prediction, estimation, and minimax optimality. arXiv preprint arXiv:2006.10593.
Friedman, J., Hastie, T. & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Zou, H. & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
predict.glmtrans, source_detection, models, plot.glmtrans, cv.glmnet, glmnet.
set.seed(0, kind = "L'Ecuyer-CMRG")
# fit a linear regression model
D.training <- models("gaussian", type = "all", n.target = 100, K = 2, p = 500)
D.test <- models("gaussian", type = "target", n.target = 500, p = 500)
fit.gaussian <- glmtrans(D.training$target, D.training$source)
y.pred.glmtrans <- predict(fit.gaussian, D.test$target$x)
# compare the test MSE with classical Lasso fitted on target data
library(glmnet)
fit.lasso <- cv.glmnet(x = D.training$target$x, y = D.training$target$y)
y.pred.lasso <- predict(fit.lasso, D.test$target$x)
mean((y.pred.glmtrans - D.test$target$y)^2)
mean((y.pred.lasso - D.test$target$y)^2)
# fit a logistic regression model
D.training <- models("binomial", type = "all", n.target = 100, K = 2, p = 500)
D.test <- models("binomial", type = "target", n.target = 500, p = 500)
fit.binomial <- glmtrans(D.training$target, D.training$source, family = "binomial")
y.pred.glmtrans <- predict(fit.binomial, D.test$target$x, type = "class")
# compare the test error with classical Lasso fitted on target data
library(glmnet)
fit.lasso <- cv.glmnet(x = D.training$target$x, y = D.training$target$y, family = "binomial")
y.pred.lasso <- as.numeric(predict(fit.lasso, D.test$target$x, type = "class"))
mean(y.pred.glmtrans != D.test$target$y)
mean(y.pred.lasso != D.test$target$y)
# fit a Poisson regression model
D.training <- models("poisson", type = "all", n.target = 100, K = 2, p = 500)
D.test <- models("poisson", type = "target", n.target = 500, p = 500)
fit.poisson <- glmtrans(D.training$target, D.training$source, family = "poisson")
y.pred.glmtrans <- predict(fit.poisson, D.test$target$x, type = "response")
# compare the test MSE with classical Lasso fitted on target data
fit.lasso <- cv.glmnet(x = D.training$target$x, y = D.training$target$y, family = "poisson")
y.pred.lasso <- as.numeric(predict(fit.lasso, D.test$target$x, type = "response"))
mean((y.pred.glmtrans - D.test$target$y)^2)
mean((y.pred.lasso - D.test$target$y)^2)
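The examples above all rely on the default settings. As a further sketch, the `lambda` argument can be set per step using the named-vector form shown in the usage section; the choice of "lambda.min" for the transferring and debiasing steps and the reduced dimension `p = 50` are illustrative assumptions, not recommendations. The code is skipped when the package is not installed.

```r
if (requireNamespace("glmtrans", quietly = TRUE)) {
  library(glmtrans)
  set.seed(0, kind = "L'Ecuyer-CMRG")
  D.training <- models("gaussian", type = "all", n.target = 100, K = 2, p = 50)

  # Use lambda.min in the transferring and debiasing steps, keep lambda.1se
  # for detection, and suppress the printed detection information
  fit.min <- glmtrans(
    D.training$target, D.training$source,
    family = "gaussian",
    lambda = c(transfer = "lambda.min", debias = "lambda.min",
               detection = "lambda.1se"),
    detection.info = FALSE
  )

  fit.min$transfer.source.id   # sources selected by the detection step
}
```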