source_detection: Transferable source detection for GLM transfer learning...
In glmtrans: Transfer Learning under Regularized Generalized Linear Models

View source: R/source_detection.R

source_detection

R Documentation

Transferable source detection for GLM transfer learning algorithm.

Description

Detect transferable sources from multiple source data sets. Currently supports Gaussian, logistic, and Poisson models.

Usage

source_detection(
  target,
  source = NULL,
  family = c("gaussian", "binomial", "poisson"),
  alpha = 1,
  standardize = TRUE,
  intercept = TRUE,
  nfolds = 10,
  cores = 1,
  valid.nfolds = 3,
  lambda = "lambda.1se",
  lambda.seq = NULL,
  detection.info = TRUE,
  target.weights = NULL,
  source.weights = NULL,
  C0 = 2,
  ...
)

Arguments

`target`	target data. Should be a list with elements x and y, where x is a predictor matrix whose rows are observations and whose columns are variables, and y is the response vector.
`source`	source data. Should be a list of sublists, where each sublist is a source data set with elements x and y, defined as in the target data.
`family`	response type. Can be "gaussian", "binomial" or "poisson". Default = "gaussian". "gaussian": Gaussian distribution. "binomial": binomial distribution (logistic regression). When `family = "binomial"`, the response in both `target` and `source` should be coded as 0/1. "poisson": Poisson distribution. When `family = "poisson"`, the response in both `target` and `source` should be non-negative.
`alpha`	the elastic net mixing parameter, with `0 \leq \alpha \leq 1`. The penalty is defined as `(1-\alpha)/2 \|\beta\|_2^2 + \alpha \|\beta\|_1`. `alpha = 1` gives the lasso penalty, while `alpha = 0` gives the ridge penalty. Default = 1.
`standardize`	the logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is `TRUE`.
`intercept`	the logical indicator of whether the intercept should be fitted or not. Default = `TRUE`.
`nfolds`	the number of folds. Used in the cross-validation for GLM elastic net fitting procedure. Default = 10. Smallest value allowable is `nfolds = 3`.
`cores`	the number of cores used for parallel computing. Default = 1.
`valid.nfolds`	the number of folds used in cross-validation procedure when detecting transferable sources. Useful only when `transfer.source.id = "auto"`. Default = 3.
`lambda`	lambda (the penalty parameter) used in the transferable source detection algorithm. Can be either "lambda.min" or "lambda.1se". Default = "lambda.1se". "lambda.min": the value of lambda that gives the minimum mean cross-validated error in the sequence of lambda. "lambda.1se": the largest value of lambda such that the error is within 1 standard error of the minimum.
`lambda.seq`	the sequence of lambda candidates used in the algorithm. Should be a vector of numerical values. Default = NULL, which means the algorithm will determine the sequence automatically, based on the same method used in `cv.glmnet`.
`detection.info`	the logical flag indicating whether to print detection information or not. Useful only when `transfer.source.id = "auto"`. Default = `TRUE`.
`target.weights`	weight vector for the target instances. Should be a vector with the same length as the target response. Default = `NULL`, which weights all instances equally.
`source.weights`	a list of weight vectors for the instances from each source. Should be a list whose length equals the number of sources. Default = `NULL`, which weights all instances equally.
`C0`	the constant used in the transferable source detection algorithm. See Algorithm 2 in Tian, Y. and Feng, Y., 2021. Default = 2.
`...`	additional arguments.
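
The input layout and the penalty described above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated noise, the names `target.data`, `source.data`, `check_dataset` and `elastic_net_penalty` are hypothetical and not part of glmtrans, and no glmtrans function is called.

```r
# Illustrative sketch only: base R, simulated data, no glmtrans calls.
set.seed(1)
n.target <- 50
p <- 10

# Target: a list with a predictor matrix x (rows = observations,
# columns = variables) and a response vector y.
target.data <- list(
  x = matrix(rnorm(n.target * p), nrow = n.target, ncol = p),
  y = rnorm(n.target)
)

# Source: a list of sublists, one per source data set, each with x and y.
n.source <- c(80, 120)
source.data <- lapply(n.source, function(n) {
  list(x = matrix(rnorm(n * p), nrow = n, ncol = p), y = rnorm(n))
})

# Weights: one vector for the target, and a list of vectors (one per source).
target.weights <- rep(1, n.target)
source.weights <- lapply(n.source, function(n) rep(1, n))

# Hypothetical helper (not part of glmtrans): check one data set's layout.
check_dataset <- function(d, p) {
  is.matrix(d$x) && ncol(d$x) == p && length(d$y) == nrow(d$x)
}
inputs_ok <- check_dataset(target.data, p) &&
  all(vapply(source.data, check_dataset, logical(1), p = p))

# Elastic net penalty as written for `alpha`:
# (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
elastic_net_penalty <- function(beta, alpha) {
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}
```

With `alpha = 1` the helper reduces to the L1 norm of `beta` (lasso), and with `alpha = 0` to half the squared L2 norm (ridge), matching the two limiting cases named in the `alpha` entry.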

Value

An object with S3 class "glmtrans_source_detection".

`transfer.source.id`	the index of transferable sources.
`source.loss`	the loss on each source data. Only available when `transfer.source.id = "auto"`.
`target.valid.loss`	the validation (or cross-validation) loss on target data. Only available when `transfer.source.id = "auto"`.
`threshold`	the threshold to determine transferability. Only available when `transfer.source.id = "auto"`.
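
As a rough illustration of how these components relate, the sketch below compares per-source losses against a threshold. The numbers are made up rather than produced by glmtrans, and the rule that a source is flagged when its loss does not exceed the threshold is an assumption made here for illustration; the exact rule is the one given in Algorithm 2 of Tian and Feng.

```r
# Illustrative numbers only; not output from glmtrans.
source.loss <- c(0.92, 1.85, 1.04)  # one loss value per source data set
threshold <- 1.10                   # cutoff used to decide transferability

# Assumed rule: a source counts as transferable when its loss
# does not exceed the threshold.
transfer.source.id <- which(source.loss <= threshold)
print(transfer.source.id)
```

Under that assumption, the first and third sources in this toy example would be flagged as transferable and the second would not.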

Note

source.loss and threshold output by source_detection can be visualized with the function plot.glmtrans.

References

Tian, Y., & Feng, Y. (2023). Transfer learning under high-dimensional generalized linear models. Journal of the American Statistical Association, 118(544), 2684-2697.

Li, S., Cai, T. T., & Li, H. (2020). Transfer learning for high-dimensional linear regression: Prediction, estimation, and minimax optimality. arXiv preprint arXiv:2006.10593.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.

Examples

library(glmtrans)

set.seed(0, kind = "L'Ecuyer-CMRG")

# study the linear model: simulate a target and K = 2 source data sets
D.training <- models("gaussian", type = "all", K = 2, p = 500, Ka = 1, n.target = 100, cov.type = 2)
# run source detection with the default family ("gaussian")
detection.gaussian <- source_detection(D.training$target, D.training$source)
detection.gaussian$transfer.source.id

# study the logistic model
D.training <- models("binomial", type = "all", K = 2, p = 500, Ka = 1, n.target = 100, cov.type = 2)
detection.binomial <- source_detection(D.training$target, D.training$source,
                                       family = "binomial", cores = 2)
detection.binomial$transfer.source.id

# study the Poisson model
D.training <- models("poisson", type = "all", K = 2, p = 500, Ka = 1, n.target = 100, cov.type = 2)
detection.poisson <- source_detection(D.training$target, D.training$source,
                                      family = "poisson", cores = 2)
detection.poisson$transfer.source.id

glmtrans documentation built on April 4, 2025, 12:32 a.m.