title: "A demonstration of the glmtrans package"
author: "Ye Tian and Yang Feng"
date: "r Sys.Date()
"
header-includes:
- \usepackage{dsfont}
- \usepackage{bm}
- \usepackage[mathscr]{eucal}
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{glmtrans-demo}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
We provide an introductory demo of the usage of the \verb+glmtrans+ package. This package implements transfer learning algorithms for high-dimensional generalized linear models (@tian2021transfer).
```{r}
library(formatR)
```
Given the predictor $\bm{x}$, if the response $y$ follows a generalized linear model (GLM), then its distribution satisfies $$ y|\bm{x} \sim \mathbb{P}(y|\bm{x}) = \rho(y)\exp\{y\bm{x}^T\bm{w} - \psi(\bm{x}^T\bm{w})\}, $$ where $\psi'(\bm{x}^T\bm{w}) = \mathbb{E}(y|\bm{x})$ is called the \emph{inverse link function} (@mccullagh1989generalized). Another important property, derived from the exponential family, is that $\text{Var}(y|\bm{x})=\psi''(\bm{x}^T\bm{w})$. It is the function $\psi$ that characterizes different GLMs. For example, in the Gaussian model, the response $y$ is continuous and $\psi(u)=\frac{1}{2}u^2$; in the logistic model, $y$ is binary and $\psi(u)=\log(1+e^u)$; and in the Poisson model, $y$ is integer-valued and $\psi(u)=e^u$.
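As a quick illustration (this is only for intuition and is not part of the \verb+glmtrans+ package), the following sketch writes down $\psi$ and its derivative $\psi'$, i.e., the conditional mean, for the three families above:

```{r}
# Illustrative only: the cumulant function psi() and its derivative
# psi'(u) = E(y | x^T w = u) for the Gaussian, logistic, and Poisson models.
psi <- list(
  gaussian = function(u) u^2 / 2,
  binomial = function(u) log(1 + exp(u)),
  poisson  = function(u) exp(u)
)
psi.prime <- list(
  gaussian = function(u) u,                      # identity mean
  binomial = function(u) exp(u) / (1 + exp(u)),  # logistic mean
  poisson  = function(u) exp(u)                  # exponential mean
)
u <- 0.3
sapply(names(psi), function(f) c(psi = psi[[f]](u), mean = psi.prime[[f]](u)))
```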
Consider the multi-source transfer learning problem. Suppose we have the \emph{target} data $(X^{(0)}, Y^{(0)}) = \{\bm{x}_{i}^{(0)}, y_{i}^{(0)}\}_{i=1}^{n_0}$ and \emph{source} data $\{(X^{(k)}, Y^{(k)})\}_{k=1}^K = \{\{(\bm{x}_{i}^{(k)}, y_{i}^{(k)})\}_{i=1}^{n_k}\}_{k=1}^K$. Denote the target coefficient $\bm{\beta} = \bm{w}^{(0)}$. Suppose the target and source data follow the GLM $$ y^{(k)}|\bm{x}^{(k)} \sim \mathbb{P}(y^{(k)}|\bm{x}^{(k)}) = \rho(y^{(k)})\exp\{y^{(k)}(\bm{x}^{(k)})^T\bm{w}^{(k)} - \psi((\bm{x}^{(k)})^T\bm{w}^{(k)})\}, $$ for $k = 0, 1, \ldots, K$.
In order to borrow information from transferable sources, @bastani2020predicting and @li2020transfer developed \emph{two-step transfer learning algorithms} for high-dimensional linear models. In the first step, a rough estimator is obtained using the target data together with the useful source data. In the second step, the target data alone is used to debias the estimator from the first step, leading to the final estimator.
@tian2021transfer extends this idea to GLMs and proposes the corresponding \emph{oracle} algorithm, which can be applied directly when the transferable sources are known. The resulting estimator is proved to enjoy a sharper $\ell_2$-estimation error bound when the transferable source data and the target data are sufficiently similar.
In the multi-source transfer learning problem, some adversarial sources may share little similarity with the target, which can mislead the fitted model. This phenomenon is called \emph{negative transfer} (@pan2009survey, @torrey2010transfer, @weiss2016survey).
To detect which sources are transferable, @tian2021transfer develops an \emph{algorithm-free} detection approach. Roughly speaking, it computes the gain from transferring each single source and compares it with a baseline that uses only the target data. Sources with a significant performance gain over the baseline are regarded as transferable. @tian2021transfer also proves the detection consistency of this method under the high-dimensional GLM setting.
The implementation of this package leverages the package \verb+glmnet+, which applies cyclical coordinate descent and is very efficient (@friedman2010regularization). We use the \verb+offset+ argument provided by the functions \verb+glmnet+ and \verb+cv.glmnet+ to implement our two-step algorithms. Besides the Lasso (@tibshirani1996regression), this package can also accommodate the elastic-net-type penalty (@zou2005regularization).
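To make the two-step idea concrete, here is a minimal conceptual sketch of how the \verb+offset+ argument can implement it. This is not the internal code of \verb+glmtrans+ (which handles cross-validation, penalty choices, and intercepts more carefully); the function name \verb+two.step.sketch+ and its interface are hypothetical.

```{r, eval = FALSE}
library(glmnet)

# Conceptual sketch of the two-step transfer learning idea via glmnet offsets.
# Not the actual glmtrans implementation; names and interface are illustrative.
two.step.sketch <- function(x.target, y.target, x.source, y.source,
                            family = "binomial") {
  # Step 1 (transferring): penalized fit on the pooled target + source data
  x.pool <- rbind(x.target, x.source)
  y.pool <- c(y.target, y.source)
  fit1 <- cv.glmnet(x.pool, y.pool, family = family)
  w <- as.numeric(coef(fit1))                 # rough estimator (with intercept)

  # Step 2 (debiasing): refit on the target data only, keeping the step-1
  # linear predictor fixed through the offset, so only a correction is estimated
  off <- as.numeric(cbind(1, x.target) %*% w)
  fit2 <- cv.glmnet(x.target, y.target, family = family, offset = off)
  delta <- as.numeric(coef(fit2))

  w + delta                                   # final coefficient estimate
}
```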
\verb+glmtrans+ is now available on CRAN and can be installed with a single line of code.
install.packages("glmtrans", repos = "http://cran.us.r-project.org")
Then we can load the package:
```{r}
library(glmtrans)
```
In this section, we show how to use the provided functions to fit the model, make predictions, and visualize the results. We take logistic data as an example.
We first generate some logistic data through the function \verb+models+. For the target data, we set the coefficient vector $\bm{\beta} = (0.5\cdot \bm{1}_s, \bm{0}_{p-s})$, where $p = 500$ and $s = 10$. For $k$ in the transferable source index set $\mathcal{A}$, let $\bm{w}^{(k)} = \bm{\beta} + h/p\cdot\bm{\mathscr{R}}_p^{(k)}$, where $h = 5$ and $\bm{\mathscr{R}}_p^{(k)}$ consists of $p$ independent Rademacher variables (equal to $-1$ or $1$ with equal probability) for any $k$. $\bm{\mathscr{R}}_p^{(k)}$ is independent of $\bm{\mathscr{R}}_p^{(k')}$ for any $k \neq k'$. The coefficient of each non-transferable source is set to $\bm{\xi}+h/p\cdot \bm{\mathscr{R}}_p^{(k)}$, where $\bm{\xi}_{S'} = 0.5 \cdot \bm{1}_{2s}$ and $\bm{\xi}_{(S')^c} = \bm{0}_{p-2s}$,
where $S' = S'_1 \cup S_2'$ and $|S_1'| = |S_2'| = s = 10$. Here $S_1' = \{s+1, \ldots, 2s\}$, and $S_2'$ is randomly sampled from $\{2s+1, \ldots, p\}$. We also add an intercept $0.5$. The generating procedure of each non-transferable source is independent. The target predictors $\bm{x}^{(0)}_i \overset{i.i.d.}{\sim} N(\bm{0}, \bm{\Sigma})$, where $\bm{\Sigma} = (0.9^{|i-j|})_{p \times p}$, and the source predictors $\bm{x}^{(k)}_i \overset{i.i.d.}{\sim} t_4$. The target sample size is $n_0 = 100$ and each source sample size is $n_k = 100$ for $k = 1, \ldots, K$. We let $K = 5$ and $\mathcal{A} = \{1, 2, 3\}$.
We generate the training data as follows.
```{r}
set.seed(1, kind = "L'Ecuyer-CMRG")
D.training <- models(family = "binomial", type = "all", cov.type = 2, Ka = 3,
                     K = 5, s = 10, n.target = 100, n.source = rep(100, 5))
```
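The returned object is a list with a \verb+target+ component and a \verb+source+ component (a list of $K$ datasets), each containing a design matrix \verb+x+ and a response \verb+y+. A quick sanity check of the dimensions (assuming the structure just described):

```{r}
dim(D.training$target$x)       # target design matrix: n0 x p
length(D.training$source)      # number of source datasets K
dim(D.training$source[[1]]$x)  # first source design matrix: n1 x p
```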
Now suppose we know $\mathcal{A}$. Let's fit an "oracle" GLM transfer learning model on the target data and the source data in $\mathcal{A}$ by the oracle algorithm. We denote this procedure as Oracle-Trans-GLM.
```{r}
fit.oracle <- glmtrans(target = D.training$target, source = D.training$source,
                       family = "binomial", transfer.source.id = 1:3, cores = 2)
```
Notice that we set the argument \verb+transfer.source.id+ equal to $\mathcal{A} = \{1, 2, 3\}$ to transfer only the first three sources.
The output of the \verb+glmtrans+ function is an object belonging to the S3 class "glmtrans". It contains:

- \verb+beta+: the estimated coefficient vector.
- \verb+family+: the response type.
- \verb+transfer.source.id+: the transferable source index. If the input is \verb+transfer.source.id = 1:length(source)+ or \verb+transfer.source.id = "all"+, then the output is \verb+transfer.source.id = 1:length(source)+. If the input is \verb+transfer.source.id = "auto"+, only the transferable sources detected by the algorithm are output.
- \verb+fitting.list+: a list of other quantities from the fitted model. When \verb+transfer.source.id = "auto"+, it additionally records the validation (cv) loss on the target data, the loss on each source, the constant \verb+epsilon0+ (the transferability threshold is (1 + \verb+epsilon0+) times the validation (cv) loss on the target data), and the resulting threshold; these components are only available when \verb+transfer.source.id = "auto"+.
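For example, we can inspect which sources were used and obtain predictions from the fitted object. The call below is a small illustration; we assume the \verb+predict+ method for "glmtrans" objects follows a \verb+glmnet+-style interface with arguments \verb+newx+ and \verb+type+, and we simply reuse the target predictors as new data.

```{r}
fit.oracle$transfer.source.id   # sources used in the oracle fit
# illustrative predictions on the target design matrix (assumed interface);
# type = "class" is assumed to return predicted labels
pred.class <- predict(fit.oracle, newx = D.training$target$x, type = "class")
mean(pred.class == D.training$target$y)   # in-sample classification accuracy
```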
Now suppose we do not know $\mathcal{A}$. Let's set \verb+transfer.source.id = "auto"+ to apply the transferable source detection algorithm and obtain the estimate $\hat{\mathcal{A}}$. After that, \verb+glmtrans+ will automatically run the oracle algorithm on $\hat{\mathcal{A}}$ to fit the model. We denote this approach as Trans-GLM.
```{r}
fit.detection <- glmtrans(target = D.training$target, source = D.training$source,
                          family = "binomial", transfer.source.id = "auto", cores = 2)
```
From the results, we can see that $\mathcal{A} = \{1, 2, 3\}$ is successfully recovered by the detection algorithm. Next, to demonstrate the effectiveness of the GLM transfer learning algorithm and the transferable source detection algorithm, we also fit the naive Lasso on the target data (Lasso) and a transfer learning model using all source data (Pooled-Trans-GLM) as baselines.
```{r}
library(glmnet)
fit.lasso <- cv.glmnet(x = D.training$target$x, y = D.training$target$y, family = "binomial")
fit.pooled <- glmtrans(target = D.training$target, source = D.training$source,
                       family = "binomial", transfer.source.id = "all", cores = 2)
```
Finally, we compare the $\ell_2$-estimation errors of the target coefficient $\bm{\beta}$ by different methods.
```{r}
beta <- c(0, rep(0.5, 10), rep(0, 500 - 10))
er <- numeric(4)
names(er) <- c("Lasso", "Pooled-Trans-GLM", "Trans-GLM", "Oracle-Trans-GLM")
er["Lasso"] <- sqrt(sum((coef(fit.lasso) - beta)^2))
er["Pooled-Trans-GLM"] <- sqrt(sum((fit.pooled$beta - beta)^2))
er["Trans-GLM"] <- sqrt(sum((fit.detection$beta - beta)^2))
er["Oracle-Trans-GLM"] <- sqrt(sum((fit.oracle$beta - beta)^2))
er
```
Note that the transfer learning models outperform the classical Lasso fitted on the target data alone. Due to negative transfer, Pooled-Trans-GLM performs worse than Oracle-Trans-GLM. By correctly detecting $\mathcal{A}$, Trans-GLM mimics the behavior of the oracle.
We can visualize the transferable source detection results by applying the \verb+plot+ function to objects of class "glmtrans" or "glmtrans_source_detection". The loss of each source and the transferability threshold will be drawn. The function \verb+glmtrans+ outputs objects of class "glmtrans", while the function \verb+source_detection+ outputs objects of class "glmtrans_source_detection". The function \verb+source_detection+ detects $\mathcal{A}$ without the post-detection model fitting step.
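If we only want $\hat{\mathcal{A}}$ without fitting the final model, we can call \verb+source_detection+ directly. The snippet below is a brief sketch; we assume its main arguments mirror those of \verb+glmtrans+ and that the detected index set is stored in the \verb+transferable.source.id+ component.

```{r}
# assumed interface: arguments mirror glmtrans; detected set stored in
# the transferable.source.id component of the returned object
detection.only <- source_detection(target = D.training$target, source = D.training$source,
                                   family = "binomial")
detection.only$transferable.source.id   # estimated transferable source set
```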
We call the \verb+plot+ function to visualize the results as follows.
```{r}
plot(fit.detection)
```