knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

fwelnet is a package that fits the feature-weighted elastic net (fwelnet), a variant of the elastic net with feature-specific penalties. These penalties are based on additional information that the user has about the features. fwelnet works with continuous and binary responses.

We introduce some notation that we will use throughout this vignette. Let there be $n$ observations, each with feature vector $x_i \in \mathbb{R}^p$ and response $y_i$. Let $\mathbf{X} \in \mathbb{R}^{n \times p}$ denote the overall feature matrix, and let $y \in \mathbb{R}^n$ denote the vector of responses. Let $X_j \in \mathbb{R}^n$ denote the $j$th column of $\mathbf{X}$.

Assume that we also have some information on the features themselves. We organize these "features of features" into an auxiliary matrix $\mathbf{Z} \in \mathbb{R}^{p \times K}$, where $p$ is the number of features and $K$ is the number of sources of feature information. Each column of $\mathbf{Z}$ contains the values for one source of feature information, while each row of $\mathbf{Z}$, denoted $\mathbf{z}_j^T$ for the $j$th row, contains the values that feature $j$ takes for the $K$ different sources.

Select hyperparameters $\alpha \in [0, 1]$ and $\lambda$ values $\lambda_1 > \dots > \lambda_m$. Here is a brief sketch of the fwelnet model-fitting algorithm:

  1. For $i = 1, \dots, m$, initialize $\beta^{(0)}(\lambda_i)$ at the elastic net solution for the corresponding $\lambda_i$. Set $\theta^{(0)} = \mathbf{0}$.

  2. For $k = 0, 1, \dots$ until convergence:

    (a) Set $\Delta \theta$ to be the component-wise mean/median of the gradient of the fwelnet objective function over $\lambda_1, \dots, \lambda_m$.

    (b) Set $\theta^{(k+1)} = \theta^{(k)} - \eta \Delta \theta$, where $\eta$ is the step size computed via backtracking line search to ensure that the mean/median of the fwelnet objective function is decreasing.

    (c) For $i = 1, \dots, m$, set $\beta^{(k+1)}(\lambda_i)$ to be the elastic net solution for $\lambda_i$, where the penalty factor for feature $j$ is $\left(\sum_{\ell = 1}^p \exp (\mathbf{z}_\ell^T \theta) \right) / (p \exp (\mathbf{z}_j^T \theta))$ (a short sketch of this penalty factor appears below the algorithm).

For a new observation $\begin{pmatrix} x_1' & \dots & x_p'\end{pmatrix}$, the fwelnet prediction at $\lambda_i$ would be $\hat{\beta}_0(\lambda_i) + \sum_{j=1}^p x_j' \hat{\beta}_j (\lambda_i)$. For more details, see our preprint.
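
To make step 2(c) above concrete, here is a small sketch, not part of the package, of how these penalty factors could be computed from $\mathbf{Z}$ and $\theta$; the function name penalty_factors is hypothetical:

# penalty factor for feature j: (sum_l exp(z_l^T theta)) / (p * exp(z_j^T theta))
penalty_factors <- function(z, theta) {
    scores <- exp(as.vector(z %*% theta))  # exp(z_j^T theta) for each feature j
    sum(scores) / (nrow(z) * scores)
}

# with theta = 0, every penalty factor equals 1, recovering the usual elastic net
penalty_factors(matrix(rnorm(12), nrow = 4, ncol = 3), rep(0, 3))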

The fwelnet() function fits this model and returns an object with class "fwelnet". The fwelnet package includes methods for prediction for "fwelnet" objects, as well as a function which performs $k$-fold cross-validation for fwelnet().

Installation

The fwelnet package can be installed from GitHub. Type the following command in the R console:

devtools::install_github("kjytay/fwelnet")

The fwelnet() function

The purpose of this section is to give users an overview of the fwelnet() function, which is the heart of this package. First, we load the fwelnet package:

library(fwelnet)

Let's generate some data. In this example, we assume that we have 40 features, and that these features come in 4 groups of 10. The response is a linear combination of the features from the first 2 groups with additive Gaussian noise.

set.seed(1)
n <- 100
p <- 40
groups <- list(1:10, 11:20, 21:30, 31:40)  # which features belong to which group
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta <- matrix(rep(1:0, each = 20), ncol = 1)  # first 20 features have coefficient 1, the rest 0
y <- x %*% beta + rnorm(n)

In order to fit a fwelnet model, we have to define a feature information matrix. In our example, we have $\mathbf{Z} \in \mathbb{R}^{40 \times 4}$, with $z_{jk} = 1\{ \text{feature } j \text{ belongs to group } k \}$.

# generate Z matrix
z <- matrix(0, nrow = p, ncol = length(groups))
for (i in 1:length(groups)) {
    z[groups[[i]], i] <- 1
}

Once z is specified, we can fit the fwelnet model with fwelnet(). This returns an object of class "fwelnet".

fit <- fwelnet(x, y, z)

To print model-fitting information to the console while the function is running, pass verbose = TRUE:

fit <- fwelnet(x, y, z, verbose = TRUE)

Looking into the "fwelnet" object

Predictions for the fwelnet model can be obtained using the generic predict method. Predictions are returned for the lambda values in the "fwelnet" object. For example, the code below returns predictions for the first 5 observations in the training set for the 20th lambda value:

predict(fit, x[1:5, ])[, 20]

By default, the coef method returns the intercept and coefficients for the model for the whole lambda path as a sparse matrix. The code below returns the coefficients (with intercept) for the 20th lambda value:

as.numeric(coef(fit)[, 20])

The "fwelnet" object contains other key-value pairs that might be of interest. If the name of our object is fit, fit$lambda contains the values in the lambda path at which the fwelnet model is computed. fit$nzero tells us the number of non-zero coefficients at each lambda value. While not used in prediction, fit$theta gives us an indication of the relative importance of each source of feature information (see paper for details).

fwelnet() function options

fwelnet() shares a number of options with other functions for fitting regularized models, such as glmnet::glmnet() (lambda, family, alpha, standardize, thresh), and also has some options of its own; see ?fwelnet for details.
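
For example, the shared options can be passed in the usual way; the values below are arbitrary and purely illustrative:

# a different elastic net mixing parameter, without standardizing the features
fit2 <- fwelnet(x, y, z, alpha = 0.5, standardize = FALSE)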

Cross-validation (CV)

We can perform $k$-fold cross-validation (CV) for fwelnet with cv.fwelnet(). It does 10-fold cross-validation by default:

set.seed(10)
cvfit <- cv.fwelnet(x, y, z)

We can change the number of CV folds using the nfolds option:

cvfit <- cv.fwelnet(x, y, z, nfolds = 5)

If we want to specify which observation belongs to which fold, we can do that by specifying the foldid option, which is a vector of length $n$, with the $i$th element being the fold number for observation $i$.

set.seed(3)
foldid <- sample(rep(seq(10), length = n))
cvfit <- cv.fwelnet(x, y, z, foldid = foldid)

A cv.fwelnet() call returns a cv.fwelnet object. We can plot this object to get the CV curve with error bars (one standard error in each direction). The left vertical dotted line represents lambda.min, the lambda value which attains minimum CV error, while the right vertical dotted line represents lambda.1se, the largest lambda value with CV error within one standard error of the minimum CV error.

plot(cvfit)

The numbers at the top represent the number of features that are included in the model (i.e. the number of $j$ such that $\hat{\beta}_j$ is non-zero).

The two special lambda values can be extracted directly from the cv.fwelnet object as well:

cvfit$lambda.min
cvfit$lambda.1se

Predictions can be made from the fitted cv.fwelnet object. By default, predictions are given for lambda equal to lambda.1se. To get predictions at lambda.min, set s = "lambda.min".

predict(cvfit, x[1:5, ])   # s = lambda.1se
predict(cvfit, x[1:5, ], s = "lambda.min")

fwelnet for binary data

Fwelnet models can be fit for binary data as well. In this case, a logistic model is assumed. Pass family = "binomial" to fwelnet() to fit a logistic model.

bin_y <- ifelse(y > 0, 1, 0)
binfit <- fwelnet(x, bin_y, z, family = "binomial")

When fitting a binomial model, note that the default return value for the predict method is the linear predictor $x^T \hat{\beta}$. To obtain fitted probabilities, pass type = "response":

# linear predictor
predict(binfit, x[1:5, ])[, 20]

# fitted probabilities
predict(binfit, x[1:5, ], type = "response")[, 20]
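
These two predictions are related by the logistic link: the fitted probability is $1 / (1 + e^{-\eta})$, where $\eta$ is the linear predictor. A quick sketch one could use to check this relationship on the fit above:

# fitted probabilities are the inverse logit of the linear predictor
eta <- predict(binfit, x[1:5, ])[, 20]
prob <- predict(binfit, x[1:5, ], type = "response")[, 20]
all.equal(as.numeric(1 / (1 + exp(-eta))), as.numeric(prob))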

