rgam: Fit reluctant generalized additive model

Description Usage Arguments Details Value Examples

View source: R/rgam.R

Description

Fits a reluctant generalized additive model (RGAM) for an entire regularization path indexed by the parameter lambda. Fits linear, logistic, Poisson and Cox regression models. RGAM is a three-step algorithm: Step 1 fits the lasso and computes residuals, Step 2 constructs the non-linear features, and Step 3 fits a lasso of the response on both the linear and non-linear features.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
rgam(
  x,
  y,
  lambda = NULL,
  lambda.min.ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  standardize = TRUE,
  nl_maker = relgam:::makef,
  family = c("gaussian", "binomial", "poisson", "cox"),
  offset = NULL,
  init_nz,
  nfolds = 5,
  foldid = NULL,
  gamma,
  parallel = FALSE,
  verbose = TRUE,
  ...
)

Arguments

x

Input matrix, of dimension nobs x nvars; each row is an observation vector.

y

Response variable. Quantitative for family = "gaussian" or family = "poisson" (non-negative counts). For family="binomial", should be a numeric vector consisting of 0s and 1s. For family="cox", y should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating death, and '0' indicating right-censored.

lambda

A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence; supplying a value of lambda overrides this.

lambda.min.ratio

Smallest value for lambda as a fraction of the largest lambda value. The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01.

standardize

If TRUE (default), the columns of the input matrix are standardized before the algorithm is run. See details section for more information.

nl_maker

This is a function that takes in one feature x and one response r and fits a model of r on x. It returns a list of two items: (i) f which is a vector of fitted values, and (ii) a function nl_predictor which returns the fit for a given newx value. Additional arguments for nl_maker can be passed via .... The default is relgam::makef, which fits a smoothing spline with a user-specified degrees of freedom.

family

Response type. Either "gaussian" (default) for linear regression, "binomial" for logistic regression, "poisson" for Poisson regression or "cox" for Cox regression.

offset

A vector of length nobs. Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL. If supplied, then values must also be supplied to the predict function.

init_nz

A vector specifying which features we must include when computing the non-linear features. Default is to construct non-linear features for all given features.

nfolds

Number of folds for CV in Step 1 (default is 5). Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds = 3.

foldid

An optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.

gamma

Scale factor for non-linear features (vs. original features), to be between 0 and 1. Default is 0.8 if init_nz = c(), 0.6 otherwise.

parallel

If TRUE, the cv.glmnet() call in Step 1 is parallelized. Must register parallel before hand, such as doMC or others. Default is FALSE.

verbose

If TRUE (default), model-fitting is tracked with a progress bar.

...

Any additional arguments to be the non-linear fitter in Step 2.

Details

If there are variables which the user definitely wants to compute non-linear versions for in Step 2 of the algorithm, they can be specified as a vector for the init_nz option. The algorithm will compute non-linear versions for these features as well as the features suggested by Step 1 of the algorithm.

If standardize = TRUE, the standard deviation of the linear and non-linear features would be 1 and gamma respectively. If standardize = FALSE, linear features will remain on their original scale while non-linear features would have standard deviation gamma times the mean standard deviation of the linear features.

For family="gaussian", rgam standardizes y to have unit variance (using 1/n rather than 1/(n-1) formula).

Value

An object of class "rgam".

full_glmfit

The glmnet object resulting from Step 3: fitting a glmnet model for the response against the linear & non-linear features.

nl_predictor

List of functions used to get the non-linear features for new data. For internal use only.

init_nz

Column indices for the features which we allow to have non-linear relationship with the response.

step1_nz

Indices of features which CV in Step 1 chose.

mxf

Means of the features (both linear and non-linear).

sxf

Scale factors of the features (both linear and non-linear).

feat

Column indices of the non-zero features for each value of lambda.

linfeat

Column indices of the non-zero linear components for each value of lambda.

nonlinfeat

Column indices of the non-zero non-linear components for each value of lambda.

nzero_feat

The number of non-zero features for each value of lambda.

nzero_lin

The number of non-zero linear components for each value of lambda.

nzero_nonlin

The number of non-zero non-linear components for each value of lambda.

lambda

The actual sequence of lambda values used.

p

The number of features in the original data matrix.

family

Response type.

call

The call that produced this object.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1)
y <- x %*% beta + rnorm(n)

fit <- rgam(x, y)

# construct non-linear features for only those selected by Step 1
fit <- rgam(x, y, init_nz = c())

# specify scale factor gamma and degrees of freedom
fit <- rgam(x, y, gamma = 1, df = 6)

# binomial family
bin_y <- ifelse(y > 0, 1, 0)
fit2 <- rgam(x, bin_y, family = "binomial")

# Poisson family
poi_y <- rpois(n, exp(x %*% beta))
fit3 <- rgam(x, poi_y, family = "poisson")
# Poisson with offset
offset <- rnorm(n)
fit3 <- rgam(x, poi_y, family = "poisson", offset = offset)

kjytay/relgam documentation built on March 4, 2020, 4:13 a.m.