rgam: Fit reluctant generalized additive model
In kjytay/relgam: Reluctant Generalized Additive Models

Description Usage Arguments Details Value Examples

View source: R/rgam.R

Fits a reluctant generalized additive model (RGAM) for an entire regularization path indexed by the parameter lambda. Fits linear, logistic, Poisson and Cox regression models. RGAM is a three-step algorithm: Step 1 fits the lasso and computes residuals, Step 2 constructs the non-linear features, and Step 3 fits a lasso of the response on both the linear and non-linear features.

rgam(
  x,
  y,
  lambda = NULL,
  lambda.min.ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  standardize = TRUE,
  nl_maker = relgam:::makef,
  family = c("gaussian", "binomial", "poisson", "cox"),
  offset = NULL,
  init_nz,
  nfolds = 5,
  foldid = NULL,
  gamma,
  parallel = FALSE,
  verbose = TRUE,
  ...
)

`x`	Input matrix, of dimension `nobs x nvars`; each row is an observation vector.
`y`	Response variable. Quantitative for `family = "gaussian"` or `family = "poisson"` (non-negative counts). For `family="binomial"`, should be a numeric vector consisting of 0s and 1s. For `family="cox"`, y should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating death, and '0' indicating right-censored.
`lambda`	A user-supplied `lambda` sequence. Typical usage is to have the program compute its own `lambda` sequence; supplying a value of lambda overrides this.
`lambda.min.ratio`	Smallest value for lambda as a fraction of the largest lambda value. The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01.
`standardize`	If `TRUE` (default), the columns of the input matrix are standardized before the algorithm is run. See details section for more information.
`nl_maker`	This is a function that takes in one feature `x` and one response `r` and fits a model of `r` on `x`. It returns a list of two items: (i) `f` which is a vector of fitted values, and (ii) a function `nl_predictor` which returns the fit for a given `newx` value. Additional arguments for `nl_maker` can be passed via `...`. The default is `relgam::makef`, which fits a smoothing spline with a user-specified degrees of freedom.
`family`	Response type. Either `"gaussian"` (default) for linear regression, `"binomial"` for logistic regression, `"poisson"` for Poisson regression or `"cox"` for Cox regression.
`offset`	A vector of length nobs. Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL. If supplied, then values must also be supplied to the predict function.
`init_nz`	A vector specifying which features we must include when computing the non-linear features. Default is to construct non-linear features for all given features.
`nfolds`	Number of folds for CV in Step 1 (default is 5). Although `nfolds` can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is `nfolds = 3`.
`foldid`	An optional vector of values between 1 and `nfolds` identifying what fold each observation is in. If supplied, `nfolds` can be missing.
`gamma`	Scale factor for non-linear features (vs. original features), to be between 0 and 1. Default is 0.8 if `init_nz = c()`, 0.6 otherwise.
`parallel`	If TRUE, the `cv.glmnet()` call in Step 1 is parallelized. Must register parallel before hand, such as doMC or others. Default is FALSE.
`verbose`	If `TRUE` (default), model-fitting is tracked with a progress bar.
`...`	Any additional arguments to be the non-linear fitter in Step 2.

If there are variables which the user definitely wants to compute non-linear versions for in Step 2 of the algorithm, they can be specified as a vector for the init_nz option. The algorithm will compute non-linear versions for these features as well as the features suggested by Step 1 of the algorithm.

If standardize = TRUE, the standard deviation of the linear and non-linear features would be 1 and gamma respectively. If standardize = FALSE, linear features will remain on their original scale while non-linear features would have standard deviation gamma times the mean standard deviation of the linear features.

For family="gaussian", rgam standardizes y to have unit variance (using 1/n rather than 1/(n-1) formula).

An object of class "rgam".

`full_glmfit`	The glmnet object resulting from Step 3: fitting a `glmnet` model for the response against the linear & non-linear features.
`nl_predictor`	List of functions used to get the non-linear features for new data. For internal use only.
`init_nz`	Column indices for the features which we allow to have non-linear relationship with the response.
`step1_nz`	Indices of features which CV in Step 1 chose.
`mxf`	Means of the features (both linear and non-linear).
`sxf`	Scale factors of the features (both linear and non-linear).
`feat`	Column indices of the non-zero features for each value of `lambda`.
`linfeat`	Column indices of the non-zero linear components for each value of `lambda`.
`nonlinfeat`	Column indices of the non-zero non-linear components for each value of `lambda`.
`nzero_feat`	The number of non-zero features for each value of `lambda`.
`nzero_lin`	The number of non-zero linear components for each value of `lambda`.
`nzero_nonlin`	The number of non-zero non-linear components for each value of `lambda`.
`lambda`	The actual sequence of `lambda` values used.
`p`	The number of features in the original data matrix.
`family`	Response type.
`call`	The call that produced this object.

set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1)
y <- x %*% beta + rnorm(n)

fit <- rgam(x, y)

# construct non-linear features for only those selected by Step 1
fit <- rgam(x, y, init_nz = c())

# specify scale factor gamma and degrees of freedom
fit <- rgam(x, y, gamma = 1, df = 6)

# binomial family
bin_y <- ifelse(y > 0, 1, 0)
fit2 <- rgam(x, bin_y, family = "binomial")

# Poisson family
poi_y <- rpois(n, exp(x %*% beta))
fit3 <- rgam(x, poi_y, family = "poisson")
# Poisson with offset
offset <- rnorm(n)
fit3 <- rgam(x, poi_y, family = "poisson", offset = offset)