.makeLarsData    R Documentation

Add columns to a data matrix to represent polynomial and interaction terms

Description

This function adds columns to a data frame representing quadratic, cubic, two-way interaction, and linear-by-quadratic interaction terms. It is especially useful for preparing data for trainLars or predictLars.
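The column-naming conventions listed under Arguments can be illustrated with a rough base-R sketch of the added terms. It is not the enmSdm implementation: the helper `addTermsSketch()` is hypothetical, handles numeric predictors only, and skips scaling, factor handling, NA handling, and zero-variance removal. It also assumes that two-way interactions use each unordered pair of predictors once, while linear-by-quadratic terms use every ordered pair of distinct predictors.

```r
# Hypothetical helper (not part of enmSdm): adds polynomial and interaction
# columns to a data frame of numeric predictors, following the column-naming
# conventions described for .makeLarsData(). No scaling is performed.
addTermsSketch <- function(data, preds) {
  out <- data

  # Quadratic and cubic terms: <pred>_pow2 and <pred>_pow3
  for (p in preds) {
    out[[paste0(p, "_pow2")]] <- data[[p]]^2
    out[[paste0(p, "_pow3")]] <- data[[p]]^3
  }

  # Two-way interactions: <pred1>_by_<pred2>, each unordered pair once
  if (length(preds) > 1) {
    pairs <- utils::combn(preds, 2)
    for (k in seq_len(ncol(pairs))) {
      a <- pairs[1, k]
      b <- pairs[2, k]
      out[[paste0(a, "_by_", b)]] <- data[[a]] * data[[b]]
    }
  }

  # Linear-by-quadratic terms: <pred1>_by_<pred2>_pow2 = pred1 * pred2^2,
  # for every ordered pair of distinct predictors (an assumption)
  for (a in preds) {
    for (b in setdiff(preds, a)) {
      out[[paste0(a, "_by_", b, "_pow2")]] <- data[[a]] * data[[b]]^2
    }
  }

  out
}

toy <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))
names(addTermsSketch(toy, c("x1", "x2")))
```

For two predictors this adds x1_pow2, x1_pow3, x2_pow2, x2_pow3, x1_by_x2, x1_by_x2_pow2, and x2_by_x1_pow2 to the original columns; the column order produced by .makeLarsData itself may differ.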

Usage

.makeLarsData(
  data,
  resp,
  preds,
  scale = TRUE,
  quadratic = TRUE,
  cubic = TRUE,
  interaction = TRUE,
  interQuad = TRUE,
  na.rm = FALSE
)

Arguments

data

Data frame.

resp

Character, integer, or NULL. Name or index of the column in data that represents the response variable. If NULL then it is assumed there is no response column in data.

preds

Character or integer. Names or indices of columns to use as predictors.

scale

Logical or a list. If TRUE then scale values in data[ , preds] to have a mean of 0 and unit variance. Note that scaling is done before adding terms. If a list, then it should be the same as, for example, the two attributes named `scaled:center` and `scaled:scale` from attributes(scale(data[ , preds])). That is, its two elements hold the centers and scales (usually means and standard deviations) of each column of data[ , preds], and each is named by preds.

quadratic

Logical. If TRUE then include quadratic terms in the model construction stage for non-factor predictors. Quadratic columns will be named <predictor name>_pow2.

cubic

Logical. If TRUE then include cubic terms in the model construction stage for non-factor predictors. Cubic columns will be named <predictor name>_pow3.

interaction

Logical. If TRUE then include 2-way interaction terms (including interactions between factor predictors). Interaction columns will be named <predictor 1 name>_by_<predictor 2 name>.

interQuad

Logical. If TRUE then include all possible interactions of the form x * y^2 (linear-by-quadratic features), unless y is a factor. Linear-by-quadratic columns will be named <predictor 1 name>_by_<predictor 2 name>_pow2.

na.rm

Logical. If TRUE then remove all rows of data in which there is at least one NA among resp or preds. The default is FALSE, which will cause an error if any row has an NA.
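A list suitable for the `scale` argument can be built with base R's scale(), whose result carries the attributes `scaled:center` and `scaled:scale`. The sketch below assumes the list is keyed by those two attribute names (the text above says only that it is "the same as" those attributes). It also shows why storing them is useful: applying the stored centers and scales to new rows transforms them exactly like the training rows.

```r
set.seed(123)
train <- data.frame(x1 = 1:20, x2 = runif(20) * 1:20, x3 = rnorm(20) - 1:20)
preds <- c("x1", "x2", "x3")

# Scale the training predictors and keep the two attributes as a list;
# each element is a numeric vector named by preds.
scaled <- scale(train[ , preds])
scales <- attributes(scaled)[c("scaled:center", "scaled:scale")]
str(scales)

# Reuse the stored centers and scales on "new" data (here, the first 5 rows)
newdata <- train[1:5, preds]
newScaled <- scale(newdata,
                   center = scales[["scaled:center"]][preds],
                   scale  = scales[["scaled:scale"]][preds])

# The rescaled new rows match the corresponding rows of the training result
all.equal(unname(newScaled[ , ]), unname(scaled[1:5, ]))
```

Passing the training-data centers and scales, rather than rescaling new data by its own means and standard deviations, keeps predictions on the same footing as the fitted model.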

Details

If scale is TRUE then predictors with zero variance will be removed from the data before creating higher-order terms.
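The zero-variance rule can be sketched in base R as a filter on the predictor names applied before any higher-order terms are built. Without it, centering a constant column and dividing by its standard deviation of zero would turn every value into NaN. This is an illustration under assumptions, not the package's code: the helper `dropZeroVariance()` is hypothetical, and it checks only numeric columns (factors are kept as-is).

```r
# Hypothetical helper (not part of enmSdm): returns the subset of `preds`
# that would survive scaling, i.e. numeric predictors with nonzero variance.
# Factor (and other non-numeric) predictors are kept unchanged.
dropZeroVariance <- function(data, preds) {
  keep <- vapply(preds, function(p) {
    v <- data[[p]]
    if (!is.numeric(v)) return(TRUE)
    # isTRUE() also drops columns whose variance is NA (e.g. one non-NA value)
    isTRUE(stats::var(v, na.rm = TRUE) > 0)
  }, logical(1))
  preds[keep]
}

toy <- data.frame(x1 = 1:5, x2 = rep(3, 5), x3 = c(2, 7, 1, 8, 2))
dropZeroVariance(toy, c("x1", "x2", "x3"))
```

For the toy data above, x2 is constant and is dropped, leaving x1 and x3.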

Value

An object of class larsData (which is also a list) with six elements:
* A character named "resp" indicating the name of the column of data that contains the response variable.
* A character vector named "preds" indicating the names of the columns of data that contain the original predictors.
* A data frame named "data" containing data but with extra columns representing the added terms.
* A list named "scales" giving the centers and scales (means and standard deviations) used to center and rescale values in the data frame.
* A list named "groups" with groups of predictor names that should be considered together based on marginality.
* A list named "features" indicating what kinds of features were added to data (e.g., quadratic, cubic).

See Also

trainLars, predictLars

Examples

## Not run: 
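set.seed(123)
# Toy data: binary response y; x1 = 1:10 is recycled to fill all 20 rows
x <- data.frame(y=c(rep(1, 10), rep(0, 10)), x1=1:10,
  x2=runif(20) * 1:20, x3=rnorm(20) - 1:20)
# Defaults: scale, then add quadratic, cubic, interaction, and linear-by-quadratic terms
out <- .makeLarsData(x, resp='y', preds=c('x1', 'x2', 'x3'))
str(out)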

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.