getPoly: Get polynomial terms

View source: R/getPoly.R

getPolyR Documentation

Get polynomial terms

Description

Generate polynomial terms of predictor variables for a data frame or data matrix.

Usage

getPoly(xdata = NULL, deg = 1, maxInteractDeg = deg,
        Xy = NULL, standardize = FALSE,
        noisy = TRUE, intercept = FALSE, returnDF = TRUE, 
        modelFormula = NULL, retainedNames = NULL, ...)

Arguments

xdata

Data matrix or data frame without response variable. Categorical variables (> 2 levels) should be passed as factors, not dummy variables or integers, to ensure the polynomial matrix is constructed properly.

deg

The max degree of power terms. Default 1 so just returns model matrix by default.

maxInteractDeg

The max degree of nondummy interaction terms. x1 * x2 is degree 2. x1^3 * x2^2 is degree 5. Implicitly constrained by deg. For example, if deg = 3 and maxInteractDegree = 2, x1^1 * x2^2 (i.e., degree 3) will be included but x1^2 * x2^2 (i.e., degree 4) will not.

Xy

The dataframe with the response in the final column (provide xdata or Xy but not both).Categorical variables (> 2 levels) should be passed as factors, not dummy variables or integers, to ensure the polynomial matrix is constructed properly.

standardize

Standardize all continuous variables? (Default: FALSE.)

noisy

Output progress updates? (Default: TRUE.)

intercept

Include intercept? (Default: FALSE.)

returnDF

Return a data.frame (as opposed to model.matrix)? (Default: TRUE.)

modelFormula

Internal use. Formula used to generate the training model matrix. Note: anticipates that polynomial terms are generated using internal functions of library(polyreg). Also, providing modelFormula bypasses deg and maxInteractDeg.

retainedNames

Internal use. colnames of polyMatrix object$xdata. Requires modelFormula be inputted as well.

...

Additional arguments to be passed to model.matrix() via polyreg:::model_matrix(). Note na.action = "na.omit".

Details

The getPoly function takes in a data frame or data matrix and generates polynomial terms of predictor variables.

Note the subtleties involving dummy variables. The square, cubic and so on terms are the same as the original variable, and the various duplicates must be eliminated.

Similarly, after dummy variable are created from a categorical variable having more than two levels, the resulting columns will be orthogonal to each other. In almost all cases, this argument should be set to TRUE at the training stage, and then in predictions one should use the vector of names in the component in the return value; predict.polyFit does the latter automatically.

Note: If a column that is an R factor has levels with spaces in the names, this will interfere with the parsing, and must be avoided.

Value

The return value of getPoly is a polyMatrix object. This is an S3 class containing a model.matrix xdata of the generated polynomial terms. The predictor variables have column names V1, V2, etc. The object also contains modelFormula, the formula used to construct the model matrix, and XtestFormula, the formula which should be used out-of-sample (when y_test is not available).

Examples

N <- 125
rawdata <- data.frame(x1 = rnorm(N), 
                      x2 = rnorm(N),
                      group = sample(letters[1:5], N, replace=TRUE),
                      z = sample(c("treatment", "control"), N, replace=TRUE),
                      result = sample(c("win", "lose", "tie"), N, replace=TRUE))
head(rawdata)

P <- length(levels(rawdata$group)) - 1 + 
     length(levels(rawdata$z)) - 1 + 
     length(levels(rawdata$result)) - 1 + 
     sum(unlist(lapply(rawdata, is.numeric)))

# quadratic polynomial, includes interactions 
# since maxInteractDeg defaults to deg
X <- getPoly(rawdata, 2)$xdata 
ncol(X) # 40

# cubic polynomial, no interactions
X <- getPoly(rawdata, 3, 1)$xdata
ncol(X) # 13

# cubic polynomial, interactions
X <- getPoly(rawdata, 3, 2)$xdata
ncol(X) # 58

# cubic polynomial, interactions
X <- getPoly(rawdata, 3)$xdata
ncol(X) # 101

# making final column the response variable, y
# results in TRUE (fewer columns)
ncol(getPoly(Xy=rawdata, deg=2)$xdata) < ncol(getPoly(rawdata, 2)$xdata)

# preparing polynomial matrices for crossvalidation
# getPoly() returns a polyMatrix() object containing XtestFormula
# which should be used to ensure factors are handled correctly out-of-sample
Xtrain <- getPoly(rawdata[1:100,],2)
Xtest <- getPoly(rawdata[101:125,], 2, modelFormula = Xtrain$XtestFormula)


polyreg documentation built on March 31, 2022, 9:05 a.m.