Description Usage Arguments Details Value Author(s) References See Also Examples
Fits a regression model using a polynomial basis expansion of the input variables, with penalization via the adaptive LASSO or SCAD to provide oracle variable selection.
1 2 3 4 5 6 7 8  polywog(formula, data, subset, weights, na.action, degree = 3,
family = c("gaussian", "binomial"), method = c("alasso", "scad"),
penwt.method = c("lm", "glm"), unpenalized = character(0),
.parallel = FALSE, boot = 0, control.boot = control.bp(.parallel =
.parallel), lambda = NULL, nlambda = 100, lambda.min.ratio = 1e04,
nfolds = 10, foldid = NULL, thresh = ifelse(method == "alasso", 1e07,
0.001), maxit = ifelse(method == "alasso", 1e+05, 5000), model = TRUE,
X = FALSE, y = FALSE)

formula 
model formula specifying the response and input variables. See "Details" for more information. 
data 
a data frame, list or environment containing the variables specified in the model formula. 
subset 
an optional vector specifying a subset of observations to be used in fitting. 
weights 
an optional vector specifying weights for each observation to be used in fitting. 
na.action 
a function specifying what to do with observations
containing 
degree 
integer specifying the degree of the polynomial expansion of the input variables. 
family 

method 
variable selection method: 
penwt.method 
estimator for obtaining firststage estimates in
logistic models when 
unpenalized 
names of model terms to be exempt from the adaptive
penalty (only available when 
.parallel 
logical: whether to perform kfold crossvalidation in
parallel (only available when 
boot 
number of bootstrap iterations (0 for no bootstrapping). 
control.boot 
list of arguments to be passed to

lambda 
a vector of values from which the penalty factor is to be
selected via kfold crossvalidation. 
nlambda 
number of values of the penalty factor to examine via
crossvalidation if 
lambda.min.ratio 
ratio of the lowest value to the highest in the
generated sequence of values of the penalty factor if 
nfolds 
number of folds to use in crossvalidation to select the penalization factor. 
foldid 
optional vector manually assigning fold numbers to each
observation used for fitting (only available when 
thresh 
convergence threshold, passed as the 
maxit 
maximum number of iterations to allow in adaptive LASSO or SCAD fitting. 
model 
logical: whether to include the model frame in the returned object. 
X 
logical: whether to include the raw design matrix (i.e., the matrix of input variables prior to taking their polynomial expansion) in the returned object. 
y 
logical: whether to include the response variable in the returned object. 
The design matrix for the regression is a polynomial basis expansion of the
matrix of raw input variables. This includes all powers and interactions of
the input variables up to the specified degree
. For example, the
following terms will be included in polywog(y ~ x1 + x2, degree = 3,
...)
:
terms of degree 0: intercept
terms of degree 1: x1
, x2
terms of degree 2: x1^2
, x2^2
, x1*x2
terms of degree 3: x1^3
, x2^3
, x1*x2^2
,
x1^2*x2
To exclude certain terms from the basis expansion, use a model formula like
y ~ x1 + x2  z1 + z2
. Only the degree 1 terms of z1
and
z2
will be included.
It is possible that the "raw" basis expansion will be rankdeficient, such
as if there are binary input variables (in which case x_i = x_i^n for
all n > 0). The procedure detects collinearity via qr
and
removes extraneous columns before fitting.
For both the adaptive LASSO and SCAD, the penalization factor λ
is chosen by kfold crossvalidation. The selected value minimizes the
average mean squared error of outofsample fits. (To select both
λ and the polynomial degree simultaneously via crossvalidation,
see cv.polywog
.)
The crossvalidation process may be run in parallel via
foreach
by registering an appropriate backend and specifying
.parallel = TRUE
. The appropriate backend is systemspecific; see
foreach
for information on selecting and registering a
backend. The bootstrap iterations may also be run in parallel by
specifying control.boot = control.bp(.parallel = TRUE)
.
An object of class "polywog"
, a list containing:
coefficients
the estimated coefficients.
lambda
value of the penalty factor λ used to fit the final model.
lambda.cv
a list containing the results of the crossvalidation procedure used to select the penalty factor:
lambda
values of the penalty factor tested in crossvalidation.
cvError
outoffold prediction error corresponding to
each value of lambda
.
lambdaMin
value of lambda
with the minimal
crossvalidation error.
errorMin
minimized value of the crossvalidation error.
fitted.values
the fitted mean values for each observation used in fitting.
lmcoef
coefficients from an unpenalized leastsquares regression of the response variable on the polynomial expansion of the input variables.
penwt
adaptive weight given to each term in the LASSO
penalty (NULL
for models fit via SCAD).
formula
model formula, as a Formula
object.
degree
degree of the polynomial basis expansion.
family
model family, "gaussian"
or
"binomial"
.
weights
observation weights if specified.
method
the specified regularization method.
penwt.method
the specified method for calculating
the adaptive LASSO weights (NULL
for models fit via SCAD).
unpenalized
logical vector indicating which terms were not included in the LASSO penalty.
thresh
convergence threshold used in fitting.
maxit
iteration limit used in fitting.
terms
the terms
object used to construct the
model frame.
polyTerms
a matrix indicating the power of each raw input term (columns) in each term of the polynomial expansion used in fitting (rows).
nobs
the number of observations used to fit the model.
na.action
information on how NA
values in the input
data were handled.
xlevels
levels of factor variables used in fitting.
varNames
names of the raw input variables included in the model formula.
call
the original function call.
model
if model = TRUE
, the model frame used in
fitting; otherwise NULL
.
X
if X = TRUE
, the raw model matrix (i.e., prior to
taking the polynomial expansion); otherwise NULL
. For calculating
the expanded model matrix, see model.matrix.polywog
.
y
if y = TRUE
, the response variable used in
fitting; otherwise NULL
.
boot.matrix
if boot > 0
, a sparse matrix of class
"dgCMatrix"
where each column is the estimate from a
bootstrap replicate. See bootPolywog
for more information
on bootstrapping.
Brenton Kenkel and Curtis S. Signorino
Brenton Kenkel and Curtis S. Signorino. 2012. "A Method for Flexible Functional Form Estimation: Bootstrapped Basis Regression with Variable Selection." Typescript, University of Rochester.
To estimate variation via the bootstrap, see
bootPolywog
. To generate fitted values, see
predVals
(and the underlying method
predict.polywog
). For plots, see plot.polywog
.
The polynomial degree may be selected via crossvalidation using
cv.polywog
.
Adaptive LASSO estimates are provided via glmnet
and
cv.glmnet
from the glmnet package. SCAD estimates are
via ncvreg
and cv.ncvreg
in the ncvreg
package.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27  ## Using occupational prestige data
data(Prestige, package = "carData")
Prestige < transform(Prestige, income = income / 1000)
## Fit a polywog model with bootstrap iterations
## (note: using low convergence threshold to shorten computation time of the
## example, *not* recommended in practice!)
set.seed(22)
fit1 < polywog(prestige ~ education + income + type,
data = Prestige,
degree = 2,
boot = 5,
thresh = 1e4)
## Basic information
print(fit1)
summary(fit1)
## See how fitted values change with education holding all else fixed
predVals(fit1, "education", n = 10)
## Plot univariate relationships
plot(fit1)
## Use SCAD instead of adaptive LASSO
fit2 < update(fit1, method = "scad", thresh = 1e3)
cbind(coef(fit1), coef(fit2))

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.