frscvNOMAD: Categorical Factor Regression Spline Cross-Validation

frscvNOMAD    R Documentation

Categorical Factor Regression Spline Cross-Validation

Description

frscvNOMAD uses NOMAD (Nonsmooth Optimization by Mesh Adaptive Direct Search; Abramson, Audet, Couture, Dennis Jr. and Le Digabel (2011)) to conduct a cross-validation directed search for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous predictors and nominal/ordinal (factor/ordered) predictors.

Usage

frscvNOMAD(xz,
           y,
           basis = c("additive","tensor","glp","auto"),
           complexity = c("degree-knots","degree","knots"),
           cv.df.min = 1,
           cv.func = c("cv.ls","cv.gcv","cv.aic"),
           degree = degree,
           degree.max = 10,
           degree.min = 0,
           display.nomad.progress = TRUE,
           display.warnings = TRUE,
           include = include,
           initial.mesh.size.integer = "1",
           knots = c("quantiles","uniform","auto"),
           max.bb.eval = 10000,
           min.mesh.size.integer = "1",
           min.frame.size.integer = "1",
           nmulti = 0,
           opts = list(),
           random.seed = 42,
           segments = segments,
           segments.max = 10,
           segments.min = 1,
           singular.ok = FALSE,
           tau = NULL,
           weights = NULL)

Arguments

Data, Model Inputs And Formula Interface

These arguments identify explicit data inputs for NOMAD spline search.

xz

continuous and/or nominal/ordinal (factor/ordered) predictors

y

continuous univariate vector

Basis And Spline Complexity

These arguments control basis type and spline complexity.

basis

a character string (default basis="additive") indicating whether the additive or tensor product B-spline basis matrix for a multivariate polynomial spline or generalized B-spline polynomial basis should be used. If basis="auto", the basis type is determined automatically by cross-validation. The choice is an ‘all or none’ proposition (i.e. interaction terms for all predictors or for no predictors, given the nature of ‘tensor products’). If there is only one predictor, this defaults to basis="additive" to avoid unnecessary computation, as the spline bases are equivalent in this case

complexity

a character string (default complexity="degree-knots") indicating whether model ‘complexity’ is determined by the degree of the spline or by the number of segments (‘knots’). This option allows the user to use cross-validation to select either the spline degree (number of knots held fixed) or the number of knots (spline degree held fixed) or both the spline degree and number of knots

degree

integer/vector specifying the degree of the B-spline basis for each dimension of the continuous x

degree.max

the maximum degree of the B-spline basis for each of the continuous predictors (default degree.max=10)

degree.min

the minimum degree of the B-spline basis for each of the continuous predictors (default degree.min=0)

knots

a character string (default knots="quantiles") specifying where knots are to be placed. ‘quantiles’ specifies knots placed at equally spaced quantiles (equal number of observations lie in each segment) and ‘uniform’ specifies knots placed at equally spaced intervals. If knots="auto", the knot type will be automatically determined by cross-validation

segments

integer/vector specifying the number of segments of the B-spline basis for each dimension of the continuous x (i.e. number of knots minus one)

segments.max

the maximum segments of the B-spline basis for each of the continuous predictors (default segments.max=10)

segments.min

the minimum segments of the B-spline basis for each of the continuous predictors (default segments.min=1)
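
The difference between the two knot placement rules described above can be sketched in base R. This is an illustration only: the variable names are hypothetical, and the package's internal knot computation may differ in detail (for example, in how ties are handled).

```r
## Illustrative base-R sketch of "quantiles" versus "uniform" knot placement.
set.seed(42)
x <- rexp(200)    # skewed predictor, so the two rules differ visibly
segments <- 4     # number of segments; number of knots = segments + 1

## "quantiles": knots at equally spaced quantiles of x
knots.quantiles <- as.numeric(quantile(x, probs = seq(0, 1, length.out = segments + 1)))

## "uniform": knots equally spaced over the range of x
knots.uniform <- seq(min(x), max(x), length.out = segments + 1)

## Number of observations falling in each segment under each rule
counts.quantiles <- table(cut(x, knots.quantiles, include.lowest = TRUE))
counts.uniform <- table(cut(x, knots.uniform, include.lowest = TRUE))
print(counts.quantiles)
print(counts.uniform)
```

With quantile knots the segments hold equal numbers of observations, whereas uniform knots give equal-width segments whose counts follow the density of x.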

Factor Inclusion Controls

These arguments control factor inclusion during search.

include

integer/vector of inclusion indicators for the categorical predictors (see I under Value). If it is not NULL, it is used as the initial value for the search

NOMAD Search Controls

These arguments control NOMAD search, cross-validation objective selection, and restart behavior.

cv.df.min

the minimum degrees of freedom to allow when conducting cross-validation (default cv.df.min=1)

cv.func

a character string (default cv.func="cv.ls") indicating which method to use to select smoothing parameters. cv.gcv specifies generalized cross-validation (Craven and Wahba (1979)), cv.aic specifies expected Kullback-Leibler cross-validation (Hurvich, Simonoff, and Tsai (1998)), and cv.ls specifies least-squares cross-validation

initial.mesh.size.integer

argument passed to the NOMAD solver (see snomadr for further details)

max.bb.eval

argument passed to the NOMAD solver (see snomadr for further details)

min.frame.size.integer

argument passed to the NOMAD solver (see snomadr for further details)

min.mesh.size.integer

argument passed to the NOMAD solver (see snomadr for further details)

nmulti

integer number of times to restart the process of finding extrema of the cross-validation function from different (random) initial points (default nmulti=0)

opts

list of optional arguments to be passed to snomadr. If not user-specified, this function applies the NOMAD4 path defaults QUAD_MODEL_SEARCH="no", EVAL_QUEUE_SORT="DIR_LAST_SUCCESS", SIMPLE_LINE_SEARCH="yes", SPECULATIVE_SEARCH="no", and DIRECTION_TYPE="ORTHO N+1 NEG" for faster mixed-integer search in this specific frscvNOMAD path. User-supplied opts entries always take precedence.

random.seed

seed used to generate the random initial points when nmulti > 0 (default random.seed=42); it is ignored if missing or equal to 0

singular.ok

a logical value (default singular.ok=FALSE) that, when FALSE, discards singular bases during cross-validation (a check for ill-conditioned bases is performed).
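
The precedence rule stated for opts can be pictured with a small base-R sketch. This is a hypothetical illustration of "user entries override defaults" using modifyList(); it is not the package's internal code, and the object names are assumptions.

```r
## Hypothetical illustration only; not the package's internal merging code.
## Defaults as listed in the description of opts
path.defaults <- list("QUAD_MODEL_SEARCH" = "no",
                      "EVAL_QUEUE_SORT" = "DIR_LAST_SUCCESS",
                      "SIMPLE_LINE_SEARCH" = "yes",
                      "SPECULATIVE_SEARCH" = "no",
                      "DIRECTION_TYPE" = "ORTHO N+1 NEG")

## A user-supplied opts list that overrides one of the defaults
user.opts <- list("QUAD_MODEL_SEARCH" = "yes")

## modifyList() replaces entries with matching names and keeps the rest
effective.opts <- modifyList(path.defaults, user.opts)
str(effective.opts)
```

Under this reading, a call such as frscvNOMAD(..., opts = user.opts) would run with the user's QUAD_MODEL_SEARCH setting while the remaining defaults stay in place.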

Quantile And Weights

These arguments control quantile level and observation weights.

tau

if non-NULL, a number in (0,1) denoting the quantile for which a quantile regression spline is to be estimated, rather than estimating the conditional mean (default tau=NULL)

weights

an optional vector of weights to be used in the fitting process. Should be ‘NULL’ or a numeric vector. If non-NULL, weighted least squares is used with weights ‘weights’ (that is, minimizing ‘sum(w*e^2)’); otherwise ordinary least squares is used.
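
The weighted least squares criterion ‘sum(w*e^2)’ mentioned for weights can be checked with base R's lm(). The sketch below uses simulated data and an illustrative weight vector (both assumptions); it shows only the criterion, not the spline fit performed by this function.

```r
## Illustrative sketch of the weighted least squares objective sum(w * e^2).
set.seed(42)
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1 + x)   # noise grows with x
w <- 1 / (0.1 + x)^2                       # illustrative weights (assumed)

fit.wls <- lm(y ~ x, weights = w)          # minimizes sum(w * e^2)
fit.ols <- lm(y ~ x)                       # minimizes sum(e^2)

## Evaluate the weighted objective at each fit
weighted.objective <- function(fit) sum(w * (y - fitted(fit))^2)
obj <- c(wls = weighted.objective(fit.wls), ols = weighted.objective(fit.ols))
print(obj)
```

The weighted fit attains the smaller value of the weighted objective, as it must by construction.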

Warnings And Progress

These arguments control warnings and displayed optimizer progress.

display.nomad.progress

a logical value indicating whether to display the progress of the NOMAD solver (default display.nomad.progress=TRUE)

display.warnings

a logical value indicating whether to display warnings (default display.warnings=TRUE)

Details

frscvNOMAD computes NOMAD-based cross-validation for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and nominal/ordinal (factor/ordered) predictors. Numerical search for the optimal degree/segments/I is undertaken using snomadr.

The optimal K/I combination is returned along with other results (see below for return values).
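
To make the objective concrete, the sketch below evaluates textbook forms of the three cv.func criteria for a univariate B-spline fit built with splines::bs() and lm(). It is an assumption-laden illustration: the package's exact formulas, basis construction and knot handling may differ, and an exhaustive grid stands in for the NOMAD search purely for clarity. The helper cv.criteria is hypothetical.

```r
library(splines)  # ships with R; bs() builds a B-spline basis

set.seed(42)
n <- 200
x <- runif(n)
y <- cos(2 * pi * x) + rnorm(n, sd = 0.1)

## Hypothetical helper: textbook criteria for a linear smoother with hat matrix H
cv.criteria <- function(degree, segments) {
  ## df = degree + segments - 1 yields (segments - 1) interior knots in bs()
  fit <- lm(y ~ bs(x, degree = degree, df = degree + segments - 1))
  e <- residuals(fit)
  h <- hatvalues(fit)          # diagonal of H
  trH <- sum(h)
  sigma2 <- mean(e^2)
  c(cv.ls  = mean((e / (1 - h))^2),                  # leave-one-out least squares
    cv.gcv = sigma2 / (1 - trH / n)^2,               # generalized cross-validation
    cv.aic = log(sigma2) + (1 + trH / n) / (1 - (trH + 2) / n))  # improved AIC
}

## Exhaustive grid over (degree, segments); frscvNOMAD uses NOMAD instead
grid <- expand.grid(degree = 1:5, segments = 1:5)
scores <- t(mapply(cv.criteria, grid$degree, grid$segments))
best <- cbind(grid, scores)[which.min(scores[, "cv.ls"]), ]
print(best)
```

A grid of this kind grows quickly with the number of predictors, which is the motivation for a directed mixed-integer search.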

For the continuous predictors the regression spline model employs either the additive or tensor product B-spline basis matrix for a multivariate polynomial spline via the B-spline routines in the GNU Scientific Library (https://www.gnu.org/software/gsl/) and the tensor.prod.model.matrix function.

For the nominal/ordinal (factor/ordered) predictors the regression spline model uses indicator basis functions.
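
The distinction between the additive basis, the tensor product basis and the indicator basis can be sketched with splines::bs() and model.matrix(). This is a simplified stand-in, assuming two continuous predictors and one three-level factor; it does not call the GSL routines or tensor.prod.model.matrix, and the package's exact column layout (for example, intercept handling) may differ.

```r
library(splines)  # ships with R

set.seed(42)
n <- 60
x1 <- runif(n)
x2 <- runif(n)
z <- factor(rep(c("a", "b", "c"), length.out = n))

B1 <- bs(x1, degree = 3)   # cubic B-spline basis for x1, no interior knots
B2 <- bs(x2, degree = 3)   # cubic B-spline basis for x2, no interior knots

## Additive basis: the univariate bases placed side by side
B.additive <- cbind(B1, B2)

## Tensor product basis: all pairwise column products, formed row by row
B.tensor <- t(sapply(seq_len(n), function(i) kronecker(B1[i, ], B2[i, ])))

## Indicator basis for the factor: one 0/1 column per non-baseline level
Z <- model.matrix(~ z)[, -1, drop = FALSE]

basis.dims <- c(additive = ncol(B.additive),
                tensor = ncol(B.tensor),
                indicator = ncol(Z))
print(basis.dims)
```

The tensor product basis has the product, rather than the sum, of the univariate column counts, which is why interactions are an ‘all or none’ choice with a notable cost in degrees of freedom.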

Value

frscvNOMAD returns a crscv object, which is supported by the function summary. The returned object has the following components:

K

scalar/vector containing optimal degree(s) of spline or number of segments

I

scalar/vector containing an indicator of whether the predictor is included or not for each dimension of the nominal/ordinal (factor/ordered) predictors

K.mat

vector/matrix of values of K evaluated during search

degree.max

the maximum degree of the B-spline basis for each of the continuous predictors used in the search

segments.max

the maximum segments of the B-spline basis for each of the continuous predictors used in the search

degree.min

the minimum degree of the B-spline basis for each of the continuous predictors used in the search

segments.min

the minimum segments of the B-spline basis for each of the continuous predictors used in the search

cv.func

objective function value at optimum

cv.func.vec

vector of objective function values at each degree of spline or number of segments in K.mat

Author(s)

Jeffrey S. Racine racinej@mcmaster.ca and Zhenghua Nie niez@mcmaster.ca

References

Abramson, M.A. and C. Audet and G. Couture and J.E. Dennis Jr. and S. Le Digabel (2011), “The NOMAD project”. Software available at https://www.gerad.ca/nomad.

Craven, P. and G. Wahba (1979), “Smoothing Noisy Data With Spline Functions,” Numerische Mathematik, 31, 377-403.

Hurvich, C.M. and J.S. Simonoff and C.L. Tsai (1998), “Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion,” Journal of the Royal Statistical Society B, 60, 271-293.

Le Digabel, S. (2011), “Algorithm 909: NOMAD: Nonlinear Optimization With the MADS Algorithm”. ACM Transactions on Mathematical Software, 37(4):44:1-44:15.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Ma, S. and J.S. Racine and L. Yang (2015), “Spline Regression in the Presence of Categorical Predictors,” Journal of Applied Econometrics, Volume 30, 705-717.

Ma, S. and J.S. Racine (2013), “Additive Regression Splines with Irrelevant Categorical and Continuous Regressors,” Statistica Sinica, Volume 23, 515-541.

See Also

loess, npregbw

Examples

library(crs)
set.seed(42)
## Simulated data
n <- 1000

x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
z.unique <- uniquecombs(as.matrix(z))
ind <-  attr(z.unique,"index")
ind.vals <-  sort(unique(ind))
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
  zz <- ind == ind.vals[i]
  dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}

y <- dgp + rnorm(n,sd=.1)

xdata <- data.frame(x,z=factor(z))

## Compute the optimal K and I, determine optimal number of knots, set
## spline degree for x to 3

cv <- frscvNOMAD(xz=xdata,y=y,complexity="knots",degree=c(3),segments=c(5))
summary(cv)

crs documentation built on May 2, 2026, 1:06 a.m.