View source: R/model_smimodel.R
| model_smimodel | R Documentation |
Fits nonparametric multiple index model(s) with simultaneous predictor selection (hence "sparse") and predictor grouping. Multiple SMI models can be fitted at once, one for each level of a grouping variable.
model_smimodel(
data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("ppr", "additive", "linear", "multiple", "userInput"),
num_ind = 5,
num_models = 5,
seed = 123,
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL,
lambda0 = 1,
lambda2 = 1,
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE)
)
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
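The solver-related arguments above can be collected in a named list and passed in one go. The following is a minimal sketch in base R: the values are the defaults from the usage section, except `TimeLimit` and `verbose`, which are changed purely for illustration. The names `my_data` and `my_index_vars` in the commented call are hypothetical placeholders, not objects provided by the package.

```r
# Solver and penalty settings, using argument names from the usage section.
solver_args <- list(
  lambda0   = 1,      # L0 penalty parameter
  lambda2   = 1,      # L2 penalty parameter
  M         = 10,     # big-M value used in the MIP
  max.iter  = 50,     # maximum number of MIP iterations per model
  tol       = 0.001,  # tolerance for the objective function value (loss)
  tolCoefs  = 0.001,  # tolerance for coefficients
  TimeLimit = 600,    # illustrative: seconds allowed per MIP iteration
  MIPGap    = 1e-04,  # relative MIP optimality gap
  NonConvex = -1,     # Gurobi strategy for non-convex quadratics
  verbose   = list(solver = TRUE, progress = TRUE)  # illustrative: print everything
)
str(solver_args)

# Hypothetical call (assumes `my_data` and `my_index_vars` exist and that
# the gurobi package is installed):
# fit <- do.call(
#   model_smimodel,
#   c(list(data = my_data, yvar = "y", index.vars = my_index_vars), solver_args)
# )
```

Keeping the settings in one list makes it easier to reuse the same configuration across several fits.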
A Sparse Multiple Index (SMI) model is a semi-parametric model that can be written as
y_{i} = \beta_{0} +
\sum_{j = 1}^{p}g_{j}(\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}) +
\sum_{k = 1}^{d}f_{k}(w_{ik}) + \boldsymbol{\theta}^{T}\boldsymbol{u}_{i} +
\varepsilon_{i}, \quad i = 1, \dots, n,
where y_{i} is the univariate
response, \beta_{0} is the model intercept, \boldsymbol{x}_{ij} \in
\mathbb{R}^{l_{j}}, j = 1, \dots, p are p subsets of predictors
entering indices, \boldsymbol{\alpha}_{j} is a vector of index
coefficients corresponding to the index h_{ij} =
\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}, and g_{j} is a
smooth nonlinear function (estimated by a penalised cubic regression
spline). The model also allows for predictors that do not enter any
indices, including covariates w_{ik} that relate to the response
through nonlinear functions f_{k}, k = 1, \dots, d, and linear
covariates \boldsymbol{u}_{i}.
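To make the components of the model equation concrete, the following base R sketch simulates data from a small SMI-type model with a single index (p = 1), one nonlinear covariate (d = 1) and one linear covariate. All coefficient values and the functions `g` and `f` are assumptions chosen for illustration; they do not come from the package.

```r
set.seed(1)
n <- 200

# Two predictors entering a single index (p = 1, l_1 = 2)
x <- matrix(runif(n * 2), ncol = 2)
alpha <- c(0.8, 0.6)           # index coefficients (illustrative values)
h <- as.vector(x %*% alpha)    # index h_i = alpha^T x_i

# One covariate with a nonlinear effect and one with a linear effect
w <- runif(n)
u <- rnorm(n)

beta0 <- 1                          # intercept
theta <- 0.5                        # coefficient of the linear covariate
g <- function(h) h^3                # smooth nonlinear function of the index
f <- function(w) sin(2 * pi * w)    # smooth nonlinear function of w

# Response: intercept + index term + nonlinear term + linear term + noise
y <- beta0 + g(h) + f(w) + theta * u + rnorm(n, sd = 0.1)

sim <- data.frame(y, x1 = x[, 1], x2 = x[, 2], w, u)
str(sim)
```

In a fitted model, `alpha`, `g` and `f` are unknown and estimated from the data; here they are fixed only to show how each term contributes to the response.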
In the model formulation related to this implementation, both the number of
indices p and the predictor grouping among indices are assumed to be
unknown prior to model estimation. Suppose we observe y_1,\dots,y_n,
along with a set of potential predictors,
\boldsymbol{x}_1,\dots,\boldsymbol{x}_n, with each vector
\boldsymbol{x}_i containing q predictors. This function
implements algorithmic variable selection for index variables (i.e.
predictors entering indices) of the SMI model by allowing for zero index
coefficients for predictors. Non-overlapping predictors among indices are
assumed (i.e. no predictor enters more than one index). For algorithmic
details, see the reference.
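The sparsity and non-overlap conditions described above can be illustrated with a small coefficient matrix. This is a sketch in base R with hypothetical values (q = 6 candidate predictors, p = 2 indices); it is not how the package stores its coefficients.

```r
# Rows: candidate predictors; columns: indices (hypothetical values)
alpha <- matrix(0, nrow = 6, ncol = 2,
                dimnames = list(paste0("x", 1:6), c("index1", "index2")))
alpha[c("x1", "x2"), "index1"] <- c(0.9, 0.6)
alpha["x4", "index2"] <- 1

# Non-overlap: each predictor has a nonzero coefficient in at most one index
non_overlapping <- all(rowSums(alpha != 0) <= 1)

# Sparsity: predictors whose coefficients are zero in every index are dropped
dropped <- rownames(alpha)[rowSums(alpha != 0) == 0]

non_overlapping
dropped
```

A predictor with an all-zero row plays no role in any index, which is how zero index coefficients amount to variable selection.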
An object of class smimodel. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information about the fitted model
corresponding to the |
Each row of the column fit contains a list with two elements:
initial |
A list of information about the model initialisation. (For
descriptions of the list elements see |
best |
A list of information about the final optimised model. (For
descriptions of the list elements see |
Palihawadana, N.K., Hyndman, R.J. & Wang, X. (2024). Sparse Multiple Index Models for High-Dimensional Nonparametric Forecasting. (Department of Econometrics and Business Statistics Working Paper Series 16/24).
greedy_smimodel
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)  # provides model_smimodel() and lag_matrix()
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)

  # Simulate data
  n <- 1005
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add lags 1 to 5 of x_lag_000
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    # Drop the initial rows with missing lagged values
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)

  # Index variables (x_lag_000 to x_lag_005)
  index.vars <- colnames(sim_data)[3:8]

  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_data,
                                 yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")

  # Best (optimised) fitted model
  smimodel_ppr$fit[[1]]$best
}