View source: R/greedy_smimodel.R
| greedy_smimodel | R Documentation |
Performs a greedy search over a given grid of penalty parameter combinations (lambda0, lambda2), and fits the SMI model(s) with the best combination(s), i.e. those giving the lowest validation set MSE. If the optimal combination lies on the edge of the grid, the penalty parameters are adjusted by ±10% and a second round of grid search is performed. If a grouping variable is used, penalty parameters are tuned separately for each individual model.
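The search described above can be illustrated with a small, self-contained sketch. This is not the package's implementation: the neighbour-to-neighbour move rule, the helper names `val_mse()` and `greedy_grid_search()`, the toy loss, and the reading of "adjusted by ±10%" as a 0.9/1/1.1 rescaling are all assumptions made for illustration only.

```r
# Hypothetical stand-in for "fit an SMI model and return validation MSE".
# The toy loss is minimised at lambda0 = 1, lambda2 = 0.1.
val_mse <- function(lambda0, lambda2) {
  log10(lambda0)^2 + (log10(lambda2) + 1)^2
}

# Illustrative grids (assumed values, not package defaults)
lambda0_seq <- c(0.01, 0.1, 1, 10, 100)
lambda2_seq <- c(0.001, 0.01, 0.1, 1, 10)

# Greedy walk: from the current grid cell, move to the best neighbouring
# cell while that lowers the loss; stop when no neighbour improves it.
greedy_grid_search <- function(l0, l2, loss, start = c(1, 1)) {
  pos <- start
  best <- loss(l0[pos[1]], l2[pos[2]])
  repeat {
    steps <- expand.grid(di = -1:1, dj = -1:1)
    steps <- steps[!(steps$di == 0 & steps$dj == 0), ]
    cand <- data.frame(i = pos[1] + steps$di, j = pos[2] + steps$dj)
    # Keep only neighbours that are inside the grid
    cand <- cand[cand$i >= 1 & cand$i <= length(l0) &
                   cand$j >= 1 & cand$j <= length(l2), ]
    cand$mse <- mapply(function(i, j) loss(l0[i], l2[j]), cand$i, cand$j)
    k <- which.min(cand$mse)
    if (cand$mse[k] >= best) break
    pos <- c(cand$i[k], cand$j[k])
    best <- cand$mse[k]
  }
  on_edge <- pos[1] %in% c(1, length(l0)) || pos[2] %in% c(1, length(l2))
  list(lambda0 = l0[pos[1]], lambda2 = l2[pos[2]],
       mse = best, on_edge = on_edge)
}

res <- greedy_grid_search(lambda0_seq, lambda2_seq, val_mse)

# Assumed second round: if the optimum sits on the grid edge, search a
# small grid obtained by scaling the selected values by 0.9, 1 and 1.1.
if (res$on_edge) {
  scale <- c(0.9, 1, 1.1)
  res <- greedy_grid_search(res$lambda0 * scale, res$lambda2 * scale,
                            val_mse, start = c(2, 2))
}
res
```

In the real function each loss evaluation corresponds to fitting an SMI model, so a greedy walk of this kind visits far fewer grid points than an exhaustive search would.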
greedy_smimodel(
data,
val.data,
yvar,
neighbour = 0,
family = gaussian(),
index.vars,
initialise = c("ppr", "additive", "linear", "multiple", "userInput"),
num_ind = 5,
num_models = 5,
seed = 123,
index.ind = NULL,
index.coefs = NULL,
s.vars = NULL,
linear.vars = NULL,
nlambda = 100,
lambda.min.ratio = 1e-04,
refit = TRUE,
M = 10,
max.iter = 50,
tol = 0.001,
tolCoefs = 0.001,
TimeLimit = Inf,
MIPGap = 1e-04,
NonConvex = -1,
verbose = list(solver = FALSE, progress = FALSE),
parallel = FALSE,
workers = NULL,
exclude.trunc = NULL,
recursive = FALSE,
recursive_colRange = NULL
)
data |
Training data set on which models will be trained. Must be a data
set of class |
val.data |
Validation data set. (The data set on which the penalty
parameter selection will be performed.) Must be a data set of class
|
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
nlambda |
The number of values for lambda0 (the penalty parameter for the L0 penalty). Default is 100. |
lambda.min.ratio |
Smallest value for lambda0, as a fraction of lambda0.max (data derived). |
refit |
Whether to refit the model combining training and validation
sets after parameter tuning. If |
M |
Big-M value used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
parallel |
The option to use parallel processing in fitting SMI models for different penalty parameter combinations. |
workers |
If |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
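To make the roles of `nlambda` and `lambda.min.ratio` concrete, the sketch below builds a candidate lambda0 sequence from a maximum value. The log-spacing (in the style of glmnet) and the value of `lambda0_max` are assumptions; in the package lambda0.max is derived from the data, and the actual spacing may differ.

```r
nlambda <- 100            # default number of lambda0 values
lambda.min.ratio <- 1e-04 # default ratio of smallest to largest lambda0
lambda0_max <- 5          # hypothetical value; data derived in practice

# Assumed construction: decreasing sequence, equally spaced on the log
# scale, from lambda0_max down to lambda0_max * lambda.min.ratio.
lambda0_seq <- exp(seq(log(lambda0_max),
                       log(lambda0_max * lambda.min.ratio),
                       length.out = nlambda))

range(lambda0_seq)
```

A larger `lambda.min.ratio` (the example on this page uses 0.1) narrows the range of lambda0 values that is searched.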
An object of class smimodel. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit contains a list with six elements:
initial |
A list of information of the model initialisation. (For
descriptions of the list elements see |
best |
A list of information of the final optimised model. (For
descriptions of the list elements see |
best_lambdas |
Selected penalty parameter combination. |
lambda0_seq |
Sequence of values for lambda0 used to construct the initial grid. |
lambda2_seq |
Sequence of values for lambda2 used to construct the initial grid. |
searched |
A |
The number of
rows of the tibble equals the number of levels in the grouping
variable.
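The nested structure described above can be navigated with ordinary list indexing. The sketch below uses a mock object, not real output: a base R data frame with a list column stands in for the tibble, the keys "A" and "B" and all numeric values are invented, and representing `best_lambdas` as a named numeric vector is an assumption.

```r
# Mock object mimicking the documented layout (all values are invented)
make_fit <- function(l0, l2) {
  list(initial = list(), best = list(),
       best_lambdas = c(lambda0 = l0, lambda2 = l2), # assumed format
       lambda0_seq = c(0.1, 1, 10),
       lambda2_seq = c(0.01, 0.1, 1),
       searched = NULL)
}
mock <- data.frame(key = c("A", "B"))
mock$fit <- list(make_fit(1, 0.1), make_fit(10, 0.01))

# Selected penalty parameters for the first key
mock$fit[[1]]$best_lambdas

# One row of selected penalty parameters per key
lambdas <- t(sapply(mock$fit, function(f) f$best_lambdas))
rownames(lambdas) <- mock$key
lambdas
```

With a single model (no grouping variable) only the first row exists, which is why the example on this page indexes `fit[[1]]`.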
model_smimodel
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel) # provides greedy_smimodel() and lag_matrix()
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)

  # Simulate data
  n <- 1205
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    # Drop the first 5 rows, which have missing lags (1200 rows remain)
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)

  # Training set
  sim_train <- sim_data[1:1000, ]

  # Validation set
  sim_val <- sim_data[1001:1200, ]

  # Index variables (x_lag_000 to x_lag_005)
  index.vars <- colnames(sim_data)[3:8]

  # Model fitting
  smi_greedy <- greedy_smimodel(data = sim_train,
                                val.data = sim_val,
                                yvar = "y",
                                index.vars = index.vars,
                                initialise = "ppr",
                                lambda.min.ratio = 0.1)

  # Best (optimised) fitted model
  smi_greedy$fit[[1]]$best

  # Selected penalty parameter combination
  smi_greedy$fit[[1]]$best_lambdas
}