NNS.stack: NNS Stack

NNS.stack    R Documentation

NNS Stack

Description

Prediction model using the predictions of the NNS base models NNS.reg as features (i.e. meta-features) for the stacked model.

Usage

NNS.stack(
  IVs.train,
  DV.train,
  IVs.test = NULL,
  type = NULL,
  obj.fn = expression(sum((predicted - actual)^2)),
  objective = "min",
  optimize.threshold = TRUE,
  dist = "L2",
  CV.size = NULL,
  balance = FALSE,
  ts.test = NULL,
  folds = 5,
  order = NULL,
  norm = NULL,
  method = c(1, 2),
  stack = TRUE,
  dim.red.method = "cor",
  pred.int = NULL,
  status = TRUE,
  ncores = NULL
)

Arguments

IVs.train

a vector, matrix or data frame of variables of numeric or factor data types.

DV.train

a numeric or factor vector with compatible dimensions to (IVs.train).

IVs.test

a vector, matrix or data frame of variables of numeric or factor data types with compatible dimensions to (IVs.train). If NULL, will use (IVs.train) as default.

type

NULL (default). To classify discrete integer classes from a factor target variable (DV.train) with a base category of 1, set (type = "CLASS"); for a continuous (DV.train), leave (type = NULL). As with logistic regression, this setting is not necessary for a two-class target variable, e.g. [0, 1].

obj.fn

expression; expression(sum((predicted - actual)^2)) (default) Sum of squared errors is the default objective function. Any expression() using the specific terms predicted and actual can be used.

objective

options: ("min", "max") "min" (default) Select whether to minimize or maximize the objective function obj.fn.
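As an illustration of how obj.fn and objective fit together, the sketch below evaluates an expression() written in terms of predicted and actual, then picks the better of two candidate prediction vectors. The helper score() and the toy vectors are hypothetical and use base R only; this is not the package's internal code.

```r
# Hypothetical helper (not part of NNS): evaluate an obj.fn-style expression
# with `predicted` and `actual` bound to the supplied vectors.
score <- function(obj.fn, predicted, actual) {
  eval(obj.fn, envir = list(predicted = predicted, actual = actual))
}

actual     <- c(1, 2, 3, 4)
candidates <- list(a = c(1, 2, 3, 5), b = c(2, 3, 4, 6))

# Default objective function: sum of squared errors, paired with objective = "min".
sse        <- expression(sum((predicted - actual)^2))
sse_scores <- sapply(candidates, function(p) score(sse, p, actual))
best_min   <- names(which.min(sse_scores))

# An accuracy-style objective would instead be paired with objective = "max".
acc        <- expression(mean(round(predicted) == actual))
acc_scores <- sapply(candidates, function(p) score(acc, p, actual))
best_max   <- names(which.max(acc_scores))
```

Whatever expression is supplied, objective must match its direction: error measures are minimized, while accuracy-like measures are maximized.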

optimize.threshold

logical; TRUE (default) Optimizes the probability threshold value used for rounding in classification problems. If FALSE, the threshold is fixed at 0.5.
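To illustrate what optimizing the rounding threshold can mean, the following base R sketch scans a grid of thresholds on toy probabilities and keeps the one with the fewest misclassifications. The grid, the data, and the error count are assumptions made for this example; the search NNS.stack actually performs may differ.

```r
# Toy data: predicted probabilities and the true 0/1 labels.
prob   <- c(0.12, 0.33, 0.41, 0.47, 0.81, 0.92)
actual <- c(0,    0,    1,    1,    1,    1)

# Candidate thresholds; this grid is an assumption for the sketch.
grid <- seq(0.05, 0.95, by = 0.05)

# Misclassification count at each threshold (class 1 when prob >= threshold).
errors <- sapply(grid, function(th) sum(as.integer(prob >= th) != actual))

# First threshold attaining the minimum error count.
best_threshold <- grid[which.min(errors)]

# Error count under the fixed 0.5 threshold, for comparison.
errors_at_half <- sum(as.integer(prob >= 0.5) != actual)
```

On data like this, where the positive class has probabilities below 0.5, a fixed 0.5 cutoff misclassifies observations that a tuned threshold would recover.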

dist

options: ("L1", "L2", "DTW", "FACTOR"); selects the distance calculation used. (dist = "L2") (default) selects the Euclidean distance; (dist = "L1") selects the Manhattan distance; (dist = "DTW") selects the dynamic time warping distance; (dist = "FACTOR") uses a frequency.

CV.size

numeric [0, 1]; NULL (default) Sets the cross-validation size if (IVs.test = NULL). Defaults to a random value between 0.2 and 0.33 for a random sampling of the training set.
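One way to picture the default holdout described above is the base R sketch below, which draws a fraction between 0.2 and 0.33 and samples that share of the training rows. It assumes simple random sampling of rows without replacement; the names cv_size, test_idx, cv_train and cv_test are illustrative and do not come from the package.

```r
set.seed(123)  # for a reproducible illustration only

n <- nrow(iris)

# Draw a holdout fraction in the documented default range.
cv_size <- runif(1, min = 0.2, max = 0.33)

# Randomly sample that fraction of rows as the validation set.
test_idx <- sample(n, size = floor(cv_size * n))

cv_train <- iris[-test_idx, ]
cv_test  <- iris[test_idx, ]
```

Supplying CV.size explicitly replaces the random draw with a fixed fraction.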

balance

logical; FALSE (default) Uses both up and down sampling to balance the classes. type="CLASS" required.

ts.test

integer; NULL (default) Sets the length of the test set for time-series data; typically 2*h, where h is the NNS.ARMA parameter value, or double the known number of periods to forecast.

folds

integer; folds = 5 (default) Select the number of cross-validation folds.

order

options: (integer, "max", NULL); NULL (default) Sets the order for NNS.reg, where (order = "max") is the k-nearest neighbors equivalent, which is suggested for mixed continuous and discrete (unordered, ordered) data.

norm

options: ("std", "NNS", NULL); NULL (default) Selects the norm parameter in NNS.reg.

method

numeric options: (1, 2); Select the NNS method to include in stack. (method = 1) selects NNS.reg; (method = 2) selects NNS.reg dimension reduction regression. Defaults to method = c(1, 2), which will reduce the dimension first, then find the optimal n.best.

stack

logical; TRUE (default) Uses dimension reduction output in n.best optimization, otherwise performs both analyses independently.

dim.red.method

options: ("cor", "NNS.dep", "NNS.caus", "equal", "all") method for determining synthetic X* coefficients. (dim.red.method = "cor") (default) uses standard linear correlation for weights. (dim.red.method = "NNS.dep") uses NNS.dep for nonlinear dependence weights, while (dim.red.method = "NNS.caus") uses NNS.caus for causal weights. (dim.red.method = "all") averages all methods for further feature engineering.

pred.int

numeric [0, 1]; NULL (default) Returns the prediction intervals associated with each method.

status

logical; TRUE (default) Prints status update message in console.

ncores

integer; value specifying the number of cores to be used in the parallelized subroutine NNS.reg. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1.
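The documented default for ncores (machine cores minus one) corresponds roughly to the sketch below. Using parallel::detectCores() to obtain the count is an assumption for this illustration, not a statement about how the package determines it.

```r
# parallel ships with base R; detectCores() may return NA on some platforms.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free, but never go below one.
ncores <- max(1L, available - 1L)
```

The resulting value can be passed as ncores to cap parallel usage explicitly, for example on shared machines.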

Value

Returns a vector of fitted values for the dependent variable test set for all models.

  • "NNS.reg.n.best" returns the optimum "n.best" parameter for the NNS.reg multivariate regression.

  • "SSE.reg" returns the SSE for the NNS.reg multivariate regression.

  • "OBJfn.reg" returns the obj.fn for the NNS.reg regression.

  • "NNS.dim.red.threshold" returns the optimum "threshold" from the NNS.reg dimension reduction regression.

  • "OBJfn.dim.red" returns the obj.fn for the NNS.reg dimension reduction regression.

  • "probability.threshold" returns the optimum probability threshold for classification, or 0.5 when (optimize.threshold = FALSE).

  • "reg" returns NNS.reg output.

  • "reg.pred.int" returns the prediction intervals for the regression output.

  • "dim.red" returns NNS.reg dimension reduction regression output.

  • "dim.red.pred.int" returns the prediction intervals for the dimension reduction regression output.

  • "stack" returns the output of the stacked model.

  • "pred.int" returns the prediction intervals for the stacked model.

Note

  • Incorporate any objective function from external packages (such as Metrics::mape) via NNS.stack(..., obj.fn = expression(Metrics::mape(actual, predicted)), objective = "min")

  • As with logistic regression, the (type = "CLASS") setting is not necessary for a two-class target variable, e.g. [0, 1]. For multiple-class problems, the response variable's base category should be 1.

  • Missing data should be handled beforehand, using na.omit or complete.cases on the full dataset.

If the following error is received:

"Error in is.data.frame(x) : object 'RP' not found"

reduce the CV.size.
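The notes on missing data and class coding can be sketched in base R as follows. The data frame dat and its column names are made up for this example; only complete.cases, na.omit and as.integer are standard R functions.

```r
# Toy data with missing values; the column names are made up for this sketch.
dat <- data.frame(
  x1 = c(1.2, NA, 3.1, 4.8, 5.0, 6.3),
  x2 = c(0.5, 0.7, NA, 1.1, 1.4, 1.9),
  y  = factor(c("low", "mid", "high", "low", "mid", "high"),
              levels = c("low", "mid", "high"))
)

# Drop incomplete rows from the full dataset, before any train/test split.
clean <- dat[complete.cases(dat), ]   # keeps the same rows as na.omit(dat)

# Integer class codes from a factor start at 1, matching the base-category note.
y_int <- as.integer(clean$y)

# Predictors with the same rows as the cleaned response.
IVs <- clean[, c("x1", "x2")]
```

Cleaning the full dataset first keeps the predictors and the response aligned row for row.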

Author(s)

Fred Viole, OVVO Financial Systems

References

Viole, F. (2016) "Classification Using NNS Clustering Analysis" doi:10.2139/ssrn.2864711

Examples

 ## Using 'iris' dataset where test set [IVs.test] is 'iris' rows 141:150.
 ## Not run: 
 NNS.stack(iris[1:140, 1:4], iris[1:140, 5], IVs.test = iris[141:150, 1:4], type = "CLASS")

 ## Using 'iris' dataset to determine [n.best] and [threshold] with no test set.
 NNS.stack(iris[ , 1:4], iris[ , 5], type = "CLASS")

 ## Selecting NNS.reg and dimension reduction techniques.
 NNS.stack(iris[1:140, 1:4], iris[1:140, 5], iris[141:150, 1:4], method = c(1, 2), type = "CLASS")
 
## End(Not run)

NNS documentation built on Oct. 14, 2024, 5:09 p.m.