NNS.reg: NNS Regression

NNS.reg R Documentation

NNS Regression

Description

Generates a nonlinear regression based on partial moment quadrant means.

Usage

NNS.reg(
  x,
  y,
  factor.2.dummy = TRUE,
  order = NULL,
  stn = 0.95,
  dim.red.method = NULL,
  tau = NULL,
  type = NULL,
  point.est = NULL,
  location = "top",
  return.values = TRUE,
  plot = TRUE,
  plot.regions = FALSE,
  residual.plot = TRUE,
  confidence.interval = NULL,
  threshold = 0,
  n.best = NULL,
  noise.reduction = "off",
  dist = "L2",
  ncores = NULL,
  point.only = FALSE,
  multivariate.call = FALSE
)

Arguments

x

a vector, matrix or data frame of variables of numeric or factor data types.

y

a numeric or factor vector with compatible dimensions to x.

factor.2.dummy

logical; TRUE (default) automatically augments the variable matrix with numerical dummy variables based on the levels of factors.
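
As a rough picture of what dummy augmentation means, the sketch below expands a factor into one 0/1 column per level using base R. It is an illustration only: the variable names are hypothetical, and the exact encoding NNS.reg applies internally (for example, whether one level is dropped) may differ.

```r
# Hypothetical factor regressor with three levels.
f <- factor(c("a", "b", "c", "a"))

# One indicator column per level; "- 1" removes the intercept so no level is dropped.
dummies <- model.matrix(~ f - 1)
colnames(dummies) <- levels(f)

print(dummies)
```

Each row contains exactly one 1, marking the level observed for that row.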

order

integer; Controls the number of partial moment quadrant means. Users are encouraged to try different (order = ...) integer settings with (noise.reduction = "off"). (order = "max") will force a limit condition perfect fit.

stn

numeric [0, 1]; Signal to noise parameter, sets the threshold of (NNS.dep) which reduces ("order") when (order = NULL). Defaults to 0.95 to ensure high dependence for higher ("order") and endpoint determination.

dim.red.method

options: ("cor", "NNS.dep", "NNS.caus", "all", "equal", numeric vector, NULL); the method for determining synthetic X* coefficients. Selecting a method automatically engages the dimension reduction regression. The default is NULL for full multivariate regression. (dim.red.method = "NNS.dep") uses NNS.dep for nonlinear dependence weights, while (dim.red.method = "NNS.caus") uses NNS.caus for causal weights. (dim.red.method = "cor") uses standard linear correlation for weights. (dim.red.method = "all") averages all methods for further feature engineering. (dim.red.method = "equal") uses unit weights. Alternatively, the user can specify a numeric vector of coefficients.
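
The synthetic X* can be pictured as a weighted combination of the regressors. The base R sketch below uses linear-correlation weights in the spirit of (dim.red.method = "cor") and divides by the number of nonzero weights, following the description of "equation" in the Value section. The data, the variable names, and the reading of "coefficients > 0" as "nonzero after thresholding" are assumptions of this sketch, not the package's implementation.

```r
set.seed(1)
X <- cbind(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
y <- X[, "x1"] + 0.5 * X[, "x2"] + rnorm(50, sd = 0.1)

# Linear-correlation weight for each regressor.
w <- as.vector(cor(X, y))

# Drop weights at or below the threshold (0 keeps every nonzero weight).
threshold <- 0
w[abs(w) <= threshold] <- 0

# Numerator: weighted sum of regressors. Denominator: count of retained weights.
x_star <- as.vector(X %*% w) / sum(abs(w) > 0)

print(head(x_star))
```

The resulting vector has one value per observation and could then be used as the single regressor in a univariate fit.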

tau

options: ("ts", NULL); NULL (default). To be used in conjunction with (dim.red.method = "NNS.caus") or (dim.red.method = "all"). If the regression is using time-series data, set (tau = "ts") for more accurate causal analysis.

type

NULL (default). To perform a classification, set to (type = "CLASS"). As with logistic regression, this setting is not necessary for a target variable with two classes, e.g. [0, 1].

point.est

a numeric or factor vector with compatible dimensions to x. Returns the fitted value y.hat for any value of x.

location

Sets the legend location within the plot, per the x and y co-ordinates used in base graphics legend.

return.values

logical; TRUE (default), set to FALSE in order to only display a regression plot and call values as needed.

plot

logical; TRUE (default) To plot regression.

plot.regions

logical; FALSE (default). Generates 3d regions associated with each regression point for multivariate regressions. Note, adds significant time to routine.

residual.plot

logical; TRUE (default) To plot y.hat and Y.

confidence.interval

numeric [0, 1]; NULL (default) Plots the associated confidence interval with the estimate and reports the standard error for each individual segment. Also applies the same level for the prediction intervals.

threshold

numeric [0, 1]; (threshold = 0) (default) Sets the threshold for dimension reduction of independent variables when (dim.red.method) is not NULL.

n.best

integer; sets the number of nearest regression points to use in weighting for multivariate regression. NULL (default) sets this number at sqrt(# of regressors). (n.best = "all") will select and weight all generated regression points. Analogous to k in a k Nearest Neighbors algorithm. Different values of n.best are tested using cross-validation in NNS.stack.

noise.reduction

options: ("mean", "median", "mode", "off"); the method of determining regression points. In low signal:noise situations, (noise.reduction = "mean") uses means for NNS.dep restricted partitions, (noise.reduction = "median") uses medians instead of means, and (noise.reduction = "mode") uses modes instead of means. (noise.reduction = "off") uses an overall central tendency measure for partitions.

dist

options: ("L1", "L2", "FACTOR"); the method of distance calculation. (dist = "L2") (default) selects the Euclidean distance, (dist = "L1") selects the Manhattan distance, and (dist = "FACTOR") uses a frequency.
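
The n.best and dist arguments described above can be pictured with a small base R sketch: measure the distance from a new point to each regression point, keep the nearest sqrt(# of regressors) of them, and average their fitted values. The regression points, the rounding of the square root, and the inverse-distance weights are all assumptions made for illustration; the weighting NNS.reg actually applies may differ.

```r
# Hypothetical regression points (rows) in a 4-regressor space, with fitted values.
rpm <- rbind(c(0, 0, 0, 0),
             c(1, 1, 1, 1),
             c(3, 0, 0, 0),
             c(2, 2, 2, 2))
y_hat <- c(10, 20, 30, 40)
new_pt <- c(0.5, 0.5, 0.5, 0.5)

# Subtract new_pt from every row.
diffs <- sweep(rpm, 2, new_pt)

d_l2 <- sqrt(rowSums(diffs^2))   # Euclidean distance, as in dist = "L2"
d_l1 <- rowSums(abs(diffs))      # Manhattan distance, as in dist = "L1"

# Default neighbor count: sqrt(# of regressors); flooring is assumed here.
n_best <- floor(sqrt(ncol(rpm)))

# Keep the n_best nearest points and weight them by inverse distance (assumed).
nearest <- order(d_l2)[seq_len(n_best)]
w <- 1 / d_l2[nearest]
estimate <- sum(w * y_hat[nearest]) / sum(w)

print(estimate)
```

With four regressors, two neighbors are kept; here the two nearest points are equidistant, so the estimate is the plain average of their fitted values.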

ncores

integer; value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1.
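
For reference, the default described above (machine cores minus one) can be reproduced in base R with the parallel package. The NA fallback and the lower bound of 1 are additions of this sketch, not documented behavior of NNS.reg.

```r
library(parallel)

detected <- detectCores()              # may return NA on some platforms
if (is.na(detected)) detected <- 1L    # fallback is an assumption of this sketch
n_cores <- max(1L, detected - 1L)      # machine cores minus one, never below 1

print(n_cores)
```

Passing (ncores = 1) explicitly, as several of the examples do, avoids parallel setup altogether.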

point.only

Internal argument for abbreviated output.

multivariate.call

Internal argument for multivariate regressions.

Value

UNIVARIATE REGRESSION RETURNS THE FOLLOWING VALUES:

  • "R2" provides the goodness of fit;

  • "SE" returns the overall standard error of the estimate between y and y.hat;

  • "Prediction.Accuracy" returns the correct rounded "Point.est" used in classifications versus the categorical y;

  • "derivative" for the coefficient of the x and its applicable range;

  • "Point.est" for the predicted value generated;

  • "pred.int" lower and upper prediction intervals for the "Point.est" returned using the "confidence.interval" provided;

  • "regression.points" provides the points used in the regression equation for the given order of partitions;

  • "Fitted.xy" returns a data.table of x, y, y.hat, resid, NNS.ID, gradient.
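
One plausible reading of "Prediction.Accuracy" above is the share of observations whose rounded estimate equals the observed class. The base R sketch below computes that share on made-up values; it is an interpretation of the description, not the package's code.

```r
y <- c(1, 2, 2, 3, 1)                  # hypothetical class labels coded as integers
y_hat <- c(1.2, 1.8, 2.6, 2.9, 0.7)    # hypothetical continuous estimates

# Round each estimate to the nearest class and compare with the observed class.
accuracy <- mean(round(y_hat) == y)

print(accuracy)
```

Four of the five rounded estimates match their class, so the sketch reports 0.8.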

MULTIVARIATE REGRESSION RETURNS THE FOLLOWING VALUES:

  • "R2" provides the goodness of fit;

  • "equation" returns the numerator of the synthetic X* dimension reduction equation as a data.table consisting of regressor and its coefficient. Denominator is simply the length of all coefficients > 0, returned in last row of equation data.table.

  • "x.star" returns the synthetic X* as a vector;

  • "rhs.partitions" returns the partition points for each regressor x;

  • "RPM" provides the Regression Point Matrix, the points for each x used in the regression equation for the given order of partitions;

  • "Point.est" returns the predicted value generated;

  • "pred.int" lower and upper prediction intervals for the "Point.est" returned using the "confidence.interval" provided;

  • "Fitted.xy" returns a data.table of x, y, y.hat, gradient, and NNS.ID.
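
The returned elements listed above are accessed by name from the object NNS.reg returns. The sketch below assumes the NNS package is installed and skips itself otherwise; it uses only arguments documented on this page, and the object name fit is arbitrary.

```r
if (requireNamespace("NNS", quietly = TRUE)) {
  set.seed(123)
  x <- cbind(rnorm(100), rnorm(100), rnorm(100))
  y <- rnorm(100)

  # Multivariate fit with plotting switched off and a single core.
  fit <- NNS::NNS.reg(x, y, point.est = c(.25, .5, .75),
                      plot = FALSE, residual.plot = FALSE, ncores = 1)

  print(names(fit))       # element names listed in the Value section
  print(fit$R2)           # goodness of fit
  print(fit$Point.est)    # predicted value for the supplied point.est
}
```

Storing the result once and reading several elements from it avoids refitting the regression for each value.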

Note

  • Please ensure point.est has dimensions compatible with x; an error message is raised otherwise.

  • As with logistic regression, the (type = "CLASS") setting is not necessary for a target variable with two classes, e.g. [0, 1]. For classification problems, the base category of the response variable should be 1.

  • For low signal:noise instances, increasing the dimension may yield better results using NNS.stack(cbind(x,x), y, method = 1, ...).

Author(s)

Fred Viole, OVVO Financial Systems

References

Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" https://www.amazon.com/dp/1490523995/ref=cm_sw_su_dp

Vinod, H. and Viole, F. (2017) "Nonparametric Regression Using Clusters" https://link.springer.com/article/10.1007/s10614-017-9713-5

Vinod, H. and Viole, F. (2018) "Clustering and Curve Fitting by Line Segments" https://www.preprints.org/manuscript/201801.0090/v1

Examples

## Not run: 
set.seed(123)
x <- rnorm(100) ; y <- rnorm(100)
NNS.reg(x, y)

## Manual {order} selection
NNS.reg(x, y, order = 2)

## Maximum {order} selection
NNS.reg(x, y, order = "max")

## x-only partitioning (Univariate only)
NNS.reg(x, y, type = "XONLY")

## For Multiple Regression:
x <- cbind(rnorm(100), rnorm(100), rnorm(100)) ; y <- rnorm(100)
NNS.reg(x, y, point.est = c(.25, .5, .75))

## For Multiple Regression based on Synthetic X* (Dimension Reduction):
x <- cbind(rnorm(100), rnorm(100), rnorm(100)) ; y <- rnorm(100)
NNS.reg(x, y, point.est = c(.25, .5, .75), dim.red.method = "cor", ncores = 1)

## IRIS dataset examples:
# Dimension Reduction:
NNS.reg(iris[,1:4], iris[,5], dim.red.method = "cor", order = 5, ncores = 1)

# Dimension Reduction using causal weights:
NNS.reg(iris[,1:4], iris[,5], dim.red.method = "NNS.caus", order = 5, ncores = 1)

# Multiple Regression:
NNS.reg(iris[,1:4], iris[,5], order = 2, noise.reduction = "off")

# Classification:
NNS.reg(iris[,1:4], iris[,5], point.est = iris[1:10, 1:4], type = "CLASS")$Point.est

## To call fitted values:
x <- rnorm(100) ; y <- rnorm(100)
NNS.reg(x, y)$Fitted.xy

## To call partial derivative (univariate regression only):
NNS.reg(x, y)$derivative

## End(Not run)

OVVO-Financial/NNS documentation built on April 22, 2024, 10:26 p.m.