Getting Started with NNS: Clustering and Regression"

knitr::opts_chunk$set(echo = TRUE)
library(NNS)
library(data.table)
data.table::setDTthreads(2L)
options(mc.cores = 1)
Sys.setenv("OMP_THREAD_LIMIT" = 2)
library(NNS)
library(data.table)
require(knitr)
require(rgl)
require(meboot)

Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

NNS Partitioning NNS.part()

NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants, and then assigns a quadrant identification (1:4) at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.

x = seq(-5, 5, .05); y = x ^ 3

for(i in 1 : 4){NNS.part(x, y, order = i, Voronoi = TRUE, obs.req = 0)}

X-only Partitioning

NNS.part offers a partitioning based on $x$ values only NNS.part(x, y, type = "XONLY", ...), using the entire bandwidth in its regression point derivation, and shares the same limit condition as partitioning via both $x$ and $y$ values.

for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}

Note the partition identifications are limited to 1's and 2's (left and right of the partition respectively), not the 4 values per the $x$ and $y$ partitioning.

NNS.part(x,y,order = 4, type = "XONLY")

Clusters Used in Regression

The right column of plots shows the corresponding regression (plus endpoints and central point) for the order of NNS partitioning. ```r,results='hide'} for(i in 1 : 3){NNS.part(x, y, order = i, obs.req = 0, Voronoi = TRUE, type = "XONLY") ; NNS.reg(x, y, order = i, ncores = 1)}

# NNS Regression `NNS.reg()`
**`NNS.reg`** can fit any $f(x)$, for both uni- and multivariate cases.  **`NNS.reg`** returns a self-evident list of values provided below.

## Univariate:
```r
NNS.reg(x, y, ncores = 1)

Multivariate:

Multivariate regressions return a plot of $y$ and $\hat{y}$, as well as the regression points ($RPM) and partitions ($rhs.partitions) for each regressor.

f = function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x
y = x ; z <- expand.grid(x, y)
g = f(z[ , 1], z[ , 2])
NNS.reg(z, g, order = "max", plot = FALSE, ncores = 1)

Inter/Extrapolation

NNS.reg can inter- or extrapolate any point of interest. The NNS.reg(x, y, point.est = ...) parameter permits any sized data of similar dimensions to $x$ and called specifically with NNS.reg(...)$Point.est.

NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression by including a parameter NNS.reg(x, y, dim.red.method = "cor", ...). Reducing all regressors to a single dimension using the returned equation NNS.reg(..., dim.red.method = "cor", ...)$equation.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1)$equation
a = NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1, plot = FALSE)$equation

Thus, our model for this regression would be: $$Species = \frac{r round(a$Coefficient[1],3)Sepal.Length r round(a$Coefficient[2],3)Sepal.Width +r round(a$Coefficient[3],3)Petal.Length +r round(a$Coefficient[4],3)Petal.Width}{4} $$

Threshold

NNS.reg(x, y, dim.red.method = "cor", threshold = ...) offers a method of reducing regressors further by controlling the absolute value of required correlation.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1)$equation
a = NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1, plot = FALSE)$equation

Thus, our model for this further reduced dimension regression would be: $$Species = \frac{\: r round(a$Coefficient[1],3)Sepal.Length + r round(a$Coefficient[2],3)Sepal.Width +r round(a$Coefficient[3],3)Petal.Length +r round(a$Coefficient[4],3)Petal.Width}{3} $$

and the point.est = (...) operates in the same manner as the full regression above, again called with NNS.reg(...)$Point.est.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est

Classification

For a classification problem, we simply set NNS.reg(x, y, type = "CLASS", ...).

NOTE: Base category of response variable should be 1, not 0 for classification problems.

NNS.reg(iris[ , 1 : 4], iris[ , 5], type = "CLASS", point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est

Cross-Validation NNS.stack()

The NNS.stack routine cross-validates for a given objective function the n.best parameter in the multivariate NNS.reg function as well as the threshold parameter in the dimension reduction NNS.reg version. NNS.stack can be used for classification:

NNS.stack(..., type = "CLASS", ...)

or continuous dependent variables:

NNS.stack(..., type = NULL, ...).

Any objective function obj.fn can be called using expression() with the terms predicted and actual, even from external packages such as Metrics.

NNS.stack(..., obj.fn = expression(Metrics::mape(actual, predicted)), objective = "min").

NNS.stack(IVs.train = iris[ , 1 : 4], 
          DV.train = iris[ , 5], 
          IVs.test = iris[1 : 10, 1 : 4],
          dim.red.method = "cor",
          obj.fn = expression( mean(round(predicted) == actual) ),
          objective = "max", type = "CLASS", 
          folds = 1, ncores = 1)
Folds Remaining = 0 
Current NNS.reg(... , threshold = 0.935 ) MAX Iterations Remaining = 2 
Current NNS.reg(... , threshold = 0.795 ) MAX Iterations Remaining = 1 
Current NNS.reg(... , threshold = 0.44 ) MAX Iterations Remaining = 0 
Current NNS.reg(... , n.best = 1 ) MAX Iterations Remaining = 12 
Current NNS.reg(... , n.best = 2 ) MAX Iterations Remaining = 11 
Current NNS.reg(... , n.best = 3 ) MAX Iterations Remaining = 10 
Current NNS.reg(... , n.best = 4 ) MAX Iterations Remaining = 9 
$OBJfn.reg
[1] 1

$NNS.reg.n.best
[1] 1

$probability.threshold
[1] 0.43875

$OBJfn.dim.red
[1] 0.9666667

$NNS.dim.red.threshold
[1] 0.935

$reg
 [1] 1 1 1 1 1 1 1 1 1 1

$reg.pred.int
NULL

$dim.red
 [1] 1 1 1 1 1 1 1 1 1 1

$dim.red.pred.int
NULL

$stack
 [1] 1 1 1 1 1 1 1 1 1 1

$pred.int
NULL

Increasing Dimensions

Given multicollinearity is not an issue for nonparametric regressions as it is for OLS, in the case of an ill-fit univariate model a better option may be to increase the dimensionality of regressors with a copy of itself and cross-validate the number of clusters n.best via:

NNS.stack(IVs.train = cbind(x, x), DV.train = y, method = 1, ...).

set.seed(123)
x = rnorm(100); y = rnorm(100)

nns.params = NNS.stack(IVs.train = cbind(x, x),
                        DV.train = y,
                        method = 1, ncores = 1)
set.seed(123)
x = rnorm(100); y = rnorm(100)

nns.params = list()
nns.params$NNS.reg.n.best = 100
NNS.reg(cbind(x, x), y, 
        n.best = nns.params$NNS.reg.n.best,
        point.est = cbind(x, x), 
        residual.plot = TRUE,  
        ncores = 1, confidence.interval = .95)

References

If the user is so motivated, detailed arguments further examples are provided within the following:

Sys.setenv("OMP_THREAD_LIMIT" = "")


Try the NNS package in your browser

Any scripts or data that you put into this service are public.

NNS documentation built on Nov. 28, 2023, 1:10 a.m.