library(NNS)
library(data.table)
require(knitr)
require(rgl)
require(meboot)
require(dtw)
Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.
NNS.part()
NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants and then assigns a quadrant identification (1:4) at each partition. NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.
x = seq(-5, 5, .05); y = x ^ 3
for(i in 1 : 4){NNS.part(x, y, order = i, Voronoi = TRUE, obs.req = 0)}
NNS.part also offers a partitioning based on \(x\) values only, NNS.part(x, y, type = "XONLY", ...), using the entire bandwidth in its regression point derivation, and it shares the same limit condition as partitioning via both \(x\) and \(y\) values.
for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}
Note the partition identifications are limited to 1's and 2's (left and right of the partition, respectively), rather than the four values generated when partitioning on both \(x\) and \(y\).
## $order
## [1] 4
##
## $dt
## x y quadrant prior.quadrant
## 1: -5.00 -125.0000 q1111 q111
## 2: -4.95 -121.2874 q1111 q111
## 3: -4.90 -117.6490 q1111 q111
## 4: -4.85 -114.0841 q1111 q111
## 5: -4.80 -110.5920 q1111 q111
## ---
## 197: 4.80 110.5920 q2222 q222
## 198: 4.85 114.0841 q2222 q222
## 199: 4.90 117.6490 q2222 q222
## 200: 4.95 121.2874 q2222 q222
## 201: 5.00 125.0000 q2222 q222
##
## $regression.points
## quadrant x y
## 1: q111 -4.4137228 -87.0066046
## 2: q112 -3.1636209 -32.3875620
## 3: q121 -1.9134171 -7.4479769
## 4: q122 -0.6635190 -0.4462558
## 5: q211 0.5865829 0.3666250
## 6: q212 1.8365829 6.6149350
## 7: q221 3.0861753 30.1098359
## 8: q222 4.3612793 84.0304489
The right column of plots shows the corresponding regression for the
order of NNS
partitioning.
for(i in 1 : 3){NNS.part(x, y, order = i, obs.req = 0, Voronoi = TRUE) ; NNS.reg(x, y, order = i, ncores = 1)}
NNS.reg()
NNS.reg
can fit any \(f(x)\), for both uni- and multivariate
cases. NNS.reg
returns a self-evident list
of values provided below.
NNS.reg(x, y, ncores = 1)
## $R2
## [1] 1
##
## $SE
## [1] 0
##
## $Prediction.Accuracy
## NULL
##
## $equation
## NULL
##
## $x.star
## NULL
##
## $derivative
## Coefficient X.Lower.Range X.Upper.Range
## 1: 74.2525 -5.00 -4.95
## 2: 72.7675 -4.95 -4.90
## 3: 71.2975 -4.90 -4.85
## 4: 69.8425 -4.85 -4.80
## 5: 68.4025 -4.80 -4.75
## ---
## 196: 68.4025 4.75 4.80
## 197: 69.8425 4.80 4.85
## 198: 71.2975 4.85 4.90
## 199: 72.7675 4.90 4.95
## 200: 74.2525 4.95 5.00
##
## $Point.est
## NULL
##
## $regression.points
## x y
## 1: -5.00 -125.0000
## 2: -4.95 -121.2874
## 3: -4.90 -117.6490
## 4: -4.85 -114.0841
## 5: -4.80 -110.5920
## ---
## 197: 4.80 110.5920
## 198: 4.85 114.0841
## 199: 4.90 117.6490
## 200: 4.95 121.2874
## 201: 5.00 125.0000
##
## $Fitted.xy
## x y y.hat NNS.ID gradient residuals standard.errors
## 1: -5.00 -125.0000 -125.0000 q4444444444 74.2525 0 0
## 2: -4.95 -121.2874 -121.2874 q4444441444 72.7675 0 0
## 3: -4.90 -117.6490 -117.6490 q4444414444 71.2975 0 0
## 4: -4.85 -114.0841 -114.0841 q4444414144 69.8425 0 0
## 5: -4.80 -110.5920 -110.5920 q4444411444 68.4025 0 0
## ---
## 197: 4.80 110.5920 110.5920 q1111144144 69.8425 0 0
## 198: 4.85 114.0841 114.0841 q1111141444 71.2975 0 0
## 199: 4.90 117.6490 117.6490 q1111114444 72.7675 0 0
## 200: 4.95 121.2874 121.2874 q1111114144 74.2525 0 0
## 201: 5.00 125.0000 125.0000 q1111111444 74.2525 0 0
Multivariate regressions return a plot of \(y\) and \(\hat{y}\), as well as the regression points ($RPM) and partitions ($rhs.partitions) for each regressor.
f = function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x
y = x ; z = expand.grid(x, y)
g = f(z[ , 1], z[ , 2])

NNS.reg(z, g, order = "max", ncores = 1)
## $R2
## [1] 1
##
## $rhs.partitions
## Var1 Var2
## 1: -5.00 -5
## 2: -4.95 -5
## 3: -4.90 -5
## 4: -4.85 -5
## 5: -4.80 -5
## ---
## 40397: 4.80 5
## 40398: 4.85 5
## 40399: 4.90 5
## 40400: 4.95 5
## 40401: 5.00 5
##
## $RPM
## Var1 Var2 y.hat
## 1: -4.8 -4.80 -7.105427e-15
## 2: -4.8 -2.55 -8.726063e+01
## 3: -4.8 -2.50 -8.806700e+01
## 4: -4.8 -2.45 -8.883587e+01
## 5: -4.8 -2.40 -8.956800e+01
## ---
## 40397: -2.6 -2.80 3.776000e+00
## 40398: -2.6 -2.75 2.770875e+00
## 40399: -2.6 -2.70 1.807000e+00
## 40400: -2.6 -2.65 8.836250e-01
## 40401: -2.6 -2.60 1.776357e-15
##
## $Point.est
## NULL
##
## $Fitted.xy
## Var1 Var2 y y.hat NNS.ID residuals
## 1: -5.00 -5 0.000000 0.000000 201.201 0
## 2: -4.95 -5 3.562625 3.562625 402.201 0
## 3: -4.90 -5 7.051000 7.051000 603.201 0
## 4: -4.85 -5 10.465875 10.465875 804.201 0
## 5: -4.80 -5 13.808000 13.808000 1005.201 0
## ---
## 40397: 4.80 5 -13.808000 -13.808000 39597.40401 0
## 40398: 4.85 5 -10.465875 -10.465875 39798.40401 0
## 40399: 4.90 5 -7.051000 -7.051000 39999.40401 0
## 40400: 4.95 5 -3.562625 -3.562625 40200.40401 0
## 40401: 5.00 5 0.000000 0.000000 40401.40401 0
NNS.reg can inter- or extrapolate any point of interest. The NNS.reg(x, y, point.est = ...) parameter accepts data of any size sharing the dimensions of \(x\), and the estimates are called specifically with NNS.reg(...)$Point.est.
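As an illustration (a minimal sketch; the value 6 is an arbitrary extrapolation point, not from the original examples), the cubic x and y defined earlier can be extended beyond their observed range:

# Extrapolate the fitted cubic to a hypothetical point outside [-5, 5]
NNS.reg(x, y, point.est = 6, ncores = 1)$Point.est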
NNS.reg also provides a dimension reduction regression by including the parameter NNS.reg(x, y, dim.red.method = "cor", ...), reducing all regressors to a single dimension using the returned equation NNS.reg(..., dim.red.method = "cor", ...)$equation.
NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1)$equation
## Variable Coefficient
## 1: Sepal.Length 0.7980781
## 2: Sepal.Width -0.4402896
## 3: Petal.Length 0.9354305
## 4: Petal.Width 0.9381792
## 5: DENOMINATOR 4.0000000
Thus, our model for this regression would be: \[Species = \frac{0.798*Sepal.Length -0.44*Sepal.Width +0.935*Petal.Length +0.938*Petal.Width}{4} \]
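To make the reduced dimension concrete, the returned coefficients can be applied by hand to form the single synthetic regressor (an illustrative sketch only, not an NNS function call; coefs simply copies the equation output above):

# Collapse the four iris regressors into one dimension using the equation above
coefs <- c(0.7980781, -0.4402896, 0.9354305, 0.9381792)
reduced.x <- as.matrix(iris[ , 1 : 4]) %*% coefs / 4
head(reduced.x)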
NNS.reg(x, y, dim.red.method = "cor", threshold = ...)
offers a method of reducing regressors further by controlling the
absolute value of required correlation.
NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1)$equation
## Variable Coefficient
## 1: Sepal.Length 0.7980781
## 2: Sepal.Width 0.0000000
## 3: Petal.Length 0.9354305
## 4: Petal.Width 0.9381792
## 5: DENOMINATOR 3.0000000
Thus, our model for this further reduced dimension regression would be: \[Species = \frac{0.798*Sepal.Length + 0*Sepal.Width +0.935*Petal.Length +0.938*Petal.Width}{3} \]
The point.est = (...) parameter operates in the same manner as in the full regression above, and is again called with NNS.reg(...)$Point.est.
NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
## [1] 1 1 1 1 1 1 1 1 1 1
For a classification problem, we simply set NNS.reg(x, y, type = "CLASS", ...).
NOTE: The base category of the response variable should be 1, not 0, for classification problems.
NNS.reg(iris[ , 1 : 4], iris[ , 5], type = "CLASS", point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
## [1] 1 1 1 1 1 1 1 1 1 1
NNS.stack()
The NNS.stack routine cross-validates, for a given objective function, the n.best parameter in the multivariate NNS.reg function as well as the threshold parameter in the dimension reduction NNS.reg version. NNS.stack can be used for classification, NNS.stack(..., type = "CLASS", ...), or for continuous dependent variables, NNS.stack(..., type = NULL, ...).
Any objective function obj.fn can be called using expression() with the terms predicted and actual.
NNS.stack(IVs.train = iris[ , 1 : 4],
DV.train = iris[ , 5],
IVs.test = iris[1 : 10, 1 : 4],
obj.fn = expression( mean(round(predicted) == actual) ),
objective = "max", type = "CLASS",
folds = 1, ncores = 1)
## $OBJfn.reg
## [1] 0.9733333
##
## $NNS.reg.n.best
## [1] 1
##
## $probability.threshold
## [1] 0.5
##
## $OBJfn.dim.red
## [1] 0.9066667
##
## $NNS.dim.red.threshold
## [1] 0.78
##
## $reg
## [1] 1 1 1 1 1 1 1 1 1 1
##
## $dim.red
## [1] 1 1 1 1 1 1 1 1 1 1
##
## $stack
## [1] 1 1 1 1 1 1 1 1 1 1
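For a continuous dependent variable, type = NULL is used with an objective to be minimized. A minimal sketch, assuming the cubic x and y from earlier and RMSE as an illustrative obj.fn (neither is prescribed by NNS.stack):

# Continuous DV: cross-validate while minimizing RMSE rather than maximizing accuracy
NNS.stack(IVs.train = data.frame(x), DV.train = y,
          obj.fn = expression( sqrt(mean((predicted - actual) ^ 2)) ),
          objective = "min", type = NULL,
          folds = 1, ncores = 1)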
Given that multicollinearity is not an issue for nonparametric regressions as it is for OLS, a better option in the case of an ill-fit univariate model may be to increase the dimensionality of the regressor with a copy of itself and cross-validate the number of clusters n.best via:
NNS.stack(IVs.train = cbind(x, x), DV.train = y, method = 1, ...).
set.seed(123)
x <- rnorm(100); y <- rnorm(100)

nns.params <- NNS.stack(IVs.train = cbind(x, x),
                        DV.train = y,
                        method = 1, ncores = 1)

NNS.reg(cbind(x, x), y,
        n.best = nns.params$NNS.reg.n.best,
        point.est = cbind(x, x), ncores = 1)
If the user is so motivated, detailed arguments and further examples are provided within the following: