NNS.reg is a robust regression technique capable of nonlinear regression of continuous variables and of classification tasks in machine learning problems.
We have extended NNS.reg's applications with an ensemble method of classification, NNS.boost.
Popular boosting algorithms take a series of weak-learner decision tree models and aggregate their outputs. NNS is also a decision tree of sorts, partitioning each regressor with respect to the dependent variable. We can directly control the number of “splits” with the NNS.reg(..., order = , ...) parameter.
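For instance, a minimal sketch of capping the partitioning depth (the order value here is purely illustrative; leaving order unspecified lets NNS determine the depth internally):

```r
library(NNS)

# Limit each regressor to at most 3 levels of partitioning via order.
# plot = FALSE simply suppresses the regression plot.
NNS.reg(iris[ , 1:4], iris[ , 5], order = 3, plot = FALSE)
```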
We can see how NNS partitions each regressor by calling the $rhs.partitions output. You will notice that the partitions are not equal intervals, nor does each regressor receive an equal number of them, which differentiates NNS from other bandwidth techniques.
Higher dependence between a regressor and the dependent variable will allow for a larger number of partitions. This is determined internally with the NNS.dep measure.
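The output below can be reproduced with a call along these lines (residual.plot = FALSE merely suppresses the diagnostic plot):

```r
library(NNS)

# Fit the multivariate regression of Species on the four regressors
# and inspect the partitions NNS generated for each regressor.
NNS.reg(iris[ , 1:4], iris[ , 5], residual.plot = FALSE)$rhs.partitions
```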
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1: 4.300000 2.0 1.000000 0.1
## 2: 4.488889 2.6 1.325000 1.3
## 3: 4.771429 3.0 1.562500 2.5
## 4: 4.962500 3.2 2.266667 NA
## 5: 5.100000 3.7 3.483333 NA
## 6: 5.220000 4.4 4.127273 NA
## 7: 5.453846 NA 4.724138 NA
## 8: 5.657143 NA 5.334783 NA
## 9: 5.800000 NA 6.115789 NA
## 10: 5.966667 NA 6.900000 NA
## 11: 6.140000 NA NA NA
## 12: 6.300000 NA NA NA
## 13: 6.441667 NA NA NA
## 14: 6.680000 NA NA NA
## 15: 6.875000 NA NA NA
## 16: 7.233333 NA NA NA
## 17: 7.716667 NA NA NA
## 18: 7.900000 NA NA NA
Through resampling of the training set and letting each iterated set of data speak for itself, NNS.boost tests various regressor combinations in these dynamic decision trees, keeping only those combinations that add predictive value. From there we simply aggregate the predictions.
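The following is a conceptual sketch of that loop, not NNS.boost's internal code; boost_sketch, the fixed accuracy threshold, and the majority-vote aggregation are illustrative assumptions built on the public NNS.reg interface:

```r
library(NNS)

# Conceptual sketch ONLY -- not NNS.boost's internal code. It mimics the
# described loop: resample the training set, score a random regressor
# subset on held-out rows via NNS.reg, keep subsets that clear an
# accuracy threshold, then aggregate survivors by majority vote.
boost_sketch <- function(x, y, x_test, epochs = 25, threshold = 0.9) {
  y     <- as.numeric(y)                       # class labels as numerics
  votes <- list()
  for (i in seq_len(epochs)) {
    idx   <- sample(nrow(x), replace = TRUE)   # bootstrap resample
    oob   <- setdiff(seq_len(nrow(x)), idx)    # out-of-bag validation rows
    if (length(oob) == 0) next
    feats <- sample(ncol(x), sample(2:ncol(x), 1))
    oob_pred <- NNS.reg(x[idx, feats, drop = FALSE], y[idx],
                        point.est = x[oob, feats, drop = FALSE],
                        plot = FALSE)$Point.est
    if (mean(round(oob_pred) == y[oob]) >= threshold) {  # value-adding only
      votes[[length(votes) + 1]] <-
        round(NNS.reg(x[idx, feats, drop = FALSE], y[idx],
                      point.est = x_test[ , feats, drop = FALSE],
                      plot = FALSE)$Point.est)
    }
  }
  stopifnot(length(votes) > 0)                 # at least one learner survived
  # majority vote across the surviving learners
  apply(do.call(cbind, votes), 1,
        function(v) as.numeric(names(which.max(table(v)))))
}
```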
NNS.boost will automatically search for an accuracy threshold from the training set, reporting the iterations remaining and the threshold level obtained in the console. A plot of the frequency of the learning accuracy on the test set is also provided.
Once a threshold is obtained, NNS.boost will test various feature combinations against different splits of the training set and report back the frequency of each regressor used in the final estimate.
Let’s have a look and see how it works. We use 140 random iris observations as our training set and 10 observations as our test set.
set.seed(123)
test.set = sample(150, 10)
a = NNS.boost(iris[-test.set, 1:4], iris[-test.set, 5],
              IVs.test = iris[test.set, 1:4],
              epochs = 100, learner.trials = 100, status = FALSE)

Comparing the resulting predictions against the actual species of the 10 test observations yields the classification accuracy:

## [1] 1
A perfect classification.
A few of the NNS.boost arguments warrant brief notes. representative.sample uses a representation of each regressor via Tukey's five-number summary as well as the mean and mode. This encoding of the regressors greatly reduces runtimes on large datasets.
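A rough illustration of that encoding (the density-peak mode below is an assumption for illustration; NNS's internal estimator may differ):

```r
# Sketch of the representative-sample idea: compress a regressor into
# Tukey's five-number summary plus its mean and mode.
representative <- function(x) {
  d <- density(x)                              # mode via density peak (assumption)
  c(fivenum(x), mean = mean(x), mode = d$x[which.max(d$y)])
}

representative(iris$Sepal.Length)
```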
depth = "max" will force all observations to be their own partition, forcing a perfect fit of the multivariate regression. In essence, this is the basis for a kNN type of classification.
n.best = 1 will use the single nearest neighbor. When coupled with depth = "max", NNS will emulate kNN = 1, but as the dimensions increase the results diverge, demonstrating that NNS is less sensitive to the curse of dimensionality than kNN.
extreme will use the maximum threshold obtained, and may result in errors if that threshold cannot be eclipsed by subsequent iterations.
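Reusing the training/test split from above, a hedged sketch combining these settings (argument values purely illustrative):

```r
# Emulate kNN = 1: each observation its own partition (depth = "max"),
# classified by its single nearest neighbor (n.best = 1).
# Setting extreme = TRUE instead would insist on the maximum threshold
# obtained, with the failure risk noted above.
b = NNS.boost(iris[-test.set, 1:4], iris[-test.set, 5],
              IVs.test = iris[test.set, 1:4],
              depth = "max", n.best = 1,
              epochs = 100, learner.trials = 100, status = FALSE)
```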
If the user is so motivated, detailed arguments and further examples are provided within the following: