## install.packages("devtools")
## devtools::install_github("r-lib/devtools")
url <- "http://owos.gm.fh-koeln.de:8055/bartz/spot.git"
devtools::install_git(url = url)
library("SPOT")
packageVersion("SPOT")
#> [1] '2.1.2'
The performance of modern search heuristics such as evolution strategies (ES), differential evolution (DE), or simulated annealing (SANN) relies crucially on their parameterizations, or, statistically speaking, on their factor settings.
Finding good parameter settings for an optimization algorithm will be referred to as tuning.
We will illustrate how an existing search heuristic can be tuned using the sequential parameter optimization toolbox (SPOT), which is one possible implementation of the sequential parameter optimization (SPO) framework.
The version of SPOT presented in this article is implemented in R.
R is a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.
The SPOT package can be downloaded from CRAN.
It can be installed from within R using the install.packages() command.
install.packages("SPOT")
Note that the package has to be loaded into the R work space every time a new R session is started. SPOT can be loaded to the work space with R's library() command.
library("SPOT")
In order to keep the setup as simple as possible, we will use simulated annealing to illustrate the tuning procedure.
Simulated annealing is freely available in R.
This implementation of the simulated annealing heuristic will be referred to as SANN in the following.
Response surface methodology (RSM) will be used in this article.
It can be seen as a set of statistical methods for empirical model building.
Using design of experiments, a response (dependent variable, output variable, or fitness value, \(y\)) that depends on one or several input variables (independent variables or solutions, \(\vec{x}\)) is optimized.
The underlying model can be formulated as \[ y = f(\vec{x}) + \epsilon, \] where \(\epsilon\) represents some noise (uncertainty, error observed) in the response \(y\).
The term response surface refers to the surface represented by \(f(\vec{x})\).
In order to estimate the quality of a solution, the term fitness is used in evolutionary optimization.
In physics, the concept of a potential or energy function is used.
Since we are dealing with minimization, a low fitness value \(f(\vec{x})\) implies that \(\vec{x}\) is a good solution.
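The response model above can be sketched in a few lines of R. The sphere function serves as \(f\), and a normally distributed error term stands in for \(\epsilon\) (the Gaussian noise model is an illustrative assumption, not part of SPOT):

```r
# y = f(x) + eps, illustrated with the sphere function as f and
# Gaussian noise as eps (illustrative assumption)
f <- function(x) sum(x^2)   # deterministic response surface
set.seed(1)
x <- c(1, 2)
eps <- rnorm(1, sd = 0.1)   # observation noise
y <- f(x) + eps             # one noisy observation of the response
f(x)                        # true value of the surface at x: 5
```

A low value of \(y\) at \(\vec{x}\) may thus reflect either a genuinely good solution or favorable noise, which is why replicated evaluations are used later on.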
This report is structured as follows.
First, the installation and configuration of SPOT are described. Then the general SPOT approach and the plot functions of the SPOT toolbox are described; these enable an exploratory fitness landscape analysis. The application of SPOT to deterministic problems is briefly explained. Two usage scenarios are distinguished: in the first, an existing optimization algorithm is tuned with SPOT; this setting will be referred to as algorithm tuning. In the second, SPOT can be applied as an optimizer. In this case, SPOT tries to find arguments of the objective function that result in an optimal function value. Following the taxonomy introduced in the literature, this setting will be referred to as surrogate model based optimization.

Three levels are relevant in the tuning scenario:

(L1) The real-world system. This system allows the specification of an objective function, say \(f\). As an example, we will use the sphere function in the following.
(L2) The optimization algorithm, here SANN. It requires the specification of algorithm parameters.
(L3) The tuning algorithm, here SPOT.
An optimization algorithm (L2) requires parameters, e.g., the initial temperature of SANN or the mutation rate of evolution strategies.
These parameters determine the performance of the optimization algorithm.
Therefore, they should be tuned. The algorithm is in turn used to determine optimal values of the objective function \(f\) from level (L1).
The term algorithm design summarizes factors that influence the behavior (performance) of an algorithm, whereas problem design refers to factors from the optimization (simulation) problem.
The initial temperature in SANN is one typical factor which belongs to the algorithm design, the search space dimension belongs to the problem design.
In this case, SPOT itself can be used as a surrogate model based optimization algorithm. Then SPOT has the same role as SANN in the algorithm tuning scenario. SPOT finds improved solutions sequentially: an initial design is evaluated, a surrogate model is trained on the results, the model is optimized to propose a new candidate solution, and the candidate is evaluated and added to the data.

Before SANN can be started, the user has to specify an objective function \(f\).
sphere <- function (x){
sum(x^2)
}
sphere( c(1,2) )
#> [1] 5
The sphere function uses vector inputs. A matrix-based implementation is available as funSphere in the SPOT package.
funSphere
#> function (x)
#> {
#> matrix(apply(x, 1, function(x) {
#> sum(x^2)
#> }), , 1)
#> }
#> <bytecode: 0x7fd25f846438>
#> <environment: namespace:SPOT>
The function can be plotted as follows:
plotFunction(funSphere)

Simulated annealing is a generic probabilistic heuristic algorithm for global optimization.
The name comes from annealing in metallurgy.
Controlled heating and cooling of a material reduces defects.
Heating enables atoms to leave their initial positions (which are local minima of their internal energy), and controlled cooling improves the probability of finding positions with lower internal energy than the initial positions.
The SANN algorithm replaces the current solution with a randomly generated new solution.
Better solutions are accepted deterministically, whereas worse solutions are accepted with a probability that depends on the difference between the corresponding function values and on a global parameter, which is commonly referred to as the temperature.
The algorithm parameter temp specifies the initial temperature of the SANN algorithm.
The temperature is gradually decreased during the optimization.
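The acceptance rule described above can be sketched in base R. This is an illustration of the Metropolis-type rule, not the internal implementation of optim():

```r
# Better solutions are always accepted; worse solutions are accepted
# with probability exp(-(f_new - f_old) / temperature)
accept <- function(f_old, f_new, temperature) {
  if (f_new <= f_old) return(TRUE)   # improvement: always accept
  runif(1) < exp(-(f_new - f_old) / temperature)
}
set.seed(1)
accept(10, 9, 5)    # TRUE: better solution
accept(10, 12, 5)   # worse solution, accepted with probability exp(-0.4)
```

At a high temperature the acceptance probability approaches one and the search behaves almost like a random walk; as the temperature decreases, worse moves are rejected more often and the search becomes local.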
A second parameter, tmax, is used to model this cooling scheme.
We consider the R implementation of SANN, which is available via the general-purpose optimization function optim() from the R package stats, which is part of every R installation.
The function optim() is parametrized as follows:
optim(par, fn, gr = NULL, ..., method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf, upper = Inf, control = list(), hessian = FALSE)
Here, par denotes initial values for the parameters to be optimized over.
Note that the problem dimension is specified by the length of this vector, so par=c(1,1,1,1) denotes a four-dimensional optimization problem.
fn is a function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place.
gr defines a function to return the gradient for the BFGS, CG and L-BFGS-B methods.
If it is NULL, a finite-difference approximation will be used.
For the SANN method it specifies a function to generate a new candidate point.
If it is NULL, a default Gaussian Markov kernel is used.
The symbol ... represents further (optional) arguments that can be passed to fn and gr.
The parameter method denotes the optimization method to be used.
Here, we will use the parameter value SANN.
The parameters lower and upper specify bounds on the variables for the “L-BFGS-B” method, or bounds in which to search for method “Brent”.
Since SANN uses neither of these, we will not use these arguments in our examples.
The argument control defines a relatively long list of control parameters. We will use the following parameters from this list:

* maxit: the maximum number of iterations, which for SANN is the maximum number of function evaluations. This is the stopping criterion.
* temp: controls the SANN algorithm. It is the starting temperature for the cooling schedule with a default value of 10.
* tmax: the number of function evaluations at each temperature for the SANN method. Its default value is also 10.

To obtain reproducible results, we will set the random number generator (RNG) seed.
Using a two-dimensional objective function (sphere) and the starting point (initial values for the parameters to be optimized over) \((10,10)\), we can execute the optimization runs as follows:
set.seed(123)
resSANN <- optim(c(10,10), sphere, method="SANN",
control=list(maxit=100, temp=10, tmax = 10))
resSANN
#> $par
#> [1] 4.835178 4.664964
#>
#> $value
#> [1] 45.14084
#>
#> $counts
#> function gradient
#> 100 NA
#>
#> $convergence
#> [1] 0
#>
#> $message
#> NULL
The best, i.e., smallest, function value found by SANN reads 45.14084.
The corresponding point in the search space is approximately (4.835178, 4.664964).
No gradient information was used and one hundred function evaluations were performed.
The variable convergence is an integer code, and its value 0 indicates successful completion of the SANN run.
No additional message is returned.
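The result fields discussed above can also be read programmatically. The following self-contained snippet (base R only) repeats the run and extracts them:

```r
# Repeat the SANN run and inspect the result list returned by optim()
sphere <- function(x) sum(x^2)
set.seed(123)
resSANN <- optim(c(10, 10), sphere, method = "SANN",
                 control = list(maxit = 100, temp = 10, tmax = 10))
resSANN$counts[["function"]]   # number of function evaluations: 100
resSANN$convergence            # 0 indicates successful completion
length(resSANN$par)            # dimension of the returned solution: 2
```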
Now that we have performed a first run of the SANN algorithm on our simple test function, we are interested in improving SANN’s performance.
The SANN heuristic requires some parameter settings, namely temp and tmax. If these values are omitted, a default value of ten is used for both.
The question is: are these default values, i.e., temp = 10 and tmax = 10, adequate for SANN, or can they be improved?
That is, we are trying to tune the SANN optimization algorithm.
A typical beginner in algorithm tuning would try to improve the algorithm’s performance by manually increasing or decreasing the algorithm parameter values, e.g., choosing temp = 20 and tmax = 5.
set.seed(123)
resSANN <- optim(par = c(10,10), fn = sphere, method="SANN",
control = list(maxit = 100, temp = 20, tmax = 5))
resSANN
#> $par
#> [1] 6.163905 6.657100
#>
#> $value
#> [1] 82.3107
#>
#> $counts
#> function gradient
#> 100 NA
#>
#> $convergence
#> [1] 0
#>
#> $message
#> NULL
The manually chosen setting temp = 20 and tmax = 5 resulted in a worse function value (82.3107) than the default setting (45.14084). Instead of manual trial and error, the tuning can be performed with SPOT. Although the setup of SPOT is very similar to the setup discussed in this section, it enables deeper insights into the algorithm’s performance. SPOT can be used to tune the SANN algorithm defined at level (L2). As objective function at level (L1) for SANN, the sphere() test function, which was introduced above, was chosen. The problem design comprises:

* A starting point x0 = c(-1,1,-1) for the search. Since `x0` has three elements, we are facing a three dimensional optimization problem.
* `SANN` will be used to determine its minimum function value.
The following elements describe the SANN algorithm to be tuned:

* The budget available to SANN is specified via maxit: maxit = 100.
* The R implementation of SANN will be used via the optim() function.
* Two parameters will be tuned: the initial temperature (temp) and the number of function evaluations at each temperature (tmax).

The problem design and algorithm design elements for SANN, which were used for this simple example, are summarized in the following table.

| Name | Symbol | Factor name |
|---|---|---|
| Initial temperature | \(t\) | temp |
| Number of function evaluations at each temperature | \(t_{\max}\) | tmax |
| Starting point | \(\vec{x_0} = (-1,1,-1)\) | x0 |
| Problem dimension | \(n=3\) | |
| Objective function | sphere | sphere() |
| Quality measure | Expected performance, e.g., \(E(y)\) | y |
| Initial seed | \(s\) | 1 |
| Budget | \(\textrm{maxit} = 100\) | maxit |
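The quality measure in the table is the expected performance \(E(y)\). Since SANN is stochastic, \(E(y)\) can be estimated by replicated runs. A self-contained sketch (base R only) using the settings from the table, i.e., starting point (-1,1,-1) and maxit = 100:

```r
# Estimate the expected performance E(y) by averaging replicated runs
sphere <- function(x) sum(x^2)
set.seed(1)
ys <- replicate(5, optim(c(-1, 1, -1), sphere, method = "SANN",
                         control = list(maxit = 100))$value)
mean(ys)   # estimate of the expected performance E(y)
sd(ys)     # run-to-run variation motivates the use of replicates
```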
To interface SANN with SPOT, the wrapper function sann2spot() is used. SPOT uses matrices as the basic data structure.
The matrix() command can be used as follows:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
Each row of the input matrix contains one parameter setting (temp, tmax), and each column specifies one of the parameters. The wrapper returns a column matrix with one performance value for each (temp, tmax) parameter setting.
sann2spot <- function(algpar){
performance <- NULL
for (i in 1:nrow(algpar)){
resultList <- optim(par = c(10,10),
fn = sphere,
method = "SANN",
control = list(maxit = 100,
temp = algpar[i,1],
tmax = algpar[i,2]))
performance <- c(performance,resultList$value)
}
return(matrix(performance,,1))
}
Now the interface can be tested. First, we run SANN with temp = 10 and tmax = 10. A second SANN run is performed using temp = 5 and tmax = 20.
set.seed(123)
sann2spot(algpar = matrix(c(10,10),1))
#> [,1]
#> [1,] 45.14084
set.seed(123)
sann2spot(algpar=matrix(c(5,20),1))
#> [,1]
#> [1,] 4.469163
SPOT itself has various parameters that need to be configured so that it can solve the tuning problem efficiently. A control list can be defined for SPOT. The settings used for this configuration (tuning SANN on sphere()) are explained in the following:

* types: Since SANN’s parameters temp and tmax are integers, we provide this type information via types = c("integer", "integer").
* funEvals: The budget, i.e., the number of SANN runs, is specified via funEvals.
* noise: SANN is a stochastic optimizer, so we specify noise = TRUE.
* seedFun: The initial seed for the objective function is set via seedFun = 1. Because of seedFun, subsequent evaluations will increment this value.
* replicates: Each parameterization is evaluated twice; replicates = 2 is used.
* seedSPOT: The initial seed for SPOT can be specified using seedSPOT = 1. This ensures that the SPOT run is reproducible, since SPOT itself may also be of a stochastic nature (depending on the configuration).
* design: The design parameter defines the method to be used to generate an initial design (a set of initial algorithm settings, here: a number of pairs of temp and tmax). A Latin hypercube design is chosen: design = designLHD.
* model: A model can be trained to learn the relation between algorithm parameters (temp, tmax) and algorithm performance. As model, we use a random forest implementation: model = buildRandomForest.
* optimizer: Once the model is trained, an optimizer is needed to find the best potential algorithm configuration, based on the model. Here, a simple one is chosen: optimizer = optimLHD.
* optimizerControl: The specified optimizer may have options that need to be set. Here, we only specify the number of model evaluations to be performed by the optimizer: optimizerControl = list(funEvals = 100).
Overall, we obtain the following configuration:
spotConfig <- list(
types = c("integer", "integer"), #data type of tuned parameters
funEvals = 50, #maximum number of SANN runs
noise = TRUE, #problem is noisy (SANN is non-deterministic)
seedFun = 1, #RNG start seed for algorithm calls (iterated)
replicates = 2, #2 replicates for each SANN parameterization
seedSPOT = 1, #main RNG
design = designLHD, #initial design: Latin Hypercube
model = buildRandomForest, #random forest surrogate model (alternative: buildKriging)
optimizer = optimLHD, #Use LHD to optimize on model
optimizerControl = list(funEvals=100) #100 model evals in each iteration
)
Next, we define SPOT’s search intervals for the SANN parameters, i.e., for temp and tmax. Both temp and tmax will be tuned in the region between one and 100.
tempLo = 1
tempHi = 100
tmaxLo = 1
tmaxHi = 100
lower=c(tempLo,tmaxLo)
upper=c(tempHi,tmaxHi)
The order of the variables (first temp, then tmax) has to be the same as in the sann2spot() interface. Now everything is prepared to perform the SPOT experiment. We start SPOT via spot() and store the result in resRf:
resRf <- spot(x=NULL,
fun=sann2spot,
lower=lower,
upper=upper,
control=spotConfig)
The result of the SPOT run, which is stored in the list resRf, has the following structure.
str(resRf)
#> List of 7
#> $ xbest : num [1, 1:2] 12 78
#> $ ybest : num [1, 1] 0.0085
#> $ x : num [1:50, 1:2] 9 72 45 15 26 57 70 93 33 88 ...
#> $ y : num [1:50, 1] 0.226 28.891 59.309 0.192 2.864 ...
#> $ count : int 50
#> $ msg : chr "budget exhausted"
#> $ modelFit:List of 3
#> ..$ rfFit:List of 17
#> .. ..$ call : language randomForest(x = x, y = y)
#> .. ..$ type : chr "regression"
#> .. ..$ predicted : num [1:49] 0.323 31.175 23.541 28.39 10.454 ...
#> .. ..$ mse : num [1:500] 226 161 223 185 180 ...
#> .. ..$ rsq : num [1:500] 0.364 0.547 0.371 0.478 0.492 ...
#> .. ..$ oob.times : int [1:49] 177 175 194 177 192 165 174 199 212 173 ...
#> .. ..$ importance : num [1:2, 1] 7780 7508
#> .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. ..$ : chr [1:2] "1" "2"
#> .. .. .. ..$ : chr "IncNodePurity"
#> .. ..$ importanceSD : NULL
#> .. ..$ localImportance: NULL
#> .. ..$ proximity : NULL
#> .. ..$ ntree : num 500
#> .. ..$ mtry : num 1
#> .. ..$ forest :List of 11
#> .. .. ..$ ndbigtree : int [1:500] 29 31 31 29 33 23 25 27 31 31 ...
#> .. .. ..$ nodestatus : int [1:37, 1:500] -3 -1 -3 -3 -3 -1 -1 -3 -3 -3 ...
#> .. .. ..$ leftDaughter : int [1:37, 1:500] 2 0 4 6 8 0 0 10 12 14 ...
#> .. .. ..$ rightDaughter: int [1:37, 1:500] 3 0 5 7 9 0 0 11 13 15 ...
#> .. .. ..$ nodepred : num [1:37, 1:500] 8.85 65.11 3.85 20.22 1.33 ...
#> .. .. ..$ bestvar : int [1:37, 1:500] 2 0 2 1 1 0 0 1 1 1 ...
#> .. .. ..$ xbestsplit : num [1:37, 1:500] 50.5 0 70.5 49 16.5 0 0 9 40 4.5 ...
#> .. .. ..$ ncat : num [1:2] 1 1
#> .. .. ..$ nrnodes : int 37
#> .. .. ..$ ntree : num 500
#> .. .. ..$ xlevels :List of 2
#> .. .. .. ..$ : num 0
#> .. .. .. ..$ : num 0
#> .. ..$ coefs : NULL
#> .. ..$ y : num [1:49, 1] 0.226 28.891 59.309 0.192 2.864 ...
#> .. ..$ test : NULL
#> .. ..$ inbag : NULL
#> .. ..- attr(*, "class")= chr "randomForest"
#> ..$ x : num [1:49, 1:2] 9 72 45 15 26 57 70 93 33 88 ...
#> ..$ y : num [1:49, 1] 0.226 28.891 59.309 0.192 2.864 ...
#> ..- attr(*, "class")= chr "spotRandomForest"
SPOT generates a lot of information, which can be used for a statistical analysis. For example, the best parameter setting and its function value can be extracted as follows:
cbind(resRf$xbest, resRf$ybest)
#> [,1] [,2] [,3]
#> [1,] 12 78 0.008496108
SPOT recommends using temp =
resRf$xbest[1]
#> [1] 12
and tmax =
resRf$xbest[2]
#> [1] 78
These values indicate that the temperature temp should be low and the number of evaluations at each temperature tmax should be high, which leads to a very localized search in SANN.

SPOT uses the same interface as R’s standard optim() function, with the arguments reported in the following table:

| name | description |
|---|---|
| x | Optional start point (or set of start points), specified as a matrix. One row for each point, and one column for each optimized parameter. |
| fun | Objective function. It should receive a matrix x and return a matrix y. In case the function uses external code and is noisy, an additional seed parameter may be used; see the control$seedFun argument in the function documentation for details. |
| lower | Vector that defines the lower boundary of the search space. |
| upper | Vector that defines the upper boundary of the search space. |
| control | List of additional settings. |
The dimension of the problem is determined by upper, lower and, if specified, x.
The vector lower will be taken into account to establish the dimension of the problem.
spot() returns a list with the values shown in the following table.

| name | description | type |
|---|---|---|
| xbest | Parameters of the best found solution | matrix |
| ybest | Objective function value of the best found solution | matrix |
| x | Archive of all evaluated parameters | matrix |
| y | Archive of the respective objective function values | matrix |
| count | Number of performed objective function evaluations | integer |
| msg | Message specifying the reason of termination | character |
| modelFit | The fit of the model from the last SPOT iteration, i.e., an object returned by the last call to the function specified by control$model | list |
All SPOT configuration settings are listed in the following table.

| name | description | default |
|---|---|---|
| funEvals | Budget of function evaluations (spot uses no more than funEvals evaluations of fun). | 20 |
| types | Vector of data type of each variable as a string. | "numeric" |
| design | A function that creates an initial design of experiment. Functions that accept the same parameters and return a matrix, like designLHD or designUniformRandom, can be used. | designLHD |
| designControl | List of controls passed to the control list of the design function. | empty list |
| model | Function that builds a model of the observed data. Functions that accept the same parameters and return a matrix, like buildKriging or buildRandomForest, can be used. | buildKriging |
| modelControl | List of controls passed to the control list of the model function. | empty list |
| optimizer | Function that is used to optimize the model, finding the most promising candidate solutions. Functions that accept the same parameters and return a matrix, like optimLHD or optimLBFGSB, can be used. | optimLHD |
| optimizerControl | List of controls passed to the control list of the optimizer function. | empty list |
| noise | Boolean, whether the objective function has noise. | FALSE |
| OCBA | Boolean, indicating whether Optimal Computing Budget Allocation (OCBA) should be used in case of a noisy objective function. OCBA controls the number of replications for each candidate solution. Note that replicates should be larger than one in that case, and that the initial experimental design (see design) should also have replicates larger than one. | FALSE |
| OCBAbudget | Number of objective function evaluations that OCBA can distribute in each iteration. | 3 |
| replicates | Number of times a candidate solution is initially evaluated, that is, in the initial design, or when created by the optimizer. | 1 |
| seedFun | Initial seed for the objective function in case of noise. The default means that no seed is set. The user should be very careful with this setting. It is intended to generate reproducible experiments for each objective function evaluation, e.g., when tuning non-deterministic algorithms. If the objective function uses a constant number of random number generations, this may be undesirable. Note that this seed is by default set prior to each evaluation. A replicated evaluation will receive an incremented value of the seed. Sometimes, the user may want to call external code using random numbers. To allow for that case, the user can specify an objective function (fun) which has a second parameter seed, in addition to the first parameter (matrix x). This seed can then be passed to the external code for random number generator initialization. See the end of the examples section in the documentation of SPOT for a demonstration. | NA |
| seedSPOT | Value used to initialize the random number generator. It ensures that experiments are reproducible. | 1 |
| duplicate | In case of a deterministic (non-noisy) objective function, this handles duplicated candidate solutions. By default (duplicate = "EXPLORE"), duplicates are replaced by new candidate solutions, generated by random sampling with uniform distribution. If desired, the user can set this to "STOP", which means that the optimization stops and results are returned to the user (with a warning). This may be desirable, as duplicates can be an indicator of convergence, or of problems with the configuration. In case of noise, duplicates are allowed regardless of this parameter. | "EXPLORE" |
| plots | Logical. Should the progress be tracked by a line plot? | FALSE |
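As a compact illustration of these settings, the following control list (the values are illustrative, not recommendations) combines a noisy objective with OCBA, which requires more than one replicate both in the sequential steps and in the initial design:

```r
# Illustrative SPOT control list; entries correspond to the table above
ctrl <- list(
  funEvals = 60,                         # total budget for fun
  noise = TRUE,                          # stochastic objective
  OCBA = TRUE,                           # optimal computing budget allocation
  OCBAbudget = 3,                        # evaluations OCBA distributes per iteration
  replicates = 2,                        # OCBA needs replicates > 1
  designControl = list(replicates = 2)   # initial design replicated as well
)
ctrl$replicates > 1                      # TRUE, as required by OCBA
```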
The problem design comprises the problem specific settings, whereas the algorithm design comprises the SPOT specific settings. The starting point x0 = (-1,1,-1) belongs to the problem design. Values of temp are chosen from the interval \([1; 100]\), and values of tmax are chosen from the interval \([1; 100]\).

SPOT implements a sequential approach, i.e., the available budget is not used in one step. Rather, sequential steps are made, comprised of model training, optimization, and evaluation. To initialize this procedure, some first data set is required to train the first, coarse-grained meta model. The number of SANN algorithm runs, i.e., the available budget, can be set to 10 using funEvals = 10. The size of the initial design is specified via designControl$size, the number of repeated evaluations of the initial design via designControl$replicates, and the number of replicates used during the sequential SPOT steps can be modified via replicates.
spotConfig10 <- list(
funEvals = 10,
designControl = list(
size = 6,
replicates = 1
),
noise = TRUE,
seedFun = 1,
seedSPOT = 1,
replicates = 2,
model = buildRandomForest
)
Six initial design points are each evaluated once, so SPOT will have a remaining budget of four evaluations. These can be spent on sequentially testing two additional design points. Each of those will be evaluated twice.
res10 <- spot( ,fun=sann2spot
,lower=lower
,upper=upper
,control=spotConfig10)
The first two columns of the result show the settings of temp and tmax, respectively; the third column shows the observed function values.
cbind(res10$x, res10$y)
#> [,1] [,2] [,3]
#> [1,] 8.954322 63.418391 0.29353146
#> [2,] 93.392836 43.125099 107.32848102
#> [3,] 25.643432 26.240373 16.14928784
#> [4,] 70.072590 96.524378 11.29012282
#> [5,] 47.651660 67.384965 3.93335321
#> [6,] 61.529701 8.874296 73.12225819
#> [7,] 6.561794 81.375416 0.25116700
#> [8,] 6.561794 81.375416 0.02126833
#> [9,] 11.225956 75.246159 2.66469591
#> [10,] 11.225956 75.246159 0.07434020
designLHD() is the default setting to generate designs. A simple one-dimensional design with values from the interval \([-1, 1]\) can be generated as follows:
designLHD(,-1,1)
#> [,1]
#> [1,] 0.4350726
#> [2,] -0.2725918
#> [3,] 0.9844751
#> [4,] 0.1061902
#> [5,] -0.9456038
#> [6,] -0.5142940
#> [7,] 0.6935275
#> [8,] 0.2901311
#> [9,] -0.0977685
#> [10,] -0.6617153
A more complicated design, with mixed data types (numeric, integer, and factor variables), can be generated as follows:
designLHD(, c(-1,-2,1,0), c(1,4,9,1),
control=list(size=5, retries=100, types=c("numeric","integer","factor","factor")))
#> [,1] [,2] [,3] [,4]
#> [1,] -0.00173205 3 6 0
#> [2,] 0.41823095 -1 9 1
#> [3,] -0.45046368 4 5 1
#> [4,] -0.82865697 1 1 1
#> [5,] 0.68113235 -2 3 0
An existing design can also be extended. Here, the design x1 is extended by a second design x2, which covers a larger region:
set.seed(123)
x1 <- designLHD(,c(-1,-1),c(1,1),control=list(size=50,retries=100))
x2 <- designLHD(x1,c(-2,-2),c(2,2),control=list(size=50,retries=100))
Both designs can be plotted together:
plot(x2,pch=1)
points(x1, pch=4)
Uniform random sampling designs can be generated with designUniformRandom() as follows:
designUniformRandom(,c(-1,0),c(1,10),control=list(size=5))
#> [,1] [,2]
#> [1,] 0.3480606 3.447254
#> [2,] -0.3917176 2.413147
#> [3,] 0.7143919 8.768452
#> [4,] -0.4602907 1.681719
#> [5,] -0.5792664 9.799514
In SPOT, a meta model or surrogate model is used to determine promising algorithm design points. By default, the buildKriging() function is used for modeling. In the following example, the Kriging model is used directly for surrogate model based optimization, i.e., no algorithm is tuned.
# Objective function
braninFunction <- function (x) {
(x[2] - 5.1/(4 * pi^2) * (x[1] ^2) + 5/pi * x[1] - 6)^2 + 10 * (1 - 1/(8 * pi)) * cos(x[1] ) + 10
}
## Create 20 design points
set.seed(1)
x <- cbind(runif(20)*15-5, runif(20)*15)
## Compute observations at design points (for Branin function)
y <- as.matrix(apply(x,1,braninFunction))
## Create model with default settings
fit <- buildKriging(x,y,control = list(algTheta=optimLHD))
## Print model parameters
print(fit)
#> ------------------------
#> Forrester Kriging model.
#> ------------------------
#> Estimated activity parameters (theta) sorted
#> from most to least important variable
#> x1 x2
#> 7.575502 1.361329
#>
#> exponent(s) p:
#> 2
#>
#> Estimated regularization constant (or nugget) lambda:
#> 5.239774e-06
#>
#> Number of Likelihood evaluations during MLE:
#> 600
#> ------------------------
##Define a new location
newloc <- matrix(c(1,2),nrow =1 )
##Predict at new location
predict(fit,newloc)
#> $y
#> [1] 21.08753
## True value at location
braninFunction(newloc)
#> [1] 21.62764
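The examples that follow use categorical inputs, for which the model measures distance by the Hamming distance rather than \(|x_i - x'_i|\). A base-R sketch of this distance:

```r
# Hamming distance: number of positions at which two vectors differ
hamming <- function(a, b) sum(a != b)
hamming(c(1, 2, 3), c(1, 3, 3))      # 1
hamming(c("a", "b"), c("c", "d"))    # 2
```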
Sometimes, the inputs of the objective function will not be numerical, but rather categorical. If a variable is a factor (declared via types), the Hamming distance, which determines the number of positions at which the corresponding values are different, will be used instead of \(|x_i-x'_i|\). As an example, consider a Branin function with an additional factor variable:
braninFunctionFactor <- function (x) {
y <- (x[2] - 5.1 / (4 * pi^2) * (x[1]^2) + 5 / pi * x[1] - 6)^2 + 10 * (1 - 1 / (8 * pi)) * cos(x[1]) + 10
if(x[3] == 1)
y <- y + 1
else if(x[3]==2)
y <- y - 1
y
}
set.seed(1)
## Replace x with new data
x <- cbind(runif(50)*15-5,runif(50)*15,sample(1:3,50,replace=TRUE))
##
y <- as.matrix(apply(x,1,braninFunctionFactor))
fitDefault <- buildKriging(x,y,control = list(algTheta=optimLBFGSB))
A second model is built, declaring the third variable as a factor:
fitFactor <- buildKriging(x,y,control = list(algTheta=optimLBFGSB,types=c("numeric","numeric","factor")))
## Replace xtest with new data
xtest <- cbind(runif(200)*15-5,runif(200)*15,sample(1:3,200,replace=TRUE))
##
ytest <- as.matrix(apply(xtest,1,braninFunctionFactor))
## Predict test data with both models, and compute error
ypredDef <- predict(fitDefault,xtest)$y
ypredFact <- predict(fitFactor,xtest)$y
mean((ypredDef-ytest)^2)
#> [1] 4.099175
mean((ypredFact-ytest)^2)
#> [1] 2.094953
The model that declares the factor type yields the lower prediction error, which demonstrates the usefulness of the types variable.

The optimizer optimLHD() uses Latin hypercube sampling to optimize a specified target function as follows: a Latin hypercube design is generated with designLHD(), then evaluated by the objective function, and the best of these points is returned. A simple optimLHD() run can be performed as follows. It uses 100 design points as a default value.
resOptimumLHD <- optimLHD(,fun = funSphere,lower = c(-10,-20),upper=c(20,8))
str(resOptimumLHD)
#> List of 6
#> $ x : num [1:100, 1:2] -5.528 -6.509 0.718 -4.327 15.608 ...
#> $ y : num [1:100, 1] 48.8 62.5 295.3 260.7 366.2 ...
#> $ xbest: num [1, 1:2] 0.136 -0.558
#> $ ybest: num [1, 1] 0.33
#> $ count: num 100
#> $ msg : chr "success"
resOptimumLHD$ybest
#> [,1]
#> [1,] 0.32992
Using a gradient-based method such as L-BFGS-B usually leads to far better results on smooth problems than a simple optimLHD() search. However, L-BFGS-B is a pure local search, which may not be ideal to solve potentially multi-modal tuning problems.
resOptimBFGS <- optimLBFGSB(,fun = funSphere,lower = c(-10,-20),upper=c(20,8))
resOptimBFGS$ybest
#> [1] 2.098584e-40
SPOT also includes interfaces to more sophisticated algorithms, such as differential evolution from the DEoptim package or various methods included in the nloptr package.

Sometimes the budget of a SPOT run turns out to be too small. In this case, SPOT can be restarted, reusing the collected data. To demonstrate this continued evaluation, SPOT will be used at level (L2), i.e., as an optimizer for the sphere function. First, SPOT uses 5 function evaluations.
control01 <- list(
designControl = list(size = 5,
replicates = 1),
funEvals = 5)
res1 <- spot(,funSphere,
lower = c(-2,-3),
upper = c(1,2),
control01)
cbind(res1$x, res1$y)
#> [,1] [,2] [,3]
#> [1,] -0.8963358 0.2026923 0.844502
#> [2,] -1.7919899 -1.2888788 4.872436
#> [3,] 0.6002650 -0.8783081 1.131743
#> [4,] -0.5141893 -2.7545115 7.851724
#> [5,] 0.3353190 1.1433044 1.419584
To continue this SPOT run, the command spotLoop() can be used as follows:
spotLoop(x, y, fun, lower, upper, control, ...)
Its arguments are:

* x: the known candidate solutions that the SPOT loop is started with, specified as a matrix. One row for each point, and one column for each optimized parameter.
* y: the corresponding observations for each solution in x, specified as a matrix. One row for each point.
* fun: the objective function. It should receive a matrix x and return a matrix y.
* lower: the vector that defines the lower boundary of the search space. This also determines the dimensionality of the problem.
* upper: the vector that defines the upper boundary of the search space.
* control: the list with control settings for spot.
* ...: additional parameters passed to fun.

The budget is increased to eight function evaluations and the run is continued:
control01$funEvals <- 8
res2 <- spotLoop(res1$x,
res1$y,
funSphere,
lower = c(-2,-3),
upper = c(1,2),
control01)
cbind(res2$x, res2$y)
#> [,1] [,2] [,3]
#> [1,] -0.8963358 0.202692255 0.8445020
#> [2,] -1.7919899 -1.288878778 4.8724363
#> [3,] 0.6002650 -0.878308079 1.1317431
#> [4,] -0.5141893 -2.754511486 7.8517241
#> [5,] 0.3353190 1.143304379 1.4195837
#> [6,] 0.7868298 -0.009795371 0.6191971
#> [7,] 0.4923921 -0.124161024 0.2578659
#> [8,] 0.3555719 0.023539907 0.1269855
The SPOT package offers three plot functions that can be used to visualize data or evaluate a model’s performance, creating 2D and 3D surface plots:

* plotFunction() plots function objects
* plotData() plots data
* plotModel() plots model objects, created by build* functions from the SPOT package

The function plotFunction() visualizes the fitness landscape of a function f(). Its basic usage is
plotFunction(f , lower , upper , type)
plotFunction() requires a function that handles matrix objects, so funSphere() is used.
plotFunction(funSphere, rep(-1,2), rep(1,2))
A user-defined function can be plotted in the same way, provided it operates on matrices:
myFunction <- function (x){
matrix(apply(x, # matrix
1, # margin (apply over rows)
function(x) sum(x^3-1) # objective function
),
, 1) # number of columns
}
plotFunction(myFunction,
rep(-1,2),
rep(1,2),
color.palette = rainbow)
A perspective plot of the same function can be generated as follows:
plotFunction(myFunction,
rep(-1,2),
rep(1,2),
type="persp",
theta=10,
phi=25,
border = NA)
plotModel() offers the possibility to visualize models that have already been trained, e.g., during a SPOT run. First, some test data are generated:
set.seed(123)
k <- 30
x.test <- designLHD(,rep(-1,3),rep(1,3), control = list(size = k))
y.test <- funSphere(x.test)
head( cbind(x.test, y.test))
#> [,1] [,2] [,3] [,4]
#> [1,] 0.6721562 -0.67693655 -0.06068276 0.9137194
#> [2,] -0.0813375 0.74944336 0.59109728 0.9176771
#> [3,] 0.5462156 -0.39765660 -0.09992404 0.4664671
#> [4,] 0.2437057 -0.29365065 0.21488375 0.1917982
#> [5,] -0.3043235 0.25574608 -0.30583872 0.2515562
#> [6,] 0.4131432 0.02414995 0.12575375 0.1870845SPOT’s buildRSM() function.plotModel().fit.test <- buildRSM(x.test,y.test)
plotModel(fit.test)type="contour" to the plotModel() function, a 2D contour plot can be generated as shown in the Figure.which specifies the independent variables \(x_i\) that are plotted.which=c(1,3) can be used.plotModel(fit.test,which=c(1,3),type="contour",pch1=24,col1="blue")theta and phi can be used to modify the view point.plotModel(fit.test,which=c(1,3),type="persp",border="NA",theta=255,phi=20)plotData()plotData(), different models built on provided data can be compared.plotData() function generates a (filled) contour or perspective plot of a data set with two independent and one dependent variable.LOESS function is used.LOESS and random forestLOESS model is used for interpolation.plotData(x.test,y.test)plotData(x.test,y.test,type="filled.contour",cex1=1,col1="red",pch1=21,model=buildRandomForest)plotData(x.test,y.test,type="persp",border=NA,model=buildLOESS)SPOT package can be used to perform a visual inspection of the fitness landscape during an interactive SPOT run.SPOT is used for the tuning procedure of two SANN design parameters:
temp and thetmax.SPOT builds a surrogate model during the sequential optimization, this model can be used to visualize the fitness landscape. In this case, the plotModel() function will be used.plotData() function will be u sed. Note, that the plotData() function allows the specification of several interpolation functions (LOESS is default).plotFunction() is usually not applicable, because the underlying (true) analytical function is not known.SPOT model based approach first.plotModel()SPOT run might be the most generic way of visualizing the results, because during the optimization, the optimizer trusted this model. So, why should it be considered unreliable after the optimization is finished?resRf data), we will demonstrate how the final model, which was built during the SPOT run, can be plotted.SPOT run, i.e., in resRf, the parameter resRf$modelFit() can be passed as an argument to the plotModel() function.SPOT run with random forest.plotModel(resRf$modelFit)plotData()resRf result data were obtained with the random forest model.LOESS) or Kriging model as follows.LOESSplotData(resRf$x,resRf$y,model=buildLOESS)plotData(resRf$x,resRf$y,model=buildKriging)SPOT run and for the final illustration.
The corresponding SPOT configuration parameters can be changed as follows:

spotConfig$model = buildKriging
spotConfig$optimizer = optimLBFGSB
spotConfig$modelControl = list(algTheta=optimLBFGSB)
## Run SPOT
resK <- spot(x=NULL,
fun=sann2spot,
lower=lower,
upper=upper,
control=spotConfig)

The model from this Kriging-based SPOT run will be used to visualize the fitness landscape.

plotModel(resK$modelFit)

The landscape from the random forest based run and the landscape from the Example, which used the Kriging-based SPOT run, differ. Hence, both runs are continued to obtain more reliable models. The first run, which generated the resRf data, used 50 runs of the SANN algorithm. It is continued with 50 additional evaluations as follows.

spotConfig$funEvals <- 100
spotConfig$model <- buildRandomForest
res100Rf <- spotLoop(resRf$x,
resRf$y,
fun=sann2spot,
lower=lower,
upper=upper,
control=spotConfig)

In a similar manner, the Kriging-based run resK is continued with 50 additional runs of the SANN algorithm.

spotConfig$model = buildKriging
spotConfig$optimizer = optimLBFGSB
spotConfig$modelControl = list(algTheta=optimLBFGSB)
res100K <- spotLoop(resK$x,
resK$y,
fun=sann2spot,
lower=lower,
upper=upper,
control=spotConfig)

The result from these two extended runs (res100Rf and res100K) is shown in the following Figures.

Figure: res100Rf. Long run using a random forest model.

Figure: res100K. Long run with 100 function evaluations using a Kriging model.

plotModel(res100Rf$modelFit)

plotModel(res100K$modelFit)

Based on the rsm package, which is maintained by~, the buildRSM() function builds a linear response surface model. The arguments of buildRSM(x, y, control = list()) are as follows:
- x: design matrix (sample locations), rows for each sample, columns for each variable.
- y: vector of observations at x.
- control: list with the options for the model building procedure.
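As background for the stationary-point and eigenanalysis output that summary() reports later in this section: buildRSM() fits a second-order model of the form \(y \approx b_0 + x^\top b + x^\top B x\), whose stationary point is \(x_s = -\frac{1}{2}B^{-1}b\); mixed signs of the eigenvalues of \(B\) indicate a saddle point. A base-R check, using the coded coefficients copied from the rsm100K summary shown below:

```r
## Coded coefficients from summary(rsm100K$rsmfit):
## x1 = 26.9435, x2 = -21.5854, x1:x2 = -23.8875, x1^2 = -2.2793, x2^2 = 1.5927
b <- c(26.9435, -21.5854)
B <- matrix(c(-2.2793,    -23.8875/2,
              -23.8875/2,  1.5927), nrow = 2)  # interaction split over both off-diagonals

## Stationary point: matches "Stationary point of response surface" in the summary
xs <- -0.5 * solve(B, b)
xs               # approximately (-0.7345, 1.2681)

## Eigenvalues of B: one positive, one negative, i.e., a saddle point
eigen(B)$values  # approximately 11.756 and -12.443
```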
Using buildRSM(), the response surface model is built. The function descentSpotRSM() returns the path of the steepest descent. First, a uniform random design is generated and evaluated on the sphere function.

x <- designUniformRandom(lower=rep(-5,2),
upper=rep(15,2),
control=list(size=20))
y <- funSphere(x)

Then, the response surface model is built and used for prediction at the point (1,2).

fit <- buildRSM(x,y)

predict(fit,cbind(1,2))
#> $y
#> [,1]
#> [1,] 5

For comparison, the true function value at this point is computed.

sphere(c(1,2))
#> [1] 5

The path of the steepest descent can be determined as follows.

descentSpotRSM(fit)
#> Path of steepest descent from ridge analysis:
#> $x
#> V1 V2
#> 1 5.801230 4.503401
#> 2 5.252816 3.839109
#> 3 4.674758 3.223662
#> 4 4.074467 2.647291
#> 5 3.444532 2.119765
#> 6 2.792364 1.631315
#> 7 2.125374 1.172172
#> 8 1.443562 0.752105
#> 9 0.746928 0.371114
#> 10 0.035472 0.019430
#>
#> $y
#> [,1]
#> [1,] 53.934890080
#> [2,] 42.330833844
#> [3,] 32.245359049
#> [4,] 23.609430973
#> [5,] 16.358204354
#> [6,] 10.458485338
#> [7,] 5.891201837
#> [8,] 2.649533179
#> [9,] 0.695627038
#> [10,] 0.001635788

The fitted response surface can be visualized.

plot(fit)

Next, RSM is applied to the data from the Kriging-based SPOT run in Example res100K. These data are used to build a response surface with buildRSM().

rsm100K <- buildRSM(x=res100K$x,
y=res100K$y)
summary(rsm100K$rsmfit)
#>
#> Call:
#> rsm(formula = y ~ FO(x1, x2) + TWI(x1, x2) + PQ(x1, x2), data = codedData)
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 27.8212 4.2690 6.5171 3.533e-09 ***
#> x1 26.9435 2.8975 9.2990 5.689e-15 ***
#> x2 -21.5854 4.1993 -5.1402 1.488e-06 ***
#> x1:x2 -23.8875 5.9749 -3.9980 0.0001271 ***
#> x1^2 -2.2793 5.3683 -0.4246 0.6721072
#> x2^2 1.5927 6.0336 0.2640 0.7923808
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Multiple R-squared: 0.6211, Adjusted R-squared: 0.601
#> F-statistic: 30.82 on 5 and 94 DF, p-value: < 2.2e-16
#>
#> Analysis of Variance Table
#>
#> Response: y
#> Df Sum Sq Mean Sq F value Pr(>F)
#> FO(x1, x2) 2 31156.8 15578.4 68.6460 < 2.2e-16
#> TWI(x1, x2) 1 3737.4 3737.4 16.4687 0.0001022
#> PQ(x1, x2) 2 77.3 38.7 0.1704 0.8436197
#> Residuals 94 21332.2 226.9
#> Lack of fit 30 3761.3 125.4 0.4567 0.9897826
#> Pure error 64 17570.9 274.5
#>
#> Stationary point of response surface:
#> x1 x2
#> -0.7345253 1.2681073
#>
#> Stationary point in original units:
#> V1 V2
#> 14.00826 106.46699
#>
#> Eigenanalysis:
#> eigen() decomposition
#> $values
#> [1] 11.75633 -12.44293
#>
#> $vectors
#> [,1] [,2]
#> x1 0.6480722 -0.7615789
#> x2 -0.7615789 -0.6480722

(xSteep <- descentSpotRSM(rsm100K) )
#> Path of steepest descent from ridge analysis:
#> $x
#> V1 V2
#> 1 46.080 50.2900
#> 2 41.964 52.8010
#> 3 37.456 54.8005
#> 4 32.605 56.0560
#> 5 27.264 56.1955
#> 6 21.727 55.1260
#> 7 16.190 53.0800
#> 8 10.947 50.4760
#> 9 5.900 47.5930
#> 10 1.098 44.6170
#>
#> $y
#> [,1]
#> [1,] 24.476384
#> [2,] 21.347709
#> [3,] 18.384709
#> [4,] 15.611535
#> [5,] 12.920574
#> [6,] 10.279201
#> [7,] 7.555571
#> [8,] 4.742900
#> [9,] 1.725641
#> [10,] -1.472259

The eighth point on this path is chosen as a new design point, and its function value is determined with the SANN algorithm.

xNew <- xSteep$x[8,]

(yNew <- sann2spot(xNew))
#> [,1]
#> [1,] 0.04315271

x101 <- rbind(res100K$x, xNew)
y101 <- rbind(res100K$y, yNew)
rsm101K <- buildRSM(x=x101,
y=y101)
summary(rsm101K$rsmfit)
#>
#> Call:
#> rsm(formula = y ~ FO(x1, x2) + TWI(x1, x2) + PQ(x1, x2), data = codedData)
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 27.6726 4.2214 6.5553 2.868e-09 ***
#> x1 27.0095 2.8757 9.3922 3.291e-15 ***
#> x2 -21.6066 4.1787 -5.1706 1.291e-06 ***
#> x1:x2 -23.9811 5.9386 -4.0382 0.0001092 ***
#> x1^2 -2.1542 5.3273 -0.4044 0.6868473
#> x2^2 1.7749 5.9759 0.2970 0.7671137
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Multiple R-squared: 0.6218, Adjusted R-squared: 0.6018
#> F-statistic: 31.23 on 5 and 95 DF, p-value: < 2.2e-16
#>
#> Analysis of Variance Table
#>
#> Response: y
#> Df Sum Sq Mean Sq F value Pr(>F)
#> FO(x1, x2) 2 31260.8 15630.4 69.5376 < 2.2e-16
#> TWI(x1, x2) 1 3763.0 3763.0 16.7412 8.982e-05
#> PQ(x1, x2) 2 77.6 38.8 0.1725 0.8418
#> Residuals 95 21353.7 224.8
#> Lack of fit 31 3782.8 122.0 0.4445 0.9924
#> Pure error 64 17570.9 274.5
#>
#> Stationary point of response surface:
#> x1 x2
#> -0.715249 1.254782
#>
#> Stationary point in original units:
#> V1 V2
#> 14.9528 105.8474
#>
#> Eigenanalysis:
#> eigen() decomposition
#> $values
#> [1] 11.96075 -12.34012
#>
#> $vectors
#> [,1] [,2]
#> x1 0.6474238 -0.7621302
#> x2 -0.7621302 -0.6474238

The refitted response surface can be plotted.

plot(rsm101K)

An updated path of the steepest descent is determined with SPOT’s descentSpotRSM() function.

descentSpotRSM(rsm101K)
#> Path of steepest descent from ridge analysis:
#> $x
#> V1 V2
#> 1 46.080 50.2900
#> 2 41.915 52.8010
#> 3 37.456 54.8005
#> 4 32.556 56.0095
#> 5 27.264 56.1490
#> 6 21.678 55.0330
#> 7 16.190 52.9870
#> 8 10.898 50.3830
#> 9 5.900 47.5465
#> 10 1.098 44.5240
#>
#> $y
#> [,1]
#> [1,] 24.323160
#> [2,] 21.168389
#> [3,] 18.232360
#> [4,] 15.451965
#> [5,] 12.788653
#> [6,] 10.133240
#> [7,] 7.438115
#> [8,] 4.600924
#> [9,] 1.619128
#> [10,] -1.570104

Finally, the optimization is continued with SPOT.

spotConfig$model = buildKriging
spotConfig$optimizer = optimLBFGSB
spotConfig$modelControl = list(algTheta=optimLBFGSB)
spotConfig$funEvals <- 110
res110K <- spotLoop(x=x101,
y=y101,
fun=sann2spot,
lower=lower,
upper=upper,
control=spotConfig)

Here, buildRSM() was used to build a response surface. Data from the SPOT run with 100 function evaluations, one additional point, which was calculated using the steepest descent function descentSpotRSM(), and one additional SPOT run with nine additional design points, were used to generate this model.

plotModel(res110K$modelFit)

The resulting landscape can be compared with the one from res100Rf. Tree-based models provide another way of visualizing the results. SPOT’s buildRandomForest() function is a wrapper function for the randomForest() function from the randomForest package. Since the randomForest package has no default plot function, we switch to the party package.
This package provides the ctree() function, which can be applied as follows:

tmaxtempz.df <- data.frame(res100K$x[,1], res100K$x[,2], res100K$y)
names(tmaxtempz.df) <- c("tmax", "temp", "y")
tmaxtempz.tree <- party::ctree(y ~ ., data=tmaxtempz.df)
plot(tmaxtempz.tree, type="simple")

Data from the SPOT run can also be used for building linear models. We use R’s lm() function for building the linear model.

xyz100K.df <- data.frame(res100K$x[,1], res100K$x[,2], res100K$y)
names(xyz100K.df) <- c("x", "y", "z")
lm100K <- lm(z ~ x*y, data=xyz100K.df)
summary(lm100K)
#>
#> Call:
#> lm(formula = z ~ x * y, data = xyz100K.df)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -58.767 -1.434 0.318 0.392 48.363
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -3.077837 6.970272 -0.442 0.660
#> x 1.063641 0.154412 6.888 5.87e-10 ***
#> y 0.039537 0.117752 0.336 0.738
#> x:y -0.010534 0.002573 -4.094 8.85e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 14.93 on 96 degrees of freedom
#> Multiple R-squared: 0.6197, Adjusted R-squared: 0.6079
#> F-statistic: 52.15 on 3 and 96 DF, p-value: < 2.2e-16

Standard diagnostic plots of the fitted model can be generated with plot().

plot(lm100K)

R’s termplot() function can be used to plot regression terms against their predictors, optionally with standard errors and partial residuals added.

par(mfrow=c(1,2))
termplot(lm100K, partial = TRUE, smooth = panel.smooth, ask=FALSE)
par(mfrow=c(1,1))

The car package provides the function avPlots(), which can be used for visualization as follows:

par(mfrow=c(1,3))
car::avPlots(lm100K,ask=F)
par(mfrow=c(1,1))

The following example illustrates SPOT in a simple setting:
SPOT can be used for optimizing deterministic problems directly, i.e., we apply SPOT in the context of surrogate model based optimization. Here, SPOT will be used for minimizing the sphere function. In contrast to the previous sections, where SPOT tuned the SANN heuristic, which in turn optimizes the sphere function, SPOT now tries to find the minimum of the deterministic sphere function directly. The following code illustrates a basic SPOT configuration in deterministic settings.

res <- spot(x=NULL,
            fun=funSphere,
            lower=c(-5,-5),
            upper=c(5,5),
            control=list(optimizer=optimLBFGSB))

res$xbest
#> [,1] [,2]
#> [1,] -0.005598363 0.004331417
res$ybest
#> [,1]
#> [1,] 5.010284e-05

SPOT provides several models that can be used as surrogates.
Sometimes it is not obvious, which surrogate should be chosen.
Ensemble-based models provide a well-established solution to this model selection problem~.
Therefore, SPOT provides a stacking approach, that combines several models in a sophisticated manner.
The stacking procedure is described in detail in~.
We will use the data from Example plotTrained to illustrate the stacking approach.
fit.stack <- buildEnsembleStack(x.test, y.test)

The stacked model can be plotted.

plotModel(fit.stack)

Predictions from the stacked model can be compared with the true function values, e.g., at the point (1,1,1).

xNew <- cbind(1,1,1)
predict(fit.stack, xNew)
#> $y
#> 1
#> 2.919332
funSphere(xNew)
#> [,1]
#> [1,] 3

This article illustrated experimental approaches for tuning search heuristics with SPOT. Applications of SPOT for automated and interactive tuning were illustrated and the underlying concepts of the SPOT approach were explained. Central elements of the SPOT approach are techniques such as exploratory fitness landscape analysis and response surface methodology.