mlr3mbo makes Bayesian Optimization (BO) available within the mlr3 ecosystem. BO can be used for optimizing any black box function, and is very suitable for hyperparameter optimization of machine learning models. mlr3mbo allows for building custom BO algorithms relying on building blocks in a modular fashion, but also provides a variety of standard single- and multi-objective BO algorithms that can be used in a straightforward manner.
We assume that the reader is somewhat familiar with black box optimization and bbotk, hyperparameter optimization and mlr3tuning and knows the basics of BO. Background material is, for example, given by Garnett (2022), Bischl, Binder, et al. (2023), and Bischl, Sonabend, et al. (2023).
BO is an iterative optimization algorithm that makes use of a so-called surrogate to model the unknown black box function. After having observed an initial design of observations, the surrogate model is trained on all data points observed so far and an acquisition function is used to determine which points of the search space are promising candidates that should be evaluated next. The acquisition function relies on the prediction of the surrogate model and requires no evaluation of the true black box function and therefore is comparably cheap to optimize. After having evaluated the next candidate, the process repeats itself until a given termination criteria is met.
Most BO flavors therefore follow a simple loop:
In the following, the basic building blocks of BO and their implementation in mlr3mbo are introduced in more detail.
The loop_function
determines the behavior of the BO algorithm on a global level, i.e., how
the subroutine should look like that is performed at each iteration.
To get an overview of readily available loop_functions,
the following dictionary can be inspected:
library(mlr3mbo)
library(data.table)
as.data.table(mlr_loop_functions)
#> Key: <key>
#>                key                         label    instance
#>             <char>                        <char>      <char>
#> 1:    bayesopt_ego Efficient Global Optimization single-crit
#> 2:    bayesopt_emo           Multi-Objective EGO  multi-crit
#> 3:   bayesopt_mpcl      Multipoint Constant Liar single-crit
#> 4: bayesopt_parego                        ParEGO  multi-crit
#> 5: bayesopt_smsego                       SMS-EGO  multi-crit
#>                                   man
#>                                <char>
#> 1:    mlr3mbo::mlr_loop_functions_ego
#> 2:    mlr3mbo::mlr_loop_functions_emo
#> 3:   mlr3mbo::mlr_loop_functions_mpcl
#> 4: mlr3mbo::mlr_loop_functions_parego
#> 5: mlr3mbo::mlr_loop_functions_smsegoThe dictionary shows the key, i.e., ID of the
loop_function, a brief description, for which optimization
instance the resulting BO flavor can be used, as well as how
documentation can be accessed.
Technically, all loop_functions are members of the
S3 class loop_function,
and are simply decorated functions
(because using an R6Class
class would be over the top here - but this may change in the
future).
To write an own loop_function, users can get inspiration
from the readily available ones, e.g., bayesopt_ego
which performs sequential single-objective optimization:
After having made some assertions and safety checks, and having
evaluated the initial design, bayesopt_ego essentially only
performs the following steps:
acq_function$surrogate$update() # update the surrogate
modelacq_function$update() # update the acquisition function
(e.g., update the best value observed so far)acq_optimizer$optimize() # optimize the acquisition
function to yield a new candidateA surrogate encapsulates a regression learner that models the unknown
black box function based on observed data. In mlr3mbo, SurrogateLearner
and SurrogateLearnerCollection
are the higher-level R6 classes which should be used to
construct a surrogate, inheriting from the base Surrogate
class.
As a learner, any LearnerRegr
from mlr3 can be used, however,
most acquisition functions require both a mean and a variance prediction
(therefore not all learners are suitable for all scenarios). Typical
choices include:
Gaussian Process
for low dimensional numeric search spacesRandom Forest
for higher dimensional mixed (and / or hierarchical) search spacesA SurrogateLearner can be constructed via:
library(mlr3learners)
#> Loading required package: mlr3
surrogate = SurrogateLearner$new(lrn("regr.km"))or using syntactic sugar:
Note that default_gp
and default_rf
allow for the construction of a Gaussian Process and random forest with
sensible hyperparamter values already specified.
The encapsulated learner can be accessed via the
$learner field:
surrogate$learner
#> <LearnerRegrKM:regr.km>: Kriging
#> * Model: -
#> * Parameters: list()
#> * Packages: mlr3, mlr3learners, DiceKriging
#> * Predict Types:  response, [se]
#> * Feature Types: logical, integer, numeric
#> * Properties: -The surrogate itself has the following hyperparameters:
surrogate$param_set
#> <ParamSet>
#> Warning: Unknown argument 'on' has been passed.
#> Key: <id>
#>                      id    class lower upper nlevels
#>                  <char>   <char> <num> <num>   <num>
#> 1: assert_insample_perf ParamLgl    NA    NA       2
#> 2:         catch_errors ParamLgl    NA    NA       2
#> 3:         perf_measure ParamUty    NA    NA     Inf
#> 4:       perf_threshold ParamDbl  -Inf   Inf     Inf
#>                                                                                      default
#>                                                                                       <list>
#> 1: <NoDefault>\n  Public:\n    clone: function (deep = FALSE) \n    initialize: function () 
#> 2: <NoDefault>\n  Public:\n    clone: function (deep = FALSE) \n    initialize: function () 
#> 3: <NoDefault>\n  Public:\n    clone: function (deep = FALSE) \n    initialize: function () 
#> 4: <NoDefault>\n  Public:\n    clone: function (deep = FALSE) \n    initialize: function () 
#>                 parents  value
#>                  <list> <list>
#> 1:                       FALSE
#> 2:                        TRUE
#> 3: assert_insample_perf       
#> 4: assert_insample_perfassert_insample_perf = TRUE results in the insample
performance of the learner being calculated and asserted against a
performance threshold after each $update(). This requires
the specification of a perf_measure (any regression
measure, e.g., R squared)
and a perf_threshold. If the threshold is not met, an error
is thrown (that is caught within the optimization loop - unless
catch_errors = FALSE and results in, e.g., proposing the
next candidate uniformly at random). For more details on error handling,
see the Safety Nets section.
Note that this insample performance assertion is not always meaningful, e.g., in the case of using a Gaussian Process with no nugget, the insample performance will always be perfect.
Internally, the learner is fitted on a regression task
constructed from the Archive
of the OptimInstance
that is to be optimized and features and the target variable are
determined automatically but can also be specified via the
$cols_x and $cols_y active bindings. Ideally,
the archive is already passed during construction, however,
lazy initialization is also possible (i.e., the $archive
field will be automatically populated within the optimization routine of
an OptimizerMbo).
Important methods are $update() and
$predict() with the former one typically being used within
the loop_function and the latter one being used within the
implementation of an acquisition function.
Depending on the choice of the loop_function, multiple
targets must be modelled by (currently independent) surrogates, in which
case a SurrogateLearnerCollection should be used.
Construction and hyperparameters are analogous to the single target
scenario described above.
To get an overview of all available regression learners within the mlr3 ecosystem, use:
library(mlr3)
library(mlr3learners)
# there are plenty of more in mlr3extralearners
# library(mlr3extralearners)
learners = as.data.table(mlr_learners)
learners[task_type == "regr"]To use a custom learner not included in mlr3, mlr3learners, or mlr3extralearners, you
can inherit from LearnerRegr
and use this custom learner within the surrogate.
Based on a surrogate, an acquisition function quantifies the attractiveness of each point of the search space if it were to be evaluated in the next iteration.
A popular example is given by the Expected Improvement (Jones, Schonlau, and Welch 1998):
\[ \mathbb{E} \left[ \max \left( f_{\mathrm{min}} - Y(\mathbf{x}), 0 \right) \right] \] Here, \(Y(\mathbf{x})\) is the surrogate prediction (a random variable) for a given point \(\mathbf{x}\) (which when using a Gaussian Process follows a normal distribution) and \(f_{\mathrm{min}}\) is the currently best function value observed so far (when assuming minimization).
To get an overview of available acquisition functions, the following dictionary can be inspected:
as.data.table(mlr_acqfunctions)
#> Key: <key>
#>        key                                              label
#>     <char>                                             <char>
#>  1:    aei                     Augmented Expected Improvement
#>  2:     cb                     Lower / Upper Confidence Bound
#>  3:   ehvi                   Expected Hypervolume Improvement
#>  4: ehvigh Expected Hypervolume Improvement via GH Quadrature
#>  5:     ei                               Expected Improvement
#>  6:   eips                    Expected Improvement Per Second
#>  7:   mean                                     Posterior Mean
#>  8:     pi                         Probability Of Improvement
#>  9:     sd                       Posterior Standard Deviation
#> 10: smsego                                            SMS-EGO
#>                                  man
#>                               <char>
#>  1:    mlr3mbo::mlr_acqfunctions_aei
#>  2:     mlr3mbo::mlr_acqfunctions_cb
#>  3:   mlr3mbo::mlr_acqfunctions_ehvi
#>  4: mlr3mbo::mlr_acqfunctions_ehvigh
#>  5:     mlr3mbo::mlr_acqfunctions_ei
#>  6:   mlr3mbo::mlr_acqfunctions_eips
#>  7:   mlr3mbo::mlr_acqfunctions_mean
#>  8:     mlr3mbo::mlr_acqfunctions_pi
#>  9:     mlr3mbo::mlr_acqfunctions_sd
#> 10: mlr3mbo::mlr_acqfunctions_smsegoThe dictionary shows the key, i.e., ID of the
acquisition function, a brief description, and how the documentation can
be accessed.
Technically, all acquisition functions inherit from the
R6 class AcqFunction
which itself simply inherits from the base Objective
class.
Construction is straightforward via:
or using syntactic sugar:
Internally, the domain and codomain are
constructed based on the Archive
referenced by the Surrogate
and therefore the surrogate should be passed as an argument
already during construction.
However, lazy initialization is also possible.
In the case of the acquisition function itself being parameterized, hyperparameters should be passed as constants, e.g.:
acqf("cb") # lower / upper confidence bound with lambda hyperparameter
#> <AcqFunctionCB:acq_cb>
#> Domain:
#> <ParamSet>
#> Empty.
#> Codomain:
#> <Codomain>
#> Empty.
#> Constants:
#> <ParamSet>
#>        id    class lower upper nlevels default  value
#>    <char>   <char> <num> <num>   <num>  <list> <list>
#> 1: lambda ParamDbl     0   Inf     Inf       2      2To use a custom acquisition function you should implement a new
R6 class inheriting from AcqFunction.
To find the most promising candidate for evaluation, the acquisition
function itself must be optimized. Internally, an OptimInstance
is constructed using the acquisition function as an Objective.
An acquisition function optimizer is then used to solve this
optimization problem. Technically, this optimizer is a member of the AcqOptimizer
R6 class.
Construction requires specifying an Optimizer
as well as a Terminator:
library(bbotk)
#> Loading required package: paradox
acq_optimizer = AcqOptimizer$new(opt("random_search"), terminator = trm("evals"))Syntactic sugar:
The optimizer and terminator can be accessed via the
$optimizer and $terminator fields:
acq_optimizer$optimizer
#> <OptimizerRandomSearch>: Random Search
#> * Parameters: batch_size=1
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl, ParamFct
#> * Properties: dependencies, single-crit, multi-crit
#> * Packages: bbotkacq_optimizer$terminator
#> <TerminatorEvals>: Number of Evaluation
#> * Parameters: n_evals=100, k=0Internally, the acquisition function optimizer also requires the
acquisition function and therefore the acq_function
argument should be specified during construction.
However, lazy initialization is also possible.
An AcqOptimizer
has the following hyperparameters:
acq_optimizer$param_set
#> <ParamSet>
#> Warning: Unknown argument 'on' has been passed.
#> Key: <id>
#>                        id    class lower upper nlevels
#>                    <char>   <char> <num> <num>   <num>
#> 1:           catch_errors ParamLgl    NA    NA       2
#> 2:          logging_level ParamFct    NA    NA       6
#> 3:           n_candidates ParamInt     1   Inf     Inf
#> 4: skip_already_evaluated ParamLgl    NA    NA       2
#> 5:              warmstart ParamLgl    NA    NA       2
#> 6:         warmstart_size ParamInt     1   Inf     Inf
#>                                                                                      default
#>                                                                                       <list>
#> 1:                                                                                      TRUE
#> 2:                                                                                      warn
#> 3:                                                                                         1
#> 4:                                                                                      TRUE
#> 5:                                                                                     FALSE
#> 6: <NoDefault>\n  Public:\n    clone: function (deep = FALSE) \n    initialize: function () 
#>      parents  value
#>       <list> <list>
#> 1:             TRUE
#> 2:             warn
#> 3:                1
#> 4:             TRUE
#> 5:            FALSE
#> 6: warmstartcatch_errors = TRUE results in catching any errors that
can happen during the acquisition function optimization which allows
for, e.g., proposing the next candidate uniformly at random within the
loop_function. For more details on this mechanism, see the
Safety Nets section.
logging_level specifies the logging level during
acquisition function optimization. Often it is useful to only log the
progress of the BO loop and therefore logging_level is set
to "warn" by default. For debugging purposes, this should
be set to "info".
skip_already_evaluated = TRUE will result in not proposing
candidates for evaluation that were already evaluated in previous
iterations. warmstart = TRUE results in the best
warmstart_size points present in the archive
of the OptimInstance to also be evaluated on the
acquisition function OptimInstance prior to running the
actual acquisition function optimization. This is especially useful in
the context of using evolutionary algorithms or variants of local search
as the acquisition function optimizer (as the current best points should
usually be part of the initial population to further optimize local
optima).
To get an overview of all Optimizers implemented in bbotk you can use:
And similarly for Terminators:
Again, you can also use custom Optimizers or
Terminators by implementing new R6 classes
inheriting from Optimizer
and Terminator
respectively.
Having introduced all building blocks we are now ready to put
everything together in the form of an OptimizerMbo
or TunerMbo.
OptimizerMbo
inherits from Optimizer
and requires a loop_function, surrogate,
acq_function and acq_optimizer.
Construction is performed via:
optimizer = OptimizerMbo$new(bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)or using syntactic sugar:
optimizer = opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)optimizer
#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerRandomSearch |
#>   TerminatorEvals)Additional arguments, i.e., arguments of the
loop_function can be passed via the args
argument during construction or the $args active binding.
Finally, the mechanism how the final result is obtained after the
optimization process (i.e., the best point in the case of
single-objective and the Pareto set in the case of multi-objective
optimization) can be changed via the result_assigner
argument during construction or the $result_assigner active
binding. As an example, ResultAssignerSurrogate
will choose the final solution based on the prediction of the surrogate
instead of the evaluations logged in the archive which is
sensible in the case of noisy objective functions. The default, however,
is to use ResultAssignerArchive
which will directly choose the final solution based on the evaluations
logged in the archive. To get an overview of available
result assigners, the following dictionary can be inspected:
as.data.table(mlr_result_assigners)
#> Key: <key>
#>          key                     label                                     man
#>       <char>                    <char>                                  <char>
#> 1:   archive                   Archive   mlr3mbo::mlr_result_assigners_archive
#> 2: surrogate Mean Surrogate Prediction mlr3mbo::mlr_result_assigners_surrogateThe dictionary shows the key, i.e., ID of the result
assigner, a brief description, and how the documentation can be
accessed.
Construction of result assigners is straightforward:
Syntactic sugar:
Note that important fields of an OptimizerMbo such as
$param_classes, $packages,
$properties are automatically determined based on the
choice of the loop_function, surrogate,
acq_function, acq_optimizer, and
result_assigner.
If arguments such as the surrogate,
acq_function, acq_optimizer and
result_assigner were not fully initialized during
construction, e.g., the surrogate missing the
archive, or the acq_function missing the
surrogate, lazy initialization is completed prior to the
optimizer being used for optimization.
An object of class OptimizerMbo can be used to optimize
an object of class OptimInstanceSingleCrit
or OptimInstanceMultiCrit.
For hyperparameter optimization, TunerMbo
should be used (which simply relies on an OptimizerMbo that
is constructed internally):
tuner = TunerMbo$new(bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
mlr3misc::get_private(tuner)[[".optimizer"]]
#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerRandomSearch |
#>   TerminatorEvals)mlr3mbo offers two different ways for specifying an initial design:
OptimInstance
that is to be optimized prior to using an OptimizerMbo. In
this case, the loop_function should skip the construction
and evaluation of an initial design.OptimInstance, the loop_function should
construct an initial design itself and evaluate it, e.g.,
bayesopt_ego then constructs an initial design of size
\(4D\) where \(D\) is the dimensionality of the search
space by sampling points uniformly at random.Functions for creating different initial designs are part of the paradox package, e.g.:
generate_design_random:
uniformly at randomgenerate_design_grid:
uniform sized gridgenerate_design_lhs:
Latin hypercube samplinggenerate_design_sobol:
Sobol sequencemlr3mbo tries to use
intelligent defaults for the loop_function,
surrogate, acq_function, and
acq_optimizer within OptimizerMbo and
TunerMbo.
For details, see mbo_defaults.
mlr3mbo is quite stable in
the sense that - if desired - all kinds of errors can be caught and
handled appropriately within the loop_function.
As an example, let’s have a look at the inner workings of bayesopt_ego:
repeat {
  xdt = tryCatch({
    .
    .
    .
    acq_function$surrogate$update()
    acq_function$update()
    acq_optimizer$optimize()
  }, mbo_error = function(mbo_error_condition) {
    lg$info(paste0(class(mbo_error_condition), collapse = " / "))
    lg$info("Proposing a randomly sampled point")
    SamplerUnif$new(domain)$sample(1L)$data
  })
  .
  .
  .
}In each iteration, a new candidate is chosen based on updating the
surrogate and acquisition function, and optimizing the acquisition
function. If any error happens during any of these steps, errors are
upgraded to errors of class "mbo_error" (and
"surrogate_update_error" for surrogate related errors as
well as "acq_optimizer_error" for acquisition function
optimization related errors). These errors are then caught and a
fallback is triggered: Evaluating the next candidate chosen uniformly at
random. Note that the same mechanism is actually also used to handle
random interleaving.
To illustrate this error handling mechanism, consider the following scenario: We try to minimize \(f: [-1, 1] \rightarrow \mathbb{R}, x \mapsto x^2\), however, our Gaussian Process fails due to data points being too close to each other:
set.seed(2906)
domain = ps(x = p_dbl(lower = -1, upper = 1))
codomain = ps(y = p_dbl(tags = "minimize"))
objective_function = function(xs) {
  list(y = xs$x ^ 2)
}
objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)
instance = OptimInstanceSingleCrit$new(
  objective = objective,
  terminator = trm("evals", n_evals = 10))
initial_design = data.table(x = rep(1, 4))
instance$eval_batch(initial_design)
#> INFO  [19:58:00.984] [bbotk] Evaluating 4 configuration(s)
#> INFO  [19:58:01.024] [bbotk] Result of batch 1:
#> INFO  [19:58:01.032] [bbotk]  x y
#> INFO  [19:58:01.032] [bbotk]  1 1
#> INFO  [19:58:01.032] [bbotk]  1 1
#> INFO  [19:58:01.032] [bbotk]  1 1
#> INFO  [19:58:01.032] [bbotk]  1 1
surrogate = srlrn(default_gp())
acq_function = acqf("ei")
acq_optimizer = acqo(opt("random_search", batch_size = 1000),
  terminator = trm("evals", n_evals = 1000))
optimizer = opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
optimizer$optimize(instance)
#> INFO  [19:58:01.178] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> WARN  [19:58:01.234] [bbotk] missing value where TRUE/FALSE needed
#> INFO  [19:58:01.235] [bbotk] surrogate_update_error / mbo_error / error / condition
#> INFO  [19:58:01.236] [bbotk] Proposing a randomly sampled point
#> INFO  [19:58:01.245] [bbotk] Evaluating 1 configuration(s)
#> INFO  [19:58:01.249] [bbotk] Result of batch 2:
#> INFO  [19:58:01.250] [bbotk]          x         y
#> INFO  [19:58:01.250] [bbotk]  0.6742299 0.4545859
#> INFO  [19:58:01.487] [bbotk] Evaluating 1 configuration(s)
#> INFO  [19:58:01.491] [bbotk] Result of batch 3:
#> INFO  [19:58:01.493] [bbotk]          x  x_domain     acq_ei .already_evaluated         y
#> INFO  [19:58:01.493] [bbotk]  0.6686506 <list[1]> 0.04358139              FALSE 0.4470936
#> INFO  [19:58:01.821] [bbotk] Evaluating 1 configuration(s)
#> INFO  [19:58:01.826] [bbotk] Result of batch 4:
#> INFO  [19:58:01.828] [bbotk]          x  x_domain    acq_ei .already_evaluated         y
#> INFO  [19:58:01.828] [bbotk]  0.3883431 <list[1]> 0.1264502              FALSE 0.1508104
#> INFO  [19:58:02.111] [bbotk] Evaluating 1 configuration(s)
#> INFO  [19:58:02.117] [bbotk] Result of batch 5:
#> INFO  [19:58:02.120] [bbotk]           x  x_domain    acq_ei .already_evaluated         y
#> INFO  [19:58:02.120] [bbotk]  -0.5325387 <list[1]> 0.2231668              FALSE 0.2835975
#> INFO  [19:58:02.406] [bbotk] Evaluating 1 configuration(s)
#> INFO  [19:58:02.410] [bbotk] Result of batch 6:
#> INFO  [19:58:02.412] [bbotk]           x  x_domain    acq_ei .already_evaluated           y
#> INFO  [19:58:02.412] [bbotk]  0.04818857 <list[1]> 0.1256542              FALSE 0.002322138
#> INFO  [19:58:02.678] [bbotk] Evaluating 1 configuration(s)
#> INFO  [19:58:02.681] [bbotk] Result of batch 7:
#> INFO  [19:58:02.684] [bbotk]            x  x_domain      acq_ei .already_evaluated            y
#> INFO  [19:58:02.684] [bbotk]  -0.01951836 <list[1]> 0.002040261              FALSE 0.0003809664
#> INFO  [19:58:02.732] [bbotk] Finished optimizing after 10 evaluation(s)
#> INFO  [19:58:02.733] [bbotk] Result:
#> INFO  [19:58:02.734] [bbotk]            x  x_domain            y
#> INFO  [19:58:02.734] [bbotk]        <num>    <list>        <num>
#> INFO  [19:58:02.734] [bbotk]  -0.01951836 <list[1]> 0.0003809664
#>              x  x_domain            y
#>          <num>    <list>        <num>
#> 1: -0.01951836 <list[1]> 0.0003809664The log tells us that an error happened and was caught:
"surrogate_update_error / mbo_error / error / condition".
We also see in the archive that the first candidate after
the initial design (the fifth point) was not proposed based on
optimizing the acquisition function (because the "acq_ei"
column is NA here):
instance$archive$data
#>               x            y  x_domain           timestamp batch_nr      acq_ei
#>           <num>        <num>    <list>              <POSc>    <int>       <num>
#>  1:  1.00000000 1.0000000000 <list[1]> 2024-03-01 19:58:00        1          NA
#>  2:  1.00000000 1.0000000000 <list[1]> 2024-03-01 19:58:00        1          NA
#>  3:  1.00000000 1.0000000000 <list[1]> 2024-03-01 19:58:00        1          NA
#>  4:  1.00000000 1.0000000000 <list[1]> 2024-03-01 19:58:00        1          NA
#>  5:  0.67422986 0.4545859064 <list[1]> 2024-03-01 19:58:01        2          NA
#>  6:  0.66865060 0.4470936288 <list[1]> 2024-03-01 19:58:01        3 0.043581388
#>  7:  0.38834314 0.1508103944 <list[1]> 2024-03-01 19:58:01        4 0.126450229
#>  8: -0.53253872 0.2835974922 <list[1]> 2024-03-01 19:58:02        5 0.223166768
#>  9:  0.04818857 0.0023221379 <list[1]> 2024-03-01 19:58:02        6 0.125654216
#> 10: -0.01951836 0.0003809664 <list[1]> 2024-03-01 19:58:02        7 0.002040261
#>     .already_evaluated
#>                 <lgcl>
#>  1:                 NA
#>  2:                 NA
#>  3:                 NA
#>  4:                 NA
#>  5:                 NA
#>  6:              FALSE
#>  7:              FALSE
#>  8:              FALSE
#>  9:              FALSE
#> 10:              FALSENevertheless, due to the safety net, the BO loop eventually worked just fine and did not simply throw an error.
If we set catch_errors = FALSE within the surrogate, we
see that the error was indeed caused by the surrogate:
instance$archive$clear()
instance$eval_batch(initial_design)
#> INFO  [19:58:02.756] [bbotk] Evaluating 4 configuration(s)
#> INFO  [19:58:02.761] [bbotk] Result of batch 1:
#> INFO  [19:58:02.763] [bbotk]  x y
#> INFO  [19:58:02.763] [bbotk]  1 1
#> INFO  [19:58:02.763] [bbotk]  1 1
#> INFO  [19:58:02.763] [bbotk]  1 1
#> INFO  [19:58:02.763] [bbotk]  1 1
optimizer$surrogate$param_set$values$catch_errors = FALSE
optimizer$optimize(instance)
#> INFO  [19:58:02.767] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> Error in if (varinit.vario <= 1e-20) varinit.vario <- 1/2 * varinit.vario.total: missing value where TRUE/FALSE neededIn case of the error belonging to the
acq_optimizer_error class, it is helpful to increase the
logging level of the acquisition function optimizer (e.g.,
acq_optimizer$param_set$values$logging_level = "info") and
also set
acq_optimizer$param_set$values$catch_errors = FALSE. This
allows for straightforward debugging.
To make sure that your BO loop behaved as expected, always inspect
the log of the optimization process and inspect the archive
and check whether the acquisition function column is populated as
expected.
Writing a custom loop_function is straightforward.
Any loop_function must be an object of the S3 class loop_function
(simply a standard R function with some requirements regarding its
arguments and attributes). Arguments of the function must include
instance, surrogate,
acq_function, and acq_optimizer and attributes
must include id (id of the loop function),
label (brief description), instance
(“single-crit” and or “multi_crit”), and man (link to the
manual page).
Technically, any loop_function therefore looks like the
following:
bayesopt_custom = function(instance, surrogate, acq_function, acq_optimizer) {
  # typically some assertions
  # initial design handling
  # actual loop function
}
class(bayesopt_custom) = "loop_function"
attr(bayesopt_custom, "id") = "bayesopt_custom"
attr(bayesopt_custom, "label") = "My custom BO loop"
attr(bayesopt_custom, "instance") = "single-crit"
attr(bayesopt_custom, "man") = ""  # no man page
# if you want to add it to the dictionary: mlr_loop_functions$add("bayesopt_custom", bayesopt_custom)bayesopt_custom
#> Loop function: bayesopt_custom
#> * Description: My custom BO loop
#> * Supported Instance: single-critFor some inspiration on how to write actual meaningful
loop_functions, see, e.g., bayesopt_ego.
In this final section, some standard examples are provided.
objective_function = function(xs) {
  list(y = 418.9829 * 2 - (sum(unlist(xs) * sin(sqrt(abs(unlist(xs)))))))
}
domain = ps(x1 = p_dbl(lower = -500, upper = 500),
  x2 = p_dbl(lower = -500, upper = 500))
codomain = ps(y = p_dbl(tags = "minimize"))
objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)
instance = OptimInstanceSingleCrit$new(
  objective = objective,
  search_space = domain,
  terminator = trm("evals", n_evals = 60))
# Gaussian Process, EI, DIRECT
surrogate = srlrn(default_gp())
acq_function = acqf("ei")
acq_optimizer = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
  terminator = trm("stagnation", threshold = 1e-8))
optimizer = opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
set.seed(2906)
optimizer$optimize(instance)library(ggplot2)
ggplot(aes(x = batch_nr, y = cummin(y)), data = instance$archive$data) +
  geom_point() +
  geom_step() +
  labs(x = "Batch Nr.", y = "Best y") +
  theme_minimal()xdt = generate_design_grid(instance$search_space, resolution = 101)$data
ydt = objective$eval_dt(xdt)
ggplot(aes(x = x1, y = x2, z = y), data = cbind(xdt, ydt)) +
  geom_contour_filled() +
  geom_point(aes(color = batch_nr), size = 2, data = instance$archive$data) +
  scale_color_gradient(low = "lightgrey", high = "red") +
  theme_minimal()objective_function = function(xs) {
  list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}
domain = ps(x = p_dbl(lower = -10, upper = 10))
codomain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))
objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)
instance = OptimInstanceMultiCrit$new(
  objective = objective,
  search_space = domain,
  terminator = trm("evals", n_evals = 30))
# Gaussian Process, EI, DIRECT
surrogate = srlrn(default_gp())
acq_function = acqf("ei")
acq_optimizer = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
  terminator = trm("stagnation", threshold = 1e-8))
optimizer = opt("mbo",
  loop_function = bayesopt_parego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
set.seed(2906)
optimizer$optimize(instance)library(emoa)
library(mlr3misc)
library(data.table)
anytime_hypervolume = map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
  pareto = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
  dhv = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
  data.table(batch_nr = bnr, dhv = dhv)
})
ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
  geom_point() +
  geom_step(direction = "vh") +
  labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
  theme_minimal()objective_function = function(xs) {
  list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}
domain = ps(x = p_dbl(lower = -10, upper = 10))
codomain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))
objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)
instance = OptimInstanceMultiCrit$new(
  objective = objective,
  search_space = domain,
  terminator = trm("evals", n_evals = 30))
# Gaussian Processes, SMS-EGO, DIRECT
learner_y1 = default_gp()
learner_y2 = learner_y1$clone(deep = TRUE)
surrogate = srlrn(list(learner_y1, learner_y2))
acq_function = acqf("smsego")
acq_optimizer = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
  terminator = trm("stagnation", threshold = 1e-8))
optimizer = opt("mbo",
  loop_function = bayesopt_smsego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
set.seed(2906)
optimizer$optimize(instance)anytime_hypervolume = map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
  pareto = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
  dhv = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
  data.table(batch_nr = bnr, dhv = dhv)
})
ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
  geom_point() +
  geom_step(direction = "vh") +
  labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
  theme_minimal()library(mlr3)
task = tsk("wine")
learner = lrn("classif.rpart",
  cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
  maxdepth = to_tune(lower = 1, upper = 10),
  minbucket = to_tune(lower = 1, upper = 10),
  minsplit = to_tune(lower = 1, upper = 10))
resampling = rsmp("cv", folds = 3)
measure = msr("classif.acc")
instance = TuningInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measure = measure,
  terminator = trm("evals", n_evals = 30))
# Gaussian Process, EI, FocusSearch
surrogate = srlrn(default_gp(noisy = TRUE))
acq_function = acqf("ei")
acq_optimizer = acqo(opt("focus_search", n_points = 100L, maxit = 9),
  terminator = trm("evals", n_evals = 3000))
tuner = tnr("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
set.seed(2906)
tuner$optimize(instance)
instance$resulttask = tsk("wine")
learner = lrn("classif.rpart",
  cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
  maxdepth = to_tune(lower = 1, upper = 10),
  minbucket = to_tune(lower = 1, upper = 10),
  minsplit = to_tune(lower = 1, upper = 10))
resampling = rsmp("cv", folds = 3)
measures = msrs(c("classif.acc", "selected_features"))
instance = TuningInstanceMultiCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measures = measures,
  terminator = trm("evals", n_evals = 30),
  store_models = TRUE) # required due to selected features
# Gaussian Process, EI, FocusSearch
surrogate = srlrn(default_gp(noisy = TRUE))
acq_function = acqf("ei")
acq_optimizer = acqo(opt("focus_search", n_points = 100L, maxit = 9),
  terminator = trm("evals", n_evals = 3000))
tuner = tnr("mbo",
  loop_function = bayesopt_parego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
set.seed(2906)
tuner$optimize(instance)
instance$result