This vignette demonstrates the simulation capabilities included in the cramR package. The simulation code is primarily intended for reproducing experimental results from the associated theoretical papers and for validating the performance of the Cram method under controlled data-generating processes. While not intended for direct use in practical applications, these simulations allow users to benchmark and understand the empirical behavior of the method in synthetic environments.
cram_bandit_sim()
The cram_bandit_sim() function runs on-policy simulations for contextual bandit algorithms using the Cram method. It evaluates statistical properties of the policy value estimates, such as their prediction error and the empirical coverage of their confidence intervals.
This is useful for benchmarking bandit policies under controlled, simulated environments.
You need to provide:
bandit: A contextual bandit environment object that generates contexts (feature vectors) and rewards for each arm. Example: ContextualLinearBandit, or any object following the contextual package interface.
policy: A policy object that takes in a context and selects an arm (action) at each timestep. Example: BatchContextualLinTSPolicy, or any compatible contextual package policy.
horizon: An integer specifying the number of timesteps (rounds) per simulation. Each simulation will run for exactly horizon steps.
simulations: An integer specifying the number of independent Monte Carlo simulations to perform. Each simulation will independently reset the environment and policy.
Optional Parameters:
alpha: A numeric value between 0 and 1 specifying the significance level for confidence intervals when calculating empirical coverage. Default: 0.05 (for 95% confidence intervals).
seed: An optional integer to set the random seed for reproducibility. If NULL, no seed is set.
do_parallel: A logical value indicating whether to parallelize the simulations across available CPU cores. Default: FALSE (parallelization disabled).
We recommend keeping do_parallel = FALSE unless necessary, as parallel execution can make it harder for the underlying contextual package to reliably track simulation history. In particular, parallel runs may cause missing or incomplete entries in the recorded history, which are then discarded during analysis.
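As a quick way to see the constraints listed above in one place, the following base R sketch checks a set of simulation settings before they are passed on. The helper check_sim_args() is hypothetical and is not part of cramR; treating alpha as strictly inside (0, 1) is an assumption made here.

```r
# Hypothetical helper (not part of cramR): sanity-check simulation settings
check_sim_args <- function(horizon, simulations, alpha = 0.05,
                           seed = NULL, do_parallel = FALSE) {
  # TRUE for a single whole number, e.g. 500 or 500L
  is_whole <- function(x) {
    is.numeric(x) && length(x) == 1L && !is.na(x) && x == round(x)
  }
  stopifnot(
    "horizon must be a positive whole number" =
      is_whole(horizon) && horizon >= 1,
    "simulations must be a positive whole number" =
      is_whole(simulations) && simulations >= 1,
    # Assumption: the significance level is strictly between 0 and 1
    "alpha must lie strictly between 0 and 1" =
      is.numeric(alpha) && length(alpha) == 1L && alpha > 0 && alpha < 1,
    "seed must be NULL or a single whole number" =
      is.null(seed) || is_whole(seed),
    "do_parallel must be TRUE or FALSE" =
      is.logical(do_parallel) && length(do_parallel) == 1L && !is.na(do_parallel)
  )
  invisible(TRUE)
}

# Settings matching the defaults described above pass silently
check_sim_args(horizon = 500L, simulations = 100L,
               alpha = 0.05, seed = NULL, do_parallel = FALSE)
```

Calling the helper with, say, alpha = 1.5 stops with the corresponding message instead of starting a long simulation run.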
library(cramR)

# Number of time steps
horizon <- 500L
# Number of simulations
simulations <- 100L
# Number of arms
k <- 4
# Number of context features
d <- 3
# Reward parameters: each arm has its own linear outcome model with arm-specific coefficients (betas)
list_betas <- cramR::get_betas(simulations, d, k)
# Define the contextual linear bandit, where sigma is the scale of the noise in the linear outcome model
bandit <- cramR::ContextualLinearBandit$new(k = k, d = d, list_betas = list_betas, sigma = 0.3)
# Define the policy object (choose between Contextual Epsilon Greedy, UCB Disjoint and Thompson Sampling)
policy <- cramR::BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 5)
# policy <- cramR::BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1.0, epsilon = 0.1, batch_size = 1)
# policy <- cramR::BatchContextualLinTSPolicy$new(v = 0.1, batch_size = 1)
# Run the simulation sequentially with 95% confidence intervals
sim <- cram_bandit_sim(horizon, simulations,
                       bandit, policy,
                       alpha = 0.05, do_parallel = FALSE)
#> Simulation horizon: 500
#> Number of simulations: 101
#> Number of batches: 1
#> Starting main loop.
#> Finished main loop.
#> Completed simulation in 0:00:04.500
#> Computing statistics.
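To make the data-generating process in the setup comments more concrete, here is a minimal base R sketch of a single round: each arm has its own linear outcome model, and the observed reward is the linear signal of the chosen arm plus noise of scale sigma. The distributions used (standard normal contexts, uniform coefficients, Gaussian noise) and the variable names are assumptions for illustration; they are not taken from the internals of ContextualLinearBandit or get_betas.

```r
# Illustrative only: the distributional choices below are assumptions,
# not necessarily those used inside cramR::ContextualLinearBandit.
set.seed(1)
k <- 4        # number of arms
d <- 3        # number of context features
sigma <- 0.3  # noise scale in the linear outcome model

# One coefficient vector per arm (one column per arm), drawn at random here
betas <- matrix(runif(d * k, min = -1, max = 1), nrow = d, ncol = k)

# Draw one context and compute the expected reward of every arm
context <- rnorm(d)
expected_reward <- as.vector(crossprod(betas, context))  # length k

# Epsilon-greedy choice on the true expected rewards, for illustration;
# a real policy would act on its current estimates instead
epsilon <- 0.1
arm <- if (runif(1) < epsilon) sample.int(k, 1) else which.max(expected_reward)

# Observed reward: linear signal of the chosen arm plus Gaussian noise
reward <- expected_reward[arm] + rnorm(1, sd = sigma)
```

A full simulation repeats this for horizon rounds, with the policy updating its estimates between rounds (or between batches for the batch policies).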
The output contains:
A data.table with one row per simulation, including:
  estimate: estimated policy value
  variance_est: estimated variance
  estimand: true policy value (computed from held-out context data)
  prediction_error: estimate - estimand
  est_rel_error: relative error on estimate
  variance_prediction_error: relative error on variance
  ci_lower, ci_upper: bounds of the confidence interval
  std_error: standard error
Result tables (raw and interactive).
head(sim$estimates)
#>      sim  estimate variance_est  estimand prediction_error est_rel_error
#>    <int>     <num>        <num>     <num>            <num>         <num>
#> 1:     1 0.5934946  0.007485342 0.5008213      0.092673298    0.18504265
#> 2:     2 0.5738572  0.004488658 0.5472302      0.026626936    0.04865765
#> 3:     3 0.3082747  0.005971414 0.2685309      0.039743854    0.14800479
#> 4:     4 0.4875583  0.001785894 0.5025160     -0.014957724   -0.02976567
#> 5:     5 0.6279068  0.002812175 0.6234795      0.004427278    0.00710092
#> 6:     6 0.6756592  0.001788615 0.6278958      0.047763481    0.07606913
#>    variance_prediction_error  std_error  ci_lower  ci_upper
#>                        <num>      <num>     <num>     <num>
#> 1:                -0.1543576 0.08651787 0.4239227 0.7630665
#> 2:                -0.4929024 0.06699744 0.4425446 0.7051697
#> 3:                -0.3253907 0.07727493 0.1568186 0.4597308
#> 4:                -0.7982420 0.04225984 0.4047305 0.5703860
#> 5:                -0.6822998 0.05302995 0.5239700 0.7318436
#> 6:                -0.7979345 0.04229202 0.5927684 0.7585501
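The printed rows are consistent with a few standard relationships between the columns: the relative error is the prediction error divided by the estimand, the standard error is the square root of the estimated variance, and the interval bounds are the estimate plus or minus a normal quantile times the standard error. These relationships are inferred from the numbers shown above, not from the package source, so the sketch below should be read as a consistency check in base R rather than as the package's implementation. The data frame est and the variable covered are names introduced here.

```r
# First six rows of the printed output, copied by hand
est <- data.frame(
  estimate     = c(0.5934946, 0.5738572, 0.3082747, 0.4875583, 0.6279068, 0.6756592),
  variance_est = c(0.007485342, 0.004488658, 0.005971414, 0.001785894, 0.002812175, 0.001788615),
  estimand     = c(0.5008213, 0.5472302, 0.2685309, 0.5025160, 0.6234795, 0.6278958)
)

alpha <- 0.05
z <- qnorm(1 - alpha / 2)  # two-sided normal critical value

# Recompute the derived columns (relationships assumed, see text above)
est$prediction_error <- est$estimate - est$estimand
est$est_rel_error    <- est$prediction_error / est$estimand
est$std_error        <- sqrt(est$variance_est)
est$ci_lower         <- est$estimate - z * est$std_error
est$ci_upper         <- est$estimate + z * est$std_error

# Empirical coverage: share of intervals that contain the estimand
covered <- est$ci_lower <= est$estimand & est$estimand <= est$ci_upper
mean(covered)
```

On these six rows every interval contains its estimand; across all simulations, this proportion is the empirical coverage to compare against the nominal level 1 - alpha.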
list_betas is updated internally to track the true parameters per simulation.
[...] contextual) even when do_parallel = FALSE.
This simulation builds on:
[...] contextual package)