| control | R Documentation |
The control argument is a mandatory list used to customize and manage
various aspects of the iterative process, covering everything from
optimization settings to model configuration.
control [List]
Different estimation methods require different slots; setting slots that a given method does not use is harmless and does not affect execution.
seed [int]
The random seed controls the reproducibility of each iteration.
Specifically, it determines how the optimization algorithm generates
"random" candidate parameters when searching for the optimal values.
Fixing the seed ensures that the optimal parameters found are the
same in every run. The default value is 123.
core [int]
Since the parameter fitting process for individual subjects is
independent, this procedure can be accelerated using CPU
parallelism. This argument specifies the number of subjects to
be fitted simultaneously (the number of parallel threads),
with the default set to 1. If the user wishes to speed up the
fitting, they can increase the number of cores appropriately based
on their system specifications.
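As a minimal illustration of how these general slots are set, the control argument is simply a named list; slots you do not list keep their defaults (the surrounding fitting call is omitted here):

```r
# Override only the slots you need; the rest fall back to defaults.
control <- list(
  seed = 2024,  # fix the RNG so repeated runs find the same optimum
  core = 4      # fit four subjects in parallel
)
```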
sample [int]
This parameter denotes the number of simulated datasets generated during the parameter recovery process.
dash [Numeric]
To prevent the optimal parameter estimates from converging to boundary values when the number of iterations is insufficient, a small value is added to the lower bound and subtracted from the upper bound.
For instance, if the input parameter bounds are (0, 1),
the actual bounds used for fitting will be [0.00001, 0.99999].
This design prevents the occurrence of Infinite values.
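The bound adjustment amounts to the following; clamp_bounds() is a hypothetical helper shown for illustration, not part of the package:

```r
# Shrink the search space by `dash` on each side to keep estimates
# away from the exact boundaries (hypothetical helper).
clamp_bounds <- function(lower, upper, dash = 1e-5) {
  list(lower = lower + dash, upper = upper - dash)
}

b <- clamp_bounds(lower = 0, upper = 1)
b$lower  # 0.00001
b$upper  # 0.99999
```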
algorithm [Character]
The package supports the following eight optimization packages for finding the optimal values of the model's free parameters.
L-BFGS-B (stats::optim)
Simulated Annealing (GenSA::GenSA)
Genetic Algorithm (GA::ga)
Differential Evolution (DEoptim::DEoptim)
Bayesian Optimization (mlrMBO::mbo)
Particle Swarm Optimization (pso::psoptim)
Covariance Matrix Adapting Evolutionary Strategy (cmaes::cma_es)
Nonlinear Optimization (nloptr::nloptr)
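To show the optimizer's role concretely, here is a toy objective minimized with the first option, L-BFGS-B via stats::optim(); a real run would minimize the model's negative log-likelihood instead of this stand-in function:

```r
# Toy objective standing in for a negative log-likelihood.
neg_ll <- function(p) (p[1] - 0.3)^2 + (p[2] - 0.7)^2

fit <- stats::optim(
  par    = c(0.01, 0.01),        # initial values (cf. the pars slot)
  fn     = neg_ll,
  method = "L-BFGS-B",
  lower  = c(1e-5, 1e-5),        # bounds shrunk by dash
  upper  = c(1 - 1e-5, 1 - 1e-5)
)
round(fit$par, 3)  # 0.3 0.7
```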
pars [NumericVector]
Some algorithms require the specification of initial iteration
values. If this value is left as the default NA, the iteration
will commence from each parameter's lower bound plus 0.01.
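The default initialization described above amounts to the following (default_start() is a hypothetical helper, not package code):

```r
# When pars is NA, start each parameter at its lower bound + 0.01.
default_start <- function(lower, pars = NA) {
  if (all(is.na(pars))) lower + 0.01 else pars
}

default_start(lower = c(0, -1))  # 0.01 -0.99
```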
size [int]
Some algorithms, such as Genetic Algorithms (GA), require the
specification of initial population values. For the definition of
the population, users may refer to the relevant documentation on
evolutionary algorithms. The default value is consistent with the
standard default in GA, which is 50.
iter [int]
This parameter defines the maximum number of iterations. The
iterative process will stop when this value is reached. The default
value is 10. It is recommended that you set this value to at
least 100 for formal fitting procedures.
For EM-MAP, you can instead input a numeric vector of length 2. The first element specifies the number of iterations per algorithm call, and the second element determines the total number of EM-MAP executions across all participants. In other words, if the first element matches your MLE settings, the second element represents the computational fold-change of MAP relative to MLE.
diff [double]
In the Expectation-Maximization with Maximum A Posteriori algorithm
(EM-MAP), after estimating the optimal parameters for all subjects
in each iteration, the posterior distribution of each free
parameter is calculated, followed by continuous refinement of the
prior distribution. The process stops when the change in the
log-posterior value is less than the diff, which defaults
to 0.001.
patience [int]
Given that the Expectation-Maximization with Maximum A Posteriori (EM-MAP) process can be time-consuming and often encounters non-convergence issues (for instance, when the log-posterior oscillates around a certain value), the patience parameter is used to manage early termination. Specifically, patience is incremented by 1 when the current result is better than the best previous result, and decremented by 1 when it is worse. The iteration is terminated prematurely when the patience count reaches zero.
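The counting rule can be sketched as below; this is an illustrative reimplementation of the rule just described, not the package's internal code:

```r
# Early stopping: patience rises when the log-posterior improves on the
# best value so far and falls otherwise; hitting zero stops the loop.
stop_iteration <- function(log_post_trace, patience = 10) {
  best <- -Inf
  for (i in seq_along(log_post_trace)) {
    if (log_post_trace[i] > best) {
      best <- log_post_trace[i]
      patience <- patience + 1
    } else {
      patience <- patience - 1
    }
    if (patience <= 0) return(i)   # stopped early at iteration i
  }
  length(log_post_trace)           # ran to completion
}

stop_iteration(c(1, 0.5, 0.4, 0.3), patience = 2)  # 4
```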
train [int]
This parameter specifies the number of simulated datasets used when training the Approximate Bayesian Computation (ABC) or Recurrent Neural Network (RNN) models.
scope [Character]
This parameter can be set to "individual" or "shared".
The former trains a separate Approximate Bayesian Computation (ABC)
or Recurrent Neural Network (RNN) model for each dataset, while the
latter trains a single model that is shared across all datasets. In
the context of the rcv_d function, the default setting is
"shared", whereas in fit_p, the default is
"individual".
tol [double]
This parameter, short for tolerance, controls how strictly the
Approximate Bayesian Computation (ABC) algorithm selects good
simulated data by setting the acceptance rate. For example, setting
tol = 0.1 (the default) means only the 10 percent of
simulated data that is closest to your actual data is used.
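Rejection-style acceptance at rate tol can be sketched as follows (hypothetical helper, not the package internals):

```r
# Keep only the fraction `tol` of simulations closest to the data.
abc_accept <- function(distances, tol = 0.1) {
  n_keep <- ceiling(length(distances) * tol)
  order(distances)[seq_len(n_keep)]  # indices of accepted simulations
}

d <- c(0.9, 0.1, 0.5, 0.05, 0.7, 0.3, 0.2, 0.8, 0.6, 0.4)
abc_accept(d, tol = 0.2)  # 4 2  (the two smallest distances)
```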
reduction [Character]
Specifies the dimension reduction method for summary statistics. In ABC, high-dimensional summary statistics often lead to the "curse of dimensionality," where the algorithm struggles to find a solution or suffers from extremely slow convergence. Reducing the dimensions (compressing the data) helps retain the "fingerprint" of the original data while removing noise, ensuring the program can efficiently identify the underlying parameters.
NULL: No compression is applied. This is suitable for
smaller datasets where the total number of features (e.g.,
blocks * responses) is relatively low (typically less than 200).
"PLS" (Partial Least Squares): A supervised reduction
method that compresses the summary statistics into a space with
dimensions equal to the number users set (as default, it is equal
to the number of blocks).
"PCA" (Principal Component Analysis): An unsupervised
reduction method that compresses the information into a space with
dimensions equal to users set (as default, it is equal to the
number of blocks).
ncomp [int]
The number of components determines how much information is retained after compression. By default, this value is equal to the number of blocks. Since the summary statistics consist of the selection ratios for each action within each block, an excessive number of blocks or available actions can lead to high-dimensional information, making it difficult for the ABC algorithm to converge on a solution. In such cases, PLS or PCA can be selected for dimensionality reduction.
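The reduction step can be illustrated with base R's prcomp(); the matrix here is random stand-in data, not real summary statistics:

```r
set.seed(123)
# 100 simulated datasets x 20 summary-statistic features (stand-in data).
stats_mat <- matrix(rnorm(100 * 20), nrow = 100)
ncomp <- 5  # e.g., the number of blocks

pca <- stats::prcomp(stats_mat, center = TRUE, scale. = TRUE)
reduced <- pca$x[, seq_len(ncomp)]  # compressed to 100 x 5
dim(reduced)  # 100 5
```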
metric [Character]
Specifies the statistical metric used to determine the best
estimated parameter from the posterior distribution. By default,
this is set to "mode", which uses the mode of the accepted
simulated parameters as the best estimate. Users can also change
this to "mean" or "median" to use the average or
the median value, respectively.
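The three summaries can be sketched on an accepted posterior sample; point_estimate() is a hypothetical helper, with the mode of a continuous sample taken from a kernel density estimate:

```r
point_estimate <- function(x, metric = c("mode", "mean", "median")) {
  metric <- match.arg(metric)
  switch(metric,
    mean   = mean(x),
    median = stats::median(x),
    mode   = {                 # mode of a continuous sample via its
      d <- stats::density(x)   # kernel density estimate
      d$x[which.max(d$y)]
    }
  )
}

post <- c(0.2, 0.25, 0.3, 0.3, 0.35, 0.9)
point_estimate(post, "median")  # 0.3
```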
layer [Character]
Recurrent Neural Networks (RNNs) are neural networks for which the sequence order of the input is meaningful. Currently, the package supports the following recurrent layer types:
"RNN" (Simple Recurrent Neural Network)
"BiRNN" (Bidirectional Simple RNN)
"GRU" (Gated Recurrent Unit)
"BiGRU" (Bidirectional GRU)
"LSTM" (Long Short-Term Memory)
"BiLSTM" (Bidirectional LSTM)
loss [Character]
Specifies the loss function used to train the Recurrent Neural Network (RNN). The choice of loss function depends on the nature of the prediction task and the desired properties of the estimated parameters.
"MSE" (Mean Squared Error):
A common loss function that measures the average squared
difference between predicted and actual values. It is
sensitive to outliers.
"MAE" (Mean Absolute Error):
Measures the average absolute difference between predicted
and actual values. It is more robust to outliers than MSE.
"HBR" (Huber Loss):
A combination of MSE and MAE, acting as MSE for small
errors and MAE for large errors. It is less sensitive to
outliers than MSE while being smoother than MAE.
"NLL" (Negative Log-Likelihood):
Used for probabilistic predictions where the model outputs
both the mean and variance of a Gaussian distribution,
aiming to maximize the likelihood of the observed data.
"QRL" (Quantile Regression Loss):
Allows the model to predict specific quantiles
(e.g., 0.05, 0.50, 0.95) of the target distribution,
rather than just the mean.
"MDN" (Mixture Density Network):
Enables the model to predict a mixture of probability
distributions, useful for capturing complex, multimodal,
or uncertain posterior distributions of parameters.
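As one concrete case, the Huber behavior described under "HBR" can be written out directly (illustrative helper, not the package's training code):

```r
# Quadratic (MSE-like) for |error| <= delta, linear (MAE-like) beyond.
huber <- function(y, y_hat, delta = 1) {
  e <- abs(y - y_hat)
  ifelse(e <= delta, 0.5 * e^2, delta * (e - 0.5 * delta))
}

huber(y = 0, y_hat = 0.5)  # 0.125 (small error: MSE regime)
huber(y = 0, y_hat = 3)    # 2.5   (large error: MAE regime)
```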
info [CharacterVector]
The Recurrent Neural Network (RNN) needs to find the mapping
relationship between the dataset and the free parameters. To
minimize the time required for this process, we should only include
useful information in the input dataset. The info parameter
accepts a character vector naming the specific columns you deem
necessary for training the Recurrent Neural Network (RNN) model.
By default, only the colnames$object and colnames$action
columns are included as input.
units [int]
The number of neurons (or units) in the Recurrent Layer (RNN, GRU or LSTM). Conceptually, this parameter represents the memory capacity and complexity of the network; it dictates how much information about the sequential trials the model can store and process.
dropout [double]
Dropout is a powerful regularization technique used to prevent overfitting in RNNs. During each training iteration, a predefined percentage of neurons (units) are randomly "dropped" or deactivated by setting their activations to zero.
L [Character]
This parameter determines the type of regularization applied to the
log-likelihood to penalize model complexity, which helps prevent
overfitting. The default is NA_character_, meaning no
regularization is applied. Supported values include:
L = 1: L1 regularization (Lasso), which adds a
penalty proportional to the sum of the absolute values of
the free parameters.
L = 2: L2 regularization (Ridge), which adds a
penalty proportional to the sum of the squared values of
the free parameters.
L = 12: Elastic Net regularization, which applies
both L1 and L2 penalties simultaneously.
penalty [double]
This parameter specifies the strength of the regularization, acting
as a multiplier for the penalty term defined by L. A larger
value imposes a stronger penalty on the free parameters. The
default value is 1e-5.
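Together, L and penalty amount to adding a weighted penalty term to the objective, roughly as follows (hypothetical helper, not package code):

```r
penalized_nll <- function(nll, pars, L = NA_character_, penalty = 1e-5) {
  if (is.na(L)) return(nll)          # default: no regularization
  switch(as.character(L),
    "1"  = nll + penalty * sum(abs(pars)),                  # Lasso
    "2"  = nll + penalty * sum(pars^2),                     # Ridge
    "12" = nll + penalty * (sum(abs(pars)) + sum(pars^2))   # Elastic Net
  )
}

penalized_nll(nll = 10, pars = c(-2, 3), L = 1, penalty = 0.1)  # 10.5
```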
batch_size [int]
The number of samples processed before the model's parameters are updated. Think of this as the size of a study group; the model reviews this batch of data before adjusting its internal weights. A larger batch size speeds up calculation but may lead to less optimal convergence.
epochs [int]
The number of times the learning algorithm will work through the entire training dataset. This is equivalent to running through the "textbook" multiple times. Each epoch means the model has seen every training sample once. More epochs allow for more training but increase the risk of overfitting.
keras3 [logical]
The version of Keras used for model construction. keras3 = FALSE
(the default) uses the keras package, while keras3 = TRUE
uses keras3 and enables multi-backend support via the
backend parameter.
backend [Character]
The deep learning framework that serves as the computation engine
when keras3 = TRUE. Options include "tensorflow",
"jax", and "torch". This parameter is ignored
when keras3 = FALSE, in which case the "tensorflow"
backend is used.
check [logical]
A logical value indicating whether to perform environment
verification. The default is TRUE. If set to FALSE,
the function will skip the interactive check regarding whether the
user has properly loaded the tensorflow environment.
# default values
control = list(
  # General
  seed = 123,
  core = 1,
  sample = 100,
  dash = 1e-5,
  # LBI
  algorithm = "NLOPT_GN_MLSL",
  pars = NA,
  size = 50,
  # MLE
  iter = 10,
  # MAP
  diff = 0.001,
  patience = 10,
  # SBI
  sample = 100,
  train = 1000,
  scope = "individual",
  # ABC
  tol = 0.1,
  reduction = "PCA",
  ncomp = NULL,
  metric = "mode",
  # RNN
  layer = "GRU",
  loss = "MSE",
  info = c(colnames$object, colnames$action),
  units = 128,
  dropout = 0,
  L = NA_character_,
  penalty = 1e-5,
  batch_size = 10,
  epochs = 100,
  keras3 = FALSE,
  backend = "tensorflow",
  check = TRUE
)