rcv_d: Step 2: Generating fake data for parameter and model recovery
In binaryRL: Reinforcement Learning Tools for Two-Alternative Forced Choice Tasks

rcv_d

R Documentation

Step 2: Generating fake data for parameter and model recovery

Description

This function is designed for model and parameter recovery of user-created (black-box) models, provided they conform to the specified interface. (demo: TD, RSTD, Utility ). The process involves generating synthetic datasets. First, parameters are randomly sampled within a defined range. These parameters are then used to simulate artificial datasets.

Subsequently, all candidate models are used to fit these simulated datasets. Model recoverability is assessed if a synthetic dataset generated by Model A is consistently best fitted by Model A itself.

Furthermore, the function allows users to evaluate parameter recoverability. If, for instance, a synthetic dataset generated by Model A was based on parameters like 0.3 and 0.7, and Model A then recovers optimal parameters close to 0.3 and 0.7 from this data, it indicates that the parameters of Model A are recoverable.

The function provides several optimization algorithms:

1. L-BFGS-B (from stats::optim)
2. Simulated Annealing (GenSA::GenSA)
3. Genetic Algorithm (GA::ga)
4. Differential Evolution (DEoptim::DEoptim)
5. Particle Swarm Optimization (pso::psoptim)
6. Bayesian Optimization (mlrMBO::mbo)
7. Covariance Matrix Adapting Evolutionary Strategy (cmaes::cma_es)
8. Nonlinear Optimization (nloptr::nloptr)

For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/

Usage

rcv_d(
  policy = "off",
  data,
  id = NULL,
  n_trials = NULL,
  funcs = NULL,
  model_names = c("TD", "RSTD", "Utility"),
  simulate_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
  simulate_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
  simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
  fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
  fit_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
  fit_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 10)),
  iteration_s = 10,
  iteration_f = 10,
  initial_params = NA,
  initial_size = 50,
  seed = 1,
  nc = 1,
  algorithm
)

Arguments

`policy`	[string] Specifies the learning policy to be used. This determines how the model updates action values based on observed or simulated choices. It can be either `"off"` or `"on"`. Off-Policy (Q-learning): This is the most common approach for modeling reinforcement learning in Two-Alternative Forced Choice (TAFC) tasks. In this mode, the model's goal is to learn the underlying value of each option by observing the human participant's behavior. It achieves this by consistently updating the value of the option that the human actually chose. The focus is on understanding the value representation that likely drove the participant's decisions. Off-Policy (SARSA): In this mode, the target policy and the behavior policy are identical. The model first computes the selection probability for each option based on their current values. Critically, it then uses these probabilities to sample its own action. The value update is then performed on the action that the model itself selected. This approach focuses more on directly mimicking the stochastic choice patterns of the agent, rather than just learning the underlying values from a fixed sequence of actions. default: `policy = "off"`
`data`	[data.frame] This data should include the following mandatory columns: `sub` "Subject" `time_line` "Block" "Trial" `L_choice` "L_choice" `R_choice` "R_choice" `L_reward` "L_reward" `R_reward` "R_reward" `sub_choose` "Sub_Choose"
`id`	[CharacterVector] Specifies which subject's data to use. In parameter and model recovery analyses, the specific subject ID is often irrelevant. Although the experimental trial order might have some randomness for each subject, the sequence of reward feedback is typically pseudo-random. The default value for this argument is `NULL`. When `id = NULL`, the program automatically detects existing subject IDs within the dataset. It then randomly selects one subject as a sample, and the parameter and model recovery procedures are performed based on this selected subject's data. default: `id = NULL`
`n_trials`	[integer] Represents the total number of trials a single subject experienced in the experiment. If this parameter is kept at its default value of `NULL`, the program will automatically detect how many trials a subject experienced from the provided data. This information is primarily used for calculating model fit statistics such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). default: `n_trials = NULL`
`funcs`	[CharacterVector] A character vector containing the names of all user-defined functions required for the computation. When parallel computation is enabled (i.e., `nc > 1`), user-defined models and their custom functions might not be automatically accessible within the parallel environment. Therefore, if you have created your own reinforcement learning model that modifies the package's default six default functions (default functions: `util_func = func_gamma`, `rate_func = func_eta`, `expl_func = func_epsilon`, `bias_func = func_pi`, `prob_func = func_tau`, `loss_func = func_logl` ), you must explicitly provide the names of your custom functions as a vector here.
`model_names`	[List] The names of fit modals e.g. `model_names = c("TD", "RSTD", "Utility")`
`simulate_models`	[List] A collection of functions used to generate simulated data.
`simulate_lower`	[List] The lower bounds for simulate models e.g. `simulate_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0))`
`simulate_upper`	[List] The upper bounds for simulate models e.g. `simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1))`
`fit_models`	[List] A collection of functions applied to fit models to the data. e.g. `fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility)`
`fit_lower`	[List] The lower bounds for model fit models e.g. `fit_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0))`
`fit_upper`	[List] The upper bounds for model fit models e.g. `fit_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 10))`
`iteration_s`	[integer] This parameter determines how many simulated datasets are created for subsequent model and parameter recovery analyses. default: `iteration_s = 10`
`iteration_f`	[integer] The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time. default: `iteration_f = 10`
`initial_params`	[NumericVector] Initial values for the free parameters that the optimization algorithm will search from. These are primarily relevant when using algorithms that require an explicit starting point, such as `L-BFGS-B`. If not specified, the function will automatically generate initial values close to zero. default: `initial_params = NA`.
`initial_size`	[integer] This parameter corresponds to the population size in genetic algorithms (`GA`). It specifies the number of initial candidate solutions that the algorithm starts with for its evolutionary search. This parameter is only required for optimization algorithms that operate on a population, such as `GA` or `DEoptim`. default: `initial_size = 50`.
`seed`	[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run. default: `seed = 123`
`nc`	[integer] Number of cores to use for parallel processing. Since fitting optimal parameters for each subject is an independent task, parallel computation can significantly speed up the fitting process: nc = 1: The fitting proceeds sequentially. Parameters for one subject are fitted completely before moving to the next subject. nc > 1: The fitting is performed in parallel across subjects. For example, if `nc = 4`, the algorithm will simultaneously fit data for four subjects. Once these are complete, it will proceed to fit the next batch of subjects (e.g., subjects 5-8), and so on, until all subjects are processed. `default: nc = 1`
`algorithm`	[string] Choose an algorithm package from `L-BFGS-B`, `GenSA`,`GA`,`DEoptim`,`PSO`, `Bayesian`, `CMA-ES`. In addition, any algorithm from the `nloptr` package is also supported. If your chosen `nloptr` algorithm requires a local search, you need to input a character vector. The first element represents the algorithm used for global search, and the second element represents the algorithm used for local search.

Value

A list where each element is a data.frame. Each data.frame within this list records the results of fitting synthetic data (generated by Model A) with Model B.

Note

While both fit_p and rcv_d utilize the same underlying optimize_para function to find optimal parameters, they play distinct and sequential roles in the modeling pipeline.

The key differences are as follows:

Purpose and Data Source: rcv_d should always be performed before fit_p. Its primary role is to validate a model's stability by fitting it to synthetic data generated by the model itself. This process, known as parameter recovery, ensures the model is well-behaved. In contrast, fit_p is used in the subsequent stage to fit the validated model to real experimental data.
Estimation Method: rcv_d does not include an estimate argument. This is because the synthetic data is generated from known "true" parameters, which are drawn from pre-defined distributions (typically uniform for most parameters and exponential for the inverse temperature). Since the ground truth is known, a hierarchical estimation (MAP) is not applicable. The fit_p function, however, requires this argument to handle real data where the true parameters are unknown.
Policy Setting: In fit_p, the policy setting has different effects: "on-policy" is better for learning choice patterns, while "off-policy" yields more accurate parameter estimates. For rcv_d, the process defaults to an "off-policy" approach because its main objectives are to verify if the true parameters can be accurately recovered and to assess whether competing models are distinguishable, tasks for which off-policy estimation is more suitable.

Examples

## Not run: 
recovery <- binaryRL::rcv_d(
  data = binaryRL::Mason_2024_G2,
#+-----------------------------------------------------------------------------+#
#|----------------------------- black-box function ----------------------------|#
  #funcs = c("your_funcs"),
  model_names = c("TD", "RSTD", "Utility"),
  simulate_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
  simulate_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
  simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
  fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
  fit_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
  fit_upper = list(c(1, 5), c(1, 1, 5), c(1, 1, 5)),
#|----------------------------- interation number -----------------------------|#
  iteration_s = 100,
  iteration_f = 100,
#|-------------------------------- algorithms ---------------------------------|#
  nc = 1,                 # <nc > 1>: parallel computation across subjects
  # Base R Optimization
  algorithm = "L-BFGS-B"  # Gradient-Based (stats)
#|-----------------------------------------------------------------------------|#
  # Specialized External Optimization
  #algorithm = "GenSA"    # Simulated Annealing (GenSA)
  #algorithm = "GA"       # Genetic Algorithm (GA)
  #algorithm = "DEoptim"  # Differential Evolution (DEoptim)
  #algorithm = "PSO"      # Particle Swarm Optimization (pso)
  #algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
  #algorithm = "CMA-ES"   # Covariance Matrix Adapting (cmaes)
#|-----------------------------------------------------------------------------|#
  # Optimization Library (nloptr)
  #algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
#|-------------------------------- algorithms ---------------------------------|#
#+#############################################################################+#
)

result <- dplyr::bind_rows(recovery) %>%
  dplyr::select(simulate_model, fit_model, iteration, everything())

# Ensure the output directory exists
if (!dir.exists("../OUTPUT")) {
  dir.create("../OUTPUT", recursive = TRUE)
}

write.csv(result, file = "../OUTPUT/result_recovery.csv", row.names = FALSE)

## End(Not run)

binaryRL documentation built on Aug. 21, 2025, 6:01 p.m.