R/step2_rcv_d.R


#' @title 
#' Step 2: Generating synthetic data for parameter and model recovery
#' 
#' @description
#' This function is designed for model and parameter recovery of user-created
#'  (black-box) models, provided they conform to the specified interface.
#'  (demo:  
#'     \code{\link[binaryRL]{TD}}, 
#'     \code{\link[binaryRL]{RSTD}}, 
#'     \code{\link[binaryRL]{Utility}}
#'  ).
#' The process begins by generating synthetic datasets: parameters are
#'  randomly sampled within a defined range, and these parameters are
#'  then used to simulate artificial choice data.
#'
#' Subsequently, all candidate models are fitted to these simulated
#'  datasets. A model is considered recoverable if a synthetic dataset
#'  generated by Model A is consistently best fitted by Model A itself.
#'
#' Furthermore, the function allows users to evaluate parameter
#'  recoverability. If, for instance, a synthetic dataset generated by
#'  Model A was based on parameters like 0.3 and 0.7, and Model A then
#'  recovers optimal parameters close to 0.3 and 0.7 from this data,
#'  it indicates that the parameters of Model A are recoverable.
#'  
#'  The function provides several optimization algorithms:
#'   \itemize{
#'     \item 1. L-BFGS-B (from \code{stats::optim})
#'     \item 2. Simulated Annealing (\code{GenSA::GenSA})
#'     \item 3. Genetic Algorithm (\code{GA::ga})
#'     \item 4. Differential Evolution (\code{DEoptim::DEoptim})
#'     \item 5. Particle Swarm Optimization (\code{pso::psoptim})
#'     \item 6. Bayesian Optimization (\code{mlrMBO::mbo})
#'     \item 7. Covariance Matrix Adaptation Evolution Strategy (\code{cmaes::cma_es})
#'     \item 8. Nonlinear Optimization (\code{nloptr::nloptr})
#'   }
#' 
#'  For more information, please refer to the homepage of this package:
#'  \url{https://yuki-961004.github.io/binaryRL/}
#'  
#' @note
#' While both \code{fit_p} and \code{rcv_d} utilize the same underlying
#' \code{optimize_para} function to find optimal parameters, they play
#' distinct and sequential roles in the modeling pipeline.
#'
#' The key differences are as follows:
#' \enumerate{
#'   \item{\strong{Purpose and Data Source}: \code{rcv_d} should always be
#'     performed before \code{fit_p}. Its primary role is to validate a
#'     model's stability by fitting it to synthetic data generated by the
#'     model itself. This process, known as parameter recovery, ensures the
#'     model is well-behaved. In contrast, \code{fit_p} is used in the
#'     subsequent stage to fit the validated model to real experimental
#'     data.}
#'
#'   \item{\strong{Estimation Method}: \code{rcv_d} does not include an
#'     \code{estimate} argument. This is because the synthetic data is
#'     generated from known "true" parameters, which are drawn from
#'     pre-defined distributions (typically uniform for most parameters and
#'     exponential for the inverse temperature). Since the ground truth is
#'     known, a hierarchical estimation (MAP) is not applicable. The
#'     \code{fit_p} function, however, requires this argument to handle
#'     real data where the true parameters are unknown. (A sketch of
#'     this sampling scheme follows this list.)}
#'
#'   \item{\strong{Policy Setting}: In \code{fit_p}, the \code{policy}
#'     setting has different effects: "on-policy" is better for learning
#'     choice patterns, while "off-policy" yields more accurate parameter
#'     estimates. For \code{rcv_d}, the process defaults to an "off-policy"
#'     approach because its main objectives are to verify if the true
#'     parameters can be accurately recovered and to assess whether competing
#'     models are distinguishable, tasks for which off-policy estimation is
#'     more suitable.}
#' }
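#' 
#' As a minimal sketch of the sampling scheme described in item 2 above
#'  (the variable names and the exponential rate are illustrative
#'  assumptions, not package defaults):
#' \preformatted{
#' set.seed(123)
#' eta <- runif(1, min = 0, max = 1)  # learning rate ~ Uniform(0, 1)
#' tau <- rexp(1, rate = 1)           # inverse temperature ~ Exponential(1)
#' }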
#' 
#' @param policy [string]
#' 
#' Specifies the learning policy to be used.
#' This determines how the model updates action values based on observed or
#'   simulated choices. It can be either \code{"off"} or \code{"on"}.
#'   
#'  \itemize{
#'   \item {
#'    \strong{Off-Policy (Q-learning): }
#'    This is the most common approach for modeling
#'     reinforcement learning in Two-Alternative Forced Choice (TAFC) tasks.
#'     In this mode, the model's goal is to learn the underlying value of
#'     each option by observing the human participant's behavior. It achieves
#'     this by consistently updating the value of the option that the
#'     human actually chose. The focus is on understanding the value 
#'     representation that likely drove the participant's decisions.
#'   }
#'   \item {
#'    \strong{On-Policy (SARSA): }
#'    In this mode, the target policy and the behavior policy are identical. 
#'     The model first computes the selection probability for each option based 
#'     on their current values. Critically, it then uses these probabilities to 
#'     sample its own action. The value update is then performed on the action 
#'     that the model itself selected. This approach focuses more on directly 
#'     mimicking the stochastic choice patterns of the agent, rather than
#'     just learning the underlying values from a fixed sequence of
#'     actions (see the sketch following this list).
#'   }
#'  }
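#' 
#' A minimal sketch of the difference (\code{V_L}, \code{V_R}, and
#'  \code{tau} are illustrative value and temperature names, not the
#'  package's internal variables; \code{sub_choose}, \code{L_choice},
#'  and \code{R_choice} stand for the corresponding data columns):
#' \preformatted{
#' # Softmax probability of choosing the left option
#' p_L <- 1 / (1 + exp(-tau * (V_L - V_R)))
#' 
#' # Off-policy: update the value of the option the human actually chose
#' chosen <- sub_choose
#' 
#' # On-policy: the model samples its own action from p_L
#' chosen <- ifelse(runif(1) < p_L, L_choice, R_choice)
#' }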
#'  
#' default: \code{policy = "off"}
#' 
#' @param data [data.frame] 
#' 
#' The data should include the following mandatory columns (a minimal
#'  sketch follows the list): 
#'  \itemize{
#'    \item \code{sub} "Subject"
#'    \item \code{time_line} "Block" "Trial"
#'    \item \code{L_choice} "L_choice"
#'    \item \code{R_choice} "R_choice"
#'    \item \code{L_reward} "L_reward"
#'    \item \code{R_reward} "R_reward"
#'    \item \code{sub_choose} "Sub_Choose"
#'  }
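#' 
#' A minimal sketch of the expected structure (column names follow the
#'  mapping above; all values are illustrative):
#' \preformatted{
#' data.frame(
#'   Subject = 1, Block = 1, Trial = 1:2,
#'   L_choice = "A", R_choice = "B",
#'   L_reward = 36, R_reward = 40,
#'   Sub_Choose = c("A", "B")
#' )
#' }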
#'  
#' @param id [CharacterVector]
#' 
#' Specifies which subject's data to use. In parameter and model recovery
#'  analyses, the specific subject ID is often irrelevant. Although the
#'  experimental trial order might have some randomness for each subject,
#'  the sequence of reward feedback is typically pseudo-random.
#'
#' The default value for this argument is \code{NULL}. When \code{id = NULL},
#'  the program automatically detects existing subject IDs within the
#'  dataset. It then randomly selects one subject as a sample, and the
#'  parameter and model recovery procedures are performed based on this
#'  selected subject's data.
#'  
#' default: \code{id = NULL}
#'  
#' @param n_trials [integer]
#' 
#' Represents the total number of trials a single subject experienced
#'  in the experiment. If this parameter is kept at its default value
#'  of \code{NULL}, the program will automatically detect how many trials
#'  a subject experienced from the provided data. This information
#'  is primarily used for calculating model fit statistics such as
#'  AIC (Akaike Information Criterion) and BIC (Bayesian Information
#'  Criterion).
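#'  For reference, \code{AIC = 2 * k - 2 * logL} and
#'  \code{BIC = k * log(n_trials) - 2 * logL}, where \code{k} is the
#'  number of free parameters; the trial count enters the BIC directly.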
#'  
#'  default: \code{n_trials = NULL}
#'  
#' @param funcs [CharacterVector]
#' 
#' A character vector containing the names of all user-defined functions
#'  required for the computation. When parallel computation is enabled
#'  (i.e., \code{nc > 1}), user-defined models and their custom functions 
#'  might not be automatically accessible within the parallel environment.
#'
#' Therefore, if you have created your own reinforcement learning model
#'  that modifies any of the package's six default functions 
#'  (default functions: 
#'     \code{util_func = \link[binaryRL]{func_gamma}}, 
#'     \code{rate_func = \link[binaryRL]{func_eta}}, 
#'     \code{expl_func = \link[binaryRL]{func_epsilon}},
#'     \code{bias_func = \link[binaryRL]{func_pi}},
#'     \code{prob_func = \link[binaryRL]{func_tau}},
#'     \code{loss_func = \link[binaryRL]{func_logl}}
#'  ), 
#'  you must explicitly provide the names of your custom functions as a 
#'  vector here.
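#' 
#' e.g. \code{funcs = c("my_util_func", "my_rate_func")} (hypothetical
#'  names for your own replacements of the defaults above)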
#'  
#' @param model_names [CharacterVector] 
#' 
#' The names of the models to fit.
#' 
#' e.g. \code{model_names = c("TD", "RSTD", "Utility")}
#'  
#' @param simulate_models [List] 
#' 
#' A collection of functions used to generate simulated data.
#' 
#' @param simulate_lower [List] 
#' 
#' The lower bounds of the free parameters for each simulation model.
#' 
#' e.g. \code{simulate_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0))}
#' 
#' @param simulate_upper [List] 
#' 
#' The upper bounds of the free parameters for each simulation model.
#' 
#' e.g. \code{simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1))}
#' 
#' @param fit_models [List] 
#' 
#' A collection of functions applied to fit models to the data.
#' 
#' e.g. \code{fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility)}
#' 
#' @param fit_lower [List] 
#' 
#' The lower bounds of the free parameters for each fitting model.
#' 
#' e.g. \code{fit_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0))}
#' 
#' @param fit_upper [List] 
#' 
#' The upper bounds of the free parameters for each fitting model.
#' 
#' e.g. \code{fit_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 10))}
#'  
#' @param iteration_s [integer]
#' 
#' This parameter determines how many simulated datasets are created for 
#'  subsequent model and parameter recovery analyses.
#'  
#' default: \code{iteration_s = 10}
#' 
#' @param iteration_f [integer]
#' 
#' The number of iterations the optimization algorithm will perform
#'  when searching for the best-fitting parameters during the fitting
#'  phase. A higher number of iterations may increase the likelihood of 
#'  finding a global optimum but also increases computation time.
#'  
#' default: \code{iteration_f = 10}
#' 
#' @param initial_params [NumericVector]
#' 
#' Initial values for the free parameters that the optimization algorithm will
#'  search from. These are primarily relevant when using algorithms that require
#'  an explicit starting point, such as \code{L-BFGS-B}. If not specified,
#'  the function will automatically generate initial values close to zero.
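#' 
#' e.g. \code{initial_params = c(0.1, 0.1, 0.1)} (an illustrative
#'  starting point for a three-parameter model)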
#'  
#'  default: \code{initial_params = NA}.
#'
#' @param initial_size [integer]
#' 
#' This parameter corresponds to the population size in genetic 
#'  algorithms (\code{GA}). It specifies the number of initial candidate
#'  solutions that the algorithm starts with for its evolutionary search.
#'  This parameter is only required for optimization algorithms that operate on
#'  a population, such as \code{GA} or \code{DEoptim}. 
#'  
#'  default: \code{initial_size = 50}.
#' 
#' @param seed [integer] 
#' 
#' Random seed. This ensures that the results are 
#'  reproducible and remain the same each time the function is run. 
#'  
#'  default: \code{seed = 1}
#'  
#' @param nc [integer]
#' 
#' Number of cores to use for parallel processing. Since fitting
#' optimal parameters for each subject is an independent task,
#' parallel computation can significantly speed up the fitting process:
#' \itemize{
#'   \item \strong{nc = 1}: The fitting proceeds sequentially.
#'   Parameters for one subject are fitted completely before moving
#'   to the next subject.
#'   \item \strong{nc > 1}: The fitting is performed in parallel
#'   across subjects. For example, if \code{nc = 4}, the algorithm will
#'   simultaneously fit data for four subjects. Once these are complete,
#'   it will proceed to fit the next batch of subjects (e.g., subjects
#'   5-8), and so on, until all subjects are processed.
#' }
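#' 
#' e.g. \code{nc = parallel::detectCores() - 1} (an illustrative choice
#'  that leaves one core free; not a package recommendation)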
#' 
#'  default: \code{nc = 1}
#'  
#' @param algorithm [string] 
#' 
#' Choose an algorithm package from
#'  \code{L-BFGS-B}, \code{GenSA}, \code{GA}, \code{DEoptim}, \code{PSO},
#'  \code{Bayesian}, \code{CMA-ES}.
#'  
#' In addition, any algorithm from the \code{nloptr} package is also
#'  supported. If your chosen \code{nloptr} algorithm requires a local search,
#'  you need to input a character vector. The first element represents
#'  the algorithm used for global search, and the second element represents
#'  the algorithm used for local search.
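#' 
#' e.g. \code{algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")}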
#'
#' @returns 
#' A nested list of data.frames. Element \code{[[i]][[j]]} records the
#'  results of fitting the synthetic data generated by the i-th model
#'  with the j-th model.
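#' 
#' e.g. \code{recovery[[1]][[2]]} holds data simulated by the first
#'  model and fitted by the second model.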
#'  
#' @examples
#' \dontrun{
#' recovery <- binaryRL::rcv_d(
#'   data = binaryRL::Mason_2024_G2,
#' #+-----------------------------------------------------------------------------+#
#' #|----------------------------- black-box function ----------------------------|#
#'   #funcs = c("your_funcs"),
#'   model_names = c("TD", "RSTD", "Utility"),
#'   simulate_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
#'   simulate_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
#'   simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
#'   fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
#'   fit_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
#'   fit_upper = list(c(1, 5), c(1, 1, 5), c(1, 1, 5)),
#' #|------------------------------ iteration number ------------------------------|#
#'   iteration_s = 100,
#'   iteration_f = 100,
#' #|-------------------------------- algorithms ---------------------------------|#
#'   nc = 1,                 # set nc > 1 for parallel computation across subjects
#'   # Base R Optimization
#'   algorithm = "L-BFGS-B"  # Gradient-Based (stats)
#' #|-----------------------------------------------------------------------------|#
#'   # Specialized External Optimization
#'   #algorithm = "GenSA"    # Simulated Annealing (GenSA)
#'   #algorithm = "GA"       # Genetic Algorithm (GA)
#'   #algorithm = "DEoptim"  # Differential Evolution (DEoptim)
#'   #algorithm = "PSO"      # Particle Swarm Optimization (pso)
#'   #algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
#'   #algorithm = "CMA-ES"   # Covariance Matrix Adapting (cmaes)
#' #|-----------------------------------------------------------------------------|#
#'   # Optimization Library (nloptr)
#'   #algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
#' #|-------------------------------- algorithms ---------------------------------|#
#' #+#############################################################################+#
#' )
#'
#' result <- dplyr::bind_rows(recovery) |>
#'   dplyr::select(simulate_model, fit_model, iteration, dplyr::everything())
#'
#' # Ensure the output directory exists
#' if (!dir.exists("../OUTPUT")) {
#'   dir.create("../OUTPUT", recursive = TRUE)
#' }
#' 
#' write.csv(result, file = "../OUTPUT/result_recovery.csv", row.names = FALSE)
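#' 
#' # A hedged sketch of a model-recovery summary: for each simulated
#' # dataset, keep the best-fitting model. This assumes the output
#' # contains a BIC column; substitute your preferred fit index.
#' best_fit <- result |>
#'   dplyr::group_by(simulate_model, iteration) |>
#'   dplyr::slice_min(BIC, n = 1) |>
#'   dplyr::ungroup()
#' 
#' # Confusion matrix: rows = data-generating model, cols = best-fitting model
#' table(best_fit$simulate_model, best_fit$fit_model)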
#' }
#' 
rcv_d <- function(
  policy = "off",
  
  data,
  id = NULL,
  n_trials = NULL,
  
  funcs = NULL,
  model_names = c("TD", "RSTD", "Utility"),
  simulate_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
  simulate_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
  simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
  fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
  fit_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
  fit_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 10)),
  
  iteration_s = 10,
  iteration_f = 10,
  
  initial_params = NA,
  initial_size = 50,
  seed = 1,
  
  nc = 1,
  algorithm
){
  # Preliminary step: detect dataset information (trial count, subject IDs)
  info <- suppressWarnings(suppressMessages(detect_information(data = data)))
  
  if (is.null(n_trials)) {
    n_trials <- info[["n_trials"]]
  }
  
  if (is.null(id)) {
    id <- info[["random_id"]]
  }
  
  # Number of loop rounds (one per simulation model / fitting model)
  n_round_s <- length(simulate_models)
  n_round_f <- length(fit_models)
  
  # Create empty lists for storing results
  list_recovery <- list()
  
  df_recovery <- list()
  
  # Generate simulated datasets with each simulation model
  for (i in 1:n_round_s){
    np <- length(simulate_lower[[i]])
    nt <- n_trials
    
    list_simulated <- simulate_list(
      data = data,
      id = id,
      obj_func = simulate_models[[i]],
      n_params = np, 
      n_trials = nt,
      lower = simulate_lower[[i]],
      upper = simulate_upper[[i]],
      seed = seed,
      iteration = iteration_s
    )
    
    names(list_simulated) <- rep(model_names[i], length(list_simulated))
    
    # Fit every candidate model to the simulated datasets
    for (j in 1:n_round_f){
      
      # Byte-compile the objective function for speed
      obj_func <- compiler::cmpfun(fit_models[[j]])
      
      message(paste0(
        "\n", 
        "Simulating Model: ", model_names[i], 
        " | Fitting Model: ", model_names[j], 
        "\n"
      ))
      
      np <- length(fit_lower[[j]])
      nt <- n_trials
      
      list_recovery[[j]] <- recovery_data(
        list = list_simulated,
        id = id,
        n_trials = nt,
        n_params = np, 
        
        funcs = funcs,
        policy = policy,
        model_name = model_names[j],
        fit_model = obj_func,
        lower = fit_lower[[j]],
        upper = fit_upper[[j]],
        
        initial_params = initial_params,
        initial_size = initial_size,
        
        iteration = iteration_f,
        nc = nc,
        algorithm = algorithm
      )
      
      list_recovery[[j]]$simulate_model <- model_names[i]
      list_recovery[[j]]$iteration <- 1:nrow(list_recovery[[j]])
    }
    
    df_recovery[[i]] <- list_recovery
  }

  result <- df_recovery
  
  return(result)
}
