#' @title
#' Step 1: Building the reinforcement learning model
#'
#' @description
#' This function is designed to construct and customize reinforcement
#' learning models.
#'
#' Items for model construction:
#' \itemize{
#' \item \strong{Data Input and Specification:} You must provide the raw
#' dataset for analysis. Crucially, you need to tell the
#' \code{\link[binaryRL]{run_m}} function which columns in your dataset
#' correspond to its arguments (for the expected format, see the example
#' datasets
#' \code{\link[binaryRL]{Mason_2024_G1}} and
#' \code{\link[binaryRL]{Mason_2024_G2}}).
#' Because the task is a game, it is critical that your dataset includes
#' the rewards of both the option the participant chose and the unchosen
#' option.
#' \item \strong{Customizable RL Models:} This function allows you
#' to define and adjust the number of free parameters to create
#' various reinforcement learning models.
#' \itemize{
#' \item \emph{Value Function:}
#' \itemize{
#' \item \emph{Learning Rate:}
#' By adjusting the number of \code{eta} values, you can construct basic
#' reinforcement learning models such as Temporal Difference (TD)
#' and Risk Sensitive Temporal Difference (RSTD).
#' You can also directly adjust \code{\link[binaryRL]{func_eta}} to
#' define your own custom learning rate function.
#' \item \emph{Utility Function:} You can directly adjust the form
#' of \code{\link[binaryRL]{func_gamma}} to incorporate the
#' principles of Kahneman's Prospect Theory. Currently,
#' the built-in \code{\link[binaryRL]{func_gamma}} only takes the
#' form of a power function, consistent with Stevens' Power Law.
#' }
#' \item \emph{Exploration–Exploitation Trade-off:}
#' \itemize{
#' \item \emph{Initial Values:} This involves setting the
#' initial expected value for each option when it hasn't been
#' chosen yet. A higher initial value encourages exploration.
#' \item \emph{Epsilon:} Adjusting the \code{threshold},
#' \code{epsilon} and \code{lambda} parameters can lead to
#' exploration strategies such as epsilon-first, epsilon-greedy,
#' or epsilon-decreasing.
#' \item \emph{Upper-Confidence-Bound:} The parameter \code{pi}
#' controls the degree of exploration by scaling the uncertainty
#' bonus given to less-explored options.
#' \item \emph{Soft-Max:} The inverse temperature parameter
#' \code{tau} controls the agent's sensitivity to value differences.
#' A higher value of tau places greater weight on value differences,
#' leading to more exploitation; a smaller value of tau indicates a
#' greater tendency towards exploration.
#' }
#' }
#'
#' \item \strong{Objective Function Format for Optimization:} Once your
#' model is defined with \code{\link[binaryRL]{run_m}}, it must be wrapped
#' as an objective function that accepts \code{params} as input and returns
#' a loss value (typically \code{logL}). This format ensures compatibility
#' with the optimization algorithms used by \code{\link[binaryRL]{fit_p}}
#' to estimate the optimal parameters. For examples of the standard
#' objective-function format, see
#' \code{\link[binaryRL]{TD}},
#' \code{\link[binaryRL]{RSTD}},
#' \code{\link[binaryRL]{Utility}}; a minimal sketch is also given below
#' this list.
#' }
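#'
#' A minimal sketch of such an objective function (using the bundled
#' \code{\link[binaryRL]{Mason_2024_G2}} dataset; the element name \code{ll}
#' used to pull the log-likelihood out of the fitted object is illustrative
#' only, see \code{\link[binaryRL]{TD}} for the exact form used by the
#' package):
#'
#' \preformatted{
#' TD_obj <- function(params) {
#'   res <- binaryRL::run_m(
#'     mode = "fit",
#'     data = binaryRL::Mason_2024_G2,
#'     id = 18,
#'     eta = params[1],
#'     tau = params[2],
#'     n_params = 2,
#'     n_trials = 360
#'   )
#'   # return the loss value; "ll" is an illustrative element name,
#'   # see binaryRL::TD for the exact objective used by the package
#'   logL <- res$ll
#'   return(logL)
#' }
#' }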
#'
#' For more information, please refer to the homepage of this package:
#' \url{https://yuki-961004.github.io/binaryRL/}
#'
#' @param name [string]
#'
#' The name of your RL model
#'
#' @param mode [string]
#'
#' This parameter controls the function's operational mode. It has three
#' possible values, each typically associated with a specific function:
#' \itemize{
#' \item \code{"simulate"}: Should be used when working with
#' \code{\link[binaryRL]{rcv_d}}.
#' \item \code{"fit"}: Should be used when working with
#' \code{\link[binaryRL]{fit_p}}.
#' \item \code{"replay"}: Should be used when working with
#' \code{\link[binaryRL]{rpl_e}}.
#' }
#' In most cases, you won't need to modify this parameter directly, as suitable
#' default values are set for different contexts.
#'
#' @param policy [string]
#'
#' Specifies the learning policy to be used.
#' This determines how the model updates action values based on observed or
#' simulated choices. It can be either \code{"off"} or \code{"on"}.
#'
#' \itemize{
#' \item {
#' \strong{Off-Policy (Q-learning): }
#' This is the most common approach for modeling
#' reinforcement learning in Two-Alternative Forced Choice (TAFC) tasks.
#' In this mode, the model's goal is to learn the underlying value of
#' each option by observing the human participant's behavior. It achieves
#' this by consistently updating the value of the option that the
#' human actually chose. The focus is on understanding the value
#' representation that likely drove the participant's decisions.
#' }
#' \item {
#' \strong{On-Policy (SARSA): }
#' In this mode, the target policy and the behavior policy are identical.
#' The model first computes the selection probability for each option based
#' on their current values. Critically, it then uses these probabilities to
#' sample its own action. The value update is then performed on the action
#' that the model itself selected. This approach focuses more on directly
#' mimicking the stochastic choice patterns of the agent, rather than just
#' learning the underlying values from a fixed sequence of actions.
#' }
#' }
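#'
#' A conceptual sketch of the difference (this is not the package's internal
#' code; the variable names are purely illustrative):
#'
#' \preformatted{
#' # off-policy: update the value of the option the participant actually chose
#' chosen <- sub_choice[t]
#'
#' # on-policy: sample an action from the model's own soft-max probabilities
#' # and update the value of that option instead
#' chosen <- sample(c(left, right), size = 1, prob = c(P_L, P_R))
#' }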
#'
#' @param data [data.frame]
#'
#' The data must include the following mandatory columns:
#' \itemize{
#' \item \code{sub} "Subject"
#' \item \code{time_line} "Block" "Trial"
#' \item \code{L_choice} "L_choice"
#' \item \code{R_choice} "R_choice"
#' \item \code{L_reward} "L_reward"
#' \item \code{R_reward} "R_reward"
#' \item \code{sub_choose} "Sub_Choose"
#' }
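#'
#' A minimal illustration of the expected structure (the column names follow
#' the defaults of this function; the values are made up):
#'
#' \preformatted{
#' data.frame(
#'   Subject    = 1,
#'   Block      = 1,
#'   Trial      = 1:2,
#'   L_choice   = c("A", "C"),
#'   R_choice   = c("B", "D"),
#'   L_reward   = c(36, 40),
#'   R_reward   = c(27, 0),
#'   Sub_Choose = c("A", "D")
#' )
#' }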
#'
#' @param id [string]
#'
#' Which subject is going to be analyzed. The value should correspond to an
#' entry in the "sub" column, which must contain the subject IDs.
#'
#' e.g. \code{id = 18}
#'
#' @param n_params [integer]
#'
#' The number of free parameters in your model.
#'
#' @param n_trials [integer]
#'
#' The total number of trials in your experiment.
#'
#' @param gamma [NumericVector]
#'
#' \strong{Note}: This should not be confused with the discount rate parameter
#' (also named gamma) found in Temporal Difference (TD) models.
#' The Rescorla-Wagner model does not include a discount rate.
#' Here, \code{gamma} is used as a free parameter to shape the
#' utility function.
#'
#' \itemize{
#' \item \strong{Stevens' Power Law}:
#' Utility is modeled as:
#' \deqn{U(R) = {R}^{\gamma}}
#'
#' \item \strong{Kahneman's Prospect Theory}:
#' This exponent is applied differently based on the sign of the reward:
#' \deqn{U(R) = \begin{cases}
#' R^{\gamma_{1}}, & R > 0 \\
#' \beta \cdot R^{\gamma_{2}}, & R < 0
#' \end{cases}}
#' }
#'
#' default: \code{gamma = 1}
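#'
#' A small numeric illustration of the power-law utility (the parameter value
#' is arbitrary):
#'
#' \preformatted{
#' gamma <- 0.7
#' R <- 10
#' R^gamma
#' #> [1] 5.011872
#' }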
#'
#' @param eta [NumericVector]
#'
#' Parameters used in the Learning Rate Function,
#' \code{rate_func},
#' representing the rate at which the subject updates the difference
#' (prediction error) between the reward and the expected value in the
#' subject's mind.
#'
#' The structure of \code{eta} depends on the model type:
#' \itemize{
#' \item For the \strong{Temporal Difference (TD) model},
#' where a single learning rate is used throughout the experiment
#' \deqn{V_{new} = V_{old} + \eta \cdot (R - V_{old})}
#'
#' \item For the \strong{Risk-Sensitive Temporal Difference (RSTD) model},
#' where two different learning rates are used depending on whether the
#' reward is lower or higher than the expected value:
#' \deqn{V_{new} = V_{old} + \eta_{+} \cdot (R - V_{old}), R > V_{old}}
#' \deqn{V_{new} = V_{old} + \eta_{-} \cdot (R - V_{old}), R < V_{old}}
#' }
#'
#' TD: \code{eta = 0.3}
#'
#' RSTD: \code{eta = c(0.3, 0.7)}
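#'
#' As an illustration (the parameter values are arbitrary and the bundled
#' \code{\link[binaryRL]{Mason_2024_G2}} dataset is used), the three built-in
#' model families differ only in how \code{eta} and \code{gamma} are set:
#'
#' \preformatted{
#' # TD: a single learning rate
#' td <- binaryRL::run_m(
#'   mode = "replay", data = binaryRL::Mason_2024_G2, id = 18,
#'   eta = 0.3, tau = 0.5, n_params = 2, n_trials = 360
#' )
#'
#' # RSTD: two learning rates, one for each sign of the prediction error
#' rstd <- binaryRL::run_m(
#'   mode = "replay", data = binaryRL::Mason_2024_G2, id = 18,
#'   eta = c(0.3, 0.7), tau = 0.5, n_params = 3, n_trials = 360
#' )
#'
#' # Utility: a single learning rate plus a free utility exponent
#' utility <- binaryRL::run_m(
#'   mode = "replay", data = binaryRL::Mason_2024_G2, id = 18,
#'   gamma = 0.7, eta = 0.3, tau = 0.5, n_params = 3, n_trials = 360
#' )
#' }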
#'
#' @param initial_value [double]
#'
#' Subject's initial expected value for each stimulus's reward. If this value
#' is left unset (\code{initial_value = NA}), the subject will use the reward
#' received after the first trial as the initial value for that stimulus. In
#' other words, the learning rate for the first trial is 100%.
#'
#' default: \code{initial_value = NA_real_}
#'
#' @param threshold [integer]
#'
#' Controls the initial exploration phase in the \strong{epsilon-first} strategy.
#' This is the number of early trials where the subject makes purely random
#' choices, as they haven't yet learned the options' values. For example,
#' \code{threshold = 20} means random choices for the first 20 trials.
#' For \strong{epsilon-greedy} or \strong{epsilon-decreasing} strategies,
#' \code{threshold} should be kept at its default value.
#'
#' \deqn{P(x=1) = \begin{cases}
#' 1, & \text{trial} \le \text{threshold} \text{ (random choosing)} \\
#' 0, & \text{trial} > \text{threshold} \text{ (value-based choosing)}
#' \end{cases}}
#'
#' default: \code{threshold = 1}
#'
#' epsilon-first: \code{threshold = 20, epsilon = NA, lambda = NA}
#'
#' @param epsilon [NumericVector]
#'
#' A parameter used in the \strong{epsilon-greedy} exploration strategy. It
#' defines the probability of making a completely random choice, as opposed
#' to choosing based on the relative values of the left and right options.
#' For example, if \code{epsilon = 0.1}, the subject has a 10% chance of random
#' choice and a 90% chance of value-based choice. This parameter is only
#' relevant when \code{threshold} is at its default value (1) and
#' \code{lambda} is not set.
#'
#' \deqn{P(x) = \begin{cases}
#' \epsilon, & x=1 \text{ (random choosing)} \\
#' 1-\epsilon, & x=0 \text{ (value-based choosing)}
#' \end{cases}}
#'
#' epsilon-greedy: \code{threshold = 1, epsilon = 0.1, lambda = NA}
#'
#' @param lambda [NumericVector]
#'
#' A numeric value that controls the decay rate of exploration probability
#' in the \strong{epsilon-decreasing} strategy. A higher \code{lambda} value
#' means the probability of random choice will decrease more rapidly
#' as the number of trials increases.
#'
#' \deqn{P(x) = \begin{cases}
#' \frac{1}{1+\lambda \cdot trial}, & x=1 \text{ (random choosing)} \\
#' \frac{\lambda \cdot trial}{1+\lambda \cdot trial}, & x=0 \text{ (value-based choosing)}
#' \end{cases}}
#'
#' epsilon-decreasing: \code{threshold = 1, epsilon = NA, lambda = 0.5}
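#'
#' A small numeric illustration of the decay (with \code{lambda = 0.5}):
#'
#' \preformatted{
#' lambda <- 0.5
#' trial <- 1:10
#' p_random <- 1 / (1 + lambda * trial)
#' round(p_random, 3)
#' #> [1] 0.667 0.500 0.400 0.333 0.286 0.250 0.222 0.200 0.182 0.167
#' }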
#'
#' @param pi [NumericVector]
#'
#' Parameter used in the Upper-Confidence-Bound (UCB) action-selection
#' formula, \code{bias_func}. It controls the degree of
#' exploration by scaling the uncertainty bonus given to less-explored options.
#' A larger value of \code{pi} (denoted as \code{c} in Sutton and Barto (2018))
#' increases the influence of this bonus, leading to more exploration of
#' actions with uncertain estimated values. Conversely, a smaller \code{pi}
#' results in less exploration.
#'
#' \deqn{
#' A_t = \arg \max_{a} \left[ V_t(a) + \pi \sqrt{\frac{\ln(t)}{N_t(a)}} \right]
#' }
#'
#' default: \code{pi = NA}
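#'
#' A small numeric illustration of the uncertainty bonus (the values are
#' arbitrary; \code{N} is how often each option has been chosen by trial
#' \code{t}):
#'
#' \preformatted{
#' pi <- 0.1
#' t <- 100
#' N <- c(80, 20)
#' # the less-explored option (N = 20) receives the larger bonus
#' bonus <- pi * sqrt(log(t) / N)
#' round(bonus, 3)
#' #> [1] 0.024 0.048
#' }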
#'
#' @param tau [NumericVector]
#'
#' Parameters used in the Soft-Max Function, \code{prob_func},
#' representing the sensitivity of the subject to the value difference when
#' making decisions. It determines the probability of selecting the left option
#' versus the right option based on their values. A larger value of \code{tau}
#' indicates greater sensitivity to the value difference between the options.
#' In other words, even a small difference in value will make the subject more
#' likely to choose the higher-value option.
#'
#' \deqn{P_L = \frac{1}{1+e^{-(V_L-V_R) \cdot \tau}}; P_R = \frac{1}{1+e^{-(V_R-V_L) \cdot \tau}}}
#'
#' default: \code{tau = NA}
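#'
#' A small numeric illustration (the values are arbitrary): the larger
#' \code{tau}, the more deterministically the higher-valued option is chosen.
#'
#' \preformatted{
#' V_L <- 5
#' V_R <- 3
#' tau <- c(0.5, 1, 5)
#' P_L <- 1 / (1 + exp(-(V_L - V_R) * tau))
#' round(P_L, 3)
#' #> [1] 0.731 0.881 1.000
#' }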
#'
#' @param lapse [double]
#'
#' A numeric value between 0 and 1, representing the lapse rate.
#'
#' You can interpret this parameter as the probability of the agent "slipping"
#' or making a random choice, irrespective of the learned action values. This
#' accounts for moments of inattention or motor errors. In this sense, it
#' represents the minimum probability with which any given option will be
#' selected. It is a free parameter that acknowledges that individuals do not
#' always make decisions with full concentration throughout an experiment.
#'
#' From a modeling perspective, the lapse rate is crucial for preventing the
#' log-likelihood calculation from returning \code{-Inf}. This issue arises
#' when the model assigns a probability of zero to an action that the
#' participant actually chose (\code{log(0)} is undefined). By ensuring every
#' option has a non-zero minimum probability, the \code{lapse} parameter makes
#' the fitting process more stable and robust against noise in the data.
#'
#' \deqn{
#' P_{final} = (1 - lapse) \cdot P_{softmax} + \frac{lapse}{N_{choices}}
#' }
#'
#' default: \code{lapse = 0.02}
#'
#' This ensures each option has a minimum selection probability of 1 percent
#' in TAFC tasks.
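#'
#' A small numeric illustration in a TAFC task (the soft-max probabilities
#' are arbitrary):
#'
#' \preformatted{
#' lapse <- 0.02
#' P_softmax <- c(0.99, 0.01)
#' (1 - lapse) * P_softmax + lapse / 2
#' #> [1] 0.9802 0.0198
#' }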
#'
#' @param alpha [NumericVector]
#'
#' Extra parameters that may be used in functions.
#'
#' @param beta [NumericVector]
#'
#' Extra parameters that may be used in functions.
#'
#' @param priors [list]
#'
#' A list specifying the prior distributions for the model parameters.
#' This argument is mandatory when using \code{estimate = "MAP"}.
#'
#' default: \code{priors = NULL}
#'
#' @param util_func [Function]
#'
#' Utility Function; see \code{\link[binaryRL]{func_gamma}}.
#'
#' @param rate_func [Function]
#'
#' Learning Rate Function; see \code{\link[binaryRL]{func_eta}}.
#'
#' @param expl_func [Function]
#'
#' Exploration Strategy Function; see \code{\link[binaryRL]{func_epsilon}}.
#'
#' @param bias_func [Function]
#'
#' Upper-Confidence-Bound Function; see \code{\link[binaryRL]{func_pi}}.
#'
#' @param prob_func [Function]
#'
#' Soft-Max Function; see \code{\link[binaryRL]{func_tau}}.
#'
#' @param loss_func [Function]
#'
#' Loss Function; see \code{\link[binaryRL]{func_logl}}.
#'
#' @param sub [string]
#'
#' Column name of the subject ID.
#'
#' e.g. \code{sub = "Subject"}
#'
#' @param time_line [CharacterVector]
#'
#' A character vector specifying the columns that define the temporal
#' sequence of the experiment. This argument describes how the experiment is
#' structured, for example organized into blocks ("Block"), with breaks in
#' between, and multiple trials ("Trial") within each block.
#'
#' default: \code{time_line = c("Block", "Trial")}
#'
#' @param L_choice [string]
#'
#' Column name of the left choice.
#'
#' default: \code{L_choice = "L_choice"}
#'
#' @param R_choice [string]
#'
#' Column name of the right choice.
#'
#' default: \code{R_choice = "R_choice"}
#'
#' @param L_reward [string]
#'
#' Column name of the reward of the left choice.
#'
#' default: \code{L_reward = "L_reward"}
#'
#' @param R_reward [string]
#'
#' Column name of the reward of the right choice.
#'
#' default: \code{R_reward = "R_reward"}
#'
#' @param sub_choose [string]
#'
#' Column name of choices made by the subject.
#'
#' default: \code{sub_choose = "Sub_Choose"}
#'
#' @param rob_choose [string]
#'
#' Column name of the choices made by the model; you can usually ignore this.
#'
#' default: \code{rob_choose = "Rob_Choose"}
#'
#' @param raw_cols [CharacterVector]
#'
#' Defaults to \code{NULL}. If left as \code{NULL}, it will directly capture
#' all column names from the raw data.
#'
#' @param var1 [string]
#'
#' Column name of extra variable 1. If your model uses more than just reward
#' and expected value, and you need other information, such as whether the
#' choice frame is Gain or Loss, then you can input the 'Frame' column as
#' var1 into the model.
#'
#' default: \code{var1 = NA_character_}
#'
#' @param var2 [string]
#'
#' Column name of extra variable 2. If one additional variable, var1, does not
#' meet your needs, you can add another additional variable, var2, into your
#' model.
#'
#' default: \code{var2 = NA_character_}
#'
#' @param seed [integer]
#'
#' Random seed. This ensures that the results are
#' reproducible and remain the same each time the function is run.
#'
#' default: \code{seed = 123}
#'
#' @param digits_1 [integer]
#'
#' The number of decimal places to retain for columns related to the value function.
#'
#' default: \code{digits_1 = 2}
#'
#' @param digits_2 [integer]
#'
#' The number of decimal places to retain for columns related to the selection function.
#'
#' default: \code{digits_2 = 5}
#'
#' @param engine [string]
#'
#' - \code{"r"}: Use the pure R version of the code.
#'
#' - \code{"cpp"}: Use the Rcpp-optimized version.
#'
#' default: \code{engine = "cpp"}
#'
#' @returns
#' A list of class \code{binaryRL} containing the results of the model fitting.
#'
#' @examples
#' data <- binaryRL::Mason_2024_G2
#'
#' binaryRL.res <- binaryRL::run_m(
#' mode = "replay",
#' data = data,
#' id = 18,
#' eta = c(0.321, 0.765),
#' tau = 0.5,
#' n_params = 3,
#' n_trials = 360
#' )
#'
#' summary(binaryRL.res)
#'
run_m <- function(
name = NA,
mode = c("simulate", "fit", "replay"),
policy = c("on", "off"),
data,
id,
n_params,
n_trials,
gamma = 1,
eta,
initial_value = NA_real_,
threshold = 1,
epsilon = NA,
lambda = NA,
pi = NA,
tau = NA,
lapse = 0.02,
alpha = NA,
beta = NA,
priors = NULL,
util_func = func_gamma,
rate_func = func_eta,
expl_func = func_epsilon,
bias_func = func_pi,
prob_func = func_tau,
loss_func = func_logl,
sub = "Subject",
time_line = c("Block", "Trial"),
L_choice = "L_choice",
R_choice = "R_choice",
L_reward = "L_reward",
R_reward = "R_reward",
sub_choose = "Sub_Choose",
rob_choose = "Rob_Choose",
raw_cols = NULL,
var1 = NA_character_,
var2 = NA_character_,
seed = 123,
digits_1 = 2,
digits_2 = 5,
engine = "cpp"
){
# Only the "fit" mode needs to distinguish on-policy from off-policy;
# "simulate" and "replay" both generate synthetic data from the given
# parameters, so they are always on-policy.
if (mode != "fit") {
policy <- "on"
}
# Infer a default model name from the free parameters when none is given
if (is.na(name)) {
if (length(eta) == 1 && all(gamma == 1)) {
name <- "TD"
} else if (length(eta) == 2 && all(gamma == 1)) {
name <- "RSTD"
} else if (length(eta) == 1 && !all(gamma == 1)) {
name <- "Utility"
} else {
name <- "Customed"
}
}
# Column names of the raw dataset
if (is.null(raw_cols)) {
raw_cols <- colnames(data)
}
# Keep only the rows belonging to the selected subject
data <- data[data[[sub]] == id, ]
step1 <- unique_choice(
data = data,
L_choice = L_choice,
R_choice = R_choice
)
step2 <- arrange_data(
data = step1[["data"]],
time_line = time_line
)
step3 <- add_NA(
data = step2
)
step4 <- set_initial_value(
data = step3,
options = step1[["options"]],
initial_value = initial_value
)
if (engine == "cpp") {
decision_making <- decision_making_cpp
}
else {
decision_making <- decision_making_r
}
step5 <- decision_making(
mode = mode,
policy = policy,
data = step4,
options = step1[["options"]],
seed = seed,
sub_choose = sub_choose, rob_choose = rob_choose,
L_choice = L_choice, R_choice = R_choice,
L_reward = L_reward, R_reward = R_reward,
var1 = var1, var2 = var2,
initial_value = initial_value,
threshold = threshold,
lapse = lapse,
alpha = alpha,
beta = beta,
gamma = gamma,
eta = eta,
epsilon = epsilon,
lambda = lambda,
pi = pi,
tau = tau,
util_func = util_func,
rate_func = rate_func,
expl_func = expl_func,
bias_func = bias_func,
prob_func = prob_func
)
step6 <- model_fit(
data = step5,
loss_func = loss_func,
alpha = alpha,
beta = beta,
var1 = var1,
var2 = var2,
L_choice = L_choice,
R_choice = R_choice,
sub_choose = sub_choose
)
step7 <- digits(
data = step6,
options = step1[["options"]],
digits_1 = digits_1,
digits_2 = digits_2
)
step8 <- output(
mode = mode,
policy = policy,
data = step7,
name = name,
n_params = n_params,
n_trials = n_trials,
initial_value = initial_value,
threshold = threshold,
lapse = lapse,
alpha = alpha,
beta = beta,
gamma = gamma,
eta = eta,
epsilon = epsilon,
lambda = lambda,
pi = pi,
tau = tau,
priors = priors
)
step9 <- mode(
data = step8,
mode = mode,
sub_choose = sub_choose,
rob_choose = rob_choose,
raw_cols = raw_cols
)
final <- step9
return(final)
}