#' @title Model Parameters
#' @name params
#' @description
#'
#' The names of these parameters are not fixed. You can define the
#' parameters you need and name them according to the functions used in
#' your custom model. You only need to ensure that the parameter names
#' defined here are consistent with those used in your model's functions,
#' and that the names do not conflict with one another.
#'
#' @section Class:
#' \code{params [List]}
#'
#' @section Note:
#' The parameters are divided into three types: \code{free}, \code{fixed},
#' and \code{constant}. This classification is not mandatory; any parameter
#' can be treated as a free parameter, depending on the user's specification.
#' By default, the learning rate \code{alpha} and the inverse-temperature
#' \code{beta} are the required free parameters.
#'
#' @section Slots:
#' \subsection{free}{
#' \itemize{
#' \item \code{alpha [double]}
#'
#' The learning rate \code{alpha} specifies how aggressively or
#' conservatively the agent incorporates the prediction error
#' (the difference between the observed reward and the expected value).
#'
#' A value closer to 1 indicates a more aggressive update of the value
#' function, meaning the agent relies more heavily on the current
#' observed reward. Conversely, a value closer to 0 indicates a more
#' conservative update, meaning the agent trusts its previously
#' established expected value more.
#'
#' \item \code{beta [double]}
#'
#' The inverse temperature parameter, \code{beta}, is a crucial
#' component of the soft-max function. It reflects the extent to which
#' the agent's decision-making relies on the value differences between
#' various available options.
#'
#' A higher value of \code{beta} signifies more deterministic,
#' value-driven decision-making; that is, actions with higher expected
#' values are executed with greater probability. Conversely, a lower
#' \code{beta} value signifies more stochastic decision-making, where
#' the probabilities of executing different actions become nearly equal,
#' regardless of the differences in their expected values. A minimal
#' sketch of how both free parameters enter the update and choice rules
#' is given after this list.
#' }
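#'
#' The following minimal sketch (for illustration only; not this
#' package's internal code) shows how \code{alpha} and \code{beta}
#' typically enter a delta-rule value update and a soft-max choice
#' rule:
#'
#' \preformatted{
#' # delta-rule update: move the expected value toward the observed reward
#' update_value <- function(Q, reward, alpha) {
#'   Q + alpha * (reward - Q)
#' }
#'
#' # soft-max choice rule: map expected values onto action probabilities
#' softmax <- function(Q, beta) {
#'   exp(beta * Q) / sum(exp(beta * Q))
#' }
#'
#' Q <- c(0.2, 0.8)
#' update_value(Q[2], reward = 1, alpha = 0.5)  # 0.9: aggressive update
#' softmax(Q, beta = 1)   # 0.354 0.646: fairly stochastic choice
#' softmax(Q, beta = 10)  # 0.002 0.998: nearly deterministic choice
#' }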
#' }
#'
#' \subsection{fixed}{
#' \itemize{
#' \item \code{gamma [double]}
#'
#' The physical reward received is often distinct from the
#' psychological value perceived by an individual. This concept
#' originates in psychophysics, specifically Stevens' Power Law.
#'
#' Note: The default utility function is defined as
#' \eqn{y = x^{\gamma}} with \eqn{\gamma = 1}, which assumes that the
#' physical quantity is equivalent to the psychological quantity
#' (see the sketch after this list).
#'
#' Since any number raised to the power of zero is one, fixing
#' \code{gamma} at 0 holds a unique theoretical significance: it
#' represents the 'H agent' as proposed by Collins, 2025
#' \doi{10.1038/s41562-025-02340-0}.
#' In this state, the agent treats all feedback as an identical reward,
#' effectively transforming repeated choices into a manifestation of
#' pure habit.
#'
#' \item \code{delta [double]}
#'
#' This parameter represents the weight given to the number of times
#' an option has been selected. Following the Upper Confidence Bound
#' (UCB) algorithm described by Sutton and Barto
#' (\href{http://incompleteideas.net/book/the-book-2nd.html}{2018}),
#' options that have been selected less frequently should be assigned
#' a higher exploratory bias.
#'
#' Note: With the default set to 0.1, a bias value is effectively
#' applied only to options that have never been chosen. Once an action
#' has been executed even a single time, the assigned bias value
#' approaches zero.
#'
#' \item \code{epsilon [double]}
#'
#' This parameter governs the Exploration-Exploitation trade-off and
#' can be used to implement three distinct strategies by adjusting
#' \code{epsilon} and \code{threshold}:
#'
#' Under the \eqn{\epsilon}-greedy strategy: \code{epsilon} represents
#' the constant probability that the agent will execute a random
#' exploratory action throughout the entire experiment, regardless of
#' the estimated values.
#'
#' Under the \eqn{\epsilon}-decreasing strategy: The probability of the
#' agent making a random choice decreases as the number of trials
#' increases. The rate of this decay is influenced by \code{epsilon}.
#'
#' By default, \code{epsilon} is set to \code{NA}, which corresponds
#' to the \eqn{\epsilon}-first model. In this model, the agent always
#' selects randomly before the trial specified by \code{threshold}
#' (default \code{1}). A sketch of the three strategies is given after
#' this list.
#'
#' \item \code{zeta [double]}
#'
#' Collins and Frank (2012) \doi{10.1111/j.1460-9568.2011.07980.x}
#' proposed that in every trial, not only the chosen option undergoes
#' value updating, but the expected values of unchosen options also
#' decay towards their initial value, due to the constraints of
#' working memory. This specific parameter represents the rate of this
#' decay.
#'
#' Note: A larger value signifies a faster decay from the learned
#' value back to the initial value. The default value is set to 0,
#' which assumes that no such working memory system exists.
#'
#' When assuming the existence of a working memory system, it is
#' advisable to select a meaningful \code{Q0} toward which the
#' Q-values can decay. A sketch of this decay is given after this list.
#' }
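#'
#' For illustration, the sketch below applies the default utility
#' function \eqn{y = x^{\gamma}} described under \code{gamma},
#' including the habit-like case \code{gamma = 0}:
#'
#' \preformatted{
#' # power-law utility: physical reward -> subjective value
#' utility <- function(reward, gamma) {
#'   reward^gamma
#' }
#'
#' utility(4, gamma = 1)    # 4: physical and subjective value coincide
#' utility(4, gamma = 0.5)  # 2: diminishing sensitivity to large rewards
#' utility(4, gamma = 0)    # 1: every outcome counts as one unit of reward
#' }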
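#'
#' A sketch of the three exploration strategies controlled by
#' \code{epsilon} and \code{threshold} is given below. The helper
#' functions are hypothetical, and the \code{epsilon / t} decay is only
#' one common choice; the package's exact decreasing schedule may
#' differ:
#'
#' \preformatted{
#' # hypothetical helper functions, one per strategy; t is the trial index
#'
#' # epsilon-first (default, epsilon = NA): random before the threshold trial
#' p_random_first <- function(t, threshold) as.numeric(t <= threshold)
#'
#' # epsilon-greedy: a constant exploration probability on every trial
#' p_random_greedy <- function(t, epsilon) epsilon
#'
#' # epsilon-decreasing: exploration fades as trials accumulate
#' # (epsilon / t is just one common decay form)
#' p_random_decreasing <- function(t, epsilon) epsilon / t
#'
#' p_random_first(t = 1, threshold = 1)        # 1: the first trial is random
#' p_random_greedy(t = 50, epsilon = 0.1)      # 0.1 on every trial
#' p_random_decreasing(t = 50, epsilon = 0.1)  # 0.002: little exploration left
#' }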
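#'
#' A minimal sketch (illustrative only) of the working memory decay
#' governed by \code{zeta}, in which the values of unchosen options
#' drift back toward the initial value \code{Q0}:
#'
#' \preformatted{
#' # decay of an unchosen option's value toward the initial value Q0
#' decay_value <- function(Q, Q0, zeta) {
#'   Q + zeta * (Q0 - Q)
#' }
#'
#' decay_value(Q = 0.8, Q0 = 0, zeta = 0)    # 0.80: no decay (default)
#' decay_value(Q = 0.8, Q0 = 0, zeta = 0.2)  # 0.64: partial forgetting
#' decay_value(Q = 0.8, Q0 = 0, zeta = 1)    # 0.00: immediate return to Q0
#' }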
#' }
#'
#' \subsection{constant}{
#' \itemize{
#' \item \code{seed [int]}
#'
#' This seed controls the random choice of actions in the
#' reinforcement learning model when the \code{sample()} function is
#' called to select actions based on probabilities estimated by the
#' softmax. It is not the seed used by the optimization package when
#' searching for the optimal free parameters. In most cases, there is no
#' need to modify this value; please keep it at the default value of
#' \code{123}.
#'
#' \item \code{L [numeric]}
#'
#' This parameter determines the type of regularization applied to the
#' log-likelihood to penalize model complexity, which helps prevent
#' overfitting. The default is \code{NA_real_}, meaning no
#' regularization is applied (a sketch of how the penalty enters the
#' objective is given after this list). Examples of valid inputs include:
#' \itemize{
#' \item \code{L = 0}: L0 regularization, which adds a penalty
#' proportional to the total number of free parameters.
#' \item \code{L = 1}: L1 regularization (Lasso), which adds a
#' penalty proportional to the sum of the absolute values of
#' the free parameters.
#' \item \code{L = 2}: L2 regularization (Ridge), which adds a
#' penalty proportional to the sum of the squared values of
#' the free parameters.
#' \item \code{L = p}: Lp regularization, where \code{p} is any
#' numeric value. The penalty is proportional to the sum of
#' the \code{p}-th power of the absolute values of the free
#' parameters.
#' \item \code{L = 12}: Elastic Net regularization, which applies
#' both L1 and L2 penalties simultaneously.
#' }
#'
#' \item \code{penalty [double]}
#'
#' This parameter specifies the strength of the regularization, acting
#' as a multiplier for the penalty term defined by \code{L}. A larger
#' value imposes a stronger penalty on the free parameters. The
#' default value is \code{1}.
#'
#' \item \code{Q0 [double]}
#'
#' This parameter represents the initial value assigned to each action
#' at the start of the Markov Decision Process. As argued by
#' Sutton and Barto
#' (\href{http://incompleteideas.net/book/the-book-2nd.html}{2018}),
#' initial values are often set to be optimistic
#' (i.e., higher than all possible rewards) to encourage exploration.
#' Conversely, an overly low initial value might lead the agent to
#' cease exploring other options after receiving the first reward,
#' resulting in repeated selection of the initially chosen action.
#'
#' The default value is set to \code{NA}, which implies that the agent
#' will use the first observed reward as the initial value for that
#' action. When combined with Upper Confidence Bound, this setting
#' ensures that every option is selected at least once, and their
#' first rewards are immediately memorized.
#'
#' Note: This is what I consider a reasonable setting. If you
#' find this interpretation unsuitable, you may explicitly set
#' \code{Q0} to 0 or another optimistic initial value instead.
#'
#' \item \code{reset [double]}
#'
#' If changes may occur between blocks, you can choose whether to
#' reset the learned values for each option. By default, no reset is
#' applied. For example, setting \code{reset = 0} means that upon
#' entering a new block, the values of all options are reset to 0. In
#' addition, if \code{Q0} is also set to 0, this implies that the
#' learning rate on the first trial of each block will be 100\%.
#'
#' \item \code{lapse [double]}
#'
#' Wilson and Collins (2019) \doi{10.7554/eLife.49547}
#' introduced the concept of the lapse rate, which represents the
#' probability that a subject makes an error (a lapse). This parameter
#' ensures that every option has a minimum probability of being chosen,
#' preventing the probability from reaching zero. This is a very
#' reasonable assumption and, crucially, it avoids the numerical
#' instability issue where
#' \eqn{\log(P) = \log(0)} results in \code{-Inf}.
#'
#' Note: The default value here is set to 0.01, meaning every action
#' has at least a 1\% probability of being executed by the agent. If
#' the paradigm you use has a large number of available actions, a 1\%
#' minimum probability for each action might be unreasonable, and you
#' can adjust this value to be smaller.
#'
#' \item \code{threshold [double]}
#'
#' This parameter represents the trial number before which the agent
#' will select completely randomly.
#'
#' Note: The default value is set to 1, meaning that only the very
#' first trial involves a purely random choice by the agent.
#'
#' \item \code{bonus [double]}
#'
#' Hitchcock, Kim, and Frank (2025) \doi{10.1037/xge0001817}
#' introduced modifications to the working memory model, positing that
#' the value of unchosen options is not merely subject to decay toward
#' the initial value. They suggest that the outcome obtained after
#' selecting an option might, to some extent, provide information
#' about the value of the unchosen options. This information, referred
#' to as a reward bonus, also influences the value update of the
#' unchosen options.
#'
#' Note: The default value for this \code{bonus} is 0, which assumes
#' that no such bonus value change exists.
#'
#' The concept of a bonus often does not require an additional
#' parameter; instead, it can be implemented through specific
#' \code{if-else} logic. For instance, in tasks with a single correct
#' answer, once the agent identifies the correct choice, it can infer
#' with certainty that the Q-values of all other actions should
#' be updated to zero.
#'
#' \item \code{weight [NumericVector]}
#'
#' The \code{weight} parameter governs the policy integration stage.
#' After each cognitive system (e.g., reinforcement learning (RL) and
#' working memory (WM)) calculates action probabilities using a soft-max
#' function based on its internal value estimates, the agent combines
#' these suggestions into a single choice probability.
#'
#' The default is \code{1}, which is equivalent to
#' \code{weight = c(1, 0)}. This represents exclusive reliance on
#' the first system (typically the Reinforcement Learning system).
#'
#' In a dual-system model (e.g., RL + WM), setting \code{weight = 0.5}
#' implies that the agent places equal trust in the long-term RL
#' values and the immediate WM representations. A sketch of this policy
#' integration (together with the \code{lapse} floor) is given after
#' this list.
#'
#' \item \code{capacity [double]}
#'
#' This parameter represents the maximum number of stimulus-action
#' associations an individual can actively maintain in working memory.
#' It scales the working memory weight as
#' \eqn{weight = weight_{0} * min(1, capacity / ns)}, where \code{ns}
#' is the stimulus set size.
#'
#' This parameter determines the extent to which working memory (WM)
#' Q-values are prioritized during decision-making. When the stimulus
#' set size (\code{ns}) is within \code{capacity}, the scaling factor
#' equals 1 and the full working memory weight \eqn{weight_{0}} is
#' applied. However, if \code{ns} exceeds \code{capacity}, the
#' decision-making process shifts weight toward the Q-values from the
#' reinforcement learning (RL) system.
#'
#' \item \code{sticky [double]}
#'
#' The \code{sticky} parameter (represented as \eqn{\kappa} in
#' Collins, 2025 \doi{10.1038/s41562-025-02340-0}) quantifies the
#' tendency for an agent to repeat a previous choice, a phenomenon
#' known as perseveration. This is fundamentally distinct from
#' value-based decision-making and captures a form of choice inertia.
#' In my opinion, the implementation of stickiness can vary depending
#' on the specifics of the experimental task. Here are three common
#' forms (a sketch of one possible implementation is given after this
#' parameter list):
#'
#' \itemize{
#' \item Stick to the Same Stimulus:
#' The agent tends to choose the same stimulus that was chosen
#' in the previous trial. For example, if red and blue squares
#' are presented and the agent chose the red square on the
#' last trial, they are more likely to choose the red square
#' again on the current trial, regardless of its position.
#'
#' \item Stick to the Same Position:
#' The agent tends to choose the stimulus at the same physical
#' location as the previously chosen one. For instance, if two
#' stimuli are presented on the left and right sides of the
#' screen and the agent chose the left stimulus on the last
#' trial, they are more likely to choose the left stimulus on
#' the current trial, regardless of what stimulus is presented
#' there.
#'
#' \item Stick to the Same Latent Action:
#' The agent tends to repeat the same physical motor action.
#' This is particularly relevant in latent learning paradigms
#' where stimuli and responses are dissociated. For example,
#' if the task requires pressing Up, Down, Left, or Right keys
#' in response to colored arrows, an agent who pressed 'Up'
#' on the previous trial might be more inclined to press 'Up'
#' again, irrespective of the arrow stimuli.
#' }
#'
#' }
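#'
#' For illustration only, the sketch below shows how a penalty of the
#' kind selected by \code{L} and scaled by \code{penalty} could be
#' added to the negative log-likelihood; the exact objective used
#' internally may differ:
#'
#' \preformatted{
#' # Lp penalty on the free parameters, scaled by the penalty multiplier
#' regularize <- function(free, L, penalty) {
#'   if (is.na(L)) return(0)                       # no regularization
#'   if (L == 0)  return(penalty * length(free))   # L0: count the parameters
#'   if (L == 12) return(penalty * (sum(abs(free)) + sum(free^2)))  # L1 + L2
#'   penalty * sum(abs(free)^L)                    # generic Lp (L1, L2, ...)
#' }
#'
#' free <- c(alpha = 0.6, beta = 4)
#' regularize(free, L = NA_real_, penalty = 1)  # 0
#' regularize(free, L = 1, penalty = 1)         # 4.6
#' regularize(free, L = 2, penalty = 1)         # 16.36
#'
#' # penalized objective, with nll as the negative log-likelihood:
#' # nll + regularize(free, L, penalty)
#' }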
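#'
#' The following sketch (an illustration under the definitions given
#' above, not the package's internal code) combines the policy
#' integration controlled by \code{weight} with the minimum choice
#' probability enforced by \code{lapse}:
#'
#' \preformatted{
#' p_rl <- c(0.70, 0.20, 0.10)  # soft-max probabilities from the RL system
#' p_wm <- c(0.10, 0.80, 0.10)  # soft-max probabilities from the WM system
#'
#' # policy integration: weight is the trust placed in the first (RL) system
#' weight <- 0.5
#' p_mix  <- weight * p_rl + (1 - weight) * p_wm   # 0.40 0.50 0.10
#'
#' # (in a WM model, the WM weight may further be scaled by
#' #  min(1, capacity / ns); see the capacity entry above)
#'
#' # lapse floor: every action keeps at least `lapse` probability
#' lapse   <- 0.01
#' p_final <- (1 - lapse * length(p_mix)) * p_mix + lapse
#' p_final                                         # 0.398 0.495 0.107
#' sum(p_final)                                    # 1
#' }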
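#'
#' One common way (not necessarily this package's exact form) to
#' implement the perseveration bonus described under \code{sticky} is
#' to add it to the previously chosen action's value before the
#' soft-max, as sketched below:
#'
#' \preformatted{
#' # add a perseveration bonus to the previously chosen action's value
#' softmax_sticky <- function(Q, beta, sticky, last_choice) {
#'   bonus <- numeric(length(Q))
#'   bonus[last_choice] <- sticky
#'   v <- beta * Q + bonus
#'   exp(v) / sum(exp(v))
#' }
#'
#' Q <- c(0.5, 0.5)
#' softmax_sticky(Q, beta = 3, sticky = 0, last_choice = 1)  # 0.50 0.50
#' softmax_sticky(Q, beta = 3, sticky = 1, last_choice = 1)  # 0.73 0.27
#' }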
#' }
#'
#' @section Example:
#' \preformatted{ # TD
#' params = list(
#'   free = list(
#'     alpha = x[1],
#'     beta  = x[2]
#'   ),
#'   fixed = list(
#'     gamma   = 1,
#'     delta   = 0.1,
#'     epsilon = NA_real_,
#'     zeta    = 0
#'   ),
#'   constant = list(
#'     seed      = 123,
#'     L         = 0,
#'     penalty   = 1,
#'     Q0        = NA_real_,
#'     reset     = NA_real_,
#'     lapse     = 0.01,
#'     threshold = 1,
#'     bonus     = 0,
#'     weight    = 1,
#'     capacity  = 0,
#'     sticky    = 0
#'   )
#' )
#' }
#'
#' @references
#' Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning:
#' An Introduction (2nd ed.). MIT Press.
#'
#' Collins, A. G., & Frank, M. J. (2012). How much of reinforcement learning
#' is working memory, not reinforcement learning? A behavioral, computational,
#' and neurogenetic analysis. \emph{European Journal of Neuroscience, 35}(7),
#' 1024-1035.
#' \doi{10.1111/j.1460-9568.2011.07980.x}
#'
#' Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the
#' computational modeling of behavioral data. \emph{eLife, 8}, e49547.
#' \doi{10.7554/eLife.49547}
#'
#' Hitchcock, P. F., Kim, J., & Frank, M. J. (2025). How working memory
#' and reinforcement learning interact when avoiding punishment and pursuing
#' reward concurrently. \emph{Journal of Experimental Psychology: General}.
#' \doi{10.1037/xge0001817}
#'
#' Collins, A. G. (2025). A habit and working memory model as an alternative
#' account of human reward-based learning. \emph{Nature Human Behaviour}, 1-13.
#' \doi{10.1038/s41562-025-02340-0}
#'
NULL