func_alpha: Function: Learning Rate
In multiRL: Reinforcement Learning Tools for Multi-Armed Bandit

Description

Q_{new} = Q_{old} + \alpha \cdot (R - Q_{old})
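
As a worked illustration of the rule above, the sketch below applies it three times to a constant reward. The helper name `delta_update` and the numbers are assumptions made for this example; they are not part of multiRL.

```r
# Hypothetical helper (not part of multiRL): one step of the rule
# Q_new = Q_old + alpha * (R - Q_old)
delta_update <- function(q_old, reward, alpha) {
  q_old + alpha * (reward - q_old)
}

# Start from Q = 0 and receive a reward of 1 on three consecutive trials.
q <- 0
q_trace <- numeric(3)
for (t in 1:3) {
  q <- delta_update(q, reward = 1, alpha = 0.5)
  q_trace[t] <- q
}
q_trace  # 0.5 0.75 0.875
```

With `alpha = 1` the estimate would jump straight to the latest reward, while values close to 0 would leave it almost unchanged.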

Usage

func_alpha(
  shown,
  is.fp,
  qvalue,
  reward,
  utility,
  system,
  rownum,
  params,
  hidden,
  ...
)

Arguments

`shown`	The options shown in this trial.
`is.fp`	Is it the first time picking this option?
`qvalue`	The expected Q-values of the different behaviors, as produced by the different systems, updated up to this trial.
`reward`	The feedback received by the agent from the environment at trial t following the execution of action a.
`utility`	The subjective value (internal representation) assigned by the agent to the objective reward.
`system`	Whether a single system or multiple systems are at work when the agent makes a decision; see system.
`rownum`	The trial number.
`params`	Parameters used by the model's internal functions; see params.
`hidden`	All hidden variables within the MDP process.
`...`	It currently contains the following information; additional information may be added in future package versions.
    idinfo: subid, block, trial
    exinfo: contains information whose column names are specified by the user (Frame, RT, NetWorth, ...)
    behave: includes the following:
        action: the behavior performed by the human in the given trial.
        latent: the object updated by the agent in the given trial.
        simulation: the actual behavior performed by the agent.
        position: the position of the stimulus on the screen.
    cue and rsp: cues and responses within latent learning rules, see behrule
    state: stores the stimuli shown in the current trial (split into components by underscores) and the rewards associated with them.
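
The following sketch shows one way the extra information in `...` might be unpacked and read by position. The function name `peek_extra` and the values passed to it are made up for illustration; the actual contents and types supplied by the package may differ.

```r
# Hypothetical stand-in for a learning-rate function; only the handling of
# `...` is shown. All values passed below are invented for this example.
peek_extra <- function(...) {
  # Unpack the named elements of `...` into local variables
  list2env(list(...), envir = environment())

  # Read by position, since element names may be lost in the C++ layer
  trial  <- idinfo[3]
  frame  <- exinfo[1]
  action <- behave[1]

  list(trial = trial, frame = frame, action = action)
}

res <- peek_extra(
  idinfo = c("s01", "1", "12"),        # subid, block, trial
  exinfo = c("Gain"),                  # user-specified column(s)
  behave = c("A", "A", "B", "left")    # action, latent, simulation, position
)
res$trial  # "12"
```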

Value

A List

output [NumericVector]

A numeric value representing the updated Q-value after learning.

This function specifies how prediction error (PE) is incorporated into value updating, using a learning rate that determines whether updates are more conservative or more aggressive in response to PE.

hidden [CharacterVector]

User-defined internal variables generated by this function. These represent intermediate (latent) states produced during computation, which can be read or modified by other functions in the MDP process.
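
A minimal sketch of how a flag could be written to `hidden` by one function and read by another is given below. Both function names are hypothetical, and how multiRL itself passes `hidden` between its internal functions is not shown here.

```r
# Hypothetical pair of functions (illustrative names, not multiRL API).
# The first writes a flag into `hidden`; the second reads it back.
mark_first_pick <- function(hidden, is.fp) {
  if (is.fp) hidden[1] <- "first"
  hidden
}

was_first_pick <- function(hidden) {
  length(hidden) >= 1 && !is.na(hidden[1]) && hidden[1] == "first"
}

hidden <- character(0)
hidden <- mark_first_pick(hidden, is.fp = TRUE)
flag <- was_first_pick(hidden)
flag  # TRUE
```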

Body

func_alpha <- function(
    shown,
    is.fp,
    qvalue,
    reward,
    utility,
    params,
    rownum,
    system,
    hidden,
    ...
){

  list2env(list(...), envir = environment())
  
  # If you need extra information(...)
  # Column names may be lost(C++), indexes are recommended
  # e.g.
  # Trial  <- idinfo[3]
  # Frame  <- exinfo[1]
  # Action <- behave[1]
  
  Q0        <-  params[["Q0"]]
  alpha     <-  params[["alpha"]]
  alphaN    <-  params[["alphaN"]]
  alphaP    <-  params[["alphaP"]]
  
  # First pick with no initial value (Q0 is NaN): seed the Q-value with the utility
  if (is.nan(Q0) && is.fp) {
    update <- utility
    hidden[1] <- "first"
    return(list(output = update, hidden = hidden))
  }

  # Determine the model currently in use based on which parameters are free.
  if (
    system == "RL" && !(is.null(alpha)) && is.null(alphaN) && is.null(alphaP)
  ) {
    model <- "TD"
  } else if (
    system == "RL" && is.null(alpha) && !(is.null(alphaN)) && !(is.null(alphaP))
  ) {
    model <- "RSTD"
  } else if (
    system == "WM"
  ) {
    model <- "WM"
  } else {
    stop("Unknown Model! Please modify your learning rate function")
  }
  
  # TD
  if (model == "TD") {
    update <- qvalue + alpha * (utility - qvalue)
  # RSTD
  } else if (model == "RSTD" && utility < qvalue) {
    update <- qvalue + alphaN * (utility - qvalue)
  } else if (model == "RSTD" && utility >= qvalue) {
    update <- qvalue + alphaP * (utility - qvalue)
  # WM
  } else if (model == "WM") {
    update <- reward
  }
  
  return(list(output = update, hidden = hidden)) 
}
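
The model selection in the body above can be tried in isolation with the simplified sketch below. It covers only the RL system, and the name `sketch_update` and the parameter values are assumptions for this example rather than part of the package.

```r
# Simplified, hypothetical restatement of the update logic above
# (RL system only; this is not the package function itself).
sketch_update <- function(qvalue, utility, params) {
  alpha  <- params[["alpha"]]
  alphaN <- params[["alphaN"]]
  alphaP <- params[["alphaP"]]

  if (!is.null(alpha) && is.null(alphaN) && is.null(alphaP)) {
    # TD: one learning rate for every prediction error
    rate <- alpha
  } else if (is.null(alpha) && !is.null(alphaN) && !is.null(alphaP)) {
    # RSTD: separate rates for negative and non-negative prediction errors
    rate <- if (utility < qvalue) alphaN else alphaP
  } else {
    stop("Unknown model")
  }
  qvalue + rate * (utility - qvalue)
}

td_up   <- sketch_update(0.5, 1, list(alpha = 0.5))                  # 0.75
rstd_up <- sketch_update(0.5, 1, list(alphaN = 0.5, alphaP = 0.25))  # 0.625
rstd_dn <- sketch_update(0.5, 0, list(alphaN = 0.5, alphaP = 0.25))  # 0.25
```

Supplying only `alpha` selects the TD update, while supplying `alphaN` and `alphaP` without `alpha` selects RSTD; any other combination stops with an error, as in the body above.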

multiRL documentation built on June 9, 2026, 5:09 p.m.