func_beta: Function: Probability
In multiRL: Reinforcement Learning Tools for Multi-Armed Bandit

func_beta

R Documentation

Function: Probability

Description

P_{t}(a) = \frac{ \exp(\beta \cdot (Q_t(a) - \max_{a' \in \mathcal{A}} Q_t(a'))) } { \sum_{a' \in \mathcal{A}} \exp( \beta \cdot (Q_t(a') - \max_{a'_{i} \in \mathcal{A}} Q_t(a'_{i})) ) }

P_{t}(a) = (1 - lapse \cdot N_{shown}) \cdot P_{t}(a) + lapse

Usage

func_beta(shown, qvalue, explor, params, system, ...)

Arguments

`shown`	Which options shown in this trial.
`qvalue`	The expected Q values of different behaviors produced by different systems when updated to this trial.
`explor`	Whether the agent made a random choice (exploration) in this trial.
`params`	Parameters used by the model's internal functions, see params
`system`	When the agent makes a decision, is a single system at work, or are multiple systems involved? see system
`...`	It currently contains the following information; additional information may be added in future package versions. idinfo: subid block trial exinfo: contains information whose column names are specified by the user. Frame RT NetWorth ... behave: includes the following: action: the behavior performed by the human in the given trial. latent: the object updated by the agent in the given trial. simulation: the actual behavior performed by the agent. position: the position of the stimulus on the screen. cue and rsp: Cues and responses within latent learning rules, see behrule state: The state stores the stimuli shown in the current trial—split into components by underscores—and the rewards associated with them.

Value

A NumericVector containing the probability of choosing each option.

Body

func_beta <- function(
    shown,
    qvalue, 
    explor,
    params,
    system,
    ...
){
  
  list2env(list(...), envir = environment())
  
  # If you need extra information(...)
  # Column names may be lost(C++), indexes are recommended
  # e.g.
  # Trial  <- idinfo[3]
  # Frame  <- exinfo[1]
  # Action <- behave[1]
  
  beta     <- params[["beta"]]
  lapse    <- params[["lapse"]]
  weight   <- params[["weight"]]
  capacity <- params[["capacity"]]
  
  index     <- which(!is.na(qvalue[[1]]))
  n_shown   <- length(index)
  n_system  <- length(qvalue)
  n_options <- length(qvalue[[1]])
  
  # Assign weights to different systems
  if (length(weight) == 1L) {weight <- c(weight, 1 - weight)}
  weight <- weight / sum(weight)
  if (n_system == 1) {weight <- weight[1]}
  
  # Compute the probabilities estimated by different systems
  prob_mat <- matrix(0, nrow = n_options, ncol = n_system)
  
  if (explor == 1) {
    prob_mat[index, ] <- 1 / n_shown
    prob_mat[prob_mat == 0] <- NA
  } else {
    for (s in seq_len(n_system)) {
      sub_qvalue <- qvalue[[s]]
      exp_stable <- exp(beta * (sub_qvalue - max(sub_qvalue, na.rm = TRUE)))
      prob_mat[, s] <- exp_stable / sum(exp_stable, na.rm = TRUE)
    }
  }
  
  # Weighted average
  prob <- as.vector(prob_mat %*% weight)
  
  # lapse
  prob <- (1 - lapse * n_shown) * prob + lapse
   
  return(prob)
}

multiRL documentation built on March 31, 2026, 5:06 p.m.