View source: R/func_rl_simulate_action.R
rl_action_simulate.epsilonGreedy    R Documentation

Description
This implementation of an 'epsilonGreedy' action selection policy accepts a parameter epsilon, which describes an agent's propensity to explore the action space. The higher the epsilon, the more likely the agent is to select a random action; the lower the epsilon, the more likely the agent is to select the exploitative action (the one with the highest expected value).
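For intuition, here is a minimal sketch of how an epsilon-greedy rule of this kind might be implemented. The function name epsilon_greedy_sketch is illustrative only and is not part of the package; the actual implementation lives in R/func_rl_simulate_action.R, linked above.

# Illustrative sketch, not the package's source code.
# With probability epsilon, pick an action uniformly at random (explore);
# otherwise pick the action with the highest value estimate (exploit).
epsilon_greedy_sketch <- function(values, epsilon) {
  if (runif(1) < epsilon) {
    sample(seq_along(values), 1)  # explore: any action, chosen uniformly
  } else {
    which.max(values)             # exploit: index of the highest-valued action
  }
}

epsilon_greedy_sketch(c(0.2, 0.25, 0.15, 0.8), epsilon = 0.1)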
Usage

## S3 method for class 'epsilonGreedy'
rl_action_simulate(policy = "epsilonGreedy", values, epsilon, ...)
Arguments

policy     Defines the action selection policy as "epsilonGreedy"; this argument is included in the method to support S3 generics.

values     A numeric vector containing the current value estimates of each action.

epsilon    A parameter between zero and one modulating the RL agent's propensity to explore. That is, the higher the epsilon, the fewer exploitative choices the RL agent will make.

...        Additional arguments passed to or from other methods.
Value

A number giving the index of the action that will be taken.
Examples

# The lower the epsilon, the less exploration
exploit <- numeric(100)
for (trial in seq_along(exploit)) {
  exploit[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.1
  )
}
# Action 4 (value 0.8) is the best option, and we see it is selected the most
sum(exploit == 4)

# The higher the epsilon, the more exploration
explore <- numeric(100)
for (trial in seq_along(explore)) {
  explore[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.8
  )
}
# Action 4 (value 0.8) is still the best option, but we see more exploration here
sum(explore == 4)
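Because the policy is stochastic, the counts returned by these examples vary from run to run. A small sketch of making a simulation repeatable, using only base R's set.seed() and replicate():

# Seeding the random number generator first makes the simulated choices
# reproducible; 123 is an arbitrary seed chosen for illustration.
set.seed(123)
reproducible <- replicate(
  100,
  rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.1
  )
)
sum(reproducible == 4)  # identical on every run with the same seed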