View source: R/func_rl_simulate_action.R
rl_action_simulate.epsilonGreedy    R Documentation

Description
This implementation of an 'epsilonGreedy' action selection policy accepts a parameter epsilon, which describes an agent's propensity to explore the action space. The higher the epsilon, the more likely the agent is to select a random action; the lower the epsilon, the more likely the agent is to select the exploitative action (the one with the highest expected value).
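For intuition, here is a minimal sketch of how an epsilon-greedy rule of this kind might be implemented. The function name epsilon_greedy_sketch is illustrative only and is not part of the package; the actual implementation lives in R/func_rl_simulate_action.R, linked above.

# Illustrative sketch, not the package's source code.
# With probability epsilon, pick an action uniformly at random (explore);
# otherwise pick the action with the highest value estimate (exploit).
epsilon_greedy_sketch <- function(values, epsilon) {
  if (runif(1) < epsilon) {
    sample(seq_along(values), 1)  # explore: any action, chosen uniformly
  } else {
    which.max(values)             # exploit: index of the highest-valued action
  }
}

epsilon_greedy_sketch(c(0.2, 0.25, 0.15, 0.8), epsilon = 0.1)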
Usage

## S3 method for class 'epsilonGreedy'
rl_action_simulate(policy = "epsilonGreedy", values, epsilon, ...)
Arguments

policy     Defines the action selection policy as "epsilonGreedy"; this argument is included in the method to support S3 generics.

values     A numeric vector containing the current value estimates of each action.

epsilon    A parameter between zero and one modulating the RL agent's propensity to explore. That is, the higher the epsilon, the fewer exploitative choices the RL agent will make.

...        Additional arguments passed to or from other methods.
Value

A number giving the index of the action that will be taken.
Examples

# The lower the epsilon, the less exploration
exploit <- numeric(100)
for (trial in seq_along(exploit)) {
  exploit[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.1
  )
}
# Action 4 (value 0.8) is the best option, and we see it is selected the most
sum(exploit == 4)

# The higher the epsilon, the more exploration
explore <- numeric(100)
for (trial in seq_along(explore)) {
  explore[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.8
  )
}
# Action 4 (value 0.8) is still the best option, but we see more exploration here
sum(explore == 4)
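Because the policy is stochastic, the counts returned by these examples vary from run to run. A small sketch of making a simulation repeatable, using only base R's set.seed() and replicate():

# Seeding the random number generator first makes the simulated choices
# reproducible; 123 is an arbitrary seed chosen for illustration.
set.seed(123)
reproducible <- replicate(
  100,
  rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.1
  )
)
sum(reproducible == 4)  # identical on every run with the same seed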