library(rlR)
set.seed(123)
knitr::opts_chunk$set(cache = TRUE, collapse = FALSE, dev = "svg", fig.height = 3.5)
knitr::knit_hooks$set(document = function(x){
  gsub("```\n*```r*\n*", "", x)
})
library(reticulate)
#os = import("os")
#os$environ[["TF_CPP_MIN_LOG_LEVEL"]]="3"

Customized Brain for Mountain Car Problem

Action Cheat for the Environment

For the Mountain Car scenario, there are three valid actions: move left, do nothing, and move right. Since doing nothing does not help in this environment, we can ignore that action. In rlR this is done with the following code.

library(rlR)
env = makeGymEnv("MountainCar-v0", act_cheat = c(0, 2))

act_cheat is a vector whose first element says that the agent's first action maps to action 0 of the gym environment, and whose second element says that the agent's second action maps to action 2 of the gym environment. With this definition, gym action 1 is eliminated. Note that gym uses Python's 0-based indexing, so the 0th gym action corresponds to the 1st action in R.
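
To make the index shift concrete, the mapping implied by act_cheat = c(0, 2) can be spelled out (the action names are those of MountainCar-v0):

act_cheat = c(0, 2)
# R action index -> gym action index
#   1 -> 0 ("push left")
#   2 -> 2 ("push right")
# gym action 1 ("no push") is never selected by the agent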

Define a custom neural network

The network factory below takes the state dimension and the number of actions as arguments and must return a compiled keras model.

net_fun = function(state_dim, act_cnt) {
  # small Q-network: state_dim inputs -> 8 ReLU units -> dropout -> act_cnt linear outputs
  model = keras::keras_model_sequential()
  model %>%
    layer_dense(units = 8, activation = "relu", input_shape = c(state_dim)) %>%
    layer_dropout(rate = 0.25) %>%
    layer_dense(units = act_cnt, activation = "linear")
  # mean squared error on the Q-targets, RMSprop with gradient norm clipping
  model$compile(loss = "mse", optimizer = optimizer_rmsprop(lr = 0.001, clipnorm = 1.0))
  model
}
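
As a quick sanity check, the factory can be called by hand with the dimensions MountainCar-v0 yields here (2 state dimensions and, after the action cheat above, 2 actions). This sketch assumes keras with a working TensorFlow backend is installed; test_model is just a throwaway name:

library(keras)
library(magrittr)
# MountainCar-v0 observations are (position, velocity); act_cheat leaves 2 actions
test_model = net_fun(state_dim = 2L, act_cnt = 2L)
summary(test_model)  # dense(8, relu) -> dropout(0.25) -> dense(2, linear)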

Learning

Finally, configure a DQN agent, attach the custom network, and start learning.

library(magrittr)
library(keras)
conf = getDefaultConf("AgentDQN")
conf$set(console = TRUE, render = TRUE,
  policy.maxEpsilon = 1, policy.minEpsilon = 0, policy.decay = 1.0 / 1.01,
  replay.batchsize = 64, replay.epochs = 4,
  agent.lr.decay = 1, agent.gamma = 0.95)
agent = initAgent("AgentDQN", env, conf, custom_brain = TRUE)
agent$customizeBrain(list(value_fun = net_fun))
agent$learn(1)  # run a single episode to verify the custom network is wired in
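
The policy.* settings above define an epsilon-greedy exploration schedule that starts at policy.maxEpsilon and shrinks towards policy.minEpsilon. As a rough illustration only, and assuming the decay factor is applied multiplicatively at each decay step (an assumption about rlR's internals, not something stated here), the schedule would look like this:

# hypothetical schedule: epsilon_n = max(minEpsilon, maxEpsilon * decay^n)
decay = 1.0 / 1.01
eps = pmax(0, 1 * decay ^ (0:300))
plot(eps, type = "l", xlab = "decay step", ylab = "epsilon")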

