DDPG: Train a DDPG system.


View source: R/DDPG.R

Description

Train a DDPG system.

Usage

DDPG(
  policy_nn,
  critic_nn,
  actualize,
  reset,
  reward,
  done,
  episodes,
  buffer_len,
  batch_size,
  explor,
  gradient_step_n,
  discount,
  polyak,
  object_inputs,
  see,
  track_weights = FALSE,
  track_object = FALSE,
  ...
)

Arguments

policy_nn

policy neural network used to choose actions.

critic_nn

critic neural network used to predict the reward.

actualize

function to move the object (environment step); see the environment sketch after this argument list.

reset

function to reset the object to its initial state.

reward

function to compute the reward.

done

function to determine whether the episode is done.

episodes

integer : number of episodes (scenarios) to train for.

buffer_len

integer : length of the replay buffer.

batch_size

integer : size of the batches used to backpropagate the networks.

explor

numeric : > 0; standard deviation of the random value added to all policy weights for exploration (see the exploration sketch after this argument list).

gradient_step_n

integer : number of backpropagation steps performed before updating the target networks.

discount

numeric : [0, 1] discount factor applied to future rewards.

polyak

numeric : [0, 1] fraction of the new weights kept when updating the target networks (see the update-rule sketch after this argument list).

object_inputs

function : gathers the inputs for the policy into a data.frame.

see

function : visualizes the agent performing its actions.

track_weights

logical : whether the evolution of the weights should be tracked for plotting.

track_object

logical : whether the past states of the object should be kept for viewing.

...

(optional) other arguments passed to other functions.

max_iter

integer : maximum number of iterations for every episode.
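
As an illustration of the callback interface described by actualize, reset, reward, done and object_inputs, here is a minimal sketch for a hypothetical one-dimensional "car on a line" environment. The exact signatures DDPG expects are not documented on this page, so the argument lists below (an object passed around as a list, an action given as a numeric) are assumptions.

# Hypothetical 1-D environment: a car that should stop at position 10.
# All signatures are assumptions; adapt them to what DDPG() actually passes.

reset <- function() {
  list(position = 0, velocity = 0)            # fresh object (state)
}

actualize <- function(object, action) {
  object$velocity <- object$velocity + action # action acts as an acceleration
  object$position <- object$position + object$velocity
  object                                      # moved object
}

reward <- function(object) {
  -abs(object$position - 10)                  # closer to the target = higher reward
}

done <- function(object) {
  abs(object$position - 10) < 0.1             # episode ends near the target
}

object_inputs <- function(object) {
  data.frame(position = object$position,      # inputs fed to the policy
             velocity = object$velocity)
}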
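
The explor argument is described as the standard deviation of a value added to all policy weights. A minimal sketch of that kind of weight-space exploration, assuming the noise is drawn independently per weight from a normal distribution (an assumption; the actual distribution is not documented):

# Add zero-mean noise with standard deviation `explor` to a weight vector.
# rnorm() is an assumption; DDPG()'s actual noise source may differ.
noisy_weights <- function(weights, explor) {
  weights + rnorm(length(weights), mean = 0, sd = explor)
}

noisy_weights(c(0.2, -0.5, 1.0), explor = 0.1)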
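
The discount and polyak arguments correspond to the two standard DDPG update rules sketched below. These are the usual formulas from the literature, consistent with the argument descriptions above, not code extracted from R/DDPG.R.

# Bellman target used when backpropagating the critic:
#   y = r + discount * Q_target(next_state, policy_target(next_state))
bellman_target <- function(r, q_next, discount) {
  r + discount * q_next
}

# Polyak (soft) update: keep a fraction `polyak` of the new weights.
#   w_target <- polyak * w_new + (1 - polyak) * w_target
polyak_update <- function(w_target, w_new, polyak) {
  polyak * w_new + (1 - polyak) * w_target
}

bellman_target(r = 1, q_next = 0.5, discount = 0.99)   # 1.495
polyak_update(c(0, 0), c(1, 1), polyak = 0.05)         # 0.05 0.05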

Value

A list of the policy's and the critic's weights (plus the weight-tracking history if track_weights is TRUE), and a plot of the last car position/line.

