Description
Train a DDPG (Deep Deterministic Policy Gradient) system.
Arguments
policy_nn
    Policy network used to choose actions.

critic_nn
    Critic network used to predict the reward.

actualize
    Function that applies an action to (moves) the object.

reset
    Function that resets the object to its initial state.

reward
    Function that returns the reward.

done
    Function that determines whether the episode is done.

episodes
    integer : number of episodes (scenarios) to train for.

buffer_len
    integer : length of the replay buffer.

batch_size
    integer : size of the batches used to backpropagate the networks.

explor
    numeric : standard deviation (> 0) of the random noise added to all policy weights for exploration.

gradient_step_n
    integer : number of backpropagation steps to take before updating the target networks.

discount
    numeric : discount factor in [0, 1] applied to future rewards.

polyak
    numeric : Polyak averaging coefficient in [0, 1], the fraction of the new weights kept when updating the target networks; see the sketch after this table.

object_inputs
    function : assembles the inputs for the policy into a data.frame.

see
    function : visualizes the agent performing its actions.

track_weights
    logical : whether to record the evolution of the weights for plotting.

track_object
    logical : whether to record the past states of the object.

...
    (optional) Other arguments passed on to other functions.

max_iter
    integer : maximum number of iterations per episode.
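
To make the polyak argument concrete, here is a minimal sketch of Polyak averaging on a single weight matrix. The variable names are illustrative only, not taken from the package.

    # Illustrative sketch of Polyak averaging, not the package's internals.
    polyak <- 0.05
    w_online <- matrix(rnorm(6), nrow = 2)  # weights of the trained (online) network
    w_target <- matrix(rnorm(6), nrow = 2)  # weights of its target copy
    # Each target update keeps a fraction `polyak` of the new weights and
    # (1 - polyak) of the previous target weights:
    w_target <- polyak * w_online + (1 - polyak) * w_target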
Value

A list containing the policy and critic weights (plus the tracked weight history if track_weights is TRUE) and a plot of the last car position/line.
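
A hedged sketch of a call: the function name train_ddpg, the pre-built networks, and the stub environment functions below are assumptions for illustration; only the argument names come from this page.

    # Not run: hypothetical function name and stub environment for a 1-D object.
    state <- 0
    actualize <- function(action, ...) { state <<- state + action; invisible(state) }
    reset     <- function(...) { state <<- 0; invisible(state) }
    reward    <- function(...) -abs(state - 10)        # closer to 10 is better
    done      <- function(...) abs(state - 10) < 0.5   # stop near the goal
    object_inputs <- function(...) data.frame(position = state)

    result <- train_ddpg(           # hypothetical name, for illustration only
      policy_nn = my_policy_nn,     # hypothetical pre-built networks
      critic_nn = my_critic_nn,
      actualize = actualize, reset = reset, reward = reward, done = done,
      episodes = 100, buffer_len = 10000, batch_size = 32,
      explor = 0.1, gradient_step_n = 10, discount = 0.99, polyak = 0.05,
      object_inputs = object_inputs, track_weights = TRUE, max_iter = 200
    )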