DDPG: Train a DDPG system.


View source: R/DDPG.R

Description

Train a DDPG system.

Usage

DDPG(
  policy_nn,
  critic_nn,
  actualize,
  reset,
  reward,
  done,
  episodes,
  buffer_len,
  batch_size,
  explor,
  gradient_step_n,
  discount,
  polyak,
  object_inputs,
  see,
  track_weights = FALSE,
  track_object = FALSE,
  ...
)

Arguments

policy_nn

policy neural network used to choose actions.

critic_nn

critic neural network used to predict the reward.

actualize

function to move the object (environment step); see the environment sketch after this argument list.

reset

function to reset the object to its initial state.

reward

function to compute the reward.

done

function to determine whether the episode is done.

episodes

integer : number of episodes (scenarios) to train for.

buffer_len

integer : length of the replay buffer.

batch_size

integer : size of the batches used to backpropagate the networks.

explor

numeric : > 0; standard deviation of the random value added to all policy weights for exploration (see the exploration sketch after this argument list).

gradient_step_n

integer : number of backpropagation steps performed before updating the target networks.

discount

numeric : [0, 1] discount factor applied to future rewards.

polyak

numeric : [0, 1] fraction of the new weights kept when updating the target networks (see the update-rule sketch after this argument list).

object_inputs

function : gathers the inputs for the policy into a data.frame.

see

function : visualizes the agent performing its actions.

track_weights

logical : whether the evolution of the weights should be tracked for plotting.

track_object

logical : whether the past states of the object should be kept for viewing.

...

(optional) other arguments passed to other functions.

max_iter

integer : maximum number of iterations for every episode.
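
As an illustration of the callback interface described by actualize, reset, reward, done and object_inputs, here is a minimal sketch for a hypothetical one-dimensional "car on a line" environment. The exact signatures DDPG expects are not documented on this page, so the argument lists below (an object passed around as a list, an action given as a numeric) are assumptions.

# Hypothetical 1-D environment: a car that should stop at position 10.
# All signatures are assumptions; adapt them to what DDPG() actually passes.

reset <- function() {
  list(position = 0, velocity = 0)            # fresh object (state)
}

actualize <- function(object, action) {
  object$velocity <- object$velocity + action # action acts as an acceleration
  object$position <- object$position + object$velocity
  object                                      # moved object
}

reward <- function(object) {
  -abs(object$position - 10)                  # closer to the target = higher reward
}

done <- function(object) {
  abs(object$position - 10) < 0.1             # episode ends near the target
}

object_inputs <- function(object) {
  data.frame(position = object$position,      # inputs fed to the policy
             velocity = object$velocity)
}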
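
The explor argument is described as the standard deviation of a value added to all policy weights. A minimal sketch of that kind of weight-space exploration, assuming the noise is drawn independently per weight from a normal distribution (an assumption; the actual distribution is not documented):

# Add zero-mean noise with standard deviation `explor` to a weight vector.
# rnorm() is an assumption; DDPG()'s actual noise source may differ.
noisy_weights <- function(weights, explor) {
  weights + rnorm(length(weights), mean = 0, sd = explor)
}

noisy_weights(c(0.2, -0.5, 1.0), explor = 0.1)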
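
The discount and polyak arguments correspond to the two standard DDPG update rules sketched below. These are the usual formulas from the literature, consistent with the argument descriptions above, not code extracted from R/DDPG.R.

# Bellman target used when backpropagating the critic:
#   y = r + discount * Q_target(next_state, policy_target(next_state))
bellman_target <- function(r, q_next, discount) {
  r + discount * q_next
}

# Polyak (soft) update: keep a fraction `polyak` of the new weights.
#   w_target <- polyak * w_new + (1 - polyak) * w_target
polyak_update <- function(w_target, w_new, polyak) {
  polyak * w_new + (1 - polyak) * w_target
}

bellman_target(r = 1, q_next = 0.5, discount = 0.99)   # 1.495
polyak_update(c(0, 0), c(1, 1), polyak = 0.05)         # 0.05 0.05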

Value

A list of the policy's and the critic's weights (plus the weight-tracking history if track_weights is TRUE), and a plot of the last car position/line.

