optimisers
Functions to set up optimisers (which find parameter values that maximise the joint density of a model) and to change their tuning parameters, for use in opt(). For details of the algorithms and how to tune them, see the TensorFlow optimiser docs and the TensorFlow Probability optimiser docs. A short sketch of the basic pattern follows the usage listing below.
nelder_mead(
  objective_function = NULL,
  initial_vertex = NULL,
  step_sizes = NULL,
  func_tolerance = 1e-08,
  position_tolerance = 1e-08,
  reflection = NULL,
  expansion = NULL,
  contraction = NULL,
  shrinkage = NULL
)

bfgs(
  value_and_gradients_function = NULL,
  initial_position = NULL,
  tolerance = 1e-08,
  x_tolerance = 0L,
  f_relative_tolerance = 0L,
  initial_inverse_hessian_estimate = NULL,
  stopping_condition = NULL,
  validate_args = TRUE,
  max_line_search_iterations = 50L,
  f_absolute_tolerance = 0L
)

powell()

momentum()

cg()

newton_cg()

l_bfgs_b()

tnc()

cobyla()

slsqp()

gradient_descent(learning_rate = 0.01, momentum = 0, nesterov = FALSE)

adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)

adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1, epsilon = 1e-08)

adagrad_da(
  learning_rate = 0.8,
  global_step = 1L,
  initial_gradient_squared_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

adam(
  learning_rate = 0.1,
  beta_1 = 0.9,
  beta_2 = 0.999,
  amsgrad = FALSE,
  epsilon = 1e-08
)

adamax(learning_rate = 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-07)

ftrl(
  learning_rate = 1,
  learning_rate_power = -0.5,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0,
  l2_shrinkage_regularization_strength = 0,
  beta = 0
)

proximal_gradient_descent(
  learning_rate = 0.01,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

proximal_adagrad(
  learning_rate = 1,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

nadam(learning_rate = 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-07)

rms_prop(
  learning_rate = 0.1,
  rho = 0.9,
  momentum = 0,
  epsilon = 1e-10,
  centered = FALSE
)
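As a minimal sketch of the basic pattern (assuming a greta model m has already been defined), an optimiser is constructed, optionally with tuned parameters, and passed to opt():

# a sketch only: `m` is assumed to be an existing greta model
o <- adam(learning_rate = 0.05)  # construct an optimiser, tuning its learning rate
fit <- opt(m, optimiser = o)     # run optimisation of m with that optimiser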
objective_function
  A function that accepts a point as a real Tensor and returns a Tensor of real dtype containing the value of the function at that point. The function to be minimized.

initial_vertex
  Tensor of real dtype and of any shape that can be consumed by the objective_function.

step_sizes
  Tensor of real dtype and shape broadcasting-compatible with initial_vertex; used to construct the initial simplex.

func_tolerance
  Single numeric value. The algorithm stops if the absolute difference between the largest and the smallest function value on the vertices of the simplex is below this number. Default is 1e-08.

position_tolerance
  Single numeric value. The algorithm stops if the largest absolute difference between the coordinates of the vertices is below this threshold. Default is 1e-08.

reflection
  (optional) Positive scalar Tensor of the same dtype as initial_vertex; the reflection coefficient of the simplex update.

expansion
  (optional) Positive scalar Tensor of the same dtype as initial_vertex; the expansion coefficient of the simplex update.

contraction
  (optional) Positive scalar Tensor of the same dtype as initial_vertex; the contraction coefficient of the simplex update.

shrinkage
  (optional) Positive scalar Tensor of the same dtype as initial_vertex; the shrinkage coefficient of the simplex update.

value_and_gradients_function
  A function that accepts a point as a real Tensor and returns a tuple of Tensors of real dtype containing the value of the function and its gradient at that point. The function to be minimized. The input should be of shape [..., n], where n is the dimension of the problem.

initial_position
  Real Tensor of shape [..., n]; the starting point of the search procedure.

tolerance
  Scalar Tensor of real dtype. Specifies the gradient tolerance for the procedure: if the supremum norm of the gradient vector is below this number, the algorithm is stopped. Default is 1e-08.

x_tolerance
  Scalar Tensor of real dtype. If the absolute change in the position between one iteration and the next is smaller than this number, the algorithm is stopped. Default is 0L.

f_relative_tolerance
  Scalar Tensor of real dtype. If the relative change in the objective value between one iteration and the next is smaller than this value, the algorithm is stopped. Default is 0L.

initial_inverse_hessian_estimate
  Optional Tensor of the same dtype as the components of the output of value_and_gradients_function. If specified, the shape should be broadcastable to [..., n, n].

stopping_condition
  (optional) A function that takes as input two Boolean tensors of shape [...], indicating which batch members have converged and which have failed, and returns a scalar Boolean stating whether the algorithm should stop.

validate_args
  Logical, defaults to TRUE. When TRUE, optimiser parameters are checked for validity, at some possible cost to runtime performance. When FALSE, invalid inputs may silently produce incorrect outputs.

max_line_search_iterations
  Integer. The maximum number of iterations for the hager_zhang line search algorithm. Default is 50L.

f_absolute_tolerance
  Scalar Tensor of real dtype. If the absolute change in the objective value between one iteration and the next is smaller than this value, the algorithm is stopped. Default is 0L.
learning_rate
  the size of steps (in parameter space) towards the optimal value; default values differ between optimisers (see the usage above)

momentum
  hyperparameter that accelerates gradient descent in the relevant direction and dampens oscillations. Defaults to 0, which is vanilla gradient descent.

nesterov
  Whether to apply Nesterov momentum. Defaults to FALSE.

rho
  the decay rate

epsilon
  a small constant used to condition gradient updates

initial_accumulator_value
  initial value of the 'accumulator' used to tune the algorithm

global_step
  the current training step number

initial_gradient_squared_accumulator_value
  initial value of the accumulators used to tune the algorithm

l1_regularization_strength
  L1 regularisation coefficient (must be 0 or greater)

l2_regularization_strength
  L2 regularisation coefficient (must be 0 or greater)

beta_1
  exponential decay rate for the 1st moment estimates

beta_2
  exponential decay rate for the 2nd moment estimates

amsgrad
  Boolean. Whether to apply the AMSGrad variant of this algorithm, from the paper "On the Convergence of Adam and Beyond". Defaults to FALSE.

learning_rate_power
  power on the learning rate; must be 0 or less

l2_shrinkage_regularization_strength
  A float value, must be greater than or equal to zero. This differs from l2_regularization_strength above in that the latter is a stabilisation penalty, whereas this L2 shrinkage is a magnitude penalty. When the input is sparse, shrinkage only happens on the active weights.

beta
  A float value representing the beta value from the paper by McMahan et al. (2013). Defaults to 0.

centered
  Boolean. If TRUE, gradients are normalized by the estimated variance of the gradient; if FALSE, by the uncentered second moment. Setting this to TRUE may help with training, but is slightly more expensive in terms of computation and memory. Defaults to FALSE.
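For instance (a sketch only, using tuning arguments listed above), tolerances and decay rates are changed by passing them to the optimiser constructors:

bfgs(tolerance = 1e-10)                                          # tighten the BFGS gradient tolerance
nelder_mead(func_tolerance = 1e-06, position_tolerance = 1e-06)  # loosen the Nelder-Mead stopping tolerances
adam(beta_1 = 0.95)                                              # slow the decay of Adam's first-moment estimate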
The optimisers powell(), cg(), newton_cg(), l_bfgs_b(), tnc(), cobyla(), and slsqp() are now defunct: they will error when called in greta 0.5.0, and have been removed because they are no longer available in TensorFlow 2.0. Note that the momentum() optimiser has been replaced by gradient_descent(); a sketch of the replacement is given below. Each of the remaining functions returns an optimiser object that can be passed to opt().
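As a sketch, the behaviour of the defunct momentum() optimiser can be recovered by giving gradient_descent() a non-zero momentum:

# gradient descent with (optionally Nesterov) momentum, replacing momentum()
gradient_descent(learning_rate = 0.01, momentum = 0.9, nesterov = FALSE)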
adagrad_da() isn't supported in TF2, so proceed with caution. See the TF docs on AdagradDAOptimizer for more detail.
proximal_gradient_descent() isn't supported in TF2, so proceed with caution. See the TF docs on ProximalGradientDescentOptimizer for more detail.
proximal_adagrad() isn't supported in TF2, so proceed with caution. See the TF docs on ProximalAdagradOptimizer for more detail.
## Not run:
# use optimisation to find the mean and sd of some data
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)
# configure optimisers & parameters via 'optimiser' argument to opt
opt_res <- opt(m, optimiser = bfgs())
# compare results with the analytic solution
opt_res$par
c(mean(x), sd(x))
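# the fitted object can also be inspected for diagnostics; the element names
# below are assumed from opt()'s own documentation (a list including
# iterations and convergence), not stated on this page
opt_res$iterations   # number of iterations the optimiser took
opt_res$convergence  # convergence code (0 is taken to indicate success)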
## End(Not run)