Grad (R Documentation)
Description

Computes numerical derivatives and gradients of scalar-valued functions using finite differences. This function supports both two-sided (central, symmetric) and one-sided (forward or backward) derivatives. It can utilise parallel processing to accelerate computation of gradients for slow functions or to attain higher accuracy faster.
Usage

Grad(
FUN,
x,
elementwise = NA,
vectorised = NA,
multivalued = NA,
deriv.order = 1L,
side = 0,
acc.order = 2,
stencil = NULL,
h = NULL,
zero.tol = sqrt(.Machine$double.eps),
h0 = NULL,
control = list(),
f0 = NULL,
cores = 1,
preschedule = TRUE,
cl = NULL,
func = NULL,
method = NULL,
method.args = list(),
...
)
Arguments

FUN: A function returning a numeric scalar or a vector whose derivatives are to be computed. If the function returns a vector, the output will be a Jacobian.
x: Numeric vector or scalar: the point(s) at which the derivative is estimated.
elementwise: Logical: is the domain effectively 1D, i.e. is this a mapping from R to R or an element-wise mapping from R^n to R^n? If NA, this property is determined automatically.
vectorised: Logical: if TRUE, FUN is assumed to accept a vector of inputs and return a vector of outputs of the same length. If NA, this property is determined automatically.
multivalued: Logical: if TRUE, FUN is assumed to return a vector of length greater than one, so the result is a Jacobian. If NA, this property is determined automatically.
deriv.order: Integer or vector of integers indicating the desired derivative order for each element of x.
side: Integer scalar or vector indicating the type of finite difference: 0 for central (two-sided), 1 for forward, and -1 for backward differences.
acc.order: Integer or vector of integers specifying the desired accuracy order for each element of x.
stencil: Optional custom vector of points for function evaluation. Must include at least m + 1 points for a derivative of order m.
h: Numeric or character specifying the step size(s) for the numerical difference, or a method of automatic step determination (such as "SW"; see gradstep()).
zero.tol: Small positive number: if abs(x) >= zero.tol, the automatically chosen step size is relative (proportional to x); otherwise, it is absolute.
h0: Numeric scalar or vector: initial step size for the automatic step search (e.g. via gradstep()).
control: A named list of tuning parameters passed to the automatic step-selection routine (gradstep()).
f0: Optional numeric: the value of FUN(x), if already known. If provided, it is used to determine the vectorisation type without an extra call, and it saves one evaluation whenever FUN(x) would have to be computed anyway (e.g. for second derivatives).
cores: Integer specifying the number of CPU cores used for parallel computation. Recommended value: the number of physical cores on the machine minus one.
preschedule: Logical: if TRUE, the evaluation jobs are pre-scheduled, i.e. split among the parallel workers in advance, which reduces communication overhead when individual evaluations are fast.
cl: An optional user-supplied cluster object (e.g. created with parallel::makeCluster()) on which the parallel evaluations are carried out.
func: For compatibility with numDeriv::grad(): an alias for FUN.
method: For compatibility with numDeriv::grad(): "simple" or "Richardson" to emulate that package's step-size choices (see Details).
method.args: For compatibility with numDeriv::grad(): a named list of arguments (such as eps, d, r, v) controlling the emulated method (see Details).
...: Ignored.
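As a brief illustration of how these arguments interact, consider the following sketch (the function, points, and step values are chosen for illustration only and are not package defaults):

f <- function(x) sum(exp(x))   # gradient is exp(x)
Grad(FUN = f, x = c(0, 1), side = 1, acc.order = 1, h = 1e-6)  # forward differences, fixed step
Grad(FUN = f, x = c(0, 1), side = 0, acc.order = 4)            # central differences, higher accuracy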
Details

This function aims to be 100% compatible with the syntax of numDeriv::grad(), but the step sizes may differ because some of the choices made in numDeriv are not consistent with theory.
There is one feature of the default step size in numDeriv that deserves an explanation. In that package (but not in pnd):

- If method = "simple", simple forward differences are used with a fixed step size eps, which we denote by \varepsilon.
- If method = "Richardson", central differences are used with a fixed step h := |d \cdot x| + \varepsilon \cdot \mathbb{1}(|x| < \mathrm{zero.tol}), where d = 1e-4 is the relative step size and eps becomes an extra addition to the step size for arguments that are closer to zero than zero.tol.

We believe that the latter may lead to mistakes when users believe that they can set the step size for near-zero arguments, whereas in reality, a combination of d and eps is used.
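For instance, the step implied by that rule can be computed directly; the following sketch uses the d and eps values quoted above and this package's default zero.tol (the sample points are arbitrary):

x <- c(0, 1e-12, 0.5, 2)       # two near-zero arguments, two regular ones
d <- 1e-4; eps <- 1e-4         # numDeriv-style relative and absolute components
zero.tol <- sqrt(.Machine$double.eps)
h <- abs(d * x) + eps * (abs(x) < zero.tol)
h  # near-zero arguments receive the eps addition; the others get a purely relative step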
Here is a synopsis of the old arguments:

- side: numDeriv uses NA for two-sided differences. The pnd equivalent is 0, and NA is replaced with 0.
- eps: If numDeriv method = "simple", then eps = 1e-4 is the absolute step size and forward differences are used. If method = "Richardson", then eps = 1e-4 is the absolute increment of the step size for small arguments below the zero tolerance.
- d: If numDeriv method = "Richardson", then d*abs(x) is the step size for arguments above the zero tolerance and the baseline step size for small arguments, which gets incremented by eps.
- r: The number of Richardson extrapolations that successively reduce the initial step size. For two-sided differences, each extrapolation increases the accuracy order by 2.
- v: The reduction factor in Richardson extrapolations.
Here are the differences in the new compatible implementation:

- eps: If numDeriv method = "simple", then the step ifelse(x != 0, abs(x), 1) * sqrt(.Machine$double.eps) * 2 is used because one-sided differences require a smaller step size to reduce the truncation error. If method = "Richardson", then eps = 1e-5.
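As a sketch of how these compatibility arguments might be used (the method.args values below are illustrative rather than defaults, and the exact step sizes they imply follow the rules described above):

f <- function(x) sum(sin(x))
g.simple <- Grad(FUN = f, x = 1:3, method = "simple")
g.rich   <- Grad(FUN = f, x = 1:3, method = "Richardson",
                 method.args = list(eps = 1e-5, d = 1e-4))
g.simple - g.rich  # both approximate cos(1:3); the differences are tiny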
Grad() does an initial check (if f0 = FUN(x) is not provided) and calls GenD() with an appropriate set of parameters (multivalued = FALSE if the check succeeds). In case of a parameter mismatch, an error is thrown.
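For example, if FUN(x) has already been computed elsewhere, passing it via f0 skips the initial evaluation (a small sketch):

f <- function(x) sum(sin(x))
x0 <- 1:4
fx <- f(x0)                      # value already available from earlier computations
Grad(FUN = f, x = x0, f0 = fx)   # reuses fx instead of evaluating f(x0) again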
Value

Numeric vector of the gradient. If FUN returns a vector, a warning is issued suggesting the use of Jacobian().
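For instance, a vector-valued FUN triggers that warning, and Jacobian() is the intended tool for such cases (a sketch; the argument names for Jacobian() are assumed to mirror those of Grad()):

fvec <- function(x) c(sum(sin(x)), sum(exp(x)))
Grad(FUN = fvec, x = 1:3)       # works, but warns that the output is a Jacobian
Jacobian(FUN = fvec, x = 1:3)   # preferred for vector-valued functions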
See also

GenD(), Jacobian()
Examples

f <- function(x) sum(sin(x))
g1 <- Grad(FUN = f, x = 1:4)
g2 <- Grad(FUN = f, x = 1:4, h = 7e-6)
g2 - g1 # Tiny differences due to different step sizes
g.auto <- Grad(FUN = f, x = 1:4, h = "SW")
print(g.auto)
attr(g.auto, "step.search")$exitcode # Success
# Gradients for vectorised functions -- e.g. leaky ReLU
LReLU <- function(x) ifelse(x > 0, x, 0.01*x)
Grad(LReLU, seq(-1, 1, 0.1))
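# A hedged sketch of a parallel call: worthwhile only for slow functions,
# and assuming at least two CPU cores are available on this machine.
slow.f <- function(x) {Sys.sleep(0.05); sum(sin(x))}
system.time(g.seq <- Grad(FUN = slow.f, x = 1:4))             # sequential
system.time(g.par <- Grad(FUN = slow.f, x = 1:4, cores = 2))  # parallel
g.seq - g.par  # same values; the parallel run should finish faster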