knitr::opts_chunk$set(echo = TRUE) library("hyper2",quietly=TRUE) set.seed(0)
We will numerically verify the gradientn()
function for two hyper2
objects.
The chess dataset is as follows:
chess
We will define two gradient functions: chess_gradient_theoretical()
,
which calculates the partial derivatives using straightforward algebra
(specifically elementary calculus, $\partial(x+y)^n/\partial
x=n(x+y)^{n-1}$), and chess_gradient_theoretical()
, which uses C
code:
chess_gradient_theoretical <- function(x){ Topalov <- x[1] Anand <- x[2] Karpov <- x[3] out <- c( Topalov = 30/Topalov -35/(Topalov+Anand) -18/(Topalov+Karpov), Anand = -35/(Topalov+Anand) +36/Anand -35/(Anand+Karpov), Karpov = -18/(Topalov+Karpov) -35/(Anand+Karpov) +22/Karpov ) names(out) <- pnames(chess) out } chess_gradient_numerical <- function(x){gradientn(chess,x)}
So in the above we have two ways of calculating the gardient: one theoretical and one numerical. Do they agree?
p_chess <- c( Topalov = 0.5, Anand = 0.3, Karpov = 0.2 ) chess_gradient_theoretical(p_chess) - chess_gradient_numerical(p_chess)
Yes, they agree
The icons
dataset is:
icons
We can do the same thing but the algebra is more involved:
icons_gradient_theoretical <- function(x){ NB <- x[1] L <- x[2] PB <- x[3] THC <- x[4] OA <- x[5] WAIS <- x[6] out <- c( NB = ( +32/NB -20/(NB + L + THC + OA) -15/(NB + L + THC + WAIS) -09/(NB + L + OA + WAIS) -18/(NB + PB + THC + OA) -18/(NB + PB + THC + WAIS) -08/(NB + PB + OA + WAIS) ), L = ( -20/(NB + L + THC + OA) -15/(NB + L + THC + WAIS) -09/(NB + L + OA + WAIS) +24/L -11/(L + PB + THC + OA) -16/(L + PB + THC + WAIS) -18/(L + PB + OA + WAIS) ), PB = ( -18/(NB + PB + THC + OA) -18/(NB + PB + THC + WAIS) -08/(NB + PB + OA + WAIS) -11/(L + PB + THC + OA) -16/(L + PB + THC + WAIS) -18/(L + PB + OA + WAIS) +30/PB ), THC = ( -20/(NB + L + THC + OA) -15/(NB + L + THC + WAIS) -18/(NB + PB + THC + OA) -18/(NB + PB + THC + WAIS) -11/(L + PB + THC + OA) -16/(L + PB + THC + WAIS) +24/THC ), OA = ( -20/(NB + L + THC + OA) -09/(NB + L + OA + WAIS) -18/(NB + PB + THC + OA) -08/(NB + PB + OA + WAIS) -11/(L + PB + THC + OA) -18/(L + PB + OA + WAIS) +14/OA ), WAIS = ( -15/(NB + L + THC + WAIS) -09/(NB + L + OA + WAIS) -18/(NB + PB + THC + WAIS) -08/(NB + PB + OA + WAIS) -16/(L + PB + THC + WAIS) -18/(L + PB + OA + WAIS) +9/WAIS ) ) names(out) <- pnames(icons) return(out) } icons_gradient_numerical <- function(x){gradientn(icons,x)}
Then
p_icons <- c(NB=0.3, L=0.1, PB=0.2, THC=0.15, OA=0.15, WAIS=0.1)
icons_gradient_theoretical(p_icons) icons_gradient_theoretical(p_icons) - icons_gradient_numerical(p_icons)
also agreeing.
gradientn()
and gradient()
Taking the chess dataset we have
rbind(indep(p_chess), gradient(chess,indep(p_chess))) rbind(p_chess,gradientn(chess,p_chess))
Indeed, because the powers of chess
sum to zero, we can test the
fact that the derivative in the direction parallel to p_chess
is
zero:
sum(powers(chess)) sum(p_chess*gradientn(chess,p_chess))
which is correct to numerical precision. Further,
sum(powers(icons)) sum(p_icons*gradientn(icons,p_icons))
Consider the icons
likelihood function; we wonder whether the
loglikelihood function has a well-defined maximum.
(M <- hessian(icons)) is_ok_hessian(M)
This would suggest that the maximum is defined; we may ask how sharp it is:
is_ok_hessian(M,give=TRUE)
suggesting a reasonably sharp constrained maximum point. We now demostrate that the Hessian matrix of second derivatives is numerically correct. First the unconstrained problem, working with the first five elements. The first step is to look at the likelihood, the gradient, and the Hessian, all evaluated at a certain point:
(p <- indep(equalp(icons))) # NB not the evaluate (G <- gradient(icons,p)) (M <- hessian(icons,p,border=FALSE)) eigen(M,TRUE,TRUE)$values
dp <- rnorm(5)*1e-4 # small perturbation dL <- loglik(p+dp,icons) - loglik(p,icons) # delta loglikelihood (exact) dL1 <- sum(gradient(icons,p)*dp) # first order approximation dL2 <- dL1 + t(dp) %*% M %*% dp/2 # second order; should be quad.form(dp,M) c(dL,dL1,dL2) c(first_order_error=dL-dL1,second_order_error=dL-dL2)
In the above, see how the first order approximation is pretty good: it is of ${\mathcal O}(f''(x)\delta x^2)$ which here would be about $1000\times 10^{-8}=10^{-5}$ which is pretty much what we observe [the 1000 is the typical size of the elements of the Hessian]. The second-order approximation is better, having a smaller error ... but not that much better. I would expect the error to be ${\mathcal O}(f'''(x)\delta x^3)$ but maybe the third order derivatives are larger than one might expect. We can perform a similar analysis but using bordered Hessians:
dpc <- fillup(dp,0) # constrained delta p MC <- hessian(icons,p,border=TRUE) dLC1 <- sum(gradientn(icons,fillup(p))*dpc) # first order approximation dLC2 <- dLC1 + t(dpc) %*% MC %*% dpc/2 # second order c(dL,dLC1,dLC2) c(first_order_error=dL-dLC1,second_order_error=dL-dLC2)
showing similar results.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.