| pmnorm | R Documentation |
This function calculates and differentiates the probabilities of the (conditional) multivariate normal distribution.
pmnorm(
lower,
upper,
given_x = numeric(),
mean = numeric(),
sigma = matrix(),
given_ind = numeric(),
n_sim = 1000L,
method = "default",
ordering = "mean",
log = FALSE,
grad_lower = FALSE,
grad_upper = FALSE,
grad_sigma = FALSE,
grad_given = FALSE,
is_validation = TRUE,
control = NULL,
n_cores = 1L,
marginal = NULL,
grad_marginal = FALSE,
grad_marginal_prob = FALSE
)
lower |
numeric vector representing the lower integration limits.
If |
upper |
numeric vector representing the upper integration limits.
If |
given_x |
numeric vector whose |
mean |
numeric vector representing an expectation of the multivariate normal vector (distribution). |
sigma |
positively defined numeric matrix representing the covariance matrix of the multivariate normal vector (distribution). |
given_ind |
numeric vector representing indexes of a multivariate
normal vector which are conditioned on the values given by the
|
n_sim |
positive integer representing the number of draws from the
Richtmyer sequence in the GHK algorithm. More draws provide more accurate
results at the cost of additional computational burden. Alternative types of
sequences could be provided via the |
method |
string representing the method to be used to calculate the
multivariate normal probabilities. Possible options are |
ordering |
string representing the method to be used to order the integrals. See 'Details' section below. |
log |
logical; if |
grad_lower |
logical; if |
grad_upper |
logical; if |
grad_sigma |
logical; if |
grad_given |
logical; if |
is_validation |
logical value indicating whether input
arguments should be validated. Set it to |
control |
a list of control parameters. See Details. |
n_cores |
positive integer representing the number of CPU cores
used for parallel computing. Currently it is not recommended to set
|
marginal |
list such that |
grad_marginal |
logical; if |
grad_marginal_prob |
logical; if |
Consider notations from the Details sections of
cmnorm and dmnorm.
The function calculates probabilities of the form:
P\left(x^{(l)}\leq X_{I_{d}}\leq
x^{(u)}|X_{I_{g}}=x^{(g)}\right)
where x^{(l)} and x^{(u)} are lower and upper integration
limits, respectively, i.e., lower and
upper. Also x^{(g)} represents given_x.
Note that lower and upper should be matrices of the same size.
Also given_x should have the same number of rows as lower
and upper.
To calculate bivariate probabilities, the function applies the method of Drezner and Wesolowsky described in A. Genz (2004). In contrast to the classical implementation of this method, the function applies Gauss-Legendre quadrature with 30 sample points to approximate integral (1) of A. Genz (2004). Classical implementations of this method use up to 20 points but require some additional transformations of (1). During preliminary testing it has been found that the approach with 30 points provides similar accuracy being slightly faster because of better vectorization capabilities.
To calculate trivariate probabilities, the function uses the Drezner method following formula (14) of A. Genz (2004). Similarly to bivariate case, 30 points are used in Gauss-Legendre quadrature.
The function may apply the method of Gassmann (2003) for estimation of
m>3 dimensional normal probabilities. It uses the
matrix 5 representation of Gassmann (2003) and 30 points of
Gauss-Legendre quadrature.
For m-variate probabilities, where m > 1, the function may apply
the GHK algorithm described in Section 4.2 of A. Genz and F. Bretz (2009).
The implementation of GHK is based on the deterministic Halton sequence
with n_sim draws and uses variable reordering suggested in
Section 4.1.3 of A. Genz and F. Bretz (2009). The ordering algorithm may
be determined via the ordering argument. Available options
are "NO", "mean" (default), and "variance".
Univariate probabilities are always calculated via the standard approach so,
in this case method will not affect the output.
If method = "Gassmann", then the function uses fast (aforementioned)
algorithms for bivariate and trivariate probabilities or the method of
Gassmann for m>3 dimensional probabilities.
If method = "GHK", then the GHK algorithm is used.
If method = "default", then "Gassmann" is used for bivariate
and trivariate probabilities, while "GHK" is used for m>3
dimensional probabilities. During future updates, "Gassmann" may
become a default method for calculation of the 4-5 dimensional
probabilities.
We are going to provide alternative estimation algorithms during
future updates. These methods will be available via the method
argument.
The function is optimized to perform much faster when all upper integration
limits upper are finite, while all lower integration limits
lower are negative infinity. The derivatives could also be calculated
much faster when some integration limits are infinite.
For simplicity of the notations, further, let's consider
unconditioned probabilities. Derivatives with respect to conditioned
components are similar to those mentioned in the Details section
of dmnorm. We also provide formulas for m \geq 3.
But the function may calculate derivatives for m \leq 2 using some
simplifications of the formulas mentioned below.
If grad_upper is TRUE, then the function additionally
estimates the gradient with respect to upper:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(u)}_{i}}=
P\left(x^{(l)}_{(-i)}\leq X_{(-i)}\leq x^{(u)}_{(-i)}|
X_{i}=x^{(u)}_{i}\right)
f_{X_{i}}\left(x^{(u)}_{i};\mu_{i},\Sigma_{i,i}\right)
If grad_lower is TRUE then function additionally estimates the
gradient respect to lower:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(l)}_{i}}=
-P\left(x^{(l)}_{(-i)}\leq X_{(-i)}\leq x^{(u)}_{(-i)}|
X_{i}=x^{(l)}_{i}\right)
f_{X_{i}}\left(x^{(l)}_{i};\mu_{i},\Sigma_{i,i}\right)
If grad_sigma is TRUE then function additionally estimates the
gradient respect to sigma. For i\ne j, the function
calculates derivatives with respect to the covariances:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial \Sigma_{i, j}}=
=P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(u)}_{i}, X_{j}=x^{(u)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(u)}_{i}, x^{(u)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) -
-P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(l)}_{i}, X_{j}=x^{(u)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(l)}_{i}, x^{(u)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) -
-P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(u)}_{i}, X_{j}=x^{(l)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(u)}_{i}, x^{(l)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right) +
+P\left(x^{(l)}_{(-(i, j))}\leq X_{-(i, j)}\leq x^{(u)}_{(-(i, j))}|
X_{i}=x^{(l)}_{i}, X_{j}=x^{(l)}_{j}\right)
f_{X_{i}, X_{j}}\left(x^{(l)}_{i}, x^{(l)}_{j};
\mu_{(i,j)},\Sigma_{(i, j),(i, j)}\right)
Note that if some of the integration limits are infinite, then some elements of this equation converge to zero, which highly simplifies the calculations.
Derivatives with respect to variances are calculated using derivatives with respect to covariances and integration limits:
\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial \Sigma_{i, i}}=
-\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(l)}_{i}} \frac{x^{(l)}_{i}}{2\Sigma_{i, i}}
-\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial x^{(u)}_{i}} \frac{x^{(u)}_{i}}{2\Sigma_{i, i}}-
-\sum\limits_{j\ne i}\frac{\partial P\left(x^{(l)}\leq X\leq
x^{(u)}\right)}{\partial \Sigma_{i, j}}
\frac{\Sigma_{i, j}}{2\Sigma_{i, i}}
If grad_given is TRUE then function additionally estimates the
gradient respect to given_x using formulas similar to those
described in Details section of dmnorm.
More details on the aforementioned differentiation formulas can be found in the appendix of E. Kossova and B. Potanin (2018).
If marginal is not empty, then Gaussian copula will be used instead of
a classical multivariate normal distribution. Without loss of generality,
let's assume that \mu is a vector of zeros and introduce some
additional notations:
q_{i}^{(u)} = \Phi^{-1}\left(P_{i}\left(\frac{x_{i}^{(u)}}{\sigma_{i}}\right)\right)
q_{i}^{(l)} = \Phi^{-1}\left(P_{i}\left(\frac{x_{i}^{(l)}}{\sigma_{i}}\right)\right)
where \Phi(.)^{-1} is a quantile function of a standard
normal distribution and P_{i} is a cumulative distribution function
of the standardized (to zero mean and unit variance) marginal distribution
whose name and parameters are defined by
names(marginal)[i] and marginal[[i]], respectively.
For example, if marginal[i] = "logistic", then:
P_{i}(t) = \frac{1}{1+e^{-\pi t / \sqrt{3}}}
Let's denote by X^{*} a random vector which is distributed
with Gaussian (its covariance matrix is \Sigma) copula and
marginals defined by marginal.
Then probabilities for this random vector are calculated as follows:
P\left(x^{(l)}\leq X^{*}\leq
x^{(u)}\right) =
P\left(\sigma q^{(l)}\leq X\leq
\sigma q^{(u)}\right) = P_{0}\left(\sigma q^{(l)}, \sigma q^{(u)}\right)
where q^{(l)} = (q_{1}^{(l)},...,q_{m}^{(l)}),
q^{(u)} = (q_{1}^{(u)},...,q_{m}^{(u)}) and
\sigma = (\sqrt{\Sigma_{1, 1}},...,\sqrt{\Sigma_{m, m}}). Therefore
probabilities of X^{*} may be calculated using probabilities
of multivariate normal random vector X (with the same
covariance matrix) by
substituting lower and upper integration limits x^{(l)} and
x^{(u)} with \sigma q^{(l)} and \sigma q^{(u)},
respectively. Therefore differentiation formulas are similar to
those mentioned above and will be automatically adjusted if
marginal is not empty (including conditional probabilities).
Argument control is a list with the following input parameters:
random_sequence – numeric matrix of uniform pseudo random numbers
(like Halton sequence). The number of columns should be equal to
the number of dimensions of multivariate random vector. If omitted, then
this matrix will be generated automatically using n_sim number
of simulations.
This function returns an object of class "mnorm_pmnorm".
An object of class "mnorm_pmnorm" is a list containing the
following components:
prob - the probability that a multivariate normal random
variable will be between the lower and upper bounds.
grad_lower - the gradient of the probability with respect to
lower, if grad_lower or grad_sigma the input argument
is set to TRUE.
grad_upper - the gradient of the probability with respect to
upper, if grad_upper or grad_sigma input argument is
set to TRUE.
grad_sigma - the gradient with respect to the elements of
sigma, if grad_sigma input argument is set to TRUE.
grad_given - the gradient with respect to the elements of
given_x, if grad_given input argument is set to TRUE.
grad_marginal - the gradient with respect to the elements of
marginal_par, if grad_marginal input argument is set
to TRUE. Currently only the derivatives with respect to the parameters
of the "PGN" distribution are available.
If log is TRUE, then prob is a log-probability,
so the output grad_lower, grad_upper, grad_sigma, and
grad_given are calculated with respect to the log-probability.
The output grad_lower and grad_upper are Jacobian matrices
whose rows are the gradients of the probabilities calculated for each row of
lower and upper, respectively. Similarly, grad_given
is a Jacobian matrix with respect to given_x.
The output grad_sigma is a 3D array such that
grad_sigma[i, j, k] is a partial derivative of the probability
function with respect to the sigma[i, j] estimated for
the k-th observation.
Output grad_marginal is a list such that grad_marginal[[i]]
is a Jacobian matrix whose rows are the gradients of the probabilities
calculated for each row of lower and upper, respectively,
with respect to the elements of marginal_par[[i]].
Genz, A. (2004), Numerical computation of rectangular bivariate and trivariate normal and t-probabilities, Statistics and Computing, 14, 251-260.
Genz, A. and Bretz, F. (2009), Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol. 195. Springer-Verlag, Heidelberg.
E. Kossova, B. Potanin (2018). Heckman method and switching regression model multivariate generalization. Applied Econometrics, vol. 50, pages 114-143.
H. I. Gassmann (2003). Multivariate Normal Probabilities: Implementing an Old Idea of Plackett's. Journal of Computational and Graphical Statistics, vol. 12 (3), pages 731-752.
# Consider a multivariate normal vector:
# X = (X1, X2, X3, X4, X5) ~ N(mean, sigma)
# Prepare the multivariate normal vector parameters
# the expected value
mean <- c(-2, -1, 0, 1, 2)
n_dim <- length(mean)
# the correlation matrix
cor <- c( 1, 0.1, 0.2, 0.3, 0.4,
0.1, 1, -0.1, -0.2, -0.3,
0.2, -0.1, 1, 0.3, 0.2,
0.3, -0.2, 0.3, 1, -0.05,
0.4, -0.3, 0.2, -0.05, 1)
cor <- matrix(cor, ncol = n_dim, nrow = n_dim, byrow = TRUE)
# the covariance matrix
sd_mat <- diag(c(1, 1.5, 2, 2.5, 3))
sigma <- sd_mat %*% cor %*% t(sd_mat)
# Estimate probability:
# P(-3 < X1 < 1, -2.5 < X2 < 1.5, -2 < X3 < 2, -1.5 < X4 < 2.5, -1 < X5 < 3)
lower <- c(-3, -2.5, -2, -1.5, -1)
upper <- c(1, 1.5, 2, 2.5, 3)
p.list <- pmnorm(lower = lower, upper = upper,
mean = mean, sigma = sigma)
p <- p.list$prob
print(p)
# Additionally estimate a probability
# P(-Inf < X1 < Inf, 0 < X2 < Inf, -Inf < X3 < 3, 1 < X4 < 4, -Inf < X5 < 5)
lower.1 <- c(-Inf, 0, -Inf, 1, -Inf)
upper.1 <- c(Inf, Inf, 3, 4, 5)
lower.mat <- rbind(lower, lower.1)
upper.mat <- rbind(upper, upper.1)
p.list.1 <- pmnorm(lower = lower.mat, upper = upper.mat,
mean = mean, sigma = sigma)
p.1 <- p.list.1$prob
print(p.1)
# Estimate the probabilities
# P(-1 < X1 < 1, -3 < X3 < 3, -5 < X5 < 5 | X2 = -2, X4 = 4)
lower.2 <- c(-1, -3, -5)
upper.2 <- c(1, 3, 5)
given_ind <- c(2, 4)
given_x <- c(-2, 4)
p.list.2 <- pmnorm(lower = lower.2, upper = upper.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x)
p.2 <- p.list.2$prob
print(p.2)
# Additionally estimate the probability
# P(-Inf < X1 < 1, -3 < X3 < Inf, -Inf < X5 < Inf | X2 = 4, X4 = -2)
lower.3 <- c(-Inf, -3, -Inf)
upper.3 <- c(1, Inf, Inf)
given_x.1 <- c(-2, 4)
lower.mat.2 <- rbind(lower.2, lower.3)
upper.mat.2 <- rbind(upper.2, upper.3)
given_x.mat <- rbind(given_x, given_x.1)
p.list.3 <- pmnorm(lower = lower.mat.2, upper = upper.mat.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x.mat)
p.3 <- p.list.3$prob
print(p.3)
# Estimate the gradient of the previous probabilities with respect to
# various arguments
p.list.4 <- pmnorm(lower = lower.mat.2, upper = upper.mat.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x.mat,
grad_lower = TRUE, grad_upper = TRUE,
grad_sigma = TRUE, grad_given = TRUE)
p.4 <- p.list.4$prob
print(p.4)
# Gradient with respect to 'lower'
grad_lower <- p.list.4$grad_lower
# for the first probability
print(grad_lower[1, ])
# for the second probability
print(grad_lower[2, ])
# Gradient with respect to 'upper'
grad_upper <- p.list.4$grad_upper
# for the first probability
print(grad_upper[1, ])
# for the second probability
print(grad_upper[2, ])
# Gradient with respect to 'given_x'
grad_given <- p.list.4$grad_given
# for the first probability
print(grad_given[1, ])
# for the second probability
print(grad_given[2, ])
# Gradient with respect to 'sigma'
grad_sigma <- p.list.4$sigma
print(grad_sigma)
# Compare analytical gradients from the previous example with
# their numeric (forward difference) analogues for the first probability
n_dependent <- length(lower.2)
n_given <- length(given_x)
n_dim <- n_dependent + n_given
delta <- 1e-6
grad_lower.num <- rep(NA, n_dependent)
grad_upper.num <- rep(NA, n_dependent)
grad_given.num <- rep(NA, n_given)
grad_sigma.num <- matrix(NA, nrow = n_dim, ncol = n_dim)
for (i in 1:n_dependent)
{
# with respect to lower
lower.delta <- lower.2
lower.delta[i] <- lower.2[i] + delta
p.list.delta <- pmnorm(lower = lower.delta, upper = upper.2,
given_x = given_x,
mean = mean, sigma = sigma,
given_ind = given_ind)
grad_lower.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
# with respect to upper
upper.delta <- upper.2
upper.delta[i] <- upper.2[i] + delta
p.list.delta <- pmnorm(lower = lower.2, upper = upper.delta,
given_x = given_x,
mean = mean, sigma = sigma,
given_ind = given_ind)
grad_upper.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
for (i in 1:n_given)
{
# with respect to lower
given_x.delta <- given_x
given_x.delta[i] <- given_x[i] + delta
p.list.delta <- pmnorm(lower = lower.2, upper = upper.2,
given_x = given_x.delta,
mean = mean, sigma = sigma,
given_ind = given_ind)
grad_given.num[i] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
for (i in 1:n_dim)
{
for(j in 1:n_dim)
{
# with respect to sigma
sigma.delta <- sigma
sigma.delta[i, j] <- sigma[i, j] + delta
sigma.delta[j, i] <- sigma[j, i] + delta
p.list.delta <- pmnorm(lower = lower.2, upper = upper.2,
given_x = given_x,
mean = mean, sigma = sigma.delta,
given_ind = given_ind)
grad_sigma.num[i, j] <- (p.list.delta$prob - p.list.4$prob[1]) / delta
}
}
# Comparison of gradients with respect to the lower integration limits
h.lower <- cbind(analytical = p.list.4$grad_lower[1, ],
numeric = grad_lower.num)
print(h.lower)
# Comparison of gradients with respect to the upper integration limits
h.upper <- cbind(analytical = p.list.4$grad_upper[1, ],
numeric = grad_upper.num)
print(h.upper)
# Comparison of gradients with respect to the given values
h.given <- cbind(analytical = p.list.4$grad_given[1, ],
numeric = grad_given.num)
print(h.given)
# Comparison of gradients with respect to the covariance matrix
h.sigma <- list(analytical = p.list.4$grad_sigma[, , 1],
numeric = grad_sigma.num)
print(h.sigma)
# Let's again estimate the probability
# P(-1 < X1 < 1, -3 < X3 < 3, -5 < X5 < 5 | X2 = -2, X4 = 4)
# But assume that the variables are standardized to zero mean
# and unit variance:
# 1) X1 and X2 have standardized PGN distribution with coefficients vectors
# pc1 = (0.5, -0.2, 0.1) and pc2 = (0.2, 0.05), respectively.
# 2) X3 has a standardized Student distribution with 5 degrees of freedom
# 3) X4 has a standardized logistic distribution
# 4) X5 has a standard normal distribution
marginal <- list(PGN = c(0.5, -0.2, 0.1), hpa = c(0.2, 0.05),
student = 5, logistic = numeric(), normal = NULL)
p.list.5 <- pmnorm(lower = lower.2, upper = upper.2,
mean = mean, sigma = sigma,
given_ind = given_ind, given_x = given_x,
grad_lower = TRUE, grad_upper = TRUE,
grad_sigma = TRUE, grad_given = TRUE,
marginal = marginal, grad_marginal = TRUE)
# Let's investigate derivatives with respect to the parameters
# of marginal distributions
# with respect to pc1 of X1
p.list.5$grad_marginal[[1]]
# with respect to pc2 of X2
p.list.5$grad_marginal[[2]]
# derivative with respect to the degrees of freedom of X5 is
# currently unavailable and will be set to 0
p.list.5$grad_marginal[[3]]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.