test_msel: R Documentation
This function conducts various statistical tests and calculates confidence intervals for the parameters of a model estimated via the msel function.
test_msel(
object,
fn,
fn_args = list(),
test = "t",
method = "classic",
ci = "classic",
cl = 0.95,
se_type = "dm",
trim = 0,
vcov = object$cov,
iter = 100,
generator = rnorm,
bootstrap = NULL,
par_ind = 1:object$control_lnL$n_par,
eps = max(1e-04, sqrt(.Machine$double.eps) * 10),
n_sim = 1000,
n_cores = 1
)
object - an object of class 'msel'. It may also be a list of two such objects; in that case the function lrtest_msel is called internally (see the special case described in the Value section below).

fn - a function which returns a numeric vector and should depend on the elements of object.

fn_args - a list of additional arguments of fn.

test - a character representing the test to be used. If test = "t" then a t-test is used. If test = "wald" then a Wald test is used.

method - a character representing the method used to conduct a test. If method = "classic" then the classic version of the test is used. If method = "bootstrap" then the bootstrap version of the test is used. If method = "score" then the score bootstrap Wald test is used (available only if test = "wald").

ci - a character representing the type of the confidence interval used: "classic", "percentile", or "bc". Available only if test = "t".

cl - a numeric value between 0 and 1 representing the confidence level.

se_type - a character representing the method used to estimate the standard errors of the outputs of fn. If se_type = "dm" then the delta method is used. If se_type = "bootstrap" then the bootstrap is used.

trim - a numeric value between 0 and 1 representing the share of the most extreme bootstrap estimates to be nullified when se_type = "bootstrap" (see the Trimming section below).

vcov - an estimate of the asymptotic covariance matrix of the parameters of the model.

iter - the number of iterations used by the score bootstrap Wald test.

generator - a function which is used by the score bootstrap to generate random weights with zero mean and unit variance. It should have an argument n representing the number of weights to generate (as in rnorm).

bootstrap - an object of class 'bootstrap_msel' containing the bootstrap estimates of the parameters of the model.

par_ind - a vector of indexes of the model parameters used in the calculation of fn.

eps - a positive numeric value representing the increment used for the numeric differentiation of fn.

n_sim - the value passed to the n_sim argument of the predict method.

n_cores - the value passed to the n_cores argument of the predict method.
Suppose that \theta is a vector of parameters of the model estimated via the msel function and g(\theta) is a differentiable function representing fn which returns an m-dimensional vector of real values:

g(\theta) = (g_{1}(\theta),...,g_{m}(\theta))^{T}.
Classic and bootstrap t-test
If test = "t"
then for each j\in \{1,...,m\}
the following
hypotheses is tested:
H_{0}:g_{j}(\theta) = 0,\qquad H_{1}:g_{j}(\theta)\ne 0.
The test statistic is:
T = g_{j}(\hat{\theta})/\hat{\sigma}_{j},
where \hat{\sigma}
is a standard error of g_{j}(\hat{\theta})
.
If se_type = "dm"
then delta method is used to estimate
this standard error:
\hat{\sigma}_{j} = \sqrt{\nabla g_{j}(\hat{\theta})^{T}
\widehat{As.Cov}(\hat{\theta})
\nabla g_{j}(\hat{\theta})},
where \nabla g_{j}(\hat{\theta})
is a gradient as a column vector and
the estimate of the asymptotic covariance matrix of the estimates
\widehat{As.Cov}(\hat{\theta})
is provided via the vcov
argument. Numeric differentiation is used to calculate
\nabla g_{j}(\hat{\theta})
.
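For intuition, the delta-method standard error is easy to reproduce in a few lines of R. The sketch below is illustrative rather than the internal implementation of test_msel; the names delta_se and fn_par are hypothetical.

delta_se <- function(fn_par, est, V, eps = 1e-4) {
  # two-sided numeric gradient of the scalar function fn_par at est
  k    <- length(est)
  grad <- numeric(k)
  for (i in seq_len(k)) {
    up <- est; up[i] <- up[i] + eps
    dn <- est; dn[i] <- dn[i] - eps
    grad[i] <- (fn_par(up) - fn_par(dn)) / (2 * eps)
  }
  # delta-method standard error: sqrt(grad' V grad)
  sqrt(as.numeric(t(grad) %*% V %*% grad))
}
# toy usage: g(theta) = theta1 * theta2 with a diagonal covariance matrix
delta_se(function(p) p[1] * p[2], est = c(1, 2), V = diag(c(0.01, 0.04)))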
If se_type = "bootstrap"
then bootstrap is
applied to estimate the standard error:
\hat{\sigma}_{j} = \sqrt{\frac{1}{B - 1}\sum\limits_{b = 1}^{B}
(g_{j}(\hat{\theta}^{(b)}) - \overline{g_{j}(\hat{\theta}^{(b)}}))^2},
where B
is the number of the bootstrap iterations
bootstrap$iter
, \hat{\theta}^{(b)}
is the estimate
associated with the b
-th of these iterations bootstrap$par[b, ]
,
and g_{j}(\overline{\hat{\theta}^{(b)}})
is a sample mean of the bootstrap
estimates:
\overline{g_{j}(\hat{\theta}^{(b)}}) =
\frac{1}{B}\sum\limits_{b = 1}^{B}g_{j}(\hat{\theta}^{(b)}).
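In code the bootstrap standard error is simply the sample standard deviation of the bootstrap estimates. A minimal sketch, assuming a hypothetical B x m matrix g_boot whose b-th row stores g(\hat{\theta}^{(b)}):

set.seed(1)
B      <- 200
g_boot <- matrix(rnorm(B * 2), nrow = B)  # placeholder bootstrap values
se_hat <- apply(g_boot, 2, sd)            # sd() divides by B - 1, as in the formula
se_hat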
If method = "classic"
it is assumed that if the null hypothesis is
true then the asymptotic distribution of the test statistic is standard
normal. This distribution is used for the calculation of the p-value:
p-value = 2\min(\Phi(T), 1 - \Phi(T)),
where \Phi()
is a cumulative distribution function of the standard
normal distribution.
If method = "bootstrap"
then p-value is calculated via the bootstrap
as suggested by Hansen (2022):
p-value = \frac{1}{B}\sum\limits_{b=1}^{B} I(|T_{b} - T| > |T|),
where T_{b} = g_{j}(\hat{\theta}^{(b)})/\hat{\sigma}_{j}
is the value of
the test statistic estimated on the b
-th bootstrap sample and
I(q)
is an indicator function which equals 1
when q
is a
true statement and 0
- otherwise.
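Both p-values are easy to compute by hand. A short sketch with hypothetical values of the test statistic T and its bootstrap counterparts T_b:

T_stat <- 2.1                               # hypothetical test statistic
2 * min(pnorm(T_stat), 1 - pnorm(T_stat))   # classic p-value
T_boot <- rnorm(200, sd = 1.1)              # hypothetical bootstrap statistics
mean(abs(T_boot - T_stat) > abs(T_stat))    # bootstrap p-value (Hansen, 2022)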
Classic and bootstrap Wald test
Suppose that method = "classic" or method = "bootstrap". If test = "wald" then the null hypothesis is:

H_{0}:
\begin{cases}
g_{1}(\theta) = 0\\
g_{2}(\theta) = 0\\
\vdots\\
g_{m}(\theta) = 0\\
\end{cases}.

The alternative hypothesis is that there exists j\in\{1,...,m\} such that:

H_{1}:g_{j}(\theta)\ne 0.
The test statistic is:

T = g(\hat{\theta})^{T}\widehat{As.Cov}(g(\hat{\theta}))^{-1} g(\hat{\theta}),

where \widehat{As.Cov}(g(\hat{\theta})) is the estimate of the asymptotic covariance matrix of g(\hat{\theta}).

If se_type = "dm" then the delta method is used to estimate this matrix:

\widehat{As.Cov}(g(\hat{\theta})) = g'(\hat{\theta})\widehat{As.Cov}(\hat{\theta}) g'(\hat{\theta})^{T},

where g'(\hat{\theta}) is a Jacobian matrix. Numeric differentiation is used to calculate its elements:

g'(\hat{\theta})_{ij} = \frac{\partial g_{i}(\theta)}{\partial \theta_{j}}\Big|_{\theta = \hat{\theta}}.
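The same numeric differentiation idea extends to the Jacobian. Below is a hedged sketch of the delta-method Wald statistic; fn_par (vector-valued), est, and V are hypothetical inputs, not the package's internal names.

wald_stat <- function(fn_par, est, V, eps = 1e-4) {
  g <- fn_par(est)
  J <- matrix(0, nrow = length(g), ncol = length(est))
  for (i in seq_along(est)) {
    up <- est; up[i] <- up[i] + eps
    dn <- est; dn[i] <- dn[i] - eps
    J[, i] <- (fn_par(up) - fn_par(dn)) / (2 * eps)  # numeric Jacobian column
  }
  V_g <- J %*% V %*% t(J)                  # delta-method covariance of g
  as.numeric(t(g) %*% solve(V_g) %*% g)    # Wald statistic
}
# toy usage: H0 states that theta1 - 1 = 0 and theta2 = 0
T_stat <- wald_stat(function(p) c(p[1] - 1, p[2]), est = c(1.3, 0.2),
                    V = diag(c(0.04, 0.01)))
pchisq(T_stat, df = 2, lower.tail = FALSE)  # classic chi-squared p-value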
If se_type = "bootstrap"
then bootstrap is used to estimate this
matrix:
\widehat{As.Cov}(g(\hat{\theta}))=
\frac{1}{B-1}\sum\limits_{i=1}^{B}q_{b}q_{b}^{T},
where:
q_{b} = (g(\hat{\theta}^{(b)}) -
\overline{g(\hat{\theta}^{(b)})}),
\overline{g(\hat{\theta}^{(b)})} = \frac{1}{B}\sum\limits_{i=1}^{B}
g(\hat{\theta}^{(b)}).
If method = "classic"
then it is assumed that if the null hypothesis
is true then the asymptotic distribution of the test statistic is chi-squared
with m
degrees of freedom. This distribution is used for the
calculation of the p-value:
p-value = 1 - F_{m}(T),
where F_{m}
is a cumulative distribution function of the chi-squared
distribution with m
degrees of freedom.
If method = "bootstrap"
then p-value is calculated via the bootstrap
as suggested by Hansen (2022):
p-value = \frac{1}{B}\sum\limits_{b=1}^{B} I(T_{b} > T),
where:
T_{b} = s_{b}^{T}\widehat{As.Cov}(g(\hat{\theta}))^{-1}s_{b},
s_{b} = g(\hat{\theta}^{(b)}) - g(\hat{\theta}).
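A sketch of the bootstrap Wald covariance and p-value, again with a hypothetical matrix g_boot of bootstrap values of g and a hypothetical point estimate g_hat:

set.seed(1)
B      <- 200
g_boot <- matrix(rnorm(B * 2, sd = 0.1), nrow = B)  # rows: g(theta_hat^(b))
g_hat  <- c(0.15, -0.05)                            # g(theta_hat)
q      <- sweep(g_boot, 2, colMeans(g_boot))        # centered values q_b
V_g    <- crossprod(q) / (B - 1)                    # bootstrap covariance estimate
T_stat <- as.numeric(t(g_hat) %*% solve(V_g) %*% g_hat)
s      <- sweep(g_boot, 2, g_hat)                   # rows: s_b
T_boot <- rowSums((s %*% solve(V_g)) * s)           # statistics T_b per sample
mean(T_boot > T_stat)                               # bootstrap p-value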
Score bootstrap Wald test
If method = "score"
and test = "Wald"
then score bootstrap
Wald test of Kline and Santos (2012) is used.
Consider B
independent samples of n
independent identically
distributed random weights with zero mean and unit variance.
Let w_{ib}
denote the i
-th weight of the b
-th sample.
Argument generator
is used to supply a function which generates these
weights w_{ib}
and iter
argument represents B
.
Also n
is the number of observations in the model
object$other$n_obs
.
Let J
denote a matrix of sample scores object$J
.
Further, denote by J_{b}
a matrix such that its b
-th row is a
product of the w_{ib}
and the b
-th row of J
.
Also, denote by H
a matrix of mean values of the derivatives of
sample scores respect to the estimated parameters object$H
.
In addition consider the following notations:
A = g'(\theta) H^{-1}, \qquad S_{b} = A J^{(c)}_{b},
where J^{(c)}_{b}
is a vector of the column sums of J_{b}
.
The test statistic is as follows:
T = g(\hat{\theta})^{T}(A\widehat{Cov}(J)
A^{T})^{-1}g(\hat{\theta}) / n,
where \widehat{Cov}(J)
is a sample covariance matrix of the sample
scores of the model cov(object$J)
.
The test statistic on the b
-th bootstrap sample is similar:
T_{b} = S^{T}(A\widehat{Cov}(J_{b})
A^{T})^{-1}S / n.
The p-value is estimated as follows:
p-value = \frac{1}{B}\sum\limits_{b=1}^{B} I(T_{b} \geq T).
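The following sketch mimics one replication of the score bootstrap with placeholder inputs; in practice J, H, and g would come from object$J, object$H, and fn, and the Jacobian placeholder below is purely illustrative.

set.seed(1)
n <- 500; k <- 3
J     <- matrix(rnorm(n * k), nrow = n)   # placeholder sample scores (object$J)
H     <- diag(k)                          # placeholder derivative matrix (object$H)
A     <- matrix(rnorm(2 * k), nrow = 2) %*% solve(H)  # A = g'(theta) H^(-1)
g_hat <- c(0.1, -0.2)                     # placeholder value of g(theta_hat)
T_stat <- as.numeric(t(g_hat) %*% solve(A %*% cov(J) %*% t(A)) %*% g_hat) / n
# one bootstrap replication with standard normal weights (generator = rnorm)
w   <- rnorm(n)                           # weights w_ib for a fixed b
J_b <- J * w                              # scales the i-th row of J by w_ib
S_b <- A %*% colSums(J_b)                 # S_b = A J_b^(c)
T_b <- as.numeric(t(S_b) %*% solve(A %*% cov(J_b) %*% t(A)) %*% S_b) / n
c(T_stat, T_b)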
Confidence intervals
If test = "t"
then the function also returns the realizations of the
lower and upper bounds of the 100 \times
cl
percent symmetric
asymptotic confidence interval of g_{j}(\theta)
.
If ci = "classic"
then classic confidence interval is used which
assumes asymptotic normality of g_{j}(\hat{\theta})
:
(g_{j}(\hat{\theta}) + z_{(1 - cl) / 2}\hat{\sigma}_{j},
g_{j}(\hat{\theta}) + z_{1 - (1 - cl) / 2}\hat{\sigma}_{j}),
where z_{q}
is a q
-th quantile of the standard normal
distribution and cl
is a confidence level cl
. The method used
to estimate \hat{\sigma}_{j}
depends on the se_type
argument as
described above.
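For example, with cl = 0.95 the classic interval is the estimate plus or minus roughly 1.96 standard errors. A short sketch with hypothetical val and se:

val <- 0.4; se <- 0.1; cl <- 0.95
c(lwr = val + qnorm((1 - cl) / 2) * se,       # lower bound
  upr = val + qnorm(1 - (1 - cl) / 2) * se)   # upper bound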
If ci = "percentile"
then percentile bootstrap confidence interval
is used. Therefore the sample quantiles of g_{j}(\hat{\theta}^{(b)})
are used as the realizations of the lower and upper bounds of the confidence
interval.
If ci = "bc"
then bias corrected percentile bootstrap confidence
interval of Efron (1982) is used as described in Hansen (2022). The default
percentile bootstrap confidence interval uses sample quantiles of levels
(1 - cl)/2
and 1 - (1 - cl)/2
.
Bias corrected version uses the sample quantiles of the
following levels:
(1 - cl)/2 + \Phi(\Phi^{-1}((1 - cl)/2) + s),
1 - (1 - cl)/2 + \Phi(\Phi^{-1}(1 - (1 - cl)/2) + s),
where:
s = 2\Phi^{-1}(\frac{1}{B}\sum\limits_{b = 1}^{B}
I(g_{j}(\hat{\theta}^{(b)})\leq g_{j}(\hat{\theta}))).
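A hedged sketch of both bootstrap intervals for a single component g_j, with hypothetical bootstrap draws g_boot and point estimate g_hat:

set.seed(1)
B      <- 1000
g_boot <- rnorm(B, mean = 0.45, sd = 0.1)  # placeholder bootstrap values
g_hat  <- 0.4; cl <- 0.95
quantile(g_boot, probs = c((1 - cl) / 2, 1 - (1 - cl) / 2))  # percentile interval
s <- 2 * qnorm(mean(g_boot <= g_hat))      # bias correction term
quantile(g_boot, probs = pnorm(qnorm(c((1 - cl) / 2,         # bias corrected
                                       1 - (1 - cl) / 2)) + s))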
Trimming
If se_type = "bootstrap"
and trim > 0
then trimming is used as
described in Hansen (2022) to estimate \hat{\sigma}_{j}
and
\widehat{As.Cov}(g(\hat{\theta}))
. The algorithm is as follows.
First, nullify 100trim
percent of g(\hat{\theta}^{(b)})
with the
greatest values of the L2-norm of q_{b}
(defined above).
Then use this 'trimmed' sample to estimate the standard error and the
asymptotic covariance matrix.
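A sketch of one possible reading of this trimming step, where 'nullify' is taken to mean setting the most extreme centered draws q_b to zero (hypothetical inputs):

set.seed(1)
B      <- 200; trim <- 0.05
g_boot <- matrix(rnorm(B * 2), nrow = B)       # placeholder bootstrap values
q      <- sweep(g_boot, 2, colMeans(g_boot))   # centered values q_b
norms  <- sqrt(rowSums(q ^ 2))                 # L2-norm of each q_b
q[norms > quantile(norms, 1 - trim), ] <- 0    # nullify the most extreme draws
V_trim <- crossprod(q) / (B - 1)               # 'trimmed' covariance estimate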
This function returns an object of class 'test_msel' which is a list. It may have the following elements:

tbl - a list with the elements described below.

is_bootstrap - a logical value which equals TRUE if the bootstrap has been used.

is_ci - a logical value which equals TRUE if confidence intervals were used.

test - the same as the input argument test.

method - the same as the input argument method.

se_type - the same as the input argument se_type.

ci - the same as the input argument ci.

cl - the same as the input argument cl.

iter - the same as the input argument iter.

n_bootstrap - an integer representing the number of the bootstrap iterations used.

n_val - the length of the vector returned by fn.
The list tbl may have the following elements:

val - an output of the fn function.

se - a numeric vector such that se[i] represents a standard error associated with val[i].

p_value - a numeric vector of p-values.

lwr - a numeric vector such that lwr[i] is the realization of the lower (left) bound of the confidence interval for the true value of val[i].

upr - a numeric vector such that upr[i] is the realization of the upper (right) bound of the confidence interval for the true value of val[i].

stat - a numeric vector of values of the test statistics.

An object of class 'test_msel' has an implementation of the summary method: summary.test_msel.

In a special case when object is a list of length 2 the function returns an object of class 'lrtest_msel' since the function lrtest_msel is called internally.
B. Efron (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. Society for Industrial and Applied Mathematics.
B. Hansen (2022). Econometrics. Princeton University Press.
P. Kline, A. Santos (2012). A Score Based Approach to Wild Bootstrap Inference. Journal of Econometric Methods, vol. 1, no. 1, pages 23-41.
# -------------------------------
# CPS data example
# -------------------------------
# Set seed for reproducibility
set.seed(123)
# Upload the data
data(cps)
# Estimate the employment model
model <- msel(work ~ age + I(age ^ 2) + bachelor + master, data = cps)
summary(model)
# Use Wald test to test the hypothesis that age has no
# effect on the conditional probability of employment:
# H0: coef age = 0
# coef age ^ 2 = 0
age_fn <- function(object)
{
  work_coef <- coef(object, type = "coef")[[1]]
  val       <- c(work_coef["age"], work_coef["I(age^2)"])
  return(val)
}
age_test <- test_msel(object = model, fn = age_fn, test = "wald")
summary(age_test)
# Use t-test to test for each individual the hypothesis:
# P(work = 1 | x) = 0.8
prob_fn <- function(object)
{
prob <- predict(object, group = 1, type = "prob")
val <- prob - 0.8
return(val)
}
prob_test <- test_msel(object = model, fn = prob_fn, test = "t")
summary(prob_test)
# -------------------------------
# Simulated data example
# Model with continuous outcome
# and ordinal selection
# -------------------------------
# ---
# Step 1
# Simulation of the data
# ---
# Set seed for reproducibility
set.seed(123)
# Load required package
library("mnorm")
# The number of observations
n <- 10000
# Regressors (covariates)
s1 <- runif(n = n, min = -1, max = 1)
s2 <- runif(n = n, min = -1, max = 1)
s3 <- runif(n = n, min = -1, max = 1)
s4 <- runif(n = n, min = -1, max = 1)
# Random errors
sigma <- matrix(c(1, 0.4, 0.45, 0.7,
0.4, 1, 0.54, 0.8,
0.45, 0.54, 0.81, 0.81,
0.7, 0.8, 0.81, 1), nrow = 4)
errors <- mnorm::rmnorm(n = n, mean = c(0, 0, 0, 0), sigma = sigma)
u1 <- errors[, 1]
u2 <- errors[, 2]
eps0 <- errors[, 3]
eps1 <- errors[, 4]
# Coefficients
gamma1 <- c(-1, 2)
gamma2 <- c(1, 1)
gamma1_het <- c(0.5, -1)
beta0 <- c(1, -1, 1, -1.2)
beta1 <- c(2, -1.5, 0.5, 1.2)
# Linear index of the ordinal equations
# mean part
li1 <- gamma1[1] * s1 + gamma1[2] * s2
li2 <- gamma2[1] * s1 + gamma2[2] * s3
# variance part
li1_het <- gamma1_het[1] * s2 + gamma1_het[2] * s3
# Linear index of the continuous equation
# regime 0
li_y0 <- beta0[1] + beta0[2] * s1 + beta0[3] * s3 + beta0[4] * s4
# regime 1
li_y1 <- beta1[1] + beta1[2] * s1 + beta1[3] * s3 + beta1[4] * s4
# Latent variables
z1_star <- li1 + u1 * exp(li1_het)
z2_star <- li2 + u2
y0_star <- li_y0 + eps0
y1_star <- li_y1 + eps1
# Cuts
cuts1 <- c(-1)
cuts2 <- c(0, 1)
# Observable ordinal outcome
# first
z1 <- rep(0, n)
z1[z1_star > cuts1[1]] <- 1
# second
z2 <- rep(0, n)
z2[(z2_star > cuts2[1]) & (z2_star <= cuts2[2])] <- 1
z2[z2_star > cuts2[2]] <- 2
z2[z1 == 0] <- NA
# Observable continuous outcome
y <- rep(NA, n)
y[which(z2 == 0)] <- y0_star[which(z2 == 0)]
y[which(z2 != 0)] <- y1_star[which(z2 != 0)]
y[which(z1 == 0)] <- NA
# Data
data <- data.frame(s1 = s1, s2 = s2, s3 = s3, s4 = s4,
z1 = z1, z2 = z2, y = y)
# ---
# Step 2
# Estimation of the parameters
# ---
# Assign the groups
groups <- matrix(c(1, 2,
1, 1,
1, 0,
0, -1),
byrow = TRUE, ncol = 2)
groups2 <- matrix(c(1, 1, 0, -1), ncol = 1)
# Estimate the model
model <- msel(list(z1 ~ s1 + s2 | s2 + s3,
z2 ~ s1 + s3),
list(y ~ s1 + s3 + s4),
groups = groups, groups2 = groups2,
data = data)
# ---
# Step 3
# Hypotheses testing
# ---
# Use t-test to test for each observation the hypothesis
# H0: P(z1 = 0, z2 = 2 | Xi) = 0
prob02_fn <- function(object)
{
val <- predict(object, group = c(0, 2))
return(val)
}
prob02_test <- test_msel(object = model, fn = prob02_fn, test = "t")
summary(prob02_test)
# Use t-test to test the hypothesis
# H0: E(y1|z1=0, z2=2) - E(y0|z1=0, z2=2) = 0
ATE_fn <- function(object)
{
val1 <- predict(object, group = c(0, 2), group2 = 1)
val0 <- predict(object, group = c(0, 2), group2 = 0)
val <- mean(val1 - val0)
return(val)
}
ATE_test <- test_msel(object = model, fn = ATE_fn)
summary(ATE_test)
# Use Wald test to test the hypothesis
# H0: beta1 = beta0
coef_fn <- function(object)
{
coef1 <- coef(object, regime = 1, type = "coef2")
coef0 <- coef(object, regime = 0, type = "coef2")
coef_difference <- coef1 - coef0
return(coef_difference)
}
coef_test <- test_msel(object = model, fn = coef_fn, test = "wald")
summary(coef_test)
# Use t-test to test for each 'k' the hypothesis
# H0: beta1k = beta0k
coef_test2 <- test_msel(object = model, fn = coef_fn, test = "t")
summary(coef_test2)
# Use Wald test to test the hypothesis
# H0: beta11 + beta12 - 0.5 = 0
# beta11 * beta13 - beta03 = 0
test_fn <- function(object)
{
coef1 <- coef(object, regime = 1, type = "coef2")
coef0 <- coef(object, regime = 0, type = "coef2")
val <- c(coef1[1] + coef1[2] - 0.5,
coef1[1] * coef1[3] - coef0[3])
return(val)
}
# classic Wald test
wald1 <- test_msel(object = model, fn = test_fn,
test = "wald", method = "classic")
summary(wald1)
# score bootstrap Wald test
wald2 <- test_msel(object = model, fn = test_fn,
test = "wald", method = "score")
summary(wald2)
# Replicate the latter test with the 2-step estimator
model2 <- msel(list(z1 ~ s1 + s2 | s2 + s3,
z2 ~ s1 + s3),
list(y ~ s1 + s3 + s4),
groups = groups, groups2 = groups2,
data = data, estimator = "2step")
# classic Wald test
wald1_2step <- test_msel(object = model2, fn = test_fn,
test = "wald", method = "classic")
summary(wald1_2step)
# score bootstrap Wald test
wald2_2step <- test_msel(object = model2, fn = test_fn,
test = "wald", method = "score")
summary(wald2_2step)