knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(amp)

The test control arguments:

| Argument | Description |
| ------- | ---------- |
| n_peld_mc_samples | Number of samples used to approximate the estimated limiting distribution of the parameter estimate under the null. Increasing this value reduces the approximation error of the test statistic. |
| nrm_type | The type of norm to be used for the test. Generally the $\ell_p$ norm. |
| perf_meas | The preferred measure used to generate the test statistic. |
| pos_lp_norms | The index of the norms to be considered. For example, if we use the $\ell_p$ norm, this argument specifies the different values of $p$ to try. |
| ld_est_meth | String indicating the method for estimating the limiting distribution of the test statistic: parametric bootstrap or permutation. |
| ts_ld_bs_samp | The number of test statistic limiting distribution bootstrap samples to be drawn. |
| other_output | A vector indicating additional data that should be returned. Currently only "var_est" is supported. |
| ... | Other arguments needed in other places. |

Throughout, we will use a simple data generating mechanism:

x_data <- matrix(rnorm(500), ncol = 5)
y_data <- rnorm(100) + 0.02 * x_data[, 2]
obs_data <- data.frame(y_data, x_data)

Test statistic controls

Beyond the specification of the parameter estimator $\hat{\Psi}$ and the corresponding IC estimator $\hat{IC}$ (which are specified in the param_est argument), there are multiple options when defining a test statistic. Four arguments control these options.

perf_meas

The first argument, perf_meas, specifies the performance measure used to define the test statistic. Loosely defined, a performance measure is a function that provides information about the performance of a simple test at a specified alternative. It takes as arguments a norm $\varphi$, an alternative $x$, and a limiting distribution $P_0$, and considers the performance of the test defined by

$$ \text{reject if } \varphi\left(\hat{\psi}\right) > c_{\alpha} $$

if the parameter value $\psi$ were equal to $x$. Currently the package implements three such measures.
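To make the simple test in the rejection rule above concrete, here is a minimal base R sketch. It is not the amp implementation: the dimension `d`, the choice of the Euclidean norm for $\varphi$, and the use of independent standard normal draws as a stand-in for $P_0$ are all assumptions made for illustration.

```r
# Illustrative sketch only; not the amp implementation.
set.seed(1)
alpha <- 0.05
d <- 5  # dimension of the parameter (hypothetical)

# Norm phi: here the Euclidean (l_2) norm.
phi <- function(x) sqrt(sum(x^2))

# Assumed stand-in for draws from the limiting distribution P_0
# (independent standard normals; in practice P_0 is estimated from data).
null_draws <- matrix(rnorm(2000 * d), ncol = d)
null_norms <- apply(null_draws, 1, phi)

# c_alpha: the (1 - alpha) quantile of phi over the null draws.
c_alpha <- unname(quantile(null_norms, probs = 1 - alpha))

# The simple test: reject when phi(psi_hat) exceeds c_alpha.
simple_test <- function(psi_hat) phi(psi_hat) > c_alpha

reject_far <- simple_test(rep(3, d))   # an estimate far from the null
reject_zero <- simple_test(rep(0, d))  # an estimate exactly at the null
```

A performance measure then summarizes how such a test would behave if $\psi$ were equal to a given alternative $x$.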

Recommendation: Based on what we know currently, we recommend the multiplicative distance performance measure. The other measures can have limiting distributions that are highly concentrated near 0, which can cause issues when approximating the p-value of the test.

We will discuss specification of the norm in the next section. For more details on the procedure, including why performance measures are good for defining a test statistic, see "A general adaptive framework for multivariate point null testing".

Norm specification

Two arguments are used to specify the norm used in defining the test statistic. The first is nrm_type which can either be "ssq" or "lp". These norms are defined as:

The choice of $p$ is specified by the pos_lp_norms argument. If pos_lp_norms is assigned a single value, a non-adaptive version of the test will be performed. If instead pos_lp_norms is assigned multiple values, an adaptive test will be carried out. More information can be found in our paper. For the $\ell_p$ norm, it is possible to set $p = \infty$. To make this specification in R, include "max" in the vector of values assigned to pos_lp_norms.
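The following sketch shows how a vector of candidate norms that includes "max" can be read. The helper `lp_norm` is hypothetical and not part of amp, and it uses the textbook $\ell_p$ definition, so the package's exact scaling may differ.

```r
# Hypothetical helper, not part of amp: evaluates an l_p norm for one
# entry of a pos_lp_norms-style vector, where "max" stands for p = Inf.
lp_norm <- function(x, p) {
  if (identical(as.character(p), "max")) {
    return(max(abs(x)))
  }
  p <- as.numeric(p)
  sum(abs(x)^p)^(1 / p)
}

x <- c(3, -4)

# Mixing numbers with "max" coerces the whole vector to character,
# which is why the helper converts p back to numeric.
candidate_norms <- c(1, 2, "max")
norm_values <- vapply(candidate_norms, function(p) lp_norm(x, p), numeric(1))
```

Passing a single value corresponds to the non-adaptive test, while a vector such as `candidate_norms` corresponds to the adaptive test, which considers each candidate norm.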

ld_est_meth

The next argument we review specifies the method used to estimate the limiting distribution of the test statistic ($\Gamma(\hat{\psi}, \hat{P}_0)$). There are two options for this argument: a parametric bootstrap or a permutation approach.

Approximation controls

The next two controls specify the accuracy of the approximation of the testing procedure.

n_peld_mc_samples

To understand this control argument it is important to distinguish between our parameter estimator $\hat{\psi}$ and our test statistic, which is a function of $\hat{\psi}$ and the estimated limiting distribution of $\hat{\psi}$ under the null hypothesis (that $\psi = 0$), denoted by $\hat{P}_0$. Letting $\Gamma$ denote our performance measure, conditional on our observations, the true value of the test statistic is fixed and equal to $\Gamma(\hat{\psi}, \hat{P}_0)$.

The n_peld_mc_samples argument determines how accurately the test statistic $\Gamma(\hat{\psi}, \hat{P}_0)$ is approximated. The performance measure is frequently a function of $\hat{P}_0$ through some probability statement (see the perf_meas section for examples). To approximate these probabilities, a Monte Carlo (MC) approximation is used, and n_peld_mc_samples determines how many MC draws are taken.
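The effect of the number of MC draws on a probability approximation can be seen in a small base R sketch that does not depend on amp. The target probability, the dimension `d`, and the threshold `cutoff` are hypothetical choices made for illustration.

```r
set.seed(42)
d <- 5
cutoff <- 3  # hypothetical threshold for the norm

# Approximate P(||Z||_2 > cutoff) for Z ~ N(0, I_d) using n_draws MC samples.
mc_prob <- function(n_draws) {
  z <- matrix(rnorm(n_draws * d), ncol = d)
  mean(sqrt(rowSums(z^2)) > cutoff)
}

# Repeat each approximation 200 times to see how much it varies.
spread_10 <- sd(replicate(200, mc_prob(10)))
spread_1000 <- sd(replicate(200, mc_prob(1000)))

# Exact value for comparison: ||Z||^2 is chi-squared with d degrees of freedom.
exact_prob <- pchisq(cutoff^2, df = d, lower.tail = FALSE)
```

The spread across repetitions shrinks as the number of draws grows, which is the same trade-off between accuracy and computation that n_peld_mc_samples controls.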

Considering this argument in practice, note that the testing procedure only approximates the test statistic:

tc <- amp::test.control(n_peld_mc_samples = 50, pos_lp_norms = "2")
set.seed(10)
test_1 <- amp::mv_pn_test(obs_data = obs_data, param_est = amp::ic.pearson, 
                control = tc)
set.seed(20)
test_2 <- amp::mv_pn_test(obs_data = obs_data, param_est = amp::ic.pearson, 
                control = tc)
print(c(test_1$test_stat, test_2$test_stat))

In order to better approximate the test statistic, one may increase the value of this control argument:

mc_draw_values <- c(10, 50)
all_res <- list()
for (mc_draws in mc_draw_values) {
  set.seed(121)
  tc <- amp::test.control(n_peld_mc_samples = mc_draws, pos_lp_norms = 2, 
                          perf_meas = "est_acc")
  test_stat <- replicate(50, amp::mv_pn_test(obs_data = obs_data,
                            param_est = amp::ic.pearson,
                            control = tc)$test_stat)
  all_res[[as.character(mc_draws)]] <- 
    data.frame("mc_draws" = mc_draws, test_stat)
}
oldpar <- par(mfrow = c(1,2))
yl <- 25 
hist(all_res[[1]]$test_stat, main = "MC draws = 10",
     xlab = "Test Statistic", xlim = c(0, 1), ylim = c(0, yl), 
     breaks = seq(0, 1, 0.1)) 
hist(all_res[[2]]$test_stat, main = "MC draws = 50",
     xlab = "Test Statistic", xlim = c(0, 1),  ylim = c(0, yl), 
     breaks = seq(0, 1, 0.1))
par(oldpar)

ts_ld_bs_samp

The other parameter that determines the approximation accuracy of the testing procedure is ts_ld_bs_samp. This argument determines the number of draws taken from the estimated limiting distribution of $\Gamma(\hat{\psi}, \hat{P}_0)$. This differs from n_peld_mc_samples, which determines the accuracy of each of these draws and of the test statistic.
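As a rough guide to choosing ts_ld_bs_samp, consider the following back-of-the-envelope calculation. It assumes, for illustration only, that the p-value is approximated by the proportion of $B$ independent draws $T_1, \ldots, T_B$ from the estimated limiting distribution that are at least as extreme as the observed statistic $t$, and it ignores the error in each individual draw. Writing $p$ for the p-value that would be obtained with infinitely many draws, the binomial variance gives

$$ \hat{p}_B = \frac{1}{B} \sum_{b=1}^{B} 1\{T_b \text{ at least as extreme as } t\}, \qquad \mathrm{SE}\left(\hat{p}_B\right) = \sqrt{\frac{p(1 - p)}{B}} \le \frac{1}{2\sqrt{B}}. $$

Under these assumptions, $B = 100$ draws bound the standard error by $0.05$, while $B = 10000$ draws bound it by $0.005$.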

Controlling the output of mv_pn_test

The last argument determines the output of the mv_pn_test function. The standard output of the test function is a list containing the following:

other_output

other_output is a character vector. Currently other_output only provides the option of returning two additional output elements.



adam-s-elder/amp documentation built on July 17, 2022, 9:18 p.m.