select_h | R Documentation |
This function computes the kernel bandwidth of the Gaussian kernel for the normality, two-sample and k-sample kernel-based quadratic distance (KBQD) tests.
select_h(
x,
y = NULL,
alternative = NULL,
method = "subsampling",
b = 0.8,
B = 100,
delta_dim = 1,
delta = NULL,
h_values = NULL,
Nrep = 50,
n_cores = 2,
Quantile = 0.95,
power.plot = TRUE
)
x |
Data set of observations from X. |
y |
Numeric matrix or vector of data values. Depending on the input
|
alternative |
Family of alternative chosen for selecting h, between "location", "scale" and "skewness". |
method |
The method used for critical value estimation ("subsampling", "bootstrap", or "permutation"). |
b |
The size of the subsamples used in the subsampling algorithm . |
B |
The number of iterations to use for critical value estimation, B = 150 as default. |
delta_dim |
Vector of coefficient of alternative with respect to each dimension |
delta |
Vector of parameter values indicating chosen alternatives |
h_values |
Values of the tuning parameter used for the selection |
Nrep |
Number of bootstrap/permutation/subsampling replications. |
n_cores |
Number of cores used to parallel the h selection algorithm. If this is not provided, the function will detect the available cores. |
Quantile |
The quantile to use for critical value estimation, 0.95 is the default value. |
power.plot |
Logical. If TRUE, it is displayed the plot of power for values in h_values and delta. |
The function performs the selection of the optimal value for the tuning
parameter h
of the normal kernel function, for normality test, the
two-sample and k-sample KBQD tests. It performs a small simulation study,
generating samples according to the family of alternative
specified,
for the chosen values of h_values
and delta
.
We consider target alternatives F_\delta(\hat{\mathbf{\mu}},
\hat{\mathbf{\Sigma}}, \hat{\mathbf{\lambda}})
, where
\hat{\mathbf{\mu}}, \hat{\mathbf{\Sigma}}
and
\hat{\mathbf{\lambda}}
indicate the location,
covariance and skewness parameter estimates from the pooled sample.
Compute the estimates of the mean \hat{\mu}
, covariance matrix
\hat{\Sigma}
and skewness \hat{\lambda}
from the pooled sample.
Choose the family of alternatives F_\delta = F_\delta(\hat{\mu}
,\hat{\Sigma}, \hat{\lambda})
.
For each value of \delta
and h
:
Generate \mathbf{X}_1,\ldots,\mathbf{X}_{k-1} \sim F_0
, for
\delta=0
;
Generate \mathbf{X}_k \sim F_\delta
;
Compute the k
-sample test statistic between \mathbf{X}_1,
\mathbf{X}_2, \ldots, \mathbf{X}_k
with kernel parameter h
;
Compute the power of the test. If it is greater than 0.5,
select h
as optimal value.
If an optimal value has not been selected, choose the h
which
corresponds to maximum power.
The available alternative
are
location alternatives, F_\delta =
SN_d(\hat{\mu} + \delta,\hat{\Sigma}, \hat{\lambda})
,with
\delta = 0.2, 0.3, 0.4
;
scale alternatives,
F_\delta = SN_d(\hat{\mu} ,\hat{\Sigma}*\delta, \hat{\lambda})
,
\delta = 0.1, 0.3, 0.5
;
skewness alternatives,
F_\delta = SN_d(\hat{\mu} ,\hat{\Sigma}, \hat{\lambda} + \delta)
,
with \delta = 0.2, 0.3, 0.6
.
The values of h = 0.6, 1, 1.4, 1.8, 2.2
and N=50
are set as
default values.
The function select_h()
allows the user to
set the values of \delta
and h
for a more extensive grid search.
We suggest to set a more extensive grid search when computational resources
permit.
A list with the following attributes:
h_sel
the selected value of tuning parameter h;
power
matrix of power values computed for the considered
values of delta
and h_values
;
power.plot
power plots (if power.plot
is TRUE
).
Please be aware that the select_h()
function may take a significant
amount of time to run, especially with larger datasets or when using an
larger number of parameters in h_values
and delta
. Consider
this when applying the function to large or complex data.
Markatou, M. and Saraceno, G. (2024). “A Unified Framework for
Multivariate Two- and k-Sample Kernel-based Quadratic Distance
Goodness-of-Fit Tests.”
https://doi.org/10.48550/arXiv.2407.16374
Saraceno, G., Markatou, M., Mukhopadhyay, R. and Golzy, M. (2024).
Goodness-of-Fit and Clustering of Spherical Data: the QuadratiK package
in R and Python.
https://arxiv.org/abs/2402.02290.
The function select_h
is used in the kb.test()
function.
# Select the value of h using the mid-power algorithm
x <- matrix(rnorm(100),ncol=2)
y <- matrix(rnorm(100),ncol=2)
h_sel <- select_h(x,y,"skewness")
h_sel
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.