global_envelope_test | R Documentation |
Global envelope test, global envelopes and p-values
global_envelope_test(
curve_sets,
typeone = c("fwer", "fdr"),
alpha = 0.05,
alternative = c("two.sided", "less", "greater"),
type = "erl",
algorithm = c("IATSE", "ATSE"),
ties = "erl",
probs = c(0.025, 0.975),
quantile.type = 7,
central = "mean",
nstep = 2,
...,
lower = NULL,
upper = NULL
)
curve_sets |
A |
typeone |
Character string indicating which type I error rate to control, either the family-wise error rate ('fwer') or false discovery rate ('fdr'). |
alpha |
The significance level. The 100(1-alpha)% global envelope will be calculated under the 'fwer' or 'fdr' control. If a vector of values is provided, the global envelopes are calculated for each value. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
type |
The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details. |
algorithm |
The algorithm for the computation of the FDR envelope. Either "IATSE" or "ATSE" standing for the iteratively adaptive two-stage envelope and the adaptive two-stage envelope, respectively, see Mrkvička and Myllymäki (2023). |
ties |
The method to obtain a unique p-value when |
probs |
A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). |
quantile.type |
As type argument of |
central |
Either "mean" or "median". If the curve sets do not contain the component
|
nstep |
1 or 2 for how to contruct a combined global envelope if list of curve sets is provided. 2 (default) for a two-step combining procedure, 1 for one-step. |
... |
Additional parameters to be passed to |
lower |
A single number (or a vector of suitable length) giving a lower bound for the functions. Used only for the extension of the FDR envelope. |
upper |
A single number (or a vector of suitable length) giving an upper bound for the functions. Used only for the extension of the FDR envelope. |
Given a curve_set
object,
or an envelope
object of spatstat,
which contains both the data curve (or function or vector) T_1(r)
(in the component obs
) and
the simulated curves T_2(r),\dots,T_{s+1}(r)
(in the component sim_m
),
the function global_envelope_test
performs a global envelope test,
either under the control of family-wise error rate (FWER) or false discovery rate (FDR),
as specified by the argument typeone
.
The function global_envelope_test
is the main function for global envelope tests
(for simple hypotheses).
The case typeone = "fdr"
corresponds to the FDR envelopes proposed by
Mrkvička and Myllymäki (2023). See details in fdr_envelope
and
in the vignette vignette("FDRenvelopes")
. Note there also the arguments that are
the relevant ones for the FDR envelope specification.
The descriptions below concern the FWER envelopes.
If typeone = "fwer"
, the functionality of the function is rather similar to the
function central_region
, but in addition to ordering the functions from
the most extreme one to the least extreme one using different measures
and providing the global envelopes with intrinsic
graphical interpretation, p-values are calculated for the test.
Thus, while central_region
can be used to construct global
envelopes in a general setting, the function global_envelope_test
is devoted to testing as its name suggests.
Different type
of global envelope tests under the control of FWER can be computed.
We use such ordering of the functions for which we are able to construct global
envelopes with intrinsic graphical interpretation (IGI, see Myllymäki and Mrkvička, 2023).
'rank'
: the completely non-parametric rank envelope test (Myllymäki et al., 2017)
based on minimum of pointwise ranks
'erl'
: the completely non-parametric rank envelope test based on extreme rank lengths
(Myllymäki et al., 2017; Mrkvička et al., 2018) based on number of minimal pointwise ranks
'cont'
: the completely non-parametric rank envelope test based on continuous rank
(Hahn, 2015; Mrkvička et al., 2022) based on minimum of continuous pointwise ranks
'area'
: the completely non-parametric rank envelope test based on area rank
(Mrkvička et al., 2022) based on area between continuous pointwise ranks and minimum
pointwise ranks for those argument (r) values for which pointwise ranks achieve the minimum
(it is a combination of erl and cont)
"qdir", the directional quantile envelope test, protected against unequal variance and asymmetry of T(r) for different distances r (Myllymäki et al., 2015, 2017)
"st", the studentised envelope test, protected against unequal variance of T(r) for different distances r (Myllymäki et al., 2015, 2017)
"unscaled", the unscaled envelope (providing a baseline) that has a contant width and that corresponds to the classical maximum deviation test (Ripley, 1981).
The first four types are global rank envelopes.
The 'rank'
envelope test is a completely non-parametric test,
which provides the 100(1-alpha)% global envelope for the chosen test function
T(r) on the chosen interval of distances and associated p-values.
The other three are modifications of 'rank'
to treat the ties in
the extreme rank ordering on which the 'rank'
test is based on.
The last three envelopes are global scaled maximum absolute difference (MAD)
envelope tests. The unscaled envelope test leads to envelopes with constant width over the
distances r. Thus, it suffers from unequal variance of T(r) over the distances r and
from the asymmetry of distribution of T(r). We recommend to use the other global
envelope tests available. The unscaled envelope is provided as a reference.
See Myllymäki and Mrkvička (2023, Appendix.), i.e. vignette("GET")
, for more detailed
description of the measures and the corresponding envelopes.
See vignette("pointpatterns")
for examples of point pattern analyses.
See vignette("FDRenvelopes")
for examples of FDR envelopes obtained by
typeone = "fdr"
.
Either an object of class "global_envelope" or "combined_global_envelope",
similarly as the objects returned by central_region
.
Further, if typeone = "fdr"
, the objects have the further class
"fdr_envelope" or "combined_fdr_envelope".
The global_envelope
is essentially a data frame containing columns
the values of the argument r at which the test was made, copied from the argument curve_sets
with the corresponding names
obs = values of the data function, copied from the argument curve_sets
(unlike for central regions, obs
always exists for a global envelope test)
lo = the lower envelope; in case of a vector of alpha values, several 'lo' exist with names paste0("lo.", 100*(1-alpha))
hi = the upper envelope; in case of a vector of alpha values, several 'lo' exist with names paste0("hi.", 100*(1-alpha))
central = a central curve as specified in the argument central
.
Moreover, the returned object has the same attributes as the global_envelope
object returned by
central_region
and in addition
p = the p-value of the test
and in the case that type = 'rank'
also
p_interval = The p-value interval [p_{liberal}, p_{conservative}]
.
ties = As the argument ties
.
The combined_global_envelope
is a list of global_envelope
objects
containing the above mentioned columns and which all together form the global envelope.
It has the same attributes as described in central_region
, and in addition also
the p-value p
.
The 2d classes are attached as described in central_region
.
1) First the curves are ranked from the most extreme one to the least extreme one
by a measure that is specified by the argument type
. The options are
'rank': extreme ranks (Myllymäki et al., 2017)
'erl': extreme rank lengths (Myllymäki et al., 2017; Mrkvička et al., 2018)
'cont': continuous ranks (Hahn, 2015; Mrkvička et al., 2019)
'area': area ranks (Mrkvička et al., 2019)
'qdir': the directional quantile maximum absolute deviation (MAD) measure (Myllymäki et al., 2015, 2017)
'st': the studentized MAD measure (Myllymäki et al., 2015, 2017)
'unscaled': the unscaled MAD measure (Ripley, 1981)
2) Based on the measures used to rank the functions, the 100(1-alpha)% global envelope is provided. It corresponds to the 100*coverage% central region.
3) P-values:
In the case type="rank"
, based on the extreme ranks k_i, i=1, ..., s+1
,
the p-interval is calculated. Because the extreme ranks contain ties, there is not just
one p-value. The p-interval is given by the most liberal and the most conservative p-value
estimate. Also a single p-value is calculated.
By default this single p-value is the extreme rank length p-value ("erl") as specified by the argument ties
.
If the case of other measures, a (single) p-value based on the given ordering
of the functions is calculated and returned in the attribute p
.
For the global "rank"
envelope test, Myllymäki et al. (2017) recommended to use
at least 2500 simulations for testing at the significance level alpha = 0.05 for single
function tests, based on experiments with summary functions for point processes evaluated
approximately at 500 argument values.
In this case, the width of the p-interval associated with the extreme rank measure tended
to be smaller than 0.01.
The tests 'erl'
, 'cont'
and 'area'
, similarly as
the MAD deviation/envelope tests 'qdir'
, 'st'
and 'unscaled'
,
allow in principle a lower number of simulations to be used than the test based on
extreme ranks ('rank'
), because no ties occur for these measures.
If affordable, we recommend in any case some thousands of simulations for all the measures
to achieve a good power and repeatability of the test.
If the dimension of the test functions is higher, also the number of simulations should
preferably be higher.
If a list of (suitable) objects are provided in the argument curve_sets
,
then by default (nstep = 2
) the two-step combining procedure is used to
perform the combined global test as described in Myllymäki and Mrkvička (2020).
If nstep = 1
and the lengths of the multivariate vectors in each component
of the list are equal, then the one-step combining procedure is used where the
functions are concatenated together into a one long vector.
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5), 1239-1255. doi: 10.1007/s11222-016-9683-9
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Mrkvička and Myllymäki (2023). False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381–404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2023). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.
plot.global_envelope
, central_region
,
GET.composite
# Goodness-of-fit testing for simple hypothesis
if(require("spatstat.explore", quietly=TRUE)) {
# Testing complete spatial randomness (CSR)
#==========================================
X <- unmark(spruces)
nsim <- 1999 # Number of simulations
# Illustration of general workflow for simple hypotheses
#=======================================================
# First illustrate the general workflow for the test by this example
# of CSR test for a point pattern X using the empirical L-function.
# Define the argument values at which the functions are evaluated
obs.L <- Lest(X, correction="translate")
r <- obs.L[['r']]
# The test function for the data
obs <- obs.L[['trans']] - r
# Prepare simulations and calculate test functions for them at same r as 'obs'
sim <- matrix(nrow=length(r), ncol=nsim)
for(i in 1:nsim) {
sim.X <- runifpoint(ex=X) # simulation under CSR
sim[, i] <- Lest(sim.X, correction="translate", r=r)[['trans']] - r
}
# Create a curve_set containing argument values, observed and simulated functions
cset <- curve_set(r=r, obs=obs, sim=sim)
# Perform the test
res <- global_envelope_test(cset, type="erl")
plot(res) + ggplot2::ylab(expression(italic(hat(L)(r)-r)))
# Simple hypothesis for a point pattern utilizing the spatstat package
#=====================================================================
# Generate nsim simulations under CSR, calculate L-function for the data and simulations
env <- envelope(X, fun="Lest", nsim=nsim,
savefuns=TRUE, # save the functions
correction="translate", # edge correction for L
transform=expression(.-r), # centering
simulate=expression(runifpoint(ex=X))) # Simulate CSR
# The rank envelope test (ERL)
res <- global_envelope_test(env, type="erl")
# Plot the result
plot(res)
## Advanced use:
# Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env')
cset <- crop_curves(env, r_min=1, r_max=7)
# Do the rank envelope test (erl)
res <- global_envelope_test(cset, type="erl")
plot(res) + ggplot2::ylab(expression(italic(L(r)-r)))
# A combined global envelope test
#================================
# As an example test CSR of the saplings point pattern by means of
# L, F, G and J functions.
data(saplings)
X <- as.ppp(saplings, W=square(75))
nsim <- 499 # Number of simulations
# Specify distances for different test functions
n <- 500 # the number of r-values
rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n
rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n
r <- seq(0, rmax, by=rstep) # r-distances for Lest
rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest
# Perform simulations of CSR and calculate the L-functions
env_L <- envelope(X, nsim=nsim,
simulate=expression(runifpoint(ex=X)),
fun="Lest", correction="translate",
transform=expression(.-r), # Take the L(r)-r function instead of L(r)
r=r, # Specify the distance vector
savefuns=TRUE, # Save the estimated functions
savepatterns=TRUE) # Save the simulated patterns
# Take the simulations from the returned object
simulations <- attr(env_L, "simpatterns")
# Then calculate the other test functions F, G, J for each simulated pattern
env_F <- envelope(X, nsim=nsim, simulate=simulations,
fun="Fest", correction="Kaplan", r=rJ,
savefuns=TRUE)
env_G <- envelope(X, nsim=nsim, simulate=simulations,
fun="Gest", correction="km", r=rJ,
savefuns=TRUE)
env_J <- envelope(X, nsim=nsim, simulate=simulations,
fun="Jest", correction="none", r=rJ,
savefuns=TRUE)
# Crop the curves to the desired r-interval I
curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax)
curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ)
curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ)
curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ)
res <- global_envelope_test(curve_sets=list(L=curve_set_L, F=curve_set_F,
G=curve_set_G, J=curve_set_J))
plot(res)
plot(res, labels=c("L(r)-r", "F(r)", "G(r)", "J(r)"))
}
# A test based on a low dimensional random vector
#================================================
# Let us generate some example data.
X <- matrix(c(-1.6,1.6),1,2) # data pattern X=(X_1,X_2)
if(requireNamespace("mvtnorm", quietly=TRUE)) {
Y <- mvtnorm::rmvnorm(200,c(0,0),matrix(c(1,0.5,0.5,1),2,2)) # simulations
plot(Y, xlim=c(min(X[,1],Y[,1]), max(X[,1],Y[,1])), ylim=c(min(X[,2],Y[,2]), max(X[,2],Y[,2])))
points(X, col=2)
# Test the null hypothesis is that X is from the distribution of Y's (or if it is an outlier).
# Case 1. The test vector is (X_1, X_2)
cset1 <- curve_set(r=1:2, obs=as.vector(X), sim=t(Y))
res1 <- global_envelope_test(cset1)
plot(res1)
# Case 2. The test vector is (X_1, X_2, (X_1-mean(Y_1))*(X_2-mean(Y_2))).
t3 <- function(x, y) { (x[,1]-mean(y[,1]))*(x[,2]-mean(y[,2])) }
cset2 <- curve_set(r=1:3, obs=c(X[,1],X[,2],t3(X,Y)), sim=rbind(t(Y), t3(Y,Y)))
res2 <- global_envelope_test(cset2)
plot(res2)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.