# global_envelope_test: Global envelope test In myllym/GET: Global Envelope Tests

## Description

Global envelope test, p-values and global envelopes

## Usage

 1 2 3 global_envelope_test(curve_sets, type = "erl", alpha = 0.05, alternative = c("two.sided", "less", "greater"), ties = "erl", probs = c(0.025, 0.975), central = "mean") 

## Arguments

 curve_sets A curve_set (see create_curve_set) or an envelope object containing a data function and simulated functions. If an envelope object is given, it must contain the summary functions from the simulated patterns which can be achieved by setting savefuns = TRUE when calling envelope. Alternatively, a list of curve_set or envelope objects can be given. type The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details. alpha The significance level. The 100(1-alpha)% global envelope will be calculated. alternative A character string specifying the alternative hypothesis. Must be one of the following: "two.sided" (default), "less" or "greater". The last two options only available for types 'rank', 'erl', 'cont' and 'area'. ties The method to obtain a unique p-value when type = 'rank'. Possible values are 'midrank', 'random', 'conservative', 'liberal' and 'erl'. For 'conservative' the resulting p-value will be the highest possible. For 'liberal' the p-value will be the lowest possible. For 'random' the rank of the obs within the tied values is uniformly sampled so that the resulting p-value is at most the conservative option and at least the liberal option. For 'midrank' the mid-rank within the tied values is taken. For 'erl' the extreme rank length p-value is calculated. The default is 'erl'. probs A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). central Either "mean" or "median". If the curve sets do not contain the component theo for the theoretical central function, then the central function (used for plotting only) is calculated either as the mean or median of functions provided in the curve sets.

## Details

Given a curve_set (see create_curve_set for how to create such an object) or an envelope object, which contains both the data curve (or function or vector) T_1(r) (in the component obs) and the simulated curves T_2(r),...,T_(s+1)(r) (in the component sim_m), the function global_envelope_test performs a global envelope test. The functionality of the function is rather similar to the function central_region, but in addition to ordering the functions from the most extreme one to the least extreme one using different measures and providing the global envelopes with intrinsic graphical interpretation, p-values are calculated for the test. Thus, while central_region can be used to construct global envelopes in a general setting, the function global_envelope_test is devoted to testing as its name suggests.

The function global_envelope_test is the main function for global envelope tests using single functions (for simple hypotheses). Different type of global envelope tests can be performed. We use such ordering of the functions for which we are able to construct global envelopes with intrinsic graphical interpretation.

• 'rank': the completely non-parametric rank envelope test (Myllymäki et al., 2017) based on minimum of pointwise ranks

• 'erl': the completely non-parametric rank envelope test based on extreme rank lengths (Myllymäki et al., 2017; Mrkvička et al., 2018) based on number of minimal pointwise ranks

• 'cont': the completely non-parametric rank envelope test based on continuous rank (Hahn, 2015; Mrkvička and Narisetty, 2018) based on minimum of continuous pointwise ranks

• 'area': the completely non-parametric rank envelope test based on area rank (Mrkvička and Narisetty, 2018) based on area between continuous pointwise ranks and minimum pointwise ranks for those argument (r) values for which pointwise ranks achieve the minimum (it is a combination of erl and cont)

• "qdir", the directional quantile envelope test, protected against unequal variance and asymmetry of T(r) for different distances r (Myllymäki et al., 2015, 2017)

• "st", the studentised envelope test, protected against unequal variance of T(r) for different distances r (Myllymäki et al., 2015, 2017)

• "unscaled", the unscaled envelope (providing a baseline) that has a contant width and that corresponds to the classical maximum deviation test (Ripley, 1981).

See forder and central_region and the references for more detailed description of the measures and the corresponding envelopes.

## Value

An object of class "global_envelope", "envelope" and "fv" (see fv.object), which can be printed and plotted directly.

Essentially a data frame containing columns

• r = the vector of values of the argument r at which the test was made

• obs = values of the data function

• lo = the lower envelope based on the simulated functions

• hi = the upper envelope based on the simulated functions

• central = If the curve_set (or envelope object) contains a component 'theo', then this function is used as the central curve and returned in this component. Otherwise, the central_curve is the mean of the test functions T_i(r), i=2, ..., s+1. Used for visualization only.

Moreover, the return value has the same attributes as the object returned by central_region and in addition

• p = A point estimate for the p-value (default is the mid-rank p-value).

and in the case that type = 'rank' also

• p_interval = The p-value interval [p_liberal, p_conservative].

• ties = As the argument ties.

## Ranking of the curves

The options for measures to order the functions from the most extreme one to the least extreme one are given by the argument type: 'rank', 'erl', 'cont', 'area', 'qdir', 'st', 'unscaled'. The options are

• 'rank': extreme ranks (Myllymäki et al., 2017)

• 'erl': extreme rank lengths (Myllymäki et al., 2017; Mrkvička et al., 2018)

• 'cont': continuous ranks (Hahn, 2015; Mrkvička and Narisetty, 2018)

• 'area': area ranks (Mrkvička and Narisetty, 2018)

• 'qdir': the directional quantile maximum absolute deviation (MAD) measure (Myllymäki et al., 2015, 2017)

• 'st': the studentized MAD measure (Myllymäki et al., 2015, 2017)

• 'unscaled': the unscaled MAD measure (Ripley, 1981)

See more detailed description of the envelopes and measures in central_region and forder.

## Global envelope

Based on the measures used to rank the functions, the 100(1-alpha)% global envelope is provided. It corresponds to the 100*coverage% central region.

## P-values

In the case type="rank", based on the extreme ranks k_i, i=1, ..., s+1, the p-interval is calculated. Because the extreme ranks contain ties, there is not just one p-value. The p-interval is given by the most liberal and the most conservative p-value estimate. Also a single p-value is calculated. By default this single p-value is the extreme rank length p-value ("erl"), but another option can be used by specifying ties argument.

If the case type = "erl", the (single) p-value based on the extreme rank length ordering of the functions is calculated and returned in the attribute p. The same is done for other measures, the p-value always being correspondent to the chosen measure.

## Number of simulations

The tests 'erl', 'cont' and 'area', similarly as the MAD deviation/envelope tests 'qdir', 'st' and 'unscaled', allow in principle a lower number of simulations to be used than the test based on extreme ranks ('rank'), because no ties occur for these measures. However, if affordable, we recommend some thousands of simulations in any case to achieve a good power and repeatability of the test.

## References

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381–404. doi: 10.1111/rssb.12172

Mrkvička, T., Myllymäki, M. and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27 (5): 1239-1255. doi: 10.1007/s11222-016-9683-9

Mrkvička, T., Hahn, U. and Myllymäki, M. (2018). A one-way ANOVA test for functional data with graphical interpretation. arXiv:1612.03608 [stat.ME]

Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11: 19-34. doi: 10.1016/j.spasta.2014.11.004

Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.

plot.global_envelope, central_region, global_envelope_test_2d
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 if(require(spatstat, quietly=TRUE)) { ## Testing complete spatial randomness (CSR) #------------------------------------------- pp <- unmark(spruces) # Generate nsim simulations under CSR, calculate L-function for the data and simulations env <- envelope(pp, fun="Lest", nsim=1999, savefuns=TRUE, # save the functions correction="translate", # edge correction for L simulate=expression(runifpoint(pp$n, win=pp$window))) # Simulate CSR # The rank envelope test (extreme rank length (ERL) breaking of ties) res <- global_envelope_test(env) # Plot the result. # - The central curve is now obtained from env[['theo']], which is the # value of the L-function under the null hypothesis (L(r) = r). # - Three different plot styles are provided: # a) a basic style plot(res) # b) spatstat's style (requires spatstat) plot(res, plot_style="fv") # or ggplot2 style (requires ggplot2) plot(res, plot_style="ggplot2") ## Advanced use: # Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env') curve_set <- crop_curves(env, r_min=1, r_max=7) # For better visualisation, take the L(r)-r function curve_set <- residual(curve_set, use_theo=TRUE) # Do the rank envelope test (erl) res <- global_envelope_test(curve_set); plot(res, plot_style="ggplot2") ## Random labeling test #---------------------- mpp <- spruces # 1) Perform simulations under the random labelling hypothesis and calculate # the test function T(r) for the data pattern (mpp) and each simulation. # The command below specifies that the test function is T(r) = \hat{L}_mm(r), # which is an estimator of the mark-weighted L function, L_mm(r), # with translational edge correction. env <- envelope(mpp, fun=Kmark, nsim = 1999, f=function(m1, m2) { m1*m2 }, correction="translate", returnL=TRUE, simulate=expression(rlabel(mpp, permute=TRUE)), # Permute the marks savefuns=TRUE) # Save the functions # 2) # Crop curves to desired r-interval curve_set <- crop_curves(env, r_min=1.5, r_max=9.5) # Center the functions, i.e. take \hat{L}_mm(r)-T_0(r). # Below T_0(r) = \hat{L}(r) is the mean of simulated functions. # (This is only for visualization, does not affect the test result.) curve_set <- residual(curve_set) # 3) Do the rank envelope test res <- global_envelope_test(curve_set) # 4) Plot the test result plot(res, plot_style="ggplot2", ylab=expression(italic(L[m](r)-L(r)))) ## Not run: ## Goodness-of-fit test (typically conservative, see dg.global_envelope for adjusted tests) #----------------------------------------------- pp <- unmark(spruces) # Minimum distance between points in the pattern min(nndist(pp)) # Fit a model fittedmodel <- ppm(pp, interaction=Hardcore(hc=1)) # Hardcore process # Simulating Gibbs process by 'envelope' is slow, because it uses the MCMC algorithm #env <- envelope(fittedmodel, fun="Jest", nsim=999, savefuns=TRUE, # correction="none", r=seq(0, 4, length=500)) # Using direct algorihm can be faster, because the perfect simulation is used here. simulations <- NULL for(j in 1:999) { simulations[[j]] <- rHardcore(beta=exp(fittedmodel$coef[1]), R=fittedmodel$interaction$par$hc, W=pp$window) if(j%%10==0) cat(j, "...", sep="") } env <- envelope(pp, simulate=simulations, fun="Jest", nsim=length(simulations), savefuns=TRUE, correction="none", r=seq(0, 4, length=500)) curve_set <- crop_curves(env, r_min=1, r_max=3.5) res <- global_envelope_test(curve_set, type="erl"); plot(res, plot_style="ggplot2") ## End(Not run) # A combined global envelope test #-------------------------------- # As an example test CSR of the saplings point pattern by means of # L, F, G and J functions. data(saplings) X <- saplings nsim <- 499 # the number of simulations for the tests # Specify distances for different test functions n <- 500 # the number of r-values rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n r <- seq(0, rmax, by=rstep) # r-distances for Lest rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest # Perform simulations of CSR and calculate the L-functions system.time( env_L <- envelope(X, nsim=nsim, simulate=expression(runifpoint(X$n, win=X\$window)), fun="Lest", correction="translate", transform=expression(.-r), # Take the L(r)-r function instead of L(r) r=r, # Specify the distance vector savefuns=TRUE, # Save the estimated functions savepatterns=TRUE) ) # Save the simulated patterns # Take the simulations from the returned object simulations <- attr(env_L, "simpatterns") # Then calculate the other test functions F, G, J for each simulated pattern system.time( env_F <- envelope(X, nsim=nsim, simulate=simulations, fun="Fest", correction="Kaplan", r=rJ, savefuns=TRUE) ) system.time( env_G <- envelope(X, nsim=nsim, simulate=simulations, fun="Gest", correction="km", r=rJ, savefuns=TRUE) ) system.time( env_J <- envelope(X, nsim=nsim, simulate=simulations, fun="Jest", correction="none", r=rJ, savefuns=TRUE) ) # Crop the curves to the desired r-interval I curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax) curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ) curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ) curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ) res <- global_envelope_test(curve_sets=list(curve_set_L, curve_set_F, curve_set_G, curve_set_J)) plot(res, plot_style="ggplot2", labels=c("L(r)-r", "F(r)", "G(r)", "J(r)")) } ## A test based on a low dimensional random vector #------------------------------------------------- # Let us generate some example data. X <- matrix(c(-1.6,1.6),1,2) # data pattern X=(X_1,X_2) if(requireNamespace("mvtnorm", quietly=TRUE)) { Y <- mvtnorm::rmvnorm(200,c(0,0),matrix(c(1,0.5,0.5,1),2,2)) # simulations plot(Y, xlim=c(min(X[,1],Y[,1]), max(X[,1],Y[,1])), ylim=c(min(X[,2],Y[,2]), max(X[,2],Y[,2]))) points(X, col=2) # Test the null hypothesis is that X is from the distribution of Y's (or if it is an outlier). # Case 1. The test vector is (X_1, X_2) cset1 <- create_curve_set(list(r=1:2, obs=as.vector(X), sim_m=t(Y))) res1 <- global_envelope_test(cset1) plot(res1) # Case 2. The test vector is (X_1, X_2, (X_1-mean(Y_1))*(X_2-mean(Y_2))). t3 <- function(x, y) { (x[,1]-mean(y[,1]))*(x[,2]-mean(y[,2])) } cset2 <- create_curve_set(list(r=1:3, obs=c(X[,1],X[,2],t3(X,Y)), sim_m=rbind(t(Y), t3(Y,Y)))) res2 <- global_envelope_test(cset2) plot(res2) }