global_envelope_test: Global envelope test

Description Usage Arguments Details Value Ranking of the curves Global envelope P-values Number of simulations References See Also Examples

View source: R/envelopes.r

Description

Global envelope test, p-values and global envelopes

Usage

1
2
3
global_envelope_test(curve_sets, type = "erl", alpha = 0.05,
  alternative = c("two.sided", "less", "greater"), ties = "erl",
  probs = c(0.025, 0.975), central = "mean")

Arguments

curve_sets

A curve_set (see create_curve_set) or an envelope object containing a data function and simulated functions. If an envelope object is given, it must contain the summary functions from the simulated patterns which can be achieved by setting savefuns = TRUE when calling envelope. Alternatively, a list of curve_set or envelope objects can be given.

type

The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details.

alpha

The significance level. The 100(1-alpha)% global envelope will be calculated.

alternative

A character string specifying the alternative hypothesis. Must be one of the following: "two.sided" (default), "less" or "greater". The last two options only available for types 'rank', 'erl', 'cont' and 'area'.

ties

The method to obtain a unique p-value when type = 'rank'. Possible values are 'midrank', 'random', 'conservative', 'liberal' and 'erl'. For 'conservative' the resulting p-value will be the highest possible. For 'liberal' the p-value will be the lowest possible. For 'random' the rank of the obs within the tied values is uniformly sampled so that the resulting p-value is at most the conservative option and at least the liberal option. For 'midrank' the mid-rank within the tied values is taken. For 'erl' the extreme rank length p-value is calculated. The default is 'erl'.

probs

A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017).

central

Either "mean" or "median". If the curve sets do not contain the component theo for the theoretical central function, then the central function (used for plotting only) is calculated either as the mean or median of functions provided in the curve sets.

Details

Given a curve_set (see create_curve_set for how to create such an object) or an envelope object, which contains both the data curve (or function or vector) T_1(r) (in the component obs) and the simulated curves T_2(r),...,T_(s+1)(r) (in the component sim_m), the function global_envelope_test performs a global envelope test. The functionality of the function is rather similar to the function central_region, but in addition to ordering the functions from the most extreme one to the least extreme one using different measures and providing the global envelopes with intrinsic graphical interpretation, p-values are calculated for the test. Thus, while central_region can be used to construct global envelopes in a general setting, the function global_envelope_test is devoted to testing as its name suggests.

The function global_envelope_test is the main function for global envelope tests using single functions (for simple hypotheses). Different type of global envelope tests can be performed. We use such ordering of the functions for which we are able to construct global envelopes with intrinsic graphical interpretation.

See forder and central_region and the references for more detailed description of the measures and the corresponding envelopes.

Value

An object of class "global_envelope", "envelope" and "fv" (see fv.object), which can be printed and plotted directly.

Essentially a data frame containing columns

Moreover, the return value has the same attributes as the object returned by central_region and in addition

and in the case that type = 'rank' also

Ranking of the curves

The options for measures to order the functions from the most extreme one to the least extreme one are given by the argument type: 'rank', 'erl', 'cont', 'area', 'qdir', 'st', 'unscaled'. The options are

See more detailed description of the envelopes and measures in central_region and forder.

Global envelope

Based on the measures used to rank the functions, the 100(1-alpha)% global envelope is provided. It corresponds to the 100*coverage% central region.

P-values

In the case type="rank", based on the extreme ranks k_i, i=1, ..., s+1, the p-interval is calculated. Because the extreme ranks contain ties, there is not just one p-value. The p-interval is given by the most liberal and the most conservative p-value estimate. Also a single p-value is calculated. By default this single p-value is the extreme rank length p-value ("erl"), but another option can be used by specifying ties argument.

If the case type="erl", the (single) p-value based on the extreme rank length ordering of the functions is calculated and returned in the attribute p. The same is done for other measures, the p-value always being correspondent to the chosen measure.

Number of simulations

The tests 'erl', 'cont' and 'area', similarly as the MAD deviation/envelope tests 'qdir', 'st' and 'unscaled', allow in principle a lower number of simulations to be used than the test based on extreme ranks ('rank'), because no ties occur for these measures. However, if affordable, we recommend some thousands of simulations in any case to achieve a good power and repeatability of the test.

References

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381–404. doi: 10.1111/rssb.12172

Mrkvička, T., Myllymäki, M. and Hahn, U. (2017). Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27 (5): 1239-1255. doi: 10.1007/s11222-016-9683-9

Mrkvička, T., Hahn, U. and Myllymäki, M. (2018). A one-way ANOVA test for functional data with graphical interpretation. arXiv:1612.03608 [stat.ME]

Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11: 19-34. doi: 10.1016/j.spasta.2014.11.004

Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.

See Also

plot.global_envelope

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
if(require(spatstat, quietly=TRUE)) {
  ## Testing complete spatial randomness (CSR)
  #-------------------------------------------
  pp <- unmark(spruces)
  # Generate nsim simulations under CSR, calculate L-function for the data and simulations
  env <- envelope(pp, fun="Lest", nsim=1999,
                  savefuns=TRUE, # save the functions
                  correction="translate", # edge correction for L
                  simulate=expression(runifpoint(pp$n, win=pp$window))) # Simulate CSR
  # The rank envelope test (extreme rank length (ERL) breaking of ties)
  res <- global_envelope_test(env)
  # Plot the result.
  # - The central curve is now obtained from env[['theo']], which is the
  # value of the L-function under the null hypothesis (L(r) = r).
  plot(res)
  # or (requires R library ggplot2)
  plot(res, plot_style="ggplot2")

  ## Advanced use:
  # Choose the interval of distances [r_min, r_max] (at the same time create a curve_set from 'env')
  curve_set <- crop_curves(env, r_min = 1, r_max = 7)
  # For better visualisation, take the L(r)-r function
  curve_set <- residual(curve_set, use_theo = TRUE)
  # Do the rank envelope test (erl)
  res <- global_envelope_test(curve_set); plot(res, plot_style="ggplot2")

  ## Random labeling test
  #----------------------
  if(require(marksummary, quietly=TRUE)) {
  mpp <- spruces
  # 1) Perform simulations under the random labelling hypothesis and calculate
  # the test function T(r) for the data pattern (mpp) and each simulation.
  # The command below specifies that the test function is T(r) = \hat{L}_m(r),
  # which is an estimator of the mark-weighted L function, L_m(r),
  # with translational edge correction (default).
  # The random_labelling function returns the centred functions \hat{L}_m(r)-T_0(r),
  # where T_0(r) = \hat{L}(r) is the unmarked L function.
  curve_set <- random_labelling(mpp, mtf_name = 'm', nsim=2499, r_min=1.5, r_max=9.5)
  # 2) Do the rank envelope test
  res <- global_envelope_test(curve_set)
  # 3) Plot the test result
  plot(res, plot_style="ggplot2", ylab=expression(italic(L[m](r)-L(r))))

  # Make the test using instead the test function T(r) = \hat{L}_mm(r);
  # which is an estimator of the mark-weighted L function, L_mm(r),
  # with translational edge correction (default).
  curve_set <- random_labelling(mpp, mtf_name = 'mm', nsim=2499, r_min=1.5, r_max=9.5)
  res <- global_envelope_test(curve_set)
  plot(res, plot_style="ggplot2", ylab=expression(italic(L[mm](r)-L(r))))
  }

  ## Not run: 
  ## Goodness-of-fit test (typically conservative, see dg.global_envelope for adjusted tests)
  #-----------------------------------------------
  pp <- unmark(spruces)
  # Minimum distance between points in the pattern
  min(nndist(pp))
  # Fit a model
  fittedmodel <- ppm(pp, interaction=Hardcore(hc=1)) # Hardcore process

  # Simulating Gibbs process by 'envelope' is slow, because it uses the MCMC algorithm
  #env <- envelope(fittedmodel, fun="Jest", nsim=999, savefuns=TRUE,
                   correction="none", r=seq(0, 4, length=500))

  # Using direct algorihm can be faster, because the perfect simulation is used here.
  simulations <- NULL
  for(j in 1:999) {
     simulations[[j]] <- rHardcore(beta=exp(fittedmodel$coef[1]),
                                   R = fittedmodel$interaction$par$hc,
                                   W = pp$window)
     if(j%%10==0) cat(j, "...", sep="")
  }
  env <- envelope(pp, simulate=simulations, fun="Jest", nsim=length(simulations),
                  savefuns=TRUE, correction="none", r=seq(0, 4, length=500))
  curve_set <- crop_curves(env, r_min = 1, r_max = 3.5)
  res <- global_envelope_test(curve_set, type="erl"); plot(res, plot_style="ggplot2")
  
## End(Not run)

  # A combined global envelope test
  #--------------------------------
  # As an example test CSR of the saplings point pattern by means of
  # L, F, G and J functions.
  data(saplings)
  X <- saplings

  nsim <- 499 # the number of simulations for the tests
  # Specify distances for different test functions
  n <- 500 # the number of r-values
  rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n
  rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n
  r <- seq(0, rmax, by=rstep)    # r-distances for Lest
  rJ <- seq(0, rmaxJ, by=rstepJ) # r-distances for Fest, Gest, Jest

  # Perform simulations of CSR and calculate the L-functions
  system.time( env_L <- envelope(X, nsim=nsim,
   simulate=expression(runifpoint(X$n, win=X$window)),
   fun="Lest", correction="translate",
   transform = expression(.-r), # Take the L(r)-r function instead of L(r)
   r=r,                         # Specify the distance vector
   savefuns=TRUE,               # Save the estimated functions
   savepatterns=TRUE) )         # Save the simulated patterns
  # Take the simulations from the returned object
  simulations <- attr(env_L, "simpatterns")
  # Then calculate the other test functions F, G, J for each simulated pattern
  system.time( env_F <- envelope(X, nsim=nsim,
                                 simulate=simulations,
                                 fun="Fest", correction="Kaplan", r=rJ,
                                 savefuns=TRUE) )
  system.time( env_G <- envelope(X, nsim=nsim,
                                 simulate=simulations,
                                 fun="Gest", correction="km", r=rJ,
                                 savefuns=TRUE) )
  system.time( env_J <- envelope(X, nsim=nsim,
                                 simulate=simulations,
                                 fun="Jest", correction="none", r=rJ,
                                 savefuns=TRUE) )

  # Crop the curves to the desired r-interval I
  curve_set_L <- crop_curves(env_L, r_min=rmin, r_max=rmax)
  curve_set_F <- crop_curves(env_F, r_min=rminJ, r_max=rmaxJ)
  curve_set_G <- crop_curves(env_G, r_min=rminJ, r_max=rmaxJ)
  curve_set_J <- crop_curves(env_J, r_min=rminJ, r_max=rmaxJ)

  res <- global_envelope_test(curve_sets=list(curve_set_L, curve_set_F,
                                              curve_set_G, curve_set_J))
  plot(res, plot_style="ggplot2",
       labels=c("L(r)-r", "F(r)", "G(r)", "J(r)"),
       separate_yaxes=TRUE, base_size=12)
}

## A test based on a low dimensional random vector
#-------------------------------------------------
# Let us generate some example data.
X <- matrix(c(-1.6,1.6),1,2) # data pattern X=(X_1,X_2)
if(requireNamespace("mvtnorm", quietly = TRUE)) {
  Y <- mvtnorm::rmvnorm(200,c(0,0),matrix(c(1,0.5,0.5,1),2,2)) # simulations
  plot(Y, xlim=c(min(X[,1],Y[,1]), max(X[,1],Y[,1])), ylim=c(min(X[,2],Y[,2]), max(X[,2],Y[,2])))
  points(X, col=2)

  # Test the null hypothesis is that X is from the distribution of Y's (or if it is an outlier).

  # Case 1. The test vector is (X_1, X_2)
  cset1 <- create_curve_set(list(r=1:2, obs=as.vector(X), sim_m=t(Y)))
  res1 <- global_envelope_test(cset1)
  plot(res1)

  # Case 2. The test vector is (X_1, X_2, (X_1-mean(Y_1))*(X_2-mean(Y_2))).
  t3 <- function(x, y) { (x[,1]-mean(y[,1]))*(x[,2]-mean(y[,2])) }
  cset2 <- create_curve_set(list(r=1:3, obs=c(X[,1],X[,2],t3(X,Y)), sim_m=rbind(t(Y), t3(Y,Y))))
  res2 <- global_envelope_test(cset2)
  plot(res2)
}

myllym/GET documentation built on Sept. 30, 2018, 5:49 a.m.