GET.composite: Adjusted global envelope tests

View source: R/adjusted_envelopes.r

GET.compositeR Documentation

Adjusted global envelope tests

Description

Adjusted global envelope tests for composite null hypothesis.

Usage

GET.composite(
  X,
  X.ls = NULL,
  nsim = 499,
  nsimsub = nsim,
  simfun = NULL,
  fitfun = NULL,
  calcfun = function(X) {
     X
 },
  testfuns = NULL,
  ...,
  type = "erl",
  alpha = 0.05,
  alternative = c("two.sided", "less", "greater"),
  probs = c(0.025, 0.975),
  r_min = NULL,
  r_max = NULL,
  take_residual = FALSE,
  save.cons.envelope = savefuns,
  savefuns = FALSE,
  verbose = TRUE,
  MrkvickaEtal2017 = FALSE,
  mc.cores = 1L
)

Arguments

X

An object containing the data in some form. A curve_set (see create_curve_set) or an envelope object (of the spatstat package), as the curve_sets argument of global_envelope_test (need to provide X.ls), or a fitted point process model of spatstat (e.g. object of class ppm or kppm), or a point pattern object of class ppp of spatstat, or another data object (need to provide simfun, fitfun, calcfun).

X.ls

A list of objects as curve_sets argument of global_envelope_test, giving the second stage simulations, see details.

nsim

The number of simulations to be generated in the primary test. Ignored if X.ls provided.

nsimsub

Number of simulations in each basic test. There will be nsim repetitions of the basic test, each involving nsimsub simulated realisations. Total number of simulations will be nsim * (nsimsub + 1).

simfun

A function for generating simulations from the null model. If given, this function is called by replicate(n=nsim, simfun(simfun.arg), simplify=FALSE) to make nsim simulations. Here simfun.arg is obtained by fitfun(X).

fitfun

A function for estimating the parameters of the null model. The function should return the fitted model in the form that it can be directly passed to simfun as its argument.

calcfun

A function for calculating a summary function from a simulation of the model. The default is the identity function, i.e. the simulations from the model are functions themselves. The use of calcfun is still experimental. Preferably provide X and X.ls instead, if X is not a point pattern or fitted point process model object of spatstat.

testfuns

A list of lists of parameters to be passed to the envelope function of spatstat if X is a point pattern of a fitted point process model of spatstat. A list of parameters should be provided for each test function that is to be used in the combined test.

...

Additional parameters to the envelope function of spatstat in the case where only one test function is used. In that case, this is an alternative to providing the parameters in the argument testfuns. If envelope is also used to generate simulations under the null hypothesis (if simfun not provided), then also recall to specify how to generate the simulations.

type

The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details.

alpha

The significance level. The 100(1-alpha)% global envelope will be calculated. If a vector of values is provided, the global envelopes are calculated for each value.

alternative

A character string specifying the alternative hypothesis. Must be one of the following: "two.sided" (default), "less" or "greater". The last two options only available for types 'rank', 'erl', 'cont' and 'area'.

probs

A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017).

r_min

The minimum argument value to include in the test.

r_max

The maximum argument value to include in the test. in calculating functions by the envelope function of spatstat.

take_residual

Logical. If TRUE (needed for visual reasons only) the mean of the simulated functions is reduced from the functions in each first and second stage test.

save.cons.envelope

Logical flag indicating whether to save the unadjusted envelope test results.

savefuns

Logical flag indicating whether to save all the simulated function values. Similar to the same argument of the envelope function of spatstat.

verbose

Logical flag indicating whether to print progress reports during the simulations. Similar to the same argument of envelope function of spatstat.

MrkvickaEtal2017

Logical. If TRUE, type is "st" or "qdir" and several test functions are used, then the combined scaled MAD envelope presented in Mrkvička et al. (2017) is calculated. Otherwise, the two-step procedure described in global_envelope_test is used for combining the tests. Default to FALSE. The option is kept for historical reasons.

mc.cores

The number of cores to use, i.e. at most how many child processes will be run simultaneously. Must be at least one, and parallelization requires at least two cores. On a Windows computer mc.cores must be 1 (no parallelization). For details, see mclapply, for which the argument is passed. Parallelization can be used in generating simulations and in calculating the second stage tests.

Details

The specification of X, X.ls, fitfun, simfun is important:

  • If X.ls is provided, then the global envelope test is calculated based on functions in these objects. X should be a curve_set (see create_curve_set) or an envelope object of spatstat including the observed function and simulations from the tested model. X.ls should be a list of curve_set or envelope (of R package spatstat) objects, where each component contains an "observed" function f that has been simulated from the model fitted to the data and the simulations that have been obtained from the same model that has been fitted to the "observed" f. The user has the responsibility that the functions have been generated correctly, the test is done based on these provided simulations. See the examples.

  • Otherwise, if simfun and fitfun are provided, X can be general. The function fitfun is used for fitting the desired model M and the function simfun for simulating from a fitted model M. These functions should be coupled with each other such that the object returned by fitfun is directly accepted as the (single) argument in simfun. In the case, that the global envelope should not be calculated directly for X (X is not a function), calcfun can be used to specify how to calculate the function from X and from simulations generated by simfun. Special attention is needed in the correct specification of the functions, see examples.

  • Otherwise, X should be either a fitted (point process) model object or a ppp object of the R package spatstat.

    • If X is a fitted (point process) model object of the R package spatstat, then the simulations from this model are generated and summary functions for testing calculated by the envelope function of spatstat. Which summary function to use and how to calculate it, can be passed to envelope either in ... or testfuns. Unless otherwise specified the default function of envelope, i.g. the K-function, is used. The argument testfuns should be used to specify the test functions in the case where one wants to base the test on several test functions.

    • If X is a ppp object of spatstat, then the envelope function is used for simulations and model fitting and the complete spatial randomness (CSR) is tested (with intensity parameter).

For the rank envelope test, the global envelope test is the test described in Myllymäki et al. (2017) with the adjustment of Baddeley et al. (2017). For other test types, the test (also) uses the two-stage procedure of Dao and Genton (2014) with the adjustment of Baddeley et al. (2017) as descripbed in Myllymäki and Mrkvička (2020).

See examples also in saplings.

Value

An object of class global_envelope or combined_global_envelope, which can be printed and plotted directly. See global_envelope_test.

References

Baddeley, A., Hardegen, A., Lawrence, T., Milne, R. K., Nair, G. and Rakshit, S. (2017). On two-stage Monte Carlo tests of composite hypotheses. Computational Statistics and Data Analysis 114: 75-87. doi: http://dx.doi.org/10.1016/j.csda.2017.04.003

Dao, N.A. and Genton, M. (2014). A Monte Carlo adjusted goodness-of-fit test for parametric models describing spatial point patterns. Journal of Graphical and Computational Statistics 23, 497-517.

Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5): 1239-1255. DOI: 10.1007/s11222-016-9683-9

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79: 381-404. doi: 10.1111/rssb.12172

Myllymäki, M. and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]

See Also

global_envelope_test, plot.global_envelope, saplings

Examples

# Graphical normality test (Myllymaki and Mrkvicka, 2020, Section 3.3.)
#=========================
if(require("fda.usc", quietly=TRUE)) {
  data("poblenou")
  dat <- poblenou[['nox']][['data']][,'H10']
  n <- length(dat)

  # The number of simulations
  nsim <- nsimsub <- 199
  

  set.seed(200127)
  # General setup
  #==============
  # 1. Fit the model
  mu <- mean(dat)
  sigma <- sd(dat)
  # 2. Simulate a sample from the fitted null model and
  #    compute the test vectors for data (obs) and each simulation (sim)
  r <- seq(min(dat), max(dat), length=100)
  obs <- stats::ecdf(dat)(r)
  sim <- sapply(1:nsimsub, function(i) {
    x <- rnorm(n, mean = mu, sd = sigma)
    stats::ecdf(x)(r)
  })
  cset <- create_curve_set(list(r = r, obs = obs, sim_m = sim))

  # 3. Simulate another sample from the fitted null model.
  # 4. Fit the null model to each of the patterns,
  #    simulate a sample from the null model,
  #    and compute the test vectors for all.
  cset.ls <- vector("list", nsim)
  for(rep in 1:nsim) {
    x <- rnorm(n, mean = mu, sd = sigma)
    mu2 <- mean(x)
    sigma2 <- sd(x)
    obs2 <- stats::ecdf(x)(r)
    sim2 <- sapply(1:nsimsub, function(i) {
      x2 <- rnorm(n, mean = mu2, sd = sigma2)
      stats::ecdf(x2)(r)
    })
    cset.ls[[rep]] <- create_curve_set(list(r = r, obs = obs2, sim_m = sim2))
  }
  # Perform the adjusted test
  res <- GET.composite(X = cset, X.ls = cset.ls, type = 'erl')
  plot(res) + ggplot2::labs(x = "NOx", y = "Ecdf")
}

# Example of a point pattern data
#================================
# Test the fit of a Matern cluster process.

if(require("spatstat.model", quietly=TRUE)) {
  data("saplings")
  saplings <- as.ppp(saplings, W = square(75))

  # First choose the r-distances
  rmin <- 0.3; rmax <- 10; rstep <- (rmax-rmin)/500
  r <- seq(0, rmax, by = rstep)

  # Number of simulations
  nsim <- 19 # Increase nsim for serious analysis!

  # Option 1: Give the fitted model object to GET.composite
  #--------------------------------------------------------
  # This can be done and is preferable when the model is
  # a point process model of spatstat.
  # 1. Fit the Matern cluster process to the pattern
  # (using minimum contrast estimation with the K-function)
  M1 <- kppm(saplings~1, clusters = "MatClust", statistic = "K")
  summary(M1)
  # 2. Make the adjusted global area rank envelope test using the L(r)-r function
  adjenvL <- GET.composite(X = M1, nsim = nsim,
              testfuns = list(L =list(fun="Lest", correction="translate",
                            transform=expression(.-r), r=r)), # passed to envelope
              type = "area", r_min = rmin, r_max = rmax)
  # Plot the test result
  plot(adjenvL)

  # Option 2: Generate the simulations "by yourself"
  #-------------------------------------------------
  # and provide them as curve_set or envelope objects
  # Preferable when you want to have a control
  # on the simulations yourself.
  # 1. Fit the model
  M1 <- kppm(saplings~1, clusters = "MatClust", statistic = "K")
  # 2. Generate nsim simulations by the given function using the fitted model
  X <- envelope(M1, nsim = nsim, savefuns = TRUE,
                fun = "Lest", correction = "translate",
                transform = expression(.-r), r = r)
  plot(X)
  # 3. Create another set of simulations to be used to estimate
  # the second-state p-value (as proposed by Baddeley et al., 2017).
  simpatterns2 <- simulate(M1, nsim = nsim)
  # 4. Calculate the functions for each pattern
  simf <- function(rep) {
    # Fit the model to the simulated pattern Xsims[[rep]]
    sim_fit <- kppm(simpatterns2[[rep]], clusters = "MatClust", statistic = "K")
    # Generate nsim simulations from the fitted model
    envelope(sim_fit, nsim = nsim, savefuns = TRUE,
             fun = "Lest", correction = "translate",
             transform = expression(.-r), r = r)
  }
  X.ls <- parallel::mclapply(X = 1:nsim, FUN = simf, mc.cores = 1) # list of envelope objects
  # 5. Perform the adjusted test
  res <- GET.composite(X = X, X.ls = X.ls, type = "area",
                      r_min = rmin, r_max = rmax)
  plot(res)
}

GET documentation built on Sept. 29, 2023, 5:06 p.m.