central_region: Central region / Global envelope

View source: R/envelopes.r

central_regionR Documentation

Central region / Global envelope

Description

Provides central regions or global envelopes or confidence bands

Usage

central_region(
  curve_sets,
  type = "erl",
  coverage = 0.5,
  alternative = c("two.sided", "less", "greater"),
  probs = c(0.25, 0.75),
  quantile.type = 7,
  central = "median",
  nstep = 2,
  ...
)

Arguments

curve_sets

A curve_set object or a list of curve_set objects.

type

The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details.

coverage

A number between 0 and 1. The 100*coverage% central region will be calculated. A vector of values can also be provided, leading to the corresponding number of central regions.

alternative

A character string specifying the alternative hypothesis. Must be one of the following: "two.sided" (default), "less" or "greater". The last two options only available for types 'rank', 'erl', 'cont' and 'area'.

probs

A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017).

quantile.type

As type argument of quantile, how to calculate quantiles for 'q' or 'qdir'.

central

Either "mean" or "median". If the curve sets do not contain the component theo for the theoretical central function, then the central function (used for plotting only) is calculated either as the mean or median of functions provided in the curve sets. For 'qdir', 'st' and 'unscaled' only the mean is allowed as an option, due to their definition.

nstep

1 or 2 for how to contruct a combined global envelope if list of curve sets is provided. 2 (default) for a two-step combining procedure, 1 for one-step.

...

Ignored.

Details

Given a curve_set (see create_curve_set for how to create such an object) or an envelope object of spatstat or fdata object of fda.usc, the function central_region construcst a central region, i.e. a global envelope, from the given set of functions (or vectors).

Generally an envelope is a band bounded by the vectors (or functions) T_lo and T_hi. A 100(1-alpha)% or 100*coverage% global envelope is a set (T_lo, T_hi) of envelope vectors such that the probability that T_i falls outside this envelope in any of the d points of the vector T_i is less or equal to alpha. The global envelopes can be constructed based on different measures that order the functions from the most extreme one to the least extreme one. We use such orderings of the functions for which we are able to construct global envelopes with intrinsic graphical interpretation.

The type of the global envelope can be chosen with the argument type and the options are given in the following. Further information about the measures, on which the global envelopes are based, can be found in Myllymäki and Mrkvička (2020, Section 2.).

  • 'rank': The global rank envelope proposed by Myllymäki et al. (2017) based on the extreme rank defined as the minimum of pointwise ranks.

  • 'erl': The global rank envelope based on the extreme rank length (Myllymäki et al.,2017, Mrkvička et al., 2018). This envelope is constructed as the convex hull of the functions which have extreme rank length measure that is larger or equal to the critical alpha level of the extreme rank length measure.

  • 'cont': The global rank envelope based on the continuous rank (Hahn, 2015; Mrkvička et al., 2019) based on minimum of continuous pointwise ranks. It is contructed as the convex hull in a similar way as the 'erl' envelope.

  • 'area': The global rank envelope based on the area rank (Mrkvička et al., 2019) which is based on area between continuous pointwise ranks and minimum pointwise ranks for those argument (r) values for which pointwise ranks achieve the minimum (it is a combination of erl and cont). It is contructed as the convex hull in a similar way as the 'erl' and 'area' envelopes.

  • 'qdir': The directional quantile envelope based on the directional quantile maximum absolute deviation (MAD) test (Myllymäki et al., 2017, 2015), which takes into account the unequal variances of the test function T(r) for different distances r and is also protected against asymmetry of distribution of T(r).

  • 'st': The studentised envelope based on the studentised MAD measure (Myllymäki et al., 2017, 2015), which takes into account the unequal variances of the test function T(r) for different distances r.

  • 'unscaled': The unscaled envelope (Ripley, 1981), which leads to envelopes with constant width. It corresponds to the classical maximum deviation test without scaling. This test suffers from unequal variance of T(r) over the distances r and from the asymmetry of distribution of T(r). We recommend to use the other alternatives instead. This unscaled global envelope is provided for reference.

The values of the chosen measure M are determined for each curve in the curve_set, and based on the chosen measure, the central region, i.e. the global envelope, is constructed for the given curves.

If a list of (suitable) objects are provided in the argument curve_sets, then by default (nstep = 2) the two-step combining procedure is used to construct the combined global envelope as described in Myllymäki and Mrkvička (2020, Section 2.2.). If nstep = 1 and the lengths of the multivariate vectors in each component of the list are equal, then the one-step combining procedure is used where the functions are concatenated together into a one long vector (see again Myllymäki and Mrkvička, 2020, Section 2.2.).

Value

Either an object of class global_envelope and or an combined_global_envelope object. The former class is obtained when a set of curves is provided, while the latter in the case that curve_sets is a list of objects. The print and plot function are defined for the returned objects (see examples).

The global_envelope object is essentially a data frame containing columns

  • r = the vector of values of the argument r at which the test was made

  • lo = the lower envelope based on the simulated functions; in case of a vector of coverage values, several 'lo' exist with names paste0("lo.", 100*coverage)

  • hi = the upper envelope based on the simulated functions; in case of a vector of coverage values, several 'lo' exist with names paste0("hi.", 100*coverage)

  • central = If the curve_set (or envelope object) contains a theoretical curve, then this function is used as the central curve and returned in this component. Otherwise, the central curve is the mean or median (according to the argument central) of the test functions T_i(r), i=2, ..., s+1. Used for visualization only.

and potentially additionally

  • obs = the data function, if there is only one data function in the given curve_sets. Otherwise not existing.

(Most often central_region is directly applied to functional data where all curves are observed.) Additionally, the returned object has some attributes, where

  • M = A vector of the values of the chosen measure for all the function. If there is only one observed function, then M[1] gives the value of the measure for this.

  • M_alpha = The critical value of M corresponding to the 100(1-alpha)% global envelope (see Myllymäki and Mrkvička, 2020, Definition 1.1. IGI).

Further the object has some attributes for printing and plotting purposes, where alternative, type, ties, alpha correspond to those in the function call and method gives a name for the method. Attributes of an object res can be obtained using the function attr, e.g. attr(res, "M") for the values of the ordering measure.

If the given set of curves had the class envelope of spatstat, then the returned global_envelope object has also the class fv of spatstat, whereby one can utilize also the plotting functions of spatstat, see example in plot.global_envelope. However, the envelope objects are most often used with global_envelope_test and not with central_region. For an fv object, also some further attributes exists as required by fv of spatstat.

The combined_global_envelope is a list of global_envelope objects, where the components correspond to the components of curve_sets. The combined_global_envelope object constructed with nstep = 2 contains, in addition to some conventional ones (method, alternative, type, alpha, M, M_alpha, see above), the second level envelope information as the attributes

  • level2_ge = The second level envelope on which the envelope construction is based

  • level2_curve_set = The second level curve_set from which level2_ge is constructed

In the case that the given curve sets are two-dimensional, i.e., their arguments values are two-dimensional, then the returned objects have in addition to the class global_envelope or combined_global_envelope, the class global_envelope2d or combined_global_envelope2d, respectively. This class is assigned for plotting purposes: For the 2d envelopes, also the default plots are 2d. Otherwise the 1d and 2d objects are similar.

References

Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432

Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236

Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172

Myllymäki, M. and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]

Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.

See Also

forder, global_envelope_test

Examples

## A central region of a set of functions
#----------------------------------------
if(requireNamespace("fda", quietly=TRUE)) {
  curve_set <- create_curve_set(list(r=as.numeric(row.names(fda::growth$hgtf)),
                                     obs=fda::growth$hgtf))
  plot(curve_set) + ggplot2::ylab("height")
  cr <- central_region(curve_set, coverage=0.50, type="erl")
  plot(cr)
}

## Confidence bands for linear or polynomial regression
#------------------------------------------------------
# Simulate regression data according to the cubic model
# f(x) = 0.8x - 1.8x^2 + 1.05x^3 for x in [0,1]
par <- c(0,0.8,-1.8,1.05) # Parameters of the true polynomial model
res <- 100 # Resolution
x <- seq(0, 1, by=1/res); x2=x^2; x3=x^3;
f <- par[1] + par[2]*x + par[3]*x^2 + par[4]*x^3 # The true function
d <- f + rnorm(length(x), 0, 0.04) # Data
# Plot the true function and data
plot(f, type="l", ylim=range(d))
points(d)

# Estimate polynomial regression model
reg <- lm(d ~ x + x2 + x3)
ftheta <- reg$fitted.values
resid0 <- reg$residuals
s0 <- sd(resid0)

# Bootstrap regression
B <- 2000 # Number of bootstrap samples

ftheta1 <- array(0, c(B,length(x)))
s1 <- array(0,B)
for(i in 1:B) {
  u <- sample(resid0, size=length(resid0), replace=TRUE)
  reg1 <- lm((ftheta+u) ~ x + x2 + x3)
  ftheta1[i,] <- reg1$fitted.values
  s1[i] <- sd(reg1$residuals)
}

# Centering and scaling
meanftheta <- apply(ftheta1, 2, mean)
m <- array(0, c(B,length(x)))
for(i in 1:B) { m[i,] <- (ftheta1[i,]-meanftheta)/s1[i] }

# Central region computation
boot.cset <- create_curve_set(list(r=1:length(x), obs=ftheta+s0*t(m)))
cr <- central_region(boot.cset, coverage=0.95, type="erl")

# Plotting the result
plot(cr) + ggplot2::labs(x=expression(italic(x)), y=expression(italic(f(x)))) +
  ggplot2::geom_point(data=data.frame(id=1:length(d), points=d),
                      ggplot2::aes(x=id, y=points)) + # data points
  ggplot2::geom_line(data=data.frame(id=1:length(d), points=f),
                     ggplot2::aes(x=id, y=points)) # true function

GET documentation built on Nov. 16, 2022, 5:09 p.m.