GET-package: Global Envelopes

GET-package    R Documentation

Global Envelopes

Description

The GET package provides an implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot), for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional ANOVA, functional GLM, n-sample test of correspondence of distribution functions), and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction).
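To see why controlling the probability simultaneously over all d points matters, the following base-R sketch builds a studentized global envelope from simulated curves and compares it with pointwise 2.5% and 97.5% quantile bands. The random-walk null model, d = 50 and s = 999 are assumptions chosen for illustration, and the construction is only in the spirit of the "st" type, not the package's own code. Roughly 95% of the curves stay inside the global envelope at every point, whereas noticeably fewer stay inside the pointwise bands everywhere.

```r
# Sketch only: a studentized global envelope in the spirit of the "st" type,
# built from s simulated curves in base R (not GET's implementation)
set.seed(1)
d <- 50                                   # number of argument values
s <- 999                                  # number of simulated curves
alpha <- 0.05
# Hypothetical null model: random-walk curves, one curve per column (d x s)
sim <- replicate(s, cumsum(rnorm(d, sd = sqrt(1 / d))))

T0 <- rowMeans(sim)                       # pointwise mean, estimate of E T(r)
sdr <- apply(sim, 1, sd)                  # pointwise standard deviation
# Largest studentized deviation of each curve over all d points
maxdev <- apply(abs(sim - T0) / sdr, 2, max)
u <- quantile(maxdev, 1 - alpha, type = 1, names = FALSE)  # global critical value
lo <- T0 - u * sdr                        # lower boundary of the global envelope
hi <- T0 + u * sdr                        # upper boundary of the global envelope

# Share of curves that stay inside the band at every one of the d points
inside_global <- mean(colSums(sim < lo | sim > hi) == 0)
# Pointwise 2.5% and 97.5% quantiles for comparison
plo <- apply(sim, 1, quantile, probs = alpha / 2)
phi <- apply(sim, 1, quantile, probs = 1 - alpha / 2)
inside_pointwise <- mean(colSums(sim < plo | sim > phi) == 0)
round(c(global = inside_global, pointwise = inside_pointwise), 3)
```

The pointwise bands each exclude about 5% of the curves at a single r, and these exclusions accumulate over the d points; the global critical value u is chosen from the maximum over all points, which is what keeps the simultaneous coverage near 1 - alpha.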

Details

The GET package provides central regions (i.e. global envelopes) and global envelope tests with intrinsic graphical interpretation. The central regions can be constructed from (functional) data. The tests are Monte Carlo or permutation tests, which require simulations from the tested null model. The methods are applicable for any multivariate vector data and functional data (after discretization).

To get an overview of the package, start R and type library("GET") and vignette("GET").

To get examples of point pattern analysis, start R and type library("GET") and vignette("pointpatterns").

To get examples of the false discovery rate envelopes of Mrkvička and Myllymäki (2023), start R and type library("GET") and vignette("FDRenvelopes").

Key functions in GET

  • Central regions or global envelopes or confidence bands: central_region. E.g. a 50% central region of the growth curves of girls (growth data from the fda package).

    • First create a curve_set of the growth curves, e.g.

      data("growth", package = "fda") # growth curves from the fda package
      cset <- curve_set(r = as.numeric(row.names(growth$hgtf)), obs = growth$hgtf)

    • Then calculate the 50% central region (see central_region for further arguments)

      cr <- central_region(cset, coverage = 0.5)

    • Plot the result (see plot.global_envelope for plotting options)

      plot(cr)

    Combined central regions can also be computed for several sets of curves provided to the function in a list; see the examples in central_region.

  • Global envelope tests: global_envelope_test is the main function. E.g. a test of complete spatial randomness (CSR) for a point pattern X:

    library("spatstat") # provides envelope(), Kest(), runifpoint() and the spruces data
    X <- spruces # an example pattern from spatstat

    • Use the function envelope of spatstat to create nsim simulations under CSR and to calculate the functions you want (below K-functions by Kest). Important: use the option 'savefuns=TRUE' and specify the number of simulations nsim.

      env <- envelope(X, nsim=999, savefuns = TRUE, fun = Kest, simulate = expression(runifpoint(ex = X)))

    • Perform the test (see global_envelope_test for further arguments)

      res <- global_envelope_test(env)

    • Plot the result (see plot.global_envelope for plotting options)

      plot(res)

    Combined global envelope tests can also be performed for several sets of curves provided to the function in a list; see the examples in global_envelope_test. To obtain the false discovery rate envelopes of Mrkvička and Myllymäki (2023), use the argument typeone = "fdr".

  • Functional ordering: central_region and global_envelope_test are based on different measures for ordering the functions (or vectors) from the most extreme to the least extreme ones. The core functionality of calculating the measures is in the function forder, which can be used to obtain different measures for sets of curves. Usually there is no need to call forder directly.

  • Functional boxplots: fBoxplot

  • Adjusted global envelope tests for composite null hypotheses

    • GET.composite, see a detailed example in saplings

  • One-way functional ANOVA:

    • Graphical functional ANOVA tests: graph.fanova

    • Global rank envelope based on F-values: frank.fanova

  • Functional general linear model (GLM):

    • Graphical functional GLM: graph.flm

    • Global rank envelope based on F-values: frank.flm

    • For large data (not fitting comfortably in memory): partial_forder

  • Functional clustering: fclustering

  • Global quantile regression: global_rq

  • Functions for performing global envelopes for other specific purposes:

    • Graphical n-sample test of correspondence of distribution functions: GET.distrequal

    • Permutation-based tests of independence for samples from any bivariate distribution: GET.distrindep

    • Testing global and local dependence of point patterns on covariates: GET.spatialF

    • Testing local correlations: GET.localcor

    • Variogram and residual variogram with global envelopes: GET.variogram

  • Deviation tests (for simple hypotheses): deviation_test (no graphical interpretation)

  • Most functions accept the curves provided in a curve_set object. Use curve_set to create a curve_set object from the functions. Other formats for providing the curves are also accepted; see the respective help pages.
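The central-region idea from the first item above can be illustrated without GET. This base-R sketch ranks a set of hypothetical curves by a simple extreme-rank depth (the smallest two-sided pointwise rank of each curve) and takes the pointwise range of the most central 50% of the curves as the region. Ties are broken arbitrarily here, and central_region itself offers other measures (for example "erl") and its own rules, so treat this only as an illustration of the principle.

```r
# Sketch: a 50% central region from an extreme-rank depth
# (base R only; not GET's central_region implementation)
set.seed(2)
d <- 30                                  # argument values r_1, ..., r_d
n <- 100                                 # number of curves
r <- seq(0, 1, length.out = d)
# Hypothetical functional data: random level and slope plus noise
curves <- sapply(seq_len(n), function(i) rnorm(1) + rnorm(1) * r + rnorm(d, sd = 0.2))
# curves is a d x n matrix, one curve per column

# Pointwise ranks across curves (ties broken by order of appearance)
ranks <- t(apply(curves, 1, rank, ties.method = "first"))   # d x n
# Two-sided rank: small values at either tail mean "extreme"
two_sided <- pmin(ranks, n + 1 - ranks)
# Extreme-rank depth: the most extreme position each curve ever takes
depth <- apply(two_sided, 2, min)

coverage <- 0.5
central <- order(depth, decreasing = TRUE)[seq_len(ceiling(coverage * n))]
lo <- apply(curves[, central], 1, min)   # lower boundary of the region
hi <- apply(curves[, central], 1, max)   # upper boundary of the region
```

Curves that are extreme at even a single r get a small depth and fall outside the region, which is why such a region also serves for outlier detection.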

See the help files of the functions for examples.
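As an example of the ordering measures mentioned under functional ordering, the sketch below computes the extreme rank of each curve and then orders the curves by comparing their sorted pointwise two-sided ranks lexicographically, which is the idea behind the extreme rank length (ERL) measure. It is written in base R; the helper erl_order and the random data are hypothetical, and the exact definitions and scaling used by forder are not reproduced.

```r
# Sketch: extreme rank and an ERL-style lexicographic ordering
# (base R only; not GET's forder implementation)
set.seed(3)
d <- 20; n <- 50
curves <- matrix(rnorm(d * n), nrow = d)       # d x n: one curve per column

ranks <- t(apply(curves, 1, rank, ties.method = "average"))
two_sided <- pmin(ranks, n + 1 - ranks)        # small value = extreme
ext_rank <- apply(two_sided, 2, min)           # extreme rank of each curve

# Hypothetical helper: order curves from most to least extreme by comparing
# their increasingly sorted two-sided rank vectors lexicographically
erl_order <- function(two_sided) {
  sorted <- apply(two_sided, 2, sort)          # d x n, each column increasing
  do.call(order, as.data.frame(t(sorted)))     # order by 1st, then 2nd, ... rank
}
ord <- erl_order(two_sided)
sum(duplicated(ext_rank))                      # many ties in the extreme rank alone
```

The first sort key is the extreme rank itself, so the ERL-style ordering never contradicts it; the later keys only break the many ties that the extreme rank leaves when d is large relative to n.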

Workflow for (single hypothesis) tests based on single functions

To perform a test you always first need to obtain the test function T(r) for your data (T_1(r)) and for each simulation (T_2(r), \dots, T_{s+1}(r)) in one way or another. Given the set of the functions T_i(r), i=1, \dots, s+1, you can perform a test by global_envelope_test.

1) The workflow when using your own programs for simulations:

  • (Fit the model and) create s simulations from the (fitted) null model.

  • Calculate the functions T_1(r), T_2(r), \dots, T_{s+1}(r).

  • Use curve_set to create a curve_set object from the functions T_i(r), i=1, \dots, s+1.

  • Perform the test

    res <- global_envelope_test(curve_set)

    where curve_set is the 'curve_set'-object you created, and plot the result

    plot(res)

2) The workflow utilizing spatstat: start R, type library("GET") and vignette("pointpatterns"), which explains the workflow and gives many examples of point pattern analysis
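Workflow 1 can be sketched end to end in base R. The observed data, the uniform null model and the test function T(r) = F_n(r) - r (with F_n the empirical distribution function, so that T_0(r) = 0 under the null) are assumptions for illustration, and a plain maximum-absolute-deviation measure with a Monte Carlo p-value stands in for global_envelope_test. With GET, the observed and simulated curves would instead be collected into a curve_set object (see its help page for the accepted input formats) and tested with global_envelope_test.

```r
# Sketch of workflow 1 with a home-made null model (base R only)
set.seed(4)
r <- seq(0, 1, length.out = 101)               # argument values
x_obs <- rbeta(50, 1.3, 1)                     # hypothetical observed data
T_fun <- function(x) ecdf(x)(r) - r            # test function T(r)

s <- 999                                       # number of simulations
T_obs <- T_fun(x_obs)                          # T_1(r)
T_sim <- replicate(s, T_fun(runif(50)))        # T_2(r), ..., T_{s+1}(r)
all_T <- cbind(T_obs, T_sim)                   # d x (s+1), observed first

# Global measure: largest absolute deviation from the null expectation 0
dev <- apply(abs(all_T), 2, max)
# Monte Carlo p-value: share of the s+1 curves at least as extreme as T_1
p <- mean(dev >= dev[1])
p
```

Counting the observed curve itself in the numerator and the denominator gives the usual Monte Carlo p-value (1 + #{simulations at least as extreme}) / (s + 1), whose smallest possible value is 1/(s + 1).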

Functions for modifying sets of functions

It is possible to modify the curve set T_1(r), T_2(r), \dots, T_{s+1}(r) for the test.

  • You can choose the interval of distances [r_{\min}, r_{\max}] by crop_curves.

  • For better visualisation, you can take T(r)-T_0(r) by residual. Here T_0(r) is the expectation of T(r) under the null hypothesis.
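Both modifications amount to simple matrix operations when the curves are stored with one curve per column. The sketch below assumes a grid r, a known null expectation T0 (here the CSR value πr² of Ripley's K-function, purely as an example) and hypothetical noisy curves; crop_curves and residual themselves work on curve_set objects rather than raw matrices.

```r
# Sketch: cropping to [r_min, r_max] and taking residuals T(r) - T_0(r)
# (base R only; GET's crop_curves and residual operate on curve_set objects)
set.seed(5)
r <- seq(0, 0.25, length.out = 26)
T0 <- pi * r^2                                 # null expectation (K-function under CSR)
# Hypothetical curves: null expectation plus noise growing with r
curves <- T0 + matrix(rnorm(26 * 100, sd = 0.002), nrow = 26) * r

r_min <- 0.05; r_max <- 0.2
keep <- r >= r_min & r <= r_max
cropped <- curves[keep, , drop = FALSE]        # keep rows with r in [r_min, r_max]
r_cropped <- r[keep]
resid <- cropped - T0[keep]                    # centre each curve at the null expectation
```

After centring, the null expectation is the zero line, so departures of the observed curve from the envelope are easier to see in a plot.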

Example data (see references on the help pages of each data set)

  • abide_9002_23: see help page

  • adult_trees: a point pattern of adult trees

  • cgec: centred government expenditure centralization (GEC) ratios (see graph.fanova)

  • fallen_trees: a point pattern of fallen trees

  • GDPtax: GDP per capita with country groups and other covariates

  • imageset3: a simulated set of images

  • rimov: water temperature curves over 365 days in each of 36 years

  • saplings: a point pattern of saplings (see GET.composite)

The data sets are used to show examples of the functions of the package.

Number of functions

If the number of functions is low, the choice of the measure (or type or depth) plays a role, as explained in vignette("GET") (Section 2.4).

Note that the recommended minimum number of simulations for the rank envelope test (Myllymäki et al., 2017) based on a single function in spatial statistics is nsim=2499. When the number of argument values is large, a larger number of simulations is also needed in order to have a narrow p-interval. The "erl", "cont", "area", "qdir" and "st" global envelope tests and deviation tests can be used with a lower number of simulations, although the Monte Carlo error is obviously larger with a lower number of simulations. As the number of simulations increases, all the global rank envelopes approach the same curves.

Mrkvička et al. (2017) discussed the number of simulations for tests based on many functions.
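The effect of the number of argument values on the p-interval can be checked with a small base-R computation. For extreme ranks R_1 (observed curve) and R_2, ..., R_{s+1} (simulations), the sketch takes the lower and upper ends of the interval as the proportions of the s+1 curves with extreme rank strictly smaller than, and smaller than or equal to, R_1, in the manner of the extreme rank test of Myllymäki et al. (2017). The curves are hypothetical null-model draws; with d = 500 argument values and only s = 99 simulations nearly every curve attains the most extreme rank, so the interval becomes very wide.

```r
# Sketch: extreme rank p-interval for one observed and s simulated curves
# (base R only; illustrates the ties that make the interval wide)
p_interval <- function(curves) {
  n <- ncol(curves)                              # s + 1 curves, observed first
  ranks <- t(apply(curves, 1, rank, ties.method = "average"))
  R <- apply(pmin(ranks, n + 1 - ranks), 2, min) # extreme ranks
  c(lower = mean(R < R[1]), upper = mean(R <= R[1]))
}

set.seed(6)
s <- 99
# Hypothetical curves from the null model, observed curve in column 1
short <- matrix(rnorm(5 * (s + 1)), nrow = 5)      # d = 5 argument values
long  <- matrix(rnorm(500 * (s + 1)), nrow = 500)  # d = 500 argument values
p_interval(short)
p_interval(long)
```

The width of the interval is the share of curves tied with the observed extreme rank; more simulations spread the extreme ranks over more values and shrink it, which is the reason for the large recommended nsim.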

Documentation

Myllymäki and Mrkvička (2023) provide a description of the package. The material can also be found in the corresponding vignette, which is available by starting R and typing library("GET") and vignette("GET").

In the special case of spatial processes (spatial point processes, random sets), the functions are typically estimators of summary functions. The package supports the use of the R package spatstat for generating simulations and calculating estimators of the chosen summary function, but these steps can also be done in any other way, thus allowing for any user-specified models/functions. To see examples of global envelopes for analysing point pattern data, start R, type library("GET") and vignette("pointpatterns").

Mrkvička and Myllymäki (2023) developed false discovery rate (FDR) envelopes. Examples can be found in the associated vignette: start R and type library("GET") and vignette("FDRenvelopes").

Mrkvička et al. (2023a) proposed global quantile regression. An example of global quantile regression is given in the vignette vignette("QuantileRegression").

The vignette vignette("HotSpots") illustrates the methodology proposed by Mrkvička et al. (2023b) for detecting hotspots on a linear network.

Type citation("GET") to get a full list of references.

Acknowledgements

Mikko Kuronen has made substantial contributions of code. Additional contributions and suggestions came from Jiří Dvořák, Pavel Grabarnik, Ute Hahn, Michael Rost and Henri Seijo.

Author(s)

Mari Myllymäki (mari.myllymaki@luke.fi, mari.j.myllymaki@gmail.com) and Tomáš Mrkvička (mrkvicka.toma@gmail.com)

References

Dai, W., Athanasiadis, S. and Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. IntechOpen, London, UK. doi: 10.5772/intechopen.100124

Dvořák, J. and Mrkvička, T. (2022) Graphical tests of independence for general distributions. Computational Statistics 37, 671–699.

Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2023a) Global quantile regression. arXiv:2309.04746 [stat.ME]. https://doi.org/10.48550/arXiv.2309.04746

Mrkvička, T., Kraft, S., Blažek, V. and Myllymäki, M. (2023b) Hotspots detection on a linear network with presence of covariates: a case study on road crash data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4598454

Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5), 1239-1255. doi: 10.1007/s11222-016-9683-9

Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432

Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236

Mrkvička, T. and Myllymäki, M. (2023) False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7

Mrkvička, T., Roskovec, T. and Rost, M. (2021) A nonparametric graphical tests of significance in functional GLM. Methodology and Computing in Applied Probability 23, 593-612. doi: 10.1007/s11009-019-09756-y

Mrkvička, T., Soubeyrand, S., Myllymäki, M., Grabarnik, P., and Hahn, U. (2016) Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Statistics 18, Part A, 40-53. doi: 10.1016/j.spasta.2016.04.005

Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan, D. (2015) Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017) Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172

Myllymäki, M. and Mrkvička, T. (2023) GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583

Myllymäki, M., Kuronen, M. and Mrkvička, T. (2020) Testing global and local dependence of point patterns on covariates in parametric models. Spatial Statistics 42, 100436. doi: 10.1016/j.spasta.2020.100436


GET documentation built on Sept. 11, 2024, 5:46 p.m.