GET-package | R Documentation |
The GET package provides an implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot), for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional ANOVA, functional GLM, n-sample test of correspondence of distribution functions), and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction).
The GET package provides central regions (i.e. global envelopes) and global envelope tests with intrinsic graphical interpretation. The central regions can be constructed from (functional) data. The tests are Monte Carlo or permutation tests, which demand simulations from the tested null model. The methods are applicable for any multivariate vector data and functional data (after discretization).
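The idea of a global envelope, i.e. a band that a vector leaves in any of its d points with probability at most alpha, can be illustrated in base R without the package. The following is a minimal sketch of a rank-based construction on simulated Gaussian vectors. It is not the package's implementation, and all object names are hypothetical.

```r
# Illustrative sketch only (base R, not the GET implementation).
set.seed(1)
d <- 20        # number of points per vector
s <- 999       # number of vectors
alpha <- 0.05  # target global level

vecs <- matrix(rnorm(d * s), nrow = d, ncol = s)  # one vector per column

# Pointwise ranks counted from the smallest and from the largest value
rank_low <- t(apply(vecs, 1, rank))
rank_high <- s + 1 - rank_low

# Extreme rank of each vector: its most extreme pointwise rank
extreme_rank <- apply(pmin(rank_low, rank_high), 2, min)

# Largest k such that at most alpha of the vectors have extreme rank below k
k_candidates <- 1:floor(s / 2)
prop_outside <- sapply(k_candidates, function(k) mean(extreme_rank < k))
k_alpha <- max(k_candidates[prop_outside <= alpha])

# Envelope: k_alpha-th smallest and k_alpha-th largest value at each point
lower <- apply(vecs, 1, function(x) sort(x)[k_alpha])
upper <- apply(vecs, 1, function(x) sort(x, decreasing = TRUE)[k_alpha])

# Share of vectors leaving the band in at least one point
outside <- apply(vecs < lower | vecs > upper, 2, any)
mean(outside)
```

By construction the share of vectors leaving the band is at most alpha, simultaneously over all d points. A pointwise 95% band would be left by a much larger share of the vectors.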
To get an overview of the package, start R and type library("GET") and vignette("GET").
To get examples of point pattern analysis, start R and type library("GET") and vignette("pointpatterns").
To get examples of Mrkvička and Myllymäki (2023), start R and type library("GET") and vignette("FDRenvelopes").
Central regions or global envelopes or confidence bands: central_region.
E.g. 50% central region of the growth curves of girls (data set growth of the fda package).
First create a curve_set of the growth curves, e.g.
cset <- curve_set(r = as.numeric(row.names(growth$hgtf)),
obs = growth$hgtf)
Then calculate the 50% central region (see central_region for further arguments)
cr <- central_region(cset, coverage = 0.5)
Plot the result (see plot.global_envelope for plotting options)
plot(cr)
It is also possible to do combined central regions for several sets of curves provided in a list for the function, see examples in central_region.
Global envelope tests: global_envelope_test is the main function.
E.g. a test of complete spatial randomness (CSR) for a point pattern X:
X <- spruces # an example pattern from spatstat
Use the function envelope of spatstat to create nsim simulations under CSR and to calculate the functions you want (below K-functions by Kest).
Important: use the option 'savefuns=TRUE' and specify the number of simulations nsim.
env <- envelope(X, nsim=999, savefuns = TRUE, fun = Kest, simulate = expression(runifpoint(ex = X)))
Perform the test (see global_envelope_test for further arguments)
res <- global_envelope_test(env)
Plot the result (see plot.global_envelope for plotting options)
plot(res)
It is also possible to do combined global envelope tests for several sets of curves provided in a list for the function, see examples in global_envelope_test.
To obtain false discovery rate envelopes of Mrkvička and Myllymäki (2023) use the argument typeone = "fdr".
Functional ordering: central_region and global_envelope_test are based on different measures for ordering the functions (or vectors) from the most extreme to the least extreme ones. The core functionality of calculating the measures is in the function forder, which can be used to obtain different measures for sets of curves. Usually there is no need to call forder directly.
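For readers who do want to inspect the ordering, a direct call to forder might look like the sketch below. The random-walk curves are made up for illustration. The measure name "erl" is one of the types mentioned on this page. That smaller values mark more extreme curves is an assumption to verify on the help page of forder.

```r
# Illustrative sketch; the toy curves are hypothetical.
library(GET)
set.seed(3)

r <- seq(0, 1, length.out = 30)
curves <- replicate(100, cumsum(rnorm(30)))  # 30 points x 100 curves

cset <- curve_set(r = r, obs = curves)

# One value of the chosen measure per curve
erl <- forder(cset, measure = "erl")

# Indices of the five most extreme curves
# (assumption: a smaller measure means a more extreme curve)
most_extreme <- order(erl)[1:5]
most_extreme
```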
Functional boxplots: fBoxplot
Adjusted global envelope tests for composite null hypotheses: GET.composite, see a detailed example in saplings
One-way functional ANOVA:
Graphical functional ANOVA tests: graph.fanova
Global rank envelope based on F-values: frank.fanova
Functional general linear model (GLM):
Graphical functional GLM: graph.flm
Global rank envelope based on F-values: frank.flm
For large data (not fitting comfortably in memory): partial_forder
Functional clustering: fclustering
Global quantile regression: global_rq
Functions for performing global envelopes for other specific purposes:
Graphical n sample test of correspondence of distribution functions: GET.distrequal
Permutation-based tests of independence to samples from any bivariate distribution: GET.distrindep
Testing global and local dependence of point patterns on covariates: GET.spatialF
Testing local correlations: GET.localcor
Variogram and residual variogram with global envelopes: GET.variogram
Deviation tests (for simple hypothesis): deviation_test (no graphical interpretation)
Most functions accept the curves provided in a curve_set object.
Use curve_set to create a curve_set object from the functions. Other formats to provide the curves to the above functions are also accepted, see the information on the help pages.
See the help files of the functions for examples.
To perform a test you always first need to obtain the test function T(r) for your data (T_1(r)) and for each simulation (T_2(r), \dots, T_{s+1}(r)) in one way or another.
Given the set of the functions T_i(r), i=1, \dots, s+1, you can perform a test by global_envelope_test.
1) The workflow when using your own programs for simulations:
(Fit the model and) create s simulations from the (fitted) null model.
Calculate the functions T_1(r), T_2(r), \dots, T_{s+1}(r).
Use curve_set to create a curve_set object from the functions T_i(r), i=1, \dots, s+1.
Perform the test
res <- global_envelope_test(curve_set)
where curve_set is the 'curve_set' object you created, and plot the result
plot(res)
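The steps of workflow 1 could look roughly like the following sketch. The toy setting (a sample tested against the standard normal distribution, with the empirical distribution function on a grid as T(r)) is an illustration and not taken from the package. The argument name sim of curve_set is assumed to match the installed GET version, so check ?curve_set if the call fails.

```r
# Illustrative sketch of workflow 1 with a hypothetical toy null model.
library(GET)
set.seed(42)

n <- 50    # sample size
s <- 999   # number of simulations from the null model
r <- seq(-2, 2, length.out = 41)  # argument values of the test function

# Test function T(r): empirical distribution function evaluated on the grid r
ecdf_on_grid <- function(x) ecdf(x)(r)

# T_1(r) for the data (here the data are themselves simulated)
obs <- ecdf_on_grid(rnorm(n))

# T_2(r), ..., T_{s+1}(r) for s simulations from the null model N(0, 1)
sim <- replicate(s, ecdf_on_grid(rnorm(n)))  # 41 x s matrix, one curve per column

# Collect the curves and perform the test
cset <- curve_set(r = r, obs = obs, sim = sim)
res <- global_envelope_test(cset, type = "erl")
plot(res)
```

Here the data follow the null model, so the observed curve is expected to stay inside the envelope in most runs.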
2) The workflow utilizing spatstat: start R, type library("GET") and vignette("pointpatterns"), which explains the workflow and gives many examples of point pattern analysis.
It is possible to modify the curve set T_1(r), T_2(r), \dots, T_{s+1}(r) for the test.
You can choose the interval of distances [r_{\min}, r_{\max}] by crop_curves.
For better visualisation, you can take T(r)-T_0(r) by residual. Here T_0(r) is the expectation of T(r) under the null hypothesis.
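A sketch of both modifications on made-up curves follows. The argument names r_min and r_max of crop_curves and sim of curve_set are assumptions about the installed GET version. Without a theoretical curve in the set, residual is assumed here to estimate T_0(r) from the curves themselves, which should be verified on the help pages.

```r
# Illustrative sketch; the quadratic toy curves are hypothetical.
library(GET)
set.seed(11)

r <- seq(0, 2, length.out = 41)
obs <- r^2 + rnorm(41, sd = 0.1)                   # T_1(r)
sim <- replicate(199, r^2 + rnorm(41, sd = 0.1))   # T_2(r), ..., T_{s+1}(r)
cset <- curve_set(r = r, obs = obs, sim = sim)

# Restrict the test to the interval [0.5, 1.5]
cset_cropped <- crop_curves(cset, r_min = 0.5, r_max = 1.5)

# Replace T(r) by T(r) - T_0(r) for easier visual comparison
cset_resid <- residual(cset_cropped)

res <- global_envelope_test(cset_resid, type = "erl")
plot(res)
```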
abide_9002_23: see help page
adult_trees: a point pattern of adult trees
cgec: centred government expenditure centralization (GEC) ratios (see graph.fanova)
fallen_trees: a point pattern of fallen trees
GDPtax: GDP per capita with country groups and other covariates
imageset3: a simulated set of images
rimov: water temperature curves in 365 days of the 36 years
saplings: a point pattern of saplings (see GET.composite)
The data sets are used to show examples of the functions of the library.
If the number of functions is low, the choice of the measure (or type or depth) plays a role, as explained in vignette("GET") (Section 2.4).
Note that the recommended minimum number of simulations for the rank envelope test (Myllymäki et al., 2017) based on a single function in spatial statistics is nsim=2499. When the number of argument values is large, a larger number of simulations is needed to obtain a narrow p-interval. The "erl", "cont", "area", "qdir" and "st" global envelope tests and deviation tests can be used with fewer simulations, although the Monte Carlo error is then larger. As the number of simulations increases, all the global rank envelopes approach the same curves.
Mrkvička et al. (2017) discussed the number of simulations for tests based on many functions.
Myllymäki and Mrkvička (2023) provide a description of the package.
The material can also be found in the corresponding vignette, which is available by starting R and typing library("GET") and vignette("GET").
In the special case of spatial processes (spatial point processes, random sets),
the functions are typically estimators of summary functions. The package supports
the use of the R package spatstat for generating simulations and calculating
estimators of the chosen summary function, but alternatively these can be done in
any other way, thus allowing for any user-specified models/functions.
To see examples of global envelopes for analysing point pattern data, start R, type library("GET") and vignette("pointpatterns").
Mrkvička and Myllymäki (2023) developed false discovery rate (FDR) envelopes.
Examples can be found in the associated vignette: start R and type library("GET") and vignette("FDRenvelopes").
Mrkvička et al. (2023a) proposed global quantile regression. An example of global quantile regression is given in the vignette vignette("QuantileRegression").
The vignette vignette("HotSpots") illustrates the methodology proposed by Mrkvička et al. (2023b) for detecting hotspots on a linear network.
Type citation("GET") to get a full list of references.
Mikko Kuronen has made substantial contributions of code. Additional contributions and suggestions from Jiří Dvořák, Pavel Grabarnik, Ute Hahn, Michael Rost and Henri Seijo.
Mari Myllymäki (mari.myllymaki@luke.fi, mari.j.myllymaki@gmail.com) and Tomáš Mrkvička (mrkvicka.toma@gmail.com)
Dai, W., Athanasiadis, S., Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open, London, UK. DOI: 10.5772/intechopen.100124
Dvořák, J. and Mrkvička, T. (2022). Graphical tests of independence for general distributions. Computational Statistics 37, 671–699.
Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2023a) Global quantile regression. arXiv:2309.04746 [stat.ME]. https://doi.org/10.48550/arXiv.2309.04746
Mrkvička, T., Kraft, S., Blažek, V. and Myllymäki, M. (2023b) Hotspots detection on a linear network with presence of covariates: a case study on road crash data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4598454
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5), 1239-1255. doi: 10.1007/s11222-016-9683-9
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Mrkvička, T., Myllymäki, M. (2023) False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
Mrkvička, T., Roskovec, T. and Rost, M. (2021) A nonparametric graphical tests of significance in functional GLM. Methodology and Computing in Applied Probability 23, 593-612. doi: 10.1007/s11009-019-09756-y
Mrkvička, T., Soubeyrand, S., Myllymäki, M., Grabarnik, P., and Hahn, U. (2016) Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Statistics 18, Part A, 40-53. doi: http://dx.doi.org/10.1016/j.spasta.2016.04.005
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan, D. (2015) Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017) Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2023). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Myllymäki, M., Kuronen, M. and Mrkvička, T. (2020). Testing global and local dependence of point patterns on covariates in parametric models. Spatial Statistics 42, 100436. doi: 10.1016/j.spasta.2020.100436