Introduction to EgoCor"

knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  collapse = TRUE,
  comment = "#>"
)
library(EgoCor)

This is an introduction to the package EgoCor. EgoCor offers a user friendly interface to fit and display a range semi-variogram model graphics and tables of parameters to facilitate the decision making about which exponential parameters fit either raw data or residuals best. The modeling function vario.mod() is based on the functions of the R package gstat. A further function providing the measure of uncertainty proposed by @dyck_sv_ses. has been implemented in the package. With the R package EgoCor modelling the spatial correlation structure of health outcome with a measure of uncertainly is made available to non specialists.

Statistical background

Please find more detailed information about the used statistical methods in @dyck_sv_ses.

Data

The simulated dataset birth is provided with the package EgoCor. The dataset is based on the spatial distribution of real birthweight data. It contains eight variables for 903 births:

head(birth)

Functions

We use this dataset to illustrate the following functions:

  1. coords.plot(): for graphical description of locations;
  2. distance.info(): for descriptive information about distances between observations;
  3. vario.mod(): to fit empirical semi-variograms and based on that exponential models with graphical presentation;
  4. vario.reg.prep(): to extract the spatial correlation structure of residuals of a regression model;
  5. par.uncertainty(): to obtain bootstrap standard errors for the parameters of the exponential semi-variogram model.

While detailed information can be retrieved with help(function.name), a presentation and demonstration of all functions is provided in the following sections.

1. coords.plot()

The first three columns of the data frame or matrix should be ordered the following way: 1st column: x-coordinate in meters for a Cartesian coordinate system; 2nd column: y-coordinate in meters for a Cartesian coordinate system; 3rd column: outcome of interest. Other columns will be ignored.

The function coords.plot() provides a simple visualization of the locations on a two dimensional map and indicates whether the outcome is observed (by a black circle) or missing (by a red x) at a specific location. The purpose of this function is to look at the spatial distribution of observations and if there might be a spatial pattern in the distribution of missing values in the outcome of interest or in covariates.

coords.plot(birth)

2. distance.info()

This function provides further information about the distribution of pairwise Euclidean distances. It displays the following descriptive statistics:

distance.info(birth)

From all the 815 409 pairwise distances, 30 570 are of less than 2 000 meters and will be used for modelling of the local spatial correlation structure.

3. vario.mod()

This function enables the simultaneous output of multiple exponential semi-variogram models fitted for a range of maximal distances and a number of bins. Thereby, the focus lies on the ability of the function to provide multiple estimation results depending on various specifications for the meta parameters max.dist and nbins.

In the default setting shinyresults = TRUE, an interactive shiny application is opened automatically that displays the results. For the purpose of this vignette though we set shinyresults = FALSE and windowplots = TRUE.

The chosen maximal distance value specifies the subset of data pairs that are incorporated in the semi-variogram estimation. Only data pairs with an Euclidean distance ≤ max.dist are taken into account.

For a first exploration, it might be useful to try a range of maximal distances to locate where the range might be situated:

mod = vario.mod(birth, max.dist = c(1000,800,600), shinyresults = FALSE, windowplots = TRUE)

You can also get the estimated parameters later by

mod$infotable

The maximal distance of 800 m seems to provide a good fit for this dataset. We can now refine the analysis by varying the number of bins or lags. The nbins parameter specifies the number of lags of the empirical semi-variogram to be estimated. A high number of lags might lead to a small within-lag-sample-size and thus to an unstable estimate. Simultaneously, a too small number of lags might lead to a model that does not detect a spatial correlation structure at all.

mod_2 = vario.mod(birth, max.dist = 800, nbins = c(11,12,13), 
                  shinyresults = FALSE, windowplots = TRUE)

All models' graphical representations look similar. According to the relative bias model 3 provides a slightly better fit. We choose model 3 with max.dist = 800 and nbins = 13 as final model.

4. vario.reg.prep()

To use vario.mod() to model the spatial correlation structure of residuals, the studentized residuals from a (hierachical) linear regression model can be extracted by vario.reg.prep(). We want to see if adjusting for some predictors of birthweight explain some or all of the spatial correlation structure seen:

res <- lm(formula = birthweight ~ datediff + primiparous + bmi, data = birth)
v.prep = vario.reg.prep(res, data = birth)
models = vario.mod(v.prep, max.dist = c(800,600), shinyresults = FALSE, windowplots = TRUE)

The results point towards a reduced spatial correlation structure with a maximal distance reduced to half the one obtained before adjustment and much less regularity in the empirical semi-variogram.

5. par.uncertainty()

This function provides the filtered bootstrap standard errors for all three exponential model parameters. As the bootstrap can take some time, it is not called by the function vario.mod() directly so that the bootstrap is not executed automatically for all fitted models. This is left to the choice of the user by selecting one specific fitted model, thus saving execution time. The main output of the function is a table with parameter estimates and standard errors for all three parameters of the estimated model. The user can either provide a model estimated by vario.mod() and a model number that specifies which one to use or provide the parameter estimates, the data, the maximum distance and number of bins seperately. The first option is more convenient, therefore we chose it here. The second option enables standard error estimation for exponential semi-variogram models for which no result of class "vario.mod.output" is available. This might be useful if a fitted model which was not obtained with EgoCor (eg. estimated directly with gstat functions or other software making use of the same estimation algorithm) shall be investigated with respect to parameter standard errors. It is recommended to use more bootstrap samples than done here.

unc = par.uncertainty(mod_2, mod.nr = 2, B = 100)


Try the EgoCor package in your browser

Any scripts or data that you put into this service are public.

EgoCor documentation built on June 22, 2024, 12:11 p.m.