knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

EpiGeostats

EpiGeostats R package generates geostatistical disease maps using block direct sequential simulation algorithm and provides visualization tools for effective analysis and communication of disease risk map results.

The package may also be used with mortality rates, or applied to other fields like ecology or criminology.

R wrapper

EpiGeostats R package is a wrapper for running a geostatistical software - dss.c.64.exe - to compute block direct sequential simulation algorithm, and for calling pixelate R package (https://github.com/aimeertaylor/pixelate) to plot spatial risk uncertainty in disease risk maps.

Details on the geostatistical algorithm used in EpiGeostats are described in Azevedo et al (https://doi.org/10.1186/s12942-020-00221-5).

The package provides convenient wrappers to set parameters required for geostatistical simulations and writes files in dss.c.64.exe readable format, after which block direct sequential simulation algorithm can be executed. Results generate a set of simulated disease risk, median risk and spatial uncertainty maps, and a pixelated single-map with disease risk and uncertainty, that can be extracted and/or plotted in R.

Getting started

The installation of EpiGeostats needs compilation, hence it requires to have Rtools in the system. Moreover, the stand-alone software dss.c.64.exe is required to run EpiGeostats. Therefore, two prior steps are required to run EpiGeostats for the first time:

  1. download Rtools and run the installer (select the default options),
  2. download dss.c.64.exe.

After downloading dss.c.64.exe from https://github.com/maluicr/dss, create a folder called “input” in the working directory, add the zipped dss.c.64.exe.zip inside and unzip the file into the same folder.

You may execute these actions from your R console, running the following code:

# -----------------------------------------
# Step 1, if not installed, installs Rtools
# -----------------------------------------

# check if Rtools software is installed. if not, download and install from CRAN.
if (!require("installr")){install.packages("installr")}

# select the default options when installing
installr::install.Rtools(check_r_update = F)

# -----------------------------------------
# Step 2, download dss.c.64.exe from GitHub
# -----------------------------------------

# create folder input
inp <- "./input"
if(!file.exists(inp)) dir.create(inp, recursive = TRUE)

# download dss.c.64.exe.zip from github
gitURL <- "https://github.com/maluicr/dss/raw/main/DSS.C.64.exe.zip"
utils::download.file(url = gitURL, destfile = file.path(inp, "DSS.C.64.exe.zip"))

# unzip file into 'input' folder
unzip(file.path(inp, "DSS.C.64.exe.zip"), exdir = inp)

Then, install the required packages using the following code:

# install devtools from CRAN
if (!require("devtools")){
  install.packages("devtools")
  }

  # install packages from GitHub
  devtools::install_github("maluicr/EpiGeostats", build_vignettes = TRUE, dependencies = TRUE)
  devtools::install_github("aimeertaylor/pixelate", build_vignettes = TRUE, dependencies = TRUE)

You should now be ready to go.

Example

As an example, we will show how to use EpiGeostats to perform disease mapping of COVID-19 incidence and plot in a single-map disease risk and spatial risk uncertainty.

The input required to run EpiGeostats R package are two datasets :

  1. block data, a data frame with incidence data (counts) per region,
  2. point data, a regular grid dataset with id region values at grid nodes.

In this example, EpiGeostats is used to map COVID-19 risk in Portugal on January 15, 2021. The covid dataset ptdata and the regular grid ptgrid comes with the package EpiGeostats. The coordinate reference system is ETRS89 / Portugal TM06 (EPSG: 3763) and coordinates are in metres.

# load required libraries
library(EpiGeostats)
library(sp)
library(raster)

data(ptdata)
data(ptgrid)

Block data refers to disease data.frame with spatial data point locations (centroids), region id, number of disease cases and size of population. Point data refers to a rectangular grid with region id values at all simulation x, y coordinates (grid nodes). All unique id regions in disease data.frame should be represented by 1 or more grid nodes.

We need to transform ptgrid to SpatialPixelsDataFrame:

library(sp)
coordinates(ptgrid) <- ~x+y
proj4string(ptgrid) <- CRS("+init=epsg:3763")
class(ptgrid)
gridded(ptgrid) <- TRUE
class(ptgrid)

Compute incidence rates

Compute incidence rates and variance-error term per region, with function irates():

# compute incidence rates
inc <- irates(dfobj = ptdata, oid = "oid_", xx = "x", yy = "y", zz = "t", 
               cases = "ncases", pop = "pop19", casesNA = 1, day = "20210115")

An incidence file is also created in a readable format for dss.c.64.exe (.out) and stored in './input' folder.

Create grid node file for dss.c.64.exe

A regular gridded file with incidence data is also required for block simulation with dss.c.64.exe. grdfile() function transforms the gridded data into a readable format for dss.c.64.exe and prints it into a text file (.out) in './input' folder.

# create gridnode file
gnode <- grdfile(rateobj = inc, gridimage = ptgrid, NAval = -999)

Create mask file for dss.c.64.exe

A mask for the gridded file is required for dss.c.64.exe. maskfile() function produces the required file (.out). The only argument of the function is the name of list, output of function grdfile().

mask <- maskfile(gnode)

The function returns a list of objects and generates a text file (.out) with values {-1,0} where -1 are assigned to nodata locations and 0 are assigned to nodes with values (id region). The file is stored in ./input folder.

Estimate incidence rates semi-variogram

Use varexp() to calculate experimental variogram from COVID-19 rates. For now, varexp() is only implemented in the omnidirectional case.

# compute experimental variogram
vexp <- varexp(inc, lag = 7000, nlags = 15)

# plot experimental variogram
plot(vexp[["semivar"]][1:2], ylab = expression(paste(gamma, "(h)")), xlab = "h (in m)", main = "Semi-variogram") 

# add estimated (weighted) sill
abline(h = vexp[["weightsvar"]], col ="red", lty = 2)

The function returns a list with the weighted variance (by population size) and variogram estimates at nlags.

Fit incidence rates theoretical semi-variogram

Function varmodel() fits (manually) a theoretical semi-variogram. The user should provide the experimental semi-variogram data to evaluate fit by visual inspection, the semi-variogram model type and the semi-variogram parameters. For now, only spherical and exponential models are implemented.

vmod <- varmodel(vexp, mod = "sph", nug = 0, ran = 35000, sill = vexp[["weightsvar"]])

Plotting the fitted model:

# plot experimental variogram
plot(vexp[["semivar"]][1:2], ylab = expression(paste(gamma, "(h)")), xlab = "h (in m)") 

# add estimated sill
abline(h = vexp[["weightsvar"]], col ="red", lty = 2)

# add theoretical model
lines(vmod[["fittedval"]]) 

Create parameter file and invokes dss.c.64.exe

Function ssdpars() generates a parameters file (.par) and invokes dss.c.64.exe to run block direct sequential simulation.

ssdpars(grdobj = gnode, maskobj = mask, dfobj = inc, varmobj = vmod, 
        simulations = 5, radius1 = 35000, radius2 = 35000)

Note that this process may take a while, depending mostly on the number of simulation nodes and number of simulations specified. In the end of the process, the block simulation maps are stored in .out format.

Both parameters file (.par) and simulations files (.out) are stored in ./input folder.

Import dss.c.64.exe grid outputs in R readable formats

outraster() allows to import to R the .out files generated by dss.c.64.exe with a single-argument function. The user may specify additional arguments to save maps in native raster package format (.grd / .gri).

maps <- outraster(gnode)

Plotting disease map results

spmap() to plot disease risk map simulations:

library(ggplot2)
spmap(maps, mapvar = "simulations", simid = 1, legname = "Simulation #1")

spmap() to plot median disease risk and spatial uncertainty maps:

library(pixelate)
spmap(maps, mapvar = "etype", legname = "Median \n incidence")
spmap(maps, mapvar = "uncertainty", legname = "IQ range")

pxmap() to plot pixelated map:

pxmap(mapobj = maps)

Acknowledgements

Manuel Ribeiro acknowledges the financial support of the CERENA (project FCT-UIDB/04028/2020) and Fundação para a Ciencia e Tecnologia (research contract IF2018-CP1384). Manuel Ribeiro gratefully acknowledge CERENA-IST/UL researchers Leonardo Azevedo, Maria João Pereira and Amilcar Soares for providing code in Matlab and Fortran.



maluicr/blockdss documentation built on Nov. 22, 2023, 11:42 a.m.