knitr::opts_chunk$set( message = FALSE, # Switch of all messages collapse = TRUE, comment = "#> ", # To match console fig.align = "center", fig.height = 5, fig.width = 7, out.width = "100%", dev = 'png', dpi = 300 )
The R package pixelate centres around a single function, pixelate
, designed to pixelate spatial predictions as per their average uncertainty.
In brief, the function pixelate
groups predictions into a specified number of large pixels; computes the average uncertainty within each large pixel; then, for each large pixel, depending on its average uncertainty, either averages the predictions across it or across smaller pixels nested within it. These averaged predictions are then plotted.
The resulting plot of averaged predictions is selectively pixelated, similar to a photo that is deliberately pixelated to disguise a person's identity. Areas of high average uncertainty in the pixelated plot are unresolved, while areas with high average certainty are resolved, similar to information poor versus rich regions of a satellite map.
For full details of the function pixelate
see its documentation accessed by ?pixelate
or help(pixelate)
, its code accessed by pixelate
and the code for the functions on which it depends, which can be found in pixelate_internal_functions.R under the R/ directory of the source package, available online github.com/artaylor85/pixelate/R/.
The function pixelate
supports spatially continuous predictions (e.g. those generated by a geostatistical @diggle1998 or geospatial model @hengl2018) on any topic providing that they have associated measures of uncertainty. Its output could be plotted using any of the various geographic information systems available. We illustrate its utility using predicted Plasmodium falciparum incidence @weiss2019 and plot its output in R using functions from the ggplot2 package @ggplot2.
As the warning generated by pixelate
in the quick example below states, there are issues around averaging uncertainty generated by a so-called 'per-pixel' simulation @gething:10, where 'per-pixel' is synonymous with 'per-prediction' (whereas, in pixelate, a pixel refers to a collection of one or more predictions). This is a problem with the input to pixelate
not the basis of pixelate
. Predicted P. falciparum incidence is based on a posterior predictive simulation that approximates the joint posterior predictive distribution thus accounting for spatial covariance and enabling averaging of uncertainty [@gething:10; @weiss2019].
While developing this R package, we became aware of an identically named function provided by the Vizumap package @vizumap. Vizumap::pixelate
is designed to pixelate choropleth maps @lucchesi2017, whereas pixelate::pixelate
is designed to pixelate spatially continuous predictions [@diggle1998; @hengl2018]. Within each region of the cloropleth map, Vizumap::pixelate
works by randomly assigning each pixel a value sampled from the margin of error around the corresponding regional estimate (pixels are equally-sized across the entire map); see the vignette of Vizumap and @lucchesi2017 for more details. Another interesting and related article that discusses communicating uncertainty in general highlights the utility of exceedance probabilities (e.g. the probability that some prediction exceeds a threshold) @kuhnert2018. Exceedance probabilities (EPs) provide a very simple alternative to pixelation: 'EPs from a threshold provide one piece of information to the decision maker: a probability.' and so can convey useful information to decision makers in a single map, whether it be a cloropleth map or a map of spatially continuous predictions.
# Load and attach required packages library(pixelate) if (!require("ggplot2")) { stop("Package ggplot2 is needed to build this vignette. Please install it.") }
To use pixelate
, simply input some spatial predictions and plot the output as follows. Please follow the detailed example in the next section to better understand the versatility of pixelate
and to generate prettier plots.
# Pixelate px <- pixelate(SubSaharanAfrica_Pf_incidence) # Plot ggplot(px$pix_df) + geom_raster(mapping = aes(x = x, y = y, fill = pix_z))
Before we begin, to avoid duplicating code over multiple plots, let's define a function,
plot_sp_pred
, that wraps various calls to functions from the package ggplot2 @ggplot2.
The plot_sp_pred
function generates a raster plot with country borders if possible.
Country borders enhance plots but are inessential.
We attempt to add them using shape file data and
functions ggplot2::geom_polygon
and ggplot2::coord_sf
.
The latter requires the sf package whose installation can be problematic because it may require some external packages. To ensure the code runs for all, including those who have problems with the installation of sf, we check for the availability of sf and generate plots accordingly,
with and without country borders.
Shape file data (see e.g. ?SubSaharanAfrica_shp
) were obtained using the function malariaAtlas::getShp
from the package malariaAtlas @malariaAtlas; see get_shape_files.R in the data-raw/ directory of the source package, available online github.com/artaylor85/pixelate/data-raw/.
# Function to wrap many functions from ggplot2 plot_sp_pred = function(sp_pred, shp_file, legend_position = c(0.02, 0.01)){ p <- ggplot() + # Raster surface geom_raster(data = sp_pred, mapping = aes(x = x, y = y, fill = z)) + ylab('Latitude (degrees)') + xlab('Longitude (degrees)') + # Modify the theme: theme(legend.justification = c(0, 0), legend.position = legend_position, legend.background = element_rect(fill = NA), legend.title = element_text(size = 8), legend.text = element_text(size = 8), text = element_text(family = "serif")) # Check the availability of package sf (needed for ggplot2::coord_sf) # and proceed accordingly. Note that requireNamespace("sf") is FALSE # if sf is available but one of its dependencies (e.g. units) is not if (requireNamespace("sf")) { # Set the limits of the plot using coord_sf p <- p + coord_sf(xlim = range(sp_pred$x), ylim = range(sp_pred$y), expand = FALSE) + # Add country borders based on shape file data geom_polygon(data = shp_file, aes(x = long, y = lat, group = group), fill = NA, colour = "gray30", size = 0.1) } else { p <- p + coord_fixed(expand = FALSE) } return(p) }
The function pixelate
supports spatially continuous predictions on any topic providing that they have associated measures of uncertainty. To use pixelate
, simply format your spatial predictions, uncertainty measures and coordinates as per the example data sets described in more detail below. A minimum of eight predictions (two by four or four by two) are necessary for pixelation by two or more large pixels.
We illustrate the utility of pixelate
using spatial predictions of P. falciparum malaria incidence in 2017 @weiss2019, which are available at the Malaria Atlas Project (MAP) website map.ox.ac.uk. Specifically, we downloaded posterior predictive summaries of P. falciparum incidence by selecting "ANNUAL MEAN OF PF INCIDENCE" at map.ox.ac.uk/malaria-burden-data-download/. We formatted the downloads using the script format_pf_incidence.R in the data-raw/ directory of the source package, available online github.com/artaylor85/pixelate/data-raw/.
(As an aside, the R package malariaAtlas @malariaAtlas provides a direct R interface to parasite predictions and MAP rasters via malariaAtlas::getPR
and malariaAtlas::getRaster
. However, at the time of writing, uncertainty measures were not available via the package malariaAtlas.)
The formatted incidence predictions are included in the installed package pixelate. They are stored in an observation data frame with an observation per row. Each observation has four variables: a longitude, x; a latitude, y; a prediction, z; and an uncertainty measure, u; for more details see ?SubSaharanAfrica_Pf_incidence
and ?CentralAfrica_Pf_incidence
, where the latter is simply a cropped version of the former. A selection of observations with predictions that are neither zero (e.g. because they are on a mountain top) or NA (e.g. because they are in the ocean) is printed below.
example_count = 5 knitr::kable(CentralAfrica_Pf_incidence[1:example_count,], row.names = FALSE, caption = sprintf("A selection of %s observations.", example_count))
In our example z is the median predicted incidence and u is the width of the 0.95\% predicted incidence credible interval. (Instead of taking the median and credible interval width as z and u respectively, we could have taken other measures of location and uncertainty, e.g. the mean and standard deviation as in @van2019.) Both z and u are plotted in turn below. But first let's create a box to delineate the central Africa region that features in the example data set CentralAfrica_Pf_incidence
:
CA_box = CentralAfrica_Pf_incidence CA_box = data.frame(x = c(min(CA_box$x), min(CA_box$x), max(CA_box$x), max(CA_box$x), min(CA_box$x)), y = c(min(CA_box$y), max(CA_box$y), max(CA_box$y), min(CA_box$y), min(CA_box$y)), id = rep(1,5))
Now let's plot z (median predicted incidence) and u (uncertainty of predicted incidence), adding the box to delineate the central Africa region:
# Set up the plot using plot_sp_pred unpix <- plot_sp_pred(sp_pred = SubSaharanAfrica_Pf_incidence, shp_file = SubSaharanAfrica_shp) # Add gradient unpix <- unpix + scale_fill_gradientn(name = "Median \nincidence", colors = c("seashell", "tomato", "darkred"), na.value = 'lightblue2') + # Add box geom_polygon(data = CA_box, aes(x = x, y = y, group = id), fill = NA, colour = "black", size = 0.5) # Plot unpix
# Create a temporary observation data frame and replace z by u temp_obs <- SubSaharanAfrica_Pf_incidence temp_obs$z <- SubSaharanAfrica_Pf_incidence$u # Set up the plot using plot_sp_pred uncrt <- plot_sp_pred(sp_pred = temp_obs, shp_file = SubSaharanAfrica_shp) # Add gradient uncrt <- uncrt + scale_fill_gradientn(name = "95% credible \ninterval width", colors = c("seashell", "darkgray", "black"), na.value = 'lightblue2') + # Add box geom_polygon(data = CA_box, aes(x = x, y = y, group = id), fill = NA, colour = "black", size = 0.5) # Plot uncrt
Uncertainty in predicted P. falciparum incidence (figure above) is low in regions were there are epidemiological data and in regions where there are strong predictors of incidence.
pixelate
Spatial predictions are input into pixelate
via an argument called obs_df. An observation (obs) refers to a set containing a single prediction throughout the code and package documentation, whereas pixels refer to squares or rectangles comprised of one or more observations and thus predictions. Note that the warning regards averaging uncertainty is silenced in the following examples.
px <- pixelate(obs_df = SubSaharanAfrica_Pf_incidence)
The above call to pixelate implicitly relies on default values for a set of pixelation arguments; see ?pixelate
to see what they are. The default values will fail if there are fewer than 480 by 480 predictions in the obs_df. To pixelate a obs_df with fewer predictions, reduce the number of large pixels (argument num_bigk_pix) and/or the number of different pixel sizes (argument bigk):
px_CA <- pixelate(obs_df = CentralAfrica_Pf_incidence, num_bigk_pix = c(12,12)) # Change from default, which is 15
If the obs_df has more than 480 by 480 predictions, you can increase the number of large pixels and/or the number of different pixel sizes:
px_SSA <- pixelate(obs_df = SubSaharanAfrica_Pf_incidence, num_bigk_pix = c(20,20)) # Change from default, which is 15
It is also possible to explore pixels that scale in size (observations per pixel) more rapidly by increasing the scale_factor argument and changing the scale argument from imult to iexpn. However, because the observations per pixel can scale extremely rapidly when default pixelation arguments are changed, the prediction count required for seemingly modest argument combinations may exceed those available and thus generate an error, e.g.
px <- pixelate(obs_df = SubSaharanAfrica_Pf_incidence, scale = "iexpn", # Change from default, which is "imult" scale_factor = 2) # Change from default, which is 1
pixelate
-ed outputTo better appreciate the different pixel sizes, we first generate a pixel size legend layer as follows:
gap_y <- 20 # Number of observations between example pixels and left edge of the plot gap_x <- 350 # Number of observations from the bottom edge of the plot pix_legend <- px_SSA$pix_df # Copy coordinates pix_legend$z <- NA # Set z to NA bigk <- nrow(px_SSA$opp) # Extract the number of pixel sizes csy <- c(0,cumsum(px_SSA$opp$y)) # Cumulatively add pixel sizes xs <- sort(unique(pix_legend$x)) # Extract unique longitude (degrees) ys <- sort(unique(pix_legend$y)) # Extract unique latitude (degrees) for(k in 1:bigk){ # For each of the pixel sizes ind_y <- gap_y*k + ((csy[k] + 1):csy[k+1]) # Indices of latitude of example pixels ind_x <- (gap_x + 1):(gap_x + px_SSA$opp$x[k]) # Indices of longitude of example pixels # Indices in the observation data frame inds <- which(pix_legend$x %in% xs[ind_x] & pix_legend$y %in% ys[ind_y]) pix_legend$z[inds] <- 0 # Set location measure of example pixel predictions to zero } # Remove all NAs pix_legend <- pix_legend[!is.na(pix_legend$z), ]
We then add the pixels size legend layer to a plot of the pixelated spatial predictions as follows.
# Create a temporary observation data frame and set z to pixelated pix_z temp_obs <- px_SSA$pix_df temp_obs$z <- temp_obs$pix_z # Generate a pixelated plot pix_SSA <- plot_sp_pred(sp_pred = temp_obs, shp_file = SubSaharanAfrica_shp) # Add gradient pix_SSA <- pix_SSA + scale_fill_gradientn(name = "Average median \nincidence", colors = c("seashell", "tomato", "darkred"), na.value = 'lightblue2') + # Add pixel size legend layer with title geom_tile(data = pix_legend, mapping = aes(x = x, y = y, fill = z)) + annotate("text", x = min(pix_legend$x), y = max(pix_legend$y) + 1.5, label = "Pixel sizes", family = "serif", size = 2.5, hjust = 0) # Plot pix_SSA
Following the example above, we can plot the bespoke pixelation of the central Africa region (i.e. pixelation of the predictions within the boxed region of the unpixelated plot as per the uncertainties within the boxed region of the uncertainty plot):
# Pixel size legend gap_y <- 5 # Number of observations between example pixels and left edge of the plot gap_x <- 5 # Number of observations from the bottom edge of the plot pix_legend <- px_CA$pix_df # Copy coordinates pix_legend$z <- NA # Set z to NA bigk <- nrow(px_CA$opp) # Extract the number of pixel sizes csy <- c(0,cumsum(px_CA$opp$y)) # Cumulatively add pixel sizes xs <- sort(unique(pix_legend$x)) # Extract unique longitude (degrees) ys <- sort(unique(pix_legend$y)) # Extract unique latitude (degrees) for(k in 1:bigk){ # For each of the pixel sizes ind_y <- gap_y*k + ((csy[k] + 1):csy[k+1]) # Indices of latitude of example pixels ind_x <- (gap_x + 1):(gap_x + px_CA$opp$x[k]) # Indices of longitude of example pixels # Indices in the observation data frame inds <- which(pix_legend$x %in% xs[ind_x] & pix_legend$y %in% ys[ind_y]) pix_legend$z[inds] <- 0 # Set location measure of examples pixel predictions to zero } # Remove all NAs pix_legend <- pix_legend[!is.na(pix_legend$z), ] # Create a temporary data frame and set z to pixelated pix_z temp_obs <- px_CA$pix_df temp_obs$z <- temp_obs$pix_z # Generate a pixelated plot pix_CA <- plot_sp_pred(sp_pred = temp_obs, shp_file = CentralAfrica_shp, legend_position = "right") # Add gradient pix_CA <- pix_CA + scale_fill_gradientn(name = "Average \nmedian \nincidence", colors = c("seashell", "tomato", "darkred"), na.value = 'lightblue2') + # Add pixel size legend layer with title geom_tile(data = pix_legend, mapping = aes(x = x, y = y, fill = z)) + annotate("text", x = min(pix_legend$x), y = max(pix_legend$y) + 0.8, label = "Pixel \nsizes", family = "serif", size = 2, hjust = 0) # Add border pix_CA_border <- pix_CA + theme(panel.border = element_rect(colour = "black", fill = NA, size = 1)) # Plot pix_CA_border
Note that the pixelated plots of both sub-Saharan Africa and central Africa feature perfectly resolved blue and white regions. This is because, importantly, missing predictions (coloured blue) and predictions that are zero with certainty (coloured white) are excluded from the entire pixelation process (i.e. computation and classification of average uncertainty, and computation of average prediction across large and nested pixel sizes). Missing predictions and predictions that are zero with certainty thus appear exactly as they do in the unpixelated plot and the plot of uncertainty.
As mentioned above pixelate
works by grouping predictions into a specified number of large pixels and for each large pixel either averaging predictions across it or across smaller pixels nested within it depending on its average uncertainty. (These averaged predictions are then plotted.) Specifically, each large pixel is allocated a nested pixel size depending on which quantile interval its average uncertainty falls into, where the number of quantile intervals is equal to the number of different nested pixel sizes and the quantiles are based on the empirical distribution of average uncertainty. For example, in the pixelated plot of sub-Saharan Africa the nested pixel sizes correspond to the following quantile intervals.
# Extract bigk bigk <- nrow(px_SSA$opp) # Define quantile intervals qs <- format(as.numeric(gsub('%', '',names(px_SSA$uncertainty_breaks)))/100, digits = 2) quantile_intervals <- sapply(2:(bigk+1), function(k){ bounds <- qs[c(k-1,k)] paste(bounds, collapse = ' to ') }) # Define average uncertainty intervals avus <- format(px_SSA$uncertainty_breaks, digits = 2) avuncertainty_intervals <- sapply(2:(bigk+1), function(k){ bounds <- avus[c(k-1,k)] paste(bounds, collapse = ' to ') }) # Group together pixel_table <- cbind(quantile_intervals, avuncertainty_intervals, px_SSA$opp) # Generate table knitr::kable(pixel_table, row.names = FALSE, col.names = c("Quantile interval", "Average uncertainty", "Nested pixel width (obs)", "Nested pixel height (obs)"), align = rep("c",ncol(pixel_table)))
We can plot the large pixel allocations to see how uncertainty varies across large pixels (i.e. which quantile interval each large pixel falls into):
# Set z equal to the large pixel average uncertainty quantile interval index temp_obs <- px_SSA$pix_df temp_obs$z <- as.factor(px_SSA$pix_df$bins) # Generate and print the plot plot_sp_pred(sp_pred = temp_obs, shp_file = SubSaharanAfrica_shp) + scale_fill_discrete(name = "Average uncertainty \nquantile interval", na.value = 'lightblue2', h = c(120, 360), labels = c(quantile_intervals, 'NA'))
From the plot and table of quantile intervals, we can see that the greenest regions have high certainty and thus predictions in these regions of the pixelated plot of sub-Saharan Africa are not averaged (the nested pixels are one by one observation). Meanwhile predictions in the pixelated plot of sub-Saharan Africa that fall with blue regions of the quantile plot are averaged over moderately sized nested pixels (six by six observations). The redest regions (that cover large parts of the Democratic Republic of the Congo) have high uncertainty; predictions in the pixelated plot of sub-Saharan Africa that fall within pink regions of the quantile plot are averaged across the entire large pixel, which is 48 by 48 observations.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.