In Michael-Stevens-27/silverblaze: Geographic Profiling in R

knitr::opts_chunk$set(echo = TRUE)

Motivation

A variable sigma model is available within the Silverblaze package but is yet to be discussed in great detail. The literature on geographic profiling has thus far assumed that the dispersal associated with an offender, an invasive species, or infectious disease is the same across source locations. @Ratcliffe2006 makes it clear that this is not always the case. An offender that commits crime around a work place will be restricted by time constraints to return to work within a certain time. This compares to more relaxed time constraints when committing crimes around a home. So we might expect a tight dispersal around a workplace but a wider dispersal around a home.

In this tutorial, we generate a couple of toy examples to investigate the following questions:

Can the model successfully return different dispersal (sigma) values for different source locations?
How well does the model perform when there is overlap in the dispersal ranges of sources with different sources?

Let's start by loading the Silverblaze package:

library(silverblaze)

set.seed(1)

Simple two source example

Setting up and running the independent sigma model

Set up an array of sentinel sites and a uniform prior. We'll also specify the sentinel site radius.

# sentinel sites
sentinel_lon <- seq(-0.2, 0.0, l = 15)
sentinel_lat <- seq(51.45, 51.55, l = 15)
sentinel_grid <- expand.grid(sentinel_lon, sentinel_lat)
names(sentinel_grid) <- c("longitude", "latitude")

# create source prior
uniform_prior <- raster_grid(cells_lon = 100, cells_lat = 100,
                               range_lat = range(sentinel_lat),
                               range_lon = range(sentinel_lon), guard_rail = 0.5)

# set sentinel radius                               
sentinel_site_radius <- 0.2

Now we generate some data. We use the sim_data() function to generate two sources, specifying the kind of sigma model we want to draw from by setting sigma_model = "independent". For each source we choose a sigma value using sigma_mean and sigma_var. This specifies the mean and variance of each normal distribution a sigma is being drawn from.

# sim data
sim1 <- sim_data(sentinel_grid$longitude,
                 sentinel_grid$latitude,
                 sigma_model = "independent", # set the sigma model to independent
                 sigma_mean = c(1, 2),      
                 sigma_var = c(0, 0),         
                 sentinel_radius = sentinel_site_radius,
                 K = 2,
                 expected_popsize = 1000)

# record imporant data
data_all1 <- sim1$record$data_all
true_source1 <- sim1$record$true_source
K_model <- 2

# pull the different sigma values
true_sigma1 <- sim1$record$true_sigma

Notice sigma_var is equal to zero for both sources, this allows us to force the value of each sigma. Now we build our geoprofiling project, add in our data and setup parameters to run the MCMC.

# create a project and bind data
p1 <- rgeoprofile_project()
p1 <- bind_data(p1,
                df = sim1$data,
                data_type = "counts")

# add parameter set to project
p1 <- new_set(project = p1,
             spatial_prior = uniform_prior,
             sentinel_radius = sentinel_site_radius,
             sigma_model = "independent",
             sigma_prior_mean = 1.5,
             sigma_prior_sd = 5,
             expected_popsize_prior_mean = 1000,
             expected_popsize_prior_sd = 100)

plot1 <- plot_map()
plot1 <- overlay_points(plot1, data_all1$lon, data_all1$lat)
plot1 <- overlay_sources(plot1, true_source1$lon, true_source1$lat)
plot1

We can see a source with a wider sigma and if we zoom into the bottom left we can see a source with a much tighter sigma. Let's create our geoprofiling project, load in this data and run the MCMC under a variable sigma model. We expect that we may run into some mixing issues when estimating multiple sigma values, hence we make use of the Metropolis-Hasting coupling described in one of the previous tutorials.

# optimise beta values for heated MCMC chains with an acceptance prob of at 
# least 0.75.  
beta_k <- optimise_beta(proj = p1, 
                        K = 2, 
                        target_acceptance = 0.75, 
                        max_iterations = 25,
                        beta_init = seq(0,1, l = 10), 
                        coupling_on = TRUE,
                        burnin = 1e3,
                        converge_test = 5e2,
                        samples = 10,
                        create_maps = FALSE,
                        pb_markdown = TRUE)

beta_values <- beta_k$beta_vec[[length(beta_k$beta_vec)]]

# run MCMC under K = 2 model
p1 <- run_mcmc(project = p1,
               K = K_model,
               burnin = 1e4,
               samples = 5e3,
               converge_test = 1e3,
               coupling_on = TRUE, 
               beta_manual = beta_values)

p1 <- rgeoprofile_file("VarSigproject_no_overlap.rds")

Results

Below we can see the geoprofile of the two sources. Just by eye it looks like each cluster has a different sigma value.

# produce and store map
plot1 <- plot_map()
plot1 <- overlay_sentinels(plot1, p1, sentinel_radius = sentinel_site_radius)
plot1 <- overlay_geoprofile(plot1, p1, K = 2)
plot1 <- overlay_sources(plot1, true_source1$lon, true_source1$lat)
plot1

How can we be sure these sigma values are different? We can access the raw output associated to sigma using the get_output() function. We can also look into where our true values fall within the 95% credible intervals of the fitted values.

# get the raw sigma draws
sigma_output <- get_output(p1, "sigma_sampling", K = 2, type = "raw")
sigma1 <- density(sigma_output[,1])
sigma2 <- density(sigma_output[,2])

# calculate the mean of these draws
estimatedSigma <- apply(sigma_output, 2, mean)
estimatedSigma

# get the credible intervals for sigma
sigma_intervals <- get_output(p1, "sigma_intervals", K = 2)
sigma_intervals
plot_sigma(p1, K = 2)

# print true values 
true_sigma1

These results are promising! As we can see the model is returning two distinct values of sigma.

Overlapping sources and variable dispersal

Now we generate another data set, both sources will be closer than the previous data to investigate the models ability to pick up two sources of varying dispersal on top of one another.

# set seed for simmed data
set.seed(1)

# sim data
sim2 <- sim_data(sentinel_grid$longitude,
                 sentinel_grid$latitude,
                 source_lon_min = -0.1, 
                 source_lon_max = -0.1,
                 source_lat_min = 51.49,
                 source_lat_max = 51.51,                 
                 sigma_model = "independent", # set the sigma model to independent
                 sigma_mean = c(0.5, 2),
                 sigma_var = c(0.1, 0.1),
                 sentinel_radius = sentinel_site_radius,
                 K = 2,
                 expected_popsize = 1000)

data_all2 <- sim2$record$data_all
true_source2 <- sim2$record$true_source

# pull the different sigma values
true_sigma2 <- sim2$record$true_sigma

# create a project and bind data
p2 <- rgeoprofile_project()
p2 <- bind_data(p2,
                df = sim2$data,
                data_type = "counts")

# add parameter set to project
p2 <- new_set(project = p2,
            spatial_prior = uniform_prior,
            sentinel_radius = sentinel_site_radius,
            sigma_model = "independent",
            sigma_prior_mean = 1.25,
            sigma_prior_sd = 5,
            expected_popsize_prior_mean = 1000,
            expected_popsize_prior_sd = 100)

Plot the the new data:

plot2 <- plot_map()
plot2 <- overlay_points(plot2, data_all2$lon, data_all2$lat)
plot2 <- overlay_sources(plot2, true_source2$lon, true_source2$lat)
plot2

Again, optimise the Metropolis_Hastings Coupling heats and run the MCMC and plot the new geoprofile:

# optimise beta values
beta_k <- optimise_beta(proj = p2, 
                        K = 2, 
                        target_acceptance = 0.75, 
                        max_iterations = 25,
                        beta_init = seq(0,1, l = 10), 
                        coupling_on = TRUE,
                        burnin = 1e3,
                        converge_test = 5e2,
                        samples = 10,
                        create_maps = FALSE,
                        pb_markdown = TRUE)

beta_values <- beta_k$beta_vec[[length(beta_k$beta_vec)]]

# run MCMC under K = 2 model
p2 <- run_mcmc(project = p2,
              K = K_model,
              burnin = 1e4,
              samples = 5e3,
              converge_test = 1e3,
              coupling_on = TRUE, 
              beta_manual = beta_values)

p2 <- rgeoprofile_file("VarSigproject_overlap.rds")

Below we can see the geoprofile of the two sources.

# produce and store map
plot2 <- plot_map()
plot2 <- overlay_sentinels(plot2, p2, sentinel_radius = sentinel_site_radius)
plot2 <- overlay_geoprofile(plot2, p2, K = 2, smoothing = 2)
plot2 <- overlay_sources(plot2, true_source2$lon, true_source2$lat)
plot2

Results

Again let's check the raw output of the MCMC chain for each sigma.

# get the raw sigma draws
sigma_output2 <- get_output(p2, "sigma_sampling", K = 2, type = "raw")
sigma1 <- density(sigma_output2[,1])
sigma2 <- density(sigma_output2[,2])

plot_sigma(p2, K = 2)

# calculate the mean of these draws
estimatedSigma2 <- apply(sigma_output2, 2, mean)
estimatedSigma2

# get the credible intervals for sigma
sigma_intervals2 <-get_output(p2, "sigma_intervals", K = 2)
sigma_intervals2

# print true values 
true_sigma2

As we can see the MCMC can return the independent sigma values even in extreme cases where there are overlapping home ranges.

References

# secretly save/load data from file so output is consistent between tutorials
# saveRDS(p1, file = "inst/extdata/VarSigproject_no_overlap.rds")
# saveRDS(p2, file = "inst/extdata/VarSigproject_overlap.rds")

Michael-Stevens-27/silverblaze documentation built on May 28, 2021, 5:47 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com