knitr::opts_chunk$set(echo = TRUE)
A variable sigma model is available within the Silverblaze package but is yet to be discussed in great detail. The literature on geographic profiling has thus far assumed that the dispersal associated with an offender, an invasive species, or infectious disease is the same across source locations. @Ratcliffe2006 makes it clear that this is not always the case. An offender that commits crime around a work place will be restricted by time constraints to return to work within a certain time. This compares to more relaxed time constraints when committing crimes around a home. So we might expect a tight dispersal around a workplace but a wider dispersal around a home.
In this tutorial, we generate a couple of toy examples to investigate the following questions:
Let's start by loading the Silverblaze package:
library(silverblaze)
set.seed(1)
Set up an array of sentinel sites and a uniform prior. We'll also specify the sentinel site radius.
# sentinel sites sentinel_lon <- seq(-0.2, 0.0, l = 15) sentinel_lat <- seq(51.45, 51.55, l = 15) sentinel_grid <- expand.grid(sentinel_lon, sentinel_lat) names(sentinel_grid) <- c("longitude", "latitude") # create source prior uniform_prior <- raster_grid(cells_lon = 100, cells_lat = 100, range_lat = range(sentinel_lat), range_lon = range(sentinel_lon), guard_rail = 0.5) # set sentinel radius sentinel_site_radius <- 0.2
Now we generate some data. We use the sim_data()
function to generate two sources, specifying the kind of sigma model we want to draw from by setting sigma_model = "independent"
. For each source we choose a sigma value using sigma_mean
and sigma_var
. This specifies the mean and variance of each normal distribution a sigma is being drawn from.
# sim data sim1 <- sim_data(sentinel_grid$longitude, sentinel_grid$latitude, sigma_model = "independent", # set the sigma model to independent sigma_mean = c(1, 2), sigma_var = c(0, 0), sentinel_radius = sentinel_site_radius, K = 2, expected_popsize = 1000) # record imporant data data_all1 <- sim1$record$data_all true_source1 <- sim1$record$true_source K_model <- 2 # pull the different sigma values true_sigma1 <- sim1$record$true_sigma
Notice sigma_var
is equal to zero for both sources, this allows us to force the value of each sigma. Now we build our geoprofiling project, add in our data and setup parameters to run the MCMC.
# create a project and bind data p1 <- rgeoprofile_project() p1 <- bind_data(p1, df = sim1$data, data_type = "counts") # add parameter set to project p1 <- new_set(project = p1, spatial_prior = uniform_prior, sentinel_radius = sentinel_site_radius, sigma_model = "independent", sigma_prior_mean = 1.5, sigma_prior_sd = 5, expected_popsize_prior_mean = 1000, expected_popsize_prior_sd = 100) plot1 <- plot_map() plot1 <- overlay_points(plot1, data_all1$lon, data_all1$lat) plot1 <- overlay_sources(plot1, true_source1$lon, true_source1$lat) plot1
We can see a source with a wider sigma and if we zoom into the bottom left we can see a source with a much tighter sigma. Let's create our geoprofiling project, load in this data and run the MCMC under a variable sigma model. We expect that we may run into some mixing issues when estimating multiple sigma values, hence we make use of the Metropolis-Hasting coupling described in one of the previous tutorials.
# optimise beta values for heated MCMC chains with an acceptance prob of at # least 0.75. beta_k <- optimise_beta(proj = p1, K = 2, target_acceptance = 0.75, max_iterations = 25, beta_init = seq(0,1, l = 10), coupling_on = TRUE, burnin = 1e3, converge_test = 5e2, samples = 10, create_maps = FALSE, pb_markdown = TRUE) beta_values <- beta_k$beta_vec[[length(beta_k$beta_vec)]] # run MCMC under K = 2 model p1 <- run_mcmc(project = p1, K = K_model, burnin = 1e4, samples = 5e3, converge_test = 1e3, coupling_on = TRUE, beta_manual = beta_values)
p1 <- rgeoprofile_file("VarSigproject_no_overlap.rds")
Below we can see the geoprofile of the two sources. Just by eye it looks like each cluster has a different sigma value.
# produce and store map plot1 <- plot_map() plot1 <- overlay_sentinels(plot1, p1, sentinel_radius = sentinel_site_radius) plot1 <- overlay_geoprofile(plot1, p1, K = 2) plot1 <- overlay_sources(plot1, true_source1$lon, true_source1$lat) plot1
How can we be sure these sigma values are different? We can access the raw output associated to sigma using the get_output()
function. We can also look into where our true values fall within the 95% credible intervals of the fitted values.
# get the raw sigma draws sigma_output <- get_output(p1, "sigma_sampling", K = 2, type = "raw") sigma1 <- density(sigma_output[,1]) sigma2 <- density(sigma_output[,2]) # calculate the mean of these draws estimatedSigma <- apply(sigma_output, 2, mean) estimatedSigma # get the credible intervals for sigma sigma_intervals <- get_output(p1, "sigma_intervals", K = 2) sigma_intervals plot_sigma(p1, K = 2) # print true values true_sigma1
These results are promising! As we can see the model is returning two distinct values of sigma.
Now we generate another data set, both sources will be closer than the previous data to investigate the models ability to pick up two sources of varying dispersal on top of one another.
# set seed for simmed data set.seed(1) # sim data sim2 <- sim_data(sentinel_grid$longitude, sentinel_grid$latitude, source_lon_min = -0.1, source_lon_max = -0.1, source_lat_min = 51.49, source_lat_max = 51.51, sigma_model = "independent", # set the sigma model to independent sigma_mean = c(0.5, 2), sigma_var = c(0.1, 0.1), sentinel_radius = sentinel_site_radius, K = 2, expected_popsize = 1000) data_all2 <- sim2$record$data_all true_source2 <- sim2$record$true_source # pull the different sigma values true_sigma2 <- sim2$record$true_sigma # create a project and bind data p2 <- rgeoprofile_project() p2 <- bind_data(p2, df = sim2$data, data_type = "counts") # add parameter set to project p2 <- new_set(project = p2, spatial_prior = uniform_prior, sentinel_radius = sentinel_site_radius, sigma_model = "independent", sigma_prior_mean = 1.25, sigma_prior_sd = 5, expected_popsize_prior_mean = 1000, expected_popsize_prior_sd = 100)
Plot the the new data:
plot2 <- plot_map() plot2 <- overlay_points(plot2, data_all2$lon, data_all2$lat) plot2 <- overlay_sources(plot2, true_source2$lon, true_source2$lat) plot2
Again, optimise the Metropolis_Hastings Coupling heats and run the MCMC and plot the new geoprofile:
# optimise beta values beta_k <- optimise_beta(proj = p2, K = 2, target_acceptance = 0.75, max_iterations = 25, beta_init = seq(0,1, l = 10), coupling_on = TRUE, burnin = 1e3, converge_test = 5e2, samples = 10, create_maps = FALSE, pb_markdown = TRUE) beta_values <- beta_k$beta_vec[[length(beta_k$beta_vec)]] # run MCMC under K = 2 model p2 <- run_mcmc(project = p2, K = K_model, burnin = 1e4, samples = 5e3, converge_test = 1e3, coupling_on = TRUE, beta_manual = beta_values)
p2 <- rgeoprofile_file("VarSigproject_overlap.rds")
Below we can see the geoprofile of the two sources.
# produce and store map plot2 <- plot_map() plot2 <- overlay_sentinels(plot2, p2, sentinel_radius = sentinel_site_radius) plot2 <- overlay_geoprofile(plot2, p2, K = 2, smoothing = 2) plot2 <- overlay_sources(plot2, true_source2$lon, true_source2$lat) plot2
Again let's check the raw output of the MCMC chain for each sigma.
# get the raw sigma draws sigma_output2 <- get_output(p2, "sigma_sampling", K = 2, type = "raw") sigma1 <- density(sigma_output2[,1]) sigma2 <- density(sigma_output2[,2]) plot_sigma(p2, K = 2) # calculate the mean of these draws estimatedSigma2 <- apply(sigma_output2, 2, mean) estimatedSigma2 # get the credible intervals for sigma sigma_intervals2 <-get_output(p2, "sigma_intervals", K = 2) sigma_intervals2 # print true values true_sigma2
As we can see the MCMC can return the independent sigma values even in extreme cases where there are overlapping home ranges.
# secretly save/load data from file so output is consistent between tutorials # saveRDS(p1, file = "inst/extdata/VarSigproject_no_overlap.rds") # saveRDS(p2, file = "inst/extdata/VarSigproject_overlap.rds")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.