knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library("ggplot2")
library("CrimeDLM.RPackage")
library("rstan")
library("Rcpp")

Introduction

The purpose of this document is to provide the code to perform a limited dynamic linear model (DLM) analysis of monthly counts of robberies and burglaries in Chicago from 2012 to 2016. We demonstrate the functions provided in CrimeDLM.RPackage and, along the way, show the basic methods by which DLMs can be used to analyze temporal crime data.

Model

Data Model

Define $Y_{ct}$ to be the log of the number of reported incidents of crime type $c\in \{1, \ldots, C\}$ at time $t\in \{1, \ldots, T\}$. For each crime, we assumed $Y_{ct} = \mu_{ct} + \phi_{ct} + \epsilon_{ct}$, where $\phi_{ct}$ accounts for periodicity in the data due to systematic seasonal variability and $\mu_{ct}$ is the smoothed (non-seasonal) mean level of (log) crime reporting. For $\mu_{ct}$ we assumed a linear trend model, i.e. $\mu_{ct} = \mu_{c,t-1} + \beta_{c,t-1}$ and $\beta_{ct} = \beta_{c,t-1} + \delta_{ct}$, where $\beta_{ct}$ represents the difference in mean level from time $t$ to time $t+1$ and $\delta_{ct}$ represents the change in this difference from time $t$ to time $t+1$. Lastly, $(\epsilon_{1t}, \epsilon_{2t})^{\top} \stackrel{iid}{\sim} N(0,\Sigma_{\epsilon})$ and $(\delta_{1t}, \delta_{2t})^{\top} \stackrel{iid}{\sim} N(0,\Sigma_{\delta})$, where each $\epsilon_{ct}$ is independent of every $\delta_{ct}$.
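
Collecting these pieces, the full model for crime type $c$ at time $t$ can be written as

$$
\begin{aligned}
Y_{ct} &= \mu_{ct} + \phi_{ct} + \epsilon_{ct}, \qquad (\epsilon_{1t}, \epsilon_{2t})^{\top} \stackrel{iid}{\sim} N(0,\Sigma_{\epsilon}), \\
\mu_{ct} &= \mu_{c,t-1} + \beta_{c,t-1}, \\
\beta_{ct} &= \beta_{c,t-1} + \delta_{ct}, \qquad (\delta_{1t}, \delta_{2t})^{\top} \stackrel{iid}{\sim} N(0,\Sigma_{\delta}).
\end{aligned}
$$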

Prior

We chose independent half-Cauchy $Ca^{+}(0,1)$ priors on the evolution and residual standard deviations, and independent LKJ priors with parameter $\nu = 1$ on the correlation matrices for the residual and evolution vectors, $\Omega_\epsilon$ and $\Omega_\delta$, respectively. The density function of the LKJ distribution is, up to a constant of proportionality, $\text{Lkj}(\Omega|\nu) \propto |\Omega|^{\nu - 1}$, where $\nu > 0$ and $\Omega$ is a correlation matrix. With $\nu = 1$, the LKJ distribution is uniform over all correlation matrices of the given dimension. Finally, the initial states were given iid $N(0,10^7)$ priors.
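
As a quick illustration of this uniformity in the bivariate case (a minimal base-R sketch, not part of CrimeDLM.RPackage): for a $2 \times 2$ correlation matrix with correlation $\rho$, $|\Omega| = 1 - \rho^2$, so the unnormalized LKJ density is $(1 - \rho^2)^{\nu - 1}$, which is constant in $\rho$ when $\nu = 1$ and concentrates near zero for larger $\nu$.

## unnormalized LKJ density for a 2x2 correlation matrix with correlation rho
lkj_unnorm <- function(rho, nu) (1 - rho^2)^(nu - 1)

rho_grid <- seq(-0.99, 0.99, by = 0.01)
ggplot() +
  geom_line(aes(x = rho_grid, y = lkj_unnorm(rho_grid, nu = 1), linetype = "nu = 1")) +
  geom_line(aes(x = rho_grid, y = lkj_unnorm(rho_grid, nu = 5), linetype = "nu = 5")) +
  xlab("Correlation") +
  ylab("Unnormalized LKJ density")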

Analysis of correlations between crimes

Get posterior samples of covariance

cov_samples <- run_mcmc(data = chicago, chains = 1, adapt_delta = 0.8, 
                        iter = 2000, warmup = 1000)

In order to understand whether there is a relationship in how robberies and burglaries trend over time, we can examine the posterior distribution of the evolution correlation.

ggplot() +
  geom_density(mapping = aes(x = cov_samples$samples$omega_evo[,1,2]), fill = "blue", alpha = 0.5) +
  xlab("Evolution correlation")

Similarly, we can ask whether, on a month-to-month basis, under- or over-prediction of robberies and burglaries tends to coincide. We can do this by examining the posterior distribution of the error correlation.

ggplot() +
  geom_density(mapping = aes(x = cov_samples$samples$omega_error[,1,2]), fill = "red", alpha = 0.5) +
  xlab("Error correlation")

Get posterior samples of the states

Next we sample the latent states. It is important that the order of the crime types used in run_mcmc() matches the order used by subsequent functions, which is why we pass crime_types = cov_samples$crime_types below.

state_samples <- get_states(mcmc_samples = cov_samples$samples, data = chicago, crime_types = cov_samples$crime_types)

Analysis of the smoothed time series means

We have an idea of the relationship between monthly burglaries and robberies over this time period, but we would like to know what the trends for each of these crimes actually look like. We can do this by plotting the smoothed time series means. These correspond to the $\mu_{ct}$ parameters in the model. We also plot 95% pointwise credible intervals.

smoothmeans <- get_smoothmeans(states = state_samples$state_samples, crime_types = state_samples$crime_types)

## create data frame for plotting 
df <- expand.grid(unique(chicago$month), c(2012:2016), stringsAsFactors = FALSE)
df$date <- as.Date(paste(df[,1], df[,2], 15, sep = "-"), format = "%b-%Y-%d")
df <- rbind(df, df)
df$type <- rep(c(state_samples$crime_types), each = nrow(df)/2)

## posterior means and 95% pointwise intervals; drop the first (initial-state)
## column so the remaining 60 columns align with the 60 observed months
df$mean <- c(apply(X = smoothmeans$smoothmeans[, -1, 1], MARGIN = 2, FUN = mean),
             apply(X = smoothmeans$smoothmeans[, -1, 2], MARGIN = 2, FUN = mean))
df$lower <- c(apply(X = smoothmeans$smoothmeans[, -1, 1], MARGIN = 2, FUN = quantile, probs = 0.025),
              apply(X = smoothmeans$smoothmeans[, -1, 2], MARGIN = 2, FUN = quantile, probs = 0.025))
df$upper <- c(apply(X = smoothmeans$smoothmeans[, -1, 1], MARGIN = 2, FUN = quantile, probs = 0.975),
              apply(X = smoothmeans$smoothmeans[, -1, 2], MARGIN = 2, FUN = quantile, probs = 0.975))

ggplot(df) +
  geom_ribbon(aes(x=date, ymin = exp(lower), ymax = exp(upper), fill = type), alpha = 0.5) +
  geom_line(aes(x = date, y = exp(mean), colour = type), size = 1.3) +
  ylab("Crime count") +
  xlab("Date") + 
  scale_x_date(labels = date_format("%b"), date_breaks = "1 year", date_labels = "%b-%Y")

Analysis of seasonality

It may also be of interest to examine and compare any periodicity in the behavior of crime time series. The seasonal effects correspond to the $\phi_{ct}$ parameters in the model. We plot the estimated posterior means as well as 95% pointwise credible intervals.

seasonality <- get_seasonality(states = state_samples$state_samples, harmonics = 4, crime_types = state_samples$crime_types)

df_seasonality <- df[c(1:12, 61:72), 3:4]  ## one year of dates for each crime type
df_seasonality$mean <- c(apply(X = seasonality$seasonal_effects[, 2:13, 1], MARGIN = 2, FUN = mean),
                         apply(X = seasonality$seasonal_effects[, 2:13, 2], MARGIN = 2, FUN = mean))
df_seasonality$lower <- c(apply(X = seasonality$seasonal_effects[, 2:13, 1], MARGIN = 2, FUN = quantile, probs = 0.025),
                          apply(X = seasonality$seasonal_effects[, 2:13, 2], MARGIN = 2, FUN = quantile, probs = 0.025))
df_seasonality$upper <- c(apply(X = seasonality$seasonal_effects[, 2:13, 1], MARGIN = 2, FUN = quantile, probs = 0.975),
                          apply(X = seasonality$seasonal_effects[, 2:13, 2], MARGIN = 2, FUN = quantile, probs = 0.975))

ggplot(df_seasonality) +
  geom_ribbon(aes(x = date, ymin = exp(lower), ymax = exp(upper), fill = type), alpha = 0.5) +
  geom_line(aes(x = date, y = exp(mean), colour = type), size = 1.3) +
  ylab("Multiplicative effect \n on crime count") +
  xlab("Month") +
  scale_x_date(labels = date_format("%b"), date_breaks = "1 month", date_labels = "%b")

Estimated time series means (smoothed means + seasonality)

The estimated, non-smoothed, time series means may also be of interest.

ts_means <- get_means(states = state_samples$state_samples, harmonics = 4, crime_types = state_samples$crime_types)

df_means <- df[, 3:4]
df_means$mean <- c(apply(X = ts_means$ts_means[, -1, 1], MARGIN = 2, FUN = mean),
                   apply(X = ts_means$ts_means[, -1, 2], MARGIN = 2, FUN = mean))
df_means$lower <- c(apply(X = ts_means$ts_means[, -1, 1], MARGIN = 2, FUN = quantile, probs = 0.025),
                    apply(X = ts_means$ts_means[, -1, 2], MARGIN = 2, FUN = quantile, probs = 0.025))
df_means$upper <- c(apply(X = ts_means$ts_means[, -1, 1], MARGIN = 2, FUN = quantile, probs = 0.975),
                    apply(X = ts_means$ts_means[, -1, 2], MARGIN = 2, FUN = quantile, probs = 0.975))

ggplot(df_means) +
  geom_ribbon(aes(x=date, ymin = exp(lower), ymax = exp(upper), fill = type), alpha = 0.5) +
  geom_line(aes(x = date, y = exp(mean), colour = type), size = 1.3) +
  ylab("Crime count") +
  xlab("Date") +
  scale_x_date(labels = date_format("%b"), date_breaks = "1 year", date_labels = "%b-%Y")

