knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library("ggplot2") library("CrimeDLM.RPackage") library("rstan") library("Rcpp")
The purpose of this document is to provide the code to perform a limited dynamic linear model analysis of counts of monthly robberies and burglaries in Chicago from 2012-2016. Demonstrations of functions provided in CrimeDLM.RPackage
will be given, and simultaneously we will show the basic methods by which DLM's can be used to analyze temporal crime data.
Define $Y_{ct}$ to be the log of the number of reported incidents of crime type $c\in {1, \ldots, C}$ at time $t\in {1, \ldots, T}$. For each crime, we assumed $Y_{ct} = \mu_{ct} + \phi_{ct} + \epsilon_{ct}$ where $\phi_{ct}$ accounts for periodicity in the data due to systematic seasonal variability and $\mu_{ct}$ smoothed (because it does not involve seasonality) mean level of (log) crime reporting. For $\mu_{ct}$ we assumed a linear trend model, i.e. $\mu_{ct} = \mu_{c,t-1} + \beta_{c,t-1}$ and $\beta_{ct} = \beta_{c,t-1} + \delta_{ct}$, were $\beta_{ct}$ represents the difference in mean level from time $t$ to time $t+1$ and $\delta_{ct}$ represents the change in this difference from time $t$ to time $t+1$. Lastly, $(\epsilon_{1t}, \epsilon_{2t})^{\top} \stackrel{iid}{\sim} N(0,\Sigma_{\epsilon})$ and $(\delta_{1t}, \delta_{2t})^{\top} \stackrel{iid}{\sim} N(0,\Sigma_{\delta})$, where each $\epsilon_{ct}$ is independent of every $\delta_{ct}$.
We chose to use independent $Ca^{+}(0,1)$ priors on the evolution and residual standard deviations, but we then put independent LKJ priors with parameter $\nu = 1$ on the correlation matrices for the residual and evolution vectors, $\Omega_\epsilon$ and $\Omega_\delta$, respectively. The density function of the LKJ distribution is, up to a constant of proportionality, $\text{Lkj}(\Sigma|\nu) \propto |\Sigma|^{\nu - 1}$ where $\nu > 0$. The LKJ distribution with $\nu = 1$ is a distribution that is uniform on all correlation matrices of the given dimension. Also, the initial states were assumed to be iid $N(0,10^7)$.
cov_samples <- run_mcmc(data = chicago, chains = 1, adapt_delta = 0.8, iter = 2000, warmup = 1000)
In order to understand whether there is a relationship in how robberies and burglaries trend over time, we can examine the posterior distribution of the evolution correlation.
ggplot() + geom_density(mapping = aes(x = cov_samples$samples$omega_evo[,1,2]), fill = "blue", alpha = 0.5) + xlab("Evolution correlation")
Similarly, we can seek to understand if, on a month to month basis, under or over prediction of robberies and burglaries coincide. This we can do by examining the posterior distribution of the error correlations.
ggplot() + geom_density(mapping = aes(x = cov_samples$samples$omega_error[,1,2]), fill = "red", alpha = 0.5) + xlab("Error correlation")
Now we must sample the latent states. It is important that the order of the crime types given in the run_mcmc()
function matches that of future functions.
state_samples <- get_states(mcmc_samples = cov_samples$samples, data = chicago, crime_types = cov_samples$crime_types)
We have an idea of the relationship between monthly burglaries and robberies over this time period, but we would like to know what the trends for each of these crimes actually look like. We can do this by plotting the smoothed time series means. These correspond to the $\mu_{ct}$ parameters in the model. We also plot 95% pointwise credible intervals.
smoothmeans <- get_smoothmeans(states = state_samples$state_samples, crime_types = state_samples$crime_types) ## create data frame for plotting df <- expand.grid(unique(chicago$month), c(2012:2016),stringsAsFactors = FALSE) df$date <- as.Date(paste(df[,1],df[,2],15, sep = "-"), format = "%b-%Y-%d") df <- rbind(df,df) df$type <- rep(c(state_samples$crime_types), each = nrow(df)/2) df$mean <- c(apply(X = smoothmeans$smoothmeans[,-1,1], MARGIN = 2, FUN = mean), apply(X = smoothmeans$smoothmeans[,-61,2], MARGIN = 2, FUN = mean)) df$lower <- c(apply(X = smoothmeans$smoothmeans[,-1,1], MARGIN = 2, FUN = quantile, probs = 0.025), apply(X = smoothmeans$smoothmeans[,-61,2], MARGIN = 2, FUN = quantile, probs = 0.025)) df$upper <- c(apply(X = smoothmeans$smoothmeans[,-1,1], MARGIN = 2, FUN = quantile, probs = 0.975), apply(X = smoothmeans$smoothmeans[,-61,2], MARGIN = 2, FUN = quantile, probs = 0.975)) ggplot(df) + geom_ribbon(aes(x=date, ymin = exp(lower), ymax = exp(upper), fill = type), alpha = 0.5) + geom_line(aes(x = date, y = exp(mean), colour = type), size = 1.3) + ylab("Crime count") + xlab("Date") + scale_x_date(labels = date_format("%b"), date_breaks = "1 year", date_labels = "%b-%Y")
It may also be of interest to examine and compare any periodicity in the behavior of crime time series. The seasonal effects correspond to the $\phi_{ct}$ parameters in the model. We plot the estimated posterior means as well as 95% pointwise credible intervals.
seasonality <- get_seasonality(states = state_samples$state_samples, harmonics = 4, crime_types = state_samples$crime_types) df_seasonality <- df[c(1:12,61:72),3:4] df_seasonality$mean <- c(apply(X = seasonality$seasonal_effects[,c(2:13),1], MARGIN = 2, FUN = mean), apply(X = seasonality$seasonal_effects[,c(2:13),2], MARGIN = 2, FUN = mean)) df_seasonality$lower <- c(apply(X = seasonality$seasonal_effects[,c(2:13),1], MARGIN = 2, FUN = quantile, probs = 0.025), apply(X = seasonality$seasonal_effects[,c(2:13),2], MARGIN = 2, FUN = quantile, probs = 0.025)) df_seasonality$upper <- c(apply(X = seasonality$seasonal_effects[,c(2:13),1], MARGIN = 2, FUN = quantile, probs = 0.975), apply(X = seasonality$seasonal_effects[,c(2:13),2], MARGIN = 2, FUN = quantile, probs = 0.975)) ggplot(df_seasonality) + geom_ribbon(aes(x = date, ymin = exp(lower), ymax = exp(upper), fill = type), alpha = 0.5) + geom_line(aes(x = date, y = exp(mean), colour = type), size = 1.3) + ylab("Multiplicative effect \n on crime count") + xlab("Month") + scale_x_date(labels = date_format("%b"), date_breaks = "1 month", date_labels = "%b")
The estimated, non-smoothed, time series means may also be of interest.
ts_means <- get_means(states = state_samples$state_samples, harmonics = 4, crime_types = state_samples$crime_types) df_means <- df[,3:4] df_means$mean <- c(apply(X = ts_means$ts_means[,-1,1], MARGIN = 2, FUN = mean), apply(X = ts_means$ts_means[,-61,2], MARGIN = 2, FUN = mean)) df_means$lower <- c(apply(X = ts_means$ts_means[,-1,1], MARGIN = 2, FUN = quantile, probs = 0.025), apply(X = ts_means$ts_means[,-61,2], MARGIN = 2, FUN = quantile, probs = 0.025)) df_means$upper <- c(apply(X = ts_means$ts_means[,-1,1], MARGIN = 2, FUN = quantile, probs = 0.975), apply(X = ts_means$ts_means[,-61,2], MARGIN = 2, FUN = quantile, probs = 0.975)) ggplot(df_means) + geom_ribbon(aes(x=date, ymin = exp(lower), ymax = exp(upper), fill = type), alpha = 0.5) + geom_line(aes(x = date, y = exp(mean), colour = type), size = 1.3) + ylab("Crime count") + xlab("Date") + scale_x_date(labels = date_format("%b"), date_breaks = "1 year", date_labels = "%b-%Y")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.