\FloatBarrier

Purpose

This file analyzes data derived from publicly archived data files (Campbell, 2019) to produce outputs for a descriptive paper about the criminal histories of suspected serial sexual perpetrators.^[In this document, we use the terms perpetrator and offender interchangeably. Both terms should be interpreted as referring to individuals suspected of committing a crime but we occasionally omit that qualifier for the sake of brevity.] The data come from Detroit, MI.

\FloatBarrier

Notes on Generalized Estimating Equation (GEE) Results

Numerous sections below contain output for GEE models and estimates derived from them, such as estimated marginal means and contrasts for ratios of those means. We performed those analyses because we think the contrasts provide useful insights into how much offenders' rates of criminal activity differed between the periods before versus after the testing window. Reviewers found some of them less relevant than we did. We have left those results in the research compendium despite having removed some of them from the final paper because readers may still find them interesting and relevant. Preserving them facilitates achieving transparency regarding all the analyses we conducted.

In particular note that we ran models that compared incident counts with and without adjustments for period duration. Only the adjusted comparisons remain in the published paper.

Setup

Set global R chunk options (local chunk options will over-ride global options). The method for creating a size option that controls font size in code chunks and their text output is based on an answer to a question posted on
stackoverflow.com.

``` {r global_options, cfsize = "footnotesize"}

Create a custom chunk hook/option for controlling font size in chunk & output.

def.chunk.hook <- knitr::knit_hooks$get("chunk") knitr::knit_hooks$set(chunk = function(x, options) { x <- def.chunk.hook(x, options) ifelse(options$cfsize != "normalsize", paste0("\n \", options$cfsize,"\n\n", x, "\n\n \normalsize"), x) })

Global chunk options (over-ridden by local chunk options)

knitr::opts_chunk$set(include = TRUE, echo = TRUE, error = TRUE, message = TRUE, warning = TRUE, cfsize = "footnotesize")

Declare location of this script relative to the project root directory.

here::i_am(path = "inst/Step_02_Analysis.Rmd")

Load R packages that we need to get additional functions. 

``` {r load_packages}
library(here)             # for here()
library(plyr)             # For mapvalues()
library(dplyr)            # for %>%, filter(), group_by(), & summarise()
library(tidyr)            # for arrange(), filter(), group_by(), mutate(),  
                          # spread(), summarise(), %>%, etc. 
library(rmarkdown)        # for render()
library(knitr)            # for kable()
options(kableExtra.latex.load_packages = FALSE)
library(kableExtra)       # for kable_styling(), add_header_above(), column_spec(),
                          # collapse_rows(), and landscape()
library(descr)            # For freq().
options(descr.plot=FALSE) # Make freq() & crosstab() skip plots by default.
library(lubridate)        # For date conversion, eg. ymd(), time_length().
library(sjlabelled)       # For set_label(), get_label()
library(haven)            # for read_spss()
library(lattice)          # For xyplot(), bwplot(), etc. 
library(latticeExtra)     # for layer()
library(psych)            # For describe()
library(car)              # For recode()
library(SSACHR)           # for git_report(), rvlabel(), which_latex()
library(geepack)          # for geeglm()
library(emmeans)          # for emmeans()
library(texreg)           # for texreg()
library(ggplot2)          # for ggplot()
library(ggdist)           # for stat_halfeye()

Create objects to hold settings we plan to reuse often.

# Custom settings for use with bwplot().
my.boxes <- list(box.rectangle = list(col = "black", lwd = 2), 
                 box.umbrella = list(col = "black", lwd = 2))

\FloatBarrier

Load Data

The chunk below loads the data saved by the prior script Step_01_Data_Mgt.Rmd.

load(file = here::here("data/CHR_Data.RData"))

\FloatBarrier

Offender-Level Summaries

We have few offender-level demographic variables that are worth reporting (we will omit summarizing height, weight, hair color, & eye color). After examining sex, race, and age, we switch focus to looking at descriptive data about the criminal history records for arrests, prosecutor charges, and adjudicated charges.

# Store objects with sample sizes we will reuse often later. 
N_All    <- IDNEW %>% nrow()
N_Before <- IDNEW %>% filter(Years_Before > 0) %>% nrow()
N_During <- IDNEW %>% filter(Years_During > 0) %>% nrow()
N_After  <- IDNEW %>% filter(Years_After > 0) %>% nrow()

data.frame(N_All, N_Before, N_During, N_After)

All the CHR summaries in this section focus on variables that have been aggregated to the offender level. That is why the sample size for all of these variables is N = r format(N_All, big.mark = ",") (the number of offenders for whom Michigan CHR data were available and the start date for the earliest SAK testing window could be established).

\FloatBarrier

Sex

kable(freq(as_factor(IDNEW$Sex)), format = "latex", booktabs = TRUE, digits = 2,
      caption = "Offender Sex")

\FloatBarrier

Race

kable(freq(as_factor(IDNEW$Race), user.missing = "Unknown"), format = "latex", 
      booktabs = TRUE,  digits = 2, caption = "Offender Race")

\FloatBarrier

Age

Table \ref{tab:examine_Age} presents descriptive statistics for offender age at the start of the earliest testing window and at the time of CHR collection. Figure \ref{fig:boxplot_Age} shows boxplots of the distributions for both of these age variables.

CNames <- c("Variable", "N", "Mean", "SD", "Min", "Max", "Skew", "Kurtosis", 
            "Q25", "Q50", "Q75")
IDNEW %>% 
  select(AgeWDate, Age) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  mutate(Variable  = c("Age at earliest SAK", "Age at CHR collection")) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, 
         Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, 
      col.names = CNames, row.names = FALSE,
      caption = "Offender Age")
# Figure caption.
FCap <- paste("\\label{fig:boxplot_Age}",
              "Box Plots for Offender Age. Only offenders with valid testing", 
              "window start dates are included. The red line marks age 16",
              "years (the start of observed adult CHR data). CHR collection",
              "occurred on 04/15/2016 for all offfenders.",
              "CHR, criminal history record; SAK, sexual assault kit.")

IDNEW %>% 
  select(OID, AgeWDate, Age) %>% 
  # Rearrange data so we can easily plot multiple variables. 
  pivot_longer(cols = -OID, names_to = "VName", values_to = "Age") %>% 
  mutate(VName = ordered(VName, levels = c("AgeWDate", "Age"), 
                         labels = c("At earliest SAK", "At CHR collection"))) %>%
  bwplot(VName ~ Age, data = ., factor = 1, xlim = c(-2, 82),
       panel = panel.mybox, par.settings = my.boxes,
       xlab = "Offender Age")  + 
  latticeExtra::layer(panel.abline(v = 16, col.line = "red"))

\FloatBarrier

CHR Periods

The observed portions of these offenders' adult criminal histories include incident, arrest, prosecutor charge, and judicial charge records associated with crimes that occurred between the offender's 16th birthday and April 15, 2016 (when the CHR data were collected from the official statewide CHR database). That time frame can be divided into three important periods for the purposes of this study: before, during, and after the testing window associated with the offender's earliest sexual assault kit (SAK). Here we provide descriptive data about those periods. Note that the relative timing of an offender's earliest SAK relative to their 16th birthday controls how much of the before and during periods were observed versus unobserved. Figures \ref{fig:boxplot_Age16Date} and \ref{fig:boxplot_WDate} respectively show boxplots for the date of the offenders 16th birthday and the start date of the earliest testing window (which is the date the earliest SAK was collected).

# Figure caption.
FCap <- paste("\\label{fig:boxplot_Age16Date}",
              "Box Plot for Date of Offender's 16th Birthday. Only offenders", 
              "with valid testing window start dates are included.",
              "These dates mark the start of the observed adult criminal",
              "history record for each offender.")

bwplot(~ Age16Date, data = IDNEW, factor = 9, 
       xlim = c(ymd("1948-01-01"), ymd("2012-12-31")),
       panel = panel.mybox, par.settings = my.boxes,
       xlab = "Offender's 16th Birthday")
# Figure caption.
FCap <- paste("\\label{fig:boxplot_WDate}",
              "Box Plot for Start Date of Earliest Testing Window. Only", 
              "offenders with valid testing window start dates are included.")

bwplot(~ WDate, data = IDNEW, factor = 9, 
       xlim = c(ymd("1984-01-01"), ymd("2016-12-31")),
       panel = panel.mybox, par.settings = my.boxes,
       xlab = "Start of Earliest Testing Window")

Those dates were used in the previous data management script to compute the durations of the observed portions of adult CHR that correspond to the periods before, during, and after the earliest testing window. Table \ref{tab:describe_windows} shows descriptive statistics for the these durations and Figure \ref{fig:boxplot_windows} shows corresponding boxplots. These provide crucial context for interpreting the numbers of crimes observed in each of those periods and provide the denominators for translating them into incidence rates.

# Table caption.
TCap <- paste("Observed Period Durations (Years)")
# Footnote text.
FN <- paste0("Only offenders with valid testing window start dates are ", 
             "included. ",
             "TW, earliest testing window.")

rlabs <- c("Before TW", "During TW", "After TW")

IDNEW %>% 
  select(Years_Before, Years_During, Years_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = rlabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE,
      col.names = CNames, caption = TCap) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Figure caption.
FCap <- paste("\\label{fig:boxplot_windows}",
              "Box Plots for Duration of Observed Portions of Periods.",
              "Only offenders with valid testing", 
              "window start dates are included. The red line marks age 16",
              "years, which is the start of observed adult CHR, which was ",
              "collected on 04/15/2016.",
              "TW, earliest testing window.")

IDNEW %>% 
  select(OID, Years_Before, Years_During, Years_After) %>% 
  # Rearrange data so we can easily plot multiple variables. 
  pivot_longer(cols = -OID, names_to = "VName", values_to = "Duration") %>% 
  mutate(VName = ordered(VName, 
                         levels = rev(c("Years_Before", "Years_During",
                                        "Years_After")), 
                         labels = rev(rlabs))) %>%
  bwplot(VName ~ Duration, data = ., factor = 1, xlim = c(-2, 42),
       panel = panel.mybox, par.settings = my.boxes, 
       xlab = "Observed Period Duration (Years)")

CHR Record Counts

We aggregated data to count how many CHR records of each type are associated with each offender. Here we are counting only records associated with incidents for which the offender was arrested for, charged with, or convicted of at least one offense from the 12 main crime categories. We summarize the resulting variables (NINC12, NARR12, NCHG12, NJUD12, & NCON12) below. Table \ref{tab:describe_rcvars} provides overall descriptive statistics for all record types, plus summaries broken down by when the associated incidents occurred relative to the testing window.

Only the data about incident records counts from Table \ref{tab:describe_rcvars} is directly reported in the manuscript. The numbers about other types of records are contextual information about the volume of data that was available.

# Table caption. 
TCap <- paste("Offender-Level Record Counts (All Crime Categories, Overall and",
              "By When Incident Occurred Relative to Testing Window)")
# Footnote text. 
FN <- paste("Only offenders with valid testing window start dates are included.",
            "Only records associated with incidents where the offender was",
            "arrested for, charged with, or convicted of at least one offense",
            "from one of the 12 main crime categories were counted.",
            "Before, during, and after refer to when the incident associated",
            "with the record occurred, not when the arrest date, charge date,",
            "adjudication date, or conviction date occurred relative to the",
            "testing window.")

BDA <- c("...Before testing window", 
         "...During testing window", 
         "...After testing window")
rclabs <- c("No. of incidents", BDA, 
            "No. of arrest offenses", BDA,
            "No. of prosecutor charges", BDA,
            "No. of adjudicated charges (all)", BDA,
            "No. of adjudicated charges (convictions)", BDA)

# Summarize CHR record counts (both overall and broken down by IWhen).
IDNEW %>% 
  rename(NINC12 = HXCat12_Any, NINC12_Before = HXCat12_Any_Before, 
         NINC12_During = HXCat12_Any_During, NINC12_After = HXCat12_Any_After) %>% 
  select(NINC12, NINC12_Before, NINC12_During, NINC12_After, 
         NARR12, NARR12_Before, NARR12_During, NARR12_After,
         NCHG12, NCHG12_Before, NCHG12_During, NCHG12_After,
         NJUD12, NJUD12_Before, NJUD12_During, NJUD12_After,
         NCON12, NCON12_Before, NCON12_During, NCON12_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = rclabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap) %>% 
  group_rows(" ", 1, 4) %>% 
  group_rows(" ", 5, 8) %>% 
  group_rows(" ", 9, 12) %>% 
  group_rows(" ", 13, 16) %>% 
  group_rows(" ", 17, 20) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Vectors of variables names and labels. 
rcvars  <- c("NINC12", "NARR12", "NCHG12", "NJUD12", "NCON12")
rctypes <- c("Incidents", "Arrest Offenses", "Prosecutor Charges", 
             "Adjudicated Charges", "Convictions")

IDNEW %>% 
  rename(NINC12 = HXCat12_Any) %>% 
  select(OID, NINC12, NARR12, NCHG12, NJUD12, NCON12) %>% 
  # Rearrange data so we can easily plot multiple variables. 
  pivot_longer(cols = -OID, names_to = "VName", values_to = "Count") %>% 
  mutate(VName = ordered(VName, levels = rcvars, labels = rctypes)) ->
  RCCOUNTS
# Figure caption.
FCap <- paste("\\label{fig:boxplot_RCCOUNTS}",
              "Box Plots for Overall Record Counts by Criminal History Record", 
              "Type. Only offenders with valid testing window start dates are", 
              "included.",
              "Only records associated with incidents where the offender was",
              "arrested for, charged with, or convicted of at least one offense",
              "from one of the 12 main crime categories were counted.")

bwplot(VName ~ Count, data = RCCOUNTS, factor = 1, xlim = c(-4, 64),
       panel = panel.mybox, par.settings = my.boxes,
       xlab = "Overall Number of Records for Individual Offender")

``` {r densityplot_RCCOUNTS, fig.height = 3, fig.width = 8.5, fig.cap=FCap}

Figure caption.

FCap <- paste("\label{fig:densityplot_RCCOUNTS}", "Density Plots for Overall Record Counts by Criminal History", "Record Type. Only offenders with valid testing window start", "dates are included.", "Only records associated with incidents where the offender was", "arrested for, charged with, or convicted of at least one offense", "from one of the 12 main crime categories were counted.", "Dashed red lines are normal distribution curves (for comparison).")

densityplot(~ Count | VName, data = RCCOUNTS, layout = c(5,1), panel = panel.mydensity, jitter.amount = 0.005, strip = strip.custom(par.strip.text = list(cex = .8)), xlab = "Overall Number of Records for Offender")

In the subsections below, Tables \ref{tab:freq_NINC12}, \ref{tab:freq_NARR12}, 
\ref{tab:freq_NCHG12}, \ref{tab:freq_NJUD12}, and \ref{tab:freq_NCON12} 
respectively show the frequency distributions underlying those statistics for
incidents, arrest offenses, prosecutor charges, all adjudicated charges, and
adjudicated charges that were convictions.

\FloatBarrier 

### Incident Records

```r
# Table caption.
TCap <- paste("Frequency Distributions for Number of Incidents",
              "Overall and By When Incident Occurred")
# Footnote text.
FN <- paste0("Only offenders with valid testing window start dates are ", 
             "included. Only incidents where the offender was arrested for, ",
             "charged with, or convicted of at least one offense from one of ",
             "the 12 main crime categories were included. ",
             "N = ", format(N_All, big.mark = ","), ". TW, earliest ",
             "testing window.")

# Get frequency distributions. 
IDNEW %>% 
  rename(NINC12 = HXCat12_Any, NINC12_Before = HXCat12_Any_Before, 
         NINC12_During = HXCat12_Any_During, NINC12_After = HXCat12_Any_After) %>% 
  select(OID, NINC12, NINC12_Before, NINC12_During, NINC12_After) %>% 
  rename(All = NINC12, Before = NINC12_Before, During = NINC12_During, 
         After = NINC12_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

GEE Models

To more closely examine the incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_Any_gee1 <- IDNEWL %>% 
  # Keep only observations for observed periods (i.e., those with Years > 0).
  filter(Years > 0) %>%
  geeglm(HXCat12_Any ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_Any_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_Any ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_Any_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference level for When was the period after the testing",
             "window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_Any_gee1, HXCat12_Any_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_Any_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_Any_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_Any_emmeans1 <- emmeans(HXCat12_Any_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_Any_emmeans2 <- emmeans(HXCat12_Any_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_Any_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_Any_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_Any_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Incident",
              "Counts and Incidence Rates By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_Any_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_Any_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Arrest Offense Records

Output in this section is not directly used in the manuscript. It is here because it helped us to understand the volume of raw data that was being processed along the way to constructing the data sets used for the manuscript.

# Table caption.
TCap <- paste("Frequency Distributions for Number of Arrest Offenses (All",
              "Crime Categories) Overall and By When Incident Occurred")
# Footnote text.
FN <- paste0("N = ", format(N_All, big.mark = ","), ". TW, earliest ",
            "testing window. ",
            "Only arrest records for offenders with valid testing window start ",
            "dates and with offenses from the 12 main crime categories were ",
            "included. ",
            "Before, during, and after refer to when the incident associated ",
            "with the record occurred, not when the arrest date occurred ",
            "relative to the testing window. For example, an offender could be ",
            "arrested during the testing window for an incident that occurred ",
            "before it began. That would show up here in the before column.")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, NARR12, NARR12_Before, NARR12_During, NARR12_After) %>% 
  rename(All = NARR12, Before = NARR12_Before, During = NARR12_During, 
         After = NARR12_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>%
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Prosecutor Charge Records

Output in this section is not directly used in the manuscript. It is here because it helped us to understand the volume of raw data that was being processed along the way to constructing the data sets used for the manuscript.

# Table caption.
TCap <- paste("Frequency Distributions for Number of Prosecutor Charges (All",
              "Crime Categories) Overall and By When Incident Occurred")
# Footnote text.
FN <- paste0("N = ", format(N_All, big.mark = ","), ". TW, earliest ",
            "testing window. ",
            "Only charge records for offenders with valid testing window start ",
            "dates and with charges from the 12 main crime categories were ",
            "included. ",
            "Before, during, and after refer to when the incident associated ",
            "with the record occurred, not when the charge date occurred ",
            "relative to the testing window. For example, an offender could be ",
            "charged during the testing window for an incident that occurred ",
            "before it began. That would show up here in the before column.")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, NCHG12, NCHG12_Before, NCHG12_During, NCHG12_After) %>% 
  rename(All = NCHG12, Before = NCHG12_Before, During = NCHG12_During, 
         After = NCHG12_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>%
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%  
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Adjudicated Charge Records (All)

Output in this section is not directly used in the manuscript. It is here because it helped us to understand the volume of raw data that was being processed along the way to constructing the data sets used for the manuscript.

# Table caption.
TCap <- paste("Frequency Distributions for Number of Adjudicated Charges (All",
              "Crime Categories) Overall and By When Incident Occurred")
# Footnote text.
FN <- paste0("N = ", format(N_All, big.mark = ","), ". TW, earliest ",
            "testing window. ",
            "Only records for offenders with valid testing window start dates ",
            "and with charges from one of the 12 main crime categories were ",
            "included. ", 
            "Before, during, and after refer to when the incident associated ",
            "with the record occurred, not when the adjudication date occurred ",
            "relative to the testing window. For example, a charge could be ",
            "adjudicated during the testing window for an incident that ",
            "occurred before it began. That would show up here in the before ",
            "column.")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, NJUD12, NJUD12_Before, NJUD12_During, NJUD12_After) %>% 
  rename(All = NJUD12, Before = NJUD12_Before, During = NJUD12_During, 
         After = NJUD12_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Adjudicated Charge Records (Convictions)

Output in this section is not directly used in the manuscript. It is here because it helped us to understand the volume of raw data that was being processed along the way to constructing the data sets used for the manuscript.

# Table caption.
TCap <- paste("Frequency Distributions for Number of Adjudicated Charges (All",
              "Crime Categories, Convictions) Overall and By When Incident",
              "Occurred")
# Footnote text.
FN <- paste0("N = ", format(N_All, big.mark = ","), ". TW, earliest ",
            "testing window. ",
            "Only conviction records for offenders with valid testing window ",
            "start dates and with charges from one of the 12 main crime ",
            "categories were included. ", 
            "Before, during, and after refer to when the incident associated ",
            "with the record occurred, not when the adjudication date occurred ",
            "relative to the testing window. For example, an offender could be ",
            "convicted during the testing window for an incident that occurred ",
            "before it began. That would show up here in the before column.")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, NCON12, NCON12_Before, NCON12_During, NCON12_After) %>% 
  rename(All = NCON12, Before = NCON12_Before, During = NCON12_During, 
         After = NCON12_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Crime Category Counts (Arrested, Charged, or Convicted)

Table \ref{tab:describe_cccvars} summarizes how many of 12 different crime categories each offender was arrested for, charged with, or convicted of according to the CHR data. This sums a set of binary transforms on the incident count variables associated with each crime category in HXCat12. It is possible to have multiple arrest offense records, charge records, or conviction records for a specific category, but we only count each distinct category once at most. The Excluded user-missing value is not counted as a category, so the possible range of values for these counts is 0-12. We report both overall results, and results broken down by when the incidents associated with the arrest, charge, and conviction records occurred relative to the earliest testing window. Table \ref{tab:freq_HXCat12_Count} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Crime Category Counts (Overall and By When",
              "Incident Occurred)")
# Footnote text. 
FN <- paste("Only offenders with valid testing window start dates are included.",
            "Only records associated with incidents where the offender was ",
            "arrested for, charged with, or convicted of at least one offense ",
            "from one of the 12 main crime categories were counted. ",
            "Before, during, and after refer to when the incident associated",
            "with the record occurred, not when the arrest date, charge date,",
            "adjudication date, or conviction date occurred relative to the",
            "testing window.")

cclabs <- c("No. of crime categories (arrested, charged, or convicted)", BDA)

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_Count, HXCat12_Count_Before, HXCat12_Count_During, 
         HXCat12_Count_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = cclabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Crime Categories for",
              "Which Offender Was Arrested, Charged, or Convicted (Overall",
              "and By When Incident Occurred)")
# Footnote text.
FN <- paste0("N = ", format(N_All, big.mark = ","), ". TW, earliest ",
            "testing window. ",
            "Only records associated with incidents where the offender was ",
            "arrested for, charged with, or convicted of at least one offense ",
            "from one of the 12 main crime categories were counted. ",
            "Before, during, and after refer to when the incident associated ",
            "with the record occurred, not when the charge date occurred ",
            "relative to the testing window. For example, an offender could be ",
            "convicted during the testing window for an incident that occurred ",
            "before it began. That would show up here in the before column.")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_Count, HXCat12_Count_Before, HXCat12_Count_During, 
         HXCat12_Count_After) %>% 
  rename(All = HXCat12_Count, Before = HXCat12_Count_Before, 
         During = HXCat12_Count_During, After = HXCat12_Count_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the crime category counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

cccvar_gee1 <- IDNEWL %>% 
  # Keep only observations for observed periods (i.e., those with Years > 0).
  filter(Years > 0) %>%
  geeglm(HXCat12_Count ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

cccvar_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_Count ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:cccvar_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Crime Category Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference level for When was the period after the testing",
             "window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(cccvar_gee1, cccvar_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:cccvar_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:cccvar_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Crime Category Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

cccvar_emmeans1 <- emmeans(cccvar_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
cccvar_emmeans2 <- emmeans(cccvar_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(cccvar_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(cccvar_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:cccvar_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Crime",
              "Category Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(cccvar_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(cccvar_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Incident Counts by Crime Category (12 levels) and Period

In this section, we analyze variables that contain the number of unique incidents in the offender's criminal history for a given type of crime. For each category of crime, we report include counts of incidents based on a combined criterion where the offender was arrested for, charged with, or convicted of (any of the three events, or any combination of them) crimes classified into the relevant category.

Note that these variables were created by first aggregating ARR, CHG, and JUD records to the incident level to flag incidents that fit the relevant category, then aggregated again to get to the offender level incident counts. It is possible to have multiple arrest offense records, charge records, or adjudication records for convictions on a single incident, but these variables ignore that and only count each incident once.

It may be helpful to note that the order of subsections below follows the numerical coding of the categories (which was influenced by alphabetical order of the labels we chose). We grouped and presented results in a different order in the manuscript that we think makes the material easier to present to readers. So, in comparing these materials to the published paper, pay attention to the crime category labels and section headings.

The chunk below creates some objects we use to dynamically construct text later in the subsections below.

# Save a formatted value for total number of offenders.
N_IDNEW <- format(N_All, big.mark = ",")

# Check how many offenders had >= 1 CSC arrest, charge, or conviction in 
# their CHR data for each crime category.
IDNEW %>% filter(HXCat12_1  >= 1) %>% nrow()       -> N_HXCat12_1
IDNEW %>% filter(HXCat12_2  >= 1) %>% nrow()       -> N_HXCat12_2
IDNEW %>% filter(HXCat12_3  >= 1) %>% nrow()       -> N_HXCat12_3
IDNEW %>% filter(HXCat12_4  >= 1) %>% nrow()       -> N_HXCat12_4
IDNEW %>% filter(HXCat12_5  >= 1) %>% nrow()       -> N_HXCat12_5
IDNEW %>% filter(HXCat12_6  >= 1) %>% nrow()       -> N_HXCat12_6
IDNEW %>% filter(HXCat12_7  >= 1) %>% nrow()       -> N_HXCat12_7
IDNEW %>% filter(HXCat12_8  >= 1) %>% nrow()       -> N_HXCat12_8
IDNEW %>% filter(HXCat12_9  >= 1) %>% nrow()       -> N_HXCat12_9
IDNEW %>% filter(HXCat12_10 >= 1) %>% nrow()       -> N_HXCat12_10
IDNEW %>% filter(HXCat12_11 >= 1) %>% nrow()       -> N_HXCat12_11
IDNEW %>% filter(HXCat12_12 >= 1) %>% nrow()       -> N_HXCat12_12
IDNEW %>% filter(HXCat12_Sexual >= 1) %>% nrow()   -> N_HXCat12_Sexual
IDNEW %>% filter(HXCat12_Violent >= 1) %>% nrow()  -> N_HXCat12_Violent
IDNEW %>% filter(HXCat12_Property >= 1) %>% nrow() -> N_HXCat12_Property
IDNEW %>% filter(HXCat12_Other >= 1) %>% nrow()    -> N_HXCat12_Other

# Convert those new variables to percentages. 
P_HXCat12_1        <- round(100*N_HXCat12_1/N_All, digits = 0)
P_HXCat12_2        <- round(100*N_HXCat12_2/N_All, digits = 0)
P_HXCat12_3        <- round(100*N_HXCat12_3/N_All, digits = 0)
P_HXCat12_4        <- round(100*N_HXCat12_4/N_All, digits = 0)
P_HXCat12_5        <- round(100*N_HXCat12_5/N_All, digits = 0)
P_HXCat12_6        <- round(100*N_HXCat12_6/N_All, digits = 0)
P_HXCat12_7        <- round(100*N_HXCat12_7/N_All, digits = 0)
P_HXCat12_8        <- round(100*N_HXCat12_8/N_All, digits = 0)
P_HXCat12_9        <- round(100*N_HXCat12_9/N_All, digits = 0)
P_HXCat12_10       <- round(100*N_HXCat12_10/N_All, digits = 0)
P_HXCat12_11       <- round(100*N_HXCat12_11/N_All, digits = 0)
P_HXCat12_12       <- round(100*N_HXCat12_12/N_All, digits = 0)
P_HXCat12_Sexual   <- round(100*N_HXCat12_Sexual/N_All, digits = 0)
P_HXCat12_Violent  <- round(100*N_HXCat12_Violent/N_All, digits = 0)
P_HXCat12_Property <- round(100*N_HXCat12_Property/N_All, digits = 0)
P_HXCat12_Other    <- round(100*N_HXCat12_Other/N_All, digits = 0)

This next chunk creates some objects to store things like standardized footnotes that we will re-use.

# Footnote text: Descriptives tables. 
FN1 <- paste("Only offenders with valid testing window start dates are",
             "included.",
             "Only incidents where the offender was arrested for, charged with,",
             "or convicted of at least one offense from one of the 12 main crime",
             "categories were counted.",
             "Before, during, and after refer to when the incident",
             "associated with the record occurred, not when the arrest date,",
             "charge date, or conviction date occurred relative to the testing",
             "window.")

# Footnote text.
FN5 <- paste0("N = ", N_IDNEW, ". TW, earliest testing window. ",
              "Only offenders with valid testing window start dates are ",
              "included. Before, during, and after refer to when the incident ",
              "associated with the record occurred, not when the arrest, ",
              "charge, or conviction date occurred relative to the testing ",
              "window. For example, an offender could be arrested during the ",
              "testing window for an incident that occurred before it began. ",
              "That would show up here in the before column. ", 
              "An incident is counted if there are any arrest, charge, or ",
              "conviction records (any one type, or any combination of them ",
              "will suffice) for the specified crime category associated with ",
              "it.")

# Vector of count types
counttypes <- c("Arrested", "Charged", "Convicted", 
                "Arrested, Charged, or Convicted")

# Vector of row labels for descriptives tables. 
TRowLabs <- c("Incident count (arrested, charged, or convicted)", BDA)

\FloatBarrier

Arson

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_1 (r P_HXCat12_1\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for arson. Table \ref{tab:describe_arson} summarizes the counts of unique arson incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_1} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Arson Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_1, HXCat12_1_Before, HXCat12_1_During, HXCat12_1_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Arson Incidents for Which",
              "Offender Was Arrested, Charged, or Convicted (Overall and By",
              "When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_1, HXCat12_1_Before, HXCat12_1_During, HXCat12_1_After) %>% 
  rename(All = HXCat12_1, Before = HXCat12_1_Before, During = HXCat12_1_During, 
         After = HXCat12_1_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the arson incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_1_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_1 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_1_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_1 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_1_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Arson Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_1_gee1, HXCat12_1_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_1_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_1_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Arson Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_1_emmeans1 <- emmeans(HXCat12_1_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_1_emmeans2 <- emmeans(HXCat12_1_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_1_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_1_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_1_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Arson",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_1_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_1_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Assault - DV, Stalking

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_2 (r P_HXCat12_2\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for assault involving domestic violence and/or stalking. Table \ref{tab:describe_assaultdv} summarizes the counts of unique assault incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_2} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Assault Involving Domestc Violence and/or",
              "Stalking Incident Counts (Overall and By When Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_2, HXCat12_2_Before, HXCat12_2_During, HXCat12_2_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Assault Involving",
              "Domestic Violence and/or Stalking Incidents for Which",
              "Offender Was Arrested, Charged, or Convicted (Overall and By",
              "When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_2, HXCat12_2_Before, HXCat12_2_During, HXCat12_2_After) %>% 
  rename(All = HXCat12_2, Before = HXCat12_2_Before, During = HXCat12_2_During, 
         After = HXCat12_2_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the assault involving domestic violence and/or stalking incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_2_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_2 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_2_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_2 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_2_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Assault Involving",
              "Domestic Violence and/or Stalking Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_2_gee1, HXCat12_2_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_2_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_2_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Assault Involving Domestic Violenace and/or",
              "Stalking Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_2_emmeans1 <- emmeans(HXCat12_2_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_2_emmeans2 <- emmeans(HXCat12_2_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_2_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_2_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_2_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Assault",
              "Involving Domestic Violence and/or Stalking",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_2_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_2_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Assault - Non-Sexual, Non-DV

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_3 (r P_HXCat12_3\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for a non-sexual, non-domestic violence assault. Table \ref{tab:describe_assaultndv} summarizes the counts of unique assault incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_3} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Assault (Non-Sexual, Non-Domestic Violence)",
              "Incident Counts (Overall and By When Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_3, HXCat12_3_Before, HXCat12_3_During, HXCat12_3_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Assault (Non-Sexual,", 
              "Non-Domestic Violence) Incidents for Which Offender Was",
              "Arrested, Charged, or Convicted (Overall and By When Incident",
              "Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_3, HXCat12_3_Before, HXCat12_3_During, HXCat12_3_After) %>% 
  rename(All = HXCat12_3, Before = HXCat12_3_Before, During = HXCat12_3_During, 
         After = HXCat12_3_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the assault (non-sexual, non-domestic violence) incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_3_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_3 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_3_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_3 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_3_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Assault (Non-Sexual,",
              "Non-Domestic Violance) Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_3_gee1, HXCat12_3_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_3_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_3_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Assault (Non-Sexual, Non-Domestic Violence)",
              "Incident Counts and Incidence Rates By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_3_emmeans1 <- emmeans(HXCat12_3_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_3_emmeans2 <- emmeans(HXCat12_3_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_3_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_3_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_3_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Assault", 
              "(Non-Sexual, Non-Domestic Violence)",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_3_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_3_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Burglary

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_4 (r P_HXCat12_4\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for burglary. Table \ref{tab:describe_burglary} summarizes the counts of unique burglary incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_4} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Burglary Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_4, HXCat12_4_Before, HXCat12_4_During, HXCat12_4_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Burglary Incidents for",
              "Which Offender Was Arrested, Charged, or Convicted (Overall and",
              "By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_4, HXCat12_4_Before, HXCat12_4_During, HXCat12_4_After) %>% 
  rename(All = HXCat12_4, Before = HXCat12_4_Before, During = HXCat12_4_During, 
         After = HXCat12_4_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the burglary incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_4_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_4 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_4_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_4 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_4_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Burglary Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_4_gee1, HXCat12_4_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_4_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_4_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Burglary Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_4_emmeans1 <- emmeans(HXCat12_4_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_4_emmeans2 <- emmeans(HXCat12_4_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_4_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_4_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_4_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Burglary",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_4_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_4_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Criminal Sexual Conduct

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_5 (r P_HXCat12_5\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for criminal sexual conduct. Table \ref{tab:describe_csc} summarizes the counts of unique criminal sexual conduct incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_5} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Criminal Sexual Conduct Incidents (Overall and",
              "By When Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_5, HXCat12_5_Before, HXCat12_5_During, HXCat12_5_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Criminal Sexual Conduct",
              "Incidents for Which Offender Was Arrested, Charged, or",
              "Convicted (Overall and By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_5, HXCat12_5_Before, HXCat12_5_During, HXCat12_5_After) %>% 
  rename(All = HXCat12_5, Before = HXCat12_5_Before, 
         During = HXCat12_5_During, After = HXCat12_5_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Overlap Between During Period Incidents and Earliest Known SAKs

The earliest known SAKs come from a different data source than the criminal history records. Those SAKs were tested long after most of the offenders’ criminal history records had accumulated. It is possible that some or even all of the criminal sexual conduct incidents found in the offenders' criminal histories during the testing window represent the same event as the earliest known SAK that starts the testing window. It is also possible that some or all of the criminal sexual conduct incidents in the criminal histories are entirely different events that show repeat offending behavior within a short period of time. Therefore, below we assess how much overlap there is between the earliest SAKs that define when the testing windows start and the CSC incidents from the criminal histories that occur during that period.

We start by examining criminal sexual conduct incident records. On these records, the IID variable is a primary key used for uniquely identifying each incident, the ESAK_IID variable on an incident record contains the IID associated with the offender's earliest known SAK, and IWhen classifies each incident as occurring before, during, or after the testing window associated with the offender's earliest known SAK. When IID = ESAK_IID, the incident record overlaps with the earliest known SAK. So, Table \ref{tab:crosstab_overlap} shows how many CSC incident records there are in each IWhen period that overlap with the earliest SAK.

# Create tibble with only CSC incident records. 
INCEW %>% 
  filter(HXCat12_5 == 1) %>% 
  mutate(Overlap = (ESAK_IID == IID)) -> 
  INCEW_CSC

# Create tibble with only CSC incident records during the testing window. 
INCEW_CSC %>% filter(IWhen == "During") ->
  INCEW_CSCD

TCap <- paste("Number of Criminal Sexual Conduct Incidents By When Incident",
              "Occurred and Whether The Incident Overlaps The Offender's", 
              "Earliest SAK")

# Crosstab 
xtabs(~IWhen + Overlap, data = INCEW_CSC, addNA = TRUE) %>% 
  addmargins() %>% 
  as.data.frame() %>% 
  pivot_wider(names_from = Overlap, values_from = Freq) %>% 
  kable(., format = "latex", booktabs = TRUE, caption = TCap, 
        col.names = c("IWhen", "False", "True", "Sum"),
        format.args = list(big.mark = ",")) %>%
  add_header_above(c("", "Overlap Earliest SAK" = 2, "")) %>%
  column_spec(2:4, width = "1.6cm") 

All of the overlap incidents should occur during the testing window and that is consistent with what we see in Table \ref{tab:crosstab_overlap}. Therefore, we move on to extract a few key numbers and do some additional computations to quantify the degree of overlap.

# No. of CSC incidents observed (all periods). 
INCEW_CSC %>% nrow() -> N_ICSC
N_ICSC

# No. of CSC incidents observed during testing window. 
INCEW_CSCD %>% nrow() -> N_ICSCD
N_ICSCD

# No. of unique offenders with >=1 CSC incident in CHR
INCEW_CSC %>% pull(OID) %>% unique() %>% length() -> N_OCSC
N_OCSC

# No. of unique offenders with >=1 CSC incident in CHR during testing window
INCEW_CSCD %>% pull(OID) %>% unique() %>% length() -> N_OCSCD 
N_OCSCD

# No. of unique offenders whose earliest SAK overlaps a CSC incident in CHR
# There is a maximum of one overlap incident per offender, so this is also the
# number of unique overlap CSC incidents.
INCEW_CSC %>% filter(ESAK_IID == IID) %>% pull(OID) %>% unique() %>% length() ->
  N_Overlap
N_Overlap

From an incident perspective, there were a total of r N_ICSC CSC incidents in the criminal history data, but only r N_ICSCD (r round(100*N_ICSCD/N_ICSC, digits = 2)%) of them occurred during an offender's testing window. The r N_Overlap incidents that overlap with the offender's earliest known SAK comprise r round(100*N_Overlap/N_ICSC, digits = 2)% of all the CSC incidents and r round(100*N_Overlap/N_ICSCD, digits = 2)% of the CSC incidents that occurred during testing windows. So, most of the CSC incidents that occurred during an offender’s testing window are linked to the very SAK that started the testing window itself.

Meanwhile, from an offender perspective, there are r N_OCSC offenders (r round(100*N_OCSC/N_All, digits = 2)% of the sample) with at least one CSC incident in their criminal history records. However, only r N_OCSCD (r round(100*N_OCSCD/N_All, digits = 2)%) offenders have one or more CSC incidents in their criminal history that specifically fall during their testing window. Further subsetting shows that there are r N_Overlap offenders for whom we believe the earliest known SAK can be linked to a specific CSC incident from their criminal history data. They comprise r round(100*N_Overlap/N_All, digits = 2)% of the full sample and r round(100*N_Overlap/N_OCSCD, digits = 2)% of the subset of offenders who had at least one CSC incident during the testing window according to the criminal history records.

\FloatBarrier

GEE Models

To more closely examine the criminal sexual conduct incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_5_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_5 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_5_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_5 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_5_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Criminal Sexual",
              "Conduct Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_5_gee1, HXCat12_5_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_5_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_5_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Criminal Sexual Conduct Incident Counts and",
              "Incidence Rates By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_5_emmeans1 <- emmeans(HXCat12_5_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_5_emmeans2 <- emmeans(HXCat12_5_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_5_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_5_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_5_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Criminal",
              "Sexual Conduct Incident Counts and Incidence Rates By When",
              "Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_5_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_5_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Drug Crime

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_6 (r P_HXCat12_6\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for drug crime. Table \ref{tab:describe_drug} summarizes the counts of unique drug crime incidents overall and by and when the incident occurred, while Table \ref{tab:freq_HXCat12_6} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Drug Crime Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_6, HXCat12_6_Before, HXCat12_6_During, HXCat12_6_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Drug Crime Incidents for",
              "Which Offender Was Arrested, Charged, or Convicted (Overall and",
              "By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_6, HXCat12_6_Before, HXCat12_6_During, HXCat12_6_After) %>% 
  rename(All = HXCat12_6, Before = HXCat12_6_Before, During = HXCat12_6_During, 
         After = HXCat12_6_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the drug crime incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_6_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_6 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_6_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_6 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_6_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Drug Crime Incident",
              "Counts Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_6_gee1, HXCat12_6_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_6_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_6_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Drug Crime Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_6_emmeans1 <- emmeans(HXCat12_6_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_6_emmeans2 <- emmeans(HXCat12_6_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_6_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_6_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_6_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Drug Crime",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_6_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_6_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Homicide

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_7 (r P_HXCat12_7\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for homicide. Table \ref{tab:describe_homicide} summarizes the counts of unique homicide incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_7} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Homicide Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_7, HXCat12_7_Before, HXCat12_7_During, HXCat12_7_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Homicide Incidents for",
              "Which Offender Was Arrested, Charged, or Convicted (Overall and",
              "By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_7, HXCat12_7_Before, HXCat12_7_During, HXCat12_7_After) %>% 
  rename(All = HXCat12_7, Before = HXCat12_7_Before, During = HXCat12_7_During, 
         After = HXCat12_7_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the homicide incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_7_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_7 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_7_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_7 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_7_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Homicide Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_7_gee1, HXCat12_7_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_7_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_7_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Homicide Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_7_emmeans1 <- emmeans(HXCat12_7_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_7_emmeans2 <- emmeans(HXCat12_7_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_7_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_7_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_7_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Homicide",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_7_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_7_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Larceny/Theft/Fraud

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_8 (r P_HXCat12_8\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for larceny, theft, or fraud. Table \ref{tab:describe_larceny} summarizes the counts of unique larceny, theft, or fraud incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_8} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Larceny, Theft, or Fraud Incidents (Overall and",
              "By When Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_8, HXCat12_8_Before, HXCat12_8_During, HXCat12_8_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Larceny, Theft, or Fraud",
              "Incidents for Which Offender Was Arrested, Charged, or",
              "Convicted (Overall and By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_8, HXCat12_8_Before, HXCat12_8_During, HXCat12_8_After) %>% 
  rename(All = HXCat12_8, Before = HXCat12_8_Before, 
         During = HXCat12_8_During, After = HXCat12_8_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the larceny, theft or fraud incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_8_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_8 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_8_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_8 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_8_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Larceny, Theft, or",
              "Fraud Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_8_gee1, HXCat12_8_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_8_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_8_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Larceny, Theft, or Fraud Incident Counts and",
              "Incidence Rates By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_8_emmeans1 <- emmeans(HXCat12_8_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_8_emmeans2 <- emmeans(HXCat12_8_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_8_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_8_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_8_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Larceny,",
              "Theft, or Fraud Incident Counts and Incidence Rates By When",
              "Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_8_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_8_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Robbery

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_9 (r P_HXCat12_9\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for robbery. Table \ref{tab:describe_robbery} summarizes the counts of unique robbery incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_9} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Robbery Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_9, HXCat12_9_Before, HXCat12_9_During, HXCat12_9_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Robbery Incidents for",
              "Which Offender Was Arrested, Charged, or Convicted (Overall and",
              "By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_9, HXCat12_9_Before, HXCat12_9_During, HXCat12_9_After) %>% 
  rename(All = HXCat12_9, Before = HXCat12_9_Before, During = HXCat12_9_During, 
         After = HXCat12_9_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the robbery incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_9_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_9 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_9_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_9 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_9_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Robbery Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_9_gee1, HXCat12_9_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_9_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_9_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Robbery Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_9_emmeans1 <- emmeans(HXCat12_9_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_9_emmeans2 <- emmeans(HXCat12_9_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_9_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_9_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_9_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Robbery",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_9_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_9_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Sex Crimes

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_10 (r P_HXCat12_10\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for sex crimes. Table \ref{tab:describe_sexcrime} summarizes the counts of unique sex crime incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_10} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Sex Crime Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_10, HXCat12_10_Before, HXCat12_10_During, HXCat12_10_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Sex Crime Incidents for",
              "Which Offender Was Arrested, Charged, or Convicted (Overall and",
              "By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_10, HXCat12_10_Before, HXCat12_10_During, HXCat12_10_After) %>% 
  rename(All = HXCat12_10, Before = HXCat12_10_Before, During = HXCat12_10_During, 
         After = HXCat12_10_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>%
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the sex crime incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_10_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_10 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_10_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_10 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_10_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Sex Crime Incident",
              "Counts Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_10_gee1, HXCat12_10_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_10_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_10_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Sex Crime Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_10_emmeans1 <- emmeans(HXCat12_10_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_10_emmeans2 <- emmeans(HXCat12_10_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_10_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_10_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_10_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Sex Crime",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_10_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_10_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Traffic and Ordinances

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_11 (r P_HXCat12_11\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for traffic and ordinance violations. Table \ref{tab:describe_traffic} summarizes the counts of unique traffic and ordinance incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_11} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Traffic and Ordinance Incidents (Overall and",
              "By When Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_11, HXCat12_11_Before, HXCat12_11_During, HXCat12_11_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Traffic and Ordinance",
              "Incidents for Which Offender Was Arrested, Charged, or",
              "Convicted (Overall and By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_11, HXCat12_11_Before, HXCat12_11_During, HXCat12_11_After) %>% 
  rename(All = HXCat12_11, Before = HXCat12_11_Before, 
         During = HXCat12_11_During, After = HXCat12_11_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the traffic and ordinance incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_11_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_11 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_11_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_11 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_11_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Traffic and",
              "Ordinance Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_11_gee1, HXCat12_11_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_11_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_11_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Traffic and Ordinance Incident Counts and",
              "Incidence Rates By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_11_emmeans1 <- emmeans(HXCat12_11_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_11_emmeans2 <- emmeans(HXCat12_11_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_11_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_11_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_11_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Traffic and",
              "Ordinance Incident Counts and Incidence Rates By When",
              "Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_11_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_11_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Weapons

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_12 (r P_HXCat12_12\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for weapons crimes. Table \ref{tab:describe_weapons} summarizes the counts of unique weapons crime incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_12} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Weapons Crime Incident Counts (Overall and By",
              "When Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_12, HXCat12_12_Before, HXCat12_12_During, HXCat12_12_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Weapons Crime Incidents for",
              "Which Offender Was Arrested, Charged, or Convicted (Overall and",
              "By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_12, HXCat12_12_Before, HXCat12_12_During, HXCat12_12_After) %>% 
  rename(All = HXCat12_12, Before = HXCat12_12_Before, During = HXCat12_12_During, 
         After = HXCat12_12_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value), 
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the weapon incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_12_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_12 ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_12_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_12 ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_12_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Weapon Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_12_gee1, HXCat12_12_gee2), booktabs = TRUE, dcolumn = TRUE,
       threeparttable = TRUE, fontsize = "normalsize", table = TRUE, 
       use.packages = FALSE, ci.force = TRUE, label = "tab:HXCat12_12_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_12_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Weapon Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_12_emmeans1 <- emmeans(HXCat12_12_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_12_emmeans2 <- emmeans(HXCat12_12_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_12_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_12_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_12_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Weapon",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_12_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_12_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Incident Counts by Broader Crime Category (4 levels) and Period

In this section, we analyze variables that contain the number of unique incidents in the offender's criminal history for a given broader category of crime. The numerical results for broader categories were omitted from the paper for the sake of brevity, but we retain them here for future reference.

For each broader crime category, we report include counts of incidents based on a combined criterion where the offender was arrested for, charged with, or convicted of (any of the three events, or any combination of them) any of the more specific types of crimes classified into the relevant category.

Note that these variables were created by first aggregating ARR, CHG, and JUD records to the incident level to flag incidents that fit the relevant category, then aggregated again to get to the offender level incident counts. It is possible to have multiple arrest offense records, charge records, or adjudication records for convictions on a single incident, but these variables ignore that and only count each incident once.

\FloatBarrier

Sexual Crimes

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_Sexual (r P_HXCat12_Sexual\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for a sexual crime (either a CSC or a non-CSC sex crime). Table \ref{tab:describe_sexual} summarizes the counts of unique sexual crimes incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_Sexual} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Sexual Crimes Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_Sexual, HXCat12_Sexual_Before, HXCat12_Sexual_During, 
         HXCat12_Sexual_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Sexual Crimes Incidents ",
              "for Which Offender Was Arrested, Charged, or Convicted (Overall",
              "and By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_Sexual, HXCat12_Sexual_Before, HXCat12_Sexual_During, 
         HXCat12_Sexual_After) %>% 
  rename(All = HXCat12_Sexual, Before = HXCat12_Sexual_Before, 
         During = HXCat12_Sexual_During, After = HXCat12_Sexual_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the sexual crimes incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_Sexual_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_Sexual ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_Sexual_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_Sexual ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_Sexual_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Sexual Crimes Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_Sexual_gee1, HXCat12_Sexual_gee2), booktabs = TRUE, 
       dcolumn = TRUE, threeparttable = TRUE, fontsize = "normalsize", 
       table = TRUE, use.packages = FALSE, ci.force = TRUE, 
       label = "tab:HXCat12_Sexual_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_Sexual_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Sexual Crimes Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_Sexual_emmeans1 <- emmeans(HXCat12_Sexual_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_Sexual_emmeans2 <- emmeans(HXCat12_Sexual_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_Sexual_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_Sexual_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_Sexual_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Sexual Crimes",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_Sexual_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_Sexual_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Violent Non-Sexual Crimes

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_Violent (r P_HXCat12_Violent\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for a violent, non-sexual crime (homicide; assault involving domestic violence and/or stalking; non-sexual, non-domestic violence assault; robbery; or weapons). Table \ref{tab:describe_violent} summarizes the counts of unique sexual crimes incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_Violent} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Violent, Non-Sexual Crimes Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_Violent, HXCat12_Violent_Before, HXCat12_Violent_During, 
         HXCat12_Violent_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Violent, Non-Sexual Crimes Incidents ",
              "for Which Offender Was Arrested, Charged, or Convicted (Overall",
              "and By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_Violent, HXCat12_Violent_Before, HXCat12_Violent_During, 
         HXCat12_Violent_After) %>% 
  rename(All = HXCat12_Violent, Before = HXCat12_Violent_Before, 
         During = HXCat12_Violent_During, After = HXCat12_Violent_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the violent, non-sexual crimes incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_Violent_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_Violent ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_Violent_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_Violent ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_Violent_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Violent, Non-Sexual Crimes Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_Violent_gee1, HXCat12_Violent_gee2), booktabs = TRUE, 
       dcolumn = TRUE, threeparttable = TRUE, fontsize = "normalsize", 
       table = TRUE, use.packages = FALSE, ci.force = TRUE, 
       label = "tab:HXCat12_Violent_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_Violent_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Violent, Non-Sexual Crimes Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_Violent_emmeans1 <- emmeans(HXCat12_Violent_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_Violent_emmeans2 <- emmeans(HXCat12_Violent_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_Violent_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_Violent_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_Violent_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Violent, Non-Sexual Crimes",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_Violent_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_Violent_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Property Crimes

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_Property (r P_HXCat12_Property\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for a property crime (arson, burglary, or larceny/theft/fraud). Table \ref{tab:describe_property} summarizes the counts of unique sexual crimes incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_Property} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Property Crimes Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_Property, HXCat12_Property_Before, HXCat12_Property_During, 
         HXCat12_Property_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Property Crimes Incidents ",
              "for Which Offender Was Arrested, Charged, or Convicted (Overall",
              "and By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_Property, HXCat12_Property_Before, HXCat12_Property_During, 
         HXCat12_Property_After) %>% 
  rename(All = HXCat12_Property, Before = HXCat12_Property_Before, 
         During = HXCat12_Property_During, After = HXCat12_Property_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the property crimes incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_Property_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_Property ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_Property_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_Property ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_Property_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Property Crimes Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_Property_gee1, HXCat12_Property_gee2), booktabs = TRUE, 
       dcolumn = TRUE, threeparttable = TRUE, fontsize = "normalsize", 
       table = TRUE, use.packages = FALSE, ci.force = TRUE, 
       label = "tab:HXCat12_Property_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_Property_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Property Crimes Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_Property_emmeans1 <- emmeans(HXCat12_Property_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_Property_emmeans2 <- emmeans(HXCat12_Property_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_Property_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_Property_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_Property_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Property Crimes",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_Property_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_Property_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Other Crimes

Among these r N_IDNEW suspected sexual offenders, there were r N_HXCat12_Property (r P_HXCat12_Property\%) offenders who had criminal histories containing at least one incident associated with an arrest, charge, or conviction for other crimes (drug crimes or traffic and ordinances). Table \ref{tab:describe_other} summarizes the counts of unique sexual crimes incidents overall and by when the incident occurred, while Table \ref{tab:freq_HXCat12_Other} shows the frequency distributions underlying those summaries.

# Table caption. 
TCap <- paste("Offender-Level Other Crimes Incident Counts (Overall and By When",
              "Incident Occurred)")

# Summarize crime category counts (both overall and broken down by IWhen).
IDNEW %>% 
  select(HXCat12_Other, HXCat12_Other_Before, HXCat12_Other_During, 
         HXCat12_Other_After) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  bind_cols(data.frame(Variable = TRowLabs), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap,
        linesep = c('', '', '', '\\addlinespace')) %>% 
  column_spec(column = 1, width = "6 cm") %>% 
  footnote(general = FN1, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)
# Table caption.
TCap <- paste("Frequency Distributions for Number of Other Crimes Incidents ",
              "for Which Offender Was Arrested, Charged, or Convicted (Overall",
              "and By When Incident Occurred)")

# Get frequency distributions. 
IDNEW %>% 
  select(OID, HXCat12_Other, HXCat12_Other_Before, HXCat12_Other_During, 
         HXCat12_Other_After) %>% 
  rename(All = HXCat12_Other, Before = HXCat12_Other_Before, 
         During = HXCat12_Other_During, After = HXCat12_Other_After) %>% 
  pivot_longer(cols = -OID, names_to = "When", values_to = "Value") %>% 
  group_by(When, Value) %>% 
  count() %>% 
  arrange(Value) %>% 
  pivot_wider(names_from = When, values_from = n, values_fill = 0) %>% 
  mutate(All_p = 100*All/N_All,
         All_v = if_else(is.na(Value), 
                         true = as.numeric(NA), 
                         false = 100*All/N_All),
         Before_p = 100*Before/N_All,
         Before_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*Before/N_Before),
         During_p = 100*During/N_All,
         During_v = if_else(is.na(Value), 
                            true = as.numeric(NA), 
                            false = 100*During/N_During),
         After_p = 100*After/N_All,
         After_v = if_else(is.na(Value),
                           true = as.numeric(NA), 
                           false = 100*After/N_After)) %>% 
  select(Value, All, All_p, All_v, Before, Before_p, Before_v, During, 
         During_p, During_v, After, After_p, After_v) %>% 
  kable(format = "latex", booktabs = TRUE, digits = 2, 
        col.names = c("Value", rep(c("N", "%", "Valid %"), 4)),
        caption = TCap) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 1, "Overall" = 3, "Before TW" = 3, "During TW" = 3,
                   "After TW" = 3)) %>% 
  footnote(general = FN5, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

GEE Models

To more closely examine the other crimes incident counts, we fitted two generalized estimating equation (GEE) models to the long-format IDNEWL data, after dropping rows for unobserved periods (Years = 0) because the counts are missing for those observations.

Model 1 used When as a predictor of the period-specific counts, but did not adjust for the period durations. Model 2 supplemented When as a predictor with an offset term for log(Years) to adjust for the period durations. Both models used Poisson distributions, an exchangeable correlation structure, a log link function, and robust standard errors based on the sandwich estimator.

HXCat12_Other_gee1 <- IDNEWL %>% 
  filter(Years > 0) %>%
  geeglm(HXCat12_Other ~ When, 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

HXCat12_Other_gee2 <- IDNEWL %>% 
  filter(Years > 0) %>% 
  geeglm(HXCat12_Other ~ When + offset(log(Years)), 
         family = poisson(link = "log"), data = ., 
         id = OID, corstr = "exchangeable", std.err = "san.se")

Table \ref{tab:HXCat12_Other_texreg} below shows the parameter estimates for both GEE models.

TCap <- paste("Generalized Estimating Equation Models for Other Crimes Incident Counts",
              "Predicted by When Incidents Occurred")
FN <- paste ("\\item * p < .05, null hypothesis value outside 95\\% confidence",
             "interval based on z-score.",
             "\\\\\n\\item Note.",
             "The data contain up to 3 longitudinal observations (one per", 
             "period) for each offender (i.e., cluster).",
             "The reference priod for When was after the testing window.",
             "Both models used a log link function, exchangeable correlation", 
             "structure, a Poisson distribution, and robust standard errors",
             "(sandwich estimator).",
             "Model 1 did not use an offset.",
             "Model 2 used log(Years) as an offset term.")

texreg(list(HXCat12_Other_gee1, HXCat12_Other_gee2), booktabs = TRUE, 
       dcolumn = TRUE, threeparttable = TRUE, fontsize = "normalsize", 
       table = TRUE, use.packages = FALSE, ci.force = TRUE, 
       label = "tab:HXCat12_Other_texreg",
       stars = 0.05, caption = TCap, custom.note = FN)

\FloatBarrier

Estimated Marginal Means

Table \ref{tab:HXCat12_Other_emmeans} provides the estimated marginal means obtained from Models 1 and 2, along with corresponding confidence intervals. The units of measurement for those means are counts per offender for Model 1 (which ignore period duration) and counts per offender-year for Model 2 (which adjust for duration and are thus considered annual incidence rates).

# Table caption.
TCap <- paste("Marginal Means of Other Crimes Incident Counts and Incidence Rates",
              "By When Incidents Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "Confidence intervals are based on z-scores.")

HXCat12_Other_emmeans1 <- emmeans(HXCat12_Other_gee1, specs = pairwise ~ When, 
                           type = "response", ref = "During")
HXCat12_Other_emmeans2 <- emmeans(HXCat12_Other_gee2, specs = pairwise ~ When, 
                           type = "response", ref = "During")

as_tibble(HXCat12_Other_emmeans1$emmeans) %>% 
  full_join(x = ., y = as_tibble(HXCat12_Other_emmeans2$emmeans)) %>% 
  mutate(Model = c(1, 1, 1, 2, 2, 2), 
         When = factor(When, levels = c("Before", "During", "After"))) %>% 
  arrange(Model, When) %>% 
  select(When, rate, SE, df, asymp.LCL, asymp.UCL) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("When", "Rate", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Mean per offender", start_row = 1,
             end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Mean per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Contrasts for Ratios of Estimated Marginal Means

Finally, we also estimated a set of contrasts to compare the marginal means from Models 1 and 2 (Table \ref{tab:HXCat12_Other_ratios}). Each of these contrasts estimates the ratios of a pair of means rather than a raw difference between means. This is because of the nature of the underlying Poisson GEE model.

TCap <- paste("Contrasts Estimating Ratios of Marginal Means for Other Crimes",
              "Incident Counts and Incidence Rates By When Incidents",
              "Occurred")

# Table footnote.
FN <- paste("Model 1 did not use an offset, so means are counts.",
            "Model 2 used log(Years) as an offset term, so means are annual",
            "incidence rates.",
            "These contrasts estimate ratios of those quantities and used",
            "Tukey's method to adjust for multiple comparisons.",
            "Confidence intervals are based on z-scores.")

as_tibble(confint(HXCat12_Other_emmeans1$contrasts)) %>% 
  full_join(x = ., y = as_tibble(confint(HXCat12_Other_emmeans2$contrasts))) %>% 
  kable(format = "latex", booktabs = TRUE, caption = TCap, digits = 2, 
        col.names = c("Contrast", "Ratio", "SE", "df", "LCL", "UCL")) %>% 
  kable_styling() %>% 
  add_header_above(c(" " = 4, "95% CI" = 2)) %>% 
  group_rows(group_label = "Model 1: Ratios of means per offender", 
             start_row = 1, end_row = 3, italic = TRUE) %>% 
  group_rows(group_label = "Model 2: Ratios of means per offender-year", 
             start_row = 4, end_row = 6, italic = TRUE) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Before Period Summary

The content of this section was not reported in the manuscript for the sake of brevity. Table \ref{tab:before_summary} remains here to document an analysis performed and provide it in a form consistent with the one used for some of the after period results.

TCap <- paste("Amount of Crime That Occurred Before the Testing Window")
FN <- paste0("Separately for each crime category, we counted offenders ",
             "arrested, charged, or convicted for at least one incident ",
             "occurring before the testing window. Total incidents are summed ",
             "across those offenders. The bottom four italicized broader ",
             "categories include offenders who were arrested, charged or ", 
             "convicted for any of the more specific crime categories ",
             "nested below them, while the Any Crimes category includes ",
             "incidents associated with at least one of the 12 specific ",
             "crime categories. ",
             "The numerator for each percentage is the listed value for n; ",
             "the denominator is the sample size for the before period (N = ",
             format(N_Before, big.mark = ","), 
             "). ",
             "Incidents associated with more than one kind of crime are ",
             "counted in all applicable specific categories, but only once ",
             "per applicable broader category. For example, an incident with ",
             "both a homicide and a robbery only counts once toward the total ",
             "incident count for the broader violent crimes category. TW, ",
             "testing window.")

IDNEWB %>%
  # Drops rows for crime category counts and individuals with no incidents
  filter(Variable != "HXCat12_Count") %>%
  # Drops rows for offenders with no incidents of a given type occuring after 
  # the testing window. 
  filter(Count > 0) %>%
  group_by(Variable, VLabel, VOrder) %>%
  summarise(N = n(),
            Pct = 100*N/N_Before,
            Incidents = sum(Count)) %>%
  ungroup() %>%
  arrange(VOrder) %>%
  select(VLabel, N, Pct, Incidents) %>%
  kable(., format = "latex", booktabs = TRUE, digits = 1, row.names = FALSE,
        col.names = c("Crime Category", "n", "%", "Total Incidents"),
        format.args = list(big.mark = ','), caption = TCap, linesep = "") %>%
  kable_styling() %>%
  add_header_above(c(" " = 1, "Offenders" = 2, " " = 1)) %>%
  row_spec(., row = c(1, 2, 5, 11, 15), italic = TRUE) %>% 
  pack_rows(" ", start_row = 1, end_row = 1) %>% 
  pack_rows(" ", start_row = 2, end_row = 4) %>% 
  pack_rows(" ", start_row = 5, end_row = 10) %>% 
  pack_rows(" ", start_row = 11, end_row = 14) %>% 
  pack_rows(" ", start_row = 15, end_row = 17) %>% 
  add_indent(., positions = c(3:4), level_of_indent = 1) %>%
  add_indent(., positions = c(6:10), level_of_indent = 1) %>%
  add_indent(., positions = c(12:14), level_of_indent = 1) %>%
  add_indent(., positions = c(16:17), level_of_indent = 1) %>%
  column_spec(., column = 2, width = "1.15cm") %>%
  column_spec(., column = 3, width = "1.15cm") %>%
  column_spec(., column = 4, width = "2cm") %>%
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
           threeparttable = TRUE)

\FloatBarrier

During Period Summary

The content of this section was not reported in the manuscript for the sake of brevity. Table \ref{tab:during_summary} remains here to document an analysis performed and provide it in a form consistent with the one used for some of the after period results.

TCap <- paste("Amount of Crime That Occurred During the Testing Window")
FN <- paste0("Separately for each crime category, we counted offenders ",
             "arrested, charged, or convicted for at least one incident ",
             "occurring during the testing window. Total incidents are summed ",
            "across those offenders. The bottom four italicized broader ",
             "categories include offenders who were arrested, charged or ", 
             "convicted for any of the more specific crime categories ",
             "nested below them, while the Any Crimes category includes ",
             "incidents associated with at least one of the 12 specific ",
             "crime categories. ",
             "The numerator for each percentage is the listed value for n; ",
             "the denominator is the sample size for the during period (N = ",
             format(N_During, big.mark = ","), "). ",
             "Incidents associated with more than one kind of crime are ",
             "counted in all applicable specific categories, but only once ",
             "per applicable broader category. For example, an incident with ",
             "both a homicide and a robbery only counts once toward the total ",
             "incident count for the broader violent crimes category. TW, ",
             "testing window.")

IDNEWD %>%
  # Drops rows for crime category counts and individuals with no incidents
  filter(Variable != "HXCat12_Count") %>%
  # Drops rows for offenders with no incidents of a given type occuring after 
  # the testing window. 
  filter(Count > 0) %>%
  group_by(Variable, VLabel, VOrder) %>%
  summarise(N = n(),
            Pct = 100*N/N_During,
            Incidents = sum(Count)) %>%
  ungroup() %>%
  arrange(VOrder) %>%
  select(VLabel, N, Pct, Incidents) %>%
  kable(., format = "latex", booktabs = TRUE, digits = 1, row.names = FALSE,
        col.names = c("Crime Category", "n", "%", "Total Incidents"),
        format.args = list(big.mark = ','), caption = TCap, linesep = "") %>%
  kable_styling() %>%
  add_header_above(c(" " = 1, "Offenders" = 2, " " = 1)) %>%
 row_spec(., row = c(1, 2, 5, 11, 15), italic = TRUE) %>% 
  pack_rows(" ", start_row = 1, end_row = 1) %>% 
  pack_rows(" ", start_row = 2, end_row = 4) %>% 
  pack_rows(" ", start_row = 5, end_row = 10) %>% 
  pack_rows(" ", start_row = 11, end_row = 14) %>% 
  pack_rows(" ", start_row = 15, end_row = 17) %>% 
  add_indent(., positions = c(3:4), level_of_indent = 1) %>%
  add_indent(., positions = c(6:10), level_of_indent = 1) %>%
  add_indent(., positions = c(12:14), level_of_indent = 1) %>%
  add_indent(., positions = c(16:17), level_of_indent = 1) %>%
  column_spec(., column = 2, width = "1.15cm") %>%
  column_spec(., column = 3, width = "1.15cm") %>%
  column_spec(., column = 4, width = "2cm") %>%
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
           threeparttable = TRUE)

\FloatBarrier

After Period Summary

Next we summarize the amount of crime that occurred after the testing window. Separately for each crime category, Table \ref{tab:after_summary} shows the number of offenders with one or more incidents from the given category. It also translates those offender counts into percentages relative to the total sample size and reports the total number of incidents associated with those offenders that occurred after the testing window ended.

TCap <- paste("Amount of Crime That Occurred After the Testing Window")
FN <- paste0("Separately for each crime category, we counted offenders ",
             "arrested, charged, or convicted for at least one incident " ,
             "occurring after the testing window. Total incidents are summed ",
             "across those offenders. The bottom four italicized broader ",
             "categories include offenders who were arrested, charged or ", 
             "convicted for any of the more specific crime categories ",
             "nested below them, while the Any Crimes category includes ",
             "incidents associated with at least one of the 12 specific ",
             "crime categories. ",
             "The numerator for each percentage is the listed value for n; ",
             "the denominator is the sample size for the after period (N = ",
             format(N_After, big.mark = ","), "). ",
             "Incidents associated with more than one kind of crime are ",
             "counted in all applicable specific categories, but only once ",
             "per applicable broader category. For example, an incident with ",
             "both a homicide and a robbery only counts once toward the total ",
             "incident count for the broader violent crimes category. TW, ",
             "testing window.")

IDNEWA %>%
  # Drops rows for crime category counts and individuals with no incidents
  filter(Variable != "HXCat12_Count") %>%
  # Drops rows for offenders with no incidents of a given type occuring after 
  # the testing window. 
  filter(Count > 0) %>%
  group_by(Variable, VLabel, VOrder) %>%
  summarise(N = n(),
            Pct = 100*N/N_After,
            Incidents = sum(Count)) %>%
  ungroup() %>%
  arrange(VOrder) %>%
  select(VLabel, N, Pct, Incidents) %>%
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE,
        col.names = c("Crime Category", "n", "%", "Total Incidents"),
        format.args = list(big.mark = ','), caption = TCap, linesep = "") %>%
  kable_styling() %>%
  add_header_above(c(" " = 1, "Offenders" = 2, " " = 1)) %>%
  row_spec(., row = c(1, 2, 5, 11, 15), italic = TRUE) %>% 
  pack_rows(" ", start_row = 1, end_row = 1) %>% 
  pack_rows(" ", start_row = 2, end_row = 4) %>% 
  pack_rows(" ", start_row = 5, end_row = 10) %>% 
  pack_rows(" ", start_row = 11, end_row = 14) %>% 
  pack_rows(" ", start_row = 15, end_row = 17) %>% 
  add_indent(., positions = c(3:4), level_of_indent = 1) %>%
  add_indent(., positions = c(6:10), level_of_indent = 1) %>%
  add_indent(., positions = c(12:14), level_of_indent = 1) %>%
  add_indent(., positions = c(16:17), level_of_indent = 1) %>%
  column_spec(., column = 2, width = "1.15cm") %>%
  column_spec(., column = 3, width = "1.15cm") %>%
  column_spec(., column = 4, width = "2cm") %>%
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
           threeparttable = TRUE)

\FloatBarrier

Incident Count Distributions By Crime Categeory

Figure \ref{fig:plot_Crime_full_width} plots the distributions of the incident count variables for each of the 12 main crime categories during the period after the testing window. Because a few large outlier values make the details harder to see for the majority of the data (nearly all values are for 12 or fewer incidents), we also produced Figure \ref{fig:plot_Crime_partial_width}, which zooms in on the 0-12 range of incident counts.

FCap <- paste("\\label{fig:plot_Crime_full_width}",
              "Distributions for Number of Incidents After the Testing Window",
              "for Which Offender Was Arrested, Charged, or Convicted,",
              "By Crime Category.",
              "Each panel shows a histogram above a strip plot with an",
              "overlaid boxplot. Histogram bar heights show the percentage of",
              "offenders who had a given number of incidents.",
              "The strip plots show a dot for each offender, with positions",
              "jittered to reduce overlap.")

# Graph after period frequency distributions.
IDNEWA %>%
  # Keep only the main 12 crime categories (drops the broader crime categories).
  filter(!is.na(Crime)) %>%
  # use y = -... to position boxplot and jitterplot below the histogram
  ggplot(data = ., aes(x = Count, y = -.4)) +
  # Add histogram with bar heights scaled to percentages via after_stat.
  geom_histogram(aes(y = after_stat(count/N_After)), binwidth = .5, 
                 fill = "black") +
  # Add a jittered strip plot. 
  geom_point(aes(), position = position_jitter(width = .25, height = .25),
             size = 1, color = "gray50") +
  # Overlay a boxplot on the strip plot from ggstance
  geom_boxplot(lwd = 1, width = .2, outlier.shape = NA) +
  # Merged those calls to one display
  guides(scale = "none") +
  # Split each crime category into a separate panel (facet)
  facet_wrap(~Crime, nrow = 6, strip.position = "top") +
  xlab("Incident Count After Testing Window") +
  # Remove y axis title label from the y axis
  theme(axis.title.y = element_blank()) +
  # Format the y axis ticks. 
  scale_y_continuous(limits = c(-.65, 1.00), 
                     breaks = c(0, .5, 1.0), labels = c("0%", "50%", "100%")) ->
  Figure_Crime_full_width

Figure_Crime_full_width
FCap <- paste("\\label{fig:plot_Crime_partial_width}",
              "Distributions for Number of Incidents After the Testing Window",
              "for Which Offender Was Arrested, Charged, or Convicted,",
              "By Crime Category.",
              "Each panel shows a histogram above a strip plot with an",
              "overlaid boxplot. Histogram bar heights show the percentage of",
              "offenders who had a given number of incidents.",
              "The strip plots show a dot for each offender, with positions",
              "jittered to reduce overlap. This plot has been zoomed",
              "in to show incident counts in the range 0-12, omitting a few",
              "outliers for (a) larceny, theft, fraud,",
              "(b) drug crimes, and (c) traffic and ordinances.")

Figure_Crime_full_width + 
  scale_x_continuous(limits = c(-.6, 12.5)) ->
  Figure_Crime_partial_width

Figure_Crime_partial_width

Figure \ref{fig:plot_BCrime_full_width} plots the distributions of the incident count variables for each of the four broader crime categories during the period after the testing window. Because a few large outlier values make the details harder to see for the majority of the data (nearly all values are for 15 or fewer incidents), we also produced Figure \ref{fig:plot_BCrime_partial_width}, which zooms in on the 0-15 range of incident counts.

FCap <- paste("\\label{fig:plot_BCrime_full_width}",
              "Distributions for Number of Incidents After the Testing Window",
              "for Which Offender Was Arrested, Charged, or Convicted,",
              "By Broader Crime Category.",
              "Each panel shows a histogram above a strip plot with an",
              "overlaid boxplot. Histogram bar heights show the percentage of",
              "offenders who had a given number of incidents.",
              "The strip plots show a dot for each offender, with positions",
              "jittered to reduce overlap.")

# Graph after period frequency distributions.
IDNEWA %>%
  # Keep only the 4 broader crime categories (drops 12 main crime categories).
  filter(!is.na(BCrime)) %>%
  # use y = -... to position boxplot and jitterplot below the histogram
  ggplot(data = ., aes(x = Count, y = -.4)) +
  # Add histogram with bar heights scaled to percentages via after_stat.
  geom_histogram(aes(y = after_stat(count/N_After)), binwidth = .5, 
                 fill = "black") +
  # Add a jittered strip plot. 
  geom_point(aes(), position = position_jitter(width = .25, height = .25),
             size = 1, color = "gray50") +
  # Overlay a boxplot on the strip plot from ggstance
  geom_boxplot(lwd = 1, width = .2, outlier.shape = NA) +
  # Merged those calls to one display
  guides(scale = "none") +
  # Split each crime category into a separate panel (facet)
  facet_wrap(~BCrime, nrow = 4, strip.position = "top") +
  xlab("Incident Count After Testing Window") +
  # Remove y axis title label from the y axis
  theme(axis.title.y = element_blank()) +
  # Format the y axis ticks. 
  scale_y_continuous(limits = c(-.65, 1.00), 
                     breaks = c(0, .5, 1.0), labels = c("0%", "50%", "100%")) ->
  Figure_BCrime_full_width

Figure_BCrime_full_width
FCap <- paste("\\label{fig:plot_BCrime_partial_width}",
              "Distributions for Number of Incidents After the Testing Window",
              "for Which Offender Was Arrested, Charged, or Convicted,",
              "By Broader Crime Category.",
              "Each panel shows a histogram above a strip plot with an",
              "overlaid boxplot. Histogram bar heights show the percentage of",
              "offenders who had a given number of incidents.",
              "The strip plots show a dot for each offender, with positions",
              "jittered to reduce overlap. This plot has been zoomed",
              "in to show incident counts in the range 0-15, omitting some",
              "larger values in the any crimes panel and some outliers for",
              "(a) property crimes and (b) other crimes.")

Figure_BCrime_full_width + 
  scale_x_continuous(limits = c(-.6, 15.5)) ->
  Figure_BCrime_partial_width

Figure_BCrime_partial_width

\FloatBarrier

Years to Incident Date Distributions by Crime Category

Table \ref{tab:describe_IYearsAfterTW} shows descriptive statistics about the distribution for IYearsAfterTW, which records how many years after the end of the testing window an incident actually occurred, broken down by crime category. This is particularly useful for identifying the median (Q50) and quartiles (Q25 and Q75), which clearly vary across types of crime.

# Table caption. 
TCap <- paste("Years to Incident Date By Crime Category",
              "for Incidents After the Testing Window Where Offender Was",
              "Arrested, Charged, or Convicted.")
FN2 <- paste("Only offenders with valid testing window start dates are",
             "included.",
             "Only incidents that occurred after the testing window ended and",
             "where the offender was arrested for, charged with, or convicted",
             "of at least one offense from the relevant crime category were",
             "included.",
             "The bottom four italicized broader categories include offenders",
             "who were arrested, charged or convicted for any of the more", 
             "specific crime categories nested below them, while the Any",
             "Crimes category includes incidents associated with at least one",
             "of the 12 specific crime categories.",
             "Incidents associated with more than one kind of crime are",
             "included in all applicable specific categories, but only once per",
             "applicable broader category. For example, an incident with both a",
             "homicide and a robbery only contributes one observation to the",
             "years summary for the broader violent crimes category.")

VLLevels <- c("Any Crimes (12 categories)",
              "Sexual Crimes (2 categories)",
              "Criminal Sexual Conduct (CSC)",
              "Sex Crimes (non-CSC crimes)",
              "Violent Non-Sexual Crimes (5 categories)",
              "Assault, Domestic Violence",
              "Assault, non-Domestic Violence",
              "Homicide",
              "Robbery",
              "Weapons",
              "Property Crimes (3 categories)",
              "Arson",
              "Burglary",
              "Larceny, Theft, Fraud",
              "Other Crimes (2 categories)",
              "Drug Crimes",
              "Traffic & Ordinances")

CNames2 <- c("Crime Category", "N", "Mean", "SD", "Min", "Max", "Skew", 
             "Kurtosis", "Q25", "Q50", "Q75")

# Summarize crime category counts (both overall and broken down by IWhen).
INCEWA %>% 
  filter(!is.na(VLabel)) %>%
  select(VLabel, IYearsAfterTW) %>% 
  describeBy(., group = "VLabel", quant=c(.25, .50, .75), mat = TRUE) %>% 
  filter(vars == 2) %>% 
  rename(Category = group1) %>% 
  mutate(Category = factor(Category, levels = VLLevels, labels = VLLevels)) %>% 
  arrange(Category) %>% 
  select(Category, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames2, caption = TCap, linesep = "") %>% 
  row_spec(., row = c(1, 2, 5, 11, 15), italic = TRUE) %>% 
  pack_rows(" ", start_row = 1, end_row = 1) %>% 
  pack_rows(" ", start_row = 2, end_row = 4) %>% 
  pack_rows(" ", start_row = 5, end_row = 10) %>% 
  pack_rows(" ", start_row = 11, end_row = 14) %>% 
  pack_rows(" ", start_row = 15, end_row = 17) %>% 
  add_indent(., positions = c(3:4), level_of_indent = 1) %>%
  add_indent(., positions = c(6:10), level_of_indent = 1) %>%
  add_indent(., positions = c(12:14), level_of_indent = 1) %>%
  add_indent(., positions = c(16:17), level_of_indent = 1) %>%
  column_spec(column = 1, width = "6.8 cm") %>% 
  footnote(general = FN2, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

Figure \ref{fig:plot_IYearsAfterTW_Crime} shows the density plots (i.e., smoothed histograms) for the distribution of IYearsAfterTW for each crime category. The shapes of these plots help us better understand Table \ref{tab:describe_IYearsAfterTW}.

FCap <- paste("\\label{fig:plot_IYearsAfterTW_Crime}",
              "Distributions for Years to Incident Date",
              "for Incidents After the Testing Window Where Offender Was",
              "Arrested, Charged, or Convicted, By Crime Category.",
              "Each panel shows a densityplot above a strip plot with an",
              "overlaid boxplot.",
              "The strip plots show a dot for each incident, with positions",
              "jittered to reduce overlap.")

# Graph after period frequency distributions.
INCEWA %>%
  # Keep only the main 12 crime categories (drops the broader crime categories).
  filter(!is.na(Crime)) %>%
  select(OID, IID, Crime, IYearsAfterTW) %>% 
  # use y = -... to position boxplot and jitterplot below the histogram
  ggplot(data = ., aes(x = IYearsAfterTW, y = -.4)) +
  # Add density plot.
  stat_halfeye(slab_fill = "black", width = .6, justification = -.42, 
               .width = 0, point_colour = "NA") +
  # Add a jittered strip plot. 
  geom_point(aes(), position = position_jitter(width = 0, height = .25),
             size = 1, color = "gray50") +
  # Overlay a boxplot on the strip plot from ggstance
  geom_boxplot(lwd = 1, width = .20, outlier.shape = NA) +
  # Merged those calls to one display
  guides(scale = "none") +
  # Split each crime category into a separate panel (facet)
  facet_wrap(~Crime, nrow = 6, strip.position = "top") +
  xlab("Years to Incident Date") +
  ylab("Density") +
  # Remove y axis title label from the y axis
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), 
        axis.ticks.y = element_blank()) +
  # Format the y axis ticks. 
  scale_x_continuous(limits = c(-.5, 30.00), 
                     breaks = c(0, 5, 10, 15, 20, 25, 30), 
                     labels = c(0, 5, 10, 15, 20, 25, 30)) ->
  Figure_Crime_Years

Figure_Crime_Years

Figure \ref{fig:plot_IYearsAfterTW_Crime_ecdf} shows the cumulative distribution functions for the number of years elapsed between the end of the testing window and each incident that occurred after the window ended, broken down by crime category. This is useful because the number of preventable incidents depends on when after the end of the testing window the offender reference sample gets uploaded to CODIS. This figure is an alternative way to visualize the same data shown in Figure \ref{fig:plot_IYearsAfterTW_Crime} as a cumulative density up to a given number of years rather than the point density at each time point. The medians (Q50) shown in Table \ref{tab:describe_IYearsAfterTW} are number of years on the horizontal axis of \ref{fig:plot_IYearsAfterTW_Crime_ecdf} where the line rises to the 50% mark on the vertical axis. Similarly, the 25th and 75th quantiles (Q25 and Q75) are where the line rises to the 25% and 75% marks on the vertical axis.

FCap <- paste("\\label{fig:plot_IYearsAfterTW_Crime_ecdf}",
              "Cumulative Distribution Function for Years to Incident Date",
              "for Incidents After the Testing Window Where Offender Was",
              "Arrested, Charged, or Convicted, By Crime Category.",
              "Each panel shows a cumulative density function plot above a ",
              "strip plot with an overlaid boxplot.",
              "The strip plots show a dot for each incident, with positions",
              "jittered to reduce overlap.")

# Graph after period frequency distributions.
INCEWA %>%
  # Keep only the main 12 crime categories (drops the broader crime categories).
  filter(!is.na(Crime)) %>%
  select(OID, IID, Crime, IYearsAfterTW) %>% 
  # use y = -... to position boxplot and jitterplot below the histogram
  ggplot(data = ., aes(x = IYearsAfterTW, y = -.4)) +
  # Add empirical cumulative distribution function plot.
  stat_ecdf(geom = "step", pad = FALSE) +
  # Add a jittered strip plot. 
  geom_point(aes(), position = position_jitter(width = 0, height = .25),
             size = 1, color = "gray50") +
  # Overlay a boxplot on the strip plot from ggstance
  geom_boxplot(lwd = 1, width = .20, outlier.shape = NA) +
  # Merged those calls to one display
  guides(scale = "none") +
  # Split each crime category into a separate panel (facet)
  facet_wrap(~Crime, nrow = 6, strip.position = "top") +
  xlab("Years to Incident Date") +
  ylab("Cumulative Density") +
    # Format the y axis ticks. 
  scale_y_continuous(limits = c(-.65, 1.00), 
                     breaks = c(0, .5, 1.0), labels = c("0%", "50%", "100%")) +
  scale_x_continuous(limits = c(-.5, 30.00), 
                     breaks = c(0, 5, 10, 15, 20, 25, 30), 
                     labels = c(0, 5, 10, 15, 20, 25, 30))

Figure \ref{fig:plot_IYearsAfterTW_BCrime} shows the distributions for the number of years elapsed between the end of the testing window and each incident that occurred after the window ended broken down by broader crime category.

FCap <- paste("\\label{fig:plot_IYearsAfterTW_BCrime}",
              "Distributions for Years to Incident Date",
              "for Incidents After the Testing Window Where Offender Was",
              "Arrested, Charged, or Convicted, By Broader Crime Category.",
              "Each panel shows a densityplot above a strip plot with an",
              "overlaid boxplot.",
              "The strip plots show a dot for each incident, with positions",
              "jittered to reduce overlap.")

# Graph after period frequency distributions.
INCEWA %>%
  # Keep only the 4 broader crime categories (drops 12 main crime categories).
  filter(!is.na(BCrime)) %>%
  select(OID, IID, BCrime, IYearsAfterTW) %>% 
  # use y = -... to position boxplot and jitterplot below the histogram
  ggplot(data = ., aes(x = IYearsAfterTW, y = -.4)) +
  # Add density plot.
  stat_halfeye(slab_fill = "black", width = .6, justification = -.42, 
               .width = 0, point_colour = "NA") +
  # Add a jittered strip plot. 
  geom_point(aes(), position = position_jitter(width = 0, height = .25),
             size = 1, color = "gray50") +
  # Overlay a boxplot on the strip plot from ggstance
  geom_boxplot(lwd = 1, width = .20, outlier.shape = NA) +
  # Merged those calls to one display
  guides(scale = "none") +
  # Split each crime category into a separate panel (facet)
  facet_wrap(~BCrime, nrow = 4, strip.position = "top") +
  xlab("Years to Incident Date") +
  ylab("Density") +
  # Remove y axis title label from the y axis
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), 
        axis.ticks.y = element_blank()) +
  # Format the y axis ticks. 
  scale_x_continuous(limits = c(-.5, 30.00), 
                     breaks = c(0, 5, 10, 15, 20, 25, 30), 
                     labels = c(0, 5, 10, 15, 20, 25, 30)) ->
  Figure_BCrime_Years

Figure_BCrime_Years

Figure \ref{fig:plot_IYearsAfterTW_Crime_ecdf} was the original version of a
figure we submitted in a revision to our manuscript. Reviewers found it confusing, so we created Figure \ref{fig:plot_IYearsAfterTW_Crime_ecdf2} as a replacement when resubmitting the manuscript. The cumulative density at any given number of years after the end of the testing window is the proportion of incidents that had already occurred by that time. Switching to a focus on the proportion remaining instead more closely aligns with the number of potentially preventable incidents. We also improved the labeling of the axes and the figure caption to address reviewer comments and make the figure easier to understand.

Subsequent peer review feedback caused us to eventually remove the figure even though we believe is is useful in communicating the results.

FCap <- paste("\\label{fig:plot_IYearsAfterTW_Crime_ecdf2}",
              "Percentage of Incidents Remaining As a Function of Years From",
              "End of Testing Window to Incident Date, By Crime Category.",
              "Only incidents occurring after the testing window where",
              "the offender was arrested for, charged with, or convicted of at", 
              "least one offense from the relevant crime category were",
              "included.",
              "The lines show how the percentage of incidents remaining", 
              "declines over time (they are the complements of the cumulative",
              "density functions).", 
              "The strip plots below the lines show a dot for when each",
              "incident occurred: these are the data being summarized.",
              "Dot positions were jittered to reduce overlap.",
              "The overlaid boxplots provide an additional summary that ",
              "highlight when 75%, 50%, and 25% of the incidents remain.",
              "Bold x-axis values (at 1, 2, 5, and 10) are time points we", 
              "examined to quantify implications of hypothetical delayed CODIS",
              "matches for how many crimes might still be preventable.")

x.ticks <- c(0, 1, 2, 5, 10, 15, 20, 25, 30)
x.face  <- c("0", expression(bold("1")), expression(bold("2")), 
             expression(bold("5")), expression(bold("10")), "15", "20", "25", 
             "30")

# Graph after period frequency distributions.
INCEWA %>%
  # Keep only the main 12 crime categories (drops the broader crime categories).
  filter(!is.na(Crime)) %>%
  select(OID, IID, Crime, IYearsAfterTW) %>% 
  # use y = -... to position boxplot and jitterplot below the histogram
  ggplot(data = ., aes(x = IYearsAfterTW, y = -.4)) +
  # Add the complement of the empirical cumulative distribution function plot.
  stat_ecdf(mapping = aes(y = 1 - ..y..), geom = "step", pad = FALSE) +
  # Add a jittered strip plot. 
  geom_point(aes(), position = position_jitter(width = 0, height = .25),
             size = 1, color = "gray50") +
  # Overlay a boxplot on the strip plot from ggstance
  geom_boxplot(lwd = 1, width = .20, outlier.shape = NA) +
  # Merged those calls to one display
  guides(scale = "none") +
  # Split each crime category into a separate panel (facet)
  facet_wrap(~Crime, nrow = 6, strip.position = "top") +
  xlab("Years From End of Testing Window to Incident Date") +
  ylab("Percentage of Incidents Remaining") +
    # Format the y axis ticks. 
  scale_y_continuous(limits = c(-.65, 1.00), 
                     breaks = c(0, .5, 1.0), labels = c("0%", "50%", "100%")) +
  scale_x_continuous(limits = c(-.5, 30.00), 
                     breaks = x.ticks, labels = x.face) -> 
  #theme(axis.text.x = element_text(face=x.face)) ->
  Figure_Crime_Years

Figure_Crime_Years

\FloatBarrier

Number of Potentially Preventable Incidents by Crime Category

Now we extract the empirical cumulative distribution functions data underlying Figure \ref{fig:plot_IYearsAfterTW_Crime_ecdf} and use them to prepare Table \ref{tab:table_IYearsAfterTW_ecdf}. A reviewer noted that uploading an offender DNA profile from a SAK to CODIS can only produce immediate identification of the offender if the corresponding offender reference sample was already present in CODIS beforehand. However, that may not always happen. The offender reference sample that allows identifying the offender associated with SAK may not arrive in CODIS until later. That means there could be a delay between uploading the SAK forensic testing results to CODIS as a forensic sample and getting a match to an offender. Incidents occurring between the end of the testing window and the arrival of a matching offender reference sample cannot be prevented through timely response to CODIS search results that are not yet available. Only crimes occurring after a match is available could be prevented that way.

It would be ideal if we had the CODIS upload date for the reference sample that matched to each SAK. That would allow us to examine and describe the distribution of the delay between uploading the SAK sample and receiving an offender match. There would be no delay ($\le 0$ years) when the offender reference sample was already present in CODIS, but it could be anywhere from 1 day to many years before one is available. Knowing more about that distribution would allow us to find the subset of potentially preventable incidents that occurred both after the testing window ended and after the matching reference sample was uploaded to CODIS.

Unfortunately, we do not have access to those offender reference sample upload dates. Therefore, we examine the implications of assuming a fixed delay in when offender reference samples arrive in CODIS relative to the end of the testing window. We do that by using the cumulative distribution functions graphed in Figure \ref{fig:plot_IYearsAfterTW_Crime_ecdf} to compute how many incidents occurred 1, 2, 5, and 10 years after the end of the testing window. This makes the crude assumption that all SAKs experience the same delay.

# Table caption. 
TCap <- paste("Number and Percentage of Incidents Remaining Given Hypothetical",
              "Delays Between End of Testing Window and Matching to an ",
              "Offender Reference Sample")
FN2 <- paste("Only offenders with valid testing window start dates are",
             "included.",
             "Only incidents that occurred after the testing window ended and",
             "where the offender was arrested for, charged with, or convicted",
             "of at least one offense from the relevant crime category were",
             "included.",
             "The bottom four italicized broader categories include offenders",
             "who were arrested, charged or convicted for any of the more", 
             "specific crime categories nested below them, while the Any",
             "Crimes category includes incidents associated with at least one",
             "of the 12 specific crime categories. ",
             "Incidents associated with more than one kind of crime are",
             "included in all applicable specific categories, but only once per",
             "applicable broader category. For example, an incident with both a",
             "homicide and a robbery only contributes one observation to the",
             "years summary for the broader violent crimes category.",
             "N, total number of incidents after the end of testing window;", 
             "n, number of incidents remaining after specified delay." )

INCEWA %>% 
  filter(!is.na(VLabel)) %>%
  mutate(Category = factor(VLabel, levels = VLLevels, labels = VLLevels)) %>%
  select(Category, IYearsAfterTW) %>%
  arrange(Category, IYearsAfterTW) %>% 
  group_by(Category) %>% 
  summarize(N = n(),
            # If CODIS reference sample uploaded 1 year after SAK
            PRemain_1 = 1 - ecdf(IYearsAfterTW)(1),
            NRemain_1 = round(N*PRemain_1, digits = 0),
            # If CODIS reference sample uploaded 2 years after SAK
            PRemain_2 = 1 - ecdf(IYearsAfterTW)(2),
            NRemain_2 = round(N*PRemain_2, digits = 0),
            # If CODIS reference sample uploaded 5 years after SAK
            PRemain_5 = 1 - ecdf(IYearsAfterTW)(5),
            NRemain_5 = round(N*PRemain_5, digits = 0),
            # If CODIS reference sample uploaded 10 years after SAK
            PRemain_10 = 1 - ecdf(IYearsAfterTW)(10),
            NRemain_10 = round(N*PRemain_10, digits = 0)) %>% 
  # Rescale proportions to percentages
  mutate(PRemain_1 = PRemain_1*100,
         PRemain_2 = PRemain_2*100,
         PRemain_5 = PRemain_5*100,
         PRemain_10 = PRemain_10*100) %>% 
  select(Category, N, NRemain_1, PRemain_1, NRemain_2, PRemain_2,
         NRemain_5, PRemain_5, NRemain_10, PRemain_10) %>% 
  kable(format = "latex", digits = 2, booktabs = TRUE, caption = TCap, 
        format.args = list(big.mark = ','), linesep = "", 
        col.names = c("Category", "N", rep(c("n", "%"), times = 4))) %>% 
  kable_styling() %>% 
  add_header_above(., c(" ", " ", "1 Year" = 2, "2 Years" = 2, 
                        "5 Years" = 2, "10 Years" = 2)) %>% 
  add_header_above(., c(" ", " ",
                        "Delay in Matching to Offender Reference Sample" = 8)) %>% 
  row_spec(., row = c(1, 2, 5, 11, 15), italic = TRUE) %>% 
  pack_rows(" ", start_row = 1, end_row = 1) %>% 
  pack_rows(" ", start_row = 2, end_row = 4) %>% 
  pack_rows(" ", start_row = 5, end_row = 10) %>% 
  pack_rows(" ", start_row = 11, end_row = 14) %>% 
  pack_rows(" ", start_row = 15, end_row = 17) %>% 
  add_indent(., positions = c(3:4), level_of_indent = 1) %>%
  add_indent(., positions = c(6:10), level_of_indent = 1) %>%
  add_indent(., positions = c(12:14), level_of_indent = 1) %>%
  add_indent(., positions = c(16:17), level_of_indent = 1) %>%
  column_spec(column = 1, width = "6.8 cm") %>% 
  footnote(general = FN2, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Export Figures for Manuscript

The American Psychological Association's general instructions regarding manuscript preparation refer authors to KnowledgeWorks Global Digital Art Support General Guidelines for guidance on preparing figures. We are complying with those guidelines for preparing the following Figure files that we export.

We can choose between submitting TIFF versus EPS file. Because EPS files are orders of magnitude smaller, we choose those for submission. We also export PNG files for convenience (they are easier to view without specialized software).

We wrote the code below to export a version of Figure \ref{fig:plot_IYearsAfterTW_Crime_ecdf2}, but have subsequently disabled the code chunk below because the final manuscript will no longer contain the figure.

ggsave(filename = here::here("inst/Figure_Crime_Years.eps"), 
       plot = Figure_Crime_Years, device = "eps", dpi = 600, 
       height = 6.0, width = 7.2, units = "in")
ggsave(filename = here::here("inst/Figure_Crime_Years.png"), 
       plot = Figure_Crime_Years, device = "png", dpi = 600, 
       height = 6.0, width = 7.2, units = "in")

\FloatBarrier

Arrest Offense Record Summaries

All summaries in this section directly summarize the ARR data without first aggregating to the incident or offender level. These outputs are not directly used in the manuscript.

\FloatBarrier

Crime Category (12 levels)

FN <- paste("Only arrest records for offenders with valid testing window start",
            "dates and with offenses from the 12 main crime categories were",
            "included.")

kable(freq(as_factor(ARREW$ACat12), user.missing = "Excluded user-missing"), 
      format = "latex", booktabs = TRUE, digits = 2, 
      format.args = list(big.mark = ','),
      caption = "Number of Arrest Offense Records by Crime Category (12 levels)") %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Age at Arrest for Criminal Sexual Conduct

# Table caption. 
TCap <- paste("Offender Age at Arrest for Criminal Sexual Conduct (Overall)")
# Footnote text. 
FN <- paste("This is an arrest offense record-level summary (offenders with",
            "multiple arrests contribute one age value per arrest record.",
            "Only arrest records for offenders with valid testing window start",
            "dates and where the arrest was for criminal sexual conduct are",
            "included.")

ARREW %>% 
  filter(ACat12_5 == 1) %>% 
  select(OAgeA, ALag) %>% 
  describe(., quant=c(.25, .50, .75)) %>% 
  filter(vars == 1) %>% 
  bind_cols(data.frame(Variable = "Age at arrest"), .) %>% 
  select(Variable, n, mean, sd, min, max, skew, kurtosis, Q0.25, Q0.5, Q0.75) %>% 
  kable(., format = "latex", booktabs = TRUE, digits = 2, row.names = FALSE, 
        col.names = CNames, caption = TCap) %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Prosecutor Charge Record Summaries

This section directly summarizes the CHG data without first aggregating to the incident or offender level. These outputs are not directly used in the manuscript.

\FloatBarrier

FN <- paste("Only charge records for offenders with valid testing window start",
            "dates and with charges from the 12 main crime categories were",
            "included.")

kable(freq(as_factor(CHGEW$CCat12), user.missing = "Excluded user-missing"), 
      format = "latex", booktabs = TRUE, digits = 2, 
      format.args = list(big.mark = ','),
      caption = "Number of Prosecutor Charge Records by Crime Category (12 levels)") %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

\FloatBarrier

Judicial Charge Record Summaries

All summaries in this section directly summarize the JUD data without first aggregating to the incident or offender level. While some of these records are convictions, others will have other dispositions. These outputs are not directly used in the manuscript.

\FloatBarrier

Crime Category (12 levels, all dispositions)

FN <- paste("Only records for offenders with valid testing window start dates",
            "and with charges from one of the 12 main crime categories were",
            "included.")

kable(freq(as_factor(JUDEW$JCat12), user.missing = "Excluded user-missing"), 
      format = "latex", booktabs = TRUE, digits = 2, 
      format.args = list(big.mark = ','),
      caption = "Number of Adjudicated Charge Records (All Dispositions) by Crime Category (12 levels)")

\FloatBarrier

Crime Category (12 levels, convictions)

FN <- paste("Only conviction records for offenders with valid testing window",
            "start dates and with charges from one of the 12 main crime",
            "categories were included.")

kable(freq(as_factor(CONEW$JCat12), user.missing = "Excluded user-missing"), 
      format = "latex", booktabs = TRUE, digits = 2, 
      format.args = list(big.mark = ','),
      caption = "Number of Adjudicated Charge Records (Convictions Only) by Crime Category (12 levels)") %>% 
  footnote(general = FN, general_title = "Note: ", footnote_as_chunk = TRUE,
         threeparttable = TRUE)

Wrap Up

\FloatBarrier

Project Information

These materials are scholarly products based on research funded by the following grant.

Campbell, R., Pierce, S. J., & Sharma, D. (2015–2018). Serial sexual assaults: A longitudinal examination of offending patterns using DNA evidence. (NIJ Award # 2014-NE-BX-0006) [Grant]. National Institute of Justice.

\FloatBarrier

References

Campbell, R. (2019). Serial sexual assaults: A longitudinal examination of offending patterns using DNA evidence, Detroit, Michigan, 2009 [Data files, codebooks, computer programs, and statistical output]. ICPSR37134-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2019-02-28. Retrieved from: https://doi.org/10.3886/ICPSR37134.v1

\FloatBarrier

Software Information

We use R Markdown to enhance reproducibility. Knitting the source R Markdown script r knitr:::current_input() generates this PDF file containing explanatory text, R code, plus R output (text and graphics).

This document was generated using the following computational environment and dependencies:

``` {r show_citations}

Check and report whether we used TinyTex or other LaTeX software.

which_latex()

Get R and R package version numbers in use.

devtools::session_info()

The current Git commit details and status are:

```r
git_report()


sjpierce/SSACHR documentation built on Jan. 16, 2022, 12:39 a.m.