This package and corresponding GitHub repository are intended to enhance the reproducibility of a research paper (Campbell, Pierce, Goodman-Williams, & Feeney, 2022) by serving as a research compendium (Marwick, Boettiger, & Mullen, 2018). The principal investigator for the study is Dr. Rebecca Campbell (Professor, Department of Psychology, Michigan State University). The paper presents descriptive data about the criminal histories of suspected serial sexual offenders using data from both criminal history records and forensic DNA testing of sexual assault kits (SAKs).
The SSACHR package is available from a public repository on GitHub at https://github.com/sjpierce/SSACHR. It was made public after the manuscript was accepted for publication.
Before installing SSACHR, make sure you have:
If you use Git (Torvalds et al., 2022) and have a GitHub account, either clone or fork and clone the package to your computer using the usual Git commands (Bryan, 2018; Bryan et al., 2019, Chapters 28 and 32). Otherwise, manually download a ZIP file and unzip its contents to a folder. Either way, the files should end up in a folder called SSACHR on your computer. That folder is your local copy of the repository.
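For readers who prefer to stay inside R, the cloning step could be sketched with the git2r package, which is among the dependencies listed in this README. This is only an illustration under stated assumptions: the helper name clone_ssachr() is hypothetical and is not part of the SSACHR package, and the URL is the repository address given above.

```r
# Hypothetical helper (not part of SSACHR): clone the repository into a
# folder named SSACHR inside parent_dir.
clone_ssachr <- function(parent_dir = getwd()) {
  dest <- file.path(parent_dir, "SSACHR")
  # Refuse to write into an existing folder to avoid mixing old and new files.
  if (dir.exists(dest)) {
    stop("Folder already exists: ", dest)
  }
  # git2r is assumed to be installed; stop with a clear message if it is not.
  if (!nzchar(system.file(package = "git2r"))) {
    stop("Please install the git2r package first.")
  }
  git2r::clone(url = "https://github.com/sjpierce/SSACHR", local_path = dest)
  invisible(dest)
}

# Example call (not run): clone_ssachr("path/to/parent/folder")
```

The helper refuses to overwrite an existing SSACHR folder, so a stale copy has to be removed or renamed by hand first.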
Note that the package code uses relative rather than absolute folder path and file name references. Moving or renaming subfolders and/or files may cause problems. We have tested it only with the folder structure and file naming used in the primary repository on GitHub.
The structure for the package is shown in the outline below, where folder names and file names are highlighted like this and comments are in normal text.
The folder structure is largely determined by the conventions governing the structure of R packages. It deviates a bit from the example research compendium folder structures discussed by Marwick et al. (2018). The repository is also set up as an RStudio project.
SSACHR: This is the root folder for the repository.
data: This folder is where the data file produced by the script Step_01_Data_Mgt.Rmd will be stored. This is a standard folder for R package structures.
CHR_Data.RData: This is the data file produced by the script Step_01_Data_Mgt.Rmd after it imports the external data files. It is not distributed with the repository (see the Obtaining Data Files section).
Read_This_Note.txt: This text file is just present to ensure that the data subfolder will be created when you clone the repository or extract files from a ZIP file copy of the repository obtained from GitHub.
inst: This folder is where you can find the key files you will need to use if you want to re-run our analyses on your own computer. The unintuitive name for this folder is a result of R package building conventions (it is where you put files that should be installed with the package).
extdata: This subfolder is where you will need to put the SPSS data files mentioned in the Obtaining Data Files section below.
Read_This_Note.txt: This text file is just present to remind you that the extdata subfolder is where you should put the SPSS data files after you get them.
TWindow_Scenarios.csv: This file contains hypothetical data used by Step_01_Data_Mgt.Rmd to generate a figure.
compact-title.tex: This LaTeX file is used when you knit .Rmd files into PDF files. It helps control title section formatting.
Development_Tools.R: This file just contains R code reminders that I use while developing packages.
Step_01_Data_Mgt.pdf: This is the default output filename produced by knitting Step_01_Data_Mgt.Rmd. It is not distributed with the package because you would just generate it on your own computer to check reproducibility.
Step_01_Data_Mgt_Published.pdf: This is a copy of the final output I produced on my computer when preparing to release the package to the public. We relied on it to create our manuscript. It is distributed with the package because you may want to compare it to the results you get on your own computer by knitting Step_01_Data_Mgt.Rmd.
Step_01_Data_Mgt.Rmd: Knitting this file requires that you have already obtained the data files mentioned in the Obtaining Data Files section below. It performs initial data management steps to prepare the data for use in other scripts.
Step_02_Analysis.pdf: This is the default output filename produced by knitting Step_02_Analysis.Rmd. It is not distributed with the package because you would just generate it on your own computer to check reproducibility.
Step_02_Analysis_Published.pdf: This is a copy of the final output I produced on my computer when preparing to release the package to the public. We relied on it to create our manuscript. It is distributed with the package because you may want to compare it to the results you get on your own computer by knitting Step_02_Analysis.Rmd.
Step_02_Analysis.Rmd: This file should be knitted after you knit Step_01_Data_Mgt.Rmd because it depends on a data file created by that script. It will produce a PDF file called Step_02_Analysis.pdf.
R_Citations.pdf: This is the default output filename produced by knitting R_Citations.Rmd. It is not distributed with the package because you would just generate it on your own computer to check reproducibility.
R_Citations_Published.pdf: This is a copy of the final output I produced on my computer when preparing to release the package to the public. It describes the software environment I used to generate the files upon which our published manuscript was based. You can compare it to the environment on your computer. Maximum reproducibility should occur when you are using the environment described in this document.
R_Citations.Rmd: This file generates details about the R packages used by Step_01_Data_Mgt.Rmd and Step_02_Analysis.Rmd. I recommend knitting it after you knit those two files. When you knit it, you will get an output file called R_Citations.pdf showing the citations and versions for what is installed and in use on your computer.
man: This folder contains R help files (*.Rd) for the package and its custom functions. It is required by R package building conventions.
R: This folder contains the source code for the package's custom functions in a set of *.R script files. It is required by R package building conventions.
.gitignore: This file tells Git what files to ignore and omit from synchronizing with the main repository on GitHub.
.Rbuildignore: This file tells R what files to ignore when building the package from the source code.
DESCRIPTION: This file is a brief, structured description of the package that is required by R package building conventions.
LICENSE: This file contains the terms of the CC-BY-SA-4.0 license that applies to all non-source code content in this repository.
LICENSE.md: This file contains the terms of the GPL3 software license that apply to the source code in this repository.
LICENSE.note: This file contains a note explaining why there are multiple licenses and specifying which repository/package content falls under each license.
NAMESPACE: This file is created automatically by R when building the package. You should not edit it manually. It is required by R package building conventions.
NEWS.md: This file contains a list of comments about the changes made with each version of this package. It is required by R package building conventions.
README.md: This file is obtained by knitting the README.Rmd file and is used by GitHub to display information about the package. Do not edit it manually. In RStudio, you can read the formatted version by opening the file and clicking the Preview button.
README.Rmd: This file gives an introduction to the package. Knitting it produces the README.md file and opens the preview automatically.
SSACHR.Rproj: This is an RStudio project file. It contains some settings for working with the project in that software.
Parts of the SSACHR package rely on custom functions defined in the package repository's SSACHR/R subfolder. The easiest way to use them is to install the package to your personal R package library. Downloading or cloning the repository files to your computer does not install that package into your R package library. It just creates your local copy of the repository files.
To install the package to your R package library, you have to either build and install the package from those local files, or install it directly from GitHub.
The following code will install SSACHR directly from the GitHub repository into your personal package library by using the install_github() function from devtools.
devtools::install_github("sjpierce/SSACHR", ref = "main")
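The other route mentioned above, building and installing from your local copy of the repository, might look like the sketch below. It assumes devtools is installed and that its install() function is pointed at the folder containing the package's DESCRIPTION file. The helper name install_ssachr_local() is hypothetical and is not part of the SSACHR package.

```r
# Hypothetical helper (not part of SSACHR): install the package from a local
# copy of the repository instead of from GitHub.
install_ssachr_local <- function(repo_dir) {
  # An R package source folder has a DESCRIPTION file at its root.
  if (!file.exists(file.path(repo_dir, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in: ", repo_dir)
  }
  # devtools is assumed to be installed; stop with a clear message if not.
  if (!nzchar(system.file(package = "devtools"))) {
    stop("Please install devtools first: install.packages(\"devtools\")")
  }
  # Build and install the package from the local source files.
  devtools::install(pkg = repo_dir)
}

# Example call (not run): install_ssachr_local("path/to/SSACHR")
```

Checking for DESCRIPTION first gives a clearer error than devtools would if the path points at the wrong folder, for example the parent of SSACHR.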
Scripts in this R package depend on having a number of other R packages installed. Those packages are available from CRAN and can be installed by running the following code in the R console.
# Note: utils ships with base R, so it does not need to be installed from CRAN.
install.packages(pkgs = c("assertthat", "car", "descr", "dplyr", "emmeans",
                          "geepack", "git2r", "ggdist", "ggplot2", "haven",
                          "here", "kableExtra", "knitr", "lattice",
                          "latticeExtra", "lubridate", "plyr", "psych",
                          "rmarkdown", "RColorBrewer", "sjlabelled", "texreg",
                          "tinytex", "tidyr", "vistime", "xfun"))
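If many of those packages are already in your library, you may prefer to install only the ones that are missing. The following is a minimal sketch of that idea using base R only. The helper name find_missing_pkgs() is hypothetical, and the short package vector is just a subset of the list above chosen for illustration.

```r
# Hypothetical helper (not part of SSACHR): return the names of packages that
# are not found in any library on the current library path.
find_missing_pkgs <- function(pkgs) {
  # system.file() returns "" when a package is not installed.
  installed <- vapply(pkgs,
                      function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# Illustrative subset of the dependency list shown above.
needed <- c("dplyr", "ggplot2", "haven", "here", "knitr", "rmarkdown", "xfun")
not_installed <- find_missing_pkgs(needed)
if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
}
# install.packages(not_installed)  # uncomment to install only what is missing
```

The check does not load any namespaces, so it has no side effects on the current R session.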
The data files required by this package were deposited into the National Archive of Criminal Justice Data (Campbell, 2019). Please visit the web page for that deposit at NACJD to download the files.
After you have downloaded the data from NACJD, unzip all the SPSS data files (*.sav) into the SSACHR/inst/extdata subfolder of your local repository. That should allow you to reproduce the analyses by re-running our scripts.
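Before knitting anything, it can help to confirm that the SPSS files actually landed in the right subfolder. The sketch below does that with base R. The helper name list_sav_files() is hypothetical, and no particular file names are assumed because they come from the NACJD deposit.

```r
# Hypothetical helper (not part of SSACHR): list the SPSS data files found in
# the inst/extdata subfolder of a local copy of the repository.
list_sav_files <- function(repo_dir) {
  extdata <- file.path(repo_dir, "inst", "extdata")
  if (!dir.exists(extdata)) {
    stop("Folder not found: ", extdata)
  }
  # Match the .sav extension regardless of letter case.
  list.files(extdata, pattern = "\\.sav$", ignore.case = TRUE)
}

# Example call (not run): list_sav_files("path/to/SSACHR")
```

An empty result means the files were unzipped somewhere else, possibly into a nested folder created by the ZIP tool.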
Once you have completed this step and all the others listed above, you should be ready to use this package to reproduce our results.
After it has been installed to your package library as described above, you can load SSACHR via the following R console command. That provides access to the custom R functions we have included in the package.
library(SSACHR)
You can see information about the package by using the following command in the R console. The resulting help page has an Index link at the bottom that will show you a list of all the custom functions in the package.
?SSACHR
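As an alternative to the help page index, the exported functions of an installed package can also be listed from the console. This is a small sketch using base R. The helper name list_exports() is hypothetical, and the call on SSACHR will only work after the package has been installed as described above.

```r
# Hypothetical helper (not part of SSACHR): list the objects that an installed
# package exports, in alphabetical order.
list_exports <- function(pkg) {
  if (!nzchar(system.file(package = pkg))) {
    stop("Package not installed: ", pkg)
  }
  sort(getNamespaceExports(pkg))
}

# Example call (not run unless SSACHR is installed): list_exports("SSACHR")
```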
If you are using RStudio Desktop, the easiest way to start reproducing our results is to navigate to the SSACHR folder containing the repository and open the project file SSACHR/SSACHR.Rproj.
Then you can open and knit the key scripts in the following order:
SSACHR/inst/Step_01_Data_Mgt.Rmd
SSACHR/inst/Step_02_Analysis.Rmd
SSACHR/inst/R_Citations.Rmd
To do that from the R console, the following code should work. Each call to Rscript_call() runs the listed input script in a fresh R session and writes a PDF output file to the specified name and folder inside the local repository. That will replace any prior version of the output by overwriting the file.
library(xfun)      # for Rscript_call()
library(here)      # for here()
library(rmarkdown) # for render()

Rscript_call(fun = render,
             args = list(input = here("inst/Step_01_Data_Mgt.Rmd"),
                         params = list(LogFile = "Step_01_Data_Mgt.pdf"),
                         output_file = "Step_01_Data_Mgt.pdf",
                         output_dir = here("inst")))

Rscript_call(fun = render,
             args = list(input = here("inst/Step_02_Analysis.Rmd"),
                         params = list(LogFile = "Step_02_Analysis.pdf"),
                         output_file = "Step_02_Analysis.pdf",
                         output_dir = here("inst")))

Rscript_call(fun = render,
             args = list(input = here("inst/R_Citations.Rmd"),
                         params = list(LogFile = "R_Citations.pdf"),
                         output_file = "R_Citations.pdf",
                         output_dir = here("inst")))
# This chunk can be used by people trying to reproduce our results without
# overwriting the "*_Published.pdf" versions of the output distributed with the
# repository.
library(xfun)      # for Rscript_call()
library(here)      # for here()
library(rmarkdown) # for render()

Rscript_call(fun = render,
             args = list(input = here("inst/Step_01_Data_Mgt.Rmd"),
                         output_file = "Step_01_Data_Mgt.pdf",
                         output_dir = here("inst")))

Rscript_call(fun = render,
             args = list(input = here("inst/Step_02_Analysis.Rmd"),
                         output_file = "Step_02_Analysis.pdf",
                         output_dir = here("inst")))

Rscript_call(fun = render,
             args = list(input = here("inst/R_Citations.Rmd"),
                         output_file = "R_Citations.pdf",
                         output_dir = here("inst")))
# This chunk is intended only for use by the package author!
# Running these commands will overwrite the "*_Published.pdf" versions of
# the output distributed with the repository.
library(xfun)      # for Rscript_call()
library(here)      # for here()
library(rmarkdown) # for render()

# Render each script in a fresh R session via xfun::Rscript_call().
Rscript_call(fun = render,
             args = list(input = here("inst/Step_01_Data_Mgt.Rmd"),
                         params = list(LogFile = "Step_01_Data_Mgt_Published.pdf"),
                         output_file = "Step_01_Data_Mgt_Published.pdf",
                         output_dir = here("inst")))

Rscript_call(fun = render,
             args = list(input = here("inst/Step_02_Analysis.Rmd"),
                         params = list(LogFile = "Step_02_Analysis_Published.pdf"),
                         output_file = "Step_02_Analysis_Published.pdf",
                         output_dir = here("inst")))

Rscript_call(fun = render,
             args = list(input = here("inst/R_Citations.Rmd"),
                         params = list(LogFile = "R_Citations_Published.pdf"),
                         output_file = "R_Citations_Published.pdf",
                         output_dir = here("inst")))
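Because the render calls above all follow the same pattern, they could also be driven by a short loop. The sketch below is one possible way to do that and is not code from the package: the helper names build_render_jobs() and knit_all() are hypothetical, and the LogFile parameter is left out here, as in the variant without params.

```r
# Hypothetical helper (not part of SSACHR): build one set of render()
# arguments per script, in the order the scripts should be knitted.
build_render_jobs <- function(inst_dir) {
  scripts <- c("Step_01_Data_Mgt", "Step_02_Analysis", "R_Citations")
  lapply(scripts, function(s) {
    list(input       = file.path(inst_dir, paste0(s, ".Rmd")),
         output_file = paste0(s, ".pdf"),
         output_dir  = inst_dir)
  })
}

# Hypothetical helper: knit each script in a fresh R session, in order.
# Assumes the xfun and rmarkdown packages are installed.
knit_all <- function(inst_dir) {
  for (job in build_render_jobs(inst_dir)) {
    message("Knitting ", basename(job$input))
    xfun::Rscript_call(fun = rmarkdown::render, args = job)
  }
  invisible(TRUE)
}

# Example call (not run): knit_all(here::here("inst"))
```

Keeping the job list separate from the rendering step makes it easy to inspect the inputs and output names before anything is overwritten.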
We use R Markdown to enhance reproducibility because it provides excellent support for generating dynamic reports (Mair, 2016). Knitting the source R Markdown script README.Rmd generates this Markdown file. Knitting our other R Markdown scripts from this package generates PDF output files containing explanatory text, R code, plus R output (both text and graphics).
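Since knitting to PDF depends on several tools working together, it may be useful to print what is available on your own computer before you start. The following sketch reports a few versions using functions from base R, rmarkdown, and tinytex. The helper name report_toolchain() is hypothetical, and each package is only queried if it is installed.

```r
# Hypothetical helper (not part of SSACHR): summarize the local tools that are
# involved when knitting an R Markdown file to PDF.
report_toolchain <- function() {
  has <- function(pkg) nzchar(system.file(package = pkg))
  ver <- function(pkg) {
    if (has(pkg)) as.character(utils::packageVersion(pkg)) else NA_character_
  }
  list(
    r_version = R.version.string,
    rmarkdown = ver("rmarkdown"),
    knitr     = ver("knitr"),
    # pandoc_version() is only queried when pandoc can be found.
    pandoc    = if (has("rmarkdown") && rmarkdown::pandoc_available()) {
      as.character(rmarkdown::pandoc_version())
    } else {
      NA_character_
    },
    # TRUE when the LaTeX distribution in use is TinyTeX; NA if the tinytex
    # package itself is not installed.
    tinytex   = if (has("tinytex")) tinytex::is_tinytex() else NA
  )
}

str(report_toolchain())
```

You can compare this summary with the environment described in R_Citations_Published.pdf.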
RStudio (the version returned by rstudioapi::versionInfo()$version when this file was knitted, or later) to work with R and R Markdown files. The software chain looks like this: Rmd file > RStudio > R > rmarkdown > knitr > md file > pandoc > tex file > TinyTeX > PDF file.
Pandoc (the version returned by rmarkdown::pandoc_version() when this file was knitted, or later) for this document. A version of pandoc comes bundled with RStudio, but if you want the most recent version, download it from https://pandoc.org/.
Bryan, J. (2018). Excuse me, do you have a moment to talk about version control? The American Statistician, 72(1), 20-27. doi:10.1080/00031305.2017.1399928
Bryan, J., The STAT 545 TAs, & Hester, J. (2019). Happy Git and GitHub for the useR. Retrieved from https://happygitwithr.com
Campbell, R. (2019). Serial sexual assaults: A longitudinal examination of offending patterns using DNA evidence, Detroit, Michigan, 2009 [Data files, codebooks, computer programs, and statistical output]. ICPSR37134-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2019-02-28. Retrieved from: https://doi.org/10.3886/ICPSR37134.v1
Campbell, R., Pierce, S. J., Goodman-Williams, R., & Feeney, H. (2022). A window of opportunity: Examining the potential impact of mandatory sexual assault kit (SAK) testing legislation on crime prevention [Manuscript accepted for publication]. Psychology, Public Policy, and Law.
Mair, P. (2016). Thou shalt be reproducible! A technology perspective. Frontiers in Psychology, 7(1079), 1-17. doi:10.3389/fpsyg.2016.01079
Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using R (and friends). The American Statistician, 72(1), 80-88. doi:10.1080/00031305.2017.1375986
Torvalds, L., Hamano, J. C., & other contributors to the Git Project. (2022). Git for Windows (Version 2.34.1) [Computer program]. Brooklyn, NY: Software Freedom Conservancy. Retrieved from https://git-scm.com
This R package and repository are based on research supported by the following grant.
Campbell, R., Pierce, S. J., & Sharma, D. (2015–2018). Serial sexual assaults: A longitudinal examination of offending patterns using DNA evidence. (NIJ Award # 2014-NE-BX-0006) [Grant]. National Institute of Justice.
This research was supported by a grant from the National Institute of Justice, United States (2014-NE-BX-0006). The opinions or points of view expressed in this document (or any other document included in this R package and repository) are solely those of the authors and do not reflect the official positions of any participating organization or the U.S. Department of Justice.
Please cite the package itself, plus the associated data files and the journal article.
Pierce, S. J. (2022). SSACHR: Serial sexual assault study criminal history records paper research compendium. (Version 1.0.0) [Reproducible research materials and computer program, R package]. GitHub and Zenodo. https://github.com/sjpierce/SSACHR and https://doi.org/10.5281/zenodo.5854874
Campbell, R. (2019). Serial sexual assaults: A longitudinal examination of offending patterns using DNA evidence, Detroit, Michigan, 2009 [Data files, codebooks, computer programs, and statistical output]. ICPSR37134-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2019-02-28. Retrieved from: https://doi.org/10.3886/ICPSR37134.v1
Campbell, R., Pierce, S. J., Goodman-Williams, R., & Feeney, H. (2022). A window of opportunity: Examining the potential impact of mandatory sexual assault kit (SAK) testing legislation on crime prevention [Manuscript accepted for publication]. Psychology, Public Policy, and Law.