The goal of rfars is to facilitate transportation safety analysis by
simplifying the process of extracting data from official crash
databases. The National Highway Traffic Safety Administration (NHTSA)
collects and publishes a census of fatal crashes in the Fatality
Analysis Reporting System (FARS) and a sample of fatal and non-fatal
crashes in the Crash Report Sampling System (CRSS), an evolution of the
General Estimates System (GES). The Fatality and Injury Reporting
System Tool lets users query these databases and produce simple tables
and graphs. This suffices for simple analysis but often leaves
researchers wanting more. Digging any deeper, however, involves a
time-consuming process of downloading annual ZIP files and attempting to
stitch them together, after first combing through immense data
dictionaries to determine the required variables and table names.
rfars allows users to download the last 10 years of FARS and GES/CRSS
data with just one line of code. The result is a full, rich dataset
ready for mapping, modeling, and other downstream analysis. Codebooks
with variable definitions and value labels support an informed analysis
of the data (see vignette("Searchable Codebooks", package = "rfars")
for more information). Helper functions are also provided to produce
common counts and comparisons.
You can install the latest version of rfars from GitHub with:
# install.packages("devtools")
devtools::install_github("s87jackson/rfars")
or the CRAN stable release with:
install.packages("rfars")
Then load rfars and some helpful packages:
library(rfars)
library(dplyr)
get_fars() and get_gescrss() are the primary functions of the rfars
package. They can download and process data files directly from NHTSA’s
FTP site, pull prepared data stored on your local machine, or (as of
Version 2.0) pull prepared data from Zenodo. The data files hosted on
Zenodo are stable, have DOIs, and replicate the data that get_fars() and
get_gescrss() would otherwise produce, but load in a fraction of the
time.
Both functions take the parameter years, plus states (FARS) or regions
(GES/CRSS). As the source data files follow an annual structure, years
determines how many file sets are downloaded or loaded, and
states/regions filters the resulting dataset. Downloading and
processing these files can take several minutes. Before downloading,
rfars tells you that it is about to download files and asks for your
permission to do so. To skip this dialog, set proceed = TRUE. You can
use the dir and cache parameters to save an RDS file to your local
machine. The dir parameter specifies the directory, and cache names
the file (be sure to include the .rds file extension).
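As a rough sketch of how these parameters combine, the call below requests two years of FARS data for one state and caches the result. The year range, the state, the directory, and the file name are illustrative choices, and passing a full state name to states is an assumption about the accepted format; check ?get_fars for the values your installed version supports.

```r
library(rfars)

# Two years of FARS data for a single state, saved locally as an RDS file.
# All argument values here are illustrative, not recommendations.
va_fars <- get_fars(
  years   = 2021:2022,     # one file set per year
  states  = "Virginia",    # assumed format: full state name
  proceed = TRUE,          # skip the download confirmation dialog
  dir     = getwd(),       # directory where the RDS file is written
  cache   = "va_fars.rds"  # file name, including the .rds extension
)
```

Requesting fewer years reduces how many file sets are processed, which is the main lever on run time.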
Executing the code below will download the prepared FARS and GES/CRSS databases for 2014-2023.
myFARS <- get_fars(proceed = TRUE)
myCRSS <- get_gescrss(proceed = TRUE)
get_fars() and get_gescrss() return a list with six dataframes:
flat, multi_acc, multi_veh, multi_per, events, and codebook.
The tables below show records for randomly selected crashes to illustrate the content and structure of the data. The tables are transposed for readability.
Each row in the flat dataframe corresponds to a person involved in a
crash. As there may be multiple people and/or vehicles involved in one
crash, some variable values are repeated within a crash or vehicle. Each
crash is uniquely identified with id, which is a combination of year
and st_case. Note that st_case is not unique across years: for
example, st_case 510001 will appear in each year. The id variable
is meant to avoid this ambiguity. The GES/CRSS data includes a weight
variable that indicates how many crashes each row represents.
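The point about identifiers and weights can be illustrated with toy data in base R. Everything below is made up for illustration: the tables are tiny stand-ins rather than rfars output, paste0() is only an assumed way of combining year and st_case, and the weighted total assumes the weight is constant within a crash.

```r
# Toy stand-in for a person-level flat table (all values are made up)
flat <- data.frame(
  year    = c(2021L, 2021L, 2022L, 2022L, 2022L),
  st_case = c(510001L, 510001L, 510001L, 510002L, 510002L),
  per_no  = c(1L, 2L, 1L, 1L, 2L)
)

# st_case alone conflates the 2021 and 2022 crashes numbered 510001
n_by_st_case <- length(unique(flat$st_case))  # 2

# Pairing year with st_case separates them; paste0() is an assumed
# construction used only to mimic an id column
flat$id <- paste0(flat$year, flat$st_case)
n_by_id <- length(unique(flat$id))            # 3

# Toy stand-in for weighted GES/CRSS rows: two people in crash "a",
# one person in crash "b" (weight assumed constant within a crash)
crss <- data.frame(
  id     = c("a", "a", "b"),
  weight = c(120.5, 120.5, 80)
)

# Keep one row per crash before summing, so crashes with several
# people are not counted more than once
crss_crashes     <- crss[!duplicated(crss$id), ]
weighted_crashes <- sum(crss_crashes$weight)  # 200.5
```

Counting rows instead of distinct id values would count people rather than crashes, which is why the de-duplication step comes first.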
The multi_ dataframes contain those variables for which there may be a
varying number of values for any entity (e.g., driver impairments,
vehicle events, weather conditions at time of crash). Each dataframe has
the requisite data elements corresponding to the entity: multi_acc
includes st_case and year, multi_veh adds veh_no (vehicle
number), and multi_per adds per_no (person number).
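Because each multi_ dataframe carries the keys for its entity, it can be joined back to person-level records on those keys. The sketch below uses made-up toy tables in base R; the name and value columns and the age column are hypothetical placeholders, not a statement of the actual rfars column layout.

```r
# Toy person-level records: two drivers in one crash (values are made up)
flat <- data.frame(
  year    = c(2022L, 2022L),
  st_case = c(510001L, 510001L),
  veh_no  = c(1L, 2L),
  per_no  = c(1L, 1L),
  age     = c(34L, 61L)
)

# Toy multi_per table: the first driver has two recorded values, the
# second has one. The name/value columns are hypothetical.
multi_per <- data.frame(
  year    = 2022L,
  st_case = 510001L,
  veh_no  = c(1L, 1L, 2L),
  per_no  = 1L,
  name    = "impairment",
  value   = c("Fatigued", "Ill", "None")
)

# Join on the full person key; a person with several values appears
# once per value in the result
person_keys <- c("year", "st_case", "veh_no", "per_no")
joined <- merge(flat, multi_per, by = person_keys)
nrow(joined)  # 3
```

The same pattern would apply at the vehicle level with year, st_case, and veh_no, or at the crash level with year and st_case.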
The events dataframe provides a sequence of events for each vehicle in
each crash. See vignette("Crash Sequence of Events", package = "rfars")
for more information.
The codebook dataframe provides a searchable codebook for the data,
useful if you know what concept you’re looking for but not the variable
that describes it. rfars also includes pre-loaded codebooks for FARS
and GES/CRSS (rfars::fars_codebook and rfars::gescrss_codebook). See
vignette("Searchable Codebooks", package = "rfars") for more
information.
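A keyword search is one simple way to go from a concept to a variable name. The sketch below runs against a made-up table in base R; the name and label columns and their contents are hypothetical, so inspect names() of the real codebook before adapting it.

```r
# Made-up stand-in for a codebook table; column names are hypothetical
codebook <- data.frame(
  name  = c("weather", "lgt_cond", "alc_res", "drinking"),
  label = c("Atmospheric conditions",
            "Light condition",
            "Alcohol test result",
            "Police-reported alcohol involvement"),
  stringsAsFactors = FALSE
)

# Case-insensitive keyword search over the descriptive labels
hits <- codebook[grepl("alcohol", codebook$label, ignore.case = TRUE), ]
hits$name
```

Here the search returns the two rows whose labels mention alcohol, regardless of capitalization.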
See vignette("Counts", package = "rfars") for information on the
pre-loaded annual_counts dataframe and the counts() and
compare_counts() functions. Also see
vignette("Alcohol Counts", package = "rfars") for details on how BAC
values are imputed and reported in Traffic Safety Facts.