setwd("../") source("R/fars_functions.R") library(tidyverse)
The fars
package serves to import and evaluate information from the US National Highway Traffic Safety Administration's Fatality Analysis Reporting System.
It allows the user to
The FARS data for the years 2013 through 2015 is included in the package and can be accessed
directly via system.file("extdata", "accident_2013.csv.bz2", package = "fars")
(change year in filename to access the other years or use make_filename()
to generate
the respective filename).
fars_read()
The fars_read()
function uses the read_csv()
function from
the readr
package to read in a data set and converts it to the
tbl_df
format known from the dplyr
and tibble
packages.
The function has been developed to read in FARS data but works with every data format
that is accepted by readr::read_csv()
. It gives a warning
if the specified file name does not exist.
The required input is a character string with the name of the file you want to read in.
make_filename()
The make_filename()
function generates standardised file names for FARS
data sets based on the year you pass to it as argument.
Its main raison d'ĂȘtre is its use inside the fars_read_years()
function (see below)
but it can also be used stand-alone.
fars_read_years()
The fars_read_years()
function is used to read in FARS data for a specified range of years.
It accepts a year or a vector of years as input and
returns a list of FARS data where every list entry contains the data
for one of the years specified in form of a tbl_df
. The data frame has one
line for every data point in the respective FARS data set and indicates the month
and the year for this data point.
fars_summarize_years()
The fars_summarize_years()
function is used to read in FARS data from a specified range of years
and return the monthly number of accident registered in FARS
as a tbl_df
with one column for every year analysed and
twelve rows, one for each month. The columns contain the sum of accidents reported to
FARS in each month of the given year.
The function internally uses the fars_read_years()
function to read in and
extract the data and then uses dplyr
and tidyr
functions to summarise and
reformat the data.
Like fars_read_years()
it reads the data from your working directory
and it will produce a warning if an invalid year is given as input. Please be aware that
the standard naming convention for FARS data has to be used (cf. make_filename
).
fars_map_state()
This function makes use of the geographic data contained in FARS accident data sets to plot
accidents on maps of US states.
Inputs are the year from which the data is to be plotted and the numeric code
of the US state to be analysed.
The function returns a plot with a graphical representation of the selected US state with the
accidents of the indicated year plotted as dots on the map according to their geographic
location. If the data set contains no accidents for the indicated year and state,
the message "no accidents to plot"
is printed to the console and if an invalid
state number is given as input, a warning will be displayed.
fars_map_state()
internally uses the make_filename
and fars_read
functions to read in the FARS data for the given year and than filters for the given state.
The geographic data is extracted and plotted using the maps
package.
library(knitr) opts_knit$set(root.dir = "../inst/extdata/")
Some examples of the functions in the package are given using the data for 2013, 2014 and 2015.
First, you need to install the package from GitHub and load it.
In the examples, we will also use additional functions from the tidyverse
package,
so we load this as well.
devtools::install_github('tdo15/fars') library(fars) library(tidyverse)
If we want to look at the data for 2013, we can first use make_filename()
function to
get the respective filename.
make_filename(2013)
We can read in the data using fars_read_years()
:
FARS_2013 <- fars_read_years(2013)
Now, we can do exploratory data analysis. For example, we can create a visualisation of the distribution of accidents per month:
distr_2013 <- FARS_2013[1] %>% as.data.frame %>% group_by(MONTH) %>% summarise(accidents = n()) ggplot(distr_2013, aes(x = as.factor(MONTH), y = accidents)) + geom_bar(stat = "identity") + labs(x = "month", y = "number of accidents")
This analysis step is already implemented in the fars_summarize_years()
function
(and gives the same result):
distr_2013_2 <- fars_summarize_years(2013) ggplot(distr_2013_2, aes(x = as.factor(MONTH), y = `2013`)) + geom_bar(stat = "identity") + labs(x = "month", y = "number of accidents")
We can also study several years in the same manner and e.g. have a look how accidents in december have changed over the years:
accidents_dec <- fars_summarize_years(2013:2015) %>% filter(MONTH == 12) plot(as.numeric(accidents_dec[-1]), type = "b", xlab = "Year", ylab = "no of accidents in December", xaxt = "n") axis(1, at=1:3, labels=2013:2015)
With the fars_map_state()
function, finally, we can create geographical visualisation
of accident data.
If we for example want to plot all reported accidents that happened in Nebraska in 2015,
we first need to find out the numeric state code of Nebraska,
e.g. at https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code.
Then, we only need to give this code and the year to the function:
fars_map_state(31,2015)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.