In DocOfi/FARSread-Package: Package to Read and Manipulate FARS Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Package FARSread

Package FARSread was designed to make it easy for R users to summarize and visualize the Fatality Analysis Reporting System or FARS data from the National Highway Traffic Safety Administration NHTSA. This document will familiarize the reader with the fars_summarize_years function and the The fars_map_state function from the package readFARS. it will also provide an example of how to download different years of FARS data from the NHTSA.

This is the second vignette for the package FARSread. The first one entitled "Introduction to FARSread" familiarizes the reader with the function fars_read which reads the FARS data from the NHSTA into R. If you haven't read it yet, we advise that you read it before proceeding with this second tutorial.

library(FARSread) ###load package FARSread so we can use the fars_summarize_years function and the the fars_map_state function

Data: fars2015

Fatality Analysis Reporting System or FARS contains data derived from a census of fatal motor vehicle traffic crashes within the 50 States, the District of Columbia, and Puerto Rico. FARS data is available for every year since FARS was established in 1975.

Interested parties may access data tables at NHTSA, download files via FTP at ftp://ftp.nhtsa.dot.gov/fars/, or make requests through the Publications and Data Requests page. Information requests can also be made by phone at 800-934-8517, or by e-mail at ncsaweb@dot.gov

Downloading file from NHTSA's ftp site

For this tutorial we will be comparing the FARS data for the years 2013, 2014, and 2015. We have obtained the URL for the 2013, 2014, and 2015 National FARS data from the NHSTA's website and we will use it to download the file from the NHSTA's ftp site.

First we'll create folders or directories to contain the zip file.

if(!file.exists("farsdata2013")){ ### create folder for 2013 zip file
        dir.create("farsdata2013")
}

if(!file.exists("farsdata2014")){ ### create folder for 2014 zip file
        dir.create("farsdata2014")
}

if(!file.exists("farsdata2015")){ ### create folder for 2015 zip file
        dir.create("farsdata2015")
}

Next, we will save the URL's for files in an R object.

fileurl2013 <- "ftp://ftp.nhtsa.dot.gov/fars/2013/National/FARS2013NationalCSV.zip" ### URL for 2013 data

fileurl2014 <- "ftp://ftp.nhtsa.dot.gov/fars/2014/National/FARS2014NationalCSV.zip" ### URL for 2014 data

fileurl2015 <- "ftp://ftp.nhtsa.dot.gov/fars/2015/National/FARS2015NationalCSV.zip" ### URL for 2015 data

Now we download the files for the year 2013 and save it to the appropriate folder. Using the unzip function in R, we extract the accident.csv file from each of the zip files. We then read the accident.csv file in R using the fars_read function from the FARSread package and write a new file in .csv.bz2 format in the working directory . We'll explain in a little while why we need to do this. The download may take a considerable amount of time.

download.file(fileurl2013, destfile="./farsdata2013/FARS2013NationalCSV.zip") ### download file and save in the folder farsdata2013

dateDownloaded2013 <- date() ### recording the date and time of our download
dateDownloaded2013

unzip("./farsdata2013/FARS2013NationalCSV.zip", exdir = "./farsdata2013", files = "accident.csv") ### unzip file and extract accident.csv file

acc_2013 <- fars_read("./farsdata2013/accident.csv") ### read file into R and save as an R object called acc_2013

write.csv(acc_2013, ### from the R object acc_2013 
          file = "accident_2013.csv.bz2", ### create accident_2013.csv.bz2 in the working directory
          row.names = FALSE) ### do not create a new column of rownames when you read in the file in R

unlink("./farsdata2013/accident.csv") ### delete accident.csv file

unlink("./farsdata2013/FARS2013NationalCSV.zip") ### delete FARS2013NationalCSV.zip file

Then for the 2014 FARS data...

download.file(fileurl2014, destfile="./farsdata2014/FARS2014NationalCSV.zip") ### download file and save in the folder farsdata2014

dateDownloaded2014 <- date() ### recording the date and time of our download
dateDownloaded2014

unzip("./farsdata2014/FARS2014NationalCSV.zip", exdir = "./farsdata2014", files = "accident.csv") ### unzip file and extract accident.csv file

acc_2014 <- fars_read("./farsdata2014/accident.csv") ### read file into R and save as an R object called acc_2014

write.csv(acc_2014, ### from the R object acc_2014 
          file = "accident_2014.csv.bz2", ### we create accident_2014.csv.bz2 in the working directory
          row.names = FALSE) ### do not create a new column of rownames when you read in the file in R

unlink("./farsdata2014/accident.csv") ### delete accident.csv file

unlink("./farsdata2014/FARS2014NationalCSV.zip") ### delete FARS2014NationalCSV.zip file

Then for the 2015 FARS data.

download.file(fileurl2015, destfile="./farsdata2015/FARS2015NationalCSV.zip") ### download file and save in the folder farsdata2015

dateDownloaded2015 <- date() ### recording the date and time of our download
dateDownloaded2015

unzip("./farsdata2015/FARS2015NationalCSV.zip", exdir = "./farsdata2015", files = "accident.csv") ### unzip file and extract accident.csv file

acc_2015 <- fars_read("./farsdata2015/accident.csv") ### read file into R and save as an R object called acc_2015

write.csv(acc_2015, ### from the R object acc_2015 
          file = "accident_2015.csv.bz2", ### we create accident_2015.csv.bz2 in the working directory
          row.names = FALSE) ### do not create a new column of rownames when you read in the file in R

unlink("./farsdata2015/accident.csv") ### delete accident.csv file

unlink("./farsdata2015/FARS2015NationalCSV.zip") ### delete FARS2015NationalCSV.zip file

At this point all the codes you have seen so far can be used interactively on the console pane of R studio. The codes for downloading data from NHTSA were presented here to give the reader the freedom to choose whichever year of the FARS data he/she wishes to analyze.

If you want to run the functions-readfars.Rmd file from this repository, you'll have to set the arguments eval = FALSE to eval = TRUE in all the code chunks prior to this note. It was set to FALSE because it took too long to run the vignettes everytime the package is tested during development.

Since the FARS data for 2013 to 2015 is included in this package, I will now show you how to access them from your directory. The data can be accessed for use using the system.file function in R.

Accessing the 2013 to 2015 FARS data

The function system.file is the key that will allow us to gain access to the 2013 to 2015 data within the package. Let's begin with the 2015 data.

### the following code reads access the accident_2015.csv.bz2 file in the folder extdata in package FARSread and save it as an R object named fars_2015
fars_2015 <- system.file("extdata", "accident_2015.csv.bz2", package = "FARSread")

### copy the file from its folder and save it to the working directory
file.copy(from = fars_2015, to = getwd())

Now, we are ready to use the fars_summarize_years function.

The fars_summarize_years function

The fars_summarize_years function is designed to create a table that will show the number of traffic accidents for every month in a single or a number of years, provided that you have the data from the NHSTA in your working directory in a .csv.bz2 format.

Unlike the fars_read function which reads into R a .csv file or a .csv.bz2 file, the fars_summarize_year function requires a .csv.bz2 file to work. A .csv.bz2 file occupy less space in your hard drive, which is the reason we saved the file in this format. The .csv file of the 2015 National FARS data occupies 5.17 MB of your hard drive while the .csv.bz2 file occupies only 876 KB. A whopping difference!!!

We are now ready to use the fars_summarize_years function It takes as input the year that the user wants to analyze. This is specially advantageous to new users in R who may not be familiar with manipulating data in order to arrive at the information they need. The FARS data has many valuables which provide a lot of information. But, if you're only interested in comparing the number of traffic accidents by month in a particular year or several years, then the 'fars_summarize_years` can save you a lot of time and effort.

most_months <- fars_summarize_years(2015) ### The first argument to this function is the year you want to analyze. You may input as many years as you wish provided that the data for that year is available in a .csv.bnz2 format in your working directory

The output of the fars_summarize_years function is a table providing a summary of the number of tracffic accidents for each month of 2015.

Notice that table above is the same table we created in the first tutorial but it only took us one line of code with one argument 2015 in order to arrive at the same output. Just as an exercise we will plot the result in the same way that we did in the previous tutorial.

library(dplyr) ###load package dplyr to access %>%  forward pipe operator
library(ggplot2) ### load package ggplot2 for plotting data

most_months <- most_months  %>% ### get the table we created called summ_months
        mutate(mon_abb = as.factor(month.abb)) %>% ### create a new variable of abbreviations for month names
        as.data.frame() ### save result as a data frame

most_months$mon_abb <- factor(most_months$mon_abb, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) ### assign factor levels to correctly plot the data

most_months

colnames(most_months) <- c("MONTH", "count", "mon_abb") ### change variable name `2015` to count

ggplot(data = most_months, aes(x = mon_abb, y = count, group = 1)) + ### select the data and variables we will plot
        geom_line(colour = "red", linetype = "dashed", size = 1.5) + ### plot data as a line
        geom_point(colour = "red", size = 4, shape = 21, fill = "green") + ### plot data as green round points as well
        xlab("Month") + ### add label to x-axis
        ggtitle("Number of Accidents Recorded per Month in 2015") ### add title

The plot shows the peak months for traffic accidents in 2015 were from July to October.

If we wanted to compare the months in the years: 2013, 2014, and 2015, we first have to access the FARS data for 2013 and 2014 within the package.

### the following code reads access the accident_2013.csv.bz2 file in the folder extdata in package FARSread and save it as an R object named fars_2015
fars_2013 <- system.file("extdata", "accident_2013.csv.bz2", package = "FARSread")

### copy the file from its folder and save it to the working directory
file.copy(from = fars_2013, to = getwd()) 

### the following code reads access the accident_2013.csv.bz2 file in the folder extdata in package FARSread and save it as an R object named fars_2015
fars_2014 <- system.file("extdata", "accident_2014.csv.bz2", package = "FARSread")

### copy the file from its folder and save it to the working directory
file.copy(from = fars_2014, to = getwd())

NOw, we are ready to summarize the FARS data from 2013 to 2015.

fars_summarize_years(c(2013, 2014, 2015)) ### The first argument to this function are the years 2013, 2014, and 2015. You may input as many years as you wish provided that the data for that year is available in a .csv.bnz2 format in your working directory

To better appreciate the information that the table contains, We will plot the output of the table.

comp_yrs <- fars_summarize_years(c(2013, 2014, 2015)) ### sumarize data for years 2013, 2014, and 2015

names(comp_yrs) <- c("month", "yr2013", "yr2014", "yr2015") ### change variable/column names to appropriate names in R

comp_yrs <- comp_yrs %>% mutate(month = as.factor(month.abb)) ### change the numerical representation of months to abbreviations

comp_yrs$month <- factor(comp_yrs$month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) ### assign factor levels to correctly plot the data

library(tidyr) ### load package tidyr so we can access the function gather()
long_comp_yrs <- gather(comp_yrs, key = key, value = value, -month) ### change data frame structure to facilitate  plotting with ggplot

ggplot(data = long_comp_yrs, aes(x = month, y = value, group = key, color = key)) + ### input the variables we want to plot
        geom_line(linetype = "dashed", size = 1) + ### show a line plot
        geom_point(size = 2, shape = 21) + ### show the points as well
        xlab("Month") + ### put a label on the x axis 
        ggtitle("Number of Accidents Recorded per Month in 2015") ### Put a title

It seems what the character Ethan Hunt, played by Tom Cruise in the movie Mission Impossible, said was true. "Because Traffic has a memory. It's Amazing!!! It's like a living organism". Based on this plot of traffic accidents from 2013 to 2015, traffic accidents have a seasonal pattern.

The fars_map_state function

The fars_map_state function takes in two arguments or inputs. The first is a number from 1 to 56 which stands for the States in the united States including Puerto Rico and the district of Columbia. For some reason, the numbers 3, 7, and 14 were not used to denote a State. Using them as the first argument for the state no. or any number above 56 will result in an error.

The second is a number representing the year we're interested in. Again, for this to work, we should have downloaded the .csv file for that particular year in our working directory and have created a new file from it in a .csv.bz2 format.

The fars-map_state function also requires the data to be in .csv.bz2 format just like the fars-summarize_years function. But since we already created the accident_2015.csv.bz2 file for the 2015 data in your working directory in the previous example, we can proceed to using the fars_map_state function right away.

To examine a single map we can:

par(mar = c(0.5,0.5,0.5,0.5)) ###Arrange the margins to fit the plot
fars_map_state(48, ### selects the number 48 which stands for the state of Texas
               2015) ### selects the year 2015

We see the lone map of the Lone Star State of Texas. Every dot on the map is a representation of where a traffic accident occured based on the latitude and longitudinal position reported in FARS data. We see that there seem to be a confluence of the dots in certain areas indicating where traffic accidents seem to happen the most.

The first argument to our fars-map_state function is the number 48 which corresponds to the State of Texas. At the end of this document is the appendix, which contains a table of the numbers used in the FARS data and their corresponding State names. You will find this helpful when you start exploring the FARS data.

Why create only one when we can have more. We can create multiple plots side by side. In the first tutorial we saw that the State of Texas, California, and Florida have the most number of traffic accidents for the year 2015. For this demonstration of the fars_map_state function, We'll choose these States which corresponds to the number 48, 6, and 12 respectively. These will serve as the first arguments and the year 2015 for our second argument.

par(mfrow = c(1,3)) ### plot the map in three columns in a single row
par(oma = c(2, 0,0,0)) ### specify the margins to allow for titles and texts
par(mar=c(1,0.5,0.5,0.5)) ### specify the margins to fit the plot
fars_map_state(48, ### selects the number 48 which stands for the state of Texas
               2015) ### selects the year 2015
title("The State\n of\n Texas", ###text for title
      line=-2, ### position of the title
      cex.main = 1, ### size of the text
      adj = 0.8, ### left to right position of title
      col.main= "blue") ### color of the text

fars_map_state(6, ### selects the number 6 which stands for the state of California
               2015) ### selects the year 2015
title("The State of\n California", ###text for title
      line=-6, ### position of the title
      cex.main = 1, ### size of the text
      adj = 0.9, ### left to right position of title
      col.main= "blue") ### color of the text
mtext("Map Plot of Traffic Accidents Using Geospatial Locationing", ### title to print
      side = 1, ### print title on the bottom
      line = 0, ### start printin at line 0
      outer = TRUE, ### Use margins if available
      cex = 1.5) ### size of the text

fars_map_state(12, ### selects the number 12 which stands for the state of Florida
               2015) ### selects the year 2015
title("The\n State\n of\n Florida", ###text for title
      line=-8, ### position of the title
      adj = 0.4, ### left to right position of title
      cex.main = 1, ### size of the text
      col.main= "blue") ### color of the text

We can compare the distribution of traffic accidents in key cities of these States with a little help from google to tell us which cities are located where the points on the map seem to aggregate.

If we have the accident_2013.csv, accident_2014.csv, and the accident_2015.csv files in our working directory, we can read them in R and create accident_2013.csv.bz2, accident_2015.csv.bz2, and accident_2015.csv.bz2 files, which we can use to plot a map of the State of Georgia from 2013 to 2015. I just remembered, we do have them.

par(mfrow = c(1,3)) ### plot the map in three columns in a single row
par(oma = c(2, 0,0,0)) ### specify the margins to allow for titles and texts
par(mar=c(1,0.5,0.5,0.5)) ### specify the margins to fit the plot
fars_map_state(13, ### selects the number 48 which stands for the state of Texas
               2013) ### selects the year 2015
title("The State\n of\n Georgia\n 2013", ###text for title
      line=-4, ### position of the title
      cex.main = 1, ### size of the text
      adj = 0.95, ### left to right position of title
      col.main= "firebrick3") ### color of the text

fars_map_state(13, ### selects the number 48 which stands for the state of Texas
               2014) ### selects the year 2015
title("The State\n of\n Georgia\n 2014", ###text for title
      line=-4, ### position of the title
      cex.main = 1, ### size of the text
      adj = 0.95, ### left to right position of title
      col.main= "firebrick3") ### color of the text
mtext("Map Plot of Traffic Accidents in Georgia from 2013-2015", ### title to print
      side = 1, ### print title on the bottom
      line = 0, ### start printin at line 0
      outer = TRUE, ### Use margins if available
      cex = 1.5) ### size of the text

fars_map_state(13, ### selects the number 48 which stands for the state of Texas
               2015) ### selects the year 2015
title("The State\n of\n Georgia\n 2015", ###text for title
      line=-4, ### position of the title
      cex.main = 1, ### size of the text
      adj = 0.95, ### left to right position of title
      col.main= "firebrick3") ### color of the text

Looking at these plots of Georgia, we can see there was an increase in the number of traffic accidents in Georgia from 2013 to 2015 even without looking at the numbers or a table. It also shows the the increase did not happen just in selected areas, but rather appears to have occured throughout the State. Compared to numbers or tables, plots have the advantage of conveying more information at a glance.

We can also add some features to improve the maps readability by adding a few more information. Here we re-examine the the State of Texas and add some features to easily identify certain areas of the map. This becomes handy during presentation to highlight which areas are being referred to in the map.

But before we can plot the results, we first have to find the relative centers where most of the accidents occured. We will first find which County in Texas has the most number of recorded traffic accidents and then find their relative position based on the geospatial location provided in the FARS data.

acc_2015 <- fars_read("accident_2015.csv.bz2") ### read in the FARS data for 2015

texas <- acc_2015 %>% ### 2015 FARS data
        filter(STATE == 48) %>% ### limit our analysis to the State of Texas
        select(COUNTY, CITY, LONGITUD, LATITUDE) ### select the variables we need

texas %>% ### 2015 FARS data from the state of Texas
        group_by(COUNTY) %>% ### group data by county
        summarize(count = n()) %>% ### count number of accidents per county
        arrange(desc(count)) ### arrange county from highest to lowest

We see that County number 201 and 113 have the highest number of recorded tracfic accidents. We can mark the area by drawing a circle or a square to specify our area of interest.

county201 <- texas %>% ### we take our subset of the FARS data limited to the State of Texas
        filter(COUNTY == 201) %>%  ### filter only observations for the COUNTY 201 
        summarize(mean_lat = mean(LATITUDE), mean_long = mean(LONGITUD)) ### summarize the latitude and longitudinal position to come up with a relative position of the center where the accidents happen the most 

county201 ### gives relative position where accidents happen most in County 201

We now have the relative center where most accidents happen in County 201 of the State of Texas.

county113 <- texas %>% ### we take our subset of the FARS data limited to the State of Texas
        filter(COUNTY == 113) %>% ### filter only observations for the COUNTY 113
        summarize(mean_lat = mean(LATITUDE), mean_long = mean(LONGITUD)) ### summarize the latitude and longitudinal position to come up with a relative position of the center where the accidents happen the most

county113 ### gives relative position where accidents happen most in County 133

We now have the relative center where most accidents happen in County 133 of the State of Texas. Let's start plotting!

par(oma = c(1, 0,0,0)) ### specify the margins to allow for titles and texts
par(mar=c(0.5,0.5,0.5,0.5))
fars_map_state(48, ### selects the number 48 which stands for the state of Texas
               2015) ### selects the year 2015
points(x = -95.38142, y =  29.80174, col = "red", cex = 1.7, pch = 0, lwd = 2)
text(x = -95.38142, y =  29.80174, labels = "201", col = "red", cex = 1.2, pos = 1, offset = 1.5)
points(x = -96.79868, y =  32.80453, col = "blue", cex = 1.7, pch = 1, lwd = 2)
text(x = -96.79868, y =  32.80453, labels = "113", col = "blue", cex = 1.2, pos = 3, offset = 1.5)
title("The State\n of\n Texas", ###text for title
      line=-2, ### position of the title
      cex.main = 1, ### size of the text
      adj = 0.8, ### left to right position of title
      col.main= "black") ### color of the text
mtext("Map Plot of Traffic Accidents Using Geospatial Locationing", ### title to print
      side = 1, ### print title on the bottom
      line = 0, ### start printin at line 0
      outer = TRUE, ### Use margins if available
      cex = 1) ### size of the text

It turns out, Texas County 113 is Dallas County and Texas County 201 is Harris County.

By providing features like circles, squares, dots or even arrows, we can improve how we convey information to our listeners during presentations.

Before we end, let's get read of the copy of the files we copied from the system directory to the working directory.

file.remove("./accident_2013.csv.bz2")
file.remove("./accident_2014.csv.bz2")
file.remove("./accident_2015.csv.bz2")

I hope you found our short demonstration of the package readFARS entertaining as well as educational and may the use of the functions in these package facilitate your analysis of the FARS data.

Appendix

List of State Number and their correspong State

|State Number |State | |:------------|:--------------------| |1 |Alabama | |2 |Alaska | |3 | | |4 |Arizona | |5 |Arkansas | |6 |California | |7 | | |8 |Colorado | |9 |Connecticut | |10 |Delaware | |11 |Distric of Columbia | |12 |Florida | |13 |Georgia | |14 | | |15 |Hawaii | |16 |Idaho | |17 |Illinois | |18 |Indiana | |19 |Iowa | |20 |Kansas | |21 |Kentucky | |22 |Louisiana | |23 |Maine | |24 |Maryland | |25 |Massachusetts | |26 |Michigan | |27 |Minnesota | |28 |Mississippi | |29 |Missouri | |30 |Montana | |31 |Nebraska | |32 |Nevada | |33 |New Hampshire | |34 |New jersey | |35 |New Mexico | |36 |New York | |37 |North Carolina | |38 |North Dakota | |39 |Ohio | |40 |Oklahoma | |41 |Oregon | |42 |Pennsylvania | |43 |Puerto Rico | |44 |Rhode island | |45 |South Carolina | |46 |South Dakota | |47 |Tennessee | |48 |Texas | |49 |Utah | |50 |Vermont | |51 |Virgini islands | |52 |Virginia | |53 |Washington | |54 |West virginia | |55 |Wisconsin | |56 |Wyoming |

for some reason, the numbers 3, 7, and 14 were not used Using them as an argument for the state no. in the function `fars_map_state` or any number above 56 will result in an error.

References

FARS Analytical User's Manual. 1975 - 2016. U. S. Department of Transportation. National Highway Traffic Safety Administration. National Center for Statistics and?Analysis. Washington, D.C. 20590
National Highway Traffic Safety Administration NHTSA website

DocOfi/FARSread-Package documentation built on May 21, 2019, 3:07 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com