knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Package FARSread was designed to make it easy for R users to summarize and visualize the Fatality Analysis Reporting System or FARS data from the National Highway Traffic Safety Administration NHTSA. This document will familiarize the reader with the fars_summarize_years
function and the The fars_map_state
function from the package readFARS. it will also provide an example of how to download different years of FARS data from the NHTSA.
This is the second vignette for the package FARSread. The first one entitled "Introduction to FARSread" familiarizes the reader with the function fars_read
which reads the FARS data from the NHSTA into R. If you haven't read it yet, we advise that you read it before proceeding with this second tutorial.
library(FARSread) ###load package FARSread so we can use the fars_summarize_years function and the the fars_map_state function
Fatality Analysis Reporting System or FARS contains data derived from a census of fatal motor vehicle traffic crashes within the 50 States, the District of Columbia, and Puerto Rico. FARS data is available for every year since FARS was established in 1975.
Interested parties may access data tables at NHTSA, download files via FTP at ftp://ftp.nhtsa.dot.gov/fars/, or make requests through the Publications and Data Requests page. Information requests can also be made by phone at 800-934-8517, or by e-mail at ncsaweb@dot.gov
For this tutorial we will be comparing the FARS data for the years 2013, 2014, and 2015. We have obtained the URL for the 2013, 2014, and 2015 National FARS data from the NHSTA's website and we will use it to download the file from the NHSTA's ftp site.
First we'll create folders or directories to contain the zip file.
if(!file.exists("farsdata2013")){ ### create folder for 2013 zip file dir.create("farsdata2013") } if(!file.exists("farsdata2014")){ ### create folder for 2014 zip file dir.create("farsdata2014") } if(!file.exists("farsdata2015")){ ### create folder for 2015 zip file dir.create("farsdata2015") }
Next, we will save the URL's for files in an R object.
fileurl2013 <- "ftp://ftp.nhtsa.dot.gov/fars/2013/National/FARS2013NationalCSV.zip" ### URL for 2013 data fileurl2014 <- "ftp://ftp.nhtsa.dot.gov/fars/2014/National/FARS2014NationalCSV.zip" ### URL for 2014 data fileurl2015 <- "ftp://ftp.nhtsa.dot.gov/fars/2015/National/FARS2015NationalCSV.zip" ### URL for 2015 data
Now we download the files for the year 2013 and save it to the appropriate folder. Using the unzip
function in R, we extract the accident.csv
file from each of the zip files. We then read the accident.csv
file in R using the fars_read
function from the FARSread
package and write a new file in .csv.bz2 format in the working directory . We'll explain in a little while why we need to do this. The download may take a considerable amount of time.
download.file(fileurl2013, destfile="./farsdata2013/FARS2013NationalCSV.zip") ### download file and save in the folder farsdata2013 dateDownloaded2013 <- date() ### recording the date and time of our download dateDownloaded2013 unzip("./farsdata2013/FARS2013NationalCSV.zip", exdir = "./farsdata2013", files = "accident.csv") ### unzip file and extract accident.csv file acc_2013 <- fars_read("./farsdata2013/accident.csv") ### read file into R and save as an R object called acc_2013 write.csv(acc_2013, ### from the R object acc_2013 file = "accident_2013.csv.bz2", ### create accident_2013.csv.bz2 in the working directory row.names = FALSE) ### do not create a new column of rownames when you read in the file in R unlink("./farsdata2013/accident.csv") ### delete accident.csv file unlink("./farsdata2013/FARS2013NationalCSV.zip") ### delete FARS2013NationalCSV.zip file
Then for the 2014 FARS data...
download.file(fileurl2014, destfile="./farsdata2014/FARS2014NationalCSV.zip") ### download file and save in the folder farsdata2014 dateDownloaded2014 <- date() ### recording the date and time of our download dateDownloaded2014 unzip("./farsdata2014/FARS2014NationalCSV.zip", exdir = "./farsdata2014", files = "accident.csv") ### unzip file and extract accident.csv file acc_2014 <- fars_read("./farsdata2014/accident.csv") ### read file into R and save as an R object called acc_2014 write.csv(acc_2014, ### from the R object acc_2014 file = "accident_2014.csv.bz2", ### we create accident_2014.csv.bz2 in the working directory row.names = FALSE) ### do not create a new column of rownames when you read in the file in R unlink("./farsdata2014/accident.csv") ### delete accident.csv file unlink("./farsdata2014/FARS2014NationalCSV.zip") ### delete FARS2014NationalCSV.zip file
Then for the 2015 FARS data.
download.file(fileurl2015, destfile="./farsdata2015/FARS2015NationalCSV.zip") ### download file and save in the folder farsdata2015 dateDownloaded2015 <- date() ### recording the date and time of our download dateDownloaded2015 unzip("./farsdata2015/FARS2015NationalCSV.zip", exdir = "./farsdata2015", files = "accident.csv") ### unzip file and extract accident.csv file acc_2015 <- fars_read("./farsdata2015/accident.csv") ### read file into R and save as an R object called acc_2015 write.csv(acc_2015, ### from the R object acc_2015 file = "accident_2015.csv.bz2", ### we create accident_2015.csv.bz2 in the working directory row.names = FALSE) ### do not create a new column of rownames when you read in the file in R unlink("./farsdata2015/accident.csv") ### delete accident.csv file unlink("./farsdata2015/FARS2015NationalCSV.zip") ### delete FARS2015NationalCSV.zip file
At this point all the codes you have seen so far can be used interactively on the console pane of R studio. The codes for downloading data from NHTSA were presented here to give the reader the freedom to choose whichever year of the FARS data he/she wishes to analyze.
If you want to run the functions-readfars.Rmd
file from this repository, you'll have to set the arguments eval = FALSE
to eval = TRUE
in all the code chunks prior to this note. It was set to FALSE because it took too long to run the vignettes everytime the package is tested during development.
Since the FARS data for 2013 to 2015 is included in this package, I will now show you how to access them from your directory. The data can be accessed for use using the system.file
function in R.
The function system.file
is the key that will allow us to gain access to the 2013 to 2015 data within the package. Let's begin with the 2015 data.
### the following code reads access the accident_2015.csv.bz2 file in the folder extdata in package FARSread and save it as an R object named fars_2015 fars_2015 <- system.file("extdata", "accident_2015.csv.bz2", package = "FARSread") ### copy the file from its folder and save it to the working directory file.copy(from = fars_2015, to = getwd())
Now, we are ready to use the fars_summarize_years
function.
The fars_summarize_years
function is designed to create a table that will show the number of traffic accidents for every month in a single or a number of years, provided that you have the data from the NHSTA in your working directory in a .csv.bz2 format.
Unlike the fars_read
function which reads into R a .csv file or a .csv.bz2 file, the fars_summarize_year
function requires a .csv.bz2 file to work. A .csv.bz2 file occupy less space in your hard drive, which is the reason we saved the file in this format. The .csv file of the 2015 National FARS data occupies 5.17 MB of your hard drive while the .csv.bz2 file occupies only 876 KB. A whopping difference!!!
We are now ready to use the fars_summarize_years
function It takes as input the year that the user wants to analyze. This is specially advantageous to new users in R who may not be familiar with manipulating data in order to arrive at the information they need. The FARS data has many valuables which provide a lot of information. But, if you're only interested in comparing the number of traffic accidents by month in a particular year or several years, then the 'fars_summarize_years` can save you a lot of time and effort.
most_months <- fars_summarize_years(2015) ### The first argument to this function is the year you want to analyze. You may input as many years as you wish provided that the data for that year is available in a .csv.bnz2 format in your working directory
The output of the fars_summarize_years
function is a table providing a summary of the number of tracffic accidents for each month of 2015.
Notice that table above is the same table we created in the first tutorial but it only took us one line of code with one argument 2015
in order to arrive at the same output. Just as an exercise we will plot the result in the same way that we did in the previous tutorial.
library(dplyr) ###load package dplyr to access %>% forward pipe operator library(ggplot2) ### load package ggplot2 for plotting data most_months <- most_months %>% ### get the table we created called summ_months mutate(mon_abb = as.factor(month.abb)) %>% ### create a new variable of abbreviations for month names as.data.frame() ### save result as a data frame most_months$mon_abb <- factor(most_months$mon_abb, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) ### assign factor levels to correctly plot the data most_months colnames(most_months) <- c("MONTH", "count", "mon_abb") ### change variable name `2015` to count ggplot(data = most_months, aes(x = mon_abb, y = count, group = 1)) + ### select the data and variables we will plot geom_line(colour = "red", linetype = "dashed", size = 1.5) + ### plot data as a line geom_point(colour = "red", size = 4, shape = 21, fill = "green") + ### plot data as green round points as well xlab("Month") + ### add label to x-axis ggtitle("Number of Accidents Recorded per Month in 2015") ### add title
The plot shows the peak months for traffic accidents in 2015 were from July to October.
If we wanted to compare the months in the years: 2013, 2014, and 2015, we first have to access the FARS data for 2013 and 2014 within the package.
### the following code reads access the accident_2013.csv.bz2 file in the folder extdata in package FARSread and save it as an R object named fars_2015 fars_2013 <- system.file("extdata", "accident_2013.csv.bz2", package = "FARSread") ### copy the file from its folder and save it to the working directory file.copy(from = fars_2013, to = getwd()) ### the following code reads access the accident_2013.csv.bz2 file in the folder extdata in package FARSread and save it as an R object named fars_2015 fars_2014 <- system.file("extdata", "accident_2014.csv.bz2", package = "FARSread") ### copy the file from its folder and save it to the working directory file.copy(from = fars_2014, to = getwd())
NOw, we are ready to summarize the FARS data from 2013 to 2015.
fars_summarize_years(c(2013, 2014, 2015)) ### The first argument to this function are the years 2013, 2014, and 2015. You may input as many years as you wish provided that the data for that year is available in a .csv.bnz2 format in your working directory
To better appreciate the information that the table contains, We will plot the output of the table.
comp_yrs <- fars_summarize_years(c(2013, 2014, 2015)) ### sumarize data for years 2013, 2014, and 2015 names(comp_yrs) <- c("month", "yr2013", "yr2014", "yr2015") ### change variable/column names to appropriate names in R comp_yrs <- comp_yrs %>% mutate(month = as.factor(month.abb)) ### change the numerical representation of months to abbreviations comp_yrs$month <- factor(comp_yrs$month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) ### assign factor levels to correctly plot the data library(tidyr) ### load package tidyr so we can access the function gather() long_comp_yrs <- gather(comp_yrs, key = key, value = value, -month) ### change data frame structure to facilitate plotting with ggplot ggplot(data = long_comp_yrs, aes(x = month, y = value, group = key, color = key)) + ### input the variables we want to plot geom_line(linetype = "dashed", size = 1) + ### show a line plot geom_point(size = 2, shape = 21) + ### show the points as well xlab("Month") + ### put a label on the x axis ggtitle("Number of Accidents Recorded per Month in 2015") ### Put a title
It seems what the character Ethan Hunt, played by Tom Cruise in the movie Mission Impossible, said was true. "Because Traffic has a memory. It's Amazing!!! It's like a living organism". Based on this plot of traffic accidents from 2013 to 2015, traffic accidents have a seasonal pattern.
The fars_map_state
function takes in two arguments or inputs. The first is a number from 1 to 56 which stands for the States in the united States including Puerto Rico and the district of Columbia.
For some reason, the numbers 3, 7, and 14 were not used to denote a State. Using them as the first argument for the state no. or any number above 56 will result in an error.
The second is a number representing the year we're interested in. Again, for this to work, we should have downloaded the .csv file for that particular year in our working directory and have created a new file from it in a .csv.bz2 format.
The fars-map_state
function also requires the data to be in .csv.bz2 format just like the fars-summarize_years
function. But since we already created the accident_2015.csv.bz2
file for the 2015 data in your working directory in the previous example, we can proceed to using the fars_map_state
function right away.
To examine a single map we can:
par(mar = c(0.5,0.5,0.5,0.5)) ###Arrange the margins to fit the plot fars_map_state(48, ### selects the number 48 which stands for the state of Texas 2015) ### selects the year 2015
We see the lone map of the Lone Star State of Texas. Every dot on the map is a representation of where a traffic accident occured based on the latitude and longitudinal position reported in FARS data. We see that there seem to be a confluence of the dots in certain areas indicating where traffic accidents seem to happen the most.
The first argument to our fars-map_state
function is the number 48 which corresponds to the State of Texas. At the end of this document is the appendix, which contains a table of the numbers used in the FARS data and their corresponding State names. You will find this helpful when you start exploring the FARS data.
Why create only one when we can have more. We can create multiple plots side by side. In the first tutorial we saw that the State of Texas, California, and Florida have the most number of traffic accidents for the year 2015. For this demonstration of the fars_map_state
function, We'll choose these States which corresponds to the number 48, 6, and 12 respectively. These will serve as the first arguments and the year 2015 for our second argument.
par(mfrow = c(1,3)) ### plot the map in three columns in a single row par(oma = c(2, 0,0,0)) ### specify the margins to allow for titles and texts par(mar=c(1,0.5,0.5,0.5)) ### specify the margins to fit the plot fars_map_state(48, ### selects the number 48 which stands for the state of Texas 2015) ### selects the year 2015 title("The State\n of\n Texas", ###text for title line=-2, ### position of the title cex.main = 1, ### size of the text adj = 0.8, ### left to right position of title col.main= "blue") ### color of the text fars_map_state(6, ### selects the number 6 which stands for the state of California 2015) ### selects the year 2015 title("The State of\n California", ###text for title line=-6, ### position of the title cex.main = 1, ### size of the text adj = 0.9, ### left to right position of title col.main= "blue") ### color of the text mtext("Map Plot of Traffic Accidents Using Geospatial Locationing", ### title to print side = 1, ### print title on the bottom line = 0, ### start printin at line 0 outer = TRUE, ### Use margins if available cex = 1.5) ### size of the text fars_map_state(12, ### selects the number 12 which stands for the state of Florida 2015) ### selects the year 2015 title("The\n State\n of\n Florida", ###text for title line=-8, ### position of the title adj = 0.4, ### left to right position of title cex.main = 1, ### size of the text col.main= "blue") ### color of the text
We can compare the distribution of traffic accidents in key cities of these States with a little help from google to tell us which cities are located where the points on the map seem to aggregate.
If we have the accident_2013.csv
, accident_2014.csv
, and the accident_2015.csv
files in our working directory, we can read them in R and create accident_2013.csv.bz2
, accident_2015.csv.bz2
, and accident_2015.csv.bz2
files, which we can use to plot a map of the State of Georgia from 2013 to 2015. I just remembered, we do have them.
par(mfrow = c(1,3)) ### plot the map in three columns in a single row par(oma = c(2, 0,0,0)) ### specify the margins to allow for titles and texts par(mar=c(1,0.5,0.5,0.5)) ### specify the margins to fit the plot fars_map_state(13, ### selects the number 48 which stands for the state of Texas 2013) ### selects the year 2015 title("The State\n of\n Georgia\n 2013", ###text for title line=-4, ### position of the title cex.main = 1, ### size of the text adj = 0.95, ### left to right position of title col.main= "firebrick3") ### color of the text fars_map_state(13, ### selects the number 48 which stands for the state of Texas 2014) ### selects the year 2015 title("The State\n of\n Georgia\n 2014", ###text for title line=-4, ### position of the title cex.main = 1, ### size of the text adj = 0.95, ### left to right position of title col.main= "firebrick3") ### color of the text mtext("Map Plot of Traffic Accidents in Georgia from 2013-2015", ### title to print side = 1, ### print title on the bottom line = 0, ### start printin at line 0 outer = TRUE, ### Use margins if available cex = 1.5) ### size of the text fars_map_state(13, ### selects the number 48 which stands for the state of Texas 2015) ### selects the year 2015 title("The State\n of\n Georgia\n 2015", ###text for title line=-4, ### position of the title cex.main = 1, ### size of the text adj = 0.95, ### left to right position of title col.main= "firebrick3") ### color of the text
Looking at these plots of Georgia, we can see there was an increase in the number of traffic accidents in Georgia from 2013 to 2015 even without looking at the numbers or a table. It also shows the the increase did not happen just in selected areas, but rather appears to have occured throughout the State. Compared to numbers or tables, plots have the advantage of conveying more information at a glance.
We can also add some features to improve the maps readability by adding a few more information. Here we re-examine the the State of Texas and add some features to easily identify certain areas of the map. This becomes handy during presentation to highlight which areas are being referred to in the map.
But before we can plot the results, we first have to find the relative centers where most of the accidents occured. We will first find which County in Texas has the most number of recorded traffic accidents and then find their relative position based on the geospatial location provided in the FARS data.
acc_2015 <- fars_read("accident_2015.csv.bz2") ### read in the FARS data for 2015 texas <- acc_2015 %>% ### 2015 FARS data filter(STATE == 48) %>% ### limit our analysis to the State of Texas select(COUNTY, CITY, LONGITUD, LATITUDE) ### select the variables we need texas %>% ### 2015 FARS data from the state of Texas group_by(COUNTY) %>% ### group data by county summarize(count = n()) %>% ### count number of accidents per county arrange(desc(count)) ### arrange county from highest to lowest
We see that County number 201 and 113 have the highest number of recorded tracfic accidents. We can mark the area by drawing a circle or a square to specify our area of interest.
county201 <- texas %>% ### we take our subset of the FARS data limited to the State of Texas filter(COUNTY == 201) %>% ### filter only observations for the COUNTY 201 summarize(mean_lat = mean(LATITUDE), mean_long = mean(LONGITUD)) ### summarize the latitude and longitudinal position to come up with a relative position of the center where the accidents happen the most county201 ### gives relative position where accidents happen most in County 201
We now have the relative center where most accidents happen in County 201 of the State of Texas.
county113 <- texas %>% ### we take our subset of the FARS data limited to the State of Texas filter(COUNTY == 113) %>% ### filter only observations for the COUNTY 113 summarize(mean_lat = mean(LATITUDE), mean_long = mean(LONGITUD)) ### summarize the latitude and longitudinal position to come up with a relative position of the center where the accidents happen the most county113 ### gives relative position where accidents happen most in County 133
We now have the relative center where most accidents happen in County 133 of the State of Texas. Let's start plotting!
par(oma = c(1, 0,0,0)) ### specify the margins to allow for titles and texts par(mar=c(0.5,0.5,0.5,0.5)) fars_map_state(48, ### selects the number 48 which stands for the state of Texas 2015) ### selects the year 2015 points(x = -95.38142, y = 29.80174, col = "red", cex = 1.7, pch = 0, lwd = 2) text(x = -95.38142, y = 29.80174, labels = "201", col = "red", cex = 1.2, pos = 1, offset = 1.5) points(x = -96.79868, y = 32.80453, col = "blue", cex = 1.7, pch = 1, lwd = 2) text(x = -96.79868, y = 32.80453, labels = "113", col = "blue", cex = 1.2, pos = 3, offset = 1.5) title("The State\n of\n Texas", ###text for title line=-2, ### position of the title cex.main = 1, ### size of the text adj = 0.8, ### left to right position of title col.main= "black") ### color of the text mtext("Map Plot of Traffic Accidents Using Geospatial Locationing", ### title to print side = 1, ### print title on the bottom line = 0, ### start printin at line 0 outer = TRUE, ### Use margins if available cex = 1) ### size of the text
It turns out, Texas County 113 is Dallas County and Texas County 201 is Harris County.
By providing features like circles, squares, dots or even arrows, we can improve how we convey information to our listeners during presentations.
Before we end, let's get read of the copy of the files we copied from the system directory to the working directory.
file.remove("./accident_2013.csv.bz2") file.remove("./accident_2014.csv.bz2") file.remove("./accident_2015.csv.bz2")
I hope you found our short demonstration of the package readFARS entertaining as well as educational and may the use of the functions in these package facilitate your analysis of the FARS data.
|State Number |State |
|:------------|:--------------------|
|1
|Alabama |
|2
|Alaska |
|3
| |
|4
|Arizona |
|5
|Arkansas |
|6
|California |
|7
| |
|8
|Colorado |
|9
|Connecticut |
|10
|Delaware |
|11
|Distric of Columbia |
|12
|Florida |
|13
|Georgia |
|14
| |
|15
|Hawaii |
|16
|Idaho |
|17
|Illinois |
|18
|Indiana |
|19
|Iowa |
|20
|Kansas |
|21
|Kentucky |
|22
|Louisiana |
|23
|Maine |
|24
|Maryland |
|25
|Massachusetts |
|26
|Michigan |
|27
|Minnesota |
|28
|Mississippi |
|29
|Missouri |
|30
|Montana |
|31
|Nebraska |
|32
|Nevada |
|33
|New Hampshire |
|34
|New jersey |
|35
|New Mexico |
|36
|New York |
|37
|North Carolina |
|38
|North Dakota |
|39
|Ohio |
|40
|Oklahoma |
|41
|Oregon |
|42
|Pennsylvania |
|43
|Puerto Rico |
|44
|Rhode island |
|45
|South Carolina |
|46
|South Dakota |
|47
|Tennessee |
|48
|Texas |
|49
|Utah |
|50
|Vermont |
|51
|Virgini islands |
|52
|Virginia |
|53
|Washington |
|54
|West virginia |
|55
|Wisconsin |
|56
|Wyoming |
fars_map_state
or any number above 56 will result in an error.FARS Analytical User's Manual. 1975 - 2016. U. S. Department of Transportation. National Highway Traffic Safety Administration. National Center for Statistics and?Analysis. Washington, D.C. 20590
National Highway Traffic Safety Administration NHTSA website
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.