knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Package FARSread

Package FARSread was designed to make it easy for R users to summarize and visualize the Fatality Analysis Reporting System or FARS data from the National Highway Traffic Safety Administration NHTSA. This document hopes to familiarize the reader with the fars_read function from the package FARSread and the variables contained in a FARS data file.

library(FARSread) ###load package FARSread so we can use the fars_read function and other functions in the Package

Data: fars2015

Fatality Analysis Reporting System or FARS contains data derived from a census of fatal motor vehicle traffic crashes within the 50 States, the District of Columbia, and Puerto Rico. FARS data is available for every year since FARS was established in 1975.

Interested parties may access data tables at NHTSA, download files via FTP at ftp://ftp.nhtsa.dot.gov/fars/, or make requests through the Publications and Data Requests page. Information requests can also be made by phone at 800-934-8517, or by e-mail at ncsaweb@dot.gov

To download a copy of the 2015 FARS data, first go to this NHTSA website https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars. Click on the the link Download Raw Data from FTP Site which will bring you to another page with the title Index of /fars/. Click on the year 2015 which will bring you to the Index of /fars/2015/ page. Click on the link National which contains the zip file for the FARS national data for the year 2015. This will bring you another page Index of /fars/2015/National/ where you can choose the file format you want to download. The FARS 2015 National data is available in .csv, .sas, and .dbf format. Choose the FARS2015NationalCSV.zip which will start the download process from the ftp site. After the download has completed, unzip the file using a local software in your computer such as WinRAR in Windows or the unzip function in R. The .csv file should have the file name accident.csv. WARNING!!! If you are downloading .csv files for different years, it will be a good idea to download the zip files to different directories or folders. The files in each zip files have similar names which can result in confusion or files being overwritten.

For example the .csv file we are interested in is the accident.csv file. The same file name exists in the 2016 zip file , the 2015 zip file, the 2014 zip file as well as other zip files for different years. After unzipping the files, it's a good idea to change the name of the file to reflect the year the data was derived from. Changing accident.csv to accident_2015.csv can save some frustrations later on and also because you cannot keep a file with same file name in a single directory.

You can rename a file in R Studio. Go to the lower right pane in R Studio and select the square slot next to the file you want to change the name. A check mark will appear in the square slot. Click on this and click rename. Enter the new name for the file.

Downloading the file in R

If you know the URL for the file you want to download, you can download the file inside R itself. One of the great things about R is its versatility. It can be adapted to different demands of the workspace by utilizing the different packages that has been developed for it.

if(!file.exists("farsdata2015")){ ### create folder for 2015 zip file
        dir.create("farsdata2015")
}

fileurl2015 <- "ftp://ftp.nhtsa.dot.gov/fars/2015/National/FARS2015NationalCSV.zip" ### URL of the file we want to download

download.file(fileurl2015, destfile="./farsdata2015/FARS2015NationalCSV.zip") ### download file and save in farsdata2015

dateDownloaded2015 <- date() ### recording the date and time of our download
dateDownloaded2015

unzip("./farsdata2015/FARS2015NationalCSV.zip", exdir = "./farsdata2015", files = "accident.csv") ### unzip file and extract accident.csv file

acc_2015 <- fars_read("./farsdata2015/accident.csv") ### read file into R and save as an R object called acc_2015

We have read accident.csv from the zip file we downloaded. We assigned it to an R object named acc_2015. But before we can start analyzing the data, we need to do some housekeeping chores.

Zip files from ftp site of the NHTSA all contain the accident.csv file. So if you downloaded several years of FARS data it might becoming confusing which file is from which year. Furthermore, the .csv file takes up a lot of space on your hard drive. We can compress this file and save it in a .csv.bz2 format to lessen the file size. WARNING!!! The next chunk of codes will create a file in your working directory.

write.csv(acc_2015, ### from the R object acc_2015 
          file = "accident_2015.csv.bz2", ### we create the accident_2015.csv.bz2 in the working directory
          row.names = FALSE) ### do not create a new column of rownames when you read in the file in R

unlink("./farsdata2015/accident.csv") ### delete accident.csv file

unlink("./farsdata2015/FARS2015NationalCSV.zip") ### delete FARS2015NationalCSV.zip file

Reading the data in R: The fars_read function

We already used the fars_read function earlier to read the accident.csv file from the zip file. We will now use the same function to read the accident2015.csv.bz2 file we created to demonstrate the versatility of this function.

At this point all the codes you have seen so far can be used interactively on the console pane of R studio. If you want to run the introduction-to-readfars.Rmd file from this repository, you'll have to set the arguments eval = FALSE to eval = TRUE in all the code chunks prior to this note. we had to set them to FALSE as it took to long to run the vignettes everytime the package is tested during development.

Since the FARS data for 2015 is included in this package, I will now show you how to access them from your directory. The data can be accessed for use using the system.file function in R.

fars_2015 <- system.file("extdata", "accident_2015.csv.bz2", package = "FARSread") ### find the file path 

accident_2015 <- fars_read(fars_2015) ### read accident2015.csv.bz2 file into R and save as an R object called accident_2015

Looking at the variables in our data

Once the data is read in R, we can use all the functions available in R to analyze the data

To see what variables are present in the data we can use the r function glimpse from the package dplyr. You can see the results below but first we have to load the dplyr package using the function library.

library(dplyr) ### load package dplyr to access the function glimpse
glimpse(accident_2015) ### view the variable names 

From the output we can see:

If these are the variables we are interested in, we can select these among the many variables in the data and see a summary.

Exploring FARS

If we wanted to see the first 6 rows of data we can do the following:

accident_2015 %>% 
        select(STATE, MONTH, PEDS, PERNOTMVIT, VE_TOTAL, FATALS, DRUNK_DR) %>% ###select variables
        as.data.frame() %>% ###  save results as a data frame
        head() ### print first 6 rows of data

Creating summaries of variables of interest

We can create a summary of the variables we're interested in

accident_2015 %>% 
        select(STATE, MONTH, PEDS, PERNOTMVIT, VE_TOTAL, FATALS, DRUNK_DR) %>% ### selects variables
        summary() ### summarize data

Creating tables

If we wanted to see the States with the highest number of recorded traffic accidents, we can:

most_count <- accident_2015 %>% ###fars data object in R
        group_by(STATE) %>% ### groups the data according to State
        summarize(count = n()) %>% ### count the number of observations per State 
        arrange(desc(count)) ### arrange from highest to lowest

head(most_count, 15)

The table above shows that Texas (48) has the highest number of accidents followed by California (6) and Florida (12).

Plotting the results

We can also plot these results for presentations using the barplot() function that comes with a standard installation of R, but first we'll have to change the names of the States from the numbers that represent them to their true names.

plot_most_count <- most_count %>% ### The table above we just created
                        slice(1:15) %>% ### Get only top 15 States
                        mutate(STATE = c("TX", "CA", "FL", "GA", "NC", "PA", "NY", "OH",
                               "IL", "SD", "MI", "TN", "AR", "MS", "AL")) %>% ### Change to State abbreviated Names
                        as.data.frame() ### save as a data frame
barplot(height = plot_most_count$count, ### variable for y-axis
        names.arg = plot_most_count$STATE, ### variable for x-axis
        las = 1, ### orientation of labels on the x -axis
        cex.lab = 1, ### magnification for the size of the labels on the y and x-axis
        cex.names = 0.8, ### size of the text for the labels on the x and y-axis
        col = rainbow(20), ### color for the bars of the plot
        ylab = "Number of Crashes", ### label for the y-axis
        xlab = "States") ### label for the x-axis
title("Top 15 States\n With Highest \n Number of Traffic Accidents", ###text for title
      line=-4, ### position of the title
      cex.main = 1.2) ###size of the text

If we wanted to find the peak months for traffic accidents in 2015 we can

most_months <- accident_2015 %>% ### data that we read in R
        group_by(MONTH) %>% ### groups the data according to month
        summarize(count = n()) %>% ### find the total number of accidents per month
        mutate(mon_abb = as.factor(month.abb)) %>% ### create a new variable of abbreviations for month names
        as.data.frame() ### save result as a data frame
most_months

It's hard to see a trend or compare values looking at a table. We can better appreciate the results by plotting them and this time we'll use the function ggplot from the package ggplot2. We'll load this package again using the functionlibrary.

most_months$mon_abb <- factor(c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) ### assign factor levels to correctly plot the data

library(ggplot2) ### load ggplot2 package for plotting data
ggplot(data = most_months, aes(x = mon_abb, y = count, group = 1)) + ### select the data and variables we will plot
        geom_line(colour = "red", linetype = "dashed", size = 1.5) + ### plot data as a line
        geom_point(colour = "red", size = 4, shape = 21, fill = "green") + ### plot data as green round points as well
        xlab("Month") + ### add label to x-axis
        ggtitle("Number of Accidents Recorded per Month in 2015") ### add title

We see that the peak months for traffic accidents are the months from July to october. The package FARSread has other functions which will be demonstrated on the second vignette included in this package. Among those functions is a function that creates a map plot of a State with the location of the traffic accidents shown as points in the map.

To gain a better understanding of the data in FARS we have provided below a brief description of the variables in the 2015 FARS data. You'll find it in the appendix at the bottom of this page. For a detailed description of the variables in the data you can download a copy of the pdf file Fatality Analysis Reporting System (FARS) Analytical User's Manual 1975-2016 available from the website https://crashstats.nhtsa.dot.gov/Api/Public/Publication/812447.

You can also view the FARS data from 2015 in R without having to use the system.file function. We have read the file in R for you and saved it in an R object called fars_2015. You can access the first 6 rows of the data by typing the following on the console pane in R Studio:

library(FARSread)
head(fars_2015, 6)

Raw data for the year 2013 and 2014 in .csv.bz2 format are also available with this package. You may access them like in the demonstration above using the system.file function.

Before we end, let's get read of the copy of the accident_2015.csv.bz2 in our working directory since we already have a copy within the package which we can access anytime.

file.remove("./accident_2015.csv.bz2")

This ends the first part of the tutorial for the package FARSread. For the second tutorial, "Other functions in FARSread" we will be demonstrating the use of the function fars_summarize_years function and the fars-map_state function.

The fars_summarize_years function is designed to facilitate creating a table that will show the number of traffic accidents for every month in a single or a number of years. Similar to the table we created above, but with less effort on your part. The fars-map_state function create a map plot using geospatial locationing to plot the location where the accidents occured in a particular State.

We hope that you enjoyed our short presentation.

Appendix

List of variables in the dataset provided in package FARSread

|Variable |Desscription |DataType | |:----------------|:----------------------------------------------------------------------------------------|:--------| |STATE |State number corresponding to a State in the USA |int | |ST_CASE |a unique case number assigned by the data entry system |int | |VE_TOTAL |number of contact motor vehicles involved in the crash |int | |VE_FORMS |number of vehicles in-transport involved in crash. Legally parked vehicles not included |int | |PVH_INVL |count of the number of parked and working vehicles involved |int | |PEDS |number of Person Forms (Not a Motor Vehicle Occupant) |int | |PERNOTMVIT |number of non-motorists in the crash |int | |PERMVIT |number of motorists in the crash |int | |PERSONS |number of all persons involved in the crash |int | |COUNTY |location of the unstabilized event with regard to the County |int | |CITY |location of the unstabilized event with regard to the City |int | |DAY |day of the month on which the crash occurred |int | |MONTH |month in which the crash occurred |int | |YEAR |the year in which the crash occurred |int | |DAY_WEEK |the day of the week on which the crash occurred |int | |HOUR |the hour at which the crash occurred |int | |MINUTE |the minutes after the hour at which the crash occurred |int | |NHS |whether the crash occurred on a trafficway that is part of the National Highway System. |int | |RUR_URB |classification of the segment of the trafficway on which the crash occurred |int | |FUNC_SYS |functional classification of the segment of the trafficway on which the crash occurred |int | |RD_OWNER |legal owner of the segment of the trafficway on which the crash occurred |int | |ROUTE |the route signing of the trafficway on which the crash occurred. |int | |TWAY_ID |the trafficway on which the crash occurred 1982-Later |int | |TWAY_ID2 |the trafficway on which the crash occurred 2004-Later |int | |MILEPT |he milepoint nearest to the location where the crash occurred |int | |LATITUDE |the position of latitude in Global Position coordinates |num | |LONGITUD |the position of longitude in Global Position coordinates |num | |SP_JUR |identifies if the location where the crash occurred qualifies as a Special Jurisdiction |int | |HARM_EV |describes the first injury or damage producing event of the crash |int | |MAN_COLL |describes the orientation of 2 motor vehicles involved in the “First Harmful Event” |int | |RELJCT1 |dentifies the crash's location with respect to presence in an interchange area 1975-2009 |int | |RELJCT2 |dentifies the crash's location with respect to presence in an interchange area 2010-Later|int | |TYP_INT |identifies and allows separation of various intersection types |int | |WRK_ZONE |identifies whether the first harmful event occured within the boundaries of a work zone |int | |REL_ROAD |identifies the location of the crash in relation to its position with the trafficway |int | |LGT_COND |records the type/level of light that existed at the time of the crash |int | |WEATHER1 |the prevailing atmospheric conditions that existed at the time of the crash 1975-2006 |int | |WEATHER2 |the prevailing atmospheric conditions that existed at the time of the crash 2007-Later |int | |SCH_BUS |identifies if a school bus/vehicle functioning as a schoolbus, is related to the crash. |int | |NOT_HOUR |the hour that emergency medical service was notified |int | |NOT_MIN |the minutes after the hour that emergency medical service was notified |int | |ARR_HOUR |the hour that emergency medical service arrived |int | |ARR_MIN |the minutes after the hour that emergency medical service (EMS) arrived |int | |HOSP_HR |the hour that emergency medical service arrived at the treatment facility |int | |HOSP_MN |the minutes after the hour that EMS arrived at the treatment facility |int | |CF1 |factors related to the crash expressed by the investigating officer 1975-1981 |int | |CF2 |factors related to the crash expressed by the investigating officer 1982-2013 |int | |CF3 |factors related to the crash expressed by the investigating officer 2013-later |int | |FATALS |the number of fatally injured persons in the crash |int | |DRUNK_DR |the number of drinking drivers involved in the crash |int |

References



DocOfi/FARSread-Package documentation built on May 21, 2019, 3:07 a.m.