README.md

wisdotcrashdatabase

wisdotcrashdatabase makes data pulls and data analysis much easier inside an R environment. To set up an R environment, you must put the files into a SQL database. I wrote automated scripts for these. This package can import data, gather crash flags, relabel by age or injury type for easier grouping, and can look-up county/municipality codes.

Installation

Set-Up GitHub

Note: This was done on Meredith’s computer, replace DOTMCE with your ID.

We want to connect RStudio to our GitHub repos on https://github-p.dot.state.wi.us/DSP. First, RStudio needs to know where Git is located on our computer.

In RStudio, navigate to Tools -> Global Options -> Git/SVN Add these paths:

Git Executable: C:\Users\DOTMCE\AppData\Local\Programs\Git\cmd\git.exe

SVN: C:/Users/DOTMCE/AppData/Local/Programs/Git/mingw64/libexec/git-core/git-svn

You will also need to generate an SSH key. So click ‘Create SSH Key’ (should be id_ed25519). As an alternative to this type ’ ssh-keygen -t ed25519 -C “your_email@example.com” ’ in the terminal.

This SSH key will have to be added to your GitHub accout. Navigate to C:\Users\DOTMCE\.ssh and open ‘id_ed25519.pub’ in Notepad. Copy this.

Got to your GitHub -> Settings -> SSH and GPG keys -> New SSH key. Paste here.

In order to connect the two, we will need a PAT ()

If you get an error something like ‘SSL Certificate error: local issuer not working’, go to terminal and type ’ git config --global http.sslBackend schannel ’

You’ll need to generate a GitHub Personal Access Token (PAT) and then set your credentials. Save your PAT in a text file on your local drive - you MAY NEED this in the future. I recommend saving it to C:/r_projects/data/git_key.txt Use this key for future needs, instead of generating a new key.

# Create Token
usethis::create_github_token(host = "https://github-p.dot.state.wi.us")
# Set your credentials (may have you login to GitHub)
install.packages("gitcreds")
gitcreds::gitcreds_set("https://github-p.dot.state.wi.us")

First Project

Whelp, everything should be set-up! I recommend using Projects, this allows to have relative path names to whatever is inside the Project location. Also you should work locally on drive C since it’ll be much faster than saving everything in the network.

Next, open RStudio and create a project. File -> New Project and save it at C:/r_projects

Install this package

The package can be installed with:

install.packages("devtools")
devtools::install_github(
  repo = "DSP/wisdotcrashdatabase",
  host = "github-p.dot.state.wi.us/api/v3",
  auth_token = <my_github_pat> # the key you saved in a txt file, remove the brackers
)

Packages to install

These are the dependencies for this package. Run install.packages(“pkgname”)

data.table, dplyr, duckdb, forcats, lubridate, purrr, stringr

Setting up the R environment

This package pulls crash data stored in a SQL (duckdb) database. This database should be saved folder C:/data

To find where this package was installed, run .libPaths() in the terminal. Then navigate to wisdotcrashdatabase/extdata/SAS_to_other_formats

First, the data must be exported from SAS into CSV files; run the ‘Export to CSV.sas’ files to do this. Use the appropriate code for the appropriate crash db year. These CSVs must placed in a single folder and you may have to change the folder path in the code.

Next, run ‘CSV_to_SQL_DuckDb.R’ to export each CSV into a SQL database: this is a batch job and runs only ONCE. Alternatively, 1989 - 2022 (prelim) data is already stored in W:/HSSA/Keep/Jaclyn Ziebert/Data/crashes_duck.duckdb

To update current year (like 2022), ‘run_to_update_2022_crashdata.ps1’ automates this translation from SAS to SQL database by running the command in Shell. Be sure to change the file paths, if needed! I have this set to run every Monday at 8am using Microsoft’s Task Scheduler.

Importing data

Old (2016 and prior) and new (2017 and recent) database data can be imported into R as a single data frame. If ‘years’ for a new db is selected, then columns will have to be selected. Certain old db columns are renamed to match the new db. Some columns will always be selected (see Documentation). Import data is done with one function:

Example: Number of crashes

library(wisdotcrashdatabase)
library(dplyr)

crash <- import_db_data(
  db_loc = "C:/data/crashes_duck.duckdb",
  db_type = "crash",
  years = get_list_of_years("21", "22"),
  columns = c("TOTFATL", "TOTINJ")
)
aggregate_crashes_by_crshsvr_tot_inj(crash)
#> # A tibble: 2 × 7
#>   year  `People injured` `People killed` Fatal Injury `Property Damage`
#>   <fct>            <int>           <int> <int>  <int>             <int>
#> 1 2021             35676             595   546  26209            101541
#> 2 2022             33823             596   548  25079            103224
#> # ℹ 1 more variable: `Total crashes` <int>

Crash flags functions

There are also functions to get crash flags. When run, a new column is added with either a “Y” or “N” if that flag exists. Flags include:

Example: Number of speed crashes

import_db_data(
  db_loc = "C:/data/crashes_duck.duckdb",
  db_type = "person",
  years = c("16", "17", "18"),
  columns = c("STATNM", "DRVRPC")
) |> 
  get_driver_flags(flags = "speed") |> 
  filter(speed_flag == "Y") |> 
  distinct(CRSHNMBR, .keep_all = TRUE) |> 
  count(year)
#> Warning: STATNM, DRVRPC not found in new db (ignore if variable is in multiple
#> cols)
#>    year     n
#> 1: 2016 19540
#> 2: 2017 19182
#> 3: 2018 20061

Relabel functions

There’s also functions that may summarize data for analysis. These include:

Example: People injured

import_db_data(
  db_loc = "C:/data/crashes_duck.duckdb",
  db_type = "person",
  years = c("16", "17", "18")
) |> 
  relabel_person_variables(relabel_by = "wisinj") |> 
  janitor::tabyl(year, inj)
#>  year Injured Killed No Injury
#>  2016   43669    588    240916
#>  2017   42178    594    248072
#>  2018   41124    576    255579

Other functions

For help

Load the wisdotcrashdatabase package in R and type in the console ?import_db_data or any other function.

Other Help

GitHub Enterprise

GitHub is a place to store, track, and collaborate coding projects in a repository. It cannot store files > 50mb so some data for projects might be saved in another location and you’d have to copy/paste into the repo location. Each repo under DSP is an R project and can be cloned. When it is cloned, then the user can then push code into the repo. The main actions of Git are:

Duckdb

Occasionally the duckdb package gets updated. When you get an error that says that the old database is not supported, a new one has to be created. Take all CSV files (1989 - 2021) located in a zip (Jaclyn Ziebert/Data/all_crashes_1989_2021.zip) and extract to your local drive. Run CSV_to_SQL_DuckDb.R (script is found in step Setting Up the Environment above) and follow the steps at the bottom. This script will take all CSV and put them in a new duckdb file.

Create Repos under DSP

Dave

Clone Repos

DSP GitHub has a few repos. You want to clone them so they are saved locally. If it’s cloned, you can also edit the code and push (upload) your changes to GitHub. To clone a repo, navigate to the repo. Then go to Code (in green) -> HTTPS -> Copy the https link.

In RStudio and in Terminal, navigate to the folder where the repo will be cloned to (type ‘cd c:/r_projects’ for example). To clone type ‘git clone <>’

Can’t download a package?

If you can’t install a package in RStudio, try downloading it so you can install it from a local drive. Go to cran, the package name is at the end of the URL. (https://cran.r-project.org/web/packages/lubridate) Download the tar.gz file. If that doesn’t work, submit an IT ticket to unblock CRAN website. Tell them to unlock https://cran.r-project.org/

Invalid GitHub Credentials

Do gitcreds::gitcreds_set(), then put in your token (PAT). You will have to then enter your GitHub username and password.

Common R Package development functions

Editing this Package

Make the changes in this code. Then change the version number under DESCRIPTION. Push the changes to GitHub. In GitHub under the package, click Release. Draft New Release (or edit the existing one) and use the new version # under Tag Version. You’re all set - install the package so the library on your computer gets updated.

Shortcuts

Training / Other Functions

I created a training notebook that gives an overview of this package. There are also a few temples/functions to create charts. These are found wisdotcrashdatabase/inst/extdata/Templates



jacciz/wisdotcrashdatabase documentation built on June 3, 2023, 2:26 a.m.