wisdotcrashdatabase makes data pulls and data analysis much easier inside an R environment. To set up an R environment, you must put the files into a SQL database. I wrote automated scripts for these. This package can import data, gather crash flags, relabel by age or injury type for easier grouping, and can look-up county/municipality codes.
Note: This was done on Meredith’s computer, replace DOTMCE with your ID.
We want to connect RStudio to our GitHub repos on https://github-p.dot.state.wi.us/DSP. First, RStudio needs to know where Git is located on our computer.
In RStudio, navigate to Tools -> Global Options -> Git/SVN Add these paths:
Git Executable: C:\Users\DOTMCE\AppData\Local\Programs\Git\cmd\git.exe
SVN: C:/Users/DOTMCE/AppData/Local/Programs/Git/mingw64/libexec/git-core/git-svn
You will also need to generate an SSH key. So click ‘Create SSH Key’ (should be id_ed25519). As an alternative to this type ’ ssh-keygen -t ed25519 -C “your_email@example.com” ’ in the terminal.
This SSH key will have to be added to your GitHub accout. Navigate to C:\Users\DOTMCE\.ssh and open ‘id_ed25519.pub’ in Notepad. Copy this.
Got to your GitHub -> Settings -> SSH and GPG keys -> New SSH key. Paste here.
In order to connect the two, we will need a PAT ()
If you get an error something like ‘SSL Certificate error: local issuer not working’, go to terminal and type ’ git config --global http.sslBackend schannel ’
You’ll need to generate a GitHub Personal Access Token (PAT) and then set your credentials. Save your PAT in a text file on your local drive - you MAY NEED this in the future. I recommend saving it to C:/r_projects/data/git_key.txt Use this key for future needs, instead of generating a new key.
# Create Token
usethis::create_github_token(host = "https://github-p.dot.state.wi.us")
# Set your credentials (may have you login to GitHub)
install.packages("gitcreds")
gitcreds::gitcreds_set("https://github-p.dot.state.wi.us")
Whelp, everything should be set-up! I recommend using Projects, this allows to have relative path names to whatever is inside the Project location. Also you should work locally on drive C since it’ll be much faster than saving everything in the network.
Next, open RStudio and create a project. File -> New Project and save it at C:/r_projects
The package can be installed with:
install.packages("devtools")
devtools::install_github(
repo = "DSP/wisdotcrashdatabase",
host = "github-p.dot.state.wi.us/api/v3",
auth_token = <my_github_pat> # the key you saved in a txt file, remove the brackers
)
These are the dependencies for this package. Run install.packages(“pkgname”)
data.table, dplyr, duckdb, forcats, lubridate, purrr, stringr
This package pulls crash data stored in a SQL (duckdb) database. This database should be saved folder C:/data
To find where this package was installed, run .libPaths() in the terminal. Then navigate to wisdotcrashdatabase/extdata/SAS_to_other_formats
First, the data must be exported from SAS into CSV files; run the ‘Export to CSV.sas’ files to do this. Use the appropriate code for the appropriate crash db year. These CSVs must placed in a single folder and you may have to change the folder path in the code.
Next, run ‘CSV_to_SQL_DuckDb.R’ to export each CSV into a SQL database: this is a batch job and runs only ONCE. Alternatively, 1989 - 2022 (prelim) data is already stored in W:/HSSA/Keep/Jaclyn Ziebert/Data/crashes_duck.duckdb
To update current year (like 2022), ‘run_to_update_2022_crashdata.ps1’ automates this translation from SAS to SQL database by running the command in Shell. Be sure to change the file paths, if needed! I have this set to run every Monday at 8am using Microsoft’s Task Scheduler.
Old (2016 and prior) and new (2017 and recent) database data can be imported into R as a single data frame. If ‘years’ for a new db is selected, then columns will have to be selected. Certain old db columns are renamed to match the new db. Some columns will always be selected (see Documentation). Import data is done with one function:
library(wisdotcrashdatabase)
library(dplyr)
crash <- import_db_data(
db_loc = "C:/data/crashes_duck.duckdb",
db_type = "crash",
years = get_list_of_years("21", "22"),
columns = c("TOTFATL", "TOTINJ")
)
aggregate_crashes_by_crshsvr_tot_inj(crash)
#> # A tibble: 2 × 7
#> year `People injured` `People killed` Fatal Injury `Property Damage`
#> <fct> <int> <int> <int> <int> <int>
#> 1 2021 35676 595 546 26209 101541
#> 2 2022 33823 596 548 25079 103224
#> # ℹ 1 more variable: `Total crashes` <int>
There are also functions to get crash flags. When run, a new column is added with either a “Y” or “N” if that flag exists. Flags include:
import_db_data(
db_loc = "C:/data/crashes_duck.duckdb",
db_type = "person",
years = c("16", "17", "18"),
columns = c("STATNM", "DRVRPC")
) |>
get_driver_flags(flags = "speed") |>
filter(speed_flag == "Y") |>
distinct(CRSHNMBR, .keep_all = TRUE) |>
count(year)
#> Warning: STATNM, DRVRPC not found in new db (ignore if variable is in multiple
#> cols)
#> year n
#> 1: 2016 19540
#> 2: 2017 19182
#> 3: 2018 20061
There’s also functions that may summarize data for analysis. These include:
import_db_data(
db_loc = "C:/data/crashes_duck.duckdb",
db_type = "person",
years = c("16", "17", "18")
) |>
relabel_person_variables(relabel_by = "wisinj") |>
janitor::tabyl(year, inj)
#> year Injured Killed No Injury
#> 2016 43669 588 240916
#> 2017 42178 594 248072
#> 2018 41124 576 255579
Load the wisdotcrashdatabase package in R and type in the console
?import_db_data
or any other function.
GitHub is a place to store, track, and collaborate coding projects in a repository. It cannot store files > 50mb so some data for projects might be saved in another location and you’d have to copy/paste into the repo location. Each repo under DSP is an R project and can be cloned. When it is cloned, then the user can then push code into the repo. The main actions of Git are:
Clone: Copies the repo from GitHub into your personal computer
Commit and push: When changes are made, do this to save the code on GitHub
Pull: Update your local repo based on what is on GitHub repo (not used often)
Occasionally the duckdb package gets updated. When you get an error that says that the old database is not supported, a new one has to be created. Take all CSV files (1989 - 2021) located in a zip (Jaclyn Ziebert/Data/all_crashes_1989_2021.zip) and extract to your local drive. Run CSV_to_SQL_DuckDb.R (script is found in step Setting Up the Environment above) and follow the steps at the bottom. This script will take all CSV and put them in a new duckdb file.
Dave
DSP GitHub has a few repos. You want to clone them so they are saved locally. If it’s cloned, you can also edit the code and push (upload) your changes to GitHub. To clone a repo, navigate to the repo. Then go to Code (in green) -> HTTPS -> Copy the https link.
In RStudio and in Terminal, navigate to the folder where the repo will be cloned to (type ‘cd c:/r_projects’ for example). To clone type ‘git clone <>’
If you can’t install a package in RStudio, try downloading it so you can install it from a local drive. Go to cran, the package name is at the end of the URL. (https://cran.r-project.org/web/packages/lubridate) Download the tar.gz file. If that doesn’t work, submit an IT ticket to unblock CRAN website. Tell them to unlock https://cran.r-project.org/
Do gitcreds::gitcreds_set(), then put in your token (PAT). You will have to then enter your GitHub username and password.
Make the changes in this code. Then change the version number under DESCRIPTION. Push the changes to GitHub. In GitHub under the package, click Release. Draft New Release (or edit the existing one) and use the new version # under Tag Version. You’re all set - install the package so the library on your computer gets updated.
I created a training notebook that gives an overview of this package. There are also a few temples/functions to create charts. These are found wisdotcrashdatabase/inst/extdata/Templates
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.