This document will be checking out the PSRC data.

knitr::opts_chunk$set(echo = TRUE, include = TRUE, root.dir = here::here())
library(tidyverse)
library(here)
library(janitor)

Code for building tables, etc.

Summary table:

sumtbl <- function(.data, ...) {
  .data %>% 
    group_by(...) %>% 
    summarize(n = n())
}

Crosstab:

sum_by_year <- function(data, sumvar) {
  data %>% 
    tabyl({{ sumvar }}, survey_year) %>% 
    adorn_totals(where = c("row", "col")) %>%
    adorn_percentages(denominator = "col") %>% 
      adorn_pct_formatting() %>% 
    adorn_ns()
}

Household Level {.tabset .tabset-fade .tabset-pills}

Import household data:

hhdat <- PSRCData::hhdat

other stuff

# hhdat %>% 
#   tabyl(final_home_rgcnum, final_home_uvnum)

Basic counts

How many households from 2017 vs 2019

hhdat %>% sumtbl(survey_year)

Number of smartphone vs travel diary

hhdat %>% 
  sumtbl(hhgroup)

Cross of those two: how many in each survey year

hhdat %>% 
  sum_by_year(hhgroup)

Income

Broad income (everyone answered)

hhdat %>% 
  sum_by_year(hhincome_broad)

Detailed income

hhdat %>% 
  sum_by_year(hhincome_detailed)

Follow-up categories if they said "prefer not to answer" to the detailed question

hhdat %>% 
  filter(!is.na(hhincome_followup)) %>% 
  sum_by_year(hhincome_followup)

Household size

hhdat %>% 
  sum_by_year(hhsize)

Number adults and children

# hhdat %>% 
#   mutate(hhsize = as.numeric(str_sub(hhsize, start = 1, end = 2))) %>% 
#   mutate(adult_ratio = numadults/hhsize) %>%  
#   sum_by_year(adult_ratio)

hhdat %>% sum_by_year(numadults)

Number of workers

hhdat %>% 
  sum_by_year(numworkers)

Number of workers by number of adults for 2017

hhdat %>% 
  filter(survey_year == 2017) %>% 
  tabyl(numworkers, numadults) %>% 
  adorn_totals() %>% 
  adorn_percentages(denominator = "col") %>% 
  adorn_pct_formatting() %>% 
  adorn_ns() %>% 
  adorn_title()

Life Cycle

hhdat %>% 
  sum_by_year(lifecycle)

Is lifecycle exclusionary? How could that work? If not, how is it decided which group they are in?

hhdat %>% 
  filter(lifecycle == "Household includes children under 5" | lifecycle == "Household includes children age 5-17") %>% 
  select(hhid, lifecycle, numchildren, numadults) %>% 
  View(title = "kids")

hhdat %>% 
  filter(numchildren == 0) %>% 
  select(hhid, lifecycle, numchildren, numadults) %>% 
  View(title = "nokids")

If they have kids, then they are put into a "kids" group. The group is determined by the youngest kid they have. If under 5, then they are put in "children under 5" group. If over 5, they are put in "children age 5-17" group.

Else, they are put into the appropriate group based on number of hh members and householder age.

Day of week of diary

hhdat %>% 
  sum_by_year(dayofweek)

Travel diary ONLY Tuesday, Wednesday, Thursday. For "typical day" type of measurement.

Residence type

Rent/own

hhdat %>% 
  sum_by_year(rent_own)

Residence type

hhdat %>% 
  sum_by_year(res_type)

Number of weekdays

Doing this one because I don't understand what it's purpose is.

hhdat %>% 
  select(lifecycle, hhsize, nwkdays, dayofweek) %>% 
  head()

Still don't understand what this means.

Person Level {.tabset .tabset-fade .tabset-pills}

What do we have in the person-level data?

Import Person Data

prdat <- PSRCData::prdat

Other stuff

prdat %>% 
  tabyl(student, worker) %>% 
  adorn_percentages() %>% adorn_pct_formatting()
prdat %>% 
  tabyl(commute_mode, student)

Demographics

Age

prdat %>% 
  sum_by_year(age)
prdat %>% 
  sum_by_year(age_category)

Gender

prdat %>% 
  sum_by_year(gender)

SES

Socioeconomic status variables.

Employment

General employment variable

prdat %>% 
  sum_by_year(employment)

Who are in the NA group? I think it's probably ppl under 18

prdat %>% 
  filter(is.na(employment)) %>%
  pull(age_category) %>% 
  unique()

prdat %>% 
  filter(age_category == "Under 18 years") %>% 
  pull(employment) %>%
  unique()

Are number under 18 and number NA for employment the same?

prdat %>% filter(is.na(employment)) %>% pull(employment) %>% length()
prdat %>% filter(age_category == "Under 18 years") %>% pull(age_category) %>% length()

YES. All NA values for employment are ppl under 18.

Number of jobs

prdat %>% 
  sum_by_year(jobs_count)

Worker

prdat %>% 
  sum_by_year(worker)
prdat %>% 
  filter(age == "16-17 years" | age_category != "Under 18 years") %>% 
  filter(worker == "No jobs") %>%
  select(hhid, personid, age, employment, jobs_count, worker) %>% 
  View(title = "non-workers")
prdat %>% 
  tabyl(jobs_count, worker)
prdat %>% 
  tabyl(employment, worker)

Difference between worker and jobs_count: jobs_count is only "0 jobs" for 16-17 year-olds. Otherwise, it is NA. The values for both jobs_count and worker are determined by the employment variable. If employment is "retired", "homemaker", or "not currently employed", then jobs_count is NA, and worker is "No jobs".

For my analysis, worker is probably useful enough on it's own. However, jobs_count is interesting. If ppl have multiple jobs in the same day, they need to go to more places. This could pose issues with taking public transit or other modes besides SOV's, where freedom of movement/flexibility is reduced.

Education

Education

prdat %>% 
  sum_by_year(education)
prdat %>% 
  tabyl(education, age_category) %>% 
  adorn_percentages(denominator = "col") %>% adorn_pct_formatting()

Seattle area has skewed demographics in terms of educational attainment: number of people with bachelor degree and above is extremely high. In particular, number with graduate/post-graduate degree.

prdat %>% 
  sum_by_year(education)

Student

prdat %>% 
  sum_by_year(student)
prdat %>% 
  tabyl(student, age_category)

School type

prdat %>% 
  sum_by_year(schooltype)
prdat %>% 
  tabyl(schooltype, age_category) %>% 
  adorn_percentages(denominator = "col") %>% adorn_pct_formatting()

Travel Behavior

Workplace

prdat %>% 
  sum_by_year(workplace)

License

prdat %>% 
  sum_by_year(license)

Commute

Frequency:

prdat %>% 
  sum_by_year(commute_freq)
prdat %>% 
  sum_by_year(commute_mode)
prdat %>% 
  sum_by_year(telecommute_freq)
prdat %>% 
  sum_by_year(school_freq)

School frequency was NOT repeated in 2019.

Mode

Times ridden transit in past 30 days

prdat %>% 
  sum_by_year(mode_freq_1)

Times ridden a bike in past 30 days

prdat %>% 
  sum_by_year(mode_freq_2)

Times gone for a walk in past 30 days

prdat %>% 
  sum_by_year(mode_freq_3)

Times used carshare in past 30 days

prdat %>% 
  sum_by_year(mode_freq_4)

Times used rideshare in past 30 days

prdat %>% 
  sum_by_year(mode_freq_5)

Attitudes, opinions

Preference survey

prdat %>% 
  sum_by_year(completed_pref_survey)
prdat %>% 
  filter(!(completed_pref_survey == "Yes")) %>% 
  View()

This was only included in 2017.

According to p. 19 of the PSRC 2017 final report, preference survey was the following:

If a person's survey was reported by proxy, then they were not asked opinion and preference questions. This consisted of 211 respondents in 2017.

Use more transit

Safer ways to get to stops

prdat %>% 
  sum_by_year(wbt_transitmore_1)

Increased frequency

prdat %>% 
  sum_by_year(wbt_transitmore_2)

Increased reliability

prdat %>% 
  sum_by_year(wbt_transitmore_3)

Use more bike

Shared use path or protected bike lane

prdat %>% 
  sum_by_year(wbt_bikemore_1)

Neighborhood greenway

prdat %>% 
  sum_by_year(wbt_bikemore_2)

Bike lane

prdat %>% 
  sum_by_year(wbt_bikemore_3)

Shared roadway lane

prdat %>% 
  sum_by_year(wbt_bikemore_4)

End of trip amenities

prdat %>% 
  sum_by_year(wbt_bikemore_5)

Trip Level {.tabset .tabset-fade .tabset-pills}

Import Trip-level Data

trdat <- PSRCData::trdat

Diary day sequence

Not sure what this is so i'm checking it out

# trdat %>% pull(personid)
trdat %>% 
  filter(personid == 1710065203) %>% 
  pull(daynum) %>% 
  unique()

SO: this person (id: 1710065203) has 7 daynums recorded. According to the psrc 2017 final report: those that used rSurvey reported their travel for 1 day. Those that used rMove reported their travel for 7 days.

It also says that households were "preassigned" to a Tues, Wed, or Thurs, and then reassigned to week always starting with a Tues if they were using rMove.

Mode stuff

modedat <- trdat %>% 
  select(hhid, personid, contains("mode"))

Multi-mode travel

walk2bus <- modedat %>% 
  filter(str_detect(mode_1, "Walk")) %>% 
  filter(str_detect(mode_2, regex("bus", ignore_case = TRUE)) | 
           str_detect(mode_2, regex("rail", ignore_case = TRUE)) 
         )

Which trips even have multiple modes?

multmode <- modedat %>% 
  filter(!is.na(mode_2))

1,584 trips out of 124,516 have multiple modes. Out of those,

multmode %>% 
  group_by(mode_1) %>% 
  summarise(n = n()) %>% 
  arrange(desc(n))
multmode %>% 
  tabyl(mode_1) %>% 
  arrange(desc(n))
multmode %>% 
  tabyl(mode_2) %>% 
  arrange(desc(n)) %>% 
  adorn_pct_formatting()
trdat %>% 
  tabyl(mode_1) %>% 
  arrange(desc(n)) %>% 
  adorn_pct_formatting()
multmode %>% 
  group_by(mode_2) %>% 
  summarise(n = n()) %>% 
  arrange(desc(n))

Combine main_mode and mode_simple and driver

Combining these because I want drive alone, drive others, and passenger separated

trdat %>% 
  select(mode_simple, main_mode) %>% 
  mutate(samemode = mode_simple == main_mode) %>% 
  filter(!samemode) %>% 
  group_by(main_mode, mode_simple) %>% 
  summarise(n())

The only difference between mode_simple and main_mode is in the SOV vs HOV and driving. Must use driver variable to determine if someone is a driver

trdat %>% 
  # select(mode_simple, main_mode, driver) %>% 
  group_by(driver) %>% 
  # filter(main_mode == "HOV") %>% 
  # group_by(main_mode) %>% 
  summarise(n())
mode_comb <- trdat %>% 
  # select(mode_simple, main_mode, driver) %>% 
  mutate(mode_combine = 
           case_when(
             main_mode == "HOV" & driver == "Driver"  ~ "DrOth",
             main_mode == "HOV" & driver == "Passenger" ~ "Pass",
             TRUE                                        ~ main_mode
           )
         )
mode_comb %>% 
  select(main_mode, driver, mode_combine) %>% 
  View()

This combined variable has now been added to trdat under the name mode_full_EM to indicate that I have created a variable that manipulated the mode to be more detailed.

First travel beh LCA: Main mode

trdat %>% 
  group_by(main_mode) %>% 
  summarise(n = n())

Get all the modes used by each person . Should look like this:

| personid | Bike | Walk | Transit | HOV | SOV | Other | |-------------|---------|---------|-----------|---------|---------|-----------| | 12345 | 1 | 1 | 1 | 0 | 1 | 1 | | 23456 | 0 | 1 | 1 | 0 | 1 | 0 |

This is done in analysis/02_01_prep-mode-data.r

Second travel beh LCA: modified main mode



e-mcbride/PSRC.analysis documentation built on April 14, 2022, 5:29 p.m.