This document will be checking out the PSRC data.
knitr::opts_chunk$set(echo = TRUE, include = TRUE, root.dir = here::here())
library(tidyverse) library(here) library(janitor)
sumtbl <- function(.data, ...) { .data %>% group_by(...) %>% summarize(n = n()) }
sum_by_year <- function(data, sumvar) { data %>% tabyl({{ sumvar }}, survey_year) %>% adorn_totals(where = c("row", "col")) %>% adorn_percentages(denominator = "col") %>% adorn_pct_formatting() %>% adorn_ns() }
Import household data:
hhdat <- PSRCData::hhdat
# hhdat %>% # tabyl(final_home_rgcnum, final_home_uvnum)
hhdat %>% sumtbl(survey_year)
hhdat %>% sumtbl(hhgroup)
hhdat %>% sum_by_year(hhgroup)
hhdat %>% sum_by_year(hhincome_broad)
hhdat %>% sum_by_year(hhincome_detailed)
hhdat %>% filter(!is.na(hhincome_followup)) %>% sum_by_year(hhincome_followup)
hhdat %>% sum_by_year(hhsize)
# hhdat %>% # mutate(hhsize = as.numeric(str_sub(hhsize, start = 1, end = 2))) %>% # mutate(adult_ratio = numadults/hhsize) %>% # sum_by_year(adult_ratio) hhdat %>% sum_by_year(numadults)
hhdat %>% sum_by_year(numworkers)
hhdat %>% filter(survey_year == 2017) %>% tabyl(numworkers, numadults) %>% adorn_totals() %>% adorn_percentages(denominator = "col") %>% adorn_pct_formatting() %>% adorn_ns() %>% adorn_title()
hhdat %>% sum_by_year(lifecycle)
Is lifecycle
exclusionary? How could that work? If not, how is it decided which group they are in?
hhdat %>% filter(lifecycle == "Household includes children under 5" | lifecycle == "Household includes children age 5-17") %>% select(hhid, lifecycle, numchildren, numadults) %>% View(title = "kids") hhdat %>% filter(numchildren == 0) %>% select(hhid, lifecycle, numchildren, numadults) %>% View(title = "nokids")
If they have kids, then they are put into a "kids" group. The group is determined by the youngest kid they have. If under 5, then they are put in "children under 5" group. If over 5, they are put in "children age 5-17" group.
Else, they are put into the appropriate group based on number of hh members and householder age.
hhdat %>% sum_by_year(dayofweek)
Travel diary ONLY Tuesday, Wednesday, Thursday. For "typical day" type of measurement.
hhdat %>% sum_by_year(rent_own)
hhdat %>% sum_by_year(res_type)
Doing this one because I don't understand what it's purpose is.
hhdat %>% select(lifecycle, hhsize, nwkdays, dayofweek) %>% head()
Still don't understand what this means.
What do we have in the person-level data?
Import Person Data
prdat <- PSRCData::prdat
prdat %>% tabyl(student, worker) %>% adorn_percentages() %>% adorn_pct_formatting()
prdat %>% tabyl(commute_mode, student)
prdat %>% sum_by_year(age)
prdat %>% sum_by_year(age_category)
prdat %>% sum_by_year(gender)
Socioeconomic status variables.
prdat %>% sum_by_year(employment)
Who are in the NA
group? I think it's probably ppl under 18
prdat %>% filter(is.na(employment)) %>% pull(age_category) %>% unique() prdat %>% filter(age_category == "Under 18 years") %>% pull(employment) %>% unique()
Are number under 18 and number NA
for employment the same?
prdat %>% filter(is.na(employment)) %>% pull(employment) %>% length() prdat %>% filter(age_category == "Under 18 years") %>% pull(age_category) %>% length()
YES. All NA
values for employment
are ppl under 18.
prdat %>% sum_by_year(jobs_count)
prdat %>% sum_by_year(worker)
prdat %>% filter(age == "16-17 years" | age_category != "Under 18 years") %>% filter(worker == "No jobs") %>% select(hhid, personid, age, employment, jobs_count, worker) %>% View(title = "non-workers")
prdat %>% tabyl(jobs_count, worker)
prdat %>% tabyl(employment, worker)
Difference between worker
and jobs_count
: jobs_count
is only "0 jobs" for 16-17 year-olds. Otherwise, it is NA
. The values for both jobs_count
and worker
are determined by the employment
variable. If employment
is "retired", "homemaker", or "not currently employed", then jobs_count
is NA
, and worker
is "No jobs".
For my analysis, worker
is probably useful enough on it's own. However, jobs_count
is interesting. If ppl have multiple jobs in the same day, they need to go to more places. This could pose issues with taking public transit or other modes besides SOV's, where freedom of movement/flexibility is reduced.
prdat %>% sum_by_year(education)
prdat %>% tabyl(education, age_category) %>% adorn_percentages(denominator = "col") %>% adorn_pct_formatting()
Seattle area has skewed demographics in terms of educational attainment: number of people with bachelor degree and above is extremely high. In particular, number with graduate/post-graduate degree.
prdat %>% sum_by_year(education)
prdat %>% sum_by_year(student)
prdat %>% tabyl(student, age_category)
School type
prdat %>% sum_by_year(schooltype)
prdat %>% tabyl(schooltype, age_category) %>% adorn_percentages(denominator = "col") %>% adorn_pct_formatting()
prdat %>% sum_by_year(workplace)
prdat %>% sum_by_year(license)
Frequency:
prdat %>% sum_by_year(commute_freq)
prdat %>% sum_by_year(commute_mode)
prdat %>% sum_by_year(telecommute_freq)
prdat %>% sum_by_year(school_freq)
School frequency was NOT repeated in 2019.
Times ridden transit in past 30 days
prdat %>% sum_by_year(mode_freq_1)
Times ridden a bike in past 30 days
prdat %>% sum_by_year(mode_freq_2)
Times gone for a walk in past 30 days
prdat %>% sum_by_year(mode_freq_3)
Times used carshare in past 30 days
prdat %>% sum_by_year(mode_freq_4)
Times used rideshare in past 30 days
prdat %>% sum_by_year(mode_freq_5)
prdat %>% sum_by_year(completed_pref_survey)
prdat %>% filter(!(completed_pref_survey == "Yes")) %>% View()
This was only included in 2017.
According to p. 19 of the PSRC 2017 final report, preference survey was the following:
If a person's survey was reported by proxy, then they were not asked opinion and preference questions. This consisted of 211
respondents in 2017.
Safer ways to get to stops
prdat %>% sum_by_year(wbt_transitmore_1)
Increased frequency
prdat %>% sum_by_year(wbt_transitmore_2)
Increased reliability
prdat %>% sum_by_year(wbt_transitmore_3)
Shared use path or protected bike lane
prdat %>% sum_by_year(wbt_bikemore_1)
Neighborhood greenway
prdat %>% sum_by_year(wbt_bikemore_2)
Bike lane
prdat %>% sum_by_year(wbt_bikemore_3)
Shared roadway lane
prdat %>% sum_by_year(wbt_bikemore_4)
End of trip amenities
prdat %>% sum_by_year(wbt_bikemore_5)
Import Trip-level Data
trdat <- PSRCData::trdat
Not sure what this is so i'm checking it out
# trdat %>% pull(personid) trdat %>% filter(personid == 1710065203) %>% pull(daynum) %>% unique()
SO: this person (id: 1710065203
) has 7 daynums recorded. According to the psrc 2017 final report: those that used rSurvey reported their travel for 1 day. Those that used rMove reported their travel for 7 days.
It also says that households were "preassigned" to a Tues, Wed, or Thurs, and then reassigned to week always starting with a Tues if they were using rMove.
modedat <- trdat %>% select(hhid, personid, contains("mode"))
walk2bus <- modedat %>% filter(str_detect(mode_1, "Walk")) %>% filter(str_detect(mode_2, regex("bus", ignore_case = TRUE)) | str_detect(mode_2, regex("rail", ignore_case = TRUE)) )
Which trips even have multiple modes?
multmode <- modedat %>% filter(!is.na(mode_2))
1,584 trips out of 124,516 have multiple modes. Out of those,
multmode %>% group_by(mode_1) %>% summarise(n = n()) %>% arrange(desc(n))
multmode %>% tabyl(mode_1) %>% arrange(desc(n))
multmode %>% tabyl(mode_2) %>% arrange(desc(n)) %>% adorn_pct_formatting()
trdat %>% tabyl(mode_1) %>% arrange(desc(n)) %>% adorn_pct_formatting()
multmode %>% group_by(mode_2) %>% summarise(n = n()) %>% arrange(desc(n))
main_mode
and mode_simple
and driver
Combining these because I want drive alone, drive others, and passenger separated
trdat %>% select(mode_simple, main_mode) %>% mutate(samemode = mode_simple == main_mode) %>% filter(!samemode) %>% group_by(main_mode, mode_simple) %>% summarise(n())
The only difference between mode_simple
and main_mode
is in the SOV vs HOV and driving. Must use driver
variable to determine if someone is a driver
trdat %>% # select(mode_simple, main_mode, driver) %>% group_by(driver) %>% # filter(main_mode == "HOV") %>% # group_by(main_mode) %>% summarise(n())
mode_comb <- trdat %>% # select(mode_simple, main_mode, driver) %>% mutate(mode_combine = case_when( main_mode == "HOV" & driver == "Driver" ~ "DrOth", main_mode == "HOV" & driver == "Passenger" ~ "Pass", TRUE ~ main_mode ) )
mode_comb %>% select(main_mode, driver, mode_combine) %>% View()
This combined variable has now been added to trdat
under the name mode_full_EM
to indicate that I have created a variable that manipulated the mode to be more detailed.
trdat %>% group_by(main_mode) %>% summarise(n = n())
Get all the modes used by each person . Should look like this:
| personid | Bike | Walk | Transit | HOV | SOV | Other | |-------------|---------|---------|-----------|---------|---------|-----------| | 12345 | 1 | 1 | 1 | 0 | 1 | 1 | | 23456 | 0 | 1 | 1 | 0 | 1 | 0 |
This is done in analysis/02_01_prep-mode-data.r
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.