Workshop Goals
This presentation is intended for re-use
Presentation hotkeys
| Key | Action                                   |
|:----|:-----------------------------------------|
| C   | Show table of contents                   |
| F   | Toggles the display of the footer        |
| A   | Toggles display of current vs all slides |
| S   | Make fonts smaller                       |
| B   | Make fonts larger                        |
Click here for latest installation instructions
Instructions provide explicit download links for
Once installed, open RStudio
Make sure you have the "summarizeNHTS" R software package installed
install.packages("devtools")
devtools::install_github("Westat-Transportation/summarizeNHTS")
And now that we have installed the necessary software, load the software
library(summarizeNHTS)
OK, we're ready!
Before we begin, let's make sure the `summarizeNHTS` package is loaded.
library(summarizeNHTS)
# Not Run
download_nhts_data("2001")
download_nhts_data("2009")
download_nhts_data("2017")
The `download_nhts_data` function.
`download_nhts_data` parameters
nhts_data <- read_data("2017", "C:/NHTS")
`read_data` is a function that reads and compiles data from CSVs.
`nhts_data` is the new object we created.
`'2017'` specifies that we are working with the 2017 dataset.
`"C:/NHTS"` is the folder containing the data (the one used by the `download_nhts_data` function).
Explore the `nhts_data` object we created, using `$` to reach its elements.
Use the `summary` function to get an overview of the data structure.
summary(nhts_data$data)
The `data` element includes four `data.table` objects.
`data.table`s are 2-dimensional data structures (rows x columns).
They extend the `data.frame` structure (with enhanced functionality).
nhts_data$data$vehicle
# By position
nhts_data$data$vehicle[, c(1, 3)]
# By name (single variable)
nhts_data$data$vehicle$ANNMILES
# By name
nhts_data$data$vehicle[, list(HOUSEID, ANNMILES)]
# By row numbers (first 5 rows)
nhts_data$data$vehicle[1:5, ]
# By condition
nhts_data$data$vehicle[VEHTYPE == "01", ]
# By condition (multiple values)
nhts_data$data$vehicle[VEHTYPE %in% c("01","02"), ]
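The same selections can be tried without the NHTS files. The sketch below builds a small stand-in table (the values are invented, and the column set is only an assumption about what the vehicle table looks like) and applies the column and row selections shown above.

```r
library(data.table)

# Toy stand-in for nhts_data$data$vehicle; all values are invented for illustration.
vehicle <- data.table(
  HOUSEID  = c("30000007", "30000007", "30000008", "30000012"),
  VEHID    = c("01", "02", "01", "01"),
  VEHTYPE  = c("01", "03", "02", "01"),
  ANNMILES = c(12000, 8500, 4300, 15000)
)

# Columns by position, then the same columns by name
by_position <- vehicle[, c(1, 4)]
by_name     <- vehicle[, list(HOUSEID, ANNMILES)]

# Rows by condition; VEHTYPE is stored as character, so the codes are quoted
type_01    <- vehicle[VEHTYPE == "01", ]
type_01_02 <- vehicle[VEHTYPE %in% c("01", "02"), ]

print(type_01)
```

Inside the square brackets, column names such as `VEHTYPE` are resolved against the table itself, which is why no `vehicle$` prefix is needed.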
`codebook_2001`, `codebook_2009`, `codebook_2017`
# 2017 variables table
head(codebook_2017$variables)
# 2017 values table
head(codebook_2017$values)
`summarize_data` function
`summarize_data` parameters
`summarize_data`
The `summarizeNHTS` package has built-in functions for running complex queries on the NHTS dataset.
`summarize_data` is the workhorse function behind these queries.
summarize_data(
  data = nhts_data,
  agg = "household_count"
)
What do these values mean?
Every `summarize_data` query will return these fields.
`summarize_data` Parameters
summarize_data(
  data = nhts_data,
  agg = "household_count"
)
`data` - NHTS dataset object
Created by `read_data`.
Here, the `nhts_data` object.
`agg` - Aggregate function label
We used `'household_count'`, but `agg` could be a number of other labels.
Use the `by` parameter to group by metropolitan status.
summarize_data(
  data = nhts_data,
  agg = "household_count",
  by = "IS_METRO"
)
Several variables can be passed to the `by` parameter.
summarize_data(
  data = nhts_data,
  agg = "household_count",
  by = c("IS_METRO","HOMEOWN")
)
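As a rough illustration of what a grouped count involves, the sketch below sums survey weights within each combination of two grouping variables. It uses base R on invented data; the `WEIGHT` column is a hypothetical name, and this is not the package's implementation.

```r
# Toy household table; WEIGHT is a hypothetical survey-weight column and all values are invented.
households <- data.frame(
  IS_METRO = c("1", "1", "1", "2", "2"),
  HOMEOWN  = c("01", "02", "01", "01", "02"),
  WEIGHT   = c(150, 200, 100, 300, 250),
  stringsAsFactors = FALSE
)

# Weighted count per group: sum the weights within each IS_METRO x HOMEOWN cell
weighted <- aggregate(WEIGHT ~ IS_METRO + HOMEOWN, data = households, FUN = sum)

# Unweighted sample size per group
sample_n <- aggregate(WEIGHT ~ IS_METRO + HOMEOWN, data = households, FUN = length)

# Combine both into one table; the shared WEIGHT column gets the two suffixes
result <- merge(weighted, sample_n,
                by = c("IS_METRO", "HOMEOWN"),
                suffixes = c("_SUM", "_N"))
print(result)
```

Each added grouping variable splits the existing cells further, so sample sizes per cell shrink quickly as more variables are listed.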
`'household_count'`, `'person_count'`, `'trip_count'`, `'vehicle_count'`
# Person count
summarize_data(
  data = nhts_data,
  agg = "person_count"
)
`prop` parameter
Used when a `by` variable is specified
# Proportion of persons by WORKER, worker status
summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = "WORKER",
  prop = TRUE
)
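Conceptually, a proportion is each group's count divided by the total across groups. The base R sketch below shows that arithmetic on invented data; the `WEIGHT` column is a hypothetical name and the package may compute its estimates differently.

```r
# Toy person table; WEIGHT is a hypothetical survey-weight column and all values are invented.
persons <- data.frame(
  WORKER = c("01", "01", "02", "02", "01"),
  WEIGHT = c(400, 100, 300, 100, 100),
  stringsAsFactors = FALSE
)

# Weighted count per worker status
counts <- aggregate(WEIGHT ~ WORKER, data = persons, FUN = sum)

# Proportion: each group's weighted count divided by the overall weighted count
counts$PROP <- counts$WEIGHT / sum(counts$WEIGHT)
print(counts)
```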
`'sum'`, `'avg'`, `'median'`
`agg_var` parameter
# Average TRPMILES, trip distance in miles
summarize_data(
  data = nhts_data,
  agg = "avg",
  agg_var = "TRPMILES"
)
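NHTS numeric variables use negative codes (-1, -7, -8, -9) for missing responses, so a numeric aggregate has to drop them first. The sketch below shows the effect on a plain, unweighted mean of invented distances; it illustrates the idea only and ignores the survey weights the package applies.

```r
# Toy trip distances in miles; the negative entries mimic NHTS missing-value codes.
trpmiles <- c(2.5, -9, 10.0, -1, 4.0, -7, 7.5)

# Codes that flag a missing or skipped response rather than a real measurement
missing_codes <- c(-1, -7, -8, -9)

# Drop the coded values before aggregating
valid <- trpmiles[!trpmiles %in% missing_codes]

naive_mean <- mean(trpmiles)  # distorted by the negative codes
clean_mean <- mean(valid)     # average over real responses only
print(c(naive = naive_mean, clean = clean_mean))
```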
Notes
`summarize_data` handles missing value (-1, -7, -8, -9) exclusion for numeric aggregates.
`'household_trip_rate'` - Daily Person Trips per Household
`'person_trip_rate'` - Daily Person Trips per Person
# Daily person trips by worker status
summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER"
)
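A trip rate is a ratio: total trips divided by total persons (or households) in the same group. The sketch below computes that ratio by worker status on invented data. The `WEIGHT` columns and the weighting scheme are assumptions for illustration; the package's actual weights and scaling may differ.

```r
# Toy tables; WEIGHT is a hypothetical survey-weight column and all values are invented.
persons <- data.frame(
  PERSONID = c(1, 2, 3, 4),
  WORKER   = c("01", "01", "02", "02"),
  WEIGHT   = c(100, 200, 150, 50),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  PERSONID = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  WEIGHT   = c(100, 100, 100, 200, 200, 150, 150, 150, 150)
)

# Attach worker status to each trip
trips <- merge(trips, persons[, c("PERSONID", "WORKER")], by = "PERSONID")

# Weighted trips and weighted persons per worker status
trip_totals   <- aggregate(WEIGHT ~ WORKER, data = trips,   FUN = sum)
person_totals <- aggregate(WEIGHT ~ WORKER, data = persons, FUN = sum)

# Rate = weighted trips / weighted persons (person 4 made no trips but still counts)
rates <- merge(trip_totals, person_totals, by = "WORKER",
               suffixes = c("_TRIPS", "_PERSONS"))
rates$TRIP_RATE <- rates$WEIGHT_TRIPS / rates$WEIGHT_PERSONS
print(rates)
```

Keeping persons with zero trips in the denominator matters: dropping them would overstate the rate.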
summarize_data
Pre-aggregation subset conditions can be specified using the `subset` parameter.
Subsetting character variables
# Distribution of social/recreational trips by travel day
summarize_data(
  data = nhts_data,
  agg = "trip_count",
  by = "TRAVDAY",
  prop = TRUE,
  subset = "WHYTRP90 %in% c('07','08','10')"
)
# Person trip rate by Sex (for millennials)
summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "R_SEX",
  subset = "R_AGE >= 18 & R_AGE <= 34"
)
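The `subset` condition is passed as a character string. One way such a string can be applied to a table is to parse it and evaluate it with the table's columns in scope, as sketched below in base R on invented data. This is an assumption about the general technique, not a description of how the package does it.

```r
# Toy person table with invented values.
persons <- data.frame(
  R_AGE = c(12, 18, 25, 34, 35, 60),
  R_SEX = c("01", "02", "01", "02", "01", "02"),
  stringsAsFactors = FALSE
)

# The condition arrives as text, as in the subset argument above
condition <- "R_AGE >= 18 & R_AGE <= 34"

# Parse the text into an expression and evaluate it with the columns in scope
keep <- eval(parse(text = condition), envir = persons)

subsetted <- persons[keep, ]
print(subsetted)
```

Because character variables hold codes such as `'07'`, their conditions need quoted values, while numeric variables such as `R_AGE` are compared directly.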
?summarize_data
Example Derived Variable Coding Scenario
1) Someone's interested in querying the NHTS for a particular travel behavior
Anthony: "I am interested in exploring how financial burden may affect travel."
Alex: "Remember that question about walking to save money? I would include that in your analysis."
2) Consider the suggested variable's usefulness for Anthony's analysis:
WALK2SAVE: "I walk to places to save money."
Values:
| Value | Label                     |
|:------|---------------------------|
| 01    | Strongly agree            |
| 02    | Agree                     |
| 03    | Neither Agree or Disagree |
| 04    | Disagree                  |
| 05    | Strongly disagree         |
3) Look for other potential ways of manipulating this variable for analysis
4) Create variable called WALK_FINANCE, a yes/no variable for the binary analysis question, "who does or does not walk to save money?"
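The recode in step 4 can be sketched in base R as follows. The cut point is an assumption made for illustration: "Strongly agree" and "Agree" are treated as yes and everything else as no, which the analyst may well define differently.

```r
# Toy responses to WALK2SAVE using the codes from the table above.
walk2save <- c("01", "02", "03", "04", "05", "02")

# ASSUMPTION: "Strongly agree" (01) and "Agree" (02) count as walking to save money;
# every other response counts as not doing so.
walk_finance <- ifelse(walk2save %in% c("01", "02"), "Yes", "No")

print(table(walk_finance))
```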
A derived variable template file is included in summarizeNHTS
Create basic and complex variables with your own logic
Variables loaded automatically for you by read_data()
The derived variable template file preserves details of coding for your documentation
Derived variable file requirements
| Item   | Description                                               |
|:-------|:----------------------------------------------------------|
| NAME   | The name of the variable as it will appear in the dataset |
| TABLE  | The table level this variable is being computed for       |
| TYPE   | Data type (numeric or character)                          |
| DOMAIN | Logical expression that decides value assignment          |
| VALUE  | A variable code value                                     |
| LABEL  | Description of code value                                 |
Using the Derived Variables file, create a variable with the following requirements:
| NAME        | TABLE     | TYPE      | DOMAIN        | VALUE | LABEL |
|:------------|:----------|:----------|:--------------|:------|:------|
| HAS_VEHICLE | household | character | HHVEHCNT > 0  | 1     | Yes   |
| HAS_VEHICLE | household | character | HHVEHCNT == 0 | 2     | No    |
Using the Derived Variables file, create a variable with the following requirements:
| NAME      | TABLE  | TYPE      | DOMAIN                    | VALUE | LABEL        |
|:----------|:-------|:----------|:--------------------------|:------|:-------------|
| AGE_GROUP | person | character | R_AGE >= 0 & R_AGE <= 17  | 1     | Child        |
| AGE_GROUP | person | character | R_AGE >= 18 & R_AGE <= 44 | 2     | Young Adult  |
| AGE_GROUP | person | character | R_AGE >= 45 & R_AGE <= 65 | 3     | Middle Adult |
| AGE_GROUP | person | character | R_AGE >= 66               | 4     | Older Adult  |
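To see how DOMAIN and VALUE rows translate into a coded variable, the sketch below applies the AGE_GROUP rules to a toy table in base R. It evaluates each DOMAIN expression against the data and writes the matching VALUE; this is one plausible reading of the file format, not the package's own loader.

```r
# Rules copied from the AGE_GROUP table above; each row pairs a DOMAIN expression with a VALUE.
rules <- data.frame(
  DOMAIN = c("R_AGE >= 0 & R_AGE <= 17",
             "R_AGE >= 18 & R_AGE <= 44",
             "R_AGE >= 45 & R_AGE <= 65",
             "R_AGE >= 66"),
  VALUE  = c("1", "2", "3", "4"),
  LABEL  = c("Child", "Young Adult", "Middle Adult", "Older Adult"),
  stringsAsFactors = FALSE
)

# Toy person table with invented ages.
persons <- data.frame(R_AGE = c(5, 17, 18, 44, 45, 65, 66, 90))

# Start with NA, then assign VALUE wherever the DOMAIN expression is TRUE
persons$AGE_GROUP <- NA_character_
for (i in seq_len(nrow(rules))) {
  matches <- eval(parse(text = rules$DOMAIN[i]), envir = persons)
  persons$AGE_GROUP[matches] <- rules$VALUE[i]
}

print(persons)
```

Rows that match no DOMAIN stay `NA`, so it is worth checking that the expressions cover every value of interest and do not overlap.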
Using the Derived Variables file, create a variable with the following requirements:
| NAME     | TABLE  | TYPE      | DOMAIN         | VALUE | LABEL |
|:---------|:-------|:----------|:---------------|:------|:------|
| USES_TNC | person | character | RIDESHARE > 0  | 1     | Yes   |
| USES_TNC | person | character | RIDESHARE == 0 | 2     | No    |
Using the Derived Variables file, create a variable with the following requirements:
| NAME     | TABLE     | TYPE      | DOMAIN                        | VALUE | LABEL |
|:---------|:----------|:----------|:------------------------------|:------|:------|
| IS_METRO | household | character | MSACAT %in% c('01','02','03') | 1     | Yes   |
| IS_METRO | household | character | MSACAT %in% c('04')           | 2     | No    |
`make_table` - Create report-ready, formatted tables.
`make_chart` - Create interactive bar charts.
`make_map` - Create interactive choropleth maps.
Assign the output of `summarize_data` to a new object.
statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER"
)
make_table(statistic)
statistic <- summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = c("TRAVDAY","OCCAT","EDUC"),
  exclude_missing = TRUE
)
make_table(
  tbl = statistic,
  title = "Table 1: Distribution of Persons (%) by Travel Day, Job Category, and Educational Attainment",
  output = c(W = "Weighted Percentage", N = "Sample Size"),
  row_vars = c("EDUC","OCCAT")
)
statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER",
  exclude_missing = TRUE
)
make_chart(statistic)
Person Trip Rate by Sex, Worker Status, and Travel Day of Week
statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = c("R_SEX","WORKER","TRAVDAY"),
  exclude_missing = TRUE
)
# Specify fill and facet
make_chart(
  tbl = statistic,
  fill = "WORKER",
  facet = "TRAVDAY",
  palette = "Accent"
)
statistic <- summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = "CENSUS_D"
)
make_map(statistic)
Census Regions: `census_region_layer`
Census Divisions: `census_division_layer`
States: `state_layer` / `state_tile_layer`
CBSA: `cbsa_layer`
Include a second table grouping by the original geography plus one variable.
statistic1 <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "HHSTFIPS",
  exclude_missing = TRUE
)
statistic2 <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = c("HHSTFIPS","WORKER"),
  exclude_missing = TRUE
)
map <- make_map(
  tbl = statistic1,
  tbl2 = statistic2
)
map
All three visualization functions support the same value formatting options:
`digits` - Number of decimal places to use
`percentage` - Treat proportions as percentages
`scientific` - Use scientific notation
`multiplier` - A value multiplier
Formatting example with `make_table`
statistic <- summarize_data(
  data = nhts_data,
  agg = "trip_count",
  by = "PRMACT",
  exclude_missing = TRUE
)
make_table(
  tbl = statistic,
  title = "Trip Count by Primary Activity (in Millions)",
  output = c(W = "Trip Count (Millions)", E = "SE"),
  digits = 0,
  multiplier = 1000000
)
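The effect of `digits` and `multiplier` can be approximated with plain arithmetic. The sketch below assumes that `multiplier` rescales values by division, which the "in Millions" title suggests but the source does not state; the counts are invented.

```r
# Invented weighted trip counts.
trip_counts <- c(123456789, 98765432, 4600000)

# ASSUMPTION: multiplier rescales by division, so 1000000 expresses values in millions.
multiplier <- 1000000
digits <- 0

# Rescale, then round to the requested number of decimal places
scaled <- round(trip_counts / multiplier, digits)
print(scaled)
```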