  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

Setting Up an R Analysis Environment

Follow the latest installation instructions, which provide explicit download links for R and RStudio.

Once both are installed, open RStudio.

Setting Up an R Analysis Environment (cont.)

Make sure you have the "summarizeNHTS" R software package installed


Now that the necessary software is installed, load the package.


OK, we're ready!

Accessing the Data

Before we begin, let's make sure the summarizeNHTS package is loaded.


Accessing the Data: Downloading NHTS Data

# Not Run

Accessing the Data: Reading the 2017 NHTS data

nhts_data <- read_data("2017", "C:/NHTS")

Accessing the Data: Summarizing the data object


Accessing the Data: Snapshot of the vehicle data


Accessing the Data: Subsetting

# By position
nhts_data$data$vehicle[, c(1, 3)]

# By name (single variable)

# By name (multiple variables)
nhts_data$data$vehicle[, list(HOUSEID, ANNMILES)]

# By row numbers (first 5 rows)
nhts_data$data$vehicle[1:5, ]

# By condition
nhts_data$data$vehicle[VEHTYPE == "01", ]

# By condition (multiple values)
nhts_data$data$vehicle[VEHTYPE %in% c("01","02"), ]
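The bracket syntax above (bare column names inside `[ ]`) is data.table's `DT[i, j]` form, with row conditions in `i` and column selections in `j`. To try each pattern without downloading anything, here is a small sketch that assumes the data.table package is installed and uses a made-up stand-in for the vehicle table. The values are placeholders, not NHTS data.

```r
library(data.table)

# Toy stand-in for nhts_data$data$vehicle (placeholder values, not NHTS data)
vehicle <- data.table(
  HOUSEID  = c("1001", "1001", "1002", "1003", "1003"),
  VEHID    = c("01", "02", "01", "01", "02"),
  VEHTYPE  = c("01", "02", "03", "01", "01"),
  ANNMILES = c(12000, 8000, 15000, 5000, 9500)
)

vehicle[, c(1, 4)]                      # columns by position
vehicle[, ANNMILES]                     # one column by name, returned as a vector
vehicle[, list(HOUSEID, ANNMILES)]      # several columns by name
vehicle[1:2, ]                          # rows by number
vehicle[VEHTYPE == "01", ]              # rows matching one code
vehicle[VEHTYPE %in% c("01", "02"), ]   # rows matching any of several codes
```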

Accessing the Data: Codebook objects

# 2017 variables table

# 2017 values table

Generating Estimates

Generating Estimates: Introduction to summarize_data

summarize_data(
  data = nhts_data,
  agg = "household_count"
)

Generating Estimates: Exploring summarize_data Parameters

summarize_data(
  data = nhts_data,
  agg = "household_count"
)

Generating Estimates: Grouping by Variables

summarize_data(
  data = nhts_data,
  agg = "household_count",
  by = "IS_METRO"
)

summarize_data(
  data = nhts_data,
  agg = "household_count",
  by = c("IS_METRO", "HOMEOWN")
)
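Conceptually, a weighted count sums the survey weights of the records in each group. The base-R sketch below illustrates that idea for one and two grouping variables on a toy household table. The `weight` column and all values are hypothetical, and the sketch omits the standard errors that summarize_data can report.

```r
# Toy household table: each row is a sampled household with a hypothetical
# survey weight (the number of households it represents).
hh <- data.frame(
  IS_METRO = c("1", "1", "2", "1", "2"),
  HOMEOWN  = c("01", "02", "01", "01", "01"),
  weight   = c(1500, 2000, 800, 1200, 500),
  stringsAsFactors = FALSE
)

# Overall weighted household count: the sum of the weights
total <- sum(hh$weight)

# Weighted count by one grouping variable
by_metro <- aggregate(weight ~ IS_METRO, data = hh, FUN = sum)

# Weighted count by two grouping variables (one row per combination present)
by_metro_own <- aggregate(weight ~ IS_METRO + HOMEOWN, data = hh, FUN = sum)

print(total)
print(by_metro)
print(by_metro_own)
```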

Generating Estimates: Frequencies/Proportions

# Person count
summarize_data(
  data = nhts_data,
  agg = "person_count"
)

# Proportion of persons by WORKER (worker status)
summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = "WORKER",
  prop = TRUE
)
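With `prop = TRUE`, the result is reported as shares rather than counts. A minimal sketch of a weighted proportion, assuming it is each category's weighted count divided by the overall weighted total (toy data, hypothetical `weight` column):

```r
# Toy person table with a hypothetical person weight column
persons <- data.frame(
  WORKER = c("01", "01", "02", "02", "01"),
  weight = c(100, 300, 200, 150, 250),
  stringsAsFactors = FALSE
)

# Weighted count per category, then divide by the overall weighted total
counts <- tapply(persons$weight, persons$WORKER, sum)
props  <- counts / sum(counts)

print(round(props, 3))
```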

Generating Estimates: Numeric Aggregates

# Average TRPMILES (trip distance in miles)
summarize_data(
  data = nhts_data,
  agg = "avg",
  agg_var = "TRPMILES"
)
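Using `agg = "avg"` with `agg_var` averages a numeric variable across records. The sketch below computes a weighted mean of trip distance on toy data. It assumes negative values are reserved missing-value codes and drops them; the `weight` column is hypothetical.

```r
# Toy trip table: distances in miles and hypothetical trip weights
trips <- data.frame(
  TRPMILES = c(2.5, 10, -9, 4, 1.5),
  weight   = c(1000, 500, 800, 700, 1200)
)

# Assumption for this sketch: negative values are missing-value codes
# and are excluded before averaging.
valid <- trips$TRPMILES >= 0

# Weighted mean: sum(weight * miles) / sum(weight) over valid records
avg_miles <- weighted.mean(trips$TRPMILES[valid], trips$weight[valid])
print(avg_miles)
```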


Generating Estimates: Trip Rates

# Daily person trips by worker status
summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER"
)
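A trip rate divides a weighted trip total by a weighted person total, so people who made no trips still count in the denominator. The sketch below shows that ratio by worker status on toy tables. The `PERSONID` and `weight` columns are hypothetical, both weights are treated as single-day weights for simplicity, and summarize_data's own calculation may differ in details such as how weights are annualized.

```r
# Toy person and trip tables linked by a hypothetical person identifier
persons <- data.frame(
  PERSONID = c("A", "B", "C", "D"),
  WORKER   = c("01", "01", "02", "02"),
  weight   = c(100, 100, 200, 200),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  PERSONID = c("A", "A", "A", "B", "C", "C"),
  weight   = c(100, 100, 100, 100, 200, 200),
  stringsAsFactors = FALSE
)

# Attach each trip's traveler characteristics
trips <- merge(trips, persons[, c("PERSONID", "WORKER")], by = "PERSONID")

# Ratio estimator: weighted trips / weighted persons within each group.
# Person D made no trips but still counts in the denominator.
trip_totals   <- tapply(trips$weight, trips$WORKER, sum)
person_totals <- tapply(persons$weight, persons$WORKER, sum)
trip_rate     <- trip_totals / person_totals[names(trip_totals)]

print(trip_rate)
```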

Generating Estimates: Subsetting in summarize_data

# Distribution of social/recreational trips by travel day
summarize_data(
  data = nhts_data,
  agg = "trip_count",
  by = "TRAVDAY",
  prop = TRUE,
  subset = "WHYTRP90 %in% c('07','08','10')"
)

# Person trip rate by sex (for millennials)
summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "R_SEX",
  subset = "R_AGE >= 18 & R_AGE <= 34"
)
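The `subset` argument takes a condition written as a string of R code. One way such a string can be applied, assuming it is parsed and evaluated with the table's columns in scope (summarize_data's internals aren't shown here), is sketched below on toy data. Note the single quotes nested inside the double-quoted string.

```r
# Toy tables (placeholder values)
persons <- data.frame(
  R_SEX = c("01", "02", "02", "01"),
  R_AGE = c(25, 40, 19, 70),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  WHYTRP90 = c("07", "01", "10", "08", "03"),
  stringsAsFactors = FALSE
)

# Parse each string into an R expression and evaluate it with the table's
# columns in scope, giving one TRUE/FALSE per row
keep   <- eval(parse(text = "R_AGE >= 18 & R_AGE <= 34"), envir = persons)
social <- eval(parse(text = "WHYTRP90 %in% c('07','08','10')"), envir = trips)

print(persons[keep, ])
print(trips[social, , drop = FALSE])
```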

Generating Estimates: Documentation


R Documentation for summarize_data

Creating New Variables

Creating New Variables: Example Scenario

Example Derived Variable Coding Scenario

1) Someone is interested in querying the NHTS about a particular travel behavior.

Anthony: "I am interested in exploring how financial burden may affect travel."
Alex: "Remember that question about walking to save money? I would include that in your analysis."

2) Consider the suggested variable's usefulness for Anthony's analysis:

WALK2SAVE: "I walk to places to save money."


| Code | Label |
|:-----|:--------------------------|
| 01 | Strongly agree |
| 02 | Agree |
| 03 | Neither Agree or Disagree |
| 04 | Disagree |
| 05 | Strongly disagree |

3) Look for other ways of manipulating this variable for analysis.

4) Create a variable called WALK_FINANCE, a yes/no variable for the binary analysis question, "Who does or does not walk to save money?"

Creating New Variables: Configuration

Creating New Variables: Configuration (cont.)

Derived variable file requirements

| Item | Description |
|:-------|:----------------------------------------------------------|
| NAME | The name of the variable as it will appear in the dataset |
| TABLE | The table level at which this variable is computed |
| TYPE | Data type (numeric or character) |
| DOMAIN | Logical expression that determines which records receive VALUE |
| VALUE | A variable code value |
| LABEL | Description of the code value |

Review File

Creating New Variables: Example 1 (Has/Has-not)

Using the Derived Variables file, create a variable with the following requirements:


| NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
|:------------|:----------|:----------|:--------------|:------|:------|
| HAS_VEHICLE | household | character | HHVEHCNT > 0 | 1 | Yes |
| HAS_VEHICLE | household | character | HHVEHCNT == 0 | 2 | No |
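Each row of the file pairs a DOMAIN condition with the VALUE to assign wherever that condition holds. Below, a hypothetical helper, `apply_derived()`, sketches that logic in base R for the HAS_VEHICLE rows above. It is not part of summarizeNHTS, and for brevity it ignores the TABLE, TYPE, and LABEL columns.

```r
# Hypothetical in-memory version of the HAS_VEHICLE rows of a
# derived-variable file
config <- data.frame(
  NAME   = c("HAS_VEHICLE", "HAS_VEHICLE"),
  TABLE  = c("household", "household"),
  TYPE   = c("character", "character"),
  DOMAIN = c("HHVEHCNT > 0", "HHVEHCNT == 0"),
  VALUE  = c("1", "2"),
  LABEL  = c("Yes", "No"),
  stringsAsFactors = FALSE
)

# Toy household table (placeholder values)
household <- data.frame(HHVEHCNT = c(2, 0, 1, 3, 0))

# Hypothetical helper: for each derived variable, evaluate every DOMAIN and
# assign its VALUE to rows where the expression is TRUE.
# Rows that match no domain stay NA.
apply_derived <- function(tbl, rules) {
  for (name in unique(rules$NAME)) {
    var_rules <- rules[rules$NAME == name, ]
    out <- rep(NA_character_, nrow(tbl))
    for (i in seq_len(nrow(var_rules))) {
      hit <- eval(parse(text = var_rules$DOMAIN[i]), envir = tbl)
      out[hit & !is.na(hit)] <- var_rules$VALUE[i]
    }
    tbl[[name]] <- out
  }
  tbl
}

household <- apply_derived(household, config)
print(household)
```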

Creating New Variables: Example 2 (Grouping)

Using the Derived Variables file, create a variable with the following requirements:


| NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
|:----------|:-------|:----------|:--------------------------|:------|:-------------|
| AGE_GROUP | person | character | R_AGE >= 0 & R_AGE <= 17 | 1 | Child |
| AGE_GROUP | person | character | R_AGE >= 18 & R_AGE <= 44 | 2 | Young Adult |
| AGE_GROUP | person | character | R_AGE >= 45 & R_AGE <= 65 | 3 | Middle Adult |
| AGE_GROUP | person | character | R_AGE >= 66 | 4 | Older Adult |
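The four AGE_GROUP domains should neither overlap nor leave gaps. A quick base-R cross-check on toy ages counts how many domains each age matches, and `cut()` gives an equivalent grouping for whole-number ages. Negative values, assumed here to be missing-value codes, fall outside every domain and become NA.

```r
# Toy ages, including boundary values and one negative code
R_AGE <- c(5, 17, 18, 44, 45, 65, 66, 90, -8)

domains <- c(
  "R_AGE >= 0 & R_AGE <= 17",
  "R_AGE >= 18 & R_AGE <= 44",
  "R_AGE >= 45 & R_AGE <= 65",
  "R_AGE >= 66"
)

# One column per domain: TRUE where that DOMAIN holds for the age.
# Every non-negative age should match exactly one domain.
matches <- sapply(domains, function(d) eval(parse(text = d), envir = list(R_AGE = R_AGE)))
n_hits  <- rowSums(matches)

# Right-closed intervals (-1,17], (17,44], (44,65], (65,Inf] reproduce the
# four domains for whole-number ages; values below 0 become NA.
AGE_GROUP <- cut(R_AGE, breaks = c(-1, 17, 44, 65, Inf),
                 labels = c("1", "2", "3", "4"), right = TRUE)

print(data.frame(R_AGE, n_hits, AGE_GROUP))
```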

Creating New Variables: Example 3 (Uses/Does-not-use)

Using the Derived Variables file, create a variable with the following requirements:


| NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
|:----------|:-------|:----------|:---------------|:------|:------|
| USES_TNC | person | character | RIDESHARE > 0 | 1 | Yes |
| USES_TNC | person | character | RIDESHARE == 0 | 2 | No |

Creating New Variables: Example 4 (Is/Is-not)

Using the Derived Variables file, create a variable with the following requirements:


| NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
|:---------|:----------|:----------|:------------------------------|:------|:------|
| IS_METRO | household | character | MSACAT %in% c('01','02','03') | 1 | Yes |
| IS_METRO | household | character | MSACAT %in% c('04') | 2 | No |

Creating New Variables: Summary

Visualizing Estimates

Visualizing Estimates: Tables (Introductory)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER"
)


Visualizing Estimates: Tables (Advanced)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = c("TRAVDAY","OCCAT","EDUC"),
  exclude_missing = TRUE
)

  tbl = statistic,
  title = "Table 1: Distribution of Persons (%) by Travel Day, Job Category, and Educational Attainment",
  output = c(W = "Weighted Percentage", N = "Sample Size"),
  row_vars = c("EDUC","OCCAT")
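The `output` argument above pairs a weighted percentage (W) with the sample size (N). As a rough illustration of the difference, the sketch below computes both for each cell of a toy table: W from the summed (hypothetical) weights and N as the unweighted record count. It assumes the percentage is taken over the overall weighted total; the denominator the table function actually uses isn't shown here.

```r
# Toy person table: two grouping variables plus a hypothetical weight
persons <- data.frame(
  TRAVDAY = c("1", "1", "2", "2", "1", "2"),
  EDUC    = c("03", "04", "03", "04", "04", "03"),
  weight  = c(100, 200, 150, 50, 300, 200),
  stringsAsFactors = FALSE
)

# Weighted total per cell
cells <- aggregate(weight ~ TRAVDAY + EDUC, data = persons, FUN = sum)

# W: each cell's weighted count as a percentage of the overall weighted total
cells$W <- 100 * cells$weight / sum(persons$weight)

# N: unweighted number of sampled records per cell (same grouping, same row order)
cells$N <- aggregate(weight ~ TRAVDAY + EDUC, data = persons, FUN = length)$weight

print(cells)
```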

Visualizing Estimates: Charts (Introductory)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER",
  exclude_missing = TRUE
)


Visualizing Estimates: Charts (Advanced)

Person Trip Rate by Sex, Worker Status, and Travel Day of Week

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = c("R_SEX","WORKER","TRAVDAY"),
  exclude_missing = TRUE
)
# Specify fill and facet
  tbl = statistic, 
  fill = "WORKER",
  facet = "TRAVDAY",
  palette = "Accent"
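The `fill`, `facet`, and `palette` arguments plausibly correspond to familiar ggplot2 ideas: a fill colour per category, one panel per level of another variable, and a ColorBrewer palette ("Accent" is one of them). Assuming ggplot2 is installed, the sketch below builds a comparable chart from placeholder values, not NHTS estimates. How the charting call above maps its arguments internally is not shown here.

```r
library(ggplot2)

# Placeholder estimates shaped like a table grouped by sex, worker status,
# and travel day (values are arbitrary, not NHTS results)
est <- data.frame(
  R_SEX   = rep(c("Male", "Female"), each = 4),
  WORKER  = rep(c("Worker", "Non-worker"), times = 4),
  TRAVDAY = rep(rep(c("Weekday", "Weekend"), each = 2), times = 2),
  rate    = 1:8
)

# fill: bar colour by WORKER; facet: one panel per TRAVDAY;
# palette: the ColorBrewer "Accent" palette applied to the fill scale
p <- ggplot(est, aes(x = R_SEX, y = rate, fill = WORKER)) +
  geom_col(position = "dodge") +
  facet_wrap(~ TRAVDAY) +
  scale_fill_brewer(palette = "Accent") +
  labs(x = "Sex", y = "Person trips per day", fill = "Worker status")

print(p)
```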

Visualizing Estimates: Maps (Introductory)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = "CENSUS_D"
)


Visualizing Estimates: Maps (Built-in Geography Layers)

Visualizing Estimates: Maps (Advanced)

Pass a second table that groups by the same geography plus one additional variable.

statistic1 <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "HHSTFIPS",
  exclude_missing = TRUE
)

statistic2 <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = c("HHSTFIPS","WORKER"),
  exclude_missing = TRUE
)

map <- make_map(
  tbl = statistic1,
  tbl2 = statistic2
)
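Because `statistic2` groups by HHSTFIPS plus WORKER, its rows nest within the state rows of `statistic1`. The sketch below joins two toy tables of that shape on HHSTFIPS to show the relationship. The values are placeholders and the `rate` column name is hypothetical.

```r
# Placeholder tables (not NHTS results) shaped like statistic1 and statistic2
statistic1 <- data.frame(
  HHSTFIPS = c("06", "36"),
  rate     = c(3, 4),
  stringsAsFactors = FALSE
)
statistic2 <- data.frame(
  HHSTFIPS = c("06", "06", "36", "36"),
  WORKER   = c("01", "02", "01", "02"),
  rate     = c(2, 5, 3, 6),
  stringsAsFactors = FALSE
)

# Join on the shared geography: each state row carries its overall rate
# alongside its per-category rates (suffixes distinguish the two columns)
detail <- merge(statistic1, statistic2, by = "HHSTFIPS",
                suffixes = c("_state", "_by_worker"))
print(detail)
```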


statistic <- summarize_data(
  data = nhts_data,
  agg = "trip_count",
  by = "PRMACT",
  exclude_missing = TRUE
)

  tbl = statistic,
  title = "Trip Count by Primary Activity (in Millions)",
  output = c(W = "Trip Count (Millions)", E = "SE"),
  digits = 0,
  multiplier = 1000000
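Assuming `multiplier = 1000000` divides the weighted values by one million and `digits = 0` rounds the result to whole numbers (consistent with the "(in Millions)" title), the arithmetic looks like this on placeholder values:

```r
# Placeholder weighted trip counts (not NHTS estimates)
W <- c(1234567890, 98765432)

multiplier <- 1000000
digits     <- 0

# Express in millions, then round to the requested number of digits
W_millions <- round(W / multiplier, digits)
print(W_millions)
```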

Producing a Travel Analysis Report

