summarize_data: Summarize HTS Data


summarize_data    R Documentation

Summarize HTS Data

Description

Create weighted aggregate tables using HTS data.

Usage

summarize_data(data, agg, agg_var = NULL, by = NULL, subset = NULL,
  prop = FALSE, prop_by = NULL, exclude_missing = FALSE,
  use_labels = TRUE)

Arguments

data

Object returned by read_data.

agg

Aggregate function label. One of "household_count", "person_count", "trip_count", "sum", "avg", "median", "household_trip_rate", or "person_trip_rate". See Aggregates section.

agg_var

Character string specifying a numeric variable over which to aggregate. Only relevant when agg is "sum", "avg", or "median".

by

Character vector of one or more variable names to group by. See Analysis Groups section.

subset

Character string containing a pre-aggregation subset condition using data.table syntax. See Filtering section.

prop

logical. Use proportions for count aggregates?

prop_by

Character vector of one or more variable names by which to group proportions.

exclude_missing

logical. Exclude missing responses from summary.

use_labels

logical. Use labels for table output?

Value

data.table object aggregated by input specifications containing the following fields:

  • by variables. For each by variable, a column of the same name is created. The columns appear in the order the variables are listed, and each is a factor whose levels are ordered by their codebook values.

  • Estimate - Weighted statistic.

  • SE - Standard error of the weighted statistic.

  • Survey - Surveyed/sampled statistic.

  • N - Number of observations/sample size.

Aggregates (agg)

What type of aggregate are you interested in?

Frequencies / Proportions

  • household_count - Count of households

  • person_count - Count of persons

  • trip_count - Count of trips

  • vehicle_count - Count of vehicles

*Use prop = TRUE in combination with a count aggregate to get the proportion.

Numeric Aggregates (Sum / Average / Median)

Must also specify a numeric aggregate variable using the agg_var parameter.

  • sum - Sum of agg_var

  • avg - Arithmetic mean of agg_var

  • median - Median of agg_var

Trip Rates (Daily Person Trips per Person/Household)

Simply put, the count of trips divided by the count of persons or households.

  • household_trip_rate - Daily trips per household.

  • person_trip_rate - Daily trips per person.
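
The arithmetic behind a trip rate can be sketched with toy data. This is only an illustration, not the package's implementation: the persons data frame, its column names, and the choice to weight each person's trips by a person-level weight are all assumptions made for the example.

```r
# Hypothetical person-level records (made-up values): a survey weight and
# the number of trips each person made on the travel day.
persons <- data.frame(
  person_id = 1:4,
  weight    = c(100, 150, 120, 130),
  trips     = c(4, 2, 0, 5)
)

# Weighted count of trips and weighted count of persons.
weighted_trips   <- sum(persons$weight * persons$trips)
weighted_persons <- sum(persons$weight)

# Weighted rate: count of trips divided by count of persons.
person_trip_rate <- weighted_trips / weighted_persons

# Unweighted (sample) rate for comparison.
survey_trip_rate <- sum(persons$trips) / nrow(persons)

print(c(weighted = person_trip_rate, unweighted = survey_trip_rate))
```

A household trip rate would follow the same pattern with households in the denominator.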

Analysis Groups (by)

By which variables do you wish to aggregate?

Similar to GROUP BY in SQL or a CLASS statement in SAS. There is no limit to the number of variables specified in the character vector. However, specifying many by variables can produce groups with small sample sizes, which need to be interpreted carefully.

The data.table returned by summarize_data will include a column (of class factor) for each by variable specified.
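
As a rough analogy for what grouping does, the sketch below computes a weighted and an unweighted count per group directly in data.table. The toy table, its column names (hh_size, weight), and its values are hypothetical, and the code is not taken from summarize_data itself; the output columns are named Estimate and N only to mirror the Value section.

```r
library(data.table)

# Hypothetical household records (made-up values).
toy <- data.table(
  hh_size = factor(c("1", "2", "2", "1", "3")),
  weight  = c(100, 150, 120, 130, 90)
)

# One row per level of the grouping variable:
# Estimate = sum of weights in the group, N = number of records in the group.
counts <- toy[, .(Estimate = sum(weight), N = .N), by = hh_size]

print(counts)
```

Adding more grouping variables to by splits the same records into more cells, which is why each cell's N shrinks as variables are added.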

Filtering (subset)

Which households/persons/trips do you wish to include or exclude?

Similar to WHERE in SQL, subset allows you to filter observations/rows in the dataset before summarizing/aggregating.

subset is a string that is evaluated as the i index of a data.table, producing a logical vector that indicates which rows to keep. As with the base function subset, there is no need to name the data object that contains the variables (i.e., your code would look like "var < 10" instead of "data$var < 10").

Any variable (or combination of variables) found in the codebook can be used in the subset condition. See ?Logic for a refresher on R's logical operators when combining more than one logical condition.
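
The idea of evaluating a condition string against a table can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's internal code: the toy table and its columns (id, var) are made up, and parsing with parse() and eval() is one common way to turn a string into a row filter.

```r
library(data.table)

# Hypothetical table (made-up values).
toy <- data.table(id = 1:5, var = c(3, 12, 7, 25, 9))

# The condition is written without naming the table: "var < 10", not "toy$var < 10".
condition <- "var < 10"

# Parse the string and evaluate it with the table's columns in scope,
# which yields one TRUE/FALSE per row.
keep <- eval(parse(text = condition), envir = toy)

# Use the logical vector as the row index.
filtered <- toy[keep]

print(filtered)
```

Conditions can be combined inside the same string with & and |, for example "var < 10 & id > 1".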

Quoting within quotes

You will frequently need to include quotes in your string. You can tackle this a few different ways. The following examples would all evaluate the same way:

  • "HHSTATE %in% c('GA','FL')"

  • 'HHSTATE %in% c("GA","FL")'

  • "HHSTATE %in% c(\"GA\",\"FL\")"
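
A quick way to confirm that the three spellings behave the same is to evaluate each one against a small table. The toy data frame and its HHSTATE values are made up for the check; only base R is used.

```r
# Hypothetical data (made-up values).
toy <- data.frame(HHSTATE = c("GA", "FL", "AL", "GA"), stringsAsFactors = FALSE)

conditions <- c(
  "HHSTATE %in% c('GA','FL')",
  'HHSTATE %in% c("GA","FL")',
  "HHSTATE %in% c(\"GA\",\"FL\")"
)

# Evaluate each condition string with the data frame's columns in scope.
results <- lapply(conditions, function(cond) eval(parse(text = cond), envir = toy))

# The second and third strings are character-for-character identical;
# the first differs only in the style of the inner quotes.
same_string <- identical(conditions[2], conditions[3])

# All three select the same rows.
same_rows <- identical(results[[1]], results[[2]]) &&
  identical(results[[2]], results[[3]])

print(c(same_string = same_string, same_rows = same_rows))
```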

Examples


# Read the data
hts_data <- read_data(
  study = 'nirpc_2018',
  project_path = 'C:/2018 NIRPC Household Travel Survey'
)

summarize_data(
  data = hts_data,             # Using the hts_data object,
  agg = 'person_trip_rate',    # calculate the person trip rate
  by = 'sex',                  # by gender
  subset = 'emply_ask == "1"'  # for workers
)




Westat-Transportation/surveysummarize documentation built on Oct. 20, 2023, 2:44 a.m.