summarize_data: Summarize HTS Data
In Westat-Transportation/surveysummarize: Survey Summarize

View source: R/summarize_data.R

summarize_data

R Documentation

Summarize HTS Data

Description

Create weighted aggregate tables using HTS data.

Usage

summarize_data(data, agg, agg_var = NULL, by = NULL, subset = NULL,
  prop = FALSE, prop_by = NULL, exclude_missing = FALSE,
  use_labels = TRUE)

Arguments

`data`	Object returned by read_data.
`agg`	Aggregate function label. One of "household_count", "person_count", "trip_count", "vehicle_count", "sum", "avg", "median", "household_trip_rate", or "person_trip_rate". See the Aggregates section.
`agg_var`	Character string specifying a numeric variable over which to aggregate. Only relevant when agg is "sum", "avg", or "median".
`by`	Character vector of one or more variable names to group by. See Analysis Groups section.
`subset`	Character string containing a pre-aggregation subset condition using data.table syntax. See the Filtering section.
`prop`	logical. Use proportions for count aggregates?
`prop_by`	Character vector of one or more variable names by which to group proportions.
`exclude_missing`	logical. Exclude missing responses from summary.
`use_labels`	logical. Use labels for table output?

Value

data.table object aggregated by input specifications containing the following fields:

by variables - For each by variable, a column of the same name is created. The columns appear in the order the variables are listed, as factors with levels ordered by their codebook values.
Estimate - Weighted statistic.
SE - Standard error of the weighted statistic.
Survey - Surveyed/sampled statistic.
N - Number of observations/sample size.

Aggregates (`agg`)

What type of aggregate are you interested in?

Frequencies / Proportions

household_count - Count of households
person_count - Count of persons
trip_count - Count of trips
vehicle_count - Count of vehicles

*Use prop = TRUE in combination with a count aggregate to get the proportion.
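
As a rough illustration of what a weighted count and its proportion mean, the sketch below uses base R on toy data. The data frame, the `wt` weight column, and all values are hypothetical, and this is not the package's implementation (for one thing, it computes no standard errors).

```r
# Toy person records; `wt` is a hypothetical survey weight column.
persons <- data.frame(
  sex = c("1", "1", "2", "2", "2"),
  wt  = c(100, 100, 150, 100, 50)
)

# Weighted count: sum of the weights within each group.
weighted_count <- tapply(persons$wt, persons$sex, sum)

# Weighted proportion: each group's share of the total weighted count.
weighted_prop <- weighted_count / sum(weighted_count)

# Unweighted number of sampled records per group, for comparison.
sample_n <- table(persons$sex)
```

Here the two groups hold 2 and 3 sampled records, but their weighted shares are 0.4 and 0.6, which is the kind of figure prop = TRUE is meant to report for a count aggregate.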

Numeric Aggregates (Sum / Average / Median)

Must also specify a numeric aggregate variable using the agg_var parameter.

sum - Sum of agg_var
avg - Arithmetic mean of agg_var
median - Median of agg_var
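
To make the numeric aggregates concrete, here is a minimal base R sketch of a weighted sum and a weighted mean on toy data. The column names `trip_miles` and `wt` are hypothetical, and the package may compute these differently. The median is left out because conventions for weighted medians vary.

```r
# Toy trip records; `trip_miles` and `wt` are hypothetical column names.
trips <- data.frame(
  trip_miles = c(2, 4, 10),
  wt         = c(1, 2, 1)
)

# Weighted sum: each value multiplied by its weight, then totalled.
weighted_sum <- sum(trips$trip_miles * trips$wt)

# Weighted mean: the weighted sum divided by the sum of the weights.
weighted_avg <- weighted.mean(trips$trip_miles, trips$wt)
```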

Trip Rates (Daily Person Trips per Person/Household)

Simply put, the count of trips divided by the count of persons or households.

household_trip_rate - Daily trips per household.
person_trip_rate - Daily trips per person.
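
The ratio described above can be sketched in base R as follows. The tables, the `person_id` and `wt` columns, and the weights are hypothetical toy data, and the package's own weighting logic may differ.

```r
# Two sampled persons with hypothetical survey weights.
persons <- data.frame(person_id = c(1, 2), wt = c(100, 150))

# Their trips on the travel day; each trip carries its person's weight.
trips <- data.frame(
  person_id = c(1, 1, 1, 2, 2),
  wt        = c(100, 100, 100, 150, 150)
)

# Weighted count of trips divided by weighted count of persons.
weighted_trips   <- sum(trips$wt)
weighted_persons <- sum(persons$wt)
person_trip_rate <- weighted_trips / weighted_persons
```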

Analysis Groups (`by`)

By which variables do you wish to aggregate?

Similar to GROUP BY in SQL or a CLASS statement in SAS. There is no limit to the number of variables specified in the character vector; however, using many by variables can produce groups with small sample sizes, which need to be interpreted carefully.

The data.table returned by summarize_data will include a column (of class factor) for each by variable specified.
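
The grouping idea can be sketched directly in data.table, which accepts a character vector for `by` in the same way. The toy table and its `sex`, `worker`, and `wt` columns are hypothetical; the `Estimate` and `N` column names merely echo the fields listed under Value and are not produced by the package here.

```r
library(data.table)

# Toy person records with a hypothetical weight column `wt`.
dt <- data.table(
  sex    = c("M", "M", "F", "F", "F"),
  worker = c("Y", "N", "Y", "Y", "N"),
  wt     = c(100, 100, 150, 100, 50)
)

# Group by two variables given as a character vector.
# Estimate is the weighted count; N is the number of sampled rows.
grouped <- dt[, .(Estimate = sum(wt), N = .N), by = c("sex", "worker")]
```

With only five records split across four groups, most groups have N = 1, which shows how quickly sample sizes shrink as by variables are added.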

Filtering (`subset`)

Which households/persons/trips do you wish to include or exclude?

Similar to WHERE in SQL, subset allows you to filter observations/rows in the dataset before summarizing/aggregating.

subset is a string that will be evaluated as a logical vector indicating the rows to keep. As noted in the Arguments section, the string is evaluated as the i index of a data.table. In short, as with the base function subset, there is no need to specify the data object in which the variables are found (i.e., your code would look like "var < 10" instead of "data$var < 10").

Any variable (or combination of variables) found in the codebook can be used in the subset condition. See Logic for a refresher on R's logical operators when using more than one logical condition.
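
One way to picture how a condition string becomes a row filter is the sketch below, which parses the string and evaluates it with the table's columns in scope. The toy table and its values are hypothetical, and the package may evaluate the string by a different mechanism.

```r
library(data.table)

# Toy household records; the values are made up for illustration.
dt <- data.table(
  HHSTATE = c("GA", "FL", "TX"),
  var     = c(5, 12, 8)
)

# Parse the condition string and evaluate it against the table's columns,
# so "var < 10" works without writing dt$var.
cond <- "var < 10"
keep <- eval(parse(text = cond), envir = dt)
filtered <- dt[keep]

# Conditions can be combined with R's logical operators.
cond2 <- "var < 10 & HHSTATE %in% c('GA','FL')"
filtered2 <- dt[eval(parse(text = cond2), envir = dt)]
```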

Quoting within quotes

You will frequently need to include quotes in your string. You can tackle this a few different ways. The following examples would all evaluate the same way:

"HHSTATE %in% c('GA','FL')"
'HHSTATE %in% c("GA","FL")'
"HHSTATE %in% c(\"GA\",\"FL\")"

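A quick base R check, offered as a sketch, confirms that the three quoting styles above yield the same expression once parsed. The variable names are chosen only for this illustration.

```r
# The same condition written with the three quoting styles.
single_inside  <- "HHSTATE %in% c('GA','FL')"
double_inside  <- 'HHSTATE %in% c("GA","FL")'
escaped_inside <- "HHSTATE %in% c(\"GA\",\"FL\")"

# Parse each string into an R call.
expr_single  <- parse(text = single_inside, keep.source = FALSE)[[1]]
expr_double  <- parse(text = double_inside, keep.source = FALSE)[[1]]
expr_escaped <- parse(text = escaped_inside, keep.source = FALSE)[[1]]

# All three calls are identical after parsing.
same_call <- identical(expr_single, expr_double) &&
  identical(expr_double, expr_escaped)

# The second and third forms are in fact the very same string.
same_string <- identical(double_inside, escaped_inside)
```
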
Examples


# Read the data
hts_data <- read_data(
  study = 'nirpc_2018',
  project_path = 'C:/2018 NIRPC Household Travel Survey'
)

summarize_data(
  data = hts_data,             # Using the hts_data object,
  agg = 'person_trip_rate',    # calculate the person trip rate
  by = 'sex',                  # by gender
  subset = 'emply_ask == "1"'  # for workers
)

Westat-Transportation/surveysummarize documentation built on April 13, 2025, 9:53 p.m.
