summarize_data: Summarize NHTS Data

In Westat-Transportation/summarizeNHTS: National Household Travel Survey R Analysis Toolkit


View source: R/summarize_data.R

Description

Create weighted aggregate tables using NHTS data.

Usage

summarize_data(data, agg, agg_var = NULL, by = NULL, subset = NULL,
  label = TRUE, prop = FALSE, prop_by = NULL,
  exclude_missing = FALSE)

Arguments

`data`	Object returned by read_data.
`agg`	Aggregate function label. One of "household_count", "person_count", "trip_count", "vehicle_count", "sum", "avg", "median", "household_trip_rate", or "person_trip_rate". See the Aggregates section.
`agg_var`	Character string specifying a numeric variable over which to aggregate. Only relevant when agg is "sum", "avg", or "median".
`by`	Character vector of one or more variable names to group by. See the Analysis Groups section.
`subset`	Character string containing a pre-aggregation subset condition using data.table syntax. See the Filtering section.
`label`	logical. Use labels for table output?
`prop`	logical. Use proportions for count aggregates?
`prop_by`	Character vector of one or more variable names by which to group proportions.
`exclude_missing`	logical. Exclude missing responses from summary.

Value

data.table object aggregated by input specifications containing the following fields:

by variables - One column per by variable, with the same name, in the order the variables are listed. Each column is a factor ordered by its codebook values.
W - Weighted statistic.
E - Standard error of the weighted statistic.
S - Surveyed/sampled statistic.
N - Number of observations/sample size.

Aggregates (`agg`)

What type of aggregate are you interested in?

Frequencies / Proportions

household_count - Count of households
person_count - Count of persons
trip_count - Count of trips
vehicle_count - Count of vehicles

*Use prop = TRUE in combination with a count aggregate to get the proportion.
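
As a rough illustration of what a weighted count and its proportion mean, here is a minimal base R sketch. It assumes the usual survey estimator (a weighted count is the sum of the survey weights in each group); this page does not show how the package computes it internally, and the data frame, column names, and weights below are invented.

```r
# Toy household records; column names and weights are invented for illustration.
households <- data.frame(
  HHSIZE = c(1, 1, 2, 2, 3),
  weight = c(100, 150, 200, 50, 500)
)

# Weighted count per group: the sum of the weights in each group.
weighted_count <- tapply(households$weight, households$HHSIZE, sum)

# Weighted proportion: each group's weighted count over the overall total.
weighted_prop <- weighted_count / sum(weighted_count)

print(weighted_count)
print(weighted_prop)
```

The proportions sum to 1 across all groups here; grouping proportions by a subset of variables (as prop_by does) would instead make them sum to 1 within each of those groups.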

Numeric Aggregates (Sum / Average / Median)

Must also specify a numeric aggregate variable using the agg_var parameter.

sum - Sum of agg_var
avg - Arithmetic mean of agg_var
median - Median of agg_var
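
To make the difference between the sum and avg aggregates concrete, the sketch below computes a weighted total and a weighted mean in base R. It assumes the standard survey definitions (total = sum of weight times value, mean = total divided by the sum of weights) and is not taken from the package source; the values and weights are invented.

```r
# Toy trip distances and survey weights, invented for illustration.
trip_miles <- c(2, 4, 10)
weight     <- c(1, 1, 2)

# Weighted sum: each value counts as many times as its weight.
weighted_sum <- sum(weight * trip_miles)

# Weighted mean: the weighted sum divided by the total weight.
weighted_avg <- weighted.mean(trip_miles, weight)

print(weighted_sum)
print(weighted_avg)
```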

Trip Rates (Daily Person Trips per Person/Household)

Simply put, the count of trips divided by the count of persons or households.

household_trip_rate - Daily trips per household.
person_trip_rate - Daily trips per person.
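
The ratio described above can be checked by hand. The following sketch uses invented weighted totals purely to show the arithmetic; it does not call the package.

```r
# Toy weighted totals; the numbers are invented for illustration only.
weighted_trips      <- 1200  # weighted count of daily person trips
weighted_persons    <- 400   # weighted count of persons
weighted_households <- 160   # weighted count of households

# Trip rate = count of trips divided by count of persons or households.
person_trip_rate    <- weighted_trips / weighted_persons
household_trip_rate <- weighted_trips / weighted_households

print(person_trip_rate)     # 3 daily trips per person
print(household_trip_rate)  # 7.5 daily trips per household
```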

Analysis Groups (`by`)

By which variables do you wish to aggregate?

Similar to GROUP BY in SQL or a CLASS statement in SAS. There is no limit to the number of variables specified in the character vector; however, specifying many by variables can produce groups with small sample sizes, which need to be interpreted carefully.

The data.table returned by summarize_data will include a column (of class factor) for each by variable specified.

Filtering (`subset`)

Which households/persons/trips do you wish to include or exclude?

Similar to WHERE in SQL, subset allows you to filter observations/rows in the dataset before summarizing/aggregating.

subset is a string that will be evaluated as a logical vector indicating the rows to keep. The string is evaluated as the i index of a data.table. As with the base function subset, there is no need to name the data object that contains the variables (i.e., your condition would look like "var < 10" instead of "data$var < 10").

Any variable (or combination of variables) found in the codebook can be used in the subset condition. See the base R Logic help topic (?Logic) for a refresher on R's logical operators when combining more than one logical condition.
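
To show how a condition string can turn into a row filter, here is a base R analogue. It is only a sketch of the idea: the package evaluates the string inside a data.table, whereas this example parses the string and evaluates it against a plain data frame. The data frame and its values are invented.

```r
# Toy trip records; column names and values are invented for illustration.
trips <- data.frame(
  HHSTATE  = c("GA", "FL", "NY", "GA"),
  TRPMILES = c(2.5, 12.0, 7.0, 30.0),
  stringsAsFactors = FALSE
)

# The condition is a plain string that refers to columns by name only.
condition <- "HHSTATE == 'GA' & TRPMILES < 10"

# Parse the string and evaluate it with the data frame as the lookup
# environment, which yields one TRUE/FALSE value per row.
keep <- eval(parse(text = condition), envir = trips)

# Keep only the rows where the condition is TRUE.
filtered <- trips[keep, ]
print(filtered)
```

Only the first row survives: it is the only Georgia trip shorter than 10 miles.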

Quoting within quotes

You will frequently need to include quotes in your string. You can tackle this a few different ways. The following examples would all evaluate the same way:

"HHSTATE %in% c('GA','FL')"
'HHSTATE %in% c("GA","FL")'
"HHSTATE %in% c(\"GA\",\"FL\")"
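
A quick way to convince yourself that the three quoting styles are interchangeable is to evaluate each one against the same data and compare the results. The sketch below does this in base R with an invented three-row data frame; it illustrates R's string quoting rules rather than anything specific to the package.

```r
# Toy data with a single state column, invented for illustration.
states <- data.frame(HHSTATE = c("GA", "NY", "FL"), stringsAsFactors = FALSE)

# The three quoting styles from above.
conditions <- c(
  "HHSTATE %in% c('GA','FL')",
  'HHSTATE %in% c("GA","FL")',
  "HHSTATE %in% c(\"GA\",\"FL\")"
)

# Evaluate each condition string against the toy data.
results <- lapply(conditions, function(cond) {
  eval(parse(text = cond), envir = states)
})

# TRUE when every condition selects exactly the same rows.
all_same <- all(vapply(results, identical, logical(1), results[[1]]))
print(all_same)
```

The second and third strings are in fact the same string once R has read them; the first differs only in which quote character appears inside it, which does not change the parsed expression's value.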

Examples

# Read 2009 NHTS data with specified csv path:
nhts_data <- read_data('2009', csv_path = 'C:/NHTS')

summarize_data(
  data = nhts_data,           # Using the nhts_data object,
  agg = 'person_trip_rate',   # calculate the person trip rate
  by = 'WORKER',              # by worker status
  subset = 'CENSUS_R == "01"' # for households in the NE Census region
)

Westat-Transportation/summarizeNHTS documentation built on May 17, 2020, 8:57 p.m.
