calc: Compute metrics from record (e.g. vital stats) or survey data

View source: R/calc.R

calc    R Documentation

Compute metrics from record (e.g. vital stats) or survey data

Description

Compute metrics from record (e.g. vital stats) or survey data

Usage

calc(ph.data, ...)

## S3 method for class 'dtsurvey'
calc(
  ph.data,
  what = NULL,
  where,
  by = NULL,
  metrics = c("mean", "numerator", "denominator"),
  per = NULL,
  win = NULL,
  time_var = NULL,
  proportion = FALSE,
  fancy_time = TRUE,
  ci = 0.95,
  verbose = FALSE,
  ...
)

Arguments

ph.data

data.table or tbl_svy. Dataset.

...

not implemented

what

character vector. Variable to calculate metrics for.

where

subsetting expression

by

character vector. Names of the variables within ph.data by which to group the calculations for what.

metrics

character. See metrics or the Details section below for the available options.

per

integer. The denominator of the rate (e.g., 100000 for "per 100,000") when "rate" or "adjusted-rate" is selected as a metric. Metrics will be multiplied by this value.

win

integer. The number of consecutive units of time (e.g., years, months, etc.) over which the metrics will be calculated, i.e., the 'window' for a rolling average, sum, etc.

time_var

character. The name of the time variable in the dataset. Used in combination with the "win" argument to do time windowed calculations.

proportion

logical. For survey data, should metrics be calculated assuming the output is proportion-like? See details for more. Currently does not have functionality for non-survey data.

fancy_time

logical. If TRUE, a record of all the years going into the data is provided. If FALSE, only a simple range is provided (certain years within the range might not be represented in your data).

ci

numeric. Confidence level; must be greater than 0 and less than 1, typically 0.95.

verbose

logical. Mostly unused, but toggles on/off printed warnings.
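
To make the interaction of win and time_var more concrete, the following is a minimal base R sketch of a windowed mean. It does not call calc(); the toy data frame and the helper rolling_mean() are hypothetical names introduced only for illustration, and calc() may treat window edges or missing time points differently.

```r
# Toy record data: `year` plays the role of time_var (illustrative names only).
toy <- data.frame(
  year  = rep(2015:2019, each = 2),
  value = c(1, 3, 2, 4, 3, 5, 4, 6, 5, 7)
)

# Hypothetical helper: mean of `value` over every run of `win`
# consecutive time points present in `time`.
rolling_mean <- function(value, time, win) {
  times <- sort(unique(time))
  stopifnot(win >= 1, win <= length(times))
  starts <- seq_len(length(times) - win + 1)
  out <- lapply(starts, function(i) {
    window <- times[i:(i + win - 1)]
    data.frame(
      start = min(window),
      end   = max(window),
      mean  = mean(value[time %in% window], na.rm = TRUE)
    )
  })
  do.call(rbind, out)
}

# Three-year windows: 2015-2017, 2016-2018, 2017-2019.
roll <- rolling_mean(toy$value, toy$year, win = 3)
print(roll)
```

Each output row summarises all observations falling in one window, which is the sense in which win describes "consecutive units of time".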

Details

This function calculates metrics for each variable in what from rows meeting the conditions specified by where for each grouping implied by by.
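
As a rough illustration of how what, where, and by relate to one another, the sketch below reproduces the same idea in base R on a toy data frame. It is not how calc() is implemented; the column names (year, sex, bw) are assumptions made up for the example.

```r
# Toy data; all column names are illustrative.
toy <- data.frame(
  year = c(2016, 2016, 2016, 2016, 2017, 2017),
  sex  = c("Female", "Female", "Male", "Male", "Female", "Male"),
  bw   = c(3200, NA, 3500, 3100, 3000, 3400)
)

# `where`: keep only rows meeting the condition.
kept <- subset(toy, year == 2016)

# `what` = "bw", `by` = "sex": one mean per group.
# The formula interface of aggregate() drops rows with NA in `bw` by default.
res <- aggregate(bw ~ sex, data = kept, FUN = mean)
print(res)
```

The filter is applied first, and the metric is then computed separately within each group.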

Available metrics include:

  1. total: Count of people with the given value. Mostly relevant for surveys (where total is approximately mean * sum(pweights)). Returns total, total_se, total_upper, total_lower. total_se, total_upper, & total_lower are only valid for survey data. Default ci (i.e., upper and lower) is 95 percent.

  2. mean: Average response and associated metrics of uncertainty. Returns mean, mean_se, mean_lower, mean_upper. Default ci (i.e., upper and lower) is 95 percent.

  3. rse: Relative standard error. 100*se/mean.

  4. numerator: Sum of non-NA values for what. The numerator is always unweighted.

  5. denominator: Number of rows where what is not NA. The denominator is always unweighted.

  6. obs: Number of unique observations (i.e., rows), agnostic as to whether there is missing data for what. The obs is always unweighted.

  7. median: The median non-NA response. Not populated when what is a factor or character. Even for surveys, the median is the unweighted result.

  8. unique.time: Number of unique time points (from time_var) included in each tabulation (i.e., number of unique time points when the what is not missing).

  9. missing: Number of rows in a given grouping with an NA value for what. missing + denominator = Number of people in a given group. When what is a factor/character, the missing information is provided for the other.

  10. missing.prop: The proportion of the data that has an NA value for what.

  11. rate: mean * per. Provides rescaled mean estimates (i.e., per 100 or per 100,000). Returns rate, rate_se, rate_lower, rate_upper. Default ci (i.e., upper and lower) is 95 percent.
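
The relationships among several of these metrics can be checked by hand. The base R sketch below works through them for a single unweighted toy vector; it does not call calc(), and the standard error shown is the usual simple-random-sample formula, which is an assumption and may differ from what calc() reports.

```r
x   <- c(1, 0, 1, NA, 1, 0, NA, 1)  # toy 0/1 indicator standing in for `what`
per <- 100                          # rescaling factor, as in the `per` argument

obs          <- length(x)             # all rows, missing or not
missing      <- sum(is.na(x))         # rows where x is NA
denominator  <- sum(!is.na(x))        # rows where x is not NA
numerator    <- sum(x, na.rm = TRUE)  # sum of non-NA values
mean_x       <- numerator / denominator
missing_prop <- missing / obs

# Assumed simple-random-sample standard error (not necessarily calc()'s).
se_x <- sd(x, na.rm = TRUE) / sqrt(denominator)
rse  <- 100 * se_x / mean_x           # relative standard error
rate <- mean_x * per                  # mean rescaled to "per 100"

metrics <- c(obs = obs, missing = missing, denominator = denominator,
             numerator = numerator, mean = mean_x,
             missing.prop = missing_prop, rse = rse, rate = rate)
print(round(metrics, 3))
```

Note that missing + denominator equals obs, matching the identity given for the missing metric.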

For survey data, use the proportion argument where relevant to ensure metrics are calculated using special proportion methods (e.g., svyciprop). That is, when you want to find the fraction of ____, toggle proportion to TRUE.
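
One reason proportion-specific methods matter is that a symmetric interval around a small proportion can extend below 0. The base R sketch below compares a plain Wald interval with an interval built on the logit scale and transformed back. It ignores survey weights and design entirely and assumes a simple random sample, so it only illustrates the idea and is not what calc() or svyciprop compute.

```r
x  <- c(1, rep(0, 19))      # toy data: 1 "yes" out of 20
ci <- 0.95

n  <- length(x)
p  <- mean(x)               # estimated proportion
z  <- qnorm(1 - (1 - ci) / 2)
se <- sqrt(p * (1 - p) / n) # simple-random-sample standard error

# Symmetric (Wald) interval: can fall outside [0, 1].
wald <- c(lower = p - z * se, upper = p + z * se)

# Logit-scale interval: build it on log-odds, then back-transform.
logit_p  <- log(p / (1 - p))
se_logit <- se / (p * (1 - p))  # delta-method standard error on the logit scale
logit_ci <- plogis(logit_p + c(lower = -1, upper = 1) * z * se_logit)

print(rbind(wald = wald, logit = logit_ci))
```

Here the Wald lower bound is negative, while the back-transformed interval stays strictly between 0 and 1.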

Value

A data.table containing the results.

References

https://github.com/PHSKC-APDE/rads/wiki/calc

Examples


# record data
test.data <- get_data_birth(
  year = 2015:2017,
  cols = c("chi_year", "kotelchuck",
           "chi_sex", "fetal_pres"))

# the unnamed third argument is matched to `where`
test.results <- calc(test.data,
                     what = c("kotelchuck", "fetal_pres"),
                     chi_year == 2016 & chi_sex %in% c('Male', 'Female'),
                     by = c("chi_year", "chi_sex"),
                     metrics = c("mean", "numerator", "denominator",
                                 "total"))

print(test.results)


PHSKC-APDE/rads documentation built on April 14, 2025, 10:47 a.m.