multi_t_test: Perform t-tests for Multiple Comparisons with Summary...
In PHSKC-APDE/rads: Assisted computation of King County public health data

multi_t_test

R Documentation

Perform t-tests for Multiple Comparisons with Summary Statistics

Description

This function performs t-tests comparing multiple groups against a reference group using summary statistics. It offers flexibility in the method for calculating degrees of freedom, can estimate sample sizes if they are not provided, and can adjust p-values for multiple comparisons.

Usage

multi_t_test(
  means,
  ses,
  reference_index,
  n = NULL,
  alpha = 0.05,
  df_method = "estimated",
  alternative = "two.sided",
  adjust_method = NULL
)

Arguments

`means`	Numeric vector of group means.
`ses`	Numeric vector of standard errors for each group.
`reference_index`	Integer indicating the index of the reference group.
`n`	Optional numeric vector of sample sizes for each group.
`alpha`	Numeric value for significance level (default is `0.05`).
`df_method`	String specifying the method for calculating degrees of freedom. Default is `'estimated'`. Options are:
- `'estimated'` (Welch–Satterthwaite equation): Corresponds to Welch's t-test. Approximates the degrees of freedom from the sample variances and sizes. It is particularly useful when groups have unequal variances and/or unequal sample sizes, where it is generally more reliable than the standard t-test. It is a data-driven approach and is often preferred because it balances Type I errors (false positives) against Type II errors (false negatives).
- `'conservative'` (df = 2): Uses the minimum possible degrees of freedom, resulting in the widest confidence intervals (for the difference in means) and the most conservative (largest) p-values. Reduces Type I errors (false positives) and increases Type II errors (false negatives).
- `'moderate'` (df = k - 1): Uses the number of groups minus 1 as the degrees of freedom, providing a balance between the conservative and liberal approaches.
- `'liberal'` (df = Inf): Assumes infinite degrees of freedom, resulting in the narrowest confidence intervals (for the difference in means) and the most liberal (smallest) p-values. Increases Type I errors (false positives) and reduces Type II errors (false negatives).
`alternative`	String specifying the alternative hypothesis: `'two.sided'` (default), `'less'`, or `'greater'`.
`adjust_method`	String specifying the method of adjustment for multiple comparisons: `NULL`, `'Holm-Bonferroni'`, or `'Benjamini-Hochberg'`. Refer to the `holm` and `BH` descriptions in `p.adjust` in the `stats` package for more information. Default is `NULL`.
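
To illustrate what the two adjustment options do, the sketch below calls `stats::p.adjust` directly with its `"holm"` and `"BH"` methods. The raw p-values are made up for illustration, and the sketch does not show how `multi_t_test` applies the adjustment internally.

```r
# Hypothetical raw p-values from four comparisons against a reference group
p_raw <- c(0.001, 0.012, 0.030, 0.200)

# Holm-Bonferroni: controls the family-wise error rate
p_holm <- stats::p.adjust(p_raw, method = "holm")

# Benjamini-Hochberg: controls the false discovery rate
p_bh <- stats::p.adjust(p_raw, method = "BH")

# Side-by-side view: adjusted p-values are never smaller than the raw ones
print(data.frame(p_raw, p_holm, p_bh))
```

Holm-Bonferroni is the stricter of the two, so its adjusted p-values are at least as large as the Benjamini-Hochberg ones.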

Details

This function conducts t-tests to compare multiple groups against a reference group.
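
As a rough sketch of what a single comparison against the reference group involves, the base R code below computes the difference in means, the t-statistic, the p-value under each `alternative`, and a confidence interval from summary statistics alone. The variable names, the direction of the difference (group minus reference), and the use of `df = Inf` are assumptions made for illustration. This is not necessarily how `multi_t_test` is implemented.

```r
# Summary statistics for one group and the reference group
# (values borrowed from the Examples section: ages 18-24 vs. 30-34)
mean_grp <- 3150; se_grp <- 50
mean_ref <- 3400; se_ref <- 40
alpha <- 0.05

# Difference in means and its standard error (no equal-variance assumption)
diff_means <- mean_grp - mean_ref        # assumed direction: group minus reference
se_diff <- sqrt(se_grp^2 + se_ref^2)
t_stat <- diff_means / se_diff

# Degrees of freedom: Inf mirrors the 'liberal' option, chosen here for simplicity
df <- Inf

# p-values for each alternative hypothesis
p_two_sided <- 2 * stats::pt(-abs(t_stat), df)
p_less <- stats::pt(t_stat, df)
p_greater <- stats::pt(t_stat, df, lower.tail = FALSE)

# Two-sided (1 - alpha) confidence interval for the difference in means
crit <- stats::qt(1 - alpha / 2, df)
ci <- diff_means + c(-1, 1) * crit * se_diff

print(c(t = t_stat, p = p_two_sided, ci_lower = ci[1], ci_upper = ci[2]))
```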

The estimated degrees of freedom method (Welch's t-test) is generally preferred and is set as the default. However, when sample sizes (`n`) are less than 30, results can be unreliable. When `n` is not specified and `df_method = "estimated"`, the function estimates sample sizes based partly on the distribution of mean values. The quality of these estimates depends on the number of groups (the length of the `means` argument), so supply actual sample sizes whenever they are available.
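
The following sketch shows the standard Welch–Satterthwaite approximation written in terms of standard errors and sample sizes, and compares the resulting critical value with those implied by the other `df_method` options. The helper `welch_df` is a hypothetical name introduced here, and the package's internal calculation may differ in detail.

```r
# Welch-Satterthwaite degrees of freedom from standard errors and sample sizes.
# welch_df is a hypothetical helper, not a function exported by the package.
welch_df <- function(se1, n1, se2, n2) {
  (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))
}

# Ages 18-24 vs. the 30-34 reference group from the Examples section
df_est <- welch_df(se1 = 50, n1 = 500, se2 = 40, n2 = 750)

# Two-sided critical values at alpha = 0.05 for each df_method option,
# assuming k = 5 groups for the 'moderate' option (df = k - 1)
k <- 5
dfs <- c(conservative = 2, moderate = k - 1, estimated = df_est, liberal = Inf)
crit <- stats::qt(0.975, dfs)

# Smaller df gives a larger critical value and therefore a wider interval
print(round(rbind(df = dfs, critical_value = crit), 3))
```

With sample sizes in the hundreds, the estimated degrees of freedom are large enough that the critical value is close to the `'liberal'` one, whereas `'conservative'` more than doubles it.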

Value

A data.table containing comparison results with the following columns:

`comparison`	String describing the comparison
`diff_means`	Numeric difference in means
`ci_lower`	Numeric lower bound of the confidence interval
`ci_upper`	Numeric upper bound of the confidence interval
`p.value`	Numeric p-value
`significant`	Logical indicating if the result is significant (TRUE if p-value < alpha, FALSE otherwise)
`t.statistic`	Numeric t-statistic
`df`	Numeric degrees of freedom
`df_method`	String indicating the method used for calculating degrees of freedom
`adjust_method`	String indicating the method used to adjust p-values for multiple comparisons (when `adjust_method` is not `NULL`)

Note

This function assumes unequal variances, which is typically more appropriate for comparisons across demographic groups in vital statistics, survey data, and other population-based studies. Equal variances are rarely encountered in such contexts due to inherent differences between subpopulations. If you have the underlying raw data (not just the means and standard errors) and want to perform calculations assuming equal variances or a paired t-test, please refer to `t.test` in the `stats` package.
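
For readers who do have raw data, the sketch below shows the relevant `stats::t.test` calls. The data are simulated purely for illustration, and the paired call assumes the two vectors hold matched observations in the same order.

```r
set.seed(1)

# Hypothetical raw birthweights (grams) for two groups of 30 observations each
x <- rnorm(30, mean = 3400, sd = 450)
y <- rnorm(30, mean = 3200, sd = 450)

# Welch's t-test (unequal variances); this is the t.test default
welch <- stats::t.test(x, y)

# Student's t-test assuming equal variances (pooled variance, df = n1 + n2 - 2)
pooled <- stats::t.test(x, y, var.equal = TRUE)

# Paired t-test; only meaningful when x[i] and y[i] are matched observations
paired <- stats::t.test(x, y, paired = TRUE)

print(c(welch = welch$p.value, pooled = pooled$p.value, paired = paired$p.value))
```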

Examples

# Example 1: Comparing birthweights across different maternal age groups
age_groups <- c("18-24", "25-29", "30-34", "35-39", "40+")
birthweight_means <- c(3150, 3450, 3400, 3250, 3100)  # in grams
birthweight_ses <- c(50, 45, 40, 55, 60)
sample_sizes <- c(500, 800, 750, 400, 200)
reference_group <- 3  # comparing all groups to the 30-34 age group

birthweight_comparison <- multi_t_test(
  means = birthweight_means,
  ses = birthweight_ses,
  reference_index = reference_group,
  n = sample_sizes,
  df_method = "estimated"
)

# Add age group labels to the results
birthweight_comparison[, Age_Group := age_groups]

print(birthweight_comparison)

PHSKC-APDE/rads documentation built on April 14, 2025, 10:47 a.m.