
multi_t_test R Documentation

Perform t-tests for Multiple Comparisons with Summary Statistics

Description

This function performs t-tests comparing multiple groups against a reference group using summary statistics. It offers flexibility in the method for calculating degrees of freedom, can estimate sample sizes if they are not provided, and can adjust p-values for multiple comparisons.

Usage

multi_t_test(
  means,
  ses,
  reference_index,
  n = NULL,
  alpha = 0.05,
  df_method = "estimated",
  alternative = "two.sided",
  adjust_method = NULL
)

Arguments

means

Numeric vector of group means.

ses

Numeric vector of standard errors for each group.

reference_index

Integer indicating the index of the reference group.

n

Optional numeric vector of sample sizes for each group.

alpha

Numeric value for significance level (default is 0.05).

df_method

String specifying the method for calculating degrees of freedom. Options are:

  • 'estimated' (Welch–Satterthwaite equation): This method, which corresponds to Welch's t-test, approximates the degrees of freedom from the sample variances and sizes. It is particularly useful when groups have unequal variances and/or unequal sample sizes, where it is generally more reliable than the standard (pooled-variance) t-test. It is a data-driven approach and is often preferred because it balances Type I errors (false positives) against Type II errors (false negatives).

  • 'conservative' (df = 2): Uses the minimum possible degrees of freedom, resulting in the widest confidence intervals (for the difference in means) and the most conservative (largest) p-values. Reduces the risk of Type I errors (false positives) and increases the risk of Type II errors (false negatives).

  • 'moderate' (df = k - 1): Uses the number of groups minus 1 as the degrees of freedom, providing a balance between conservative and liberal approaches.

  • 'liberal' (df = Inf): Assumes infinite degrees of freedom, resulting in the narrowest confidence intervals (for the difference in means) and the most liberal (smallest) p-values. Increases the risk of Type I errors (false positives) and reduces the risk of Type II errors (false negatives).

Default is 'estimated'.
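To make the difference between the three fixed-df options concrete, the base R sketch below compares the two-sided critical values that each would imply at alpha = 0.05. The value k = 5 is an assumed number of groups chosen only for illustration, and the code is not taken from the rads source.

```r
# Two-sided critical values at alpha = 0.05 for the fixed-df options.
# k = 5 is an assumed number of groups, used only for illustration.
alpha <- 0.05
k <- 5
df_options <- c(conservative = 2, moderate = k - 1, liberal = Inf)

# qt() returns the t quantile; df = Inf gives the standard normal quantile
crit_values <- setNames(qt(1 - alpha / 2, df = df_options), names(df_options))
print(round(crit_values, 3))
```

A larger critical value means a wider confidence interval for the same standard error, which is why df = 2 is the most conservative choice and df = Inf the most liberal.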

alternative

String specifying the alternative hypothesis: 'two.sided', 'less', or 'greater'. Default is 'two.sided'.

adjust_method

String specifying the method of adjustment for multiple comparisons: NULL (no adjustment), 'Holm-Bonferroni', or 'Benjamini-Hochberg'. Refer to the "holm" and "BH" methods described under p.adjust in the stats package for more information. Default is NULL.
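As a rough illustration of what the two adjustment options do, the sketch below applies the "holm" and "BH" methods of stats::p.adjust to a set of made-up p-values. The p-values are hypothetical, and it is an assumption that 'Holm-Bonferroni' and 'Benjamini-Hochberg' map onto these two p.adjust methods in the way shown.

```r
# Hypothetical unadjusted p-values from four comparisons against a reference
p_raw <- c(0.001, 0.012, 0.030, 0.200)

# Holm (step-down Bonferroni) controls the family-wise error rate
p_holm <- p.adjust(p_raw, method = "holm")

# Benjamini-Hochberg controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

print(data.frame(p_raw, p_holm, p_bh))
```

Holm-adjusted p-values are never smaller than the Benjamini-Hochberg ones for the same input, so the former is the stricter of the two choices.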

Details

This function conducts t-tests to compare multiple groups against a reference group.

The estimated degrees of freedom method (Welch's t-test) is generally preferred and is set as the default. However, when sample sizes (n) are smaller than 30, results can be unreliable. When n is not specified and df_method = "estimated", the function estimates sample sizes based partly on the distribution of mean values. The quality of these estimates depends on the number of groups (the length of the means argument). Because these are only estimates, supply actual sample sizes whenever they are available to ensure more accurate results.
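For readers who want to see the Welch–Satterthwaite approximation written out, here is a minimal base R sketch that computes it from standard errors and sample sizes. It uses the textbook formula with SE = s / sqrt(n); the helper name welch_df is hypothetical, and this is not necessarily how rads implements the calculation internally.

```r
# Hypothetical helper, not part of rads: Welch-Satterthwaite degrees of
# freedom for two independent groups, given standard errors and sample sizes.
# With SE = s / sqrt(n), the usual formula becomes:
#   df = (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))
welch_df <- function(se1, n1, se2, n2) {
  numerator <- (se1^2 + se2^2)^2
  denominator <- se1^4 / (n1 - 1) + se2^4 / (n2 - 1)
  numerator / denominator
}

# Illustrative inputs: SE 50 with n = 500 versus SE 40 with n = 750
df_est <- welch_df(se1 = 50, n1 = 500, se2 = 40, n2 = 750)
print(df_est)
```

The result always lies between min(n1, n2) - 1 and n1 + n2 - 2, which is a quick sanity check on any implementation.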

Value

A data.table containing comparison results with the following columns:

comparison

String describing the comparison

diff_means

Numeric difference in means

ci_lower

Numeric lower bound of the confidence interval

ci_upper

Numeric upper bound of the confidence interval

p.value

Numeric p-value

significant

Logical indicating if the result is significant (TRUE if p-value < alpha, FALSE otherwise)

t.statistic

Numeric t-statistic

df

Numeric degrees of freedom

df_method

String indicating the method used for calculating degrees of freedom

adjust_method

String indicating the method used to adjust p-values for multiple comparisons (when adjust_method is not NULL)
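The numeric columns above can be reproduced by hand for a single comparison. The sketch below is a hypothetical helper, not the rads implementation: it assumes the difference is taken as group minus reference, that the two groups are independent, and that the test is two-sided.

```r
# Hypothetical helper, not part of rads: one group-vs-reference comparison
# computed from summary statistics only.
compare_one <- function(mean_g, se_g, mean_ref, se_ref, df, alpha = 0.05) {
  diff_means <- mean_g - mean_ref           # assumed direction: group minus reference
  se_diff <- sqrt(se_g^2 + se_ref^2)        # SE of a difference of independent means
  t_stat <- diff_means / se_diff
  t_crit <- qt(1 - alpha / 2, df = df)      # two-sided critical value
  p_value <- 2 * pt(-abs(t_stat), df = df)  # two-sided p-value
  data.frame(
    diff_means = diff_means,
    ci_lower = diff_means - t_crit * se_diff,
    ci_upper = diff_means + t_crit * se_diff,
    p.value = p_value,
    significant = p_value < alpha,
    t.statistic = t_stat,
    df = df
  )
}

# Illustrative inputs: mean 3150 (SE 50) against a reference of 3400 (SE 40),
# using df = Inf, i.e. the 'liberal' option
res <- compare_one(3150, 50, 3400, 40, df = Inf)
print(res)
```

Passing a different df value, such as 2 or a Welch–Satterthwaite estimate, changes only the critical value and the p-value; the difference in means and the t-statistic stay the same.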

Note

This function assumes unequal variances, which is typically more appropriate for comparisons across demographic groups in vital statistics, survey data, and other population-based studies. Equal variances are rarely encountered in such contexts due to inherent differences between subpopulations. If you have the underlying raw data (not just the means and standard errors) and want to perform calculations assuming equal variances or a paired t-test, please refer to t.test in the stats package.
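When raw observations are available, the equal-variance and paired alternatives mentioned above can be run with stats::t.test. The sketch below uses simulated data purely for illustration; the group names and distribution parameters are made up.

```r
set.seed(42)  # simulated data, for illustration only
group_a <- rnorm(40, mean = 3400, sd = 450)
group_b <- rnorm(40, mean = 3250, sd = 450)

# Pooled-variance (Student) t-test: assumes equal variances
pooled <- t.test(group_a, group_b, var.equal = TRUE)

# Paired t-test: assumes the i-th values of each vector form a matched pair
paired <- t.test(group_a, group_b, paired = TRUE)

print(pooled$parameter)  # degrees of freedom: n1 + n2 - 2
print(paired$parameter)  # degrees of freedom: number of pairs - 1
```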

Examples

library(rads)
library(data.table)  # for the := assignment used below

# Example 1: Comparing birthweights across different maternal age groups
age_groups <- c("18-24", "25-29", "30-34", "35-39", "40+")
birthweight_means <- c(3150, 3450, 3400, 3250, 3100)  # in grams
birthweight_ses <- c(50, 45, 40, 55, 60)
sample_sizes <- c(500, 800, 750, 400, 200)
reference_group <- 3  # comparing all groups to the 30-34 age group

birthweight_comparison <- multi_t_test(
  means = birthweight_means,
  ses = birthweight_ses,
  reference_index = reference_group,
  n = sample_sizes,
  df_method = "estimated"
)

# Add age group labels to the results
birthweight_comparison[, Age_Group := age_groups]

print(birthweight_comparison)


PHSKC-APDE/rads documentation built on April 14, 2025, 10:47 a.m.