cat_all_fn: Compares observed and expected distribution of all...

cat_all_fn    R Documentation

Compares observed and expected distribution of all categorical (binomial) variables

Description

Creates plots of observed-to-expected numbers and ratios for the binomial variables and/or compares reported and calculated p-values for the variables.
Reference: Bolland MJ, Gamble GD, Avenell A, Cooper DJ, Grey A. Distributions of baseline categorical variables were different from the expected distributions in randomized trials with integrity concerns. J Clin Epidemiol. 2023;154:117-124.
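
As a rough illustration of the observed-to-expected idea (not the package's internal code), the sketch below assumes that, for a two-level variable, the count in one trial arm follows a binomial distribution with the arm size as the number of trials and the pooled proportion across arms as the probability. The arm sizes and counts are made-up values, and expected_count_probs is a hypothetical helper that does not exist in the package.

```r
# Hypothetical helper (not part of reappraised): probability of each possible
# count (0..n) in an arm of size n, assuming a binomial distribution with
# probability p.
expected_count_probs <- function(n, p) {
  stats::dbinom(0:n, size = n, prob = p)
}

# Made-up two-arm trial: 50 participants per arm, 18 and 22 with the characteristic
n_arm    <- c(50, 50)
observed <- c(18, 22)

# Pooled proportion across both arms (assumption: used as the binomial probability)
p_pooled <- sum(observed) / sum(n_arm)

# Expected probability of every possible count in arm 1
probs_arm1 <- expected_count_probs(n_arm[1], p_pooled)

# Probability of exactly the observed count in arm 1 under this assumption
p_obs_arm1 <- probs_arm1[observed[1] + 1]

# Expected count in arm 1 (mean of the binomial distribution)
expected_arm1 <- n_arm[1] * p_pooled
```

Repeating this over many variables and trials would give the expected frequency of each count, which could then be set against the observed frequencies.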

Usage

cat_all_fn(
  df = cat_all_data,
  comp.pvals = "no",
  fisher.sim = "y",
  fish.n.sims = 10000,
  binom = "no",
  two_levels = "no",
  del.disparate = "yes",
  excl.level = "yes",
  seed = 0,
  title = "",
  verbose = TRUE
)

Arguments

df

data frame generated from load_clean function

comp.pvals

"yes" or "no" indicator whether reported and calculated p-values should be compared

fisher.sim

"yes" or "no" indicator whether to allow the Fisher test to simulate p-values for tables larger than 2x2

fish.n.sims

number of simulations to use in Fisher test, default 10,000

binom

"yes" or "no" indicator whether observed to expected distributions of binomial variables should be calculated

two_levels

"yes" or "no" indicator whether variables with more than 2 levels should be collapsed to 2 levels

del.disparate

"yes" or "no" indicator whether data in which the absolute difference between group sizes is >20% should be deleted

excl.level

"yes" or "no" indicator whether one level of a variable should be deleted. Deleted level is chosen randomly using seed parameter.

seed

seed for the random number generator; the default, 0, uses the current date and time. Specify a seed to make results repeatable.

title

title name for plots (optional)

verbose

TRUE or FALSE indicator whether the progress bar and comments are shown and the flextable, plot, or both are printed
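
To show how simulated Fisher p-values and a fixed seed interact, here is a minimal base R sketch. It assumes that fisher.sim and fish.n.sims behave like the simulate.p.value and B options of stats::fisher.test, which this page does not confirm. The table is made up for illustration.

```r
# Made-up 3 x 2 table: three levels of a categorical variable across two trial arms
tab <- matrix(c(12, 15,
                20, 17,
                 8,  9),
              ncol = 2, byrow = TRUE,
              dimnames = list(level = c("A", "B", "C"),
                              arm   = c("arm1", "arm2")))

n_sims <- 10000  # plays the role of fish.n.sims (assumption)

# Fixing the seed makes the Monte Carlo p-value repeatable
set.seed(10)
p_sim_1 <- stats::fisher.test(tab, simulate.p.value = TRUE, B = n_sims)$p.value

set.seed(10)
p_sim_2 <- stats::fisher.test(tab, simulate.p.value = TRUE, B = n_sims)$p.value

# Exact p-value for comparison (feasible here because the table is small)
p_exact <- stats::fisher.test(tab)$p.value

# The two simulated p-values match because the same seed was used
same_result <- identical(p_sim_1, p_sim_2)
```

Without a fixed seed, repeated runs would give slightly different simulated p-values, which is why specifying seed matters when results need to be repeatable.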

Details

Returns a list containing the objects described below and (if verbose = TRUE) prints the flextable cat_all_diff_calc_rep_ft and/or the graph cat_all_graph, depending on the options chosen.

Value

list containing objects as described

if p-value comparison used:

  • cat_all_pvals = data frame of data for comparison of reported and calculated p-values

  • cat_all_diff_calc_rep_ft = flextable of comparison of reported and calculated p-values

  • cat_all_diff_calc_rep_data = data frame used to make flextable

  • cat_all_diff_thresh_ft = flextable of comparison of reported and calculated p-values when only threshold given

  • cat_all_diff_thresh_data = data frame used to make flextable for p-value thresholds

if comparison of categorical variables used:

  • cat_all_graph = plot of observed to expected numbers and differences between groups, top panels are the absolute numbers, bottom panels are the differences between trial arms in two arm studies

  • cat_all_graph_pc = plot of observed to expected numbers expressed as percentages and differences between groups, top panels are the percentages, bottom panels are the differences between trial arms in two arm studies

  • cat_all_data_abs = data frame of data for absolute numbers

  • cat_all_data_df = data frame of data for difference between groups in two arm studies

  • cat_all_dataset_abs = data frame of dataset used for all trials

  • cat_all_dataset_df = data frame of dataset used for two arm trials

  • cat_all_all_graphs = list containing

    • abs = plot for absolute numbers only

    • df = plot for difference between groups in two arm studies only

    • pc = plot for percentages only

    • all_pc = composite plot of percentages and absolute numbers

    • individual_graphs = list of 6 individual plots making up composite figures
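
The comparison of reported and calculated p-values summarised in the objects above can be illustrated outside the package with a small base R sketch. The data, the column names, the calc_p helper, and the choice of an uncorrected chi-squared test are all assumptions made for illustration. The test the package actually applies is not stated on this page.

```r
# Made-up data: arm sizes (n1, n2), counts with the characteristic (v1, v2),
# and the p-value each hypothetical trial reported
trials <- data.frame(
  study      = c("trial_1", "trial_2"),
  n1         = c(50, 80),
  v1         = c(18, 30),
  n2         = c(50, 82),
  v2         = c(22, 45),
  reported_p = c(0.42, 0.03)
)

# Hypothetical helper: chi-squared test without continuity correction on the
# 2 x 2 table of (with, without) the characteristic by arm
calc_p <- function(v1, n1, v2, n2) {
  tab <- matrix(c(v1, n1 - v1, v2, n2 - v2), nrow = 2, byrow = TRUE)
  stats::chisq.test(tab, correct = FALSE)$p.value
}

# Recalculate each p-value and take the difference from the reported value
trials$calculated_p <- mapply(calc_p, trials$v1, trials$n1, trials$v2, trials$n2)
trials$difference   <- trials$calculated_p - trials$reported_p
```

Large differences between calculated_p and reported_p would be the rows worth inspecting by hand.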

Examples

# load example data
cat_all_data <- load_clean(import = "no", file.cat = "SI_cat_all", cat_all = "yes",
                           format.cat = "wide")$cat_all_data


# run function comparing p-values only (takes only a few seconds)
cat_all_fn(comp.pvals = "yes")$cat_all_diff_calc_rep_ft

# run function comparing distribution of binomial variables only

# to speed example up limit to 12 2-arm trials with 20 variables
# (takes close to 5 secs)

cat_all_data <- cat_all_data[1:41, c(1:8, 10:11, 13:15)]

cat_all_fn(binom = "yes", two_levels = "yes", del.disparate = "yes",
           excl.level = "yes", seed = 10)$cat_all_graph


# to import an excel spreadsheet (modify using local path,
# file and sheet name, range, and format):

# get path for example files
path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised",
                   mustWork = TRUE)
# delete file name from path
path <- sub("/[^/]+$", "", path)

# load data
cat_all_data <- load_clean(import = "yes", cat_all = "yes", dir = path,
   file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat_all",
   range.name.cat = "A:N", format.cat = "wide")$cat_all_data


reappraised documentation built on Oct. 6, 2023, 9:08 a.m.