clintrial: Simulated Clinical Trial Dataset
In summata: Publication-Ready Summary Tables and Forest Plots

clintrial

R Documentation

Simulated Clinical Trial Dataset

Description

A simulated dataset from a hypothetical multi-center oncology clinical trial comparing two experimental drugs (Drug A and Drug B) against control. It is designed to demonstrate the package's descriptive and regression analysis functions.

Usage

clintrial

Format

A data frame with 850 observations and 32 variables:

patient_id: Unique patient identifier (character)
age: Age at enrollment in years (numeric: 18-90)
sex: Biological sex (factor: Female, Male)
race: Self-reported race (factor: White, Black, Asian, Other)
ethnicity: Hispanic ethnicity (factor: Non-Hispanic, Hispanic)
bmi: Body mass index in kg/m^2 (numeric)
smoking: Smoking history (factor: Never, Former, Current)
hypertension: Hypertension diagnosis (factor: No, Yes)
diabetes: Diabetes diagnosis (factor: No, Yes)
ecog: ECOG performance status (factor: 0, 1, 2, 3)
creatinine: Baseline creatinine in mg/dL (numeric)
hemoglobin: Baseline hemoglobin in g/dL (numeric)
biomarker_x: Serum biomarker A in ng/mL (numeric)
biomarker_y: Serum biomarker B in U/L (numeric)
site: Enrolling site (factor: Site Alpha through Site Kappa)
grade: Tumor grade (factor: Well/Moderately/Poorly differentiated)
stage: Disease stage at diagnosis (factor: I, II, III, IV)
treatment: Randomized treatment (factor: Control, Drug A, Drug B)
surgery: Surgical resection (factor: No, Yes)
any_complication: Any post-operative complication (factor: No, Yes)
wound_infection: Post-operative wound infection (factor: No, Yes)
icu_admission: ICU admission required (factor: No, Yes)
readmission_30d: Hospital readmission within 30 days (factor: No, Yes)
pain_score: Pain score at discharge (numeric: 0-10)
recovery_days: Days to functional recovery (numeric)
los_days: Hospital length of stay in days (numeric)
ae_count: Adverse event count (integer). Overdispersed count suitable for negative binomial or quasipoisson regression.
fu_count: Follow-up visit count (integer). Equidispersed count suitable for standard Poisson regression.
pfs_months: Progression-free survival time in months (numeric)
pfs_status: Progression or death event indicator
os_months: Overall survival time in months (numeric)
os_status: Death indicator (numeric: 0=censored, 1=death)
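
The documented structure can be checked once the data are loaded. The following is a minimal sketch in base R, assuming the summata package is installed; summarize_columns is a hypothetical helper written for this illustration and is not part of summata.

```r
# Hypothetical helper (not part of summata): one row per column, giving the
# column's first class and its proportion of missing values.
summarize_columns <- function(df) {
  data.frame(
    variable     = names(df),
    class        = unname(vapply(df, function(col) class(col)[1], character(1))),
    prop_missing = unname(vapply(df, function(col) mean(is.na(col)), numeric(1))),
    stringsAsFactors = FALSE
  )
}

# Assumes summata is installed; the block is skipped otherwise.
if (requireNamespace("summata", quietly = TRUE)) {
  data("clintrial", package = "summata")
  print(dim(clintrial))  # documented as 850 observations and 32 variables
  print(head(summarize_columns(clintrial)))
}
```

The class column should agree with the types listed above (for example, factor for sex and numeric for age).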

Details

This dataset includes realistic relationships between variables and several built-in features:

- Survival is worse with higher stage, ECOG, age, and biomarker_x
- Treatment effects show Drug B > Drug A > Control
- ae_count is overdispersed (variance > mean) for negative binomial demos
- fu_count is equidispersed (variance ≈ mean) for Poisson demos
- Approximately 2% of values are missing at random
- Median follow-up is approximately 30 months
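
The dispersion properties of the two count variables can be checked with a variance-to-mean ratio: a value near 1 is consistent with a Poisson model, while a value well above 1 suggests overdispersion. The following is a minimal sketch in base R, assuming the summata package is installed; dispersion_ratio is a hypothetical helper, not a summata function. It is a crude marginal check that does not condition on covariates.

```r
# Hypothetical helper (not part of summata): variance-to-mean ratio of a
# count variable, computed after dropping missing values.
dispersion_ratio <- function(x) {
  x <- x[!is.na(x)]
  stats::var(x) / mean(x)
}

# Assumes summata is installed; the block is skipped otherwise.
if (requireNamespace("summata", quietly = TRUE)) {
  data("clintrial", package = "summata")
  # Documented expectation: well above 1 for ae_count, near 1 for fu_count.
  print(c(ae_count = dispersion_ratio(clintrial$ae_count),
          fu_count = dispersion_ratio(clintrial$fu_count)))
}
```

A ratio well above 1 for ae_count would motivate the negative binomial or quasi-Poisson models mentioned in the Format section.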

Source

Simulated data for demonstration purposes

Examples

library(summata)
data(clintrial)
data(clintrial_labels)

# Descriptive statistics by treatment arm
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "stage", "ecog",
                        "biomarker_x", "Surv(os_months, os_status)"),
          labels = clintrial_labels)


# Poisson regression for equidispersed counts
fit(clintrial,
    outcome = "fu_count",
    predictors = c("age", "stage", "treatment"),
    model_type = "glm",
    family = "poisson",
    labels = clintrial_labels)

# Negative binomial for overdispersed counts
fit(clintrial,
    outcome = "ae_count",
    predictors = c("age", "treatment", "diabetes"),
    model_type = "negbin",
    labels = clintrial_labels)

# Complete analysis pipeline
fullfit(clintrial,
        outcome = "Surv(os_months, os_status)",
        predictors = c("age", "sex", "stage", "grade", "ecog",
                       "smoking", "biomarker_x", "biomarker_y", "treatment"),
        method = "screen",
        p_threshold = 0.20,
        model_type = "coxph",
        labels = clintrial_labels)

summata documentation built on May 7, 2026, 5:07 p.m.