clintrial: Simulated Clinical Trial Dataset

clintrial R Documentation

Simulated Clinical Trial Dataset

Description

A simulated dataset from a hypothetical multi-center oncology clinical trial comparing two experimental drugs against a control arm. It is designed to demonstrate the full capabilities of the descriptive and regression analysis functions.

Usage

clintrial

Format

A data frame with 850 observations and 32 variables:

patient_id

Unique patient identifier (character)

age

Age at enrollment in years (numeric: 18-90)

sex

Biological sex (factor: Female, Male)

race

Self-reported race (factor: White, Black, Asian, Other)

ethnicity

Hispanic ethnicity (factor: Non-Hispanic, Hispanic)

bmi

Body mass index in kg/m^2 (numeric)

smoking

Smoking history (factor: Never, Former, Current)

hypertension

Hypertension diagnosis (factor: No, Yes)

diabetes

Diabetes diagnosis (factor: No, Yes)

ecog

ECOG performance status (factor: 0, 1, 2, 3)

creatinine

Baseline creatinine in mg/dL (numeric)

hemoglobin

Baseline hemoglobin in g/dL (numeric)

biomarker_x

Serum biomarker A in ng/mL (numeric)

biomarker_y

Serum biomarker B in U/L (numeric)

site

Enrolling site (factor: Site Alpha through Site Kappa)

grade

Tumor grade (factor: Well/Moderately/Poorly differentiated)

stage

Disease stage at diagnosis (factor: I, II, III, IV)

treatment

Randomized treatment (factor: Control, Drug A, Drug B)

surgery

Surgical resection (factor: No, Yes)

any_complication

Any post-operative complication (factor: No, Yes)

wound_infection

Post-operative wound infection (factor: No, Yes)

icu_admission

ICU admission required (factor: No, Yes)

readmission_30d

Hospital readmission within 30 days (factor: No, Yes)

pain_score

Pain score at discharge (numeric: 0-10)

recovery_days

Days to functional recovery (numeric)

los_days

Hospital length of stay in days (numeric)

ae_count

Adverse event count (integer). Overdispersed count suitable for negative binomial or quasipoisson regression.

fu_count

Follow-up visit count (integer). Equidispersed count suitable for standard Poisson regression.

pfs_months

Progression-free survival time in months

pfs_status

Progression or death event indicator

os_months

Overall survival time in months (numeric)

os_status

Death indicator (numeric: 0=censored, 1=death)
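The two count columns described above differ in their dispersion, which can be checked with a variance-to-mean ratio. The following is a minimal sketch in base R, not part of the package: dispersion_ratio is a hypothetical helper, and fu_demo and ae_demo are simulated stand-ins (with arbitrarily chosen parameters) for fu_count and ae_count so that the code runs without the package installed.

```r
# Stand-in data (assumption): simulated counts, not the real clintrial columns.
set.seed(42)
fu_demo <- rpois(5000, lambda = 6)            # equidispersed by construction
ae_demo <- rnbinom(5000, mu = 6, size = 1.5)  # overdispersed by construction

# Hypothetical helper: variance-to-mean ratio of a count vector.
# Values near 1 are consistent with a Poisson model; values well
# above 1 indicate overdispersion.
dispersion_ratio <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before computing moments
  stats::var(x) / mean(x)
}

fu_ratio <- dispersion_ratio(fu_demo)
ae_ratio <- dispersion_ratio(ae_demo)
print(c(fu = fu_ratio, ae = ae_ratio))
```

With the package loaded, the same helper could presumably be applied directly, for example dispersion_ratio(clintrial$ae_count).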

Details

This dataset includes realistic correlations between variables:

- Survival is worse with higher stage, ECOG, age, and biomarker_x
- Treatment effects show Drug B > Drug A > Control
- ae_count is overdispersed (variance > mean) for negative binomial demos
- fu_count is equidispersed (variance ≈ mean) for Poisson demos
- Approximately 2% of values are missing at random
- Median follow-up is approximately 30 months
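The missing-data rate mentioned in the details above can be inspected per column and overall. This is a minimal sketch in base R under stated assumptions: demo_df is a hypothetical stand-in data frame with made-up values and exactly 2% of each column blanked out, used so the code runs without the package; with the real data, clintrial would take its place.

```r
# Stand-in data frame (assumption): two simulated numeric columns.
set.seed(1)
demo_df <- data.frame(
  age = rnorm(1000, mean = 60, sd = 10),
  bmi = rnorm(1000, mean = 27, sd = 4)
)

# Blank out 20 of 1000 rows (2%) in each column, chosen at random.
for (col in names(demo_df)) {
  demo_df[sample(nrow(demo_df), 20), col] <- NA
}

# Fraction of missing values per column and across all cells.
missing_by_column <- colMeans(is.na(demo_df))
missing_overall <- mean(is.na(demo_df))
print(round(missing_by_column, 3))
print(round(missing_overall, 3))
```

Replacing demo_df with clintrial in the last four lines would give the corresponding rates for the real dataset.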

Source

Simulated data for demonstration purposes

See Also

Other sample data: clintrial_labels

Examples

data(clintrial)
data(clintrial_labels)

# Descriptive statistics by treatment arm
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "stage", "ecog",
                        "biomarker_x", "Surv(os_months, os_status)"),
          labels = clintrial_labels)


# Poisson regression for equidispersed counts
fit(clintrial,
    outcome = "fu_count",
    predictors = c("age", "stage", "treatment"),
    model_type = "glm",
    family = "poisson",
    labels = clintrial_labels)

# Negative binomial for overdispersed counts
fit(clintrial,
    outcome = "ae_count",
    predictors = c("age", "treatment", "diabetes"),
    model_type = "negbin",
    labels = clintrial_labels)

# Complete analysis pipeline
fullfit(clintrial,
        outcome = "Surv(os_months, os_status)",
        predictors = c("age", "sex", "stage", "grade", "ecog",
                       "smoking", "biomarker_x", "biomarker_y", "treatment"),
        method = "screen",
        p_threshold = 0.20,
        model_type = "coxph",
        labels = clintrial_labels)

summata documentation built on May 7, 2026, 5:07 p.m.