| clintrial | R Documentation |
A simulated dataset from a hypothetical multi-center oncology clinical trial comparing two experimental drugs against a control arm. It is designed to demonstrate the full capabilities of the descriptive and regression analysis functions.
clintrial
A data frame with 850 observations and 32 variables:
Unique patient identifier (character)
Age at enrollment in years (numeric: 18-90)
Biological sex (factor: Female, Male)
Self-reported race (factor: White, Black, Asian, Other)
Hispanic ethnicity (factor: Non-Hispanic, Hispanic)
Body mass index in kg/m^2 (numeric)
Smoking history (factor: Never, Former, Current)
Hypertension diagnosis (factor: No, Yes)
Diabetes diagnosis (factor: No, Yes)
ECOG performance status (factor: 0, 1, 2, 3)
Baseline creatinine in mg/dL (numeric)
Baseline hemoglobin in g/dL (numeric)
Serum biomarker A in ng/mL (numeric)
Serum biomarker B in U/L (numeric)
Enrolling site (factor: Site Alpha through Site Kappa)
Tumor grade (factor: Well/Moderately/Poorly differentiated)
Disease stage at diagnosis (factor: I, II, III, IV)
Randomized treatment (factor: Control, Drug A, Drug B)
Surgical resection (factor: No, Yes)
Any post-operative complication (factor: No, Yes)
Post-operative wound infection (factor: No, Yes)
ICU admission required (factor: No, Yes)
Hospital readmission within 30 days (factor: No, Yes)
Pain score at discharge (numeric: 0-10)
Days to functional recovery (numeric)
Hospital length of stay in days (numeric)
Adverse event count (integer). Overdispersed count suitable for negative binomial or quasipoisson regression.
Follow-up visit count (integer). Equidispersed count suitable for standard Poisson regression.
Progression-free survival time in months
Progression or death event
Overall survival time in months (numeric)
Death indicator (numeric: 0=censored, 1=death)
This dataset includes realistic correlations between variables:
- Survival is worse with higher stage, ECOG, age, and biomarker_x
- Treatment effects are ordered Drug B > Drug A > Control
- ae_count is overdispersed (variance > mean) for negative binomial demos
- fu_count is equidispersed (variance ≈ mean) for Poisson demos
- Approximately 2% of values are missing at random
- Median follow-up is approximately 30 months
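The dispersion and missingness properties listed above can be checked with a few lines of base R. The following is a minimal sketch, not part of the package: `dispersion_ratio()` and `missing_fraction()` are hypothetical helpers, and the `demo` data frame is simulated stand-in data (its parameters are assumptions chosen for illustration), so the block runs without the package installed.

```r
# Hypothetical helper (not provided by the package): variance-to-mean ratio.
# Values near 1 suggest equidispersion; values well above 1 suggest overdispersion.
dispersion_ratio <- function(x) {
  x <- x[!is.na(x)]
  var(x) / mean(x)
}

# Hypothetical helper: proportion of missing cells across a whole data frame.
missing_fraction <- function(df) {
  mean(is.na(df))
}

# Stand-in data so the sketch is self-contained. These are NOT the real
# clintrial values; the distribution parameters are illustrative assumptions.
set.seed(1)
demo <- data.frame(
  ae_count = rnbinom(850, size = 1.5, mu = 4),  # negative binomial: var = mu + mu^2 / size
  fu_count = rpois(850, lambda = 6)             # Poisson: var = mean
)
demo$fu_count[sample(850, 17)] <- NA            # inject some missing values

ratios <- sapply(demo, dispersion_ratio)
print(ratios)
print(missing_fraction(demo))
```

With the package loaded, the same helpers could be applied to the real data, for example `sapply(clintrial[c("ae_count", "fu_count")], dispersion_ratio)` and `missing_fraction(clintrial)`.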
Simulated data for demonstration purposes
Other sample data:
clintrial_labels
data(clintrial)
data(clintrial_labels)
# Descriptive statistics by treatment arm
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "stage", "ecog",
                        "biomarker_x", "Surv(os_months, os_status)"),
          labels = clintrial_labels)
# Poisson regression for equidispersed counts
fit(clintrial,
    outcome = "fu_count",
    predictors = c("age", "stage", "treatment"),
    model_type = "glm",
    family = "poisson",
    labels = clintrial_labels)
# Negative binomial for overdispersed counts
fit(clintrial,
    outcome = "ae_count",
    predictors = c("age", "treatment", "diabetes"),
    model_type = "negbin",
    labels = clintrial_labels)
# Complete analysis pipeline
fullfit(clintrial,
        outcome = "Surv(os_months, os_status)",
        predictors = c("age", "sex", "stage", "grade", "ecog",
                       "smoking", "biomarker_x", "biomarker_y", "treatment"),
        method = "screen",
        p_threshold = 0.20,
        model_type = "coxph",
        labels = clintrial_labels)