View source: R/generate_cure_data.R
generate_cure_data | R Documentation |
Simulate data under a mixture cure model.
generate_cure_data(
  n = 400,
  j = 500,
  nonp = 2,
  train_prop = 0.75,
  n_true = 10,
  a = 1,
  rho = 0.5,
  itct_mean = 0.5,
  cens_ub = 20,
  alpha = 1,
  lambda = 2,
  same_signs = FALSE,
  model = "weibull"
)
n |
an integer denoting the total sample size. |
j |
an integer denoting the number of penalized predictors (which is the same for both the incidence and latency portions of the model). |
nonp |
an integer denoting the number of unpenalized predictors (which is the same for both the incidence and latency portions of the model). |
train_prop |
a numeric value in [0, 1) representing the fraction of |
n_true |
an integer less than |
a |
a numeric value denoting the effect size (signal amplitude) which is the same for both the incidence and latency portions of the model. |
rho |
a numeric value in [0, 1) representing the correlation between adjacent covariates in the same block. |
itct_mean |
a numeric value representing the expectation of the incidence intercept which controls the cure rate. |
cens_ub |
a numeric value representing the upper bound on the censoring
time distribution which follows a uniform distribution on (0, |
alpha |
a numeric value representing the shape parameter in the Weibull density. |
lambda |
a numeric value representing the rate parameter in the Weibull density. |
same_signs |
a logical value; if TRUE, the incidence and latency coefficients have the same signs. |
model |
the type of regression model to use for the latency portion of the mixture cure model. Can be one of the following:
|
training |
training data.frame which includes Time, Censor, and
covariates. Variables prefixed with |
testing |
testing data.frame which includes Time, Censor, and
covariates. Variables prefixed with |
parameters |
a list including: the indices of true incidence
signals ( |
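library(survival)
# Assumption: generate_cure_data() and cureem() come from the hdcuremodels package
library(hdcuremodels)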
withr::local_seed(1234)
# This dataset has 2 penalized features associated with the outcome,
# 3 penalized features not associated with the outcome (noise features), and 1
# unpenalized noise feature.
data <- generate_cure_data(n = 1000, j = 5, n_true = 2, nonp = 1, a = 2)
# Extract the training data
training <- data$training
# Extract the testing data
testing <- data$testing
# To identify the features truly associated with incidence
names(training)[grep("^X", names(training))][data$parameters$nonzero_b]
# To identify the features truly associated with latency
names(training)[grep("^X", names(training))][data$parameters$nonzero_beta]
# Fit the model to the training data
fitem <- cureem(Surv(Time, Censor) ~ ., data = training,
                x_latency = training)
# Examine the estimated coefficients at the (default) minimum AIC
coef(fitem)
# As the penalty increases, the coefficients for the noise variables shrink
# to or remain at zero, while the truly associated features have coefficient
# paths that depart from zero. This shows the model's ability to distinguish
# signal from noise.
plot(fitem, label = TRUE)
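The following base R sketch may help illustrate how the arguments described above (`itct_mean`, `a`, `alpha`, `lambda`, `cens_ub`) could interact in a mixture cure simulation. It is not the implementation used by `generate_cure_data`: the helper `simulate_simple_cure` is hypothetical, it uses a single covariate, and it assumes a logistic incidence model and a Weibull proportional hazards latency model with survival function S(t | x) = exp(-lambda * t^alpha * exp(a * x)).

```r
# Hypothetical helper, NOT the package's implementation: a simplified
# mixture cure simulation with one covariate, using only base R.
simulate_simple_cure <- function(n = 400, itct_mean = 0.5, a = 1,
                                 alpha = 1, lambda = 2, cens_ub = 20) {
  x <- rnorm(n)  # a single covariate, for illustration only

  # Incidence (assumed logistic): probability of being susceptible (uncured)
  p_uncured <- plogis(itct_mean + a * x)
  uncured <- rbinom(n, size = 1, prob = p_uncured)

  # Latency (assumed Weibull PH): inverse-transform sampling from
  # S(t | x) = exp(-lambda * t^alpha * exp(a * x))
  u <- runif(n)
  event_time <- (-log(u) / (lambda * exp(a * x)))^(1 / alpha)
  event_time[uncured == 0] <- Inf  # cured subjects never experience the event

  # Censoring times are uniform on (0, cens_ub)
  cens_time <- runif(n, min = 0, max = cens_ub)

  data.frame(
    Time = pmin(event_time, cens_time),
    Censor = as.integer(event_time <= cens_time),  # 1 = event observed
    X1 = x,
    uncured = uncured
  )
}

set.seed(1234)
sim <- simulate_simple_cure(n = 1000)
mean(sim$Censor)        # proportion of observed events
mean(sim$uncured == 0)  # simulated cure proportion
```

In this sketch, raising `itct_mean` increases the probability of being susceptible and so lowers the cure proportion, while lowering `cens_ub` shortens follow-up and increases censoring among susceptible subjects.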