simulate_data | R Documentation |
The data is simulated assuming that the response to treatment is influenced by a subset of K unknown covariates (the sensitive covariates) through the following model:
logit(p_i) = mu + lambda*t_i + gamma_1*t_i*x_i1 + ... + gamma_K*t_i*x_iK,
where p_i is the probability of response to treatment for the i-th patient; mu is the intercept; lambda is the treatment main effect that all patients experience regardless of the values of the covariates; t_i is the treatment that the i-th patient receives (t_i = 0 for the control arm and t_i=1 for the treatment arm); x_i1,...,x_iK are the values for the K unknown sensitive covariates; gamma_1,...,gamma_K are treatment-covariate interaction effects for the K covariates. The model assumes that there is a subset of patients (the sensitive group) with a higher probability of response when treated with the new treatment, compared with the control treatment.
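The model above can be illustrated with a minimal sketch in R. This is not the package's own implementation: the sample size, the number of sensitive covariates, the coefficient values, and the standard normal covariates are all assumptions chosen for illustration only.

```r
# Minimal sketch of the response model; all names and values are illustrative.
set.seed(1)

n <- 200                     # number of patients (hypothetical)
K <- 3                       # number of sensitive covariates (hypothetical)
mu <- -1                     # intercept
lambda <- 0.2                # treatment main effect
gamma <- c(0.8, 0.5, 0.3)    # treatment-covariate interaction effects

treat <- rbinom(n, size = 1, prob = 0.5)       # t_i: 0 = control, 1 = treatment
x <- matrix(rnorm(n * K), nrow = n, ncol = K)  # x_i1, ..., x_iK

# Linear predictor: mu + lambda * t_i + t_i * sum_k gamma_k * x_ik
eta <- mu + lambda * treat + treat * as.vector(x %*% gamma)

# The inverse logit gives p_i; a Bernoulli draw gives the binary response
p <- plogis(eta)
response <- rbinom(n, size = 1, prob = p)
```

For control patients (t_i = 0) both the main effect and the interaction terms vanish, so their response probability reduces to the inverse logit of mu.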
simulate_data(param_file)
param_file |
The name of the parameter text file. The file should have one row per parameter, giving the parameter name followed by a space and then the parameter value. The parameters are as follows:
size_stage1 - sample size for stage 1
size_stage2 - sample size for stage 2
num_all_var - number of covariates
num_sens_var - number of sensitive covariates
mu1 - mean for the sensitive covariates in the sensitive group
mu2 - mean for the sensitive covariates in the non-sensitive group
mu0 - mean for the non-sensitive covariates
sigma1 - sd for the sensitive covariates in the sensitive group
sigma2 - sd for the sensitive covariates in the non-sensitive group
sigma0 - sd for the non-sensitive covariates
rho1 - correlation for the sensitive covariates in the sensitive group
rho2 - correlation for the sensitive covariates in the non-sensitive group
rho0 - correlation for the non-sensitive covariates
perc_sens - prevalence of the sensitive group
resp_rate_treat - response rate for everyone on treatment
resp_rate_con - response rate for everyone on control
resp_rate_sens_treat - response rate for the sensitive group on treatment
seed - seed for random number generation
threshold_overall - p-value threshold for the test for the differences in the treatment effect in the overall trial population
threshold_group - p-value threshold for the test for the treatment effect in the sensitive group
standardise_cvrs - a logical flag (0/1) for the standardisation of the risk scores during the analysis. The standardisation is performed with respect to the training data sets, per cross-validation fold
full_model - a logical flag (0/1) for the full model (treatment effect, covariate effect and the interaction effect) for the analysis. When full_model is 0, only the interaction effect is included in the model
An example of the file is as follows (see also the example file in the "data" directory).
size_stage1 500
size_stage2 500
num_all_var 100
num_sens_var 10
mu1 1
mu2 0
mu0 0
sigma1 0.5
sigma2 0.1
sigma0 0.5
rho1 0
rho2 0
rho0 0
perc_sens 0.1
resp_rate_treat 0.25
resp_rate_con 0.25
resp_rate_sens_treat 0.7
seed 123
threshold_overall 0.04
threshold_group 0.01
standardise_cvrs 0
full_model 0
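A file in this "name value" layout can be read into a named list with base R, as in the sketch below. How simulate_data parses the file internally is not documented here, so this is only one possible approach; the temporary file and the object names (param_lines, tmp_file, raw, params) are hypothetical, and only a few of the parameters are written out to keep the example short.

```r
# Write a small parameter file to a temporary location for illustration
param_lines <- c("size_stage1 500", "size_stage2 500", "num_all_var 100",
                 "num_sens_var 10", "perc_sens 0.1", "seed 123")
tmp_file <- tempfile(fileext = ".txt")
writeLines(param_lines, tmp_file)

# Each row is "<name> <value>", separated by white space
raw <- read.table(tmp_file, header = FALSE,
                  col.names = c("name", "value"),
                  stringsAsFactors = FALSE)

# Convert to a named list so parameters can be looked up by name
params <- setNames(as.list(raw$value), raw$name)

params$size_stage1   # 500
params$perc_sens     # 0.1
```

A named list keeps each value next to its parameter name, which makes a misspelled or missing row easy to detect before any simulation is run.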
A list with two data frames (patients, covar) and two vectors (resp.rate, response)
patients: a data frame with one row per patient and the following columns: FID (family ID), IID (individual ID), treat (1 for treatment and 0 for control), sens.status (true sensitivity status), stage (1)
covar: a data frame with covariate data for L covariates
resp.rate: a vector of response rates
response: a vector of simulated binary responses
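The sketch below shows one way such a list might be inspected. It builds a small mock object with the documented components rather than calling simulate_data, and it makes several assumptions that the documentation does not confirm: that covar, resp.rate and response each have one entry per patient, and that sens.status is coded 0/1. All values are made up.

```r
# Mock object with the documented structure; values are made up for illustration
set.seed(2)
n <- 8
simdata <- list(
  patients = data.frame(FID = 1:n, IID = 1:n,
                        treat = rep(c(0, 1), each = n / 2),
                        sens.status = rep(c(0, 1), times = n / 2),  # assumed 0/1 coding
                        stage = 1),
  covar = as.data.frame(matrix(rnorm(n * 3), nrow = n)),  # 3 mock covariates
  resp.rate = rep(0.25, n),                               # assumed one rate per patient
  response = rbinom(n, size = 1, prob = 0.25)
)

# Observed response rate by arm and true sensitivity status
obs_rate <- with(simdata$patients,
                 tapply(simdata$response,
                        list(treat = treat, sens = sens.status),
                        mean))
obs_rate
```

With real simulated data, the cell for treated sensitive patients would be expected to show a higher rate than the other cells, in line with the model described above.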
Svetlana Cherlin, James Wason
analyse_simdata.R function.
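# Path to the parameter file (see the param_file argument)
param_file <- "data/param.txt"
# Simulate the data using the parameters in the file
simdata_stage1 <- simulate_data(param_file)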