Synthetic.4-data: Synthetic Dataset #4: p > n case



Dataset from simulated regression survival model #4 as described in Dazard et al. (2015). Here, the regression function uses 1/10 of the predictors as informative predictors in a p > n situation with p = 1000 and n = 100. The rest are non-informative noisy covariates, which are not part of the design matrix. Survival times were generated from an exponential model with rate parameter λ (and mean 1/λ) according to a Cox-PH model with hazard exp(eta), where eta(.) is the regression function. Censoring indicators were generated from a uniform distribution on [0, 2]. In this synthetic example, all covariates are continuous and i.i.d. from a multivariate standard normal distribution.
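The generating scheme described above can be illustrated with a minimal sketch. This is not the code used to build the dataset: the function name `simulate_synthetic4_like`, the coefficient value `beta`, the choice of the first p/10 columns as the informative ones, and the reading of the uniform draw on [0, 2] as a censoring time are all assumptions made for illustration. The actual regression function is defined in Dazard et al. (2015).

```python
import math
import random


def simulate_synthetic4_like(n=100, p=1000, informative_fraction=0.1,
                             beta=0.1, censor_max=2.0, seed=0):
    """Simulate censored survival data loosely following the description.

    Assumptions (not taken from the source): a linear regression function
    with a common hypothetical coefficient `beta` on the first p/10
    covariates, and censoring times drawn uniformly on [0, censor_max].
    """
    rng = random.Random(seed)
    n_informative = round(p * informative_fraction)

    # Covariates: n rows by p columns, i.i.d. standard normal.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

    times, status = [], []
    for row in X:
        # Hypothetical linear regression function over informative columns.
        eta = beta * sum(row[:n_informative])
        # Exponential survival time with rate exp(eta), i.e. mean exp(-eta).
        t_event = rng.expovariate(math.exp(eta))
        # Assumed censoring time, uniform on [0, censor_max].
        t_censor = rng.uniform(0.0, censor_max)
        times.append(min(t_event, t_censor))
        status.append(1 if t_event <= t_censor else 0)  # 1 = event observed
    return X, times, status


X, times, status = simulate_synthetic4_like()
```

Here `X` plays the role of the 100 by 1000 covariate matrix, while `times` and `status` correspond to the (censored) time-to-event variable and the censoring indicator.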




Each dataset consists of a numeric matrix with n = 100 observations (samples) in rows and p = 1000 variables in columns, not including the censoring indicator and (censored) time-to-event variables. It is distributed as a compressed Rda data file.


This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. This project was partially funded by the National Institutes of Health NIH - National Cancer Institute (R01-CA160593) to J-E. Dazard and J.S. Rao.


Maintainer: "Jean-Eudes Dazard, Ph.D."


See simulated survival model #4 in Dazard et al., 2015.


PRIMsrc documentation built on Oct. 5, 2018, 1:03 a.m.