
A data set containing the binary outcome and 1028 predictor variables of 400 artificial AML patients.


A data frame with 400 rows and 1029 variables:

- pl_out (`pl_data[,1029]`): binary outcome representing refractory status.

- b1 (`pl_data[,1:4]`): 4 binary variables representing variables with a known influence on the outcome.

- b2 (`pl_data[,5:9]`): 5 continuous variables representing clinical variables.

- b3 (`pl_data[,10:28]`): 19 binary variables representing mutations.

- b4 (`pl_data[,29:1028]`): 1000 continuous variables representing gene expression data.
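The block layout above can be written down as a list of column indices, which is handy when the predictors need to be handled block by block. The following is only a minimal sketch: `pl_standin` is a hypothetical placeholder with the documented shape (400 rows, 1029 columns), used so the snippet runs without the real data, and the names `blocks`, `x_blocks`, and `y` are illustrative.

```r
# Column indices of the four predictor blocks, as listed above.
blocks <- list(b1 = 1:4, b2 = 5:9, b3 = 10:28, b4 = 29:1028)

# Hypothetical stand-in with the documented shape (400 x 1029).
# Replace it with the real pl_data when the data set is loaded.
pl_standin <- as.data.frame(matrix(0, nrow = 400, ncol = 1029))

# One data frame per predictor block, plus the outcome in column 1029.
x_blocks <- lapply(blocks, function(idx) pl_standin[, idx, drop = FALSE])
y <- pl_standin[, 1029]

# Number of variables in each block.
block_sizes <- sapply(x_blocks, ncol)
```

With the real data in place of `pl_standin`, `block_sizes` should report 4, 5, 19, and 1000 variables for `b1` to `b4`.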

We generated the data as follows. We took the empirical correlation matrix of 1028 variables from 315 AML patients and used it as the correlation matrix for generating 1028 multivariate normally distributed variables with the R function `rmvnorm`. Because this matrix was not positive definite, we replaced it with the nearest positive definite matrix as computed by the function `nearPD`.
The variables that should be binary were dichotomized so that their marginal probabilities corresponded to the marginal probabilities of the variables they were based on.
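The steps described so far can be sketched on a toy scale. This is an illustration under several assumptions, not the original simulation code: the 3 x 3 matrix `R` and the target probability `p_target` are made-up values, the sampling uses a base R eigendecomposition as a stand-in for `rmvnorm`, and thresholding at a normal quantile is just one plausible way to dichotomize so that the marginal probability is matched.

```r
library(Matrix)  # provides nearPD()

set.seed(1)
n <- 400

# Toy symmetric matrix with unit diagonal that is NOT positive definite
# (hypothetical values; one eigenvalue equals 1 - 2 * 0.9 = -0.8).
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

# Nearest positive definite correlation matrix.
R_pd <- as.matrix(nearPD(R, corr = TRUE)$mat)

# Draw n multivariate normal rows with correlation R_pd.
# Stand-in for rmvnorm(): build A with A %*% t(A) equal to R_pd.
ev <- eigen(R_pd, symmetric = TRUE)
A <- ev$vectors %*% diag(sqrt(pmax(ev$values, 0)))
z <- matrix(rnorm(n * 3), nrow = n) %*% t(A)

# Dichotomize the first variable so that P(x = 1) is about p_target.
# Each column of z is standard normal, so the cut point is a normal quantile.
p_target <- 0.3  # hypothetical marginal probability
x_bin <- as.integer(z[, 1] > qnorm(1 - p_target))
```

Here `mean(x_bin)` should land near `p_target`, up to sampling noise.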
The coefficients were defined by

`beta_b1 <- c(0.8, 0.8, 0.6, 0.6)`

`beta_b2 <- c(rep(0.5,3), rep(0,2))`

`beta_b3 <- c(rep(0.4, 4), rep(0,15))`

`beta_b4 <- c(rep(0.5, 5), rep(0.3, 5), rep(0,990))`

We combined them into the vector `beta <- c(beta_b1, beta_b2, beta_b3, beta_b4)` and calculated the probability through

$\pi = \exp(x\beta) / (1 + \exp(x\beta))$

where x denotes our data matrix with 1028 predictor variables. Finally, we obtained the outcome through `pl_out <- rbinom(400, size = 1, p = pi)`.
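The outcome step can be sketched end to end as follows. The coefficient vectors are the ones given above; everything else is an assumption made for illustration: the predictor matrix `x` is filled with independent standard normal draws as a stand-in for the simulated data, and the probability is stored as `p_out` instead of `pi` so that R's built-in constant `pi` is not masked.

```r
set.seed(1)

# Coefficients as defined above (21 non-zero entries out of 1028).
beta_b1 <- c(0.8, 0.8, 0.6, 0.6)
beta_b2 <- c(rep(0.5, 3), rep(0, 2))
beta_b3 <- c(rep(0.4, 4), rep(0, 15))
beta_b4 <- c(rep(0.5, 5), rep(0.3, 5), rep(0, 990))
beta <- c(beta_b1, beta_b2, beta_b3, beta_b4)

# Stand-in predictor matrix (assumption): independent standard normals,
# not the correlated and dichotomized variables of the real data set.
x <- matrix(rnorm(400 * 1028), nrow = 400, ncol = 1028)

# Linear predictor and logistic transform: exp(eta) / (1 + exp(eta)).
eta <- drop(x %*% beta)
p_out <- plogis(eta)

# One Bernoulli draw per patient.
pl_out <- rbinom(400, size = 1, prob = p_out)
```

`plogis()` is used because it evaluates the same logistic expression in a numerically stable way for large values of the linear predictor.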
