# genBinom: Generate data for binomial regression

In dardisco/LogisticDx: Diagnostic Tests for Models with a Binomial Response

## Description

Generates a `data.frame` or `data.table` with a binary outcome and, optionally, a logistic model to describe it.

## Usage

```r
genBinomDf(b = 2L, f = 2L, c = 1L, n = 20L, nlf = 3L, pb = 0.5,
  rc = 0.8, py = 0.5, asFactor = TRUE, model = FALSE, timelim = 5,
  speedglm = FALSE)

genBinomDt(b = 2L, f = 2L, c = 1L, n = 20L, nlf = 3L, pb = 0.5,
  rc = 0.8, py = 0.5, asFactor = TRUE, model = FALSE, timelim = 5,
  speedglm = FALSE)
```

## Arguments

| Argument | Description |
|---|---|
| `b` | The number of binomial predictors, i.e. predictors which are binary. These are limited to 0 or 1. |
| `f` | The number of predictors which are `factor`s. |
| `c` | The number of predictors which are continuous. |
| `n` | The number of observations (rows) in the `data.frame` or `data.table`. |
| `nlf` | The number of levels in a factor. |
| `pb` | The probability of a binomial predictor being 1. E.g. if `pb=0.3`, 30% will be 1s and 70% will be 0s. |
| `rc` | The ratio of levels of continuous variables to the total number of observations `n`. E.g. if `rc=0.8` and `n=100`, continuous variables will be in the range 1 to 80. |
| `py` | The ratio for y: the ratio of 1s to the total number of observations in the response. E.g. if `py=0.5`, 50% will be 1s and 50% will be 0s. |
| `asFactor` | If `asFactor=TRUE` (the default), factor predictors will be converted to `factor`s in the data frame before the model is fit. |
| `model` | If `model=TRUE`, will also return a model fitted with `stats::glm` or `speedglm::speedglm`. |
| `timelim` | The function will time out after `timelim` seconds. This is present to prevent duplication of rows. |
| `speedglm` | If `speedglm=TRUE`, return a model fitted with `speedglm` instead of `glm`. See `?speedglm::speedglm`. |
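To make the roles of `pb`, `nlf`, `rc` and `py` concrete, here is a minimal base-R sketch that draws one predictor of each kind and fits a logistic model. It is not the LogisticDx implementation: the sampling calls, the object names `sim` and `fit`, and the choice to draw `y` independently of the predictors are all assumptions made purely for illustration.

```r
# Illustrative sketch only: NOT how genBinomDf/genBinomDt work internally.
set.seed(1)
n   <- 100L  # number of observations
pb  <- 0.3   # probability that a binary predictor equals 1
nlf <- 3L    # number of levels in the factor predictor
rc  <- 0.8   # ratio of continuous levels to n
py  <- 0.5   # probability that the response equals 1 (assumed meaning)

sim <- data.frame(
  # binary predictor: 0 or 1
  x1 = rbinom(n, size = 1L, prob = pb),
  # factor predictor with nlf levels
  x2 = factor(sample(seq_len(nlf), n, replace = TRUE)),
  # continuous predictor taking values in 1..(rc * n), here 1..80
  x3 = sample(seq_len(round(rc * n)), n, replace = TRUE),
  # binary response, drawn independently of the predictors in this sketch
  y  = rbinom(n, size = 1L, prob = py)
)

# Logistic regression of y on all three predictors
fit <- glm(y ~ x1 + x2 + x3, data = sim, family = binomial)
```

Because `y` is drawn independently here, the fitted coefficients carry no signal; the sketch only shows the shape of the data the arguments describe.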

## Value

If `model=TRUE`: a list with the following values:

| Value | Description |
|---|---|
| `df` or `dt` | A `data.frame` (for `genBinomDf`) or `data.table` (for `genBinomDt`). Predictors are labelled x1, x2, ..., xn. The response is y. Rows represent the n observations. |
| `model` | A model fit with `stats::glm` or `speedglm::speedglm`. |

If `model=FALSE`: a `data.frame` or `data.table` as above.
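Since the return type depends on `model`, calling code may want to handle both shapes. The following sketch uses a hypothetical helper, `get_data` (not part of LogisticDx), and base-R stand-ins for the two return shapes; it assumes only that the list returned with `model=TRUE` contains exactly one data frame at its top level.

```r
# Hypothetical helper (not part of LogisticDx): return the data whether or
# not a fitted model was requested.
get_data <- function(res) {
  # model = FALSE: the result is itself a data.frame
  # (a data.table also inherits from data.frame)
  if (is.data.frame(res)) return(res)
  # model = TRUE: the result is a list; pick its first data.frame element
  is_df <- vapply(res, is.data.frame, logical(1))
  res[[which(is_df)[1L]]]
}

# Stand-ins for the two return shapes, built with base R only
d   <- data.frame(x1 = c(0, 1, 0, 1, 1, 0), y = c(0, 0, 1, 1, 0, 1))
fit <- glm(y ~ x1, data = d, family = binomial)

identical(get_data(d), d)                          # data-only shape
identical(get_data(list(df = d, model = fit)), d)  # list shape
```

Selecting by class rather than by element name avoids relying on the exact name of the data element in the returned list.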

## Note

`genBinomDt` is faster and more efficient for large datasets.

Using `asFactor=TRUE` with `factor`s which have a large number of `levels` (e.g. `nlf > 30`) on large datasets (e.g. `n > 1000`) can cause fitting to be excessively slow.
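One plausible contributor to this slowdown is the width of the design matrix: under R's default treatment contrasts, a factor with k levels expands into k - 1 indicator columns. The sketch below, which uses only base R and a hypothetical helper `design_cols`, counts the columns produced for a single factor predictor plus an intercept.

```r
# Illustrative sketch: design-matrix width for one factor predictor.
set.seed(1)
n <- 200L

# Hypothetical helper: number of design-matrix columns (intercept included)
# for a single factor with `nlf` levels.
design_cols <- function(nlf) {
  x <- factor(sample(seq_len(nlf), n, replace = TRUE), levels = seq_len(nlf))
  ncol(model.matrix(~ x))
}

design_cols(3L)   # intercept + 2 indicator columns
design_cols(30L)  # intercept + 29 indicator columns
```

With several such factors the column count adds up quickly, which increases the work done at each iteration of the model fit.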

## Examples

```r
set.seed(1)
genBinomDf(speedglm=TRUE)
genBinomDt(b=0, c=2, n=100L, rc=0.7, model=FALSE)
```

dardisco/LogisticDx documentation built on May 12, 2017, 5:37 p.m.