genBinom: Generate data for binomial regression


Description

Generates a data.frame or data.table with a binary outcome and, optionally, a logistic regression model fitted to it.

Usage

genBinomDf(b = 2L, f = 2L, c = 1L, n = 20L, nlf = 3L, pb = 0.5,
  rc = 0.8, py = 0.5, asFactor = TRUE, model = FALSE, timelim = 5,
  speedglm = FALSE)

genBinomDt(b = 2L, f = 2L, c = 1L, n = 20L, nlf = 3L, pb = 0.5,
  rc = 0.8, py = 0.5, asFactor = TRUE, model = FALSE, timelim = 5,
  speedglm = FALSE)

Arguments

b

The number of binomial (binary) predictors.
These take only the values 0 or 1.

f

The number of factor predictors.

c

The number of continuous predictors.

n

The number of observations (rows) in the data.frame or data.table.

nlf

The number of levels of each factor predictor.
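As an illustration only (not the LogisticDx source), a factor predictor with nlf levels could be drawn by sampling level labels uniformly. The helper name makeFactor and the uniform sampling are assumptions made for this sketch.

```r
# Hypothetical helper: draw n values of a factor with nlf levels.
# Uniform sampling over the levels is an assumption, not LogisticDx's method.
makeFactor <- function(n = 20L, nlf = 3L) {
  # Fixing `levels` keeps all nlf levels even if one is never sampled
  factor(sample(seq_len(nlf), n, replace = TRUE), levels = seq_len(nlf))
}

set.seed(1)
x <- makeFactor(n = 20L, nlf = 3L)
nlevels(x)  # 3
table(x)
```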

pb

The probability for binomial predictors:
the probability that a binomial predictor equals 1.
E.g. if pb=0.3, 30% will be 1s and 70% will be 0s.
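A minimal sketch of what pb controls, assuming each value of a binary predictor is an independent Bernoulli draw with probability pb. The helper makeBinary is hypothetical, and LogisticDx may fix the proportion exactly rather than drawing it at random.

```r
# Hypothetical helper: n draws of a binary (0/1) predictor where each
# value is 1 with probability pb. Independent Bernoulli draws are an
# assumption; LogisticDx may instead fix the proportion exactly.
makeBinary <- function(n = 20L, pb = 0.5) {
  rbinom(n, size = 1L, prob = pb)
}

set.seed(1)
x <- makeBinary(n = 1000L, pb = 0.3)
mean(x)  # close to 0.3 for large n
```

With random draws the share of 1s is only approximately pb, which is why a large n is used above.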

rc

The ratio for continuous variables:
the ratio of the number of levels (possible values) of a continuous predictor to the total number of observations n.
E.g. if rc=0.8 and n=100, its values will lie in the range 1 to 80.
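The range described above can be sketched as follows, assuming values are sampled uniformly from the integers 1 to floor(rc * n). The helper makeContinuous and the sampling scheme are assumptions for illustration, not the package's implementation.

```r
# Hypothetical helper: a continuous predictor whose values are drawn from
# the integers 1 to floor(rc * n), matching the documented range
# (e.g. rc = 0.8, n = 100 gives 1 to 80). Uniform sampling is an assumption.
makeContinuous <- function(n = 20L, rc = 0.8) {
  upper <- max(1L, floor(rc * n))
  sample.int(upper, n, replace = TRUE)
}

set.seed(1)
x <- makeContinuous(n = 100L, rc = 0.8)
range(x)  # within 1 to 80
```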

py

The ratio for y: the ratio of 1s to the total number of observations for the response y.
E.g. if py=0.5, 50% will be 1s and 50% will be 0s.
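One way to obtain a response with the stated share of 1s is to place exactly round(py * n) ones in random order. This is a sketch under that assumption; the helper makeResponse is hypothetical and LogisticDx may generate y differently.

```r
# Hypothetical helper: a response y with round(py * n) ones, placed in
# random order. Fixing the count exactly is an assumption made to match
# the documented "50% will be 1s" wording.
makeResponse <- function(n = 20L, py = 0.5) {
  ones <- round(py * n)
  y <- rep(c(1L, 0L), times = c(ones, n - ones))
  # Index-based shuffle avoids sample()'s special case for length-1 input
  y[sample.int(length(y))]
}

set.seed(1)
y <- makeResponse(n = 20L, py = 0.5)
sum(y)  # 10
```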

asFactor

If asFactor=TRUE (the default), the factor predictors are converted to class factor in the data.frame or data.table before the model is fit.
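The choice matters for the fitted model: the same level codes enter a logistic regression as several dummy coefficients when stored as a factor, but as a single slope when stored as numbers. The base R sketch below uses simulated data, not output from genBinomDf.

```r
# Illustration (base R only): the same level codes fitted as a factor
# versus as a number. Data here are simulated for the example.
set.seed(1)
n <- 60L
codes <- sample(1:3, n, replace = TRUE)
xf <- factor(codes, levels = 1:3)
y <- rbinom(n, size = 1L, prob = 0.5)

fitFactor  <- glm(y ~ xf, family = binomial)
fitNumeric <- glm(y ~ codes, family = binomial)

length(coef(fitFactor))   # 3: intercept plus 2 dummy coefficients
length(coef(fitNumeric))  # 2: intercept plus 1 slope
```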

model

If model=TRUE, also return a model fitted with stats::glm or speedglm::speedglm.
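For orientation, a logistic model of the kind described can be fitted with stats::glm on data laid out as in the Value section (predictors x1, x2, ...; response y). The data below are simulated in base R as an assumption for the sketch; they are not produced by genBinomDf, and the exact formula the package uses is not stated here.

```r
# Sketch: fit a logistic regression with stats::glm on simulated data
# that mimics the documented layout (x1, x2, ... and response y).
set.seed(1)
n <- 100L
df <- data.frame(
  x1 = rbinom(n, 1L, 0.5),                                   # binary
  x2 = factor(sample(1:3, n, replace = TRUE), levels = 1:3),  # factor
  x3 = sample.int(80L, n, replace = TRUE)                     # continuous
)
df$y <- rbinom(n, 1L, 0.5)

# `y ~ .` uses every other column as a predictor
fit <- stats::glm(y ~ ., family = binomial, data = df)
coef(fit)  # intercept, x1, x22, x23, x3
```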

timelim

The function will time out after timelim seconds. This is present to prevent duplication of rows.
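A hedged reading of this argument is a retry loop: redraw the data until no rows are duplicated, and stop once timelim seconds have passed. The sketch below assumes that reading; drawUnique is a hypothetical helper, not the LogisticDx implementation.

```r
# Hypothetical sketch: redraw a data.frame until it has no duplicated rows,
# giving up after timelim seconds. This retry loop is an assumed reading of
# the timelim argument, not the LogisticDx implementation.
drawUnique <- function(draw, timelim = 5) {
  start <- Sys.time()
  repeat {
    d <- draw()
    if (anyDuplicated(d) == 0L) return(d)
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed > timelim) stop("timed out after ", timelim, " secs")
  }
}

set.seed(1)
d <- drawUnique(function() data.frame(x1 = sample.int(1000L, 20L),
                                      x2 = rbinom(20L, 1L, 0.5)))
anyDuplicated(d)  # 0

# A draw that always duplicates a row hits the time limit instead
res <- tryCatch(drawUnique(function() data.frame(x = c(1, 1)), timelim = 0.1),
                error = function(e) conditionMessage(e))
res
```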

speedglm

If speedglm=TRUE, the returned model is fitted with speedglm::speedglm instead of stats::glm. See ?speedglm::speedglm.

Value

If model=TRUE: a list with the following values:

df or dt

A data.frame (for genBinomDf) or data.table (for genBinomDt).
Predictors are labelled x1, x2, and so on.
The response is y.
Each of the n rows represents one observation.

model

A model fit with stats::glm or speedglm::speedglm

If model=FALSE a data.frame or data.table as above.
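The two return shapes can be mimicked with a small stand-in generator, which makes it easy to see how the components are accessed. toyBinomDf is hypothetical (not part of LogisticDx) and draws only binary predictors for brevity.

```r
# Hypothetical generator (not part of LogisticDx) returning the same shape
# as described above: a data.frame when model = FALSE, otherwise a list
# with components df and model. Only binary predictors are drawn.
toyBinomDf <- function(b = 2L, n = 20L, pb = 0.5, py = 0.5, model = FALSE) {
  df <- as.data.frame(matrix(rbinom(n * b, 1L, pb), nrow = n))
  names(df) <- paste0("x", seq_len(b))
  df$y <- rbinom(n, 1L, py)
  if (!model) return(df)
  list(df = df, model = stats::glm(y ~ ., family = binomial, data = df))
}

set.seed(1)
res <- toyBinomDf(n = 50L, model = TRUE)
names(res)       # "df" "model"
head(res$df)
coef(res$model)
```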

Note

genBinomDt is faster and more efficient than genBinomDf for large datasets.

Using asFactor=TRUE with factors which have a large number of levels (e.g. nlf > 30) on large datasets (e.g. n > 1000) can cause fitting to be excessively slow.
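Part of the reason is the size of the design matrix: a factor with nlf levels expands into nlf - 1 dummy columns plus the intercept. The base R sketch below shows this with model.matrix, which stats::glm uses to build its design matrix; the data are simulated for illustration.

```r
# Base R illustration: a factor with nlf levels becomes nlf - 1 dummy
# columns (plus the intercept) in the design matrix, so many levels on
# many rows means a large matrix to fit.
set.seed(1)
n <- 1000L
nlf <- 50L
d <- data.frame(x = factor(sample(seq_len(nlf), n, replace = TRUE),
                           levels = seq_len(nlf)))
X <- model.matrix(~ x, data = d)
dim(X)  # 1000 50
```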

Examples

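library(LogisticDx)
set.seed(1)
## data.frame with default settings, requesting speedglm as the fitter
genBinomDf(speedglm=TRUE)
## data.table: no binary predictors, 2 continuous predictors, 100 rows, no model
genBinomDt(b=0, c=2, n=100L, rc=0.7, model=FALSE)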

dardisco/LogisticDx documentation built on May 12, 2017, 5:37 p.m.