genBinom: Generate data for binomial regression

Description

Generates a data.frame or data.table with a binary outcome and, optionally, a logistic model fitted to describe it.

Usage

genBinomDf(
  b = 2L,
  f = 2L,
  c = 1L,
  n = 20L,
  nlf = 3L,
  pb = 0.5,
  rc = 0.8,
  py = 0.5,
  asFactor = TRUE,
  model = FALSE,
  timelim = 5,
  speedglm = FALSE
)

genBinomDt(
  b = 2L,
  f = 2L,
  c = 1L,
  n = 20L,
  nlf = 3L,
  pb = 0.5,
  rc = 0.8,
  py = 0.5,
  asFactor = TRUE,
  model = FALSE,
  timelim = 5,
  speedglm = FALSE
)

Arguments

b

The number of binomial (binary) predictors.
Each takes the value 0 or 1.

f

The number of factor predictors.

c

The number of continuous predictors.

n

The number of observations (rows) in the data.frame or data.table.

nlf

The number of levels in a factor.

pb

The probability for binomial predictors:
the probability of a binomial predictor being equal to 1.
E.g. if pb=0.3, 30% will be 1s and 70% will be 0s.

rc

The ratio for continuous variables:
the ratio of the number of levels (distinct values) of a continuous variable to the total number of observations n.
E.g. if rc=0.8 and n=100, values will be in the range 1 to 80.

py

The ratio for y: the ratio of 1s to the total number of observations in the binary response y.
E.g. if py=0.5, 50% will be 1s and 50% will be 0s.

asFactor

If asFactor=TRUE (the default), predictors given as factors will be converted to factors in the data frame before the model is fit.

model

If model=TRUE, a model fitted with stats::glm or speedglm::speedglm is also returned.

timelim

The function will time out after timelim seconds. This is present to prevent duplication of rows.

speedglm

If speedglm=TRUE, return a model fitted with speedglm instead of glm. See: ?speedglm::speedglm
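
To make the roles of n, nlf, pb, rc and py concrete, here is a minimal sketch in base R that builds a comparable data frame by hand. It is not the package's implementation: the variable names (x_bin, x_fac, x_cont, sketch) are hypothetical, and drawing each column independently with rbinom and sample is an assumption, so the proportions of 1s match pb and py only on average.

```r
# Illustrative only: base R, NOT the LogisticDx internals.
set.seed(1)
n   <- 100L  # number of observations (rows)
nlf <- 3L    # number of levels in a factor
pb  <- 0.3   # probability that a binary predictor equals 1
rc  <- 0.8   # ratio of continuous levels to n
py  <- 0.5   # proportion of 1s in the response y

# Binary predictor: each value is 0 or 1, with P(1) = pb
x_bin <- rbinom(n, size = 1L, prob = pb)

# Factor predictor with nlf levels
x_fac <- factor(sample(seq_len(nlf), n, replace = TRUE),
                levels = seq_len(nlf))

# "Continuous" predictor: integer values in 1 .. rc * n (here 1 .. 80)
x_cont <- sample(seq_len(round(rc * n)), n, replace = TRUE)

# Binary response, with P(1) = py
y <- rbinom(n, size = 1L, prob = py)

sketch <- data.frame(x1 = x_bin, x2 = x_fac, x3 = x_cont, y = y)
str(sketch)
```

The predictors are named x1, x2, x3 only to mirror the labelling described under Value; how genBinomDf and genBinomDt order or generate their columns may differ.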

Value

If model=TRUE: a list with the following values:

df or dt

A data.frame (for genBinomDf) or data.table (for genBinomDt).
Predictors are labelled x1, x2, ..., xn.
The response is y.
Rows represent the n observations.

model

A model fit with stats::glm or speedglm::speedglm

If model=FALSE: a data.frame or data.table as above.

Note

genBinomDt is faster and more efficient for large datasets.

Using asFactor=TRUE with factors which have a large number of levels (e.g. nlf > 30) on large datasets (e.g. n > 1000) can cause fitting to be excessively slow.
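
One plausible contributor to this slowdown is the size of the design matrix: a factor with k levels is expanded into k - 1 indicator columns under R's default treatment contrasts. The base R sketch below shows that expansion; the names f_small and f_large are hypothetical, and it does not measure the fitting time of genBinomDf itself.

```r
# Illustrative only: shows how factor levels widen the design matrix.
set.seed(1)
n <- 2000L

# A factor with 3 levels and a factor with 40 levels
f_small <- factor(sample(3L, n, replace = TRUE), levels = 1:3)
f_large <- factor(sample(40L, n, replace = TRUE), levels = 1:40)

# Columns = 1 intercept + (number of levels - 1) indicator columns
ncol(model.matrix(~ f_small))
ncol(model.matrix(~ f_large))
```

With several such factors in one model, the number of coefficients to estimate grows with the total number of levels, which increases the cost of each fitting iteration.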

Examples

set.seed(1)
genBinomDf(speedglm=TRUE)

genBinomDt(b=0, c=2, n=100L, rc=0.7, model=FALSE)

dardisco/LogisticDx documentation built on Dec. 26, 2021, 8:11 p.m.