genLogi: Generate data for logistic regression


Description

Generates a data.frame or data.table with a binary outcome, and a logistic model to describe it.

Usage

  genLogiDf(b = 2L, f = 2L, c = 1L, n = 20L, nlf = 3L,
    pb = 0.5, rc = 0.8, py = 0.5, asFactor = TRUE,
    model = TRUE, timelim = 5, speedglm = FALSE)

  genLogiDt(b = 2L, f = 2L, c = 1L, n = 20L, nlf = 3L,
    pb = 0.5, rc = 0.8, py = 0.5, asFactor = TRUE,
    model = TRUE, timelim = 5, speedglm = FALSE)

Arguments

b

binomial predictors, the number of predictors which are binary, i.e. limited to 0 or 1

f

factors, the number of predictors which are factors

c

continuous predictors, the number of predictors which are continuous

n

number of observations in the data frame

nlf

the number of levels in each factor

pb

probability for binomial predictors: the probability of a binomial predictor being equal to 1, e.g. if pb=0.3, 30% will be 1s, 70% will be 0s

rc

ratio for continuous variables: the ratio of levels of continuous variables to the total number of observations n, e.g. if rc=0.8 and n=100, the values will be in the range 1-80

py

ratio for y: the ratio of 1s to total observations for the outcome y, e.g. if py=0.5, 50% will be 1s, 50% will be 0s

asFactor

If asFactor=TRUE (the default), predictors given as factors will be converted to factors in the data frame before the model is fit

model

If model=TRUE, also returns a model fitted with stats::glm or speedglm::speedglm

timelim

The function will time out after timelim seconds. This is present to prevent duplication of rows.

speedglm

If speedglm=TRUE, return a model fitted with speedglm instead of glm
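
As a rough illustration of what n, nlf, pb, rc and py control, the following base R sketch builds a comparable data frame by hand. It is not the package's implementation: the sampling calls, and the reading of rc as an upper bound of rc * n on integer values, are assumptions drawn from the argument descriptions above. The package may also fix the proportions of 1s exactly rather than sampling them.

```r
# Hypothetical sketch in base R; this is not the genLogi source code.
set.seed(1)
n   <- 20L   # number of observations
nlf <- 3L    # levels per factor
pb  <- 0.5   # probability that a binary predictor equals 1
rc  <- 0.8   # continuous values are drawn from 1..floor(rc * n)
py  <- 0.5   # probability that the outcome y equals 1

sketch <- data.frame(
  # one binary predictor (b = 1)
  x1 = rbinom(n, size = 1, prob = pb),
  # one factor predictor with nlf levels (f = 1)
  x2 = factor(sample(seq_len(nlf), n, replace = TRUE),
              levels = seq_len(nlf)),
  # one "continuous" predictor taking integer values (c = 1)
  x3 = sample(seq_len(floor(rc * n)), n, replace = TRUE),
  # binary outcome
  y  = rbinom(n, size = 1, prob = py)
)
str(sketch)
```

With n = 20 and rc = 0.8, the values of x3 fall between 1 and 16.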

Value

If model=TRUE: a list with the following values:

df or dt

A data.frame (for genLogiDf) or data.table (for genLogiDt).
Predictors are labelled x1, x2, ..., xn.
Outcome is y.
Rows represent the n observations.

model

A model fit with stats::glm or speedglm::speedglm

If model=FALSE: a data.frame or data.table as above.
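
To make the shape of the model=TRUE return value concrete, here is a hedged sketch that assembles a comparable list by hand with stats::glm. The element names df and model follow the description above; the toy data, the coefficients and the formula are illustrative assumptions, and nothing here calls the package.

```r
# Hypothetical sketch: a list shaped like the documented return value.
set.seed(1)
toy <- data.frame(x1 = rbinom(50, size = 1, prob = 0.5),
                  x2 = rnorm(50))
# simulate a binary outcome from an assumed logistic relationship
toy$y <- rbinom(50, size = 1, prob = plogis(-0.5 + toy$x1 + 0.5 * toy$x2))

res <- list(
  df    = toy,
  model = glm(y ~ x1 + x2, family = binomial, data = toy)
)

head(res$df)      # the data
coef(res$model)   # intercept plus one coefficient per predictor
```

Elements are then accessed by name, e.g. res$df for the data and res$model for the fit.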

Note

genLogiDt is faster and more efficient for larger datasets.

Using asFactor=TRUE with factors which have a large number of levels (e.g. nlf > 30) on large datasets (e.g. n > 1000) can cause fitting to be excessively slow.
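
One likely reason for the slowdown is the size of the design matrix: under R's default treatment contrasts, each factor with nlf levels expands into nlf - 1 dummy columns. The sketch below counts those columns with stats::model.matrix; the data and variable names are illustrative assumptions, not output from genLogiDf.

```r
# Hypothetical sketch: design-matrix width for two 30-level factors.
set.seed(1)
n   <- 1000L
nlf <- 30L
d <- data.frame(
  x1 = factor(sample(seq_len(nlf), n, replace = TRUE), levels = seq_len(nlf)),
  x2 = factor(sample(seq_len(nlf), n, replace = TRUE), levels = seq_len(nlf))
)
X <- model.matrix(~ x1 + x2, data = d)
ncol(X)   # intercept + 2 * (nlf - 1) = 59 columns
```

Keeping the same predictors as plain numeric columns would give only three columns (intercept, x1, x2), which is the trade-off asFactor controls.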

Examples

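# fix the random seed so the generated data are reproducible
set.seed(1)
# all defaults: returns a list with the data.frame and the fitted model
genLogiDf()
# no binary predictors, two continuous predictors; returns only the data.table
genLogiDt(b=0, c=2, n=100, rc=0.7, model=FALSE)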

logisticDx documentation built on May 2, 2019, 6:30 p.m.