QAS.func: Quasi analytical solution for logit.


Description

QAS.func replaces numerical optimization with a quasi-analytical approach when fitting logit models to big data. It returns the coefficients, predicted values and quality criteria for the provided variables.

Usage

QAS.func(frml, data = data, weights = NULL, seed = NULL, tau = NULL)

Arguments

frml

an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data

a data frame containing the variables in the model (or object coercible by as.data.frame to a data frame). Details of the structure of the data are given under 'Details'.

weights

an optional vector of prior weights to be used in the fitting process. Should be NULL or a numeric vector. In case of NULL, each case is weighted with 1.

seed

an optional seed that fixes the state of the random number generator so that results can be reproduced. Should be NULL or a numeric vector. If NULL, a seed is generated at random.

tau

an optional parameter proposed by King and Zeng (2001) that carries prior information about the fraction of ones of the dependent variable in the population. It has to lie between 0 and 1.
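
For orientation, the prior correction described by King and Zeng (2001) adjusts the intercept of a logit model that was fitted on a sample whose share of ones differs from the population share tau. The sketch below shows that adjustment in base R. The function prior_correct_intercept and its argument names are hypothetical, and this documentation does not state whether QAS.func applies tau in exactly this way.

```r
# Hypothetical helper, not part of QAS: prior correction of a logit intercept.
# b0   : intercept estimated on the sample
# ybar : fraction of ones observed in the sample
# tau  : assumed fraction of ones in the population
prior_correct_intercept <- function(b0, ybar, tau) {
  stopifnot(tau > 0, tau < 1, ybar > 0, ybar < 1)
  b0 - log(((1 - tau) / tau) * (ybar / (1 - ybar)))
}

# Sample with 50% ones, population assumed to have 10% ones:
# the corrected intercept is lower than the estimated one.
prior_correct_intercept(b0 = -0.2, ybar = 0.5, tau = 0.1)
```

When tau equals the sample fraction of ones, the correction term is zero and the intercept is unchanged.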

Details

A typical model formula has the form dependent_Variable ~ independent_Variables.
The dependent_Variable must have exactly two categories.
If there is more than one independent_Variable, they are combined with '+'.
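
The formula can be written out directly or assembled from a vector of variable names. A minimal base-R sketch, assuming the variable names y, x and z used in the Examples section (frml_direct, frml_built and predictors are illustrative names):

```r
# Write the formula directly ...
frml_direct <- y ~ x + z

# ... or build it from variable names, which helps with many predictors
predictors <- c("x", "z")
frml_built <- reformulate(predictors, response = "y")

# Both describe the same variables: the dependent variable, then the predictors
all.vars(frml_built)
```

Either object can then be passed as the frml argument.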

The data frame must not contain any missing values.
Metric variables have to be of type numeric. All other variables have to be of type integer.
The first variable in the dataset has to be the dependent variable.
Variables with large values have to be rescaled, e.g. by standardization.
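
The requirements above can be met with a few base-R steps before calling QAS.func. The following is only a sketch on a made-up data frame; the column names and values are illustrative, and dropping incomplete rows with na.omit is one possible way to handle missing values, not a recommendation taken from the package.

```r
# Hypothetical raw data: dependent variable y is not in the first column,
# x contains a missing value and has a large scale
raw <- data.frame(
  x = c(15, 88, NA, 60, 24, 30),
  z = c(3, 2, 2, 1, 3, 3),
  y = c(1, 0, 0, 0, 1, 1)
)

prepared <- na.omit(raw)                      # no missing values allowed
prepared$y <- as.integer(prepared$y)          # non-metric variables as integer
prepared$z <- as.integer(prepared$z)
prepared$x <- as.numeric(scale(prepared$x))   # standardize the metric variable
prepared <- prepared[, c("y", "x", "z")]      # dependent variable first

str(prepared)
```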

Value

An object of class QAS.func is a list containing at least the following components:

coefficients

a vector of coefficients

weights

the working weights

call

the call of the final function within QAS.func

terms

the terms object used

model

the model frame

means.for.cat

the cut points of the metric variables for a categorization of the original dataset

categorized.variables

the variables that have been categorized within QAS

seed

the seed used for the calculations

Author(s)

This method is based on the research work of Stan Lipovetsky and Birgit Stoltenberg.

References

King, G. & Zeng, L. (2001), Logistic Regression in Rare Events Data, Political Analysis, No. 9 / 2001
Lipovetsky, S. (2014), Analytical closed-form solution for binary logit regression by categorical predictors, Journal of Applied Statistics, No. 42 / 2015
Lipovetsky, S. & Conklin, M. (2014), Best-Worst Scaling in analytical closed-form solution, The Journal of Choice Modelling, No. 10 / 2014
Stoltenberg, B. (2016), Using logit on big data - from iterative methods to analytical solutions, GfK Verein Working Paper Series, No. 3 / 2016

Examples

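# generate example data: binary dependent variable y (first column),
# metric predictor x (numeric) and categorical predictor z (integer)
y <- as.integer(c(1,0,0,0,1,1,1,0,0,1))
x <- c(15,88,90,60,24,30,26,57,69,18)
z <- as.integer(c(3,2,2,1,3,3,2,1,1,3))
example_data <- data.frame(y,x,z)

# fit the model with QAS.func using default weights, seed and tau
result <- QAS.func(y~x+z, data=example_data, weights=NULL, seed=NULL, tau = NULL)

# inspect the estimated coefficients
result$coefficients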

LisaMaag/QAS documentation built on May 9, 2019, midnight