QAS.func: Quasi-analytical solution for logit
In LisaMaag/QAS: Quasi-analytical solution for logit


QAS.func is used to replace numerical optimization with a quasi-analytical approach for logit models on big data. It returns the coefficients, predicted values and quality criteria for the provided variables.

QAS.func(frml, data = data, weights = NULL, seed = NULL, tau = NULL)

`frml`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.
`data`	a data frame containing the variables in the model (or object coercible by `as.data.frame` to a data frame). Details of the structure of the data are given under 'Details'.
`weights`	an optional vector of prior weights to be used in the fitting process. Should be NULL or a numeric vector. In case of NULL, each case is weighted with 1.
`seed`	the seed that fixes the state of the random process, so that results can be reproduced. Should be NULL or a numeric vector. In case of NULL, a seed is generated at random.
`tau`	an optional parameter proposed by King and Zeng (2001) which comprises prior information about the fraction of ones in the population of the dependent variable. It has to lie between 0 and 1.
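
To make the role of `tau` more concrete, the sketch below computes the intercept offset from the prior correction described by King and Zeng (2001), using base R only. It is an illustration of the idea behind the parameter: whether QAS.func applies `tau` in exactly this way is an assumption, and the value `tau <- 0.05` is hypothetical.

```r
# Illustration only: prior correction of the intercept (King and Zeng, 2001).
# Whether QAS.func uses tau in exactly this way is an assumption.
y <- c(1, 0, 0, 0, 1, 1, 1, 0, 0, 1)   # sample values of the dependent variable
tau <- 0.05                            # hypothetical population fraction of ones
stopifnot(tau > 0, tau < 1)            # tau has to lie between 0 and 1

ybar <- mean(y)                        # fraction of ones in the sample

# Offset that is subtracted from the estimated intercept:
# corrected intercept = estimated intercept - offset
offset <- log(((1 - tau) / tau) * (ybar / (1 - ybar)))
offset
```

The offset is large when the fraction of ones in the sample differs strongly from the assumed fraction in the population, which is the rare-events situation the parameter is meant for.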

A typical model formula has the form dependent_Variable '~' independent_Variables.
The dependent_Variable must have exactly two categories.
If there is more than one independent_Variable, they are combined with a '+'.
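
A formula of this form can be written directly or assembled from variable names. The following is a minimal sketch using base R only; the names `dep` and `indep` are hypothetical and chosen for illustration.

```r
# Write the formula directly ...
f1 <- y ~ x + z

# ... or build it from character names (hypothetical variable names).
dep   <- "y"              # dependent variable with two categories
indep <- c("x", "z")      # independent variables, combined with '+'
f2 <- reformulate(indep, response = dep)

# Both formulas describe the same model.
deparse(f1)
deparse(f2)

# Variable names used in the formula, dependent variable first.
all.vars(f2)
```

Building the formula from names is convenient when the set of independent variables is selected programmatically.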

The data frame must not contain any missing values.
Metric variables have to be of type numeric. All other variables have to be of type integer.
The first variable in the dataset has to be the dependent variable.
Variables with large values have to be rescaled, e.g. by standardization.
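
The requirements on the data frame can be checked before fitting. The sketch below uses base R only; `check_qas_data` is a hypothetical helper written for this illustration and is not part of the QAS package.

```r
# Hypothetical helper (not part of QAS): returns a character vector
# describing every requirement the data frame violates.
check_qas_data <- function(data, dependent) {
  stopifnot(is.data.frame(data))
  problems <- character(0)
  if (anyNA(data)) {
    problems <- c(problems, "missing values present")
  }
  # is.numeric() is TRUE for both double (metric) and integer columns
  if (!all(vapply(data, is.numeric, logical(1)))) {
    problems <- c(problems, "columns that are neither numeric nor integer")
  }
  if (names(data)[1] != dependent) {
    problems <- c(problems, "dependent variable is not the first column")
  }
  if (length(unique(data[[dependent]])) != 2) {
    problems <- c(problems, "dependent variable does not have two categories")
  }
  problems
}

example_data <- data.frame(
  y = as.integer(c(1, 0, 0, 0, 1, 1, 1, 0, 0, 1)),
  x = c(15, 88, 90, 60, 24, 30, 26, 57, 69, 18),
  z = as.integer(c(3, 2, 2, 1, 3, 3, 2, 1, 1, 3))
)

# An empty character vector means all checks passed.
check_qas_data(example_data, "y")

# Standardize the metric variable to reduce its scale.
example_data$x <- as.numeric(scale(example_data$x))
```

Standardizing with `scale` is one option for reducing the scale; any other rescaling that keeps the column numeric would serve the same purpose.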

An object of class QAS.func is a list containing at least the following components:

coefficients: a vector of coefficients
weights: the working weights
call: the call of the final function within QAS.func
terms: the term object used
model: the model frame
means.for.cat: the cut points of the metric variables for a categorization of the original dataset
categorized.variables: the variables that have been categorized within QAS
seed: the seed used for the calculations

This method is based on the research work of Stan Lipovetsky and Birgit Stoltenberg.

King, G. & Zeng, L. (2001), Logistic Regression in Rare Events Data, Political Analysis, No. 9 / 2001
Lipovetsky, S. (2014), Analytical closed-form solution for binary logit regression by categorical predictors, Journal of Applied Statistics, No. 42 / 2015
Lipovetsky, S. & Conklin, M. (2014), Best-Worst Scaling in analytical closed-form solution, The Journal of Choice Modelling, No. 10 / 2014
Stoltenberg, B. (2016), Using logit on big data - from iterative methods to analytical solutions, GfK Verein Working Paper Series, No. 3 / 2016

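# load the package (assumes QAS is installed)
library(QAS)

# generate example data: y is the binary dependent variable,
# x is metric (numeric), z is categorical (integer)
y <- as.integer(c(1,0,0,0,1,1,1,0,0,1))
x <- c(15,88,90,60,24,30,26,57,69,18)
z <- as.integer(c(3,2,2,1,3,3,2,1,1,3))
example_data <- data.frame(y,x,z)

# fit the model with QAS.func
result <- QAS.func(y~x+z, data=example_data, weights=NULL, seed=NULL, tau = NULL)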