generalizeToSpecific: A main function of the package to apply generalize to...

View source: R/autoGLM.R

Description

This function is a wrapper to the functions opt.ic(), opt.t() and opt.h().

Usage

generalizeToSpecific(model = "lm", Y, X, method = "opt.ic", KLIC = "AICc",
  crit.t = 1.64, crit.p = 0.1, test = "LR", tracelevel = 1,
  memorymanagement = TRUE)

Arguments

model

Either "lm" for the linear probability model, "logit" for the logistic probability model, or "probit" for the probit model. The logit and probit models are solved using iteratively reweighted least squares (IRLS), and optimization of the logit model is significantly faster than that of the probit model. Defaults to "lm".
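
To make the IRLS remark concrete, the following is a minimal sketch of iteratively reweighted least squares for a logit model in base R. It is an illustration only and not the package's own solver; the simulated data, the helper name irls_logit() and its tol and maxit arguments are assumptions introduced for this example.

```r
# Illustrative IRLS for a logit model (hypothetical helper, not part of AutoGLM).
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n), rnorm(n))            # design matrix with an intercept column
beta_true <- c(0.5, -0.4, 0.3)
y <- rbinom(n, 1, plogis(X %*% beta_true))   # simulated binary response

irls_logit <- function(X, y, tol = 1e-10, maxit = 50) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(maxit)) {
    eta <- drop(X %*% beta)                  # linear predictor
    mu  <- plogis(eta)                       # fitted probabilities
    w   <- mu * (1 - mu)                     # working weights
    z   <- eta + (y - mu) / w                # working response
    # Weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

beta_irls <- irls_logit(X, y)
# Reference fit from base R, which also uses IRLS internally.
beta_glm <- unname(coef(glm(y ~ X - 1, family = binomial())))
```

The two coefficient vectors should agree up to the convergence tolerance of glm().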

Y

A binary response variable.

X

A dataframe of multiple exogenous regressors.

method

The optimization strategy. Either "opt.ic" to optimize using information criteria, "opt.t" for step-wise elimination of insignificant variables (statistically speaking not a sound procedure, but it provides a parsimonious model that can be useful as a benchmark), or "opt.h" to optimize by classical hypothesis tests. Defaults to "opt.ic".
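
As a rough illustration of what step-wise elimination by t-value can look like, the sketch below repeatedly drops the regressor with the smallest absolute test statistic until all remaining ones exceed a threshold. This is an assumption about the general technique, written in base R; it is not the implementation of opt.t(), and the helper backward_t() and the simulated data are hypothetical.

```r
# Illustrative backward elimination on |z| (hypothetical helper, not opt.t()).
set.seed(42)
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
y <- rbinom(n, 1, plogis(0.3 + 1.0 * dat$x1 - 0.8 * dat$x2))

backward_t <- function(y, X, crit.t = 1.64) {
  keep <- names(X)
  while (length(keep) > 0) {
    fit <- glm(y ~ ., data = X[, keep, drop = FALSE], family = binomial())
    # Column 3 of the coefficient table holds the test statistics; row 1 is the intercept.
    tvals <- abs(summary(fit)$coefficients[-1, 3])
    names(tvals) <- keep
    if (min(tvals) >= crit.t) break          # every remaining regressor is significant
    keep <- setdiff(keep, names(which.min(tvals)))
  }
  keep
}

selected <- backward_t(y, dat)
```

Here selected holds the names of the regressors that survive the elimination; x1 and x2 carry the true signal in this simulation, while x3 and x4 are noise.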

KLIC

The information criterion used by "opt.ic", either "AIC" or "AICc". Defaults to the latter.
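
For reference, the difference between the two criteria can be computed by hand. The sketch below uses the standard small-sample correction AICc = AIC + 2k(k + 1) / (n - k - 1), with k estimated parameters and n observations. Whether the package computes AICc in exactly this way is an assumption; the data are simulated for illustration.

```r
# AIC versus the standard small-sample corrected AICc, using base R only.
set.seed(7)
n <- 60
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.2 + 0.9 * x))
fit <- glm(y ~ x, family = binomial())

k    <- length(coef(fit))                      # number of estimated parameters
aic  <- AIC(fit)                               # -2 * logLik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)    # correction shrinks as n grows
```

The correction term is always positive when n > k + 1, so AICc penalizes additional parameters more heavily than AIC in small samples.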

crit.t

The t-value indicating significance when using method "opt.t", defaults to 1.64.
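
To relate the default threshold to a significance level, the short sketch below converts between a critical value and a two-sided p-value. It assumes a standard normal reference distribution, which is an asymptotic approximation and not something this page states.

```r
# Relating a critical value to a two-sided significance level (normal approximation).
crit.t <- 1.64
p_two_sided <- 2 * pnorm(-crit.t)    # roughly 0.10
z_10pct     <- qnorm(1 - 0.10 / 2)   # roughly 1.645
```

Under this approximation, a threshold of 1.64 corresponds to testing each coefficient at about the 10% level.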

crit.p

The p-value used by method "opt.h" in the hypothesis tests. Defaults to 0.1.

test

The hypothesis test used by "opt.h". Defaults to "LR" for the likelihood ratio test. Other options are "F", for an F test of the joint significance of the insignificant parameters, and "Chisq", for a Wald test against the chi-squared distribution. The recommended setting is "LR", as it is less dependent on correct estimation of the standard errors. Keep in mind that "Chisq" is an asymptotic test and "F" is more appropriate for small samples. However, "Chisq" holds under milder conditions and should be used if no small-sample theory is available for the model.
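
The contrast between the likelihood ratio test and the Wald test can be sketched in base R as follows. This compares a full model against a reduced model that drops two regressors. It is an illustration of the two test statistics under simulated data, not the code used by opt.h(); the F test variant is omitted.

```r
# Likelihood ratio and Wald tests for the joint exclusion of x2 and x3.
set.seed(3)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- rbinom(n, 1, plogis(0.4 + 0.8 * x1))

full    <- glm(y ~ x1 + x2 + x3, family = binomial())
reduced <- glm(y ~ x1, family = binomial())

# LR test: twice the log-likelihood difference, chi-squared with 2 restrictions.
lr_stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))
lr_p    <- pchisq(lr_stat, df = 2, lower.tail = FALSE)

# Wald test: quadratic form in the restricted coefficients, using the
# estimated covariance matrix of the full model (hence its reliance on
# correctly estimated standard errors).
idx       <- c("x2", "x3")
b         <- coef(full)[idx]
V         <- vcov(full)[idx, idx]
wald_stat <- as.numeric(t(b) %*% solve(V) %*% b)
wald_p    <- pchisq(wald_stat, df = 2, lower.tail = FALSE)
```

Both statistics are asymptotically equivalent under the null hypothesis, so lr_p and wald_p are usually close in moderately large samples.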

tracelevel

The amount of information to be printed. Passed on to the underlying routines. Defaults to 1 for printing; set to 0 for no printing.

memorymanagement

TRUE/FALSE indicating whether garbage collection should be forced regularly when memory usage is high. Defaults to TRUE, the recommended setting for large datasets.

share

A value between 0 and 1 specifying the share of the data that should be passed on to the optimization strategies. Defaults to 0.75, to improve speed. Uses getSamples() to maintain the first and second moments of the data.

Value

Either a dataframe of exogenous variables, or a vector containing the column names of the optimal variables extracted from the supplied dataset.

Examples

randomlogit <- simulateLogit(nobs=8000, pars = c(0.5, -0.4, -0.3, 0.1, 0.05, 0.025, 0.01,
                                                 0.005, 0.005, 0.005, 0.005, 0.005, 0.005,
                                                 0.0025, 0.0025, 0.0025, 0.0025, 0.0, 0.0, 0.0))
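# Add a multicollinear vector to see how the method responds to faulty variables.
randomlogit <- cbind(randomlogit, mcv = randomlogit[, 2])

# First column is the binary response, the remaining columns are regressors.
Y <- randomlogit[, 1]
X <- randomlogit[, -1]

# Logit model with each of the three optimization strategies.
logit_ic <- generalizeToSpecific(model = "logit", Y, X)

logit_t <- generalizeToSpecific(model = "logit", Y, X, method = "opt.t")

logit_h <- generalizeToSpecific(model = "logit", Y, X, method = "opt.h")

# Probit and linear probability models with the default strategy ("opt.ic").
probit_ic <- generalizeToSpecific(model = "probit", Y, X)
linear_ic <- generalizeToSpecific(model = "lm", Y, X)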

BPJandree/AutoGLM documentation built on Nov. 6, 2018, 10:43 p.m.