This function is a wrapper to the functions opt.ic(), opt.t() and opt.h().
generalizeToSpecific(model = "lm", Y, X, method = "opt.ic", KLIC = "AICc",
  crit.t = 1.64, crit.p = 0.1, test = "LR", tracelevel = 1,
  memorymanagement = TRUE)
model |
Either "lm" for the linear probability model, "logit" for the logistic probability model, or "probit" for the probit model. The logit and probit models are fitted by iteratively reweighted least squares, and optimization of the logit model is significantly faster than that of the probit model. Defaults to "lm". |
Y |
A binary response variable. |
X |
A data frame of exogenous regressors. |
method |
The optimization strategy. Either "opt.ic" to optimize using information criteria, "opt.t" for step-wise elimination of insignificant variables (statistically speaking not a sound procedure, but it yields a parsimonious model that can be useful as a benchmark), or "opt.h" to optimize by classical hypothesis tests. Defaults to "opt.ic". |
KLIC |
The information criterion used by "opt.ic", either "AIC" or "AICc". Defaults to "AICc". |
crit.t |
The t-value indicating significance when using method "opt.t", defaults to 1.64. |
crit.p |
The p-value used by method "opt.h" in the hypothesis tests. Defaults to 0.1. |
test |
The hypothesis test used by "opt.h". Defaults to "LR" for the likelihood ratio test. Other options are "F", for an F test of the joint significance of the insignificant parameters, or "Chisq", for a Wald test against the chi-squared distribution. The recommended setting is "LR", as it is less dependent on correct estimation of the standard errors. Keep in mind that "Chisq" is an asymptotic test, and "F" is more appropriate for small samples. However, "Chisq" holds under milder conditions and should be used if no small-sample theory is available for the model. |
tracelevel |
The amount of information to be printed. Passed on to the underlying routines. Defaults to 1 for printing; set to 0 for no printing. |
memorymanagement |
TRUE/FALSE indicating whether garbage collection should be forced regularly when memory usage is high. Defaults to TRUE, the recommended setting for large datasets. |
share |
A value between 0 and 1 specifying the share of the data that is passed on to the optimization strategies. Defaults to 0.75, to improve speed. Uses getSamples() to maintain the first and second moments of the data. |
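To make the difference between the three options of the test argument more concrete, the following is a minimal sketch in base R (stats package only). It does not call generalizeToSpecific() or opt.h(), and it is not the package's implementation. The simulated data and the names x1, x2, x3, full and reduced are assumptions made for this illustration. It tests whether one irrelevant regressor can be dropped, using a likelihood ratio test, a Wald test against the chi-squared distribution, and an F test on the linear probability model.

```r
# Hypothetical simulated data, for illustration only.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)  # irrelevant regressor: true coefficient is zero
y  <- rbinom(n, 1, plogis(0.5 - 0.8 * x1 + 0.4 * x2))

# Unrestricted and restricted logit models.
full    <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"))
reduced <- glm(y ~ x1 + x2,      family = binomial(link = "logit"))

# Likelihood ratio test: uses only the two maximized log-likelihoods,
# so it does not depend on the estimated standard errors.
lr_stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))
p_lr    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Wald test of H0: coefficient on x3 is zero, against chi-squared(1).
# It relies on the estimated covariance matrix of the coefficients.
b         <- coef(full)[["x3"]]
v         <- vcov(full)["x3", "x3"]
wald_stat <- b^2 / v
p_wald    <- pchisq(wald_stat, df = 1, lower.tail = FALSE)

# F test on the linear probability model (small-sample test for lm).
lm_full    <- lm(y ~ x1 + x2 + x3)
lm_reduced <- lm(y ~ x1 + x2)
f_table    <- anova(lm_reduced, lm_full)
p_f        <- f_table[2, "Pr(>F)"]

print(c(LR = p_lr, Wald = p_wald, F = p_f))
```

With a single restriction and a moderately large sample, the three p-values will usually be close to each other; the differences described for the test argument matter more in small samples or when the standard errors are poorly estimated.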
Either a data frame of exogenous variables, or a vector containing the column names, indicating the optimal variables extracted from the supplied dataset.
randomlogit <- simulateLogit(nobs = 8000,
                             pars = c(0.5, -0.4, -0.3, 0.1, 0.05, 0.025, 0.01,
                                      0.005, 0.005, 0.005, 0.005, 0.005, 0.005,
                                      0.0025, 0.0025, 0.0025, 0.0025, 0.0, 0.0, 0.0))
# Add a multicollinear vector to see how the method responds to faulty variables.
randomlogit <- cbind(randomlogit, mcv = randomlogit[, 2])
Y <- randomlogit[, 1]
X <- randomlogit[, -1]

# Logit model under each of the three optimization strategies.
logit_ic <- generalizeToSpecific(model = "logit", Y, X)
logit_t  <- generalizeToSpecific(model = "logit", Y, X, "opt.t")
logit_h  <- generalizeToSpecific(model = "logit", Y, X, "opt.h")

# Probit and linear probability models with the default "opt.ic" strategy.
probit_ic <- generalizeToSpecific(model = "probit", Y, X)
linear_ic <- generalizeToSpecific(model = "lm", Y, X)
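For readers without the package at hand, the idea behind selection by information criterion can be sketched with base R alone. The block below is an illustration under stated assumptions, not the algorithm used by opt.ic(): it swaps in backward elimination with stats::step(), which minimizes plain AIC, and then reports the small-sample corrected AICc for the full and the reduced model. The helper aicc() and the simulated data frame dat are hypothetical names introduced here.

```r
# Hypothetical simulated data: only x1 and x2 affect the response.
set.seed(42)
n   <- 300
X   <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
Y   <- rbinom(n, 1, plogis(0.5 - 0.8 * X$x1 + 0.4 * X$x2))
dat <- cbind(Y = Y, X)

# Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1),
# where k is the number of estimated parameters and n the sample size.
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + (2 * k * (k + 1)) / (n - k - 1)
}

# Start from the general model containing every regressor.
full <- glm(Y ~ ., data = dat, family = binomial(link = "logit"))

# Backward elimination by AIC (stats::step), used here as a stand-in
# for the package's own search strategy.
reduced <- step(full, direction = "backward", trace = 0)

# Names of the regressors that survive the elimination.
kept <- setdiff(names(coef(reduced)), "(Intercept)")
print(kept)
print(c(full = aicc(full), reduced = aicc(reduced)))
```

The correction term grows with the number of parameters relative to the sample size, so AICc penalizes large models more heavily than AIC when n is small and converges to AIC as n grows.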