This function is a wrapper around bestlinearX(), bestlogitX() and bestprobitX(), with an additional option to call getSamples() for improved speed. Note that sampling itself takes time, so the total computational burden is a trade-off between the cost of getSamples() and the cost of the model optimization itself.
Y |
A binary response variable. |
X |
A dataframe of multiple exogenous regressors. |
model |
Either "lm" for the linear probability model, "logit" for the logistic probability model, or "probit" for the probit model. The logit and probit models are solved using iteratively reweighted least squares, and optimization of the logit model is significantly faster than that of the probit model. Defaults to "lm". |
returntype |
Either "data" to return a dataset, or "colnames" to return only the column names of the variables used in the optimal model. Defaults to "data". |
method |
The optimization strategy. Either "opt.ic" to optimize using information criteria, "opt.t" for step-wise elimination of insignificant variables (statistically speaking not a sound procedure, but it yields a parsimonious model that can be useful as a benchmark), or "opt.h" to optimize by classical hypothesis tests. Defaults to "opt.ic". |
KLIC |
The information criterion used by "opt.ic", either "AIC" or "AICc". Defaults to the latter. |
crit.t |
The t-value indicating significance when using method "opt.t", defaults to 1.64. |
crit.p |
The p-value used by method "opt.h" in the hypothesis tests. Defaults to 0.05. |
test |
The hypothesis test used by "opt.h". Defaults to "LR" for the likelihood ratio test. Other options are "F", for an F test of the joint significance of the insignificant parameters, or "Chisq", for a Wald test against the chi-squared distribution. The recommended setting is "LR", as it is less dependent on correct estimation of the standard errors. Keep in mind that "Chisq" is an asymptotic test, and "F" is more appropriate for small samples. However, "Chisq" holds under milder conditions and should be used if no small-sample theory is available for the model. |
share |
A number between 0 and 1 specifying the share of the data that is passed on to the optimization strategies. Defaults to 0.75, to improve speed. Uses getSamples() to preserve the first and second moments of the data. |
confidence.alternative |
Passed on to getSamples(). Defaults to 0.85. |
max.iter |
Passed on to getSamples(). Defaults to 50. |
tracelevel |
The amount of information to be printed. Passed on to the underlying routines. Defaults to 1 for printing; set to 0 for no printing. |
memorymanagement |
TRUE/FALSE indicating whether garbage collection should be forced regularly when memory usage is high. Defaults to TRUE, the recommended setting for large datasets. |
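The KLIC and test options refer to standard quantities that can be illustrated with base R alone. The following is a minimal sketch on simulated data, not the package's own implementation: the helper aicc() and all variable names are hypothetical, and the small-sample correction shown is the usual AICc formula.

```r
# Illustrative only: simulated data and hypothetical names, not package internals.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                                  # regressor unrelated to y
y  <- rbinom(n, 1, plogis(0.5 + 1.2 * x1))

full    <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
reduced <- glm(y ~ x1,      family = binomial(link = "logit"))

# AICc = AIC + 2k(k + 1) / (n - k - 1), with k the number of estimated parameters
aicc <- function(fit) {
  k     <- attr(logLik(fit), "df")
  n_obs <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n_obs - k - 1)
}
c(full = aicc(full), reduced = aicc(reduced))

# Likelihood ratio test of the single restriction "coefficient on x2 is zero"
lr_stat <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# The same comparison through anova()
anova(reduced, full, test = "LRT")
```

The model with the lower AICc is preferred under an information-criterion strategy, while a large lr_p suggests that the restricted model is not rejected at the chosen crit.p.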
Either a dataframe of exogenous variables, or a vector containing the column names of the optimal variables extracted from the supplied dataset.
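When only the column names are returned, they can be used to subset the original regressors. A minimal sketch, assuming a character vector of names; the data frame and the vector selected below are hypothetical stand-ins, not actual output of selectX():

```r
# Hypothetical regressors and a stand-in for a "colnames" result.
X <- data.frame(x1 = 1:5, x2 = c(2.5, 1, 4, 3, 0), x3 = c(0, 1, 0, 1, 1))
selected <- c("x1", "x3")   # assumed shape of the returned column names

# drop = FALSE keeps a data frame even if a single column is selected
X_best <- X[, selected, drop = FALSE]
```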
# Load data
data(ITdata)
data(corinetable)
# Grab a sample (optional)
sample <- ITdata[getSamples(ITdata, share = 0.05), ]
# Reclassify
catITdata <- reclassify(sample, reclasstable = corinetable)
# Create a binary response dataset
Y <- MLtoBinomData(catITdata[, 1], class = 1)
X <- catITdata[, -1]
# Return only the column names, using step-wise elimination on the linear probability model
selectX(Y, X, model = "lm", returntype = "colnames", method = "opt.t")
# Return the selected data using the default settings
bestX <- selectX(Y, X)
describe(bestX)
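For intuition about the "opt.t" method used in the example, step-wise elimination by t-value can be sketched with base R. This is an illustration under stated assumptions, not the package's implementation: backward_t() is a hypothetical helper, it handles only numeric regressors with syntactic column names, and it fits a linear probability model with lm().

```r
# Illustrative only: a hypothetical helper, not part of the package.
backward_t <- function(y, X, crit.t = 1.64) {
  keep <- names(X)
  while (length(keep) > 0) {
    dat <- data.frame(y = y, X[, keep, drop = FALSE])
    fit <- lm(y ~ ., data = dat)
    tab <- coef(summary(fit))[-1, , drop = FALSE]      # drop the intercept row
    tvals <- setNames(abs(tab[, "t value"]), rownames(tab))
    if (min(tvals) >= crit.t) break                    # all remaining variables pass
    keep <- setdiff(keep, names(which.min(tvals)))     # remove the weakest variable
  }
  keep
}

# Simulated data: only x1 drives the binary response
set.seed(42)
n <- 500
X_sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y_sim <- as.numeric(X_sim$x1 + rnorm(n) > 0)

kept <- backward_t(y_sim, X_sim)
kept
```

Removing one variable per iteration and refitting means each decision is based on the current model, which is also why the procedure is not statistically sound as an inference tool: the reported t-values ignore the selection that preceded them.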