autoGLM: A main function of the package for automated generalized...


Description

This function is a wrapper around the optimization and selection routines in the package and can be used for automated calibration of GLMs on semi-large datasets. generalizeToSpecific is more appropriate for interactive R sessions, whereas autoGLM is more appropriate when calibration takes a long time. For example, autoGLM can run generalizeToSpecific over a vector of dependent variable classes, and log and write outputs to disk.

Usage

autoGLM(data, reclasstable = "default", class = 1,
  outputpath = paste(getwd(), "//", sep = ""), modelname = "autoGLM",
  tracelevel = 1, actions = c("print", "return"), NAval = -9999,
  model = "logit", preselect = "lm", method = "opt.ic", crit.t = 1.64,
  crit.p = 0.05, test = "LR", KLIC = "AICc", accuracytolerance = 0.01,
  confidence.alternative = 0.9, use.share = 0.25, maxsampleruns = 50,
  memorymanagement = TRUE, returnall = FALSE, compress = FALSE,
  JIT = TRUE)

Arguments

data

A dataframe with a categorical response variable in the first column, and covariates in subsequent columns. Typically the product of cbind(Y,X).

reclasstable

A table that maps the first column of data into a binary response variable. By default it is omitted (the binary response variable will be identical to data[,1]). See also corinetable and reclassify.

class

The class that should be 1 in the binary response variable; all other classes in the categorical variable are set to 0. Defaults to 1. See also reclassify.
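The mapping that reclasstable and class describe can be pictured with a few lines of base R. This is a minimal sketch, not the package's reclassify: the helper name reclass_to_binary and the two-column table layout (original code, reclassified code) are assumptions made for illustration.

```r
# Hypothetical helper, not part of the package: maps a categorical
# response to 0/1, optionally via a two-column lookup table
# (column 1 = original code, column 2 = reclassified code).
reclass_to_binary <- function(y, reclasstable = NULL, class = 1) {
  if (!is.null(reclasstable)) {
    # Look up each original code in the first column of the table
    y <- reclasstable[match(y, reclasstable[, 1]), 2]
  }
  # The chosen class becomes 1, every other class becomes 0
  as.integer(y == class)
}

landuse <- c(11, 12, 21, 11, 31)
tab <- cbind(from = c(11, 12, 21, 31), to = c(1, 1, 2, 3))
reclass_to_binary(landuse, tab, class = 1)
# [1] 1 1 0 1 0
```

Without a table, the codes in the response are compared with class directly.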

outputpath

The location on the hard drive where output will be written. Defaults to getwd().

modelname

The name of the model, used when writing a weightsfile. Defaults to "autoGLM". See also exportWeightsfile.

tracelevel

The amount of information to be printed, passed on to underlying routines. Defaults to 1; set to 0 for no printing.

actions

Actions to be taken by autoGLM. Defaults to c("print", "return") and may include any combination of "write" (write a geoDMS weightsfile; see also exportWeightsfile), "print" (print results), "log" (write a log file), and "return" (return results as a list object).

NAval

Optional value of the categorical response variable that should be dropped by the reclassification scheme. Defaults to -9999. See also reclassify.
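Dropping the NAval category before the response is reclassified can be sketched as a simple row filter. The helper drop_naval is hypothetical and only illustrates the idea; how reclassify actually handles this value is not shown here.

```r
# Hypothetical helper (not the package's reclassify): drop rows whose
# response (first column) equals the NA code, and rows where the
# response is a genuine NA.
drop_naval <- function(data, NAval = -9999) {
  keep <- !is.na(data[, 1]) & data[, 1] != NAval
  data[keep, , drop = FALSE]
}

dat <- data.frame(y = c(1, -9999, 2, 1), x = c(0.5, 1.2, 3.4, 2.2))
drop_naval(dat)   # the row with y == -9999 is removed
```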

model

Main model type that should be calibrated, either "lm", "probit", or "logit". See also generalizeToSpecific.

preselect

Optional variable preselection using a first-order approximation (a linear model) of the logit or probit model, enabled by specifying "lm" (the default). See also selectX.
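One way to read "first-order approximation" is a linear probability model: fit lm on the binary response, keep the covariates whose |t| exceeds a threshold, then fit the logit on the survivors only. The sketch below assumes that reading on simulated data; selectX may differ in detail, and the threshold of 1.64 is borrowed from crit.t purely for illustration.

```r
set.seed(1)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Simulated binary response: only x1 and x2 matter
y <- rbinom(n, 1, plogis(0.5 + 1.5 * X$x1 - 1.0 * X$x2))
dat <- cbind(y = y, X)

# Step 1: linear (first-order) approximation of the binary model
lpm <- lm(y ~ ., data = dat)
tvals <- summary(lpm)$coefficients[-1, "t value"]

# Step 2: keep covariates with |t| above the (assumed) threshold
keep <- names(tvals)[abs(tvals) > 1.64]

# Step 3: fit the logit on the preselected covariates only
fit <- glm(reformulate(keep, response = "y"),
           family = binomial(link = "logit"), data = dat)
keep
```

The linear fit is cheap compared with repeated logit fits, which is the usual motivation for this kind of screening on large datasets.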

method

The optimization strategy: "opt.ic" to optimize using information criteria, "opt.t" for step-wise elimination of insignificant variables (statistically not a sound procedure, but it yields a parsimonious model that can be useful as a benchmark), or "opt.h" to optimize using classical hypothesis tests. Defaults to "opt.ic". See also opt.ic, opt.t, and opt.h.

crit.t

The t-value threshold for significance when using method "opt.t". Defaults to 1.64. See also opt.t.
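Step-wise elimination with a t threshold can be sketched as backward elimination: refit the logit, drop the covariate with the smallest |z|, and repeat until every remaining covariate clears crit.t. This is one common variant, written for illustration; the function name opt_t_sketch is hypothetical and the package's opt.t may proceed differently.

```r
# Hypothetical sketch of t-based backward elimination for a logit model.
opt_t_sketch <- function(y, X, crit.t = 1.64) {
  dat <- cbind(y = y, X)
  vars <- names(X)
  repeat {
    fit <- glm(reformulate(vars, response = "y"), family = binomial, data = dat)
    z <- summary(fit)$coefficients[-1, "z value"]
    # Stop when all covariates are significant, or only one is left
    if (length(vars) == 1 || min(abs(z)) >= crit.t) return(fit)
    # Drop the least significant covariate and refit
    vars <- setdiff(vars, names(which.min(abs(z))))
  }
}

set.seed(2)
n <- 400
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
y <- rbinom(n, 1, plogis(-0.3 + 1.2 * X$x1 + 0.8 * X$x2))
fit <- opt_t_sketch(y, X)
names(coef(fit))
```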

crit.p

The significance level (p-value threshold) used by method "opt.h" in the hypothesis tests. Defaults to 0.05. See also opt.h.

test

The hypothesis test used by "opt.h". Defaults to "LR" for the likelihood ratio test. Other options are "F", for an F test of the joint significance of insignificant parameters, or "Chisq", for a Wald test against the chi-squared distribution. See also opt.h.
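The likelihood ratio test compares a restricted model with the full model: twice the log-likelihood gap is chi-squared distributed, with degrees of freedom equal to the number of restrictions. The base R sketch below tests whether x2 and x3 can be dropped jointly and compares the p-value with crit.p; it illustrates the test itself, not the internals of opt.h.

```r
set.seed(3)
n <- 400
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- rbinom(n, 1, plogis(0.2 + 1.0 * X$x1))
dat <- cbind(y = y, X)

full <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
restricted <- glm(y ~ x1, family = binomial, data = dat)  # drops x2 and x3 jointly

# LR statistic and its degrees of freedom (number of restrictions)
lr <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(restricted)))
q <- attr(logLik(full), "df") - attr(logLik(restricted), "df")
p <- pchisq(lr, df = q, lower.tail = FALSE)

crit.p <- 0.05
p > crit.p  # TRUE means the joint restriction is not rejected at crit.p
```

The same p-value is available from anova(restricted, full, test = "LRT").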

KLIC

The information criterion used by "opt.ic", either "AIC" or "AICc". Defaults to "AICc". See also opt.ic.
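AICc adds a small-sample correction to AIC: AICc = AIC + 2k(k + 1) / (n - k - 1), where k is the number of estimated parameters and n the number of observations. The correction vanishes as n grows, so the two criteria agree on large datasets. A minimal base R version, written here for illustration (the helper name aicc is not from the package):

```r
# Hypothetical helper: small-sample corrected AIC for a fitted model
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")   # number of estimated parameters
  n <- nobs(fit)                 # number of observations
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

fit <- glm(am ~ wt, family = binomial, data = mtcars)
aicc(fit) - AIC(fit)   # correction term: 2 * 2 * 3 / (32 - 2 - 1) = 12/29
```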

accuracytolerance

When out-of-sample and within-sample accuracy differ by more than accuracytolerance, a warning is issued, which is also logged when "log" is specified in actions. Defaults to 0.01. See also accuracy.
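The accuracy check can be sketched by comparing the share of correct predictions on the estimation sample with the share on held-out data. This is a hypothetical illustration: the 25% estimation share (mirroring the use.share default), the 0.5 probability cutoff, and the helper accuracy_at are assumptions, and the package's accuracy routine may define accuracy differently.

```r
set.seed(4)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.3 + 1.2 * dat$x1 - 0.7 * dat$x2))

# Assumed split: 25% of rows for estimation, the rest held out
train <- sample(n, floor(0.25 * n))
fit <- glm(y ~ x1 + x2, family = binomial, data = dat[train, ])

# Hypothetical helper: share of correct 0/1 predictions at a 0.5 cutoff
accuracy_at <- function(fit, newdata) {
  pred <- as.integer(predict(fit, newdata = newdata, type = "response") > 0.5)
  mean(pred == newdata$y)
}
acc_in  <- accuracy_at(fit, dat[train, ])
acc_out <- accuracy_at(fit, dat[-train, ])

accuracytolerance <- 0.01
if (abs(acc_in - acc_out) > accuracytolerance) {
  warning(sprintf("In-sample (%.3f) and out-of-sample (%.3f) accuracy differ by more than %.2f",
                  acc_in, acc_out, accuracytolerance))
}
```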

confidence.alternative

Confidence level used for the alternative of dissimilar samples in the sampling routine. Defaults to 0.90. See also getSamples.

use.share

Share of the data used. Defaults to 0.25. See also getSamples.

maxsampleruns

Maximum number of sampling runs. Defaults to 50. See also getSamples.

memorymanagement

TRUE/FALSE indicating whether garbage collection should be forced regularly when memory usage is high. Defaults to TRUE, recommended setting for large datasets. See also tgc.

returnall

TRUE, FALSE, or "writedisk", indicating whether the output objects for every class should be returned together in a list (as produced by lapply), or whether only the final output should be returned as an object. Specifying "writedisk" writes the results for each class to a separate .RDS file, from which the output can be restored with readRDS(). See also iapply. Returning a list of all results can consume large amounts of memory, because each object contains copies of the datasets used; with country-sized datasets, such a list can easily require more than 64 GB of RAM. returnall = FALSE (the default) is much more RAM-friendly, as the results for each class are stored in the same memory address, overwriting the previous results. With returnall = FALSE, log files are still written and diagnostics are still printed to screen if specified in actions. returnall = "writedisk" is the recommended setting, although it is not the default.

compress

Passed on to iapply. Defaults to FALSE, i.e. no compression of the RDS output, which is the recommended setting if computation time is valued over disk space. Keep in mind that with large datasets, autoGLM objects can be several gigabytes in size. See also iapply.
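The "writedisk" pattern, combined with uncompressed RDS output, can be sketched with base R's saveRDS and readRDS: fit one model per class, write it to disk, and keep only the file path in memory. The loop below is a hypothetical stand-in for what iapply does; it writes to tempdir() rather than outputpath, and the file naming scheme is an assumption.

```r
set.seed(5)
classes <- c(1, 2, 3)
landuse <- sample(classes, 300, replace = TRUE)  # simulated categorical response
x <- rnorm(300)
outputpath <- tempdir()

# Hypothetical per-class loop: fit, write to disk, return only the path
paths <- vapply(classes, function(cl) {
  y <- as.integer(landuse == cl)
  fit <- glm(y ~ x, family = binomial)
  path <- file.path(outputpath, paste0("autoGLM_class", cl, ".rds"))
  saveRDS(fit, path, compress = FALSE)  # uncompressed: faster to write, larger on disk
  path
}, character(1))

# Restore the result for one class later
fit2 <- readRDS(paths[2])
```

Only one fitted object lives in memory at a time, which is the point of writing to disk when each object carries copies of a large dataset.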

JIT

logical indicating whether just-in-time compilation of internal functions should be used. Mainly for historical reasons.

Examples

data(ITdata)
data(corinetable)

results <- autoGLM(data=randomlogit, reclasstable=corinetable, class=0, method ="opt.ic")


# All options, shown with their default values:
results <- autoGLM(data = randomlogit, reclasstable = "default", class = 1,
  outputpath = paste(getwd(), "//", sep = ""), modelname = "autoGLM", tracelevel = 1,
  actions = c("print", "return"), NAval = -9999,
  model = "logit", preselect = "lm", method = "opt.ic", crit.t = 1.64, crit.p = 0.05,
  test = "LR", KLIC = "AICc", accuracytolerance = 0.01, confidence.alternative = 0.90,
  use.share = 0.25, maxsampleruns = 50, memorymanagement = TRUE, returnall = FALSE,
  compress = FALSE, JIT = TRUE)

BPJandree/AutoGLM documentation built on May 5, 2019, 10:25 a.m.