bayesx.control: Control Parameters for BayesX
In R2BayesX: Estimate Structured Additive Regression Models with 'BayesX'

Description

Various parameters that control fitting of regression models using bayesx.

Usage

bayesx.control(model.name = "bayesx.estim", 
  family = "gaussian", method = "MCMC", verbose = FALSE, 
  dir.rm = TRUE, outfile = NULL, replace = FALSE, iterations = 12000L,
  burnin = 2000L, maxint = NULL, step = 10L, predict = TRUE,
  seed = NULL, hyp.prior = NULL, distopt = NULL, reference = NULL,
  zipdistopt = NULL, begin = NULL, level = NULL, eps = 1e-05,
  lowerlim = 0.001, maxit = 400L, maxchange = 1e+06, leftint = NULL,
  lefttrunc = NULL, state = NULL, algorithm = NULL, criterion = NULL, 
  proportion = NULL, startmodel = NULL, trace = NULL, 
  steps = NULL, CI = NULL, bootstrapsamples = NULL, ...)

Arguments

`model.name`	character, the base name used for the model output files written to `outfile`.
`family`	character, specifies the distribution used for the model. Options available for all methods (`"MCMC"`, `"REML"` and `"STEP"`) are: `"binomial"`, `"binomialprobit"`, `"gamma"`, `"gaussian"`, `"multinomial"`, `"poisson"`. For `"MCMC"` and `"REML"` only: `"cox"`, `"cumprobit"` and `"multistate"`. For `"REML"` only: `"binomialcomploglog"`, `"cumlogit"`, `"multinomialcatsp"`, `"multinomialprobit"`, `"seqlogit"`, `"seqprobit"`.
`method`	character, which method should be used for estimation, options are `"MCMC"`, `"HMCMC"` (hierarchical MCMC), `"REML"` and `"STEP"`.
`verbose`	logical, should output be printed to the `R` console while `bayesx` is running?
`dir.rm`	logical, should the output files and directory be removed after estimation?
`outfile`	character, specifies the directory where `bayesx` should store all output files. All output files are named with `model.name` as the base name.
`replace`	logical, if set to `TRUE`, the files in the output directory specified in argument `outfile` will be replaced.
`iterations`	integer, sets the number of iterations for the sampler.
`burnin`	integer, sets the burn-in period of the sampler.
`maxint`	integer, the maximum number of intervals allowed. If first or second order random walk priors are specified, in some cases the data will be slightly grouped: the range between the minimal and maximal observed covariate values is divided into (small) intervals, and one parameter is estimated for each interval. The grouping has almost no effect on estimation results as long as the number of intervals is large enough. With the `maxint` option the user can control the amount of grouping. For equidistant data, the default `maxint = 150`, for example, means that no grouping will be done as long as the number of different observations is equal to or below 150. For non-equidistant data some grouping may be done even if the number of different observations is below 150.
`step`	integer, defines the thinning parameter for MCMC simulation. E.g., `step = 50` means that only every 50th sampled parameter will be stored and used to compute characteristics of the posterior distribution such as means, standard deviations or quantiles. The aim of thinning is a considerable reduction of disk storage and of autocorrelation between sampled parameters.
`predict`	logical, option `predict` may be specified to compute samples of the deviance `D`, the effective number of parameters `pD` and the deviance information criterion `DIC` of the model. In addition, if `predict = FALSE`, only output files of estimated effects will be returned; otherwise an expanded dataset using all observations is written to the output directory, also containing the data used for estimation. Hence, setting `predict = FALSE` is useful when dealing with large data sets that might cause memory problems if `predict` is set to `TRUE`.
`seed`	integer, set the seed of the random number generator in BayesX, usually set using function `set.seed`.
`hyp.prior`	numeric, defines the values of the hyper-parameters `a` and `b` for the inverse gamma prior of the overall variance parameter `\sigma^2`, if the response distribution is Gaussian. Each value must be a positive real number. The default is `hyp.prior = c(1, 0.005)`.
`distopt`	character, defines the implemented formulation for the negative binomial model if the response distribution is negative binomial. The two possibilities are to work with a negative binomial likelihood (`distopt = "nb"`) or to work with the Poisson likelihood and the multiplicative random effects (`distopt = "poga"`).
`reference`	character, option `reference` is meaningful only if either `family = "multinomial"` or `family = "multinomialprobit"` is specified as the response distribution. In this case `reference` defines the reference category to be chosen. Suppose, for instance, that the response has the three categories 1, 2 and 3. Then `reference = 2` defines the value 2 to be the reference category.
`zipdistopt`	character, defines the zero inflated distribution for the regression analysis. The two possibilities are to work with a zero inflated Poisson distribution (`zipdistopt = "zip"`) or to work with the zero inflated negative binomial likelihood (`zipdistopt = "zinb"`).
`begin`	character, option `begin` is meaningful only if `family = "cox"` is specified as the response distribution. In this case `begin` specifies the variable that records when the observation became at risk. This option can be used to handle left truncation and time-varying covariates. If `begin` is not specified, all observations are assumed to have become at risk at time 0.
`level`	integer, besides the posterior means and medians, BayesX provides point-wise posterior credible intervals for every effect in the model. In a Bayesian approach based on MCMC simulation techniques, credible intervals are estimated by computing the respective quantiles of the sampled effects. By default, BayesX computes (point-wise) credible intervals for nominal levels of 80% and 95%. The option `level[1]` redefines the first nominal level (95%). Adding, for instance, `level[1] = 99` to the options list computes credible intervals for a nominal level of 99% rather than 95%. Similarly, the option `level[2]` redefines the second nominal level (80%). Adding, for instance, `level[2] = 70` to the options list computes credible intervals for a nominal level of 70% rather than 80%.
`eps`	numeric, defines the termination criterion of the estimation process. If both the relative changes in the regression coefficients and the variance parameters are less than `eps`, the estimation process is assumed to have converged.
`lowerlim`	numeric, since small variances are close to the boundary of their parameter space, the usual Fisher scoring algorithm for their determination has to be modified. If the fraction of the penalized part of an effect relative to the total effect is less than `lowerlim`, the estimation of the corresponding variance is stopped and the estimator is defined to be the current value of the variance (see section 6.2 of the BayesX methodology manual for details).
`maxit`	integer, defines the maximum number of iterations to be used in estimation. Since the estimation process will not necessarily converge, it may be useful to define an upper bound for the number of iterations. Note that BayesX returns results based on the current values of all parameters even if no convergence could be achieved within `maxit` iterations, but a warning message will be printed in the output window.
`maxchange`	numeric, defines the maximum value that is allowed for relative changes in parameters in one iteration to prevent the program from crashing because of numerical problems. Note that BayesX produces results based on the current values of all parameters even if the estimation procedure is stopped due to numerical problems, but an error message will be printed in the output window.
`leftint`	character, gives the name of the variable that contains the lower (left) boundary `T_{lo}` of the interval `[T_{lo}, T_{up}]` for an interval censored observation. For right censored or uncensored observations, `T_{lo} = T_{up}` has to be specified. If `leftint` is missing, all observations are assumed to be right censored or uncensored, depending on the corresponding value of the censoring indicator.
`lefttrunc`	character, option `lefttrunc` specifies the name of the variable containing the left truncation time `T_{tr}`. For observations that are not truncated, `T_{tr} = 0` has to be specified. If `lefttrunc` is missing, all observations are assumed to be not truncated. For multi-state models, variable `lefttrunc` specifies the left endpoint of the corresponding time interval.
`state`	character, for multi-state models, `state` specifies the current state variable of the process.
`algorithm`	character, specifies the selection algorithm. Possible values are `"cdescent1"` (adaptive algorithms in the methodology manual, see subsection 6.3), `"cdescent2"` (adaptive algorithms 1 and 2 with backfitting, see remarks 1 and 2 of section 3 in Belitz and Lang (2008)), `"cdescent3"` (search according to cdescent1 followed by cdescent2 using the selected model in the first step as the start model) and `"stepwise"` (stepwise algorithm implemented in the `gam` routine of S-plus, see Chambers and Hastie, 1992). This option will rarely be specified by the user.
`criterion`	character, specifies the goodness of fit criterion. If `criterion = "MSEP"` is specified, the data are randomly divided into a training and a validation data set. The training data set is used to estimate the models and the validation data set is used to estimate the mean squared prediction error (MSEP), which serves as the goodness of fit criterion to compare different models. The proportion of data used for the training and validation sample can be specified using option `proportion`, see below. The default is to use 75% of the data for the training sample.
`proportion`	numeric, this option may be used in combination with option `criterion = "MSEP"`, see above. In this case the data are randomly divided into a training and a validation sample. `proportion` defines the fraction (between 0 and 1) of the original data used as training sample.
`startmodel`	character, defines the start model for variable selection. Options are `"linear"`, `"empty"`, `"full"` and `"userdefined"`.
`trace`	character, specifies how detailed the output in the output window will be. Options are `"trace_on"`, `"trace_half"` and `"trace_off"`.
`steps`	integer, defines the maximum number of iterations. If the selection process has not converged after `steps` iterations the algorithm terminates and a warning is raised. Setting `steps = 0` allows the user to estimate a certain model without any model choice. This option will rarely be specified by the user.
`CI`	character, controls whether confidence intervals are computed for linear and nonlinear terms. Options are `CI = "none"`, confidence intervals conditional on the selected model (`CI = "MCMCselect"`) and unconditional confidence intervals where model uncertainty is taken into account (`CI = "MCMCbootstrap"`). Both interval types are computationally intensive, but conditional confidence intervals take much less computing time than unconditional intervals. The advantage of unconditional confidence intervals is that sampling distributions for the degrees of freedom or smoothing parameters are obtained.
`bootstrapsamples`	integer, defines the number of bootstrap samples used for `CI = "MCMCbootstrap"`.
`...`	not used
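
The interplay of `iterations`, `burnin`, `step` and `level` can be sketched in base R. This is only an illustration, assuming that `iterations` counts all sampler iterations including the burn-in period. The vector `draws` is simulated with `rnorm` as a stand-in for stored samples of one effect; it is not BayesX output, and the names `n_stored`, `draws`, `alpha` and `ci` are made up for this sketch.

```r
## Defaults taken from the Usage section
iterations <- 12000L
burnin     <- 2000L
step       <- 10L

## Draws kept after thinning: every `step`-th draw after the burn-in
## (assumes `iterations` includes the burn-in period)
n_stored <- (iterations - burnin) %/% step
n_stored

## Stand-in for the stored draws of one effect (simulated, NOT BayesX output)
set.seed(1)
draws <- rnorm(n_stored, mean = 0.5, sd = 0.1)

## Point-wise credible intervals as quantiles of the draws;
## nominal levels in percent, ordered as level[1], level[2]
level <- c(95, 80)
alpha <- 1 - level / 100
ci <- t(sapply(alpha, function(a) {
  quantile(draws, probs = c(a / 2, 1 - a / 2), names = FALSE)
}))
dimnames(ci) <- list(paste0(level, "%"), c("lower", "upper"))
ci
```

With the default settings the arithmetic gives 1000 stored draws per parameter; a larger `step` lowers that number and the disk space needed, at the price of fewer draws for the quantile estimates.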

Value

A list with the arguments specified is returned.
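
A quick way to see what the returned list holds is to inspect it with `str()`. The following is a minimal sketch, assuming the R2BayesX package is installed; the REML settings chosen here are arbitrary illustration values, and the exact set of list elements may depend on the package version and the chosen `method`.

```r
## Sketch only: runs if R2BayesX is installed
if (requireNamespace("R2BayesX", quietly = TRUE)) {
  ## arbitrary illustration values for two REML-related settings
  ctrl <- R2BayesX::bayesx.control(method = "REML", eps = 1e-04, maxit = 200L)

  ## the return value is a list; str() shows which settings it contains
  str(ctrl)
}
```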

Author(s)

Nikolaus Umlauf, Thomas Kneib, Stefan Lang, Achim Zeileis.

References

For methodological and reference details see the BayesX manuals available at: https://www.uni-goettingen.de/de/bayesx/550513.html.

Belitz C, Lang S (2008). Simultaneous selection of variables and smoothing parameters in structured additive regression models. Computational Statistics & Data Analysis, 53, 61–81.

Chambers JM, Hastie TJ (eds.) (1992). Statistical Models in S. Chapman & Hall, London.

Umlauf N, Adler D, Kneib T, Lang S, Zeileis A (2015). Structured Additive Regression Models: An R Interface to BayesX. Journal of Statistical Software, 63(21), 1–46. https://www.jstatsoft.org/v63/i21/

Examples

bayesx.control()

## Not run: 
set.seed(111)
n <- 500
## regressors
dat <- data.frame(x = runif(n, -3, 3))
## response
dat$y <- with(dat, 10 + sin(x) + rnorm(n, sd = 0.6))

## estimate models with
## bayesx MCMC and REML
b1 <- bayesx(y ~ sx(x), method = "MCMC", data = dat)
b2 <- bayesx(y ~ sx(x), method = "REML", data = dat)

## compare reported output
summary(b1)
summary(b2)

## End(Not run)
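
As a further sketch, the control parameters can be collected once with `bayesx.control()` and handed to `bayesx()` through its `control` argument. The chain length, burn-in, thinning and seed values below are arbitrary choices for a short illustrative run, not recommendations, and the object names `ctrl` and `b3` are made up. Running this needs R2BayesX together with a working BayesX installation.

```r
## Sketch only: needs R2BayesX and a working BayesX binary
if (requireNamespace("R2BayesX", quietly = TRUE)) {
  library("R2BayesX")

  ## same simulated data as in the example above
  set.seed(111)
  n <- 500
  dat <- data.frame(x = runif(n, -3, 3))
  dat$y <- with(dat, 10 + sin(x) + rnorm(n, sd = 0.6))

  ## a shorter chain than the default, collected in one control list
  ctrl <- bayesx.control(method = "MCMC", iterations = 3000L,
    burnin = 1000L, step = 2L, seed = 123L)

  ## fit the model with these settings and show the summary
  b3 <- bayesx(y ~ sx(x), data = dat, control = ctrl)
  summary(b3)
}
```

Keeping the settings in a single list makes it easy to reuse the same sampler configuration across several model fits.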

R2BayesX documentation built on Aug. 2, 2024, 3 a.m.