inbagg: Indirect Bagging
In ipred: Improved Predictors

Description

Function to perform indirect bagging and subagging.

Usage

## S3 method for class 'data.frame'
inbagg(formula, data, pFUN=NULL, 
  cFUN=list(model = NULL, predict = NULL, training.set = NULL), 
  nbagg = 25, ns = 0.5, replace = FALSE, ...)

Arguments

`formula`	formula. A `formula` of the form `y~w1+w2+w3~x1+x2+x3` describes how to model the intermediate variables `w1, w2, w3` and the response variable `y`, unless another formula is specified in the elements of `pFUN` or in `cFUN`.
`data`	data frame of explanatory, intermediate and response variables.
`pFUN`	list of lists, which describe models for the intermediate variables, details are given below.
`cFUN`	either a fixed function with argument `newdata` that returns the class membership by default, or a list specifying a classifying model, similar to one element of `pFUN`. Details are given below.
`nbagg`	number of bootstrap samples.
`ns`	proportion of the learning sample to be drawn. By default, subagging with 50% is performed, i.e. 0.5*n out of n observations are drawn without replacement.
`replace`	logical. Whether to draw with (`TRUE`) or without (`FALSE`, the default) replacement.
`...`	additional arguments (e.g. `subset`).
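
The interplay of `ns` and `replace` can be illustrated in base R. The sketch below is not the package's internal code: it only assumes that each bag is a set of row indices drawn with `sample()` and that the bag size is `ns * n` rounded down. The helper name `draw_bag` is hypothetical.

```r
# Hypothetical helper (not part of ipred): draw the row indices of one bag.
draw_bag <- function(n, ns = 0.5, replace = FALSE) {
  # Assumption: the bag size is ns * n, rounded down.
  sample(seq_len(n), size = floor(ns * n), replace = replace)
}

set.seed(1)
n <- 10

# Subagging (the documented default): 0.5 * n distinct rows, no replacement.
sub_bag <- draw_bag(n, ns = 0.5, replace = FALSE)

# Classical bagging: n rows drawn with replacement, so rows may repeat.
boot_bag <- draw_bag(n, ns = 1, replace = TRUE)

length(sub_bag)         # 5
anyDuplicated(sub_bag)  # 0, because sampling is without replacement
length(boot_bag)        # 10
```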

Details

A given data set is subdivided into three types of variables: explanatory, intermediate and response variables.
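
The three variable types correspond to the three parts of a two-tilde formula such as `y~w1+w2+w3~x1+x2+x3`. The following base R sketch shows how such a formula parses. It does not claim to reproduce how `inbagg` processes the formula internally, which this page does not describe.

```r
# `~` groups from left to right, so the formula parses as
# (y ~ w1 + w2 + w3) ~ x1 + x2 + x3.
f <- y ~ w1 + w2 + w3 ~ x1 + x2 + x3

response     <- all.vars(f[[2]][[2]])  # left-most part
intermediate <- all.vars(f[[2]][[3]])  # middle part
explanatory  <- all.vars(f[[3]])       # right-most part

response      # "y"
intermediate  # "w1" "w2" "w3"
explanatory   # "x1" "x2" "x3"
```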

Here, each specified intermediate variable is modelled separately following pFUN, a list of lists with elements specifying an arbitrary number of models for the intermediate variables and an optional element training.set = c("oob", "bag", "all"). The element training.set determines whether the predictive models for the intermediate variables are fitted on the out-of-bag sample ("oob", the default), on the bag sample ("bag") or on all available observations ("all"). The elements of pFUN that specify the models for the intermediate variables are lists as described in inclass. Note that if no formula is given in these elements, the functional relationship of formula is used.
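
To make the three training.set choices concrete, the sketch below builds the corresponding row sets for a single bag and fits the same intermediate model on each. It uses base R only. The simulated data and the names `train_rows` and `fits` are illustrative assumptions, not part of ipred.

```r
set.seed(2)
n <- 20
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$w1 <- d$x1 - d$x2 + rnorm(n, sd = 0.1)

# One bag drawn by subagging (half of the rows, without replacement).
bag <- sample(seq_len(n), size = n / 2)

# Row indices implied by each documented training.set choice.
train_rows <- list(
  oob = setdiff(seq_len(n), bag),  # rows not in the bag (the default)
  bag = bag,                       # rows in the bag
  all = seq_len(n)                 # every available row
)

# Fit the same intermediate model w1 ~ x1 + x2 on each training set.
fits <- lapply(train_rows, function(rows) lm(w1 ~ x1 + x2, data = d[rows, ]))

sapply(train_rows, length)  # oob = 10, bag = 10, all = 20
```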

The response variable is modelled following cFUN. This can either be a fixed classifying function as described in Peters et al. (2003) or a list that specifies the modelling technique to be applied. The list contains the arguments model (the model to be fitted), predict (optional, how to predict), formula (optional, of type y~w1+w2+w3+x1+x2, which determines the variables the classifying function is based on) and the optional argument training.set = c("fitted.bag", "original", "fitted.subset"), which specifies whether the classifying function is trained on the predicted observations of the bag sample ("fitted.bag"), on the original observations ("original") or on the predicted observations not included in a defined subset ("fitted.subset"). By default, the formula specified in formula determines the variables on which the classifying function is based.

Note that the default of cFUN = list(model = NULL, training.set = "fitted.bag") uses the function rpart and the predict function predict(object, newdata, type = "class").
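
The two forms of cFUN might look as follows. This is a sketch under stated assumptions: the list spells out a model and predict function in the spirit of the documented default (the exact rpart settings used by the package are not stated here), and the threshold rule in the fixed function is hypothetical. Both are exercised directly on simulated data rather than through inbagg.

```r
library("rpart")

# List form: an explicit model and predict function with the documented
# predict call predict(object, newdata, type = "class").
cFUN_list <- list(
  model = rpart,
  predict = function(object, newdata) predict(object, newdata, type = "class"),
  training.set = "fitted.bag"
)

# Fixed-function form: a hand-written rule on the intermediate variables.
# The rule itself is hypothetical and only illustrates the expected shape.
cFUN_fixed <- function(newdata) {
  factor(ifelse(newdata$w1 + newdata$w2 > 0, 2, 1), levels = 1:2)
}

# Try both on simulated data, outside of inbagg().
set.seed(3)
d <- data.frame(w1 = rnorm(50), w2 = rnorm(50))
d$y <- cFUN_fixed(d)

tree <- cFUN_list$model(y ~ w1 + w2, data = d)
pred <- cFUN_list$predict(tree, d)
levels(pred)  # "1" "2"
```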

Value

An object of class "inbagg", that is a list with elements

`mtrees`	a list of length `nbagg`, describing the prediction models corresponding to each bootstrap sample. Each element of `mtrees` is a list with elements `bindx` (observations of bag sample), `btree` (classifying function of bag sample) and `bfct` (predictive models for intermediates of bag sample).
`y`	vector of response values.
`W`	data frame of intermediate variables.
`X`	data frame of explanatory variables.
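
The elements listed above can be inspected with ordinary list access. The sketch below assumes that the ipred, MASS and rpart packages are installed. It reuses the pFUN specification from the Examples section and sets nbagg = 5 only to keep the run short.

```r
library("ipred")
library("MASS")
library("rpart")

set.seed(4)
n <- 100
W <- mvrnorm(n = n, mu = rep(0, 3), Sigma = diag(3))
X <- mvrnorm(n = n, mu = rep(2, 3), Sigma = diag(3))
colnames(W) <- c("w1", "w2", "w3")
colnames(X) <- c("x1", "x2", "x3")
DATA <- data.frame(y = as.factor(sample(1:2, n, replace = TRUE)), W, X)

pFUN <- list(list(formula = w1 ~ x1 + x2, model = lm, predict = mypredict.lm),
             list(model = rpart))

fit <- inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA,
              pFUN = pFUN, nbagg = 5)

class(fit)              # expected to include "inbagg"
length(fit$mtrees)      # one entry per bag, so nbagg
names(fit$mtrees[[1]])  # per the Value section: bindx, btree, bfct
head(fit$W)             # data frame of intermediate variables
```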

References

David J. Hand, Hua Gui Li, Niall M. Adams (2001), Supervised classification with structured class definitions. Computational Statistics & Data Analysis 36, 209–225.

Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller (2003), Diagnosis of glaucoma by indirect classifiers. Methods of Information in Medicine 1, 99–103.

Examples


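library("ipred")
library("MASS")
library("rpart")

# Simulate a two-class response, three intermediate and three explanatory variables.
# y has 100 values and is recycled by data.frame() to match the 200 rows of W and X.
y <- as.factor(sample(1:2, 100, replace = TRUE))
W <- mvrnorm(n = 200, mu = rep(0, 3), Sigma = diag(3))
X <- mvrnorm(n = 200, mu = rep(2, 3), Sigma = diag(3))
colnames(W) <- c("w1", "w2", "w3")
colnames(X) <- c("x1", "x2", "x3")
DATA <- data.frame(y, W, X)

# Models for the intermediate variables: a linear model with its own formula,
# and an rpart tree that falls back on the formula passed to inbagg().
pFUN <- list(list(formula = w1~x1+x2, model = lm, predict = mypredict.lm),
             list(model = rpart))

inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, pFUN = pFUN)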

ipred documentation built on July 18, 2024, 3 p.m.