bagging: Bagging

View source: R/bagging.r

bagging    R Documentation

Bagging

Description

Creates bagged tree estimates from diet data

Usage

bagging(formula, data, weights, subset, na.action = na.dpart,
        model = FALSE, x = FALSE, y = TRUE, parms, control,
        cost, nBaggs,
        spatial = list(fit = FALSE, sizeofgrid = 5,
                       nsub = NULL, ID = NULL, LonID = "Longitude",
                       LatID = "Latitude"),
        Plot = FALSE,
        predID, numCores = 1, ...)

Arguments

formula

a formula with a response but no interaction terms, as for the rpart function

data

an optional data frame in which to interpret the variables named in the formula

weights

case weights

subset

optional expression saying that only a subset of the rows of the data should be used in the fit.

na.action

The default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.

model

if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart function), then this frame is used rather than constructing new data.

x

keep a copy of the x matrix in the result.

y

keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE.

parms

optional parameters for the splitting function. For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini.

control

options that control details of the rpart algorithm.

cost

a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.

nBaggs

numeric. Number of bootstrap samples.

spatial

A list with the following elements:
fit = whether to do spatial bootstrapping
sizeofgrid = size of the spatial tile to sample from (default is 5)
nsub = number of sub-samples to take (defaults to no subsampling)
ID = ID from which to subsample, e.g. TripSetPredNo (only needed if sub-sampling is required)

Plot

logical. Whether to plot the spatial grid with samples (default: FALSE, no plotting)

predID

predator ID, given as a character string (e.g. "TripSetPredNo")

numCores

Number of cores on which to run the bagging (default: 1). Only available under Unix.

...

arguments to be passed to or from other methods.
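
As a rough illustration of the parms and cost arguments described above, the following base R sketch builds a parms list for a three-class response, checks the stated constraints, and shows how a cost scales a split improvement. The number of classes, all numeric values, and the variable names used in cost are invented for illustration and are not taken from the package.

```r
# Hypothetical three-class problem; all values below are illustrative only.
prior <- c(0.5, 0.3, 0.2)                  # positive and sums to 1
loss  <- matrix(c(0, 1, 2,
                  1, 0, 1,
                  2, 1, 0), nrow = 3, byrow = TRUE)

parms <- list(prior = prior, loss = loss, split = "gini")

# Check the constraints stated for parms
off_diag <- row(parms$loss) != col(parms$loss)
stopifnot(all(parms$prior > 0),
          isTRUE(all.equal(sum(parms$prior), 1)),
          all(diag(parms$loss) == 0),
          all(parms$loss[off_diag] > 0),
          parms$split %in% c("gini", "information"))

# cost: one non-negative value per model variable (names are assumptions)
cost <- c(Lat = 1, Lon = 1, SST = 2)

# The improvement from splitting on a variable is divided by its cost,
# so a variable with cost 2 needs twice the raw improvement to compete.
raw_improvement    <- c(Lat = 0.10, Lon = 0.08, SST = 0.15)
scaled_improvement <- raw_improvement / cost
best_variable      <- names(which.max(scaled_improvement))
```

In this made-up case SST has the largest raw improvement, but after dividing by its cost of 2 the split on Lat is preferred.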

Details

Users will need to determine whether spatial bootstrapping is required. One way to check is to use the resid function to examine the residuals from the fitted model.
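
To make the idea of spatial bootstrapping more concrete, here is a minimal base R sketch of one common approach, a block bootstrap over grid tiles. It is an assumption about the general technique and not a description of what bagging does internally. The simulated data, the interpretation of sizeofgrid as a tile width in coordinate units, and all object names are hypothetical.

```r
set.seed(1)

# Hypothetical data; column names match the LonID/LatID defaults
dat <- data.frame(Longitude = runif(200, -150, -80),
                  Latitude  = runif(200, -20, 30))

sizeofgrid <- 5   # assumed to be the tile width in coordinate units

# Label each observation with the tile it falls in
tile <- paste(floor(dat$Longitude / sizeofgrid),
              floor(dat$Latitude  / sizeofgrid), sep = "_")

# Resample whole tiles with replacement, then collect their rows
tiles       <- unique(tile)
boot_tiles  <- sample(tiles, length(tiles), replace = TRUE)
boot_rows   <- unlist(lapply(boot_tiles, function(tl) which(tile == tl)))
boot_sample <- dat[boot_rows, ]

# Rows in tiles that were never drawn act as out-of-bag data
oob_rows <- which(!(tile %in% boot_tiles))
```

Resampling tiles rather than individual rows keeps nearby observations together, which is the usual motivation when residuals appear spatially correlated.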

Value

A list with the following elements:

baggs

tree objects for each of the B trees produced.

oob

numeric vector indicating the samples left as out of bag (oob) samples.

pred.oob

predicted prey composition for each set of out of bag samples.

pred

all predicted prey compositions for each bootstrap sample.

resid

data frame of residuals from the fitted tree for each bootstrap sample.

data

bootstrap sample dataset
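
The relationship between bootstrap samples and out-of-bag (oob) samples in the list above can be sketched in base R as follows. This is only an illustration of the standard bagging idea under simple (non-spatial) resampling. The object names and sizes are hypothetical, and the layout of the elements actually returned by bagging may differ.

```r
set.seed(42)

n      <- 20   # hypothetical number of observations
nBaggs <- 3    # small number of bootstrap samples, for illustration

# Row indices drawn with replacement for each bootstrap sample
boot_idx <- lapply(seq_len(nBaggs),
                   function(b) sample(n, n, replace = TRUE))

# Out-of-bag rows: those not drawn in the corresponding bootstrap sample
oob_idx <- lapply(boot_idx,
                  function(idx) setdiff(seq_len(n), idx))

# Each row is either in a given bootstrap sample or out of bag, never both
for (b in seq_len(nBaggs)) {
  stopifnot(length(intersect(boot_idx[[b]], oob_idx[[b]])) == 0)
}
```

A tree fitted to the rows in boot_idx[[b]] can then be evaluated on the rows in oob_idx[[b]], which it has not seen.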

References

Kuhnert PM, Duffy LM, Olson RJ (2012) The Analysis of Predator Diet and Stable Isotope Data. Journal of Statistical Software, in prep.

Kuhnert PM, Kinsey-Henderson A, Bartley R, Herr A (2010) Incorporating uncertainty in gully erosion calculations using the random forests modelling approach. Environmetrics 21:493-509. doi:10.1002/env.999

Breiman L (1996) Bagging predictors. Mach Learn 24:123-140. doi:10.1023/A:1018054314350

Breiman L (1998) Arcing classifiers (with discussion). Ann Stat 26:801-824. doi:10.2307/120055

Breiman L (2001) Random forests. Mach Learn 45:5-32. doi:10.1023/A:1010933404324

Examples


# Assigning prey colours for default palette
#val <- apc(x = yftdiet, preyfile = PreyTaxonSort, check = TRUE)
#node.colsY <- val$cols
#dietPP <- val$x   # updated diet matrix with Group assigned prey taxa codes


# Bagging
# Bagging with NO spatial bootstrapping
# N.B. Not run as this takes a while
#yft.bag <- bagging(Group ~ Lat + Lon + Year + Quarter + SST + Length,
#                   data = dietPP, weights = W, minsplit = 50,
#                   cp = 0.001, nBaggs = 500, predID = "TripSetPredNo")

pkuhnert/diet documentation built on June 10, 2025, 2:59 a.m.