rgam: Outlier-robust fit for Generalized Additive Models


Description

rgam is used to obtain an outlier-robust fit for generalized additive models. It uses the backfitting algorithm with weights derived from robust quasi-likelihood equations. Currently, only local regression smoothers are supported. Bandwidth selection using robust and non-robust cross-validation criteria is currently only implemented for models with a single covariate.

Usage

rgam(x, y, family = c("poisson", "binomial"), ni = NULL, epsilon = 1e-08,
  max.it = 50, k = 1.5, trace = FALSE, cv.method = c("rcv", "cv", "dcv",
  "rdcv"), alpha = seq(0.1, 0.9, by = 0.1), s.i = NULL)

Arguments

x

a vector or matrix of covariates

y

a vector of responses

family

a character string indicating the assumed distribution of the response (conditional on the covariates). Only ‘poisson’ and ‘binomial’ are implemented. The link function is currently fixed to the canonical link for the selected family (log for ‘poisson’ and logit for ‘binomial’).

ni

a vector of the same length as y containing the number of trials of the binomial distribution for each entry of y. Only relevant if family = ‘binomial’.

epsilon

tolerance for the convergence of the robust local scoring algorithm

max.it

maximum number of robust local scoring iterations

k

tuning constant for the robust quasi-likelihood score equations. Large values of k make the estimators closer to the classical fit (and hence less robust), while smaller values of k produce a more robust fit. Values between 1.5 and 3 generally result in a fit with good robustness properties
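To make the role of k concrete, the sketch below assumes a Huber-type function applied to Pearson residuals r = (y - mu) / sqrt(V(mu)), the usual form in robust quasi-likelihood; the exact function used internally by rgam is given in the references. The helper huber_weight() is hypothetical and not part of rgam. Residuals with |r| <= k keep full weight and larger ones are downweighted by k/|r|, so a larger k downweights fewer observations.

```r
# Hypothetical helper (not part of rgam): Huber weights psi_k(r) / r
huber_weight <- function(r, k = 1.5) {
  # weight 1 for |r| <= k and k / |r| beyond; r = 0 gives k / 0 = Inf, so pmin() returns 1
  pmin(1, k / abs(r))
}

r <- c(0, 1, 2, 5, -10)    # example Pearson residuals
huber_weight(r, k = 1.5)   # 1.00 1.00 0.75 0.30 0.15
huber_weight(r, k = 3)     # 1.0  1.0  1.0  0.6  0.3
```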

trace

logical flag to turn on debugging output

cv.method

character string indicating which cross-validation criterion is to be minimized to select the bandwidth from the list given in the argument alpha. Accepted values are ‘rcv’ (a weighted squared loss in which the effect of outliers is reduced), ‘cv’ (the “classical” squared loss), ‘dcv’ (the classical deviance loss) and ‘rdcv’ (a robustly weighted deviance loss). See the references for more details.
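For reference, the classical losses behind ‘cv’ and ‘dcv’ can be written as below for the Poisson family, given held-out responses y and predicted means mu. These helpers are hypothetical and sketch only the standard squared-error and Poisson-deviance losses (averaged over observations, which is an assumption); the robustly weighted versions ‘rcv’ and ‘rdcv’ are defined in the references and are not reproduced here.

```r
# Hypothetical helpers (not part of rgam): classical cross-validation losses
sq_loss <- function(y, mu) mean((y - mu)^2)

pois_dev_loss <- function(y, mu) {
  # y * log(y / mu) is taken as 0 when y == 0
  ylogy <- ifelse(y > 0, y * log(y / mu), 0)
  mean(2 * (ylogy - (y - mu)))
}

y  <- c(0, 3, 7, 2)       # example held-out counts
mu <- c(0.5, 2.5, 6, 4)   # example predicted means
sq_loss(y, mu)            # 1.375
pois_dev_loss(y, mu)
```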

alpha

a scalar between 0 and 1 (for models with a single covariate it can also be a vector of such values). If length(alpha) == 1, its value is used as the span (bandwidth) of the local regression smoother, as described in loess. If alpha is a vector, the value that minimizes the cross-validation criterion specified in the argument cv.method is used.
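The selection rule can be illustrated with plain loess on simulated Gaussian data: each candidate span is scored by leave-one-out cross-validation with the classical squared loss (similar in spirit to cv.method = ‘cv’), and the minimizer is kept. This is a sketch under those assumptions, not the rgam implementation; the helper loo_cv() is hypothetical.

```r
# Sketch (not the rgam implementation): choose a loess span by
# leave-one-out CV with squared-error loss on simulated Gaussian data
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
alpha <- seq(0.2, 0.8, by = 0.1)   # candidate spans

# Hypothetical helper: mean squared leave-one-out prediction error for one span
loo_cv <- function(span) {
  errs <- vapply(seq_len(n), function(i) {
    fit <- loess(y ~ x, data = data.frame(x = x[-i], y = y[-i]),
                 span = span, degree = 2,
                 # exact fitting, so predictions at the end points are not NA
                 control = loess.control(surface = "direct"))
    pred <- predict(fit, newdata = data.frame(x = x[i]))
    (y[i] - unname(pred))^2
  }, numeric(1))
  mean(errs)
}

cv.results <- vapply(alpha, loo_cv, numeric(1))
opt.alpha  <- alpha[which.min(cv.results)]   # span with the smallest CV error
```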

s.i

optional matrix of initial values for the additive predictors (including the intercept). If missing, the predictors are initialized at zero and the intercept is set to the sample mean of the responses transformed by the link function.
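A minimal sketch of that default intercept, assuming the canonical links above and, for the binomial family, that y holds success counts out of ni trials (so the pooled proportion sum(y) / sum(ni) is used; this is an assumption). The helper init_intercept() is hypothetical and not part of rgam.

```r
# Hypothetical helper (not part of rgam): starting intercept on the link scale
init_intercept <- function(y, family = c("poisson", "binomial"), ni = NULL) {
  family <- match.arg(family)
  if (family == "poisson") {
    log(mean(y))                   # log link applied to the sample mean
  } else {
    if (is.null(ni)) ni <- rep(1, length(y))
    qlogis(sum(y) / sum(ni))       # logit link applied to the pooled proportion
  }
}

init_intercept(c(2, 4, 6), "poisson")               # log(4)
init_intercept(c(1, 3), "binomial", ni = c(4, 4))   # qlogis(0.5) = 0
```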

Details

The gam model is fit using the robust local scoring algorithm, which iteratively fits weighted additive models by backfitting. The weights are derived from robust quasi-likelihood estimating equations and thus effectively reduce the potentially damaging effect of outliers.
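The idea can be sketched for a single covariate and the Poisson family as a simplified, robustly weighted local scoring loop: the usual IRLS working weights are multiplied by Huber weights of the Pearson residuals before each loess fit, so that an outlier's contribution to the fit is bounded. This is an illustration under stated assumptions, not the rgam algorithm: it omits the Fisher-consistency correction of robust quasi-likelihood, and the function robust_local_scoring_pois() and its convergence rule are hypothetical.

```r
# Hypothetical sketch (not the rgam implementation): single-covariate Poisson
# local scoring in which the IRLS working weights are multiplied by Huber
# robustness weights of the Pearson residuals. No consistency correction is applied.
robust_local_scoring_pois <- function(x, y, span = 0.5, k = 1.5,
                                      epsilon = 1e-6, max.it = 50) {
  huber_weight <- function(r) pmin(1, k / abs(r))
  eta <- rep(log(mean(y)), length(y))   # start at the log of the sample mean
  converged <- FALSE
  crit <- NA_real_
  for (it in seq_len(max.it)) {
    mu <- exp(eta)
    r <- (y - mu) / sqrt(mu)            # Pearson residuals, V(mu) = mu
    z <- eta + (y - mu) / mu            # working response for the log link
    w <- mu * huber_weight(r)           # IRLS weight mu, downweighted for outliers
    fit <- loess(z ~ x, weights = w, span = span, degree = 1)
    eta.new <- fitted(fit)
    # relative change of the additive predictor (guarded against a zero denominator)
    crit <- sqrt(sum((eta.new - eta)^2) / max(sum(eta^2), 1e-10))
    eta <- eta.new
    if (crit < epsilon) {
      converged <- TRUE
      break
    }
  }
  list(additive.predictors = eta, fitted.values = exp(eta),
       iterations = it, convergence.criterion = crit, converged = converged)
}

# Usage on simulated counts with two gross outliers
set.seed(2)
x <- seq(0, 1, length.out = 200)
y <- rpois(200, exp(1 + sin(2 * pi * x)))
y[c(50, 150)] <- 60
fit <- robust_local_scoring_pois(x, y)
```

With these weights, each observation enters the local fit through w * (z - eta) = psi_k(r) * sqrt(mu), i.e. through the bounded Huber function of its Pearson residual rather than through the raw residual.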

Currently, this function only implements local regression smoothers (as calculated by loess). The method can be applied to other smoothers as well.

Value

Returns an object of class rgam containing the following components:

additive.predictors

the additive fit, the sum of the columns of the $smooth component

fitted.values

the fitted mean values, obtained by transforming the component 'additive.predictors' using the inverse link function
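The inverse links for the two supported families are available from base R family objects, so the relation between additive.predictors and fitted.values can be checked as below; eta here is an arbitrary example vector, not rgam output.

```r
eta <- c(-1, 0, 2)                     # example additive predictors
mu.pois  <- poisson()$linkinv(eta)     # inverse log link: exp(eta)
mu.binom <- binomial()$linkinv(eta)    # inverse logit link: 1 / (1 + exp(-eta))
mu.binom[2]                            # 0.5
```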

smooth

the matrix of smooth terms, columns correspond to the smooth predictors in the model

iterations

number of robust local scoring iterations used

convergence.criterion

last relative change of the additive predictors

converged

a logical value: TRUE if the algorithm stopped because the relative change between consecutive additive predictors fell below the tolerance given in the epsilon argument, and FALSE if it stopped because the maximum number of iterations (the max.it argument) was reached

alpha

the candidate bandwidth values that were considered

cv.method

a character string indicating the cross-validation method used to choose the bandwidth of the smoother

cv.results

a vector of the cross-validation criteria values obtained with each entry of the argument alpha

opt.alpha

the value in the argument alpha that produced the smallest cross-validation criterion. This is the bandwidth used for the reported fit.

Author(s)

Matias Salibian-Barrera matias@stat.ubc.ca and Davor Cubranic cubranic@stat.ubc.ca

References

Azadeh, A. and Salibian-Barrera, M. (2011). An outlier-robust fit for Generalized Additive Models with applications to disease outbreak detection. To appear in the Journal of the American Statistical Association.

Examples

library(rgam)
data(ili.visits)
x <- ili.visits$week
y <- ili.visits$visits
# add a small random jitter to the week values (e.g. to break ties in x)
set.seed(123)
x <- x + rnorm(length(x), mean = 0, sd = 0.01)
#
# the following command needs to run over 890 fits
# and takes about 22 mins on an Intel Xeon CPU (3.2GHz)
#
# a <- rgam(x = x, y = y, family = 'poisson', cv.method = 'rcv',
#   epsilon = 1e-5, alpha = 12:20/80, max.it = 500)
#
# the optimal value is found at alpha = 17/80
#
a <- rgam(x = x, y = y, family = 'poisson', cv.method = 'rcv',
  epsilon = 1e-7, alpha = 17/80, max.it = 500)

# fitted mean visits, drawn over the data in week order
pr.rgam.a <- predict(a, type = 'response')
plot(x, y, xlab = 'Week', ylab = 'ILI visits', pch = 19, col = 'grey75')
lines(x[order(x)], pr.rgam.a[order(x)], lwd = 3, col = 'red')

rgam documentation built on May 2, 2019, 11:26 a.m.