robustgam.CV: Smoothing parameter selection by robust cross validation

Description Usage Arguments Value Author(s) References See Also Examples

View source: R/robustgam.crossval.R

Description

This function combines the robustgam with automatic smoothing parameter selection. The smoothing parameter is selected through robust cross validation criterion described in Wong, Yao and Lee (2013). The criterion is designed to be robust to outliers. This function uses grid search to find the smoothing parameter that minimizes the criterion.

Usage

1
2
3
robustgam.CV(X, y, family, p=3, K=30, c=1.345, show.msg=FALSE, count.lim=200,
             w.count.lim=50, smooth.basis="tp", wx=FALSE, sp.min=1e-7, sp.max=1e-3,
             len=50, show.msg.2=TRUE, ngroup=length(y), seed=12345)

Arguments

X

a vector or a matrix (each covariate form a column) of covariates

y

a vector of responses

family

A family object specifying the distribution and the link function. See glm and family.

p

order of the basis. It depends on the option of smooth.basis.

K

number of knots of the basis; dependent on the option of smooth.basis.

c

tunning parameter for Huber function; a smaller value of c corresponds to a more robust fit. It is recommended to set as 1.2 and 1.6 for binomial and poisson distribution respectively.

show.msg

If show.msg=T, progress of robustgam is displayed.

count.lim

maximum number of iterations of the whole algorithm

w.count.lim

maximum number of updates on the weight. It corresponds to zeta in Wong, Yao and Lee (2013)

smooth.basis

the specification of basis. Four choices are available: "tp" = thin plate regression spline, "cr" = cubic regression spline, "ps" = P-splines, "tr" = truncated power spline. For more details, see smooth.construct.

wx

If wx=T, robust weight on the covariates are applied. For details, see Real Data Example in Wong, Yao and Lee (2013)

sp.min

A vector of minimum values of the searching range for smoothing parameters. If only one value is specified, it will be used for all smoothing parameters.

sp.max

A vector of maximum values of the searching range for smoothing parameters. If only one value is specified, it will be used for all smoothing parameters.

len

A vector of grid sizes. If only one value is specified, it will be used for all smoothing parameters.

show.msg.2

If show.msg.2=T, progress of the grid search is displayed.

ngroup

number of group used in the cross validation. If ngroup=length(y), full cross validation is implemented. If ngroup=2, two-fold cross validation is implemented.

seed

The seed for random generator used in generating partitions.

Value

fitted.values

fitted values (of the optimum fit)

initial.fitted

the starting values of the algorithm (of the optimum fit)

beta

estimated coefficients (corresponding to the basis) (of the optimum fit)

optim.index

the index of the optimum fit

optim.index2

the index of the optimum fit in another representation:

optim.ndex2=arrayInd(optim.index,len)

optim.criterion

the optimum value of robust cross validation criterion

optim.sp

the optimum value of the smoothing parameter

criteria

the values of criteria for all fits during grid search

sp

the grid of smoothing parameter

optim.fit

the robustgam fit object of the optimum fit. It is handy for applying the prediction method.

Author(s)

Raymond K. W. Wong <raymondkww.dev@gmail.com>

References

Raymond K. W. Wong, Fang Yao and Thomas C. M. Lee (2013) Robust Estimation for Generalized Additive Models. Journal of Graphical and Computational Statistics, to appear.

See Also

robustgam.GIC, robustgam.GIC.optim, robustgam.CV, pred.robustgam

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
# load library
library(robustgam)

# test function
test.fun <- function(x, ...) {
  return(2*sin(2*pi*(1-x)^2))
}

# some setting
set.seed(1234)
true.family <- poisson()
out.prop <- 0.05
n <- 100

# generating dataset for poisson case
x <- runif(n)
x <- x[order(x)]
true.eta <- test.fun(x)
true.mu <- true.family$linkinv(test.fun(x))
y <- rpois(n, true.mu) # for poisson case

# create outlier for poisson case
out.n <- trunc(n*out.prop)
out.list <- sample(1:n, out.n, replace=FALSE)
y[out.list] <- round(y[out.list]*runif(out.n,min=3,max=5)^(sample(c(-1,1),out.n,TRUE)))

## Not run: 

# robust GAM fit
robustfit.gic <- robustgam.CV(x, y, family=true.family, p=3, c=1.6, show.msg=FALSE,
  count.lim=400, smooth.basis='tp',ngroup=5); robustfit <- robustfit.gic$optim.fit


# ordinary GAM fit
nonrobustfit <- gam(y~s(x, bs="tp", m=3),family=true.family) # m = p for 'tp'

# prediction
x.new <- seq(range(x)[1], range(x)[2], len=1000)
robustfit.new <- pred.robustgam(robustfit, data.frame(X=x.new))$predict.values
nonrobustfit.new <- as.vector(predict.gam(nonrobustfit,data.frame(x=x.new),type="response"))

# plot
plot(x, y)
lines(x.new, true.family$linkinv(test.fun(x.new)), col="blue")
lines(x.new, robustfit.new, col="red")
lines(x.new, nonrobustfit.new, col="green")
legend(0.6, 23, c("true mu", "robust fit", "nonrobust fit"), col=c("blue","red","green"),
  lty=c(1,1,1))


## End(Not run)

Example output

Loading required package: Rcpp
Loading required package: RcppArmadillo
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.
Loading required package: robustbase
####  1 : sp= 1e-07  finished. ####
####  2 : sp= 1.206793e-07  finished. ####
####  3 : sp= 1.456348e-07  finished. ####
####  4 : sp= 1.757511e-07  finished. ####
####  5 : sp= 2.120951e-07  finished. ####
####  6 : sp= 2.559548e-07  finished. ####
####  7 : sp= 3.088844e-07  finished. ####
####  8 : sp= 3.727594e-07  finished. ####
####  9 : sp= 4.498433e-07  finished. ####
####  10 : sp= 5.428675e-07  finished. ####
####  11 : sp= 6.551286e-07  finished. ####
####  12 : sp= 7.906043e-07  finished. ####
####  13 : sp= 9.540955e-07  finished. ####
####  14 : sp= 1.151395e-06  finished. ####
####  15 : sp= 1.389495e-06  finished. ####
####  16 : sp= 1.676833e-06  finished. ####
####  17 : sp= 2.02359e-06  finished. ####
####  18 : sp= 2.442053e-06  finished. ####
####  19 : sp= 2.947052e-06  finished. ####
####  20 : sp= 3.55648e-06  finished. ####
####  21 : sp= 4.291934e-06  finished. ####
####  22 : sp= 5.179475e-06  finished. ####
####  23 : sp= 6.250552e-06  finished. ####
####  24 : sp= 7.54312e-06  finished. ####
####  25 : sp= 9.102982e-06  finished. ####
####  26 : sp= 1.098541e-05  finished. ####
####  27 : sp= 1.325711e-05  finished. ####
####  28 : sp= 1.599859e-05  finished. ####
####  29 : sp= 1.930698e-05  finished. ####
####  30 : sp= 2.329952e-05  finished. ####
####  31 : sp= 2.811769e-05  finished. ####
####  32 : sp= 3.393222e-05  finished. ####
####  33 : sp= 4.094915e-05  finished. ####
####  34 : sp= 4.941713e-05  finished. ####
####  35 : sp= 5.963623e-05  finished. ####
####  36 : sp= 7.196857e-05  finished. ####
####  37 : sp= 8.685114e-05  finished. ####
####  38 : sp= 0.0001048113  finished. ####
####  39 : sp= 0.0001264855  finished. ####
####  40 : sp= 0.0001526418  finished. ####
####  41 : sp= 0.000184207  finished. ####
####  42 : sp= 0.0002222996  finished. ####
####  43 : sp= 0.0002682696  finished. ####
####  44 : sp= 0.0003237458  finished. ####
####  45 : sp= 0.000390694  finished. ####
####  46 : sp= 0.0004714866  finished. ####
####  47 : sp= 0.0005689866  finished. ####
####  48 : sp= 0.0006866488  finished. ####
####  49 : sp= 0.0008286428  finished. ####
####  50 : sp= 0.001  finished. ####

robustgam documentation built on May 2, 2019, 3:23 a.m.