threeboost: Thresholded EEBoost


View source: R/threeboost.R

Description

Run the thresholded EEBoost procedure.

Usage

threeboost(Y, X, EE.fn, b.init = rep(0, ncol(X)), eps = 0.01,
  maxit = 1000, itertrack = FALSE, reportinterval = 1,
  stop.rule = "on.repeat", thresh = 1)

Arguments

Y

Vector of outcomes.

X

Matrix of predictors. Will be automatically scaled using the scale function.

EE.fn

Estimating function taking arguments Y, X, and parameter vector b.

b.init

Initial parameter values. For variable selection, typically start with a vector of zeroes (the default).

eps

Step length. Default is 0.01; the value should be relatively small.

maxit

Maximum number of iterations. Default is 1000.

itertrack

Indicates whether or not diagnostic information should be printed out at each iteration. Default is FALSE.

reportinterval

If itertrack is TRUE, how many iterations the algorithm should wait between each diagnostic report.

stop.rule

Rule for stopping the iterations before maxit is reached. Possible values are "on.repeat" and "pct.change". See 'Details' for more information.

thresh

Threshold parameter for ThrEEBoost.
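To illustrate the EE.fn argument described above, here is a minimal sketch of a user-written estimating function with the (Y, X, b) signature. The name EE.fn.ls is hypothetical, and both the return shape (one value per column of X) and the sign convention are assumptions made for illustration; the package documentation should be checked before passing such a function to threeboost.

```r
# Hypothetical least-squares estimating function with the (Y, X, b) signature.
# Assumption: the function returns one value per column of X.
EE.fn.ls <- function(Y, X, b) {
  resid <- as.vector(Y) - as.vector(X %*% b)  # residuals at the current b
  as.vector(crossprod(X, resid))              # t(X) %*% resid, length ncol(X)
}

set.seed(1)
X <- scale(matrix(rnorm(40 * 3), nrow = 40))
Y <- as.vector(X %*% c(1, 0, -1) + rnorm(40, sd = 0.1))

# Evaluate at the default starting value b.init = rep(0, ncol(X))
ee0 <- EE.fn.ls(Y, X, rep(0, ncol(X)))

# At the ordinary least-squares solution the function is numerically zero
b.ols <- solve(crossprod(X), crossprod(X, Y))
ee.ols <- EE.fn.ls(Y, X, as.vector(b.ols))
```

The second evaluation is a quick sanity check: an estimating function should vanish at the estimate it defines.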

Details

threeboost implements a thresholded version of the EEBoost algorithm described in Wolfson (2011, JASA). EEBoost is a general-purpose method for variable selection which can be applied whenever inference would be based on an estimating equation. The package currently implements variable selection based on the Generalized Estimating Equations, but can also accommodate user-provided estimating functions. Thresholded EEBoost is a generalization which allows multiple variables to enter the model at each boosting step. Thresholded EEBoost with thresholding parameter = 1 is equivalent to EEBoost.
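The idea of letting several variables enter per step can be sketched as follows. This is not the package source: the function thr_step is hypothetical, and the selection rule (update every coordinate whose absolute estimating-function value is at least thresh times the largest one) and the direction of the update are assumptions chosen to be consistent with the statement that a threshold of 1 reduces to EEBoost.

```r
# One hypothetical thresholded boosting step (a sketch, not the package source).
# b:      current coefficient vector
# g:      estimating-function values at the current b
# thresh: value in (0, 1]; thresh = 1 moves only the largest-magnitude coordinate(s)
thr_step <- function(b, g, eps = 0.01, thresh = 1) {
  active <- abs(g) >= thresh * max(abs(g))  # coordinates passing the threshold
  # Assumed update direction: step of length eps along the sign of g
  b[active] <- b[active] + eps * sign(g[active])
  b
}

g  <- c(4, -3.5, 0.2, 1)
b0 <- rep(0, 4)
b.ee  <- thr_step(b0, g, thresh = 1)    # only coordinate 1 moves
b.thr <- thr_step(b0, g, thresh = 0.8)  # coordinates 1 and 2 move
```

Lowering thresh in this sketch widens the set of coordinates updated in a single step, which is the sense in which the thresholded procedure generalizes EEBoost.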

Typically, the boosting procedure is run for maxit iterations, producing maxit models, each defined by a set of regression coefficients. An additional step (e.g. model scoring or a cross-validated estimate of prediction error) is needed to select a final model. Alternatively, the iterations can be stopped before maxit is reached by setting stop.rule to one of its possible values, "on.repeat" or "pct.change".

Value

A matrix with maxit rows and ncol(X) columns, with each row containing the parameter vector from an iteration of ThrEEBoost.
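Because each row of the returned matrix is a candidate model, a final model is usually picked by scoring the rows. The sketch below shows one way to do this with a held-out validation set. The coefficient path here is a toy stand-in built by hand (it is not threeboost output), and the names valid, val.mse, and best.it are hypothetical.

```r
set.seed(2)
n <- 60; p <- 4
X <- scale(matrix(rnorm(n * p), nrow = n))
Y <- as.vector(X %*% c(1, 0.5, 0, 0) + rnorm(n, sd = 0.5))
valid <- 41:60  # rows held out for validation

# Toy stand-in for the returned matrix: one row per iteration, one column per
# predictor. In practice this would come from a fit on the non-held-out rows.
maxit <- 50
b.end <- c(1.2, 0.6, 0.3, -0.3)  # deliberately overfit end point
coef.mat <- outer(seq_len(maxit) / maxit, b.end)

# Validation mean squared error of the model stored in each row
val.mse <- apply(coef.mat, 1, function(b) {
  mean((Y[valid] - X[valid, , drop = FALSE] %*% b)^2)
})

best.it <- which.min(val.mse)   # iteration with the smallest validation error
b.final <- coef.mat[best.it, ]  # coefficients of the selected model
```

Cross-validation or a model-scoring criterion could replace the single hold-out split; the row-wise scoring pattern stays the same.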

See Also

geeboost for an example of how to call (Thr)EEBoost with a custom estimating function.

Wolfson, J. EEBoost: A general method for prediction and variable selection using estimating equations. Journal of the American Statistical Association, 2011.

Examples

library(Matrix)

# Generate some test data - uses 'mvtnorm' package
n <- 30
n.var <- 50
clust.size <- 4
B <- c(rep(2,5),rep(0.2,5),rep(0.05,10),rep(0,n.var-20))
mn.X <- rep(0,n.var)
sd.X <- 0.5
rho.X <- 0.3
cov.sig.X <- sd.X^2*((1-rho.X)*diag(rep(1,10)) + rho.X*matrix(data=1,nrow=10,ncol=10))
sig.X <- as.matrix( Matrix::bdiag(lapply(1:(n.var/10),function(x) { cov.sig.X } ) ) )
sd.Y <- 0.5
rho.Y <- 0.3
indiv.Sig <- sd.Y^2*( (1-rho.Y)*diag(rep(1,4)) + rho.Y*matrix(data=1,nrow=4,ncol=4) )
sig.list <- vector("list", length = n)  # pre-allocate an empty list with n slots
for(i in 1:n) { sig.list[[i]] <- indiv.Sig }
Sig <- Matrix::bdiag(sig.list)
indiv.index <- rep(1:n,each=clust.size)
sig.Y <- as.matrix(Sig)
if(require(mvtnorm)) {
  X <- mvtnorm::rmvnorm(n*clust.size,mean=mn.X,sigma=sig.X)
  mn.Y <- X %*% B
  ## Correlated continuous outcome
  Y <- mvtnorm::rmvnorm(1,mean=mn.Y,sigma=sig.Y)
} else { stop('Need mvtnorm package to generate correlated example data.') }

## Define the Gaussian GEE estimating function with independence working correlation
mu.Lin <- function(eta){eta}
g.Lin <- function(m){m}
v.Lin <- function(eta){rep(1,length(eta))}

EE.fn.ind <- function(Y,X,b) {
  ee.GEE(Y,X,b,
         mu.Y=mu.Lin,
         g.Y=g.Lin,
         v.Y=v.Lin,
         aux=function(...) { ee.GEE.aux(...,mu.Y=mu.Lin,g.Y=g.Lin,v.Y=v.Lin) },
         id=indiv.index,
         corstr="ind")
}

## These two give the same result
coef.mat <- threeboost(Y,X,EE.fn.ind,maxit=250)  # default thresh = 1, i.e. EEBoost
coef.mat2 <- geeboost(Y,X,id=indiv.index,family="gaussian",corstr="ind",maxit=250)$coefmat

par(mfrow=c(1,2))
coef_traceplot(coef.mat)
coef_traceplot(coef.mat2)

threeboost documentation built on May 2, 2019, 2:37 a.m.