Run the thresholded EEBoost procedure.
threeboost(Y, X, EE.fn, b.init = rep(0, ncol(X)), eps = 0.01,
  maxit = 1000, itertrack = FALSE, reportinterval = 1,
  stop.rule = "on.repeat", thresh = 1)
Y |
Vector of outcomes. |
X |
Matrix of predictors. Will be automatically scaled using the |
EE.fn |
Estimating function taking arguments |
b.init |
Initial parameter values. For variable selection, typically start with a vector of zeroes (the default). |
eps |
Step length. Default is 0.01, value should be relatively small. |
maxit |
Maximum number of iterations. Default is 1000. |
itertrack |
Indicates whether or not diagnostic information should be printed out at each iteration. Default is |
reportinterval |
If |
stop.rule |
Rule for stopping the iterations before |
thresh |
Threshold parameter for ThrEEBoost. |
threeboost
Implements a thresholded version of the EEBoost algorithm described in Wolfson (2011, JASA).
EEBoost is a general-purpose method for variable selection which can be applied whenever inference would be based on an estimating equation.
The package currently implements variable selection based on the Generalized Estimating Equations, but can also accommodate
user-provided estimating functions. Thresholded EEBoost is a generalization which allows multiple variables to enter the model at each boosting step.
Thresholded EEBoost with thresholding parameter = 1 is equivalent to EEBoost.
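The thresholding idea can be illustrated with a small sketch. This is not the package's internal code: the helper name thr_step, the selection rule (update every coordinate whose absolute estimating-equation value is at least thresh times the largest one), and the sign convention are assumptions made for illustration. Under these assumptions, thresh = 1 moves only the largest coordinate, as in EEBoost.

```r
# Hypothetical helper: one thresholded boosting step (illustrative only).
# b      : current coefficient vector
# g      : estimating-function values at b (one per coefficient)
# eps    : step length
# thresh : threshold in (0, 1]; 1 updates only the largest coordinate
thr_step <- function(b, g, eps = 0.01, thresh = 1) {
  # Select coordinates whose magnitude is within a factor `thresh` of the maximum
  active <- abs(g) >= thresh * max(abs(g))
  # Take a small step of size eps in the direction of the sign of g (assumed convention)
  b[active] <- b[active] + eps * sign(g[active])
  b
}

g <- c(0.9, -1.0, 0.2)  # made-up estimating-function values
b <- rep(0, 3)

step.ee  <- thr_step(b, g, thresh = 1)    # only coordinate 2 moves
step.thr <- thr_step(b, g, thresh = 0.8)  # coordinates 1 and 2 move
```

Lowering thresh lets more variables enter per step, which is the generalization described above.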
Typically, the boosting procedure is run for maxit iterations, producing maxit models, each defined by a set of regression coefficients.
An additional step (e.g. model scoring or a cross-validated estimate of prediction error) is needed to select a final model. An alternative is to stop the iterations before maxit is reached. The user can request this feature by setting stop.rule to one of the following options:
"on.repeat"
: Sometimes ThrEEBoost alternates between stepping in the same two directions, which usually indicates numerical problems. Setting stop.rule="on.repeat"
will terminate the algorithm if this happens.
"pct.change"
: Stop if, for consecutive iterations, the sum of the magnitudes of the elements of the estimating equation changes by less than 1%.
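As a rough illustration of the "pct.change" rule, the check below compares the sum of absolute estimating-equation values between two iterations. The function name pct_change_stop, its arguments, and the use of a relative change are assumptions for this sketch; the package's actual test may differ in detail.

```r
# Hypothetical check, not the package's implementation.
# ee.prev, ee.curr : estimating-function values at two consecutive iterations
# tol              : relative-change tolerance (0.01 corresponds to 1%)
pct_change_stop <- function(ee.prev, ee.curr, tol = 0.01) {
  s.prev <- sum(abs(ee.prev))
  s.curr <- sum(abs(ee.curr))
  # TRUE when the summed magnitude moved by less than tol, relative to the previous value
  abs(s.curr - s.prev) / s.prev < tol
}

stalled <- pct_change_stop(c(1, -1), c(1.001, -1.001))  # change of 0.1%
moving  <- pct_change_stop(c(1, -1), c(2, -2))          # change of 100%
```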
A matrix with maxit rows and ncol(X) columns, with each row containing the parameter vector from an iteration of ThrEEBoost.
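Because each row of the returned matrix is a candidate model, a final model still has to be chosen. One possible approach, sketched here under stated assumptions, is to score every row on held-out data and keep the row with the smallest mean squared prediction error. The objects coef.path, X.val and Y.val are made-up stand-ins (coef.path plays the role of the returned matrix), and a linear mean model is assumed.

```r
set.seed(1)

# Made-up validation data: only the first predictor matters
n.val <- 40
p <- 3
X.val <- matrix(rnorm(n.val * p), n.val, p)
Y.val <- drop(X.val %*% c(1, 0, 0)) + rnorm(n.val, sd = 0.1)

# Stand-in for the coefficient path: one row per boosting iteration
coef.path <- rbind(c(0, 0, 0),
                   c(0.5, 0, 0),
                   c(1, 0, 0),
                   c(1, 0.5, 0.5))

# Predictions for every iteration at once: n.val rows, one column per iteration
pred <- X.val %*% t(coef.path)

# Mean squared prediction error of each iteration's model
mse <- colMeans((Y.val - pred)^2)

# Keep the coefficients from the iteration with the smallest error
best.iter <- which.min(mse)
b.final <- coef.path[best.iter, ]
```

With clustered data, the held-out set would normally be split by cluster rather than by row, so that correlated observations do not straddle the training and validation sets.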
geeboost for an example of how to call (Thr)EEBoost with a custom estimating function.
Wolfson, J. EEBoost: A general method for prediction and variable selection using estimating equations. Journal of the American Statistical Association, 2011.
library(Matrix)
# Generate some test data - uses 'mvtnorm' package
n <- 30
n.var <- 50
clust.size <- 4
B <- c(rep(2,5),rep(0.2,5),rep(0.05,10),rep(0,n.var-20))
mn.X <- rep(0,n.var)
sd.X <- 0.5
rho.X <- 0.3
cov.sig.X <- sd.X^2*((1-rho.X)*diag(rep(1,10)) + rho.X*matrix(data=1,nrow=10,ncol=10))
sig.X <- as.matrix( Matrix::bdiag(lapply(1:(n.var/10),function(x) { cov.sig.X } ) ) )
sd.Y <- 0.5
rho.Y <- 0.3
indiv.Sig <- sd.Y^2*( (1-rho.Y)*diag(rep(1,4)) + rho.Y*matrix(data=1,nrow=4,ncol=4) )
sig.list <- vector("list", length=n)  # preallocate an empty list with n slots
for(i in 1:n) { sig.list[[i]] <- indiv.Sig }
Sig <- Matrix::bdiag(sig.list)
indiv.index <- rep(1:n,each=clust.size)
sig.Y <- as.matrix(Sig)
if(require(mvtnorm)) {
X <- mvtnorm::rmvnorm(n*clust.size,mean=mn.X,sigma=sig.X)
mn.Y <- X %*% B
## Correlated continuous outcome
Y <- mvtnorm::rmvnorm(1,mean=mn.Y,sigma=sig.Y)
} else { stop('Need mvtnorm package to generate correlated example data.') }
## Define the Gaussian GEE estimating function with independence working correlation
mu.Lin <- function(eta){eta}
g.Lin <- function(m){m}
v.Lin <- function(eta){rep(1,length(eta))}
EE.fn.ind <- function(Y,X,b) {
ee.GEE(Y,X,b,
mu.Y=mu.Lin,
g.Y=g.Lin,
v.Y=v.Lin,
aux=function(...) { ee.GEE.aux(...,mu.Y=mu.Lin,g.Y=g.Lin,v.Y=v.Lin) },
id=indiv.index,
corstr="ind")
}
## These two give the same result
coef.mat <- threeboost(Y,X,EE.fn.ind,maxit=250)
coef.mat2 <- geeboost(Y,X,id=indiv.index,family="gaussian",corstr="ind",maxit=250)$coefmat
par(mfrow=c(1,2))
coef_traceplot(coef.mat)
coef_traceplot(coef.mat2)