Function to perform BiSEE, a bi-level boosting / functional gradient descent / forward stagewise regression method for the grouped covariates setting using Generalized Estimating Equations.
bisee(y, ...)
## S3 method for class 'formula'
bisee(formula, data = list(), clusterID, waves = NULL,
lambda1, lambda2 = 1 - lambda1, contrasts = NULL, subset, ...)
## Default S3 method:
bisee(y, x, waves = NULL, lambda1, lambda2 = 1 - lambda1,
...)
## S3 method for class 'fit'
bisee(y, x, family, clusterID, waves = NULL, groupID,
  corstr = "independence", alpha = NULL, lambda1 = 0.5,
  lambda2 = 1 - lambda1, intercept = TRUE, offset = 0,
  control = sgee.control(maxIt = 200, epsilon = 0.05,
    stoppingThreshold = min(length(y), ncol(x)) - intercept,
    undoThreshold = 0.005),
  standardize = TRUE, verbose = FALSE, ...)
gsee(y, x, family, clusterID, waves = NULL, groupID = 1:ncol(x),
  corstr = "independence", alpha = NULL, offset = 0, intercept = TRUE,
  control = sgee.control(maxIt = 200, epsilon = 0.05,
    stoppingThreshold = min(length(y), ncol(x)) - intercept,
    undoThreshold = 0.005),
  standardize = TRUE, verbose = FALSE, ...)
y |
Vector of response measures that corresponds with modeling family
given in 'family' parameter. |
... |
Not currently used |
formula |
Object of class 'formula'; a symbolic description of the model to be fitted |
data |
Optional data frame containing the variables in the model. |
clusterID |
Vector of integers that identifies the clusters of response
measures in |
waves |
An integer vector which identifies components in clusters.
The length of |
lambda1 |
Mixing parameter used to indicate weight of $L_2$ Norm
(group selection). While not necessary, |
lambda2 |
Mixing parameter used to indicate weight of $L_1$ Norm
(individual selection). While not necessary, |
contrasts |
An optional list provided when using a formula.
similar to |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
x |
Design matrix of dimension |
family |
Modeling family that describes the marginal distribution of
the response. Assumed to be an object such as |
groupID |
Vector of integers that identifies the groups of the
covariates/coefficients (i.e. the columns of |
corstr |
A character string indicating the desired working correlation structure. The following are implemented: "independence" (default value), "exchangeable", and "ar1". |
alpha |
An initial guess for the correlation parameter value, between -1 and 1. If left NULL (the default), the initial estimate is 0. |
intercept |
Binary value indicating whether an intercept term is to be included in the model for estimation. Default is to include an intercept. |
offset |
Vector of offset value(s) for the linear predictor.
|
control |
A list of parameters used to control the path generation
process; see |
standardize |
A logical parameter that indicates whether or not
the covariates need to be standardized before fitting.
If standardized before fitting, the unstandardized
path is returned as the default, with a |
verbose |
Logical parameter indicating whether output should be produced while bisee is running. Default value is FALSE. |
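The index arguments above (clusterID, waves, groupID) are plain integer vectors. As a minimal sketch, the following builds them for the layout used in the examples on this page (50 clusters of 4 observations, 50 covariates in groups of 5). It assumes that observations are stored cluster by cluster and that each group occupies a contiguous block of columns of x; neither assumption is stated by the package documentation.

```r
## Hypothetical layout, mirroring the sizes used in the examples
numClusters <- 50
clusterSize <- 4
p <- 50
groupSize <- 5

## One cluster label per observation (assumes rows are sorted by cluster)
clusterID <- rep(seq_len(numClusters), each = clusterSize)

## Position of each observation within its cluster
waves <- rep(seq_len(clusterSize), times = numClusters)

## One group label per column of x (assumes groups are contiguous blocks)
groupID <- rep(seq_len(p / groupSize), each = groupSize)
```

With these sizes, clusterID and waves have one entry per row of x, and groupID has one entry per column of x.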
Function to implement BiSEE, a stagewise regression approach that is designed to perform bi-level selection in the context of Generalized Estimating Equations. Given a response y and a design matrix x (excluding intercept), BiSEE generates a path of stagewise regression estimates for each covariate based on the provided step size epsilon and the tuning parameters lambda1 and lambda2. When lambda1 == 0 or lambda2 == 0, the simplified versions of bisee, called see and gsee respectively, will be called.
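The special cases described above can be summarized in a few lines. The helper below, whichRoutine, is hypothetical and is not part of the package; it only restates which routine the text says applies for a given pair of mixing parameters, using the same default lambda2 = 1 - lambda1 as the usage section.

```r
## Hypothetical helper (not part of sgee): names the routine that the
## description above says is used for a given (lambda1, lambda2) pair.
whichRoutine <- function(lambda1, lambda2 = 1 - lambda1) {
  if (lambda1 == 0) {
    "see"     # no weight on the L2 (group) norm
  } else if (lambda2 == 0) {
    "gsee"    # no weight on the L1 (individual) norm
  } else {
    "bisee"   # both levels of selection are active
  }
}

routines <- vapply(c(0, 0.5, 1), whichRoutine, character(1))
```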
The resulting path can then be analyzed to determine an optimal model along the path of coefficient estimates. The summary.sgee function provides such functionality based on various possible metrics, primarily focused on the Mean Squared Error. Furthermore, the plot.sgee function can be used to examine the path of coefficient estimates versus the iteration number, or some desired penalty.
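To make the idea of a coefficient path concrete, here is a simplified sketch in base R. It is not the BiSEE algorithm and not what summary.sgee does internally: it runs plain forward stagewise regression for a Gaussian response with independent observations, ignores grouping and correlation, and then picks the iteration with the smallest in-sample Mean Squared Error. All data and variable names are made up for illustration.

```r
set.seed(1)
n <- 100
p <- 5
x <- scale(matrix(rnorm(n * p), n, p))   # standardized covariates
betaTrue <- c(2, 0, 1, 0, 0)             # made-up true coefficients
y <- drop(x %*% betaTrue) + rnorm(n)
y <- y - mean(y)                         # centre y, so no intercept is needed

epsilon <- 0.05                          # step size
maxIt <- 200                             # number of stagewise steps
path <- matrix(0, nrow = maxIt + 1, ncol = p)
beta <- rep(0, p)

for (it in seq_len(maxIt)) {
  ## Estimating function for a Gaussian model under independence
  grad <- drop(crossprod(x, y - x %*% beta))
  ## Move the single most promising coefficient by one small step
  j <- which.max(abs(grad))
  beta[j] <- beta[j] + epsilon * sign(grad[j])
  path[it + 1, ] <- beta
}

## In-sample MSE at every point of the path, and the best iteration
mse <- apply(path, 1, function(b) mean((y - x %*% b)^2))
best <- which.min(mse)
```

Each row of path holds the coefficient estimates after one more step. In practice a held-out set or a penalized criterion would usually replace the in-sample MSE used here.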
bisee makes use of the function uniroot in the stats package. The extendInt parameter for uniroot is used, which may cause issues for older versions of R.
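As a quick check that an installed R supports this, the sketch below calls uniroot with extendInt on a toy function whose root lies outside the starting interval. The function f is made up for illustration and has nothing to do with the equations bisee solves internally.

```r
## Toy increasing function with its root at x = 5, outside c(0, 1)
f <- function(x) x - 5

## extendInt = "yes" lets uniroot widen the interval until f changes sign;
## an R version without this argument would fail here with an
## "unused argument" error.
res <- uniroot(f, interval = c(0, 1), extendInt = "yes")
root <- res$root
```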
Object of class sgee containing the path of coefficient estimates, the path of scale estimates, the path of correlation parameter estimates, the iteration at which BiSEE terminated, and initial regression values including x, y, family, clusterID, groupID, offset, epsilon, and numIt.
Function to execute BiSEE technique. Note that lambda1 and lambda2 are tuning parameters. Though it is advised to fix lambda1 + lambda2 = 1, this is not necessary. These parameters can be tuned using various approaches including cross validation.
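When cross validation is used with clustered data, one common choice is to assign whole clusters to folds so that correlated observations are never split between training and validation sets. The sketch below shows only that fold assignment, in base R. The fold count K and all variable names are assumptions, and this is not a procedure provided by the package.

```r
set.seed(42)
clusterID <- rep(1:50, each = 4)   # same layout as in the examples
K <- 5                             # hypothetical number of folds

## Assign each cluster, not each observation, to one of K folds
ids <- unique(clusterID)
foldOfCluster <- sample(rep(seq_len(K), length.out = length(ids)))

## Expand to one fold label per observation
fold <- foldOfCluster[match(clusterID, ids)]
```

For each candidate value of lambda1, the model would then be fitted on the observations with fold != k and evaluated on those with fold == k, for k in 1, ..., K.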
Gregory Vaughan
Vaughan, G., Aseltine, R., Chen, K., and Yan, J. (2017). Stagewise Generalized Estimating Equations with Grouped Variables. Biometrics 73, 1332–1342. URL: http://dx.doi.org/10.1111/biom.12669, doi:10.1111/biom.12669.
Wolfson, J. (2011). EEBoost: A general method for prediction and variable selection based on estimating equations. Journal of the American Statistical Association 106, 296–305.
Tibshirani, R. J. (2015). A general framework for fast stagewise algorithms. Journal of Machine Learning Research 16, 2543–2588.
Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013). A sparse-group lasso. Journal of Computational and Graphical Statistics 22, 231–245.
library(sgee)

#####################
## Generate test data
#####################
## Initialize covariate values
p <- 50
beta <- c(rep(2, 5),
          c(1, 0, 1.5, 0, .5),
          rep(0.5, 5),
          rep(0, p - 15))
groupSize <- 5
numGroups <- length(beta) / groupSize

generatedData <- genData(numClusters = 50,
                         clusterSize = 4,
                         clusterRho = 0.6,
                         clusterCorstr = "exchangeable",
                         yVariance = 1,
                         xVariance = 1,
                         numGroups = numGroups,
                         groupSize = groupSize,
                         groupRho = 0.3,
                         beta = beta,
                         family = gaussian(),
                         intercept = 1)

## Perform Fitting by providing y and x values
coefMat1 <- bisee(y = generatedData$y, x = generatedData$x,
                  family = gaussian(),
                  clusterID = generatedData$clusterID,
                  groupID = generatedData$groupID,
                  corstr = "exchangeable",
                  control = sgee.control(maxIt = 50, epsilon = 0.5),
                  lambda1 = .5,
                  lambda2 = .5,
                  verbose = TRUE)

## Perform Fitting by providing formula and data
genDF <- data.frame(generatedData$y, generatedData$x)
names(genDF) <- c("Y", paste0("Cov", 1:p))
coefMat2 <- bisee(formula(genDF), data = genDF,
                  family = gaussian(),
                  subset = Y < 1.5,
                  waves = rep(1:4, 50),
                  clusterID = generatedData$clusterID,
                  groupID = generatedData$groupID,
                  corstr = "exchangeable",
                  control = sgee.control(maxIt = 50, epsilon = 0.5),
                  lambda1 = 0.5,
                  lambda2 = 0.5,
                  verbose = TRUE)

par(mfrow = c(2,1))
plot(coefMat1)
plot(coefMat2)