PGEE: Function to fit penalized generalized estimating equations
In PGEE: Penalized Generalized Estimating Equations in High-Dimension

Description Usage Arguments Value References See Also Examples

This function fits a penalized generalized estimating equation model to longitudinal data.

PGEE(formula, id, data, na.action = NULL, family = gaussian(link = "identity"), 
corstr = "independence", Mv = NULL, beta_int = NULL, R = NULL, scale.fix = TRUE, 
scale.value = 1, lambda, pindex = NULL, eps = 10^-6, maxiter = 30, tol = 10^-3, 
silent = TRUE)

`formula`	A formula expression in the form of `response ~ predictors`.
`id`	A vector for identifying subjects/clusters.
`data`	A data frame which stores the variables in `formula` with `id` variable.
`na.action`	A function to remove missing values from the data. Only `na.omit` is allowed here.
`family`	A `family` object: a list of functions and expressions for defining `link` and `variance` functions. Families supported in `PGEE` are `binomial`, `gaussian`, `gamma` and `poisson`. The `links`, which are not available in `gee`, is not available here. The default family is `gaussian`.
`corstr`	A character string, which specifies the type of correlation structure. Structures supported in `PGEE` are `"AR-1"`,`"exchangeable"`, `"fixed"`, `"independence"`, `"stat_M_dep"`,`"non_stat_M_dep"`, and `"unstructured"`. The default `corstr` type is `"independence"`.
`Mv`	If either `"stat_M_dep"`, or `"non_stat_M_dep"` is specified in `corstr`, then this assigns a numeric value for `Mv`. Otherwise, the default value is `NULL`.
`beta_int`	User specified initial values for regression parameters. The default value is `NULL`.
`R`	If `corstr = "fixed"` is specified, then `R` is a square matrix of dimension maximum cluster size containing the user specified correlation. Otherwise, the default value is `NULL`.
`scale.fix`	A logical variable; if true, the scale parameter is fixed at the value of `scale.value`. The default value is `TRUE`.
`scale.value`	If `scale.fix = TRUE`, this assignes a numeric value to which the scale parameter should be fixed. The default value is 1.
`lambda`	A numerical value for the penalization parameter of the scad function, which is estimated via cross-validation.
`pindex`	An index vector showing the parameters which are not subject to penalization. The default value is `NULL`. However, in case of a model with intercept, the intercept parameter should be never penalized.
`eps`	A numerical value for the epsilon used in minorization-maximization algorithm. The default value is `10^-6`.
`maxiter`	The number of iterations that is used in the estimation algorithm. The default value is `25`.
`tol`	The tolerance level that is used in the estimation algorithm. The default value is `10^-3`.
`silent`	A logical variable; if false, the regression parameter estimates at each iteration are printed. The default value is `TRUE`.

An object class of PGEE representing the fit.

Wang, L., Zhou, J., and Qu, A. (2012). Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics, 68, 353–360.

CVfit, MGEE

# Consider an example similar to example 1 
# in Wang et al. (2012).

# required R package 
library(mvtnorm)
# number of subjects
n <- 200
# number of covariates 
pn <- 10
# number of time points
m <- 4

# vector if subject ids
id.vect <- rep(1:n, each = m) 

# covariance matrix of (pn-1) number of continuous covariates 
X.sigma <- matrix(0,(pn-1),(pn-1))
{
for (i in 1:(pn-1))
X.sigma[i,] <- 0.5^(abs((1:(pn-1))-i))  
}

# generate matrix of covariates    
x.mat <- as.matrix(rmvnorm(n*m, mean = rep(0,(pn-1)), X.sigma))
x.mat <- cbind(rbinom(n*m,1, 0.5), x.mat)

# true values
beta.true <- c(2,3,1.5,2,rep(0,6))
sigma2 <- 1
rho <- 0.5
R <- matrix(rho,m,m)+diag(rep(1-rho,m))

# covariance matrix of error
SIGMA <- sigma2*R
error <- rmvnorm(n, mean = rep(0,m),SIGMA)

# generate longitudinal data with continuous outcomes
y.temp <- x.mat%*%beta.true
y.vect <- y.temp+as.vector(t(error))

mydata <- data.frame(id.vect,y.vect,x.mat) 
colnames(mydata) <- c("id","y",paste("x",1:length(beta.true),sep = ""))

###Input Arguments for CVfit fitting###
library(PGEE)
formula <- "y ~.-id-1"
data <- mydata
family <- gaussian(link = "identity")
lambda.vec <- seq(0.1,1,0.1)

## Not run: 
cv <- CVfit(formula = formula, id = id, data = data, family = family,
fold = 4, lambda.vec = lambda.vec, pindex = NULL, eps = 10^-6, maxiter = 30, 
tol = 10^-3)

names(cv)
cv$lam.opt

## End(Not run)

lambda <- 0.1 #this value obtained through CVfit

# analyze the data through penalized generalized estimating equations

myfit1 <- PGEE(formula = formula, id = id, data = data, na.action = NULL, 
family = family, corstr = "exchangeable", Mv = NULL, 
beta_int = c(rep(0,length(beta.true))), R = NULL, scale.fix = TRUE, 
scale.value = 1, lambda = lambda, pindex = NULL, eps = 10^-6, maxiter = 30, 
tol = 10^-3, silent = TRUE)

summary(myfit1)

# analyze the data through unpenalized generalized estimating equations

myfit2 <- MGEE(formula = formula, id = id, data = data, na.action = NULL, 
family = family, corstr = "exchangeable", Mv = NULL, 
beta_int = c(rep(0,length(beta.true))), R = NULL, scale.fix = TRUE, 
scale.value = 1, maxiter = 30, tol = 10^-3, silent = TRUE)

summary(myfit2)

Loading required package: MASS
Loading required package: mvtnorm
[1] "fold"     "lam.vect" "cv.vect"  "lam.opt"  "cv.min"   "call"    
[1] 0.1

 PGEE: PENALIZED GENERALIZED ESTIMATING EQUATIONS FOR LONGITUDINAL DATA
 Version: 1.5 

Model:
 Link:                      Identity 
 Variance to Mean Relation: Gaussian 
 Correlation Structure:     Exchangeable 

Call:
PGEE(formula = formula, id = id, data = data, na.action = NULL, 
    family = family, corstr = "exchangeable", Mv = NULL, beta_int = c(rep(0, 
        length(beta.true))), R = NULL, scale.fix = TRUE, scale.value = 1, 
    lambda = lambda, pindex = NULL, eps = 10^-6, maxiter = 30, 
    tol = 10^-3, silent = TRUE)

Summary of Residuals:
        Min          1Q      Median          3Q         Max 
-18.9425216  -2.5521524   0.7964925   4.6966820  13.9160375 


Coefficients:
         Estimate   Naive S.E.       Naive z  Robust S.E.    Robust z
x1  -6.245202e-06 0.0006018853 -0.0103760674 1.104417e-05 -0.56547497
x2  -4.627128e-05 0.0015378055 -0.0300891614 2.210173e-04 -0.20935586
x3  -4.294162e-05 0.0014826252 -0.0289632320 1.931507e-04 -0.22232184
x4  -4.035857e-05 0.0014384216 -0.0280575380 1.928112e-04 -0.20931647
x5  -1.624205e-05 0.0009285880 -0.0174911271 5.611839e-05 -0.28942473
x6  -6.321007e-06 0.0006050500 -0.0104470823 2.190199e-05 -0.28860430
x7  -1.463376e-06 0.0003509603 -0.0041696357 8.016490e-06 -0.18254579
x8   1.176508e-06 0.0003298918  0.0035663463 6.367082e-06  0.18477983
x9  -2.466263e-07 0.0002496646 -0.0009878307 3.429643e-06 -0.07191021
x10  3.118394e-07 0.0002561115  0.0012175921 3.731282e-06  0.08357433

Estimated Scale Parameter:  1
Lambda value:  0.1
Number of Iterations:  1

Working Correlation
         [,1]     [,2]     [,3]     [,4]
[1,] 1.000000 3.170369 3.170369 3.170369
[2,] 3.170369 1.000000 3.170369 3.170369
[3,] 3.170369 3.170369 1.000000 3.170369
[4,] 3.170369 3.170369 3.170369 1.000000

 MGEE: GENERALIZED ESTIMATING EQUATIONS FOR LONGITUDINAL DATA
 Version: 1.5 

Model:
 Link:                      Identity 
 Variance to Mean Relation: Gaussian 
 Correlation Structure:     Exchangeable 

Call:
MGEE(formula = formula, id = id, data = data, na.action = NULL, 
    family = family, corstr = "exchangeable", Mv = NULL, beta_int = c(rep(0, 
        length(beta.true))), R = NULL, scale.fix = TRUE, scale.value = 1, 
    maxiter = 30, tol = 10^-3, silent = TRUE)

Summary of Residuals:
        Min          1Q      Median          3Q         Max 
-3.16726037 -0.65312091 -0.01575528  0.58413091  2.56392373 


Coefficients:
        Estimate Naive S.E.    Naive z Robust S.E.   Robust z
x1   2.056580168 0.05274902 38.9880259  0.04913712 41.8539002
x2   3.033784586 0.03625679 83.6749382  0.03418611 88.7431836
x3   1.460362349 0.04022637 36.3036073  0.03785158 38.5812736
x4   2.029099874 0.03925276 51.6931724  0.03470327 58.4699954
x5  -0.039044223 0.04313002 -0.9052680  0.03649799 -1.0697636
x6   0.020445664 0.04079691  0.5011571  0.03690933  0.5539430
x7  -0.025699503 0.03978206 -0.6460074  0.03796402 -0.6769437
x8   0.023445883 0.04092588  0.5728864  0.03310780  0.7081679
x9   0.004504639 0.03925565  0.1147514  0.03449533  0.1305869
x10 -0.015665129 0.03516755 -0.4454427  0.02857788 -0.5481558

Estimated Scale Parameter:  1
Number of Iterations:  4

Working Correlation
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.3824306 0.3824306 0.3824306
[2,] 0.3824306 1.0000000 0.3824306 0.3824306
[3,] 0.3824306 0.3824306 1.0000000 0.3824306
[4,] 0.3824306 0.3824306 0.3824306 1.0000000