PGEE: Function to fit penalized generalized estimating equations

Description Usage Arguments Value References See Also Examples

Description

This function fits a penalized generalized estimating equation model to longitudinal data.

Usage

1
2
3
4
PGEE(formula, id, data, na.action = NULL, family = gaussian(link = "identity"), 
corstr = "independence", Mv = NULL, beta_int = NULL, R = NULL, scale.fix = TRUE, 
scale.value = 1, lambda, pindex = NULL, eps = 10^-6, maxiter = 30, tol = 10^-3, 
silent = TRUE)

Arguments

formula

A formula expression in the form of response ~ predictors.

id

A vector for identifying subjects/clusters.

data

A data frame which stores the variables in formula with id variable.

na.action

A function to remove missing values from the data. Only na.omit is allowed here.

family

A family object: a list of functions and expressions for defining link and variance functions. Families supported in PGEE are binomial, gaussian, gamma and poisson. The links, which are not available in gee, is not available here. The default family is gaussian.

corstr

A character string, which specifies the type of correlation structure. Structures supported in PGEE are "AR-1","exchangeable", "fixed", "independence", "stat_M_dep","non_stat_M_dep", and "unstructured". The default corstr type is "independence".

Mv

If either "stat_M_dep", or "non_stat_M_dep" is specified in corstr, then this assigns a numeric value for Mv. Otherwise, the default value is NULL.

beta_int

User specified initial values for regression parameters. The default value is NULL.

R

If corstr = "fixed" is specified, then R is a square matrix of dimension maximum cluster size containing the user specified correlation. Otherwise, the default value is NULL.

scale.fix

A logical variable; if true, the scale parameter is fixed at the value of scale.value. The default value is TRUE.

scale.value

If scale.fix = TRUE, this assignes a numeric value to which the scale parameter should be fixed. The default value is 1.

lambda

A numerical value for the penalization parameter of the scad function, which is estimated via cross-validation.

pindex

An index vector showing the parameters which are not subject to penalization. The default value is NULL. However, in case of a model with intercept, the intercept parameter should be never penalized.

eps

A numerical value for the epsilon used in minorization-maximization algorithm. The default value is 10^-6.

maxiter

The number of iterations that is used in the estimation algorithm. The default value is 25.

tol

The tolerance level that is used in the estimation algorithm. The default value is 10^-3.

silent

A logical variable; if false, the regression parameter estimates at each iteration are printed. The default value is TRUE.

Value

An object class of PGEE representing the fit.

References

Wang, L., Zhou, J., and Qu, A. (2012). Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics, 68, 353–360.

See Also

CVfit, MGEE

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
# Consider an example similar to example 1 
# in Wang et al. (2012).

# required R package 
library(mvtnorm)
# number of subjects
n <- 200
# number of covariates 
pn <- 10
# number of time points
m <- 4

# vector if subject ids
id.vect <- rep(1:n, each = m) 

# covariance matrix of (pn-1) number of continuous covariates 
X.sigma <- matrix(0,(pn-1),(pn-1))
{
for (i in 1:(pn-1))
X.sigma[i,] <- 0.5^(abs((1:(pn-1))-i))  
}

# generate matrix of covariates    
x.mat <- as.matrix(rmvnorm(n*m, mean = rep(0,(pn-1)), X.sigma))
x.mat <- cbind(rbinom(n*m,1, 0.5), x.mat)

# true values
beta.true <- c(2,3,1.5,2,rep(0,6))
sigma2 <- 1
rho <- 0.5
R <- matrix(rho,m,m)+diag(rep(1-rho,m))

# covariance matrix of error
SIGMA <- sigma2*R
error <- rmvnorm(n, mean = rep(0,m),SIGMA)

# generate longitudinal data with continuous outcomes
y.temp <- x.mat%*%beta.true
y.vect <- y.temp+as.vector(t(error))

mydata <- data.frame(id.vect,y.vect,x.mat) 
colnames(mydata) <- c("id","y",paste("x",1:length(beta.true),sep = ""))

###Input Arguments for CVfit fitting###
library(PGEE)
formula <- "y ~.-id-1"
data <- mydata
family <- gaussian(link = "identity")
lambda.vec <- seq(0.1,1,0.1)

## Not run: 
cv <- CVfit(formula = formula, id = id, data = data, family = family,
fold = 4, lambda.vec = lambda.vec, pindex = NULL, eps = 10^-6, maxiter = 30, 
tol = 10^-3)

names(cv)
cv$lam.opt

## End(Not run)

lambda <- 0.1 #this value obtained through CVfit

# analyze the data through penalized generalized estimating equations

myfit1 <- PGEE(formula = formula, id = id, data = data, na.action = NULL, 
family = family, corstr = "exchangeable", Mv = NULL, 
beta_int = c(rep(0,length(beta.true))), R = NULL, scale.fix = TRUE, 
scale.value = 1, lambda = lambda, pindex = NULL, eps = 10^-6, maxiter = 30, 
tol = 10^-3, silent = TRUE)

summary(myfit1)

# analyze the data through unpenalized generalized estimating equations

myfit2 <- MGEE(formula = formula, id = id, data = data, na.action = NULL, 
family = family, corstr = "exchangeable", Mv = NULL, 
beta_int = c(rep(0,length(beta.true))), R = NULL, scale.fix = TRUE, 
scale.value = 1, maxiter = 30, tol = 10^-3, silent = TRUE)

summary(myfit2)

Example output

Loading required package: MASS
Loading required package: mvtnorm
[1] "fold"     "lam.vect" "cv.vect"  "lam.opt"  "cv.min"   "call"    
[1] 0.1

 PGEE: PENALIZED GENERALIZED ESTIMATING EQUATIONS FOR LONGITUDINAL DATA
 Version: 1.5 

Model:
 Link:                      Identity 
 Variance to Mean Relation: Gaussian 
 Correlation Structure:     Exchangeable 

Call:
PGEE(formula = formula, id = id, data = data, na.action = NULL, 
    family = family, corstr = "exchangeable", Mv = NULL, beta_int = c(rep(0, 
        length(beta.true))), R = NULL, scale.fix = TRUE, scale.value = 1, 
    lambda = lambda, pindex = NULL, eps = 10^-6, maxiter = 30, 
    tol = 10^-3, silent = TRUE)

Summary of Residuals:
        Min          1Q      Median          3Q         Max 
-18.9425216  -2.5521524   0.7964925   4.6966820  13.9160375 


Coefficients:
         Estimate   Naive S.E.       Naive z  Robust S.E.    Robust z
x1  -6.245202e-06 0.0006018853 -0.0103760674 1.104417e-05 -0.56547497
x2  -4.627128e-05 0.0015378055 -0.0300891614 2.210173e-04 -0.20935586
x3  -4.294162e-05 0.0014826252 -0.0289632320 1.931507e-04 -0.22232184
x4  -4.035857e-05 0.0014384216 -0.0280575380 1.928112e-04 -0.20931647
x5  -1.624205e-05 0.0009285880 -0.0174911271 5.611839e-05 -0.28942473
x6  -6.321007e-06 0.0006050500 -0.0104470823 2.190199e-05 -0.28860430
x7  -1.463376e-06 0.0003509603 -0.0041696357 8.016490e-06 -0.18254579
x8   1.176508e-06 0.0003298918  0.0035663463 6.367082e-06  0.18477983
x9  -2.466263e-07 0.0002496646 -0.0009878307 3.429643e-06 -0.07191021
x10  3.118394e-07 0.0002561115  0.0012175921 3.731282e-06  0.08357433

Estimated Scale Parameter:  1
Lambda value:  0.1
Number of Iterations:  1

Working Correlation
         [,1]     [,2]     [,3]     [,4]
[1,] 1.000000 3.170369 3.170369 3.170369
[2,] 3.170369 1.000000 3.170369 3.170369
[3,] 3.170369 3.170369 1.000000 3.170369
[4,] 3.170369 3.170369 3.170369 1.000000

 MGEE: GENERALIZED ESTIMATING EQUATIONS FOR LONGITUDINAL DATA
 Version: 1.5 

Model:
 Link:                      Identity 
 Variance to Mean Relation: Gaussian 
 Correlation Structure:     Exchangeable 

Call:
MGEE(formula = formula, id = id, data = data, na.action = NULL, 
    family = family, corstr = "exchangeable", Mv = NULL, beta_int = c(rep(0, 
        length(beta.true))), R = NULL, scale.fix = TRUE, scale.value = 1, 
    maxiter = 30, tol = 10^-3, silent = TRUE)

Summary of Residuals:
        Min          1Q      Median          3Q         Max 
-3.16726037 -0.65312091 -0.01575528  0.58413091  2.56392373 


Coefficients:
        Estimate Naive S.E.    Naive z Robust S.E.   Robust z
x1   2.056580168 0.05274902 38.9880259  0.04913712 41.8539002
x2   3.033784586 0.03625679 83.6749382  0.03418611 88.7431836
x3   1.460362349 0.04022637 36.3036073  0.03785158 38.5812736
x4   2.029099874 0.03925276 51.6931724  0.03470327 58.4699954
x5  -0.039044223 0.04313002 -0.9052680  0.03649799 -1.0697636
x6   0.020445664 0.04079691  0.5011571  0.03690933  0.5539430
x7  -0.025699503 0.03978206 -0.6460074  0.03796402 -0.6769437
x8   0.023445883 0.04092588  0.5728864  0.03310780  0.7081679
x9   0.004504639 0.03925565  0.1147514  0.03449533  0.1305869
x10 -0.015665129 0.03516755 -0.4454427  0.02857788 -0.5481558

Estimated Scale Parameter:  1
Number of Iterations:  4

Working Correlation
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.3824306 0.3824306 0.3824306
[2,] 0.3824306 1.0000000 0.3824306 0.3824306
[3,] 0.3824306 0.3824306 1.0000000 0.3824306
[4,] 0.3824306 0.3824306 0.3824306 1.0000000

PGEE documentation built on May 2, 2019, 3:37 p.m.