geem: Fit Generalized Estimating Equations
In leemcdaniel/geeM: Solve Generalized Estimating Equations

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/geem.R

Calculate coefficients and nuisance parameters using generalized estimating equations. Link and Variance functions can be specified by the user. Similar to glm.

geem(formula, id, waves=NULL, data = parent.frame(), family = gaussian,
 corstr = "independence", Mv = 1, weights = NULL, corr.mat = NULL, init.beta = NULL,
 init.alpha = NULL, init.phi = 1, scale.fix = FALSE, nodummy=FALSE, sandwich = TRUE, 
 useP = TRUE, maxit = 20, tol = 1e-05)

`formula`	a formula expression similar to that for `glm`, of the form `response~predictors`. An offset is allowed, as in `glm`.
`id`	a vector identifying the clusters. By default, data are assumed to be sorted such that observations in a cluster are in consecutive rows and higher numbered rows in a cluster are assumed to be later. If NULL, then each observation is assigned its own cluster.
`waves`	an integer vector identifying components of a cluster. For example, this could be a time ordering. If integers are skipped within a cluster, then dummy rows with weight 0 are added in an attempt to preserve the correlation structure (except if `corstr = "exchangeable"` or `"independent"`). This can be skipped by setting `nodummy=TRUE`.
`data`	an optional data frame containing the variables in the model.
`family`	will determine the link and variance functions. The argument can be one of three options: a `family` object, a character string, or a list of functions. For more information on how to use `family` objects, see `family`. If the supplied argument is a character string, then the string should correspond to one of the family objects. In order to define a link function, a list must be created with the components `(LinkFun, VarFun, InvLink, InvLinkDeriv)`, all of which are vectorized functions. If the components in the list are not named as `(LinkFun, VarFun, InvLink, InvLinkDeriv)`, then `geem` assumes that the functions are given in that order. LinkFun and VarFun are the link and variance functions. InvLink and InvLinkDeriv are the inverse of the link function and the derivative of the inverse of the link function and so are decided by the choice of the link function.
`corstr`	a character string specifying the correlation structure. Allowed structures are: `"independence"`, `"exchangeable"`, `"ar1"`, `"m-dependent"`, `"unstructured"`, `"fixed"`, and `"userdefined"`. Any unique substring may be supplied. If `"fixed"` or `"userdefined"`, then `corr.mat` must be specified. If `"m-dependent"`, then `Mv` is relevant.
`Mv`	for `"m-dependent"`, the value for `m`.
`weights`	A vector of weights for each observation. If an observation has weight 0, it is excluded from the calculations of any parameters. Observations with a `NA` anywhere (even in variables not included in the model) will be assigned a weight of 0. Note that these weights are now the same as PROC GEE weights and not PROC GENMOD.
`corr.mat`	the correlation matrix for `"fixed"`. Matrix should be symmetric with dimensions >= the maximum cluster size. If the correlation structure is `"userdefined"`, then this is a matrix describing which correlations are the same.
`init.beta`	an optional vector with the initial values of beta. If not specified, then the intercept will be set to `InvLink(mean(response))`. `init.beta` must be specified if not using an intercept.
`init.alpha`	an optional scalar or vector giving the initial values for the correlation. If provided along with `Mv>1` or `unstructured` correlation, then the user must ensure that the vector is of the appropriate length.
`init.phi`	an optional initial overdispersion parameter. If not supplied, initialized to 1.
`scale.fix`	if set to `TRUE`, then the scale parameter is fixed at the value of `init.phi`.
`nodummy`	if set to `TRUE`, then dummy rows will not be added based on the values in `waves`.
`sandwich`	if `TRUE`, calculate robust variance.
`useP`	if set to `FALSE`, do not use the n-p correction for dispersion and correlation estimates, as in Liang and Zeger. This can be useful when the number of observations is small, as subtracting p may yield correlations greater than 1.
`maxit`	maximum number of iterations.
`tol`	tolerance in calculation of coefficients.

Users may specify functions for link and variance functions, but the functions must be vectorized functions. See Vectorize for an easy way to vectorize functions. Vectorize should be used sparingly, however, as it can lead to fairly slow function calls. Care must be taken to ensure that convergence is possible with non-standard functions.

Offsets must be specified in the model formula, as in glm.

For the "userdefined" correlation option, the function accepts a matrix with consecutive integers. geem only looks at the upper triangle of the matrix. Any entry given as 0 will be fixed at 0. All entries given as 1 will be assumed to be the same as each other and will be assumed to be possibly different from entries with a 2, and so on.

If observations are dropped because they have a weight of 0, then the denominator for the moment estimates of the correlation matrices are calculated using the number of non-zero Pearson residuals for the correlation structures unstructured, userdefined and m-dependent with Mv>1. Therefore residuals numerically equal to 0 may cause problems in the calculation of correlation parameters.

An object of class "geem" representing the fit.

Lee McDaniel and Nick Henderson

glm, formula, family

### Generated Negative Binomial Data
generatedata <- function(beta,alpha,gamma,X,T,n)  {
     mean.vec <- exp(crossprod(t(X),beta))
     y <- matrix(0,nrow=n,ncol=T)
     y[,1] <- rnbinom(n,mu = mean.vec[1],size=mean.vec[1]/gamma)
     for (i in 1:n)  {
        for (t in 2:T)  {
           innovation.mean <- mean.vec[t] - alpha*(sqrt(mean.vec[t]*mean.vec[t-1]))
           I <- rnbinom(1,mu= innovation.mean,size= innovation.mean/gamma)           
           first.shape <- alpha*sqrt(mean.vec[t]*mean.vec[t-1])/gamma
           second.shape <- mean.vec[t-1]/gamma - first.shape
           u <- rbeta(1,shape1 = first.shape,shape2=second.shape)
           a <- rbinom(1,size=y[i,t-1],prob=u)
           y[i,t] = a + I
        }
     }
     longform <- c(t(y))
     print(apply(y,2,mean))
     simdata <- data.frame(count = longform, time = rep(X[,2],times=n),subject=rep(c(1:n),each=T))
     return(simdata)
}

X <- cbind(rep(1,5),c(-.5,-.25,0,.25,.5))
testdat <- generatedata(beta=c(1,.5),alpha=.2,gamma=.5,X=X,T=5,n=3000)
far1 <- geem(count~ time, id=subject ,data = testdat, family=poisson, 
	corstr="ar1")

### Ohio respiratory data from geepack
if(require(geepack)){
	data("ohio", package="geepack")
	resplogit <- geem(resp ~ age + smoke + age:smoke, id=id, data = ohio, family = binomial,
	corstr = "m-dep" , Mv=1)
  
  LinkFun <- function(arg){qcauchy(arg)}
  InvLink <- function(arg){pcauchy(arg)}
  InvLinkDeriv <- function(arg){dcauchy(arg)}
  VarFun <- function(arg){arg*(1-arg)}
  FunList <- list(LinkFun, VarFun, InvLink, InvLinkDeriv)
  
  respcauchit <- geem(resp ~ age + smoke + age:smoke, id=id, data = ohio, family = FunList,
  corstr = "m-dep" , Mv=1)
}

### Seizure data from geepack
if(require(geepack)){
	data("seizure", package="geepack")
	seiz.l <- reshape(seizure,
	varying=list(c("base","y1", "y2", "y3", "y4")),
		v.names="y", times=0:4, direction="long")
		seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),]
		seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2)
		seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1)

	seiz <- geem(y~ x + trt + x:trt+ offset(log(t)), id=id,data = seiz.l, 
	family = poisson, corstr = "exchangeable")
}