Function to fit high-dimensional generalized linear mixed models.

Share:

Description

Fits the solution for a high-dimensional generalized linear mixed model.

Usage

1
2
3
4
5
6
7
8
9
glmmlasso(y,...)

## Default S3 method:
glmmlasso(y, group, X, Z = NULL,
          family = c("binomial", "poisson"),
          covStruct = c("Identity", "Diagonal", "Sym"),
          lambda, weights = NULL, coefInit = NULL, exactStep = FALSE,
          exactDeriv = FALSE, random = NULL, unpenalized = 1, ranInd = 1,
          control = glmmlassoControl(family = family), ...)

Arguments

y

response variable of length n.

group

grouping variable of length n.

X

fixed-effects matrix as an n x p matrix. An intercept has to be included in the first column as (1,...,1).

Z

random-effects matrix as an n x q matrix, must be in sparse Matrix format (see package Matrix).

family

GLM family. Currently only "binomial" and "poisson" are implemented.

covStruct

Covariance Stucture to be used. "Identity" fits \bm{Σ}_{θ}=θ^2\bm{1}_q, i.e. one single covariance parameter for all random effects. "Diagonal" fits a diagonal matrix for \bm{Σ}_{θ}, i.e. for each random effect a different covariance parameter. "Sym" fits an unstructured symmetric matrix.

lambda

non-negative regularization parameter

weights

weights for the fixed-effects covariates: NA means no penalization, 0 means drop this covariate ; if given, the argument unpenalized is ignored. By default each covariate has weight 1

coefInit

starting values. They must be of the same length as the model to be fitted, i.e. the number of variables p, the number of random-effects q and the number of covariance parameters stot must coincide. Otherwise a warning is issued.

exactStep

logical. Should the Armijo step include the update of the random effects u to ensure that the objective function strictly decreases?

exactDeriv

logical. Should the exact derivate be calculated or the derivative for fixed random effects u.

random

expression for the random-effects structure for non-correlated random effects of the form "(1|group)+(0+X2|group)+(0+X3|group)". It is used only for generating the corresponding Z matrix, and it dominates the Z matrix, i.e. if random and a Z matrix is given, the Z matrix corresponding to random is used.

unpenalized

indices as subset of \{1,...,p\} indicating which fixed-effects covariates are not subject to the \ell_1-penalty. Ignored if weights is given.

ranInd

indices of the random effects as subset of \{1,...,p\}. Only used for summary.glmmlasso. They only need to be specified if the random-effects covariates do not corespond to the first columns in X.

control

control parameters for the algorithm, see lmmlassoControl for the details

...

not used.

Details

All the details of the algorithm can be found in the article.

Concerning logLik, deviance, aic and bic, we have to be very careful when comparing them with other generalized linear mixed model functions. If we study a low-dimensional data example and set λ=0, the log-likelihood is calculated as given in the paper. Deviance, aic and bic are computed such that they coincide with the results from glmer from the lme4 package. The latter does not employ the standard definitions of deviance and log-likelihood function value. Nevertheless, the differences only depends on constants, and not the parameters.

Value

A glmmlasso object is returned, for which coef,resid, fitted, logLik print, summary, plot methods exist.

fixef

fixed-effects parameter beta

coefficients

fixed-effects parameter beta

theta

covariance parameter estimates

ranef

random effects in sparse vector format

u

random effects in dense vector format

objective

Value of the objective function corresponding to the estimates

logLik

value of the log-likelihood function. See details.

deviance

value of the deviance function. See details.

aic

AIC. See details.

bic

BIC. See details.

activeSet

Indices of the non-zero fixed-effects coefficients.

eta

The linear predictor at the current values.

mu

The conditional expectation of the response at the current values.

fitted

The fitted values at the current values.

lambda

non-negative regularization parameter

weights

weights (possible adapted to the argument weights)

data

data. List with y, group, X and Z

family

GLM family used.

ntot

total number of observations

p

number of fixed-effects parameters

N

number of groups/clusters

unpenalized

indices of the non-penalized fixed-effects covariates

ranInd

indices of the random effects as subset of \{1,...,p\}.

exactStep

logical. If the Armijo step includes the update of the random effects u or not.

exactDeriv

logical. If the exact derivate has been calculated or if the derivative for fixed random effects u has been calculated.

coefInit

starting values used in the algorithm for beta, theta and u

coefOut

list with estimates in the form required for the argument coefInit

convergence

integer giving convergence information. Each time maxArmijo was reached, convergence is increased by 2. If maxIter was reached, convergence is increased by 1.

nIter

number of outer iterations.

nIterPirls

number of pirls evaluation within the outer iteration. Pirls-Evaluation within the Armijo steps are not counted.

nfctEval

number of function evaluation. See value fctEval.

fctEval

vector of all function values calculated during the algorithm. It may be interesting if studying the convergence behaviour of the algorithm. Only if argument fctSave=TRUE

gradient

gradient of the objective function with respect to the fixed-effects coefficients

maxGrad

maximal value of the gradient which has to be close to zero for convergence.

maxArmijo

the maximal value of l used in each fixed-effects component.

control

see lmmlassoControl

resid

response residuals. See McCullagh and Nelder (1989).

pearResid

pearson residuals. See McCullagh and Nelder (1989).

respResid

response residuals. See McCullagh and Nelder (1989).

workResid

working residuals. See McCullagh and Nelder (1989).

devResid

deviance residuals. See McCullagh and Nelder (1989).

Var

conditional variance of the response at the current values. See McCullagh and Nelder (1989).

di

contribution of each observation to the deviance. See McCullagh and Nelder (1989).

gof

goodness-of-fit criterion. For family="binomial", it is the in-sample missclassification rate. For family="poisson", it is the Pearson X2 statistic. See McCullagh and Nelder (1989).

cpu

cpu time needed for the algorithm.

Author(s)

Juerg Schelldorfer, schell@stat.math.ethz.ch

References

P. McCullagh and J. A. Nelder (1989), Generalized Linear Models, Chapman-Hall, London.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
## (1) Use glmmlasso on the xerop data set
data(xerop)
fit1 <- glmmlasso(y=xerop$y,group=xerop$group,X=xerop$X,Z=xerop$Z,
                  family="binomial",covStruct="Identity",lambda=30)
summary(fit1)
plot(fit1)

## (2) Use the glmmlasso on a small simulated data set
set.seed(142)

N <- 40           ## number of groups
p <- 6            ## number of covariates (including intercept)
q <- 2            ## number of random effect covariates
ni <- rep(10,N)   ## observations per group
ntot <- sum(ni)   ## total number of observations

group <- factor(rep(1:N,times=ni)) ## grouping variable

beta <- c(0,1,-1,1,-1,0,0) # fixed-effects coefficients

X <- cbind(1,matrix(rnorm(ntot*p),nrow=ntot)) ## fixed-effects design matrix
Z <- t(glmer(rbinom(ntot,1,runif(ntot,0.3,1))~1+(1|group)+(0+X2|group),
             data=data.frame(X),family="binomial")@Zt) 
## random-eff. design matrix

bi <- c(rnorm(N,0,2),rnorm(N,0,1))

eta <- X%*%beta + Z%*%bi
mu  <- exp(eta)/(1+exp(eta))
y   <- rbinom(ntot,1,mu@x)

## random effects model with diagonal covariance matrix
fit2 <- glmmlasso(y=y, group=group, X=X, Z=Z, family="binomial", lambda=10,
                  unpenalized=1:2, covStruct="Diagonal")
summary(fit2)
plot(fit2)