cglasso: Conditional Graphical Lasso Estimator

View source: R/cglasso.R

cglassoR Documentation

Conditional Graphical Lasso Estimator

Description

cglasso’ fits the conditional graphical lasso model to datasets with censored and/or missing values.

Usage

cglasso(formula, data, subset, contrasts = NULL, diagonal = FALSE,
        weights.B = NULL, weights.Tht = NULL, nlambda, lambda.min.ratio,
        lambda, nrho, rho.min.ratio, rho, maxit.em = 1.0E+4, thr.em = 1.0E-3,
        maxit.bcd = 1.0E+5, thr.bcd = 1.0E-4, trace = 0L)

Arguments

formula

an object of class ‘formula’ (or one that can be coerced to that class): a symbolic description of the model to be fitted.

data

an R object of S3 class ‘datacggm’, that is, the output of the function datacggm. See section ‘Description’ for more details.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

diagonal

logical. Should diagonal entries of the concentration matrix be penalized? Default is ‘diagonal = FALSE’.

weights.B

an optional q x p dimensional matrix of non-negative weights used to penalize the regression coefficients (intercepts are unpenalized). This matrix can be also used to specify the unpenalized regression coefficients (‘weights.B[i, j] = 0’) or the structural zeros (‘weights.B[i, j] = +Inf’). By default, all weights are set equal to 1, meaning that the penalized regression coefficients are unweighted.

weights.Tht

an optional symmetric matrix of non-negative weights used to penalize the partial regression coefficients. This matrix can be used to specify the unpenalized partial correlation coefficients (‘weights.Tht[i, j] = 0’) or the structural zeros in the precision matrix (‘weights.Tht[i, j] = +Inf’). By default, the off-diagonal entries of the matrix ‘weights.Tht’ are set equal to 1, meaning that the partial correlation coefficients are unweighted, whereas the diagonal entries are equal to 0, that is the diagonal entries of the precision matrix are unpenalized.

nlambda

integer. The number of lambda-values used to penalize the regression coefficients. By default ‘nlambda = 10’.

lambda.min.ratio

the smallest lambda-value is defined as a fraction of ‘lambda.max’ (i.e., the smallest lambda-value for which all the estimated regression coefficients are equal to zero). The default depends on the sample size ‘n’ relative to the number of predictors ‘q’. If ‘q < n’, default is ‘1.0E-6’ otherwise the value ‘1.0E-2’ is used as default. A very small value of ‘lambda.min.ratio’ will lead to a saturated fitted model in the ‘q < n’ case.

lambda

an optional user-supplied decreasing sequence of lambda-values. By default cglasso computes a sequence of lambda-values using nlambda and lambda.min.ratio. If lambda is supplied then nlambda and lambda.min.ratio are overwritten. WARNING: use with care and avoid supplying a single lambda-value; supply instead a decreasing sequence.

nrho

integer. The number of rho-values used to penalize the partial correlation coefficients. By default ‘nrho = 10’.

rho.min.ratio

the smallest rho-value is defined as a fraction of ‘rho.max’ (i.e., the smallest rho-value for which all the estimated partial correlation coefficients are equal to zero). The default depends on the sample size ‘n’ relative to the number of response variables ‘p’. If ‘p < n’, the default is ‘1.0E-6’ otherwise the value ‘1.0E-2’ is used as default. A very small value of ‘rho.min.ratio’ will lead to a saturated fitted model in the ‘p < n’ case.

rho

an optional user supplied decreasing sequence of rho-values. By default cglasso computes a sequence of rho-values using nrho and rho.min.ratio. If rho is supplied then nrho and rho.min.ratio are overwritten. WARNING: use with care and avoid supplying a single rho-value; supply instead a decreasing sequence.

maxit.em

maximum number of iterations of the EM algorithm. Default is 1.0E+4.

thr.em

threshold for the convergence of the EM algorithm. Default value is 1.0E-4.

maxit.bcd

maximum number of iterations of the glasso algorithm. Default is 1.0E+5.

thr.bcd

threshold for the convergence of the glasso algorithm. Default is 1.0E-4.

trace

integer for printing information out as iterations proceed: trace = 0 no information is printed out; trace = 1 minimal information is printed on screen; trace = 2 detailed information is printed on screen.

Details

cglasso is the main model-fitting function and can be used to fit a broad range of extensions of the glasso estimator (Friedman and other, 2008). It is specifically proposed to study datasets with censored and/or missing response values. To help the user, the cglasso function has been designed to automatically select the most suitable extension by using the information stored in the ‘datacggm’ object passed through Z.

Below we sum up the available extenions:

  • if only left/right-censored are observed in the response matrix Y (without missing values) and no predictor matrix is stored in Z, then cglasso computes the censored glasso estimator (Augugliaro and other, 2020a);

  • if only left/right-censored are observed in the response matrix Y (without missing values) and and a predictor matrix X is stored in Z, then cglasso computes the conditional censored glasso estimator (Augugliaro and other, 2020b);

  • if only missing values are stored in the response matrix Y (without censored values), then cglasso computes the missglasso estimator (Stadler and other, 2012);

  • starting with version 2.0.0, cglasso can also handle datasets with both missing and censored response values.

See section ‘Examples’ for some example.

The model-fitting function cglasso returns an R object of S3 class ‘cglasso’ for which there are available a set of accessor functions, a set of functions designed to evaluate the goodness-of-fit of the fitted models and, finally, a set of functions developed to analyze the selected network. The function ShowStructure can be used to show the structure of the package.

The accessor functions coef.cglasso, fitted.cglasso, residuals.cglasso, predict.cglasso and impute can be used to extract various useful features of the object fitted by cglasso.

For an R object returned by cglasso, the functions AIC.cglasso and BIC.cglasso can be used to evaluate the goodness-of-fit of the fitted models. Usually, these functions are used together with the function summary.cglasso, which gives more information about the sequence of fitted models. The plotting function plot.GoF can be used to graphically identify the optimal pair of the tuning parameters.

Given a pair of the tuning paremeters, the functions to_graph and plot.cglasso2igraph can be used to analyze and show the selected network. Finally, the function cggm can be used to produce post-hoc maximum likelihood refitting of the selected graphical model.

Value

cglasso returns an object of S3 class “cglasso”, i.e., a named list containing the following components:

call

the call that produced this object.

Yipt

an array of dimension ‘n x p x nlambda x nrho’ storing the ‘working response matrices’, that is, Yipt[, , i, j] is used as response matrix to compute the multilasso estimator (Augugliaro and other, 2020b). The accessor function ‘impute’ can be used to extract the desidered imputed matrix.

B

an array of dimension ‘(q + 1) x p x nlambda x nrho’ storing the penalized estimate of the regression coefficient matrices. The accessor function ‘coef.cglasso’ can be used to extract the desired estimates.

mu

an array of dimension ‘n x p x nlambda x nrho’ storing the fitted values. The accessor function ‘fitted.cglasso’ can be used to extract the desired fitted values.

R

an array of dimension ‘n x p x nlambda x nrho’ storing the ‘working residuals’, that is, R[, , i, j] is defined as the difference between Yipt[, , i, j] and mu[, , i, j]. The accessor function ‘residuals.cglasso’ can be used to extract the desidered residual matrix.

S

an array of dimension ‘p x p x nlambda x nrho’ storing the ‘working empirical covariance matrices’, that is, ‘S[, , i, j]’ is used to compute the glasso estimator.

Sgm

an array of dimension ‘p x p x nlambda x nrho’ storing the estimated covariance matrices. The accessor function ‘coef’ can be used to extract the desired estimates.

Tht

an array of dimension ‘p x p x nlambda x nrho’ storing the estimated precision matrices. The accessor funtion ‘coef’ can be used to extract the desired estimates.

dfB

a matrix of dimension ‘(p + 1) x nlambda x nrho’ storing the number of estimated non-zero regression coefficients. Only for internal purpose.

dfTht

a matrix of dimension ‘nlambda x nrho’ storing the number of estimated non-zero (off-diagonal) partial correlation coefficients. Only for internal purpose.

InfoStructure

a named list whose elements contain information about the estimated networks. Only for internal purpose.

nit

an array of dimension ‘2 x nlambda x nrho’ storing the number of EM steps.

Z

the ‘datacggm’ object used to compute the censored graphical lasso estimator.

diagonal

the flag used to specify if the diagonal entries of the precision matrix are penalized.

weights.B

the matrix of non-negative weights used for the regression coefficients.

weights.Tht

the matrix of non-negative weights used for the precision matrix.

nlambda

the number of λ-values used.

lambda.min.ratio

the value used to compute the smallest lambda-value.

lambda

the sequence of lambda-values used to fit the model.

nrho

the number of ρ-values used.

rho.min.ratio

the value used to compute the smallest rho-value.

rho

the sequence of rho-values used to fit the model.

model

a description of the fitted model.

maxit.em

maximum number of iterations of the EM algorithm.

thr.em

threshold for the convergence of the EM algorithm.

maxit.bcd

maximum number of iterations of the glasso algorithm.

thr.bcd

threshold for the convergence of the glasso algorithm.

conv

a description of the error that has occurred.

subrout

the name of the Fortran subroutine where the error has occurred (for internal debug only).

trace

the integer used for printing information on screen.

nobs

the sample size

nresp

the number of response variables used to fit the model.

npred

the number of predictors used to fit the model.

Author(s)

Luigi Augugliaro (luigi.augugliaro@unipa.it)

References

Augugliaro L., Sottile G., Wit E.C., and Vinciotti V. (2023) <doi: 10.18637/jss.v105.i01>. cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values. Journal of Statistical Software 105(1), 1–58.

Augugliaro, L., Abbruzzo, A., and Vinciotti, V. (2020a) <doi: 10.1093/biostatistics/kxy043>. l1-Penalized censored Gaussian graphical model. Biostatistics 21, e1–e16.

Augugliaro, L., Sottile, G., and Vinciotti, V. (2020b) <doi: 10.1007/s11222-020-09945-7>. The conditional censored graphical lasso estimator. Statistics and Computing 30, 1273–1289.

Friedman, J.H., Hastie, T., and Tibshirani, R. (2008) <doi: 10.1093/biostatistics/kxm045>. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.

Stadler, N. and Buhlmann, P. (2012) <doi: 10.1007/s11222-010-9219-7>. Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing 22, 219–235.

See Also

datacggm, coef.cglasso, fitted.cglasso, residuals.cglasso, predict.cglasso, impute, AIC.cglasso, BIC.cglasso, summary.cglasso, select.cglasso, plot.GoF, to_graph, plot.cglasso2igraph, cggm and ShowStructure.

Examples

set.seed(123)

# Model 1: censored glasso estimator (Augugliaro \emph{and other}, 2020a)
# Y ~ N(0, Sigma) and probability of left/right censored values equal to 0.05
n <- 1000L
p <- 3L
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, Sigma = Sigma, probl = 0.05, probr = 0.05)
out <- cglasso(. ~ ., data = Z)
out

# Model 2: conditional censored glasso estimator (Augugliaro \emph{and other}, 2020b)
# Y ~ N(b0 + XB, Sigma)  and probability of left/right censored values equal to 0.05
n <- 1000L
p <- 3L
q <- 2L
b0 <- runif(p)
B <- matrix(runif(q * p), nrow = q, ncol = p)
X <- matrix(rnorm(n * q), nrow = n, ncol = q)
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, b0 = b0, X = X, B = B, Sigma = Sigma, probl = 0.05, probr = 0.05)
out <- cglasso(. ~ ., data = Z)
out

# Model 3: missglasso estimator (Stadler \emph{and other}, 2012)
# Y ~ N(b0 + XB, Sigma)  and probability of missing-at-random values equal to 0.05
n <- 1000L
p <- 3L
q <- 2L
b0 <- runif(p)
B <- matrix(runif(q * p), nrow = q, ncol = p)
X <- matrix(rnorm(n * q), nrow = n, ncol = q)
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, b0 = b0, X = X, B = B, Sigma = Sigma, probna = 0.05)
out <- cglasso(. ~ ., data = Z)
out

# Model 4: mixed estimator
# Y ~ N(b0 + XB, Sigma)  and
# 1. probability of left/right censored values equal to 0.05
# 2. probability of missing-at-random values equal to 0.05
n <- 1000L
p <- 3L
q <- 2L
b0 <- runif(p)
B <- matrix(runif(q * p), nrow = q, ncol = p)
X <- matrix(rnorm(n * q), nrow = n, ncol = q)
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, X = X, b0 = b0, B = B, Sigma = Sigma, probl = 0.05, probr = 0.05,
           probna = 0.05)
out <- cglasso(. ~ ., data = Z)
out

cglasso documentation built on Jan. 17, 2023, 5:10 p.m.