cglasso: Conditional Graphical Lasso Estimator
In cglasso: Conditional Graphical LASSO for Gaussian Graphical Models with Censored and Missing Values

cglasso

R Documentation

Conditional Graphical Lasso Estimator

Description

‘cglasso’ fits the conditional graphical lasso model to datasets with censored and/or missing values.

Usage

cglasso(formula, data, subset, contrasts = NULL, diagonal = FALSE,
        weights.B = NULL, weights.Tht = NULL, nlambda, lambda.min.ratio,
        lambda, nrho, rho.min.ratio, rho, maxit.em = 1.0E+4, thr.em = 1.0E-3,
        maxit.bcd = 1.0E+5, thr.bcd = 1.0E-4, trace = 0L)

Arguments

`formula`	an object of class ‘`formula`’ (or one that can be coerced to that class): a symbolic description of the model to be fitted.
`data`	an R object of S3 class ‘`datacggm`’, that is, the output of the function `datacggm`. See section ‘Description’ for more details.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`contrasts`	an optional list. See the `contrasts.arg` of `model.matrix.default`.
`diagonal`	logical. Should diagonal entries of the concentration matrix be penalized? Default is ‘`diagonal = FALSE`’.
`weights.B`	an optional `q\times p` dimensional matrix of non-negative weights used to penalize the regression coefficients (intercepts are unpenalized). This matrix can be also used to specify the unpenalized regression coefficients (‘`weights.B[i, j] = 0`’) or the structural zeros (‘`weights.B[i, j] = +Inf`’). By default, all weights are set equal to 1, meaning that the penalized regression coefficients are unweighted.
`weights.Tht`	an optional symmetric matrix of non-negative weights used to penalize the partial regression coefficients. This matrix can be used to specify the unpenalized partial correlation coefficients (‘`weights.Tht[i, j] = 0`’) or the structural zeros in the precision matrix (‘`weights.Tht[i, j] = +Inf`’). By default, the off-diagonal entries of the matrix ‘`weights.Tht`’ are set equal to 1, meaning that the partial correlation coefficients are unweighted, whereas the diagonal entries are equal to 0, that is the diagonal entries of the precision matrix are unpenalized.
`nlambda`	integer. The number of `\lambda`-values used to penalize the regression coefficients. By default ‘`nlambda = 10`’.
`lambda.min.ratio`	the smallest `\lambda`-value is defined as a fraction of ‘`lambda.max`’ (i.e., the smallest `\lambda`-value for which all the estimated regression coefficients are equal to zero). The default depends on the sample size ‘`n`’ relative to the number of predictors ‘`q`’. If ‘`q < n`’, default is ‘1.0E-6’ otherwise the value ‘1.0E-2’ is used as default. A very small value of ‘`lambda.min.ratio`’ will lead to a saturated fitted model in the ‘`q < n`’ case.
`lambda`	an optional user-supplied decreasing sequence of `\lambda`-values. By default `cglasso` computes a sequence of `\lambda`-values using `nlambda` and `lambda.min.ratio`. If `lambda` is supplied then `nlambda` and `lambda.min.ratio` are overwritten. WARNING: use with care and avoid supplying a single `\lambda`-value; supply instead a decreasing sequence.
`nrho`	integer. The number of `\rho`-values used to penalize the partial correlation coefficients. By default ‘`nrho = 10`’.
`rho.min.ratio`	the smallest `\rho`-value is defined as a fraction of ‘`rho.max`’ (i.e., the smallest `\rho`-value for which all the estimated partial correlation coefficients are equal to zero). The default depends on the sample size ‘`n`’ relative to the number of response variables ‘`p`’. If ‘`p < n`’, the default is ‘1.0E-6’ otherwise the value ‘1.0E-2’ is used as default. A very small value of ‘`rho.min.ratio`’ will lead to a saturated fitted model in the ‘`p < n`’ case.
`rho`	an optional user supplied decreasing sequence of `\rho`-values. By default `cglasso` computes a sequence of `\rho`-values using `nrho` and `rho.min.ratio`. If `rho` is supplied then `nrho` and `rho.min.ratio` are overwritten. WARNING: use with care and avoid supplying a single `\rho`-value; supply instead a decreasing sequence.
`maxit.em`	maximum number of iterations of the EM algorithm. Default is `1.0E+4`.
`thr.em`	threshold for the convergence of the EM algorithm. Default value is `1.0E-4`.
`maxit.bcd`	maximum number of iterations of the glasso algorithm. Default is `1.0E+5`.
`thr.bcd`	threshold for the convergence of the glasso algorithm. Default is `1.0E-4`.
`trace`	integer for printing information out as iterations proceed: `trace = 0` no information is printed out; `trace = 1` minimal information is printed on screen; `trace = 2` detailed information is printed on screen.

Details

cglasso is the main model-fitting function and can be used to fit a broad range of extensions of the glasso estimator (Friedman and other, 2008). It is specifically proposed to study datasets with censored and/or missing response values. To help the user, the cglasso function has been designed to automatically select the most suitable extension by using the information stored in the ‘datacggm’ object passed through Z.

Below we sum up the available extenions:

if only left/right-censored are observed in the response matrix Y (without missing values) and no predictor matrix is stored in Z, then cglasso computes the censored glasso estimator (Augugliaro and other, 2020a);
if only left/right-censored are observed in the response matrix Y (without missing values) and and a predictor matrix X is stored in Z, then cglasso computes the conditional censored glasso estimator (Augugliaro and other, 2020b);
if only missing values are stored in the response matrix Y (without censored values), then cglasso computes the missglasso estimator (Stadler and other, 2012);
starting with version 2.0.0, cglasso can also handle datasets with both missing and censored response values.

See section ‘Examples’ for some example.

The model-fitting function cglasso returns an R object of S3 class ‘cglasso’ for which there are available a set of accessor functions, a set of functions designed to evaluate the goodness-of-fit of the fitted models and, finally, a set of functions developed to analyze the selected network. The function ShowStructure can be used to show the structure of the package.

The accessor functions coef.cglasso, fitted.cglasso, residuals.cglasso, predict.cglasso and impute can be used to extract various useful features of the object fitted by cglasso.

For an R object returned by cglasso, the functions AIC.cglasso and BIC.cglasso can be used to evaluate the goodness-of-fit of the fitted models. Usually, these functions are used together with the function summary.cglasso, which gives more information about the sequence of fitted models. The plotting function plot.GoF can be used to graphically identify the optimal pair of the tuning parameters.

Given a pair of the tuning paremeters, the functions to_graph and plot.cglasso2igraph can be used to analyze and show the selected network. Finally, the function cggm can be used to produce post-hoc maximum likelihood refitting of the selected graphical model.

Value

cglasso returns an object of S3 class “cglasso”, i.e., a named list containing the following components:

`call`	the call that produced this object.
`Yipt`	an array of dimension ‘`n x p x nlambda x nrho`’ storing the ‘working response matrices’, that is, `Yipt[, , i, j]` is used as response matrix to compute the multilasso estimator (Augugliaro and other, 2020b). The accessor function ‘`impute`’ can be used to extract the desidered imputed matrix.
`B`	an array of dimension ‘`(q + 1) x p x nlambda x nrho`’ storing the penalized estimate of the regression coefficient matrices. The accessor function ‘`coef.cglasso`’ can be used to extract the desired estimates.
`mu`	an array of dimension ‘`n x p x nlambda x nrho`’ storing the fitted values. The accessor function ‘`fitted.cglasso`’ can be used to extract the desired fitted values.
`R`	an array of dimension ‘`n x p x nlambda x nrho`’ storing the ‘working residuals’, that is, `R[, , i, j]` is defined as the difference between `Yipt[, , i, j]` and `mu[, , i, j]`. The accessor function ‘`residuals.cglasso`’ can be used to extract the desidered residual matrix.
`S`	an array of dimension ‘`p x p x nlambda x nrho`’ storing the ‘working empirical covariance matrices’, that is, ‘`S[, , i, j]`’ is used to compute the glasso estimator.
`Sgm`	an array of dimension ‘`p x p x nlambda x nrho`’ storing the estimated covariance matrices. The accessor function ‘`coef`’ can be used to extract the desired estimates.
`Tht`	an array of dimension ‘`p x p x nlambda x nrho`’ storing the estimated precision matrices. The accessor funtion ‘`coef`’ can be used to extract the desired estimates.
`dfB`	a matrix of dimension ‘`(p + 1) x nlambda x nrho`’ storing the number of estimated non-zero regression coefficients. Only for internal purpose.
`dfTht`	a matrix of dimension ‘`nlambda x nrho`’ storing the number of estimated non-zero (off-diagonal) partial correlation coefficients. Only for internal purpose.
`InfoStructure`	a named list whose elements contain information about the estimated networks. Only for internal purpose.
`nit`	an array of dimension ‘`2 x nlambda x nrho`’ storing the number of EM steps.
`Z`	the ‘`datacggm`’ object used to compute the censored graphical lasso estimator.
`diagonal`	the flag used to specify if the diagonal entries of the precision matrix are penalized.
`weights.B`	the matrix of non-negative weights used for the regression coefficients.
`weights.Tht`	the matrix of non-negative weights used for the precision matrix.
`nlambda`	the number of `\lambda`-values used.
`lambda.min.ratio`	the value used to compute the smallest `\lambda`-value.
`lambda`	the sequence of `\lambda`-values used to fit the model.
`nrho`	the number of `\rho`-values used.
`rho.min.ratio`	the value used to compute the smallest `\rho`-value.
`rho`	the sequence of `\rho`-values used to fit the model.
`model`	a description of the fitted model.
`maxit.em`	maximum number of iterations of the EM algorithm.
`thr.em`	threshold for the convergence of the EM algorithm.
`maxit.bcd`	maximum number of iterations of the glasso algorithm.
`thr.bcd`	threshold for the convergence of the glasso algorithm.
`conv`	a description of the error that has occurred.
`subrout`	the name of the Fortran subroutine where the error has occurred (for internal debug only).
`trace`	the integer used for printing information on screen.
`nobs`	the sample size
`nresp`	the number of response variables used to fit the model.
`npred`	the number of predictors used to fit the model.

Author(s)

Luigi Augugliaro (luigi.augugliaro@unipa.it)

References

Augugliaro L., Sottile G., Wit E.C., and Vinciotti V. (2023) <\Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v105.i01")}>. cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values. Journal of Statistical Software 105(1), 1–58.

Augugliaro, L., Abbruzzo, A., and Vinciotti, V. (2020a) <\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1093/biostatistics/kxy043")}>. \ell_1-Penalized censored Gaussian graphical model. Biostatistics 21, e1–e16.

Augugliaro, L., Sottile, G., and Vinciotti, V. (2020b) <\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s11222-020-09945-7")}>. The conditional censored graphical lasso estimator. Statistics and Computing 30, 1273–1289.

Friedman, J.H., Hastie, T., and Tibshirani, R. (2008) <\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1093/biostatistics/kxm045")}>. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.

Stadler, N. and Buhlmann, P. (2012) <\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s11222-010-9219-7")}>. Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing 22, 219–235.

Examples

set.seed(123)

# Model 1: censored glasso estimator (Augugliaro \emph{and other}, 2020a)
# Y ~ N(0, Sigma) and probability of left/right censored values equal to 0.05
n <- 1000L
p <- 3L
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, Sigma = Sigma, probl = 0.05, probr = 0.05)
out <- cglasso(. ~ ., data = Z)
out

# Model 2: conditional censored glasso estimator (Augugliaro \emph{and other}, 2020b)
# Y ~ N(b0 + XB, Sigma)  and probability of left/right censored values equal to 0.05
n <- 1000L
p <- 3L
q <- 2L
b0 <- runif(p)
B <- matrix(runif(q * p), nrow = q, ncol = p)
X <- matrix(rnorm(n * q), nrow = n, ncol = q)
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, b0 = b0, X = X, B = B, Sigma = Sigma, probl = 0.05, probr = 0.05)
out <- cglasso(. ~ ., data = Z)
out

# Model 3: missglasso estimator (Stadler \emph{and other}, 2012)
# Y ~ N(b0 + XB, Sigma)  and probability of missing-at-random values equal to 0.05
n <- 1000L
p <- 3L
q <- 2L
b0 <- runif(p)
B <- matrix(runif(q * p), nrow = q, ncol = p)
X <- matrix(rnorm(n * q), nrow = n, ncol = q)
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, b0 = b0, X = X, B = B, Sigma = Sigma, probna = 0.05)
out <- cglasso(. ~ ., data = Z)
out

# Model 4: mixed estimator
# Y ~ N(b0 + XB, Sigma)  and
# 1. probability of left/right censored values equal to 0.05
# 2. probability of missing-at-random values equal to 0.05
n <- 1000L
p <- 3L
q <- 2L
b0 <- runif(p)
B <- matrix(runif(q * p), nrow = q, ncol = p)
X <- matrix(rnorm(n * q), nrow = n, ncol = q)
rho <- 0.3
Sigma <- outer(1L:p, 1L:p, function(i, j) rho^abs(i - j))
Z <- rcggm(n = n, X = X, b0 = b0, B = B, Sigma = Sigma, probl = 0.05, probr = 0.05,
           probna = 0.05)
out <- cglasso(. ~ ., data = Z)
out

cglasso documentation built on May 29, 2024, 3:04 a.m.