robFitConGraph | R Documentation |
The function computes a robust estimate of a scatter matrix subject to zero-constraints in its inverse. The methodology is described in Vogel & Tyler (2014).
robFitConGraph( X, amat, df = 3, tol = 1e-05, sigma1, plug_in = TRUE, direct = FALSE )
X |
A data matrix with n rows and p columns, representing
n observations and p variables. Elements of |
amat |
A p times p matrix representing the adjacency matrix
of a graphical model. |
df |
A positive real number or |
tol |
tolerance for numerical convergence. Iteration stops if the
maximal element-wise difference between two successive matrices is less
than |
sigma1 |
numerical. A positive real value. This value is needed for the p-value of the model fit.
If missing, it is computed by |
plug_in |
logical. The function offers two types of estimates: the
plug-in M-estimator and the direct M-estimator. If |
direct |
logical. If |
Two types of graph-constrained M-estimates of scatter based on the elliptical t-distribution are implemented: the direct estimate and the plug-in estimate. The direct estimate is referred to as graphical M-estimator in Vogel & Tyler (2014).
The plug-in estimate is two algorithms performed sequentially: First an
unconstrained t-maximum likelihood estimate of scatter is computed (the
same as cov.trob
from MASS
). This is then
plugged into the Gaussian graphical model fitting routine (the same as
fitConGraph
from ggm
). Specifically
Algorithm 17.1 from Hastie, Tibshirani, Friedman (2009) is used.
The direct estimate is the actual maximum-likelihood estimator within the elliptical graphical model based on the elliptical t-distribution. The algorithm is an iteratively-reweighted least-squares algorithm, where the Gaussian graphical model fitting procedure is nested into the t-estimation iteration. The direct estimate therefore takes longer to compute, but the estimator has a better statistical efficiency for small sample sizes. Both estimators are asymptotically equivalent. The estimates tend to be very close to each other for large sample sizes.
The deviance test statistic
The robustified deviance test statistic (short: deviance) for testing a graphical model G is
D_n = n ( log det(S_n(G)) - log det(S_n) ),
where S_n(G) is a graph-constrained scatter estimator and S_n the
corresponding unconstrained scatter estimator, see Vogel & Tyler (2014, p. 866 bottom).
Under the graphical model G, the deviance D_n converges to
σ_1 χ_r^2, where r is the number of missing edges in G. The
constant σ_1 depends on the scatter estimate S_n, the dimension p,
and the population distribution. It can either be provided directly (as sigma1
)
or, if missing, is computed by find_sigma1
, assuming a Gaussian population
distribution.
Although robFitConGraph
combines the functionality of
fitConGraph
and cov.trob
and
contains both as special cases, it uses neither. The
algorithms are implemented in C++.
Input and output of robFitConGraph
are intended to resemble
fitConGraph
from the package ggm
.
Some notable differences:
fitConGraph
takes as input the
unconstrained covariance matrix; robFitConGraph
takes the actual data.
fitConGraph
alternatively offers to specify the
graphical model as list of cliques; robFitConGraph
only takes
the adjacency matrix amat
.
fitConGraph
returns a value called the degrees of freedom as df
.
These degrees of freedom mean the number of missing edges, which appear
as the degrees of freedom of the chi-square distribution of the deviance test.
However, these degrees of freedom are completely unrelated to the
the argument df_est
of robFitConGraph
, which refers to the
degrees of freedom of the t-distribution defining the scatter estimate.
robFitConGraph
returns the number of missing edges as missing_edges
.
List with 8 elements:
|
p times p symmetric scatter matrix estimate. |
|
numerical p-vector, the location estimate
In case of |
|
numerical. The p-value of the model fitted. D_n/σ_1 is compared to the χ_r^2 distribution, see Details. |
|
numerical. The deviance test statistic D_n, see Details. |
|
integer. The number of missing edges r as indicated
by the argument |
|
numerical. Either the argument |
|
integer. Number of iterations of the t-MLE computation. |
|
integer. In the case of the plug-in estimate, this is
the number of iterations of the Gaussian graphical model fitting procedure
(Algorithm 17.1) in Hastie et al 2004). In the case of the direct estimate,
the Gaussian graphical model fitting is executed |
The last five return values are
mainly for traceability purposes.
Particularly dev
, missing_edges
, and sigma1
are easily obtained by the inputs or outputs of robFitConGraph
.
They are the ingredients needed to compute the p-value:
pval <- pchisq(q = dev/sigma1, df = missing_edges, lower.tail = FALSE)
Stuart Watt, Daniel Vogel
Vogel, D., Tyler, D. E. (2014): Robust estimators
for nondecomposable elliptical graphical models, Biometrika, 101, 865-882
Hastie, T., Tibshirani, R. and Friedman, J. (2004). The elements of
statistical learning. New York: Springer.
fitConGraph
from package
ggm
for non-robust graph-constrained covariance estimation
cov.trob
from package MASS
for unconstrained
p
times p
t-MLE scatter matrix
# --- Example 1: anxieties data --- # The data set 'anxieties' contains 8 anxiety variables of 82 observations. We test the # null hypothesis that, given Generalized Anxiety (GAD), Music performance anxiety (MPA) # is conditionally independent of all remaining 6 variables: # - build the corresponding graphical model - data(anxieties) amat = matrix(1,ncol=ncol(anxieties),nrow=ncol(anxieties)) colnames(amat) <- colnames(anxieties) rownames(amat) <- colnames(amat) amat["MPA",c("SAD","PD","AG","SP","SEP","ILL")] <- 0 amat[c("SAD","PD","AG","SP","SEP","ILL"),"MPA"] <- 0 amat # MPA GAD SAD PD AG SP SEP ILL # MPA 1 1 0 0 0 0 0 0 # GAD 1 1 1 1 1 1 1 1 # SAD 0 1 1 1 1 1 1 1 # PD 0 1 1 1 1 1 1 1 # AG 0 1 1 1 1 1 1 1 # SP 0 1 1 1 1 1 1 1 # SEP 0 1 1 1 1 1 1 1 # ILL 0 1 1 1 1 1 1 1 robFitConGraph(X = anxieties, amat = amat)$pval # [1] 0.881159 # This provides no evidence against the null hypothesis. # the non-robust, sample-covariance-based model fit: robFitConGraph(X = anxieties, amat = amat, df = Inf)$pval # [1] 0.6310895 # ...provides no evidence against the null hypothesis either. # The latter is also obtained as: # pchisq(q = ggm::fitConGraph(S = cov(anxieties), amat = amat, # n = nrow(anxieties))$dev, # df = 6, lower.tail = FALSE) # --- Example 2: simulated data - the chordless 7-cycle --- # - build the corresponding graphical model - chordless_p_cycle <- function(rho, p){ M <- diag(1,p) for (i in 1:(p-1)) M[i,i+1] <- M[i+1,i] <- -rho M[1,p] <- M[p,1] <- -rho return(M) } p <- 7 # number of variables rho <- 0.4 # partial correlation PCM <- chordless_p_cycle(rho, p) # partial correlation matrix SM <- cov2cor(solve(PCM)) # shape matrix (i.e covariance matrix up to scale) model <- abs(sign(PCM)) # adjacency matrix of the chordless-7-cycle # > model # [,1] [,2] [,3] [,4] [,5] [,6] [,7] # [1,] 1 1 0 0 0 0 1 # [2,] 1 1 1 0 0 0 0 # [3,] 0 1 1 1 0 0 0 # [4,] 0 0 1 1 1 0 0 # [5,] 0 0 0 1 1 1 0 # [6,] 0 0 0 0 1 1 1 # [7,] 1 0 0 0 0 1 1 # This is the cordless-7-cycle (p.872 Figure 1 (a) in Vogel & Tyler, 2014). # All non-zero partial correlations are 0.4. # The true covariance is (up to scale) 'SM'. This matrix is constructed such # that it has zero entries in its inverse as specified by 'model'. # - generate data from the graphical model (elliptical t3 distribution) - n <- 50 # number of observations df_data <- 3 # degrees of freedom library(mvtnorm) # for rmvt function set.seed(918273) # for reproducability X <- rmvt(n = n, sigma = SM, df = df_data) # X contains a data set of size n = 50 and dimension p = 7, sampled from the # elliptical t-distribution with 3 degrees of freedom and shape matrix 'SM' # - comparing estimates - # the true correlation matrix: round(cov2cor(SM), d = 2) # Pearson correlations: round(cov2cor(cov(X)), d = 2) # robust correlations based on the direct graph-constrained t-MLE: S1 <- robFitConGraph(X, amat = model, df = df_data, direct = TRUE)$Shat round(cov2cor(S1), d = 2) # robust correlations based on the plug_in graph-constrained t-MLE: S2 <- robFitConGraph(X, amat = model, df = df_data, plug_in = TRUE)$Shat round(cov2cor(S2), d = 2) # The correlation estimates based on S1 and S2 are close to the true # correlations, whereas the sample correlations differ strongly. # Note: sample correlations are not asymptotically normal at the t3 distribution.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.