Hastie Tibshirani Friedman iterative regression algorithm
Performs maximum likelihood estimation of a covariance matrix given the independence constraints from an input undirected graph.
input matrix, in the context of this package, the sample covariance matrix.
input undirected graph.
tolerance under which the iterative algorithm stops.
show progress on calculations.
logical; if FALSE then the faster C implementation is used (default); if TRUE then only R code is executed.
This is an alternative to the Iterative Proportional Fitting (IPF) algorithm
(see, Whittaker, 1990, pp. 182-185 and
qpIPF) which also
adjusts the input matrix to the independence constraints in the input undirected
graph. However, differently to the IPF, it works by going through each of the
vertices fitting the marginal distribution over the corresponding vertex boundary.
It stops when the adjusted matrix at the current iteration differs from the matrix
at the previous iteration in less or equal than a given tolerance value. This
algorithm is described by Hastie, Tibshirani and Friedman (2009, pg. 634), hence we
name it here HTF, and it has the advantage over the IPF that it does not require the
list of maximal cliques of the graph which may be exponentially large. In
contrast, it requires that the maximum boundary size of the graph is below the
number of samples where the input sample covariance matrix
S was estimated.
For the purpose of exploring qp-graphs that meet such a requirement, one can use
The input matrix adjusted to the constraints imposed by the input undirected graph, i.e., a maximum likelihood estimate of the sample covariance matrix that includes the independence constraints encoded in the undirected graph.
Thanks to Giovanni Marchetti for bringing us our attention to this algorithm and
sharing an early version of its implementation on the R package
Castelo, R. and Roverato, A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n. J. Mach. Learn. Res., 7:2621-2650, 2006.
Hastie, T., Tibshirani, R. and Friedman, J.H. The Elements of Statistical Learning, Springer, 2009.
Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198(4):1377-1393, 2014.
Whittaker, J. Graphical Models in Applied Multivariate Statistics. Wiley, 1990.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
require(graph) require(mvtnorm) nVar <- 50 ## number of variables nObs <- 100 ## number of observations to simulate set.seed(123) g <- randomEGraph(as.character(1:nVar), p=0.15) Sigma <- qpG2Sigma(g, rho=0.5) X <- rmvnorm(nObs, sigma=as.matrix(Sigma)) ## MLE of the sample covariance matrix S <- cov(X) ## more efficient MLE of the sample covariance matrix using HTF S_htf <- qpHTF(S, g) ## get the adjacency matrix and put the diagonal to one A <- as(g, "matrix") diag(A) <- 1 ## entries in S and S_htf for present edges in g should coincide max(abs(S_htf[A==1] - S[A==1])) ## entries in the inverse of S_htf for missing edges in g should be zero max(solve(S_htf)[A==0])