Hastie Tibshirani Friedman iterative regression algorithm

Share:

Description

Performs maximum likelihood estimation of a covariance matrix given the independence constraints from an input undirected graph.

Usage

1
qpHTF(S, g, tol = 0.001, verbose = FALSE, R.code.only = FALSE)

Arguments

S

input matrix, in the context of this package, the sample covariance matrix.

g

input undirected graph.

tol

tolerance under which the iterative algorithm stops.

verbose

show progress on calculations.

R.code.only

logical; if FALSE then the faster C implementation is used (default); if TRUE then only R code is executed.

Details

This is an alternative to the Iterative Proportional Fitting (IPF) algorithm (see, Whittaker, 1990, pp. 182-185 and qpIPF) which also adjusts the input matrix to the independence constraints in the input undirected graph. However, differently to the IPF, it works by going through each of the vertices fitting the marginal distribution over the corresponding vertex boundary. It stops when the adjusted matrix at the current iteration differs from the matrix at the previous iteration in less or equal than a given tolerance value. This algorithm is described by Hastie, Tibshirani and Friedman (2009, pg. 634), hence we name it here HTF, and it has the advantage over the IPF that it does not require the list of maximal cliques of the graph which may be exponentially large. In contrast, it requires that the maximum boundary size of the graph is below the number of samples where the input sample covariance matrix S was estimated. For the purpose of exploring qp-graphs that meet such a requirement, one can use the function qpBoundary.

Value

The input matrix adjusted to the constraints imposed by the input undirected graph, i.e., a maximum likelihood estimate of the sample covariance matrix that includes the independence constraints encoded in the undirected graph.

Note

Thanks to Giovanni Marchetti for bringing us our attention to this algorithm and sharing an early version of its implementation on the R package ggm.

Author(s)

R. Castelo

References

Castelo, R. and Roverato, A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n. J. Mach. Learn. Res., 7:2621-2650, 2006.

Hastie, T., Tibshirani, R. and Friedman, J.H. The Elements of Statistical Learning, Springer, 2009.

Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198(4):1377-1393, 2014.

Whittaker, J. Graphical Models in Applied Multivariate Statistics. Wiley, 1990.

See Also

qpBoundary qpIPF qpPAC

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
require(graph)
require(mvtnorm)

nVar <- 50  ## number of variables
nObs <- 100 ## number of observations to simulate

set.seed(123)

g <- randomEGraph(as.character(1:nVar), p=0.15)

Sigma <- qpG2Sigma(g, rho=0.5)
X <- rmvnorm(nObs, sigma=as.matrix(Sigma))

## MLE of the sample covariance matrix
S <- cov(X)

## more efficient MLE of the sample covariance matrix using HTF
S_htf <- qpHTF(S, g)

## get the adjacency matrix and put the diagonal to one
A <- as(g, "matrix")
diag(A) <- 1

## entries in S and S_htf for present edges in g should coincide
max(abs(S_htf[A==1] - S[A==1]))

## entries in the inverse of S_htf for missing edges in g should be zero
max(solve(S_htf)[A==0])

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.