SEMdag | R Documentation |
Two-step extraction of the optimal DAG from an input (or empty)
graph, using in step 1) graph topological order or bottom-up search order,
and in step 2) parent recovery with the LASSO-based algorithm (FHT, 2010),
implemented in glmnet
.
SEMdag(
graph,
data,
LO = "TO",
beta = 0,
eta = NULL,
lambdas = NA,
penalty = TRUE,
verbose = FALSE,
...
)
graph |
An igraph object or a graph with no edges (make_empty_graph(n=0)). |
data |
A matrix whith n rows corresponding to subjects, and p columns to graph nodes (variables). |
LO |
character for linear order method. If LO="TO" or LO="TL" the
topological order (resp. level) of the input graph is enabled, while LO="BU"
the data-driven bottom-up search of vertex (resp. layer) order is performed
using the vertices of the empty graph. By default |
beta |
Numeric value. Minimum absolute LASSO beta coefficient for
a new direct link to be retained in the final model. By default,
|
eta |
Numeric value. Minimum fixed eta threshold for bottom-up search
of vertex (eta = 0) or layer (eta > 0) ordering. Use eta = NULL, for estimation
of eta adaptively with half of the sample data. By default, |
lambdas |
A vector of regularization LASSO lambda values. If lambdas is
NULL, the |
penalty |
A logical value. Separate penalty factors can be applied to
each coefficient. This is a number that multiplies lambda to allow differential
shrinkage. Can be 0 for some variables, which implies no shrinkage, and that
variable is always included in the model. If TRUE (default) weights are based
on the graph edges: 0 (i.e., edge present) and 1 (i.e., missing edge) ensures
that the input edges will be retained in the final model. If FALSE the
|
verbose |
A logical value. If FALSE (default), the processed graphs will not be plotted to screen. |
... |
Currently ignored. |
The extracted DAG is estimated using the two-step order search approach.
First a vertex (node) or level (layer) order of p nodes is determined, and from
this sort, the DAG can be learned using in step 2) penalized (L1) regressions
(Shojaie and Michailidis, 2010). The estimate linear order are obtained from
a priori graph topological vertex (TO) or level (TL) ordering, or with a
data-driven Bottom-up (BU) approach, assuming a SEM whose error terms have equal
variances (Peters and Bühlmann, 2014). The BU algorithm first estimates the last
element (the terminal vertex) using the diagonal entries of the inverse covariance
matrix with: t = argmin(diag(Omega)), or the terminal layer (> 1 vertices) with
d = diag(Omega)- t < eta. And then, it determines its parents with L1 regression.
After eliminating the last element (or layer) of the ordering, the algorithm applies
the same procedure until a DAG is completely estimated. In high-dimensional data
(n < p), the inverse covariance matrix is computed by glasso-based algorithm
(FHT, 2008), implemented in glasso
. If the input graph is
not acyclic, in TO or TL, a warning message will be raised, and a cycle-breaking
algorithm will be applied (see graph2dag
for details).
Output DAG will be colored: vertices in cyan, if they are source nodes, and in
orange, if they are sink nodes, and edges in gray, if they were present in the
input graph, and in green, if they are new edges generated by LASSO screening.
A list of 3 igraph objects plus the vertex ordering:
"dag", the estimated DAG;
"dag.new", new estimated connections;
"dag.old", connections preserved from the input graph;
"LO", the estimated vertex ordering.
Mario Grassi mario.grassi@unipv.it
Friedman J, Hastie T, Tibshirani R (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441. <https://doi.org/10.1093/biostatistics/kxm045>
Friedman J, Hastie T, Tibshirani R (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, Vol. 33(1), 1-22. <https://doi.org/10.18637/jss.v033.i01>
Shojaie A, Michailidis G (2010). Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika, 97(3): 519-538. <https://doi.org/10.1093/biomet/asq038>
Jankova J, van de Geer S (2015). Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics, 9(1): 1205-1229. <https://doi.org/10.1214/15-EJS1031>
Peters J, Bühlmann P (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101(1):219–228. <https://doi.org/10.1093/biomet/ast043>
modelSearch
#Set function param
ig <- sachs$graph
X <- log(sachs$pkc)
group <- sachs$group
# DAG estimation (default values)
dag0 <- SEMdag(ig, X)
sem0 <- SEMrun(ig, X, group)
# Graphs
old.par <- par(no.readonly = TRUE)
par(mfrow=c(2,2), mar=rep(1,4))
plot(sachs$graph, layout=layout.circle, main="input graph")
plot(dag0$dag, layout=layout.circle, main = "Output DAG")
plot(dag0$dag.old, layout=layout.circle, main = "Inferred old edges")
plot(dag0$dag.new, layout=layout.circle, main = "Inferred new edges")
par(old.par)
# Four DAG estimation
dag1 <- SEMdag(ig, X, LO="TO")
dag2 <- SEMdag(ig, X, LO="TL")
dag3 <- SEMdag(ig, X, LO="BU", eta=0)
dag4 <- SEMdag(ig, X, LO="BU", eta=NULL)
unlist(dag1$LO)
dag2$LO
unlist(dag3$LO)
dag4$LO
# Graphs
old.par <- par(no.readonly = TRUE)
par(mfrow=c(2,2), mar=rep(2,4))
gplot(dag1$dag, main="TO")
gplot(dag2$dag, main="TL")
gplot(dag3$dag, main="BU")
gplot(dag4$dag, main="TLBU")
par(old.par)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.