rfci | R Documentation |
Estimate an RFCI-PAG from observational data, using the RFCI-algorithm.
rfci(suffStat, indepTest, alpha, labels, p,
skel.method = c("stable", "original", "stable.fast"),
fixedGaps = NULL, fixedEdges = NULL, NAdelete = TRUE,
m.max = Inf, rules = rep(TRUE, 10),
conservative = FALSE, maj.rule = FALSE,
numCores = 1, verbose = FALSE)
suffStat |
Sufficient statistics: List containing all necessary
elements for the conditional independence decisions in the
function |
indepTest |
Predefined function for testing conditional independence. The
function is internally called as |
alpha |
significance level (number in |
labels |
(optional) character vector of variable (or
“node”) names. Typically preferred to specifying |
p |
(optional) number of variables (or nodes). May be specified
if |
skel.method |
Character string specifying method; the default,
|
fixedGaps |
A logical matrix of dimension p*p. If entry
|
fixedEdges |
A logical matrix of dimension p*p. If entry
|
NAdelete |
If indepTest returns |
m.max |
Maximum size of the conditioning sets that are considered in the conditional independence tests. |
rules |
Logical vector of length 10 indicating which rules should be used when directing edges. The order of the rules is taken from Zhang (2009). |
conservative |
Logical indicating if the unshielded triples should be checked for ambiguity after the skeleton has been found, similar to the conservative PC algorithm. |
maj.rule |
Logical indicating if the unshielded triples should be checked for ambiguity after the skeleton has been found using a majority rule idea, which is less strict than the conservative. |
numCores |
Specifies the number of cores to be used for parallel
estimation of |
verbose |
If true, more detailed output is provided. |
This function is rather similar to fci. However, it does not compute any Possible-D-SEP sets and thus does not make tests conditioning on subsets of Possible-D-SEP. This makes RFCI much faster than FCI. The orientation rules for v-structures and rule 4 were modified in order to produce an RFCI-PAG which, in the oracle version, is guaranteed to have the correct ancestral relationships.
The first part of the RFCI algorithm is analogous to the PC and FCI
algorithm. It starts with a complete undirected graph and estimates an
initial skeleton using the function skeleton
, which
produces an initial order-independent skeleton, see
skeleton
for more details. All edges
of this skeleton are of the form o-o. Due to the presence of hidden
variables, it is no longer sufficient to consider only subsets of the
neighborhoods of nodes x
and y
to decide whether the
edge x o-o y
should be removed. The FCI algorithm performs
independence tests conditioning on subsets of Possible-D-SEP to remove
those edges. Since this procedure is computationally infeasible, the
RFCI algorithm uses a different approach to remove some of those
superfluous edges before orienting the v-structures and the
discriminating paths in orientation rule 4.
Before orienting the v-structures, we perform the following additional conditional independence tests. For each unshielded triple a-b-c in the initial skeleton, we check if both a and b and b and c are conditionally dependent given the separating of a and c (sepset(a,c)). These conditional dependencies may not have been checked while estimating the initial skeleton, since sepset(a,c) does not need to be a subset of the neighbors of a nor of the neighbors of c. If both conditional dependencies hold and b is not in the sepset(a,c), the triple is oriented as a v-structure a->b<-c. On the other hand, if an additional conditional independence relationship may be detected, say a is independent from b given the sepset(a,c), the edge between a and c is removed from the graph and the set responsible for that is saved in sepset(a,b). The removal of an edge can destroy or create new unshielded triples in the graph. To solve this problem we work with lists (for details see Colombo et al., 2012).
Before orienting discriminating paths, we perform the following additional
conditional independence tests. For each triple a <-* b o- *c with a
-> c, the algorithm searches for a discriminating path p = <d,
. . . , a,b,c> for b of minimal length, and checks that the vertices
in every consecutive pair (f1,f2) on p are conditionally dependent
given all subsets of
\textrm{sepset}(d,c) \setminus \{f1, f2\}
. If we do not find any
conditional independence relationship, the path is oriented as in rule
(R4). If one or more conditional independence relationships are found,
the corresponding edges are removed, their minimal separating sets are
stored.
Conservative RFCI can be computed if the argument of conservative
is
TRUE
. After the final skeleton is computed and the
additional local tests on all unshielded triples, as described above,
have been done, all potential v-structures a-b-c are checked
in the following way. We test whether a and c are independent
conditioning on any subset of the neighbors of a or any subset of the
neighbors of c. When a subset makes a and c conditionally independent,
we call it a separating set. If b is in no such separating set or in
all such separating sets, no further action is taken and the normal
version of the RFCI algorithm is continued. If, however, b is in only
some separating sets, the triple a-b-c is marked 'ambiguous'. If a is
independent of c given some S in the skeleton (i.e., the edge a-c
dropped out), but a and c remain dependent given all subsets of
neighbors of either a or c, we will call all triples a-b-c
'unambiguous'. This is because in the RFCI algorithm, the true
separating set might be outside the neighborhood of either a or c. An
ambiguous triple is not oriented as a v-structure. Furthermore, no further
orientation rule that needs to know whether a-b-c is a v-structure or
not is applied. Instead of using the conservative version, which is
quite strict towards the v-structures, Colombo and Maathuis (2014)
introduced a less strict version for the v-structures called majority
rule. This adaptation can be called using maj.rule = TRUE
. In
this case, the triple a-b-c is marked as 'ambiguous' if and only if b
is in exactly 50 percent of such separating sets or no separating set
was found. If b is in less than 50 percent of the separating sets it
is set as a v-structure, and if in more than 50 percent it is set as a
non v-structure (for more details see Colombo and Maathuis,
2014).
The implementation uses the stabilized skeleton
skeleton
, which produces an initial order-independent
skeleton. The final skeleton and edge orientations can still be
order-dependent, see Colombo and Maathuis (2014).
An object of class
fciAlgo
(see
fciAlgo
) containing the estimated graph
(in the form of an adjacency matrix with various possible edge marks),
the conditioning sets that lead to edge removals (sepset) and several other
parameters.
Diego Colombo and Markus Kalisch (kalisch@stat.math.ethz.ch).
D. Colombo and M.H. Maathuis (2014).Order-independent constraint-based causal structure learning. Journal of Machine Learning Research 15 3741-3782.
D. Colombo, M. H. Maathuis, M. Kalisch, T. S. Richardson (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist. 40, 294-321.
fci
and fciPlus
for estimating a
PAG using the FCI algorithm;
skeleton
for estimating an initial skeleton
using the RFCI algorithm; pc
for estimating a CPDAG using
the PC algorithm; gaussCItest
,
disCItest
, binCItest
and
dsepTest
as examples for indepTest
.
##################################################
## Example without latent variables
##################################################
set.seed(42)
p <- 7
## generate and draw random DAG :
myDAG <- randomDAG(p, prob = 0.4)
## find skeleton and PAG using the RFCI algorithm
suffStat <- list(C = cov2cor(trueCov(myDAG)), n = 10^9)
indepTest <- gaussCItest
res <- rfci(suffStat, indepTest, alpha = 0.9999, p=p, verbose=TRUE)
##################################################% --------------
## Example with hidden variables
## Zhang (2008), Fig. 6, p.1882
##################################################
## create the DAG :
V <- LETTERS[1:5]
edL <- setNames(vector("list", length = 5), V)
edL[[1]] <- list(edges=c(2,4),weights=c(1,1))
edL[[2]] <- list(edges=3,weights=c(1))
edL[[3]] <- list(edges=5,weights=c(1))
edL[[4]] <- list(edges=5,weights=c(1))
## and leave edL[[ 5 ]] empty
g <- new("graphNEL", nodes=V, edgeL=edL, edgemode="directed")
if (require(Rgraphviz))
plot(g)
## define the latent variable
L <- 1
## compute the true covariance matrix of g
cov.mat <- trueCov(g)
## delete rows and columns belonging to latent variable L
true.cov <- cov.mat[-L,-L]
## transform covariance matrix into a correlation matrix
true.corr <- cov2cor(true.cov)
## find PAG with RFCI algorithm
## as dependence "oracle", we use the true correlation matrix in
## gaussCItest() with a large "virtual sample size" and a large alpha :
rfci.pag <- rfci(suffStat = list(C = true.corr, n = 10^9),
indepTest = gaussCItest, alpha = 0.9999, labels = V[-L],
verbose=TRUE)
## define PAG given in Zhang (2008), Fig. 6, p.1882
corr.pag <- rbind(c(0,1,1,0),
c(1,0,0,2),
c(1,0,0,2),
c(0,3,3,0))
## check that estimated and correct PAG are in agreement:
stopifnot(corr.pag == rfci.pag@amat)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.