R/getRanking.R
In CompareCausalNetworks: Interface to Diverse Estimation Methods of Causal Networks

Documented in getRanking

#' Estimate a ranking of edges for causal relations in the underlying graph structure
#' using stability ranking.
#'
#' @description Estimates a ranking of edges for a given query, e.g. for 
#' parental relations in the underlying causal graph structure, using 
#' various possible methods. 
#' 
#' Supported methods at the moment are ARGES,
#' backShift, bivariateANM, bivariateCAM, CAM, FCI, FCI+, GES, GIES, hiddenICP, 
#' ICP, LINGAM, MMHC, rankARGES, rankFci, rankGES, rankGIES, rankPC, 
#' regression, RFCI and PC.
#'
#' @param X A \eqn{(n x p)}-data matrix with \eqn{n} observations of \eqn{p} variables.
#' @param environment A vector of length \eqn{n}, where the entry for 
#' observation \eqn{i} is an index for the environment in which observation \eqn{i} took 
#' place (simplest case entries \code{1} for observational data and entries
#'  \code{2} for interventional data of unspecified type). Is required for 
#'  methods \code{ICP}, \code{hiddenICP}, \code{backShift}.
#' @param interventions A optional list of length n. The entry for observation
#'  i is a numeric vector that specifies the variables on which interventions 
#'  happened for observation i (a scalar if an intervention happened on just 
#'  one variable and \code{numeric(0)} if no intervention occured for this 
#'  observation). Is used for method \code{gies} but will generate the vector 
#'  \code{environment} if this is set to \code{NULL} (even though it might 
#'  generate too many different environments for some data so a hand-picked 
#'  vector \code{environment} is preferable). Is also used for \code{ICP} and 
#'  \code{hiddenICP} to exclude interventions on the target variable of 
#'  interest.
#' @param queries One (or more of) "isParent", "isMaybeParent", "isNoParent",
#' "isAncestor","isMaybeAncestor", "isNoAncestor"
#' @param method A string that specfies the method to use. The methods 
#' \code{pc} (PC-algorithm), \code{LINGAM} (LINGAM), \code{arges} (Adaptively 
#' restricted greedy equivalence search), \code{ges} 
#' (Greedy equivalence search), \code{gies} (Greedy interventional equivalence 
#' search),  \code{fci} (Fast causal inference)  
#' and \code{rfci} (Really fast causal inference) are imported from the 
#' package "pcalg" and are documented there in more detail, including the 
#' additional options that can be supplied via \code{setOptions}. The method 
#' \code{CAM} (Causal additive models) is documented in the package "CAM" and 
#' the methods \code{ICP} (Invariant causal prediction), \code{hiddenICP} 
#' (Invariant causal prediction with hidden variables) are from the package 
#' "InvariantCausalPrediction". The method \code{backShift} comes from the 
#' package "backShift". The method \code{mmhc} comes from the 
#' package "bnlearn". 
#' Finally, the methods \code{bivariateANM} and 
#' \code{bivariateCAM} are for now implemented internally but will hopefully 
#' be part of another package at some point in the near future.
#' @param alpha The level at which tests are done. This leads to confidence 
#' intervals for \code{ICP} and \code{hiddenICP} and is used internally for 
#' \code{pc} and \code{rfci}.
#' @param variableSelMat An optional logical matrix of dimension (pxp). An 
#' entry \code{TRUE} for entry (i,j) says that variable i should be considered 
#' as a potential parent for variable j and vice versa for \code{FALSE}. If the 
#' default value of \code{NULL} is used, all variables will be considered, but 
#' this can be very slow, especially for methods \code{pc}, \code{ges}, 
#' \code{gies}, \code{rfci} and \code{CAM}.
#' @param excludeTargetInterventions When looking for parents of variable k 
#' in 1,...,p, set to \code{TRUE} if observations where an intervention on 
#' variable k occured should be excluded. Default is \code{TRUE}.
#' @param onlyObservationalData If set to \code{TRUE}, only observational data 
#' is used. It will take the index in \code{environment} specified by 
#' \code{indexObservationalData}. If \code{environment} is \code{NULL}, all 
#' observations are used. Default is \code{FALSE}.
#' @param indexObservationalData Index in \code{environment} that encodes 
#' observational data. Default is \code{1}.
#' @param setOptions A list that can take method-specific options; see the 
#' individual documentations of the methods for more options and their 
#' possible values.
#' @param assumeNoSelectionVars Set to \code{TRUE} is you want to assume the absence 
#' of selection variables.
#' @param nsim The number of resamples for stability selection.
#' @param sampleSettings The fraction of different environments to resample 
#'  in each resampling (at least two different environments will be selected so 
#'  the argument is without effect if there are just two different environments 
#'  in total).
#' @param sampleObservations The fraction of samples to resample in each 
#'  environment.
#' @param verbose If \code{TRUE}, detailed output is provided.
#' @param ... Parameters to be passed to underlying method's function.
#' 
#' @return A list with the following entries:
#' \itemize{
#' \item \code{ranking} A list of length \code{length(queries)}. For each query,
#' the corresponding list entry contains a matrix of dimension \eqn{(p x p) x 2} 
#' with the ranking of edges. E.g. the first row indicates that the edge from 
#' ranking$isParent[1,1] to ranking$isParent[1,2] is the most likely edge according
#' to the method under consideration. 
#' \item \code{resList}  A list of length \code{length(queries)}. For each query,
#' the corresponding list entry contains a matrix  of dimension \eqn{(p x p)} with the counts for 
#' \eqn{A_{i,j} = 1} across the \code{nsim} subsamples.
#' \item \code{simEstimates} A list of length \code{nsim} with the method's 
#' output for each of the \code{nsim} subsamples.
#' }
#'
#' @details For both parental and ancestral relations, three queries are supported. 
#' The existence of a relation is assessed by the queries \code{isParent} and 
#' \code{isAncestor}; the absence of a relation is assessed by the queries 
#' \code{isNoParent} and \code{isNoAncestor}; the potential existence of a 
#' relation is addressed by the queries \code{isMaybeParent} and 
#' \code{isMaybeAncestor}.
#' 
#' All queries return a connectivity matrix which we denote by \eqn{A}. 
#' The interpretation of the entries of \eqn{A} differs according to the considered query:
#' 
#' \strong{Parental relations:} Queries concerning parental relations can only 
#' be answered by those methods under consideration that return a DAG, a CPDAG 
#' or a directed cyclic graph. When we say that a particular method cannot 
#' answer a given query, then the method's output with respect to this query 
#' will be the zero matrix. However, the eventual ranking for such a query will 
#' not necessarily be random due to the tie breaking scheme that is applied 
#' when ranking pairs of variables (see below).
#' 
#' \enumerate{
#' \item \code{isParent} In the connectivity matrix \eqn{A} returned by this 
#' query, the entry \eqn{A_{i,j} = 1} means that there is \emph{a directed edge} 
#' from node \eqn{i} to node \eqn{j} in the graph structure estimated by the 
#' method under consideration. Otherwise, \eqn{A_{i,j} = 0}.
#' \item \code{isMaybeParent} \eqn{A_{i,j} = 1} means that there is 
#' \emph{a directed or an undirected edge} from node \eqn{i} to node \eqn{j}
#' in the estimated graph structure. Otherwise, \eqn{A_{i,j} = 0}.
#' \item \code{isNoParent} \eqn{A_{i,j} = 1} means that there is neither a 
#' directed nor an undirected edge from node \eqn{i} to node \eqn{j} in the 
#' estimated graph structure. Otherwise, \eqn{A_{i,j} = 0}.
#' }
#' 
#' \strong{Ancestral relations:} Queries concerning ancestral relations can be 
#' answered by all methods under consideration.
#' \enumerate{
#' \item \code{isAncestor} \eqn{A_{i,j} = 1} means that there is a 
#' \emph{directed path} from node \eqn{i} to node \eqn{j} in the estimated graph 
#' structure. Otherwise, \eqn{A_{i,j} = 0}. In case of PAGs, directed paths can 
#' contain the edge types \eqn{i --> j} and \eqn{i --o j}. Including the latter 
#' edge type in this category implies that we exclude the existence of selection 
#' variables.
#' \item \code{isMaybeAncestor} \eqn{A_{i,j} = 1} then means that there is a 
#' path from node \eqn{i} to node \eqn{j} that contains directed and/or undirected 
#' edges. Otherwise, \eqn{A_{i,j} = 0}.	For PAGs, such paths can contain the edge 
#' types \eqn{i --> j}, \eqn{i --o j}, \eqn{i o-o j} and/or 
#' \eqn{i o-> j}. Otherwise, \eqn{A_{i,j} = 0}.
#' \item \code{isNoAncestor} \eqn{A_{i,j} = 1} means that there is neither a 
#' directed path nor a partially directed path from node \eqn{i} to node \eqn{j}
#'  in the estimated graph structure. Otherwise, \eqn{A_{i,j} = 0}.
#'  }
#' 
#' \strong{Stability ranking:} To obtain a ranking of edges for a given set of 
#' queries, we run the method under consideration on \code{nsims} random 
#' subsamples of the data. In each round, we draw samples from a fraction of 
#' settings, where the size of the fraction is specified by \code{sampleSettings}. 
#' In each chosen setting, we sample a fraction of observations 
#' uniformly at random without replacement, where the size of the fraction is 
#' specified by  \code{sampleObservations}. 
#' 
#' For each subsample we randomly 
#' permute the order of the variables in the input. 
#' Methods that are order-dependent can therefore not exploit any potential 
#' advantage stemming from a data matrix with columns ordered according to the 
#' causal ordering or a similar one. We then run the method on each subsample. 
#' 
#' For each subsample and a particular query, we obtain the corresponding 
#' connectivity matrix \eqn{A}. We can then rank all pairs of nodes \eqn{i,j} 
#' according to the frequency of the occurrence of \eqn{A_{i,j} = 1} across 
#' subsamples. Ties between pairs of variables can be broken with the results 
#' of the other queries if they are also computed as specified by \code{queries}; 
#' otherwise ties are broken at random:
#' 
#' \itemize{
#' \item If the query is \code{isParent}, ties are broken with counts for 
#' \code{isMaybeParent}. 
#' \item For the query \code{isMaybeParent} ties are broken with counts for 
#' \code{isParent}, i.e. in case of equal counts we give a preference to the 
#' edge that was considered more often to be a 'certain' parent. For methods 
#' returning DAGs this scheme makes the ranking for \code{isMaybeParent} equal 
#' to the result for \code{isParent}, up to the random tie breaking that is 
#' applied for \code{isParent}.
#' \item If the query is \code{isNoParent}, ties are broken according to which 
#' edge was selected less often in the query \code{isMaybeParent}. 
#' \item If the query is \code{isAncestor}, ties are broken with counts for 
#' \code{isMaybeAncestor}. 
#' \item For the query \code{isMaybeAncestor} ties are broken with counts 
#' for \code{isAncestor}, i.e. in case of equal counts we give a preference 
#' to the edge that was considered more often to be a 'certain' ancestor. 
#' For methods returning DAGs this scheme makes the ranking for \code{isMaybeAncestor} 
#' equal to the result for \code{isAncestor}, up to the random tie breaking 
#' that is applied for \code{isAncestor}.
#' \item If the query is \code{isNoAncestor}, ties are broken according to 
#' which one was selected less often in the query \code{isMaybeAncestor}. 
#' } 
#' 
#' If the tie breaking matrix defined according to these rules is 0, 
#' a matrix with standard normal random entries is used to break ties. 
#' Similarly, if there are remaining ties after applying the tie breaking rules 
#' described above, ties are broken randomly.
#'     
#' @author Christina Heinze-Deml \email{heinzedeml@@stat.math.ethz.ch}
#' 
#' @seealso \code{\link{getParents}} for the underlying point-estimate of 
#' the causal graph.
#'
#' @examples 
#' data("simDataInv")
#' X <- simDataInv$X
#' set.seed(1)
#' if(require(pcalg)){
#'   rank <- getRanking(X,
#'                 environment = simDataInv$environment,
#'                 queries = c("isParent","isMaybeParent"),
#'                 method = c("LINGAM"),
#'                 verbose = FALSE)
#'   # estimated ranking
#'   print(rank$ranking$isParent)
#'  
#'   # true adjacency matrix
#'   print(simDataInv$configs$trueA)
#' }else{
#'   cat("\nThe packages 'pcalg' is needed for the example to
#' work. Please install it.")
#' }
#' 
#' @concept Causality
#' @concept Graph estimations
#'  
getRanking <- function(X, environment, interventions=NULL, 
                       queries = c("isParent", "isMaybeParent", "isNoParent",
                                   "isAncestor","isMaybeAncestor", "isNoAncestor"),
                       method= c("ICP", "hiddenICP", "backShift", "pc", 
                                 "LINGAM", "ges", "gies", "CAM", "fci", "rfci",
                                 "regression", "bivariateANM", 
                                 "bivariateCAM")[1],  
                       alpha=0.1, variableSelMat=NULL, 
                       excludeTargetInterventions=TRUE, 
                       onlyObservationalData=FALSE, 
                       indexObservationalData=NULL, 
                       setOptions=list(), 
                       assumeNoSelectionVars = TRUE,
                       nsim=100, 
                       sampleSettings=1/sqrt(2), 
                       sampleObservations=1/sqrt(2),
                       verbose=FALSE, ...){
  # number of variables
  p <- ncol(X)
  
  # find unique settings
  uniqueSettings <- unique(environment)
  
  # number of settings to draw in each round
  subs <- sampleSettings* length(uniqueSettings)
  
  # init result list
  simList <- vector("list", length = nsim)
  resList <- vector("list", length = length(queries))
  names(resList) <- queries
  resList <- lapply(resList, function(i) i <- matrix(0, ncol = p, nrow = p))
  for(i in 1:length(resList)) attr(resList[[i]], "name") <- queries[i]
  
  for (sim in 1:nsim){
    
    # TODO move sampling to new function
    
    # find observations to use in this round
    # only use observational data?
    if(onlyObservationalData){
      
      if(!is.null(indexObservationalData)){
        # if indexObservationalData is given, extract observational data
        ind <- which(environment %in% indexObservationalData)
        warning(paste("Will use only environment", 
                      indexObservationalData,
                      "(= observational data?) among the", 
                      length(uniqueSettings),
                      "given distinct environments for method", 
                      method, "(", length(ind),"observations)"))
      }else{
        # if indexObservationalData is set to NULL, use all data points
        ind <- 1:nrow(X)
        
        warning(paste("Will use all observations for method", method, 
                      "assuming all data points are observational data (", 
                      length(ind),"observations)"))
      }
      
      # sample from observational data
      useSamples <- sort(sample(ind, round(length(ind)*sampleObservations)))
    }else{
      # if onlyObservationalData is false, 
      # sample the settings to use and draw a balanced subsample
      useSettings <- sample(uniqueSettings, drawE(subs))
      useSamples <- NULL
      for(s in 1:length(useSettings)){
        ind <- which(  environment %in% useSettings[s])
        useSamples <- 
          c(useSamples, sort(sample(ind, round(length(ind)*sampleObservations))))
      }
    }
    
    # permutation of columns
    permuteCols <- sample(p)
    
    colnamesX <- if(is.null(colnames(X))) as.character(1:ncol(X)) else colnames(X)
    
    # run getParents with this subsample
    res <- try(getParents(X[useSamples,permuteCols], 
                      environment= environment[useSamples], 
                      interventions=interventions[useSamples],
                      parentsOf=1:p, 
                      method= method,  
                      alpha= alpha, 
                      mode = "raw",
                      variableSelMat=variableSelMat,  
                      excludeTargetInterventions= excludeTargetInterventions, 
                      onlyObservationalData=onlyObservationalData, 
                      indexObservationalData=indexObservationalData, 
                      returnAsList=FALSE, 
                      sparse = FALSE,
                      directed=FALSE, 
                      pointConf = FALSE, 
                      setOptions = setOptions, 
                      assumeNoSelectionVars = assumeNoSelectionVars,
                      verbose = verbose, ...))
    
    if(inherits(res, "try-error")){
      errorMsg <- geterrmessage()
      
      cat(paste("Error in method", method,
                            ". getParents() returned the following error:",
                errorMsg))
      
      # system is computationally singular
      if(length(grep("system is computationally singular", errorMsg)) > 0){
        res <- matrix(0, nrow = p, ncol = p)
      }else{
        stop(paste("Error in method", method,
                   ". getParents() returned the following error:",
                   errorMsg))
      }
      
    }
    
    # redo permutation of columns
    res <- res[order(permuteCols),order(permuteCols)]
    rownames(res) <- colnames(res) <- colnamesX

    simList[[sim]] <- res
    
    resultForQueries <- convertForRanking(res, queries, method = method)
    resList <- lapply(resList, function(r) r <- r + resultForQueries[attr(r, "name") == names(resultForQueries)][[1]])
   
  }
  
  resList <- lapply(resList, function(r) r/nsim*100)
  
  resList <- lapply(resList, function(resmat) {
    # add row and column names to result matrix
    rownames(resmat) <- colnames(resmat) <- colnamesX
    resmat
  })
  
  resListForRanking <- resList
  
  # if(is.element("isMaybeParent", queries) & is.element("isParent", queries)){
  #   resListForRanking$isMaybeParent <- resListForRanking$isMaybeParent + resListForRanking$isParent
  # }
  
  # if(is.element("isMaybeAncestor", queries) & is.element("isAncestor", queries)){
  #   resListForRanking$isMaybeAncestor <- resListForRanking$isMaybeAncestor + resListForRanking$isAncestor
  # }
  
  ranking <- lapply(resListForRanking, function(r){
    arrayInd(order(r, 
                   getVecTobreakTies(r, resListForRanking), # break ties with results for other queries
                   rnorm(ncol(r)^2), # if still ties, break randomly
                   decreasing = T), .dim = dim(r))
  })
  
  list(ranking = ranking,
       resList = resList, 
       simEstimates = simList)
}

getVecTobreakTies <- function(m, histList){
  if(attr(m, "name") == "isParent"){
    vec <- histList$isMaybeParent
  }else if(attr(m, "name") == "isMaybeParent"){
    vec <- histList$isParent
  }else if(attr(m, "name") == "isNoParent"){
    vec <-  max(histList$isMaybeParent) - histList$isMaybeParent
  }else if(attr(m, "name") == "isAncestor"){
    vec <-histList$isMaybeAncestor
  }else if(attr(m, "name") == "isMaybeAncestor"){
    vec <- histList$isAncestor
  }else if(attr(m, "name") == "isNoAncestor"){
    vec <-  max(histList$isMaybeAncestor) - histList$isMaybeAncestor
  }else{
    stop("Query not supported.")
  }
  
  if(!is.null(vec)){
    if(sum(vec) == 0) vec <- rnorm(ncol(m)^2)
  }else{
    vec <- rnorm(ncol(m)^2)
  }
  
  vec
}
Any scripts or data that you put into this service are public.
CompareCausalNetworks documentation built on April 14, 2020, 7:31 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
CompareCausalNetworks
Interface to Diverse Estimation Methods of Causal Networks

R/getRanking.R
In CompareCausalNetworks: Interface to Diverse Estimation Methods of Causal Networks

Defines functions getVecTobreakTies getRanking

Documented in getRanking

Try the CompareCausalNetworks package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

CompareCausalNetworks Interface to Diverse Estimation Methods of Causal Networks

R/getRanking.R In CompareCausalNetworks: Interface to Diverse Estimation Methods of Causal Networks

Defines functions getVecTobreakTies getRanking

Documented in getRanking

Try the CompareCausalNetworks package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

CompareCausalNetworks
Interface to Diverse Estimation Methods of Causal Networks

R/getRanking.R
In CompareCausalNetworks: Interface to Diverse Estimation Methods of Causal Networks