evalCand: Evaluate candidate levels and select the optimal one
In fionarhuang/treeclimbR: An algorithm to find optimal signal levels in a tree

Description

Evaluate all candidate levels proposed by `getCand` and select the one with the best performance. For details of how the scoring is done, see Huang et al. (2021): https://doi.org/10.1186/s13059-021-02368-1.
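
As a rough illustration of the `method` and `limit_rej` arguments, the base R sketch below adjusts the p-values of the nodes in a single candidate level with `p.adjust()` and counts how many fall at or below the threshold. The node names and p-values are made up, and the sketch is not treeclimbR's internal code: the scoring in Huang et al. (2021) also checks leaf-level FDR control (the `t` and `r` columns of `level_info`), which is omitted here.

```r
## Illustrative sketch only, not treeclimbR's internal code.
## Hypothetical p-values for the nodes of one candidate level.
level_pvalues <- c(n1 = 0.0004, n2 = 0.0008, n3 = 0.0002,
                   n4 = 0.30, n5 = 0.65)

## Adjust within the candidate level ("BH" is evalCand's default method)
adjusted <- p.adjust(level_pvalues, method = "BH")

## Nodes whose adjusted p-value is at or below the FDR threshold
limit_rej <- 0.05
rejected <- names(adjusted)[adjusted <= limit_rej]
rejected
length(rejected)
```

Here the three small p-values survive BH adjustment at 0.05 and the two large ones do not. Passing a different `method` (any value accepted by `p.adjust`) changes only the adjustment step.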

Usage

evalCand(
  tree,
  type = c("single", "multiple"),
  levels,
  score_data = NULL,
  node_column,
  p_column,
  sign_column,
  feature_column = NULL,
  method = "BH",
  limit_rej = 0.05,
  use_pseudo_leaf = FALSE,
  message = FALSE
)

Arguments

`tree`	A `phylo` object.
`type`	A character scalar indicating whether the evaluation is for a differential abundance (DA) workflow (`type = "single"`) or a differential state (DS) workflow (`type = "multiple"`).
`levels`	A list of candidate levels, as returned by `getCand`. If `type = "single"`, each element of the list is a candidate level, and the elements are named by the value of the tuning parameter. If `type = "multiple"`, a nested list is required: the outer list is named by feature (e.g., genes), and each of its elements is a list of candidate levels for that feature.
`score_data`	A `data.frame` (`type = "single"`) or a list of `data.frame`s (`type = "multiple"`). Each `data.frame` must have at least one column containing the node IDs (defined by `node_column`), one column with p-values (defined by `p_column`), one column with the direction of change (defined by `sign_column`) and one optional column with the feature (`feature_column`, for `type="multiple"`).
`node_column`	The name of the column that contains the node information.
`p_column`	The name of the column that contains p-values of nodes.
`sign_column`	The name of the column that contains the direction of the (estimated) change.
`feature_column`	The name of the column that contains information about the feature ID.
`method`	The multiple testing correction method; see the argument `method` of `p.adjust`. Default is "BH".
`limit_rej`	The desired false discovery rate threshold.
`use_pseudo_leaf`	A logical scalar. If `FALSE`, the FDR is calculated on the leaf level of the tree; if `TRUE`, it is calculated on the pseudo-leaf level. The pseudo-leaf level is the level closest to the leaves on which entities have sufficient data to run the analysis.
`message`	A logical scalar, indicating whether progress messages should be printed.
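
The nesting expected by `levels` can be easier to see on toy objects. The sketch below builds a hypothetical `type = "single"` input and a `type = "multiple"` input for two made-up features (`geneA`, `geneB`). It assumes each candidate level is a vector of node IDs, as in the `candidate_list` returned by `getCand`; the node IDs and tuning values are placeholders not tied to any real tree, and in practice these lists come from `getCand` rather than being typed by hand.

```r
## For type = "single": one list of candidate levels, named by the
## tuning-parameter value. Node IDs are placeholders, not tied to a real tree.
levels_single <- list("0" = c(1, 2, 3, 4), "0.05" = c(6, 4))

## For type = "multiple": an outer list named by feature, where each
## element is a list of candidate levels shaped like the one above.
levels_multi <- list(
    geneA = levels_single,
    geneB = list("0" = c(1, 2, 3, 4), "0.05" = c(1, 2, 7))
)

## Inspect the nesting: features first, then tuning-parameter values
names(levels_multi)
lapply(levels_multi, names)
```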

Value

A list with the following components:

candidate_best: The best candidate level
output: Node-level information for best candidate level
candidate_list: A list of candidates
level_info: Summary information of all candidates
FDR: The specified FDR level
method: The method used for multiple testing correction.
column_info: A list with the specified node, p-value, sign and feature column names

More details about the columns in level_info:

t: The threshold, i.e., the tuning parameter value used to build the candidate.
r: The upper limit of t that controls the FDR on the leaf level.
is_valid: Whether the threshold lies in the range that controls the leaf-level FDR.
limit_rej: The specified FDR.
level_name: The name of the candidate level.
rej_leaf: The number of rejections on the leaf level.
rej_pseudo_leaf: The number of rejected pseudo-leaf nodes.
rej_node: The number of rejections on the tested candidate level (leaves or internal nodes).
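
Because `level_info` is a `data.frame`, it can be filtered and sorted with base R. The sketch below uses a small hand-made table with the documented columns (the values are invented, not package output) to list the candidates flagged `is_valid` and rank them by `rej_leaf`. It is a reading aid only and is not claimed to reproduce how `evalCand` chooses `candidate_best`.

```r
## Invented values for illustration only; in practice, use cc$level_info
level_info <- data.frame(
    t               = c(0, 0.05, 0.5),
    r               = c(0.3, 0.3, 0.3),
    is_valid        = c(TRUE, TRUE, FALSE),
    limit_rej       = 0.05,
    level_name      = c("0", "0.05", "0.5"),
    rej_leaf        = c(4, 5, 6),
    rej_pseudo_leaf = c(4, 5, 6),
    rej_node        = c(4, 2, 1),
    stringsAsFactors = FALSE
)

## Keep candidates whose threshold lies in the leaf-FDR-controlling range
valid <- level_info[level_info$is_valid, ]

## Rank the valid candidates by the number of leaf-level rejections
ranked <- valid[order(valid$rej_leaf, decreasing = TRUE),
                c("level_name", "t", "rej_leaf", "rej_node")]
ranked

## Valid candidate with the most leaf rejections
top <- valid$level_name[which.max(valid$rej_leaf)]
top
```

Note that the invalid row ("0.5") has the most leaf rejections but is dropped before ranking.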

Author(s)

Ruizhu Huang

Examples

suppressPackageStartupMessages({
    library(treeclimbR)
    library(TreeSummarizedExperiment)
    library(ggtree)
})

## Generate example tree and assign p-values and signs to each node
data(tinyTree)
ggtree(tinyTree, branch.length = "none") +
   geom_text2(aes(label = node)) +
   geom_hilight(node = 13, fill = "blue", alpha = 0.5) +
   geom_hilight(node = 18, fill = "orange", alpha = 0.5)
set.seed(1)
pv <- runif(19, 0, 1)
pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001)

fc <- sample(c(-1, 1), 19, replace = TRUE)
fc[c(seq_len(3), 13, 14)] <- 1
fc[c(4, 5, 18)] <- -1
df <- data.frame(node = seq_len(19),
                 pvalue = pv,
                 logFoldChange = fc)

## Propose candidates
ll <- getCand(tree = tinyTree, score_data = df,
               node_column = "node",
               p_column = "pvalue",
               sign_column = "logFoldChange")

## Evaluate candidates
cc <- evalCand(tree = tinyTree, levels = ll$candidate_list,
               score_data = ll$score_data, node_column = "node",
               p_column = "pvalue", sign_column = "logFoldChange",
               limit_rej = 0.05)

## Best candidate
cc$candidate_best

## Details for best candidate
cc$output

fionarhuang/treeclimbR documentation built on June 14, 2025, 4:30 p.m.