candidate_search: Candidate Search
In montilab/CaDrA: Candidate Driver Analysis

candidate_search

R Documentation

Candidate Search

Description

Performs a heuristic search over a set of binary features to determine whether there are features whose union is more skewed (enriched at the extremes) than any of the individual features alone. This is the main function of the CaDrA package.
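To make the idea of a "more skewed union" concrete, here is a minimal base-R sketch. It is not CaDrA's implementation: the sample ranks, the two features, and the helper `ks_p` are all hypothetical, and a plain two-sample `stats::ks.test` stands in for the package's scoring methods. It only illustrates that the union of two features can separate from the remaining samples more strongly than either feature does alone.

```r
# Illustrative only: a base-R sketch of the idea, not CaDrA's implementation.
# 20 hypothetical samples, already ranked by a phenotype score (rank 1 = top).
sample_rank <- 1:20

# Two hypothetical binary features, each present in three top-ranked samples.
feature_a <- as.integer(sample_rank %in% 1:3)
feature_b <- as.integer(sample_rank %in% 4:6)

# The union (logical OR) of the two features is the candidate meta-feature.
meta_feature <- as.integer(feature_a | feature_b)

# Two-sample KS test comparing the ranks of samples that carry the feature
# with the ranks of samples that do not (two-sided, the ks.test default).
ks_p <- function(feature) {
  stats::ks.test(sample_rank[feature == 1], sample_rank[feature == 0])$p.value
}

p_values <- c(
  a = ks_p(feature_a),
  b = ks_p(feature_b),
  union = ks_p(meta_feature)
)
print(p_values)
```

In this toy setup the union yields a smaller p-value than either feature on its own, which is the kind of improvement the search looks for when it extends a meta-feature.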

Usage

candidate_search(
  FS,
  input_score,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "custom"),
  method_alternative = c("less", "greater", "two.sided"),
  custom_function = NULL,
  custom_parameters = NULL,
  weights = NULL,
  search_start = NULL,
  top_N = 1,
  search_method = c("both", "forward"),
  max_size = 7,
  best_score_only = FALSE,
  do_plot = FALSE,
  verbose = FALSE
)

Arguments

`FS`	a matrix of binary features, or a SummarizedExperiment object from the SummarizedExperiment package, where rows represent features of interest (e.g. genes, transcripts, exons) and columns represent samples. The assay of `FS` contains binary (1/0) values indicating the presence/absence of omics features.
`input_score`	a vector of continuous scores representing a phenotypic readout of interest such as protein expression, pathway activity, etc. NOTE: `input_score` object must have names or labels that match the column names of `FS` object.
`method`	a character string specifying the scoring method used in the search. There are 6 options: `"ks_pval"`, `"ks_score"`, `"wilcox_pval"`, `"wilcox_score"`, `"revealer"` (conditional mutual information from REVEALER), or `"custom"` (a user-defined scoring method). Default is `"ks_pval"`.
`method_alternative`	a character string specifying the alternative hypothesis for the test (`"two.sided"`, `"greater"`, or `"less"`). Default is `"less"` for left-skewed significance testing. NOTE: This argument only applies to the `"ks_pval"` and `"wilcox_pval"` methods.
`custom_function`	if `method` is `"custom"`, a user-defined scoring function. Default is `NULL`. NOTE: `custom_function` must take `FS` and `input_score` as input arguments and must return a vector of row-wise scores whose names match the row names of `FS`.
`custom_parameters`	if `method` is `"custom"`, a list of additional arguments (excluding `FS` and `input_score`) to be passed to `custom_function`. For example: custom_parameters = list(alternative = "less"). Default is `NULL`.
`weights`	if `method` is `"ks_score"` or `"ks_pval"`, supplying a vector of weights performs a weighted KS test. Default is `NULL`. NOTE: `weights` must have names or labels that match the labels of `input_score`.
`search_start`	a character vector of feature names in `FS` with which to start the search. If `search_start` is provided, the `top_N` parameter is ignored, and vice versa. Default is `NULL`.
`top_N`	an integer specifying the number of top-scoring features to start the search from. By default, the search starts from the single best-scoring feature (top_N = 1). NOTE: If `top_N` is provided, the `search_start` parameter is ignored, and vice versa. Setting top_N > 10 may result in a longer search time.
`search_method`	a character string specifying the algorithm used to select the best features (`"forward"` or `"both"`). Default is `"both"` (i.e. backward and forward).
`max_size`	an integer specifying the maximum number of features a meta-feature can be extended to in a given search. Default is `7`.
`best_score_only`	a logical value indicating whether to return only the best score corresponding to the top N searches. Default is `FALSE`.
`do_plot`	a logical value indicating whether to plot the overlapping features of the resulting meta-feature matrix. NOTE: a plot can only be produced if the resulting meta-feature matrix contains more than 1 feature (e.g. length(search_start) > 1 or top_N > 1). Default is `FALSE`.
`verbose`	a logical value indicating whether to print diagnostic messages. Default is `FALSE`.
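The naming requirements on `input_score` and the contract for `custom_function` described above can be sketched in base R as follows. Everything here is an assumption made for illustration: the toy `FS` matrix, the score values, the function `mean_diff_score`, and its `na_rm` argument are hypothetical and not part of CaDrA. The sketch assumes `FS` is a plain matrix rather than a SummarizedExperiment object, and it makes no claim about whether higher or lower custom scores are treated as better by the search.

```r
# Hypothetical inputs: 3 binary features (rows) x 6 samples (columns).
FS <- matrix(
  c(1, 1, 0, 0, 0, 0,
    0, 1, 1, 0, 0, 0,
    0, 0, 0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat_1", "feat_2", "feat_3"), paste0("sample_", 1:6))
)

# Hypothetical named scores, deliberately given in a different sample order.
input_score <- c(sample_3 = 2.1, sample_1 = 3.4, sample_2 = 2.9,
                 sample_6 = -1.7, sample_4 = 0.3, sample_5 = -0.8)

# The names of input_score must match the column names of FS;
# check this, then reorder the scores to follow the columns of FS.
stopifnot(setequal(names(input_score), colnames(FS)))
input_score <- input_score[colnames(FS)]

# Hypothetical scoring function following the documented contract:
# it takes FS and input_score and returns one score per row of FS,
# named by the row names of FS. Here the score is the difference in mean
# input_score between samples with and without the feature.
mean_diff_score <- function(FS, input_score, na_rm = TRUE) {
  scores <- apply(FS, 1, function(hits) {
    mean(input_score[hits == 1], na.rm = na_rm) -
      mean(input_score[hits == 0], na.rm = na_rm)
  })
  names(scores) <- rownames(FS)
  scores
}

row_scores <- mean_diff_score(FS, input_score)
print(row_scores)
```

Going by the argument descriptions above, a function like this would be supplied with method = "custom" and custom_function = mean_diff_score, with any extra arguments passed as custom_parameters = list(na_rm = TRUE).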

Details

NOTE: The legacy function topn_eval is equivalent to the recommended candidate_search function.

Value

If best_score_only = TRUE, the heuristic search returns the best feature, i.e. the one whose union meta-feature matrix has the highest score among the top_N feature searches. If best_score_only = FALSE, a list of objects pertaining to the top_N feature searches is returned. For each top_N feature search, the result contains 7 objects: (1) its best meta-feature matrix (feature_set), (2) its observed input scores (input_score), (3) the best score of the union meta-feature matrix (score), (4) the names of the best meta-features (best_features), (5) the rank of the best meta-features in terms of their best scores (best_indices), (6) the marginal scores of the best meta-features (marginal_best_scores), and (7) the cumulative scores of the best meta-features (cumulative_best_scores).

Examples


# Load the package
library(CaDrA)

# Load pre-computed feature set
data(sim_FS)

# Load pre-computed input scores
data(sim_Scores)

# Define additional parameters and run the function
candidate_search_result <- candidate_search(
  FS = sim_FS, input_score = sim_Scores, 
  method = "ks_pval", method_alternative = "less", weights = NULL, 
  search_start = NULL, top_N = 3, search_method = "both",
  max_size = 7, best_score_only = FALSE
)
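When best_score_only = FALSE, the result is described under Value as a list with one entry per search, each holding elements such as score and best_features. The helper below is a minimal sketch of how such a list could be condensed into a table. `summarize_search` is a hypothetical helper, not a CaDrA function, and `mock_result` is a hand-built stand-in with placeholder values rather than real output; the element names are taken from the Value section and the exact structure should be checked against an actual result.

```r
# Hypothetical helper: condense a list shaped like the documented return
# value into one row per search (assumes each entry has `score` and
# `best_features`, as listed in the Value section).
summarize_search <- function(result) {
  data.frame(
    search = seq_along(result),
    score = vapply(result, function(x) as.numeric(x$score)[1], numeric(1)),
    best_features = vapply(
      result,
      function(x) paste(x$best_features, collapse = ", "),
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

# Hand-built stand-in with placeholder values, NOT real CaDrA output.
mock_result <- list(
  list(score = 0.01, best_features = c("feat_1", "feat_2")),
  list(score = 0.20, best_features = "feat_3")
)

summary_df <- summarize_search(mock_result)
print(summary_df)
```

If the real return value has the shape assumed here, the same helper could be applied as summarize_search(candidate_search_result).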

montilab/CaDrA documentation built on Aug. 22, 2024, 11:55 p.m.