CaDrA: CaDrA Search
In montilab/CaDrA: Candidate Driver Analysis

CaDrA

R Documentation

CaDrA Search

Description

Perform permutation-based testings on a sample of permuted input scores using candidate_search as the main iterative function for each run.

Usage

CaDrA(
  FS,
  input_score,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "custom"),
  method_alternative = c("less", "greater", "two.sided"),
  custom_function = NULL,
  custom_parameters = NULL,
  weights = NULL,
  search_start = NULL,
  top_N = 1,
  search_method = c("both", "forward"),
  max_size = 7,
  n_perm = 1000,
  perm_alternative = c("one.sided", "two.sided"),
  obs_best_score = NULL,
  smooth = TRUE,
  plot = FALSE,
  ncores = 1,
  cache = FALSE,
  cache_path = NULL,
  verbose = FALSE
)

Arguments

`FS`	a matrix of binary features or a SummarizedExperiment class object from SummarizedExperiment package where rows represent features of interest (e.g. genes, transcripts, exons, etc...) and columns represent the samples. The assay of FS contains binary (1/0) values indicating the presence/absence of omics features.
`input_score`	a vector of continuous scores representing a phenotypic readout of interest such as protein expression, pathway activity, etc. NOTE: `input_score` object must have names or labels that match the column names of `FS` object.
`method`	a character string specifies a scoring method that is used in the search. There are 6 options: (`"ks_pval"` or `ks_score` or `"wilcox_pval"` or `wilcox_score` or `"revealer"` (conditional mutual information from REVEALER) or `"custom"` (a user-defined scoring method)). Default is `ks_pval`.
`method_alternative`	a character string specifies an alternative hypothesis testing (`"two.sided"` or `"greater"` or `"less"`). Default is `less` for left-skewed significance testing. NOTE: This argument only applies to `ks_pval` and `wilcox_pval` method
`custom_function`	If method is `"custom"`, specifies a user-defined function here. Default is `NULL`. NOTE: `custom_function` must take `FS` and `input_score` as its input arguments and its final result must return a vector of row-wise scores where its labels or names match the row names of `FS` object.
`custom_parameters`	If method is `"custom"`, specifies a list of additional arguments (excluding `FS` and `input_score`) to be passed to `custom_function`. For example: custom_parameters = list(alternative = "less"). Default is `NULL`.
`weights`	If method is `ks_score` or `ks_pval`, specifying a vector of weights will perform a weighted-KS testing. Default is `NULL`. NOTE: `weights` must have names or labels that match the labels of `input_score`.
`search_start`	a vector of character strings (separated by commas) specifies feature names in the FS object to start the search with. If `search_start` is provided, then `top_N` parameter will be ignored and vice versa. Default is `NULL`.
`top_N`	an integer specifies the number of features to start the search over. By default, it starts with the feature that has the highest best score (top_N = 1). NOTE: If `top_N` is provided, then `search_start` parameter will be ignored and vice versa. If top_N > 10, it may result in a longer search time.
`search_method`	a character string specifies an algorithm to filter out the best candidates (`"forward"` or `"both"`). Default is `both` (i.e., backward and forward).
`max_size`	an integer specifies a maximum size that a meta-feature can extend to do for a given search. Default is `7`.
`n_perm`	an integer specifies the number of permutations to perform. Default is `1000`.
`perm_alternative`	an alternative hypothesis type for calculating permutation-based p-value. Options: one.sided, two.sided. Default is `one.sided`.
`obs_best_score`	a numeric value corresponding to the best observed score. This value is used to compare against the `n_perm` calculated best scores. Default is `NULL`. If set to NULL, we will compute the observed best score based on the given parameters.
`smooth`	a logical value indicates whether or not to add a smoothing factor of 1 to the calculation of permutation-based p-value. This option is used to avoid a returned p-value of 0. Default is `TRUE`.
`plot`	a logical value indicates whether or not to plot the empirical null distribution of the permuted best scores. Default is `FALSE`.
`ncores`	an integer specifies the number of cores to perform parallelization for permutation-based testing. Default is `1`.
`cache`	a logical value determines whether or not to cache the permuted best scores. This helps to save time for future loading instead of re-computing the permutation-based testing every time. Default is `FALSE`.
`cache_path`	a path to cache permuted best scores. Default is `NULL`. If NULL, the cache path is set to system home directory (e.g. `$HOME/.Rcache`) for future loading.
`verbose`	a logical value indicates whether or not to print the diagnostic messages. Default is `FALSE`.

Value

a list of 4 objects: key, perm_best_scores, obs_best_score, perm_pval

-key: a list of parameters that was used to cache the results of the permutation-based testing. This is useful as the permuted best scores can be recycled to save time for future loading.

-perm_best_scores: a vector of permuted best scores obtained by performing candidate_search over n_perm iterations of permuted input scores.

-obs_best_score: a user-provided best score or an observed best score obtained by performing candidate_search on a given dataset and input parameters. This value is later used to compare against the permuted best scores (perm_best_scores).

perm_pval: a permutation-based p-value obtained by calculating sum(perm_best_scores > obs_best_score)/n_perm

NOTE: If smooth = TRUE, a smoothing factor of 1 will be added to the calculation of perm_pval.

e.g. (sum(perm_best_scores > obs_best_score) + 1) / (n_perm + c)

This is just to not return a p-value of 0

Examples


# Load pre-computed feature set
data(sim_FS)

# Load pre-computed input-score
data(sim_Scores)

# Set seed for permutation
set.seed(21)

# Define additional parameters and start the function
cadra_result <- CaDrA(
  FS = sim_FS, input_score = sim_Scores, method = "ks_pval", 
  weights = NULL, method_alternative = "less", top_N = 1,
  search_start = NULL, search_method = "both", max_size = 7, 
  n_perm = 10, perm_alternative = "one.sided", plot = FALSE, 
  smooth = TRUE, obs_best_score = NULL,
  ncores = 1, cache = FALSE, cache_path = NULL
)

montilab/CaDrA documentation built on Aug. 22, 2024, 11:55 p.m.