calc_rowscore: Calculate row-wise scores of a given binary feature set based...

View source: R/calc_rowscore.R

calc_rowscore    R Documentation

Calculate row-wise scores of a given binary feature set based on a given scoring method

Description

Calculate row-wise scores of a given binary feature set based on a given scoring method

Usage

calc_rowscore(
  FS_mat,
  input_score,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "custom"),
  custom_function = NULL,
  custom_parameters = NULL,
  alternative = c("less", "greater", "two.sided"),
  weight = NULL,
  seed_names = NULL,
  do_check = TRUE,
  verbose = FALSE,
  ...
)

Arguments

FS_mat

a matrix of binary features where rows represent features of interest (e.g. genes, transcripts, exons) and columns represent samples.

input_score

a vector of continuous scores representing a phenotypic readout of interest such as protein expression, pathway activity, etc.

NOTE: input_score object must have names or labels that match the column names of FS_mat object.

method

a character string specifying the scoring method used in the search. There are 6 options: "ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer" (conditional mutual information from REVEALER), or "custom" (a customized scoring method). Default is "ks_pval".

custom_function

if method is "custom", specifies the customized function to use. Default is NULL.

NOTE: custom_function() must take FS_mat and input_score as its input arguments, and it must return a vector of row-wise scores, ordered from most significant to least significant, whose names match the row names of FS_mat.

custom_parameters

if method is "custom", specifies a list of additional arguments (excluding FS_mat and input_score) to be passed to custom_function. Default is NULL.

alternative

a character string specifying the alternative hypothesis ("two.sided", "greater", or "less"). Default is "less" for left-skewed significance testing.

NOTE: This argument applies only to the KS and Wilcoxon methods.

weight

if method is "ks_score" or "ks_pval", supplying a vector of weights performs a weighted KS test. Default is NULL.

seed_names

a vector of one or more features representing known “causes” of activation or features associated with a response of interest. It is applied for method = "revealer" only.

do_check

a logical value indicating whether to validate the given inputs (FS_mat and input_score). Default is TRUE.

verbose

a logical value indicating whether to print diagnostic messages. Default is FALSE.

...

additional parameters to be passed to custom_function
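
The input requirements described above (a binary FS_mat, and an input_score whose names match the column names of FS_mat) can be checked by hand before calling calc_rowscore. The sketch below uses base R only. The helper check_inputs is hypothetical and is not part of CaDrA; it illustrates the kind of validation the arguments imply, not what do_check actually does internally.

```r
# Hypothetical helper (not part of CaDrA): validate and align the inputs
check_inputs <- function(FS_mat, input_score) {
  stopifnot(
    is.matrix(FS_mat),
    all(FS_mat %in% c(0, 1)),          # features must be binary
    !is.null(rownames(FS_mat)),
    !is.null(colnames(FS_mat)),
    is.numeric(input_score),
    !is.null(names(input_score)),
    # every sample must appear in both objects
    setequal(names(input_score), colnames(FS_mat))
  )
  # Re-order the scores so they line up with the columns of FS_mat
  input_score[colnames(FS_mat)]
}

# Toy inputs: 2 features x 3 samples, with scores given in a different order
toy_mat <- matrix(c(1, 0, 1, 0, 0, 1), nrow = 2,
                  dimnames = list(c("f1", "f2"), c("s1", "s2", "s3")))
toy_score <- c(s3 = 0.5, s1 = -1.2, s2 = 0.3)

aligned_score <- check_inputs(toy_mat, toy_score)
```

Returning the re-ordered vector, rather than only TRUE or FALSE, makes it easy to pass aligned inputs straight on to the scoring call.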

Value

a vector of row-wise scores, ordered from most significant to least significant (e.g. from highest to lowest values), whose names match the row names of FS_mat.
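
Because the result is a named vector ordered from most to least significant, the top-ranked features can be read directly from its first elements. The sketch below uses made-up scores in place of a real calc_rowscore result; the values and the variable names (scores, top_n, best_features) are illustrative assumptions.

```r
# Made-up scores standing in for a calc_rowscore() result,
# already ordered from most to least significant
scores <- c(TP_2 = 3.1, TP_1 = 1.4, TP_3 = 0.2)

# Names of the top N features
top_n <- 2
best_features <- names(head(scores, top_n))

# The names can then be used to subset the rows of a feature matrix,
# e.g. FS_mat[best_features, , drop = FALSE]
```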

Examples


# Create a feature matrix
mat <- matrix(c(1,0,1,0,0,0,0,0,1,0, 
                0,0,1,0,1,0,1,0,0,0,
                0,0,0,0,1,0,1,0,1,0), nrow=3)

colnames(mat) <- 1:10
row.names(mat) <- c("TP_1", "TP_2", "TP_3")

# Create a vector of observed input scores
set.seed(42)
input_score <- rnorm(n = ncol(mat))
names(input_score) <- colnames(mat)

# Run the ks method
ks_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "ks_pval",
  weight = NULL,
  alternative = "less"
)

# Run the wilcoxon method
wilcox_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "wilcox_pval",
  alternative = "less"
)

# Run the revealer method
revealer_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "revealer",
  seed_names = NULL
)

# A customized function using ks-test function
customized_rowscore <- function(FS_mat, input_score, alternative="less"){
  
  ks <- apply(FS_mat, 1, function(r){
    # Split the scores by whether the feature is present (1) or absent (0)
    x <- input_score[which(r == 1)]
    y <- input_score[which(r == 0)]
    res <- ks.test(x, y, alternative = alternative)
    return(c(res$statistic, res$p.value))
  })
  
  # Obtain score statistics and p-values from KS method
  stat <- ks[1,]
  pval <- ks[2,]
  
  # Compute the -log scores from the p-values
  # Make sure the scores have names that match the row names of FS_mat
  scores <- -log(pval)
  names(scores) <- rownames(FS_mat)

  # Remove infinite scores, which result from taking -log(0).
  # They are uninformative.
  scores <- scores[is.finite(scores)]

  # Re-order the scores in decreasing order (from most to least significant).
  # This comes in handy when evaluating the top N 'best' features.
  scores <- scores[order(scores, decreasing = TRUE)]
  
  return(scores)
  
}

# Search for best features using a custom-defined function
custom_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "custom",
  custom_function = customized_rowscore,
  custom_parameters = NULL
)
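
The examples above leave custom_parameters as NULL. To illustrate how a list of extra arguments can reach a custom scoring function, the sketch below splices such a list into a call with base R's do.call. This is an assumption about the general mechanism, not a description of CaDrA's internals. The function mean_diff_rowscore and its min_count argument are hypothetical.

```r
# Hypothetical scoring function with one extra argument (min_count):
# score each feature by the difference in mean input_score between
# samples with the feature (1) and samples without it (0)
mean_diff_rowscore <- function(FS_mat, input_score, min_count = 1) {
  scores <- apply(FS_mat, 1, function(r) {
    # Skip features with too few positive samples or no negative samples
    if (sum(r == 1) < min_count || sum(r == 0) < 1) return(NA_real_)
    mean(input_score[r == 1]) - mean(input_score[r == 0])
  })
  scores <- scores[!is.na(scores)]
  # Order from highest to lowest, keeping the feature names
  sort(scores, decreasing = TRUE)
}

# Same toy inputs as in the examples above
mat <- matrix(c(1,0,1,0,0,0,0,0,1,0,
                0,0,1,0,1,0,1,0,0,0,
                0,0,0,0,1,0,1,0,1,0), nrow = 3)
colnames(mat) <- 1:10
row.names(mat) <- c("TP_1", "TP_2", "TP_3")

set.seed(42)
input_score <- rnorm(n = ncol(mat))
names(input_score) <- colnames(mat)

# Extra arguments collected in a list, excluding FS_mat and input_score
custom_parameters <- list(min_count = 2)

# Splice the list into the call alongside the two required arguments
res <- do.call(
  mean_diff_rowscore,
  c(list(FS_mat = mat, input_score = input_score), custom_parameters)
)
```

Under this assumption, the same list would be supplied as custom_parameters = list(min_count = 2) together with method = "custom" and custom_function = mean_diff_rowscore.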


RC-88/CaDrA documentation built on March 28, 2023, 12:18 a.m.