
View source: R/calc_rowscore.R

calc_rowscore    R Documentation

Calculate row-wise scores of a given binary feature set based on a given scoring method

Description

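Calculate a score for each row (feature) of a binary feature set, measuring how that feature's presence/absence pattern across samples relates to a vector of continuous input scores, using a given scoring method.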

Usage

calc_rowscore(
  FS,
  input_score,
  meta_feature = NULL,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "custom"),
  method_alternative = c("less", "greater", "two.sided"),
  custom_function = NULL,
  custom_parameters = NULL,
  weights = NULL,
  do_check = TRUE,
  verbose = FALSE,
  ...
)

Arguments

FS

a matrix of binary features or a SummarizedExperiment object from the SummarizedExperiment package, where rows represent features of interest (e.g. genes, transcripts, exons) and columns represent samples. The assay of FS contains binary (1/0) values indicating the presence (1) or absence (0) of each omics feature.
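
The FS requirements above can be checked in base R before calling calc_rowscore. The sketch below uses a hypothetical helper, is_binary_fs(), which is not part of CaDrA; for a SummarizedExperiment input, the binary matrix would first be extracted with SummarizedExperiment::assay(FS).

```r
# Hypothetical helper (not part of CaDrA): TRUE if `fs` is a numeric
# matrix containing only 0/1 values, with row (feature) and column
# (sample) names
is_binary_fs <- function(fs) {
  is.matrix(fs) &&
    is.numeric(fs) &&
    !anyNA(fs) &&
    all(fs %in% c(0, 1)) &&
    !is.null(rownames(fs)) &&
    !is.null(colnames(fs))
}

fs <- matrix(c(1, 0, 1,
               0, 1, 0), nrow = 2, byrow = TRUE,
             dimnames = list(c("TP_1", "TP_2"), c("s1", "s2", "s3")))

is_binary_fs(fs)        # TRUE
is_binary_fs(fs * 0.5)  # FALSE: 0.5 is not a valid entry
```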

input_score

a vector of continuous scores representing a phenotypic readout of interest such as protein expression, pathway activity, etc.

NOTE: input_score must have names that match the column names of FS.
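
Because matching is done by name, input_score can be supplied in any order as long as every column name of FS is present. A minimal sketch of that alignment, using a hypothetical helper align_scores() that is not part of CaDrA:

```r
# Hypothetical helper (not part of CaDrA): returns input_score re-ordered
# to the column order of fs, stopping if any sample name is absent
align_scores <- function(fs, input_score) {
  if (is.null(names(input_score))) stop("input_score must be named")
  absent <- setdiff(colnames(fs), names(input_score))
  if (length(absent) > 0) {
    stop("input_score is missing samples: ", paste(absent, collapse = ", "))
  }
  input_score[colnames(fs)]
}

fs <- matrix(c(1, 0, 1), nrow = 1,
             dimnames = list("TP_1", c("s1", "s2", "s3")))
scores <- c(s3 = 0.9, s1 = -0.2, s2 = 1.4)

align_scores(fs, scores)  # returns c(s1 = -0.2, s2 = 1.4, s3 = 0.9)
```

Aligning by name before any positional indexing matters: the custom example under Examples re-orders FS to names(input_score) for the same reason.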

meta_feature

a character vector of one or more feature names (row names of FS) representing known causes of activation or features associated with the response of interest (e.g. input_score). Default is NULL.

method

a character string specifying the scoring method used in the search. There are 6 options: "ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer" (conditional mutual information from REVEALER), or "custom" (a user-defined scoring method). Default is "ks_pval".
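
The usage shows method with a vector of choices as its default. In R this idiom is conventionally resolved with base::match.arg(), which returns the first choice when the argument is omitted and raises an error for values outside the list. Assuming calc_rowscore follows that convention (not confirmed by this page), the behavior looks like this with a hypothetical stand-in, resolve_method():

```r
# Hypothetical stand-in (not CaDrA code) showing how a character-vector
# default is conventionally resolved with match.arg()
resolve_method <- function(method = c("ks_pval", "ks_score", "wilcox_pval",
                                      "wilcox_score", "revealer", "custom")) {
  match.arg(method)
}

resolve_method()               # "ks_pval": the first choice is the default
resolve_method("revealer")     # "revealer"
try(resolve_method("t_test"))  # error: not one of the listed choices
```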

method_alternative

a character string specifying the alternative hypothesis ("two.sided", "greater", or "less"). Default is "less" for left-skewed significance testing.

NOTE: This argument applies only to the "ks_pval" and "wilcox_pval" methods.
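
The direction of "less" follows the convention of R's stats::ks.test(): the alternative hypothesis is that the distribution function of x lies below that of y, so x tends to take larger values. The sketch below assumes, as in the custom example under Examples, that x holds the input scores of samples where a feature is present and y those where it is absent; the numbers are made up for illustration.

```r
# Made-up scores: samples with the feature (x) score higher than
# samples without it (y)
x <- c(2.1, 2.5, 3.0, 2.8)
y <- c(0.1, -0.5, 0.3, 0.2, -1.0)

# stats::ks.test(): alternative = "less" means the CDF of x lies below
# the CDF of y, i.e. x is stochastically larger
p_less    <- ks.test(x, y, alternative = "less")$p.value
p_greater <- ks.test(x, y, alternative = "greater")$p.value

p_less     # small: evidence that x tends to be larger
p_greater  # large: no evidence that x tends to be smaller
```

Note that stats::wilcox.test() uses the opposite sense, where "less" means x is shifted toward smaller values, so how calc_rowscore maps this argument for "wilcox_pval" is not shown here.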

custom_function

if method is "custom", a user-defined scoring function. Default is NULL.

NOTE: custom_function must accept FS and input_score as input arguments and must return a vector of row-wise scores whose names match the row names of FS.

custom_parameters

if method is "custom", a list of additional arguments (excluding FS and input_score) to be passed to custom_function, for example custom_parameters = list(alternative = "less"). Default is NULL.
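
As an illustration of the custom_function contract and of custom_parameters, the sketch below defines a hypothetical Wilcoxon-based scoring function, wilcox_rowscore() (not CaDrA's built-in "wilcox_pval" method), and forwards extra arguments with do.call(). How calc_rowscore forwards custom_parameters internally is an assumption here.

```r
# Hypothetical custom scoring function (not CaDrA's built-in method):
# -log p-value of a Wilcoxon rank-sum test per row, comparing input_score
# between samples with (1) and without (0) the feature
wilcox_rowscore <- function(FS, input_score, alternative = "two.sided") {
  input_score <- input_score[colnames(FS)]  # align scores to columns by name
  pvals <- apply(FS, 1, function(r) {
    wilcox.test(input_score[r == 1], input_score[r == 0],
                alternative = alternative)$p.value
  })
  scores <- -log(pvals)
  names(scores) <- rownames(FS)  # names must match the row names of FS
  scores
}

FS <- matrix(c(1, 1, 1, 0, 0, 0,
               1, 0, 0, 0, 0, 1), nrow = 2, byrow = TRUE,
             dimnames = list(c("TP_1", "TP_2"), letters[1:6]))
input_score <- c(a = 6, b = 5, c = 4, d = 3, e = 2, f = 1)

# Assumption: extra arguments in custom_parameters are appended to FS and
# input_score, roughly like this
custom_parameters <- list(alternative = "greater")
do.call(wilcox_rowscore,
        c(list(FS = FS, input_score = input_score), custom_parameters))
```

Here TP_1 is present in the three highest-scoring samples, so it receives the larger score, -log(0.05), while TP_2 spans both ends of input_score and scores much lower.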

weights

if method is "ks_score" or "ks_pval", a vector of weights; supplying it performs a weighted KS test. Default is NULL.

NOTE: weights must have names that match the names of input_score.

do_check

a logical value indicating whether to check that the given inputs (FS and input_score) are valid. Default is TRUE.

verbose

a logical value indicating whether to print diagnostic messages. Default is FALSE.

...

additional parameters to be passed to custom_function.

Value

A vector of positive row-wise scores, ordered from most to least significant (i.e. from highest to lowest value), whose names match the row names of FS.
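
A score vector of this shape can be built from per-feature p-values by taking -log and sorting in decreasing order, as the custom example under Examples does apart from the final sort. The p-values below are hypothetical, and built-in methods such as "revealer" may not be p-value based.

```r
# Hypothetical per-feature p-values (illustration only)
pval <- c(TP_1 = 0.20, TP_2 = 1e-4, TP_3 = 0.03, TP_4 = 0)

# Replace 0 with the smallest positive normalized double to avoid -log(0)
pval[pval == 0] <- .Machine$double.xmin

# Positive scores, ordered from most to least significant
scores <- sort(-log(pval), decreasing = TRUE)
names(scores)  # "TP_4" "TP_2" "TP_3" "TP_1"
```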

Examples


# Create a feature matrix
# byrow=TRUE fills row by row, so each line below is one feature (row)
mat <- matrix(c(1,0,1,0,0,0,0,0,1,0,
                0,0,1,0,1,0,1,0,0,0,
                0,0,0,0,1,0,1,0,1,0), nrow=3, byrow=TRUE)

colnames(mat) <- 1:10
row.names(mat) <- c("TP_1", "TP_2", "TP_3")

# Create a vector of observed input scores
set.seed(42)
input_score <- rnorm(n = ncol(mat))
names(input_score) <- colnames(mat)

# Run the ks method
ks_rowscore_result <- calc_rowscore(
  FS = mat,
  input_score = input_score,
  meta_feature = NULL,
  method = "ks_pval",
  method_alternative = "less",
  weights = NULL
)

# Run the wilcoxon method
wilcox_rowscore_result <- calc_rowscore(
  FS = mat,
  input_score = input_score,
  meta_feature = NULL,
  method = "wilcox_pval",
  method_alternative = "less"
)

# Run the revealer method
revealer_rowscore_result <- calc_rowscore(
  FS = mat,
  input_score = input_score,
  meta_feature = NULL,
  method = "revealer"
)

# A customized function using ks-test function
customized_ks_rowscore <- function(FS, input_score, meta_feature=NULL, alternative="less"){
  
  # Check if meta_feature is provided
  if(!is.null(meta_feature)){
    # Getting the position of the known meta features
    locs <- match(meta_feature, row.names(FS))
    
    # Taking the union across the known meta features
    if(length(locs) > 1) {
      meta_vector <- as.numeric(ifelse(colSums(FS[locs,]) == 0, 0, 1))
    }else{
      meta_vector <- as.numeric(FS[locs,])
    }
     
    # Remove the meta features from the binary feature matrix
    # and take the logical OR between each remaining feature and the meta vector
    FS <- base::sweep(FS[-locs, , drop=FALSE], 2, meta_vector, `|`)*1
     
    # Check whether taking the union produced any features that are all 1s.
    # Such features have no samples left in the 0 group, so no statistic
    # can be computed for them and they must be filtered out
    if(any(rowSums(FS) == ncol(FS))){
      warning("Features with all 1s generated from taking the matrix union ",
              "will be removed before progressing...\n")
      FS <- FS[rowSums(FS) != ncol(FS), , drop=FALSE]
    }
  }
   
  # KS is a rank-based method
  # Sort input_score from highest to lowest values
  input_score <- sort(input_score, decreasing=TRUE)
  
  # Re-order the columns of FS to match input_score; this alignment matters
  # because each row is matched to input_score by position below
  FS <- FS[, names(input_score), drop=FALSE]
  
  # Compute the scores using the KS method
  ks <- apply(FS, 1, function(r){
    x <- input_score[which(r == 1)]
    y <- input_score[which(r == 0)]
    res <- ks.test(x, y, alternative=alternative)
    return(c(res$statistic, res$p.value))
  })
  
  # Obtain score statistics
  stat <- ks[1,]
  
  # Obtain p-values and replace values of 0 with the smallest positive
  # normalized double to avoid taking -log(0)
  pval <- ks[2,]
  pval[which(pval == 0)] <- .Machine$double.xmin
  
  # Compute the -log(pval)
  # Make sure scores have names that match the row names of FS
  scores <- -log(pval)
  names(scores) <- rownames(FS)
  
  return(scores)
  
}

# Search for best features using a custom-defined function
custom_rowscore_result <- calc_rowscore(
  FS = mat,
  input_score = input_score,
  meta_feature = NULL,
  method = "custom",
  custom_function = customized_ks_rowscore,            
  custom_parameters = NULL  
)


montilab/CaDrA documentation built on March 15, 2024, 9:59 p.m.