
View source: R/compare.R

rankAgainstReference    R Documentation

Compare multiple methods and rank against reference accordingly

Description

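Compare a differential gene expression signature (or a gene set) against reference data using one or more methods (Spearman correlation, Pearson correlation or GSEA) and rank the results accordingly.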

Usage

rankAgainstReference(
  input,
  reference,
  method = c("spearman", "pearson", "gsea"),
  geneSize = 150,
  cellLines = NULL,
  cellLineMean = "auto",
  rankByAscending = TRUE,
  rankPerCellLine = FALSE,
  threads = 1,
  chunkGiB = 1,
  verbose = FALSE
)

Arguments

input

Named numeric vector of differentially expressed genes, whose names are gene identifiers and whose values are a statistic representing the significance and magnitude of differential expression (e.g. t-statistics); or a character vector of gene symbols composing a gene set to test for enrichment in reference data (only used if method includes gsea)

reference

Data matrix (or character string with its file path) of CMap perturbations (see prepareCMapPerturbations()) or of gene expression and drug sensitivity associations (see loadExpressionDrugSensitivityAssociation())

method

Character: comparison method (spearman, pearson or gsea; multiple methods may be selected at once)

geneSize

Numeric: number of top up- and down-regulated genes used as gene sets to test for enrichment in reference data; if a numeric vector of length 2, the first element is the number of top up-regulated genes and the second is the number of top down-regulated genes; only used if method includes gsea and input is not a gene set

cellLines

Integer: number of unique cell lines

cellLineMean

Boolean or "auto": add rows with the mean score of each method across cell lines? If cellLineMean = "auto" (default), these rows are added only when data for more than one cell line is available.

rankByAscending

Boolean: rank values based on their ascending (TRUE) or descending (FALSE) order?

rankPerCellLine

Boolean: rank results based on both individual cell lines and mean scores across cell lines (TRUE) or based on mean scores alone (FALSE)? If cellLineMean = FALSE, individual cell line conditions are always ranked.

threads

Integer: number of parallel threads

chunkGiB

Numeric: if reference is a path to an HDF5 file (.h5 extension), that file is loaded and processed in chunks of the given size in gibibytes (GiB); lower values decrease peak RAM usage (see "Process data by chunks" below)

verbose

Boolean: print additional details?
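A minimal base-R sketch of the two input shapes accepted by input, plus one way geneSize could turn a statistic vector into up-/down-regulated gene sets. Gene symbols and values are made up, and topGeneSets() is a hypothetical helper written for illustration, not a cTRAP function; it assumes "top up-regulated" means the highest values and "top down-regulated" the lowest.

```r
# Made-up differential expression statistics: names are gene identifiers,
# values are t-statistics
diffExpr <- c(CDKN1A = 5.0, TP53 = 4.2, EGFR = 2.7, GAPDH = 0.1,
              MYC = -3.1, VIM = -3.8)

# A gene set, by contrast, is just a character vector of gene symbols
geneSet <- c("CDKN1A", "TP53", "MDM2")

# Hypothetical helper (not part of cTRAP): select top up-/down-regulated
# genes following the documented geneSize convention (one number for both,
# or c(nUp, nDown))
topGeneSets <- function(stat, geneSize = 150) {
  if (length(geneSize) == 1) geneSize <- rep(geneSize, 2)
  list(up   = head(names(sort(stat, decreasing = TRUE)), geneSize[1]),
       down = head(names(sort(stat)), geneSize[2]))
}

sets <- topGeneSets(diffExpr, geneSize = c(2, 1))
sets$up    # "CDKN1A" "TP53"
sets$down  # "VIM"
```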

Value

Data table with correlation and/or GSEA score results
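As a rough illustration of the correlation scores and of the rankByAscending switch, the base-R sketch below assumes each reference column is one perturbation profile and that correlation is computed over the genes shared with the input. These are assumptions for illustration; the data are made up and cTRAP's internals may differ.

```r
# Made-up input signature and reference matrix (genes x perturbations)
input <- c(A = 2.1, B = -1.3, C = 0.4, D = 3.0, E = -2.2)
reference <- matrix(
  c(1.8, -0.9, 0.1, 2.5, -1.7, 0.3,
    -2.0, 1.1, 0.2, -2.8, 2.4, 0.0),
  ncol = 2,
  dimnames = list(c("A", "B", "C", "D", "E", "F"), c("pert1", "pert2")))

# Correlate each perturbation with the input over shared genes only;
# result: one row per perturbation, one column per method
genes <- intersect(names(input), rownames(reference))
scores <- sapply(c("spearman", "pearson"), function(m)
  apply(reference[genes, , drop = FALSE], 2, cor, y = input[genes], method = m))

# rankByAscending = TRUE gives rank 1 to the lowest score;
# rankByAscending = FALSE gives rank 1 to the highest score
ascending  <- rank(scores[, "spearman"])
descending <- rank(-scores[, "spearman"])
```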

Process data by chunks

If a file path to a valid HDF5 (.h5) file is provided instead of a data matrix, that file can be loaded and processed in chunks of size chunkGiB, resulting in decreased peak memory usage.

The default value of 1 GiB (1 GiB = 1024^3 bytes) allows loading chunks of ~10000 columns and 14000 rows, since each double-precision value takes 8 bytes (10000 * 14000 * 8 bytes / 1024^3 = 1.04 GiB).
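The chunk-size arithmetic can be checked in base R. colsPerChunk() is a hypothetical helper, not part of cTRAP, that estimates how many columns fit in one chunk for a given row count, assuming 8-byte double-precision values and no other memory overhead.

```r
# Size of a 14000-row x 10000-column chunk of doubles, in GiB
gib <- 1024^3
chunkBytes <- 14000 * 10000 * 8
round(chunkBytes / gib, 2)  # 1.04

# Hypothetical helper (not part of cTRAP): number of whole columns that fit
# in chunkGiB for a matrix with the given number of rows
colsPerChunk <- function(nrows, chunkGiB = 1, bytesPerValue = 8) {
  floor(chunkGiB * 1024^3 / (nrows * bytesPerValue))
}
colsPerChunk(14000)                  # 9586
colsPerChunk(14000, chunkGiB = 0.5)  # 4793
```

Halving chunkGiB roughly halves the number of columns held in memory at once, which is the trade-off between peak RAM usage and the number of chunks to process.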

GSEA score

When method = "gsea", weighted connectivity scores (WTCS) are calculated (https://clue.io/connectopedia/cmap_algorithms).
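The linked Connectopedia page defines WTCS from two enrichment scores, ES_up and ES_down, obtained with a weighted Kolmogorov-Smirnov statistic: WTCS = (ES_up - ES_down) / 2 when the two scores have different signs, and 0 otherwise. The base-R sketch below illustrates that definition for a single reference profile, assuming a GSEA-style running sum with weight exponent 1. The function names and data are hypothetical, the score is not normalized, and this is not cTRAP's implementation.

```r
# Hypothetical helper: GSEA-style weighted running-sum enrichment score
# of geneSet within one reference profile (weight exponent p = 1)
enrichmentScore <- function(profile, geneSet, p = 1) {
  profile <- sort(profile, decreasing = TRUE)  # rank genes by score
  hit <- names(profile) %in% geneSet
  w <- abs(profile)^p
  # Undefined without both hits and misses, or if all hit weights are zero
  if (!any(hit) || all(hit) || sum(w[hit]) == 0) return(0)
  pHit  <- cumsum(ifelse(hit, w, 0)) / sum(w[hit])
  pMiss <- cumsum(!hit) / sum(!hit)
  dev <- pHit - pMiss
  unname(dev[which.max(abs(dev))])             # signed maximum deviation
}

# Hypothetical helper: weighted connectivity score for up/down gene sets
wtcs <- function(profile, up, down) {
  esUp   <- enrichmentScore(profile, up)
  esDown <- enrichmentScore(profile, down)
  if (sign(esUp) == sign(esDown)) 0 else (esUp - esDown) / 2
}

# Made-up reference profile for one perturbation
profile <- c(A = 5, B = 4, C = 3, D = -1, E = -2, F = -6)
wtcs(profile, up = c("A", "B"), down = c("E", "F"))  # 1
wtcs(profile, up = c("E", "F"), down = c("A", "B"))  # -1
```

A positive WTCS means the up-regulated genes concentrate at the top of the reference profile and the down-regulated genes at the bottom; a negative WTCS means the reverse.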


nuno-agostinho/cTRAP documentation built on March 28, 2024, 3:59 p.m.