classify_gsva_percent: Sample classification according to pathway activity using a...

View source: R/classification.R

classify_gsva_percent    R Documentation

Sample classification according to pathway activity using a percentile threshold for assessing expression consistency with both the up-regulated and down-regulated gene-sets of a gene signature.

Description

Classifies samples according to pathway activity. Samples are first ranked by their expression abundance of the up-regulated gene-set and then of the down-regulated gene-set, using GSVA scores as the measure of expression abundance. Percentile thresholds are then used to assess whether each sample's expression is consistent with both the up-regulated and down-regulated gene-sets.

Usage

classify_gsva_percent(expr_mat, sig_df, percent_thresh = 25)

Arguments

expr_mat

Normalised expression matrix containing the expression levels of genes (rows) for each sample (columns) in a data set. Row names are gene symbols and column names are sample IDs or names. The matrix can contain normalised (logCPM-transformed) RNA-seq or microarray transcriptomic data.

sig_df

Gene expression signature for a specific pathway, given as a data frame. The first column, named "gene", contains the genes that are most differentially expressed when the pathway is active. The second column, named "expression", contains the corresponding direction of expression: -1 for down-regulated genes and 1 for up-regulated genes.

percent_thresh

Percentile threshold (0-100) used to check the consistency of a sample's gene expression with first the up-regulated and then the down-regulated gene-set of the gene signature (default = 25, i.e. the quartile). For example, with a threshold of 25, samples ranked in the top 25% for the up-regulated gene-set and in the bottom 25% for the down-regulated gene-set are classified as "Active". Likewise, samples ranked in the bottom 25% for the up-regulated gene-set and in the top 25% for the down-regulated gene-set are classified as "Inactive".
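
The percentile rule described for percent_thresh can be illustrated with a minimal base R sketch. This is not the package's implementation: the score vectors up_scores and down_scores are made-up numbers standing in for per-sample GSVA scores, and using quantile() cut-offs (rather than explicit ranks) is an assumption made for brevity. Labelling every remaining sample "Uncertain" is likewise an assumption, based on the classes listed under Value.

```r
# Hypothetical per-sample scores for the up- and down-regulated gene-sets
# (made-up numbers; in practice these would come from the GSVA algorithm)
up_scores <- c(S1 = 0.9, S2 = 0.6, S3 = 0.1, S4 = -0.2,
               S5 = -0.4, S6 = -0.6, S7 = -0.8, S8 = -0.9)
down_scores <- c(S1 = -0.9, S2 = 0.0, S3 = 0.2, S4 = -0.1,
                 S5 = 0.1, S6 = 0.3, S7 = 0.5, S8 = 0.9)

percent_thresh <- 25

# Score cut-offs marking the bottom and top percent_thresh percent of samples
cut_probs <- c(percent_thresh, 100 - percent_thresh) / 100
up_cut <- quantile(up_scores, probs = cut_probs, names = FALSE)
down_cut <- quantile(down_scores, probs = cut_probs, names = FALSE)

# Flag samples falling in the extreme ends of each gene-set's scores
high_up <- up_scores >= up_cut[2]
low_up <- up_scores <= up_cut[1]
high_down <- down_scores >= down_cut[2]
low_down <- down_scores <= down_cut[1]

# Assumption: samples meeting neither rule are labelled "Uncertain"
classes <- rep("Uncertain", length(up_scores))
classes[high_up & low_down] <- "Active"     # up-set high and down-set low
classes[low_up & high_down] <- "Inactive"   # up-set low and down-set high

classes_df <- data.frame(sample = names(up_scores), class = classes,
                         stringsAsFactors = FALSE)
```

With these toy scores, S1 satisfies both conditions for "Active", S7 and S8 satisfy both conditions for "Inactive", and a sample such as S2, which scores highly for the up-regulated gene-set only, stays "Uncertain".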

Value

A data frame with the first column named "sample" containing sample names and the second column named "class" containing their corresponding predicted pathway activity classes (Active, Inactive or Uncertain).
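
As a rough illustration of working with a result of this shape, the sketch below builds a hypothetical data frame with the documented "sample" and "class" columns (the rows are invented, not real output) and summarises it with base R.

```r
# Hypothetical result with the documented columns "sample" and "class"
classes_df <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  class = c("Active", "Uncertain", "Inactive", "Active"),
  stringsAsFactors = FALSE
)

# Count samples per class, keeping all three classes even if one is absent
class_levels <- c("Active", "Inactive", "Uncertain")
class_counts <- table(factor(classes_df$class, levels = class_levels))

# Keep only the samples that received a definite class
confident_df <- classes_df[classes_df$class != "Uncertain", ]
```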

Author(s)

Anisha Thind a.thind@cranfield.ac.uk

Examples

## Not run: 
# default using quartile threshold (25th percentile)
classes_df <- classify_gsva_percent(ER_data_mat, ER_sig)
# custom percentile threshold e.g. 30th percentile
classes_df <- classify_gsva_percent(ER_data_mat, ER_sig,
                                    percent_thresh = 30)

## End(Not run)
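
The examples assume that ER_data_mat and ER_sig already exist. For readers who want to see the input shapes described under Arguments, the following sketch builds toy stand-ins. The names toy_mat and toy_sig, the gene symbols and the random values are all hypothetical, and a four-gene signature only illustrates structure; it is unlikely to be large enough for a meaningful analysis.

```r
set.seed(1)

# Hypothetical gene symbols and sample IDs
genes <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
samples <- paste0("S", 1:6)

# Expression matrix: genes in rows, samples in columns (random toy values)
toy_mat <- matrix(rnorm(length(genes) * length(samples)),
                  nrow = length(genes),
                  dimnames = list(genes, samples))

# Signature data frame: "gene" column plus "expression" coded as 1 or -1
toy_sig <- data.frame(gene = genes,
                      expression = c(1, 1, -1, -1),
                      stringsAsFactors = FALSE)

# Basic sanity checks before passing inputs to a classification function
stopifnot(identical(names(toy_sig)[1:2], c("gene", "expression")))
stopifnot(all(toy_sig$expression %in% c(-1, 1)))

# Signature genes that are actually present in the expression matrix
shared_genes <- intersect(toy_sig$gene, rownames(toy_mat))
```

Objects of this shape could then be passed as expr_mat and sig_df in the same way as ER_data_mat and ER_sig in the examples.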

a-thind/PathAnalyser documentation built on May 6, 2022, 9:50 a.m.