calculate_go_auc: Evaluate the recovery of proteins annotated to the same GO...

View source: R/calculate_go_auc.R

Description

Assess the intrinsic quality of a CF-MS dataset by evaluating its ability to recover groups of proteins annotated to the same GO term, using receiver operating characteristic (ROC) analysis. In this analysis, the correlation coefficients between every pair of proteins in the dataset are ranked and compared to a binary outcome variable reflecting whether the two proteins are annotated to the same GO term. The analysis is repeated for each GO term in turn. The area under the curve (AUC) for each GO term is returned as a measure of the ability of the CF-MS data to recover proteins annotated to that term. This measure ranges from 0 to 1, with 1 representing perfect recovery and 0.5 representing random recovery.
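The ROC analysis described above can be sketched in base R for a single GO term. This is an illustration only, not the package's implementation: the helper `auc_from_scores`, the toy correlation values, and the protein names are all made up, and it assumes a pair counts as positive when both proteins are annotated to the term in question.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney U) identity.
auc_from_scores <- function(scores, labels) {
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  if (n_pos == 0 || n_neg == 0) return(NA_real_)  # AUC undefined
  r <- rank(scores)  # ties receive average ranks
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy symmetric score matrix for five made-up proteins.
proteins <- c("P1", "P2", "P3", "P4", "P5")
cor_mat <- matrix(0, 5, 5, dimnames = list(proteins, proteins))
cor_mat[upper.tri(cor_mat)] <-
  c(0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.0, 0.4, 0.5, 0.6)
cor_mat <- cor_mat + t(cor_mat)
diag(cor_mat) <- 1

# One score per unordered protein pair (upper triangle only).
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
scores <- cor_mat[idx]

# Assumed labelling: TRUE when both proteins belong to the GO term.
go_members <- c("P1", "P2", "P4")
labels <- proteins[idx[, "row"]] %in% go_members &
  proteins[idx[, "col"]] %in% go_members

auc <- auc_from_scores(scores, labels)
print(auc)
```

With these toy values the three positive pairs score 0.9, 0.2 and 0.1, so the AUC is 9/21 (about 0.43), slightly worse than random recovery for this made-up term.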

Usage

calculate_go_auc(pairs, ann, score_column = "cor", verbose = TRUE)

Arguments

pairs

a matrix of dimensions (# of proteins) x (# of proteins), scoring every possible protein pair, in which higher values reflect more similar pairs, e.g. as returned by score_pairs. Alternatively, a data frame of candidate protein-protein interactions, with proteins in the first two columns.

ann

a list in which each entry corresponds to a GO term and contains all of the proteins annotated to that GO term, e.g. as returned by as_annotation_list

score_column

when pairs is a data frame, the column that contains the score for each protein pair

verbose

set to FALSE to disable messages from the function
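As a rough sketch of the two accepted shapes for pairs and of the ann list, the inputs might be built by hand as follows. The protein names, term names, random profiles, and the data frame column names protein_A and protein_B are placeholders chosen for illustration; only the requirement that proteins occupy the first two columns and the default score_column of "cor" come from the descriptions above. The final calls are left as comments because they need the package to be installed.

```r
set.seed(1)

# Made-up abundance profiles: 6 proteins x 20 fractions.
proteins <- paste0("P", 1:6)
profiles <- matrix(rnorm(6 * 20), nrow = 6,
                   dimnames = list(proteins, NULL))

# Matrix form: (# of proteins) x (# of proteins), higher = more similar.
pairs_mat <- cor(t(profiles))

# Data frame form: proteins in the first two columns, score in "cor".
idx <- which(upper.tri(pairs_mat), arr.ind = TRUE)
pairs_df <- data.frame(
  protein_A = proteins[idx[, "row"]],
  protein_B = proteins[idx[, "col"]],
  cor = pairs_mat[idx],
  stringsAsFactors = FALSE
)

# Annotation list: one entry per GO term (placeholder term names).
ann <- list(
  term_A = c("P1", "P2", "P3"),
  term_B = c("P3", "P4", "P5", "P6")
)

# Calls following the Usage signature (require the package):
# calculate_go_auc(pairs_mat, ann, verbose = FALSE)
# calculate_go_auc(pairs_df, ann, score_column = "cor", verbose = FALSE)
```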

Value

a data frame with four columns:

  1. go_term: the GO term in question, obtained from the names of the input annotation list

  2. n_proteins: the number of proteins in the input annotation list which are annotated to that GO term

  3. n_chromatograms: the number of proteins in the CF-MS dataset which are annotated to that GO term

  4. auroc: the AUC for that GO term
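The distinction between n_proteins and n_chromatograms can be illustrated with a small base R sketch. It assumes, following the column descriptions above, that n_proteins counts every protein listed for a term while n_chromatograms counts only those also present in the dataset; the protein and term names are invented, and this is not the package's own code.

```r
# Hypothetical set of proteins detected in the CF-MS dataset.
detected <- c("P1", "P2", "P3", "P4")

# Hypothetical annotation list; P7, P8 and P9 were not detected.
ann <- list(
  term_A = c("P1", "P2", "P9"),
  term_B = c("P3", "P4", "P7", "P8")
)

counts <- data.frame(
  go_term = names(ann),
  # all proteins annotated to the term in the annotation list
  n_proteins = unname(lengths(ann)),
  # only those that also appear in the dataset
  n_chromatograms = unname(vapply(
    ann, function(p) sum(p %in% detected), integer(1)
  )),
  stringsAsFactors = FALSE
)
print(counts)
```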


fosterlab/CFTK documentation built on Jan. 19, 2021, 10:31 p.m.