score_interactions: Score potential interactions in cross-validation


View source: R/score_interactions.R

Description

Use a machine-learning approach to integrate data from multiple CF-MS replicates, or from multiple features within a single replicate. This function takes a data frame of features as input, alongside a set of 'gold-standard' reference interactions. The gold standard is split into a user-specified number of folds, and a classifier is trained on the reference interactions after leaving out each fold in turn. Each classifier is then applied to predict interactions in the entire feature data frame, minus the protein pairs that overlap with the training interactions for that fold. The mean classifier score across all folds is calculated for each protein pair, and the protein pairs are sorted in descending order by their mean score.
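The procedure described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the label column, the fold assignment, and the use of glm() logistic regression in place of the selectable classifiers are all assumptions made for illustration.

```r
# Toy sketch of fold-wise training and score averaging (base R only).
# All object names are hypothetical; glm() stands in for the chosen classifier.
set.seed(1)

# Toy feature table: one row per protein pair, two numeric features
proteins <- paste0("P", 1:8)
pairs <- t(combn(proteins, 2))
features <- data.frame(
  protein_A = pairs[, 1],
  protein_B = pairs[, 2],
  f1 = rnorm(nrow(pairs)),
  f2 = rnorm(nrow(pairs)),
  stringsAsFactors = FALSE
)
n <- nrow(features)

# Assumed labels: 1 = reference interaction, 0 = reference non-interaction,
# NA = pair not covered by the gold standard
features$label <- NA
labelled <- sample(n, 20)
features$label[labelled] <- rep(c(0, 1), each = 10)
# Make the first feature weakly informative for the labelled pairs
features$f1[labelled] <- features$f1[labelled] + features$label[labelled]

# Assign each labelled pair to exactly one fold
n_folds <- 5
fold_id <- rep(NA_integer_, n)
fold_id[labelled] <- sample(rep(seq_len(n_folds), length.out = length(labelled)))

# One column of scores per fold; pairs used for training stay NA in that column
scores <- matrix(NA_real_, nrow = n, ncol = n_folds)
for (k in seq_len(n_folds)) {
  train <- !is.na(fold_id) & fold_id != k
  fit <- glm(label ~ f1 + f2, data = features[train, ], family = binomial())
  scores[!train, k] <- predict(fit, newdata = features[!train, ],
                               type = "response")
}

# Mean score per pair across the folds in which it was scored,
# then sort pairs in descending order
features$score <- rowMeans(scores, na.rm = TRUE)
ranked <- features[order(features$score, decreasing = TRUE),
                   c("protein_A", "protein_B", "score")]
head(ranked)
```

In this sketch a labelled pair is scored only by the one classifier that did not see it during training, while an unlabelled pair is scored by every classifier and its scores are averaged.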

Usage

score_interactions(
  features,
  gold_standard,
  classifier = c("RF", "NB", "SVM", "LR"),
  split_by = c("proteins", "pairs"),
  n_folds = 10,
  verbose = TRUE
)
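The vector-valued defaults for classifier and split_by follow a common R idiom in which the first element acts as the default. A minimal sketch of that idiom is shown here, assuming the choices are resolved with match.arg(); the source does not confirm this, and pick_options is a hypothetical helper, not part of the package.

```r
# Hypothetical helper illustrating how vector-valued defaults are
# commonly resolved in R; not taken from the package source.
pick_options <- function(classifier = c("RF", "NB", "SVM", "LR"),
                         split_by = c("proteins", "pairs")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and raises an error for values outside the allowed set
  classifier <- match.arg(classifier)
  split_by <- match.arg(split_by)
  list(classifier = classifier, split_by = split_by)
}

defaults <- pick_options()
custom <- pick_options(classifier = "SVM", split_by = "pairs")
```

Under this assumption, calling the function without these two arguments would select the random forest classifier and split the gold standard by proteins.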

Arguments

features

a data frame of features for all protein pairs across all replicates, with columns protein_A and protein_B, as returned by calculate_features

gold_standard

a data frame with columns protein_A and protein_B, as returned by to_pairwise_df, containing 'gold standard' interacting protein pairs

classifier

the classifier to use; one of 'RF' (random forest), 'NB' (naive Bayes), 'SVM' (support vector machine), or 'LR' (logistic regression)

split_by

the mechanism by which to split the gold standard into cross-validation folds; either by protein complex subunits ('proteins') or by pairwise interactions between those subunits ('pairs')
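The difference between the two splitting mechanisms can be sketched in base R. The toy gold standard, the fold assignment, and in particular the rule used for 'proteins' (dropping every training pair that involves a held-out protein) are assumptions for illustration and may not match the package's behaviour.

```r
# Toy sketch contrasting the two ways of forming cross-validation folds.
# The data and the exclusion rule for "proteins" are assumptions.
set.seed(42)
gold_standard <- data.frame(
  protein_A = c("A", "A", "B", "D", "D", "E", "G", "G"),
  protein_B = c("B", "C", "C", "E", "F", "F", "H", "I"),
  stringsAsFactors = FALSE
)
n_folds <- 3

# split_by = "pairs": each interaction is assigned to one fold
pair_fold <- sample(rep(seq_len(n_folds), length.out = nrow(gold_standard)))

# split_by = "proteins": each protein is assigned to one fold
proteins <- unique(c(gold_standard$protein_A, gold_standard$protein_B))
protein_fold <- setNames(
  sample(rep(seq_len(n_folds), length.out = length(proteins))),
  proteins
)

# Training interactions when fold k is left out
k <- 1
train_by_pairs <- gold_standard[pair_fold != k, ]

# Assumed rule: a pair is excluded if either of its proteins is held out
touches_k <- protein_fold[gold_standard$protein_A] == k |
  protein_fold[gold_standard$protein_B] == k
train_by_proteins <- gold_standard[!touches_k, ]
```

Splitting by pairs allows the same protein to appear in both the training and held-out interactions, whereas the protein-level rule assumed here keeps held-out proteins out of training entirely, at the cost of discarding more training pairs.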

n_folds

the number of folds of cross-validation to perform

verbose

set to FALSE to disable messages from the function


fosterlab/CFTK documentation built on Jan. 19, 2021, 10:31 p.m.