score_interactions: Score potential interactions in cross-validation
In fosterlab/CFTK: The Co-Fractionation Toolkit

Use a machine-learning approach to integrate data from across multiple CF-MS replicates, or multiple features within a single replicate. This function takes a data frame of features as input, alongside a set of 'gold-standard' reference interactions. The gold standard is split into a user-specified number of folds, and a classifier is trained on the reference interactions after leaving out each fold in turn. Each classifier is then applied to predict interactions in the entire feature data frame, minus the protein pairs that overlap with the training interactions for that fold. The mean classifier score across all folds is calculated for each protein pair, and the protein pairs are sorted in descending order by their mean score.
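
The fold-wise procedure described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the synthetic feature table, the `label` vector (1 for a reference interaction, 0 for a reference non-interaction, `NA` for pairs outside the gold standard), and the use of `glm` as a stand-in classifier are all hypothetical.

```r
# Illustrative sketch only: logistic regression on synthetic data.
# Not CFTK's implementation; every name below is hypothetical.
set.seed(1)

# Toy feature table: 200 unique protein pairs with two features
n <- 200
features <- data.frame(
  protein_A = paste0("A", seq_len(n)),
  protein_B = paste0("B", seq_len(n)),
  f1 = rnorm(n),
  f2 = rnorm(n),
  stringsAsFactors = FALSE
)

# Assumed labels: 1 = reference interaction, 0 = reference
# non-interaction, NA = pair not covered by the gold standard
label <- rep(NA_integer_, n)
label[1:30] <- 1L
label[31:60] <- 0L
features$f1[1:30] <- features$f1[1:30] + 1.5  # give positives some signal

# Assign each labelled pair to one of n_folds folds at random
n_folds <- 5
labelled <- which(!is.na(label))
fold <- sample(rep(seq_len(n_folds), length.out = length(labelled)))

# One column of scores per fold; NA where a pair was used for training
scores <- matrix(NA_real_, nrow = n, ncol = n_folds)
for (k in seq_len(n_folds)) {
  train_idx <- labelled[fold != k]
  train <- data.frame(features[train_idx, c("f1", "f2")],
                      y = label[train_idx])
  fit <- glm(y ~ f1 + f2, data = train, family = binomial())
  # Score every pair except those used to train this fold's classifier
  test_idx <- setdiff(seq_len(n), train_idx)
  scores[test_idx, k] <- predict(fit, newdata = features[test_idx, ],
                                 type = "response")
}

# Mean score over the folds in which each pair was scored,
# then sort the pairs in descending order of that mean
features$score <- rowMeans(scores, na.rm = TRUE)
ranked <- features[order(features$score, decreasing = TRUE), ]
head(ranked[, c("protein_A", "protein_B", "score")])
```

In this sketch a labelled pair is scored only by the one classifier that did not see it during training, whereas an unlabelled pair is scored by every classifier, so its mean is taken over all folds.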

score_interactions(
  features,
  gold_standard,
  classifier = c("RF", "NB", "SVM", "LR"),
  split_by = c("proteins", "pairs"),
  n_folds = 10,
  verbose = TRUE
)

`features`	a data frame of features for all protein pairs across all replicates, with columns `protein_A` and `protein_B`, as returned by calculate_features
`gold_standard`	a data frame of 'gold-standard' interacting protein pairs, with columns `protein_A` and `protein_B`, as returned by to_pairwise_df
`classifier`	the classifier to use; one of `'RF'` (random forest), `'NB'` (naive Bayes), `'SVM'` (support vector machine), or `'LR'` (logistic regression)
`split_by`	the mechanism by which to split the gold standard into cross-validation folds; either by protein complex subunits (`'proteins'`) or by pairwise interactions between those subunits (`'pairs'`)
`n_folds`	the number of folds of cross-validation to perform
`verbose`	set to `FALSE` to disable messages from the function
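
The difference between the two `split_by` options can be illustrated with a small sketch. It shows one plausible reading of each option, assuming that a protein-level split withholds every pair involving a held-out protein; the toy complexes and all variable names are hypothetical, and the package's own fold assignment may differ.

```r
# Illustrative sketch only; not CFTK's implementation.
set.seed(42)

# Hypothetical gold standard: all pairs within two small complexes
complexes <- list(c1 = c("A", "B", "C", "D"), c2 = c("E", "F", "G"))
gold <- do.call(rbind, lapply(complexes, function(members) {
  pairs <- t(combn(members, 2))
  data.frame(protein_A = pairs[, 1], protein_B = pairs[, 2],
             stringsAsFactors = FALSE)
}))
n_folds <- 3

# Pair-level split: each interaction is assigned to a fold directly
pair_fold <- sample(rep(seq_len(n_folds), length.out = nrow(gold)))

# Protein-level split: each protein is assigned to a fold, and a pair is
# available for training in fold k only if neither protein is held out
proteins <- unique(c(gold$protein_A, gold$protein_B))
protein_fold <- setNames(
  sample(rep(seq_len(n_folds), length.out = length(proteins))),
  proteins
)
train_by_protein <- lapply(seq_len(n_folds), function(k) {
  held_out <- names(protein_fold)[protein_fold == k]
  gold[!(gold$protein_A %in% held_out | gold$protein_B %in% held_out), ]
})

# Number of training pairs left in each fold under the protein-level split
sapply(train_by_protein, nrow)
```

Under a pair-level split, the proteins of a held-out pair can still appear in other training pairs. Under the protein-level split sketched here they cannot, which leaves fewer training pairs per fold.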