predict_interactions: Predict interactions given a set of features and examples
In PrInCE: Predicting Interactomes from Co-Elution


Discriminate interacting from non-interacting protein pairs by training a machine learning model on a set of labelled examples, given a set of features derived from a co-elution profile matrix (see `calculate_features`).


predict_interactions(features, gold_standard, classifier = c("NB", "SVM",
  "RF", "LR", "ensemble"), verbose = FALSE, models = 10,
  cv_folds = 10, trees = 500)

`features`	a data frame with proteins in the first two columns, and features to be passed to the classifier in the remaining columns
`gold_standard`	an adjacency matrix of "gold standard" interactions used to train the classifier
`classifier`	the type of classifier to use: one of `"NB"` (naive Bayes), `"SVM"` (support vector machine), `"RF"` (random forest), `"LR"` (logistic regression), or `"ensemble"` (an ensemble of all four)
`verbose`	if `TRUE`, print a series of messages about the stage of the analysis
`models`	the number of classifiers to train and average across, each with a different k-fold cross-validation split
`cv_folds`	the number of folds to use for k-fold cross-validation
`trees`	for random forests only, the number of trees in the forest

PrInCE implements four different classifiers (naive Bayes, support vector machine, random forest, and logistic regression). Naive Bayes is the default. The classifiers are trained on the gold standards using a ten-fold cross-validation procedure, training on 90% of the data and predicting on the remaining 10%. For protein pairs that are part of the training data, the held-out split is used to assign a classifier score, whereas for the remaining protein pairs, the median of all ten folds is used. Furthermore, to ensure the results are not sensitive to the precise cross-validation split used, an ensemble of multiple classifiers (ten, by default) is trained, and the classifier score is subsequently averaged across classifiers.
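
The scoring scheme described above can be illustrated with a minimal, self-contained sketch in base R. This is not PrInCE's own code: the toy data, the helper `score_once`, and the use of `glm` logistic regression as the classifier are all assumptions made for illustration. It shows the two ideas only: labelled pairs take their score from the fold in which they were held out, unlabelled pairs take the median over folds, and the whole procedure is repeated over several random splits and averaged.

```r
set.seed(1)

# Toy data (hypothetical): 200 protein pairs with two features each.
# The first 120 pairs are labelled; the rest are absent from the gold standard.
n <- 200
feats <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
label <- rep(NA_integer_, n)
label[1:120] <- rbinom(120, 1, plogis(2 * feats$f1[1:120]))

# One round of k-fold cross-validation (illustrative helper, not a PrInCE function).
score_once <- function(feats, label, cv_folds = 10) {
  labelled <- which(!is.na(label))
  # Randomly assign each labelled pair to exactly one fold.
  fold <- sample(rep_len(seq_len(cv_folds), length(labelled)))
  # One column of scores per fold, covering every pair.
  fold_scores <- matrix(NA_real_, nrow(feats), cv_folds)
  for (k in seq_len(cv_folds)) {
    train <- labelled[fold != k]
    fit <- glm(y ~ f1 + f2, family = binomial(),
               data = cbind(feats[train, ], y = label[train]))
    fold_scores[, k] <- predict(fit, newdata = feats, type = "response")
  }
  # Unlabelled pairs: median score over all folds.
  scores <- apply(fold_scores, 1, median)
  # Labelled pairs: score from the fold in which the pair was held out.
  scores[labelled] <- fold_scores[cbind(labelled, fold)]
  scores
}

# Repeat with different random splits and average the scores across models.
models <- 10
all_scores <- replicate(models, score_once(feats, label))
final_score <- rowMeans(all_scores)
```

Averaging over several random splits reduces the dependence of any one pair's score on which fold it happened to land in, which is the motivation given for the `models` argument.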

PrInCE can also ensemble across multiple different types of classifiers, by supplying the `"ensemble"` option to the `classifier` argument.

a ranked data frame of pairwise interactions, with the classifier score, label, and cumulative precision for each interaction
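
To make "cumulative precision" concrete, the following base R sketch ranks a small, made-up set of scored pairs and computes the precision at each rank. The column names, the toy values, and the choice to let unlabelled pairs (label `NA`) count as neither true nor false positives are assumptions for illustration, not a description of PrInCE's internals.

```r
# Hypothetical classifier output: 1 = known interaction,
# 0 = known non-interaction, NA = pair absent from the gold standard.
ppi <- data.frame(
  protein_A = c("P1", "P2", "P3", "P4", "P5"),
  protein_B = c("P6", "P7", "P8", "P9", "P10"),
  score = c(0.2, 0.9, 0.6, 0.8, 0.4),
  label = c(0, 1, NA, 1, 0)
)

# Rank pairs from highest to lowest classifier score.
ppi <- ppi[order(ppi$score, decreasing = TRUE), ]

# Running counts of true and false positives; unlabelled pairs add to neither.
tp <- cumsum(!is.na(ppi$label) & ppi$label == 1)
fp <- cumsum(!is.na(ppi$label) & ppi$label == 0)

# Precision among all labelled pairs at or above each rank.
ppi$precision <- tp / (tp + fp)
```

A ranked table of this shape can then be cut at a chosen precision, for example keeping every pair ranked at or above the last row where `precision` is still at least 0.5.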

## calculate features
data(scott)
data(scott_gaussians)
subset <- scott[seq_len(500), ] ## limit to first 500 proteins
gauss <- scott_gaussians[names(scott_gaussians) %in% rownames(subset)]
features <- calculate_features(subset, gauss)
## load training data
data(gold_standard)
ref <- adjacency_matrix_from_list(gold_standard)
## predict interactions
ppi <- predict_interactions(features, ref, cv_folds = 3, models = 1)