predict_interactions: Predict interactions given a set of features and examples

View source: R/predict_interactions.R

Description

Discriminate interacting from non-interacting protein pairs by training a machine learning model on a set of labelled examples, given a set of features derived from a co-elution profile matrix (see calculate_features).

Usage

predict_interactions(
  features,
  gold_standard,
  classifier = c("NB", "SVM", "RF", "LR", "ensemble"),
  verbose = FALSE,
  models = 10,
  cv_folds = 10,
  trees = 500
)

Arguments

features

a data frame with proteins in the first two columns, and features to be passed to the classifier in the remaining columns

gold_standard

an adjacency matrix of "gold standard" interactions used to train the classifier

classifier

the type of classifier to use: one of "NB" (naive Bayes), "SVM" (support vector machine), "RF" (random forest), "LR" (logistic regression), or "ensemble" (an ensemble of all four)

verbose

if TRUE, print a series of messages about the stage of the analysis

models

the number of classifiers to train and average across, each with a different k-fold cross-validation split

cv_folds

the number of folds to use for k-fold cross-validation

trees

for random forests only, the number of trees in the forest
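
To make the expected shapes of features and gold_standard concrete, here is a minimal base R sketch with toy data. The protein identifiers, column names, and complexes are all made up for illustration; in practice the features come from calculate_features and the adjacency matrix can be built with adjacency_matrix_from_list, as in the Examples section.

```r
# Toy inputs with the shapes described above (all names are hypothetical).
proteins <- c("P1", "P2", "P3", "P4")

# Features: protein identifiers in the first two columns,
# numeric features for the classifier in the remaining columns.
pairs <- t(combn(proteins, 2))
set.seed(1)
features <- data.frame(
  protein_A = pairs[, 1],
  protein_B = pairs[, 2],
  feature_1 = runif(nrow(pairs)),
  feature_2 = runif(nrow(pairs)),
  stringsAsFactors = FALSE
)

# Gold standard: a symmetric 0/1 adjacency matrix, built here by hand
# from two hypothetical complexes, {P1, P2, P3} and {P3, P4}.
complexes <- list(c("P1", "P2", "P3"), c("P3", "P4"))
gold <- matrix(0, nrow = 4, ncol = 4, dimnames = list(proteins, proteins))
for (members in complexes) {
  gold[members, members] <- 1
}
diag(gold) <- 0  # a protein does not interact with itself
```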

Details

PrInCE implements four different classifiers (naive Bayes, support vector machine, random forest, and logistic regression). Naive Bayes is used by default. The classifiers are trained on the gold standards using a ten-fold cross-validation procedure, training on 90% of the data and predicting on the remaining 10%. For protein pairs that are part of the training data, the held-out split is used to assign a classifier score, whereas for the remaining protein pairs, the median of all ten folds is used. Furthermore, to ensure the results are not sensitive to the precise cross-validation split used, an ensemble of multiple classifiers (ten, by default) is trained, and the classifier score is subsequently averaged across classifiers.
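
The scoring scheme described above can be sketched in base R as follows. This is an illustration under stated assumptions, not PrInCE's internal code: it uses logistic regression via glm as a stand-in classifier, and the toy data, the helper score_once, and all variable names are hypothetical.

```r
set.seed(42)

# Toy data: 60 labelled pairs and 20 unlabelled pairs with two features.
n_lab <- 60
n_unlab <- 20
labelled <- data.frame(x1 = rnorm(n_lab), x2 = rnorm(n_lab))
labelled$label <- rbinom(n_lab, 1, plogis(1.5 * labelled$x1))
unlabelled <- data.frame(x1 = rnorm(n_unlab), x2 = rnorm(n_unlab))

cv_folds <- 10
models <- 10

# Hypothetical helper: one round of k-fold cross-validation.
score_once <- function(labelled, unlabelled, cv_folds) {
  # Assign each labelled pair to one fold at random.
  fold <- sample(rep(seq_len(cv_folds), length.out = nrow(labelled)))
  lab_scores <- numeric(nrow(labelled))
  unlab_scores <- matrix(NA_real_, nrow(unlabelled), cv_folds)
  for (k in seq_len(cv_folds)) {
    held_out <- fold == k
    fit <- glm(label ~ x1 + x2, data = labelled[!held_out, ],
               family = binomial())
    # Labelled pairs are scored only by the model that did not see them.
    lab_scores[held_out] <- predict(fit, labelled[held_out, ],
                                    type = "response")
    # Unlabelled pairs are scored by every fold's model.
    unlab_scores[, k] <- predict(fit, unlabelled, type = "response")
  }
  # Median across folds for pairs outside the training data.
  c(lab_scores, apply(unlab_scores, 1, median))
}

# Repeat with different random splits and average the scores.
all_scores <- replicate(models, score_once(labelled, unlabelled, cv_folds))
final_score <- rowMeans(all_scores)
```

Each column of all_scores corresponds to one cross-validation split, so averaging across columns plays the role of the models argument.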

PrInCE can also ensemble across multiple different types of classifiers, by supplying the "ensemble" option to the classifier argument.

Value

a ranked data frame of pairwise interactions, with the classifier score, label, and cumulative precision for each interaction
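
As an illustration of what cumulative precision means for a ranked list, the following base R sketch computes it from toy scores and labels. The data are made up, and the handling of unlabelled pairs (counted as neither true nor false positives) is an assumption of this sketch rather than a statement about PrInCE's implementation.

```r
# Hypothetical scores and labels (1 = known interaction, 0 = known
# non-interaction, NA = pair absent from the gold standard).
scores <- c(0.95, 0.90, 0.80, 0.60, 0.40, 0.10)
labels <- c(1, 1, NA, 0, 1, 0)

# Rank pairs from highest to lowest score.
ord <- order(scores, decreasing = TRUE)
labels <- labels[ord]

# Cumulative precision: true positives so far / labelled pairs so far.
tp <- cumsum(!is.na(labels) & labels == 1)
n_labelled <- cumsum(!is.na(labels))
precision <- tp / n_labelled
```

Reading down the ranked list, the precision at a given row is the fraction of labelled pairs at or above that row that are true interactions.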

Examples

## calculate features
data(scott)
data(scott_gaussians)
subset <- scott[seq_len(500), ] ## limit to first 500 proteins
gauss <- scott_gaussians[names(scott_gaussians) %in% rownames(subset)]
features <- calculate_features(subset, gauss)
## load training data
data(gold_standard)
ref <- adjacency_matrix_from_list(gold_standard)
## predict interactions
ppi <- predict_interactions(features, ref, cv_folds = 3, models = 1)

fosterlab/PrInCE documentation built on Dec. 13, 2020, 5:50 a.m.