View source: R/predict_interactions.R
Discriminate interacting from non-interacting protein pairs by training a
machine learning model on a set of labelled examples, given a set of
features derived from a co-elution profile matrix (see calculate_features).
predict_interactions(
features,
gold_standard,
classifier = c("NB", "SVM", "RF", "LR", "ensemble"),
verbose = FALSE,
models = 10,
cv_folds = 10,
trees = 500
)
features |
a data frame with proteins in the first two columns, and features to be passed to the classifier in the remaining columns |
gold_standard |
an adjacency matrix of "gold standard" interactions used to train the classifier |
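classifier |
the type of classifier to use: one of "NB" (naive Bayes), "SVM" (support vector machine), "RF" (random forest), "LR" (logistic regression), or "ensemble" |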
verbose |
if |
models |
the number of classifiers to train and average across, each with a different k-fold cross-validation split |
cv_folds |
the number of folds to use for k-fold cross-validation |
trees |
for random forests only, the number of trees in the forest |
PrInCE implements four different classifiers (naive Bayes, support vector machine, random forest, and logistic regression). Naive Bayes is used by default. The classifiers are trained on the gold standards using a ten-fold cross-validation procedure, training on 90% of the data and testing on the remaining 10%. For protein pairs that are part of the training data, the held-out split is used to assign a classifier score, whereas for the remaining protein pairs, the median of all ten folds is used. Furthermore, to ensure the results are not sensitive to the precise cross-validation split used, an ensemble of multiple classifiers (ten, by default) is trained, each with a different split, and the classifier score is subsequently averaged across classifiers.
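The scoring scheme described above can be illustrated with a small base R sketch. This is not PrInCE's implementation: the toy data, the helper name cv_scores, and the use of glm() logistic regression as the classifier are all assumptions made for illustration. It shows labelled pairs receiving a score from the fold in which they were held out, unlabelled pairs receiving the median across folds, and the final score being averaged across several models trained on different splits.

```r
# Toy sketch of the cross-validation scoring scheme, in base R only.
# Everything here (data, cv_scores) is hypothetical; glm() logistic
# regression stands in for whichever classifier is selected.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Label half of the pairs (1 = interacting, 0 = not); the rest stay NA
dat$label <- NA
labelled <- sample(n, 100)
dat$label[labelled] <- rbinom(100, 1, plogis(2 * dat$x1[labelled]))

cv_scores <- function(dat, cv_folds = 10) {
  labelled <- which(!is.na(dat$label))
  unlabelled <- which(is.na(dat$label))
  # Randomly assign each labelled pair to exactly one fold
  fold <- sample(rep(seq_len(cv_folds), length.out = length(labelled)))
  scores <- rep(NA_real_, nrow(dat))
  unlab_preds <- matrix(NA_real_, nrow = length(unlabelled), ncol = cv_folds)
  for (k in seq_len(cv_folds)) {
    train <- labelled[fold != k]
    test <- labelled[fold == k]
    fit <- glm(label ~ x1 + x2, data = dat[train, ], family = binomial())
    # Labelled pairs: scored only by the model that did not see them
    scores[test] <- predict(fit, dat[test, ], type = "response")
    # Unlabelled pairs: scored by every fold's model
    unlab_preds[, k] <- predict(fit, dat[unlabelled, ], type = "response")
  }
  # Unlabelled pairs: median of the per-fold scores
  scores[unlabelled] <- apply(unlab_preds, 1, median)
  scores
}

# Train several models, each with a different random split, and average
models <- 3
score_matrix <- sapply(seq_len(models), function(i) cv_scores(dat, cv_folds = 10))
final_score <- rowMeans(score_matrix)
```

In this sketch, models and cv_folds play the same role as the arguments of the same names in predict_interactions, but the defaults and internals of the package may differ.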
PrInCE can also ensemble across multiple different types of classifiers,
by supplying the "ensemble" option to the classifier argument.
a ranked data frame of pairwise interactions, with the classifier score, label, and cumulative precision for each interaction
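As a rough illustration of what a cumulative precision column means, the base R sketch below ranks a handful of made-up pairs by score and computes the precision among the top k pairs for every k. The scores, labels, and column names are hypothetical, and the sketch assumes every pair is labelled; it does not show how PrInCE treats pairs that are absent from the gold standard.

```r
# Hypothetical toy scores and labels
# (1 = known interaction, 0 = known non-interaction)
scores <- c(0.9, 0.2, 0.8, 0.6, 0.4)
labels <- c(1, 0, 1, 0, 1)

# Rank pairs from highest to lowest classifier score
ord <- order(scores, decreasing = TRUE)
ranked <- data.frame(score = scores[ord], label = labels[ord])

# Cumulative precision: fraction of true interactions among the top k pairs
ranked$precision <- cumsum(ranked$label) / seq_along(ranked$label)
```

A common way to use such a column is to keep all pairs ranked above the point where precision drops below a chosen threshold.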
## calculate features
data(scott)
data(scott_gaussians)
subset <- scott[seq_len(500), ] ## limit to first 500 proteins
gauss <- scott_gaussians[names(scott_gaussians) %in% rownames(subset)]
features <- calculate_features(subset, gauss)
## load training data
data(gold_standard)
ref <- adjacency_matrix_from_list(gold_standard)
## predict interactions
ppi <- predict_interactions(features, ref, cv_folds = 3, models = 1)