predict_ensemble: Predict interactions using an ensemble of classifiers


View source: R/predict_ensemble.R

Description

Use an ensemble of classifiers to predict interactions from features calculated from a co-elution dataset. Because each model in the ensemble is trained on a different partition of the dataset into folds, the ensemble approach makes the results robust to how the dataset is partitioned. For each model, the median of the classifier scores across all folds is calculated; the final score is then the median of these per-model medians across all models.
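The two-level aggregation described above can be sketched in base R. The per-fold scores below are made-up numbers, arranged as one matrix per model (rows are interactions, columns are folds); this illustrates only the median-of-medians step and is not PrInCE's internal code.

```r
# Made-up scores: one matrix per model; rows = interactions, columns = folds
scores <- list(
  matrix(c(0.1, 0.2, 0.3,
           0.9, 0.8, 0.7,
           0.5, 0.4, 0.6), nrow = 3, byrow = TRUE),  # model 1
  matrix(c(0.4, 0.0, 0.2,
           0.6, 0.9, 0.5,
           0.2, 0.3, 0.1), nrow = 3, byrow = TRUE),  # model 2
  matrix(c(0.3, 0.5, 0.1,
           0.7, 0.6, 0.9,
           0.9, 0.8, 0.7), nrow = 3, byrow = TRUE)   # model 3
)
n_pairs <- nrow(scores[[1]])

# Step 1: for each model, the median score of each interaction across folds
# (result: an n_pairs x n_models matrix)
per_model <- vapply(scores, function(s) apply(s, 1, median), numeric(n_pairs))

# Step 2: the median of those per-model medians across all models
ensemble_score <- apply(per_model, 1, median)
ensemble_score  # 0.2 0.7 0.5
```

With three models, the final score of each interaction is the middle value of its three per-model medians, so one model with an unusual fold partition has limited influence on the result.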

Usage

predict_ensemble(
  dat,
  labels,
  classifier = c("NB", "SVM", "RF", "LR"),
  models = 1,
  cv_folds = 10,
  trees = 500,
  node_columns = c(1, 2)
)

Arguments

dat

a data frame containing interacting gene/protein pairs in the first two columns (or in the columns given by node_columns), and the features to use for classification in the remaining columns

labels

a vector of labels, one for each interaction (row) in dat: 0 for negatives, 1 for positives, and NA for interactions outside the reference set

classifier

the type of classifier to use; one of "NB" (naive Bayes), "SVM" (support vector machine), "RF" (random forest), or "LR" (logistic regression)

models

the number of classifiers to train and combine in the ensemble

cv_folds

the number of folds into which the reference set (interactions with non-NA labels) is split when training each classifier. By default, each classifier uses ten-fold cross-validation; i.e., in each fold, the classifier is trained on 90% of the reference set and used to classify the remaining 10%.

trees

for random forest classifiers only, the number of trees to grow for each fold

node_columns

a vector of length two, denoting either the indices (integer vector) or column names (character vector) of the columns within the input data frame containing the nodes participating in pairwise interactions; defaults to the first two columns of the data frame (c(1, 2))
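The node_columns argument accepts either integer indices or column names. As a minimal sketch of how such an argument can be resolved to column indices, the base-R code below uses a hypothetical helper, resolve_node_columns(), that is not part of PrInCE; the column names are made up, and treating every non-node column as a feature is an assumption of the sketch.

```r
# Hypothetical helper (not part of PrInCE): convert node_columns, given either
# as integer indices or as column names, into integer indices of dat
resolve_node_columns <- function(dat, node_columns = c(1, 2)) {
  stopifnot(length(node_columns) == 2)
  if (is.character(node_columns)) {
    idx <- match(node_columns, colnames(dat))
    if (anyNA(idx)) {
      stop("node_columns not found in dat: ",
           paste(node_columns[is.na(idx)], collapse = ", "))
    }
  } else {
    idx <- as.integer(node_columns)
    if (any(idx < 1 | idx > ncol(dat))) stop("node_columns out of range")
  }
  idx
}

# Made-up input where the node columns are not the first two columns
dat <- data.frame(feature_a = c(0.9, 0.1),
                  protein_A = c("P1", "P2"),
                  protein_B = c("P3", "P4"),
                  feature_b = c(0.5, 0.7),
                  stringsAsFactors = FALSE)

node_idx <- resolve_node_columns(dat, c("protein_A", "protein_B"))
node_idx                       # 2 3
# Assumption for this sketch: every remaining column is a feature
feature_cols <- setdiff(seq_along(dat), node_idx)
names(dat)[feature_cols]       # "feature_a" "feature_b"
```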

Value

the input data frame of pairwise interactions, ranked by the median of classifier scores across all ensembled models
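To illustrate the ranking step, the base-R sketch below attaches a score to each interaction and sorts the interactions from highest to lowest score. The column names, including the score column, are hypothetical and may not match what predict_ensemble() returns; the scores are made up.

```r
# Made-up interactions and their ensemble (median) scores
dat <- data.frame(protein_A = c("P1", "P2", "P3"),
                  protein_B = c("P4", "P5", "P6"),
                  stringsAsFactors = FALSE)
ensemble_score <- c(0.2, 0.7, 0.5)

# Attach the score (hypothetical column name) and rank from highest to lowest
ranked <- dat
ranked$score <- ensemble_score
ranked <- ranked[order(ranked$score, decreasing = TRUE), ]
rownames(ranked) <- NULL
ranked
```

The highest-scoring pair (P2, P5) ends up in the first row, so the top of the returned data frame holds the most confident predictions.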

Examples

library(PrInCE)
## calculate features
data(scott)
data(scott_gaussians)
subset <- scott[seq_len(500), ] ## limit to first 500 proteins
gauss <- scott_gaussians[names(scott_gaussians) %in% rownames(subset)]
features <- calculate_features(subset, gauss)
## make training labels
data(gold_standard)
ref <- adjacency_matrix_from_list(gold_standard)
labels <- make_labels(ref, features)
## predict interactions with naive Bayes classifier
ppi <- predict_ensemble(features, labels, classifier = "NB", 
                        cv_folds = 3, models = 1)
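The example above splits the reference set into cv_folds = 3 folds. As a rough illustration of what k-fold splitting means here, the base-R sketch below assigns the labeled interactions (non-NA labels) to folds at random. The toy_labels vector is made up, and the simple non-stratified assignment is an assumption; PrInCE's actual fold assignment may differ.

```r
# Made-up labels: 1 = positive, 0 = negative, NA = outside the reference set
toy_labels <- c(1, 0, NA, 0, 1, NA, 0, 0, 1, 0, NA, 1)
cv_folds <- 3

# Only interactions in the reference set (non-NA labels) are split into folds
ref_idx <- which(!is.na(toy_labels))

# Simple (non-stratified) random assignment into cv_folds folds of near-equal size
set.seed(42)
fold_id <- sample(rep(seq_len(cv_folds), length.out = length(ref_idx)))

for (k in seq_len(cv_folds)) {
  train_idx <- ref_idx[fold_id != k]  # about (cv_folds - 1) / cv_folds of the reference set
  test_idx  <- ref_idx[fold_id == k]  # the remaining 1 / cv_folds
  # A classifier would be trained on toy_labels[train_idx] (plus the matching
  # feature rows) and then used to score the held-out interactions in test_idx
  cat(sprintf("fold %d: %d training, %d held out\n",
              k, length(train_idx), length(test_idx)))
}
```

With nine labeled interactions and cv_folds = 3, each fold holds out three of them and trains on the other six; with the default cv_folds = 10, the split is roughly 90% / 10%, as described under cv_folds.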

fosterlab/PrInCE-R documentation built on Dec. 11, 2020, 3:51 p.m.