MSiP is a computational approach for predicting protein-protein interactions (PPIs) from large-scale affinity purification mass spectrometry (AP-MS) data. It includes both spoke and matrix models for interpreting AP-MS data in a network context. The "spoke" model considers only bait-prey interactions, whereas the "matrix" model assumes that every protein identified in a given AP-MS experiment (bait and preys) interacts with every other. The spoke model has a high false-negative rate, whereas the matrix model has a high false-positive rate. Thus, although both models have merits, combining them has been shown to improve the ability of machine learning classifiers to discriminate between true and false positive interactions.
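To make the difference between the two models concrete, here is a minimal base R sketch that enumerates the candidate pairs each model would consider for a single experiment. The bait and prey names are hypothetical, and this is an illustration only, not the code MSiP uses internally.

```r
# Hypothetical single AP-MS experiment: one bait and the preys it pulled down.
bait  <- "A"
preys <- c("B", "C", "D")

# Spoke model: only bait-prey pairs are treated as candidate interactions.
spoke_pairs <- data.frame(protein1 = bait, protein2 = preys,
                          stringsAsFactors = FALSE)

# Matrix model: every pair among the bait and its preys is a candidate.
all_proteins <- c(bait, preys)
matrix_pairs <- as.data.frame(t(combn(all_proteins, 2)),
                              stringsAsFactors = FALSE)
names(matrix_pairs) <- c("protein1", "protein2")

nrow(spoke_pairs)   # bait-prey pairs only
nrow(matrix_pairs)  # also includes prey-prey pairs
```

With three preys, the spoke model yields three candidate pairs and the matrix model yields six, which is why the matrix model tends to admit more false positives.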
library(MSiP)
A demo AP-MS proteomics dataset is provided with the package to illustrate the expected input data structure.
data("SampleDatInput")
head(SampleDatInput)
Comparative Proteomic Analysis Software Suite (CompPASS) is a robust statistical scoring scheme for assigning scores to bait-prey interactions. The output from CompPASS scoring includes Z-score, S-score, D-score, WD-score and other features. This function was optimized from the.
datScoring <- cPASS(SampleDatInput)
head(datScoring)
The Dice coefficient has been applied to score the interactions between all identified proteins (baits and preys) in a given AP-MS experiment.
datScoring <- diceCoefficient(SampleDatInput)
head(datScoring)
Alternatively, Jaccard, Simpson, and Overlap scores can be used to score the interaction between all the identified proteins in a given AP-MS experiment.
# Jaccard coefficient
datScoring <- jaccardCoefficient(SampleDatInput)
head(datScoring)
# Simpson coefficient
datScoring <- simpsonCoefficient(SampleDatInput)
head(datScoring)
# Overlap score (the original called simpsonCoefficient() a second time here)
datScoring <- overlapScore(SampleDatInput)
head(datScoring)
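For intuition, the following base R sketch computes set-based similarity scores for one protein pair from the experiments in which each protein was detected. It uses the textbook definitions of the Dice, Jaccard, and overlap (Szymkiewicz-Simpson) coefficients. The experiment IDs are hypothetical, and the package's own functions may normalize or define these scores differently.

```r
# Hypothetical sets of experiment IDs in which each protein was detected.
exp_a <- c("e1", "e2", "e3", "e4")
exp_b <- c("e3", "e4", "e5")

shared <- length(intersect(exp_a, exp_b))  # experiments containing both proteins
n_a <- length(exp_a)
n_b <- length(exp_b)

# Textbook definitions (assumed here, not taken from the MSiP source).
dice    <- 2 * shared / (n_a + n_b)
jaccard <- shared / length(union(exp_a, exp_b))
overlap <- shared / min(n_a, n_b)

c(dice = dice, jaccard = jaccard, overlap = overlap)
```

All three scores range from 0 (never co-purified) to 1, and differ only in how the shared count is normalized.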
Finally, a weighted matrix model can be used to score interactions between the proteins identified in a given AP-MS experiment. Its output includes the number of experiments in which a pair of proteins is co-purified (k) and $-1 \times \log(P\text{-value})$ of the hypergeometric test (logHG), computed from the experimental overlap, each protein's total number of observed experiments, and the total number of experiments.
datScoring <- Weighted.matrixModel(SampleDatInput)
head(datScoring)
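As a rough illustration of the hypergeometric score, the sketch below computes an upper-tail P-value for one protein pair with base R's `phyper()`. The counts are hypothetical, and the tail direction and the natural-log base are assumptions. The package's implementation may differ in both.

```r
# Hypothetical counts for one protein pair.
N     <- 100  # total number of experiments
n_a   <- 10   # experiments in which protein A was observed
n_b   <- 8    # experiments in which protein B was observed
k_obs <- 4    # experiments in which both proteins were co-purified

# P(X >= k_obs): probability of at least k_obs shared experiments if the
# n_b experiments containing B were drawn at random from all N experiments.
p_value <- phyper(k_obs - 1, m = n_a, n = N - n_a, k = n_b,
                  lower.tail = FALSE)

# Assumption: natural logarithm; a larger value means a less likely overlap.
logHG <- -log(p_value)
logHG
```

Passing `k_obs - 1` with `lower.tail = FALSE` is what makes the tail inclusive of the observed overlap.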
The labeled feature matrix can be used as input for Support Vector Machine (SVM) or Random Forest (RF) classifiers. The classifier then assigns each bait-prey pair a confidence score indicating the level of support for that pair of proteins to interact. Hyperparameter optimization can also be performed to select the set of parameters that maximizes the model's performance. The RF and SVM functions provided in this package also compute the areas under the precision-recall (PR) and ROC curves to evaluate the performance of the classifier.
data("testdfClassifier")
head(testdfClassifier)
# Only generate the PR curve
predicted_RF <- rfTrain(testdfClassifier,
  impute = FALSE, p = 0.3, parameterTuning = FALSE,
  mtry = seq(from = 1, to = 5, by = 1),
  min_node_size = seq(from = 1, to = 5, by = 1),
  splitrule = c("gini"), metric = "Accuracy",
  resampling.method = "repeatedcv", iter = 5, repeats = 5,
  pr.plot = TRUE, roc.plot = FALSE)
# The positive score is the level of support for the pair being a true positive
# The negative score is the level of support for the pair being a true negative
head(predicted_RF)
# Only generate the ROC curve
predicted_SVM <- svmTrain(testdfClassifier,
  impute = FALSE, p = 0.3, parameterTuning = FALSE,
  cost = seq(from = 2, to = 10, by = 2),
  gamma = seq(from = 0.01, to = 0.10, by = 0.02),
  kernel = "radial", ncross = 10,
  pr.plot = FALSE, roc.plot = TRUE)
# The positive score is the level of support for the pair being a true positive
# The negative score is the level of support for the pair being a true negative
head(predicted_SVM)
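To clarify what the area under the ROC curve measures, here is a minimal base R sketch that computes it from classifier scores using the rank-based (Mann-Whitney) formulation. The scores and labels are hypothetical, and this is not necessarily how `rfTrain()` or `svmTrain()` compute the value internally.

```r
# Hypothetical classifier scores and true labels (1 = interacting pair).
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)
labels <- c(1,   1,   0,   1,   0,   0)

n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)

# Rank-based AUC: the fraction of positive-negative pairs in which the
# positive example receives the higher score (ties count as one half,
# because rank() assigns average ranks to ties).
r   <- rank(scores)
auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation of interacting from non-interacting pairs.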