predPPI_MACP | R Documentation |
This function first performs several pre-processing steps to improve the quality of the raw data, then computes the similarity between protein pairs from their co-elution profiles. The computed features, together with class labels generated from reference complexes, are fed into an individual ML classifier or an ensemble of classifiers. These models generate a weighted protein interaction network in which the edge weight between two protein nodes is the model's probability estimate that they interact. High-confidence PPIs, selected by ROC-curve cutoff analysis, are then denoised and finally partitioned by two-stage clustering: first with ClusterONE, then with MCL.
predPPI_MACP( data, refcpx, tpath = tempdir(), data_processing = TRUE, data_imputing = TRUE, scaling = TRUE, keepMT = FALSE, pcc = TRUE, PCCN = TRUE, pcc_p = TRUE, spearman = TRUE, kendall = TRUE, bicor = TRUE, weighted_rank = TRUE, cosine = TRUE, jaccard = TRUE, dice = TRUE, apex = TRUE, minfo = TRUE, bayesian = TRUE, wcc = TRUE, euclidean = TRUE, manhattan = TRUE, canberra = TRUE, avg.distance = TRUE, rept = 10, corr_removal = FALSE, corr_cutoff = 0.5, classifier = c("glm", "svmRadial", "ranger"), verboseIter = TRUE, cv_fold = 5, plots = FALSE, subcellular_mtPPI = FALSE, organism = "mouse", csize = 3, d = 0.3, p = 2, max_overlap = 0.8, inflation = 9 )
data |
A data matrix with proteins along the rows and fractions
along the columns. See |
refcpx |
A list of known reference complexes.
See |
tpath |
A character string giving the path to the project directory. If no directory is specified, results are stored in the temporary directory (tempdir()). |
data_processing |
If TRUE, removes proteins whose peptides are detected in only
one fraction (i.e., "one-hit wonders") across the co-elution table,
common contaminants (e.g., keratins; mouse and human only),
and frequent flyers. Defaults to TRUE.
See |
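The "one-hit-wonder" filter described above can be sketched in base R as follows. This is an illustrative assumption of the logic (a protein is kept only if it has a value above zero in more than one fraction), not MACP's own implementation; the contaminant and frequent-flyer filters are omitted, and the function name `drop_one_hit_wonders` is hypothetical.

```r
# Hypothetical helper (not part of MACP): keep only proteins (rows) detected
# in more than one fraction. A detection is assumed to be any value > 0;
# NA values are treated as "not detected".
drop_one_hit_wonders <- function(m) {
  detected <- rowSums(m > 0, na.rm = TRUE)
  m[detected > 1, , drop = FALSE]
}

m <- rbind(
  P1 = c(0, 5, 7, 0),
  P2 = c(0, 0, 3, 0),  # detected in a single fraction: removed
  P3 = c(2, 4, 0, 1)
)
filtered <- drop_one_hit_wonders(m)
rownames(filtered)
```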
data_imputing |
If TRUE, imputes missing values in the protein elution
profile matrix using the average of the adjacent fractions.
Missing values in the first or last column are not imputed.
Defaults to TRUE. See |
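A minimal sketch of adjacent-value imputation, assuming each missing value is replaced by the mean of its left and right neighbours in the same protein profile. This is why the first and last columns cannot be imputed: they lack one neighbour. The function name `impute_adjacent` is hypothetical, and MACP's handling of consecutive missing values may differ.

```r
# Hypothetical helper (not part of MACP): replace each interior NA with the
# mean of its neighbours in the adjacent fractions of the same row.
# NAs in the first or last column, or next to another NA, are left as NA.
impute_adjacent <- function(m) {
  out <- m
  n <- ncol(m)
  if (n < 3) return(out)
  for (j in 2:(n - 1)) {
    for (i in which(is.na(m[, j]))) {
      left <- m[i, j - 1]
      right <- m[i, j + 1]
      if (!is.na(left) && !is.na(right)) out[i, j] <- (left + right) / 2
    }
  }
  out
}

m <- rbind(
  P1 = c(1, NA, 5, 2),
  P2 = c(NA, 4, 6, NA)  # NAs in the first and last column stay missing
)
imputed <- impute_adjacent(m)
```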
scaling |
If TRUE, performs column-wise and row-wise normalization.
Defaults to TRUE. See |
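The exact normalization used by MACP is not specified here. The sketch below assumes a common choice for co-elution data: divide each fraction (column) by its total, then divide each protein profile (row) by its total so that every profile sums to 1. The name `normalize_cols_rows` is hypothetical.

```r
# Hypothetical sketch (assumed normalization, not MACP's code):
# column-sum normalization followed by row-sum normalization.
normalize_cols_rows <- function(m) {
  cs <- colSums(m, na.rm = TRUE)
  cs[cs == 0] <- 1                 # avoid division by zero for empty fractions
  m <- sweep(m, 2, cs, "/")
  rs <- rowSums(m, na.rm = TRUE)
  rs[rs == 0] <- 1                 # leave all-zero profiles as zeros
  sweep(m, 1, rs, "/")
}

m <- rbind(P1 = c(2, 2, 0), P2 = c(2, 6, 4))
scaled <- normalize_cols_rows(m)
rowSums(scaled)
```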
keepMT |
If TRUE, removes all non-mitochondrial proteins by
mapping the co-eluted proteins from chromatography fractions to the MitoCarta
database. Note that this option is only applicable to
mouse or human. Defaults to FALSE. See |
pcc |
If TRUE, computes pairwise protein profile similarity
using Pearson correlation metric. Defaults to TRUE.
See |
PCCN |
If TRUE, computes pairwise protein profile similarity using
Pearson correlation plus noise. Defaults to TRUE.
See |
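One common way to compute "Pearson correlation plus noise" is to perturb the elution counts with Poisson noise several times, recompute the Pearson correlation each time, and average the results. The sketch below assumes that approach and assumes that the number of repetitions corresponds to the `rept` argument; it is not MACP's own code, and `pccn` is a hypothetical name. `rpois()` requires non-negative count values.

```r
# Hypothetical sketch (not MACP's implementation): Pearson correlation plus
# Poisson noise, averaged over `rept` noisy replicates of the count matrix.
pccn <- function(m, rept = 10, seed = 1) {
  set.seed(seed)                            # reproducible noise (resets the global RNG)
  acc <- 0
  for (r in seq_len(rept)) {
    noisy <- m
    noisy[] <- rpois(length(m), lambda = m) # resample each count around its observed value
    acc <- acc + cor(t(noisy))              # correlate protein profiles (rows)
  }
  acc / rept
}

m <- rbind(
  P1 = c(10, 50, 90, 40, 5),
  P2 = c(12, 48, 95, 35, 8),
  P3 = c(80, 20, 5, 30, 70)
)
sim <- pccn(m, rept = 10)
```

Averaging over noisy replicates dampens correlations that are driven by a few low counts, which are the values most affected by Poisson noise.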
pcc_p |
If TRUE, computes the P-value of the Pearson correlation.
Defaults to TRUE.
See |
spearman |
If TRUE, computes pairwise protein profile similarity using
Spearman correlation. Defaults to TRUE.
See |
kendall |
If TRUE, computes pairwise protein profile similarity using
Kendall correlation. Defaults to TRUE.
See |
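The Pearson, Spearman and Kendall scores (pcc, spearman, kendall) and the Pearson P-value (pcc_p) correspond to standard correlation measures that base R provides directly. The sketch below shows those measures on a toy matrix; it does not reproduce MACP's internal handling of missing values or ties.

```r
# Pairwise similarity between protein profiles (rows) with base R's cor();
# t() is needed because cor() correlates columns.
m <- rbind(
  P1 = c(1, 3, 7, 4, 1),
  P2 = c(2, 5, 9, 5, 2),
  P3 = c(8, 4, 1, 2, 6)
)
pearson  <- cor(t(m), method = "pearson")
spearman <- cor(t(m), method = "spearman")
kendall  <- cor(t(m), method = "kendall")

# P-value for a single pair of profiles (cf. pcc_p)
p_value <- cor.test(m["P1", ], m["P2", ], method = "pearson")$p.value
```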
bicor |
If TRUE, computes pairwise protein profile similarity using
biweight midcorrelation (bicor). Defaults to TRUE.
See |
weighted_rank |
If TRUE, computes pairwise protein profile similarity
using weighted rank measure. Defaults to TRUE.
See |
cosine |
If TRUE, computes pairwise protein profile similarity
using cosine metric. Defaults to TRUE.
See |
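Cosine similarity is the dot product of two profiles divided by the product of their Euclidean norms. A minimal base-R sketch for all protein pairs follows; `cosine_sim` is a hypothetical name, not a MACP function.

```r
# Hypothetical helper: cosine similarity between all pairs of rows.
# Rows with zero norm would produce NaN and should be removed beforehand.
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

m <- rbind(P1 = c(1, 2, 0), P2 = c(2, 4, 0), P3 = c(0, 0, 3))
cs <- cosine_sim(m)
cs["P1", "P2"]
```

Because cosine similarity ignores vector length, P1 and P2 score 1 even though P2 is twice as abundant.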
jaccard |
If TRUE, computes pairwise protein profile similarity
using the Jaccard metric. Defaults to TRUE.
See |
dice |
If TRUE, computes pairwise protein profile similarity
using the Dice measure. Defaults to TRUE.
See |
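Jaccard and Dice coefficients are defined on sets, so applying them to elution profiles requires binarizing first. The sketch below assumes a protein counts as detected in a fraction when its value is above zero; MACP's own binarization may differ, and `jaccard_dice` is a hypothetical name.

```r
# Hypothetical sketch: binarize profiles (detected = value > 0), then compare
# the sets of fractions in which two proteins are detected.
#   Jaccard = |intersection| / |union|
#   Dice    = 2 * |intersection| / (|A| + |B|)
jaccard_dice <- function(x, y) {
  a <- x > 0
  b <- y > 0
  inter <- sum(a & b)
  c(jaccard = inter / sum(a | b),
    dice    = 2 * inter / (sum(a) + sum(b)))
}

s <- jaccard_dice(c(0, 3, 5, 1, 0), c(0, 2, 4, 0, 7))
s
```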
apex |
If TRUE, computes pairwise protein profile similarity
using the apex score. Defaults to TRUE. See |
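A common definition of an apex score, assumed here, is 1 when two proteins reach their maximum (apex) in the same fraction and 0 otherwise. The sketch below implements that definition; MACP's exact scoring may differ, and `apex_score` is a hypothetical name.

```r
# Hypothetical sketch: binary apex score between all pairs of proteins.
# which.max() returns the first maximum when a profile has ties.
apex_score <- function(m) {
  apex <- apply(m, 1, which.max)     # index of the peak fraction per protein
  outer(apex, apex, "==") * 1        # 1 if peaks coincide, else 0
}

m <- rbind(P1 = c(1, 9, 2), P2 = c(0, 5, 1), P3 = c(7, 1, 0))
a <- apex_score(m)
a
```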
minfo |
If TRUE, computes pairwise protein profile similarity
using mutual information. Defaults to TRUE.
See |
bayesian |
If TRUE, computes pairwise protein profile similarity using
Bayes correlation based on zero-count distribution. Defaults to TRUE.
See |
wcc |
If TRUE, computes pairwise protein profile similarity
using weighted cross correlation. Defaults to TRUE.
See |
euclidean |
If TRUE, computes pairwise protein profile similarity
using Euclidean distance. Defaults to TRUE.
See |
manhattan |
If TRUE, computes pairwise protein profile similarity
using Manhattan distance. Defaults to TRUE.
See |
canberra |
If TRUE, computes pairwise protein profile similarity
using Canberra distance. Defaults to TRUE.
See |
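The Euclidean, Manhattan and Canberra measures are all available through base R's dist(). The sketch below computes the raw distances between protein profiles; how MACP converts these distances into similarity features is not shown and may involve an additional transform.

```r
# Pairwise distances between protein profiles (rows) using base R's dist().
m <- rbind(P1 = c(1, 2, 3), P2 = c(2, 2, 5), P3 = c(0, 1, 1))
euc <- as.matrix(dist(m, method = "euclidean"))
man <- as.matrix(dist(m, method = "manhattan"))
can <- as.matrix(dist(m, method = "canberra"))  # sum(|x - y| / |x + y|)
euc["P1", "P2"]
```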
avg.distance |
If TRUE, computes pairwise protein profile similarity
using the avg.distance measure. Defaults to TRUE.
See |
rept |
Number of Poisson iterations. Defaults to 10.
See |
corr_removal |
If TRUE, removes protein pairs with
correlation scores below the user-defined threshold (corr_cutoff). Defaults to FALSE.
See |
corr_cutoff |
User-defined threshold for correlation similarity
scores. Defaults to 0.5. See |
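The corr_removal / corr_cutoff filter can be pictured as converting a symmetric similarity matrix into a table of unique protein pairs and keeping only pairs at or above the threshold. This is an illustrative sketch, not MACP's code; `filter_pairs` and its column names are hypothetical.

```r
# Hypothetical sketch: unique protein pairs from a symmetric similarity
# matrix, keeping pairs whose score is >= corr_cutoff.
filter_pairs <- function(sim, corr_cutoff = 0.5) {
  idx <- which(upper.tri(sim), arr.ind = TRUE)   # each pair once, no self-pairs
  pairs <- data.frame(
    protein1 = rownames(sim)[idx[, "row"]],
    protein2 = colnames(sim)[idx[, "col"]],
    score    = sim[idx]
  )
  pairs[pairs$score >= corr_cutoff, , drop = FALSE]
}

ids <- c("P1", "P2", "P3")
sim <- matrix(c(1.0, 0.8, 0.2,
                0.8, 1.0, 0.4,
                0.2, 0.4, 1.0),
              nrow = 3, dimnames = list(ids, ids))
kept <- filter_pairs(sim, corr_cutoff = 0.5)
kept
```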
classifier |
The classifier(s) to use, either individually or as an
ensemble. Defaults to c("glm", "svmRadial", "ranger"). See |
verboseIter |
Logical; whether to print a log of the training
process. Defaults to TRUE. See |
cv_fold |
Number of partitions for cross-validation; defaults to 5.
See |
plots |
Logical; whether to plot the performance of the learning algorithm under k-fold cross-validation. Defaults to FALSE. These plots are:
See |
subcellular_mtPPI |
If TRUE, removes PPIs between outer mitochondrial
membrane (OMM) and matrix proteins, between intermembrane space (IMS) and
matrix proteins, and between proteins in any mitochondrial subcompartment
(except the OMM) and cytosolic proteins, as these are deemed erroneous. Defaults to FALSE.
See |
organism |
Organism under study (i.e., mouse or human).
Defaults to mouse. See |
csize |
An integer, the minimum size of the predicted complexes.
Defaults to 3. See |
d |
A number, the minimum density of predicted complexes. Defaults to 0.3.
See |
p |
An integer, penalty value for the inclusion of each node.
Defaults to 2.
See |
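The d and p arguments correspond to two quantities from the ClusterONE algorithm (Nepusz, Yu and Paccanaro, 2012): cluster density (internal edge weight divided by the number of possible internal edges) and cohesiveness, w_in / (w_in + w_bound + p|V|), where p penalizes each included node. The sketch below computes both for one candidate cluster. It only illustrates these definitions, is not the ClusterONE implementation MACP uses, and `cluster_scores` is a hypothetical name.

```r
# Hypothetical sketch of ClusterONE's quality measures for one cluster in a
# weighted undirected network given as a symmetric adjacency matrix `w`
# (zero diagonal).
cluster_scores <- function(w, members, p = 2) {
  inside  <- rownames(w) %in% members
  w_in    <- sum(w[inside, inside]) / 2   # each internal edge counted twice
  w_bound <- sum(w[inside, !inside])      # weight of edges leaving the cluster
  n <- sum(inside)
  c(cohesiveness = w_in / (w_in + w_bound + p * n),
    density      = w_in / (n * (n - 1) / 2))
}

ids <- c("A", "B", "C", "D")
w <- matrix(0, 4, 4, dimnames = list(ids, ids))
w["A", "B"] <- w["B", "A"] <- 1
w["A", "C"] <- w["C", "A"] <- 1
w["B", "C"] <- w["C", "B"] <- 1
w["C", "D"] <- w["D", "C"] <- 0.5
s <- cluster_scores(w, c("A", "B", "C"), p = 2)
s
```

A larger p lowers cohesiveness for every node added, so it favors smaller clusters; d then discards candidate clusters whose density falls below the threshold.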
max_overlap |
A number, specifies the maximum allowed
overlap between two clusters. Defaults to 0.8.
See |
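ClusterONE measures the overlap between two clusters A and B as |A ∩ B|^2 / (|A| |B|) and merges clusters whose overlap exceeds a threshold. The sketch below assumes max_overlap plays that role; `overlap_score` is a hypothetical name.

```r
# Overlap score between two clusters (character vectors of protein IDs):
# |intersection|^2 / (|A| * |B|).
overlap_score <- function(a, b) {
  length(intersect(a, b))^2 / (length(a) * length(b))
}

ov <- overlap_score(c("A", "B", "C", "D"), c("B", "C", "D", "E"))
ov > 0.8   # below the default max_overlap, so the clusters stay separate
```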
inflation |
MCL inflation parameter. Defaults to 9. |
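For readers unfamiliar with the inflation parameter, the following minimal base-R sketch of Markov clustering (MCL) shows where it acts: after each expansion step, matrix entries are raised to the inflation power and columns are renormalized, so higher values yield more, smaller clusters. This is not the MCL implementation MACP calls, and `mcl` is a hypothetical name.

```r
# Hypothetical minimal MCL: alternate expansion (matrix squaring) and
# inflation (element-wise power followed by column normalization).
mcl <- function(adj, inflation = 9, max_iter = 100, tol = 1e-8) {
  m <- adj + diag(nrow(adj))                 # add self-loops
  m <- sweep(m, 2, colSums(m), "/")          # make columns sum to 1
  for (i in seq_len(max_iter)) {
    prev <- m
    m <- m %*% m                             # expansion
    m <- m^inflation                         # inflation
    m <- sweep(m, 2, colSums(m), "/")
    if (max(abs(m - prev)) < tol) break
  }
  # each column's largest entry marks its attractor; nodes sharing one form a cluster
  attractor <- apply(m, 2, which.max)
  split(rownames(adj), attractor)
}

ids <- c("A", "B", "C", "D", "E", "F")
adj <- matrix(0, 6, 6, dimnames = list(ids, ids))
adj[1:3, 1:3] <- 1
adj[4:6, 4:6] <- 1
diag(adj) <- 0
cl <- mcl(adj, inflation = 9)
cl
```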
predPPI_MACP
Returns the following data sets in the current directory:
unfilteredPPIs - Unfiltered interactions
filteredPPI - High-confidence interactions defined by ROC threshold.
High_confidence interactions_with_mt_sublocalization - if subcellular_mtPPI is TRUE, high-confidence PPIs with their mitochondrial sublocalization status.
predicted_cpx_clusterONE - Putative complexes generated by clusterONE.
predicted_cpx_clusterONE_MCL - Putative complexes generated by clusterONE and MCL.
Best_roc_curve_cutoff - Best cutoff generated from ROC curve.
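A cutoff can be derived from an ROC curve in several ways. A common criterion, assumed here, is Youden's J statistic (sensitivity + specificity - 1), maximized over all candidate thresholds of the predicted interaction probabilities. MACP may use a different criterion, and `youden_cutoff` is a hypothetical name.

```r
# Hypothetical sketch: pick the score threshold that maximizes Youden's J.
# `labels` is TRUE for known interacting pairs, FALSE otherwise.
youden_cutoff <- function(scores, labels) {
  cutoffs <- sort(unique(scores))
  j <- sapply(cutoffs, function(t) {
    pred <- scores >= t
    sens <- sum(pred & labels) / sum(labels)
    spec <- sum(!pred & !labels) / sum(!labels)
    sens + spec - 1
  })
  cutoffs[which.max(j)]
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
labels <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
best <- youden_cutoff(scores, labels)
best
```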