predPPI_MACP: Predict Protein-Protein Interactions and Putative Complexes

View source: R/predPPI_MACP.R

predPPI_MACP    R Documentation

Predict Protein-Protein Interactions and Putative Complexes

Description

This function first runs several pre-processing steps to improve the quality of the raw data, then computes the similarity between protein pairs from their co-elution profiles. The computed features, together with class labels generated from reference complexes, are fed into an individual ML classifier or an ensemble of classifiers. These models generate a weighted protein interaction network in which the edge weight between two protein nodes represents the ML model's probability estimate for their interaction. High-confidence PPIs selected by ROC-curve cutoff analysis are then denoised and finally partitioned via two-stage clustering, first by ClusterONE, then by MCL.

Usage

predPPI_MACP(
  data,
  refcpx,
  tpath = tempdir(),
  data_processing = TRUE,
  data_imputing = TRUE,
  scaling = TRUE,
  keepMT = FALSE,
  pcc = TRUE,
  PCCN = TRUE,
  pcc_p = TRUE,
  spearman = TRUE,
  kendall = TRUE,
  bicor = TRUE,
  weighted_rank = TRUE,
  cosine = TRUE,
  jaccard = TRUE,
  dice = TRUE,
  apex = TRUE,
  minfo = TRUE,
  bayesian = TRUE,
  wcc = TRUE,
  euclidean = TRUE,
  manhattan = TRUE,
  canberra = TRUE,
  avg.distance = TRUE,
  rept = 10,
  corr_removal = FALSE,
  corr_cutoff = 0.5,
  classifier = c("glm", "svmRadial", "ranger"),
  verboseIter = TRUE,
  cv_fold = 5,
  plots = FALSE,
  subcellular_mtPPI = FALSE,
  organism = "mouse",
  csize = 3,
  d = 0.3,
  p = 2,
  max_overlap = 0.8,
  inflation = 9
)

Arguments

data

A data matrix with proteins in rows and fractions in columns. See exampleData.

refcpx

A list of known reference complexes. See getCPX.

tpath

A character string indicating the path to the project directory. If no directory is supplied, output is stored in the R temporary directory (tempdir()).

data_processing

If TRUE, removes proteins whose peptides were detected in only one fraction across the co-elution table (i.e., "one-hit-wonders"), common contaminants (e.g., keratins; mouse and human organisms only), and frequent flyers. Defaults to TRUE. See data_filtering.
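The "one-hit-wonder" part of this filter can be illustrated with a minimal base R sketch. The toy matrix and the rule "non-zero means detected" are assumptions made for illustration; data_filtering itself may apply different criteria, and the contaminant and frequent-flyer steps are not shown.

```r
# Toy co-elution matrix (made-up values): proteins in rows, fractions in columns.
elution <- rbind(
  P1 = c(0, 5, 9, 2),
  P2 = c(0, 0, 7, 0),  # detected in a single fraction only
  P3 = c(3, 4, 0, 1)
)
colnames(elution) <- paste0("F", 1:4)

# Count the fractions in which each protein was detected (assumed: value > 0).
n_detected <- rowSums(elution > 0, na.rm = TRUE)

# Keep proteins observed in at least two fractions.
filtered <- elution[n_detected > 1, , drop = FALSE]
print(rownames(filtered))
```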

data_imputing

If TRUE, imputes missing values in the protein elution profile matrix via the average of adjacent rows. Imputation is not applicable to missing values in the first or last column. Defaults to TRUE. See impute_MissingData.
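As a rough illustration of neighbor-based imputation, the sketch below fills an interior missing value with the mean of the values in the fractions on either side. Reading "adjacent" as the neighboring fractions is an assumption suggested by the first/last-column restriction; impute_neighbors is a hypothetical helper, not the package's impute_MissingData.

```r
# Hypothetical helper: fill interior NAs with the mean of the two neighboring
# fractions. Values in the first and last column are never imputed.
impute_neighbors <- function(m) {
  out <- m
  if (ncol(m) < 3) return(out)
  for (j in 2:(ncol(m) - 1)) {
    # Only fill a gap when both neighbors were observed in the original data.
    gap <- is.na(m[, j]) & !is.na(m[, j - 1]) & !is.na(m[, j + 1])
    out[gap, j] <- (m[gap, j - 1] + m[gap, j + 1]) / 2
  }
  out
}

# Toy profiles (made-up values).
profiles <- rbind(
  P1 = c(1, NA, 3),
  P2 = c(NA, 2, 4)   # NA in the first column stays missing
)
imputed <- impute_neighbors(profiles)
print(imputed)
```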

scaling

If TRUE, performs column and row-wise normalization. Defaults to TRUE. See scaling.

keepMT

If TRUE, removes all non-mitochondrial proteins by mapping the co-eluted proteins from chromatography fractions to the MitoCarta database. Note that this option is only applicable to mouse or human organisms. Defaults to FALSE. See keepMT.

pcc

If TRUE, computes pairwise protein profile similarity using Pearson correlation metric. Defaults to TRUE. See calculate_PPIscore.
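To make the idea of pairwise profile similarity concrete, here is a small base R sketch that computes Pearson correlations between toy co-elution profiles and flattens them into one row per protein pair. The data and the output column names are illustrative assumptions; calculate_PPIscore may organize its output differently.

```r
# Toy co-elution profiles (made-up values): proteins in rows, fractions in columns.
elution <- rbind(
  P1 = c(0, 2, 8, 3, 0),
  P2 = c(0, 3, 9, 2, 0),
  P3 = c(7, 1, 0, 0, 4)
)

# cor() correlates columns, so transpose to put proteins in columns.
pcc <- cor(t(elution), method = "pearson")

# Flatten the upper triangle into one row per unordered protein pair.
idx <- which(upper.tri(pcc), arr.ind = TRUE)
pairs <- data.frame(
  protein_a = rownames(pcc)[idx[, "row"]],
  protein_b = colnames(pcc)[idx[, "col"]],
  pcc = pcc[idx]
)
print(pairs)
```

Proteins whose profiles peak in the same fractions (P1 and P2 here) receive a correlation close to 1, while proteins eluting in different fractions score low or negative.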

PCCN

If TRUE, computes pairwise protein profile similarity using Pearson correlation plus noise. Defaults to TRUE. See calculate_PPIscore.

pcc_p

If TRUE, computes P-value of the Pearson correlation. Defaults to TRUE. See calculate_PPIscore.

spearman

If TRUE, computes pairwise protein profile similarity using Spearman correlation. Defaults to TRUE. See calculate_PPIscore.

kendall

If TRUE, computes pairwise protein profile similarity using Kendall correlation. Defaults to TRUE. See calculate_PPIscore.

bicor

If TRUE, computes pairwise protein profile similarity using biweight midcorrelation (bicor). Defaults to TRUE. See calculate_PPIscore.

weighted_rank

If TRUE, computes pairwise protein profile similarity using the weighted rank measure. Defaults to TRUE. See calculate_PPIscore.

cosine

If TRUE, computes pairwise protein profile similarity using cosine metric. Defaults to TRUE. See calculate_PPIscore.

jaccard

If TRUE, computes pairwise protein profile similarity using the Jaccard metric. Defaults to TRUE. See calculate_PPIscore.
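For readers unfamiliar with these two metrics, the sketch below shows textbook definitions of cosine and Jaccard similarity applied to elution profiles. Both helper functions are hypothetical, and treating a non-zero value as "present in the fraction" for Jaccard is an assumption; calculate_PPIscore may define the measures differently.

```r
# Cosine similarity: the angle between two intensity profiles.
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Jaccard similarity on presence/absence: shared fractions divided by
# fractions in which at least one of the two proteins was detected.
jaccard_sim <- function(x, y) {
  a <- x > 0
  b <- y > 0
  sum(a & b) / sum(a | b)
}

# Toy profiles (made-up values).
p1 <- c(4, 6, 0, 0)
p2 <- c(0, 3, 5, 0)

print(cosine_sim(p1, p2))
print(jaccard_sim(p1, p2))
```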

dice

If TRUE, computes pairwise protein profile similarity using the Dice measure. Defaults to TRUE. See calculate_PPIscore.

apex

If TRUE, computes pairwise protein profile similarity using apex. Defaults to TRUE. See calculate_PPIscore.

minfo

If TRUE, computes pairwise protein profile similarity using mutual information. Defaults to TRUE. See calculate_PPIscore.

bayesian

If TRUE, computes pairwise protein profile similarity using Bayes correlation based on zero-count distribution. Defaults to TRUE. See calculate_PPIscore.

wcc

If TRUE, computes pairwise protein profile similarity using weighted cross correlation. Defaults to TRUE. See calculate_PPIscore.

euclidean

If TRUE, computes pairwise protein profile similarity using the Euclidean measure. Defaults to TRUE. See calculate_PPIscore.

manhattan

If TRUE, computes pairwise protein profile similarity using the Manhattan measure. Defaults to TRUE. See calculate_PPIscore.

canberra

If TRUE, computes pairwise protein profile similarity using the Canberra measure. Defaults to TRUE. See calculate_PPIscore.

avg.distance

If TRUE, computes pairwise protein profile similarity using the avg.distance measure. Defaults to TRUE. See calculate_PPIscore.

rept

Number of Poisson iterations. Defaults to 10. See calculate_PPIscore.

corr_removal

If TRUE, removes protein pairs with correlation scores below the user-defined threshold. Defaults to FALSE. See calculate_PPIscore.

corr_cutoff

User-defined threshold for correlation similarity scores. Defaults to 0.5. See calculate_PPIscore.
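The effect of corr_removal and corr_cutoff can be pictured with a short base R sketch: pairs scoring below the threshold are dropped before classification. The score table and its column names are made up for illustration and are not the actual output of calculate_PPIscore.

```r
# Toy table of pairwise correlation scores (made-up values).
scores <- data.frame(
  protein_a = c("P1", "P1", "P2", "P3"),
  protein_b = c("P2", "P3", "P3", "P4"),
  pcc       = c(0.91, 0.42, 0.50, -0.10)
)

corr_cutoff <- 0.5

# Remove pairs with a score strictly below the cutoff; pairs at the
# threshold are kept.
kept <- scores[scores$pcc >= corr_cutoff, ]
print(kept)
```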

classifier

The classifier or classifiers to use for the individual or ensemble model. See caret for the available classifiers. Defaults to c("glm", "svmRadial", "ranger"). See ensemble_model.

verboseIter

Logical value indicating whether to print the status of the training process. Defaults to TRUE. See ensemble_model.

cv_fold

Number of partitions for cross-validation; defaults to 5. See ensemble_model.
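What cv_fold controls can be sketched in base R: the labeled protein pairs are split into that many partitions, and each partition is held out once while the model trains on the rest. This is a generic illustration of k-fold partitioning with made-up sizes; ensemble_model delegates the actual resampling to caret.

```r
set.seed(42)

n_pairs <- 20   # number of labeled protein pairs (made-up)
cv_fold <- 5    # number of partitions

# Assign every pair to one of cv_fold folds of (near) equal size.
fold_id <- sample(rep(seq_len(cv_fold), length.out = n_pairs))

# For each fold, the held-out pairs form the test set and the rest the training set.
splits <- lapply(seq_len(cv_fold), function(k) {
  list(train = which(fold_id != k), test = which(fold_id == k))
})

print(lengths(lapply(splits, `[[`, "test")))
```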

plots

Logical value indicating whether to plot the performance of the learning algorithm using k-fold cross-validation. Defaults to FALSE. These plots are:

  • pr_plot - Precision-recall plot

  • roc_plot - ROC plot

  • point_plot - Point plot showing accuracy, F1-score, positive predictive value (PPV), sensitivity (SE) and MCC.

See ensemble_model.

subcellular_mtPPI

If TRUE, removes PPIs occurring between the outer mt membrane (OMM) and matrix, between the intermembrane space (IMS) and matrix, and between any subcellular mt compartment (except OMM) and cytosolic proteins, as these are deemed to be erroneous. Defaults to FALSE. See subcellular.mtPPI.

organism

Organism under study (i.e., mouse or human). Defaults to mouse. See subcellular.mtPPI.

csize

An integer, the minimum size of the predicted complexes. Defaults to 3. See get_clusters.

d

A number, density of predicted complexes. Defaults to 0.3. See get_clusters.
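One common way to define cluster density, used here as an assumption about what d thresholds, is the total internal edge weight divided by the number of possible edges among the cluster's members. The helper cluster_density below is hypothetical and only illustrates that definition.

```r
# Hypothetical helper: density of a cluster given its symmetric weight matrix.
cluster_density <- function(w) {
  n <- nrow(w)
  # Sum each undirected edge once, then divide by the n * (n - 1) / 2 possible edges.
  sum(w[upper.tri(w)]) / (n * (n - 1) / 2)
}

# Toy 3-protein cluster with two weighted edges (made-up weights).
w <- matrix(0, 3, 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
w["A", "B"] <- w["B", "A"] <- 0.9
w["B", "C"] <- w["C", "B"] <- 0.6

dens <- cluster_density(w)
print(dens)
```

Under this definition the toy cluster would pass a threshold of d = 0.3.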

p

An integer, penalty value for the inclusion of each node. Defaults to 2. See get_clusters.

max_overlap

A number, specifies the maximum allowed overlap between two clusters. Defaults to 0.8. See get_clusters.
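The overlap between two clusters is often measured in ClusterONE-style workflows with the match coefficient, the squared size of the intersection divided by the product of the two cluster sizes. Whether get_clusters uses exactly this score is an assumption, and overlap_score is a hypothetical helper.

```r
# Hypothetical helper: match coefficient |A ∩ B|^2 / (|A| * |B|).
overlap_score <- function(a, b) {
  length(intersect(a, b))^2 / (length(a) * length(b))
}

# Toy clusters (made-up membership).
cluster_a <- c("P1", "P2", "P3", "P4")
cluster_b <- c("P2", "P3", "P4", "P5")

score <- overlap_score(cluster_a, cluster_b)
max_overlap <- 0.8

# Clusters whose overlap exceeds max_overlap would be candidates for merging.
print(score)
print(score > max_overlap)
```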

inflation

MCL inflation parameter. Defaults to 9.
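The role of the inflation parameter can be seen in a single iteration of the Markov clustering procedure written in base R: expansion (matrix squaring) followed by inflation (element-wise power and column renormalization). This is a didactic sketch on a made-up graph, not the MCL implementation the package calls; mcl_step is a hypothetical helper.

```r
# Toy graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
adj <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj[edges] <- 1
adj <- adj + t(adj) + diag(6)          # symmetrize and add self-loops

# Column-stochastic transition matrix.
m0 <- sweep(adj, 2, colSums(adj), "/")

# Hypothetical helper: one MCL iteration.
mcl_step <- function(m, inflation) {
  m <- m %*% m                         # expansion: two-step random walks
  m <- m^inflation                     # inflation: boost strong flows
  sweep(m, 2, colSums(m), "/")         # renormalize each column to sum to 1
}

soft  <- mcl_step(m0, inflation = 2)
sharp <- mcl_step(m0, inflation = 9)

# Largest entry per column: higher inflation concentrates the flow more.
print(round(apply(soft, 2, max), 3))
print(round(apply(sharp, 2, max), 3))
```

Higher inflation values sharpen the flow faster and tend to yield smaller, more granular clusters.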

Details

predPPI_MACP

Value

Returns the following data sets in the current directory:

  • unfilteredPPIs - Unfiltered interactions

  • filteredPPI - High-confidence interactions defined by ROC threshold.

  • High_confidence interactions_with_mt_sublocalization - If subcellular_mtPPI is TRUE, returns high-confidence PPIs with mt sublocalization status.

  • predicted_cpx_clusterONE - Putative complexes generated by clusterONE.

  • predicted_cpx_clusterONE_MCL - Putative complexes generated by clusterONE and MCL.

  • Best_roc_curve_cutoff - Best cutoff generated from ROC curve.
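How a "best" ROC cutoff can be derived from predicted probabilities is sketched below using Youden's J statistic (sensitivity minus false-positive rate). The choice of Youden's J and the toy labels and probabilities are assumptions for illustration; the package may select its cutoff by a different criterion.

```r
# Toy data (made-up): 1 = reference interaction, 0 = non-interaction.
labels <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.35, 0.5, 0.4, 0.3, 0.2, 0.1)

# Evaluate every observed probability as a candidate cutoff.
cutoffs <- sort(unique(probs))
stats <- t(sapply(cutoffs, function(th) {
  pred <- probs >= th
  c(cutoff = th,
    tpr = sum(pred & labels == 1) / sum(labels == 1),   # sensitivity
    fpr = sum(pred & labels == 0) / sum(labels == 0))   # 1 - specificity
}))

# Youden's J: pick the cutoff maximizing tpr - fpr.
youden <- stats[, "tpr"] - stats[, "fpr"]
best_cutoff <- stats[which.max(youden), "cutoff"]

# Pairs at or above the cutoff would form the high-confidence set.
high_confidence <- probs >= best_cutoff
print(best_cutoff)
```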


MACP documentation built on March 7, 2023, 7:42 p.m.