predPPI_MACP: Predict Protein-Protein Interactions and Putative Complexes
In MACP: Macromolecular Assemblies from Co-Elution Profile (MACP)

Description

This function first executes several pre-processing steps to improve the quality of the raw data, then computes similarity between protein pairs using their co-elution profiles. The computed features and the class labels generated from reference complexes are then fed into an individual machine-learning (ML) classifier or an ensemble of classifiers. These models generate a weighted protein interaction network in which edge weights between protein nodes represent the ML model's probability estimate for interaction. High-confidence PPIs resulting from ROC-curve cutoff analysis are then denoised and finally partitioned via two-stage clustering, first by ClusterONE, then by MCL.
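
To make the similarity step concrete, the following base R sketch scores a toy co-elution matrix with Pearson and Spearman correlation. The matrix values and the object names (`profiles`, `pair_scores`) are made up for illustration, and this is not the code that `calculate_PPIscore` runs internally.

```r
# Toy co-elution matrix: rows are proteins, columns are fractions.
# All intensities are invented for illustration.
profiles <- rbind(
  P1 = c(0, 2, 10, 3, 0, 0),
  P2 = c(0, 3, 12, 2, 0, 0),  # co-elutes with P1
  P3 = c(0, 0, 0, 1, 9, 4)    # elutes in later fractions
)
colnames(profiles) <- paste0("F", 1:6)

# cor() correlates columns, so transpose to correlate proteins.
pcc <- cor(t(profiles), method = "pearson")
scc <- cor(t(profiles), method = "spearman")

# One row per unique protein pair, one column per similarity feature.
idx <- which(upper.tri(pcc), arr.ind = TRUE)
pair_scores <- data.frame(
  protein_A = rownames(pcc)[idx[, "row"]],
  protein_B = colnames(pcc)[idx[, "col"]],
  pearson   = pcc[idx],
  spearman  = scc[idx]
)
print(pair_scores)
```

In a table like this, each row is a candidate interaction and each score column is one feature that a classifier could be trained on once class labels from reference complexes are attached.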

Usage

predPPI_MACP(
  data,
  refcpx,
  tpath = tempdir(),
  data_processing = TRUE,
  data_imputing = TRUE,
  scaling = TRUE,
  keepMT = FALSE,
  pcc = TRUE,
  PCCN = TRUE,
  pcc_p = TRUE,
  spearman = TRUE,
  kendall = TRUE,
  bicor = TRUE,
  weighted_rank = TRUE,
  cosine = TRUE,
  jaccard = TRUE,
  dice = TRUE,
  apex = TRUE,
  minfo = TRUE,
  bayesian = TRUE,
  wcc = TRUE,
  euclidean = TRUE,
  manhattan = TRUE,
  canberra = TRUE,
  avg.distance = TRUE,
  rept = 10,
  corr_removal = FALSE,
  corr_cutoff = 0.5,
  classifier = c("glm", "svmRadial", "ranger"),
  verboseIter = TRUE,
  cv_fold = 5,
  plots = FALSE,
  subcellular_mtPPI = FALSE,
  organism = "mouse",
  csize = 3,
  d = 0.3,
  p = 2,
  max_overlap = 0.8,
  inflation = 9
)

Arguments

`data`	A data matrix with proteins in rows and fractions in columns. See `exampleData`.
`refcpx`	A list of known reference complexes. See `getCPX`.
`tpath`	A character string indicating the path to the project directory. If no directory is specified, output is stored in the temporary directory (`tempdir()`).
`data_processing`	If TRUE, removes proteins whose peptides were detected in only one fraction across the co-elution table (i.e., "one-hit-wonders"), common contaminants (e.g., keratins; mouse and human organisms only), and frequent flyers. Defaults to TRUE. See `data_filtering`.
`data_imputing`	If TRUE, imputes missing values in the protein elution profile matrix using the average of adjacent rows. Imputation is not applied to missing values in the first or last column. Defaults to TRUE. See `impute_MissingData`.
`scaling`	If TRUE, performs column- and row-wise normalization. Defaults to TRUE. See `scaling`.
`keepMT`	If TRUE, removes all non-mitochondrial proteins by mapping the co-eluted proteins from chromatography fractions to the MitoCarta database. Only applicable to mouse or human organisms. Defaults to FALSE. See `keepMT`.
`pcc`	If TRUE, computes pairwise protein profile similarity using Pearson correlation metric. Defaults to TRUE. See `calculate_PPIscore`.
`PCCN`	If TRUE, computes pairwise protein profile similarity using Pearson correlation plus noise. Defaults to TRUE. See `calculate_PPIscore`.
`pcc_p`	If TRUE, computes P-value of the Pearson correlation. Defaults to TRUE. See `calculate_PPIscore`.
`spearman`	If TRUE, computes pairwise protein profile similarity using Spearman correlation. Defaults to TRUE. See `calculate_PPIscore`.
`kendall`	If TRUE, computes pairwise protein profile similarity using Kendall correlation. Defaults to TRUE. See `calculate_PPIscore`.
`bicor`	If TRUE, computes pairwise protein profile similarity using biweight midcorrelation (bicor). Defaults to TRUE. See `calculate_PPIscore`.
`weighted_rank`	If TRUE, computes pairwise protein profile similarity using the weighted rank measure. Defaults to TRUE. See `calculate_PPIscore`.
`cosine`	If TRUE, computes pairwise protein profile similarity using cosine metric. Defaults to TRUE. See `calculate_PPIscore`.
`jaccard`	If TRUE, computes pairwise protein profile similarity using the Jaccard metric. Defaults to TRUE. See `calculate_PPIscore`.
`dice`	If TRUE, computes pairwise protein profile similarity using the Dice measure. Defaults to TRUE. See `calculate_PPIscore`.
`apex`	If TRUE, computes pairwise protein profile similarity using apex. Defaults to TRUE. See `calculate_PPIscore`.
`minfo`	If TRUE, computes pairwise protein profile similarity using mutual information. Defaults to TRUE. See `calculate_PPIscore`.
`bayesian`	If TRUE, computes pairwise protein profile similarity using Bayes correlation based on zero-count distribution. Defaults to TRUE. See `calculate_PPIscore`.
`wcc`	If TRUE, computes pairwise protein profile similarity using weighted cross correlation. Defaults to TRUE. See `calculate_PPIscore`.
`euclidean`	If TRUE, computes pairwise protein profile similarity using the Euclidean measure. Defaults to TRUE. See `calculate_PPIscore`.
`manhattan`	If TRUE, computes pairwise protein profile similarity using the Manhattan measure. Defaults to TRUE. See `calculate_PPIscore`.
`canberra`	If TRUE, computes pairwise protein profile similarity using the Canberra measure. Defaults to TRUE. See `calculate_PPIscore`.
`avg.distance`	If TRUE, computes pairwise protein profile similarity using the average distance (avg.distance) measure. Defaults to TRUE. See `calculate_PPIscore`.
`rept`	Number of Poisson iterations. Defaults to 10. See `calculate_PPIscore`.
`corr_removal`	If TRUE, removes protein pairs with correlation scores below the user-defined threshold. Defaults to FALSE. See `calculate_PPIscore`.
`corr_cutoff`	User-defined threshold for correlation similarity scores. Defaults to 0.5. See `calculate_PPIscore`.
`classifier`	The type of classifier to use for ensemble or individual model. See `caret` for the available classifiers. Defaults to c("glm", "svmRadial", "ranger"). See `ensemble_model`.
`verboseIter`	Logical value indicating whether to print the status of the training process. Defaults to TRUE. See `ensemble_model`.
`cv_fold`	Number of partitions for cross-validation; defaults to 5. See `ensemble_model`.
`plots`	Logical value indicating whether to plot the performance of the learning algorithm using k-fold cross-validation. Defaults to FALSE. The plots are: `pr_plot` (precision-recall plot), `roc_plot` (ROC plot), and `point_plot` (point plot showing accuracy, F1-score, positive predictive value (PPV), sensitivity (SE), and MCC). See `ensemble_model`.
`subcellular_mtPPI`	If TRUE, removes PPIs occurring between the outer mt membrane (OMM) and matrix, between the intermembrane space (IMS) and matrix, and between any subcellular mt compartment (except OMM) and cytosolic proteins, as these are deemed erroneous. Defaults to FALSE. See `subcellular.mtPPI`.
`organism`	Organism under study (i.e., mouse or human). Defaults to "mouse". See `subcellular.mtPPI`.
`csize`	An integer, the minimum size of the predicted complexes. Defaults to 3. See `get_clusters`.
`d`	A number, density of predicted complexes. Defaults to 0.3. See `get_clusters`.
`p`	An integer, penalty value for the inclusion of each node. Defaults to 2. See `get_clusters`.
`max_overlap`	A number, specifies the maximum allowed overlap between two clusters. Defaults to 0.8. See `get_clusters`.
`inflation`	MCL inflation parameter. Defaults to 9.
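
As a rough illustration of what an MCL inflation parameter controls, the base R sketch below alternates expansion (matrix squaring) and inflation (element-wise power followed by column re-normalization) on a toy weighted network. The function `mcl_sketch`, the toy edge weights, and the convergence settings are all assumptions made for this example; it is not the routine MACP calls, and it uses an inflation of 2 rather than the package default.

```r
# Minimal Markov clustering (MCL) sketch in base R. Illustrative only;
# this is not the implementation used inside MACP.
mcl_sketch <- function(adj, inflation = 2, max_iter = 100, tol = 1e-8) {
  m <- adj + diag(nrow(adj))             # add self-loops
  m <- sweep(m, 2, colSums(m), "/")      # column-stochastic transition matrix
  for (i in seq_len(max_iter)) {
    nxt <- m %*% m                       # expansion: flow along two-step walks
    nxt <- nxt ^ inflation               # inflation: strengthen strong flows
    nxt <- sweep(nxt, 2, colSums(nxt), "/")
    converged <- max(abs(nxt - m)) < tol
    m <- nxt
    if (converged) break
  }
  # Nodes whose columns flow to the same attractor rows share a cluster.
  labels <- apply(m > 1e-6, 2, function(col) paste(which(col), collapse = "-"))
  unname(split(colnames(m), labels))
}

# Toy network: two triangles (hubs C and D) joined by one weak edge C-D.
nodes <- c("A", "B", "C", "D", "E", "F")
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
from <- c("A", "A", "B", "C", "D", "D", "E")
to   <- c("B", "C", "C", "D", "E", "F", "F")
w    <- c(0.5, 0.9, 0.9, 0.1, 0.9, 0.9, 0.5)
adj[cbind(from, to)] <- w
adj[cbind(to, from)] <- w                # keep the matrix symmetric

clusters <- mcl_sketch(adj, inflation = 2)
print(clusters)
```

Larger inflation values penalize weak flows more strongly, which generally yields smaller, tighter clusters.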

Details

predPPI_MACP

Value

Returns the following data sets, written to the current directory:

unfilteredPPIs - Unfiltered interactions
filteredPPI - High-confidence interactions defined by ROC threshold.
High_confidence interactions_with_mt_sublocalization - If `subcellular_mtPPI` is TRUE, returns high-confidence PPIs with mt sublocalization status.
predicted_cpx_clusterONE - Putative complexes generated by ClusterONE.
predicted_cpx_clusterONE_MCL - Putative complexes generated by ClusterONE and MCL.
Best_roc_curve_cutoff - Best cutoff generated from ROC curve.
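
This page does not state which criterion defines the best ROC cutoff. As one common possibility, the sketch below picks the score threshold that maximizes Youden's J (true positive rate minus false positive rate) on made-up scores and labels. The helper `roc_points` and all values are hypothetical and are not taken from MACP.

```r
# Toy classifier scores and reference labels (1 = known interaction).
# All values are invented for illustration.
scores <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05)
labels <- c(1,    1,    1,    1,    0,    0,    1,    0,    0,    0)

# True and false positive rates at every observed score cutoff.
roc_points <- function(scores, labels) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(cutoffs, function(cut) {
    sum(scores >= cut & labels == 1) / sum(labels == 1)
  })
  fpr <- sapply(cutoffs, function(cut) {
    sum(scores >= cut & labels == 0) / sum(labels == 0)
  })
  data.frame(cutoff = cutoffs, tpr = tpr, fpr = fpr)
}

roc <- roc_points(scores, labels)
roc$youden_j <- roc$tpr - roc$fpr        # Youden's J at each cutoff
best <- roc[which.max(roc$youden_j), ]   # assumed "best cutoff" criterion

# Pairs scoring at or above the cutoff are kept as high-confidence.
high_conf <- scores >= best$cutoff
print(best)
```

Under this assumption, the unfiltered interactions correspond to all scored pairs and the filtered set to the pairs where `high_conf` is TRUE.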