imaging_identification: imaging_identification

View source: R/workflow.R

imaging_identification    R Documentation

imaging_identification

Description

Performs a peptide mass fingerprint (PMF) search for MALDI imaging data analysis.

Usage

imaging_identification(
  datafile,
  projectfolder = NULL,
  threshold = 0.001,
  ppm = 5,
  mode = c("Proteomics", "Metabolomics"),
  Digestion_site = "trypsin",
  missedCleavages = 0:1,
  Fastadatabase = "uniprot-bovin.fasta",
  adducts = c("M+H"),
  Modifications = list(fixed = NULL, fixmod_position = NULL, variable = NULL,
    varmod_position = NULL),
  Substitute_AA = NULL,
  Decoy_search = TRUE,
  Decoy_adducts = c("M+ACN+H", "M+IsoProp+H", "M+DMSO+H", "M+Co", "M+Ag", "M+Cu",
    "M+He", "M+Ne", "M+Ar", "M+Kr", "M+Xe", "M+Rn"),
  Decoy_mode = "isotope",
  mzrange = c(700, 4000),
  Database_stats = F,
  adjust_score = FALSE,
  IMS_analysis = TRUE,
  PMFsearch = IMS_analysis,
  Load_candidatelist = IMS_analysis || plot_cluster_image_grid,
  Bypass_generate_spectrum = FALSE,
  peptide_ID_filter = 2,
  Protein_feature_summary = TRUE,
  Peptide_feature_summary = TRUE,
  plot_ion_image = FALSE,
  parallel = detectCores(),
  spectra_segments_per_file = 4,
  Segmentation = c("spatialKMeans", "spatialShrunkenCentroids", "Virtual_segmentation",
    "none", "def_file"),
  Segmentation_def = "Segmentation_def.csv",
  Segmentation_ncomp = "auto-detect",
  Segmentation_variance_coverage = 0.8,
  preprocess = list(force_preprocess = FALSE, use_preprocessRDS = TRUE, smoothSignal =
    list(method = "disable"), reduceBaseline = list(method = "locmin"), peakPick =
    list(method = "adaptive"), peakAlign = list(tolerance = ppm/2, units = "ppm"),
    peakFilter = list(freq.min = 0.05), normalize = list(method = c("rms", "tic",
    "reference")[1], mz = 1)),
  Smooth_range = 1,
  Virtual_segmentation_rankfile = NULL,
  Rotate_IMG = NULL,
  Region_feature_summary = FALSE,
  Spectrum_validate = TRUE,
  output_candidatelist = TRUE,
  use_previous_candidates = FALSE,
  score_method = "SQRTP",
  plot_cluster_image_grid = FALSE,
  deconv_peaklist = "New",
  plot_cluster_image_maxretry = 2,
  plot_cluster_image_overwrite = F,
  smooth.image = "gaussian",
  componentID_colname = "Peptide",
  ClusterID_colname = "Protein",
  Protein_desc_of_interest = ".",
  Protein_desc_of_exclusion = NULL,
  plot_unique_component = TRUE,
  FDR_cutoff = 0.05,
  use_top_rank = NULL,
  plot_matching_score = F,
  Component_plot_coloure = "mono",
  cluster_color_scale = "blackwhite",
  plot_layout = "line",
  export_Header_table = T,
  export_footer_table = T,
  attach_summary_cluster = T,
  remove_cluster_from_grid = attach_summary_cluster,
  pixel_size_um = 50,
  img_brightness = 100,
  Thread = 4,
  cluster_rds_path = NULL,
  remove_score_outlier = F,
  Plot_score_IQR_cutoff = 0.75,
  Plot_score_abs_cutoff = -0.1,
  mzAlign_runs = "TopNfeature_mean",
  ...
)

Arguments

datafile

the path(s) of the data files to analyse; leave it blank to select the data through a graphical user interface

projectfolder

optional; if NULL, the script extracts the path from the datafile(s) and uses the first working directory as the project folder

threshold

specify the intensity threshold (0 to 1, expressed as a fraction) for reporting an identified molecule

ppm

the m/z tolerance (in ppm) for peak integration
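As a rough illustration of what a ppm tolerance means in absolute terms, the sketch below converts a ppm value into an m/z window around a theoretical m/z. The helper ppm_window is hypothetical and not part of HiTMaP; how the package applies the tolerance internally is not documented on this page.

```r
# Hypothetical helper (not part of HiTMaP): convert a ppm tolerance
# into an absolute m/z window around a theoretical m/z value.
ppm_window <- function(mz, ppm = 5) {
  delta <- mz * ppm / 1e6   # absolute tolerance in m/z units
  c(lower = mz - delta, upper = mz + delta)
}

# At m/z 1000, 5 ppm corresponds to +/- 0.005 m/z.
ppm_window(1000, ppm = 5)
```

Because the window scales with m/z, the same ppm setting is wider in absolute terms at the upper end of the mass range than at the lower end.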

Digestion_site

set the enzyme digestion specificity with one or more regular expressions or the name of an enzyme

missedCleavages

the number of missed cleavages allowed in this PMF search
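To make the interplay between a cleavage rule and the allowed missed cleavages concrete, here is a minimal in-silico digestion sketch in base R. The function digest and the trypsin-like rule "[KR](?!P)" (cleave after K or R unless followed by P) are assumptions for illustration only; the regular-expression conventions that HiTMaP expects for Digestion_site may differ.

```r
# Hypothetical helper (not part of HiTMaP): cleave a protein sequence after
# every match of `site` and join neighbouring fragments for missed cleavages.
digest <- function(protein, site = "[KR](?!P)", missed = 0:1) {
  n <- nchar(protein)
  hits <- gregexpr(site, protein, perl = TRUE)[[1]]
  # Positions of residues after which the chain is cut (none if no match).
  ends <- if (hits[1] == -1) integer(0) else as.integer(hits)
  ends <- unique(c(ends[ends < n], n))   # the last fragment ends at the C-terminus
  starts <- c(1L, head(ends, -1) + 1L)
  peptides <- character(0)
  for (m in missed) {
    if (m >= length(starts)) next        # not enough fragments to join
    idx <- seq_len(length(starts) - m)
    # A peptide with m missed cleavages spans m + 1 consecutive fragments.
    peptides <- c(peptides, substring(protein, starts[idx], ends[idx + m]))
  }
  unique(peptides)
}

digest("MKWVTFISLLR", missed = 0:1)
```

With missed = 0:1 the result contains both the fully cleaved fragments and every pair of adjacent fragments, so the candidate list grows with each additional missed cleavage allowed.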

Fastadatabase

the FASTA database used in this PMF search; the file should be placed in the same folder as the data files

adducts

the adducts list to be used for generating the PMF search candidates
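The sketch below shows how an adduct list expands one neutral monoisotopic mass into several singly charged candidate m/z values. The shift table is hand-written from commonly tabulated values for positive-mode adducts and is an assumption; it is not HiTMaP's internal adduct table, and adduct_mz is a hypothetical helper.

```r
# Assumed mass shifts (Da) for singly charged positive-mode adducts.
# Hand-written for illustration; not taken from HiTMaP.
adduct_shift <- c("M+H" = 1.007276, "M+NH4" = 18.033823, "M+Na" = 22.989218)

# Hypothetical helper: candidate m/z values of one neutral monoisotopic mass.
adduct_mz <- function(mono_mass, adducts = "M+H") {
  unknown <- setdiff(adducts, names(adduct_shift))
  if (length(unknown) > 0) {
    stop("No mass shift defined for: ", paste(unknown, collapse = ", "))
  }
  mono_mass + adduct_shift[adducts]
}

adduct_mz(1000, c("M+H", "M+Na"))
```

Each extra adduct multiplies the number of candidates searched, which is worth keeping in mind when weighing sensitivity against false discoveries.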

Modifications

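set the modifications as a list with the elements fixed, fixmod_position, variable and varmod_position (all NULL by default)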

Substitute_AA

set the amino acid substitutions

Decoy_search

enable (default) or disable the decoy search

Decoy_adducts

define the adduct list for the decoy search. The decoy adducts can be "M+ACN+H", "M+IsoProp+H", "M+DMSO+H", "M+Co", "M+Ag", "M+Cu", "M+He", "M+Ne", "M+Ar", "M+Kr", "M+Xe" or "M+Rn".

Decoy_mode

select the decoy search mode between "isotope" (default), "element" and "adduct"

mzrange

define the mz range for the experiment, default is 700 to 4000 m/z.

IMS_analysis

Set as TRUE to perform data pre-processing and the proteomics search; set as FALSE to bypass them

peptide_ID_filter

set the minimal count of peptides needed to identify a protein

Protein_feature_summary

"IMS_analysis" follow-up process that will collect all the identified peptide information and associate them with possible proteins

Peptide_feature_summary

"IMS_analysis" follow-up process that will summarize the identified peptides of all data files and generate a "peptide shortlist" in the result summary folder

plot_ion_image

"Peptide_feature_summary" follow-up process that will plot every component in the "peptide shortlist". Please use the cluster image grid to output the images.

parallel

the number of threads to use in the PMF search; this option currently works only on Windows

spectra_segments_per_file

the optimal number of distinct regions in the tissue section; a virtual segmentation will be applied to the image files with this value. For a better PMF result, choose a value that balances sensitivity against the false discovery rate (FDR).

Segmentation

set as "spatialKMeans" to enable "spatialKMeans" segmentation; set as "spatialShrunkenCentroids" to enable "spatialShrunkenCentroids" segmentation; if a region rank file is supplied, set as "Virtual_segmentation" to perform a manual segmentation; set as "none" to bypass the segmentation.

preprocess

a list of params that define the IMS data pre-processing procedure
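The default list shown under Usage can be rebuilt and adjusted before it is passed in. The sketch below copies those defaults and changes two entries with utils::modifyList. Passing the complete list rather than a partial one is a cautious assumption, since this page does not say whether missing entries are filled in automatically; "tic" is one of the normalisation methods listed in the default.

```r
ppm <- 5

# Defaults copied from the Usage section above.
preprocess_default <- list(
  force_preprocess  = FALSE,
  use_preprocessRDS = TRUE,
  smoothSignal      = list(method = "disable"),
  reduceBaseline    = list(method = "locmin"),
  peakPick          = list(method = "adaptive"),
  peakAlign         = list(tolerance = ppm / 2, units = "ppm"),
  peakFilter        = list(freq.min = 0.05),
  normalize         = list(method = "rms", mz = 1)
)

# modifyList() merges nested lists, so entries that are not named here
# (for example normalize$mz) keep their default values.
my_preprocess <- utils::modifyList(
  preprocess_default,
  list(force_preprocess = TRUE, normalize = list(method = "tic"))
)

str(my_preprocess$normalize)
```

The resulting list could then be supplied as preprocess = my_preprocess in the call to imaging_identification.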

Smooth_range

"Segmentation" pixel smooth range

Virtual_segmentation_rankfile

specify a region rank file that contains the region information for manual region segmentation

Rotate_IMG

specify a configuration file to further change the rotation of the images

Region_feature_summary

"IMS_analysis" follow-up process that will summarize the m/z features of all regions of all data files into the summary folder

use_previous_candidates

set as TRUE to reload the previously generated candidate list.

score_method

specify the peptide spectrum scoring method, "SQRTP" is recommended.

plot_cluster_image_grid

set as "TRUE" to enable the protein cluster image function.

plot_cluster_image_overwrite

Set as TRUE to generate the cluster images regardless of the existence of previously generated file(s)

componentID_colname

Specify the component ID column in the result spreadsheet.

ClusterID_colname

Specify the cluster ID column in the result spreadsheet.

Protein_desc_of_interest

Specify a list of protein descriptions for cluster image plotting. Default setting will plot all reported proteins.

Protein_desc_of_exclusion

Specify a list of protein descriptions to be excluded from cluster image plotting.

plot_unique_component

Set as "TRUE" to plot only the unique components in the cluster image plotting.

FDR_cutoff

set the protein FDR cutoff threshold, default is 5 percent
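For readers unfamiliar with target-decoy filtering, the sketch below shows the generic idea behind an FDR cutoff: at a given score threshold, the FDR is estimated as the number of decoy hits divided by the number of target hits that pass. This is a textbook simplification and an assumption, not HiTMaP's actual scoring or FDR procedure; the function name and the scores are made-up toy values.

```r
# Hypothetical helper (not part of HiTMaP): estimate the FDR at a score
# cutoff as (decoy hits passing) / (target hits passing).
fdr_at_cutoff <- function(target_scores, decoy_scores, cutoff) {
  n_target <- sum(target_scores >= cutoff)
  n_decoy  <- sum(decoy_scores >= cutoff)
  if (n_target == 0) return(NA_real_)   # undefined when nothing passes
  n_decoy / n_target
}

# Toy scores, invented purely for illustration.
target <- c(5.1, 4.8, 4.2, 3.9, 3.5, 3.0, 2.2, 1.9, 1.1, 0.7)
decoy  <- c(2.4, 1.5, 1.2, 0.9, 0.4)

fdr_at_cutoff(target, decoy, cutoff = 2.0)   # 1 decoy / 7 targets
fdr_at_cutoff(target, decoy, cutoff = 2.5)   # 0 decoys / 6 targets
```

Under this simplified view, an FDR_cutoff of 0.05 would correspond to choosing the most permissive score threshold at which the estimate stays at or below 5 percent.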

plot_matching_score

enable the spectrum matching overlay plot

Component_plot_coloure

set as "mono" to use a pre-defined color scale to plot component images. Set as "as.cluster" to use the previously assigned mono color in the additive cluster binning process.

cluster_color_scale

Set as "blackwhite" to use only black and white in the cluster image plotting. Using "blackwhite" in cluster_color_scale will override the components' color setting.

plot_layout

Set as "line" to plot cluster and component images for multiple data files, or as "grid" to plot cluster images for a single data file. In "grid" mode, images will be rendered into a grid with 5 columns.

export_Header_table

Set as "TRUE" to plot the header in the cluster image plotting. Header table includes the basic information of cluster and components.

export_footer_table

Set as "TRUE" to plot the footer in the cluster image plotting. Footer shows the protein coverage in the Proteomics mode.

attach_summary_cluster

Set as "TRUE" to attach an enlarged cluster image to the bottom of the cluster image.

remove_cluster_from_grid

Set as "TRUE" to remove the cluster image from the cluster image grid. It is recommended to set this to the same value as attach_summary_cluster.

cluster_rds_path

set as NULL if no preprocessed.rds is available. For a single file, the script will then load the raw data file, which may reduce the signal intensities. For multiple samples, the script will try to load the RDS file from each "ID" folder, merge the m/z features according to the instrument resolution setting, and output a combined RDS file to the project folder. When rendering cluster images for multiple files, set attach_summary_cluster to FALSE and remove_cluster_from_grid to TRUE.

Value

None

Examples

# datafile is left blank, so the data files are selected through the graphical user interface
imaging_identification(threshold=0.05, ppm=5, Digestion_site="[G]",
                       missedCleavages=0:1, Fastadatabase="murine_matrisome.fasta",
                       adducts=c("M+H","M+NH4","M+Na"), IMS_analysis=TRUE,
                       Protein_feature_summary=TRUE, plot_cluster_image_grid=TRUE,
                       Peptide_feature_summary=TRUE, plot_ion_image=FALSE,
                       parallel=3, spectra_segments_per_file=5, Segmentation="spatialKMeans"
                       )


MASHUOA/HiTMaP documentation built on Nov. 14, 2024, 5:23 p.m.