imaging_identification
In guoguodigit/Metwork: A collection of tools for imaging MS data processing

View source: R/workflow.R

Description

This is a peptide mass fingerprint (PMF) search function for MALDI imaging data analysis.
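A PMF search starts by digesting each protein in silico and computing the m/z of every resulting peptide. The R sketch that follows illustrates that idea only. It assumes the classic trypsin rule (cleave after K or R unless the next residue is P) and standard monoisotopic residue masses. `digest_trypsin()` and `mh_mass()` are hypothetical helpers, not Metwork's implementation, which accepts arbitrary regex rules through `Digestion_site`.

```r
# Hypothetical sketch, not Metwork's code: in silico trypsin digestion with
# missed cleavages, followed by [M+H]+ m/z calculation for each peptide.

# Standard monoisotopic residue masses (Da) of the 20 common amino acids
residue_mass <- c(
  G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276, V = 99.06841,
  T = 101.04768, C = 103.00919, L = 113.08406, I = 113.08406, N = 114.04293,
  D = 115.02694, Q = 128.05858, K = 128.09496, E = 129.04259, M = 131.04049,
  H = 137.05891, F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931
)
water  <- 18.010565  # H2O added to the summed residue masses
proton <- 1.007276   # proton mass for the M+H adduct

# Assumed trypsin rule: cleave after K or R unless the next residue is P
digest_trypsin <- function(protein, missed = 0:1) {
  aa <- strsplit(protein, "")[[1]]
  n <- length(aa)
  cut <- which(aa[-n] %in% c("K", "R") & aa[-1] != "P")  # cut after these positions
  starts <- c(1, cut + 1)
  ends <- c(cut, n)
  peptides <- character(0)
  for (m in missed) {
    # join fragment i with the next m fragments (m missed cleavages)
    i <- seq_len(max(0, length(starts) - m))
    peptides <- c(peptides, substring(protein, starts[i], ends[i + m]))
  }
  unique(peptides)
}

# Monoisotopic [M+H]+ m/z of a peptide sequence
mh_mass <- function(peptide) {
  sum(residue_mass[strsplit(peptide, "")[[1]]]) + water + proton
}
```

For example, `sapply(digest_trypsin("PEPKTIDERAK", 0:1), mh_mass)` gives the candidate m/z list that would then be matched against the observed peaks.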

Usage

imaging_identification(
  datafile,
  projectfolder = NULL,
  threshold = 0.001,
  ppm = 5,
  mode = c("Proteomics", "Metabolomics"),
  Digestion_site = "trypsin",
  missedCleavages = 0:1,
  Fastadatabase = "uniprot-bovin.fasta",
  adducts = c("M+H"),
  Modifications = list(fixed = NULL, fixmod_position = NULL, variable = NULL,
    varmod_position = NULL),
  Substitute_AA = NULL,
  Decoy_search = TRUE,
  Decoy_adducts = c("M+ACN+H", "M+IsoProp+H", "M+DMSO+H", "M+Co", "M+Ag", "M+Cu",
    "M+He", "M+Ne", "M+Ar", "M+Kr", "M+Xe", "M+Rn"),
  Decoy_mode = "isotope",
  mzrange = c(700, 4000),
  Database_stats = F,
  adjust_score = FALSE,
  IMS_analysis = TRUE,
  PMFsearch = IMS_analysis,
  Load_candidatelist = IMS_analysis || plot_cluster_image_grid,
  Bypass_generate_spectrum = FALSE,
  peptide_ID_filter = 2,
  Protein_feature_summary = TRUE,
  Peptide_feature_summary = TRUE,
  plot_ion_image = FALSE,
  parallel = detectCores(),
  spectra_segments_per_file = 4,
  Segmentation = c("spatialKMeans", "spatialShrunkenCentroids", "Virtual_segmentation",
    "none", "def_file"),
  Segmentation_def = "Segmentation_def.csv",
  Segmentation_ncomp = "auto-detect",
  Segmentation_variance_coverage = 0.8,
  preprocess = list(force_preprocess = FALSE, use_preprocessRDS = TRUE, smoothSignal =
    list(method = "disable"), reduceBaseline = list(method = "locmin"), peakPick =
    list(method = "adaptive"), peakAlign = list(tolerance = ppm/2, units = "ppm"),
    peakFilter = list(freq.min = 0.05), normalize = list(method = c("rms", "tic",
    "reference")[1], mz = 1)),
  Smooth_range = 1,
  Virtual_segmentation_rankfile = NULL,
  Rotate_IMG = NULL,
  Region_feature_summary = FALSE,
  Spectrum_validate = TRUE,
  output_candidatelist = TRUE,
  use_previous_candidates = FALSE,
  score_method = "SQRTP",
  plot_cluster_image_grid = FALSE,
  deconv_peaklist = "New",
  plot_cluster_image_maxretry = 2,
  plot_cluster_image_overwrite = F,
  smooth.image = "gaussian",
  componentID_colname = "Peptide",
  ClusterID_colname = "Protein",
  Protein_desc_of_interest = ".",
  Protein_desc_of_exclusion = NULL,
  plot_unique_component = TRUE,
  FDR_cutoff = 0.05,
  use_top_rank = NULL,
  plot_matching_score = F,
  Component_plot_coloure = "mono",
  cluster_color_scale = "blackwhite",
  plot_layout = "line",
  export_Header_table = T,
  export_footer_table = T,
  attach_summary_cluster = T,
  remove_cluster_from_grid = attach_summary_cluster,
  pixel_size_um = 50,
  img_brightness = 100,
  Thread = 4,
  cluster_rds_path = NULL,
  remove_score_outlier = F,
  Plot_score_IQR_cutoff = 0.75,
  Plot_score_abs_cutoff = -0.1,
  mzAlign_runs = "TopNfeature_mean",
  ...
)

Arguments

`datafile`	Path(s) to the data file(s) to analyse. Leave it blank to open a graphical user interface for selecting the data.
`projectfolder`	Optional. If NULL, the script extracts the path from datafile(s) and uses the first working directory as the project folder.
`threshold`	The intensity threshold, as a fraction between 0 and 1 (0 to 100 percent), that an identified molecule must reach to be reported.
`ppm`	The m/z tolerance, in ppm, for peak integration.
`Digestion_site`	The enzyme digestion specificity, given as one or more regular expressions or the name of an enzyme.
`missedCleavages`	The number(s) of missed cleavages allowed in the PMF search.
`Fastadatabase`	The FASTA database used in the PMF search. The file should be placed in the same folder as the data files.
`adducts`	The adduct list used to generate the PMF search candidates.
`Modifications`	The modifications to consider, given as a list with the elements fixed, fixmod_position, variable and varmod_position.
`Substitute_AA`	The amino acid substitutions to consider.
`Decoy_search`	enable (default) or disable the decoy search
`Decoy_adducts`	The adduct list for the decoy search. Decoy adducts may be any of "M+ACN+H", "M+IsoProp+H", "M+DMSO+H", "M+Co", "M+Ag", "M+Cu", "M+He", "M+Ne", "M+Ar", "M+Kr", "M+Xe" and "M+Rn".
`Decoy_mode`	The decoy search mode: "isotope" (default), "element" or "adduct".
`mzrange`	define the mz range for the experiment, default is 700 to 4000 m/z.
`IMS_analysis`	Set as TRUE to perform data pre-processing and the proteomics search; set as FALSE to bypass them.
`peptide_ID_filter`	The minimum number of peptides required to identify a protein.
`Protein_feature_summary`	A follow-up step of `"IMS_analysis"` that collects all identified peptide information and associates it with possible proteins.
`Peptide_feature_summary`	A follow-up step of `"IMS_analysis"` that summarizes the identified peptides of all data files and generates a `"peptide shortlist"` in the result summary folder.
`plot_ion_image`	A follow-up step of `"Peptide_feature_summary"` that plots every component in the `"peptide shortlist"`. Please use the cluster image grid to output the images.
`parallel`	The number of threads used in the PMF search. This option currently works only on Windows.
`spectra_segments_per_file`	The optimal number of distinct regions in the tissue section; a virtual segmentation with this number of regions is applied to the image files. For a better PMF result, choose a value at the sweet spot between sensitivity and false discovery rate (FDR).
`Segmentation`	Set as "spatialKMeans" to enable `"spatialKMeans"` segmentation; set as "spatialShrunkenCentroids" to enable `"spatialShrunkenCentroids"` segmentation; if a region rank file is supplied, set as "Virtual_segmentation" to perform a manual segmentation; set as "none" to bypass the segmentation.
`preprocess`	a list of params that define the IMS data pre-processing procedure
`Smooth_range`	The pixel smoothing range used in `"Segmentation"`.
`Virtual_segmentation_rankfile`	A region rank file containing the region information for manual region segmentation.
`Rotate_IMG`	specify a configuration file to further change the rotation of the images
`Region_feature_summary`	A follow-up step of `"IMS_analysis"` that summarizes the m/z features of all regions of all data files into the summary folder.
`use_previous_candidates`	set as TRUE to reload the previously generated candidate list.
`score_method`	specify the peptide spectrum scoring method, "SQRTP" is recommended.
`plot_cluster_image_grid`	Set as TRUE to enable the protein cluster image function.
`plot_cluster_image_overwrite`	Set as TRUE to generate the cluster images regardless of whether previously generated files exist.
`componentID_colname`	Specify the component ID column in the result spreadsheet.
`ClusterID_colname`	Specify the cluster ID column in the result spreadsheet.
`Protein_desc_of_interest`	Specify a list of protein descriptions for cluster image plotting. Default setting will plot all reported proteins.
`Protein_desc_of_exclusion`	Specify a list of protein descriptions to be excluded from cluster image plotting.
`plot_unique_component`	Set as TRUE to plot only the unique components in the cluster image plotting.
`FDR_cutoff`	The protein FDR cutoff threshold; the default is 0.05 (5 percent).
`plot_matching_score`	enable the spectrum matching overlay plot
`Component_plot_coloure`	set as "mono" to use a pre-defined color scale to plot component images. Set as "as.cluster" to use the previously assigned mono color in the additive cluster binning process.
`cluster_color_scale`	Set as "blackwhite" to use only black and white in the cluster image plotting. Using "blackwhite" overrides the components' color setting.
`plot_layout`	Set as `"line"` to plot cluster and component images for multiple data files, or as `"grid"` to plot cluster images for a single data file. In "grid" mode, images are rendered into a grid with 5 columns.
`export_Header_table`	Set as TRUE to plot the header in the cluster image plotting. The header table includes basic information on the cluster and its components.
`export_footer_table`	Set as TRUE to plot the footer in the cluster image plotting. In Proteomics mode, the footer shows the protein coverage.
`attach_summary_cluster`	Set as TRUE to attach an enlarged cluster image to the bottom of the cluster image.
`remove_cluster_from_grid`	Set as TRUE to remove the cluster image from the cluster image grid. It is recommended to set this to the same value as attach_summary_cluster.
`cluster_rds_path`	Set as NULL if no preprocessed.rds file is available. For a single file, the script then loads the raw data file, which may reduce the signal intensities. For multiple samples, the script tries to load the RDS file from each "ID" folder, merges the m/z features according to the instrument resolution setting, and writes a combined RDS file to the project folder. When rendering cluster images for multiple files, set attach_summary_cluster as FALSE and remove_cluster_from_grid as TRUE.
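The `ppm` and `threshold` arguments can be illustrated with a generic peak-matching step. The R sketch that follows converts a ppm tolerance into an absolute m/z window (m/z × ppm / 10^6) and keeps the closest peak within it. It assumes that `threshold` is applied to intensities relative to the most intense peak; the documentation does not state the reference intensity, so treat this as an assumption. `match_ppm()` is a hypothetical helper, not part of Metwork.

```r
# Hypothetical sketch: match candidate m/z values to observed peaks within a
# ppm tolerance, after dropping peaks below a relative intensity threshold.
match_ppm <- function(candidate_mz, peak_mz, peak_int, ppm = 5, threshold = 0.001) {
  rel_int <- peak_int / max(peak_int)   # assumed: relative to the base peak
  keep <- rel_int >= threshold
  peak_mz <- peak_mz[keep]
  rel_int <- rel_int[keep]
  hits <- lapply(seq_along(candidate_mz), function(k) {
    mz <- candidate_mz[k]
    tol <- mz * ppm / 1e6               # absolute tolerance in m/z units
    idx <- which(abs(peak_mz - mz) <= tol)
    if (length(idx) == 0) return(NULL)
    best <- idx[which.min(abs(peak_mz[idx] - mz))]  # closest peak in window
    data.frame(candidate = mz, peak = peak_mz[best],
               error_ppm = (peak_mz[best] - mz) / mz * 1e6,
               rel_int = rel_int[best])
  })
  do.call(rbind, hits)                  # NULL entries (no match) are dropped
}
```

With `ppm = 5`, a candidate at m/z 1000 accepts peaks within ±0.005, while a candidate at m/z 2000 accepts ±0.01, so the window widens with mass.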
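`Decoy_search`, `FDR_cutoff` and `peptide_ID_filter` together control which identifications are kept. The R sketch that follows shows a generic target-decoy approach: the FDR at a score cutoff is estimated as the number of decoys divided by the number of targets scoring at or above it, converted to q-values, and proteins are then required to have a minimum number of distinct peptides. It does not reproduce Metwork's own scoring (`score_method = "SQRTP"`) or decoy generation (`Decoy_mode`); `fdr_filter()` and `proteins_passing()` are hypothetical helpers.

```r
# Hypothetical sketch of target-decoy FDR filtering (not Metwork's scoring).
# Returns TRUE for target hits whose q-value is at or below FDR_cutoff.
fdr_filter <- function(score, is_decoy, FDR_cutoff = 0.05) {
  o <- order(score, decreasing = TRUE)
  decoys  <- cumsum(is_decoy[o])        # decoys at or above each score
  targets <- cumsum(!is_decoy[o])       # targets at or above each score
  fdr <- decoys / pmax(targets, 1)
  q <- rev(cummin(rev(fdr)))            # q-value: lowest FDR at this score or below
  keep <- logical(length(score))
  keep[o] <- q <= FDR_cutoff & !is_decoy[o]
  keep
}

# Keep proteins supported by at least min_peptides distinct peptides,
# in the spirit of peptide_ID_filter
proteins_passing <- function(protein, peptide, min_peptides = 2) {
  counts <- tapply(peptide, protein, function(p) length(unique(p)))
  names(counts)[counts >= min_peptides]
}
```

In this sketch the FDR is applied to whatever level the scores describe; using protein-level scores would mirror the protein FDR described for `FDR_cutoff`.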

Value

None

Examples

imaging_identification(threshold = 0.05, ppm = 5, Digestion_site = "[G]",
                       missedCleavages = 0:1, Fastadatabase = "murine_matrisome.fasta",
                       adducts = c("M+H", "M+NH4", "M+Na"), IMS_analysis = TRUE,
                       Protein_feature_summary = TRUE, plot_cluster_image_grid = TRUE,
                       Peptide_feature_summary = TRUE, plot_ion_image = FALSE,
                       parallel = 3, spectra_segments_per_file = 5,
                       Segmentation = "spatialKMeans")

guoguodigit/Metwork documentation built on June 11, 2025, 5 a.m.