normPep: Combines peptide reports across multiple experiments.

normPep R Documentation

Combines peptide reports across multiple experiments.

Description

Summarizes, by median, data from the same TMT or LFQ experiment at different LCMS injections. The counts pep_n_psm, prot_n_psm and prot_n_pep are summed after data merging.

Z_log2_R values are not yet available at this stage. Internally, col_select = expr(Sample_ID) is used, rather than the user-supplied col_select, to get all Z_log2_R columns, because users may specify a col_select that covers only part of the Sample_ID entries.
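The median-and-sum summary described above can be sketched in base R. This is only an illustration on a hypothetical toy table (the column values and the injection column are made up), not the package's own implementation:

```r
# Toy peptide table: one row per peptide per LCMS injection (hypothetical values).
df <- data.frame(
  pep_seq   = c("PEPTIDEK", "PEPTIDEK", "SAMPLER", "SAMPLER"),
  injection = c(1, 2, 1, 2),
  log2_R126 = c(0.10, 0.30, -0.50, NA),
  pep_n_psm = c(3, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Median of the log2 ratios across injections, ignoring missing values.
med <- tapply(df$log2_R126, df$pep_seq, median, na.rm = TRUE)

# PSM counts are summed, not averaged, across injections.
n_psm <- tapply(df$pep_n_psm, df$pep_seq, sum)

out <- data.frame(
  pep_seq   = names(med),
  log2_R126 = as.vector(med),
  pep_n_psm = as.vector(n_psm[names(med)]),
  stringsAsFactors = FALSE
)
print(out)
```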

Usage

normPep(
  dat_dir = NULL,
  group_psm_by = "pep_seq_mod",
  group_pep_by = "prot_acc",
  engine = "mz",
  lfq_mbr = TRUE,
  use_duppeps = TRUE,
  duppeps_repair = "denovo",
  cut_points = Inf,
  omit_single_lfq = FALSE,
  use_mq_pep = FALSE,
  use_mf_pep = FALSE,
  rm_allna = FALSE,
  mbr_ret_tol = 25,
  max_mbr_fold = 20,
  ret_sd_tol = Inf,
  rm_ret_outliers = FALSE,
  use_spec_counts = FALSE,
  ...
)

Arguments

dat_dir

A character string to the working directory. The default is to match the value under the global environment.

group_psm_by

A character string specifying the method in PSM grouping. At pep_seq, descriptive statistics will be calculated based on the same pep_seq groups. At pep_seq_mod (the default shown under Usage), peptides with different variable modifications will be treated as different species and descriptive statistics will be calculated based on the same pep_seq_mod groups.
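A minimal base R sketch of how the two grouping keys change the summary. The table is hypothetical, and the lower-case m used to mark a modified methionine is only an assumed notation for this illustration:

```r
# Three PSMs of one sequence; two of them carry a (hypothetical) variable modification.
psm <- data.frame(
  pep_seq     = c("PEPTIDEMK", "PEPTIDEMK", "PEPTIDEMK"),
  pep_seq_mod = c("PEPTIDEMK", "PEPTIDEmK", "PEPTIDEmK"),
  log2_R      = c(0.2, 0.8, 1.0),
  stringsAsFactors = FALSE
)

# Grouping by pep_seq pools all three PSMs into one species.
by_seq <- tapply(psm$log2_R, psm$pep_seq, median)

# Grouping by pep_seq_mod keeps modified and unmodified forms apart.
by_mod <- tapply(psm$log2_R, psm$pep_seq_mod, median)

print(by_seq)
print(by_mod)
```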

group_pep_by

A character string specifying the method in peptide grouping. At the prot_acc default, descriptive statistics will be calculated based on the same prot_acc groups. At the gene alternative, proteins with the same gene name but different accession numbers will be treated as one group.

engine

The name of the search engine.

lfq_mbr

Logical; if TRUE, performs match-between-run (MBR) with Mzion LFQ data. Also requires ms1full_[rawfile].rds in the same folder as psmQ[...].txt.

use_duppeps

Logical; if TRUE, re-assigns double/multiple dipping peptide sequences to the most likely proteins by majority vote.

duppeps_repair

Not currently used (or only with majority). Character string; the method of repairing double-dipping peptide sequences upon data pooling.

For instance, the same sequence of PEPTIDE may be assigned to protein accession PROT_ACC1 in data set 1 and PROT_ACC2 in data set 2. At the denovo default, the peptide to protein association will be re-established freshly. At the majority alternative, a majority rule will be applied for the re-assignments.
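The majority rule can be sketched in base R as follows. The pooled table and the helper majority() are hypothetical, and the tie-breaking (first label in sorted order wins) is an assumption of this sketch rather than documented package behaviour:

```r
# Pooled peptide-to-protein assignments from several data sets (hypothetical).
pooled <- data.frame(
  pep_seq  = c("PEPTIDE", "PEPTIDE", "PEPTIDE", "SAMPLER"),
  prot_acc = c("PROT_ACC1", "PROT_ACC1", "PROT_ACC2", "PROT_ACC3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent label; ties go to the first in table order.
majority <- function(x) names(which.max(table(x)))

# One winning accession per peptide sequence.
winner <- tapply(pooled$prot_acc, pooled$pep_seq, majority)

# Re-assign every row of a sequence to its winning accession.
pooled$prot_acc_fixed <- as.vector(winner[pooled$pep_seq])
print(pooled)
```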

cut_points

A named, numeric vector that defines the cut points (knots) for the median-centering of log2FC by sections. For example, at cut_points = c(mean_lint = seq(4, 7, .5)), log2FC will be binned according to the intervals of -Inf, 4, 4.5, ..., 7, Inf under column mean_lint (mean log10 intensity) in the input data. The default is cut_points = Inf, or equivalently -Inf, where the log2FC under each sample will be median-centered as one piece. See also prnHist for data binning in histogram visualization.
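Sectional median-centering can be illustrated in base R. The data are hypothetical, and the right-closed intervals produced by cut() are an assumption of this sketch; the package may treat interval boundaries differently:

```r
# Knots from the example above, padded with -Inf and Inf.
cut_points <- seq(4, 7, .5)
breaks <- c(-Inf, cut_points, Inf)

# Hypothetical data for one sample: mean log10 intensity and log2FC.
dat <- data.frame(
  mean_lint = c(3.5, 3.8, 4.2, 4.4, 6.8, 6.9, 7.5, 8.0),
  log2FC    = c(0.5, 0.7, -0.2, 0.0, 1.0, 1.4, 0.3, 0.1)
)

# Assign each row to an intensity section.
dat$bin <- cut(dat$mean_lint, breaks = breaks)

# Subtract the median log2FC of each section from its members.
dat$log2FC_centered <- dat$log2FC - ave(dat$log2FC, dat$bin, FUN = median)
print(dat)
```

With cut_points = Inf there is a single section, so the same arithmetic reduces to subtracting one overall median per sample.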

omit_single_lfq

Deprecated. Logical; if TRUE, omits LFQ entries with single measured values across all samples. The default is FALSE.

use_mq_pep

Logical; if TRUE, uses the peptides.txt from MaxQuant.

use_mf_pep

Logical; if TRUE, uses the peptides.txt from MSFragger.

rm_allna

Logical; if TRUE, removes data rows that are exclusively NA across ratio columns of log2_R126 etc. The setting also applies to log2_R000 in LFQ.
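The row filter can be sketched in base R. The table is hypothetical, and matching ratio columns by the prefix log2_R is an assumption of this illustration:

```r
# Hypothetical peptide table with two ratio columns.
pep <- data.frame(
  pep_seq   = c("AAA", "BBB", "CCC"),
  log2_R126 = c(0.1, NA, NA),
  log2_R127 = c(NA, NA, -0.4),
  stringsAsFactors = FALSE
)

# Ratio columns are assumed to start with "log2_R".
ratio_cols <- grep("^log2_R", names(pep), value = TRUE)

# A row is dropped only if every ratio column is NA.
all_na <- rowSums(!is.na(pep[, ratio_cols, drop = FALSE])) == 0
pep_kept <- pep[!all_na, ]
print(pep_kept)
```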

mbr_ret_tol

Retention time tolerance (in seconds) for LFQ-MBR.

max_mbr_fold

The maximum absolute fold change in MBR.

ret_sd_tol

Deprecated. Numeric; the tolerance in the variance of retention time (w.r.t. measures in seconds). The thresholding applies to TMT data. The default is Inf. Depending on the setting of LCMS gradients, a value of, e.g., 150 might be suitable.

rm_ret_outliers

Deprecated. Logical; if TRUE, removes peptide entries with outlying retention times across samples and/or LCMS series.

use_spec_counts

Logical; if TRUE, uses spectrum counts for quantitation with Mascot or Mzion outputs.

...

filter_: Variable argument statements for the filtration of data rows. Each statement contains a list of logical expression(s). The lhs needs to start with filter_. The logical condition(s) at the rhs need to be enclosed in exprs with round parentheses. For example, pep_expect is a column key present in Mascot PSM exports, and filter_psms_at = exprs(pep_expect <= 0.1) will remove PSM entries with pep_expect > 0.1.
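The effect of such a condition can be mimicked in base R. This sketch uses quote() and eval() in place of exprs() to stay dependency-free; it runs on a hypothetical table and does not reproduce how the package evaluates filter_ statements internally:

```r
# Hypothetical PSM table with a Mascot-style expectation column.
psms <- data.frame(
  pep_seq    = c("AAA", "BBB", "CCC"),
  pep_expect = c(0.01, 0.5, 0.1),
  stringsAsFactors = FALSE
)

# The rhs of a statement such as filter_psms_at = exprs(pep_expect <= 0.1),
# captured unevaluated.
cond <- quote(pep_expect <= 0.1)

# Evaluate the condition with the table's columns in scope; keep rows where it holds.
keep <- eval(cond, envir = psms)
psms_kept <- psms[keep, ]
print(psms_kept)
```

Rows with pep_expect > 0.1 are removed; rows at exactly 0.1 are retained.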


qzhang503/proteoQ documentation built on Dec. 14, 2024, 12:27 p.m.