normMulGau: Data normalization

normMulGau R Documentation

Data normalization

Description

normMulGau normalizes log2FC.

Usage

normMulGau(
  df,
  method_align = "MC",
  n_comp = NULL,
  seed = NULL,
  range_log2r = c(0, 100),
  range_int = c(0, 100),
  filepath = NULL,
  col_select = NULL,
  cut_points = Inf,
  ...
)

Arguments

df

An input data frame

method_align

Character string indicating the method for aligning log2FC across samples. MC: median-centering; MGKernel: the kernel density defined by multiple Gaussian functions (normalmixEM). At the MC default, the ratio profiles of each sample are aligned so that the medians of log2FC are zero. At MGKernel, the ratio profiles of each sample are aligned so that the log2FC at the maxima of the kernel densities are zero.
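
As a rough illustration of the MC option, the following base R sketch median-centers a toy log2FC matrix. The matrix, the sample names S1 and S2 and all variable names are hypothetical; this is not the package's internal code.

```r
# Toy log2FC matrix: rows are peptides, columns are samples (hypothetical data).
set.seed(1)
log2fc <- cbind(
  S1 = rnorm(200, mean = 0.30, sd = 0.5),
  S2 = rnorm(200, mean = -0.15, sd = 0.5)
)

# Median-centering: subtract each sample's median from its own column.
col_medians <- apply(log2fc, 2, median, na.rm = TRUE)
centered <- sweep(log2fc, 2, col_medians, FUN = "-")

# After centering, the median of every column is zero (up to rounding).
centered_medians <- apply(centered, 2, median, na.rm = TRUE)
print(round(centered_medians, 10))
```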

n_comp

Integer; the number of Gaussian components to be used with method_align = MGKernel. A typical value is 2 or 3. The value of n_comp overrides the argument k in normalmixEM.
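
The alignment at MGKernel can be pictured with the base R sketch below, which locates the maximum of a two-component Gaussian mixture density and shifts log2FC so that this maximum sits at zero. The mixture parameters (lambda, mu, sigma) are made-up values standing in for the output of a mixture fit, and the grid search is a simplification chosen for illustration; neither is taken from the package.

```r
# Hypothetical parameters of a 2-component Gaussian mixture for one sample
# (assumed values, standing in for the result of a mixture fit).
lambda <- c(0.7, 0.3)     # mixing weights, summing to one
mu     <- c(0.20, -0.60)  # component means
sigma  <- c(0.25, 0.80)   # component standard deviations

# Mixture density: weighted sum of the component normal densities.
mix_density <- function(x) {
  dens <- 0
  for (i in seq_along(lambda)) {
    dens <- dens + lambda[i] * dnorm(x, mean = mu[i], sd = sigma[i])
  }
  dens
}

# Locate the density maximum on a fine grid (a simple stand-in for a
# proper optimizer).
grid <- seq(-3, 3, by = 0.001)
x_mode <- grid[which.max(mix_density(grid))]

# Align: shift log2FC so that the density maximum falls at zero.
log2fc <- c(-0.4, 0.0, 0.2, 0.5)  # toy values
aligned <- log2fc - x_mode
```

With n_comp = 3 the same idea applies with three weights, means and standard deviations.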

seed

Integer; a seed for reproducible fitting with method_align = MGKernel.

range_log2r

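Numeric vector of length two. The argument specifies the range of log2FC, in percentiles, for use in the scaling normalization of standard deviation across samples. The usage above sets the default to c(0, 100), i.e., the full range; a narrower range such as c(10, 90) restricts the calculation to values between the 10th and the 90th percentiles.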

range_int

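Numeric vector of length two. The argument specifies the range of the intensity of reporter ions (including I000), in percentiles, for use in the scaling normalization of standard deviation across samples. The usage above sets the default to c(0, 100), i.e., the full range; a narrower range such as c(5, 95) restricts the calculation to values between the 5th and the 95th percentiles.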
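
To make the role of a percentile range concrete, here is a minimal base R sketch of a scaling normalization in which each sample's standard deviation is computed only from log2FC values inside the chosen range. The scaling rule (bringing every sample to the mean trimmed SD), the helper trimmed_sd and the toy data are assumptions made for illustration; the package's exact formula may differ.

```r
# Toy log2FC matrix with two samples of different spread (hypothetical data).
set.seed(2)
log2fc <- cbind(
  S1 = rnorm(500, sd = 0.4),
  S2 = rnorm(500, sd = 0.8)
)
range_log2r <- c(10, 90)  # percentiles; a hypothetical choice

# Hypothetical helper: SD of the values lying within the given percentile
# range of one sample.
trimmed_sd <- function(x, range_pct) {
  lims <- quantile(x, probs = range_pct / 100, na.rm = TRUE)
  sd(x[x >= lims[1] & x <= lims[2]], na.rm = TRUE)
}

sds <- apply(log2fc, 2, trimmed_sd, range_pct = range_log2r)

# Assumed scaling rule: bring every sample to the mean trimmed SD.
scale_factors <- mean(sds) / sds
scaled <- sweep(log2fc, 2, scale_factors, FUN = "*")

# The trimmed SDs agree across samples after scaling.
new_sds <- apply(scaled, 2, trimmed_sd, range_pct = range_log2r)
```

The same pattern would apply to range_int, with the percentile limits taken on reporter-ion intensity instead of log2FC.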

filepath

A file path to output results. By default, it will be determined automatically by the name of the calling function and the value of id in the call.

col_select

Character string naming a column key in expt_smry.xlsx. At the NULL default, the column key Select in expt_smry.xlsx will be used. If no samples are specified under Select, the column key Sample_ID will be used. The non-empty entries under the chosen column will be used in the indicated analysis.

cut_points

A named, numeric vector that defines the cut points (knots) in histograms. With cut_points = c(mean_lint = NA), the cut points correspond to the quantile values under column mean_lint (mean log10 intensity) of the input data. Values of log2FC are then binned from -Inf to Inf according to the cut points. To disable data binning, set cut_points = Inf or -Inf; the usage above shows Inf as the default. The binning of log2FC can also be achieved through a different numeric column, e.g., cut_points = c(prot_icover = seq(.25, .75, .25)). See also mergePep for data alignment with binning.
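
The binning described for cut_points can be sketched in base R with cut(). The data frame, the choice of quartiles as cut points and the variable names are hypothetical, and the sketch only illustrates binning from -Inf to Inf, not the package's implementation.

```r
# Toy data: mean log10 intensity and log2FC for 12 entries (hypothetical).
set.seed(3)
df <- data.frame(
  mean_lint = runif(12, min = 4, max = 8),
  log2FC = rnorm(12, sd = 0.5)
)

# Hypothetical cut points on mean_lint; here the quartiles of the column.
cuts <- quantile(df$mean_lint, probs = c(.25, .5, .75), names = FALSE)

# Bins span -Inf to Inf so that every row is assigned to exactly one bin.
df$bin <- cut(df$mean_lint, breaks = c(-Inf, cuts, Inf))

# log2FC values grouped by bin; each group could then be aligned separately.
by_bin <- split(df$log2FC, df$bin)

# With a single infinite cut point there is only one bin, i.e., no binning.
no_bin <- cut(df$mean_lint, breaks = c(-Inf, Inf))
```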

...

filter_: Variable argument statements for the row filtration of data against the column keys in Peptide.txt for peptides or Protein.txt for proteins. Each statement contains a list of logical expression(s). The lhs needs to start with filter_. The logical condition(s) at the rhs need to be enclosed in exprs with round parentheses.

For example, pep_len is a column key in Peptide.txt. The statement filter_peps_at = exprs(pep_len <= 50) will remove peptide entries with pep_len > 50. See also normPSM.
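
The effect of such a statement can be mimicked in base R as follows. The sketch uses quote() as a stand-in for exprs so that it runs without extra packages, and the toy table and sequences are invented; it illustrates the row-filtering semantics only and is not how the package evaluates filter_ arguments.

```r
# Toy peptide table; pep_len mirrors the column key named in the text.
peptides <- data.frame(
  pep_seq = c("PEPTIDEK", "AVERYLONGPEPTIDE", "SHORTK"),
  pep_len = c(8, 60, 6)
)

# Stand-in for filter_peps_at = exprs(pep_len <= 50): a list of unevaluated
# logical expressions, built here with base R quote().
filter_peps_at <- list(quote(pep_len <= 50))

# Evaluate each expression against the table and keep rows passing all of them.
keep <- Reduce(`&`, lapply(filter_peps_at, function(e) eval(e, peptides)))
filtered <- peptides[keep, , drop = FALSE]
```

Several conditions can be combined by adding further expressions to the list; a row is kept only if every expression evaluates to TRUE for it.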

Additional parameters for plotting with ggplot2:
xmin, the minimum x at a log2 scale; the default is -2.
xmax, the maximum x at a log2 scale; the default is +2.
xbreaks, the breaks in x-axis at a log2 scale; the default is 1.
binwidth, the binwidth of log2FC; the default is (xmax - xmin)/80.
ncol, the number of columns; the default is 1.
width, the width of the plot.
height, the height of the plot.
scales, should the scales be fixed across panels; the default is "fixed" and the alternative is "free".

Details

When normMulGau is executed through mergePep or linkPep2Prn, method_align is always MC. As a result, peptide or protein data are first median-centered.

It is then up to standPep or standPrn for alternative choices in method_align, col_select, etc.

Value

A data frame.


qzhang503/proteoQ documentation built on March 16, 2024, 5:27 a.m.