fml_gsea: Protein GSEA by formula(s) in 'pepSig' or 'prnSig'

fml_gsea R Documentation

Protein GSEA by formula(s) in 'pepSig' or 'prnSig'

Description

Protein GSEA by formula(s) in 'pepSig' or 'prnSig'

Usage

fml_gsea(
  fml,
  fml_nm,
  var_cutoff,
  pval_cutoff,
  logFC_cutoff,
  gspval_cutoff,
  gslogFC_cutoff,
  min_size,
  max_size,
  df,
  col_ind,
  id,
  gsets,
  label_scheme_sub,
  complete_cases,
  scale_log2r,
  filepath,
  filename,
  ...
)

Arguments

fml

A character string; the formula used in prnSig.

fml_nm

A character string; the name of fml.

var_cutoff

Numeric; the cut-off in the variances of protein log2FC. Entries with variances smaller than the threshold will be removed from GSVA. The default is 0.5.
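
The variance filter can be pictured with a small base-R sketch. The matrix log2fc and its row names are hypothetical, and this is not the package's internal code; it only illustrates dropping rows whose variance falls below var_cutoff.

```r
# Hypothetical log2FC matrix: rows are proteins, columns are samples.
log2fc <- rbind(
  P1 = c(-1.5, 0.0, 1.5),
  P2 = c(0.1, 0.0, -0.1),
  P3 = c(2.0, -2.0, 0.5)
)

var_cutoff <- 0.5  # the documented default

# Row-wise variance of log2FC; rows below the cut-off are dropped.
row_vars <- apply(log2fc, 1, var, na.rm = TRUE)
kept <- log2fc[row_vars >= var_cutoff, , drop = FALSE]

print(rownames(kept))
```

Here P2 varies too little across samples and is removed, while P1 and P3 are retained.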

pval_cutoff

Numeric value or vector; the cut-off in protein significance pVal. Entries with pVals less significant than the threshold will be excluded from enrichment analysis. The default is 0.05 for all formulas matched to or specified in argument fml_nms. Formula-specific threshold is allowed by supplying a vector of cut-off values.

logFC_cutoff

Numeric value or vector; the cut-off in protein log2FC. Entries with absolute log2FC smaller than the threshold will be excluded from enrichment analysis. The default magnitude is log2(1.2) for all formulas matched to or specified in argument fml_nms. Formula-specific threshold is allowed by supplying a vector of absolute values in log2FC.
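
As a rough illustration of how pval_cutoff and logFC_cutoff act together, the following base-R sketch filters a hypothetical table of protein significance results. The data frame sig and its column names (gene, pVal, log2FC) are assumptions made for the example, not the actual column keys in the primary files.

```r
# Hypothetical significance table for one formula.
sig <- data.frame(
  gene = c("A", "B", "C", "D"),
  pVal = c(0.001, 0.20, 0.03, 0.04),
  log2FC = c(1.2, 0.9, -0.1, -0.8),
  stringsAsFactors = FALSE
)

pval_cutoff <- 0.05        # documented default
logFC_cutoff <- log2(1.2)  # documented default magnitude

# Keep entries that pass both the p-value and the absolute log2FC thresholds.
pass <- sig$pVal <= pval_cutoff & abs(sig$log2FC) >= logFC_cutoff
kept <- sig[pass, , drop = FALSE]

print(kept$gene)
```

Entry B fails the p-value threshold and entry C fails the fold-change threshold, so only A and D would enter the enrichment analysis in this sketch.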

gspval_cutoff

Numeric value or vector; the cut-off in gene-set significance pVal. Only enrichment terms with pVals more significant than the threshold will be reported. The default is 0.05 for all formulas matched to or specified in argument fml_nms. Formula-specific threshold is allowed by supplying a vector of cut-off values.

gslogFC_cutoff

Numeric value or vector; the cut-off in gene-set enrichment fold change. Only enrichment terms with absolute fold change greater than the threshold will be reported. The default magnitude is log2(1.2) for all formulas matched to or specified in argument fml_nms. Formula-specific threshold is allowed by supplying a vector of absolute values in log2FC.

min_size

Numeric value or vector; minimum number of protein entries for consideration in gene set tests. The number is after data filtration by pval_cutoff, logFC_cutoff or varargs expressions under filter_. The default is 10 for all formulas matched to or specified in argument fml_nms. Formula-specific threshold is allowed by supplying a vector of sizes.

max_size

Numeric value or vector; maximum number of protein entries for consideration in gene set tests. The number is after data filtration by pval_cutoff, logFC_cutoff or varargs expressions under filter_. The default is infinite for all formulas matched to or specified in argument fml_nms. Formula-specific threshold is allowed by supplying a vector of sizes.
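
The size limits can be sketched as follows, assuming that set size is counted as the number of set members still present after filtration. The objects gsets_demo and detected are hypothetical, and the counting rule is an assumption rather than a statement about the package internals.

```r
# Hypothetical gene sets and the entries remaining after filtration.
gsets_demo <- list(
  set_small = paste0("g", 1:3),
  set_ok = paste0("g", 1:12),
  set_large = paste0("g", 1:40)
)
detected <- paste0("g", 1:30)

min_size <- 10   # documented default
max_size <- Inf  # documented default

# Count the members of each set that are present in the filtered data.
sizes <- vapply(gsets_demo, function(x) sum(x %in% detected), integer(1))
kept <- gsets_demo[sizes >= min_size & sizes <= max_size]

print(names(kept))
```

With the defaults, only set_small is dropped; lowering max_size to 20 would also drop set_large, which has 30 members present.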

df

The name of a primary data file. By default, it will be determined automatically after matching the types of data and analysis with an id among c("pep_seq", "pep_seq_mod", "prot_acc", "gene"). A primary file contains normalized peptide or protein data and is among c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt"). For analyses that require significance p-values, the df will be one of c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt").

col_ind

Numeric vector; the indexes of columns for the ascribed fml_nm.

id

Character string; one of pep_seq, pep_seq_mod, prot_acc and gene.

gsets

The gene sets.

label_scheme_sub

A data frame; the subset of entries from label_scheme for the selected samples.

complete_cases

Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.
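
A minimal sketch of the effect of complete_cases = TRUE, using base R's complete.cases on a hypothetical data frame (the name dat and its columns are illustrative only):

```r
# Hypothetical data with missing values in some samples.
dat <- data.frame(
  gene = c("A", "B", "C"),
  s1 = c(0.5, NA, -0.2),
  s2 = c(0.1, 0.3, NA),
  stringsAsFactors = FALSE
)

complete_cases <- TRUE

# Keep only rows without any missing value when complete_cases is TRUE.
if (complete_cases) {
  dat <- dat[complete.cases(dat), , drop = FALSE]
}

print(dat$gene)
```

Only the row for gene A has values in every sample, so it is the only row retained.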

scale_log2r

Logical; if TRUE, adjusts log2FC to the same scale of standard deviation across all samples. The default is TRUE. At scale_log2r = NA, the raw log2FC without normalization will be used.

filepath

A file path to output results. By default, it will be determined automatically by the name of the calling function and the value of id in the call.

filename

A representative file name for outputs. By default, the name(s) will be determined automatically. For text files, a typical file extension is .txt. Image files are typically saved via ggsave or pheatmap, where the image type is determined by the extension of the file name.

...

filter_: Variable argument statements for the row filtration of data against the column keys in Peptide.txt for peptides or Protein.txt for proteins. Each statement contains a list of logical expression(s). The lhs needs to start with filter_. The logical condition(s) at the rhs need to be enclosed in exprs with round parentheses.

For example, pep_len is a column key in Peptide.txt. The statement filter_peps_at = exprs(pep_len <= 50) will remove peptide entries with pep_len > 50. See also normPSM.
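
The idea behind a filter_ statement can be sketched in base R. Here alist() is used as a stand-in for exprs, since both capture conditions without evaluating them; the data frame peptides is hypothetical, and the evaluation loop is an illustration rather than the package's implementation.

```r
# Hypothetical peptide table with the pep_len column key.
peptides <- data.frame(
  pep_seq = c("AAK", "BBR", "CCK"),
  pep_len = c(12, 60, 50),
  stringsAsFactors = FALSE
)

# Base-R stand-in for: filter_peps_at = exprs(pep_len <= 50)
filter_peps_at <- alist(pep_len <= 50)

# Evaluate each captured condition against the columns and combine with AND.
keep <- rep(TRUE, nrow(peptides))
for (cond in filter_peps_at) {
  keep <- keep & eval(cond, peptides)
}
filtered <- peptides[keep, , drop = FALSE]

print(filtered$pep_seq)
```

The entry with pep_len of 60 is removed, and the entry with pep_len of exactly 50 is kept because the condition is inclusive.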

Additional parameters for plotting with ggplot2:
xmin, the minimum x at a log2 scale; the default is -2.
xmax, the maximum x at a log2 scale; the default is +2.
xbreaks, the breaks in x-axis at a log2 scale; the default is 1.
binwidth, the binwidth of log2FC; the default is (xmax - xmin)/80.
ncol, the number of columns; the default is 1.
width, the width of plot.
height, the height of plot.
scales, should the scales be fixed across panels; the default is "fixed" and the alternative is "free".
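
The documented defaults for the x-axis parameters fit together as in the short base-R sketch below; it reproduces only the arithmetic and does not call any plotting function.

```r
# Documented defaults at a log2 scale.
xmin <- -2
xmax <- 2
xbreaks <- 1

# Default bin width: the x range divided into 80 bins.
binwidth <- (xmax - xmin) / 80

# Axis break positions implied by xmin, xmax and xbreaks.
breaks <- seq(xmin, xmax, by = xbreaks)

print(binwidth)
print(breaks)
```

With the defaults, each bin spans 0.05 log2 units and the breaks fall at every integer from -2 to 2.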


qzhang503/proteoQ documentation built on Dec. 14, 2024, 12:27 p.m.