gspaTest: Perform GSPA tests

gspaTest	R Documentation

Perform GSPA tests

Description

Performs GSPA tests. The numeric logFC_cutoff is a threshold for subsetting the data before adjusted pVals are calculated.

Usage

gspaTest(
  df = NULL,
  id = "entrez",
  label_scheme_sub = NULL,
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  filepath = NULL,
  filename = NULL,
  gset_nms = "go_sets",
  var_cutoff = 0.5,
  pval_cutoff = 0.05,
  logFC_cutoff = log2(1.2),
  gspval_cutoff = 0.05,
  gslogFC_cutoff = log2(1),
  min_size = 6,
  max_size = Inf,
  min_delta = 4,
  min_greedy_size = 1,
  use_adjP = FALSE,
  method = "mean",
  anal_type = "GSPA",
  ...
)

Arguments

df

The name of a primary data file. By default, it is determined automatically by matching the types of data and analysis with an id among c("pep_seq", "pep_seq_mod", "prot_acc", "gene"). A primary file contains normalized peptide or protein data and is one of c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "Protein_impNA_pVal.txt"). For analyses that require significance p-values, df will be one of c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "Protein_impNA_pVal.txt").
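
The file selection described above can be sketched as a simple lookup. This is a minimal illustration, not proteoQ's internal code: the helper pick_primary_file is hypothetical, and it assumes that pep_seq and pep_seq_mod map to peptide files while prot_acc and gene map to protein files.

```r
# Hypothetical helper (not part of proteoQ): choose a primary data file name
# from the id and whether the analysis needs significance p-values.
pick_primary_file <- function(id, need_pval = FALSE, impute_na = FALSE) {
  # Assumed mapping: peptide-level ids -> Peptide*, protein-level ids -> Protein*
  level <- switch(id,
    pep_seq = , pep_seq_mod = "Peptide",
    prot_acc = , gene = "Protein",
    stop("unsupported id: ", id)
  )
  # In the file lists above, the only files without p-values are
  # Peptide.txt and Protein.txt
  if (!need_pval) return(paste0(level, ".txt"))
  suffix <- if (impute_na) "_impNA_pVal.txt" else "_pVal.txt"
  paste0(level, suffix)
}

pick_primary_file("pep_seq_mod")                                   # "Peptide.txt"
pick_primary_file("gene", need_pval = TRUE)                        # "Protein_pVal.txt"
pick_primary_file("prot_acc", need_pval = TRUE, impute_na = TRUE)  # "Protein_impNA_pVal.txt"
```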

id

Character string; currently only "entrez" is supported.

label_scheme_sub

A data frame; a subset of entries from label_scheme for the selected samples.

scale_log2r

Logical; if TRUE, adjusts log2FC to the same scale of standard deviation across all samples. The default is TRUE. With scale_log2r = NA, the raw log2FC without normalization is used.
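
One simple way to put log2FC on a common scale of standard deviation is sketched below. It assumes each sample column is divided by its own SD and multiplied by the mean SD across samples; proteoQ's own normalization may estimate the scale differently, and the function name scale_to_common_sd is hypothetical.

```r
# Hypothetical sketch: rescale each sample (column) of log2FC so that all
# columns share the same standard deviation (the mean of the per-column SDs).
scale_to_common_sd <- function(log2fc) {
  sds <- apply(log2fc, 2, sd, na.rm = TRUE)
  target <- mean(sds)
  # Dividing column j by sds[j] / target gives it an SD equal to target
  sweep(log2fc, 2, sds / target, "/")
}

# Toy data: two samples with very different spreads
set.seed(1)
log2fc <- cbind(W2 = rnorm(50, sd = 0.5), W16 = rnorm(50, sd = 1.5))
scaled <- scale_to_common_sd(log2fc)
apply(scaled, 2, sd)  # both columns now share one SD
```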

complete_cases

Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.
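
In base R, keeping only complete cases corresponds to subsetting rows with stats::complete.cases(). A minimal sketch on toy data; the column names are hypothetical.

```r
# Toy log2FC table with a missing value (column names are illustrative only)
df <- data.frame(
  prot_acc = c("P1", "P2", "P3"),
  W2  = c(0.5, NA, -0.3),
  W16 = c(1.1, 0.2, 0.7)
)
# Keep rows without any NA across the log2FC columns
complete_df <- df[complete.cases(df[, c("W2", "W16")]), , drop = FALSE]
complete_df  # rows P1 and P3
```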

impute_na

Logical; if TRUE, data with imputed missing values will be used. The default is FALSE.

filepath

A file path to output results. By default, it will be determined automatically by the name of the calling function and the value of id in the call.

filename

A representative file name for outputs. By default, the name(s) are determined automatically. For text files, a typical file extension is .txt. Image files are typically saved via ggsave or pheatmap, where the image type is determined by the extension of the file name.

gset_nms

Character string or vector containing the shorthand name(s), full file path(s), or both, of gene sets for enrichment analysis. For species among "human", "mouse" and "rat", a value of c("go_sets", "c2_msig", "kinsub") will utilize terms from gene ontology (GO), molecular signatures (MSig) and the kinase-substrate network (PSP Kinase-Substrate). Custom GO, MSig and other databases for a given species are also supported. See also: prepGO for the preparation of custom GO; prepMSig for the preparation of custom MSig. For other custom databases, follow the same list format as GO or MSig.

var_cutoff

Numeric; the cut-off in the variances of protein log2FC. Entries with variances smaller than the threshold will be removed from GSVA. The default is 0.5.

pval_cutoff

Numeric value or vector; the cut-off in protein significance pVal. Entries with pVals less significant than the threshold will be excluded from enrichment analysis. The default is 0.05 for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of cut-off values.

logFC_cutoff

Numeric value or vector; the cut-off in protein log2FC. Entries with absolute log2FC smaller than the threshold will be excluded from enrichment analysis. The default magnitude is log2(1.2) for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of absolute values in log2FC.
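
The protein-level filtration by pval_cutoff and logFC_cutoff, including one cut-off per formula, can be sketched as below. This is an illustration under stated assumptions, not proteoQ's code: the column names pVal and log2FC, the formula names fml_1 and fml_2, and the inclusive boundaries are all hypothetical.

```r
# Hypothetical sketch of protein-level filtration before enrichment analysis.
filter_proteins <- function(df, pval_cutoff = 0.05, logFC_cutoff = log2(1.2)) {
  keep <- !is.na(df$pVal) & !is.na(df$log2FC) &
    df$pVal <= pval_cutoff &           # drop entries less significant than the cut-off
    abs(df$log2FC) >= logFC_cutoff     # drop entries with small absolute log2FC
  df[keep, , drop = FALSE]
}

df <- data.frame(
  prot = c("P1", "P2", "P3", "P4"),
  pVal = c(0.001, 0.03, 0.2, 0.04),
  log2FC = c(1.0, -0.5, 2.0, 0.1)
)

# One pVal cut-off per formula, as with a vector of pval_cutoff values
pval_by_formula <- c(fml_1 = 0.05, fml_2 = 0.01)
n_kept <- sapply(pval_by_formula, function(p) nrow(filter_proteins(df, pval_cutoff = p)))
n_kept
# fml_1 fml_2
#     2     1
```

P3 fails the pVal cut-off and P4 fails the log2FC cut-off (0.1 < log2(1.2) ≈ 0.263), so only P1 and P2 remain at 0.05, and only P1 at 0.01.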

gspval_cutoff

Numeric value or vector; the cut-off in gene-set significance pVal. Only enrichment terms with pVals more significant than the threshold will be reported. The default is 0.05 for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of cut-off values.

gslogFC_cutoff

Numeric value or vector; the cut-off in gene-set enrichment fold change. Only enrichment terms with absolute fold change greater than the threshold will be reported. The default magnitude is log2(1) for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of absolute values in log2FC.
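
The reporting rule for gene sets can be sketched as a row filter on a results table. The function report_terms and the columns pVal and log2FC are hypothetical, and strict inequalities are assumed. With the Usage default gslogFC_cutoff = log2(1) = 0, any nonzero fold change passes the fold-change check.

```r
# Hypothetical sketch: report only gene sets passing both gene-set cut-offs.
report_terms <- function(res, gspval_cutoff = 0.05, gslogFC_cutoff = log2(1)) {
  keep <- res$pVal < gspval_cutoff & abs(res$log2FC) > gslogFC_cutoff
  res[keep, , drop = FALSE]
}

res <- data.frame(
  term = c("T1", "T2", "T3", "T4"),
  pVal = c(0.01, 0.02, 0.20, 0.001),
  log2FC = c(0.8, 0, 1.5, -0.4)
)
report_terms(res)$term                              # "T1" "T4"
report_terms(res, gslogFC_cutoff = log2(1.5))$term  # "T1"
```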

min_size

Numeric value or vector; minimum number of protein entries for consideration in gene set tests. The count is taken after data filtration by pval_cutoff, logFC_cutoff or vararg expressions under filter_. The default is 6 for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of sizes.

max_size

Numeric value or vector; maximum number of protein entries for consideration in gene set tests. The count is taken after data filtration by pval_cutoff, logFC_cutoff or vararg expressions under filter_. The default is Inf for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of sizes.

min_delta

Numeric value or vector; the minimum difference in counts between the up- and down-regulated groups of proteins for a gene set to be considered in gene set tests. For example, at min_delta = 4, a gene set with 6 up-regulated and 2 down-regulated proteins, or vice versa, will be assessed. The count is taken after data filtration by pval_cutoff, logFC_cutoff or vararg expressions under filter_. The default is 4 for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of sizes.
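
The size and direction checks from min_size, max_size and min_delta can be sketched as follows. It assumes that min_size and max_size apply to the total number of up- and down-regulated entries remaining after filtration and that all bounds are inclusive; passes_size_filters is a hypothetical name. set_A reproduces the example above (6 up, 2 down, a difference of 4).

```r
# Hypothetical check of gene-set size and direction, using counts that remain
# after protein-level filtration (inclusive bounds assumed).
passes_size_filters <- function(n_up, n_down, min_size = 6, max_size = Inf,
                                min_delta = 4) {
  n <- n_up + n_down
  n >= min_size & n <= max_size & abs(n_up - n_down) >= min_delta
}

gs <- data.frame(
  term   = c("set_A", "set_B", "set_C", "set_D"),
  n_up   = c(6, 3, 1, 5),
  n_down = c(2, 2, 8, 3)
)
gs$assessed <- with(gs, passes_size_filters(n_up, n_down))
gs
```

Here set_B fails min_size (5 entries) and set_D fails min_delta (a difference of 2), while set_A and set_C are assessed.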

min_greedy_size

Numeric value or vector; minimum number of unique protein entries for a gene set to be considered essential. The default is 1 for all formulas matched to or specified in argument fml_nms. Formula-specific thresholds are allowed by supplying a vector of sizes.
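
The selection of essential gene sets is not spelled out here. One approach consistent with the description is a greedy set cover: repeatedly pick the gene set that contributes the most proteins not yet covered, and stop once the best remaining set would add fewer than min_greedy_size unique entries. This is an assumed technique for illustration, and greedy_essential is a hypothetical name.

```r
# Hypothetical greedy set-cover sketch for "essential" gene sets.
greedy_essential <- function(sets, min_greedy_size = 1) {
  covered <- character(0)
  chosen <- character(0)
  remaining <- sets
  while (length(remaining) > 0) {
    # Number of not-yet-covered proteins each remaining set would add
    gains <- vapply(remaining, function(s) length(setdiff(s, covered)), integer(1))
    best <- which.max(gains)  # ties go to the first set in list order
    if (gains[[best]] < min_greedy_size) break
    chosen <- c(chosen, names(remaining)[best])
    covered <- union(covered, remaining[[best]])
    remaining <- remaining[-best]
  }
  chosen
}

sets <- list(
  set_A = c("P1", "P2", "P3", "P4"),
  set_B = c("P1", "P2"),
  set_C = c("P4", "P5", "P6"),
  set_D = c("P6", "P7")
)
greedy_essential(sets)                       # "set_A" "set_C" "set_D"
greedy_essential(sets, min_greedy_size = 2)  # "set_A" "set_C"
```

set_B adds no unique proteins once set_A is chosen, so it is never essential; raising min_greedy_size to 2 also drops set_D, which adds only P7.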

use_adjP

Logical; if TRUE, use Benjamini-Hochberg adjusted pVals. The default is FALSE.
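
The Benjamini-Hochberg adjustment is available in base R as stats::p.adjust(method = "BH"). The toy pVals below show how switching to adjusted values can change which entries pass a 0.05 cut-off; the helper pick_pvals is hypothetical and only mirrors the use_adjP switch.

```r
# Hypothetical helper mirroring the use_adjP switch
pick_pvals <- function(raw, use_adjP = FALSE) {
  if (use_adjP) p.adjust(raw, method = "BH") else raw
}

raw <- c(0.001, 0.01, 0.02, 0.044, 0.20)
adj <- pick_pvals(raw, use_adjP = TRUE)
round(adj, 4)     # 0.0050 0.0250 0.0333 0.0550 0.2000
sum(raw < 0.05)   # 4
sum(adj < 0.05)   # 3
```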

method

Dummy argument that keeps arguments passed through ... from partially matching the method argument of dist.
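
R partially matches argument names, so an argument such as meth = ... passed through ... can end up as the method argument of stats::dist(). Declaring method as a formal argument before ... in the outer function captures it first. The wrapper names leaky and safe are hypothetical and only illustrate the mechanism.

```r
# Two points: Euclidean distance 5, Manhattan distance 7
x <- matrix(c(0, 0, 3, 4), nrow = 2, byrow = TRUE)

# 'meth' travels through ... and partially matches dist()'s 'method'
leaky <- function(x, ...) stats::dist(x, ...)

# 'meth' partially matches this function's own 'method' and is not forwarded,
# so dist() keeps its default method = "euclidean"
safe <- function(x, method = "mean", ...) stats::dist(x, ...)

as.numeric(leaky(x, meth = "manhattan"))  # 7
as.numeric(safe(x, meth = "manhattan"))   # 5
```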

anal_type

Character string; the type of analysis that is preset for method dispatch in function factories. The value is determined automatically. Exemplary values include anal_type = c("PCA", "Corrplot", "EucDist", "GSPA", "Heatmap", "Histogram", "MDS", "Model", "NMF", "Purge", "Trend", "LDA", ...).

...

filter_: Variable argument statements for the row filtration of data against the column keys in Peptide.txt for peptides or Protein.txt for proteins. Each statement contains a list of logical expression(s). The lhs needs to start with filter_. The logical condition(s) at the rhs need to be enclosed in exprs with round parentheses.

For example, pep_len is a column key in Peptide.txt. The statement filter_peps_at = exprs(pep_len <= 50) will remove peptide entries with pep_len > 50. See also normPSM.
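
The mechanism behind filter_ statements can be sketched in base R: keep only the named arguments that start with filter_, evaluate each stored expression against the data, and retain the rows where every condition holds. This is a sketch, not proteoQ's implementation; apply_filters is hypothetical, base alist() stands in for exprs() so the example has no package dependencies, and NA results are assumed to drop the row.

```r
# Hypothetical sketch of filter_ varargs.
apply_filters <- function(df, ...) {
  dots <- list(...)
  nms <- names(dots)
  if (is.null(nms)) nms <- rep("", length(dots))
  keep <- rep(TRUE, nrow(df))
  for (conds in dots[startsWith(nms, "filter_")]) {
    for (cond in conds) {
      # Evaluate the unevaluated expression with the data columns in scope
      res <- eval(cond, df, parent.frame())
      keep <- keep & !is.na(res) & res  # NA treated as FALSE
    }
  }
  df[keep, , drop = FALSE]
}

peps <- data.frame(pep_seq = c("AAK", "LLGR", "MVK"), pep_len = c(12, 55, 50))
kept <- apply_filters(peps, filter_peps_at = alist(pep_len <= 50))
kept  # the peptide with pep_len = 55 is removed
```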

Additional parameters for plotting with ggplot2:
xmin, the minimum x at a log2 scale; the default is -2.
xmax, the maximum x at a log2 scale; the default is +2.
xbreaks, the breaks in x-axis at a log2 scale; the default is 1.
binwidth, the binwidth of log2FC; the default is (xmax - xmin)/80.
ncol, the number of columns; the default is 1.
width, the width of the plot.
height, the height of the plot.
scales, whether the scales are fixed across panels; the default is "fixed" and the alternative is "free".
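
The default binwidth follows from the other defaults: with xmin = -2 and xmax = 2, (xmax - xmin)/80 = 4/80 = 0.05 on the log2 scale. The base-R sketch below only reproduces that arithmetic and the resulting bins on toy values; the actual plots are drawn with ggplot2.

```r
# Default histogram settings from the list above, written out in base R
xmin <- -2
xmax <- 2
xbreaks <- 1
binwidth <- (xmax - xmin) / 80                 # 4 / 80 = 0.05
axis_breaks <- seq(xmin, xmax, by = xbreaks)   # -2 -1 0 1 2

# Toy log2FC values binned with the default binwidth (no plot is drawn)
log2fc <- c(-1.5, -0.2, 0.1, 0.1, 1.3)
h <- hist(log2fc, breaks = seq(xmin, xmax, by = binwidth), plot = FALSE)
length(h$counts)   # 80 bins
```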


qzhang503/proteoQ documentation built on March 16, 2024, 5:27 a.m.