prnGSPAHM: Heat map visualization of GSPA results
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometric Data

prnGSPAHM

R Documentation

Heat map visualization of GSPA results

Description

prnGSPAHM visualizes distance heat maps and networks between essential and all gene sets.

Usage

prnGSPAHM(
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  fml_nms = NULL,
  annot_cols = NULL,
  annot_colnames = NULL,
  annot_rows = NULL,
  df2 = NULL,
  filename = NULL,
  ...
)

Arguments

`scale_log2r`	Logical; at the TRUE default, input files with `_Z[...].txt` in their names will be used. Otherwise, files with `_N[...].txt` in their names will be taken. An error will be thrown if no files match under the given conditions.
`complete_cases`	Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.
`impute_na`	Logical; at TRUE, input files with `_impNA[...].txt` in their names will be loaded. Otherwise, files without `_impNA` in their names will be taken. An error will be thrown if no files match under the given conditions. The default is FALSE.
`fml_nms`	Character string or vector; the formula name(s). By default, the formula(s) will match those used in `pepSig` or `prnSig`.
`annot_cols`	A character vector of column keys that can be found in `_essmap.txt`. The values under the selected keys will be used to color-code enrichment terms on the top of heat maps. The default is NULL without column annotation.
`annot_colnames`	A character vector of replacement name(s) to `annot_cols`. The default is NULL without name replacement.
`annot_rows`	A character vector of column keys that can be found from `_essmeta.txt` . The values under the selected keys will be used to color-code essential terms on the side of heat maps. The default is NULL without row annotation.
`df2`	Character vector or string; the name(s) of secondary data file(s). An informatic task, e.g. `anal_prnTrend(...)`, run against a primary `df` generates secondary files such as `Protein_Trend_Z_nclust6.txt`. See also `prnHist` for the description of a primary `df`; `normPSM` for the lists of `df` and `df2`.
`filename`	A representative file name for outputs. By default, it will be determined automatically by the name of the current call.
`...`	`filter2_`: Variable argument statements for the row filtration against data in secondary file(s) of `_essmap.txt`. Each statement contains a list of logical expression(s). The `lhs` needs to start with `filter2_`. The logical condition(s) at the `rhs` need to be enclosed in `exprs` with round parentheses. For example, `distance` is a column key in `Protein_GSPA_Z_essmap.txt`. The statement `filter2_ = exprs(distance <= .95),` will remove entries with `distance > 0.95`. See also `normPSM` for the format of `filter_` statements against primary data.
`arrange2_`: Variable argument statements for the row ordering against data in secondary file(s) of `_essmap.txt`. The `lhs` needs to start with `arrange2_`. The expression(s) at the `rhs` need to be enclosed in `exprs` with round parentheses. For example, `distance` and `size` are column keys in `Protein_GSPA_Z_essmap.txt`. The statement `arrange2_ = exprs(distance, size),` will order entries by `distance`, then by `size`. See also `prnHM` for the format of `arrange_` statements against primary data.
Additional arguments for `pheatmap`, e.g., `fontsize` ...
Note arguments disabled from `pheatmap`:
`annotation_col`; instead use keys indicated in `annot_cols`
`annotation_row`; instead use keys indicated in `annot_rows`
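
To make the effect of the `filter2_` and `arrange2_` statements concrete, the sketch below applies the same two conditions to a toy table in base R. The data frame `essmap` and all of its values are made up for illustration; only the column keys `distance` and `size` come from the description above. It is not proteoQ's internal implementation, only the row subsetting and ordering that the statements are described as producing.

```r
# Hypothetical stand-in for a few rows of `Protein_GSPA_Z_essmap.txt`
essmap <- data.frame(
  term     = c("hs_A", "hs_B", "hs_C", "hs_D"),
  distance = c(0.20, 0.98, 0.20, 0.95),
  size     = c(40, 12, 25, 30),
  stringsAsFactors = FALSE
)

# Effect described for `filter2_ = exprs(distance <= .95)`:
# entries with distance > 0.95 are removed
kept <- essmap[essmap$distance <= 0.95, ]

# Effect described for `arrange2_ = exprs(distance, size)`:
# order entries by distance, then by size
kept <- kept[order(kept$distance, kept$size), ]

print(kept)
```

In an actual call, the same conditions would be passed as `filter2_...` and `arrange2_...` statements wrapped in `exprs()`, as in the Examples.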

Details

The list of gene sets and the associated quality metrics of `size` and `ess_size` are assessed after data filtration with the criteria specified by the `prnGSPA` arguments `pval_cutoff` and `logFC_cutoff`, as well as optional varargs of `filter_`.

`Protein_GSPA_[...].txt`

Key	Description
term	a gene set term
is_essential	a logical indicator of gene set essentiality
size	the number of IDs under a `term`
ess_size	the number of IDs that can be found under a corresponding essential set
contrast	a contrast of sample groups
p_val	significance p values
q_val	`p_val` with `BH` adjustment of multiple tests
log2fc	the fold change of a gene set at logarithmic base of 2
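
As a small illustration of the `q_val` key, the sketch below applies a Benjamini-Hochberg adjustment to a handful of made-up p-values with base R's `p.adjust`. The p-values are hypothetical, and whether proteoQ adjusts within each contrast or across all rows is not stated here, so treating the vector as one contrast is an assumption.

```r
# Hypothetical p-values for five gene sets under a single contrast
p_val <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Benjamini-Hochberg adjustment for multiple tests (stats::p.adjust)
q_val <- p.adjust(p_val, method = "BH")

print(data.frame(p_val = p_val, q_val = q_val))
```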

`Protein_GSPA_[...]essmap.txt`

Key	Description
term	a gene set term
ess_term	an essential gene set term
size	the number of IDs under a `term` with matches to an `ess_term`
ess_size	the number of essential IDs under a `term` with matches to an `ess_term`
fraction	a fraction of matches in IDs between a `term` and an `ess_term`
distance	1 - `fraction`
idx	a numeric index of `term`
ess_idx	a numeric index of `ess_term`
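
The relation between `fraction` and `distance` can be sketched with two toy ID sets. The IDs below are hypothetical, and the definition of `fraction` as the share of a term's IDs that also occur in the essential set is an assumption; it is chosen because it agrees with the note in the Examples that a `term` is a subset of an `ess_term` when the distance is zero.

```r
# Hypothetical ID sets; not taken from any real gene set
term_ids <- c("g1", "g2", "g3", "g4")
ess_ids  <- c("g2", "g3", "g4", "g5", "g6")

# Assumed definition: share of the term's IDs also found in the essential set
fraction <- length(intersect(term_ids, ess_ids)) / length(term_ids)
distance <- 1 - fraction

# A term fully contained in the essential set has a distance of zero
sub_ids      <- c("g2", "g3")
sub_fraction <- length(intersect(sub_ids, ess_ids)) / length(sub_ids)
sub_distance <- 1 - sub_fraction

print(c(distance = distance, sub_distance = sub_distance))
```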

See Also

Data normalization
normPSM for extended examples in PSM data normalization
PSM2Pep for extended examples in PSM to peptide summarization
mergePep for extended examples in peptide data merging
standPep for extended examples in peptide data normalization
Pep2Prn for extended examples in peptide to protein summarization
standPrn for extended examples in protein data normalization
purgePSM and purgePep for extended examples in data purging
pepHist and prnHist for extended examples in histogram visualization
extract_raws and extract_psm_raws for extracting MS file names

Variable arguments of 'filter_...'
contain_str, contain_chars_in, not_contain_str, not_contain_chars_in, start_with_str, end_with_str, start_with_chars_in and ends_with_chars_in for data subsetting by character strings

Missing values
pepImp and prnImp for missing value imputation

Informatics
pepSig and prnSig for significance tests
pepVol and prnVol for volcano plot visualization
prnGSPA for gene set enrichment analysis by protein significance pVals
gspaMap for mapping GSPA to volcano plot visualization
prnGSPAHM for heat map and network visualization of GSPA results
prnGSVA for gene set variance analysis
prnGSEA for data preparation for online GSEA
pepMDS and prnMDS for MDS visualization
pepPCA and prnPCA for PCA visualization
pepLDA and prnLDA for LDA visualization
pepHM and prnHM for heat map visualization
pepCorr_logFC, prnCorr_logFC, pepCorr_logInt and prnCorr_logInt for correlation plots
anal_prnTrend and plot_prnTrend for trend analysis and visualization
anal_pepNMF, anal_prnNMF, plot_pepNMFCon, plot_prnNMFCon, plot_pepNMFCoef, plot_prnNMFCoef and plot_metaNMF for NMF analysis and visualization

Custom databases
Uni2Entrez for lookups between UniProt accessions and Entrez IDs
Ref2Entrez for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO for gene ontology
prepMSig for molecular signatures
prepString and anal_prnString for STRING-DB

Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")
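
The key files listed above can be located and inspected from an R session. The following is a minimal sketch that assumes nothing about the layout of the files beyond their being plain text; `system.file` returns an empty string when the package or file is not available, so the guard keeps the snippet runnable either way.

```r
# Locate the protein key file shipped with the package
path <- system.file("extdata", "protein_keys.txt", package = "proteoQ")

if (nzchar(path)) {
  # Read the file as plain lines and show the first few
  keys <- readLines(path, warn = FALSE)
  print(head(keys))
} else {
  message("proteoQ is not installed, or the key file was not found.")
}
```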

Examples


# ===================================
# Heat maps of GSPA
# ===================================

## !!! requires the brief working example in `?load_expts`

## global option
scale_log2r <- TRUE

## prerequisites in significance and enrichment tests
# (see also ?prnSig, ?prnGSPA)
pepSig(
  impute_na = FALSE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"], # batch effects
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"], # location effects
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = FALSE)

prnGSPA(
  pval_cutoff = 5E-2,
  logFC_cutoff = log2(1.2),
  gspval_cutoff = 5E-2,
  gset_nms = c("go_sets", "kegg_sets"),
  impute_na = FALSE,
)

# ===================================
# Distance heat maps of gene sets
# (also interactive networks)
# ===================================
# a `term` is a subset of an `ess_term` if the distance is zero
prnGSPAHM(
  filter2_by = exprs(distance <= .6),
  annot_cols = "ess_idx",
  annot_colnames = "Eset index",
  annot_rows = "ess_size",
  filename = show_some_redundancy.png,
)

# human terms only
prnGSPAHM(
  filter2_by_dist = exprs(distance <= .95),
  filter2_by_sp = exprs(start_with_str("hs", term)),
  annot_cols = "ess_idx",
  annot_colnames = "Eset index",
  filename = show_more_connectivity.png,
)

# custom color palette
prnGSPAHM(
  annot_cols = c("ess_idx", "ess_size"),
  annot_colnames = c("Eset index", "Size"),
  filter2_by = exprs(distance <= .95),
  color = colorRampPalette(c("blue", "white", "red"))(100),
  filename = custom_colors.png,
)

qzhang503/proteoQ documentation built on April 13, 2025, 8:31 a.m.