prnHM: Visualization of heat maps

pepHM R Documentation

Visualization of heat maps

Description

pepHM applies dist and hclust for the visualization of the heat maps of peptide log2FC via pheatmap.

prnHM applies dist and hclust for the visualization of the heat maps of protein log2FC via pheatmap.

Usage

pepHM(
  col_select = NULL,
  col_order = NULL,
  col_benchmark = NULL,
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  rm_allna = TRUE,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  annot_cols = NULL,
  annot_colnames = NULL,
  annot_rows = NULL,
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  hc_method_rows = "complete",
  hc_method_cols = "complete",
  p_dist_rows = 2,
  p_dist_cols = 2,
  x = NULL,
  p = NULL,
  method = NULL,
  diag = NULL,
  upper = NULL,
  annotation_col = NULL,
  annotation_row = NULL,
  clustering_method = NULL,
  ...
)

prnHM(
  col_select = NULL,
  col_order = NULL,
  col_benchmark = NULL,
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  rm_allna = TRUE,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  annot_cols = NULL,
  annot_colnames = NULL,
  annot_rows = NULL,
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  hc_method_rows = "complete",
  hc_method_cols = "complete",
  p_dist_rows = 2,
  p_dist_cols = 2,
  x = NULL,
  p = NULL,
  method = NULL,
  diag = NULL,
  upper = NULL,
  annotation_col = NULL,
  annotation_row = NULL,
  clustering_method = NULL,
  ...
)

Arguments

col_select

Character string to a column key in expt_smry.xlsx. At the NULL default, the column key of Select in expt_smry.xlsx will be used. In the case of no samples being specified under Select, the column key of Sample_ID will be used. Samples with non-empty entries under the selected column will be used in the indicated analysis.

col_order

Character string to a column key in expt_smry.xlsx. The numeric values under this key will be used for the left-to-right arrangement of samples in graphic outputs or the top-to-bottom arrangement in text outputs. At the NULL default, the column key Order will be used. If values under column Order are left blank, samples will be ordered by their names.

col_benchmark

Not used.

scale_log2r

Logical; if TRUE, adjusts log2FC to the same scale of standard deviation across all samples. The default is TRUE. At scale_log2r = NA, the raw log2FC without normalization will be used.

complete_cases

Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.

impute_na

Logical; if TRUE, data with the imputation of missing values will be used. The default is FALSE.

rm_allna

Logical; if TRUE, removes data rows that are exclusively NA across ratio columns of log2_R126 etc. The setting also applies to log2_R000 in LFQ.

df

The name of a primary data file. By default, it will be determined automatically after matching the types of data and analysis with an id among c("pep_seq", "pep_seq_mod", "prot_acc", "gene"). A primary file contains normalized peptide or protein data and is among c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt"). For analyses that require the fields of significance p-values, the df will be one of c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt").

filepath

A file path to output results. By default, it will be determined automatically by the name of the calling function and the value of id in the call.

filename

A representative file name for outputs. By default, the name(s) will be determined automatically. For text files, a typical file extension is .txt. Image files are typically saved via ggsave or pheatmap, where the image type is determined by the extension of the file name.

annot_cols

A character vector of column keys in expt_smry.xlsx. The values under the selected keys will be used to color-code sample IDs on the top of the indicated plot. The default is NULL without column annotation.

annot_colnames

A character vector of replacement name(s) to annot_cols. The default is NULL without name replacement.

annot_rows

A character vector of column keys that can be found from input files of Peptide.txt, Protein.txt etc. The values under the selected keys will be used to color-code peptides or proteins on the side of the indicated plot. The default is NULL without row annotation.

xmin

Numeric; the minimum x at a log2 scale. The default is -1.

xmax

Numeric; the maximum x at a log2 scale. The default is 1.

xmargin

Numeric; the margin in heat scales. The default is 0.1.

hc_method_rows

A character string; the agglomeration method passed to hclust for the clustering of data rows. The default is "complete".

hc_method_cols

A character string; similar to hc_method_rows but for column data.

p_dist_rows

Numeric; the power of the Minkowski distance in the measures of row dist at clustering_distance_rows = "minkowski". The default is 2.

p_dist_cols

Numeric; similar to p_dist_rows but for column data.

x

Dummy argument to avoid incurring the corresponding argument in dist by partial argument matches.

p

Dummy argument to avoid incurring the corresponding argument in dist by partial argument matches.

method

Dummy argument to avoid incurring the corresponding argument in dist by partial argument matches.

diag

Dummy argument to avoid incurring the corresponding argument in dist by partial argument matches.

upper

Dummy argument to avoid incurring the corresponding argument in dist by partial argument matches.

annotation_col

Dummy argument to avoid incurring the corresponding argument in pheatmap.

annotation_row

Dummy argument to avoid incurring the corresponding argument in pheatmap.

clustering_method

Dummy argument to avoid incurring the corresponding argument in pheatmap.

...

filter_: Variable argument statements for the row filtration against data in a primary file linked to df. Each statement contains a list of logical expression(s). The lhs needs to start with filter_. The logical condition(s) at the rhs need to be enclosed in exprs with round parentheses. For example, pep_len is a column key in Peptide.txt. The statement filter_peps_at = exprs(pep_len <= 50) will remove peptide entries with pep_len > 50. See also pepHist, normPSM.

arrange_: Variable argument statements for the row ordering against data in a primary file linked to df. The lhs needs to start with arrange_. The expression(s) at the rhs need to be enclosed in exprs with round parentheses. For example, arrange_peps_by = exprs(gene, prot_n_pep) will arrange entries by gene, then by prot_n_pep.
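The following is a minimal sketch of how quoted filter_ and arrange_ statements can act on a table. It is an illustration only and not the package's internal code: the data frame peps is a hypothetical stand-in for a few rows of Peptide.txt, and the statements are applied by hand with rlang::eval_tidy and base subsetting.

```r
library(rlang)

# Hypothetical stand-in for a few rows of Peptide.txt (values are made up)
peps <- data.frame(
  pep_seq    = c("AAAK", "CCCR", "DDDK", "EEEK"),
  gene       = c("NCL", "ACTB", "NCL", "ACTB"),
  prot_n_pep = c(3, 5, 3, 5),
  pep_len    = c(7, 62, 50, 12),
  stringsAsFactors = FALSE
)

# The rhs of each statement is a list of unevaluated expressions
filter_peps_at  <- exprs(pep_len <= 50)
arrange_peps_by <- exprs(gene, prot_n_pep)

# Evaluate every condition against the data and combine them with AND
keep <- Reduce(`&`, lapply(filter_peps_at, eval_tidy, data = peps))
peps_kept <- peps[keep, ]

# Evaluate every ordering key, then order by gene and next by prot_n_pep
keys <- unname(lapply(arrange_peps_by, eval_tidy, data = peps_kept))
peps_sorted <- peps_kept[do.call(order, keys), ]

peps_sorted
```

Because the expressions are quoted by exprs, column keys such as pep_len are looked up in the data at evaluation time rather than in the calling environment.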

Additional parameters for plotting:
width, the width of plot
height, the height of plot

Additional arguments for pheatmap:
cluster_rows, clustering_method, clustering_distance_rows...

Notes about pheatmap:
annotation_col disabled; instead use keys indicated in annot_cols
annotation_row disabled; instead use keys indicated in annot_rows
clustering_method breaks into hc_method_rows for row data and hc_method_cols for column data
clustering_distance_rows = "minkowski" allowed together with the power of p_dist_rows and/or p_dist_cols
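As a rough illustration of what the distance and agglomeration settings correspond to, the sketch below calls stats::dist and stats::hclust directly on a small simulated matrix. The matrix m and its dimension names are hypothetical, and the calls only mirror the roles of p_dist_rows, clustering_distance_cols, hc_method_rows and hc_method_cols; how the utilities pass these values on internally is not shown here.

```r
set.seed(1)

# Hypothetical log2FC matrix: 6 proteins (rows) by 4 samples (columns)
m <- matrix(
  rnorm(24, sd = 0.5), nrow = 6,
  dimnames = list(paste0("prot_", 1:6), paste0("smpl_", 1:4))
)

# Rows: Minkowski distance with power p (the role of p_dist_rows)
d_rows <- dist(m, method = "minkowski", p = 2)

# Columns: Manhattan distance on the transposed matrix
d_cols <- dist(t(m), method = "manhattan")

# Separate agglomeration methods for rows and columns
# (the roles of hc_method_rows and hc_method_cols)
hc_rows <- hclust(d_rows, method = "ward.D2")
hc_cols <- hclust(d_cols, method = "complete")

# Minkowski distance at p = 2 coincides with Euclidean distance,
# and at p = 1 with Manhattan distance
all.equal(as.vector(d_rows), as.vector(dist(m, method = "euclidean")))
all.equal(as.vector(dist(m, method = "minkowski", p = 1)),
          as.vector(dist(m, method = "manhattan")))
```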

Details

Data rows without non-missing pairs will result in NA distances in inter-row dissimilarities (dist). At complete_cases = TRUE, the data subset that is complete without missing values will be used. At impute_na = TRUE, all data rows will be used with NA imputation (see prnImp). At the default of complete_cases = FALSE and impute_na = FALSE, NA distances will be arbitrarily replaced with the mean value of the row-distance matrix for hierarchical row clustering.

Similar to data rows, NA distances in data columns will be replaced with the mean value of the column-distance matrix.
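The handling of NA distances described above can be sketched with base R as follows. This is a simplified illustration under assumed inputs, not the package's own code: the matrix m is hypothetical, with rows a and b sharing no column in which both are observed.

```r
# Hypothetical log2FC matrix; rows a and b have no non-missing pair
m <- rbind(
  a = c( 0.5,   NA,  0.1,   NA),
  b = c(  NA, -0.2,   NA,  0.3),
  c = c( 0.4, -0.1,  0.2,  0.1),
  d = c(-0.6,  0.2, -0.3, -0.4)
)

# The a-b distance is NA since all pairs are excluded
d_rows <- dist(m)
n_na <- sum(is.na(d_rows))

# Replace NA distances with the mean of the remaining distances
d_rows[is.na(d_rows)] <- mean(as.vector(d_rows), na.rm = TRUE)

# Hierarchical clustering now proceeds without NA values
hc_rows <- hclust(d_rows, method = "complete")

# Analogue of complete_cases = TRUE: keep only rows without NA
m_complete <- m[complete.cases(m), , drop = FALSE]
```

The same replacement applies to columns when the distances are computed on t(m).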

To avoid memory failure, row aggregation using the kmeans_k option (pheatmap) may be considered for large data sets.
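To give a sense of why row aggregation helps, the sketch below reduces a simulated matrix to k cluster centers with stats::kmeans before computing distances. It only illustrates the idea behind the kmeans_k option and is not pheatmap's internal code; the matrix m and the choice of k = 10 are hypothetical.

```r
set.seed(42)

# Hypothetical data: 500 rows (proteins) by 6 samples
m <- matrix(
  rnorm(500 * 6), nrow = 500,
  dimnames = list(NULL, paste0("smpl_", 1:6))
)

# Aggregate rows into k groups; each group is represented by its center
k <- 10
km <- kmeans(m, centers = k, iter.max = 50)
centers <- km$centers

# The distance matrix now covers k rows instead of 500
n_dist_full <- nrow(m) * (nrow(m) - 1) / 2
n_dist_agg  <- length(dist(centers))

hc <- hclust(dist(centers), method = "complete")
```

The number of pairwise distances grows quadratically with the number of rows, so clustering the k centers instead of all rows keeps the distance matrix small.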

Value

Heat maps and optional sub trees.

See Also

Metadata
load_expts for metadata preparation and a reduced working example in data normalization

Data normalization
normPSM for extended examples in PSM data normalization
PSM2Pep for extended examples in PSM to peptide summarization
mergePep for extended examples in peptide data merging
standPep for extended examples in peptide data normalization
Pep2Prn for extended examples in peptide to protein summarization
standPrn for extended examples in protein data normalization
purgePSM and purgePep for extended examples in data purging
pepHist and prnHist for extended examples in histogram visualization
extract_raws and extract_psm_raws for extracting MS file names

Variable arguments of 'filter_...'
contain_str, contain_chars_in, not_contain_str, not_contain_chars_in, start_with_str, end_with_str, start_with_chars_in and ends_with_chars_in for data subsetting by character strings

Missing values
pepImp and prnImp for missing value imputation

Informatics
pepSig and prnSig for significance tests
pepVol and prnVol for volcano plot visualization
prnGSPA for gene set enrichment analysis by protein significance pVals
gspaMap for mapping GSPA to volcano plot visualization
prnGSPAHM for heat map and network visualization of GSPA results
prnGSVA for gene set variance analysis
prnGSEA for data preparation for online GSEA
pepMDS and prnMDS for MDS visualization
pepPCA and prnPCA for PCA visualization
pepLDA and prnLDA for LDA visualization
pepHM and prnHM for heat map visualization
pepCorr_logFC, prnCorr_logFC, pepCorr_logInt and prnCorr_logInt for correlation plots
anal_prnTrend and plot_prnTrend for trend analysis and visualization
anal_pepNMF, anal_prnNMF, plot_pepNMFCon, plot_prnNMFCon, plot_pepNMFCoef, plot_prnNMFCoef and plot_metaNMF for NMF analysis and visualization

Custom databases
Uni2Entrez for lookups between UniProt accessions and Entrez IDs
Ref2Entrez for lookups among RefSeq accessions, gene names and Entrez IDs
prepGO for gene ontology
prepMSig for molecular signatures
prepString and anal_prnString for STRING-DB

Column keys in PSM, peptide and protein outputs
system.file("extdata", "psm_keys.txt", package = "proteoQ")
system.file("extdata", "peptide_keys.txt", package = "proteoQ")
system.file("extdata", "protein_keys.txt", package = "proteoQ")

Examples


# ===================================
# Heat map
# ===================================

## !!!require the brief working example in `?load_expts`

## global option
scale_log2r <- TRUE

## proteins
# row clustering
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 18,
  height = 12,
  filter_sp = exprs(species == "human", prot_n_pep >= 2),
  filename = "huprns_npep2.png",
)

# rows ordered by kinase classes then by gene names
# (error if `normPSM(annot_kinases = FALSE, ...)`)
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("kin_class"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 2,
  cellheight = 2,
  cellwidth = 14,
  width = 22,
  height = 22,
  filter_kin = exprs(kin_attr, species == "human"),
  arrange_kin = exprs(kin_order, gene),
  filename = "hukins_rows_by_class.png",
)

# `cutree_rows` ignored at `cluster_rows = FALSE`
prnHM(
  scale_log2r = TRUE,
  annot_cols = c("Group"),
  cluster_rows = FALSE,
  clustering_distance_rows  = "maximum",
  cutree_rows = 6,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 22,
  height = 22,
  filename = "cutree_overruled.png",
)

# `minkowski` distance and `ward.D2` clustering
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 18,
  height = 12,
  filter_sp = exprs(species == "human", prot_n_pep >= 2),
  hc_method_rows = "ward.D2",
  hc_method_cols = "ward.D2",
  clustering_distance_rows = "minkowski",
  p_dist_rows = 2,
  p_dist_cols = 2,
  clustering_distance_cols = "manhattan",
  filename = "rowminko2_colman_clustward.D2.png",
)

## additional row filtration by pVals (proteins, impute_na = FALSE)
# if not yet, run prerequisite significance tests at `impute_na = FALSE`
pepSig(
  impute_na = FALSE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = FALSE)

# (`W16_vs_W2.pVal (W16-W2)` now a column key)
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  filter_sp = exprs(species == "human", prot_n_pep >= 2),
  filter_by = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = "pval_cutoff_at_1e6.png", 
)

## additional row filtration by pVals (proteins, impute_na = TRUE)
# if not yet, run prerequisite NA imputation
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)

# if not yet, run prerequisite significance tests at `impute_na = TRUE`
pepSig(
  impute_na = TRUE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = TRUE)

prnHM(
  impute_na = TRUE, 
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 18,
  height = 12,
  filter_prots_by_sp_npep = exprs(species == "human", prot_n_pep >= 3),
  filter_prots_by_pvals = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = "huprns_fil_impna.png",
)

## peptides
# under selected protein(s)
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  filename = "ncl_all.png",
)

# rows ordered by gene 
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  arrange_peps_by = exprs(gene),
  filename = "ncl_rows_by_gene.png",
)

# rows ordered by sequence 
# (may try alternatively `exprs(pep_seq)` if `pep_seq_mod` not a column key in `Peptide.txt`)
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  arrange_peps_by = exprs(pep_seq_mod),
  filename = "ncl_rows_by_seq.png",
)

# more options
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("gene", "W16_vs_W2.pVal (W16-W2)"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  filter_prots_by_pvals = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-5), 
  arrange_by = exprs(gene, -`W16_vs_W2.pVal (W16-W2)`), 
  filename = "ncl_more.png",
)

# selected samples
pepHM(
  col_select = BI_1, 
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  arrange_peps_by = exprs(gene),  
  filename = "ncl_bi1.png",
)

## multiple genes
genes <- c("NCL", "Ncl")

lapply(genes, function (gene) {
  gn <- gene
  
  pepHM(
    xmin = -2, 
    xmax = 2, 
    xmargin = 0.1, 
    annot_cols = c("Group", "Color", "Alpha", "Shape"),
    annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
    cluster_rows = FALSE, 
    show_rownames = TRUE, 
    show_colnames = TRUE, 
    fontsize_row = 10, 
    cellwidth = 12, 
    cellheight = 12, 
    width = 18,
    height = 12,
    arrange_pep = exprs(pep_start, pep_end), 
    filter_sp = exprs(gene == !!gn), 
    filename = !!paste0(gene, ".png"),
  )
})

## Custom annotation colors
# `brewer.pal` is from package RColorBrewer
library(RColorBrewer)
annot_colors_group <- colorRampPalette(brewer.pal(n = 9, "Set1"))(12)
names(annot_colors_group) <- c("W16.BI.TMT1", "W16.BI.TMT2", 
                               "W16.JHU.TMT1", "W16.JHU.TMT2", 
                               "W16.PNNL.TMT1", "W16.PNNL.TMT2",
                               "W2.BI.TMT1", "W2.BI.TMT2", 
                               "W2.JHU.TMT1", "W2.JHU.TMT2", 
                               "W2.PNNL.TMT1", "W2.PNNL.TMT2")

annot_colors_lab <- brewer.pal(n = 3, "Set2")
names(annot_colors_lab) <- c("BI", "JHU", "PNNL")

annot_colors_batch <- brewer.pal(n = 4, "Set3")[1:2]
names(annot_colors_batch) <- c("TMT1", "TMT2")

annot_colors_whim <- brewer.pal(n = 4, "Set3")[3:4]
names(annot_colors_whim) <- c("W16", "W2")

annot_colors <- list(Group = annot_colors_group, 
                     Lab = annot_colors_lab, 
                     Batch = annot_colors_batch, 
                     WHIM = annot_colors_whim)

prnHM(
  xmin = -1, 
  xmax = 1, 
  xmargin = 0.1, 
  annot_cols = c("Group", "Color", "Alpha", "Shape"), 
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"), 
  annotation_colors = annot_colors, 
  cluster_rows = TRUE, 
  cutree_rows = 10, 
  show_rownames = FALSE, 
  show_colnames = TRUE, 
  fontsize_row = 3, 
  cellwidth = 14, 
  filter_sp = exprs(species == "human"), 
  filename = "custom.png",
)



qzhang503/proteoQ documentation built on March 16, 2024, 5:27 a.m.