prnHM: Visualization of heat maps
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometric Data

pepHM

R Documentation

Visualization of heat maps

Description

pepHM applies dist and hclust for the visualization of the heat maps of peptide log2FC via pheatmap.

prnHM applies dist and hclust for the visualization of the heat maps of protein log2FC via pheatmap.

Usage

pepHM(
  col_select = NULL,
  col_order = NULL,
  col_benchmark = NULL,
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  rm_allna = TRUE,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  annot_cols = NULL,
  annot_colnames = NULL,
  annot_rows = NULL,
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  hc_method_rows = "complete",
  hc_method_cols = "complete",
  p_dist_rows = 2,
  p_dist_cols = 2,
  x = NULL,
  p = NULL,
  method = NULL,
  diag = NULL,
  upper = NULL,
  annotation_col = NULL,
  annotation_row = NULL,
  clustering_method = NULL,
  ...
)

prnHM(
  col_select = NULL,
  col_order = NULL,
  col_benchmark = NULL,
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  rm_allna = TRUE,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  annot_cols = NULL,
  annot_colnames = NULL,
  annot_rows = NULL,
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  hc_method_rows = "complete",
  hc_method_cols = "complete",
  p_dist_rows = 2,
  p_dist_cols = 2,
  x = NULL,
  p = NULL,
  method = NULL,
  diag = NULL,
  upper = NULL,
  annotation_col = NULL,
  annotation_row = NULL,
  clustering_method = NULL,
  ...
)

Arguments

`col_select`	Character string to a column key in `expt_smry.xlsx`. At the `NULL` default, the column key of `Select` in `expt_smry.xlsx` will be used. In the case of no samples being specified under `Select`, the column key of `Sample_ID` will be used. The samples with non-empty entries under the selected column will be used in the indicated analysis.
`col_order`	Character string to a column key in `expt_smry.xlsx`. The numeric values under that column will be used for the left-to-right arrangement of samples in graphic outputs or the top-to-bottom arrangement in text outputs. At the `NULL` default, the column key `Order` will be used. If values under column `Order` are left blank, samples will be ordered by their names.
`col_benchmark`	Not used.
`scale_log2r`	Logical; if TRUE, adjusts `log2FC` to the same scale of standard deviation across all samples. The default is TRUE. At `scale_log2r = NA`, the raw `log2FC` without normalization will be used.
`complete_cases`	Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.
`impute_na`	Logical; if TRUE, data with the imputation of missing values will be used. The default is FALSE.
`rm_allna`	Logical; if TRUE, removes data rows that are exclusively NA across ratio columns of `log2_R126` etc. The setting also applies to `log2_R000` in LFQ.
`df`	The name of a primary data file. By default, it will be determined automatically after matching the types of data and analysis with an `id` among `c("pep_seq", "pep_seq_mod", "prot_acc", "gene")`. A primary file contains normalized peptide or protein data and is among `c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`. For analyses that require the fields of significance p-values, the `df` will be one of `c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`.
`filepath`	A file path to output results. By default, it will be determined automatically by the name of the calling function and the value of `id` in the `call`.
`filename`	A representative file name for outputs. By default, the name(s) will be determined automatically. For text files, a typical file extension is `.txt`. Image files are typically saved via `ggsave` or `pheatmap`, where the image type is determined by the extension of the file name.
`annot_cols`	A character vector of column keys in `expt_smry.xlsx`. The values under the selected keys will be used to color-code sample IDs on the top of the indicated plot. The default is NULL without column annotation.
`annot_colnames`	A character vector of replacement name(s) to `annot_cols`. The default is NULL without name replacement.
`annot_rows`	A character vector of column keys that can be found from input files of `Peptide.txt`, `Protein.txt` etc. The values under the selected keys will be used to color-code peptides or proteins on the side of the indicated plot. The default is NULL without row annotation.
`xmin`	Numeric; the minimum x at a log2 scale. The default is -1.
`xmax`	Numeric; the maximum x at a log2 scale. The default is 1.
`xmargin`	Numeric; the margin in heat scales. The default is 0.1.
`hc_method_rows`	A character string; the agglomeration method passed to `hclust` for the clustering of data rows. The default is `complete`.
`hc_method_cols`	A character string; similar to `hc_method_rows` but for column data.
`p_dist_rows`	Numeric; the power of the Minkowski distance in the measures of row `dist` at `clustering_distance_rows = "minkowski"`. The default is 2.
`p_dist_cols`	Numeric; similar to `p_dist_rows` but for column data.
`x`	Dummy argument that prevents partial argument matching to the corresponding argument of `dist`.
`p`	Dummy argument that prevents partial argument matching to the corresponding argument of `dist`.
`method`	Dummy argument that prevents partial argument matching to the corresponding argument of `dist`.
`diag`	Dummy argument that prevents partial argument matching to the corresponding argument of `dist`.
`upper`	Dummy argument that prevents partial argument matching to the corresponding argument of `dist`.
`annotation_col`	Dummy argument that blocks the corresponding argument of `pheatmap`; use `annot_cols` instead.
`annotation_row`	Dummy argument that blocks the corresponding argument of `pheatmap`; use `annot_rows` instead.
`clustering_method`	Dummy argument that blocks the corresponding argument of `pheatmap`; use `hc_method_rows` and `hc_method_cols` instead.
`...`	`filter_`: Variable argument statements for the row filtration of data in a primary file linked to `df`. Each statement contains a list of logical expression(s). The `lhs` needs to start with `filter_`. The logical condition(s) at the `rhs` need to be enclosed in `exprs` with round parentheses. For example, `pep_len` is a column key in `Peptide.txt`. The statement `filter_peps_at = exprs(pep_len <= 50)` will remove peptide entries with `pep_len > 50`. See also `pepHist`, `normPSM`.
	`arrange_`: Variable argument statements for the row ordering of data in a primary file linked to `df`. The `lhs` needs to start with `arrange_`. The expression(s) at the `rhs` need to be enclosed in `exprs` with round parentheses. For example, `arrange_peps_by = exprs(gene, prot_n_pep)` will arrange entries by `gene`, then by `prot_n_pep`.
	Additional parameters for plotting: `width`, the width of the plot; `height`, the height of the plot.
	Additional arguments for `pheatmap`: `cluster_rows`, `clustering_method`, `clustering_distance_rows`, ...
	Notes about `pheatmap`:
	`annotation_col` disabled; instead use keys indicated in `annot_cols`.
	`annotation_row` disabled; instead use keys indicated in `annot_rows`.
	`clustering_method` breaks into `hc_method_rows` for row data and `hc_method_cols` for column data.
	`clustering_distance_rows = "minkowski"` allowed together with the power of `p_dist_rows` and/or `p_dist_cols`.
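
As a small illustration of how `p_dist_rows` and `hc_method_rows` relate to base R, the sketch below calls `stats::dist` and `stats::hclust` directly on a toy matrix. The matrix `m`, its values and its row names are made up for the example; this is not the package's internal code, only the two base functions named in the Description.

```r
# Toy log2FC-like matrix: 4 rows (e.g. proteins) by 3 samples; values are made up.
m <- matrix(c( 0.1, 0.4, -0.2,
               0.0, 0.5, -0.1,
              -0.8, 0.2,  0.9,
              -0.7, 0.1,  1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("prot_", 1:4), paste0("s", 1:3)))

# Minkowski distance with power p: p = 2 coincides with the Euclidean
# distance and p = 1 with the Manhattan distance.
d_p2 <- dist(m, method = "minkowski", p = 2)
d_p1 <- dist(m, method = "minkowski", p = 1)

# Agglomeration method, analogous to `hc_method_rows = "ward.D2"`.
hc <- hclust(d_p2, method = "ward.D2")

# Cut the row tree into two groups, analogous to `cutree_rows = 2`.
groups <- cutree(hc, k = 2)
print(groups)
```

With `p_dist_rows = 2` the Minkowski option therefore gives the same row distances as the Euclidean default; other powers change how strongly large per-sample differences dominate.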

Details

Data rows without non-missing pairs will result in NA distances in inter-row dissimilarities (dist). At complete_cases = TRUE, the data subset that is complete without missing values will be used. At impute_na = TRUE, all data rows will be used with NA imputation (see prnImp). At the default of complete_cases = FALSE and impute_na = FALSE, NA distances will be arbitrarily replaced with the mean value of the row-distance matrix for hierarchical row clustering.

Similar to data rows, NA distances in data columns will be replaced with the mean value of the column-distance matrix.
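
The replacement of NA distances described above can be sketched in base R as follows. This is a minimal illustration under assumed toy data (the matrix `m` and its labels are made up), not the code used inside `pepHM` or `prnHM`.

```r
# Toy matrix with missing values; rows "a" and "d" share no non-missing columns.
m <- matrix(c( 0.5,  NA, 0.2,  NA,
               0.4, 0.1, 0.3, 0.0,
              -0.6, 0.9,  NA, 0.7,
                NA, 1.0,  NA, 0.8),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("a", "b", "c", "d"), paste0("s", 1:4)))

# Pairs of rows without any shared non-missing column get an NA distance.
d <- dist(m)
n_na <- sum(is.na(d))

# Replace NA distances with the mean of the available distances so that
# hclust() can proceed; it does not accept NA dissimilarities.
d_filled <- d
d_filled[is.na(d_filled)] <- mean(as.numeric(d), na.rm = TRUE)

hc <- hclust(d_filled, method = "complete")
print(hc$order)
```

The mean is a neutral but arbitrary fill value: the affected pair is treated as neither especially close nor especially distant, which is why `complete_cases = TRUE` or `impute_na = TRUE` may be preferable when many distances are missing.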

To avoid memory failure, row aggregation using the kmeans_k option (pheatmap) may be considered for large data sets.
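
To show why row aggregation reduces memory use, the sketch below mimics the idea with `stats::kmeans` on made-up data. It is an assumption-level illustration of what aggregating rows into k cluster centers amounts to, not the exact steps taken by `pheatmap` when `kmeans_k` is set; the sizes (200 rows, k = 10) are arbitrary.

```r
set.seed(1)  # k-means starts from random centers

# Toy data: 200 rows by 6 samples; values are made up and contain no NA
# (kmeans() does not accept missing values).
m <- matrix(rnorm(200 * 6), nrow = 200,
            dimnames = list(paste0("row_", 1:200), paste0("s", 1:6)))

k <- 10
km <- kmeans(m, centers = k, iter.max = 100)

# Each of the k rows is the mean profile of one cluster of original rows.
m_agg <- km$centers
print(dim(m_agg))

# Hierarchical clustering now needs distances among 10 rows instead of 200.
hc <- hclust(dist(m_agg), method = "complete")
```

Because the number of pairwise distances grows quadratically with the number of rows, clustering a few dozen aggregated profiles is far cheaper than clustering every protein or peptide.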

Value

Heat maps and optional sub trees.

Examples


# ===================================
# Heat map
# ===================================

## !!!require the brief working example in `?load_expts`

## global option
scale_log2r <- TRUE

## proteins
# row clustering
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 18,
  height = 12,
  filter_sp = exprs(species == "human", prot_n_pep >= 2),
  filename = "huprns_npep2.png",
)

# rows ordered by kinase classes then by gene names
# (error if `normPSM(annot_kinases = FALSE, ...)`)
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("kin_class"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 2,
  cellheight = 2,
  cellwidth = 14,
  width = 22,
  height = 22,
  filter_kin = exprs(kin_attr, species == "human"),
  arrange_kin = exprs(kin_order, gene),
  filename = "hukins_rows_by_class.png",
)

# `cutree_rows` ignored at `cluster_rows = FALSE`
prnHM(
  scale_log2r = TRUE,
  annot_cols = c("Group"),
  cluster_rows = FALSE,
  clustering_distance_rows  = "maximum",
  cutree_rows = 6,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 22,
  height = 22,
  filename = "cutree_overruled.png",
)

# `minkowski` distance and `ward.D2` clustering
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 18,
  height = 12,
  filter_sp = exprs(species == "human", prot_n_pep >= 2),
  hc_method_rows = "ward.D2", 
  hc_method_cols = "ward.D2", 
  clustering_distance_rows = "minkowski",
  p_dist_rows = 2,
  clustering_distance_cols = "manhattan",
  filename = "rowminko2_colman_clustward.D2.png",
)

## additional row filtration by pVals (proteins, impute_na = FALSE)
# if not yet, run prerequisite significance tests at `impute_na = FALSE`
pepSig(
  impute_na = FALSE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = FALSE)

# (`W16_vs_W2.pVal (W16-W2)` now a column key)
prnHM(
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  filter_sp = exprs(species == "human", prot_n_pep >= 2),
  filter_by = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = "pval_cutoff_at_1e6.png", 
)

## additional row filtration by pVals (proteins, impute_na = TRUE)
# if not yet, run prerequisite NA imputation
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)

# if not yet, run prerequisite significance tests at `impute_na = TRUE`
pepSig(
  impute_na = TRUE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = TRUE)

prnHM(
  impute_na = TRUE, 
  xmin = -1,
  xmax = 1,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  cutree_rows = 10,
  show_rownames = FALSE,
  show_colnames = TRUE,
  fontsize_row = 3,
  cellwidth = 14,
  width = 18,
  height = 12,
  filter_prots_by_sp_npep = exprs(species == "human", prot_n_pep >= 3),
  filter_prots_by_pvals = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = "huprns_fil_impna.png",
)

## peptides
# under selected protein(s)
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  filename = "ncl_all.png",
)

# rows ordered by gene 
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  arrange_peps_by = exprs(gene),
  filename = "ncl_rows_by_gene.png",
)

# rows ordered by sequence 
# (may try alternatively `exprs(pep_seq)` if `pep_seq_mod` not a column key in `Peptide.txt`)
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  arrange_peps_by = exprs(pep_seq_mod),
  filename = "ncl_rows_by_seq.png",
)

# more options
pepHM(
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = FALSE,
  annot_rows = c("gene", "W16_vs_W2.pVal (W16-W2)"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  filter_prots_by_pvals = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-5), 
  arrange_by = exprs(gene, -`W16_vs_W2.pVal (W16-W2)`), 
  filename = "ncl_more.png",
)

# selected samples
pepHM(
  col_select = BI_1, 
  xmin = -2,
  xmax = 2,
  xmargin = 0.1,
  annot_cols = c("Group", "Color", "Alpha", "Shape"),
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
  cluster_rows = TRUE,
  annot_rows = c("gene"),
  show_rownames = TRUE,
  show_colnames = TRUE,
  fontsize_row = 10,
  cellwidth = 12,
  cellheight = 12,
  width = 18,
  height = 12,
  filter_by = exprs(gene %in% c("NCL", "Ncl")),
  arrange_peps_by = exprs(gene),  
  filename = "ncl_bi1.png",
)

## multiple genes
genes <- c("NCL", "Ncl")

lapply(genes, function (gene) {
  gn <- gene
  
  pepHM(
    xmin = -2, 
    xmax = 2, 
    xmargin = 0.1, 
    annot_cols = c("Group", "Color", "Alpha", "Shape"),
    annot_colnames = c("Group", "Lab", "Batch", "WHIM"),
    cluster_rows = FALSE, 
    show_rownames = TRUE, 
    show_colnames = TRUE, 
    fontsize_row = 10, 
    cellwidth = 12, 
    cellheight = 12, 
    width = 18,
    height = 12,
    arrange_pep = exprs(pep_start, pep_end), 
    filter_sp = exprs(gene == !!gn), 
    filename = !!paste0(gene, ".png"),
  )
})

## Custom annotation colors 
library(RColorBrewer)  # provides brewer.pal()
annot_colors_group <- colorRampPalette(brewer.pal(n = 9, "Set1"))(12)
names(annot_colors_group) <- c("W16.BI.TMT1", "W16.BI.TMT2", 
                               "W16.JHU.TMT1", "W16.JHU.TMT2", 
                               "W16.PNNL.TMT1", "W16.PNNL.TMT2",
                               "W2.BI.TMT1", "W2.BI.TMT2", 
                               "W2.JHU.TMT1", "W2.JHU.TMT2", 
                               "W2.PNNL.TMT1", "W2.PNNL.TMT2")

annot_colors_lab <- brewer.pal(n = 3, "Set2")
names(annot_colors_lab) <- c("BI", "JHU", "PNNL")

annot_colors_batch <- brewer.pal(n = 4, "Set3")[1:2]
names(annot_colors_batch) <- c("TMT1", "TMT2")

annot_colors_whim <- brewer.pal(n = 4, "Set3")[3:4]
names(annot_colors_whim) <- c("W16", "W2")

annot_colors <- list(Group = annot_colors_group, 
                     Lab = annot_colors_lab, 
                     Batch = annot_colors_batch, 
                     WHIM = annot_colors_whim)

prnHM(
  xmin = -1, 
  xmax = 1, 
  xmargin = 0.1, 
  annot_cols = c("Group", "Color", "Alpha", "Shape"), 
  annot_colnames = c("Group", "Lab", "Batch", "WHIM"), 
  annotation_colors = annot_colors, 
  cluster_rows = TRUE, 
  cutree_rows = 10, 
  show_rownames = FALSE, 
  show_colnames = TRUE, 
  fontsize_row = 3, 
  cellwidth = 14, 
  filter_sp = exprs(species == "human"), 
  filename = "custom.png",
)

qzhang503/proteoQ documentation built on April 13, 2025, 8:31 a.m.
