prnHist: Histogram visualization

pepHist R Documentation

Histogram visualization


pepHist plots the histograms of peptide log2FC.

prnHist plots the histograms of protein log2FC.


pepHist(
  col_select = NULL,
  scale_log2r = TRUE,
  complete_cases = FALSE,
  cut_points = c(mean_lint = NA),
  show_curves = TRUE,
  show_vline = TRUE,
  scale_y = TRUE,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  theme = NULL,
  ...
)

prnHist takes the same arguments.

col_select: Character string naming a column key in expt_smry.xlsx. At the NULL default, the column key Select in expt_smry.xlsx will be used. If no samples are specified under Select, the column key Sample_ID will be used. The non-empty entries under the selected column will be used in the indicated analysis.


scale_log2r: Logical; if TRUE, adjusts log2FC to the same scale of standard deviation across all samples. The default is TRUE. At scale_log2r = NA, the raw log2FC without normalization will be used.
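The idea of bringing log2FC to a common standard deviation can be pictured with a small base R sketch. The simulated data, the sample names and the choice of the mean standard deviation as the common target are assumptions for illustration only; the package's own scaling procedure may differ.

```r
# Simulated log2FC for three hypothetical samples with different spreads
set.seed(1)
log2fc <- data.frame(
  sample_a = rnorm(1000, mean = 0, sd = 0.3),
  sample_b = rnorm(1000, mean = 0, sd = 0.6),
  sample_c = rnorm(1000, mean = 0, sd = 0.9)
)

# Per-sample standard deviations and a common target (assumed: their mean)
sds <- vapply(log2fc, sd, numeric(1), na.rm = TRUE)
target_sd <- mean(sds)

# Rescale each column so that its standard deviation equals the target
scaled <- as.data.frame(Map(function(x, s) x * target_sd / s, log2fc, sds))

# After rescaling, the three standard deviations coincide
round(vapply(scaled, sd, numeric(1)), 3)
```

Comparing histograms from scale_log2r = FALSE and scale_log2r = TRUE shows the same effect on real data.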


complete_cases: Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.


cut_points: A named numeric vector that defines the cut points (knots) in histograms. The default is cut_points = c(mean_lint = NA), where the cut points correspond to the quantile values under column mean_lint (mean log10 intensity) of the input data. Values of log2FC will then be binned from -Inf to Inf according to the cut points. To disable data binning, set cut_points = Inf or -Inf. The binning of log2FC can also be based on a different numeric column, e.g., cut_points = c(prot_icover = seq(.25, .75, .25)). See also mergePep for data alignment with binning.
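A minimal base R sketch of quantile-based binning follows. It is not the package's implementation: the simulated table, the log2fc column name and the use of quartiles as cut points are assumptions for illustration. Only the column names mean_lint and Int_index are taken from this page.

```r
# Simulated peptide table: hypothetical log2FC and mean log10 intensity
set.seed(2)
dat <- data.frame(
  log2fc    = rnorm(500, sd = 0.5),
  mean_lint = runif(500, min = 4, max = 8)
)

# Assumed cut points: the quartiles of mean_lint
cuts <- unname(quantile(dat$mean_lint, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))

# Open the outer bins so that the bins span -Inf to Inf
breaks <- c(-Inf, cuts, Inf)
dat$Int_index <- cut(dat$mean_lint, breaks = breaks)

# Number of log2FC values per intensity bin
table(dat$Int_index)
```

Each log2FC value then carries the bin label of its intensity, which is what a histogram can use for color coding.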


show_curves: Logical; if TRUE, shows the fitted curves. At the TRUE default, the curve parameters are taken from the latest call to standPep or standPrn with method_align = MGKernel. This feature can show how data filtration affects the alignment of log2FC profiles. See also standPep and standPrn for more examples.
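The fitted curves are labeled G1, G2, G3 and G1 + G2 + G3 in the examples on this page, which suggests weighted Gaussian components and their sum. The sketch below shows how such curves can be computed in base R. The mixing weights, means and standard deviations are made-up values, not parameters produced by standPep or standPrn.

```r
# Hypothetical parameters of a three-component Gaussian mixture
params <- data.frame(
  component = c("G1", "G2", "G3"),
  lambda    = c(0.6, 0.3, 0.1),   # mixing weights, summing to 1
  mean      = c(0.0, -0.4, 0.5),
  sd        = c(0.2, 0.4, 0.6)
)

x <- seq(-2, 2, by = 0.01)

# Weighted density of each component, one column per component
dens <- mapply(function(l, m, s) l * dnorm(x, mean = m, sd = s),
               params$lambda, params$mean, params$sd)
colnames(dens) <- params$component

# The composite curve is the sum of the weighted components
composite <- rowSums(dens)

# Approximate area under the composite curve on [-2, 2]; close to 1
sum(composite) * 0.01
```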


show_vline: Logical; if TRUE, shows the vertical line at x = 0. The default is TRUE.


scale_y: Logical; if TRUE, scales data on the y-axis. The default is TRUE.


df: The name of a primary data file. By default, it will be determined automatically after matching the types of data and analysis with an id among c("pep_seq", "pep_seq_mod", "prot_acc", "gene"). A primary file contains normalized peptide or protein data and is among c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt"). For analyses that require significance p-values, the df will be one of c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt").


filepath: A file path for output results. By default, it will be determined automatically by the name of the calling function and the value of id in the call.


filename: A representative file name for outputs. By default, the name(s) will be determined automatically. For text files, a typical file extension is .txt. Image files are typically saved via ggsave or pheatmap, where the image type is determined by the extension of the file name.


theme: A ggplot2 theme, e.g., theme_bw(), or a custom theme. At the NULL default, a system theme will be applied.


filter_: Variable argument statements for the row filtration of data against the column keys in Peptide.txt for peptides or Protein.txt for proteins. Each statement contains a list of logical expression(s). The lhs needs to start with filter_. The logical condition(s) at the rhs need to be enclosed in exprs with round parentheses.

For example, pep_len is a column key in Peptide.txt. The statement filter_peps_at = exprs(pep_len <= 50) will remove peptide entries with pep_len > 50. See also normPSM.
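The mechanics of such a statement can be approximated in base R. In this sketch quote() stands in for exprs, and the peptide table and its values are hypothetical; it illustrates how an unevaluated condition is applied to the columns of a table and is not the package's implementation.

```r
# Hypothetical peptide table with a `pep_len` column
peptides <- data.frame(
  pep_seq = c("AAK", "LLLR", "VVVVK"),
  pep_len = c(12, 55, 50)
)

# Named list of unevaluated conditions; each name starts with `filter_`
filters <- list(filter_peps_at = quote(pep_len <= 50))

# Evaluate each condition against the table columns, then
# keep the rows for which every condition is TRUE
tests <- lapply(filters, eval, envir = peptides)
keep <- Reduce(`&`, tests, rep(TRUE, nrow(peptides)))

peptides[keep, ]
```

The row with pep_len = 55 is dropped, matching the statement filter_peps_at = exprs(pep_len <= 50) described above.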

Additional parameters for plotting with ggplot2:
xmin, the minimum x at a log2 scale; the default is -2.
xmax, the maximum x at a log2 scale; the default is +2.
xbreaks, the breaks in x-axis at a log2 scale; the default is 1.
binwidth, the binwidth of log2FC; the default is (xmax - xmin)/80.
ncol, the number of columns; the default is 1.
width, the width of the plot.
height, the height of the plot.
scales, should the scales be fixed across panels; the default is "fixed" and the alternative is "free".
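As a quick arithmetic check of the defaults listed above, the following base R lines derive the default bin width and the positions of the x-axis breaks from xmin, xmax and xbreaks. The variable x_ticks is a name introduced here for illustration.

```r
# Documented defaults
xmin <- -2
xmax <- 2
xbreaks <- 1

# Default bin width: (xmax - xmin) / 80
binwidth <- (xmax - xmin) / 80

# Break positions on the x-axis at a spacing of `xbreaks`
x_ticks <- seq(xmin, xmax, by = xbreaks)

binwidth  # 0.05
x_ticks   # -2 -1  0  1  2
```

These values agree with binwidth = .05 and breaks = seq(-2, 2, by = 1) used in the direct ggplot2 example on this page.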


In the histograms, the log2FC under each TMT channel are color-coded by their contributing reporter-ion or LFQ intensity.


Histograms of log2FC; raw histogram data: [...]_raw.txt; fitted data for curves: [...]_fitted.txt

# ===================================
# Histogram
# ===================================

## !!!require the brief working example in `?load_expts`

## exemplary `MGKernel` alignment
standPep(
  method_align = MGKernel,
  n_comp = 3,
  seed = 749662,
  maxit = 200,
  epsilon = 1e-05
)

standPrn(
  method_align = MGKernel,
  n_comp = 2,
  seed = 749662,
  maxit = 200,
  epsilon = 1e-05
)

## (1) effects of data scaling
# peptide without log2FC scaling
pepHist(scale_log2r = FALSE)

# with scaling
pepHist(scale_log2r = TRUE)

## (2) sample column selection
# sample IDs indicated under column `Select` in `expt_smry.xlsx`
pepHist(col_select = Select, filename = colsel.png)

# protein data for samples under column `W2` in `expt_smry.xlsx`
prnHist(col_select = W2, filename = w2.png)

## (3) row filtration of data
# exclude oxidized methionine or deamidated asparagine
pepHist(
  # filter_by = exprs(!grepl("[mn]", pep_seq_mod)),
  filter_by = exprs(not_contain_chars_in("mn", pep_seq_mod)),
  filename = "no_mn.png"
)

# phosphopeptide subset (error message if no matches)
pepHist(
  filter_peps = exprs(contain_chars_in("sty", pep_seq_mod)),
  scale_y = FALSE,
  filename = phospho.png
)

# or use `grepl` directly
pepHist(
  filter_by = exprs(grepl("[sty]", pep_seq_mod)),
  filename = same_phospho.png
)

## (4) between lead and lag
# leading profiles
pepHist(
  filename = lead.png
)

# lagging profiles at
#   (1) n_psm >= 10
#   (2) and no methionine oxidation or asparagine deamidation
pepHist(
  filter_peps_by_npsm = exprs(pep_n_psm >= 10),
  filter_peps_by_mn = exprs(not_contain_chars_in("mn", pep_seq_mod)),
  filename = lag.png
)

## (5) Data binning by `prot_icover`
pepHist(
  cut_points = c(prot_icover = NA),
  filename = prot_icover_coded.png
)

## (6) custom theme
library(ggplot2)

my_histo_theme <- theme_bw() + theme(
  axis.text.x  = element_text(angle=0, vjust=0.5, size=18),
  axis.ticks.x  = element_blank(), # x-axis ticks
  axis.text.y  = element_text(angle=0, vjust=0.5, size=18),
  axis.title.x = element_text(colour="black", size=24),
  axis.title.y = element_text(colour="black", size=24),
  plot.title = element_text(colour="black", size=24, hjust=.5, vjust=.5),
  strip.text.x = element_text(size = 18, colour = "black", angle = 0),
  strip.text.y = element_text(size = 18, colour = "black", angle = 90),
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),
  panel.grid.major.y = element_blank(),
  panel.grid.minor.y = element_blank(),
  legend.key = element_rect(colour = NA, fill = 'transparent'),
  legend.background = element_rect(colour = NA,  fill = "transparent"),
  legend.title = element_blank(),
  legend.text = element_text(colour="black", size=18),
  legend.text.align = 0
)

pepHist(
  theme = my_histo_theme,
  filename = my_theme.png
)

pepHist(
  col_select = BI_1,
  theme = theme_dark(),
  filename = bi1_dark.png
)

## (7) direct uses of ggplot2
res <- pepHist(filename = default.png)

# names(res)

p <- ggplot() +
  geom_histogram(data = res$raw, aes(x = value, y = ..count.., fill = Int_index),
                 color = "white", alpha = .8, binwidth = .05, size = .1) +
  scale_fill_brewer(palette = "Spectral", direction = -1) +
  labs(title = "", x = expression("Ratio (" * log[2] * ")"), y = expression("Frequency")) +
  scale_x_continuous(limits = c(-2, 2), breaks = seq(-2, 2, by = 1),
                     labels = as.character(seq(-2, 2, by = 1))) +
  scale_y_continuous(limits = NULL) + 
  facet_wrap(~ Sample_ID, ncol = 5, scales = "fixed") # + 
  # my_histo_theme

p <- p + 
  geom_line(data = res$fitted, mapping = aes(x = x, y = value, colour = variable), size = .2) +
  scale_colour_manual(values = c("gray", "gray", "gray", "black"), name = "Gaussian",
                      breaks = c(c("G1", "G2", "G3"), paste(c("G1", "G2", "G3"), collapse = " + ")),
                      labels = c("G1", "G2", "G3", "G1 + G2 + G3"))

p <- p + geom_vline(xintercept = 0, size = .25, linetype = "dashed")

ggsave(file.path(dat_dir, "Peptide/Histogram/my_ggplot2.png"), 
       width = 22, height = 48, limitsize = FALSE)

## Not run: 
# sample selection
pepHist(
  col_select = "a_column_key_not_in_`expt_smry.xlsx`"
)

# data filtration
pepHist(
  filter_by = exprs(!grepl("[m]", a_column_key_not_in_data_table))
)

pepHist(
  lhs_not_start_with_filter_ = exprs(n_psm >= 5)
)

## End(Not run)

