prnPCA: PCA plots
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometric Data

pepPCA

R Documentation

PCA plots

Description

pepPCA visualizes the principal component analysis (PCA) for peptide data.

prnPCA visualizes the principal component analysis (PCA) for protein data.

Usage

pepPCA(
  col_select = NULL,
  col_group = NULL,
  col_color = NULL,
  col_fill = NULL,
  col_shape = NULL,
  col_size = NULL,
  col_alpha = NULL,
  color_brewer = NULL,
  fill_brewer = NULL,
  size_manual = NULL,
  shape_manual = NULL,
  alpha_manual = NULL,
  choice = c("prcomp"),
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  center_features = TRUE,
  scale_features = TRUE,
  show_ids = TRUE,
  show_ellipses = FALSE,
  dimension = 2,
  folds = 1,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  theme = NULL,
  type = c("obs", "feats"),
  ...
)

prnPCA(
  col_select = NULL,
  col_group = NULL,
  col_color = NULL,
  col_fill = NULL,
  col_shape = NULL,
  col_size = NULL,
  col_alpha = NULL,
  color_brewer = NULL,
  fill_brewer = NULL,
  size_manual = NULL,
  shape_manual = NULL,
  alpha_manual = NULL,
  choice = c("prcomp"),
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  center_features = TRUE,
  scale_features = TRUE,
  show_ids = TRUE,
  show_ellipses = FALSE,
  dimension = 2,
  folds = 1,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  theme = NULL,
  type = c("obs", "feats"),
  ...
)

Arguments

`col_select`	Character string to a column key in `expt_smry.xlsx`. At the `NULL` default, the column key `Select` in `expt_smry.xlsx` will be used. If no samples are specified under `Select`, the column key `Sample_ID` will be used. Samples with non-empty entries under the selected column will be used in the indicated analysis.
`col_group`	Character string to a column key in `expt_smry.xlsx`. Samples corresponding to non-empty entries under `col_group` will be used for sample grouping in the indicated analysis. At the NULL default, the column key `Group` will be used. No data annotation by groups will be performed if the fields under the indicated group column are empty.
`col_color`	Character string to a column key in `expt_smry.xlsx`. The values under this column will be used for the `color` aesthetics in plots. At the NULL default, the column key `Color` will be used. If NA, the aesthetic is bypassed (a means to skip the look-up of column `Color` and to handle duplication in aesthetics).
`col_fill`	Character string to a column key in `expt_smry.xlsx`. The values under this column will be used for the `fill` aesthetics in plots. At the NULL default, the column key `Fill` will be used. If NA, the aesthetic is bypassed (a means to skip the look-up of column `Fill` and to handle duplication in aesthetics).
`col_shape`	Character string to a column key in `expt_smry.xlsx`. The values under this column will be used for the `shape` aesthetics in plots. At the NULL default, the column key `Shape` will be used. If NA, the aesthetic is bypassed (a means to skip the look-up of column `Shape` and to handle duplication in aesthetics).
`col_size`	Character string to a column key in `expt_smry.xlsx`. The values under this column will be used for the `size` aesthetics in plots. At the NULL default, the column key `Size` will be used. If NA, the aesthetic is bypassed (a means to skip the look-up of column `Size` and to handle duplication in aesthetics).
`col_alpha`	Character string to a column key in `expt_smry.xlsx`. The values under this column will be used for the `alpha` (transparency) aesthetics in plots. At the NULL default, the column key `Alpha` will be used. If NA, the aesthetic is bypassed (a means to skip the look-up of column `Alpha` and to handle duplication in aesthetics).
`color_brewer`	Character string to the name of a color brewer palette for use in ggplot2::scale_color_brewer, e.g., `color_brewer = Set1`. At the NULL default, the setting in `ggplot2` will be used.
`fill_brewer`	Character string to the name of a color brewer palette for use in ggplot2::scale_fill_brewer, e.g., `fill_brewer = Spectral`. At the NULL default, the setting in `ggplot2` will be used.
`size_manual`	Numeric vector to the scale of sizes for use in ggplot2::scale_size_manual, e.g., `size_manual = c(8, 12)`. At the NULL default, the setting in `ggplot2` will be used.
`shape_manual`	Numeric vector to the scale of shape IDs for use in ggplot2::scale_shape_manual, e.g., `shape_manual = c(5, 15)`. At the NULL default, the setting in `ggplot2` will be used.
`alpha_manual`	Numeric vector to the scale of transparency of objects for use in ggplot2::scale_alpha_manual, e.g., `alpha_manual = c(.5, .9)`. At the NULL default, the setting in `ggplot2` will be used.
`choice`	Character string; the PCA method in `c("prcomp")`. The default is "prcomp".
`scale_log2r`	Logical; if TRUE, adjusts `log2FC` to the same scale of standard deviation across all samples. The default is TRUE. At `scale_log2r = NA`, the raw `log2FC` without normalization will be used.
`complete_cases`	Logical; always TRUE for PCA.
`impute_na`	Logical; if TRUE, data with the imputation of missing values will be used. The default is FALSE.
`center_features`	Logical; if TRUE, centers log2FC at zero by features (proteins or peptides). The default is TRUE. Note the difference from data alignment with `method_align` in `standPrn` or `standPep`, where log2FC are aligned by observations (samples).
`scale_features`	Logical; if TRUE, adjusts log2FC to the same scale of variance by features (protein or peptide entries). The default is TRUE. Note the difference from data scaling with `scale_log2r`, where log2FC are scaled by observations (samples).
`show_ids`	Logical; if TRUE, shows the sample IDs in `MDS/PCA` plots. The default is TRUE.
`show_ellipses`	Logical; if TRUE, shows the ellipses by sample groups according to `col_group`. The default is FALSE.
`dimension`	Numeric; the desired dimension for pairwise visualization. The default is 2.
`folds`	Not currently used. Integer; the degree of folding data into subsets. The default is one without data folding.
`df`	The name of a primary data file. By default, it will be determined automatically after matching the types of data and analysis with an `id` among `c("pep_seq", "pep_seq_mod", "prot_acc", "gene")`. A primary file contains normalized peptide or protein data and is among `c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`. For analyses that require significance p-values, the `df` will be one of `c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`.
`filepath`	A file path to output results. By default, it will be determined automatically by the name of the calling function and the value of `id` in the `call`.
`filename`	A representative file name to outputs. By default, the name(s) will be determined automatically. For text files, a typical file extension is `.txt`. For image files, they are typically saved via `ggsave` or `pheatmap` where the image type will be determined by the extension of the file name.
`theme`	A ggplot2 theme, e.g., theme_bw(), or a custom theme. At the NULL default, a system theme will be applied.
`type`	Character string indicating the type of PCA by either observations or features. At the `type = obs` default, observations (samples) are in rows and features (peptides or proteins) in columns for `prcomp`. The principal components are then plotted by observations. Alternatively at `type = feats`, features (peptides or proteins) are in rows and observations (samples) are in columns. The principal components are then plotted by features.
`...`	`filter_`: Variable argument statements for the row filtration against data in a primary file linked to `df`. See also `normPSM` for the format of `filter_` statements. Arguments passed to `prcomp`: `rank.`, `tol` etc. At `type = obs`, argument `scale` becomes `scale_features` and `center` matches `center_features`. At `type = feats`, the setting of `scale_log2r` will be applied for data scaling, and data centering will be handled by `standPep` or `standPrn`. Additional arguments for `ggsave`: `width`, the width of plot; `height`, the height of plot, etc.
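
The mapping of `center_features` and `scale_features` onto `prcomp` at `type = obs` can be pictured with base R alone. The following is a minimal sketch on synthetic data, not the package's internal code; the matrix `x` and the object names are made up for illustration, and note that the scaling argument of `prcomp` itself is spelled `scale.`.

```r
# Synthetic log2FC-like matrix: 6 samples (rows) x 20 features (columns).
set.seed(1)
x <- matrix(
  rnorm(6 * 20), nrow = 6,
  dimnames = list(paste0("sample_", 1:6), paste0("feat_", 1:20))
)

# Assumed analogue of center_features = TRUE and scale_features = TRUE:
# prcomp() centers and scales every column (feature) before the decomposition.
pca <- prcomp(x, center = TRUE, scale. = TRUE, rank. = 3)

# Manual equivalent: standardize the columns first, then switch off
# centering and scaling inside prcomp().
x_std <- scale(x, center = TRUE, scale = TRUE)
pca_manual <- prcomp(x_std, center = FALSE, scale. = FALSE, rank. = 3)

# Both routes yield the same sample scores.
same_scores <- all.equal(pca$x, pca_manual$x, check.attributes = FALSE)
print(same_scores)
```

Here `rank. = 3` limits the output to three principal components, in the same way that `rank.` is forwarded through `...` in the calls shown under Examples.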

Details

The utility is a wrapper of prcomp against log2FC. The results are then visualized by either observations or features. See also https://proteoq.netlify.app/post/wrapping-pca-into-proteoq/ for data centering by either observations or features.
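
The difference between the two values of `type` comes down to which way the data matrix is oriented when it reaches `prcomp`. The sketch below uses a synthetic matrix and base R only; the object names, the centering and scaling choices, and the label format are assumptions for illustration and may differ from what the package does internally.

```r
# Synthetic log2FC table: 50 features (rows) x 4 samples (columns).
set.seed(42)
log2fc <- matrix(
  rnorm(50 * 4), nrow = 50,
  dimnames = list(paste0("prot_", 1:50), paste0("sample_", 1:4))
)

# By observations: samples in rows, features in columns.
# One point per sample in the score plot.
pca_obs <- prcomp(t(log2fc), center = TRUE, scale. = TRUE)
scores_obs <- pca_obs$x[, 1:2]

# By features: features in rows, samples in columns.
# One point per feature in the score plot.
pca_feats <- prcomp(log2fc, center = TRUE, scale. = FALSE)
scores_feats <- pca_feats$x[, 1:2]

# Proportion of variance explained, usable for axis labels.
prop_var <- pca_obs$sdev^2 / sum(pca_obs$sdev^2)
pc_labels <- sprintf("PC%d (%.1f%%)", 1:2, 100 * prop_var[1:2])

print(dim(scores_obs))
print(dim(scores_feats))
print(pc_labels)
```

With many features and few samples, the by-observations plot shows a handful of points while the by-features plot shows one point per peptide or protein, so `show_ids = FALSE` is usually the more readable choice for the latter.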

Value

PCA plots.

Examples


# ===================================
# PCA
# ===================================

## !!!require the brief working example in `?load_expts`

## global option
scale_log2r <- TRUE

# peptides, all samples
pepPCA(
  col_select = Select, 
  filter_peps_by = exprs(pep_n_psm >= 3),
  show_ids = FALSE, 
  filename = "peps_rowfil.png",
)

# peptides, samples under column `BI`
pepPCA(
  col_select = BI, 
  col_shape = Shape,   
  col_color = Alpha, 
  filter_peps_by = exprs(pep_n_psm >= 10),
  show_ids = FALSE, 
  filename = "peps_rowfil_colsel.png",
)

# proteins
prnPCA(
  col_color = Color,
  col_shape = Shape,
  show_ids = FALSE,
  filter_peps_by = exprs(prot_n_pep >= 5),
  filename = "prns_rowfil.png",
)

# subset by mean deviation values
# deviations to means may not be symmetric;
prnPCA(
  col_select = Select, 
  filter_peps_by = exprs(prot_mean_z >= -.25, prot_mean_z <= .3),
  show_ids = FALSE, 
  filename = "subset_by_mean_dev.png",
)

# proteins, custom palette
prnPCA(
  col_shape = Shape,
  color_brewer = Set1,
  show_ids = FALSE,
  filename = "my_palette.png",
)

# proteins, by features
prnPCA(
  type = feats,
  scale_log2r = TRUE,
  filename = "by_feats.png",
)

## additional row filtration by pVals (proteins, impute_na = FALSE)
# if not yet, run prerequisite significance tests at `impute_na = FALSE`
pepSig(
  impute_na = FALSE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = FALSE)

# (`W16_vs_W2.pVal (W16-W2)` now a column key)
prnPCA(
  col_color = Color,
  col_shape = Shape,
  show_ids = FALSE,
  filter_peps_by = exprs(prot_n_pep >= 5),
  filter_by = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = pvalcutoff.png, 
)

# analogous peptides
pepPCA(
  col_color = Color,
  col_shape = Shape,
  show_ids = FALSE,
  filter_peps_by = exprs(prot_n_pep >= 5),
  filter_by = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = pvalcutoff.png, 
)

## additional row filtration by pVals (proteins, impute_na = TRUE)
# if not yet, run prerequisite NA imputation
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)

# if not yet, run prerequisite significance tests at `impute_na = TRUE`
pepSig(
  impute_na = TRUE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = TRUE)

prnPCA(
  impute_na = TRUE,
  col_color = Color,
  col_shape = Shape,
  show_ids = FALSE,
  filter_peps_by = exprs(prot_n_pep >= 5),
  filter_by = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = filpvals_impna.png, 
)

# analogous peptides
pepPCA(
  impute_na = TRUE,
  col_color = Color,
  col_shape = Shape,
  show_ids = FALSE,
  filter_peps_by = exprs(prot_n_pep >= 5),
  filter_by = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  filename = filpvals_impna.png,
)

## a higher dimension
pepPCA(
  show_ids = FALSE,
  rank. = 5, 
  dimension = 3,
  filename = d3.pdf,
)

prnPCA(
  show_ids = TRUE,
  rank. = 4, 
  dimension = 3,
  filename = d3.png,
)

prnPCA(
  type = feats,
  rank. = 4, 
  dimension = 3,
  filename = feat_d3.png,
)

# show ellipses
# (column `expt_smry.xlsx::Color` codes `labs`.)
prnPCA(
  show_ids = FALSE,
  show_ellipses = TRUE,
  col_group = Color, 
  rank. = 4, 
  dimension = 3,
  filename = d3_labs.png,
)

# (column `expt_smry.xlsx::Shape` codes `WHIMs`.)
prnPCA(
  show_ids = FALSE,
  show_ellipses = TRUE,
  col_group = Shape, 
  rank. = 4, 
  dimension = 3,
  filename = d3_whims.png,
)

## custom theme
library(ggplot2)
my_theme <- theme_bw() + theme(
  axis.text.x  = element_text(angle=0, vjust=0.5, size=20),
  axis.text.y  = element_text(angle=0, vjust=0.5, size=20),
  axis.title.x = element_text(colour="black", size=20),
  axis.title.y = element_text(colour="black", size=20),
  plot.title = element_text(face="bold", colour="black", size=20, hjust=0.5, vjust=0.5),
  
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),
  panel.grid.major.y = element_blank(),
  panel.grid.minor.y = element_blank(),
  
  legend.key = element_rect(colour = NA, fill = 'transparent'),
  legend.background = element_rect(colour = NA,  fill = "transparent"),
  legend.title = element_blank(),
  legend.text = element_text(colour="black", size=14),
  legend.text.align = 0,
  legend.box = NULL
)

pepPCA(
  impute_na = TRUE,
  col_color = Color,
  col_shape = Shape,
  show_ids = FALSE,
  filter_peps_by = exprs(prot_n_pep >= 5),
  filter_by = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
  theme = my_theme, 
  filename = my_theme.png,
)

## direct uses of ggplot2
library(ggplot2)
res <- prnPCA(filename = foo.png)

# names(res)

p_fil <- ggplot(res$pca, aes(PC1, PC2)) +
  geom_point(aes(colour = Color, shape = Shape, alpha = Alpha), size = 4, stroke = 0.02) + 
  scale_alpha_manual(values = c(.5, .9)) + 
  stat_ellipse(aes(fill = Shape), geom = "polygon", alpha = .4) + 
  guides(fill = "none") + 
  labs(title = "", 
       x = paste0("PC1 (", res$prop_var[1], ")"), 
       y = paste0("PC2 (", res$prop_var[2], ")")) +
  coord_fixed() 

ggsave(file.path(dat_dir, "Protein/PCA/my_ggplot2_fil.png"), plot = p_fil)

## Not run: 
# Ambiguous matches of `scale` to `scale_log2r` or `scale_features`
prnPCA(scale = TRUE)

# need to match correct column key(s) in `expt_smry.xlsx`
prnPCA(
  col_color = "column_key_not_existed",
  col_shape = "another_missing_column_key"
)  

## End(Not run)
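
The ambiguity flagged for `prnPCA(scale = TRUE)` follows from R's partial argument matching: an abbreviated name must identify exactly one formal argument. A minimal sketch with a toy function illustrates this; `toy_pca` is hypothetical and not part of proteoQ, and the package may report the problem with its own message.

```r
# Toy stand-in with two formals sharing the prefix "scale"
# (hypothetical function, for illustration only).
toy_pca <- function(scale_log2r = TRUE, scale_features = TRUE, ...) {
  c(scale_log2r = scale_log2r, scale_features = scale_features)
}

# Spelling the argument name out in full is unambiguous.
ok <- toy_pca(scale_features = FALSE)
print(ok)

# The abbreviation `scale` partially matches both formals,
# so R stops with an error during argument matching.
msg <- tryCatch(
  toy_pca(scale = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Writing `scale_log2r` or `scale_features` in full avoids the problem.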

qzhang503/proteoQ documentation built on April 13, 2025, 8:31 a.m.
