anal_prnTrend: Trend analysis of protein data
In qzhang503/proteoQ: Processing and Informatic Analysis of Mass Spectrometric Data

anal_prnTrend

R Documentation

Trend analysis of protein data

Description

anal_prnTrend performs unsupervised clustering of protein log2FC, grouping proteins with similar trends across samples.
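To illustrate what clustering "trends" means, the sketch below groups the rows (proteins) of a simulated log2FC matrix by their profiles across samples. It uses base R's `stats::kmeans` purely for illustration. The data, the sample names and the choice of k-means are all assumptions. `anal_prnTrend` itself defaults to fuzzy c-means and reads its input from the primary data files.

```r
# Minimal sketch: cluster proteins (rows) by their log2FC profiles across samples.
# Simulated data and k-means are assumptions; this is not proteoQ's implementation.
set.seed(1)

n_prot <- 60
samples <- paste0("S", 1:6)

# Three hypothetical trend shapes across six samples
shapes <- rbind(
  rising  = seq(-1.5, 1.5, length.out = 6),
  falling = seq(1.5, -1.5, length.out = 6),
  flat    = rep(0, 6)
)

# Each protein follows one shape plus a little noise
true_shape <- rep(1:3, each = n_prot / 3)
log2fc <- shapes[true_shape, ] + matrix(rnorm(n_prot * 6, sd = 0.2), nrow = n_prot)
dimnames(log2fc) <- list(paste0("prot", 1:n_prot), samples)

# k-means on rows: proteins with similar profiles end up in the same cluster
fit <- kmeans(log2fc, centers = 3, nstart = 10)

# Each cluster center is the average trend of its member proteins
round(fit$centers, 2)
table(fit$cluster)
```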

Usage

anal_prnTrend(
  col_select = NULL,
  col_group = NULL,
  col_order = NULL,
  choice = c("cmeans", "clara", "kmeans", "pam", "fanny"),
  n_clust = NULL,
  scale_log2r = TRUE,
  complete_cases = FALSE,
  impute_na = FALSE,
  df = NULL,
  filepath = NULL,
  filename = NULL,
  ...
)
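The `choice` vector in the usage line follows a common R idiom in which the first element serves as the default. The sketch below demonstrates that idiom with base R's `match.arg`. `pick_method` is a hypothetical stand-in, and we assume, without having checked the package source, that `anal_prnTrend` resolves `choice` in a similar way.

```r
# Hypothetical helper showing how a character-vector default is resolved.
pick_method <- function(choice = c("cmeans", "clara", "kmeans", "pam", "fanny")) {
  # With the untouched default vector, match.arg returns its first element;
  # otherwise it validates the value against the allowed set (partial matching allowed).
  match.arg(choice)
}

pick_method()        # "cmeans"
pick_method("pam")   # "pam"
pick_method("km")    # "kmeans" via partial matching
```

A value outside the allowed set, such as `"hclust"`, raises an error rather than silently falling back to the default.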

Arguments

`col_select`	Character string naming a column key in `expt_smry.xlsx`. At the `NULL` default, the column key `Select` is used; if no samples are specified under `Select`, the column key `Sample_ID` is used instead. Samples with non-empty entries under the chosen column are included in the indicated analysis.
`col_group`	Character string naming a column key in `expt_smry.xlsx`. Samples with non-empty entries under `col_group` are grouped accordingly in the indicated analysis. At the NULL default, the column key `Group` is used. No annotation by groups is performed if the fields under the indicated group column are empty.
`col_order`	Character string naming a column key in `expt_smry.xlsx`. Numeric values under this column determine the left-to-right arrangement of samples in graphic outputs and the top-to-bottom arrangement in text outputs. At the NULL default, the column key `Order` is used. If values under column `Order` are left blank, samples are ordered by their names.
`choice`	Character string; the clustering method, one of `c("cmeans", "clara", "kmeans", "pam", "fanny")`. The default is "cmeans".
`n_clust`	Numeric vector; the number(s) of clusters into which the data are divided. At the NULL default, the number is determined by the gap statistic in `clusGap`. `n_clust` overrides the argument `centers` in `cmeans`.
`scale_log2r`	Logical; if TRUE, scales `log2FC` to the same standard deviation across all samples. The default is TRUE. At `scale_log2r = NA`, the raw `log2FC` without normalization is used.
`complete_cases`	Logical; if TRUE, only cases that are complete with no missing values will be used. The default is FALSE.
`impute_na`	Logical; if TRUE, data with the imputation of missing values will be used. The default is FALSE.
`df`	The name of a primary data file. By default, it is determined automatically by matching the types of data and analysis with an `id` among `c("pep_seq", "pep_seq_mod", "prot_acc", "gene")`. A primary file contains normalized peptide or protein data and is one of `c("Peptide.txt", "Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`. For analyses that require significance p-values, `df` will be one of `c("Peptide_pVal.txt", "Peptide_impNA_pVal.txt", "Protein_pVal.txt", "protein_impNA_pVal.txt")`.
`filepath`	Use system default.
`filename`	A representative file name for outputs. By default, it is determined automatically from the name of the current call.
`...`	`filter_`: Variable argument statements for row filtration against data in the primary file linked to `df`; see `normPSM` for the format of `filter_` statements. `arrange_`: Variable argument statements for row ordering against data in the primary file linked to `df`; see `prnHM` for the format of `arrange_` statements. Additional arguments are passed to `cmeans`, `kmeans`, `clara` or `pam`. Note that `centers` in `cmeans` or `kmeans`, and `k` in `clara` or `pam`, are replaced by `n_clust`. With `cmeans`, `m` follows Schwaemmle and Jensen if not provided; `x` is disabled because the input data are determined automatically.
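The fallbacks described for `col_select` and `col_order` can be sketched as follows. `expt_smry` is a hypothetical data frame standing in for the sheet in `expt_smry.xlsx`, and `resolve_samples` is a hypothetical helper rather than a proteoQ function. The sketch also assumes that the `Order` column is either fully filled in or fully blank.

```r
# Hypothetical stand-in for the metadata sheet in expt_smry.xlsx
expt_smry <- data.frame(
  Sample_ID = c("W2.BI.TMT1", "W2.BI.TMT2", "W16.BI.TMT1", "W2.JHU.TMT1"),
  Select    = NA,
  Order     = c(2, 1, 4, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which samples to use, and in what order
resolve_samples <- function(smry, col_select = "Select", col_order = "Order") {
  sel <- smry[[col_select]]
  # Fall back to Sample_ID when no samples are specified under col_select
  if (is.null(sel) || all(is.na(sel) | sel == "")) sel <- smry[["Sample_ID"]]
  keep <- !is.na(sel) & sel != ""
  ids <- smry$Sample_ID[keep]

  ord <- smry[[col_order]]
  # Use the numeric order column when filled in, otherwise sort by sample name
  if (is.null(ord) || all(is.na(ord[keep]))) {
    sort(ids, method = "radix")
  } else {
    ids[order(ord[keep])]
  }
}

resolve_samples(expt_smry)
```

Here `Select` is empty, so all four samples are used and arranged by `Order`. Filling `Select` for only the BI samples would restrict the result to those two.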
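When `n_clust = NULL`, the number of clusters comes from the gap statistic. The sketch below applies `cluster::clusGap` with `kmeans` to a simulated matrix and selects k with `cluster::maxSE`. The simulated data, `K.max`, `B` and the "firstSEmax" rule are illustrative assumptions, not necessarily the settings proteoQ uses.

```r
# Sketch: choose the number of clusters by the gap statistic.
# Settings (K.max, B, selection rule) are assumptions for illustration.
library(cluster)

set.seed(42)
# Hypothetical log2FC matrix: three well-separated groups of 20 proteins, 4 samples
x <- rbind(
  matrix(rnorm(80, mean = -2, sd = 0.3), ncol = 4),
  matrix(rnorm(80, mean =  0, sd = 0.3), ncol = 4),
  matrix(rnorm(80, mean =  2, sd = 0.3), ncol = 4)
)

# nstart is passed through to kmeans; verbose = FALSE silences progress output
gap <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 50,
               verbose = FALSE, nstart = 10)

# Select k with the "firstSEmax" rule (maxSE's default method)
tab <- gap$Tab
k_best <- maxSE(tab[, "gap"], tab[, "SE.sim"], method = "firstSEmax")
k_best
```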
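The `filter_` statements are quoted row conditions evaluated against the columns of the primary data. In the sketch below, base R's `quote()` stands in for `exprs()`, and `apply_row_filters` is a hypothetical helper rather than proteoQ's implementation. The choice to drop rows where a condition evaluates to NA is an assumption modeled on `dplyr::filter`.

```r
# Hypothetical helper: apply named, quoted row conditions one after another.
apply_row_filters <- function(df, filters) {
  for (nm in names(filters)) {
    # Evaluate the condition with the data frame's columns in scope
    keep <- eval(filters[[nm]], envir = df, enclos = parent.frame())
    # Rows where the condition is NA are dropped (assumption, as in dplyr::filter)
    df <- df[!is.na(keep) & keep, , drop = FALSE]
  }
  df
}

prot <- data.frame(
  prot_acc   = c("P1", "P2", "P3", "P4"),
  prot_n_pep = c(1, 2, 5, 3),
  `W16_vs_W2.pVal (W16-W2)` = c(1e-8, 0.2, 1e-7, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

filters <- list(
  filter_prots_by_npep = quote(prot_n_pep >= 2),
  filter_prots_by_pval = quote(`W16_vs_W2.pVal (W16-W2)` <= 1e-6)
)

apply_row_filters(prot, filters)
```

Only `P3` passes both conditions: `P1` has too few peptides, `P2` fails the p-value cutoff, and `P4` has a missing p-value.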
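When `m` is not supplied to `cmeans`, the documentation above says it follows Schwaemmle and Jensen. The sketch below implements the estimate as we understand it from `Mfuzz::mestimate`, with N as the number of proteins (rows) and D as the number of samples (columns). `estimate_m` is a hypothetical helper, and we have not confirmed that proteoQ uses exactly this formula.

```r
# Sketch of the Schwaemmle and Jensen estimate of the fuzzifier m,
# written from the formula used by Mfuzz::mestimate (N rows, D columns).
# Hypothetical helper; not taken from proteoQ.
estimate_m <- function(N, D) {
  1 + (1418 / N + 22.05) * D^(-2) +
    (12.33 / N + 0.243) * D^(-0.0406 * log(N) - 0.1134)
}

estimate_m(N = 2000, D = 10)   # about 1.32
estimate_m(N = 2000, D = 50)   # closer to 1 with more samples
```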

Details

The option `complete_cases` is forced to TRUE at `impute_na = FALSE`.
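The interplay of `impute_na`, `complete_cases` and `scale_log2r` can be sketched as a preprocessing step. `prep_log2fc` is a hypothetical helper. Its scaling of every sample to unit standard deviation is an assumption; the package may scale to a different common spread. Because this sketch receives already-normalized values, it cannot reproduce the `scale_log2r = NA` case, in which the package switches to raw, unnormalized log2FC.

```r
# Hypothetical helper; not proteoQ's code.
prep_log2fc <- function(x, scale_log2r = TRUE, complete_cases = FALSE,
                        impute_na = FALSE) {
  # Without imputation, rows with missing values are always dropped
  if (!impute_na) complete_cases <- TRUE
  if (complete_cases) x <- x[complete.cases(x), , drop = FALSE]

  # scale_log2r = TRUE: give every sample (column) the same SD, here unit SD
  # (assumption); FALSE or NA leave the values as supplied
  if (isTRUE(scale_log2r)) {
    sds <- apply(x, 2, sd, na.rm = TRUE)
    x <- sweep(x, 2, sds, "/")
  }
  x
}

x <- matrix(c(1, 2, NA, 4,  2, 4, 6, 8,  0.5, 1, 1.5, 2), ncol = 3,
            dimnames = list(paste0("prot", 1:4), c("S1", "S2", "S3")))

out <- prep_log2fc(x)        # prot3 is dropped; each column now has SD 1
apply(out, 2, sd)
```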

Value

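Classified log2FC: the log2FC data with cluster assignments (see the `cluster` column in `Protein_Trend_[...].txt` in the examples).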
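The returned value can be sketched as one classified table per entry in `n_clust`, with each table carrying a `cluster` column. The file names imitate those in the examples, such as `Protein_Trend_Z_nclust5.txt`. The naming pattern, the use of `kmeans` and the simulated data are assumptions, and what `Z` encodes is not stated here.

```r
# Sketch: one classified table per value in n_clust, each with a `cluster` column.
# File-name pattern inferred from the examples (assumption).
set.seed(7)
log2fc <- matrix(rnorm(40 * 4), nrow = 40,
                 dimnames = list(paste0("prot", 1:40), paste0("S", 1:4)))

n_clust <- 5:6
out <- lapply(n_clust, function(k) {
  fit <- kmeans(log2fc, centers = k, nstart = 5)
  data.frame(prot_acc = rownames(log2fc), log2fc, cluster = fit$cluster,
             row.names = NULL)
})
names(out) <- paste0("Protein_Trend_Z_nclust", n_clust, ".txt")

# Writing is left commented out to avoid touching the working directory:
# for (f in names(out)) write.table(out[[f]], file.path(tempdir(), f), sep = "\t", row.names = FALSE)
sapply(out, function(d) max(d$cluster))
```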

Examples


# ===================================
# Trend analysis
# ===================================

## !!! requires the brief working example in `?load_expts`

## global option
scale_log2r <- TRUE


# ===================================
# Analysis
# ===================================
## base (proteins, with sample order supervision)
anal_prnTrend(
  impute_na = FALSE,
  col_order = Order,
  n_clust = c(5:6), 
)

## against selected samples
anal_prnTrend(
  col_select = BI,
  impute_na = FALSE,
  col_order = Order,
  n_clust = c(5:6), 
  filename = sel.txt,
)

## row filtration (proteins)
anal_prnTrend(
  impute_na = FALSE,
  col_order = Order,
  n_clust = c(5:6), 
  filter_prots_by = exprs(prot_n_pep >= 2),
)

## manual m degree of fuzziness (proteins)
anal_prnTrend(
  impute_na = FALSE,
  col_order = Order,
  n_clust = c(5:6), 
  filter_prots = exprs(prot_n_pep >= 2),
  m = 1.5,
  filename = my_m.txt,
)

## additional row filtration by pVals (proteins, impute_na = FALSE)
# if not yet done, run the prerequisite significance tests at `impute_na = FALSE`
pepSig(
  impute_na = FALSE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = FALSE)

# (`W16_vs_W2.pVal (W16-W2)` now a column key)
anal_prnTrend(
  impute_na = FALSE,
  col_order = Order,
  n_clust = c(5:6), 
  filter_prots_by_npep = exprs(prot_n_pep >= 3), 
  filter_prots_by_pval = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
)

## additional row filtration by pVals (impute_na = TRUE)
# if not yet done, run the prerequisite NA imputation and corresponding
# significance tests at `impute_na = TRUE`
pepImp(m = 2, maxit = 2)
prnImp(m = 5, maxit = 5)

pepSig(
  impute_na = TRUE, 
  W2_bat = ~ Term["(W2.BI.TMT2-W2.BI.TMT1)", 
                  "(W2.JHU.TMT2-W2.JHU.TMT1)", 
                  "(W2.PNNL.TMT2-W2.PNNL.TMT1)"],
  W2_loc = ~ Term_2["W2.BI-W2.JHU", 
                    "W2.BI-W2.PNNL", 
                    "W2.JHU-W2.PNNL"],
  W16_vs_W2 = ~ Term_3["W16-W2"], 
)

prnSig(impute_na = TRUE)

anal_prnTrend(
  impute_na = TRUE,
  col_order = Order,
  n_clust = c(5:6), 
  filter_prots_by_npep = exprs(prot_n_pep >= 3), 
  filter_prots_by_pval = exprs(`W16_vs_W2.pVal (W16-W2)` <= 1e-6), 
)


# ===================================
# Visualization
# ===================================
## base (proteins, no NA imputation) 
plot_prnTrend(
  col_order = Order, 
)

# at specific cluster ID(s)
# (`cluster` is a column key in `Protein_Trend_[...].txt`)
plot_prnTrend(
  impute_na = FALSE, 
  col_order = Order,
  filter2_by_clusters = exprs(cluster == 5),
  width = 8, 
  height = 10,
  filename = cl5.png,
)

# manual selection of secondary input data file(s)
# may be used for optimizing individual plots
plot_prnTrend(
  df2 = c("Protein_Trend_Z_nclust5.txt"),
  col_order = Order, 
  filename = n5.png,
)

# manual secondary input(s) at specific cluster(s)
plot_prnTrend(
  df2 = c("Protein_Trend_Z_nclust5.txt"),
  impute_na = FALSE, 
  col_order = Order,
  filter2_by_clusters = exprs(cluster == 5),
  width = 8, 
  height = 10,
  filename = n5_cl5.png,
)

## NA imputation
# also save as pdf
plot_prnTrend(
  impute_na = TRUE,
  col_order = Order,
  filename = my.pdf,
)

## against selected samples
plot_prnTrend(
  col_order = Order, 
  col_select = BI,
  filename = bi.png,
)

## custom theme
library(ggplot2)
my_trend_theme <- theme_bw() + theme(
  axis.text.x  = element_text(angle=60, vjust=0.5, size=24),
  axis.ticks.x  = element_blank(), 
  axis.text.y  = element_text(angle=0, vjust=0.5, size=24),
  axis.title.x = element_text(colour="black", size=24),
  axis.title.y = element_text(colour="black", size=24),
  plot.title = element_text(face="bold", colour="black",
                            size=20, hjust=.5, vjust=.5),
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),
  panel.grid.major.y = element_blank(),
  panel.grid.minor.y = element_blank(),
  panel.background = element_rect(fill = '#0868ac', colour = 'red'),
  
  strip.text.x = element_text(size = 24, colour = "black", angle = 0),
  strip.text.y = element_text(size = 24, colour = "black", angle = 90),
  
  plot.margin = unit(c(5.5, 55, 5.5, 5.5), "points"), 
  
  legend.key = element_rect(colour = NA, fill = 'transparent'),
  legend.background = element_rect(colour = NA,  fill = "transparent"),
  legend.position = "none",
  legend.title = element_text(colour="black", size=18),
  legend.text = element_text(colour="black", size=18),
  legend.text.align = 0,
  legend.box = NULL
)

plot_prnTrend(
  col_order = Order, 
  col_select = BI,
  theme = my_trend_theme,
  filename = my_theme.png,
)

## no grouping 
# each sample under column `Select` forms its own group
anal_prnTrend(
  col_group = Select,
  col_order = Order,
  n_clust = 6, 
  filter_prots = exprs(prot_n_pep >= 2),
  filename = sample_ids_as_groups.txt,
)

plot_prnTrend(
  df2 = "sample_ids_as_groups_Protein_Trend_Z_nclust6.txt",
  filter2_by_clusters = exprs(cluster == 4),
  width = 24,
  height = 16,
)

## grouped by column `Term_2` in metadata
anal_prnTrend(
  col_group = Term_2,
  col_order = Order,
  n_clust = 6, 
  filter_prots = exprs(prot_n_pep >= 2),
  filename = term_2_grouping.txt,
)

plot_prnTrend(
  df2 = "term_2_grouping_Protein_Trend_Z_nclust6.txt",
  filter2_by_clusters = exprs(cluster == 3),
  width = 6,
  height = 6,
)

## Cytoscape visualization
# (Make sure that Cytoscape is open.)
# Human
cluego(
  df2 = "Protein_Trend_Z_nclust5.txt", 
  species = c(human = "Homo Sapiens"), 
  n_clust = c(3, 5)
)

# Mouse
cluego(
  df2 = "Protein_Trend_Z_nclust5.txt", 
  species = c(mouse = "Mus Musculus"), 
  n_clust = c(3:4)
)

qzhang503/proteoQ documentation built on April 13, 2025, 8:31 a.m.