plot_PVCA: Plot variance distribution by variable

View source: R/proteome_wide_diagnostics.R

plot_PVCA    R Documentation

Description

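Runs a principal variance component analysis (PVCA) on data_matrix and plots, as bars, the proportion of variance explained by each technical and biological factor (and their interactions), with the unexplained variance shown as residual.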
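PVCA estimates, for each selected principal component, how much of that component's variance each factor explains, and then averages these estimates, weighting each component by the share of variance it explains. The base-R sketch below illustrates only that final weighting step, plus the lumping of small effects that variance_threshold controls. The per-component proportions and eigenvalues are made up, the helper names (pvca_weighted_average, lump_small_effects) and the "below_threshold" label are hypothetical, and this is not proBatch's internal code.

```r
# Hypothetical per-PC variance-component proportions: rows are the selected
# PCs, columns are factors plus the residual; each row sums to 1. In real
# PVCA these come from fitting a random-effects model to each PC's scores.
vc_props <- rbind(
  PC1 = c(MS_batch = 0.60, Diet = 0.20, Sex = 0.008, resid = 0.192),
  PC2 = c(MS_batch = 0.10, Diet = 0.50, Sex = 0.000, resid = 0.400)
)
pc_var <- c(PC1 = 3, PC2 = 1)  # hypothetical eigenvalues of the selected PCs

# Weighted average across PCs, using each PC's share of the selected variance
pvca_weighted_average <- function(vc_props, pc_var) {
  weights <- pc_var / sum(pc_var)
  colSums(vc_props * weights)  # row i of vc_props is scaled by weights[i]
}

# Keep effects that reach variance_threshold; sum the rest into one bar
lump_small_effects <- function(props, variance_threshold = 0.01,
                               lumped_label = "below_threshold") {
  keep <- props >= variance_threshold
  out <- props[keep]
  if (any(!keep)) {
    out <- c(out, stats::setNames(sum(props[!keep]), lumped_label))
  }
  out
}

overall <- pvca_weighted_average(vc_props, pc_var)
bars <- lump_small_effects(overall, variance_threshold = 0.01)
```

With these toy numbers PC1 receives weight 0.75 and PC2 weight 0.25, so Sex ends up at 0.006 overall and is folded into the lumped bar, while the remaining proportions still sum to 1.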

Usage

plot_PVCA(data_matrix, sample_annotation,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName",
  technical_factors = c("MS_batch", "instrument"),
  biological_factors = c("cell_line", "drug_dose"),
  fill_the_missing = -1, pca_threshold = 0.6,
  variance_threshold = 0.01, colors_for_bars = NULL, filename = NULL,
  width = NA, height = NA, units = c("cm", "in", "mm"),
  plot_title = NULL, theme = "classic", base_size = 20)

Arguments

data_matrix

matrix of features (rows) by samples (columns), with feature IDs as row names and file/sample names as column names. For details, see help("example_proteome_matrix").

sample_annotation

data frame with:

  1. sample_id_col (this can be repeated as row names)

  2. biological covariates

  3. technical covariates (batches, etc.)

See help("example_sample_annotation").

feature_id_col

name of the column with the feature/gene/peptide/protein ID used in the long-format representation df_long. In the wide-format representation data_matrix, this corresponds to the row names.

sample_id_col

name of the column in sample_annotation that contains the file names (i.e., the column names of data_matrix).

technical_factors

vector of sample_annotation column names that are technical covariates

biological_factors

vector of sample_annotation column names that are biologically meaningful covariates

fill_the_missing

numeric value used to replace missing values. If NULL, features with missing values are excluded.

pca_threshold

minimum fraction of the total variance (default 0.6, i.e., 60%) that the selected principal components must jointly explain

variance_threshold

minimum fraction of the variance (default 0.01, i.e., 1%) that a covariate must explain to be shown as its own bar; covariates below this threshold are lumped together

colors_for_bars

four-item color vector, specifying colors for the following categories: c('residual', 'biological', 'biol:techn', 'technical')

filename

path of the file where the plot is saved. If NULL, the plot is displayed in the active graphics window; otherwise, it is saved to the file. Currently only PDF and PNG formats are supported.

width

width of the output image, in the units given by units

height

height of the output image, in the units given by units

units

units in which width and height are given: 'cm', 'in' or 'mm'

plot_title

title of the plot (e.g., processing step + representation level (fragments, transitions, proteins) + purpose (meanplot/corrplot, etc.))

theme

ggplot theme; "classic" by default. Can be easily overridden.
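Two of the preprocessing options above, fill_the_missing and pca_threshold, can be illustrated with a short base-R sketch on a hypothetical toy matrix. It assumes NA marks a missing value and uses stats::prcomp for the PCA; the helper names handle_missing and select_pcs are hypothetical, and proBatch's own implementation may differ in details such as centering, scaling, or how the threshold is compared. Treat it as an illustration of the documented behaviour, not as the package's code.

```r
# Hypothetical toy data: 100 features (rows) x 10 samples (columns), with a
# few values set to NA to mimic missing measurements.
set.seed(42)
toy_matrix <- matrix(rnorm(100 * 10, mean = 20), nrow = 100,
                     dimnames = list(paste0("pep_", 1:100),
                                     paste0("run_", 1:10)))
toy_matrix[cbind(c(3, 17, 58), c(2, 5, 9))] <- NA

# fill_the_missing: replace NAs with a number, or drop incomplete features
handle_missing <- function(mat, fill_the_missing = -1) {
  if (is.null(fill_the_missing)) {
    return(mat[stats::complete.cases(mat), , drop = FALSE])
  }
  mat[is.na(mat)] <- fill_the_missing
  mat
}

# pca_threshold: smallest number of PCs whose cumulative share of the total
# variance reaches the threshold (samples are treated as observations)
select_pcs <- function(mat, pca_threshold = 0.6) {
  pca <- stats::prcomp(t(mat), center = TRUE)
  explained <- pca$sdev^2 / sum(pca$sdev^2)
  n_pcs <- which(cumsum(explained) >= pca_threshold)[1]
  list(n_pcs = n_pcs, explained = explained[seq_len(n_pcs)])
}

filled  <- handle_missing(toy_matrix, fill_the_missing = -1)
dropped <- handle_missing(toy_matrix, fill_the_missing = NULL)
pcs     <- select_pcs(filled, pca_threshold = 0.6)
```

Substituting a value keeps all 100 features, while passing NULL drops the three features that contain an NA; the PCA is then run on the filled matrix.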

Value

ggplot object with the plot

See Also

sample_annotation_to_colors, ggplot

Examples

library(proBatch)
data(list = c("example_proteome_matrix", "example_sample_annotation"),
     package = "proBatch")
matrix_test <- example_proteome_matrix[1:150, ]
pvca_plot <- plot_PVCA(matrix_test, example_sample_annotation,
                       technical_factors = c("MS_batch", "digestion_batch"),
                       biological_factors = c("Diet", "Sex", "Strain"))

## Not run: 
# Save the plot as a 28 x 22 cm PNG file
pvca_plot <- plot_PVCA(matrix_test, example_sample_annotation,
                       technical_factors = c("MS_batch", "digestion_batch"),
                       biological_factors = c("Diet", "Sex", "Strain"),
                       filename = "test_PVCA.png", width = 28, height = 22,
                       units = "cm")

## End(Not run)


symbioticMe/proBatch documentation built on April 9, 2023, 11:59 a.m.