plot_hierarchical_clustering: cluster the data matrix to visually inspect which confounder...
In proBatch: Tools for Diagnostics and Corrections of Batch Effects in Proteomics

View source: R/proteome_wide_diagnostics.R

Cluster the data matrix to visually inspect which confounder dominates.

plot_hierarchical_clustering(
  data_matrix,
  sample_annotation,
  sample_id_col = "FullRunName",
  color_list = NULL,
  factors_to_plot = NULL,
  fill_the_missing = 0,
  distance = "euclidean",
  agglomeration = "complete",
  label_samples = TRUE,
  label_font = 0.2,
  filename = NULL,
  width = 38,
  height = 25,
  units = c("cm", "in", "mm"),
  plot_title = NULL,
  ...
)

`data_matrix`	features (in rows) vs samples (in columns) matrix, with feature IDs in rownames and file/sample names as colnames. See "example_proteome_matrix" for more details (to call the description, use `help("example_proteome_matrix")`)
`sample_annotation`	data frame with: `sample_id_col` (this can be repeated as row names); biological covariates; technical covariates (batches etc.). See `help("example_sample_annotation")`
`sample_id_col`	name of the column in the `sample_annotation` table where the filenames (colnames of the `data_matrix`) are found.
`color_list`	list, as returned by `sample_annotation_to_colors`, where each item contains a color vector for each factor to be mapped to the color.
`factors_to_plot`	vector of technical and biological covariates to be plotted in this diagnostic plot (assumed to be present in `sample_annotation`)
`fill_the_missing`	numeric value determining how missing values should be substituted. If `NULL`, features with missing values are excluded.
`distance`	distance metric used for clustering
`agglomeration`	agglomeration method, as used by `hclust`
`label_samples`	if `TRUE`, sample IDs (column names of `data_matrix`) will be printed
`label_font`	font size of the sample labels. Used only if `label_samples` is `TRUE`, ignored otherwise
`filename`	path where the results are saved. If `NULL`, the object is returned to the active window; otherwise, the object is saved to the file. Currently only the pdf and png formats are supported
`width`	option determining the output image width
`height`	option determining the output image height
`units`	units: 'cm', 'in' or 'mm'
`plot_title`	title of the plot (e.g., processing step + representation level (fragments, transitions, proteins) + purpose (meanplot/corrplot etc))
`...`	other parameters of `plotDendroAndColors` from `WGCNA` package
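
To make the roles of `fill_the_missing`, `distance` and `agglomeration` concrete, the following is a minimal base-R sketch of the kind of computation these arguments describe. It is an illustration under assumptions, not the package's implementation: `toy_matrix` is hypothetical data, and the sketch assumes that samples (columns) are clustered by transposing the matrix before calling `dist` and `hclust`.

```r
# Toy feature-by-sample matrix (hypothetical data, not part of proBatch)
set.seed(1)
toy_matrix <- matrix(
  rnorm(40), nrow = 10,
  dimnames = list(paste0("feature_", 1:10), paste0("sample_", 1:4))
)
toy_matrix[2, 3] <- NA  # introduce one missing value

# Substitute missing values with a constant (cf. fill_the_missing = 0)
filled <- toy_matrix
filled[is.na(filled)] <- 0

# Alternative (cf. fill_the_missing = NULL): drop features with missing values
complete_only <- toy_matrix[stats::complete.cases(toy_matrix), , drop = FALSE]

# Samples are in columns, so transpose before computing pairwise distances
sample_dist <- dist(t(filled), method = "euclidean")

# Agglomerative clustering of samples (cf. agglomeration = "complete")
sample_tree <- hclust(sample_dist, method = "complete")

# Draw the dendrogram; cex scales the label size
plot(sample_tree, cex = 0.8)
```

Any method accepted by `dist` or `hclust` can be substituted in the sketch to see how the dendrogram changes.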

No return

hclust, sample_annotation_to_colors, plotDendroAndColors

library(proBatch)

# Optionally restrict the matrix to samples from two batches
# (test_matrix is not used in the calls below)
selected_batches <- example_sample_annotation$MS_batch %in%
  c("Batch_1", "Batch_2")
selected_samples <- example_sample_annotation$FullRunName[selected_batches]
test_matrix <- example_proteome_matrix[, selected_samples]

# With the default color scheme:
hierarchical_clustering_plot <- plot_hierarchical_clustering(
  example_proteome_matrix, example_sample_annotation,
  factors_to_plot = c("MS_batch", "Diet", "DateTime"),
  color_list = NULL,
  distance = "euclidean", agglomeration = "complete",
  label_samples = FALSE
)

# With a defined color scheme:
color_list <- sample_annotation_to_colors(
  example_sample_annotation,
  factor_columns = c("MS_batch", "Strain", "Diet", "digestion_batch"),
  numeric_columns = c("DateTime", "order")
)
hierarchical_clustering_plot <- plot_hierarchical_clustering(
  example_proteome_matrix, example_sample_annotation,
  factors_to_plot = c("MS_batch", "Strain", "DateTime", "digestion_batch"),
  color_list = color_list,
  distance = "euclidean", agglomeration = "complete",
  label_samples = FALSE
)
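
The `filename`, `width`, `height` and `units` arguments are not exercised in the examples above. A hedged sketch of saving the plot to a file might look as follows; the output path `out_file` is a hypothetical name chosen for illustration, and the call is guarded so that it only runs when proBatch is installed.

```r
# Hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "hierarchical_clustering.pdf")

if (requireNamespace("proBatch", quietly = TRUE)) {
  library(proBatch)
  # Save the dendrogram to a PDF instead of drawing it in the active window
  plot_hierarchical_clustering(
    example_proteome_matrix, example_sample_annotation,
    factors_to_plot = c("MS_batch", "Diet"),
    label_samples = FALSE,
    filename = out_file,
    width = 38, height = 25, units = "cm"
  )
}
```

Per the argument description, only pdf and png output is currently supported, so the file extension should be one of these.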