print(params)
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, fig.path = params$output_figure)
library(dplyr)
library(stringr)
library(MassExpression)
library(plotly)

CompleteIntensityExperiment <- params$listInt$CompleteIntensityExperiment
IntensityExperiment <- params$listInt$IntensityExperiment
design <- colData(IntensityExperiment)
comparisonExperiments <- listComparisonExperiments(CompleteIntensityExperiment)
This QC report is designed to help scientists quickly assess several aspects of experiment quality. There are 4 categories in the QC report:
\clearpage
design <- as_tibble(design)
design$SampleName <- make.names(design$SampleName)
design <- design %>%
  tidyr::unite(SampleNameInPlots, Condition, Replicate, sep = "_", remove = FALSE) %>%
  dplyr::select(-Replicate, everything())
design <- design %>%
  dplyr::select(SampleName, Condition, SampleNameInPlots, everything())
knitr::kable(design, row.names = FALSE)
\clearpage
The table below reports the number of proteins considered for each pairwise comparison analysis (Number of Proteins column) and the number of differentially expressed (DE) proteins detected in each comparison (Number DE Proteins column). Proteins with more than 50% missing values across samples are removed before each comparison. A protein is defined as DE if its adjusted p-value is less than 0.05. No threshold is applied on the log ratio to define a protein as DE.
The overall total number of proteins included in the experiment is `r nrow(rowData(CompleteIntensityExperiment))`.
# Count tested proteins and DE proteins (adjusted p-value < 0.05) for one comparison
stats_one_comp <- function(se) {
  stats <- as_tibble(rowData(se))
  total_proteins_in_experiment <- nrow(stats)
  total_proteins_de <- sum(stats$ADJ.PVAL < 0.05)
  list(total_proteins_in_experiment = total_proteins_in_experiment,
       total_proteins_de = total_proteins_de)
}

n_proteins <- sapply(seq_along(comparisonExperiments),
                     function(exp) stats_one_comp(comparisonExperiments[[exp]]))
names_experiments <- names(comparisonExperiments)
colnames(n_proteins) <- names_experiments

n_proteins <- data.frame(t(n_proteins))
n_proteins$Comparison <- rownames(n_proteins)
rownames(n_proteins) <- NULL

n_proteins <- n_proteins %>%
  rename(`Number of Proteins` = total_proteins_in_experiment,
         `Number DE Proteins` = total_proteins_de) %>%
  select(Comparison, `Number of Proteins`, `Number DE Proteins`)

knitr::kable(n_proteins, row.names = FALSE,
             caption = "Summary of differential expression results across comparisons.")
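The two rules described above (removing proteins with more than 50% missing values, then calling a protein DE when its adjusted p-value is below 0.05) can be illustrated with a minimal base R sketch. The toy matrix, the raw p-values and the choice of the Benjamini-Hochberg adjustment are assumptions made for illustration; the report does not state which adjustment method is used, and this is not the package's own implementation.

```r
# Toy data: 4 proteins x 4 samples; NA marks a missing value.
# All values and names here are illustrative assumptions.
intensities <- matrix(
  c(21, 22, NA, 20,
    NA, NA, NA, 19,
    18, 19, 20, 21,
    NA, 23, 22, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("protein", 1:4), paste0("sample", 1:4))
)

# Remove proteins with more than 50% missing values across samples
missing_fraction <- rowMeans(is.na(intensities))
kept <- intensities[missing_fraction <= 0.5, , drop = FALSE]

# Hypothetical raw p-values for the retained proteins, adjusted with
# Benjamini-Hochberg (an assumption; the method is not named in the report)
raw_p <- setNames(c(0.001, 0.04, 0.5), rownames(kept))
adj_p <- p.adjust(raw_p, method = "BH")

# A protein is DE if its adjusted p-value is below 0.05
n_de <- sum(adj_p < 0.05)
```

In this toy example the second retained protein has a raw p-value below 0.05 but is not counted as DE, because the threshold is applied to the adjusted value.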
The Principal Component Analysis (PCA) plot is used to visualise differences between samples that are induced by their intensity profiles. PCA transforms high-dimensional data, such as the intensities of thousands of measured proteins or peptides, into a reduced set of dimensions. The first two dimensions explain the greatest variability between the samples, and they are a useful visual tool to confirm known clustering of the samples or to identify potential problems in the data.

This section displays two PCA plots:

For a healthy experiment we expect:

If unexpected clusters occur or replicates do not cluster together, this can be due to extra variability introduced by factors such as technical processing, unexplored biological differences or sample swaps. The interpretation of the differential expression results, and the trust placed in them, should take these considerations into account. If you think that the samples in the experiment show largely unexpected patterns, it is advisable to request support from an analyst.

A scree plot shows the amount of variance explained by each dimension extracted by PCA. A high degree of variance in the first few dimensions may suggest large differences between your samples.

Principal Component Analysis details
## PCA
p <- plot_chosen_pca_experiment(CompleteIntensityExperiment, format = params$format)
p[[1]]
p[[2]]
p <- plot_chosen_pca_experiment(CompleteIntensityExperiment, format = params$format,
                                auto_select_features = "de")
if (is.null(p[[1]])) {
  text <- "Not enough differentially expressed proteins to produce a PCA plot."
}
`r if(is.null(p[[1]])) print(text)`
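To make the PCA and scree plot descriptions more concrete, the following is a minimal sketch using base R's `prcomp` on simulated data. The toy matrix and all variable names are assumptions for illustration, and this is not necessarily how `plot_chosen_pca_experiment` computes its results.

```r
# Toy log2 intensity matrix: 6 samples (rows) x 50 proteins (columns).
# The data and names are simulated for illustration only.
set.seed(1)
condition <- rep(c("A", "B"), each = 3)
intensities <- matrix(rnorm(6 * 50, mean = 20), nrow = 6,
                      dimnames = list(paste(condition, 1:3, sep = "_"), NULL))

# Add a condition effect to the first 10 proteins so the two groups separate
intensities[condition == "B", 1:10] <- intensities[condition == "B", 1:10] + 3

# PCA on centred data, with samples as observations
pca <- prcomp(intensities, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each dimension (what a scree plot shows)
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Coordinates of the samples on the first two dimensions (what a PCA plot shows)
scores <- pca$x[, 1:2]
```

Plotting `scores` coloured by `condition` gives the sample-level PCA view, and a bar plot of `variance_explained` gives the scree view.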
The Coefficient of Variation (CV), or Relative Standard Deviation, is calculated as the ratio of the standard deviation to the mean. It is used to measure the precision of a measurement, in this case protein/peptide intensity. The plot below shows the distribution of the CVs by experimental condition, where each CV is calculated by protein and by experimental condition. The CV is displayed as %CV, which is the standard deviation expressed as a percentage of the mean.

For a healthy experiment we expect:

If the distributions show worryingly large %CV, this could affect the quality of the differential expression analysis.
Coefficient of Variation details
p <- plot_condition_cv_distribution(IntensityExperiment)
if (params$format == "pdf") {
  p[[1]]
} else {
  ggplotly(p[[1]]) %>%
    plotly::config(displayModeBar = TRUE,
                   modeBarButtons = list(list('toImage')),
                   displaylogo = FALSE)
}
cv_data <- p[[2]]
cv_data <- as_tibble(cv_data) %>%
  group_by(Condition) %>%
  summarise(`Median CV %` = round(median(cv, na.rm = TRUE) * 100))
knitr::kable(cv_data, row.names = FALSE)
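The CV calculation described in this section can be sketched in base R as follows. The simulated intensities and variable names are assumptions for illustration, and whether the package computes the CV on raw or transformed intensities is not stated here, so treat this only as an outline of the per-protein, per-condition calculation.

```r
# Toy intensities: 4 proteins x 6 replicates (3 per condition), NA = missing.
# The data are simulated for illustration only.
set.seed(2)
condition <- rep(c("A", "B"), each = 3)
intensities <- matrix(rlnorm(24, meanlog = 10, sdlog = 0.2), nrow = 4,
                      dimnames = list(paste0("protein", 1:4),
                                      paste(condition, 1:3, sep = "_")))
intensities[1, 2] <- NA

# CV = standard deviation / mean, using only the available values
cv_one_condition <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# One CV per protein (rows) and per condition (columns)
cv <- sapply(unique(condition), function(cond) {
  apply(intensities[, condition == cond, drop = FALSE], 1, cv_one_condition)
})

# Median %CV per condition, analogous to the summary table above
median_cv_percent <- round(apply(cv, 2, median, na.rm = TRUE) * 100)
```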
The correlation plot shows the Pearson correlation between the samples in the experiment. Hierarchical clustering is used to order the samples in the matrix. Grouping highly correlated samples together aids the visual inspection of similarity between samples.

This section displays two correlation plots:

For a healthy experiment we expect:

Correlation plots details
plot_samples_correlation_matrix(CompleteIntensityExperiment)
p <- plot_samples_correlation_matrix(CompleteIntensityExperiment, onlyDEProteins = TRUE)
if (is.null(p)) {
  text <- "Not enough differentially expressed proteins to produce a correlation plot."
}
`r if(is.null(p)) print(text)`
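As a rough illustration of how a sample correlation matrix can be computed and ordered by hierarchical clustering, here is a minimal base R sketch. The simulated data, the use of `1 - correlation` as the distance and the default linkage of `hclust` are assumptions; `plot_samples_correlation_matrix` may make different choices.

```r
# Toy log2 intensities: 100 proteins (rows) x 6 samples (columns).
# The data are simulated for illustration only.
set.seed(3)
log_intensities <- matrix(rnorm(600, mean = 20), nrow = 100,
                          dimnames = list(NULL, paste0("sample", 1:6)))

# Pairwise Pearson correlation between samples, skipping missing values
correlation <- cor(log_intensities, method = "pearson",
                   use = "pairwise.complete.obs")

# Hierarchical clustering of samples, with 1 - correlation as the distance
clustering <- hclust(as.dist(1 - correlation))

# Reorder rows and columns so that similar samples sit next to each other
ordered_correlation <- correlation[clustering$order, clustering$order]
```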
The number of missing values can be affected by the biological condition or by technical factors, and it can vary considerably between experiments.

For a healthy experiment we expect:

There isn't a strict threshold to look for in terms of the minimum % of available measurements. However, an unusually low value in one or a few replicates can be symptomatic of technical problems and should be taken into account when interpreting the final differential expression results.
Data completeness details
p <- plot_replicate_measured_values(IntensityExperiment, title = NULL)
if (params$format == "pdf") {
  p
} else {
  ggplotly(p, tooltip = c("y")) %>%
    plotly::config(displayModeBar = TRUE,
                   modeBarButtons = list(list('toImage')),
                   displaylogo = FALSE)
}
p <- plot_protein_missingness(IntensityExperiment, title = NULL)
if (params$format == "pdf") {
  p
} else {
  ggplotly(p, tooltip = c("y")) %>%
    plotly::config(displayModeBar = TRUE,
                   modeBarButtons = list(list('toImage')),
                   displaylogo = FALSE)
}
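The two completeness summaries plotted in this section (available measurements per replicate and missingness per protein) can be outlined with a small base R example. The toy matrix and variable names are assumptions for illustration; zero intensities are treated as missing, as stated elsewhere in this report.

```r
# Toy raw intensities: 4 proteins (rows) x 3 replicates (columns).
# The values are made up for illustration only.
intensities <- matrix(
  c(10, 0, 12,
     0, 0, 11,
     9, 8, 13,
     7, 0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("protein", 1:4), paste0("replicate", 1:3))
)

# Zero intensities are considered missing values
intensities[intensities == 0] <- NA

# Percentage of available (measured) values in each replicate
available_by_replicate <- colMeans(!is.na(intensities)) * 100

# Percentage of missing values for each protein
missing_by_protein <- rowMeans(is.na(intensities)) * 100
```

In this toy example the second replicate has only 25% of its values available, the kind of unusually low value that would stand out in the replicate-level plot.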
The number of identified proteins is a measure of the number of non-missing measurements in each replicate. Low counts in a run may suggest a systematic flaw in the experiment that needs to be addressed prior to interpretation.

Identifications details
p <- plot_n_identified_proteins_by_replicate(IntensityExperiment)
if (params$format == "pdf") {
  p
} else {
  ggplotly(p, tooltip = c("y")) %>%
    plotly::config(displayModeBar = TRUE,
                   modeBarButtons = list(list('toImage')),
                   displaylogo = FALSE)
}
It is useful to inspect and compare the distributions of the intensities to identify samples with largely unusual distributions. The sections reported here show: For more details on each plot, inspect each section.
Distributions of raw, normalised (when requested), and imputed intensities
Missing values are not considered when creating the boxplot. Zero intensities are considered as missing values.

Log2 raw intensities distributions
normalised <- FALSE
if (metadata(CompleteIntensityExperiment)$NormalisationAppliedToAssay != "None") {
  normalised <- TRUE
}

# Plot the distribution of log2 raw intensities
p_raw <- plot_log_measurement_boxplot(IntensityExperiment, format = "pdf",
                                      title = "log2 Raw Intensities")
p_raw
It is useful to inspect the distribution of the Relative Log Expression (RLE) values to identify samples with unusual distributions. The RLE values for a protein are obtained by centering its intensities on the protein median, where the median is computed using only available intensities, i.e. non-zero values. The RLE is computed on the log-transformed data before and after applying normalisation, when required.

For a healthy experiment we expect:

If some samples show large deviations from the expected behaviour, it can be symptomatic of problems in the pre-processing of those samples.
Relative Log Expression distributions
normalised <- FALSE
if (metadata(CompleteIntensityExperiment)$NormalisationAppliedToAssay != "None") {
  normalised <- TRUE
}

# Plot RLE of log2 raw intensities
p_raw <- plot_rle_boxplot(IntensityExperiment, CompleteIntensityExperiment,
                          includeImputed = FALSE, plotRawRLE = TRUE,
                          title = "RLE of log2 Raw Intensities", format = "pdf")
p_raw
if (normalised) {
  # Plot RLE of normalised log2 intensities
  p_norm <- plot_rle_boxplot(IntensityExperiment, CompleteIntensityExperiment,
                             includeImputed = FALSE, plotRawRLE = FALSE,
                             title = "RLE of Normalised log2 Intensities", format = "pdf")
  p_norm
}
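The RLE transformation described in this section (centering each protein on its median, computed from available values only) can be sketched in base R as below. The simulated matrix and variable names are assumptions for illustration and this is not the code behind `plot_rle_boxplot`.

```r
# Toy log2 intensities: 5 proteins (rows) x 4 samples (columns), NA = missing.
# The data are simulated for illustration only.
set.seed(4)
log_intensities <- matrix(rnorm(20, mean = 20), nrow = 5,
                          dimnames = list(paste0("protein", 1:5),
                                          paste0("sample", 1:4)))
log_intensities[2, 3] <- NA

# Median of each protein, using only the available intensities
protein_medians <- apply(log_intensities, 1, median, na.rm = TRUE)

# RLE: subtract the protein median from every intensity of that protein
rle_values <- sweep(log_intensities, 1, protein_medians, FUN = "-")
```

Calling `boxplot(rle_values)` then draws one box per sample, which is the type of view used to compare samples in the RLE plots.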
Initial intensities equal to zero are considered as missing values and imputed prior to the DE analysis. Imputation is performed using the MNAR ("Missing Not At Random") method as adopted in Perseus. Imputed values are randomly drawn from a normal distribution with mean equal to the observed mean (mean of the available intensities) shifted by -1.8 times the observed standard deviation, and a standard deviation equal to the observed standard deviation scaled by a factor of 0.3 (as in Perseus).

The plots below show the distribution of imputed values (Imputed = TRUE) and actual values (Imputed = FALSE), all of which are then used for the downstream DE analyses.

Density distribution of imputed vs actual intensities
plot_imputed_vs_not(CompleteIntensityExperiment = CompleteIntensityExperiment, format = params$format)
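The imputation rule described above can be written as a short base R sketch for a single vector of available log2 intensities. The simulated data and variable names are assumptions for illustration, and whether the observed mean and standard deviation are computed per sample or across the whole experiment is not stated in this report, so this only illustrates the shift and scale parameters.

```r
# Toy available log2 intensities (simulated for illustration only)
set.seed(5)
observed <- rnorm(1000, mean = 20, sd = 2)
n_missing <- 200  # hypothetical number of values to impute

observed_mean <- mean(observed)
observed_sd <- sd(observed)

# Draw imputed values from a normal distribution whose mean is shifted
# down by 1.8 observed standard deviations and whose standard deviation
# is 0.3 times the observed one
imputed <- rnorm(n_missing,
                 mean = observed_mean - 1.8 * observed_sd,
                 sd = 0.3 * observed_sd)
```

The imputed values therefore form a narrow distribution in the lower tail of the observed intensities, which is the pattern expected in the density plot for Imputed = TRUE.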