knitr::opts_chunk$set(dev = "pdf")
diff_true <- nrow(SummarizedExperiment::rowData(dds)[!is.na(SummarizedExperiment::rowData(dds, use.names = FALSE)$significant), ]) != 0
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=80), tidy=TRUE)
contrasts <- SummarizedExperiment::rowData(dds) %>% 
    data.frame(check.names = FALSE) %>% 
    select(starts_with("lfc.")) %>% 
    colnames() %>% str_replace_all("lfc.", "")

Summary

r #browser() Original file contains r data@metadata$n.pre_filt genes, of which r ifelse(!is.null(data@metadata$n.filtered), data@metadata$n.filtered, 0) were removed due to missing values (filt_type = 'r data@metadata$filt_type'), and r ifelse(!is.null(data@metadata$n.filtered.rsd), data@metadata$n.filtered.rsd, 0) due to too high RSD (threshold: r ifelse(!is.null(data@metadata$rsd.thresh), data@metadata$rsd.thresh, NA)).

r nrow(dds) genes were reproducibly quantified.

In total r ncol(data) samples were detected:

r colFmt(colnames(data), 'blue')

Samples were grouped into r length(unique(data[['condition']])) conditions:

r colFmt(unique(data[['condition']]), 'blue')

r length(which(SummarizedExperiment::rowData(dds)[!is.na(SummarizedExperiment::rowData(dds, use.names = FALSE)$significant), "significant"])) genes differ significantly between samples.

if (length(contrasts) > 1){
  for (i in 1:length(contrasts)) {
    contrasts_safe <- gsub("_", "\\_", contrasts[i], fixed = TRUE)
    cat(paste0("* _",
               nrow(dds[SummarizedExperiment::rowData(dds)[[paste0("significant.", contrasts[i])]][!is.na(SummarizedExperiment::rowData(dds)[[paste0("significant.", contrasts[i])]])],]),
               "_ genes for contrast ", contrasts_safe, "\n"))
  }
}

Parameters used for the analysis:

\pagebreak

Gene numbers

Gene identifications per sample:

pg_width = ncol(data) / 2
if (pg_width > 10) { pg_width = 10 }
suppress_ggrepel <- function(w) {
    if (any(grepl("ggrepel", w)))
      invokeRestart("muffleWarning")
  }
rna.plot_numbers(data, plot = T, export = F, basesize = 9)

Cooks distance in all samples for outlier detection.

par(mar=c(8,5,2,2))
boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2)
title("Box plot of Cook's distance")

\pagebreak

Normalization

The data was normalized with automatically estimated size factors using the "median ratio method" (see ?DESeq2::estimateSizeFactors,DESeqDataSet-method) and by adding a pseudocount of 1/2 to allow for log scale plotting.

norm_height = ncol(data) / 2.2
if (norm_height > 9.5) {norm_height = 9.5}

Box plots of the distributions before and after normalization:

data.log <- data
assay(data.log) <- log2(assay(data.log))
norm.log <- dds
assay(norm.log) <- log2(assay(norm.log))
prot.boxplot_intensity(data.log, norm.log, plot = T, export = F)

\pagebreak

Missing values

Missing values in the data set must be imputed. The downstream analysis can be strongly influenced by the imputation. Therefore, the selection of an appropriate method is crucial. The data can be missing at random (MAR), for example if genes are quantified in some replicates but not in others, without any bias towards low intensities. This class of missing values can arise when the peptide sequence is mapped incorrectly or software erroneously assigns shared peptides to precursors leading to misidentification in some samples and missing values in others.\ MNAR may result from experimental effects such as (1) enzyme miscleavages, (2) true presence/absence in the biological samples and (3) instrumentation effects (when peptide measurements are low in abundance compared to background noise or constitute low ionization efficiency). Because values are missing because of low abundant nature of the respective genes, this category of missing values is considered left-censored, i.e., the distribution of values (if present in the data) would fall on the left tail of the total observations in the dataset.

To asses the type of missing data (random or not), the following heat map of missing values and density distributions of genes with/without missing values provide a visual aid.\ If data is randomly missing, use the k-nearest neighbor (“knn”) or maximum likelihood (“MLE”) options for imputation. If the missing data is biased to certain samples (e.g. controls) which are expected to be depleted of certain genes and/or if there is a clear bias of missing values towards low abundances, use the "QRILC", "MinProb", "SampMin", or "man" options for imputation. For a more detailed description of different imputation methods, see the MSnbase vignette and in particular the impute function description. The "SampMin" method was proposed by Liu and Dongre (2020)) and shown to outperform other common methods for MNAR, left-censored datasets.

if (any(is.na(data.frame(SummarizedExperiment::assay(data))))) {
  suppressMessages(prot.plot_missval(data, fontsize = 12, export = F))
}

\pagebreak To check whether missing values are biased toward low-intensity genes (i.e., 'left-censored'), densities and cumulative proportions are plotted for genes with and without missing values.

  se_log2 <- data
  assay(se_log2) <- log2(assay(se_log2))
  try(suppressMessages(
    prot.plot_detect(se_log2, basesize = 10, plot = T, export = F)
  ))

\pagebreak The effects of data normalization and imputation on the distributions are visualized via density distributions:

suppressMessages(
    rna.plot_imputation(log2(SummarizedExperiment::assay(data)+1),
                        "Imputed" = log2(SummarizedExperiment::assay(imp)+1),
                        "Normalized" = log2(counts(dds, normalized = TRUE)+1), colData = SummarizedExperiment::colData(data),
                        plot = T, export = F, basesize = 12)
  )

\pagebreak

Quality control

Principal component analysis

To get a high-level overview of the data, the unsupervised method principal component analysis (PCA) reduces the data dimensionality (i.e., number of genes included in the analysis) while retaining most of the data information.

ncomp <- 10
if(length(PCAtools::getComponents(pca)) < 10) ncomp = length(PCAtools::getComponents(pca))
withCallingHandlers(
    suppressWarnings(
        prot.plot_screeplot(pca, axisLabSize = 16, titleLabSize = 17, plot = T, export = F, 
                            components = PCAtools::getComponents(pca)[1:ncomp])
    ), warning = suppress_ggrepel)

\pagebreak

2D PCA plots

Notice that the new coordinates (PCs) are no longer real variables generated by the system. Thus, applying PCA to your dataset loses its interpretability. However, PCA can be very useful to observe batch effects and helps to assess which original samples are similar and different from each other.

suppressMessages(
      rna.plot_pca(dds,x = 1, y = 2, point_size = 4, basesize = 12, plot = T, export = F, title = "PC Scores - PC2 vs. PC1")
    )
suppressMessages(
      rna.plot_pca(dds,x = 1, y = 3, point_size = 4, basesize = 12, plot = T, export = F, title = "PC Scores - PC3 vs. PC1")
    )

\pagebreak

Loadings plot

PCA loadings are the coefficients of the linear combination of the original variables (genes) from which the principal components (PCs) are generated. They describe how much each gene contributes to a particular principal component. Large loadings (positive or negative) indicate that a particular variable has a strong relationship to a particular principal component. The sign of a loading indicates whether a variable and a principal component are positively or negatively correlated.

 withCallingHandlers(suppressMessages(
      suppressWarnings(prot.plot_loadings(pca,labSize = 3, plot = T, export = F)) ), 
                     warning = suppress_ggrepel )

\pagebreak

Correlation matrix for samples with all genes

    prot.plot_corrheatmap(dds, font_size = 12, significant = FALSE )

\pagebreak

Differential expression analysis

r if_else(diff_true, "Correlation matrix for samples with differentially expressed genes", "")

  prot.plot_corrheatmap(dds, font_size = 12, significant = T )

\pagebreak

Heatmaps

The heat map gives an overview of all genes with significantly different abundances (rows) in all samples (bars). This allows general trends to be identified, for example when one sample or replicate differs from others. In addition, clustering of samples (columns) can point to more similar samples and clustering of genes (rows) can indicate genes that behave in a similar way.

width = 6+length(dds$condition)/10
if (width < 7) { width = 7 }
if (width > 12) { width = 12 }
len = nrow(SummarizedExperiment::rowData(dds)[!is.na(SummarizedExperiment::rowData(dds, use.names = FALSE)$significant), ]) / 55
if (len < 5) { len = 5 }
if(export==TRUE){
  knitr::include_graphics(paste0(dirname(wd), "/Plots/HeatMap_contrast.pdf"), dpi = 300)
} else {
  try(suppressMessages(
    rna.plot_heatmap(
      dds,
      type = "contrast",
      kmeans = ifelse(exists("heatmap.kmeans"), heatmap.kmeans, FALSE),
      k = ifelse(exists("k"), k, 6),
      show_all = heatmap.show_all,
      show_row_names = T,
      row_font_size = heatmap.row_font_size,
      plot = T,
      export = F
    )
  ))
}

\pagebreak

if(export==TRUE){
  knitr::include_graphics(paste0(dirname(wd), "/Plots/HeatMap_centered.pdf"),dpi = 300)
} else {
  try(suppressMessages(
        rna.plot_heatmap(dds, type = "centered", 
                          kmeans = ifelse(exists("heatmap.kmeans"), heatmap.kmeans, FALSE),
                          k = ifelse(exists("k"), k, 6),
                          show_all = T, 
                          show_row_names = T, 
                          row_font_size = heatmap.row_font_size, 
                          indicate = c("condition"),
                          plot = T, 
                          export = F)
      ))
}

\pagebreak

Volcano plot(s)

Volcano plots can be used to visualize a particular contrast (comparison between two samples). This allows to examine the gene enrichment between two samples (x-axis) and their corresponding (adjusted) p-values (y-axis).

if(export==TRUE){
  volcano_list <- c()
  for(i in 1:length(grep(paste("VolcanoPlot_", contrasts, ".pdf", sep="", collapse = "|"), list.files(path = paste0(dirname(wd), "/Plots"))))){
    volcano_list <- c(volcano_list, list.files(path = paste0(dirname(wd), "/Plots"), 
                                  pattern = paste0("VolcanoPlot_", contrasts[i],".pdf"),
                                  full.names = TRUE))
  }
  knitr::include_graphics(volcano_list ,dpi = 300)
} else {
for (i in 1:length(contrasts)){
    volcano.temp <- rna.plot_volcano(dds, contrast = contrasts[i],
                            add_names = volcano.add_names, label_size = 2.5, adjusted =  volcano.adjusted,
                            plot = T, export = F, lfc = param$lfc, alpha = param$alpha)
  }
}

\pagebreak

r if(pathway_enrichment == TRUE){"Pathway overrepresentation analysis"}

r if(pathway_enrichment == TRUE){"Genes that were identified as differentially expressed based on the chosen log2(fold change) and adj. p-value thresholds are used to identify enriched pathways with FDR control (Benjamini-Hochberg method).\n"}

r if("pora_kegg_up" %in% names(results)){"KEGG pathways"}

r if("pora_kegg_up" %in% names(results)){if(all(sapply(results$pora_kegg_up, is.null))){"No significantly upregulated KEGG pathways found."}} r if("pora_kegg_dn" %in% names(results)){if(all(sapply(results$pora_kegg_dn, is.null))){"No significantly downregulated KEGG pathways found."}}

w_pora <- 11
h_pora <- 7
if ("pora_kegg_up" %in% names(results)) {
  for (i in 1:length(contrasts)) {
    if(!is.null(pora_kegg_up[[i]]) && !(nrow(pora_kegg_up[[i]])) == 0){
      prot.plot_enrichment(
        pora_kegg_up[[i]],
        title = paste0(
          "Upregulated pathways",
          if_else(pora_kegg_up[[i]]@ontology == "KEGG", " - KEGG", "")
        ),
        subtitle =  str_replace(contrasts[i], "_vs_", " vs. "),
        plot = T,
        export = F,
        kegg=T
      )
      if (nrow(as.data.frame(pora_kegg_up[[i]])) > 1) {
        cat("\n")
        print(prot.plot_upset(pora_kegg_up[[i]], line.size = 0.9, mb.ratio = c(
          (1-((15+10*nrow(as.data.frame(pora_kegg_up[[i]]))^(1/2.4))/100)), 
          ((15+10*nrow(as.data.frame(pora_kegg_up[[i]]))^(1/2.4))/100))) 
              )
      }
      cat("\n\n\\pagebreak\n")
    } else {
      cat(
          paste0("No significantly upregulated KEGG pathways found for contrast:\n",
                 contrasts[i], "\n")
          )
      cat("\n\n\\pagebreak\n")
    }
    if(!is.null(pora_kegg_dn[[i]]) && !(nrow(pora_kegg_dn[[i]])) == 0){
      prot.plot_enrichment(
        pora_kegg_dn[[i]],
        title = paste0(
          "Downregulated pathways",
          if_else(pora_kegg_dn[[i]]@ontology == "KEGG", " - KEGG", "")
        ),
        subtitle = str_replace(contrasts[i], "_vs_", " vs. "),
        plot = T,
        export = F,
        kegg=T
      )
      if (nrow(as.data.frame(pora_kegg_dn[[i]])) > 1) {
        cat("\n")
        print(prot.plot_upset(pora_kegg_dn[[i]], line.size = 0.9, mb.ratio = c(
          (1-((15+10*nrow(as.data.frame(pora_kegg_dn[[i]]))^(1/2.4))/100)), 
          ((15+10*nrow(as.data.frame(pora_kegg_dn[[i]]))^(1/2.4))/100))) 
              )
      }
      cat("\n\n\\pagebreak\n")
    } else {
      cat(
          paste0("No significantly downregulated KEGG pathways found for contrast:\n",
                 contrasts[i], "\n")
          )
      cat("\n\n\\pagebreak\n")
    }
  }
}

r if("pora_custom_up" %in% names(results)){"Custom pathways"}

r if("pora_custom_up" %in% names(results)) {if(all(sapply(results$pora_custom_up, is.null))){"No significantly upregulated custom pathways found."}} r if("pora_custom_dn" %in% names(results)) {if(all(sapply(results$pora_custom_up, is.null))){"No significantly downregulated custom pathways found."}}

if("pora_custom_up" %in% names(results)) {
  for (i in 1:length(contrasts)) {
    if (!(nrow(as.data.frame(pora_custom_up[[i]])) == 0)) {
      prot.plot_enrichment(
        pora_custom_up[[i]],
        title = paste0(
          "Upregulated pathways",
          if_else(pora_custom_up[[i]]@ontology == "KEGG", " - KEGG", "")
        ),
        subtitle =  str_replace(contrasts[i], "_vs_", " vs. "),
        plot = T,
        export = F,
        kegg=F
      )
      if (nrow(as.data.frame(pora_custom_up[[i]])) > 1) {
        cat("\n")
        print(prot.plot_upset(pora_custom_up[[i]], line.size = 0.9, mb.ratio = c(
          (1-((15+10*nrow(as.data.frame(pora_custom_up[[i]]))^(1/2.4))/100)), 
          ((15+10*nrow(as.data.frame(pora_custom_up[[i]]))^(1/2.4))/100))) 
              )
      }
      cat("\n\n\\pagebreak\n")
    } else {
      cat(
          paste0("No significantly upregulated custom pathways found for contrast:\n",
                 contrasts[i], "\n")
          )
      cat("\n\n\\pagebreak\n")
    }
    if (!(nrow(as.data.frame(pora_custom_dn[[i]])) == 0)) {
      prot.plot_enrichment(
        pora_custom_dn[[i]],
        title = paste0(
          "Downregulated pathways",
          if_else(pora_custom_dn[[i]]@ontology == "KEGG", " - KEGG", "")
        ),
        subtitle = str_replace(contrasts[i], "_vs_", " vs. "),
        plot = T,
        export = F,
        kegg=F
      )
      if (nrow(as.data.frame(pora_custom_dn[[i]])) > 1) {
        cat("\n")
        print(prot.plot_upset(pora_custom_dn[[i]], line.size = 0.9, mb.ratio = c(
          (1-((15+10*nrow(as.data.frame(pora_custom_dn[[i]]))^(1/2.4))/100)),
          ((15+10*nrow(as.data.frame(pora_custom_dn[[i]]))^(1/2.4))/100)
          )) 
              )
      }
      cat("\n\n\\pagebreak\n")
    } else {
      cat(
          paste0("No significantly downregulated custom pathways found for contrast:\n",
                 contrasts[i], "\n")
          )
      cat("\n\n\\pagebreak\n")
    }
  }
}


NicWir/VisomX documentation built on Dec. 8, 2024, 1:27 a.m.