knitr::opts_chunk$set(
  comment = "#>", collapse = TRUE, message = FALSE, warning = FALSE,
  out.width="60%", out.height="60%", fig.align="center"
)

Setup

In this vignette, we benchmarked Figure 3D of the MultiPLIER paper: Neutrophil count of NARES samples were estimated by MCPcounter and plotted against RAV1551-assigned sample scores.

Load packages

if (!"GenomicSuperSignaturePaper" %in% installed.packages())
    devtools::install_github("shbrief/GenomicSuperSignaturePaper")

suppressPackageStartupMessages({
  library(GenomicSuperSignature)
  library(GenomicSuperSignaturePaper)
  library(dplyr)
  library(RColorBrewer)
  library(ComplexHeatmap)
})

RAVmodel

To directly compare our new analysis with the results from MultiPLIER paper, we used the RAVmodel annotated with the same priors as MultiPLER: bloodCellMarkersIRISDMAP, svmMarkers, and canonicalPathways.

## If GenomicSuperSignaturePaper is built locally with RAVmodel in inst/extdata
data.dir <- system.file("extdata", package = "GenomicSuperSignaturePaper")
RAVmodel <- readRDS(file.path(data.dir, "RAVmodel_PLIERpriors.rds"))
RAVmodel <- getModel("PLIERpriors", load=TRUE)
RAVmodel

version(RAVmodel)

NARES Nasal Brushings

Expression data

NARES (Grayson et al., 2015) is a nasal brushing microarray dataset that includes patients with ANCA-associated vasculitis, patients with sarcoidosis and healthy controls among other groups. In the MultiPLIER paper, NARES data was projected into the MultiPLIER latent space.

exprs <- readr::read_tsv("data/NARES_SCANfast_ComBat_with_GeneSymbol.pcl") %>% 
  as.data.frame
rownames(exprs) <- exprs$GeneSymbol
dataset <- as.matrix(exprs[,3:ncol(exprs)])

dataset[1:2, 1:4]
demo <- read.table("data/NARES_demographic_data.tsv", sep = "\t", header = TRUE)
demo[1:3, 1:3]
colnames(demo)

MCPcounter

NARES data doesn't have any information on the cell type composition of the samples. So in the MultiPLIER paper, the authors applied MCPcounter) (Becht et al., Genome Biology, 2016) , a method for estimating cell type abundance in solid tissues. As MultiPLIER paper, we tested whether the neutrophil-associated RAV is well-correlated with the neutrophil estimates of the NARES dataset.


* Install MCPcounter

devtools::install_github("ebecht/MCPcounter", 
                         ref = "a79614eee002c88c64725d69140c7653e7c379b4",
                         subdir = "Source",
                         dependencies = TRUE)
mcp.results <- MCPcounter::MCPcounter.estimate(exprs.mat, 
                                               featuresType = "HUGO_symbols")
mcp.melt <- reshape2::melt(mcp.results, 
                           varnames = c("Cell_type", "Sample"),
                           value.name = "MCP_estimate")
readr::write_tsv(mcp.melt, "data/NARES_ComBat_MCPCounter_results_tidy.tsv")
mcp.melt <- read.table("data/NARES_ComBat_MCPCounter_results_tidy.tsv", 
                       header = TRUE, sep = "\t")

Transfer Learning

Validation

RAV1551, a neutrophil-associated RAV based on SLE-WB dataset, is not found in the top 5 validated RAVs of NARES dataset, implying that neutrophil count is not the major feature of NARES dataset.

val_all <- validate(dataset, RAVmodel) 
validated_ind <- validatedSignatures(val_all, RAVmodel,
                                     num.out = 5, swCutoff = 0, 
                                     indexOnly = TRUE)
heatmapTable(val_all, RAVmodel, num.out = 5, swCutoff = 0)

RAV1551 is actually ranked as the top 14th RAV based on its validation score.

ordered <- val_all[order(val_all$score, decreasing = TRUE),]
rankInd <- which(rownames(ordered) == "RAV1551")
rankInd

ordered[rankInd,]

r-squared value

sampleScore <- calculateScore(dataset, RAVmodel)
## Only with the top 5 RAVs
sampleScoreHeatmap(score = sampleScore[,c(validated_ind, 1551)],
                   dataName = "NARES",
                   modelName = "RAVmodel",
                   show_column_names = TRUE,
                   column_names_gp = 10,
                   row_names_gp = 4)
## Remove Sample category "gene"
gene_ind <- which(mcp.melt$Sample == "Gene")
mcp.melt <- mcp.melt[-gene_ind,]

## Subset sampleScore to join with MCPcounter
sampleScore_sub <- sampleScore[, 1551] %>% as.data.frame
sampleScore_sub <- tibble::rownames_to_column(sampleScore_sub)
colnames(sampleScore_sub) <- c("Sample", "RAV1551")

## Join with MCPcounter neutrophil estimates
dat <- dplyr::filter(mcp.melt, Cell_type == "Neutrophils") %>%
  dplyr::inner_join(y = sampleScore_sub, by = "Sample")

head(dat, 3)

We plot the neutrophil estimates against RAV1551-assigned sample scores.

plot <- LVScatter(dat, "RAV1551", 
                  y.var = "MCP_estimate",
                  ylab = "MCPcounter Neutrophil Estimate",
                  title = "RAVmodel",
                  subtitle = "NARES Nasal Brushings")
plot
## This chunk was executed only to save the summary table, not for vignettes.
saveRDS(plot, "outputs/nares_neutrophil.rds")
png("outputs/png/nares_neutrophil.png", width = 400, height = 400)
plot
dev.off()

Session Info

r sessionInfo()



shbrief/PCAGenomicSignaturesPaper documentation built on July 31, 2022, 12:41 a.m.