NanoStringClustR enables users to quickly and easily assess the performance of multiple normalisation methods on NanoString nCounter data. NanoStringClustR performs nCounter scaling factor based normalisations using spike-in controls and housekeeping genes, and uses wrappers for geNorm, variance stabilising normalisation (vsn), cyclic loess, quantile and RUV-III normalisation. A combination of a cluster validity index and Relative Log Expression (RLE) is used to rank normalisations. NanoStringClustR also enables the effect of normalization on differential gene expression to be assessed by implementing a wrapper for limma. NanoStringClustR currently supports NanoString nCounter mRNA and miRNA data, although it has only been tested with mRNA data.
NanoStringClustR contains 4 main functions:
- count_set(): generate a count_set summarising a NanoString experiment
- multi_norm(): perform normalisations and output diagnostic plots
- norm_rank(): rank normalisations
- multi_diff(): perform differential gene expression analysis of all pairwise combinations on normalized datasets

First, install and load the NanoStringClustR library and example dataset. NanoStringClustR uses the R package SummarizedExperiment to hold NanoString count data, so load this package too.
library(NanoStringClustR)
data("Rnf5")
library(SummarizedExperiment)
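A count_set is a SummarizedExperiment that holds NanoString count data and sample annotations. To build a count_set, provide the count data together with sample annotations such as the biological group and sample ID of each sample.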
First, define sample annotations
# biological groups
rnf5_group <- c(rep("WT", 5), rep("KO", 5))
# sample ids
rnf5_sampleid <- c("GSM3638131", "GSM3638132", "GSM3638133", "GSM3638134", "GSM3638135",
                   "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")
Second, build a count_set
# for this example, we will use the in-package Rnf5 dataset:
rnf5_count_set <- count_set(count_data = Rnf5,
                            group = rnf5_group,
                            samp_id = rnf5_sampleid)
You can also generate a count_set from a file path to a csv generated by the nSolver RCC Collector Tool Format Export:
# e.g.
# rnf5_count_set <- count_set(rccexp_dir = "~/path/to/file.csv",
#                             group = rnf5_group,
#                             samp_id = rnf5_sampleid)
Adding output_log = "~/Dropbox/NanoStringCountR/NanoStringCountR/raw_data/" to the count_set() call will save the se.R.
You can then load an existing SummarizedExperiment, either from an se object in R or from the full file path to a saved se.R:
# e.g.
# rnf5_count_set <- count_set(count_set = rnf5_count_set,
#                             group = group,
#                             batch = batch,
#                             samp_id = samp_id)
# e.g.
# rnf5_count_set <- count_set(count_se = "~/path/to/se.R")
The count_set can be accessed by functions in the SummarizedExperiment package
rnf5_count_set
multi_norm() performs the following types of normalisation
A. Optional pre-processing. Choose which pre-processing method you would prefer.
- background_correct: "mean2sd", "proportional", "none"
- count_threshold: "mean2sd" of the negative controls, or any number from 0 to Inf
- positive_control_scaling: TRUE/FALSE

B. Count normalisations. multi_norm() performs all normalizations automatically.
- housekeeping_scaled
- geNorm_housekeeping: geNorm_n stably expressed housekeeping genes selected by geNorm
- all_endogenous_scaled
- loess
- vsn
- quantile
- ruv

multi_norm() returns a SummarizedExperiment with the normalized counts as assays.
Diagnostic plots will be saved if a plot_dir is provided e.g. plot_dir = "~/full/path/to/my/plots/"
rnf5_count_set_norm <- multi_norm(count_set = rnf5_count_set, positive_control_scaling = TRUE, background_correct = "mean2sd")
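To make the idea behind positive_control_scaling more concrete, here is a minimal base R sketch of a commonly used nCounter-style scaling factor calculation. It is an illustration under stated assumptions, not the internals of multi_norm(): the toy matrix, the control names and the helper geo_mean() are all hypothetical.

```r
# Toy positive-control counts: rows are spike-in controls, columns are samples.
# (Made-up numbers for illustration only.)
pos_counts <- matrix(
  c(1000, 1200,  800,
     250,  300,  200,
      60,   75,   50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("POS_A", "POS_B", "POS_C"), c("s1", "s2", "s3"))
)

# Hypothetical helper: geometric mean of a vector of positive counts.
geo_mean <- function(x) exp(mean(log(x)))

# Geometric mean of the positive controls in each sample.
sample_geo_means <- apply(pos_counts, 2, geo_mean)

# Scaling factor: average of the per-sample geometric means divided by
# each sample's own geometric mean.
scaling_factors <- mean(sample_geo_means) / sample_geo_means

# Apply the factors column-wise (here to the controls themselves; the same
# factors would normally be applied to the endogenous counts).
scaled <- sweep(pos_counts, 2, scaling_factors, `*`)
```

After scaling, every sample has the same positive-control geometric mean, which is the point of this kind of normalisation.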
Log2 transformed, normalised data are returned in a normalized count_set as assays and can be accessed as follows:
#list types of normalisations
names(assays(rnf5_count_set_norm))
#access normalisations
#assays(rnf5_count_set_norm)$housekeeping_scaled
To access log2 transformed counts, use
#assays(rnf5_count_set_norm)$counts
norm_rank() performs cluster-based ranking of normalisation methods using the Generalized Dunn Index between groups and the sum of RLE variation.
rnf5_eval <- norm_rank(count_set = rnf5_count_set_norm)
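For readers unfamiliar with Relative Log Expression, the following base R sketch shows the standard definition on simulated data: each gene's median across samples is subtracted from that gene's log expression values. How norm_rank() summarises RLE variation internally is not shown here, so the per-sample median and IQR below are only one plausible summary; all object names are hypothetical.

```r
set.seed(1)

# Toy log2 expression matrix: 50 genes (rows) x 6 samples (columns).
log_expr <- matrix(rnorm(50 * 6, mean = 8, sd = 1), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), paste0("s", 1:6)))

# Add an artificial shift to one sample to mimic unwanted variation.
log_expr[, "s6"] <- log_expr[, "s6"] + 2

# RLE: subtract each gene's median across samples from that gene's values.
gene_medians <- apply(log_expr, 1, median)
rle <- sweep(log_expr, 1, gene_medians, `-`)

# Per-sample summaries: a well-normalised sample is expected to have an RLE
# median near 0 and a small spread.
rle_median <- apply(rle, 2, median)
rle_iqr <- apply(rle, 2, IQR)
```

In this toy example the shifted sample stands out with an RLE median far from zero, which is the kind of signal a ranking based on RLE can pick up.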
norm_rank() returns a data frame of ranks. Normalizations with a lower rank are rated better.
rnf5_eval
multi_diff() performs differential gene expression analysis on all possible pairs of groups defined in the count_set. The threshold for significantly differentially expressed genes is defined by p_cut_off and logFC_cut_off.
rnf5_multi_diff <- multi_diff(count_set = rnf5_count_set_norm, adj_method = "fdr", p_cut_off = 0.05, logFC_cut_off = 0)
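The two ideas in this step, enumerating all pairs of groups and applying the p_cut_off and logFC_cut_off thresholds, can be sketched in base R as follows. This is an illustration only: the third group "HET" and the toy results table are hypothetical, the column names follow limma's topTable() output, and the exact comparison rules used inside multi_diff() (strict inequalities, absolute logFC) are assumptions.

```r
# Groups as in the Rnf5 example, plus a hypothetical third group ("HET")
# added only to show that every pair is enumerated.
group <- c(rep("WT", 5), rep("KO", 5), rep("HET", 5))

# All unordered pairs of groups, one pair per column.
pairs <- combn(unique(group), 2)
contrast_names <- apply(pairs, 2, function(p) paste(p[2], "-", p[1]))

# Toy results table using limma topTable() column names.
res <- data.frame(
  logFC = c(1.8, -0.2, 0.9, -2.1),
  adj.P.Val = c(0.001, 0.300, 0.040, 0.060),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

p_cut_off <- 0.05
logFC_cut_off <- 0.5  # stricter than the 0 used above, for illustration

# Keep genes passing both thresholds (assumed rule, see lead-in).
deg <- res[res$adj.P.Val < p_cut_off & abs(res$logFC) > logFC_cut_off, ]
```

With only WT and KO in the count_set, the same enumeration yields the single contrast `KO - WT`.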
multi_diff() will return a list with:
rnf5_multi_diff$plot_DEG
rnf5_multi_diff$overlap_DEG
rnf5_multi_diff$summary_DEG
#rnf5_multi_diff$full_result$NAME_OF_NORM_METHOD e.g.
head(rnf5_multi_diff$results_DEG$housekeeping_scaled$`KO - WT`)
For more information, see the topTable function from the limma R package.
NanoStringClustR supports differential gene expression with pairing, for example an experiment where samples have been taken from the same person before and after treatment. For this example, we will consider each WT and KO sample to be paired.
First, add pairing information to the normalized count_set.
colData(rnf5_count_set_norm)$pair <- as.factor(c("pair1", "pair2", "pair3", "pair4", "pair5", "pair1", "pair2", "pair3", "pair4", "pair5"))
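Before running a paired analysis it can help to confirm that every pair contributes exactly one sample to each group. The base R sketch below performs that check on plain vectors mirroring the annotations above; it is a suggested sanity check, not something multi_diff() is known to require, and the object names are hypothetical.

```r
# Annotations mirroring the Rnf5 example.
group <- c(rep("WT", 5), rep("KO", 5))
pair <- as.factor(c("pair1", "pair2", "pair3", "pair4", "pair5",
                    "pair1", "pair2", "pair3", "pair4", "pair5"))

# Cross-tabulate pairs against groups: a fully paired design has exactly
# one sample in every cell.
pair_by_group <- table(pair, group)
balanced <- all(pair_by_group == 1)
```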
Second, run multi_diff with pairing = "paired"
rnf5_multi_diff_paired <- multi_diff(count_set = rnf5_count_set_norm, adj_method = "fdr", p_cut_off = 0.05, logFC_cut_off = 0, pairing = "paired")
rnf5_multi_diff_paired$plot_DEG
If technical replicates are present, multi_norm will perform RUV normalisation by RUV-III. NanoStringClustR defines technical replicates by the sample ID in the samp_id slot of the count_set object. Technical (or pseudo) replicates should have the same name. For this example, we will consider the first 2 WT samples to be technical replicates. Currently, only one factor of variation is determined (k = 1).
For more information on using RUV-III with NanoString data, see: Molania R, Gagnon-Bartsch JA, Dobrovic A, et al. A new normalization for Nanostring nCounter gene expression data. Nucleic Acids Res. 2019 May 22. doi: 10.1093/nar/gkz433. PubMed PMID: 31114909
First, define technical replicates in the un-normalized count_set.
rnf5_count_set$samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134", "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")
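Since technical replicates are identified by shared sample IDs, a quick base R check can confirm which samples will be treated as replicates. This sketch works on a plain character vector copied from the example above; it does not call any NanoStringClustR function, and the object names are hypothetical.

```r
samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134", "GSM3638135",
             "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")

# IDs that occur more than once mark sets of technical replicates.
id_counts <- table(samp_id)
replicate_ids <- names(id_counts[id_counts > 1])

# Column positions of the samples in each replicate set.
replicate_sets <- lapply(setNames(replicate_ids, replicate_ids),
                         function(id) which(samp_id == id))
```

Here the only replicate set is techrep_1, made up of the first two samples.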
Run multi_norm with the count_set containing the technical replicate info in the $samp_id slot
rnf5_ruv_count_set_norm <- multi_norm(count_set = rnf5_count_set, positive_control_scaling = TRUE, background_correct = "mean2sd")
Running norm_rank and multi_diff will now include ruvIII
rnf5_ruv_eval <- norm_rank(count_set = rnf5_ruv_count_set_norm)
rnf5_ruv_eval
rnf5_ruv_multi_diff <- multi_diff(count_set = rnf5_ruv_count_set_norm, adj_method = "fdr", p_cut_off = 0.05, logFC_cut_off = 0, pairing = "unpaired")
When using this package, please cite NanoStringClustR as follows:
citation("NanoStringClustR")
Please also cite all methods used.
If you use multi_norm, cite:
citation("vsn")
citation("affy")
citation("ruv")
citation("preprocessCore")
If you use norm_rank, cite:
citation("clv")
If you use multi_diff, cite:
citation("limma")
citation("UpSetR")
Please also check reference suggestions for each package.