Introduction to NanoStringClustR

NanoStringClustR enables users to quickly and easily assess the performance of multiple normalisation methods on NanoString nCounter data. NanoStringClustR performs nCounter scaling-factor-based normalisations using spike-in controls and housekeeping genes, and provides wrappers for geNorm, variance stabilising normalisation (vsn), cyclic loess, quantile and RUV-III normalisation. A combination of a cluster validity index and Relative Log Expression (RLE) is used to rank normalisations. NanoStringClustR also enables the effect of normalisation on differential gene expression to be assessed by implementing a wrapper for limma. NanoStringClustR currently supports NanoString nCounter mRNA and miRNA data, although it has only been tested with mRNA data.

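NanoStringClustR contains 4 main functions:

  • count_set() builds a count_set holding NanoString count data and sample annotations
  • multi_norm() performs multiple normalisations on a count_set
  • norm_rank() ranks the normalisation methods
  • multi_diff() performs differential gene expression analysis on each normalisation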

NanoStringClustR examples

First, install and load the NanoStringClustR library and example dataset. NanoStringClustR uses the R package SummarizedExperiment to hold NanoString count data, so load this package too.

library(NanoStringClustR)
data("Rnf5")

library(SummarizedExperiment)

Generating a count_set

A count_set is a SummarizedExperiment that holds NanoString count data and sample annotations. To build a count_set, provide the count data together with the sample annotations (biological groups and sample IDs).

First, define sample annotations

# biological groups
rnf5_group <- c(rep("WT", 5), rep("KO", 5))

# sample ids
rnf5_sampleid <- c("GSM3638131", "GSM3638132", "GSM3638133", "GSM3638134", "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")
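
Before building the count_set, it can help to confirm that the annotation vectors line up. The following is a minimal base R sketch of such a check; it is not part of NanoStringClustR, and the names group_sizes and has_duplicate_ids are introduced here purely for illustration.

```r
# Sample annotations, as defined in the example above
rnf5_group <- c(rep("WT", 5), rep("KO", 5))
rnf5_sampleid <- c("GSM3638131", "GSM3638132", "GSM3638133", "GSM3638134",
                   "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138",
                   "GSM3638139", "GSM3638140")

# Hypothetical sanity checks (illustrative only, not a package function)
# 1. There must be exactly one group label per sample ID
stopifnot(length(rnf5_group) == length(rnf5_sampleid))

# 2. Report whether any sample ID is repeated; repeated IDs are only
#    expected when technical replicates are defined on purpose
has_duplicate_ids <- anyDuplicated(rnf5_sampleid) > 0

# 3. Count the number of samples in each biological group
group_sizes <- table(rnf5_group)
print(group_sizes)
```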

Second, build a count_set

# for this example, we will use the in-package Rnf5 dataset:
rnf5_count_set <- count_set(count_data = Rnf5, 
                            group = rnf5_group,
                            samp_id = rnf5_sampleid)

You can also generate a count_set from a file path to a csv generated by the nSolver RCC Collector Tool Format Export:

# e.g. rnf5_count_set <- count_set(rccexp_dir = "~/path/to/file.csv",
#                                  group = rnf5_group,
#                                  samp_id = rnf5_sampleid)

Adding output_log = "~/Dropbox/NanoStringCountR/NanoStringCountR/raw_data/" will save the se.R. You can then load an existing SummarizedExperiment, either from an se object in R or from the full file path to a saved se.R:

# e.g. rnf5_count_set <- count_set(count_set = rnf5_count_set, 
                                  #group = group, 
                                  #batch = batch, samp_id = samp_id)

# e.g. rnf5_count_set <- count_set(count_se = "~/path/to/se.R")

The count_set can be accessed by functions in the SummarizedExperiment package

rnf5_count_set

Normalisation

multi_norm() performs the following types of normalisation

A. Optional Pre-processing. Choose which pre-processing method you would prefer.

  1. Background correction background_correct
    • select a method for background substitution.
    • options are "mean2sd", "proportional", "none"
  2. Count threshold count_threshold
    • select a count threshold. Options are "mean2sd" of the negative controls or any number from 0 - inf
  3. Positive control scaling positive_control_scaling
    • options are TRUE/FALSE
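
To make the "mean2sd" option more concrete, here is a minimal base R sketch. It assumes that "mean2sd" denotes the mean plus two standard deviations of the negative control counts in each sample, as the name suggests. The toy matrices and the names neg_ctrl, endo, background and below_background are hypothetical, and how multi_norm() then substitutes or filters these counts is not reproduced here.

```r
# Toy negative-control counts: rows = control probes, columns = samples
neg_ctrl <- matrix(c(4, 6, 8, 10,    # sample s1
                     5, 7, 9, 11),   # sample s2
                   nrow = 4,
                   dimnames = list(paste0("NEG_", 1:4), c("s1", "s2")))

# Assumed definition of "mean2sd": per-sample mean + 2 * SD of the negatives
background <- apply(neg_ctrl, 2, function(x) mean(x) + 2 * sd(x))

# Toy endogenous counts for the same two samples
endo <- matrix(c(3, 50, 200,    # sample s1
                 20, 10, 400),  # sample s2
               nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Flag counts at or below the per-sample background level
below_background <- sweep(endo, 2, background, FUN = "<=")
print(round(background, 2))
print(below_background)
```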

B. Count Normalisations. multi_norm() performs all normalizations automatically.

  1. housekeeping_scaled
    • Scaling factor normalisation based on the geometric mean of all housekeeping genes
  2. geNorm_housekeeping
    • Scaling factor normalisation based on the geometric mean of geNorm_n stably expressed housekeeping genes selected by geNorm
  3. all_endogenous_scaled
    • Scaling factor normalisation based on the geometric mean of all endogenous + housekeeping genes
  4. loess
    • Cyclic Loess normalization on all endogenous + housekeeping genes
  5. vsn
    • VSN on all endogenous + housekeeping genes
  6. quantile
    • Quantile normalization on all endogenous + housekeeping genes
  7. ruv
    • RUVIII normalization (if replicates or pseudoreplicates are defined) on all endogenous + housekeeping genes
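
The scaling factor normalisations listed above share one idea, which can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: it assumes the scaling factor for a sample is the average of the per-sample geometric means divided by that sample's geometric mean, and the helper geo_mean and the toy matrices hk and endo are hypothetical.

```r
# Geometric mean of a vector of positive counts
geo_mean <- function(x) exp(mean(log(x)))

# Toy housekeeping counts: rows = housekeeping genes, columns = samples
hk <- matrix(c(100, 400,    # sample s1
               200, 800),   # sample s2
             nrow = 2,
             dimnames = list(c("HK1", "HK2"), c("s1", "s2")))

# Per-sample geometric mean of the housekeeping genes
hk_geo <- apply(hk, 2, geo_mean)

# Assumed scaling factor: average geometric mean / sample geometric mean
scaling_factor <- mean(hk_geo) / hk_geo

# Toy endogenous counts for the same samples
endo <- matrix(c(10, 20,    # sample s1
                 40, 80),   # sample s2
               nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# Multiply every column (sample) by its scaling factor
endo_scaled <- sweep(endo, 2, scaling_factor, FUN = "*")
print(scaling_factor)
print(endo_scaled)
```

Replacing hk with a geNorm-selected subset, or with all endogenous and housekeeping genes, gives the corresponding variants of the same calculation.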

multi_norm() returns a SummarizedExperiment with the normalized counts as assays. Diagnostic plots will be saved if a plot_dir is provided, e.g. plot_dir = "~/full/path/to/my/plots/"

rnf5_count_set_norm <- multi_norm(count_set = rnf5_count_set, 
                                  positive_control_scaling = TRUE, 
                                  background_correct = "mean2sd")

Log2 transformed, normalised data are returned in a normalized count_set as assays and can be accessed here:

#list types of normalisations
names(assays(rnf5_count_set_norm))

#access normalisations
#assays(rnf5_count_set_norm)$housekeeping_scaled

To access log2 transformed counts, use

#assays(rnf5_count_set_norm)$counts

Evaluating normalisations

norm_rank() ranks the normalisation methods using two criteria: the Generalized Dunn Index, a cluster validity index computed between groups, and the summed Relative Log Expression (RLE) variation.

rnf5_eval <- norm_rank(count_set = rnf5_count_set_norm)

norm_rank() returns a data frame of ranks. Normalisations with a lower rank number are rated better.

rnf5_eval
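
The RLE part of the ranking can be illustrated with a short base R sketch. It uses the usual definition of Relative Log Expression (each gene's log expression minus that gene's median across samples). How norm_rank() sums the RLE variation into a rank is not reproduced here, and the toy matrix log_expr is hypothetical.

```r
# Toy log2 expression matrix: rows = genes, columns = samples
log_expr <- matrix(c(5, 7, 9,     # sample s1
                     6, 8, 10,    # sample s2
                     5, 7, 9),    # sample s3
                   nrow = 3,
                   dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# Median log expression of each gene across all samples
gene_median <- apply(log_expr, 1, median)

# RLE: subtract each gene's median from that gene's row
rle <- sweep(log_expr, 1, gene_median, FUN = "-")

# Per-sample summaries: a well-normalised sample has an RLE median
# near zero and a small spread
rle_median <- apply(rle, 2, median)
rle_iqr <- apply(rle, 2, IQR)
print(rle)
print(rle_median)
```

In this toy example sample s2 sits one log2 unit above the others for every gene, which shows up as an RLE median of 1 for s2 and 0 for s1 and s3.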

Differential Gene Expression

multi_diff() performs differential gene expression analysis on all possible pairs of groups defined in the count_set. The threshold for significantly differentially expressed genes is defined by p_cut_off and logFC_cut_off.

rnf5_multi_diff <- multi_diff(count_set = rnf5_count_set_norm, 
                              adj_method = "fdr", 
                              p_cut_off = 0.05, 
                              logFC_cut_off = 0)
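
As a rough illustration of how the two cut-offs interact, the base R sketch below adjusts toy p-values with the "fdr" method and flags genes that pass both thresholds. The data frame res, the helper flag_deg and the use of strict inequalities are assumptions made for this example; multi_diff() itself relies on limma and may apply the thresholds differently.

```r
# Toy per-gene results (hypothetical values)
res <- data.frame(
  gene    = paste0("gene", 1:5),
  logFC   = c(2.1, -1.5, 0.2, 0.8, -0.1),
  p_value = c(0.001, 0.004, 0.02, 0.2, 0.9)
)

# Adjust p-values; "fdr" is the Benjamini-Hochberg method in p.adjust()
res$adj_p <- p.adjust(res$p_value, method = "fdr")

# Hypothetical helper: TRUE for genes passing both cut-offs
flag_deg <- function(res, p_cut_off = 0.05, logFC_cut_off = 0) {
  res$adj_p < p_cut_off & abs(res$logFC) > logFC_cut_off
}

# With logFC_cut_off = 0, only the adjusted p-value filters genes
deg_default <- res$gene[flag_deg(res)]

# A stricter fold-change cut-off removes small effects
deg_strict <- res$gene[flag_deg(res, logFC_cut_off = 0.5)]
print(deg_default)
print(deg_strict)
```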

multi_diff() returns a list whose elements can be accessed as follows:

rnf5_multi_diff$plot_DEG
rnf5_multi_diff$overlap_DEG
rnf5_multi_diff$summary_DEG
#rnf5_multi_diff$results_DEG$NAME_OF_NORM_METHOD e.g.
head(rnf5_multi_diff$results_DEG$housekeeping_scaled$`KO - WT`) 

For more information, see the topTable function from the limma R package.

Paired data

NanoStringClustR supports differential gene expression with pairing, for example an experiment where samples have been taken from the same person before and after treatment. For this example, we will consider each WT and KO sample to be paired.

First, add pairing information to the normalized count_set.

colData(rnf5_count_set_norm)$pair <- as.factor(c("pair1", "pair2", "pair3", "pair4", "pair5",
                                                 "pair1", "pair2", "pair3", "pair4", "pair5"))

Second, run multi_diff with pairing = "paired"

rnf5_multi_diff_paired <- multi_diff(count_set = rnf5_count_set_norm, 
                                     adj_method = "fdr", 
                                     p_cut_off = 0.05, 
                                     logFC_cut_off = 0,
                                     pairing = "paired")
rnf5_multi_diff_paired$plot_DEG
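
To show what pairing means for the underlying linear model, here is a base R sketch of a design matrix with the pair as a blocking factor. This is the common approach for paired designs in limma-style analyses; whether multi_diff() builds exactly this design is an assumption, and the object names group, pair and design are introduced only for this example.

```r
# Group and pairing annotations matching the example above
group <- factor(c(rep("WT", 5), rep("KO", 5)), levels = c("WT", "KO"))
pair <- factor(rep(paste0("pair", 1:5), times = 2))

# Design with pair as a blocking factor: each pair gets its own baseline,
# so the group coefficient reflects the within-pair KO vs WT difference
design <- model.matrix(~ pair + group)

print(colnames(design))
print(dim(design))
```

The groupKO column is the coefficient of interest; the pair columns absorb differences between pairs that would otherwise inflate the residual variance.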

Removing unwanted variation III (RUV-III)

If technical replicates are present, multi_norm will perform RUV normalisation by RUV-III. NanoStringClustR defines technical replicates by the sample ID in the samp_id slot of the count_set object. Technical (or pseudo) replicates should have the same name. For this example, we will consider the first 2 WT samples to be technical replicates. Currently, only one factor of variation is determined (k = 1).

For more information on using RUV-III with NanoString data, see: Molania R, Gagnon-Bartsch JA, Dobrovic A, et al. A new normalization for Nanostring nCounter gene expression data. Nucleic Acids Res. 2019 May 22. doi: 10.1093/nar/gkz433. PubMed PMID: 31114909

First define technical replicates in the un-normalized count_set.

rnf5_count_set$samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134", "GSM3638135",
                            "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")

Run multi_norm with the count_set containing the technical replicate info in the $samp_id slot

rnf5_ruv_count_set_norm <- multi_norm(count_set = rnf5_count_set, 
                                      positive_control_scaling = TRUE, 
                                      background_correct = "mean2sd")
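
The way shared sample IDs define technical replicates can be sketched in base R as a sample-by-replicate-set indicator matrix, which is the kind of mapping RUV-III methods work from. This is an illustration only: how multi_norm() builds its replicate structure internally is an assumption, and the names samp_factor, rep_matrix and replicate_sizes are hypothetical.

```r
# Sample IDs with the first two samples marked as technical replicates
samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134",
             "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138",
             "GSM3638139", "GSM3638140")

# One factor level per unique sample ID, in order of first appearance
samp_factor <- factor(samp_id, levels = unique(samp_id))

# Indicator matrix: rows = samples, columns = replicate sets;
# a 1 means the sample belongs to that replicate set
rep_matrix <- model.matrix(~ 0 + samp_factor)
colnames(rep_matrix) <- levels(samp_factor)

# Replicate sets with more than one member are the technical replicates
replicate_sizes <- colSums(rep_matrix)
print(replicate_sizes[replicate_sizes > 1])
```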

Running norm_rank and multi_diff will now include ruvIII

rnf5_ruv_eval <- norm_rank(count_set = rnf5_ruv_count_set_norm)
rnf5_ruv_eval
rnf5_ruv_multi_diff <- multi_diff(count_set = rnf5_ruv_count_set_norm,
                                  adj_method = "fdr",
                                  p_cut_off = 0.05,
                                  logFC_cut_off = 0,
                                  pairing = "unpaired")

References

When using this package, please cite NanoStringClustR as follows:

citation("NanoStringClustR")

Please also cite all methods used.

If you use multi_norm, cite:

citation("vsn")
citation("affy")
citation("ruv")
citation("preprocessCore")

If you use norm_rank, cite:

citation("clv")

If you use multi_diff, cite:

citation("limma")
citation("UpSetR")

Please also check reference suggestions for each package.



MarthaCooper/NanoStringClustR documentation built on June 25, 2021, 9:41 p.m.