In MarthaCooper/NanoStringClustR: NanoString data normalization and differential gene expression analysis

Introduction to NanoStringClustR

NanoStringClustR enables users to quickly and easily assess the performance of multiple normalisation methods on NanoString nCounter data. It performs nCounter scaling-factor-based normalisations using spike-in controls and housekeeping genes, and provides wrappers for geNorm, variance stabilising normalisation (vsn), cyclic loess, quantile and RUV-III normalisation. A combination of a cluster validity index and Relative Log Expression (RLE) is used to rank normalisations. NanoStringClustR also lets users assess the effect of normalisation on differential gene expression by implementing a wrapper for limma. NanoStringClustR currently supports NanoString nCounter mRNA and miRNA data, although it has only been tested with mRNA data.

NanoStringClustR contains 4 main functions:

count_set(): generate a count_set summarising a NanoString experiment
multi_norm(): perform normalisations and output diagnostic plots
norm_rank(): rank normalisations
multi_diff(): perform differential gene expression analysis of all pairwise combinations on normalised datasets

NanoStringClustR examples

First, install and load the NanoStringClustR library and example dataset. NanoStringClustR uses the R package SummarizedExperiment to hold NanoString count data, so load this package too.

library(NanoStringClustR)
data("Rnf5")

library(SummarizedExperiment)

Generating a count_set

A count_set is a SummarizedExperiment that holds NanoString count data and sample annotations. To build a count_set, provide:

The full path to the csv file generated by the nSolver RCC Collector Tool Format Export.
Sample annotations as vectors or factors that describe the sample IDs, biological groupings, batches (i.e. NanoString chip ID) and pairing, if paired.

First, define sample annotations

# biological groups
rnf5_group <- c(rep("WT", 5), rep("KO", 5))

# sample ids
rnf5_sampleid <- c("GSM3638131", "GSM3638132", "GSM3638133", "GSM3638134", "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")
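
Before building a count_set, it can be worth checking that the annotation vectors line up. The sketch below uses base R only. check_annotations() is a hypothetical helper, not part of NanoStringClustR, and it assumes that annotations are supplied in the same order as the samples in the count data.

```r
# Sample annotations as defined above
rnf5_group <- c(rep("WT", 5), rep("KO", 5))
rnf5_sampleid <- c("GSM3638131", "GSM3638132", "GSM3638133", "GSM3638134",
                   "GSM3638135", "GSM3638136", "GSM3638137", "GSM3638138",
                   "GSM3638139", "GSM3638140")

# Hypothetical helper (not a NanoStringClustR function):
# collects the annotations in one table and checks their lengths agree
check_annotations <- function(group, samp_id) {
  stopifnot(length(group) == length(samp_id))
  data.frame(samp_id = samp_id,
             group = factor(group),
             stringsAsFactors = FALSE)
}

annot <- check_annotations(rnf5_group, rnf5_sampleid)

# number of samples per biological group
table(annot$group)
```

A table like this makes it easy to spot a misplaced label before it is carried into every downstream normalisation.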

Second, build a count_set

# for this example, we will use the in-package Rnf5 dataset:
rnf5_count_set <- count_set(count_data = Rnf5, 
                            group = rnf5_group,
                            samp_id = rnf5_sampleid)

You can also generate a count_set from the file path to a csv generated by the nSolver RCC Collector Tool Format Export:

# e.g. rnf5_count_set <- count_set(rccexp_dir = "~/path/to/file.csv",
#                                  group = rnf5_group,
#                                  samp_id = rnf5_sampleid)

Adding output_log = "~/Dropbox/NanoStringCountR/NanoStringCountR/raw_data/" will save the se.R. You can then load an existing SummarizedExperiment, either from an se object in R or from the full file path to a saved se.R.

# e.g. rnf5_count_set <- count_set(count_set = rnf5_count_set, 
                                  #group = group, 
                                  #batch = batch, samp_id = samp_id)

# e.g. rnf5_count_set <- count_set(count_se = "~/path/to/se.R")

The count_set can be accessed by functions in the SummarizedExperiment package

rnf5_count_set

Normalisation

multi_norm() performs the following types of normalisation

A. Optional Pre-processing. Choose which pre-processing method you would prefer.

Background correction background_correct
- select a method for background substitution.
- options are "mean2sd", "proportional", "none"
Count threshold count_threshold
- select a count threshold. Options are "mean2sd" of the negative controls or any number from 0 - inf
Positive control scaling positive_control_scaling
- options are TRUE/FALSE
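
As an illustration of the "mean2sd" option, the sketch below computes a per-sample background level as the mean plus two standard deviations of the negative controls. The toy matrices are made up, and the correction step (subtracting the background and flooring at 1) is an assumption chosen for illustration; multi_norm() may handle counts below background differently.

```r
# Toy negative-control counts: rows = negative control probes, columns = samples
neg_ctrl <- matrix(c(4, 6, 8, 6,
                     10, 12, 14, 12),
                   nrow = 4,
                   dimnames = list(paste0("NEG_", 1:4), c("s1", "s2")))

# Toy endogenous counts: rows = genes, columns = samples
endo <- matrix(c(5, 50, 500,
                 12, 40, 300),
               nrow = 3,
               dimnames = list(c("GeneA", "GeneB", "GeneC"), c("s1", "s2")))

# Per-sample background: mean + 2 * sd of the negative controls
bg <- apply(neg_ctrl, 2, function(x) mean(x) + 2 * sd(x))

# Assumed correction: subtract the background from each sample (column)
# and floor at 1 so the counts can still be log2 transformed
corrected <- pmax(sweep(endo, 2, bg, "-"), 1)

bg
corrected
```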

B. Count Normalisations. multi_norm() performs all normalizations automatically.

housekeeping_scaled
- Scaling factor normalisation based on the geometric mean of all housekeeping genes
geNorm_housekeeping
- Scaling factor normalisation based on the geometric mean of geNorm_n stably expressed housekeeping genes selected by geNorm
all_endogenous_scaled
- Scaling factor normalisation based on the geometric mean of all endogenous + housekeeping genes
loess
- Cyclic Loess normalization on all endogenous + housekeeping genes
vsn
- VSN on all endogenous + housekeeping genes
quantile
- Quantile normalization on all endogenous + housekeeping genes
ruv
- RUVIII normalization (if replicates or pseudoreplicates are defined) on all endogenous + housekeeping genes
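
To make the scaling factor idea concrete, here is a minimal base R sketch of housekeeping-based scaling on a toy count matrix. It assumes the reference level is the arithmetic mean of the per-sample geometric means; whether multi_norm() uses exactly this reference is not confirmed here, and the gene names are hypothetical.

```r
# Toy counts: rows = genes, columns = samples (s2 is sequenced twice as deep)
counts <- matrix(c(100, 200, 50, 400,
                   200, 400, 100, 800),
                 nrow = 4,
                 dimnames = list(c("HK1", "HK2", "GeneA", "GeneB"),
                                 c("s1", "s2")))
hk <- c("HK1", "HK2")  # hypothetical housekeeping genes

# Geometric mean of strictly positive values
geo_mean <- function(x) exp(mean(log(x)))

# Geometric mean of the housekeeping genes in each sample
hk_geo <- apply(counts[hk, , drop = FALSE], 2, geo_mean)

# Scaling factor: reference level divided by each sample's own level
scale_factor <- mean(hk_geo) / hk_geo

# Multiply every column (sample) by its scaling factor
scaled <- sweep(counts, 2, scale_factor, "*")
scaled
```

After scaling, the housekeeping geometric mean is identical in every sample, which is the property this family of normalisations enforces.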

multi_norm() returns a SummarizedExperiment with the normalised counts as assays. Diagnostic plots will be saved if a plot_dir is provided, e.g. plot_dir = "~/full/path/to/my/plots/"

rnf5_count_set_norm <- multi_norm(count_set = rnf5_count_set, 
                                  positive_control_scaling = TRUE, 
                                  background_correct = "mean2sd")

Log2 transformed, normalised data are returned in a normalized count_set as assays and can be accessed here:

#list types of normalisations
names(assays(rnf5_count_set_norm))

#access normalisations
#assays(rnf5_count_set_norm)$housekeeping_scaled

To access log2 transformed counts, use

#assays(rnf5_count_set_norm)$counts

Evaluating normalisations

norm_rank() performs cluster based ranking of normalisation methods using Generalized Dunn Index between groups and sum RLE variation.

rnf5_eval <- norm_rank(count_set = rnf5_count_set_norm)
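
For intuition about the cluster validity part of the ranking, the sketch below computes the classic Dunn index, the simplest member of the generalised family, in base R. It is not the exact variant used by norm_rank(); dunn_index() is a hypothetical helper and the data are made up.

```r
# Classic Dunn index: smallest distance between samples of different groups
# divided by the largest distance between samples of the same group
dunn_index <- function(x, groups) {
  d <- as.matrix(dist(x))                 # Euclidean distances between rows
  same <- outer(groups, groups, "==")     # TRUE where two samples share a group
  off_diag <- row(d) != col(d)            # ignore a sample's distance to itself
  min(d[!same]) / max(d[same & off_diag])
}

# Toy data: rows = samples, columns = two expression features
x <- rbind(c(0, 0), c(0, 1), c(10, 0), c(10, 1))
groups <- c("WT", "WT", "KO", "KO")

dunn_index(x, groups)  # → 10
```

A higher value means the biological groups are compact and well separated, which is what a good normalisation should preserve.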

norm_rank() returns a data frame of ranks. Normalisations with a lower rank are rated better.

rnf5_eval
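
The RLE part of the ranking can be illustrated with a short base R sketch. RLE values are obtained by subtracting each gene's median across samples from its log expression. Summarising the spread per sample with the interquartile range and summing across samples, as done here, is an assumption about what "sum RLE variation" means, not a description of norm_rank() internals.

```r
set.seed(1)

# Toy log2 expression matrix: rows = genes, columns = samples
log_expr <- matrix(rnorm(50, mean = 8),
                   nrow = 10,
                   dimnames = list(paste0("gene", 1:10), paste0("s", 1:5)))

# Relative Log Expression: centre each gene (row) on its median across samples
gene_median <- apply(log_expr, 1, median)
rle <- sweep(log_expr, 1, gene_median, "-")

# Per-sample summaries: a well-normalised sample has an RLE median near 0
# and a small spread
rle_median <- apply(rle, 2, median)
rle_iqr <- apply(rle, 2, IQR)

# Assumed overall score: total spread across samples (lower is better)
rle_score <- sum(rle_iqr)
rle_score
```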

Differential Gene Expression

multi_diff() performs differential gene expression analysis on all possible pairs of groups defined in the count_set. The threshold for significantly differentially expressed genes is defined by p_cut_off and logFC_cut_off.

rnf5_multi_diff <- multi_diff(count_set = rnf5_count_set_norm, 
                              adj_method = "fdr", 
                              p_cut_off = 0.05, 
                              logFC_cut_off = 0)
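
The two cut-offs can be thought of as a filter on a limma-style results table. The sketch below applies them to a toy data frame that uses the logFC and adj.P.Val column names from limma's topTable. The strict inequalities and the use of the absolute log fold change are assumptions, and logFC_cut_off is set to 1 here only to make the filter visible.

```r
# Toy results table with topTable-style column names
res <- data.frame(logFC = c(1.5, -0.2, 0.8, -2.1),
                  adj.P.Val = c(0.01, 0.20, 0.04, 0.001),
                  row.names = c("GeneA", "GeneB", "GeneC", "GeneD"))

p_cut_off <- 0.05
logFC_cut_off <- 1  # illustrative value; the call above uses 0

# Assumed rule: significant adjusted p-value and large enough fold change
is_deg <- res$adj.P.Val < p_cut_off & abs(res$logFC) > logFC_cut_off

rownames(res)[is_deg]  # → "GeneA" "GeneD"
```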

multi_diff() will return a list with:

A plot comparing numbers of differentially expressed genes per normalisation method

rnf5_multi_diff$plot_DEG

UpSet plots showing the overlap of differentially expressed genes per normalisation method. There will be one UpSet plot per pairwise comparison (contrast).

rnf5_multi_diff$overlap_DEG

A summary table showing the number of differentially expressed genes per normalization method for each pairwise comparison.

rnf5_multi_diff$summary_DEG
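
What the summary table and UpSet plots convey can be sketched with plain set operations. The gene lists below are invented for illustration and do not come from the Rnf5 data; only the normalisation names are taken from the list above.

```r
# Hypothetical differentially expressed genes per normalisation method
deg_by_method <- list(
  housekeeping_scaled = c("GeneA", "GeneB", "GeneC"),
  quantile            = c("GeneB", "GeneC", "GeneD"),
  vsn                 = c("GeneC")
)

# Number of differentially expressed genes per method
n_deg <- lengths(deg_by_method)

# Genes called by every method
shared_all <- Reduce(intersect, deg_by_method)

# Membership matrix (genes x methods), the kind of input an UpSet plot summarises
all_genes <- sort(unique(unlist(deg_by_method)))
membership <- sapply(deg_by_method, function(g) all_genes %in% g)
rownames(membership) <- all_genes

n_deg
shared_all  # → "GeneC"
membership
```

Genes that are called under every normalisation are less likely to be artefacts of one particular method.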

To access full differential gene expression results, use

#rnf5_multi_diff$results_DEG$NAME_OF_NORM_METHOD e.g.
head(rnf5_multi_diff$results_DEG$housekeeping_scaled$`KO - WT`)

For more information, see the topTable function from the limma R package.

Paired data

NanoStringClustR supports differential gene expression with pairing, for example an experiment where samples have been taken from the same person before and after treatment. For this example, we will consider each WT and KO sample to be paired.

First, add pairing information to the normalized count_set.

colData(rnf5_count_set_norm)$pair <- as.factor(c("pair1", "pair2", "pair3", "pair4", "pair5",
                                                 "pair1", "pair2", "pair3", "pair4", "pair5"))

Second, run multi_diff with pairing = "paired"

rnf5_multi_diff_paired <- multi_diff(count_set = rnf5_count_set_norm, 
                                     adj_method = "fdr", 
                                     p_cut_off = 0.05, 
                                     logFC_cut_off = 0,
                                     pairing = "paired")

rnf5_multi_diff_paired$plot_DEG
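
Before running a paired analysis, it can help to confirm that every pair contains exactly one sample from each group. The base R sketch below checks this for the pairing used above and builds the kind of design matrix that is conventional for paired comparisons in limma (pair as a blocking factor). How multi_diff() constructs its design internally is not shown in this vignette, so the design here is an assumption.

```r
group <- factor(c(rep("WT", 5), rep("KO", 5)), levels = c("WT", "KO"))
pair <- factor(c("pair1", "pair2", "pair3", "pair4", "pair5",
                 "pair1", "pair2", "pair3", "pair4", "pair5"))

# Cross-tabulate pairs against groups: every cell should be 1
pair_table <- table(pair, group)
pair_table
all(pair_table == 1)

# Conventional paired design: one coefficient per pair plus the group effect
design <- model.matrix(~ pair + group)
colnames(design)
```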

Removing unwanted variation III (RUV-III)

If technical replicates are present, multi_norm will perform RUV normalisation by RUV-III. NanoStringClustR defines technical replicates by the sample ID in the samp_id slot of the count_set object. Technical (or pseudo) replicates should have the same name. For this example, we will consider the first 2 WT samples to be technical replicates. Currently, only one factor of variation is determined (k = 1).

For more information on using RUV-III with NanoString data, see: Molania R, Gagnon-Bartsch JA, Dobrovic A, et al. A new normalization for Nanostring nCounter gene expression data. Nucleic Acids Res. 2019 May 22. doi: 10.1093/nar/gkz433. PubMed PMID: 31114909

First define technical replicates in the un-normalized count_set.

rnf5_count_set$samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134", "GSM3638135",
                            "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")

Run multi_norm with the count_set containing the technical replicate info in the $samp_id slot

rnf5_ruv_count_set_norm <- multi_norm(count_set = rnf5_count_set, 
                                      positive_control_scaling = TRUE, 
                                      background_correct = "mean2sd")
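
The replicate structure encoded in samp_id can be inspected with base R. The sketch below finds the replicated IDs and builds a samples-by-unique-ID indicator matrix, which is the kind of replicate mapping that RUV-III works from. It is an illustration only; multi_norm() derives this information itself.

```r
samp_id <- c("techrep_1", "techrep_1", "GSM3638133", "GSM3638134", "GSM3638135",
             "GSM3638136", "GSM3638137", "GSM3638138", "GSM3638139", "GSM3638140")

# IDs that occur more than once are treated as technical replicates
rep_counts <- table(samp_id)
replicated_ids <- names(rep_counts)[rep_counts > 1]
replicated_ids  # → "techrep_1"

# Indicator matrix: one row per sample, one column per unique ID,
# with a 1 where the sample carries that ID
ids <- unique(samp_id)
M <- outer(samp_id, ids, "==") * 1
colnames(M) <- ids

dim(M)
colSums(M)
```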

Running norm_rank and multi_diff will now include ruvIII

rnf5_ruv_eval <- norm_rank(count_set = rnf5_ruv_count_set_norm)
rnf5_ruv_eval

rnf5_ruv_multi_diff <- multi_diff(count_set = rnf5_ruv_count_set_norm, 
                                  adj_method = "fdr", 
                                  p_cut_off = 0.05, 
                                  logFC_cut_off = 0,
                                  pairing = "unpaired")

References

When using this package, please cite NanoStringClustR as follows:

citation("NanoStringClustR")

Please also cite all methods used.

If you use multi_norm, cite:

citation("vsn")
citation("affy")
citation("ruv")
citation("preprocessCore")

If you use norm_rank, cite:

citation("clv")

If you use multi_diff, cite:

citation("limma")
citation("UpSetR")

Please also check reference suggestions for each package.

MarthaCooper/NanoStringClustR documentation built on June 25, 2021, 9:41 p.m.
