Quantile normalization is one of the most widely used multi-sample normalization tools for the analysis of noisy high-throughput data. Although it was originally developed for gene expression microarrays, it is now used across many different high-throughput applications, including RNAseq and ChIPseq. However, quantile normalization relies on assumptions about the data generation process that are not appropriate in some contexts. Unfortunately, no method exists to check for the appropriateness of these assumptions.

For example, in gene expression analysis we assume that observed differences between the distributions of each sample are due only to technical variation unrelated to biological variation. To normalize the samples, the distributions are forced to be the same. In general, this assumption is justified because only a minority of genes are expected to be differentially expressed between samples. However, if the samples are expected to have a high percentage of global differences, quantile normalization may not be appropriate because it may remove interesting global biological variation.
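
To make "the distributions are forced to be the same" concrete, here is a minimal base-R sketch of quantile normalization. It is an illustration only, not the code used by **quantro** or by any particular normalization package; the function name `quantile_normalize` and the `toy` matrix are made up for this example, and ties are broken by order of appearance rather than averaged.

```r
# Minimal quantile normalization sketch (base R only, illustrative).
# Rows are observations (e.g. genes), columns are samples.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  # Rank of each value within its own column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns at each rank
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Hypothetical toy data: 4 observations, 3 samples
toy <- cbind(s1 = c(5, 2, 3, 4), s2 = c(4, 1, 4, 2), s3 = c(3, 4, 6, 8))
toy_qn <- quantile_normalize(toy)
```

After this step every column of `toy_qn` contains the same set of values, so any global shift between samples, technical or biological, is removed.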

The **quantro** R-package can be used to test, prior to data analysis, whether global normalization methods such as quantile normalization should be applied. Our method uses the raw unprocessed high-throughput data to test for global differences in the distributions across a set of groups.

For help with the **quantro** R-package, there is a vignette available in the /vignettes folder.

The R-package **quantro** can be installed from Bioconductor:

```
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("quantro")
```

After installation, the package can be loaded into R.

```
library(quantro)
```

The main function in the **quantro** package is `quantro()`. The `quantro()` function needs two objects: (1) a data frame containing the samples to test for differences between their distributions, with observations in rows and samples in columns (e.g. let's call it `mySamps`), and (2) a group-level factor passed as `groupFactor` (let's call it `outcome`). The order of this factor variable must match the order of the columns in the `mySamps` object because it contains information about which group each sample is from.
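
As an illustration of the expected shapes, the two inputs might be built as follows. The data here are simulated and purely hypothetical (six samples in two groups of three); only the names `mySamps` and `outcome` come from the text above.

```r
set.seed(1)

# Hypothetical data: 1000 observations (rows) for 6 samples (columns)
n_obs <- 1000
mySamps <- as.data.frame(
  matrix(rnorm(n_obs * 6, mean = 8, sd = 2), nrow = n_obs, ncol = 6)
)
colnames(mySamps) <- paste0("sample", 1:6)

# Group factor: one entry per column of mySamps, in the same order
outcome <- factor(rep(c("groupA", "groupB"), each = 3))

# Sanity check: the factor must line up with the columns
stopifnot(length(outcome) == ncol(mySamps))
```

Keeping the length check next to the construction makes a mismatch between the factor and the columns fail early.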

To run the `quantro()` function:

```
qtest <- quantro(object = mySamps, groupFactor = outcome)
qtest
```

Individual slots can be extracted using accessor methods:

```
summary(qtest)
quantroStat(qtest)
```

When `B` is not equal to 0, a permutation test is performed to assess the statistical significance of the test statistic `quantroStat` from `quantro()`.
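
The general idea of such a permutation test can be sketched in base R. This is a simplified stand-in, not the **quantro** implementation: it uses one scalar summary per sample (hypothetical values in `sample_medians`) and the one-way ANOVA ratio of mean squares, whereas `quantroStat` compares whole distributions. The helper `ms_ratio` and all values below are assumptions made for illustration.

```r
set.seed(42)

# Hypothetical per-sample summaries, three samples per group
sample_medians <- c(7.9, 8.1, 8.0, 9.0, 9.2, 8.9)
groups <- factor(rep(c("groupA", "groupB"), each = 3))

# Ratio of between-group to within-group mean squares
ms_ratio <- function(values, grp) {
  fit <- stats::anova(stats::lm(values ~ grp))
  fit[["Mean Sq"]][1] / fit[["Mean Sq"]][2]
}

# Statistic on the real group labels
observed <- ms_ratio(sample_medians, groups)

# Statistics after shuffling the group labels B times
B <- 200
permuted <- replicate(B, ms_ratio(sample_medians, sample(groups)))

# Proportion of permuted statistics larger than the observed one
p_value <- mean(permuted > observed)
```

A small `p_value` suggests that the between-group differences are larger than what random relabelling of the samples would produce.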

The elements in the object returned by `quantro()` include:

Element | Description
--------|------------
`summary` | A list that contains (1) the number of groups (`nGroups`), (2) the total number of samples (`nTotSamples`), and (3) the number of samples in each group (`nSamplesinGroups`)
`anova` | ANOVA to test if the average medians of the distributions are different across groups
`MSbetween` | mean squared error between groups
`MSwithin` | mean squared error within groups
`quantroStat` | test statistic which is a ratio of the mean squared error between groups of distributions to the mean squared error within groups of distributions
`quantroStatPerm` | If `B` is not equal to 0, then a permutation test was performed to assess the statistical significance of `quantroStat`. These are the test statistics resulting from the permuted samples
`quantroPvalPerm` | If `B` is not equal to 0, then this is the $p$-value associated with the proportion of times the test statistics (`quantroStatPerm`) resulting from the permuted samples were larger than `quantroStat`

There is a second function in the package called `quantroPlot()`, which will plot the results from the permutation testing. The plot is a histogram of the test statistics `quantroStatPerm` from the permuted samples from `quantro()`, and the red line is the observed test statistic `quantroStat` from `quantro()`.

```
qtest <- quantro(object = mySamps, groupFactor = outcome)
quantroPlot(qtest)
```

Additional options in the `quantroPlot()` function include:

Element | Description
--------|------------
`xLab` | the x-axis label
`yLab` | the y-axis label
`mainLab` | title of the histogram
`binWidth` | change the binwidth
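
For readers who want to see what this kind of plot conveys without running the package, a rough base-R approximation is sketched below. It is not how `quantroPlot()` is implemented; `perm_stats` and `obs_stat` are hypothetical stand-ins for `quantroStatPerm` and `quantroStat`, and the labels play the role of `mainLab`, `xLab` and `yLab`.

```r
set.seed(7)

# Hypothetical stand-ins for permuted and observed test statistics
perm_stats <- rf(500, df1 = 2, df2 = 10)
obs_stat <- 6.5

# Histogram of the permuted statistics
h <- hist(
  perm_stats,
  breaks = 30,
  main = "Permutation distribution",
  xlab = "Test statistic from permuted samples",
  ylab = "Count"
)

# Red vertical line marking the observed statistic
abline(v = obs_stat, col = "red", lwd = 2)
```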

Report bugs as issues on the GitHub repository

stephaniehicks/quantro documentation built on Aug. 9, 2019, 5:10 p.m.
