tQN: Perform tQN normalization of intensity data.


Description

Perform tQN normalization of intensity data.

Usage

tQN(gty, thresholds = c(1.5, 1.5), clusters = NULL, prenorm = TRUE,
  xynorm = TRUE, adjust.lrr = TRUE, ...)

Arguments

gty

a genotypes object

thresholds

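numeric vector of length 2 giving the thresholds for scaling the x- and y-intensities, which limit the ratio of normalized to raw values (see Details); the defaults are those recommended in Staaf et al. (2008)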

clusters

a pre-computed data frame of cluster centroids, one row per marker (see Details)

prenorm

logical; if TRUE, perform quantile normalization on whole dataset before tQN procedure

xynorm

logical; if TRUE, perform within-sample normalization of x- vs y-intensities

adjust.lrr

logical; if TRUE, normalize post-tQN LRRs against population mean (clusters$Rmean)

...

ignored

Details

Implements thresholded quantile normalization (tQN) as described in Staaf et al. (2008). Quantile normalization, as originally described in Bolstad et al. (2003), matches quantiles across multiple samples so that the intensities of all samples share the same empirical distribution. tQN instead matches the quantiles of the x- and y-intensities within each sample, in order to reduce noise in the B-allele frequency (BAF) calculation proposed by Peiffer et al. (2006). The quantile-normalized intensities are then subjected to a threshold that limits the ratio between the transformed and raw values. NB: the quality of the tQN result depends strongly on the reference clusters supplied in clusters, so choose them with care.
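The base-R sketch below illustrates the two kinds of quantile matching described above. quantile_normalize() is a plain across-sample quantile normalization in the style of Bolstad et al. (2003), roughly the kind of step that prenorm = TRUE refers to; tqn_sketch() matches one sample's x- and y-intensities against each other and then caps each normalized value at threshold times its raw value. Both function names are hypothetical, ties are broken by order of appearance, missing values are not handled, and the one-sided form of the cap is an assumption of this sketch rather than a description of the argyle internals.

```r
# Across-sample quantile normalization (Bolstad et al. 2003 style):
# each column of `mat` is mapped onto the mean of the sorted columns,
# so every column ends up with the same empirical distribution.
# Ties are broken by order of appearance; NAs are not handled.
quantile_normalize <- function(mat) {
  mat <- as.matrix(mat)
  target <- rowMeans(apply(mat, 2, sort))   # reference distribution
  out <- apply(mat, 2, function(col) target[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(mat)
  out
}

# Hypothetical within-sample tQN step: quantile-normalize one sample's
# x- and y-intensities against each other, then cap each normalized
# value at threshold * raw value (one-sided cap assumed for this sketch).
tqn_sketch <- function(x, y, thresholds = c(1.5, 1.5)) {
  qn <- quantile_normalize(cbind(x, y))
  list(x = pmin(qn[, 1], thresholds[1] * x),
       y = pmin(qn[, 2], thresholds[2] * y))
}

# Toy sample in which y is systematically 10x brighter than x
res <- tqn_sketch(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
res$x
res$y
```

In this toy sample the cap is what determines the normalized x-intensities, because plain quantile matching would have scaled them up by more than 1.5-fold; the y-intensities are scaled down and so are left as quantile-matched.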

The object clusters should be a data frame with one row per marker and at least the following six columns: A.R and A.T, the values of R and theta, respectively, at the centroid of the AA homozygous cluster; B.R and B.T, likewise for the BB homozygous cluster; and H.R and H.T, likewise for the AB heterozygous cluster.
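To make the expected layout of clusters concrete, the sketch below builds a toy two-marker clusters data frame (all values invented) and computes R and theta from raw intensities with the usual Illumina polar transform, R = x + y and theta = (2/pi) * atan(y/x). It then derives the BAF by linear interpolation of theta between the A.T, H.T and B.T centroids, clamped to [0, 1], following the scheme of Peiffer et al. (2006). The helpers to_polar() and baf_from_theta() are hypothetical and assume the centroids are ordered A.T < H.T < B.T at every marker.

```r
# Toy clusters table (hypothetical values): one row per marker, with R
# and theta at the AA (A.*), AB (H.*) and BB (B.*) cluster centroids
clusters <- data.frame(
  A.R = c(1.0, 1.2), A.T = c(0.05, 0.10),
  H.R = c(1.1, 1.0), H.T = c(0.50, 0.45),
  B.R = c(0.9, 1.1), B.T = c(0.95, 0.90),
  row.names = c("marker1", "marker2")
)

# Polar transform of raw intensities: R = x + y and
# theta = (2 / pi) * atan(y / x), written with atan2() so x = 0 is safe
to_polar <- function(x, y) {
  list(R = x + y, theta = (2 / pi) * atan2(y, x))
}

# BAF by linear interpolation of theta between the centroids:
# 0 at A.T, 0.5 at H.T, 1 at B.T, clamped to [0, 1] outside that range.
# Assumes A.T < H.T < B.T at every marker.
baf_from_theta <- function(theta, cl) {
  baf <- ifelse(theta < cl$H.T,
                0.5 * (theta - cl$A.T) / (cl$H.T - cl$A.T),
                0.5 + 0.5 * (theta - cl$H.T) / (cl$B.T - cl$H.T))
  pmin(pmax(baf, 0), 1)
}

# One sample's raw intensities at the two markers
p <- to_polar(x = c(0.5, 0.6), y = c(0.5, 0.05))
baf_from_theta(p$theta, clusters)
```

Because the interpolation is anchored on the cluster centroids, a poorly placed centroid shifts every BAF value at that marker, which is why the choice of reference clusters matters so much.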

The transformations proposed by Peiffer et al. (2006) assume that, at each marker, most samples fall into three well-defined clusters, apart from a relatively small proportion of aberrantly hybridizing samples; indeed, the BAF is well-defined only in this case. These assumptions are too restrictive for many arrays, in particular for arrays that include copy-number probes. Setting adjust.lrr = TRUE may yield a tighter distribution of LRR values, because the LRRs are then re-computed against a reference value that is independent of the BAF and of the underlying clustering pattern. This requires an additional column, Rmean, in clusters, giving the "mean" (or another appropriately chosen central value) of R across *all* possible clusters at each marker.
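The difference that adjust.lrr makes can be sketched as follows. In the conventional LRR, the expected R at a marker is read off by linear interpolation in theta between the three cluster centroids, and LRR = log2(R / R_expected); with a reference column Rmean, the observed R is instead compared with a single BAF-independent value, log2(R / Rmean). The helper names, the toy values, and the choice to hold the expected R constant beyond the homozygous centroids (approx(rule = 2)) are assumptions of this sketch, not a description of what tQN() does internally.

```r
# Toy clusters (hypothetical values) with the extra Rmean column
clusters <- data.frame(
  A.R = c(1.0, 1.2), A.T = c(0.05, 0.10),
  H.R = c(1.1, 1.0), H.T = c(0.50, 0.45),
  B.R = c(0.9, 1.1), B.T = c(0.95, 0.90),
  Rmean = c(1.0, 1.2),
  row.names = c("marker1", "marker2")
)

# Expected R at each marker: piecewise-linear interpolation in theta
# between the AA, AB and BB centroids; rule = 2 holds the end values
# constant outside the homozygous centroids (an assumption here).
expected_r <- function(theta, cl) {
  vapply(seq_along(theta), function(i) {
    approx(x = c(cl$A.T[i], cl$H.T[i], cl$B.T[i]),
           y = c(cl$A.R[i], cl$H.R[i], cl$B.R[i]),
           xout = theta[i], rule = 2)$y
  }, numeric(1))
}

# Conventional LRR: observed R relative to the cluster-based expectation
lrr_standard <- function(r, theta, cl) log2(r / expected_r(theta, cl))

# Rmean-based LRR: observed R relative to one central value per marker,
# independent of the BAF and of the clustering pattern
lrr_adjusted <- function(r, cl) log2(r / cl$Rmean)

# Observed R and theta for one sample at the two markers
r <- c(1.1, 0.6)
theta <- c(0.50, 0.05)
lrr_standard(r, theta, clusters)
lrr_adjusted(r, clusters)
```

The conventional LRR inherits any error in the centroid positions through the interpolation, whereas the Rmean-based version depends only on one summary value per marker.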

Value

A copy of the input object, with the raw intensities replaced by the normalized ones. Two additional attributes, baf and lrr, store the BAF (B-allele frequency) and LRR (log R ratio, i.e. the log2 ratio of observed to expected intensity R).

References

Adapted from code provided by Johan Staaf to John Didion.

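Staaf J et al. (2008) Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios. BMC Bioinformatics 9: 409. doi:10.1186/1471-2105-9-409.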

Bolstad BM et al. (2003) A comparison of normalization methods for high density oligonucleotide array data based on bias and variance. Bioinformatics 19(2): 185-193.

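Peiffer DA et al. (2006) High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 16(9): 1136-1148. doi:10.1101/gr.5402306.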

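Didion JP et al. (2014) Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias. BMC Genomics 15: 847. doi:10.1186/1471-2164-15-847.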


andrewparkermorgan/argyle documentation built on May 10, 2019, 11:08 a.m.