Description Methods References See Also Examples
Correct gc biases in array data for arrays stored as objects of an acceptable class. The user provides the array data and a range of windows and these methods will format the data and call gcCorrectMain. This function computes the tv score for each window and corrects the arrays based upon the window with the highest tv score (i.e with window showing the most gc bias)
gcCorrect(object, ...)
:
Methods have been defined for objects of class matrix
,
BeadStudioSet
, BafLrrSet
, and BafLrrSetList
.
Objects of class BafLrrSetList
are containers for log R
ratios and B allele frequencies (BAFs) stored by chromosome. In
particular, each element in this list class contains BAFs and log
R ratios as assayData
and genomic annotation of the markers
(featureData
) for one chromosome.
The gcCorrect
method for objects of this class extracts
the log R ratios for all elements (chromosomes) in the list, and
then combines these into a single matrix. If the log R ratios
are stored as ff
objects, the samples will be
GC-corrected in chunks determined by the function
ocSamples()
. For example, if object
contains 100
samples and ocSamples(10)
was specified prior to calling
gcCorrect
, the wave correction will be performed on 10
samples at a time in order to keep RAM at an acceptable level.
When processing large numbers of samples, it can be helpful to
evaluate the tv score on a subset of the samples, pick one or
two windows for the correction, and precompute the GC content
for these windows. See the examples below for the implementation
with BafLrrSetList
objects for details. When the assay
data is represented as ff
objects, the gcCorrect
method returns the value NULL
. Note that the assay data
stored on disk will have changed as a result of calling this
method.
When object
is a matrix
of log R ratios, the
gcCorrect
method returns a matrix of wave-adjusted log R
ratios.
Additional arguments are passed to gcCorrectMain
through
the ...
operator.
Benjamini Y, Speed TP. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 2012;40:e72
BeadStudioSet
,
BafLrrSetList
, BafLrrSet
gcCorrectMain
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 | ## Example with 2 iterations of correction
path <- system.file("extdata", package="ArrayTV")
load(file.path(path, "array_logratios.rda"))
nimblegen[, "M"] <- nimblegen[, "M"]/1000
increms <- c(10,1000,100e3)
wins <- c(100,10e3,1e6)
if(require(doParallel)){
## nodes may be set to the number of cpus available
nodes<-3
cl <- makeCluster(nodes)
registerDoParallel(cl)
}
## calculate tv scores in many windows and correct M values using the best window
nimcM1List <- gcCorrectMain(nimblegen[, "M",
drop=FALSE],
chr=rep("chr15",
nrow(nimblegen)),
starts=nimblegen[,"position"],
increms=increms,
maxwins=wins,build='hg18')
tvScores <- nimcM1List[['tvScore']]
winsorted <- as.numeric(rownames(tvScores)[order(tvScores,decreasing=TRUE)])
logwinsorted <- log(as.numeric(winsorted),10)
logwinsortdiff <- abs(logwinsorted[1]-logwinsorted)
## correct M values a 2nd time using a different window size
nimcM2List <- gcCorrect(nimcM1List[['correctedVals']],
chr=rep("chr15",
nrow(nimblegen)),
starts=nimblegen[,"position"],
maxwins=winsorted[logwinsortdiff>=1][1],
build='hg18')
## Refer to the vignette for details
## Example using a list container (BafLrrSetList) containing log R
## ratios and B allele frequencies
if(require(crlmm) && require(ff)){
data(cnSetExample, package="crlmm")
brList <- BafLrrSetList(cnSetExample)
is(lrr(brList)[[1]], "ff_matrix")
rold <- lrr(brList)[[1]][, 1]/100
##
## To avoid having too much data in RAM it might be
## useful to process the samples n at a time
##
## The number of samples to be processed at a time is
## set by the ocSamples function. For example, to
## process 20 samples at a time one would do
ocSamples(20)
##
## When assay data elements are ff objects, the wave
## corrected values are updated on disk. Currently,
## the data is stored is here:
filename(lrr(brList)[[1]])
##
## To avoid permanently changing the log R ratio
## values for the brList object, we copy the ff files
## to a different path and create a new BafLrrSetList
## object. This is done by the non-exported function
## "duplicateBLList". The path for the new ff objects
## is given by the function ldPath(). Here, we use a
## temp directory
ldPath(tempdir())
brList.copy <- oligoClasses:::duplicateBLList(brList, ids=sampleNames(brList))
filename(lrr(brList.copy)[[1]])
##
## wave correct the log R ratios
##
gcCorrect(brList.copy, increms=c(12, 10e3),
maxwins=c(12, 10e3))
##
##
rupdated <- lrr(brList.copy)[[1]][, 1]/100
## note that rold and rupdated are no longer the same
## since the log R ratios were updated in the
## brList.copy container
plot(rupdated, rold, pch=".")
##
## Remarks on efficiency
##
## If a large number of samples are to be processed,
## the most efficient procedure is to settle on an
## appropriate window size for gc correction using a
## subset of the dataset (e.g., 20 samples). See the
## vignette for details. Here, we assume that two
## windows were already selected using such a
## procedure (here, 12 bp and 10,000 bp) and the goal
## is to efficiently do wave correction using these
## two windows for all the samples in a large study.
## Currently, our recommended approach is to first
## calculate the gc content for these windows:
gc.matrix <- ArrayTV:::computeGC(brList.copy, c(12, 10e3),
c(12, 10e3))
## The number of columns in gc.matrix will correspond
## to the number of windows
ncol(gc.matrix)
## Having calculated the gc content for these two
## windows, we pass the gc matrix to the method
## gcCorrect. This function will iteratively update
## the log R ratios for the gc content given by the
## columns in gc.matrix (the log R ratios are updated
## for each column in gc.matrix).
ArrayTV:::gcCorrect(brList.copy, increms=c(12, 10e3),
maxwins=c(12, 10e3),
providedGC=gc.matrix)
stopCluster(cl)
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.