gcCorrect-methods: Methods for 'gcCorrect' in Package 'ArrayTV'

Description Methods References See Also Examples

Description

Correct gc biases in array data for arrays stored as objects of an acceptable class. The user provides the array data and a range of windows and these methods will format the data and call gcCorrectMain. This function computes the tv score for each window and corrects the arrays based upon the window with the highest tv score (i.e with window showing the most gc bias)

Methods

gcCorrect(object, ...):

Methods have been defined for objects of class matrix, BeadStudioSet, BafLrrSet, and BafLrrSetList.

Objects of class BafLrrSetList are containers for log R ratios and B allele frequencies (BAFs) stored by chromosome. In particular, each element in this list class contains BAFs and log R ratios as assayData and genomic annotation of the markers (featureData) for one chromosome.

The gcCorrect method for objects of this class extracts the log R ratios for all elements (chromosomes) in the list, and then combines these into a single matrix. If the log R ratios are stored as ff objects, the samples will be GC-corrected in chunks determined by the function ocSamples(). For example, if object contains 100 samples and ocSamples(10) was specified prior to calling gcCorrect, the wave correction will be performed on 10 samples at a time in order to keep RAM at an acceptable level. When processing large numbers of samples, it can be helpful to evaluate the tv score on a subset of the samples, pick one or two windows for the correction, and precompute the GC content for these windows. See the examples below for the implementation with BafLrrSetList objects for details. When the assay data is represented as ff objects, the gcCorrect method returns the value NULL. Note that the assay data stored on disk will have changed as a result of calling this method.

When object is a matrix of log R ratios, the gcCorrect method returns a matrix of wave-adjusted log R ratios.

Additional arguments are passed to gcCorrectMain through the ... operator.

References

Benjamini Y, Speed TP. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 2012;40:e72

See Also

BeadStudioSet, BafLrrSetList, BafLrrSet gcCorrectMain

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
        ## Example with 2 iterations of correction
	path <- system.file("extdata", package="ArrayTV")
	load(file.path(path, "array_logratios.rda"))
	nimblegen[, "M"] <- nimblegen[, "M"]/1000
	increms <- c(10,1000,100e3)
	wins <- c(100,10e3,1e6)
        if(require(doParallel)){
	  ## nodes may be set to the number of cpus available
          nodes<-3
          cl <- makeCluster(nodes)
          registerDoParallel(cl)
        }

	## calculate tv scores in many windows and correct M values using the best window
	nimcM1List <- gcCorrectMain(nimblegen[, "M",
										drop=FALSE],
								chr=rep("chr15",
								nrow(nimblegen)),
								starts=nimblegen[,"position"],
								increms=increms,
								maxwins=wins,build='hg18')
	tvScores <- nimcM1List[['tvScore']]
	winsorted <- as.numeric(rownames(tvScores)[order(tvScores,decreasing=TRUE)])
	logwinsorted <- log(as.numeric(winsorted),10)
	logwinsortdiff <- abs(logwinsorted[1]-logwinsorted)
	## correct M values a 2nd time using a different window size
	nimcM2List <- gcCorrect(nimcM1List[['correctedVals']],
							chr=rep("chr15",
							nrow(nimblegen)),
							starts=nimblegen[,"position"],
							maxwins=winsorted[logwinsortdiff>=1][1],
							build='hg18')
	## Refer to the vignette for details

	## Example using a list container (BafLrrSetList) containing log R
	## ratios and B allele frequencies

	if(require(crlmm) && require(ff)){
			data(cnSetExample, package="crlmm")
			brList <- BafLrrSetList(cnSetExample)
			is(lrr(brList)[[1]], "ff_matrix")
			rold <- lrr(brList)[[1]][, 1]/100
			##
			## To avoid having too much data in RAM it might be
			## useful to process the samples n at a time
			##
			##  The number of samples to be processed at a time is
			##  set by the ocSamples function.  For example, to
			##  process 20 samples at a time one would do
			ocSamples(20)
			##
			## When assay data elements are ff objects, the wave
			## corrected values are updated on disk.  Currently,
			## the  data is stored is here:
			filename(lrr(brList)[[1]])
			##
			## To avoid permanently changing the log R ratio
			## values for the brList object, we copy the ff files
			## to a different path and create a new BafLrrSetList
			## object.  This is done by the non-exported function
			## "duplicateBLList". The path for the new ff objects
			## is given by the function ldPath().  Here, we use a
			## temp directory
			ldPath(tempdir())
			brList.copy <- oligoClasses:::duplicateBLList(brList, ids=sampleNames(brList))
			filename(lrr(brList.copy)[[1]])
			##
			## wave correct the log R ratios
			##
			gcCorrect(brList.copy, increms=c(12, 10e3),
					  maxwins=c(12, 10e3))
			##
			##
			rupdated <- lrr(brList.copy)[[1]][, 1]/100
			## note that rold and rupdated are no longer the same
			## since the log R ratios were updated in the
			## brList.copy container
			plot(rupdated, rold, pch=".")
			##
			## Remarks on efficiency
			##
			## If a large number of samples are to be processed,
			## the most efficient procedure is to settle on an
			## appropriate window size for gc correction using a
			## subset of the dataset (e.g., 20 samples).  See the
			## vignette for details.  Here, we assume that two
			## windows were already selected using such a
			## procedure (here, 12 bp and 10,000 bp) and the goal
			## is to efficiently do wave correction using these
			## two windows for all the samples in a large study.
			## Currently, our recommended approach is to first
			## calculate the gc content for these windows:
			gc.matrix <- ArrayTV:::computeGC(brList.copy, c(12, 10e3),
											 c(12, 10e3))
			## The number of columns in gc.matrix will correspond
			## to the number of windows
			ncol(gc.matrix)
			## Having calculated the gc content for these two
			## windows, we pass the gc matrix to the method
			## gcCorrect.  This function will iteratively update
			## the log R ratios for the gc content given by the
			## columns in gc.matrix (the log R ratios are updated
			## for each column in gc.matrix).
			ArrayTV:::gcCorrect(brList.copy, increms=c(12, 10e3),
								maxwins=c(12, 10e3),
								providedGC=gc.matrix)
                        stopCluster(cl)
}

ArrayTV documentation built on April 28, 2020, 9:02 p.m.