CNpreprocessing | R Documentation |
The package evaluates DNA copy number data, using both their initial form (copy number as a noisy function of genomic position) and their approximation by a piecewise-constant function (segmentation), for the purpose of identifying genomic regions where the copy number differs from the norm.
CNpreprocessing(
  segall, ratall = NULL, idCol = NULL, startCol = NULL, endCol = NULL,
  medCol = NULL, madCol = NULL, errorCol = NULL, chromCol = NULL,
  bpStartCol = NULL, bpEndCol = NULL, annot = NULL, annotStartCol = NULL,
  annotEndCol = NULL, annotChromCol = NULL, useEnd = FALSE, blsize = NULL,
  minJoin = NULL, nTrial = 10, bestBIC = -1e+07, modelNames = "E",
  cWeight = NULL, bsTimes = NULL, chromRange = NULL, nJobs = 1,
  normalLength = NULL, normalMedian = NULL, normalMad = NULL,
  normalError = NULL
)
segall |
a |
ratall |
a |
idCol |
a |
startCol |
a |
endCol |
a |
medCol |
a |
madCol |
a |
errorCol |
a |
chromCol |
a |
bpStartCol |
a |
bpEndCol |
a |
annot |
a |
annotStartCol |
a |
annotEndCol |
a |
annotChromCol |
a |
useEnd |
a single logical value specifying whether the segment end
positions as given by the |
blsize |
a single |
minJoin |
a single |
nTrial |
a single positive |
bestBIC |
a single |
modelNames |
a |
cWeight |
A single |
bsTimes |
a single positive |
chromRange |
a |
nJobs |
a single positive |
normalLength |
an integer |
normalMedian |
a numeric |
normalMad |
a numeric |
normalError |
a numeric |
Depending on the availability of input, the function will perform the following operations for each copy number profile.
If raw data are available in addition to segment start and end positions, the median and MAD of each segment will be computed. For each profile, bootstrap sampling of the segment median values will be performed, and the sample will be used to estimate the error in the median for each segment. Model-dependent clustering (fitting to a Gaussian mixture) of the sample will be performed. The central cluster (the one nearest the expected unaltered value) will be identified and, if necessary, merged with adjacent clusters so that it comprises the minimal required fraction of the data. The deviation of each segment from the center, its probability of belonging to the central cluster, and its marginal probability in the central cluster will be computed.
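The bootstrap step can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name bootstrap_median_error, the toy data, and the plain resampling with replacement are assumptions made for illustration only (the function's own scheme is controlled by arguments such as blsize and bsTimes, whose exact behavior is not shown here).

```r
# Hypothetical helper (not part of the package): estimate the error of a
# segment median by resampling the segment's values with replacement.
bootstrap_median_error <- function(values, n_boot = 500) {
  boot_medians <- replicate(n_boot, median(sample(values, replace = TRUE)))
  # The spread of the bootstrap medians serves as the error estimate.
  sd(boot_medians)
}

# Toy data standing in for the raw copy number values of one segment.
set.seed(1)
seg_values <- rnorm(200, mean = 0.3, sd = 0.2)

seg_median <- median(seg_values)  # segment median
seg_mad <- mad(seg_values)        # segment MAD
seg_err <- bootstrap_median_error(seg_values)
```

For a segment with many data points the error in the median is expected to be much smaller than the MAD, which is why the two statistics are reported separately.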
If segment medians or median deviations are available or have been computed, and, in addition, genomic lengths and average values are given for a collection of segments with unaltered copy number, additional estimates will be performed. If median values are available for the unaltered segments, the marginal probability of the observed median or median deviation in the unaltered set will be computed for each segment. Likewise, marginal probabilities for median/MAD and/or median/error will be computed if these statistics are available.
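One way to picture a marginal probability in the unaltered set is as an empirical tail probability. The sketch below is an assumption-laden illustration, not the package's method: the helper tail_prob, the toy reference values, the weighting of reference segments by genomic length, and the use of absolute deviations are all hypothetical choices.

```r
# Hypothetical reference set of unaltered segments (toy values).
normal_length <- c(100, 50, 200, 150, 500)         # genomic lengths
normal_median <- c(-0.05, 0.10, 0.02, -0.12, 0.00) # median values

# Hypothetical helper: length-weighted fraction of reference segments whose
# absolute value is at least as large as the observed deviation.
tail_prob <- function(dev, ref_values, ref_weights) {
  sum(ref_weights[abs(ref_values) >= abs(dev)]) / sum(ref_weights)
}

p_small <- tail_prob(0.08, normal_median, normal_length)
p_large <- tail_prob(1.00, normal_median, normal_length)
```

A deviation that is rarely matched in the reference set yields a probability near zero, which suggests that the segment differs from the unaltered state.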
The input segall data.frame, to which some or all of the following columns may be bound, depending on the availability of input:
segmedian a numeric, the median function of copy number
segmad a numeric, the MAD for the function of copy number
mediandev a numeric, the median function of copy number relative to its central value
segerr a numeric, the error estimate for the function of copy number
centerz a numeric between 0 and 1, the probability that the segment is in the central cluster
marginalprob a numeric, the marginal probability for the segment in the central cluster
maxz TODO
maxzmean TODO
maxzsigma TODO
samplesize TODO
negtail the probability of finding the deviation as observed or larger in a collection of central segments
negtailnormad the probability of finding the deviation/MAD as observed or larger in a collection of central segments
negtailnormerror a numeric, the probability of finding the deviation/error as observed or larger in a collection of central segments
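As a rough illustration of how the returned columns might be used downstream, the sketch below labels segments from a hand-made table. The table contents, the cutoff alpha, and the labels "gain", "loss" and "neutral" are hypothetical; only the column names mediandev and negtail come from the list above.

```r
# Hand-made stand-in for a table returned by CNpreprocessing (toy values).
toy_segtable <- data.frame(
  ID = "WZ1",
  mediandev = c(0.01, 0.45, -0.60, 0.03),
  negtail = c(0.80, 0.001, 0.0005, 0.40)
)

# Assumed cutoff: segments with a tail probability below alpha are treated
# as differing from the unaltered state.
alpha <- 0.01

toy_segtable$status <- ifelse(
  toy_segtable$negtail >= alpha, "neutral",
  ifelse(toy_segtable$mediandev > 0, "gain", "loss")
)
```

The sign of mediandev gives the direction of the change, while negtail indicates how unusual the deviation is relative to the central segments.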
Alexander Krasnitz
## Load needed datasets
data(segexample)
data(ratexample)
data(normsegs)

## Small toy example
segtable <- CNpreprocessing(segall=segexample[segexample[,"ID"]=="WZ1",],
    ratall=ratexample, idCol="ID", startCol="start", endCol="end",
    chromCol="chrom", bpStartCol="chrom.pos.start", bpEndCol="chrom.pos.end",
    blsize=50, minJoin=0.25, cWeight=0.4, bsTimes=50, chromRange=1:3,
    nJobs=1, modelNames="E", normalLength=normsegs[,1],
    normalMedian=normsegs[,2])

## Not run:
## Example 1: 5 whole genome analysis, choosing the right format of arguments
segtable <- CNpreprocessing(segall=segexample, ratall=ratexample,
    idCol="ID", startCol="start", endCol="end", chromCol="chrom",
    bpStartCol="chrom.pos.start", bpEndCol="chrom.pos.end", blsize=50,
    minJoin=0.25, cWeight=0.4, bsTimes=50, chromRange=1:22, nJobs=4,
    modelNames="E", normalLength=normsegs[,1], normalMedian=normsegs[,2])

## Example 2: how to use annotexample when the segment table does not have
## columns of integer positions in terms of measuring units (probes),
## such as "mysegs" below
mysegs <- segexample[,c(1,5:12)]
data(annotexample)
segtable <- CNpreprocessing(segall=mysegs, ratall=ratexample, idCol="ID",
    chromCol="chrom", bpStartCol="chrom.pos.start", bpEndCol="chrom.pos.end",
    annot=annotexample, annotStartCol="CHROM.POS", annotEndCol="CHROM.POS",
    annotChromCol="CHROM", blsize=50, minJoin=0.25, cWeight=0.4, bsTimes=50,
    chromRange=1:22, modelNames="E", nJobs=4, normalLength=normsegs[,1],
    normalMedian=normsegs[,2])

## End(Not run)