validateCNpreprocessing | R Documentation |
Parameter validation for the CNpreprocessing function
Description
Validation of all parameters needed by the public CNpreprocessing function.
Usage
validateCNpreprocessing(
segall,
ratall,
idCol,
startCol,
endCol,
medCol,
madCol,
errorCol,
chromCol,
bpStartCol,
bpEndCol,
annot,
annotStartCol,
annotEndCol,
annotChromCol,
useEnd,
blsize,
minJoin,
nTrial,
bestBIC,
modelNames,
cWeight,
bsTimes,
chromRange,
nJobs,
normalLength,
normalMedian,
normalMad,
normalError
)
Arguments
ratall |
A matrix whose rows correspond to genomic positions
and columns to copy number profiles. Its matrix elements are functions of
copy number, most often log ratios of copy number to the expected standard
value, such as 2 in diploid genomes.
|
idCol |
A character string specifying the name for the
column in segall tabulating the profile IDs. When not specified,
the numerical column of the ratall object will be used as the
profile IDs.
|
startCol |
A character string specifying the name of the column
in segall that tabulates the (integer) start position of each segment
in internal units such as probe numbers for data of CGH microarray origin.
|
endCol |
A character string specifying the name of the column
in segall that tabulates the (integer) end position of each segment
in internal units such as probe numbers for data of CGH microarray origin.
|
medCol |
A character string specifying the
name of the column in segall that, for the function of copy number used
in the study (typically log ratios), tabulates the (numeric) values of
the function for the segment.
|
madCol |
A character string specifying the
name of the column in segall that, for the function of copy number used
in the study (typically log ratios), tabulates the (numeric) values of
a measure of the spread of the function (medCol) for the segment.
|
errorCol |
A character string specifying the
name of the column in segall that, for the function of copy number used
in the study (typically log ratios), tabulates the (numeric) values of
the error of the function (medCol) for the segment.
|
chromCol |
A character string specifying the name for the
column in segall tabulating the (integer) chromosome number for
each segment.
|
bpStartCol |
A character string specifying the name of the
column in segall that tabulates the (integer) genomic start
coordinate of each segment.
|
bpEndCol |
A character string specifying the name of the
column in segall that tabulates the (integer) genomic end
coordinate of each segment.
|
annot |
A matrix or a data.frame that contains the annotation
for the copy number measurement platform in the study. It is generally
expected to contain columns with names specified by
annotStartCol, annotEndCol and annotChromCol.
|
annotStartCol |
A character string
specifying the name of the column in annot that tabulates the (integer)
genomic start coordinate of each copy number measuring unit, such as a
probe in the case of CGH microarrays.
|
annotEndCol |
A character string
specifying the name of the column in annot that tabulates the (integer)
genomic end coordinate of each copy number measuring unit, such as a
probe in the case of CGH microarrays.
|
annotChromCol |
A character string
specifying the name of the column in annot that tabulates the chromosome
number for each copy number measuring unit, such as a probe in the case
of CGH microarrays.
|
useEnd |
A single logical value specifying whether the segment
end positions as given by the bpEndCol of segall are to be
looked up in the annotEndCol column of annot
(if useEnd=TRUE) or in the annotStartCol column (default).
|
blsize |
A single integer specifying the bootstrap sampling
rate of segment medians to generate input for model-based clustering. The
number of times a segment is sampled is then given by the (integer)
division of the segment length in internal units by blsize.
|
minJoin |
A single numeric value between 0 and 1 specifying the
degree of overlap above which two clusters will be joined into one.
|
nTrial |
A single positive integer specifying the number of
times a model-based
clustering is attempted for each profile in order to achieve the
highest Bayesian information criterion (BIC).
|
bestBIC |
A single numeric value for initializing BIC
maximization. A large negative value is recommended.
|
modelNames |
A vector of character strings specifying
the names of models to be used in model-based clustering (see package
mclust for further details).
|
cWeight |
A single numeric value between 0 and 1
specifying the minimal share of the central cluster in each profile.
|
bsTimes |
A single positive double value specifying the number
of times the median of each segment is sampled in order to predict the
cluster assignment for the segment.
|
chromRange |
An integer vector enumerating chromosomes
from which segments are to be used for initial model-based clustering.
|
nJobs |
A single positive integer specifying the number of
worker jobs to create in case of distributed computation.
|
normalLength |
An integer vector specifying the genomic lengths
of segments in the normal reference data.
|
normalMedian |
A numeric vector,
of the same length as normalLength, specifying the segment values
of the normal reference segments.
|
normalMad |
A numeric vector,
of the same length as normalLength, specifying the value spreads
of the normal reference segments.
|
normalError |
A numeric vector,
of the same length as normalLength, specifying the error values
of the normal reference segments.
|
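The integer division described for blsize can be illustrated in base R. This is a minimal sketch and not code from the package: the vector segLength and its values are made up for illustration, and only the division rule itself comes from the argument description.

```r
# Hypothetical segment lengths in internal units (e.g. probe counts);
# these values are illustrative only and do not come from CNprep data
segLength <- c(120L, 49L, 250L)
blsize <- 50L

# Integer division of the segment length by blsize gives the number
# of times each segment would be sampled under the documented rule
nSamples <- segLength %/% blsize
print(nSamples)
```

Note that the integer division alone yields 0 for a segment shorter than blsize. How the package treats such segments is not stated on this page.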
Value
0.
Author(s)
Astrid Deschênes
Examples
data(segexample)
data(ratexample)
data(normsegs)
## Return zero as all parameters are valid
CNprep:::validateCNpreprocessing(segall=segexample,
    ratall=ratexample, idCol="ID", startCol="start", endCol="end",
    chromCol="chrom", bpStartCol="chrom.pos.start",
    bpEndCol="chrom.pos.end", blsize=50, nTrial=10,
    useEnd=FALSE, minJoin=0.25, cWeight=0.4, bsTimes=50, chromRange=1:3,
    nJobs=1, modelNames="E", normalLength=normsegs[,1],
    normalMedian=normsegs[,2])
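This page does not list the individual checks the function performs. As a rough illustration only, the sketch below shows how a few of the documented constraints could be tested in base R: a single value between 0 and 1 for minJoin and cWeight, a single positive integer for nTrial, and equal lengths for normalLength and normalMedian. The helper names isSingleProportion, isSinglePositiveInteger and checkSomeParameters are hypothetical and are not part of CNprep, and treating the bounds 0 and 1 as inclusive is an assumption.

```r
# Hypothetical helpers, NOT part of CNprep; shown only to illustrate
# the kind of constraints documented in the Arguments section

# TRUE when x is a single numeric value in [0, 1]
# (inclusive bounds are an assumption)
isSingleProportion <- function(x) {
    is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 0 && x <= 1
}

# TRUE when x is a single positive whole number
isSinglePositiveInteger <- function(x) {
    is.numeric(x) && length(x) == 1 && !is.na(x) && x > 0 &&
        x == round(x)
}

checkSomeParameters <- function(minJoin, cWeight, nTrial,
                                normalLength, normalMedian) {
    if (!isSingleProportion(minJoin)) {
        stop("minJoin must be a single numeric value between 0 and 1")
    }
    if (!isSingleProportion(cWeight)) {
        stop("cWeight must be a single numeric value between 0 and 1")
    }
    if (!isSinglePositiveInteger(nTrial)) {
        stop("nTrial must be a single positive integer")
    }
    if (length(normalMedian) != length(normalLength)) {
        stop("normalMedian must have the same length as normalLength")
    }
    # Mirror the documented return value of 0 when every check passes
    return(0L)
}

# Usage with made-up values
result <- checkSomeParameters(minJoin=0.25, cWeight=0.4, nTrial=10,
    normalLength=c(100L, 200L), normalMedian=c(0.01, -0.02))
print(result)
```

Stopping at the first invalid argument with stop() keeps the error message specific to one parameter, which is a common choice for this kind of check.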