validateCNpreprocessing: Parameters validation for the 'CNpreprocessing' function

View source: R/validateCNpreprocessing.R

validateCNpreprocessing    R Documentation

Parameters validation for the CNpreprocessing function

Description

Validation of all parameters needed by the public CNpreprocessing function.

Usage

validateCNpreprocessing(
  segall,
  ratall,
  idCol,
  startCol,
  endCol,
  medCol,
  madCol,
  errorCol,
  chromCol,
  bpStartCol,
  bpEndCol,
  annot,
  annotStartCol,
  annotEndCol,
  annotChromCol,
  useEnd,
  blsize,
  minJoin,
  nTrial,
  bestBIC,
  modelNames,
  cWeight,
  bsTimes,
  chromRange,
  nJobs,
  normalLength,
  normalMedian,
  normalMad,
  normalError
)

Arguments

ratall

A matrix whose rows correspond to genomic positions and columns to copy number profiles. Its matrix elements are functions of copy number, most often log ratios of copy number to the expected standard value, such as 2 in diploid genomes.
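
As a rough illustration of such matrix elements, the sketch below converts hypothetical copy number values to log ratios against the diploid expectation of 2. The base of the logarithm (2 here), the values and the object names are assumptions made for illustration; they are not taken from the package.

```r
# Hypothetical copy numbers: rows are genomic positions, columns are profiles
copyNumber <- matrix(c(2, 1, 4,
                       2, 2, 8),
                     nrow = 3,
                     dimnames = list(NULL, c("profileA", "profileB")))

# Log ratio relative to the expected diploid value of 2
# (the choice of base 2 is an assumption)
logRatio <- log2(copyNumber / 2)
logRatio
```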

idCol

A character string specifying the name for the column in segall tabulating the profile IDs. When not specified, the numerical column of the ratall object will be used as the profile IDs.

startCol

A character string specifying the name of the column in segall that tabulates the (integer) start position of each segment in internal units, such as probe numbers for data of CGH microarray origin.

endCol

A character string specifying the name of the column in segall that tabulates the (integer) end position of each segment in internal units, such as probe numbers for data of CGH microarray origin.

medCol

A character string specifying the name of the column in segall that tabulates, for each segment, the (numeric) value of the function of copy number used in the study (typically log ratios).

madCol

A character string specifying the name of the column in segall that tabulates, for each segment, a (numeric) measure of spread of the function of copy number used in the study (typically log ratios), whose values are given in the medCol column.

errorCol

A character string specifying the name of the column in segall that tabulates, for each segment, the (numeric) error of the function of copy number used in the study (typically log ratios), whose values are given in the medCol column.

chromCol

A character string specifying the name for the column in segall tabulating the (integer) chromosome number for each segment.

bpStartCol

A character string specifying the name of the column in segall that tabulates the (integer) genomic start coordinate of each segment.

bpEndCol

A character string specifying the name of the column in segall that tabulates the (integer) genomic end coordinate of each segment.

annot

A matrix or a data.frame that contains the annotation for the copy number measurement platform in the study. It is generally expected to contain columns with names specified by annotStartCol, annotEndCol, annotChromCol.

annotStartCol

A character string specifying the name of the column in annot that tabulates the (integer) genomic start coordinate of each copy number measuring unit, such as a probe in case of CGH microarrays.

annotEndCol

A character string specifying the name of the column in annot that tabulates the (integer) genomic end coordinate of each copy number measuring unit, such as a probe in case of CGH microarrays.

annotChromCol

A character string specifying the name of the column in annot that tabulates the chromosome number for each copy number measuring unit, such as a probe in case of CGH microarrays.

useEnd

A single logical value specifying whether the segment end positions as given by the bpEndCol of segall are to be looked up in the annotEndCol column of annot (if useEnd=TRUE) or in the annotStartCol column (default).
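
The choice of lookup column described above can be pictured with the following sketch. The annotation table, its column names and the use of match() are assumptions made for illustration; this is not the package's implementation.

```r
# Hypothetical annotation table with one row per probe
annot <- data.frame(CHROM = c(1, 1, 1),
                    CHROM.POS = c(100, 500, 900),
                    END.POS = c(150, 550, 950))

segEnd <- 550      # hypothetical segment end coordinate (from bpEndCol)
useEnd <- TRUE

# Choose which annotation column the segment end is looked up in
lookupCol <- if (useEnd) "END.POS" else "CHROM.POS"

# Index of the probe whose coordinate equals the segment end (NA if none)
probeIndex <- match(segEnd, annot[[lookupCol]])
probeIndex
```

In this toy table the end coordinate 550 is found only in the END.POS column, so switching useEnd to FALSE would yield NA.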

blsize

A single integer specifying the bootstrap sampling rate of segment medians to generate input for model-based clustering. The number of times a segment is sampled is then given by the (integer) division of the segment length in internal units by blsize.
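
The integer division mentioned above can be illustrated as follows. The segment lengths are hypothetical values chosen for the example.

```r
# Hypothetical segment lengths in internal units (e.g. probe counts)
segLength <- c(40, 120, 275)
blsize <- 50

# Integer division gives the number of times each segment is sampled
nSamples <- segLength %/% blsize
nSamples
```

Under this arithmetic a segment shorter than blsize gets a count of 0; the sketch makes no claim about how the package treats that case.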

minJoin

A single numeric value between 0 and 1 specifying the degree of overlap above which two clusters will be joined into one.
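
A check for this kind of parameter might look like the sketch below. The helper name is hypothetical and not part of CNprep, and treating the bounds 0 and 1 as inclusive is an assumption.

```r
# Hypothetical helper, not part of CNprep:
# TRUE when x is a single, non-missing number in [0, 1]
isSingleProportion <- function(x) {
    is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 0 && x <= 1
}

isSingleProportion(0.25)          # valid value
isSingleProportion(1.5)           # out of range
isSingleProportion(c(0.1, 0.2))   # more than one value
isSingleProportion("0.25")        # not numeric
```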

nTrial

A single positive integer specifying the number of times a model-based clustering is attempted for each profile in order to achieve the highest Bayesian information criterion (BIC).

bestBIC

A single numeric value for initializing BIC maximization. A large negative value is recommended.
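
The reason for a large negative starting value can be seen in a simple maximization loop. The BIC values below are made up, and the loop is only a sketch of the general idea, not the package's code.

```r
# Hypothetical BIC values from four clustering attempts (nTrial = 4)
trialBIC <- c(-350.2, -298.7, -410.5, -301.1)

bestBIC <- -1e9            # large negative start so any real BIC replaces it
bestTrial <- NA_integer_

for (i in seq_along(trialBIC)) {
    # Keep the attempt with the highest BIC seen so far
    if (trialBIC[i] > bestBIC) {
        bestBIC <- trialBIC[i]
        bestTrial <- i
    }
}

bestBIC
bestTrial
```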

modelNames

A vector of character strings specifying the names of models to be used in model-based clustering (see package mclust for further details).

cWeight

A single numeric value between 0 and 1 specifying the minimal share of the central cluster in each profile.

bsTimes

A single positive double value specifying the number of times the median of each segment is sampled in order to predict the cluster assignment for the segment.

chromRange

An integer vector enumerating chromosomes from which segments are to be used for initial model-based clustering.

nJobs

A single positive integer specifying the number of worker jobs to create in case of distributed computation.

normalLength

An integer vector specifying the genomic lengths of segments in the normal reference data.

normalMedian

A numeric vector, of the same length as normalLength, specifying the segment values of the normal reference segments.

normalMad

A numeric vector, of the same length as normalLength, specifying the value spreads of the normal reference segments.

normalError

A numeric vector, of the same length as normalLength, specifying the error values of the normal reference segments.
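
The length requirement shared by the normal reference vectors can be checked along these lines. The vectors are hypothetical, and the check is a sketch rather than the validation performed by the function.

```r
# Hypothetical normal reference data for three segments
normalLength <- c(1500L, 32000L, 870L)
normalMedian <- c(0.01, -0.03, 0.02)
normalMad    <- c(0.10, 0.08, 0.12)
normalError  <- c(0.02, 0.01, 0.03)

# All value vectors must have one entry per reference segment
sameLength <- all(lengths(list(normalMedian, normalMad, normalError)) ==
                      length(normalLength))
sameLength
```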

Value

0.

Author(s)

Astrid Deschênes

Examples


data(segexample)
data(ratexample)
data(normsegs)

## Returns zero as all parameters are valid
CNprep:::validateCNpreprocessing(segall=segexample,
    ratall=ratexample, idCol="ID", startCol="start", endCol="end", 
    chromCol="chrom", bpStartCol="chrom.pos.start", 
    bpEndCol="chrom.pos.end", blsize=50, nTrial=10,
    useEnd=FALSE, minJoin=0.25, cWeight=0.4, bsTimes=50, chromRange=1:3, 
    nJobs=1, modelNames="E", normalLength=normsegs[,1],
    normalMedian=normsegs[,2])


KrasnitzLab/CNprep documentation built on May 28, 2022, 8:32 p.m.