model.auto: Automatic generation of copy number model


Description

This function computes a copy number model, as needed by model.apply to translate logRatios into copy numbers.

Usage

  model.auto(segLogRatios, segChroms, segLengths = rep(1, length(segLogRatios)),
    from = 0.02, to = 0.5, by = 0.001, precision = 512, maxPeaks = 8, minWidth = 0.15,
    maxWidth = 0.9, minDensity = 0.001, peakFrom = -2, peakTo = 1.3, ploidy = 0,
    discreet = FALSE, method = c("stm", "sdd", "ptm"), exclude = c("X", "Y", "Xp", "Xq",
    "Yp", "Yq"))

Arguments

segLogRatios

Double vector, the log ratios of the CGH segments to model.

segChroms

Vector, the chromosomes holding the CGH segments to model.

segLengths

Double vector, the lengths of the CGH segments to model. The number of probes should be preferred if available, but nucleotide lengths or no lengths at all can also be used.

from

Single double value, the minimal bandwidth to test for density.

to

Single double value, the maximal bandwidth to test for density.

by

Single double value, the step between the successive bandwidths to test for density.

precision

Single integer value, the number of points to compute for density. As the help page of density suggests, values greater than 512 should be powers of 2.

maxPeaks

Single integer value, the maximal number of peaks in the density distribution for a model to be considered valid.

minWidth

Single double value, minimal value allowed for the width model parameter (thus for tumoral cell proportion in the sample).

maxWidth

Single double value, maximal value allowed for the width model parameter (thus for tumoral cell proportion in the sample).

minDensity

Single double value, minimal density for a peak to be detected.

peakFrom

Single double value, minimal logRatio for a peak to be detected. Use NA for no lower limit. For a more precise model, only the peaks corresponding to 1, 2 and 3 copies should be considered.

peakTo

Single double value, maximal logRatio for a peak to be detected. Use NA for no upper limit. For a more precise model, only the peaks corresponding to 1, 2 and 3 copies should be considered.

ploidy

Single numeric value, the copy number assumed to be the most common within the analyzed genome.

discreet

Single logical value. If FALSE, a modeling failure raises an error; if TRUE, an NA-filled model is returned instead.

method

Single character value, the statistic to minimize ("stm" is default). See below for further details.

exclude

Vector, the chromosomes to exclude from the density computation and to plot with distinct symbols (use NULL to disable this feature). Sex chromosomes should be excluded when the DNA source is heterogeneous, as their imbalance (two 'X' and no 'Y' in women) affects normal cells as well as tumoral ones.
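The way the from, to, by, precision, minDensity, peakFrom, peakTo and exclude arguments interact can be illustrated with a minimal sketch in base R. It is an assumption about the general approach (scanning bandwidths and counting density peaks), not the internal code of model.auto; the countPeaks helper and the simulated data are hypothetical.

```r
# Hypothetical illustration, not cghRA's internal implementation
set.seed(1)
segLogRatios <- c(rnorm(20, -0.6, 0.05), rnorm(100, -0.14, 0.05), rnorm(50, 0.3, 0.05))
segChroms <- sample(c("1", "2", "X"), length(segLogRatios), replace = TRUE)
segLengths <- rep(1, length(segLogRatios))

# Drop the excluded chromosomes before estimating the density
keep <- !segChroms %in% c("X", "Y")
x <- segLogRatios[keep]
w <- segLengths[keep] / sum(segLengths[keep])   # density() expects weights summing to 1

# Count strict local maxima above minDensity and within [peakFrom, peakTo]
countPeaks <- function(bw, precision = 512, minDensity = 0.001,
                       peakFrom = -2, peakTo = 1.3) {
  d <- density(x, bw = bw, weights = w, n = precision)
  isMax <- diff(sign(diff(d$y))) == -2          # refers to points 2 .. n-1
  px <- d$x[-c(1, length(d$x))][isMax]
  py <- d$y[-c(1, length(d$y))][isMax]
  sum(py >= minDensity & px >= peakFrom & px <= peakTo)
}

# Coarser grid than the default by = 0.001, to keep the sketch fast
bandwidths <- seq(from = 0.02, to = 0.5, by = 0.04)
peakCounts <- sapply(bandwidths, countPeaks)
print(data.frame(bw = bandwidths, peaks = peakCounts))
```

Small bandwidths tend to produce many spurious peaks while large ones merge distinct copy number states, which is presumably why a range of bandwidths is tested and why maxPeaks bounds the models considered valid.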

Details

More details about the cghRA copy number model and its fitting can be found in the vignette associated with this package, as well as in the related publication. Once the parameters of a model (width and center) are set, three scores can be computed to assess how well it fits the data:

STM is the "Segment To Model" score, computed at the segment level as the average of the residuals weighted by the segment size (in probe counts). Residuals are computed as the absolute difference between exact copy numbers (see the copies function) and their rounding, assuming that copy numbers should be integers and that decimal parts are noise in the model. This is the recommended score to use with cghRA.

PTM is the "Peak To Model" score, computed at the peak level as the average of the residuals. Residuals are computed as the absolute difference between exact copy numbers (see the copies function) and their rounding, assuming that copy numbers should be integers and that decimal parts are noise in the model.

SDD is the "Standard Deviation of peak Differences" score. As its name suggests, it is computed as the sd of the differences between consecutive peaks, considering that good models should show very regularly spaced density peaks.
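The three scores described above can be sketched in a few lines of base R. The input values are hypothetical: the exact copy numbers would normally come from the copies function, which is not reproduced here, and the scale on which SDD takes its peak differences is an assumption (copy number scale in this sketch).

```r
# Hypothetical inputs, standing in for the output of copies()
segCopies  <- c(1.05, 1.98, 2.03, 2.95, 2.10)   # exact copy number of each segment
segLengths <- c(10, 200, 150, 40, 5)            # segment sizes, in probe counts
peakCopies <- c(1.04, 2.01, 2.97)               # exact copy number of each density peak

# STM: absolute residuals to the nearest integer, averaged over segments
# and weighted by segment size
stm <- weighted.mean(abs(segCopies - round(segCopies)), w = segLengths)

# PTM: same residuals at the peak level, unweighted
ptm <- mean(abs(peakCopies - round(peakCopies)))

# SDD: standard deviation of the differences between consecutive peaks
# (assumption: differences taken on the copy number scale)
sdd <- sd(diff(sort(peakCopies)))

print(c(stm = stm, ptm = ptm, sdd = sdd))
```

In all three cases lower values indicate a better fit, which is consistent with the method argument being described as "the statistic to minimize".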

Value

Returns a double vector, with the following values:

bw

Bandwidth used for density computation.

peaks

Number of peaks considered in the model.

peakFrom

See the peakFrom argument.

peakTo

See the peakTo argument.

center

Center parameter of the model.

width

Width parameter of the model.

ploidy

Ploidy parameter of the model, as provided.

sdd

Quality statistic, see 'Details'.

ptm

Quality statistic, see 'Details'.

stm

Quality statistic, see 'Details'.

Author(s)

Sylvain Mareschal

See Also

model.test, model.apply

Examples

  library(cghRA)

  # Generating random segmentation results
  ## with 30% normal cells contamination
  ## with +10% for normal DNA labelling
  segLogRatios <- c(
    rnorm(
      sample(5:20, 1),
      mean = log((1*0.7 + 2*0.3)/(2*1.1), 2),   # One deletion
      sd = 0.08
    ),
    rnorm(
      sample(80:120, 1),
      mean = log(2/(2*1.1), 2),                 # No alteration
      sd = 0.08
    ),
    rnorm(
      sample(40:60, 1),
      mean = log((3*0.7 + 2*0.3)/(2*1.1), 2),   # One more copy
      sd = 0.08
    )
  )
  segLogRatios <- sample(segLogRatios)
  segLengths <- as.integer(3 + round(rchisq(length(segLogRatios), 1)*100))
  segEnds <- cumsum(segLengths)
  segStarts <- c(1L, head(segEnds, -1) + 1L)   # each segment starts right after the previous one
  segChroms <- rep("chr1", length(segEnds))
  
  # Generated genome
  genome <- data.frame(
    segChroms,
    segStarts,
    segEnds,
    segLogRatios,
    segLengths
  )
  print(genome)
  
  # Automatic modeling
  model <- model.auto(
    segLogRatios = segLogRatios,
    segChroms = segChroms,
    segLengths = segLengths
  )
  print(model)
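
As a sanity check on the simulation above, the formula used to generate the log ratios can be inverted by hand. The sketch below is only the algebraic inverse of the example's simulation, with the tumoral proportion (0.7) and labelling bias (1.1) assumed to be known; it is not the copies or model.apply implementation, and both helper functions are hypothetical.

```r
# Forward formula, as used in the simulation above
simulateLogRatio <- function(copies, tumorFraction = 0.7, bias = 1.1) {
  log((copies * tumorFraction + 2 * (1 - tumorFraction)) / (2 * bias), 2)
}

# Algebraic inverse: recover the tumoral copy number from a log ratio
recoverCopies <- function(logRatio, tumorFraction = 0.7, bias = 1.1) {
  (2 * bias * 2^logRatio - 2 * (1 - tumorFraction)) / tumorFraction
}

expected <- simulateLogRatio(1:3)   # theoretical peak positions for 1, 2 and 3 copies
print(expected)
print(recoverCopies(expected))      # back to 1, 2 and 3, up to rounding error
```

The values in expected give the theoretical log ratios for 1, 2 and 3 copies under this simulation, which can be compared with the peaks the fitted model relies on.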

cghRA documentation built on May 2, 2019, 3:34 a.m.