larsCBSsegment: Segmentation based on Circular Binary Segmentation followed...


Description

A model selection procedure is applied after CBS segmentation. In other words, we assess which of the change points over-detected by CBS are really necessary. More specifically, we use the $K$ change points called by CBS as $K$ predictors for the input $X_i$, $i = 0, \ldots, n$, fit a linear model, and select variables by stepwise regression as implemented in lars() (from the R package lars). Optimal change points are then selected from the LARS solution path using one of several criteria.
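To make the selection step concrete, here is a minimal base-R sketch. It is not the package's implementation. It assumes each candidate change point $\tau_j$ enters the model as a step predictor $\mathbf{1}(i > \tau_j)$, so the fitted mean is $\beta_0 + \sum_{j=1}^{K} \beta_j \mathbf{1}(i > \tau_j)$. Instead of calling lars(), it swaps in plain greedy forward selection scored by BIC, which is only one possible criterion. The helper names changepoint_design() and select_changepoints() are hypothetical.

```r
# Hypothetical helpers for illustration; not part of SomatiCA.

# Column j is the step indicator 1(i > tau_j) for candidate change point tau_j.
changepoint_design <- function(n, tau) {
  X <- outer(seq_len(n), tau, ">") * 1
  colnames(X) <- paste0("cp", tau)
  X
}

# Greedy forward selection scored by BIC = n*log(RSS/n) + p*log(n).
# This is a simple stand-in for picking a model along the LARS path.
select_changepoints <- function(y, tau) {
  n <- length(y)
  X <- changepoint_design(n, tau)
  rss_of <- function(cols) {
    # design: intercept plus the currently chosen step predictors
    Z <- if (length(cols) > 0) cbind(1, X[, cols, drop = FALSE]) else matrix(1, n, 1)
    sum(lm.fit(Z, y)$residuals^2)
  }
  bic_of <- function(cols) n * log(rss_of(cols) / n) + (length(cols) + 1) * log(n)
  chosen <- integer(0)
  best <- bic_of(chosen)
  repeat {
    remaining <- setdiff(seq_along(tau), chosen)
    if (length(remaining) == 0) break
    scores <- vapply(remaining, function(j) bic_of(c(chosen, j)), numeric(1))
    if (min(scores) >= best) break  # no candidate lowers BIC any further
    best <- min(scores)
    chosen <- c(chosen, remaining[which.min(scores)])
  }
  sort(tau[chosen])
}

set.seed(1)
y <- c(rnorm(300, 0.2, 0.05), rnorm(300, 0.4, 0.05), rnorm(200, 0.3, 0.05))
cand <- c(150, 300, 450, 600)  # 300 and 600 are true breaks; 150 and 450 are spurious
res <- select_changepoints(y, cand)
res
```

The spurious candidates only reduce the RSS by chance amounts, so the BIC penalty usually keeps them out, while the two true breaks are kept.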

Usage

larsCBSsegment(data, selection = .selection.default(), collapse.k = 0, ncores = 1, verbose = TRUE, variation.control = TRUE, rss = FALSE, S = 0.1, k = 50, ...)

Arguments

data

A GRanges object, output of SomatiCAFormat().

selection

Model selection parameters.

collapse.k

Number of data points collapsed. Default is 0.

ncores

Number of cores used. Default is 1.

verbose

A logical value, whether progress messages are shown. Default is TRUE.

variation.control

A logical value, whether pseudo points are used to smooth the segment. Default is TRUE.

rss

A logical value, whether a cutoff based on the residual sum of squares is used. Default is FALSE.

S

The cutoff based on the residual sum of squares. Default is 0.1.

k

The window size used to smooth outliers. Default is 50.

...

Additional arguments passed to segment() in the DNAcopy package.
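This page does not describe how the window of k points is used to smooth outliers. As an illustration only, the sketch below shows a simplified version of the outlier smoothing commonly done before CBS (in the spirit of DNAcopy's smooth.CNA(), though not the same rule): each point is compared with a running median of width about k and, if it deviates by more than n_mad robust standard deviations, it is replaced by that median. The function smooth_outliers() and its n_mad threshold are hypothetical and are not SomatiCA arguments.

```r
# Hypothetical helper, not part of SomatiCA: replace points that lie far from
# a running median (window about k wide) with that running median.
smooth_outliers <- function(y, k = 50, n_mad = 4) {
  n <- length(y)
  w <- 2 * floor(k / 2) + 1                   # runmed() requires an odd width
  w <- min(w, if (n %% 2 == 1) n else n - 1)  # keep the width no larger than n
  center <- stats::runmed(y, w, endrule = "median")
  resid <- y - center
  cutoff <- n_mad * stats::mad(resid)         # MAD-based robust scale
  out <- abs(resid) > cutoff
  y[out] <- center[out]
  y
}

set.seed(2)
y <- rnorm(500, 0.3, 0.05)
y[c(100, 250)] <- c(0.9, -0.4)  # two artificial outliers
ys <- smooth_outliers(y, k = 50)
```

A running median is used because, unlike a running mean, it is barely moved by the outliers it is meant to detect.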

Value

segment

An S4 object of class "Segmented".

hetsites

Heterozygous sites used in segmentation, unsmoothed.

Author(s)

Mengjie Chen

References

Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. (2004). Least angle regression (with discussion). Annals of Statistics 32: 407-499.

Olshen, A. B., Venkatraman, E. S., Lucito, R., Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5: 557-572.

Venkatraman, E. S., Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657-663.

See Also

SomatiCAFormat, lars, segment.

Examples

library(SomatiCA)
set.seed(123)

### Simulate 1450 heterozygous sites on two chromosomes
rawLAF <- c(rnorm(300, 0.2, 0.05), rnorm(300, 0.4, 0.05), rnorm(200, 0.3, 0.05), rnorm(200, 0.2, 0.05), rnorm(200, 0.3, 0.05), rnorm(250, 0.4, 0.05))
rawLAF <- ifelse(rawLAF > 0.5, 1 - rawLAF, rawLAF)  ### fold values above 0.5 back below 0.5
germLAF <- rnorm(800 + 650, 0.4, 0.05)
germLAF <- ifelse(germLAF > 0.5, 1 - germLAF, germLAF)
reads1 <- c(rpois(300, 25), rpois(300, 50), rpois(200, 60), rpois(200, 25), rpois(200, 40), rpois(250, 50))
reads2 <- rpois(800 + 650, 50)
chr <- c(rep("chr1", 800), rep("chr2", 650))
position <- c(1:800, 1:650)
zygo <- rep("het", 800 + 650)
x <- data.frame(chr, as.integer(position), as.character(zygo), as.integer(reads1), rawLAF, as.integer(reads2), germLAF)
colnames(x) <- c("seqnames", "start", "zygosity", "tCount", "LAF", "tCountN", "germLAF")
data <- SomatiCAFormat(x)

### This is an easy example, without much noise.
### Consider using rss = TRUE to select change points from sequencing data.
seg <- larsCBSsegment(data, rss = FALSE)

plotSegment(seg$segment, data, k = 1, smooth = FALSE)
plotSegment(seg$segment, data, k = 2, smooth = FALSE)

SomatiCA documentation built on Oct. 5, 2016, 4:18 a.m.