PSSeg: Parent-Specific copy number segmentation
In mpierrejean/jointseg: Joint Segmentation of Multivariate (Copy Number) Signals


This function splits (bivariate) copy number signals into parent-specific (PS) segments using recursive binary segmentation.

PSSeg(data, method, stat = NULL, dropOutliers = TRUE, rankTransform = FALSE, ..., profile = FALSE, verbose = FALSE)

`data`	Data frame containing the following columns:
	c: Total copy number (logged or non-logged)
	b: Allele B fraction
	genotype: (Germline) genotype of the SNP, coded as 0 for AA, 1/2 for AB, 1 for BB
	These data are assumed to be ordered by genome position.
`method`	Segmentation method, one of:
	"RBS": Recursive binary segmentation, see `doRBS`
	"GFLars": Group fused LARS as described in Bleakley and Vert (2011)
	"DP": Univariate pruned dynamic programming (Rigaill et al., 2010) or bivariate dynamic programming
	"PSCBS": Parent-specific copy number in paired tumor-normal studies using circular binary segmentation (Olshen et al., 2011)
	"other": The segmentation method is passed as a function using argument `segFUN` (see examples in directory `otherMethods`)
`stat`	A vector containing the names or indices of the columns of `Y` to be segmented
`dropOutliers`	If TRUE, outliers are dropped using the DNAcopy package
`rankTransform`	If TRUE, data are replaced by their ranks before segmentation
`...`	Further arguments to be passed to `jointSeg`
`profile`	A `logical` value: should time and memory usage be traced? Defaults to `FALSE`.
`verbose`	A `logical` value: should extra information be output? Defaults to `FALSE`.
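
To illustrate what `rankTransform = TRUE` means, here is a minimal base-R sketch that replaces each column of a signal matrix by its ranks. The matrix `Y` and its values are made up for illustration, and the way the package itself handles ties or missing values is an assumption here, not something stated in this page.

```r
## Made-up bivariate signal: total copy number (c) and decrease in
## heterozygosity (d); d is missing for a homozygous SNP
Y <- cbind(c = c(2.1, 1.9, 5.0, 3.1),
           d = c(0.04, NA, 0.10, 0.40))

## Rank each column separately; missing values stay missing
## (na.last = "keep" is a choice made for this sketch)
Yr <- apply(Y, 2, rank, na.last = "keep")
print(Yr)
```

Ranks are insensitive to the scale of the original values, so a single extreme observation (such as 5.0 above) cannot dominate the segmentation cost.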

Before segmentation, the decrease in heterozygosity d = 2|b - 1/2| defined in Bengtsson et al. (2010) is calculated from the input data. d is only defined for heterozygous SNPs, that is, SNPs for which data$genotype == 1/2. d may be seen as a "mirrored" version of the allelic ratios (b): it converts them into a piecewise-constant signal by taking advantage of the bimodality of b for heterozygous SNPs. The rationale for this transformation is that allelic ratios (b) are only informative for heterozygous SNPs (see e.g. Staaf et al., 2008).
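
As a minimal sketch of this transformation (not the package's internal code), the snippet below computes d for a small made-up data frame with the columns described above, and sets d to NA for homozygous SNPs. The object name `toy` and all values are illustrative assumptions.

```r
## Toy data with the columns expected in 'data' (values are made up)
toy <- data.frame(
    c = c(2.1, 1.9, 2.0, 3.1, 2.9, 3.0),
    b = c(0.48, 0.02, 0.55, 0.30, 0.97, 0.68),
    genotype = c(0.5, 0, 0.5, 0.5, 1, 0.5)
)

## Decrease in heterozygosity: d = 2|b - 1/2|
d <- 2 * abs(toy$b - 1/2)

## d is only defined for heterozygous SNPs (genotype == 1/2)
d[toy$genotype != 0.5] <- NA
toy$d <- d
print(toy)
```

In this toy example the two bands of b around 1/2 (for instance 0.30 and 0.68) are folded onto nearby values of d, which is what makes d piecewise constant within a segment.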

Before segmentation, outliers in the copy number signal are dropped (when dropOutliers is TRUE) according to the method described by Venkatraman and Olshen (2007).

The resulting data are then segmented using the jointSeg function, which combines an initial segmentation according to argument method and pruning of candidate change points by dynamic programming (skipped when the initial segmentation *is* dynamic programming).
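
The pruning step can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: it handles a univariate signal only, uses a squared-error cost, and `pruneByDP` is a hypothetical helper name. Given candidate breakpoints from an initial segmentation, it returns the best subset of k candidates for each k = 1, ..., K.

```r
## Hypothetical helper: best subsets of 'candidates' by dynamic programming
pruneByDP <- function(y, candidates, K) {
    stopifnot(K >= 1, K <= length(candidates))
    n <- length(y)
    bounds <- c(0, sort(candidates), n)      # segment boundaries
    m <- length(bounds)
    cs <- c(0, cumsum(y))                    # cumulative sums of y
    cs2 <- c(0, cumsum(y^2))                 # cumulative sums of y^2

    ## Residual sum of squares of y[(bounds[i] + 1):bounds[j]] around its mean
    segCost <- function(i, j) {
        a <- bounds[i]
        b <- bounds[j]
        s <- cs[b + 1] - cs[a + 1]
        (cs2[b + 1] - cs2[a + 1]) - s^2 / (b - a)
    }

    ## cost[k + 1, j]: best cost for y[1:bounds[j]] using k breakpoints
    cost <- matrix(Inf, K + 1, m)
    back <- matrix(NA_integer_, K + 1, m)
    for (j in 2:m) cost[1, j] <- segCost(1, j)
    for (k in seq_len(K)) {
        for (j in (k + 2):m) {
            for (i in (k + 1):(j - 1)) {
                val <- cost[k, i] + segCost(i, j)
                if (val < cost[k + 1, j]) {
                    cost[k + 1, j] <- val
                    back[k + 1, j] <- i
                }
            }
        }
    }

    ## Backtrack to recover the breakpoints of the best model for each k
    lapply(seq_len(K), function(k) {
        bkp <- numeric(0)
        j <- m
        for (kk in k:1) {
            j <- back[kk + 1, j]
            bkp <- c(bounds[j], bkp)
        }
        bkp
    })
}

## Simulated signal with true change points after positions 10 and 20
set.seed(1)
y <- c(rep(0, 10), rep(5, 10), rep(1, 10)) + rnorm(30, sd = 0.2)

## Pretend an initial segmentation returned these candidates
res <- pruneByDP(y, candidates = c(5, 10, 15, 20, 25), K = 3)
print(res)
```

The returned list has the same shape as the dpBkpList element described in the Value section: one vector of breakpoint positions per model size. Restricting the search to the candidates keeps the dynamic program quadratic in the number of candidates rather than in the length of the signal.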

If argument stat is not provided, then dynamic programming is run on the two-dimensional statistic "(c,d)".

If argument stat is provided, then dynamic programming is run on stat; in this case we implicitly assume that stat is a piecewise-constant signal.

A list with elements

bestBkp: Best set of breakpoints after dynamic programming
initBkp: Results of the initial segmentation, using 'doNnn', where 'Nnn' corresponds to argument method
dpBkpList: Results of dynamic programming, a list of vectors of breakpoint positions for the best model with k breakpoints for k = 1, 2, ..., K, where K = length(initBkp)
prof: a matrix providing time usage (in seconds) and memory usage (in Mb) for the main steps of the program. Only defined if argument profile is set to TRUE

Morgane Pierre-Jean and Pierre Neuvial

Bengtsson, H., Neuvial, P., & Speed, T. P. (2010). TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC bioinformatics, 11(1), 245.

Staaf, J., Lindgren, D., Vallon-Christersson, J., et al. (2008). Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol, 9(9), R136.

Pierre-Jean, M., Rigaill, G. J., & Neuvial, P. (2015). Performance evaluation of DNA copy number segmentation methods. Briefings in Bioinformatics, 16(4), 600-615.

jointSeg

## load the package (assumes 'jointseg' and 'acnr' are installed)
library("jointseg")

## load known real copy number regions
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5)

## generate a synthetic CN profile
K <- 10
len <- 1e4
sim <- getCopyNumberDataByResampling(len, K, regData=affyDat)
datS <- sim$profile

## run binary segmentation (+ dynamic programming)
resRBS <- PSSeg(data=datS, method="RBS", stat=c("c", "d"), K=2*K, profile=TRUE)
resRBS$prof

getTpFp(resRBS$bestBkp, sim$bkp, tol=5)
plotSeg(datS, breakpoints=list(sim$bkp, resRBS$bestBkp))