TitanCNA-output: Formatting and printing 'TitanCNA' results.
In gavinha/TitanCNA: Subclonal copy number and LOH prediction from whole genome sequencing of tumours


Functions to format TitanCNA results into a data.frame and write the results to a tab-delimited file.

  outputTitanResults(data, convergeParams, optimalPath, filename = NULL, 
      is.haplotypeData = FALSE, posteriorProbs = FALSE, subcloneProfiles = TRUE,
      correctResults = TRUE, proportionThreshold = 0.05, 
      proportionThresholdClonal = 0.05, recomputeLogLik = TRUE, rerunViterbi = FALSE,
      verbose = TRUE)
      
  outputModelParameters(convergeParams, results, filename, 
  		S_Dbw.scale = 1, S_Dbw.method = "Tong", S_Dbw.useCorrectedCN = TRUE)
  
  outputTitanSegments(results, id, convergeParams, filename = NULL, 
  		igvfilename = NULL)

`id`	Character string identifier for sample
`data`	`list` object containing the data to be analyzed. The components `chr`, `posn`, `ref`, and `tumDepth` can be obtained using `loadAlleleCounts`; `logR` can be obtained using `correctReadDepth` and `getPositionOverlap` (see Example).
`convergeParams`	`list` object that is returned from the function `runEMclonalCN` in TitanCNA.
`optimalPath`	`numeric array` containing the optimal TitanCNA genotype and clonal cluster states for each data point in the analysis. `optimalPath` is obtained from running `viterbiClonalCN`.
`results`	Formatted TitanCNA results output from `outputTitanResults`.
`filename`	Path of the file to write the TitanCNA results.
`igvfilename`	Path of the file to write the IGV seg file.
`posteriorProbs`	`Logical TRUE` to include the posterior marginal probabilities in printing to `filename`.
`is.haplotypeData`	`Logical TRUE` if the `data` contains the haplotype information. In particular, the column headers `HaplotypeCount`, `HaplotypeDepth`, `HaplotypeRatio` are included.
`subcloneProfiles`	`Logical TRUE` to include the subclone profiles in the output `data.frame`. Currently, this only works for 1 or 2 clonal clusters.
`correctResults`	`Logical TRUE` to correct the results by removing empty clusters and adjusting cellular prevalence and normal contamination parameters accordingly.
`recomputeLogLik`	`Logical TRUE` to re-run the forward-backward algorithm to re-estimate the log-likelihood after correcting results (i.e. when `correctResults` is `TRUE`).
`rerunViterbi`	`Logical TRUE` to re-run Viterbi to segment the results again after correcting results (i.e. when `correctResults` is `TRUE`).
`proportionThreshold`	Minimum proportion of the genome altered (by SNPs) for a cluster to be retained. Clonal clusters having lower proportion of alteration are removed.
`proportionThresholdClonal`	Minimum proportion of genome altered by clonal events (by SNPs) for the highest cellular prevalence cluster. If the highest prevalence cluster contains lower proportion of events than this threshold, this cluster will be removed and the next highest (subclonal) cluster will be readjusted to be the clonal cluster.
`S_Dbw.scale`	The S_Dbw validity index can be adjusted to account for differences between datasets. `S_Dbw.scale` can be used to penalize the S_Dbw `dens.bw` component. The default is 1.
`S_Dbw.method`	Compute S_Dbw validity index using `Halkidi` or `Tong` method. See `computeSDbwIndex`.
`S_Dbw.useCorrectedCN`	`Logical TRUE` to use corrected copy number calls when computing the S_Dbw validity index.
`verbose`	Print status messages.
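
The effect of `proportionThreshold` can be pictured with a small base R sketch. This is not TitanCNA's implementation; the toy cluster assignments and the variable names are assumptions made for illustration only. It computes the proportion of SNPs assigned to each clonal cluster and keeps the clusters that reach the threshold.

```r
# Toy per-SNP cluster assignments; NA marks positions with no altered event
# (hypothetical data, not TitanCNA output).
clonal_cluster <- c(rep(1, 60), rep(2, 3), rep(NA, 37))
proportion_threshold <- 0.05

# Proportion of all SNPs assigned to each cluster (table() ignores NA).
cluster_counts <- table(clonal_cluster)
cluster_prop <- as.numeric(cluster_counts) / length(clonal_cluster)
names(cluster_prop) <- names(cluster_counts)

# Clusters at or above the threshold are retained; the rest would be removed.
retained <- names(cluster_prop)[cluster_prop >= proportion_threshold]
print(cluster_prop)
print(retained)
```

In this toy case cluster 2 covers only 3 of 100 SNPs, so it falls below the 0.05 threshold and would be treated as a cluster to remove.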

outputModelParameters writes a file containing the estimated TITAN model parameters and the model selection index. Each row contains information regarding a different parameter:

1) Normal contamination estimate - proportion of normal content in the sample; tumour content is 1 minus this number

2) Average tumour ploidy estimate - average number of estimated copies in the genome; 2 represents diploid

3) Clonal cluster cellular prevalence - Z denotes the number of clonal clusters; the values that follow (space-delimited) are the cellular prevalence estimates for each cluster. Cellular prevalence here is defined as the proportion of the tumour sample that contains the aberrant genotype.

4) Genotype binomial means for clonal cluster Z - set of 21 binomial estimated parameters for each specified cluster

5) Genotype Gaussian means for clonal cluster Z - set of 21 Gaussian estimated means for each specified cluster

6) Genotype Gaussian variance - set of 21 Gaussian estimated variances; variances are shared across all clusters

7) Number of iterations - number of EM iterations needed for convergence

8) Log likelihood - complete data log-likelihood for current cluster run

9) S_Dbw dens.bw - density component of S_Dbw index; see computeSDbwIndex

10) S_Dbw scat - scatter component of S_Dbw index; see computeSDbwIndex

11) S_Dbw validity index - used for model selection; the run with the optimal number of clusters is the one with the lowest S_Dbw index. This value is slightly modified from that computed by computeSDbwIndex. It is computed as S_Dbw = S_Dbw.scale * dens.bw + scat

12) S_Dbw dens.bw, scat, and validity index are computed for the LogRatio and AllelicRatio datatypes, as well as for the combination of Both. For Both, the values are summed for both datatypes.
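
The arithmetic in items 1, 11, and 12 can be sketched in base R as follows. The component values are made up for illustration and the variable names are assumptions; only the formula S_Dbw = S_Dbw.scale * dens.bw + scat, the summation for Both, and the relation between normal contamination and tumour content come from the description above.

```r
# Hypothetical component values for one run (not real TitanCNA output).
components <- data.frame(
  datatype = c("LogRatio", "AllelicRatio"),
  dens.bw  = c(0.020, 0.035),
  scat     = c(0.110, 0.090)
)
S_Dbw.scale <- 1

# Index per datatype: S_Dbw = S_Dbw.scale * dens.bw + scat
components$S_Dbw <- S_Dbw.scale * components$dens.bw + components$scat

# "Both": each value is the sum over the two datatypes.
both <- colSums(components[, c("dens.bw", "scat", "S_Dbw")])

# Tumour content is 1 minus the normal contamination estimate.
normal_contamination <- 0.30
tumour_content <- 1 - normal_contamination

print(components)
print(both)
print(tumour_content)
```

Raising `S_Dbw.scale` above 1 in this sketch increases the weight of the `dens.bw` term relative to `scat`, which is how the penalty described for the `S_Dbw.scale` argument enters the index.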

outputTitanResults writes a file in a format similar to that described in the ‘Value’ section.

outputTitanResults also returns a list containing the following:

`results`	TITAN results, uncorrected for cluster number and parameters
`corrResults`	TITAN results, corrected by removing empty clusters and parameters adjusted accordingly.
`convergeParams`	Corrected parameter object

The results and corrResults are data.table objects, where each row corresponds to a position in the analysis, and with the following columns:

`Chr`	character denoting chromosome number. ChrX and ChrY use ‘X’ and ‘Y’.
`Position`	genomic coordinate
`RefCount`	number of reads matching the reference base
`NRefCount`	number of reads matching the non-reference base
`Depth`	total read depth at the position
`AllelicRatio`	RefCount/Depth
`LogRatio`	log2 ratio between normalized tumour and normal read depths
`CopyNumber`	predicted TitanCNA copy number
`TITANstate`	internal state number used by TitanCNA; see Reference
`TITANcall`	interpretable TitanCNA state; string (HOMD,DLOH,HET,NLOH,ALOH,ASCNA,BCNA,UBCNA); See Reference
`ClonalCluster`	predicted TitanCNA clonal cluster; lower cluster numbers represent clusters with higher cellular prevalence
`CellularPrevalence`	proportion of tumour cells containing the event; not to be mistaken for the proportion of the sample (including normal)

If subcloneProfiles is set to TRUE, then the subclone profiles will be appended to the output data.frame.

`Subclone1.CopyNumber`	Integer copy number for Subclone 1.
`Subclone1.TITANcall`	States for Subclone 1
`Subclone1.Prevalence`	The cellular prevalence of Subclone 1, or sometimes referred to as the subclone fraction.
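
A few of the column definitions above can be made concrete with a toy data.frame. The values are invented, `SampleProportion` is a hypothetical column that is not part of the TitanCNA output, and treating `Depth` as the sum of `RefCount` and `NRefCount` is an assumption. The last step shows one way to relate `CellularPrevalence` (a proportion of tumour cells) to a proportion of the whole sample, given a normal contamination estimate.

```r
# Toy rows mimicking a few output columns (hypothetical values).
toy <- data.frame(
  Chr = c("1", "1", "X"),
  Position = c(10500L, 20750L, 30900L),
  RefCount = c(30L, 12L, 45L),
  NRefCount = c(30L, 36L, 5L),
  CellularPrevalence = c(1.0, 0.6, 0.6),
  stringsAsFactors = FALSE
)

# Assumption: Depth is the sum of reference and non-reference reads.
toy$Depth <- toy$RefCount + toy$NRefCount

# AllelicRatio is defined as RefCount / Depth.
toy$AllelicRatio <- toy$RefCount / toy$Depth

# Hypothetical helper column: proportion of the whole sample (tumour plus
# normal) carrying the event, using a made-up normal contamination estimate.
normal_contamination <- 0.25
toy$SampleProportion <- toy$CellularPrevalence * (1 - normal_contamination)

print(toy)
```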

outputModelParameters returns a list containing the S_Dbw model selection values:

`dens.bw`	density component of the S_Dbw index
`scat`	scatter component of the S_Dbw index
`S_Dbw`	S_Dbw.scale * dens.bw + scat
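
Since the run with the lowest S_Dbw index is preferred, choosing among runs with different numbers of clonal clusters reduces to taking a minimum. A minimal sketch, assuming the S_Dbw values from several runs have already been collected into a data.frame (the numbers and the `numClusters` column name are hypothetical):

```r
# Hypothetical S_Dbw values from runs with 1 to 4 clonal clusters.
runs <- data.frame(
  numClusters = 1:4,
  S_Dbw = c(0.41, 0.27, 0.33, 0.38)
)

# Select the run with the lowest validity index.
best <- runs[which.min(runs$S_Dbw), ]
print(best)
```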

Gavin Ha <gavinha@gmail.com>

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)

runEMclonalCN, viterbiClonalCN, computeSDbwIndex

library(TitanCNA)

#### LOAD EXAMPLE EM RESULTS (data and convergeParams used below) ####
data(EMresults)

#### COMPUTE OPTIMAL STATE PATH USING VITERBI ####
optimalPath <- viterbiClonalCN(data, convergeParams)

#### FORMAT RESULTS ####
results <- outputTitanResults(data, convergeParams, optimalPath,
                              filename = NULL, posteriorProbs = FALSE,
                              subcloneProfiles = TRUE, correctResults = TRUE, 
                              proportionThreshold = 0.05, recomputeLogLik = FALSE,
                              proportionThresholdClonal = 0.05,
                              is.haplotypeData = FALSE)
## use corrected parameters
convergeParams <- results$convergeParams
## use corrected results
results <- results$corrResults 

#### OUTPUT RESULTS TO FILE ####
outparam <- "cluster2_params.txt"
outputModelParameters(convergeParams, results, outparam)

#### OUTPUT SEGMENTS TO FILE ####
outseg <- "cluster2_segs.txt"
outigv <- "cluster2.seg"
segs <- outputTitanSegments(results, id = "test", convergeParams, 
  filename = outseg, igvfilename = outigv)
# segment results also stored in data.frame "segs"
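
Because the results are written as tab-delimited text, a file can be read back into R with base functions. The sketch below does not call TitanCNA; it writes a toy table to a temporary file and reads it back, on the assumption that the real output file has a single header row. Forcing `Chr` to character keeps values such as "1" and "X" in one consistent type.

```r
# Toy table standing in for a results file (hypothetical values).
toy <- data.frame(
  Chr = c("1", "X"),
  Position = c(10500L, 30900L),
  LogRatio = c(-0.12, 0.35),
  stringsAsFactors = FALSE
)

# Write as tab-delimited text with a header row and no row names.
path <- tempfile(fileext = ".txt")
write.table(toy, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back, keeping chromosome names as character strings.
readback <- read.delim(path, colClasses = c(Chr = "character"))
unlink(path)
print(readback)
```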