Segmentation and classification of copy number profiles
Description
Takes in a copy number profile and segments it into predicted regions of equal copy number, and assigns a biologically motivated copy number state to each region using a Hidden Markov Model (HMM). This is an extension to the HMM described in Shah et al., 2006.
Usage
1 2  HMMsegment(correctOut, param = NULL, autosomes = NULL,
maxiter = 50, getparam = FALSE, verbose = TRUE)

Arguments
correctOut 
Output value from 
param 
If none is provided, will generate a reasonable set of parameters based on the input data, which can optionally be returned for inspection and manual adjustment by setting 'getparam' to TRUE. See Details for more information on parameters. A matrix with parameters values in columns for each state in rows:

autosomes 
Array of LOGICAL values corresponding to the 'chr' argument where an element is TRUE if the chromosome is an autosome, otherwise FALSE. If not provided, will automatically set the following chromosomes to false: "X", "Y", "23", "24", "chrX", chrY", "M", "MT", "chrM". 
maxiter 
The maximum number of iterations allows for the MaximumExpectation algorithm, reduce to decrease running time at the expense of robustness. 
getparam 
If TRUE, generates and returns parameters without running segmentation. 
verbose 
Set to FALSE if messages are not desired 
Details
HMMsegment
is a two stage algorithm that first runs an
ExpectationMaximization algorithm to find the optimal set of parameters
based on suggested parameter inputs, and allowed flexibilities. After
iteratively finding the optimal parameters, the actual segmentation of the
data is conducted with the Viterbi algorithm, finally output segmented
states. This is an extension to the hidden Markov model described in Shah
et al., 2006.
Parameters are divided into two main categories:
Initial parameters: e, mu, lambda, nu, kappa
Flexibility parameters: strength, m, eta, gamma, S
Where initial parameters are treated as starting suggestions for the parameter optimization algorithm, and flexibility parameters (hyperparameters) define how much the initial parameters are allowed to deviate during the search for the optimal parameters.
With a good copy number dataset, in theory, given enough flexibility, the algorithm should always find a similar set of optimal parameters regardless of initial parameters (although running times will vary).
If for some reason you wish to manually set the parameters for the final segmentation process, one should tune all flexibility parameters to minimal values. For example, if you wish to increase the length of segments, you could set:
1 2 3  param$e < 0.9999999999999999
param$strength < 1e30

Which suggests that segments should be very long, and gives minimal to nonexistant flexibility to your suggestion.
See vignette for diagrammed example:
1 2  vignette("HMMcopy")

Value
A list object containing multiple values, although in practice only the state assigned to each copy number value in 'states' and the segments of nonoverlapping states in 'segs' are of interest.
By default, there are 6 states, which in a diploid sample corresponds to the following chromosomal copies and biological state:
 1
<=0 copies, homozogous deletion
 2
1 copy, heterozogous deletion
 3
2 copies, neutral
 4
3 copies, gain
 5
4 copies, amplification
 6
>=5 copies, high level amplification
The full list of output is as follows:
 states
The state assigned to each copy number value
 segs
Nonoverlapping segments and medians of each segment
 mus
Optimal median of of copy numbers in state
 lambda
Optimal precision of copy numbers in state
 pi
Optimal state distribution
 loglik
The likelihood values of each EM algorithm iteration
 rho
Posterior marginals (responsibilities) for each position and state
Author(s)
Daniel Lai, Gavin Ha, Sohrab Shah
References
Sohrab P Shah, Xiang Xuan, Ron J DeLeeuw, Mehrnoush Khojasteh, Wan L Lam, Raymond Ng, and Kevin P Murphy. Integrating copy number polymorphisms into array cgh analysis using a robust hmm. Bioinformatics, 22(14):e4319, Jul 2006.
See Also
correctReadcount
, to correct the readcounts prior to
segmentation and classification for better results.
Examples
1 2  data(tumour) # Load tumour_copy
tumour_segments < HMMsegment(tumour_copy)
