GFL: Solve Generalized Fused Lasso Model



Description

This function uses the MM algorithm to fit a piece-wise constant curve to each of the signal sequences.

Usage

GFL(Y, Delta, sigma, rho1 = 1, rho2 = 2, rho3 = 0, 
    obj_c = NULL, max_iter = 1000, verbose = FALSE)

Arguments

Y

A matrix of original signals, where each column corresponds to a sequence and each row corresponds to a marker.

Delta

A matrix of the same dimension as Y. Each entry is associated with the entry of Y at the same location, indicating whether the value of the corresponding entry is regarded as missing or not. 1 = available, 0 = missing.

sigma

A vector of standard deviations for each column of Y. It should have the same length as the number of columns of Y.

rho1,rho2,rho3

Scaling factors for the tuning parameters lambda1, lambda2, and lambda3, respectively. See Details.

obj_c

Stopping criterion based on the size of the improvement of the objective function between iterations.

max_iter

Maximum number of MM iterations used to solve the GFL model.

verbose

Logical. Indicates whether to display intermediate diagnostic information. Default is FALSE (highly recommended).

Details

To fit a piece-wise constant curve to each of the signal sequences, the function minimizes the following objective function:

loss function + lambda1 * lasso penalty + lambda2 * fused lasso penalty + lambda3 * group fused lasso penalty
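
To make the four terms concrete, here is a minimal R sketch of how such an objective could be evaluated. It assumes textbook forms for each term (squared-error loss over observed entries, absolute values for the lasso and fused lasso penalties, and a Euclidean norm across sequences for the group fused lasso penalty). The function name gfl_objective_sketch and the way the per-sequence weights enter the group term are assumptions for illustration, not taken from the package source, which may for instance use smoothed versions of these penalties.

```r
# Sketch only: textbook forms of each term, NOT taken from the Piet source.
# Y, Delta, Beta: N x M matrices (markers in rows, sequences in columns).
# lambda1, lambda2, lambda3: length-M vectors, one value per sequence.
gfl_objective_sketch <- function(Y, Delta, Beta, lambda1, lambda2, lambda3) {
  # Squared-error loss over observed entries only (Delta == 1);
  # entries flagged as missing are zeroed so that NA cannot propagate.
  Y0 <- ifelse(Delta == 1, Y, 0)
  loss <- 0.5 * sum(Delta * (Y0 - Beta)^2)

  # Differences between neighbouring markers, column by column: (N - 1) x M
  D <- diff(Beta)

  lasso <- sum(sweep(abs(Beta), 2, lambda1, "*"))
  fused <- sum(sweep(abs(D), 2, lambda2, "*"))
  # Assumed group term: Euclidean norm across sequences of each weighted jump
  group <- sum(sqrt(rowSums(sweep(D, 2, lambda3, "*")^2)))

  loss + lasso + fused + group
}

# Toy input: two identical sequences with one jump, perfectly fitted
Beta <- matrix(c(0, 0, 1, 1, 0, 0, 1, 1), nrow = 4, ncol = 2)
Delta <- matrix(1, 4, 2)
value <- gfl_objective_sketch(Beta, Delta, Beta, c(1, 1), c(1, 1), c(1, 1))
```

In this toy case the loss is zero, so the value consists only of the three penalty terms; a jump shared by both sequences is charged once by the group term rather than once per sequence.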

The optimal solution is approached iteratively with the Majorization-Minimization (MM) algorithm (Lange, 2004). The suggested choices of the tuning parameters are:

λ_{1,i} = c_1 σ_i

λ_{2,i} = ρ(p) c_2 σ_i √(log N)

λ_{3,i} = [1-ρ(p)] c_3 σ_i √(pM) √(log N)

where σ_i is the noise level of sequence i, M is the number of sequences, N is the number of markers, and c_1, c_2, c_3, ρ, and p are properly chosen constants, which are absorbed into ρ_1, ρ_2, and ρ_3 (the arguments rho1, rho2, and rho3). See Zhang et al. (2012) for details.
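
The arithmetic of the three formulas can be sketched in R as follows. The helper gfl_lambdas_sketch and all constant values (c1, c2, c3, rho_p for ρ(p), and p) are hypothetical and chosen only to make the computation concrete; they are not defaults of the package, and GFL itself takes rho1, rho2, and rho3 rather than these constants.

```r
# Sketch only: evaluates the suggested tuning-parameter formulas directly.
# The function name and every constant below are illustrative assumptions.
gfl_lambdas_sketch <- function(sigma, N, c1, c2, c3, rho_p, p) {
  M <- length(sigma)  # number of sequences
  list(
    lambda1 = c1 * sigma,
    lambda2 = rho_p * c2 * sigma * sqrt(log(N)),
    lambda3 = (1 - rho_p) * c3 * sigma * sqrt(p * M) * sqrt(log(N))
  )
}

# Two sequences with noise levels 0.15 and 0.20, observed at 100 markers
lam <- gfl_lambdas_sketch(sigma = c(0.15, 0.20), N = 100,
                          c1 = 0.1, c2 = 2, c3 = 2, rho_p = 0.5, p = 0.5)
```

Each element of the returned list has one entry per sequence, so noisier sequences receive proportionally larger penalties.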

Value

All outputs are collected in a list:

obj

A vector of values of the objective function at each MM iteration.

Beta

A matrix of the same dimension as Y, recording the fitted piece-wise constant curve for each sequence. Each column corresponds to one sequence and each row represents one marker.
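
As a rough illustration of how these two outputs might be inspected, the sketch below locates level changes in a fitted matrix and measures the last change in the objective. The helper find_breaks, its tolerance, and the toy values standing in for res$Beta and res$obj are all assumptions made for illustration; they are not produced by GFL.

```r
# Hypothetical helper: row indices at which a fitted column changes level.
find_breaks <- function(beta, tol = 1e-8) {
  which(abs(diff(beta)) > tol) + 1
}

# Toy stand-in for res$Beta: 10 markers, 2 sequences.
# Sequence 1 has a raised segment at markers 5-7; sequence 2 is flat.
Beta <- cbind(c(rep(0, 4), rep(0.3, 3), rep(0, 3)),
              rep(0, 10))
breaks <- lapply(seq_len(ncol(Beta)), function(j) find_breaks(Beta[, j]))

# Toy stand-in for res$obj: size of the change over the final iteration
obj <- c(10, 4, 3.2, 3.19)
last_change <- abs(diff(tail(obj, 2)))
```

Here breaks[[1]] holds the first marker of each new segment in sequence 1, and last_change can be compared with the value passed as obj_c to judge whether the iterations had levelled off.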

Note

Y and Delta must be of class matrix. If only one signal sequence is to be analyzed, they should still be coerced to matrices with a single column.
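
A minimal sketch of preparing a single sequence with missing values is given below. The toy vector y is invented for illustration, and filling the missing entries of Y with 0 is an assumption (it presumes that entries flagged 0 in Delta are ignored by the fit); check the behaviour against the package before relying on it.

```r
# Toy single sequence with two missing values (illustrative data only)
y <- c(0.1, NA, 0.2, 0.4, NA, 0.3)

# Coerce to a one-column matrix, as required for Y
Y <- matrix(y, ncol = 1)

# Delta: 1 = available, 0 = missing, same dimension as Y
Delta <- matrix(as.numeric(!is.na(Y)), nrow = nrow(Y), ncol = ncol(Y))

# Assumption: a placeholder is acceptable where Delta == 0
Y[Delta == 0] <- 0

# Robust noise estimate from the observed values only
sigma <- mad(y, na.rm = TRUE)
```

The resulting Y, Delta, and sigma have the shapes described under Arguments: two 6 x 1 matrices and a vector of length 1.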

Author(s)

Zhongyang (Thomas) Zhang

References

  1. Kenneth Lange. (2004) Optimization. Springer, New York.

  2. Zhongyang Zhang, Kenneth Lange, Roel Ophoff, and Chiara Sabatti. (2010) Reconstructing DNA copy number by penalized estimation and imputation. The Annals of Applied Statistics, 4(4): 1749-1773.

  3. Zhongyang Zhang, Kenneth Lange, and Chiara Sabatti. (2012) Reconstructing DNA copy number by segmentation of multiple sequences. Submitted.

See Also

See FL for segmentation of only one sequence of signals.

Examples

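## Jointly segment 2 sequences of signals with 100 markers each.
## A duplication (markers 41-60) is superimposed on both sequences.
library(Piet)
set.seed(1)                          # make the simulated data reproducible
Y <- matrix(rnorm(200, 0, 0.15), 100, 2)
Y[41:60, ] <- rnorm(40, 0.3, 0.2)
Delta <- matrix(1, 100, 2)           # all entries are available
sigma <- apply(Y, 2, mad)            # robust noise estimate per sequence
res <- GFL(Y = Y, Delta = Delta, sigma = sigma, rho1 = 0.01, rho2 = 0.5 * 2,
           rho3 = 0.5 * 2, obj_c = 1e-4, max_iter = 1000, verbose = FALSE)
plot(res$Beta[, 1], type = "s")      # fitted curve for the first sequence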

Piet documentation built on May 31, 2017, 3:10 a.m.