fusedanova-package: Fused-ANOVA package: general presentation

Description Problem solved Choice of weights and performance of the algorithm Efficient cross-validation procedure Technical remarks Author(s) References

Description

This package is designed to fit accurately the Fused-ANOVA model, a penalized method to solve the one-way ANOVA problem by collapsing the coefficients of K conditions. For a large class of weights implemented here, our homotopy algorithm is in O(klog(K)). These weights induce a balanced tree structure and simplify the interpretation of the results. The package contains an illustrating phenotypic data set: given a trait, we reconstruct a balanced tree structure and assess its agreement with the known phylogeny.

Problem solved

The optimization problem solved by fused-ANOVA is

βhat λ1 = argminβ sumk sum_i (Yik - &betak)2 + λ sumk,l wk,l | βk - βl |,

where Y_ik is the intensity of a continuous random variable for sample i in condition k and beta_k is the mean parameter of condition k. We denote by K the total number of conditions and n_k the number of sample in each condition.

Choice of weights and performance of the algorithm

For various weights in the fused-penalty (entailing "laplace", "gaussian", "default", "adaptive" - see the corresponding documentation), the homotopy algorithm produces a path that contains no split, which is highly desirable since in this case

  1. the order of the beta_k always matches the order of the empirical mean of each condition;

  2. the recovered structure is a tree which simplifies the interpretation;

  3. the total number of iterations is guaranteed to be small and equal to K;

  4. we avoid maximum flow problems whose resolution is computationally demanding.

The associated algorithm is in O(klog(K)). In this perspective, we extend the work of Hocking et al. to a larger class of weights.

For other weights, split can occur along the path of solution. We adapted the algorithm developed by Hoefling (reference below) to the fused-ANOVA problem.

Efficient cross-validation procedure

We provide a fast cross validation (CV) procedure to select lambda for both the general and the no split algorithms. The idea behind this procedure is to take advantage of the DAG structure of the path of solutions along lambda. Rather than computing the CV error for each condition separately, we traverse each edge of the DAG once and only once and compute simultaneously the error of all conditions going through this edge. If we consider a perfectly balanced tree and a grid of P values of lambda we achieve O(P log (P)) rather than a O(P^2) complexity.

Technical remarks

Most of the numerical work is done in C++, relying on the Rcpp package. We also use the multi-core capability of the computer through the parallel package when multiple variables are to be classified. This feature is not available for Windows user, though.

Author(s)

Pierre Gutierrez, Julien Chiquet, Guillem Rigaill.

References

Fused-ANOVA: shortly coming

H. Hoefling. A path algorithm for the fused lasso signal approximator, technical report, arXiv, 2010.

T. Hocking, J.-P. Vert, F. Bach, and A. Joulin. Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties, ICML, 2011.


fusedanova documentation built on May 31, 2017, 1:38 a.m.