Description Problem solved Choice of weights and performance of the algorithm Efficient cross-validation procedure Technical remarks Author(s) References

This package is designed to fit accurately the Fused-ANOVA
model, a penalized method to solve the one-way ANOVA
problem by collapsing the coefficients of *K*
conditions. For a large class of weights implemented here,
our homotopy algorithm is in
*O(klog(K))*. These weights
induce a balanced tree structure and simplify the
interpretation of the results. The package contains an
illustrating phenotypic data set: given a trait, we
reconstruct a balanced tree structure and assess its
agreement with the known phylogeny.

The optimization problem solved by fused-ANOVA is

where *Y_ik* is the intensity of a continuous
random variable for sample *i* in condition
*k* and *beta_k* is the mean
parameter of condition *k*. We denote by
*K* the total number of conditions and
*n_k* the number of sample in each condition.

For various weights in the fused-penalty (entailing "laplace", "gaussian", "default", "adaptive" - see the corresponding documentation), the homotopy algorithm produces a path that contains no split, which is highly desirable since in this case

the order of the

*beta_k*always matches the order of the empirical mean of each condition;the recovered structure is a tree which simplifies the interpretation;

the total number of iterations is guaranteed to be small and equal to

*K*;we avoid maximum flow problems whose resolution is computationally demanding.

The associated algorithm is in
*O(klog(K))*. In this
perspective, we extend the work of Hocking et al. to a
larger class of weights.

For other weights, split can occur along the path of solution. We adapted the algorithm developed by Hoefling (reference below) to the fused-ANOVA problem.

We provide a fast cross validation (CV) procedure to
select *lambda* for both the general and the
no split algorithms. The idea behind this procedure is
to take advantage of the DAG structure of the path of
solutions along *lambda*. Rather than
computing the CV error for each condition separately, we
traverse each edge of the DAG once and only once and
compute simultaneously the error of all conditions going
through this edge. If we consider a perfectly balanced
tree and a grid of *P* values of
*lambda* we achieve *O(P log (P))* rather than a
*O(P^2)* complexity.

Most of the numerical work is done in C++, relying on the Rcpp package. We also use the multi-core capability of the computer through the parallel package when multiple variables are to be classified. This feature is not available for Windows user, though.

Pierre Gutierrez, Julien Chiquet, Guillem Rigaill.

Fused-ANOVA: shortly coming

H. Hoefling. A path algorithm for the fused lasso signal approximator, technical report, arXiv, 2010.

T. Hocking, J.-P. Vert, F. Bach, and A. Joulin. Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties, ICML, 2011.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.