knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Fused-ANOVA is a penalized method that solves the one-way ANOVA problem by collapsing the coefficients of K conditions. It reconstructs a balanced tree structure between the condition with a homotopy algorithm.
For a class of weights implemented here, our homotopy algorithm is in K log(K). These weights induce a balanced tree structure and simplify the interpretation of the results. The package contains an illustrating phenotypic data set: given a trait, we reconstruct a balanced tree structure and assess its agreement with the known phylogeny. More in the vignette.
The optimization problem solved by fused-ANOVA is \begin{equation} \hat{\beta}{\lambda} = \arg \min{\beta} \left{\sum_{k=1}^K \sum_{i=1}^{n_k} \left(Y_{ik}-\beta_k \right)^2 + \lambda \sum_{k,\ell} w_{k\ell} \left|\beta_k - \beta_\ell \right|\right} \end{equation} where $Y_{ik}$ is the intensity of a continuous random variable for sample $i$ in condition $k$ and $\beta_k$ is the mean parameter of condition $k$. We denote by $K$ the total number of conditions and $n_k$ the number of sample in each condition.
We propose various weights in the fused-penalty (entailing "laplace", "gaussian", "adaptive" - see the corresponding documentation) for which the homotopy algorithm produces a path that contains no split, which is highly desirable since in this case - the order of the \eqn{\beta_k}{beta_k} always matches the order of the empirical mean of each condition; - the recovered structure is a tree which simplifies the interpretation; - the total number of iterations is guaranteed to be small and equal to \eqn{K}{K}; - we avoid maximum flow problems whose resolution is computationally demanding.
Letting $\bar{y}k = \sum{i=1}^{n_k} y_{ik} / n_k$ be the initial means in group $k$ (also equal to $\beta_k^0$), we define the three set of weights as follows: \begin{equation} \begin{aligned} w_{k\ell}^{\text{laplace}} & = n_k n_\ell \exp{-\gamma |\bar{y}k - \bar{y}\ell| } \K w_{k\ell}^{\text{Gaussian}} & = n_k n_\ell \exp{-\gamma (\bar{y}k - \bar{y}\ell) ^2 } \ w_{k\ell}^{\text{adaptive}} & = \frac{n_k n_\ell}{|\bar{y}k - \bar{y}\ell|^\gamma} \ \end{aligned} \end{equation}
For the "laplace" weight, the associated algorithm is in $\mathcal{O}(K\log(K))$ thanks to a recursion property allowing the computation of the K^2 weights in only $K$ opeartions.
Most of the numerical work is done in C++
, relying on the Rcpp package.
This aves data set gives the birth weight of for 40 bird families classified in 15 orders and regrouping a total of 184 individuals. It comes from An Age: The Animal Ageing and Longevity Database.
The corresponding data frame contains 184 rows with three columns: "weight" (the weight in grammes), "family" (a factor with 40 levels) and "order" (a factor with 15 levels):
library(univarclust) data(aves) knitr::kable(head(aves))
By default, $\gamma = 0$ and so all weights reduce to $n_k n_\ell$ and are just equivalent.
default <- clusterpath_l1(aves$weight, aves$order)
If not specify, the weighting scheme is Laplace. As can be seen, using a positive $\gamma$ tends to fused quicker the groups which were closer at the start-up of the path.
laplace <- clusterpath_l1(aves$weight, aves$order, gamma = .01)
Gaussian weights provide a nice interpretation but do not guarante low complexity.
gaussian <- clusterpath_l1(aves$weight, aves$order, "gaussian", gamma = 1e-4)
The adaptive-weigths correspond to the Cas-ANOVA approach by Bondel and Reich, that do not enjoy a low computational complexity but the same kind of statistical guarantees as the lapace weights.
adaptive <- clusterpath_l1(aves$weight, aves$order, "adaptive", gamma = exp(.1))
An output of clusterpath_l1 has class "hclust" so you can rely on the available methods for this kind of object:
plot(as.dendrogram(default), type = "triangle", main = "default weights") plot(as.dendrogram(laplace), type = "triangle", main = "Laplace weights") plot(as.dendrogram(gaussian), type = "triangle", main = "Gaussian weights") plot(as.dendrogram(adaptive), type = "triangle", main = "adaptive weights") cutree(adaptive, c(2, 6))
Package ape provides more method for plotting dendrogram in a phylogenetic perpective :
library(ape) fa <- clusterpath_l1(aves$weight, gamma = 1e-4) plot(as.phylo(fa), type="fan", tip.color = as.numeric(aves$family))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.