# PostHocGroups: Approximation of paired post-hoc results as mutually exclusive sets

## Description

One useful application is presenting the results of pairwise post-hoc tests of many groups in an easy-to-understand manner.

## Details

The dissimilarity matrix is either supplied explicitly as dissimilarity.matrix or inferred implicitly from the input post-hoc model.

In general, a dissimilarity matrix is not divisible into exclusive groups unless we accept some error. The algorithm therefore operates by trying to maximize the quality of partitioning (QoP).

The algorithm doesn't search through all possible ways to divide the item set: the number of candidate partitionings equals 2^n (and this value doesn't count recursively divided groups), so even for a relatively small number of post-hoc groups an exhaustive search would be computationally infeasible. Instead, the algorithm uses hierarchical clustering to do the heavy lifting. Details are in the algorithm section.

## Features

• The algorithm can output alternative results for the user to choose among.

• The algorithm can identify the case when a certain group of items is not different from all (or almost all) other exclusive groups, and output it separately. This can happen when performing post-hoc tests on a grouping variable that doesn't divide the population equally.

• The algorithm supports nested exclusive groups, i.e. it can identify the case when there is a subgroup of items that together are different from all other groups, but are not all similar to each other. Nested grouping can only improve the fit to the given dissimilarity matrix.

## The algorithm

Basically, the algorithm tries to generate as many meaningful partitionings of the items as possible and chooses the one that maximises the utility.

#### Utility of the partitioning

The partition to evaluate is converted into the Result Dissimilarity Matrix (RDM), in which an element is "0" when the two items belong to the same partition and "1" when they belong to different partitions. The utility of the partitioning is simply the sum of those elements of the original dissimilarity matrix whose corresponding RDM elements have value "0", minus the sum of those whose corresponding RDM elements have value "1".
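As an illustration, here is a Python sketch of that utility (not the package's R code; the 4-item matrix is made up and uses the binary convention introduced below, +1 = different, -1 = similar):

```python
# Hypothetical sketch of the utility of a partitioning (not the package's R code).
# diss[i][j]: +1 means items i and j are significantly different, -1 similar.
diss = [
    [ 0, -1,  1,  1],
    [-1,  0,  1,  1],
    [ 1,  1,  0, -1],
    [ 1,  1, -1,  0],
]

def utility(diss, labels):
    """Sum the dissimilarities of pairs in the same partition (RDM element 0)
    and subtract those of pairs in different partitions (RDM element 1)."""
    n = len(diss)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]          # RDM element is 0
            total += diss[i][j] if same else -diss[i][j]
    return total

print(utility(diss, [0, 0, 1, 1]))  # partition {items 0,1} vs {items 2,3}
```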

In the following description, for simplicity, I will assume that the dissimilarity matrix is binary, i.e. it tells whether two items are different from each other (value 1) or similar to each other (value -1).

I will also assume that the items are sorted according to the same measure the user used to calculate the dissimilarity matrix (usually the group mean, in the case of post-hoc tests).

#### The algorithm step-by-step

The algorithm tries very hard not to test all 2^n possible partitionings.

1. Pre-processing of the dissimilarity matrix

At the beginning the algorithm checks whether the user supplied a binary dissimilarity matrix (i.e. one containing only 2 distinct values: is equal/is not equal). If so, random noise is added. The noise is sufficiently small not to change the meaning of the matrix. This step is needed because many tied values in the dissimilarity matrix would lead to sub-optimal solutions.
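A minimal Python sketch of this pre-processing step (the noise scale, seeding, and binary check are assumptions; the package's actual R implementation may differ):

```python
import random

def jitter_if_binary(diss, scale=1e-6, seed=0):
    """Add tiny symmetric noise to a binary dissimilarity matrix so that
    ties are broken without changing which pairs count as different."""
    n = len(diss)
    off_diag = {diss[i][j] for i in range(n) for j in range(n) if i != j}
    if len(off_diag) > 2:
        return diss  # already non-binary: leave untouched
    rng = random.Random(seed)
    out = [row[:] for row in diss]
    for i in range(n):
        for j in range(i + 1, n):
            eps = rng.uniform(-scale, scale)
            out[i][j] += eps
            out[j][i] += eps  # keep the matrix symmetric
    return out
```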

2. Gathering all possible partitionings of the items

1. Getting the "smile vector": The key statistic for finding meaningful candidates for a common group is the "smile vector". It is the column-wise sum of the dissimilarity matrix (or row-wise: it doesn't matter, since the matrix is assumed symmetric), which can be interpreted as the number of significant differences for each item. The elements of the "smile vector" correspond to the items that are going to be partitioned. Usually the edge items (those with the minimal or maximal value) tend to be more dissimilar to the rest of the items than items whose value is closer to the median. If the vector is plotted, it usually has a "U" shape, hence its name: the "smile vector".
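For instance (a Python sketch with a made-up binary matrix of five items already sorted by group mean), the smile vector is just the column sums:

```python
def smile_vector(diss):
    # column-wise sums; the matrix is symmetric, so rows would do as well
    n = len(diss)
    return [sum(diss[i][j] for i in range(n)) for j in range(n)]

# Five sorted items: each item differs (+1) from items at least two
# positions away and is similar (-1) to its immediate neighbours.
diss = [[0 if i == j else (1 if abs(i - j) >= 2 else -1)
         for j in range(5)] for i in range(5)]

print(smile_vector(diss))  # → [2, 0, 0, 0, 2]: the "U" shape
```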

2. Correcting the "smile vector" for its smile: [This is an optional step. It is performed only at the top recursion level, and only if the user asks for it.] To offset the "U" shape of the "smile vector", the algorithm can fit a second-degree multiplicative "penalty" polynomial of the form

p(x) = A \left( \frac{2 x^2}{(n-1)^2} - \frac{2 x}{n-1} + 1 \right).

The polynomial is chosen so that its second derivative is always positive, and p(0)=A, p(n-1)=A and p(\frac{n-1}{2})=\frac{A}{2}. In other words, the penalty makes sure that significant differences between items on the far ends of the item list are half as pronounced as differences in the middle of the set. After fitting the penalty's only parameter (A), the algorithm multiplies all elements of the smile vector by the fitted function to get the corrected smile vector.
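The stated properties of the penalty polynomial can be checked directly (Python sketch; the values of A and n are arbitrary):

```python
def penalty(x, n, A=1.0):
    # p(x) = A * (2x^2/(n-1)^2 - 2x/(n-1) + 1): a convex parabola
    return A * (2 * x**2 / (n - 1)**2 - 2 * x / (n - 1) + 1)

n, A = 9, 2.0
assert penalty(0, n, A) == A                             # p(0) = A
assert penalty(n - 1, n, A) == A                         # p(n-1) = A
assert abs(penalty((n - 1) / 2, n, A) - A / 2) < 1e-12   # p((n-1)/2) = A/2
```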

3. Partition the smile vector into the common group and the items that will be further partitioned.

The common group is a group of items that are not dissimilar to any other items in the set. Items in this group have a very low sum of dissimilarities to the other items. The algorithm guesses the common group by assigning to it the first i \in (0,n) elements of the smile vector sorted in ascending order.
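Enumerating the common-group candidates can be sketched like this (Python; the smile values are made up and the exact prefix range is an assumption):

```python
smile = [7.1, 0.9, 1.2, 6.8, 1.0]   # made-up corrected smile vector

# Sort item indices by smile value (ascending): items with few
# significant differences are the best common-group candidates.
order = sorted(range(len(smile)), key=lambda i: smile[i])
print(order)  # → [1, 4, 2, 3, 0]

# Each prefix of the sorted order is one candidate common group;
# the remaining items are passed on to hierarchical clustering.
candidates = [(order[:i], order[i:]) for i in range(len(smile))]
```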

4. Perform cluster analysis on the items that are left

Partitioning is performed by hierarchical clustering (hclust), using the dissimilarity matrix directly.

The algorithm gathers for evaluation all partitionings, one for each distinct "height" of the hierarchical clustering.
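A pure-Python sketch of this gathering step, substituting single-linkage agglomeration for R's hclust (the linkage method is an assumption): each merge happens at one distinct height, so one candidate partitioning is recorded per merge.

```python
def partitions_per_height(diss):
    """Agglomerate clusters (single linkage) and record the partitioning
    reached after each merge, i.e. at each distinct tree height."""
    n = len(diss)
    clusters = [{i} for i in range(n)]
    recorded = [[sorted(c) for c in clusters]]  # height 0: all singletons
    while len(clusters) > 1:
        # find the closest pair of clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(diss[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]
        recorded.append([sorted(c) for c in clusters])
    return recorded

print(partitions_per_height([[0, 1, 4], [1, 0, 5], [4, 5, 0]]))
# → [[[0], [1], [2]], [[0, 1], [2]], [[0, 1, 2]]]
```

The jitter added in the pre-processing step makes tied heights unlikely, so recording one partitioning per merge matches recording one per distinct height.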

3. If the final recursion level is reached, return all alternatives together with their utility (quality).

4. Otherwise, prepare for efficient nested partitioning of all eligible groups.

The algorithm is complicated here by the need to restate the problem in a manner compatible with the main function, and to make sure we don't launch the recursion twice for the same set of items.

adamryczkowski/PostHocGroups documentation built on June 3, 2017, 12:54 p.m.