
MultiPattern


MultiPattern is an R package for discovery of multiple patterns in data.

Overview

Unsupervised exploration of data is an open-ended task. Consider, for example, the task of clustering the following two toy datasets in two dimensions.

Figure: two toy datasets in two dimensions

Each dataset has a natural grouping of its points: four clusters in the first case and six in the second. Many machine learning algorithms, for example hierarchical clustering, can identify these groups.
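As a rough illustration of this point, the sketch below applies base R hierarchical clustering to synthetic data. The data, the group centers, and the choice of average linkage are assumptions made for the example; they are not the toy datasets from the figure, and the code does not use the MultiPattern API.

```r
# Synthetic stand-in for a four-group dataset (hypothetical, not the
# package's own toy data): four well-separated groups in two dimensions.
set.seed(1)
centers <- rbind(c(0, 0), c(0, 10), c(10, 0), c(10, 10))
truth <- rep(1:4, each = 25)
xy <- centers[truth, ] + matrix(rnorm(200, sd = 0.5), ncol = 2)

# Hierarchical clustering from base R (stats package), cut into four groups.
tree <- hclust(dist(xy), method = "average")
groups <- cutree(tree, k = 4)

# Cross-tabulate the recovered groups against the generating groups.
print(table(truth, groups))
```

When the groups are this well separated, each generating group should map onto a single recovered cluster.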

But suppose that we need to assign the points to a maximum of two or three categories. Although this is also a clustering task, it does not have a unique or natural solution. Machine learning algorithms that produce a single output, such as hierarchical clustering, can provide one suggestion but inherently cannot provide a complete assessment of the task at hand.
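One way to see the ambiguity is to count the options. Assuming a two-category solution is formed by merging the four natural groups of the first dataset (a simplification for illustration, not a description of how MultiPattern works), the following base R sketch lists every possible merge.

```r
# Enumerate every way to merge four natural groups into exactly two
# non-empty categories. The group labels are hypothetical.
n_groups <- 4

# Fix group 1 in category 1 so that mirror-image labelings are not
# counted twice; the remaining groups may go to category 1 or 2.
codes <- expand.grid(rep(list(1:2), n_groups - 1))
assignments <- cbind(1, as.matrix(codes))

# Drop the assignment that leaves category 2 empty.
assignments <- assignments[rowSums(assignments == 2) > 0, , drop = FALSE]
colnames(assignments) <- paste0("group", seq_len(n_groups))

# Each row is one candidate two-category partition.
print(assignments)
print(nrow(assignments))
```

There are seven such merges, and nothing in the data alone singles out one of them. A method that returns a single clustering reports at most one.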

The MultiPattern package provides a framework for exploring data in a systematic manner and is particularly suited to describing ambiguous situations such as these examples. For the first dataset, the output of a multi-pattern analysis may be the series of clusterings below (each using two colors).

Figure: multiple patterns in a four-group dataset

For the second dataset, the output may be as follows (each clustering using three colors).

Figure: multiple patterns in a six-group dataset

Each of the suggested partitions reveals a reasonable pattern. Follow-up analyses can then either exploit one of these patterns or integrate information from several of them.

Documentation

The package documentation and vignette contain details about package usage.

Development

The package code is available under a GPL-2 license.

Parts of the package use third-party components. These packages should be installed before running MultiPattern.



tkonopka/MultiPattern documentation built on May 31, 2019, 3:45 p.m.