Views of a multidimensional dataset, ranked by their differentiation power.

Description

findviews_to_compare detects views on which two arbitrary sets of rows differ. It plots the results with ggplot and Shiny.

Usage

findviews_to_compare(group1, group2, data, view_size_max = NULL,
  clust_method = "complete", ...)

Arguments

group1

Logical vector of size nrow(data), which describes the first group to compare. The value TRUE at position i indicates that the i-th row of data belongs to the group.

group2

Logical vector, which describes the second group to compare. The value TRUE at position i indicates that the i-th row of data belongs to the group.

data

Data frame or matrix to be processed

view_size_max

Maximum number of columns in the views. If set to NULL, findviews uses log2(ncol(data)), rounded upwards and capped at 5.

clust_method

Character describing a clustering method, used internally by hclust. Example values are "complete", "single" or "average".

...

Optional Shiny parameters, used in Shiny's runApp function.
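
As a rough illustration of the view_size_max default described above, the rule can be reproduced by hand. This is a sketch based on the argument description only, not the package's internal code; default_view_size is a hypothetical helper name.

```r
# Hypothetical helper: mirrors the documented default for view_size_max,
# i.e. log2 of the number of columns, rounded upwards and capped at 5.
default_view_size <- function(data) {
  min(ceiling(log2(ncol(data))), 5)
}

# mtcars has 11 columns: log2(11) is about 3.46, which rounds up to 4.
default_view_size(mtcars)
```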

Details

The function findviews_to_compare takes two groups of rows as input and detects views on which the statistical distributions of those two groups differ.

To detect the set of views, findviews_to_compare removes the rows that belong to neither group and applies findviews to the remaining rows.
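
The row-filtering step can be pictured as follows. This is a minimal sketch that assumes the filter is a plain logical OR of the two group vectors; the variable names are illustrative, and the package may implement the step differently.

```r
# Two groups of rows that do not cover the whole data set.
group1 <- mtcars$mpg >= 20
group2 <- mtcars$mpg < 20 & mtcars$cyl == 8

# Keep only the rows that belong to at least one group.
in_either <- group1 | group2
data_kept <- mtcars[in_either, , drop = FALSE]

# Restrict the group vectors to the kept rows so they stay aligned.
group1_kept <- group1[in_either]
group2_kept <- group2[in_either]
```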

To evaluate the differentiation power of the views, findviews computes the histograms of the two groups to be compared and measures their dissimilarity with the Euclidean distance.
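
One way to picture this scoring step for a single numeric column is sketched below. The choice of 10 equal-width bins, the normalization to relative frequencies, and the helper name hist_distance are assumptions made for illustration; the package's own binning and its handling of multi-column views may differ.

```r
# Hypothetical helper: Euclidean distance between the histograms of a
# numeric column x, computed separately for two groups of rows.
hist_distance <- function(x, group1, group2, n_bins = 10) {
  # Shared breaks so that both histograms use the same bins.
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  h1 <- hist(x[group1], breaks = breaks, plot = FALSE)$counts
  h2 <- hist(x[group2], breaks = breaks, plot = FALSE)$counts
  # Relative frequencies make groups of different sizes comparable
  # (an assumption of this sketch).
  p1 <- h1 / sum(h1)
  p2 <- h2 / sum(h2)
  sqrt(sum((p1 - p2)^2))
}

g1 <- mtcars$mpg >= 20
g2 <- mtcars$mpg < 20
hist_distance(mtcars$hp, g1, g2)    # distance on horsepower
hist_distance(mtcars$qsec, g1, g2)  # distance on quarter-mile time
```

Under these assumptions, a larger distance means that the two groups are distributed more differently on that column, so the column is a stronger candidate for a view.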

This method is loosely based on the following paper:

Fast, Explainable View Detection to Characterize Exploration Queries
Thibault Sellam, Martin Kersten
SSDBM, 2016

Examples

## Not run: 
findviews_to_compare(mtcars$mpg >= 20, mtcars$mpg < 20, mtcars)

## End(Not run)