Views of a multidimensional dataset.

Description

findviews detects and plots groups of mutually dependent columns. It is built on Shiny and ggplot2.

Usage

findviews(data, view_size_max = NULL, clust_method = "complete", ...)

Arguments

data

A data frame or matrix to process.

view_size_max

Maximum number of columns in each view. If NULL (the default), findviews uses log2(ncol(data)), rounded up and capped at 5.
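The default rule can be written as a one-line R function. This is a sketch of the rule as stated above, not the package's own code. default_view_size_max is a hypothetical name, and the sketch assumes ncol(data) counts the columns as passed in, before any columns are excluded.

```r
# Hypothetical helper: documented default for view_size_max,
# i.e. log2 of the column count, rounded up, capped at 5.
default_view_size_max <- function(data) {
  min(ceiling(log2(ncol(data))), 5)
}

default_view_size_max(mtcars)  # mtcars has 11 columns; ceiling(log2(11)) = 4
```

With 100 columns, ceiling(log2(100)) is 7, so the cap of 5 applies.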

clust_method

Character string naming the clustering method, passed to hclust as its method argument. Example values are "complete", "single" and "average".

...

Optional arguments passed to Shiny's runApp function (for example, port).

Details

The function findviews takes a data frame or a matrix as input. It computes the pairwise dependency between the columns, detects clusters in the resulting structure and displays the results with a Shiny app.

findviews processes numerical and categorical data separately. It excludes columns that contain only one value, columns in which all values are distinct (e.g., primary keys), and columns with more than 75% missing values.
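The exclusion rules can be sketched as a column filter in base R. The helpers keep_column and filter_columns are hypothetical. The sketch applies all three rules to every column and counts distinct values among the non-missing entries only; findviews itself may apply the rules per column type or treat missing values differently.

```r
# Hypothetical filter mirroring the three exclusion rules described above.
keep_column <- function(x, max_na = 0.75) {
  if (sum(is.na(x)) / length(x) > max_na) return(FALSE)  # > 75% missing
  values <- x[!is.na(x)]
  n_distinct <- length(unique(values))
  if (n_distinct <= 1) return(FALSE)                     # a single value
  if (n_distinct == length(values)) return(FALSE)        # all distinct (key-like)
  TRUE
}

filter_columns <- function(data) {
  data <- as.data.frame(data)
  data[, vapply(data, keep_column, logical(1)), drop = FALSE]
}

df <- data.frame(
  id        = 1:6,                                 # all distinct: dropped
  const     = rep(1, 6),                           # one value: dropped
  grp       = c("a", "a", "b", "b", "c", "c"),     # kept
  mostly_na = c(1, NA, NA, NA, NA, NA),            # 5/6 missing: dropped
  y         = c(1.5, 2, 1.5, 3, 2, 3)              # kept
)
names(filter_columns(df))
```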

findviews measures the dependency between columns differently depending on their type: Pearson's correlation coefficient for numerical columns, and Cramér's V for categorical columns.
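Both measures can be computed in base R. This is a minimal sketch that assumes pairwise handling of missing values. cor() with use = "pairwise.complete.obs" gives the Pearson matrix, and cramers_v is a hypothetical helper computing V = sqrt(chi2 / (n (k - 1))) from the chi-squared statistic, where k is the smaller dimension of the contingency table.

```r
# Hypothetical helper: Cramér's V between two categorical vectors.
cramers_v <- function(x, y) {
  tab <- table(x, y)                 # contingency table; NA pairs are dropped
  n <- sum(tab)
  k <- min(dim(tab))                 # smaller of #rows and #columns
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (n * (k - 1))))
}

# Numerical columns: Pearson correlation matrix.
num_dep <- cor(mtcars[, c("mpg", "wt", "hp")], use = "pairwise.complete.obs")

# Categorical columns: pairwise Cramér's V matrix.
cat_cols <- data.frame(
  cyl  = factor(mtcars$cyl),
  gear = factor(mtcars$gear),
  am   = factor(mtcars$am)
)
cat_dep <- sapply(cat_cols, function(a) sapply(cat_cols, function(b) cramers_v(a, b)))
round(cat_dep, 2)
```

Both matrices are symmetric, and Cramér's V of a column with itself is 1, so the two can be clustered the same way.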

To cluster the columns, findviews uses hclust, R's implementation of agglomerative hierarchical clustering. The parameter clust_method selects the linkage method that hclust uses. The number of clusters is determined by view_size_max.
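The clustering step can be sketched with hclust and cutree. Two parts of this sketch are assumptions, not documented behavior: the distance 1 - |dependency| between columns, and the rule that picks the smallest number of clusters whose largest group has at most view_size_max columns. cluster_columns is a hypothetical name, and the sketch handles numerical columns only.

```r
# Hypothetical sketch: group numerical columns into views with hclust.
cluster_columns <- function(data, view_size_max = 4, clust_method = "complete") {
  # Assumed distance: strongly dependent columns are close together.
  dep <- abs(cor(data, use = "pairwise.complete.obs"))
  tree <- hclust(as.dist(1 - dep), method = clust_method)

  # Assumed rule: smallest k such that no cluster exceeds view_size_max.
  # Terminates because k = ncol(data) gives clusters of size 1.
  k <- 1
  groups <- cutree(tree, k = k)
  while (max(table(groups)) > view_size_max) {
    k <- k + 1
    groups <- cutree(tree, k = k)
  }
  split(colnames(data), groups)      # one character vector per view
}

views <- cluster_columns(mtcars, view_size_max = 4)
views
```

Changing clust_method to "single" or "average" changes only the linkage passed to hclust; the rest of the sketch stays the same.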

Examples

## Not run: 
findviews(mtcars)
findviews(mtcars, view_size_max = 4, port = 7000)

## End(Not run)