findviews detects and plots groups of mutually dependent columns.
It is built on Shiny and ggplot2.
Data frame or matrix to be processed
Maximum number of columns in the views. If set to
Character describing a clustering method, used internally
Optional Shiny parameters, used in Shiny's
findviews takes a data frame or a matrix as input. It
computes the pairwise dependency between the columns, detects clusters in the
resulting structure and displays the results with a Shiny app.
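The idea of turning pairwise dependencies into groups of columns can be illustrated with a small Python sketch. findviews itself is written in R and relies on hclust; this is not its code. The sketch assumes, hypothetically, that a dependency matrix is converted to distances as 1 - |dependency| and that the number of clusters k is given in advance. Both choices, and the names complete_linkage and views_from_dependency, are illustrative assumptions. Complete linkage is used because it is the default method of R's hclust.

```python
from itertools import combinations


def complete_linkage(dist, k):
    """Group items 0..n-1 into k clusters by complete-linkage agglomeration.

    dist is a symmetric n x n list of lists of pairwise distances.
    """
    # Start with every item in its own cluster.
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        # Complete linkage: the distance between two clusters is the largest
        # distance between a member of one and a member of the other.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > k:
        # Find the closest pair of clusters (p < q) and merge q into p.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: cluster_distance(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]


def views_from_dependency(dep, k):
    # Hypothetical transform: strongly dependent columns (|dep| near 1) are close.
    dist = [[1.0 - abs(x) for x in row] for row in dep]
    return complete_linkage(dist, k)


# Columns 0-1 and columns 2-3 are strongly dependent within each pair.
dep = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
print(views_from_dependency(dep, 2))  # [[0, 1], [2, 3]]
```

Each resulting group corresponds to one candidate view: a set of columns that depend on each other more than on the rest of the data.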
findviews processes numerical and categorical data separately. It excludes
the columns with only one value, the columns in which all the values are
distinct (e.g., primary keys), and the columns with more than 75% missing values.
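The three exclusion rules can be expressed as a per-column filter. This Python sketch is illustrative only and is not findviews' implementation. It assumes that missing values are encoded as None or NaN, that the "one value" and "all distinct" checks ignore missing entries, and that a column with exactly 75% missing values is kept; keep_column is a hypothetical name.

```python
import math


def is_missing(v):
    # Assumption: None and float NaN both count as missing.
    return v is None or (isinstance(v, float) and math.isnan(v))


def keep_column(values, max_missing=0.75):
    """Return True if a column survives the three exclusion rules."""
    n = len(values)
    present = [v for v in values if not is_missing(v)]
    # Rule: exclude columns with more than 75% missing values.
    if n == 0 or (n - len(present)) / n > max_missing:
        return False
    distinct = len(set(present))
    # Rule: exclude columns with only one value.
    if distinct <= 1:
        return False
    # Rule: exclude columns in which all values are distinct (e.g., primary keys).
    if distinct == len(present):
        return False
    return True


table = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],                   # all distinct
    "const": ["a"] * 8,                               # one value
    "mostly_missing": [1] + [None] * 7,               # 87.5% missing
    "group": ["x", "y"] * 4,                          # kept
    "score": [1.5, 2.0, 1.5, 3.0, None, 2.0, 1.5, 3.0],  # kept
}
kept = [name for name, col in table.items() if keep_column(col)]
print(kept)  # ['group', 'score']
```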
findviews computes the dependency between columns differently
depending on their type. It uses Pearson's correlation coefficient for
numerical data and Cramér's V for categorical data.
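Both dependency measures can be computed from first principles. In this Python sketch, pearson is the sample correlation coefficient and cramers_v is Cramér's V without bias correction, derived from the chi-squared statistic of the contingency table as V = sqrt(chi2 / (n * (min(r, c) - 1))). Whether findviews applies a bias correction, or uses the absolute value of the correlation, is not stated here, so treat those details as assumptions.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Constant columns would divide by zero; they are excluded upstream.
    return cov / (sx * sy)


def cramers_v(x, y):
    """Cramér's V (no bias correction) of two equal-length categorical lists."""
    n = len(x)
    observed = Counter(zip(x, y))
    row_tot, col_tot = Counter(x), Counter(y)
    chi2 = 0.0
    for r in row_tot:
        for c in col_tot:
            # Expected count under independence of x and y.
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot))
    # k == 1 means a single-valued column, which is excluded upstream.
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"]))  # 1.0
print(cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"]))  # 0.0
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), while Pearson's coefficient ranges from -1 to 1, so a sign-insensitive transform is needed if both are to be compared on one scale.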
To cluster the columns, findviews uses the function hclust, R's
implementation of agglomerative hierarchical clustering. The parameter
clust_method specifies which flavor of agglomerative clustering to use.
The number of clusters is determined by the