The muHVT package is a collection of R functions for building topology-preserving maps of rich multivariate data, with an emphasis on big data, that is, data with a large number of rows. The functions are organized around the following typical workflow:
Data Compression: Vector quantization (VQ) or hierarchical vector quantization (HVQ), using means or medians. This step compresses the rows (long data frame) according to a compression objective.
Data Projection: Dimension projection of the compressed cells to 1D, 2D, and 3D with Sammon's nonlinear mapping algorithm. This step creates topology-preserving map coordinates in the desired output dimension.
Tessellation: Create the cells required for object visualization using the Voronoi tessellation method. The package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology-preserving map, and is useful for semi-supervised tasks.
Prediction: Score new data sets and record their assignment using the map objects from the above steps, in a sequence of maps if required.
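The compression and prediction steps above can be sketched in base R. This is only an illustration under stated assumptions: stats::kmeans stands in for the package's vector quantizer, and the helper assign_cell, the cell count, and the use of the iris data are hypothetical choices, not part of muHVT.

```r
# Sketch only: base R stand-in for the compression and scoring steps.
# stats::kmeans plays the role of the vector quantizer; this is not muHVT code.
set.seed(42)
train <- scale(as.matrix(iris[, 1:4]))   # normalize columns before quantizing

n_cells <- 10                            # hypothetical number of cells
vq <- kmeans(train, centers = n_cells, nstart = 5, iter.max = 50)

# Quantization error per cell: mean Euclidean distance of members to their centroid
dist_to_centroid <- sqrt(rowSums((train - vq$centers[vq$cluster, ])^2))
quant_error <- tapply(dist_to_centroid, vq$cluster, mean)

# Scoring: assign each new row to the cell with the nearest centroid
assign_cell <- function(new_rows, centroids) {
  unname(apply(new_rows, 1, function(r) {
    which.min(colSums((t(centroids) - r)^2))
  }))
}

new_cells <- assign_cell(train[1:5, , drop = FALSE], vq$centers)
```

New data would need to be scaled with the same column means and standard deviations as the training data before being passed to assign_cell.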
This package additionally provides functions for computing Sammon's projection and plotting the heat map of the variables on the tiles of the tessellations.
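As a rough illustration of Sammon's projection, the sketch below projects a set of cell centroids to 2D with MASS::sammon. It assumes the MASS package is installed; the centroids come from stats::kmeans on the iris data purely for demonstration, and the package's own projection functions may differ in detail.

```r
# Sketch only: Sammon's projection of cell centroids using MASS::sammon.
library(MASS)

set.seed(1)
centroids <- kmeans(scale(iris[, 1:4]), centers = 15, nstart = 5)$centers

d <- dist(centroids)                     # pairwise distances in the original space
proj <- sammon(d, k = 2, trace = FALSE)  # k is the output dimension

coords <- proj$points                    # one row of 2D map coordinates per cell
stress <- proj$stress                    # lower stress means distances are better preserved
```

The resulting coordinates can then serve as the generating points for a tessellation.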
The muHVT process involves three steps: compression, projection, and tessellation.
06th December, 2022
This package now additionally provides functionality to predict based on a set of maps to monitor entities over time.
The creation of a predictive set of maps involves four steps.
Let us try to understand the steps with the help of the diagram below:
knitr::include_graphics('./mlayer1.png')
Initially, the raw data is passed to the HVT function, which constructs a highly compressed map (Map A). The output of this function is hierarchically arranged vector-quantized data, which is used to identify the outlier cells in the dataset from the number of data points within each cell and the z-scores for each cell.
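One way to read the outlier criterion above is sketched below in base R. Everything here is an assumption made for illustration: stats::kmeans stands in for the compressed map, the z-score of a cell is taken to be its centroid coordinate in standardized units, and the cutoffs size_cutoff and z_cutoff are hypothetical values rather than package defaults.

```r
# Sketch only: flag sparsely populated cells whose centroid lies far from the bulk.
set.seed(7)
x <- rbind(matrix(rnorm(400), ncol = 2), c(8, 8), c(-9, 7))  # two planted outliers
z <- scale(x)                                                # work in z-score units

n_cells <- 20
vq <- kmeans(z, centers = n_cells, nstart = 5, iter.max = 50)

cell_size <- tabulate(vq$cluster, nbins = n_cells)   # data points per cell
max_abs_z <- apply(abs(vq$centers), 1, max)          # most extreme centroid z-score

size_cutoff <- 2   # hypothetical: at most this many points in the cell
z_cutoff    <- 3   # hypothetical: centroid beyond 3 standard deviations
outlier_cells <- unname(which(cell_size <= size_cutoff & max_abs_z > z_cutoff))
```

In practice the cutoffs would be chosen by inspecting the cell sizes and z-scores of the actual map.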
The identified outlier cell(s) are then passed to the removeOutliers function along with Map A. This function removes the identified outlier cell(s) from the dataset and stores them in Map B, as shown in the diagram. The final output of this function is a list of two items: a newly constructed map (Map B) and a subset of the dataset without the outlier cell(s).
The plotCells function plots the Voronoi tessellations for the compressed map (Map A) and highlights the identified outlier cell(s) in red. It requires the identified outlier cell number(s) and the compressed map (Map A) as input.
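The idea of highlighting outlier cells on a tessellation can be approximated in base R without a tessellation package. The sketch below rasterizes a Voronoi diagram by assigning each point of a fine grid to its nearest centroid. The centroids and the outlier cell ids are hypothetical, and this is not how plotCells itself is implemented.

```r
# Sketch only: rasterized Voronoi diagram with outlier cells shown in red.
set.seed(3)
centroids <- cbind(x = runif(12), y = runif(12))   # hypothetical 2D cell centroids
outlier_cells <- c(4, 9)                           # hypothetical outlier cell ids

gx <- seq(0, 1, length.out = 200)
gy <- seq(0, 1, length.out = 200)
grid <- as.matrix(expand.grid(x = gx, y = gy))     # x varies fastest

# Squared distance from every grid point to every centroid (n_grid x n_cells)
d2 <- outer(grid[, 1], centroids[, 1], "-")^2 +
      outer(grid[, 2], centroids[, 2], "-")^2

# Each grid point belongs to the cell of its nearest centroid
owner <- matrix(max.col(-d2, ties.method = "first"), nrow = length(gx))
is_outlier <- matrix(owner %in% outlier_cells, nrow = length(gx))

image(gx, gy, is_outlier * 1, zlim = c(0, 1), col = c("grey90", "red"),
      xlab = "x", ylab = "y", main = "Outlier cells (sketch)")
points(centroids, pch = 19)
```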
The dataset without outlier(s), returned by the removeOutliers function, is then passed as an argument to the HVT function, along with other parameters such as n_cells, quant.error, and depth, to construct another map (Map C).
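To make the roles of the cell count, the quantization error threshold, and the depth more concrete, here is a minimal base R sketch of hierarchical vector quantization. It assumes a cell is split again only while its error exceeds the threshold and the maximum depth has not been reached. The function hvq and its arguments are hypothetical and do not reproduce the HVT function's actual interface.

```r
# Sketch only: recursive vector quantization with k-means.
# Returns a flat list of leaf cells (level, centroid, size, error).
hvq <- function(x, n_cells, depth, quant_err, level = 1) {
  km <- kmeans(x, centers = n_cells, nstart = 3, iter.max = 50)
  cells <- list()
  for (k in seq_len(n_cells)) {
    members  <- x[km$cluster == k, , drop = FALSE]
    centroid <- km$centers[k, ]
    # Quantization error: mean distance of the members to the cell centroid
    err <- mean(sqrt(rowSums(sweep(members, 2, centroid)^2)))
    can_split <- level < depth && nrow(members) > n_cells
    if (err > quant_err && can_split) {
      # Cell is still too coarse: quantize its members one level deeper
      cells <- c(cells, hvq(members, n_cells, depth, quant_err, level + 1))
    } else {
      cells <- c(cells, list(list(level = level, centroid = centroid,
                                  n = nrow(members), error = err)))
    }
  }
  cells
}

set.seed(11)
x <- matrix(rnorm(600), ncol = 2)
leaves <- hvq(x, n_cells = 4, depth = 3, quant_err = 0.5)
```

A smaller error threshold or a larger depth yields more, finer cells, which corresponds to less compression.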
Finally, all the constructed maps are passed to the mlayerHVT function along with the test dataset. The function scores the test dataset to determine which map and which cell each test record is assigned to.
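The layered scoring idea can be sketched as follows. The routing rule is an assumption made for illustration: a test record is first scored on Map A, then sent to Map B if it lands in an outlier cell and to Map C otherwise. The helpers nearest_cell and score_layers and the toy centroids are hypothetical, not the mlayerHVT interface.

```r
# Sketch only: route each test record through a sequence of maps.
nearest_cell <- function(rows, centroids) {
  unname(apply(rows, 1, function(r) {
    which.min(colSums((t(centroids) - r)^2))
  }))
}

score_layers <- function(test, map_a, outlier_cells, map_b, map_c) {
  cell_a <- nearest_cell(test, map_a)
  in_b   <- cell_a %in% outlier_cells          # landed in an outlier cell of Map A
  cell   <- integer(nrow(test))
  if (any(in_b))  cell[in_b]  <- nearest_cell(test[in_b, , drop = FALSE], map_b)
  if (any(!in_b)) cell[!in_b] <- nearest_cell(test[!in_b, , drop = FALSE], map_c)
  data.frame(map = ifelse(in_b, "B", "C"), cell = cell)
}

# Toy centroids: cell 3 of Map A is treated as the outlier cell
map_a <- rbind(c(0, 0), c(1, 1), c(10, 10))
map_b <- rbind(c(10, 10))
map_c <- rbind(c(0, 0), c(0.5, 0.5), c(1, 1))
test  <- rbind(c(0.1, 0), c(9, 11), c(0.9, 1.2))

scores <- score_layers(test, map_a, outlier_cells = 3, map_b, map_c)
# scores$map is C, B, C and scores$cell is 1, 1, 3
```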
For detailed information on the above functions, refer to the vignettes.
The following are the links to the vignettes for the muHVT package:
muHVT Vignette: Contains descriptions of the functions used for vector quantization and construction of hierarchical Voronoi tessellations for data analysis
muHVT Model Diagnostics Vignette: Contains descriptions of the functions used to perform model diagnostics and validation for the muHVT model
muHVT: Using mlayerHVT() for Monitoring Entities over Time: Contains descriptions of the functions used for monitoring entities over time using a predictive set of HVT maps